A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. A bit-by-bit comparison is then performed between the region surrounding the anchor point in the received data set and the region surrounding an anchor point stored within a pattern database, to identify forward and backward delta values. The duplicate data identified by the anchor point and the forward and backward delta values is then replaced in the received data set with a storage indicator.
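The comparison and replacement steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation: it compares byte by byte rather than bit by bit, and the names `StorageIndicator`, `match_deltas`, `replace_duplicate`, and `restore` are hypothetical, chosen only for this illustration. The anchor offsets are assumed to be already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageIndicator:
    """Hypothetical stand-in for the 'storage indicator' named in the abstract."""
    pattern_id: int  # which stored pattern the data matched
    anchor: int      # offset of the anchor inside the stored pattern
    backward: int    # matching bytes immediately before the anchor
    forward: int     # matching bytes starting at the anchor


def match_deltas(new, new_anchor, stored, stored_anchor):
    """Count consecutive matching bytes backwards and forwards from the anchors."""
    backward = 0
    while (new_anchor - backward > 0 and stored_anchor - backward > 0
           and new[new_anchor - backward - 1] == stored[stored_anchor - backward - 1]):
        backward += 1
    forward = 0
    while (new_anchor + forward < len(new) and stored_anchor + forward < len(stored)
           and new[new_anchor + forward] == stored[stored_anchor + forward]):
        forward += 1
    return backward, forward


def replace_duplicate(new, new_anchor, stored, stored_anchor, pattern_id):
    """Split the new data into: leading bytes, an indicator, trailing bytes."""
    backward, forward = match_deltas(new, new_anchor, stored, stored_anchor)
    indicator = StorageIndicator(pattern_id, stored_anchor, backward, forward)
    return [new[:new_anchor - backward], indicator, new[new_anchor + forward:]]


def restore(parts, patterns):
    """Rebuild the original bytes by expanding each indicator from its pattern."""
    out = bytearray()
    for part in parts:
        if isinstance(part, StorageIndicator):
            pattern = patterns[part.pattern_id]
            out += pattern[part.anchor - part.backward:part.anchor + part.forward]
        else:
            out += part
    return bytes(out)
```

In this sketch the indicator carries enough information (pattern, anchor offset, and the two delta values) to reconstruct the replaced region from the stored pattern alone, which is the property the abstract relies on.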
Representative Claims
1. A method for removing duplicate data stored on a storage system, the method comprising: performing an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication; determining a number of consecutive bits or bytes of data that match between the first data set and a second data set forwards and backwards from the identified anchor; and replacing the matching data in the first data set with an indication of the second data set, the anchor, and the number of matching bits or bytes forwards from the anchor and the number of matching bits or bytes backwards from the anchor.
2. The method of claim 1 wherein the operation comprises a rolling hash on the first data set.
3. The method of claim 1 further comprising: determining that the identified anchor already exists within an anchor data store before determining the matching data between the first data set and the second data set; determining that a second anchor identified from the operation on the first data set does not already exist in the anchor data store; and in response to determining that the second anchor does not already exist in the anchor data store, storing the second anchor in the anchor data store.
4. The method of claim 1 wherein the second data set is stored in a pattern data store.
5. The method of claim 1 further comprising forming an anchor hierarchy by performing an operation on a plurality of adjacent anchors within the first data set.
6. The method of claim 5 wherein the operation on the plurality of adjacent anchors comprises a hash.
7. The method of claim 1 wherein the first data set comprises a backup data stream.
8. A system configured to remove duplicate data, the system comprising: a processor; a computer readable medium comprising program code stored therein, the program code executable by the processor to cause the system to, identify an anchor within a first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication; determine whether the identified anchor exists within a data store storing a plurality of anchors; in response to determining that the anchor exists within the data store, perform a data comparison between the first data set and a second data set forwards from the anchor and backwards from the anchor to determine a forwards delta value and a backwards delta value; and replace matching data in the first data set with an indication of the second data set, an indication of the anchor, the forwards delta value, and the backwards delta value.
9. The system of claim 8 wherein the backwards delta value comprises a number of consecutive bits or bytes backwards from the anchor that match between the first and second data sets and the forwards delta value comprises a number of consecutive bits or bytes forwards from the anchor that match between the first and second data sets.
10. The system of claim 8 further comprising a pattern data store configured to store the second data set.
11. The system of claim 8 wherein the program code to identify the anchor comprises program code executable by the processor to cause the system to place the anchor at a predefined location within the first data set.
12. The system of claim 8 wherein the first data set comprises a backup data stream.
13. The system of claim 8 wherein the program code to identify the anchor comprises program code executable by the processor to perform a rolling hash on the first data set.
14. A non-transitory computer readable medium comprising program instructions for data de-duplication, the program instructions: program instructions that perform an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point within a first region of the first data set for potential data de-duplication; determine consecutive data forwards from the anchor that matches consecutive data forwards from the anchor in a second data set and consecutive data backwards from the anchor in the first data set that matches consecutive data backwards from the anchor in the second data set, wherein the consecutive forwards matching data is represented with a forwards delta value and the consecutive backwards matching data is represented with a backwards delta value; and replace the matching data in the first data set with an indication of the anchor, the second data set, the forwards delta value, and the backwards delta value.
15. The non-transitory computer readable medium of claim 14 further comprising program instructions that determine whether the identified anchor exists within an anchor data store.
16. The non-transitory computer readable medium of claim 14 further comprising program instructions that perform a rolling hash on the first data set to identify the anchor within the first data set.
17. The method of claim 1 further comprising: identifying a second anchor within the first data set from performing the operation, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; determining a number of consecutive bits or bytes of data forwards from the second anchor that match between the first and the second data sets and a number of consecutive bits or bytes of data backwards from the second anchor that match between the first and the second data sets; and replacing, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the number of consecutive bits or bytes of data forwards from the second anchor that match between the first and the second data sets, and the number of consecutive bits or bytes of data backwards from the second anchor that match between the first and the second data sets.
18. The system of claim 8, wherein the computer readable medium further comprises program code executable by the processor to cause the system to: identify a second anchor within the first data set, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; perform a data comparison between the first and the second data sets forwards from the second anchor to determine a forwards delta value and backwards from the second anchor to determine a backwards delta value; and replace, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the forwards delta value, and the backwards delta value.
19. The non-transitory computer readable medium of claim 14 further comprising program instructions to: identify a second anchor within the first data set with the operation, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; determine consecutive data forwards from the second anchor that matches consecutive data forwards from the second anchor in the second data set and consecutive data backwards from the anchor in the first data set that matches consecutive data backwards from the anchor in the second data set, wherein the consecutive forwards matching data is represented with a forwards delta value and the consecutive backwards matching data is represented with a backwards delta value; and replace, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the forwards delta value, and the backwards delta value.
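Several of the claims identify anchors with a rolling hash and check them against an anchor data store. The Python sketch below shows one possible way to do this, under assumptions that do not come from the claims: a polynomial rolling hash over a fixed window, an anchor condition that tests the low bits of the hash, and a plain dictionary as the anchor data store. The constants and the names `window_hash`, `find_anchors`, and `lookup_or_store` are hypothetical.

```python
WINDOW = 4       # bytes per hashed window (illustrative value)
BASE = 257       # polynomial base (illustrative value)
MOD = 1_000_003  # modulus keeping hashes small (illustrative value)
MASK = 0x3       # a window is an anchor when these hash bits are all zero


def window_hash(window):
    """Polynomial hash of one window, computed from scratch."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h


def find_anchors(data):
    """Return (offset, hash) for every window that meets the anchor condition."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = window_hash(data[:WINDOW])
    anchors = []
    for start in range(len(data) - WINDOW + 1):
        if start > 0:
            # Slide the window: drop data[start - 1], add the next byte.
            h = ((h - data[start - 1] * top) * BASE + data[start + WINDOW - 1]) % MOD
        if h & MASK == 0:
            anchors.append((start, h))
    return anchors


def lookup_or_store(data, pattern_id, anchor_store):
    """Return anchors already in the store; record the ones that are new.

    Each result is (offset in data, (pattern_id, offset) of the stored anchor).
    """
    known = []
    for offset, h in find_anchors(data):
        if h in anchor_store:
            known.append((offset, anchor_store[h]))
        else:
            anchor_store[h] = (pattern_id, offset)
    return known
```

Because the anchor condition depends only on window content, the same bytes yield the same anchors even when surrounding data shifts their offsets. A hash match alone can be a collision, which is why the claims follow the lookup with a direct comparison of the data on both sides of the anchor.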