Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.
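The encoding step described above can be illustrated with a minimal sketch. It assumes a simple copy/insert delta format built on Python's `difflib` and a SHA-256 fingerprint as the data segment ID. The names `segment_id`, `delta_encode`, and `delta_decode` are hypothetical and the format is illustrative only; the source does not specify either.

```python
import hashlib
from difflib import SequenceMatcher


def segment_id(data: bytes) -> str:
    """Fingerprint used as a data segment ID (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def delta_encode(first: bytes, second: bytes) -> dict:
    """Encode `second` as a reference to `first` plus the differing bytes."""
    ops = []
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the first segment: store only offset and length.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes found only in the second segment: store them literally.
            ops.append(("insert", second[j1:j2]))
        # "delete" needs no operation: those bytes are simply never copied.
    return {"base_id": segment_id(first), "ops": ops}


def delta_decode(encoding: dict, store: dict) -> bytes:
    """Rebuild the second segment from the encoding and the local base copy."""
    base = store[encoding["base_id"]]
    out = bytearray()
    for op in encoding["ops"]:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

In this sketch the replica can decode only because it holds its own copy of the first segment under the same ID, which is why both systems must be determined to have that segment before the encoding is sent.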
Representative claim
1. A system for processing data, comprising: a delta compression system comprising a primary system with a primary system storage device and a processor, and a replica system with a replica system storage device and a processor, wherein the processor of the primary system is configured for: transmitting a data segment ID of a second data segment and a sketch of the second data segment to the replica system; checking the replica system for a sketch match using the sketch of the second data segment, wherein the sketch match comprises a first data segment on the replica system that is similar to the second data segment, wherein the first data segment on the replica system is determined to be similar to the second data segment when a predetermined fraction of the one or more sketch data of the segments are identical; in the event the replica system has a sketch match, receiving a data segment ID of the first data segment that is similar to the second data segment on the replica system; checking the primary system storage for the data segment ID of the first data segment to determine whether the primary system storage also has the first data segment; in the event that the primary system storage has the first data segment, indicating the primary system storage and the replica systems both have an identical data segment that is similar to the second data segment; encoding the second data segment, wherein the encoding of the second data segment refers to the first data segment, wherein the encoding comprises determining a difference between the second data segment and the first data segment, and wherein the encoding comprises data from the second segment comprising the determined difference between the second data segment and the first data segment that is on the primary system, and a reference to the first data segment that is on the replica system; transmitting the encoding of the second data segment to the replica system instead of the second data segment from the primary system, wherein the encoding of the second data segment is smaller than the second data segment, wherein the processor of the replica system is configured for: decoding the encoding of the second data segment using the first data segment in the replica system; storing the encoding of the second data segment on the replica system.
2. The system as in claim 1, wherein the processor of the replica system is configured for determining whether the replica system has a sketch match.
3. The system as in claim 1, wherein the processor of the replica system is configured for determining the replica system has a data segment that is identical to the second data segment using the transmitted data segment ID.
4. The system as in claim 3, wherein in the event the replica system has a data segment that is identical to the second data segment, the processor of the replica system is configured for storing the data segment ID of the second data segment instead of storing the second data segment.
5. The system as in claim 3, wherein determining that the replica system has a data segment that is identical to the second data segment further comprises the processor of the primary system is further configured for: determining the data segment ID associated with the second data segment; and wherein the processor of the replica system is configured for: determining whether the data segment ID of the second data segment is identical to a previously stored ID in an ID index on the replica system.
6. The system as in claim 5, wherein determining the data segment ID associated with the second data segment uses one or more of the following: a fingerprint function, a hash function, a cryptographic hash function, and a digital signature.
7. The system as in claim 1, wherein the processor of the primary system is further configured for storing the encoding of the second data segment.
8. The system as in claim 1, wherein the processor of the primary system is further configured for not storing the encoding of the second data segment.
9. The system as in claim 1, wherein the processor of the primary system is further configured for storing an encoding of the second data segment that references a fourth data segment, wherein the fourth data segment is not the same as the first data segment.
10. The system as in claim 1, wherein the processor of the primary system is configured for compressing the encoding of the second data segment.
11. The system as in claim 10, wherein the transmitted encoding of the second data segment to the replica system is the compressed encoding of the second data segment.
12. The system as in claim 1, wherein the processor of the replica system is configured for storing a decoding of the encoding of the second data segment.
13. The system as in claim 1, wherein the first data segment that is similar to the second data segment is determined using a sketch function.
14. The system as in claim 13, wherein the sketch function comprises a hash function.
15. The system as in claim 13, wherein the sketch function comprises a plurality of hash functions.
16. The system as in claim 13, wherein the sketch function comprises one or more functions that return a same value for similar data segments.
17. The system as in claim 13, wherein the sketch function comprises one or more functions that return a similar value for similar data segments.
18. The system as in claim 13, wherein the sketch function comprises one or more functions that may return a same value for similar data segments.
19. The system as in claim 13, wherein the sketch function comprises one or more functions that may return a similar value for similar data segments.
20. The system as in claim 19, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, or nearest-neighbor-search.
21. The system as in claim 13, wherein the first data segment is identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, or frequency of selection for other compressed segments.
22. The system as in claim 1, wherein the second data segment is similar to one or more data segments on both the primary and replica systems in addition to the first data segment.
23. The system as in claim 22, wherein the processor of the primary system is further configured for computing an encoding of the second data segment based at least in part on the first data segment and the one or more additional data segments.
24. The system as in claim 1, wherein the second data segment was stored as an encoding of a fourth data segment.
25. The system as in claim 1, wherein the processor of the replica system is configured for: receiving the encoding of the second data segment on the replica system.
26. A method for processing data in a delta compression system, comprising: transmitting from a primary system a data segment ID of a second data segment and a sketch of the second data segment to a replica system; checking the replica system for a sketch match using the sketch of the second data segment, wherein the sketch match comprises a first data segment on the replica system that is similar to the second data segment, wherein the first data segment on the replica system is determined to be similar to the second data segment when a predetermined fraction of the one or more sketch data of the segments are identical; in the event the replica system has a sketch match, receiving a data segment ID of the first data segment that is similar to the second data segment on the replica system; checking the primary system storage for the data segment ID of the first data segment to determine whether the primary system storage also has the first data segment; in the event that the primary system storage has the first data segment, indicating the primary system storage and the replica systems both have an identical data segment that is similar to the second data segment; encoding, using a processor of the primary system, a second data segment, wherein the encoding of the second data segment refers to the first data segment, wherein the encoding comprises determining a difference between the second data segment and the first data segment, and wherein the encoding comprises data from the second segment comprising the determined difference between the second data segment and the first data segment that is on the primary system, and a reference to the first data segment that is on the replica system; transmitting the encoding of the second data segment to the replica system instead of the second data segment from the primary system, wherein the encoding of the second data segment is smaller than the second data segment, wherein the processor of the replica system is configured for: decoding the encoding of the second data segment using the first data segment in the replica system; storing the encoding of the second data segment on the replica system.
27. The method as in claim 26, wherein the checking of the replica system for a sketch match is done by a processor of the replica system.
28. The method as in claim 26, further comprising determining, using a processor of the replica system, that the replica system has a data segment that is identical to the second data segment using the transmitted data segment ID.
29. The method as in claim 28, further comprising in the event the replica system has a data segment that is identical to the second data segment, storing, using the processor of the primary system, the data segment ID of the second data segment instead of storing the second data segment.
30. The method as in claim 28, wherein determining that the replica system has a data segment that is identical to the second data segment comprises: determining, using the processor of the primary system, a data segment ID associated with the second data segment; determining, using the processor of the replica system, whether the data segment ID of the second data segment is identical to a previously stored ID in an ID index on the replica system.
31. The method as in claim 30, wherein determining the first data segment ID associated with the second data segment uses one or more of the following: a fingerprint function, a hash function, a cryptographic hash function, and a digital signature.
32. The method as in claim 26, further comprising storing the encoding of the second data segment on the primary system.
33. The method as in claim 26, further comprising not storing the encoding of the second data segment on the primary system.
34. The method as in claim 26, further comprising storing on the primary system an encoding of the second data segment that references a fourth data segment, wherein the fourth data segment is not the same as the first data segment.
35. The method as in claim 26, further comprising compressing the encoding of the second data segment.
36. The method as in claim 35, further comprising transmitting the compressed encoding of the second data segment.
37. The method as in claim 26, further comprising storing, using a processor of the replica system, a decoding of the encoding of the second data segment.
38. The method as in claim 26, wherein the first data segment that is similar to the second data segment is determined using a sketch function.
39. The method as in claim 38, wherein the sketch function comprises a hash function.
40. The method as in claim 38, wherein the sketch function comprises a plurality of hash functions.
41. The method as in claim 38, wherein the sketch function comprises one or more functions that return the same value for similar data segments.
42. The method as in claim 38, wherein the sketch function comprises one or more functions that return a similar value for similar data segments.
43. The method as in claim 38, wherein the sketch function comprises one or more functions that may return a same value for similar data segments.
44. The method as in claim 38, wherein the sketch function comprises one or more functions that may return a similar value for similar data segments.
45. The method as in claim 44, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, or nearest-neighbor-search.
46. The method as in claim 38, wherein the first data segment is identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, or frequency of selection for other compressed segments.
47. The method as in claim 26, wherein the second data segment is similar to one or more previous segments in addition to the first data segment.
48. The method as in claim 47, further comprising computing, using the processor of the primary system, an encoding of the second data segment based at least in part on the first data segment and the one or more additional data segments.
49. The method as in claim 26, wherein the second data segment was stored as an encoding of a fourth data segment.
50. The method as in claim 26, further comprising receiving the encoded second data segment on the replica system.
51. A computer program product for processing data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for: transmitting from a primary system a data segment ID of a second data segment and a sketch of the second data segment to a replica system; checking the replica system for a sketch match using the sketch of the second data segment, wherein the sketch match comprises a first data segment on the replica system that is similar to the second data segment, wherein the first data segment on the replica system is determined to be similar to the second data segment when a predetermined fraction of the one or more sketch data of the segments are identical; in the event the replica system has a sketch match, receiving a data segment ID of the first data segment that is similar to the second data segment on the replica system; checking the primary system storage for the data segment ID of the first data segment to determine whether the primary system storage also has the first data segment; encoding, using a processor of the primary system, a second data segment, wherein the encoding of the second data segment refers to the first data segment, wherein the encoding comprises determining a difference between the second data segment and the first data segment, and wherein the encoding comprises data from the second segment comprising the determined difference between the second data segment and the first data segment that is on the primary system, and a reference to the first data segment that is on the replica system; transmitting the encoding of the second data segment to the replica system instead of the second data segment from the primary system, wherein the encoding of the second data segment is smaller than the second data segment, wherein the processor of the replica system is configured for: decoding the encoding of the second data segment using the first data segment in the replica system; storing the encoding of the second data segment on the replica system.
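The sketch match recited in the claims (similarity when a predetermined fraction of the sketch data is identical) could look roughly like the following. This is a minimal illustration, not the patented implementation: it assumes a sketch made of four features, each the maximum of a transformed hash over sliding windows of the segment. The window size, the `PARAMS` constants, the 0.5 threshold, and the names `sketch`, `is_sketch_match`, and `find_similar` are all hypothetical.

```python
import hashlib

WINDOW = 16                 # sliding-window size in bytes (assumption)
MASK = (1 << 64) - 1
# Hypothetical multiplier/offset pairs; each pair yields one sketch feature.
PARAMS = [
    (0x9E3779B97F4A7C15, 0x01234567),
    (0xC2B2AE3D27D4EB4F, 0x89ABCDEF),
    (0x165667B19E3779F9, 0x13579BDF),
    (0xD6E8FEB86659FD93, 0x2468ACE0),
]


def sketch(segment: bytes) -> tuple:
    """Return one feature per PARAMS entry: the maximum transformed window hash."""
    features = [0] * len(PARAMS)
    for start in range(max(1, len(segment) - WINDOW + 1)):
        window = segment[start:start + WINDOW]
        digest = hashlib.blake2b(window, digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i, (a, b) in enumerate(PARAMS):
            features[i] = max(features[i], (a * h + b) & MASK)
    return tuple(features)


def is_sketch_match(s1: tuple, s2: tuple, min_fraction: float = 0.5) -> bool:
    """Similar when at least `min_fraction` of the sketch features are identical."""
    same = sum(1 for x, y in zip(s1, s2) if x == y)
    return same / len(s1) >= min_fraction


def find_similar(query: tuple, sketch_index: dict):
    """Replica-side lookup: return the ID of a stored segment whose sketch matches."""
    for seg_id, stored in sketch_index.items():
        if is_sketch_match(query, stored):
            return seg_id
    return None
```

Because each feature depends only on the single window with the maximum value, a small edit to a segment tends to leave most features unchanged, which is what makes a fraction-based comparison usable as a similarity test in this sketch.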
Patents cited by this patent (12)
Farber, David A.; Lachman, Ronald D., De-duplication of data in a data processing system.
Broder, Andrei Z.; Glassman, Steven C.; Nelson, Charles G.; Manasse, Mark S.; Zweig, Geoffrey G., Method for clustering closely resembling data objects.
Ajtai, Miklos; Burns, Randal Chilton; Fagin, Ronald; Stockmeyer, Larry Joseph, System and method for differential compression of data from a plurality of binary sources.
Ting, Daniel; Zheng, Ling; Manley, Stephen L.; DeStefano, John Frederick, System and method for managing data deduplication of storage systems utilizing persistent consistency point images.
Morris Robert J. T. (Los Gatos CA), System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing.