[Patent] Delta compression after identity deduplication


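IPC classification information
Country/Type	United States (US) Patent, granted
International Patent Classification (IPC 7th edition)	G06F-017/00 G06F-007/00 G06F-011/14
Application number	US-0291998 (2008-11-14)
Registration number	US-8751462 (2014-06-10)
Inventors / Address	Huang, Mark; Lee, Edward K.; Li, Kai; Shilane, Philip; Wallace, Grant; Zhu, Ming Benjamin
Applicant / Address	EMC Corporation
Agent / Address	Van Pelt, Yi & James LLP
Citation information	Times cited: 10; Patents cited: 12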

Abstract

Delta compression after identity deduplication is disclosed. A first data segment is determined to be identical to a first previous data segment. A second data segment, not determined to be identical to a second previous data segment, is then determined to be similar to a third previous data segment
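
The ordering described in the abstract (identity check first, similarity check only for segments that are not identical) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `fingerprint`, `classify`, and `toy_similar` are hypothetical, SHA-256 is an assumed choice of segment ID, and the prefix comparison is only a stand-in for a real similarity test.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Segment ID used for the identity check (SHA-256 is an assumption)."""
    return hashlib.sha256(segment).hexdigest()


def toy_similar(a: bytes, b: bytes) -> bool:
    """Hypothetical stand-in for a similarity test: same first 8 bytes."""
    return a[:8] == b[:8]


def classify(segment: bytes, stored: dict, is_similar=toy_similar) -> tuple:
    """Return ("identical" | "similar" | "new", fingerprint of the match or segment).

    `stored` maps fingerprint -> raw bytes of previously stored segments.
    """
    fp = fingerprint(segment)
    # Step 1: identity deduplication via an ID index lookup.
    if fp in stored:
        return ("identical", fp)
    # Step 2: only non-identical segments are tested for similarity.
    for base_fp, base in stored.items():
        if is_similar(segment, base):
            return ("similar", base_fp)
    # Step 3: neither identical nor similar, so the segment is stored as-is.
    return ("new", fp)


base = b"header01" + b"x" * 100
stored = {fingerprint(base): base}

same = classify(base, stored)
near = classify(b"header01" + b"y" * 100, stored)
other = classify(b"unrelated segment payload", stored)
```

Under these assumptions, `same` is classified as identical, `near` as similar to the stored base, and `other` as new.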

Representative claims

1. A system for processing data, comprising:
a deduplicating system, wherein the deduplicating system comprises a processor and a storage unit, wherein the deduplicating system is configured for determining, using the processor, whether a first data segment is identical to a first previously stored data segment in the storage unit, wherein in the event that the first data segment is determined to be identical to the first previously stored data segment, a reference to the first data segment is stored in the storage unit; and
a delta compression system is configured for determining, only in the event that the first data segment is determined not to be identical to the first previously stored data segment, whether the first data segment is similar to a second previously stored data segment, wherein the first data segment is determined to be similar to the second previously stored data segment using a sketch function and the sketch function comprises one or more functions that returns a same value for similar data segments;
wherein in the event that the first data segment is determined to be similar to the second previously stored data segment, the delta compression system is further configured for computing an encoding of the first data segment, wherein the encoding comprises determining one or more differences between the first data segment and the second previously stored data segment, and storing, in the storage unit, the first data segment using a sequence comprising the one or more differences, one or more first sequence locations corresponding to each of the one or more differences, a reference to the second previously stored data segment, and one or more second sequence references, wherein the one or more second sequence references corresponding to a sequence of data from within the second previously stored segment identifying the subset of the second previously stored segment; and
wherein in the event that the first data segment is not determined to be similar to the second previously stored data segment and that the first data segment is determined not to be identical to the first stored data segment, the delta compression system is further configured for storing the first data segment in the storage unit.
2. The system as in claim 1, wherein the deduplicating system receives a data stream or data block.
3. The system as in claim 2, wherein the deduplicating system breaks the data stream or data block into a plurality of data segments.
4. The system as in claim 1, wherein determining that the first data segment is identical comprises: determining a first data segment ID associated with the first data segment; determining whether the first data segment ID is identical to a previously stored ID in an ID index.
5. The system as in claim 4, where determining the first data segment ID associated with the first data segment uses one or more of the following: a fingerprint function, a hash function, a cryptographic hash function, and a digital signature.
6. The system as in claim 1, further comprising compressing the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
7. The system as in claim 1, further comprising transmitting the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
8. The system as in claim 1, further comprising replicating the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
9. The system as in claim 1, wherein the encoding is based at least in part on the second previously stored data segment.
10. The system as in claim 1, wherein the encoding comprises an indication of a set of data blocks in the first data segment not present in the second previously stored data segment and an indication of a set of data blocks in the second previously stored data segment.
11. The system as in claim 1, wherein in the event that the first data segment is determined to be similar to the second previously stored data segment, the delta compression system further comprises determining whether the encoding is smaller than the first data segment.
12. The system as in claim 1, wherein the sketch function comprises a hash function.
13. The system as in claim 1, wherein the sketch function comprises a plurality of hash functions.
14. The system as in claim 1, wherein the sketch function comprises one or more functions that returns a similar value for similar data segments.
15. The system as in claim 1, wherein the sketch function comprises one or more functions that may return a same value for similar data segments.
16. The system as in claim 1, wherein the sketch function comprises one or more functions that may return a similar value for similar data segments.
17. The system as in claim 16, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, or nearest-neighbor-search.
18. The system as in claim 1, wherein in the event that the first data segment is determined to be similar to the second previously stored data segment, the delta compression system further comprises determining whether the first data segment is similar to one or more previously stored segments in addition to the second previously stored data segment.
19. The system as in claim 18, wherein in the event that the first data segment is determined to be similar to the one or more previously stored data segments, the encoding is based at least in part on the second previously stored data segment and the one or more additional similar previously stored data segments.
20. The system as in claim 18, wherein the one or more previously stored data segments and second previously stored data segment are identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, or frequency of selection for other compressed segments.
21. The system as in claim 1, wherein the second previously stored data segment was stored as an encoding of a third previously stored data segment.
22. A method for processing data, comprising:
determining, using a processor, whether a first data segment is identical to a first previously stored data segment, wherein in the event that the first data segment is determined to be identical to the first previously stored data segment, a reference to the first data segment is stored in a storage unit; and
determining, only in the event that the first data segment is determined not to be identical to the first previously stored data segment, whether the first data segment is similar to a second previously stored data segment, wherein the first data segment is determined to be similar to the second previously stored data segment using a sketch function, wherein the sketch function comprises one or more functions that returns a same value for similar data segments;
in the event that the first data segment is determined to be similar to the second previously stored data segment, computing an encoding of the first data segment, wherein the encoding comprises determining one or more differences between the first data segment and the second previously stored data segment, and storing, in the storage unit, the first data segment using a sequence comprising the one or more differences, one or more first sequence locations corresponding to each of the one or more differences, a reference to the second previously stored data segment, and one or more second sequence references, wherein the one or more second sequence references corresponding to a sequence of data from within the second previously stored segment identifying the subset of the second previously stored segment; and
in the event that the first data segment is not determined to be similar to the second previously stored data segment and that the first data segment is determined not to be identical to the first stored data segment, storing the first data segment in the storage unit.
23. The method as in claim 22, further comprising receiving a data stream or data block.
24. The method as in claim 22, further comprising breaking the data stream or data block into a plurality of data segments.
25. The method as in claim 22, wherein determining that the first data segment is identical comprises: determining a first data segment ID associated with the first data segment; determining whether the first data segment ID is identical to a previously stored ID in an ID index.
26. The method as in claim 25, where determining the first data segment ID associated with the first data segment uses one or more of the following: a fingerprint function, a hash function, a cryptographic hash function, and a digital signature.
27. The method as in claim 22, further comprising compressing the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
28. The method as in claim 22, further comprising transmitting the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
29. The method as in claim 22, further comprising replicating the encoding of the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
30. The method as in claim 22, wherein the encoding is based at least in part on the second previously stored data segment.
31. The method as in claim 22, wherein the encoding comprises an indication of a set of data blocks in the first data segment not present in the second previously stored data segment and an indication of a set of data blocks in the second previously stored data segment.
32. The method as in claim 22, further comprising determining whether the encoding is smaller than the first data segment in the event that the first data segment is determined to be similar to the second previously stored data segment.
33. The method as in claim 22, wherein the sketch function comprises a hash function.
34. The method as in claim 22, wherein the sketch function comprises a plurality of hash functions.
35. The method as in claim 22, wherein the sketch function comprises one or more functions that returns a similar value for similar data segments.
36. The method as in claim 22, wherein the sketch function comprises one or more functions that may return the same value for similar data segments.
37. The method as in claim 22, wherein the sketch function comprises one or more functions that may return a similar value for similar data segments.
38. The method as in claim 37, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, or nearest-neighbor-search.
39. The method as in claim 22, wherein in the event that the first data segment is determined to be similar to the second previously stored data segment, determining whether the first data segment is similar to one or more previously stored segments in addition to the second previously stored data segment.
40. The method as in claim 39, wherein the encoding is based at least in part on the second previously stored data segment and the one or more additional similar previously stored data segments.
41. The method as in claim 39, wherein the one or more previously stored data segments and second previously stored data segment are identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, or frequency of selection for other compressed segments.
42. The method as in claim 22, wherein the second previously stored data segment was stored as an encoding of a third previously stored data segment.
43. A computer program product for processing data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
determining, using a processor, whether a first data segment is identical to a first previously stored data segment, wherein in the event that the first data segment is determined to be identical to the first previously stored data segment, a reference to the first data segment is stored in a storage unit; and
determining, only in the event that the first data segment is determined not to be identical to the first previously stored data segment, whether the first segment is similar to a second previously stored data segment, wherein the first data segment is determined to be similar to the second previously stored data segment using a sketch function and the sketch function comprises one or more functions that returns a same value for similar data segments;
in the event that the first data segment is determined to be similar to the second previously stored data segment, computing an encoding of the first data segment, wherein the encoding comprises determining one or more differences between the first data segment and the second previously stored data segment, and storing, in the storage unit, the first data segment using a sequence comprising the one or more differences, one or more first sequence locations corresponding to each of the one or more differences, a reference to the second previously stored data segment, and one or more second sequence references, wherein the one or more second sequence references corresponding to a sequence of data from within the second previously stored segment identifying the subset of the second previously stored segment; and
in the event that the first data segment is not determined to be similar to the second previously stored data segment and that the first data segment is determined not to be identical to the first stored data segment, storing the first data segment in the storage unit.
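
To make the claimed sketch function and encoding more concrete, the following is a hedged illustration rather than the method disclosed in the patent. It assumes a sketch built from the maximum salted hash over sliding byte windows (one common way to get a function that tends to return the same value for similar segments), and it uses Python's `difflib.SequenceMatcher` as a stand-in differencing engine. The names `sketch`, `delta_encode`, `delta_decode`, the window size, and the op layout are all hypothetical. In the encoding, `("copy", offset, length)` plays the role of a second sequence reference into the base segment, and `("insert", position, literal)` carries a difference together with its location in the new segment.

```python
import hashlib
from difflib import SequenceMatcher


def sketch(segment: bytes, window: int = 8, num_features: int = 4) -> tuple:
    """Assumed sketch: per salted hash function, the max hash over all windows."""
    shingles = {segment[i:i + window]
                for i in range(max(1, len(segment) - window + 1))}
    return tuple(
        max(hashlib.sha256(bytes([k]) + s).digest()[:8] for s in shingles)
        for k in range(num_features)
    )


def delta_encode(target: bytes, base: bytes, base_ref: str) -> dict:
    """Encode target as copies from base plus literal differences."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, b0, b1, t0, t1 in matcher.get_opcodes():
        if tag == "equal":
            # Reference to a run of bytes inside the base segment.
            ops.append(("copy", b0, b1 - b0))
        elif t1 > t0:
            # "replace" or "insert": literal bytes and their position in target.
            ops.append(("insert", t0, target[t0:t1]))
        # "delete" needs no op: the skipped base bytes are simply not copied.
    return {"base": base_ref, "ops": ops}


def delta_decode(encoding: dict, base: bytes) -> bytes:
    """Rebuild the target from the base segment and the op sequence."""
    out = bytearray()
    for op in encoding["ops"]:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            _, _, literal = op
            out += literal
    return bytes(out)


base = bytes(range(256)) * 2
target = base[:200] + b"CHANGED" + base[210:]
encoding = delta_encode(target, base, base_ref="seg-1")  # "seg-1" is a made-up ID
restored = delta_decode(encoding, base)
```

Here `restored` equals `target`. A system following claims 11 and 32 would additionally compare the size of the encoding with the size of the original segment before choosing which form to store.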

Patents cited by this patent (12)

Farber, David A.; Lachman, Ronald D., De-duplication of data in a data processing system.
Pugh, William; Henzinger, Monika H., Detecting duplicate and near-duplicate files.
Kapoor, Rahul; Ganti, Venkatesh; Chaudhuri, Surajit, Duplicate data elimination system.
Yueh, Jedidiah, Global de-duplication in shared architectures.
Andrei Z. Broder; Steven C. Glassman; Charles G. Nelson; Mark S. Manasse; Geoffrey G. Zweig, Method for clustering closely resembling data objects.
Williams Ross Neil, AUX, Method for partitioning a block of data into subblocks and for storing and communcating such subblocks.
Anglin, Matthew J.; Cannon, David M.; Dawson, Colin S.; Martin, Howard N., Policy based tiered data deduplication strategy.
Murase, Atsushi, Power efficient storage with data de-duplication.
Miklos Ajtai; Randal Chilton Burns; Ronald Fagin; Larry Joseph Stockmeyer, System and method for differential compression of data from a plurality of binary sources.
Jernigan, IV, Richard P., System and method for enabling de-duplication in a storage system architecture.
Ting, Daniel; Zheng, Ling; Manley, Stephen L.; DeStefano, John Frederick, System and method for managing data deduplication of storage systems utilizing persistent consistency point images.
Morris Robert J. T. (Los Gatos CA), System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing.

Patents citing this patent (10)

Kraft, Emil Burns; Milnes, Jacob Thomas, Business content authoring and distribution.
Harnik, Danny; Sotnikov, Dmitry, Compression-based filtering for deduplication.
Mutalik, Madhav; Provenzano, Christopher A.; Abercrombie, Philip J., Data replication system.
Mutalik, Madhav; Provenzano, Christopher A.; Abercrombie, Philip J., Data replication system.
Mutalik, Madhav; Palaparthi, Srikanth, Facilitating test failover using a thin provisioned virtual machine created from a snapshot.
Provenzano, Christopher A.; Abercrombie, Philip J.; Mutalik, Madhav, Incremental copy performance between data stores.
Provenzano, Christopher A.; Abercrombie, Philip J.; Mutalik, Madhav, Incremental copy performance between data stores.
Hayasaka, Mitsuo; Yamasaki, Koji; Tashiro, Naomitsu, Storage device to backup content based on a deduplication system.
Mutalik, Madhav; Abercrombie, Philip J.; Provenzano, Christopher A., Successive data fingerprinting for copy accuracy assurance.
Mutalik, Madhav; Abercrombie, Philip J.; Provenzano, Christopher A., Successive data fingerprinting for copy accuracy assurance.


