[Patent] Sampling based elimination of duplicate data

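IPC classification information
Country / Type	United States (US) Patent, granted
International Patent Classification (IPC 7th edition)	H04N-007/18 H03M-007/00 H04N-019/20 H04N-019/23 H04N-019/25
Application number	US-0443650 (2012-04-10)
Registration number	US-9344112 (2016-05-17)
Inventor / Address	Zheng, Ling; Stager, Roger; Johnston, Craig; Trimmer, Don; Frandzel, Yuval
Applicant / Address	Zheng, Ling
Agent / Address	Gilliam IP PLLC
Citation information	Times cited: 0; Cited patents: 44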

Abstract

A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. A bit-by-bit data comparison is then performed of the region surrounding the anchor point in the received data set with the region surrounding an anchor point stored within a pattern database to identify forward/backward delta values. The duplicate data identified by the anchor point, forward and backward delta values is then replaced in the received data set with a storage indicator.
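The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it works on bytes rather than bits, and the names `StorageIndicator`, `match_deltas`, `replace_duplicate`, and `pattern_id` are hypothetical, introduced only for illustration. It assumes the anchor position is already known in both the received data and the stored data.

```python
from __future__ import annotations

from typing import NamedTuple


class StorageIndicator(NamedTuple):
    """Hypothetical record standing in for the abstract's 'storage indicator'."""
    pattern_id: int     # assumed identifier of the stored data set
    anchor_offset: int  # anchor position inside the stored data set
    backward: int       # backward delta: matching bytes before the anchor
    forward: int        # forward delta: matching bytes from the anchor onward


def match_deltas(new: bytes, new_anchor: int,
                 stored: bytes, stored_anchor: int) -> tuple[int, int]:
    """Count consecutive matching bytes backwards and forwards from the anchors."""
    backward = 0
    while (new_anchor - backward > 0 and stored_anchor - backward > 0
           and new[new_anchor - backward - 1] == stored[stored_anchor - backward - 1]):
        backward += 1
    forward = 0
    while (new_anchor + forward < len(new) and stored_anchor + forward < len(stored)
           and new[new_anchor + forward] == stored[stored_anchor + forward]):
        forward += 1
    return backward, forward


def replace_duplicate(new: bytes, new_anchor: int, stored: bytes,
                      stored_anchor: int, pattern_id: int) -> list:
    """Return the new data with the duplicate region swapped for an indicator."""
    backward, forward = match_deltas(new, new_anchor, stored, stored_anchor)
    indicator = StorageIndicator(pattern_id, stored_anchor, backward, forward)
    # Unmatched prefix, the indicator, then the unmatched suffix.
    return [new[:new_anchor - backward], indicator, new[new_anchor + forward:]]
```

For example, with `new = b"abHELLO WORLDcd"` anchored at offset 8 and `stored = b"xxxHELLO WORLDyy"` anchored at offset 9, `match_deltas` returns `(6, 5)`, and `replace_duplicate` keeps only `b"ab"` and `b"cd"` around the indicator.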

Representative claims

1. A method for removing duplicate data stored on a storage system, the method comprising: performing an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication; determining a number of consecutive bits or bytes of data that match between the first data set and a second data set forwards and backwards from the identified anchor; and replacing the matching data in the first data set with an indication of the second data set, the anchor, and the number of matching bits or bytes forwards from the anchor and the number of matching bits or bytes backwards from the anchor.
2. The method of claim 1 wherein the operation comprises a rolling hash on the first data set.
3. The method of claim 1 further comprising: determining that the identified anchor already exists within an anchor data store before determining the matching data between the first data set and the second data set; determining that a second anchor identified from the operation on the first data set does not already exist in the anchor data store; and in response to determining that the second anchor does not already exist in the anchor data store, storing the second anchor in the anchor data store.
4. The method of claim 1 wherein the second data set is stored in a pattern data store.
5. The method of claim 1 further comprising forming an anchor hierarchy by performing an operation on a plurality of adjacent anchors within the first data set.
6. The method of claim 5 wherein the operation on the plurality of adjacent anchors comprises a hash.
7. The method of claim 1 wherein the first data set comprises a backup data stream.
8. A system configured to remove duplicate data, the system comprising: a processor; a computer readable medium comprising program code stored therein, the program code executable by the processor to cause the system to, identify an anchor within a first data set, wherein the anchor defines a starting point in a first region of the first data set for potential data de-duplication; determine whether the identified anchor exists within a data store storing a plurality of anchors; in response to determining that the anchor exists within the data store, perform a data comparison between the first data set and a second data set forwards from the anchor and backwards from the anchor to determine a forwards delta value and a backwards delta value; and replace matching data in the first data set with an indication of the second data set, an indication of the anchor, the forwards delta value, and the backwards delta value.
9. The system of claim 8 wherein the backwards delta value comprises a number of consecutive bits or bytes backwards from the anchor that match between the first and second data sets and the forwards delta value comprises a number of consecutive bits or bytes forwards from the anchor that match between the first and second data sets.
10. The system of claim 8 further comprising a pattern data store configured to store the second data set.
11. The system of claim 8 wherein the program code to identify the anchor comprises program code executable by the processor to cause the system to place the anchor at a predefined location within the first data set.
12. The system of claim 8 wherein the first data set comprises a backup data stream.
13. The system of claim 8 wherein the program code to identify the anchor comprises program code executable by the processor to perform a rolling hash on the first data set.
14. A non-transitory computer readable medium comprising program instructions for data de-duplication, the program instructions: program instructions that perform an operation on a first data set to identify an anchor within the first data set, wherein the anchor defines a starting point within a first region of the first data set for potential data de-duplication; determine consecutive data forwards from the anchor that matches consecutive data forwards from the anchor in a second data set and consecutive data backwards from the anchor in the first data set that matches consecutive data backwards from the anchor in the second data set, wherein the consecutive forwards matching data is represented with a forwards delta value and the consecutive backwards matching data is represented with a backwards delta value; and replace the matching data in the first data set with an indication of the anchor, the second data set, the forwards delta value, and the backwards delta value.
15. The non-transitory computer readable medium of claim 14 further comprising program instructions that determine whether the identified anchor exists within an anchor data store.
16. The non-transitory computer readable medium of claim 14 further comprising program instructions that perform a rolling hash on the first data set to identify the anchor within the first data set.
17. The method of claim 1 further comprising: identifying a second anchor within the first data set from performing the operation, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; determining a number of consecutive bits or bytes of data forwards from the second anchor that match between the first and the second data sets and a number of consecutive bits or bytes of data backwards from the second anchor that match between the first and the second data sets; and replacing, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the number of consecutive bits or bytes of data forwards from the second anchor that match between the first and the second data sets, and the number of consecutive bits or bytes of data backwards from the second anchor that match between the first and the second data sets.
18. The system of claim 8, wherein the computer readable medium further comprises program code executable by the processor to cause the system to: identify a second anchor within the first data set, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; perform a data comparison between the first and the second data sets forwards from the second anchor to determine a forwards delta value and backwards from the second anchor to determine a backwards delta value; and replace, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the forwards delta value, and the backwards delta value.
19. The non-transitory computer readable medium of claim 14 further comprising program instructions to: identify a second anchor within the first data set with the operation, wherein the second anchor defines a starting point in a second region of the first data set for potential data de-duplication; determine consecutive data forwards from the second anchor that matches consecutive data forwards from the second anchor in the second data set and consecutive data backwards from the anchor in the first data set that matches consecutive data backwards from the anchor in the second data set, wherein the consecutive forwards matching data is represented with a forwards delta value and the consecutive backwards matching data is represented with a backwards delta value; and replace, in the first data set, the matching data with respect to the second anchor with an indication of the second anchor, the second data set, the forwards delta value, and the backwards delta value.
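Claims 2 and 3 mention a rolling hash for identifying anchors and an anchor data store that is checked before any comparison. The sketch below shows one way these two pieces could fit together; it is an illustration under stated assumptions, not the method the patent specifies. The window size, the polynomial hash, the rule "anchor when the low bits of the hash are zero", and the names `find_anchors` and `classify_anchors` are all assumptions, and the store is modeled as a plain dictionary.

```python
from __future__ import annotations

WINDOW = 8            # assumed window size in bytes
BASE = 257            # assumed polynomial base
MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime)
MASK = 0x0F           # assumed rule: anchor when the low 4 hash bits are zero


def find_anchors(data: bytes) -> list[tuple[int, int]]:
    """Return (window start offset, hash) for every window that qualifies as an anchor."""
    if len(data) < WINDOW:
        return []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    anchors = []
    for start in range(len(data) - WINDOW + 1):
        if start > 0:
            # Slide the window: drop the outgoing byte, append the incoming one.
            h = ((h - data[start - 1] * top) * BASE + data[start + WINDOW - 1]) % MOD
        if h & MASK == 0:
            anchors.append((start, h))
    return anchors


def classify_anchors(data: bytes, store: dict, data_set_id: int):
    """Split anchors into those already in the store and those newly added to it."""
    known, added = [], []
    for offset, h in find_anchors(data):
        if h in store:
            # Candidate duplicate: the stored entry says where to compare.
            known.append((offset, store[h]))
        else:
            store[h] = (data_set_id, offset)
            added.append(offset)
    return known, added
```

A hash hit in the store only marks a candidate region. The claims still call for a comparison forwards and backwards from the anchor to determine how much data actually matches.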

Patents cited by this patent (44)

Fair,Robert L., Adaptive file readahead technique for multiple read streams.
Clifton Richard J. ; Chatterjee Sanjoy ; Larson John P. ; Richart Joseph R. ; Sagan Cyril E., Apparatus and method for backup of a disk storage system.
Gutierrez Bill (3428 Belmont Ave. El Cerrito CA 94530), Articulated blade with automatic pitch and camber control.
Simoens Anthony (Vedrin BEX), Compatibilized compositions comprising a polyamide and polypropylene and adhesive composites containing these compositio.
Saboe Michael S. (Trumbell CT) Goldblatt Barry (Orange CT), Composite ceramic/metallic turbine blade and method of making same.
Dalal, Chirag Deepak; Pendharkar, Niranjan S., Conversion between full-data and space-saving snapshots.
Hitz, David; Malcolm, Michael; Lau, James; Rakitzis, Byron, Copy on write file system consistency and block usage.
Margolus,Norman H.; Knight, Jr.,Thomas F.; Pratt,Gill A., Data repository and method for promoting network storage of data.
Koning,G. Paul; Hayden,Peter C.; Long,Paula; Lee,Hsin H., Distributed snapshot process.
Belsan Jay S. (Nederland CO) Rudeseal George A. (Boulder CO) Milligan Charles A. (Golden CO), Dynamically mapped data storage subsystem having multiple open destage cylinders and method of managing that subsystem.
Kahn,Andy C.; Patel,Kayuri; Chen,Raymond C.; Edwards,John K., File folding technique.
Milligan Charles A. (Golden CO) Rudeseal George A. (Boulder CO), Logical track write scheduling system for a parallel disk drive array data storage subsystem.
Allen Bruce S. (Willow St. East Kingston NH 03827) Dunalvey Michael R. (276 Harris Ave. Needham MA 02192) King Bruce A. (R.F.D. 2 Bolton MA 01740) DuPrie Harold J. (57 High St. ; Apt. 1B Andover MA 0, Man machine interface.
Albornoz,Jordi; Feigenbaum,Lee D.; Martin,Sean J.; Martin,Simon L.; McCullough,Lonnie A.; Torres,Elias, Management and recovery of data object annotations using digital fingerprinting.
Dorward, Sean Matthew; Quinlan, Sean, Method and apparatus for archival data storage.
Ralph Shnelvar, Method and apparatus for storing information in a data processing system.
Gonzalez, Cesar A., Method and apparatus for tape library emulation.
Kolavi,Shashi Kumar M, Method and system for value-based data compression.
Hitz David ; Malcolm Michael ; Lau James ; Rakitzis Byron, Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file s.
Williams Ross Neil,AUX, Method for partitioning a block of data into subblocks and for storing and communicating such subblocks.
Brunk,Hugh L.; Levy,Kenneth L., Method, apparatus and programs for generating and utilizing content signatures.
Woytowitz,Peter J., Modular irrigation controller with separate field valve line wiring terminals.
Brown George L. (3285 Sprig Dr. Benecia CA 94510) Hales Paul (Heron's View ; Norman's Bay East Sussex BN24 6PU GBX), Oscillating, lateral thrust power generator.
Rangi ; Rajindar S. ; South ; Peter, Overspeed spoilers for vertical axis wind turbine.
Row Edward J. (Mountain View CA) Boucher Laurence B. (Saratoga CA) Pitts William M. (Los Altos CA) Blightman Stephen E. (San Jose CA), Parallel I/O network file server architecture.
Anglin, Matthew J.; Cannon, David M.; Dawson, Colin S.; Martin, Howard N., Policy based tiered data deduplication strategy.
Hobbs, David Victor; Ratto, Patrick; Dorey, legal representative, Debra, Progressive block encoding using region analysis.
Dewitt Frederick J. ; McGuire Thomas D., Storage optimizing encoder and method.
Hillis W. Daniel (Cambridge MA) Liu Clement K. (Brighton MA), Storage system using multiple independently mechanically-driven storage units.
Moody, II, William H.; Sims, Robert, System and method for controlling access to media libraries.
Black,Cameron; Schmidt,Ross A.; Brockway,Sean M.; Craig,Robert M.; Partington,Todd, System and method for data management.
Jeffrey L. Grummon ; Chris R. Franklin, System and method for disk control with snapshot feature including read-write snapshot half.
Dice David, System and method for efficiently implementing an authenticated communications channel that facilitates tamper detection.
Svarcas,Rimas; Manley,Stephen L., System and method for fault-tolerant synchronization of replica updates for fixed persistent consistency point image consumption.
Patterson,Hugo; Skardal,Harald I.; Manley,Stephen L., System and method for managing a plurality of snapshots.
Franklin Chris, System and method for real-time data backup using snapshot copying with selective compaction of backup data.
Chen,Raymond C.; Manley,Stephen L., System and method for redirecting access to a remote mirrored snapshot.
Manley,Stephen L.; Chen,Raymond C.; Edwards,John K., System and method for storage of snapshot metadata in a remote file.
Matze John E. G. ; Whiting Douglas L., System for backing up computer disk volumes with error remapping of flawed memory addresses.
Puri, Atul; Civanlar, Mehmet Reha, System for content adaptive video decoding.
Foster Richard D. (Poughkeepsie NY) McCaulley Ellory K. (Boulder CO), Version management system using pointers shared by a plurality of versions for indicating active lines of a version.
Smith Raoul D. (Lompoc CA), Wind motor.
Grose David L. (Wichita KS), Wind powered apparatus.
Lippert ; Jr. Joseph (Locust Valley NY), Wind rotor automatic air brake.
