[Patent] Calculating deduplication digests for a synthetic backup by a deduplication storage system

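IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G06F-017/30 G06F-011/14
Application number	US-0691787 (2015-04-21)
Registration number	US-9575983 (2017-02-21)
Inventors / Address	Aronovich, Lior; Hirsch, Michael; Toaff, Yair
Applicant / Address	INTERNATIONAL BUSINESS MACHINES CORPORATION
Agent / Address	Griffiths & Seaton PLLC
Citation information	Times cited: 0; Patents cited: 31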

Abstract

Input backup data is deduplicated with data of a synthetic backup previously constructed by a deduplication storage. A synthetic backup is constructed by processing metadata instructions provided by a backup application. Deduplication digests are calculated based on the data of the synthetic backup and the deduplication digests are stored in a digests index. When new backup data is processed, deduplication digests of the new data are calculated and searched in the digests index. A data segment of the synthetic backup is partitioned into fixed sized sub-segments. The calculated digests of sub-segment are aggregated to produce the deduplication digest, and the deduplication digest is formed for the synthetic backup.
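The construction step described in the abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each metadata instruction names an originating backup, an offset and length within it, and a target offset in the synthetic backup, and that the store can map a backup name to a base storage location. The names `Instruction`, `Reference`, `build_synthetic`, and `backup_locations` are illustrative and do not come from the patent. The point shown is that the synthetic backup is assembled from references to already stored data rather than from copied bytes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Instruction:
    """Hypothetical shape of one metadata instruction from the backup application."""
    source_backup: str   # originating backup that holds the data segment
    source_offset: int   # offset of the segment inside the originating backup
    length: int          # segment length in bytes
    target_offset: int   # designated location in the synthetic backup


@dataclass(frozen=True)
class Reference:
    """Hypothetical metadata entry of the synthetic backup: a pointer, not a copy."""
    target_offset: int
    length: int
    storage_location: int


def build_synthetic(instructions: List[Instruction],
                    backup_locations: Dict[str, int]) -> List[Reference]:
    """Resolve each instruction to stored data and record a reference to it."""
    refs = []
    for ins in instructions:
        # Assumption: each stored backup is contiguous from a known base location.
        base = backup_locations[ins.source_backup]
        refs.append(Reference(ins.target_offset, ins.length, base + ins.source_offset))
    # Keep the references ordered by their position in the synthetic backup.
    return sorted(refs, key=lambda r: r.target_offset)


synthetic = build_synthetic(
    [Instruction("incr_tuesday", 1024, 2048, 4096),
     Instruction("full_monday", 0, 4096, 0)],
    {"full_monday": 100_000, "incr_tuesday": 500_000},
)
```

In this sketch the second half of the synthetic backup points into the incremental backup's stored data, so stored digests for that region could later be looked up by storage location instead of being recomputed from the data.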

Representative claims

1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising: constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application; locating stored data segments referenced by the synthetic backup; calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments; partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments; aggregating the calculated digests of the synthetic backup sub-segments; forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup; calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment; calculating a threshold digest value from the retrieved deduplication digests; calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment; arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments; and calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

2. The method of claim 1, further including specifying, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

3. The method of claim 1, further including processing each of the plurality of metadata instructions by each of: locating stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and adding references to the stored data to metadata of the synthetic backup being created.

4. A system for calculating deduplication digests for a synthetic backup by a deduplication storage system, comprising: the deduplication storage system; and at least one processor device, operable in the deduplication computing storage environment, wherein the at least one processor device: constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application, locates stored data segments referenced by the synthetic backup, calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments, partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments, aggregates the calculated digests of the synthetic backup sub-segments, forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup, calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment, calculates a threshold digest value from the retrieved deduplication digests, calculates a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment, arranges digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and calculates the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

5. The system of claim 4, wherein the at least one processor device specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

6. The system of claim 4, wherein the at least one processor device processes each of the plurality of metadata instructions by each of: locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and adding references to the stored data to metadata of the synthetic backup being created.

7. A computer program product calculating deduplication digests for a synthetic backup by a deduplication storage system using at least one processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: a first executable portion that constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application; a second executable portion that locates stored data segments referenced by the synthetic backup; a third executable portion that calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments; a fourth executable portion that partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments; a fifth executable portion that aggregates the calculated digests of the synthetic backup sub-segments; a sixth executable portion that forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup; a seventh executable portion that calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment; and an eighth executable portion that performs each of: calculating a threshold digest value from the retrieved deduplication digests, calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment, arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

8. The computer program product of claim 7, further including a ninth executable portion that specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

9. The computer program product of claim 7, further including a ninth executable portion that processes each of the plurality of metadata instructions by each of: locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and adding references to the stored data to metadata of the synthetic backup being created.
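The per-sub-segment selection recited in claims 1, 4, and 7 (threshold, candidate sub-set, top m, fallback) can be sketched as follows. This is only an illustration under stated assumptions: the claims do not say how the threshold is derived, so the sketch assumes it is the m-th largest retrieved digest value, and it reads m as the required number of digests per sub-segment. `StoredDigest`, `threshold_of`, `sub_segment_digests`, and the `recompute` callback are hypothetical names, and digest locations are expressed as offsets within the synthetic backup.

```python
from typing import Callable, List, NamedTuple, Sequence


class StoredDigest(NamedTuple):
    value: int     # digest value retrieved from a referenced stored sub-segment
    location: int  # hypothetical: offset of the digested data in the synthetic backup


def threshold_of(retrieved: Sequence[StoredDigest], m: int) -> int:
    """Assumed rule (not from the claims): the m-th largest retrieved value."""
    values = sorted((d.value for d in retrieved), reverse=True)
    if not values:
        return 0
    return values[m - 1] if len(values) >= m else values[-1]


def sub_segment_digests(retrieved: Sequence[StoredDigest], start: int, end: int,
                        m: int, recompute: Callable[[int, int], List[int]]) -> List[int]:
    """Pick m digests for the sub-segment [start, end), or recompute from its data."""
    threshold = threshold_of(retrieved, m)
    # Candidate sub-set: value >= threshold and location inside the sub-segment.
    candidates = [d for d in retrieved
                  if d.value >= threshold and start <= d.location < end]
    if len(candidates) >= m:
        # Descending order of digest value, then keep the first m.
        ordered = sorted(candidates, key=lambda d: d.value, reverse=True)
        return [d.value for d in ordered[:m]]
    # Too few reusable digests: fall back to calculating them from the data.
    return recompute(start, end)


retrieved = [StoredDigest(90, 10), StoredDigest(70, 120),
             StoredDigest(85, 40), StoredDigest(60, 55)]


def fallback(start: int, end: int) -> List[int]:
    # Placeholder for reading the sub-segment's data and digesting it.
    return [-1]
```

With these sample values and the sub-segment [0, 100), asking for m = 2 reuses the stored digests 90 and 85, while m = 3 leaves only two candidates inside the boundaries and therefore triggers the fallback.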

Patents cited by this patent (31)

Nakao, Yoshio, Apparatus and method for generating digest according to hierarchical structure of topic.
Yuval Ofek; Zoran Cakeljic; Samuel Krikler IL; Sharon Galtzur IL; Michael Hirsch IL; Dan Arnon; Peter Kamvysselis, Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size.
Parab, Nitin, Cloud synthetic backups.
McCanne, Steven; Demmer, Michael J., Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation.
Laffin, Aaron Wallace, Creating synthetic backup images on a remote computer system.
Anglin, Matthew Joseph, Data deduplication by separating data from meta data.
Farber, David A.; Lachman, Ronald D., De-duplication of data in a data processing system.
Arora, Gurjeet S.; Bassov, Ivan; Faibish, Sorin; Sezer, Ugur, Efficient backup and restore of storage objects in a version set.
Zhu, Ming Benjamin; Li, Kai; Patterson, R. Hugo, Efficient data storage system.
David A. Farber; Ronald D. Lachman, Identifying and requesting data in network using identifiers which are based on contents of data.
Vaikar, Amol Manohar, Method and apparatus for continuous data protection.
McGrattan, Emma K.; Ball, Stephen; Moucaddem, Sami R.; Rivet, Jean-Francois; Kuo, Chin L.; Yang, Frank H., Method and apparatus for data backup using data blocks.
Warnock, Christopher; Abrams, Ken; Holzgrafe, Rick, Method and apparatus for improved information transactions.
Ralph Shnelvar, Method and apparatus for storing information in a data processing system.
Van Ingen, Catharine; Berkowitz, Brian T., Method and system for synthetic backup and restore.
Monga, Vishal, Method for identifying images under distortion via noise characterization and bregman optimal matrix approximations.
Williams Ross Neil, AUX, Method for partitioning a block of data into subblocks and for storing and communcating such subblocks.
Stringham, Russell, Methods and systems for creating full backups.
Doerner, Don, No touch synthetic full backup.
Ohr, James; Zeis, Michael; Elling, Dean; Gipp, Stephan Kurt; DesJardin, William, Optimizing the de-duplication rate for a backup stream.
Efstathopoulos, Petros; Guo, Fanglu; Shah, Dharmesh, Progressive sampling for deduplication indexing.
Zeis, Mike; Wu, Weibao, Source classification for performing deduplication in a backup operation.
Narayanan, Priyesh, Synthetic differential backups creation for a database using binary log conversion.
Niles, Ronald S.; Lam, Wai, System and method for backing up data.
Woodhill James R. (Houston TX); Woodhill Louis R. (Richmond TX); More, Jr. William Russell (Houston TX); Berlin Jay Harris (Houston TX), System and method for distributed storage management on networked computer systems using binary object identifiers.
Wittenberg David K. (Hudson MA); Leichter Jerrold S. (Stamford CT), System for controlling access to a secure system by verifying acceptability of proposed password by using hashing and gr.
Elling, Dean; Laffin, Aaron; Zhang, Xianbo; Zeis, Mike, Systems and methods for creating reference-based synthetic backups.
Hirsch, Michael; Bitner, Haim; Aronovich, Lior; Asher, Ron; Bachmat, Eitan; Klein, Shmuel T., Systems and methods for efficient data searching, storage and reduction.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.
Manson, Carl R., Tracking files which have been processed by a backup or a restore operation.
Jordan, Kevin, Utilizing peer-to-peer services with single instance storage techniques.
