Calculating deduplication digests for a synthetic backup by a deduplication storage system
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-017/30
G06F-011/14
Application number
US-0691787
(2015-04-21)
Registration number
US-9575983
(2017-02-21)
Inventor / Address
Aronovich, Lior
Hirsch, Michael
Toaff, Yair
Applicant / Address
INTERNATIONAL BUSINESS MACHINES CORPORATION
Agent / Address
Griffiths & Seaton PLLC
Citation information
Times cited: 0
Cited patents: 31
Abstract
Input backup data is deduplicated with data of a synthetic backup previously constructed by a deduplication storage. A synthetic backup is constructed by processing metadata instructions provided by a backup application. Deduplication digests are calculated based on the data of the synthetic backup and the deduplication digests are stored in a digests index. When new backup data is processed, deduplication digests of the new data are calculated and searched in the digests index. A data segment of the synthetic backup is partitioned into fixed sized sub-segments. The calculated digests of sub-segment are aggregated to produce the deduplication digest, and the deduplication digest is formed for the synthetic backup.
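The construction step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Instruction` and `Reference` layouts, the `catalog` mapping, and the assumption that each originating backup is stored contiguously are all hypothetical and chosen only to keep the example small. The point it shows is that a synthetic backup is assembled from references to data already in storage, without copying any data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Instruction:
    """One metadata instruction from the backup application (hypothetical layout)."""
    source_backup: str   # originating backup that holds the data segment
    source_offset: int   # where the segment starts inside that backup
    length: int          # segment length in bytes
    target_offset: int   # designated location in the synthetic backup


@dataclass(frozen=True)
class Reference:
    """Pointer from the synthetic backup's metadata to already-stored data."""
    target_offset: int
    length: int
    storage_offset: int


def build_synthetic(instructions: List[Instruction],
                    catalog: Dict[str, int]) -> List[Reference]:
    """Resolve each instruction to a reference; no backup data is copied.

    `catalog` maps a backup name to the storage offset of its first byte.
    Assumption: each originating backup is stored contiguously.
    """
    refs = []
    for ins in instructions:
        base = catalog[ins.source_backup]  # locate the stored data
        refs.append(Reference(ins.target_offset, ins.length,
                              base + ins.source_offset))
    # Order the references by their position in the synthetic backup.
    return sorted(refs, key=lambda r: r.target_offset)
```

For example, two instructions that place 100 bytes of a full backup at offset 0 and 50 bytes of an incremental backup at offset 100 yield two references into existing storage, ordered by target offset.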
Representative claim
1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising:
constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
locating stored data segments referenced by the synthetic backup;
calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments;
partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;
aggregating the calculated digests of the synthetic backup sub-segments;
forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;
calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment;
calculating a threshold digest value from the retrieved deduplication digests;
calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment;
arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments; and
calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

2. The method of claim 1, further including specifying, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

3. The method of claim 1, further including processing each of the plurality of metadata instructions by each of:
locating stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
adding references to the stored data to metadata of the synthetic backup being created.

4. A system for calculating deduplication digests for a synthetic backup by a deduplication storage system, comprising:
the deduplication storage system; and
at least one processor device, operable in the deduplication computing storage environment, wherein the at least one processor device:
constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application,
locates stored data segments referenced by the synthetic backup,
calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments,
partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments,
aggregates the calculated digests of the synthetic backup sub-segments,
forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup,
calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment,
calculates a threshold digest value from the retrieved deduplication digests,
calculates a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
arranges digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
calculates the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

5. The system of claim 4, wherein the at least one processor device specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

6. The system of claim 4, wherein the at least one processor device processes each of the plurality of metadata instructions by each of:
locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
adding references to the stored data to metadata of the synthetic backup being created.

7. A computer program product calculating deduplication digests for a synthetic backup by a deduplication storage system using at least one processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
a second executable portion that locates stored data segments referenced by the synthetic backup;
a third executable portion that calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments;
a fourth executable portion that partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;
a fifth executable portion that aggregates the calculated digests of the synthetic backup sub-segments;
a sixth executable portion that forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;
a seventh executable portion that calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment; and
an eighth executable portion that performs each of:
calculating a threshold digest value from the retrieved deduplication digests,
calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

8. The computer program product of claim 7, further including a ninth executable portion that specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

9. The computer program product of claim 7, further including a ninth executable portion that processes each of the plurality of metadata instructions by each of:
locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
adding references to the stored data to metadata of the synthetic backup being created.
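The digest selection recited in the independent claims (threshold, candidate sub-set, descending sort, first m, fallback to the data) can be sketched as follows. This is an illustrative reading, not the patented implementation. The claims do not say how the threshold is derived; the sketch assumes it is the largest of the per-sub-segment minimum digest values, on the reasoning that every referenced stored sub-segment is then known to have retained all of its digests at or above that value. The names `StoredDigest`, `select_digests`, and `recompute` are hypothetical, and digest locations are assumed to be already expressed as offsets in the synthetic backup.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class StoredDigest:
    value: int     # digest value retrieved from storage
    location: int  # offset of the digested data (assumed: in synthetic-backup terms)


def select_digests(retrieved: Sequence[Sequence[StoredDigest]],
                   lo: int, hi: int, m: int,
                   recompute: Callable[[], List[int]]) -> List[int]:
    """Pick m digests for the synthetic sub-segment covering [lo, hi).

    `retrieved` holds one group of digests per referenced stored sub-segment.
    `recompute` is a hypothetical callback that reads the sub-segment's data
    and calculates its digests directly.
    """
    groups = [g for g in retrieved if g]
    if not groups:
        return recompute()
    # Assumed threshold rule: the largest per-group minimum digest value.
    threshold = max(min(d.value for d in g) for g in groups)
    # Keep digests at or above the threshold that lie inside the sub-segment.
    candidates = [d.value for g in groups for d in g
                  if d.value >= threshold and lo <= d.location < hi]
    candidates.sort(reverse=True)      # descending order of digest values
    if len(candidates) >= m:
        return candidates[:m]          # reuse stored digests, no data read
    return recompute()                 # too few candidates: digest the data
```

The design intent in this reading is that the fallback, which reads the sub-segment's data, runs only when the stored digests cannot supply the required number of values.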
Patents cited by this patent (31)
Nakao, Yoshio, Apparatus and method for generating digest according to hierarchical structure of topic.
Yuval Ofek ; Zoran Cakeljic ; Samuel Krikler IL; Sharon Galtzur IL; Michael Hirsch IL; Dan Arnon ; Peter Kamvysselis, Apparatus and methods for copying, backing up, and restoring data using a backup segment size larger than the storage block size.
McCanne, Steven; Demmer, Michael J., Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation.
McGrattan, Emma K.; Ball, Stephen; Moucaddem, Sami R.; Rivet, Jean-Francois; Kuo, Chin L.; Yang, Frank H., Method and apparatus for data backup using data blocks.
Woodhill James R. (Houston TX) Woodhill Louis R. (Richmond TX) More ; Jr. William Russell (Houston TX) Berlin Jay Harris (Houston TX), System and method for distributed storage management on networked computer systems using binary object identifiers.
Wittenberg David K. (Hudson MA) Leichter Jerrold S. (Stamford CT), System for controlling access to a secure system by verifying acceptability of proposed password by using hashing and gr.
Hirsch, Michael; Bitner, Haim; Aronovich, Lior; Asher, Ron; Bachmat, Eitan; Klein, Shmuel T., Systems and methods for efficient data searching, storage and reduction.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.