Calculating deduplication digests for a synthetic backup by a deduplication storage system
IPC classification information
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC 7th edition): G06F-007/00, G06F-011/14, G06F-017/30
Application number: US-0801785 (2013-03-13)
Patent number: US-9031921 (2015-05-12)
Inventors: Aronovich, Lior; Hirsch, Michael; Toaff, Yair
Applicant: International Business Machines Corporation
Attorney: Griffiths & Seaton PLLC
Citation information
Times cited: 0
Cited patents: 20
Abstract
Input backup data is deduplicated with data of a synthetic backup previously constructed by a deduplication storage system. A synthetic backup is constructed by processing metadata instructions provided by a backup application. Deduplication digests are calculated based on the data of the synthetic backup, and the deduplication digests are stored in a digests index. When new backup data is processed, deduplication digests of the new data are calculated and searched in the digests index, where matching digests of previously constructed synthetic backups are located. Each located matching digest references stored data that is included in the synthetic backup and is similar to the input backup data. Data matches are then found between the input data and the data in the synthetic backup.
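The lookup flow in the abstract (digests of stored backup data kept in a digests index, digests of new input searched in that index to find similar stored data) can be illustrated with a minimal Python sketch. It is hypothetical, not the patented implementation: the patent text here does not specify the hash function or index structure, so the sketch assumes that a segment's similarity digests are the M largest hash values over fixed-size sliding windows (hashed with BLAKE2b). The names WINDOW, M, DigestsIndex and find_similar are illustrative only.

```python
import hashlib
from collections import defaultdict

WINDOW = 16  # hypothetical sliding-window length, in bytes
M = 4        # hypothetical number of similarity digests kept per segment


def window_hashes(data: bytes):
    """Yield (hash value, offset) for every WINDOW-byte window of data."""
    for i in range(len(data) - WINDOW + 1):
        digest = hashlib.blake2b(data[i:i + WINDOW], digest_size=8).digest()
        yield int.from_bytes(digest, "big"), i


def segment_digests(data: bytes, m: int = M):
    """Assumed similarity digests: the m largest window hash values."""
    return sorted(window_hashes(data), reverse=True)[:m]


class DigestsIndex:
    """Hypothetical digests index: digest value -> [(backup id, offset), ...]."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, backup_id: str, data: bytes):
        for value, offset in segment_digests(data):
            self._entries[value].append((backup_id, offset))

    def find_similar(self, data: bytes):
        """Count, per stored backup, how many input digests match the index."""
        votes = defaultdict(int)
        for value, _input_offset in segment_digests(data):
            for backup_id, _stored_offset in self._entries.get(value, []):
                votes[backup_id] += 1
        return dict(votes)


if __name__ == "__main__":
    data_a = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(8))
    data_b = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(100, 108))
    index = DigestsIndex()
    index.add("synthetic-A", data_a)
    index.add("synthetic-B", data_b)
    # Input that differs from data_a only in its first byte.
    changed = bytes([data_a[0] ^ 0xFF]) + data_a[1:]
    print(index.find_similar(changed))
```

Counting matches per stored backup is just one simple way to pick the most similar candidate; the abstract only states that matching digests are located, so the voting step is itself an assumption.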
Claims
1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising:
  constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
  locating stored data segments referenced by the synthetic backup;
  calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments;
  partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments; and
  calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment, including:
    calculating a threshold digest value from the retrieved deduplication digests,
    calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
    arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
    calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

2. The method of claim 1, further including specifying, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

3. The method of claim 1, further including processing each of the plurality of metadata instructions by each of:
  locating stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
  adding references to the stored data to metadata of the synthetic backup being created.

4. The method of claim 1, further including performing each of:
  aggregating the calculated digests of the synthetic backup sub-segment to produce the deduplication digest of the calculated digests enclosing data segment, and
  forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup.

5. A system for calculating deduplication digests for a synthetic backup by a deduplication storage system, comprising:
  the deduplication storage system; and
  at least one processor device, operable in the deduplication computing storage environment, wherein the at least one processor device:
    constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application,
    locates stored data segments referenced by the synthetic backup,
    calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments,
    partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments, and
    calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment, including:
      calculating a threshold digest value from the retrieved deduplication digests,
      calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
      arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
      calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

6. The system of claim 5, wherein the at least one processor device specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

7. The system of claim 5, wherein the at least one processor device processes each of the plurality of metadata instructions by each of:
  locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
  adding references to the stored data to metadata of the synthetic backup being created.

8. The system of claim 5, wherein the at least one processor device performs each of:
  aggregating the calculated digests of the synthetic backup sub-segment to produce the deduplication digest of the calculated digests enclosing data segment, and
  forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup.

9. A computer program product calculating deduplication digests for a synthetic backup by a deduplication storage system using at least one processor device, the computer program product comprising a computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
  a first executable portion that constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
  a second executable portion that locates stored data segments referenced by the synthetic backup;
  a third executable portion that calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments;
  a fourth executable portion that partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments; and
  a sixth executable portion that calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment, including:
    calculating a threshold digest value from the retrieved deduplication digests,
    calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
    arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
    calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.

10. The computer program product of claim 9, further including a seventh executable portion that specifies, by each one of the plurality of metadata instructions provided by the backup application, a data segment of an originating backup and a designated location of the data segment in the synthetic backup being created.

11. The computer program product of claim 9, further including a seventh executable portion that processes each of the plurality of metadata instructions by each of:
  locating the stored data in the deduplication storage system specified by a data segment in the plurality of metadata instructions, and
  adding references to the stored data to metadata of the synthetic backup being created.

12. The computer program product of claim 9, further including a seventh executable portion that performs each of:
  aggregating the calculated digests of the synthetic backup sub-segment to produce the deduplication digest of the calculated digests enclosing data segment, and
  forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup.
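The per-sub-segment procedure of claim 1 (threshold, candidate filtering by value and location, descending sort, first m digests, fallback to hashing the data) and the aggregation of claim 4 can be sketched in Python as below. This is a minimal, hypothetical illustration rather than the patented implementation. The claims do not say how the threshold is calculated, so the sketch assumes each stored sub-segment keeps its m largest window hashes and takes the threshold to be the largest of those per-sub-segment minimums; under that assumption, any referenced window whose hash is at or above the threshold must appear among the retrieved digests. The names Reference, derive_subsegment_digests, aggregate and compute_from_data, the (value, offset) digest layout, and concatenation as the aggregation rule are all assumptions. Windows that straddle two references are ignored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Digest = Tuple[int, int]  # (digest value, byte offset of its window); assumed layout


@dataclass
class Reference:
    """Hypothetical reference from a synthetic sub-segment to stored data."""
    stored_id: str        # stored fixed-sized sub-segment being referenced
    stored_start: int     # first referenced byte within that stored sub-segment
    length: int           # number of referenced bytes
    synthetic_start: int  # where those bytes begin within the synthetic sub-segment


def derive_subsegment_digests(
    refs: List[Reference],
    stored_digests: Dict[str, List[Digest]],
    m: int,
    window: int,
    sub_size: int,
    compute_from_data: Callable[[], List[Digest]],
) -> List[Digest]:
    """Derive a synthetic sub-segment's m digests from retrieved stored digests."""
    # Assumed threshold: the largest of the per-stored-sub-segment minimum digests
    # (requires at least one reference with a non-empty digest list).
    threshold = max(min(value for value, _ in stored_digests[r.stored_id]) for r in refs)
    candidates: List[Digest] = []
    for ref in refs:
        for value, offset in stored_digests[ref.stored_id]:
            rel = offset - ref.stored_start
            synthetic_offset = ref.synthetic_start + rel
            # Keep a digest only if its whole window lies in the referenced range
            # and inside the synthetic sub-segment's boundaries.
            in_reference = 0 <= rel and rel + window <= ref.length
            in_sub_segment = 0 <= synthetic_offset and synthetic_offset + window <= sub_size
            if value >= threshold and in_reference and in_sub_segment:
                candidates.append((value, synthetic_offset))
    if len(candidates) >= m:
        # Enough candidates: arrange in descending order and keep the first m.
        return sorted(candidates, reverse=True)[:m]
    # Too few candidates: fall back to calculating digests from the data itself.
    return compute_from_data()


def aggregate(sub_segment_digests: List[List[Digest]], sub_size: int) -> List[Digest]:
    """Hypothetical claim-4 aggregation: shift each sub-segment's offsets to
    segment-relative offsets and concatenate them into the segment's digests."""
    segment: List[Digest] = []
    for index, digests in enumerate(sub_segment_digests):
        segment.extend((value, index * sub_size + offset) for value, offset in digests)
    return segment


if __name__ == "__main__":
    stored = {"A": [(90, 0), (70, 8), (50, 20)], "B": [(95, 4), (60, 12), (40, 30)]}
    refs = [Reference("A", 0, 16, 0), Reference("B", 0, 16, 16)]
    print(derive_subsegment_digests(refs, stored, 3, 4, 32, lambda: []))
    # -> [(95, 20), (90, 0), (70, 8)]
```

In this usage example the threshold is 50, so (50, 20) from "A" is dropped because its window falls outside the referenced bytes, and (40, 30) from "B" is dropped because it is below the threshold. Four candidates remain, so no data has to be read; with m raised to 5 the sketch would instead call compute_from_data, mirroring the last step of claim 1.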
Patents cited by this patent (20)
Nakao, Yoshio, Apparatus and method for generating digest according to hierarchical structure of topic.
McGrattan, Emma K.; Ball, Stephen; Moucaddem, Sami R.; Rivet, Jean-Francois; Kuo, Chin L.; Yang, Frank H., Method and apparatus for data backup using data blocks.
Woodhill, James R.; Woodhill, Louis R.; More, Jr., William Russell; Berlin, Jay Harris, System and method for distributed storage management on networked computer systems using binary object identifiers.
Wittenberg, David K.; Leichter, Jerrold S., System for controlling access to a secure system by verifying acceptability of proposed password by using hashing and gr.