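IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0855514 (2013-04-02)
Registration number | US-8725687 (2014-05-13)
Inventor / Address |
Applicant / Address |
Agent / Address |
Citation information | Times cited: 12; Cited patents: 196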
Abstract
Described in detail herein are systems and methods for deduplicating data using byte-level or quasi byte-level techniques. In some embodiments, a file is divided into multiple blocks. A block includes multiple bytes. Multiple rolling hashes of the file are generated. For each byte in the file, a searchable data structure is accessed to determine if the data structure already includes an entry matching a hash of a minimum sequence length. If so, this indicates that the corresponding bytes are already stored. If one or more bytes in the file are already stored, then the one or more bytes in the file are replaced with a reference to the already stored bytes. The systems and methods described herein may be used for file systems, databases, storing backup data, or any other use case where it may be useful to reduce the amount of data being stored.
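The lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a polynomial (Rabin-Karp style) rolling hash, a plain dictionary as the "searchable data structure", and a single hypothetical minimum sequence length `MIN_LEN`. The names `window_hashes`, `build_index`, `deduplicate`, and `restore` are invented for this example.

```python
from typing import Dict, Iterator, List, Tuple, Union

MIN_LEN = 4           # hypothetical minimum sequence length (assumption)
BASE = 257            # hash base (assumption)
MOD = (1 << 61) - 1   # large prime modulus (assumption)

# A token is either literal bytes or an (offset, length) reference into stored data.
Token = Union[bytes, Tuple[int, int]]


def window_hashes(data: bytes, n: int) -> Iterator[Tuple[int, int]]:
    """Yield (position, hash) for every n-byte window, updating the hash in O(1) per step."""
    if len(data) < n:
        return
    top = pow(BASE, n - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for b in data[:n]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(1, len(data) - n + 1):
        h = (h - data[i - 1] * top) % MOD          # drop the outgoing byte
        h = (h * BASE + data[i + n - 1]) % MOD     # append the incoming byte
        yield i, h


def build_index(stored: bytes) -> Dict[int, int]:
    """Map each window hash of the stored data to the first offset where it occurs."""
    index: Dict[int, int] = {}
    for pos, h in window_hashes(stored, MIN_LEN):
        index.setdefault(h, pos)
    return index


def deduplicate(new: bytes, stored: bytes, index: Dict[int, int]) -> List[Token]:
    """Replace byte runs already present in `stored` with (offset, length) references."""
    hashes = dict(window_hashes(new, MIN_LEN))
    tokens: List[Token] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        off = index.get(hashes.get(i))
        # Verify the bytes themselves, since equal hashes do not guarantee equal data.
        if off is not None and stored[off:off + MIN_LEN] == new[i:i + MIN_LEN]:
            length = MIN_LEN
            while (i + length < len(new) and off + length < len(stored)
                   and new[i + length] == stored[off + length]):
                length += 1  # extend the match byte by byte
            if literal:
                tokens.append(bytes(literal))
                literal.clear()
            tokens.append((off, length))
            i += length
        else:
            literal.append(new[i])
            i += 1
    if literal:
        tokens.append(bytes(literal))
    return tokens


def restore(tokens: List[Token], stored: bytes) -> bytes:
    """Rebuild the original bytes from literals and references."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, bytes):
            out += t
        else:
            off, length = t
            out += stored[off:off + length]
    return bytes(out)


stored = b"the quick brown fox"
new = b"a quick brown dog"
tokens = deduplicate(new, stored, build_index(stored))
```

In this sketch, `tokens` holds the literal bytes that were not found plus one reference covering the shared run, and `restore(tokens, stored)` reproduces `new`. A fuller system would presumably also add newly stored bytes to the index; that step is omitted here.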
Representative Claims
1. An apparatus for deduplicating data among one or more storage devices coupled via a network, wherein the network also couples to one or more computing systems, and wherein the one or more storage devices include a searchable data structure and a first set of data, the apparatus comprising: a computing device having at least one processor and at least one memory, wherein the computing device is configured to: receive a second set of data; divide the second set of data into at least one block, wherein the block includes a total number of bytes; access the searchable data structure; determine whether one or more bytes of the block are included in a portion of the first set of data in the searchable data structure, wherein the number of the one or more bytes is less than the total number of bytes of the block; replace the one or more bytes with a reference to the portion of the second set of data if the one or more bytes of the block are included in the portion of the first set of data in the searchable data structure; and cause the block to be stored using the one or more storage devices.
2. The apparatus of claim 1 wherein the computing device is further configured to generate powers of 2 rolling hashes for the second set of data.
3. The apparatus of claim 1 wherein the searchable data structure includes a hierarchical data structure that includes multiple nodes, and wherein a first node can reference data in any other node excepting nodes that are descendants of the first node.
4. The apparatus of claim 1 wherein the computing device is further configured to compress the second set of data.
5. At least one tangible computer-readable medium storing instructions, which when executed by at least one computing system, processes data, wherein the computing system includes at least one processor, and memory communicatively coupled to the at least one processor, comprising: receiving a file, wherein the file includes multiple bytes; accessing at least some of multiple blocks of data in a data structure: wherein the multiple blocks of data have a first size, wherein the multiple blocks of data represent a set of data having a second size that is greater than the first size, wherein a block of data is associated with multiple first identifiers, and wherein the multiple blocks of data are identified in the data structure by the associated multiple first identifiers; based at least partly upon accessing of the data structure, determining, by the computing system, whether one or more of the multiple bytes are already stored, wherein the number of the one or more bytes is less than a number of bytes in a block of data; and causing bytes that are not already stored using a storage device to be stored.
6. The at least one tangible computer-readable medium of claim 5, further comprising generating multiple second identifiers for the file, wherein the multiple second identifiers for the file include powers of 2 rolling hashes.
7. The at least one tangible computer-readable medium of claim 5, further comprising generating multiple second identifiers for the file, wherein the multiple second identifiers include powers of 2 rolling hashes, and wherein determining whether the one or more of the multiple bytes are already stored includes comparing the multiple second identifiers for the file with the multiple first identifiers associated with the multiple blocks of data.
8. The at least one tangible computer-readable medium of claim 5 wherein the data structure includes a hierarchical data structure that includes multiple nodes, and wherein a first node can reference data in any other node excepting nodes that are descendants of the first node.
9. The at least one tangible computer-readable medium of claim 5 wherein at least some of the multiple blocks of data are compressed.
10. At least one tangible computer-readable medium carrying instructions for managing data by at least one data processor, comprising: dividing a first set of data into at least one block, wherein the block includes a total number of bytes; accessing a searchable data structure, wherein the searchable data structure includes a second set of data; determining whether one or more bytes of the block are included in a portion of the second set of data in the searchable data structure, wherein the number of the one or more bytes is less than the total number of bytes of the block; replacing the one or more bytes with a reference to the portion of the second set of data, wherein the replacing includes replacing the one or more bytes with a reference to the portion of the second set of data if the one or more bytes of the block are included in the portion of the second set of data in the searchable data structure; and causing the block to be stored.
11. The at least one tangible computer-readable medium of claim 10, further comprising generating powers of 2 rolling hashes.
12. The at least one tangible computer-readable medium of claim 10, wherein the searchable data structure includes a hierarchical data structure that includes multiple nodes, and wherein a first node can reference data in any other node excepting nodes that are descendants of the first node.
13. The at least one tangible computer-readable medium of claim 10, further comprising compressing data.
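The referencing rule recited for the hierarchical data structure (a first node can reference data in any other node except its own descendants) can be illustrated with a small check. This is only a sketch under assumptions: the claims do not specify how nodes are represented, so the `Node` class with parent pointers and the helper names `is_descendant` and `may_reference` are hypothetical.

```python
from typing import List, Optional


class Node:
    """Hypothetical tree node; parent pointers make ancestry checks simple."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def is_descendant(candidate: Node, ancestor: Node) -> bool:
    """Return True if `candidate` lies anywhere below `ancestor` in the tree."""
    node = candidate.parent
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def may_reference(source: Node, target: Node) -> bool:
    """A node may reference any *other* node that is not one of its descendants."""
    return source is not target and not is_descendant(target, source)


root = Node("root")
a = root.add_child("a")
b = root.add_child("b")
a1 = a.add_child("a1")
```

With this tree, `a1` may reference `a`, `b`, or `root`, and `a` may reference `b`, but `a` may not reference its descendant `a1`, and `root` may not reference any node below it.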