Probabilistic summary data structure based encoding for garbage collection
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-012/00
G06F-017/30
G06F-007/00
G06F-017/00
출원번호
US-0611237
(2003-06-30)
등록번호
US-7424498
(2008-09-09)
발명자
/ 주소
Patterson,R. Hugo
출원인 / 주소
Data Domain, Inc.
대리인 / 주소
Blakely, Sokoloff, Taylor & Zafman LLP
인용정보
피인용 횟수 :
61인용 특허 :
15
초록▼
A method and apparatus for different embodiments of probabilistic summary data structure based encoding for garbage collection are described. In one embodiment, a method comprises generating a probabilistic summary data structure that represents active blocks of data within a storage device based on
A method and apparatus for different embodiments of probabilistic summary data structure based encoding for garbage collection are described. In one embodiment, a method comprises generating a probabilistic summary data structure that represents active blocks of data within a storage device based on identifications of the active blocks or the data within the active blocks. The method also includes performing garbage collection of at least a portion of the storage device based on the probabilistic summary data structure.
대표청구항▼
What is claimed is: 1. A method comprising: generating a probabilistic summary data structure based on a Bloom filter that represents active blocks of data within a storage device that are referenced by other blocks of data based on identification of the active blocks or the data within the active
What is claimed is: 1. A method comprising: generating a probabilistic summary data structure based on a Bloom filter that represents active blocks of data within a storage device that are referenced by other blocks of data based on identification of the active blocks or the data within the active blocks; and performing garbage collection of at least a portion of the storage device based on the probabilistic data structure by performing the following for each block of data within a space to be cleaned: applying a probabilistic algorithm on the identification of the block of data or the data within the block of data, the applying resulting in a bit vector for each block of data or the data within the block of data; and comparing the resulting bit vector with the probabilistic data structure to determine if the block of data is active. 2. The method of claim 1, wherein generating the probabilistic summary data structure that represents active blocks of data within the storage device comprises: generating a number of hashes based on the identifications of the active blocks or the data within the active blocks; and encoding the number of hashes for the active blocks within the storage device into the probabilistic summary data structure. 3. The method of claim 2, wherein the identifications of the active blocks include a hash of the data within the active blocks. 4. The method of claim 3, wherein the generating of the number of hashes comprises selecting a bit range of the hash of the data within the active block. 5. The method of claim 2, wherein generating a hash based on an identification of the active block or the data within the active block within the storage device comprises: generating a first hash based on the identification of the active block or the data within the active block; and generating different hashes based on the first hash for the active block. 6. The method of claim 5, wherein generating different hashes based on the first hash for the active block includes selecting different bits within the first hash. 7. The method of claim 1, wherein a size of the encoded value fits within memory of a computing device that is coupled to the storage device. 8. The method of claim 1, wherein performing garbage collection of at least a portion of the storage device based on the probabilistic summary data structure includes copying the active blocks to an unallocated address space within the storage device based on the determined active blocks. 9. The method of claim 1, wherein the probabilistic summary data structure indicates that at least one inactive block is an active block. 10. The method of claim 1, wherein the blocks stored in the storage device that are marked as allocated are non-modifiable. 11. A method comprising: for each referenced block of data with at least a portion of an allocated address space in a storage device, performing the following: generating a number of hash values based on hashes of an identification of a block or the data in the block within the storage device, and setting bits within a probabilistic summary data structure, the probabilistic summary data structure having been generated based on a Bloom filter, at offsets equal to each of the number of hash values, wherein a size of the probabilistic summary data structure fits within memory of a computing device that accesses the storage device; and reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure. 12. The method of claim 11, wherein reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure comprises: performing the following operations until an identification of each allocated block stored in a table has been processed, the operations including: generating a number of hash values based on hashes of an identification of a block, and determining whether the block is referenced based on the bit values within the summary data structure that are at offsets equal to the hash values. 13. The method of claim 12, wherein the performing the following operations until the identification of each allocated block stored in the table has been processed includes, marking a block as unallocated when at least one of the bits is not set within the summary data structure at the offsets equaling the hash values for the identification of the block. 14. The method of claim 12, wherein performing the following operations until the identification of each allocated block stored in the table has been processed, the operations including: copying a block that is referenced to an unallocated address space in the storage device, upon determining, for each of the number of hash values, that a bit within the bit vector at an offset equal to a hash value is set, and marking the at least a portion of the allocated address space as unallocated. 15. A method comprising: generating a first encoded value based on a first Bloom filter, the first encoded value representative of blocks of data within an allocated address space to be cleaned within a storage device; locating blocks of data that are currently references be at least one other block of data and are within an allocated address space to be cleaned based on the first encoded value; generating a second encoded value based on a second Bloom filter, the second encoded value representative of the blocks of data that are currently referenced and within the allocated address space to be cleaned; and reclaiming at least a portion of the allocated address space based on the first encoded value and the second encoded value; wherein the generating the second encoded value includes performing the following for each located block: applying the first Bloom filter on the located block, and comparing the result of applying the first Bloom filter on the located block to the first encoded value, if there is a match, performing the following, applying the second Bloom filter on the located block, and storing the result of applying the second Bloom filter on the located block in the second encoded value. 16. The method of claim 15 wherein the reclaiming includes copying the blocks that are currently referenced by at least one reference source and within the allocated address space of the storage device to an unallocated address space of the storage device. 17. The method of claim 16 wherein the reclaiming includes marking the allocated address space of the storage device as unallocated address space. 18. The method of claim 15, wherein the second encoded value fits within local memory within a computing device that accesses the storage device. 19. A system comprising: a storage device having stored therein a number of blocks of data, wherein some of the blocks of data are active and some might be inactive; a memory having stored therein a probabilistic summary data structure that represents the blocks of data that are active and within space to be cleaned within the storage device; and a garbage collection logic that generated the probabilistic summary data structure based on a Bloom filter and performs garbage collection on the space through comparison of the probabilistic summary data structure to a bit vector generated for each block within the space, wherein each bit vector is generated by the garbage collection logic from an application of the Bloom filter onto an identification of the blocks of data or the data within the blocks of data. 20. The system of claim 19, wherein the garbage collection logic is to copy the blocks of data that are currently referenced and within the allocated address space to be cleaned within the storage device to an unallocated address space of the storage device. 21. The system of claim 20, wherein the garbage collection logic is to mark the allocated address space to be cleaned within the storage device as unallocated address space. 22. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: generating a probabilistic summary data structure based on a Bloom filter that represents active blocks of data within a storage device that are referenced by other blocks of data based on identification of the active blocks or the data within the active blocks; and performing garbage collection of at least a portion of the storage device based on the probabilistic data structure by performing the following for each block of data within a space to be cleaned: applying a probabilistic algorithm on the identification of the block of data or the data within the block of data, the application resulting in a bit vector for each block of data or the data within the block of data; and comparing the resulting bit vector with the probabilistic data structure to determine if the block of data is active. 23. The machine storage medium of claim 22, wherein generating the probabilistic summary data structure that represents active blocks of data within the storage device comprises: generating a number of hashes based on the identifications of the active blocks or the data within the active blocks; and encoding the number of hashes for the active blocks within the storage device into the probabilistic summary data structure. 24. The machine storage medium of claim 23, wherein the identifications of the active blocks include a hash of the data within the active blocks. 25. The machined storage medium of claim 23, wherein the generating of the number of hashes comprises selecting a bit range of the hash of the data within the active block. 26. The machine storage medium of claim 23, wherein generating a hash based on an identification of the active block or the data within the active block within the storage device comprises: generating a first hash based on the identification of the active block or the data within the active block; and generating different hashes based on the first hash for the active block. 27. The machine storage medium of claim 22, wherein the probabilistic summary data structure indicates that at least one inactive block is an active block. 28. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: for each referenced block of data with at least a portion of an allocated address space in a storage device, performing the following: generating a number of hash values based on hashes of an identification of a block or the data in the block within the storage device, and setting bits within a probabilistic summary data structure, the probabilistic summary data structure having been generated based on a Bloom filter, at offsets equal to each of the number of hash values, wherein a size of the probabilistic summary data structure fits within memory of a computing device that accesses the storage device; and reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure. 29. The machine storage medium of claim 28, wherein reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure includes: performing the following operations until an identification of each allocated block stored in a table has been processed: generating a number of hash values based on hashes of an identification of a block, and determining whether the block is referenced based on the bit values within the summary data structure that are at offsets equal to the hash values. 30. The machine storage medium of claim 29, wherein performing the following operations until the identification of each allocated block stored in the table has been processed comprises: marking a block as unallocated when at least one of the bits is not set within the summary data structure at the offsets equaling the hash values for the identification of the block. 31. The machine storage medium of claim 29, wherein performing the following operations until the identification of each allocated block stored in the table has been processed, the operations comprising: copying a block that is referenced to an unallocated address space in the storage device, upon determining, for each of the number of hash values, that a bit within a bit vector at an offset equal to a hash value is set; and marking the at least a portion of the allocated address space as unallocated. 32. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: generating a first encoded value based on a first Bloom filter, the first encoded value representative of blocks of data within an allocated address space to be cleaned within a storage device; locating blocks of data that are currently references be at least one other block of data and are within an allocated address space to be cleaned based on the first encoded value; generating a second encoded value based on a second Bloom filter, the second encoded value representative of the blocks of data that are currently referenced and within the allocated address space to be cleaned; and reclaiming at least a portion of the allocated address space based on the first encoded value and the second encoded value; wherein the generating the second encoded value includes performing the following for each located block: applying the first Bloom filter on the located block, and comparing the result of applying the first Bloom filter on the located block to the first encoded value, if there is a match, performing the following, applying the second Bloom filter on the located block, and storing the result of applying the second Bloom filter on the located block in the second encoded value. 33. The machine storage medium of claim 32 comprising copying the blocks that are currently referenced by at least one reference source and within the allocated address space of the storage device to an unallocated address space of the storage device.
Gladney Henry M. (Saratoga CA) Lorch Douglas J. (San Jose CA) Mattson Richard L. (San Jose CA), Communication for version management in a distributed information service.
Davis, John D.; Hayes, John; Kannan, Hari; Miladinovic, Nenad; Tan, Zhangxi, Data rebuild on feedback from a queue in a non-volatile solid-state storage.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Efficient read/write algorithms and associated mapping for block-level data reduction processes.
Watkins, Kathryn R.; McWhorter, Michael; Hill, William H.; Long, Jeffrey W.; Shrauder, Christian, Method and apparatus for transferring and reconstructing an image of a computer readable medium.
Stuart, Alan L.; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program for implementing retention policies to archive records.
Stuart, Alan; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program implementing retention policies to archive records.
Hayes, John; Gupta, Shantanu; Davis, John; Gold, Brian; Tan, Zhangxi, Nonrepeating identifiers in an address space of a non-volatile solid-state storage.
Raizen, Helen S.; Camp, Jeffrey; Bappe, Michael E., Selective I/O to logical unit when encrypted, but key is not available or when encryption status is unknown.
Raizen, Helen S.; Freund, David W.; Harwood, John; Bappe, Michael E., Systems and methods for accessing storage or network based replicas of encrypted volumes with no additional key management.
Raizen, Helen S.; Harwood, John; Bappe, Michael E.; Kothandan, Sathiyamoorthy; Epstein, Edith, Systems and methods for selective encryption of operating system metadata for host-based encryption of data at rest on a logical unit.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.