[Patent] Probabilistic summary data structure based encoding for garbage collection


IPC classification information
Country/Type	United States (US) Patent, granted
International Patent Classification (IPC, 7th ed.)	G06F-012/00 G06F-017/30 G06F-007/00 G06F-017/00
Application number	US-0611237 (2003-06-30)
Patent number	US-7424498 (2008-09-09)
Inventor / Address	Patterson, R. Hugo
Applicant / Address	Data Domain, Inc.
Attorney / Address	Blakely, Sokoloff, Taylor & Zafman LLP
Citation information	Cited by: 61	Cites: 15

Abstract

A method and apparatus for different embodiments of probabilistic summary data structure based encoding for garbage collection are described. In one embodiment, a method comprises generating a probabilistic summary data structure that represents active blocks of data within a storage device based on identifications of the active blocks or the data within the active blocks. The method also includes performing garbage collection of at least a portion of the storage device based on the probabilistic summary data structure.
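The two steps in the abstract (summarize the active blocks in a probabilistic structure, then clean a region by consulting only that summary) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Data Domain's implementation: blocks are identified by the SHA-1 hash of their contents, the k Bloom filter hash functions are taken as k disjoint 32-bit ranges of that one digest, the filter size is arbitrary, and the names `BloomFilter`, `fingerprint`, and `collect` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """An m-bit Bloom filter used as a probabilistic summary of live blocks."""

    def __init__(self, m_bits: int, k: int) -> None:
        # Assumption: each of the k offsets comes from its own 32-bit slice
        # of a 160-bit SHA-1 digest, so at most 5 slices are available.
        if not 1 <= k <= 5:
            raise ValueError("k must be 1..5 when slicing a SHA-1 digest")
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _offsets(self, fp: bytes) -> Iterator[int]:
        # Select k disjoint 4-byte ranges of one hash instead of computing k hashes.
        for i in range(self.k):
            yield int.from_bytes(fp[4 * i : 4 * i + 4], "big") % self.m

    def add(self, fp: bytes) -> None:
        for off in self._offsets(fp):
            self.bits[off // 8] |= 1 << (off % 8)

    def might_contain(self, fp: bytes) -> bool:
        # False: definitely not live. True: live, or a false positive.
        return all(self.bits[off // 8] >> (off % 8) & 1 for off in self._offsets(fp))


def fingerprint(block: bytes) -> bytes:
    """Hypothetical block identifier: the SHA-1 digest of the block's contents."""
    return hashlib.sha1(block).digest()


def collect(space: Iterable[bytes], live_fps: Iterable[bytes],
            m_bits: int = 1 << 16, k: int = 4) -> List[bytes]:
    """Summarize live blocks, then keep only blocks in `space` that test positive."""
    summary = BloomFilter(m_bits, k)
    for fp in live_fps:          # mark: encode every live block's fingerprint
        summary.add(fp)
    survivors = []               # stands in for copying to unallocated space
    for block in space:          # sweep: test each block in the space to be cleaned
        if summary.might_contain(fingerprint(block)):
            survivors.append(block)
    return survivors             # the cleaned space can now be marked unallocated


if __name__ == "__main__":
    space = [b"alpha", b"beta", b"gamma"]
    live = [fingerprint(b"alpha"), fingerprint(b"gamma")]
    print(collect(space, live))
```

Because a Bloom filter has no false negatives, every live block survives the sweep; a dead block whose bits happen to collide with live ones is also kept until a later pass. That trade-off is what lets the summary stay small enough to hold in memory.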

Representative claims

What is claimed is:

1. A method comprising: generating a probabilistic summary data structure based on a Bloom filter that represents active blocks of data within a storage device that are referenced by other blocks of data based on identification of the active blocks or the data within the active blocks; and performing garbage collection of at least a portion of the storage device based on the probabilistic data structure by performing the following for each block of data within a space to be cleaned: applying a probabilistic algorithm on the identification of the block of data or the data within the block of data, the applying resulting in a bit vector for each block of data or the data within the block of data; and comparing the resulting bit vector with the probabilistic data structure to determine if the block of data is active.

2. The method of claim 1, wherein generating the probabilistic summary data structure that represents active blocks of data within the storage device comprises: generating a number of hashes based on the identifications of the active blocks or the data within the active blocks; and encoding the number of hashes for the active blocks within the storage device into the probabilistic summary data structure.

3. The method of claim 2, wherein the identifications of the active blocks include a hash of the data within the active blocks.

4. The method of claim 3, wherein the generating of the number of hashes comprises selecting a bit range of the hash of the data within the active block.

5. The method of claim 2, wherein generating a hash based on an identification of the active block or the data within the active block within the storage device comprises: generating a first hash based on the identification of the active block or the data within the active block; and generating different hashes based on the first hash for the active block.

6. The method of claim 5, wherein generating different hashes based on the first hash for the active block includes selecting different bits within the first hash.

7. The method of claim 1, wherein a size of the encoded value fits within memory of a computing device that is coupled to the storage device.

8. The method of claim 1, wherein performing garbage collection of at least a portion of the storage device based on the probabilistic summary data structure includes copying the active blocks to an unallocated address space within the storage device based on the determined active blocks.

9. The method of claim 1, wherein the probabilistic summary data structure indicates that at least one inactive block is an active block.

10. The method of claim 1, wherein the blocks stored in the storage device that are marked as allocated are non-modifiable.

11. A method comprising: for each referenced block of data with at least a portion of an allocated address space in a storage device, performing the following: generating a number of hash values based on hashes of an identification of a block or the data in the block within the storage device, and setting bits within a probabilistic summary data structure, the probabilistic summary data structure having been generated based on a Bloom filter, at offsets equal to each of the number of hash values, wherein a size of the probabilistic summary data structure fits within memory of a computing device that accesses the storage device; and reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure.

12. The method of claim 11, wherein reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure comprises: performing the following operations until an identification of each allocated block stored in a table has been processed, the operations including: generating a number of hash values based on hashes of an identification of a block, and determining whether the block is referenced based on the bit values within the summary data structure that are at offsets equal to the hash values.

13. The method of claim 12, wherein the performing the following operations until the identification of each allocated block stored in the table has been processed includes, marking a block as unallocated when at least one of the bits is not set within the summary data structure at the offsets equaling the hash values for the identification of the block.

14. The method of claim 12, wherein performing the following operations until the identification of each allocated block stored in the table has been processed, the operations including: copying a block that is referenced to an unallocated address space in the storage device, upon determining, for each of the number of hash values, that a bit within the bit vector at an offset equal to a hash value is set, and marking the at least a portion of the allocated address space as unallocated.

15. A method comprising: generating a first encoded value based on a first Bloom filter, the first encoded value representative of blocks of data within an allocated address space to be cleaned within a storage device; locating blocks of data that are currently referenced by at least one other block of data and are within an allocated address space to be cleaned based on the first encoded value; generating a second encoded value based on a second Bloom filter, the second encoded value representative of the blocks of data that are currently referenced and within the allocated address space to be cleaned; and reclaiming at least a portion of the allocated address space based on the first encoded value and the second encoded value; wherein the generating the second encoded value includes performing the following for each located block: applying the first Bloom filter on the located block, and comparing the result of applying the first Bloom filter on the located block to the first encoded value, if there is a match, performing the following, applying the second Bloom filter on the located block, and storing the result of applying the second Bloom filter on the located block in the second encoded value.

16. The method of claim 15 wherein the reclaiming includes copying the blocks that are currently referenced by at least one reference source and within the allocated address space of the storage device to an unallocated address space of the storage device.

17. The method of claim 16 wherein the reclaiming includes marking the allocated address space of the storage device as unallocated address space.

18. The method of claim 15, wherein the second encoded value fits within local memory within a computing device that accesses the storage device.

19. A system comprising: a storage device having stored therein a number of blocks of data, wherein some of the blocks of data are active and some might be inactive; a memory having stored therein a probabilistic summary data structure that represents the blocks of data that are active and within space to be cleaned within the storage device; and a garbage collection logic that generated the probabilistic summary data structure based on a Bloom filter and performs garbage collection on the space through comparison of the probabilistic summary data structure to a bit vector generated for each block within the space, wherein each bit vector is generated by the garbage collection logic from an application of the Bloom filter onto an identification of the blocks of data or the data within the blocks of data.

20. The system of claim 19, wherein the garbage collection logic is to copy the blocks of data that are currently referenced and within the allocated address space to be cleaned within the storage device to an unallocated address space of the storage device.

21. The system of claim 20, wherein the garbage collection logic is to mark the allocated address space to be cleaned within the storage device as unallocated address space.

22. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: generating a probabilistic summary data structure based on a Bloom filter that represents active blocks of data within a storage device that are referenced by other blocks of data based on identification of the active blocks or the data within the active blocks; and performing garbage collection of at least a portion of the storage device based on the probabilistic data structure by performing the following for each block of data within a space to be cleaned: applying a probabilistic algorithm on the identification of the block of data or the data within the block of data, the application resulting in a bit vector for each block of data or the data within the block of data; and comparing the resulting bit vector with the probabilistic data structure to determine if the block of data is active.

23. The machine storage medium of claim 22, wherein generating the probabilistic summary data structure that represents active blocks of data within the storage device comprises: generating a number of hashes based on the identifications of the active blocks or the data within the active blocks; and encoding the number of hashes for the active blocks within the storage device into the probabilistic summary data structure.

24. The machine storage medium of claim 23, wherein the identifications of the active blocks include a hash of the data within the active blocks.

25. The machine storage medium of claim 23, wherein the generating of the number of hashes comprises selecting a bit range of the hash of the data within the active block.

26. The machine storage medium of claim 23, wherein generating a hash based on an identification of the active block or the data within the active block within the storage device comprises: generating a first hash based on the identification of the active block or the data within the active block; and generating different hashes based on the first hash for the active block.

27. The machine storage medium of claim 22, wherein the probabilistic summary data structure indicates that at least one inactive block is an active block.

28. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: for each referenced block of data with at least a portion of an allocated address space in a storage device, performing the following: generating a number of hash values based on hashes of an identification of a block or the data in the block within the storage device, and setting bits within a probabilistic summary data structure, the probabilistic summary data structure having been generated based on a Bloom filter, at offsets equal to each of the number of hash values, wherein a size of the probabilistic summary data structure fits within memory of a computing device that accesses the storage device; and reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure.

29. The machine storage medium of claim 28, wherein reclaiming the at least a portion of the allocated address space based on the probabilistic summary data structure includes: performing the following operations until an identification of each allocated block stored in a table has been processed: generating a number of hash values based on hashes of an identification of a block, and determining whether the block is referenced based on the bit values within the summary data structure that are at offsets equal to the hash values.

30. The machine storage medium of claim 29, wherein performing the following operations until the identification of each allocated block stored in the table has been processed comprises: marking a block as unallocated when at least one of the bits is not set within the summary data structure at the offsets equaling the hash values for the identification of the block.

31. The machine storage medium of claim 29, wherein performing the following operations until the identification of each allocated block stored in the table has been processed, the operations comprising: copying a block that is referenced to an unallocated address space in the storage device, upon determining, for each of the number of hash values, that a bit within a bit vector at an offset equal to a hash value is set; and marking the at least a portion of the allocated address space as unallocated.

32. A machine storage medium that provides instructions, which when executed by a machine, cause said machine to perform operations comprising: generating a first encoded value based on a first Bloom filter, the first encoded value representative of blocks of data within an allocated address space to be cleaned within a storage device; locating blocks of data that are currently referenced by at least one other block of data and are within an allocated address space to be cleaned based on the first encoded value; generating a second encoded value based on a second Bloom filter, the second encoded value representative of the blocks of data that are currently referenced and within the allocated address space to be cleaned; and reclaiming at least a portion of the allocated address space based on the first encoded value and the second encoded value; wherein the generating the second encoded value includes performing the following for each located block: applying the first Bloom filter on the located block, and comparing the result of applying the first Bloom filter on the located block to the first encoded value, if there is a match, performing the following, applying the second Bloom filter on the located block, and storing the result of applying the second Bloom filter on the located block in the second encoded value.

33. The machine storage medium of claim 32 comprising copying the blocks that are currently referenced by at least one reference source and within the allocated address space of the storage device to an unallocated address space of the storage device.
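Claims 15 and 32 refine the basic scheme with two filters: the first summarizes every block in the region being cleaned, and the second records only those referenced blocks that also pass the first. The Python sketch below loosely follows that flow under stated assumptions: block identifiers are byte strings, a reference traversal has already produced the list of referenced ids, the two filters are made independent by salting SHA-256, and the names `SaltedBloom` and `clean_region` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


class SaltedBloom:
    """m-bit Bloom filter; a per-filter salt makes two instances use different hashes."""

    def __init__(self, m_bits: int, k: int, salt: bytes) -> None:
        if not 1 <= k <= 8:  # SHA-256 yields eight 32-bit slices
            raise ValueError("k must be 1..8")
        self.m, self.k, self.salt = m_bits, k, salt
        self.bits = bytearray((m_bits + 7) // 8)

    def _offsets(self, block_id: bytes) -> List[int]:
        digest = hashlib.sha256(self.salt + block_id).digest()
        return [int.from_bytes(digest[4 * i : 4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, block_id: bytes) -> None:
        for off in self._offsets(block_id):
            self.bits[off // 8] |= 1 << (off % 8)

    def __contains__(self, block_id: bytes) -> bool:
        return all(self.bits[off // 8] >> (off % 8) & 1 for off in self._offsets(block_id))


def clean_region(region: Dict[bytes, bytes], referenced_ids: Iterable[bytes],
                 m_bits: int = 1 << 16, k: int = 4) -> Dict[bytes, bytes]:
    """Return the blocks of `region` to copy forward before marking it unallocated."""
    # First encoded value: every block currently allocated in the region to be cleaned.
    first = SaltedBloom(m_bits, k, b"first")
    for block_id in region:
        first.add(block_id)

    # Second encoded value: referenced blocks that match the first filter,
    # i.e. references that (probably) point into the region.
    second = SaltedBloom(m_bits, k, b"second")
    for block_id in referenced_ids:
        if block_id in first:
            second.add(block_id)

    # Reclaim: copy forward whatever the second filter reports as referenced.
    return {bid: data for bid, data in region.items() if bid in second}


if __name__ == "__main__":
    region = {b"id-a": b"A", b"id-b": b"B", b"id-c": b"C"}
    refs = [b"id-a", b"id-c", b"id-outside"]
    print(sorted(clean_region(region, refs)))
```

As claims 9 and 27 note, an inactive block can be reported as active. The standard estimate of that false-positive rate for a Bloom filter with n entries, m bits, and k hash functions is roughly (1 - e^(-kn/m))^k, so filtering out references that point outside the region presumably keeps n, and therefore the second filter, small enough to fit in local memory (claim 18).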

Patents cited by this patent (15)

Garthwaite, Alexander T., Binned remembered sets.
Wurmlinger, Ernest H., Collapsible wall mounted seat.
Gladney, Henry M.; Lorch, Douglas J.; Mattson, Richard L., Communication for version management in a distributed information service.
Chilimbi, Trishul M.; Larus, James R., Data structure partitioning with garbage collection to optimize cache utilization.
Grantham, Paul V.; Lyles, Joseph B.; Smith, William T., Document communications controller.
Li, Liang, Method and system for comparing strings with entries of a lexicon.
Williams, Ross Neil, Method for partitioning a block of data into subblocks and for storing and communicating such subblocks.
Troisi, James H., Method for sorting and storing data employing dynamic sort tree reconfiguration in volatile memory.
Stoutamire, David P.; Grarup, Steffen, Methods and apparatus for improving locality of reference through memory management.
Zwilling, Michael; Blackman, Rande; Agarwal, Sameet; Lindell, Steven J., On-line dynamic file shrink facility.
Gall, Thomas Alan, Relation-based ordering of objects in an object heap.
Thekkath, Chandramohan A.; Mann, Timothy P.; Lee, Edward K., Scalable distributed file system.
Clark, Carl Edward; Greenspan, Steven Jay; Shah, Hiren Ramlal, Tail compression of a log stream using a scratch pad of logically deleted entries.
Clark, Carl Edward; Greenspan, Steven Jay, Tail compression of a sparse log stream of a computer system.
Hitz, David; Malcolm, Michael; Lau, James; Rakitzis, Byron, Write anywhere file-system layout.

Patents citing this patent (61)

Botes, Par; Colgrove, John; Hayes, John, Ability to partition an array into two or more logical arrays with independently running software.
Kreutzer, Tor; Viken Valvag, Steffen; Eidesen, Dag Steinnes; Johansen, Amund Kronen; Heen, Peter Dahle; Karlberg, Jan-Ove Almli; Meling, Jon; Kvalnes, Age, Access controlled graph query spanning.
Davis, John D., Aggressive data deduplication using lazy garbage collection.
Hayes, John; Lee, Robert, Authorizing I/O commands with I/O tokens.
Kannan, Hari; Miladinovic, Nenad; Tan, Zhangxi; Zhao, Randy, Calibration of flash channels in SSD.
Fukutomi, Kazuhiro; Yoshii, Kenichiro; Kanno, Shinichi; Asano, Shigehiro, Controller, data storage device, and program product.
Fukutomi, Kazuhiro; Yoshii, Kenichiro; Kanno, Shinichi; Asano, Shigehiro, Controller, data storage device, and program product.
Fukutomi, Kazuhiro; Yoshii, Kenichiro; Kanno, Shinichi; Asano, Shigehiro, Controller, data storage device, and program product.
Davis, John D.; Hayes, John; Kannan, Hari; Miladinovic, Nenad; Tan, Zhangxi, Data rebuild on feedback from a queue in a non-volatile solid-state storage.
Emigh, Aaron; Roskind, James, Data restoration utilizing redundancy data.
Hayes, John; Botes, Par, Data striping across storage nodes that are assigned to multiple logical arrays.
Hayes, John; Gupta, Shantanu; Davis, John; Gold, Brian; Tan, Zhangxi, Direct memory access data movement.
Hayes, John; Lee, Robert; Ostrovsky, Igor; Vajgel, Peter, Distributed transactions with token-associated execution.
Emigh, Aaron T.; Roskind, James A., Efficient data sharing.
Gupta, Anurag Windlass, Efficient query processing using histograms in a columnar database.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Efficient read/write algorithms and associated mapping for block-level data reduction processes.
Hayes, John Martin; Kannan, Hari; Miladinovic, Nenad, Erase block state detection.
Boyle, William B., Garbage collection based on the inactivity level of stored data.
Feigin, Boris; Kleinerman, Andrew; Tumanova, Svitlana; Vohra, Taher; Wang, Xiaohui, Geometry based, space aware shelf/writegroup evacuation.
Young, Hadley Rasch, Group based complete and incremental computer file backup system, process and apparatus.
Young, Hadley Rasch, Group based complete and incremental computer file backup system, process and apparatus.
Davis, John D., Increased storage unit encryption based on loss of trust.
Patterson, R. Hugo, Incremental garbage collection of data in a secondary storage.
Patterson, R. Hugo, Incremental garbage collection of data in a secondary storage.
Bachar, Yariv; David, Johnny; Levy, Asaf; Shenhar, Elez, Management of multiple capacity types in storage systems.
Watkins, Kathryn R.; McWhorter, Michael; Hill, William H.; Long, Jeffrey W.; Shrauder, Christian, Method and apparatus for transferring and reconstructing an image of a computer readable medium.
Xu, Xia; Lindemann, Aaron, Method and system for copying a snapshot tree.
Davis, Steven Charles, Method and system for desynchronization recovery for permissioned blockchains using bloom filters.
Wallace, Grant, Method and system for distributed garbage collection of deduplicated datasets.
Mondal, Shishir, Method and system for maintaining persistent live segment records for garbage collection.
Pruett, David C.; Neppalli, Srinivas, Method for selective defragmentation in a data storage device.
Stuart, Alan L.; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program for implementing retention policies to archive records.
Stuart, Alan; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program implementing retention policies to archive records.
Li, Kai; Patterson, R. Hugo; Zhu, Ming Benjamin; Bricker, Allan; Johnsson, Richard; Reddy, Sazzala; Zabarsky, Jeffery, Network file system-based data storage system.
Li, Kai; Patterson, R. Hugo; Zhu, Ming Benjamin; Bricker, Allan; Johnsson, Richard; Reddy, Sazzala; Zabarsky, Jeffery, Network file system-based data storage system.
Hayes, John; Gupta, Shantanu; Davis, John; Gold, Brian; Tan, Zhangxi, Nonrepeating identifiers in an address space of a non-volatile solid-state storage.
Neppalli, Srinivas; Fallone, Robert M.; Boyle, William B., Opportunistic defragmentation during garbage collection.
Kannan, Hari; Kirkpatrick, Peter E., Page writes for triple level cell flash memory.
Hayes, John; Gold, Brian; Lee, Robert, Parallel update to NVRAM.
Botes, Par; Hayes, John; Tan, Zhangxi, Point to point based backend communication layer for storage processing.
Patterson, R. Hugo, Probabilistic summary data structure based encoding for garbage collection.
Patterson, R. Hugo, Probabilistic summary data structure based encoding for garbage collection in backup systems.
Bernat, Andrew R.; Miller, Ethan L., Resharing of a split secret.
Bernat, Andrew R.; Miller, Ethan L., Resharing of a split secret.
Botes, Par; Colgrove, John; Davis, John; Hayes, John; Lee, Robert; Robinson, Joshua; Vajgel, Peter, Scalable non-uniform storage sizes.
Hayes, John; Gupta, Shantanu; Davis, John; Gold, Brian; Tan, Zhangxi, Scheduling policy for queues in a non-volatile solid-state storage.
Raizen, Helen S.; Camp, Jeffrey; Bappe, Michael E., Selective I/O to logical unit when encrypted, but key is not available or when encryption status is unknown.
Hayes, John; Gupta, Shantanu; Davis, John; Gold, Brian; Tan, Zhangxi, Self-describing data format for DMA in a non-volatile solid-state storage.
Yang, Wun Mo; Kim, Kyeong Rho; Kwak, Jeong Soon, Semiconductor storage system for decreasing page copy frequency and controlling method thereof.
Colgrove, John; Davis, John D.; Hayes, John, Storage system architecture.
Hayes, John; Colgrove, John; Davis, John D., Storage system architecture.
Lam, Wai, System and method for identifying and mitigating redundancies in stored data.
Boyle, William B.; Fallone, Robert M., System and method for optimizing garbage collection in data storage.
Boyle, William B.; Fallone, Robert M., System and method for optimizing garbage collection in data storage.
Raizen, Helen S.; Freund, David W.; Harwood, John; Bappe, Michael E., Systems and methods for accessing storage or network based replicas of encrypted volumes with no additional key management.
Shalev, Ori, Systems and methods for operating a storage system.
Raizen, Helen S.; Harwood, John; Bappe, Michael E.; Kothandan, Sathiyamoorthy; Epstein, Edith, Systems and methods for selective encryption of operating system metadata for host-based encryption of data at rest on a logical unit.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.
Raizen, Helen S.; Bappe, Michael E.; Nikolaevich, Agarkov Vadim; Biester, William Carl; Ruef, Richard; Owen, Karl M., Systems and methods for using thin provisioning to reclaim space identified by data reduction processes.
Amiri, Behzad; Miladinovic, Nenad, Tracking of optimum read voltage thresholds in nand flash devices.
Hayes, John; Gold, Brian; Gupta, Shantanu; Lee, Robert; Kannan, Hari, Transactional commits with hardware assists in remote memory.

