Incremental garbage collection of data in a secondary storage
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-007/00
G06F-017/00
G06F-012/00
G06F-017/30
G06F-013/00
G06F-013/28
출원번호
US-0216025
(2011-08-23)
등록번호
US-8195614
(2012-06-05)
발명자
/ 주소
Patterson, R. Hugo
출원인 / 주소
EMC Corporation
대리인 / 주소
Blakely Sokoloff Taylor & Zafman LLP
인용정보
피인용 횟수 :
2인용 특허 :
27
초록▼
A method and apparatus for different embodiments of incremental garbage collection of data in a secondary storage. In one embodiment, a method comprises locating blocks of data in a log that are referenced and within a range at a tail of the log. The method also includes copying the blocks of data t
A method and apparatus for different embodiments of incremental garbage collection of data in a secondary storage. In one embodiment, a method comprises locating blocks of data in a log that are referenced and within a range at a tail of the log. The method also includes copying the blocks of data that are referenced and within the range to an unallocated segment of the log.
대표청구항▼
1. A computer-implemented method comprising: storing a plurality of active snapshots as a plurality of storage trees within a memory space, each active snapshot taken at a point in time of target data being processed and at least including data blocks to store user-generated data and data blocks to
1. A computer-implemented method comprising: storing a plurality of active snapshots as a plurality of storage trees within a memory space, each active snapshot taken at a point in time of target data being processed and at least including data blocks to store user-generated data and data blocks to store system-generated data, the system-generated data including pointers to reference other data blocks to enable overlapping and sharing of the user-generated data between and among different snapshots of the plurality of snapshots, wherein each of the storage trees includes a plurality of nodes, each node representing a data block of an active snapshot associated with each storage tree, the plurality of nodes including leaf nodes representing the data blocks of user-generated data and root and intermediate nodes representing the data blocks of system-generated data;deleting the active snapshots from the memory space over time as each of the active snapshots reaches a pre-determined age, wherein the active snapshots are those snapshots that have not yet been deleted; andperforming garbage collection on at least a portion of the memory space to remove data associated with snapshots that have been previously deletedwherein the memory space is a log and the at least a portion of the memory space is a segment of the log. 2. The method of claim 1, wherein the performing garbage collection comprises: locating those data blocks in the memory space that are active and within a range of addresses to be cleaned using pruned walking, wherein an active data block is a data block corresponding to a node in one of the storage trees that is referenced by at least one other node within a storage tree corresponding to an active snapshot currently stored within the memory space;copying the located data blocks to an unallocated segment of the memory space, wherein the data blocks within the range of addresses to be cleaned that are not active remain untouched; andmarking the range of addresses within the memory space as unallocated so that at least a portion of the range of addresses can be reclaimed. 3. The method of claim 2, wherein the pruned walking includes walking only those nodes of the plurality of nodes of the plurality of storage trees that have descendant nodes corresponding to data blocks stored within the range of addresses to be cleaned or that are themselves nodes corresponding to data blocks stored within the range of addresses to be cleaned. 4. The method of claim 1, wherein data stored in the log is stored as contiguous data in a sequential order. 5. The method of claim 1, wherein data stored in the log is wrapped around to the beginning of the log once the end of the log is reached and the beginning of the log has been cleaned. 6. The method of claim 1, wherein two different nodes from two different storage trees reference a same node. 7. The method of claim 1, wherein two different nodes of a same storage tree reference a same node in the same storage tree. 8. The method of claim 1, wherein one node makes multiple references to another node. 9. A machine-readable storage medium having instructions stored therein, which when executed by a processor, cause the processor to perform a method, the method comprising: storing a plurality of active snapshots as a plurality of storage trees within a memory space, each active snapshot taken at a point in time of target data being processed and at least including data blocks to store user-generated data and data blocks to store system-generated data, the system-generated data including pointers to reference other data blocks to enable overlapping and sharing of the user-generated data between and among different snapshots of the plurality of snapshots, wherein each of the storage trees includes a plurality of nodes, each node representing a data block of an active snapshot associated with each storage tree, the plurality of nodes including leaf nodes representing the data blocks of user-generated data and root and intermediate nodes representing the data blocks of system-generated data;deleting the active snapshots from the memory space over time as each of the active snapshots reaches a pre-determined age, wherein the active snapshots are those snapshots that have not yet been deleted; andperforming garbage collection on at least a portion of the memory space to remove data associated with snapshots that have been previously deletedwherein the memory space is a log and the at least a portion of the memory space is a segment of the log. 10. The machine-readable storage medium of claim 9, wherein the performing garbage collection comprises: locating those data blocks in the memory space that are active and within a range of addresses to be cleaned using pruned walking, wherein an active data block is a data block corresponding to a node in one of the storage trees that is referenced by at least one other node within a storage tree corresponding to an active snapshot currently stored within the memory space;copying the located data blocks to an unallocated segment of the memory space, wherein the data blocks within the range of addresses to be cleaned that are not active remain untouched; andmarking the range of addresses within the memory space as unallocated so that at least a portion of the range of addresses can be reclaimed. 11. The machine-readable storage medium of claim 10, wherein the pruned walking includes walking only those nodes of the plurality of nodes of the plurality of storage trees that have descendant nodes corresponding to data blocks stored within the range of addresses to be cleaned or that are themselves nodes corresponding to data blocks stored within the range of addresses to be cleaned. 12. The machine-readable storage medium of claim 9, wherein data stored in the log is stored as contiguous data in a sequential order. 13. The machine-readable storage medium of claim 9, wherein data stored in the log is wrapped around to the beginning of the log once the end of the log is reached and the beginning of the log has been cleaned. 14. The machine-readable storage medium of claim 9, wherein two different nodes from two different storage trees reference a same node. 15. The machine-readable storage medium of claim 9, wherein two different nodes of a same storage tree reference a same node in the same storage tree. 16. The machine-readable storage medium of claim 9, wherein one node makes multiple references to another node. 17. A storage system, comprising: a memory to store a plurality of active snapshots as a plurality of storage trees within a memory space, each active snapshot taken at a point in time of target data being processed and at least including data blocks to store user-generated data and data blocks to store system-generated data, the system-generated data including pointers to reference other data blocks to enable overlapping and sharing of the user-generated data between and among different snapshots of the plurality of snapshots, wherein each of the storage trees includes a plurality of nodes, each node representing a data block of an active snapshot associated with each storage tree, the plurality of nodes including leaf nodes representing the data blocks of user-generated data and root and intermediate nodes representing the data blocks of system-generated data;a backup logic to delete the active snapshots from the memory space over time as each of the active snapshots reaches a pre-determined age, wherein the active snapshots are those snapshots that have not yet been deleted; anda garbage collection logic to perform garbage collection on at least a portion of the memory space to remove data associated with snapshots that have been previously deletedwherein the memory space is a log and the at least a portion of the memory space is a segment of the log. 18. The system of claim 17, wherein the garbage collection logic is configured to locate those data blocks in the memory space that are active and within a range of addresses to be cleaned using pruned walking, wherein an active data block is a data block corresponding to a node in one of the storage trees that is referenced by at least one other node within a storage tree corresponding to an active snapshot currently stored within the memory space,copy the located data blocks to an unallocated segment of the memory space, wherein the data blocks within the range of addresses to be cleaned that are not active remain untouched, andmark the range of addresses within the memory space as unallocated so that at least a portion of the range of addresses can be reclaimed. 19. The system of claim 18, wherein the pruned walking includes walking only those nodes of the plurality of nodes of the plurality of storage trees that have descendant nodes corresponding to data blocks stored within the range of addresses to be cleaned or that are themselves nodes corresponding to data blocks stored within the range of addresses to be cleaned. 20. The system of claim 17, wherein data stored in the log is stored as contiguous data in a sequential order.
Gladney Henry M. (Saratoga CA) Lorch Douglas J. (San Jose CA) Mattson Richard L. (San Jose CA), Communication for version management in a distributed information service.
Hitz David ; Malcolm Michael ; Lau James ; Rakitzis Byron, Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file s.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.