History preservation in a computer storage system
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G06F-017/30
G06F-012/00
Application number
US-0374517
(2003-02-26)
Registration number
US-7478096
(2009-01-13)
Inventor / Address
Margolus,Norman H.
Floyd,Jered J.
Homsy, II,George E.
Keller,Jeffrey M.
Applicant / Address
Burnside Acquisition, LLC
Agent / Address
Fish & Richardson P.C.
Citation information
Times cited: 30
Patents cited: 31
Abstract
A method by which a disk-based distributed data storage system is organized for protecting historical records of stored data entities. The method comprises recording distinct states of an entity, corresponding to different moments of time, as separate entity versions coexisting within the distributed data storage system, and assigning expiration times to the entity versions independently within each of a plurality of storage sites according to a shared set of rules, before which times deletion is prohibited.
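The behavior summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `Version`, `StorageSite`, `deposit`, `extend`, and `delete` are hypothetical, and the "shared set of rules" is reduced to a single fixed retention period that every site applies using its own clock.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # compare versions by identity, not by field values
class Version:
    content: bytes
    deposit_time: float
    expiration_time: float  # deletion is refused before this time


@dataclass
class StorageSite:
    """One storage site (hypothetical model); it assigns expiration times itself."""
    name: str
    retention_seconds: float  # assumption: the shared rule is one fixed period
    versions: Dict[str, List[Version]] = field(default_factory=dict)

    def deposit(self, entity_id: str, content: bytes, now: float) -> Version:
        # Each deposit is recorded as a new version; older versions coexist with it.
        version = Version(content, now, now + self.retention_seconds)
        self.versions.setdefault(entity_id, []).append(version)
        return version

    def extend(self, version: Version, new_expiration: float) -> bool:
        # An expiration time may be moved later, never earlier.
        if new_expiration < version.expiration_time:
            return False
        version.expiration_time = new_expiration
        return True

    def delete(self, entity_id: str, version: Version, now: float) -> bool:
        # Requests that would delete a version before it expires are denied.
        if now < version.expiration_time:
            return False
        self.versions[entity_id].remove(version)
        return True
```

Because each site computes `expiration_time` from the moment the deposit reaches it, two sites holding copies of the same version can end up with slightly different expiration times, and corrupting the value at one site does not change the value held at another.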
Representative claim
What is claimed is:
1. A method by which a disk-based distributed data storage system is organized for protecting historical records of stored data entities, the method comprising: recording distinct states of entities, corresponding to different moments of time, as separate entity versions coexisting within the disk-based distributed data storage system; storing an entity version in the disk-based distributed data storage system, with a copy of the entity version located at each of a plurality of storage sites; and assigning expiration times to the entity version, with separate expiration times assigned at each of the plurality of storage sites according to a shared set of rules, before which times both modification and deletion of the entity version are prohibited, and after which times it is deleted from the disk-based distributed data storage system; wherein a storage client communicating with the disk-based distributed data storage system deposits the entity version into the storage system and later retrieves it; wherein an action taken by the storage client communicating with the disk-based distributed data storage system causes an expiration time, which is one of the separate expiration times, to be assigned to the entity version; wherein no subsequent action that the storage client communicating with the disk-based distributed data storage system can take will cause the expiration time to be changed to an earlier time; wherein a request communicated by the storage client to the disk-based distributed data storage system, which would cause or allow the entity version to be deleted from the disk-based distributed data storage system before the expiration time, is denied; and wherein no alteration or corruption of expiration times assigned at any single storage site that is a part of the disk-based distributed data storage system will allow the entity version to be deleted from all of the plurality of storage sites before the assigned expiration times.
2. The method of claim 1 in which the storage system is adapted for storing an unstructured-set of entities.
3. The method of claim 2 in which the unstructured set comprises more than a million entities.
4. The method of claim 2 in which the unstructured set comprises more than a billion entities.
5. The method of claim 1 in which the storage system associates an entity with an identifier chosen by the storage client.
6. The method of claim 1 in which the storage system associates an entity version with an identifier that depends on a hash of its contents.
7. The method of claim 1 in which the storage client uses the storage system to record states of entities that constitute a hierarchical file system, with separately accessible entities playing the roles of files and directories.
8. The method of claim 1 or 7 in which expiration times of entity versions can be extended, and extension periods for different versions can be specified independently.
9. The method of claim 8 in which the expiration time is extended at the request of the storage client.
10. The method of claim 1 in which the plurality of storage sites is chosen from a larger set of storage sites based on a hash.
11. The method of claim 1 or 7 in which entity versions can be accessed separately, without needing to access a larger aggregate first.
12. The method of claim 1 in which each of the plurality of storage sites is located in a different city.
13. The method of claim 1 in which no single individual is allowed physical access to all of the plurality of storage sites.
14. The method of claim 1 in which an authorized client of the storage system can cause or allow the entity version to be deleted before the expiration time.
15. The method of claim 1 in which no single individual is given the authority to cause or allow the entity version to be deleted before its expiration time at all of the plurality of storage sites.
16. The method of claim 1 in which the entity version is one of a plurality of versions of an entity that are each assigned deposit times, and the version with the latest deposit time is considered current.
17. The method of claim 16 in which non-current versions are assigned expiration times.
18. The method of claim 16 in which the deposit time is specified by the storage client.
19. The method of claim 18 in which the deposit time is constrained to agree with the actual time that the deposit reaches a storage site, to within predetermined limits.
20. The method of claim 19 in which the imposition of the constraint begins at a predefined event, before which event versions of the entity are deposited with deposit times that violate the constraint.
21. The method of claim 20 in which the predefined event is the deposit of a version of the entity with a deposit time specified that agrees with the actual time, to within predetermined limits.
22. The method of claim 20 in which the predefined event is a request from the storage client to begin monitoring deposit times for the entity.
23. The method of claim 18 in which the deposit time is constrained to agree with the actual time that the deposit reaches a storage site, to within predetermined limits, except when the deposit time specified by the client is earlier than the latest deposit time of any existing version of the entity.
24. The method of claim 18 in which the entity is used to record the history of a file in a source file system, and an historical version of the file is added from a separate record of the file system's history with a deposit time that precedes the most current version of the entity.
25. The method of claim 24 or 21 in which the storage client deposits records of a source file system's history into the disk-based distributed storage system, with entities corresponding to files and directories, and the deposit times specified for versions of entities correspond to times associated with the records.
26. The method of claim 25 in which two distinct entities, each of which holds records of the content of a file in the source file system during different time intervals, are linked within a third entity.
27. The method of claim 26 in which the third entity records information about a state of a directory in the source file system.
28. The method of claim 16 in which the deposit time is based on the time the deposit reaches a storage site.
29. The method of claim 16 in which the entity version is a non-current version and the expiration time assigned to the non-current version depends on when it was superseded as the current version.
30. The method of claim 29 in which the expiration time assigned to the non-current version depends on the deposit time that was assigned to it when it was current.
31. The method of claim 30 in which the expiration time assigned to the non-current version depends on the deposit time assigned to the version that superseded it as the current version.
32. The method of claim 31 wherein the expiration time assigned to the non-current version depends on the length of the time interval during which the non-current version was current.
33. The method of claim 31 wherein a sequence of snapshot times are defined and the expiration time assigned to the non-current version depends upon the latest snapshot time at which the non-current version was current.
34. The method of claim 31 in which the non-current version makes reference to constituent blocks of stored content, with each block assigned a reference count which reflects the number of references there are to the block in any entity version.
35. The method of claim 34 in which the non-current version is deleted by a storage client, the reference counts assigned to its constituent blocks of stored content are decremented, and a block with reference count of zero is discarded and its storage space is reused.
36. The method of claim 35 in which the reference counts for blocks of stored content are incremented when the blocks are deposited.
37. The method of claim 35 in which the blocks of stored content are strings of bytes with a predetermined maximum length.
38. The method of claim 37 in which a block that is one of the blocks of stored content is referenced using a block name which depends upon a hash of the content of the block.
39. The method of claim 38 in which the block content has been encrypted using a key derived from its unencrypted content.
40. The method of claim 31 in which entity versions make reference to constituent blocks of stored content, with each block assigned a reference count which reflects the number of references there are to the block in current versions.
41. The method of claim 40 in which each block is also assigned an expiration time that depends on the latest of expiration times associated with entity versions which make reference to it.
42. The method of claim 41 in which a block which has a reference count of zero and an expiration time which has passed is discarded, and its storage space is reused.
43. The method of claim 42 in which the expiration time for a block of stored content is set to a default non-zero value when the block is deposited.
44. The method of claim 30 in which the expiration time assigned to the non-current version depends on the actual time when it was superseded as the current version.
45. The method of claim 29 wherein the expiration time depends on the deposit times of non-current versions of the entity.
46. The method of claim 16 in which the storage client supplies information that allows the storage system to associate a newer version of the entity with an older version that it supersedes as the current version.
47. The method of claim 46 in which the information supplied by the storage client allows the storage system to order the versions of the entity by deposit time.
48. The method of claim 46 in which the information supplied by the storage client that associates the newer version with the older superseded version is discarded while the two versions are retained.
49. The method of claim 16 in which the recorded entities are associated with entity version records, with each entity version record storing the association between an entity identifier freely chosen by a storage client and the versions of the entity.
50. The method of claim 49 in which each entity version record is assigned a reference count which reflects the number of references there are to the corresponding entity from within current entity versions.
51. The method of claim 50 in which each entity version record is also assigned an expiration time that depends on the latest of all of the expiration times associated with the versions of the entity recorded in the entity version record.
52. The method of claim 51 in which an entity version record with reference count of zero and an expiration time which has passed is discarded and the storage space is reused.
53. The method of claim 52 in which the expiration time for an entity version record is set to a default non-zero value when it is created.
54. The method of claim 1 in which a determination of the actual time at which the entity version is stored at a storage site that is one of the storage sites is made using a clock at the storage site, operating without reference to any time standards outside of the storage site.
55. The method of claim 1 in which a determination of the actual time at which the entity version is stored at a storage site that is one of the storage sites is made using a clock at the storage site, with a limit to a total correction applied to the clock per fixed period using time standards outside of the storage site.
56. The method of claim 1 wherein the expiration time is set by the storage client.
57. The method of claim 1 wherein a time interval during which the entity version is presumed to have been current is assigned by the storage client.
58. The method of claim 57 wherein the expiration time depends on the time interval.
59. The method of claim 1 in which the entity is a first entity, and a plurality of versions of the first entity which are deposited during a time interval all have their expiration times extended to at least a first expiration time.
60. The method of claim 59 in which a second entity which records hierarchical directory information including that of the first entity has a version deposited during the time interval which expires earlier than the first expiration time.
61. The method of claim 60 in which summary information is stored in a version of the second entity that does not expire before the first expiration time, that is sufficient to recreate hierarchical directory information of the version that does.
62. The method of claim 1 in which the connection between the entity version and a constituent block of content is not visible to a server storing the block of content.
63. The method of claim 1 in which the entity version is one of a plurality of versions of an entity corresponding to states of the entity at different moments of time.
64. The method of claim 63 in which the storage client deposits a version of the entity that is different than the entity version into the disk-based distributed data storage system.
65. The method of claim 1 in which the storage client causes the expiration time to be changed to a later time.
66. The method of claim 1 in which no action taken by any client that only communicates with the disk-based distributed data storage system over a wide area network can cause the expiration time assigned to the entity version to be changed to an earlier time.
67. The method of claim 1 in which a plurality of different expiration times are assigned to different entity versions.
68. The method of claim 1 in which the expiration times assigned to the entity version at two different storage sites are different.
69. The method of claim 1 in which each of the plurality of storage sites comprises one or more storage servers.
70. The method of claim 69 in which communication between the storage servers within a storage site is faster than communication between storage sites.
71. The method of claim 1 in which the expiration time is assigned a value that indicates that the entity version will never expire.
72. The method of claim 1 in which the expiration times have passed and the entity version is not deleted until the storage client requests that it be deleted.
73. The method of claim 1 in which the shared set of rules are communicated to the plurality of storage sites at the time that the entity version is stored at the plurality of storage sites.
74. The method of claim 1 in which the request communicated by the storage client is a request that would assign a new expiration time to the entity version that is earlier than the expiration time.
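The block-level bookkeeping described in claims 36, 38, and 40 through 43 can be illustrated with a small Python sketch. It is a loose reading of those claims under stated assumptions, not the patented implementation: the names `Block`, `BlockStore`, `retain_until`, `release`, and `collect` are hypothetical, SHA-256 stands in for the unspecified hash, and `DEFAULT_RETENTION` is an arbitrary value chosen for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    content: bytes
    ref_count: int = 0
    expiration_time: float = 0.0


class BlockStore:
    """Hypothetical store of content blocks named by a hash of their content."""

    DEFAULT_RETENTION = 60.0  # assumption: default non-zero retention, in seconds

    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}

    def deposit(self, content: bytes, now: float) -> str:
        # The block name depends on a hash of the content (SHA-256 is an assumption).
        name = hashlib.sha256(content).hexdigest()
        block = self.blocks.setdefault(name, Block(content))
        block.ref_count += 1  # incremented when the block is deposited
        block.expiration_time = max(block.expiration_time,
                                    now + self.DEFAULT_RETENTION)
        return name

    def retain_until(self, name: str, version_expiration: float) -> None:
        # A block expires no earlier than the latest-expiring version that uses it.
        block = self.blocks[name]
        block.expiration_time = max(block.expiration_time, version_expiration)

    def release(self, name: str) -> None:
        # Called when a version referencing the block stops being current.
        self.blocks[name].ref_count -= 1

    def collect(self, now: float) -> List[str]:
        # Discard blocks that are unreferenced AND past their expiration time.
        dead = sorted(name for name, block in self.blocks.items()
                      if block.ref_count == 0 and block.expiration_time <= now)
        for name in dead:
            del self.blocks[name]
        return dead
```

Naming blocks by content hash means a second deposit of identical bytes only raises the reference count, and keeping an expiration time alongside the count lets a block outlive its last current reference for as long as some non-current version still needs it.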
Patents cited by this patent (31)
Johnson Judy J. ; McElroy ; Jr. James R., Computer based records management system method.
Farber David A. ; Lachman Ronald D., Data processing system using substantially unique identifiers to identify data items, whereby identical data items hav.
Almond Kenneth ; Wait Robert ; Thombre Atul ; Shaw Richung, Development system providing methods for managing different versions of objects with a meta model.
Howell William E. (N. Richland Hills TX) Reddy Hari N. (Grapevine TX) Wang Diana S. (Trophy Club TX), Dynamic verification of authorization in retention management schemes for data processing systems.
Ort James R. (Kenmore NY) Lange Douglas L. (Snyder NY) Kiefer Frederick W. (Williamsville NY) Dennison Raymond J. (West Seneca NY), Fingerprint identification system.
Coates, Joshua L.; Bozeman, Patrick E.; Patterson, David A., Method and apparatus for accessing remote storage in a distributed storage cluster architecture.
MacPhail Margaret G. (Austin TX), Method of assigning retention and deletion criteria to electronic documents stored in an interactive information handlin.
Yuasa, Aki; Takeda, Hideyuki; Noda, Ken; Iitsuka, Hiroyuki, Receiving apparatus that receives and accumulates broadcast contents and makes contents available according to user requests.
Bohannon Philip L. ; Leinbaugh Dennis W. ; Rastogi Rajeev ; Seshadri Srinivasan,INX ; Silberschatz Abraham ; Sudarshan Sundararajarao,INX, System and method for aging versions of data in a main memory database.
Zarmer Craig (Mountain View CA) Jones Anne (Redwood City CA) Arnold Kevin M. (Cupertino CA) Chambers Paul S. (San Jose CA) Eastwood Tom (Menlo Park CA) Helfinstein Ruth A. (Sunnyvale CA) Rusoff Jason, System for managing local database updates published to different online information services in different formats from.
Challenger, James Robert Harold; Ferstat, Cameron Donald; Iyengar, Arun Kwangil; Reed, Paul; Witting, Karen A., Systems and methods for publishing data with expiration times.
Bosley,Carleton J.; Wilken,Benjamin B.; Srivastava,Gitika, Systems, methods and programming for routing and indexing globally addressable objects and associated business models.
Gonzalez, Carlos J.; Bryce, Alan Douglas; Gorobets, Sergey Anatolievich; Bennett, Alan David, Adaptive deterministic grouping of blocks into multi-block units.
Gonzalez, Carlos J.; Bryce, Alan Douglas; Gorobets, Sergey Anatolievich; Bennett, Alan David, Adaptive deterministic grouping of blocks into multi-block units.
Anderson, Janice Carver; Hall, Renu Susan Babu; Williams, Alex Daniel; Harold, John Ambrose; Tulek, Anne Genevieve, Computer implemented method for accelerating electronic file migration from multiple sources to multiple destinations.
Tulek, Anne Genevieve; Anderson, Janice Carver; Greer, Jennifer Christian; Lloyd, Claudette Landry; Tulek, Ali Guray, Computer implemented method for forming an accelerated compliance plan with a graphic visualization.
Anderson, Janice Carver; Hall, Renu Susan Babu; Williams, Alex Daniel; Harold, John Ambrose; Tulek, Anne Genevieve, Computer implemented system for accelerating electronic file migration from multiple sources to multiple destinations.
Tulek, Anne Genevieve; Anderson, Janice Carver; Greer, Jennifer Christian; Lloyd, Claudette Landry; Tulek, Ali Guray, Computer implemented system for forming an accelerated compliance plan with a graphic visualization.
Islam, Nazrul; Paknad, Deidre; Raynaud-Richard, Pierre, External scoping sources to determine affected people, systems, and classes of information in legal matters.
Paknad, Deidre; Raynaud-Richard, Pierre; Pogodin, Andrey, Method and apparatus for managing the disposition of data in systems when data is on legal hold.
Stuart, Alan L.; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program for implementing retention policies to archive records.
Stuart, Alan; Marek, Toby Lyn; Hochberg, Avishai Haim; Cannon, David Maxwell; Martin, Howard Newton, Method, system, and program implementing retention policies to archive records.
Senthilnathan, Muthusamy; Thati, Ravi; Kumarasamy, Paramasivam; Mishra, Hemant, Predicting scale of data migration between production and archive storage systems, such as for enterprise customers having large and/or numerous files.
Raynaud-Richard, Pierre; Pogodin, Andrey, Providing collection transparency information to an end user to achieve a guaranteed quality document search and production in electronic data discovery.