IPC classification information
Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition):
Application number: US-0972634 (2010-12-20)
Registration number: US-8504535 (2013-08-06)
Inventors / Address:
- He, Gang
- Sorenson, III, James Christopher
Applicant / Address:
- Amazon Technologies, Inc.
Agent / Address: Thomas | Horstemeyer, LLP
Citation information: Times cited: 23; Patents cited: 1
Abstract
Disclosed are various embodiments for employing an erasure coding storage scheme and a redundant replication storage scheme in a data storage system. Data objects that are greater than a size threshold and accessed less frequently than an access threshold are stored in an erasure coding scheme, while data objects that are sized less than a size threshold or accessed more often than an access threshold are stored in a redundant replication storage scheme.
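The selection rule in the abstract can be sketched as a small decision function. This is only an illustration, not the patented implementation: the names `ObjectStats` and `choose_scheme` and both threshold values are hypothetical, and the abstract does not say how an object exactly at a threshold is handled (this sketch sends it to replication).

```python
from dataclasses import dataclass

# Hypothetical thresholds; the abstract gives no concrete values.
SIZE_THRESHOLD_BYTES = 1_000_000
ACCESS_THRESHOLD_PER_PERIOD = 10


@dataclass
class ObjectStats:
    """Per-object statistics (hypothetical structure for this sketch)."""
    size_bytes: int
    accesses_in_period: int


def choose_scheme(stats: ObjectStats,
                  size_threshold: int = SIZE_THRESHOLD_BYTES,
                  access_threshold: int = ACCESS_THRESHOLD_PER_PERIOD) -> str:
    """Return the storage scheme for an object under the abstract's rule.

    Large AND rarely accessed -> erasure coding.
    Small OR frequently accessed -> redundant replication.
    """
    is_large = stats.size_bytes > size_threshold
    is_cold = stats.accesses_in_period < access_threshold
    if is_large and is_cold:
        return "erasure_coding"
    # Assumption: values exactly equal to a threshold fall through to here.
    return "replication"
```

For example, a 5 MB object read twice in the period would be erasure coded, while the same object read 50 times, or a 1 KB object read twice, would stay replicated.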
Representative claims
1. A non-transitory computer-readable medium embodying a program executable in a computing device, the program comprising: code that generates an object size distribution of a plurality of data objects stored in a data storage system, the data storage system comprising at least one data store; code that generates an access pattern distribution of the plurality of data objects; code that identifies from the object size distribution a first at least one object stored in the data storage system that is greater than a size threshold, the first at least one object stored in a first data replication scheme, the first data replication scheme comprising a redundant replication scheme wherein a copy of the first at least one object is stored in a plurality of data stores in the data storage system; code that identifies from the access pattern distribution whether the first at least one object is accessed less often than an access threshold frequency; code that stores the first at least one object in a second data replication scheme in the data storage system when the first at least one object exceeds the size threshold and the first at least one object is accessed less often than the access threshold frequency over a period of time, the second data replication scheme comprising an erasure coding scheme, wherein the at first least one data object is divided into a plurality of shards, each of the plurality of shards having a size less than an object size of the first at least one object and stored in a respective plurality of data stores in the data storage system; code that identifies from the object size distribution a second at least one object stored in the data storage system that is less than the size threshold, the second at least one object stored in the second data replication scheme; code that identifies from the access pattern distribution whether the second at least one object is accessed more often than the access threshold frequency; and code that stores the second at least one object in the first data replication scheme in the data storage system when the first at least one object is either less than the size threshold or the second at least one object is accessed more often than the access threshold frequency over a period of time.
2. A system, comprising: at least one computing device; and a data storage application executable in the at least one computing device, the data storage application comprising: logic that generates an object size distribution of a plurality of data objects stored in a data storage system, the data storage system comprising at least one data store; logic that generates an access pattern distribution of the plurality of data objects; logic that identifies from the object size distribution at least one data object stored in the data storage system that is greater than a size threshold, the at least one data object stored in a first data replication scheme, the first data replication scheme comprising a redundant replication scheme wherein a copy of the at least one data object is stored in a plurality of data stores in the data storage system; logic that identifies from the access pattern distribution whether the at least one data object is accessed less often than an access threshold frequency; and logic that stores the at least one data object in a second replication scheme in the data storage system when the at least one data object exceeds the size threshold and the at least one data object is accessed less often than the access threshold frequency over a period of time, the second replication scheme comprising an erasure coding scheme, wherein the at least one data object is divided into a plurality of shards, each of the plurality of shards having a size less than an object size of the at least one data object and stored in a respective plurality of data stores in the data storage system.
3. The system of claim 2, wherein the plurality of shards have a total size greater than or equal the at least one data object.
4. The system of claim 2, wherein the data storage application further comprises logic that stores one of the plurality of shards at least a subset of the at least one data store.
5. The system of claim 2, wherein the data storage application further comprises: logic that identifies a location of a subset of the shards in an index accessible to the data storage application; logic that retrieves the subset of the shards from the at least one data store; and logic that reconstructs the at least one data object from the subset of the shards.
6. The system of claim 2, wherein the logic that generates the access pattern distribution of the plurality of data objects further comprises: logic that scans an access log of the data storage system over a specified period of time; and logic that identifies at least one data object accessed within the specified period of time.
7. The system of claim 2, wherein the logic that generates the access pattern distribution of the plurality of data objects further comprises: logic that samples an access log of the data storage system over a specified period of time; and logic that identifies at least one data object accessed within the specified period of time.
8. The system of claim 2, wherein the logic that generates the access pattern distribution of the plurality of data objects further comprises: logic that scans an index of a plurality of data objects stored in the data storage system, the index specifying a storage location in the data storage system of the objects and a most recent access of at least one of the objects; and logic that identifies at least one data object accessed within a specified period of time.
9. The system of claim 2, wherein the data storage application further comprises: logic that receives a request to retrieve a data object from the data storage system; logic that determines whether a size of the data object is greater than the size threshold; logic that stores the data object according to the second replication scheme when the size is greater than the size threshold; and logic that stores the data object according to the first data replication scheme when the size is less than the size threshold.
10. The system of claim 2, wherein the data storage application further comprises: logic that receives a request to retrieve a data object from the data storage system; logic that determines whether the data object has been accessed during the period of time more often than the access threshold frequency; logic that stores the data object according to the second replication scheme when the data object has been accessed less often than the access threshold frequency; and logic that stores the data object according to the first data replication scheme when the data object has been accessed during the period of time more often than the access threshold frequency.
11. A method, comprising the steps of: receiving, in at least one computing device, a request to retrieve a data object from a data storage system comprising at least one data store; logging, in the at least one computing device, the request in an access log accessible to the at least one computing device; determining, in the at least one computing device, whether a size of the data object exceeds a size threshold; determining, in the at least one computing device, whether the data object is stored in a first replication scheme in the data storage system, the first replication scheme comprising a redundant replication scheme wherein a copy of the data object is stored in a plurality of data stores in the data storage system; encoding, in the at least one computing device, the data object in a second replication scheme when the size exceeds the size threshold, the second replication scheme comprising an erasure coding scheme, wherein the data object is divided into a plurality of shards, each of the plurality of shards having a size less than an object size of the data object and stored in a respective plurality of data stores in the data storage system; and storing, in the data storage system, the data object in the second replication scheme.
12. The method of claim 11, further comprising the steps of: determining, in the at least one computing device, whether the size of the data object is less than the size threshold; determining, in the at least one computing device, whether the data object is stored in the second replication scheme in the data storage system; encoding, in the at least one computing device, the data object in the first replication scheme when the size is less than the size threshold; and storing, in the data storage system, the data object in the first replication scheme.
13. The method of claim 11, further comprising the steps of: determining, in the at least one computing device, whether the data object has been accessed less often than an access threshold over a period of time; and storing, in the data storage system, the data object in the second replication scheme if the data object has been accessed less often than the access threshold and the size exceeds the size threshold.
14. The method of claim 11, wherein the step of encoding the data object in the second replication scheme further comprises: dividing the data object into M fragments; generating N fragments from the M fragments; storing the N fragments and the M fragments in the at least one data store.
15. The method of claim 14, wherein the step of encoding the data object in the second replication scheme further comprises generating an index describing a location of the N fragments and the M fragments corresponding to the data object.
16. The method of claim 14, further comprising the step of reconstructing the data object using an erasure coding algorithm, the data object reconstructed using a first number of the N fragments and the M fragments, wherein the first number is at least equal to M.
17. The method of claim 11, wherein the step of encoding the data object in the second replication scheme further comprises generating N fragments from the data object, a total storage size of the N fragments being greater than or equal to a size of the data object.
18. The method of claim 17, wherein the step of storing the data object in the second replication scheme further comprises storing each one of the N fragments in a different one of the respective plurality of data stores in the data storage system.
19. The method of claim 17, further comprising the step of reconstructing the data object from a subset of the N fragments.
20. The method of claim 19, wherein the step of reconstructing the data object from the subset of the N fragments further comprises the step of retrieving a fragment from each of a subset of the respective plurality of data stores in the data storage system.
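Claims 14 to 16 describe dividing an object into M fragments, generating N further fragments from them, and reconstructing the object from at least M of the stored fragments. The sketch below illustrates that idea with the simplest possible code: a single XOR parity fragment (N = 1), which tolerates the loss of any one fragment. The claims do not name a specific erasure code, so the XOR choice and the function names `encode` and `reconstruct` are assumptions made for illustration; a production system would more likely use a general code such as Reed-Solomon.

```python
from typing import List, Optional, Tuple


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> Tuple[List[bytes], bytes]:
    """Split data into m equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // m)  # ceiling division
    padded = data.ljust(frag_len * m, b"\0")  # zero-pad the final fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]
    parity = bytes(frag_len)  # all zeros
    for frag in frags:
        parity = _xor(parity, frag)
    return frags, parity


def reconstruct(frags: List[Optional[bytes]],
                parity: Optional[bytes],
                original_len: int) -> bytes:
    """Rebuild the object from any m of the m + 1 stored fragments.

    A missing data fragment is passed as None.
    """
    missing = [i for i, frag in enumerate(frags) if frag is None]
    if len(missing) > 1 or (missing and parity is None):
        raise ValueError("fewer than M fragments available")
    restored = list(frags)
    if missing:
        rebuilt = parity
        for frag in frags:
            if frag is not None:
                rebuilt = _xor(rebuilt, frag)
        restored[missing[0]] = rebuilt
    # Strip the zero padding added during encoding.
    return b"".join(restored)[:original_len]
```

With M = 4, the five stored fragments together are slightly larger than the object (consistent with claims 3 and 17), yet each fragment is smaller than the object, and any four of them suffice to rebuild it. Storing the original length alongside the fragments, for instance in the index of claim 15, is an assumption of this sketch.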