Computation refinement storage in a data storage system
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-011/00
G06F-011/10
출원번호
US-0570952
(2014-12-15)
등록번호
US-10198319
(2019-02-05)
발명자
/ 주소
Sieklucki, Mark Robert
Sorenson, III, James Christopher
Patel, Rajesh Shanker
Rahmany, Dave
출원인 / 주소
Amazon Technologies Inc.
대리인 / 주소
Kowert, Robert C.
인용정보
피인용 횟수 :
0인용 특허 :
4
초록▼
A storage manager may be used to perform a coding operation (e.g., an encoding operation or a decoding operation) on a data object using a refined version of a coding matrix in a storage system, such as an object-redundant storage system. The coding operation may include identifying a coding matrix
A storage manager may be used to perform a coding operation (e.g., an encoding operation or a decoding operation) on a data object using a refined version of a coding matrix in a storage system, such as an object-redundant storage system. The coding operation may include identifying a coding matrix for the data object and retrieving a refined version of the identified coding matrix from a data store of pre-refined coding matrices. The refined version of the coding matrix may identify two or more computations to be performed as part of the coding operation, where the two or more computations have common terms and operations. The coding operation may include determining one or more precomputed results of the two or more identified computations. The precomputed results may be used instead of performing the computations when performing the coding operation.
대표청구항▼
1. A system, comprising: a storage system comprising a plurality of storage devices, wherein the storage system is configured to store a plurality of data objects according to an erasure coding technique, wherein each respective data object of the plurality of data objects is stored as a respective
1. A system, comprising: a storage system comprising a plurality of storage devices, wherein the storage system is configured to store a plurality of data objects according to an erasure coding technique, wherein each respective data object of the plurality of data objects is stored as a respective plurality of shards such that the respective data object can be reconstructed from a particular number of shards that is fewer than a total number of shards for that data object, wherein each shard for the respective data object is stored on a different one of the plurality of storage devices than any other shard of the plurality of shards for the respective data object; anda storage manager implemented by one or more hardware processors and configured to receive a plurality of requests from one or more clients for respective data objects of the plurality of data objects, wherein in response to each individual request of the plurality of requests the storage manager is configured to: retrieve, while the storage devices storing the plurality of shards for the requested data object are each operational, a subset of the plurality of shards for the requested data object from the storage system, wherein the subset includes at least the particular number of shards;identify a decoding matrix for the requested data object, wherein the decoding matrix specifies a plurality of computations for reconstructing the data object from the subset of the plurality of shards;retrieve a refined version of the identified decoding matrix from a data store of pre-refined decoding matrices, wherein the refined version of the decoding matrix identifies two or more computations of the plurality of computations having common terms and operations according to the decoding matrix;determine one or more precomputed results for the two or more identified computations;perform computations according to the refined version of the decoding matrix to reconstruct the requested data object from the subset, wherein the storage manager is further configured to use the one or more precomputed results instead of performing the identified computations while reconstructing the requested data object; andreturn the reconstructed data object to the client. 2. The system of claim 1, wherein, to retrieve the refined version of the identified decoding matrix, the storage manager is configured generate a subset shard index corresponding to the subset and to query the data store of pre-refined decoding matrices using the subset shard index, andwherein the data store of pre-refined decoding matrices is configured, in response to the query, to: determine whether the subset shard index corresponds to a particular stored pre-refined decoding matrix of a plurality of stored pre-refined decoding matrices;in response to determining that the subset shard index corresponds to the particular stored pre-refined decoding matrix, return the particular stored pre-refined decoding matrix as the refined version of the decoding matrix; andin response to determining that that the subset shard index does not correspond to the plurality of stored pre-refined decoding matrices, return a miss indicator for the subset shard index. 3. The system of claim 2, wherein, in response to receiving the miss indicator for the subset shard index and prior to determining the one or more precomputed results, the storage manager is configured to: determine the decoding matrix for the requested data object;perform a refinement operation on the decoding matrix, wherein to perform the refinement operation the storage manager is further configured to: identify the two or more computations of the plurality of computations having common terms and operations according to the decoding matrix; andmodify the decoding matrix to indicate the two or more identified computations; andupdate the data store to include the modified decoding matrix as a refined version of the decoding matrix, wherein the modified decoding matrix is indexed at the data store according to the subset shard index. 4. The system of claim 1, wherein the storage system and storage manager are part of a network-based storage service, wherein respective data objects of the plurality of data objects are stored on behalf of a plurality of different clients according to a network-based interface of the network-based storage service, and wherein at least some of the plurality of storage devices store shards for multiple different clients of the plurality of different clients. 5. The system of claim 1, wherein, to determine the decoding matrix of the subset, the storage manager is further configured to: retrieve a stored erasure coding matrix corresponding to the data object; andgenerate the decoding matrix from the erasure coding matrix according to which shards of the plurality of shards form the subset, wherein each shard of the plurality of shards corresponds to a respective portion of the stored erasure coding matrix, wherein the decoding matrix is generated from respective portions of the stored erasure coding matrix. 6. A method, comprising: performing, by one or more computers: receiving a plurality of requests from one or more clients, wherein each request is for at least one respective data object of a plurality of data objects, wherein each respective data object of the plurality of data objects is stored as a respective plurality of shards, such that the respective data object can be reconstructed from a particular number of shards that is fewer than a total number of shards for that data object, wherein each shard of the total number of shards is stored on a different one of a plurality of storage devices than any other shard of the total number of shards; andin response to receiving each individual request of the plurality of requests retrieve, while the storage devices storing the plurality of shards for the at least one requested data object are each operational, a subset of the plurality of shards for the at least one requested data object from the storage system, wherein the subset includes at least the particular number of shards;identifying a decoding matrix for the at least one requested data object, wherein the decoding matrix specifies a plurality of computations for reconstructing the at least one requested data object from the subset of the plurality of shards;retrieving a refined version of the identified decoding matrix from a data store of pre-refined decoding matrices, wherein the refined version of the decoding matrix identifies two or more computations of the plurality of computations having common terms and operations according to the decoding matrix;determining one or more precomputed results for the two or more identified computations; andperforming computations according to the refined version of the decoding matrix to reconstruct the at least one requested data object from the subset of the plurality of shards, wherein the one or more precomputed results are used instead of performing the two or more identified computations;return the at least one reconstructed data object to the client. 7. The method of claim 6, wherein, prior to reconstructing the at least one requested data object from the subset of the plurality of shards, the data decoding system is further configured to:generate a plurality of other decoding matrices, wherein the plurality of other decoding matrices correspond to different respective subsets of the plurality of shards of the at least one requested data object, wherein each different respective subset includes at least a sufficient number of shards to reconstruct the at least one requested data object;refine the plurality of other decoding matrices; andstore the refined versions of the plurality of other decoding matrices at the data store of pre-refined coding matrices. 8. The method of claim 6, further comprising, prior to receiving the request for the at least one requested data object of the plurality of data objects: refining the decoding matrix, comprising: identifying the two or more computations of the plurality of computations having common terms and operations according to the decoding matrix; andmodifying the decoding matrix into the refined version of the decoding matrix, wherein the refined version of the decoding matrix indicates the two or more identified computations; andstoring the refined version of the decoding matrix at the data store of pre-refined decoding matrices. 9. The method of claim 8, wherein storing the refined version of the decoding matrix comprises replacing another refined version of a different decoding matrix at the data store of pre-refined decoding matrices with the refined version of the decoding matrix according to a cache eviction algorithm. 10. The method of claim 8, wherein refining the decoding matrix is performed in response to determining to refine the decoding matrix based on a size of the at least one requested data object, an expected number of times the decoding matrix will be used, one or more computation characteristics of the decoding matrix, or any combination thereof. 11. The method of claim 6, further comprising: wherein identifying the decoding matrix for the at least one requested data object further comprises determining which shards of the plurality of shards form the retrieved particular subset of the plurality of shards for the at least one requested data object, and wherein the decoding matrix specifies a plurality of computations for reconstructing the at least one data object from the retrieved subset of the plurality of shards. 12. The method of claim 6, wherein the data store of pre-refined decoding matrices stores a fixed group of pre-refined decoding matrices that includes the refined version of the decoding matrix. 13. The method of claim 6, wherein each computation of the two or more identified computations includes one or more exclusive or (XOR) operations, and wherein determining the one or more precomputed results for the two or more identified computations comprises performing the one or more XOR operations. 14. A non-transitory, computer-readable storage medium storing program instructions that when executed on one or more computers cause the one or more computers to: implement a data decoding system configured to: receive a plurality of requests from one or more clients, wherein each request is for at least one respective data object of a plurality of data objects, wherein each respective data object of the plurality of data objects is stored as a respective plurality of shards, such that the respective data object can be reconstructed from a particular number of shards that is fewer than a total number of shards for that data object, wherein each shard of the total number of shards is stored on a different one of a plurality of storage devices than any other shard of the total number of shards; andin response to each individual request of the plurality of requests: retrieve, while the storage devices storing the plurality of shards for the at least one requested data object are each operational, a subset of the plurality of shards for the at least one requested data object from the storage system, wherein the subset includes at least the particular number of shards;identify a decoding matrix for the at least one requested data object, wherein the decoding matrix specifies a plurality of computations for reconstructing the at least one requested data object from the subset of the plurality of shards;retrieve a refined version of the identified decoding matrix from a data store of pre-refined decoding matrices, wherein the refined version of the decoding matrix identifies two or more computations of the plurality of computations having common terms and operations according to the decoding matrix;determine one or more precomputed results for the two or more identified computations; andperform computations according to the refined version of the decoding matrix to reconstruct the at least one requested data object from the subset of the plurality of shards, wherein the one or more precomputed results are used instead of performing the two or more identified computations;return the at least one reconstructed data object to the client. 15. The non-transitory, computer-readable storage medium of claim 14: wherein identifying the decoding matrix for the at least one requested data object further comprises determining which shards of the plurality of shards form the retrieved subset of the plurality of shards for the at least one requested data object, and wherein the decoding matrix specifies a plurality of computations for reconstructing the at least one data object from the retrieved subset of the plurality of shards. 16. The non-transitory, computer-readable storage medium of claim 15, wherein, prior to reconstructing the at least one requested data object from the subset of the plurality of shards, the data coding system is further configured to: generate the decoding matrix for the at least one requested data object;refine the decoding matrix, comprising: identifying the two or more computations of the plurality of computations having common terms and operations according to the decoding matrix; andmodifying the decoding matrix into the refined version of the decoding matrix, wherein the refined version of the decoding matrix indicates the two or more identified computations; andstore the refined version of the decoding matrix at the data store of pre-refined decoding matrices. 17. The non-transitory, computer-readable storage medium of claim 16, wherein refining the decoding matrix is performed in response to determining to refine the decoding matrix based on a size of the at least one requested data object, an expected number of times the decoding matrix will be used, one or more computation characteristics of the decoding matrix, or any combination thereof. 18. The non-transitory, computer-readable storage medium of claim 17, further comprising determining a refinement complexity for the decoding matrix based on the size of the at least one requested data object, the expected number of times the decoding matrix will be used, the one or more computation characteristics of the decoding matrix, or any combination thereof, wherein refining the decoding matrix is limited according to the refinement complexity. 19. The non-transitory, computer-readable storage medium of claim 18, wherein the refinement complexity pertains to a thoroughness of refinement comprising a refinement algorithm used to refine the decoding matrix, an amount of time spent refining the decoding matrix, a number of computations of the two or more computations to identify, or any combination thereof. 20. The non-transitory, computer-readable storage medium of claim 16, wherein, prior to reconstructing the at least one requested data object from the subset of the plurality of shards, the data decoding system is further configured to: generate a plurality of other decoding matrices, wherein the plurality of other decoding matrices correspond to different respective subsets of the plurality of shards of the at least one requested data object, wherein each different respective subset includes at least a sufficient number of shards to reconstruct the at least one requested data object;refine the plurality of other decoding matrices; andstore the refined versions of the plurality of other decoding matrices at the data store of pre-refined coding matrices. 21. The non-transitory, computer-readable storage medium of claim 20, wherein the plurality of other decoding matrices and the decoding matrix together represent a complete set of combinations of the plurality of shards of the at least one requested data object for reconstructing the at least one requested data object. 22. The non-transitory, computer-readable storage medium of claim 20, wherein, after receiving the request for the at least one requested data object of the plurality of data objects, the data decoding system is further configured to, prior to retrieving the subset of the plurality of shards for the at least one requested data object, requesting a group of shards of the plurality of shards that is fewer than the total number of shards of the plurality of shards,wherein the group of shards includes all shards of the subset and all shards of each respective subset corresponding to each respective other decoding matrix.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (4)
Dayal, Shobhit; Lee, Edward K.; Gritter, Mark G., Avoiding long access latencies in redundant storage systems.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.