IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
UP-0370353
(2006-03-08)
|
등록번호 |
US-7647329
(2010-02-22)
|
발명자
/ 주소 |
- Fischman, Ami K.
- Vermeulen, Allan H.
|
출원인 / 주소 |
- Amazon Technologies, Inc.
|
대리인 / 주소 |
|
인용정보 |
피인용 횟수 :
124 인용 특허 :
14 |
초록
▼
A keymap service architecture for a distributed storage system. A system may implement storage nodes configured to store replicas of data objects, where each of the replicas is accessible via a respective locator value, and keymap instances each configured to store keymap entries corresponding respe
A keymap service architecture for a distributed storage system. A system may implement storage nodes configured to store replicas of data objects, where each of the replicas is accessible via a respective locator value, and keymap instances each configured to store keymap entries corresponding respectively to the data objects. A given keymap entry may indicate a mapping from a given key value corresponding to a given data object to each respective locator value of its replicas. Each of the keymap instances may store a replica of the given keymap entry and may index its respective stored keymap entries within a respective index data structure including hierarchically arranged index nodes corresponding to keymap entries. For a given keymap entry having a given corresponding index node, each tag value associated with each ancestor of the given corresponding index node may be a prefix of the given key value.
대표청구항
▼
What is claimed is: 1. A system, comprising: a plurality of computing nodes configured to implement: a plurality of storage nodes configured to store one or more replicas of each of a plurality of data objects, wherein each of said one or more replicas is accessible via a unique respective locator
What is claimed is: 1. A system, comprising: a plurality of computing nodes configured to implement: a plurality of storage nodes configured to store one or more replicas of each of a plurality of data objects, wherein each of said one or more replicas is accessible via a unique respective locator value, each node of the computing nodes comprising a processor and a memory; and a plurality of keymap instances each configured to store keymap entries corresponding respectively to said data objects, wherein a given one of said keymap entries corresponding to a given one of said plurality of data objects indicates a mapping from a given key value corresponding to said given data object to each said respective locator value of said one or more replicas of said given data object; wherein for said given data object, each of said plurality of keymap instances is configured to store a replica of said given keymap entry, such that said given keymap entry is replicated across said plurality of keymap instances; wherein each of said keymap instances is further configured to index its respective stored keymap entries within a respective index data structure; wherein each of said index data structures includes a respective plurality of index nodes arranged hierarchically and each index node having an associated tag value, wherein for each of said stored keymap entries, there exists a respective, uniquely corresponding one of said index nodes such that at most one keymap entry corresponds to any given one of said index nodes, and wherein at least some of the index data structures are unbalanced; wherein for each given one of the index nodes, each tag value associated with each ancestor of said given index node is a prefix of the key value that is associated with the given index node; and wherein a given one of said plurality of keymap instances selects a different one of said plurality of keymap instances at regular or irregular intervals and to at least partially synchronize said respective index data structures of said given keymap instance and said different keymap instance. 2. The system as recited in claim 1, wherein said plurality of computing nodes is further configured to implement a web services interface configured to present a data storage web service to clients, wherein said data storage web service includes one or more web services endpoints; wherein each web services endpoint is configured to implement a corresponding web services application programming interface (API) defining data storage web service operations that are available to ones of said clients via web services calls, and is addressable to receive said web services calls, formatted according to said web services API to specify one or more of said data storage web service operations, from ones of said clients according to an Internet-based application layer data transport protocol; wherein in response to receiving web services calls formatted according to said web services API to store data objects, each said web services endpoint is further configured to store data objects supplied by said clients within ones of said plurality of storage nodes; wherein each said web services endpoint is further configured to receive web services calls formatted according to said web services API that are indicative of client requests for access to ones of said data objects, wherein a given one of said client requests for access to a given one of said data objects includes a key value corresponding to said given data object. 3. The system as recited in claim 1, wherein each of said plurality of keymap instances includes a respective plurality of host systems, and wherein for said given data object, each of said plurality of keymap instances is further configured to store a plurality of replicas of said given keymap entry distributed among said respective plurality of host systems. 4. The system as recited in claim 3, wherein a given one of said plurality of keymap instances is configured to receive a keymap entry put operation including said given key value and one or more keymap entry parameters, and wherein in response to receiving said keymap entry put operation, said given keymap instance is further configured to generate update operations to update each of said plurality of replicas of said given keymap entry according to said one or more keymap entry parameters. 5. The system as recited in claim 4, wherein said given keymap instance is further configured to indicate that said keymap entry put operation is complete in response to determining that at least a quorum number of said update operations have completed with respect to said plurality of replicas of said given keymap entry. 6. The system as recited in claim 3, wherein a given one of said plurality of keymap instances is configured to receive a keymap entry get operation including said given key value, and wherein in response to receiving said keymap entry get operation, said given keymap instance is further configured to generate retrieval operations to retrieve each of said plurality of replicas of said given keymap entry. 7. The system as recited in claim 6, wherein said given keymap instance is further configured to: determine whether at least a quorum number of said retrieval operations have completed with respect to said plurality of replicas of said given keymap entry; if at least said quorum number of said retrieval operations have completed, determine whether said retrieved replicas differ; and if said retrieved replicas do not differ, return one of said retrieved replicas and indicate that said keymap entry get operation is complete. 8. The system as recited in claim 7, wherein in response to determining that said retrieved replicas differ, said given keymap instance is further configured to: select a particular one of said retrieved replicas according to a selection criterion; initiate a keymap entry put operation specifying said given key value and said particular retrieved replica; and return said particular retrieved replica and indicate that said keymap entry get operation is complete. 9. The system as recited in claim 3, wherein a given one of said keymap instances is configured to, in response to detecting an update operation directed to said given keymap instance, forward said update operation to one or more other ones of said keymap instances according to a forwarding protocol. 10. The system as recited in claim 9, wherein said forwarding protocol does not guarantee that said update operation will be received by each of said one or more corresponding host systems associated with said given host system. 11. The system as recited in claim 1, wherein said given one of said plurality of keymap instances is further configured to select said different keymap instance at random and to at least partially synchronize said respective index data structures of said given keymap instance and said different keymap instance according to an anti-entropy synchronization protocol. 12. The system as recited in claim 1, wherein for a given one of said index data structures, each of said plurality of index nodes included within said given index data structure is associated with a respective count value, wherein for a given one of said index nodes, said respective count value is indicative of a number of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 13. The system as recited in claim 1, wherein for a given one of said index data structures, each of said plurality of index nodes included within said given index data structure is associated with a respective fingerprint value, wherein for a given one of said index nodes, said respective fingerprint value is indicative of a hash value computed according to a hash algorithm, wherein said hash value is computed as a function of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 14. The system as recited in claim 1, wherein a given one of said index data structures is stratified among a plurality of index tree data structures, wherein a root node of said given index data structure corresponds to a root node of a first one of said plurality of index tree data structures, and wherein each of a plurality of leaf nodes of said first index tree data structure corresponds to a root node of a respective one of said index tree data structures. 15. The system as recited in claim 14, wherein at least two of said plurality of index tree data structures are stored on different ones of said plurality of computing nodes. 16. The system as recited in claim 1, wherein for a given one of said index nodes having a first and a second child node, a number of descendants of said first child node does not equal a number of descendants of said second child node. 17. A method, comprising: one or more computer systems performing: storing one or more replicas of each of a plurality of data objects, wherein each of said one or more replicas is accessible via a unique respective locator value; storing respective replicas, within each of a plurality of keymap instances, of keymap entries corresponding respectively to said data objects, such that each of said keymap entries is replicated across said plurality of keymap instances, wherein a given one of said keymap entries corresponding to a given one of said plurality of data objects indicates a mapping from a given key value corresponding to said given data object to each said respective locator value of said one or more replicas of said given data object; indexing stored keymap entries of each of said keymap instances within a respective index data structure, wherein at least some of index data structures are unbalanced; and a given one of said plurality of keymap instances selecting a different one of said plurality of keymap instances at regular or irregular intervals and at least partially synchronizing said respective index data structures of said given keymap instance and said different keymap instance; wherein each of said index data structures includes a respective plurality of index nodes arranged hierarchically and each index node having an associated tag value, wherein for each of said stored keymap entries, there exists a respective, uniquely corresponding one of said index nodes such that at most one keymap entry corresponds to any given one of said index nodes; and wherein for each given one of the index nodes, each tag value associated with each ancestor of said given index node is a prefix of the key value that is associated with the given index node. 18. The method as recited in claim 17, further comprising: presenting a data storage web service to clients, wherein said data storage web service includes one or more web services endpoints; wherein each web services endpoint implements a corresponding web services application programming interface (API) defining data storage web service operations that are available to ones of said clients via web services calls, and is addressable to receive said web services calls, formatted according to said web services API to specify one or more of said data storage web service operations, according to an Internet-based application layer data transport protocol; and wherein in response to receiving web services calls formatted according to said web services API to store data objects, each said web services endpoint is configured to store data objects supplied by said clients within said data storage web service; receiving web services calls formatted according to said web services API that are indicative of client requests for access to ones of said data objects via said one or more web services endpoints according to said Internet-based application layer data transport protocol, wherein a given one of said client requests for access to a given one of said data objects includes a key value corresponding to said given data object. 19. The method as recited in claim 17, wherein each of said plurality of keymap instances includes a respective plurality of host systems, and wherein for said given data object, the method further comprises each of said plurality of keymap instances storing a plurality of replicas of said given keymap entry distributed among said respective plurality of host systems. 20. The method as recited in claim 19, further comprising: a given one of said plurality of keymap instances receiving a keymap entry put operation including said given key value and one or more keymap entry parameters; and in response to receiving said keymap entry put operation, said given keymap instance generating update operations to update each of said plurality of replicas of said given keymap entry according to said one or more keymap entry parameters. 21. The method as recited in claim 20, further comprising said given keymap instance indicating that said keymap entry put operation is complete in response to determining that at least a quorum number of said update operations have completed with respect to said plurality of replicas of said given keymap entry. 22. The method as recited in claim 19, further comprising: a given one of said plurality of keymap instances receiving a keymap entry get operation including said given key value; and in response to receiving said keymap entry get operation, said given keymap instance generating retrieval operations to retrieve each of said plurality of replicas of said given keymap entry. 23. The method as recited in claim 22, further comprising said given keymap instance: determining whether at least a quorum number of said retrieval operations have completed with respect to said plurality of replicas of said given keymap entry; if at least said quorum number of said retrieval operations have completed, determining whether said retrieved replicas differ; and if said retrieved replicas do not differ, returning one of said retrieved replicas and indicating that said keymap entry get operation is complete. 24. The method as recited in claim 23, further comprising: in response to determining that said retrieved replicas differ, said given keymap instance selecting a particular one of said retrieved replicas according to a selection criterion; said given keymap instance initiating a keymap entry put operation specifying said given key value and said particular retrieved replica; and said given keymap instance returning said particular retrieved replica and indicating that said keymap entry get operation is complete. 25. The method as recited in claim 19, further comprising a given one of said keymap instances, in response to detecting an update operation directed to said given keymap instance, forwarding said update operation to one or more other ones of said keymap instances according to a forwarding protocol. 26. The method as recited in claim 25, wherein said forwarding protocol does not guarantee that said update operation will be received by each of said one or more corresponding host systems associated with said given host system. 27. The method as recited in claim 17, wherein said given one of said plurality of keymap instances selecting said different keymap instance occurs at random and wherein said at least partially synchronizing said respective index data structures of said given keymap instance and said different keymap instance is performed according to an anti-entropy synchronization protocol. 28. The method as recited in claim 17, wherein for a given one of said index data structures, each of said plurality of index nodes included within said given index data structure is associated with a respective count value, wherein for a given one of said index nodes, said respective count value is indicative of a number of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 29. The method as recited in claim 17, wherein for a given one of said index data structures, each of said plurality of index nodes included within said given index data structure is associated with a respective fingerprint value, wherein for a given one of said index nodes, said respective fingerprint value is indicative of a hash value computed according to a hash algorithm, wherein said hash value is computed as a function of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 30. The method as recited in claim 17, wherein a given one of said index data structures is stratified among a plurality of index tree data structures, wherein a root node of said given index data structure corresponds to a root node of a first one of said plurality of index tree data structures, and wherein each of a plurality of leaf nodes of said first index tree data structure corresponds to a root node of a respective one of said index tree data structures. 31. The method as recited in claim 30, further comprising storing at least two of said plurality of index tree data structures on different ones of a plurality of computing nodes. 32. The method as recited in claim 17, wherein for a given one of said index nodes having a first and a second child node, a number of descendants of said first child node does not equal a number of descendants of said second child node. 33. A computer-accessible storage medium storing instructions, wherein the instructions are executable to: store, within a given one of a plurality of keymap instances, keymap entries corresponding respectively to a plurality of data objects, wherein each of said plurality of data objects corresponds to one or more replicas, wherein each of said one or more replicas is accessible via a unique respective locator value, and wherein a given one of said keymap entries corresponding to a given one of said plurality of data objects indicates a mapping from a given key value corresponding to said given data object to each said respective locator value of said one or more replicas of said given data object; replicate said keymap entries across said plurality of keymap instances; index stored keymap entries of said given keymap instance within an unbalanced index data structure; and select a different one of said plurality of keymap instances at regular or irregular intervals and at least partially synchronize said index data structure of said given keymap instance with an index data structure of said different keymap instance; wherein said index data structure includes a plurality of index nodes arranged hierarchically and each index node having an associated tag value, wherein for each of said stored keymap entries, there exists a respective, uniquely corresponding one of said index nodes such that at most one keymap entry corresponds to any given one of said index nodes; and wherein for each given one of the index nodes, each tag value associated with each ancestor of said given index node is a prefix of the key value that is associated with the given index node. 34. The computer-accessible storage medium as recited in claim 33, wherein the instructions are further executable to implement a web services interface configured to present a data storage web service to clients, wherein said data storage web service includes one or more web services endpoints; wherein each web services endpoint is configured to implement a corresponding web services application programming interface (API) defining data storage web service operations that are available to ones of said clients via web services calls, and is addressable to receive said web services calls, formatted according to said web services API to specify one or more of said data storage web service operations, from ones of said clients according to an Internet-based application layer data transport protocol; wherein in response to receiving web services calls formatted according to said web services API to store data objects, each said web services endpoint is further configured to store data objects supplied by said clients within the data storage web service; wherein each said web services endpoint is further configured to receive web services calls formatted according to said web services API that are indicative of client requests for access to ones of said data objects, wherein a given one of said client requests for access to a given one of said data objects includes a key value corresponding to said given data object. 35. The computer-accessible storage medium as recited in claim 33, wherein for said given data object, the instructions are further executable to instruct that a plurality of replicas of said given keymap entry be stored in a distributed fashion among a plurality of host systems included within said given keymap instance. 36. The computer-accessible storage medium as recited in claim 35, wherein the instructions are further executable to: receive a keymap entry put operation including said given key value and one or more keymap entry parameters; and in response to receiving said keymap entry put operation, generate update operations to update each of said plurality of replicas of said given keymap entry according to said one or more keymap entry parameters. 37. The computer-accessible storage medium as recited in claim 36, wherein the instructions are further executable to indicate that said keymap entry put operation is complete in response to determining that at least a quorum number of said update operations have completed with respect to said plurality of replicas of said given keymap entry. 38. The computer-accessible storage medium as recited in claim 35, wherein the instructions are further executable to: receive a keymap entry get operation including said given key value; and in response to receiving said keymap entry get operation, generate retrieval operations to retrieve each of said plurality of replicas of said given keymap entry. 39. The computer-accessible storage medium as recited in claim 38, wherein the instructions are further executable to: determine whether at least a quorum number of said retrieval operations have completed with respect to said plurality of replicas of said given keymap entry; if at least said quorum number of said retrieval operations have completed, determine whether said retrieved replicas differ; and if said retrieved replicas do not differ, return one of said retrieved replicas and indicate that said keymap entry get operation is complete. 40. The computer-accessible storage medium as recited in claim 39, wherein the instructions are further executable to: in response to determining that said retrieved replicas differ, select a particular one of said retrieved replicas according to a selection criterion; initiate a keymap entry put operation specifying said given key value and said particular retrieved replica; and return said particular retrieved replica and indicate that said keymap entry get operation is complete. 41. The computer-accessible storage medium as recited in claim 35, wherein said instructions are further executable to instruct said given one of said keymap instances to, in response to detecting an update operation directed to said given keymap instance, forward said update operation to one or more other ones of said plurality of keymap instances according to a forwarding protocol. 42. The computer-accessible storage medium as recited in claim 41, wherein said forwarding protocol does not guarantee that said update operation will be received by each of said one or more corresponding host systems associated with said given host system. 43. The computer-accessible storage medium as recited in claim 33, wherein the instructions are further executable to select said different keymap instance at random and at least partially synchronize said index data structure of said given keymap instance with an index data structure of said different keymap instance according to an anti-entropy synchronization protocol. 44. The computer-accessible storage medium as recited in claim 33, wherein each of said plurality of index nodes included within said index data structure is associated with a respective count value, wherein for a given one of said index nodes, said respective count value is indicative of a number of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 45. The computer-accessible storage medium as recited in claim 33, wherein each of said plurality of index nodes included within said index data structure is associated with a respective fingerprint value, wherein for a given one of said index nodes, said respective fingerprint value is indicative of a hash value computed according to a hash algorithm, wherein said hash value is computed as a function of ones of said plurality of index nodes that are hierarchical descendants of said given index node. 46. The computer-accessible storage medium as recited in claim 33, wherein said index data structure is stratified among a plurality of index tree data structures, wherein a root node of said index data structure corresponds to a root node of a first one of said plurality of index tree data structures, and wherein each of a plurality of leaf nodes of said first index tree data structure corresponds to a root node of a respective one of said index tree data structures. 47. The computer-accessible storage medium as recited in claim 46, wherein said instructions are further executable to instruct that at least two of said plurality of index tree data structures be stored on different ones of a plurality of computing nodes. 48. The computer-accessible storage medium as recited in claim 33, wherein for a given one of said index nodes having a first and a second child node, a number of descendants of said first child node does not equal a number of descendants of said second child node. 49. A computer-implemented keymap instance, comprising: a plurality of host computers configured to store keymap entries corresponding respectively to a plurality of data objects, wherein each of said plurality of data objects corresponds to one or more replicas, wherein each of said one or more replicas is accessible via a unique respective locator value, wherein a given one of said keymap entries corresponding to a given one of said plurality of data objects indicates a mapping from a given key value corresponding to said given data object to each said respective locator value of said one or more replicas of said given data object, and wherein said host computers are further configured to store a plurality of replicas of said given keymap entry in a distributed fashion among said host computers, each host computer of the host computers comprising a processor and a memory; a computer-implemented keymap coordinator configured to index stored keymap entries of said host computers within an unbalanced index data structure; wherein said keymap coordinator is configured to select a different keymap instance at regular or irregular intervals and to at least partially synchronize said unbalanced index data structure with an index data structure of said different keymap instance; wherein said index data structure includes a plurality of index nodes arranged hierarchically and each index node having an associated tag value, wherein for each of said stored keymap entries, there exists a respective, uniquely corresponding single one of said index nodes such that at most one keymap entry corresponds to any given one of said index nodes; and wherein for each given one of the index nodes, each tag value associated with each ancestor of said given index node is a prefix of the key value that is associated with the given index node.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.