IPC Classification Information
Country/Type: United States (US) Patent, Granted
Application Number: US-0316680 (2014-06-26)
Registration Number: US-9529882 (2016-12-27)
Applicant / Address: Amazon Technologies, Inc.
Citation Information: cited by 0 patents; cites 10 patents
Abstract
A target commit sequence number (CSN) to be used to synchronize state information pertaining to an application among nodes of a state replication group (SRG) prior to a suspension of the SRG's operations is identified. Each node stores a respective commit record set of the application. Some number of SRG nodes suspend operations after synchronizing their local commit records up to the CSN. A configuration manager of the SRG verifies that, subsequent to a suspension of operations at the nodes, at least a threshold number of the nodes are available for service and have updated their commit record sets. The configuration manager then re-activates the SRG.
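The abstract describes a three-phase protocol: fix a target CSN, have nodes synchronize up to it and suspend, then reactivate once enough nodes are current. The Python sketch below is a minimal, single-process illustration under stated assumptions, not the patented implementation. The names `Node`, `ConfigurationManager`, `handle_suspend` and `try_reactivate` are hypothetical. Messages are plain method calls, and neither the configuration manager's fault tolerance nor its persistent storage of the target CSN is modeled. As in claims 1 and 8, the target CSN is the committer's highest CSN. The reactivation check follows the abstract's "at least a threshold number" wording, whereas claims 5, 13 and 16 say "exceeds a threshold".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical SRG node holding a commit record set keyed by CSN."""
    name: str
    commit_records: dict[int, str] = field(default_factory=dict)
    suspended: bool = False
    available: bool = True

    def highest_csn(self) -> int:
        # HCSN: the largest CSN in this node's commit record set (0 if empty).
        return max(self.commit_records, default=0)

    def handle_suspend(self, target_csn: int, source: Node) -> bool:
        """Pause processing, catch up to target_csn if needed, then verify."""
        self.suspended = True
        if target_csn not in self.commit_records:
            # Missing records are copied from another node (claims 3 and 9).
            for csn, record in source.commit_records.items():
                if csn <= target_csn:
                    self.commit_records.setdefault(csn, record)
        return target_csn in self.commit_records


class ConfigurationManager:
    """Hypothetical configuration manager; its own fault tolerance is not modeled."""

    def __init__(self, nodes: list[Node], threshold: int) -> None:
        self.nodes = nodes
        self.threshold = threshold
        self.target_csn: int | None = None  # in-memory stand-in for the persisted CSN
        self.up_to_date: set[str] = set()   # collection of up-to-date nodes

    def suspend(self, committer: Node) -> None:
        # The committer's HCSN becomes the target CSN (claims 1 and 8).
        self.target_csn = committer.highest_csn()
        committer.suspended = True
        self.up_to_date.add(committer.name)
        for node in self.nodes:
            if node is committer or not node.available:
                continue
            if node.handle_suspend(self.target_csn, committer):
                self.up_to_date.add(node.name)

    def try_reactivate(self) -> bool:
        # Resume only if at least `threshold` available nodes hold the target CSN.
        ready = [n for n in self.nodes
                 if n.available and self.target_csn in n.commit_records]
        if len(ready) < self.threshold:
            return False
        for node in ready:
            node.suspended = False
        return True


committer = Node("committer", {1: "a", 2: "b", 3: "c"})
lagging = Node("n2", {1: "a", 2: "b"})
offline = Node("n3", {1: "a"}, available=False)
cm = ConfigurationManager([committer, lagging, offline], threshold=2)
cm.suspend(committer)
print(cm.target_csn, sorted(lagging.commit_records))  # 3 [1, 2, 3]
print(cm.try_reactivate())  # True
```

Here the lagging node catches up by copying records from the committer, which mirrors claim 3; claim 9 allows any other SRG node to serve as the source. The offline node is simply excluded from the reactivation count, which is why a threshold of two is still met.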
Representative Claims
1. A system, comprising: one or more computing devices configured to: instantiate a state replication group (SRG) comprising a plurality of nodes to replicate state information of a particular application, wherein at least some nodes of the SRG store a respective commit record set of the application, wherein each commit record of a commit record set comprises a commit sequence number (CSN) indicative of an order in which a state transition of the application was committed relative to other state transitions, and wherein the SRG comprises a committer node configured to commit requested state transitions; determine, at the committer node, in response to a detection that a threshold condition has been met, that state transition processing operations of the SRG are to be suspended; transmit, from the committer node to a fault-tolerant configuration manager of the SRG, a suspend request indicating a highest commit sequence number (HCSN) among the CSNs of the commit record set stored at the committer node; transmit a respective suspend command from the configuration manager to one or more other nodes of the SRG including a second node, wherein the suspend command indicates the HCSN; pause, in response to receiving a suspend command from the configuration manager, state transition processing operations at the second node; verify, by the second node, that the second node's commit record set includes a commit record with the HCSN; and defer, by the committer node and the second node, further processing of state transition operations until a reactivation message is received from the configuration manager.

2. The system as recited in claim 1, wherein the detection that the threshold condition has been met comprises a determination that a metric is outside an acceptable range, wherein the metric comprises one or more of: (a) a number of active nodes of the SRG, (b) a rate of SRG configuration-delta messages received from the fault-tolerant configuration manager at a selected node of the SRG, or (c) a number of client connections to a selected node of the SRG.

3. The system as recited in claim 1, wherein the one or more computing devices are further configured to: determine, by the second node, in response to receiving the suspend command from the fault-tolerant configuration manager, that the commit record set at the second node does not include a commit record comprising HCSN; and request, by the second node from the committer node, one or more commit records including a commit record comprising the HCSN.

4. The system as recited in claim 1, wherein the second node comprises a first thread of execution at a particular host wherein the one or more computing devices are further configured to: restart the first thread of execution after verifying that the commit record comprising the HCSN is stored in the second node's commit record set.

5. The system as recited in claim 4, wherein the one or more computing devices are further configured to: determine, by the fault-tolerant configuration manager after the suspend command has been sent, that a number of available SRG nodes whose commit records have been updated up to the HCSN exceeds a threshold; and transmit, by the fault-tolerant configuration manager to each node whose commit record set has been updated up to the HCSN, a respective re-activation request including a representation of a targeted configuration of the SRG.

6. A method, comprising: performing, by one or more computing devices: determining that state transition processing operations of a state replication group (SRG) comprising a plurality of nodes are to be suspended, wherein the SRG is designated to replicate state information comprising a respective commit record set of an application, wherein each commit record of the commit record set has an associated commit sequence number (CSN) indicative of an order in which the corresponding state transition of the application was committed at the SRG; identifying a target CSN up to which commit record sets of one or more nodes of the SRG are to be synchronized; transmitting a respective suspend command from a configuration manager of the SRG to the one or more other nodes of the SRG, wherein the suspend command indicates the target CSN; verifying, by a particular node of the one or more other nodes, that a commit record corresponding to the target CSN is stored in the particular node's commit record set; and suspending state transition processing operations by the particular node.

7. The method as recited in claim 6, wherein said determining that the state transition processing operations of the SRG are to be suspended is responsive to a detection that a metric is outside an acceptable range, wherein the metric comprises one or more of: (a) a number of active nodes of the SRG, (b) a rate of SRG configuration-delta messages received from the configuration manager at a selected node of the SRG, or (c) a number of client connections to a selected node of the SRG.

8. The method as recited in claim 6, wherein said determining that the state transition processing operations of the SRG are to be suspended is performed at a committer node of the SRG, wherein the committer node is responsible for committing one or more requested state transitions of the application, and wherein the target CSN is the highest CSN among the CSNs of the commit record set of the committer node.

9. The method as recited in claim 6, further comprising performing, by the one or more computing devices prior to said verifying: determining, by the particular node, in response to receiving a suspend command from the configuration manager, that the commit record set at the second node does not include a commit record corresponding to the target CSN; and requesting, by the particular node from a different node of the SRG, one or more commit records including a commit record corresponding to the target CSN.

10. The method as recited in claim 6, wherein the particular node comprises a first thread of execution at a particular host, further comprising performing, by the one or more computing devices: restarting, subsequent to said suspending, the first thread of execution.

11. The method as recited in claim 6, further comprising performing, by the one or more computing devices: receiving, at the configuration manager from the particular node subsequent to said verifying, a confirmation that the particular node has updated its commit record set up to the target CSN; and including, by the configuration manager in a collection of up-to-date nodes of the SRG, the particular node.

12. The method as recited in claim 11, further comprising performing, by the one or more computing devices: receiving, at the configuration manager subsequent to said suspending, respective messages from a second plurality of nodes of the SRG indicating that the respective nodes are available for service, wherein the second plurality of nodes includes a committer node, the particular node, and a third node; determining, by the configuration manager using the collection of up-to-date nodes, that the third node's commit record set does not include a commit record corresponding to the target CSN; and transmitting, by the configuration manager to the third node, an indication of the target CSN.

13. The method as recited in claim 12, further comprising performing, by the one or more computing devices: receiving, at the configuration manager, a confirmation that the third node's commit record set has been updated up to the target CSN; determining, by the configuration manager that a number of available SRG nodes whose commit records have been updated up to the target CSN exceeds a threshold; and transmitting, by the configuration manager to each node whose commit record sets have been updated up to the target CSN, a respective re-activation request including a representation of a targeted configuration of the SRG.

14. The method as recited in claim 6, wherein the plurality of nodes of the SRG comprise a directed acyclic graph that includes a replication pathway from an acceptor node to a committer node, further comprising performing, by the one or more computing devices: receiving, at the acceptor node prior to said determining that the state transition processing operations are to be suspended, a request from a client to commit a particular state transition of the application; storing, at the acceptor node, a record indicating that the particular state transition has been accepted for replication; propagating, from the second node via the replication pathway to the committer node, the request to commit the particular state transition; determining, by the committer node, that a number of nodes of the SRG at which a respective record indicative of the particular state transition has been stored is above a replication threshold, and storing, at the committer node, a commit record corresponding to the particular state transition.

15. The method as recited in claim 6, wherein the application comprises one of: a database service, a logging service, or a control-plane component of a provider network service.

16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors: determine a target commit sequence number (CSN) to be used to synchronize state information pertaining to a particular application among a plurality of nodes of a state replication group (SRG) prior to a suspension of application state transition processing operations at the SRG, wherein the plurality of nodes includes a first node and a second node, wherein each node of the first and second nodes stores a respective commit record set of the particular application, and wherein each commit record of the set has an associated respective CSN indicative of an order in which the corresponding state transition was committed at the SRG; store, by a configuration manager of the SRG at a persistent storage device, the target CSN; transmit, from the configuration manager to at least one node of the first node and the second node, a respective suspend command indicating the target CSN; and in response to an indication received at the configuration manager that, subsequent to a respective suspension of operations at the first node and the second node, the first node and the second node are available for resumption of operations, verify that a number of available nodes of the SRG whose commit record sets include a commit record corresponding to the target CSN exceeds a threshold; and transmit a re-activation message to at least a subset of available nodes whose commit record sets include a commit record corresponding to the target CSN.

17. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the instructions when executed at the one or more processors: receive a request to suspend the application state transition processing operations from the first node, wherein the first node is a committer node responsible for committing a requested state transition, and wherein the request comprises the target CSN.

18. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the instructions when executed at the one or more processors: determine to suspend the application state transition processing operations in response to a detection that a metric is outside an acceptable range, wherein the metric comprises one or more of: (a) a number of active nodes of the SRG, (b) a rate of SRG configuration-delta messages received from the configuration manager at a selected node of the SRG, or (c) a number of client connections to a selected node of the SRG.

19. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the plurality of nodes of the SRG comprise a directed acyclic graph.

20. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the instructions when executed at the one or more processors utilize a consensus protocol to determine that the state transition processing operations of the SRG are to be suspended.
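Claim 14 describes the normal commit path that precedes any suspension. A client's request enters at an acceptor node and travels along a replication pathway of the DAG. The committer node commits it only when the number of nodes holding a record of it is above a replication threshold. Claims 2, 7 and 18 list the metrics whose out-of-range values can trigger a suspension. The Python sketch below illustrates both ideas under simplifying assumptions. The pathway is a plain list rather than a general DAG. Propagation is not modeled hop by hop: an unavailable node is simply skipped. CSNs are consecutive integers. All names and parameters (`PathNode`, `propagate_and_commit`, `should_suspend`, `min_active`, `max_delta_rate`, `max_connections`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PathNode:
    """Hypothetical SRG node on a replication pathway."""
    name: str
    available: bool = True
    accepted: set[str] = field(default_factory=set)  # transitions recorded here


def propagate_and_commit(pathway: list[PathNode], transition: str,
                         replication_threshold: int,
                         commit_log: dict[int, str]) -> int | None:
    """Send a transition from the acceptor (pathway[0]) toward the committer
    (pathway[-1]); return the new CSN, or None if it was not committed."""
    # Each available node on the path stores a record of the transition.
    for node in pathway:
        if node.available:
            node.accepted.add(transition)
    copies = sum(transition in node.accepted for node in pathway)
    # Claim 14 commits only when the count is *above* the threshold.
    if not pathway[-1].available or copies <= replication_threshold:
        return None
    csn = max(commit_log, default=0) + 1  # CSNs record commit order
    commit_log[csn] = transition
    return csn


def should_suspend(active_nodes: int, config_delta_rate: float,
                   client_connections: int, *, min_active: int,
                   max_delta_rate: float, max_connections: int) -> bool:
    """Threshold condition of claims 2, 7 and 18: any metric out of range."""
    return (active_nodes < min_active
            or config_delta_rate > max_delta_rate
            or client_connections > max_connections)


log: dict[int, str] = {}
path = [PathNode("acceptor"), PathNode("intermediate"), PathNode("committer")]
print(propagate_and_commit(path, "set x=1", replication_threshold=2, commit_log=log))  # 1
path[1].available = False
print(propagate_and_commit(path, "set x=2", replication_threshold=2, commit_log=log))  # None
print(should_suspend(2, 0.5, 10, min_active=3, max_delta_rate=1.0, max_connections=100))  # True
```

With the intermediate node down, only two nodes record the second transition, which is not above the threshold of two, so the committer withholds the commit record. In the claimed system a drop in active nodes like this is also one of the metrics that could lead the committer to request a suspension.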