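IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0858418 (2004-06-01)
Registration number | US-7478263 (2009-01-13)
Inventors / Address | Kownacki, Ronald William; Bertschi, Jason S.
Applicant / Address |
Agent / Address |
Citation information | Times cited: 56; Cited patents: 16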
Abstract
A system and method for permitting bi-directional failover in two node clusters utilizing quorum-based data replication. In response to detecting an error in its partner the surviving node establishes itself as the primary of the cluster and sets a first persistent state in its local unit. A temporary epsilon value for quorum voting purposes is then assigned to the surviving node, which causes it to be in quorum. A second persistent state is stored in the local unit and the surviving node comes online as a result of being in quorum.
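The quorum arithmetic behind the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Node` class, the `in_quorum` function, and the numeric `EPSILON` weight of 0.5 are assumptions made for illustration, since the record only says that epsilon gives greater voting weight to the node that holds it.

```python
from dataclasses import dataclass

# Assumed fractional tie-breaker weight; the patent record gives no numeric value.
EPSILON = 0.5


@dataclass
class Node:
    name: str
    healthy: bool = True
    has_epsilon: bool = False


def in_quorum(nodes):
    """Return True when the healthy nodes hold a strict majority of the votes.

    Every node carries one vote. A healthy node that holds epsilon adds a
    fractional extra weight, which breaks the tie in a two node cluster.
    """
    weight = sum(1 + (EPSILON if n.has_epsilon else 0) for n in nodes if n.healthy)
    return weight > len(nodes) / 2


a, b = Node("A"), Node("B")
both_up = in_quorum([a, b])          # 2 votes against a threshold of 1
b.healthy = False
survivor_alone = in_quorum([a, b])   # 1 vote is not a strict majority of 2
a.has_epsilon = True
survivor_eps = in_quorum([a, b])     # 1.5 votes exceed the threshold of 1
```

Under these assumptions a lone survivor in a two node cluster falls out of quorum until it receives the temporary epsilon, which matches the sequence the abstract describes.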
Representative Claims
What is claimed is:
1. A method for providing bi-directional failover for data replication services in a two node cluster, comprising: detecting a failure of one of the nodes; and in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein in the high availability state a single node is designated as a stand alone node that is a full read/write replica for the data replication services of the cluster, thereby enabling management services reliant on updates for replicated data to function normally, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
2. The method of claim 1 wherein the failure in one of the nodes comprises a failure in communication between the nodes.
3. The method of claim 1 wherein the replicated data/services comprise a VFS location database.
4. The method of claim 1 wherein the replicated data/services comprises a management framework.
5. The method of claim 1 wherein the replicated data/services comprises a high availability manager.
6. The method of claim 1, wherein each node is healthy when the node is active and responding to one or more client requests.
7. The method of claim 1, wherein the epsilon value gives greater weight in voting to the node assigned the epsilon value.
8. A method for providing a bi-directional failover in a cluster comprising a first node and a second node, comprising: providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value; detecting, by the first node, an error condition on the second node; setting, by the first node, a local cached activity lock identifying the first node as active in the cluster; setting a first persistent state in a local unit of the first node; assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value; and setting a second persistent state in the local unit of the first node, wherein the second persistent states is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster.
9. The method of claim 8 wherein the step of detecting the error condition further comprises: detecting a lack of a heartbeat signal from the second node.
10. The method of claim 8 wherein the first persistent state comprises a HA_PREACTIVE state.
11. The method of claim 8 wherein the local unit comprises a storage device.
12. The method of claim 8 further comprising: detecting, by the first node, the post-failure presence of the second node; performing a resynchronization routine between the first and second nodes; removing the temporary epsilon value from the first node; clearing the local cached activity lock from the local unit of the first node; clearing an activity lock from a D-blade; and wherein the first and second nodes are in quorum and capable of processing write operations.
13. The method of claim 12 wherein the step of performing a resynchronization routine between the first and second nodes exchanges deltas in order to ensure that both database replicas (RDB) on the first and second node are identical.
14. A computer readable medium for providing a bi-directional failover in a cluster comprising a first node and a second node, the computer readable medium including program instructions for performing the steps of: providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value; detecting, by the first node, an error condition on the second node; setting, by the first node, a local cached activity lock identifying the first node as a primary of the cluster; setting a first persistent state in a local unit of the first node; assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value; and setting a second persistent state in the local unit of the first node, wherein the second persistent states is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster.
15. The computer readable medium of claim 14 wherein the computer readable medium further includes program instructions for performing the steps of: detecting, by the first node, the post-failure presence of the second node; performing a resynchronization routine between the first and second nodes; removing the temporary epsilon value from the first node; clearing the local cached activity lock from the local unit of the first node; clearing an activity lock from a D-blade; and wherein the first and second nodes are in quorum and capable of processing write operations.
16. A system for providing a bi-directional failover in a cluster comprising a first node and a second node, the system comprising: a storage operating system executed by a processor on the first node and the storage operating system having a replicated database (RDB), the RDB comprising a quorum manager configured to assign a temporary epsilon value to the first node in response to detecting an error condition in the second node, the temporary epsilon causing the first node to be in quorum and to allow the second node to come online to form the cluster between the first node and the second node, wherein the RDB further comprises a recovery manager configured to set a lock in a data structure identifying the first node as the owner of an HA activity lock in the cluster and further configured to set a first persistent state value in a local unit of the first node.
17. The system of claim 16 wherein the recovery manager is further configured to set a second persistent state to indicate that the recovery manager is in a HA_ACTIVE state in the local unit of the first node in response to the quorum manager assigning the temporary epsilon to the first node and thereby establishing quorum.
18. A computer readable medium for providing bi-directional failover among nodes of a two node replicated data cluster, the computer readable medium including program instructions for performing the steps of: detecting a failure of one of the nodes; and in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein in the high availability state a single node is designated as a full read/write replica within the cluster data replication service, the full read/write replica modifying configuration information relating to one or more replicated services provided by the replicated services cluster, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
19. A system to provide bi-directional failover for data replication services in a two node cluster, comprising: in response to detecting a failure, a disk element module executed by a processor, the disk element module configured, to designate a first node of the two nodes as a stand alone node that is a full read/write replica for the data replication services of the cluster, thereby enabling management services reliant on updates for replicated data to function normally; and a quorum manager configured to assign a temporary epsilon value to the first node in response to detecting an error condition in a second node, the temporary epsilon causing the first node to be in quorum and to allow the second node to come online to form the cluster between the first node and the second node.
20. The system of claim 19 wherein the failure in one of the nodes comprises a failure in communication between the nodes.
21. The system of claim 19 wherein the replicated data/services comprise a VFS location database.
22. The system of claim 19 wherein the replicated data/services comprises a management framework.
23. The system of claim 19 wherein the replicated data/services comprises a high availability manager.
24. The system of claim 19, further comprising: an operating system to exit a conventional quorum state and enter a high availability state.
25. A method for providing bi-directional failover for data replication services in a two node cluster, comprising: detecting a failure of one of the nodes; and in response to detecting the failure, exiting a conventional quorum state and entering a high availability state, wherein a single node is designated as a full read/write replica for the data replication services of the cluster by storing in a lock associated with the single node in a disk element, thereby enabling management services reliant on updates for replicated data to function normally, and wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value.
26. A method for providing a bi-directional failover in a cluster comprising a first node and a second node, comprising: providing the first node and the second node configured in a conventional quorum state, wherein the conventional quorum state requires a majority of the nodes to be healthy to have quorum and neither node is initially configured with an epsilon value; detecting, by the first node, an error condition on the second node; setting, by the first node, a local cached activity lock identifying the first node as active in the cluster; setting a first persistent state in a local unit of the first node; assigning a temporary epsilon value to the first node, wherein the first node enters into quorum as a result of the temporary epsilon value; setting a second persistent state in the local unit of the first node, wherein the second persistent states is a high availability state where the first node is designated as a stand alone node that is a full read/write replica of the cluster; detecting, by the first node, the post-failure presence of the second node; performing a resynchronization routine between the first and second nodes; and removing the temporary epsilon value from the first node.
27. The method of claim 26 wherein the step of detecting the error condition further comprises: detecting a lack of a heartbeat signal from the second node.
28. The method of claim 26 wherein the first persistent state comprises a HA_PREACTIVE state.
29. The method of claim 26 wherein the local unit comprises a storage device.
30. The method of claim 26 further comprising: clearing the local cached activity lock from the local unit of the first node; clearing an activity lock from a D-blade; and wherein the first and second nodes are in quorum and capable of processing write operations.
31. The method of claim 26 wherein the step of performing a resynchronization routine between the first and second nodes exchanges deltas in order to ensure that both database replicas (RDB) on the first and second node are identical.
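The ordering of steps in claims 8 and 12 can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the class `SurvivingNode`, its attributes, and the `NORMAL` state name are hypothetical, and only `HA_PREACTIVE` and `HA_ACTIVE` come from the claims. Resynchronization is simplified to a one-way push of changed keys, assuming that only the survivor accepted writes during the outage and that no entries were deleted. The D-blade activity lock of claim 12 is not modeled.

```python
from enum import Enum


class HAState(Enum):
    NORMAL = "NORMAL"              # hypothetical name for the conventional quorum state
    HA_PREACTIVE = "HA_PREACTIVE"  # first persistent state (claims 10 and 28)
    HA_ACTIVE = "HA_ACTIVE"        # second persistent state (claim 17)


class SurvivingNode:
    def __init__(self, name):
        self.name = name
        self.persistent_state = HAState.NORMAL  # stands in for the "local unit"
        self.activity_lock_owner = None         # local cached activity lock
        self.has_epsilon = False
        self.history = [HAState.NORMAL]

    def _persist(self, state):
        self.persistent_state = state
        self.history.append(state)

    def fail_over(self):
        """Follow the order of claim 8 once the partner's error is detected."""
        self.activity_lock_owner = self.name  # mark this node as active
        self._persist(HAState.HA_PREACTIVE)   # first persistent state
        self.has_epsilon = True               # temporary epsilon brings quorum
        self._persist(HAState.HA_ACTIVE)      # second persistent state

    def recover(self, local_db, partner_db):
        """Follow the order of claim 12 once the partner is seen again."""
        # Simplified resynchronization: push entries the partner lacks or has stale.
        deltas = {k: v for k, v in local_db.items() if partner_db.get(k) != v}
        partner_db.update(deltas)
        self.has_epsilon = False              # remove the temporary epsilon
        self.activity_lock_owner = None       # clear the local cached lock
        self._persist(HAState.NORMAL)
        return deltas


node = SurvivingNode("node1")
node.fail_over()
state_after_failover = node.persistent_state
local, partner = {"a": 1, "b": 2}, {"a": 1}
pushed = node.recover(local, partner)
```

Persisting `HA_PREACTIVE` before epsilon is assigned reflects the ordering in the claims; the sketch does not show what a node does if it restarts while in that intermediate state.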