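IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0933883 (2001-08-20)
Inventors / Address | Cramer, Samuel M.; Schoenthal, Scott
Applicant / Address |
Agent / Address |
Citation information | Times cited: 77; Patents cited: 28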
Abstract
The invention is a method for operating a file server system in a cluster mode that provides for relatively rapid and reliable takeover of a failed file server in the cluster by a partner file server when the failed file server has detected a fault that will cause it to shut down. After detecting the fault in its operations, the failed file server requests the partner file server to take over its file services. The partner file server lets the failed file server complete existing file service requests from clients while further file service requests addressed to the failed file server are refused, and then takes over by having those file service requests transferred to itself. As part of this takeover, the partner file server takes on the identity of the failed file server and activates network interfaces and network addresses that replicate the failed server's network addresses.
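The sequence described in the abstract can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the `FileServer` class, its fields, and the `negotiated_takeover` function are hypothetical names introduced here, and the network-address handling is reduced to copying strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileServer:
    """Hypothetical model of one cluster node (not taken from the patent)."""
    name: str
    addresses: List[str]
    pending: List[str] = field(default_factory=list)  # in-flight client requests
    accepting: bool = True                            # still accepting new requests?
    running: bool = True

    def submit(self, request: str) -> bool:
        """Queue a client request unless the node is down or refusing new work."""
        if not (self.running and self.accepting):
            return False
        self.pending.append(request)
        return True

    def drain(self) -> List[str]:
        """Complete every pending request and return the completed list."""
        done, self.pending = self.pending, []
        return done


def negotiated_takeover(failed: FileServer, partner: FileServer) -> List[str]:
    """Walk through the takeover steps in the order the abstract gives them."""
    events = []
    # 1. The failing server has detected a fault and asks its partner to take over.
    events.append(f"{failed.name} -> {partner.name}: please take over")
    # 2. The partner answers by asking the failing server to shut down.
    events.append(f"{partner.name} -> {failed.name}: please shut down")
    # 3. New requests are refused, but requests already pending are completed.
    failed.accepting = False
    for request in failed.drain():
        events.append(f"{failed.name}: completed {request}")
    # 4. The partner activates addresses that replicate the failed server's.
    partner.addresses.extend(failed.addresses)
    events.append(f"{partner.name}: now serving {partner.addresses}")
    # 5. The failing server shuts down.
    failed.running = False
    events.append(f"{failed.name}: shut down")
    return events


if __name__ == "__main__":
    a = FileServer("A", ["10.0.0.1"])
    b = FileServer("B", ["10.0.0.2"])
    a.submit("read /vol0/report")
    for line in negotiated_takeover(a, b):
        print(line)
```

The point of the ordering is that pending requests are drained before the partner starts answering on the failed server's addresses, so clients do not see a request dropped in the middle of the handover.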
Representative Claims
1. A method for use in a negotiated graceful takeover in a computer cluster having a first and second computer, the method comprising the steps of: detecting an operational fault at the first computer; requesting, from the first computer, in response to the operational fault, that the second computer take over for the first computer; requesting, from the second computer, that the first computer shut down; completing service requests at the first computer pending at the time the first computer was requested to shut down; transferring responsibilities of the first computer to the second computer; and shutting down the first computer.
2. The method as in claim 1, further comprising: monitoring, from the second computer, for any operational faults at the first computer.
3. The method as in claim 1, further comprising: diagnosing, at the first computer, the operational fault of the first computer.
4. The method as in claim 1, further comprising: requesting, from the first computer, that the second computer diagnose the operational fault of the first computer.
5. The method as in claim 1, further comprising: diagnosing, at the second computer, the operational fault of the first computer.
6. The method as in claim 1, further comprising: sending, from the first computer to the second computer, an indication of the type of operational fault detected at the first computer.
7. The method as in claim 1, further comprising: determining, at the second computer, if the second computer can take over for the first computer before requesting the shut down of the first computer.
8. The method as in claim 1, further comprising: refusing further service requests at the first computer after the first computer was requested to shut down.
9. The method as in claim 1, further comprising: transferring access of a storage device for the first computer to the second computer.
10. The method as in claim 1, further comprising: asserting, at the second computer, disk reservations of disks of the first computer.
11. The method as in claim 1, further comprising: rerouting file service requests from the first computer to the second computer.
12. The method as in claim 1, further comprising: activating, at the second computer, network interfaces and network addresses that replicate those of the first computer.
13. The method as in claim 1, further comprising: initiating a countdown timer subsequent to the shut down request from the second computer.
14. The method as in claim 13, further comprising: forcing the first computer to shut down in the event the first computer is still operating at the expiration of the countdown timer.
15. The method as in claim 1, further comprising: detecting, at the second computer, the shut down of the first computer by the absence of a periodic heartbeat signal.
16. The method as in claim 1, further comprising: storing, at the first computer, state information of the first computer prior to shutting down.
17. The method as in claim 1, further comprising: sending periodic requests from the second computer to the first computer to remain shut down, after the first computer has shut down.
18. The method as in claim 1, further comprising: requesting, from the first computer, that the second computer restore responsibilities of the first computer to the first computer.
19. The method as in claim 1, further comprising: restoring responsibilities of the first computer to the first computer upon restart of the first computer.
20. The method as in claim 1, further comprising: restoring responsibilities of the first computer to the first computer upon curing the operational fault of the first computer.
21. The method as in claim 1, further comprising: using the first and second computers as a file servers.
22. A storage system capable of performing a negotiated graceful takeover, the storage system comprising: a first computer; a second computer; a first processor for the first computer to i) detect an operational fault at the first computer, and ii) request, in response to the operational fault, that the second computer take over for the first computer, and a second processor for the second computer to i) request that the first computer shut down, ii) allow the first computer to complete service requests pending at the time the first computer was requested to shut down, iii) take over any responsibilities of the first computer, and iv) allow the first computer to shut down.
23. The storage system as in claim 22, further comprising: a failover monitor to monitor for any operational faults at the first computer.
24. The storage system as in claim 22, further comprising: the first processor to diagnose the operational fault of the first computer.
25. The storage system as in claim 22, further comprising: the first processor to request that the second computer diagnose the operational fault of the first computer.
26. The storage system as in claim 22, further comprising: the second processor to diagnose the operational fault of the first computer.
27. The storage system as in claim 22, further comprising: the first processor to send, to the second computer, an indication of the type of operational fault detected at the first computer.
28. The storage system as in claim 22, further comprising: the second processor to determine if the second computer can take over for the first computer before requesting the shut down of the first computer.
29. The storage system as in claim 22, further comprising: the first processor to refuse further service requests at the first computer after the first computer was requested to shut down.
30. The storage system as in claim 22, further comprising: a storage device for the first computer; and an interconnect to transfer access of the storage device for the first computer to the second computer.
31. The storage system as in claim 22, further comprising: disks of the first computer, the disks to be reserved by the second computer while the first computer is shut down.
32. The storage system as in claim 22, further comprising: an interconnect to reroute file service requests from the first computer to the second computer.
33. The storage system as in claim 22, further comprising: network interfaces at the first computer; network addresses at the first computer; network interfaces at the second computer that replicate the network interfaces of the first computer; and network addresses at the second computer that replicate the network interfaces of the first computer, the network interfaces and addresses at the second computer that replicate the network interfaces and addresses of the first computer to be activated by the second computer while the first computer is shut down.
34. The storage system as in claim 22, further comprising: a countdown timer, the countdown timer to be initiated subsequent to the shut down request from the second computer.
35. The storage system as in claim 34, further comprising: an interconnect to force the first computer to shut down in the event the first computer is still operating at the expiration of the countdown timer.
36. The storage system as in claim 22, further comprising: an interconnect at the second computer to detect the shut down of the first computer by the absence of a periodic heartbeat signal.
37. The storage system as in claim 22, further comprising: persistent memory at the first computer to store state information of the first computer prior to shutting down.
38. The storage system as in claim 22, further comprising: an interconnect at the second computer to send periodic requests to the first computer to remain shut down, after the first computer has shut down.
39. The storage system as in claim 22, further comprising: the first processor to request that the second computer restore responsibilities of the first computer to the first computer.
40. The storage system as in claim 22, further comprising: an interconnect to restore responsibilities of the first computer to the first computer upon restart of the first computer.
41. The storage system as in claim 22, further comprising: an interconnect to restore responsibilities of the first computer to the first computer upon curing the operational fault of the first computer.
42. The storage system as in claim 22, further comprising: the first and second computers are file servers.
43. A storage system capable of performing a negotiated graceful takeover, the storage system comprising: a first computer; a second computer; means for detecting an operational fault at the first computer; means for requesting, from the first computer, in response to the operational fault, that the second computer take over for the first computer; means for requesting, from the second computer, that the first computer shut down; means for completing service requests at the first computer pending at the time the first computer was requested to shut down; means for transferring responsibilities of the first computer to the second computer; and means for shutting down the first computer.
44. A computer readable media, comprising: the computer readable media containing instructions for execution in a processor for the method of, detecting an operational fault at a first computer; requesting, from the first computer, in response to the operational fault, that a second computer take over for the first computer; requesting, from the second computer, that the first computer shut down; completing service requests at the first computer pending at the time the first computer was requested to shut down; transferring responsibilities of the first computer to the second computer; and shutting down the first computer.
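Claims 13 to 15 combine a countdown timer, a forced shutdown, and heartbeat-based detection of the shutdown. One way the second computer might decide what to do next is sketched below. The `PartnerMonitor` class, the action names, and both timeout values are assumptions made for illustration; the claims do not specify durations or data structures. Time is passed in explicitly so the logic stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartnerMonitor:
    """Hypothetical bookkeeping on the second computer; times are in seconds."""
    shutdown_deadline: float   # countdown length (assumed value, not from the claims)
    heartbeat_timeout: float   # silence after which the peer is presumed shut down
    shutdown_requested_at: Optional[float] = None
    last_heartbeat_at: float = 0.0

    def request_shutdown(self, now: float) -> None:
        """Start the countdown when the shutdown request is sent (claim 13)."""
        self.shutdown_requested_at = now

    def heartbeat(self, now: float) -> None:
        """Record a periodic heartbeat received from the first computer."""
        self.last_heartbeat_at = now

    def peer_is_down(self, now: float) -> bool:
        """Infer shutdown from the absence of heartbeats (claim 15)."""
        return now - self.last_heartbeat_at > self.heartbeat_timeout

    def next_action(self, now: float) -> str:
        """Return 'idle', 'wait', 'takeover', or 'force_shutdown'."""
        if self.shutdown_requested_at is None:
            return "idle"            # no takeover in progress
        if self.peer_is_down(now):
            return "takeover"        # peer went quiet: safe to assume it shut down
        if now - self.shutdown_requested_at >= self.shutdown_deadline:
            return "force_shutdown"  # still alive at timer expiry (claim 14)
        return "wait"                # peer is still draining pending requests


if __name__ == "__main__":
    monitor = PartnerMonitor(shutdown_deadline=30.0, heartbeat_timeout=5.0)
    monitor.request_shutdown(now=10.0)
    monitor.heartbeat(now=12.0)
    print(monitor.next_action(now=13.0))
    print(monitor.next_action(now=20.0))
```

Checking for a silent peer before checking the countdown means a server that shuts down on its own is never force-stopped; the forced path is reserved for a server that keeps sending heartbeats past the deadline.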