System and method for monitoring cluster partner boot status over a cluster interconnect
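IPC classification information
Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC 7th edition): G06F-015/167; G06F-015/16
Application number: US-0650005 (2007-01-05)
Registration number: US-7437423 (2008-10-14)
Inventor / Address: Gole, Abhijeet
Applicant / Address: Network Appliance, Inc.
Agent / Address: Cesari and McKenna, LLP
Citation information
Times cited: 5
Patents cited: 39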
Abstract
A method for detecting an un-bootable first computer is described. A failed first computer initiates a boot procedure, and the boot procedure is controlled by boot firmware of the first computer. A virtual interface is established by the boot firmware, the virtual interface having boot status data written therein as the failed computer boots. A second computer reads the boot status data in the virtual interface using a remote direct memory access procedure to access data in the virtual interface. The second computer determines, in response to the boot status data, if the boot procedure of the first computer failed, and if it failed performing a failover routine; and if it succeeded allowing the failed computer to complete its boot procedure. Another connection between the first computer and the second computer is opened, in response to the boot procedure succeeding, using higher level software than the boot firmware.
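The flow in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `BootStage` values, the `BootStatusRegion` class, and the returned action strings do not come from the patent, and the remote direct memory access read is simulated by an ordinary method call rather than a real RDMA library.

```python
from dataclasses import dataclass
from enum import Enum


class BootStage(Enum):
    """Hypothetical boot stages; the patent does not enumerate them."""
    FIRMWARE_STARTED = 1
    KERNEL_LOADED = 2
    BOOT_COMPLETE = 3
    BOOT_FAILED = 4


@dataclass
class BootStatusRegion:
    """Stands in for the memory behind the virtual interface."""
    stage: BootStage = BootStage.FIRMWARE_STARTED

    def write(self, stage: BootStage) -> None:
        # Performed by the boot firmware of the first computer as it boots.
        self.stage = stage

    def remote_read(self) -> BootStage:
        # Stands in for the RDMA read issued by the second computer.
        return self.stage


def partner_action(region: BootStatusRegion) -> str:
    """Decide what the second computer does with the status it read."""
    stage = region.remote_read()
    if stage is BootStage.BOOT_FAILED:
        return "failover"
    if stage is BootStage.BOOT_COMPLETE:
        # Boot succeeded: higher level software may open another connection.
        return "open_high_level_connection"
    return "let_boot_continue"
```

In this sketch the second computer only reads the region and never writes to it, which mirrors the one-sided read described in the abstract.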
Representative claims
What is claimed is:
1. A method for detecting a failed computer, comprising: initiating a boot procedure by a first computer, the boot procedure controlled by boot firmware of the first computer; establishing a virtual interface by the boot firmware, the virtual interface having boot status data written therein as the first computer proceeds through the boot procedure; and reading the boot status data in the virtual interface by a second computer, the second computer using a remote direct memory access procedure to access data in the virtual interface.
2. The method of claim 1, further comprising: deciding by the second computer, in response to the boot status data, whether to perform a takeover of the first computer.
3. The method of claim 1, further comprising: deciding by the second computer, in response to the boot status data, whether to permit the first computer to continue the boot procedure.
4. The method of claim 1, further comprising: determining by the second computer if the boot procedure of the first computer failed; performing, in response to determining that the boot procedure of the first computer failed, a failover routine.
5. The method of claim 1, further comprising: determining by the second computer if the boot procedure of the first computer failed; allowing, in response to determining that the boot procedure of the first computer succeeded, the first computer to complete its boot procedure.
6. The method of claim 4, further comprising: opening, in response to the boot procedure succeeding, a connection between the first computer and the second computer using higher level software than the boot firmware.
7. A system to detect a failed computer, comprising: a first computer, the first computer initiating a boot procedure controlled by boot firmware of the first computer; a virtual interface established by the boot firmware, the virtual interface having boot status data written therein as the first computer proceeds through the boot procedure; and a second computer to read the boot status data in the virtual interface by using a remote direct memory access procedure to access data in the virtual interface.
8. The system of claim 7, further comprising: the second computer to decide, in response to the boot status data, whether to perform a takeover of the first computer.
9. The system of claim 7, further comprising: the second computer to decide, in response to the boot status data, whether to permit the first computer to continue the boot procedure.
10. The system of claim 7, further comprising: the second computer to determine if the boot procedure of the first computer failed, and to perform, in response to determining that the boot procedure of the first computer failed, a failover routine.
11. The system of claim 7, further comprising: determining by the second computer if the boot procedure of the first computer failed; allowing, in response to determining that the boot procedure of the first computer succeeded, the first computer to complete its boot procedure.
12. The system of claim 7, further comprising: a connection between the first computer and the second computer opened, in response to the boot procedure succeeding, using higher level software than the boot firmware.
13. A computer readable media, comprising: said computer readable media containing instructions for execution on a processor for the practice of a method of detecting a failed computer, the method having the steps of, initiating a boot procedure by a first computer, the boot procedure controlled by boot firmware of the first computer; establishing a virtual interface by the boot firmware, the virtual interface having boot status data written therein as the first computer proceeds through the boot procedure; and reading the boot status data in the virtual interface by a second computer, the second computer using a remote direct memory access procedure to access data in the virtual interface.
14. A method for detecting a failed computer, comprising: initiating a boot procedure by a first computer, the boot procedure controlled by boot firmware of the first computer; establishing a virtual interface by the boot firmware, the virtual interface having boot status data written therein as the first computer proceeds through the boot procedure; reading the boot status data in the virtual interface of the first computer by a second computer to ascertain whether the first computer's boot procedure is progressing normally; in response, if the second computer determines that the first computer's boot procedure is progressing normally, the first computer completes its initialization routine; and in response, if the second computer determines that the first computer's boot procedure is not progressing normally, the second computer will perform a failover routine.
15. A method for detecting a failed computer, comprising: assigning a first virtual interface (VI) for failure recovery to the first computer and a second VI for failure recovery to the second computer; establishing during booting by the first computer, after a failure by the first computer, a VI for connection to the second VI of the second computer; and reading status information by the second computer of the first computer through the first VI of the first computer, the second computer using its second VI to communicate with the first VI of the first computer.
16. The method of claim 15, further comprising: deciding by the second computer, in response to the reading the status information, whether to perform a takeover of the first computer.
17. The method of claim 15, further comprising: deciding by the second computer, in response to the reading the status information, whether to permit the first computer to continue the boot procedure.
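Claim 14 turns on whether the boot procedure "is progressing normally", but the claims do not say how that is judged. One possible reading, offered purely as an assumption, is a stall check: the second computer samples a progress value from the boot status data and treats the boot as abnormal if the value stops changing for too long. The function names, the `(time, counter)` sample format, and the 30 second default below are all hypothetical.

```python
def progressing_normally(samples, stall_timeout):
    """Return False if the progress counter stalled longer than stall_timeout.

    samples: list of (time_seconds, progress_counter) pairs in time order,
    standing in for successive reads of the boot status data.
    """
    last_change_time, last_value = samples[0]
    for t, value in samples[1:]:
        if value != last_value:
            # Progress was observed, so restart the stall clock.
            last_change_time, last_value = t, value
        elif t - last_change_time > stall_timeout:
            return False
    return True


def second_computer_response(samples, stall_timeout=30.0):
    """Map the stall check onto the two outcomes named in claim 14."""
    if progressing_normally(samples, stall_timeout):
        return "first computer completes initialization"
    return "second computer performs failover"
```

For example, samples `[(0, 1), (10, 1), (45, 1)]` show no change for 45 seconds, so with a 30 second timeout this sketch selects the failover outcome.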
Patents cited by this patent (39)
Uzrad-Nali, Oran; Har-Chen, Dror, Apparatus and method for receive transport protocol termination.
Maximino Aguilar ; Norbert M. Blam ; Michael Edward Criscolo ; Sanjay Gupta ; John William Gorrell, Jr. ; Roy Moonseuk Kim ; James Michael Stafford, Boot sequence for a network computer including prioritized scheduling of boot code retrieval.
Byers Russell Francis,CAX ; Duchaine Joseph Marcel Gilles,CAX ; Schuett Michael Leonard,CAX ; Grootenboer Cornelius Jacob,GBX, Method and controller for controlling shutdown of a processing unit.
Ohran Richard S. ; Rollins Richard N. ; Ohran Michael R. ; Marsden Wally, Method for improving recovery performance from hardware and software errors in a fault-tolerant computer system.
Hitz David ; Malcolm Michael ; Lau James ; Rakitzis Byron, Method for maintaining consistent states of a file system and for creating user-accessible read-only copies of a file s.
Wallach Walter A. ; Findlay Bruce ; Pellicer Thomas J. ; Chrabaszcz Michael, Method for providing a fault tolerant network using distributed server processes to remap clustered network resources to other servers during server failure.
McCown Patricia M. (Cresskill NJ) Conway Timothy J. (Highland Park NJ) Jessen Karl M. (Bayonne NJ), Methods and apparatus for monitoring system performance.
Ekrot Alexander C. ; Singer James H. ; Hemphill John M. ; Autor Jeffrey S. ; Galloway William C. ; Alexander Dennis J., Multi-server fault tolerance using in-band signalling.
Hitz David (Sunnyvale CA) Schwartz Allan (Saratoga CA) Lau James (Cupertino CA) Harris Guy (Mountain View CA), Multiple facility operating system architecture.
Hitz David ; Schwartz Allan ; Lau James ; Harris Guy, Multiple software-facility component operating system for co-operative processor control within a multiprocessor computer system.
Row Edward J. (Mountain View CA) Boucher Laurence B. (Saratoga CA) Pitts William M. (Los Altos CA) Blightman Stephen E. (San Jose CA), Parallel I/O network file server architecture.
Row Edward J. (Mountain View CA) Boucher Laurence B. (Saratoga CA) Pitts William M. (Los Altos CA) Blightman Stephen E. (San Jose CA), Parallel I/O network file server architecture.
Beardsley Brent Cameron (Tucson AZ) Hathorn Roger Gregory (Tucson AZ) Holley Bret Wayne (Tucson AZ) Iskiyan James Lincoln (Tucson AZ), Remote copy system for setting request interconnect bit in each adapter within storage controller and initiating request.
Clowes Richard F. (New York NY) Tims Fred W. (Springfield Center NY), Workstation-implemented data storage re-routing for server fault-tolerance on computer networks.
Yalamanchili, Chaitanya; Dash, Prasanta; Jagtap, Asmita; Kasina, Sudhakar, Systems and methods for transferring input/output operations within computer clusters.