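IPC classification information
Country / Type | United States (US) Patent, granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0082043 (2005-03-16)
Registration number | US-7395455 (2008-07-01)
Priority information | GB-0405941.6 (2004-03-17)
Inventors / Address | Nash, Richard John; Noble, Gary Paul
Applicant / Address | International Business Machines Corporation
Agent / Address |
Citation information | Times cited: 1; cited patents: 5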
Abstract
System, method and computer program product for recovering from a failure of a computing device. Start up of a first component of the device is monitored and a determination is made whether the first component has started successfully. If so, a second, higher level component of the device is started. Operational data received from the second component is monitored. If the operational data falls outside of an operational boundary, an action is performed on the second component to enable the second component to operate within a preferred operational boundary. If the first component does not start up successfully, a determination is made if start up of the first component is critical to operation of the second component. If so, a corrective action is performed relative to the first component and afterwards, an attempt is made to start up the second component.
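The control flow described in the abstract can be illustrated with a minimal Python sketch. This is not code from the patent: the `Component` type, the function names, and the numeric boundary are hypothetical, chosen only to make the branches concrete. Starting the second component after a non-critical failure follows claim 5 rather than the abstract, which does not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    # Hypothetical stand-in for a device component (not a type from the patent).
    name: str
    start: Callable[[], bool]   # returns True when start up succeeds
    critical: bool = True       # is it critical to the higher-level component?


def start_with_recovery(first: Component, second: Component,
                        correct: Callable[[Component], None]) -> List[str]:
    """Follow the start-up branches of the abstract; return a log of steps taken."""
    log: List[str] = []
    if first.start():
        log.append(f"{first.name} started")
    elif first.critical:
        correct(first)  # corrective action relative to the first component
        log.append(f"{first.name} failed (critical): corrective action performed")
    else:
        log.append(f"{first.name} failed (not critical)")
    # Start up of the second component is attempted in each branch.
    started = second.start()
    log.append(f"{second.name} {'started' if started else 'failed'}")
    return log


def check_boundary(value: float, low: float, high: float,
                   act: Callable[[float], float]) -> float:
    """Return value if it lies within [low, high]; otherwise apply the action."""
    if low <= value <= high:
        return value
    return act(value)
```

For example, calling `start_with_recovery` with a first component whose `start` returns `False` and whose `critical` flag is `True` invokes `correct` once before the second component is started, and `check_boundary(120.0, 0.0, 100.0, lambda v: 100.0)` applies the action because 120.0 lies outside the assumed boundary.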
Representative claims
The invention claimed is:

1. A method for recovering from a failure of a computing device, the method comprising the steps of: monitoring, by a first recovery component installed on said computing device, start up of a first component of said computing device, said first component including a plurality of low level components; determining, using operational data gathered by said first recovery component from said plurality of low level components, whether or not start up of said first component in said computing device is successful, wherein if said start up of said first component is determined to be successful, initiating start up of a second component of said computing device, said second component including a plurality of high level components, and wherein if said start up of said first component is determined to be unsuccessful, determining if start up of said first component is critical to operation of said second component, wherein if said start up of said first component is determined to be critical to operation of said second component, performing a corrective action with respect to said first component for initiating said start up of said second component; monitoring, by a second recovery component installed on said computing device, operational data gathered from said plurality of high level components of said second component; determining whether or not said operational data monitored for said second component falls outside of an operational boundary and, if said operational data monitored for said second component is determined to fall outside of said operational boundary, performing an action on said second component to enable said second component to operate within said operational boundary.

2. A method as claimed in claim 1, further comprising the step of: logging a status of each of said plurality of low level components of said first component for determining whether or not start up of said first component is successful.

3. A method as claimed in claim 2, wherein if said start up of said first component is determined to be unsuccessful and if said start up of said first component is determined to be critical to operation of said second component, further comprising the steps of: disabling said first component; and communicating a message to an external system requesting assistance.

4. A method as claimed in claim 3, further comprising the step of: logging a status of each of said plurality of high level components of said second component for determining whether or not said operational data monitored for said second component falls outside of said operational boundary, and wherein said operational data monitored for said second component is based on one or more predefined, programmed rules which trigger performance of said action on said second component to enable said second component to operate within said operational boundary.

5. A method as claimed in claim 4, wherein if said start up of said first component is determined to be unsuccessful and if said start up of said first component is determined to be not critical to operation of said second component, further comprising the steps of: sending, by said first recovery component, a message to said second recovery component, to start said second component; and transferring recovery control for said computing device from said first recovery component to said second recovery component.

6. A computer program product for recovering from a failure of a computing device, said computer program product comprising: a computer readable storage medium; first program instructions to monitor start up of a first component comprising of a plurality of low level components of said computing device, said first program instructions including instructions to determine whether or not said start up of said first component is successful, and if said start up of said first component is determined to be successful, initiating start up of a second component of said computing device, said second component including a plurality of high level components, and wherein if said start up of said first component is determined to be unsuccessful, said first program instructions include instructions to determine if start up of said first component is critical to operation of said second component, wherein if said start up of said first component is determined to be critical to said operation of said second component, said first program instructions include instructions to perform a corrective action with respect to said first component for initiating said start up of said second component; second program instructions to start up said second component of said computing device responsive to a determination of successful start of said first component, said second program instructions including instructions to monitor operational data received from said second component and to determine whether or not said operational data monitored falls outside of an operational boundary; and third program instructions, responsive to said operational data falling outside of said operational boundary, to perform an action on said second, component to enable said second component to operate within a preferred operational boundary; and wherein said first, second and third program instructions are stored on said medium for execution by said computing device.

7. A computer program product as claimed in claim 6, further comprising: fourth program instructions to log a status of each of said plurality of low level components of said first component for determining whether or not said start up of said first component is successful; and wherein said fourth program instructions are stored on said medium for execution by said computing device.

8. A computer program product as claimed in claim 7, wherein said fourth program instructions include instructions, responsive to said first component not starting up successfully and said first component being critical to operation of said second component, to disable said first component and communicate a message to an external system requesting assistance.

9. A computer program product as claimed in claim 8, wherein said fourth program instructions include instructions to log a status of each of said plurality of high level components of said second component for determining whether or not said operational data monitored for said second component falls outside of said operational boundary.

10. A computer program product as claimed in claim 9, wherein said operational data monitored for said second component is based on one or more predefined, programmed rules which trigger performance of said action on said second component to enable said second component to operate within said operational boundary.

11. A computer system for recovering from a failure of a computing device, said computer system comprising: a first recovery program for monitoring start up of a first component of said computing device, said first component comprising a plurality of low level components, wherein said first recovery program initiates start up of a second component that comprises a plurality of high level components, upon making a determination, based on operational data monitored for said first component, that start up of said first component is successful, and wherein if said start up of said first component is determined to be unsuccessful and upon making a determination that said start up of said first component is critical to operation of said second component, said first recovery program performs a corrective action with respect to said first component; and a second recovery program for monitoring said start up of said second component of said computing device, wherein said second recovery program determines, based on operational data monitored for said second component, whether or not said start up of said second component falls outside of an operational boundary, and wherein if said start up of said second component falls outside of said operational boundary, said second recovery program performs an action on said second component to enable said second component to operate within said operational boundary.

12. A system as claimed in claim 11, further comprising: a first log component for logging a status of each of said plurality of low level components of said first component for determining whether or not said start up of said first component is successful.

13. A system as claimed in claim 12, wherein said first recovery program, responsive to said first component not starting up successfully and being critical to operation of said second component, disables said first component and communicates a message to an external system requesting assistance.

14. A system as claimed in claim 13, further comprising: a second log component for logging a status of each of said plurality of high level components of said second component for determining whether or not said operational data monitored for said second component falls outside of said operational boundary, and wherein said operational data monitored for said second component is based on one or more predefined, programmed rules which trigger performance of said action on said second component to enable said second component to operate within said operational boundary.
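Claims 4, 10 and 14 refer to "predefined, programmed rules" that trigger an action when operational data falls outside the operational boundary. The claims do not specify how such rules are represented, so the following Python sketch is only one possible reading: the `Rule` shape, the `apply_rules` function, and the `queue_depth` metric are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Data = Dict[str, float]  # operational data keyed by (hypothetical) metric name


class Rule(NamedTuple):
    # Hypothetical rule shape; the claims only say rules are predefined and programmed.
    name: str
    violated: Callable[[Data], bool]  # True when data falls outside the boundary
    action: Callable[[Data], Data]    # returns adjusted operational data


def apply_rules(data: Data, rules: List[Rule]) -> Tuple[Data, List[str]]:
    """Apply each violated rule's action in order; return (data, names of fired rules)."""
    fired: List[str] = []
    for rule in rules:
        if rule.violated(data):
            data = rule.action(data)
            fired.append(rule.name)  # doubles as a simple status log
    return data, fired


# Example rule: an assumed limit of 100 on a made-up "queue_depth" metric.
RULES = [
    Rule("queue_depth_limit",
         violated=lambda d: d["queue_depth"] > 100,
         action=lambda d: {**d, "queue_depth": 0.0}),
]
```

Calling `apply_rules({"queue_depth": 150.0, "cpu_pct": 40.0}, RULES)` fires the single rule and resets the assumed metric, while data already inside the boundary passes through unchanged with an empty list of fired rules.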