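IPC Classification Information
Country / Type | United States (US) Patent, Registered
International Patent Classification (IPC 7th edition) |
Application number | UP-0128597 (2008-05-28)
Registration number | US-7721153 (2010-06-10)
Priority information | GB-0405941.6 (2004-03-17)
Inventors / Address |
- Nash, Richard John
- Noble, Gary Paul
Applicant / Address |
- International Business Machines Corporation
Agent / Address |
Citation information | Times cited: 2; Cited patents: 5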
Abstract
System, method and computer program product for recovering from a failure of a computing device. Start up of a first component of the device is monitored and a determination is made whether the first component has started successfully. If so, a second, higher level component of the device is started. Operational data received from the second component is monitored. If the operational data falls outside of an operational boundary, an action is performed on the second component to enable the second component to operate within a preferred operational boundary. If the first component does not start up successfully, a determination is made if start up of the first component is critical to operation of the second component. If so, a corrective action is performed relative to the first component and afterwards, an attempt is made to start up the second component.
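The start-up flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Component` class, the `start_stack` function, and the log messages are hypothetical names introduced here. The abstract does not say what happens when a failed first component is not critical; the sketch assumes the failure is logged and start-up of the second component proceeds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """Hypothetical stand-in for one layer of the device stack."""
    name: str
    start: Callable[[], bool]      # returns True when start-up succeeded
    critical: bool                 # is start-up critical to the next layer?
    repair: Callable[[], None]     # corrective action to apply on failure


def start_stack(first: Component, second: Component, log: List[str]) -> bool:
    """Start `second` after handling the outcome of starting `first`."""
    if first.start():
        log.append(f"{first.name}: started")
    elif first.critical:
        # Critical failure: apply the corrective action, then continue.
        log.append(f"{first.name}: failed (critical), repairing")
        first.repair()
    else:
        # Assumption: a non-critical failure is only logged.
        log.append(f"{first.name}: failed (non-critical), skipping")
    ok = second.start()
    log.append(f"{second.name}: {'started' if ok else 'failed'}")
    return ok
```

A caller would build one `Component` per layer and pass a shared list as `log`, which also gives a simple record of what happened during start-up.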
Representative Claims
What is claimed is:

1. A method for recovering from a failure of a computing device, the method comprising the steps of: the computing device monitoring a first component in a first layer of a device stack of the computing device; the computing device determining, using data gathered by the first component, whether or not start up of the first component in the computing device is successful, wherein if the start up of the first component is determined to be successful, the computing device initiating start up of a second component residing in a second higher layer of the device stack of the computing device, and wherein if the start up of the first component is determined to be unsuccessful, the computing device determining if the start up of the first component is significant for continued functioning of the computing device, wherein if the start up of the first component is determined to be significant for continued functioning of the computing device, the computing device performing at least one corrective action with respect to the first component for initiating the start up of the second component; the computing device monitoring data gathered from the second component; and the computing device determining whether or not the data monitored for the second component falls outside of a boundary and, if the data monitored for the second component is determined to fall outside of the boundary, performing at least one action on the second component to enable the second component to operate within the boundary.

2. A method as claimed in claim 1, further comprising the steps of: the computing device logging a status of a plurality of first sub-components of the first component for determining whether or not the start up of the first component is successful.

3. A method as claimed in claim 2, wherein if the start up of the first component is determined to be unsuccessful and if the start up of the first component is determined to be significant to operation of the second component, further comprising the steps of: the computing device disabling the first component; and the computing device communicating a message to an external system requesting assistance.

4. A method as claimed in claim 3, further comprising the steps of: the computing device logging a status of each of a plurality of second sub-components of the second component for determining whether or not the data monitored for the second component falls outside of the boundary.

5. A method as claimed in claim 4, wherein the data monitored for the second component is based on one or more predefined, programmed rules which trigger performance of the at least one action on the second component to enable the second component to operate within the boundary.

6. A method as claimed in claim 3, further comprising the step of: if the start up of the first component is determined to be unsuccessful and if the start up of the first component is determined to be not significant to operation of the second component, the computing device communicating recovery control for the computing device from the first component to the second component.

7. A method as claimed in claim 6, wherein the communicating step further comprises the steps of: sending, by the first component, a message to the second component, to start the second component; and transferring recovery control for the computing device from the first component to the second component.

8. A computer program product for recovering from a failure of a computing device, the computer program product comprising: a computer readable storage medium; first program instructions for monitoring a first component of the computing device, the first program instructions including instructions to determine whether start up of the first component is successful, and if the start up of the first component is determined to be successful, initiating start up of a second component of the computing device, wherein the first program instructions include instructions, responsive to unsuccessful start up of the first component, to determine if the start up of the first component is significant to operation of the second component, wherein the first program instructions include instructions, responsive to the start up of the first component being determined to be significant to the operation of the second component, to perform at least one corrective action with respect to the first component for initiating start up of the second component; second program instructions to start up the second component of the computing device responsive to a determination of successful start up of the first component, the second program instructions including instructions to monitor data received from the second component and to determine whether or not the data monitored falls outside of a boundary; and third program instructions, responsive to the data falling outside of the boundary, to perform at least one action on the second component to enable the second component to operate within a preferred boundary; wherein the first, second and third program instructions are stored on the computer readable storage medium.

9. A computer program product as claimed in claim 8, further comprising: fourth program instructions to log a status of a plurality of first sub-components of the first component for determining whether the start up of the first component is successful, and wherein the fourth program instructions are stored on the computer readable storage medium.

10. A computer program product as claimed in claim 8, wherein the first program instructions include instructions, responsive to the first recovery component not starting up successfully and responsive to the first recovery component being determined to be not significant for continued operation of the second recovery component, to communicate transfer of recovery control for the computing device from the first component to the second component.

11. A computer program product as claimed in claim 10, wherein the second component operates responsive to the first component communicating to the second component that the first component is secure and stable.

12. A computer program product as claimed in claim 10, wherein the fourth program instructions include instructions, responsive to the first component not starting up successfully and responsive to the first component being significant to operation of the second component, to disable the first component and communicate a message, to an external system, requesting assistance.

13. A computer program product as claimed in claim 10, wherein the fourth program instructions include instructions to log a status of a plurality of second sub-components of the second component for determining whether or not the data monitored for the second component falls outside of the boundary.

14. A computer program product as claimed in claim 13, wherein the data monitored for the second component is based on one or more predefined, programmed rules which trigger performance of the at least one action on the second component to enable the second component to operate within the boundary.

15. A system for performing recovery and corrective actions in a computing device, the system comprising: a CPU, a computer readable memory and a computer readable storage media; a first recovery program component operating in a first level environment of the computing device, the first recovery program component determining whether or not start up of the first level environment is stable and secure based on data monitored for the first level environment; a second recovery program component operating in a second level environment of the computing device, for determining based on data monitored for the second level environment whether or not the data monitored for the second level environment falls outside of a boundary, wherein the first recovery program component initiates start up of the second recovery program component if the start up of the first level environment is determined to be stable and secure, and wherein if the start up of the first level environment is determined not to be stable and secure, the first recovery program component determines whether or not the start up of the first level environment is significant for continued functioning of the computing device and, if the start up of the first level environment is determined to be significant for continued functioning of the computing device, performing at least one corrective action on the first recovery program component for stabilizing and securing the first level environment to initiate the start up of the second recovery program component, and wherein if the data monitored for the second level environment is determined to fall outside of the boundary for the second recovery program component, performing at least one action for enabling the second recovery program component to operate within the boundary; and wherein the first recovery program component and the second recovery program component are stored on the computer readable storage media for execution by the CPU via the computer readable memory.

16. A system as claimed in claim 15, further comprising: a first logging program component for logging a status of each of a plurality of first subcomponents of the first recovery program component for determining whether or not the start up of the first recovery program component is successful; and wherein the first logging program component is stored on the computer readable storage media for execution by the CPU via the computer readable memory.

17. A system as claimed in claim 15, wherein, responsive to the first recovery program component not starting up successfully and being determined significant to operation of the second recovery program component, the second recovery program component disables the first recovery program component and communicates a message to an external system requesting assistance.

18. A system as claimed in claim 17, further comprising: a second logging program component for logging a status of each of a plurality of second subcomponents of the second recovery program component for determining whether or not the data monitored for the second recovery program component falls outside of the boundary; and wherein the second logging program component is stored on the computer readable storage media for execution by the CPU via the computer readable memory.

19. A system as claimed in claim 18, wherein the data monitored for the second recovery program component is based on one or more predefined, programmed rules which trigger performance of the at least one action on the second recovery program component to enable the second component to operate within the boundary.

20. A system as claimed in claim 17, wherein, responsive to the first recovery program component not starting up successfully and the first recovery program component being determined to be not significant for continued operation of the second recovery component, transferring recovery control for the computing device to the second recovery program component.
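Claims 5, 14 and 19 describe predefined, programmed rules that trigger an action when monitored data falls outside a boundary. The following Python sketch shows one way such rules might be represented. The `Rule` class, the `apply_rules` function, and the treatment of a boundary as a closed numeric range are assumptions made for illustration; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical predefined rule: a numeric boundary plus an action."""
    metric: str
    low: float
    high: float
    action: Callable[[Dict[str, float]], None]  # may adjust the readings


def apply_rules(readings: Dict[str, float], rules: List[Rule]) -> List[str]:
    """Run the action of every rule whose metric is outside its boundary.

    Returns the names of the metrics that triggered an action.
    """
    triggered = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is None:
            continue  # metric not reported, so there is nothing to check
        if not (rule.low <= value <= rule.high):
            rule.action(readings)
            triggered.append(rule.metric)
    return triggered
```

Keeping the rules as data rather than hard-coded checks means new boundaries can be added without changing the monitoring loop, which is one plausible reading of "predefined, programmed rules".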