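IPC classification information
Country / Type | United States (US) Patent, granted
International Patent Classification (IPC 7th edition) |
Application number | UP-0676876 (2007-02-20)
Registration number | US-7681089 (2010-04-21)
Inventor / Address |
Applicant / Address | Dot Hill Systems Corporation
Agent / Address |
Citation information | Times cited: 22; Patents cited: 50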
Abstract
A redundant storage controller system that robustly provides failure analysis information (FAI) to an operator of the system is disclosed. The system includes first and second storage controllers in communication with one another, such as via a PCI-Express link. When one of the controllers fails, the FAI is transferred from the failed controller to the surviving controller over the link. The operator issues a command to the surviving storage controller, which responsively provides the FAI. In one embodiment, the failed storage controller writes the FAI to the second storage controller. In one embodiment, each storage controller periodically writes the FAI before there is a failure. In one embodiment, the second storage controller reads the FAI from the failed storage controller. The FAI may include boot logs, crash logs, debug logs, and event logs. The FAI may also be written to a disk drive connected to the controllers.
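The flow described in the abstract can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the class `StorageController`, its methods, and `make_pair` are hypothetical names, and the inter-controller link is modeled as a plain in-process copy rather than a real PCI-Express transfer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StorageController:
    """Toy model of one controller in a redundant pair (hypothetical names)."""
    name: str
    partner: Optional["StorageController"] = field(
        default=None, repr=False, compare=False
    )
    failed: bool = False
    # FAI this controller has generated about itself, keyed by log type.
    local_fai: Dict[str, List[str]] = field(default_factory=dict)
    # FAI received from the partner over the simulated link.
    partner_fai: Dict[str, List[str]] = field(default_factory=dict)

    def log(self, kind: str, message: str) -> None:
        """Record one entry in a boot, crash, debug, or event log."""
        self.local_fai.setdefault(kind, []).append(message)

    def push_fai(self) -> None:
        """Write embodiment: this controller writes its FAI to the partner."""
        if self.partner is None:
            raise RuntimeError("no partner controller attached")
        self.partner.partner_fai = {k: list(v) for k, v in self.local_fai.items()}

    def pull_fai(self) -> None:
        """Read embodiment: this controller reads the partner's FAI."""
        if self.partner is None:
            raise RuntimeError("no partner controller attached")
        self.partner_fai = {k: list(v) for k, v in self.partner.local_fai.items()}

    def fail(self, reason: str) -> None:
        """Detect a failure, record it, and transfer the FAI to the survivor."""
        self.failed = True
        self.log("crash", reason)
        self.push_fai()

    def operator_command(self) -> Dict[str, List[str]]:
        """The surviving controller answers the operator with the partner's FAI."""
        if self.failed:
            raise RuntimeError(f"{self.name} has failed and cannot respond")
        return self.partner_fai


def make_pair() -> Tuple[StorageController, StorageController]:
    """Create two controllers that reference each other as partners."""
    a, b = StorageController("A"), StorageController("B")
    a.partner, b.partner = b, a
    return a, b


if __name__ == "__main__":
    a, b = make_pair()
    a.log("boot", "boot complete")
    a.fail("memory ECC error")
    # The operator queries the survivor before the failed controller reboots.
    print(b.operator_command())
```

In this sketch, `push_fai` corresponds to the embodiment in which the failed controller writes the FAI, and `pull_fai` to the embodiment in which the surviving controller reads it; calling `push_fai` on a timer would approximate the periodic-write embodiment.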
Representative claims
I claim:
1. A method for robustly providing failure analysis information to an operator of a redundant storage controller system having first and second storage controllers in communication via an inter-controller link, the method comprising: transferring from the first storage controller to the second storage controller, via the inter-controller link, information for use in analysis of a failure of the first storage controller; and providing, by the second storage controller, the failure analysis information to the operator; wherein the second storage controller performs said providing the failure analysis information to the operator before the first storage controller is rebooted.
2. The method of claim 1, wherein said transferring the failure analysis information comprises: writing, by the first storage controller, the failure analysis information to the second storage controller via the inter-controller link.
3. The method of claim 2, further comprising: detecting, by the first storage controller, the failure; wherein the first storage controller performs said writing in response to said detecting.
4. The method of claim 3, further comprising: collecting the failure analysis information, by the first storage controller, in response to said detecting, and prior to said writing.
5. The method of claim 4, wherein said collecting the failure analysis information, by the first storage controller, is also performed periodically by the first storage controller, prior to said detecting.
6. The method of claim 2, wherein said the first storage controller writing the failure analysis information to the second storage controller comprises: periodically writing, by the first storage controller, the failure analysis information to the second storage controller, via the inter-controller link, prior to the failure.
7. The method of claim 6, further comprising: generating, by the first storage controller, additional failure analysis information after detecting the failure.
8. The method of claim 7, further comprising: writing, by the first storage controller, the additional failure analysis information to the second storage controller, via the inter-controller link, after said generating.
9. The method of claim 7, further comprising: detecting, by the second storage controller, the failure; and reading, by the second storage controller, the failure analysis information from the first storage controller, via the inter-controller link, in response to said detecting.
10. The method of claim 1, wherein said transferring the failure analysis information comprises: reading, by the second storage controller, the failure analysis information from the first storage controller, via the inter-controller link.
11. The method of claim 10, further comprising: detecting, by the first storage controller, the failure; and notifying the second storage controller of the failure, by the first storage controller, via the inter-controller link, in response to said detecting; wherein the second storage controller performs said reading the failure analysis information in response to said notifying.
12. The method of claim 11, further comprising: collecting, by the first storage controller, the failure analysis information, in response to said detecting.
13. The method of claim 12, wherein said collecting comprises: collecting, by the first storage controller, the failure analysis information in a memory of the first storage controller, wherein said reading comprises the second storage controller reading the failure analysis information from the first storage controller memory, via the inter-controller link.
14. The method of claim 12, further comprising: periodically collecting, by the first storage controller, the failure analysis information, prior to the failure.
15. The method of claim 10, further comprising: detecting, by the second storage controller, the failure; wherein the second storage controller performs said reading the failure analysis information in response to said detecting.
16. The method of claim 15, further comprising: periodically collecting, by the first storage controller, the failure analysis information, prior to the failure.
17. The method of claim 10, wherein said the second storage controller reading the failure analysis information from the first storage controller comprises: performing, by the second storage controller, a direct memory access (DMA) read operation of the failure analysis information from a memory of the first storage controller to a memory of the second storage controller via the inter-controller link.
18. The method of claim 17, wherein the inter-controller link comprises a PCI-Express link, wherein said performing a direct memory access (DMA) read operation comprises: transmitting, by the second storage controller, a PCI-Express memory read request transaction layer packet (TLP) to the first storage controller; and transmitting, by the first storage controller, at least one PCI-Express memory write request TLP to the second storage controller, wherein the failure analysis information is contained in a payload portion of the at least one PCI-Express memory write request TLP.
19. The method of claim 10, wherein said the second storage controller reading the failure analysis information from the first storage controller comprises: performing a plurality of load operations of the failure analysis information, by a CPU of the second storage controller, from a memory of the first storage controller via the inter-controller link.
20. The method of claim 19, wherein the inter-controller link comprises a PCI-Express link, wherein said performing a plurality of load operations comprises: transmitting, by the second storage controller, a PCI-Express memory read request transaction layer packet (TLP) to the first storage controller; and transmitting, by the second storage controller, at least one PCI-Express memory write request TLP to the first storage controller, wherein the failure analysis information is contained in a payload portion of the at least one PCI-Express memory write request TLP.
21. The method of claim 10, wherein said the second storage controller reading the failure analysis information from the first storage controller comprises: reading the failure analysis information, by the second storage controller, without involvement by a CPU of the first storage controller.
22. The method of claim 10, wherein said the second storage controller reading the failure analysis information from the first storage controller comprises: reading the failure analysis information, by the second storage controller, independent of whether a CPU of the first storage controller is operational.
23. The method of claim 1, further comprising: detecting, by the first storage controller, the failure; wherein the first storage controller performs said transferring the failure analysis information, in response to said detecting.
24. The method of claim 23, wherein said the first storage controller detecting the failure comprises the first storage controller detecting the failure while the first storage controller is booting up.
25. The method of claim 1, further comprising: detecting, by the second storage controller, the failure; wherein the second storage controller performs said transferring the failure analysis information, in response to said detecting.
26. The method of claim 1, wherein the inter-controller link comprises a PCI-Express link.
27. The method of claim 26, wherein said transferring the failure analysis information from the first storage controller to the second storage controller comprises transferring the failure analysis information from a memory of the first storage controller directly to a memory of the second storage controller via the PCI-Express link.
28. The method of claim 1, further comprising: writing the failure analysis information, by the first storage controller, to a disk drive connected to the first and second storage controllers.
29. The method of claim 28, further comprising: reading the failure analysis information, by the second storage controller, from the disk drive, prior to said providing the failure analysis information to the operator.
30. The method of claim 1, further comprising: writing the failure analysis information, by the second storage controller, to a disk drive connected to the first and second storage controllers, after said transferring.
31. The method of claim 30, further comprising: reading the failure analysis information, by the second storage controller, from the disk drive, prior to said providing the failure analysis information to the operator.
32. The method of claim 1, wherein the failure analysis information comprises: text messages generated by firmware executing on a CPU of the first controller.
33. The method of claim 32, wherein the text messages indicate that predetermined portions of firmware routines have been executed by the CPU.
34. The method of claim 32, wherein the text messages specify characteristics of a new disk drive that has been discovered by the first controller.
35. The method of claim 32, wherein the text messages include a timestamp of a time when the message was generated.
36. The method of claim 32, wherein the text messages include an indication of which module of the firmware generated the message.
37. The method of claim 32, further comprising: generating, by the CPU of the first storage controller, the text messages to a volatile memory of the first storage controller, prior to said transferring; wherein said transferring comprises transferring the text messages from the volatile memory of the first storage controller to the second storage controller.
38. The method of claim 32, wherein the text messages indicate an occurrence of a system-level event.
39. The method of claim 38, wherein the system-level event comprises creation of a redundant array of disks.
40. The method of claim 38, wherein the system-level event comprises deletion of a redundant array of disks.
41. The method of claim 38, wherein the system-level event comprises a temperature of a component of the first storage controller has exceeded a predetermined threshold.
42. The method of claim 38, wherein the system-level event comprises a failure of a rechargeable energy source for providing power to the first storage controller during a loss of main power.
43. The method of claim 38, wherein the system-level event comprises a capacitor for providing power to the first storage controller during a loss of main power has been recharged beyond a predetermined level.
44. The method of claim 1, wherein the failure analysis information comprises: a crash dump of system software executing on a CPU of the first storage controller.
45. The method of claim 44, wherein the crash dump comprises a listing of contents of registers of the CPU.
46. The method of claim 44, wherein the crash dump comprises a listing of contents of a memory stack of the CPU.
47. The method of claim 1, wherein the failure comprises overheating of a component of the first storage controller.
48. The method of claim 1, wherein the failure comprises a memory parity error of the first storage controller.
49. The method of claim 1, wherein the failure comprises a memory ECC error of the first storage controller.
50. The method of claim 1, wherein the failure comprises invalid data received from a network interface controller of the first storage controller.
51. The method of claim 1, wherein the failure comprises invalid data received from a storage interface controller of the first storage controller.
52. The method of claim 1, wherein the failure comprises an invalid pointer detected by firmware executing on a CPU of the first storage controller.
53. A redundant storage controller system, comprising: first and second redundant storage controllers, coupled together by a communications link, each storage controller of the first and second storage controllers comprising: a CPU, configured to generate information for use in analysis of a failure of the storage controller; a memory, configured to receive the information from the other, failed storage controller via the communications link; and an interface, coupled to the memory, configured to receive a command from an operator of the system, and in response to the command, to provide from the memory of the storage controller the information that was received from the other, failed storage controller; wherein the interface of the non-failed storage controller is configured to provide the information to the operator before the failed storage controller is rebooted.
54. The system of claim 53, wherein the CPU of the failed storage controller is further configured to cause the information to be transferred from the failed storage controller to the non-failed storage controller via the communications link.
55. The system of claim 54, wherein the CPU of the failed storage controller is further configured to detect the failure and to cause the information to be transferred in response to detecting the failure.
56. The system of claim 55, wherein the CPU of the failed storage controller is further configured to collect the information in response to detecting the failure, and prior to causing the information to be transferred.
57. The system of claim 56, wherein the CPU of the failed storage controller is further configured to periodically collect the information prior to detecting the failure.
58. The system of claim 54, wherein the CPU of the failed storage controller periodically causes the information to be transferred, prior to the failure.
59. The system of claim 58, wherein the CPU of the failed storage controller is further configured to generate additional failure analysis information after the failure.
60. The system of claim 59, wherein the CPU of the failed storage controller is further configured to cause the additional failure analysis information to be transferred from the failed storage controller to the non-failed storage controller, via the communications link.
61. The system of claim 59, wherein the CPU of the non-failed storage controller is configured to detect the failure of the failed storage controller and to cause the additional failure analysis information to be transferred from the failed storage controller to the non-failed storage controller, in response to detecting the failure.
62. The system of claim 53, wherein the CPU of the non-failed storage controller is configured to cause the information to be transferred from the failed storage controller to the non-failed storage controller, via the communications link.
63. The system of claim 62, wherein the CPU of the failed storage controller is further configured to detect the failure, and notify the non-failed storage controller of the failure, wherein the CPU of non-failed storage controller is configured to cause the information to be transferred in response to the notification.
64. The system of claim 63, wherein the CPU of the failed storage controller is further configured to collect the information, in response to detecting the failure.
65. The system of claim 64, wherein the CPU of the failed storage controller is configured to collect the information in the memory of the failed storage controller, wherein the CPU of the non-failed storage controller is configured to cause the information to be transferred from the failed storage controller memory.
66. The system of claim 64, wherein the CPU of the failed storage controller is further configured to periodically collect the information, prior to the failure.
67. The system of claim 62, wherein the CPU of non-failed storage controller is configured to detect the failure and to cause the information to be transferred, in response to detecting the failure.
68. The system of claim 67, wherein the CPU of the failed storage controller is further configured to periodically collect the information, prior to the failure.
69. The system of claim 62, wherein each of the storage controllers further comprises: a direct memory access controller (DMAC), coupled to the memory, configured to transfer the information from the memory of the failed storage controller to the memory of the non-failed storage controller, via the communications link.
70. The system of claim 69, wherein said communications link comprises a PCI-Express link, wherein each of the storage controllers further comprises: a PCI-Express interface, configured for coupling to the PCI-Express link; wherein the DMAC is configured to cause the PCI-Express interface of the non-failed storage controller to transmit a PCI-Express memory read request transaction layer packet (TLP) to the failed storage controller on the communications link; wherein the PCI-Express interface of the failed storage controller is configured to transmit at least one PCI-Express memory write request TLP to the non-failed storage controller, in response to the memory read request TLP, wherein the information is contained in a payload portion of the at least one PCI-Express memory write request TLP.
71. The system of claim 62, wherein the CPU of the non-failed storage controller is configured to transfer the information by performing a plurality of load operations of the information from the failed storage controller, via the communications link.
72. The system of claim 71, wherein the communications link comprises a PCI-Express link, wherein each of the storage controllers further comprises: a PCI-Express interface, configured for coupling to the PCI-Express link; wherein the PCI-Express interface of the non-failed storage controller is configured to transmit at least one PCI-Express memory read request transaction layer packet (TLP) to the failed storage controller on the communications link, in response to the plurality of load operations; wherein the PCI-Express interface of the failed storage controller is configured to transmit at least one PCI-Express memory write request TLP to the non-failed storage controller, in response to the memory read request TLP, wherein the information is contained in a payload portion of the at least one PCI-Express memory write request TLP.
73. The system of claim 62, wherein the CPU of the non-failed storage controller is configured to cause the information to be transferred from the failed storage controller to the non-failed storage controller, without involvement by the CPU of the failed storage controller.
74. The system of claim 62, wherein the CPU of the non-failed storage controller is configured to cause the information to be transferred from the failed storage controller to the non-failed storage controller, independent of whether a CPU of the failed storage controller is operational.
75. The system of claim 53, wherein the CPU of the failed storage controller is further configured to detect the failure and to cause the information to be transferred, in response to detecting the failure.
76. The system of claim 75, wherein the CPU of the failed storage controller is further configured to detect the failure while the failed storage controller is booting up.
77. The system of claim 53, wherein the CPU of the non-failed storage controller is further configured to detect the failure and to cause the information to be transferred, in response to detecting the failure.
78. The system of claim 53, wherein the communications link comprises a PCI-Express link.
79. The system of claim 53, wherein the CPU of the failed storage controller is further configured to write the information to a disk drive connected to the first and second storage controllers.
80. The system of claim 79, wherein the CPU of the non-failed storage controller is configured to read the information from the disk drive, prior to providing the information to the operator.
81. The system of claim 53, wherein the CPU of the non-failed storage controller is configured to write the information to a disk drive connected to the first and second storage controllers, after the information is transferred to the non-failed storage controller.
82. The system of claim 81, wherein the CPU of the non-failed storage controller is configured to read the information from the disk drive, prior to providing the information to the operator.
83. The system of claim 53, wherein each of the first and second storage controllers comprises a redundant array of inexpensive disks (RAID) controller.
84. A storage controller, comprising: a PCI-Express interface, configured to couple to a PCI-Express link, and configured to receive thereon from a failed storage controller coupled thereto information for use in analysis of a failure of the failed storage controller; a memory, coupled to the PCI-Express interface, configured to store the received information; and an operator interface, coupled to the memory, configured to receive a command from an operator of the storage controller, and to responsively provide to the operator the received information.
85. The storage controller of claim 84, wherein the operator interface comprises an Ethernet interface.
86. The storage controller of claim 84, wherein the operator interface comprises an RS-232 interface.
87. The storage controller of claim 84, wherein the operator interface comprises a Fibre Channel interface.
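As an illustration of the text messages recited in claims 32, 35, and 36, a firmware log entry carrying a timestamp and the name of the generating module might be modeled as below. The record layout, the `FaiMessage` and `make_message` names, and the rendered format are all assumptions made for this sketch; the claims do not specify any particular encoding.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FaiMessage:
    """One failure-analysis text message (hypothetical layout)."""
    timestamp: datetime  # when the message was generated (claim 35)
    module: str          # which firmware module generated it (claim 36)
    text: str            # the message body (claim 32)

    def render(self) -> str:
        """Format the message as a single log line."""
        return f"{self.timestamp.isoformat()} [{self.module}] {self.text}"


def make_message(module: str, text: str,
                 now: Optional[datetime] = None) -> FaiMessage:
    """Build a message, stamping it with the current UTC time by default."""
    stamp = now if now is not None else datetime.now(timezone.utc)
    return FaiMessage(timestamp=stamp, module=module, text=text)


if __name__ == "__main__":
    # Example system-level event of the kind listed in claims 38 and 39.
    print(make_message("raid", "redundant array of disks created").render())
```

Passing an explicit `now` value keeps the output reproducible, which is convenient when comparing logs collected from two controllers.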