Certified memory-to-memory data transfer between active-active RAID controllers
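IPC classification information
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition): G06F-013/00, G06F-012/16
Application number: UP-0317504 (2005-12-22)
Registration number: US-7536495 (2009-07-01)
Inventors / Address: Ashmore, Paul Andrew; Davies, Ian Robert; Maine, Gene; Vedder, Rex Weldon
Applicant / Address: Dot Hill Systems Corporation
Agent / Address: Davis, E. Alan
Citation information: Times cited: 9; Patents cited: 46
Abstract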
A system for performing an efficient mirrored posted-write operation having first and second RAID controllers in communication via a PCI-Express link is disclosed. The first bus bridge transmits a PCI-Express memory write request TLP to the second bus bridge. The TLP header includes an indication of whether the first CPU requests a certification that certifies the payload data has been written to the second write cache memory. If the indication requests the certification, the second bus bridge automatically transmits the certification to the first bus bridge independent of the second CPU, after writing the payload data to the second write cache memory. The first bus bridge generates an interrupt to the first CPU in response to receiving the certification. The certified transfer may be used to validate and/or invalidate mirrored copies of a write cache directory on the RAID controllers, among other uses.
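The handshake described in the abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the names `WriteTLP`, `BusBridge`, `send_write`, `receive_write`, and `receive_certification` are hypothetical, the write cache is modeled as a 64-byte `bytearray`, and the interrupt is modeled as a callback standing in for the first CPU.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WriteTLP:
    """Toy stand-in for a PCI-Express memory write request TLP (hypothetical)."""
    address: int      # destination offset in the peer's write cache
    payload: bytes    # data to mirror
    certify: bool     # header indication: is a certification requested?


@dataclass
class BusBridge:
    """Toy model of one controller's bus bridge (hypothetical names throughout)."""
    name: str
    cache: bytearray = field(default_factory=lambda: bytearray(64))
    peer: Optional["BusBridge"] = None
    # Models the interrupt line to this controller's CPU.
    on_interrupt: Optional[Callable[[str], None]] = None

    def send_write(self, address: int, payload: bytes, certify: bool) -> None:
        # First bridge: transmit the memory write request to the peer bridge.
        self.peer.receive_write(WriteTLP(address, payload, certify))

    def receive_write(self, tlp: WriteTLP) -> None:
        # Second bridge: write the payload into the local write cache first...
        end = tlp.address + len(tlp.payload)
        self.cache[tlp.address:end] = tlp.payload
        # ...then, without involving the local CPU, certify if it was requested.
        if tlp.certify:
            self.peer.receive_certification(tlp.address)

    def receive_certification(self, address: int) -> None:
        # First bridge: raise an interrupt to the local CPU on certification.
        if self.on_interrupt is not None:
            self.on_interrupt(f"certified write at {address:#x}")


def link(a: BusBridge, b: BusBridge) -> None:
    """Connect two bridges, standing in for the PCI-Express link."""
    a.peer, b.peer = b, a
```

In this model the receiving bridge never calls its own `on_interrupt`, which mirrors the point that the certification is sent independent of the second CPU. A write sent with `certify=False` updates the peer cache but produces no interrupt on the sender.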
Representative claims
We claim:
1. A system for performing a mirrored posted-write operation, comprising: first and second redundant array of inexpensive disks (RAID) controllers in communication via a PCI-Express link, each comprising a CPU, a write cache memory, and a bus bridge coupled to said CPU, said write cache memory, and said communications link; wherein said first bus bridge is configured to transmit a PCI-Express memory write request transaction layer packet (TLP) on said link to said second bus bridge, said TLP comprising payload data and a header, said header including an indication of whether a certification is requested by said first CPU, said certification certifying that said payload data has been written to said write cache memory of said second RAID controller; wherein if said indication requests said certification, said second bus bridge is configured to automatically transmit said certification to said first bus bridge independent of said second CPU, after writing said payload data to said write cache memory of said second RAID controller; and wherein said first bus bridge is configured to generate an interrupt to said first CPU in response to receiving said certification.
2. The system of claim 1, wherein said indication comprises a predetermined bit of a field of said TLP header interpreted by said second bus bridge as said indication.
3. The system of claim 1, wherein said second bus bridge comprises a storage element for storing an address range, wherein said indication comprises a memory address within said TLP header specifying a destination of said payload data in said second write cache, wherein said indication indicates said certification is requested if said memory address is within said address range.
4. The system of claim 1, wherein said payload data comprises a portion of a directory of said write cache indicating whether one or more write cache buffers of said write cache are valid.
5. The system of claim 1, wherein said payload data comprises a RAID 5 parity data log.
6. The system of claim 1, wherein said certification comprises a second PCI-Express TLP.
7. The system of claim 1, wherein said first bus bridge comprises: a timer, configured to commence running when said first bus bridge transmits said PCI-Express memory write request TLP on said link to said second bus bridge, wherein said first bus bridge is configured to interrupt said first CPU if said first bus bridge does not receive said certification from said second bus bridge within a predetermined time.
8. The system of claim 2, wherein said predetermined bit of said field of said TLP header indication comprises a predetermined address bit of the address field of said TLP header interpreted by said second bus bridge as said indication.
9. The system of claim 3, wherein said second CPU is configured to write said address range into said storage element at initialization time of said second RAID controller.
10. The system of claim 4, wherein said first CPU is configured to: command said first bus bridge to transfer said portion of said directory from said first write cache to said second write cache and generate said interrupt to said first CPU in response to receiving said certification, wherein said first bus bridge is configured to transmit said PCI-Express memory write request TLP on said link to said second bus bridge with said indication set to a predetermined value to request said certification, in response to said command from said first CPU; and invalidate said one or more write cache buffers in said portion of said directory, prior to commanding said first bus bridge to transfer said portion of said directory.
11. The system of claim 4, wherein said first CPU is configured to: command said first bus bridge to transfer said portion of said directory from said first write cache to said second write cache and generate said interrupt to said first CPU in response to receiving said certification, wherein said first bus bridge is configured to transmit said PCI-Express memory write request TLP on said link to said second bus bridge with said indication set to a predetermined value to request said certification, in response to said command from said first CPU; and validate said one or more write cache buffers in said portion of said directory, prior to commanding said first bus bridge to transfer said portion of said directory.
12. The system of claim 6, wherein the second TLP header includes a second indication certifying that said payload data has been written to said second write cache memory.
13. The system of claim 6, wherein said second PCI-Express TLP comprises a PCI-Express vendor-defined message TLP.
14. The system of claim 6, wherein said second PCI-Express TLP comprises a PCI-Express message-signaled interrupt (MSI) message TLP.
15. The system of claim 8, wherein said PCI-Express memory write request TLP has a 4 double word header with data format, wherein said predetermined address bit is one of bits 63 through 32 of the address field.
16. The system of claim 8, wherein said PCI-Express memory write request TLP has a 3 double word header with data format, wherein said predetermined address bit is bit 31 of the address field.
17. The system of claim 10, wherein said first bus bridge is configured to: write posted-write data to said first write cache and broadcast a copy of said posted-write data to said second bus bridge for writing to said second write cache, after generating said interrupt to said first CPU.
18. The system of claim 10, wherein said first CPU is configured to: commence running a timer after commanding said first bus bridge to transfer said portion of said directory; and determine if said first bus bridge does not generate said interrupt to said first CPU within a predetermined time.
19. The system of claim 12, wherein said second PCI-Express TLP comprises a PCI-Express memory write request TLP.
20. The system of claim 12, wherein said first bus bridge comprises a storage element for storing an address range, wherein said second indication comprises a memory address within the second TLP header, wherein said second indication certifies that said payload data has been written to said second write cache memory if said memory address is within said address range.
21. The system of claim 17, wherein said first CPU is further configured to: validate said one or more write cache buffers in said portion of said directory, after said first bus bridge broadcasts said copy of said posted-write data to said second bus bridge; and command said first bus bridge to transfer said validated portion of said directory from said first write cache to said second write cache and generate a second interrupt to said first CPU in response to receiving a second said certification, after validating said one or more write cache buffers in said portion of said directory.
22. The system of claim 17, wherein said first RAID controller further comprises: a host interface, coupled to said first bus bridge, configured to receive said posted-write data received from a host computer coupled to said first RAID controller, and to write said posted-write write data to said first bus bridge, wherein said first bus bridge writes posted-write data to said first write cache and broadcasts said copy of said posted-write data in response to said host interface writing said posted-write write data to said first bus bridge.
23. The system of claim 11, wherein said first CPU is further configured to: populate said portion of said directory with information specifying a disk drive of a disk drive array coupled to said first and second RAID controllers and a destination location on said disk drive of said posted-write data, prior to commanding said first bus bridge to transfer said validated portion of said directory from said first write cache to said second write cache.
24. The system of claim 11, wherein said first CPU is further configured to: send good status regarding the mirrored posted-write operation to a host computer coupled to said first RAID controller, in response to receiving said second interrupt from said first bus bridge.
25. The system of claim 11, wherein said first bus bridge is configured to: write posted-write data to said first write cache and broadcast a copy of said posted-write data to said second bus bridge for writing to said second write cache, before said first CPU commands said first bus bridge to transfer said portion of said directory.
26. The system of claim 21, wherein said first CPU is further configured to: populate said portion of said directory with information specifying a disk drive of a disk drive array coupled to said first and second RAID controllers and a destination location on said disk drive of said posted-write data, prior to commanding said first bus bridge to transfer said validated portion of said directory from said first write cache to said second write cache.
27. The system of claim 21, wherein said first CPU is further configured to: send good status regarding the mirrored posted-write operation to a host computer coupled to said first RAID controller, in response to receiving said second interrupt from said first bus bridge.
28. The system of claim 25, wherein said first CPU is further configured to: invalidate said one or more write cache buffers in said portion of said directory, before said first bus bridge broadcasts said copy of said posted-write data to said second bus bridge; and command said first bus bridge to transfer said invalidated portion of said directory from said first write cache to said second write cache and generate a second interrupt to said first CPU in response to receiving a second said certification, after invalidating said one or more write cache buffers in said portion of said directory.
29. The system of claim 25, wherein said first RAID controller further comprises: a host interface, coupled to said first bus bridge, configured to receive said posted-write data received from a host computer coupled to said first RAID controller, and to write said posted-write write data to said first bus bridge, wherein said first bus bridge writes said posted-write data to said first write cache and broadcasts said copy of said posted-write data in response to said host interface writing said posted-write write data to said first bus bridge.
30. The system of claim 19, wherein said second indication comprises a predetermined bit of a field of said second memory write request TLP header interpreted by said first bus bridge as said second indication certifying that said payload data has been written to said second write cache memory.
31. A method for performing a certified memory-to-memory transfer operation between first and second redundant array of inexpensive disks (RAID) controllers in communication via a PCI-Express link, each comprising a CPU, a write cache memory, and a bus bridge coupled to the CPU, the write cache memory, and the communications link, the method comprising: the first bus bridge transmitting a PCI-Express memory write request transaction layer packet (TLP) on the link to the second bus bridge, the TLP comprising payload data and a header, the header including an indication of whether a certification is requested by the first CPU, the certification certifying that the payload data has been written to the write cache memory of said second RAID controller; the second bus bridge determining whether the indication requests the certification; the second bus bridge automatically transmitting the certification to the first bus bridge independent of the second CPU, after writing the payload data to the write cache memory of said second RAID controller, if the indication requests the certification; and the first bus bridge generating an interrupt to the first CPU in response to receiving the certification.
32. The method of claim 31, wherein the payload data comprises a portion of a directory of the write cache indicating whether one or more write cache buffers of the write cache are valid.
33. The method of claim 31, wherein the payload data comprises a RAID 5 parity data log.
34. The method of claim 31, wherein the certification comprises a second PCI-Express TLP.
35. The method of claim 31, further comprising: the first bus bridge commencing running a timer when the first bus bridge transmits the PCI-Express memory write request TLP on the link to the second bus bridge; the first bus bridge interrupting the first CPU if the first bus bridge does not receive the certification from the second bus bridge within a predetermined time.
36. The method of claim 32, further comprising: the first CPU commanding the first bus bridge to transfer the portion of the directory from the first write cache to the second write cache and generate the interrupt to the first CPU in response to receiving the certification, wherein the first bus bridge is configured to transmit the PCI-Express memory write request TLP on the link to the second bus bridge with the indication set to a predetermined value to request the certification, in response to said first CPU commanding; and the first CPU invalidating the one or more write cache buffers in the portion of the directory, prior to said first CPU commanding the first bus bridge to transfer the portion of the directory.
37. The method of claim 32, further comprising: the first CPU commanding the first bus bridge to transfer the portion of the directory from the first write cache to the second write cache and generate the interrupt to the first CPU in response to receiving the certification, wherein the first bus bridge is configured to transmit the PCI-Express memory write request TLP on the link to the second bus bridge with the indication set to a predetermined value to request the certification, in response to said first CPU commanding; and the first CPU validating the one or more write cache buffers in the portion of the directory, prior to said commanding the first bus bridge to transfer the portion of the directory.
38. The method of claim 36, further comprising: the first bus bridge writing posted-write data to the first write cache and broadcasting a copy of the posted-write data to the second bus bridge for writing to the second write cache, after said generating the interrupt to the first CPU.
39. The method of claim 36, further comprising: the first CPU commencing running a timer after said commanding the first bus bridge to transfer the portion of the directory; and the first CPU determining if the first bus bridge does not generate the interrupt to the first CPU within a predetermined time.
40. The method of claim 38, further comprising: the first CPU validating the one or more write cache buffers in the portion of the directory, after said first bus bridge broadcasting the copy of the posted-write data to the second bus bridge; and the first CPU commanding the first bus bridge to transfer the validated portion of the directory from the first write cache to the second write cache and generate a second interrupt to the first CPU in response to receiving a second certification, after said validating the one or more write cache buffers in the portion of the directory.
41. The method of claim 40, further comprising: the first CPU populating the portion of the directory with information specifying a disk drive array coupled to the first and second RAID controllers and a destination location on the disk drive array of the posted-write data, prior to said commanding the first bus bridge to transfer the validated portion of the directory from the first write cache to the second write cache.
42. The method of claim 40, further comprising: the first CPU sending good status regarding the mirrored posted-write operation to a host computer coupled to the first RAID controller, in response to receiving the second interrupt from the first bus bridge.
43. The method of claim 37, further comprising: the first CPU populating the portion of the directory with information specifying a disk drive array coupled to the first and second RAID controllers and a destination location on the disk drive array of the posted-write data, prior to said commanding the first bus bridge to transfer the validated portion of the directory from the first write cache to the second write cache.
44. The method of claim 37, further comprising: the first CPU sending good status regarding the mirrored posted-write operation to a host computer coupled to the first RAID controller, in response to receiving the second interrupt from the first bus bridge.
45. The method of claim 37, further comprising: the first bus bridge writing posted-write data to the first write cache and broadcasting a copy of the posted-write data to the second bus bridge for writing to the second write cache, before said first CPU commanding the first bus bridge to transfer the portion of the directory.
46. The method of claim 45, further comprising: the first CPU invalidating the one or more write cache buffers in the portion of the directory, before said first bus bridge broadcasting the copy of the posted-write data to the second bus bridge; and the first CPU commanding the first bus bridge to transfer the invalidated portion of the directory from the first write cache to the second write cache and generate a second interrupt to the first CPU in response to receiving a second said certification, after said invalidating the one or more write cache buffers in the portion of the directory.
47. A bus bridge, for instantiation on each of primary and secondary redundant array of inexpensive disks (RAID) controllers coupled for communication on a PCI-Express link, the bus bridge comprising: a PCI-Express interface, configured for coupling to the PCI-Express link; a local bus interface, configured for coupling to a CPU of the respective RAID controller; a memory bus interface, configured for coupling to a write cache memory of the respective RAID controller; and control logic, coupled to and configured to control said PCI-Express interface, said local bus interface, and said memory bus interface; wherein said primary control logic is configured to control said primary PCI-Express interface to transmit a PCI-Express memory write request transaction layer packet (TLP) on said link, said TLP comprising payload data and a header, said header including an indication of whether a certification is requested by said primary CPU, said certification certifying that said payload data has been written to said write cache memory of said secondary RAID controller; wherein, said secondary control logic is configured to determine whether said indication received by said secondary PCI-Express interface requests said certification, and to automatically control said secondary PCI-Express interface to transmit said certification on said link independent of said secondary CPU, after controlling said secondary memory bus interface to write said payload data to said write cache memory of said secondary RAID controller, if said indication requests said certification; and wherein said primary control logic is configured to control said local bus interface to generate an interrupt to said primary CPU in response to said primary PCI-Express interface receiving said certification.
48. The bus bridge of claim 47, wherein said payload data comprises a portion of a directory of said primary write cache indicating whether one or more write cache buffers of said primary write cache are valid.
49. The bus bridge of claim 47, wherein said control logic comprises: a timer, configured to commence running when said primary PCI-Express interface transmits said PCI-Express memory write request TLP on said link, wherein said primary control logic is configured to cause said primary local bus interface to interrupt said primary CPU if said primary PCI-Express interface does not receive said certification from said secondary bus bridge within a predetermined time.
50. The bus bridge of claim 48, wherein said primary local bus interface is configured to receive from said primary CPU a command for said primary bus bridge to transfer an invalidated said portion of said directory from said primary write cache to said secondary write cache and generate said interrupt to said primary CPU in response to receiving said certification, wherein said primary control logic is configured to control said primary PCI-Express interface to transmit said PCI-Express memory write request TLP on said link with said indication set to a predetermined value to request said certification, in response to said command from said primary CPU.
51. The bus bridge of claim 50, wherein said primary control logic is configured to: control said primary memory bus interface to write posted-write data to said primary write cache and broadcast a copy of said posted-write data on said link for writing to said secondary write cache, after controlling said primary local bus interface to generate said interrupt to said primary CPU.
52. The bus bridge of claim 51, wherein said primary local bus interface is configured to receive from said primary CPU a second command for said primary bus bridge to transfer a validated said portion of said directory from said primary write cache on said link for said secondary bus bridge to write to said secondary write cache and to generate a second interrupt to said primary CPU in response to receiving a second said certification.
53. The bus bridge of claim 51, further comprising: a second local bus interface, configured for coupling to a host interface of said primary RAID controller, said primary second local bus interface configured to receive said posted-write data received by said host interface from a host computer coupled to said primary RAID controller.
54. The bus bridge of claim 51, further comprising: a second local bus interface, configured for coupling to a disk interface of said primary RAID controller, said primary second local bus interface configured to write said posted-write data to said disk interface for writing to one or more disk drives coupled to said primary RAID controller.
55. The bus bridge of claim 53, further comprising: a third local bus interface, configured for coupling to a disk interface of said primary RAID controller, said primary third local bus interface configured to write said posted-write data to said disk interface for writing to one or more disk drives coupled to said primary RAID controller.
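The claims name two ways for the TLP header to carry the certification request: a predetermined address bit (bit 31 for a 3 double word header, one of bits 63 through 32 for a 4 double word header) and a destination address that falls inside a stored address range. The following is a minimal sketch of both checks, not the claimed implementation. The function and class names are hypothetical, the choice of bit 63 for the 4 double word case is arbitrary, and treating the range limit as exclusive and clearing the flag bit before using the address are assumptions the claims do not specify.

```python
from dataclasses import dataclass
from typing import Tuple

CERT_BIT_3DW = 31  # 3 double word header: bit 31 of the address field
CERT_BIT_4DW = 63  # 4 double word header: any of bits 63..32; 63 is an arbitrary pick


def request_certification(address: int, bit: int) -> int:
    """Sender side: set the predetermined address bit to request certification."""
    return address | (1 << bit)


def decode_address(address: int, bit: int) -> Tuple[bool, int]:
    """Receiver side: return (certification requested?, address with flag cleared).

    Clearing the bit before using the address is an assumption of this sketch.
    """
    requested = bool((address >> bit) & 1)
    return requested, address & ~(1 << bit)


@dataclass(frozen=True)
class CertifiedRange:
    """Address range held in the receiving bridge's storage element (hypothetical)."""
    base: int
    limit: int  # exclusive upper bound; an assumption of this sketch

    def requests_certification(self, address: int) -> bool:
        # A write whose destination lies inside the range asks for certification.
        return self.base <= address < self.limit
```

With the bit scheme, the sender decides per write by tagging the address; with the range scheme, the receiving controller's CPU would program the range once, and every write landing in that window is certified without any per-packet flag.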