IPC Classification Information
Country / Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application Number |
US-0782791
(2010-05-19)
|
Patent Number |
US-8346883
(2013-01-01)
|
Inventors / Address |
- Archer, Charles J.
- Blocksome, Michael A.
- Ratterman, Joseph D.
- Smith, Brian E.
|
Applicant / Address |
- International Business Machines Corporation
|
Agent / Address |
|
Citation Information |
Cited by: 2
Patents cited: 43 |
Abstract
Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
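The abstract describes a pipelined broadcast in which the root does not copy the data to every node itself. It DMA-writes the data, its length, and an injection instruction into fixed, well-known regions of a target node, and that instruction causes the target's DMA engine to repeat the same three writes toward the next node in the schedule. The Python sketch below models that chain in a single process. It is a simplified illustration, not IBM's implementation: `Node`, `deposit`, `broadcast`, `BUF_SIZE`, and the representation of a schedule as a map from each rank to the ranks it forwards to are all hypothetical; remote DMA writes are modeled as plain assignments; and hops that real hardware would run concurrently are executed one after another.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BUF_SIZE = 64  # hypothetical receive-buffer size in bytes


@dataclass
class Node:
    rank: int
    # Regions that other nodes write with DMA "puts" at well-known locations.
    receive_buffer: bytearray = field(default_factory=lambda: bytearray(BUF_SIZE))
    length_region: int = 0                                     # broadcast data length
    injection_region: List[int] = field(default_factory=list)  # "inject to rank X" instructions


def deposit(src: Node, dst: Node, dst_children: List[int]) -> None:
    """One hop: src's DMA engine writes data, length, and next-hop instructions into dst."""
    n = src.length_region
    dst.receive_buffer[:n] = src.receive_buffer[:n]  # 1. data into dst's receive buffer
    dst.length_region = n                            # 2. length into dst's length region
    dst.injection_region.extend(dst_children)        # 3. trigger dst's next DMA operations


def broadcast(nodes: Dict[int, Node], root: int,
              schedule: Dict[int, List[int]], data: bytes) -> None:
    """Pipelined broadcast along a hypothetical schedule: rank -> ranks it forwards to."""
    if len(data) > BUF_SIZE:
        raise ValueError("broadcast data does not fit in the receive buffer")
    src = nodes[root]
    src.receive_buffer[:len(data)] = data
    src.length_region = len(data)
    src.injection_region.extend(schedule.get(root, []))  # root seeds its own queue
    active = [root]
    while active:
        node = nodes[active.pop()]
        # Each node acts only on instructions deposited in its own injection region.
        while node.injection_region:
            child = node.injection_region.pop(0)
            deposit(node, nodes[child], schedule.get(child, []))
            active.append(child)
```

For example, with `schedule = {0: [1, 2], 1: [3], 2: [4]}` and root `0`, ranks 1 and 2 receive the data from the root, while ranks 3 and 4 receive it from ranks 1 and 2 once those nodes act on the instructions deposited in their injection regions.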
Representative Claims
1. A method of effecting hardware acceleration of broadcast operations in a parallel computer, the parallel computer comprising a plurality of compute nodes organized for collective operations via a data communications network, each compute node having a receive buffer, the method comprising: establishing a network topology for the data communications network; selecting, in dependence upon the network topology, a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length for the target node, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
2. The method of claim 1 wherein depositing broadcast data further comprises: packetizing the broadcast data into one or more broadcast data packets; and enabling, in a header of the broadcast data packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data packet to store a copy of the broadcast data packet.
3. The method of claim 1 wherein depositing a length of the broadcast data further comprises: packetizing the length of the broadcast data into one or more broadcast data length packets; and enabling, in a header of the broadcast data length packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data length packet to store a copy of the broadcast data length packet.
4. The method of claim 1 wherein depositing an instruction to inject the broadcast data into the receive buffer of a subsequent target node further comprises: packetizing the instruction into one or more instruction packets; and enabling, in a header of the instruction packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the instruction packets to store the instruction packets and perform another DMA operation.
5. The method of claim 1 wherein establishing within the network a topology further comprises: defining the compute nodes in the network topology; and initializing, by each compute node, a DMA data descriptor, the DMA data descriptor comprising: the compute node's receive buffer; the compute node's broadcast data length memory region; and a well-known memory location of another compute node's receive buffer.
6. The method of claim 5 further comprising depositing, by the target node, the broadcast data in the subsequent target node's receive buffer including performing a DMA operation with the well-known memory location of the subsequent target node's receive buffer included in the target node's DMA data descriptor and in accordance with the broadcast data length size stored in the target node's broadcast data length memory region.
7. An apparatus for effecting hardware acceleration of broadcast operations in a parallel computer, the parallel computer comprising a plurality of compute nodes organized for collective operations via a data communications network, each compute node having a receive buffer, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions capable of: establishing a network topology for the data communications network; selecting, in dependence upon the network topology, a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length for the target node, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
8. The apparatus of claim 7 wherein depositing broadcast data further comprises: packetizing the broadcast data into one or more broadcast data packets; and enabling, in a header of the broadcast data packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data packet to store a copy of the broadcast data packet.
9. The apparatus of claim 7 wherein depositing a length of the broadcast data further comprises: packetizing the length of the broadcast data into one or more broadcast data length packets; and enabling, in a header of the broadcast data length packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data length packet to store a copy of the broadcast data length packet.
10. The apparatus of claim 7 wherein depositing an instruction to inject the broadcast data into the receive buffer of a subsequent target node further comprises: packetizing the instruction into one or more instruction packets; and enabling, in a header of the instruction packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the instruction packets to store the instruction packets and perform another DMA operation.
11. The apparatus of claim 7 wherein establishing within the network a topology further comprises: defining the compute nodes in the network topology; and initializing, by each compute node, a DMA data descriptor, the DMA data descriptor comprising: the compute node's receive buffer; the compute node's broadcast data length memory region; and a well-known memory location of another compute node's receive buffer.
12. The apparatus of claim 11 further comprising computer program instructions capable of depositing, by the target node, the broadcast data in the subsequent target node's receive buffer including performing a DMA operation with the well-known memory location of the subsequent target node's receive buffer included in the target node's DMA data descriptor and in accordance with the broadcast data length size stored in the target node's broadcast data length memory region.
13. A computer program product for effecting hardware acceleration of broadcast operations in a parallel computer, the parallel computer comprising a plurality of compute nodes organized for collective operations via a data communications network, each compute node having a receive buffer, the computer program product disposed in a computer readable storage medium, the computer program product comprising computer program instructions capable of: establishing a network topology for the data communications network; selecting, in dependence upon the network topology, a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length for the target node, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node.
14. The computer program product of claim 13 wherein depositing broadcast data further comprises: packetizing the broadcast data into one or more broadcast data packets; and enabling, in a header of the broadcast data packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data packet to store a copy of the broadcast data packet.
15. The computer program product of claim 13 wherein depositing a length of the broadcast data further comprises: packetizing the length of the broadcast data into one or more broadcast data length packets; and enabling, in a header of the broadcast data length packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the broadcast data length packet to store a copy of the broadcast data length packet.
16. The computer program product of claim 13 wherein depositing an instruction to inject the broadcast data into the receive buffer of a subsequent target node further comprises: packetizing the instruction into one or more instruction packets; and enabling, in a header of the instruction packets, a deposit flag, the enabled deposit flag representing an instruction to any compute node receiving the instruction packets to store the instruction packets and perform another DMA operation.
17. The computer program product of claim 13 wherein establishing within the network a topology further comprises: defining the compute nodes in the network topology; and initializing, by each compute node, a DMA data descriptor, the DMA data descriptor comprising: the compute node's receive buffer; the compute node's broadcast data length memory region; and a well-known memory location of another compute node's receive buffer.
18. The computer program product of claim 17 further comprising computer program instructions capable of depositing, by the target node, the broadcast data in the subsequent target node's receive buffer including performing a DMA operation with the well-known memory location of the subsequent target node's receive buffer included in the target node's DMA data descriptor and in accordance with the broadcast data length size stored in the target node's broadcast data length memory region.
19. The computer program product of claim 13, wherein the computer readable storage medium comprises a transmission medium.
20. The computer program product of claim 13, wherein the computer readable storage medium comprises a recordable medium.
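Claims 2, 3, 5 and 6 add packet-level detail. The data and its length are split into packets whose header carries a deposit flag telling every receiving node to keep a copy, and each node holds a DMA data descriptor naming its own receive buffer, its length region, and a peer's well-known receive buffer. The Python sketch below illustrates those pieces under stated assumptions: the packet fields, `MAX_PAYLOAD`, the 8-byte little-endian length encoding (`LEN_FMT`), the flat per-node memory layout, and the names `Packet`, `DMADescriptor`, and `ComputeNode` are hypothetical choices for illustration. Instruction packets (claim 4) are left out.

```python
import struct
from dataclasses import dataclass
from typing import List

MAX_PAYLOAD = 4  # hypothetical (tiny) payload size so packetization is visible
LEN_FMT = "<Q"   # hypothetical encoding of the broadcast length: 8-byte little-endian


@dataclass(frozen=True)
class Packet:
    kind: str      # hypothetical header field: "data" or "length"
    offset: int    # byte offset of this payload within the whole message
    deposit: bool  # deposit flag: every receiving node stores a copy
    payload: bytes


def packetize(kind: str, message: bytes, deposit: bool = True) -> List[Packet]:
    """Split a message into packets, setting the deposit flag in each header."""
    return [Packet(kind, off, deposit, message[off:off + MAX_PAYLOAD])
            for off in range(0, len(message), MAX_PAYLOAD)]


@dataclass
class DMADescriptor:
    recv_addr: int       # this node's receive buffer (offset into its memory)
    len_addr: int        # this node's broadcast data length region
    peer_recv_addr: int  # well-known receive-buffer location on another node


class ComputeNode:
    def __init__(self, desc: DMADescriptor, mem_size: int = 64) -> None:
        self.desc = desc
        self.mem = bytearray(mem_size)  # stand-in for the node's DMA-addressable memory

    def receive(self, packets: List[Packet]) -> None:
        """Store a copy of every packet whose deposit flag is enabled."""
        for p in packets:
            if not p.deposit:
                continue  # flag cleared: this node keeps no copy
            base = self.desc.recv_addr if p.kind == "data" else self.desc.len_addr
            start = base + p.offset
            self.mem[start:start + len(p.payload)] = p.payload

    def broadcast_length(self) -> int:
        return struct.unpack_from(LEN_FMT, self.mem, self.desc.len_addr)[0]

    def forward_to(self, peer: "ComputeNode") -> None:
        """Claim 6 style hop: copy broadcast_length() bytes to the peer's well-known buffer."""
        n = self.broadcast_length()
        src, dst = self.desc.recv_addr, self.desc.peer_recv_addr
        peer.mem[dst:dst + n] = self.mem[src:src + n]
        struct.pack_into(LEN_FMT, peer.mem, peer.desc.len_addr, n)
```

In this sketch every node shares one descriptor layout (for example, receive buffer at offset 0 and length region at offset 48), which is what makes a peer's receive buffer "well known": a node can compute where to write without any handshake.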