IPC Classification Information
Country / Type | United States (US) Patent, Granted
Application No. | UP-0832918 (2007-08-02)
Registration No. | US-7827385 (2010-11-22)
Inventors / Address |
- Almasi, Gheorghe
- Archer, Charles J.
- Ratterman, Joseph D.
- Smith, Brian E.
Applicant / Address |
- International Business Machines Corporation
Citation Information | Times cited: 1; Patents cited: 28
Abstract
A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. The logical root configures a send buffer, and every other node configures a receive buffer. For each element of the logical root's contribution in the send buffer, the logical root contributes the element while every other node injects one or more zeros corresponding to the size of the element. All nodes then perform an allreduce operation with a bitwise OR using the element and the injected zeros. Because x OR 0 = x, the result equals the root's element, so the allreduce effectively broadcasts it; the result is stored in each receive buffer.
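The scheme in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the patented implementation: the names `or_bytes` and `broadcast_via_allreduce_or` are hypothetical, each element is modeled as a `bytes` object, and the allreduce is simulated by folding every rank's contribution in a loop. In the patent the combining happens in the global combining network (claim 3 places the OR on the network adapter's ALU).

```python
from functools import reduce


def or_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise OR of two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("contributions must have the same size")
    return bytes(x | y for x, y in zip(a, b))


def broadcast_via_allreduce_or(send_buffer, num_ranks, root):
    """Simulate the element-by-element broadcast-via-allreduce in one process.

    send_buffer: list of bytes objects (the logical root's contribution).
    Returns {rank: receive_buffer} for every rank except the root.
    """
    receive_buffers = {r: [] for r in range(num_ranks) if r != root}
    for element in send_buffer:
        # The root contributes the element; every other rank injects zeros
        # matching the element's size (bytes(n) is n zero bytes).
        contributions = [
            element if r == root else bytes(len(element))
            for r in range(num_ranks)
        ]
        # Allreduce with bitwise OR: since x | 0 == x, the result is the element.
        result = reduce(or_bytes, contributions)
        # Every rank sees the result; only the non-root ranks store it.
        for buf in receive_buffers.values():
            buf.append(result)
    return receive_buffers


buffers = broadcast_via_allreduce_or([b"\x01\x02", b"\xff\x00"], num_ranks=4, root=2)
print(buffers[0])  # [b'\x01\x02', b'\xff\x00']
```

One plausible reason to combine with a bitwise OR rather than an arithmetic sum is that OR acts on raw bits, so any data type passes through unchanged; a floating-point sum can alter some bit patterns (in IEEE 754, -0.0 + 0.0 yields +0.0).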
Representative Claims
What is claimed is:

1. A method for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root, the method comprising: configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce.

2. The method of claim 1 wherein injecting one or more zeros corresponding to a size of the element further comprises injecting one or more zeros from dedicated hardware of the compute node.

3. The method of claim 1 wherein performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros further comprises performing the bitwise OR with an arithmetic logic unit (‘ALU’) on a global combining network adapter for the global combining network.

4. The method of claim 1 further comprising: receiving by the logical root the result of the allreduce; and disregarding by the logical root the result of the allreduce.

5. The method of claim 1 further comprising configuring a class routing algorithm to prevent reception of the result of the allreduce by the logical root.

6. The method of claim 1 wherein the global combining network comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree.

7. A parallel computer for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root, the parallel computer comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions capable of: configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce.

8. The parallel computer of claim 7 wherein computer program instructions capable of: injecting one or more zeros corresponding to a size of the element further comprise computer program instructions capable of: injecting one or more zeros from dedicated hardware of the compute node.

9. The parallel computer of claim 7 wherein computer program instructions capable of: performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros further comprise computer program instructions capable of: performing the bitwise OR with an arithmetic logic unit (‘ALU’) on a global combining network adapter for the global combining network.

10. The parallel computer of claim 7 wherein the computer memory also has disposed within it computer program instructions capable of: receiving by the logical root the result of the allreduce; and disregarding by the logical root the result of the allreduce.

11. The parallel computer of claim 7 wherein the computer memory also has disposed within it computer program instructions capable of configuring a class routing algorithm to prevent reception of the result of the allreduce by the logical root.

12. The parallel computer of claim 7 wherein the global combining network comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree.

13. A computer program product for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root, the computer program product disposed upon a recordable computer readable medium, the computer program product comprising computer program instructions capable of: configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce.

14. The computer program product of claim 13 wherein computer program instructions capable of: injecting one or more zeros corresponding to a size of the element further comprise computer program instructions capable of: injecting one or more zeros from dedicated hardware of the compute node.

15. The computer program product of claim 13 wherein computer program instructions capable of: performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros further comprise computer program instructions capable of: performing the bitwise OR with an arithmetic logic unit (‘ALU’) on a global combining network adapter for the global combining network.

16. The computer program product of claim 13 wherein the recordable computer readable medium also has disposed within it computer program instructions capable of: receiving by the logical root the result of the allreduce; and disregarding by the logical root the result of the allreduce.

17. The computer program product of claim 13 wherein the recordable computer readable medium also has disposed within it computer program instructions capable of configuring a class routing algorithm to prevent reception of the result of the allreduce by the logical root.

18. The computer program product of claim 13 wherein the global combining network comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree.
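Claims 4 through 6 describe a tree-shaped global combining network in which the logical root may receive the result and disregard it. The following Python sketch models that under stated assumptions: the tree is laid out like a binary heap (rank i's parent is (i - 1) // 2), an illustrative choice rather than a topology the claims specify; the functions `tree_allreduce_or` and `broadcast_element` are hypothetical; and integers stand in for elements. It also shows that the logical root (here rank 3) need not be the node at the top of the tree (rank 0).

```python
def tree_allreduce_or(values):
    """OR-combine values up a binary tree, then broadcast the total back down.

    values[i] is rank i's contribution; rank i's parent is (i - 1) // 2
    (an assumed heap-style layout). Requires at least one rank.
    """
    partial = list(values)
    # Upward phase: visit ranks from the highest index down, so every child
    # has absorbed its own subtree before it is folded into its parent.
    for i in range(len(partial) - 1, 0, -1):
        partial[(i - 1) // 2] |= partial[i]
    total = partial[0]
    # Downward phase: the top of the tree sends the combined value to all ranks.
    return [total] * len(values)


def broadcast_element(element, num_ranks, root):
    """Broadcast one element from the logical root via an OR allreduce."""
    # Non-root ranks inject zero; the root contributes the element.
    contributions = [element if r == root else 0 for r in range(num_ranks)]
    results = tree_allreduce_or(contributions)
    # Claim 4 style: the root also receives the result but disregards it.
    return {r: results[r] for r in range(num_ranks) if r != root}


print(broadcast_element(0b1010, num_ranks=6, root=3))
# {0: 10, 1: 10, 2: 10, 4: 10, 5: 10}
```

Claim 5's alternative, in which class routing keeps the result from reaching the root at all, would correspond to skipping the root during the downward phase; this sketch does not model routing.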