IPC Classification Information
Country / Type |
United States (US) Patent
Registered
|
International Patent Classification (IPC, 7th edition) |
|
Application number |
UP-0737209
(2007-04-19)
|
Registration number |
US-7752421
(2010-07-26)
|
Inventor / Address |
- Archer, Charles J.
- Peters, Amanda
- Ricard, Gary R.
- Sidelnik, Albert
- Smith, Brian E.
|
Applicant / Address |
- International Business Machines Corporation
|
Agent / Address |
|
Citation information |
Times cited: 1
Cited patents: 18 |
Abstract
A parallel-prefix broadcast for a parallel-prefix operation on a parallel computer includes: configuring, on each node, a parallel-prefix contribution buffer for storing the node's parallel-prefix contribution; configuring, on each node, a parallel-prefix results buffer for storing results of an operation, the results buffer having a position for each node that corresponds to the node's rank; and repeatedly for each position in the results buffer: processing in parallel by each node, including: determining, by the node, whether the current position in the results buffer is to include the node's contribution, if the current position is not to include the contribution, contributing the identity element, and if the current position is to include the contribution, contributing the contribution, performing, by each node, the operation using the contributed identity elements and the contributed contributions, yielding a result from the operation, and storing, by each node, the result in the position in the results buffer.
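The loop described in the abstract can be sketched as a small single-process simulation. This is an illustration under stated assumptions, not the patented implementation: ranks are assumed to start at 0, the inclusion test `position >= rank` is taken from claim 3, and the hypothetical function `parallel_prefix_broadcast` uses an ordinary in-memory reduction as a stand-in for the global combining network.

```python
from functools import reduce
import operator


def parallel_prefix_broadcast(contributions, op, identity):
    """Simulate the scheme for nodes ranked 0..n-1 (hypothetical helper).

    contributions[rank] is the value held in that node's contribution buffer.
    Returns one results buffer per node.
    """
    n = len(contributions)
    # One results buffer per node, with one position per rank.
    results = [[None] * n for _ in range(n)]
    for position in range(n):
        # Each node decides on its own what to contribute for this position:
        # its real contribution if the position should include it,
        # otherwise the identity element of the operation.
        contributed = [
            contributions[rank] if position >= rank else identity
            for rank in range(n)
        ]
        # Stand-in for the combining network: one reduction over all inputs.
        combined = reduce(op, contributed, identity)
        # Every node stores the same combined result at this position.
        for rank in range(n):
            results[rank][position] = combined
    return results


buffers = parallel_prefix_broadcast([3, 1, 4, 1, 5], operator.add, 0)
print(buffers[0])  # prefix sums as seen by the node with rank 0
```

In this simulation every node finishes with an identical results buffer holding all prefix results, which is what distinguishes the broadcast form from a scan where each node receives only its own prefix.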
Representative claims
What is claimed is:

1. A method for parallel-prefix broadcast for a parallel-prefix operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the plurality of compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the at least one operational group assigned a unique rank, and the at least one operational group coupled for data communications through a global combining network, the method comprising: configuring, on each ranked compute node, a parallel-prefix contribution buffer for storing a parallel-prefix contribution of the ranked compute node; configuring, on each ranked compute node, a parallel-prefix results buffer for storing results of a parallel-prefix operation, the parallel-prefix results buffer having a position for each compute node that corresponds to the rank of the compute node; and repeatedly for each position in the parallel-prefix results buffer: processing in parallel by each ranked compute node in the at least one operational group, including: determining, by the ranked compute node, whether a current position in the parallel-prefix results buffer is to include a contribution of the ranked compute node, if the current position in the parallel-prefix results buffer is not to include the contribution of the ranked compute node, contributing an identity element for the parallel-prefix operation, and if the current position in the parallel-prefix results buffer is to include the contribution of the ranked compute node, contributing the parallel-prefix contribution of the ranked compute node for the parallel-prefix operation, performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation, and storing, by each ranked compute node, the result in the position in the parallel-prefix results buffer.

2. The method of claim 1 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to not include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than the rank of the compute node.

3. The method of claim 1 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than or equal to the rank of the compute node.

4. The method of claim 1 wherein performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation further comprises performing the parallel-prefix operation with an arithmetic logic unit (‘ALU’) of a global combining network adapter for the global combining network.

5. The method of claim 1 wherein contributing the identity element for the parallel-prefix operation further comprises injecting the identity element from dedicated hardware of the compute node.

6. The method of claim 1 further comprising configuring, by each ranked compute node, a global combining network adapter for the global combining network in dependence upon the parallel-prefix operation.

7. A parallel computer for parallel-prefix broadcast for a parallel-prefix operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the plurality of compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the at least one operational group assigned a unique rank, and the at least one operational group coupled for data communications through a global combining network, the parallel computer comprising computer processors, computer memory operatively coupled to the computer processors, the computer memory having disposed within it computer program instructions capable of: configuring, on each ranked compute node, a parallel-prefix contribution buffer for storing a parallel-prefix contribution of the ranked compute node; configuring, on each ranked compute node, a parallel-prefix results buffer for storing results of a parallel-prefix operation, the parallel-prefix results buffer having a position for each compute node that corresponds to the rank of the compute node; and repeatedly for each position in the parallel-prefix results buffer: processing in parallel by each ranked compute node in the at least one operational group, including: determining, by the ranked compute node, whether a current position in the parallel-prefix results buffer is to include a contribution of the ranked compute node, if the current position in the parallel-prefix results buffer is not to include the contribution of the ranked compute node, contributing an identity element for the parallel-prefix operation, and if the current position in the parallel-prefix results buffer is to include the contribution of the ranked compute node, contributing the parallel-prefix contribution of the ranked compute node for the parallel-prefix operation, performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation, and storing, by each ranked compute node, the result in the position in the parallel-prefix results buffer.

8. The parallel computer of claim 7 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to not include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than the rank of the compute node.

9. The parallel computer of claim 7 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than or equal to the rank of the compute node.

10. The parallel computer of claim 7 wherein performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation further comprises performing the parallel-prefix operation with an arithmetic logic unit (‘ALU’) of a global combining network adapter for the global combining network.

11. The parallel computer of claim 7 wherein contributing the identity element for the parallel-prefix operation further comprises injecting the identity element from dedicated hardware of the compute node.

12. The parallel computer of claim 7 wherein the computer memory also has disposed within it computer program instructions capable of configuring, by each ranked compute node, a global combining network adapter for the global combining network in dependence upon the parallel-prefix operation.

13. A computer program product for parallel-prefix broadcast for a parallel-prefix operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the plurality of compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the at least one operational group assigned a unique rank, and the at least one operational group coupled for data communications through a global combining network, the computer program product disposed upon a signal bearing recordable medium, the computer program product comprising computer program instructions capable of: configuring, on each ranked compute node, a parallel-prefix contribution buffer for storing a parallel-prefix contribution of the ranked compute node; configuring, on each ranked compute node, a parallel-prefix results buffer for storing results of a parallel-prefix operation, the parallel-prefix results buffer having a position for each compute node that corresponds to the rank of the compute node; and repeatedly for each position in the parallel-prefix results buffer: processing in parallel by each ranked compute node in the at least one operational group, including: determining, by the ranked compute node, whether a current position in the parallel-prefix results buffer is to include a contribution of the ranked compute node, if the current position in the parallel-prefix results buffer is not to include the contribution of the ranked compute node, contributing an identity element for the parallel-prefix operation, and if the current position in the parallel-prefix results buffer is to include the contribution of the ranked compute node, contributing the parallel-prefix contribution of the ranked compute node for the parallel-prefix operation, performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation, and storing, by each ranked compute node, the result in the position in the parallel-prefix results buffer.

14. The computer program product of claim 13 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to not include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than the rank of the compute node.

15. The computer program product of claim 13 wherein determining, by the ranked compute node, whether the current position in the parallel-prefix results buffer is to include a contribution by the compute node further comprises determining whether the current position of the parallel prefix results buffer is greater than or equal to the rank of the compute node.

16. The computer program product of claim 13 wherein performing, by each ranked compute node, the parallel-prefix operation using the contributed identity elements and the contributed parallel-prefix contributions, yielding a result from the parallel-prefix operation further comprises performing the parallel-prefix operation with an arithmetic logic unit (‘ALU’) of a global combining network adapter for the global combining network.

17. The computer program product of claim 13 wherein contributing the identity element for the parallel-prefix operation further comprises injecting the identity element from dedicated hardware of the compute node.

18. The computer program product of claim 13 further comprising computer program instructions capable of configuring, by each ranked compute node, a global combining network adapter for the global combining network in dependence upon the parallel-prefix operation.
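The claims rely on each operation having an identity element, so that a node excluded from a position can still take part in the collective step without changing its result. The sketch below illustrates that property for a few common operations. The table `OPERATIONS` and the helpers `combine` and `masked_combine` are hypothetical names chosen for this example, and the set of operations is an assumption; the record does not list which operations the combining network's ALU supports.

```python
from functools import reduce
import operator

# Assumed example operations, each paired with its identity element.
OPERATIONS = {
    "sum": (operator.add, 0),
    "product": (operator.mul, 1),
    "max": (max, float("-inf")),
    "min": (min, float("inf")),
    "bitwise_or": (operator.or_, 0),
    "bitwise_and": (operator.and_, -1),  # -1 has every bit set in Python ints
}


def combine(name, values):
    """Reduce only the given values with the named operation."""
    op, identity = OPERATIONS[name]
    return reduce(op, values, identity)


def masked_combine(name, values, position):
    """Reduce over all ranks, replacing excluded ranks with the identity."""
    op, identity = OPERATIONS[name]
    masked = [
        value if rank <= position else identity
        for rank, value in enumerate(values)
    ]
    return reduce(op, masked, identity)


values = [6, 3, 5, 2]
for name in OPERATIONS:
    for position in range(len(values)):
        # Padding with the identity gives the same answer as leaving
        # the excluded ranks out of the reduction entirely.
        assert masked_combine(name, values, position) == combine(
            name, values[: position + 1]
        )
```

The choice of identity depends on the operation, which is consistent with claims 6, 12, and 18 describing the adapter as configured in dependence upon the parallel-prefix operation.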