Executing a gather operation on a parallel computer
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G06F-015/00
G06F-015/76
G06F-011/00
Application number
US-0754740
(2007-05-29)
Registration number
US-8140826
(2012-03-20)
Inventor / Address
Archer, Charles J.
Ratterman, Joseph D.
Applicant / Address
International Business Machines Corporation
Agent / Address
Biggers & Ohanian, LLP
Citation information
Times cited: 1
Patents cited: 44
Abstract
Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer for the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include, repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node; if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data; if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data; and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.
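The loop described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the function name `gather_via_or` is hypothetical, contributions are modeled as non-negative Python integers of equal width, and the global combining network is stood in for by an ordinary bitwise OR reduction over a list.

```python
from functools import reduce
from operator import or_


def gather_via_or(contributions):
    """Simulate a gather built from repeated bitwise OR reductions.

    contributions[r] is the (assumed integer) contribution data of the
    compute node with rank r. Returns the result buffer as the logical
    root would see it, one position per rank.
    """
    group_size = len(contributions)
    result_buffer = [0] * group_size  # configured by the logical root

    for position in range(group_size):
        # Every node takes part in every round: the node whose rank matches
        # the current position contributes its data, all others contribute 0.
        round_contributions = [
            data if rank == position else 0
            for rank, data in enumerate(contributions)
        ]
        # Stand-in for the global combining network: OR of all contributions.
        # Since x | 0 == x, only the matching node's data survives.
        result_buffer[position] = reduce(or_, round_contributions, 0)

    return result_buffer


if __name__ == "__main__":
    print(gather_via_or([0x11, 0x22, 0x33, 0x44]))
```

The sketch relies on zero being the identity element for bitwise OR, which is why the non-matching nodes can participate in every round without disturbing the value stored at the current position.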
Representative claims
1. A method for executing a gather operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, and one compute node assigned to be a logical root, the method comprising:
receiving, by each compute node, a control message indicating a beginning of the gather operation;
initializing, by each compute node, a position counter;
configuring, by the logical root, a result buffer for the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node;
repeatedly for each position in the result buffer:
determining, by each compute node of the operational group, whether a current position in the result buffer corresponds with the rank of the compute node, including: determining whether a current value of the position counter matches the rank of the compute node, identifying that the position in the result buffer corresponds with the rank of the compute node if the current value of the position counter matches the rank of the compute node, and identifying that the position in the result buffer does not correspond with the rank of the compute node if the current value of the position counter does not match the rank of the compute node, and incrementing the current value of the position counter;
if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data,
if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and
storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.

2. The method of claim 1 wherein contributing, by the compute node, a value of zero further comprises injecting the zero from dedicated hardware of the compute node.

3. The method of claim 1 wherein: a size of each position in the result buffer is the same; and a size of the contribution data of each compute node is the same.

4. The method of claim 1 wherein: a size of the contribution data of each compute node varies; and a size of each position in the result buffer matches the size of the contribution data of the compute node whose rank corresponds with the position.

5. The method of claim 1 wherein the parallel computer further comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree, each compute node having a separate ALU dedicated to parallel operations.

6. A parallel computer for executing a gather operation, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, and one compute node assigned to be a logical root, the parallel computer comprising computer processors, a computer memory operatively coupled to the computer processors, the computer memory having disposed within it computer program instructions capable of:
receiving, by each compute node, a control message indicating a beginning of the gather operation;
initializing, by each compute node, a position counter;
configuring, by the logical root, a result buffer for the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node;
repeatedly for each position in the result buffer:
determining, by each compute node of the operational group, whether a current position in the result buffer corresponds with the rank of the compute node, including: determining whether a current value of the position counter matches the rank of the compute node, identifying that the position in the result buffer corresponds with the rank of the compute node if the current value of the position counter matches the rank of the compute node, and identifying that the position in the result buffer does not correspond with the rank of the compute node if the current value of the position counter does not match the rank of the compute node, and incrementing the current value of the position counter;
if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data,
if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and
storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.

7. The parallel computer of claim 6 wherein computer program instructions capable of contributing, by the compute node, a value of zero further comprises computer program instructions capable of injecting the zero from dedicated hardware of the compute node.

8. The parallel computer of claim 6 wherein: a size of each position in the result buffer is the same; and a size of the contribution data of each compute node is the same.

9. The parallel computer of claim 6 wherein: a size of the contribution data of each compute node varies; and a size of each position in the result buffer matches the size of the contribution data of the compute node whose rank corresponds with the position.

10. The parallel computer of claim 6 wherein the parallel computer further comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree, each compute node having a separate ALU dedicated to parallel operations.

11. A computer program product for executing a gather operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, and one compute node assigned to be a logical root, the computer program product disposed upon a computer readable recordable medium, the computer program product comprising computer program instructions capable of:
receiving, by each compute node, a control message indicating a beginning of the gather operation;
initializing, by each compute node, a position counter;
configuring, by the logical root, a result buffer for the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node;
repeatedly for each position in the result buffer:
determining, by each compute node of the operational group, whether a current position in the result buffer corresponds with the rank of the compute node, including: determining whether a current value of the position counter matches the rank of the compute node, identifying that the position in the result buffer corresponds with the rank of the compute node if the current value of the position counter matches the rank of the compute node, and identifying that the position in the result buffer does not correspond with the rank of the compute node if the current value of the position counter does not match the rank of the compute node, and incrementing the current value of the position counter;
if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data,
if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and
storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.

12. The computer program product of claim 11 wherein computer program instructions capable of contributing, by the compute node, a value of zero further comprises computer program instructions capable of injecting the zero from dedicated hardware of the compute node.

13. The computer program product of claim 11 wherein: a size of each position in the result buffer is the same; and a size of the contribution data of each compute node is the same.

14. The computer program product of claim 11 wherein: a size of the contribution data of each compute node varies; and a size of each position in the result buffer matches the size of the contribution data of the compute node whose rank corresponds with the position.

15. The computer program product of claim 11 wherein the computer program product further comprises a data communications network that includes data communications links connected to the compute nodes so as to organize the compute nodes as a tree, each compute node having a separate ALU dedicated to parallel operations.
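The per-node position counter of the independent claims, combined with the variable-size case of claims 4, 9, and 14, might be modeled as in the following sketch. Everything named here is hypothetical (`ComputeNode`, `begin_gather`, `contribute`, `combine_or`, `gather`), contributions are modeled as `bytes`, and the sketch assumes every node already knows the size of each position; the claims do not say how sizes are communicated.

```python
class ComputeNode:
    """Hypothetical model of one ranked compute node."""

    def __init__(self, rank, data):
        self.rank = rank
        self.data = data  # contribution data as bytes (assumed representation)
        self.position_counter = 0

    def begin_gather(self):
        # Models receipt of the control message: initialize the counter.
        self.position_counter = 0

    def contribute(self, size):
        # Compare the counter with this node's rank, then increment it.
        matches = self.position_counter == self.rank
        self.position_counter += 1
        # Matching node sends its data; all others send `size` zero bytes.
        return self.data if matches else bytes(size)


def combine_or(chunks, size):
    """Stand-in for the combining network: bytewise OR of equal-size chunks."""
    combined = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            combined[i] |= byte
    return bytes(combined)


def gather(nodes):
    """Return the root's result buffer; nodes must be listed in rank order."""
    # Assumption: position sizes are known to all nodes in advance.
    sizes = [len(node.data) for node in nodes]
    for node in nodes:
        node.begin_gather()
    return [
        combine_or([node.contribute(size) for node in nodes], size)
        for size in sizes
    ]


if __name__ == "__main__":
    group = [ComputeNode(0, b"a"), ComputeNode(1, b"bcd"), ComputeNode(2, b"ef")]
    print(gather(group))
```

Because each node advances its own counter once per round, no node needs to be told which position is current; the counters stay in step as long as every node participates in every round.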
Patents cited by this patent (44)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routing automata for each dimension of the net.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.