Optimizing collective operations including receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation.
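The fallback control flow summarized above — each node selects an optimized implementation of the collective, and a node that discovers a missing resource notifies the others so the group can fall back to the next implementation on the list — can be sketched roughly as follows. This is an illustrative Python sketch, not the patent's implementation; all names (`Node`, `perform_collective`, the check/run pairs) are invented for the example.

```python
class ResourceUnavailable(Exception):
    """A needed resource (e.g. a class route or an adjacent link) is missing."""

class Node:
    """Minimal stand-in for one compute node in the operational group."""
    def __init__(self, name):
        self.name = name
        self.notifications = []          # op types we told our peers about

    def notify_peers_resource_unavailable(self, op_type):
        # In a real machine this would message the other nodes in the group.
        self.notifications.append(op_type)

def perform_collective(op_type, node, optimized_ops):
    """Try each optimized implementation of the collective in list order.

    optimized_ops is a list of (check_resources, run) pairs ordered from
    most to least optimized; on a missing resource the node notifies its
    peers and falls back to the next pair.
    """
    for check_resources, run in optimized_ops:
        try:
            check_resources(node)        # e.g. validate the class route
        except ResourceUnavailable:
            node.notify_peers_resource_unavailable(op_type)
            continue                     # select the next optimized operation
        return run(node)                 # perform the selected operation
    raise RuntimeError("no viable implementation for " + op_type)

# Usage: a hardware broadcast whose class route is invalid falls back
# to a plain software broadcast.
def check_hw(node):
    raise ResourceUnavailable("invalid class route")

def run_hw(node):
    return "hw-broadcast"

def check_sw(node):
    pass                                 # software path needs no special resource

def run_sw(node):
    return "sw-broadcast"

node = Node("rank0")
result = perform_collective("broadcast", node,
                            [(check_hw, run_hw), (check_sw, run_sw)])
# result == "sw-broadcast"; node.notifications == ["broadcast"]
```

The ordering of the list encodes the preference among implementations; in the claims every node runs the same selection, so all nodes converge on the same fallback once the unavailability notification is delivered.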
Representative Claim
1. A method of optimizing collective operations by an operational group on a parallel computer, wherein the operational group comprises a plurality of compute nodes, the method comprising: receiving, by each of the nodes in the operational group, an instruction to perform a collective operation type; selecting, by each of the nodes in the operational group from a list of optimized collective operations, an optimized collective operation for the collective operation type; performing, by each of the nodes in the operational group, the selected optimized collective operation; determining, by one or more of the nodes in the operational group, whether a resource needed by the one or more nodes to perform the collective operation is not available; if a resource needed by the one or more nodes to perform the collective operation is not available: notifying, by one or more of the nodes in the operational group, the other nodes that the resource is not available; selecting, by each of the nodes in the operational group from the list of optimized collective operations, a next optimized collective operation; and performing, by each of the nodes in the operational group, the next optimized collective operation.

2. The method of claim 1 wherein determining, by one or more of the nodes in the operational group, whether a resource needed by the one or more nodes to perform the collective operation is not available further comprises identifying an invalid class route.

3. The method of claim 1 wherein determining, by one or more of the nodes in the operational group, whether a resource needed by the one or more nodes to perform the collective operation is not available further comprises identifying a link failure on a link adjacent to the one or more nodes.

4. The method of claim 1 wherein the selected optimized collective operation further comprises an in-place operation; performing the selected optimized collective operation includes copying the contents of the source buffer before performing the optimized collective operation; and selecting a next optimized collective operation further comprises restoring the copied contents to the source buffer.

5. The method of claim 1 wherein one or more of the nodes of the operational group supports transactional memory; performing the collective operation includes beginning a transaction but not committing the transaction; and selecting a next optimized collective operation further comprises starting a new transaction without committing the previous transaction.

6. The method of claim 1 wherein the selected optimized collective operation further comprises an in-place operation; and performing the selected optimized collective operation includes blocking until receiving a notification that the operational group is valid.

7. The method of claim 1 wherein the parallel computer comprises: a plurality of compute nodes; a first data communications network coupling the compute nodes for data communications and optimized for point-to-point data communications; and a second data communications network that includes data communications links coupling the compute nodes so as to organize the compute nodes as a tree, each compute node having a separate arithmetic logic unit ('ALU') dedicated to parallel operations.

8. The method of claim 1 wherein the parallel computer comprises a plurality of compute nodes and wherein the compute nodes comprise: a host computer having a host computer architecture; and an accelerator having an accelerator architecture, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerator adapted to one another for data communications by a system-level message passing module.

9. An apparatus for optimizing collective operations by an operational group on a parallel computer, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions for: receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation.

10. The apparatus of claim 9 wherein the computer program instructions for determining whether a resource needed by the one or more nodes to perform the collective operation is not available further comprise computer program instructions for identifying an invalid class route.

11. The apparatus of claim 9 wherein the computer program instructions for determining whether a resource needed by the one or more nodes to perform the collective operation is not available further comprise computer program instructions for identifying a link failure on a link adjacent to the one or more nodes.

12. The apparatus of claim 9 wherein the selected optimized collective operation further comprises an in-place operation; the computer program instructions for performing the selected optimized collective operation include computer program instructions for copying the contents of the source buffer before performing the optimized collective operation; and the computer program instructions for selecting a next optimized collective operation further comprise computer program instructions for restoring the copied contents to the source buffer.

13. The apparatus of claim 9 wherein one or more of the nodes of the operational group supports transactional memory; the computer program instructions for performing the collective operation further comprise beginning a transaction but not committing the transaction; and the computer program instructions for selecting a next optimized collective operation further comprise starting a new transaction without committing the previous transaction.

14. The apparatus of claim 9 wherein the parallel computer comprises: a plurality of compute nodes; a first data communications network coupling the compute nodes for data communications and optimized for point-to-point data communications; and a second data communications network that includes data communications links coupling the compute nodes so as to organize the compute nodes as a tree, each compute node having a separate arithmetic logic unit ('ALU') dedicated to parallel operations.

15. The apparatus of claim 9 wherein the parallel computer comprises a plurality of compute nodes and wherein the compute nodes comprise: a host computer having a host computer architecture; and an accelerator having an accelerator architecture, the accelerator architecture optimized, with respect to the host computer architecture, for speed of execution of a particular class of computing functions, the host computer and the accelerator adapted to one another for data communications by a system-level message passing module.

16. A computer program product for optimizing collective operations by an operational group on a parallel computer, the computer program product disposed in a non-transitory computer-readable storage medium, the computer program product comprising computer program instructions for: receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation.

17. The computer program product of claim 16 wherein the computer program instructions for determining whether a resource needed by the one or more nodes to perform the collective operation is not available further comprise computer program instructions for identifying an invalid class route.

18. The computer program product of claim 16 wherein the computer program instructions for determining whether a resource needed by the one or more nodes to perform the collective operation is not available further comprise computer program instructions for identifying a link failure on a link adjacent to the one or more nodes.

19. The computer program product of claim 16 wherein the selected optimized collective operation further comprises an in-place operation; the computer program instructions for performing the selected optimized collective operation further comprise computer program instructions for copying the contents of the source buffer before performing the optimized collective operation; and the computer program instructions for selecting a next optimized collective operation further comprise computer program instructions for restoring the copied contents to the source buffer.

20. The computer program product of claim 16 wherein one or more of the nodes of the operational group supports transactional memory; the computer program instructions for performing the collective operation further comprise beginning a transaction but not committing the transaction; and the computer program instructions for selecting a next optimized collective operation further comprise starting a new transaction without committing the previous transaction.
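The in-place variants (claims 4, 12 and 19) copy the source buffer before the attempt so that, if the operation must be abandoned and a fallback selected, the copied contents can be restored first. A rough Python sketch of that save/restore discipline follows; the names are invented for illustration, and the attempt callables are assumed to mutate the buffer in place and return False when a needed resource turns out to be unavailable.

```python
def in_place_with_fallback(buffer, attempts):
    """Run in-place operations over `buffer`, restoring the saved contents
    before each retry so every attempt starts from the original data.

    attempts: callables that mutate `buffer` in place and return True on
    success, or False if a needed resource was found to be unavailable.
    """
    saved = list(buffer)            # copy the source buffer contents first
    for attempt in attempts:
        if attempt(buffer):
            return buffer
        buffer[:] = saved           # restore copied contents, then fall back
    raise RuntimeError("all in-place implementations failed")

# Usage: the first attempt corrupts the buffer mid-operation and reports
# failure; the restore lets the second attempt see the original data.
def failing(buf):
    buf[0] = -999                   # partial in-place work...
    return False                    # ...then a link failure is detected

def doubling(buf):
    for i in range(len(buf)):
        buf[i] *= 2                 # simple stand-in for a collective step
    return True

data = [1, 2, 3]
in_place_with_fallback(data, [failing, doubling])
# data is now [2, 4, 6]: the restore undid the partial first attempt
```

Without the up-front copy, the half-finished first attempt would leave the buffer corrupted and the fallback implementation would compute over garbage; the snapshot makes each retry idempotent with respect to the source data.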
Patents Cited by This Patent (70)
Scott, Steven L.; Pribnow, Richard D.; Logghe, Peter G.; Kunkel, Daniel L.; Schwoerer, Gerald A., Adaptive congestion control mechanism for modular computer networks.
Kato, Sadayuki; Ishihata, Hiroaki; Horie, Takeshi; Inano, Satoshi; Shimizu, Toshiyuki, Data gathering/scattering system for a plurality of processors in a parallel computer.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Olivier, Michael, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher, Robert E.; Sanz, Jorge L. C., Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig, Charles M.; Seitz, Charles L., Inter-computer message routing system with each computer having separate routing automata for each dimension of the network.
Blumrich, Matthias A.; Chen, Dong; Chiu, George L.; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Kopcsay, Gerard V.; Mok, Lawrence S.; Takken, Todd E., Massively parallel supercomputer.
Carmichael, Richard D.; Ward, Joel M.; Winchell, Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable environment.
Rodgers, Dion; Marr, Deborah T.; Hill, David L.; Kaushik, Shiv; Crossland, James B.; Koufaty, David A., Method and apparatus for suspending execution of a thread until a specified memory access occurs.
Krishnamoorthy, Ashok V.; Kiamilev, Fouad, Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda, Yoshiko; Tanaka, Teruo, Parallel computer system using properties of messages to route them through an interconnect network and to select virtual channels.
Wilkinson, Paul Amba; Dieffenderfer, James Warren; Kogge, Peter Michael; Schoonover, Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
VanHuben, Gary Alan; Blake, Michael A.; Mak, Pak-kin, SMP clusters with remote resource managers for distributing work to other clusters while reducing bus traffic to a minimum.