Developing collective operations for a parallel computer
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-009/44
G06F-009/54
Application number
US-0369451
(2012-02-09)
Registration number
US-9495135
(2016-11-15)
Inventors / Address
Archer, Charles J.
Carey, James E.
Sanders, Philip J.
Smith, Brian E.
Applicant / Address
International Business Machines Corporation
Agent / Address
Kennedy, Brandon C.
Citation information
Times cited: 0
Cited patents: 103
Abstract
Developing collective operations for a parallel computer that includes compute nodes includes: presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer; receiving, by the collective development tool from the collective developer through the GUI, a selection of one or more collective primitives; receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives and a specification of input and output buffers for each collective primitive; and generating, by the collective development tool in dependence upon the selection of collective primitives, the serial order of the collective primitives, and the input and output buffers for each collective primitive, executable code that carries out the collective operation specified by the collective primitives.
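The inputs the abstract describes (a selection of primitives, a serial order, and an input and output buffer per primitive) can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the patented tool: `PrimitiveSpec`, `order_primitives`, `describe_plan`, and the buffer names are hypothetical, and the sketch produces a textual plan rather than executable code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PrimitiveSpec:
    """One selected collective primitive with its buffers (hypothetical type)."""
    name: str
    input_buffer: str
    output_buffer: str


def order_primitives(selected: Dict[str, PrimitiveSpec],
                     serial_order: List[str]) -> List[PrimitiveSpec]:
    """Arrange the selected primitives in the developer's serial order."""
    unknown = [name for name in serial_order if name not in selected]
    if unknown:
        raise ValueError(f"order names unselected primitives: {unknown}")
    missing = [name for name in selected if name not in serial_order]
    if missing:
        raise ValueError(f"selected primitives missing from order: {missing}")
    return [selected[name] for name in serial_order]


def describe_plan(plan: List[PrimitiveSpec]) -> List[str]:
    """Render the ordered plan as numbered lines, one per primitive."""
    return [
        f"{position}: {spec.name}({spec.input_buffer} -> {spec.output_buffer})"
        for position, spec in enumerate(plan, start=1)
    ]


if __name__ == "__main__":
    # Assumed example: combine partial values, then send the result to a group.
    selected = {
        "multi-cast": PrimitiveSpec("multi-cast", "sum", "result"),
        "multi-combine": PrimitiveSpec("multi-combine", "partials", "sum"),
    }
    for line in describe_plan(order_primitives(selected, ["multi-combine", "multi-cast"])):
        print(line)
```

Chaining the output buffer of one primitive to the input buffer of the next, as in the example, is a choice made for this sketch; the abstract only requires that each primitive have both buffers specified.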
Representative claims
1. An apparatus for developing collective operations for a parallel computer comprising a plurality of compute nodes, the apparatus comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of:
presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer;
receiving, by the collective development tool from the collective developer through the GUI, a selection of a plurality of collective primitives;
receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives;
receiving, by the collective development tool from the collective developer through the GUI, a specification of an input buffer and an output buffer for each collective primitive; and
wherein receiving, by the collective development tool from the collective developer through the GUI, the selection of the plurality of collective primitives further comprises:
detecting, by the collective development tool through the GUI, user input device activity that indicates a selection of one or more graphical icons related to the collective primitives;
generating, by a collective development tool in dependence upon the selection of the plurality of graphical icons related to collective primitives, the serial order of the plurality of collective primitives specifying an order of execution of the plurality of collective primitives, and the specification of an input buffer and an output buffer for each collective primitive, executable code that carries out the collective operation specified by the collective primitives, including:
converting the serial order of the collective primitives into an execution order of the plurality of collective primitive modules of computer program instructions;
inserting, into an executable file, the plurality of collective primitive modules of computer program instructions in the execution order; and
inserting, into the executable file, one or more glue modules between the plurality of collective primitive modules, wherein each glue module is a module of computer program instructions configured to be inserted between two collective primitive modules for the purpose of linking the two collective primitive modules during execution of the collective operation, wherein each glue module is selected in dependence upon attributes of the parallel computer upon which the collective operation is to be carried out, including respective collective primitives, respective network topologies, compute node architecture, and number of compute nodes in a computational group.

2. The apparatus of claim 1 wherein at least one collective primitive comprises: a multi-sync primitive that, when executed, carries out synchronization among a plurality of compute nodes.

3. The apparatus of claim 1 wherein at least one collective primitive comprises: a multi-cast primitive that, when executed, sends a message to a group of nodes in parallel.

4. The apparatus of claim 1 wherein at least one collective primitive comprises: a multi-combine primitive that, when executed, performs an operation on data received from more than one compute node.

5. The apparatus of claim 1 wherein at least one collective primitive comprises: a many-to-many primitive that, when executed, sends unique data to a group of compute nodes and receives data from another group of compute nodes.

6. A computer program product for developing collective operations for a parallel computer comprising a plurality of compute nodes, the computer program product disposed upon a computer readable storage medium, wherein the computer program product is not a signal, the computer program product comprising computer program instructions that, when executed, cause a computer to carry out the steps of:
presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer;
receiving, by the collective development tool from the collective developer through the GUI, a selection of a plurality of collective primitives;
receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives;
receiving, by the collective development tool from the collective developer through the GUI, a specification of an input buffer and an output buffer for each collective primitive; and
wherein receiving, by the collective development tool from the collective developer through the GUI, the selection of the plurality of collective primitives further comprises:
detecting, by the collective development tool through the GUI, user input device activity that indicates a selection of one or more graphical icons related to the collective primitives;
generating, by a collective development tool in dependence upon the selection of the plurality of graphical icons related to collective primitives, the serial order of the plurality of collective primitives specifying an order of execution of the plurality of collective primitives, and the specification of an input buffer and an output buffer for each collective primitive, executable code that carries out the collective operation specified by the collective primitives, including:
converting the serial order of the collective primitives into an execution order of the plurality of collective primitive modules of computer program instructions;
inserting, into an executable file, the plurality of collective primitive modules of computer program instructions in the execution order; and
inserting, into the executable file, one or more glue modules between the plurality of collective primitive modules, wherein each glue module is a module of computer program instructions configured to be inserted between two collective primitive modules for the purpose of linking the two collective primitive modules during execution of the collective operation, wherein each glue module is selected in dependence upon attributes of the parallel computer upon which the collective operation is to be carried out, including respective collective primitives, respective network topologies, compute node architecture, and number of compute nodes in a computational group.

7. The computer program product of claim 6 wherein at least one collective primitive comprises: a multi-sync primitive that, when executed, carries out synchronization among a plurality of compute nodes.

8. The computer program product of claim 6 wherein at least one collective primitive comprises: a multi-cast primitive that, when executed, sends a message to a group of nodes in parallel.

9. The computer program product of claim 6 wherein at least one collective primitive comprises: a multi-combine primitive that, when executed, performs an operation on data received from more than one compute node.

10. The computer program product of claim 6 wherein at least one collective primitive comprises: a many-to-many primitive that, when executed, sends unique data to a group of compute nodes and receives data from another group of compute nodes.
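The glue-module step recited in claims 1 and 6 can be sketched as a lookup keyed on the attributes the claims list: the two adjacent primitives, the network topology, the compute node architecture, and the number of compute nodes. The following is a hedged illustration only. The rule table, the `max_nodes` threshold scheme, the attribute values, and every name (`MachineAttributes`, `GlueRule`, `select_glue`, `assemble`) are assumptions made for this sketch; the claims do not specify how the selection is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MachineAttributes:
    """Attributes of the target parallel computer (illustrative values only)."""
    topology: str
    architecture: str
    node_count: int


@dataclass(frozen=True)
class GlueRule:
    """Hypothetical rule: which glue module links `before` to `after`."""
    before: str
    after: str
    topology: str
    architecture: str
    max_nodes: int      # rule applies when node_count <= max_nodes
    glue_module: str


def select_glue(rules: List[GlueRule], before: str, after: str,
                machine: MachineAttributes) -> str:
    """Return the glue module of the first rule matching all attributes."""
    for rule in rules:
        if (rule.before == before and rule.after == after
                and rule.topology == machine.topology
                and rule.architecture == machine.architecture
                and machine.node_count <= rule.max_nodes):
            return rule.glue_module
    raise LookupError(f"no glue module for {before} -> {after} on {machine}")


def assemble(execution_order: List[str], rules: List[GlueRule],
             machine: MachineAttributes) -> List[str]:
    """Lay out primitive modules in execution order with glue between each pair."""
    layout: List[str] = []
    for index, primitive in enumerate(execution_order):
        if index > 0:
            layout.append(
                select_glue(rules, execution_order[index - 1], primitive, machine))
        layout.append(primitive)
    return layout


if __name__ == "__main__":
    rules = [
        GlueRule("multi-sync", "multi-cast", "torus", "arch-a", 1024, "glue_small"),
        GlueRule("multi-sync", "multi-cast", "torus", "arch-a", 65536, "glue_large"),
    ]
    machine = MachineAttributes("torus", "arch-a", 4096)
    print(assemble(["multi-sync", "multi-cast"], rules, machine))
```

With the assumed rules, a 4096-node machine skips the first rule (its limit is 1024 nodes) and takes the second, so the layout places `glue_large` between the two primitive modules.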
Patents cited by this patent (103)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E., Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Rhoades, John; Cameron, Ken; Winser, Paul; McConnell, Ray; Faulds, Gordon; McIntosh-Smith, Simon; Spencer, Anthony; Bond, Jeff; Dejaegher, Matthias; Halamish, Danny; Panesar, Gajinder, Data processing architectures for packet handling wherein batches of data packets of unpredictable size are distributed across processing elements arranged in a SIMD array operable to process different respective packet protocols at once while executing a single common instruction stream.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Woods, Randy D.; Dupree, Wayne P.; Jachim, David M.; Verniers, Gerrit H.; Churchill, Stephen G.; Fernandez, George P., Distributed computing environment using real-time scheduling logic and time deterministic architecture.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routing automata for each dimension of the net.
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E., Locating hardware faults in a data communications network of a parallel computer.
Blumrich, Matthias A.; Chen, Dong; Chiu, George L.; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Kopcsay, Gerard V.; Mok, Lawrence S.; Takken, Todd E., Massively parallel supercomputer.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Rangarajan, Vijay; Maniyar, Shyamsundar N.; Eatherton, William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rangarajan,Vijay; Maniyar,Shyamsundar N.; Eatherton,William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rodgers,Dion; Marr,Deborah T.; Hill,David L.; Kaushik,Shiv; Crossland,James B.; Koufaty,David A., Method and apparatus for suspending execution of a thread until a specified memory access occurs.
Archer, Charles J.; Carey, James E.; Markland, Matthew W.; Sanders, Philip J., Monitoring operating parameters in a distributed computing system with active messages.
Birrittella Mark S. ; Kessler Richard E. ; Oberlin Steven M. ; Passint Randal S. ; Thorson Greg, Multiprocessor computer system with interleaved processing element nodes.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E., Performing a scatterv operation on a hierarchical tree network optimized for collective operations.
VanHuben Gary Alan ; Blake Michael A. ; Mak Pak-kin, SMP clusters with remote resource managers for distributing work to other clusters while reducing bus traffic to a minimum.
Padmanabha I. Venkitakrishnan ; Gopalakrishnan Janakiraman ; Tsen-Gong Jim Hsu ; Rajendra Kumar, Scalable system control unit for distributed shared memory multi-processor systems.
Kil, David H.; Pottschmidt, David B., System and method for automatic generation of a hierarchical tree network and the use of two complementary learning algorithms, optimized for each leaf of the hierarchical tree network.
Papakipos, Matthew N.; Grant, Brian K.; McGuire, Morgan S.; Demetriou, Christopher G., Systems and methods for determining compute kernels for an application in a parallel-processing computer system.
Mahesh N. Ganmukhi ; Jeffrey V. Hill ; Monica C. Wong-Chan ; David C. Douglas, Tree network including arrangement for establishing sub-tree having a logical root below the network's physical root.