Improving efficiency of a global barrier operation in a parallel computer
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-009/52
G06F-009/54
G06F-009/46
G06F-015/173
출원번호
US-0683726
(2012-11-21)
등록번호
US-9459934
(2016-10-04)
발명자
/ 주소
Archer, Charles J.
Blocksome, Michael A.
Ratterman, Joseph D.
Smith, Brian E.
출원인 / 주소
International Business Machines Corporation
대리인 / 주소
Kennedy, Brandon C.
인용정보
피인용 횟수 :
0인용 특허 :
105
초록▼
Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have
Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier.
대표청구항▼
1. A method of performing a global barrier operation in a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes coupled for data communications, each compute node executing a plurality of tasks, with one task on each compute node designated as a master t
1. A method of performing a global barrier operation in a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes coupled for data communications, each compute node executing a plurality of tasks, with one task on each compute node designated as a master task, the method comprising: for each task on each compute node until all master tasks have joined a global barrier:determining whether the task is a master task, wherein each task includes an indicator indicating whether the task is or is not a master task;if the task is not a master task, joining a single local barrier on a compute node of the plurality of compute nodes;if the task is a master task, joining both the single local barrier on the compute node and the global barrier only after all other tasks on the compute node have joined the single local barrier on the compute node; andwherein joining the single local barrier includes atomically incrementing a value of a counter, which tracks tasks that joined the single local barrier, and a number of times equivalent to a result of a difference between a total number of tasks joining the single local barrier and a replacement value, the replacement value comprising a power-of-two greater than or equal to the total number of tasks joining the single local barrier on the compute node. 2. The method of claim 1 wherein joining the single local barrier on the compute node further comprises: for each task on the compute node:retrieving a present value of the counter that tracks tasks that joined the single local barrier for the compute node;calculating, in dependence upon the present value of the counter and the total number of tasks joining the single local barrier on the compute node, a base value of the counter, the base value representing the counter's value prior to any task joining the single local barrier on the compute node;calculating, in dependence upon the base value and the total number of tasks joining the single local barrier on the compute node, a target value of the counter, the target value representing the counter's value when all tasks have joined the single local barrier on the compute node; andrepetitively, until the present value of the counter is no less than the target value of the counter: retrieving the present value of the counter and determining whether the present value equals the target value. 3. The method of claim 2 wherein: calculating the base value of the counter further comprises: calculating the base value as zero if the present value of the counter is less than the total number of tasks joining the single local barrier on the compute node, and calculating the base value as the difference between the present value of the counter and a remainder after division of the present value of the counter by the total number of tasks joining the single local barrier on the compute node, if the present value of the counter is not less than or equal to the total number of tasks; andcalculating the target value of the counter further comprises calculating the target value as the sum of the base value and the total number of tasks joining the single local barrier on the compute node. 4. The method of claim 2 wherein: calculating the base value of the counter further comprises: establishing the replacement value;establishing a bitmask, including calculating a bitwise inverse of one less than the replacement value; andcalculating the base value as a result of a bitwise AND operation with the bitmask and the present counter value; andcalculating the target value of the counter further comprises calculating the target value as the sum of the base value and the replacement value. 5. The method of claim 2 wherein calculating the base value of the counter includes establishing the replacement value. 6. The method of claim 2 wherein calculating the target value of the counter further comprises calculating the target value as the sum of the base value and the replacement value. 7. The method of claim 1 wherein the compute nodes of the parallel computer are coupled for data communications by a plurality of data communications networks, the plurality of data communication networks comprising a mesh network and a torus network. 8. The method of claim 1 further comprising, for each task on each compute node until all master tasks have joined the global barrier, if the task is the master task and all other tasks on the compute node have not joined the single local barrier on the compute node, waiting a predefined amount of time before proceeding again to determine whether all tasks have joined the single local barrier.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (105)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E., Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Rhoades, John; Cameron, Ken; Winser, Paul; McConnell, Ray; Faulds, Gordon; McIntosh-Smith, Simon; Spencer, Anthony; Bond, Jeff; Dejaegher, Matthias; Halamish, Danny; Panesar, Gajinder, Data processing architectures for packet handling wherein batches of data packets of unpredictable size are distributed across processing elements arranged in a SIMD array operable to process different respective packet protocols at once while executing a single common instruction stream.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Woods, Randy D.; Dupree, Wayne P.; Jachim, David M.; Verniers, Gerrit H.; Churchill, Stephen G.; Fernandez, George P., Distributed computing environment using real-time scheduling logic and time deterministic architecture.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routinng automata for each dimension of the net.
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E., Locating hardware faults in a data communications network of a parallel computer.
Blumrich, Matthias A.; Chen, Dong; Chiu, George L.; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Kopcsay, Gerard V.; Mok, Lawrence S.; Takken, Todd E., Massively parallel supercomputer.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Rangarajan, Vijay; Maniyar, Shyamsundar N.; Eatherton, William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rangarajan,Vijay; Maniyar,Shyamsundar N.; Eatherton,William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rodgers,Dion; Marr,Deborah T.; Hill,David L.; Kaushik,Shiv; Crossland,James B.; Koufaty,David A., Method and apparatus for suspending execution of a thread until a specified memory access occurs.
Archer, Charles J.; Carey, James E.; Markland, Matthew W.; Sanders, Philip J., Monitoring operating parameters in a distributed computing system with active messages.
Birrittella Mark S. ; Kessler Richard E. ; Oberlin Steven M. ; Passint Randal S. ; Thorson Greg, Multiprocessor computer system with interleaved processing element nodes.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E., Performing a scatterv operation on a hierarchical tree network optimized for collective operations.
VanHuben Gary Alan ; Blake Michael A. ; Mak Pak-kin, SMP clusters with remote resource managers for distributing work to other clusters while reducing bus traffic to a minimum.
Padmanabha I. Venkitakrishnan ; Gopalakrishnan Janakiraman ; Tsen-Gong Jim Hsu ; Rajendra Kumar, Scalable system control unit for distributed shared memory multi-processor systems.
Kil, David H.; Pottschmidt, David B., System and method for automatic generation of a hierarchical tree network and the use of two complementary learning algorithms, optimized for each leaf of the hierarchical tree network.
Papakipos, Matthew N.; Grant, Brian K.; McGuire, Morgan S.; Demetriou, Christopher G., Systems and methods for determining compute kernels for an application in a parallel-processing computer system.
Mahesh N. Ganmukhi ; Jeffrey V. Hill ; Monica C. Wong-Chan ; David C. Douglas, Tree network including arrangement for establishing sub-tree having a logical root below the network's physical root.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.