Performing an all-to-all data exchange on a plurality of data buffers by performing swap operations
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-003/00
G06F-015/16
Application number
US-0459832
(2012-04-30)
Patent number
US-8775698
(2014-07-08)
Inventors / Address
Archer, Charles J.
Peters, Amanda E.
Smith, Brian E.
Applicant / Address
International Business Machines Corporation
Agent / Address
Biggers Kennedy Lenart Spraggins LLP
Citation information
Times cited: 0
Patents cited: 99
Abstract
Methods, apparatus, and products are disclosed for performing an all-to-all exchange on n number of data buffers using XOR swap operations. Each data buffer has n number of data elements. Performing an all-to-all exchange on n number of data buffers using XOR swap operations includes for each rank value of i and j where i is greater than j and where i is less than or equal to n: selecting data element i in data buffer j; selecting data element j in data buffer i; and exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation.
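The exchange described in the abstract amounts to an in-place transpose of an n-by-n arrangement of elements. The following is a minimal single-process sketch, not the patented implementation: it assumes integer data elements, uses 0-based ranks instead of the 1-based ranks implied by the abstract, and the function name `xor_swap_all_to_all` is hypothetical.

```python
def xor_swap_all_to_all(buffers):
    """Exchange element i of buffer j with element j of buffer i, for all i > j.

    Assumes `buffers` is a list of n lists, each holding n integers.
    The exchange is done in place and the same list is returned.
    """
    n = len(buffers)
    if any(len(buf) != n for buf in buffers):
        raise ValueError("expected n buffers of n elements each")

    for i in range(n):
        for j in range(i):  # j < i, so every pair (i, j) is visited once
            # Classic three-step XOR swap. The two cells are distinct
            # storage locations because i != j, so the swap is safe.
            buffers[j][i] ^= buffers[i][j]
            buffers[i][j] ^= buffers[j][i]
            buffers[j][i] ^= buffers[i][j]
    return buffers


if __name__ == "__main__":
    data = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
    print(xor_swap_all_to_all(data))
```

After the call, buffer k holds the element that every other buffer originally stored at position k, which is the result expected from an all-to-all exchange. Diagonal elements (i equal to j) are never touched.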
Representative claims
1. A computer-implemented method of performing an all-to-all exchange on n number of data buffers stored in computer memory using computer-implemented swap operations, each data buffer having n number of data elements, each data buffer stored on a distinct compute node of a parallel computer, the compute nodes connected together with a global combining network, the method comprising, for each rank value of i and j where i is greater than j and where i is less than or equal to n: exchanging, by the module of automated computing machinery and without user intervention, contents of data element i in data buffer j with contents of a data element j in data buffer i using a computer-implemented swap operation, including: performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j, and combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.
2. The method of claim 1 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.
3. The method of claim 1 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.
4. An apparatus for performing an all-to-all exchange on n number of data buffers using swap operations, each data buffer having n number of data elements, each data buffer stored on a distinct compute node of a parallel computer, the compute nodes connected together with a global combining network, the apparatus comprising: one or more computer processors and computer memory operatively coupled to the computer processors, the computer memory having disposed within it computer program instructions, the computer processor executing the computer program instructions, causing the apparatus to carry out the steps of: for each rank value of i and j where i is greater than j and where i is less than or equal to n: exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using a swap operation including: performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j, and combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.
5. The apparatus of claim 4 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.
6. The apparatus of claim 4 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.
7. A computer program product for performing an all-to-all exchange on n number of data buffers using XOR swap operations, each data buffer having n number of data elements, each data buffer stored on a distinct compute node of a parallel computer, the compute nodes connected together with a global combining network, the computer program product comprising: a non-transitory computer readable medium, the computer program product comprising computer program instructions that, when executed by a computer processor, cause a computer to carry out the steps of: for each rank value of i and j where i is greater than j and where i is less than or equal to n: exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using a swap operation including: performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j, and combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.
8. The computer program product of claim 7 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.
9. The computer program product of claim 7 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.
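The claims describe each swap as a series of bitwise XOR allreduce operations in which the two nodes involved contribute their elements and, per the dependent claims, every other node contributes the identity value. The sketch below simulates that in one process. It rests on several assumptions that the text above does not confirm: that the series consists of three allreduces mirroring the three steps of a classic XOR swap, that the identity value is 0 (which it is for XOR on integers), and that nodes are 0-indexed. The helper names `xor_allreduce`, `swap_via_allreduce`, and `all_to_all_via_allreduce` are hypothetical.

```python
from functools import reduce
from operator import xor

IDENTITY = 0  # x ^ 0 == x, so 0 is the identity value for bitwise XOR


def xor_allreduce(contributions):
    """Simulate an XOR allreduce: every node would receive this combined value."""
    return reduce(xor, contributions, IDENTITY)


def swap_via_allreduce(buffers, i, j):
    """Swap buffers[j][i] and buffers[i][j] using three simulated allreduces."""
    n = len(buffers)

    def combine():
        # Node j contributes its element i, node i contributes its element j,
        # and all remaining nodes contribute the identity value.
        contributions = [IDENTITY] * n
        contributions[j] = buffers[j][i]
        contributions[i] = buffers[i][j]
        return xor_allreduce(contributions)

    buffers[j][i] = combine()  # node j now holds a ^ b
    buffers[i][j] = combine()  # node i now holds (a ^ b) ^ b == a
    buffers[j][i] = combine()  # node j now holds (a ^ b) ^ a == b


def all_to_all_via_allreduce(buffers):
    """Apply the pairwise swap for every i > j; buffers[k] stands in for node k."""
    n = len(buffers)
    for i in range(n):
        for j in range(i):
            swap_via_allreduce(buffers, i, j)
    return buffers


if __name__ == "__main__":
    nodes = [[k * 10 + e for e in range(4)] for k in range(4)]
    print(all_to_all_via_allreduce(nodes))
```

On a real machine the combination would happen in the network hardware, as the claims state, and each node would decide locally whether to store the result of a given allreduce. The simulation only shows that the arithmetic works out when uninvolved nodes contribute 0.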
Patents cited by this patent (99)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E., Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routinng automata for each dimension of the net.
Blumrich, Matthias A.; Chen, Dong; Chiu, George L.; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Kopcsay, Gerard V.; Mok, Lawrence S.; Takken, Todd E., Massively parallel supercomputer.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Rangarajan, Vijay; Maniyar, Shyamsundar N.; Eatherton, William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rangarajan,Vijay; Maniyar,Shyamsundar N.; Eatherton,William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rodgers,Dion; Marr,Deborah T.; Hill,David L.; Kaushik,Shiv; Crossland,James B.; Koufaty,David A., Method and apparatus for suspending execution of a thread until a specified memory access occurs.
Archer, Charles J.; Carey, James E.; Markland, Matthew W.; Sanders, Philip J., Monitoring operating parameters in a distributed computing system with active messages.
Birrittella Mark S. ; Kessler Richard E. ; Oberlin Steven M. ; Passint Randal S. ; Thorson Greg, Multiprocessor computer system with interleaved processing element nodes.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E., Performing a scatterv operation on a hierarchical tree network optimized for collective operations.
VanHuben Gary Alan ; Blake Michael A. ; Mak Pak-kin, SMP clusters with remote resource managers for distributing work to other clusters while reducing bus traffic to a minimum.
Padmanabha I. Venkitakrishnan ; Gopalakrishnan Janakiraman ; Tsen-Gong Jim Hsu ; Rajendra Kumar, Scalable system control unit for distributed shared memory multi-processor systems.
Kil, David H.; Pottschmidt, David B., System and method for automatic generation of a hierarchical tree network and the use of two complementary learning algorithms, optimized for each leaf of the hierarchical tree network.
Papakipos, Matthew N.; Grant, Brian K.; McGuire, Morgan S.; Demetriou, Christopher G., Systems and methods for determining compute kernels for an application in a parallel-processing computer system.