Performing an all-to-all data exchange on a plurality of data buffers by performing swap operations
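IPC classification information
Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition): G06F-003/00; G06F-015/16
Application number: US-0176816 (2008-07-21)
Registration number: US-8281053 (2012-10-02)
Inventors / Address: Archer, Charles J.; Peters, Amanda E.; Smith, Brian E.
Applicant / Address: International Business Machines Corporation
Agent / Address: Biggers & Ohanian, LLP
Citation information
Times cited: 3
Patents cited: 36
Abstract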
Methods, apparatus, and products are disclosed for performing an all-to-all exchange on n number of data buffers using XOR swap operations. Each data buffer has n number of data elements. Performing an all-to-all exchange on n number of data buffers using XOR swap operations includes for each rank value of i and j where i is greater than j and where i is less than or equal to n: selecting data element i in data buffer j; selecting data element j in data buffer i; and exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation.
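The exchange described in the abstract amounts to transposing an n-by-n arrangement of elements in place. The following is a minimal single-process sketch, not the patented implementation: it assumes the buffers are Python lists of integers, and it uses 0-based ranks where the abstract uses rank values up to n. The function name `xor_swap_all_to_all` is hypothetical.

```python
def xor_swap_all_to_all(buffers):
    """Exchange element i of buffer j with element j of buffer i, for all i > j.

    Assumption: `buffers` is a list of n lists, each holding n integers.
    Ranks are 0-based here, so i runs from 1 to n - 1.
    """
    n = len(buffers)
    if any(len(buf) != n for buf in buffers):
        raise ValueError("each of the n buffers must hold exactly n elements")

    for i in range(1, n):
        for j in range(i):
            # Three-step XOR swap. Because i != j, the two locations are
            # always distinct, so the swap cannot zero out a value.
            buffers[j][i] ^= buffers[i][j]
            buffers[i][j] ^= buffers[j][i]
            buffers[j][i] ^= buffers[i][j]
    return buffers


if __name__ == "__main__":
    data = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    print(xor_swap_all_to_all(data))
```

After the call, buffer k holds element k of every original buffer, which is the result an all-to-all exchange is expected to produce. The XOR swap needs no temporary storage, which is why each pair can be exchanged in place.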
Representative claims
1. A computer-implemented method of performing an all-to-all exchange on n number of data buffers stored in computer memory using computer-implemented XOR swap operations, each data buffer having n number of data elements, the method comprising for each rank value of i and j where i is greater than j and where i is less than or equal to n:
selecting, by a module of automated computing machinery and without user intervention, data element i in data buffer j, wherein data buffer j is stored in computer memory of a compute node;
selecting, by the module of automated computing machinery and without user intervention, data element j in data buffer i, wherein data buffer i is stored in computer memory of a compute node; and
exchanging, by the module of automated computing machinery and without user intervention, contents of data element i in data buffer j with contents of data element j in data buffer i using a computer-implemented XOR swap operation, wherein each data element has a value.

2. The method of claim 1 wherein:
each data buffer is stored on a distinct compute node of a parallel computer, the compute nodes connected together using a global combining network; and
exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using a computer-implemented XOR swap operation further comprises performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j.

3. The method of claim 2 wherein performing the series of bitwise XOR allreduce operations further comprises combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.

4. The method of claim 2 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.

5. The method of claim 2 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.

6. An apparatus for performing an all-to-all exchange on n number of data buffers using XOR swap operations, each data buffer having n number of data elements, the apparatus comprising: one or more computer processors and computer memory operatively coupled to the computer processors, the computer memory having disposed within it computer program instructions, the computer processor executing the computer program instructions, causing the apparatus to carry out the steps of:
for each rank value of i and j where i is greater than j and where i is less than or equal to n:
selecting data element i in data buffer j;
selecting data element j in data buffer i; and
exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation, wherein each data element has a value.

7. The apparatus of claim 6 wherein:
each data buffer is stored on a distinct compute node of a parallel computer, the compute nodes connected together using a global combining network; and
exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation further comprises performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j.

8. The apparatus of claim 7 wherein performing the series of bitwise XOR allreduce operations further comprises combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.

9. The apparatus of claim 7 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.

10. The apparatus of claim 7 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.

11. A computer program product for performing an all-to-all exchange on n number of data buffers using XOR swap operations, each data buffer having n number of data elements, the computer program product comprising: a computer readable medium, where the computer readable medium is not a signal and the computer readable medium comprises computer program instructions that, when executed by a computer processor, cause a computer to carry out the steps of:
for each rank value of i and j where i is greater than j and where i is less than or equal to n:
selecting data element i in data buffer j;
selecting data element j in data buffer i; and
exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation, wherein each data element has a value.

12. The computer program product of claim 11 wherein:
each data buffer is stored on a distinct compute node of a parallel computer, the compute nodes connected together using a global combining network; and
exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation further comprises performing a series of bitwise XOR allreduce operations through the global combining network in which the compute node storing data buffer j contributes the contents of data element i and the compute node storing data buffer i contributes the contents of data element j.

13. The computer program product of claim 12 wherein performing the series of bitwise XOR allreduce operations further comprises combining, by network hardware using a bitwise XOR operator, contributions of the compute nodes participating in the bitwise XOR allreduce operation.

14. The computer program product of claim 12 wherein performing the series of bitwise XOR allreduce operations further comprises contributing, by the compute nodes other than the compute nodes storing data buffer i and data buffer j, the identity value to the series of bitwise XOR allreduce operations.

15. The computer program product of claim 12 wherein the compute nodes are connected together for data communications using a plurality of data communications networks, at least one of the networks optimized for collective operations, and at least one of the data communications networks optimized for point to point operations.
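Claims 2 to 4 describe carrying out each swap as a series of bitwise XOR allreduce operations, with uninvolved nodes contributing the identity value. The claims do not spell out the individual rounds, so the sketch below is one possible reading, simulated in a single process: three rounds per pair, mirroring the three steps of an XOR swap. The names `xor_allreduce`, `allreduce_swap`, and `allreduce_all_to_all` are hypothetical, the identity value for XOR is taken to be 0, and ranks are 0-based.

```python
from functools import reduce
from operator import xor


def xor_allreduce(contributions):
    """Simulate a bitwise XOR allreduce: every node would receive this value."""
    return reduce(xor, contributions, 0)


def allreduce_swap(nodes, i, j):
    """Swap nodes[j][i] with nodes[i][j] using three simulated XOR allreduces.

    Assumption: nodes[k] is the buffer held by compute node k, and i != j.
    """
    def contributions():
        # Node j contributes its element i, node i contributes its element j,
        # and every other node contributes the XOR identity value 0.
        return [nodes[j][i] if k == j else nodes[i][j] if k == i else 0
                for k in range(len(nodes))]

    nodes[j][i] = xor_allreduce(contributions())  # node j keeps a ^ b
    nodes[i][j] = xor_allreduce(contributions())  # node i keeps (a ^ b) ^ b = a
    nodes[j][i] = xor_allreduce(contributions())  # node j keeps (a ^ b) ^ a = b


def allreduce_all_to_all(nodes):
    """Apply the swap to every pair of ranks with i > j."""
    for i in range(1, len(nodes)):
        for j in range(i):
            allreduce_swap(nodes, i, j)
    return nodes


if __name__ == "__main__":
    print(allreduce_all_to_all([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

On real hardware the combining step in `xor_allreduce` would be done by the network, as claim 3 describes; here it is an ordinary reduction so that the data flow can be checked on one machine.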
Patents cited by this patent (36)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routinng automata for each dimension of the net.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
Archer, Charles J.; K. A., Nysal Jan; Sharkawi, Sameh S., Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes.
Archer, Charles J.; K. A., Nysal Jan; Sharkawi, Sameh S., Executing an all-to-allv operation on a parallel computer that includes a plurality of compute nodes.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E., Performing a vector collective operation on a parallel computer having a plurality of compute nodes.