Collective operation protocol selection in a parallel computer
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-009/44
G06F-009/45
G06F-009/38
G06F-015/78
G06F-009/50
G06F-011/34
Application number
US-0206116
(2011-08-09)
Registration number
US-8893083
(2014-11-18)
Inventors / Address
Archer, Charles J.
Blocksome, Michael A.
Ratterman, Joseph D.
Smith, Brian E.
Applicant / Address
International Business Machines Corporation
Agent / Address
Biggers Kennedy Lenart Spraggins LLP
Citation information
Times cited: 0
Cited patents: 91
Abstract
Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by calling a collective operation with operating parameters; selecting a protocol for executing the operation and executing the operation with the selected protocol. Selecting a protocol includes: iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets predefined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; determining that the prospective protocol meets predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold.
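The selection loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Protocol` class, the `message_bytes` parameter, the two example fit equations, and the threshold values are all hypothetical, and the performance measure is assumed to be one where higher is better (for example, bandwidth).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical operating parameters of a collective call,
# e.g. {"message_bytes": 4096}.
Params = Dict[str, float]


@dataclass
class Protocol:
    """Hypothetical per-protocol metadata (names are assumptions)."""
    name: str
    fit: Callable[[Params], float]  # predefined performance fit equation
    min_performance: float          # predefined minimum performance threshold

    def performance_function(self, params: Params) -> bool:
        # Evaluate the fit equation for these parameters and accept the
        # protocol only if the measure exceeds the minimum threshold.
        return self.fit(params) > self.min_performance


def select_protocol(protocols: Sequence[Protocol],
                    params: Params) -> Optional[Protocol]:
    # Iterate from the first prospective protocol and stop at the first
    # one whose performance function accepts the operating parameters.
    for protocol in protocols:
        if protocol.performance_function(params):
            return protocol
    return None  # no prospective protocol met the criteria


# Made-up fit equations: "tree" favors small messages, "torus" large ones.
PROTOCOLS = [
    Protocol("tree", lambda p: 400.0 - 0.05 * p["message_bytes"], 200.0),
    Protocol("torus", lambda p: 0.05 * p["message_bytes"], 200.0),
]

if __name__ == "__main__":
    chosen = select_protocol(PROTOCOLS, {"message_bytes": 1024})
    print(chosen.name if chosen else "no protocol selected")
```

Because the loop returns at the first acceptable protocol, the order of the list acts as a priority order; the source does not say how that order is chosen.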
Representative claims
1. An apparatus for collective operation protocol selection in a parallel computer, the parallel computer comprising a plurality of compute nodes, the apparatus comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: calling a collective operation with one or more operating parameters; selecting one of a plurality of protocols that define execution of the collective operation, including, iteratively, for each protocol beginning with a first prospective protocol until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters of the collective operation; determining, by the performance function, whether the prospective protocol meets predefined performance criteria for the operating parameters, including evaluating, with the operating parameters, a predefined performance fit equation for the prospective protocol, calculating a measure of performance of the prospective protocol for the operating parameters, and determining that the prospective protocol meets predetermined performance criteria; and selecting the prospective protocol as the protocol for executing the collective operation only if the calculated measure of performance is greater than a predefined minimum performance threshold; and executing the collective operation with the selected protocol.
2. The apparatus of claim 1 wherein: each protocol of the collective operation is associated with metadata, the metadata for each collective operation including a pointer to the protocol's performance function; and providing, to a protocol performance function for the prospective protocol, the operating parameters of the collective operation further comprises retrieving, from the prospective protocol's metadata, the pointer to the prospective protocol's performance function.
3. The apparatus of claim 1 further comprising computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: prior to protocol selection, for one or more sets of operating parameters and one or more prospective protocols of the collective operation: determining whether the prospective protocol meets predetermined performance criteria; and caching each determination of a prospective protocol meeting the predetermined performance criteria upon establishment of an operational group of the compute nodes, wherein selecting one of a plurality of protocols for executing the collective operation further comprises: determining, for the operating parameters of the collective operation, whether there is a cached determination of a prospective protocol meeting the predetermined performance criteria; and if there is a cached determination of a prospective protocol meeting the predetermined performance criteria, selecting the prospective protocol as the protocol for executing the collective operation, without calculating a measure of performance of the prospective protocol for the operating parameter during protocol selection.
4. The apparatus of claim 1 further comprising computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: establishing, for each protocol of the collective operation, a predefined performance fit equation, including: executing the protocol once for each of a plurality of sets of operating parameters; recording, for each execution, a performance measurement; and calculating a fit equation for the recorded performance measurements.
5. The apparatus of claim 4 wherein calculating a fit equation for the recorded performance measurements further comprises calculating one of: a linear approximation fit equation; a cubic approximation fit equation; and a quartic approximation fit equation.
6. The apparatus of claim 4 wherein calculating a fit equation for the recorded performance measurements further comprises calculating an exact function for all possible operating parameters.
7. A computer program product for collective operation protocol selection in a parallel computer, the parallel computer comprising a plurality of compute nodes, the computer program product disposed upon a computer readable medium that is not a signal medium, the computer program product comprising computer program instructions that, when executed, cause a computer to carry out the steps of: calling a collective operation with one or more operating parameters; selecting one of a plurality of protocols that define execution of the collective operation, including, iteratively, for each protocol beginning with a first prospective protocol until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters of the collective operation; determining, by the performance function, whether the prospective protocol meets predefined performance criteria for the operating parameters, including evaluating, with the operating parameters, a predefined performance fit equation for the prospective protocol, calculating a measure of performance of the prospective protocol for the operating parameters, and determining that the prospective protocol meets predetermined performance criteria; and selecting the prospective protocol as the protocol for executing the collective operation only if the calculated measure of performance is greater than a predefined minimum performance threshold; and executing the collective operation with the selected protocol.
8. The computer program product of claim 7 wherein: each protocol of the collective operation is associated with metadata, the metadata for each collective operation including a pointer to the protocol's performance function; and providing, to a protocol performance function for the prospective protocol, the operating parameters of the collective operation further comprises retrieving, from the prospective protocol's metadata, the pointer to the prospective protocol's performance function.
9. The computer program product of claim 7 further comprising computer program instructions that, when executed, cause the computer to carry out the steps of: prior to protocol selection, for one or more sets of operating parameters and one or more prospective protocols of the collective operation: determining whether the prospective protocol meets predetermined performance criteria; and caching each determination of a prospective protocol meeting the predetermined performance criteria upon establishment of an operational group of the compute nodes, wherein selecting one of a plurality of protocols for executing the collective operation further comprises: determining, for the operating parameters of the collective operation, whether there is a cached determination of a prospective protocol meeting the predetermined performance criteria; and if there is a cached determination of a prospective protocol meeting the predetermined performance criteria, selecting the prospective protocol as the protocol for executing the collective operation, without calculating a measure of performance of the prospective protocol for the operating parameter during protocol selection.
10. The computer program product of claim 7 further comprising computer program instructions that, when executed, cause the computer to carry out the steps of: establishing, for each protocol of the collective operation, a predefined performance fit equation, including: executing the protocol once for each of a plurality of sets of operating parameters; recording, for each execution, a performance measurement; and calculating a fit equation for the recorded performance measurements.
11. The computer program product of claim 10 wherein calculating a fit equation for the recorded performance measurements further comprises calculating one of: a linear approximation fit equation; a cubic approximation fit equation; and a quartic approximation fit equation.
12. The computer program product of claim 10 wherein calculating a fit equation for the recorded performance measurements further comprises calculating an exact function for all possible operating parameters.
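Claims 3 to 5 (and 9 to 11) describe establishing a fit equation from recorded measurements and caching determinations ahead of selection. The following is a minimal sketch under stated assumptions, not the patent's code: it uses an ordinary least-squares line as the "linear approximation fit equation", treats message size as the only operating parameter, and invents all names (`linear_fit`, `precompute_cache`, `select_cached`), measurements, and the threshold.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fit = Callable[[float], float]


def linear_fit(samples: List[Tuple[float, float]]) -> Fit:
    """Least-squares line through (message_size, measured_performance)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def first_acceptable(fits: Dict[str, Fit], size: float,
                     threshold: float) -> Optional[str]:
    # Evaluate each protocol's fit equation in order; accept the first
    # whose calculated measure exceeds the minimum threshold.
    for name, fit in fits.items():
        if fit(size) > threshold:
            return name
    return None


def precompute_cache(fits: Dict[str, Fit], sizes: List[float],
                     threshold: float) -> Dict[float, str]:
    # Intended to run once, e.g. when an operational group is established.
    cache: Dict[float, str] = {}
    for size in sizes:
        name = first_acceptable(fits, size, threshold)
        if name is not None:
            cache[size] = name
    return cache


def select_cached(cache: Dict[float, str], fits: Dict[str, Fit],
                  size: float, threshold: float) -> Optional[str]:
    # A cache hit skips evaluating any fit equation during selection.
    if size in cache:
        return cache[size]
    return first_acceptable(fits, size, threshold)


# Made-up measurements: one execution per message size for each protocol.
FITS = {
    "tree": linear_fit([(1000, 350.0), (2000, 300.0), (3000, 250.0)]),
    "torus": linear_fit([(1000, 50.0), (2000, 100.0), (3000, 150.0)]),
}
CACHE = precompute_cache(FITS, [1000, 8000], threshold=200.0)

if __name__ == "__main__":
    print(CACHE)
```

Keying the cache on the exact parameter value is the simplest possible choice here; a real system would likely bucket message sizes or key on several parameters, which the claims leave open.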
Patents cited by this patent (91)
Scott Steven L. ; Pribnow Richard D. ; Logghe Peter G. ; Kunkel Daniel L. ; Schwoerer Gerald A., Adaptive congestion control mechanism for modular computer networks.
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E., Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks.
Kato Sadayuki,JPX ; Ishihata Hiroaki,JPX ; Horie Takeshi,JPX ; Inano Satoshi,JPX ; Shimizu Toshiyuki,JPX, Data gathering/scattering system for a plurality of processors in a parallel computer.
Connor, Patrick L.; McVay, Robert G., Direct memory access transfer reduction method and apparatus to overlay data on to scatter gather descriptors for bus-mastering I/O controllers.
Michael Olivier, Dynamically matching users for group communications based on a threshold degree of matching of sender and recipient predetermined acceptance criteria.
Archer, Charles J.; Ratterman, Joseph D., Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation.
Cypher Robert E. (Los Gatos CA) Sanz Jorge L. C. (Los Gatos CA), Hierarchical interconnection network architecture for parallel processing, having interconnections between bit-addressib.
Flaig Charles M. (Pasadena CA) Seitz Charles L. (San Luis Rey CA), Inter-computer message routing system with each computer having separate routing automata for each dimension of the net.
Blumrich, Matthias A.; Chen, Dong; Chiu, George L.; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Kopcsay, Gerard V.; Mok, Lawrence S.; Takken, Todd E., Massively parallel supercomputer.
Carmichael Richard D. ; Ward Joel M. ; Winchell Michael A., Method and apparatus for controlling (N+I) I/O channels with (N) data managers in a homogenous software programmable en.
Rangarajan, Vijay; Maniyar, Shyamsundar N.; Eatherton, William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rangarajan,Vijay; Maniyar,Shyamsundar N.; Eatherton,William N., Method and apparatus for storing tree data structures among and within multiple memory channels.
Rodgers,Dion; Marr,Deborah T.; Hill,David L.; Kaushik,Shiv; Crossland,James B.; Koufaty,David A., Method and apparatus for suspending execution of a thread until a specified memory access occurs.
Archer, Charles J.; Carey, James E.; Markland, Matthew W.; Sanders, Philip J., Monitoring operating parameters in a distributed computing system with active messages.
Krishnamoorthy Ashok V. (11188 Caminito Rodar San Diego CA 92126) Kiamilev Fouad (c/o UNC Charlotte ; Dept. of EE ; Smith Hall Room 332 Charlotte NC 28223), Packet-switched self-routing multistage interconnection network having contention-free fanout, low-loss routing, and fan.
Yasuda Yoshiko,JPX ; Tanaka Teruo,JPX, Parallel computer system using properties of messages to route them through an interconnect network and to select virtua.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, Partitioning of processing elements in a SIMD/MIMD array processor.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E., Performing a scatterv operation on a hierarchical tree network optimized for collective operations.
VanHuben Gary Alan ; Blake Michael A. ; Mak Pak-kin, SMP clusters with remote resource managers for distributing work to other clusters while reducing bus traffic to a minimum.
Kil, David H.; Pottschmidt, David B., System and method for automatic generation of a hierarchical tree network and the use of two complementary learning algorithms, optimized for each leaf of the hierarchical tree network.
Papakipos, Matthew N.; Grant, Brian K.; McGuire, Morgan S.; Demetriou, Christopher G., Systems and methods for determining compute kernels for an application in a parallel-processing computer system.