Modifying an operation of one or more processors executing message passing interface tasks
Bibliographic and IPC Classification Information
Country/Type: United States (US) Patent, Granted
IPC (7th edition): G06F-009/46; G06F-015/173
Application number: US-0846101 (2007-08-28)
Registration number: US-8108876 (2012-01-31)
Inventors: Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E.
Applicant: International Business Machines Corporation
Agent: Walder, Jr., Stephen J.
Citation information: cited by 7 patents; cites 19 patents
Abstract
Mechanisms for modifying an operation of one or more processors executing message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing work loads of the processors are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI Load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
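The history-based profiling the abstract describes can be illustrated with a short sketch. The class and method names below (`SyncCallHistory`, `record`, `slowest_task`) are illustrative assumptions, not taken from the patent; the patent places this logic in a hardware-implemented MPI load balancing controller per processor, whereas this is plain software for clarity.

```python
from collections import defaultdict

class SyncCallHistory:
    """Illustrative profile of MPI tasks' synchronization-call times.

    Each entry records which task called an MPI synchronization
    operation (e.g. a barrier) and when, mirroring the history the
    patent's load balancing controller maintains.
    """

    def __init__(self):
        # task_id -> list of timestamps of its synchronization calls
        self.calls = defaultdict(list)

    def record(self, task_id, timestamp):
        self.calls[task_id].append(timestamp)

    def completion_order(self, cycle):
        """Task IDs ordered by when each reached sync call number `cycle`."""
        reached = [(ts[cycle], tid) for tid, ts in self.calls.items()
                   if len(ts) > cycle]
        return [tid for _, tid in sorted(reached)]

    def slowest_task(self, cycle):
        """Last task to call the synchronization operation in `cycle`."""
        return self.completion_order(cycle)[-1]

# Example: four tasks finish a computation phase at different times.
history = SyncCallHistory()
for tid, ts in [(0, 10.0), (1, 14.5), (2, 9.2), (3, 11.1)]:
    history.record(tid, ts)

print(history.completion_order(0))  # fastest to slowest: [2, 0, 3, 1]
print(history.slowest_task(0))      # candidate for a lighter workload
```

From such an ordering, the controller can decide which tasks can absorb extra work and which task should have its load lightened in the next processing cycle.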
Representative Claims
1. A computer program product comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive one or more message passing interface (MPI) synchronization operation calls from one or more processors of a plurality of processors, wherein the MPI synchronization operation calls include an identifier of a MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
store an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor;
determine a measure of a relative completion of computation phases of tasks of the MPI job on the plurality of processors based on the history data structure; and
modify the operation of the plurality of processors based on the relative completion of computation phases of tasks of the MPI job, wherein the measure of the relative completion of computation phases of tasks of the MPI job indicates a relative order in which the processors in the plurality of processors completed their respective computation phases of tasks.

2. The computer program product of claim 1, wherein the MPI job is a set of tasks to be performed in parallel on the plurality of processors, and wherein each processor of the plurality of processors executes a corresponding task of the MPI job in parallel on a corresponding set of data allocated to the processor from a superset of data.

3. The computer program product of claim 1, wherein the computer readable program further causes the computing device to: determine if the measure of the relative completion of computation phases exceeds a threshold; and modify the operation of the plurality of processors based on the relative completion of computation phases of tasks of the MPI job only if the measure of the relative completion of computation phases exceeds the threshold.

4. The computer program product of claim 1, wherein the computer readable program causes the computing device to determine a measure of the relative completion of computation phases of tasks of the MPI job on the plurality of processors based on the history data structure by: determining, based on task identifiers and timestamps in entries of the history data structure, which processor in the plurality of processors has completed its allocated task of the MPI job prior to all other processors in the plurality of processors.

5. The computer program product of claim 1, wherein the computer readable program causes the computing device to modify an operation of the plurality of processors for executing the MPI job based on the entries in the history data structure by: determining, based on the history data structure, in a current processor of the plurality of processors, in response to a call of the MPI synchronization operation by the current processor, if a call to the MPI synchronization operation has been made by another processor prior to the call of the MPI synchronization operation by the current processor; and performing an operation in the current processor to reduce wasted resources of the current processor in response to a call to the MPI synchronization operation not having been made by another processor prior to the call of the MPI synchronization operation by the current processor.

6. The computer program product of claim 1, wherein each processor of the plurality of processors comprises a MPI load balancing controller, each MPI load balancing controller maintains a version of the history data structure, and wherein each MPI load balancing controller executes the computer readable program.

7. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: identifying a slowest processor in the plurality of processors based on the history data structure, wherein the slowest processor is a last processor to call the MPI synchronization operation; and performing one or more operations to reduce an amount of workload of the slowest processor in a subsequent MPI job processing cycle.

8. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: modifying workloads of the processors in the plurality of processors in order to bring time periods for completion of MPI tasks executing on the processors within a tolerance of each other.

9. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: selecting another program to execute on at least one of the processors in the plurality of processors while that processor is idle with regard to the MPI job and while other processors in the plurality of processors are executing their tasks of the MPI job.

10. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: executing one or more housekeeping operations in at least one of the processors in the plurality of processors while that processor is idle with regard to the MPI job.

11. The computer program product of claim 10, wherein the one or more housekeeping operations comprise at least one of a memory management operation or a garbage collection operation.

12. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: placing at least one of the processors in the plurality of processors into a low power consumption state while that processor is idle with regard to the MPI job.

13. The computer program product of claim 1, wherein the computer readable program causes the computing device to further modify an operation of the plurality of processors for executing the MPI job based on the history data structure by performing a load balancing function for shifting a workload amongst at least two processors in the plurality of processors.

14. A computer program product comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive one or more message passing interface (MPI) synchronization operation calls from one or more processors of a plurality of processors, wherein the MPI synchronization operation calls include an identifier of a MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
store an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor,
wherein the computer readable program causes the computing device to modify an operation of the plurality of processors for executing the MPI job based on the entries in the history data structure by: performing, in a current MPI job processing cycle, one or more setup operations in a second processor of the plurality of processors for preparing to process one of a larger portion of data or a larger number of tasks, in a subsequent MPI job processing cycle subsequent to the current MPI job processing cycle, wherein the one or more setup operations are performed while other processors of the plurality of processors are executing their respective tasks of the MPI job in the current MPI job processing cycle.

15. The computer program product of claim 14, wherein the one or more setup operations comprise at least one of allocating a larger portion of cache memory for use by the second processor, setting up buffer space to receive an additional amount of data, or acquiring a host fabric interface window or windows for communication.

16. The computer program product of claim 14, wherein the one or more setup operations comprise at least one of allocating a larger portion of cache memory for use by the first processor or acquiring a host fabric interface window or windows for communication.

17. A computer program product comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive one or more message passing interface (MPI) synchronization operation calls from one or more processors of a plurality of processors, wherein the MPI synchronization operation calls include an identifier of a MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
store an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modify an operation of the plurality of processors for executing the MPI job based on the history data structure by: determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor,
wherein the computer readable program causes the computing device to modify the operation of the plurality of processors for executing the MPI job based on the history data structure by selecting another program for execution on at least one of the processors of the plurality of processors during an idle period before a last processor in the plurality of processors calls the MPI synchronization operation and while other processors in the plurality of processors are executing their tasks of the MPI job.

18. A system for executing a message passing interface (MPI) job using a plurality of processors, comprising: a plurality of processors; and at least one load balancing controller associated with the plurality of processors, wherein the load balancing controller:
receives one or more MPI synchronization operation calls from one or more processors of the plurality of processors, wherein the MPI synchronization operation calls include an identifier of a MPI task performing the MPI synchronization operation call and a timestamp of the MPI synchronization operation call, the MPI task being part of an MPI job being executed on the plurality of processors;
stores an entry in a history data structure identifying the one or more MPI synchronization operation calls and their associated MPI task identifier and timestamp;
modifies an operation of the plurality of processors for executing the MPI job based on the entries in the history data structure by: determining if a wait period of a first processor in the plurality of processors meets or exceeds a threshold value; and in response to the wait period of the first processor meeting or exceeding the threshold value, modifying an operation of the plurality of processors to reduce the wait period of the first processor;
determines a measure of a relative completion of computation phases of tasks of the MPI job on the plurality of processors based on the history data structure; and
modifies the operation of the plurality of processors based on the relative completion of computation phases of tasks of the MPI job, wherein the measure of the relative completion of computation phases of tasks of the MPI job indicates a relative order in which the processors in the plurality of processors completed their respective computation phases of tasks.
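The threshold test and rebalancing step recited in claims 1, 7, and 8 can be sketched as follows. The function name, the per-unit cost estimate, and the "close half the gap" heuristic are assumptions for illustration only; the claims require merely that workloads be adjusted, when the wait period meets or exceeds a threshold, so that completion times fall within a tolerance of each other.

```python
def rebalance(work_units, finish_times, wait_threshold, tolerance):
    """Shift work from the slowest task toward the fastest one.

    work_units:   task_id -> number of data units currently assigned
    finish_times: task_id -> time the task called the sync operation
    Returns an updated work_units mapping (a new dict).
    """
    order = sorted(finish_times, key=finish_times.get)
    fastest, slowest = order[0], order[-1]
    # The fastest task's wait period: how long it idles at the
    # synchronization point before the slowest task arrives.
    wait = finish_times[slowest] - finish_times[fastest]
    new_units = dict(work_units)
    if wait < wait_threshold or wait <= tolerance:
        return new_units  # imbalance too small to justify a shift
    # Estimate the slowest task's per-unit cost and move enough units
    # to the fastest task to close roughly half the gap (heuristic).
    per_unit = finish_times[slowest] / work_units[slowest]
    shift = max(1, int((wait / 2) / per_unit))
    shift = min(shift, work_units[slowest] - 1)
    new_units[slowest] -= shift
    new_units[fastest] += shift
    return new_units

# Example: task 1 lags, so some of its units move to task 2.
units = {0: 100, 1: 100, 2: 100}
times = {0: 10.0, 1: 14.5, 2: 9.2}
print(rebalance(units, times, wait_threshold=2.0, tolerance=1.0))
```

In the patented design this decision is made per processor by its hardware load balancing controller using the shared synchronization-call history, and the shifted units take effect in the subsequent MPI job processing cycle.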
Patents cited by this patent (19)
Diard, Franck R., Adaptive load balancing in a multi-processor graphics processing system.
Ted Eric Blank; Tammie Dang; Fen-Ling Lin; Randy Mitchell Nakagawa; Bryan Frederick Smith; Craig Leonard Sutton; Darren Benjamin Swank; Hong Sang Tie; Dino Carlo Tonelli; Annie S. T, Apportioning a work unit to execute in parallel in a heterogeneous environment.
Vrba, Richard Alan; Klecka, James Stevens; Fey, Jr., Kyran Wilfred; Lamano, Larry Leonard; Mehta, Nikhil A., High-performance fault tolerant computer system with clock length synchronization of loosely coupled processors.
Hwang, Cherng-Daw; Wong, Kenley, Method and apparatus for providing a time-division multiplexing (TDM) interface among a high-speed data stream and multiple processors.
Konno, Chisato (JP); Okochi, Toshio (JP), Program execution control in parallel processor system for parallel execution of plural jobs by selected number of proce.
Cousins, David Bruce; Daily, Matthew Paul; Lirakis, Christopher Burbank, System and method for automatically optimizing heterogenous multiprocessor software performance.
Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E., Hardware based dynamic load balancing of message passing interface tasks by modifying tasks.
Chung, Won-young; Lee, Yong Surk; Park, Jong-su; Jeong, Ha-young, Method of performing collective communication according to status-based determination of a transmission order between processing nodes and collective communication system using the same.
Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E., Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks.
Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E., Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks.