IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.) | (none listed)
Application No. | US-0846168 (2007-08-28)
Registration No. | US-8312464 (2012-11-13)
Inventors / Address |
- Arimilli, Lakshminarayana B.
- Arimilli, Ravi K.
- Rajamony, Ramakrishnan
- Speight, William E.
Applicant / Address |
- International Business Machines Corporation
Agent / Address | (none listed)
Citation Info | Cited by: 3 / Patents cited: 22
Abstract
Mechanisms are provided for providing hardware based dynamic load balancing of message passing interface (MPI) tasks by modifying tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
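The decision logic the abstract describes can be sketched in outline. This is a minimal, hypothetical illustration only: the patent describes a hardware-implemented per-processor MPI load balancing controller, and every name below is invented for exposition.

```python
def plan_rebalance(completion_times, threshold):
    """Decide whether to shift work, given per-processor timestamps of
    the latest synchronization-operation calls (one computation cycle).

    completion_times: dict mapping processor id -> timestamp at which
    that processor called the synchronization operation.
    Returns a (slowest, fastest) pair of processor ids when the
    completion gap exceeds `threshold`, signalling that work should be
    shifted from the slowest processor to the fastest; otherwise None.
    """
    fastest = min(completion_times, key=completion_times.get)
    slowest = max(completion_times, key=completion_times.get)
    if completion_times[slowest] - completion_times[fastest] > threshold:
        # Shift workload from the slowest processor toward the fastest.
        return (slowest, fastest)
    return None
```

A real controller would accumulate such timestamps across many cycles to build the per-task history profile the abstract mentions, rather than acting on a single cycle as this sketch does.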
Representative Claims
1. A method, in a multiple processor system, for balancing a Message Passing Interface (MPI) workload across a plurality of processors, comprising: receiving a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determining if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identifying a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identifying a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modifying MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.

2. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed by one or more of the plurality of processors, and wherein modifying a number of MPI tasks to be executed by one or more of the plurality of processors comprises: splitting one or more MPI tasks of the MPI workload into subtasks; and assigning each of the subtasks to respective processors of the plurality of processors.

3. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor.

4. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor, and wherein modifying MPI tasks of the MPI workload further comprises modifying an MPI task to be executed by the second processor such that the modified MPI task requires less computation than an MPI task executed in a previous computation cycle by the second processor.

5. The method of claim 4, wherein modifying MPI tasks of the MPI workload comprises splitting one or more MPI tasks of the MPI workload into subtasks, and wherein modifying a number of MPI tasks to be executed by the first processor comprises assigning a plurality of MPI subtasks to the first processor for concurrent execution by the first processor.

6. The method of claim 2, wherein splitting the one or more MPI tasks of the MPI workload into subtasks comprises: determining a delta value for increasing a number of MPI subtasks for the MPI workload; dividing the MPI workload by a value corresponding to a sum of a number of processors in the plurality of processors and the delta value to thereby identify a number of MPI subtasks for the MPI workload; and splitting the one or more MPI tasks to generate the identified number of MPI subtasks.

7. The method of claim 6, wherein the delta value is determined based on a difference in completion times of MPI tasks of a fastest processor and a slowest processor.

8. The method of claim 1, further comprising: determining if a difference in the fastest time of completion and the slowest time of completion exceeds a threshold, wherein modifying MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors is performed in response to the difference exceeding the threshold.

9. A computer program product comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed on a data processing system, causes the data processing system to: receive a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determine if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identify a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identify a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modify MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.

10. The computer program product of claim 9, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed by one or more of the plurality of processors, and wherein modifying a number of MPI tasks to be executed by one or more of the plurality of processors comprises: splitting one or more MPI tasks of the MPI workload into subtasks; and assigning each of the subtasks to respective processors of the plurality of processors.

11. The computer program product of claim 9, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor.

12. The computer program product of claim 9, wherein the computer readable program causes the data processing system to: modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor, and modify an MPI task to be executed by the second processor such that the modified MPI task requires less computation than an MPI task executed in a previous computation cycle by the second processor.

13. The computer program product of claim 12, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload comprises splitting one or more MPI tasks of the MPI workload into subtasks, and wherein modifying a number of MPI tasks to be executed by the first processor comprises assigning a plurality of MPI subtasks to the first processor for concurrent execution by the first processor.

14. The computer program product of claim 10, wherein splitting the one or more MPI tasks of the MPI workload into subtasks comprises: determining a delta value for increasing a number of MPI subtasks for the MPI workload; dividing the MPI workload by a value corresponding to a sum of a number of processors in the plurality of processors and the delta value to thereby identify a number of MPI subtasks for the MPI workload; and splitting the one or more MPI tasks to generate the identified number of MPI subtasks.

15. The computer program product of claim 14, wherein the delta value is determined based on a difference in completion times of MPI tasks of a fastest processor and a slowest processor.

16. A data processing system, comprising: a plurality of processors; and at least one load balancing controller coupled to the plurality of processors, wherein the at least one load balancing controller: receives a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determines if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identifies a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identifies a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modifies MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.
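The subtask-splitting step recited in claims 6 and 14 (dividing the workload by the processor count plus a delta value to identify the number of subtasks) can be illustrated with a small sketch. This is one plausible reading of the claim language, not the patented implementation; per claims 7 and 15 the delta would be derived from the fastest/slowest completion-time difference, but here it is simply passed in as an integer.

```python
def split_into_subtasks(work_items, num_processors, delta):
    """Hypothetical illustration of claims 6/14: the workload is divided
    into (num_processors + delta) pieces, so the identified number of
    MPI subtasks grows as `delta` grows, giving the load balancer
    finer-grained units to redistribute across processors.
    """
    target = num_processors + delta              # identified subtask count
    chunk = max(1, len(work_items) // target)    # work items per subtask
    return [work_items[i:i + chunk]
            for i in range(0, len(work_items), chunk)]
```

For example, a workload of 12 items on 4 processors with a delta of 2 is split into 6 subtasks of 2 items each, so the slowest processor can shed a subtask while a faster processor picks one up.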