IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.) | (none listed)
Application No. | US-0846168 (2007-08-28)
Registration No. | US-8312464 (2012-11-13)
Inventors / Address |
- Arimilli, Lakshminarayana B.
- Arimilli, Ravi K.
- Rajamony, Ramakrishnan
- Speight, William E.
Applicant / Address |
- International Business Machines Corporation
Agent / Address | (none listed)
Citation Info | Cited by: 3 / Patents cited: 22
Abstract
Mechanisms are provided for providing hardware based dynamic load balancing of message passing interface (MPI) tasks by modifying tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors.
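The decision logic the abstract describes can be sketched in outline. This is a minimal, hypothetical illustration only: the patent describes a hardware-implemented per-processor MPI load balancing controller, and every name below is invented for exposition.

```python
def plan_rebalance(completion_times, threshold):
    """Decide whether to shift work, given per-processor timestamps of
    the latest synchronization-operation calls (one computation cycle).

    completion_times: dict mapping processor id -> timestamp at which
    that processor called the synchronization operation.
    Returns a (slowest, fastest) pair of processor ids when the
    completion gap exceeds `threshold`, signalling that work should be
    shifted from the slowest processor to the fastest; otherwise None.
    """
    fastest = min(completion_times, key=completion_times.get)
    slowest = max(completion_times, key=completion_times.get)
    if completion_times[slowest] - completion_times[fastest] > threshold:
        # Shift workload from the slowest processor toward the fastest.
        return (slowest, fastest)
    return None
```

A real controller would accumulate such timestamps across many cycles to build the per-task history profile the abstract mentions, rather than acting on a single cycle as this sketch does.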
Representative Claims
1. A method, in a multiple processor system, for balancing a Message Passing Interface (MPI) workload across a plurality of processors, comprising: receiving a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determining if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identifying a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identifying a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modifying MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.

2. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed by one or more of the plurality of processors, and wherein modifying a number of MPI tasks to be executed by one or more of the plurality of processors comprises: splitting one or more MPI tasks of the MPI workload into subtasks; and assigning each of the subtasks to respective processors of the plurality of processors.

3. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor.

4. The method of claim 1, wherein modifying MPI tasks of the MPI workload comprises modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor, and wherein modifying MPI tasks of the MPI workload further comprises modifying an MPI task to be executed by the second processor such that the modified MPI task requires less computation than an MPI task executed in a previous computation cycle by the second processor.

5. The method of claim 4, wherein modifying MPI tasks of the MPI workload comprises splitting one or more MPI tasks of the MPI workload into subtasks, and wherein modifying a number of MPI tasks to be executed by the first processor comprises assigning a plurality of MPI subtasks to the first processor for concurrent execution by the first processor.

6. The method of claim 2, wherein splitting the one or more MPI tasks of the MPI workload into subtasks comprises: determining a delta value for increasing a number of MPI subtasks for the MPI workload; dividing the MPI workload by a value corresponding to a sum of a number of processors in the plurality of processors and the delta value to thereby identify a number of MPI subtasks for the MPI workload; and splitting the one or more MPI tasks to generate the identified number of MPI subtasks.

7. The method of claim 6, wherein the delta value is determined based on a difference in completion times of MPI tasks of a fastest processor and a slowest processor.

8. The method of claim 1, further comprising: determining if a difference in the fastest time of completion and the slowest time of completion exceeds a threshold, wherein modifying MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors is performed in response to the difference exceeding the threshold.

9. A computer program product comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed on a data processing system, causes the data processing system to: receive a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determine if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identify a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identify a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modify MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.

10. The computer program product of claim 9, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed by one or more of the plurality of processors, and wherein modifying a number of MPI tasks to be executed by one or more of the plurality of processors comprises: splitting one or more MPI tasks of the MPI workload into subtasks; and assigning each of the subtasks to respective processors of the plurality of processors.

11. The computer program product of claim 9, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor.

12. The computer program product of claim 9, wherein the computer readable program causes the data processing system to: modify MPI tasks of the MPI workload by modifying a number of MPI tasks to be executed concurrently by the first processor to increase a number of MPI tasks to be executed concurrently by the first processor, and modify an MPI task to be executed by the second processor such that the modified MPI task requires less computation than an MPI task executed in a previous computation cycle by the second processor.

13. The computer program product of claim 12, wherein the computer readable program causes the data processing system to modify MPI tasks of the MPI workload comprises splitting one or more MPI tasks of the MPI workload into subtasks, and wherein modifying a number of MPI tasks to be executed by the first processor comprises assigning a plurality of MPI subtasks to the first processor for concurrent execution by the first processor.

14. The computer program product of claim 10, wherein splitting the one or more MPI tasks of the MPI workload into subtasks comprises: determining a delta value for increasing a number of MPI subtasks for the MPI workload; dividing the MPI workload by a value corresponding to a sum of a number of processors in the plurality of processors and the delta value to thereby identify a number of MPI subtasks for the MPI workload; and splitting the one or more MPI tasks to generate the identified number of MPI subtasks.

15. The computer program product of claim 14, wherein the delta value is determined based on a difference in completion times of MPI tasks of a fastest processor and a slowest processor.

16. A data processing system, comprising: a plurality of processors; and at least one load balancing controller coupled to the plurality of processors, wherein the at least one load balancing controller: receives a plurality of MPI synchronization operation calls from one or more processors of the plurality of processors; determines if a state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors based on timestamps associated with the received one or more MPI synchronization operation calls; identifies a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; identifies a second processor, in the plurality of processors, having a slowest time of completion of the computation phase of an associated MPI task, during a computation cycle, based on the received MPI synchronization operation calls; and modifies MPI tasks of the MPI workload to rebalance the MPI workload across the plurality of processors in response to determining that the state of the multiple processor system requires rebalancing of the MPI workload across the plurality of processors, wherein modifying MPI tasks comprises at least one of modifying an amount of data to be processed by one or more of the plurality of processors or modifying a number of MPI tasks to be executed by one or more of the plurality of processors.
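The subtask-splitting step recited in claims 6 and 14 (dividing the workload by the processor count plus a delta value to identify the number of subtasks) can be illustrated with a small sketch. This is one plausible reading of the claim language, not the patented implementation; per claims 7 and 15 the delta would be derived from the fastest/slowest completion-time difference, but here it is simply passed in as an integer.

```python
def split_into_subtasks(work_items, num_processors, delta):
    """Hypothetical illustration of claims 6/14: the workload is divided
    into (num_processors + delta) pieces, so the identified number of
    MPI subtasks grows as `delta` grows, giving the load balancer
    finer-grained units to redistribute across processors.
    """
    target = num_processors + delta              # identified subtask count
    chunk = max(1, len(work_items) // target)    # work items per subtask
    return [work_items[i:i + chunk]
            for i in range(0, len(work_items), chunk)]
```

For example, a workload of 12 items on 4 processors with a delta of 2 is split into 6 subtasks of 2 items each, so the slowest processor can shed a subtask while a faster processor picks one up.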