Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks
IPC Classification Information
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.):
G06F-009/46
G06F-009/50
G06F-009/52
Application number: US-0524585 (filed 2012-06-15)
Registration number: US-8893148 (granted 2014-11-18)
Inventors: Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E.
Applicant: International Business Machines Corporation
Agent: Walder, Jr., Stephen J.
Citation Information
Times cited: 0
Patents cited: 23
Abstract
A system and method are provided for performing setup operations for receiving a different amount of data while processors are performing message passing interface (MPI) tasks. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize the periods spent waiting for all of the processors to call a synchronization operation. An MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly degrading the overall operation of the parallel execution system. As a result, setup operations may be performed while processors are performing MPI tasks, to prepare for receiving different sized portions of data in a subsequent computation cycle based on the history.
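The rebalancing decision the abstract describes can be sketched as follows. This is an illustrative simulation, not code from the patent: the function name `rebalance` and the constant `SHIFT_FRACTION` are hypothetical, and real implementations would derive completion times from a history of MPI synchronization calls rather than take them as a list.

```python
# Illustrative sketch (hypothetical names, not the patent's implementation):
# given each task's computation-phase completion time for the last cycle,
# shift part of the slowest task's data set to the fastest task so that
# future cycles finish closer together.

SHIFT_FRACTION = 0.1  # assumed tuning parameter: fraction of data moved per cycle


def rebalance(partition_sizes, completion_times):
    """Return new per-task data-set sizes: the fastest task is prepared to
    receive a larger data set, the slowest a smaller one."""
    fastest = min(range(len(completion_times)), key=completion_times.__getitem__)
    slowest = max(range(len(completion_times)), key=completion_times.__getitem__)
    new_sizes = list(partition_sizes)
    if fastest != slowest:
        shifted = int(new_sizes[slowest] * SHIFT_FRACTION)
        new_sizes[slowest] -= shifted   # slowest task gets a smaller data set
        new_sizes[fastest] += shifted   # fastest task gets a larger data set
    return new_sizes


# Example: task 0 reaches its synchronization call well before task 2.
print(rebalance([1000, 1000, 1000], [2.1, 3.0, 4.8]))  # [1100, 1000, 900]
```

In the patented scheme the corresponding setup operations (e.g. allocating a larger cache portion, acquiring host fabric interface windows) run while the slower processors are still in their computation phase, hiding the setup latency.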
Representative Claims
1. A method, in a multiple processor system, for balancing a Message Passing Interface (MPI) workload across a plurality of processors, comprising: receiving one or more MPI synchronization operation calls from one or more processors of the plurality of processors; identifying a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task of a MPI job, during a computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the first associated MPI task involves executing the MPI task on a first data set; and performing a first setup operation in the first processor for preparing to receive a second data set that is larger than the first data set in response to identifying the first processor as having a fastest time of completion of the computation phase, wherein the first setup operation modifies an allocation of resources in the multiple processor system for use by the first processor in receiving the second data set, wherein the first data set and second data set are associated with the MPI job.

2. The method of claim 1, wherein the first setup operation is performed while at least one other processor in the plurality of processors is still in a computation phase of its associated MPI task during the same computation cycle.

3. The method of claim 1, wherein the first setup operation comprises at least one of allocating a larger portion of cache memory for use by the first processor or acquiring a host fabric interface window or windows for communication by the first processor.

4. The method of claim 1, further comprising: identifying a second processor, in the plurality of processors, having a slowest time of completion of a computation phase of a second associated MPI task, during the computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the second associated MPI task involves executing the MPI task on a third data set; and performing a second setup operation in the second processor for preparing to receive a fourth data set that is smaller than the third data set in response to identifying the second processor as having a slowest time of completion of the computation phase, wherein the second setup operation modifies an allocation of resources in the multiple processor system for use by the second processor in receiving the fourth data set.

5. The method of claim 4, wherein the second setup operation comprises at least one of allocating a smaller portion of cache memory for use by the second processor or acquiring a host fabric interface window or windows for communication by the second processor.

6. The method of claim 4, further comprising: determining if a difference in the fastest time of completion and the slowest time of completion exceeds a threshold, wherein the first setup operation and the second setup operation are performed in response to the difference exceeding the threshold.

7. The method of claim 4, wherein each processor of the plurality of processors comprises a MPI load balancing controller, wherein each MPI load balancing controller implements the receiving and identifying operations, an MPI load balancing controller associated with the first processor implements performing the first setup operation, and an MPI load balancing controller associated with the second processor implements performing the second setup operation.

8. The method of claim 1, wherein the MPI job is a set of tasks to be performed in parallel on the plurality of processors, and wherein each processor of the plurality of processors executes a corresponding task of the MPI job in parallel on a corresponding set of data allocated to the processor from a superset of data.

9. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program, wherein the computer readable program, when executed on a data processing system, causes the data processing system to: receive one or more Message Passing Interface (MPI) synchronization operation calls from one or more processors of a plurality of processors; identify a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task of a MPI job, during a computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the first associated MPI task involves executing the MPI task on a first data set; and perform a first setup operation in the first processor for preparing to receive a second data set that is larger than the first data set in response to identifying the first processor as having a fastest time of completion of the computation phase, wherein the first setup operation modifies an allocation of resources in the multiple processor system for use by the first processor in receiving the second data set, wherein the first data set and second data set are associated with the MPI job.

10. The computer program product of claim 9, wherein the first setup operation is performed while at least one other processor in the plurality of processors is still in a computation phase of its associated MPI task during the same computation cycle.

11. The computer program product of claim 9, wherein the first setup operation comprises at least one of allocating a larger portion of cache memory for use by the first processor or acquiring a host fabric interface window or windows for communication by the first processor.

12. The computer program product of claim 9, wherein the computer readable program further causes the data processing system to: identify a second processor, in the plurality of processors, having a slowest time of completion of a computation phase of a second associated MPI task, during the computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the first associated MPI task involves executing the MPI task on a third data set; and perform a second setup operation in the second processor for preparing to receive a fourth data set that is smaller than the third data set in response to identifying the second processor as having a slowest time of completion of the computation phase, wherein the second setup operation modifies an allocation of resources in the multiple processor system for use by the second processor in receiving the fourth data set.

13. The computer program product of claim 12, wherein the second setup operation comprises at least one of allocating a smaller portion of cache memory for use by the second processor or acquiring a host fabric interface window or windows for communication by the second processor.

14. The computer program product of claim 12, wherein the computer readable program further causes the data processing system to: determine if a difference in the fastest time of completion and the slowest time of completion exceeds a threshold, wherein the first setup operation and the second setup operation are performed in response to the difference exceeding the threshold.

15. The computer program product of claim 12, wherein each processor of the plurality of processors comprises a MPI load balancing controller, wherein each MPI load balancing controller performs the operations to receive one or more MPI synchronization operation calls and identify the first processor and second processor, an MPI load balancing controller associated with the first processor performs the first setup operation, and an MPI load balancing controller associated with the second processor performs the second setup operation.

16. The computer program product of claim 9, wherein the MPI job is a set of tasks to be performed in parallel on the plurality of processors, and wherein each processor of the plurality of processors executes a corresponding task of the MPI job in parallel on a corresponding set of data allocated to the processor from a superset of data.

17. A data processing system, comprising: a plurality of processors; and at least one load balancing controller associated with the plurality of processors, wherein the at least one load balancing controller: receives one or more MPI synchronization operation calls from one or more processors of the plurality of processors; identifies a first processor, in the plurality of processors, having a fastest time of completion of a computation phase of an associated MPI task of a MPI job, during a computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the first associated MPI task involves executing the MPI task on a first data set; and performs a first setup operation in the first processor for preparing to receive a second data set that is larger than the first data set in response to identifying the first processor as having a fastest time of completion of the computation phase, wherein the first setup operation modifies an allocation of resources in the multiple processor system for use by the first processor in receiving the second data set, wherein the first data set and second data set are associated with the MPI job.

18. The system of claim 17, wherein the first setup operation is performed while at least one other processor in the plurality of processors is still in a computation phase of its associated MPI task during the same computation cycle.

19. The system of claim 17, wherein the first setup operation comprises at least one of allocating a larger portion of cache memory for use by the first processor or acquiring a host fabric interface window or windows for communication by the first processor.

20. The system of claim 17, further comprising: identifying a second processor, in the plurality of processors, having a slowest time of completion of a computation phase of a second associated MPI task, during the computation cycle, based on the received one or more MPI synchronization operation calls, wherein the computation phase of the first associated MPI task involves executing the MPI task on a third data set; and performing a second setup operation in the second processor for preparing to receive a fourth data set that is smaller than the third data set in response to identifying the second processor as having a slowest time of completion of the computation phase, wherein the second setup operation modifies an allocation of resources in the multiple processor system for use by the second processor in receiving the fourth data set.
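The threshold gate of claims 6 and 14 can be sketched in a few lines. This is a hedged illustration under assumed names: `THRESHOLD` and `needs_rebalance` are placeholders, and a real controller would compare completion times recorded from actual MPI synchronization operation calls.

```python
# Hypothetical sketch of the threshold check in claims 6 and 14: setup
# operations are triggered only when the gap between the fastest and slowest
# computation-phase completion times exceeds a threshold, so small natural
# jitter between processors does not cause constant rebalancing.

THRESHOLD = 0.5  # seconds; an assumed tuning parameter, not from the patent


def needs_rebalance(completion_times, threshold=THRESHOLD):
    """True when the fastest-to-slowest completion gap exceeds the threshold."""
    return max(completion_times) - min(completion_times) > threshold


print(needs_rebalance([2.1, 2.3, 2.4]))  # False: gap within tolerance
print(needs_rebalance([2.1, 3.0, 4.8]))  # True: trigger setup operations
```

Gating on a threshold reflects the design point in the claims: resource reallocation (cache portions, host fabric interface windows) has a cost, so it is only worth performing when the observed imbalance is large enough to pay for it.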
Patents cited by this patent (23)
Diard, Franck R., Adaptive load balancing in a multi-processor graphics processing system.
Blank, Ted Eric; Dang, Tammie; Lin, Fen-Ling; Nakagawa, Randy Mitchell; Smith, Bryan Frederick; Sutton, Craig Leonard; Swank, Darren Benjamin; Tie, Hong Sang; Tonelli, Dino Carlo; Annie S. T, Apportioning a work unit to execute in parallel in a heterogeneous environment.
Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E., Hardware based dynamic load balancing of message passing interface tasks.
Vrba, Richard Alan; Klecka, James Stevens; Fey, Jr., Kyran Wilfred; Lamano, Larry Leonard; Mehta, Nikhil A., High-performance fault tolerant computer system with clock length synchronization of loosely coupled processors.
Hwang, Cherng-Daw; Wong, Kenley, Method and apparatus for providing a time-division multiplexing (TDM) interface among a high-speed data stream and multiple processors.
Arimilli, Lakshminarayana B.; Arimilli, Ravi K.; Rajamony, Ramakrishnan; Speight, William E., Modifying an operation of one or more processors executing message passing interface tasks.
Konno, Chisato (JPX); Okochi, Toshio (JPX), Program execution control in parallel processor system for parallel execution of plural jobs by selected number of proce.
Cousins, David Bruce; Daily, Matthew Paul; Lirakis, Christopher Burbank, System and method for automatically optimizing heterogenous multiprocessor software performance.