[Patent] Runtime optimization of an application executing on a parallel computer

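IPC classification information
Country / Type	United States (US) Patent, granted
International Patent Classification (IPC, 7th edition)	G06F-009/46, G06F-015/16, G06F-011/34, G06F-009/52, G06F-009/54
Application number	US-0663545 (2012-10-30)
Registration number	US-8898678 (2014-11-25)
Inventors / Address	Faraj, Daniel A.; Smith, Brian E.
Applicant / Address	International Business Machines Corporation
Agent / Address	Biggers Kennedy Lenart Spraggins LLP
Citation information	Times cited: 0; Cited patents: 81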

Abstract

Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
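
The branching described in the abstract can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the patented implementation: the name `plan_execution`, the two return labels, and the use of a plain list to stand in for the call-site identifiers seen by each compute node are all hypothetical. The set of root-based operations follows claim 2.

```python
from typing import List

# Root-based collectives as enumerated in claim 2 of this record.
ROOT_BASED = {"broadcast", "scatter", "gather", "reduce"}


def plan_execution(operation: str, call_sites_by_node: List[int]) -> str:
    """Decide how to run one collective operation.

    `call_sites_by_node` is a hypothetical stand-in: one call-site
    identifier per compute node. A real system would compare these
    values with a collective operation rather than a local list.
    """
    if operation not in ROOT_BASED:
        # Not root-based: always establish a tuning session.
        return "tuning-session"
    if len(set(call_sites_by_node)) == 1:
        # Root-based and every node saw the same call site.
        return "tuning-session"
    # Root-based but the nodes disagree on the call site.
    return "untuned"
```

For example, `plan_execution("broadcast", [12, 12, 40])` falls through to the last branch, because one node reached the broadcast from a different call site than the others.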

Representative claims

1. An apparatus for runtime optimization of an application executing on a parallel computer, the parallel computer having a plurality of compute nodes organized into a communicator, the apparatus comprising a computer processor and a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that when executed by the computer processor cause the apparatus to carry out the steps of: determining, by each compute node, whether a collective operation is root-based; if the collective operation is not root-based, establishing a tuning session administered by a self tuning module for the collective operation in dependence upon an identifier of a call site of the collective operation and executing the collective operation in the tuning session; if the collective operation is root-based, determining, through use of a single other collective operation, whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes executing the application identified the collective operation at the same call site, establishing a tuning session administered by the self tuning module for the collective operation in dependence upon the identifier of the call site of the collective operation and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

2. The apparatus of claim 1 wherein a root-based collective operation comprises one of: a broadcast operation, a scatter operation, a gather operation, or a reduce operation.

3. The apparatus of claim 1 wherein determining whether all compute nodes executing the application identified the collective operation at the same call site further comprising performing on all the compute nodes of the communicator an ‘allreduce’ collective operation to identify the minimum and maximum values of all of the identified call sites.

4. The apparatus of claim 1 further comprising computer program instructions that when executed by the computer processor cause the apparatus to carry out the steps of: selecting, for a particular collective operation of the application in dependence upon one or more tuning sessions for the particular collective operation, one or more algorithms to carry out the particular collective operation, the one or more algorithms representing an optimized set of algorithms to carry out the particular collective operation; recording the one or more selected algorithms; and during a subsequent execution of the application and without performing another tuning session, carrying out the particular collective operation of the application with the recorded selected algorithms.

5. The apparatus of claim 4 wherein recording the one or more selected algorithms from the tuning session further comprises recording, in association with the one or more selected algorithms, an identifier of the call site for the particular collective operation, a message size, and a communicator identifier.

6. The apparatus of claim 4 wherein: recording the one or more selected algorithms from the tuning session further comprises identifying any of the tuned collective operations that are non-critical collective operations; and carrying out the particular collective operation of the application with the recorded selected algorithms further comprises carrying out the non-critical collective operations with standard messaging module algorithms.

7. A computer program product for runtime optimization of an application executing on a parallel computer, the parallel computer having a plurality of compute nodes organized into a communicator, the computer program product disposed in a computer readable hardware storage medium, the computer program product comprising computer program instructions that when executed by a processor cause a computer to carry out the steps of: determining, by each compute node, whether a collective operation is root-based; if the collective operation is not root-based, establishing a tuning session administered by a self tuning module for the collective operation in dependence upon an identifier of a call site of the collective operation and executing the collective operation in the tuning session; if the collective operation is root-based, determining, through use of a single other collective operation, whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes executing the application identified the collective operation at the same call site, establishing a tuning session administered by the self tuning module for the collective operation in dependence upon the identifier of the call site of the collective operation and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

8. The computer program product of claim 7 wherein a root-based collective operation comprises one of: a broadcast operation, a scatter operation, a gather operation, or a reduce operation.

9. The computer program product of claim 7 wherein determining whether all compute nodes executing the application identified the collective operation at the same call site further comprising performing on all the compute nodes of the communicator an ‘allreduce’ collective operation to identify the minimum and maximum values of all of the identified call sites.

10. The computer program product of claim 7 further comprising computer program instructions that when executed by a processor cause a computer to carry out the steps of: selecting, for a particular collective operation of the application in dependence upon one or more tuning sessions for the particular collective operation, one or more algorithms to carry out the particular collective operation, the one or more algorithms representing an optimized set of algorithms to carry out the particular collective operation; recording the one or more selected algorithms; and during a subsequent execution of the application and without performing another tuning session, carrying out the particular collective operation of the application with the recorded selected algorithms.

11. The computer program product of claim 10 wherein recording the one or more selected algorithms from the tuning session further comprises recording, in association with the one or more selected algorithms, an identifier of the call site for the particular collective operation, a message size, and a communicator identifier.

12. The computer program product of claim 10 wherein: recording the one or more selected algorithms from the tuning session further comprises identifying any of the tuned collective operations that are non-critical collective operations; and carrying out the particular collective operation of the application with the recorded selected algorithms further comprises carrying out the non-critical collective operations with standard messaging module algorithms.
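
Two mechanisms in the claims lend themselves to a short sketch: the min/max agreement check performed with one 'allreduce' (claims 3 and 9), and the record of selected algorithms keyed by call site, message size, and communicator identifier (claims 5 and 11). The Python below simulates both without an MPI library. Everything named here is hypothetical: `allreduce_min` is a local stand-in for a real element-wise MIN allreduce, the trick of reducing the pair `[site, -site]` to obtain the minimum and maximum together is one possible way to stay within a single collective and is not stated in the record, and the algorithm names and the fallback to a default for unrecorded keys are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def allreduce_min(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Simulated element-wise MIN allreduce over one vector per node."""
    return [min(column) for column in zip(*vectors)]


def same_call_site(call_sites: Sequence[int]) -> bool:
    """Check agreement on the call site using a single reduction.

    Each node contributes [site, -site]. The MIN of the first slot is
    the smallest identifier; the MIN of the second slot is the negated
    largest identifier. All nodes agree exactly when min == max.
    """
    contributions = [[site, -site] for site in call_sites]
    lowest, negated_highest = allreduce_min(contributions)
    return lowest == -negated_highest


# Key layout follows claims 5 and 11:
# (call-site identifier, message size, communicator identifier).
TuningKey = Tuple[int, int, int]


class TuningTable:
    """Hypothetical store for algorithms selected in tuning sessions."""

    def __init__(self) -> None:
        self._selected: Dict[TuningKey, List[str]] = {}

    def record(self, call_site: int, message_size: int,
               communicator_id: int, algorithms: Sequence[str]) -> None:
        self._selected[(call_site, message_size, communicator_id)] = list(algorithms)

    def lookup(self, call_site: int, message_size: int,
               communicator_id: int, default: Sequence[str]) -> List[str]:
        # Assumption: keys never tuned fall back to a caller-supplied default.
        key = (call_site, message_size, communicator_id)
        return self._selected.get(key, list(default))
```

On a later run, a caller would consult `lookup` before executing the collective and skip the tuning session whenever a recorded entry exists.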
