IPC Classification Information
Country/Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application Number |
US-0714619
(2007-03-05)
|
Registration Number |
US-8443348
(2013-05-14)
|
Inventor / Address |
- McGuire, Morgan S.
- Demetriou, Christopher G.
- Grant, Brian K.
- Papakipos, Matthew N.
|
Applicant / Address |
|
Attorney / Address |
Morgan, Lewis & Bockius LLP
|
Citation Information |
Times cited: 14
Patents cited: 46 |
Abstract
A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
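As a rough illustration of the abstract's idea of an application handing array-intensive work to a runtime that spreads it over several processing elements, the sketch below splits an element-wise operation into contiguous chunks and runs each chunk on a worker from Python's standard `concurrent.futures` thread pool. `ToyRuntime` and its methods are hypothetical names, and the worker threads merely stand in for processing elements; this shows only generic data partitioning, not the patented runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class ToyRuntime:
    """Hypothetical stand-in for a runtime that farms array work out to workers."""

    def __init__(self, num_elements: int = 4) -> None:
        # Each worker thread plays the role of one processing element.
        self.num_elements = num_elements
        self._pool = ThreadPoolExecutor(max_workers=num_elements)

    def elementwise(self, fn: Callable[[float, float], float],
                    a: Sequence[float], b: Sequence[float]) -> List[float]:
        if len(a) != len(b):
            raise ValueError("operands must have the same length")
        n = len(a)
        # Split the index range [0, n) into at most one contiguous chunk per element.
        step = max(1, -(-n // self.num_elements))  # ceiling division
        bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]

        def run_chunk(lo: int, hi: int) -> List[float]:
            return [fn(a[i], b[i]) for i in range(lo, hi)]

        futures = [self._pool.submit(run_chunk, lo, hi) for lo, hi in bounds]
        out: List[float] = []
        for fut in futures:  # gather results in chunk order
            out.extend(fut.result())
        return out

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

For example, `ToyRuntime(num_elements=3).elementwise(lambda x, y: x * y, [1, 2, 3, 4, 5], [10, 20, 30, 40, 50])` splits the five elements into chunks of 2, 2 and 1 and returns `[10, 40, 90, 160, 250]`. Threads keep the example self-contained; under CPython's global interpreter lock, pure-Python arithmetic in threads does not actually run in parallel, so a real runtime would dispatch to native kernels or separate devices.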
Representative Claims
1. A computer-implemented method, comprising: in a runtime system configured to run on a parallel-processing computer system that includes multiple processing elements, further including a first processing element and a second processing element, the two processing elements having different instruction set architectures and memory storing one or more programs for execution by the multiple processing elements, at runtime:
receiving from an application executing in conjunction with the runtime system one or more application program interface calls to the runtime system, the application program interface calls comprising one or more operation requests directed to the parallel-processing computer system;
selecting an application program interface module of the runtime system, from among a plurality of application program interface modules, based on a programming language of the application;
generating, using the application program interface module, a programming language-independent, processor-independent intermediate representation for at least one of the one or more operation requests, wherein the intermediate representation includes one or more objects corresponding to the at least one of the one or more operation requests and information for generating optimized compute kernels for the first processing element and the second processing element, respectively;
dynamically selecting one of the first processing element and the second processing element on which to perform the one or more operation requests of the intermediate representation; and
dynamically preparing one or more compute kernels for the intermediate representation in accordance with the instruction set architecture of the selected processing element, wherein the one or more compute kernels are configured to execute on the selected processing element, and wherein dynamically preparing the one or more compute kernels includes:
selecting from a source code library one or more source code segments corresponding to the at least one of the one or more operation requests, and dynamically compiling the one or more source code segments into the one or more compute kernels, or selecting from a binary code library the one or more compute kernels corresponding to the at least one of the one or more operation requests, wherein the binary code library includes a plurality of pre-compiled compute kernels and each pre-compiled compute kernel is configured to be executed on at least one of the one or more types of processing elements.

2. The method of claim 1, further comprising: identifying from a plurality of language-specific application program interface modules an application program interface module; and
generating the intermediate representation using the identified application program interface module.

3. The method of claim 2, wherein at least one of the one or more intermediate representation objects corresponds to a function call to the identified application program interface module in the application.

4. The method of claim 1, further comprising: defining an object scope for at least one of the one or more intermediate representation objects.

5. The method of claim 1, wherein at least one of the intermediate representation objects has an associated handle and the associated handle is embedded in at least one of the one or more operation requests.

6. The method of claim 5, wherein the application has an access to the at least one of the intermediate representation objects through its associated handle.

7. The method of claim 1, further comprising: dynamically arranging for execution of at least one of the one or more compute kernels on the selected processing element.

8. A parallel-processing computer system, comprising:
memory;
multiple processing elements, further including a first processing element and a second processing element, the two processing elements having different instruction set architectures;
a runtime system configured to run on a parallel-processing computer system; and
at least one program stored in the memory and executed by the multiple processing elements, the at least one program including:
instructions for receiving, at runtime, from an application executing in conjunction with the runtime system one or more application program interface calls to the runtime system, the application program interface calls comprising one or more operation requests directed to the parallel-processing computer system;
instructions for selecting, at runtime, an application program interface module of the runtime system, from among a plurality of application program interface modules, based on a programming language of the application;
instructions for generating, at runtime, using the application program interface module, a programming language-independent, processor-independent intermediate representation for at least one of the one or more operation requests, wherein the intermediate representation includes one or more objects corresponding to the at least one of the one or more operation requests and information for generating optimized compute kernels for the first processing element and the second processing element, respectively;
instructions for, at runtime, dynamically selecting one of the first processing element and the second processing element on which to perform the one or more operation requests of the intermediate representation; and
instructions for, at runtime, dynamically preparing one or more compute kernels for the intermediate representation in accordance with the instruction set architecture of the selected processing element, wherein the one or more compute kernels are configured to execute on the selected processing element, wherein dynamically preparing the one or more compute kernels includes:
selecting from a source code library one or more source code segments corresponding to the at least one of the one or more operation requests, and dynamically compiling the one or more source code segments into the one or more compute kernels, or selecting from a binary code library the one or more compute kernels corresponding to the at least one of the one or more operation requests, wherein the binary code library includes a plurality of pre-compiled compute kernels and each pre-compiled compute kernel is configured to be executed on at least one of the one or more types of processing elements.

9. The computer system of claim 8, further comprising: instructions for identifying from a plurality of language-specific application program interface modules an application program interface module; and
instructions for generating the intermediate representation using the identified application program interface module.

10. The computer system of claim 9, wherein at least one of the one or more intermediate representation objects corresponds to a function call to the identified application program interface module in the application.

11. The computer system of claim 8, further comprising: instructions for defining an object scope for at least one of the one or more intermediate representation objects.

12. The computer system of claim 8, wherein at least one of the intermediate representation objects has an associated handle and the associated handle is embedded in at least one of the one or more operation requests.

13. The computer system of claim 12, wherein the application has an access to the at least one of the intermediate representation objects through its associated handle.

14. The computer system of claim 8, further comprising: instructions for dynamically arranging for execution of at least one of the one or more compute kernels on the selected processing element.

15. A non-transitory computer readable storage medium storing one or more programs configured to be executed by a parallel-processing computer system that includes multiple processing elements, further including a first processing element and a second processing element, the two processing elements having different instruction set architectures, the one or more programs comprising instructions for:
receiving, at runtime, from an application executing in conjunction with the runtime system one or more application program interface calls to the runtime system, the application program interface calls comprising one or more operation requests directed to the parallel-processing computer system;
selecting, at runtime, an application program interface module of the runtime system, from among a plurality of application program interface modules, based on a programming language of the application;
generating, at runtime, using the application program interface module a programming language-independent, processor-independent intermediate representation for at least one of the one or more operation requests, wherein the intermediate representation includes one or more objects corresponding to the at least one of the one or more operation requests and information for generating optimized compute kernels for the first processing element and the second processing element, respectively;
dynamically selecting, at runtime, one of the first processing element and the second processing element on which to perform the one or more operation requests of the intermediate representation; and
dynamically preparing, at runtime, one or more compute kernels for the intermediate representation in accordance with the instruction set architecture of the selected processing element, wherein the one or more compute kernels are configured to execute on the selected processing element, and wherein dynamically preparing the one or more compute kernels includes:
selecting from a source code library one or more source code segments corresponding to the at least one of the one or more operation requests, and dynamically compiling the one or more source code segments into the one or more compute kernels, or selecting from a binary code library the one or more compute kernels corresponding to the at least one of the one or more operation requests, wherein the binary code library includes a plurality of pre-compiled compute kernels and each pre-compiled compute kernel is configured to be executed on at least one of the one or more types of processing elements.

16. The method of claim 1, wherein one of the operation requests is a request to initialize the runtime system.

17. The method of claim 1, wherein one of the one or more operation requests is a request to shut down the runtime system.

18. The method of claim 1, wherein one of the one or more operation requests is a request to create, duplicate, or destroy data held by the runtime system.

19. The method of claim 1, wherein one of the one or more operation requests is a request to allocate or de-allocate main system memory managed by the runtime system.

20. The method of claim 1, wherein one of the one or more operation requests is a request to control error handling behavior of the runtime system.
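Claim 1 recites a pipeline: choose an API module by the application's programming language, turn each operation request into a language- and processor-independent intermediate representation, pick one of two processing elements with different instruction set architectures, and then obtain a kernel either by compiling source from a source code library or by taking a pre-compiled kernel from a binary code library. The Python sketch below walks through those steps with toy data. Everything in it is hypothetical: the API module names, the `(operation, ISA)` library keys, the least-loaded selection policy, and `compile_source`, which only stands in for a real compiler. The claim itself does not specify a selection policy or which library to consult first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ProcessingElement:
    name: str
    isa: str   # instruction set architecture, e.g. "x86-64" or "gpu"
    load: int  # pending work; used only by the assumed selection policy


@dataclass(frozen=True)
class IRObject:
    """Language- and processor-independent description of one operation request."""
    op: str                       # canonical operation name, e.g. "add"
    operands: Tuple[str, ...]     # handles of the input data
    kernel_info: Tuple[str, ...]  # ISAs this operation can be specialized for


# Hypothetical language-specific API modules: each maps its language's
# spelling of a call onto the same canonical IR.
def python_module(call: str, args: Sequence[str]) -> IRObject:
    op = {"numpy.add": "add", "numpy.multiply": "mul"}[call]
    return IRObject(op, tuple(args), ("x86-64", "gpu"))


def matlab_module(call: str, args: Sequence[str]) -> IRObject:
    op = {"plus": "add", "times": "mul"}[call]
    return IRObject(op, tuple(args), ("x86-64", "gpu"))


API_MODULES: Dict[str, Callable[[str, Sequence[str]], IRObject]] = {
    "python": python_module,
    "matlab": matlab_module,
}

# Toy kernel libraries keyed by (operation, ISA).
SOURCE_LIBRARY: Dict[Tuple[str, str], str] = {
    ("add", "x86-64"): "out[i] = a[i] + b[i];",
    ("mul", "x86-64"): "out[i] = a[i] * b[i];",
    ("mul", "gpu"): "out[tid] = a[tid] * b[tid];",
}
BINARY_LIBRARY: Dict[Tuple[str, str], bytes] = {
    ("add", "gpu"): b"<precompiled add for gpu>",
}


def compile_source(src: str, isa: str) -> bytes:
    # Placeholder for a real just-in-time compiler targeting `isa`.
    return f"<{isa} binary of: {src}>".encode()


def select_element(ir: IRObject, elements: Sequence[ProcessingElement]) -> ProcessingElement:
    # Assumed policy: the least-loaded element whose ISA the IR supports.
    candidates = [pe for pe in elements if pe.isa in ir.kernel_info]
    return min(candidates, key=lambda pe: pe.load)


def prepare_kernel(ir: IRObject, pe: ProcessingElement) -> bytes:
    key = (ir.op, pe.isa)
    if key in BINARY_LIBRARY:   # reuse a pre-compiled kernel if one exists
        return BINARY_LIBRARY[key]
    if key in SOURCE_LIBRARY:   # otherwise compile the source segment now
        return compile_source(SOURCE_LIBRARY[key], pe.isa)
    raise LookupError(f"no kernel available for {key}")


def handle_calls(language: str,
                 calls: Sequence[Tuple[str, Sequence[str]]],
                 elements: Sequence[ProcessingElement]) -> List[Tuple[str, str, bytes]]:
    module = API_MODULES[language]  # select the API module by language
    plan: List[Tuple[str, str, bytes]] = []
    for call, args in calls:
        ir = module(call, args)                # build the intermediate representation
        pe = select_element(ir, elements)      # pick a processing element at runtime
        plan.append((ir.op, pe.name, prepare_kernel(ir, pe)))
    return plan
```

Trying the binary library before the source library is just one plausible ordering; the claim recites the two paths as alternatives. Because both API modules emit the same `IRObject`, a Python `numpy.add` call and a MATLAB `plus` call reach the same kernel lookup.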
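Claims 5 and 6 describe intermediate-representation objects that the application reaches only through handles embedded in its operation requests, and claims 16 through 20 list kinds of requests: initialize, shut down, create/duplicate/destroy data, allocate/de-allocate memory, and control error handling. The sketch below shows one way such a request interface could look, assuming an integer handle table and a "raise or record" error mode. `HandleRuntime`, `Request`, and `ToyRuntimeError` are invented names, and memory allocation is only tracked as a byte counter rather than performed.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Request(Enum):
    # Request kinds modeled loosely on claims 16-20 (names are illustrative).
    INIT = auto()
    SHUTDOWN = auto()
    CREATE = auto()
    DUPLICATE = auto()
    DESTROY = auto()
    ALLOCATE = auto()
    DEALLOCATE = auto()
    SET_ERROR_MODE = auto()


class ToyRuntimeError(Exception):
    pass


class HandleRuntime:
    """Hypothetical runtime whose data objects are reachable only via integer handles."""

    def __init__(self) -> None:
        self.running = False
        self.raise_on_error = True        # error-handling mode (cf. claim 20)
        self.last_error: Optional[str] = None
        self.allocated_bytes = 0          # memory bookkeeping only (cf. claim 19)
        self._objects: Dict[int, List[float]] = {}
        self._next_handle = 1

    def _fail(self, message: str) -> None:
        # Either raise, or record the error and return None, depending on the mode.
        self.last_error = message
        if self.raise_on_error:
            raise ToyRuntimeError(message)
        return None

    def _store(self, data: List[float]) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = data
        return handle

    def request(self, kind: Request, *args):
        if kind is Request.INIT:
            self.running = True
            return None
        if not self.running:
            return self._fail("runtime not initialized")
        if kind is Request.SHUTDOWN:
            self._objects.clear()
            self.allocated_bytes = 0
            self.running = False
        elif kind is Request.CREATE:
            return self._store(list(args[0]))
        elif kind in (Request.DUPLICATE, Request.DESTROY):
            handle = args[0]  # the handle travels inside the request (cf. claim 5)
            if handle not in self._objects:
                return self._fail(f"invalid handle {handle}")
            if kind is Request.DUPLICATE:
                return self._store(list(self._objects[handle]))
            del self._objects[handle]
        elif kind is Request.ALLOCATE:
            self.allocated_bytes += args[0]
        elif kind is Request.DEALLOCATE:
            self.allocated_bytes = max(0, self.allocated_bytes - args[0])
        elif kind is Request.SET_ERROR_MODE:
            self.raise_on_error = bool(args[0])
        return None

    def read(self, handle: int) -> List[float]:
        # The application sees the data only through its handle (cf. claim 6).
        return list(self._objects[handle])
```

Keeping the objects inside the runtime and handing out opaque integers means the application never holds a direct reference, which is what lets the runtime move or duplicate data between processing elements behind the scenes.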