IPC Classification Information

Country/Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th ed.) | (not listed)
Application No. | US-0714591 (2007-03-05)
Registration No. | US-8381202 (2013-02-19)
Inventors / Address |
- Papakipos, Matthew N.
- Demetriou, Christopher G.
- Tuck, Nathan D.
- Grant, Brian K.
Applicant / Address | (not listed)
Citation Information | Cited by: 6 / Patents cited: 46
Abstract
A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
Representative Claims
1. A computer-implemented method, comprising: at a parallel-processing computer system that includes two or more processing elements including a first processing element of a first architecture and a second processing element of a second architecture different from the first architecture: at the first processing element: receiving one or more compute kernels, wherein the one or more compute kernels are configured to execute on the two or more processing elements; and dynamically arranging for execution of at least one of the one or more compute kernels on at least one of the two or more processing elements in response to or in anticipation of a request for a result associated with the at least one of the one or more compute kernels, further comprising: arranging the execution of the at least one of the one or more compute kernels on the second processing element after receiving the request for a result associated with the at least one of the one or more compute kernels; and receiving a callback function from the second processing element, wherein the callback function includes a completion signal before a completion of executing the at least one of the one or more compute kernels on the second processing element; after determining that the second processing element is unavailable for executing the at least one of the one or more compute kernels: identifying another compute kernel among the one or more compute kernels, wherein the another compute kernel is an equivalent version of the at least one of the one or more compute kernels prepared for the first compute kernel; and arranging the execution of the another compute kernel on the first processing element.

2. The method of claim 1, further comprising: prior to receiving the one or more compute kernels, receiving one or more operation requests; dynamically selecting at least one of the two or more processing elements for the one or more operation requests; and dynamically preparing the one or more compute kernels for the one or more operation requests.

3. The method of claim 2, further comprising: generating a programming language-independent, processor-independent intermediate representation for the one or more operation requests.

4. The method of claim 2, wherein the one or more operation requests are from an application being executed on the parallel-processing computer system.

5. The method of claim 2, wherein said dynamic execution of the compute kernels is triggered by an operation request subsequent to the one or more operation requests.

6. The method of claim 1, wherein the two or more processing elements include single-core/multi-core central processing units, graphics processing units, or single-core/multi-core co-processors.

7. The method of claim 1, further comprising: generating a pending operation table; for each of the one or more compute kernels, inserting one or more entries into the pending operation table; in response to or in anticipation of the request for a result associated with the at least one of the one or more compute kernels, updating entries associated with the at least one of the one or more compute kernels in the pending operation table before the execution of the at least one of the one or more compute kernels; and removing the updated entries from the pending operation table after the execution of the at least one of the one or more compute kernels.

8. The method of claim 1, wherein the first processing element is a CPU and the second processing element is a GPU.

9.
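The behavior recited in claims 1 and 7 can be illustrated with a short sketch: a host-side scheduler keeps a pending-operation table, arranges execution of a kernel on an accelerator only when a result is requested, and falls back to an equivalent host-prepared version of the kernel when the accelerator is unavailable. This is a hypothetical illustration, not the patent's implementation; all names (`Runtime`, `receive_kernel`, `request_result`) are invented for this sketch.

```python
class Runtime:
    """Toy scheduler sketching the dispatch scheme of claims 1 and 7."""

    def __init__(self, gpu_available):
        self.gpu_available = gpu_available
        self.pending = {}   # pending-operation table: kernel id -> state (claim 7)
        self.kernels = {}   # kernel id -> {"gpu": fn, "cpu": fn}

    def receive_kernel(self, kid, gpu_fn, cpu_fn):
        # Register both versions of a kernel and insert a table entry.
        self.kernels[kid] = {"gpu": gpu_fn, "cpu": cpu_fn}
        self.pending[kid] = "pending"

    def request_result(self, kid, arg):
        # Execution is arranged only when a result is requested (claim 1);
        # the table entry is updated before execution and removed after.
        self.pending[kid] = "running"
        if self.gpu_available:
            result = self.kernels[kid]["gpu"](arg)
        else:
            # Second processing element unavailable: run the equivalent
            # version prepared for the first (host) processing element.
            result = self.kernels[kid]["cpu"](arg)
        del self.pending[kid]
        return result


rt = Runtime(gpu_available=False)
rt.receive_kernel("square", gpu_fn=lambda x: x * x, cpu_fn=lambda x: x * x)
print(rt.request_result("square", 7))  # falls back to the CPU version -> 49
```

The CPU fallback mirrors the claim's "equivalent version" language: the runtime, not the application, decides at request time which processing element actually executes the kernel.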
A parallel-processing computer system, comprising: memory; two or more processing elements including a first processing element of a first architecture and a second processing element of a second architecture different from the first architecture; and at least one program stored in the memory and executed by the two or more processing elements, the at least one program including: instructions performed by the first processing element for receiving one or more compute kernels, wherein the one or more compute kernels are configured to execute on the two or more processing elements; and instructions performed by the first processing element for dynamically arranging for execution of at least one of the one or more compute kernels on at least one of the two or more processing elements in response to or in anticipation of a request for a result associated with the at least one of the one or more compute kernels, further comprising: instructions for arranging the execution of the at least one of the one or more compute kernels on the second processing element after receiving the request for a result associated with the at least one of the one or more compute kernels; and instructions for receiving a callback function from the second processing element, wherein the callback function includes a completion signal before a completion of executing the at least one of the one or more compute kernels on the second processing element; instructions for, after determining that the second processing element is unavailable for executing the at least one of the one or more compute kernels: identifying another compute kernel among the one or more compute kernels, wherein the another compute kernel is an equivalent version of the at least one of the one or more compute kernels prepared for the first compute kernel; and arranging the execution of the another compute kernel on the first processing element.

10. The computer system of claim 9, further comprising: instructions for generating a programming language-independent, processor-independent intermediate representation for the one or more operation requests.

11. The computer system of claim 10, wherein the one or more operation requests are from an application being executed on the parallel-processing computer system.

12. The computer system of claim 10, wherein said dynamic execution of the compute kernels is triggered by an operation request subsequent to the one or more operation requests.

13. The computer system of claim 9, wherein the two or more processing elements include single-core/multi-core central processing units, graphics processing units, or single-core/multi-core co-processors.

14. The computer system of claim 9, wherein the request is associated with another compute kernel that will be suspended until after the execution of the at least one of the one or more compute kernels.

15. The computer system of claim 14, wherein the result associated with the at least one of the one or more compute kernels is an input to the another compute kernel.

16. The computer system of claim 9, further comprising: instructions for generating a pending operation table; instructions for, for each of the one or more compute kernels, inserting one or more entries into the pending operation table; instructions for, in response to or in anticipation of the request for a result associated with the at least one of the one or more compute kernels, updating entries associated with the at least one of the one or more compute kernels in the pending operation table before the execution of the at least one of the one or more compute kernels; and instructions for removing the updated entries from the pending operation table after the execution of the at least one of the one or more compute kernels.

17. The computer system of claim 9, wherein the first processing element is a CPU and the second processing element is a GPU.

18.
A computer program product for use in conjunction with a parallel-processing computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising: at the parallel-processing computer system that includes two or more processing elements including a first processing element of a first architecture and a second processing element of a second architecture different from the first architecture, instructions performed by the first processing element for receiving one or more compute kernels, wherein the one or more compute kernels are configured to execute on the two or more processing elements; and instructions performed by the first processing element for dynamically arranging for execution of at least one of the one or more compute kernels on at least one of the two or more processing elements in response to or in anticipation of a request for a result associated with the at least one of the one or more compute kernels, further comprising: instructions for arranging the execution of the at least one of the one or more compute kernels on the second processing element after receiving the request for a result associated with the at least one of the one or more compute kernels; and instructions for receiving a callback function from the second processing element, wherein the callback function includes a completion signal before a completion of executing the at least one of the one or more compute kernels on the second processing element; instructions for, after determining that the second processing element is unavailable for executing the at least one of the one or more compute kernels: identifying another compute kernel among the one or more compute kernels, wherein the another compute kernel is an equivalent version of the at least one of the one or more compute kernels prepared for the first compute kernel; and arranging the execution of the another compute kernel on the first processing element.

19. The method of claim 1, wherein the request is associated with another compute kernel that will be suspended until after the execution of the at least one of the one or more compute kernels.

20. The method of claim 19, wherein the result associated with the at least one of the one or more compute kernels is an input to the another compute kernel.
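Claims 19 and 20 describe demand-driven dependencies: a downstream kernel is suspended until the upstream kernel whose result it consumes has executed. The sketch below illustrates that idea as a minimal lazy-evaluation chain; the `LazyKernel` class and its names are hypothetical, invented only to make the dependency behavior concrete.

```python
class LazyKernel:
    """A kernel whose execution is deferred until its result is demanded."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs   # may include other LazyKernels (claim 20)
        self.result = None
        self.executed = False

    def force(self):
        # A request for this kernel's result is effectively suspended
        # until every upstream kernel has executed (claim 19).
        if not self.executed:
            args = [k.force() if isinstance(k, LazyKernel) else k
                    for k in self.inputs]
            self.result = self.fn(*args)
            self.executed = True
        return self.result


a = LazyKernel(lambda x: x + 1, 10)   # upstream kernel
b = LazyKernel(lambda x: x * 2, a)    # consumes a's result as input
print(b.force())  # forces a first (11), then b -> 22
```

Forcing `b` transparently forces `a` first, matching the claim language in which the result of one kernel is an input to another and gates its execution.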