IPC Classification Information

Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.): (not listed)
Application No.: US-0716508 (filed 2007-03-09)
Registration No.: US-8375368 (granted 2013-02-12)
Inventors:
- Tuck, Nathan D.
- Papakipos, Matthew N.
- Grant, Brian K.
- Demetriou, Christopher G.
- Civlin, Jan
Applicant / Address: (not listed)
Citation information: cited by 27 patents; cites 43 patents
Abstract
A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of the parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. A profiling tool is used to collect, analyze, and visualize the performance data of an application in connection with its execution on a parallel-processing computer system through the runtime system. This profiling tool greatly enhances an application developer's ability to understand how an application is executed on the parallel-processing computer system and fine-tune the application to achieve high performance.
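The abstract describes a runtime that receives operation requests, partitions them into compute kernels, dispatches each kernel to a processing element, measures execution, and maps kernel-level performance data back to the originating requests. A minimal sketch of that flow is below; all function and variable names are hypothetical illustrations, not the patent's actual API, and the trivial one-kernel-per-request partitioning and round-robin scheduling stand in for the runtime's real policies.

```python
import time
from collections import defaultdict

def run_with_profiling(operation_requests, processing_elements):
    """Sketch of the profiling runtime from the abstract: partition
    requests into compute kernels, dispatch each kernel to a
    processing element, time its execution, and map the kernel-level
    timings back to the originating operation requests."""
    # Partition: here, trivially one compute kernel per request.
    kernels = [{"request_id": i, "fn": req}
               for i, req in enumerate(operation_requests)]

    request_perf = defaultdict(float)
    for kernel in kernels:
        # Select a processing element (round-robin placeholder for
        # the runtime's real workload-distribution policy).
        pe = processing_elements[kernel["request_id"] % len(processing_elements)]
        start = time.perf_counter()
        pe(kernel["fn"])  # execute the kernel on that element
        elapsed = time.perf_counter() - start
        # Map kernel performance back to its operation request.
        request_perf[kernel["request_id"]] += elapsed
    return dict(request_perf)
```

Here a "processing element" is modeled as any callable that runs a kernel; a real system would dispatch to a CPU core, coprocessor, or GPU and collect richer data (data transfer, code generation, workload distribution) than wall-clock time.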
Representative Claims
1. A computer-implemented method, comprising: in a runtime system configured to run on a parallel-processing computer system that includes one or more first and second processing elements and memory storing one or more programs for execution by the one or more first and second processing elements, receiving one or more operation requests from an application being executed by the computer system; at runtime: partitioning the one or more operation requests into one or more compute kernels; selecting a respective one of the first and second processing elements for executing each of the compute kernels; executing each respective compute kernel at the processing element to which it was targeted; and determining performance information of the compute kernels in association with their execution by the one or more of first and second processing elements; mapping the performance information of the compute kernels into performance information of the one or more operation requests from the application; and reporting and/or storing the performance information of the one or more operation requests.

2. The method of claim 1, wherein at least two of the second processing elements execute the one or more compute kernels in parallel.

3. The method of claim 1, wherein at least one of the first and second processing elements is one selected from the group comprising of a GPU, a coprocessor, and a CPU.

4. The method of claim 3, wherein the coprocessor has multiple cores.

5. The method of claim 3, wherein the CPU has multiple cores.

6. The method of claim 1, wherein at least two of the first and second processing elements are of different processor architectures.

7. The method of claim 1, wherein the performance information includes computation performance information, data transfer performance information, dynamic code generation performance information, dynamic workload distribution performance information.

8. The method of claim 1, wherein a respective compute kernel of the one or more compute kernels includes a plurality of compute kernel sub-portions, each sub-portion corresponding to one of the one or more operation requests.

9. The method of claim 8, the determining further comprising: determining a weighting value for each of the compute kernel sub-portions, each weighting value corresponding to the relative cost of executing an associated compute kernel sub-portion; and determining performance information of the compute kernel sub-portions.

10. The method of claim 9, the mapping further comprising: mapping the performance information of the compute kernel sub-portions into performance information of the corresponding operation request.

11. The method of claim 8, wherein each compute kernel sub-portion corresponds to an application program interface call.

12. A computer-implemented method, comprising: in a runtime system configured to run on a parallel-processing computer system that includes first and second processing elements and memory storing one or more programs for execution by the first and the second processing elements, receiving one or more operation requests from an application being executed by the computer system; at runtime: partitioning the one or more operation requests into at least a first compute kernel and a second compute kernel; selecting the first processing element for executing the first compute kernel; selecting the second processing element for executing the second compute kernel; executing the first compute kernel at the first processing element; executing the second compute kernel at the second processing element; determining performance information of the first compute kernel in association with its execution by the first processing element, wherein the performance information is in a first format; and determining performance information of the second compute kernel in association with its execution by the second processing element, wherein the performance information is in a second format; mapping the performance information of the first and second compute kernels from the first and second formats into a unified format, respectively; and reporting and/or storing the unified-format performance information.

13. The method of claim 12, further comprising: displaying the mapped performance information of the first and second compute kernels in the unified format.

14. The method of claim 12, wherein the first and second processing elements have different architectures.

15. A computer-implemented method, comprising: in a runtime system configured to run on a parallel-processing computer system that includes a primary processor and a secondary processor and memory storing one or more programs for execution by the primary and secondary processing elements, receiving one or more operation requests from an application being executed by the computer system; at runtime: partitioning the one or more operation requests into one or more compute kernels; selecting a respective one of the first and second processing elements for executing each of the compute kernels; executing each respective compute kernel at the respective processor to which it was targeted; and employing a facility at the secondary processor to determine I/O performance information associated with the execution of a respective compute kernel by the secondary processor in connection with an operation of moving data from the primary processor to the secondary processor; mapping the I/O performance information associated with the secondary processor back to I/O performance information associated with the primary processor; and reporting and/or storing the I/O performance information associated with the primary processor.

16. The method of claim 15, wherein the primary processor is a CPU and the secondary processor is a GPU.

17. A parallel-processing computer system, comprising: memory; one or more of first processing elements; one or more of second processing elements; a runtime system configured to run on the parallel-processing computer system; and at least one program stored in the memory and executed by at least one of the first and second processing elements, the at least one program including instructions for: receiving one or more operation requests from an application being executed by the computer system; at runtime: partitioning the one or more operation requests into one or more compute kernels; selecting a respective one of the first and second processing elements for executing each of the compute kernels; executing each respective compute kernel at the processing element to which it was targeted; and determining performance information of the compute kernels in association with their execution by the one or more of first and second processing elements; mapping the performance information of the compute kernels into performance information of the one or more operation requests from the application; and reporting and/or storing the performance information of the one or more operation requests.

18. The computer system of claim 17, wherein at least two of the second processing elements execute the one or more compute kernels in parallel.

19. The computer system of claim 17, wherein at least one of the first and second processing elements is one selected from the group comprising of a GPU, a coprocessor, and a CPU.

20. The computer system of claim 19, wherein the coprocessor has multiple cores.

21. The computer system of claim 19, wherein the CPU has multiple cores.

22. The computer system of claim 17, wherein at least two of the first and second processing elements are of different processor architectures.

23. The computer system of claim 17, wherein the performance information includes computation performance information, data transfer performance information, dynamic code generation performance information, dynamic workload distribution performance information.

24. A non-transitory computer readable storage medium storing one or more programs configured to be executed by a parallel-processing computer system that includes one or more first and second processing elements, the one or more programs comprising instructions for: receiving one or more operation requests from an application being executed by the computer system; at runtime: partitioning the one or more operation requests into one or more compute kernels; selecting a respective one of the first and second processing elements for executing each of the compute kernels; executing each respective compute kernel at the processing element to which it was targeted; and determining performance information of the compute kernels in association with their execution by the one or more of first and second processing elements; mapping the performance information of the compute kernels into performance information of the one or more operation requests from the application; and reporting and/or storing the performance information of the one or more operation requests.
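Claims 8 through 10 describe splitting a kernel's measured cost among its sub-portions by a weighting value that reflects each sub-portion's relative execution cost, then rolling those costs up to the operation requests (per claim 11, API calls) they came from. A minimal sketch of that attribution step, with hypothetical names and a plain (request_id, weight) pair as the sub-portion representation:

```python
from collections import defaultdict

def map_kernel_cost(kernel_time, sub_portions):
    """Split one compute kernel's measured time among its
    sub-portions in proportion to their weighting values, then
    aggregate the shares per originating operation request.
    `sub_portions` is a list of (request_id, weight) pairs."""
    total_weight = sum(weight for _, weight in sub_portions)
    per_request = defaultdict(float)
    for request_id, weight in sub_portions:
        # Each sub-portion receives a share of the kernel's cost
        # proportional to its relative weight (claim 9), attributed
        # to the request it corresponds to (claim 10).
        per_request[request_id] += kernel_time * weight / total_weight
    return dict(per_request)
```

For example, a 10 ms kernel fused from two API calls with weights 1 and 3 would attribute 2.5 ms to the first call and 7.5 ms to the second.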