IPC Classification Information
Country/Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application Number |
UP-0535871
(2006-09-27)
|
Patent Number |
US-7526634
(2009-07-01)
|
Inventor / Address |
- Duluk, Jr., Jerome F.
- Lew, Stephen D.
- Nickolls, John R.
|
Applicant / Address |
|
Agent / Address |
Townsend and Townsend and Crew LLP
|
Citation Information |
Times cited: 34
Cited patents: 4
|
Abstract
Systems and methods for synchronizing processing work performed by threads, cooperative thread arrays (CTAs), or "sets" of CTAs. A central processing unit can load launch commands for a first set of CTAs and a second set of CTAs in a pushbuffer, and specify a dependency of the second set upon completion of execution of the first set. A parallel or graphics processor (GPU) can autonomously execute the first set of CTAs and delay execution of the second set of CTAs until the first set of CTAs is complete. In some embodiments the GPU may determine that a third set of CTAs is not dependent upon the first set, and may launch the third set of CTAs while the second set of CTAs is delayed. In this manner, the GPU may execute launch commands out of order with respect to the order of the launch commands in the pushbuffer.
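The out-of-order behavior described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented hardware: `LaunchCommand`, `depends_on`, and `schedule` are hypothetical names, each command stands for one whole set of CTAs, and every set launched during one scan of the pushbuffer is assumed to finish before the next scan begins. A command whose dependency is unmet is delayed, while later independent commands in the same pushbuffer still launch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchCommand:
    name: str                         # hypothetical label for one set of CTAs
    depends_on: Optional[str] = None  # set that must finish first, if any


def schedule(pushbuffer: list[LaunchCommand]) -> list[str]:
    """Return the order in which sets of CTAs are launched.

    Simplifying assumption: every set launched during one scan of the
    pushbuffer completes before the next scan starts.
    """
    pending = list(pushbuffer)
    completed: set[str] = set()
    launch_order: list[str] = []
    while pending:
        ready, waiting = [], []
        for cmd in pending:  # scan in pushbuffer order
            if cmd.depends_on is None or cmd.depends_on in completed:
                ready.append(cmd)
            else:
                waiting.append(cmd)  # delayed, but later commands may still launch
        if not ready:
            raise RuntimeError("no launchable command: dependency can never be met")
        launch_order.extend(cmd.name for cmd in ready)
        completed.update(cmd.name for cmd in ready)  # sets finish at end of scan
        pending = waiting
    return launch_order


if __name__ == "__main__":
    buffer = [
        LaunchCommand("first"),
        LaunchCommand("second", depends_on="first"),
        LaunchCommand("third"),
    ]
    print(schedule(buffer))  # ['first', 'third', 'second']
```

Here "third" launches ahead of "second" even though it sits later in the pushbuffer, which is the out-of-order execution the abstract mentions. Because `completed` is only updated after a scan finishes, a command never launches in the same scan as the set it depends on.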
Representative Claims
What is claimed is:
1. A method, comprising: executing a first set of thread arrays in a processor, wherein the first set of thread arrays comprises a first set of thread groups, wherein thread groups from the first set of thread groups execute, in parallel, instructions associated with the first process, and wherein a first set of thread groups is associated with a first reference counter that increments upon completion of execution of each thread group from the first set of thread groups; specifying a second set of thread arrays as dependent on a status of execution of the first set of thread arrays; and delaying execution of the second set of thread arrays in the processor based on the status of execution of the first set of thread arrays, wherein delaying execution of the second set of thread arrays further comprises: counting a number of thread groups that have completed execution in the first set of thread arrays; and delaying until the number of thread groups that have completed execution equals a number of thread groups in the first set of thread arrays.
2. The method of claim 1 wherein the status of execution of the first set of thread arrays includes an indication that each thread array in the set has completed execution.
3. The method of claim 1 wherein delaying execution comprises determining based on a comparison of a launch counter and a completion counter whether the first set of thread arrays has completed execution.
4. The method of claim 1 wherein executing the first set of thread arrays comprises incrementing a launch counter for each launched thread array in the first set of thread arrays.
5. The method of claim 1 wherein executing the first set of thread arrays comprises incrementing a completion counter for each thread array in the first set of thread arrays that has completed execution.
6. The method of claim 1 further comprising, while delaying execution of the second set of thread arrays, executing a third set of thread arrays in the processor.
7. A method, comprising: loading from a central processing unit into a pushbuffer coupled to a processing unit: a first launch command for a first process, wherein the first launch command is to be executed by a first set of thread groups, wherein thread groups from the first set of thread groups operate in parallel to execute instructions associated with the first process, and wherein the first set of thread groups is associated with a reference counter, the reference counter being incremented upon completion of execution of each of the thread groups from the first set of thread groups, a second launch command for a second process, wherein the second launch command is to be executed by a second set of thread groups, and wherein thread groups from the second set of thread groups execute in parallel instructions associated with the second process, and a dependency between the first process and the second process, wherein the dependency associates the first reference counter with the second launch command; executing the first launch command in the processing unit; and delaying execution of the second launch command in the processing unit based on the dependency, wherein execution of the second launch command is delayed until the first reference counter equals a number of thread groups in the first set of thread groups used to execute instructions associated with the first process.
8. The method of claim 7 wherein the dependency between the first process and the second process includes an indication that the first process has completed execution.
9. The method of claim 7 wherein delaying execution comprises determining based on a comparison of a launch counter and a completion counter whether the first process has completed execution.
10. The method of claim 7 wherein the reference counter further comprises a launch counter, and wherein executing the first launch command comprises incrementing the launch counter for each launched thread group in the first set of thread groups in the first process.
11. The method of claim 8 wherein the reference counter further comprises a completion counter, and wherein executing the first process comprises incrementing the completion counter for each thread group in the first set of thread groups in the first process, and wherein delaying execution of the second launch command further comprises delaying the execution of the second launch command until the completion counter equals the launch counter.
12. The method of claim 7 further comprising, while delaying execution of the second command, executing a third process in the processor.
13. A system, comprising: a central processing unit configured to generate: a first launch command for a first process, wherein the first launch command is to be executed by a first set of thread groups, wherein thread groups from the first set of thread groups operate in parallel to execute instructions associated with the first process, and wherein the first set of thread groups is associated with a reference counter, the reference counter being incremented upon completion of execution of each of the thread groups from the first set of thread groups, a second launch command for a second process, including a dependency of the second process upon the first process, wherein the second launch command is to be executed by a second set of thread groups, wherein thread groups from the second set of thread groups execute, in parallel, instructions associated with the second process, and a third launch command for a third process, wherein the third launch command is to be executed by a third set of thread groups, wherein thread groups from the third set of thread groups execute, in parallel, instructions associated with the third process; a pushbuffer coupled to the central processing unit, the pushbuffer configured to sequentially receive the first, second, and third launch commands; and a processing unit configured to: execute the first launch command to complete the first process, delay execution of the second launch command based on the dependency; and execute the third launch command while the second process is delayed.
14. The system of claim 13 wherein executing the third launch command for the third process while the second process is delayed comprises utilizing resources in the processing unit that would not otherwise be utilized.
15. The system of claim 13 wherein executing the third launch command comprises executing instructions out of order in the processing unit with respect to an order of launch commands in the pushbuffer.
16. The system of claim 13 wherein executing the third launch command comprises determining that the third process may execute while the second process is delayed.
17. The system of claim 15 wherein the processing unit includes logic configured to determine that the third process is not dependent upon a result of the first process or the second process.
18. The system of claim 17 wherein the logic is configured to determine that the third process is not dependent upon a result of the first process by analyzing a reference counter identifier assigned to the first process.
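Claims 1, 3 to 5, 10 and 11 gate a dependent launch on counters that track how many thread groups of the first set have been launched and how many have completed. The following is a minimal Python sketch of that gating logic under stated assumptions: `ReferenceCounter`, `may_release`, `gate_history`, and the identifier `"ref0"` are hypothetical names, a single dependent launch command waits on a single counter, and the whole first set is launched before any group completes. It does not describe how the hardware actually implements the counters.

```python
from dataclasses import dataclass


@dataclass
class ReferenceCounter:
    launched: int = 0   # thread groups launched so far (cf. claims 4 and 10)
    completed: int = 0  # thread groups finished so far (cf. claims 1, 5 and 11)

    def on_launch(self) -> None:
        self.launched += 1

    def on_complete(self) -> None:
        if self.completed >= self.launched:
            raise RuntimeError("completion reported for a thread group never launched")
        self.completed += 1

    def may_release(self, groups_in_set: int) -> bool:
        # True once the whole set has been launched and every launched group
        # has completed. Checking launched == groups_in_set first prevents a
        # premature 0 == 0 match before any group has been launched.
        return self.launched == groups_in_set and self.completed == self.launched


def gate_history(groups_in_set: int) -> list[bool]:
    """Record whether the dependent launch may proceed: once before any
    thread group finishes, then after each completion."""
    counters = {"ref0": ReferenceCounter()}  # keyed by a hypothetical counter identifier
    waits_on = "ref0"  # identifier attached to the dependent (second) launch command
    for _ in range(groups_in_set):
        counters[waits_on].on_launch()
    history = [counters[waits_on].may_release(groups_in_set)]
    for _ in range(groups_in_set):
        counters[waits_on].on_complete()
        history.append(counters[waits_on].may_release(groups_in_set))
    return history


if __name__ == "__main__":
    print(gate_history(3))  # [False, False, False, True]
```

The dependent launch stays blocked until the last of the three thread groups completes. Keying counters by an identifier loosely echoes claim 18, where a reference counter identifier assigned to a process is used to decide whether another process depends on it; the claims reproduced here do not specify how such identifiers are stored or compared.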