IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0836408
(2007-08-09)
|
등록번호 |
US-8094157
(2012-01-10)
|
발명자
/ 주소 |
|
출원인 / 주소 |
|
대리인 / 주소 |
Patterson & Sheridan, LLP
|
인용정보 |
피인용 횟수 :
22 인용 특허 :
12 |
초록
▼
One embodiment of the present invention sets forth a technique for efficiently performing a radix sort operation on a graphics processing unit (GPU). The radix sort operation is conducted on an input list of data using one or more passes of a series of three processing phases. In each processing pha
One embodiment of the present invention sets forth a technique for efficiently performing a radix sort operation on a graphics processing unit (GPU). The radix sort operation is conducted on an input list of data using one or more passes of a series of three processing phases. In each processing phase, thread groups are each associated with one segment of input data. In the first phase, occurrences of each radix symbol are counted and stored in a list of counters. In the second phase, the list of counters is processed by a parallel prefix sum operation to generate a list of offsets. In the third phase, the list of offsets is used to perform re-ordering on the list of data, according to the current radix symbol. To maintain sort stability, the one or more passes proceed from least significant data to most significant data in the sort key.
대표청구항
▼
1. A method of performing an occurrence count of radices, the method comprising: associating a first thread group with a first set of sort keys, wherein each thread in the first thread group corresponds with a different one of the sort keys, and each sort key comprises an identical number of sub-key
1. A method of performing an occurrence count of radices, the method comprising: associating a first thread group with a first set of sort keys, wherein each thread in the first thread group corresponds with a different one of the sort keys, and each sort key comprises an identical number of sub-keys;selecting a first sub-key for each thread in the first thread group;determining a value for the first sub-key for each thread in the first thread group; andtabulating a first number of occurrences of each radix symbol in a plurality of radix symbols based on the values of the first sub-key determined for the threads in the first thread group, wherein each radix symbol corresponds to a different sub-key value. 2. The method of claim 1, wherein the step of tabulating comprises incrementing a counter corresponding to each radix symbol each time the value of the first sub-key for one of the threads corresponds to the radix symbol. 3. The method of claim 1, further comprising the steps of: selecting a second sub-key for each thread in the first thread group;determining a value for the second sub-key for each thread in the first thread group; andtabulating a second number of occurrences of each radix symbol in the plurality of radix symbols based on the values of the second sub-key determined for the threads in the first thread group. 4. The method of claim 1, wherein the steps of determining a value for a sub-key for each thread in the first thread group and tabulating a number of occurrences of each radix symbol based on the sub-key values determined for the threads in the first thread group are repeated for each of the sub-keys comprising the sort keys. 5. The method of claim 1, further comprising the steps of: associating a second thread group with a second set of sort keys, wherein each thread in the second thread group corresponds with a different one of the sort keys;selecting the first sub-key for each thread in the second thread group;determining a value for the first sub-key for each thread in the second thread group; andtabulating a second number of occurrences of each radix symbol in a plurality of radix symbols based on the values of the first sub-key determined for the threads in the second thread group. 6. The method of claim 5, further comprising the step of performing a prefix sum operation across each radix symbol and the first number of occurrences of the radix symbol and the second number of occurrences of the radix symbol to produce a prefix sum output list. 7. The method of claim 6, further comprising the step of modifying the prefix sum output list to produce a set prefix sum offsets. 8. The method of claim 6, wherein the first thread group and the second thread group are part of a cooperative thread array that executes in a core of a parallel processing unit. 9. The method of claim 8, wherein the parallel processing unit comprises a graphics processing unit. 10. The method of claim 1, wherein each thread in the first thread group executes within a different processing engine within a core of a parallel processing unit. 11. A non-transitory computer-readable medium including instructions that, when executed by a processing unit, cause the processing unit to perform an occurrence count of radices, by performing the steps of: associating a first thread group with a first set of sort keys, wherein each thread in the first thread group corresponds with a different one of the sort keys, and each sort key comprises an identical number of sub-keys;selecting a first sub-key for each thread in the first thread group;determining a value for the first sub-key for each thread in the first thread group; andtabulating a first number of occurrences of each radix symbol in a plurality of radix symbols based on the values of the first sub-key determined for the threads in the first thread group, wherein each radix symbol corresponds to a different sub-key value. 12. The non-transitory computer-readable medium of claim 11, wherein the step of tabulating comprises incrementing a counter corresponding to each radix symbol each time the value of the first sub-key for one of the threads corresponds to the radix symbol. 13. The non-transitory computer-readable medium of claim 11, further comprising the steps of: selecting a second sub-key for each thread in the first thread group;determining a value for the second sub-key for each thread in the first thread group; andtabulating a second number of occurrences of each radix symbol in the plurality of radix symbols based on the values of the second sub-key determined for the threads in the first thread group. 14. The non-transitory computer-readable medium of claim 11, wherein the steps of determining a value for a sub-key for each thread in the first thread group and tabulating a number of occurrences of each radix symbol based on the sub-key values determined for the threads in the first thread group are repeated for each of the sub-keys comprising the sort keys. 15. The non-transitory computer-readable medium of claim 11, further comprising the steps of: associating a second thread group with a second set of sort keys, wherein each thread in the second thread group corresponds with a different one of the sort keys;selecting the first sub-key for each thread in the second thread group;determining a value for the first sub-key for each thread in the second thread group; andtabulating a second number of occurrences of each radix symbol in a plurality of radix symbols based on the values of the first sub-key determined for the threads in the second thread group. 16. The non-transitory computer-readable medium of claim 15, further comprising the step of performing a prefix sum operation across each radix symbol and the first number of occurrences of the radix symbol and the second number of occurrences of the radix symbol to produce a prefix sum output list. 17. The non-transitory computer-readable medium of claim 16, further comprising the step of modifying the prefix sum output list to produce a set prefix sum offsets. 18. The non-transitory computer-readable medium of claim 16, wherein the first thread group and the second thread group are part of a cooperative thread array that executes in a core of a parallel processing unit. 19. The non-transitory computer-readable medium of claim 11, wherein each thread in the first thread group executes within a different processing engine within a core of a parallel processing unit. 20. A computing device, comprising: a memory; anda parallel processing unit configured to support the execution of a plurality of thread groups such that: a first thread group is associated with a first set of sort keys, wherein each thread in the first thread group corresponds with a different one of the sort keys, and each sort key comprises an identical number of sub-keys,a first sub-key is selected for each thread in the first thread group,a value for the first sub-key is determined for each thread in the first thread group, anda first number of occurrences is tabulated of each radix symbol in a plurality of radix symbols based on the values of the first sub-key determined for the threads in the first thread group, wherein each radix symbol corresponds to a different sub-key value.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.