IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
UP-0836415
(2007-08-09)
|
등록번호 |
US-7689541
(2010-04-23)
|
발명자
/ 주소 |
|
출원인 / 주소 |
|
대리인 / 주소 |
Patterson & Sheridan, LLP
|
인용정보 |
피인용 횟수 :
7 인용 특허 :
10 |
초록
▼
One embodiment of the present invention sets forth a technique for efficiently performing a radix sort operation on a graphics processing unit (GPU). The radix sort operation is conducted on an input list of data using one or more passes of a series of three processing phases. In each processing pha
One embodiment of the present invention sets forth a technique for efficiently performing a radix sort operation on a graphics processing unit (GPU). The radix sort operation is conducted on an input list of data using one or more passes of a series of three processing phases. In each processing phase, thread groups are each associated with one segment of input data. In the first phase, occurrences of each radix symbol are counted and stored in a list of counters. In the second phase, the list of counters is processed by a parallel prefix sum operation to generate a list of offsets. In the third phase, the list of offsets is used to perform re-ordering on the list of data, according to the current radix symbol. To maintain sort stability, the one or more passes proceed from least significant data to most significant data in the sort key.
대표청구항
▼
I claim: 1. A method for reordering data, the method comprising: generating an ordered list of radix occurrences based on radix symbol and thread group number; performing a prefix sum operation across the radix occurrences to produce a prefix sum output list; modifying the prefix sum output list to
I claim: 1. A method for reordering data, the method comprising: generating an ordered list of radix occurrences based on radix symbol and thread group number; performing a prefix sum operation across the radix occurrences to produce a prefix sum output list; modifying the prefix sum output list to produce an ordered list of offsets; selecting a first sub-key for each thread in a first thread group, wherein each thread corresponds to a different sort key in a first set of sort keys, and each sort key comprises an identical number of sub-keys; determining a value of the first sub-key for each thread in the first thread group; determining a first offset associated with the first sub-key value determined for each thread in the first thread group; and storing data associated with the sort key corresponding to each thread in the first thread group at a buffer location based on the first offset determined for the thread. 2. The method of claim 1, wherein the ordered list of radix occurrences comprises an array organized by radix symbol and thread group number. 3. The method of claim 1, further comprising the steps of: selecting a second sub-key for each thread in the first thread group; determining a value of the second sub-key for each thread in the first thread group; determining a second offset associated with the second sub-key value determined for each thread in the first thread group; and storing the data associated with the sort key corresponding to each thread in the first thread group at a buffer location based on the second offset determined for the thread. 4. The method of claim 1, wherein the steps of determining a value of a sub-key, determine an offset associated with the sub-key value determined for each thread in the first thread group, and storing the data associated with the sort key corresponding to each thread in the first group at a buffer location based on the offset determined for the thread are repeated for each of the sub-keys comprising the sort keys. 5. The method of claim 1, further comprising the steps of: selecting a first sub-key for each thread in a second thread group, wherein each thread corresponds to a different sort key in a second set of sort keys, and each sort key comprises the identical number of sub-keys; determining a value of the first sub-key for each thread in the second thread group; determining a second offset associated with the first sub-key value determined for each thread in the second thread group; and storing data associated with the sort key corresponding to each thread in the second thread group at a buffer location based on the second offset determined for the thread. 6. The method of claim 5, wherein the first thread group and the second thread group are part of a cooperative thread array. 7. The method of claim 6, wherein the cooperative thread array executes within a core of a parallel processing unit. 8. The method of claim 7, wherein the parallel processing unit comprises a graphics processing unit. 9. The method of claim 1, wherein each of the threads in the first thread group executes within a different processing engine within a core of a parallel processing unit. 10. A computer-readable medium including instructions that, when executed by a processing unit, cause the processing unit to reorder data, by performing the steps of: generating an ordered list of radix occurrences based on radix symbol and thread group number; performing a prefix sum operation across the radix occurrences to produce a prefix sum output list; modifying the prefix sum output list to produce an ordered list of offsets; selecting a first sub-key for each thread in a first thread group, wherein each thread corresponds to a different sort key in a first set of sort keys, and each sort key comprises an identical number of sub-keys; determining a value of the first sub-key for each thread in the first thread group; determining a first offset associated with the first sub-key value determined for each thread in the first thread group; and storing data associated with the sort key corresponding to each thread in the first thread group at a buffer location based on the first offset determined for the thread. 11. The computer-readable medium of claim 10, wherein the ordered list of radix occurrences comprises an array organized by radix symbol and thread group number. 12. The computer-readable medium of claim 10, further comprising the steps of: selecting a second sub-key for each thread in the first thread group; determining a value of the second sub-key for each thread in the first thread group; determining a second offset associated with the second sub-key value determined for each thread in the first thread group; and storing the data associated with the sort key corresponding to each thread in the first thread group at a buffer location based on the second offset determined for the thread. 13. The computer-readable medium of claim 10, wherein the steps of determining a value of a sub-key, determine an offset associated with the sub-key value determined for each thread in the first thread group, and storing the data associated with the sort key corresponding to each thread in the first group at a buffer location based on the offset determined for the thread are repeated for each of the sub-keys comprising the sort keys. 14. The computer-readable medium of claim 10, further comprising the steps of: selecting a first sub-key for each thread in a second thread group, wherein each thread corresponds to a different sort key in a second set of sort keys, and each sort key comprises the identical number of sub-keys; determining a value of the first sub-key for each thread in the second thread group; determining a second offset associated with the first sub-key value determined for each thread in the second thread group; and storing data associated with the sort key corresponding to each thread in the second thread group at a buffer location based on the second offset determined for the thread. 15. The computer-readable medium of claim 14, wherein the first thread group and the second thread group are part of a cooperative thread array. 16. The computer-readable medium of claim 15, wherein the cooperative thread array executes within a core of a parallel processing unit. 17. The computer-readable medium of claim 10, wherein each of the threads in the first thread group executes within a different processing engine within a core of a parallel processing unit. 18. A computing device, comprising: a memory; and a parallel processing unit configured to support the execution of a plurality of thread groups such that: an ordered list of radix occurrences is generated based on radix symbol and thread group number, a prefix sum operation is performed across the radix occurrences to produce a prefix sum output list, the prefix sum output list is modified to produce an ordered list of offsets; a first sub-key is selected for each thread in a first thread group, wherein each thread corresponds to a different sort key in a first set of sort keys, and each sort key comprises an identical number of sub-keys, a value of the first sub-key is determined for each thread in the first thread group, a first offset associated with the first sub-key value determined for each thread in the first thread group is determined, and data associated with the sort key corresponding to each thread in the first thread group is stored at a buffer location based on the first offset determined for the thread. 19. The computing device of claim 18, wherein each of the threads in the first thread group executes within a different processing engine within a core of the parallel processing unit. 20. The computing device of claim 19, wherein the parallel processing unit comprises a graphics processing unit.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.