IPC Classification Information
Country / Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application No. |
US-0453436
(2006-06-14)
|
Registration No. |
US-8644643
(2014-02-04)
|
Inventor / Address |
- Jiao, Guofang
- Du, Yun
- Yu, Chun
- Chen, Lingjun
|
Applicant / Address |
|
Agent / Address |
|
Citation Information |
Times cited: 4
Patents cited: 97 |
Abstract
Techniques for performing convolution filtering using hardware normally available in a graphics processor are described. Convolution filtering of an arbitrary H×W grid of pixels is achieved by partitioning the grid into smaller sections, performing computation for each section, and combining the intermediate results for all sections to obtain a final result. In one design, a command to perform convolution filtering on a grid of pixels with a kernel of coefficients is received, e.g., from a graphics application. The grid is partitioned into multiple sections, where each section may be 2×2 or smaller. Multiple instructions are generated for the multiple sections, with each instruction performing convolution computation on at least one pixel in one section. Each instruction may include pixel position information and applicable kernel coefficients. Instructions to combine the intermediate results from the multiple instructions are also generated.
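The partition-and-combine scheme described in the abstract can be illustrated with a short Python sketch. This is not the patented hardware design. The function names (`partition_grid`, `section_partial_sum`, `convolve_at`) are hypothetical, the image is a plain list of lists, pixels outside the image are assumed to be zero, and the kernel is applied without flipping. None of these choices is specified by the record.

```python
def partition_grid(height, width, max_side=2):
    """Split an H x W grid into sections of at most max_side x max_side.

    Returns (row, col, rows, cols) tuples in row-major order.
    """
    sections = []
    for r in range(0, height, max_side):
        for c in range(0, width, max_side):
            rows = min(max_side, height - r)
            cols = min(max_side, width - c)
            sections.append((r, c, rows, cols))
    return sections


def section_partial_sum(image, kernel, top, left, section):
    """Multiply-accumulate over one section (stands in for one instruction)."""
    r0, c0, rows, cols = section
    acc = 0.0
    for dr in range(rows):
        for dc in range(cols):
            y, x = top + r0 + dr, left + c0 + dc
            inside = 0 <= y < len(image) and 0 <= x < len(image[0])
            pixel = image[y][x] if inside else 0.0  # assumed zero border
            acc += pixel * kernel[r0 + dr][c0 + dc]
    return acc


def convolve_at(image, kernel, top, left):
    """Filter the grid whose top-left pixel is (top, left)."""
    sections = partition_grid(len(kernel), len(kernel[0]))
    partials = [section_partial_sum(image, kernel, top, left, s)
                for s in sections]
    return sum(partials)  # combine the intermediate results
```

With a 3×3 kernel, `partition_grid` yields four sections (2×2, 2×1, 1×2, and 1×1), so the final result is the sum of four intermediate results.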
Representative Claims
1. A method comprising: receiving, at a first processor, a command to perform convolution filtering on a grid of pixels; partitioning the grid into multiple sections; generating multiple instructions for the multiple sections, each instruction for performing a convolution computation on at least one pixel in one section; dispatching at least one of the multiple instructions to a second processor, the second processor multiplying the at least one pixel with at least one coefficient received in the at least one instruction, and accumulating at least one result of the multiply to generate an intermediate result; and generating instructions to combine intermediate results from the multiple instructions for the multiple sections.
2. The method of claim 1, further comprising: determining position of each pixel in the grid; and including pixel position information for the at least one pixel in each instruction.
3. The method of claim 1, further comprising: receiving a kernel of coefficients for the grid of pixels; and including at least one coefficient from the kernel in each instruction.
4. The non-transitory computer-readable media of claim 1, and further for storing instructions operable to: receive a kernel of coefficients for the grid of pixels; and include at least one coefficient from the kernel in each instruction.
5. An apparatus comprising: first processing means for receiving an instruction, multiplying a pixel by a coefficient in the instruction and accumulating a result of the multiplication to generate an intermediate result for the received instruction; and second processing means for receiving a command to perform convolution filtering on a grid of pixels; for partitioning the grid into multiple sections; for generating multiple instructions for the multiple sections, each instruction performing convolution computation on at least one pixel in one section; for dispatching each instruction to the first processing means; and for generating instructions to combine intermediate results from the multiple instructions for the multiple sections.
6. The apparatus of claim 5, further comprising: means for determining position of each pixel in the grid; and means for including pixel position information for the at least one pixel in each instruction.
7. The apparatus of claim 5, further comprising: means for receiving a kernel of coefficients for the grid of pixels; and means for including at least one coefficient from the kernel in each instruction.
8. The graphics processor of claim 5, wherein for each dispatched instruction the second processing unit is configured to retrieve a pixel from memory, to derive an interpolated coefficient for the pixel based on coefficients received in the instruction, and to multiply the pixel with the interpolated coefficient to generate the intermediate result for the instruction.
9. A non-transitory computer-readable media storing instructions that configure circuitry to: receive, at a first processor, a command to perform convolution filtering on a grid of pixels; partition the grid into multiple sections; generate multiple instructions for the multiple sections, each instruction performing convolution computation on at least one pixel in one section; dispatch the multiple instructions to a second processor, which is configured to multiply a pixel in a section with a coefficient in one of the multiple instructions, and accumulated a result of the multiplication as an intermediate result; and generate instructions to combine intermediate results from the multiple instructions for the multiple sections.
10. The non-transitory computer-readable media of claim 2, and further for storing instructions operable to: determine position of each pixel in the grid; and include pixel position information for the at least one pixel in each instruction.
11. An graphics processor comprising: a first processing unit configured to receive a set of instructions for convolution filtering of a grid of pixels, to dispatch a plurality of instructions in the set, to receive intermediate results for the dispatched instructions, and to combine the intermediate results to generate a final result for the convolution filtering of the grid of pixels; and a second processing unit configured to receive the instructions dispatched by the shader core, to perform computation on at least one pixel in the grid for each instruction, and to provide an intermediate result for each instruction, wherein the second processing unit is configured to retrieve the at least one pixel from memory, to multiply the at least one pixel with at least one coefficient received in the instruction, and to accumulate at least one result of the multiply to generate the intermediate result for the instruction.
12. The graphics processor of claim 11, wherein the first processing unit is a shader core and the second processing unit is a texture engine.
13. The graphics processor of claim 11, wherein each dispatched instruction covers a 2×2 or smaller section of the grid.
14. The graphics processor of claim 11, wherein for each dispatched instruction the second processing unit is further configured to compute at least one position of the at least one pixel based on a reference position and horizontal and vertical offsets received in the instruction, and to retrieve the at least one pixel from the memory at the at least one position.
15. The graphics processor of claim 11, wherein the convolution filtering is performed with a kernel of coefficients, and wherein the at least one coefficient received in each instruction is closest to the at least one pixel among the coefficients in the kernel.
16. The graphics processor of claim 11, wherein the convolution filtering is performed with a kernel of coefficients, and wherein the coefficients received in each instruction are closest to the pixel among the coefficients in the kernel.
17. The graphics processor of claim 11, wherein for each dispatched instruction the second processing unit is further configured to derive the interpolated coefficient based on four coefficients using bilinear interpolation.
18. The graphics processor of claim 11, wherein for each dispatched instruction the second processing unit is further configured to compute position of the pixel based on a reference position and horizontal and vertical offsets received in the instruction, and to retrieve the pixel from the memory at the computed position.
19. The graphics processor of claim 11, wherein the second processing unit comprises a cache operative to store pixels, and wherein for each dispatched instruction the second processing unit is configured to retrieve the at least one pixel from the cache and to perform a cache fill if the at least one pixel is not located in the cache.
20. The graphics processor of claim 11, wherein the first processing unit comprises a buffer operative to store a kernel of coefficients used for convolution filtering.
21. The graphics processor of claim 20, wherein the buffer is further operative to store horizontal and vertical offsets for each of the pixels in the grid.
22. The graphics processor of claim 11, wherein each dispatched instruction comprises a reference position, up to two horizontal offsets and up to two vertical offsets for up to four pixels, up to four coefficients, and a mask identifying the up to four pixels in the instruction.
23. The graphics processor of claim 11, wherein each dispatched instruction comprises a reference position, a horizontal offset and a vertical offset for a pixel, and four coefficients for the pixel.
24. A method comprising: receiving a set of instructions for convolution filtering of a grid of pixels; dispatching a plurality of instructions in the set; performing computation on at least one pixel in the grid for each dispatched instruction to obtain an intermediate result for the dispatched instruction; and combining intermediate results for the plurality of dispatched instructions to generate a final result, wherein performing computation on the at least one pixel in the grid for each dispatched instruction comprises: retrieving the at least one pixel from memory, multiplying the at least one pixel with at least one coefficient received in the instruction, and accumulating at least one result of the multiply to generate the intermediate result for the instruction.
25. The method of claim 24, wherein the multiplying comprises: deriving an interpolated coefficient for the pixel based on coefficients received in the instruction, and multiplying the pixel with the interpolated coefficient to generate the intermediate result for the instruction.
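As a rough illustration of the instruction contents recited in claims 14, 17, and 22, the Python sketch below models one dispatched instruction and the multiply-accumulate it triggers. Every name here (`ConvInstruction`, `execute`, `bilinear_coefficient`) is hypothetical. The row-major pixel ordering, the bit layout of the mask, and the fractional weights `fx` and `fy` are assumptions made for the example and are not taken from the claims.

```python
from collections import namedtuple

# Hypothetical instruction layout loosely following claim 22: a reference
# position, up to two horizontal and two vertical offsets (up to four
# pixels), up to four coefficients, and a mask selecting active pixels.
ConvInstruction = namedtuple(
    "ConvInstruction",
    ["ref_x", "ref_y", "x_offsets", "y_offsets", "coeffs", "mask"],
)


def execute(instr, image):
    """Multiply-accumulate the masked pixels of one instruction."""
    acc = 0.0
    index = 0  # assumed row-major order over (y_offset, x_offset)
    for dy in instr.y_offsets:
        for dx in instr.x_offsets:
            if instr.mask & (1 << index):  # assumed: bit i enables pixel i
                pixel = image[instr.ref_y + dy][instr.ref_x + dx]
                acc += pixel * instr.coeffs[index]
            index += 1
    return acc  # intermediate result for this instruction


def bilinear_coefficient(c00, c01, c10, c11, fx, fy):
    """Bilinearly interpolate four coefficients (claim 17).

    fx and fy are assumed fractional weights in [0, 1].
    """
    top = c00 * (1 - fx) + c01 * fx
    bottom = c10 * (1 - fx) + c11 * fx
    return top * (1 - fy) + bottom * fy
```

For example, an instruction with offsets `(0, 1)` in both directions and mask `0b0111` accumulates three of the four pixels in a 2×2 section, which is how a partial section at the edge of the grid could be expressed under these assumptions.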