Pipelined L2 cache for memory transfers for a video processor
IPC Classification
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G09G-005/36
G06T-001/60
G06F-012/08
Application Number
US-0267606
(2005-11-04)
Registration Number
US-9111368
(2015-08-18)
Inventors / Address
Karandikar, Ashish
Gadre, Shirish
Sijstermans, Franciscus W.
Su, Zhiqiang Jonathan
Applicant / Address
NVIDIA CORPORATION
Citation Information
Cited by: 0
Cited patents: 148
Abstract
A method for using a pipelined L2 cache to implement memory transfers for a video processor. The method includes accessing a queue of read requests from a video processor. For each of the read requests, a determination is made as to whether there is a cache line hit corresponding to the request. For each cache line miss, a cache line slot is allocated to store a new cache line responsive to the cache line miss. An in-order set of cache lines is output to the video processor responsive to the queue of read requests.
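The abstract describes a read path with per-request hit/miss checks, slot allocation on a miss, and in-order delivery of the resulting lines as a group. A minimal behavioral sketch of that flow (not the patented hardware implementation; the class name, capacity, and LRU policy from claim 7 are illustrative assumptions):

```python
from collections import OrderedDict

CACHE_CAPACITY = 4  # number of cache line slots (hypothetical size)

class PipelinedL2Model:
    """Behavioral model of the read path in the abstract: hit/miss check
    per request, slot allocation on a miss, and in-order delivery of the
    resulting cache lines together as a group."""

    def __init__(self, memory):
        self.memory = memory        # backing frame buffer memory (dict: addr -> line)
        self.lines = OrderedDict()  # addr -> cache line data, kept in LRU order

    def _fetch(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)        # hit: refresh LRU position
        else:
            # Miss: allocate a slot for the new line, evicting the least
            # recently used line if the cache is full (claim 7).
            if len(self.lines) >= CACHE_CAPACITY:
                self.lines.popitem(last=False)   # evict LRU line
            self.lines[addr] = self.memory[addr]  # line becomes valid
        return self.lines[addr]

    def service(self, request_queue):
        # Resolve every queued request (hits and misses alike), then return
        # the lines together, in request order, once all are valid.
        return [self._fetch(addr) for addr in request_queue]
```

For example, `PipelinedL2Model(mem).service([0x40, 0x00, 0x40])` returns the three lines in the order the requests were queued, regardless of which ones missed.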
Representative Claims
1. A method for using a pipelined L2 cache to implement memory transfers for a video processor, the method comprising: accessing a queue of read requests from a video processor, wherein the queue of read requests comprises a first set of read requests from a scalar execution unit within the video processor and a second set of read requests from a vector execution unit within the video processor; for each of the read requests, determining whether there is a cache line hit corresponding to the read request; for a cache line miss, allocating a cache line slot to store a new cache line responsive to the cache line miss; outputting an in-order set of valid cache lines to the video processor responsive to the queue of read requests, wherein the cache lines of the in-order set of valid cache lines are output together as a group in response to the set of cache lines becoming valid, and wherein the group comprises the in-order set of valid cache lines; and arbitrating between the first set of read requests and the second set of read requests by using an arbiter coupled to the pipelined L2 cache.
2. The method of claim 1, wherein the in-order set of cache lines is output to a DMA engine of the video processor.
3. The method of claim 1, further comprising: tracking an in-flight read request issued in response to a miss for a cache line to avoid issuing a redundant read request.
4. The method of claim 1, wherein the group of the in-order set of valid cache lines is output in response to a group of read requests.
5. The method of claim 1, wherein a frame buffer memory comprises a local graphics memory.
6. The method of claim 1, wherein the frame buffer memory comprises a system memory of a computer system.
7. The method of claim 1, wherein cache lines are replaced within the pipelined L2 cache on a least-recently-used basis.
8. The method of claim 1, wherein the in-order set of valid cache lines is output in the order in which corresponding read requests were received by said queue of read requests.
9. A pipelined L2 cache for implementing frame buffer memory transfers for a video processor, comprising: a read request queue for storing read requests from a video processor, wherein the read request queue is configured to receive a first set of read requests from a scalar execution unit within the video processor, and to receive a second set of read requests from a vector execution unit within the video processor; and a cache pipeline for storing cache lines for outputting an in-order set of valid cache lines to the video processor responsive to the read requests, the cache pipeline configured to determine, for each of the read requests, whether there is a cache line hit corresponding to the request, and configured to allocate, for a cache line miss, a cache line slot to store a new cache line responsive to the cache line miss, wherein the cache lines of the in-order set of valid cache lines are output together as a group in response to the set of cache lines becoming valid, and wherein the group comprises the in-order set of valid cache lines, and wherein an arbiter coupled to the pipelined L2 cache is configured to arbitrate between the first set of read requests and the second set of read requests.
10. The pipelined L2 cache of claim 9, wherein an in-order set of cache lines is output to a DMA engine of the video processor.
11. The pipelined L2 cache of claim 9, further comprising: tracking an in-flight read request issued in response to a miss for a cache line to avoid issuing a redundant read request.
12. The pipelined L2 cache of claim 9, wherein the group of the in-order set of valid cache lines is output in response to a group of read requests.
13. The pipelined L2 cache of claim 9, wherein a frame buffer memory comprises a local graphics memory.
14. The pipelined L2 cache of claim 9, wherein the frame buffer memory comprises a system memory of a computer system.
15. The pipelined L2 cache of claim 9, wherein cache lines are replaced within the pipelined L2 cache on a least-recently-used basis.
16. A system for executing video processing operations, comprising: a CPU; a video processor coupled to the CPU, comprising: a memory interface for implementing communication between the video processor and a frame buffer memory; and a pipelined L2 cache within the memory interface to implement memory transfers for the video processor, comprising: a read request queue for storing read requests from a video processor; and a cache pipeline for storing cache lines for outputting an in-order set of valid cache lines to the video processor responsive to the read requests, the cache pipeline configured to determine, for each of the read requests, whether there is a cache line hit corresponding to the request, and configured to allocate, for a cache line miss, a cache line slot to store a new cache line responsive to the cache line miss, wherein the cache lines of the in-order set of valid cache lines are output together as a group in response to the set of cache lines becoming valid, wherein the group comprises the in-order set of valid cache lines, wherein the read request queue is configured to receive a first set of read requests from a scalar execution unit within the video processor, and to receive a second set of read requests from a vector execution unit within the video processor, and wherein an arbiter coupled to the pipelined L2 cache is configured to arbitrate between the first set of read requests and the second set of read requests.
17. The system of claim 16, wherein the in-order set of cache lines is output to a DMA engine of the video processor.
18. The pipelined L2 cache of claim 16, wherein the pipelined L2 cache is a non-stalling pipelined L2 cache.
19. The system of claim 16, wherein the group of the in-order set of valid cache lines is output in response to a group of read requests.
20. The system of claim 16, wherein the frame buffer memory comprises a local graphics memory.
21. The system of claim 16, wherein the frame buffer memory comprises a system memory of a computer system.
22. The system of claim 16, wherein the in-order set of cache lines is output to satisfy an outstanding work package.
23. The system of claim 16, wherein the pipelined L2 cache comprises a plurality of stages related to the size of a plurality of work packages.
Patents cited by this patent (148)
Chiang Paul ; Ng Pius ; Look Paul, Accelerated multimedia processor.
MacInnis, Alexander G.; Tang, Chengfuh Jeffrey; Xie, Xiaodong; Patterson, James T.; Kranawetter, Greg A., Apparatus and method for blending graphics and video surfaces.
Harrell Chandlee B. (Mountain View CA), Apparatus and method for handling data transfer between a general purpose computer and a cooperating processor.
Ahmed, Ashraf; Filippo, Michael A.; Pickett, James K., Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor.
Kuma Takao (Kawasaki JPX) Sakai Kenichi (Kawasaki JPX), Asymmetric vector multiprocessor composed of a vector unit and a plurality of scalar units each having a different archi.
Heng,Pheng Ann; Xie,Yongming; Wong,Tien Tsin; Chui,Yim Pan, Block-based fragment filtration with feasible multi-GPU acceleration for real-time volume rendering on conventional personal computer.
Asghar Saf ; Ireton Mark ; Bartkowiak John G., CPU with DSP function preprocessor having look-up table for translating instruction sequences intended to perform DSP fu.
Colglazier, Daniel J.; Dombrowski, Chris; Genduso, Thomas B., Cache for processing data in a memory controller and a method of use thereof to reduce first transfer latency.
Chen Steve S. (Chippewa Falls) Simmons Frederick J. (Neillsville) Spix George A. (Eau Claire) Wilson Jimmie R. (Eau Claire) Miller Edward C. (Eau Claire) Eckert Roger E. (Eau Claire) Beard Douglas R., Cluster architecture for a highly parallel scalar/vector multiprocessor system.
Tannenbaum David C. (Hurley NY) Schanely Paul M. (Hurley NY) Richardson Leland D. (Kingston NY) Hempel Bruce C. (Tivoli NY), Context management in a graphics system.
Apperley Norman (Chandlers Ford NY GBX) Edwards Roger J. (Woodstock NY) Foster Raymond L. J. (Landford GBX) Haigh David C. (Winchester GBX) Haslam Michael (Winchester GBX) Verey Peter (Winchester GBX, Data management for plasma display.
Oldfield William H. (Cambridgeshire GBX), Data memories and method for storing multiple categories of data in latches dedicated to particular category.
Nagashima, Shigeo; Torii, Shunichi; Omoda, Koichiro; Inagami, Yasuhiro, Data processing system including scalar data processor and vector data processor.
Ellis James P. (Hudson MA) Nangia Era (Marlboro MA) Patwa Nital (Hudson MA) Shah Bhavin (Mountain View CA) Wolrich Gilbert M. (Framingham MA), Digital computer system with cache controller coordinating both vector and scalar operations.
Richardson,John J., Driver framework component for synchronizing interactions between a multi-threaded environment and a driver operating in a less-threaded software environment.
Patti Michael F. (Plainsboro NJ) Fedele Nicola J. (Kingston NJ) Harney Kevin (Brooklyn NY) Simon Allen H. (Belle Mead NJ), Dual mode adder circuitry with overflow detection and substitution enabled for a particular mode.
Hilgendorf Rolf,DEX ; Schwermer Hartmut,DEX ; Soell Werner,DEX, Dynamic conversion between different instruction codes by recombination of instruction elements.
Bowhill William J. (Marlborough MA) Dickson Robert (Arlington MA) Durdan W. H. (Waban MA), Efficient protocol for communicating between asychronous devices.
Sweeney Michael A. (Manassas VA), Fast access priority queue for managing multiple messages at a communications node or managing multiple programs in a mu.
Ebrahim Zahir (Mountain View CA) Normoyle Kevin (San Jose CA) Nishtala Satyanarayana (Cupertino CA) Van Loo William C. (Palo Alto CA), Fast, dual ported cache controller for data processors in a packet switched cache coherent multiprocessor system.
Thayer Larry J. (Ft. Collins CO) Coleman Mark D. (Ft. Collins CO), Graphics system with programmable tile size and multiplexed pixel data and partial pixel addresses based on tile size.
Arimilli Ravi Kumar ; Dodson John Steven ; Lewis Jerry Don, High performance cache directory addressing scheme for variable cache sizes utilizing associativity.
Van Hook Timothy J. ; Cheng Howard H. ; DeLaurier Anthony P. ; Gossett Carroll P. ; Moore Robert J. ; Shepard Stephen J. ; Anderson Harold S. ; Princen John ; Doughty Jeffrey C. ; Pooley Nathan F. ; , High performance low cost video game system with coprocessor providing high speed efficient 3D graphics and digital audio signal processing.
Pfeiffer David M. (Plano TX) Stoner David T. (McKinney TX) Norsworthy John P. (Carrollton TX) Dipert Dwight D. (Richardson TX) Thompson Jay A. (Plano TX) Fontaine James A. (Plano TX) Corry Michael K., High speed image processing system using separate data processor and address generator.
Van Hook Timothy J. ; Moreton Henry P. ; Fuccio Michael L. ; Pryor ; Jr. Robert W. ; Tuffli ; III Charles F., Instruction methods for performing data formatting while moving data between memory and a vector register file.
Singh Gurbir ; Wang Wen-Hann ; Rhodehamel Michael W. ; Bauer John M. ; Sarangdhar Nitin V., Method and apparatus for cache memory replacement line identification.
Hall Michael L. (Marysville WA) Engel Glenn R. (Lake Stevens WA), Method and apparatus for dynamically linking subprogram to main program using tabled procedure name comparison.
Zatz, Harold Robert Feldman; Tannenbaum, David C., Method and apparatus for generation of programmable shader configuration information from state-based control information and program instructions.
Dickson Robert (Arlington MA) Durdan W. Hugh (Waban MA) Uhler George M. (Marlborough MA), Method and apparatus for optimizing inter-processor instruction transfers.
Mills Karl Scott ; Holmes Jeffrey Michael ; Bonnelycke Mark Emil ; Owen Richard Charles Andrew, Method and apparatus for optimizing pixel data write operations to a tile based frame buffer.
Floyd, Michael Stephen; Kahle, James Allan; Le, Hung Qui; Moore, John Anthony; Reick, Kevin Franklin; Silha, Edward John, Method and apparatus for patching problematic instructions in a microprocessor using software interrupts.
Johl, Manraj Singh; Steinmetz, Joseph Harold; Wakeley, Matthew Paul, Method and system increasing performance substituting finite state machine control with hardware-implemented data structure manipulation.
Naegle, Nathaniel David; Sweeney, Jr., William E.; Morse, Wayne A., Method for context switching a graphics accelerator comprising multiple rendering pipelines.
Eichenberger,Alexandre E.; O'Brien,John Kevin Patrick; O'Brien,Kathryn M., Method to efficiently prefetch and batch compiler-assisted software cache accesses.
Shiell Jonathan H. ; Bosshart Patrick W., Microprocessor with circuits, systems, and methods for operating with patch micro-operation codes and patch microinstruction codes stored in multi-purpose memory structure.
Bakalash, Reuven; Leviathan, Yaniv, PC-level computing system with a multi-mode parallel graphics rendering subsystem employing an automatic mode controller, responsive to performance data collected during the run-time of graphics applications.
Gooding David N. (Endicott NY) Shimp Everett M. (Endwell NY), Parallel digital arithmetic device having a variable number of independent arithmetic zones of variable width and locati.
Chiarulli Donald M. (4724 Newcomb Dr. Baton Rouge LA 70808) Rudd W. G. (Dept. of Computer Science Oregon State University Corvallis OR 97331) Buell Duncan A. (1212 Chippenham Dr. Baton Rouge LA 70808, Processor utilizing reconfigurable process segments to accomodate data word length.
Beard Douglas R. (Eleva WI) Phelps Andrew E. (Eau Claire WI) Woodmansee Michael A. (Eau Claire WI) Blewett Richard G. (Altoona WI) Lohman Jeffrey A. (Eau Claire WI) Silbey Alexander A. (Eau Claire WI, Scalar/vector processor.
Shiell Jonathan H. ; Chen Ian, Single chip microprocessor circuits, systems, and methods for self-loading patch micro-operation codes and patch microi.
Moll,Laurent R.; Cheng,Yu Qing; Glaskowsky,Peter N.; Song,Seungyoon Peter, Small and power-efficient cache that can provide data for background DMA devices while the processor is in a low-power state.
Hahn Woo Jong,KRX ; Park Kyong,KRX ; Yoon Suk Han,KRX, Structure of processor having a plurality of main processors and sub processors, and a method for sharing the sub processors.
Guttag Karl M. (Sugar Land TX) Read Christopher J. (Houston TX) Poland Sydney W. (Katy TX) Gove Robert J. (Plano TX) Golston Jeremiah E. (Sugar Land TX), Transfer processor with transparency.
William N. Joy ; Marc Tremblay ; Gary Lauterbach ; Joseph I. Chamdani, Vertically and horizontally threaded processor with multidimensional storage for storing thread data.
Alexander,Gregory W.; Levitan,David S.; Sinharoy,Balaram; Starke,William J., Zero cycle penalty in selecting instructions in prefetch buffer in the event of a miss in the instruction cache.