Apparatus, systems, and methods for providing configurable computational imaging pipeline
IPC Classification
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition): G06F-015/80; G06F-015/167; G06F-009/38
Application No.: US-0082645 (2013-11-18)
Patent No.: US-9146747 (2015-09-29)
Inventors
Moloney, David
Richmond, Richard
Donohoe, David
Barry, Brendan
Brick, Cormac
Vesa, Ovidiu Andrei
Applicant
LINEAR ALGEBRA TECHNOLOGIES LIMITED
Attorney/Agent
Wilmer Cutler Pickering Hale and Dorr LLP
Citation Information
Times cited: 0
Patents cited: 32
Abstract
The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.
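The abstract can be read as a small concurrency model: every processing element owns one memory slice built from RAM tiles with individual read/write ports, reachable over a fast local interconnect by its owner and over a global interconnect by everyone else. The sketch below is an illustrative model of that layout only, not the patented hardware; the class names, tile counts, and interleaving scheme are assumptions made for the example.

```python
# Toy model of the memory subsystem described in the abstract (illustrative
# only): per-element memory slices made of RAM tiles, with local vs. global
# interconnect paths tracked separately.

class RamTile:
    """One RAM tile with its own read and write port."""
    def __init__(self, words=1024):
        self.data = [0] * words

    def read(self, addr):
        return self.data[addr]

    def write(self, addr, value):
        self.data[addr] = value


class MemorySlice:
    """A slice groups several tiles; because each tile has its own ports,
    accesses that land on different tiles need not clash."""
    def __init__(self, n_tiles=4, words_per_tile=1024):
        self.tiles = [RamTile(words_per_tile) for _ in range(n_tiles)]

    def _route(self, addr):
        # Simple interleaving: consecutive words land on consecutive tiles.
        return self.tiles[addr % len(self.tiles)], addr // len(self.tiles)

    def read(self, addr):
        tile, offset = self._route(addr)
        return tile.read(offset)

    def write(self, addr, value):
        tile, offset = self._route(addr)
        tile.write(offset, value)


class Interconnect:
    """Routes an access over the local path (element to its own slice)
    or the global path (element to any other slice)."""
    def __init__(self, slices):
        self.slices = slices
        self.local_accesses = 0
        self.global_accesses = 0

    def access(self, pe_id, slice_id, addr, value=None):
        if pe_id == slice_id:
            self.local_accesses += 1
        else:
            self.global_accesses += 1
        s = self.slices[slice_id]
        if value is None:
            return s.read(addr)
        s.write(addr, value)


slices = [MemorySlice() for _ in range(4)]  # one slice per processing element
bus = Interconnect(slices)
bus.access(0, 0, 5, value=42)    # PE0 writes its own slice: local path
print(bus.access(1, 0, 5))       # PE1 reads PE0's slice: global path; prints 42
```

Keeping a producer's output in the consumer's own slice maximizes local-path traffic, which is the scheduling concern the representative claims return to below.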
Representative Claims
1. An electronic device comprising: a parallel processing device comprising: a plurality of processing elements each configured to execute instructions; a memory subsystem comprising a plurality of memory slices including a first memory slice associated with one of the plurality of processing elements, wherein the first memory slice comprises a plurality of random access memory (RAM) tiles each having individual read and write ports; and an interconnect system configured to couple the plurality of processing elements and the memory subsystem, wherein the interconnect system includes: a local interconnect configured to couple the first memory slice and the one of the plurality of processing elements, and a global interconnect configured to couple the first memory slice and the remaining of the plurality of processing elements; a processor, in communication with the parallel processing device, configured to run a module stored in memory that is configured to: receive a flow graph associated with a data processing process, wherein the flow graph comprises a plurality of nodes and a plurality of edges connecting two or more of the plurality of nodes, wherein each node identifies an operation and each edge identifies a relationship between the connected nodes; and assign a first node of the plurality of nodes to a first processing element of the parallel processing device and a second node of the plurality of nodes to a second processing element of the parallel processing device, thereby parallelizing operations associated with the first node and the second node.
2. The electronic device of claim 1, wherein the flow graph is provided in an extensible markup language (XML) format.
3. The electronic device of claim 1, wherein the module is configured to assign the first node of the plurality of nodes to the first processing element based on a past performance of a memory subsystem in the parallel processing device.
4. The electronic device of claim 3, wherein the memory subsystem of the parallel processing device comprises a counter that is configured to count a number of memory clashes over a predetermined period of time, and the past performance of the memory subsystem comprises the number of memory clashes measured by the counter.
5. The electronic device of claim 1, wherein the module is configured to assign the first node of the plurality of nodes to the first processing element while the parallel processing device is operating at least a portion of the flow graph.
6. The electronic device of claim 1, wherein the module is configured to receive a plurality of flow graphs, and assign all operations associated with the plurality of flow graphs to a single processing element in the parallel processing device.
7. The electronic device of claim 1, wherein the module is configured to stagger memory accesses by the processing elements to reduce memory clashes.
8. The electronic device of claim 1, wherein the electronic device includes a mobile device.
9. The electronic device of claim 1, wherein the flow graph is specified using an application programming interface (API) associated with the parallel processing device.
10. The electronic device of claim 1, wherein the module is configured to provide input image data to the plurality of processing elements by: dividing the input image data into a plurality of strips; and providing one of the plurality of strips of the input image data to one of the plurality of processing elements.
11. The electronic device of claim 10, wherein a number of the plurality of strips of the input image data is the same as a number of the plurality of processing elements.
12. A method comprising: receiving, at a processor in communication with a parallel processing device, a flow graph associated with a data processing process, wherein the flow graph comprises a plurality of nodes and a plurality of edges connecting two or more of the plurality of nodes, wherein each node identifies an operation and each edge identifies a relationship between the connected nodes; and assigning a first node of the plurality of nodes to a first processing element of the parallel processing device and a second node of the plurality of nodes to a second processing element of the parallel processing device, thereby parallelizing operations associated with the first node and the second node, wherein the parallel processing device also comprises: a memory subsystem comprising a plurality of memory slices including a first memory slice associated with the first processing element, wherein the first memory slice comprises a plurality of random access memory (RAM) tiles each having individual read and write ports; and an interconnect system configured to couple the first processing element, the second processing element, and the memory subsystem, wherein the interconnect system includes: a local interconnect configured to couple the first memory slice and the first processing element, and a global interconnect configured to couple the first memory slice and the second processing element.
13. The method of claim 12, wherein the flow graph is provided in an extensible markup language (XML) format.
14. The method of claim 12, wherein assigning the first node of the plurality of nodes to the first processing element of the parallel processing device comprises assigning the first node of the plurality of nodes to the first processing element based on a past performance of a first memory slice in the parallel processing device.
15. The method of claim 14, further comprising counting, at a counter in the memory subsystem, a number of memory clashes in the first memory slice over a predetermined period of time, and the past performance of the first memory slice comprises the number of memory clashes in the first memory slice.
16. The method of claim 12, wherein assigning the first node of the plurality of nodes to the first processing element is performed while the parallel processing device is operating at least a portion of the flow graph.
17. The method of claim 12, further comprising staggering memory accesses by the processing elements to the first memory slice in order to reduce memory clashes.
18. The method of claim 12, wherein the flow graph is specified using an application programming interface (API) associated with the parallel processing device.
19. The method of claim 12, further comprising providing an input image data to the plurality of processing elements by dividing the input image data into a plurality of strips and providing one of the plurality of strips of the input image data to one of the plurality of processing elements.
20. The method of claim 19, wherein a number of the plurality of strips of the input image data is the same as a number of the plurality of processing elements.
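Claims 1, 10, 11, and 12 describe two cooperating ideas: mapping flow-graph nodes (operations) onto processing elements, and splitting input image data into one strip per element. The sketch below illustrates those ideas only; the round-robin placement policy, the toy graph, and the row-wise strip layout are assumptions made for the example, not the method the patent actually claims.

```python
# Illustrative sketch of flow-graph node assignment and image strip division
# (the placement policy and data shapes are assumed, not taken from the patent).

def assign_nodes(nodes, n_elements):
    """Map each node (operation) to a processing-element id, round-robin."""
    return {node: i % n_elements for i, node in enumerate(nodes)}

def split_into_strips(image_rows, n_elements):
    """Divide the rows of an image into n_elements contiguous strips,
    spreading any remainder over the first strips."""
    base, extra = divmod(len(image_rows), n_elements)
    strips, start = [], 0
    for i in range(n_elements):
        size = base + (1 if i < extra else 0)
        strips.append(image_rows[start:start + size])
        start += size
    return strips

# A toy flow graph: a blur node feeds both a gradient and a threshold node.
nodes = ["blur", "gradient", "threshold"]
edges = [("blur", "gradient"), ("blur", "threshold")]

placement = assign_nodes(nodes, n_elements=2)
print(placement)  # {'blur': 0, 'gradient': 1, 'threshold': 0}

# Edges whose endpoints land on different elements would need the global
# interconnect; same-element edges can stay on the local path.
cross = [(a, b) for a, b in edges if placement[a] != placement[b]]
print(cross)      # [('blur', 'gradient')]

rows = [[y] * 8 for y in range(10)]   # 10 rows of dummy pixel data
strips = split_into_strips(rows, 4)   # one strip per element, as in claim 11
print([len(s) for s in strips])       # [3, 3, 2, 2]
```

Claims 3 and 4 suggest a feedback loop on top of this: a clash counter in the memory subsystem reports past contention, and the placement step could prefer elements whose slices clashed least.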
Patents cited by this patent (32)
Comair, Claude; Li, Xin; Abou-Samra, Samir; Champagne, Robert; Fam, Sun Tjen; Ghali, Prasanna; Pan, Jun, 3D transformation matrix compression and decompression.
Seong, Nak hee; Lim, Kyoung mook; Jeong, Seh woong; Park, Jae hong; Im, Hyung jun; Bae, Gun young; Kim, Young duck, Apparatus and method for dispatching very long instruction word having variable length.
Iwata, Yasushi (JP); Asato, Akira (JP), Data processing device to compress and decompress VLIW instructions by selectively storing non-branch NOP instructions.
Pitsianis, Nikos P.; Pechanek, Gerald George; Rodriguez, Ricardo, Efficient complex multiplication and fast fourier transform (FFT) implementation on the ManArray architecture.
Pitsianis, Nikos P.; Pechanek, Gerald G.; Rodriguez, Ricardo E., Efficient complex multiplication and fast fourier transform (FFT) implementation on the manarray architecture.
Coleman, Charles H. (Redwood City, CA); Miller, Sidney D. (Mountain View, CA); Smidth, Peter (Menlo Park, CA), Method and apparatus for image data compression using combined luminance/chrominance coding.
Pechanek, Gerald G.; Revilla, Juan Guillermo; Barry, Edwin F., Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor.
Pechanek, Gerald G.; Revilla, Juan Guillermo; Barry, Edwin F., Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor.
Pechanek, Gerald G.; Revilla, Juan Guillermo; Barry, Edwin Franklin, Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor.
Drabenstott, Thomas L.; Pechanek, Gerald G.; Barry, Edwin F.; Kurak, Jr., Charles W., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Drabenstott, Thomas L.; Pechanek, Gerald G.; Barry, Edwin F.; Kurak, Jr., Charles W., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Drabenstott, Thomas L.; Pechanek, Gerald George; Barry, Edwin Franklin; Kurak, Jr., Charles W., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Drabenstott, Thomas L.; Pechanek, Gerald G.; Barry, Edwin F.; Kurak, Jr., Charles W., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Drabenstott, Thomas L.; Pechanek, Gerald G.; Barry, Edwin F.; Kurak, Jr., Charles W., Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution.
Hall, William E. (Beaverton, OR); Stigers, Dale A. (Hillsboro, OR); Decker, Leslie F. (Portland, OR), Parallel vector processing system for individual and broadcast distribution of operands and control information.
Topham, Nigel Peter, Processor and method for generating and storing compressed instructions in a program memory and decompressed instructions in an instruction cache wherein the decompressed instructions are assigned imaginary addresses derived from information stored in the program memory with the compressed instructions.
Topham, Nigel Peter, Processor and method for generating and storing compressed instructions in a program memory and decompressed instructions in an instruction cache wherein the decompressed instructions are assigned imaginary addresses derived from information stored in the program memory with the compressed instructions.
Booth, Jr., Lawrence A.; Rosenzweig, Joel; Burr, Jeremy, System and method for high-speed communications between an application processor and coprocessor.
Haikonen, Pentti (FI); Juhola, Janne M. (FI); Latva-Rasku, Petri (FI), Video compressing method wherein the direction and location of contours within image blocks are defined using a binary picture of the block.