Arithmetic node including general digital signal processing functions for an adaptive computing machine
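IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-015/173
G06F-015/80
Application number
US-0367188
(2003-02-13)
Registration number
US-8949576
(2015-02-03)
Inventor / Address
Hogenauer, Eugene B.
Applicant / Address
NVIDIA Corporation
Agent / Address
Patterson & Sheridan, LLP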
Citation information
Times cited: 0
Patents cited: 49
Abstract
An apparatus for processing operations in an adaptive computing environment is provided. The adaptive computing environment including at least one processing node. A node includes a memory configured to receive and store data. The data is received from a programmable interconnection network and stored. The node also includes an execution unit configured to perform a signal processing operation. The operation is performed using data retrieved from the memory and an output result is generated. The output result may be used for further computations or sent directly to the programmable interconnection network for transfer to another processing node in the adaptive computing environment.
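The data flow in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class name `Node`, its methods, the `deque` standing in for the programmable interconnection network, and the choice of multiply-accumulate as the signal processing operation are all hypothetical and are not taken from the patent text.

```python
from collections import deque


class Node:
    """Toy model of one processing node (all names here are hypothetical)."""

    def __init__(self, network_out):
        self.memory = []                # data received from the interconnection network
        self.network_out = network_out  # stands in for the path back to the network
        self.accumulator = 0            # result kept locally for further computation

    def receive(self, words):
        """Store data arriving from the network in node memory."""
        self.memory.extend(words)

    def multiply_accumulate(self, coefficients, send=False):
        """Run one signal processing operation on data retrieved from memory."""
        result = sum(c * x for c, x in zip(coefficients, self.memory))
        self.accumulator += result           # usable by later computations
        if send:
            self.network_out.append(result)  # forwarded toward another node
        return result


network = deque()
node = Node(network)
node.receive([1, 2, 3])
kept = node.multiply_accumulate([4, 5, 6])             # 4 + 10 + 18
sent = node.multiply_accumulate([1, 1, 1], send=True)  # 1 + 2 + 3
```

The first call keeps its result inside the node, while the second also places it on the stand-in network queue, mirroring the two uses of the output result that the abstract mentions.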
Representative claims
1. An adaptive computing engine, comprising: a programmable interconnection network including a network root and a set of crosspoint switches, each crosspoint switch coupled to the network root, wherein the network root and the set of crosspoint switches can be programmed to configure the adaptive computing engine for one or more different tasks; and a plurality of nodes that each have a fixed and different architecture that corresponds to a particular algorithmic function, wherein each node is connected to one or more other nodes in the plurality of nodes by at least one crosspoint switch in the set of crosspoint switches, each node including: an execution unit configured to perform the particular algorithmic function associated with the node, the execution unit having an internal structure specific to the particular algorithmic function associated with the node, a memory configured to receive data and to store data, the memory having a size and a memory format, and a node wrapper configured to: receive data and configuration information from the programmable interconnection network, distribute the data and configuration information received from the interconnection network to the execution unit and to the memory, receive data from the execution unit and from the memory, and transmit the data received from the execution unit and from the memory to other nodes in the plurality of nodes and to one or more processing elements external to the adaptive computing engine via the programmable interconnection network.

2. The adaptive computing engine of claim 1, wherein each node in the plurality of nodes further includes: an instruction cache configured to store one or more instructions and one or more operands; and a controller configured to control the operation of the execution unit and the address generator by performing the steps of: retrieving an instruction from the instruction cache, causing the address generator to generate an address for the instruction cache at which one or more operands associated with the instruction are stored, retrieving the one or more operands from the memory based on the address, and transmitting the instruction and the one or more operands to the execution unit for execution.

3. The adaptive computing engine of claim 2, wherein the controller retrieves one or more instructions from sequential addresses in the instruction cache until a branch instruction is retrieved from the instruction cache.

4. The adaptive computing engine of claim 3, wherein the branch instruction is an unconditional branch instruction including a branch address specifying a location in the instruction cache of a subsequent instruction to be executed that is based on a value stored in a computed value latch that is set during execution of a previous instruction.

5. The adaptive computing engine of claim 3, wherein the branch instruction is a conditional branch instruction, the controller continues to retrieve instructions from sequential addresses in the instruction cache according to a binary value stored in a conditional status latch that is set during execution of a previous instruction.

6. The adaptive computing engine of claim 1, configured to execute an instruction loop, the adaptive computing engine comprising: a loop stack configured to receive an address of a start instruction in the instruction cache, an address of an end instruction in the instruction cache, and a maximum number of loop iterations; and a program counter configured to record a current number of loop iterations, wherein the loop stack is popped and a subsequent instruction loop is executed when the maximum number of loop iterations is equal to the current number of loop iterations.

7. The adaptive computing engine of claim 1, further comprising: a system bus interface configured to provide communication with one or more computer systems; a network input interface configured to send and receive real-time data; an external memory interface configured to be coupled to one or more external memory devices; and a network output interface configured to provide communication with one or more other adaptive computing engines.

8. The adaptive computing engine of claim 7, wherein the adaptive computing engine is coupled to one or more other adaptive computing engines that are connected in a sequence, and the last other adaptive computing engine in the sequence includes a feedback connection to the adaptive computing engine.

9. The adaptive computing engine of claim 1, wherein the programmable interconnection network is configured to cause the plurality of nodes to implement a linear algorithmic operation, a non-linear algorithmic operation, a finite state machine operation, a memory operation, a bit manipulation, a fast Fourier transform, an arithmetic logic function, a multiply-accumulate function, or a discrete cosine transformation.

10. A computing system, comprising a first adaptive computing engine and a second adaptive computing engine, wherein the first adaptive computing engine and the second adaptive computing engine each comprise: a programmable interconnection network including a network root and a set of crosspoint switches, each crosspoint switch coupled to the network root, wherein the network root and the set of crosspoint switches can be programmed to configure the adaptive computing engine for one or more different tasks; and a plurality of nodes that each have a fixed and different architecture that corresponds to a particular algorithmic function, wherein each node is connected to one or more other nodes in the plurality of nodes by at least one crosspoint switch in the set of crosspoint switches, each node including: an execution unit configured to perform the particular algorithmic function associated with the node, the execution unit having an internal structure specific to the particular algorithmic function associated with the node, a memory configured to receive data and to store data, the memory having a size and a memory format, and a node wrapper configured to: receive data and configuration information from the programmable interconnection network, distribute the data and configuration information received from the interconnection network to the execution unit and to the memory, receive data from the execution unit and from the memory, and transmit the data received from the execution unit and from the memory to other nodes in the plurality of nodes and to one or more processing elements external to the adaptive computing engine via the programmable interconnection network.

11. The computing system of claim 10, wherein each node in the plurality of nodes further includes: an instruction cache configured to store one or more instructions and one or more operands; and a controller configured to control the operation of the execution unit and the address generator by performing the steps of: retrieving an instruction from the instruction cache, causing the address generator to generate an address for the instruction cache at which one or more operands associated with the instruction are stored, retrieving the one or more operands from the memory based on the address, and transmitting the instruction and the one or more operands to the execution unit for execution.

12. The computing system of claim 11, wherein the controller retrieves one or more instructions from sequential addresses in the instruction cache until a branch instruction is retrieved from the instruction cache.

13. The computing system of claim 12, wherein the branch instruction is an unconditional branch instruction including a branch address specifying a location in the instruction cache of a subsequent instruction to be executed that is based on a value stored in a computed value latch that is set during execution of a previous instruction.

14. The computing system of claim 12, wherein the branch instruction is a conditional branch instruction, the controller continues to retrieve instructions from sequential addresses in the instruction cache according to a binary value stored in a conditional status latch that is set during execution of a previous instruction.

15. The computing system of claim 10, configured to execute an instruction loop, the adaptive computing engine comprising: a loop stack configured to receive an address of a start instruction in the instruction cache, an address of an end instruction in the instruction cache, and a maximum number of loop iterations; and a program counter configured to record a current number of loop iterations, wherein the loop stack is popped and a subsequent instruction loop is executed when the maximum number of loop iterations is equal to the current number of loop iterations.

16. The computing system of claim 10, wherein the first adaptive computing engine and the second adaptive computing engine each further include: a system bus interface configured to provide communication with one or more computer systems; a network input interface configured to send and receive real-time data; an external memory interface configured to be coupled to one or more external memory devices; and a network output interface configured to provide communication with one or more other adaptive computing engines.

17. The computing system of claim 16, wherein the network output interface included in the first adaptive computing engine is coupled to the network input interface of the second adaptive computing engine, and the network output interface included in the second adaptive computing engine is coupled to the network input interface of the first adaptive computing engine comprising a feedback connection from the second adaptive computing engine to the first adaptive computing engine.

18. The computing system of claim 10, wherein the programmable interconnection network is configured to cause the plurality of nodes to implement a linear algorithmic operation, a non-linear algorithmic operation, a finite state machine operation, a memory operation, a bit manipulation, a fast Fourier transform, an arithmetic logic function, a multiply-accumulate function, or a discrete cosine transformation.
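The loop stack recited in claims 6 and 15 holds a start address, an end address, and a maximum iteration count, and is popped once the current iteration count reaches that maximum. A rough Python sketch of that behaviour follows. The function name `run_loops`, the tuple layout of a stack entry, and the instruction mnemonics are assumptions made for illustration; the claims do not specify any encoding, and a non-negative iteration count is assumed.

```python
def run_loops(instruction_cache, loop_stack):
    """Execute each loop on the stack, top entry first (hypothetical model).

    Each stack entry is a (start_address, end_address, max_iterations) tuple,
    with both addresses inclusive and max_iterations assumed non-negative.
    """
    executed = []
    while loop_stack:
        start, end, max_iterations = loop_stack[-1]
        iteration_count = 0  # current number of loop iterations
        while iteration_count < max_iterations:
            # Instructions are fetched from sequential cache addresses.
            for address in range(start, end + 1):
                executed.append(instruction_cache[address])
            iteration_count += 1
        # The count has reached the maximum: pop and move to the next loop.
        loop_stack.pop()
    return executed


cache = ["load", "mac", "store", "shift"]
stack = [(2, 3, 1), (0, 1, 2)]  # the last entry is the top of the stack
trace = run_loops(cache, stack)
```

Here the top entry repeats addresses 0 to 1 twice, then the stack is popped and the remaining entry runs addresses 2 to 3 once.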
Patents cited by this patent (49)
Freeman Ross H. (San Jose CA), Configurable electrical circuit having configurable logic elements and configurable interconnects.
Popli Sanjay (Sunnyvale CA) Pickett Scott (Los Gatos CA) Hawley David (Belmont CA) Moni Shankar (Santa Clara CA) Camarota Rafael C. (San Jose CA), Configuration features in a configurable logic array.
Martin Vorbach DE; Robert Munch DE, Internal bus system for DFPS and units with two- or multi-dimensional programmable cell architectures, for managing large volumes of data with a high interconnection complexity.
Bertolet Allan Robert ; Clinton Kim P.N. ; Gould Scott Whitney ; Keyser III Frank Ray ; Reny Timothy Shawn ; Zittritsch Terrance John, Method and system for layout and schematic generation for heterogeneous arrays.
Cooke Laurence H. ; Phillips Christopher E. ; Wong Dale, Method for compiling high level programming languages into an integrated processor with reconfigurable logic.
Vorbach, Martin; Munch, Robert, Method for deadlock-free configuration of dataflow processors and modules with a two- or multidimensional programmable cell structure (FPGAs, DPGAs, etc.).
Martin Vorbach DE; Robert Munch DE, Method for hierarchical caching of configuration data having dataflow processors and modules having two-or multidimensional programmable cell structure (FPGAs, DPGAs, etc.).
Camarota Rafael C. (San Jose CA) Furtek Frederick C. (Menlo Park CA) Ho Walford W. (Saratoga CA) Browder Edward H. (Saratoga CA), Programmable logic cell and array.
Camarota Rafael C. (San Jose CA) Furtek Frederick C. (Menlo Park CA) Ho Walford W. (Saratoga CA) Browder Edward H. (Saratoga CA), Programmable logic cell and array with bus repeaters.
Trimberger Stephen M. ; Carberry Richard A. ; Johnson Robert Anders ; Wong Jennifer, Programmable logic device including configuration data or user data memory slices.
Davis Donald J. ; Bennett Toby D. ; Harris Jonathan C. ; Miller Ian D. ; Edwards Stephen G., System and method for programming the hardware of field programmable gate arrays (FPGAs) and related reconfiguration resources as if they were software by creating hardware objects.
Martin Vorbach DE; Robert Munch DE, UNIT FOR PROCESSING NUMERIC AND LOGIC OPERATIONS FOR USE IN CENTRAL PROCESSING UNITS (CPUS), MULTIPROCESSOR SYSTEMS, DATA-FLOW PROCESSORS (DSPS), SYSTOLIC PROCESSORS AND FIELD PROGRAMMABLE GATE ARRAY.