Apparatus and method for an improved performance VLIW processor
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-015/82
G06F-015/76
Application number
US-0730039
(2000-12-05)
Inventor / Address
Mohamed, Moataz A.
Spence, John R.
Applicant / Address
Mindspeed Technologies, Inc.
Agent / Address
Farjami &
Citation information
Times cited: 3
Cited patents: 15
Abstract
In one exemplary embodiment, the disclosed VLIW processor comprises a number of threads where each thread includes a processing unit. For example, there can be two threads, where each of the two threads has its own processing unit. According to this exemplary embodiment, a number of VLIW packets are divided into a number of issue groups. As an example, two VLIW packets are divided into two issue groups each. The first issue group in the first VLIW packet is provided to a first thread for execution in the first thread processing unit during a first clock cycle. Concurrently, the first issue group in the second VLIW packet is provided to a second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the first clock cycle. Moreover, the second issue group in the first VLIW packet is provided to the first thread for execution in the first thread processing unit during a second clock cycle. Concurrently, the second issue group in the second VLIW packet is provided to the second thread for execution in the second thread processing unit during the same clock cycle, i.e. during the second clock cycle. In this manner, various resources of the VLIW processor are efficiently utilized and two VLIW packets are executed during two clock cycles. As such, the processing speed of the VLIW processor is doubled without a significant increase in the power consumed by the VLIW processor.
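The issue order described in the abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patented hardware: the `Thread` class, the `schedule` function, and the string labels for issue groups are hypothetical names introduced here, and each processing unit is modelled as a simple log of what it executed in which clock cycle.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One thread with its own processing unit (modelled as an execution log)."""
    name: str
    executed: list = field(default_factory=list)

    def execute(self, cycle, issue_group):
        # Record which issue group this thread's unit ran in which cycle.
        self.executed.append((cycle, issue_group))


def schedule(packets, threads):
    """Send issue group k of packet i to thread i during clock cycle k.

    Assumption: exactly one packet per thread, as in the two-thread example.
    Returns the number of clock cycles used.
    """
    if len(packets) != len(threads):
        raise ValueError("this sketch assumes one packet per thread")
    n_cycles = max(len(packet) for packet in packets)
    for cycle in range(1, n_cycles + 1):
        # Within one cycle, every thread receives its own issue group.
        for packet, thread in zip(packets, threads):
            if cycle <= len(packet):
                thread.execute(cycle, packet[cycle - 1])
    return n_cycles


# Two VLIW packets, each divided into two issue groups (labels are placeholders).
packet_a = ["A.group1", "A.group2"]
packet_b = ["B.group1", "B.group2"]
t1, t2 = Thread("thread1"), Thread("thread2")
cycles = schedule([packet_a, packet_b], [t1, t2])
print(cycles, t1.executed, t2.executed)
```

In this model both packets complete in two clock cycles, with each thread handling one issue group per cycle, which matches the schedule the abstract walks through.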
Representative claims
The invention claimed is:
1. A processor comprising: i. A first thread and a second thread, said first thread comprising a first processing unit and said second thread comprising a second processing unit; ii. A first instruction packet and a second instruction packet, said first instruction packet comprising at most two issue groups and said second instruction packet comprising at most two issue groups, each of said at most two issue groups of said first instruction packet and each of said at most two issue groups of said second instruction packet comprising at most 64 bits and an internal instruction bus no greater than 64 bits wide for transport to one of said first and second processing units; iii. Each of said first and second threads receiving a respective one of said at most two issue groups of a respective one of said first and second instruction packets; iv. Said first processing unit executing one of said at least two issue groups of said first instruction packet and said second processing unit executing one of said at most two issue groups of said second instruction packet in a single clock cycle; v. Each of said at most two issue groups of each of said first and second instruction packets performing an operation on data fetched from an exclusive thread memory communicating with only one of said first and second threads, a result of said operation being stored back in said exclusive thread memory communicating with said only one of said first and second threads.
2. The processor of claim 1 wherein said each of said first and second instruction packets is 128 bits wide.
3. The processor of claim 1 wherein said first instruction packet comprises two issue groups, wherein a first one of said two issue groups is 64 bits wide and a second one of said two issue groups is 48 bits wide.
4. The processor of claim 1 wherein said first instruction packet comprises two issue groups, wherein a first one of said two issue groups is 48 bits wide and a second one of said two issue groups is 64 bits wide.
5. The processor of claim 1 wherein said each of said first and second instruction packets resides in a respective instruction cache and is addressed by a respective program counter.
6. A method for improving performance of a VLIW processor comprising: dividing a first instruction packet into first and second issue groups, each of said first and second issue groups of said first instruction packet comprising at most 64 bits; dividing a second instruction packet into first and second issue groups, each of said first and second issue groups of said second instruction packet comprising at most 64 bits; providing, through a first internal instruction bus no greater than 64 bits wide, said first issue group of said first instruction packet to a first thread having a first thread processing unit and, through a second internal instruction bus no greater than 64 bits wide, said first issue group of said second instruction packet to a second thread having a second thread processing unit during a first clock cycle; and providing, through said first internal instruction bus, said second issue group of said first instruction packet to said first thread having said first thread processing unit and, through said second internal instruction bus, said second issue group of said second instruction packet to said second thread having said second thread processing unit during a second clock cycle, wherein said first instruction packet is a different instruction packet than said second instruction packet; fetching data from an exclusive thread memory communicating with only one of said first and second threads; performing an operation on said data by one of said first and second issue groups of said first instruction packet and said first and second issue groups of said second instruction packet; storing back a result of said operation in said exclusive thread memory communicating with said only one of said first and second threads.
7. The method of claim 6 wherein each of said first and second instruction packets consists of 128 bits.
8. The method of claim 6 wherein said first issue group of said first instruction packet comprises 64 bits and said second issue group of said first instruction packet comprises 48 bits.
9. The method of claim 6 wherein said first issue group of said first instruction packet comprises 48 bits and said second issue group of said first instruction packet comprises 64 bits.
10. The method of claim 6 wherein said first issue group of said second packet comprises 64 bits and said second issue group of said second packet comprises 48 bits.
11. The method of claim 6 wherein said first issue group of said second packet comprises 48 bits and said second issue group of said second packet comprises 64 bits.
12. A method for improving performance of a VLIW processor comprising: i. Dividing a first instruction packet into first and second issue groups and a second instruction packet into first and second issue groups, each of said first and second issue groups of said first instruction packet and said first and second issue groups of said second instruction packet comprising at most 64 bits; ii. Providing each of said first and second issue groups of said first instruction packet and said first and second issue groups of said second instruction packet, in one of two clock cycles, to a respective thread having a respective processing unit, and an internal instruction bus no greater than 64 bits wide for transport to said respective processing unit; iii. Executing said first and second instruction packets in said two clock cycles, wherein an issue group from each of said first and second instruction packets is executed in one of said two clock cycles; iv. Fetching data from an exclusive thread memory communicating with only one thread; v. Performing an operation on said data by one of said first and second issue groups of said first instruction packet and said first and second issue groups of said second instruction packet; vi. Storing back a result of said operation in said exclusive thread memory communicating with said only one thread.
13. The method of claim 12 wherein said each of said first and second instruction packets is 128 bits wide.
14. The method of claim 12 wherein said first issue group of said first instruction packet is 64 bits wide and said second issue group of said first instruction packet is 48 bits wide.
15. The method of claim 12 wherein said first issue group of said first instruction packet is 48 bits wide and said second issue group of said first instruction packet is 64 bits wide.
16. The method of claim 12 wherein said first issue group of said second instruction packet is 64 bits wide and said second issue group of said second instruction packet is 48 bits wide.
17. The method of claim 12 wherein said first issue group of said second instruction packet is 48 bits wide and said second issue group of said second instruction packet is 64 bits wide.
18. The method of claim 12 wherein said each of said first and second instruction packet resides in a respective instruction cache and is addressed by a respective program counter.
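The widths recited in the claims (a 128-bit packet, issue groups of 64 and 48 bits, a bus no wider than 64 bits) can be illustrated with a short bit-slicing sketch. The claims do not state where the issue groups sit inside the packet or what the remaining 16 bits hold, so the layout below is purely an assumption for illustration: the first issue group occupies the most significant bits, the second follows directly, and any leftover low bits are ignored. `split_packet` is a hypothetical helper, not something named in the patent.

```python
PACKET_BITS = 128  # packet width recited in claims 2, 7 and 13
BUS_BITS = 64      # upper bound on the internal instruction bus width


def split_packet(packet, widths=(64, 48)):
    """Split a 128-bit packet (as an int) into at most two issue groups.

    Assumed layout (not specified by the claims): groups are packed from the
    most significant bit downward; leftover low bits are not interpreted.
    """
    if not 0 <= packet < (1 << PACKET_BITS):
        raise ValueError("packet must fit in 128 bits")
    if len(widths) > 2:
        raise ValueError("at most two issue groups per packet")
    if any(w > BUS_BITS for w in widths):
        raise ValueError("an issue group must fit on the 64-bit bus")
    if sum(widths) > PACKET_BITS:
        raise ValueError("issue groups exceed the packet width")

    groups = []
    remaining = PACKET_BITS
    for width in widths:
        remaining -= width
        # Shift the group down to bit 0, then mask off everything above it.
        groups.append((packet >> remaining) & ((1 << width) - 1))
    return groups


# Example packet: 64-bit group, then 48-bit group, then 16 uninterpreted bits.
example = (0x1111222233334444 << 64) | (0xAAAABBBBCCCC << 16) | 0xFFFF
first, second = split_packet(example, widths=(64, 48))
print(hex(first), hex(second))
```

Passing `widths=(48, 64)` models the alternative ordering in claims 4, 9 and 15 under the same assumed layout.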
Patents cited by this patent (15)
Petit Phillip M. (San Carlos CA), Apparatus and method for multi-threaded program execution in a microcoded data processing system.
Berenbaum, Alan David; Heintze, Nevin; Jeremiassen, Tor E.; Kaxiras, Stefanos, Method and apparatus for identifying splittable packets in a multithreaded VLIW processor.
Gerald G. Pechanek ; Juan Guillermo Revilla ; Edwin F. Barry, Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor.
Keckler Stephen W. (Cambridge MA) Dally William J. (Framingham MA), Multiprocessor coupling system with integrated compile and run time scheduling for parallelism.
Patents citing this patent (3)
Codrescu, Lucian; Padgett, Donald Robert; Plondke, Erich; Simpson, Taylor; Ahmed, Muhammad; Anderson, William C.; Jamil, Sujat, Controlling execution mode of program threads by applying a mask to a control register in a multi-threaded processor.
Ahmed, Muhammad; Plondke, Erich James; Codrescu, Lucian; Anderson, William C., Register files for a digital signal processor operating in an interleaved multi-threaded environment.
Ahmed, Muhammad; Plondke, Erich; Codrescu, Lucian; Anderson, William C., Register files for a digital signal processor operating in an interleaved multi-threaded environment.