Dynamic load balancing of instructions for execution by heterogeneous processing engines
IPC Classification
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-009/46
Application number
US-0831873
(2007-07-31)
Registration number
US-8578387
(2013-11-05)
Inventors / Address
Mills, Peter C.
Oberman, Stuart F.
Lindholm, John Erik
Liu, Samuel
Applicant / Address
Nvidia Corporation
Agent / Address
Patterson & Sheridan, L.L.P.
Citation information
Cited by: 2
Cited patents: 21
Abstract
An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
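The assignment policy described in the abstract can be illustrated with a short sketch. This is not the patented implementation; the engine names, instruction-type encoding, and latency values below are all hypothetical, and the sketch only shows the core idea: single-issue instruction types are bound to their required engine, while dual-issue instructions go to whichever engine currently has the smaller latency-weighted instruction count.

```python
def assign(instructions):
    """Assign each (type, latency) instruction to engine "A" or "B".

    Type 1 runs only on engine A, type 3 only on engine B, and
    type 2 (dual-issue) on either; dual-issue work is steered to
    the engine with the smaller latency-weighted instruction count.
    """
    counts = {"A": 0, "B": 0}  # latency-weighted instruction counts
    schedule = []
    for op_type, latency in instructions:
        if op_type == 1:
            engine = "A"          # first single-issue type: engine A only
        elif op_type == 3:
            engine = "B"          # second single-issue type: engine B only
        else:
            # Dual-issue: dynamically pick the less-loaded engine.
            engine = min(counts, key=counts.get)
        counts[engine] += latency  # charge the expected latency
        schedule.append((op_type, engine))
    return schedule
```

With a hypothetical stream `[(1, 4), (2, 2), (3, 4), (2, 2)]`, the first dual-issue instruction lands on the idle engine B, and the second lands back on engine A once B has accumulated more weighted work, which is the balancing behavior the abstract describes.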
Representative Claims
1. A computer-implemented method for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising: computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine; assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count; computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine; computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine; determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine; overriding the target specified by the dual-issue instruction; assigning the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count; receiving a control instruction; extracting a target address from the control instruction; and reading and executing one or more instructions starting at the target address.

2. The computer-implemented method of claim 1, further comprising: determining that a second instruction is assigned to the first processing engine for execution; and computing a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.

3. The computer-implemented method of claim 2, further comprising updating the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.

4. The computer-implemented method of claim 3, further comprising updating the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.

5. The computer-implemented method of claim 1, wherein a first weighted value computed for execution of the dual-issue instruction by the first processing engine does not equal a second weighted value computed for execution of the dual-issue instruction by the second processing engine.

6. A non-transitory computer-readable medium storing instructions for causing a SIMD architecture processor that includes heterogeneous processing engines to dynamically load balance instruction execution by performing the steps of: computing, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with a first processing engine of the heterogeneous processing engines and a second initial weighted instruction count associated with a second processing engine of the heterogeneous processing engines, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine; assigning the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count; computing a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine; computing a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine; determining that the first weighted instruction count associated with the first processing engine is greater than the second weighted instruction count associated with the second processing engine; overriding the target specified by the dual-issue instruction; assigning the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count; receiving a control instruction; extracting a target address from the control instruction; and reading and executing one or more instructions starting at the target address.

7. The non-transitory computer-readable medium of claim 6, further comprising the steps of: determining that a second instruction is assigned to the first processing engine for execution; and computing a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.

8. The non-transitory computer-readable medium of claim 7, further comprising the step of updating the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.

9. The non-transitory computer-readable medium of claim 8, further comprising the step of updating the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.

10. The non-transitory computer-readable medium of claim 6, wherein a first weighted value computed for execution of the dual-issue instruction by the first processing engine does not equal a second weighted value computed for execution of the dual-issue instruction by the second processing engine.

11. A system for dynamically load balancing instruction execution in a single-instruction multiple-data (SIMD) architecture with heterogeneous processing engines, comprising: a first processing engine of the heterogeneous processing engines that is configured to execute dual-issue instructions and instructions of a first type in parallel for multiple threads in a SIMD thread group, wherein only the first processing engine is configured to execute instructions of the first type; a second processing engine of the heterogeneous processing engines that is configured to execute dual-issue instructions and instructions of a second type that is different than the first type in parallel for the multiple threads in the SIMD thread group, wherein only the second processing engine is configured to execute instructions of the second type; a work distribution unit coupled to the first processing engine and the second processing engine and configured to: compute, prior to assigning instructions included in a set of unassigned instructions, a first initial weighted instruction count associated with the first processing engine and a second initial weighted instruction count associated with the second processing engine, based on a set of expected latencies associated with the set of unassigned instructions, wherein a dual-issue instruction that is included in the set of unassigned instructions and that is configured to specify the first processing engine as a target is assigned a weighted value of zero in both the first initial weighted instruction count and the second initial weighted instruction count, wherein only the first processing engine is configured to execute instructions of a first single-issue type included in the set of unassigned instructions, wherein only the second processing engine is configured to execute instructions of a second single-issue type that is different than the instructions of the first single-issue type included in the set of unassigned instructions, and wherein the dual-issue instruction is executable in parallel for multiple threads in a SIMD thread group by the first processing engine and the second processing engine; assign the instructions in the set of unassigned instructions to the first processing engine or the second processing engine based on the first initial weighted instruction count and the second initial weighted instruction count; compute a first weighted instruction count for the instructions assigned to the first processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the first processing engine; compute a second weighted instruction count for the instructions assigned to the second processing engine that is proportional to an execution latency corresponding to instructions that are assigned to the second processing engine; determine that a first weighted instruction count associated with the first processing engine is greater than a second weighted instruction count associated with the second processing engine, override the target specified by the dual-issue instruction, and assign the dual-issue instruction for execution by the second processing engine based on the first weighted instruction count being greater than the second weighted instruction count; and an instruction unit included in the first processing engine that is configured to: receive a control instruction, extract a target address from the control instruction, and read and cause one or more instructions in the set of unassigned instructions to be executed at the target address.

12. The system of claim 11, wherein the work distribution unit is further configured to: determine that a second instruction is assigned to the first processing engine for execution; and compute a weighted value for the second instruction that is proportional to an execution latency that is incurred when the second instruction is executed by the first processing engine.

13. The system of claim 12, wherein the work distribution unit is further configured to update the first weighted instruction count associated with the first processing engine by adding the weighted value for the second instruction to the first weighted instruction count.

14. The system of claim 13, wherein the work distribution unit is further configured to update the first weighted instruction count associated with the first processing engine by subtracting the weighted value for the second instruction from the first weighted instruction count when the second instruction is dispatched for execution.

15. The system of claim 11, wherein a first weighted value computed for execution of the dual-issue instruction by the first processing engine does not equal a second weighted value computed for execution of the dual-issue instruction by the second processing engine.
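Claims 3 and 4 (mirrored in claims 8-9 and 13-14) describe a bookkeeping scheme in which an instruction's latency-proportional weighted value is added to an engine's count when the instruction is assigned and subtracted again when it is dispatched, so the count tracks only outstanding work. A minimal sketch of that scheme, with all names and the latency parameter chosen for illustration rather than taken from the patent:

```python
class WeightedCounter:
    """Latency-weighted instruction count for one processing engine.

    Follows the add-on-assign / subtract-on-dispatch scheme of
    claims 3-4: the count reflects work assigned but not yet
    dispatched for execution.
    """

    def __init__(self):
        self.count = 0

    def on_assign(self, weighted_value):
        # Claim 3: add the instruction's weighted value on assignment.
        self.count += weighted_value

    def on_dispatch(self, weighted_value):
        # Claim 4: subtract the same value when the instruction is
        # dispatched, leaving only still-pending work in the count.
        self.count -= weighted_value
```

A work distribution unit comparing two such counters (one per engine) and steering dual-issue instructions toward the smaller count would realize the balancing decision of claims 1 and 11.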
Patents cited by this patent (21)
Eilert Catherine K. (Wappingers Falls NY) Pierce Bernard R. (Poughkeepsie NY), Apparatus and method for managing a server workload according to client performance goals in a client/server data proces.
Favor John G., Microprocessor including multiple register files mapped to the same logical storage and inhibiting sychronization between the register files responsive to inclusion of an instruction in an instructio.
Lindholm,John E.; Coon,Brett W., Prioritized issuing of operation dedicated execution unit tagged instructions from multiple different type threads performing different set of operations.
Mills,Peter C.; Lindholm,John Erik; Coon,Brett W.; Tarolli,Gary M.; Burgess,John Matthew, Scheduling instructions from multi-thread instruction buffer based on phase boundary qualifying rule for phases of math and data access operations with better caching.
Byers Larry L. (Apple Valley MN) De Subijana Joseba M. (Minneapolis MN) Michaelson Wayne A. (Circle Pines MN), System and method for executing branch instructions wherein branch target addresses are dynamically selectable under pro.