Techniques are provided for characterizing processor designs and estimating power consumption of software programs executing on processors. A power model of a processor may be obtained by performing simulation using one or more training programs to obtain average power consumption during one or more windows of operation, then using the results to select parameters and coefficients for a processor characterization equation that can estimate power consumption while minimizing error.
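The two quantities the abstract relates, the average power measured in each window and the power estimated by the characterization equation, can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the function names (`window_averages`, `estimate_power`), the per-cycle power trace, and the sample numbers are all assumptions made for the example.

```python
def window_averages(power_trace, cycles_per_window):
    """Average a per-cycle power trace over consecutive windows.

    power_trace is a hypothetical list of per-cycle power samples, such as
    a low-level simulation might report. The last window may be shorter
    than cycles_per_window if the trace does not divide evenly.
    """
    averages = []
    for start in range(0, len(power_trace), cycles_per_window):
        window = power_trace[start:start + cycles_per_window]
        averages.append(sum(window) / len(window))
    return averages


def estimate_power(parameter_values, coefficients):
    """Characterization equation: sum of each parameter value times its coefficient."""
    return sum(v * c for v, c in zip(parameter_values, coefficients))


# Illustrative numbers only: five cycles of power samples, two cycles per window.
measured = window_averages([1.0, 3.0, 2.0, 4.0, 5.0], 2)

# Illustrative parameter counts for one window (for example, cache misses and
# loads executed) and made-up coefficients.
estimated = estimate_power((2, 1), (0.5, 0.25))
```

Characterization then amounts to choosing the coefficients so that `estimate_power` tracks the values returned by `window_averages` across the training windows.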
Representative Claims
What is claimed is:
1. A method of characterizing a processor design comprising: identifying a training program; establishing a plurality of timing windows from a low-level simulation of a processor executing the training program, each of the plurality of timing windows corresponding to a particular number of processor cycles; determining an average power consumption for each of the plurality of timing windows; identifying one or more parameters; for each of the timing windows, determining a value for each parameter from a high-level simulation of the training program; for each of the timing windows, determining an estimated power consumption equal to a summation of the value of each parameter multiplied by a coefficient corresponding to that parameter; and selecting the coefficients so that the estimated power consumption approximates the average power consumption for a majority of the timing windows.
2. The method of claim 1, wherein: the training program is a first training program; and the timing windows are a first plurality of timing windows; the method further comprising: identifying a second training program; determining an average power consumption for each of a second plurality of timing windows from a low-level simulation of the processor executing the second training program; for each of the second plurality of timing windows, determining a value for each parameter from a high-level simulation of the second training program; for each of the second plurality of timing windows, determining an estimated power consumption equal to a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter; and selecting the coefficients so that the estimated power consumption approximates the average power consumption for a majority of both the first plurality of timing windows and the second plurality of timing windows.
3. The method of claim 1, wherein the estimated power consumption approximates the average power consumption when the difference between the estimated power consumption and the average power consumption is less than a constant.
4. The method of claim 1, wherein the timing windows are a first plurality of timing windows, the method further comprising: identifying a software program; for each of a second plurality of timing windows, determining a value for each parameter from a high-level simulation of the software program; and for each of the second plurality of timing windows, calculating an estimated power consumption of the software program from a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter.
5. The method of claim 1, wherein the training program represents an application domain.
6. The method of claim 1, wherein: for each of the timing windows, a power estimation error is equal to the difference between the estimated power consumption and the average power consumption; and the estimated power consumption approximates the average power consumption when an error value is minimized, wherein the error value is equal to one selected from: a maximum power estimation error, wherein the maximum power estimation error is equal to largest power estimation error for all timing windows; an average power estimation error, wherein the average power estimation error is equal to average of the power estimation error for each timing window; and a weighted sum of the maximum power estimation error and the average power estimation error.
7. The method of claim 1, wherein the low-level simulation is a gate-level simulation and the high-level simulation is an instruction-set simulation.
8. The method of claim 1, wherein the parameters include at least one selected from: the number of instruction cache misses; the number of data cache misses; the number of untaken branch instructions predicted as untaken branch; the number of taken branch instructions predicted as taken branch; the number of taken branch instructions predicted as untaken branch; the number of untaken branch instructions predicted as taken branch; the number of branch instructions which use a link register; the number of branch instructions which always jump to an absolute address; the number of load instructions executed; the number of store instructions executed; the number of load instructions which cause a data cache miss; the number of store instructions which cause a data cache miss; the number of instructions which move data from general purpose register to off-chip memory; the number of instructions which move data from off-chip memory to general purpose register; the number of instruction cache accesses; the number of data cache accesses; the number of taken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of untaken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache miss; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache hit; the number of pairs of two consecutive store instructions which do access the same cache line; the number of pairs of two consecutive store instructions which do not access the same cache line; and the number of instructions using way-1 and way-2 of the 2-way VLIW processor.
9. The method of claim 1, wherein the summation of the value of each parameter multiplied by the corresponding coefficient provides a linear equation that models the processor.
10. The method of claim 1, wherein: the high-level simulation is obtained from a cycle inaccurate simulator; and the particular number of processor cycles is estimated using a linear equation.
11. The method of claim 1, wherein one or more of the parameters are deleted based on an evaluation of the coefficients selected.
12. The method of claim 1, wherein one or more new parameters are added to the parameters based on an evaluation of a difference between the estimated power consumption and the average power consumption.
13. A device for characterizing a processor design comprising: a characterization tool operable to identify a training program; a timing divider operable to establish a plurality of timing windows from a low-level simulation of a processor executing the training program, each of the plurality of timing windows corresponding to a particular number of processor cycles; a power calculator operable to determine an average power consumption for each of the plurality of timing windows; a parameter extractor operable to identify one or more parameters and further operable to, for each of the timing windows, determine a value for each parameter from a high-level simulation of the training program; and a linear programming module operable to, for each of the timing windows, determine an estimated power consumption equal to a summation of the value of each parameter multiplied by a coefficient corresponding to that parameter, the linear programming module further operable to select the coefficients so that the estimated power consumption approximates the average power consumption for a majority of the timing windows.
14. The device of claim 13, wherein: the training program is a first training program; the timing windows are a first plurality of timing windows; the characterization tool is further operable to identify a second training program; the power calculator is further operable to determine an average power consumption for each of a second plurality of timing windows from a low-level simulation of the processor executing the second training program; the parameter extractor is further operable to, for each of the second plurality of timing windows, determine a value for each parameter from a high-level simulation of the second training program; and the linear programming module is further operable to, for each of the second plurality of timing windows, determine an estimated power consumption equal to a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter, the linear programming module further operable to select the coefficients so that the estimated power consumption approximates the average power consumption for a majority of both the first plurality of timing windows and the second plurality of timing windows.
15. The device of claim 13, wherein the estimated power consumption approximates the average power consumption when the difference between the estimated power consumption and the average power consumption is less than a constant.
16. The device of claim 13, further comprising a host processor operable to identify a software program; wherein: the timing windows are a first plurality of timing windows; the parameter extractor is further operable to, for each of a second plurality of timing windows, determine a value for each parameter from a high-level simulation of the software program; and the host processor is further operable to, for each of the second plurality of timing windows, determine an estimated power consumption of the software program equal to a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter.
17. The device of claim 13, wherein the training program represents an application domain.
18. The device of claim 13, wherein: for each of the timing windows, a power estimation error is equal to the difference between the estimated power consumption and the average power consumption; and the estimated power consumption approximates the average power consumption when an error value is minimized, wherein the error value is equal to one selected from: a maximum power estimation error, wherein the maximum power estimation error is equal to largest power estimation error for all timing windows; an average power estimation error, wherein the average power estimation error is equal to average of the power estimation error for each timing window; and a weighted sum of the maximum power estimation error and the average power estimation error.
19. The device of claim 13, wherein the low-level simulation is a gate-level simulation and the high-level simulation is an instruction-set simulation.
20. The device of claim 13, wherein the parameters include at least one selected from: the number of instruction cache misses; the number of data cache misses; the number of untaken branch instructions predicted as untaken branch; the number of taken branch instructions predicted as taken branch; the number of taken branch instructions predicted as untaken branch; the number of untaken branch instructions predicted as taken branch; the number of branch instructions which use a link register; the number of branch instructions which always jump to an absolute address; the number of load instructions executed; the number of store instructions executed; the number of load instructions which cause a data cache miss; the number of store instructions which cause a data cache miss; the number of instructions which move data from general purpose register to off-chip memory; the number of instructions which move data from off-chip memory to general purpose register; the number of instruction cache accesses; the number of data cache accesses; the number of taken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of untaken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache miss; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache hit; the number of pairs of two consecutive store instructions which do access the same cache line; the number of pairs of two consecutive store instructions which do not access the same cache line; and the number of instructions using way-1 and way-2 of the 2-way VLIW processor.
21. Logic for characterizing a processor design, the logic encoded in media and operable when executed to: identify a training program; establish a plurality of timing windows from a low-level simulation of a processor executing the training program, each of the plurality of timing windows corresponding to a particular number of processor cycles; determine an average power consumption for each of the plurality of timing windows; identify one or more parameters; for each of the timing windows, determine a value for each parameter from a high-level simulation of the training program; for each of the timing windows, determine an estimated power consumption equal to a summation of the value of each parameter multiplied by a coefficient corresponding to that parameter; and select the coefficients so that the estimated power consumption approximates the average power consumption for a majority of the timing windows.
22. The logic of claim 21, wherein: the training program is a first training program; and the timing windows are a first plurality of timing windows; the logic further operable to: identify a second training program; determine an average power consumption for each of a second plurality of timing windows from a low-level simulation of the processor executing the second training program; for each of the second plurality of timing windows, determine a value for each parameter from a high-level simulation of the second training program; for each of the second plurality of timing windows, determine an estimated power consumption equal to a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter; and select the coefficients so that the estimated power consumption approximates the average power consumption for a majority of both the first plurality of timing windows and the second plurality of timing windows.
23. The logic of claim 21, wherein the estimated power consumption approximates the average power consumption when the difference between the estimated power consumption and the average power consumption is less than a constant.
24. The logic of claim 21, wherein the timing windows are a first plurality of timing windows, the logic further operable to: identify a software program; for each of a second plurality of timing windows, determine a value for each parameter from a high-level simulation of the software program; and for each of the second plurality of timing windows, determine an estimated power consumption of the software program equal to a summation of the value of each parameter multiplied by the coefficient corresponding to that parameter.
25. The logic of claim 21, wherein the training program represents an application domain.
26. The logic of claim 21, wherein: for each of the timing windows, a power estimation error is equal to the difference between the estimated power consumption and the average power consumption; and the estimated power consumption approximates the average power consumption when an error value is minimized, wherein the error value is equal to one selected from: a maximum power estimation error, wherein the maximum power estimation error is equal to largest power estimation error for all timing windows; an average power estimation error, wherein the average power estimation error is equal to average of the power estimation error for each timing window; and a weighted sum of the maximum power estimation error and the average power estimation error.
27. The logic of claim 21, wherein the low-level simulation is a gate-level simulation and the high-level simulation is an instruction-set simulation.
28. The logic of claim 21, wherein the parameters include at least one selected from: the number of instruction cache misses; the number of data cache misses; the number of untaken branch instructions predicted as untaken branch; the number of taken branch instructions predicted as taken branch; the number of taken branch instructions predicted as untaken branch; the number of untaken branch instructions predicted as taken branch; the number of branch instructions which use a link register; the number of branch instructions which always jump to an absolute address; the number of load instructions executed; the number of store instructions executed; the number of load instructions which cause a data cache miss; the number of store instructions which cause a data cache miss; the number of instructions which move data from general purpose register to off-chip memory; the number of instructions which move data from off-chip memory to general purpose register; the number of instruction cache accesses; the number of data cache accesses; the number of taken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of untaken branch instructions predicted as taken branch, which is located at the end of a cache line; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache miss; the number of pairs of two consecutive load instructions which access the same cache line, where the first instruction results in a cache hit; the number of pairs of two consecutive store instructions which do access the same cache line; the number of pairs of two consecutive store instructions which do not access the same cache line; and the number of instructions using way-1 and way-2 of the 2-way VLIW processor.
29. A system for characterizing a processor design comprising: means for identifying a training program; means for establishing a plurality of timing windows from a low-level simulation of a processor executing the training program, each of the plurality of timing windows corresponding to a particular number of processor cycles; means for determining an average power consumption for each of the plurality of timing windows; means for identifying one or more parameters; means for determining, for each of the timing windows, a value for each parameter from a high-level simulation of the training program; means for determining, for each of the timing windows, an estimated power consumption equal to a summation of the value of each parameter multiplied by a coefficient corresponding to that parameter; and means for selecting the coefficients so that the estimated power consumption approximates the average power consumption for a majority of the timing windows.
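The claims describe selecting coefficients so that an error value is minimized, where the error value may be the maximum error, the average error, or a weighted sum of the two, and the device claims assign that selection to a linear programming module. The sketch below illustrates the error value and the selection step in Python under several assumptions: it substitutes an exhaustive search over a small grid of candidate coefficients for the linear program, it takes each window's error as an absolute difference, and the names (`error_value`, `select_coefficients`, `weight_max`) and all sample data are hypothetical.

```python
from itertools import product


def error_value(window_values, measured_power, coefficients, weight_max=0.5):
    """Weighted sum of the maximum and average per-window estimation error.

    weight_max=1.0 gives the maximum error alone and weight_max=0.0 gives the
    average error alone. The absolute difference is an assumption of this sketch.
    """
    errors = []
    for values, measured in zip(window_values, measured_power):
        estimated = sum(v * c for v, c in zip(values, coefficients))
        errors.append(abs(estimated - measured))
    max_error = max(errors)
    avg_error = sum(errors) / len(errors)
    return weight_max * max_error + (1.0 - weight_max) * avg_error


def select_coefficients(window_values, measured_power, candidates, weight_max=0.5):
    """Pick the coefficient tuple with the smallest error value.

    Exhaustive grid search stands in for the linear programming step; it is
    only practical for a handful of parameters and candidates.
    """
    n_params = len(window_values[0])
    return min(
        product(candidates, repeat=n_params),
        key=lambda coeffs: error_value(window_values, measured_power, coeffs, weight_max),
    )


# Made-up training data: two parameter counts per window and the average
# power measured for each window.
window_values = [(2, 1), (0, 3), (4, 0)]
measured_power = [1.25, 0.75, 2.0]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

best = select_coefficients(window_values, measured_power, candidates)
```

With a real design, a linear programming solver would replace the grid search so that coefficients are not restricted to a fixed candidate set.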