John Griswell Balaram Sinharoy Richard J. Eickemeyer Wael El-Essawy University of Rochester ECE Department Workshop of Complexity-Effective Design In Conjunction with ISCA 2002 Using a Performance Model to Estimate Core Clock-gating Power Savings Introduction Complexity Issues Modeling Methodology Clock Gating Events Selection Criteria Sample Clock Gating Events Performance Model Conclusions and Future Work Agenda Quantifying power savings. Study power swings. Steering complexity vs. power savings decision. Microprocessor power increase raises chip cooling cost and Introduces reliability concerns. 70% of the Power4 chip power (115W) is consumed in the clocking elements [C. Anderson, et al., ISSCC 2001]. Clock gating ... but: Introduction Design complexity, timing, and verification. Vs. power saving. Does a clock gating event pull its weight? Noise, decoupling cap, and area tradeoff. Clock gating effect on cycle-to-cycle power swings. Macro level power consumption. Macros input switching factors. Internal node switching. Circuit level clock gating options. Architectural level clock gating events (conditions). So much data at many design levels. Complexity Issues Performance simulator provides estimation of events occurrence. Detailed analysis of all macros. Decoupling, wire, intrinsic, and parasitic cap. Random input vectors with user-specified SF. CPAM tool provides macros power numbers. Circuit signals and associated macros list pairs. Percentages of macros power saving. Clock gating events are subject to selection criteria. Candidate clock gating events are presented by circuits designers. Power Modeling Methodology Clock Gating Descriptions Macros Gated Power Scripts CPAM Model Performance Model Event Frequencies (pre cycle or over time) Macro Power Numbers Power Modeling Methodology Cont., Noise induced if too many macros are gated/activated in one cycle. Define decoupling cap. values and locations. Cycle-to-cycle power swing. Dynamic behavior of unit powers, chip power, and events patterns. Dynamic averages over specific time periods. Achieved preset power savings goal? Average power for the overall simulation period. Power Simulation Modes Additional decoupling capacitors very close to trouble spots. Helps to perform power saving vs. area and noise tradeoff. Produce short sub-trace (10 cycles) if a power swing threshold is hit. Produce histograms for each unit as well as the whole core/chip. Bit vector and macros key map. Store the state of all events in every cycle for a million cycles. Power Swing Modeling Methodology Position 001 002 ... Event IFU_01 IFU_02 ... Macros ifu_ca,... ifu_ga,... ... Bit Vector Key Performance Model Cycle 000001 000002 000003 ... Power Scripts Macro Power Numbers Bit Vector 10100101001101010010... 00110101100111010010... 01100110101011010010... ... CPAM Model Million cycle trace of the state of all events per cycle Power Swing Modeling Methodology Cont., ## && ) %% !! . $$ $$ $$ #! #! ## "" 3 *2 # # / 0 1 * /1 ,- /0 . +,- ' )* '( Frequent. Considerable power saving. Select coarse granularity. Simple. Limit small number of event per unit. Shallow depth of gated macros. Not too many fragmented events. No new critical paths. Clock-gating Events Selection Criteria ICache & directory IERAT Branch prediction array & logic Microcode ROM Predecode PLA Queue entry/section Not Microcode Inst. L2 not returning data Queue entry/section not written Clock gated structure(s) ICache miss IERAT miss & IFB full Event Clock-gating Events Examples The power model can be ported to different architectures/simulators easily. Modular design, easily modified. Early rough estimation for power savings. Highly parameterized. Evaluate different power savings techniques. Throttling, adaptive queue sizing, ... etc. Cycle accurate . Verified against VHDL. Trace driven (no data values). Varying degrees of speculation. Advantages to using performance model: Performance Simulation Model Very large output file size. Step-power not Ldi/dt. Not a methodology limitation. Initial estimate, can be followed by more analysis once the design is further along. Accuracy of power estimation. Data values not in trace => no switching factors. Drawbacks of Our Current Performance Model © Copyright International Business Machines Corporation 2002. IBM and the IBM logo are registered trademarks or trademarks of the International Business Machines Corporation in the United States and/or other countries. Other company, product and service names may be trademarks or service marks of others. IBM may not offer the products, programs, services or features discussed herein in other countries, and the information may be subject to change without notice. General availability may vary by geography. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Any reliance on these statements is at the relying party’s sole risk and will not create any liability or obligation for IBM. IBM may have patents or pending patent aplications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. Any performance data contained in this document was determined in a controlled environment. Results obtained in other operating environments may vary significantly. Integrating power model into performance model. Integrating power delivery model. Improving power modeling => account for switching activity. Future work: By integrating data from circuit, macro, and architectural levels, power savings projections, design tradeoffs, and cycle-to-cycle power swings estimates are modeled. Conclusions and Future Work