U sing a P erform ance M odel to U sing a P erform ance M odel to E

advertisement
John Griswell
Balaram Sinharoy
Richard J. Eickemeyer
Wael El-Essawy
University of Rochester
ECE Department
Workshop of Complexity-Effective Design
In Conjunction with ISCA 2002
Using a Performance Model to
Estimate Core Clock-gating
Power Savings
Introduction
Complexity Issues
Modeling Methodology
Clock Gating Events Selection Criteria
Sample Clock Gating Events
Performance Model
Conclusions and Future Work
Agenda
Quantifying power savings.
Study power swings.
Steering complexity vs. power savings decision.
Microprocessor power increase raises chip
cooling cost and Introduces reliability
concerns.
70% of the Power4 chip power (115W) is
consumed in the clocking elements [C.
Anderson, et al., ISSCC 2001].
Clock gating ... but:
Introduction
Design complexity, timing, and verification.
Vs. power saving.
Does a clock gating event pull its weight?
Noise, decoupling cap, and area tradeoff.
Clock gating effect on cycle-to-cycle power
swings.
Macro level power consumption.
Macros input switching factors.
Internal node switching.
Circuit level clock gating options.
Architectural level clock gating events
(conditions).
So much data at many design levels.
Complexity Issues
Performance simulator provides estimation of
events occurrence.
Detailed analysis of all macros.
Decoupling, wire, intrinsic, and parasitic cap.
Random input vectors with user-specified SF.
CPAM tool provides macros power numbers.
Circuit signals and associated macros list pairs.
Percentages of macros power saving.
Clock gating events are subject to selection
criteria.
Candidate clock gating events are presented
by circuits designers.
Power Modeling Methodology
Clock Gating
Descriptions
Macros
Gated
Power
Scripts
CPAM
Model
Performance
Model
Event Frequencies
(pre cycle or over time)
Macro Power
Numbers
Power Modeling Methodology Cont.,
Noise induced if too many macros are
gated/activated in one cycle.
Define decoupling cap. values and locations.
Cycle-to-cycle power swing.
Dynamic behavior of unit powers, chip power,
and events patterns.
Dynamic averages over specific time periods.
Achieved preset power savings goal?
Average power for the overall simulation
period.
Power Simulation Modes
Additional decoupling capacitors very close to
trouble spots.
Helps to perform power saving vs. area and noise
tradeoff.
Produce short sub-trace (10 cycles) if a power
swing threshold is hit.
Produce histograms for each unit as well as the
whole core/chip.
Bit vector and macros key map.
Store the state of all events in every cycle for a
million cycles.
Power Swing Modeling Methodology
Position
001
002
...
Event
IFU_01
IFU_02
...
Macros
ifu_ca,...
ifu_ga,...
...
Bit Vector Key
Performance
Model
Cycle
000001
000002
000003
...
Power
Scripts
Macro Power
Numbers
Bit Vector
10100101001101010010...
00110101100111010010...
01100110101011010010...
...
CPAM
Model
Million cycle trace of the state of all events per cycle
Power Swing Modeling Methodology
Cont.,
##
&&
)
%%
!!
.
$$
$$
$$
#! #!
##
""
3
*2
# #
/
0
1
*
/1
,-
/0
.
+,-
'
)*
'(
Frequent.
Considerable power saving.
Select coarse granularity.
Simple.
Limit small number of event per unit.
Shallow depth of gated macros.
Not too many fragmented events.
No new critical paths.
Clock-gating Events Selection
Criteria
ICache & directory
IERAT
Branch prediction array & logic
Microcode ROM
Predecode PLA
Queue entry/section
Not Microcode Inst.
L2 not returning data
Queue entry/section not
written
Clock gated structure(s)
ICache miss
IERAT miss &
IFB full
Event
Clock-gating Events Examples
The power model can be ported to different
architectures/simulators easily.
Modular design, easily modified.
Early rough estimation for power savings.
Highly parameterized.
Evaluate different power savings techniques.
Throttling, adaptive queue sizing, ... etc.
Cycle accurate .
Verified against VHDL.
Trace driven (no data values).
Varying degrees of speculation.
Advantages to using performance model:
Performance Simulation Model
Very large output file size.
Step-power not Ldi/dt.
Not a methodology limitation.
Initial estimate, can be followed by more analysis
once the design is further along.
Accuracy of power estimation.
Data values not in trace => no switching
factors.
Drawbacks of Our Current
Performance Model
© Copyright International Business Machines Corporation 2002. IBM and the IBM logo are registered trademarks or trademarks of the International Business Machines Corporation in the United States
and/or other countries. Other company, product and service names may be trademarks or service marks of others. IBM may not offer the products, programs, services or features discussed herein in other
countries, and the information may be subject to change without notice. General availability may vary by geography. All statements regarding IBM's future direction and intent are subject to change or
withdrawal without notice, and represent goals and objectives only. Any reliance on these statements is at the relying party’s sole risk and will not create any liability or obligation for IBM. IBM may have patents
or pending patent aplications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. Any performance data contained in this document
was determined in a controlled environment. Results obtained in other operating environments may vary significantly.
Integrating power model into performance model.
Integrating power delivery model.
Improving power modeling => account for
switching activity.
Future work:
By integrating data from circuit, macro, and
architectural levels, power savings projections,
design tradeoffs, and cycle-to-cycle power
swings estimates are modeled.
Conclusions and Future Work
Download