A Quick Thermal Tutorial

advertisement
© 2004, 2005, Kevin Skadron
A Quick Thermal Tutorial
Kevin Skadron
Mircea Stan
Univ. of Virginia
HotSpot group
© 2004, 2005, Kevin Skadron
Cooking-aware computing
Some
chips rated for 100°C+
2
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
- Prelim CMP, SMT, and GPU results
© 2004, 2005, Kevin Skadron
6. Thermal Modeling
- Including some illustrative HotSpot details
7. Sensor issues
8. Some research questions
3
Metrics and Design Objectives
• Power
Design for power delivery
• Average power, instantaneous power, peak power
• Energy
Low-Power Design
Power-Aware/
• Energy (MIPS/W) = heat
Energy-Efficient
2
• Energy-Delay product (MIPS /W)
Design
• Energy-Delay2 product (MIPS3/W) – voltage indep.
© 2004, 2005, Kevin Skadron
• Temperature
(Zyuban, GVLSI’02)
• Correlated with power density over sufficiently
large time periods
Temperature-Aware Design
• Localized T, short time scales
vs.
• Coarse granularities
Power-Aware Design
4
Power Dissipation
Processor Alpha
21364
Clock
1.15 GHz
Rate
Power
110W
(Max)
AMD
Opteron
2.2 GHz
HPIBMPA8700 Power 4
870 MHz 1.7 GHz
Intel
Itanium 2
1.5 GHz
Intel
Xeon
3.2 GHz
MIPS
R14000
600 MHz
86 W
75W
130W
86W
16W
100W
© 2004, 2005, Kevin Skadron
Source: Microprocessor Report
5
ITRS Projections
2001 – was 0.4
Year
Tech node (nm)
Vdd (high perf) (V)
Vdd (low power) (V)
Frequency (high perf) (GHz)
High-perf w/ heatsink
Cost-performance
Hand-held
2003
100
1.2
1.0
3.0
149
80
2.1
2006
2010
70
45
1.1
1.0
0.9
0.7
6.8
15.1
Max power (W)
180
198
98
120
2.4
2.8
© 2004, 2005, Kevin Skadron
ITRS 2004
2013
32
0.9
0.6
23.0
2016
22
0.8
0.5
39.7
198
138
3.0
198
158
3.0
2001 – was 288
• These are targets, doubtful that they are feasible
• Growth in power density means cooling costs
continue to grow
• High-performance designs seem to be shifting
away from clock frequency toward # cores
6
Trends in Power Density
1000
Rocket
Nozzle
© 2004, 2005, Kevin Skadron
Watts/cm 2
Nuclear Reactor
100
Pentium® 4
Pentium® III
Pentium® II
Hot plate
10
Pentium® Pro
Pentium®
i386
i486
1
1.5m
1m
0.7m
0.5m
0.35m
0.25m
0.18m
0.13m
0.1m
0.07m
* “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” –
Fred Pollack, Intel Corp. Micro32 conference key note - 1999.
7
© 2004, 2005, Kevin Skadron
ITRS quotes – thermal challenges
•
For small dies with high pad count, high power
density, or high frequency, “operating
temperature, etc for these devices exceed the
capabilities of current assembly and packaging
technology.”
•
“Thermal envelopes imposed by affordable
packaging discourage very deep pipelining.”
• Intel recently canceled its NetBurst
microarchitecture
– Press reports suggest thermal envelopes were
a factor
8
Leakage Power
• The fraction of leakage power is increasing
exponentially with each generation
• Also exponentially dependent on temperature
Increasing
ratio
across
generations
Static power/ Dynamic Power
70
50
40
30
20
10
373
368
363
358
353
348
343
338
333
328
323
318
313
308
303
0
298
© 2004, 2005, Kevin Skadron
Percentage
60
Temperature(K)
180nm
130nm
100nm
90nm
Source: Sankaranarayanan et al, University of Virginia
80nm
70nm
9
© 2004, 2005, Kevin Skadron
Why we care about thermal issues
Source: Tom’s Hardware Guide
http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
10
Other Costs of High Heat Flux
•
•
•
•
Packaging, cooling costs
Noise (quiet high-speed fans are expensive)
Form factors
Some chips may already be underclocked
due to thermal constraints!
• (especially mobile and sealed systems)
© 2004, 2005, Kevin Skadron
•
Temperature-dependent phenomena
•
•
•
•
Leakage
IR voltage drop (R is T-dep)
Aging (e.g. EM)
Performance (carrier mobility)
11
Packaging cost
© 2004, 2005, Kevin Skadron
From Cray (local power generator and refrigeration)…
Source: Gordon Bell, “A Seymour Cray perspective”
http://www.research.microsoft.com/users/gbell/craytalk/
12
Intel Pentium 4 packaging
© 2004, 2005, Kevin Skadron
•
Simpler, but still…
Source: Intel web site
13
Graphics Cards
© 2004, 2005, Kevin Skadron
•
Nvidia GeForce 5900 card
Source: Tech-Report.com
14
Apple G5 – liquid cooling
•
•
•
•
Don’t know details
In G5 case, liquid is probably for noise
Lots of people in thermal engineering
community think liquid is inevitable,
especially for server rooms
But others say no:
© 2004, 2005, Kevin Skadron
• This introduces a whole new kind of leakage
problem
• Water and electronics don’t mix!
15
Environment
© 2004, 2005, Kevin Skadron
• Equivalent power (with only 30% efficiency) for AC
• CFCs used for refrigeration
16
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions
17
Worst-Case leads to Over-design
• Average case temperature lower than worst-case
• Aggressive clock gating
• Application variations
• Underutilized resources, e.g. FP units during integer code
• Currently 20-40% difference
Reduced target
power density
© 2004, 2005, Kevin Skadron
TDP
Reduced cooling
cost
Source: Gunther et al, ITJ 2001
18
Temporal, Spatial Variations
© 2004, 2005, Kevin Skadron
Temperature variation
of SPEC applu over time
Hot spots increase
cooling costs
 must cool for
hot spot
19
Application Variations
•
•
Wide variation across applications
Architectural and technology trends are making
it worse, e.g. simultaneous multithreading
(SMT)
•
Leakage is an especially severe problem:
exponentially dependent on temperature!
ST
SMT
420
420
Kelvin
Kelvin
© 2004, 2005, Kevin Skadron
410
410
400
400
390
390
380
380
370
370
gzip
gzip
mcf
mcf
swim mgrid
mgrid applu
applu
swim
eon
eon
mesa
mesa
20
Heat vs. Temperature
•
•
Different time, space scales
Heat: no notion of spatial locality
© 2004, 2005, Kevin Skadron
Temperature-aware computing:
Optimize performance subject to a temperature
constraint
21
Thermal Modeling: P vs. T
•
Power metrics are an unacceptable proxy
© 2004, 2005, Kevin Skadron
•
•
•
Chip-wide average won’t capture hot spots
Localized average won’t capture lateral coupling
Different functional units have different power densities
22
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions
23
Thermal consequences
© 2004, 2005, Kevin Skadron
Temperature affects:
• Circuit performance
• Circuit power (leakage)
• IC reliability
• IC and system packaging cost
• Environment
24
Performance and leakage
Temperature affects :
•
Transistor threshold and mobility
•
Subthreshold leakage, gate leakage
•
Ion, Ioff, Igate, delay
•
ITRS: 85°C for high-performance, 110°C for embedded!
© 2004, 2005, Kevin Skadron
Ioff
Ion
NMOS
25
Temperature-aware circuits
• Robustness constraint: sets Ion/Ioff ratio
• Robustness and reliability: Ion/Igate ratio
© 2004, 2005, Kevin Skadron
Idea: keep ratios constant with T: trade leakage for performance!
Ref: “Ghoshal et al. “Refrigeration Technologies…”, ISSCC 2000
26
Garrett et al. “T3…”, ISCAS 2001
Reliability
The Arrhenius Equation: MTF=A*exp(Ea/K*T)
MTF: mean time to failure at T
A: empirical constant
Ea: activation energy
K: Boltzmann’s constant
T: absolute temperature
© 2004, 2005, Kevin Skadron
Failure mechanisms:
Die metalization (Corrosion, Electromigration, Contact spiking)
Oxide (charge trapping, gate oxide breakdown, hot electrons)
Device (ionic contamination, second breakdown, surface-charge)
Die attach (fracture, thermal breakdown, adhesion fatigue)
Interconnect (wirebond failure, flip-chip joint failure)
Package (cracking, whisker and dendritic growth, lid seal failure)
Most of the above increase with T (Arrhenius)
Notable exception: hot electrons are worse at low temperatures
More on this later
27
Heat mechanisms
•
Conduction is the main mechanism in a
single chip
• Conduction is proportional to the temperature
difference (Thot – Tcold) and surface area
© 2004, 2005, Kevin Skadron
•
Convection is the main mechanism in racks,
data centers, etc.
28
Carnot Efficiency
• Note that in all cases, heat transfer is
proportional to ΔT
• This is also one of the reasons energy
“harvesting” in computers is probably not
cost-effective
• ΔT w.r.t. ambient is << 100°
© 2004, 2005, Kevin Skadron
• For example, with a 25W processor,
thermoelectric effect yields only ~50mW
• Solbrekken et al, ITHERM’04
• This is also why Peltier coolers are not
energy efficient
• 10% eff., vs. 30% for a refrigerator
29
Surface-to-surface contacts
© 2004, 2005, Kevin Skadron
• Not negligible, heat crowding
• Thermal greases/epoxy (can “pump-out”)
• Phase Change Films (undergo a transition from solid to semisolid with the application of heat)
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
30
Thermal resistance
© 2004, 2005, Kevin Skadron
•
Θ = rt / A = t / kA
31
Thermal capacitance
© 2004, 2005, Kevin Skadron
• Cth = V· Cp· 
(Aluminum) = 2,710 kg/m3
Cp(Aluminum) = 875 J/(kg-°C)
V = t· A = 0.000025 m3
Cbulk = V· Cp·  = 59.28 J/°C
32
Simplistic steady-state model
All thermal transfer: R = k/A
T_hot
Power density matters!
Ohm’s law for thermals
(steady-state)
V = I · R -> T = P · R
T_amb
T_hot = P · Rth + T_amb
© 2004, 2005, Kevin Skadron
Ways to reduce T_hot:
-
reduce P (power-aware)
-
reduce Rth (packaging, spread heat)
-
reduce T_amb (Alaska?)
-
maybe also take advantage of
transients (Cth)
33
Simplistic dynamic thermal model
© 2004, 2005, Kevin Skadron
Electrical-thermal duality
V  temp (T)
I  power (P)
R  thermal resistance (Rth)
C  thermal capacitance (Cth)
RC  time constant
T_hot
T_amb
KCL
differential eq.
I = C · dV/dt + V/R
difference eq.
V = I/C · t + V/RC · t
thermal domain
T = P/C · t + T/RC · t
(T = T_hot – T_amb)
One can compute stepwise changes in temperature
for any granularity at which one can get P, T, R, C
34
Reliability as f(T)
•
© 2004, 2005, Kevin Skadron
•
•
•
•
Reliability criteria (e.g., DTM thresholds) are typically
based on worst-case assumptions
But actual behavior is often not worst case
So aging occurs more slowly
This means the DTM design is over-engineered!
We can exploit
this, e.g. for DTM
or frequency
Spend
Bank
35
EM Model
t failure

0
1
e
kT (t )
Ea

kT ( t )
© 2004, 2005, Kevin Skadron
Life Consumption
Rate:
dt  th , th  const
1
R (t ) 
e
kT (t )

Ea
kT ( t )
Apply in a “lumped” fashion at the granularity of
microarchitecture units, just like RAMP [Srinivasan et al.]
36
© 2004, 2005, Kevin Skadron
Average slowdown
Reliability-Aware DTM
0.16
0.12
0.08
0.04
0.00
_C
e
s
a
B
re
u
ig
f
on
_C
h
g
i
H
_
n
tio
c
e
v
on
..
.
s
e
R
_
k
ic
h
T
e
at
M
_
d
a
re
p
S
l
a
i
r
DTM_controller
DTM_reliability
37
Temperature limits
•
© 2004, 2005, Kevin Skadron
•
Temperature limits for circuit performance
can be measured
Temperature limits for reliability are at
best an estimate
• 150° is a reasonable rule of thumb for when
immediate damage might occur
• Chips are typically specified at lower
temperatures, 100-125° for both performance
and long-term reliability
• Rule of thumb that every 10° halves circuit
lifetime is false
–Originates from a mil-spec that is
debunked
38
Thermal issues summary
• Temperature affects
performance, power, and reliability
• Architecture-level: conduction only
© 2004, 2005, Kevin Skadron
• Very crude approximation of convection as equivalent
resistance
• Convection: too complicated
– Need CFD!
• Radiation: can be ignored
•
•
•
•
Use compact models for package
Power density is key
Temporal, spatial variation are key
Hot spots drive thermal design
39
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions
40
© 2004, 2005, Kevin Skadron
Temperature-Aware Design
•
Worst-case design is wasteful
•
Power management is not sufficient for
chip-level thermal management
• Must target blocks with high power density
• When they are hot
• Spreading heat helps
– Even if energy not affected
– Even if average temperature goes up
• This also helps reduce leakage
41
Role of Architecture?
Dynamic thermal management (DTM)
• Automatic hardware response when temp. exceeds cooling
• Cut power density at runtime, on demand
• Trade reduced costs for occasional performance loss
© 2004, 2005, Kevin Skadron
• Architecture natural granularity for thermal
management
• Activity, temperature correlated within arch. units
• DTM response can target hottest unit: permits
fine-tuned response compared to OS or package
• Modern architectures offer rich opportunities for
remapping computation
– e.g., CMPs/SoCs, graphics processors, tiled
architectures
– e.g., register file
42
Dynamic Thermal Management
(DTM)
© 2004, 2005, Kevin Skadron
Temperature
(Brooks
Martonosi,
HPCA
2001)
Designedand
for Cooling
Capacity w/out
DTM
Designed for Cooling
Capacity w/ DTM
System
Cost Savings
DTM Trigger
Level
DTM Disabled
DTM/Response Engaged
Time
Source: David Brooks 2002
43
DTM
• Worst case design for the external cooling
solution is wasteful
• Yet safe temperatures must be maintained when
worst case happens
© 2004, 2005, Kevin Skadron
• Thermal monitors allow
• Tradeoff between cost and performance
• Cheaper package
– More triggers,
less performance
• Expensive package
– No triggers
full performance
44
© 2004, 2005, Kevin Skadron
Existing DTM Implementations
•
Intel Pentium 4: Global clock gating with
shut-down fail-safe
•
Intel Pentium M: Dynamic voltage scaling
•
Transmeta Crusoe: Dynamic voltage scaling
•
IBM Power 5: Probably fetch gating
•
ACPI: OS configurable combination of passive &
active cooling
•
These solutions sacrifice time (slower or stalled
execution) to reduce power density
Better: a solution in “space”
•
•
Tradeoff between exacerbating leakage (more idle logic) or
reducing leakage (lower temperatures)
45
Previously Published DTM
Techniques
• DVS (Transmeta Crusoe)
• Temperature-tracking frequency scaling
(Skadron et al. 2003)
• Adjusts frequency to account for T-dep. of switching speed
• Clock gating (Pentium 4, Gunther et al. 2001)
• Fetch gating (FG)
(Brooks and Martonosi 2001)
© 2004, 2005, Kevin Skadron
• Feedback-controlled fetch gating (Skadron et al. 2002)
•
•
•
•
Fetch/decode throttling (Motorola G3, Sanchez 1997)
Speculation control (Manne et al. 1998)
Dual pipeline (Lim et al. 2002)
Low-power caches (Huang et al. 2000)
• DEETM framework used a hierarchy of response
• Migrating computation (Skadron et al. 2003,
Heo et al. 2003, Powell et al. 2004)
• Hybrid DTM (Skadron 2004)
46
© 2004, 2005, Kevin Skadron
Alternative: Migrating Computation
This is only a
simplistic
illustrative example
47
Space vs. Time
•
Moving the hotspot, rather than throttling it,
reduces performance overhead by almost 60%
Space
Time
© 2004, 2005, Kevin Skadron
Slowdown Factor
1.40
1.30
1.359
1.270
1.231
1.20
1.112
1.10
1.00
DVS
FG
Hyb
MC
The greater the replication and spread,
the greater the opportunities
48
Future DTM considerations
•
Trend in architecture:
increasing replication
•
•
© 2004, 2005, Kevin Skadron
•
Chip multiprocessors
– Independent CPUs
on a single die
– Ex: IBM Power5
Clustered
organizations
– Semi-autonomous
exec. units in a
single CPU
– Ex: Alpha 21264
Tiled organizations
– Semi-coupled
CPUs
– Ex: RAW, TRIPS
•
Levels of architectural
DTM
•
•
•
Subunit (single queue
entry, register, etc.)
– Lots of replication,
low migration cost
not spread out
Structure (queue,
register file, ALU, etc.)
– Yuck: hard to avoid
throttling
Cluster/tile/core
– Lots of replication,
good spread, but
high migration
cost, and local
hotspots remain
The greater the replication and spread,
the greater the opportunities
49
Previously Published DTM
• DVS (Transmeta Crusoe)
Techniques
• Temperature-tracking frequency scaling
(Skadron et al. 2003)
• Adjusts frequency to account for T-dep. of switching speed
• Clock gating (Pentium 4, Gunther et al. 2001)
• Fetch gating (FG)
(Brooks and Martonosi 2001)
• Feedback-controlled fetch gating (Skadron et al. 2002)
•
•
•
•
Fetch/decode throttling (Motorola G3, Sanchez 1997)
Speculation control (Manne et al. 1998)
Dual pipeline (Lim et al. 2002)
Low-power caches (Huang et al. 2000)
© 2004, 2005, Kevin Skadron
• DEETM framework used a hierarchy of response
• Migrating computation (Skadron et al. 2003,
Heo et al. 2003, Powell et al. 2004)
• Hybrid DTM (Skadron 2004)
50
SMT vs. CMP
•
Recent work with IBM
© 2004, 2005, Kevin Skadron
Δ  25°
Δ  15°
51
SMT vs. CMP, cont.
•
•
CMP is more energy efficient for CPU-bound
workloads
SMT can be more energy efficient for
memory-bound workloads!
• For same # of threads and equal chip size, CMP
has less L2 cache
© 2004, 2005, Kevin Skadron
•
Localized or hybrid hot-spot management,
e.g. intelligent register-file allocation and
throttling, can outperform DVS
52
DTM for GPU, cont.
Texture accesses
Fragments generated
© 2004, 2005, Kevin Skadron
Vertices transformed
(a)
(b)
(c)
(d)
53
© 2004, 2005, Kevin Skadron
DTM for GPU
54
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling and HotSpot
7. Sensor issues
8. Some research questions
55
Thermal modeling
• Want a fine-grained, dynamic model of
temperature
•
•
•
•
At a granularity architects can reason about
That accounts for adjacency and package
That does not require detailed designs
That is fast enough for practical use
• HotSpot - a compact model based on
thermal R, C
© 2004, 2005, Kevin Skadron
• Parameterized to automatically derive a model
based on various
– Architectures
– Power models
– Floorplans
– Thermal Packages
56
Dynamic compact thermal model
© 2004, 2005, Kevin Skadron
Electrical-thermal duality
V temp (T)
I power (P)
R thermal resistance (Rth)
C thermal capacitance (Cth)
RC time constant (Rth Cth)
T_hot
T_amb
Kirchoff Current Law
differential eq. I = C · dV/dt + V/R
thermal domain P = Cth · dT/dt + T/Rth
where T = T_hot – T_amb
At higher granularities of P, Rth, Cth
P, T are vectors and Rth, Cth are circuit matrices
57
Example System
Heat sink
IC Package
© 2004, 2005, Kevin Skadron
Heat spreader
PCB
Die
Pin
Interface
material
58
Modeling the package
• Thermal management allows for packaging
alternatives/shortcuts/interactions
• HotSpot needs a model of packaging
• Basic thermal model:
© 2004, 2005, Kevin Skadron
•
•
•
•
Heat spreader
Heatsink
Interface materials (e.g. epoxy)
Fan/Active cooler
• Thermal resistance due to convection
• Constriction and bulk resistance for fins
• Spreading constriction and bulk resistance for
heatsink base and heat spreader
• Thermal resistance for interface materials
• Thermal capacitance heat spreader and heatsink
59
Equivalent vertical network
• Diagram is simplified – peripheral nodes
Chip
Peripheral spreader nodes
Interface
Spreader
© 2004, 2005, Kevin Skadron
Interface + Sink
Convection
60
Vertical network parameters
• Resistances
• Determined by the corresponding areas and
their cross sectional thickness
• R = resistivity x thickness / Area
• Capacitances
• C = specific heat x thickness x Area
© 2004, 2005, Kevin Skadron
• Peripheral node areas
North
Spreader
West Chip East
South
61
© 2004, 2005, Kevin Skadron
Lateral resistances
• Determined by the floorplan and the length
of shared edges between adjacent blocks
62
Lateral resistances – contd...
© 2004, 2005, Kevin Skadron
• Lengths used for silicon
• Lengths used in the spreader
63
© 2004, 2005, Kevin Skadron
Our model (lateral and vertical)
Interface material
(not shown)
64
Temperature equations
• Fundamental RC differential equation
• P = C dT/dt + T / R
• Steady state
• dT/dt = 0
• P=T/R
© 2004, 2005, Kevin Skadron
• When R and C are network matrices
• Steady state – T = R x P
• Modified transient equation
– dT/dt + (RC)-1 x T = C-1 x P
• HotSpot software mainly solves these
two equations
65
HotSpot
• Time evolution of temperature is driven by
unit activities and power dissipations
averaged over 10K cycles
• Power dissipations can come from any power
simulator, act as “current sources” in RC circuit
('P' vector in the equations)
• Simulation overhead in Wattch/SimpleScalar:
<1%
© 2004, 2005, Kevin Skadron
• Requires models of
• Floorplan: important for adjacency
• Package: important for spreading and time
constants
• R and C matrices are derived from the above
66
Implementation
•
•
Primarily a circuit solver
Steady state solution
•
•
© 2004, 2005, Kevin Skadron
•
Mainly matrix inversion – done in two steps
– Decomposition of the matrix into lower and upper
triangular matrices
– Successive backward substitution of solved variables
Implements the pseudocode from CLR
Transient solution
•
•
•
Inputs – current temperature and power
Output – temperature for the next interval
Computed using a fourth order Runge-Kutta
(RK4) method
67
Transient solution
•
Solves differential equations of the form dT + AT =
B where A and B are constants
•
•
•
© 2004, 2005, Kevin Skadron
•
In HotSpot, A is constant (RC) but B depends on the
power dissipation
Solution – assume constant average power dissipation
within an interval (10 K cycles) and call RK4 at the end
of each interval
In RK4, current temperature (at t) is advanced in
very small steps (t+h, t+2h ...) till the next interval
(10K cycles)
RK – `4` because error term is 4th order i.e., O(h^4)
68
Transient solution contd...
•
•
•
© 2004, 2005, Kevin Skadron
•
4th order error has to be within the required
precision
The step size (h) has to be small enough even for
the maximum slope of the temperature evolution
curve
Transient solution for the differential equation is
of the form Ae-Bt with A and B are dependent on
the RC network
Thus, the maximum value of the slope (AxB) and
the step size are computed accordingly
69
Validation
•
Validated and calibrated using MICRED test chips
© 2004, 2005, Kevin Skadron
•
•
•
9x9 array of power dissipators and sensors
Compared to HotSpot configured with same grid,
package
Within 7% for both steady-state and transient stepresponse
•
Interface material (chip/spreader) matters
70
Validation (cont.)
•
Also validated against an FPGA
• Can instantiate a temp. sensor based on a ring
oscillator and counter
© 2004, 2005, Kevin Skadron
•
Also validated against IBM AnSYS FEM
simulations
71
Notes
•
Note that HotSpot currently measures
temperatures
in the silicon
• But that’s also what the most sensors measure
•
Temperature continues to rise through each
layer of the die
© 2004, 2005, Kevin Skadron
• Temperature in upper-level metal is
considerably higher
• Interconnect model forthcoming
72
Soon to be features
•
Grid model – RC network per grid cell instead of a
block
•
•
•
Multigrid algorithm for faster convergence with
very small elements
Temperature models for wires, pads and interface
material between heat sink and spreader
© 2004, 2005, Kevin Skadron
•
•
•
•
Straightforward extension of “lumpy model”, but regular
and easier to accelerate the computation
See DAC’04 paper
Better (more user friendly) floorplan specification
Automatic floorplan generation using classical
floorplanning algorithms
Interface for package selection
73
HotSpot Summary
© 2004, 2005, Kevin Skadron
• HotSpot is a simple, accurate and fast
architecture level thermal model for
microprocessors
• Over 350 downloads since June’03
• Ongoing active development –
architecture level floorplanning will be
available soon
• Download site
• http://lava.cs.virginia.edu/HotSpot
• Mailing list
•
www.cs.virginia.edu/mailman/listinfo/hotspot
74
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions
75
Sensors
© 2004, 2005, Kevin Skadron
Caveat emptor:
We are not well-versed on sensor design;
the following is a digest of information we
have been able to collect from industry
sources and the research literature.
76
© 2004, 2005, Kevin Skadron
Desirable Sensor Characteristics
•
•
•
•
•
•
•
Small area
Low Power
High Accuracy + Linearity
Easy access and low access time
Fast response time (slew rate)
Easy calibration
Low sensitivity to process and supply noise
77
Types of Sensors
(In approx. order of increasing ease to build)
• Thermocouples – voltage output
• Junction between wires of different materials; voltage at
terminals is α Tref – Tjunction
• Often used for external measurements
• Thermal diodes – voltage output
• Biased p-n junction; voltage drop for a known current is
temperature-dependent
© 2004, 2005, Kevin Skadron
• Biased resistors (thermistors) – voltage output
• Voltage drop for a known current is temperature dependent
– You can also think of this as varying R
• Example: 1 KΩ metal “snake”
• BiCMOS, CMOS – voltage or current output
• Rely on reference voltage or current generated from a reference
band-gap circuit; current-based designs often depend on tempdependence of threshold
• 4T RAM cell – decay time is temp-dependent
• [Kaxiras et al, ISLPED’04]
78
Sensors: Problem Issues
• Poor control of CMOS transistor
parameters
• Noisy environment
• Cross talk
• Ground noise
• Power supply noise
© 2004, 2005, Kevin Skadron
• These can be reduced by making the
sensor larger
• This increases power dissipation
• But we may want many sensors
79
“Reasonable” Values
• Based on conversations with engineers at
Sun, Intel, and
HP (Alpha)
• Linearity: not a problem for range of
temperatures of interest
• Slew rate: < 1 μs
© 2004, 2005, Kevin Skadron
• This is the time it takes for the physical sensing
process (e.g., current) to reach equilibrium
• Sensor bandwidth: << 1 MHz, probably 100200 kHz
• This is the sampling rate; 100 kHz = 10 μs
• Limited by slew rate but also A/D
– Consider digitization using a counter
80
“Reasonable” Values: Precision
•
•
Mid 1980s: < 0.1° was possible
Precision
•
•
•
•
© 2004, 2005, Kevin Skadron
•
± 3° is very reasonable
± 2° is reasonable
± 1° is feasible but expensive
< ± 1° is really hard
P: 10s of mW
The limited precision of the G3 sensor
seems to have been a design choice
involving the digitization
81
Calibration
•
Accuracy vs. Precision
• Analogous to mean vs. stdev
•
Calibration deals with accuracy
• The main issue is to reduce inter-die variations
in offset
•
© 2004, 2005, Kevin Skadron
•
Typically requires per-part testing and
configuration
Basic idea: measure offset, store it, then
subtract this from dynamic measurements
82
Dynamic Offset Cancelation
•
•
•
© 2004, 2005, Kevin Skadron
•
•
Rich area of research
Build circuit to continuously, dynamically
detect offset and
cancel it
Typically uses an op-amp
Has the advantage that it adapts to
changing offsets
Has the disadvantage of more complex
circuitry
83
Role of Precision
•
Suppose:
© 2004, 2005, Kevin Skadron
• Junction temperature is J
• Max variation in sensor is S, offset is O
• Thermal emergency is T
•
T=J–S–O
•
Spatial gradients
• If sensors cannot be located exactly at
hotspots, measured temperature may be G°
lower than true hotspot
•
T=J–S–O–G
84
Rate of change of temperature
•
•
© 2004, 2005, Kevin Skadron
•
•
Our FEM simulations suggest maximum
0.1° in about 25-100 μs
This is for power density < 1 W/mm2 die
thickness between 0.2 and 0.7mm, and
contemporary packaging
This means slew rate is not an issue
But sampling rate is!
85
Sensors Summary
• Sensor precision cannot be ignored
• Reducing operating threshold by 1-2 degrees
will affect performance
• Precision of 1° is conceivable but
expensive
© 2004, 2005, Kevin Skadron
• Maybe reasonable for a single sensor or a few
• Precision of 2-3° is reasonable even for a
moderate number of sensors
• Power and area are probably negligible
from the architecture standpoint
• Sampling period <= 10-20 μs
86
Current features
• Specification of arbitrary floorplans
• Format of floorplan file:
© 2004, 2005, Kevin Skadron
• One line per unit
• Line format – <unit-name> \t <width> \t
<height> \t <left-x> \t <bottom-y> \n
• Takes a power trace file as an input and
outputs corresponding temperature trace
• Ability to modify package specifactions
(type of interface material, size and type of
heat spreader and heat sink etc.)
87
Better floorplan specification
© 2004, 2005, Kevin Skadron
• Floorplan of current microprocessors has
a structural similarity
• Floorplans similar to MIPS R10K, Pentium
and the Alpha 21264
• Pipeline order corresponds to floorplan
adjacency
88
Better floorplan specification
© 2004, 2005, Kevin Skadron
• Sample specification (with % areas) that
takes advantage of pipeline order
89
Automatic floorplan for architects
• Why develop an architectural
floorplanning tool?
• Thermal modeling requires adjacency
information.
• Wire delays make performance depend on the
floorplan.
© 2004, 2005, Kevin Skadron
• Goal
• Derive a realistic floorplan using only
microarchitectural information
• Trade off thermal efficiency against latency
• Simulated annealing based floorplan
optimization for thermal, delay and combined
metrics
• Current work. Results will be available
soon
90
Overview
1.
2.
3.
4.
5.
What is thermal?
Why thermal?
Why should architects care?
Useful thermal background
Dynamic thermal mgmt. (DTM)
© 2004, 2005, Kevin Skadron
- Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions
91
Future Work
•
Modeling
• Accelerated or sampled simulation for
multiprogrammed and multi-core simulation
•
Metrics
• Quantify tradeoffs among performance loss,
energy costs, and cooling costs
© 2004, 2005, Kevin Skadron
•
Explore relationship between thermal,
leakage
• Also other physical phenomena, e.g.
parameter variations
92
Future Work: DTM
•
Connect thermal management to OS knowledge of
application requirements
•
•
•
•
Spread heat across the CPU, e.g. compiler
techniques
Exploit opportunities for migration in architectures
with remapping – rich design space
© 2004, 2005, Kevin Skadron
•
•
Allows predictive rather than reactive response
Runtime phase prediction and runtime
profiling can also permit predictive response
eg, CMP, clusters, TRIPS
Sensor fusion to reduce sensor imprecision
•
•
•
•
As much as 50% of DTM overhead is due to sensors
Might use many small, cheap, imprecise sensors
Very interesting paper at ISLPED: 4T RAM as sensor
Need to better understand sensor placement
93
Recap
•
Opportunity and need for temperature-aware design
•
Compact models are a natural modeling abstraction
•
•
The microarchitecture is a natural domain in which to
control on-chip temperature
•
•
•
© 2004, 2005, Kevin Skadron
Applicable in all design stages
ILP matters
Microarchitecture techniques can outperform DVS
Tiled/clustered/reconfigurable/etc. architectures offer a rich
design space
•
Need to understand the multi-dimensional problem of
power, leakage, reliability, and temperature
management
•
Need designs that prevent hotspots
94
Download