© 2004, 2005, Kevin Skadron

A Quick Thermal Tutorial
Kevin Skadron, Mircea Stan
Univ. of Virginia, HotSpot group

Cooking-aware computing
• Some chips rated for 100°C+

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
   - Including some illustrative HotSpot details
7. Sensor issues
8. Some research questions

Metrics and Design Objectives
• Power: design for power delivery
  – Average power, instantaneous power, peak power
• Energy: Low-Power Design, Power-Aware/Energy-Efficient Design
  – Energy (MIPS/W) = heat
  – Energy-Delay product (MIPS²/W)
  – Energy-Delay² product (MIPS³/W) – voltage indep. (Zyuban, GVLSI’02)
• Temperature: Temperature-Aware Design
  – Correlated with power density over sufficiently large time periods
  – Temperature-Aware Design (localized T, short time scales) vs. Power-Aware Design (coarse granularities)

Power Dissipation
  Processor          Clock rate
  Alpha 21364        1.15 GHz
  AMD Opteron        2.2 GHz
  HP PA8700          870 MHz
  IBM Power 4        1.7 GHz
  Intel Itanium 2    1.5 GHz
  Intel Xeon         3.2 GHz
  MIPS R14000        600 MHz
Power (max): Alpha 21364: 110 W. Remaining values in extraction order (pairing with the processors above is not recoverable): 86 W, 75 W, 130 W, 86 W, 16 W, 100 W
Source: Microprocessor Report

ITRS Projections
  Year                                     2003   2006   2010   2013   2016
  Tech node (nm)                            100     70     45     32     22
  Vdd (high perf) (V)                       1.2    1.1    1.0    0.9    0.8
  Vdd (low power) (V)                       1.0    0.9    0.7    0.6    0.5
  Frequency (high perf) (GHz)               3.0    6.8   15.1   23.0   39.7
  Max power (W), high-perf w/ heatsink      149    180    198    198    198
  Max power (W), cost-performance            80     98    120    138    158
  Max power (W), hand-held                  2.1    2.4    2.8    3.0    3.0
Slide callouts: "2001 – was 0.4"; "2001 – was 288"
Source: ITRS 2004
• These are targets; it is doubtful that they are feasible
• Growth in power density means cooling costs continue to grow
• High-performance designs seem to be shifting away from clock frequency toward # cores

Trends in Power Density
[Figure: power density in Watts/cm² (axis up to 1000); reference label: Rocket Nozzle]
[Figure, continued: axis ticks 100, 10, 1; reference labels: Nuclear Reactor, Hot plate; processors plotted: i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4; process generations from 1.5µ down to 0.07µ]
* “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” – Fred Pollack, Intel Corp. Micro32 conference keynote, 1999.

ITRS quotes – thermal challenges
• For small dies with high pad count, high power density, or high frequency, “operating temperature, etc for these devices exceed the capabilities of current assembly and packaging technology.”
• “Thermal envelopes imposed by affordable packaging discourage very deep pipelining.”
• Intel recently canceled its NetBurst microarchitecture
  – Press reports suggest thermal envelopes were a factor

Leakage Power
• The fraction of leakage power is increasing exponentially with each generation
• Also exponentially dependent on temperature
[Figure: static power / dynamic power (percentage, 0 to 70) vs. temperature (298 K to 373 K) for 180nm, 130nm, 100nm, 90nm, 80nm, and 70nm; the ratio increases across generations. Source: Sankaranarayanan et al, University of Virginia]

Why we care about thermal issues
Source: Tom’s Hardware Guide http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html

Other Costs of High Heat Flux
• Packaging, cooling costs
• Noise (quiet high-speed fans are expensive)
• Form factors
• Some chips may already be underclocked due to thermal constraints!
  – (especially mobile and sealed systems)
• Temperature-dependent phenomena
  – Leakage
  – IR voltage drop (R is T-dep)
  – Aging (e.g.
EM)
  – Performance (carrier mobility)

Packaging cost
• From Cray (local power generator and refrigeration)…
Source: Gordon Bell, “A Seymour Cray perspective” http://www.research.microsoft.com/users/gbell/craytalk/

Intel Pentium 4 packaging
• Simpler, but still…
Source: Intel web site

Graphics Cards
• Nvidia GeForce 5900 card
Source: Tech-Report.com

Apple G5 – liquid cooling
• Don’t know details
• In G5 case, liquid is probably for noise
• Lots of people in the thermal engineering community think liquid is inevitable, especially for server rooms
• But others say no:
  – This introduces a whole new kind of leakage problem
  – Water and electronics don’t mix!

Environment
• Equivalent power (with only 30% efficiency) for AC
• CFCs used for refrigeration

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions

Worst-Case leads to Over-design
• Average case temperature lower than worst-case
  – Aggressive clock gating
  – Application variations
  – Underutilized resources, e.g. FP units during integer code
  – Currently 20-40% difference
[Figure labels: Reduced target power density; TDP; Reduced cooling cost]
Source: Gunther et al, ITJ 2001

Temporal, Spatial Variations
[Figure: temperature variation of SPEC applu over time]
• Hot spots increase cooling costs: must cool for hot spot

Application Variations
• Wide variation across applications
• Architectural and technology trends are making it worse, e.g. simultaneous multithreading (SMT)
  – Leakage is an especially severe problem: exponentially dependent on temperature!
[Figure: temperature in Kelvin (370 to 420), ST vs. SMT, for gzip, mcf, swim, mgrid, applu, eon, and mesa]

Heat vs. Temperature
• Different time, space scales
• Heat: no notion of spatial locality
Temperature-aware computing: optimize performance subject to a temperature constraint

Thermal Modeling: P vs. T
• Power metrics are an unacceptable proxy
  – Chip-wide average won’t capture hot spots
  – Localized average won’t capture lateral coupling
  – Different functional units have different power densities

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions

Thermal consequences
Temperature affects:
• Circuit performance
• Circuit power (leakage)
• IC reliability
• IC and system packaging cost
• Environment

Performance and leakage
Temperature affects:
• Transistor threshold and mobility
• Subthreshold leakage, gate leakage
• Ion, Ioff, Igate, delay
• ITRS: 85°C for high-performance, 110°C for embedded!
[Figure: Ion and Ioff for an NMOS transistor]

Temperature-aware circuits
• Robustness constraint: sets Ion/Ioff ratio
• Robustness and reliability: Ion/Igate ratio
Idea: keep ratios constant with T: trade leakage for performance!
Ref: Ghoshal et al. “Refrigeration Technologies…”, ISSCC 2000;
Garrett et al.
“T3…”, ISCAS 2001

Reliability
The Arrhenius Equation: MTF = A · exp(Ea / (K · T))
  MTF: mean time to failure at T
  A: empirical constant
  Ea: activation energy
  K: Boltzmann’s constant
  T: absolute temperature
Failure mechanisms:
• Die metalization (corrosion, electromigration, contact spiking)
• Oxide (charge trapping, gate oxide breakdown, hot electrons)
• Device (ionic contamination, second breakdown, surface-charge)
• Die attach (fracture, thermal breakdown, adhesion fatigue)
• Interconnect (wirebond failure, flip-chip joint failure)
• Package (cracking, whisker and dendritic growth, lid seal failure)
Most of the above increase with T (Arrhenius)
Notable exception: hot electrons are worse at low temperatures
More on this later

Heat mechanisms
• Conduction is the main mechanism in a single chip
  – Conduction is proportional to the temperature difference (Thot – Tcold) and surface area
• Convection is the main mechanism in racks, data centers, etc.

Carnot Efficiency
• Note that in all cases, heat transfer is proportional to ΔT
• This is also one of the reasons energy “harvesting” in computers is probably not cost-effective
  – ΔT w.r.t. ambient is << 100°
  – For example, with a 25W processor, the thermoelectric effect yields only ~50mW (Solbrekken et al, ITHERM’04)
• This is also why Peltier coolers are not energy efficient
  – 10% eff., vs. 30% for a refrigerator

Surface-to-surface contacts
• Not negligible, heat crowding
• Thermal greases/epoxy (can “pump-out”)
• Phase Change Films (undergo a transition from solid to semisolid with the application of heat)
Source: CRC Press, R. Remsburg Ed.
“Thermal Design of Electronic Equipment”, 2001

Thermal resistance
• Θ = r · t / A = t / (k · A)

Thermal capacitance
• Cth = V · Cp · ρ
  ρ(Aluminum) = 2,710 kg/m³
  Cp(Aluminum) = 875 J/(kg·°C)
  V = t · A = 0.000025 m³
  Cbulk = V · Cp · ρ = 59.28 J/°C

Simplistic steady-state model
• All thermal transfer: R = k/A
• Power density matters!
• Ohm’s law for thermals (steady-state): V = I · R -> ΔT = P · R
• T_hot = P · Rth + T_amb
Ways to reduce T_hot:
- reduce P (power-aware)
- reduce Rth (packaging, spread heat)
- reduce T_amb (Alaska?)
- maybe also take advantage of transients (Cth)

Simplistic dynamic thermal model
Electrical-thermal duality
  V ↔ temp (T)
  I ↔ power (P)
  R ↔ thermal resistance (Rth)
  C ↔ thermal capacitance (Cth)
  RC ↔ time constant
KCL
  differential eq.   I = C · dV/dt + V/R
  difference eq.     ΔV = I/C · Δt – V/(RC) · Δt
  thermal domain     ΔT = P/C · Δt – T/(RC) · Δt   (T = T_hot – T_amb)
One can compute stepwise changes in temperature for any granularity at which one can get P, T, R, C

Reliability as f(T)
• Reliability criteria (e.g., DTM thresholds) are typically based on worst-case assumptions
• But actual behavior is often not worst case
• So aging occurs more slowly
• This means the DTM design is over-engineered!
• We can exploit this, e.g. for DTM or frequency
[Figure labels: Bank; Spend]

EM Model
$$\int_0^{t_{failure}} \frac{1}{kT(t)}\, e^{-\frac{E_a}{kT(t)}}\, dt = \varphi_{th}, \qquad \varphi_{th} = \text{const}$$
Life Consumption Rate:
$$R(t) = \frac{1}{kT(t)}\, e^{-\frac{E_a}{kT(t)}}$$
Apply in a “lumped” fashion at the granularity of microarchitecture units, just like RAMP [Srinivasan et al.]

Reliability-Aware DTM
[Figure: bar chart of average slowdown (0.00 to 0.16) across configurations]
[Figure legend: DTM_controller, DTM_reliability]

Temperature limits
• Temperature limits for circuit performance can be measured
• Temperature limits for reliability are at best an estimate
  – 150° is a reasonable rule of thumb for when immediate damage might occur
  – Chips are typically specified at lower temperatures, 100-125°, for both performance and long-term reliability
  – The rule of thumb that every 10° halves circuit lifetime is false
    - Originates from a mil-spec that has been debunked

Thermal issues summary
• Temperature affects performance, power, and reliability
• Architecture-level: conduction only
  – Very crude approximation of convection as equivalent resistance
  – Convection: too complicated (need CFD!)
  – Radiation: can be ignored
• Use compact models for package
• Power density is key
• Temporal, spatial variation are key
• Hot spots drive thermal design

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions

Temperature-Aware Design
• Worst-case design is wasteful
• Power management is not sufficient for chip-level thermal management
  – Must target blocks with high power density
  – When they are hot
  – Spreading heat helps
    - Even if energy not affected
    - Even if average temperature goes up
  – This also helps reduce leakage

Role of Architecture?
Dynamic thermal management (DTM)
• Automatic hardware response when temp. exceeds cooling
• Cut power density at runtime, on demand
• Trade reduced costs for occasional performance loss
• Architecture is a natural granularity for thermal management
  – Activity, temperature correlated within arch.
units
  – DTM response can target hottest unit: permits fine-tuned response compared to OS or package
  – Modern architectures offer rich opportunities for remapping computation
    - e.g., CMPs/SoCs, graphics processors, tiled architectures
    - e.g., register file

Dynamic Thermal Management (DTM)
(Brooks and Martonosi, HPCA 2001)
[Figure: temperature vs. time, with labels "Designed for Cooling Capacity w/out DTM", "Designed for Cooling Capacity w/ DTM", "System Cost Savings", "DTM Trigger Level", "DTM Disabled", and "DTM/Response Engaged"]
Source: David Brooks 2002

DTM
• Worst case design for the external cooling solution is wasteful
  – Yet safe temperatures must be maintained when worst case happens
• Thermal monitors allow
  – Tradeoff between cost and performance
  – Cheaper package: more triggers, less performance
  – Expensive package: no triggers, full performance

Existing DTM Implementations
• Intel Pentium 4: Global clock gating with shut-down fail-safe
• Intel Pentium M: Dynamic voltage scaling
• Transmeta Crusoe: Dynamic voltage scaling
• IBM Power 5: Probably fetch gating
• ACPI: OS configurable combination of passive & active cooling
• These solutions sacrifice time (slower or stalled execution) to reduce power density
• Better: a solution in “space”
• Tradeoff between exacerbating leakage (more idle logic) or reducing leakage (lower temperatures)

Previously Published DTM Techniques
• DVS (Transmeta Crusoe)
• Temperature-tracking frequency scaling (Skadron et al. 2003)
  – Adjusts frequency to account for T-dep. of switching speed
• Clock gating (Pentium 4, Gunther et al. 2001)
• Fetch gating (FG) (Brooks and Martonosi 2001)
  – Feedback-controlled fetch gating (Skadron et al. 2002)
• Fetch/decode throttling (Motorola G3, Sanchez 1997)
• Speculation control (Manne et al. 1998)
• Dual pipeline (Lim et al. 2002)
• Low-power caches (Huang et al.
2000)
  – DEETM framework used a hierarchy of response
• Migrating computation (Skadron et al. 2003, Heo et al. 2003, Powell et al. 2004)
• Hybrid DTM (Skadron 2004)

Alternative: Migrating Computation
This is only a simplistic illustrative example

Space vs. Time
• Moving the hotspot, rather than throttling it, reduces performance overhead by almost 60%
[Figure: slowdown factor (1.00 to 1.40) for time-based vs. space-based techniques: DVS 1.359, FG 1.270, Hyb 1.231, MC 1.112]
The greater the replication and spread, the greater the opportunities

Future DTM considerations
• Trend in architecture: increasing replication
  – Chip multiprocessors: independent CPUs on a single die (Ex: IBM Power5)
  – Clustered organizations: semi-autonomous exec. units in a single CPU (Ex: Alpha 21264)
  – Tiled organizations: semi-coupled CPUs (Ex: RAW, TRIPS)
• Levels of architectural DTM
  – Subunit (single queue entry, register, etc.): lots of replication, low migration cost, not spread out
  – Structure (queue, register file, ALU, etc.): yuck, hard to avoid throttling
  – Cluster/tile/core: lots of replication, good spread, but high migration cost, and local hotspots remain
The greater the replication and spread, the greater the opportunities

Previously Published DTM Techniques
• DVS (Transmeta Crusoe)
• Temperature-tracking frequency scaling (Skadron et al. 2003)
  – Adjusts frequency to account for T-dep. of switching speed
• Clock gating (Pentium 4, Gunther et al. 2001)
• Fetch gating (FG) (Brooks and Martonosi 2001)
  – Feedback-controlled fetch gating (Skadron et al. 2002)
• Fetch/decode throttling (Motorola G3, Sanchez 1997)
• Speculation control (Manne et al. 1998)
• Dual pipeline (Lim et al. 2002)
• Low-power caches (Huang et al. 2000)
  – DEETM framework used a hierarchy of response
• Migrating computation (Skadron et al. 2003, Heo et al. 2003, Powell et al.
2004)
• Hybrid DTM (Skadron 2004)

SMT vs. CMP
• Recent work with IBM
[Figure labels: Δ 25°; Δ 15°]

SMT vs. CMP, cont.
• CMP is more energy efficient for CPU-bound workloads
• SMT can be more energy efficient for memory-bound workloads!
  – For same # of threads and equal chip size, CMP has less L2 cache
• Localized or hybrid hot-spot management, e.g. intelligent register-file allocation and throttling, can outperform DVS

DTM for GPU, cont.
[Figure panels (a) to (d); labels: Texture accesses, Fragments generated, Vertices transformed]

DTM for GPU

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling and HotSpot
7. Sensor issues
8. Some research questions

Thermal modeling
• Want a fine-grained, dynamic model of temperature
  – At a granularity architects can reason about
  – That accounts for adjacency and package
  – That does not require detailed designs
  – That is fast enough for practical use
• HotSpot: a compact model based on thermal R, C
  – Parameterized to automatically derive a model based on various
    - Architectures
    - Power models
    - Floorplans
    - Thermal packages

Dynamic compact thermal model
Electrical-thermal duality
  V ↔ temp (T)
  I ↔ power (P)
  R ↔ thermal resistance (Rth)
  C ↔ thermal capacitance (Cth)
  RC ↔ time constant (Rth Cth)
Kirchhoff Current Law differential eq.
  I = C · dV/dt + V/R
thermal domain
  P = Cth · dT/dt + T/Rth, where T = T_hot – T_amb
At higher granularities of P, Rth, Cth:
  P, T are vectors and Rth, Cth are circuit matrices

Example System
[Figure: example system showing heat sink, heat spreader, interface material, die, IC package, pin, and PCB]

Modeling the package
• Thermal management allows for packaging alternatives/shortcuts/interactions
• HotSpot needs a model of packaging
• Basic thermal model:
  – Heat spreader
  – Heatsink
  – Interface materials (e.g. epoxy)
  – Fan/Active cooler
• Thermal resistance due to convection
• Constriction and bulk resistance for fins
• Spreading constriction and bulk resistance for heatsink base and heat spreader
• Thermal resistance for interface materials
• Thermal capacitance of heat spreader and heatsink

Equivalent vertical network
• Diagram is simplified
  – Peripheral nodes
[Figure: vertical network with layers Chip, Interface, Spreader (with peripheral spreader nodes), Interface + Sink, Convection]

Vertical network parameters
• Resistances
  – Determined by the corresponding areas and their cross-sectional thickness
  – R = resistivity × thickness / Area
• Capacitances
  – C = specific heat × thickness × Area
• Peripheral node areas
[Figure: chip centered on the spreader, with North, South, East, and West peripheral spreader areas]

Lateral resistances
• Determined by the floorplan and the length of shared edges between adjacent blocks

Lateral resistances – contd...
• Lengths used for silicon
• Lengths used in the spreader

Our model (lateral and vertical)
[Figure: combined lateral and vertical RC network; interface material not shown]

Temperature equations
• Fundamental RC differential equation
  – P = C dT/dt + T / R
• Steady state
  – dT/dt = 0
  – P = T / R
• When R and C are network matrices
  – Steady state: T = R × P
  – Modified transient equation: dT/dt + (RC)^-1 × T = C^-1 × P
• HotSpot software mainly solves these two equations

HotSpot
• Time evolution of temperature is driven by unit activities and power dissipations averaged over 10K cycles
• Power dissipations can come from any power simulator; they act as “current sources” in the RC circuit ('P' vector in the equations)
• Simulation overhead in Wattch/SimpleScalar: <1%
• Requires models of
  – Floorplan: important for adjacency
  – Package: important for spreading and time constants
• R and C matrices are derived from the above

Implementation
• Primarily a circuit solver
• Steady state solution
  – Mainly matrix inversion, done in two steps
    - Decomposition of the matrix into lower and upper triangular matrices
    - Successive backward substitution of solved variables
  – Implements the pseudocode from CLR
• Transient solution
  – Inputs: current temperature and power
  – Output: temperature for the next interval
  – Computed using a fourth order Runge-Kutta (RK4) method

Transient solution
• Solves differential equations of the form dT/dt + AT = B where A and B are constants
• In HotSpot, A is constant ((RC)^-1, fixed by the RC network) but B depends on the power dissipation
• Solution: assume constant average power dissipation within an interval (10K cycles) and call RK4 at the end of each interval
• In RK4, current temperature (at t) is advanced in very small steps (t+h, t+2h ...)
till the next interval (10K cycles)
• RK “4” because the error term is 4th order, i.e., O(h^4)

Transient solution contd...
• 4th order error has to be within the required precision
• The step size (h) has to be small enough even for the maximum slope of the temperature evolution curve
• Transient solution for the differential equation is of the form A·e^(–Bt), where A and B depend on the RC network
• Thus, the maximum value of the slope (A × B) and the step size are computed accordingly

Validation
• Validated and calibrated using MICRED test chips
  – 9x9 array of power dissipators and sensors
  – Compared to HotSpot configured with same grid, package
  – Within 7% for both steady-state and transient step response
• Interface material (chip/spreader) matters

Validation (cont.)
• Also validated against an FPGA
  – Can instantiate a temp. sensor based on a ring oscillator and counter
• Also validated against IBM ANSYS FEM simulations

Notes
• Note that HotSpot currently measures temperatures in the silicon
  – But that’s also what most sensors measure
• Temperature continues to rise through each layer of the die
  – Temperature in upper-level metal is considerably higher
  – Interconnect model forthcoming

Soon to be features
• Grid model: RC network per grid cell instead of a block
  – Straightforward extension of “lumpy model”, but regular and easier to accelerate the computation
  – Multigrid algorithm for faster convergence with very small elements
• Temperature models for wires, pads and interface material between heat sink and spreader
  – See DAC’04 paper
• Better (more user friendly) floorplan specification
• Automatic floorplan generation using classical floorplanning algorithms
• Interface for package selection

HotSpot Summary
• HotSpot is a simple, accurate and fast architecture level thermal
model for microprocessors
• Over 350 downloads since June’03
• Ongoing active development: architecture level floorplanning will be available soon
• Download site
  – http://lava.cs.virginia.edu/HotSpot
• Mailing list
  – www.cs.virginia.edu/mailman/listinfo/hotspot

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions

Sensors
Caveat emptor: We are not well-versed in sensor design; the following is a digest of information we have been able to collect from industry sources and the research literature.

Desirable Sensor Characteristics
• Small area
• Low power
• High accuracy + linearity
• Easy access and low access time
• Fast response time (slew rate)
• Easy calibration
• Low sensitivity to process and supply noise

Types of Sensors (In approx.
order of increasing ease to build)
• Thermocouples – voltage output
  – Junction between wires of different materials; voltage at terminals is ∝ (Tref – Tjunction)
  – Often used for external measurements
• Thermal diodes – voltage output
  – Biased p-n junction; voltage drop for a known current is temperature-dependent
• Biased resistors (thermistors) – voltage output
  – Voltage drop for a known current is temperature-dependent
    - You can also think of this as varying R
  – Example: 1 KΩ metal “snake”
• BiCMOS, CMOS – voltage or current output
  – Rely on reference voltage or current generated from a reference band-gap circuit; current-based designs often depend on temp-dependence of threshold
• 4T RAM cell – decay time is temp-dependent
  – [Kaxiras et al, ISLPED’04]

Sensors: Problem Issues
• Poor control of CMOS transistor parameters
• Noisy environment
  – Cross talk
  – Ground noise
  – Power supply noise
• These can be reduced by making the sensor larger
  – This increases power dissipation
  – But we may want many sensors

“Reasonable” Values
• Based on conversations with engineers at Sun, Intel, and HP (Alpha)
• Linearity: not a problem for range of temperatures of interest
• Slew rate: < 1 μs
  – This is the time it takes for the physical sensing process (e.g., current) to reach equilibrium
• Sensor bandwidth: << 1 MHz, probably 100-200 kHz
  – This is the sampling rate; 100 kHz = 10 μs
  – Limited by slew rate but also A/D
    - Consider digitization using a counter

“Reasonable” Values: Precision
• Mid 1980s: < 0.1° was possible
• Precision
  – ± 3° is very reasonable
  – ± 2° is reasonable
  – ± 1° is feasible but expensive
  – < ± 1° is really hard
• P: 10s of mW
• The limited precision of the G3 sensor seems to have been a design choice involving the digitization

Calibration
• Accuracy vs. Precision
  – Analogous to mean vs.
stdev
• Calibration deals with accuracy
  – The main issue is to reduce inter-die variations in offset
  – Typically requires per-part testing and configuration
• Basic idea: measure offset, store it, then subtract this from dynamic measurements

Dynamic Offset Cancelation
• Rich area of research
• Build circuit to continuously, dynamically detect offset and cancel it
• Typically uses an op-amp
• Has the advantage that it adapts to changing offsets
• Has the disadvantage of more complex circuitry

Role of Precision
• Suppose:
  – Junction temperature is J
  – Max variation in sensor is S, offset is O
  – Thermal emergency is T
  – T = J – S – O
• Spatial gradients
  – If sensors cannot be located exactly at hotspots, measured temperature may be G° lower than true hotspot
  – T = J – S – O – G

Rate of change of temperature
• Our FEM simulations suggest maximum 0.1° in about 25-100 μs
• This is for power density < 1 W/mm², die thickness between 0.2 and 0.7mm, and contemporary packaging
• This means slew rate is not an issue
• But sampling rate is!

Sensors Summary
• Sensor precision cannot be ignored
  – Reducing operating threshold by 1-2 degrees will affect performance
• Precision of 1° is conceivable but expensive
  – Maybe reasonable for a single sensor or a few
• Precision of 2-3° is reasonable even for a moderate number of sensors
• Power and area are probably negligible from the architecture standpoint
• Sampling period <= 10-20 μs

Current features
• Specification of arbitrary floorplans
• Format of floorplan file:
  – One line per unit
  – Line format: <unit-name> \t <width> \t <height> \t <left-x> \t <bottom-y> \n
• Takes a power trace file as an input and outputs corresponding temperature trace
• Ability to modify package specifications (type of interface material, size and type of heat spreader and heat sink etc.)
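
The floorplan line format above is simple enough to parse by hand. The following is a minimal sketch in Python, not HotSpot's own reader: the names `Unit`, `parse_floorplan`, and `shared_edge_length` are hypothetical, the sample unit names and dimensions are made up, and the sketch assumes strictly tab-separated fields with no comment lines. The second helper illustrates the quantity the lateral-resistance slides depend on, the length of the edge shared by two adjacent blocks.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    """One floorplan line: <unit-name> <width> <height> <left-x> <bottom-y>."""
    name: str
    width: float
    height: float
    left_x: float
    bottom_y: float


def parse_floorplan(text: str) -> List[Unit]:
    """Parse tab-separated floorplan text, one unit per line (blank lines skipped)."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, width, height, left_x, bottom_y = line.split("\t")
        units.append(Unit(name, float(width), float(height),
                          float(left_x), float(bottom_y)))
    return units


def shared_edge_length(a: Unit, b: Unit, tol: float = 1e-9) -> float:
    """Length of the edge shared by two rectangles; 0.0 if they are not adjacent."""
    # Side by side: one block's right edge coincides with the other's left edge.
    if (abs(a.left_x + a.width - b.left_x) < tol
            or abs(b.left_x + b.width - a.left_x) < tol):
        overlap = (min(a.bottom_y + a.height, b.bottom_y + b.height)
                   - max(a.bottom_y, b.bottom_y))
        return max(0.0, overlap)
    # Stacked: one block's top edge coincides with the other's bottom edge.
    if (abs(a.bottom_y + a.height - b.bottom_y) < tol
            or abs(b.bottom_y + b.height - a.bottom_y) < tol):
        overlap = (min(a.left_x + a.width, b.left_x + b.width)
                   - max(a.left_x, b.left_x))
        return max(0.0, overlap)
    return 0.0


# Hypothetical two-unit floorplan; dimensions are illustrative only.
SAMPLE = "core\t0.004\t0.004\t0.0\t0.0\ncache\t0.004\t0.002\t0.004\t0.001\n"
units = parse_floorplan(SAMPLE)
```

With the sample above, `shared_edge_length(units[0], units[1])` gives the vertical overlap of the two blocks along their common edge; blocks that do not touch return 0.0.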
Better floorplan specification
• Floorplans of current microprocessors have a structural similarity
[Figure: floorplans similar to MIPS R10K, Pentium and the Alpha 21264]
• Pipeline order corresponds to floorplan adjacency

Better floorplan specification
• Sample specification (with % areas) that takes advantage of pipeline order

Automatic floorplan for architects
• Why develop an architectural floorplanning tool?
  – Thermal modeling requires adjacency information.
  – Wire delays make performance depend on the floorplan.
• Goal
  – Derive a realistic floorplan using only microarchitectural information
  – Trade off thermal efficiency against latency
• Simulated annealing based floorplan optimization for thermal, delay and combined metrics
• Current work. Results will be available soon

Overview
1. What is thermal?
2. Why thermal?
3. Why should architects care?
4. Useful thermal background
5. Dynamic thermal mgmt. (DTM)
   - Prelim CMP, SMT, and GPU results
6. Thermal Modeling
7. Sensor issues
8. Some research questions

Future Work
• Modeling
  – Accelerated or sampled simulation for multiprogrammed and multi-core simulation
• Metrics
  – Quantify tradeoffs among performance loss, energy costs, and cooling costs
• Explore relationship between thermal, leakage
  – Also other physical phenomena, e.g. parameter variations

Future Work: DTM
• Connect thermal management to OS knowledge of application requirements
  – Allows predictive rather than reactive response
  – Runtime phase prediction and runtime profiling can also permit predictive response
• Spread heat across the CPU, e.g. compiler techniques
• Exploit opportunities for migration in architectures with remapping – rich design space
  – e.g., CMP, clusters, TRIPS
• Sensor fusion to reduce sensor imprecision
  – As much as 50% of DTM overhead is due to sensors
  – Might use many small, cheap, imprecise sensors
  – Very interesting paper at ISLPED: 4T RAM as sensor
  – Need to better understand sensor placement

Recap
• Opportunity and need for temperature-aware design
• Compact models are a natural modeling abstraction
  – Applicable in all design stages
• The microarchitecture is a natural domain in which to control on-chip temperature
  – ILP matters
  – Microarchitecture techniques can outperform DVS
  – Tiled/clustered/reconfigurable/etc. architectures offer a rich design space
• Need to understand the multi-dimensional problem of power, leakage, reliability, and temperature management
• Need designs that prevent hotspots
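
As a closing illustration of the compact-model idea in the recap, the single-node form of the RC equation from the modeling slides, P = Cth · dT/dt + T/Rth with T = T_hot – T_amb, can be stepped with a fourth-order Runge-Kutta integrator under the slides' assumption of constant average power within each interval. This is a sketch in Python, not HotSpot's solver: the function names, the fixed number of substeps, and all parameter values (0.8 K/W, 50 J/K, 25 W, 45° ambient) are assumptions chosen only for illustration.

```python
import math
from typing import Callable, List


def rk4_step(f: Callable[[float, float], float], t: float, y: float, h: float) -> float:
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def steady_state(power: float, r_th: float, t_amb: float) -> float:
    """Steady-state form of the model: T_hot = P * Rth + T_amb."""
    return power * r_th + t_amb


def simulate(power_trace: List[float], r_th: float, c_th: float, t_amb: float,
             interval: float, substeps: int = 10) -> List[float]:
    """Return T_hot at the end of each interval of the power trace.

    Power is treated as constant within an interval; each interval is
    integrated with `substeps` RK4 steps (a fixed count, an assumption here).
    """
    rise = 0.0  # T = T_hot - T_amb, starting at ambient
    h = interval / substeps
    temps = []
    for p in power_trace:
        # dT/dt = P/Cth - T/(Rth*Cth), rearranged from P = Cth*dT/dt + T/Rth
        def dT_dt(_t: float, T: float, p: float = p) -> float:
            return p / c_th - T / (r_th * c_th)
        for _ in range(substeps):
            rise = rk4_step(dT_dt, 0.0, rise, h)
        temps.append(t_amb + rise)
    return temps


def analytic_step(power: float, r_th: float, c_th: float, t_amb: float, t: float) -> float:
    """Closed-form response to a constant power step applied at ambient temperature."""
    return t_amb + power * r_th * (1.0 - math.exp(-t / (r_th * c_th)))


# Hypothetical parameters: Rth = 0.8 K/W, Cth = 50 J/K, 25 W load, 45 degree ambient.
temps = simulate([25.0] * 400, r_th=0.8, c_th=50.0, t_amb=45.0, interval=1.0)
```

Comparing `temps` against `analytic_step` and `steady_state` for the same parameters is a quick sanity check on the step size: with a constant power trace the simulated temperature should rise monotonically and settle at P · Rth + T_amb.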