L16: Power Dissipation in Digital Systems L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 1 Problem #1: Power Dissipation/Heat 18KW 5KW 1.5KW 500W Power (Watts) 10000 1000 Pentium® proc 100 286 486 8086 10 386 8085 8080 8008 1 4004 0.1 1971 1974 1978 1985 1992 2000 2004 2008 Year 10000 Power Density (W/cm2) 100000 Sun’s Surface Rocket Nozzle 1000 Nuclear Reactor 100 8086 10 4004 Hot 8008 8085 386 286 8080 1 1970 1980 Plate P6 Pentium® proc 486 1990 Year 2000 2010 Courtesy Intel (S. Borkar) How do you cool these chips?? heat sink chip L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 2 Problem #2: Energy Consumption The Energy Problem 7.5 cm3 AA battery Alkaline: ~10,000J What can One Joule of energy do? (Image by MIT OCW. Adapted from Jon Eager, Gates Inc. , S. Watanabe, Sony Inc.) Mow your lawn for 1 ms Operate a processor for ~ 7s Send a 1 Megabyte file over 802.11b No Moore’s law for batteries… Today: Understand where power goes and ways to manage it Image by MIT OCW. L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 2 Dynamic Energy Dissipation Discharging VDD Charging E0→1 = CLVDD2 VDD iDD RP IN =0 Ecap = 1/2CLVDD2 Ediss, RP = 1/2CLVDD RN CL RP Ediss,RN =1/2CLVDD2 2 IN =1 RN CL P = CL VDD2 fclk L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 3 The Transition Activity Factor α0−>1 Current Input 00 00 00 00 01 01 01 01 10 10 10 10 11 11 11 11 L16: 6.111 Spring 2006 Next Input 00 01 10 11 00 01 10 11 00 01 10 11 00 01 10 11 Output Transition 1 −> 1 1 −> 1 1 −> 1 1 −> 0 1 −> 1 1 −> 1 1 −> 1 1 −> 0 1 −> 1 1 −> 1 1 −> 1 1 −> 0 0 −> 1 0 −> 1 0 −> 1 0 −> 0 A B Z Assume inputs (A,B) arrive at f and are uniformly distributed What is the average power dissipation? α0−>1 = 3/16 P = α0−>1 CL VDD2 f Introductory Digital Systems Laboratory 4 Junction (Silicon) Temperature Simple Scenario Realistic Scenario TJ Silicon TC Case Silicon TS Sink Tj-Ta= RθJA PD TJ RθJA is the thermal resistance between silicon and Ambient RθJC PD TJ TC RθCS RθJA PD TA TS TA RθSA Tj= Ta + RθJA PD TA Make this as low as possible RθCA = RθCS + RθSA is minimized by facilitating heat transfer (bolt case to extended metal surface – heat sink) L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 5 Intel Pentium 4 Thermal Guidelines Pentium 4 @ 3.06 GHz dissipates 81.8W! Maximum TC = 69 °C RCA < 0.23 °C/W for 50 C ambient Typical chips dissipate 0.5-1W (cheap packages without forced air cooling) Image by MIT OpenCourseWare. L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory Image by MIT OpenCourseWare. Adapted from Intel Pentium 4 documentation. 6 Power Reduction Strategies P = α0−>1 CL VDD2 f Reduce Transition Activity or Switching Events Reduce Capacitance (e.g., keep wires short) Reduce Power Supply Voltage Frequency is typically fixed by the application, though this can be adjusted to control power Optimize at all levels of design hierarchy L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 7 Clock Gating is a Good Idea! Clock gating reduces activity and is the most common low-power technique used today Global Clock Adder Off Adder Clock + Enable_Adder Multiplier On X Enable_Multiplier Multiplier Clock 100’s of different clocks in a microprocessor Clock Gating Reduces Energy, does it reduce Power? L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 8 Does your GHz Processor run at a GHz? Processor Chip Activity Control Thermal Sensor Note that there is a difference between average and peak power On-chip thermal sensor (diode based), measures the silicon temperature If the silicon junction gets too hot (say 125 °C), then the activity is reduced (e.g., reduce clock rate or use clock gating) Use of Thermal Feedback L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 9 Power Supply Resonance Lpackage Lboard Rgrid On-die decap Board decap Switching currents Can write a Virus to Activate Power Supply Resonance! Image removed due to copyright restrictions. Image removed due to copyright restrictions. Image removed due to copyright restrictions. L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 10 Number Representation: Two’s Complement vs. Sign Magnitude Two’s complement Sign-Magnitude -7 -6 1111 +0 1110 -5 +1 0000 0001 +2 1101 -4 -3 -2 0010 1100 0011 +3 1011 0100 +4 1010 0101 1001 -1 +5 0110 1000 -0 0111 +6 +7 Consider a 16 bit bus where inputs toggles between +1 and –1 (i.e., a small noise input) Which representation is more energy efficient? L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 11 Time Sharing is a Bad Idea 2 Time Sharing Increases Switching Activity L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 12 Not just a 6-1 Issue: “Cool” Software ??? address MEMORY address CPU 16 a[0] a[1] a[2] a[3] 0111111100000000 0111111100000001 0111111100000010 0111111100000011 b[0] b[1] b[2] b[3] 1000000000000000 1000000000000001 1000000000000010 1000000000000011 float a [256], b[256]; float pi= 3.14; float a [256], b[256]; float pi= 3.14; for (i = 0; i < 255; i++) { a[i] = sin(pi * i /256); b[i] = cos(pi * i /256); } for (i = 0; i < 255; i++) {a[i] = sin(pi * i /256);} for (i = 0; i < 255; i++) {b[i] = cos(pi * i /256);} 512(8)+2+4+8+16+32+64+128+256 2(8)+2(2+4+8+16+32+64+128+256) = 4607 bit transitions = 1030 transitions L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 13 Glitching Transitions Chain Topology A Tree Topology B + A C + B C + + D D + + (A+B) + (C+D) (((A+B) + C)+D) Balancing paths reduces glitching transitions Structures such as multipliers have lot of glitching transitions Keeping logic depths short (e.g., pipelining) reduces glitching L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 14 Reduce Supply Voltage : But is it Free? t =0+ VDD VDD IN G VS + V DD - K 2 −V ) (V 2 DD T D CL S OUT Delay = C ⋅ ΔV L i D C ⋅ L = V DD 2 k − V )2 (V T 2 DD ∝ VDD (VDD − VT ) 2 ≈ 1 VDD VDD from 2V to 1V, energy ↓ by x4, delay ↑ x2 L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 15 Transistors Are Free… (What do you do with a Billion Transistors?) f =1GHz VDD=2V f = 500Mhz IN VDD=1V IN IN f = 500Mhz VDD=1V X X X OUT SELECT OUT Pserial = Cmult 22 f Pparallel = (2Cmult 12 f /2) = Pserial/4 Trade Area for Low Power L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 16 Algorithmic Workload Image by MIT OCW. Exploit Time Varying Algorithmic Workload To Vary the Power Supply Voltage L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 17 Dynamic Voltage Scaling (DVS) Variable Power Supply Fixed Power Supply IDLE ACTIVE ACTIVE EFIXED = ½ C VDD2 EVARIABLE = ½ C (VDD/2)2 = EFIXED / 4 Normalized Energy 1.0 0.8 0.6 Fixed Supply 0.4 Variable Supply 0.2 0 0 0.2 0.4 0.6 0.8 1.0 Normalized Workload [Gutnik97] L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 18 Energy per Operation DVS on a Processor Digitally adjustable DC-DC converter powers SA-1110 core 1 0.8 0.6 0.4 0.2 0 59.0 88.5 118.0 147.5 176.9 206.4 5 Frequency (MHz) 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 Core Voltage (V) 3.6V Controller Vout SA-1110 Figure by MIT OpenCourseWare. Adapted from R. Min, T. Furrer, and A. P. Chandrakasan. "Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks." Workshop on VLSI (April 2000): 43-46. Control μOS μOS selects appropriate clock frequency based on workload and latency constraints L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 19 Energy Efficiency of Software FPGA (Xilinx) Processor (StrongARM-1100) 0.25 CLB CLB C LB C LB Average Current (A) 0.2 0.15 0.1 0.05 0 ARM Instructions Figure by MIT OpenCourseWare. Adapted from A. Sinha, DAC. I/O 45 CLB 9% 40 5% Power (%) 35 30 Clock 25 Interconnect 21% 65% 20 15 10 5 0 Cache Cpntrol GCLK EBOX I/O,PLL Image by MIT OpenCourseWare. Adapted from Kusse 1998, UCB. Figure by MIT OpenCourseWare. Adapted from Montanaro 1996, JSSC. Software” “Software ” Energy Dissipation has Large Overhead L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 20 VDD VDD E = CVDD2 0 C Switching (computing) E = VDDI010 VT/S 1 C Leakage (standby) Total Energy/Switching Energy Trends: Leakage and Power Gating Duty Cycle (%) Low VT devices are leaky - Use a High VT device is used to gate leakage Sleep current L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 21 Trends: Energy Scavenging MEMS Generator Power Harvesting Shoes Image removed due to copyright restrictions. Courtesy of Joe Paradiso (MIT Media Lab). Used with permission. Vibration-to-Electric Conversion After 3-6 steps, it provides 3 mA for 0.5 sec ~10mW ~ 10μW L16: 6.111 Spring 2006 Introductory Digital Systems Laboratory 22