TKT-1527 Digital System Design Issues Designing for Low Power Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.] CSE477 L12&13 Low Power.1 Irwin&Vijay, PSU, 2002 Review: Designing Fast CMOS Gates Transistor sizing Progressive transistor sizing Transistor ordering fet closest to the output is smallest of series fets put latest arriving signal closest to the output Logic structure reordering replace large fan-in gates with smaller fan-in gate network Logical effort Buffer (inverter) insertion separate large fan-in from large CL with buffers uses buffers so there are no more than four TGs in series CSE477 L12&13 Low Power.2 Irwin&Vijay, PSU, 2002 Why Power Matters Packaging costs Power supply rail design Chip and system cooling costs Noise immunity and system reliability Battery life (in portable systems) Environmental concerns Office equipment accounted for 5% of total US commercial energy usage in 1993 Energy Star compliant systems CSE477 L12&13 Low Power.3 Irwin&Vijay, PSU, 2002 Why worry about power? -- Power Dissipation Lead microprocessors power continues to increase Power (Watts) 100 P6 Pentium ® 10 8086 286 1 8008 4004 486 386 8085 8080 0.1 1971 1974 1978 1985 1992 2000 Year Power delivery and dissipation will be prohibitive Source: Borkar, De Intel CSE477 L12&13 Low Power.4 Irwin&Vijay, PSU, 2002 Why worry about power? -- Chip Power Density Sun’s Surface Power Density (W/cm2) 10000 Rocket Nozzle 1000 …chips might become hot… Nuclear Reactor 100 8086 Hot Plate 10 4004 P6 8008 8085 Pentium® 386 286 486 8080 1 1970 1980 1990 Year 2000 2010 Source: Borkar, De Intel CSE477 L12&13 Low Power.5 Irwin&Vijay, PSU, 2002 Chip Power Density Distribution Al-SiC+ Epoxy Die Attach WillametteMap Power Distribution Power On-Die Temperature 110 250 100 100 200-250 90 150-200 80 100-150 50-100 70 0-50 60 Temperature (C) 150 Heat Flux (W/cm2) 200 50 50 40 0 Power density is not uniformly distributed across the chip Silicon is not a good heat conductor Max junction temperature is determined by hot-spots Impact on packaging, w.r.t. cooling CSE477 L12&13 Low Power.6 Irwin&Vijay, PSU, 2002 Power and Energy Figures of Merit Power consumption in Watts Peak power determines power ground wiring designs sets packaging limits impacts signal noise margin and reliability analysis Energy efficiency in Joules determines battery life in hours rate at which power is consumed over time Energy = power * delay Joules = Watts * seconds lower energy number means less power to perform a computation at the same frequency CSE477 L12&13 Low Power.7 Irwin&Vijay, PSU, 2002 Power versus Energy Power is height of curve Watts Lower power design could simply be slower Approach 1 Approach 2 Watts time Energy is area under curve Two approaches require the same energy Approach 1 Approach 2 time CSE477 L12&13 Low Power.8 Irwin&Vijay, PSU, 2002 PDP and EDP Power-delay product (PDP) = Pav * tp = (CLVDD2)/2 PDP is the average energy consumed per switching event (Watts * sec = Joule) lower power design could simply be a slower design Energy-delay product (EDP) = PDP * tp = Pav * tp2 EDP is the average energy consumed multiplied by the computation time required takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption) Energy-Delay (normalized) 15 energy-delay 10 energy 5 delay 0 0.5 allows one to understand tradeoffs better CSE477 L12&13 Low Power.9 1 1.5 2 2.5 Vdd (V) Irwin&Vijay, PSU, 2002 Understanding Tradeoffs Which design is the “best” (fastest, coolest, both) ? Lower EDP b c a d 1/Delay better CSE477 L12&13 Low Power.11 Irwin&Vijay, PSU, 2002 CMOS Energy & Power Equations E = CL VDD2 P01 + tsc VDD Ipeak P01 + VDD Ileakage f01 = P01 * fclock P = CL VDD2 f01 + tscVDD Ipeak f01 + VDD Ileakage Dynamic power CSE477 L12&13 Low Power.12 Short-circuit power Leakage power Irwin&Vijay, PSU, 2002 Dynamic Power Consumption Vdd Vin Vout CL Energy/transition = CL * VDD2 f01 * P01 Pdyn = Energy/transition * f = CL * VDD2 * P01 * f Pdyn = CEFF * VDD2 * f where CEFF = P01 CL Not a function of transistor sizes! Data dependent - a function of switching activity! CSE477 L12&13 Low Power.13 Irwin&Vijay, PSU, 2002 Lowering Dynamic Power Capacitance: Function of fan-out, wire length, transistor sizes Supply Voltage: Has been dropping with successive generations Pdyn = CL VDD2 P01 f Activity factor: How often, on average, do wires switch? CSE477 L12&13 Low Power.14 Clock frequency: Increasing… Irwin&Vijay, PSU, 2002 Short Circuit Power Consumption Vin Isc Vout CL Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting. CSE477 L12&13 Low Power.15 Irwin&Vijay, PSU, 2002 Short Circuit Currents Determinates Esc = tsc VDD Ipeak P01 Psc = tsc VDD Ipeak f01 Duration and slope of the input signal, tsc Ipeak determined by the saturation current of the P and N transistors which depend on their sizes, process technology, temperature, etc. strong function of the ratio between input and output slopes - a function of CL CSE477 L12&13 Low Power.16 Irwin&Vijay, PSU, 2002 Impact of CL on Psc Isc 0 Vin Isc Imax Vout CL Vin Vout CL Large capacitive load Small capacitive load Output fall time significantly larger than input rise time. Output fall time substantially smaller than the input rise time. CSE477 L12&13 Low Power.17 Irwin&Vijay, PSU, 2002 Ipeak as a Function of CL 2.5 x 10-4 CL = 20 fF 2 When load capacitance is small, Ipeak is large. 1.5 CL = 100 fF 1 0.5 0 0 -0.5 2 Short circuit dissipation is minimized by CL = 500 fF matching the rise/fall times of the input and 4 6 x 10-10 output signals - slope engineering. time (sec) 500 psec input slope CSE477 L12&13 Low Power.18 Irwin&Vijay, PSU, 2002 Psc as a Function of Rise/Fall Times 8 When load capacitance is small (tsin/tsout > 2 for VDD > 2V) the power is dominated by Psc 7 VDD= 3.3 V 6 5 4 VDD = 2.5 V 3 2 1 VDD = 1.5V 0 0 2 tsin/tsou If VDD < VTn + |VTp| then Psc is eliminated since both devices are never on at the same time. 4 t W/Lp = 1.125 m/0.25 m W/Ln = 0.375 m/0.25 m CL = 30 fF CSE477 L12&13 Low Power.19 normalized wrt zero input rise-time dissipation Irwin&Vijay, PSU, 2002 Leakage (Static) Power Consumption VDD Ileakage Vout Drain junction leakage Gate leakage Sub-threshold current Sub-threshold current is the dominant factor. All increase exponentially with temperature! CSE477 L12&13 Low Power.20 Irwin&Vijay, PSU, 2002 Leakage as a Function of VT Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominate component of power dissipation. 10-2 ID (A) 10-7 VT=0.4V VT=0.1V 10-12 0 0.2 0.4 0.6 0.8 An 90mV/decade VT roll-off - so each 255mV increase in VT gives 3 orders of magnitude reduction in leakage (but adversely affects performance) 1 VGS (V) CSE477 L12&13 Low Power.21 Irwin&Vijay, PSU, 2002 TSMC Processes Leakage and VT CL018 G CL018 LP CL018 ULP CL018 HS CL015 HS CL013 HS Vdd 1.8 V 1.8 V 1.8 V 2V 1.5 V 1.2 V Tox (effective) 42 Å 42 Å 42 Å 42 Å 29 Å 24 Å Lgate 0.16 m 0.16 m 0.18 m 0.13 m 0.11 m 0.08 m IDSat (n/p) (A/m) 600/260 500/180 320/130 780/360 860/370 920/400 20 1.60 0.15 300 1,800 13,000 0.42 V 0.63 V 0.73 V 0.40 V 0.29 V 0.25 V 30 22 14 43 52 80 Ioff (leakage) (A/m) VTn FET Perf. (GHz) From MPR, 2000 CSE477 L12&13 Low Power.22 Irwin&Vijay, PSU, 2002 Exponential Increase in Leakage Currents 10000 Ileakage(nA/m) 1000 0.25 0.18 0.13 0.1 100 10 1 30 40 50 60 70 80 Temp(C) 90 100 110 From De,1999 CSE477 L12&13 Low Power.23 Irwin&Vijay, PSU, 2002 Review: Energy & Power Equations E = CL VDD2 P01 + tsc VDD Ipeak P01 + VDD Ileakage f01 = P01 * fclock P = CL VDD2 f01 + tscVDD Ipeak f01 + VDD Ileakage Dynamic power (~90% today and decreasing relatively) CSE477 L12&13 Low Power.24 Short-circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing) Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Energy Design Time Variable Throughput/Latency Non-active Modules Logic Design Active Reduced Vdd Run Time DFS, DVS Clock Gating Sizing Multi-Vdd (Dynamic Freq, Voltage Scaling) Sleep Transistors Leakage + Multi-VT Multi-Vdd + Variable VT Variable VT CSE477 L12&13 Low Power.25 Irwin&Vijay, PSU, 2002 Dynamic Power as a Function of Device Size Device sizing affects dynamic energy consumption The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F’s gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1) e.g., for F=20, fopt(energy) = 3.53 while fopt(performance) = 4.47 If energy is a concern avoid oversizing beyond the optimal 1.5 F=1 normalized energy F=2 1 F=5 0.5 F=10 F=20 0 1 2 3 4 f 5 6 7 From Nikolic, UCB CSE477 L12&13 Low Power.26 Irwin&Vijay, PSU, 2002 Dynamic Power Consumption is Data Dependent Switching activity, P01, has two components A static component – function of the logic topology A dynamic component – function of the timing behavior (glitching) 2-input NOR Gate A B Out 0 0 1 0 1 0 1 0 0 1 1 0 CSE477 L12&13 Low Power.27 Static transition probability P01 = Pout=0 x Pout=1 = P0 x (1-P0) With input signal probabilities PA=1 = 1/2 PB=1 = 1/2 NOR static transition probability = 3/4 x 1/4 = 3/16 Irwin&Vijay, PSU, 2002 NOR Gate Transition Probabilities Switching activity is a strong function of the input signal statistics PA and PB are the probabilities that inputs A and B are one A B 0 A B CL PA 1 0 PB 1 P01 = P0 x P1 = (1-(1-PA)(1-PB)) (1-PA)(1-PB) CSE477 L12&13 Low Power.28 Irwin&Vijay, PSU, 2002 Transition Probabilities for Some Basic Gates NOR OR NAND AND XOR P01 = Pout=0 x Pout=1 (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB) (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB)) PAPB x (1 - PAPB) (1 - PAPB) x PAPB (1 - (PA + PB- 2PAPB)) x (PA + PB- 2PAPB) 0.5 A 0.5 B X Z For X: P01 = P0 x P1 = (1-PA) PA = 0.5 x 0.5 = 0.25 For Z: P01 = P0 x P1 = (1-PXPB) PXPB = (1 – (0.5 x 0.5)) x (0.5 x 0.5) = 3/16 CSE477 L12&13 Low Power.30 Irwin&Vijay, PSU, 2002 Inter-signal Correlations Determining switching activity is complicated by the fact that signals exhibit correlation in space and time reconvergent fan-out (1-0.5)(1-0.5)x(1-(1-0.5)(1-0.5)) = 3/16 0.5 A 0.5 B X Z Reconvergent (1- 0.75 x 0.5) x (0.75 x 0.5) = 0.234 Incorrect! P(Z=1) = P(B=1) & P(A=1 | B=1) Have to use conditional probabilities CSE477 L12&13 Low Power.32 Irwin&Vijay, PSU, 2002 Logic Restructuring Logic restructuring: changing the topology of a logic network to reduce transitions AND: P01 = P0 x P1 = (1 - PAPB) x PAPB 0.5 A B 0.5 (1-0.25)*0.25 = 3/16 7/64 W X 15/256 C F 0.5 D 0.5 0.5 A 0.5 B 0.5 C 0.5 D 3/16 Y 15/256 F Z 3/16 Chain implementation has a lower overall switching activity than the tree implementation for random inputs Ignores glitching effects CSE477 L12&13 Low Power.33 Irwin&Vijay, PSU, 2002 Input Ordering (1-0.5x0.2)x(0.5x0.2)=0.09 0.5 A B 0.2 X C 0.1 F 0.2 B C 0.1 (1-0.2x0.1)x(0.2x0.1)=0.0196 X A 0.5 F Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5) CSE477 L12&13 Low Power.35 Irwin&Vijay, PSU, 2002 Glitching in Static CMOS Networks Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards) glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value A B X Z C ABC 101 000 X Z Unit Delay CSE477 L12&13 Low Power.37 Irwin&Vijay, PSU, 2002 Glitching in an RCA Cin S14 S15 S0 S1 S2 S Output Voltage (V) 3 S3 2 S4 Cin S2 S15 S5 1 S10 S1 S0 0 0 CSE477 L12&13 Low Power.38 2 4 6 Time (ps) 8 10 12 Irwin&Vijay, PSU, 2002 Balanced Delay Paths to Reduce Glitching Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs 0 0 F1 0 0 0 0 1 F1 1 F2 2 F3 0 0 F3 F2 1 So equalize the lengths of timing paths through logic CSE477 L12&13 Low Power.39 Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Energy Design Time Variable Throughput/Latency Non-active Modules Logic Design Active Reduced Vdd Run Time DFS, DVS Clock Gating Sizing Multi-Vdd (Dynamic Freq, Voltage Scaling) Sleep Transistors Leakage + Multi-VT Multi-Vdd + Variable VT Variable VT CSE477 L12&13 Low Power.40 Irwin&Vijay, PSU, 2002 Dynamic Power as a Function of VDD Decreasing the VDD decreases dynamic energy consumption (quadratically) But, increases gate delay (decreases performance) 5.5 5 4.5 4 3.5 3 2.5 2 1.5 1 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 VDD (V) Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits). CSE477 L12&13 Low Power.41 Irwin&Vijay, PSU, 2002 Multiple VDD Considerations How many VDD? – Two is becoming common Many chips already have two supplies (one for core and one for I/O) When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up) If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off V - The cross-coupled PMOS transistors do the level conversion - The NMOS transistor operate on a reduced supply Vin DDH VDDL Vout Level converters are not needed for a step-down change in voltage Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47) CSE477 L12&13 Low Power.42 Irwin&Vijay, PSU, 2002 Dual-Supply Inside a Logic Block Minimum energy consumption is achieved if all logic paths are critical (have the same delay) Clustered voltage-scaling Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available Level conversion is done in the flipflops at the end of the paths CSE477 L12&13 Low Power.43 Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Energy Design Time Variable Throughput/Latency Non-active Modules Logic Design Active Reduced Vdd Run Time DFS, DVS Clock Gating Sizing Multi-Vdd (Dynamic Freq, Voltage Scaling) Sleep Transistors Leakage + Multi-VT Multi-Vdd + Variable VT Variable VT CSE477 L12&13 Low Power.44 Irwin&Vijay, PSU, 2002 Reducing the VT increases the subthreshold leakage current (exponentially) 90mV reduction in VT increases leakage by an order of magnitude But, reducing VT decreases gate delay (increases performance) ID (A) Leakage as a Function of Design Time VT VT=0.4V VT=0.1V 0 0.2 0.4 0.6 0.8 1 VGS (V) Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control. A careful assignment of VT’s can reduce the leakage by as much as 80% CSE477 L12&13 Low Power.45 Irwin&Vijay, PSU, 2002 Dual-Thresholds Inside a Logic Block Minimum energy consumption is achieved if all logic paths are critical (have the same delay) Use lower threshold on timing-critical paths Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed No level converters are needed CSE477 L12&13 Low Power.46 Irwin&Vijay, PSU, 2002 Variable VT (ABB) at Run Time VT = VT0 + (|-2F + VSB| - |-2F|) For an n-channel device, the substrate is normally tied to ground (VSB = 0) A negative bias on VSB causes VT to increase Adjusting the substrate bias at run time is called adaptive body-biasing (ABB) Requires a dual well fab process 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 -2.5 CSE477 L12&13 Low Power.47 -2 -1.5 -1 VSB (V) -0.5 0 Irwin&Vijay, PSU, 2002