CPE 626 Advanced VLSI Design
Lecture 8: Power and Designing for Low Power

Aleksandar Milenkovic
Assistant Professor, Electrical and Computer Engineering Dept., University of Alabama in Huntsville
http://www.ece.uah.edu/~milenka
http://www.ece.uah.edu/~milenka/cpe626-04F/
milenka@ece.uah.edu

Why Power Matters
• Packaging costs
• Power supply rail design
• Chip and system cooling costs
• Noise immunity and system reliability
• Battery life (in portable systems)
• Environmental concerns
  - Office equipment accounted for 5% of total US commercial energy usage in 1993
  - Energy Star compliant systems

Why Worry About Power? Power Dissipation
• Lead microprocessors' power continues to increase.
[Figure: power (Watts, log scale from 0.1 to 100) versus year (1971 to 2000) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, and P6.]
• Power delivery and dissipation will be prohibitive.
Source: Borkar, De, Intel

Problem Illustration
[Figure only; no text recoverable.]

Why Worry About Power? Battery Size/Weight
[Figure: nominal battery capacity (W-hr/lb, 0 to 50) versus year ('65 to '95) for nickel-cadmium, Ni-metal hydride, and rechargeable lithium batteries; annotation: "Battery (40+ lbs)".]
• Expected battery lifetime increase over the next 5 years: 30 to 40%
From Rabaey, 1995

Why Worry About Power? Standby Power

Year                    2002  2005  2008  2011  2014
Power supply Vdd (V)    1.5   1.2   0.9   0.7   0.6
Threshold VT (V)        0.4   0.4   0.35  0.3   0.25

• Drain leakage will increase as VT decreases to maintain noise margins and meet frequency demands, leading to excessive, battery-draining standby power consumption.
[Figure: standby power as a percentage (0% to 50%) for 2000 to 2008, with data points labeled 12 W, 88 W, 400 W, 1.7 kW, and 8 kW; annotation: "…and phones leaky!"]
Source: Borkar, De, Intel

Power and Energy Figures of Merit
• Power consumption in Watts
  - determines battery life in hours
• Peak power
  - determines power and ground wiring designs
  - sets packaging limits
  - impacts signal noise margin and reliability analysis
• Energy efficiency in Joules
  - power integrated over the time the computation takes
• Energy = power * delay
  - Joules = Watts * seconds
  - a lower energy number means less power to perform a computation at the same frequency

Power versus Energy
• Power is the height of the curve
  - a lower power design could simply be slower
• Energy is the area under the curve
  - the two approaches require the same energy
[Figure: two plots of power (Watts) versus time, each comparing Approach 1 and Approach 2.]

PDP and EDP
• Power-delay product:
$$\mathrm{PDP} = P_{av}\, t_p = \frac{C_L V_{DD}^2}{2}$$
  - PDP is the average energy consumed per switching event (Watts * sec = Joule)
  - a lower power design could simply be a slower design
• Energy-delay product:
$$\mathrm{EDP} = \mathrm{PDP} \cdot t_p = P_{av}\, t_p^2$$
  - EDP is the average energy consumed multiplied by the computation time required
  - takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
  - allows one to understand tradeoffs better
[Figure: normalized energy, delay, and energy-delay (0 to 15) versus Vdd (0.5 to 2.5 V).]

Understanding Tradeoffs
• Which design is the "best" (fastest, coolest, both)?
[Figure: designs a, b, c, and d plotted on axes of Energy and 1/Delay, each marked with its "better" direction; a second version of the slide adds the annotation "Lower EDP".]

CMOS Energy & Power Equations
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
• Terms of P, left to right: dynamic power, short-circuit power, leakage power
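As a rough numerical illustration of the power equation above, the sketch below evaluates its three terms separately. The function name and every operating-point value are hypothetical, chosen only to show the arithmetic; they are not taken from the lecture.

```python
def power_components(c_load, v_dd, p_0to1, f_clock, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    Implements P = CL*VDD^2*f0->1 + tsc*VDD*Ipeak*f0->1 + VDD*Ileakage,
    with f0->1 = P0->1 * fclock.
    """
    f_0to1 = p_0to1 * f_clock
    dynamic = c_load * v_dd**2 * f_0to1
    short_circuit = t_sc * v_dd * i_peak * f_0to1
    leakage = v_dd * i_leak
    return dynamic, short_circuit, leakage


# Hypothetical operating point (illustrative values only, not from the slides).
dyn, sc, leak = power_components(
    c_load=10e-15,   # 10 fF load
    v_dd=1.8,        # 1.8 V supply
    p_0to1=0.25,     # activity factor
    f_clock=200e6,   # 200 MHz clock
    t_sc=50e-12,     # 50 ps short-circuit window
    i_peak=100e-6,   # 100 uA peak short-circuit current
    i_leak=1e-9,     # 1 nA leakage current
)
total = dyn + sc + leak
for name, value in (("dynamic", dyn), ("short-circuit", sc), ("leakage", leak)):
    print(f"{name:13s} = {value * 1e6:8.4f} uW ({100 * value / total:5.1f}%)")
```

Keeping the three terms separate makes it easy to see which one dominates as VDD, activity, or leakage current is varied.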
Dynamic Power Consumption
[Figure: CMOS inverter with supply Vdd, input Vin, output Vout, and load capacitance CL.]
$$\text{Energy/transition} = C_L V_{DD}^2 P_{0\to1}$$
$$P_{dyn} = \text{Energy/transition} \cdot f = C_L V_{DD}^2 P_{0\to1} f$$
$$P_{dyn} = C_{EFF} V_{DD}^2 f, \quad \text{where } C_{EFF} = P_{0\to1} C_L$$
• Not a function of transistor sizes!
• Data dependent: a function of switching activity!

Pop Quiz
• Consider a 0.25 micron chip, 500 MHz clock, average load cap of 15 fF/gate (fanout of 4), 2.5 V supply.
  - Dynamic power consumption per gate is ??
• With 1 million gates (assuming each transitions every clock)
  - Dynamic power of entire chip = ??

Lowering Dynamic Power
$$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$$
• Capacitance: function of fan-out, wire length, transistor sizes
• Supply voltage: has been dropping with successive generations
• Activity factor: how often, on average, do wires switch?
• Clock frequency: increasing…

Short Circuit Power Consumption
[Figure: CMOS inverter with input Vin, output Vout, load CL, and short-circuit current Isc.]
• The finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching, when both the NMOS and PMOS transistors are conducting.

Short Circuit Current Determinants
$$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$$
$$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$$
• Duration and slope of the input signal, tsc
• Ipeak is determined by
  - the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
  - the ratio between input and output slopes (a strong function of it), which is in turn a function of CL

Impact of CL on Psc
• Large capacitive load: Isc ≈ 0
  - Output fall time significantly larger than input rise time.
• Small capacitive load: Isc ≈ Imax
  - Output fall time substantially smaller than the input rise time.
[Figure: two inverters driving CL, one for each case.]

Ipeak as a Function of CL
• When load capacitance is small, Ipeak is large.
[Figure: Ipeak (A, ×10^-4) versus time (sec, ×10^-10) for CL = 20 fF, 100 fF, and 500 fF, with a 500 psec input slope.]
• Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals (slope engineering).

Psc as a Function of Rise/Fall Times
• When load capacitance is small (tsin/tsout > 2 for VDD > 2 V), the power is dominated by Psc
• If VDD < VTn + |VTp|, then Psc is eliminated since both devices are never on at the same time.
[Figure: normalized P (0 to 8) versus tsin/tsout for VDD = 3.3 V, 2.5 V, and 1.5 V; W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF; normalized with respect to zero input rise-time dissipation.]

Leakage (Static) Power Consumption
[Figure: gate with supply VDD, output Vout, and leakage current Ileakage; labeled components: drain junction leakage, gate leakage, sub-threshold current.]
• Sub-threshold current is the dominant factor.
• All increase exponentially with temperature!

Leakage as a Function of VT
• Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
• With a 90 mV/decade VT roll-off, each 255 mV increase in VT gives close to 3 orders of magnitude reduction in leakage (but adversely affects performance).
[Figure: ID (A, log scale from 10^-12 to 10^-2) versus VGS (0 to 1 V) for VT = 0.4 V and VT = 0.1 V.]

TSMC Processes Leakage and VT

                      CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                   1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)       42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
Lgate                 0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)   600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage, pA/µm) 20        1.60      0.15       300       1,800     13,000
VTn                   0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)       30        22        14         43        52        80

From MPR, 2000
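The 90 mV/decade rule of thumb from the "Leakage as a Function of VT" slide can be turned into a quick estimate. This is a minimal sketch that assumes a constant subthreshold slope over the whole VT range; the function name `leakage_ratio` is made up for illustration.

```python
def leakage_ratio(delta_vt, slope=0.090):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt volts.

    Assumes a constant subthreshold slope (volts per decade of current),
    so the ratio is simply 10 ** (delta_vt / slope).
    """
    return 10 ** (delta_vt / slope)


for delta_vt in (0.090, 0.255, 0.300):
    print(f"VT + {delta_vt * 1000:.0f} mV -> leakage / {leakage_ratio(delta_vt):.0f}")
```

Under this assumption a 255 mV increase corresponds to a little under three decades, and the 0.3 V gap between the two plotted curves (VT = 0.1 V and 0.4 V) to a little over three.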
Exponential Increase in Leakage Currents
[Figure: Ileakage (nA/µm, log scale from 1 to 10000) versus temperature (30 to 110 °C) for curves labeled 0.25, 0.18, 0.13, and 0.1.]
From De, 1999

Review: Energy & Power Equations
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
• Dynamic power (~90% today and decreasing relatively)
• Short-circuit power (~8% today and decreasing absolutely)
• Leakage power (~2% today and increasing)

Power and Energy Design Space

          Constant Throughput/Latency                    Variable Throughput/Latency
Energy    Design Time             Non-active Modules     Run Time
Active    Logic Design            Clock Gating           DFS, DVS
          Reduced Vdd                                    (Dynamic Freq, Voltage Scaling)
          Sizing
          Multi-Vdd
Leakage   + Multi-VT              Sleep Transistors      + Variable VT
                                  Multi-Vdd
                                  Variable VT

Dynamic Power as a Function of Device Size
• Device sizing affects dynamic energy consumption
  - gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1)
• The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F's
  - e.g., for F = 20, fopt(energy) = 3.53 while fopt(performance) = 4.47
• If energy is a concern, avoid oversizing beyond the optimal
[Figure: normalized energy (0 to 1.5) versus sizing factor f (1 to 7) for F = 1, 2, 5, 10, and 20.]
From Nikolic, UCB

Dynamic Power Consumption is Data Dependent
• Switching activity, P0→1, has two components
  - A static component: function of the logic topology
  - A dynamic component: function of the timing behavior (glitching)
• Static transition probability
$$P_{0\to1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$$

2-input NOR gate
A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

• With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2, the NOR static transition probability = 3/4 × 1/4 = 3/16
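The NOR calculation above can be checked with a short script. The sketch assumes the two inputs are statistically independent; the helper names `p_out_one` and `p_0to1` are illustrative, not part of the lecture. The same product rule applies to any two-input gate once the probability that its output is 1 is known.

```python
from fractions import Fraction


def p_out_one(gate, pa, pb):
    """Probability that a 2-input gate outputs 1, for independent inputs
    with P(A=1) = pa and P(B=1) = pb."""
    if gate == "AND":
        return pa * pb
    if gate == "NAND":
        return 1 - pa * pb
    if gate == "OR":
        return 1 - (1 - pa) * (1 - pb)
    if gate == "NOR":
        return (1 - pa) * (1 - pb)
    if gate == "XOR":
        return pa + pb - 2 * pa * pb
    raise ValueError(f"unknown gate: {gate}")


def p_0to1(gate, pa, pb):
    """Static transition probability P0->1 = P(out=0) * P(out=1)."""
    p1 = p_out_one(gate, pa, pb)
    return (1 - p1) * p1


half = Fraction(1, 2)
for gate in ("NOR", "OR", "NAND", "AND", "XOR"):
    print(gate, p_0to1(gate, half, half))
```

Using `Fraction` keeps the results exact, so the NOR case prints as 3/16 rather than a rounded decimal.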
NOR Gate Transition Probabilities
• Switching activity is a strong function of the input signal statistics
  - PA and PB are the probabilities that inputs A and B are one
[Figure: 2-input NOR gate with inputs A and B and load CL; plot of P0→1 over PA and PB, each from 0 to 1.]
$$P_{0\to1} = P_0 \times P_1 = \bigl(1-(1-P_A)(1-P_B)\bigr)\,(1-P_A)(1-P_B)$$

Transition Probabilities for Some Basic Gates

Gate   P0→1 = Pout=0 × Pout=1
NOR    (1 - (1 - PA)(1 - PB)) × (1 - PA)(1 - PB)
OR     (1 - PA)(1 - PB) × (1 - (1 - PA)(1 - PB))
NAND   PA PB × (1 - PA PB)
AND    (1 - PA PB) × PA PB
XOR    (1 - (PA + PB - 2 PA PB)) × (PA + PB - 2 PA PB)

[Figure: input A (0.5) drives node X; X and input B (0.5) drive output Z.]
• For X: P0→1 = P0 × P1 = (1 - PA) PA = 0.5 × 0.5 = 0.25
• For Z: P0→1 = P0 × P1 = (1 - PX PB) PX PB = (1 - (0.5 × 0.5)) × (0.5 × 0.5) = 3/16

Inter-signal Correlations
• Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
  - reconvergent fan-out
[Figure: inputs A (0.5) and B (0.5) drive node X; X and B reconverge at the gate driving Z.]
• At X: (1 - 0.5)(1 - 0.5) × (1 - (1 - 0.5)(1 - 0.5)) = 3/16
• Have to use conditional probabilities at Z:
  - P(Z=1) = P(B=1) × P(X=1 | B=1) = 0.5 × 1 = 0.5
  - P(Z=0) = 1 - P(B=1) × P(X=1 | B=1) = 0.5
  - P(0→1) = 0.5 × 0.5 = 0.25
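One way to see why conditional probabilities are needed is to enumerate the primary inputs directly, which handles the correlation automatically. The gate types are not recoverable from the slide text, so this sketch assumes X = A OR B and Z = X AND B, which is consistent with the 3/16 at X and with P(X=1 | B=1) = 1; the function names are illustrative.

```python
from itertools import product


def prob_z_one_exact(p_a=0.5, p_b=0.5):
    """P(Z=1) by enumerating the primary inputs A and B (assumed independent).

    Assumed circuit: X = A OR B, Z = X AND B (B reconverges at Z).
    """
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        x = a | b
        z = x & b
        total += weight * z
    return total


def prob_z_one_naive(p_a=0.5, p_b=0.5):
    """P(Z=1) if X and B are (wrongly) treated as independent."""
    p_x = 1 - (1 - p_a) * (1 - p_b)
    return p_x * p_b


exact = prob_z_one_exact()
naive = prob_z_one_naive()
print("exact P(Z=1) =", exact, " P0->1 =", (1 - exact) * exact)
print("naive P(Z=1) =", naive, " P0->1 =", (1 - naive) * naive)
```

Enumeration grows exponentially with the number of primary inputs, so it only suits small cones of logic, but it makes the gap between the correlated and the naive estimate concrete.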
Logic Restructuring
• Logic restructuring: changing the topology of a logic network to reduce transitions
• AND: P0→1 = P0 × P1 = (1 - PA PB) × PA PB
• Chain implementation (inputs A, B, C, D, each with signal probability 0.5):
  - W = A·B: (1 - 0.25) × 0.25 = 3/16
  - X = W·C: 7/64
  - F = X·D: 15/256
• Tree implementation (same inputs):
  - Y = A·B: 3/16
  - Z = C·D: 3/16
  - F = Y·Z: 15/256
• Chain implementation has a lower overall switching activity than the tree implementation for random inputs
• Ignores glitching effects

Input Ordering
• Ordering 1: A (0.5) and B (0.2) drive X; X and C (0.1) drive F
  - At X: (1 - 0.5 × 0.2) × (0.5 × 0.2) = 0.09
• Ordering 2: B (0.2) and C (0.1) drive X; X and A (0.5) drive F
  - At X: (1 - 0.2 × 0.1) × (0.2 × 0.1) = 0.0196
• Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)

Glitching in Static CMOS Networks
• Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
  - glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: two-gate network in which A and B drive X, and X and C drive Z; waveforms of X and Z for ABC changing from 101 to 000 under a unit-delay model.]

Glitching in an RCA
[Figure: ripple-carry adder with carry input Cin and sum outputs S0 to S15; sum output voltage (V) versus time (ps) for Cin, S0, S1, S2, S3, S4, S5, S10, and S15.]
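The activity figures on the logic restructuring and input ordering slides can be reproduced with a few lines of arithmetic. The sketch below assumes independent inputs and, like the slides, ignores glitching; `and_chain_activity` is a hypothetical helper name.

```python
from fractions import Fraction


def and_chain_activity(input_probs):
    """P0->1 at each internal node of a chain of 2-input AND gates.

    input_probs[0] and input_probs[1] feed the first gate; each later
    entry is ANDed with the previous gate's output. Inputs are assumed
    independent, and glitching is ignored.
    """
    p_one = input_probs[0]
    activities = []
    for p in input_probs[1:]:
        p_one = p_one * p                      # P(node = 1) for an AND gate
        activities.append((1 - p_one) * p_one)
    return activities


half = Fraction(1, 2)

# Chain: W = A.B, X = W.C, F = X.D
chain = and_chain_activity([half] * 4)

# Tree: Y = A.B, Z = C.D, F = Y.Z
p_y = half * half
tree = [(1 - p_y) * p_y, (1 - p_y) * p_y, (1 - p_y * p_y) * (p_y * p_y)]

print("chain:", chain, "total", sum(chain))
print("tree: ", tree, "total", sum(tree))

# Input ordering: only the intermediate node X differs between orderings.
order1 = and_chain_activity([Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)])
order2 = and_chain_activity([Fraction(1, 5), Fraction(1, 10), Fraction(1, 2)])
print("ordering 1 (A, B first):", [float(a) for a in order1])
print("ordering 2 (B, C first):", [float(a) for a in order2])
```

In both orderings the output node F has the same activity, so the saving comes entirely from the intermediate node X.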
Balanced Delay Paths to Reduce Glitching
• Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
[Figure: two arrangements of gates F1, F2, and F3 with signal arrival times (0, 1, 2) annotated on their inputs.]
• So equalize the lengths of timing paths through logic

Power and Energy Design Space
[Same design-space table as shown earlier.]

Dynamic Power as a Function of VDD
• Decreasing the VDD decreases dynamic energy consumption (quadratically)
• But, increases gate delay (decreases performance)
[Figure: tp (normalized, 1 to 5.5) versus VDD (0.8 to 2.4 V).]
• Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).

Multiple VDD Considerations
• How many VDD? Two is becoming common
  - Many chips already have two supplies (one for core and one for I/O)
• When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
  - If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
  - The cross-coupled PMOS transistors do the level conversion
  - The NMOS transistors operate on a reduced supply
  [Figure: level converter with supplies VDDH and VDDL, input Vin, and output Vout.]
• Level converters are not needed for a step-down change in voltage
• Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flop (see Figure 11.47)
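Because dynamic energy scales with the square of the supply, the benefit of moving part of the switched capacitance to the lower supply is easy to estimate. The sketch below is a back-of-the-envelope model only: it ignores level-converter overhead and any change in activity, and the supply values and capacitance fraction are hypothetical.

```python
def dual_supply_energy_ratio(frac_cap_low, vddh, vddl):
    """Dynamic switching energy of a dual-supply design relative to running
    everything at vddh.

    frac_cap_low is the fraction of switched capacitance moved to vddl.
    Level-converter overhead is ignored (an assumption of this sketch).
    """
    return (1 - frac_cap_low) + frac_cap_low * (vddl / vddh) ** 2


# Hypothetical example: 60% of the switched capacitance runs at 1.8 V
# instead of 2.5 V.
ratio = dual_supply_energy_ratio(frac_cap_low=0.6, vddh=2.5, vddl=1.8)
print(f"relative dynamic energy: {ratio:.3f} (saving {100 * (1 - ratio):.1f}%)")
```

The model also shows why gates driving large capacitances are the most attractive candidates for the lower supply: they contribute most to `frac_cap_low`.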
Dual-Supply Inside a Logic Block
• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
• Clustered voltage-scaling
  - Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
  - Level conversion is done in the flip-flops at the end of the paths

Power and Energy Design Space
[Same design-space table as shown earlier.]

Stack Effect
• Leakage is a function of the circuit topology and the value of the inputs
$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$
  where VT0 is the threshold voltage at VSB = 0, VSB is the source-bulk (substrate) voltage, and γ is the body-effect coefficient

A  B  VX            ISUB
0  0  VT ln(1+n)    VGS = VBS = -VX
0  1  0             VGS = VBS = 0
1  0  VDD - VT      VGS = VBS = 0
1  1  0             VSG = VSB = 0

[Figure: gate with two stacked transistors driven by A and B, intermediate node VX, and output Out.]
• Leakage is least when A = B = 0
• Leakage reduction due to stacked transistors is called the stack effect

Short Channel Factors and Stack Effect
• In short-channel devices, the subthreshold leakage current depends on VGS, VBS, and VDS. The VT of a short-channel device decreases with increasing VDS due to DIBL (drain-induced barrier lowering).
  - Typical values for DIBL are a 20 to 150 mV change in VT per volt of change in VDS, so the stack effect is even more significant for short-channel devices.
  - VX reduces the drain-source voltage of the top nfet, increasing its VT and lowering its leakage
• For our 0.25 micron technology, VX settles to ~100 mV in steady state, so VBS = -100 mV and VDS = VDD - 100 mV. The resulting leakage is 20 times smaller than that of a device with VBS = 0 mV and VDS = VDD.
Leakage as a Function of Design Time VT
• Reducing the VT increases the sub-threshold leakage current (exponentially)
  - a 90 mV reduction in VT increases leakage by an order of magnitude
• But, reducing VT decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (0 to 1 V) for VT = 0.4 V and VT = 0.1 V.]
• Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
  - A careful assignment of VT's can reduce the leakage by as much as 80%

Dual-Thresholds Inside a Logic Block
• Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
• Use lower threshold on timing-critical paths
  - Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
  - No level converters are needed

Variable VT (ABB) at Run Time
$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$
• For an n-channel device, the substrate is normally tied to ground (VSB = 0)
• A negative bias on the substrate (bulk below the source, so VSB > 0 in the equation) causes VT to increase
• Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
  - Requires a dual well fab process
[Figure: VT (V, 0.4 to 0.9) versus substrate bias (V, -2.5 to 0); the axis is labeled VSB on the slide.]
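The body-effect equation can be evaluated directly to see how much VT moves with substrate bias. The parameter values below (VT0 = 0.43 V, γ = 0.4 V^0.5, φF = -0.3 V) are assumed, textbook-style numbers for a 0.25 micron NMOS device; the slides do not state them, and `threshold_voltage` is an illustrative name.

```python
import math


def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, phi_f=-0.3):
    """VT = VT0 + gamma * (sqrt(|-2*phiF + VSB|) - sqrt(|-2*phiF|)).

    Default parameters are assumptions for illustration, not values
    given in the lecture.
    """
    return vt0 + gamma * (
        math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f))
    )


# Sweep the source-bulk voltage from 0 V to 2.5 V, i.e. the bulk is
# pulled from 0 V down to -2.5 V relative to the source.
for v_sb in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"VSB = {v_sb:3.1f} V -> VT = {threshold_voltage(v_sb):.3f} V")
```

With these assumed parameters VT rises from 0.43 V at zero bias to roughly 0.82 V at VSB = 2.5 V, the same order of shift as the plotted curve.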