Advanced VLSI Design Why Power Matters Packaging costs Power supply rail design CPE 626 Advanced VLSI Design Lecture 8: Power and Designing for Low Power Chip and system cooling costs Noise immunity and system reliability Battery life (in portable systems) Environmental concerns Aleksandar Milenkovic Office equipment accounted for 5% of total US commercial energy usage in 1993 Energy Star compliant systems http://www.ece.uah.edu/~milenka http://www.ece.uah.edu/~milenka/cpe626-04F/ milenka@ece.uah.edu Assistant Professor Electrical and Computer Engineering Dept. University of Alabama in Huntsville A. Milenkovic Advanced VLSI Design 2 Advanced VLSI Design Why worry about power? – Power Dissipation Problem Illustration Lead Lead microprocessors microprocessors power power continues continues to to increase increase Power (Watts) 100 P6 Pentium ® 10 8086 286 1 4004 8008 486 386 8085 8080 0.1 Year 1971 1974 1978 1985 1992 2000 Power Power delivery delivery and and dissipation dissipation will will be be prohibitive prohibitive A. Milenkovic Source: Borkar, De Intel A. Milenkovic 3 Advanced VLSI Design Advanced VLSI Design Why worry about power ? – Battery Size/Weight Why worry about power? – Standby Power 50 2002 2005 2008 2011 Power supply V dd (V) Year 1.5 1.2 0.9 0.7 0.6 Threshold V T ( V ) 0.4 0.4 0.35 0.3 0.25 40 Ni-Metal Hydride § 30 20 •8KW •50% Nickel-Cadmium 10 •40% 0 65 70 75 Expected battery lifetime increase over the next 5 years: 30 to 40% 80 85 90 95 Year 2014 Drain leakage will increase as V T decreases to maintain noise margins and meet frequency demands, leading to excessive battery draining standby power consumption. • Standby Power Nominal Capacity (W -hr/lb) Rechargable Lithium Battery (40+ lbs) 4 •20% …and phones leaky! •1.7KW •30% •400W • 88W •12W •10% From Rabaey Rabaey,, 1995 •0% A. Milenkovic 5 •2000 •2002 •2004 •2006 •2008 A. Milenkovic Source: Borkar, De Intel 6 1 Advanced VLSI Design Advanced VLSI Design Power and Energy Figures of Merit Power versus Energy Power is height of curve Power consumption in Watts Watts determines battery life in hours Lower power design could simply be slower Approach 1 Peak power determines power ground wiring designs sets packaging limits impacts signal noise margin and reliability analysis Approach 2 Energy efficiency in Joules time Energy is area under curve Two approaches require the same energy Watts rate at which power is consumed over time Approach 1 Energy = power * delay Approach 2 Joules = Watts * seconds lower energy number means less power to perform a computation at the same frequency time A. Milenkovic A. Milenkovic 7 Advanced VLSI Design 8 Advanced VLSI Design PDP and EDP Understanding Tradeoffs Power-delay product (PDP ) = P av * tp = (CLV DD2)/2 Which design is the “best” (fastest, coolest, both) ? PDP is the average energyconsumed per switching event (Watts * sec = Joule) lower power design could simply be a slower design 15 Energy-Delay (normalized) • EDP is the average energy consumed multiplied by the computation time required 10 • takes into account that one can trade increased delay for lower energy/operation 5 (e.g., via supply voltage scaling that increases delay, but decreases energy 0 consumption) • allows one to understand tradeoffs 0.5better 1 A. Milenkovic energy-delay b Energy better • Energy-delay product (EDP) = PDP * tp = P av * t p2 c a d energy delay 1.5 2 1/Delay better 2.5 Vdd (V) A. Milenkovic 9 Advanced VLSI Design 10 Advanced VLSI Design Understanding Tradeoffs CMOS Energy & Power Equations Which design is the “best” (fastest, coolest, both) ? b Energy better E = CL V DD2 P0→1 + tsc V DD Ipeak P0→1 + V DD Ileakage Lower EDP c f0→1 = P0→1 * fclock a P = CL V DD2 f 0→1 + tscV DD Ipeak f 0→1 + V DD Ileakage d Dynamic power 1/Delay better A. Milenkovic 11 Short -circuit power A. Milenkovic Leakage power 12 2 Advanced VLSI Design Advanced VLSI Design Dynamic Power Consumption Vdd Vin Pop Quiz Consider a 0.25 micron chip, 500 MHz clock, average load cap of 15fF/gate (fanout of 4), 2.5V supply. Vout Dynamic Power consumption per gate is ?? CL Energy/transition = CL * VDD2 * P0→1 f0→1 P dyn = Energy/transition * f = CL * V DD2 * P 0→1 * f P dyn = CEFF * VDD2 * f where CEFF = P0→1 CL With 1 million gates (assuming each transitions every clock) Not a function of transistor sizes! Data dependent - a function of switching activity! A. Milenkovic Dynamic Power of entire chip = ??. A. Milenkovic 13 Advanced VLSI Design Advanced VLSI Design Short Circuit Power Consumption Lowering Dynamic Power Capacitance: Function of fan -out, wire length, transistor sizes 14 Supply Voltage: Has been dropping with successive generations Vin Isc Vout CL P dyn = CL V DD2 P 0→1 f Activity factor: How often, on average, do wires switch? Finite slope of the input signal causes a direct current path between V DD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting. Clock frequency: Increasing… A. Milenkovic A. Milenkovic 15 Advanced VLSI Design 16 Advanced VLSI Design Impact of CL on Psc Short Circuit Currents Determinates Esc = tsc VDD Ipeak P0→1 Isc ≈ 0 Psc = tsc VDD Ipeak f0→1 Vin Isc ≈ I max Vout Vin CL Vout CL Duration and slope of the input signal, tsc Ipeak determined by the saturation current of the P and N transistors which depend on their sizes, process technology, temperature, etc. strong function of the ratio between input and output slopes Large capacitive load Small capacitive load Output fall time significantly larger than input rise time. Output fall time substantially smaller than the input rise time. • a function of CL A. Milenkovic 17 A. Milenkovic 18 3 Advanced VLSI Design Advanced VLSI Design Ipeak as a Function of CL Psc as a Function of Rise/Fall Times 8 When load capacitance is small, Ipeak is large. CL = 20 fF 2 Ipeak (A) 1.5 CL = 100 fF 1 0.5 CL = 500 fF 0 0 2 -0.5 4 time (sec) Short circuit dissipation is minimized by matching the rise/fall times of the input and x 610-10 output signals - slope engineering. VDD = 3.3 V 6 5 4 VDD = 2.5 V 3 If VDD < VTn + |VT p| then P sc is eliminated since both devices are never on at the same time. 2 VDD = 1.5V 1 0 tsin/tsout 0 2 4 W/Lp = 1.125 µ m/0.25 µ m normalized wrt zero input W/Ln = 0.375 µ m/0.25 µ m rise-time dissipation CL = 30 fF 500 psec input slope A. Milenkovic When load capacitance is small (tsin/tsout > 2 for V DD > 2V) the power is dominated by Psc 7 P normalized x 10-4 2.5 A. Milenkovic 19 Advanced VLSI Design Advanced VLSI Design Leakage (Static) Power Consumption VDD Ileakage Leakage as a Function of VT q Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominate component of power dissipation. 10-2 Vout q ID (A) Drain junction leakage Sub-threshold current Gate leakage 20 10-7 VT=0.4V VT=0.1V Sub-threshold current is the dominant factor. An 90mV/decade VT roll-off - so each 255mV increase in V T gives 3 orders of magnitude reduction in leakage (but adversely affects performance) 10-12 All increase exponentially with temperature! 0 0.2 0.4 0.6 0.8 1 VGS (V) A. Milenkovic A. Milenkovic 21 Advanced VLSI Design Advanced VLSI Design TSMC Processes Leakage and VT CL018 LP CL018 ULP V dd 1.8 V 1.8 V T ox (effective) 42 Å 42 Å Lgate 0.16 µ m IDSat (n/p) (µ A/µ m) 600/260 Ioff (leakage) (ρA/µ m) V Tn FET Perf. (GHz) Exponential Increase in Leakage Currents 10000 CL018 HS CL015 HS CL013 HS 1.8 V 2V 1.5 V 1.2 V 42 Å 42 Å 29 Å 24 Å 0.16 µ m 0.18 µ m 0.13 µ m 0.11 µ m 0.08 µ m 500/180 320/130 780/360 860/370 920/400 20 1.60 0.15 300 1,800 13,000 0.42 V 0.63 V 0.73 V 0.40 V 0.29 V 0.25 V 30 22 14 43 52 80 1000 Ileakage(nA/µm) CL018 G 0.25 0.18 0.13 0.1 100 10 1 30 40 50 60 70 80 Temp(C) From MPR, 2000 A. Milenkovic 22 23 90 100 110 From De,1999 A. Milenkovic 24 4 Advanced VLSI Design Advanced VLSI Design Review: Energy & Power Equations Power and Energy Design Space E = CL V DD2 P0→1 + tsc V DD Ipeak P0→1 + V DD Ileakage Constant Throughput/Latency f0→1 = P0→1 * fclock P = CL V DD2 f 0→1 + tscV DD Ipeak f 0→1 + V DD Ileakage Dynamic power (~90% today and decreasing relatively) Short -circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing) A. Milenkovic Active Clock Gating Leakage + Multi-VT Sleep Transistors Multi-Vd d Variable VT + Variable VT A. Milenkovic F=1 F=2 1 2-input NOR Gate F=5 0.5 F=10 F=20 0 1 2 3 4 f 5 6 7 A B Out 0 0 1 0 1 0 1 0 0 1 1 0 From Nikolic, UCB A. Milenkovic Advanced VLSI Design NOR OR NAND AND XOR A B 0 PA 1 0 NOR static transition probability = 3/4 x 1/4 = 3/16 28 P 0→1 = Pout=0 x Pout=1 (1 - (1 - P A )(1 - P B )) x (1 - P A )(1 - PB ) (1 - P A )(1 - PB ) x (1 - (1 - P A )(1 - P B )) P A PB x (1 - P AP B ) (1 - P A PB ) x P A PB (1 - (P A + P B - 2P AP B )) x (PA + P B - 2P AP B ) 0.5 A 1 PB X Z 0.5 B For X: P 0→1 = P 0→1 = P0 x P 1 = (1-(1-P A )(1-P B )) (1-PA )(1-P B ) A. Milenkovic With input signal probabilities PA=1 = 1/2 PB=1 = 1/2 Advanced VLSI Design PA and P B are the probabilities that inputs A and B are one CL = P0 x (1-P0) Transition Probabilities for Some Basic Gates Switching activity is a strong function of the input signal statistics B Static transition probability P 0→1 = Pout=0 x Pout=1 A. Milenkovic 27 NOR Gate Transition Probabilities A 26 A static component – function of the logic topology A dynamic component – function of the timing behavior ( glitching ) 1.5 normalized energy Run Time DFS, DVS (Dynamic Freq, Voltage Scaling) Switching activity, P 0→1, has two components gain is largest for networks with large overall effective fan-outs (F = C L/Cg,1 ) If energy is a concern avoid oversizing beyond the optimal Non-active Modules Advanced VLSI Design Device sizing affects dynamic energy consumption e.g., for F=20, fopt(energy) = 3.53 while fopt(performance) = 4.47 Design Time Logic Design Reduced Vdd Sizing Multi-Vd d Dynamic Power Consumption is Data Dependent Dynamic Power as a Function of Device Size The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F’s Energy 25 Advanced VLSI Design Variable Throughput/Latency 29 For Z: P0→1 = A. Milenkovic 30 5 Advanced VLSI Design Advanced VLSI Design Transition Probabilities for Some Basic Gates Inter-signal Correlations P 0→1 = Pout=0 x Pout=1 (1 - (1 - P A )(1 - P B )) x (1 - P A )(1 - PB ) (1 - P A )(1 - PB ) x (1 - (1 - P A )(1 - P B )) P A PB x (1 - P AP B ) (1 - P A PB ) x P A PB (1 - (P A + P B - 2P AP B )) x (PA + P B - 2P AP B ) NOR OR NAND AND XOR Determining switching activity is complicated by the fact that signals exhibit correlation in space and time reconvergent fan- out A X 0.5 B 0.5 0.5 A Reconvergent fan-out Z 0.5 B P(Z=1) = P(B=1) & P(A=1 | B=1) For X: P 0→1 = P 0 x P1 = (1-PA ) PA = 0.5 x 0.5 = 0.25 For Z: P0→1 = P0 x P1 = (1 -P X PB ) PX PB = (1 – (0.5 x 0.5)) x (0.5 x 0.5) = 3/16 A. Milenkovic Have to use conditional probabilities A. Milenkovic 31 Advanced VLSI Design Logic Restructuring Determining switching activity is complicated by the fact that signals exhibit correlation in space and time Logic restructuring: changing the topology of a logic network to reduce transitions AND: P0→1 = P0 x P1 = (1 - P AP B ) x PAP B reconvergent fan- out (1-0.5)(1-0.5)x(1-(1-0.5)(1- 0.5)) = 3/16 A 0.5 B 0.5 A B 0.5 X Z (1-0.25)*0.25 = 3/16 W 7/64 X 15/256 C F 0.5 D 0.5 P(Z=1) = P(B=1) * P(X=1 | B=1) = 0.5 * 1 = 0.5 P(Z=0) = 1 – P(B=1)*P(X=1 | B=1) = 0.5 P(0->1) = 0.5*0.5 = 0.25 Reconvergent Have to use conditional probabilities A. Milenkovic 0.5 B 0.5 C 0.5 D 15/256 F Z 3/16 A. Milenkovic 33 Advanced VLSI Design 34 Advanced VLSI Design Input Ordering Input Ordering (1-0.5x0.2)x(0.5x0.2)=0.09 0.2 B X C 0.1 3/16 Y 0.5 A Chain implementation has a lower overall switching activity than the tree implementation for random inputs Ignores glitching effects P(Z=1) = P(B=1) & P(A=1 | B=1) 0.5 A B 0.2 32 Advanced VLSI Design Inter-signal Correlations 0.5 Z X C 0.1 F 0.5 A B 0.2 X A 0.5 F C 0.1 F 0.2 B C 0.1 (1-0.2x0.1)x(0.2x0.1)=0.0196 X A 0.5 F Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5) Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5) A. Milenkovic X 35 A. Milenkovic 36 6 Advanced VLSI Design Advanced VLSI Design Glitching in Static CMOS Networks Glitching in Static CMOS Networks Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards) Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards) glitch: node exhibits multiple transitions in a single cycle be fore settling to the correct logic value A B X 101 A B Z C ABC glitch: node exhibits multiple transitions in a single cycle be fore settling to the correct logic value 000 ABC X X Z Z 101 Unit Delay A. Milenkovic Balanced Delay Paths to Reduce Glitching Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs Cin S14 S2 0 0 S0 S1 3 S Output Voltage (V) 38 Advanced VLSI Design Glitching in an RCA 0 F1 0 0 1 S4 S2 F1 1 F2 2 0 S3 Cin 000 37 Advanced VLSI Design 2 Z Unit Delay A. Milenkovic S15 X C F3 0 0 F3 S15 F2 1 S5 1 S10 S1 So equalize the lengths of timing paths through logic S0 0 2 4 6 8 10 12 Time (ps) A. Milenkovic Advanced VLSI Design Energy Design Time Active Leakage Dynamic Power as a Function of VDD 5.5 Decreasing the VDD decreases dynamic energy consumption (quadratically) But, increases gate delay (decreases performance) Variable Throughput/Latency Non-active Modules Run Time Logic Design Reduced Vdd Sizing Multi-Vd d Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) + Multi-VT Sleep Transistors Multi-Vd d Variable VT 5 4.5 4 3.5 3 2.5 2 1.5 1 0.8 A. Milenkovic 40 Advanced VLSI Design Power and Energy Design Space Constant Throughput/Latency A. Milenkovic 39 tp(normalized) 0 1 1.2 1.4 1.6 1.8 2 2.2 2.4 V DD (V) + Variable VT Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits). 41 A. Milenkovic 42 7 Advanced VLSI Design Advanced VLSI Design Multiple VDD Considerations Dual-Supply Inside a Logic Block How many VDD? – Two is becoming common Minimum energy consumption is achieved if all logic paths are critical (have the same delay) Clustered voltage-scaling Many chips already have two supplies (one for core and one for I/O) When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up) Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slackis available Level conversion is done in the flipflops at the end of the paths If a gate supplied with VDDL drives a gate at VDDH , Vthe PMOS never DDH turns off • The cross-coupled PMOS transistors do the level conversion • The NMOS transistor operate on a Vin reduced supply VDDL Vout Level converters are not needed for a step-down change in voltage Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47) A. Milenkovic A. Milenkovic 43 Advanced VLSI Design Advanced VLSI Design Power and Energy Design Space Stack Effect Energy Design Time Non- active Modules Run Time Active Logic Design Reduced Vdd Sizing Multi-Vdd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) + Multi-VT Sleep Transistors Multi-Vdd Variable V T Leakage Leakage is a function of the circuit topology and the value of the inputs Variable Throughput/Latency V T = VT 0 + γ(√|-2φF + VSB | - √|-2φF|) where VT0 is the threshold voltage at V SB = 0 ; VS B is the source- bulk (substrate) voltage; γ is the body-effect coefficient A B A A VX B A. Milenkovic V T ln(1+n) V GS=VBS = -V X 0 1 0 V GS=VBS =0 1 0 V DD -V T V GS=VBS =0 1 1 0 V SG=VSB =0 Leakage is least when A = B = 0 Leakage reduction due to stacked transistors is called the stack effect A. Milenkovic 45 Advanced VLSI Design ISUB 0 B Out + Variable VT VX 0 46 Advanced VLSI Design Short Channel Factors and Stack Effect Leakage as a Function of Design Time VT Reducing the VT increases the sub-threshold leakage current (exponentially) In short-channel devices, the subthreshold leakage current depends on VGS ,VBS and VDS. The VT of a short-channel device decreases with increasing VDS due to DIBL (drain-induced barrier loading). 90mV reduction in V T increases leakage by an order of magnitude Typical values for DIBL are 20 to 150mV change in VT per voltage change in VD S so the stack effect is even more significant for short-channel devices. VX reduces the drain-source voltage of the top nfet, increasing its V T and lowering its leakage ID (A) Constant Throughput/Latency 44 But, reducing VT decreases gate delay (increases performance) VT=0.4V VT=0.1V 0 For our 0.25 micron technology, VX settles to ~100mV in steady state so VBS = -100mV and VDS = VDD -100mV which is 20 times smaller than the leakage of a device with V BS = 0mV and VDS = V DD 0.2 0.4 0.6 0.8 1 VGS (V) Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high V T on the other logic for leakage control. A careful assignment of VT’s can reduce the leakage by as much as 80% A. Milenkovic 47 A. Milenkovic 48 8 Advanced VLSI Design Advanced VLSI Design Dual-Thresholds Inside a Logic Block Variable V T (ABB) at Run Time V T = V T0 + γ(√|- 2φF + V SB | - √|-2φF |) Minimum energy consumption is achieved if all logic paths are critical (have the same delay) Use lower threshold on timing-critical paths • For an n-channel device, the substrate is normally tied to ground (VSB = 0) Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed No level converters are needed • A negative bias on V SB causes VT to increase • Requires a dual well fab process VT (V) • Adjusting the substrate bias at run time is called adaptive body-biasing (ABB) 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 -2.5 A. Milenkovic 49 A. Milenkovic -2 -1.5 -1 VS B (V) -0.5 0 50 9