Energy-Recovery CMOS Design
Jay Moon, Bill Athas*
Univ of Southern California
*Apple Computer, Inc.
jsmoon@usc.edu / athas@apple.com
March 05, 2001
UCLA EE215B

Outline
• Motivation
• Review of CMOS switching energetics
• Adiabatic charging
• Energy-Recovery CMOS
• Stepwise charging
• Clock-powered logic (CPL)
• Harmonic resonant charging
• Future Research

Motivation: high-performance & low-power computing
• It’s becoming increasingly difficult to get rid of the heat generated by VLSI chips
• Battery life for portables

Types of power dissipation
• Dynamic power dissipation
  – Charging and discharging capacitances
  – Short-circuit current
• Static power dissipation
  – Sub-threshold currents
  – Drain-junction leakage

Capacitor energy equations
• Suppose at time t, a charge q is transferred from one plate to the other
• The potential v is q/C
• For a charge transfer increment of dq, the additional work is:
  $dE = v\,dq = \frac{q}{C}\,dq$
• For the total charge transfer Q:
  $E = \int dE = \int_0^Q \frac{q}{C}\,dq = \frac{1}{2}\frac{Q^2}{C}$
  With $Q = CV$: $E = \frac{1}{2}CV^2$

CMOS switching energetics
• Interestingly (and thankfully) CMOS energetics can be analyzed and understood from the CMOS inverter.
• Charge is conserved
• Energy is conserved
• Neglect leakage current
• Neglect short-circuit current
[Figure: CMOS inverter with power supply (PS) at V and load capacitor C; $E_{PS} = VQ = CV^2$]

The charging event
[Figure: inverter charging C from the power supply; $E_{PS} = VQ = CV^2$, $E_{HEAT} = \frac{1}{2}CV^2$]
• Power supply delivers a charge packet of size $Q = CV$:
  $E_{PS} = CV \cdot V = CV^2$
  $E_C = \frac{1}{2}CV^2$
  $E_{PS} - E_C = \frac{1}{2}CV^2 = E_{HEAT}$
• This much energy is dissipated in the pFET

The discharging event
[Figure: inverter discharging C to ground; $E_{PS} = 0 \cdot Q = 0$, with $E_{HEAT}$ dissipated in the pull-down path]
• Power supply gets the charge at potential 0:
  $E_{PS} = 0$
• The energy on the capacitor goes from $\frac{1}{2}CV^2$ to 0:
  $E_C - 0 = \frac{1}{2}CV^2 = E_{HEAT}$
• This much energy is dissipated in the nFET
• All of the charge is returned to the PS at potential 0

Complex gates and pass logic
[Figure: complex gate and pass-transistor network driving load C between V and 0]
• Circuit topology does not change energetics
• It’s about the potential of the charge
• Not where the charge goes

Power supply perspectives
• Inject charge at the highest allowed voltage $V_{DD}$
• Recover returned charge at the lowest allowed voltage 0
• Simple scheme of shorting capacitors to $V_{DD}$ or ground through switches
• Maximally wasteful from an energy conservation standpoint

Power equation
• $\frac{1}{2}CV^2$ is dissipated to charge the capacitor
• $\frac{1}{2}CV^2$ is dissipated to discharge the capacitor
• $CV^2$ is dissipated per charge/discharge cycle
• If we cycle the capacitor F times per second: $P = F \cdot CV^2$
• Power is the rate at which work is done
• Note that if you need to cycle a capacitor N times from a battery, doesn’t matter if you do it fast or slow.
The battery is just as dead either way

Voltage scaling
• Energy decreases quadratically with the voltage: $E \sim V_{DD}^2$
• Delay increases as the voltage is reduced: $\tau \sim V_{DD}/(V_{DD} - V_{TH})^2$
  $\tau_{3.3V} / \tau_{2.0V} = 0.3$, $E_{3.3V} / E_{2.0V} = 2.7$ (assuming $V_{TH} = 1$ V)

Voltage scaling effects
• PowerMill™ simulations of a 16-bit microprocessor

Energy vs. Cycle time
[Figure: energy versus cycle time plot]

Adiabatic charging
• Charging from a variable-voltage source (e.g. linear ramp)
[Figure: voltage ramp from 0 to V over time T driving load C through resistance R]
• Assuming that R is the on-resistance of the switch, the dissipation for charging or discharging C is:
  $E = (RC/T) \cdot CV^2$ when $T \gg RC$
• Energy can be traded for delay by increasing the charge transport time
• Model the FETs as simple resistors ($R_{up}$ and $R_{dn}$)

Adiabatic-charging principle
[Figure: conventional digital CMOS ($R_{up}$, $R_{dn}$, C switched to $V_{DD}$) next to adiabatic charging ($R_{up}$, $R_{dn}$, C driven by a ramp of duration T, dissipating $\xi(RC/T)CV^2$ per transition)]
• Conventional digital CMOS: $E_{cycle} = CV^2$
• Adiabatic charging: $E_{cycle} = 2\xi(RC/T)CV^2$

Energy-Recovery CMOS
[Figure: energy source, energy-efficient clock driver, clock-powered chip]
• Exploit the on-chip capacitances of CMOS VLSI to reduce power dissipation below the conventional limit ($FCV^2$) using adiabatic charging and energy recovery
• This research includes:
  – Clock-energy recovery techniques
  – Clock-powered logic: balanced power versus speed
  – Stepwise charging (charge recycling) technique for
    • Low-power VLSI pin drivers
    • LCD panels
  – Harmonic resonant charging technique for
    • Clock signal for conventional chip

Stepwise charging
[Figure: load C charged from 0 to V in steps of V/N through tank capacitors $C_T$ held at V/N, ..., (N-1)V/N]
• The load C is switched from 0 to V and vice versa through N steps
• $C_T$ should be roughly 10 times larger than C
• Only one supply voltage is required
• Intermediate step voltages converge after a few cycles
• Dissipation for charging
or discharging C is: $E = \frac{1}{2}\,\frac{CV^2}{N}$
• The overhead for controlling the FETs needs to be considered

2-Stepwise Driver
[Figure: 2-stepwise driver schematic and timing; signals in and d_in; pFET and nFET switches; tank capacitor $C_T$ at V/2; load $C_L$; switching events numbered (1) to (4)]
• Event 1: $\frac{1}{2}C(V/2)^2$ stored, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 2: $\frac{1}{2}C(V/2)^2$ added, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 3: $\frac{1}{2}C(V/2)^2$ recovered, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 4: $\frac{1}{2}C(V/2)^2$ dissipated
• Total dissipation: $4 \times \frac{1}{2}C(V/2)^2 = \frac{1}{2}CV^2$

Clock-powered logic
• Exploits adiabatic charging to reduce dissipation
• Uses clocks as global time-varying voltage sources
• The challenge is to use the clock to drive data nodes
[Figure: clock line steering a pulse onto data nodes carrying 0, 1, 0]

Clock-Powered logic design
• Need an efficient clock driver
• Innovate in the design of clock-steering logic
• Use conventional precharged, pass-transistor, static logic
• Use the clock-steering logic for high-capacitance nodes

Resonant clock driver
[Figure: $V_{dc}$ supply, off-chip inductor, power pulse, on-chip capacitive load]
• Build up energy in the inductor
• Transfer it to the load as a pulse
• Recover the pulsed energy in the inductor
• Repeat the process

The all-resonant clock driver, a.k.a. blip driver
[Figure: two inductors L fed from $V_{dc}$, driving clock phases ϕ1 and ϕ2, each loaded by $C_ϕ$]
• Self-oscillating driver generates almost non-overlapping clock pulses
• Highly efficient because of all-resonant gate drive
• Trade-off between frequency stability and power efficiency

Clocked buffers
[Figure: clocked buffer with clocks ϕ1 and ϕ2, input Din, isolation transistor (Viso), clock-pass transistor (Vbn) with gate-to-channel capacitance used for bootstrapping, and pull-down clamp transistor for noise immunity]
• Clock-pass transistor is critical for speed and power performance
• Bootstrapping yields high conductance per gate
capacitance
• Clock voltage swing can be decoupled from the logic voltage swing.
  – “Hot clocks”: clock swings above supply

Clocked buffers
[Figure: clocked buffer operation, with node values 0, 1, A, and 1+A annotated at Viso, Vbn, ϕ1, ϕ2, and the clock-pass transistor]

Clock-powered logic
• Eliminate pFETs and complements of clocks (smaller circuits, simpler clock requirements)
  – Precharge transistors are hot-clocked nFETs
  – Pass gates in latches are hot-clocked nFETs
• Move more capacitive loads to the clock-powered paths
  – Pass-transistor logic (e.g. in muxes) powered by clocks (not shown)
[Figure: precharged logic block between two ER latches, clocked by ϕ1 and ϕ2, with Viso and Cp labeled]

The AC-1 processor experiment
• Objectives
  – Design and implement a low-power processor based on clock-powered logic and the blip driver
  – Evaluate the significance of the blip driver for low-power operation
  – Compare the clock-powered processor to a conventional, static CMOS alternative
• Approach
  – Select a 16-bit ISA
  – Design a five-stage pipelined microarchitecture
  – Use energy-recovery latches to inject and retract energy at large capacitive loads
  – Design logic and latches using “mostly-nMOS” circuit styles
  – Include both conventional and blip drivers (for evaluation purposes)
  – Design an implementation of the same ISA using purely conventional static-CMOS techniques

AC-1 microarchitecture
[Figure: AC-1 datapath block diagram showing the ALU, register file (RF), PLA control, and buses PC_B, I_B, D_B, A_B, clocked by ϕ1 and ϕ2]
• RISC ISA (Bunda’93)
• 16-bit data
• 16-bit instructions
• 16 registers
• Conventional 5-stage pipeline
• Integer operations only (no multiply or divide)

AC-1 processor
• Clock-powered logic
• Resonant clock driver
• 16-bit data & instructions
• 16 registers
• 0.5um
n-well CMOS
• 5-stage pipeline
• ~13K transistors

AC-1c: a conventional processor
• Same target process
• Cascade library cells
• 30k transistors
• 5.5um2
• Uses gated clocks to reduce power dissipation
• Important differences
  – Custom vs library cells
  – Optimizations
  – Clock gating in AC-1c (40%)

Processor core summary
• AC-1
  – First generation clock-powered processor
  – Mostly nMOS logic style
  – Hot clocks
  – Custom layout
• AC-1c
  – First generation conventional processor
  – Static CMOS
  – Cascade Epoch standard-cell library
• ACPL
  – Second-generation clock-powered processor
  – Static CMOS
  – Low-swing clocks
  – Custom low-power fixed-cell library
  – Cascade Epoch for place and route
• DC-1
  – Second-generation conventional processor
  – Static CMOS
  – Single-phase clocking
  – Custom low-power fixed-cell library
  – Cascade Epoch for place and route

Processor comparison
[Figure: mW/MHz (0 to 1.4) versus Frequency (MHz) (0 to 160) for AC-1 with no energy recovery, AC-1/c, ACPL with no energy recovery, DC-1, AC-1 with 6.5x energy recovery, and ACPL with 6.5x energy recovery]

Resonant clock drivers
[Figure: controller and resonant clock driver driving a clock-powered chip whose clock load varies between $C_{small}$ and $C_{big}$]
• The difficulty with clock-powered logic is in the clock driver
• Resonant circuits offer the highest efficiency
• Low-power techniques that minimize the switched capacitance in real time do not work well with resonant clock drivers
  – The clocks will vary in phase, amplitude, and pulse width
• Stabilizing the clock load maximizes the capacitive load
• It’s an open research topic

Harmonic resonant charging
– Sinusoids
  • Easy and efficient to generate
  • Low overhead
  • Hard to work with, very “undigital”
– Staircase
  • Simple to generate and control
  • High overhead
  • Positive-going only
– Blips
  • Advantages of the sinusoids
  • Can be complementary
  • Positive-going only
– Harmonic resonant driver
  • We thought this would be hard (practically)
  • Now think it is highly doable

Harmonic resonator design
[Figure: harmonic resonator circuit]

Harmonic resonator results
• 2nd Harmonic Resonator
  – 85% Energy efficiency
  – 10% slew rate of total cycle time
• 4th Harmonic Resonator
  – 80% Energy efficiency
  – 6% slew rate of total cycle time

Harmonic resonator result
• As R becomes smaller, slew rate decreases while power increases

Harmonic resonator result
• Frequency of the output signal doesn’t change for a 30% variation of load capacitance, while energy efficiency suffers

Future research
• Clock-powered logic and the blip driver have been developed as a practical way of exploiting adiabatic charging for CMOS microprocessors
• What about digital signal processors (DSPs)?
  – Where does the power go in a DSP?
    • Bus transaction vs.
computation
• Energy-recovery SRAM, DRAM, SAM
  – Capacitance variance is minimal because bitlines are dual
• Driving the clock network using a harmonic resonator

References
• ACMOS Homepage (still alive)
  – http://www.isi.edu/acmos
• For online paper archive
  – http://www.isi.edu/acmos/acmosPapers.html
• Books
  – Rabaey, Pedram (Eds.), “Low Power Design Methodology”
  – Chandrakasan, Brodersen (Eds.), “Low Power CMOS Design”
• Most recent paper is published in
  – JSSC, Nov. 2000, pp. 1561-1570
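
Worked example
As a rough numerical companion to the dissipation formulas on the power-equation, voltage-scaling, adiabatic-charging, and stepwise-charging slides, the following Python sketch evaluates each expression for one load. The function names and the component values (1 pF, 3.3 V, 1 kΩ, 100 ns) are illustrative assumptions and do not come from the talk; the adiabatic expression is only meaningful when T is much larger than RC, and `xi=1.0` simply reproduces the (RC/T)·CV² form given for a linear ramp.

```python
def conventional_cycle_energy(c, v):
    """Energy dissipated per charge/discharge cycle: C*V^2."""
    return c * v ** 2


def adiabatic_cycle_energy(c, v, r, t, xi=1.0):
    """Per-cycle dissipation 2*xi*(RC/T)*C*V^2; assumes T >> RC."""
    return 2.0 * xi * (r * c / t) * c * v ** 2


def stepwise_cycle_energy(c, v, n):
    """Per-cycle dissipation with N steps: (1/2)CV^2/N each way, C*V^2/N total."""
    return c * v ** 2 / n


def delay_ratio(v_hi, v_lo, vth):
    """Ratio of delays using tau ~ VDD / (VDD - VTH)^2."""
    def tau(vdd):
        return vdd / (vdd - vth) ** 2
    return tau(v_hi) / tau(v_lo)


def energy_ratio(v_hi, v_lo):
    """Ratio of switching energies using E ~ VDD^2."""
    return (v_hi / v_lo) ** 2


if __name__ == "__main__":
    # Hypothetical example values, chosen only so that T = 100 * RC.
    C, V, R, T = 1e-12, 3.3, 1e3, 100e-9
    print("conventional:", conventional_cycle_energy(C, V), "J")
    print("adiabatic:   ", adiabatic_cycle_energy(C, V, R, T), "J")
    print("2-step:      ", stepwise_cycle_energy(C, V, 2), "J")
    print("tau ratio 3.3V/2.0V:", delay_ratio(3.3, 2.0, 1.0))
    print("E ratio 3.3V/2.0V:  ", energy_ratio(3.3, 2.0))
```

With N = 2, `stepwise_cycle_energy` returns half of the conventional value, which matches the four-event total on the 2-stepwise driver slide, and the two ratio helpers reproduce the rounded 0.3 and 2.7 figures quoted on the voltage-scaling slide.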