Jay Moon's ER Techniques

Energy-Recovery CMOS Design
Jay Moon, Bill Athas*
Univ of Southern California
*Apple Computer, Inc.
jsmoon@usc.edu / athas@apple.com
March 05, 2001
UCLA EE215B
Outline
• Motivation
• Review of CMOS switching energetics
• Adiabatic charging
• Energy-Recovery CMOS
• Stepwise charging
• Clock-powered logic (CPL)
• Harmonic resonant charging
• Future Research
Motivation
High-performance & low-power computing
• It’s becoming increasingly difficult to get rid of the heat
generated by VLSI chips
• Battery life for portables
Types of power dissipation
• Dynamic power dissipation
– Charging and discharging capacitances
– Short-circuit current
• Static power dissipation
– Sub-threshold currents
– Drain-junction leakage
Capacitor energy equations
• Suppose at time t, a charge q is transferred from one
plate to the other
• The potential v is q/C
• For a charge transfer increment of dq, the additional work is:
$dE = v\,dq = \frac{q}{C}\,dq$
• For the total charge transfer Q:
$E = \int dE = \int_0^Q \frac{q}{C}\,dq = \frac{1}{2}\frac{Q^2}{C}$
• With $Q = CV$:
$E = \frac{1}{2}CV^2$
CMOS switching energetics
• Interestingly (and thankfully) CMOS energetics can be
analyzed and understood from the CMOS inverter.
• Charge is conserved
• Energy is conserved
• Neglect leakage current
• Neglect short-circuit current
[Figure: CMOS inverter switching load C between V and 0 from power supply PS; $E_{PS} = VQ = CV^2$]
The charging event
[Figure: charging event; power supply PS charges load C from 0 to V, with $E_{PS} = VQ = CV^2$ and $E_{HEAT} = \frac{1}{2}CV^2$]
• Power supply delivers a charge packet of size $Q = CV$
$E_{PS} = CV \cdot V = CV^2$
$E_C = \frac{1}{2}CV^2$
$E_{PS} - E_C = \frac{1}{2}CV^2 = E_{HEAT}$
• This much energy is dissipated in the pFET
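The energy bookkeeping above can be checked numerically. The following is only a sketch under stated assumptions: the pFET is modeled as a fixed resistor R (the slide gives no device model), and the function name and component values are hypothetical. It integrates i²R over a step charge and compares the result with $E_{PS} - E_C$.

```python
import math

def step_charging_heat(R, C, V, n_steps=20000, n_tau=20.0):
    """Numerically integrate i(t)^2 * R while C charges from 0 to V through R."""
    tau = R * C
    dt = n_tau * tau / n_steps          # integrate out to n_tau time constants
    heat = 0.0
    for k in range(n_steps):
        t_mid = (k + 0.5) * dt          # midpoint rule
        i = (V / R) * math.exp(-t_mid / tau)   # RC step-response current
        heat += i * i * R * dt
    return heat

C, V = 1e-12, 3.3                       # hypothetical load and supply
e_ps = C * V * V                        # energy drawn from the supply
e_c = 0.5 * C * V * V                   # energy left on the capacitor
for R in (100.0, 1e3, 1e4):             # hypothetical on-resistances
    ratio = step_charging_heat(R, C, V) / (e_ps - e_c)
    print(f"R = {R:8.0f} ohm  heat / (E_PS - E_C) = {ratio:.6f}")
```

Under this model the ratio stays at essentially 1 for every R: the resistance changes how fast the heat is produced, not how much.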
The discharging event
[Figure: discharging event; load C discharges from V to 0, dissipating $E_{HEAT}$, with $E_{PS} = 0 \cdot Q = 0$]
• Power supply gets the charge at potential 0
$E_{PS} = 0$
• The energy on the capacitor goes from $\frac{1}{2}CV^2$ to 0
$E_C - 0 = \frac{1}{2}CV^2 = E_{HEAT}$
• This much energy is dissipated in the nFET
• All of the charge is returned to the PS at potential 0
Complex gates and pass logic
[Figure: complex gate and pass-logic network between power supply PS (V and 0) and load C]
• Circuit topology does not change energetics
• It’s about the potential of the charge
• Not where the charge goes
Power supply perspectives
• Inject charge at the highest allowed voltage, VDD
• Recover returned charge at the lowest allowed voltage, 0
• Simple scheme of shorting capacitors to VDD or ground
through switches
• Maximally wasteful from an energy conservation
standpoint
Power equation
• $\frac{1}{2}CV^2$ is dissipated to charge the capacitor
• $\frac{1}{2}CV^2$ is dissipated to discharge the capacitor
• $CV^2$ is dissipated per charge/discharge cycle
• If we cycle the capacitor F times per second:
$P = F \cdot CV^2$
• Power is the rate at which work is done
• Note that if you need to cycle a capacitor N times from a battery, it doesn't matter whether you do it fast or slow: the battery is just as dead either way
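A small sketch of the power equation and the battery remark. The function names, the switched capacitance, and the battery energy are all hypothetical values chosen for illustration; only the relation P = F·CV² comes from the slide.

```python
def dynamic_power(F, C, V):
    """Average power when C is fully cycled F times per second: P = F * C * V^2."""
    return F * C * V * V

def cycles_from_battery(e_batt, C, V):
    """Number of full charge/discharge cycles a battery of energy e_batt can supply."""
    return e_batt / (C * V * V)

C, V = 1e-9, 3.3          # hypothetical total switched capacitance and supply
e_batt = 1000.0           # hypothetical battery energy in joules

n_cycles = cycles_from_battery(e_batt, C, V)   # does not depend on F
for F in (50e6, 100e6):
    p = dynamic_power(F, C, V)
    lifetime = e_batt / p                      # seconds until the battery is empty
    print(f"F = {F:.0e} Hz  P = {p:.3f} W  lifetime = {lifetime:.0f} s  "
          f"cycles = {lifetime * F:.3e} (budget {n_cycles:.3e})")
```

Halving F doubles the lifetime, but the number of cycles delivered is the same in both cases.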
Voltage scaling
• Energy decreases quadratically with the voltage
$E \sim V_{DD}^2$
• Delay increases as the voltage is reduced
$\tau \sim V_{DD}/(V_{DD} - V_{TH})^2$
$\tau_{3.3V}/\tau_{2.0V} = 0.3$
$E_{3.3V}/E_{2.0V} = 2.7$
(assuming $V_{TH}$ = 1 V)
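The two ratios above follow directly from the proportionalities. A minimal sketch that evaluates them (function names are illustrative; the constants of proportionality cancel in the ratios):

```python
def relative_delay(vdd, vth):
    """Delay up to a constant factor: tau ~ VDD / (VDD - VTH)^2."""
    return vdd / (vdd - vth) ** 2

def relative_energy(vdd):
    """Switching energy up to a constant factor: E ~ VDD^2."""
    return vdd ** 2

vth = 1.0
delay_ratio = relative_delay(3.3, vth) / relative_delay(2.0, vth)
energy_ratio = relative_energy(3.3) / relative_energy(2.0)
print(f"tau(3.3V)/tau(2.0V) = {delay_ratio:.2f}")
print(f"E(3.3V)/E(2.0V)     = {energy_ratio:.2f}")
```

This gives roughly 0.31 and 2.72, which round to the 0.3 and 2.7 quoted on the slide.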
Voltage scaling effects
• PowerMill™ simulations of a 16-bit microprocessor
Energy vs. Cycle time
Adiabatic charging
• Charging from a variable-voltage source
(e.g. linear ramp)
[Figure: linear voltage ramp from 0 to V over time T, charging C through resistance R]
• Assuming that R is the on-resistance of the switch, the
dissipation for charging or discharging C is:
$E = (RC/T) \cdot CV^2$ when $T \gg RC$
• Energy can be traded for delay by increasing the charge
transport time
• Model the FETs as simple resistors (Rup and Rdn)
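To see how the dissipation moves between the conventional and adiabatic limits, the sketch below evaluates a closed-form expression for the heat in R. The expression is not from the slides: it is derived here for an ideal linear ramp that holds at V afterwards, with a constant R, and the function name and component values are hypothetical.

```python
import math

def ramp_charging_heat(R, C, V, T):
    """Heat dissipated in R when C is charged by a 0-to-V linear ramp of duration T.

    Assumes the source holds at V after the ramp, so the small residual
    voltage across R at time T is also dissipated.
    """
    tau = R * C
    # During the ramp the current is i(t) = (C*V/T) * (1 - exp(-t/tau)).
    k = R * (C * V / T) ** 2
    during = k * (T
                  - 2.0 * tau * (1.0 - math.exp(-T / tau))
                  + 0.5 * tau * (1.0 - math.exp(-2.0 * T / tau)))
    # Voltage still missing on C when the ramp ends.
    residual = V * (tau / T) * (1.0 - math.exp(-T / tau))
    after = 0.5 * C * residual ** 2
    return during + after

R, C, V = 1e3, 1e-12, 3.3               # hypothetical values
tau = R * C
for ratio in (0.01, 1.0, 10.0, 100.0):
    T = ratio * tau
    heat = ramp_charging_heat(R, C, V, T)
    print(f"T = {ratio:6.2f} RC  heat / CV^2 = {heat / (C * V * V):.4f}"
          f"  (RC/T = {tau / T:.4f})")
```

For T much shorter than RC the heat approaches the conventional (1/2)CV², and for T much longer than RC it approaches (RC/T)·CV², so stretching the ramp trades delay for energy.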
Adiabatic-charging principle
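[Figure: two circuits compared. Conventional digital CMOS: C is switched to VDD through Rup and to ground through Rdn, with $E_{cycle} = CV^2$. Adiabatic charging: C is charged and discharged through Rup and Rdn from a ramp of duration T, dissipating $\xi(RC/T)CV^2$ per transition, with $E_{cycle} = 2\xi(RC/T)CV^2$.]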
Energy-Recovery CMOS
[Figure: energy source, energy-efficient clock driver, clock-powered chip]
• Exploit the on-chip capacitances of CMOS VLSI to reduce power dissipation below the conventional limit ($FCV^2$) using adiabatic charging and energy recovery
• This research includes:
– Clock-energy recovery techniques
– Clock-powered logic: balanced power versus speed
– Stepwise charging (charge recycling) technique for
• Low-power VLSI pin drivers
• LCD panels
– Harmonic resonant charging technique for
• Clock signals for conventional chips
Stepwise charging
[Figure: load C connected through switches to the supply V, to ground, and to tank capacitors CT at the intermediate step voltages V/N, ..., (N-1)V/N; staircase of charging steps from 0 to V]
• The load C is switched from 0 to V and vice versa through N steps
• CT should be roughly 10 times larger than C
• Only one supply voltage is required
• Intermediate step voltages converge after a few cycles
• Dissipation for charging or discharging C is: $E = \frac{1}{2}CV^2/N$
• The overhead for controlling the FETs needs to be considered
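The 1/N scaling can be reproduced with a simple energy balance. This sketch assumes ideal step sources at exactly kV/N (that is, infinitely large tank capacitors) and ignores the switch-control overhead; the function name and values are hypothetical.

```python
def stepwise_charging_heat(C, V, N):
    """Heat dissipated while charging C from 0 to V in N equal voltage steps."""
    total = 0.0
    v_prev = 0.0
    for k in range(1, N + 1):
        v_k = k * V / N                         # ideal step voltage
        dq = C * (v_k - v_prev)                 # charge delivered in this step
        e_source = v_k * dq                     # energy drawn from the step source
        e_stored = 0.5 * C * (v_k ** 2 - v_prev ** 2)   # gain in capacitor energy
        total += e_source - e_stored            # the rest is heat in the switch
        v_prev = v_k
    return total

C, V = 1e-12, 3.3                               # hypothetical values
for N in (1, 2, 4, 8):
    heat = stepwise_charging_heat(C, V, N)
    print(f"N = {N}  heat / (0.5*C*V^2) = {heat / (0.5 * C * V * V):.4f}")
```

Each step dissipates (1/2)C(V/N)², so the N steps together give (1/2)CV²/N, and N = 1 recovers the conventional case.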
2-Stepwise Driver
[Figure: 2-stepwise driver schematic and timing, with inputs in and d_in, switches t, p, and n, tank capacitor CT at V/2, and load CL]
2-Stepwise Driver
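[Figure: 2-stepwise driver (inputs in and d_in, switches t, p, and n, tank capacitor CT at V/2, load CL) with the four charge-transfer events (1) to (4) marked]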
• Event 1: $\frac{1}{2}C(V/2)^2$ stored, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 2: $\frac{1}{2}C(V/2)^2$ added, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 3: $\frac{1}{2}C(V/2)^2$ recovered, $\frac{1}{2}C(V/2)^2$ dissipated
• Event 4: $\frac{1}{2}C(V/2)^2$ dissipated
• Total dissipation: $4 \times \frac{1}{2}C(V/2)^2 = \frac{1}{2}CV^2$
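The four events can also be simulated with a finite tank capacitor to see the tank voltage settle near V/2. This is a sketch under assumptions not stated on the slide: the events are taken to be (1) share charge with CT, (2) connect the load to V, (3) share charge with CT, (4) connect the load to ground; switches are ideal; CT starts empty; and CT = 10·C. All names and values are hypothetical.

```python
def two_step_cycle(C, CT, V, v_tank):
    """Run one full charge/discharge cycle; return (new tank voltage, heat)."""
    def share(v_load, v_t):
        # Charge sharing between C and CT through an ideal switch.
        v_final = (C * v_load + CT * v_t) / (C + CT)
        lost = 0.5 * (C * CT / (C + CT)) * (v_load - v_t) ** 2
        return v_final, lost

    heat = 0.0
    v_load = 0.0
    v_load, lost = share(v_load, v_tank)      # event 1: tank lifts the load
    v_tank = v_load
    heat += lost
    heat += 0.5 * C * (V - v_load) ** 2       # event 2: supply finishes the charge
    v_load = V
    v_load, lost = share(v_load, v_tank)      # event 3: load returns charge to tank
    v_tank = v_load
    heat += lost
    heat += 0.5 * C * v_load ** 2             # event 4: remainder dumped to ground
    return v_tank, heat

C, V = 1e-12, 3.3
CT = 10 * C
v_tank = 0.0
for cycle in range(1, 101):
    v_tank, heat = two_step_cycle(C, CT, V, v_tank)
    if cycle in (1, 5, 20, 100):
        print(f"cycle {cycle:3d}  v_tank / V = {v_tank / V:.4f}  "
              f"heat / (0.5*C*V^2) = {heat / (0.5 * C * V * V):.4f}")
```

With a finite tank the voltage ripples around V/2 rather than sitting exactly on it, so the steady-state dissipation in this model comes out slightly above the ideal (1/2)CV².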
Clock-powered logic
• Exploits adiabatic charging to reduce dissipation
• Uses clocks as global time-varying voltage sources
• The challenge is to use the clock to drive data nodes
[Figure: clock line steered onto data nodes, with example values 0, 1, 0]
Clock-Powered logic design
• Need an efficient clock driver
• Innovate in the design of clock-steering logic
• Use conventional precharged, pass-transistor, static logic
• Use the clock-steering logic for high-capacitance nodes
Resonant clock driver
[Figure: supply Vdc, off-chip inductor, power pulse, on-chip capacitive load]
• Build up energy in the inductor
• Transfer it to the load as a pulse
• Recover the pulsed energy in the inductor
• Repeat the process
The all-resonant clock driver
a.k.a. blip driver
[Figure: all-resonant driver schematic with supply Vdc, two inductors L, and clock phases ϕ1 and ϕ2, each loaded by Cϕ]
• Self-oscillating driver generates almost non-overlapping clock pulses
• Highly efficient because of all-resonant gate drive
• Trade-off between frequency stability and power efficiency
Clocked buffers
[Figure: clocked buffer schematic. Labels: input Din, isolation transistor, Viso, Vbn, clock-pass transistor, gate-to-channel capacitance used for bootstrapping, pull-down clamp transistor for noise immunity, clocks ϕ1 and ϕ2]
• Clock-pass transistor is critical for speed and power performance
• Bootstrapping yields high conductance per gate capacitance
• Clock voltage swing can be decoupled from the logic voltage swing
– "Hot clocks": clock swings above supply
Clocked buffers
[Figure: clocked buffer operation shown in two panels, with ϕ1, ϕ2, Viso, Vbn, and the clock-pass transistor annotated with node values 0, 1, A, and 1+A]
Clock-powered logic
• Eliminate pFETs and complements of clocks (smaller circuits, simpler clock requirements)
– Precharge transistors are hot-clocked nFETs
– Pass gates in latches are hot-clocked nFETs
• Move more capacitive loads to the clock-powered paths
– Pass-transistor logic (e.g. in muxes) powered by clocks (not shown)
[Figure: precharged logic block between two ER latches, with clocks ϕ1 and ϕ2, Viso, and Cp labeled]
The AC-1 processor experiment
• Objectives
– Design and implement a low-power processor based on clock-powered logic and the blip driver
– Evaluate the significance of the blip driver for low-power operation
– Compare the clock-powered processor to a conventional, static CMOS alternative
• Approach
– Select a 16-bit ISA
– Design a five-stage pipelined microarchitecture
– Use energy-recovery latches to inject and retract energy at large capacitive loads
– Design logic and latches using "mostly-nMOS" circuit styles
– Include both conventional and blip drivers (for evaluation purposes)
– Design an implementation of the same ISA using purely conventional static-CMOS techniques
AC-1 microarchitecture
[Figure: AC-1 microarchitecture datapath, showing the register file (RF), ALU, PLA control, adders, buses PC_B, I_B, D_B, and A_B, fields from the IR, and latches clocked by ϕ1 and ϕ2]
• RISC ISA (Bunda '93)
• 16-bit data
• 16-bit instructions
• 16 registers
• Conventional 5-stage pipeline
• Integer operations only (no multiply or divide)
AC-1 processor
• Clock-powered logic
• Resonant clock driver
• 16-bit data & instructions
• 16 registers
• 0.5um n-well CMOS
• 5-stage pipeline
• ~13K transistors
AC-1c : a conventional processor
• Same target process
• Cascade library cells
• 30k transistors
• 5.5um2
• Uses gated clocks to reduce power dissipation
• Important differences
– Custom vs library cells
– Optimizations
– Clock gating in AC-1c (40%)
Processor core summary
• AC-1
– First-generation clock-powered processor
– Mostly nMOS logic style
– Hot clocks
– Custom layout
• AC-1c
– First-generation conventional processor
– Static CMOS
– Cascade Epoch standard-cell library
• ACPL
– Second-generation clock-powered processor
– Static CMOS
– Low-swing clocks
– Custom low-power fixed-cell library
– Cascade Epoch for place and route
• DC-1
– Second-generation conventional processor
– Static CMOS
– Single-phase clocking
– Custom low-power fixed-cell library
– Cascade Epoch for place and route
Processor comparison
[Figure: mW/MHz (0 to 1.4) versus frequency (0 to 160 MHz) for AC-1 and ACPL, each with no energy recovery and with 6.5x energy recovery, and for AC-1c and DC-1]
Resonant clock drivers
[Figure: resonant clock driver with controller driving a clock-powered chip whose clock load varies between Csmall and Cbig]
• The difficulty with clock-powered logic is in the clock driver
• Resonant circuits offer the highest efficiency
• Low-power techniques that minimize the switched capacitance in real time do not work well with resonant clock drivers
– The clocks will vary in phase, amplitude, and pulse width
• Stabilizing the clock load maximizes the capacitive load
• It's an open research topic
Harmonic resonant charging
– Sinusoids
• Easy and efficient to generate
• Low overhead
• Hard to work with, very “undigital”
– Staircase
• Simple to generate and control
• High overhead
• Positive-going only
– Blips
• Advantages of the sinusoids
• Can be complementary
• Positive-going only
– Harmonic resonant driver
• We thought this would be hard (practically)
• Now think it is highly doable
Harmonic resonator design
Harmonic resonator results
• 2nd harmonic resonator
– 85% energy efficiency
– 10% slew rate of total cycle time
• 4th harmonic resonator
– 80% energy efficiency
– 6% slew rate of total cycle time
Harmonic resonator result
• As R becomes smaller, slew rate decreases while power increases
Harmonic resonator result
• The output frequency does not change for a 30% variation in load capacitance, but energy efficiency suffers
Future research
• Clock-powered logic and the blip driver have been developed as a practical way of exploiting adiabatic charging for CMOS microprocessors
• How about digital signal processors?
– Where does the power go in a DSP?
• Bus transactions vs. computation
• Energy-recovery SRAM, DRAM, SAM
– Capacitance variance is minimal because bitlines are dual
• Driving the clock network using a harmonic resonator
References
• ACMOS homepage (still alive)
– http://www.isi.edu/acmos
• Online paper archive
– http://www.isi.edu/acmos/acmosPapers.html
• Books
– Rabaey, Pedram, eds., "Low Power Design Methodologies"
– Chandrakasan, Brodersen, eds., "Low Power CMOS Design"
• The most recent paper is published in
– JSSC, Nov. 2000, pp. 1561-1570