CPE 626 Advanced VLSI Design Lecture 8: Power and Designing

CPE 626
Advanced VLSI Design
Lecture 8: Power and
Designing for Low Power
Aleksandar Milenkovic
http://www.ece.uah.edu/~milenka
http://www.ece.uah.edu/~milenka/cpe626-04F/
milenka@ece.uah.edu
Assistant Professor
Electrical and Computer Engineering Dept.
University of Alabama in Huntsville
Why Power Matters
Packaging costs
Power supply rail design
Chip and system cooling costs
Noise immunity and system reliability
Battery life (in portable systems)
Environmental concerns
Office equipment accounted for 5% of total US
commercial energy usage in 1993
Energy Star compliant systems
Why worry about power? –
Power Dissipation
Lead microprocessors power continues to increase
[Figure: power (Watts, log scale from 0.1 to 100) versus year (1971 to 2000) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, and P6]
Power delivery and dissipation will be prohibitive
Source: Borkar, De Intel
Problem Illustration
Why worry about power ? –
Battery Size/Weight
[Figure: nominal battery capacity (W-hr/lb, 0 to 50) versus year (65 to 95) for Nickel-Cadmium, Ni-Metal Hydride, and Rechargeable Lithium batteries; annotation: Battery (40+ lbs)]
Expected battery lifetime increase over the next 5 years: 30 to 40%
From Rabaey, 1995
Why worry about power? –
Standby Power
Year                   2002   2005   2008   2011   2014
Power supply Vdd (V)   1.5    1.2    0.9    0.7    0.6
Threshold VT (V)       0.4    0.4    0.35   0.3    0.25
Drain leakage will increase as VT decreases to maintain noise margins and meet frequency demands, leading to excessive battery draining standby power consumption.
[Figure: standby power versus year (2000 to 2008), with a percentage axis (0% to 50%) and labeled values 12W, 88W, 400W, 1.7KW, and 8KW]
…and phones leaky!
Source: Borkar, De Intel
Power and Energy Figures of Merit
Power consumption in Watts
determines battery life in hours
Peak power
determines power ground wiring designs
sets packaging limits
impacts signal noise margin and reliability analysis
Energy efficiency in Joules
energy is power integrated over time (power is the rate at which energy is consumed)
Energy = power * delay
Joules = Watts * seconds
lower energy number means less power to perform a
computation at the same frequency
Power versus Energy
Power is height of curve
Lower power design could simply be slower
[Figure: power (Watts) versus time for Approach 1 and Approach 2]
Energy is area under curve
Two approaches require the same energy
[Figure: power (Watts) versus time for Approach 1 and Approach 2, with the area under each curve marked]
PDP and EDP
Power-delay product (PDP): $PDP = P_{av} \cdot t_p = (C_L V_{DD}^2)/2$
PDP is the average energy consumed per switching event (Watts * sec = Joule)
lower power design could simply be a slower design
• Energy-delay product (EDP): $EDP = PDP \cdot t_p = P_{av} \cdot t_p^2$
• EDP is the average energy consumed multiplied by the computation time required
• takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
• allows one to understand tradeoffs better
[Figure: normalized energy, delay, and energy-delay product (0 to 15) versus Vdd (0.5 V to 2.5 V)]
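The PDP/EDP trade-off can be sketched numerically. The block below is only an illustration: the delay model t_p = K * V_DD / (V_DD - V_T)^2 and the values of C_L, V_T, and K are assumptions, not taken from the slides. Under that assumed model the EDP minimum falls near V_DD = 3 V_T, while PDP alone keeps falling as V_DD drops.

```python
# Sketch: PDP and EDP versus supply voltage for one switching gate.
# ASSUMPTIONS (not from the slides): the delay model and the values
# of C_L, V_T, and K below are illustrative only.

C_L = 15e-15   # load capacitance in farads (assumed)
V_T = 0.5      # threshold voltage in volts (assumed)
K = 1e-10      # delay scaling constant (assumed)


def delay(v_dd):
    """Assumed gate delay model; grows sharply as V_DD approaches V_T."""
    return K * v_dd / (v_dd - V_T) ** 2


def pdp(v_dd):
    """Power-delay product: energy per switching event, C_L * V_DD^2 / 2."""
    return C_L * v_dd ** 2 / 2


def edp(v_dd):
    """Energy-delay product: PDP multiplied by the delay."""
    return pdp(v_dd) * delay(v_dd)


# Sweep V_DD from 0.60 V to 2.50 V in 10 mV steps and pick the EDP minimum.
sweep = [0.60 + 0.01 * i for i in range(191)]
best_vdd = min(sweep, key=edp)
print(f"EDP is minimized near V_DD = {best_vdd:.2f} V")
```

Lowering V_DD always reduces PDP, so PDP by itself would favor an arbitrarily slow design; EDP penalizes the added delay and therefore has an interior optimum.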
Understanding Tradeoffs
Which design is the “best” (fastest, coolest, both)?
[Figure: energy versus 1/delay for designs a, b, c, and d, with "better" marked along each axis and the direction of lower EDP indicated]
CMOS Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
The three terms of P are, in order: dynamic power, short-circuit power, and leakage power.
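As a rough illustration of the power equation, the three terms can be evaluated separately. The function name and all numeric values in the example call are assumptions chosen for illustration, not figures from the lecture.

```python
# Sketch: evaluate P = C_L*V_DD^2*f01 + t_sc*V_DD*I_peak*f01 + V_DD*I_leak,
# returning each term so their relative sizes can be compared.
# cmos_power and the example values are hypothetical.


def cmos_power(c_load, v_dd, p_01, f_clock, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    f_01 = p_01 * f_clock                      # switching frequency f0->1
    dynamic = c_load * v_dd ** 2 * f_01        # charging the load capacitance
    short_circuit = t_sc * v_dd * i_peak * f_01  # direct VDD-to-GND path
    leakage = v_dd * i_leak                    # static leakage
    return dynamic, short_circuit, leakage


# Example with assumed values: 15 fF load, 2.5 V, P0->1 = 0.5, 500 MHz,
# 50 ps short-circuit window, 100 uA peak current, 1 nA leakage.
dynamic, short_circuit, leakage = cmos_power(
    c_load=15e-15, v_dd=2.5, p_01=0.5, f_clock=500e6,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9,
)
total = dynamic + short_circuit + leakage
print(f"dynamic {dynamic:.3e} W, short-circuit {short_circuit:.3e} W, "
      f"leakage {leakage:.3e} W, total {total:.3e} W")
```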
Dynamic Power Consumption
[Figure: CMOS inverter with supply Vdd, input Vin, output Vout, and load capacitance CL, switching at f0→1]
$\text{Energy/transition} = C_L V_{DD}^2 P_{0\to1}$
$P_{dyn} = \text{Energy/transition} \cdot f = C_L V_{DD}^2 P_{0\to1} f$
$P_{dyn} = C_{EFF} V_{DD}^2 f$, where $C_{EFF} = P_{0\to1} C_L$
Not a function of transistor sizes!
Data dependent - a function of switching activity!
Pop Quiz
Consider a 0.25 micron chip, 500 MHz clock,
average load cap of 15fF/gate (fanout of 4),
2.5V supply.
Dynamic Power consumption per gate is ??
With 1 million gates (assuming each
transitions every clock)
Dynamic Power of entire chip = ??.
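One way to work the quiz numbers is to plug them into P_dyn = C_L * V_DD^2 * P_0→1 * f. The sketch below assumes P_0→1 = 1, following the quiz's "each transitions every clock"; the variable names are illustrative.

```python
# Sketch: pop-quiz arithmetic using P_dyn = C_L * V_DD^2 * P_01 * f.
# Assumes every gate makes one 0->1 transition per clock (P_01 = 1).

C_LOAD = 15e-15      # average load capacitance per gate, farads
V_DD = 2.5           # supply voltage, volts
F_CLOCK = 500e6      # clock frequency, hertz
P_01 = 1.0           # transition probability per clock (quiz assumption)
N_GATES = 1_000_000  # number of gates on the chip

power_per_gate = C_LOAD * V_DD ** 2 * P_01 * F_CLOCK  # watts
power_chip = power_per_gate * N_GATES                 # watts

print(f"per gate: {power_per_gate * 1e6:.1f} uW")
print(f"whole chip: {power_chip:.1f} W")
```

The result is roughly 47 µW per gate and roughly 47 W for the chip, which is why the activity factor and supply voltage matter so much in practice.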
Lowering Dynamic Power
$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$
Capacitance: function of fan-out, wire length, transistor sizes
Supply voltage: has been dropping with successive generations
Activity factor: how often, on average, do wires switch?
Clock frequency: increasing…
Short Circuit Power Consumption
[Figure: CMOS inverter with input Vin, output Vout, load capacitance CL, and short-circuit current Isc]
Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.
Short Circuit Current Determinants
$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$
$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$
Duration and slope of the input signal, tsc
Ipeak determined by
• the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
• strong function of the ratio between input and output slopes
• a function of CL
Impact of CL on Psc
Large capacitive load: Isc ≈ 0. Output fall time significantly larger than input rise time.
Small capacitive load: Isc ≈ Imax. Output fall time substantially smaller than the input rise time.
[Figure: two CMOS inverters (Vin, Vout, CL), one driving a large and one driving a small capacitive load]
Ipeak as a Function of CL
When load capacitance is small, Ipeak is large.
Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals - slope engineering.
[Figure: Ipeak (A, x 10^-4, from -0.5 to 2.5) versus time (sec, x 10^-10, from 0 to 6) for CL = 20 fF, 100 fF, and 500 fF; 500 psec input slope]
Psc as a Function of Rise/Fall Times
When load capacitance is small (tsin/tsout > 2 for VDD > 2V) the power is dominated by Psc
If VDD < VTn + |VTp| then Psc is eliminated since both devices are never on at the same time.
[Figure: normalized power P (0 to 8) versus tsin/tsout (0 to 4) for VDD = 3.3 V, 2.5 V, and 1.5 V; W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF; normalized wrt zero input rise-time dissipation]
Leakage (Static) Power Consumption
[Figure: CMOS inverter with output Vout showing the leakage paths that contribute to VDD Ileakage: drain junction leakage, gate leakage, and sub-threshold current]
Sub-threshold current is the dominant factor.
All increase exponentially with temperature!
Leakage as a Function of VT
Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
With a VT roll-off of roughly 90mV/decade, each 255mV increase in VT gives about 3 orders of magnitude reduction in leakage (but adversely affects performance)
[Figure: ID (A, log scale from 10^-12 to 10^-2) versus VGS (V, 0 to 1) for VT = 0.4V and VT = 0.1V]
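The exponential dependence of leakage on VT can be sketched as one decade of leakage per sub-threshold slope interval. The function name is hypothetical, and the 90 mV/decade default is the figure quoted on the slide; at exactly that slope, three full decades correspond to 270 mV, so 255 mV lands slightly under 1000x.

```python
# Sketch: leakage reduction factor for a given VT increase, assuming
# leakage falls by 10x for every `slope_mv_per_decade` millivolts of VT.


def leakage_reduction(delta_vt_mv, slope_mv_per_decade=90.0):
    """Factor by which sub-threshold leakage drops when VT rises."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)


for delta in (90, 255, 270):
    print(f"+{delta} mV VT -> leakage / {leakage_reduction(delta):.0f}")
```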
TSMC Processes Leakage and VT
                         CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                      1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)          42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
Lgate                    0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)      600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage) (pA/µm)   20        1.60      0.15       300       1,800     13,000
VTn                      0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)          30        22        14         43        52        80
From MPR, 2000
Exponential Increase in Leakage Currents
[Figure: Ileakage (nA/µm, log scale from 1 to 10000) versus temperature (30 to 110 C) for curves labeled 0.25, 0.18, 0.13, and 0.1]
From De, 1999
Review: Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
Dynamic power (~90% today and decreasing relatively)
Short-circuit power (~8% today and decreasing absolutely)
Leakage power (~2% today and increasing)
Power and Energy Design Space
Energy     Constant Throughput/Latency                                   Variable Throughput/Latency
           Design Time                  Non-active Modules               Run Time
Active     Logic Design, Reduced Vdd,   Clock Gating                     DFS, DVS (Dynamic Freq,
           Sizing, Multi-Vdd                                             Voltage Scaling)
Leakage    + Multi-VT                   Sleep Transistors, Multi-Vdd,    + Variable VT
                                        Variable VT
Dynamic Power as a Function of Device Size
Device sizing affects dynamic energy consumption
gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1)
The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F’s
e.g., for F=20, fopt(energy) = 3.53 while fopt(performance) = 4.47
If energy is a concern avoid oversizing beyond the optimal
[Figure: normalized energy (0 to 1.5) versus sizing factor f (1 to 7) for F = 1, 2, 5, 10, and 20]
From Nikolic, UCB
Dynamic Power Consumption is
Data Dependent
Switching activity, P0→1, has two components
A static component – function of the logic topology
A dynamic component – function of the timing behavior (glitching)
Static transition probability
$P_{0\to1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$
2-input NOR Gate
A   B   Out
0   0   1
0   1   0
1   0   0
1   1   0
With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2
NOR static transition probability = 3/4 x 1/4 = 3/16
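The 3/16 figure for the NOR gate can be reproduced by walking the truth table and weighting each row by its input probabilities. This is a minimal sketch that assumes statistically independent inputs; the function names are illustrative.

```python
# Sketch: static transition probability P0->1 = P(out=0) * P(out=1),
# computed by enumerating the truth table. Assumes independent inputs.
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, input_probs):
    """gate: function of 0/1 inputs; input_probs: P(input = 1) per input."""
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p   # probability of this input row
        if gate(*bits):
            p_one += weight                 # accumulate P(out = 1)
    return (1 - p_one) * p_one


def nor2(a, b):
    """2-input NOR gate on 0/1 values."""
    return int(not (a or b))


half = Fraction(1, 2)
nor_activity = static_transition_probability(nor2, [half, half])
print(nor_activity)
```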
NOR Gate Transition Probabilities
Switching activity is a strong function of the input signal statistics
PA and PB are the probabilities that inputs A and B are one
[Figure: 2-input NOR gate with inputs A and B and load CL, and a plot of P0→1 over PA and PB, each from 0 to 1]
$P_{0\to1} = P_0 \times P_1 = (1 - (1 - P_A)(1 - P_B)) (1 - P_A)(1 - P_B)$
Transition Probabilities for Some Basic Gates
Gate    $P_{0\to1} = P_{out=0} \times P_{out=1}$
NOR     $(1 - (1 - P_A)(1 - P_B)) \times (1 - P_A)(1 - P_B)$
OR      $(1 - P_A)(1 - P_B) \times (1 - (1 - P_A)(1 - P_B))$
NAND    $P_A P_B \times (1 - P_A P_B)$
AND     $(1 - P_A P_B) \times P_A P_B$
XOR     $(1 - (P_A + P_B - 2 P_A P_B)) \times (P_A + P_B - 2 P_A P_B)$
[Figure: input A (0.5) drives a gate with output X; X and input B (0.5) drive a gate with output Z]
For X: $P_{0\to1} = P_0 \times P_1 = (1 - P_A) P_A = 0.5 \times 0.5 = 0.25$
For Z: $P_{0\to1} = P_0 \times P_1 = (1 - P_X P_B) P_X P_B = (1 - (0.5 \times 0.5)) \times (0.5 \times 0.5) = 3/16$
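The closed-form expressions in the table can be applied gate by gate to propagate signal probabilities through a small network. The sketch below assumes independent signals, treats X as a single-input gate on A (so P(X=1) = 0.5), and treats Z as an AND of X and B, matching the formula used for Z; the helper names are illustrative.

```python
# Sketch: signal-probability propagation using the table's formulas.
# Assumes independent signals; helper names are hypothetical.
from fractions import Fraction


def p_out_one(gate, pa, pb):
    """P(output = 1) of a 2-input gate given P(A=1) and P(B=1)."""
    if gate == "AND":
        return pa * pb
    if gate == "NAND":
        return 1 - pa * pb
    if gate == "OR":
        return 1 - (1 - pa) * (1 - pb)
    if gate == "NOR":
        return (1 - pa) * (1 - pb)
    if gate == "XOR":
        return pa + pb - 2 * pa * pb
    raise ValueError(f"unknown gate: {gate}")


def p_transition(p_one):
    """P0->1 = P(out=0) * P(out=1)."""
    return (1 - p_one) * p_one


half = Fraction(1, 2)
p_x = half                            # single-input gate on A with PA = 0.5
activity_x = p_transition(p_x)        # 0.5 * 0.5
p_z = p_out_one("AND", p_x, half)     # Z = AND(X, B)
activity_z = p_transition(p_z)
print(activity_x, activity_z)
```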
Inter-signal Correlations
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
reconvergent fan-out
[Figure: inputs A (0.5) and B (0.5) drive a gate with output X; X and B drive a second gate with output Z, so B reaches Z along two paths (reconvergent fan-out)]
For X: (1-0.5)(1-0.5) x (1-(1-0.5)(1-0.5)) = 3/16
P(Z=1) = P(B=1) * P(X=1 | B=1) = 0.5 * 1 = 0.5
P(Z=0) = 1 – P(B=1) * P(X=1 | B=1) = 0.5
P(0->1) = 0.5 * 0.5 = 0.25
Have to use conditional probabilities
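The effect of reconvergent fan-out can be checked by enumerating the primary inputs instead of multiplying node probabilities. The gate types are an assumption here: X = A OR B and Z = X AND B, which is consistent with the slide's 3/16 for X and with P(X=1 | B=1) = 1.

```python
# Sketch: exact versus naive switching activity with reconvergent fan-out.
# ASSUMPTION: X = A OR B, Z = X AND B (inferred, not stated on the slide).
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Exact: enumerate primary inputs so the shared signal B is handled correctly.
p_z_one = Fraction(0)
for a, b in product((0, 1), repeat=2):
    weight = (half if a else 1 - half) * (half if b else 1 - half)
    x = a | b
    z = x & b
    if z:
        p_z_one += weight
exact_activity = (1 - p_z_one) * p_z_one

# Naive: treat X and B as independent at the inputs of Z.
p_x_one = 1 - (1 - half) * (1 - half)
p_z_naive = p_x_one * half
naive_activity = (1 - p_z_naive) * p_z_naive

print(exact_activity, naive_activity)
```

The naive estimate (15/64) differs from the exact value (1/4) because X is not independent of B.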
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: $P_{0\to1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
[Figure: 4-input AND of A, B, C, and D (each 0.5) built from 2-input gates in two ways]
Chain implementation (internal nodes W and X, output F): W: (1-0.25)*0.25 = 3/16, X: 7/64, F: 15/256
Tree implementation (internal nodes Y and Z, output F): Y: 3/16, Z: 3/16, F: 15/256
Chain implementation has a lower overall switching activity than the tree implementation for random inputs
Ignores glitching effects
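The chain and tree totals can be compared by summing P0→1 over every gate output. This sketch assumes independent inputs with probability 0.5 and that the chain computes W = A·B, X = W·C, F = X·D while the tree computes Y = A·B, Z = C·D, F = Y·Z; glitching is ignored, as on the slide.

```python
# Sketch: total switching activity of chain versus tree 4-input AND.
# Assumes independent inputs; the wiring is inferred from the node labels.
from fractions import Fraction


def and_gate(pa, pb):
    """Return (P(out=1), P0->1) for a 2-input AND with independent inputs."""
    p_one = pa * pb
    return p_one, (1 - p_one) * p_one


p = Fraction(1, 2)

# Chain: W = A.B, X = W.C, F = X.D
p_w, act_w = and_gate(p, p)
p_x, act_x = and_gate(p_w, p)
_, act_f_chain = and_gate(p_x, p)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A.B, Z = C.D, F = Y.Z
p_y, act_y = and_gate(p, p)
p_z, act_z = and_gate(p, p)
_, act_f_tree = and_gate(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(chain_total, tree_total)
```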
Input Ordering
[Figure: two orderings of a 3-input AND built from two 2-input gates, with input probabilities A = 0.5, B = 0.2, and C = 0.1]
A and B combined first (X), then C: at X, (1-0.5x0.2)x(0.5x0.2) = 0.09
B and C combined first (X), then A: at X, (1-0.2x0.1)x(0.2x0.1) = 0.0196
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
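The ordering rule can be checked by trying every input order for a chain of 2-input AND gates and summing the activity at each gate output. This is a sketch under the assumption of independent inputs; `chain_activity` is a hypothetical helper.

```python
# Sketch: search all input orderings of an AND chain for the lowest
# total switching activity. Assumes independent inputs.
from fractions import Fraction
from itertools import permutations


def chain_activity(probs):
    """Sum of P0->1 over all gate outputs when inputs enter in this order."""
    p_one = probs[0]
    total = Fraction(0)
    for p in probs[1:]:
        p_one *= p                      # output probability of the next AND
        total += (1 - p_one) * p_one    # activity at that output
    return total


inputs = {"A": Fraction(1, 2), "B": Fraction(1, 5), "C": Fraction(1, 10)}

best_order = min(
    permutations(inputs),
    key=lambda order: chain_activity([inputs[name] for name in order]),
)
print(best_order, chain_activity([inputs[name] for name in best_order]))
```

In the best ordering A, the signal with probability 0.5, is introduced last.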
Glitching in Static CMOS Networks
Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards)
glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: inputs A and B drive a gate with output X; X and C drive a gate with output Z; unit-delay waveforms of X and Z as ABC changes from 101 to 000]
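A unit-delay simulation makes the glitch visible. The gate types are an assumption, since the extracted slide text does not name them: the sketch uses X = NOR(A, B) and Z = NOR(X, C). With ABC going from 101 to 000, Z starts and ends at 0 but pulses to 1 for one delay unit.

```python
# Sketch: unit-delay simulation of a two-gate network.
# ASSUMPTION: X = NOR(A, B), Z = NOR(X, C); gate types are not given in the text.


def nor(a, b):
    """2-input NOR on 0/1 values."""
    return int(not (a or b))


def simulate(before, after, steps=4):
    """Return [(time, X, Z)] after the inputs switch from `before` to `after`."""
    a, b, c = before
    x = nor(a, b)          # settled value of X before the input change
    z = nor(x, c)          # settled value of Z before the input change
    a, b, c = after        # all inputs switch at time 0
    trace = [(0, x, z)]
    for t in range(1, steps + 1):
        # Each output at time t is computed from signal values at time t-1.
        x, z = nor(a, b), nor(x, c)
        trace.append((t, x, z))
    return trace


trace = simulate(before=(1, 0, 1), after=(0, 0, 0))
z_wave = [z for _, _, z in trace]
z_transitions = sum(1 for u, v in zip(z_wave, z_wave[1:]) if u != v)
print(trace)
```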
Glitching in an RCA
[Figure: sum output voltages (V, 0 to 3) versus time (ps, 0 to 12) for a ripple-carry adder, showing Cin and sum bits S0, S1, S2, S3, S4, S5, S10, S14, and S15]
Balanced Delay Paths to Reduce Glitching
Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
[Figure: two networks built from gates F1, F2, and F3, annotated with signal arrival times of 0, 1, and 2]
So equalize the lengths of timing paths through logic
Dynamic Power as a Function of VDD
Decreasing the VDD decreases dynamic energy consumption (quadratically)
But, increases gate delay (decreases performance)
[Figure: normalized tp (1 to 5.5) versus VDD (0.8 V to 2.4 V)]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
Multiple VDD Considerations
How many VDD? – Two is becoming common
Many chips already have two supplies (one for core and one for I/O)
When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
• The cross-coupled PMOS transistors do the level conversion
• The NMOS transistors operate on a reduced supply
[Figure: level converter with input Vin, output Vout, and supplies VDDH and VDDL]
Level converters are not needed for a step-down change in voltage
Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47)
Dual-Supply Inside a Logic Block
Minimum energy consumption is achieved if all logic
paths are critical (have the same delay)
Clustered voltage-scaling
Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
Level conversion is done in the flipflops at the end of the paths
Stack Effect
Leakage is a function of the circuit topology and the value of the inputs
$V_T = V_{T0} + \gamma(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|})$
where VT0 is the threshold voltage at VSB = 0; VSB is the source-bulk (substrate) voltage; γ is the body-effect coefficient
[Figure: gate with inputs A and B and output Out; the two stacked transistors driven by A and B share the intermediate node VX]
A   B   VX            ISUB
0   0   VT ln(1+n)    VGS = VBS = -VX
0   1   0             VGS = VBS = 0
1   0   VDD - VT      VGS = VBS = 0
1   1   0             VSG = VSB = 0
Leakage is least when A = B = 0
Leakage reduction due to stacked transistors is called the stack effect
Short Channel Factors and Stack Effect
In short-channel devices, the subthreshold leakage current depends on VGS, VBS and VDS. The VT of a short-channel device decreases with increasing VDS due to DIBL (drain-induced barrier lowering).
Typical values for DIBL are 20 to 150mV change in VT per volt change in VDS so the stack effect is even more significant for short-channel devices.
VX reduces the drain-source voltage of the top nfet, increasing its VT and lowering its leakage
For our 0.25 micron technology, VX settles to ~100mV in steady state so VBS = -100mV and VDS = VDD - 100mV, giving a leakage that is 20 times smaller than the leakage of a device with VBS = 0mV and VDS = VDD
Leakage as a Function of Design Time VT
Reducing the VT increases the sub-threshold leakage current (exponentially)
90mV reduction in VT increases leakage by an order of magnitude
But, reducing VT decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (V, 0 to 1) for VT = 0.4V and VT = 0.1V]
Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
A careful assignment of VT’s can reduce the leakage by as much as 80%
Dual-Thresholds Inside a Logic Block
Minimum energy consumption is achieved if all logic
paths are critical (have the same delay)
Use lower threshold on timing-critical paths
Assignment can be done on a per gate or transistor basis;
no clustering of the logic is needed
No level converters are needed
Variable VT (ABB) at Run Time
$V_T = V_{T0} + \gamma(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|})$
• For an n-channel device, the substrate is normally tied to ground (VSB = 0)
• A negative bias on the substrate (VBS < 0, that is, VSB > 0) causes VT to increase
• Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
• Requires a dual well fab process
[Figure: VT (V, 0.4 to 0.9) versus substrate bias (V, -2.5 to 0)]
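The body-effect equation can be evaluated directly to see how far substrate bias moves VT. The parameter values below (VT0 = 0.43 V, γ = 0.4 V^0.5, φF = -0.3 V) are assumptions chosen for illustration and are not given on the slides. In the formula's sign convention, pulling the substrate below the source makes VSB positive.

```python
# Sketch: threshold voltage versus source-bulk voltage from the body-effect
# equation. V_T0, GAMMA, and PHI_F are assumed illustrative values.
import math

V_T0 = 0.43    # threshold voltage at V_SB = 0, volts (assumed)
GAMMA = 0.4    # body-effect coefficient, V^0.5 (assumed)
PHI_F = -0.3   # Fermi potential, volts (assumed)


def threshold_voltage(v_sb):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return V_T0 + GAMMA * (
        math.sqrt(abs(-2 * PHI_F + v_sb)) - math.sqrt(abs(-2 * PHI_F))
    )


for v_sb in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"V_SB = {v_sb:.1f} V -> V_T = {threshold_voltage(v_sb):.3f} V")
```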