Logic-Level Power Estimation ELEC 5270/6270 Fall 2007 Low-Power Design of Electronic Circuits

advertisement
ELEC 5270/6270 Fall 2007
Low-Power Design of Electronic Circuits
Logic-Level Power Estimation
Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Fall07/course.html
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
1
Power Analysis

Motivation:
Specification
 Optimization
 Reliability


Applications
Design analysis and optimization
 Physical design
 Packaging
 Test

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
2
Abstraction, Complexity, Accuracy
Abstraction level
Algorithm
Computing resources
Analysis accuracy
Least
Worst
Most
Best
Software and system
Hardware behavior
Register transfer
Logic
Circuit
Device
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
3
Spice

Circuit/device level analysis




Analysis is accurate but expensive


Circuit modeled as network of transistors, capacitors, resistors
and voltage/current sources.
Node current equations using Kirchhoff’s current law.
Average and instantaneous power computed from supply
voltage and device current.
Used to characterize parts of a larger circuit.
Original references:


L. W. Nagel and D. O. Pederson, “SPICE – Simulation Program
With Integrated Circuit Emphasis,” Memo ERL-M382, EECS
Dept., University of California, Berkeley, Apr. 1973.
L. W. Nagel, SPICE 2, A Computer program to Simulate
Semiconductor Circuits, PhD Dissertation, University of
California, Berkeley, May 1975.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
4
Logic Model of MOS Circuit
pMOS FETs
VDD
a
Da
a
Ca
c
Cc
b
Cb nMOS
FETs
Cd
Ca , Cb , Cc and Cd are
node capacitances
Copyright Agrawal, 2007
b
c
Dc
Db
Da and Db are
interconnect or
propagation delays
Dc is inertial delay
of gate
ELEC6270 Fall 07, Lecture 3
5
Spice Characterization of a 2-Input
NAND Gate
Input data pattern
Delay (ps)
Dynamic energy (pJ)
a=b=0→1
69
1.55
a = 1, b = 0 → 1
62
1.67
a = 0 → 1, b = 1
50
1.72
a=b=1→0
35
1.82
a = 1, b = 1 → 0
76
1.39
a = 1 → 0, b = 1
57
1.94
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
6
Spice Characterization (Cont.)
Input data pattern
Static power (pW)
a=b=0
5.05
a = 0, b = 1
13.1
a = 1, b = 0
5.10
a=b=1
28.5
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
7
Switch-Level Partitioning


Circuit partitioned into channel-connected components
for Spice characterization.
Reference: R. E. Bryant, “A Switch-Level Model and
Simulator for MOS Digital Systems,” IEEE Trans.
Computers, vol. C-33, no. 2, pp. 160-177, Feb. 1984.
Internal
switching
nodes not
seen by
logic
simulator
Copyright Agrawal, 2007
G2
G1
G3
ELEC6270 Fall 07, Lecture 3
8
Delay and Discrete-Event Simulation
Inputs
(NAND gate)
Transient
region
a
b
c (CMOS)
Logic simulation
c (zero delay)
c (unit delay)
X
c (multiple delay)
Unknown (X)
c (minmax delay)
0
Copyright Agrawal, 2007
5
ELEC6270 Fall 07, Lecture 3
rise=5, fall=5
min =2, max =5
Time units
9
Event-Driven Simulation Example
Scheduled
events
a =1
c =1→0
e =1
g =1
2
2
d=0
4
b =1
f =0
Time stack
2
t=0
1
2
3
4
5
6
7
8
Activity
list
c=0
d, e
d = 1, e = 0
f, g
g=0
f=1
g
g=1
g
0
Copyright Agrawal, 2007
4
8
Time, t
ELEC6270 Fall 07, Lecture 3
10
Time Wheel (Circular Stack)
Current
time
pointer
max
t=0
1
Event link-list
2
3
4
5
6
7
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
11
Gate-Level Power Analysis

Pre-simulation analysis:
Partition circuit into channel connected gate
components.
 Determine node capacitances from layout
analysis (accurate) or from wire-load model*
(approximate).
 Determine dynamic and static power from
Spice for each gate.
 Determine gate delays using Spice or Elmore
delay model.

* Wire-load model estimates capacitance of a net by its pin-count.
See Yeap, p. 39.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
12
Elmore Delay Model

W. Elmore, “The Transient Response of Damped Linear Networks
with Particular Regard to Wideband Amplifiers,” J. Appl. Phys., vol.
19, no.1, pp. 55-63, Jan. 1948.
2
R2
C2
1
R1
s
4
R4
C1
C4
R3
3
Shared resistance:
R5
C3
R45 = R1 + R3
R15 = R1
R34 = R1 + R3
Copyright Agrawal, 2007
5
C5
ELEC6270 Fall 07, Lecture 3
13
Elmore Delay Formula
N
Delay at node k = 0.69 Σ Cj × Rjk
j=1
where N = number of capacitive nodes in the network
Example:
Delay at node 5 = 0.69[R1 C1 + R1 C2 + (R1+R3)C3 + (R1+R3)C4
+ (R1+R3+R5)C5]
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
14
Gate-Level Power Analysis (Cont.)


Run discrete-event (event-driven) logic
simulation with a set of input vectors.
Monitor the toggle count of each net and obtain
capacitive power dissipation:
Pcap = Σ Ck V 2 f
all nodes k

Where:



Ck is the total node capacitance being switched, as
determined by the simulator.
V is the supply voltage.
f is the clock frequency, i.e., the number of vectors applied
per unit time
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
15
Gate-Level Power Analysis (Cont.)

Monitor dynamic energy events at the
input of each gate and obtain internal
switching power dissipation:
Pint = Σ
Σ
E(g,e) F(g,e)
gates g

events e
Where
 E(g,e)
= energy of event e of gate g, pre-computed
from Spice.
 F(g,e) = occurrence frequency of the event e at
gate g, observed by logic simulation.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
16
Gate-Level Power Analysis (Cont.)

Monitor the static power dissipation state of each
gate and obtain the static power dissipation:
Pstat = Σ
gates g

Σ P(g,s) T(g,s)/ T
states s
Where



P(g,s) = static power dissipation of gate g for state s,
obtained from Spice.
T(g,s) = duration of state s at gate g, obtained from logic
simulation.
T = vector period.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
17
Gate-Level Power Analysis

Sum up all three components of power:

P = Pcap + Pint + Pstat
References:



A. Deng, “Power Analysis for CMOS/BiCMOS Circuits,” Proc.
International Workshop Low Power Design, 1994.
J. Benkoski, A. C. Deng, C. X. Huang, S. Napper and J.
Tuan, “Simulation Algorithms, Power Estimation and
Diagnostics in PowerMill,” Proc. PATMOS, 1995.
C. X. Huang, B. Zhang, A. C. Deng and B. Swirski, “The
Design and Implementation of PowerMill,” Proc. International
Symp. Low Power Design, 1995, pp. 105-109.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
18
Probabilistic Analysis

View signals as a random processes
Prob{s(t) = 1} = p1
p0 = 1 – p1
C
0→1 transition probability = (1 – p1) p1
Power, P = (1 – p1) p1 CV2fck
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
19
Source of Inaccuracy
p1 = 0.5
P = 0.5CV 2 fck
1/fck
p1 = 0.5
P = 0.33CV 2 fck
p1 = 0.5
P = 0.167CV 2 fck
Observe that the formula, Power, P = (1 – p1) p1 C V 2 fck , is not
Correct.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
20
Switching Frequency
Number of transitions per unit time:
T
=
N(t)
───
t
For a continuous signal:
T
=
lim
t→∞
N(t)
───
t
T is defined as transition density.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
21
Static Signal Probabilities

Observe signal for interval t 0 + t 1
Signal is 1 for duration t 1
 Signal is 0 for duration t 0
 Signal probabilities:



p 1 = t 1/(t 0 + t 1)
p 0 = t 0/(t 0 + t 1) = 1 – p 1
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
22
Static Transition Probabilities

Transition probabilities:




T 01 = p 0 Prob{signal is 1 | signal was 0} = p 0 p1
T 10 = p 1 Prob{signal is 0 | signal was 1} = p 1 p 0
T = T 01 + T 10 = 2 p 0 p 1 = 2 p 1 (1 – p 1)
Transition density: T = 2 p 1 (1 – p 1)
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
23
Static Transition Frequency
f = p1(1 – p1)
0.25
0.2
0.1
0.0
0
Copyright Agrawal, 2007
0.25
0.5
p1
ELEC6270 Fall 07, Lecture 3
0.75
1.0
24
Inaccuracy in Transition Density
p1 = 0.5
T = 1.0
1/fck
p1 = 0.5
T = 4/6
p1 = 0.5
T = 1/6
Observe that the formula, T = 2 p1 (1 – p1), is not correct.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
25
Cause for Error and Correction
Probability of transition is not independent of
the present state of the signal.
 Determine probability p 01 of a 0→1
transition.
 Recognize p 01 ≠ p 0 × p 1
 We obtain p 1 = (1 – p 1)p 01 + p 1 p 11
p 01
p 1 = ─────────
1 – p 11 + p 01

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
26
Correction (Cont.)
Since p 11 + p 10 = 1, i.e., given that the
signal was previously 1, its present value
can be either 1 or 0.
 Therefore,
p 01
p 1 = ──────
p 10 + p 01
This uniquely gives signal probability as a
function of transition probabilities.

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
27
Transition and Signal Probabilities
p01 = p10 = 0.5
p1 = 0.5
1/fck
p01 = p10 = 1/3
p1 = 0.5
p01 = p10 = 1/6
p1 = 0.5
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
28
Probabilities: p0, p1, p00, p01, p10, p11



p 01 + p 00 =1
p 11 + p 10 = 1
p0=1–p1
p 01
p 1 = ───────
p 10 + p 01
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
29
Transition Density

T = 2 p 1 (1 – p 1) = p 0 p 01 + p 1 p 10
= 2 p 10 p 01 / (p 10 + p 01)
= 2 p 1 p 10 = 2 p 0 p 01
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
30
Power Calculation
Power can be estimated if transition
density is known for all signals.
 Calculation of transition density requires

Signal probabilities
 Transition densities for primary inputs;
computed from vector statistics

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
31
Signal Probabilities
x1
x1 x2
x2
x1
x1 + x2 – x1x2
x2
1 - x1
x1
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
32
Signal Probabilities
x1
x2
x3
X1
0
0
0
0
1
1
1
1
0.5
x1 x2
0.25
0.5
0.625
0.5
X2
0
0
1
1
0
0
1
1
Copyright Agrawal, 2007
X3
0
1
0
1
0
1
0
1
Y
1
0
1
0
1
0
1
1
y = 1 - (1 - x1x2) x3
= 1 - x3 + x1x2x3
= 0.625
Ref: K. P. Parker and E. J. McCluskey,
“Probabilistic Treatment of General
Combinational Networks,” IEEE Trans.
on Computers, vol. C-24, no. 6, pp. 668670, June 1975.
ELEC6270 Fall 07, Lecture 3
33
Correlated Signal Probabilities
x1
0.5
x1 x2
0.5
0.25
0.625?
x2
X1
0
0
1
1
Copyright Agrawal, 2007
X2
0
1
0
1
y = 1 - (1 - x1x2) x2
= 1 – x2 + x1x2x2
= 1 – x2 + x1x2
= 0.75 (correct value)
Y
1
0
1
1
ELEC6270 Fall 07, Lecture 3
34
Correlated Signal Probabilities
x1
x2
0.5
x1 + x2 – x1x2
0.5
0.75
X1
0
0
1
1
Copyright Agrawal, 2007
X2
0
1
0
1
0.375?
y = (x1 + x2 – x1x2) x2
= x1x2 + x2x2 – x1x2x2
= x1x2 + x2 – x1x2
= x2
= 0.5 (correct value)
Y
0
1
0
1
ELEC6270 Fall 07, Lecture 3
35
Observation

Numerical computation of signal
probabilities is accurate for fanout-free
circuits.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
36
Remedies
Use Shannon’s expansion theorem to
compute signal probabilities.
 Use Boolean difference formula to
compute transition densities.

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
37
Shannon’s Expansion Theorem


C. E. Shannon, “A Symbolic Analysis of Relay
and Switching Circuits,” Trans. AIEE, vol. 57,
pp. 713-723, 1938.
Consider:




Boolean variables, X1, X2, . . . , Xn
Boolean function, F(X1, X2, . . . , Xn)
Then F = Xi F(Xi=1) + Xi’ F(Xi=0)
Where


Xi’ is complement of X1
Cofactors, F(Xi=j) = F(X1, X2, . . , Xi=j, . . , Xn), j = 0 or 1
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
38
Expansion About Two Inputs
F = XiXj F(Xi=1, Xj=1) + XiXj’ F(Xi=1, Xj=0)
+ Xi’Xj F(Xi=0, Xj=1)
+ Xi’Xj’ F(Xi=0, Xj=0)
 In general, a Boolean function can be
expanded about any number of input
variables.
 Expansion about k variables will have 2k
terms.

Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
39
Correlated Signal Probabilities
X1
X1 X2
Y = X1 X2 + X2’
X2
X1
0
0
1
1
Copyright Agrawal, 2007
X2
0
1
0
1
Y
1
0
1
1
Shannon expansion about the
reconverging input, X2:
Y = X2 Y(X2 = 1) + X2’ Y(X2 = 0)
= X2 (X1) + X2’ (1)
ELEC6270 Fall 07, Lecture 3
40
Correlated Signals

When the output function is expanded about all
reconverging input variables,




All cofactors correspond to fanout-free circuits.
Signal probabilities for cofactor outputs can be calculated
without error.
A weighted sum of cofactor probabilities gives the correct
probability of the output.
For two reconverging inputs:
f = xixj f(Xi=1, Xj=1) + xi(1-xj) f(Xi=1, Xj=0)
+ (1-xi)xj f(Xi=0, Xj=1) + (1-xi)(1-xj) f(Xi=0, Xj=0)
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
41
Correlated Signal Probabilities
X1
X1 X2
Y = X1 X2 + X2’
X2
X1
0
0
1
1
X2
0
1
0
1
Y
1
0
1
1
Shannon expansion about the
reconverging input, X2:
Y = X2 Y(X2=1) + X2’ Y(X2=0)
= X2 (X1) + X2’ (1)
y = x2 (0.5) + (1-x2) (1)
= 0.5 (0.5) + (1-0.5) (1)
= 0.75
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
42
Example
0.5
0.5
1
0
0.5
Reconv.
signal
Supergate
0.25
0.5
0.0
Point of
reconv.
0.0 0.5
1.0
0.5
1.0
0.375
Signal probability for supergate output
= 0.5 Prob{rec. signal = 1} + 1.0 Prob{rec. signal = 0}
= 0.5 × 0.5 + 1.0 × 0.5 = 0.75
S. C. Seth and V. D. Agrawal, “A New Model for Computation of
Probabilistic Testability in Combinational Circuits,” Integration, the VLSI
Journal, vol. 7, no. 1, pp. 49-75, April 1989.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
43
Probability Calculation Algorithm

Partition circuit into supergates.



Definition: A supergate is a circuit partition with a single output
such that all fanouts that reconverge at the output are contained
within the supergate.
Identify reconverging and non-reconverging inputs
of each supergate.
Compute signal probabilities from PI to PO:

For a supergate whose input probabilities are known



Copyright Agrawal, 2007
Enumerate reconverging input states
For each input state do gate by gate probability computation
Sum up corresponding signal probabilities, weighted by state
probabilities
ELEC6270 Fall 07, Lecture 3
44
Calculating Transition Density
..
..
.
1
x1, T1
xn, Tn
Copyright Agrawal, 2007
Boolean
function
y, T(Y) = ?
n
ELEC6270 Fall 07, Lecture 3
45
Boolean Difference
Boolean diff(Y, Xi) =



∂Y
── = Y(Xi=1) ⊕ Y(Xi=0)
∂Xi
Boolean diff(Y, Xi) = 1 means that a path is sensitized from input
Xi to output Y.
Prob(Boolean diff(Y, Xi) = 1) is the probability of transmitting a
toggle from Xi to Y.
Probability of Boolean difference is determined from the
probabilities of cofactors of Y with respect to Xi.
F. F. Sellers, M. Y. Hsiao and L. W. Bearnson, “Analyzing Errors with
the Boolean Difference,” IEEE Trans. on Computers, vol. C-17, no. 7,
pp. 676-683, July 1968.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
46
Transition Density
n
T(y) = Σ T(Xi) Prob(Boolean diff(Y, Xi) = 1)
i=1
F. Najm, “Transition Density: A New Measure of Activity in Digital
Circuits,” IEEE Trans. CAD, vol. 12, pp. 310-323, Feb. 1993.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
47
Power Computation



For each primary input, determine signal probability and
transition density for given vectors.
For each internal node and primary output Y, find the
transition density T(Y), using supergate partitioning and
the Boolean difference formula.
Compute power,
Σ
0.5CY V2 T(Y)
all Y
where CY is the capacitance of node Y and V is supply
voltage.
P=
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
48
Transition Density and Power
X1
X2
X3
0.2, 1
0.3, 2
0.06, 0.7
0.436, 3.24
Ci
Y
0.4, 3
CY
Transition density
Signal probability
Power = 0.5 V 2 (0.7Ci + 3.24CY)
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
49
Prob. Method vs. Logic Sim.
Probability method
Logic Simulation
Av. density
CPU s*
Error
%
0.52
3.39
63
+2.1
11.36
0.58
8.57
241
+29.8
383
2.78
1.06
3.25
132
-14.5
C1355
346
4.19
1.39
6.18
408
-32.2
C1908
880
2.97
2.00
5.01
464
-40.7
C2670
1193
3.50
3.45
4.00
619
-12.5
C3540
1669
4.47
3.77
4.49
1082
-0.4
C5315
2307
3.52
6.41
4.79
1616
-26.5
C6288
2406
25.10
5.67
34.17
31057
-26.5
3.83
9.85
5.08
2713
-24.2
Circuit
No. of
gates
C432
160
3.46
C499
202
C880
C7552 3512
* CONVEX c240
Copyright Agrawal, 2007
Av. density CPU s*
ELEC6270 Fall 07, Lecture 3
50
Probability Waveform Methods




F. Najm, R. Burch, P. Yang and I. Hajj, “CREST – A
Current Estimator for CMOS Circuits,” Proc. IEEE Int.
Conf. on CAD, Nov. 1988, pp. 204-207.
C.-S. Ding, et al., “Gate-Level Power Estimation using
Tagged Probabilistic Simulation,” IEEE Trans. on CAD,
vol. 17, no. 11, pp. 1099-1107, Nov. 1998.
F. Hu and V. D. Agrawal, “Dual-Transition Glitch
Filtering in Probabilistic Waveform Power Estimation,”
Proc. IEEE Great Lakes Symp. VLSI, Apr. 2005, pp.
357-360.
F. Hu and V. D. Agrawal, “Enhanced Dual-Transition
Probabilistic Power Estimation with Selective
Supergate Analysis,” Proc. IEEE Int. Conf. Computer
Design, Oct. 2005. pp. 366-369.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
51
Problem 1
For equiprobable inputs analyze the 0→1 transition probabilities of all
gates in the two implementations of a four-input AND gate shown
below. Assuming that the gates have zero delays, which
implementation will consume less average dynamic power?
E
A
B
C
D
F
G
A
B
C
D
G
F
Tree structure
Chain structure
Copyright Agrawal, 2007
E
ELEC6270 Fall 07, Lecture 3
52
Problem 1 Solution
Given the primary input probabilities, P(A) = P(B) = P(C) = P(D) = 0.5,
signal and transition (0→1) probabilities are as follows:
Chain
Tree
Signal
name
Prob(sig.= 1)
Prob(0→1)
Prob(sig.=1)
Prob(0→1)
E
0.2500
0.1875
0.2500
0.1875
F
0.1250
0.1094
0.2500
0.1875
G
0.0625
0.0586
0.0625
0.0586
Total
transitions/vector
0.3555
0.4336
The tree implementation consumes 100×(0.4336 – 0.3555)/0.3555 = 22%
more average dynamic power. This advantage of the chain structure may
be somewhat reduced because of glitches caused by unbalanced path
delays.
Copyright Agrawal, 2007
ELEC6270 Fall 07, Lecture 3
53
Problem 2
Assume that the two-input AND gates in Problem 1 each has one unit
of delay. Find input vector pairs for each implementation that will
consume the peak dynamic power. Which implementation consumes
less peak dynamic power?
E
A
B
C
D
F
G
A
B
C
D
G
F
Tree structure
Chain structure
Copyright Agrawal, 2007
E
ELEC6270 Fall 07, Lecture 3
54
Problem 2 Solution
For the chain structure, a vector pair {A B C D} = {1110},{1011} will
produce four gate transitions as shown below.
A
E
F
B
C
D
G
A=11
B=10
E=10
C=11
F=10
D=01
G=00
0
Copyright Agrawal, 2007
1
2
3
ELEC6270 Fall 07, Lecture 3
Time units
55
Problem 2 Solution (Cont.)
The tree structure has balanced delay paths. So it cannot make more
than 3 gate transitions. A vector pair {ABCD} = {1111},{1010} will
produce three transitions as shown below.
A
E
B
G
C
D
F
A=11
Therefore, just counting the gate
transitions, we find that the chain
consumes 100(4 – 3)/3 = 33%
higher peak power than the tree.
B=10
E=10
C=11
D=10
F=10
G=10
Time units
0
1
2
Copyright Agrawal, 2007
3
ELEC6270 Fall 07, Lecture 3
56
Download