Logical Effort

Introduction to CMOS VLSI Design
Lecture 7B: Logical Effort
Outline
‰ Introduction
‰ Delay in a Logic Gate
‰ Multistage Logic Networks
‰ Choosing the Best Number of Stages
‰ Example
‰ Summary
Lecture by Jay Brockman
University of Notre Dame Fall 2008
Modified by Peter Kogge Fall 2009
Based on lecture slides by David Harris, Harvey Mudd College
http://www.cmosvlsi.com/coursematerials.html
6: Logical Effort
Slide 1
Review
‰ Assume gate G1 is driving some # of other gates G2
– Fanout = number of such gates being driven
‰ Delay of a gate G1 = parasitic delay + effort delay
‰ Parasitic Delay = delay if gate G1 is driving 0 load
– Function of diffusion capacitance in gate
– Delay seen when G1 drives no other circuits
‰ Effort Delay: delay due to capacitance of circuits
driven by G1; function of
– the number of gates of type G2 being driven
– the input capacitance presented by a G2 gate
Normalized Delay
‰ Normalized Delay: gate delay relative to:
– “fanout of 1” inverter (drives one inverter)
– with no parasitic capacitance
– of value 3RC
• R = equivalent resistance of unit nmos transistor in
saturation
• C = equivalent gate capacitance of unit nmos
transistor
Normalized Parasitic Delay
‰ Inverter has 3 units of diffusion capacitance
– 2 from pmos
– 1 from nmos
‰ Parasitic delay of inverter is τ =3RC
‰ Normalized Parasitic Delay of a gate is p/3RC
– p is parasitic delay from Elmore model
‰ To convert to psec, multiply by parasitic delay of an
inverter in chosen technology
Normalized Effort Delay
‰ h = fanout or electrical effort = property of circuit
– = # of equivalent G1 gates being driven by G1
– = Cout/Cin where
• Cout = total load capacitance presented by G2 inputs
• Cin = input capacitance G1 presents to its own driver
– When gates G2 (those being driven) are type G1, then
• h = # of copies
‰ g = logical effort to drive a gate of type G2
– effort required to drive one gate vs perfect inverter
– how many equivalent inverters one G2 gate looks like
– = (Input cap of G2 gate)/(Input cap of inverter)
– = (Input cap of G2 gate)/3
‰ Inverter has logical effort = 1
Summary
[Figure: gate G1, modeled as resistance R, charges its own diffusion capacitance Cdiff plus the load h*Cin]
‰ Rise Time (Fall Time is similar)
– Cin = input cap per load
– h = # of loads
‰ delay = R*Cdiff + h*R*Cin
‰ relative-delay = (R*Cdiff + h*R*Cin) / 3RC = pG1 + h*gG2
– h = # of G2s
– gG2 = logical effort of type G2

Common Gates

g = Logical Effort
                    Number of inputs
Gate type           1     2       3           4               n
Inverter            1
NAND                      4/3     5/3         6/3             (n+2)/3
NOR                       5/3     7/3         9/3             (2n+1)/3
Tristate / mux      2     2       2           2               2
XOR, XNOR                 4, 4    6, 12, 6    8, 16, 16, 8

p = Normalized Parasitic Delay
                    Number of inputs
Gate type           1     2       3           4               n
Inverter            1
NAND                      2       3           4               n
NOR                       2       3           4               n
Tristate / mux      2     4       6           8               2n
XOR, XNOR                 4       6           8
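As a rough sketch, the inverter, NAND, NOR, and mux rows of the two tables and the relative-delay expression pG1 + h*gG2 can be written as small helper functions. The function names and the gate-type strings are illustrative choices, not part of the lecture, and the model assumes a unit-strength driving gate as on the Summary slide.

```python
from fractions import Fraction


def logical_effort(gate, n=1):
    """Logical effort g for an n-input gate, taken from the table above."""
    if gate == "inverter":
        return Fraction(1)
    if gate == "nand":
        return Fraction(n + 2, 3)
    if gate == "nor":
        return Fraction(2 * n + 1, 3)
    if gate == "mux":
        return Fraction(2)
    raise ValueError(f"unknown gate type: {gate}")


def parasitic_delay(gate, n=1):
    """Normalized parasitic delay p for an n-input gate, from the table above."""
    if gate == "inverter":
        return Fraction(1)
    if gate in ("nand", "nor"):
        return Fraction(n)
    if gate == "mux":
        return Fraction(2 * n)
    raise ValueError(f"unknown gate type: {gate}")


def relative_delay(driver, driver_inputs, load, load_inputs, n_loads):
    """p of the driving gate G1 plus h * g of the driven gate type G2."""
    p_g1 = parasitic_delay(driver, driver_inputs)
    g_g2 = logical_effort(load, load_inputs)
    return p_g1 + n_loads * g_g2


if __name__ == "__main__":
    # Inverter driving four inverters (the fanout-of-4 case).
    print(relative_delay("inverter", 1, "inverter", 1, 4))
    # 2-input NAND driving three 2-input NORs.
    print(relative_delay("nand", 2, "nor", 2, 3))
```

Exact fractions are used so that values such as 4/3 and 5/3 are not rounded.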
n-input NAND Gates
parasitic delay = (N^2/2 + 5N/2) RC
SLOW!!!!!!
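To see how quickly the quadratic term grows, the expression above can be normalized by 3RC (the inverter parasitic delay) and compared with the linear table value p = n. This is a minimal sketch; the function names are made up for illustration.

```python
from fractions import Fraction


def nand_parasitic_elmore(n):
    """(n^2/2 + 5n/2) RC from the slide, divided by 3RC to normalize."""
    return (Fraction(n * n, 2) + Fraction(5 * n, 2)) / 3


def nand_parasitic_table(n):
    """Linear estimate p = n from the parasitic delay table."""
    return Fraction(n)


if __name__ == "__main__":
    for n in (2, 4, 8):
        elmore = float(nand_parasitic_elmore(n))
        table = float(nand_parasitic_table(n))
        print(f"n={n}: Elmore {elmore:.2f}, table {table:.2f}")
```

The gap between the two estimates widens as n grows, which is the point of the "SLOW" remark.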
MultiStage Logic Networks
Relative Input Capacitance
(based on gate design & transistor size)
g_i = logical effort to drive a gate of type i = input cap/input cap of inverter
h_i = fanout of gates of type i = load cap/input cap
Question: if delay thru one gate is p + hg,
can we write delay thru multistage as some P+HG?
Scaling Transistors
‰ What if all transistors in gate G got wider by k?
– Denote as gate “G(k)”
‰ Parasitic delay of G(k): delay of unloaded gate
– Diffusion capacitance increases by k
– Resistance decreases by k
– Result: No change
‰ Effort delay: ratio of load cap to input cap
– If driving the same # of G(k) as before, no change
– If driving the same # of G(1) as before, effort delay decreases by a factor of k
‰ Result: for the same effort delay, G(k) can drive k times as many G(1) gates
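The scaling argument can be checked with a small RC sketch. It assumes a unit inverter with R = 1, diffusion capacitance 3, and input capacitance 3 (in units of R and C); the helper names are hypothetical.

```python
def widen(r, c_diff, c_in, k):
    """Widen every transistor by k: resistance drops by k, capacitances grow by k."""
    return r / k, c_diff * k, c_in * k


def parasitic(r, c_diff):
    """Delay of the unloaded gate."""
    return r * c_diff


def effort(r, c_load):
    """Delay contribution from the external load."""
    return r * c_load


if __name__ == "__main__":
    r1, cd1, cin1 = 1.0, 3.0, 3.0          # assumed unit inverter G(1)
    k = 4
    rk, cdk, cink = widen(r1, cd1, cin1, k)

    print(parasitic(r1, cd1), parasitic(rk, cdk))      # parasitic delay is unchanged
    print(effort(r1, 4 * cin1), effort(rk, 4 * cink))  # same number of G(k) loads: unchanged
    print(effort(r1, 4 * cin1), effort(rk, 4 * cin1))  # same number of G(1) loads: k times smaller
```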
Overall Delay
‰ delay = ∑ delay(i)
– where delay(i) = delay through the i-th “stage” of logic
‰ delay(i) = p_i + h_i * g_i
– p_i function only of gate type at stage i
– g_i function only of gate type at stage i
• input cap/input cap of inverter
– h_i depends on gates at stage i+1
• total load on gate i/input cap of gate i
‰ delay = ∑(p_i + h_i * g_i) = ∑(p_i) + ∑(h_i * g_i)
‰ Can we write ∑(h_i * g_i) as some H*G?
Definitions
[Figure: four-stage path with input capacitance 10, stage input capacitances x, y, z, and load 20]
g_1 = 1, h_1 = x/10
g_2 = 5/3, h_2 = y/x
g_3 = 4/3, h_3 = z/y
g_4 = 1, h_4 = 20/z
‰ Path Logical Effort
G = ∏ g_i
‰ Path Electrical Effort
H = Cout-path / Cin-path
‰ Path Effort
F = ∏ f_i = ∏ g_i h_i
Question: Can we write F = GH?
Paths that Branch
‰ No! Consider paths that branch:
[Figure: an inverter with input capacitance 5 drives a branch point feeding two inverters of input capacitance 15, each of which drives a load of 90]
‰ Individual terms
g_1 = g_2 = 1 (inverters)
h_1 = (15 + 15) / 5 = 6
h_2 = 90 / 15 = 6
‰ Path Terms
G = ∏ g_i = 1 x 1 = 1
H = Cout/Cin = 90 / 5 = 18
GH = 18
F = g_1 h_1 g_2 h_2 = 36 = 2GH != GH
Branching Effort
‰ Introduce Branching Effort
– Accounts for branching between stages in path
b = (Con-path + Coff-path) / Con-path
B = ∏ b_i
Note: ∏ h_i = BH
‰ Now we compute the Path Effort
– F = GBH
Multistage Delays
‰ Path Effort Delay
D_F = ∑ f_i
‰ Path Parasitic Delay
P = ∑ p_i
‰ Path Delay
D = ∑ d_i = D_F + P
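The branching example from the lecture (input capacitance 5, two branches of 15, loads of 90) can be checked numerically. This sketch only restates the slide's numbers; the variable names are illustrative.

```python
from math import prod

c_in_path = 5       # input capacitance of the first inverter
c_branch = 15       # input capacitance of each second-stage inverter
n_branches = 2      # the first inverter drives two identical branches
c_out_path = 90     # load at the end of the path

g = [1, 1]                                   # both stages are inverters
h = [n_branches * c_branch / c_in_path,      # h_1 = (15 + 15) / 5
     c_out_path / c_branch]                  # h_2 = 90 / 15

F = prod(g_i * h_i for g_i, h_i in zip(g, h))   # path effort from stage efforts
G = prod(g)                                     # path logical effort
H = c_out_path / c_in_path                      # path electrical effort
# Branching effort at the branch point: (Con-path + Coff-path) / Con-path
B = (c_branch + (n_branches - 1) * c_branch) / c_branch

if __name__ == "__main__":
    print(F, G * H, G * B * H)
```

F differs from GH by exactly the branching effort B, so F = GBH holds while F = GH does not.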
Designing Fast Circuits
D = ∑ d_i = D_F + P
‰ Delay is smallest when each stage bears same effort
f̂ = g_i h_i = F^(1/N)
‰ Thus minimum delay of N stage path is
D = N F^(1/N) + P
‰ This is a key result of logical effort
– To find fastest possible delay
– Doesn’t require calculating gate sizes
Gate Sizes
‰ How wide should the gates be for least delay?
f̂ = gh = g Cout/Cin
⇒ Cin_i = g_i Cout_i / f̂
‰ Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
‰ Check work by verifying input cap spec is met.
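The two results above (minimum delay D = N F^(1/N) + P and the backward capacitance transformation) can be sketched as follows. The numbers are those of the lecture's 3-stage example; the function names, and the assumption that all branches at a stage are identical so that Cout_i = b_i times the next stage's input capacitance, are illustrative choices.

```python
from fractions import Fraction
from math import prod


def path_effort(g, b, c_load, c_in):
    """F = G * B * H for per-stage logical efforts g and branching efforts b."""
    return prod(g) * prod(b) * Fraction(c_load, c_in)


def min_delay(f_path, n_stages, p_path):
    """D = N * F^(1/N) + P."""
    return n_stages * float(f_path) ** (1.0 / n_stages) + p_path


def size_backward(g, b, c_load, f_hat):
    """Input capacitance of every stage, working from the load back to the input.

    Assumes identical branches, so stage i sees Cout_i = b_i * Cin_(i+1).
    """
    sizes = []
    c_next = float(c_load)
    for g_i, b_i in zip(reversed(g), reversed(b)):
        c_out_i = b_i * c_next
        c_next = float(g_i) * c_out_i / f_hat   # Cin_i = g_i * Cout_i / f_hat
        sizes.append(c_next)
    return list(reversed(sizes))


if __name__ == "__main__":
    g = [Fraction(4, 3), Fraction(5, 3), Fraction(5, 3)]
    b = [3, 2, 1]        # stage 1 drives 3 gates, stage 2 drives 2, stage 3 drives the load
    p = [2, 3, 2]
    F = path_effort(g, b, c_load=45, c_in=8)
    f_hat = float(F) ** (1.0 / 3)
    print(F, f_hat, min_delay(F, 3, sum(p)))
    # The first entry should reproduce the input capacitance spec of 8.
    print(size_backward(g, b, 45, f_hat))
```

Recovering the specified input capacitance at the first stage is the "check work" step from the Gate Sizes slide.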
Example: 3-stage path
‰ Select gate sizes x and y for least delay from A to B
[Figure: input A (capacitance 8) drives a first gate that fans out to 3 gates of size x; each of those fans out to 2 gates of size y; each gate of size y drives a load of 45 at output B]
Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort: H = 45/8
Branching Effort: B = 3*2 = 6
Path Effort: F = GBH = 125
Best Stage Effort: f̂ = F^(1/3) = 5
Parasitic Delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4
‰ Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
‰ Resulting transistor widths
– First gate (input capacitance 8): P: 4, N: 4
– Gates of size x = 10: P: 4, N: 6
– Gates of size y = 15: P: 12, N: 3
Choosing Best # of Stages
‰ Goal: estimate delay & choose transistor sizes
‰ Many different topologies (combinations of gate types)
that implement same function
‰ We know in general
– NANDs better than NORs
– Gates with fewer inputs better than more inputs
‰ Typical shortcut: estimate delay by # of stages
– Assuming constant “gate delay”
– and thus shorter paths are faster
‰ THIS IS NOT ALWAYS TRUE!
– Adding inverters at end with increasing sizes
can speed up circuit, esp. when high load
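The warning that fewer stages is not always faster can be illustrated with a chain of inverters. The sketch below sweeps the stage count for a path effort of 64 (the value used in the lecture's datapath example) and assumes each inverter contributes a parasitic delay of 1; the function name is hypothetical.

```python
def chain_delay(path_effort, n_stages, p_inv=1.0):
    """D = N * F^(1/N) + N * p_inv for a chain of N equal-effort inverters."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv


if __name__ == "__main__":
    delays = {n: chain_delay(64, n) for n in range(1, 7)}
    for n, d in delays.items():
        print(f"N={n}: stage effort {64 ** (1.0 / n):.2f}, delay {d:.1f}")
    print("fastest N:", min(delays, key=delays.get))
```

A single stage carries the whole effort of 64, while splitting it across a few stages trades a small amount of parasitic delay for a much smaller effort per stage.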
Best Number of Stages
Example (p. 178)
‰ How many stages should a path use?
– Minimizing number of stages is not always fastest
‰ Example: drive 64-bit datapath with unit inverter
D = N F^(1/N) + P
= N (64)^(1/N) + N
[Figure: four candidate inverter chains between the initial driver (size 1) and the datapath load (64)]
N:                1      2       3           4
Inverter sizes:   1      1, 8    1, 4, 16    1, 2.8, 8, 23
f:                64     8       4           2.8
D:                65     18      15          15.3
Fastest: N = 3
General Derivation
‰ Consider adding inverters to end of n1 stage path
– How many give least delay?
[Figure: logic block with n1 stages and path effort F, followed by N - n1 extra inverters]
‰ N total stages with (N - n1) inverters
• do not change logical effort
• do add parasitic delay
D = N F^(1/N) + ∑_{i=1}^{n1} p_i + (N − n1) p_inv
∂D/∂N = −F^(1/N) ln F^(1/N) + F^(1/N) + p_inv = 0
‰ Define best stage effort ρ = F^(1/N)
p_inv + ρ (1 − ln ρ) = 0
Best Stage Effort
‰ p_inv + ρ (1 − ln ρ) = 0 has no closed-form solution
‰ Neglecting parasitics (p_inv = 0), we find ρ = 2.718 (e)
‰ For p_inv = 1, solve numerically for ρ = 3.59
‰ Again,
– these ρ values are the best stage effort
– when you have N̂ = log_ρ F stages
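Since p_inv + ρ (1 − ln ρ) = 0 has no closed-form solution, one way to solve it numerically is plain bisection. This is a sketch under the assumption that the root lies between 1 and 100, which holds for the small p_inv values discussed here; the function name and bracket are illustrative.

```python
import math


def best_stage_effort(p_inv, lo=1.0, hi=100.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection.

    The left-hand side is positive at rho = 1 and decreases for rho > 1,
    so a sign change brackets the single root in [lo, hi].
    """
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(best_stage_effort(0.0))   # parasitics neglected
    rho = best_stage_effort(1.0)    # p_inv = 1
    print(rho, math.log(64, rho))   # best stage effort and log_rho(F) for F = 64
```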
Sensitivity Analysis
‰ How sensitive is delay to using exactly the best
number of stages?
[Figure: D(N)/D(N̂) plotted against N/N̂ (actual N vs optimal N), with the ρ = 2.4 and ρ = 6 points marked]
‰ 2.4 < ρ < 6 gives delay within 15% of optimal
– We can be sloppy!
– I like ρ = 4
1st Example, Revisited
‰ Ben Bitdiddle is the memory designer for the Motoroil 68W86,
an embedded automotive processor. Help Ben design the
decoder for a register file.
[Figure: address inputs A[3:0] and their complements drive a 4:16 decoder that selects one of 16 words, each 32 bits wide, in the register file]
‰ Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
‰ Ben needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
What Does This Mean?
‰ 16 word register file
– There are 16 separate row lines
– Branching factor of 16 at end
‰ Each word is 32 bits wide & each bit presents load
of 3 unit-sized transistors
– The load on each row line is 32*3
‰ True and complementary address inputs A[3:0]
– Any address input needed for only 8 row lines
‰ Each input may drive 10 unit-sized transistors
– Total input capacitance from 1st stage gates on
inputs = 10
Number of Stages
‰ Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B = 8
‰ If we neglect logical effort (assume G = 1)
Path Effort: F = GBH = 76.8
Number of Stages: N = log4 F = 3.1
‰ Try a 3-stage design
3 Stage Gate Sizes & Delay
[Figure: true and complementary address inputs A[3:0], each presenting 10 units of input capacitance, drive gates of size y, which drive gates of size z, which drive word[0] through word[15]; each word line has 96 units of wordline capacitance]
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
Stage Effort: f̂ = F^(1/3) = 5.36
Path Delay: D = 3f̂ + 1 + 4 + 1 = 22.1
Gate sizes: z = 96*1/5.36 = 18
y = 18*2/5.36 = 6.7
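The decoder numbers above can be reproduced step by step. This sketch follows the slide's arithmetic without intermediate rounding, so z and y come out slightly below the rounded slide values; the function name and dictionary keys are illustrative.

```python
import math


def decoder_three_stage(word_bits=32, cap_per_bit=3, input_cap=10, branching=8):
    """3-stage decoder (logical efforts 1, 6/3, 1) sized by logical effort."""
    c_load = word_bits * cap_per_bit              # 96 units per word line
    h_path = c_load / input_cap                   # electrical effort H
    n_estimate = math.log(branching * h_path, 4)  # log4(F), taking G = 1

    g_path = 1 * (6 / 3) * 1                      # logical efforts from the slide
    f_path = g_path * branching * h_path          # F = GBH
    f_hat = f_path ** (1 / 3)                     # best stage effort for N = 3
    delay = 3 * f_hat + (1 + 4 + 1)               # D = N * f_hat + P

    z = c_load * 1 / f_hat                        # last stage, g = 1
    y = z * (6 / 3) / f_hat                       # middle stage, g = 6/3
    first_stage_cin = branching * y * 1 / f_hat   # should return the input cap spec
    return {"H": h_path, "N": n_estimate, "F": f_path, "f_hat": f_hat,
            "D": delay, "z": z, "y": y, "Cin": first_stage_cin}


if __name__ == "__main__":
    for name, value in decoder_three_stage().items():
        print(f"{name} = {value:.2f}")
```

The recovered first-stage input capacitance matching the 10-unit limit confirms that the sizes are consistent with the specification.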
Comparison
‰ Compare many alternatives with a spreadsheet
Design                          N    G       P    D
NAND4-INV                       2    2       5    29.8
NAND2-NOR2                      2    20/9    4    30.1
INV-NAND4-INV                   3    2       6    22.1
NAND4-INV-INV-INV               4    2       7    21.1
NAND2-NOR2-INV-INV              4    20/9    6    20.5
NAND2-INV-NAND2-INV             4    16/9    6    19.7
INV-NAND2-INV-NAND2-INV         5    16/9    7    20.4
NAND2-INV-NAND2-INV-INV-INV     6    16/9    8    21.6
Review of Definitions
Term                Stage                                 Path
number of stages    1                                     N
logical effort      g                                     G = ∏ g_i
electrical effort   h = Cout/Cin                          H = Cout-path/Cin-path
branching effort    b = (Con-path + Coff-path)/Con-path   B = ∏ b_i
effort              f = gh                                F = GBH
effort delay        f                                     D_F = ∑ f_i
parasitic delay     p                                     P = ∑ p_i
delay               d = f + p                             D = ∑ d_i = D_F + P
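Using the path definitions in the table above (F = GBH and D = N F^(1/N) + P), the D column of the decoder comparison can be recomputed from N, G, and P. The sketch assumes the decoder's B = 8 and H = 9.6 apply to every design; the list layout and function name are illustrative.

```python
from fractions import Fraction

B_PATH = 8      # branching effort of the decoder
H_PATH = 9.6    # electrical effort of the decoder, (32*3) / 10

# (design, N, G, P) rows copied from the comparison table
DESIGNS = [
    ("NAND4-INV", 2, Fraction(2), 5),
    ("NAND2-NOR2", 2, Fraction(20, 9), 4),
    ("INV-NAND4-INV", 3, Fraction(2), 6),
    ("NAND4-INV-INV-INV", 4, Fraction(2), 7),
    ("NAND2-NOR2-INV-INV", 4, Fraction(20, 9), 6),
    ("NAND2-INV-NAND2-INV", 4, Fraction(16, 9), 6),
    ("INV-NAND2-INV-NAND2-INV", 5, Fraction(16, 9), 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, Fraction(16, 9), 8),
]


def path_delay(n_stages, g_path, p_path, b_path=B_PATH, h_path=H_PATH):
    """D = N * (G * B * H)^(1/N) + P."""
    f_path = float(g_path) * b_path * h_path
    return n_stages * f_path ** (1.0 / n_stages) + p_path


if __name__ == "__main__":
    results = [(name, path_delay(n, g, p)) for name, n, g, p in DESIGNS]
    for name, d in results:
        print(f"{name:30s} D = {d:.1f}")
    print("fastest:", min(results, key=lambda item: item[1])[0])
```

A loop like this plays the role of the spreadsheet: adding a candidate topology is one more row.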
Method of Logical Effort
1) Compute path effort: F = GBH
2) Estimate best number of stages: N = log4 F
3) Sketch path with N stages
4) Estimate least delay: D = N F^(1/N) + P
5) Determine best stage effort: f̂ = F^(1/N)
6) Find gate sizes: Cin_i = g_i Cout_i / f̂
Summary
‰ Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about log4F FO4 inverter delays
– Inverters and NAND2 best for driving large caps
‰ Provides language for discussing fast circuits
– But requires practice to master
Limits of Logical Effort
‰ Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
‰ Simplistic delay model
– Neglects input rise time effects
‰ Interconnect
– Iteration required in designs with wire
‰ Maximum speed only
– Not minimum area/power for constrained delay