Lecture 11 Design for Speed

CMPEN 411
VLSI Digital Circuits
Spring 2012
Lecture 11: Designing for Speed
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L11 S.1
Review: CMOS Inverter: Dynamic
[Figure: CMOS inverter with Vin = VDD; the NMOS on-resistance Rn discharges CL at Vout]
tpHL = f(Rn, CL)
tpHL = 0.69 Reqn CL
tpHL = 0.69 (3/4 (CL VDD) / IDSATn)
     ≈ 0.52 CL / ((W/L)n k’n VDSATn)    (for VDD >> VTn + VDSATn/2)
Review: Designing Inverters for Performance

Reduce CL
  - internal diffusion capacitance of the gate itself
  - interconnect capacitance
  - fanout
Increase W/L ratio of the transistor
  - the most powerful and effective performance optimization tool in the hands of the designer
  - watch out for self-loading!
Increase VDD
  - only minimal improvement in performance at the cost of increased energy dissipation
Slope engineering - keeping signal rise and fall times smaller than or equal to the gate propagation delays and of approximately equal values
  - good for performance
  - good for power consumption
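The self-loading warning can be made concrete with a minimal sketch built on the first-order model tpHL = 0.69 Reqn CL. It assumes that widening the transistor by a factor `size` divides its resistance by `size` and multiplies its own diffusion capacitance by `size`, while the external load stays fixed. The function name and all numeric values are illustrative assumptions, not figures from the lecture.

```python
def inverter_delay(size, r_unit, c_self_unit, c_ext):
    """First-order delay 0.69 * R * C for an inverter widened by `size`.

    Assumed scaling: resistance ~ 1/size, self-load ~ size, external load fixed.
    """
    r = r_unit / size
    c_load = size * c_self_unit + c_ext
    return 0.69 * r * c_load


# Illustrative values only (assumptions, not taken from the slides).
R_UNIT = 13e3    # ohms, equivalent resistance of a unit-size device
C_SELF = 2e-15   # farads, self-load of a unit-size device
C_EXT = 20e-15   # farads, fixed external load (wire + fanout)

if __name__ == "__main__":
    for s in (1, 2, 4, 8, 16):
        tp = inverter_delay(s, R_UNIT, C_SELF, C_EXT)
        print(f"size {s:2d}: tp = {tp * 1e12:6.1f} ps")
    # The delay approaches this floor no matter how wide the device gets.
    print(f"intrinsic floor: {0.69 * R_UNIT * C_SELF * 1e12:.1f} ps")
```

Under these assumptions each doubling of the width buys less than the previous one, because the delay can never drop below the intrinsic term 0.69 × R_UNIT × C_SELF.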
Switch Delay Model
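[Figure: switch delay models, with each transistor replaced by a switch in series with its equivalent resistance Req. Inverter: Rp and Rn driving CL. 2-input NAND: Rp (A) and Rp (B) in parallel, Rn (A) and Rn (B) in series with internal node capacitance Cint, driving CL. 2-input NOR: Rp (A) and Rp (B) in series with Cint, Rn (A) and Rn (B) in parallel, driving CL.]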
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition
  - both inputs go low
      - delay is ____________
  - one input goes low
      - delay is ____________
High to low transition
  - both inputs go high
      - delay is ____________
Adding transistors in series (without sizing) slows down the circuit
[Figure: switch model of the 2-input NAND: Rp (A) and Rp (B) in parallel; Rn (A) above Rn (B) in series, with Cint at the internal node; output load CL]
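As a rough check on the three cases, the switch model can be evaluated directly. This is a minimal sketch that applies 0.69 × R × CL to the resistance seen at the output of a 2-input NAND and ignores Cint entirely; the function name and the numeric values are assumptions for illustration.

```python
def nand2_switch_delays(r_p, r_n, c_load):
    """Switch-model delays of a 2-input NAND, ignoring the internal node Cint."""
    return {
        # Two PMOS switches conduct in parallel: Rp / 2.
        "low-to-high, both inputs go low": 0.69 * (r_p / 2) * c_load,
        # Only one PMOS switch conducts: Rp.
        "low-to-high, one input goes low": 0.69 * r_p * c_load,
        # Two NMOS switches conduct in series: 2 * Rn.
        "high-to-low, both inputs go high": 0.69 * (2 * r_n) * c_load,
    }


if __name__ == "__main__":
    # Illustrative values only: Rp = Rn = 10 kOhm, CL = 10 fF.
    for case, tp in nand2_switch_delays(10e3, 10e3, 10e-15).items():
        print(f"{case}: {tp * 1e12:.1f} ps")
```

With equal Rp and Rn, the series pull-down is four times slower than the parallel pull-up in this model, which is the effect the last bullet describes.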
High to Low Transition (VTC Curve)
2-input NAND with 0.5µm/0.25µm NMOS and 0.75µm/0.25µm PMOS, F = !(A & B)
[Figure: Vout versus Vin for three input cases: A,B: 0 -> 1; B=1, A: 0 -> 1; A=1, B: 0 -> 1. Annotation: "weaker PUN". Inset schematic: NMOS M2 (gate A, VGS2 = VA – VDS1) stacked on NMOS M1 (gate B, VGS1 = VB), with Cint at the node between them.]
The threshold voltage of M2 could be higher than M1 due to the body effect (γ) because of Cint
VTn1 = VTn0
VTn2 = VTn0 + γ(√(|2φF| + Vint) - √|2φF|)
since VSB of M2 is not zero due to the presence of Cint
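To get a feel for the size of this shift, the body-effect expression can be evaluated numerically. The sketch below assumes the standard form VT = VT0 + γ(√(|2φF| + VSB) − √|2φF|) with VSB equal to the internal node voltage; the parameter values are illustrative assumptions for a generic process, not numbers from the lecture.

```python
import math


def vt_with_body_effect(vt0, gamma, two_phi_f, v_sb):
    """Threshold voltage including body effect for a source-to-body bias v_sb >= 0."""
    phi = abs(two_phi_f)
    return vt0 + gamma * (math.sqrt(phi + v_sb) - math.sqrt(phi))


if __name__ == "__main__":
    # Illustrative parameters only (assumed): VT0 in V, gamma in V^0.5, |2*phiF| in V.
    VT0, GAMMA, TWO_PHI_F = 0.43, 0.4, 0.6
    for v_int in (0.0, 0.25, 0.5, 1.0):
        vt2 = vt_with_body_effect(VT0, GAMMA, TWO_PHI_F, v_int)
        print(f"Vint = {v_int:4.2f} V -> VTn2 = {vt2:.3f} V")
```

With Vint = 0 the expression reduces to VTn0, which matches the M1 case where the source is tied to ground.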
Low to High Transition (Delay Curve)
2-input NAND with 0.5µm/0.25µm NMOS and 0.75µm/0.25µm PMOS, CL = 10 fF
[Figure: output voltage (V) versus time (0 to 400 psec) for the input cases A=B=1→0; A=1, B=1→0; A=1→0, B=1]

Input Data Pattern    Delay (psec)
A=B=0→1               69
A=1, B=0→1            62
A=0→1, B=1            50
A=B=1→0               35
A=1, B=1→0            76
A=1→0, B=1            57
Low to High Transition (Delay Curve)
2-input NAND with 0.5µm/0.25µm NMOS and 0.75µm/0.25µm PMOS, CL = 10 fF, F = !(A & B)
[Figure: NAND schematic with M2 (input A) stacked on M1 (input B) and Cint at the node between them]
Case 1 (A=B=0→1): have to discharge both CL and Cint (really depends on the state of Cint; assumed charged up here)
Case 2 (A=1, B=0→1): have to discharge both CL and Cint
Case 3 (A=0→1, B=1): have to discharge only CL
Case 4 (A=B=1→0): no Cint to charge, both pfets on so strong pullup
Case 5 (A=1, B=1→0): have to charge both CL and Cint through one pfet
Case 6 (A=1→0, B=1): have to charge only CL but through one pfet
Transistor Sizing
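[Figure: sized switch models. 2-input NAND: parallel PMOS (Rp) of size 1 each, series NMOS (Rn) of size 2 each, internal node Cint, load CL. 2-input NOR: series PMOS (Rp) of size 2 each with internal node Cint, parallel NMOS (Rn) of size 1 each, load CL.]
Assuming Rp = Rn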
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: unsized gate schematic with inputs A, B, C, D in the pull-up and pull-down networks]
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: sized gate schematic. Pull-up network: B and C annotated 4 and 12, A and D annotated 2 and 6. Pull-down network: A = 2, D = 1, B = 2, C = 2.]
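One way to arrive at sizes like these is to walk the series/parallel structure of each network. The sketch below is an assumption about the procedure, not the lecture's stated method: every series stack of k elements gets each element widened by k, and every parallel branch must meet the target on its own. The tuple encoding and the name `size_network` are hypothetical, and the factor of 3 for PMOS is an assumed mobility ratio.

```python
def size_network(net, target=1, sizes=None):
    """Assign widths so every conducting path matches a device of width `target`.

    `net` is either an input name (str) or a tuple (kind, children) with
    kind "series" or "parallel". Series elements split the resistance budget
    equally; parallel branches are each sized for the worst case.
    """
    if sizes is None:
        sizes = {}
    if isinstance(net, str):
        sizes[net] = target
        return sizes
    kind, children = net
    if kind == "series":
        for child in children:
            size_network(child, target * len(children), sizes)
    elif kind == "parallel":
        for child in children:
            size_network(child, target, sizes)
    else:
        raise ValueError(f"unknown network kind: {kind!r}")
    return sizes


# OUT = !(D + A • (B + C))
PDN = ("parallel", ["D", ("series", ["A", ("parallel", ["B", "C"])])])
PUN = ("series", ["D", ("parallel", ["A", ("series", ["B", "C"])])])

if __name__ == "__main__":
    print("NMOS:", size_network(PDN, target=1))
    print("PMOS, Rp = Rn:", size_network(PUN, target=1))
    print("PMOS, 3x wider (assumed ratio):", size_network(PUN, target=3))
```

Under these assumptions the pull-down network comes out as D = 1 and A = B = C = 2, and the pull-up network as A = D = 2, B = C = 4 for Rp = Rn, or A = D = 6, B = C = 12 with the assumed factor of 3. Equal splitting within a series stack is only one possible choice.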
Transistor Sizing a Complex CMOS Gate
[Figure: gate schematic with inputs A, B, C, D in the pull-up and pull-down networks and output OUT]
Fan-In Considerations
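[Figure: 4-input NAND with inputs A, B, C, D. The series pull-down chain has internal node capacitances C3, C2, C1 (from the output toward ground) and load CL at the output.]
Distributed RC model (Elmore delay)
tpHL = 0.69 Reqn(C1+2C2+3C3+4CL)
Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.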
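The quadratic trend follows directly from the Elmore expression. The sketch below generalizes tpHL = 0.69 Reqn(C1 + 2C2 + 3C3 + 4CL) to an N-transistor series chain in which every device has the same Reqn; the function name and the capacitance values are illustrative assumptions.

```python
def elmore_tphl(r_eqn, node_caps):
    """Elmore estimate of tpHL for a series pull-down chain of equal resistors.

    `node_caps` is ordered from the node nearest ground (C1) up to the
    output node (CL); the k-th node discharges through k resistors.
    """
    return 0.69 * r_eqn * sum(k * c for k, c in enumerate(node_caps, start=1))


if __name__ == "__main__":
    # Illustrative values only (assumed): Reqn = 10 kOhm, 2 fF per internal node, CL = 10 fF.
    R_EQN, C_INT, C_LOAD = 10e3, 2e-15, 10e-15
    for fan_in in (2, 4, 8, 16):
        caps = [C_INT] * (fan_in - 1) + [C_LOAD]
        print(f"fan-in {fan_in:2d}: tpHL = {elmore_tphl(R_EQN, caps) * 1e12:7.1f} ps")
```

With equal internal capacitances the sum contains a term proportional to N(N−1)/2, which is where the quadratic dependence on fan-in comes from in this model.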
tp as a Function of Fan-In
[Figure: tp (psec, 0 to 1250) versus fan-in (2 to 16) with curves tpHL, tp, and tpLH. tpHL is a quadratic function of fan-in; tpLH is a linear function of fan-in.]
Gates with a fan-in greater than 4 should be avoided.
Fast Complex Gates: Design Technique 1

Transistor sizing
  - as long as fan-out capacitance dominates, the pull-down chain is like a distributed RC line, so should all fets be of the same size?
  - No, use progressive sizing: M1 > M2 > M3 > … > MN
  - The fet closest to the output should be the smallest.
  - Can reduce delay by more than 20%; decreasing gains as technology shrinks
[Figure: series pull-down chain from the output to ground: MN (input InN), …, M3 (In3), M2 (In2), M1 (In1); CL at the output, internal node capacitances C3, C2, C1]
Fast Complex Gates: Design Technique 2

Input re-ordering
  - When not all inputs arrive at the same time, should the latest arriving signal drive the top or the bottom fet?
[Figure, left: the critical-path input In1 (0→1) drives M1, the fet farthest from the output; In2 = 1 on M2 and In3 = 1 on M3. CL, C2, and C1 are all charged, so the delay is determined by the time to discharge CL, C1 and C2.]
[Figure, right: the critical-path input In1 (0→1) drives M3, the fet closest to the output; In2 = 1 on M2 and In3 = 1 on M1. CL is charged while C2 and C1 are already discharged, so the delay is determined by the time to discharge CL.]
The latest arriving signal should be driving the fet closest to the output.
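The two placements can be compared with a small Elmore-style estimate. This is a simplified sketch, assuming equal equivalent resistance per fet and counting only the nodes that are still charged when the late input switches; the function name and the numeric values are assumptions for illustration.

```python
def discharge_delay(r_eqn, node_caps, charged):
    """Elmore-style delay counting only nodes that are still charged.

    `node_caps` runs from the node nearest ground (C1) to the output (CL);
    `charged` is a parallel list of booleans. Node k discharges through k
    equal resistors.
    """
    return 0.69 * r_eqn * sum(
        k * c
        for k, (c, is_charged) in enumerate(zip(node_caps, charged), start=1)
        if is_charged
    )


if __name__ == "__main__":
    # Illustrative values only (assumed): Reqn = 10 kOhm, C1 = C2 = 2 fF, CL = 10 fF.
    R_EQN = 10e3
    CAPS = [2e-15, 2e-15, 10e-15]  # C1, C2, CL

    # Late input on M1 (farthest from the output): C1, C2 and CL are all charged.
    late_at_bottom = discharge_delay(R_EQN, CAPS, [True, True, True])
    # Late input on M3 (closest to the output): C1 and C2 are already discharged.
    late_at_top = discharge_delay(R_EQN, CAPS, [False, False, True])

    print(f"late input farthest from output: {late_at_bottom * 1e12:.1f} ps")
    print(f"late input closest to output:    {late_at_top * 1e12:.1f} ps")
```

In this model the saving equals the internal-node terms, so it shrinks as CL grows relative to C1 and C2.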
Sizing and Input Ordering Effects
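[Figure: 4-input NAND with CL = 100 fF. Each PMOS (inputs A, B, C, D) has size 3. The series NMOS chain, from the output toward ground, is sized A: 4→4, B: 4→5, C: 4→6, D: 4→7, with internal nodes C3, C2, C1.]
Progressive sizing in pull-down chain gives up to a 23% improvement.
Input ordering saves 5%
  - critical path A – 23%
  - critical path D – 17%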
Fast Complex Gates: Design Technique 3

Alternative logic structures, which is the fastest?
F = ABCDEFGH
Fast Complex Gates: Design Technique 4

Isolating fan-in from fan-out using buffer insertion
[Figure: a gate driving CL directly, and the same gate driving CL through an inserted buffer]
Fast Complex Gates: Design Technique 5
Logical Effort

First proposed by Ivan Sutherland and Bob Sproull in 1991
  - “Logical Effort: Designing for Speed on the Back of an Envelope”, IEEE Advanced Research in VLSI, 1991
  - Both authors are vice president and fellow at Sun Microsystems
Gain-based synthesis based on Logical effort
  - Implemented in IBM’s logic synthesis tool BooleDozer
  - Also adopted by Magma’s logic synthesis tool
Introduction of Logical Effort

Logical Effort is a method to answer these questions:
A very simple model of delay
Back of the envelope calculations and tractable optimization
Who needs to learn about logical effort
  - Circuit designers
  - EDA tool developers
Application of Logical Effort

Alternative logic structures, which is the fastest?
F = ABCDEFGH
Next Lecture: Logical Effort

Reading:

Textbook pp.252-257