Logic Design for High-Performance

High-Speed Digital CMOS Circuits
73255
Summer Term 2015
Monday 8:00 – 9:30
N5320
High-Speed Digital CMOS Circuits
Stephan Henzler
Technische Universität München
1
Summer Term 2015
Lecturer CV
henzler@tum.de
Stephan Henzler received the Dipl.-Ing. degree in
electrical engineering in 2002, the Dr.-Ing. degree in 2006,
and the habilitation degree in 2010 from the Technische
Universität München (TUM), Germany. From 2002 to
2005, he was with the Institute for Technical Electronics,
Technische Universität München, where he worked on
low-power digital integrated circuit design and leakage
reduction techniques. For his dissertation on power
management and leakage reduction techniques he
received the Rohde & Schwarz outstanding thesis
award 2007. In 2005, he joined the Advanced Systems
and Circuits Department of Infineon Technologies AG,
Munich, where he worked on high-speed/high-performance
digital integrated circuits, variability in deep-submicron
CMOS technologies, and mixed-signal circuit design in
nanometer CMOS technologies, especially time-to-digital
converters. In 2010 he joined the wireless mixed-signal
department of Infineon, where he works on mixed-signal
system and circuit design. Since February 2011 he has
carried on the same responsibilities within Intel.
Administrative Information
Lecture:
Stephan Henzler
henzler@tum.de
office hours by arrangement
Andrew Giebfried (teaching assistant)
andrew.giebfried@tum.de
Tutorials: embedded in lecture
Exam:
homework / self-learning module
in written form, 60 minutes, after lecture cycle
Language: English
Course Overview
• Logic families for high-speed and high-performance
• Register (flip-flop) design
• Clock generation and distribution
– Phase/Delay Locked Loop
– Frequency dividers
• Time-to-digital converters
• Arithmetic algorithms and macros for fast adders, multipliers, etc.
• Memory design (self-learning module)
Outline
• Literature
• CMOS delay models
– Elmore delay
– Delay minimization in buffer chain
– Delay minimization of combinatorial logic
   Logical Effort methodology
• Static CMOS logic – Design considerations
• Dynamic Logic
Recommended Literature I
Course Books:
Rabaey, Chandrakasan, Nikolic.
Digital Integrated Circuits, A Design Perspective
Weste, Harris.
CMOS VLSI Design, A Circuits and Systems Perspective
Kaeslin.
Digital Integrated Circuit Design
Martin.
Digital Integrated Circuit Design
Bernstein, Carrig, Durham, Hansen, Hogenmiller, Nowak, Rohrer.
High Speed CMOS Design Styles
Recommended Literature II
Phase-Locked Loops:
Razavi.
RF Microelectronics
Time-to-Digital Converters:
Henzler.
Time-to-Digital Converters
Arithmetic Circuits:
Ercegovac, Lang.
Digital Arithmetic
Parhami.
Computer Arithmetic, Algorithms and Hardware Design
Recommended Literature III
Memory Circuits:
Haraszti.
CMOS Memory Circuits
Low-Power:
Henzler.
Power Management of Digital Circuits in Deep Sub-Micron
CMOS Technologies
Latest material for all chapters:
IEEE Xplore with TUM full library access
High-Speed Circuits
• Very high frequency, i.e. several GHz
• Considerable part of the clock period consumed for synchronization, e.g. flip-flop delay t_cpq, setup time t_setup, and clock skew plus jitter t_skew
• Limited time for logic ⇒ only simple operations per cycle or pipeline stage, respectively
• Be aware of hold time violations!
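The synchronization overhead can be made concrete with a small timing budget. The following is a minimal sketch, assuming the usual setup constraint (clock period minus flip-flop delay, setup time, and skew plus jitter); the function names and all numeric values are hypothetical and chosen only for illustration.

```python
def logic_time_budget(t_clk, t_cpq, t_setup, t_skew):
    """Time left for combinational logic in one cycle (all values in the same unit)."""
    return t_clk - (t_cpq + t_setup + t_skew)


def overhead_fraction(t_clk, t_cpq, t_setup, t_skew):
    """Share of the clock period spent on synchronization."""
    return (t_cpq + t_setup + t_skew) / t_clk


# Hypothetical numbers in picoseconds (assumptions, not taken from the lecture).
fast = logic_time_budget(t_clk=250, t_cpq=60, t_setup=40, t_skew=30)    # 4 GHz clock
slow = logic_time_budget(t_clk=2000, t_cpq=60, t_setup=40, t_skew=30)   # 500 MHz clock
print(fast, slow)
```

With the same flip-flops and clock network, the overhead takes roughly half of the fast cycle but only a few percent of the slow one, which is the distinction the lecture draws between high-speed and high-performance design.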
High-Performance Circuits
• Moderate frequency, i.e. 300 MHz – 2 GHz
• Predominant part of the clock period consumed for logic operations, small synchronization overhead
• Powerful operations possible within a single cycle/stage
• Despite the long cycle time, the timing is critical due to the long combinatorial paths between two flip-flop stages
• Be aware of setup time violations!
Logic Design for High-Performance
Static CMOS Logic
• Complementary pull-up and pull-down network:
  NMOS ⇔ PMOS, serial ⇔ parallel connection
• Always low-resistive connection to power supply (VDD or VSS)
– full swing signals
– noise and leakage tolerant
– strong supply dependence of delay
• Inputs connected to n PMOS and n NMOS devices
– input load ∝ 2n (large)
– large internal load connected to output
Static CMOS Logic 2
• Handover between pull-up and pull-down during switching
– cross current
– medium speed
– not ratioed
• Activity-dependent power consumption
• Excellent modeling and EDA integration available
Elmore Delay
Prerequisites:
– one input only
– capacitors only between network nodes and ground
– no resistive loops
W. C. Elmore, The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers, Journal of Applied Physics, 1948.
Elmore Delay (cont)
There is exactly one path from a network node i to the input s.
The sum of all resistances along this path is the path resistance
Rii, e.g. R44 = R4 + R3 + R1.
Elmore Delay (cont)
The shared path resistance Rik is the sum of all resistances
along the joint sub-path of the two paths s → i and s → k.
Example: Ri4 = R1 + R3
Elmore Delay (cont)
Elmore delay:
First order approximation of the delay after which a voltage
step at the input s can be observed at the output i.
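The path and shared path resistances combine into the standard Elmore expression tau_i = sum over all nodes k of R_ik * C_k. The sketch below is a minimal illustration under stated assumptions: the tree topology (node 1 at the input, nodes 2 and 3 branching from node 1, nodes 4 and i branching from node 3) is an assumption chosen to be consistent with the examples R44 = R4 + R3 + R1 and Ri4 = R1 + R3, and all element values are hypothetical.

```python
def path_to_source(node, parent):
    """Nodes on the unique path from `node` back to the input s.

    Each node n stands for the resistor R[n] that connects it to its parent.
    """
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def shared_resistance(i, k, parent, R):
    """R_ik: sum of resistances on the common part of the paths s->i and s->k."""
    common = set(path_to_source(i, parent)) & set(path_to_source(k, parent))
    return sum(R[n] for n in common)


def elmore_delay(i, parent, R, C):
    """tau_i = sum over all nodes k of R_ik * C_k."""
    return sum(shared_resistance(i, k, parent, R) * C[k] for k in C)


# Assumed topology: parent[n] is the node on the input side of resistor R[n].
PARENT = {1: None, 2: 1, 3: 1, 4: 3, "i": 3}
R = {1: 1, 2: 2, 3: 3, 4: 4, "i": 5}      # hypothetical resistances
C = {1: 1, 2: 1, 3: 1, 4: 1, "i": 1}      # hypothetical capacitances

print(shared_resistance(4, 4, PARENT, R))    # path resistance R44
print(shared_resistance("i", 4, PARENT, R))  # shared path resistance Ri4
print(elmore_delay(4, PARENT, R, C))
```

Every capacitor contributes to the delay at node i, weighted by the resistance it shares with the path from the input to node i.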
Elmore Delay (cont)
Elmore delay:
quite useful for
– wire delay estimation
– first-order delay model of static and dynamic CMOS gates (RC model)
(a transistor is not actually a resistor, but the RC model is excellent for qualitative understanding)
Load Dependence of Inverter
Electrical effort (also called effort, gain, fanout, or fan-out)
The linear dependence of delay on load holds fairly well,
even in deep sub-micron technologies.
Sizing of Super Buffer
Chain of N inverters: the first inverter is minimum sized, the remaining N − 1 sizings are unknown.
Find inverter dimensions for minimum propagation delay.
C1 and CL given ⇒ N − 1 variables
Path electrical effort: H = CL / C1
Sizing of Super Buffer 2
Minimize delay (i.e. search for optimum fanout hi):
for optimum delay all fanouts need to be the same,
i.e. h1 = h2 = h3 = … = hN
Sizing of Super Buffer 3
The product of all fanouts is constant and given by the
constraints, i.e. C1 and CL:
Minimum delay of an N-stage inverter chain (superbuffer):
However, what is the optimum number of stages?
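The equal-fanout result can be tried out numerically. This is a minimal sketch, assuming a normalized stage delay of d = p + h (parasitic delay p plus fanout h, in units of the technology time constant) and a path electrical effort H = CL / C1; the function names and the default p = 1 are assumptions for illustration.

```python
def chain_delay(H, n_stages, p=1.0):
    """Normalized delay of an N-stage inverter chain with equal fanouts.

    Each stage has fanout h = H**(1/N), so the total delay is N * (p + h).
    """
    h = H ** (1.0 / n_stages)
    return n_stages * (p + h)


def stage_sizes(c1, cl, n_stages):
    """Input capacitance of each inverter when all fanouts are equal."""
    h = (cl / c1) ** (1.0 / n_stages)
    return [c1 * h ** i for i in range(n_stages)]


# Example: C1 = 1, CL = 64, three stages -> fanout 4 per stage.
print(stage_sizes(1.0, 64.0, 3))
print(chain_delay(64.0, 3))
```

Evaluating `chain_delay` for several values of `n_stages` at a fixed H shows a shallow minimum over the stage count.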
Sizing of Super Buffer 4
Find optimum number of stages:
(implicit equation for hi)
[Figure: normalized delay versus number of stages for H = 50, H = 100, and H = 200]
* Sometimes an optimal fan-out of e is reported.
This follows from a similar derivation if the parasitic
delay of the gate is neglected.
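The implicit equation for the optimum fanout can be solved numerically. The sketch below assumes the normalized stage delay d = p + h; minimizing the chain delay N * (p + h) with N = ln(H) / ln(h) then leads to h * (ln(h) - 1) = p. This particular form is a reconstruction under that assumption, and the function names are hypothetical.

```python
import math


def optimum_fanout(p=1.0, tol=1e-9):
    """Solve h * (ln(h) - 1) = p for h by bisection (assumed delay model d = p + h)."""
    lo, hi = math.e, 20.0          # the left side is -p at h = e and grows with h

    def f(h):
        return h * (math.log(h) - 1.0) - p

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def optimum_stage_count(H, p=1.0):
    """Estimate of the best number of stages, rounded to the nearest integer."""
    return max(1, round(math.log(H) / math.log(optimum_fanout(p))))


print(optimum_fanout(0.0))       # parasitic delay neglected: close to e
print(optimum_fanout(1.0))       # with p = 1: somewhat larger than e
print(optimum_stage_count(100.0))
```

With p = 0 the solution collapses to e, which matches the footnote; a non-zero parasitic delay shifts the optimum towards larger fanouts and fewer stages.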
Sizing of Combinatorial Logic
Buffer chain is mainly an academic exercise.
How can we size combinatorial logic for minimum delay?
How many stages shall we use to realize a certain function?
Logical Effort Methodology
(a generalization of the preceding investigation)
Delay of Combinatorial Gate
Gate delay: d = g·h + p, where g·h is the effort delay
p: parasitic delay, depends on the logic function, not on sizing or load
h: electrical effort, depends on sizing and load, not on the logic function
g: logical effort, depends on the logic function, not on sizing
Two equivalent definitions of logical effort g:
1. g = (gate capacitance) / (gate capacitance of reference inverter)
when the gate is sized to deliver the same current as the reference inverter
2. g describes how much worse the gate delivers current to the load
than an inverter when the gate is sized to present the same
input capacitance as the inverter.
Calculation of Logical Effort
p = 2, g = 4/3
Calculation of Logical Effort
p = 7/3, gA = 2, gB = 2, gC = 5/3
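Only the resulting numbers survive here, so the following is a minimal sketch of how such values can be obtained. It assumes a reference inverter with NMOS width 1 and PMOS width 2, and diffusion capacitance equal to gate capacitance per unit width. The transistor widths, and the reading of the two examples as a 2-input NAND and an AND-OR-INVERT gate (A and B in series in the pull-down network), are assumptions that happen to reproduce the quoted values.

```python
from fractions import Fraction

INV_INPUT_CAP = 3  # reference inverter: NMOS width 1 + PMOS width 2


def logical_effort(nmos_width, pmos_width):
    """Input capacitance of one gate input relative to the reference inverter."""
    return Fraction(nmos_width + pmos_width, INV_INPUT_CAP)


def parasitic_delay(output_diffusion_width):
    """Diffusion width on the output node relative to the reference inverter."""
    return Fraction(output_diffusion_width, INV_INPUT_CAP)


# Assumed 2-input NAND: series NMOS of width 2, parallel PMOS of width 2.
g_nand2 = logical_effort(2, 2)
p_nand2 = parasitic_delay(2 + 2 + 2)      # one NMOS drain and two PMOS drains

# Assumed AOI21: NMOS A, B (width 2) in series, NMOS C (width 1) in parallel;
# all PMOS of width 4, with the PMOS of C next to the output.
g_a = logical_effort(2, 4)
g_b = logical_effort(2, 4)
g_c = logical_effort(1, 4)
p_aoi21 = parasitic_delay(2 + 1 + 4)      # NMOS drains of the A/B stack and C, one PMOS drain

print(g_nand2, p_nand2, g_a, g_b, g_c, p_aoi21)
```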
Delay of High Fanin Gates
Parasitic delay and logical effort of NAND gate
(according to basic estimation of previous slides)
inputs   parasitic delay   logical effort
2        2                 4/3
3        3                 5/3
4        4                 2
5        5                 7/3
6        6                 8/3
8        8                 10/3
n        n                 (n+2)/3
In reality parasitic delay increases nearly quadratically due to
intermediate capacitances: $p_N = \frac{1}{3}\left(N^2 + N\right)$
Use Elmore delay or simulation for accurate parameter
extraction. Linear delay model still quite good.
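The basic estimates for the n-input NAND gate follow a simple pattern that can be checked in a few lines. This is a minimal sketch, assuming the linear delay model d = g·h + p and the basic estimates g = (n + 2)/3 and p = n; the function names are hypothetical.

```python
from fractions import Fraction


def nand_logical_effort(n):
    """Logical effort of an n-input NAND according to the basic estimate."""
    return Fraction(n + 2, 3)


def nand_parasitic_delay(n):
    """Parasitic delay of an n-input NAND, counting the output node only."""
    return Fraction(n)


def gate_delay(g, h, p):
    """Linear delay model: effort delay g*h plus parasitic delay p."""
    return g * h + p


for n in (2, 3, 4, 5, 6, 8):
    print(n, nand_parasitic_delay(n), nand_logical_effort(n))

# Normalized delay of a 2-input NAND driving an electrical effort of 3.
print(gate_delay(nand_logical_effort(2), 3, nand_parasitic_delay(2)))
```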
Delay of Combinatorial Paths
The branching in combinatorial blocks increases the electrical
effort by the branching effort b.
Branching effort: b = (C_on-path + C_off-path) / C_on-path
Delay of Combinatorial Path 2
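Determine the optimum sizing in the same way as for buffers.
Define path effort: F = G · B · H (product of path logical effort,
path branching effort, and path electrical effort)
⇒ The minimum delay can be estimated
before the sizing process is started!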
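The path calculation can be written down compactly. The sketch below follows the standard logical effort formulation (path effort F = G·B·H, best stage effort F^(1/N), minimum delay N·F^(1/N) + P); the function names are hypothetical and the example path of three 2-input NAND gates is an assumption for illustration.

```python
import math


def path_effort(logical_efforts, branching_efforts, c_load, c_in):
    """F = G * B * H for a path with the given per-stage efforts."""
    G = math.prod(logical_efforts)
    B = math.prod(branching_efforts)
    H = c_load / c_in
    return G * B * H


def min_path_delay(logical_efforts, branching_efforts, parasitics, c_load, c_in):
    """Minimum normalized delay N * F**(1/N) + P, known before any sizing is done."""
    n = len(logical_efforts)
    F = path_effort(logical_efforts, branching_efforts, c_load, c_in)
    return n * F ** (1.0 / n) + sum(parasitics)


def size_path(logical_efforts, branching_efforts, c_load, c_in):
    """Input capacitance of every stage, working backwards from the load."""
    n = len(logical_efforts)
    f_hat = path_effort(logical_efforts, branching_efforts, c_load, c_in) ** (1.0 / n)
    caps = [0.0] * n
    c_out = c_load
    for i in reversed(range(n)):
        # stage effort f_hat = g_i * b_i * C_out / C_in  ->  solve for C_in
        caps[i] = logical_efforts[i] * branching_efforts[i] * c_out / f_hat
        c_out = caps[i]
    return caps


# Assumed example: three 2-input NANDs (g = 4/3, p = 2), no branching, CL = C_in.
g, b, p = [4 / 3] * 3, [1.0] * 3, [2.0] * 3
print(min_path_delay(g, b, p, c_load=1.0, c_in=1.0))
print(size_path(g, b, c_load=1.0, c_in=1.0))
```

The first entry returned by `size_path` reproduces the given input capacitance, which is a quick consistency check on the sizing.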
Unequal Rising and Falling Delay
• Equal rise and fall delay is often disadvantageous for
average delay (path delay is relevant for applications) and
area consumption.
Limitations of Logical Effort
• Logical effort methodology does …
– Provide a valuable rule of thumb for sizing high-performance paths
– Predict the optimum path delay without knowledge of the sizing
– Indicate how to distribute the gain along a critical path
• Logical effort methodology does not …
– Take the slope dependence of gate delays into account
(however, along the critical path slopes are very similar)
– Consider simultaneous switching
– Consider power, i.e. it gives no sizing rule for sub-critical paths
– Indicate how to size a path for small power and/or area
– Consider interconnect delay
– Make branching easy to estimate, especially for parallel critical paths
Non-Linear Delay Model
• Linear delay model is very well suited for hand calculation
and for an intuitive understanding of how to size gates
• Linear delay model is not suited for high numerical accuracy
• Non-linear delay model for computer calculation
– Define a set of relevant input slopes (transition times)
– Define a set of relevant load capacitances
– Perform a SPICE simulation for each (load, slope) tuple
– Measure propagation delay and output slope (transition time)
and store the results in a 2-dimensional lookup table
– Usually stored in the so-called Liberty file
• Numerically accurate
• Not useful for understanding trade-offs or deriving design strategies
Example for Timing Description in LIB File
pin(Z) {
  …
  timing() {
    related_pin : "X1";
    cell_fall(slp_load) {
      index_1 ("0.010, 0.050");  /* slope */
      index_2 ("0, 10, 50");     /* load */
      values ( \
        "50, 150, 550", \
        "60, 170, 610");
    }
    …
• Description tables like this are provided for every timing figure, i.e.
– delay from any input in both switching directions to the output
– slope at the output in response to a switching event at any input
– setup & hold times
– …
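A lookup table like the one above is only defined at its grid points, so values in between have to be interpolated. The following is a minimal sketch using bilinear interpolation on the example table; bilinear interpolation is an assumption here (a common choice, not something the slide specifies), and the function names are hypothetical.

```python
from bisect import bisect_right

SLOPES = [0.010, 0.050]                 # index_1 of the example table
LOADS = [0.0, 10.0, 50.0]               # index_2 of the example table
CELL_FALL = [[50.0, 150.0, 550.0],      # row for slope 0.010
             [60.0, 170.0, 610.0]]      # row for slope 0.050


def _bracket(axis, x):
    """Index i such that axis[i] and axis[i + 1] bracket x (clamped to the table edge)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def lookup(slope, load, slopes=SLOPES, loads=LOADS, table=CELL_FALL):
    """Bilinear interpolation in the table; extrapolates linearly outside the grid."""
    i = _bracket(slopes, slope)
    j = _bracket(loads, load)
    u = (slope - slopes[i]) / (slopes[i + 1] - slopes[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    low_slope = table[i][j] * (1 - v) + table[i][j + 1] * v
    high_slope = table[i + 1][j] * (1 - v) + table[i + 1][j + 1] * v
    return low_slope * (1 - u) + high_slope * u


print(lookup(0.010, 10.0))   # grid point
print(lookup(0.030, 10.0))   # halfway between the two slopes
print(lookup(0.010, 30.0))   # halfway between loads 10 and 50
```

The table itself still comes from SPICE simulation of each (load, slope) pair; the interpolation only fills in the points between the simulated ones.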
Input Dependence of Gate Delay
simultaneous switching of inputs is worst case
Equalization of Gate Delay
• Layout is more complex, i.e. cell area is larger
– only makes sense if the functionality requires equal propagation delay
Asymmetric Gates
Speed requirements of one input relaxed:
– downsizing of slow branch
– upsizing of low active series devices
pAd = 13/9, pAu = 13/9
pBd = 17/9, pBu = 26/9
gAd = 10/9, gAu = 10/9
gBd = 5/3, gBu = 10/3
Skewed Gates
If one transition is much more critical than the other, the
critical transition can be accelerated at the cost of the other one.
unskewed: p = 1, g = 1
skewed: pu = 5/6, pd = 5/3, gu = 5/6, gd = 5/3
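The skewed values can be reproduced by comparing the skewed inverter with an unskewed inverter of equal drive strength for the transition in question. This is a minimal sketch; the transistor widths (NMOS 1/2, PMOS 2), the PMOS/NMOS width ratio of 2 for equal drive, and the function name are assumptions that happen to yield the quoted numbers.

```python
from fractions import Fraction


def skewed_inverter_efforts(wn, wp, mobility_ratio=2):
    """Logical effort (g_up, g_down) of an inverter with NMOS width wn and PMOS width wp.

    Each transition is compared with an unskewed inverter that has the same
    pull-up (for the rising output) or the same pull-down (for the falling output).
    """
    c_in = wn + wp
    c_ref_up = wp + wp / mobility_ratio        # same PMOS, matching NMOS
    c_ref_down = wn + mobility_ratio * wn      # same NMOS, matching PMOS
    return c_in / c_ref_up, c_in / c_ref_down


print(skewed_inverter_efforts(Fraction(1), Fraction(2)))      # unskewed reference
print(skewed_inverter_efforts(Fraction(1, 2), Fraction(2)))   # assumed skewed sizing
```

In this simple model the parasitic delays scale in the same way as the logical efforts, which is consistent with pu = gu and pd = gd in the quoted values.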
Dynamic Logic
(Precharge Logic)
Dynamic Logic
Advantages:
• Low input & parasitic caps.
• No contention
• No static power consumption
• Extremely fast
• Wide NOR structures, e.g. for decoders
Disadvantages:
• Sensitive to noise and leakage
• High dynamic power consumption
• Clocking required
• Monotonicity requirement
High Fanin Dynamic Gates
Wide NOR Structures
• NOR operation comes for free in a single-ended domino gate
• Wide OR structures cause significant leakage currents that degrade
the charge on the dynamic node ⇒ keeper
Alternative for Wide NOR: Pseudo NMOS
• Cross current, acceptable e.g. if pull-down is the exception or for high-speed applications
• Reduced swing (trade-off between pull-up speed and level reduction)
• Ratioed
High Fanin Dynamic Gates
Long NAND Structures
• No contention
• Low load
• Long NMOS pull-down chain possible, but charge sharing is critical
Leakage Currents in Deep Sub-Micron MOSFETs:
Classic Leakage Currents
[Figure: MOSFET with gate, source, drain, and bulk; leakage paths labeled 1 and 2]
1: Subthreshold current
2: Junction leakage
Subthreshold Leakage
Leakage Currents in Deep Sub-Micron MOSFETs:
Tunneling Currents
[Figure: MOSFET with gate, source, drain, and bulk; leakage paths labeled 1 and 2]
1: Gate tunneling current
2: Gate-induced drain leakage
Noise and Leakage Sensitivity
Leakage currents discharge the dynamic node
⇒ limited retention time
⇒ minimum operating frequency
(very disadvantageous for production test or low-speed operation modes)
Noise on power and signal wires weakly opens pull-down paths
⇒ erroneous discharge of the dynamic node
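The link between leakage and minimum operating frequency can be estimated to first order. This is a rough sketch, assuming a constant leakage current, no keeper, a tolerable voltage droop delta_v on the dynamic node, and an evaluation phase of half a clock period; the function names and all numbers are hypothetical.

```python
def retention_time(c_node, delta_v, i_leak):
    """Time until leakage has removed delta_v from the dynamic node (t = C * dV / I)."""
    return c_node * delta_v / i_leak


def min_clock_frequency(c_node, delta_v, i_leak):
    """Lowest clock frequency for which half a period still fits into the retention time."""
    return 1.0 / (2.0 * retention_time(c_node, delta_v, i_leak))


# Hypothetical values: 10 fF node, 0.2 V tolerable droop, 100 nA leakage.
t_ret = retention_time(10e-15, 0.2, 100e-9)
f_min = min_clock_frequency(10e-15, 0.2, 100e-9)
print(t_ret, f_min)
```

Under these assumptions the retention time is 20 ns, so the gate cannot be clocked much below 25 MHz; larger leakage raises this limit proportionally.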
Reduction of Noise & Leakage Sensitivity
• Weak keeper device compensates for leakage- and noise-induced discharge currents
• Size keeper for approximately 10% of discharge current
• 5–10% speed degradation
• No inversion
Designing Weak Keepers
weak keeper ⇒ small W/L ratio ⇒ small W, large L?
[Figure: keeper device options with the following captions]
– good device properties but large current
– small current but strongly sensitive to variations
– small current but modeling of length dependence is difficult
– good keeper device
Design of Weak Keepers 2
good keeper device but
large output loading
good keeper with reduced
output loading
Adaptive Keepers
With increasing process variations keeper design becomes difficult
• slow NMOS & fast PMOS:
– keeper too strong
– significant speed degradation
• fast NMOS & slow PMOS:
– high leakage in pull-down path but small compensation current
  ⇒ keeper too weak
– erroneous discharge
Steven Hsu, Intel, ISSCC 2006
Delayed Keeper
• Increasing leakage calls for stronger keepers
• Delay penalty, advantage of dynamic circuits vanishes
• Concept:
– Use a small keeper which cannot compensate leakage completely
– Enable a strong keeper after evaluation/discharge is completed
• Challenge: Size the permanent and the delayed keeper such that
leakage currents do not compromise the logical decision
Charge Sharing in Dynamic Gates
• Charge sharing between the capacitance of the pre-charged node
and intrinsic capacitances
• Possibly undefined levels and disturbance of subsequent stages
• Might be recovered by the keeper
• Can easily be overlooked in simulation
⇒ think about the worst-case situation
• Remedy: Precharge internal nodes with weak transistors
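The voltage drop caused by charge sharing follows from charge conservation. The sketch below is a minimal illustration, assuming ideal capacitors, no keeper, and an internal node that starts at a known voltage; the function name and the numbers are hypothetical.

```python
def shared_voltage(vdd, c_dyn, c_int, v_int=0.0):
    """Voltage on the dynamic node after it is connected to an internal node.

    Charge conservation: (c_dyn * vdd + c_int * v_int) is redistributed
    over the total capacitance c_dyn + c_int.
    """
    return (c_dyn * vdd + c_int * v_int) / (c_dyn + c_int)


# Worst case: internal node fully discharged before evaluation.
print(shared_voltage(vdd=1.0, c_dyn=3.0, c_int=1.0))
# Internal node precharged to VDD: no drop on the dynamic node.
print(shared_voltage(vdd=1.0, c_dyn=3.0, c_int=1.0, v_int=1.0))
```

The drop grows with the ratio of internal to dynamic node capacitance, which is why long series stacks are the critical case and why precharging the internal nodes removes the effect.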
Multi-Output Dynamic Logic
Domino gates can produce multiple logic functions (with
common subterms) simultaneously
Compound Domino Logic
Coupling inverters can be substituted by any static gate to
reduce number of logic stages
Conditional Keeper
• Strongly low-skewed CMOS gates with precharge
• Reduced contention
• No latching
• Also known as skewed CMOS
Clocking of Single-Rail Domino Circuits
• Sequential activation of logic stages
• High noise sensitivity
• High speed ⇒ optimized clock skews ⇒ variation sensitive
• Circuit becomes somewhat "analog"
Clocking of Single-Rail Domino Circuits 2
Clocking of Single-Rail Domino Circuits 3
• Self-timed evaluation (domino principle)
• Simultaneous pre-charge / evaluation
• Bypassing of stages possible
Clocking of Single-Rail Domino Circuits 4
Clocking of Footer-Less Domino Circuits
• No footer ⇒ speed & power improvement
• Self-timed
• Sequential pre-charge to avoid cross currents
Clocking of Footer-Less Domino Circuits 2
NORAce Domino Logic
• Alternating NMOS / PMOS domino gates
• Pre-charged state disables all evaluation paths
• Self-timed
• Noise sensitive
• No direct bypassing
NORAce Domino Logic 2
Cross Coupled Domino
– A Dynamic Dual Rail Family –
• No contention
• Improved robustness
• Implicit inversion
• Higher clock load
• Complex wiring
• No wide NOR structures
Is it a good idea to use dynamic logic?
• Well, it's fancy
• Performance advantage vanishes in DSM technologies
• Many design pitfalls, e.g. charge sharing, leakage, noise
• Very susceptible to parasitics, PVT, etc.
• Weak EDA support, e.g. for timing verification ⇒ poor verification
Conclusion
Dynamic logic is a risk – Say No-No!
– Dynamic logic is often the reason for redesigns
– Avoid it whenever possible
– If you think it is required, first look for architectural loopholes,
e.g. logic optimization, pipelining, parallelization …
– If you find no other way, do it, but very carefully!