Lecture 10:
Latch and Flip-Flop Design
Slides originally from:
Vladimir Stojanovic
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
5/7/2001
EE371
Outline
• Recent interest in latches and flip-flops
• Timing and Power metrics
• Design and optimization tradeoffs
• Master-slave vs. Pulse-triggered Latch
• Representative designs
• Comparison
Recent Interest in Flip-Flops
• Trends in high-performance systems
! Higher clock frequency
! More transistors on chip
• Consequences
! Increased flip-flop overhead relative to cycle time
• Cycle time 10 - 20 FO4 delays, flop overhead 2 - 4 FO4
! Difficult to control both edges of the clock
! Higher impact of clock skew
! Higher crosstalk and substrate coupling
! Higher power consumption
• expensive packages and cooling systems
• limit in performance
! Clock burns up to 40%, flops up to 20% of total power
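The overhead range quoted above (cycle time 10 - 20 FO4, flop overhead 2 - 4 FO4) can be turned into a fraction of the cycle. This is a minimal sketch; the function name `flop_overhead_fraction` is introduced here only for illustration.

```python
def flop_overhead_fraction(cycle_fo4: float, overhead_fo4: float) -> float:
    """Fraction of the clock cycle consumed by flip-flop overhead (both in FO4 delays)."""
    if cycle_fo4 <= 0:
        raise ValueError("cycle time must be positive")
    return overhead_fo4 / cycle_fo4


# Corner cases of the ranges quoted on the slide.
best_case = flop_overhead_fraction(cycle_fo4=20, overhead_fo4=2)   # long cycle, fast flop
worst_case = flop_overhead_fraction(cycle_fo4=10, overhead_fo4=4)  # short cycle, slow flop

print(f"flop overhead: {best_case:.0%} to {worst_case:.0%} of the cycle")
```

With these corner cases the flop takes between one tenth and four tenths of the cycle, which is why the overhead matters more as cycle times shrink.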
Requirements in the Flip-Flop Design
• Small Clk-Output delay, Narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
! Typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths (28FO4s or even higher)
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity
- dynamic/high impedance nodes are affected
Flip-Flop Delay
• Sum of setup time and Clk-output delay is the only true measure of the performance with respect to the system speed
• T = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew}
[Figure: two D flip-flops clocked by Clk with a Logic block between them; TClk-Q, TLogic, and TSetup are marked along the path]
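The cycle-time relation on this slide can be evaluated directly. The sketch below is only an illustration: the function names and all numeric values are hypothetical, not taken from the lecture.

```python
def min_cycle_time(t_clk_q, t_logic, t_setup, t_skew):
    """Minimum clock period T = T_Clk-Q + T_Logic + T_setup + T_skew (all in ps)."""
    return t_clk_q + t_logic + t_setup + t_skew


def flop_overhead(t_clk_q, t_setup):
    """Part of the cycle spent in the flip-flop itself: Clk-Q delay plus setup time."""
    return t_clk_q + t_setup


# Hypothetical numbers in ps, chosen only for illustration.
t_clk_q, t_logic, t_setup, t_skew = 150, 700, 50, 50

period = min_cycle_time(t_clk_q, t_logic, t_setup, t_skew)
overhead = flop_overhead(t_clk_q, t_setup)
print(f"T = {period} ps, flop overhead = {overhead} ps "
      f"({overhead / period:.0%} of the cycle)")
```

Only the sum of Clk-Q delay and setup time appears in the period, so trading one for the other does not help the system speed.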
Delay vs. Setup/Hold Times
[Figure: Clk-Output delay [ps] vs. Data-Clk [ps], with the setup and hold regions and the minimum Data-Output point marked]
Timing parameters, details
[Figure: Time [ps] vs. D - Clk delay [ps], showing the D-Q and Clk-Q curves, the stable Clk-Q region, the unstable Clk-Q region, the failure region, the minimum D-Q point, and the optimum setup time U]
The best point to pick on delay curve is minimum D-Q
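Picking the minimum D-Q point from a swept delay curve can be sketched as follows. This is a hedged illustration: `min_d_q_point` and `toy_clk_q` are hypothetical helpers, the exponential Clk-Q curve is a made-up stand-in for simulation data, and `d_clk` is assumed to mean how long before the clock edge the data arrives.

```python
import math


def min_d_q_point(samples):
    """Return (d_clk, clk_q, d_q) with the smallest D-Q delay.

    samples: iterable of (d_clk, clk_q) pairs in ps, where D-Q = d_clk + clk_q.
    """
    return min(((s, cq, s + cq) for s, cq in samples), key=lambda p: p[2])


def toy_clk_q(d_clk, stable=250.0, a=60.0, tau=20.0):
    """Made-up Clk-Q curve: flat when data is early, rising as data nears the edge."""
    return stable + a * math.exp(-d_clk / tau)


# Sweep the data arrival time in 1 ps steps, as a simulator sweep would.
sweep = [(s, toy_clk_q(s)) for s in range(-40, 101)]
opt_setup, clk_q, d_q = min_d_q_point(sweep)
print(f"optimum setup = {opt_setup} ps, Clk-Q = {clk_q:.1f} ps, D-Q = {d_q:.1f} ps")
```

At the chosen point Clk-Q is already above its stable value: the minimum D-Q lies where moving the data 1 ps closer to the clock costs exactly 1 ps of extra Clk-Q delay.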
Design & optimization tradeoffs
• Opposite Goals
! Minimal Total power consumption
! Minimal Delay
• Power-Delay tradeoff
• Minimize Power-Delay product (PDPtot)
[Figure: three plots of PDPtot [fJ] against Total Power [uW], Delay [ps], and Width [um]]
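A toy model shows why the power-delay product has a minimum when transistor width is swept. Everything here is an assumption made for illustration: the delay and power models, their coefficients, and the function names are not from the lecture.

```python
def delay_ps(width_um, d0=100.0, k=2000.0):
    """Toy delay model: a fixed part d0 plus a part that shrinks with width."""
    return d0 + k / width_um


def power_uw(width_um, p0=20.0, c=1.0):
    """Toy power model: a fixed part p0 plus a part that grows with width."""
    return p0 + c * width_um


def pdp_fj(width_um):
    """Power-delay product in fJ (1 uW * 1 ps = 1e-3 fJ)."""
    return power_uw(width_um) * delay_ps(width_um) * 1e-3


# Sweep total width and keep the point with the smallest power-delay product.
widths = range(5, 101)
best_width = min(widths, key=pdp_fj)
print(f"best width = {best_width} um, PDPtot = {pdp_fj(best_width):.2f} fJ")
```

Narrower devices save power but are slow, wider devices are fast but burn power; the product is smallest in between, which is the point the optimization targets.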
Types of Flip-Flops
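Master-Slave Latch
[Figure: Data passes through two latches, L1 and L2 (D, Q), clocked from Clk]
Pulse-Triggered Latch
[Figure: Data passes through a single latch L (D, Q) clocked from Clk; a variant with an S-R output latch is also shown]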
Master-Slave Latches
• Positive setup times
• Two clock phases:
! distributed globally
! generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay
T-G Master-Slave Latch
• PowerPC 603 (Gerosa, JSSC 12/94)
[Figure: transmission-gate master-slave latch schematic with input D, output Q, and clocks Clk and Clkb]
T-G Master-Slave Latch
• Low power feedback
• Unbuffered input
! input capacitance depends on the phase of the clock
! over-shoot and under-shoot with long routes
! wirelength must be restricted at the input
• Clock load is high
• Low power
• Small clk-output delay, but positive setup
• Easily embedded scan or mux
C2MOS MS Latches
[Figure: C2MOS master-slave latch schematic with input D, output Q, input clock Clk, and internal clock phases Ck and Ckb]
• Low power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope
Single-Transistor-Clocked MS latches
[Figure: DSTC and SSTC latch schematics with inputs D, outputs Q, and clock Clk]
• Yuan and Svensson, JSSC Jan. ‘97
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Capacitive coupling and charge sharing related speed and power problems
Pulse-Triggered Latches
• First stage is a pulse generator
! generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
! captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse
generator
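The hold-time check mentioned above can be written as the usual min-delay constraint: the fastest Clk-Q delay plus the fastest logic path must cover the hold time plus clock skew. The sketch below uses a hypothetical helper `hold_slack` and hypothetical numbers in ps.

```python
def hold_slack(t_clk_q_min, t_logic_min, t_hold, t_skew):
    """Hold (min-delay) slack in ps; a negative value means a hold violation."""
    return t_clk_q_min + t_logic_min - t_hold - t_skew


# Hypothetical values in ps, chosen only for illustration.
# Two flops back to back with no logic in between: the fast path races the hold time.
back_to_back = hold_slack(t_clk_q_min=120, t_logic_min=0, t_hold=150, t_skew=30)

# Same flops with some minimum logic delay (or delay padding) between them.
padded = hold_slack(t_clk_q_min=120, t_logic_min=80, t_hold=150, t_skew=30)

for label, slack in (("back-to-back", back_to_back), ("padded", padded)):
    status = "OK" if slack >= 0 else "HOLD VIOLATION"
    print(f"{label}: slack = {slack} ps -> {status}")
```

Because pulse-triggered designs stay transparent for a short time after the clock edge, their hold time is relatively long, so short paths between flops are the ones that need this check.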
Hybrid Latch Flip-Flop
• AMD K-6, Partovi, ISSCC’96
[Figure: HLFF schematic with input D, clock Clk, and output Q]
HLFF Operation
• 1-0 and 0-1 transitions at the input with 0ps setup time
Hybrid Latch Flip-Flop
Skew absorption
Partovi et al, ISSCC’96
Hybrid Latch Flip-Flop
• Flip-flop features:
! single phase clock
! edge triggered, on one clock edge
• Latch features: Soft clock edge property
! brief transparency, equal to 3 inverter delays
! negative setup time
! allows slack passing
! absorbs skew
• Hold time is comparable to HLFF delay
! minimum delay between flip-flops must be controlled
• Fully static
• Possible to incorporate logic
Semi-Dynamic Flip-Flop (SDFF)
• Sun UltraSparc III, Klass, VLSI Circuits’98
[Figure: SDFF schematic with input D, clock Clk, and output Q]
• Soft edge conditioned by data since first stage is precharged - cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack - faster than HLFF, but 1-1 glitch exists
Sense-amplifier-based flip-flop
Matsui et al. 1994.
DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On the rising clock edge, a monotonic transition on S_b or R_b triggers the S-R latch
• Cross-coupled NAND latch is the speed bottleneck
• Big power savings in reduced swing designs
• Nice interface to/from domino logic
Modified Sense Amplifier-Based Flip-Flop
• The first stage is the unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic & Stojanovic, ISSCC ‘99
Modified Sense Amplifier-Based Flip-Flop
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and Qb is symmetrical as opposed to the NAND based design
• Convenient for dual rail logic; driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time and all the advantages of the original design
• Possible tradeoff between speed and robustness to crosstalk
K-6 Dual-Rail ETL
• Self-reset property
! increases dynamic power
! drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load
[Figure: K-6 dual-rail ETL schematic with input D, clock Clk, and outputs Q]
Flip-Flop Performance Comparison
[Figure: test bench. Data drives the flip-flop D input and Clock drives Clk; load capacitances of 200fF, 200fF, and 50fF are marked]
• Total power consumed
! internal power
! data power
! clock power
• Delay is the minimum D-Q, i.e. Clk-Q + setup time
• Measured for four cases
! no activity (0000… and 1111…)
! maximum activity (0101010..)
! average activity (random sequence)
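The activity cases above can be sketched as bit sequences and their toggle rates. This is only an illustration: `activity` is a hypothetical helper that counts data toggles per clock cycle, and the sequence length and random seed are arbitrary choices.

```python
import random


def activity(bits):
    """Fraction of cycle boundaries at which the data input toggles."""
    if len(bits) < 2:
        return 0.0
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


n = 10_000                                   # arbitrary sequence length
rng = random.Random(0)                       # fixed seed so the run is repeatable

patterns = {
    "no activity (0000...)": [0] * n,
    "no activity (1111...)": [1] * n,
    "maximum activity (0101...)": [i % 2 for i in range(n)],
    "average activity (random)": [rng.randint(0, 1) for _ in range(n)],
}

for name, bits in patterns.items():
    print(f"{name}: activity = {activity(bits):.2f}")
```

Under this toggle-per-cycle convention the constant patterns give 0, the alternating pattern gives 1, and an unbiased random sequence lands near 0.5.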
Delay comparison
[Figure: two bar charts of Delay [ps] for K6, SA-F/F, StrongArm, SSTC, DSTC, SDFF, HLFF, PowerPC, and mC2MOS]
• Pulsed design brings the fastest structures
Overall performance
[Figure: two bar charts of PDPtot [fJ] for SA-F/F, HLFF, SDFF, PowerPC, mC2MOS, StrongArm110, K6, SSTC, and DSTC, both at Activity=0.5 (equal transition probability)]
• Real signals have the activity between 0 and 0.5 (g)
• Precharged hybrid structures are the fastest but their power consumption strongly depends on the probability of “ones”
• More “ones” above the g point
Conventional Clk-Q vs. minimum D-Q
[Figure: two scatter plots of Total power [uW] for HLFF, SDFF, PowerPC, Strong Arm FF, SA-F/F, mC2MOS latch, K6 ETL, SSTC, and DSTC, grouped into pulsed designs and MS designs; one is plotted against Delay [ps] (minimum D-Q), the other against Clk-Q delay [ps]]
• Hidden positive setup time
• Degradation of total delay
Comparison of Clock power consumption
[Figure: bar chart of Local Clock power consumption [µW], axis from 0 to 50, for DSTC MS latch, SSTC MS latch, K6 ETL, StrongArm FF, SA-F/F, mC2MOS, PowerPC MS latch, SDFF, and HLFF]
Design goals
• Apply
! Small clock load
! Short direct path
! Reduced node swing
! Low-power feedback
! Pulsed design
! Optimization of both Master and Slave latch
• Avoid
! Positive setup time
! Sensitivity to clock slope and skew
! Dynamic (floating) nodes
! Dynamic Master latch
Conduct Power*Delay optimizations at constant frequency, which really optimizes the Energy*Delay product
Take into account all sources of power dissipation
ALWAYS use Clk-Q + setup time for max delay
General characteristics
! 60ps = FO4 delay in .2u technology
! min gate width 1.6u
Nominal conditions

Table 1: General characteristics

Design         # of          Total transistor   Internal     Clock        Data         Total        Delay   PDPtot
               transistors   width [u]          power [uW]   power [uW]   power [uW]   power [uW]   [ps]    [fJ]
PowerPC 603    16            147                36           46           5            87           266     23
HLFF           20            162                106          18           3            127          199     25
SDFF           23            167                158          27           2            187          187     35
mC2MOS         24            170                94           15           6            115          292     34
SA-F/F         19            214                97           18           3            118          272     32
StrongArm FF   20            215                101          18           3            122          275     34
K6 ETL         37            246                250          15           5            270          200     54
SSTC           16            147                94           22           4            120          592     71
DSTC           10            136                132          22           4            158          629     99
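As a consistency check on Table 1, the total power should equal the sum of internal, clock, and data power, and PDPtot should equal total power times delay (1 uW * 1 ps = 0.001 fJ). The sketch below copies the power, delay, and PDPtot columns from the table; the names `TABLE_1` and `pdp_fj` are introduced here only for illustration.

```python
# (design, internal, clock, data, total power [uW], delay [ps], PDPtot [fJ]) from Table 1
TABLE_1 = [
    ("PowerPC 603",   36, 46, 5,  87, 266, 23),
    ("HLFF",         106, 18, 3, 127, 199, 25),
    ("SDFF",         158, 27, 2, 187, 187, 35),
    ("mC2MOS",        94, 15, 6, 115, 292, 34),
    ("SA-F/F",        97, 18, 3, 118, 272, 32),
    ("StrongArm FF", 101, 18, 3, 122, 275, 34),
    ("K6 ETL",       250, 15, 5, 270, 200, 54),
    ("SSTC",          94, 22, 4, 120, 592, 71),
    ("DSTC",         132, 22, 4, 158, 629, 99),
]


def pdp_fj(total_power_uw, delay_ps):
    """Power-delay product in fJ (1 uW * 1 ps = 1e-3 fJ)."""
    return total_power_uw * delay_ps * 1e-3


for name, p_int, p_clk, p_data, p_tot, delay, pdp in TABLE_1:
    sum_ok = (p_int + p_clk + p_data == p_tot)
    pdp_ok = (round(pdp_fj(p_tot, delay)) == pdp)
    print(f"{name:12s} power sum ok: {sum_ok}  "
          f"PDP = {pdp_fj(p_tot, delay):5.1f} fJ (table: {pdp})  ok: {pdp_ok}")

# Design with the lowest power-delay product under these nominal conditions.
best = min(TABLE_1, key=lambda row: pdp_fj(row[4], row[5]))
print("lowest PDPtot:", best[0])
```

The check makes the units explicit: the delay column is the minimum D-Q delay in ps, and the power-delay product is reported in fJ rounded to the nearest integer.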