Clk - Stanford University

Lecture 8:
Latch and Flip Flop Design
Slides originally from:
Vladimir Stojanovic & Vojin G. Oklobdzija
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
4/24/02
EE371
Outline
• Recent interest in latches and flip-flops
• Timing and Power metrics
• Design and optimization tradeoffs
• Master-slave vs. Pulse-triggered Latch
• Representative designs
• Comparison
Recent Interest in Flip-Flops
• Trends in high-performance systems
  → Higher clock frequency
  → More transistors on chip
• Consequences
  → Increased flip-flop overhead relative to cycle time
    • Cycle time 10 - 20 FO4 delays, flop overhead 2 - 4 FO4
  → Difficult to control both edges of the clock
  → Higher impact of clock skew
  → Higher crosstalk and substrate coupling
  → Higher power consumption
    • expensive packages and cooling systems
    • limit in performance
  → Clock burns up to 40%, flops up to 20% of total power
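To put the overhead figures above in perspective, here is a minimal sketch that turns the quoted FO4 ranges into a fraction of the cycle time. The function name and the pairing of range endpoints are illustrative assumptions, not part of the original slides.

```python
def overhead_fraction(flop_overhead_fo4, cycle_time_fo4):
    """Fraction of the clock cycle consumed by the flip-flop."""
    return flop_overhead_fo4 / cycle_time_fo4

# Ranges quoted on the slide: cycle time 10-20 FO4, flop overhead 2-4 FO4.
# Pairing the extremes is an assumption made only to bound the result.
best_case = overhead_fraction(2, 20)    # fast flop, long cycle
worst_case = overhead_fraction(4, 10)   # slow flop, short cycle

print(f"flop overhead: {best_case:.0%} to {worst_case:.0%} of the cycle")
```

Under these assumptions the flop consumes between a tenth and four tenths of the cycle, which is why its delay matters more as cycle times shrink.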
Requirements in the Flip-Flop Design
• Small Clk-Output delay, Narrow sampling window
• Low power
• Small clock load
• High driving capability (increased levels of parallelism)
  → Typical flip-flop load in a 0.18µm CMOS ranges from 50fF to over 200fF, with typical values of 100-150fF in critical paths
• Integration of logic into the flop
• Multiplexed or clock scan
• Crosstalk insensitivity
  - dynamic/high impedance nodes are affected
Flip-Flop Delay
• Sum of setup time and Clk-output delay is the only true measure of the performance with respect to the system speed
• $T = T_{Clk-Q} + T_{Logic} + T_{setup} + T_{skew}$
[Figure: two D-Q flip-flops clocked by Clk with a Logic block between them; the path is annotated with TClk-Q, TLogic, and TSetup.]
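The cycle-time relation above can be evaluated directly. The sketch below is only an illustration: the function name and all numeric values are hypothetical, chosen to show how the four terms add up and what clock frequency they allow.

```python
def min_cycle_time_ps(t_clk_q, t_logic, t_setup, t_skew):
    """T = T_Clk-Q + T_Logic + T_setup + T_skew, all in picoseconds."""
    return t_clk_q + t_logic + t_setup + t_skew

# Hypothetical example values in ps (not taken from the slides).
t_cycle = min_cycle_time_ps(t_clk_q=150, t_logic=1000, t_setup=50, t_skew=100)
f_max_mhz = 1e6 / t_cycle  # 1 / ps expressed in MHz

# Share of the cycle that is not available to logic.
overhead_share = (t_cycle - 1000) / t_cycle

print(f"T = {t_cycle} ps, f_max = {f_max_mhz:.0f} MHz, "
      f"non-logic share = {overhead_share:.1%}")
```

Reducing T_Clk-Q + T_setup is the only part of this budget that the flip-flop designer controls, which is why the slide treats that sum as the true performance measure.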
Delay vs. Setup/Hold Times
[Figure: Clk-Output delay [ps] (0 to 350) plotted against Data-Clk [ps] (-200 to 200), marking the Setup and Hold times, the Sampling Window between them, and the Minimum Data-Output point.]
Timing parameters, details
[Figure: Time [ps] (250 to 410) plotted against D - Clk delay [ps] (-80 to 100), showing the Clk-Q and D-Q curves, the stable Clk-Q region, the unstable Clk-Q region, the failure region, the minimum D-Q point, and the optimum setup time.]
The best point to pick on delay curve is minimum D-Q
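One way to locate the minimum D-Q point from simulation data is sketched below. The sample values are hypothetical and the sign convention is an assumption: setup is taken as positive when data arrives before the clock edge, so that D-Q = setup + Clk-Q.

```python
# Hypothetical (setup_ps, clk_q_ps) pairs from a sweep of data arrival time.
# Clk-Q stays flat for early data and rises as data approaches the clock edge.
samples = [
    (100, 200), (80, 200), (60, 202), (40, 208),
    (20, 220), (0, 245), (-20, 300),
]

def optimum_setup(points):
    """Return (setup, clk_q, d_q) for the sample with the smallest D-Q."""
    return min(
        ((setup, clk_q, setup + clk_q) for setup, clk_q in points),
        key=lambda row: row[2],
    )

opt_setup, opt_clk_q, min_d_q = optimum_setup(samples)
print(f"optimum setup = {opt_setup} ps, Clk-Q = {opt_clk_q} ps, "
      f"minimum D-Q = {min_d_q} ps")
```

In this made-up sweep the optimum lies where Clk-Q has already started to degrade: the saved setup time outweighs the added Clk-Q delay until the curve becomes too steep.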
Types of State-Elements
[Figure: Master-Slave Latch (Data through two D-Q latches, L1 and L2, clocked by Clk) and Pulse-Triggered Latch (a single D-Q latch L clocked by a pulse derived from Clk; a variant drives an S-R latch).]
Master-Slave Latches
• Positive setup times
• Two clock phases:
  → distributed globally
  → generated locally
• Small penalty in delay for incorporating MUX
• Some circuit tricks needed to reduce the overall delay
T-G Master-Slave Latch
• PowerPC 603 (Gerosa, JSSC 12/94)
[Figure: transmission-gate master-slave latch schematic with input D, output Q, and clocks Clk and Clkb.]
T-G Master-Slave Latch
• Low power feedback
• Unbuffered input
  → input capacitance depends on the phase of the clock
  → over-shoot and under-shoot with long routes
  → wirelength must be restricted at the input
• Clock load is high
• Low power
• Small clk-output delay, but positive setup
• Easily embedded scan or mux
C2MOS MS Latches
Y. Suzuki, “Clocked CMOS Calculator Circuitry”, IEEE J. Solid-State Circuits, Dec. 1973
[Figure: C2MOS master-slave latch schematics with input D, output Q, clock Clk, and local clock phases Ck and Ckb.]
• Low power feedback
• Locally generated second phase
• Poor driving capability
• Robustness to clock slope
Single-Transistor-Clocked MS latches
[Figure: DSTC and SSTC latch schematics with differential inputs D, outputs Q, and clock Clk.]
• Yuan and Svensson, JSSC Jan. ‘97
• Ratioed DCVS and SRPL based designs
• Relatively small clock load
• Very sensitive to input glitching
• Capacitive coupling and charge sharing related speed and power problems
Pulse-Triggered Latches
• First stage is a pulse generator
  → generates a pulse (glitch) on a rising edge of the clock
• Second stage is a latch
  → captures the pulse generated in the first stage
• Pulse generation results in a negative setup time
• Frequently exhibit a soft edge property
• Must check for hold time violations
Note: power is always consumed in the clocked pulse generator
Hybrid Latch Flip-Flop (H. Partovi, ISSCC’96)
[Figure: HLFF schematic with the Pulse Generator and Second Stage Latch labeled, inputs D and Clk, output Q and its complement, and the signal at node X shown for D=0 and D=1.]
HLFF – pulse generation
[Figure: HLFF with the Pulse Generator, Second Stage Latch, and Keepers labeled, inputs Data and Clk, and the signal at node X shown for D=0 and D=1.]
HLFF Operation
• 1-0 and 0-1 transitions at the input with 0ps setup time
Hybrid Latch Flip-Flop
Skew absorption
Partovi et al, ISSCC’96
Hybrid Latch Flip-Flop
• Flip-flop features:
  → single phase clock
  → edge triggered, on one clock edge
• Latch features: Soft clock edge property
  → brief transparency, equal to 3 inverter delays
  → negative setup time
  → allows slack passing
  → absorbs skew
• Hold time is comparable to HLFF delay
  → minimum delay between flip-flops must be controlled
• Fully static
• Possible to incorporate logic
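The requirement that the minimum delay between flip-flops be controlled can be checked with the standard hold (race) constraint, which the slides do not spell out: the shortest launch-to-capture path must exceed the hold time plus the clock skew. The sketch below assumes that formulation; all names and numbers are hypothetical.

```python
def hold_ok(t_clk_q_min, t_logic_min, t_hold, t_skew):
    """Standard race check: the fastest path must outlast hold time plus skew."""
    return t_clk_q_min + t_logic_min >= t_hold + t_skew

def padding_needed_ps(t_clk_q_min, t_logic_min, t_hold, t_skew):
    """Extra delay (ps) to insert on the short path; 0 if the check passes."""
    return max(0, t_hold + t_skew - (t_clk_q_min + t_logic_min))

# Hypothetical pulsed flop: hold time comparable to its own delay.
flop = dict(t_clk_q_min=120, t_hold=150, t_skew=50)

# Back-to-back flops with no logic in between fail the check...
direct = padding_needed_ps(t_logic_min=0, **flop)
# ...while a path with 100 ps of logic passes it.
with_logic = hold_ok(t_logic_min=100, **flop)

print(f"padding for direct path: {direct} ps, path with logic ok: {with_logic}")
```

This is why a large hold time is costly in practice: every short path, such as a scan chain or a shift register, needs padding.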
Semi-Dynamic Flip-Flop (SDFF)
• Sun UltraSparc III, Klass, VLSI Circuits’98
[Figure: SDFF schematic with inputs D and Clk and output Q and its complement.]
• Soft edge conditioned by data since first stage is precharged - cross-coupled latch is added for robustness
• Small penalty for adding logic
• Latch has one transistor less in stack - faster than HLFF, but 1-1 glitch exists
Sense-amplifier-based flip-flop
Madden & Bowhill, 1990, Matsui et al. 1994.
DEC Alpha 21264, StrongARM 110
• First stage is a sense amplifier
• On rising clock edge monotonic S_b or R_b trigger the S-R latch
• Cross-coupled NAND speed bottleneck
• Big power savings in reduced swing designs
• Nice interface to/from domino logic
Modified Sense Amplifier-Based Flip-Flop
• The first stage is an unchanged sense amplifier
• Second stage is sized to provide maximum switching speed
• Driver transistors are large
• Keeper transistors are small and disengaged during transitions
Nikolic & Stojanovic, ISSCC ‘99
Modified Sense Amplifier-Based Flip-Flop
• Delay of each of the outputs is independent of the load on the other output
• Delay of Q and Qb is symmetrical as opposed to the NAND based design
• Convenient for dual rail logic and driving strength for standard CMOS is effectively doubled
• SAFF presents a small clock load, small setup time and all the advantages of original design
• Possible tradeoff between speed and robustness to crosstalk
K-6 Dual-Rail ETL
[Figure: K-6 dual-rail ETL schematic with inputs Clk and D.]
• Self-reset property
  → increases dynamic power
  → drives domino logic
• Precharge increases speed
• Very fast but burns a lot of power
• Small clock load
Power and Delay Definitions
• All power related to the SE can be divided into:
  → Input power
    • Data power (PD)
    • Clock power (PCLK)
  → Internal power (PINT)
  → Load power (PLOAD)
• PLOAD can be merged into PINT
• Internal power is a function of
  → data activity ratio (α): number of captured data transitions with respect to number of clock transitions (αmax=100%)
    • no activity (0000… and 1111…)
    • maximum activity (0101010..)
    • average activity (random sequence)
  → Glitching activity
[Figure: state element with inputs D and CLK, each driven from a VDD-supplied driver, and outputs Q and Qb, annotated with PD, PCLK, PINT, and PLOAD.]
$P_{tot} = P_{internal} + \sum_{inputs(D,CLK)} P_{driver}$
Delay is (minimum D-Q): Clk-Q + setup time
State Element Performance Metrics
It is always possible to trade power for speed
Common metrics:
• Power-Delay Product (PDP)
  • Misleading measure
  • Good only if measured at constant frequency, where it is proportional to EDP
• Energy-Delay Product (EDP)
  → More accurate measure (Gonzalez & Horowitz)
• Energy-Delay² Product (ED²P)
  → A new measure, being justified by new results (Hofstee, Nowka, IBM)
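The three metrics can be compared side by side with a short sketch. It assumes energy per cycle is power divided by clock frequency; the two designs and their numbers are hypothetical, and the 500 MHz clock is borrowed from the simulation conditions at the end of the lecture.

```python
def metrics(power_w, delay_s, freq_hz):
    """Return PDP, EDP and ED2P for one state element (SI units)."""
    energy_j = power_w / freq_hz          # energy per clock cycle (assumption)
    return {
        "PDP": power_w * delay_s,         # J
        "EDP": energy_j * delay_s,        # J*s
        "ED2P": energy_j * delay_s ** 2,  # J*s^2
    }

FREQ_HZ = 500e6  # constant clock frequency for both designs

# Hypothetical designs: A is fast and power-hungry, B is slow and frugal.
design_a = metrics(power_w=200e-6, delay_s=300e-12, freq_hz=FREQ_HZ)
design_b = metrics(power_w=100e-6, delay_s=500e-12, freq_hz=FREQ_HZ)

for name in ("PDP", "EDP", "ED2P"):
    winner = "A" if design_a[name] < design_b[name] else "B"
    print(f"{name}: A={design_a[name]:.3e}  B={design_b[name]:.3e}  -> {winner}")
```

At a fixed frequency PDP and EDP differ only by the factor 1/f, so they rank designs identically; ED2P weights delay more heavily and can favor the faster design.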
Design & optimization tradeoffs
• Opposite Goals
  → Minimal Total power consumption
  → Minimal Delay
• Power-Delay tradeoff
• Minimize Power-Delay product (PDPtot) @ f=const.
[Figure: optimization plots with axes PDPtot [fJ], Total Power [uW], Width [um], and Delay [ps]; the optimum (Opt.) is marked in each panel.]
Overall Results
Delay Comparison (50% activity)
[Figure: bar chart of Delay [FO4] (0 to 5) for PowPC, C2MOS, HLFF, SDFF, StrongArm, and SAbFF, grouped as MS Latch, Pulsed Latch, and Differential.]
Conventional Clk-Q vs. minimum D-Q
[Figure: two scatter plots of Total Power [uW] (0 to 400) for HLFF, SDFF, PowerPC, StrongArm FF, SA-F/F, mC2MOS latch, K6 ETL, SSTC, and DSTC. One plots against Delay [FO4] (0 to 11) with the pulsed and MS designs marked; the other plots against Clk-Q delay [FO4] (0 to 5).]
• Hidden positive setup time
• Degradation of total delay
Older 0.22u comparison results
Overall Results
Single-Edge Triggered Structures Power Consumption Comparison (50% activity)
[Figure: stacked bar chart of Power Consumption [uW] (0 to 250), split into Internal Power, Data Power, and Clock Power, for structures including PowPC, C2MOS, DSTC, SSTC, HLFF, SDFF, StrongArm, and SAbFF, grouped as MS Latch, Single Ended, and Dual Ended.]
Internal Power distribution
[Figure: bar chart of Internal Power [uW] (0 to 400) for four data patterns (random, activity=0.5; …01010101…, activity=1; …11111111…, activity=0; …00000000…, activity=0) and seven designs: HLFF, SDFF, PowerPC 603 latch, mC2MOS latch, StrongARM FF, Alpha 21264 FF, and K6 ETL.]
• Four sequences characterize the boundaries for internal power consumption
  → …010101…: maximum
  → random, equal transition probability: average
  → …111111…: precharge activity
  → …000000…: leakage + internal clock processing
Older 0.22u comparison results
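The activity values attached to the four data patterns can be reproduced with a small sketch. It assumes activity is counted as data transitions per captured sample, so that an alternating sequence gives 1; the function name and the random test sequence are illustrative only.

```python
import random

def activity_ratio(bits):
    """Fraction of consecutive samples in which the data value changes."""
    if len(bits) < 2:
        return 0.0
    transitions = sum(a != b for a, b in zip(bits, bits[1:]))
    return transitions / (len(bits) - 1)

alternating = activity_ratio("01010101")   # maximum activity
all_ones = activity_ratio("11111111")      # no data transitions
all_zeros = activity_ratio("00000000")     # no data transitions

# Random data with equal transition probability approaches 0.5.
rng = random.Random(0)                     # fixed seed for repeatability
random_bits = "".join(rng.choice("01") for _ in range(10_000))
average = activity_ratio(random_bits)

print(alternating, all_ones, all_zeros, round(average, 2))
```

The two constant patterns have the same activity yet different internal power in precharged designs, which is why the slide lists them separately.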
Comparison of Clock power consumption
[Figure: horizontal bar chart of Local Clock power consumption [µW] (0 to 50) for DSTC MS latch, SSTC MS latch, K6 ETL, StrongArm FF, SA-F/F, mC2MOS, PowerPC MS latch, SDFF, and HLFF.]
Older 0.22u comparison results
Design goals
• Apply
  → Small clock load
  → Short direct path
  → Reduced node swing
  → Low-power feedback
  → Pulsed design
  → Optimization of both Master and Slave latch
• Avoid
  → Positive setup time
  → Sensitivity to clock slope and skew
  → Dynamic (floating) nodes
  → Dynamic Master latch
Conduct Energy-Delay optimizations
Take into account all sources of power dissipation
ALWAYS use Clk-Q + setup time for max delay
For more details on storage elements check Prof. Oklobdzija’s ISSCC’02 talk:
http://www.ece.ucdavis.edu/acsel under Presentations
Simulation Conditions:
• Power Supply Voltage: VDD=1.8V nominal
• Temperature T=27°C nominal
• Technology: 0.18µm Fujitsu
• Fan-Out of 4 Delay = 75ps
• Transistor Widths
  → Minimal 0.36µm
  → Maximal 10µm
• Load: 14 minimal inverters in the technology used
• Clock frequency: 500MHz (250MHz for Dual-Edge)
• Data/Clock slopes of ideal signal 100ps
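The FO4 delay listed above is what lets delays in picoseconds be compared with the FO4 figures used throughout the lecture. A minimal conversion sketch follows; the helper names are illustrative assumptions, while the 75 ps and 500 MHz values come from the conditions listed.

```python
FO4_PS = 75.0  # fan-out-of-4 inverter delay in this 0.18 µm technology

def ps_to_fo4(delay_ps):
    """Express a delay in units of FO4 inverter delays."""
    return delay_ps / FO4_PS

def cycle_time_ps(freq_mhz):
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz

cycle_ps = cycle_time_ps(500.0)     # simulation clock: 500 MHz
cycle_fo4 = ps_to_fo4(cycle_ps)

print(f"cycle = {cycle_ps:.0f} ps = {cycle_fo4:.1f} FO4")
```

The simulation clock is therefore slower than the 10 to 20 FO4 cycle times quoted for high-performance designs at the start of the lecture; it sets the power measurement conditions rather than a performance target.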