ECE 260B - CSE241A VLSI Digital Circuits

ECE260B – CSE241A
Winter 2005
Timing Analysis and Correction
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Timing Analysis & Correction 1
http://vlsicad.ucsd.edu
Timing Analysis
• Testing
• Simulation
  - Device modeling (BSIM)
  - Transistor-level time domain analysis (SPICE)
  - Frequency domain interconnect analysis (AWE, PRIMA)
• Static timing analysis
  - Transistor-level (PathMill)
  - Gate-level (PrimeTime)
Sequential Machine
[Figure: combinational logic blocks separated by registers clocked by clk]
• State is stored in registers (flip-flops or latches)
• Combinational logic computes next-state, outputs from present-state, inputs
Courtesy K. Keutzer et al. UCB
Why Clocks?
• Clocks provide the means to synchronize
  - By allowing events to happen at known timing boundaries, we can sequence these events
• Greatly simplifies building of state machines
• No need to worry about variable delay through combinational logic (CL)
  - All signals delayed until clock edge (clock imposes the worst case delay)
[Figure: FSM and dataflow examples built from combinational logic and registers]
Courtesy K. Yang, UCLA
Clock Cycle Time
• Cycle time is determined by the delay through the CL
  - Signal must arrive before the latching edge
  - If too late, it waits until the next cycle
    - Synchronization and sequential order become incorrect
• Constraint:
  - Tcycle > Tprop_delay_through_CL + Toverhead
  - Example: 3.0 GHz Pentium-4 → Tcycle = 333 ps
• Can change circuit architecture to obtain smaller Tcycle
Courtesy K. Yang, UCLA
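As a quick check of the cycle-time example, the sketch below converts a clock frequency to a cycle time and subtracts an overhead to get the delay budget left for the CL. The function names and the 60 ps overhead are illustrative assumptions, not values from the slides.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def cl_delay_budget_ps(freq_ghz: float, overhead_ps: float) -> float:
    """Largest CL propagation delay allowed by Tcycle > Tprop + Toverhead."""
    return cycle_time_ps(freq_ghz) - overhead_ps


if __name__ == "__main__":
    t_cycle = cycle_time_ps(3.0)
    # 60 ps is a made-up overhead used only for illustration.
    budget = cl_delay_budget_ps(3.0, overhead_ps=60.0)
    print(f"Tcycle = {t_cycle:.0f} ps, CL budget = {budget:.0f} ps")
```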
Pipelining
• For dataflow:
  - Instead of a long critical path, split the critical path into chunks
  - Insert registers to store intermediate results
  - This allows 2 waves of data to coexist within the CL
• Can we extend this ad infinitum?
  - Overhead eventually limits the pipelining
    - E.g., 1.5 to 2 gate delays for latch or FF
  - Granularity limits as well
    - Minimum time quantum: delay of a gate
[Figure: unpipelined datapath (register, CL with delay tpd, register): Tcycle > tpd + Toverhead]
[Figure: pipelined datapath (register, CL with delay tpd1, register, CL with delay tpd2, register): Tcycle > max(tpd1, tpd2) + Toverhead]
Courtesy K. Yang, UCLA
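The diminishing return from deeper pipelining can be sketched numerically. The example assumes a 20-gate-delay critical path that can be split evenly, and a register overhead of 2 gate delays (the upper end of the range quoted on the slide); both numbers and the function names are illustrative.

```python
def pipelined_cycle(stage_delays, overhead):
    """Tcycle bound for a pipeline: slowest stage plus register overhead."""
    return max(stage_delays) + overhead


def even_split_cycle(total_delay, stages, overhead):
    """Tcycle bound when the critical path is cut into equal stages."""
    return pipelined_cycle([total_delay / stages] * stages, overhead)


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 20):
        # The overhead term does not shrink, so it dominates for large n.
        print(n, even_split_cycle(20, n, overhead=2))
```

With these assumed numbers, going from 1 to 4 stages cuts the bound from 22 to 7 gate delays, while going from 10 to 20 stages only cuts it from 4 to 3.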
Intel MPU FO4 INV Delays Per Clock Period
[Figure: number of FO4 inverter delays per clock period (y-axis, 0 to 120) versus year (x-axis, 1982 to 2004) for Intel processors: 386, 486, DX2, DX4, Pentium, Pentium MMX, Pentium Pro, Pentium II, Celeron, Pentium III, Pentium 4]
• FO4 INV = inverter driving 4 identical inverters (no interconnect)
• Half of frequency improvement has been from reduced logic stages, i.e., pipelining
Let’s Revisit Cycle Time and Path Delay
• Cycle time (T) cannot be smaller than longest path delay (Tmax)
  - Tmax ≤ T
• Longest (critical) path delay is a function of:
  - Total gate, wire delays
  - Logic levels
[Figure: clock and data waveforms marking the cycle time; flip-flops Q1 and Q2 (clocked by Tclock1 and Tclock2) connected by the critical path, ~5 logic levels]
Courtesy K. Keutzer et al. UCB
Cycle Time - Setup Time
• For FFs to correctly capture data, data must be stable for:
  - Setup time (Tsetup) before clock arrives
  - Tmax + Tsetup ≤ T
[Figure: clock and data waveforms marking the setup time; flip-flops Q1 and Q2 (clocked by Tclock1 and Tclock2) connected by the critical path, ~5 logic levels]
Courtesy K. Keutzer et al. UCB
Cycle Time – Clock Skew
• Unbalanced delay in the clock network causes clock skew
• Cycle time is also a function of clock skew (Tskew)
  - Tmax + Tsetup + Tskew ≤ T
[Figure: data, Tclock1, Tclock2, and Q2 waveforms marking the clock skew; flip-flops Q1 and Q2 (clocked by Tclock1 and Tclock2) connected by the critical path, ~5 logic levels]
Courtesy K. Keutzer et al. UCB
Cycle Time – Flip-Flop Delay (Clock to Q)
• Cycle time is also a function of propagation delay of FF (Tclk-to-Q or Tc2q)
• Tc2q: time from arrival of clock signal till change at FF output
  - Tmax + Tsetup + Tskew + Tclk-to-Q ≤ T
[Figure: data, Tclock1, Tclock2, and Q2 waveforms marking the clock-to-Q delay; flip-flops Q1 and Q2 (clocked by Tclock1 and Tclock2) connected by the critical path, ~5 logic levels]
Courtesy K. Keutzer et al. UCB
Min Path Delay - Hold Time
• For FFs to correctly latch data, data must be stable during:
  - Hold time (Thold) after clock arrives
• Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)
  - Tmin ≥ Thold + Tskew
[Figure: data and Tclock1 waveforms marking the hold time; flip-flops Q1 and Q2 (clocked by Tclock1 and Tclock2) connected by a short path, ~3 logic levels]
Courtesy K. Keutzer et al. UCB
Setup, Hold, Cycle Times
[Figure: example of a single phase clock, marking the cycle time, the set-up time (D stable before clock), the hold time (D stable after clock), and the interval when the signal may change]
Courtesy K. Keutzer et al. UCB
Timing Constraints for Edge-Triggered FFs
[Figure: flip-flop, combinational logic, flip-flop; clock period Tcycle]
• Max(Tpd) < Tcycle – Tsetup – Tc2q – Tskew
  - Otherwise the delay is too long for data to be captured
• Min(Tpd) > Thold – Tc2q + Tskew
  - Otherwise the delay is too short and data can race through, skipping a state
Courtesy K. Yang, UCLA
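The two edge-triggered constraints can be turned into slack computations. This is a minimal sketch: the `FlopTiming` container, the function names, and all numbers (in ps) are hypothetical and chosen only to illustrate the inequalities as written on the slide.

```python
from dataclasses import dataclass


@dataclass
class FlopTiming:
    t_setup: int  # setup time
    t_hold: int   # hold time
    t_c2q: int    # clock-to-Q delay


def setup_slack(t_cycle, tpd_max, t_skew, ff):
    """Positive when Max(Tpd) < Tcycle - Tsetup - Tc2q - Tskew."""
    return (t_cycle - ff.t_setup - ff.t_c2q - t_skew) - tpd_max


def hold_slack(tpd_min, t_skew, ff):
    """Positive when Min(Tpd) > Thold - Tc2q + Tskew."""
    return tpd_min - (ff.t_hold - ff.t_c2q + t_skew)


if __name__ == "__main__":
    ff = FlopTiming(t_setup=50, t_hold=30, t_c2q=80)
    print("setup slack:", setup_slack(1000, tpd_max=800, t_skew=40, ff=ff))
    print("hold slack:", hold_slack(tpd_min=10, t_skew=40, ff=ff))
```

Note that only the setup slack depends on the cycle time, which matches the later observation that a hold (race-through) violation cannot be fixed by changing frequency.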
Example of Tpdmax Violation
• Suppose there is skew between the registers in a dataflow (regA after regB)
  - “i” gets its input values from regA at transition in Ck’
  - CL output “o” arrives after Ck transition due to skew
  - To correct this problem, can increase cycle time
[Figure: regA, combinational logic (input “i”, output “o”), regB, with clocks Ck and Ck’ offset by Tskew; the waveform shows “o” settling Tpdmax after “i”, too late for the capturing edge]
Courtesy K. Yang, UCLA
Example of Tpdmin Violation: Race Through
• Suppose clock skew causes regA to be clocked before regB
• “i” passes through the CL with little delay (tpdmin)
• “o” arrives before the rising Ck’ causes the data to be latched
• Cannot be fixed by changing frequency → have rock instead of chip
[Figure: regA, combinational logic (input “i”, output “o”), regB, with clocks Ck and Ck’ offset by Tskew; the waveform shows “o” changing Tpdmin after “i”, too early]
Courtesy K. Yang, UCLA
Summary: Timing Constraints
• Synchronous design = combinational logic + sequential elements
• For each flip-flop:
  - Tmax + Tsetup < Tcycle - Tskew
  - Tmin > Thold + Tskew
• Tmax: longest data propagation path delay
• Tmin: shortest data propagation path delay
[Figure: two FFs (D, Q, CLK) connected through combinational logic; CLK and DATA waveforms marking Tcycle, Tsetup, and Thold]
Clock Identification
• Partition the design
• Clock network
• Clock definition
• Derived clock
• Clock groups
• Clock delay (skew) calculation
• Timing constraints exist between clocks with a common divisor frequency
• Data paths with timing constraints
[Figure: FFs and combinational logic clocked by CLK1 through CLK4, including a /8 divider]
Timing Graph
• Data paths with timing constraints
  - Starting from primary inputs/FF outputs
  - Ending at primary outputs/FF inputs
• Represented by a labeled directed graph G = <V,E>
  - Timing node V ~ pin/primary input/output
  - Timing edge E ~ gate/wire delay
  - (Timing arc ~ gate delay)
[Figure: example circuit with signals A, B, C, U, V, X, Y, Z, F and its timing graph, labeled with node delays (0, 1, 2) and edge delays (.15, .20)]
Courtesy K. Keutzer et al. UCB
Characterization
• Static analysis = vector-less worst case analysis
  - Graph based path propagation
  - Logic functions are not evaluated
• Pre-characterized look-up tables for gate delays
  - Min/max/rise/fall
• Characterized interconnect delays
  - On-the-fly delay calculation
  - SDF (standard delay format) annotation
[Figure: timing graph fragment with nodes X, Y, Z]
Compute Longest Path
(Kirkpatrick 1966, IBM JRD)
[Figure: timing graph with super-source Origin and nodes A, B, C, U, V, X, Y, Z, F, labeled with node delays (0, 1, 2) and edge delays (.15, .20)]
Compute longest path in a DAG G = <V,E,delay,Origin>
// delay is set of labels, Origin is the super-source of the DAG
// Final-delay(v) is the longest accumulated delay from Origin to v;
// Final-delay(Origin) = delay(Origin), all other Final-delay values start at -infinity
Forward-prop(W){
  for each vertex v in W
    for each edge <v,w> from v
      Final-delay(w) = max(Final-delay(w), Final-delay(v) + delay(w) + delay(<v,w>))
      if all incoming edges of w have been traversed, add w to W
}
Longest-path(G){
  Forward-prop(Origin)
}
Courtesy K. Keutzer et al. UCB
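A minimal runnable sketch of the forward propagation follows. It assumes every node is reachable from the super-source, and accumulates node and edge delays along each path; the dictionary-based graph encoding and the tiny example graph are illustrative choices, not the graph from the slide.

```python
from collections import defaultdict, deque


def longest_path(node_delay, edges, origin):
    """Longest accumulated delay from origin to every node of a DAG.

    node_delay: {node: delay}, edges: {(v, w): delay of edge v -> w}.
    """
    succ = defaultdict(list)
    pending = defaultdict(int)  # incoming edges not yet traversed
    for (v, w), d in edges.items():
        succ[v].append((w, d))
        pending[w] += 1

    final = {v: float("-inf") for v in node_delay}
    final[origin] = node_delay[origin]
    work = deque([origin])
    while work:
        v = work.popleft()
        for w, d in succ[v]:
            final[w] = max(final[w], final[v] + d + node_delay[w])
            pending[w] -= 1
            if pending[w] == 0:  # all incoming edges traversed
                work.append(w)
    return final


if __name__ == "__main__":
    node_delay = {"Origin": 0, "A": 1, "B": 2, "F": 2}
    edges = {("Origin", "A"): 0, ("Origin", "B"): 0,
             ("A", "F"): 3, ("B", "F"): 1}
    print(longest_path(node_delay, edges, "Origin"))
```

Each node is finalized only after all of its fan-in edges have been processed, so every edge is visited once and the run time is linear in the size of the graph.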
Compute Longest Path
Dynamic programming
How to exclude a set of paths?
Courtesy K. Keutzer et al. UCB
Timing Analysis Terminology
• Actual arrival time (AAT): forward propagation
• Required arrival time (RAT): backward propagation
• Slack = RAT - AAT
  - A measure of how much timing margin exists at each node
  - Slack < 0 → timing violation
  - Can optimize a particular branch
  - Can trade slack for power, area, robustness
• Critical path
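The AAT, RAT, and slack definitions can be sketched on a small graph. The sketch assumes edge delays only, a caller-supplied topological order, and a required time given at the endpoint; the node names and numbers are made up for illustration.

```python
def slacks(edges, topo_order, required):
    """Return {node: RAT - AAT} for a DAG with edge delays.

    edges: {(u, v): delay}; topo_order: nodes in topological order;
    required: {endpoint: required arrival time}.
    """
    aat = {n: 0 for n in topo_order}
    for u in topo_order:  # forward propagation: latest arrival
        for (a, b), d in edges.items():
            if a == u:
                aat[b] = max(aat[b], aat[u] + d)

    rat = {n: float("inf") for n in topo_order}
    rat.update(required)
    for u in reversed(topo_order):  # backward propagation: earliest requirement
        for (a, b), d in edges.items():
            if a == u:
                rat[u] = min(rat[u], rat[b] - d)

    return {n: rat[n] - aat[n] for n in topo_order}


if __name__ == "__main__":
    edges = {("in", "g1"): 2, ("in", "g2"): 1, ("g1", "out"): 3, ("g2", "out"): 1}
    print(slacks(edges, ["in", "g1", "g2", "out"], {"out": 5}))
```

In this example the nodes with zero slack (in, g1, out) form the critical path, while g2 has 3 units of slack that could be traded for power or area.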
Static Timing Analysis Flow
• Read in
  - design (LEF/DEF)
  - timing library (.lib)
  - timing constraints (GCF)
  - delay annotation (SDF)
• Set up constraints
  - Annotated delays
  - IO path constraints
  - Single cycle setup/hold checks
  - Timing exceptions
    - False paths
    - Multi-cycle paths
    - Max delay constraints
    - Min delay constraints
• Construct timing graph
  - Partition clock domain (form path groups)
  - Ideal/propagated clock
  - Case analysis
  - Levelization
• AAT propagation
• Timing report
  - End points with violations
  - Path enumeration
Timing Exceptions
• False paths: topologically connected but logically impossible to enable
• To enable a path
  - Logically: non-controlling values (e.g., 0 for OR gates, 1 for AND gates) at side inputs
  - Temporally: earlier signal transitions at side inputs
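The logical condition can be illustrated with a brute-force check: a path is enabled only if some input assignment puts a non-controlling value on every side input at once. This sketch covers only the logical condition, not the temporal one, and the path encoding (gate type plus a function giving the side-input value) is a hypothetical representation chosen for brevity.

```python
from itertools import product

# Non-controlling side-input values, as listed on the slide.
NON_CONTROLLING = {"AND": 1, "OR": 0}


def statically_sensitizable(path, variables):
    """True if one assignment sets every side input to its non-controlling value.

    path: list of (gate_type, side_input_fn); side_input_fn maps an
    assignment dict {variable: 0 or 1} to the side-input value.
    """
    for values in product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(side(env) == NON_CONTROLLING[gate] for gate, side in path):
            return True
    return False


# Two AND gates whose side inputs are s and NOT s: never both 1.
false_path = [("AND", lambda e: e["s"]), ("AND", lambda e: 1 - e["s"])]
# AND with side input s, OR with side input NOT s: s = 1 enables both.
true_path = [("AND", lambda e: e["s"]), ("OR", lambda e: 1 - e["s"])]

if __name__ == "__main__":
    print(statically_sensitizable(false_path, ["s"]))
    print(statically_sensitizable(true_path, ["s"]))
```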
False Path Representation
• Abstracted graph
  - set_false_path -from {…} -through {…} … -through {…} -to {…}
[Figure: abstracted graph linking “from” nodes via “through” nodes to “to” nodes]
False Path Identification
• Tagged timing analysis
  - Arrival times with the same tag are compared to find worst case
  - False path filtered
[Figure: timing graph with nodes a, b, c, d carrying (arr, tag) labels (arr: 1, tag: 0; arr: 2, tag: 2; arr: 3, tag: 3), next to the from/through/to abstracted graph with tags 2 and 3]
Handling Latch-Based Designs
• Latch: level-sensitive sequential element
• Transparent signal propagation
• Time borrowing
  - Path delay of previous stage – Tborrow
  - Path delay of current stage + Tborrow
[Figure: combinational logic, latch (D, Q, CLK), combinational logic; CLK and DATA waveforms marking the transparent phase and Tborrow]
Counting Process Variation
• Off-chip variation: two paths on a chip cannot use two different operating conditions (i.e., corners) at the same time for setup or hold analysis
  - Setup: launchclock_latepath (max) + data_latepath (max) < captureclock_earlypath (max) + clock_period – setup
  - Hold: launchclock_earlypath (min) + data_earlypath (min) > captureclock_latepath (min) + hold
• On-chip variation: the software calculates the delay for one path based on maximum operating condition while calculating the delay for another path based on minimum operating condition for setup or hold checks
• Statistical static timing analysis (SSTA)
  - Continuous pdf (probability density function)
  - Or discrete corners
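The difference between the two analysis modes can be sketched with one set of (min, max) delays. Under the single-corner reading, every term of a check uses the same corner; under the on-chip-variation reading, the launch and data paths use one corner while the capture clock uses the other. The delay values (in ps) and function names are invented for illustration.

```python
def setup_slack(launch_clk, data, capture_clk, period, setup):
    """Slack of: launch_clk + data < capture_clk + period - setup."""
    return (capture_clk + period - setup) - (launch_clk + data)


def hold_slack(launch_clk, data, capture_clk, hold):
    """Slack of: launch_clk + data > capture_clk + hold."""
    return (launch_clk + data) - (capture_clk + hold)


# Hypothetical (min, max) delays for each path segment.
LAUNCH, DATA, CAPTURE = (100, 120), (200, 700), (100, 120)
PERIOD, SETUP, HOLD = 1000, 50, 30
MIN, MAX = 0, 1

# Single corner: all terms taken at max (setup) or all at min (hold).
corner_setup = setup_slack(LAUNCH[MAX], DATA[MAX], CAPTURE[MAX], PERIOD, SETUP)
corner_hold = hold_slack(LAUNCH[MIN], DATA[MIN], CAPTURE[MIN], HOLD)

# On-chip variation: the capture clock takes the opposite corner.
ocv_setup = setup_slack(LAUNCH[MAX], DATA[MAX], CAPTURE[MIN], PERIOD, SETUP)
ocv_hold = hold_slack(LAUNCH[MIN], DATA[MIN], CAPTURE[MAX], HOLD)

if __name__ == "__main__":
    print(corner_setup, ocv_setup, corner_hold, ocv_hold)
```

With these numbers the on-chip-variation checks are the more pessimistic ones for both setup and hold.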
Clock Re-convergence Pessimism Removal
• Common part of two clock propagation paths cannot have two different path delays at the same time
• Need to compute clock propagation delay from the branch point
[Figure: two FFs joined by combinational logic; the two clock paths share a common part from CLK to the branch point, with max and min delays marked on the paths]
Outline
• Timing Analysis
  - Timing Requirements
  - Static Timing Analysis
• Timing Correction
Timing Correction
• Driven by STA
  - “Incremental performance analysis backplane”
• Two goals
  - Fix logic design rule violations
  - Fix timing problems
DAC-2002, Physical Chip Implementation
Logic Design Rules
• Constraints of
  - Fanout
  - Slew rate
  - Load cap
• Reduce timing look-up table extrapolation error
• Control signal integrity
  - Transition degradation
  - Crosstalk noise
  - Supply voltage drop
  - Device reliability
• Approaches
  - Resizing
  - Buffering
  - Cloning (copying cells)
Timing Correction Approaches
• Re-synthesis
  - Local synthesis transforms
• Timing-driven placement
  - Critical net weighting
• Timing-driven routing
  - Net ordering
  - Buffering
  - Topology optimization
• Post-route optimization (IPO)
  - Re-routing
  - Re-timing and useful clock skew
  - Sizing
  - Buffering
DAC-2002, Physical Chip Implementation
Local Synthesis Transforms
• Resize cells
• Move critical signals forward
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Pad early paths
• Area recovery
DAC-2002, Physical Chip Implementation
Transform Example
[Figure: double inverter removal; path delay drops from 4 to 2]
DAC-2002, Physical Chip Implementation
Resizing
[Figure: resizing example. Plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives loads of 0.2, 0.2, and 0.3 (pins d, e, f), with delay values 0.035 and 0.026 annotated]
DAC-2002, Physical Chip Implementation
Cloning
[Figure: cloning example. Plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives five loads of 0.2 each (d, e, f, g, h) and is copied so that the loads are shared between the copies]
DAC-2002, Physical Chip Implementation
Buffering
[Figure: buffering example. Plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives five loads of 0.2 each (d, e, f, g, h), and a buffer (input load 0.1) is inserted to take over part of the load]
DAC-2002, Physical Chip Implementation
Redesign Fan-in Tree
[Figure: output e computed from inputs a, b, c, d through gates of delay 1, with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Original tree: Arr(e)=6. Redesigned tree: Arr(e)=5]
DAC-2002, Physical Chip Implementation
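The arrival times on this slide can be reproduced under an assumed structure: a balanced tree of 2-input gates for the original, and a greedy tree that always combines the two earliest signals for the redesign. The tree shapes are assumptions (the figure itself was not recovered), and the function names are illustrative.

```python
import heapq


def balanced_tree_arrival(arrivals, gate_delay=1):
    """Output arrival when inputs are paired left to right, level by level."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired signal passes to the next level unchanged.
            nxt.append(max(pair) + gate_delay if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


def greedy_tree_arrival(arrivals, gate_delay=1):
    """Output arrival when the two earliest signals are always combined first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + gate_delay)
    return heap[0]


if __name__ == "__main__":
    arr = [4, 3, 1, 0]  # Arr(a), Arr(b), Arr(c), Arr(d)
    print(balanced_tree_arrival(arr), greedy_tree_arrival(arr))
```

Combining early signals first keeps the late input a next to the output, so it passes through a single gate.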
Redesign Fan-out Tree
[Figure: two fan-out trees with per-stage delays (1, 2, 3); one has longest path = 4 and the other longest path = 5, illustrating the slowdown of a buffer due to load]
DAC-2002, Physical Chip Implementation
Decomposition
DAC-2002, Physical Chip Implementation
Swap Commutative Pins
[Figure: gate with inputs a, b, c, labeled with arrival times and pin delays, before and after swapping the pin connections]
• Simple sorting on arrival times and delay works
DAC-2002, Physical Chip Implementation
Logic Restructuring 1
• Nodes in critical section that fan out outside of critical section are duplicated
[Figure: network with signals a, b, c, d, e, h and output f, before and after the critical section is collapsed into a single node; the late input signals are marked]
Slides courtesy of Keutzer
Logic Restructuring 2
• Place timing-critical nodes closer to output
  - Make them pass through fewer gates
  - After collapse, a divisor is selected such that substituting k into f places critical signals c and d closer to output
[Figure: collapse the critical section of f (inputs a, b, c, d, e) into one node, then re-extract factor k (the divisor) so that c and d end up close to the output]
Slides courtesy of Keutzer
Summary of Local Synthesis Transforms
• Variety of methods for delay optimization
  - No single technique dominates
• The one with more tricks wins? No!
• Technology dependent (for gate delay)
  - Differ with cell libraries
• Methodology dependent (for wire delay)
  - Need to predict placement and routing result
  - Uncertainty!
• Pros: large potential improvement
• Cons: less predictable, more expensive
Summary of Local Synthesis Transforms
• Work smoothly in a physical synthesis flow
  - Tight integration with placement and routing
• Need a good framework for evaluating and processing different transforms
  - Accurate, fast timing engine with incremental analysis capability
    - Don’t want to re-run timing analysis on the whole design for each local transform
  - Simultaneous min and max delay analysis
    - How does fixing the setup violation affect the existing hold checks?
Timing Correction Approaches
• Re-Synthesis
  - Local Transformation
• Timing-Driven Placement
• Timing-Driven Routing
• Post-Route Optimization (IPO)
  - Re-Routing
  - Re-Timing and Useful Clock Skew
  - Sizing
  - Buffering
Reducing Crosstalk Effect
• Shielding
  - Effective for short range capacitive coupling
  - Not for long range inductive coupling
• Net ordering (wire swizzling)
Reducing Crosstalk Effect
• Shielding
• Net ordering
• Gate sizing
  - A strong driver is less sensitive to crosstalk
  - But more likely to project crosstalk to its neighbors
Reducing Crosstalk Effect
• Shielding
• Net ordering
• Gate sizing
• Buffering
  - Partition interconnects
  - Mutual canceling
Re-Timing
• How would you meet the 10ns clock cycle time?
[Figure: three FFs (D, Q) on a common clock, with combinational delay labels 6, 4, 2, 4, 4; Cycle = 10]
Re-Timing
• Re-order sequential elements and combinational logic
• Did you see a problem here?
[Figure: two versions of the three-FF circuit (D, Q, common clock), with combinational delay labels 6, 4, 4, 2, 4 and 6, 4, 2, 4, 4; Cycle = 10]
Re-Timing
• Re-order sequential elements and combinational logic
• Need to predict placement and routing
[Figure: two versions of the three-FF circuit (D, Q, common clock), with combinational delay labels 6, 4, 4, 2, 4 and 6, 4, 2, 4, 4; Cycle = 10]
Useful Clock Skew
• Equivalent to re-timing
• Clock tree re-construction
  - Insert delay cells
  - Snaking
  - Add dummy capacitive load
[Figure: three FFs (D, Q) with combinational delay labels 6, 4, 4, 2, 4 and a +2 delay added on the clock; Cycle = 10]
Driving Large Capacitances: Inverter As Buffer
[Figure: input In (capacitance Cin) drives an inverter of size A (relative size 1), followed by a buffer of size U*A (relative size U)]
• Total propagation delay = tp(inv) + tp(buffer)
• Minimize tp = U * tp0 + X/U * tp0
  - tp0 = delay of min-size inverter with single min-size inverter as fanout load
  - CL = X * Cin
  - Uopt = sqrt(X); tp,opt = 2 tp0 * sqrt(X)
• Use only if combined delay is less than unbuffered case
• Slide courtesy of Mary Jane Irwin, PSU
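The optimum buffer size can be checked numerically against the closed form. The sketch uses the slide's delay model tp = (U + X/U) * tp0; the unbuffered delay of X * tp0 is an assumption that follows from the same fanout-proportional model, and the function names are illustrative.

```python
import math


def two_stage_delay(u, x, tp0=1.0):
    """tp = U*tp0 + (X/U)*tp0 for an inverter followed by a buffer of size U."""
    return (u + x / u) * tp0


def optimal_u(x):
    """Closed-form optimum from the slide: Uopt = sqrt(X)."""
    return math.sqrt(x)


def unbuffered_delay(x, tp0=1.0):
    """Assumed model: a min-size inverter driving CL = X*Cin directly."""
    return x * tp0


if __name__ == "__main__":
    x = 100
    best_int_u = min(range(1, x + 1), key=lambda u: two_stage_delay(u, x))
    print(best_int_u, optimal_u(x), two_stage_delay(optimal_u(x), x))
    print("worth buffering:", two_stage_delay(optimal_u(x), x) < unbuffered_delay(x))
```

Under this model the buffered delay 2 * sqrt(X) * tp0 beats the unbuffered X * tp0 only when X > 4, which is one way to read the slide's last bullet.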
Delay Reduction With Cascaded Buffers
[Figure: chain of N inverters of sizes 1, u, u^2, ..., u^(N-1) from in (Cin) through nodes C1, C2 to out, driving CL = x Cin = u^N Cin]
• Cascade of buffers with increasing sizes (u = tapering factor) can reduce delay
• If load is driven by a large transistor (which is driven by a smaller transistor) then its turn-on time dominates overall delay
• Each buffer charges the input capacitance of the next buffer in the chain and speeds up charging, reducing total delay
• Cascaded buffers are useful when Rint < Rtr
• Slide courtesy of Mary Jane Irwin, PSU
tp as Function of U and X
[Figure: u/ln(u) (y-axis, 0 to 60) versus u (x-axis, 1 to 7) for x = 10, 100, 1000, and 10,000]
• Total line delay as function of driver size, load capacitance
• Question: Derive the optimum (min-delay) value of U.
• Slide courtesy of Mary Jane Irwin, PSU
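One way to approach the question numerically: assume each of the N stages contributes a delay of u * tp0 and that u^N = x, so N = ln(x)/ln(u) and tp = tp0 * ln(x) * u/ln(u). This simplified model ignores each stage's self-loading, which is an assumption. Under it, a grid search lands near u = e regardless of x.

```python
import math


def chain_delay(u, x, tp0=1.0):
    """Delay of a tapered chain: N stages of u*tp0 each, with u**N == x."""
    n_stages = math.log(x) / math.log(u)  # treated as continuous
    return n_stages * u * tp0


def best_taper(x, lo=1.5, hi=7.0, step=0.001):
    """Grid search for the tapering factor u that minimizes chain_delay."""
    count = int(round((hi - lo) / step))
    candidates = (lo + step * i for i in range(count + 1))
    return min(candidates, key=lambda u: chain_delay(u, x))


if __name__ == "__main__":
    for x in (10, 100, 1000, 10_000):
        u = best_taper(x)
        print(x, round(u, 3), round(chain_delay(u, x), 2))
```

Setting the derivative of u/ln(u) to zero gives ln(u) = 1, which agrees with the search result.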
Reducing RC Delay With Repeaters
• RC delay is quadratic in length → must reduce length
• Observation: 2^2 = 4 and 1 + 1 = 2, but 1^2 + 1^2 = 2
[Figure: a driver-to-receiver line of L = 2 units, shown unbroken and split in two by a repeater]
• Repeater = strong driver (usually inverter or pair of inverters for non-inversion) that is placed along a long RC line to “break up” the line and reduce delay
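The observation generalizes: splitting a line of length L into k equal segments turns a delay proportional to L^2 into k * (L/k)^2 = L^2/k, plus whatever the repeaters themselves cost. The sketch below assumes a unit RC constant and a fixed per-repeater delay; both parameters are illustrative.

```python
def wire_delay(length, segments=1, repeater_delay=0.0, rc_per_unit_sq=1.0):
    """Delay of an RC line cut into equal segments by (segments - 1) repeaters."""
    seg_len = length / segments
    wire = segments * rc_per_unit_sq * seg_len ** 2
    return wire + (segments - 1) * repeater_delay


if __name__ == "__main__":
    print(wire_delay(2))                              # unbroken line
    print(wire_delay(2, segments=2))                  # ideal repeater
    print(wire_delay(2, segments=2, repeater_delay=0.5))
```

Because each repeater adds its own delay, adding segments helps only while the wire-delay saving exceeds the repeater cost.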
Repeaters vs. Cascaded Buffers
• Repeaters are used to drive long RC lines
  - Breaking up the quadratic dependence of delay on line length is the goal
  - Typically sized identically
• Cascaded buffers are used to drive large capacitive loads, where there is no parasitic resistance
  - We put all buffers at the beginning of the load
  - This would be pointless for a long RC wire since the wire RC delay would be unaffected and would dominate the total delay
• Optimum buffering for a uniform long interconnect
  - Cascaded buffers at source and sink
  - Identical sized and spaced repeaters in between
Buffering a Tree for Timing Optimization
• Van Ginneken’s dynamic programming
  - Bottom-up traversal
  - Evaluate each sub-tree by a triple <delay, cap, cost>
  - Filter out sub-optimal solutions
• Limitations
  - Buffer insertion locations (explored by edge segmenting)
  - Buffer insertion constraints (e.g., legal buffer locations)
  - Routing detour
  - Delay calculation accuracy (wire delay, slew rate, etc.)
[Figure: routing tree with a <delay, cap, cost> triple at each sub-tree]
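The "filter out sub-optimal solutions" step can be sketched as a dominance check over <delay, cap, cost> triples: a candidate is dropped if another candidate is no worse in all three values. This shows only the pruning step, not the full bottom-up algorithm, and the quadratic scan is chosen for clarity rather than speed.

```python
def prune(solutions):
    """Keep only non-dominated (delay, cap, cost) triples, smaller is better."""
    unique = set(solutions)
    kept = [
        s for s in unique
        if not any(o != s and all(a <= b for a, b in zip(o, s)) for o in unique)
    ]
    return sorted(kept)


if __name__ == "__main__":
    candidates = [(5, 2, 1), (6, 3, 1), (4, 3, 2), (5, 2, 2)]
    print(prune(candidates))
```

In the example, (6, 3, 1) and (5, 2, 2) are both dominated by (5, 2, 1), while (4, 3, 2) survives because it has the smallest delay.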
Buffering a Tree for Load Cap Constraints
• Greedy for a single line
  - Bottom-up traversal
  - Insert a buffer when load cap reaches limit
• Greedy for a fixed routing tree
  - Bottom-up traversal
  - For each edge, greedy insertion
  - For each node, buffer the branch with the largest cap
• NP-hard for simultaneous buffering and routing construction
[Figure: node with four branches of capacitance C1, C2, C3, C4, where C1 < U, C2 < U, C3 < U, C4 < U but C1 + C2 + C3 + C4 > U]
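The single-line greedy rule can be sketched as follows. The line is modeled as a list of wire-segment capacitances ordered from driver to sink, and a buffer is inserted whenever adding the next upstream segment would exceed the limit U. The model assumes buffer input cap plus any one segment fits within the limit; names and numbers are illustrative.

```python
def greedy_buffer_line(segment_caps, sink_cap, cap_limit, buffer_in_cap):
    """Greedy bottom-up buffering of a single line under a load-cap limit.

    Returns (buffer positions, load seen by the driver). Position i means a
    buffer placed between segment i and segment i + 1.
    """
    load = sink_cap
    buffers = []
    for i in range(len(segment_caps) - 1, -1, -1):  # walk from sink to driver
        if load + segment_caps[i] > cap_limit:
            buffers.append(i)      # buffer shields everything downstream
            load = buffer_in_cap
        load += segment_caps[i]
    return buffers[::-1], load


if __name__ == "__main__":
    print(greedy_buffer_line([3, 3, 3, 3], sink_cap=2, cap_limit=7, buffer_in_cap=1))
```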
Timing-Driven Routing Tree Construction
• Minimum wirelength (Steiner Minimum Tree)
  - Given a set of terminals S
  - Find an additional set of points A such that a spanning tree T over S ∪ A has minimum wirelength
• May not be timing optimum
  - Some sinks are more timing critical than others
  - Some sinks have larger capacitive load
  - Buffers?
[Figure: routing tree example with labels S and T]
Timing-Driven Routing Tree Construction
• Minimum wirelength (Steiner Minimum Tree)
• Shortest Path Tree
• AHHK Tree
  - Cost(q) = k * path_length(p) + edge_length(p, q)
  - k = 0 → minimum wirelength
  - k = 1 → shortest path
• Heuristics with sink timing criticality weights
[Figure: routing tree example with labels S and T]
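The AHHK cost function can be sketched as a tree-growing loop: at each step, attach the unconnected terminal q and tree node p that minimize k * path_length(p) + edge_length(p, q). The sketch assumes Manhattan distance between terminals and no Steiner points; the three-terminal example is made up to show the trade-off.

```python
def ahhk_tree(points, k, source=0):
    """Grow a spanning tree with the AHHK cost; returns (parent, path_len)."""
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    path_len = {source: 0}  # source-to-node length along the tree
    parent = {}
    while len(path_len) < len(points):
        _, p, q = min(
            (k * path_len[p] + dist(points[p], points[q]), p, q)
            for p in path_len
            for q in range(len(points))
            if q not in path_len
        )
        parent[q] = p
        path_len[q] = path_len[p] + dist(points[p], points[q])
    return parent, path_len


if __name__ == "__main__":
    pts = [(0, 0), (2, 2), (0, 3)]  # index 0 is the source
    print(ahhk_tree(pts, k=0))      # behaves like a minimum spanning tree
    print(ahhk_tree(pts, k=1))      # behaves like a shortest path tree
```

In the example, k = 0 reaches terminal 1 through terminal 2 (total wirelength 6, path length 6), while k = 1 connects it directly to the source (total wirelength 7, path length 4). Intermediate values of k trade wirelength against path length.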
Timing-Driven Routing Tree Construction
• Simultaneous routing tree construction and buffer insertion
  - Dynamic programming
  - Buffer station (legal buffer locations)
  - Routing blockage
  - P-Tree
  - Clustering (C-Tree)
  - Timing criticality
  - Geometric distance
  - Signal polarity
  - Try AHHK with different k
Timing-Driven Routing Tree Topology Optimization
• Chicken-egg dilemma (delay vs. routing)
• Iterative greedy improvement (Q-Tree)
• Delta Elmore delay
[Figure: routing tree with labels S and T and a buffer location]