Timing Closure Today

advertisement
Timing Closure Today
Lou Scheffer
Cadence
San Jose, CA
Lou@cadence.com
Hangzhou, April 2002
Lou Scheffer
1
Timing Closure Today
Design Entry
Synthesis
Timing
Place
Timing
•Timing more accurate as flow progresses
•Sometimes an earlier stage thinks timing is
OK, but it fails a later stage
•Need to repeat one or more steps with
tighter constraints
• We have a timing closure problem when
this process fails. Symptoms include:
•Non-convergence
Route
•Too many iterations
Timing
•Solution achievable, but this flow
cannot find it.
Hangzhou, April 2002
Lou Scheffer
2
The Timing Closure Problem
Performance of Circuit
Test 7
Frequency (Target !00MHz)
100
100
99
99
96
95
90
pks
regular
85
83
80
78
75
PKS/WLM
P&R
IPO
P&R
Stage
Hangzhou
Lou Scheffer
II-3
Examples of Problems
Worst slack / # misses
Synthesis
Placed
Cycle
time
C1
-1 / 2000
-12 / 38k
7.5 ns
.25 µm
V1
0/0
-12 / 15k
7.5 ns
.18 µm
T1
-0.5 / 2000
-48 / 164k
P1
-0.4 / 100
-97 / 43k
8 ns
.25 µm
V2
-0.5 / 500
-11 / 2000
7.5 ns
.18 µm
Design
Hangzhou, April 2002
Lou Scheffer
Tech
2.5-10 ns .18 µm
4
Agenda
Timing Analysis Overview
 Traditional design flows
 Summary of DSM Problems
 Correction Methods Overview
 Hierarchy and Timing Closure
 Block Level Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-5
Timing Analysis
Give accurate time values on each pin/port
of the network
 Has to deal with design changes in
optimization toolbox
 Static Timing Analysis

 Simulation
far too slow in optimization
environment
 Accuracy is more than enough
Hangzhou
Lou Scheffer
II-6
Timing Analysis Requirements

Choose combination of timing analyzer and delay
calculator which are appropriate for level of
design
 give
the best accuracy
 for performance that can be tolerated

Timing Analysis / Delay calculation must be able
to cope with logic design changes
 Incremental
 Highest
performance possible
 Non-linear delay models
Hangzhou
Lou Scheffer
II-7
Timing Analysis Requirements

Must handle…
 Difference
between rising and falling delays
 Delay dependent on slew rate
 Slew and delay dependent on output load
 Non-linear delay equations
Hangzhou
Lou Scheffer
II-8
Late Mode Analysis Definitions
d ax



ATa
a
ATb
b
y
RATx
x
c
Constraints: assertions at the boundaries
– Arrival times: ATa, ATb
– Required arrival time: RATx
Delay from a to x is the longest time it takes to
propagate a signal from a to x
Slack is required arrival time - arrival time.
Hangzhou
Lou Scheffer
II-9
Example
SL y  1  2  1
SLa  0  0  0
ATa  0
a
ATb  1
b
SLb  0  1  1
AT y  2
RATx  2
y
1
1
c
ATc  0
SLc  1  0  1
Hangzhou
Lou Scheffer
x
ATx  3
SLx  2  3  1
II-10
Early mode analysis


Definitions change as follows
– longest becomes shortest
– slack = arrival – required
Not as important since early violations are easier to fix
SL y  1  1  0
SLa  0  0  0
ATa  0
a
ATb  1
b
RATx  2
y
1
SLb  1  0  1
Hangzhou
AT y  1
1
c
ATc  0
SLc  0  1  1
Lou Scheffer
x
ATx  1
SLx  1  2  1
II-11
Delay modeling
a
b
dax
dbx
x
d
tcl _ d
o
cl
Propagation Arcs
dcl _ o
Timing Model
Hangzhou
Test Arc
Lou Scheffer
II-12
Agenda
Timing Analysis Overview
 Traditional design flows
 Summary of DSM Problems
 Timing Correction Overview
 Approaches to Fixing Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-13
Traditional Design Flows
Design Entry
1.
Tech independent
optimization
Synthesis
2.
Tech mapping
3.
Rudimentary
timing correction
Timing
Place
Timing
Route
Timing
Hangzhou
Lou Scheffer
II-14
Logic Synthesis

Technology independent optimization
 General
goal: reduce connections, literals,
redundancies, area

Technology mapping
 Map

logic into technology library
Timing correction added next
 Find
and fix critical timing paths
 Fix electrical violations (load, slew)
Hangzhou
Lou Scheffer
II-15
Traditional Design Flows
Design Entry
1.
Tech independent
optimization
Synthesis
w/Timing
2.
Tech mapping
3.
Timing correction
Place w/Timing
Route
Timing
Integrate timing with
synthesis and placement
Hangzhou
Lou Scheffer
II-16
Traditional Design Flows
Design Entry
1.
Tech independent
optimization
Synthesis/Place
ment w/Timing
2.
Tech mapping
3.
Placement
4.
Timing Correction
Global Route
Detailed Route
Timing
Integrate timing with
synthesis and placement
Hangzhou
Lou Scheffer
II-17
Traditional Design Flows
Design Entry
Synthesis and
Placement
w/Timing and
Global route
1.
Tech independent
optimization
2.
Tech mapping
3.
Placement
4.
Timing Correction
5.
Global route
Detailed Route
Timing
Integrate timing with
synthesis, placement and
global route
Hangzhou
Lou Scheffer
II-18
Agenda
Timing Analysis Overview
 Traditional design flows
 Summary of DSM Problems
 Correction Methods Overview
 Hierarchy and Timing Closure
 Block Level Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-19
The Wall
Logic designers concentrate on logic and
timing (as understood by synthesis)
 Design work done in abstract world

 Was
gates and wire load models
 Now may include placement and global route
Throw design over the wall when complete
 Physical designers concentrate on layout
and ability to route
 Effective method for many years

Hangzhou
Lou Scheffer
II-20
General CMOS Problems

Low drive strengths / low power
 Capacitance
(not intrinsic delay) plays a large
role in performance
 Huge variability – range between slowest
possible and fastest possible

Noise affects delay
 IR
drop a big percentage of supply
 Crosstalk can change delay by a factor of 2
Hangzhou
Lou Scheffer
II-21
Additional DSM Problems
High density / huge designs
 Very thin and resistive wires
 Very high frequencies

 Inductance

becomes more important
Smaller voltages
 IR
drop a bigger fraction of signal swing
Clock skew and latency
 Electromigration and noise

Hangzhou
Lou Scheffer
II-22
Clock Distribution Problems
Most common design approach requires
close to zero skew
 CMOS / DSM problems all affect clocks
 Distribution problem increasing

 Number
of latches/flip-flops growing
significantly

Power consumed in clock tree significant
 I
Hangzhou
and noise also of concern
Lou Scheffer
II-23
Process Designers are trying to help
Many metal layers
 Different metal pitches

 Small
pitch for local interconnect
 Big pitch/thick metal for long, fast wires
Copper wires, thick metal to lower R
 SOI – Silicon On Insulator
 Low k dielectrics
 These help but are not enough

Hangzhou
Lou Scheffer
II-24
Agenda
Timing Analysis Overview
 Traditional design flows
 Summary of DSM Problems
 Correction Methods Overview
 Hierarchy and Timing Closure
 Block Level Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-25
Timing Correction

Fix electrical violations (slew and load).
Takes priority since needed for reliability.
 Resize
cells
 Buffer nets
 Copy (clone) cells

Fix timing problems
 Local
transforms (bag of tricks)
 Path-based transforms
Hangzhou
Lou Scheffer
II-26
Local Transforms







Resize cells
Buffer or clone to reduce load on critical nets
Decompose large cells
Swap connections on commutative pins or among
equivalent nets
Move critical signals forward
Pad early paths
Area recovery
Hangzhou
Lou Scheffer
II-27
Transform Example
…..
Double Inverter
Delay = 4
Removal
…..
…..
Delay = 2
Hangzhou
Lou Scheffer
II-28
Resizing
?
b
0.2
e
0.2
f
0.3
d
a
d
0.05
0.04
0.03
0.02
0.01
0
0
a
0.2
A
b
0.8
0.6
0.4
1
load
0.035
A
B
C
a
C
b
0.026
Hangzhou
Lou Scheffer
II-29
d
Cloning
0.05
0.04
0.03
0.02
0.01
0
0
0.2
0.4
0.6
0.8
1
load
A
a
?
b
d
0.2
e
0.2
f
0.2
g
h
0.2
0.2
B
C
d
A
e
f
a
B
b
g
h
Can also isolate critical sinks
Hangzhou
Lou Scheffer
II-30
d
Buffering
0.05
0.04
0.03
0.02
0.01
0
0
0.2
0.4
0.6
0.8
1
load
A
a
?
b
d
0.2
e
0.2
f
0.2
g
h
Hangzhou
B
C
d
0.2
e
0.2
a
B
b
0.2
0.2
0.1
B
f
0.2
g
0.2
0.2
h
Lou Scheffer
II-31
Redesign Fan-in Tree
Arr(a)=4
Arr(b)=3
a
b
1
e
1
Arr(c)=1
Arr(d)=0
c
Arr(e)=6
1
d
a
b
c
d
Hangzhou
1
e
1
Arr(e)=5
1
Lou Scheffer
II-32
Redesign Fan-out Tree
3
3
1
1
1
1
1
1
1
1
2
1
Longest Path = 5
Hangzhou
1
Longest Path = 4
Slowdown of buffer due to load
Lou Scheffer
II-33
Decomposition
Hangzhou
Lou Scheffer
II-34
Swap Commutative Pins
1
0
a
1
1
2
b
5
1
c
2
Simple Sorting on arrival times and delay works
1
2
3
c
1
1
0
b
1
a
2
Hangzhou
Lou Scheffer
II-35
Move Critical Signals Forward
a
b
c
d
e

a
b

e
d
Based on ATPG
– linear in circuit size
– Detects redundancies
efficiently
Efficiently find wires to
be added and remove.
– Based on mandatory
assignments.
c
Hangzhou
Lou Scheffer
II-36
Path-based Transforms
Path-based resizing
 Unmap / remap a path or cone
 Slack stealing
 Retiming

Hangzhou
Lou Scheffer
II-37
Slack Stealing

Take advantage of timing behavior of level sensitive registers
(latches)
C1
C2
0
Slack = -1
C1
1
2
Slack = +1
C2
C1
Slack = 0
C2
Hangzhou
Lou Scheffer
II-38
Retiming
Backward
Delay=3
Forward
A more aggressive optimization
Delay=2
since it changes the function
Hangzhou
Lou Scheffer
II-39
Solutions to Timing Closure
Carry hierarchical logic design into physical
 Hand / Custom design
 Improved analysis
 More sophisticated clock design
 Modify existing flows
 More physically knowledgeable tools

 Many
variations: combined synthesis/place/route,
gain based synthesis, etc.
Hangzhou
Lou Scheffer
II-40
Agenda
Analysis Methods Overview
 Traditional design flows
 Summary of DSM Problems
 Correction Methods Overview
 Hierarchy and Timing Closure
 Block Level Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-41
Hierarchy and Physical Design
Logical hierarchy can be carried over into
physical design
 Seems natural top-down approach, using
floorplanning as a firm guide to physical
design
 Use of hierarchy offers many advantages
and many possible problems
 A new generation of tools for this problem

Hangzhou
Lou Scheffer
II-42
Pin Assignment and Timing Budgeting
Each block requires:
 Content definition
Block 1
L
Block 2
L
 Partitioning
 Pin
locations
 Clock/timing definition
 Set_input_delay
L
 Set_output_delay
 Set_drive
Block 3
 Set_load
 Path
exceptions
(false, multicycle paths)
Hangzhou
Lou Scheffer
II-43
Hierarchy and Physical Design
Advantages…
Run time of P&R tools
 Blocks can be built independently
 Early (and valuable) knowledge of global wires
 Limited wire delay within macro may allows
simpler methodologies
 Contains the problem size
 Extends naturally to SOC and mixed A/D chips
 May be the only real method available

Hangzhou
Lou Scheffer
II-44
Physical Hierarchy Disadvantages
Possible to overconstrain the design in many
ways (see next slide)
 Hierarchy usually logic-based, not
physically-based

 Designed
for logical correctness, not physical
implementation
Hangzhou
Lou Scheffer
II-45
Physical Hierarchy Overconstraints

Placement solution perhaps overconstrained
 Logical

Ability to find a routable solution hindered
 Can’t

gates may not fit naturally in a rectangle
detour through neighboring cell
Boundary conditions explode and must be
managed carefully to avoid surprises
 A recent
IBM design had 17,000 top level
connections. A bad timing constraint on any one
can make the whole design infeasible
Hangzhou
Lou Scheffer
II-46
Hierarchy Example Plots
Hangzhou
Lou Scheffer
II-47
Hierarchy Example Plots
Hangzhou
Lou Scheffer
II-48
Hierarchy Example Plots
Hangzhou
Lou Scheffer
II-49
The Challenges
How to derive sensible partitioning?
 How to achieve die utilization similar to
“flat” approach?
 How to achieve clock speed and skews
similar to “flat” approach?
 How to automatically generate optimal pin
assignments for each module?
 How to automatically come up with realistic
timing budgets for each module?

Hangzhou
Lou Scheffer
II-50
Basic Approach to solution
Example tool – First Encounter
 Start with a Silicon Virtual Prototype

•A near final quality ‘flat’ placement
RTL / gates
•Near legal routing
•Known feasible solution for timing and
routability
•Use this solution to guide the final
implemention
Silicon Virtual Prototype
hierarchical partitioning and placement
Top Level
Block Level
Top level buffering,
clock balancing,
and power grid
Physical synthesis
/ placement
and routing
•Partitioning, pin assignment,
timing constraints
•Build the blocks with more detailed
tools.
Hangzhou
Chip assembly routing
GDSII
Lou Scheffer
II-51
Basic Approach continued
Logic Design: RTL, gates, IP, “black box”
IP
a=b+c
const.
netlist
Physical Prototype
Power
IPO
Clock Tree
STA
Delay Calc
RC Extract
Trial Route
Scan Opt.
(proves timing and
routability)
Silicon Virtual Prototype
Placement
Complete “flat” physical
design
Physical
Data
.lib
Accurate timing
and routability
data in hours
instead of days or
weeks
VERY FAST
Full-Chip Physical Prototype
Hand off a floorplan OR full placement
Block-Level Physical Synthesis and/or Route
Hangzhou
Lou Scheffer
Confidence the
design will work
once the blocks
are re-assembled
into the complete
IC
II-52
In-Context Hierarchical Partitioning




Partitioning
Hangzhou
Pin assignment
Timing budgeting
Clock tree generation
Power grid planning
Independent block-level
implementation
Lou Scheffer
SoC assembly
II-53
In-Context Pin Assignment
Accurate Physical Prototype
Flat Full-Chip

Top Level Partition View
Full-chip prototype results in optimal pin placement
 Results
in narrower channels and reduced die size
 Reduces the routing congestion
 Improves the chip timing
Hangzhou
Lou Scheffer
II-54
In Context Timing Budgeting
Block 1
L
Block 2
L
L
Block 3
Each block requires:
 Clock definition
 Set_input_delay
 Set_output_delay
 Set_drive
 Set_load
 Path exceptions
(false, multicycle paths)
Accurate timing budgets result in predictable
timing convergence
Hangzhou
Lou Scheffer
II-55
Agenda
Analysis Methods Overview
 Traditional design flows
 Summary of DSM Problems
 Correction Methods Overview
 Hierarchy and Timing Closure
 Block Level Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-56
Blocks have timing closure
problems, too

Didn’t the big flat placement guarantee
blocks are feasible? No, because
 Block
may not have been defined when global
constraints were set
 Global placer does not deal with all DSM effects
 Block may be too hard for the relatively simple
global placer (which must be very fast)
 Requirements change as project progresses
 Process technology may have changed
 …..
Hangzhou
Lou Scheffer
II-57
Hand/Custom Design

Mentioned for completeness
 Hurts
productivity
 Yields highest performance

Can only fix a few things – for example:
 Can
realistically fix timing or crosstalk
problems on a few nets
 Cannot realistically change the size of blocks
Hangzhou
Lou Scheffer
II-58
Improved Analysis Helps


Plot shows slack by net for two designs
A 10% timing delta -> many more bad nets

Often the difference between success and failure
3500
Series1
Series2
3000
Number of Nets
2500
2000
1500
1000
500
0
-5
0
5
10
15
20
Slack Relative to Worst Net (ns)
Hangzhou
Lou Scheffer
II-59
More accurate analysis

Crosstalk induced delay
approach – overestimate coupling C
 Better – compute nominal timing + xtalk delta
 Old

Customer example from CadMos
 Ignore
 Not
crosstalk completely
400 MHz
an acceptable alternative
 Coupling
Caps overestimated by 60% 300 MHz
 Nominal delays + computed crosstalk 333 MHz
 More accurate analysis gains 10% margin
Hangzhou
Lou Scheffer
II-60
Increased accuracy helps

Global/detailed route correlation
 Any
global route better than Wire Load Models
or Steiner trees, since global routes consider
congestion
 But to get that last 10%, need global/detailed
router link
 Knowing
some nets must detour is good, but….
 Which net takes which detour is needed for good
correlation
Hangzhou
Lou Scheffer
II-61
Modified clock design
Zero skew is not necessary, and often not
even desirable
 We have the freedom to adjust clock arrival
times at memory elements

 This
obtains more margin and thus helps
convergence
Similar to retiming but less disruptive
 Improvement very design dependent

 If

worst path is flip-flop to itself, doesn’t help
May impact scan chains
Hangzhou
Lou Scheffer
II-62
Previous attempts to fix block closure

Without the radical step of combining synthesis
and placement, designers have tried:
 Allow
placer to do sizing and buffering
 Do post placement optimization
Simple transformations
 Use existing placement

 Do
post placement re-synthesis
Complex transformations allowed
 Needs incremental placement and extraction



But these have not been fully successful
Why? Re-examine the root cause of discrepancies
 Wire
load models and their limitations
 Combined Synthesis/Placement/Routing
Hangzhou
Lou Scheffer
II-63
Post-Placement Optimization
Design Entry
Synthesis
w/Timing
Place
Re-run Synthesis
w/Timing
1.
In-place optimizations
2.
Minimally disturb
placement optimizations
Route
Timing
Hangzhou
Lou Scheffer
II-64
Post-Placement Optimization

In-place (little or no placement impact)
 Resizing
(carefully)
 Pin swapping, some tree rebuilding
 Wire sizing / typing

Minimally disruptive
 Resizing
 Buffering
 Cloning
 Tree
rebuilding
 Cell removal
Hangzhou
Lou Scheffer
II-65
In-place Optimization
Not too difficult
 Can use extracted electrical data (C, RC)
from placement tool

 Some
changes affect pin locations, but may be
ignored
 Tree rebuilding needs incremental extraction

Can use timing reports for timing data
 But,
accuracy suffers as changes are made
 Real RC data replaced by estimates again
Hangzhou
Lou Scheffer
II-66
In-place Optimization
Resize
swap pins
rebuild trees
Placed
netlist
Placement &
extraction
Optimization
C/RC
data
Opt’d
netlist
Hangzhou
Lou Scheffer
II-67
Place-disruptive Optimization

Nets changing implies…
 Must
be able to recompute C and RC
 May need to incrementally place new cells
 Need incremental timing capability
Hangzhou
Lou Scheffer
II-68
Place-disruptive Optimization
Resize
buffer
clone
cell removal
rebuild trees
Placed
netlist
Optimization
with placer,
timer, extractor
Placement &
extraction
C/RC
data
Opt’d
netlist
Hangzhou
Lou Scheffer
II-69
What are the problems?

Getting the timing right
 Different
timers used at different stages
 Do the optimizer and placer see the same worst
paths as the static timer?

Design size / tool capacity
 Using
Hangzhou
synthesis technology on flat designs
Lou Scheffer
II-70
More problems

Incompatible tools, formats
 Placer,
synthesizer, timer may all use different
file format, may all be different vendors
 Basic interoperability issues

Incremental placer needed for new cells
 Doesn’t
have to be smart
 But might produce some infeasible solutions
 Must be integrated with optimizer
Hangzhou
Lou Scheffer
II-71
Still more challenges/problems
Extraction/Estimation of net data
 Any optimization which significantly alters
net topology needs this ability

 Insert
cells
 Remove cells
 Move connections from one cell to another
Steiner tree estimation
 Net C and delay (RC) calculator
 Do results match detail router and other
extraction tools?

Hangzhou
Lou Scheffer
II-72
Sample Optimization Results
Worst slack / # misses
Synthesized
Placed
Opt
Cycle
time
C1
-1 / 2000
-12 / 38k
-2 / 1400
7.5 ns
.25 µm
V1
0/0
-12 / 15k
-0.3 / 100
7.5 ns
.18 µm
T1
-0.5 / 2000
-48 / 164k
-6 / 62k
2.5-10 ns
.18 µm
P1
-0.4 / 100
-97 / 43k
-13 / 20k
8 ns
.25 µm
V2
-0.5 / 500
-11 / 2000
-4 / 1000
7.5 ns
.18 µm
Design
Hangzhou
Lou Scheffer
Tech
II-73
Root Problem is Wire Load Models
Main problem: correlation between PreP&R estimates and Post-P&R extraction
 If correlation is good…

 Problems

detected and potentially fixed early
If correlation is bad…
 Problems
detected late
 Not a good situation! Need to re-write RTL is
worst case for timing closure.
Hangzhou
Lou Scheffer
II-74
Why are Wire Load Models Used?
Can’t complete layout until logic design is
complete
 Can’t complete logic design without timing
 Can’t time without load and net delay data
 Can’t extract load and net delay data until
layout is complete
 Can’t complete layout …

Hangzhou
Lou Scheffer
II-75
WLM solution – use statistics
Don’t know specific layout data
 But we know something about statistical
properties
 Average net load, average net delay
 Further refine using other characteristics

 Number
of sinks
 Size of design (number of circuits)
 Physical size
Hangzhou
Lou Scheffer
II-76
Correlation Pre/Post-P&R
using averages
Wire load models give synthesis an estimate
of physical design
 We can correlate averages pre- and postP&R as accurately as needed
 If specific design has average behavior, its
timing, on average, can be predicted
 Otherwise, a pass through placement can
provide correct WLM for a design, and get
the averages right

Hangzhou
Lou Scheffer
II-77
Timing and averages
WLMs OK for area, power (properties that
are sums are well handled by statistics)
 But, timing dictated by the worst specific
path
 That path is built of individual nets
 One net can determine the speed of an
entire design
 Reality: poor correlation for relatively few
nets can cause major headaches

Hangzhou
Lou Scheffer
II-78
Correlation Pre/Post-P&R
Averages and Wire Loads
10
0
11
0
90
70
60
mean
50
40
30
20
median
80
30000
25000
20000
15000
10000
5000
0
0
10
Number of nets
Distribution of C / fan-out
pF per fan-out
Note the very long tails of this distribution
Hangzhou
Lou Scheffer
II-79
Correlation Pre/Post-P&R
Cwire Data by Logic Design
Cwire
Number of fan-outs
Hangzhou
Lou Scheffer
II-80
Better Wire Load Models





How can we use information from one pass
through physical design?
Adjust wire load model coefficients
Back annotate specific net load and delay data to
the logic design
New problem: correlation of logic pre- and postsynthesis
But, there are fundamental limits to statistical
models – a new approach is needed.
Hangzhou
Lou Scheffer
II-81
A better (but harder) approach:
Combine Synthesis, P & R
Don’t use wire load models at all
 Synthesis does a trial placement as it runs

 Loading

found from estimated routes
For best results, must include global routing
 Then,
feed global route to detailed router
 Or, do detailed route itself
Much better correlation and timing closure
 No inter-tool data transfer headaches

Hangzhou
Lou Scheffer
II-82
Example of Combined SP&R
Video Graphics Engine
160k instances
 70 macros (blocks)
 5 layers, 0.18 micron
 Target freq: 100Mhz

Hangzhou
Lou Scheffer
II-83
Conventional Flow
Func. & Timing
.lib
More than 20 Iterations
 89MHz best result
w/manual changes

Synthesis
Static Timing
PT
syn2GCF
SE
Floorplan
DEF
Func. & Timing
.TLF
Physical
LEF
Placement base
optimization
Global route
Detail route
Extraction
Delay calc
Hangzhou
DC
Lou Scheffer
DRC
Pearl
II-84
Combined SP&R Flow




100MHz final result, met timing
Correlation within + - 2.1%
One pass
12hrs 20min runtime
Static Timing
PT
write_constraints
TCL Constraints
Floorplan
DEF
Func. & Timing
.TLF
Physical
LEF
EDIF
netlist
SE-PKS
PKS Optimization
Global Route
Static Timing
Detail route
Extraction
Delay calc
Hangzhou
Lou Scheffer
DRC
HE
Pearl
II-85
Slack Correlation
PKS
Routed
Wire Load Based
Hangzhou
Lou Scheffer
II-86
Enlargement of SP&R slack
Hangzhou
Lou Scheffer
II-87
Results from combined SP&R
Case
1
2
3
4
size
instances (k)
350
250
50
160
Hangzhou
macros
56
50
4
70
PKS timing
error (%)
+ - 3%
+ - 3%
+ - 0.96%
+ - 2.1%
Lou Scheffer
max freq (MHz)
conventional
SP&R
140
140
97
100
93
95
89
100
II-88
Agenda
Traditional design flows
 Summary of DSM Problems
 Analysis Methods Overview
 Correction Methods Overview
 Approaches to Fixing Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-89
Experimental Results

For Hierarchical design, two objectives
 Should
be faster and higher capacity than a fully
detailed flat design, to find problems earlier
 Resulting partitions should be realizable

For Block Design, compare different strategies
 Can
overconstrain clock or wire models
 Can do IPO or not after placing
 Can allow placer to change size or not
 Can test combined synthesis/placement against
running the two tools separately
Hangzhou
Lou Scheffer
II-90
Hierarchical Experimental results
Design
Import
Detail
Place
Hangzhou
57x
33x
Detail
Route*
RC
Extract
Delay
Calculation
Lou Scheffer
6x
35 hr 40 min
5 hr 25 min
9 hr
1 hr 50 min
2 hr 15 min
56x
20 min
7 min
60x
6 min
1x
8 min
3 hr 20 min
2 hr 50 min
4 hr
4 min
(*) SPC Trial Route
3 hr 50 min

5 hr 45 min

Design 580K cells, 0.25um process, 5LM,
100MHz
Data collected on a 500MHz processor
workstation
First Encounter Flow
Resulting blocks were realizable
Traditional Flow
7 hr 30 min

5x
7x
Timing
Analysis
IPO
Design
Iteration
II-91
How do different block design
approaches compare?
Jay McDougal of Agilent ran many flows on
the same design
1. Overconstrain clock by various amounts
2. Accurate or conservative WLMs

3.
4.
5.
Tried many levels of conservatism
Allow placer to size or not
Do post placement optimization or not
Physically knowledgeable synthesis
Hangzhou
Lou Scheffer
II-92
Characteristics of sample design

Design not very difficult
 ColdFire
processor
 80K instances
 0.25 micron library
 5 layer process, not congestion dominated
 Design goal was 180 MHz, known to be
possible with this design
 85% of delay in gates; 15% in interconnect
 0.18/0.13
micron, bigger designs will show bigger
differences between techniques
Hangzhou
Lou Scheffer
II-93
Key to the plot of results


Basic flow – Design Compiler & QPlace
TDD = timing driven design
 In
addition to minimizing wire length and congestion,
placer is given timing constraints and allowed to
change gate sizes

IPO and PBO are post placement optimizers
– runs on synthesis DB with back annotation
 PBO – runs on physical DB with synthesis transforms
 IPO

PKS = Physically Knowledgeable Synthesis
(combined Synthesis/Place/Route)
Hangzhou
Lou Scheffer
II-94
Clock cycle achieved
Comparison of Approaches
9.5
9
8.5
8
7.5
7
6.5
6
5.5
5
0.95
Hangzhou
No custom WLM
90% WLM
3ns;50%WL
IPO 5ns NoWL
IPO 3ns NoWL
TDD/PBO 50%WL
TDD/PBO 90%WL
PKS
Required
Cycle time
1.05
1.15
Relative size
Lou Scheffer
1.25
II-95
Clock cycle achieved
Comparison of Approaches
9.5
9
8.5
8
7.5
7
6.5
6
5.5
5
0.95
Hangzhou
Good area, but iterates
between placement and
synthesis, worst TTM,
didn’t hit timing target
No WLM
90% WLM
3ns;50%WL
IPO 5ns NoWL
IPO 3ns NoWL
TDD/PBO 50%WL
TDD/PBO 90%WL
PKS
One tool, no iteration,
better TTM, hit timing
target
1.05
1.15
Relative size
Lou Scheffer
1.25
II-96
Agenda
Traditional design flows
 Summary of DSM Problems
 Analysis Methods Overview
 Correction Methods Overview
 Approaches to Fixing Timing Closure
 Experimental Results
 Summary

Hangzhou
Lou Scheffer
II-97
Good News

At least we understand the problem
 Analysis
of timing is well understood
 Transformations that help timing are well
understood
 DSM effects are painful but can be controlled
Hangzhou
Lou Scheffer
II-98
Bad News
Cycle time and technology advances
demand more and more sophisticated
optimization techniques
 In previous flows, corrections must be
applied in separate tools
 Disconnects among various tools involved
increases turn-around-time and limits
optimization

Hangzhou
Lou Scheffer
II-99
Good News
The Bad News is commonly recognized
 Many tool vendors, academics, in-house
EDA researchers are working to solve these
problems
 A new generation of tools is already
available that was designed from the ground
up to address timing closure

 Hierarchical
Hangzhou
and block design
Lou Scheffer
II-100
Bad News
These problems won’t be the last!
 Each process generation brings new
problems

 Increased
size
 Weird process rules (antenna)
 Possible new effects (single event upset)
Hangzhou
Lou Scheffer
II-101
Summary
Timing closure is a very real problem
 Large chips need hierarchical tools to help
with partitioning and budgeting
 Block tools must understand synthesis,
placement and routing

 wire
load models have serious limitations
Best approach is combined synthesis/P&R
 Experimental data backs this up

Hangzhou
Lou Scheffer
II-102
Acknowledgements

Tony Drumm wrote the original set of slides
for this lecture, including many of the
examples. He credits:
 Alex
Suess
 José Neves
 Bill Joyner
 IBM Rochester EDA folks

But the conclusions, and any mistakes, are
mine
Hangzhou
Lou Scheffer
II-103
Hangzhou
Lou Scheffer
II-104
Download