Development and Application of Tree Synthesis Algorithms

John Lillis
University of Illinois
Chicago
Overview
• Part I: Buffer tree synthesis
  – Formulations
  – S/P/SP-tree
• Part II: Fanin tree embedding/replication
  – Optimization across gate boundaries
  – Interaction with placement
Part I: Buffer Tree Synthesis
Premises of Work
• MAIN PREMISE: Powerful Buffer Tree Synthesis is a Core for Modern Design
• Conservation of Resources Crucial
  – Estimate: 700-800K Buffers/Chip in Near Future
• Cost-Performance Tradeoffs
  – General Cost Model
• Topology / Embedding / Buffering Spaces Should be Explored Simultaneously
  – 2-Phase Approach Not Robust / Predictable
  – Particularly Troublesome in Presence of Blockages
Max Slack Weakness
[Figure: slack vs. cost; max-slack solutions contain overoptimized subtrees]
Problem Formulation
• Given:
  – Location of Driver and Sinks
  – Technology Parameters
  – Timing Requirements
  – Buffer Library
  – Target Routing Graph (Blockages)
• Find:
  – Topology in corresponding space
  – its Embedding
  – and Buffer Assignment
  – Minimizing Cost
  – s.t. Timing Constraints
Philosophy of Constraint Imposition
• Goals:
  – Predictable Behavior
  – Absence of ad-hoc Heuristics
• Main Idea:
  – Optimally Solve Constrained Variant of the Problem
  – Well-Designed Constraints Produce Large Flexible Solution Space
  – Tractability
Constraints: Topology Space
[Figure: full topology space vs. constrained space]
Topology Embedding Flexibility
[Figure: alternative embeddings of trees over source s and sinks a, b, c]
Target Routing Graph Construction
[Figure: target routing graph over source s and sinks a, b, c, with a routing blockage and a buffer blockage]
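The slide's figure is lost, so here is only a minimal sketch of how such a target graph might be built, assuming a uniform grid (the grid model and the names `build_target_graph`, `routing_blocked`, `buffer_blocked` are hypothetical, not from the talk). Routing blockages delete vertices; buffer blockages keep vertices routable but forbid buffers there.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def build_target_graph(width: int, height: int,
                       routing_blocked: Set[Node],
                       buffer_blocked: Set[Node]):
    """Build a uniform grid target graph (an assumed model, not the author's).

    Vertices inside a routing blockage are dropped entirely; vertices inside
    a buffer blockage stay routable but may not host a buffer.
    """
    nodes = [(x, y) for x in range(width) for y in range(height)
             if (x, y) not in routing_blocked]
    node_set = set(nodes)
    adj: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for (x, y) in nodes:
        # 4-neighbor grid edges, skipping blocked or off-grid vertices
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in node_set:
                adj[(x, y)].append(nb)
    can_buffer: Dict[Node, bool] = {n: n not in buffer_blocked for n in nodes}
    return adj, can_buffer
```

For example, `build_target_graph(3, 3, {(1, 1)}, {(0, 1)})` yields 8 routable vertices, and vertex (0, 1) can carry wires but not a buffer.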
Algorithmic Description
• Timing-Driven Maze Routing
• Topology Embedding
  – S-Tree
  – P-Tree
  – SP-Tree
Core Subroutine: Timing-Driven Maze Routing
• Generalization of [Hur, et al.; TCAD Feb 2000]
  – Single Target, Multiple Sources
  – Finds non-dominated paths
  – Simultaneous Buffer Insertion
• Handling of Blockages in Topology Synthesis
[Figure: maze expansion from multiple sources toward a single target]
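A rough sketch of such a search follows. It is a label-correcting variant written for illustration, not the algorithm of Hur et al.: the Elmore wire model, unit-length edges, a single buffer type, and the names `td_maze_route`, `buf_R`, `buf_C`, `buf_d`, `buf_cost` are all assumptions. Each source seeds a label (cost, load, required time); every vertex keeps only non-dominated labels, and a buffer is tried wherever `can_buffer` allows.

```python
import heapq
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Label = Tuple[float, float, float]  # (cost, downstream capacitance, required time)


def dominates(a: Label, b: Label) -> bool:
    """True if a is no worse than b in cost, load, and required time."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]


def td_maze_route(adj: Dict[Node, List[Node]], can_buffer: Dict[Node, bool],
                  sources: Dict[Node, Tuple[float, float]], target: Node,
                  r: float = 1.0, c: float = 1.0, wire_cost: float = 1.0,
                  buf_R: float = 0.5, buf_C: float = 0.5, buf_d: float = 1.0,
                  buf_cost: float = 3.0) -> List[Label]:
    """Multi-source search keeping non-dominated labels at every vertex.

    sources maps a vertex to (load capacitance, required time) seen there.
    Labels move away from the sources (toward the driver side), so each
    unit wire adds its Elmore delay r * (c / 2 + downstream cap).
    """
    labels: Dict[Node, List[Label]] = {v: [] for v in adj}
    heap: List[Tuple[float, Node, Label]] = []

    def insert(v: Node, lab: Label) -> None:
        if any(dominates(old, lab) for old in labels[v]):
            return  # redundant: an existing label is at least as good
        labels[v] = [old for old in labels[v] if not dominates(lab, old)]
        labels[v].append(lab)
        heapq.heappush(heap, (lab[0], v, lab))

    for v, (cap, req) in sources.items():
        insert(v, (0.0, cap, req))

    while heap:  # expand cheapest labels first, Dijkstra-style on cost
        _, v, lab = heapq.heappop(heap)
        if lab not in labels[v]:
            continue  # pruned by a better label after being queued
        cost, cap, req = lab
        if can_buffer.get(v, False):
            # buffer at v drives the downstream load and presents buf_C upstream
            insert(v, (cost + buf_cost, buf_C, req - buf_d - buf_R * cap))
        for u in adj[v]:
            insert(u, (cost + wire_cost, cap + c, req - r * (c / 2 + cap)))
    return sorted(labels[target])
```

On a 3-vertex line, allowing a buffer at the middle vertex adds a second target solution with lower load but higher cost next to the unbuffered one, which is the kind of cost/performance tradeoff the slides emphasize.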
Topology Embedding
• Goal: Obtain timing-feasible embedding / buffering of given topology, minimizing cost
• Solution: Dynamic Programming (bottom-up)
Solution sets
• A(u,v) represents a set of solutions that correspond to
  – Vertex u in Topology
  – Vertex v in Target Graph
A1b = Join(A1.left, A1.right)
A1 = GenDijkstra(A1b)
[Figure: solution set A(u,v) for topology vertex u placed at target-graph vertex v]
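Assuming each A(u,v) is stored as a list of non-dominated (cost, downstream capacitance, required time) tuples (an Elmore-style signature chosen here for illustration; `prune` and `join` are hypothetical names), the Join step at vertex v might look like this. GenDijkstra (not shown) would then propagate the joined set through the target graph.

```python
from typing import List, Tuple

Sol = Tuple[float, float, float]  # (cost, downstream capacitance, required time)


def prune(sols: List[Sol]) -> List[Sol]:
    """Drop dominated solutions (lower cost/cap and higher required time win)."""
    kept: List[Sol] = []
    # Sorting puts every dominator of s at or before s.
    for s in sorted(sols, key=lambda s: (s[0], s[1], -s[2])):
        if not any(k[0] <= s[0] and k[1] <= s[1] and k[2] >= s[2] for k in kept):
            kept.append(s)
    return kept


def join(left: List[Sol], right: List[Sol]) -> List[Sol]:
    """Merge A(left child, v) and A(right child, v) at the same vertex v."""
    merged = [(cl + cr, capl + capr, min(ql, qr))
              for (cl, capl, ql) in left
              for (cr, capr, qr) in right]
    return prune(merged)
```

Loads add and the earlier required time wins because both subtrees hang off the same branch point.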
S-Tree
• Notion of localities:
  – Spatial
  – Temporal
  – Polarity
• Partition sinks into 2 sets based on:
  – estimated timing criticality
  – signal polarity requirements
  – some other criteria...
• Subtrees can break topology and “stitch” at a different place
S-Tree Topology Space
Sink partition: {a,c,d}, {b}
[Figure: topologies over source s and sinks a, b, c, d under this partition]
S-Tree Recurrence
A1b = Join(A1.left, A1.right)
A1 = GenDijkstra(A1b)
A2b = Join(A2.left, A2.right)
A2 = GenDijkstra(A2b)
A12b = Join(A12.left, A12.right) + Join(A1, A2)
A12 = GenDijkstra(A12b)
S-Tree Topology Space
[Figure: initial topology over source s and sinks a, b, c, d, e, f, with alternative topologies in the S-Tree space]
Incorporating polarity
• 4 sets:
  – critical & positive signal polarity
  – critical & negative
  – non-critical & positive
  – non-critical & negative
• Other partitioning schemes...
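A minimal sketch of the 4-set partition, assuming criticality is judged by comparing an estimated slack against a threshold (the `Sink` fields, the threshold rule, and the group names are hypothetical):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sink:
    name: str
    slack_estimate: float  # hypothetical criticality estimate
    positive: bool         # required signal polarity


def partition_sinks(sinks: List[Sink],
                    critical_threshold: float) -> Dict[str, List[str]]:
    """Split sinks into the 4 criticality x polarity sets."""
    groups: Dict[str, List[str]] = {
        "critical+": [], "critical-": [], "noncritical+": [], "noncritical-": [],
    }
    for s in sinks:
        # assumed rule: estimated slack below the threshold means critical
        crit = "critical" if s.slack_estimate < critical_threshold else "noncritical"
        pol = "+" if s.positive else "-"
        groups[crit + pol].append(s.name)
    return groups
```

Other schemes only change how the group key is computed; the tree construction downstream sees a list of sink sets either way.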
P-Tree Topology Space
• All Permutation-Constrained Topologies
[Figure: topologies over source s and sinks a, b, c, d, e that respect the sink order a, b, c, d, e]
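Once the sink order is fixed, every subtree in this space covers a contiguous interval of the permutation, so the space can be walked with a dynamic program over intervals. The sketch below (hypothetical name `count_ptree_topologies`) only counts binary topologies; an optimizing version would carry a solution set per interval instead of a count.

```python
from functools import lru_cache


def count_ptree_topologies(n: int) -> int:
    """Count binary topologies over n sinks (n >= 1) in a fixed order where
    every subtree covers a contiguous interval of that order."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if i == j:
            return 1  # a single sink is a leaf
        # the root splits [i, j] into [i, k] and [k + 1, j]
        return sum(count(i, k) * count(k + 1, j) for k in range(i, j))

    return count(0, n - 1)
```

The counts are Catalan numbers (14 topologies for the 5 sinks a to e), so the space grows exponentially, yet there are only n(n+1)/2 intervals to solve.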
Limitations of P-Tree Space
• Isolation of Critical / Non-Critical Subtrees: “Temporal-Locality”
• Min WL May Not Produce Min Cost
[Figure: two trees rooted at the driver, each with critical and non-critical subtrees]
SP-Tree
• Combine everything said so far...
• From P-Tree
  – Spatial locality
  – Robustness
• From S-Tree
  – Temporal locality
  – Polarity locality
  – Ability to fix “topology problems” by “stitching”
Solution Space
[Figure: solution spaces of fixed topology, S-Tree, P-Tree, and SP-Tree within the entire space]
Experiments
• Randomly generated nets
  – Non-uniform required arrival time
  – Non-uniform sink input capacitance
  – Buffer-biased cost
• Interested in:
  – Min cost feasible solution
  – Max slack solution for verification
  – Runtime
• More details in the paper...
Algorithms for Experiments
• S-Tree
• P-Tree
• SP-Tree
• RMP [Cong, Yuan; DAC 2000]
• RMP-Quick [Cong, Yuan; DAC 2000]
Results: Net2-06
[Figure: RMP, RMP-Qck, S-Tree, P-Tree, and SP-Tree compared on the min cost feasible solution (wire, buffers, cost), the max slack solution (slack, wire, buffers, cost), and runtime; y-axis: # buffers]
Results: Net2-08
[Figure: same comparison as for Net2-06]
Results: Net2-12
[Figure: same comparison as for Net2-06]
SP-Tree vs. P-Tree
Conclusions
• Key Concepts:
  – General Cost Models: Routing Congestion, Buffer Congestion
  – Orthogonal Separation of Spatial and Temporal Locality
  – Polarity Requirements
  – Routing and Buffer Blockages
• Targets:
  – Small-to-Medium Sized Signal Nets
• Results Summary:
  – Highly Cost-Efficient, High Performance Solutions
  – Substantially Outperforms Prior Approaches in Solution Quality and Runtime
Part II: Fanin Tree Embedding/Replication
Replication Overview
• Hrkic, Lillis, Beraudo (DAC04, IWLS04)
• Concept: Netlist structure limits the potential of timing-driven placement
• Difficult for top-down synthesis to fix
• Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC04) touches on placement, synthesis (netlist perturbation) and routing.
Logic Replication
• Duplicate logic cell
• Preserve functionality
• Improve timing
  – Place / Move cells
  – Adjust connections
[Figure: cells A, B, C, D, E before and after replicating C as CR]
Early Work
• Use replication to straighten I/O paths
• Local monotonicity [Beraudo, Lillis, DAC 2003]
  – Sequence of 3 cells on the path
• Incremental framework
[Figure: cells A, B, C, D, E before and after replicating C as CR]
Limitations of Local Monotonicity
• Local Monotonicity satisfied
• Still many non-monotone paths
[Figure: path through cells A, B, C, D, E, F]
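To make local vs. global monotonicity concrete, here is a sketch assuming cells sit at grid points and that a path is monotone when its Manhattan length equals the distance between its endpoints (our reading of the term; all names are hypothetical). A 3-cell window is checked locally, the whole path globally.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def non_monotone_triples(path: List[Point]) -> List[int]:
    """Indices of middle cells whose 3-cell window (i-1, i, i+1) detours."""
    bad = []
    for i in range(1, len(path) - 1):
        a, b, c = path[i - 1], path[i], path[i + 1]
        if manhattan(a, b) + manhattan(b, c) > manhattan(a, c):
            bad.append(i)
    return bad


def is_monotone(path: List[Point]) -> bool:
    """Whole path is monotone if its length equals the end-to-end distance."""
    length = sum(manhattan(p, q) for p, q in zip(path, path[1:]))
    return length == manhattan(path[0], path[-1])
```

A U-shaped path such as (0,0), (2,0), (2,2), (0,2) passes every 3-cell check yet doubles back overall, which is the limitation stated on this slide.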
Replication Tree Approach [Hrkic et al. DAC04]
• Identify critical sink
• Extract critical fan-in tree (Replication Tree)
• Optimize fan-in tree (Fan-in Tree Embedding)
• Legalize placement
Slowest Paths Tree
• Focus on slowest paths
  – Find slowest paths tree from critical sink
  – Include paths within epsilon of current critical delay
  – Focus on most critical portions of fan-in cone
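A sketch of extracting the epsilon-slowest paths, assuming a combinational DAG given as fan-in lists with fixed edge delays and known arrival times at primary inputs (all names hypothetical). An edge (u, v) is kept when the slowest path through it still arrives within `eps` of the critical delay. Because of reconvergence the result is generally a DAG rather than a tree, which the replication tree step then addresses.

```python
from functools import lru_cache
from typing import Dict, List, Set, Tuple

FanIn = Dict[str, List[Tuple[str, float]]]  # gate -> [(driver, edge delay)]


def slowest_paths_edges(fanin: FanIn, pi_arrival: Dict[str, float],
                        sink: str, eps: float) -> Set[Tuple[str, str]]:
    """Edges of the sink's fan-in cone on a path within eps of the critical delay."""

    @lru_cache(maxsize=None)
    def arr(v: str) -> float:
        # latest arrival time at v (primary inputs have no fan-in)
        if not fanin.get(v):
            return pi_arrival.get(v, 0.0)
        return max(arr(u) + d for u, d in fanin[v])

    # Collect the fan-in cone of the sink and its fanout edges inside the cone.
    cone, stack = {sink}, [sink]
    fanout: Dict[str, List[Tuple[str, float]]] = {}
    while stack:
        v = stack.pop()
        for u, d in fanin.get(v, []):
            fanout.setdefault(u, []).append((v, d))
            if u not in cone:
                cone.add(u)
                stack.append(u)

    @lru_cache(maxsize=None)
    def tail(v: str) -> float:
        # longest delay from v to the sink through the cone
        if v == sink:
            return 0.0
        return max(d + tail(w) for w, d in fanout[v])

    critical = arr(sink)
    return {(u, v) for v in cone for u, d in fanin.get(v, [])
            if arr(u) + d + tail(v) >= critical - eps}
```

Raising `eps` enlarges the extracted part of the fan-in cone, matching the later "dynamically modify value of epsilon" enhancement.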
Replication Tree
• Most circuits do not contain large fan-in trees due to reconvergence
• Given a critical tree, temporarily replicate the entire tree
• Assign connections:
  – if (u,v) is a tree edge, connect uR to vR
  – else connect u to vR
[Figure: netlist with cells A, B, C, D, E, F and replicas AR, BR, CR, DR, FR]
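The connection rule can be written down directly. In this sketch the replica of cell `u` is named `u + "R"` and the netlist is a list of (driver, load) edges; both conventions and the function name are ours, not the author's.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (driver, load)


def replica_connections(netlist_edges: List[Edge], tree_edges: Set[Edge],
                        tree_nodes: Set[str]) -> List[Edge]:
    """Input connections of the temporary replicas (replica of u is u + "R").

    Rule from the slide: if (u, v) is a tree edge, connect uR to vR;
    otherwise connect the original u to vR.
    """
    conns: List[Edge] = []
    for u, v in netlist_edges:
        if v not in tree_nodes:
            continue  # v is not replicated, so its inputs are unchanged
        if (u, v) in tree_edges:
            conns.append((u + "R", v + "R"))
        else:
            conns.append((u, v + "R"))
    return conns
```

Off-tree inputs keep feeding from the original cells, so the replica tree is a true copy of the critical fan-in tree without duplicating anything outside it.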
Placement cost
• Replication is temporary
• Placement cost is crucial
• Cost discount for placing a cell over its logical equivalent
  – low cost for placing DR over D
  – if DR lands on D, actual replication will never occur
  – multiple low-cost locations possible
[Figure: replicas AR, BR, CR, DR, FR placed relative to original cells A, B, C, D, E, F]
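A toy version of the discounted placement cost. The cost values, the `occupant` map, and the name `placement_cost` are hypothetical; the slides list a more sophisticated placement cost as ongoing work.

```python
from typing import Dict, Tuple

Slot = Tuple[int, int]


def placement_cost(original: str, slot: Slot, occupant: Dict[Slot, str],
                   base_cost: float = 10.0, discount: float = 9.0) -> float:
    """Hypothetical cost of placing the replica of `original` at `slot`.

    `occupant` maps a slot to the original cell placed there. Landing on the
    cell's own logical equivalent is discounted: replica and original would
    then be unified, so no real replication happens.
    """
    if occupant.get(slot) == original:
        return base_cost - discount
    return base_cost
```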
Fan-in Tree Embedding
• Given:
  – Fan-in tree
  – Placement of sink and inputs
  – Arrival times at inputs
  – Placement and routing graph
• Find:
  – Placement of internal tree nodes (Gates)
  – Minimizing Cost
  – s.t. Timing Constraints
  – cost / delay tradeoff
Fan-in Tree Embedding Example
[Figure: two embeddings of gates A, B, C driving the sink: higher delay, lower cost vs. lower delay, higher cost]
Fan-out and Fan-in Tree
[Figure: fan-out tree (source; A, B, C) and fan-in tree (A, B, C; sink); labels: Bottom-up, Top-down]
Fan-in Tree Embedding
• Adaptation of S-Tree algorithm [Hrkic, Lillis, DAC 2002]
• Keep:
  – Graph Model for Embedding Target
  – Modified Timing-Driven Maze Routing: multiple sources, multiple targets; at each vertex keep a list of non-dominated solutions [S. Hur, J. Lillis, IEEE TCAD 2000]
• Modify:
  – Top-down vs. Bottom-up
  – Solution signature (c,t): c - cost, t - signal arrival time
  – Gate placement cost p(x,y)
Fan-in Tree Embedding
• Non-binary tree: multiple gate inputs
• Top-Down Dynamic Programming
  – Maze Routing to populate solutions
  – deferred backtracking
  – Join solutions:
    $c = p_{x,y} + c_1 + \dots + c_n$
    $t = \max(t_1, \dots, t_n)$
• Bottom-Up solution extraction
  – backtrack to extract maze route
  – extract gate placement
[Figure: modified maze routing and Join steps]
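A sketch of the Join formula above for one candidate gate location, assuming each gate input already carries a list of non-dominated (c, t) solutions (names hypothetical). The slide's formula has no gate-delay term, so neither does the sketch.

```python
from itertools import product
from typing import List, Tuple

Sol = Tuple[float, float]  # (c, t): cost and signal arrival time


def prune(sols: List[Sol]) -> List[Sol]:
    """Keep the non-dominated (c, t) pairs; lower is better for both."""
    kept: List[Sol] = []
    for c, t in sorted(sols):
        if not kept or t < kept[-1][1]:
            kept.append((c, t))
    return kept


def join(p_xy: float, inputs: List[List[Sol]]) -> List[Sol]:
    """Place a gate at (x, y): pick one solution per gate input and combine."""
    combined = [(p_xy + sum(s[0] for s in combo), max(s[1] for s in combo))
                for combo in product(*inputs)]
    return prune(combined)
```

Enumerating every combination is exponential in the number of inputs; merging inputs pairwise and pruning after each step would keep the lists smaller.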
Aside: Legalization
• Use Modified Gain-Graph approach [Hur, Lillis; ICCAD00]
  – Modified to incorporate timing information
Optimization Flow
• Identify critical sink (static timing analysis)
• Extract Fan-in Tree
  – Replication Tree
  – epsilon-Slowest Paths Tree
• Embed Fan-in Tree
• Decide which cells to Replicate / Unify
• Legalize placement
• Repeat while there is improvement
Enhancements
• Post-process unification
  – some cells placed close to their logical equivalents
  – no automatic unification
  – if one of the paths is non-critical, it is possible to unify without degrading performance
• Unification in legalizer
  – during ripple-move, a cell may be placed on top of its replica
  – unify them and stop legalization
• epsilon-Slowest Paths Tree
  – no randomization
  – dynamically modify value of epsilon to enlarge the fan-in cone
Experiments
• Algorithms
  – Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html]
  – Local Replication [Beraudo, Lillis, DAC-03]
  – RT-Embedding
• 20 MCNC Benchmark Circuits
• Interested in:
  – Critical delay
  – Amount of replication
  – Wire usage
• Tests performed in FPGA domain
• Promising results
Experimental Setup
• Obtain valid placement with Timing-Driven VPR placer
• Local Replication / Replication Tree Embedding
• Route and Evaluate with Timing-Driven VPR router
Average values over all 20 circuits, normalized to VPR:

              Critical path delay        Wire length   Blocks
              W inf     W low-stress
Local Repl    0.925     0.927            1.020         1.003
RT-Embed      0.858     0.869            1.084         1.004

• Delay improved for all circuits
• Best improvement for circuit pdc: 0.641
• Runtime penalty under 5% on the VPR flow
Replication Statistics
• Circuit ex1010: 38 replications, 12 unifications
Ongoing Work
• Generalize to ASICs
  – Include simultaneous buffering
• Mitigation of legalization noise
  – Preventing (some) overlaps in embedding
  – More sophisticated placement cost
• Reconvergence - arborescence approach
• Simultaneous technology (re-)mapping
• Explore multiple Tree Topologies simultaneously (Universal Tree solver engine: U-Tree)
Review
• Trees are everywhere!
  – Even in places where they seem to be absent
• Tree-based algorithms can be very strong in generality of formulation and predictability
  – Enable connection to general placement/routing target
  – Can capture tradeoffs between complex objectives
  – Can sometimes be applied to drive optimization of graph structures.
• References:
  – http://cs.uic.edu/~jlillis/pubs.html
• S/P/SP-tree executables:
  – http://eda.cs.uic.edu/software.html
Thank you
Timing-Driven Placement Legalization
• After embedding, cells could overlap in the placement
• Moving cells on critical path may harm timing
• Ripple-move strategy [Hur, Lillis, ICCAD 2000]
  – Modified to include both timing and wiring information
[Figure: placement grid with an overlapped slot and an empty slot]
Timing-Driven Placement Legalization
• Identify overlap
• Identify up to 4 closest empty slots (one in each quadrant)
• Construct gain graph
  – monotone paths from congested to free slots
  – edges: gain of moving a cell to neighboring slot (wire and timing gain)
• Find max-gain path and perform ripple-move
  – gain could be negative
[Figure: overlapped slot and empty slots]
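Because the gain graph contains only monotone paths, it is acyclic, and the max-gain path can be found by memoized dynamic programming even with negative gains. A minimal sketch, assuming the graph is given as `edges[slot] = [(next_slot, gain), ...]` (representation and names are hypothetical):

```python
from functools import lru_cache
from typing import Dict, List, Optional, Set, Tuple

Slot = Tuple[int, int]
Result = Optional[Tuple[float, Tuple[Slot, ...]]]


def max_gain_path(edges: Dict[Slot, List[Tuple[Slot, float]]],
                  start: Slot, empty: Set[Slot]) -> Result:
    """Best ripple-move chain from the overlapped slot `start` to an empty slot.

    edges[s] lists (next_slot, gain) for moving a cell from s to a neighbor;
    the graph is assumed acyclic. Returns (total_gain, path), or None if no
    empty slot is reachable.
    """

    @lru_cache(maxsize=None)
    def best(s: Slot) -> Result:
        if s in empty:
            return (0.0, (s,))  # chain ends in a free slot
        options = []
        for nxt, gain in edges.get(s, []):
            sub = best(nxt)
            if sub is not None:
                options.append((gain + sub[0], (s,) + sub[1]))
        return max(options) if options else None

    return best(start)
```

A chain may include a negative-gain move if later moves more than make up for it, which is why the search keeps whole-path totals instead of choosing greedily.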