ppt - University of Michigan

© KLMH
Taming the Complexity of Coordinated Place and Route
EECS 527. Layout Synthesis and Optimization
By Jin Hu, Myung-Chul Kim and Igor Markov
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 1: Introduction
Lienig
Presented By: Alvin Li
Taming the Complexity of Coordinated Place and Route
1. Introduction
2. Background
3. LIRE: Routing Estimation
4. Congestion Relief
5. Coordinated Place and Route
6. Empirical Validation
7. Comparison to Prior Arts
8. Conclusions
1. Introduction
Interconnects
- Earlier designs: 3 layers, uniform pitch
- Modern designs: more than 3 layers, non-uniform pitch
1. Introduction
• Interconnect complexity has increased since the 1980s
• Increased from 3 layers to 9-12 layers (non-uniform pitch)
• Longer routing times
• Lower quality of ICs
(Figure: Interconnects, from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)
1. Introduction
• Interconnects dominate:
- IC performance
- Power dissipation
- Size
- Signal integrity
1. Introduction: Significance of the Paper
• Global Placement & Global Routing
• Standalone vs. integrated
- Signal integrity and coupling capacitances in interconnect
- A set of individual optimizations or one simultaneous optimization?
• Streamlined System: Coordinated Place-and-Route (CoPR)
- Routing estimation during placement
- Placement technique that addresses three types of routing congestion
- Interface to congestion elimination
2. Background – Dijkstra’s Algorithm
• The basis of maze routing
• Finds the shortest path from a source node to a target node
• Requires a graph with non-negative edge weights
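The bullets above can be illustrated with a minimal sketch, assuming a single-layer routing grid with 4-neighbor moves where `cost[r][c]` is a non-negative price paid on entering a cell. The function name `dijkstra_grid` and the grid model are illustrative assumptions, not taken from the paper.

```python
import heapq

def dijkstra_grid(cost, source, target):
    """Cheapest 4-neighbor path on a grid; cost[r][c] >= 0 is paid on entering (r, c)."""
    rows, cols = len(cost), len(cost[0])
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]          # priority queue keyed by tentative distance
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue              # stale queue entry
        done.add(node)
        if node == target:
            break
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt[0]][nxt[1]]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None
    path = [target]               # walk predecessors back to the source
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
total, path = dijkstra_grid(grid, (0, 0), (2, 2))
```

In this example the route goes around the expensive center cell. The priority queue is the part that the later slides describe as complex and slow.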
2. Background – A* Search Algorithm
• Extension of Dijkstra’s algorithm, but faster
• Estimates the distance to the target
• Node priority = (Group 2 label in Dijkstra’s algorithm) + (distance estimate, including vias, to the target node)
• Example: 31 nodes vs. 6 nodes visited
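A small sketch of the node-priority rule, assuming a single-layer unit-cost grid and a Manhattan-distance estimate (so vias are not modeled here). The name `grid_search` and the `use_estimate` switch are hypothetical. With the estimate turned off, the same code behaves like Dijkstra’s algorithm, which makes the difference in visited nodes easy to see.

```python
import heapq

def grid_search(n, source, target, use_estimate):
    """Return (path length, nodes expanded) on an open n x n unit-cost grid."""
    def estimate(node):
        if not use_estimate:
            return 0              # no estimate: plain Dijkstra ordering
        return abs(node[0] - target[0]) + abs(node[1] - target[1])

    best = {source: 0}
    # Priority = label so far + estimate; ties prefer nodes farther along (-g).
    heap = [(estimate(source), 0, source)]
    closed = set()
    expanded = 0
    while heap:
        _, neg_g, node = heapq.heappop(heap)
        if node in closed:
            continue
        closed.add(node)
        expanded += 1
        g = -neg_g
        if node == target:
            return g, expanded
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < n and 0 <= nxt[1] < n:
                ng = g + 1
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(heap, (ng + estimate(nxt), -ng, nxt))
    return None, expanded

d_len, d_expanded = grid_search(5, (0, 0), (4, 4), use_estimate=False)
a_len, a_expanded = grid_search(5, (0, 0), (4, 4), use_estimate=True)
```

Both runs return the same path length, but the run with the estimate expands fewer nodes.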
2. Background – Key Characteristics of A* Search Algorithm
Characteristic | Detail | Effect
Captures | Detours | Includes history cost and congestion
Speed | Priority queue | Selects the best path
Complexity | Pointer-based algorithm | Cache misses
History cost | Used to determine optimal path along with congestion | Overshadows functions based on straight-line distance
Admissible | Considers the fewest nodes | Cannot leverage incrementality, no incremental improvement
2. Coordinated Place-and-Route
Proposed improvement over the A* search algorithm:
Streamlined System: Coordinated Place-and-Route (CoPR)
• Cache-friendly routing primitives to estimate routing congestion
• Leverages incrementality in routing and congestion updates
• New categorization of congestion
• New congestion-relief techniques
3. LIRE: Routing Estimation
• Lightweight Incremental Routing Estimator
- Produces congestion maps like those of a global router
- 75K nets per second (can trade off quality against runtime)
3.1 Faster Routing
• Traditional global routing: maze routing
- Priority queue: complex and slow
- Large history-based cost
- Lacks incrementality
• Linear-time, cache-friendly routing: Bellman-Ford algorithm
- Avoid priority-queue-based approaches
- Avoid pointers to improve the cache hit rate
3.1 Faster Routing – Bellman-Ford Algorithm
Bellman-Ford Algorithm (1958)
• Slower than Dijkstra’s algorithm in general
• Each pass performs E relaxation steps at O(1) each (E = number of edges)
• Goes through all nodes, visiting them in arbitrary order
• Relaxes all edges instead of greedily selecting the unprocessed node with minimum weight to relax
• Repeats the pass up to N-1 times (N = number of vertices)
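As a rough illustration of the pass structure described above, here is a textbook Bellman-Ford sketch over an explicit edge list. The names `bellman_ford` and `edges` are illustrative, and the early exit when a pass changes nothing is a common optimization rather than something stated on the slide.

```python
def bellman_ford(num_nodes, edges, source):
    """edges is a list of (u, v, weight); returns shortest distances from source."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):        # at most N-1 passes
        changed = False
        for u, v, w in edges:             # relax every edge, O(1) each
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                         # already converged
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
dist = bellman_ford(4, edges, 0)
```

There is no priority queue and no pointer chasing: the inner loop is a flat scan over an array of edges.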
3.1 Faster Routing – Bellman-Ford Algorithm
Monotonic Routing with One Linear-Time BF Pass
• Consider only forward edges
• Consider only the space bounded by S and T
• Visit nodes in order, going through each node once
• Runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)
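One way to picture a single linear-time pass is a sweep over the bounding box of S and T that only follows edges pointing toward T. The sketch below assumes a single-layer grid where `cost[r][c]` is paid on entering a cell; `monotonic_route_cost` is a hypothetical name, and the actual LIRE data layout is not described on the slides.

```python
def monotonic_route_cost(cost, s, t):
    """Cheapest monotonic path cost from s to t inside their bounding box."""
    (sr, sc), (tr, tc) = s, t
    dr = 1 if tr >= sr else -1            # row direction toward T
    dc = 1 if tc >= sc else -1            # column direction toward T
    INF = float("inf")
    best = {}
    for r in range(sr, tr + dr, dr):      # each node is visited exactly once
        for c in range(sc, tc + dc, dc):
            if (r, c) == (sr, sc):
                best[(r, c)] = 0
                continue
            from_row = best.get((r - dr, c), INF)   # forward vertical edge
            from_col = best.get((r, c - dc), INF)   # forward horizontal edge
            best[(r, c)] = min(from_row, from_col) + cost[r][c]
    return best[(tr, tc)]

grid = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
around = monotonic_route_cost(grid, (0, 0), (2, 2))
blocked = monotonic_route_cost([[0, 5, 1],
                                [1, 1, 1]], (0, 0), (0, 2))
```

The second call shows the limitation: S and T share a row, so the bounding box is that row only and the expensive cell cannot be avoided. Detours need the non-monotonic techniques on the following slides.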
3.1 Faster Routing – Bellman-Ford Algorithm
Non-monotonic Routing with One Linear-Time BF Pass
• Duplex-edge relaxation: relaxation in both directions
• Echo-relaxation: propagate a smaller cost through all recently relaxed edges incident to the point
• Effective for detouring short nets (the majority of nets are short)
3.1 Faster Routing
• Bellman-Ford with Yen’s improvement (BFY, 1970)
- J. Y. Yen suggested reversing the node ordering between BF passes
- Reduces the number of passes required to find an optimal path
- BFY finds optimal paths faster than A*-search for most nets in the experiment (Theorem 1)
3.1 Faster Routing – Bellman-Ford Algorithm
Bellman-Ford with Yen’s improvement
• First forward pass finds the optimal monotonic path
• Backward pass finds a detour
• Second forward pass finds the optimal path
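The three-pass behavior above can be reproduced with a small sketch, assuming a single-layer grid swept in row-major order on forward passes and in reverse on backward passes. The function `bfy_grid`, the cost model (`cost[r][c]` paid on entering a cell), and the example grid are assumptions for illustration, not the paper's implementation.

```python
def bfy_grid(cost, source, max_passes):
    """Bellman-Ford sweeps with alternating node order; returns (dist, passes used)."""
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    order = [(r, c) for r in range(rows) for c in range(cols)]
    passes_used = 0
    for p in range(max_passes):
        forward = (p % 2 == 0)
        step = -1 if forward else 1       # neighbors already seen in this sweep
        changed = False
        for r, c in (order if forward else reversed(order)):
            for pr, pc in ((r + step, c), (r, c + step)):
                if 0 <= pr < rows and 0 <= pc < cols:
                    cand = dist[pr][pc] + cost[r][c]
                    if cand < dist[r][c]:
                        dist[r][c] = cand
                        changed = True
        passes_used += 1
        if not changed:
            break                         # converged early
    return dist, passes_used

grid = [[1, 9, 1, 1],
        [1, 9, 1, 9],
        [1, 1, 1, 9]]
one_pass, _ = bfy_grid(grid, (0, 0), 1)
three_pass, _ = bfy_grid(grid, (0, 0), 3)
full, used = bfy_grid(grid, (0, 0), 10)
```

On this grid the cheapest route from (0, 0) to (0, 3) dips down and comes back up. One forward pass only finds the monotonic route along the top row, and the third pass reaches the optimal cost after the backward pass has discovered the detour.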
3.1 Faster Routing
• Bellman-Ford with Yen’s improvement (1970)
- With m passes, runtime complexity is O(mN) (N = number of nodes in the space bounded by S and T)
- Limit m to reduce runtime
- Small loss of optimality
- Focus on incremental calls to BFY
• Incremental Routing with BFY
- Records partial costs along an existing route to reduce runtime (rip-up-and-reroute and repeated invocations of LIRE during placement)
- Faster!
3.1 Faster Routing – Bellman-Ford Algorithm
Incremental Routing with BFY
• Initial route with BFY
• Through relaxation, BFY preserves part of the route and finds a better partial segment
4. Congestion Relief
• Main goal: increase the porosity of placement regions with high routing congestion
• How?
i. After global placement, shift cell locations and use congestion-driven detailed placement
ii. During global placement, inflate cells based on early congestion estimates and pin density
4. Congestion Relief
Traditional approaches are insufficient:
i. After global placement, shift cell locations and use congestion-driven detailed placement
- Must preserve the structure of the resulting placement or risk severe deterioration of interconnect length
ii. During global placement, inflate cells based on early congestion estimates and pin density
- When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause
4. Congestion Relief – Further Analysis
• Three types of routing congestion:
i. Cell-based congestion caused by cell-to-cell proximity
ii. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
iii. Remotely-induced layout-based congestion attributed to non-local factors such as long nets
4. Congestion Relief – Further Analysis
1. Cell-based congestion caused by cell-to-cell proximity
• Mitigated by cell inflation (only in the top 5% most congested GCells, to avoid exhausting whitespace)
2. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
• Locally inject whitespace (move cells out of the congested region)
3. Remotely-induced layout-based congestion attributed to non-local factors such as long nets
• Enforce a non-uniform target density by:
i) Creating a packing peanut (fixed cell) at the center of every GCell
ii) Modifying its size based on congestion
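The 5% cap in item 1 can be sketched as a simple ranking step. This is only an illustration: the function name `gcells_to_inflate`, the dictionary-based congestion map, and the rule of always selecting at least one GCell are assumptions, and the slides do not say how congestion is scored.

```python
def gcells_to_inflate(congestion, percent=5):
    """Return the ids of the most congested GCells, capped at `percent` of all GCells."""
    # Integer arithmetic avoids rounding surprises; always pick at least one GCell.
    count = max(1, len(congestion) * percent // 100)
    ranked = sorted(congestion, key=congestion.get, reverse=True)
    return set(ranked[:count])

# Hypothetical congestion map: GCell id -> routing demand / capacity.
congestion = {gcell: gcell / 40 for gcell in range(40)}
selected = gcells_to_inflate(congestion)
```

Only cells located in the selected GCells would then be inflated, which keeps the remaining whitespace available for the other two relief techniques.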
5. Coordinated Place and Route
Integration of Routing and Placement
• Incremental placement updates
- After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection
- In the next invocation, a connection whose endpoints remain the same is left unchanged
- Has a pronounced effect in later iterations and during detailed placement, when locations have stabilized
5. Coordinated Place and Route
Integration of Routing and Placement
• Incremental routing updates
- When invoked for the first time, LIRE generates routes from scratch
- After that, it reuses existing routes where possible
- Nets whose terminals relocated to different GCells are rerouted using the original net ordering
- Remaining nets are checked for congested routes, and congestion is mitigated by single incremental BFY passes
- Replicates the accuracy of a maze router, but with a better runtime
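The bookkeeping behind these incremental updates might look like the sketch below. Everything here is a simplified assumption: `IncrementalEstimator` and `l_route` are hypothetical names, the stand-in router just draws an L-shaped path, and the congestion check with single incremental BFY passes is omitted.

```python
def l_route(a, b):
    """Hypothetical stand-in router: GCells on an L-shaped path from a to b."""
    (ar, ac), (br, bc) = a, b
    step_c = 1 if bc >= ac else -1
    step_r = 1 if br >= ar else -1
    cells = [(ar, c) for c in range(ac, bc + step_c, step_c)]
    cells += [(r, bc) for r in range(ar + step_r, br + step_r, step_r)]
    return cells

class IncrementalEstimator:
    def __init__(self):
        self.routes = {}      # net id -> (endpoint GCells, GCells traversed)
        self.usage = {}       # congestion map: GCell -> number of routes through it
        self.rerouted = 0     # how many routes were built so far

    def update(self, nets):
        """nets maps net id -> (GCell a, GCell b); only changed nets are rerouted."""
        for net_id, endpoints in nets.items():
            old = self.routes.get(net_id)
            if old is not None and old[0] == endpoints:
                continue                      # endpoints unchanged: keep the route
            if old is not None:
                for cell in old[1]:           # rip up the outdated route
                    self.usage[cell] -= 1
            cells = l_route(*endpoints)
            for cell in cells:
                self.usage[cell] = self.usage.get(cell, 0) + 1
            self.routes[net_id] = (endpoints, cells)
            self.rerouted += 1

est = IncrementalEstimator()
est.update({"n1": ((0, 0), (0, 2)), "n2": ((1, 0), (2, 0))})   # first call: from scratch
first_call = est.rerouted
est.update({"n1": ((0, 0), (0, 2)), "n2": ((1, 0), (1, 1))})   # only n2 moved
```

After the second call only `n2` has been rerouted, and the congestion map reflects its new route while `n1` is untouched.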
6. Empirical Validation
Verifying Results
• Implemented in CoPR, in C++ using the OpenMP library, compiled with g++ 4.7.0
• Global placer derived from SimPL
- Used by three of the top four teams at the ICCAD 2012 Contest
• Results reported on the ICCAD 2012 benchmarks from IBM researchers
• At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. It is 5.7x faster than another contestant at the same quality.
• With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner.
7. Comparisons to Prior Art
• Fast Routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh
- Replaces A*-search with fast linear-time routing algorithms that exploit a different notion of monotonic routes
- Uses multiple passes to find non-monotonic routes and does not claim optimality
- Does not consider CPU cache effects or the connection with BFY
- Has not been used to drive a competitive global placer, in contrast to the successful results for coordinated place-and-route by CoPR
- CoPR’s authors completed their work before this paper was published or made available
7. Comparisons to Prior Art
• Fast Routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen
- Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms
- Uses more memory for advanced data structures and requires significant up-front set-up
- Single-threaded version of LIRE takes <15% of runtime in the entire place-and-route flow
- CoPR’s authors avoided sophisticated routing algorithms and data structures
7. Comparisons to Prior Art
• Incremental Routing Techniques
- All modern routability-driven techniques use built-in congestion estimation that constructs new estimates from scratch at every invocation
- Unnecessarily time-consuming, especially when the placement has not changed significantly
7. Comparisons to Prior Art
• Incremental Routing Techniques: “GDRouter: Interleaved Global Routing and Detailed Routing for Ultimate Routability” by Y. Zhang and C. Chu
- Rips up and reroutes some congested nets
- Assumes static routing and placement instances
• CoPR:
- Accounts for dynamic placement and routing instances
- Takes advantage of previous partial routes
- Updates routes on an as-needed basis
8. Conclusions
• Interconnects play dominant roles in IC design:
- Area
- Volume
- Delay
- Power
- Signal integrity
• Threatening to render Moore’s Law irrelevant
• Solution? Reduce interconnect demand
• IBM researchers:
- Design flows with separate placement and routing steps have become ineffective for modern ICs
- Combining the two brings tangible and significant benefits in IC cost
8. Conclusions
• Why isn’t there more research on integrated optimizations?
- Sophisticated data structures
- Elaborate multistep optimizations used by state-of-the-art algorithms
- Unmaintainable source-code bases that are unnecessarily entangled
- Large sets of tuning parameters
- Significant runtime
8. Conclusions
• Coordinated Place-and-Route (CoPR)
- Dramatic acceleration of constructive routing estimation through linear-time cache-friendly algorithms that do not require sophisticated data structures
- Significant reductions in the amount of work through pervasive incrementality at the interface between placement and routing
- Identification of two new types of routing congestion, as well as mechanisms by which a global placer can diagnose them and respond effectively
- Strong empirical results on the most recent benchmarks from IBM Research
8. Conclusions
• Impact of this paper:
- More compact and less costly IC layouts
- Reduced back-end turn-around time, so IC designers can evaluate a greater number of micro-architectural configurations
- Provides an algorithmic framework that integrates routing and placement and enhances performance
This paper will be presented at the Paper Sessions of DAC 2013 (Design Automation Conference) on June 6th in Austin, Texas
“One Small Step for Placement, One Big Leap for Routability!”
ICCAD
• Annual CAD Contest in Taiwan since 2000
• Boosts EDA research momentum in Taiwan
• The contest at ICCAD started in 2012, sponsored by IEEE CEDA and Taiwan MoE
• Designed for university students
ICCAD
• The quality metrics are determined by the problem specifications
- Correctness
- Runtime
- Memory usage
• Evaluated on the announced benchmarks and hidden benchmarks
• Language: Standard C/C++ library; MATLAB prohibited
• System platform (machine type and Linux/GNU libc/GCC version) is announced with each problem
ICCAD 2013
• 2012 Contest:
- >50 teams from 7 regions
- Problems:
1. Finding the minimal logic difference for functional ECO (contributed by Cadence Design Systems Inc., Taiwan)
2. Design hierarchy aware routability-driven placement (contributed by IBM Corp., USA)
3. Fuzzy pattern matching for physical verification (contributed by Mentor Graphics Corp., USA)
- First place in Problem 2: Myung-Chul Kim & Jin Hu, University of Michigan (Advisor: Prof. Igor L. Markov)
ICCAD 2013
• 2013 Contest:
- Problems:
1. Technology Mapping for Macro Blocks (contributed by Taiwan Cadence Design Systems, Inc.)
2. Placement Finishing – Detailed Placement and Legalization (contributed by IBM Research, Austin, TX)
3. Mask Optimization (contributed by IBM Research, East Fishkill, NY)
- Registration deadline: May 15, 2013
- http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013/default.html
END
Thank you very much!
Proof of Theorem 1