© KLMH
EECS 527: Layout Synthesis and Optimization

Taming the Complexity of Coordinated Place and Route
By Jin Hu, Myung-Chul Kim and Igor Markov
Presented by: Alvin Li
VLSI Physical Design: From Graph Partitioning to Timing Closure, Chapter 1: Introduction, Lienig

1. Introduction
2. Background
3. LIRE: Routing Estimation
4. Congestion Relief
5. Coordinated Place and Route
6. Empirical Validation
7. Comparisons to Prior Art
8. Conclusions

1. Introduction – Interconnects
[Figure: interconnect stacks compared: 3 layers with uniform pitch vs. more than 3 layers with non-uniform pitch]

1. Introduction – Interconnects
• Interconnect complexity has increased since the 1980s (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)
  - From 3 layers to 9–12 layers with non-uniform pitch
  - Longer routing times
  - Lower-quality IC designs

1. Introduction
• Interconnects dominate IC performance, power dissipation, size and signal integrity

1. Introduction: Significance of the Paper
• Global placement and global routing: standalone or integrated?
  - Signal integrity and coupling capacitance in interconnect
  - A set of individual optimizations, or one simultaneous optimization?
• A streamlined system: Coordinated Place-and-Route (CoPR)
  - Routing estimation during placement
  - A placement technique that addresses three types of routing congestion
  - An interface to congestion elimination

2. Background – Dijkstra’s Algorithm
• The basis of maze routing
• Finds a shortest path from a source node to a target node
• Requires a graph with non-negative edge weights
[Figures: step-by-step example of Dijkstra’s algorithm]

2. Background – A* Search Algorithm
• An extension of Dijkstra’s algorithm that is typically faster
• Estimates the remaining distance to the target
• Node priority = cost label of a Group 2 (frontier) node in Dijkstra’s algorithm + estimated distance to the target node, including vias
[Figure: 31 nodes visited (Dijkstra’s algorithm) vs. 6 nodes visited (A* search)]

2. Background – Key Characteristics of A* Search Algorithm
• Captures detours: includes history cost and congestion
• Speed: a priority queue selects the best path
• Complexity: a pointer-based algorithm, which causes cache misses
• History cost: used together with congestion to determine the optimal path; overshadows functions based on straight-line distance
• Admissible: considers the fewest nodes, but cannot leverage incrementality (no incremental improvement)
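A small Python sketch can illustrate the difference between Dijkstra’s algorithm and A* search. It assumes a single-layer, 4-connected routing grid with unit edge costs and a set of blocked cells, so it omits vias, history costs and congestion. The function name `maze_route` is hypothetical. Nodes waiting in the heap play the role of Group 2 nodes. A* adds the Manhattan distance to the target to each priority; on this grid that estimate never overestimates the remaining cost, so it is admissible.

```python
import heapq


def maze_route(blocked, width, height, src, dst, use_astar=False):
    """Shortest-path cost on a 4-connected grid with unit edge costs.

    blocked: set of (x, y) cells that cannot be used (hypothetical blockages).
    Returns (cost, settled): settled is the number of nodes removed from the
    queue and finalized, including dst; cost is None if dst is unreachable.
    use_astar=False gives Dijkstra's algorithm; use_astar=True adds the
    Manhattan distance to dst to each priority (A* search).
    """
    def estimate(node):
        if not use_astar:
            return 0
        return abs(node[0] - dst[0]) + abs(node[1] - dst[1])

    dist = {src: 0}
    heap = [(estimate(src), 0, src)]  # (priority, cost label, node)
    settled = set()
    while heap:
        _, d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale entry left behind by an earlier, worse label
        settled.add(node)
        if node == dst:
            return d, len(settled)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height) or nxt in blocked:
                continue
            nd = d + 1  # unit edge cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd + estimate(nxt), nd, nxt))
    return None, len(settled)
```

Consider an open 7×7 grid, routing from (0, 0) to (6, 0). Both searches return cost 6. With the tie-breaking used here, Dijkstra’s algorithm settles 28 nodes, while A* settles only the 7 nodes on the straight path. This mirrors the slide’s contrast in visited-node counts.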
2. Coordinated Place-and-Route
Proposed improvement over A*-search-based routing: a streamlined system, Coordinated Place-and-Route (CoPR)
• Cache-friendly routing primitives to estimate routing congestion
• Leverages incrementality in routing and congestion updates
• A new categorization of congestion
• New congestion-relief techniques

3. LIRE: Routing Estimation
LIRE = Lightweight Incremental Routing Estimator
• Produces congestion maps like those of a global router
• Routes about 75K nets per second (quality can be traded off against runtime)

3.1 Faster Routing
Traditional global routing: maze routing
• The priority queue is complex and slow
• Large history-based costs
• Lacks incrementality
Linear-time, cache-friendly routing
• Avoids priority-queue-based approaches
• Avoids pointers to improve the cache hit rate
• Builds on the Bellman-Ford algorithm

3.1 Faster Routing – Bellman-Ford Algorithm
Bellman-Ford algorithm (1958)
• Slower than Dijkstra’s algorithm in the worst case
• Relaxes all edges in each pass (E relaxation steps of O(1) each), instead of greedily selecting the minimum-cost node not yet processed
• Visits nodes in arbitrary order rather than in cost order
• Repeats the pass (N − 1) times (N = number of vertices), for O(N·E) total time

3.1 Faster Routing – Monotonic Routing with One Linear-Time BF Pass
• Consider only forward edges (edges that lead toward T)
• Consider only the region bounded by S and T
• Visit the nodes in order, processing each node exactly once
• Runtime complexity is O(N) (N = number of nodes in the region bounded by S and T)

3.1 Faster Routing – Non-monotonic Routing with One Linear-Time BF Pass
• Duplex-edge relaxation: relax each edge in both directions
• Echo relaxation: propagate a smaller cost through all recently relaxed edges incident to the node
• Effective at detouring short nets (the majority of nets are short)
[Figures: step-by-step example of a non-monotonic BF pass]
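The single monotonic BF pass described above can be sketched on a GCell grid. The sketch assumes a hypothetical per-GCell entry cost `cost[y][x]` in place of the per-edge costs, vias and layers of a real global router, and the function name is illustrative. Only forward edges are used, and rows and columns are swept from S toward T. As a result, every predecessor of a node is labeled before the node itself. One pass, which amounts to dynamic programming in topological order, therefore yields the optimal monotonic route cost in O(N).

```python
def monotonic_route_cost(cost, s, t):
    """One linear-time Bellman-Ford pass restricted to the S-T bounding box.

    cost[y][x] is the (hypothetical) cost of entering GCell (x, y).
    Only forward edges (one step toward t in x or in y) are relaxed, and the
    sweep visits every node after its forward predecessors, so each node is
    processed exactly once: O(N) for N nodes in the box.
    """
    (sx, sy), (tx, ty) = s, t
    dx = 1 if tx >= sx else -1  # step direction toward t along x
    dy = 1 if ty >= sy else -1  # step direction toward t along y
    xs = range(sx, tx + dx, dx)
    ys = range(sy, ty + dy, dy)
    label = {}
    for y in ys:
        for x in xs:
            if (x, y) == (sx, sy):
                label[(x, y)] = 0  # source: nothing to enter
                continue
            best = float("inf")
            # Forward edges arrive from the previous node in x or in y;
            # nodes outside the box never have labels.
            if (x - dx, y) in label:
                best = min(best, label[(x - dx, y)])
            if (x, y - dy) in label:
                best = min(best, label[(x, y - dy)])
            label[(x, y)] = best + cost[y][x]
    return label[(tx, ty)]
```

Take `cost = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]`. The route from (0, 0) to (2, 2) avoids the expensive center GCell and costs 4 in either direction. A route that needs a detour outside the bounding box cannot be found by this pass, which is the gap the non-monotonic and BFY variants address.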
3.1 Faster Routing – Bellman-Ford with Yen’s Improvement (1970)
• J. Y. Yen suggested reversing the node ordering between BF passes
• This reduces the number of passes required to find an optimal path
• In the experiments, BFY finds optimal paths faster than A* search for most nets (Theorem 1)
[Figures: the first forward pass finds the optimal monotonic path; the backward pass finds a detour; the second forward pass finds the optimal path]

3.1 Faster Routing – Bellman-Ford with Yen’s Improvement (1970)
• With m passes, the runtime complexity is O(mN) (N = number of nodes in the region bounded by S and T)
• Limiting m reduces runtime at a small loss of optimality
• Focus on incremental calls to BFY
• Incremental routing with BFY records partial costs along an existing route to reduce runtime. This makes rip-up-and-reroute and the repeated invocations of LIRE during placement faster.

3.1 Faster Routing – Incremental Routing with BFY
[Figures: initial route with BFY; through relaxation, BFY preserves part of the route and finds a better partial segment]
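Yen’s alternating sweep order can be sketched on the same kind of hypothetical GCell grid, where `cost[y][x]` is the cost of entering a GCell and `math.inf` marks a blockage. This is a pull-style Bellman-Ford variant written for clarity, not the authors’ LIRE code. Each visit relaxes all four edges into a node using the current labels. The sweep direction flips after every pass. The optional `max_passes` argument plays the role of m from the slides.

```python
import math


def bfy_route_cost(cost, s, t, max_passes=None):
    """Bellman-Ford with Yen's alternating sweep order on a GCell grid.

    cost[y][x]: hypothetical cost of entering GCell (x, y); math.inf = blocked.
    Odd-numbered passes sweep the grid from s toward t, even-numbered passes
    sweep back. Each visit pulls the best label from all four neighbors, so
    edges are relaxed in both directions over the course of the passes.
    Stops when a pass changes no label (labels are then optimal, since costs
    are non-negative) or after max_passes passes (possibly suboptimal).
    Returns (label of t, number of passes run).
    """
    h, w = len(cost), len(cost[0])
    (sx, sy), (tx, ty) = s, t
    xs = range(w) if tx >= sx else range(w - 1, -1, -1)
    ys = range(h) if ty >= sy else range(h - 1, -1, -1)
    forward = [(x, y) for y in ys for x in xs]  # sweep from s toward t
    label = {(x, y): math.inf for y in range(h) for x in range(w)}
    label[s] = 0
    passes = 0
    while max_passes is None or passes < max_passes:
        order = forward if passes % 2 == 0 else forward[::-1]
        passes += 1
        changed = False
        for (x, y) in order:
            if (x, y) == s:
                continue
            # Off-grid neighbors are absent from label and count as unreachable.
            best = min(label.get(n, math.inf)
                       for n in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)))
            new = best + cost[y][x]
            if new < label[(x, y)]:
                label[(x, y)] = new
                changed = True
        if not changed:
            break
    return label[t], passes
```

Consider a 5×3 grid of unit costs with GCells (2, 0) and (2, 1) blocked, routed from (0, 0) to (4, 0). The first forward pass only labels the region reachable by forward moves and leaves T unreached. The backward pass finds the detour through the bottom row with cost 8. A third pass changes nothing, so the call returns (8, 3). With `max_passes=1` the target stays unreached, which illustrates the optimality lost by limiting m.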
4. Congestion Relief
Main goal: increase the porosity of placement regions with high routing congestion. How?
i. After global placement, shift cell locations and use congestion-driven detailed placement
ii. During global placement, inflate cells based on early congestion estimates and pin density

4. Congestion Relief
Traditional approaches are insufficient:
• After global placement, shifting cell locations and congestion-driven detailed placement must preserve the structure of the resulting placement, or they risk an unacceptable increase in interconnect length
• During global placement, inflating cells based on early congestion estimates and pin density has a weakness. When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause.

4. Congestion Relief – Further Analysis
Three types of routing congestion:
i. Cell-based congestion, caused by cell-to-cell proximity
ii. Local layout-based congestion, caused by static design properties such as blockages and reduced routing capacities
iii. Remotely-induced layout-based congestion, attributed to non-local factors such as long nets

4. Congestion Relief – Further Analysis
1. Cell-based congestion, caused by cell-to-cell proximity
   • Mitigated by cell inflation, applied only in the top 5% most congested GCells to avoid exhausting whitespace
2. Local layout-based congestion, caused by static design properties such as blockages and reduced routing capacities
   • Locally inject whitespace (move cells out of the congested region)
3. Remotely-induced layout-based congestion, attributed to non-local factors such as long nets
   • Enforce a non-uniform target density by:
     i) Creating a packing peanut (a fixed cell) at the center of every GCell
     ii) Modifying its size based on congestion

5. Coordinated Place and Route – Integration of Routing and Placement
Incremental placement updates
• After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection
• In the next invocation, if a connection’s endpoints remain the same, its route is left unchanged
• This has a pronounced effect in later iterations and during detailed placement, when cell locations have stabilized

5. Coordinated Place and Route – Integration of Routing and Placement
Incremental-routing updates
• When invoked for the first time, LIRE generates routes from scratch
• After that, it reuses existing routes where possible
• Nets whose terminals relocated to different GCells are rerouted using the original net ordering
• The remaining nets are checked for congested routes, and that congestion is mitigated by single incremental BFY passes
• Replicates the accuracy of a maze router, but with better runtime

6. Empirical Validation
Verifying results
• CoPR is implemented in C++ using the OpenMP library and compiled with g++ 4.7.0
• Global placer derived from SimPL (used by three of the top four teams at the ICCAD 2012 Contest)
• Results are reported on the ICCAD 2012 benchmarks from IBM researchers
• At the same runtime, CoPR outperforms the ICCAD 2012 Contest finalists by 7% and 2% in quality metrics
• CoPR is 5.7× faster than another contestant at the same quality
• With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner
[Figure: empirical validation results]

7. Comparisons to Prior Art – Fast Routing
“A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh
• Replaces A* search with fast linear-time routing algorithms that exploit a different notion of monotonic routes
• Uses multiple passes to find non-monotonic routes and does not claim optimality
• Does not consider CPU cache effects or the connection to BFY
• Was not used to drive a competitive global placer, whereas CoPR shows successful results for coordinated place-and-route
• CoPR’s authors completed their work before this paper was published or made available

7. Comparisons to Prior Art – Fast Routing
“BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen
• Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms
• Uses more memory for the advanced data structures and requires significant up-front setup
• In contrast, CoPR’s authors avoided sophisticated routing algorithms and data structures; the single-threaded version of LIRE takes <15% of the runtime of the entire place-and-route flow
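Returning to the incremental updates of Section 5, the bookkeeping that decides which nets LIRE touches on a re-invocation can be sketched in a few lines of Python. The sketch works at the net level rather than per point-to-point connection. All names are hypothetical: GCells are (x, y) tuples, a net’s terminal locations are a tuple of GCells, and `overflowed` is assumed to be derived from LIRE’s congestion map. The actual rerouting and BFY passes are left out.

```python
def plan_incremental_update(net_order, old_pins, new_pins, routes, overflowed):
    """Decide how each net is handled when LIRE is re-invoked (sketch).

    net_order:  net names in the original routing order
    old_pins:   net -> tuple of terminal GCells from the previous invocation
                (nets missing here have never been routed)
    new_pins:   net -> tuple of terminal GCells after the placement update
    routes:     net -> list of GCells traversed by the net's current route
    overflowed: set of GCells whose routing demand exceeds capacity
    Returns (reroute, improve):
      reroute: nets that are new or whose terminals moved to different GCells,
               in the original net order, to be routed again from scratch
      improve: remaining nets whose routes cross overflowed GCells, each to
               receive a single incremental BFY pass
    All other nets keep their existing routes unchanged.
    """
    reroute, improve = [], []
    for net in net_order:
        if old_pins.get(net) != new_pins[net]:
            reroute.append(net)
        elif any(g in overflowed for g in routes[net]):
            improve.append(net)
    return reroute, improve
```

Iterating over `net_order` keeps the rerouted nets in their original ordering. A net with no previous state falls into the reroute list, which corresponds to routing from scratch on the first invocation.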
7. Comparisons to Prior Art – Incremental Routing Techniques
• All modern routability-driven techniques use built-in congestion estimation that constructs new estimates from scratch on every invocation
• This is unnecessarily time-consuming, especially when the placement has not changed significantly

7. Comparisons to Prior Art – Incremental Routing Techniques
“GDRouter: Interleaved Global Routing and Detailed Routing for Ultimate Routability” by Y. Zhang and C. Chu
• Rips up and reroutes some congested nets
• Assumes a static routing and placement instance
CoPR:
• Accounts for dynamic placement and routing instances
• Takes advantage of previous partial routes
• Updates routes on an as-needed basis

8. Conclusions
• Interconnects play a dominant role in IC design: area, volume, delay, power and signal integrity
• They threaten to render Moore’s Law irrelevant
• Solution? Reduce interconnect demand
• IBM researchers:
  - Design flows with separate placement and routing steps have become ineffective for modern ICs
  - Combining the two brings tangible and significant benefits in IC cost

8. Conclusions
Why isn’t there more research on integrated optimizations?
• Sophisticated data structures
• Elaborate multistep optimizations used by state-of-the-art algorithms
• Unmaintainable, unnecessarily entangled source-code bases
• Large sets of tuning parameters
• Significant runtime

8. Conclusions
Coordinated Place-and-Route (CoPR)
• Dramatic acceleration of constructive routing estimation through linear-time, cache-friendly algorithms that do not require sophisticated data structures
• Significant reductions in the amount of work through pervasive incrementality at the interface between placement and routing
• Identification of two new types of routing congestion, as well as a mechanism by which a global placer can diagnose them and respond effectively
• Strong empirical results on the most recent benchmarks from IBM Research

8. Conclusions
Impact of this paper:
• More compact and less costly IC layouts
• Reduced back-end turnaround time, so IC designers can evaluate a greater number of micro-architectural configurations
• An algorithmic framework that:
  - Integrates routing and placement
  - Enhances performance
This paper will be presented in the paper sessions of DAC 2013 (Design Automation Conference) on June 6 in Austin, Texas.
“One Small Step for Placement, One Big Leap for Routability!”

ICCAD
• Annual CAD Contest in Taiwan since 2000
• Boosts EDA research momentum in Taiwan
• The CAD Contest at ICCAD started in 2012, sponsored by IEEE CEDA and the Taiwan MoE
• Designed for university students

ICCAD
• Quality metrics are determined by the problem specifications:
  - Correctness
  - Runtime
  - Memory usage
• Evaluated on announced and hidden benchmarks
• Language: C/C++ with the standard library; MATLAB is prohibited
• The system platform (machine type and Linux/GNU libc/GCC versions) is announced for each problem

ICCAD 2013
• 2012 Contest:
  - More than 50 teams from 7 regions
  - Problems:
    1. Finding the minimal logic difference for functional ECO (contributed by Cadence Design Systems Inc., Taiwan)
    2. Design-hierarchy-aware routability-driven placement (contributed by IBM Corp., USA)
    3. Fuzzy pattern matching for physical verification (contributed by Mentor Graphics Corp., USA)
  - First place in Problem 2: Myung-Chul Kim and Jin Hu, University of Michigan (advisor: Prof. Igor L. Markov)

ICCAD 2013
• 2013 Contest:
  - Problems:
    1. Technology Mapping for Macro Blocks (contributed by Taiwan Cadence Design Systems, Inc.)
    2. Placement Finishing – Detailed Placement and Legalization (contributed by IBM Research, Austin, TX)
    3. Mask Optimization (contributed by IBM Research, East Fishkill, NY)
  - Registration deadline: May 15, 2013
  - http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013/default.html

END
Thank you very much!
Proof of Theorem 1