ppt - University of Michigan

Taming the Complexity of Coordinated Place and Route
By Jin Hu, Myung-Chul Kim and Igor Markov
EECS 527: Layout Synthesis and Optimization
Presented by: Alvin Li
© KLMH  VLSI Physical Design: From Graph Partitioning to Timing Closure, Chapter 1: Introduction, Lienig

Outline
1. Introduction
2. Background
3. LIRE: Routing Estimation
4. Congestion Relief
5. Coordinated Place and Route
6. Empirical Validation
7. Comparison to Prior Art
8. Conclusions

1. Introduction
Interconnects (figure labels)
• 3 layers, uniform pitch
• More than 3 layers, non-uniform pitch

1. Introduction
• Interconnect complexity has increased since the 1980s
  (Figure: Interconnects, from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)
• Increased from 3 layers to 9-12 layers (non-uniform pitch)
• Longer routing times
• Lower quality of integrated circuits

1. Introduction
• Interconnects dominate IC:
  - Performance
  - Power dissipation
  - Size
  - Signal integrity

1. Introduction: Significance of the Paper
• Global placement and global routing
• Standalone vs. integrated: a set of individual optimizations or one simultaneous optimization?
  - Signal integrity and coupling capacitances in interconnect
• Streamlined system: Coordinated Place-and-Route (CoPR)
  - Routing estimation during placement
  - Placement technique that addresses three types of routing congestion
  - Interface to congestion elimination

2. Background – Dijkstra’s Algorithm
• Known in routing as maze routing
• Finds the shortest path from a source node to a target node
• Requires a graph with non-negative edge weights
(Figures: Dijkstra’s algorithm example, shown over several slides)

2. Background – A* Search Algorithm
• Extension of Dijkstra’s algorithm, but faster
• Estimates the distance to the target
• Node priority: Group 2 label in Dijkstra’s algorithm + distance estimate, including vias, to the target node
(Figure: 31 nodes vs. 6 nodes visited)

2. Background – Key Characteristics of A* Search Algorithm
Characteristic | Detail | Effect
Captures detours | Includes history cost and congestion |
Speed | Priority queue | Selects the best path
Complexity | Pointer-based algorithm | Cache misses
History cost | Used to determine optimal path along with congestion | Overshadows functions based on straight-line distance
Admissible | Considers the fewest nodes | Cannot leverage incrementality, no incremental improvement

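The difference between the two searches above can be illustrated with a minimal sketch. It is not taken from the slides and rests on several assumptions: a unit-cost grid with no blockages, a Manhattan-distance estimate with no via cost, and ties broken in favor of cells farther from the source. The function name `grid_search` is hypothetical.

```python
import heapq


def grid_search(width, height, source, target, use_heuristic):
    """Shortest path on a unit-cost grid.

    Returns (path cost, number of cells expanded). With use_heuristic=False
    this behaves like Dijkstra's algorithm; with True it is A* using the
    Manhattan distance to the target as the estimate.
    """
    def estimate(node):
        if not use_heuristic:
            return 0
        return abs(node[0] - target[0]) + abs(node[1] - target[1])

    best_cost = {source: 0}
    # Heap entries: (priority, negated cost so far, cell). Negating the cost
    # breaks priority ties in favor of cells farther from the source.
    heap = [(estimate(source), 0, source)]
    expanded = set()
    while heap:
        _, neg_cost, node = heapq.heappop(heap)
        if node in expanded:
            continue  # stale heap entry
        expanded.add(node)
        cost = -neg_cost
        if node == target:
            return cost, len(expanded)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(heap, (new_cost + estimate(nxt), -new_cost, nxt))
    return None, len(expanded)


dijkstra_result = grid_search(11, 11, (5, 5), (8, 5), use_heuristic=False)
a_star_result = grid_search(11, 11, (5, 5), (8, 5), use_heuristic=True)
print("Dijkstra:", dijkstra_result, "A*:", a_star_result)
```

Both calls return the same path cost. In this sketch the heuristic variant expands only the 4 cells on the straight route, while the plain variant first expands every cell that is closer to the source than the target is. Both variants still depend on a priority queue, which is the cost the later slides aim to avoid.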
2. Coordinated Place-and-Route
Proposed improvement to the A* search algorithm: a streamlined system, Coordinated Place-and-Route (CoPR)
• Cache-friendly routing primitives to estimate routing congestion
• Leverages incrementality in routing and congestion updates
• New categorization of congestion
• New congestion-relief techniques

3. LIRE: Routing Estimation
• Lightweight Incremental Routing Estimator
  - Produces congestion maps like a global router
  - 75K nets per second (can trade off quality against runtime)
(Figure slide)

3.1 Faster Routing
• Traditional global routing: maze routing
  - Priority queue: complex and slow
  - Large history-based cost
  - Lacks incrementality
• Linear-time cache-friendly routing
  - Avoid priority-queue-based approaches
  - Avoid pointers to improve the cache hit rate
  - Bellman-Ford algorithm

3.1 Faster Routing – Bellman-Ford Algorithm
Bellman-Ford algorithm (1958)
• Slower than Dijkstra’s algorithm in the worst case
• Each pass performs E relaxation steps of O(1) each (E = number of edges)
• Goes through all nodes
• Relaxes all edges instead of greedily selecting the minimum-weight unprocessed node to relax
• Recomputes all path costs and repeats up to N-1 times (N = number of vertices)
• Visits nodes in arbitrary order
(Figure slide)

3.1 Faster Routing – Bellman-Ford Algorithm
Monotonic routing with one linear-time BF pass
• Consider only forward edges
• Only consider the space bounded by S and T
• Visit nodes in order, going through each node once
• Runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)

3.1 Faster Routing – Bellman-Ford Algorithm
Non-monotonic routing with one linear-time BF pass
• Duplex-edge relaxation: relaxation in both directions
• Echo-relaxation: propagate the smaller cost through all recently relaxed edges incident to the point
• Effective in detouring short nets (the majority of nets are short)
(Figures: non-monotonic routing example, shown over several slides)

3.1 Faster Routing
Bellman-Ford with Yen’s improvement (1970), abbreviated BFY
• J.Y. Yen suggested reversing the node ordering between BF passes
• Reduces the number of passes required to find the optimal path
• BFY finds optimal paths faster than A* search for most nets in the experiment (Theorem 1)
(Figures)
• First forward pass finds the optimal monotonic path
• Backward pass finds a detour
• Second forward pass finds the optimal path

3.1 Faster Routing
Bellman-Ford with Yen’s improvement (1970)
• With m passes, runtime complexity is O(mN) (N = number of nodes in the space bounded by S and T)
• Limit m to reduce runtime
  - Small loss of optimality
  - Focus on incremental calls to BFY
• Incremental routing with BFY
  - Records partial costs along an existing route to reduce runtime (rip-up-and-reroute and repeated invocations of LIRE during placement)
  - Faster!
(Figures: incremental routing with BFY)
• Initial route with BFY
• Through relaxation, BFY preserves part of the route and finds a better partial segment

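The alternating passes described in Section 3.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' LIRE code: the grid is stored as a flat row-major list (array indexing, no pointers, no priority queue), each cell has a non-negative entry cost standing in for congestion and history cost, and the sweep covers the whole grid rather than only the space bounded by S and T. Duplex-edge relaxation and echo-relaxation are not included. The name `bfy_route` and its parameters are hypothetical.

```python
INF = float("inf")


def bfy_route(cost, rows, cols, source, target, max_passes):
    """Bellman-Ford sweeps with alternating node order (Yen's improvement).

    cost: flat row-major list, cost[i] is the cost of entering cell i.
    source, target: flat cell indices.
    Returns the best known cost to the target after each executed pass.
    """
    dist = [INF] * (rows * cols)
    dist[source] = 0
    history = []
    for p in range(max_passes):
        forward = (p % 2 == 0)
        # Forward passes visit cells in increasing index order and pull costs
        # from the left and upper neighbors; backward passes do the reverse.
        order = range(rows * cols) if forward else range(rows * cols - 1, -1, -1)
        step = -1 if forward else 1
        changed = False
        for i in order:
            r, c = divmod(i, cols)
            best = dist[i]
            if 0 <= c + step < cols:   # horizontal neighbor seen earlier in this sweep
                best = min(best, dist[i + step] + cost[i])
            if 0 <= r + step < rows:   # vertical neighbor seen earlier in this sweep
                best = min(best, dist[i + step * cols] + cost[i])
            if best < dist[i]:
                dist[i] = best
                changed = True
        history.append(dist[target])
        if not changed:
            break  # converged: further passes cannot improve any cell
    return history


# 3x3 example: the middle column is expensive in the top two rows.
example_cost = [0, 9, 1,
                1, 9, 1,
                1, 1, 1]
print(bfy_route(example_cost, 3, 3, source=0, target=2, max_passes=4))
```

In this example the first forward pass reaches the target at cost 10 along the monotonic route through the expensive cell, and the backward pass lowers it to 6 by detouring through the bottom row. Each pass touches every cell once, so m passes cost O(mN), and capping `max_passes` trades optimality for runtime as the slide describes.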
4. Congestion Relief
• Main goal: increase the porosity of placement regions with high routing congestion
• How?
  i. After global placement, shift cell locations and use congestion-driven detailed placement
  ii. During global placement, inflate cells based on early congestion estimates and pin density

4. Congestion Relief
Traditional ways are insufficient:
• After global placement, shift cell locations and use congestion-driven detailed placement
  - Must preserve the structure of the resulting placement or risk unbearable deterioration of interconnect length
• During global placement, inflate cells based on early congestion estimates and pin density
  - When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause

4. Congestion Relief – Further Analysis
Three types of routing congestion:
  i. Cell-based congestion caused by cell-to-cell proximity
  ii. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
  iii. Remotely induced layout-based congestion attributed to non-local factors such as long nets

4. Congestion Relief – Further Analysis
1. Cell-based congestion caused by cell-to-cell proximity
  • Mitigated by cell inflation (only the top 5% most congested GCells, to avoid exhausting whitespace)
2. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
  • Locally inject whitespace (move cells out of the congested region)
3. Remotely induced layout-based congestion attributed to non-local factors such as long nets
  • Enforce non-uniform target density by:
    i) Creating a packing peanut (fixed cell) at the center of every GCell
    ii) Modifying its size based on congestion

5. Coordinated Place and Route
Integration of routing and placement
• Incremental placement updates
  - After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection
  - In the next invocation, a connection whose endpoints remain the same is left unchanged
  - Has a pronounced effect in later iterations and during detailed placement, when locations have stabilized

5. Coordinated Place and Route
Integration of routing and placement
• Incremental routing updates
  - When invoked for the first time, LIRE generates routes from scratch
  - After that, it reuses existing routes where possible
  - Nets whose terminals relocated to different GCells are rerouted using the original net ordering
  - Remaining nets are checked for congested routes, and any congestion is mitigated by single incremental BFY passes
  - Replicates the accuracy of a maze router, with better runtime

6. Empirical Validation
Verifying results
• CoPR is implemented in C++ using the OpenMP library, compiled with g++ 4.7.0
• Global placer derived from SimPL, which was used by three of the top four teams at the ICCAD 2012 Contest
• Results reported on the ICCAD 2012 benchmarks by IBM researchers
• At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. It is 5.7x faster than another contestant at the same quality.
• With respect to the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner.
(Figure slide: 6. Empirical Validation)

7. Comparisons to Prior Art
Fast routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh
• Replaces A* search with fast linear-time routing algorithms that exploit a different notion of monotonic routes
• Uses multiple passes to find non-monotonic routes and does not claim optimality
• Does not consider CPU cache effects or the connection with BFY
• Not used to drive a competitive global placer, in contrast to the successful results for coordinated place-and-route by CoPR
• CoPR’s authors completed their work before this paper was published or made available

7. Comparisons to Prior Art
Fast routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen
• Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms
• Uses more memory for advanced data structures and requires significant up-front setup
• The single-threaded version of LIRE takes <15% of runtime in the entire place-and-route flow
• CoPR’s authors avoided sophisticated routing algorithms and data structures

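The incremental-routing rule from Section 5 (reroute only nets whose terminals moved to different GCells, in the original net order) can be sketched as a simple filter. This is a hedged illustration, not CoPR's interface: the names `gcell_of` and `nets_to_reroute`, the dictionary layout, and the square GCell size are all assumptions, and the follow-up step of checking the remaining nets for congested routes is not shown.

```python
def gcell_of(point, gcell_size):
    """Map an (x, y) location to the (column, row) index of its GCell."""
    x, y = point
    return (int(x // gcell_size), int(y // gcell_size))


def nets_to_reroute(old_pins, new_pins, gcell_size):
    """Return nets whose terminals now fall in different GCells.

    old_pins / new_pins: dict mapping net name -> list of (x, y) terminals,
    in the same terminal order. Iteration follows new_pins insertion order,
    standing in for the original net ordering.
    """
    changed = []
    for net, pins in new_pins.items():
        previous = old_pins.get(net)
        if previous is None:
            changed.append(net)  # never routed before: route from scratch
            continue
        old_cells = [gcell_of(p, gcell_size) for p in previous]
        new_cells = [gcell_of(p, gcell_size) for p in pins]
        if old_cells != new_cells:
            changed.append(net)
    return changed


old = {"n1": [(1, 1), (25, 3)], "n2": [(4, 4), (8, 8)]}
new = {"n1": [(2, 2), (26, 3)], "n2": [(4, 4), (12, 8)]}
print(nets_to_reroute(old, new, gcell_size=10))
```

Here net `n1` moved slightly but stayed within the same GCells, so its existing route can be kept, while one terminal of `n2` crossed a GCell boundary and the net is flagged for rerouting.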
7. Comparisons to Prior Art
Incremental routing techniques
• All modern routability-driven techniques use built-in congestion estimation that constructs new estimates from scratch at every invocation
• Unnecessarily time-consuming, especially when the placement has not changed significantly

7. Comparisons to Prior Art
Incremental routing techniques: “GDRouter: Interleaved Global Routing and Detailed Routing for Ultimate Routability” by Y. Zhang and C. Chu
• Rips up and reroutes some congested nets
• Assumes static routing and placement instances
• CoPR:
  - Accounts for dynamic placement and routing instances
  - Takes advantage of previous partial routes
  - Updates routes on an as-needed basis

8. Conclusions
• Interconnects play dominant roles in IC design:
  - Area
  - Volume
  - Delay
  - Power
  - Signal integrity
• Threatening to render Moore’s Law irrelevant
• Solution? Reduce interconnect demand
• IBM researchers:
  - Design flows with separate placement and routing steps have become ineffective for modern ICs
  - Combining the two brings tangible and significant benefits in IC cost

8. Conclusions
Why isn’t there more research on integrated optimizations?
• Sophisticated data structures
• Elaborate multistep optimizations used by state-of-the-art algorithms
• Unmaintainable source-code bases that are unnecessarily entangled
• Large sets of tuning parameters
• Significant runtime

8. Conclusions
Coordinated Place-and-Route (CoPR)
• Dramatic acceleration of constructive routing estimation through linear-time cache-friendly algorithms that do not require sophisticated data structures
• Significant reductions in the amount of work through pervasive incrementality at the interface between placement and routing
• Identification of two new types of routing congestion, as well as mechanisms by which a global placer can diagnose them and respond effectively
• Strong empirical results on the most recent benchmarks from IBM Research

8. Conclusions
Impact of this paper:
• More compact and less costly IC layouts
• Reduced back-end turnaround time, so IC designers can evaluate a greater number of micro-architectural configurations
• Provides an algorithmic framework that:
  - Integrates routing and placement
  - Enhances performance
This paper will be presented at the paper sessions of DAC 2013 (Design Automation Conference) on June 6th in Austin, Texas.
“One Small Step for Placement, One Big Leap for Routability!”

ICCAD
• Annual CAD Contest in Taiwan since 2000
• Boosts EDA research momentum in Taiwan
• The CAD Contest at ICCAD started in 2012, sponsored by IEEE CEDA and Taiwan MoE
• Designed for university students

ICCAD
• The quality metrics are determined by the problem specifications:
  - Correctness
  - Runtime
  - Memory usage
• Evaluated on the announced benchmarks and hidden benchmarks
• Language: standard C/C++ library; MATLAB prohibited
• System platform (machine type and Linux/GNU libc/GCC version) is announced in each problem

ICCAD: 2012 Contest
• >50 teams from 7 regions
• Problems:
  1. Finding the minimal logic difference for functional ECO (contributed by Cadence Design Systems Inc., Taiwan)
  2. Design hierarchy aware routability-driven placement (contributed by IBM Corp., USA)
  3. Fuzzy pattern matching for physical verification (contributed by Mentor Graphics Corp., USA)
• First place of Problem 2: Myung-Chul Kim & Jin Hu, University of Michigan. Advisor: Prof. Igor L. Markov

ICCAD 2013: 2013 Contest
• Problems:
  1. Technology Mapping for Macro Blocks (contributed by Taiwan Cadence Design Systems, Inc.)
  2. Placement Finishing – Detailed Placement and Legalization (contributed by IBM Research, Austin, TX)
  3. Mask Optimization (contributed by IBM Research, East Fishkill, NY)
• Registration deadline: May 15, 2013
• http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013/default.html

END
Thank you very much!

Proof of Theorem 1
