© KLMH
EECS 527 Paper Presentation

Techniques for Fast Physical Synthesis
By Charles J. Alpert, Shrirang K. Karandikar, Zhuo Li, Gi-Joon Nam, Stephen T. Quay, Haoxing Ren, C. N. Sze, Paul G. Villarrubia, and Mehmet C. Yildiz

Presented by Lingfeng Xu
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor

VLSI Physical Design: From Graph Partitioning to Timing Closure, Chapter 5: Global Routing, Lienig, 11/2011

Outline
• Introduction
• Buffering Trends
• Major Phases of Physical Synthesis
• Closer Look at Optimization
• Selected Techniques
   • Fast Timing-Driven Buffering
   • Layout Aware Buffer Trees
   • Diffusion Based Legalization
• Q&A

Introduction
• Purpose of physical synthesis: timing closure
• Physical synthesis iterations: iterate between manual design work and automatic physical synthesis
• Philosophy: as fast as possible, even if a little optimality is sacrificed
• IBM's physical synthesis tool: PDS (Placement-Driven Synthesis) system

Buffering Trends
• "Buffering explosion"
   • Thinner wires mean higher resistance
   • Wire delays increasingly dominate gate delays
• 20%-25% of cells are buffers or inverters in today's 90 nm designs
• Saxena et al. [3] predict that half of all logic will consist of buffers

[Figure: Percentage of block-level nets requiring repeaters [3]]
[Figure: Intra-block communication repeaters as a percentage of the total cell count for the block [3]]

Buffering Trends: Challenges
• Buffer insertion needs to be performed fast
• Area and power
• Layout awareness
• Buffering constricts or seeds global routing

Major Phases of Physical Synthesis
PDS stages
• Initial placement and optimization
• Timing-driven placement and optimization
• Timing-driven detailed placement
• Optimization techniques
• Clock insertion and optimization
• Routing and post-routing optimization
• Early-mode timing optimization

Closer Look at Optimization
Optimization phases
• Electrical correction
• Critical path optimization
• Histogram compression

An example of physical synthesis breakdown
Phase 1
• Initial Placement
• Electrical Correction
• Legalization
• Critical Slack Optimization
Phase 2
• Timing-driven Placement
• Electrical Correction
• Critical Slack Optimization
• Legalization
• Compression
• Legalization
• Legalization
Phase 3
• Timing-driven Detailed Placement
Phase 4
• Electrical Correction
• Legalization
• Critical Slack Optimization
• Legalization
• Critical Slack Optimization
• Legalization
• Compression
• Legalization

How to Achieve Fast Physical Synthesis?
Selected techniques
• Fast Timing-Driven Buffering
• Layout Aware Buffer Trees
• Diffusion Based Legalization

Fast Timing-Driven Buffering
Motivation
• Over a million buffers
• Rebuffering rips up all buffers and reinserts them from scratch
Considerations
• Buffering resources vs. delay
• Runtime
• Slew, noise and capacitance constraints

Fast Timing-Driven Buffering: Classical Buffering Algorithm
• Goal: maximize the required arrival time (RAT) at the source
• Dynamic programming: candidate solutions are generated and propagated from the sinks to the source
• Each candidate solution at an internal node is characterized by (q, c, w)
   • q: required arrival time
   • c: downstream load capacitance
   • w: cost summation for the buffer insertion decisions
• Example: sink (q = RAT, c = load capacitance, w = 0)

Fast Timing-Driven Buffering: Classical Buffering Algorithm (continued)
• Two solutions α1, α2: α2 dominates α1 if q2 ≥ q1, c2 ≤ c1 and w2 ≤ w1
• α1 is then redundant and can be pruned
• At the end of the algorithm, a set of solutions with different cost-RAT tradeoffs is obtained
• Choose one in the middle
• "10 ps rule": if the marginal RAT gain is more than 10 ps, choose the solution with the bigger RAT

Fast Timing-Driven Buffering: Prebuffer Slack Pruning (PSP)
• Based on the current node being processed
• If q2 < q1, c2 < c1 and (q2 - q1)/(c2 - c1) ≥ Rmin, then α2 is pruned early
• An appropriate Rmin guarantees optimality; however, a larger value does not hurt solution quality
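The dominance rule of the classical buffering algorithm can be sketched as follows. This is a minimal illustration, not the PDS implementation: the names `Candidate`, `dominates` and `prune_dominated` are hypothetical, and the quadratic pairwise scan is chosen for readability rather than speed.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical container for one candidate solution (q, c, w)."""
    q: float  # required arrival time (larger is better)
    c: float  # downstream load capacitance (smaller is better)
    w: float  # accumulated buffer cost (smaller is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in q, c and w."""
    return a.q >= b.q and a.c <= b.c and a.w <= b.w


def prune_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    kept = []
    for i, cand in enumerate(cands):
        redundant = False
        for j, other in enumerate(cands):
            if i == j or not dominates(other, cand):
                continue
            # Identical candidates dominate each other; keep only the first.
            if dominates(cand, other) and i < j:
                continue
            redundant = True
            break
        if not redundant:
            kept.append(cand)
    return kept


candidates = [
    Candidate(q=100.0, c=5.0, w=0.0),
    Candidate(q=90.0, c=6.0, w=0.0),   # worse q and c than the first
    Candidate(q=120.0, c=7.0, w=1.0),  # better q, but costs more
    Candidate(q=100.0, c=5.0, w=0.0),  # exact duplicate of the first
]
survivors = prune_dominated(candidates)
```

Only the first and third candidates survive here: the second is dominated outright and the fourth is a duplicate. The surviving set is the cost-RAT tradeoff from which one solution is finally chosen.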
Fast Timing-Driven Buffering: Squeeze Pruning
• Three partial solutions α1, α2, α3 with the same cost
• If (q2 - q1)/(c2 - c1) ≤ (q3 - q2)/(c3 - c2), then α2 is pruned
• For a two-pin net, the middle point is always dominated by either the first or the third solution; for a multi-sink net, optimality is not guaranteed, but pruning causes no degradation in the solution most of the time

Fast Timing-Driven Buffering: Library Lookup
• Classically, every buffer in the library is examined at each iteration
• With m kinds of buffers and inverters and n nodes, there are mn candidate solutions in total
• However, many candidate solutions are not worth considering
• Pre-compute a buffer table and an inverter table
• Result: 2n candidate solutions, n with inverters and n with buffers

Fast Timing-Driven Buffering: Results and Summary
• Results derived from 5000 high-capacitance nets from an ASIC chip
• 3% quality degradation and 20x speedup
• Philosophy: as fast as possible, even if a little optimality is sacrificed
• Rip-up and rebuffering with more accurate techniques can be performed later if desired

Layout Aware Fast and Flexible Buffer Trees
Layout problems in buffering
[Figure panels: (a) Alley; (b) Pile-ups; Holes in large blocks]
Layout constraints
• Holes in large blocks
• Navigating blocks and dense regions
• Critical and non-critical routes
• Avoiding routing congestion

Layout Aware Fast and Flexible Buffer Trees: Layout aware buffer tree flow
• Step 1: Construct a fast timing-driven Steiner tree
• Step 2: Reroute the Steiner tree to preserve its topology while navigating environmental constraints
• Step 3: Insert buffers (e.g., with Fast Timing-Driven Buffering)
• This work focuses on Step 2

Layout Aware Fast and Flexible Buffer Trees: Algorithm
• Break the existing Steiner tree into disjoint 2-paths, i.e., paths that start and end with a source, a sink or a Steiner point
• Each 2-path is routed in turn to minimize cost, starting from the sinks and ending at the source
• Maze routing is used for each 2-path, with a cost function
• If a Steiner point is in a congested region, move it within a specified "plate region"

Layout Aware Fast and Flexible Buffer Trees: General maze routing cost function
• Tradeoff parameter: 0 ≤ K ≤ 1
• Tile cost: cost(t) = 1 + K e(t)
• Merging branches: cost(t) = max(cost(L), cost(R)) + K min(cost(L), cost(R))
• Sink initialization: cost(s) = (K - 1) RAT(s)/DpT
• Use K = 1 for electrical correction; use K = 0.1 for critical paths

Layout Aware Fast and Flexible Buffer Trees: Example and Summary
• A 7-pin net of an industrial design
• (a) K = 1.0: 4134 ps slack improvement
• (b) K = 0.1: 4646 ps slack improvement

Diffusion-Based Placement Techniques for Legalization
Classical legalization
• After optimization, local regions can be overfull
• Legalization is run periodically to snap cells from overlaps to legal positions
• If one waits too long between two legalizations, cells may end up quite far from their optimal positions, which may severely hurt timing
Diffusion-based legalization
• Avoids moving cells too far away
• Fast: runs in minutes on designs with millions of gates

Diffusion-Based Placement Techniques for Legalization: Diffusion as a Physical Process
• Moves elements from a state with non-zero potential energy to a state of equilibrium
• Can be modeled by breaking the process down into finite time steps
• Relationship of material concentration with time and space:

$$\frac{\partial d_{x,y}(t)}{\partial t} = \nabla^2 d_{x,y}(t)$$

• Cell velocity:

$$v^H_{x,y} = -\frac{\partial d_{x,y}(t)}{\partial x} \Big/ d_{x,y}(t), \qquad v^V_{x,y} = -\frac{\partial d_{x,y}(t)}{\partial y} \Big/ d_{x,y}(t)$$

• Cell new location:

$$x(t) = x(0) + \int_0^t v^H_{x(t'),y(t')}(t')\,dt', \qquad y(t) = y(0) + \int_0^t v^V_{x(t'),y(t')}(t')\,dt'$$

Diffusion-Based Placement Techniques for Legalization: Diffusion Based Placement
• Coordinates are scaled so that the width and height of each bin is one
• Location (x, y) lies in bin $(j, k) = (\lfloor x \rfloor, \lfloor y \rfloor)$
• Forward Time Centered Space (FTCS) scheme
• New bin density:

$$d_{j,k}(n+1) = d_{j,k}(n) + \frac{\Delta t}{2}\left(d_{j+1,k}(n) + d_{j-1,k}(n) - 2d_{j,k}(n)\right) + \frac{\Delta t}{2}\left(d_{j,k+1}(n) + d_{j,k-1}(n) - 2d_{j,k}(n)\right)$$

• Bin velocity:

$$v^H_{j,k}(n) = -\frac{d_{j+1,k}(n) - d_{j-1,k}(n)}{2\,d_{j,k}(n)}, \qquad v^V_{j,k}(n) = -\frac{d_{j,k+1}(n) - d_{j,k-1}(n)}{2\,d_{j,k}(n)}$$

Diffusion-Based Placement Techniques for Legalization: Diffusion Based Placement (continued)
• Enforce $v^V = 0$ at horizontal boundaries and $v^H = 0$ at vertical boundaries
• Two cells right next to each other can be assigned very different velocities, which could change their relative ordering. Apply velocity interpolation based on the four closest bins to remedy this behavior
• New locations (x, y) for the next time step:

$$x(n+1) = x(n) + v^H_{x(n),y(n)}\,\Delta t, \qquad y(n+1) = y(n) + v^V_{x(n),y(n)}\,\Delta t$$

Diffusion-Based Placement Techniques for Legalization: Getting it to work
• The diffusion process reaches equilibrium when each bin has the same density, i.e., the average density. This can cause unnecessary spreading, even if every bin's density is well below $d_{max}$
• Idea: run diffusion only for the regions that require it
• Local diffusion: run diffusion on cells in a window around bins that violate the target density constraint
• If the FTCS error exceeds a certain threshold, update the density based on the real cell placement and restart the diffusion algorithm

Diffusion-Based Placement Techniques for Legalization: Example
[Figure: before legalization, after traditional legalization, and after diffusion legalization]
• 4% total wirelength saving
• 48% worst slack improvement
• 36% fewer negative paths
Summary
• Diffusion-based legalization is less likely to disrupt the state of the design

Summary
• Buffering trends: "buffer explosion"
• Physical synthesis phases: 4 phases
• Fast Timing-Driven Buffering
• Layout Aware Buffer Trees
• Diffusion-Based Legalization

Thanks!
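Backup: the FTCS bin density update and the bin velocity from the Diffusion Based Placement slides can be sketched as below. This is a rough illustration only. The function names, the mirrored-edge boundary handling in `_clamped`, and the zero velocity returned for an empty bin are assumptions made for this sketch, and the four-bin velocity interpolation mentioned on the slides is left out.

```python
from typing import List, Tuple

Grid = List[List[float]]  # d[j][k]: density of bin (j, k); j runs along x


def _clamped(d: Grid, j: int, k: int) -> float:
    """Density lookup that mirrors the edge bin outside the grid (assumption)."""
    j = min(max(j, 0), len(d) - 1)
    k = min(max(k, 0), len(d[0]) - 1)
    return d[j][k]


def ftcs_step(d: Grid, dt: float) -> Grid:
    """One FTCS update: d(n+1) = d(n) + dt/2 * (horizontal and vertical differences)."""
    new = [row[:] for row in d]
    for j in range(len(d)):
        for k in range(len(d[0])):
            horiz = _clamped(d, j + 1, k) + _clamped(d, j - 1, k) - 2 * d[j][k]
            vert = _clamped(d, j, k + 1) + _clamped(d, j, k - 1) - 2 * d[j][k]
            new[j][k] = d[j][k] + dt / 2 * horiz + dt / 2 * vert
    return new


def bin_velocity(d: Grid, j: int, k: int) -> Tuple[float, float]:
    """Return (vH, vV) of bin (j, k), with zero velocity across the chip edge."""
    if d[j][k] == 0.0:
        return (0.0, 0.0)  # assumption: an empty bin has no cells to move
    on_vertical_boundary = j in (0, len(d) - 1)
    on_horizontal_boundary = k in (0, len(d[0]) - 1)
    vh = 0.0 if on_vertical_boundary else -(d[j + 1][k] - d[j - 1][k]) / (2 * d[j][k])
    vv = 0.0 if on_horizontal_boundary else -(d[j][k + 1] - d[j][k - 1]) / (2 * d[j][k])
    return (vh, vv)


# A dense corner bin spreads into its neighbours while total density is conserved.
after = ftcs_step([[4.0, 0.0], [0.0, 0.0]], dt=0.2)
# Cells in the middle bin drift toward the emptier side (positive x direction).
velocity = bin_velocity([[3.0], [2.0], [1.0]], 1, 0)
```

A cell at (x, y) would then be moved by the velocity of its bin times the time step. Mirroring the edge bins keeps the summed density constant from step to step, which makes the sketch easy to sanity-check.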
Q&A
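Backup: the squeeze pruning rule from the Fast Timing-Driven Buffering slides can be sketched as below. The function name `squeeze_prune` is hypothetical. The sketch assumes the candidates share the same cost, are already free of dominated solutions, and are sorted by strictly increasing c (and therefore increasing q). Applying the three-solution rule repeatedly until no triple qualifies is also an assumption of this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (q, c) of one same-cost candidate


def squeeze_prune(cands: List[Point]) -> List[Point]:
    """Drop every middle candidate a2 with (q2-q1)/(c2-c1) <= (q3-q2)/(c3-c2)."""
    kept: List[Point] = []
    for q3, c3 in cands:
        while len(kept) >= 2:
            (q1, c1), (q2, c2) = kept[-2], kept[-1]
            # Slope comparison, cross-multiplied; both denominators are
            # positive because c is strictly increasing.
            if (q2 - q1) * (c3 - c2) <= (q3 - q2) * (c2 - c1):
                kept.pop()  # a2 is squeezed out between a1 and a3
            else:
                break
        kept.append((q3, c3))
    return kept


pruned = squeeze_prune([(0.0, 1.0), (1.0, 2.0), (4.0, 3.0)])
untouched = squeeze_prune([(0.0, 1.0), (3.0, 2.0), (4.0, 3.0)])
```

In the first call the middle candidate gains little q for its extra capacitance compared with the third, so it is removed. In the second call the slopes decrease from left to right and all three candidates are kept.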