VLSI Physical Design Automation Placement (3) Prof. David Pan dpan@ece.utexas.edu Office: ACES 5.434 3/16/2016 1 Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods • Timing and congestion consideration during placement • Newer trends 3/16/2016 2 Timing Cost Critical Path Delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs. Interconnection delay becomes more and more important in deep sub-micron regime. 3/16/2016 3 Timing Analysis netlist with delay for each gate PI1 1 4 6 5 PO1 PI2 3 6 6 7 PO2 PI3 1 4 4 5 4 PO3 1 0 PI1 arrival times 1 0 3 PI2 0 PI3 1 3/16/2016 7 4 3 1 13 5 6 9 6 6 4 7 4 7 5 15 14 7 4 18 22 18 PO1 PO2 PO3 4 Timing Analysis 1/5 0/4 PI1 arrival time/required time 1 0/0 3 PI2 0/8 1 PI3 slack = required time arrival time 3/3 1/9 1 0 3 PI2 8 PI3 4 6 4 4 1 3/16/2016 8 5 6 6 7/15 5 15/15 14/18 7 4 18/22 22/22 18/22 PO1 PO2 PO3 7/13 2 4 0 13/15 9/9 4 4 PI1 7/9 2 5 6 0 6 6 4 8 4 6 5 0 4 7 4 4 0 4 PO1 PO2 PO3 5 Another example with interconnect delay – Same Timing Analysis 22 3 2 5 L A T C H 19 1 2 4 2 5 L A T C H 4 3 3/16/2016 1 1 4 1 5 2 6 Timing Driven Placement Approaches • Path-based – Most accurate information – Very slow • Budgeting – Inaccurate information – Hard to budget – Fast • Net-based approach – Net-weighting 3/16/2016 7 Net-Weighting • Basic approach – For more timing critical nets (i.e., smaller slack), assign higher net weights – Minimize w net _ length(i), i i where 1 wi Si 3/16/2016 8 Sensitivity Guided Netweighting for Placement Driven Synthesis H. Ren, D. Z. Pan and D.S. Kung ISPD-04 3/16/2016 9 Figure of Merit (FOM) • FOM is the total slack difference compared to a certain slack threshold for all timing end points. FOM tPo (Slk (t ) Slk ) Slk ( t ) Slkt t • Interpreted as the amount of work left for the physical synthesis engine or to the designers for manual fix. • FOM and WNS (worst negative slack) are the two most important metrics for timing closure in modern physical synthesis • However, FOM was not used to guide placement explicitly 3/16/2016 10 Sensitivity Definitions • Net length sensitivity to net weight S • Net delay sensitivity to net length L W S T L ΔL ΔW ΔT ΔL • Net slack sensitivity to net weight: S Slk W ΔT ΔT ΔL T L T S L SW SW ΔW ΔL ΔW • FOM sensitivity to net delay S FOM T ΔFOM ΔT • FOM sensitivity to net weight: S FOM W ΔFOM ΔFOM ΔT ΔW ΔT ΔW 3/16/2016 S FOM T SW T 11 Closed-Form Sensitivity • For net length to weight sensitivity, we have S L W L Wsrc Wsin k 2W WsrcWsin k • For delay to wire length sensitivity, we have ΔT S L ΔL rcL cRd rCl T • Use switch-level RC and Elmore delay to illustrate the concept • Good enough during placement • Can be extended to more accurate models 3/16/2016 12 FOM to Net Delay Sensitivity • Question: suppose the delay of net i is reduced by a small amount DT(i), what is the impact to FOM? • Define: K(i) to be the number of timing end points whose slack will change due to DT(i) • Then, we have the following Theorem FOM T S ΔFOM (i ) K (i) ΔT (i) 3/16/2016 13 K(i) Computation • Topologically sorted order from PO to PI • Only propagate K(i) to the most timing critical input pin (slack, K(i)) pair (-3, 2) A (-3, 2) (-3, 1) B (-1.2, 1) D C (-0.8, 0) (-0.8, 0) 3/16/2016 (-3, 1) (-3, 1) Po1 (-1.2, 1) Po2 14 Net Weight Generation • Put these sensitivities together and generate new net weight DW (i ) ( Slk t Slk (i )) S wSlk (i ) SWFOM (i ) Worg (i ) W (i ) Worg (i ) DW (i ) 3/16/2016 Slk (i ) Slk t Slk (i ) Slk t 15 Experiments • We compare the placement and physical synthesis results of three different algorithms on 7 industry chips (up to 444k movable objects) from IBM – WL: wire length driven placement with uniform weight – TS: timing driven placement using slack sensitivity – TSF: timing driven placement using both slack and FOM sensitivity 3/16/2016 16 Timing after Placement FOM Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average ZW -9134 0 -535 -322 -114 -142 -4 WL -41650 -6966 -13711 -8057 -28527 -20257 -452 TS -26093 -4102 -6468 -4024 -15334 -9417 -248 TSF -25602 -3454 -5595 -3440 -12229 -9536 -131 Improvement TS TSF 48% 49% 41% 50% 55% 62% 52% 60% 46% 57% 54% 53% 46% 72% 49% 58% TSF -4.254 -1.754 -3.788 -3.605 -2.002 -4.856 -0.432 Improvement TS TSF 63% 44% 37% 38% 30% 27% 55% 58% 34% 45% 0% 12% 37% 54% 37% 40% WNS Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 Average ZW -1.702 0.248 -0.55 -0.941 -0.102 -0.508 0.16 WL -6.274 -2.977 -4.997 -7.218 -3.575 -5.47 -1.135 TS -3.392 -1.784 -3.684 -3.736 -2.379 -5.484 -0.66 3/16/2016 17 Timing after Physical Synthesis Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 FOM WL TS -7829 -6086 -2059 -384 -1854 -405 -2537 -1844 -4732 -2726 -1481 -541 -94 -8 Average Design ckt1 ckt2 ckt3 ckt4 ckt5 ckt6 ckt7 WNS WL TS -0.834 -0.743 -0.705 -0.011 -0.701 -0.139 -2.156 -1.908 -0.472 -0.443 -0.36 -0.293 -0.097 0.182 Average 3/16/2016 TSF -5170 -631 -422 -1770 -1819 -266 0 Improvement TS TSF 22% 34% 81% 69% 78% 77% 27% 30% 42% 62% 63% 82% 91% 100% 58% 65% TSF -0.739 -0.073 -0.19 -1.9 -0.341 -0.351 0.283 Improvement TS TSF 11% 11% 98% 90% 80% 73% 12% 12% 6% 28% 19% 3% 100% 100% 47% 45% 18 Outline • Wire length driven placement • Main methods – Simulated Annealing – Partition-based methods – Analytical methods • Timing and congestion consideration • Newer trends 3/16/2016 19 Congestion Minimization • Traditional placement problem is to minimize interconnection length (wirelength) • A valid placement has to be routable • Congestion is important because it represents routability (lower congestion implies better routability) • There is not yet enough research work on the congestion minimization problem 3/16/2016 20 Definition of Congestion Routing demand = 3 Assume routing supply is 1, overflow = 3 - 1 = 2 on this edge. Overflow on each edge = Routing Demand - Routing Supply (if Routing Demand > Routing Supply) 0 (otherwise) Overflow = S overflow all edges 3/16/2016 21 Correlation between Wirelength and Congestion Total Wirelength = Total Routing Demand 3/16/2016 22 Wirelength Congestion A congestion minimized placement A wirelength minimized placement 3/16/2016 23 Congestion Map of a Wirelength Minimized Placement Congested Spots 3/16/2016 24 Congestion Reduction Postprocessing Reduce congestion globally by minimizing the traditional wirelength Post process the wirelength optimized placement using the congestion objective 3/16/2016 25 An Effective Congestion Driven Placement Framework André Rohe University of Bonn, Germany joint work with Ulrich Brenner ISPD 2002 (Best Paper) 3/16/2016 26 A dense Placement • good wirelength • impossible to route 3/16/2016 27 Possible Solution • easy to route • bad wirelength/timing 3/16/2016 28 Congestion Driven Placement • easy to route + good wirelength almost no extra computation efford ! 3/16/2016 29 Overall Algorithm: Bonn Place • Partitioning based approach • Solves QP in each level, followed by partitioning • Partitioning is done by quadrisection: circuits are partitioned with minimum movement (Vygen) 3/16/2016 30 Methods used for congestion driven placement • Very fast congestion calculation • Inflate circuits in congested regions • Spreading inflated cells 3/16/2016 31 Congestion calculation • Calculate Steiner Tree for each net • Probablitiy estimation for each 2-point connection (similar to Hung & Flynn, Lou et al.) 3/16/2016 32 Quality of congestion calculation congestion estimation 3/16/2016 33 Quality of congestion calculation Bonn Global HDP Global 3/16/2016 34 Inflation of circuits (used previously by Hou et al.) • Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by up to 100% • The inflation is based on the congestion in R and the surrounding regions & the pin density in R • Deflation is possible if the circuit is no longer critical. 3/16/2016 35 Placement Step 0 3/16/2016 36 Placement Step 1 3/16/2016 37 Placement Step 2 3/16/2016 38 Placement Step 3 3/16/2016 39 Placement Step 4 3/16/2016 40 Placement Step 5 3/16/2016 41 Placement Step 6 3/16/2016 42 Placement Step 7 3/16/2016 43 Spreading inflated cells • Repartitioning considers 2x2 windows in placement grid to optimize netlength • Use extra repartitioning step to move cells away from overloaded regions 3/16/2016 44 Summary: Algorithm overview 1. Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} 2. Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Repartitioning 3. Legalization 3/16/2016 45 Algorithm overview 1. Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c) 2. Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Repartitioning 3. Legalization 3/16/2016 46 Algorithm overview 1. Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c) 2. Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Repartitioning 3. Legalization 3/16/2016 47 Algorithm overview 1. Init: Set window_set := {chip area}, set circuit_list(chip area):={all circuits} For (each c in {all circuits}) Increase b(c) proportionally to |pins(c)|/size(c) # initial inflation b(c) 2. Main Loop: While (window size big enough) Solve a QP to minimize quadratic netlength For (each window w in window_set) Quadrisection(w) Compute congestion and update b(c) # update inflation b(c) Quadrisection(w) Reduce overloaded windows # extra repartitioning steps Repartitioning 3. Legalization 3/16/2016 48 Computational Results Standard Chip CPU Congestion Driven len CPU len Blow IBM 1 0:23 h 7.2 m 0:26 h 7.4 m 10.2 % IBM 2 0:26 h 7.9 m 0:27 h 9.0 m 6.6 % IBM 3 3:50 h 134 m 4:39 h 142 m 20.1 % IBM 4 7:08 h 241 m 7:24 h 270 m 20.2 % IBM 5 16:10 h 375 m 16:37 h 406 m 57.8 % +8.7 % +8.5% Mean 3/16/2016 49 Computational Results II Standard Chip HDP ov CPU Congestion Driven len HDP ov CPU len IBM 1 81.7 8374 0:15 h 9m 75.5 0 0:05 h 7.5 m IBM 2 82.7 7000 0:19 h 11.5 m 75.4 0 0:05 h 10.1 m IBM 3 88.8 78111 47:36 h 162 m 77.3 0 4:51 h 164 m IBM 4 82.8 972 7:18 h 324 m 75.2 0 2:48 h 326 m IBM 5 89.9 14382 70:57 h 512 m 84.2 0 29:48 h 527 m -73 % -5.2 % Mean -9 % 3/16/2016 50 Summary • In this module, we cover two important concepts during placement to consider besides wire length – Timing driven placement, using net-weighting • A new sensitivity based net weighting in ISPD’04 paper – Congestion minimization (using ISPD’02 as an example) • congestion estimation • Inflate cells in congested region • Spread inflated cells 3/16/2016 51