APlace: An Analytic Placer
Andrew B. Kahng and Qinke Wang
UCSD CSE Department
{abk, qiwang}@cs.ucsd.edu

Problem Formulation
• Minimize wirelength subject to the constraint that cells do not overlap
• Therefore the objective includes
  – Density objective: to spread cells
  – Wirelength objective: to minimize wirelength

Wirelength Formulation
• Placement objective: HPWL
• Smooth approximation [Naylor et al., US Patent 6301693, 2001]
  – log-sum-exp formula: picks out the most dominant terms among the pin coordinates (x_k, y_k) of each net e
    $$WL(\mathbf{x}, \mathbf{y}) = \alpha \sum_{e} \Big[ \ln \sum_{k \in e} \exp(x_k/\alpha) + \ln \sum_{k \in e} \exp(-x_k/\alpha) + \ln \sum_{k \in e} \exp(y_k/\alpha) + \ln \sum_{k \in e} \exp(-y_k/\alpha) \Big]$$
  – α: smoothing parameter
  – closer to HPWL as α → 0
  – precise
  – convex
  – continuously differentiable

Density Control
• Common strategy
  – divide the placement area into grids
  – equalize the total cell area in each grid
• Squared-deviation penalty of an uneven cell distribution
  – not smooth or differentiable
  – difficult to optimize

Cell Potential Function
• Bell-shaped cell potential function [Naylor et al., US Patent 6301693, 2001]
• Cell c has potential(c, g) with respect to grid point g:
  $$\mathrm{potential}(c, g) = C \cdot p(|x - x'|) \cdot p(|y - y'|)$$
  $$p(d) = \begin{cases} 1 - 2d^2/r^2, & 0 \le d \le r/2 \\ 2(r - d)^2/r^2, & r/2 < d \le r \\ 0, & d > r \end{cases}$$
  – cell c at (x, y) has area A
  – grid point g = (x', y')
  – p(d): bell-shaped function
  – r: the radius of a cell's potential
  – C: a proportionality factor, s.t. $\sum_g \mathrm{potential}(c, g) = A$
[Figure: p(d) plotted against d, with breakpoints at r/2 and r]

Implementation
• Cells are spread by minimizing the smooth density penalty function
  $$D(\mathbf{x}, \mathbf{y}) = \sum_g \big( P(g) - \bar{P} \big)^2, \qquad P(g) = \sum_c \mathrm{potential}(c, g)$$
  where $\bar{P}$ is the expected potential of a grid point
• APlace combines the two objectives above and minimizes the following function using a Conjugate Gradient optimizer:
  $$f(\mathbf{x}, \mathbf{y}) = w_{WL}\, WL(\mathbf{x}, \mathbf{y}) + w_D\, D(\mathbf{x}, \mathbf{y})$$
  – the density term drives cell spreading
  – the wirelength term draws connected components back toward each other

Wirelength vs. Density Objectives
• Objective: $w_{WL}\, WL + w_D\, D$
• Density weight $w_D$: fixed
  – if larger, cells spread out hastily without good wirelength
• Wirelength weight $w_{WL}$: variable
  – if larger, cells contract together and are prevented from spreading out
  – initially set to be large
  – repeat until all cells are spread out evenly:
    • execute the conjugate-gradient solver until convergence
    • reduce the weight by half

Descent Method
• Produce a minimizing sequence $x^{(k)}$, k = 1, …, where
  $$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$$
  – $t^{(k)} > 0$: step length
  – $\Delta x^{(k)}$: search direction, chosen such that $f(x^{(k+1)}) < f(x^{(k)})$
• From convexity we know that a descent direction must satisfy $\nabla f(x^{(k)})^T \Delta x^{(k)} < 0$

A General Descent Method
Given a starting point x ∈ dom f
Repeat
  1. Determine a descent direction Δx
  2. Line search: choose a step length t > 0
  3. Update: x := x + tΔx
Until the stopping criterion is satisfied
• It is called a line search because t determines where along the ray {x + tΔx | t ∈ R+} the next iterate will be
• In the gradient descent method
  – Δx := −∇f(x)

Algorithm
• Loop stopping criterion
  – a predetermined number of iterations is reached
  – the step length returned by the line search function is small enough
  – the function value is not changing significantly

FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model
Natarajan Viswanathan and Chris Chong-Nuen Chu
Iowa State University
International Symposium on Physical Design, April 19, 2004

FastPlace – Key Features
Efficient Analytical Placement using
  1. Cell Shifting
  2. Iterative Local Refinement
  3. Hybrid Net Model

Are Existing Algorithms Adequate?
• Solution quality: there may be significant room for improvement
  – for existing wirelength-driven placement algorithms: Cong et al. [ASPDAC 03] [ISPD 03]
  – for existing timing-driven placement algorithms: Cong et al. [ICCAD 03]
• Efficiency: it is important to have fast placement algorithms
  – circuit sizes are huge in modern designs
  – placement must be run in early design stages

Analytical Placement Formulation
Let
• $(x_i, y_i)$: coordinates of the center of cell i
• $w_{ij}$: weight of the net between cell i and cell j
• x, y: solution vectors
Cost of the net between cell i and cell j:
  $$\tfrac{1}{2} w_{ij} \big[ (x_i - x_j)^2 + (y_i - y_j)^2 \big]$$
Total cost:
  $$\tfrac{1}{2} \mathbf{x}^T Q \mathbf{x} + \mathbf{d}_x^T \mathbf{x} + \tfrac{1}{2} \mathbf{y}^T Q \mathbf{y} + \mathbf{d}_y^T \mathbf{y} + \mathrm{const}$$

Analytical Placement
Framework: repeat
  – solve the convex quadratic program
  – spread the cells
until the cells are evenly distributed

FastPlace Approach
Framework: repeat
  – solve the convex quadratic program
  – reduce wirelength by an iterative heuristic
  – spread the cells
until the cells are evenly distributed
Special features of FastPlace:
• Hybrid Net Model
  – speeds up solving of the convex QP
• Iterative Local Refinement
  – minimizes wirelength based on a linear objective
• Cell Shifting
  – easy-to-compute technique
  – enables fast convergence

Outline
FastPlace: Efficient Analytical Placement using
  1. Cell Shifting
  2. Iterative Local Refinement
  3. Hybrid Net Model

Spreading by Cell Shifting
• Quadratic placement should produce a good relative position of cells
• Simple shifting of cells should be able to produce a good placement
• Major difficulties:
  1. How to shift cells in a 2-D region?
  2. How to make sure wirelength will still be good?
• Our approach:
  1. Perform 1-D shifting in the x and y directions independently
  2. Interleave a small amount of shifting with quadratic placement

Cell Shifting
1. Shift the bin boundaries: uniform bin structure → non-uniform bin structure
2. Shift cells linearly within each bin
• Apply to all rows and all columns independently

Cell Shifting – Animation
[Figure: bins i and i+1 with utilizations U_i and U_{i+1}; old boundaries OB_{i-1}, OB_i, OB_{i+1}; the shared boundary moves to its new position NB_i and cells j, k, l shift accordingly]

Pseudo Pin and Pseudo Net
• Forces must be added to prevent cells from collapsing back during the next global optimization
• Done by adding pseudo pins and pseudo nets
• Only the diagonal and linear terms of the quadratic system need to be updated
• A single O(n)-time pass regenerates matrix Q (which is common to both the x and y problems)
[Figure: a pseudo net connects each cell to a pseudo pin, applying an additional force that pulls the cell from its original position toward its target position]

Iterative Local Refinement
• Iteratively go through all the cells one by one
• For each cell, consider moving it in four directions by a certain distance
• Compute a score for each direction based on
  – half-perimeter wirelength (HPWL) reduction
  – cell density at the source and destination regions
• Move in the direction with the highest positive score (do not move if no score is positive)
• The distance moved (H or V) decreases over iterations
• Detailed placement is handled by the same heuristic
[Figure: a cell with candidate moves of horizontal distance H and vertical distance V]
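Returning to APlace's Wirelength Formulation slide, the log-sum-exp approximation for a single net can be sketched in Python as follows. This is a minimal illustration, assuming pins are (x, y) tuples; `hpwl`, `_lse` and `lse_wirelength` are hypothetical helper names. The largest value is factored out before exponentiating so that small α does not overflow, and for a k-pin net the result lies between HPWL and HPWL + 4α ln k.

```python
import math

def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y) tuples."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def _lse(values, alpha):
    """alpha * ln(sum(exp(v / alpha))): a smooth maximum of `values`.
    The largest value is factored out so exp() cannot overflow for small alpha."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))

def lse_wirelength(pins, alpha):
    """Log-sum-exp approximation of HPWL for one net (smoothing parameter alpha)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # smooth max(x) + smooth max(-x) approximates max(x) - min(x); same for y
    return (_lse(xs, alpha) + _lse([-x for x in xs], alpha)
            + _lse(ys, alpha) + _lse([-y for y in ys], alpha))

pins = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
print(hpwl(pins))                      # 7.0
for a in (1.0, 0.1, 0.01):
    print(a, lse_wirelength(pins, a))  # approaches 7.0 from above as a shrinks
```

Shrinking α tightens the approximation but makes the function more sharply curved, which is the trade-off the smoothing parameter controls.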
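The bell-shaped potential and the density penalty from APlace's Cell Potential Function and Implementation slides might be sketched as below. Assumptions, labeled here and in the code: cells are (x, y, area) tuples, grid points are (x', y') tuples, C is computed per cell so that its potentials sum to its area, and the expected potential P̄ is taken to be the mean over all grid points (a uniform target). The function names are hypothetical.

```python
def bell(d, r):
    """Bell-shaped potential p(d) for distance d and potential radius r."""
    d = abs(d)
    if d <= r / 2:
        return 1 - 2 * d * d / (r * r)
    if d <= r:
        return 2 * (r - d) ** 2 / (r * r)
    return 0.0

def cell_potentials(cell, grid, r):
    """potential(c, g) = C * p(|x - x'|) * p(|y - y'|) for every grid point g.
    cell = (x, y, area); C is chosen so the potentials sum to the cell's area."""
    x, y, area = cell
    raw = [bell(x - gx, r) * bell(y - gy, r) for gx, gy in grid]
    total = sum(raw)
    scale = area / total if total > 0 else 0.0   # C (0 if no grid point lies within r)
    return [scale * v for v in raw]

def density_penalty(cells, grid, r):
    """D = sum_g (P(g) - P_bar)^2, with P_bar assumed to be the mean potential."""
    P = [0.0] * len(grid)
    for cell in cells:
        for k, v in enumerate(cell_potentials(cell, grid, r)):
            P[k] += v
    p_bar = sum(P) / len(P)
    return sum((p - p_bar) ** 2 for p in P)

grid = [(gx, gy) for gx in range(6) for gy in range(6)]
clumped = density_penalty([(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)], grid, r=2.0)
spread = density_penalty([(1.0, 1.0, 1.0), (4.0, 4.0, 1.0)], grid, r=2.0)
print(clumped > spread)   # True: overlapping cells are penalized more
```

Because p(d) and its derivative are continuous at r/2 and r, D is differentiable in the cell coordinates, unlike a penalty built from raw cell area per grid.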
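The General Descent Method slide, with Δx := −∇f(x) and the three loop stopping criteria from the Algorithm slide, could look like the sketch below. It uses a backtracking (Armijo) line search. Note that APlace itself uses a conjugate-gradient optimizer, so this plain gradient-descent loop is a simplified stand-in; the parameter names `armijo_c`, `shrink`, `min_step` and `min_change` and their defaults are illustrative assumptions.

```python
def gradient_descent(f, grad, x0, max_iter=1000, min_step=1e-10,
                     min_change=1e-12, armijo_c=0.3, shrink=0.5):
    """General descent method with delta_x = -grad f(x) and a backtracking line search.

    Stops when (1) max_iter iterations are reached, (2) the line-search step
    length drops below min_step, or (3) f changes by less than min_change.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        dx = [-gi for gi in g]                         # 1. descent direction
        slope = sum(gi * di for gi, di in zip(g, dx))  # grad f(x)^T dx, negative unless g = 0
        if slope == 0:
            break                                      # stationary point
        t = 1.0                                        # 2. backtracking line search
        while True:
            x_new = [xi + t * di for xi, di in zip(x, dx)]
            f_new = f(x_new)
            if f_new <= fx + armijo_c * t * slope or t < min_step:
                break
            t *= shrink
        if t < min_step:
            break                                      # step length too small
        f_old, x, fx = fx, x_new, f_new                # 3. update x := x + t dx
        if abs(f_old - fx) < min_change:
            break                                      # f no longer changing
    return x, fx

f = lambda v: (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2
grad = lambda v: [2 * (v[0] - 1), 20 * (v[1] + 2)]
x_min, f_min = gradient_descent(f, grad, [0.0, 0.0])   # converges near (1, -2)
```

Swapping the search direction for a conjugate-gradient direction changes only step 1; the line search and stopping logic stay the same.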
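For FastPlace's quadratic formulation and pseudo nets, the sketch below builds Q and d for one dimension from 2-pin connections, adds pseudo nets, and solves Qx = −d (the minimizer of ½xᵀQx + dᵀx). Assumptions: a dense matrix for readability, each pseudo pin placed at the cell's target position with a hypothetical weight, and plain (unpreconditioned) conjugate gradient in place of the ICCG solver the slides mention. `add_pseudo_nets` touches only Q[i][i] and d[i], matching the claim that only the diagonal and linear terms change.

```python
def build_system(n, nets, fixed_nets):
    """Build dense Q (n x n) and d for one dimension of the quadratic cost.

    nets:       list of (i, j, w), a 2-pin net between movable cells i and j
    fixed_nets: list of (i, p, w), a 2-pin net between movable cell i and a fixed pin at p
    """
    Q = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i, j, w in nets:              # 1/2 w (x_i - x_j)^2
        Q[i][i] += w
        Q[j][j] += w
        Q[i][j] -= w
        Q[j][i] -= w
    for i, p, w in fixed_nets:        # 1/2 w (x_i - p)^2 = 1/2 w x_i^2 - w p x_i + const
        Q[i][i] += w
        d[i] -= w * p
    return Q, d

def add_pseudo_nets(Q, d, targets, weight):
    """Pull cell i toward targets[i]; only Q[i][i] and d[i] are updated."""
    for i, t in enumerate(targets):
        Q[i][i] += weight
        d[i] -= weight * t

def conjugate_gradient(Q, b, tol=1e-10, max_iter=None):
    """Solve Q x = b for symmetric positive definite Q (plain, unpreconditioned CG)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual b - Q x, with x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter or 10 * n):
        if rs ** 0.5 < tol:
            break
        Qp = [sum(Q[i][k] * p[k] for k in range(n)) for i in range(n)]
        a = rs / sum(p[i] * Qp[i] for i in range(n))
        x = [x[i] + a * p[i] for i in range(n)]
        r = [r[i] - a * Qp[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# Two movable cells in a chain between fixed pins at x = 0 and x = 3
Q, d = build_system(2, nets=[(0, 1, 1.0)], fixed_nets=[(0, 0.0, 1.0), (1, 3.0, 1.0)])
print(conjugate_gradient(Q, [-v for v in d]))   # close to [1.0, 2.0]

# After cell shifting, pull the cells toward hypothetical targets 0.0 and 3.0
add_pseudo_nets(Q, d, targets=[0.0, 3.0], weight=1.0)
print(conjugate_gradient(Q, [-v for v in d]))   # close to [0.75, 2.25]
```

A production placer would store Q in a sparse format; the dense lists here only keep the example short.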
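A one-dimensional cell-shifting pass over a single row could look like the following sketch. The boundary-update rule is an assumption for illustration (each bin's new width is proportional to its utilization plus a constant `delta`, so crowded bins widen and sparse bins shrink), not necessarily FastPlace's exact formula. The second step, moving each cell linearly from its old bin to its new bin, follows the Cell Shifting slide. `shift_cells_1d` is a hypothetical name.

```python
def shift_cells_1d(positions, widths, lo, hi, num_bins, delta=1.5):
    """One round of 1-D cell shifting along a single row (or column).

    positions: cell-center coordinates; widths: cell sizes along this axis.
    Assumed boundary rule (illustrative, not necessarily FastPlace's formula):
    each bin's new width is proportional to (utilization + delta), so crowded
    bins widen and sparse bins shrink. Cells then move linearly within their bin.
    Returns (new positions, new bin boundaries).
    """
    bw = (hi - lo) / num_bins
    old = [lo + k * bw for k in range(num_bins + 1)]          # OB_0 .. OB_m (uniform)
    bin_of = [max(0, min(int((x - lo) / bw), num_bins - 1)) for x in positions]
    util = [0.0] * num_bins
    for b, w in zip(bin_of, widths):
        util[b] += w / bw                                     # U_i: occupied fraction
    weights = [u + delta for u in util]
    total = sum(weights)
    new = [lo]
    for wgt in weights:
        new.append(new[-1] + (hi - lo) * wgt / total)         # NB_0 .. NB_m
    new[-1] = hi                                              # absorb rounding drift
    shifted = []
    for x, b in zip(positions, bin_of):
        frac = (x - old[b]) / (old[b + 1] - old[b])           # relative spot in old bin
        shifted.append(new[b] + frac * (new[b + 1] - new[b]))
    return shifted, new

xs, boundaries = shift_cells_1d([0.2, 0.5, 0.8], [1.0, 1.0, 1.0], lo=0.0, hi=4.0, num_bins=4)
print(boundaries)   # first bin widened: about 0.0, 2.0, 2.67, 3.33, 4.0
print(xs)           # cells spread out: about 0.4, 1.0, 1.6
```

In a full placer this pass would be applied to every row and every column, and the shifted positions would serve as the pseudo-pin targets for the next QP solve. The linear mapping preserves the cells' relative order, which is what keeps the good relative placement from the quadratic solve.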
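The Iterative Local Refinement loop can be sketched greedily as below. The scoring is an assumption: HPWL reduction minus `density_weight` times a crude crowding term (cells in the destination bin minus the other cells in the source bin). Fixed pins are modeled as cells that are not listed in `movable`, and the move distance is halved after each round. All names are hypothetical.

```python
import math
from collections import defaultdict

def net_hpwl(net, pos):
    """HPWL of one net given pos: dict cell -> (x, y)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def iterative_local_refinement(pos, nets, movable, step, bin_size,
                               density_weight=1.0, rounds=4):
    """Greedy local refinement; updates pos in place and returns it.

    Assumed score of a move: HPWL reduction minus density_weight times
    (cells in the destination bin - other cells in the source bin).
    Cells not listed in `movable` (e.g. fixed pins) never move.
    """
    nets_of = defaultdict(list)
    for net in nets:
        for c in net:
            nets_of[c].append(net)

    def bin_of(p):
        return (math.floor(p[0] / bin_size), math.floor(p[1] / bin_size))

    for _ in range(rounds):
        occupancy = defaultdict(int)
        for p in pos.values():
            occupancy[bin_of(p)] += 1
        for c in movable:                       # go through the cells one by one
            old = pos[c]
            before = sum(net_hpwl(n, pos) for n in nets_of[c])
            best_score, best_pos = 0.0, None
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                cand = (old[0] + dx, old[1] + dy)
                pos[c] = cand
                gain = before - sum(net_hpwl(n, pos) for n in nets_of[c])
                src, dst = bin_of(old), bin_of(cand)
                crowding = 0 if src == dst else occupancy[dst] - (occupancy[src] - 1)
                score = gain - density_weight * crowding
                if score > best_score:          # keep only positive scores
                    best_score, best_pos = score, cand
            pos[c] = old
            if best_pos is not None:
                occupancy[bin_of(old)] -= 1
                occupancy[bin_of(best_pos)] += 1
                pos[c] = best_pos
        step /= 2                               # move distance shrinks each round
    return pos

pos = {"a": (5.0, 5.0), "p": (0.0, 5.0), "q": (2.0, 5.0)}   # p, q act as fixed pins
iterative_local_refinement(pos, nets=[["a", "p"], ["a", "q"]], movable=["a"],
                           step=2.0, bin_size=10.0)
print(pos["a"])   # (2.0, 5.0): a slides left until its HPWL stops improving
```

Because the score uses HPWL directly, this step optimizes the linear wirelength measure that the quadratic objective only approximates.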
Effect of Net Model on Runtime
• Each multi-pin net must be replaced by 2-pin nets
• Then the placement problem (even with pseudo nets) can be formulated as a convex QP:
  $$\tfrac{1}{2}\mathbf{x}^T Q \mathbf{x} + \mathbf{d}_x^T \mathbf{x} + \tfrac{1}{2}\mathbf{y}^T Q \mathbf{y} + \mathbf{d}_y^T \mathbf{y} + \mathrm{const}$$
• Solvable by any convex QP algorithm
  – we use Incomplete Cholesky Conjugate Gradient (ICCG)
  – runtime is proportional to the number of non-zero entries in Q
  – each non-zero entry in Q corresponds to one 2-pin net
• Traditionally, placers model each multi-pin net by a clique
  – high-degree nets generate a lot of 2-pin nets
  – this slows down convex QP algorithms significantly

Clique, Star and Hybrid Net Models
• The star model was introduced by Mo et al. [ICCAD-00] for macro placement
  – it introduces a star node even for 2-pin nets
  – it is not clear how the placement result will be affected
[Figure: the same net drawn as a clique model and as a star model with a star node]
Hybrid model:
  # pins      2       3       4     5     6     …
  Net model   Clique  Clique  Star  Star  Star  …

Equivalence of Clique and Star Models
• Lemma: By setting the net weights appropriately, the clique and star net models are equivalent.
• Proof: When the star node is at its equilibrium position, the force acting on each cell is the same for the clique and star net models.
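The hybrid net model and the equivalence lemma can be checked numerically with the sketch below. Following the weights in the lemma's figure, clique edges get γW and star edges get kγW for a k-pin net; the value γ = 1/(k − 1) is our assumption, since the slides do not define γ. `star_forces` places the star node at the mean of the pin coordinates, which is where its net force is zero. All function names are hypothetical.

```python
from itertools import combinations

def hybrid_two_pin_nets(nets, W=1.0):
    """Replace every net by weighted 2-pin connections (hybrid net model).

    2- and 3-pin nets -> clique, each edge weighted gamma*W;
    nets with >= 4 pins -> star around a new star node, each edge weighted k*gamma*W.
    gamma = 1/(k-1) is an assumption; the slides leave gamma unspecified.
    Returns a list of (a, b, weight); star nodes are named ("star", net_index).
    """
    edges = []
    for idx, net in enumerate(nets):
        k = len(net)
        gamma_w = W / (k - 1)
        if k <= 3:
            edges.extend((a, b, gamma_w) for a, b in combinations(net, 2))
        else:
            star = ("star", idx)
            edges.extend((c, star, k * gamma_w) for c in net)
    return edges

def clique_forces(xs, w):
    """Net force on each pin of a clique with edge weight w (one dimension)."""
    return [sum(w * (xj - xi) for xj in xs) for xi in xs]

def star_forces(xs, w):
    """Net force on each pin of a star with edge weight w, star node at equilibrium."""
    s = sum(xs) / len(xs)   # zero net force on the star node puts it at the mean
    return [w * (s - xi) for xi in xs]

nets = [["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d", "e", "f"]]
print(len(hybrid_two_pin_nets(nets)))   # 10 connections; a pure clique model needs 1 + 3 + 15 = 19

xs, k, gamma_w = [0.0, 1.0, 4.0, 7.0, 3.0], 5, 0.25
print(all(abs(a - b) < 1e-12 for a, b in zip(clique_forces(xs, gamma_w),
                                             star_forces(xs, k * gamma_w))))   # True
```

Both force expressions reduce to γW(Σ_j x_j − k·x_i), which is the algebra behind the lemma; the star model reaches it with k connections instead of k(k − 1)/2.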
[Figure: clique model (each 2-pin net has weight γW) vs. star model with a star node (each 2-pin net has weight kγW for a k-pin net)]

Experimental Setup
• ISPD-02 mixed-mode benchmark suite by IBM
  – macro blocks replaced by standard cells with width set to 4 × the average cell width
  – 10% whitespace
• FastPlace implemented in C
• Compared with:
  – MetaPl-Capo 8.8 in default mode
  – Dragon 2.2.3 in fixed-die mode
• All placers run on a 750 MHz Sun Sparc-2 machine

Placement Benchmark Statistics
Circuit  #Nodes  #Terminals   #Nets   #Pins  #Rows
ibm01     12506         246   14111   50566     96
ibm02     19342         259   19584   81199    109
ibm03     22853         283   27401   93573    121
ibm04     27220         287   31970  105859    136
ibm05     28146        1201   28446  126308    139
ibm06     32332         166   34826  128182    126
ibm07     45639         287   48117  175639    166
ibm08     51023         286   50513  204890    170
ibm09     53110         285   60902  222088    183
ibm10     68685         744   75196  297567    234
ibm11     70152         406   81454  280786    208
ibm12     70439         637   77240  317760    242
ibm13     83709         490   99666  357075    224
ibm14    147088         517  152772  546816    305
ibm15    161187         383  186608  715823    303
ibm16    182980         504  190048  778823    347
ibm17    184752         743  189581  860036    379
ibm18    210341         272  201920  819697    361

Clique Net Model vs. Hybrid Net Model
(# non-zero entries in Q for each model; speed-up of the hybrid model over the clique model)
Circuit   Clique   Hybrid  Clique/Hybrid  Speed-up
ibm01     109183    41164           2.65       1.5
ibm02     343409    70014           4.90       2.4
ibm03     206069    74680           2.76       1.4
ibm04     220423    84556           2.61       1.2
ibm05     349676   108282           3.23       1.3
ibm06     321308   106835           3.01       1.6
ibm07     373328   147009           2.54       1.3
ibm08     732550   173541           4.22       2.0
ibm09     478777   185102           2.59       1.4
ibm10     707969   251101           2.82       1.6
ibm11     508442   230865           2.20       1.2
ibm12     748371   270849           2.76       1.6
ibm13     744500   295048           2.52       1.5
ibm14    1125147   456474           2.46       1.3
ibm15    1751474   607289           2.88       1.4
ibm16    1923995   668491           2.88       1.3
ibm17    2235716   753507           2.97       1.4
ibm18    2221860   711702           3.12       1.4
Average        -        -           2.95       1.5

Half Perimeter Wirelength
[Figure: HPWL of Capo 8.8, Dragon 2.2.3 and FastPlace on ibm01 to ibm18]
Average wirelength ratio: FastPlace / Capo = 1.010; FastPlace / Dragon = 1.016
Runtime Comparison
(speed-up = baseline runtime / FastPlace runtime)
Circuit   Capo 8.8   Dragon 2.2.3   FastPlace   Capo/FP   Dragon/FP
ibm01     3 m 59 s      29 m 06 s        13 s     ×18.4      ×134.3
ibm02     7 m 15 s      31 m 13 s        33 s     ×13.2       ×56.8
ibm03     8 m 23 s      31 m 49 s        33 s     ×15.2       ×57.8
ibm04    10 m 46 s       1 h 05 m        39 s     ×16.6      ×100.0
ibm05    10 m 44 s       1 h 48 m        51 s     ×12.6      ×127.1
ibm06    12 m 08 s       1 h 21 m        45 s     ×16.2      ×108.0
ibm07    18 m 32 s       1 h 47 m    1 m 19 s     ×14.1       ×81.3
ibm08    19 m 53 s       4 h 30 m    1 m 33 s     ×12.8      ×174.2
ibm09    22 m 50 s       3 h 43 m    1 m 42 s     ×13.4      ×131.2
ibm10    29 m 04 s       3 h 19 m    2 m 25 s     ×12.0       ×82.3
ibm11    31 m 11 s       2 h 22 m    2 m 13 s     ×14.1       ×64.1
ibm12    30 m 41 s       3 h 48 m    2 m 23 s     ×12.9       ×95.7
ibm13    39 m 27 s       3 h 04 m    2 m 54 s     ×13.6       ×63.4
ibm14     1 h 12 m       7 h 37 m    5 m 34 s     ×12.9       ×82.1
ibm15     1 h 30 m      10 h 34 m    8 m 45 s     ×10.3       ×72.4
ibm16     1 h 31 m      12 h 06 m   10 m 52 s      ×8.4       ×66.8
ibm17     1 h 43 m      26 h 54 m   11 m 30 s      ×9.0      ×140.3
ibm18     1 h 44 m      23 h 39 m   12 m 21 s      ×8.4      ×114.9
Average          -              -           -     ×13.0       ×97.4

Summary
FastPlace: an efficient flat placement algorithm
• On average 13.0× faster than Capo
• On average 97.4× faster than Dragon
• Wirelength comparable to Capo and Dragon
Based on three techniques:
1. Cell Shifting
  – fast convergence
  – simple computation
2. Iterative Local Refinement
  – reduces wirelength based on the HPWL measure
3. Hybrid Net Model
  – 1.5× speed-up compared to the clique model
  – applicable to any analytical placement tool
Thank You!