Guiding Global Placement with Wire Density

Kalliopi Tsota, Cheng-Kok Koh and Venkataramanan Balakrishnan
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907–2035
{ktsota, chengkok, ragu}@ecn.purdue.edu

Abstract: This paper presents an efficient technique for estimating the routed wirelength during global placement using the wire density of each net. The proposed method identifies congested regions of the chip and incorporates a model of the routed wirelength into the objective function in order to relieve congestion in these regions. The method is integrated into an analytical placement framework, and a two-level structure improves the scalability of the placer and speeds up the algorithm. The proposed analytical placer achieves the best average routed wirelength reported so far on the IBM version 2 benchmark suite.

I. INTRODUCTION

The placement of cells on the chip area is a difficult task in VLSI physical design, and the minimization of total routed wirelength is one of its fundamental goals. Moreover, routability is a key measure of quality. A placed design that results in congested regions on the chip often leads to routing detours and may result in a higher routed wirelength. In the worst case, it may even prevent successful routing.

Several approaches have been used in the past to address the issue of routability. To reduce congestion, in [1] the author modeled the routing supply versus routing demand and incorporated the routing congestion into the objective function of simulated annealing. In [2], [3] the routability of a placement solution was improved by allocating white space to congested regions during global placement. In [4], [5] the congestion-driven placement was based on the inflation of cells that were located in the congested regions. In [6] the authors made use of linear programming to resolve the conflicts of multiple congested regions.
In [7], [8], Rent’s rule was used to estimate the routing demand. In [9] the cell placement problem was formulated as a minimal-cost maximum-flow problem. The algorithm proposed in [10] obtained a model for the wire distribution on the chip, using the wire density of the net.

This paper presents a routability-driven global placement flow. The global placement is performed in a two-level framework and a wirelength estimation technique is used to guide the entire global placement process. At the first level, the clusters are placed on the chip area using an effective global placement algorithm. Then, the clusters are assigned to the updated positions obtained from the placement algorithm, and every cell of the original netlist is located at the center of its corresponding cluster. At the second level, the cells of the original netlist are placed on the chip area to further reduce overlap. An efficient routed wirelength estimation method is used to determine the wirelength inside the regions of the chip and guide the placement at both levels. The main contributions of the paper are: (i) the estimation of wirelength in a region; and (ii) the balancing of the cell density and the wirelength across the placement regions.

978-1-4244-2820-5/08/$25.00 ©2008 IEEE

In this paper, the wire density of a net is defined as the ratio of the bounding-box half-perimeter to the bounding-box area of the net. If the bounding box of a net overlaps with a region, the net contributes to the total wirelength within the region an amount equal to the product of the wire density of the net and the overlap area between the region and the bounding box of the net. The proposed method makes use of the wire density of a net to estimate the wire congestion in different regions of the chip area during global placement. The objective is to reduce the wire congestion by modifying the cell density and, at the same time, distributing the wire uniformly inside the regions of the chip.
The regions characterized as congested are assigned a lower cell density, whereas the uncongested regions are allowed to have a higher cell density.

During the global placement phase, the placement area is divided into an array of uniform rectangular global bins. The wire distribution is independent of the number of wires that cross the boundary of the bin and is therefore not affected by the bin structure. At the same time, it is not easy to calculate the wirelength inside the bin, as there are nets that either have multiple pins or do not reside entirely inside the bin. The technique for estimating the wirelength inside the bin is based on the fact that the wirelength of a multi-pin net in the interior of a bin is determined from the overlap area between the two and the wire density of the net. The wirelength inside the bin is calculated and incorporated into the objective function. The wire density of the net therefore guides the placement process as the tool for estimating the wirelength. By penalizing the bins whose wirelength is higher or lower than the average, the formulation provides a placement of cells that does not contain congested regions and obtains a shorter average routed wirelength.

The placer, guided by wire density to estimate wirelength, obtains a shorter average routed wirelength than existing placement tools. The improvement is between 2.2% and 14.6% compared to RUDY [10], APlace [11], CHKS+WSA [12], mPL-R+WSA [13], mPL6 [14] and ROOSTER [15], for the circuits of the IBM version 2 benchmark suite.

The rest of the paper is organized as follows: The two-level framework of the algorithm is covered in Section II. Section III gives the analytical placement formulation. In Section IV, the estimation technique for the routed wirelength is analyzed. Section V includes the implementation details and Section VI shows the experimental results. Section VII concludes the paper.

II. A TWO-LEVEL FRAMEWORK

The placement algorithm presented in the paper is performed in a two-level framework. The first level deals with a netlist of clusters. Initially, the cluster locations are randomly assigned. The algorithm updates their locations to remove cluster overlap and uniformly distribute wirelength across the chip area. At the second level, the cells of the original netlist are assigned to the locations obtained from the first level (each cell is located at the center of its corresponding cluster), and the placement algorithm places the cells on the chip to further reduce cell overlap and uniformly distribute wirelength. The wirelength estimation technique is therefore incorporated in the global placement and applied at both levels of the two-level framework. The two-level framework of the placer can be easily extended to a multilevel framework.

III. GLOBAL PLACEMENT

The input to the global placer is a circuit netlist, where the vertices represent the cells and the hyperedges represent the nets in the circuit. The global placement can be applied to the original netlist, at the second level of the two-level framework, or to the netlist of clustered cells, at the first level of the two-level framework.

As the goal of global placement is to determine approximate locations of the cells, it is not necessary to enforce the non-overlap constraints strictly. Instead, a popular formulation is to evenly distribute the cell density throughout the entire chip. To facilitate such a formulation, the placement area is divided into uniform bins. Let $(x,y)$ denote the vector of the cell coordinates, $WL(x,y)$ the wirelength of the placement and $SD_g(x,y)$ the cell density inside bin $g$. If the goal is to equalize the cell area in every bin and $D$ is the average cell area per bin, then $SD_g(x,y) = D$ for every bin $g$. The constrained non-linear optimization problem formulated in APlace [11], [16] is

$$\min\; WL(x,y) \quad \text{subject to} \quad SD_g(x,y) = D, \;\; \forall g.$$
The problem is usually solved as a sequence of unconstrained optimization problems, in which the cell density of the bins is included in the objective function as a weighted penalty

$$\min\; WL(x,y) + \frac{1}{2u} \sum_{g} \left( SD_g(x,y) - D \right)^2, \qquad (1)$$

where $u$ monotonically decreases in the sequence.

The wirelength $WL$ is typically evaluated using the half-perimeter wirelength (HPWL), which is a convex function but not strictly convex. As such, it cannot be effectively minimized. Also, HPWL is not differentiable. Being neither strictly convex nor differentiable, the expression is not amenable to Newton-type methods. The log-sum-exp (LSE) function [17] is often used to capture the linear HPWL. It is a continuous, differentiable function and has been used extensively to approximate and smooth the HPWL.

This paper considers a different approximation function for the HPWL. The two-variable CHKS smoothing function is called recursively to approximate the maximum of the pin coordinates within a net. The negative of the same function, applied to the negated coordinates, approximates the minimum. The CHKS function, as presented in [12], [18], constitutes an alternative smoothing method for the HPWL, and it is as effective as the LSE approximation in terms of wirelength minimization.

A. Wirelength formulation

The half-perimeter wirelength of a net $e \in E$ is expressed as

$$HPWL_e = \max_{i \in e}\{x_i\} - \min_{i \in e}\{x_i\} + \max_{i \in e}\{y_i\} - \min_{i \in e}\{y_i\}. \qquad (2)$$

The total HPWL is

$$HPWL = \sum_{e \in E} HPWL_e.$$

The CHKS function [12] is used to smooth the two-variable maximum function:

$$CHKS(x_1, x_2) = \frac{\sqrt{(x_1 - x_2)^2 + \alpha^2} + x_1 + x_2}{2},$$

where $\alpha > 0$ is the smoothing parameter. The multi-variable maximum function, proposed in [18] and used in [12], is obtained by recursive calls to the two-variable maximum function

$$\max\{x\} = \max\{\max\{x^{(1)}\}, \max\{x^{(2)}\}\},$$

where $x^{(1)}, x^{(2)}$ is a disjoint partitioning of $x$. Then, $\min\{x\}$ is obtained from the approximation of $\max\{x\}$ as $\min\{x\} = -\max\{-x\}$.
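The recursive smoothing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`chks`, `smooth_max`, `smooth_min`, `smooth_hpwl`) are hypothetical, and splitting the coordinate list near its midpoint is just one possible disjoint partitioning.

```python
import math

def chks(x1, x2, alpha):
    """Two-variable CHKS smoothing of max(x1, x2), with alpha > 0."""
    return (math.sqrt((x1 - x2) ** 2 + alpha ** 2) + x1 + x2) / 2.0

def smooth_max(values, alpha):
    """Smoothed maximum of a list by recursive calls to chks.

    Assumption: the list is split near its midpoint, which is one
    possible disjoint partitioning of the coordinates.
    """
    if len(values) == 1:
        return values[0]
    mid = (len(values) + 1) // 2
    left = smooth_max(values[:mid], alpha)
    right = smooth_max(values[mid:], alpha)
    return chks(left, right, alpha)

def smooth_min(values, alpha):
    """Smoothed minimum, using min{x} = -max{-x}."""
    return -smooth_max([-v for v in values], alpha)

def smooth_hpwl(pins, alpha):
    """Smoothed HPWL of one net; pins is a list of (x, y) tuples."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (smooth_max(xs, alpha) - smooth_min(xs, alpha)
            + smooth_max(ys, alpha) - smooth_min(ys, alpha))
```

For a net with pins at (0, 0), (3, 4) and (1, 2), the exact HPWL is 7, and `smooth_hpwl` approaches that value as `alpha` shrinks. A larger `alpha` yields a smoother but less accurate function.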
This paper follows the definition given in [12]: for a convex function $f : \mathbb{R}^2 \to \mathbb{R}$ define, for all $1 \le i \le n-1$, the function $f_{i,i+1} : \mathbb{R}^n \to \mathbb{R}$ by $f_{i,i+1}(x) = f(x_i, x_{i+1})$, and, for all $1 \le i \le n$, the function $f_{i,i} : \mathbb{R}^n \to \mathbb{R}$ by $f_{i,i}(x) = x_i$. Moreover, for all $1 \le i < j \le n$ with $j - i + 1 > 2$, let $f_{i,j} : \mathbb{R}^n \to \mathbb{R}$ be defined by

$$f_{i,j}(x) = f\left(f_{i,k}(x), f_{k+1,j}(x)\right), \quad \text{where } k = \left\lfloor \frac{i+j}{2} \right\rfloor.$$

Therefore, the multi-variable maximum and minimum functions are smoothed and approximated by $f_{1,n}$ which, for the sake of simplicity, is denoted as $f$ in the rest of the paper. The CHKS function is convex and differentiable, and can therefore be handled by general optimization algorithms. Then, $HPWL_e$ in (2) is approximated as

$$HPWL_e = f_{x,e} + f_{y,e},$$

where $f_{x,e}$ approximates $\max_{i \in e}\{x_i\} - \min_{i \in e}\{x_i\}$ and is obtained by recursive calls to CHKS for the maximum and minimum x-coordinates of the pins of net $e$. In the same way, $f_{y,e}$ approximates $\max_{i \in e}\{y_i\} - \min_{i \in e}\{y_i\}$.

Fig. 1. The global bins g1 and g2 are shown in light gray. The bounding boxes of nets A and B overlap bins g1 and g2 respectively. The dark lines represent the routed wirelength of each net.

B. Cell density formulation

The cell potential is a rectangular function that is smoothed by the bell-shaped function, as in APlace [11]. Let $w_v$, $h_v$ be the width and height of cell $v$, and $w_g$, $h_g$ the width and height of bin $g$. If cell $v$ is located at a horizontal distance $d_x$ from the center of bin $g$, then the smoothed x-overlap between $g$ and $v$ is

$$p_x(g,v) = \begin{cases} 1 - a \cdot d_x^2 & \text{if } d_x \in \left[0, \frac{w_v + 2 w_g}{2}\right], \\ b \cdot \left(d_x - \frac{w_v + 4 w_g}{2}\right)^2 & \text{if } d_x \in \left[\frac{w_v + 2 w_g}{2}, \frac{w_v + 4 w_g}{2}\right], \\ 0 & \text{otherwise,} \end{cases}$$

where

$$a = \frac{4}{(w_v + 2 w_g)(w_v + 4 w_g)}, \qquad b = \frac{2}{w_g (w_v + 4 w_g)},$$

so that the function is continuous at $d_x = \frac{w_v}{2} + w_g$. Similarly, a smoothed cell potential function is defined for the y-direction.
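To make the piecewise definition concrete, the following Python sketch evaluates the bell-shaped x-potential for one cell and one bin. It is a minimal illustration of the formula as stated, not code from APlace or from the authors; the function name `bell_potential` and the use of the absolute value of the signed distance are assumptions.

```python
def bell_potential(dx, w_v, w_g):
    """Bell-shaped 1-D potential between a cell of width w_v and a bin of width w_g.

    dx is the horizontal distance between the cell center and the bin
    center; a signed value is accepted and its magnitude is used
    (an assumption of this sketch).
    """
    dx = abs(dx)
    inner = (w_v + 2.0 * w_g) / 2.0   # end of the concave piece
    outer = (w_v + 4.0 * w_g) / 2.0   # potential is zero beyond this distance
    a = 4.0 / ((w_v + 2.0 * w_g) * (w_v + 4.0 * w_g))
    b = 2.0 / (w_g * (w_v + 4.0 * w_g))
    if dx <= inner:
        return 1.0 - a * dx * dx
    if dx <= outer:
        return b * (dx - outer) ** 2
    return 0.0
```

With `w_v = 2` and `w_g = 4`, both pieces evaluate to 4/9 at `dx = 5`, which is the continuity condition at $d_x = w_v/2 + w_g$. The potential is 1 when the cell is centered in the bin and 0 once the distance exceeds $w_v/2 + 2 w_g$.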
The potential function $SD_g$ has the form

$$SD_g(x,y) = \sum_{v} C_v \cdot p_x(g,v) \cdot p_y(g,v),$$

where $C_v$ is a normalization factor such that

$$A_v = \sum_{g} C_v \cdot p_x(g,v) \cdot p_y(g,v).$$

In other words, each cell has a total area potential equal to its area.

IV. ESTIMATION OF ROUTED WIRELENGTH

A highly congested region in the placement may result in routing detours around the region and a larger routed wirelength. Congested areas may also degrade the performance of the router and even result in an unroutable design. In order to obtain a global placement that is eventually routable without increasing the runtime, it is important to efficiently evaluate the congestion and identify congested regions of the design. That rules out performing actual routing to determine wirelength and routing congestion, as routing itself is a difficult problem. During the global placement stage, the path that will be used by the router to route the nets remains unknown and the routing pattern has not yet been obtained. In this paper, the routed wirelength within a region of the chip is estimated based on the bounding box of each net that overlaps the region.

In order to identify the congested regions, it is necessary to determine the wirelength within each one of these regions. For a two-pin net, the router will connect the pins by following the shortest path between them. However, the path the router will follow in the case of a multi-pin net is not as easy to predict. There are possibly many paths of the same minimum length for the router to follow. Moreover, the routing paths of the other nets in the region influence the routing path of the net.

Let $Sd_g(x,y)$ denote the estimated routed wirelength inside bin $g$ and $avg_d(x,y)$ the average estimated routed wirelength over all bins. Then $D_g(x,y)$ is a function of the estimated wirelength inside bin $g$:

$$D_g(x,y) = \left(1 + \gamma \cdot \left(1 - \frac{Sd_g(x,y)}{avg_d(x,y)}\right)\right) \cdot D, \qquad (3)$$

where $\gamma$ is a user-defined constant that determines how much $D_g(x,y)$ deviates from the average $D$. The optimization problem (1) can now be rewritten as

$$\min\; HPWL(x,y) + \frac{1}{2u} \sum_{g} \left( SD_g(x,y) - D_g(x,y) \right)^2,$$

where the average cell density $D$ used in APlace has now been replaced by $D_g(x,y)$. From (3), if $Sd_g(x,y) = avg_d$, then $D_g(x,y) = D$. If $Sd_g(x,y) < avg_d$, the estimated routed wirelength inside bin $g$ is less than the average and the cell density allowed inside the bin is higher. Similarly, if $Sd_g(x,y) > avg_d$, the estimated routed wirelength is more than the average and the cell density allowed inside the bin is lower. Having estimated the wirelength inside each global bin, the bins that have high (low) congestion are assigned a lower (higher) cell density.

The wirelength of a net can be approximated by the HPWL. Since there is a strong correlation between the HPWL and the routed wirelength of a net, the routed wirelength of a net that lies entirely inside a global bin can be calculated satisfactorily. However, when a net is only partially inside the bin, the routed wirelength inside the bin cannot be determined using the HPWL of the net. There are many paths outside the bin that connect the pins of the net, and the router may not follow a path passing through the bin. An example is illustrated in Fig. 1, where the bounding box of net A partially overlaps the global bin g1 (shown in gray) and the router follows a path (shown with a solid line) that is located inside the bin. The bounding box of net B extends across the global bin g2, but for net B the routed path is located outside the bin. Either one of the two cases may occur and, therefore, the exact wirelength cannot be calculated in advance.
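The way the per-bin density target in (3) reacts to the estimated wirelength can be illustrated with a short Python sketch. It uses exact (unsmoothed) bounding-box overlaps and takes the wire density of a net as its HPWL divided by its bounding-box area. The function names, the `(x_lo, y_lo, x_hi, y_hi)` box layout, and the handling of zero-area bounding boxes are assumptions made for this illustration only.

```python
def overlap_1d(lo1, hi1, lo2, hi2):
    """Length of the overlap between intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))

def bin_wirelength(bin_box, net_boxes):
    """Estimated wirelength inside one bin.

    Each box is (x_lo, y_lo, x_hi, y_hi). Every net contributes its wire
    density (HPWL / bounding-box area) times its overlap area with the bin.
    """
    bx0, by0, bx1, by1 = bin_box
    total = 0.0
    for nx0, ny0, nx1, ny1 in net_boxes:
        w, h = nx1 - nx0, ny1 - ny0
        if w <= 0.0 or h <= 0.0:
            continue  # assumption: zero-area boxes are skipped in this exact sketch
        density = (w + h) / (w * h)
        area = overlap_1d(bx0, bx1, nx0, nx1) * overlap_1d(by0, by1, ny0, ny1)
        total += density * area
    return total

def target_densities(bin_wl, avg_density, gamma):
    """Per-bin density targets D_g = (1 + gamma * (1 - Sd_g / avg_d)) * D."""
    avg_wl = sum(bin_wl) / len(bin_wl)
    if avg_wl == 0.0:
        return [avg_density] * len(bin_wl)  # no wire anywhere: keep the average
    return [(1.0 + gamma * (1.0 - s / avg_wl)) * avg_density for s in bin_wl]
```

For two bins `(0, 0, 10, 10)` and `(10, 0, 20, 10)` and a single net with bounding box `(0, 0, 4, 2)`, the estimated wirelengths are 6 and 0. With `gamma = 0.5` and `D = 1`, the targets become 0.5 for the bin that holds the net and 1.5 for the empty bin, so the mean target remains `D`.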
The wire density of a net is defined as

$$d_n(x,y) = \frac{HPWL_n}{Area_n}, \qquad (4)$$

where

$$HPWL_n = \max_{i \in e}\{x_i\} - \min_{i \in e}\{x_i\} + \max_{i \in e}\{y_i\} - \min_{i \in e}\{y_i\}$$

is the HPWL of the bounding box of the net $e$, and

$$Area_n = \left(\max_{i \in e}\{x_i\} - \min_{i \in e}\{x_i\}\right) \cdot \left(\max_{i \in e}\{y_i\} - \min_{i \in e}\{y_i\}\right)$$

is the area of the bounding box of the net.

The algorithm proposed in [10] estimates the routing demand $D^{dem}_{rout}(x,y)$ as the superposition of the rectangle functions of all nets, weighted by the wire density of each net. The rectangle function of a net is equal to one for $(x,y)$ inside the bounding box of the net and equal to zero otherwise. The algorithm adapts the demand to the supply at each position; essentially, the algorithm seeks a placement solution where the wire density is uniform across the chip. In this paper, however, the formulation uses the wire density of the net only to determine the wirelength inside a global bin and, therefore, it is the wirelength within the bin that is incorporated into the objective function, not the wire density. In addition, the optimization is independent of the shape of the bounding box of the net and the wirelength is independent of the routing model following the placement.

Taking into consideration the formula for the wire density of a net (4), it is possible to estimate the routed wirelength inside bin $g$ for every possible configuration. The estimated routed wirelength is

$$d_{g,e}(x,y) = d_n(x,y) \cdot A_{g,e}(x,y),$$

where $d_n(x,y)$ is the wire density of net $e$ and $A_{g,e}(x,y)$ is the overlap area between bin $g$ and the bounding box of net $e$. The total wirelength inside bin $g$ is the sum of the wirelength of each net which overlaps the bin either fully or partially. Thus

$$Sd_g(x,y) = \sum_{e} d_{g,e}(x,y). \qquad (5)$$

Fig. 2. The global bins are shown in light gray (the bin under consideration in darker gray) and the bounding box of the net which overlaps with the global bin is shown in black.
Five out of the many different possible configurations are depicted here.

The average bin wirelength is

$$avg_d(x,y) = avg_g\{Sd_g(x,y)\}.$$

A. Function smoothing

Following the analysis performed in Section III, let $f_{x,e}$ be the smoothed function that approximates $\max_{i \in e}\{x_i\} - \min_{i \in e}\{x_i\}$ and $f_{y,e}$ be the smoothed function that approximates $\max_{i \in e}\{y_i\} - \min_{i \in e}\{y_i\}$. The wire density of a net is written as

$$d_{n,e}(x,y) = \frac{f_{x,e}(x,y) + f_{y,e}(x,y)}{f_{x,e}(x,y) \cdot f_{y,e}(x,y)}.$$

The total wirelength within bin $g$ is

$$Sd_g(x,y) = \sum_{e} \left( d_{n,e}(x,y) \cdot A_{g,e}(x,y) \right).$$

Estimating the wirelength inside the bin using the wire density of a net requires determining the overlap area $A_{g,e}(x,y)$. As shown in Fig. 2, there are different ways in which the bounding box of a net and the bin overlap. For example, when the bounding box of the net is located partially inside and on the right side of the bin, as illustrated in Fig. 2, the overlap area between the specified net and the bin is calculated as

$$A_{g,e} = (\text{horiz. overlap}) \cdot (\text{vertical overlap}),$$

where

$$\text{horiz. overlap} = CHKS\left(x_{right}, \min_{i \in e} x_i\right) + CHKS\left(-x_{right}, -\min_{i \in e} x_i\right),$$

$$\text{vertical overlap} = CHKS\left(\max_{i \in e} y_i, \min_{i \in e} y_i\right) + CHKS\left(\max_{i \in e} -y_i, \min_{i \in e} -y_i\right),$$

and $(x_{left}, y_{bot})$, $(x_{right}, y_{top})$ are the coordinates of the bin. To smooth the function of the estimated routed wirelength inside bin $g$, $Sd_g(x,y)$, we use the recursive CHKS function. Therefore

$$D_g(x,y) = \left(1 + \gamma \cdot \left(1 - \frac{Sd_g(x,y)}{avg_d(x,y)}\right)\right) \cdot D,$$

where

$$Sd_g(x,y) = \sum_{e} \left( \frac{f_{x,e}(x,y) + f_{y,e}(x,y)}{f_{x,e}(x,y) \cdot f_{y,e}(x,y)} \cdot A_{g,e}(x,y) \right).$$

Also,

$$avg_d(x,y) = avg_g\left\{ \sum_{e} \left( \frac{f_{x,e}(x,y) + f_{y,e}(x,y)}{f_{x,e}(x,y) \cdot f_{y,e}(x,y)} \cdot A_{g,e}(x,y) \right) \right\}.$$

Most Newton-like optimization solvers require evaluations of the function value $f(x)$ and its gradient $g(x) = \left[\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\right]^T$. The gradient of $f(x)$ is calculated using

$$\frac{\partial f_{i,j}(x)}{\partial x_l} = \frac{\partial f_{i,j}(x)}{\partial f_{i,k}} \cdot \frac{\partial f_{i,k}(x)}{\partial x_l} + \frac{\partial f_{i,j}(x)}{\partial f_{k+1,j}} \cdot \frac{\partial f_{k+1,j}(x)}{\partial x_l},$$

where $k = \left\lfloor \frac{i+j}{2} \right\rfloor$ and $1 \le l \le j$. For $i \le l \le k$ and CHKS smoothing,
$$\frac{\partial f_{i,j}(x)}{\partial x_l} = \frac{1}{2} \cdot \frac{\partial f_{i,k}(x)}{\partial x_l} + \frac{1}{2} \cdot \frac{f_{i,k}(x) - f_{k+1,j}(x)}{\sqrt{\left(f_{i,k}(x) - f_{k+1,j}(x)\right)^2 + \alpha^2}} \cdot \frac{\partial f_{i,k}(x)}{\partial x_l}.$$

To solve the unconstrained optimization problem, one needs to calculate the gradients of $D_g$ and $d_n$. The gradient of $D_g$ is calculated as

$$\frac{\partial D_g(x,y)}{\partial x_l} = -\gamma \cdot D \cdot \frac{\partial Sd_g(x,y) / \partial x_l}{\partial max_d(x,y) / \partial x_l}, \qquad \frac{\partial D_g(x,y)}{\partial y_l} = -\gamma \cdot D \cdot \frac{\partial Sd_g(x,y) / \partial y_l}{\partial max_d(x,y) / \partial y_l}.$$

The gradient of $d_n(x,y)$ is calculated as

$$\frac{\partial}{\partial x_l} \left( \frac{f_{x,e}(x,y) + f_{y,e}(x,y)}{f_{x,e}(x,y) \cdot f_{y,e}(x,y)} \right) = -\frac{\partial f_{x,e}(x,y) / \partial x_l}{f_{x,e}^2(x,y)},$$

$$\frac{\partial}{\partial y_l} \left( \frac{f_{x,e}(x,y) + f_{y,e}(x,y)}{f_{x,e}(x,y) \cdot f_{y,e}(x,y)} \right) = -\frac{\partial f_{y,e}(x,y) / \partial y_l}{f_{y,e}^2(x,y)}.$$

V. IMPLEMENTATION

The placement algorithm is applied in a two-level framework as explained in Section II. The global placer places the clusters at the first level and the cells at the second level. For this work, the clusters were obtained from the K-way partitioning of the original circuit netlist using the publicly available software hMetis [19]–[21]. The hMetis algorithm performs K-way partitioning of the circuit netlist with the objective of reducing the number of edge cuts among the partitions. The hybrid first-choice scheme in hMetis is used for the grouping of the cells. The partitioning of the netlist is obtained after N = 5 runs of the hMetis algorithm, and the partition with the smallest edge cut is selected. The original netlist is grouped into a number of clusters such that the number of cells inside each cluster is between 1% and 5% of the number of cells in the netlist. Each cluster is shaped as a square, with its area being equal to the sum of the areas of the cells that belong to the cluster. The cells are confined inside the area of their corresponding cluster, and the input to the placement algorithm is the partitioning of the circuit netlist. At both levels of the framework, the routed wirelength estimation technique guides the placement process.
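The cluster geometry and the hand-off between the two levels can be sketched as follows. This is a small illustration under assumptions rather than the authors' code: the function names are hypothetical, and the cluster assignment is taken as a plain list of cluster indices, such as a partitioner might produce.

```python
import math

def cluster_side(cell_areas):
    """Side of the square cluster whose area equals the sum of its cell areas."""
    return math.sqrt(sum(cell_areas))

def seed_cells_at_cluster_centers(cluster_of_cell, cluster_centers):
    """Starting point for the second level.

    cluster_of_cell[i] is the cluster index of cell i (assumed input format);
    each cell is placed at the center of its cluster.
    """
    return [cluster_centers[c] for c in cluster_of_cell]
```

For instance, a cluster holding cells of area 4, 5 and 7 becomes a 4 x 4 square. After the first level places the clusters, every cell starts the second level at the center of its cluster.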
The unconstrained optimization problem is solved using L-BFGS-B, a publicly available quasi-Newton solver with bound constraints, as described in [22]. The solver is used in its default mode. The two-level framework includes a multi-grid structure similar to the one in [12] and the same stopping criterion $\sum_g (SD_g - D_g)^2$. The initial weight of the density penalty is

$$u = \frac{1}{2} \cdot \frac{\sum_{x_i, y_j} \sum_{g} |SD_g - D_g| \cdot \left( \left|\frac{\partial SD_g}{\partial x_i}\right| + \left|\frac{\partial SD_g}{\partial y_j}\right| \right)}{\sum_{x_i, y_j} \left( \left|\frac{\partial WL}{\partial x_i}\right| + \left|\frac{\partial WL}{\partial y_j}\right| \right)} + \frac{1}{2} \cdot \frac{\sum_{x_i, y_j} \sum_{g} |SD_g - D_g| \cdot \left( \left|\frac{\partial D_g}{\partial x_i}\right| + \left|\frac{\partial D_g}{\partial y_j}\right| \right)}{\sum_{x_i, y_j} \left( \left|\frac{\partial WL}{\partial x_i}\right| + \left|\frac{\partial WL}{\partial y_j}\right| \right)}. \qquad (6)$$

After the initialization, $u$ is halved in every iteration. At the end of the global placement, the detailed placer mPL-R+WSA [13] is used to legalize the placement.

VI. EXPERIMENTAL RESULTS

The results are based on sixteen IBM version 2 easy and hard benchmarks and show the impact of routability-driven standard-cell placement on the routed wirelength. All experiments were performed on a 3GHz Pentium 4 CPU with 3GHz memory. Table I includes the routed wirelengths, as well as the CPU times for placement, compared to mPL-R+WSA [13], mPL6 [14], CHKS+WSA [12], RUDY [10], ROOSTER [15] and APlace [11]. For each placement tool, the first column, labeled “r-WL”, shows the routed wirelength on each benchmark after routing the placed design using Cadence WRoute. The second column, labeled “p-CPU”, shows the CPU times for placement. The placement solutions of the proposed approach were routed with Cadence WRoute 5.3 in its default configuration. The routed wirelengths of circuits placed with mPL-R+WSA, mPL6 and CHKS+WSA are as reported in [12] (routed with Cadence WRoute 5.3), and the routed wirelengths of circuits placed with RUDY, ROOSTER and APlace (routed with Cadence WRoute 2.3.32) are as reported in [10].
The placement solutions obtained from the proposed approach reduce the total routed wirelength by 14.56%, 5.48%, 5.55%, 2.17%, 9.68% and 8.53%, compared to mPL-R+WSA, mPL6, CHKS+WSA, RUDY, ROOSTER and APlace respectively. Compared to RUDY, our algorithm obtained a shorter routed wirelength in 12 out of the 16 benchmarks. In Table I, the row labeled “%improv.” shows the improvements in the total routed wirelength obtained by our placer over those of the other placement tools.

The placement time of every placement tool is defined as the total CPU time to perform global/detailed placement and legalization. The placement CPU times for mPL-R+WSA, mPL6, CHKS+WSA and APlace are obtained from our runs of the respective executables. For our placement algorithm, the placement CPU times also include the time to run hMetis. The proposed algorithm reduces the placement CPU time by 52.13%, 8.29%, 45.61% and 63.50% compared to mPL-R+WSA, mPL6, CHKS+WSA and APlace respectively. The CPU times for ROOSTER were not reported in [10] and are, therefore, not included in Table I. The CPU time for RUDY was obtained on a 2.2 GHz AMD Athlon Opteron 248 machine in [10]; therefore, no direct comparison with the runtime of RUDY and ROOSTER can be made at this point.

VII. CONCLUSION

The paper proposes a global placement flow guided by an efficient routed wirelength estimation method. The overlap between the bounding box of a net and a global bin, weighted by the wire density of the net, estimates the wirelength within the corresponding global bin and overcomes the difficulty of determining the wirelength inside a bin for multi-pin nets. At the same time, the two-level framework reduces the total runtime of the placer. The result is an algorithm that obtains the best average routed wirelengths in the IBM version 2 benchmark suite.
TABLE I
ROUTABILITY RESULTS ON IBM VERSION 2 STANDARD-CELL PLACEMENT BENCHMARKS

| Circuit | Proposed r-WL (m) | Proposed p-CPU (s) | mPL-R+WSA r-WL [12] (m) | mPL-R+WSA p-CPU (s) | mPL6 r-WL [12] (m) | mPL6 p-CPU (s) | CHKS+WSA r-WL [12] (m) | CHKS+WSA p-CPU (s) | RUDY r-WL [10] (m) | RUDY p-CPU [10] (s) | ROOSTER r-WL [10] (m) | APlace r-WL [10] (m) | APlace p-CPU (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ibm01e | 0.64 | 132 | 0.77 | 216 | 0.76 | 109 | 0.78 | 233 | 0.68 | 34 | 0.72 | 0.79 | 292 |
| ibm01h | 0.64 | 117 | 0.74 | 221 | 0.76 | 108 | 0.79 | 223 | 0.68 | 33 | 0.73 | 0.73 | 181 |
| ibm02e | 1.74 | 249 | 1.88 | 491 | 1.79 | 234 | 1.83 | 399 | 1.85 | 67 | 2.00 | 1.85 | 355 |
| ibm02h | 1.77 | 247 | 1.98 | 513 | 1.90 | 217 | 1.91 | 394 | 1.99 | 68 | 1.98 | 1.97 | 342 |
| ibm07e | 3.59 | 536 | 4.26 | 911 | 3.65 | 515 | 3.76 | 819 | 3.60 | 176 | 3.95 | 3.98 | 1324 |
| ibm07h | 3.69 | 496 | 4.36 | 897 | 4.68 | 503 | 3.70 | 859 | 3.63 | 183 | 4.09 | 4.14 | 1419 |
| ibm08e | 3.91 | 614 | 4.54 | 1396 | 4.11 | 630 | 4.12 | 1042 | 4.04 | 223 | 4.23 | 3.96 | 1485 |
| ibm08h | 3.87 | 574 | 4.45 | 1386 | 4.05 | 599 | 4.05 | 1088 | 3.96 | 220 | 4.24 | 3.96 | 1667 |
| ibm09e | 2.95 | 572 | 3.48 | 1105 | 3.19 | 606 | 3.12 | 1213 | 2.90 | 216 | 3.20 | 3.10 | 1452 |
| ibm09h | 2.96 | 616 | 3.61 | 1034 | 3.11 | 571 | 3.07 | 1047 | 2.92 | 211 | 3.21 | 3.10 | 1612 |
| ibm10e | 5.83 | 874 | 6.78 | 1723 | 6.10 | 876 | 6.44 | 1390 | 5.81 | 307 | 6.42 | 6.18 | 1535 |
| ibm10h | 5.71 | 793 | 6.69 | 1697 | 5.96 | 839 | 5.81 | 1292 | 5.78 | 314 | 6.54 | 6.17 | 4550 |
| ibm11e | 4.31 | 737 | 5.10 | 1304 | 4.56 | 802 | 4.46 | 1368 | 4.41 | 301 | 4.75 | 4.76 | 2277 |
| ibm11h | 4.33 | 689 | 5.11 | 1283 | 4.46 | 793 | 4.39 | 1257 | 4.40 | 308 | 4.72 | 4.82 | 2464 |
| ibm12e | 8.33 | 958 | 10.43 | 2104 | 8.77 | 1025 | 8.65 | 1606 | 8.43 | 341 | 9.33 | 8.60 | 4241 |
| ibm12h | 8.23 | 1056 | 10.03 | 2022 | 8.62 | 970 | 8.28 | 1428 | 8.47 | 345 | 9.28 | 8.81 | 4274 |
| %improv. | | | 14.56% | 48.16% | 5.48% | 0.41% | 5.55% | 41.14% | 2.17% | N/A | 9.68% | 8.53% | 60.49% |

ACKNOWLEDGMENT

This research was supported in part by SRC (Task ID 1822.001).

REFERENCES

[1] C.-L. E. Cheng, “RISA: Accurate and efficient placement routability modeling,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, Nov. 1994, pp. 690–695.
[2] M. Wang, X. Yang, and M. Sarrafzadeh, “Congestion minimization during placement,” IEEE Trans. Computer-Aided Design, vol. 19, pp. 1140–1148, Oct. 2000.
[3] J. Lou, T. S. Krishnamoorthy, and H. S.
Sheng, “Estimating routing congestion using probabilistic analysis,” in Proc. ACM/SIGDA International Symposium on Physical Design, Sonoma, California, Jan. 2001, pp. 112–117.
[4] W. Hou, H. Yu, X. Hong, Y. Cai, W. Wu, J. Gu, and W. Kao, “A new congestion-driven placement algorithm based on cell inflation,” in Proc. Asia and South Pacific Design Automation Conference, Yokohama, Japan, Jan. 2001, pp. 605–608.
[5] U. Brenner and A. Rohe, “An effective congestion-driven placement framework,” in Proc. ACM/SIGDA International Symposium on Physical Design, San Diego, California, Jan. 2002, pp. 6–11.
[6] Z. Li, W. Wu, and X. Hong, “Congestion-driven incremental placement algorithm for standard cell layout,” in Proc. IEEE/ACM Asia and South Pacific Design Automation Conference, Kitakyushu, Japan, Jan. 2003, pp. 723–728.
[7] X. Yang, R. Kashner, and M. Sarrafzadeh, “Congestion estimation during top-down placement,” IEEE Trans. Computer-Aided Design, vol. 21, pp. 72–80, Jan. 2002.
[8] B. Hu and M. Marek-Sadowska, “Congestion minimization during placement without estimation,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, Nov. 2002, pp. 739–745.
[9] M. Wang and M. Sarrafzadeh, “Modeling and minimization of routing congestion,” in Proc. IEEE/ACM Asia and South Pacific Design Automation Conference, Yokohama, Japan, Jan. 2000, pp. 185–290.
[10] P. Spindler and F. Johannes, “Fast and accurate routing demand estimation for efficient routability-driven placement,” in Proc. Design Automation and Test in Europe Conference, Nice, France, Apr. 2007, pp. 1–6.
[11] A. B. Kahng and Q. Wang, “Implementation and extensibility of an analytic placer,” IEEE Trans. Computer-Aided Design, vol. 24, pp. 734–747, May 2005.
[12] C. Li and C.-K. Koh, “Recursive function smoothing of half-perimeter wirelength for analytical placement,” in Proc. International Symposium on Quality Electronic Design, San Jose, California, Mar. 2007, pp. 829–834.
[13] C.
Li, M. Xie, C.-K. Koh, J. Cong, and P. H. Madden, “Routability-driven placement and white space allocation,” in Proc. International Conference on Computer-Aided Design, San Jose, California, Nov. 2004, pp. 394–401.
[14] T. Chan, J. Cong, M. Romesis, J. R. Shinnerl, K. Sze, and M. Xie, “mPL6: A robust multilevel mixed-size placement engine,” in Proc. ACM/SIGDA International Symposium on Physical Design, San Francisco, California, Apr. 2005, pp. 227–229.
[15] J. A. Roy and I. L. Markov, “Seeing the forest and the trees: Steiner wirelength optimization in placement,” IEEE Trans. Computer-Aided Design, vol. 26, pp. 632–644, Apr. 2007.
[16] A. B. Kahng, S. Reda, and Q. Wang, “Architecture and details of a high-quality, large-scale analytical placer,” in Proc. IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, Nov. 2005, pp. 891–898.
[17] W. Naylor, R. Donelly, and L. Sha, “Non-linear optimization system and method for wire length and delay optimization for an automatic electric circuit placer,” U.S. Patent 6 301 693, Oct. 09, 2001.
[18] S. Birbil, S. Fang, H. Frenk, and S. Zhang, “Recursive approximation of the high-dimensional max function,” Department of Systems Engineering and Engineering Management SEEM, The Chinese University of Hong Kong, HK, Tech. Rep. SEEM 2002-12, Dec. 2002.
[19] G. Karypis and V. Kumar, “Multilevel K-way hypergraph partitioning,” IEEE Trans. VLSI Syst., vol. 11, pp. 285–300, Jan. 2000.
[20] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel K-way hypergraph partitioning: Applications in VLSI domain,” IEEE Trans. VLSI Syst., vol. 7, pp. 69–79, Mar. 1999.
[21] N. Selvakkumaran and G. Karypis, “Multi-objective hypergraph partitioning algorithms for cut and maximum subdomain degree minimization,” IEEE Trans. Computer-Aided Design, vol. 25, pp. 504–517, Mar. 2006.
[22] C. Zhu, R. Byrd, and J.
Nocedal, “L-BFGS-B, FORTRAN routines for large-scale bound constrained optimization,” ACM Transactions on Mathematical Software, vol. 23, pp. 550–560, Dec. 1997.