BLOCK PLACEMENT IN VLSI CHIP

ABSTRACT
The VLSI placement problem is to place objects into a fixed die such that there are no overlaps among objects and some cost metric (e.g., wire length, routability) is optimized. It is a major step in physical design that has been studied for decades. However, modern VLSI design challenges have reshaped the placement problem. A modern placer needs to handle large-scale designs with millions of objects, heterogeneous objects with very different sizes, and various complex placement constraints such as preplaced blocks and chip density. In this paper, we first introduce the major techniques employed in our placer for tackling large-scale mixed-size designs and the aforementioned constraints, and then provide some future research directions for the modern placement problem.

I. INTRODUCTION
The VLSI placement problem is to place objects into a fixed die such that there are no overlaps among objects and some cost metric (e.g., wire length, routability) is optimized. It is a major step in physical design that has been studied for decades. Yet it has attracted much attention recently, because recent studies show that existing placers still cannot produce near-optimal solutions. As a result, many new academic placers were invented in recent years.

Further, modern VLSI design challenges have reshaped the placement problem. As the feature size keeps shrinking, billions of transistors (or millions of standard cells) can be integrated in a single chip. Meanwhile, intellectual property (IP) modules and pre-designed macro blocks (such as embedded memories, analog blocks, and pre-designed data paths) are often reused. As a result, advanced VLSI designs often contain a large number (hundreds) of macros whose sizes differ greatly from each other and from the standard cells, and some of the macros may be preplaced in the chip. The dramatically increasing interconnect complexity further imposes routing difficulty. Therefore, in addition to wire length, modern placement must also consider the density constraint. The academic placers developed to solve this modern large-scale mixed-size placement problem can be classified into three major categories: (1) the analytical approach, (2) the min-cut partitioning-based approach, and (3) the hybrid approach.
We intend to determine the optimal positions of movable blocks so that the total wire length is minimized and there is no overlap among blocks. Figure 1 gives the notation used in this paper.

Fig. 1. Notation used in this paper.
  $x_i, y_i$: center coordinates of block $v_i$
  $w_i, h_i$: width and height of block $v_i$
  $w_b, h_b$: width and height of bin $b$
  $M_b$: maximum area of movable blocks in bin $b$
  $D_b$: potential (area of movable blocks) in bin $b$
  $P_b$: base potential (preplaced block area) in bin $b$
  density: target placement density

To evenly distribute the blocks, we divide the placement region into uniform non-overlapping bin grids. Then, the global placement problem can be formulated as a constrained minimization problem as follows:

$$\min\; W(x, y) \quad \text{s.t.} \quad D_b(x, y) \le M_b, \ \text{for each bin } b, \qquad (1)$$

where $W(x, y)$ is the wire length function, $D_b(x, y)$ is the potential function that gives the total area of movable blocks in bin $b$, and $M_b$ is the maximum area of movable blocks in bin $b$. $M_b$ can be computed by $M_b = \mathit{density} \cdot (w_b h_b - P_b)$, where density is a user-specified target density value for each bin, $w_b$ ($h_b$) is the width (height) of bin $b$, and $P_b$ is the base potential that equals the preplaced block area in bin $b$. Note that $M_b$ is a fixed value as long as all preplaced block positions are given and the bin size is determined.

The wire length $W(x, y)$ is defined as the total half-perimeter wire length (HPWL) given by

$$W(x, y) = \sum_{\text{net } e} \Big( \max_{v_i, v_j \in e} |x_i - x_j| + \max_{v_i, v_j \in e} |y_i - y_j| \Big). \qquad (2)$$

Since $W(x, y)$ is piecewise linear and therefore not differentiable, it is hard to minimize it directly. Thus, several smooth wire length approximation functions have been proposed in the literature. In NTUplace3, we apply the log-sum-exp wire length model

$$W_{\mathrm{LSE}}(x, y) = \gamma \sum_{e \in E} \Big( \log \sum_{v_k \in e} \exp(x_k/\gamma) + \log \sum_{v_k \in e} \exp(-x_k/\gamma) + \log \sum_{v_k \in e} \exp(y_k/\gamma) + \log \sum_{v_k \in e} \exp(-y_k/\gamma) \Big). \qquad (3)$$

As $\gamma \to 0$, the log-sum-exp wire length gives a good approximation to the HPWL.
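To make Equations (2) and (3) concrete, the following is a minimal sketch that evaluates both wire length models on a toy netlist. The function names and the netlist are hypothetical and not taken from NTUplace3. Each smoothed term $\gamma \log \sum \exp(\cdot/\gamma)$ lies between the true maximum and the maximum plus $\gamma \log |e|$, so the LSE value never falls below the HPWL and approaches it as $\gamma$ shrinks. The code shifts by the maximum before exponentiating to avoid overflow for small $\gamma$.

```python
import math


def hpwl(nets, xs, ys):
    """Half-perimeter wire length of Eq. (2).

    nets: list of nets, each a list of block indices
    xs, ys: block center coordinates, indexed by block
    """
    total = 0.0
    for net in nets:
        px = [xs[k] for k in net]
        py = [ys[k] for k in net]
        total += (max(px) - min(px)) + (max(py) - min(py))
    return total


def smooth_max(values, gamma):
    """gamma * log(sum(exp(v / gamma))), shifted by max(values) to avoid overflow."""
    m = max(values)
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def lse_wirelength(nets, xs, ys, gamma):
    """Log-sum-exp wire length of Eq. (3).

    smooth_max(x) approximates max(x) and smooth_max(-x) approximates -min(x),
    so each net contributes roughly (max x - min x) + (max y - min y).
    """
    total = 0.0
    for net in nets:
        px = [xs[k] for k in net]
        py = [ys[k] for k in net]
        total += (smooth_max(px, gamma) + smooth_max([-v for v in px], gamma)
                  + smooth_max(py, gamma) + smooth_max([-v for v in py], gamma))
    return total


if __name__ == "__main__":
    # Hypothetical toy netlist: two nets over four blocks.
    nets = [[0, 1, 2], [1, 3]]
    xs = [0.0, 4.0, 2.0, 5.0]
    ys = [0.0, 1.0, 3.0, 1.0]
    print("HPWL:", hpwl(nets, xs, ys))
    for gamma in (1.0, 0.1, 0.01):
        print("LSE, gamma =", gamma, ":", lse_wirelength(nets, xs, ys, gamma))
```

The HPWL of this toy netlist is 8.0; the printed LSE values decrease toward it as $\gamma$ goes from 1.0 to 0.01, which illustrates the accuracy/smoothness tradeoff controlled by $\gamma$.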
The density function $D_b(x, y)$ can be expressed as

$$D_b(x, y) = \sum_{v \in V} P_x(b, v)\, P_y(b, v), \qquad (4)$$

where $P_x$ and $P_y$ are the overlap functions between bin $b$ and block $v$ along the x and y directions, respectively. Since $D_b(x, y)$ is neither smooth nor differentiable, we use the bell-shaped functions $p_x$ and $p_y$ to smooth $P_x$ and $P_y$, respectively. The bell-shaped potential function $p_x$ is defined by

$$p_x(b, v) = \begin{cases} 1 - a\, d_x^2, & 0 \le d_x \le \frac{w_v}{2} + w_b \\[2pt] b \left( d_x - \frac{w_v}{2} - 2 w_b \right)^2, & \frac{w_v}{2} + w_b \le d_x \le \frac{w_v}{2} + 2 w_b \\[2pt] 0, & \frac{w_v}{2} + 2 w_b \le d_x \end{cases} \qquad (5)$$

$$a = \frac{4}{(w_v + 2 w_b)(w_v + 4 w_b)}, \qquad b = \frac{2}{w_b (w_v + 4 w_b)}.$$

Here, $w_b$ is the bin width, $w_v$ is the block width, and $d_x$ is the x-direction distance between the centers of block $v$ and bin $b$. Since $p_x$ vanishes for $d_x \ge w_v/2 + 2 w_b$, the range of a block's potential is $w_v + 4 w_b$ in the x direction; $p_y$ is defined in the same way in the y direction. With the smoothed functions, the density becomes

$$\hat{D}_b(x, y) = \sum_{v \in V} c_v\, p_x(b, v)\, p_y(b, v),$$

where $c_v$ is a normalization factor so that the total potential of a block equals its area. Replacing the hard constraints of Equation (1) with a quadratic penalty, we solve

$$\min\; W(x, y) + \lambda \sum_b \big( \hat{D}_b(x, y) - M_b \big)^2. \qquad (6)$$
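As an illustration of Equation (5) and the normalized density $\hat{D}_b$, here is a minimal sketch on a uniform bin grid. It assumes the placement region's lower-left corner is at (0, 0) and that bin $(i, j)$ is centered at $((i + 0.5) w_b, (j + 0.5) h_b)$; the function names are hypothetical, and the double loop over all bins is for clarity rather than speed. The normalization factor $c_v$ is computed so that each block's total potential equals its area.

```python
def bell(d, wv, wb):
    """Bell-shaped potential of Eq. (5) in one direction.

    d: center-to-center distance between block v and bin b
    wv: block size in this direction; wb: bin size in this direction
    """
    d = abs(d)  # the function is symmetric in d
    a = 4.0 / ((wv + 2 * wb) * (wv + 4 * wb))
    b = 2.0 / (wb * (wv + 4 * wb))
    if d <= wv / 2 + wb:
        return 1.0 - a * d * d
    if d <= wv / 2 + 2 * wb:
        return b * (d - wv / 2 - 2 * wb) ** 2
    return 0.0


def smoothed_density(blocks, nx, ny, wb, hb):
    """Smoothed density D-hat_b for an nx-by-ny grid of bins.

    blocks: list of (x, y, w, h), where (x, y) is the block center.
    Returns grid[j][i] = D-hat for the bin in column i, row j.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y, w, h in blocks:
        raw = {}
        for j in range(ny):
            py = bell(y - (j + 0.5) * hb, h, hb)
            if py == 0.0:
                continue
            for i in range(nx):
                px = bell(x - (i + 0.5) * wb, w, wb)
                if px > 0.0:
                    raw[(i, j)] = px * py
        total = sum(raw.values())
        if total == 0.0:
            continue  # block lies entirely outside the grid
        c_v = (w * h) / total  # total potential of the block equals its area
        for (i, j), p in raw.items():
            grid[j][i] += c_v * p
    return grid


if __name__ == "__main__":
    # A hypothetical 2 x 2 block centered in a 10 x 10 grid of unit bins.
    g = smoothed_density([(5.0, 5.0, 2.0, 2.0)], 10, 10, 1.0, 1.0)
    print("total potential:", sum(map(sum, g)))
```

The two pieces of Equation (5) meet with equal value and equal slope at $d_x = w_v/2 + w_b$ (with $w_v = 2$, $w_b = 1$ both pieces give 1/3 there), which is what makes $\hat{D}_b$ differentiable and usable in a gradient-based solver for Equation (6).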
A. Global Placement
1) Multilevel Framework:
We use the multilevel framework for global placement to improve the scalability. Our algorithm is summarized in Figure 2. The coarsening stage (lines 1–4) iteratively clusters the blocks based on connectivity/size to reduce the problem size until a given threshold is reached. Then, we find an initial placement (line 5). In the uncoarsening stage (lines 6–22), the algorithm iteratively declusters the blocks and refines the block positions to reduce the objective function.

Algorithm: Multilevel Global Placement
Input: H_0: hypergraph of the mixed-size circuit; n_max: the maximum block number in the coarsest level
Output: (x*, y*): optimal block positions
01. level = 0;
02. while (BlockNumber(H_level) > n_max)
03.     level++;
04.     H_level = FirstChoiceClustering(H_{level-1});
05. Initialize block positions by SolveQP(H_level);
06. for currentLevel = level to 0
07.     Initialize bin grid size n_bin ∝ √n_x;
08.     Initialize base potential for each bin;
09.     Initialize λ_0 = Σ|∂W(x, y)| / Σ|∂D̂_b(x, y)|; m = 0;
10.     do
11.         Solve min W(x, y) + λ_m Σ_b (D̂_b(x, y) − M_b)^2;
12.         m++;
13.         λ_m = 2 λ_{m−1};
14.         if (currentLevel == 0 && overflow ratio < 10%)
15.             Call LookAheadLegalization() and save the best result;
16.         Compute overflow ratio;
17.     until (spreading enough or no further reduction in overflow ratio)
18.     if (currentLevel == 0)
19.         Restore the best look-ahead result;
20.     else
21.         Call MacroShifting();
22.         Decluster and update block positions;
Fig. 2. Our global placement algorithm.

In NTUplace3, the evenness of block distribution is measured by the overflow ratio, which is defined as follows:

$$\text{Overflow ratio} = \frac{\sum_{\text{bin } b} \max\big( D_b(x, y) - M_b,\ 0 \big)}{\text{total movable area}}. \qquad (7)$$

The global placement stage stops when the overflow ratio is less than a user-specified target value, which is 0 by default.
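Equation (7), together with the capacity $M_b = \mathit{density} \cdot (w_b h_b - P_b)$, can be computed directly from per-bin areas. The sketch below is a minimal illustration with hypothetical function names; it assumes the per-bin movable areas $D_b$ and preplaced areas $P_b$ have already been measured (the exact, unsmoothed areas, as in Equation (7)) and are passed as flat lists in the same bin order.

```python
def max_movable_area(wb, hb, pb, density):
    """Capacity of one bin: M_b = density * (w_b * h_b - P_b)."""
    return density * (wb * hb - pb)


def overflow_ratio(movable, base, wb, hb, density, total_movable_area):
    """Overflow ratio of Eq. (7).

    movable: D_b for each bin (area of movable blocks in the bin)
    base:    P_b for each bin (area of preplaced blocks in the bin)
    """
    overflow = 0.0
    for db, pb in zip(movable, base):
        mb = max_movable_area(wb, hb, pb, density)
        overflow += max(db - mb, 0.0)  # only bins above their capacity count
    return overflow / total_movable_area


if __name__ == "__main__":
    # Four hypothetical 10 x 10 bins; the third is half covered by a preplaced block.
    movable = [120.0, 20.0, 60.0, 0.0]
    base = [0.0, 0.0, 50.0, 0.0]
    print(overflow_ratio(movable, base, 10.0, 10.0, 1.0, sum(movable)))
```

Here the capacities are 100, 100, 50, and 100, so the first bin overflows by 20 and the third by 10, giving 30 / 200 = 0.15. Note that the third bin overflows even though it holds less movable area than the first bin's capacity, because the preplaced block halves its capacity.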
2) Base Potential Smoothing:
Preplaced blocks predefine the base potential, which significantly affects block spreading. Since the base potential $P_b$ is not smooth, it incurs mountains that prevent movable blocks from passing through these regions. Therefore, we smooth the base potential with the Gaussian function

$$G(x, y) = \frac{1}{2 \pi \sigma^2}\, e^{-\frac{x^2 + y^2}{2 \sigma^2}}, \qquad (8)$$

where $\sigma$ is the standard deviation of the distribution. Applying convolution to the Gaussian function $G$ with the base potential $P$, $P'(x, y) = G(x, y) * P(x, y)$, we obtain a smoother base potential. The smoothing function $P''(x, y)$, which further results in a better smoothed potential, is defined as follows:

$$P''(x, y) = \begin{cases} \bar{P} + \big( P'(x, y) - \bar{P} \big)^{\delta}, & \text{if } P'(x, y) \ge \bar{P} \\[2pt] \bar{P} - \big( \bar{P} - P'(x, y) \big)^{\delta}, & \text{if } P'(x, y) \le \bar{P} \end{cases} \qquad (9)$$

where $\delta \ge 1$.

Fig. 3. (a) The density profile of newblue2. (b) The base potential after Gaussian smoothing. (c) The ...
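The two smoothing steps can be sketched on a bin grid as follows. This is a minimal illustration, not NTUplace3's implementation, and it rests on stated assumptions: the base potential is stored per bin as the fraction of the bin covered by preplaced blocks (values in [0, 1], so $\delta > 1$ shrinks deviations instead of amplifying them), the continuous Gaussian of Equation (8) is sampled on a small stencil of bins, and $\bar{P}$, which the text above does not define, is taken here to be the mean of $P'$. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Samples of G(x, y) from Eq. (8) on a (2r+1) x (2r+1) stencil of bins.

    The 1 / (2 pi sigma^2) factor is dropped because the samples are
    renormalized to sum to 1, which preserves the total base potential
    for bins away from the region boundary.
    """
    k = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]


def convolve(grid, kernel):
    """Discrete P' = G * P with zero padding outside the placement region."""
    ny, nx = len(grid), len(grid[0])
    r = len(kernel) // 2
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    jj, ii = j + dy, i + dx
                    if 0 <= jj < ny and 0 <= ii < nx:
                        acc += kernel[dy + r][dx + r] * grid[jj][ii]
            out[j][i] = acc
    return out


def level_smooth(grid, delta, p_bar):
    """Eq. (9): deviations from p_bar are raised to the power delta (delta >= 1)."""
    out = []
    for row in grid:
        new_row = []
        for p in row:
            if p >= p_bar:
                new_row.append(p_bar + (p - p_bar) ** delta)
            else:
                new_row.append(p_bar - (p_bar - p) ** delta)
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Hypothetical 5 x 5 grid with one bin fully covered by a preplaced block.
    base = [[0.0] * 5 for _ in range(5)]
    base[2][2] = 1.0
    smoothed = convolve(base, gaussian_kernel(1.0, 1))
    p_bar = sum(map(sum, smoothed)) / 25.0  # assumption: mean of P'
    flattened = level_smooth(smoothed, 2.0, p_bar)
    print(smoothed[2][2], flattened[2][2])
```

Because the Gaussian kernel is symmetric, the correlation computed by `convolve` equals a true convolution. Convolution spreads the single "mountain" over its neighbors, and the level-smoothing step then pulls every bin toward $\bar{P}$.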
3) Conjugate Gradient Search with Dynamic Step Size:
We use the conjugate gradient (CG) method with dynamic step size instead of line search to minimize Equation (6). After computing the conjugate gradient direction $d_k$, the step size $\alpha_k$ is computed by $\alpha_k = s / \|d_k\|_2$, where $s$ is a user-specified scaling factor. By doing so, we can limit the step size of block spreading, since the total quadratic Euclidean movement is fixed: $\sum_i (\Delta x_i^2 + \Delta y_i^2) = \|\alpha_k d_k\|_2^2 = s^2$.

4) Macro Shifting:
In the global placement stage, it is important to preserve legal macro positions, since macros are much bigger than standard cells and illegal macro positions typically make legalization much more difficult. To avoid this, we apply macro shifting at each declustering level of the global placement stage. Macro shifting moves macros to the closest legal positions.
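The dynamic step size rule can be illustrated on a small smooth objective. In the minimal sketch below, the step $\alpha_k = s / \|d_k\|_2$ replaces the line search, so every iteration moves the point by exactly $s$. The choice of the Polak-Ribiere coefficient (clipped at zero) and the restart to steepest descent when the direction is not a descent direction are assumptions of this sketch, not details given in the text; the quadratic objective and all names are hypothetical stand-ins for Equation (6).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_dynamic_step(grad, x0, s, iters):
    """Nonlinear CG where alpha_k = s / ||d_k||_2 replaces the line search.

    Returns the final point and the Euclidean length of each move.
    """
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    moves = []
    for _ in range(iters):
        norm_d = math.sqrt(dot(d, d))
        if norm_d == 0.0:
            break  # zero search direction: stationary point reached
        alpha = s / norm_d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        moves.append(alpha * norm_d)  # equals s up to rounding
        g_new = grad(x)
        gg = dot(g, g)
        # Polak-Ribiere coefficient, clipped at zero (assumed variant).
        beta = 0.0
        if gg > 0.0:
            beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(d, g_new) >= 0.0:
            d = [-gn for gn in g_new]  # not a descent direction: restart
        g = g_new
    return x, moves


def example_objective(x):
    """Hypothetical smooth test function with its minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def example_gradient(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


if __name__ == "__main__":
    x, moves = cg_dynamic_step(example_gradient, [0.0, 0.0], 0.1, 50)
    print(x, example_objective(x), len(moves))
```

Fixing the move length trades exact convergence for predictability: the blocks never jump farther than $s$ per iteration, and in a sketch like this one the point ends up oscillating within roughly $s$ of the minimizer rather than landing on it.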
B. Legalization
After global placement, legalization removes all overlaps and places standard cells into rows. We extend the standard-cell legalization method to solve the mixed-size legalization problem. The legalization order of macros and cells is determined by their x coordinates and sizes (widths and heights). Larger blocks get the priority for legalization; therefore, we legalize macros earlier than standard cells. After the legalization order is determined, macros are placed at their nearest available positions, and cells are packed into rows with the smallest wire length.

C. Detailed Placement
1) Cell Matching:
We extend the window-based detailed placement (WDP) algorithm and name our approach cell matching here. The WDP algorithm finds a group of exchangeable cells inside a given window, and formulates a bipartite matching problem by matching the cells to the empty slots in the window. The cost is given by the HPWL difference of a cell in each empty slot. The bipartite matching problem can be solved optimally in polynomial time.

2) Cell Swapping:
The cell swapping technique selects k adjacent cells each time to find the best ordering by enumerating all possible orderings using the branch-and-bound method. Here, k is a user-specified parameter. In our implementation, we set k = 3 for a good tradeoff between the running time and solution quality. This process repeats until all standard cells are processed.

3) Cell Sliding:
The objective of cell sliding is to reduce the density overflow in the congested area. We divide the placement region into uniform non-overlapping bins, and then iteratively reduce the densities of overflowed bins by sliding the cells horizontally from denser bins to sparser bins, with the cell order being preserved.
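The window-based ordering step of cell swapping can be sketched as below. For clarity this minimal sketch enumerates all k! orderings with `itertools.permutations` instead of the branch-and-bound search described above, which finds the same best ordering but explores fewer candidates. It assumes that the cells in a row abut one another (no white space), that each window is re-packed from its left edge, and that the cost is the HPWL of only the nets touching the window; the function names and the example netlist are hypothetical.

```python
from itertools import permutations


def nets_hpwl(nets, pos):
    """HPWL of the given nets; pos maps a cell or pin name to its (x, y) center."""
    total = 0.0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def pack(order, left, y, widths, pos):
    """Place the cells of `order` abutting each other, starting at x = left."""
    x = left
    for c in order:
        pos[c] = (x + widths[c] / 2.0, y)
        x += widths[c]


def best_window_order(window, widths, pos, nets):
    """Try every ordering of the adjacent cells in `window` (brute force)."""
    left = min(pos[c][0] - widths[c] / 2.0 for c in window)
    y = pos[window[0]][1]
    touched = [n for n in nets if any(c in window for c in n)]
    best_cost, best_order = None, None
    for order in permutations(window):
        trial = dict(pos)
        pack(order, left, y, widths, trial)
        cost = nets_hpwl(touched, trial)
        if best_cost is None or cost < best_cost:
            best_cost, best_order = cost, order
    return best_order, best_cost


def cell_swapping(row, widths, pos, nets, k=3):
    """Slide a k-cell window along one row, committing the best order each time."""
    row, pos = list(row), dict(pos)
    for start in range(len(row) - k + 1):
        window = row[start:start + k]
        left = min(pos[c][0] - widths[c] / 2.0 for c in window)
        order, _ = best_window_order(window, widths, pos, nets)
        pack(order, left, pos[window[0]][1], widths, pos)
        row[start:start + k] = list(order)
    return row, pos


if __name__ == "__main__":
    # Cells A, B, C in one row; fixed pins P (left) and Q (right).
    pos = {"A": (0.5, 0.0), "B": (1.5, 0.0), "C": (2.5, 0.0),
           "P": (0.0, 0.0), "Q": (3.0, 0.0)}
    widths = {"A": 1.0, "B": 1.0, "C": 1.0}
    nets = [["P", "C"], ["Q", "A"]]
    row, new_pos = cell_swapping(["A", "B", "C"], widths, pos, nets, k=3)
    print(row, nets_hpwl(nets, pos), nets_hpwl(nets, new_pos))
```

In this example, C is connected to the left pin and A to the right pin, so swapping them reduces the wire length of the two nets from 5.0 to 1.0. Because the original ordering is one of the candidates, the window cost never increases under the abutting-cells assumption.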
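V. FUTURE CHALLENGES
A. Macro Placement
We can classify the methods for handling large-scale mixed-size designs into three major categories:
(1) Single-stage mixed-size placement: Most existing mixed-size placers (e.g., APlace, Kraftwerk, mPL, and NTUplace3) employ this single-stage methodology for macro and standard-cell placement. As the number of macros increases dramatically due to the pervasive use of IP modules, however, the simultaneous macro and standard-cell placement methodology incurs significant difficulties in legality and complexity. Consequently, a robust legalizer is desirable for this method.
(2) Constructive macro placement: Most partitioning-based placers (e.g., Capo and PATOMA) keep macros overlap-free during the placement process by recursively partitioning the chip/macros into subregions. An intrinsic limitation of this hierarchical approach lies in the lack of global interaction among different subregions/macros, and thus the solution quality is also limited, especially for placement instances with low utilization rates.
(3) Two-stage macro placement: The two-stage approach consists of macro placement followed by standard-cell placement. This approach is more robust in finding a legal placement, and is thus widely used in the industry.

B. Routability-Driven Placement
Most existing placement algorithms focus on total wire length minimization to obtain better circuit performance and smaller layout area. Despite the pervasive use of the half-perimeter wire length objective, there is a significant mismatch between the wire length and congestion objectives in placement. Although most routing algorithms can handle congestion, routing congestion violations often cannot be totally removed if the given placement does not consider routability. Therefore, it is of particular significance to consider routability during placement, especially for modern VLSI designs with very large-scale interconnections. Previous works allocate white space to congested regions for better routability.

C. Timing-Driven Placement
In high-performance circuits, a large portion of timing optimization is performed in the placement stage. Traditional placement algorithms usually achieve the timing goal via wire length minimization. Nevertheless, there is a significant gap between wire length and actual delay, so many methods have been proposed recently to tackle this challenge. Those timing-driven placement methods can be classified into two major categories: (1) path-based and (2) net-based methods. The path-based methods try to control critical path delays directly, but they might incur prohibitively high time complexity for modern large-scale circuits due to their exponentially growing numbers of paths.

D. Power-Aware Placement
Power consumption has long been a first-order cost metric, not only for hand-held devices (for longer battery life) but also for high-performance applications (for lower heat dissipation and thus lower cooling cost). Previous works, like Cheon et al., proposed to reduce the power consumption during the placement stage.

E. Thermal Placement
As the process technology advances, the feature size keeps shrinking, and thus the integration density keeps increasing while the clock frequency keeps rising. As a result, the increased power density significantly raises the chip temperature. However, reducing the power consumption alone is not sufficient to reduce the chip temperature, since the power density is also a dominant factor. Therefore, it is desirable to develop new techniques that can evenly spread hot blocks/cells over the whole placement region to lower the chip temperature and increase the chip reliability.

VI. CONCLUSIONS
Modern VLSI design challenges have reshaped the placement problem. In this paper, we have presented example techniques to tackle the challenges arising from large-scale mixed-size circuit designs with wire length optimization. Although significant progress has been made in placement research, modern circuit designs have induced many more challenges and opportunities for future research on macro placement and on routability-, timing-, power-, and/or thermal-driven optimization of the placement problem.

VII. REFERENCES
[1] ISPD 2006 Program. http://www.ispd.cc/program.html.
[2] A. R. Agnihotri, S. Ono, and P. H. Madden. Recursive bisection placement: Feng Shui 5.0 implementation details. In Proc. of ISPD, pages 230–232, 2005.
[3] T. Chan, J. Cong, J. Shinnerl, K. Sze, and M. Xie. mPL6: Enhanced multilevel mixed-size placement. In Proc. of ISPD, pages 212–214, 2006.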