BLOCK PLACEMENT IN VLSI CHIP

ABSTRACT
The VLSI placement problem is to place objects into a fixed die such that there are no overlaps among objects and some cost metric (e.g., wire length, routability) is optimized. It is a major step in physical design that has been studied for decades. However, modern VLSI design challenges have reshaped the placement problem. A modern placer needs to handle large-scale designs with millions of objects, heterogeneous objects with very different sizes, and various complex placement constraints such as preplaced blocks and chip density. In this paper, we first introduce the major techniques employed in our placer for tackling the large-scale mixed-size designs and the aforementioned constraints, and then provide some future research directions for the modern placement problem.

I. INTRODUCTION
The VLSI placement problem is to place objects into a fixed die such that there are no overlaps among objects and some cost metric (e.g., wire length, routability) is optimized. It is a major step in physical design that has been studied for decades. Yet, it has attracted much attention recently because recent studies show that existing placers still cannot produce near-optimal solutions. As a result, many new academic placers have been invented in recent years.

Further, modern VLSI design challenges have reshaped the placement problem. As the feature size keeps shrinking, billions of transistors (or millions of standard cells) can be integrated in a single chip. Meanwhile, intellectual property (IP) modules and pre-designed macro blocks (such as embedded memories, analog blocks, pre-designed datapaths, etc.) are often reused. As a result, advanced VLSI designs often contain a large number (hundreds) of macros whose sizes differ greatly from each other and from the standard cells, and some of the macros may be preplaced in the chip. The dramatically increasing interconnect complexity further imposes routing difficulty. In addition to wire length, therefore, modern placement must also consider the density constraint.

To solve this modern large-scale mixed-size placement problem, many academic placers have been invented in recent years. Those placers can be classified into three major categories: (1) the analytical approach, (2) the min-cut partitioning-based approach, and (3) the hybrid approach. We intend to determine the optimal positions of movable blocks so that the total wire length is minimized and there is no overlap among blocks.
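As a concrete reading of this problem statement, the following Python sketch checks a tiny placement for overlaps and evaluates its wire length. It is only an illustration under stated assumptions: blocks are axis-aligned rectangles given by center coordinates, wire length is the half-perimeter of each net's bounding box over block centers, and the names `Block`, `overlaps`, `is_overlap_free`, and `hpwl` are hypothetical helpers, not part of any placer.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass
class Block:
    x: float  # center x coordinate
    y: float  # center y coordinate
    w: float  # width
    h: float  # height


def overlaps(a: Block, b: Block) -> bool:
    """True if the interiors of two axis-aligned blocks intersect."""
    return (abs(a.x - b.x) < (a.w + b.w) / 2
            and abs(a.y - b.y) < (a.h + b.h) / 2)


def is_overlap_free(blocks: Dict[str, Block]) -> bool:
    """True if no pair of blocks overlaps (touching edges are allowed)."""
    return not any(overlaps(a, b)
                   for a, b in combinations(blocks.values(), 2))


def hpwl(blocks: Dict[str, Block], nets: List[List[str]]) -> float:
    """Total half-perimeter wire length over all nets, using block centers."""
    total = 0.0
    for net in nets:
        xs = [blocks[name].x for name in net]
        ys = [blocks[name].y for name in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Toy instance: three 2x2 blocks and two nets (made-up data).
blocks = {
    "a": Block(1.0, 1.0, 2.0, 2.0),
    "b": Block(4.0, 1.0, 2.0, 2.0),
    "c": Block(4.0, 5.0, 2.0, 2.0),
}
nets = [["a", "b"], ["a", "b", "c"]]

print(is_overlap_free(blocks))  # True
print(hpwl(blocks, nets))       # 10.0
```

A placer searches over the block coordinates to reduce the second quantity while keeping the first condition true. The sketch omits the die boundary and density constraints.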

$x_i, y_i$: center coordinate of block $v_i$
$w_i, h_i$: width and height of block $v_i$
$w_b, h_b$: width and height of bin $b$
$M_b$: maximum area of movable blocks in bin $b$
$D_b$: potential (area of movable blocks) in bin $b$
$P_b$: base potential (preplaced block area) in bin $b$
$t_{density}$: target placement density
Fig. 1. Notation used in this paper.

To evenly distribute the blocks, we divide the placement region into uniform non-overlapping bin grids. Then, the global placement problem can be formulated as a constrained minimization problem as follows:

$$\min\ W(x, y) \quad \text{s.t.} \quad D_b(x, y) \le M_b \ \text{ for each bin } b,$$ (1)

where $W(x, y)$ is the wire length function, $D_b(x, y)$ is the potential function that is the total area of movable blocks in bin $b$, and $M_b$ is the maximum area of movable blocks in bin $b$. $M_b$ can be computed by $M_b = t_{density}(w_b h_b - P_b)$, where $t_{density}$ is a user-specified target density value for each bin, $w_b$ ($h_b$) is the width (height) of bin $b$, and $P_b$ is the base potential that equals the preplaced block area in bin $b$. Note that $M_b$ is a fixed value as long as all preplaced block positions are given and the bin size is determined. Figure 1 gives the notation used in this paper.

The wire length $W(x, y)$ is defined as the total half-perimeter wire length (HPWL) given by

$$W(x, y) = \sum_{\text{net } e} \left( \max_{v_i, v_j \in e} |x_i - x_j| + \max_{v_i, v_j \in e} |y_i - y_j| \right).$$ (2)

Since $W(x, y)$ is not differentiable everywhere, it is hard to minimize it directly with gradient-based methods. Thus, several smooth wire length approximation functions have been proposed in the literature. In NTUplace3, we apply the log-sum-exp wire length model,

$$\gamma \sum_{e \in E} \left( \log \sum_{v_k \in e} \exp(x_k/\gamma) + \log \sum_{v_k \in e} \exp(-x_k/\gamma) + \log \sum_{v_k \in e} \exp(y_k/\gamma) + \log \sum_{v_k \in e} \exp(-y_k/\gamma) \right).$$ (3)

As $\gamma \to 0$, the log-sum-exp wire length gives a good approximation to the HPWL.

The function $D_b(x, y)$ can be expressed as

$$D_b(x, y) = \sum_{v \in V} P_x(b, v)\, P_y(b, v),$$

where $P_x$ and $P_y$ are the overlap functions between bin $b$ and block $v$ along the x and y directions. Since $D_b(x, y)$ is neither smooth nor differentiable, APlace uses bell-shaped functions $p_x$ and $p_y$ for each block to smooth the density functions $P_x$ and $P_y$, respectively. The bell-shaped potential function $p_x$ is defined by

$$p_x(b, v) = \begin{cases} 1 - a\, d_x^2, & 0 \le d_x \le w_v/2 + w_b \\ b\,(d_x - w_v/2 - 2w_b)^2, & w_v/2 + w_b \le d_x \le w_v/2 + 2w_b \\ 0, & w_v/2 + 2w_b \le d_x \end{cases}$$ (4)

where

$$a = \frac{4}{(w_v + 2w_b)(w_v + 4w_b)}, \qquad b = \frac{2}{w_b\,(w_v + 4w_b)}.$$

Here, $w_b$ is the bin width, $w_v$ is the block width, and $d_x$ is the distance in the x direction between the center of block $v$ and the center of bin $b$. With these coefficients the two pieces of $p_x$ meet with equal value and slope at $d_x = w_v/2 + w_b$. The range of a block's potential is $w_v + 4w_b$ in the x direction. The non-smooth function $D_b(x, y)$ can then be replaced by the smooth one,

$$\hat{D}_b(x, y) = \sum_{v \in V} c_v\, p_x(b, v)\, p_y(b, v),$$ (5)

where $c_v$ is a normalization factor so that the total potential of a block equals its area. The constrained problem (1) is then handled by minimizing the penalized objective

$$\min\ W(x, y) + \lambda \sum_b \left( \hat{D}_b(x, y) - M_b \right)^2.$$ (6)

A. Global Placement
1) Multilevel Framework:
We use the multilevel framework for global placement to improve the scalability. Our algorithm is summarized in Figure 2. The coarsening stage (lines 1–4) iteratively clusters the blocks based on connectivity/size to reduce the problem size until a given threshold is reached. Then, we find an initial placement (line 5). In the uncoarsening stage (lines 6–22), it iteratively declusters the blocks and refines the block positions to reduce the objective function. In NTUplace3, the evenness of block distribution is measured by the overflow ratio, which is defined as follows:

$$\text{overflow ratio} = \frac{\sum_{\text{bin } b} \max\left(D_b(x, y) - M_b,\ 0\right)}{\text{total movable area}}.$$

The global placement stage stops when the overflow ratio is less than a user-specified target value, which is 0 by default.

Algorithm: Multilevel Global Placement
Input:
    hypergraph $H_0$: mixed-size circuit
    $n_{max}$: the maximum block number in the coarsest level
Output:
    $(x^*, y^*)$: optimal block positions
01. level = 0;
02. while (BlockNumber($H_{level}$) > $n_{max}$)
03.     level++;
04.     $H_{level}$ = FirstChoiceClustering($H_{level-1}$);
05. initialize block positions by SolveQP($H_{level}$);
06. for currentLevel = level to 0
07.     initialize bin grid size $n_{bin} \propto \sqrt{n_x}$;
08.     initialize base potential for each bin;
09.     initialize $\lambda_0 = \sum |\partial W(x, y)| / \sum |\partial \hat{D}_b(x, y)|$; m = 0;
10.     do
11.         solve $\min\ W(x, y) + \lambda_m \sum (\hat{D}_b(x, y) - M_b)^2$;
12.         m++;
13.         $\lambda_m = 2\lambda_{m-1}$;
14.         if (currentLevel == 0 & overflow_ratio < 10%)
15.             call LookAheadLegalization() and save the best result;
16.         compute overflow_ratio;
17.     until (spreading enough or no further reduction in overflow ratio)
18.     if (currentLevel == 0)
19.         restore the best look-ahead result;
20.     else
21.         call MacroShifting();
22.     decluster and update block positions.
Fig. 2. Our global placement algorithm.

2) Base Potential Smoothing:
Preplaced blocks pre-define the base potential, which significantly affects block spreading. Since the base potential $P_b$ is not smooth, it incurs mountains that prevent movable blocks from passing through these regions. The base potential is therefore smoothed with the two-dimensional Gaussian function

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}},$$ (7)

where $\sigma$ is the standard deviation of the distribution. Applying convolution to the Gaussian function $G$ with the base potential $P$, $P'(x, y) = G(x, y) * P(x, y)$, we can obtain a smoother base potential $P'$. Gaussian smoothing results in a better smoothing potential. The smoothing function $P''(x, y)$ is defined as follows:

$$P''(x, y) = \begin{cases} \bar{P}' + \left(P'(x, y) - \bar{P}'\right)^{\delta} & \text{if } P'(x, y) \ge \bar{P}' \\ \bar{P}' - \left(\bar{P}' - P'(x, y)\right)^{\delta} & \text{if } P'(x, y) \le \bar{P}', \end{cases}$$ (8)

where $\delta \ge 1$.

Fig. 3. (a) The density profile of newblue2. (b) The base potential after Gaussian smoothing. (c) The potential after the smoothing function $P''$ is applied.

3) Conjugate Gradient Search with Dynamic Step Size:
We use the conjugate gradient (CG) method with dynamic step size instead of line search to minimize Equation (6). After computing the conjugate gradient direction $d_k$, the step size $\alpha_k$ is computed by $\alpha_k = s / \|d_k\|_2$, where $s$ is a user-specified scaling factor. By doing so, we can limit the step size of block spreading since the total quadratic Euclidean movement is fixed,

$$\sum_{v_i \in V} \left( \Delta x_i^2 + \Delta y_i^2 \right) = \|\alpha_k d_k\|_2^2 = s^2,$$ (9)

where $\Delta x_i$ and $\Delta y_i$ are the movements of block $v_i$ along the x and y directions in one iteration.
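
The effect of the dynamic step size can be illustrated with a short Python sketch. It assumes that all block coordinates are stored in one flat list and that a search direction $d_k$ is already available. The function names and the sample numbers are hypothetical and are not taken from NTUplace3. The sketch only shows that choosing $\alpha_k = s / \|d_k\|_2$ makes the total quadratic movement equal to $s^2$ regardless of the magnitude of $d_k$.

```python
import math
from typing import List, Sequence


def dynamic_step_size(direction: Sequence[float], s: float) -> float:
    """Return alpha_k = s / ||d_k||_2 for the search direction d_k."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        # Assumed convention for this sketch: a zero direction moves nothing.
        return 0.0
    return s / norm


def apply_step(positions: Sequence[float],
               direction: Sequence[float],
               s: float) -> List[float]:
    """Move every coordinate by alpha_k times its direction component."""
    alpha = dynamic_step_size(direction, s)
    return [p + alpha * d for p, d in zip(positions, direction)]


def total_quadratic_movement(old: Sequence[float],
                             new: Sequence[float]) -> float:
    """Sum of squared coordinate changes over all blocks."""
    return sum((n - o) ** 2 for o, n in zip(old, new))


# Two blocks stored as [x1, y1, x2, y2]; the direction is made-up data.
positions = [0.0, 0.0, 10.0, 5.0]
direction = [3.0, -4.0, 0.0, 12.0]  # Euclidean norm is 13
s = 2.0

moved = apply_step(positions, direction, s)
movement = total_quadratic_movement(positions, moved)
print(abs(movement - s ** 2) < 1e-9)  # True: movement equals s^2 up to rounding
```

Scaling `direction` by any positive constant leaves `moved` unchanged, which is why $s$ alone controls how far the blocks spread in one iteration.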
4) Macro Shifting:
In the global placement stage, it is important to preserve legal macro positions since macros are much bigger than standard cells and illegal macro positions typically make legalization much more difficult. To avoid this, we apply macro shifting at each declustering level of the global placement stage. Macro shifting moves macros to the closest legal positions.

B. Legalization
After global placement, legalization removes all overlaps and places standard cells into rows. We extend the standard-cell legalization method to solve the mixed-size legalization problem. The legalization order of macros and cells is determined by their x coordinates and sizes (widths and heights). Larger blocks get the priority for legalization. Therefore, we legalize macros earlier than standard cells. After the legalization order is determined, macros are placed at their nearest available positions and cells are packed into rows with the smallest wire length.

C. Detailed Placement
1) Cell Matching:
We extend the window-based detailed placement (WDP) algorithm and name our approach cell matching here. The WDP algorithm finds a group of exchangeable cells inside a given window, and formulates a bipartite matching problem by matching the cells to the empty slots in the window. The cost is given by the HPWL difference of a cell in each empty slot. The bipartite matching problem can be solved optimally in polynomial time.

2) Cell Swapping:
The cell swapping technique selects k adjacent cells each time to find the best ordering by enumerating all possible orderings using the branch-and-bound method. Here, k is a user-specified parameter. In our implementation, we set k = 3 for a good tradeoff between the running time and solution quality. This process repeats until all standard cells are processed.

3) Cell Sliding:
The objective of cell sliding is to reduce the density overflow in the congested area. We divide the placement region into uniform non-overlapping bins, and then iteratively reduce the densities of overflowed bins by sliding the cells horizontally from denser bins to sparser bins, with the cell order being preserved.

V. FUTURE CHALLENGES
A. Macro Placement
We can classify the methods for handling the large-scale mixed-size designs into three major categories: (1) Simultaneous macro and standard-cell placement: Most existing mixed-size placers (e.g., APlace, Kraftwerk, mPL, NTUplace3, etc.) employ this single-stage methodology for macro and standard-cell placement. As the number of macros increases dramatically due to the pervasive use of IP modules, however, the methodology incurs significant difficulties in legality and complexity. Consequently, a robust legalizer is desirable for this method. (2) Constructive macro placement: Most partitioning-based placers (e.g., Capo, PATOMA) keep macros overlap-free during the placement process by recursively partitioning the chip/macros into subregions. An intrinsic limitation of this hierarchical approach lies in the lack of global interaction among different subregions/macros, and thus the solution quality is also limited, especially for placement instances with low utilization rates. (3) Two-stage macro placement: The two-stage approach consists of macro placement followed by standard-cell placement. This approach is more robust in finding legal placement, and is thus widely used in the industry.

B. Routability-Driven Placement
Most existing placement algorithms focus on total wire length minimization to obtain better circuit performance and smaller layout area. Despite the pervasive use of the half-perimeter wire length objective, there is a significant mismatch between wire length and congestion objectives in placement. Although most routing algorithms can handle congestion, routing congestion violations often cannot be totally removed if the given placement does not consider routability. Therefore, it is of particular significance to consider routability during placement, especially for modern VLSI designs with very large-scale interconnections. Previous works allocate white spaces into congested regions for better routability.

C. Timing-Driven Placement
In high-performance circuits, a large portion of timing optimization is performed in the placement stage. Traditional placement algorithms usually achieve the timing goal via wire length minimization. Nevertheless, there is a significant gap between wire length and actual delay, so many methods have been proposed recently to tackle this challenge. Those proposed timing-driven placement methods can be classified into two major categories: (1) path-based and (2) net-based methods. The path-based methods try to control critical path delays directly, but they might incur prohibitively high time complexity for modern large-scale circuits due to their exponentially growing numbers of paths.

D. Power-Aware Placement
Power consumption has long become the first-order cost metric not only for hand-held devices for longer battery life, but also for high-performance applications for lower heat dissipation (and thus cooling cost). Previous works, like Cheon et al., proposed to reduce the power consumption during the placement stage.

E. Thermal Placement
As the process technology advances, the feature size keeps shrinking and thus the integration density keeps increasing while the clock frequency keeps rising. As a result, the increased power density significantly raises the chip temperature. However, reducing the power consumption alone is not sufficient to reduce the chip temperature, since the power density is also a dominant factor. Therefore, it is desirable to develop new techniques that can evenly spread hot blocks/cells over the whole placement region to lower the chip temperature and increase the chip reliability.

VI. CONCLUSIONS
Modern VLSI design challenges have reshaped the placement problem. In this paper, we have presented example techniques to tackle the challenges arising from large-scale mixed-size circuit designs with the wire length optimization. Although significant progress has been made in placement research, modern circuit designs have induced many more challenges and opportunities for future research on macro placement and routability-, timing-, power-, and/or thermal-driven optimization of the placement problem.

VII. REFERENCES
[1] ISPD 2006 Program. http://www.ispd.cc/program.html.
[2] A. R. Agnihotri, S. Ono, and P. H. Madden. Recursive bisection placement: Feng Shui 5.0 implementation details. In Proc. of ISPD, pages 230–232, 2005.
[3] T. Chan, J. Cong, J. Shinnerl, K. Sze, and M. Xie. mPL6: Enhanced multilevel mixed-size placement. In Proc. of ISPD, pages 212–214, 2006.