Partitioning
1998. 5. 19
조준동
SungKyunKwan Univ.
VADA Lab.
Partitioning in VLSI CAD
• Partitioning is a technique widely used to solve diverse problems occurring in VLSI CAD. Applications of partitioning can be found in logic synthesis, logic optimization, testing, and layout synthesis.
• High-quality partitioning is critical in high-level synthesis. To be useful, high-level synthesis algorithms should be able to handle very large systems. Typically, designers partition high-level design specifications manually into procedures, each of which is then synthesized individually. However, this decomposition of the design into procedures may not be appropriate for high-level and logic-level synthesis [60]. Different partitionings of the high-level specifications may produce substantial differences in the resulting IC chip areas and overall system performance.
• Some technology mapping programs use partitioning techniques to map a circuit, specified as a network of modules performing simple Boolean operations, onto a network composed of the specific modules available in an FPGA.
Partitioning in VLSI CAD
• Since test generation for large circuits may be extremely computation-intensive, circuit partitioning may provide a means to speed it up. In general, the test pattern generation problem is NP-complete. To date, all test generation algorithms that guarantee finding a test for a given fault exhibit worst-case behavior requiring CPU times that increase exponentially with circuit size. If the circuit can be partitioned into k parts (k not fixed), each of bounded size c, then the worst-case test generation time grows only linearly with the circuit size.
• Partitioning is often utilized in layout synthesis to produce and/or improve the placement of the circuit modules. Partitioning is used to find strongly connected subcircuits in the design, and the resulting information is utilized by some placement algorithms to place components belonging to such subcircuits in mutual proximity, thus minimizing delays and routing lengths.
Partitioning in VLSI CAD
• Another important class of partitioning problems occurs at the system design level. Since IC packages can hold only a limited number of logic components and external terminals, the components must be partitioned into subcircuits small enough to be implemented in the available packages.
• Partitioning has also been used to estimate some properties of physical IC designs, such as the expected IC area.
Circuit Partitioning
• Early attempts to solve the circuit partitioning problem were based on representing the circuit as a graph G = (V,E), where V is the set of nodes (vertices) representing the fundamental components, such as gates, flip-flops, inputs, and outputs, and E is the set of edges representing the nets present in the network. Graph partitioning problems representing VLSI design problems usually involve separating the set of graph nodes into disjoint subsets while optimizing some objective function defined on the graph vertices and edges. In the partitioned graph, edges fall into two classes: inter-subset edges, whose endpoints belong to different subsets, and intra-subset edges, whose endpoints belong to the same subset. The objective functions associated with graph partitioning problems usually treat these two classes of edges in different ways.
• One classic graph partitioning problem is the minimum cut (mincut) problem. Its objective is to divide V into two disjoint parts, U and W, such that the number of inter-subset edges is minimized. The set e(U,W) is referred to as the cut set, and the number of edges in the cut set as the cut value.
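For concreteness, here is a minimal Python sketch (toy data, not from the slides) that computes the cut set e(U,W) and the cut value of a given bipartition:

def cut_value(edges, U):
    """Return the cut set e(U, W) and its size for a bipartition (U, W = V \\ U)."""
    U = set(U)
    cut_set = [(a, b) for (a, b) in edges if (a in U) != (b in U)]
    return cut_set, len(cut_set)

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]   # small example graph
cut_set, value = cut_value(edges, U={1, 2})
print(cut_set, value)   # edges crossing the partition, and the cut value (3 here)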
Circuit Partitioning
• Graph and physical representation (figure)
VHDL example
(Figure: process communication; behavioral description and its control/data flow graph.)
Mincut Partitioning
• An exact solution to the mincut problem was provided by Ford and Fulkerson [11], who transformed the mincut problem into the maximum flow (maxflow) problem. The maxflow-mincut algorithm finds a maximum flow in a network; the maxflow value is equal to the mincut value. The first heuristic algorithm for two-way graph partitioning into equal-sized subsets was proposed by Kernighan and Lin. Their method consists of choosing an initial partition randomly and reducing the cut value by exchanging appropriately selected pairs of nodes between the subsets. After exchanging positions, nodes are locked in their new positions. In subsequent steps, pairs of unlocked nodes are selected and exchanged until all nodes are locked. The algorithm stops when it reaches a local minimum.
• Most nets in digital circuits are multi-point connections among more than two modules (logic gates, flip-flops, etc.). Therefore, modeling VLSI circuit partitioning problems as graph partitioning problems may lead to poor results caused by inadequate representation of multi-point nets, which have to be decomposed into two-point connections. One way to approximate circuit partitioning problems is to transform the circuit into a weighted graph representation G' via a net model. For example, a multi-point net connecting n nodes may be modeled as a complete graph (clique) spanned on these nodes, i.e., containing all possible edges among these nodes.
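The clique net model can be sketched as follows. The 2/n edge weight used here is one common convention and is an assumption, not something stated on the slide; the slide only requires that every pair of pins be connected:

from itertools import combinations

def clique_model(nets):
    """Expand each multi-point net into a clique of weighted two-point edges.

    Each edge of an n-pin net gets weight 2/n (a common, but not the only,
    weighting convention) so that large nets do not dominate the graph.
    """
    weighted_edges = {}
    for net in nets:
        n = len(net)
        if n < 2:
            continue
        w = 2.0 / n
        for a, b in combinations(sorted(net), 2):
            weighted_edges[(a, b)] = weighted_edges.get((a, b), 0.0) + w
    return weighted_edges

# Hypothetical netlist: net n1 connects modules A, B, C; net n2 connects B, D.
print(clique_model([["A", "B", "C"], ["B", "D"]]))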
Clustering (Cont’d)
• Clustering based on criterion B below the first cut-line, then criterion A
• Clustering based on criterion A below the second cut-line, then criterion B
Clustering Example
• Two-cluster Partition
• Three-cluster Partition
Survey on Partitioning
• We discuss the traditional min-cut and ratio-cut bipartitioning formulations along with multi-way extensions and newer problem formulations, e.g., constraint-driven partitioning (for FPGAs) and partitioning with module replication. Our discussion of solution approaches is divided into four major categories: move-based approaches, geometric representations, combinatorial formulations, and clustering approaches. Move-based algorithms iteratively explore the space of feasible solutions according to a neighborhood operator; such methods include greedy improvement, iterative exchange, simulated annealing, and evolutionary algorithms. Algorithms based on geometric representations embed the circuit netlist in some type of "geometry", e.g., a 1-dimensional linear ordering or a multi-dimensional vector space; the embeddings are commonly constructed using spectral methods. Combinatorial methods transform the partitioning problem into another type of optimization, e.g., based on network flows or mathematical programming. Finally, clustering algorithms merge the netlist modules into many small clusters; we discuss methods which combine clustering with existing algorithms (e.g., two-phase partitioning).
Survey on Partitioning
• The Fiduccia-Mattheyses (F-M) partitioning algorithm is perhaps the most widely adopted, due to its linear time complexity, its efficiency, and its ease of implementation. Many enhancements of the algorithm have been proposed. Both Krishnamurthy and Ng et al. have reported that the quality of the solutions yielded by the F-M algorithm is very erratic for circuit partitioning. Subsequently, Krishnamurthy amended the Fiduccia-Mattheyses implementation with a look-ahead technique, which considerably improved its average performance. Sanchis extended their work to partition hypergraphs into k partitions. Sechen proposed a new, improved objective function for mincut circuit partitioning, based on a statistical model that estimates the expected number of net crossings of the cutline. Many further improvements of the F-M algorithm have been published, utilizing techniques such as clustering, replication, and other refinements of the basic F-M heuristic.
Survey on Partitioning
• An important class of partitioning approaches consists of so-called constructive methods, among which methods based on graph spectra have received the most attention to date. These use the eigenvalues and eigenvectors of matrices derived from the netlist graph. Early theoretical work by Barnes, Donath, and Hoffman established relationships between the spectral properties and the partitioning properties of a graph. More recently, eigenvector and eigenvalue methods have been used for both component placement and graph minimum-width bisection. Hadley et al. used an eigenvector approach to obtain good initial partitions of the netlist as starting solutions for a subsequent iterative improvement algorithm. Hagen and Kahng applied eigenvector decomposition of the graph to the ratio-cut partitioning problem; they found that the second smallest eigenvalue of a matrix representation of the graph yields a lower bound on the optimal ratio-cut cost. More recent work by Alpert et al. and Chan et al. showed that more extensive eigenvector computation leads to better partitioning results. Other constructive partitioning approaches are based on placement techniques, vertex orderings and clustering, dynamic and Boolean programming, and geometric embeddings.
Complexity of Partitioning
In general, computing the optimal
partitioning is an NP-complete problem,
which means that the best known algorithms
take time which is an exponential function of
n=|N| and p, and it is widely believed that no
algorithm whose running time is a
polynomial function of n=|N| and p exists
(see ``Computers and Intractability'', M.
Garey and D. Johnson, W. H. Freeman, 1979,
for details.) Therefore we need to use
heuristics to get approximate solutions for
problems where n is large. The picture below
illustrates a larger graph partitioning problem;
it was generated using the spectral
partitioning algorithm as implemented in the
graph partitioning software by Gilbert et al.,
described below. The partition is N = N_blue U
N_black, with red edges connecting nodes in the
two partitions.
Chaco
• Before a calculation can be performed on a parallel computer, it must first be decomposed into tasks which are assigned to different processors. Efficient use of the machine requires that each processor have about the same amount of work to do and that the quantity of interprocessor communication be kept small.
• Partitioning G means dividing N into the union of P disjoint pieces N = N1 U N2 U ... U NP, where the nodes (jobs) in Ni are assigned to be done by processor Pi. This partitioning is done subject to the optimality conditions below.
  – 1. The sum of the weights Wn of the nodes n in each Ni is approximately equal. This means the load is approximately balanced across processors.
  – 2. The sum of the weights We of edges connecting nodes in different Ni and Nj should be minimized. This means that the total size of all messages communicated between different processors is minimized.
• Chaco is used at over 150 institutions for parallel computing, sparse matrix reordering, circuit placement, and a range of other applications. More information about Chaco can be obtained from bah@cs.sandia.gov
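The two optimality conditions can be illustrated with a small sketch (the node weights, edge weights, and assignment below are made-up data, not Chaco's own code):

node_weights = {"a": 2, "b": 1, "c": 2, "d": 1}          # W_n for each node
edge_weights = {("a", "b"): 1, ("b", "c"): 3, ("c", "d"): 1, ("a", "d"): 2}
assign = {"a": 0, "b": 0, "c": 1, "d": 1}                # N_0 = {a, b}, N_1 = {c, d}

loads = {}
for n, w in node_weights.items():
    loads[assign[n]] = loads.get(assign[n], 0) + w        # condition 1: load balance

comm = sum(w for (u, v), w in edge_weights.items()
           if assign[u] != assign[v])                      # condition 2: cut weight

print(loads)   # {0: 3, 1: 3} -> balanced
print(comm)    # 5 -> total weight of inter-processor edges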
Chaco
• A good solution to the graph partitioning problem assigns nodes to processors so that:
• 1. The sums of node weights are approximately equal for each processor. This means that each processor has an equal amount of floating-point work to do, i.e., the problem is load balanced.
• 2. As few edges cross processor boundaries as possible. This minimizes communication, since each crossing edge A_ij means that x_j must be sent to the processor owning x_i.
• The figure below illustrates such a partitioning onto 4 processors (colored blue, red, green, and magenta). Crossing edges, which require communication, are colored black, and non-crossing edges, which require no communication, have the same color as their processor.
Chaco
• Given n nodes and p processors, there are exponentially many ways to assign the n nodes to the p processors, some of which satisfy the optimality conditions more nearly than others. To illustrate, the following figure shows two partitions of a graph with 8 nodes onto 4 processors, with 2 nodes per processor. The partitioning on the left has 6 edges crossing processor boundaries and so is superior to the partitioning on the right, which has 10 edges crossing processor boundaries. The reader is invited to find another 6-edge-crossing partition, and to show that no partition with fewer crossing edges exists.
Edge Separator and Vertex Separator
Bisecting a graph G=(N,E) can be done in two
ways. In the last section, we discussed finding the
smallest subset Es of E such that removing Es
from E divided G into two disconnected subgraphs
G1 and G2, with nodes N1 and N2 respectively,
where N1 U N2 = N and N1 and N2 are disjoint
and equally large. (If the number of nodes is odd,
we obviously cannot make |N1|=|N2|. So we will
call Es an edge separator if |N1| and |N2| are
sufficiently close; we will be more explicit about
how different |N1| and |N2| can be only when
necessary.) The edges in Es connect nodes in N1
to nodes in N2. Since removing Es disconnects G,
Es is called an edge separator. The other way to
bisect a graph is to find a vertex separator, a
subset Ns of N, such that removing Ns and all
incident edges from G also results in two
disconnected subgraphs G1 and G2 of G. In other
words N = N1 U Ns U N2, where all three subsets
of N are disjoint, N1 and N2 are equally large, and
no edges connect N1 and N2.
The following figure illustrates these ideas. The green edges, Es1, form an edge separator, as do the blue edges Es2. The red nodes, Ns, are a vertex separator, since removing them and the incident edges (Es1, Es2, and the purple edges) leaves two disjoint subgraphs.
Theorem (R. Lipton and R. Tarjan, "A separator theorem for planar graphs", SIAM J. Appl. Math., 36:177-189, April 1979). Let G=(N,E) be a planar graph. Then we can find a vertex separator Ns, so that N = N1 U Ns U N2 is a disjoint partition of N, |N1| <= (2/3)*|N|, |N2| <= (2/3)*|N|, and |Ns| <= sqrt(8*|N|).
Inertial Partitioning
1. Choose a straight line L, given by a*(x - xbar) + b*(y - ybar) = 0. This is a straight line through (xbar, ybar) with slope -a/b. We assume without loss of generality that a^2 + b^2 = 1.
2. For each node ni = (xi, yi), compute a coordinate by computing the dot product Si = -b*(xi - xbar) + a*(yi - ybar). Si is the distance from (xbar, ybar) of the projection of (xi, yi) onto the line L.
3. Find the median value Sbar of the Si's.
4. Let the nodes (xi, yi) satisfying Si <= Sbar be in partition N1, and the nodes with Si > Sbar be in partition N2.
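Steps 2-4 translate directly into code. The sketch below takes the line (a, b, xbar, ybar) as given; how to choose it is derived on the next slide:

import numpy as np

def inertial_split(points, a, b, xbar, ybar):
    """Split nodes by the median of their projections onto the line L.

    points : array of (x, y) node coordinates
    (a, b) : direction with a^2 + b^2 = 1; the line passes through (xbar, ybar)
    """
    pts = np.asarray(points, dtype=float)
    # Step 2: signed coordinate S_i = -b*(x_i - xbar) + a*(y_i - ybar)
    S = -b * (pts[:, 0] - xbar) + a * (pts[:, 1] - ybar)
    Sbar = np.median(S)                        # Step 3: median value
    N1 = np.where(S <= Sbar)[0]                # Step 4: split at the median
    N2 = np.where(S > Sbar)[0]
    return N1, N2

# Toy example (hypothetical coordinates): split 6 nodes with a vertical line x = xbar.
pts = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
print(inertial_split(pts, a=0.0, b=1.0, xbar=2.5, ybar=1.0))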
Inertial Partitioning
In mathematical terms, we want to pick a line such that the sum of squares of the lengths of the green lines in the figure is minimized; this is also called a total least squares fit of a line to the nodes. In physical terms, if we think of the nodes as unit masses, we choose L to be the axis about which the moment of inertia of the nodes is minimized. This is why the method is called inertial partitioning. This means choosing a, b, xbar and ybar so that a^2 + b^2 = 1 and the following quantity is minimized:

sum_{i=1..|N|} (length of i-th green line)^2
  = sum_{i=1..|N|} ( (xi - xbar)^2 + (yi - ybar)^2 - (-b*(xi - xbar) + a*(yi - ybar))^2 )   ... by the Pythagorean theorem
  = a^2 * ( sum_{i=1..|N|} (xi - xbar)^2 ) + b^2 * ( sum_{i=1..|N|} (yi - ybar)^2 ) + 2*a*b * ( sum_{i=1..|N|} (xi - xbar)*(yi - ybar) )
  = a^2 * X2 + b^2 * Y2 + 2*a*b * XY
  = [ a b ] * [ X2  XY ] * [ a ]  =  [ a b ] * M * [ a ]
              [ XY  Y2 ]   [ b ]                   [ b ]

where X2, Y2 and XY are the summations in the previous line. One can show that the answer is to choose xbar = sum_{i=1..|N|} xi / |N| and ybar = sum_{i=1..|N|} yi / |N|, i.e., (xbar, ybar) is the "center of mass" of the nodes, and (a, b) is the unit eigenvector corresponding to the smallest eigenvalue of the matrix M.
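A sketch of this computation (center of mass, the 2x2 inertia matrix M, and the eigenvector of its smallest eigenvalue), on made-up coordinates:

import numpy as np

def inertial_line(points):
    """Return (a, b, xbar, ybar) for the total-least-squares line described above."""
    pts = np.asarray(points, dtype=float)
    xbar, ybar = pts.mean(axis=0)              # center of mass of the nodes
    dx, dy = pts[:, 0] - xbar, pts[:, 1] - ybar
    X2, Y2, XY = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)
    M = np.array([[X2, XY],
                  [XY, Y2]])
    vals, vecs = np.linalg.eigh(M)             # symmetric 2x2 eigenproblem, ascending eigenvalues
    a, b = vecs[:, 0]                          # unit eigenvector of the smallest eigenvalue
    return a, b, xbar, ybar

pts = [(0, 0), (1, 0.1), (2, -0.1), (3, 0.05), (4, 0)]   # nearly collinear test data
print(inertial_line(pts))     # (a, b) is roughly +/-(0, 1), the normal of a horizontal line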
Partitioning on Planar graph
•
A very simple partitioning algorithm is based
on breadth first search (BFS) of a graph. It is
reasonably effective on planar graphs, and
probably does well on overlap graphs as
defined above. Given a connected graph
G=(N,E) and a distinguished node r in N we
will call the root, breadth first search
produces a subgraph T of G (with the same
nodes and a subset of the edges), where T is a
tree with root r. In addition, it associates a
level with each node n, which is the number
of edges on the path from r to n in T. The
implementation requires a data structure
called a Queue, or a First-In-First-Out (FIFO)
list. It will contain a list of objects to be
processed. There are two operations one can
perform on a Queue. Enqueue(x) adds an
object x to the left end of the Queue.
y=Dequeue() removes the rightmost entry of
the Queue and returns it in y. In other words,
if x1, x2, ..., xk are Enqueued on the Queue in
that order, then k consecutive Dequeue
operations (possibly interleaved with the
Enqueue operations) will return x1, x2, ... , xk.
NT = {(r,0)}               ... initially T is just the root r, which is at level 0
ET = empty set             ... T = (NT, ET) at each stage of the algorithm
Enqueue((r,0))             ... Queue is a list of nodes to be processed
Mark r                     ... mark the root r as having been processed
While the Queue is nonempty              ... while nodes remain to be processed
    (n, level) = Dequeue()               ... get a node to process
    For all unmarked children c of n
        NT = NT U (c, level+1)           ... add child c to the node list NT of T
        ET = ET U (n, c)                 ... add the edge (n,c) to the edge list ET of T
        Enqueue((c, level+1))            ... add child c to the Queue for later processing
        Mark c                           ... mark c as having been visited
    End for
End while
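The same BFS idea in runnable form (a sketch; here the split level is chosen near the median node so the two halves are roughly balanced, which the slides only require informally):

from collections import deque

def bfs_level_partition(adj, root):
    """Build BFS levels from root, then cut at a level boundary near the median."""
    level = {root: 0}
    queue = deque([root])                  # FIFO queue of nodes to process
    while queue:
        n = queue.popleft()
        for c in adj[n]:
            if c not in level:             # unmarked child
                level[c] = level[n] + 1
                queue.append(c)
    # pick split level L = level of the median node, so |N1| ~ |N2| while the cut
    # stays on a level boundary (only tree and inter-level edges are cut)
    ordered = sorted(level, key=level.get)
    L = level[ordered[len(ordered) // 2 - 1]]
    N1 = {n for n in level if level[n] <= L}
    N2 = {n for n in level if level[n] > L}
    return N1, N2, level

# Hypothetical example: a chain of 8 nodes rooted at node 0.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
print(bfs_level_partition(adj, root=0))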
Breadth First Search
• Partitioning the graph into nodes at level L or lower and nodes at level L+1 or higher guarantees that only tree and inter-level edges will be cut. There can be no "extra" edges connecting, say, the root to the leaves of the tree. This is illustrated in the figure above, where the 10 nodes above the dotted blue line are assigned to partition N1, and the 10 nodes below the line are assigned to N2.
• For example, suppose one had an n-by-n mesh with unit distance between nodes. Choose any node r as the root from which to build a BFS tree. Then the nodes at level L or lower approximately form a diamond centered at r with a diagonal of length 2*L. This is shown below, where nodes are visited counterclockwise starting with the north.
Kernighan and Lin Algorithm
• The algorithm of B. Kernighan and S. Lin ("An efficient heuristic procedure for partitioning graphs", The Bell System Technical Journal, pp. 291-308, Feb. 1970) takes O(|N|^3) time per iteration. A more complicated and efficient implementation, which takes only O(|E|) time per iteration, was presented by C. Fiduccia and R. Mattheyses, "A linear-time heuristic for improving network partitions", Technical Report 82CRD130, General Electric Co., Corporate Research and Development Center, Schenectady, NY, 1982.
• We start with an edge-weighted graph G=(N,E,WE) and a partitioning N = A U B into equal parts: |A| = |B|. Let w(e) = w(i,j) be the weight of edge e=(i,j), where the weight is 0 if no edge e=(i,j) exists. The goal is to find equal-sized subsets X in A and Y in B such that exchanging X and Y reduces the total cost of edges from A to B. More precisely, we let T = sum[ a in A and b in B ] w(a,b) = cost of edges from A to B, and seek X and Y such that new_A = A - X U Y and new_B = B - Y U X has a lower cost new_T. To compute new_T efficiently, we introduce:
E(a) = external cost of a = sum[ b in B ] w(a,b)
I(a) = internal cost of a = sum[ a' in A, a' != a ] w(a,a')
D(a) = cost of a = E(a) - I(a), and analogously
E(b) = external cost of b = sum[ a in A ] w(a,b)
I(b) = internal cost of b = sum[ b' in B, b' != b ] w(b,b')
D(b) = cost of b = E(b) - I(b)
Then it is easy to show that swapping a in A and b in B changes T to
new_T = T - ( D(a) + D(b) - 2*w(a,b) ) = T - gain(a,b)
In other words, gain(a,b) = D(a) + D(b) - 2*w(a,b) measures the improvement in the partitioning from swapping a and b. D(a') and D(b') also change to
new_D(a') = D(a') + 2*w(a',a) - 2*w(a',b)   for all a' in A, a' != a
new_D(b') = D(b') + 2*w(b',b) - 2*w(b',a)   for all b' in B, b' != b
Kernighan and Lin Algorithm
(0) Compute T = cost of partition N = A U B               ... cost = O(|N|^2)
Repeat
    (1) Compute costs D(n) for all n in N                 ... cost = O(|N|^2)
    (2) Unmark all nodes in G                             ... cost = O(|N|)
    (3) While there are unmarked nodes                    ... |N|/2 iterations
        (3.1) Find an unmarked pair (a,b) maximizing gain(a,b)          ... cost = O(|N|^2)
        (3.2) Mark a and b (but do not swap them)                        ... cost = O(1)
        (3.3) Update D(n) for all unmarked n, as though a and b had been swapped   ... cost = O(|N|)
    End while
    ... At this point, we have computed a sequence of pairs (a1,b1), ..., (ak,bk)
    ... and gains gain(1), ..., gain(k), where k = |N|/2, in the order in which we marked them
    (4) Pick j maximizing Gain = sum_{i=1..j} gain(i)
        ... Gain is the reduction in cost from swapping (a1,b1),...,(aj,bj)
    (5) If Gain > 0 then
        (5.1) Update A = A - {a1,...,aj} U {b1,...,bj}     ... cost = O(|N|)
        (5.2) Update B = B - {b1,...,bj} U {a1,...,aj}     ... cost = O(|N|)
        (5.3) Update T = T - Gain                          ... cost = O(1)
    End if
Until Gain <= 0
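A compact Python sketch of one pass of the algorithm above (a simplified O(|N|^3)-per-pass version, not the Fiduccia-Mattheyses speedup; the example weights are made up):

def kl_pass(A, B, w):
    """One Kernighan-Lin pass: tentatively swap pairs, then keep the best prefix.

    A, B : disjoint node sets of equal size (current partition of N)
    w    : dict {(i, j): weight}; absent pairs have weight 0
    """
    def wt(i, j):
        return w.get((i, j), w.get((j, i), 0))

    sideA, sideB = set(A), set(B)          # hypothetical partition as pairs get "swapped"
    unmarkedA, unmarkedB = set(A), set(B)
    gains, swaps = [], []

    def D(n):                              # D(n) = E(n) - I(n) w.r.t. the current sides
        own, other = (sideA, sideB) if n in sideA else (sideB, sideA)
        return (sum(wt(n, m) for m in other)
                - sum(wt(n, m) for m in own if m != n))

    for _ in range(len(A)):                # |N|/2 iterations
        g, a, b = max((D(x) + D(y) - 2 * wt(x, y), x, y)
                      for x in unmarkedA for y in unmarkedB)
        gains.append(g)
        swaps.append((a, b))
        unmarkedA.remove(a); unmarkedB.remove(b)       # mark a and b
        sideA.remove(a); sideB.add(a)                  # update D's as though swapped
        sideB.remove(b); sideA.add(b)

    best_j, best_gain, run = 0, 0, 0       # step (4): prefix maximizing cumulative gain
    for j, g in enumerate(gains, start=1):
        run += g
        if run > best_gain:
            best_gain, best_j = run, j
    return swaps[:best_j], best_gain

# Toy example (made-up weights): two triangles joined by a heavy edge. The best
# swap exchanges nodes 4 and 3, making edge (3,4) internal and cutting the cost from 12 to 4.
w = {(1, 2): 2, (2, 3): 2, (1, 3): 2, (4, 5): 2, (5, 6): 2, (4, 6): 2, (3, 4): 4}
print(kl_pass({1, 2, 4}, {3, 5, 6}, w))    # -> ([(4, 3)], 8)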
Spectral Partitioning
• This is a powerful but expensive technique, based on techniques introduced by Fiedler in the 1970s and popularized in 1990 by A. Pothen, H. Simon, and K.-P. Liou, "Partitioning sparse matrices with eigenvectors of graphs", SIAM J. Matrix Anal. Appl., 11:430-452. We will first describe the algorithm and then give three related justifications for its efficacy.
• Let G=(N,E) be an undirected, unweighted graph without self edges (i,i) or multiple edges from one node to another. We define two matrices related to this graph.
• Definition. The incidence matrix In(G) of G is an |N|-by-|E| matrix, with one row for each node and one column for each edge. Suppose edge e=(i,j). Then column e of In(G) is zero except for the i-th and j-th entries, which are +1 and -1, respectively.
Note that there is some ambiguity in this definition, since G is undirected; writing edge e=(i,j) instead of (j,i) is equivalent to multiplying column e of In(G) by -1. We will see that this ambiguity is not important for our purposes.
• Definition. The Laplacian matrix L(G) of G is an |N|-by-|N| symmetric matrix, with one row and column for each node. It is defined as follows:
  (L(G))(i,j) = degree of node i (number of incident edges), if i = j
              = -1, if i != j and there is an edge (i,j)
              = 0, otherwise
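These two definitions can be checked numerically on a small made-up graph; the sketch below builds In(G) and L(G) and verifies that In(G)*In(G)' = L(G) and L(G)*e = 0 (properties discussed on a later slide):

import numpy as np

def incidence_and_laplacian(n, edges):
    """Build In(G) (|N| x |E|) and L(G) (|N| x |N|) for an undirected graph."""
    In = np.zeros((n, len(edges)))
    for col, (i, j) in enumerate(edges):
        In[i, col], In[j, col] = +1, -1        # the sign choice per column is arbitrary
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1                           # degree on the diagonal
        L[j, j] += 1
        L[i, j] = L[j, i] = -1                 # -1 for each edge (i, j)
    return In, L

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]   # small 4-node example
In, L = incidence_and_laplacian(4, edges)
print(np.allclose(In @ In.T, L))               # In(G)*In(G)' reproduces L(G)
print(np.allclose(L @ np.ones(4), 0))          # each row of L(G) sums to zero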
Spatial Locality: Hardware Partitioning
• The interface logic should be properly partitioned for area and timing reasons.
• Minimization of global busses leads to lower bus capacitance, and thus lower interconnect power.
• Signal values within the clusters tend to be more highly correlated.
• The data path should be partitioned into blocks of approximately equal size.
• In the DSP area, data paths tend to occupy far more area than the control paths.
• Wiring is still one of the dominant area consumers.
• The method used to identify clusters is based on the eigenvalues and eigenvectors of the Laplacian of the graph: the eigenvector corresponding to the second smallest eigenvalue provides a 1-D placement of the nodes which minimizes the mean-squared connection length.
Spectral Partitioning in VLSI placement
Spectral Partitioning in VLSI placement
• Setting the derivative of the Lagrangian L to zero gives:
  (Q - lambda*I) x = 0
• The solutions to the above equation are those where lambda is an eigenvalue of Q and x is the corresponding eigenvector.
• The smallest eigenvalue, lambda = 0, gives a trivial solution with all nodes at the same point. The eigenvector corresponding to the second smallest eigenvalue minimizes the cost function while giving a non-trivial solution.
Key Ideas in Spectral Partitioning
Spectral Partitioning
Spectral Partitioning
The following theorem states some important facts about In(G) and L(G). It introduces the idea that the eigenvalues and eigenvectors of L(G) are related to the connectivity of G.
Theorem 1. Given a graph G, its associated matrices In(G) and L(G) have the following properties.
1. L(G) is a symmetric matrix. This means the eigenvalues of L(G) are real, and its eigenvectors are real and orthogonal.
2. Let e = [1,...,1]', where ' means transpose, i.e. the column vector of all ones. Then L(G)*e = 0.
3. In(G)*(In(G))' = L(G). This is independent of the signs chosen in each column of In(G).
4. Suppose L(G)*v = lambda*v, where v is nonzero. Then
   lambda = norm(In(G)'*v)^2 / norm(v)^2, where norm(z)^2 = sum_i z(i)^2
          = ( sum_{all edges e=(i,j)} (v(i) - v(j))^2 ) / ( sum_i v(i)^2 )
5. The eigenvalues of L(G) are nonnegative: 0 <= lambda1 <= lambda2 <= ... <= lambdan
6. The number of connected components of G is equal to the number of eigenvalues lambda_i equal to 0. In particular, lambda2 != 0 if and only if G is connected.
Spectral Partitioning
Compute the eigenvector v2 corresponding to lambda2 of L(G)
for each node n of G
    if v2(n) < 0
        put node n in partition N-
    else
        put node n in partition N+
    endif
endfor
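In runnable form (a sketch using a dense eigendecomposition for clarity; a real implementation would use a sparse Lanczos-type eigensolver):

import numpy as np

def spectral_bisection(num_nodes, edges):
    """Split nodes by the sign of the Fiedler vector v2 of the graph Laplacian."""
    L = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] = L[j, i] = -1
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    v2 = vecs[:, 1]                         # eigenvector of the second smallest eigenvalue
    N_minus = [i for i in range(num_nodes) if v2[i] < 0]
    N_plus = [i for i in range(num_nodes) if v2[i] >= 0]
    return N_minus, N_plus, vals[1]         # lambda2, the algebraic connectivity

# Two triangles joined by one edge (made-up example); the sign split should
# separate the triangles, cutting only the joining edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(spectral_bisection(6, edges))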
First we show that this partition is at least
reasonable, because it tends to give
connected components N- and N+:
Theorem 2. (M. Fiedler, "A property of
eigenvectors of nonnegative symmetric
matrices and its application to graph
theory", Czech.Math. J. 25:619--637,
1975.) Let G be connected, and N- and
N+ be defined by the above algorithm.
Then N- is connected. If no v2(n) = 0,
N+ is also connected.
There are a number of reasons lambda2 is called
the algebraic connectivity. Here is another.
Theorem 3. (Fiedler). Let G=(N,E) be a graph,
and G1=(N,E1) a subgraph, i.e. with the same
nodes and subset of the edges, so that G1 is "less
connected" than G. Then lambda2(L(G1)) <=
lambda2(L(G)), i.e. the algebraic connectivity of
G1 is also less than or equal to the algebraic
connectivity of G.
Motivation for spectral bisection, by analogy with
a vibrating string
How does a taut string vibrate when it is plucked?
From our background in either physics or music,
we know that it has certain modes of vibration or
harmonics. If we were to take snapshots of these
modes, they would look like this:
Spectral Partitioning
Multilevel Kernighan-Lin
Gc is computed in step (1) of Recursive_partition as follows. We define a matching of a graph G=(N,E) as a subset Em of the edges E with the property that no two edges in Em share an endpoint. A maximal matching is one to which no more edges can be added and remain a matching. We can compute a maximal matching by a simple random algorithm:

let Em be empty
mark all nodes in N as unmatched
for i = 1 to |N|                         ... visit the nodes in a random order
    if node i has not been matched,
        choose an edge e=(i,j) where j is also unmatched, and add it to Em
        mark i and j as matched
    end if
end for

Given a matching, Gc is computed as follows. We let there be a node r in Nc for each edge in Em. Then we construct Ec as follows:

for r = 1 to |Em|                        ... for each node in Nc
    let (i,j) be the edge in Em corresponding to node r
    for each other edge e=(i,k) in E incident on i
        let ek be the edge in Em incident on k, and let rk be the corresponding node in Nc
        add the edge (r,rk) to Ec
    end for
    for each other edge e=(j,k) in E incident on j
        let ek be the edge in Em incident on k, and let rk be the corresponding node in Nc
        add the edge (r,rk) to Ec
    end for
end for
if there are multiple edges between pairs of nodes of Nc, collapse them into single edges
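A sketch of both steps in Python (made-up data; unmatched nodes are kept as singleton coarse nodes, a detail the pseudocode leaves implicit):

import random

def maximal_matching(nodes, edges):
    """Random maximal matching: no two chosen edges share an endpoint."""
    adj = {n: [] for n in nodes}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    matched, Em = set(), []
    order = list(nodes)
    random.shuffle(order)                      # visit the nodes in a random order
    for i in order:
        if i not in matched:
            for j in adj[i]:
                if j not in matched:           # choose an edge (i, j) with j unmatched
                    Em.append((i, j))
                    matched.update((i, j))
                    break
    return Em

def coarsen(nodes, edges, Em):
    """Collapse each matched edge into one coarse node and connect the coarse nodes."""
    coarse_of = {}
    for r, (i, j) in enumerate(Em):            # one coarse node r per matched edge
        coarse_of[i] = coarse_of[j] = r
    next_id = len(Em)
    for n in nodes:                            # unmatched nodes become singleton coarse nodes
        if n not in coarse_of:
            coarse_of[n] = next_id
            next_id += 1
    Ec = {tuple(sorted((coarse_of[i], coarse_of[j])))
          for i, j in edges if coarse_of[i] != coarse_of[j]}   # collapse multiple edges
    return coarse_of, sorted(Ec)

nodes = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]       # a 6-cycle
Em = maximal_matching(nodes, edges)
print(Em, coarsen(nodes, edges, Em))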
Multilevel Kernighan-Lin
Note that we can take node weights into account by letting the weight of a node (i,j) in Nc be the sum of the weights of the nodes i and j. We can similarly take edge weights into account by letting the weight of an edge in Ec be the sum of the weights of the edges "collapsed" into it. Furthermore, we can choose the edge (i,j) which matches j to i in the construction of Nc above to have the largest weight of all edges incident on i; this will tend to minimize the weights of the cut edges. This is called heavy edge matching in METIS, and is illustrated on the right.
Multilevel Kernighan-Lin
Given a partition (Nc+, Nc-) from step (2) of Recursive_partition, it is easily expanded to a partition (N+, N-) in step (3) by associating with each node in Nc+ or Nc- the nodes of N that comprise it. This is again shown below.
Finally, in step (4) of Recursive_partition, the approximate partition from step (3) is improved using a variation of Kernighan-Lin.
Multilevel Spectral Partitioning
Now we turn to the divide-and-conquer algorithm of Barnard and Simon, which is based on spectral partitioning rather than Kernighan-Lin. The expensive part of spectral bisection is finding the eigenvector v2, which requires a possibly large number of matrix-vector multiplications with the Laplacian matrix L(G) of the graph G. The divide-and-conquer approach of Recursive_partition will dramatically decrease the cost. Barnard and Simon perform step (1) of Recursive_partition, computing Gc = (Nc,Ec) from G=(N,E), slightly differently than above: they find a maximal independent subset Nc of N. This means that Nc is a subset of N, that no two nodes in Nc are directly connected by edges in E (independence), and that Nc is as large as possible (maximality).
There is a simple "greedy" algorithm for finding an Nc:

Nc = empty set
for i = 1 to |N|
    if node i is not adjacent to any node already in Nc
        add i to Nc
    end if
end for

This is shown below in the case where G is simply a chain of 9 nodes with nearest-neighbor connections, in which case Nc consists simply of every other node of N.
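The greedy construction in runnable form (a sketch on the 9-node chain mentioned above):

def greedy_mis(n, edges):
    """Greedy maximal independent set: take a node if no neighbor is taken yet."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    Nc, chosen = [], set()
    for i in range(n):                          # visit nodes in index order
        if not (adj[i] & chosen):               # i is not adjacent to any node in Nc
            Nc.append(i)
            chosen.add(i)
    return Nc

# Chain of 9 nodes with nearest-neighbor edges, as in the slide's example.
edges = [(i, i + 1) for i in range(8)]
print(greedy_mis(9, edges))                     # -> [0, 2, 4, 6, 8], every other node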
hMETIS
• hMETIS is a set of programs for partitioning hypergraphs such as those corresponding to VLSI circuits. The algorithms implemented by hMETIS are based on the multilevel hypergraph partitioning scheme described in [KAKS97].
• hMETIS produces bisections that cut 10% to 300% fewer hyperedges than those cut by other popular algorithms such as PARABOLI, PROP, and CLIP-PROP, especially for circuits with over 100,000 cells and circuits with non-unit cell areas. It is extremely fast: a single run of hMETIS is faster than a single run of simpler schemes such as FM, KL, or CLIP. Furthermore, because of its very good average cut characteristics, it produces high-quality partitionings in significantly fewer runs. It can bisect circuits with over 100,000 vertices in a couple of minutes on Pentium-class workstations.
• The performance of hMETIS on the new ISPD98 benchmark suite can be found in the paper by Chuck Alpert.
  http://www.users.cs.umn.edu/~karypis/metis/metis.html
How good is Recursive Bisection?
• Horst D. Simon and Shang-Hua Teng, Report RNR-93-012, August 1993.
• The most commonly used p-way partitioning method is recursive bisection. It first "optimally" divides the graph (mesh) into two equal-sized pieces and then recursively divides the two pieces. We show that, due to its greedy nature and the lack of global information, recursive bisection may, in the worst case, produce a partition that is very far from the optimal one. This negative result is complemented by two positive ones. First, we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite element and finite difference meshes, recursive bisection is normally within a constant factor of the optimal one. Secondly, we show that if the balance condition is relaxed so that each block in the partition is bounded by (1+e)n/p, then there exists an approximately balanced recursive partitioning scheme that finds a partition whose cost is within an O(log p) factor of the cost of the optimal p-way partition.
Partitioning Algorithm with Multiple Constraints
1998. 5. 19
조준동
Partitioning with pin and area constraints
When a circuit is represented as a graph G(V,E), V = {v_1, v_2, ..., v_n} is the set of n nodes, each node v_i has an area a_i, an edge e_ij connects nodes v_i and v_j, and E is the set of all edges between nodes. Graph partitioning divides the node set into k non-overlapping blocks V1, V2, ..., Vk. Each block has an area A1, A2, ..., Ak and a pin count P1, P2, ..., Pk, and each block must satisfy several constraints, including area and pin constraints: the maximum area a block may have is A_upper, the minimum area is A_lower, and the maximum number of pins is P_upper. C_ij is the sum of the weights of the edges connecting blocks Vi and Vj. The partitioning result must satisfy these constraints while minimizing the total weight of the edges connecting different blocks. Letting K denote the set of k subgraphs, partitioning finds the optimal mapping Γ: V -> K that satisfies the constraints and minimizes the following objective function:

W = sum_{i=1..k} sum_{j=1..k, j != i} C_ij

subject to  A_lower <= A_i <= A_upper,  P_i <= P_upper,  1 <= i <= k
Charging and Discharging due to Switching
• Accounts for up to 90% of total power consumption
(Figure: CMOS gate with Vdd, a PMOS pull-up network and an NMOS pull-down network driving the load capacitance CL; charge and discharge currents, plus short-circuit and leakage components.)
Partitioning for Low Power
• Conventional approach: minimize the number of edges crossing the cut
• For low power: minimize the switching activity of the edges crossing the cut
(Figure: the same netlist with edge switching probabilities 0.25 and 0.75; (a) cut by edge count, (b) cut by switching activity.)
Minimum-Cost Flow Algorithm
• A method for sending a given amount of flow to its destination at the lowest cost
  – Each channel has a capacity and a cost
• Max-flow min-cut: considers only the number of edges
• Min-cost flow: assigns each edge a weight based on its switching activity
  – Cost: switching activity vs. number of edges
  – Capacity: the maximum amount that can flow through an edge
• A large capacity is used so that lower-cost edges are preferred
W_i = alpha * S_i + (1 - alpha) * C_i
Network and Mincost Flow
(Figure: an example flow network; each edge is labeled with a pair of values such as 15/30, 45/55, or 100/10.)
Graph Transformation Algorithm
• The min-cost flow algorithm finds a path through the graph
• To find the cut, the graph must be transformed
• Nodes are topologically sorted by level
(Figure: example graph ordered into Levels 1 through 5.)
Graph Transformation Algorithm
• Added nodes and edges
(Figure: the transformation between Level (i) and Level (i+1), with a Source and a Sink added; the legend distinguishes newly created nodes and edges from the existing nodes and edges.)
Graph Transformation
(Figure: the transformed graph over Levels 1 through 5, with the added source S and sink T.)
Algorithm
Input: flow value f, network
Output: a partition of the network into f subnetworks
Step 1:
  Push a flow of f through the graph and run the minimum-cost flow algorithm;
  if every resulting partition satisfies its A_upper and P_upper bounds, stop;
  otherwise set f = f + 1 and repeat Step 1 until the upper bounds are satisfied.
Step 2:
  If there are two partitions p and q that do not satisfy A_lower or P_lower, and
    A_lower <= A_p + A_q <= A_upper
    P_lower <= P_p + P_q <= P_upper
  then p and q can be merged; apply minimum-cost matching over all such feasible {p, q} pairs
  to reduce the number of partitions.
References
[1] J.D. Cho and P.D. Franzon, "High-Performance Design Automation for Multi-Chip Modules and Packages", World Scientific Pub. Co., 1996
[2] H.J.M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits", IEEE JSSC, pp. 468-473, August 1984
[3] H.B. Bakoglu, "Circuits, Interconnections and Packaging for VLSI", pp. 81-112, Addison-Wesley Publishing Co., 1990
[4] K.M. Hall, "An r-dimensional quadratic placement algorithm", Management Sci., vol. 17, pp. 219-229, Nov. 1970
[5] Cadence Design Systems, "A Vision for Multi-Chip Module design in the nineties", Tech. Rep., Cadence Design Systems Inc., Santa Clara, CA, 1993
[6] R. Raghavan, J. Cohoon, and S. Sahni, "Single Bend Wiring", Journal of Algorithms, 7(2):232-257, June 1986
[7] B.W. Kernighan and S. Lin, "An efficient heuristic procedure to partition graphs", Bell System Technical Journal, 49(2):291-307, Feb. 1970
[8] Y.C. Wei and C.K. Cheng, "Ratio-Cut Partitioning for Hierarchical Designs", IEEE Trans. on Computer-Aided Design, 10(7):911-921, 1991
[9] S.W. Hadley, B.L. Mark, and A. Vannelli, "An Efficient Eigenvector Approach for Finding Netlist Partitions", IEEE Trans. on Computer-Aided Design, vol. CAD-11, pp. 885-892, July 1992
[10] L.R. Ford, Jr. and D.R. Fulkerson, "Flows in Networks", Princeton University Press, Princeton, NJ, 1962
[11] H. Liu and D.F. Wong, "Network Flow Based Multi-Way Partitioning With Area and Pin Constraints", IEEE/ACM Symposium on Physical Design, pp. 12-17, 1997
[12] S. Kirkpatrick, C. Gelatt, Jr., and M. Vecchi, "Optimization by simulated annealing", Science, 220(4598):498-516, May 1983
[13] M. Pedram, "Power Minimization in IC Design: Principles and Applications", ACM Trans. on Design Automation of Electronic Systems, 1(1), pp. 3-56, Jan. 1996
[14] A.H. Farrahi and M. Sarrafzadeh, "FPGA Technology Mapping for Power Minimization", International Workshop on Field-Programmable Logic and Applications, pp. 66-77, Sep. 1994
[15] M.A. Breuer, "Min-Cut Placement", J. Design Automation and Fault-Tolerant Computing, pp. 343-382, Oct. 1977
[16] M. Hanan and M.J. Kurtzberg, "A Review of the Placement and the Quadratic Assignment Problem", Apr. 1972
[17] N.R. Quinn, "The Placement Problem as Viewed from the Physics of Classical Mechanics", Proc. of the 12th Design Automation Conference, pp. 173-178, 1975
[18] C. Sechen and A. Sangiovanni-Vincentelli, "The TimberWolf placement and routing package", IEEE Journal of Solid-State Circuits, SC-20, pp. 501-522, 1985
[19] K. Shahookar and P. Mazumder, "A Genetic Approach to Standard Cell Placement", First European Design Automation Conference, Mar. 1990
[20] J.D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S.M. Kang, "Crosstalk Minimum Layer Assignment", Proc. IEEE Custom Integrated Circuits Conf., San Diego, CA, pp. 29.7.1-29.7.4, 1993
[21] J.M. Ho, M. Sarrafzadeh, G. Vijayan, and C.K. Wong, "Layer Assignment for Multi-Chip Modules", IEEE Trans. on Computer-Aided Design, CAD-9(12):1272-1277, Dec. 1991
[22] G. Devaraj, "Distributed placement and crosstalk driven router for multichip modules", MS Thesis, Univ. of Cincinnati, 1994
[23] J.D. Cho, "Min-Cost Flow based Minimum-Cost Rectilinear Steiner Distance-Preserving Tree", International Symposium on Physical Design, pp. 82-87, 1997
[24] A. Vittal and M. Marek-Sadowska, "Minimal Delay Interconnection Design using Alphabetic Trees", Design Automation Conference, pp. 392-396, 1994
[25] M.C. Golumbic, "Algorithmic Graph Theory and Perfect Graphs", pp. 80-103, New York: Academic, 1980
[26] R. Vemuri, "Genetic Algorithms for partitioning, placement, and layer assignment for multichip modules", Ph.D. Thesis, Univ. of Cincinnati, 1994
[27] J.L. Kennington and R.V. Helgason, "Algorithms for Network Programming", John Wiley, 1980
[28] J.Y. Cho and J.D. Cho, "Improving Performance and Routability Estimation in MCM Placement", InterPack'97, Hawaii, June 1997
[29] J.Y. Cho and J.D. Cho, "Partitioning for Low Power Using Min-Cost Flow Algorithm", submitted to the Korean Conference on Semiconductors, Feb. 1998