
Sean Griffin
901-688-4075
Page 1
2/17/2016
ECE 556
Two-Way Partitioning with Stable Performance
Abstract:
Traditional Kernighan-Lin based partitioning algorithms produce very good
results with some initial groupings. However, the results often degenerate with less
optimal initial groupings. If the circuit is first partitioned into a small number of tightly
connected clusters, the clusters can then be partitioned instead of the entire circuit. This
greatly reduces the complexity of the partitioning problem. A good partitioning solution
can then be found with fewer computations.
Motivation:
The Kernighan-Lin algorithm has been shown to be highly sensitive to the initial
partitioning. A common method of ensuring that a good solution is found by Kernighan-Lin based algorithms is to simply run the algorithm many times with various random
starting positions. However, this requires significantly more computation. Additionally,
the probability of finding a global minimum solution with a given starting condition falls
exponentially as the size of the chip increases, thus the polynomial time algorithm is
reduced to an exponential time one.
If time were spent finding an optimal initial partitioning for Kernighan-Lin, only
one run of the algorithm would be required. This initial partitioning could be obtained
using clustering. If the circuit is divided into a small number of tightly connected
clusters, the clusters can then be sorted at a very small cost. Also, due to the small
number of clusters, the probability of finding a global minimum cut between the clusters
is high. The sorted clusters can then be used as a starting position for the full
partitioning.
Methods:
Ratio Cut:
The algorithm used to divide a circuit into tightly connected clusters is the ratio
cut. This algorithm uses the cost function C_AB / (|A| * |B|), where C_AB is the size of
the cut between sets A and B and |A| is the cardinality of set A. For a fixed total number
of nodes, the product |A| * |B| is largest when A and B are the same size. Thus, the cost
function favors more equal cuts, and the ratio cut usually generates a more even
partitioning than a min-cut algorithm. However, the relative sizes of the partitions need
not be equal.
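The effect of the cost function can be illustrated with a minimal sketch. The function name ratioCost and its parameters are illustrative assumptions and are not taken from the program described in this report.

```cpp
#include <cassert>
#include <cstddef>

// Ratio-cut cost of a two-way cut: cutWeight / (|A| * |B|).
// ratioCost is a hypothetical helper name used only for this sketch.
// Both sides must be non-empty, otherwise the ratio is undefined.
double ratioCost(int cutWeight, std::size_t sizeA, std::size_t sizeB)
{
    assert(sizeA > 0 && sizeB > 0);
    return static_cast<double>(cutWeight) /
           (static_cast<double>(sizeA) * static_cast<double>(sizeB));
}
```

For a 10-node circuit, a 5/5 split with cut size 4 costs 4/25, which beats a 1/9 split with cut size 3 (3/9). A 1/9 split with cut size 1 costs 1/9, however, and would be preferred over the balanced cut, which is why the partition sizes need not come out equal.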
The algorithm for finding the ratio cut has three parts as follows:
Initialization:
Input: a network N = (V, E).
Output: an initial partitioning.
1) Randomly choose a module s in V. Find the other seed module t at the end of a
longest path by breadth-first search starting from s. Let X = {s}, and Y = V – {s,
t}.
2) Choose a module i in Y whose movement to X will generate the best ratio among
all the other competing modules. Move module i from Y to X; update X = X U
{i}, and Y = Y – {i}.
3) Repeat step 2 until Y = Ø.
4) Repeat steps 2 and 3 with X = {t} and Y = V – {s, t} until Y = Ø.
5) The cut giving the minimum ratio found in the first procedure forms the initial
partitioning.
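Step 1 of the initialization, finding the second seed t at the end of a longest breadth-first path from s, might be sketched as follows. The adjacency-list layout and the name farthestByBfs are assumptions made for this example; the program itself stores the network in Node and Edge arrays.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Returns a node at maximum breadth-first distance from s.
// adj[v] lists the neighbours of node v (hypothetical layout for this sketch).
// Nodes unreachable from s are ignored; ties go to the node reached first.
int farthestByBfs(const std::vector<std::vector<int>>& adj, int s)
{
    assert(s >= 0 && s < static_cast<int>(adj.size()));
    std::vector<int> dist(adj.size(), -1);  // -1 marks "not yet visited"
    std::queue<int> frontier;
    dist[s] = 0;
    frontier.push(s);
    int farthest = s;
    while (!frontier.empty()) {
        int v = frontier.front();
        frontier.pop();
        if (dist[v] > dist[farthest]) farthest = v;
        for (int w : adj[v]) {
            if (dist[w] == -1) {
                dist[w] = dist[v] + 1;
                frontier.push(w);
            }
        }
    }
    return farthest;
}
```

On a four-node chain 0-1-2-3, starting from s = 0 gives t = 3, so the two seeds sit at opposite ends of the network.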
Iterative Shifting:
Input: an initial partitioning found in the direction from s to t.
Output: a new partitioning if the ratio of the initial partitioning has been reduced.
1) Repeat right shifting operations until all the modules are exhausted.
2) Choose the minimal ratio obtained in step 1. If the new ratio value is reduced
from step 1, then the cut producing the ratio forms a new starting partition;
otherwise, output the previous partition and exit the process.
3) Repeat steps 1 and 2 with left shifting operations.
4) Repeat steps 1-3.
Group Swapping:
Input: a partitioning from the iterative shifting phase.
Output: the final partitioning result.
1) Calculate the ratio gain r(i) for every module i, and set all modules to the
"unlocked" state.
2) Select an unlocked module i with the largest ratio gain from the two subsets.
3) Move module i to the other side, and lock it.
4) Update the ratio gains of the remaining affected and unlocked modules.
5) Repeat steps 2-4 until all modules are locked.
6) If the largest accumulated ratio gain during this process is positive, swap the
group of modules corresponding to the largest gain.
7) Repeat steps 1-6 until the accumulated ratio gain in step 6 is nonpositive.
Fiduccia-Mattheyses Partitioning:
The Fiduccia-Mattheyses algorithm is a Kernighan-Lin based algorithm which partitions
a circuit into two sets by moving a single node from one set to the other at each step.
Since only one node is moved at a time, the circuit can become unbalanced, and therefore
a balance condition is set. A node may be moved from one partition to the other only if
the balance condition is still met. The balance condition is r = |A|/(|A| + |B|), where
the ratio r is user defined.
The Fiduccia-Mattheyses Partitioning algorithm:
1. Compute the gains of all cells.
2. Select the cell with the maximum gain that satisfies the balance criterion.
3. Move the cell to the other side and lock it. Update the gains of the rest of the
cells.
4. If any unlocked cells that satisfy the balance condition remain, go to
step 2.
5. Select the sequence of moves which resulted in the best partial sum.
6. If the best partial sum was positive, then unlock all the cells and go to step 1.
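The six steps above can be sketched in a simplified form. This is not the FM class described later in this report: WEdge, cutWeight, gainOf, fmPass and fiducciaMattheyses are hypothetical names, the balance condition is expressed as assumed bounds minA and maxA on |A|, and gains are recomputed from scratch at every step instead of being updated incrementally with bucket lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-pin weighted edge; not the report's Edge struct.
struct WEdge { int u; int v; int weight; };

// Total weight of edges whose endpoints lie on different sides.
int cutWeight(const std::vector<WEdge>& edges, const std::vector<int>& side)
{
    int cut = 0;
    for (const WEdge& e : edges)
        if (side[e.u] != side[e.v]) cut += e.weight;
    return cut;
}

// Gain of moving node n: cut weight removed minus cut weight created.
static int gainOf(int n, const std::vector<WEdge>& edges, const std::vector<int>& side)
{
    int g = 0;
    for (const WEdge& e : edges) {
        if (e.u == e.v || (e.u != n && e.v != n)) continue;
        g += (side[e.u] != side[e.v]) ? e.weight : -e.weight;
    }
    return g;
}

// One pass (steps 1-5). side[i] is 0 for A and 1 for B; |A| must stay in
// [minA, maxA] (an assumed form of the balance condition). Returns the best
// partial sum and leaves side at the corresponding prefix of moves.
int fmPass(const std::vector<WEdge>& edges, std::vector<int>& side, int minA, int maxA)
{
    assert(minA <= maxA);
    const int n = static_cast<int>(side.size());
    std::vector<bool> locked(n, false);
    std::vector<int> moved;
    int sizeA = 0;
    for (int s : side) if (s == 0) ++sizeA;
    int running = 0, best = 0;
    std::size_t bestLen = 0;
    for (;;) {
        int pick = -1, pickGain = 0;
        for (int i = 0; i < n; ++i) {                   // step 2: best legal move
            if (locked[i]) continue;
            int newA = sizeA + (side[i] == 0 ? -1 : 1);
            if (newA < minA || newA > maxA) continue;   // would break balance
            int g = gainOf(i, edges, side);             // naive gain recomputation
            if (pick == -1 || g > pickGain) { pick = i; pickGain = g; }
        }
        if (pick == -1) break;                          // step 4: nothing left to move
        sizeA += (side[pick] == 0 ? -1 : 1);            // step 3: move and lock
        side[pick] ^= 1;
        locked[pick] = true;
        moved.push_back(pick);
        running += pickGain;
        if (running > best) { best = running; bestLen = moved.size(); }
    }
    // Step 5: undo every move made after the best partial sum.
    for (std::size_t k = moved.size(); k > bestLen; --k) side[moved[k - 1]] ^= 1;
    return best;
}

// Step 6: repeat passes while they improve the cut. Returns the total reduction.
int fiducciaMattheyses(const std::vector<WEdge>& edges, std::vector<int>& side,
                       int minA, int maxA)
{
    int total = 0;
    for (int g; (g = fmPass(edges, side, minA, maxA)) > 0;) total += g;
    return total;
}
```

On two triangles joined by a single edge, starting from an alternating assignment with cut size 5 and allowing |A| between 2 and 4, this sketch separates the two triangles and ends with cut size 1.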
The Stable Two-Way Partitioning Algorithm:
First, partition the network into approximately g clusters using ratio cut recursively.
Then form a contracted network from these clusters, where each cluster is represented by
a node, and the edges between the clusters are represented by edges between nodes.
Partition these clusters using the specified balance conditions with Fiduccia-Mattheyses
partitioning. Since the number of clusters is small, and the Fiduccia-Mattheyses
algorithm is dependent on the initial partitioning, try the algorithm a number of times
with various starting partitions. Finally, convert the contracted network back to the full
network and apply Fiduccia-Mattheyses once.
Given: a network N = (V, E), an integer g, an integer num_of_reps, and a ratio r.
1) Initialize Ψ = {V}. Calculate the total circuit size tsize.
2) Let V* be a subset in Ψ such that |V*| = max |Vi| over all Vi ∈ Ψ. While |V*| > tsize/g,
repeat step 3.
3) Set Ψ = Ψ – {V*}. Apply the ratio cut algorithm to V* to get a cut (A, A') where
V* = A U A'. Set Ψ = Ψ U {A, A'}.
4) Construct a contracted network H = (V', E').
5) Apply the Fiduccia-Mattheyses algorithm to H num_of_reps times with the
size constraint r.
6) Use the best result from step 5 to partition the network N. Apply the Fiduccia-Mattheyses algorithm once to N with size constraint r.
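The clustering loop that repeatedly splits the largest subset can be sketched as follows. The ratio cut itself is passed in as a callback so the loop can be shown in isolation; Cluster, Splitter and clusterNetwork are hypothetical names, and tsize is assumed to be the number of nodes (unit node sizes).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Cluster = std::vector<int>;  // node indices in one cluster
// Stand-in for the ratio cut: splits one cluster into two non-empty parts.
using Splitter = std::function<std::pair<Cluster, Cluster>(const Cluster&)>;

// Splits the largest cluster until none is larger than tsize / g.
// tsize is taken to be numNodes here (an assumption: every node has size 1).
std::vector<Cluster> clusterNetwork(int numNodes, int g, const Splitter& split)
{
    assert(numNodes > 0 && g > 0);
    Cluster all(numNodes);
    for (int i = 0; i < numNodes; ++i) all[i] = i;
    std::vector<Cluster> psi{all};                      // Psi = {V}
    for (;;) {
        auto largest = std::max_element(
            psi.begin(), psi.end(),
            [](const Cluster& a, const Cluster& b) { return a.size() < b.size(); });
        // Test |V*| > tsize / g without integer division.
        if (largest->size() * static_cast<std::size_t>(g) <=
            static_cast<std::size_t>(numNodes)) break;
        if (largest->size() < 2) break;                 // a single node cannot be split
        std::pair<Cluster, Cluster> parts = split(*largest);
        assert(!parts.first.empty() && !parts.second.empty());
        *largest = std::move(parts.first);              // Psi = Psi - {V*} U {A, A'}
        psi.push_back(std::move(parts.second));
    }
    return psi;
}
```

With 8 nodes, g = 4 and a splitter that simply halves a cluster, the loop stops with four clusters of two nodes each. With a real ratio cut the cluster sizes would generally be uneven, which is why the method yields only approximately g clusters.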
Program Structure:
The program uses three different classes: the main Partition class, an FM class,
and an RC class.
Global Structures
The two structures below are used to keep track of the network and are passed to the two
classes to do the work.
struct Node
{
    int index;
    int* edges;
    int numEdges;
    int superNode;
    int tempIndex;
};
struct Edge
{
    int index;
    int weight;
    int node[2];
    int tempIndex;
};
The enumeration Partition is used for clarity.
enum Partition
{
    A, B, UNKNOWN, BOTH, IGNORE
};
Main (Partition) class:
The Partition class first reads in the nodes from the file. It forms an array of Node
and Edge objects. Portions of this array are then passed to the ratio cut algorithm, which
sorts the array and returns indexes of the start of each partition. The superNode value for
each of the nodes is set accordingly. The partitions are split until all are smaller than the
required size. A second array of nodes is then formed. Each node in the second array
represents all nodes in the first array whose superNode value equals that node's
array index. A second array of edges is formed. The new arrays are passed to the FM
class to sort with various starting positions. Finally, the best of the supernode
partitionings is used to sort the original array of nodes. The original array is then passed
to the FM algorithm to sort into final partitions.
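Forming the second array of edges from the superNode values might look like the sketch below. The Edge struct is copied from the Global Structures section; contractEdges is a hypothetical name, and summing the weights of parallel edges between two clusters is an assumption, since the report does not say how such edges are combined. The meaning of tempIndex is not described, so the sketch leaves it at -1.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

struct Edge
{
    int index;
    int weight;
    int node[2];
    int tempIndex;
};

// Builds the edges of the contracted network. superNode[i] is the cluster
// index of original node i. Edges inside one cluster are dropped; edges
// between the same pair of clusters are merged by summing their weights
// (an assumption made for this sketch).
std::vector<Edge> contractEdges(const std::vector<Edge>& edges,
                                const std::vector<int>& superNode)
{
    const int n = static_cast<int>(superNode.size());
    std::map<std::pair<int, int>, int> merged;  // (clusterA, clusterB) -> weight
    for (const Edge& e : edges) {
        assert(e.node[0] >= 0 && e.node[0] < n);
        assert(e.node[1] >= 0 && e.node[1] < n);
        int a = superNode[e.node[0]];
        int b = superNode[e.node[1]];
        if (a == b) continue;                   // internal to one cluster
        if (a > b) std::swap(a, b);             // normalize the pair ordering
        merged[std::make_pair(a, b)] += e.weight;
    }
    std::vector<Edge> contracted;
    for (const auto& entry : merged) {
        Edge c;
        c.index = static_cast<int>(contracted.size());
        c.weight = entry.second;
        c.node[0] = entry.first.first;
        c.node[1] = entry.first.second;
        c.tempIndex = -1;                       // unused in this sketch
        contracted.push_back(c);
    }
    return contracted;
}
```

For example, if nodes 0-2 belong to cluster 0 and nodes 3-4 to cluster 1, two crossing edges of weights 1 and 2 collapse into a single contracted edge of weight 3 between supernodes 0 and 1.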
RC class:
The RC class has 3 structures:
struct RCNode
{
    Node node;
    bool locked;
};
struct RCEdge
{
    Edge edge;
    Partition part;
};
struct RCLinkedList
{
    int listItem;
    RCLinkedList* prev;
    RCLinkedList* next;
};
The RCNode and RCEdge structures keep track of extra information related to the
RC algorithm, and the linked list structure allows RCNodes and RCEdges to be formed
into lists easily. The class has a number of functions for manipulating linked lists.
Additionally the linked lists are often used as sets, and thus a few set functions are
defined for the linked lists.
It was noted that the initialization and iterative shifting phases of the Ratio Cut
algorithm are simply special cases of the group swapping portion of the algorithm. Thus,
the RC class has a function called ShiftEngine with parameters to specify the direction
in which shifting should occur, whether shifting in both directions is allowed, and whether
the shift should be performed once or until there is no gain. The Initialization, IterativeShifting, and
GroupSwapping algorithms simply call the ShiftEngine with the appropriate parameters.
FM class:
The FM class has 2 structures:
struct FMNode
{
    Node node;
    bool locked;
    Partition part;
    int gain;
};
struct FMEdge
{
    Edge edge;
    Partition part;
};
These structures keep track of information used by the Fiduccia-Mattheyses algorithm.
Results and Discussion:
The algorithm was evaluated using the test files provided for KL partitioning, and
a couple of files generated specifically to test the stable algorithm. The ratio cut portion
of the program tended to perform somewhat poorly on the KL test files, often choosing a
partitioning with one node on one side, and the rest on the other. This is likely because
the files were generated with fairly large weights relative to the number of nodes, and no
natural clusters. The files I generated to test the algorithm had a small number of tightly
connected clusters of nodes, which were easily identified by the ratio cut algorithm.
Neither set of test files, however, is likely to mimic real networks very closely.
The overall algorithm tended to achieve results similar to or slightly better than
KL with both sets of files. Because the goal was to achieve stability, and not necessarily
performance, the RC algorithm is not nearly as optimized as it should be. Therefore my
program tended to have very long run times with circuits larger than about 100 nodes.
This made it difficult to test the performance vs. KL in larger circuits which is where the
algorithm is actually supposed to achieve better results. The algorithm did prove to be
very stable, finding similar cut sizes, and often even the same final partitions with
different input orderings.
Reference:
Cheng, Chung-Kuan and Wei, Yen-Chuen A. An Improved Two-Way Partitioning
Algorithm with Stable Performance. IEEE Trans. CAD, Vol. 10, No. 12, Dec. 1991.
Cheng, Chung-Kuan and Wei, Yen-Chuen. Ratio Cut Partitioning for
Hierarchical Designs. IEEE Trans. CAD, Vol. 10, No. 7, July 1991.
Sait, Sadiq M. and Youssef, Habib. VLSI Physical Design Automation: Theory and
Practice. World Scientific Publishing Co. Pte. Ltd., 5 Toh Tuck Link, Singapore, 2004.