Sean Griffin 901-688-4075 Page 1 2/17/2016 ECE 556

Two-Way Partitioning with Stable Performance

Abstract:
Traditional Kernighan-Lin based partitioning algorithms produce very good results from some initial groupings. However, the results often degrade when the initial grouping is less favorable. If the circuit is first divided into a small number of tightly connected clusters, the clusters can then be partitioned instead of the entire circuit. This greatly reduces the complexity of the partitioning problem, so a good partitioning solution can be found with fewer computations.

Motivation:
The Kernighan-Lin algorithm has been shown to be highly sensitive to the initial partitioning. A common method of ensuring that Kernighan-Lin based algorithms find a good solution is simply to run the algorithm many times from various random starting partitions. However, this requires significantly more computation. Additionally, the probability of finding a global minimum solution from a given starting condition falls exponentially as the size of the chip increases, so the polynomial-time algorithm is effectively reduced to an exponential-time one.

If time were spent finding a good initial partitioning for Kernighan-Lin, only one run of the algorithm would be required. This initial partitioning could be obtained using clustering. If the circuit is divided into a small number of tightly connected clusters, the clusters can then be partitioned at very small cost. Also, because the number of clusters is small, the probability of finding a global minimum cut between the clusters is high. The partitioned clusters can then be used as the starting position for the full partitioning.

Methods:

Ratio Cut:
The algorithm used to divide a circuit into tightly connected clusters is the ratio cut. This algorithm uses the cost function C_AB / (|A| * |B|), where C_AB is the size of the cut between sets A and B and |A| is the cardinality of set A. For a fixed total number of nodes, the product |A| * |B| is largest when A and B are the same size.
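To make the cost function concrete, here is a minimal C++ sketch of it. The function name ratioCutCost and its parameters are hypothetical and chosen only for illustration; they are not taken from the program described later in this report.

```cpp
#include <cassert>

// Hypothetical helper (not part of the original program): evaluates the
// ratio cut cost C_AB / (|A| * |B|) for a cut of total weight cutSize
// separating sets that contain sizeA and sizeB nodes.
double ratioCutCost(int cutSize, int sizeA, int sizeB) {
    // Both sides must be non-empty, otherwise the ratio is undefined.
    assert(sizeA > 0 && sizeB > 0);
    return static_cast<double>(cutSize) /
           (static_cast<double>(sizeA) * static_cast<double>(sizeB));
}
```

For an 8-node circuit and a cut size of 2, a 4/4 split costs 2/16 = 0.125, while a 1/7 split costs 2/7, or about 0.286. With the cut size held equal, the balanced split therefore has the lower cost.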
Thus, the cost function favors more balanced cuts, and the ratio cut usually generates a more even partitioning than a min-cut algorithm. However, the two partitions are not required to be equal in size.

The algorithm for finding the ratio cut has three parts, as follows:

Initialization:
Input: a network N = (V, E).
Output: an initial partitioning.
1) Randomly choose a module s in V. Find the other seed module t at the end of a longest path by breadth-first search starting from s. Let X = {s} and Y = V – {s, t}.
2) Choose a module i in Y whose movement to X will generate the best ratio among all the competing modules. Move module i from Y to X; update X = X U {i} and Y = Y – {i}.
3) Repeat step 2 until Y = Ø.
4) Repeat steps 2 and 3 with X = {t} and Y = V – {s, t} until Y = Ø.
5) The cut giving the minimum ratio found in the first procedure forms the initial partitioning.

Iterative Shifting:
Input: an initial partitioning found in the direction from s to t.
Output: a new partitioning if the ratio of the initial partitioning has been reduced.
1) Repeat right shifting operations until all the modules are exhausted.
2) Choose the minimal ratio obtained in step 1. If the new ratio value is reduced from step 1, then the cut producing the ratio forms a new starting partition; otherwise, output the previous partition and exit the process.
3) Repeat steps 1 and 2 with left shifting operations.
4) Repeat steps 1-3.

Group Swapping:
Input: a partitioning from the iterative shifting phase.
Output: the final partitioning result.
1) Calculate the ratio gain r(i) for every module i, and set all modules to the "unlocked" state.
2) Select an unlocked module i with the largest ratio gain from the two subsets.
3) Move module i to the other side, and lock it.
4) Update the ratio gains of the remaining affected and unlocked modules.
5) Repeat steps 2-4 until all modules are locked.
6) If the largest accumulated ratio gain during this process is positive, swap the group of modules corresponding to the largest gain.
7) Repeat steps 1-6 until the accumulated ratio gain in step 6 is nonpositive.

Fiduccia-Mattheyses Partitioning:
A Kernighan-Lin style algorithm that partitions a circuit into two sets by moving one node at a time from one set to the other. Since only a single node is moved at each step, rather than a pair being swapped, the circuit can become unbalanced, and therefore a balance condition is set. A node may be moved from one partition to the other only if the balance condition is still met. The balance condition is r = |A|/(|A| + |B|), where the ratio r is user defined.

The Fiduccia-Mattheyses Partitioning algorithm:
1. Compute the gains of all cells.
2. Select the cell with the maximum gain that satisfies the balance criterion.
3. Move the cell to the other side and lock it. Update the gains of the rest of the cells.
4. If there are remaining cells which are unlocked and satisfy the balance condition, go to step 2.
5. Select the sequence of moves which resulted in the best partial sum.
6. If the best partial sum was positive, then unlock all the cells and go to step 1.

The Stable Two-Way Partitioning Algorithm:
First, partition the network into approximately g clusters by applying ratio cut recursively. Then form a contracted network from these clusters, where each cluster is represented by a node and the edges between the clusters are represented by edges between nodes. Partition these clusters with Fiduccia-Mattheyses partitioning under the specified balance condition. Since the number of clusters is small, and the Fiduccia-Mattheyses algorithm is dependent on the initial partitioning, try the algorithm a number of times with various starting partitions. Finally, convert the contracted network back to the full network and apply Fiduccia-Mattheyses once.
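The contraction step in this description can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the program's actual code: the SketchEdge type and the contractNetwork function are hypothetical names, every edge is assumed to connect exactly two nodes, and parallel edges between the same pair of clusters are assumed to be merged by summing their weights.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical minimal edge type for this sketch: a weighted two-pin edge.
struct SketchEdge {
    int u;       // index of the first endpoint
    int v;       // index of the second endpoint
    int weight;  // edge weight
};

// Builds the edge list of a contracted network, where cluster[i] is the
// cluster (supernode) index of node i. Edges inside one cluster vanish.
// Assumption: parallel edges between two clusters are merged by summing
// their weights.
std::vector<SketchEdge> contractNetwork(const std::vector<SketchEdge>& edges,
                                        const std::vector<int>& cluster) {
    const int numNodes = static_cast<int>(cluster.size());
    std::map<std::pair<int, int>, int> merged;  // (low, high) cluster pair -> weight
    for (const SketchEdge& e : edges) {
        assert(e.u >= 0 && e.u < numNodes);
        assert(e.v >= 0 && e.v < numNodes);
        const int cu = cluster[e.u];
        const int cv = cluster[e.v];
        if (cu == cv) {
            continue;  // internal edge: not visible after contraction
        }
        merged[std::make_pair(std::min(cu, cv), std::max(cu, cv))] += e.weight;
    }
    std::vector<SketchEdge> result;
    for (const auto& entry : merged) {
        result.push_back({entry.first.first, entry.first.second, entry.second});
    }
    return result;
}
```

For example, with four nodes clustered as {0, 1} and {2, 3}, and edges 0-1 (weight 3), 1-2 (weight 2), 0-3 (weight 1), and 2-3 (weight 5), the contracted network has a single edge of weight 3 between the two cluster nodes. Because the contracted network has only about g nodes, running Fiduccia-Mattheyses on it many times is cheap.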
Given: a network N = (V, E), an integer g, an integer num_of_reps, and a ratio r.
1) Initialize Ψ = {V}. Calculate the total circuit size tsize.
2) Let V* be a largest subset in Ψ, that is, |V*| = max over Vi ∈ Ψ of |Vi|. While |V*| > tsize/g, repeat step 3.
3) Set Ψ = Ψ – {V*}. Apply the ratio cut algorithm to V* to get a cut (A, A'), where V* = A U A'. Set Ψ = Ψ U {A, A'}.
4) Construct a contracted network H = (V', E'), in which each subset in Ψ becomes a single node.
5) Apply the Fiduccia-Mattheyses algorithm to H num_of_reps times with the size constraint r.
6) Use the best result from step 5 to partition the network N. Apply the Fiduccia-Mattheyses algorithm once to N with size constraint r.

Program Structure:
The program uses three classes: the main Partition class, an FM class, and an RC class.

Global Structures:
The two structures below are used to keep track of the network and are passed to the two classes to do the work.

    struct Node {
        int index;
        int* edges;
        int numEdges;
        int superNode;
        int tempIndex;
    };

    struct Edge {
        int index;
        int weight;
        int node[2];
        int tempIndex;
    };

The enumeration Partition is used for clarity.

    enum Partition { A, B, UNKNOWN, BOTH, IGNORE };

Main (Partition) class:
The Partition class first reads in the nodes from the file and forms arrays of Node and Edge objects. Portions of the node array are then passed to the ratio cut algorithm, which sorts the array and returns the indexes of the start of each partition. The superNode value of each node is set accordingly. The partitions are split until all are smaller than the required size. A second array of nodes is then formed. Each node in the second array represents all nodes in the first array whose superNode value equals that node's array index. A second array of edges is formed in the same way. The new arrays are passed to the FM class to sort with various starting positions. Finally, the best of the supernode partitionings is used to sort the original array of nodes.
The original array is then passed to the FM algorithm to sort into final partitions.

RC class:
The RC class has 3 structures:

    struct RCNode {
        Node node;
        bool locked;
    };

    struct RCEdge {
        Edge edge;
        Partition part;
    };

    struct RCLinkedList {
        int listItem;
        RCLinkedList* prev;
        RCLinkedList* next;
    };

The RCNode and RCEdge structures keep track of extra information related to the RC algorithm, and the linked list structure allows RCNodes and RCEdges to be formed into lists easily. The class has a number of functions for manipulating linked lists. Additionally, the linked lists are often used as sets, and thus a few set functions are defined for them.

It was noted that the initialization and iterative shifting phases of the ratio cut algorithm are simply special cases of the group swapping portion of the algorithm. Thus, the RC class has a function called ShiftEngine with parameters that specify the direction in which shifting should occur, whether shifting in both directions is allowed, and whether the shift should be performed once or until there is no gain. The Initialization, IterativeShifting, and GroupSwapping algorithms simply call ShiftEngine with the appropriate parameters.

FM class:
The FM class has 2 structures:

    struct FMNode {
        Node node;
        bool locked;
        Partition part;
        int gain;
    };

    struct FMEdge {
        Edge edge;
        Partition part;
    };

These structures keep track of information used by the Fiduccia-Mattheyses algorithm.

Results and Discussion:
The algorithm was evaluated using the test files provided for KL partitioning and a couple of files generated specifically to test the stable algorithm. The ratio cut portion of the program tended to perform somewhat poorly on the KL test files, often choosing a partitioning with one node on one side and the rest on the other.
This is likely because the files were generated with fairly large weights relative to the number of nodes, and with no natural clusters. The files I generated to test the algorithm had a small number of tightly connected groups of nodes, which were easily identified by the ratio cut algorithm. Neither set of test files, however, is likely to mimic real networks very closely. The overall algorithm tended to achieve results similar to or slightly better than KL with both sets of files.

Because the goal was to achieve stability, and not necessarily performance, the RC algorithm is not nearly as optimized as it should be. Therefore my program tended to have very long run times on circuits larger than about 100 nodes. This made it difficult to test the performance against KL on larger circuits, which is where the algorithm is actually supposed to achieve better results. The algorithm did prove to be very stable, finding similar cut sizes, and often even the same final partitions, with different input orderings.

Reference:
Cheng, Chung-Kuan and Wei, Yen-Chuen A. An Improved Two-Way Partitioning Algorithm with Stable Performance. IEEE Trans. CAD, Vol. 10, No. 12, Dec. 1991.
Cheng, Chung-Kuan and Wei, Yen-Chuen. Ratio Cut Partitioning for Hierarchical Designs. IEEE Trans. CAD, Vol. 10, No. 7, July 1991.
Sait, Sadiq M. and Youssef, Habib. VLSI Physical Design Automation: Theory and Practice. World Scientific Publishing Co. Pte. Ltd., 5 Toh Tuck Link, Singapore, 2004.