CSCI 333: Kruskal’s Algorithm Overview Prim’s algorithm builds upon a single partial MST, at each step adding an edge connecting the vertex nearest to but not already in the current partial minimum spanning tree. Kruskal’s algorithm, on the other hand, maintains a set of partial minimum spanning trees, and repeatedly adds the shortest edge in the graph whose vertices are in different partial minimum spanning trees. High-level Prim’s algorithm: Let: F be a subset of edges, Y be a subset of vertices, and G be the graph such that G=<E,V> F=0 Y = {v1} while (the instance is not solved) { select a vertex in V-Y that is nearest to Y add the vertex to Y add the edge to F if (Y == V) done } Kruskal’s high-level algorithm: F=0 while (the instance is not solved) { select the next locally optimal edge if (the edge connects two vertices in disjoint subsets) { merge the subsets; add the edge to F; } } if (all the subsets are merged) done Kruskal’s algorithm for the Minimum Spanning Tree problem starts by creating disjoint subsets of V, one for each vertex and containing only that vertex. It then inspects the edges according to non-decreasing weight. If an edge connects two vertices in disjoint subsets, the edge is added and the subsets are merged into one set. This process is repeated until all the subsets are merged into one set. Kurskal’s Algorithm In Kruskal’s algorithm, we have to handle multiple sets: the nodes in each connected component. We need to carry out 2 operations: find(x), where x is a node, and merge(A,B) to merge two disjoint sets. We will use a disjoint set abstract data type, which consists of data types index and set pointer, and routines initial, find, merge, and equal. If we declare index i ; set pointer p, q ; Then initial(n) initializes n disjoint subsets, each of which contains exactly one of the indices between 1 and n. p = find (i) makes p point to the set containing index i. merge (p, q) merges the two sets, to which p and q point, into the set. equal (p, q) returns true if p and q both point to the same set. Data structures for Disjoint Sets are covered in Appendix C (page 590). For an on-line discussion click here. void kruskal (int n, int m, set_of_edges E, set_of_edges& F) { index I, j; set_pointer p, q; edge e; sort the m edges in E by weight in non-decreasing order F = 0 initial(n) while (number of edges in F is less than n-1) { e = edge with least weight not yet considered; I, j = indices of vertices connected by e p = find(i) q = find (j) if ( ! equal(p,q)) { merge (p, q) add e to F } } } To understand Kruskal’s algorithm better consider an example. First, we sort the edges by increasing order of weight [(v1, v2), (v3, v5), (v1, v3), (v2, v3), (v4, v5), (v2, v4)]. Then disjoint sets are created. After that we select edge (v1, v2) because it is the least weight, then edge (v3, v5), then edge (v1, v3), edge (v2, v3) is not added because it will create a cycle. Then edge (v3, v4) is selected. Time Complexity Let n be the number of vertices, and m the number of edges. The edges can be sorted in worse-case time (m lg m) using mergesort as described in chapter 2. The disjoint sets can be initialized in time (n). Each pass through the while loop, we do two find operations and potentially one merge operation. In the worse case, there are m passes through the while loop. According to Appendix C (page 597), the worse case number of comparisons done in m passes through a loop containing a constant number of calls to equal, find and merge is (m lg m). Therefore the time complexity is (m lg m) where m is the number of edges. In the case of a fully connected graph, where every vertex is connected to every other vertex, m = (n)(n-1)/2 = (n2) Therefore, in the worse case, Kruskal’s algorithm is order: (n2 ln n2) = (n2 ln n) Comparing Prim’s Algorithm with Kruskal’s Algorithm A given graph G can be fully connected or a connected graph G can have as few as |V|-1 edges, hence the relationship between the number of edges and the number of vertices in a connected graph is |V|-1 <= |E| <= (|V|-1)|V|/2. If a graph has large number of edges, i.e. |E| ~= |V|2, then the running time of Kruskal’s Algorithm becomes (|V|2log |V|) therefore Prim’s algorithm is better. If a graph has small number of edges, i.e. |E| ~= |V|, then the running time of Kruskal’s Algorithm becomes (|V| log |V|) therefore Kruskal’s algorithm is better. Proof Again, a proof is necessary is establish that an optimal solution is produced by a greedy algorithm. The proof by induction, similar to the proof for Prim’s algorithm, is on page 154 of your text. Work problems 6 and 9 on page 182 of your text.