Kruskal's Algorithm

This algorithm builds a forest of trees. Initially the forest consists of n single-node trees (and no edges). At each step, we add the cheapest edge that joins two trees together. If an edge would form a cycle, it would simply link two nodes that are already part of the same connected tree, so that edge is not needed.

The steps are:

1. Construct a forest, with each node in a separate tree.
2. Place the edges in a priority queue.
3. Until we've added n-1 edges:
   a. Extract the cheapest edge from the queue, repeating until we find one that does not form a cycle.
   b. Add it to the forest; adding it joins two trees together.

Every such step joins two trees in the forest, so at the end only one tree remains in T.

The following sequence of diagrams illustrates Kruskal's algorithm in operation (captions reproduced here):

- gh is shortest. Either g or h could be the representative; g is chosen arbitrarily.
- ci creates a second tree; c is chosen as its representative.
- fg is next shortest. Add it; choose g as representative.
- ab creates a third tree.
- Add cf, merging two trees; c is chosen as the representative.
- gi is next cheapest, but it would create a cycle: c is the representative of both ends. Add cd instead.
- hi would make a cycle. Add ah instead.
- bc would create a cycle. Add de instead to complete the spanning tree.
- All trees are joined; c is the sole representative.

The very nature of greedy algorithms makes them difficult to prove correct. We choose the step that maximises the immediate gain (in the case of the minimum spanning tree, the smallest possible addition to the total cost so far) without regard for the effect of this choice on the remainder of the problem. So the commonest method of proving a greedy algorithm correct is proof by contradiction: we show that if we did not make the "greedy" choice, then, in the end, we would find that we should have made it.
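The steps above can be sketched in Python. This is a minimal illustration, not the notes' own code: a sorted list stands in for the priority queue, and a simple union-find (disjoint-set) structure tracks each tree's representative, as in the diagrams.

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.

    n     -- number of vertices, labelled 0 .. n-1
    edges -- list of (weight, u, v) tuples
    Returns the list of edges chosen for the spanning tree.
    """
    # Step 1: a forest of n single-node trees; each node starts
    # as its own representative.
    parent = list(range(n))

    def find(x):
        # Follow parent links to the tree's representative,
        # compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Step 2: examine edges cheapest-first (sorting replaces the
    # priority queue).
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                # different trees: no cycle
            parent[ru] = rv         # merge the two trees
            tree.append((w, u, v))
            if len(tree) == n - 1:  # step 3: n-1 edges added
                break
    return tree
```

Edges whose endpoints share a representative are simply skipped, which is exactly the cycle test described above.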
The Minimum Spanning Tree Algorithm

At each step of the MST algorithm, we choose the cheapest edge that would not create a cycle. We can easily establish that an edge creating a cycle should not be added: the cycle-completing edge is more expensive than any previously added edge, and the nodes it connects are already joined by some path. Thus it is redundant and can be left out.

Each edge that we add must join two sub-trees. If the next cheapest edge, ex, would join two sub-trees, Ta and Tb, but we do not add it, then at some later stage we must use a more expensive edge, ey, to join Ta to Tb, either directly or by joining a node of one of them to a node that is by then connected to the other. But we can join Ta to Tb (and any nodes now connected to them) more cheaply by using ex, which proves the proposition that we should choose the cheapest edge at each stage.

Complexity

The steps in Kruskal's algorithm are:

  Initialize the forest                     O(|V|)
  Sort the edges                            O(|E| log |E|)
  Until we've added |V|-1 edges             O(|V|) x
    Check whether an edge forms a cycle       O(|V|)   = O(|V|²)

  Total                                     O(|V| + |E| log |E| + |V|²)
  Since |E| = O(|V|²)                       O(|V|² log |V|)

Thus, we may refer to Kruskal's algorithm as an O(n² log n) algorithm, where n is the number of vertices. However, note that if |E| is similar to |V|, then the complexity is O(n²).

This emphasises the point that no good software engineer tries to re-invent wheels: keep a good algorithms text in your library and make sure to refer to it before attempting to program a new problem!
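To see where the O(|V|) cycle check per edge in the analysis above comes from, here is a sketch of Kruskal's algorithm with the simple bookkeeping that analysis assumes: every vertex stores the label of its tree, so the cycle test itself is a single comparison, but merging two trees relabels up to |V| vertices. (This is an illustrative assumption; with a union-find structure the per-edge cost is much lower.)

```python
def mst_naive(n, edges):
    """Kruskal's algorithm with component-label bookkeeping.

    Each vertex carries the label of the tree containing it, so
    checking an edge is O(1) but merging two trees is O(|V|),
    giving the O(|V|^2) term in the complexity table above.
    """
    label = list(range(n))          # initialise the forest: O(|V|)
    tree = []
    for w, u, v in sorted(edges):   # sort the edges: O(|E| log |E|)
        if label[u] != label[v]:    # cycle test: same label => cycle
            old, new = label[u], label[v]
            for i in range(n):      # merge by relabelling: O(|V|)
                if label[i] == old:
                    label[i] = new
            tree.append((w, u, v))
            if len(tree) == n - 1:
                break
    return tree
```

Summing the parts gives O(|V| + |E| log |E| + |V|²), exactly as in the table.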