
Kruskal's Algorithm
This algorithm creates a forest of trees. Initially the forest consists of n single-node trees (and no edges).
At each step, we add the cheapest edge that joins two trees together. If an edge were to form a cycle, it
would simply link two nodes that are already part of a single connected tree, so it is not needed.
The steps are:
1. Construct a forest - with each node in a separate tree.
2. Place the edges in a priority queue.
3. Until we've added n-1 edges,
   1. Continue extracting the cheapest edge from the queue, until we find one that does not form a cycle,
   2. Add it to the forest. Adding it to the forest will join two trees together.
Every step joins two trees in the forest together, so that, at the end, only one tree will remain in T.
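The steps above might be sketched in C along the following lines. This is a minimal sketch, assuming the graph arrives as an array of weighted edges; the priority queue of step 2 is replaced by a one-off sort, which serves the same purpose here, and the names Edge, rep, find, kruskal and the MAX_V bound are ours rather than the notes'.

    #include <stdlib.h>

    #define MAX_V 100            /* assumed upper bound on the number of vertices */

    typedef struct {
        int u, v;                /* end-points of the edge */
        double w;                /* its cost */
    } Edge;

    static int rep[MAX_V];       /* rep[x] links towards the representative of x's tree */

    /* Follow the links until we reach a node that is its own representative. */
    static int find(int x)
    {
        while (rep[x] != x)
            x = rep[x];
        return x;
    }

    /* qsort comparison: cheapest edge first. */
    static int cmp_edge(const void *a, const void *b)
    {
        const Edge *ea = a, *eb = b;
        return (ea->w > eb->w) - (ea->w < eb->w);
    }

    /* Build a minimum spanning tree of the n-vertex graph described by the m
     * edges in edges[].  Chosen edges are copied into mst[]; the number chosen
     * is returned (n-1 if the graph is connected, fewer otherwise).           */
    int kruskal(Edge edges[], int m, int n, Edge mst[])
    {
        int i, added = 0;

        for (i = 0; i < n; i++)                    /* step 1: n single-node trees */
            rep[i] = i;

        qsort(edges, m, sizeof(Edge), cmp_edge);   /* step 2: order edges by cost */

        for (i = 0; i < m && added < n - 1; i++) { /* step 3 */
            int ru = find(edges[i].u), rv = find(edges[i].v);
            if (ru == rv)
                continue;          /* same representative: the edge would form a cycle */
            rep[rv] = ru;          /* join the two trees; ru is the new representative */
            mst[added++] = edges[i];
        }
        return added;
    }

The rep[] array records, for each node, a link towards the representative of its tree; two end-points with the same representative already lie in the same tree, which is how the cycle test of step 3 is performed.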
The following sequence of diagrams illustrates Kruskal's algorithm in operation; the captions read:
1. gh is shortest. Either g or h could be the representative; g is chosen arbitrarily.
2. ci creates a second tree; c is chosen as the representative of the second tree.
3. fg is next shortest. Add it and choose g as representative.
4. ab creates a third tree.
5. Add cf, merging two trees; c is chosen as the representative.
6. gi is next cheapest, but a cycle would be created: c is the representative of both ends.
7. Add cd instead.
8. hi would make a cycle; add ah instead.
9. bc would create a cycle; add de instead to complete the spanning tree. All trees are now joined and c is the sole representative.
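For reference, a hypothetical driver for the kruskal() sketch above, using the nine vertices a..i of the diagrams. Only the relative order of the edge costs is visible in the captions, so the weights 1-11 below are invented to reproduce that order, and only the edges mentioned in the captions are listed.

    #include <stdio.h>

    int main(void)
    {
        enum { a, b, c, d, e, f, g, h, i };       /* vertices numbered 0..8 */
        Edge edges[] = {
            { g, h,  1 }, { c, i,  2 }, { f, g,  3 }, { a, b,  4 }, { c, f,  5 },
            { g, i,  6 },                         /* rejected: would close a cycle */
            { c, d,  7 },
            { h, i,  8 },                         /* rejected: would close a cycle */
            { a, h,  9 },
            { b, c, 10 },                         /* rejected: would close a cycle */
            { d, e, 11 }
        };
        Edge mst[8];
        int k, chosen = kruskal(edges, 11, 9, mst);

        /* Prints gh, ci, fg, ab, cf, cd, ah, de - the tree built in the diagrams. */
        for (k = 0; k < chosen; k++)
            printf("%c%c (cost %.0f)\n", 'a' + mst[k].u, 'a' + mst[k].v, mst[k].w);
        return 0;
    }

Running it reproduces the trace: gi, hi and bc are skipped because both of their end-points already share a representative.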
The very nature of greedy algorithms makes them difficult to prove correct. We choose the step that maximizes
the immediate gain (in the case of the minimum spanning tree, the smallest possible addition to the
total cost so far) without thought for the effect of this choice on the remainder of the problem.
So the commonest method of proving a greedy algorithm correct is proof by contradiction: we show that if
we didn't make the "greedy" choice, then, in the end, we would find that we should have made that choice.
The Minimum Spanning Tree Algorithm
At each step in the MST algorithm, we choose the cheapest edge that would not create a cycle.
We can easily establish that any edge creating a cycle should not be added: the cycle-completing edge is
at least as expensive as any previously added edge, and the nodes which it connects are already joined by
some path. Thus it is redundant and can be left out.
Each edge that we add must join two sub-trees. Suppose the next cheapest edge, e_x, would join two sub-trees,
T_a and T_b, but we do not add it. Then we must, at some later stage, use a more expensive edge, e_y, to join
T_a to T_b, either directly or by joining a node of one of them to a node that is by then connected to the other.
But we could have joined T_a to T_b (and any nodes which are now connected to them) more cheaply by using e_x,
which proves the proposition that we should choose the cheapest edge at each stage.
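The exchange at the heart of this argument can be written out as an inequality (our notation, following the paragraph above): if a spanning tree T omits e_x and instead relies on the later, more expensive edge e_y to connect the two sub-trees, then

    w(e_x) \le w(e_y)
    \quad\Longrightarrow\quad
    w(T') = w(T) - w(e_y) + w(e_x) \le w(T),
    \qquad T' = (T \setminus \{e_y\}) \cup \{e_x\},

and T' is still a spanning tree, because removing e_y disconnects the two sub-trees and e_x reconnects them. So a spanning tree that rejects the cheapest available edge is never cheaper than one that accepts it.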
Complexity
The steps in Kruskal's algorithm are:

    Initialize the forest                          O(|V|)
    Sort the edges                                 O(|E| log |E|)
    Until we've added |V|-1 edges,
        check whether an edge forms a cycle        O(|V|) x O(|V|) = O(|V|²)
    Total                                          O(|V| + |E| log |E| + |V|²)
    Since |E| = O(|V|²)                            O(|V|² log |V|)
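Spelling out the substitution in the last two rows (a standard bound, not written out in the notes):

    |E| \le |V|^2
    \;\Longrightarrow\;
    |E|\log|E| \le |V|^2 \log|V|^2 = 2\,|V|^2\log|V| = O(|V|^2\log|V|),

so the whole sum O(|V| + |E|\log|E| + |V|^2) is dominated by O(|V|^2\log|V|).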
Thus, we may refer to Kruskal's algorithm as an O(n² log n) algorithm, where n is the number of vertices.
However, note that if |E| is of the same order as |V| (a sparse graph), then the complexity is O(n²), dominated by the cycle checks.
This emphasizes the point that no good software engineer tries to re-invent wheels; he keeps a good
algorithms text in his library and makes sure to refer to it before attempting to program a new problem!