Trees
Represent each set by a tree, where each element points to
its parent and the root points back to itself.
The representative of a set is the root.
Note that the trees are not necessarily binary trees.
• MAKE-SET(x): just create a new tree with root x.
Complexity: O(1)
• FIND-SET(x): simply follow "parent" pointers from x back to the root of its tree.
Complexity: O(depth of x)
• UNION(x,y): just make the root of one of the trees point to the root of the other.
Complexity: Θ(max{height(tree_x), height(tree_y)})
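The three operations above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the notes: the forest is stored in a dictionary named parent, and UNION always attaches the root of x's tree under the root of y's tree.

```python
# Hypothetical representation: parent[x] is the parent of x; a root is its own parent.
parent = {}

def make_set(x):
    """Create a one-node tree whose root points back to itself. O(1)."""
    parent[x] = x

def find_set(x):
    """Follow parent pointers up to the root. O(depth of x)."""
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    """Make the root of x's tree point to the root of y's tree (assumed direction)."""
    root_x, root_y = find_set(x), find_set(y)
    if root_x != root_y:
        parent[root_x] = root_y

for v in "abcd":
    make_set(v)
union("a", "b")
union("c", "d")
union("b", "d")
print(find_set("a") == find_set("c"))  # True: all four elements share the root "d"
```

Nothing here forces a node to have at most two children, which is why the trees need not be binary.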
Worst-case sequence complexity for m operations:
Lower Bound
• Just like for the linked list with back pointers but no size
• I.e., we can create a tree that is just one long chain with m/4 elements.
How can we create this tree? Using a combination of MAKE-SET
and UNION operations.
for i = 1 to m/4 do
    MAKE-SET(x_i)
for i = 1 to m/4 - 1 do
    UNION(x
Creating this tree takes m/4 MAKE-SET operations and m/4 - 1 UNION operations.
• Now a FIND-SET on the deepest element takes time Ω(m).
• If we perform m/2 such FIND-SET operations, we get a sequence whose total time is Ω(m^2).
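To see the quadratic behaviour concretely, the sketch below builds the chain and counts pointer hops. It rests on assumptions: the helper names build_chain and find_set_hops are hypothetical, elements are numbered from 0, and each UNION is simulated by linking the current chain's root under the next singleton (one possible linking order that produces a chain).

```python
def build_chain(k):
    """Return a parent map for the chain 0 -> 1 -> ... -> k-1, where k-1 is the root."""
    parent = {i: i for i in range(k)}   # k MAKE-SETs
    for i in range(k - 1):              # k - 1 UNIONs
        # Element i is the root of the chain built so far; hang it under singleton i + 1.
        parent[i] = i + 1
    return parent

def find_set_hops(parent, x):
    """FIND-SET without path compression; also return the number of pointers followed."""
    hops = 0
    while parent[x] != x:
        x = parent[x]
        hops += 1
    return x, hops

m = 400
parent = build_chain(m // 4)            # chain of m/4 = 100 elements
total = sum(find_set_hops(parent, 0)[1] for _ in range(m // 2))
print(total)  # 19800 = (m/2) * (m/4 - 1) pointer hops
```

The total grows as (m/2)(m/4 - 1), which is quadratic in m.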
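Q: How do we know there is not a sequence of operations that
takes longer than Θ(m^2)?
Same argument as for linked lists: a sequence of m operations creates at most m elements, so each operation takes O(m) time and the whole sequence takes O(m^2).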
Q: How can we improve the trees data structure representation
of disjoint sets?
Add path compression
When performing FIND-SET(x),
• keep track of the nodes visited on the path from x to the
root of the tree by using a stack or queue
• once the root is found, update the parent pointer of each node on the path to point directly to the root
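A minimal sketch of FIND-SET with path compression, using an explicit stack as described above. The dictionary-based parent map and the function signature are assumptions made for illustration.

```python
def find_set(parent, x):
    """Return the root of x's tree and point every visited node directly at it."""
    stack = []
    # First pass: walk up to the root, remembering the nodes visited.
    while parent[x] != x:
        stack.append(x)
        x = parent[x]
    root = x
    # Second pass: re-point each remembered node at the root.
    while stack:
        parent[stack.pop()] = root
    return root

parent = {0: 1, 1: 2, 2: 3, 3: 3}   # chain 0 -> 1 -> 2 -> 3
print(find_set(parent, 0))          # 3
print(parent)                       # {0: 3, 1: 3, 2: 3, 3: 3}
```

The second pass is what doubles the cost of the first FIND-SET on a path.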
Q: How does this affect the complexity of the FIND-SET operation?
It doubles the cost the first time, and makes it constant the rest of the time (for the nodes on that path).
Q: What is the complexity of a UNION(x,y) operation?
It depends on whether FIND-SET has already been called on one or both of x and y.
Q: Does the improvement in complexity of UNION and subsequent FIND-SET operations outweigh the increase in cost of
the initial FIND-SET?
Q: How might we answer this?
Do an amortized analysis; we'll see this topic next.
With path compression, consider a sequence of operations including
• n MAKE-SET ops,
• at most n - 1 UNIONs and
• f FIND-SET ops.
The worst-case running time of the whole sequence is:
Θ(f log n / log(1 + f/n))   if f ≥ n
Θ(n + f log n)              if f < n
Q: Can we do better?
Add “union-by-rank” and path compression.
Q: What measure of trees matters the most?
With trees, the measure that matters the most for the running time is the height of the tree.
Recall union-by-weight for lists. For trees it makes more sense
to relate heuristics to the height of a tree rather than the overall
weight in the UNION operation.
• Define the rank of a tree to be an upper bound on the height
of the tree.
• Note that the rank may not be equal to the height of the tree.
• We’ll store the rank of a tree at its root.
Operations
• MAKE-SET(x): Same as before, but also set rank(x) = 0.
• UNION(x,y): We know rank(tree_x) and rank(tree_y).
Which root of tree_x and tree_y becomes the new root?
The root with the higher rank becomes the new root.
What is the rank of the new tree?
The same as the larger rank, unless the two roots have the same rank; in that case pick either root as the new root and increase its rank by 1.
• FIND-SET(x): Nothing changes; use path compression. This does
not affect ranks.
This is the best disjoint set implementation.
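Putting the two heuristics together, a forest with union-by-rank and path compression might look like the following sketch. The class name DisjointSets and the dictionary storage are assumptions; on a rank tie the code arbitrarily keeps x's root, and FIND-SET uses a two-pass loop rather than an explicit stack.

```python
class DisjointSets:
    """Hypothetical disjoint-set forest with union-by-rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0          # a single node has height 0

    def find_set(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path at the root. Ranks are untouched.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        root_x, root_y = self.find_set(x), self.find_set(y)
        if root_x == root_y:
            return root_x
        # Ensure root_x is the root with the larger (or equal) rank.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        # Only a tie can increase the height bound.
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return root_x

ds = DisjointSets()
for v in range(5):
    ds.make_set(v)
ds.union(0, 1)
ds.union(2, 3)
ds.union(0, 2)
ds.union(4, 0)
print(ds.find_set(3), ds.rank[0])   # 0 2
```

After the last FIND-SET, node 3 points directly at the root while rank(0) stays 2, which shows why the rank is only an upper bound on the height.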
Q: How good is the worst-case sequence complexity?
It is possible to prove that the worst-case time for a sequence of
m operations, where there are n MAKE-SETs, is O(m log* n).
Q: What is log*?
It is the number of times that you need to apply log to n until the answer is at most 1.
Example: n = 40
5 < log 40 < 6
⇒ 2 < log log 40 < 3
⇒ 1 < log log log 40 < 2
⇒ 0 < log log log log 40 < 1
So log* 40 = 4.
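The iterated logarithm is easy to compute directly. The sketch below assumes base-2 logarithms and a stopping threshold of "at most 1"; the function name log_star is hypothetical.

```python
import math

def log_star(n):
    """Count how many times log2 must be applied to n before the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print(log_star(40))      # 4
print(log_star(65536))   # 4: 65536 -> 16 -> 4 -> 2 -> 1
```

Because the function grows so slowly, O(m log* n) is close to linear for any input size that arises in practice.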
Back to Kruskal’s Algorithm
KRUSKAL-MST(G=(V,E), w: E -> Z)
    A := {};
    insert the edges into a priority queue Q, keyed by weight;
    for each vertex v in V, MAKE-SET(v);
    while (Q not empty)
        e = EXTRACT-MIN(Q);    // e = (u,v)
        if FIND-SET(u) =/= FIND-SET(v) then
            UNION(u,v);
            A := A U {e};
        end if
    end while
    return A;
END KRUSKAL-MST
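For reference, here is one way the pseudocode could be turned into runnable Python. It is a sketch under assumptions: the function name kruskal_mst and the (u, v, weight) edge format are hypothetical, the priority queue is Python's heapq, and the disjoint sets use the forest with union-by-rank and path compression rather than the linked lists analysed below.

```python
import heapq

def kruskal_mst(vertices, weighted_edges):
    """Return the list of MST edges (u, v, w) of a connected undirected graph."""
    parent = {v: v for v in vertices}   # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:        # path compression
            next_node = parent[x]
            parent[x] = root
            x = next_node
        return root

    def union(x, y):
        root_x, root_y = find_set(x), find_set(y)
        if root_x == root_y:
            return
        if rank[root_x] < rank[root_y]:
            root_x, root_y = root_y, root_x
        parent[root_y] = root_x
        if rank[root_x] == rank[root_y]:
            rank[root_x] += 1

    # Priority queue of edges keyed by weight.
    heap = [(w, u, v) for (u, v, w) in weighted_edges]
    heapq.heapify(heap)

    mst = []
    while heap:
        w, u, v = heapq.heappop(heap)   # EXTRACT-MIN
        if find_set(u) != find_set(v):  # endpoints in different components
            union(u, v)
            mst.append((u, v, w))
    return mst

edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
mst = kruskal_mst("abcd", edges)
print(mst)  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```

The edge (a, c) is skipped because a and c are already connected when it is extracted.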
Q: If graph G has n vertices and is connected, then how many
edges does G have?
m ≥ n - 1
Q: Inserting the edges into a priority queue and extracting the
min for each edge takes how long?
O(m log m)
Suppose that we implement the disjoint set ADT using linked lists
with union-by-weight.
(Remember, these linked lists have a pointer back to the representative element.)
Q: How many MAKE-SETs do we do?
n
Complexity?
O(n)
Q: How many FIND-SETs do we do?
At most 2m, since each edge is extracted once and triggers one FIND-SET per endpoint.
Complexity?
O(m)
Q: How many UNIONs do we do?
At most m (in fact at most n - 1, since each UNION merges two of the n sets).
Complexity?
At most O(n log n) in total.
So the worst-case complexity of Kruskal's algorithm is O(m log m + n +
m + n log n).
The bottleneck is the sorting (priority queue) step: since m ≥ n - 1, the n log n term is O(m log m). Therefore the
complexity is O(m log m).