7.3 SELF-ADJUSTING BINARY SEARCH TREES

Our final tree implementation of the dictionary abstract data type is in many respects simpler
than the balanced tree structures considered in the previous sections. The data structure is a pure
binary search tree: the nodes have no balance, color, or other auxiliary fields, only left and right
child pointers and fields for the key itself and any associated data. The structure is distinguished
from a simple binary search tree by the algorithms that are used to implement the LookUp,
Insert, and Delete operations. If the dictionary contains n items, these algorithms are not
guaranteed to operate in O(log n) time in the worst case. But we do have a guarantee of
amortized logarithmic cost: Any sequence of m of these operations, starting from an empty tree,
is guaranteed to take a total amount of time that is O(m log n). Therefore the average time used
by an operation in the sequence of length m is O(log n), and the amortized cost of an operation is
O(log n). Though the amortized cost of an operation is O(log n), there may be single operations
whose cost is much higher, Θ(n) for example, but this can happen only if those operations have
been preceded by many whose cost is so small that the cost of the entire sequence is O(m log n).
For many applications the guarantee of logarithmic amortized time is quite sufficient, and the
algorithms are sufficiently simpler than AVL tree or red-black tree algorithms that they are
preferable.
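As a concrete illustration, a splay-tree node can be declared as simply as in the following Python sketch (the class and field names are ours, not the text's):

    class Node:
        """A pure binary search tree node: no balance, color, or other auxiliary
        fields, only child pointers plus the key and its associated data."""
        def __init__(self, key, value=None):
            self.key = key        # the key
            self.value = value    # any associated data
            self.left = None      # left child pointer
            self.right = None     # right child pointer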
The algorithms operate by applying a tree version of the Move-to-Front Heuristic discussed
on page 179; each time a key is the object of a successful search, its node is moved to the root of
the binary tree. (However, the movement must happen in a very particular way, which is
described below. And to reemphasize, unlike the results of the analysis in §6.2, the guarantees on
the performance of these trees do not depend on any assumption about the probability
distribution of the operations on keys.) The critical operation is called Splay. Given a binary
search tree T and a key K, Splay(K, T) modifies T so that it remains a binary search tree on the
same keys. But the new tree has K at the root, if K is in the tree; if K is not in the tree, then the
root contains a key that would be the inorder predecessor or successor of K, if K were in the tree
(Figure 7.16). We call this "splaying the tree around K," and we refer to trees that are
manipulated using the splay operation as splay trees. (To "splay" something is to spread it out or
flatten it.)
Suppose that we are given an implementation of the Splay operation (we shall see just below
how Splay can be implemented efficiently). Then the dictionary operations can be implemented
as follows:
LookUp(K, T): Execute Splay(K, T), and then examine the root of the tree to see if it contains
K (Figure 7.17).
Insert(K, I, T): Execute Splay(K, T). If K is in fact at the root, then simply install I in this
node. Otherwise create a new node containing K and I and break one link to make this
node the new root (Figure 7.18).
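For concreteness, here is a minimal Python sketch of LookUp and Insert in terms of a splay function whose implementation is described later in this section; the function names and the Node fields are our own conventions, not the text's:

    def lookup(root, key):
        # LookUp(K, T): splay the tree around K, then see whether K is at the root.
        # Returns the new root together with the associated data (or None).
        root = splay(root, key)
        if root is not None and root.key == key:
            return root, root.value
        return root, None

    def insert(root, key, info):
        # Insert(K, I, T): splay around K; if K is already at the root, just install I.
        # Otherwise create a new node and break one link to make it the new root.
        root = splay(root, key)
        if root is None:
            return Node(key, info)
        if root.key == key:
            root.value = info
            return root
        new = Node(key, info)
        if key < root.key:          # root holds K's inorder successor
            new.left, new.right = root.left, root
            root.left = None
        else:                       # root holds K's inorder predecessor
            new.left, new.right = root, root.right
            root.right = None
        return new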
Figure 7.16 Effect of Splay(K,T). If key K is in tree T, it is brought to the root, otherwise a
key in T that would neighbor K in the dictionary ordering is brought up to the root.
Figure 7.17 Implementation of LookUp(K,T) with the aid of Splay. Splay the tree around K,
then see if K is at the root.
Figure 7.18 Implementation of Insert(K,T) with the aid of Splay. Splay the tree around K,
then make K the root.
Delete(K, T) is implemented with the aid of an operation Concat(T1, T2). If T1 and T2 are binary
search trees such that every key in T1 is less than every key in T2, then Concat(T1, T2) creates a
binary search tree containing all keys in either T1 or T2. Concat is implemented with the aid of
Splay as follows:
Concat(T1, T2): First execute Splay(+∞, T1), where +∞ is a key value greater than any that
can occur in a tree. After this has been done, T1 has no right subtree; attach the root of
T2 as the right child of the root of T1 (Figure 7.19).
Figure 7.19 Implementation of Concat(T1, T2) with the aid of Splay. Splay the first tree around
+∞, then make the second tree the right subtree of the root.
Figure 7.20 Implementation of Delete(K, T) with the aid of Splay and Concat. Splay the tree
around K, then concatenate the two subtrees of the root.
Then Delete is implemented thus:
Delete(K, T): Execute Splay(K, T). If the root does not contain K then there is nothing to do.
Otherwise apply Concat to the two subtrees of the root (Figure 7.20).
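Continuing the sketch above (and again using names of our own choosing), Concat and Delete look like this in Python; we assume numeric keys so that float('inf') can stand in for the fictitious key +∞:

    def concat(t1, t2):
        # Concat(T1, T2): every key in T1 is less than every key in T2.
        # Splay T1 around +infinity so that its root has no right subtree,
        # then attach the root of T2 as the right child of that root.
        if t1 is None:
            return t2
        t1 = splay(t1, float('inf'))
        t1.right = t2
        return t1

    def delete(root, key):
        # Delete(K, T): splay around K; if K is not at the root there is nothing
        # to do, otherwise concatenate the two subtrees of the root.
        root = splay(root, key)
        if root is None or root.key != key:
            return root
        return concat(root.left, root.right)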
Thus to complete the account of the dictionary operations, it remains only to describe the
implementation of the splay operation. To splay
T around K, first search for K in the usual way,
remembering the search path by stacking it.¹ Let P be the last node inspected; if K is in the tree,
then K is in node P, and otherwise P has an empty child where the search for K terminated.
When the splay has been completed, P will be the new root. Return along the path from P back
to the root, carrying out the following rotations, which move P up the tree.
¹ The size of the stack can be Ω(n), but link inversion can be used to reduce memory utilization.
Figure 7.21 Rotation during splay, Case I: P has no grandparent.
Figure 7.22 Rotation during splay, Case II: P and its parent are both left children.
Case I. P has no grandparent, that is, Parent(P) is the root. Perform a single rotation
around the parent of P, as illustrated in Figure 7.21 or its mirror image.
Case II. P and Parent(P) are both left children, or both right children. Perform two
single rotations in the same direction, first around the grandparent of P and then
around the parent of P, as shown in Figure 7.22 or its mirror image.
Case III. One of P and Parent(P) is a left child and the other is a right child.
Perform single rotations in opposite directions, first around the parent of P and then
around its grandparent, as illustrated in Figure 7.23 or its mirror image.
Figure 7.23 Rotation during splay, Case III: P is a left child and its parent is a right child.
Ultimately P becomes the root and the splay algorithm is complete.
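The bottom-up procedure just described might be sketched in Python as follows; the rotation helper and the explicit stack of ancestors are our own way of realizing the three cases, not code from the text:

    def rotate(parent, child):
        # Single rotation: lift `child` above `parent`; returns the new subtree root.
        if parent.left is child:
            parent.left = child.right
            child.right = parent
        else:
            parent.right = child.left
            child.left = parent
        return child

    def splay(root, key):
        # Splay the tree around `key` and return the new root.
        if root is None:
            return None
        # Search for the key, remembering the search path on an explicit stack.
        path, node = [], root
        while True:
            path.append(node)
            if key == node.key:
                break
            child = node.left if key < node.key else node.right
            if child is None:       # search fell off the tree
                break
            node = child
        p = path.pop()              # P: the last node inspected
        # Return along the path, rotating P up the tree.
        while path:
            q = path.pop()          # parent of P
            if not path:            # Case I: P has no grandparent
                p = rotate(q, p)
                break
            r = path.pop()          # grandparent of P
            if (q.left is p) == (r.left is q):
                # Case II: both left children or both right children;
                # rotate around the grandparent, then around the parent.
                rotate(r, q)
                p = rotate(q, p)
            else:
                # Case III: one is a left child, the other a right child;
                # rotate around the parent, then around the grandparent.
                rotate(q, p)
                if r.left is q:     # repair the grandparent's child link first
                    r.left = p
                else:
                    r.right = p
                p = rotate(r, p)
            if path:                # reattach the lifted subtree to its new parent
                anc = path[-1]
                if anc.left is r:
                    anc.left = p
                else:
                    anc.right = p
        return p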
Note that Cases I and III are AVL tree single and double rotations, but Case II is special to
this algorithm. Figure 7.24 gives an example of splaying. The effects of the rotations are fairly
mysterious; note that they do not necessarily decrease the height of the tree (in fact, they can
increase it), nor do they necessarily make the tree better balanced in any evident way. The
analysis of these algorithms is more subtle than those of previous sections, because it must take
into account that the time "saved" while performing low-cost operations can be "used" later
during a time-consuming operation. To capture this idea, we use a banking metaphor.
(The remainder of this section deals only with the analysis of the algorithms that have already
been presented; the numerical quantities discussed below ("money," for example) play no role in
the actual implementation of the algorithms.)
We regard each node of the tree as a bank account containing a certain amount of money. The
amount of money at a node depends on how many descendants it has; nodes with more
descendants have more money. Thus as nodes are added to the tree, more money must be added
in order to keep enough money at each node. Also any fixed amount of work (performing a single
rotation at a single node, for example) costs a fixed amount of money. The essence of the proof is
to show that any sequence of m dictionary operations, starting from an empty tree and with the
tree never having more than n nodes, can be carried out by a total investment of O(m log n)
dollars. On any single operation some of these dollars may come out of the "bank accounts"
already at the nodes of the tree, and some may be "new investment"; and on any single operation
some of these dollars may go to keep the bank accounts up to their required minimums, and
some may go to pay for the work done on the tree. But in aggregate O(m log n) dollars are
enough, so that the amortized cost of any single operation is only O(log n).
Figure 7.24 Splaying a tree around D. (a) Original tree; D is a left child of a left child, so Case
II applies. (b) After applying the rotations of Figure 7.22 at D, E, and G. D is now a left child
of a right child, so Case III applies. (c) After applying the rotation of Figure 7.23 at D, H, and
C. D now has no grandparent, so Case I applies. (d) After applying the rotation of Figure 7.21
at D and L.
To be precise about the necessary minimum bank balance at each node, for any node N let w(N)
(the weight of N) be the number of descendants of N (including N itself), and let r(N) (the rank
of N) be lg w(N). Then we insist that the following condition be maintained:
The Money Invariant: Each node N has r(N) dollars at all times.
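To make the invariant concrete, the weight and rank of a node could be computed as in the sketch below. This is purely an accounting device for the analysis; as emphasized above, nothing like it appears in the real implementation, and the function names are ours:

    from math import log2

    def weight(node):
        # w(N): the number of descendants of N, counting N itself.
        if node is None:
            return 0
        return 1 + weight(node.left) + weight(node.right)

    def rank(node):
        # r(N) = lg w(N): the balance the Money Invariant requires at N.
        # A leaf needs lg 1 = 0 dollars; the root of an n-node tree needs lg n.
        return log2(weight(node))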
Initially the tree is empty, and so there is no money in it. Money gets used in two ways while a
splay is in progress.
1. We must pay for the time used. A fixed amount of time costs a fixed amount of money (say,
$1 per operation).
2. Since the shape of the tree changes as the splay is carried out, we may have to add some
money to the tree, or redistribute the money already in the tree, in order to maintain the
Money Invariant everywhere.
Money that is spent, either to pay for time or to maintain the invariant, may be taken out of the
tree or may be "new money." The critical fact is this:
LEMMA (Investment) It costs at most 3 lg n + 1 new dollars to splay a tree with n nodes
while maintaining the Money Invariant everywhere.
Let us defer the proof of the Investment Lemma for the time being, and suppose that it is true.
The Investment Lemma provides all the information that is needed to complete the amortized
analysis of splay trees.
THEOREM (Splay Tree) Any sequence of m dictionary operations on a self-adjusting tree
that is initially empty and never has more than n nodes uses O(m log n) time.
PROOF
Any single dictionary operation on a tree T with at most n nodes costs
O(log n) new dollars:
• LookUp(K, T) costs only what it costs to do the splay, which is O(log n).
• Insert(K, I, T) costs what it costs to do the splay, plus what must be banked in the new root
to maintain the invariant there; this is lg(n + 1) additional dollars, for a total of O(log n).
(The new root is the only node that gains descendants when the new root is inserted.)
• Concat(T1, T2), where T1 and T2 have at most n nodes, costs what it costs to splay T1, which
is O(log n), plus what must be banked in the root in order to make T2 a subtree, which is at
most lg n, for a total of O(log n).
• Delete(K, T) costs what it costs to splay T, plus what it costs to concatenate the two
resulting subtrees, which is again O(log n).
This is the amount of new money required in each case. Nonetheless an operation may take
more than O(log n) time, since the time can be paid for with money that had previously been
banked in the tree. However, if we start with an empty tree and do m operations, then the
amount of money in the tree is 0 initially and ≥ 0 at the end, and by the Investment Lemma at
most m(3 lg n + 1) dollars are invested in the interim. This must be enough to pay for all
the time used as well as to maintain the invariant, so the amount of time used must be
O(m log n).
Now we turn to the proof of the Investment Lemma. For this we shall need two simple
observations about the ranks of nodes. Clearly the rank of a node is greater than or equal to the
rank of any of its descendants. Slightly less obvious is the
LEMMA (Rank Rule) If a node has two children of equal rank, then its rank is greater
than that of each child.
PROOF Let N be the node and let U and V be its children. By the definition of rank,
w(U) ≥ 2^r(U) and w(V) ≥ 2^r(V). If r(U) = r(V), then w(N) > w(U) + w(V) ≥ 2^(r(U)+1).
Therefore r(N) = lg w(N) ≥ r(U) + 1.
Now consider a single step of a splay operation, that is, a rotation as described in Case I, II, or
III. We write r'(P) to denote the rank of P after the rotation has been done, and r(P) to denote its
value beforehand.
LEMMA (Cost of Splay Steps) A splay step involving node P, the parent of P, and
(possibly) the grandparent of P can be done with an investment of at most 3(r'(P) - r(P)) new
dollars, plus one more dollar if this was the last step in the splay.
Deferring for the moment the proof of this Lemma, we show that it implies the Investment
Lemma. Let us write r^(i)(P) for the rank of P after i steps of the splay operation have been carried
out. According to the Lemma, the total investment of new money needed to carry out the splay is
at most
3(r'(P) - r(P))
+ 3(r^(2)(P) - r'(P))
+ …
+ 3(r^(k)(P) - r^(k-1)(P)) + 1,
where k is the number of steps needed to bring P to the root. But r^(k)(P) is the rank of the original
root, since the tree has the same number of nodes after the splay as before, so r^(k)(P) ≤ lg n. The
middle terms of the sum cancel out, and the total is 3(r^(k)(P) - r(P)) + 1 ≤ 3 lg n + 1.
PROOF (of the Cost of Splay Steps Lemma) The three types of rotation must be treated
separately. In each case, let Q be the parent of P, and R the parent of Q, if it has one.
• Case I. P has no grandparent. This must be the last step. The one extra dollar pays for the
time used to do the rotation. Since r'(P) = r(Q) (Figure 7.21), the number of new dollars
that must be added to the tree is
r'(P)+ r'(Q) - (r(P) + r(Q))
= r'(Q) - r(P)
≤ r'(P) - r(P) since Q becomes a child of P.
This is 1/3 of the amount specified in the Lemma.
• Case II. Here r'(P) = r(R) (see Figure 7.22; r' refers to the situation in the rightmost tree, after
both rotations have been completed). So the total amount that needs to be added to the tree to
maintain the invariant is
r'(P) + r'(Q) + r'(R) - (r(P) + r(Q) + r(R))
= r'(Q) + r'(R) - (r(P) + r(Q))
≤ 2(r'(P) - r(P)),
which is 2/3 of the available money. If r'(P) > r(P), then a dollar is left
over to pay for the work. So assume for the duration that r'(P) = r(P). Then also
r'(P) = r(R)    (IIa)
(since R is the root of the subtree before the rotations and P is the root afterwards). If r'(R)
were equal to r(P), then by the Rank Rule on the middle tree of Figure 7.22, r(P) < r'(P),
contrary to assumption. Hence
r'(R) < r(P),    (IIb)
since r'(R) ≤ r'(P) = r(P). Finally
r'(Q) ≤ r(Q),    (IIc)
since r'(Q) ≤ r'(P) = r(P) ≤ r(Q). By (IIa), (IIb), and (IIc) we can move R's money to P, P's
money to R, and leave Q's money where it is, maintain the invariant everywhere, and still have
a dollar left over to pay for the work.
• Case III. In this case r'(P) = r(R) and r'(Q) ≤ r(Q) (see Figure 7.23). So if we move R's money
to P and leave some or all of Q's money on Q, the invariant will remain true at P and Q. To
satisfy the invariant on R, use the money from P plus an additional r'(R) - r(P) ≤ r'(P) - r(P)
dollars, one third of the new dollars available. If r'(P) > r(P), then there is one dollar left over
to pay for the work. Otherwise r'(P) = r(P) = r(Q) = r(R) and hence either r'(Q) < r'(P) or
r'(R) < r'(P) (since r'(P) = r'(Q) = r'(R) is impossible by the Rank Rule applied to the
right-hand tree in Figure 7.23). So either r'(Q) < r(Q) or r'(R) < r(P), and there is a dollar left
over to pay for the work.