Sets as AVL trees

TreeSet<T> Implementation: AVL Trees
Total and balanced collections of binary trees
A collection C of binary search trees (whose node values are of type T,
say) is said to be total iff for every (finite) set s of elements drawn from
T, there is a tree in C representing s. For example, the collection of
binary search trees (with values of type Integer, say) all of whose
nodes have empty right subtrees is total (each tree in this case is like a
sorted linked list). For example, the set {1, 2, 3, 4} is represented by the
tree on the right, which is obviously in C. On the other hand, the
collection D of binary search trees (containing values of type Integer,
say) where every node has 0 or 2 child nodes is not total – it has no tree
representing the set {1, 2, 3, 4}, for example (try it!).
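[Figure: the tree representing {1, 2, 3, 4}: root 4, whose left child is 3, whose left child is 2, whose left child is 1; every right subtree is empty.]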
A collection of binary search trees is balanced iff for some constant k every tree in the collection
has height at most k*log2n, where n denotes the number of nodes in the tree (obviously k must
be at least 1). Each tree in a balanced collection is also said to be balanced. Informally, a tree is
balanced if the tree and all its subtrees have very roughly the same number of nodes on the left
and right hand sides. As an example, the collection C mentioned above is clearly not balanced (in
fact, each tree has height n).
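To make the last remark concrete, here is a minimal sketch in Java of plain, non-rebalancing binary search tree insertion. The class and method names (ChainDemo, insert, height, build) are hypothetical and not part of these notes. Inserting values in descending order produces a tree belonging to C, and its height equals its number of nodes.

```java
// Illustrative only: a plain (non-rebalancing) binary search tree of ints.
final class ChainDemo {
    static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Standard BST insertion; a duplicate value leaves the tree unchanged.
    static Node insert(Node t, int v) {
        if (t == null) return new Node(v);
        if (v < t.value) t.left = insert(t.left, v);
        else if (v > t.value) t.right = insert(t.right, v);
        return t;
    }

    // Height counted in nodes, so the empty tree has height 0.
    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Builds a tree by inserting the values in the order given.
    static Node build(int... values) {
        Node root = null;
        for (int v : values) root = insert(root, v);
        return root;
    }
}
```

With this sketch, ChainDemo.build(4, 3, 2, 1) gives the left-leaning chain for {1, 2, 3, 4}, of height 4, whereas ChainDemo.build(2, 1, 3) gives a tree of height 2.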
If a total and balanced collection of binary search trees has insertion and deletion algorithms
whose worst case time complexity is proportional to the height of the tree being operated on,
then clearly insertion and deletion (and, of course, searching) have O(log n) worst case time
complexity. Such a collection offers an efficient implementation of TreeSet. Two such
collections are in common use: AVL trees and red-black trees.
AVL trees
A node in a binary tree is said to be AVL if the heights of its left and right subtrees differ by at
most 1. A binary tree is said to be AVL if all its nodes are AVL. It can be shown that the height
of an AVL tree does not exceed about 1.44 log2n. Hence the AVL collection of binary search
trees (whose nodes contain values of some type T, say) is balanced. Moreover, it will become
evident shortly (when we exhibit the insertion algorithm) that it is total. Hence the collection of
AVL trees is a good candidate for implementing TreeSet.
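The definition and the height bound can be explored with a small sketch. The Java below is illustrative only; the names (AvlBoundSketch, isAvl, minNodes, ratio) are assumptions, not part of these notes. It checks the AVL property on a tree shape, and it computes the fewest nodes N(h) that an AVL tree of height h can have, using the standard recurrence N(0) = 0, N(1) = 1, N(h) = 1 + N(h-1) + N(h-2): the sparsest tree of height h has one subtree of height h-1 and one of height h-2. Heights are counted in nodes, so the empty tree has height 0.

```java
// Illustrative sketch; only the shape of a tree matters here, so nodes carry no values.
final class AvlBoundSketch {
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Returns the height of t if every node of t is AVL, or -1 otherwise.
    static int checkedHeight(Node t) {
        if (t == null) return 0;
        int hl = checkedHeight(t.left);
        int hr = checkedHeight(t.right);
        if (hl < 0 || hr < 0 || Math.abs(hl - hr) > 1) return -1;
        return 1 + Math.max(hl, hr);
    }

    static boolean isAvl(Node t) { return checkedHeight(t) >= 0; }

    // Fewest nodes an AVL tree of height h can have:
    // N(0) = 0, N(1) = 1, N(h) = 1 + N(h-1) + N(h-2).
    static long minNodes(int h) {
        long prev = 0, cur = 1;
        if (h == 0) return prev;
        for (int i = 2; i <= h; i++) {
            long next = 1 + cur + prev;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    // Ratio height / log2(n) for the sparsest AVL tree of height h (h >= 2).
    static double ratio(int h) {
        return h / (Math.log(minNodes(h)) / Math.log(2));
    }
}
```

For example, minNodes(5) is 12, and ratio(40) is roughly 1.43, approaching the constant 1.44 from below as h grows. For very small trees the ratio can exceed 1.44 (a two-node tree has height 2 and log2n = 1), which is why the bound is stated as "about" 1.44 log2n.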
Searching an AVL tree
Searching an AVL tree is exactly the same as searching a general binary search tree. A search
visits at most one node per level and the height of an AVL tree is O(log n), so the worst case
time complexity is O(log n).
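As a minimal sketch of the search, assuming a hypothetical node class with value, left and right fields (none of these names come from the notes), the loop below makes one comparison per level and so performs at most as many steps as the height of the tree.

```java
// Illustrative sketch of searching a binary search tree (AVL or otherwise).
final class AvlSearchSketch {
    static final class Node<T> {
        final T value;
        final Node<T> left, right;
        Node(T value, Node<T> left, Node<T> right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Walks down from the root, going left or right according to the comparison.
    static <T extends Comparable<T>> boolean contains(Node<T> t, T x) {
        while (t != null) {
            int c = x.compareTo(t.value);
            if (c == 0) return true;
            t = (c < 0) ? t.left : t.right;
        }
        return false;
    }
}
```

Nothing in the search depends on the tree being AVL; the AVL property only guarantees that the loop runs O(log n) times.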
Insertion in an AVL tree
Insertion in an AVL tree proceeds as for insertion in a regular binary search tree (resulting in a
new leaf node, recall), but additionally the tree may need re-balancing to keep it AVL.
Balanced Trees 1
[Figure: trees (i) to (v). (i) 9 with left child 8, which has left child 4. (ii) 8 with children 4 and 9. (iii) 8 with children 4 and 9; 4 has children 3 and 6; 6 has right child 7. (iv) 8 with children 6 and 9; 6 has children 4 and 7; 4 has left child 3. (v) 6 with children 4 and 8; 4 has left child 3; 8 has children 7 and 9.]
Suppose that the values 9, 8, 4, 3, 6, 7 are inserted into an empty AVL tree. 9 and 8 are
inserted as usual, but after insertion of 4 node 9 is not AVL (see (i)). It is re-balanced by
“rotating” (the tree rooted at) node 9 right (see (ii)). Next 3 is inserted, then 6, but after
inserting 7, node 8 is not AVL (see (iii)). Note that the insertion path (shown in bold) has a
“kink” at node 4; this signals that re-balancing needs to proceed in two steps. First, we “rotate left”
(the tree rooted at) node 4 (this is the inverse of “rotate right” above), yielding the tree shown in
(iv) (if node 6 happened to have had a left subtree in (iii) the subtree would appear in (iv) as the
right subtree of node 4). Second, we rotate (the tree rooted at) node 8 right resulting in the tree
shown in (v). Observe that the tree is still a binary search tree, and is AVL. For more examples,
see any of the many websites offering animations of AVL trees, e.g.
groups.engin.umd.umich.edu/CIS/course.des/cis350/treetool/
or
webdiis.unizar.es/asignaturas/EDA/AVLTree/avltree.html.
In general, re-balancing a tree after insertion requires rotating at most two (sub-)trees. A rotation
is either a right or a left rotation as shown below (the numbers indicate sample heights of the
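(sub)trees). Check that the property of being a binary search tree is preserved by either rotation.
[Figure: rotations, with sample heights. Left tree: root d (height 12) with left child b (height 11) and right subtree F (height 9); b has left subtree B (height 10) and right subtree E (height 9). "Rotate right" turns this into the right tree: root b (height 11) with left subtree B (height 10) and right child d (height 10); d has left subtree E (height 9) and right subtree F (height 9). "Rotate left" is the inverse.]
Call the path in the tree from the point of insertion (of a leaf node) to the root the insertion
path (shown in bold where relevant). The following property of AVL trees should be
immediately clear: Inserting a (leaf)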
node in an AVL tree affects the heights of at most the nodes on the insertion path. We should
expect, therefore, that restoring the AVL property requires at most re-balancing (sub)trees at
nodes on this path; we typically say re-balance a node as shorthand for re-balance the (sub)tree
rooted at a node.
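The two rotations described above can be sketched in Java as follows. This is a minimal illustration under assumed names (RotationSketch, rotateRight, rotateLeft, inOrder), not code from these notes, and it leaves out height bookkeeping. An in-order traversal lists the values of a binary search tree in ascending order, so comparing traversals before and after a rotation is one way to check that the binary search tree property is preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of left and right rotations on a binary search tree of ints.
final class RotationSketch {
    static final class Node {
        final int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Rotates the tree rooted at d right; d must have a left child b.
    // b becomes the new root and b's old right subtree becomes d's left subtree.
    static Node rotateRight(Node d) {
        Node b = d.left;
        d.left = b.right;
        b.right = d;
        return b;
    }

    // Rotates the tree rooted at b left; b must have a right child d.
    // This is the inverse of rotateRight.
    static Node rotateLeft(Node b) {
        Node d = b.right;
        b.right = d.left;
        d.left = b;
        return d;
    }

    // Values in in-order (ascending for a binary search tree).
    static List<Integer> inOrder(Node t) {
        List<Integer> out = new ArrayList<>();
        collect(t, out);
        return out;
    }

    private static void collect(Node t, List<Integer> out) {
        if (t == null) return;
        collect(t.left, out);
        out.add(t.value);
        collect(t.right, out);
    }
}
```

Each rotation changes only two references, so it takes constant time. Applied to the earlier example, rotating node 4 left and then node 8 right turns the tree with root 8 (after inserting 7) into the tree with root 6, and the in-order sequence 3, 4, 6, 7, 8, 9 is the same before and after.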
To describe the re-balancing phase of the insertion algorithm in detail, we define the balance
factor of a node to be the height of its left subtree minus the height of its right subtree (so a
binary tree is AVL iff the balance factor of every node is 1, 0, or -1). After insertion, some nodes
on the insertion path may not be AVL because their balance factor is 2 or -2. After inserting the
new node, we proceed back up the insertion path checking the balance factor at each node. Let
node d, say, be the first node we encounter whose balance factor is 2 or -2. We identify the
following cases.
(i) Left-left. The balance factor of d is 2 and that of d’s left child is 1 – see the left tree in the
picture above. Note that node d’s left child b necessarily lies on the insertion path – why?
Observe that a right rotation of node d makes the resulting tree AVL – check!
(ii) Left-right. The balance factor of d is 2 and that of d’s left child (node b,
again on the insertion path) is -1. First picture subtree E expanded, labelling its
root node c (you should be able to convince yourself easily that c lies on the
insertion path). Now rotate node b left, yielding the heights indicated.
Although the tree is still not AVL, it is not AVL precisely in the manner of
case (i) – check this! So just rotate node d right, and we’re done. In expanding
E we attributed a balance factor of -1 to node c (its subtree heights are 8 and 9,
resp.); check that the transformation still works if we attribute it a balance
factor of 1 (it would also work for a balance factor of 0, but that case can’t
arise).
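[Figure: the left-right case, with sample heights. Before: root d (height 12) with left child b (height 11) and right subtree F (height 9); b has left subtree B (height 9) and right child c (height 10); c has left subtree C (height 8) and right subtree D (height 9). After "Rotate b left": root d (height 12) with left child c (height 11) and right subtree F (height 9); c has left child b (height 10) and right subtree D (height 9); b has subtrees B (height 9) and C (height 8). After "Rotate d right": root c (height 11) with left child b (height 10) and right child d (height 10); b has subtrees B (height 9) and C (height 8); d has subtrees D (height 9) and F (height 9).]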
Convince yourself that the case of d having a balance factor of 2 and d’s left child (node b)
having a balance factor of 0 cannot arise.
(iii) Right-right. The balance factor of d is -2 and that of d’s right child is -1. Symmetrical to (i).
(iv) Right-left. The balance factor of d is -2 and that of d’s right child is 1. Symmetrical to (ii).
Now observe the following obvious property regarding insertion in an AVL tree: If for any node
d on the insertion path (starting at the point of insertion) the height of d is the same as it was
before the insertion, then so are the heights of the remaining nodes on the path (i.e. up to and
including the root). Hence the balance factors of the remaining nodes are as they were (which is
1, 0, or -1 in each case).
Finally: At most one node on the insertion path needs re-balancing. A node is rebalanced
because its balance factor has changed from 1 to 2 (or from -1 to -2), and that is because its
height has increased by 1. However, the act of re-balancing leaves the height of the re-balanced
subtree just as it was before the insertion – check this for the two primary cases above (note the
original height of node d is necessarily 11, and the height of the resulting subtree in either case
is again 11). The result then follows from the preceding property.
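Putting the pieces together, the insertion algorithm with its four re-balancing cases might be sketched as below. This is one possible arrangement under stated assumptions, not the implementation these notes have in mind: the names are hypothetical, values are plain ints, each node stores its height (counted in nodes), and rebalanceCount is a static counter added purely so the "at most one node" claim can be observed. For simplicity the sketch recomputes heights all the way up to the root rather than stopping once a height is unchanged.

```java
// Illustrative sketch of AVL insertion; all names are assumptions.
final class AvlInsertSketch {
    static final class Node {
        final int value;
        Node left, right;
        int height = 1; // height of the subtree rooted here, counted in nodes
        Node(int value) { this.value = value; }
    }

    static int rebalanceCount = 0; // nodes re-balanced so far (demonstration only)

    static int height(Node t) { return t == null ? 0 : t.height; }
    static int balance(Node t) { return height(t.left) - height(t.right); }
    static void update(Node t) { t.height = 1 + Math.max(height(t.left), height(t.right)); }

    static Node rotateRight(Node d) {
        Node b = d.left;
        d.left = b.right;
        b.right = d;
        update(d); // d is now the lower node, so its height is recomputed first
        update(b);
        return b;
    }

    static Node rotateLeft(Node b) {
        Node d = b.right;
        b.right = d.left;
        d.left = b;
        update(b);
        update(d);
        return d;
    }

    // Applies the four cases to node d, whose height field is already up to date.
    static Node rebalance(Node d) {
        int bf = balance(d);
        if (bf == 2) {
            if (balance(d.left) < 0) d.left = rotateLeft(d.left);     // left-right
            rebalanceCount++;
            return rotateRight(d);                                    // left-left
        }
        if (bf == -2) {
            if (balance(d.right) > 0) d.right = rotateRight(d.right); // right-left
            rebalanceCount++;
            return rotateLeft(d);                                     // right-right
        }
        return d;
    }

    // Inserts v (duplicates are ignored) and returns the root of the resulting subtree.
    static Node insert(Node t, int v) {
        if (t == null) return new Node(v);
        if (v < t.value) t.left = insert(t.left, v);
        else if (v > t.value) t.right = insert(t.right, v);
        else return t;
        update(t);
        return rebalance(t);
    }

    // Renders a tree as value(left,right); leaves print bare and "-" marks an empty subtree.
    static String show(Node t) {
        if (t == null) return "-";
        if (t.left == null && t.right == null) return String.valueOf(t.value);
        return t.value + "(" + show(t.left) + "," + show(t.right) + ")";
    }
}
```

Inserting 9, 8, 4, 3, 6, 7 in that order into an empty tree with this sketch yields 6(4(3,-),8(7,9)), the tree (v) of the earlier example, and rebalanceCount increases by exactly one for the insertion of 4 and one for the insertion of 7.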
Nodes in an AVL tree
Each node t in an AVL tree has a value and two reference fields, just as in a general binary
search tree; in addition it has a height field to record the height of the (sub)tree rooted at t.
After inserting a new node, the height fields clearly only need to be updated at most for the
nodes on the insertion path. Re-balancing and re-calculation of heights, therefore, only requires a
fixed amount of work at each node on the insertion path. As the maximum length of the insertion
path is about 1.44 log2n, it follows that the time complexity of insertion is O(log n).
As a minor optimisation in storage costs, it is possible to store the balance factor in each node
rather than the height. The saving is small – balance factors only require two bits whereas
heights stored as integers use 16 or 32 bits typically.
Deletion in an AVL tree
Deletion is as for general binary search trees, except that we must additionally re-calculate
heights and re-balance subtrees along the path from the deletion point to the root. The deletion
point here is the position of the node that is physically removed, which may be the minimum
node used in deletion from a general binary search tree (refer to that algorithm). The properties
of deletion from an AVL tree are not as simple as those for insertion, and so deletion is
considerably more complex.