Binary Search Trees

Binary Search Trees (BSTs)
18 February 2003
Binary Search Tree (BST)



An important special kind of binary tree is the BST.
Each node stores some information, including a unique
key value and associated data.
A binary tree is a BST iff, for every node n in the tree:
– All keys in n’s left subtree are less than the key in n.
– All keys in n’s right subtree are greater than the key in n.
Note: if duplicate keys are allowed, then nodes with
values that are equal to the key in node n can be either
in n’s left subtree or in its right subtree (but not both).
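The definition can be turned into a mechanical check. The following is a minimal Python sketch, not taken from the slides: the names `Node` and `is_bst` are hypothetical, and keys are assumed to be unique and comparable with `<`.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies strictly between low and high
    and the BST property holds at every node (None means "no bound")."""
    if node is None:
        return True  # an empty tree is trivially a BST
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys in the left subtree must stay below node.key, keys in the right above it.
    return (is_bst(node.left, low, node.key)
            and is_bst(node.right, node.key, high))


good = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
bad = Node(6, Node(4, None, Node(7)), Node(9))  # 7 sits in the left subtree of 6
print(is_bst(good))  # True
print(is_bst(bad))   # False
```

Passing bounds down the recursion matters: comparing each node only with its direct children would wrongly accept the second tree.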
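BSTs
[Figure: example trees that satisfy the BST property]
Not BSTs
[Figure: example trees that violate the BST property]
BSTs are Not Unique
[Figure: two differently shaped BSTs; the same set of keys can be stored in more than one BST]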
Importance

The reason binary search trees are important
is that the following operations can be
implemented efficiently using a BST:
– insert a key value
– determine whether a key value is in the tree
– remove a key value from the tree
– print all of the key values in sorted order
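The last operation in the list, printing the keys in sorted order, corresponds to an in-order traversal. A minimal Python sketch follows; `Node` and `in_order` are hypothetical names introduced here for illustration.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def in_order(node):
    """Yield the keys of the subtree rooted at node in ascending order."""
    if node is not None:
        yield from in_order(node.left)   # everything smaller comes first
        yield node.key
        yield from in_order(node.right)  # everything larger comes last


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(list(in_order(tree)))  # [1, 2, 3, 4, 5, 6, 7]
```

Each node is visited exactly once, so the traversal takes time proportional to the number of nodes.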
Lookup

In general, to determine whether a given value
is in the BST, we start at the root of the tree
and determine whether the value we are
looking for:
– is in the root
– might be in the root’s left subtree
– might be in the root’s right subtree

There are actually two special cases:
– The tree is empty; return null.
– The value is in the root node; return the value.
Lookup


If neither special case holds, a recursive
lookup is done on the appropriate subtree.
Since all values less than the root’s value are
in the left subtree, and all values greater than
the root’s value are in the right subtree, there is
no point in looking in both subtrees.
Pseudo Code

The pseudo code for the lookup method is recursive.
Each node has the fields left, key, and right.
lookup(BST, searchkey)
    if (BST = null) return null;
    if (BST.key = searchkey) return BST.key;
    if (BST.key > searchkey) return lookup(BST.left, searchkey);
    else return lookup(BST.right, searchkey);
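The pseudo code translates almost line for line into a real language. The following Python version is a sketch only: the `Node` class is a hypothetical helper, and the example tree is the one used in the search walkthroughs in these slides (root 13).

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def lookup(node, search_key):
    """Return search_key if it is stored in the subtree rooted at node, else None."""
    if node is None:
        return None                           # special case 1: empty tree
    if node.key == search_key:
        return node.key                       # special case 2: value is in the root
    if node.key > search_key:
        return lookup(node.left, search_key)  # smaller keys live on the left
    return lookup(node.right, search_key)     # larger keys live on the right


tree = Node(13, Node(9, Node(5), Node(12)), Node(16, None, Node(19)))
print(lookup(tree, 12))  # 12
print(lookup(tree, 15))  # None
```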
Searching for 12
[Figure: a BST with root 13; 13 has children 9 and 16; 9 has children 5 and 12; 16 has right child 19]
(1) 12 < 13, so go to the left subtree.
(2) 12 > 9, so go to the right subtree.
(3) Found!
Searching for 15
[Figure: the same BST with root 13]
(1) 15 > 13, so go to the right subtree.
(2) 15 < 16, so go to the left subtree.
(3) It does not exist, so the search fails and returns null.
Animation
http://www1.mmu.edu.my/~mukund/dsal/BST.html
Analysis



How much time does it take to search for a
value in a BST?
Note that lookup always follows a path from the
root down towards a leaf. In the worst case, it
goes all the way to a leaf.
Therefore, the worst-case time is proportional
to the length of the longest path from the root
to a leaf (the height of the tree).
Worst Case



What is the relationship between the number of
nodes in a BST and the height of the tree?
This depends on the “shape” of the tree.
In the worst case, all nodes have just one child,
and the tree is essentially a linked list.
Worst Case
[Figure: a tree in which every non-leaf node has exactly one child; from root to leaf the keys are 50, 10, 15, 30, 20]
This tree has 5 nodes, and has
height = 5.
Searching for values in the ranges
16-19 and 21-29 requires
following the path from the root
all the way down to the leaf (the node
containing the value 20).
This requires time proportional to the
number of nodes in the tree.
Best Case
[Figure: a full tree with root 4; 4 has children 2 and 6; the leaves are 1, 3, 5, and 7]
In the best case, all non-leaf
nodes have 2 children and
all leaves are at the
same depth.
This tree has 7
nodes, and height = 3.
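The two extremes can be compared directly. The sketch below, in Python, measures height the way the slides do (the number of nodes on the longest root-to-leaf path); `Node` and `height` are hypothetical names, and the two trees are the 5-node chain and the 7-node full tree from the worst-case and best-case slides.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


# Worst case: 50 -> 10 -> 15 -> 30 -> 20, one child per non-leaf node.
chain = Node(50, Node(10, None, Node(15, None, Node(30, Node(20)))))
# Best case: every non-leaf node has two children, all leaves at the same depth.
full = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))

print(height(chain))  # 5
print(height(full))   # 3
```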
Best Case Tree Height

In general, a “full” tree will have height approximately
log2(N), where N is the number of nodes in the tree.
The value log2(N) is (roughly) the number of times you
can divide N by two, discarding remainders, before you get to zero.
For example:
– 7/2 = 3 (divide by 2 once)
– 3/2 = 1 (divide by 2 a second time)
– 1/2 = 0 (divide by 2 a third time; the result is zero, so quit)
So log2(7) is approximately equal to 3.
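The repeated halving above is easy to reproduce. This is a small Python sketch; the function name `halvings` is made up for this example and uses integer division, as in the worked example.

```python
def halvings(n):
    """Count how many times n can be integer-divided by 2 before reaching 0."""
    count = 0
    while n > 0:
        n //= 2     # 7 -> 3 -> 1 -> 0
        count += 1
    return count


print(halvings(7))  # 3
```

For N >= 1 this count equals floor(log2(N)) + 1, which is the smallest possible height (counted in nodes) of a binary tree with N nodes.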
Summary



The worst-case time required to do a lookup in
a BST is O(height of tree).
The worst case (a “linear” tree) is O(N), where
N is the number of nodes in the tree.
In the best case (a “full” tree) we get O(log N).
Inserting 15
[Figure: a BST with root 13; 13 has children 9 and 16; 9 has children 5 and 12; 16 has right child 19]
(1) 15 > 13, so go to the right subtree.
(2) 15 < 16, and 16 has no left subtree.
(3) So insert 15 as the left child of 16.
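The insertion steps can be sketched in Python as follows. `Node` and `insert` are hypothetical names, and ignoring duplicate keys is an assumption made for this example, in line with the unique-key definition at the start of the slides.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def insert(node, key):
    """Insert key into the subtree rooted at node and return the subtree's root.
    Duplicate keys are ignored (an assumption of this sketch)."""
    if node is None:
        return Node(key)                      # found the empty spot: new leaf
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


root = None
for k in [13, 9, 16, 5, 12, 19]:   # builds the tree from the example
    root = insert(root, k)
root = insert(root, 15)
print(root.right.key)       # 16
print(root.right.left.key)  # 15
```

Like lookup, the function follows a single root-to-leaf path, so the new key always ends up as a leaf.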
Complexity


The complexity for insert is the same as for
lookup.
In the worst case, a path is followed all the way
to a leaf.
Delete

If the search for the node containing the value
to be deleted succeeds, there are several
cases to deal with:
1. The node to delete is a leaf (has no children).
2. The node to delete has one child.
3. The node to delete has two children.
Deletion



If KeyToDelete is not in the tree, the tree is
simply unchanged.
We have to be careful that we do not “orphan”
any nodes when we remove one.
When the node to delete is a leaf, we want to
remove it from the BST by setting the
appropriate child pointer of its parent to null (or
by setting root to null if the node to be deleted
is the root, and it has no children).
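Delete a leaf (15)
[Figure: before, the tree has root 13; 9 has children 5 and 12; 16 has children 15 and 19. After, 16’s left child pointer is null and the rest of the tree is unchanged]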
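Delete a node with one child (16)
[Figure: before and after deleting 16 from a tree with root 13 whose other keys are 9, 5, 12, 16, 19, 25, 21, and 35; after the deletion, 19 appears in the position 16 occupied]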
Messy Case



The hard case is when the node to delete has
two children.
To delete n, we can't replace node n with one
of its children, because what would we do with
the other child?
We replace node n with another node, x, lower
down in the tree, then (recursively) delete node
x.
Deletion




What node can we use to replace node n?
The tree must remain a BST (all of the values in n’s left
subtree are less than n, and all of the values in n’s right
subtree are greater than n)
There are two possibilities that work: the node in the
left subtree with the largest value, or the node in the
right subtree with the smallest value.
To find the node in the right subtree with the smallest
value, we start at the root of the right subtree and keep
going to the left child until there is none, since smaller
values are in left subtrees. Once the node is found, we
copy its fields into node n, then we recursively delete
the copied node.
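All three deletion cases can be put together in one routine. The following Python sketch is one way to do it, not necessarily the one the slides had in mind: `Node` and `delete` are hypothetical names, the node stores only a key (so "copying its fields" reduces to copying the key), and the replacement is taken from the right subtree.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(node, key):
    """Remove key from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None                  # key is not in the tree: nothing changes
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Leaf or one child: hand the (possibly missing) child up to the parent.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: find the smallest key in the right subtree...
        smallest = node.right
        while smallest.left is not None:
            smallest = smallest.left
        node.key = smallest.key      # ...copy it into this node...
        node.right = delete(node.right, smallest.key)  # ...and delete the copied node.
    return node


# The tree from the "Deletion (8)" example.
root = Node(8,
            Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7))),
            Node(11, Node(9, None, Node(10)), Node(12)))
root = delete(root, 8)
print(root.key)             # 9
print(root.right.left.key)  # 10
```

Returning the new subtree root from each call lets the parent update its child pointer, which is how the sketch avoids orphaning nodes.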
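Deletion (8)
[Figure: a BST with root 8; the left subtree has root 4 with children 2 (children 1 and 3) and 6 (children 5 and 7); the right subtree has root 11 with children 9 (right child 10) and 12]
Delete (8) Replace with (7)
[Figure: the same tree with 7, the largest key in the left subtree, copied into the root]
Delete (8) Replace with (9)
[Figure: the same tree with 9, the smallest key in the right subtree, copied into the root; 10 takes 9’s old place as the left child of 11]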
Keeping BSTs Efficient



In the best of all worlds, a BST is fully balanced at all
times.
In practice, BSTs need to be “almost”
balanced.
A lot of CS research and energy has gone into
the problem of how to keep binary trees
balanced.
Height Balanced: AVL Trees

An AVL tree is a binary search tree in which the
heights of the left and right subtrees of the root
differ by at most 1 and in which the left and
right subtrees are themselves AVL trees.
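The height condition in this definition can be checked recursively. The Python sketch below tests only the balance condition, not the BST ordering, and does not show how an AVL tree rebalances itself; `Node`, `balanced_height`, and `is_height_balanced` are hypothetical names.

```python
class Node:
    """A binary tree node with a key and optional left/right children (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def balanced_height(node):
    """Return the height (in nodes) of the subtree if every node in it has
    left and right subtrees whose heights differ by at most 1, else -1."""
    if node is None:
        return 0
    left = balanced_height(node.left)
    if left == -1:
        return -1
    right = balanced_height(node.right)
    if right == -1 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def is_height_balanced(node):
    """True if the tree satisfies the AVL height condition at every node."""
    return balanced_height(node) != -1


full = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
chain = Node(50, Node(10, None, Node(15, None, Node(30, Node(20)))))
print(is_height_balanced(full))   # True
print(is_height_balanced(chain))  # False
```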
Splay Trees




Splay trees are a self-adjusting data structure.
Splay trees are BSTs that move the most recently
accessed node to the root.
Nodes that are frequently accessed tend to
cluster at the top of the tree, driving those rarely
accessed toward the leaves.
Splay trees can become highly unbalanced,
but over the long term they perform well.