Binary Search Trees

Joe Meehean
 Important and common problem
 Given a collection, determine whether value v is a member
 Common variation
• given a collection of unique keys, each associated with a value
• find value v associated with the key k
• find the mapping
 Dictionary
• key => word
• value => definition
 Phonebook
• key => name
• value => phone number
 Webpage
• key => address
• value => html files and pictures
 Problem: given an array of N values, determine if v is one of them
 Two approaches
• sequential search
• binary search
 Look at each value in turn (iterate)
• e.g., a[0], a[1], …
• quit when v is found or end of array reached
• worst case time: O(N)
 What if a is sorted?
• look at each value in turn
• quit when v is found or end of array reached
• OR, when current value is > v
• worst case time still O(N)
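A minimal C++ sketch of the two sequential variants described above. The function names `sequentialSearch` and `sortedSequentialSearch` are illustrative and do not come from the slides.

```cpp
#include <cstddef>
#include <vector>

// Sequential search: look at each value in turn.
// Worst case O(N): v is the last element or absent.
template <class T>
bool sequentialSearch(const std::vector<T>& a, const T& v) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == v) return true;   // quit when v is found
    }
    return false;                     // end of array reached
}

// Variant for a sorted array: also quit once the current value exceeds v.
// Still O(N) in the worst case (v is larger than every element).
template <class T>
bool sortedSequentialSearch(const std::vector<T>& a, const T& v) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == v) return true;
        if (v < a[i]) return false;   // every later value is larger too
    }
    return false;
}
```

The early exit helps on average when v is absent, but it does not change the worst-case bound.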
 Array must be sorted
• (a[0] <= a[1] <= a[2] … <= a[N-1])
 Algorithm
• like the Clock Game on The Price Is Right
• look at middle value x in array
• if x == v, v is found
• else eliminate ½ the array
 if v < x, eliminate the right half
 if v > x, eliminate the left half
• repeat until v is found or no remaining values
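The algorithm above can be sketched in C++ as follows. This is one possible iterative version, not code from the slides; the name `binarySearch` and the half-open range `[lo, hi)` are choices made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Binary search over a sorted vector.
// Uses only operator<, so x == v is inferred when neither is less.
template <class T>
bool binarySearch(const std::vector<T>& a, const T& v) {
    std::size_t lo = 0;          // first index still in play
    std::size_t hi = a.size();   // one past the last index still in play
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // middle value x = a[mid]
        if (v < a[mid]) {
            hi = mid;            // eliminate the right half
        } else if (a[mid] < v) {
            lo = mid + 1;        // eliminate the left half
        } else {
            return true;         // x == v
        }
    }
    return false;                // no remaining values
}
```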
[Figure: binary search on the sorted array ant, bar, bat, cat, dog, elk, fox, owl, rat. Throws away half the entries at every compare.]
 Array or vector must be sorted
 Data type must provide < operator
• if !(a < b) && !(b < a) then b == a
 Or a Comparator
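As an illustration of the comparator option, here is a hedged C++ sketch. `CaseInsensitiveLess` and `equivalent` are hypothetical names invented for this example; the point is that equality can be derived from "less than" alone, exactly as in the rule above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical comparator: orders strings while ignoring letter case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i) {
            int ca = std::tolower(static_cast<unsigned char>(a[i]));
            int cb = std::tolower(static_cast<unsigned char>(b[i]));
            if (ca != cb) return ca < cb;
        }
        return a.size() < b.size();  // a proper prefix sorts first
    }
};

// a and b are treated as equal when neither is less than the other.
template <class T, class Compare>
bool equivalent(const T& a, const T& b, Compare isLessThan) {
    return !isLessThan(a, b) && !isLessThan(b, a);
}
```

With this comparator, "Cat" and "cAT" count as the same key even though `operator==` on the strings would say otherwise.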
 Worst-case compares: the number of times N can be divided by 2
 In Big O it is O(logN)
• log2 N and log N in any other base differ only by a constant factor
 Scales better than O(N)
• for large N, O(logN) algorithms are faster
 If N = 1024, log2(N) is 10
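To make the "divided by 2" count concrete, a small C++ sketch (the function name `halvings` is made up for this example) that counts the halving steps directly:

```cpp
// Counts how many times n can be halved (integer division)
// before reaching 1, i.e. floor(log2(n)) for n >= 1.
unsigned halvings(unsigned long n) {
    unsigned count = 0;
    while (n > 1) {
        n /= 2;
        ++count;
    }
    return count;
}
```

For N = 1024 this gives 10, and for N = 1,000,000 it gives 19, which is why the logarithmic bound scales so well.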
What if we made a special data structure that represents a binary search?
 Special kind of binary tree
 Each node stores a key
• sometimes an associated value
 For each node n
• all keys in n’s left subtree are < key at n
• all keys in n’s right subtree are > key at n
• if duplicate keys allowed
 keys equal to the key at n can go left XOR right (not both)
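The ordering property can be checked mechanically. Below is a sketch under stated assumptions: a bare `Node` struct with `int` keys (not the slides' node class) and no duplicate keys. Note that comparing each node only with its own children is not enough; every key must respect the bounds set by all of its ancestors.

```cpp
#include <cstddef>

// Illustrative node type (assumption): integer keys, no associated value.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// True when every key in the subtree rooted at n lies strictly between
// *lo and *hi (a null bound means "unbounded"). Duplicates are rejected.
bool isBST(const Node* n, const int* lo = NULL, const int* hi = NULL) {
    if (n == NULL) return true;                  // an empty tree is a BST
    if (lo != NULL && !(*lo < n->key)) return false;
    if (hi != NULL && !(n->key < *hi)) return false;
    return isBST(n->left, lo, &n->key)           // left keys must be < n->key
        && isBST(n->right, &n->key, hi);         // right keys must be > n->key
}
```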
 Insert a key (and associated data)
 Lookup a key (and associated data)
 Remove a key (…)
 Print all keys in sorted order
• using an inorder traversal
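A minimal sketch of the inorder traversal mentioned above, again assuming a bare `Node` struct with `int` keys. It collects the keys into a vector rather than printing them so the result is easy to inspect.

```cpp
#include <cstddef>
#include <vector>

// Illustrative node type (assumption): integer keys only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// In-order traversal: left subtree, then the node, then right subtree.
// On a BST this visits the keys in ascending order.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == NULL) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}
```

The traversal visits every node once, so it runs in O(N) regardless of the tree's shape.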
[Figure: example binary trees, each marked as a BST or not]
• Yes: in-order traversal produces 2 4 5 6 9
• No: 7 is not < 6
// private inner class of BST<K>
class BinaryNode{
public:
  K key_;
  BinaryNode* left_;
  BinaryNode* right_;
  // constructor: a new node starts out as a leaf
  BinaryNode(const K& key) : key_(key), left_(NULL), right_(NULL) {}
};
template <class K, class Compare = std::less<K> >
class BST{
private:
  BinaryNode* root_;
  Compare isLessThan_;   // comparator object; defaults to operator<
public:
  BST() {root_ = NULL;}
  bool insert(const K& key);
  bool lookup(const K& key);
  // named remove because delete is a reserved word in C++
  void remove(const K& key);
};
 Key is in BST if it is in
• the root
• the left subtree
• or the right subtree
 Don’t need to look in both subtrees
• just like binary search in an array
// public driver method
// method of BST
bool lookup(const K& key){
  // private recursive helper method
  // defined next
  return lookup(root_, key);
}
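// private method of BST
bool lookup(BinaryNode* n, const K& k){
  if( n == NULL )
    return false;                   // empty (null) subtree: not found
  else if( isLessThan_(k, n->key_) )
    return lookup(n->left_, k);     // k is smaller: look left
  else if( isLessThan_(n->key_, k) )
    return lookup(n->right_, k);    // k is larger: look right
  else
    return true;                    // neither is less: value found
}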
Class Activity
 Cases
• empty (null) subtree
• value found
• next look left
• next look right
 Shout it out
• lookup(4)
• lookup(5)
• lookup(3)
[Figure: BST containing keys 6, 4, 9, 2, 5]
 Always follows path from root down
 Worst-case
• goes to a leaf along longest path
• proportional to tree height
 Height related to size
• given size, how can we know height?
 Best case tree is balanced
• all non-leaf nodes have 2 children
• all leaves at the same depth
• height is about log2N
 Worst case tree is linear
• all non-leaf nodes have a single child
• height is N
[Figure: lookup(2) traced step by step on a BST with keys 6, 4, 9, 3, 5, 7, 15, 2. Eliminates half the nodes at every compare.]
 Worst-case
• O(height of tree)
 Worst of worst
• height is N
• lookup is O(N)
 Best worst-case
• height is log2N
• lookup is O(logN)
 O(logN) is waaaay better than O(N)
 New values inserted as leaves
 Must choose position to respect BST ordering
• and to ensure we can find it with a lookup
 Duplicate keys are not allowed
 Traverse the tree
• like a lookup
 If we find a duplicate
• return an error
 If we end up at a null
• make a new node with the key
• make it the child of the node with null pointer
 Note the above two were our base cases for lookup too
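// members of BST
// public driver: returns false if key is a duplicate
bool insert(const K& key){
  return insert(root_, key);
}
// private recursive helper; n is a reference to the parent's
// pointer, so assigning to n links the new node into the tree
bool insert( BinaryNode*& n, const K& key){
  if( n == NULL ){
    n = new BinaryNode(key);
    return true;
  }else if( isLessThan_(key, n->key_) ){
    return insert(n->left_, key);
  }else if( isLessThan_(n->key_, key) ){
    return insert(n->right_, key);
  }else{
    // duplicate, do nothing
    return false;
  }
}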
CLASS ACTIVITY
 First names BST
• You add your names
 Similar to lookup
• worst-case follow path from root to leaf
• O(logN) for a balanced tree
• O(N) for a completely unbalanced tree
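The gap between the two cases is easy to observe by inserting the same keys in different orders. The sketch below is illustrative only: it uses a bare `Node` struct with `int` keys, a free-function `insert`, and hypothetical helpers `height` and `build`. Height is counted in nodes along the longest root-to-leaf path.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(NULL), right(NULL) {}
};

// Plain BST insert (duplicates ignored), used only to build sample trees.
void insert(Node*& n, int key) {
    if (n == NULL) n = new Node(key);
    else if (key < n->key) insert(n->left, key);
    else if (n->key < key) insert(n->right, key);
}

// Number of nodes on the longest root-to-leaf path; 0 for an empty tree.
int height(const Node* n) {
    if (n == NULL) return 0;
    return 1 + std::max(height(n->left), height(n->right));
}

// Inserts the keys in the given order. Nodes are never freed,
// which keeps the sketch short but leaks memory.
Node* build(const std::vector<int>& keys) {
    Node* root = NULL;
    for (std::size_t i = 0; i < keys.size(); ++i) insert(root, keys[i]);
    return root;
}
```

Inserting 1, 2, 3, 4, 5, 6, 7 in sorted order yields a linear tree of height 7, while inserting 4, 2, 6, 1, 3, 5, 7 yields a balanced tree of height 3.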
 Find the node n w/ key to be deleted
 Different actions depending on n’s # of kids
• Case 1: n has 0 kids (it’s a leaf)
 set parent’s n-pointer (left or right) to null
• Case 2: n has 1 kid
 set parent’s n-pointer to point to n’s only kid
• Case 3: n has 2 kids
 replace n’s key with a key further down in the tree
 delete that node
 What node key can replace n’s key?
• new key of n must be:
• > all keys in left subtree
• < all keys in right subtree
 Largest key from the left subtree
 Smallest key from the right subtree
• let’s choose this one (arbitrarily)
• use findMin on root of right subtree
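The slides do not show findMin itself. The sketch below is one plausible version, assuming a bare `Node` struct with `int` keys and a pointer-to-pointer signature inferred from how the delete code calls it (`findMin(&n->right)`). Returning the address of the parent's pointer lets the caller unlink the minimum node afterwards.

```cpp
#include <cstddef>

// Illustrative node type (assumption): integer keys only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Smallest key in a non-empty subtree: keep going left.
// Returns the address of the pointer that refers to the minimum node.
Node** findMin(Node** n) {
    while ((*n)->left != NULL) {
        n = &(*n)->left;
    }
    return n;
}
```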
[Figure: three deletion examples on a BST containing keys 8, 15, 20, 18, 16, 17 (other subtrees shown as …)]
• delete(17)
• delete(16)
• delete(15)
 smallest key in right subtree is 16; it replaces 15
 Case 2: 1 kid. Replace the old 16 node with its only child
 Find the node n w/ key to be deleted
 Different actions depending on n’s # of kids
• case 1: n has 0 kids (it’s a leaf)
 set parent’s n-pointer (left or right) to null
• case 2: n has 1 kid
 set parent’s n-pointer to point to n’s only kid
• case 3: n has 2 kids
 replace n’s key with a key further down in the tree
 delete that node
 Case 1 (n is leaf) and case 2 (n has 1 kid) both need to update the parent’s pointer
 How?
 Pass a reference to that pointer
// publicly visible method
// (named remove because delete is a reserved word in C++)
void BST<K,Compare>::remove(const K& key){
  remove(root_, key);
}
// private helper method
void BST<K,Compare>::remove( BinaryNode*& n,
                             const K& k){
  // base case 1 (key not in tree)
  if( n == NULL ){
    return;
  }
  ...
}
// private helper method
void BST<K,Compare>::remove( BinaryNode*& n,
                             const K& k){
  ...
  // keep searching: recurse into the subtree that could hold k
  if( isLessThan_(k, n->key_) ){
    remove(n->left_, k);
  }else if( isLessThan_(n->key_, k) ){
    remove(n->right_, k);
  }
  ...
}
// private helper method
void BST<K,Compare>::remove( BinaryNode*& n,
                             const K& k){
  ...
  // case 3 (has two children)
  else if( n->left_ != NULL &&
           n->right_ != NULL ){
    // tmp points at the pointer that holds the smallest
    // node of the right subtree
    BinaryNode** tmp = findMin(&n->right_);
    n->key_ = (*tmp)->key_;
    // handles cases 1 & 2 for tmp
    removeNodeSimple(*tmp);
  }
  ...
}
// private helper method
void BST<K,Compare>::remove( BinaryNode*& n,
                             const K& k){
  ...
  else{
    // handles cases 1 & 2
    removeNodeSimple(n);
  }
  ...
}
// private helper method
// should only be called on nodes with
// 0 or 1 children
void removeNodeSimple(BinaryNode*& n){
  // left as an exercise
}
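For readers who want to check their answer to the exercise, here is one possible way the pieces could fit together. It is a sketch, not the author's solution: it uses a bare `Node` struct with `int` keys and free functions instead of the BST class, and the name `removeKey` is chosen here to avoid clashing with the C library's `remove`.

```cpp
#include <cstddef>

struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(NULL), right(NULL) {}
};

// Address of the pointer to the minimum node of a non-empty subtree.
Node** findMin(Node** n) {
    while ((*n)->left != NULL) n = &(*n)->left;
    return n;
}

// Cases 1 and 2: n refers to a node with 0 or 1 children.
// n is a reference to the parent's pointer, so assigning to it relinks
// the parent to the only child (or to NULL when the node is a leaf).
void removeNodeSimple(Node*& n) {
    Node* old = n;
    n = (old->left != NULL) ? old->left : old->right;
    delete old;
}

void removeKey(Node*& n, int k) {
    if (n == NULL) return;                        // key not in tree
    if (k < n->key) removeKey(n->left, k);
    else if (n->key < k) removeKey(n->right, k);
    else if (n->left != NULL && n->right != NULL) {
        Node** tmp = findMin(&n->right);          // case 3: two children
        n->key = (*tmp)->key;                     // copy successor's key up
        removeNodeSimple(*tmp);                   // successor has no left child
    } else {
        removeNodeSimple(n);                      // cases 1 & 2
    }
}
```

The minimum of the right subtree never has a left child, which is why case 3 can always finish with the simple removal.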
 Find the node to be deleted
• follow path from root to desired node
 Delete node
• worst-case: 2 kids
• find smallest key in right subtree
 Worst-case (delete deepest leaf)
 Root to leaf
 H = Height of Tree
• O(H)
 Balanced tree
• O(logN) where N is the number of keys in the tree
 O(N) for a completely unbalanced tree
 Some BSTs have two data values
• a key: used to lookup
• a value: some data associated with that key
• used to implement maps
 In this case:
• two generic types
• binary node has four member variables
• insert has two parameters
• lookup returns value for given key
• delete might return value too

class BST<K,V,Compare>
  // binary node members
  K key_;
  V value_;
  BinaryNode* left_;
  BinaryNode* right_;
  // BST methods
  insert( const K& key,
          const V& value);
  V& lookup(const K& key);
  V remove(const K& key);   // "delete" is a reserved word in C++
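A compact sketch of such a key/value BST, under several assumptions of its own: the class is called `MapBST` here, removal is omitted for brevity, and `lookup` returns a pointer that is NULL when the key is absent (the slide's `V& lookup` leaves the missing-key case unspecified).

```cpp
#include <cstddef>
#include <functional>
#include <string>

template <class K, class V, class Compare = std::less<K> >
class MapBST {
    struct Node {
        K key_;
        V value_;
        Node* left_;
        Node* right_;
        Node(const K& k, const V& v)
            : key_(k), value_(v), left_(NULL), right_(NULL) {}
    };
    Node* root_;
    Compare isLessThan_;

    bool insert(Node*& n, const K& k, const V& v) {
        if (n == NULL) { n = new Node(k, v); return true; }
        if (isLessThan_(k, n->key_)) return insert(n->left_, k, v);
        if (isLessThan_(n->key_, k)) return insert(n->right_, k, v);
        return false;                  // duplicate key: value unchanged
    }
    V* lookup(Node* n, const K& k) {
        if (n == NULL) return NULL;    // key not in tree
        if (isLessThan_(k, n->key_)) return lookup(n->left_, k);
        if (isLessThan_(n->key_, k)) return lookup(n->right_, k);
        return &n->value_;
    }
    static void destroy(Node* n) {
        if (n == NULL) return;
        destroy(n->left_);
        destroy(n->right_);
        delete n;
    }
    MapBST(const MapBST&);             // copying disabled in this sketch
    MapBST& operator=(const MapBST&);
public:
    MapBST() : root_(NULL) {}
    ~MapBST() { destroy(root_); }
    bool insert(const K& k, const V& v) { return insert(root_, k, v); }
    V* lookup(const K& k) { return lookup(root_, k); }
};
```

Only the key takes part in comparisons; the value simply rides along in the node, which is what makes the tree usable as a map such as the dictionary example (word as key, definition as value).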
 BSTs store comparable keys
• and associated values if desired
 Lookup, insert, delete are easy to implement (sort of)
 All are worst-case O(Height of tree)
• O(N) for unbalanced
• O(logN) for balanced
• O(logN) is waaaaay better than O(N)
 If only we could guarantee a tree was balanced…