BST Data Structure
A BST node contains:
– A key (used to search)
– The data associated with that key
– Pointers to children, parent
• Leaf nodes have NULL pointers for children
A BST contains:
– A pointer to the root of the tree.
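The layout described above could be sketched in C++ as follows. The field names, the `int` key, the `std::string` payload, and the `is_leaf` helper are assumptions made for illustration; the slides do not fix any of them.

```cpp
#include <string>
#include <utility>

// Hypothetical node layout: an int key with a string payload.
struct Node {
    int key;                 // used to search
    std::string data;        // the data associated with that key
    Node* left = nullptr;    // children; both stay null in a leaf
    Node* right = nullptr;
    Node* parent = nullptr;  // null for the root

    Node(int k, std::string d) : key(k), data(std::move(d)) {}
};

// The tree itself only needs a pointer to its root.
struct BST {
    Node* root = nullptr;    // null while the tree is empty
};

// A node is a leaf when both child pointers are null.
bool is_leaf(const Node& n) {
    return n.left == nullptr && n.right == nullptr;
}
```

Raw pointers are used here only to mirror the NULL-pointer wording of the slide; ownership and deallocation are left out of this sketch.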
BST Operations: Insert
BST property must be maintained
Algorithm sketch:
– To insert data with key k
– Compare k to root.key
– If k < root.key, go left
– If k > root.key, go right
– Repeat until you reach a NULL child pointer. That's where the new node should be inserted, so it always becomes a leaf.
• Note: keep track of the prospective parent along the way.
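A minimal sketch of this insertion loop, assuming integer keys, children owned through `std::unique_ptr`, and duplicate keys being rejected (the slides leave all three open). The names `Node`, `BST` and `insert` are illustrative.

```cpp
#include <memory>
#include <utility>

struct Node {
    int key;
    Node* parent = nullptr;               // non-owning back pointer
    std::unique_ptr<Node> left, right;    // owning child pointers
    explicit Node(int k) : key(k) {}
};

struct BST {
    std::unique_ptr<Node> root;
};

// Inserts key k. Returns false if k is already present; rejecting
// duplicates is an assumption of this sketch.
bool insert(BST& tree, int k) {
    Node* parent = nullptr;               // prospective parent
    Node* cur = tree.root.get();
    while (cur != nullptr) {              // walk down until we fall off the tree
        if (k == cur->key) return false;
        parent = cur;
        cur = (k < cur->key) ? cur->left.get() : cur->right.get();
    }
    auto node = std::make_unique<Node>(k);
    node->parent = parent;
    if (parent == nullptr) {
        tree.root = std::move(node);      // the tree was empty
    } else if (k < parent->key) {
        parent->left = std::move(node);
    } else {
        parent->right = std::move(node);
    }
    return true;
}
```

Tracking `parent` during the descent is what makes the final attachment a constant-time step.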
BST Operations: Insert
Running time:
– The new node is inserted at a leaf position, so this depends on the height of the tree.
Worst case:
– Inserting keys 1, 2, 3, ... in this order will result in a tree that looks like a chain:
• Tree has degenerated to a list
• Height: linear
• Note also that such a tree is worse than a linked list since it takes up more space (more pointers)
[Figure: the chain 1, 2, 3, where each node is the right child of the previous one]
BST Operations: Insert
Running time:
– The new node is inserted at a leaf position, so this depends on the height of the tree.
Best case:
– The top levels of the tree are filled up completely
– The height is then approximately log n (base 2), where n is the number of nodes in the tree.
[Figure: a BST with the keys 2, 4, 8, 12, 14, 16 whose top levels are completely filled]
BST Operations: Insert
The height of a complete (i.e. all levels filled up) BST with n nodes is logarithmic. Why?
– Level $i$ has $2^i$ nodes, for $i = 0$ (top level) through $h$ (= height)
– The total number of nodes, $n$, is then:
$$n = 2^0 + 2^1 + \dots + 2^h = \frac{2^{h+1} - 1}{2 - 1} = 2^{h+1} - 1$$
Solving for $h$ gives us $h = \log_2(n + 1) - 1 \approx \log_2 n$
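The two extremes can be checked with a small experiment. The sketch below assumes a plain unbalanced insert and measures height in edges (a single node has height 0). The helper `insert_balanced` is hypothetical: it inserts the middle key first so that the levels fill from the top.

```cpp
#include <algorithm>
#include <memory>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain (unbalanced) BST insert; duplicates are ignored.
void insert(std::unique_ptr<Node>& root, int k) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (k == (*slot)->key) return;
        slot = (k < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(k);
}

// Height counted in edges: an empty tree has height -1.
int height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left.get()), height(n->right.get()));
}

// Inserts the middle key of lo..hi first, then recurses on both halves.
void insert_balanced(std::unique_ptr<Node>& root, int lo, int hi) {
    if (lo > hi) return;
    int mid = lo + (hi - lo) / 2;
    insert(root, mid);
    insert_balanced(root, lo, mid - 1);
    insert_balanced(root, mid + 1, hi);
}
```

For n = 15 keys, inserting 1, 2, ..., 15 in order yields a chain of height 14, while `insert_balanced(root, 1, 15)` yields height 3, which matches $15 = 2^{3+1} - 1$.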
BST Operations: Insert
Analysis conclusion
– An insert operation consists of two parts:
• Search for the position
– best case logarithmic
– worst case linear
• Physically insert the node
– constant
BST Operations: Insert

What if we allow duplicate keys?
– Idea #1 : Always insert in the right subtree
• Results in very unbalanced tree
– Idea #2 : Insert in alternate subtrees
• Makes it difficult to search for all occurrences
– Idea #3 : All elements with the same key are inserted in a single node
• Good idea!
– Easy to search, does not affect balance any more than non-duplicate insertion.
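One way to realise Idea #3 is to give each node a list of the elements that share its key. This is a sketch under that assumption; the `items` vector and the `find_all` helper are illustrative names, not something the slides prescribe.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    int key;
    std::vector<std::string> items;       // every element stored under this key
    std::unique_ptr<Node> left, right;
    Node(int k, std::string v) : key(k) { items.push_back(std::move(v)); }
};

// Inserts value v under key k. A repeated key appends to the existing
// node instead of creating a new one, so the tree shape is unchanged.
void insert(std::unique_ptr<Node>& root, int k, std::string v) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (k == (*slot)->key) {
            (*slot)->items.push_back(std::move(v));
            return;
        }
        slot = (k < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(k, std::move(v));
}

// Returns all values stored under key k, or nullptr if k is absent.
const std::vector<std::string>* find_all(const Node* n, int k) {
    while (n != nullptr && n->key != k) {
        n = (k < n->key) ? n->left.get() : n->right.get();
    }
    return n ? &n->items : nullptr;
}
```

A single search reaches every occurrence of a key, which is what makes this variant easy to query.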
BST Operations: Insert

What if we allow a variable number of children? (n-ary tree)
– Idea : Use a vector/list of pointers to children.
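A minimal sketch of such a node, assuming a `std::vector` of owning pointers. The names `NaryNode`, `add_child` and `count_nodes` are hypothetical.

```cpp
#include <memory>
#include <vector>

struct NaryNode {
    int key;
    std::vector<std::unique_ptr<NaryNode>> children;  // any number of children
    explicit NaryNode(int k) : key(k) {}
};

// Appends a new child holding key k and returns a pointer to it.
NaryNode* add_child(NaryNode& parent, int k) {
    parent.children.push_back(std::make_unique<NaryNode>(k));
    return parent.children.back().get();
}

// Counts the nodes in the subtree rooted at n.
int count_nodes(const NaryNode& n) {
    int total = 1;
    for (const auto& c : n.children) total += count_nodes(*c);
    return total;
}
```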
BST Operations: Search
Take advantage of the BST property.
Algorithm sketch:
– Compare target to root
– If equal, return success
– If target < root, search left
– If target > root, search right
– If the subtree to search is empty, return failure
Running time:
– Similar to insert
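The search sketch translates almost directly into a loop. The version below assumes integer keys and returns a pointer to the matching node (or `nullptr` on failure), which is one of several reasonable return conventions.

```cpp
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the node holding target, or nullptr if it is not in the tree.
const Node* search(const Node* root, int target) {
    const Node* cur = root;
    while (cur != nullptr && cur->key != target) {
        // The BST property tells us which subtree can contain target.
        cur = (target < cur->key) ? cur->left : cur->right;
    }
    return cur;
}
```

Each iteration moves one level down, so the number of comparisons is bounded by the height of the tree plus one.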
BST Operations: Delete
The Delete operation consists of two parts:
– Search for the node to be deleted
• best case constant (deleting the root)
• worst case linear
– Delete the node
• best case?
• worst case?
BST Operations: Delete
CASE #1
– The node to be deleted is a leaf node.
– Easy!
• Physically remove the node.
• Constant time
– We are just resetting its parent's child pointer and deallocating memory
BST Operations: Delete
CASE #2
– The node to be deleted has exactly one child
– Easy!
• Physically remove the node.
• Constant time
– We are just pointing its parent's child pointer at the node's only child, updating that child's parent pointer, and deallocating memory
BST Operations: Delete

CASE #3
– The node to be deleted has two children
– Not so easy
• If we physically delete the node, we'll have to place its two children somewhere. This seems to require too much tree restructuring.
• But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over without violating the BST property and then physically delete that node?
BST Operations: Delete

CASE #3, continued
– The node to be deleted, x, has two children
– Idea:
• Find x's immediate successor, y (the node with the next larger key). It is guaranteed to have at most one child
• Copy y's contents over to x
• Physically delete y.
BST Operations: Delete
Finding the immediate successor:
– We know that the node has two children. Due to the BST property, the immediate successor will be in the right subtree.
– In particular, the immediate successor will be the smallest element in the right subtree.
– The smallest element in a BST is always the leftmost node: follow left pointers from the root of the subtree until there is no left child. It need not be a leaf, since it may still have a right child.
BST Operations: Delete
Finding the immediate successor:
– Since it requires traveling down the tree from the current node toward a leaf, it may take up to linear time in the worst case.
– In the best case it takes constant time (the right child has no left child); in a balanced tree it takes at most logarithmic time.
– The time to perform the copy and delete the successor is constant.
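The three deletion cases can be sketched together as follows. This version assumes integer keys and children owned through `std::unique_ptr`, and it omits the parent pointers mentioned earlier to keep the code short; `erase` and `insert` are illustrative names.

```cpp
#include <memory>
#include <utility>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain BST insert, included only so the sketch is self-contained.
void insert(std::unique_ptr<Node>& root, int k) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (k == (*slot)->key) return;
        slot = (k < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(k);
}

// Removes key k from the subtree owned by `link`; returns false if absent.
bool erase(std::unique_ptr<Node>& link, int k) {
    if (!link) return false;
    if (k < link->key) return erase(link->left, k);
    if (k > link->key) return erase(link->right, k);

    if (link->left && link->right) {
        // Case 3: copy the successor's key, then delete the successor.
        Node* succ = link->right.get();
        while (succ->left) succ = succ->left.get();  // leftmost node of right subtree
        link->key = succ->key;
        return erase(link->right, succ->key);        // successor has at most one child
    }
    // Cases 1 and 2: splice in the only child (or nothing for a leaf).
    std::unique_ptr<Node> child =
        std::move(link->left ? link->left : link->right);
    link = std::move(child);                         // the old node is freed here
    return true;
}
```

With a payload in each node, Case 3 would copy the successor's data along with its key.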
Binary Search Trees
Traversing a tree = visiting its nodes
Three major ways to traverse a binary tree:
– preorder
• visit root
• visit left subtree
• visit right subtree
– inorder
• visit left subtree
• visit root
• visit right subtree
– postorder
• visit left subtree
• visit right subtree
• visit root
When applied to a BST, an inorder traversal visits the nodes in order from smaller to larger.
Binary Search Trees
void print_inorder(Node *subroot) {
    if (subroot != NULL) {
        print_inorder(subroot->left);   // visit left subtree
        cout << subroot->data;          // visit root
        print_inorder(subroot->right);  // visit right subtree
    }
}
How long does this take?
There is exactly one call to print_inorder() for each node of the tree, plus one call for each of the n + 1 NULL child pointers.
There are n nodes, so the running time of this operation is Θ(n)
Binary Search Trees

A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal.
– It requires the use of a temporary queue:
enqueue root (if the tree is not empty)
while (queue is not empty) {
    get the front element, f
    print f
    enqueue f's non-NULL children
    dequeue
}
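The queue-based loop above might look like this in C++. The sketch assumes integer keys and collects them into a vector instead of printing, so the result can be inspected; `level_order` is an illustrative name.

```cpp
#include <queue>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the keys in level order (top to bottom, left to right).
std::vector<int> level_order(const Node* root) {
    std::vector<int> out;
    std::queue<const Node*> pending;
    if (root != nullptr) pending.push(root);     // enqueue root
    while (!pending.empty()) {
        const Node* f = pending.front();         // get the front element
        out.push_back(f->key);                   // "print" it
        if (f->left) pending.push(f->left);      // enqueue f's children
        if (f->right) pending.push(f->right);
        pending.pop();                           // dequeue
    }
    return out;
}
```

Each node is enqueued and dequeued exactly once, so the traversal is linear in the number of nodes.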
Binary Search Trees
        12
       /  \
      4    14
     / \     \
    2   8     16
       / \
      6   10
in-order : 2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
pre-order: 12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
post-order: 2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10
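The three recursive orders can be reproduced with a few lines each. The sketch below assumes integer keys and appends to a vector rather than printing; the function names are illustrative.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Root first, then the left subtree, then the right subtree.
void preorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->key);
    preorder(n->left, out);
    preorder(n->right, out);
}

// Left subtree, then root, then right subtree.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

// Both subtrees first, the root last.
void postorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out.push_back(n->key);
}
```

On the eight-node example tree rooted at 12, `inorder` lists all eight keys in ascending order, while `preorder` starts with 12 and `postorder` ends with it.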
Binary Search Trees

Idea for sorting algorithm:
– Given a sequence of integers, insert each one in a BST
– Perform an inorder traversal. The elements will be accessed in sorted order.
Running time:
– In the worst case (e.g. already sorted input), the tree will degenerate to a list. Creation will take quadratic time and traversal will be linear. Total: $O(n^2)$
– On average (keys arriving in random order), the tree will be mostly balanced. Creation will take $O(n \log n)$ and traversal will again be linear. Total: $O(n \log n)$
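A sketch of this sorting idea, assuming an unbalanced BST and a per-node counter for repeated values (how duplicates are handled is an assumption of the sketch, not something fixed above). `tree_sort` is an illustrative name.

```cpp
#include <memory>
#include <vector>

struct Node {
    int key;
    int count = 1;                        // how many times key was inserted
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Unbalanced insert; a repeated key just bumps the counter.
void insert(std::unique_ptr<Node>& root, int k) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (k == (*slot)->key) { ++(*slot)->count; return; }
        slot = (k < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(k);
}

// Inorder traversal that appends each key `count` times.
void collect_inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    collect_inorder(n->left.get(), out);
    for (int i = 0; i < n->count; ++i) out.push_back(n->key);
    collect_inorder(n->right.get(), out);
}

// Builds a BST from the input, then reads it back in sorted order.
std::vector<int> tree_sort(const std::vector<int>& input) {
    std::unique_ptr<Node> root;
    for (int k : input) insert(root, k);
    std::vector<int> out;
    out.reserve(input.size());
    collect_inorder(root.get(), out);
    return out;
}
```

Because the tree is not rebalanced, already sorted input triggers the quadratic case: every insert walks the full chain built so far.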
BSTs vs. Lists

Time
– In the worst case, all dictionary operations are linear.
– On average, BSTs are expected to do better (logarithmic).
Space
– BSTs store at least one additional pointer per node (two child pointers instead of a single next pointer).
The BST seemed like a good idea, but without a guarantee on its height it offers no worst-case improvement over a list.
– We must find a way to keep the tree balanced and guarantee logarithmic height.
Balanced Trees
There are several ways to define balance
Examples:
– Force the subtrees of each node to have almost equal heights
– Place upper and lower bounds on the heights of the subtrees of each node.
– Force the subtrees of each node to have similar sizes (= number of nodes)
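As an illustration of the first definition, the check below treats "almost equal heights" as "heights differ by at most `slack`" at every node, with a default of 1. That threshold and the names `checked_height` and `is_height_balanced` are assumptions of this sketch; other balance definitions would need a different test.

```cpp
#include <algorithm>
#include <cstdlib>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the height of the subtree in edges (empty = -1), or -2 if
// some node's subtrees differ in height by more than `slack`.
int checked_height(const Node* n, int slack) {
    if (n == nullptr) return -1;
    int lh = checked_height(n->left, slack);
    if (lh == -2) return -2;
    int rh = checked_height(n->right, slack);
    if (rh == -2) return -2;
    if (std::abs(lh - rh) > slack) return -2;
    return 1 + std::max(lh, rh);
}

// True when every node satisfies the height condition.
bool is_height_balanced(const Node* root, int slack = 1) {
    return checked_height(root, slack) != -2;
}
```

The chain 1, 2, 3 fails this check with the default slack, while a three-node tree with 2 at the root passes.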