15-211 Fundamental Structures of Computer Science Binary Search Trees

Ananda Guna
Jan. 23, 2003
Based on lectures given by Peter Lee, Avrim Blum, Danny Sleator, William Scherlis,
Ananda Guna & Klaus Sutner
First a Review of Stacks and Queues
A Stack interface
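public interface Stack {
    public void push(Object x);   // add x on top
    public void pop();            // remove the top element
    public Object top();          // return the top element without removing it
    public boolean isEmpty();
    public void clear();          // remove all elements
}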
Stacks are LIFO
Push operations: pushing a, b, c, d, e in that order leaves e on top and a at the bottom.
Pop operation: pop removes e. The last element that was pushed is the first to be popped.
A Queue interface
public interface Queue {
    public void enqueue(Object x);  // add x at the back
    public Object dequeue();        // remove and return the front element
    public boolean isEmpty();
    public void clear();            // remove all elements
}
Queues are FIFO
Elements join the queue at the back and leave from the front.
Enqueue operation: the new element, y, is added at the back.
Dequeue operation: the element at the front, the one that has waited longest, is removed and returned.
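Implementing stacks, 1
Linked representation: each element sits in a node that links to the node below it, and the top of the stack is the head of the list (pictured: c on top of b on top of a).
All operations take constant time, O(1).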
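A minimal sketch of the linked representation, assuming the Stack interface above (repeated so the block compiles on its own); LinkedStack and its inner Node class are hypothetical names. Every operation touches only the head of the list, which is why each is O(1).

```java
// The Stack interface from the earlier slide, repeated so this sketch compiles alone.
interface Stack {
    void push(Object x);
    void pop();
    Object top();
    boolean isEmpty();
    void clear();
}

// Hypothetical linked implementation: the top of the stack is the head of a singly linked list.
class LinkedStack implements Stack {
    private static class Node {
        final Object data;
        final Node next; // the node below this one
        Node(Object data, Node next) { this.data = data; this.next = next; }
    }

    private Node head = null; // top of the stack, or null if empty

    public void push(Object x) { head = new Node(x, head); }   // O(1)

    public void pop() {                                         // O(1)
        if (head == null) throw new IllegalStateException("pop on empty stack");
        head = head.next;
    }

    public Object top() {                                       // O(1)
        if (head == null) throw new IllegalStateException("top on empty stack");
        return head.data;
    }

    public boolean isEmpty() { return head == null; }

    public void clear() { head = null; } // the garbage collector reclaims the nodes
}
```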
Implementing stacks, 2
An alternative is to use an array-based
representation: a, b, c occupy consecutive array slots, and an index marks the top.
What are some advantages and
disadvantages of an array-based
representation?
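For comparison, a sketch of one array-based version under the same Stack interface. ArrayStack, its starting capacity of 4, and the capacity-doubling policy are illustrative assumptions, not from the slides. It suggests one answer to the question: the array avoids allocating a node per push and keeps elements contiguous, but when it fills up a push must copy everything, so push is O(1) only in the amortized sense.

```java
import java.util.Arrays;

interface Stack {
    void push(Object x);
    void pop();
    Object top();
    boolean isEmpty();
    void clear();
}

// Hypothetical array-based stack: elements live in data[0..size-1], and the top is data[size-1].
class ArrayStack implements Stack {
    private Object[] data = new Object[4];
    private int size = 0; // number of elements, which is also the index of the next free slot

    public void push(Object x) {
        if (size == data.length) {
            // Full: double the capacity (an O(n) copy, O(1) amortized over many pushes).
            data = Arrays.copyOf(data, 2 * data.length);
        }
        data[size++] = x;
    }

    public void pop() {
        if (size == 0) throw new IllegalStateException("pop on empty stack");
        data[--size] = null; // drop the reference so it can be garbage collected
    }

    public Object top() {
        if (size == 0) throw new IllegalStateException("top on empty stack");
        return data[size - 1];
    }

    public boolean isEmpty() { return size == 0; }

    public void clear() {
        Arrays.fill(data, 0, size, null);
        size = 0;
    }
}
```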
A queue from two stacks
Enqueue: push the new element onto the stack on the left.
Dequeue: pop from the stack on the right.
What happens when the stack on the right becomes empty?
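One way to fill in the two-stack queue, sketched under the assumption that java.util.ArrayDeque, used only through push and pop, stands in for each stack; TwoStackQueue is a hypothetical name. When the right-hand stack runs dry, moving every element across reverses their order, so the oldest element comes out first. Each element is moved at most once, so dequeue is O(1) amortized.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// The Queue interface from the earlier slide, repeated so this sketch compiles alone.
interface Queue {
    void enqueue(Object x);
    Object dequeue();
    boolean isEmpty();
    void clear();
}

// Hypothetical two-stack queue. ArrayDeque is used only via push/pop, i.e. as a stack.
// Note: ArrayDeque rejects null elements.
class TwoStackQueue implements Queue {
    private final Deque<Object> in = new ArrayDeque<>();  // left stack: receives enqueues
    private final Deque<Object> out = new ArrayDeque<>(); // right stack: serves dequeues

    public void enqueue(Object x) { in.push(x); }

    public Object dequeue() {
        if (out.isEmpty()) {
            // Right stack is empty: move everything over. This reverses the order,
            // so the oldest element ends up on top of the right stack.
            while (!in.isEmpty()) out.push(in.pop());
        }
        if (out.isEmpty()) throw new NoSuchElementException("dequeue on empty queue");
        return out.pop();
    }

    public boolean isEmpty() { return in.isEmpty() && out.isEmpty(); }

    public void clear() { in.clear(); out.clear(); }
}
```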
Now to Trees
CS is upside down: we draw trees with the root at the top and the leaves at the bottom.
Trees are everywhere
Trees are everywhere in life.
As a result, in computer programs,
trees turn out to be one of the most
commonly used data structures.
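Arithmetic Expressions
(Figure: an expression tree whose internal nodes are the operators + and * and whose leaves are 2, 5, and 7.)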
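Evaluating an expression tree is a postorder computation: evaluate both subtrees, then apply the operator at the root. A minimal sketch, assuming a hypothetical Expr class with only + and * on integers; since the exact shape of the pictured tree isn't recoverable here, the usage example simply builds (2 * 5) + 7.

```java
// Hypothetical expression-tree node: a leaf holds a number, an internal node an operator.
class Expr {
    final char op;          // '+' or '*' for internal nodes; unused for leaves
    final int value;        // the number stored in a leaf
    final Expr left, right; // both null for a leaf

    private Expr(char op, int value, Expr left, Expr right) {
        this.op = op; this.value = value; this.left = left; this.right = right;
    }

    static Expr leaf(int v) { return new Expr(' ', v, null, null); }
    static Expr node(char op, Expr l, Expr r) { return new Expr(op, 0, l, r); }

    // Postorder evaluation: evaluate both subtrees, then apply the operator at the root.
    int eval() {
        if (left == null) return value;
        int a = left.eval(), b = right.eval();
        switch (op) {
            case '+': return a + b;
            case '*': return a * b;
            default: throw new IllegalStateException("unknown operator " + op);
        }
    }
}
```

For example, Expr.node('+', Expr.node('*', Expr.leaf(2), Expr.leaf(5)), Expr.leaf(7)).eval() returns 17.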
Game trees
Directory structure
(Figure: part of the /afs directory tree, with directories such as cs, andrew, acs, usr, and course.)
Tree Definitions
A tree is a set of nodes and a set
of directed edges that connect
pairs of nodes.
A tree is a directed acyclic graph
(DAG) with the following properties:

- one vertex is distinguished as
the root; no edges enter this vertex

- every other vertex has
exactly one entering edge
Trees, more abstractly
A tree is a directed graph with the
following characteristics:
There is a distinguished node called the
root node.
Every non-root node has exactly one
parent node (the root has none).
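A closer look at Trees
(Figure: a root R whose subtrees T1, T2, T3 are siblings; and a tree rooted at a, with nodes b, c, d, e, f, in which every non-root node has a unique parent.)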
Implementation of Trees
How do we implement a general
tree, e.g., a file system?
Each node has two links:
one to its leftmost child, and
one to its right sibling.
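A sketch of the leftmost-child/right-sibling scheme, using a hypothetical TreeNode class with names as in a directory tree. However many children a node has, it stores just two links, and its children are visited by following the sibling chain.

```java
import java.util.List;

// Hypothetical general-tree node with the two links: leftmost child and right sibling.
class TreeNode {
    final String name;
    TreeNode firstChild;   // leftmost child, or null for a leaf
    TreeNode nextSibling;  // next sibling to the right, or null if this is the last child

    TreeNode(String name) { this.name = name; }

    // Appends c as the new rightmost child by walking the sibling chain to its end.
    TreeNode addChild(TreeNode c) {
        if (firstChild == null) {
            firstChild = c;
        } else {
            TreeNode s = firstChild;
            while (s.nextSibling != null) s = s.nextSibling;
            s.nextSibling = c;
        }
        return c;
    }

    // Collects names in preorder (a node before its children), like a recursive directory listing.
    void preorder(List<String> out) {
        out.add(name);
        for (TreeNode c = firstChild; c != null; c = c.nextSibling) c.preorder(out);
    }
}
```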
Implementation of a binary tree
with an array
Assume that the left child of node i
(i = 1, 2, …) is stored at index 2i and the right child
of node i is stored at index 2i+1.
Draw the tree represented by the
following array (assume indices start
from 1):
12 10 15 8 11 14 18
Question: What is the minimum
height of a binary tree with n nodes?
What is the maximum height?
Binary Tree Traversals
• Inorder – Left-Root-Right
Use a stack or recursion
• Preorder – Root-Left-Right
Use a stack or recursion
• Postorder – Left-Right-Root
Use a stack or recursion
• Level Order Traversal
Use a queue
• What is the output of each of the
traversals?
(see next slide for BFS in a tree)
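The three depth-first traversals are easiest to see in their recursive form; they differ only in when the root is visited. A sketch with a hypothetical BNode class; each method appends keys to a list instead of printing them.

```java
import java.util.List;

// Minimal binary tree node for the traversal sketches (hypothetical class name).
class BNode {
    final int key;
    final BNode left, right;
    BNode(int key, BNode left, BNode right) { this.key = key; this.left = left; this.right = right; }
}

class Traversals {
    static void inorder(BNode t, List<Integer> out) {    // Left-Root-Right
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    static void preorder(BNode t, List<Integer> out) {   // Root-Left-Right
        if (t == null) return;
        out.add(t.key);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    static void postorder(BNode t, List<Integer> out) {  // Left-Right-Root
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.key);
    }
}
```

On the tree stored in the array 12 10 15 8 11 14 18 (root 12), inorder yields 8, 10, 11, 12, 14, 15, 18; preorder yields 12, 10, 8, 11, 15, 14, 18; and postorder yields 8, 11, 10, 14, 18, 15, 12.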
Algorithm for Breadth-first traversal (of a
tree using a queue)
enqueue the root
while (the queue is not empty)
{
dequeue the front element
print it
enqueue its left child (if present)
enqueue its right child (if present)
}
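A Java rendering of the breadth-first algorithm above, assuming java.util.ArrayDeque as the queue (addLast to enqueue, removeFirst to dequeue) and a hypothetical LNode class; it collects keys in a list rather than printing them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class LNode {
    final int key;
    final LNode left, right;
    LNode(int key, LNode left, LNode right) { this.key = key; this.left = left; this.right = right; }
}

class LevelOrder {
    // Breadth-first (level-order) traversal, following the slide's algorithm step by step.
    static List<Integer> bfs(LNode root) {
        List<Integer> out = new ArrayList<>();
        if (root == null) return out;
        ArrayDeque<LNode> queue = new ArrayDeque<>();
        queue.addLast(root);                              // enqueue the root
        while (!queue.isEmpty()) {                        // while the queue is not empty
            LNode n = queue.removeFirst();                // dequeue the front element
            out.add(n.key);                               // "print" it (collected here)
            if (n.left != null)  queue.addLast(n.left);   // enqueue its left child
            if (n.right != null) queue.addLast(n.right);  // enqueue its right child
        }
        return out;
    }
}
```

Applied to the tree from the array example (root 12), it returns 12, 10, 15, 8, 11, 14, 18: level order reproduces the array layout.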
Facts and Questions About Trees
• A path from node $n_1$ to $n_k$ is defined as a sequence of nodes
$n_1, n_2, \ldots, n_k$ such that $n_i$ is the parent of $n_{i+1}$ for $1 \le i < k$.
• The depth of a node is the length of the path from
the root to the node. What is the depth of the root?
What is the maximum depth of a tree with N
nodes?
• What is the number of edges in a tree with N
nodes?
• The height of a node is the length of the longest path from the node to
a leaf. The height of the tree is the
________________?
• Let T(n) be the number of null pointers in a binary tree
of n nodes. Show that T(n) = n + 1.
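One counting argument for the last exercise: n nodes have 2n child links in total, and every node except the root is the target of exactly one link, so n - 1 links are non-null and 2n - (n - 1) = n + 1 are null. The sketch below (hypothetical NullCount class) checks this on randomly shaped trees; it treats the empty tree as a single null pointer, which matches T(0) = 1.

```java
import java.util.Random;

class NullCount {
    // A bare binary-tree node: only the two child links matter here.
    static final class N {
        N left, right;
    }

    static int nodes(N t) { return t == null ? 0 : 1 + nodes(t.left) + nodes(t.right); }

    // Every null link reached counts once; an empty tree is itself one null pointer.
    static int nulls(N t) { return t == null ? 1 : nulls(t.left) + nulls(t.right); }

    // Builds a randomly shaped tree with exactly n nodes.
    static N build(int n, Random rnd) {
        if (n == 0) return null;
        int leftSize = rnd.nextInt(n);          // 0..n-1 nodes go left
        N t = new N();
        t.left = build(leftSize, rnd);
        t.right = build(n - 1 - leftSize, rnd); // the rest (minus the root) go right
        return t;
    }

    public static void main(String[] args) {
        Random rnd = new Random(15211);
        for (int n = 0; n <= 20; n++) {
            N t = build(n, rnd);
            System.out.println("n=" + n + " nodes=" + nodes(t) + " nulls=" + nulls(t));
        }
    }
}
```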
Time to Think About the Complexity of
Algorithms
Considering algorithms
Is the approach correct?
How fast does it run?
How much memory does it use?
Can I finish writing the code in the next 8
hours?
• What is most important?
• Consider fib(n) = fib(n-1) + fib(n-2) for n >= 2,
with fib(0) = fib(1) = 1.
Let's look at a simple algorithm
Fibonacci Tree
Closed form
public static long fib(int n) {
    if (n <= 1) return 1;          // fib(0) = fib(1) = 1
    return fib(n-1) + fib(n-2);    // two recursive calls per non-base call
}
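(Figure: the call tree for F(5). F(5) calls F(4) and F(3), and each call below recurses the same way down to F(1) and F(0). The tree has 15 calls: F(3) is computed twice, F(2) three times, F(1) five times, and F(0) three times.)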
• It turns out the number of function calls is proportional to fib(n)
itself! In fact, it's exactly 2*fib(n) - 1.
• fib(90) takes about 7000 years on a 1 GHz machine.
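The claim about the number of calls is easy to check empirically. This sketch instruments the slide's fib with a hypothetical static counter; for n = 5 it reports 15 calls, matching the F(5) call tree, and 2*fib(5) - 1 = 15. The identity follows by induction: calls(n) = 1 + calls(n-1) + calls(n-2) = 1 + (2fib(n-1) - 1) + (2fib(n-2) - 1) = 2fib(n) - 1.

```java
// Instrumented version of the slide's fib: counts how many calls the naive recursion makes.
class FibCalls {
    static long calls = 0;

    static long fib(int n) {           // same definition as the slide: fib(0) = fib(1) = 1
        calls++;
        if (n <= 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 25; n += 5) {
            calls = 0;
            long f = fib(n);
            System.out.println("n=" + n + " fib=" + f + " calls=" + calls + " 2*fib-1=" + (2 * f - 1));
        }
    }
}
```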
Making Fibonacci more efficient
• Can we write a better algorithm?
• Can we reuse some of the parts of the recursion?
• Call initially as fastfib(0, 1, n):
public static long fastfib(long prev, long current, int togo)
{
    // prev and current are consecutive Fibonacci values; togo counts the steps left
    if (togo <= 0) return current;
    return fastfib(current, current + prev, togo - 1);
}
• What is the complexity of this algorithm?
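fastfib makes one call per step, so it runs in O(n) time. Because Java does not eliminate tail calls, the same computation is often written as a loop; the sketch below (hypothetical FastFib class) keeps the slide's fastfib and adds an equivalent loop, fibLoop, for comparison. With this definition of fib, a long holds the result only up to n = 91.

```java
class FastFib {
    // The slide's fastfib, unchanged: call as fastfib(0, 1, n).
    static long fastfib(long prev, long current, int togo) {
        if (togo <= 0) return current;
        return fastfib(current, current + prev, togo - 1);
    }

    // The same computation as a loop: n iterations, O(n) time, O(1) extra space.
    static long fibLoop(int n) {
        long prev = 0, current = 1;      // fastfib's starting arguments
        for (int togo = n; togo > 0; togo--) {
            long next = current + prev;
            prev = current;
            current = next;
        }
        return current;
    }
}
```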
A question about height and number
of nodes in a binary tree
 Suppose we have n nodes in a complete binary
tree of height h. What is the relation between n
and h?
• The number of nodes in level i is $2^i$ (i = 0, 1, …, h).
• Therefore the total number of nodes in all levels is $\sum_{i=0}^{h} 2^i = 2^{h+1} - 1$.
• So what is the relation between n and h?
• A binary tree is completely full if it is of height
h and has $2^{h+1} - 1$ nodes.
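A worked version of the relation: summing the levels gives n exactly for a completely full tree, and for any binary tree of height h the same sum is an upper bound on n, which answers the earlier minimum-height question. (The maximum height, n - 1, comes from a tree that is a single path.)

```latex
\begin{aligned}
n &= \sum_{i=0}^{h} 2^i = 2^{h+1} - 1
  &&\Longrightarrow\quad h = \log_2(n+1) - 1 = \Theta(\log n) \\
n &\le 2^{h+1} - 1 \quad \text{(any binary tree of height } h\text{)}
  &&\Longrightarrow\quad h \ge \lceil \log_2(n+1) \rceil - 1
\end{aligned}
```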
A Bit About Asymptotic Analysis
O notation:
T(n) is O(f(n)) if there exist two positive
constants c and n0 such that T(n) <= c*f(n) for
all n > n0
Omega notation:
T(n) is Omega(f(n)) if there exist two positive
constants c and n0 such that T(n) >= c*f(n) for
all n > n0
Theta notation:
T(n) is Theta(f(n)) if it is both O(f(n)) AND
Omega(f(n)).
“Big-Oh” notation
T(N) = O(f(N))
“T(N) is order f(N)”
cf(N)
running time
T(N)
n0
N
Some examples
If $f(n) = 10n + 5$ and $g(n) = n$,
show that f(n) is O(g(n)).
$f(n) = 3n^2 + 4n + 1$. Show that f(n) is $O(n^2)$.
Show that $5 \log n$ is $O(n)$.
$f(n) = 3n^2 + 4n + 1$. Show that f(n) is $\Omega(n^2)$.
Therefore $f(n) = \Theta(n^2)$.
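Worked constants for the examples, one possible choice each (any larger c or n0 also works). Each inequality holds for all n ≥ 1, hence for all n > n0 = 1:

```latex
\begin{aligned}
10n + 5 &\le 10n + 5n = 15n
  && \Rightarrow\ c = 15,\ n_0 = 1 \\
3n^2 + 4n + 1 &\le 3n^2 + 4n^2 + n^2 = 8n^2
  && \Rightarrow\ c = 8,\ n_0 = 1 \\
5\log n &\le 5n
  && \Rightarrow\ c = 5,\ n_0 = 1 \\
3n^2 + 4n + 1 &\ge 3n^2
  && \Rightarrow\ \Omega(n^2) \text{ with } c = 3,\ n_0 = 1
\end{aligned}
```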
Logarithms and exponents
Logarithms and exponents are
everywhere in algorithm analysis:
$\log_b a = c$ if $a = b^c$
Logarithms and exponents
We will usually leave off the base b
when b = 2, so for example
$\log 1024 = 10$
Some useful equalities
$\log_b(ac) = \log_b a + \log_b c$
$\log_b(a/c) = \log_b a - \log_b c$
$\log_b(a^c) = c \log_b a$
$\log_b a = \log_c a / \log_c b$
$(b^a)^c = b^{ac}$
$b^a b^c = b^{a+c}$
$b^a / b^c = b^{a-c}$
Big-Oh again
When T(N) = O(f(N)), we are saying
that T(N) grows no faster than f(N).
I.e., f(N) describes an upper bound on
T(N).
Put another way:
For “large enough” inputs, cf(N) always
dominates T(N).
This is called the asymptotic behavior.
Big-O characteristic
If T(N) = cf(N) then
T(N) = O(f(N))
Constant factors “don’t matter”
Because of this, when T(N) =
O(cg(N)), we usually drop the
constant and just say O(g(N))
Big-O characteristic
Suppose T(N)= k, for some constant k
Then T(N) = O(1)
Big-O characteristic
More interesting:
Suppose $T(n) = 20n^3 + 10n \log n + 5$
Then $T(n) = O(n^3)$
Lower-order terms “don’t matter”
Question:
What constants c and n0 can be used to
show that the above is true?
Answer: c = 35, n0 = 1, since for $n \ge 1$ we have $10n \log n \le 10n^3$ and $5 \le 5n^3$.
Big-O characteristic
If T1(N) = O(f(N)) and T2(N) = O(g(N))
then
T1(N) + T2(N) = O(max(f(N), g(N))).
The bigger task always dominates
eventually.
Also:
T1(N) * T2(N) = O(f(N) * g(N)).
Some common functions
(Figure: 10N, 100 log N, 5N^2, N^3, and 2^N plotted for N = 1 to 10, on a vertical axis from 0 to 1200.)
BSTs: An Inductive Perspective
Let's focus on binary trees (left/right child only).
A binary tree is either
• empty (we'll write nil for clarity), or
• looks like (x,L,R) where
x is the element stored at the root, and
L, R are the left and right subtrees of the root.
In Pictures
(Figure: the empty tree, and a tree (x, L, R) drawn as the root x above its left subtree L and right subtree R.)
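Flattening a BT
flat(T) lists the elements of T in inorder:
flat(nil) is the empty sequence, and flat((x,L,R)) = flat(L), x, flat(R).
For the pictured tree T, with root a:
flat(T) = e, b, f, a, d, g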
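A sketch of the inductive definition in Java, assuming int elements and using null for nil; Tree is a hypothetical class name. flat visits the left subtree, then the root, then the right subtree, i.e. an inorder traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Inductive binary tree: null plays the role of nil, and a Tree object is (x, L, R).
final class Tree {
    final int x;
    final Tree left, right;   // L and R; either may be null (nil)
    Tree(int x, Tree left, Tree right) { this.x = x; this.left = left; this.right = right; }

    // flat(nil) = empty sequence;  flat((x, L, R)) = flat(L), x, flat(R)
    static List<Integer> flat(Tree t) {
        List<Integer> out = new ArrayList<>();
        flatInto(t, out);
        return out;
    }

    private static void flatInto(Tree t, List<Integer> out) {
        if (t == null) return;
        flatInto(t.left, out);
        out.add(t.x);
        flatInto(t.right, out);
    }
}
```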
Def: Binary Search Tree
A binary tree T is a binary search tree (BST) iff
flat(T) is an ordered sequence.
Equivalently, in (x,L,R) all the nodes in L are
less than x, all the nodes in R are larger
than x, and L and R are themselves BSTs.
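Both forms of the definition can be turned into checks. A sketch with a hypothetical Node class: isBstByFlat tests the sequence definition directly, while isBst passes an allowed interval down the tree. The interval matters: comparing each node only with its own children would accept the tree 5 with left child 3 whose right child is 6, even though 6 sits in the left subtree of 5.

```java
import java.util.ArrayList;
import java.util.List;

final class Node {
    final int x;
    final Node left, right;
    Node(int x, Node left, Node right) { this.x = x; this.left = left; this.right = right; }
}

class BstCheck {
    // Definition 1: T is a BST iff flat(T) is strictly increasing.
    static boolean isBstByFlat(Node t) {
        List<Integer> seq = new ArrayList<>();
        flat(t, seq);
        for (int i = 1; i < seq.size(); i++)
            if (seq.get(i - 1) >= seq.get(i)) return false;
        return true;
    }

    private static void flat(Node t, List<Integer> out) {
        if (t == null) return;
        flat(t.left, out);
        out.add(t.x);
        flat(t.right, out);
    }

    // Definition 2: all keys in L are < x, all keys in R are > x, and L and R are BSTs.
    // The open interval (lo, hi) carries the constraints from every ancestor, not just the parent.
    static boolean isBst(Node t, long lo, long hi) {
        if (t == null) return true;
        if (t.x <= lo || t.x >= hi) return false;
        return isBst(t.left, lo, t.x) && isBst(t.right, t.x, hi);
    }

    static boolean isBst(Node t) { return isBst(t, Long.MIN_VALUE, Long.MAX_VALUE); }
}
```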
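Example
(Figure: a BST with root 5; its left child 3 has children 2 and 4, and its right child 7 has children 6 and 9.)
flat(T) = 2, 3, 4, 5, 6, 7, 9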
Why do we care?
(Figure: searching a BST compared with binary search on a sorted array.)
How does one search in a BST?
search(x, nil) = false
search(x, (x,L,R)) = true
search(x, (a,L,R)) = search(x,L)   if x < a
search(x, (a,L,R)) = search(x,R)   if x > a
In practice, search should return the stored value rather than just true or false.
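The search equations translate almost line for line into Java. In this sketch the hypothetical Entry class also carries a String value, following the note that search should really return the stored value; find is an iterative variant that does so, returning null when x is absent.

```java
// Hypothetical BST node mapping an int key to a String value.
final class Entry {
    final int key;
    String value;
    Entry left, right;
    Entry(int key, String value) { this.key = key; this.value = value; }
}

class BstSearch {
    // Direct transcription of the four equations for search.
    static boolean search(int x, Entry t) {
        if (t == null) return false;              // search(x, nil) = false
        if (x == t.key) return true;              // search(x, (x, L, R)) = true
        if (x < t.key) return search(x, t.left);  // x < a: continue in L
        return search(x, t.right);                // x > a: continue in R
    }

    // Iterative variant returning the stored value, or null if x is absent.
    static String find(int x, Entry t) {
        while (t != null) {
            if (x == t.key) return t.value;
            t = (x < t.key) ? t.left : t.right;
        }
        return null;
    }
}
```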
Correctness
Clearly, search() can never return a false positive
answer.
But search() only walks down one branch, so how
do we know we don't get false negative answers?
Suppose T is a BST that contains x.
Claim: search(x,T) properly returns "true".
Proof
T cannot be nil, so suppose T = (a,L,R).
Case 1: x = a:
done.
Case 2: x < a: Since T is a BST, x must be in L.
But by induction (on trees), search(x,L) returns
true. Done.
Case 3: x > a: symmetric to Case 2, with R in place of L.
Insertions
Insertions in a BST are very similar to
searching: find the right spot, and then put
down the new element as a new leaf.
We will not allow multiple insertions of the
same element, so there is always exactly one
place for the new guy.
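A minimal sketch of insertion as described: walk down as in search, and attach a new leaf at the single empty spot. BstNode and BstInsert are hypothetical names; inserting an element that is already present leaves the tree unchanged.

```java
final class BstNode {
    final int key;
    BstNode left, right;
    BstNode(int key) { this.key = key; }
}

class BstInsert {
    // Returns the root of the tree after inserting x; duplicates are ignored.
    static BstNode insert(int x, BstNode t) {
        if (t == null) return new BstNode(x);       // the unique empty spot: a new leaf
        if (x < t.key)      t.left  = insert(x, t.left);
        else if (x > t.key) t.right = insert(x, t.right);
        // x == t.key: already present, nothing to do
        return t;
    }

    static boolean contains(int x, BstNode t) {
        while (t != null) {
            if (x == t.key) return true;
            t = (x < t.key) ? t.left : t.right;
        }
        return false;
    }
}
```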
How Many?
How many decisions do we have to make before
we have either found the element, or know it's
not in the tree?
We walk down a single branch of the tree, so the worst-case
running time for search is
O( depth of T ) = O( # nodes )
Good Tree
But in a "good" BST we have
depth of T = O( log # nodes )
Theorem: If the tree is constructed from n inputs
given in random order, then we can expect the depth
of the tree to be O(log n).
But if the input is already sorted (or nearly sorted, or reverse sorted),
we are in trouble: the tree degenerates into a path of depth n - 1.
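A small experiment illustrating the theorem and its caveat, using plain (unbalanced) insertion; DepthExperiment, n = 1000, and the shuffle seed are arbitrary choices. Sorted input produces a single path of height 999, while a shuffled order typically gives a height of a few dozen; the exact number depends on the seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class DepthExperiment {
    static final class N { final int key; N left, right; N(int k) { key = k; } }

    // Standard BST insertion with no rebalancing; duplicates are ignored.
    static N insert(N t, int x) {
        if (t == null) return new N(x);
        if (x < t.key) t.left = insert(t.left, x);
        else if (x > t.key) t.right = insert(t.right, x);
        return t;
    }

    // Height in edges; the empty tree has height -1.
    static int height(N t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int heightAfterInserting(List<Integer> keys) {
        N root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        int n = 1000;
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) keys.add(i);
        System.out.println("sorted input:   height " + heightAfterInserting(keys)); // n - 1 = 999
        Collections.shuffle(keys, new Random(211));
        System.out.println("shuffled input: height " + heightAfterInserting(keys));
    }
}
```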
Forcing good behavior
It is clear (?) that for any n inputs, there always is
a BST containing these elements of logarithmic
depth.
But if we just insert the standard way, we may build
a very unbalanced, deep tree.
Can we somehow force the tree to remain shallow?
At low cost?
AVL Trees
G. M. Adelson-Velskii and E. M. Landis, 1962
At every node, the heights of the left and right subtrees differ by 1 or less.
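A sketch of checking the AVL balance condition, assuming height is measured in edges with the empty tree at -1; AvlCheck is a hypothetical name. One bottom-up pass computes each subtree's height and stops as soon as some node's subtrees differ by more than 1.

```java
// Checks the AVL condition: at every node, subtree heights differ by at most 1.
class AvlCheck {
    static final class N {
        final int key;
        final N left, right;
        N(int key, N left, N right) { this.key = key; this.left = left; this.right = right; }
    }

    // Returns the height of t (empty tree = -1), or -2 as a sentinel if some subtree is unbalanced.
    private static int checkedHeight(N t) {
        if (t == null) return -1;
        int hl = checkedHeight(t.left);
        if (hl == -2) return -2;
        int hr = checkedHeight(t.right);
        if (hr == -2) return -2;
        if (Math.abs(hl - hr) > 1) return -2;
        return 1 + Math.max(hl, hr);
    }

    static boolean isBalanced(N t) { return checkedHeight(t) != -2; }
}
```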
Next Week
More about AVL trees on Tuesday
Homework 1 is due Monday 27th.
This is a good time to catch up with Java
deficiencies, if any.