Optimal Binary Search Trees

A greedy strategy
Build a binary search tree with keys A, B, C, and D which have
probabilities p(A) = p(C) = 1/3 and p(B) = p(D) = 1/6.
But, if A is at the root we get:
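The figures for this slide are not reproduced here, so the following is only a hedged illustration. It assumes the greedy rule is "put the most probable key at the root, then recurse on each side", and that a key on level d costs d comparisons (root on level 1). Both assumptions, the helper `expected_cost`, and the two tree shapes are illustrative and not taken from the slide. Because A and C tie at 1/3, the rule can pick either as the root:

```python
from fractions import Fraction

# Access probabilities from the example above.
p = {"A": Fraction(1, 3), "B": Fraction(1, 6),
     "C": Fraction(1, 3), "D": Fraction(1, 6)}

def expected_cost(tree, depth=1):
    """Expected comparisons for a tree given as (key, left, right) or None.

    Assumption: a key on level `depth` (root = level 1) costs `depth` comparisons.
    """
    if tree is None:
        return Fraction(0)
    key, left, right = tree
    return (depth * p[key]
            + expected_cost(left, depth + 1)
            + expected_cost(right, depth + 1))

# Greedy with C at the root: A heads the left side, B hangs below A.
c_root = ("C", ("A", None, ("B", None, None)), ("D", None, None))
# Greedy with A at the root: C heads the right side, B and D hang below C.
a_root = ("A", None, ("C", ("B", None, None), ("D", None, None)))

print(expected_cost(c_root))  # 11/6
print(expected_cost(a_root))  # 2
```

Under these assumptions the tree rooted at C costs 11/6 while the tree rooted at A costs 2, so the way a greedy rule breaks ties can decide whether it finds the cheaper tree.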
Another Greedy Approach
key   k1 <  k2 <  k3 <  k4 <  k5
p     .3    .15   .18   .25   .12
A different strategy
key     k1    k2    k3    k4    k5
cost:
left
right
Another approach for finding an optimal BST
key   k1 <  k2 <  k3 <  k4 <  k5
p     .3    .15   .18   .25   .12
A bottom-up approach
Building the Optimal Tree
Let Ti,j denote the subtree that contains keys from i to j inclusive.
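A single-key subtree costs its own probability, and an empty subtree costs nothing:
$\text{cost}(T_{i,i}) = c_{i,i} = p_i, \qquad \text{cost}(T_{i,i-1}) = 0$
For a larger subtree, try every key $k$ as the root and keep the cheapest choice:
$\text{cost}(T_{i,j}) = \min_{i \le k \le j} \left[ \text{cost}(T_{i,k-1}) + \text{cost}(T_{k+1,j}) \right] + \sum_{m=i}^{j} p_m$
The cost of the whole tree is the case $i = 1$, $j = n$:
$\text{cost}(T_{1,n}) = \min_{1 \le k \le n} \left[ \text{cost}(T_{1,k-1}) + \text{cost}(T_{k+1,n}) \right] + \sum_{m=1}^{n} p_m$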
key   k1 <  k2 <  k3 <  k4 <  k5
p     .3    .15   .18   .25   .12

       1     2     3     4     5
1     .3
2           .15
3                 .18
4                       .25
5                             .12

Fill in the table as follows:
1. Fill in the ci,i values along the diagonal.
2. Next, calculate all values of the form ck,k+1:
c1,2 = min(
c2,3 = min(
c3,4 =
c4,5 =
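The fill order described above (diagonal first, then subtrees of two keys, then three, and so on) can be sketched as a bottom-up loop over subtree sizes. The function name `obst_table`, the dictionary layout, and the rule of keeping the smallest k on ties are assumptions made for this illustration.

```python
from fractions import Fraction

def obst_table(probs):
    """Return (c, root) dicts keyed by (i, j), 1-based inclusive key ranges.

    `probs` is a list of decimal strings such as "0.3" (exact arithmetic).
    """
    n = len(probs)
    p = [Fraction(0)] + [Fraction(s) for s in probs]
    c, root = {}, {}
    for i in range(1, n + 2):
        c[i, i - 1] = Fraction(0)          # empty subtrees, assumed cost 0
    for size in range(1, n + 1):           # size 1 fills the diagonal c[i, i]
        for i in range(1, n - size + 2):
            j = i + size - 1
            # min() returns the first minimal k, so ties go to the smallest k.
            k = min(range(i, j + 1), key=lambda r: c[i, r - 1] + c[r + 1, j])
            c[i, j] = c[i, k - 1] + c[k + 1, j] + sum(p[i:j + 1])
            root[i, j] = k
    return c, root

c, root = obst_table(["0.3", "0.15", "0.18", "0.25", "0.12"])
for i in range(1, 5):
    print(f"c{i},{i + 1} = {float(c[i, i + 1])}")
```

Each pass over `size` only reads entries for strictly smaller subtrees, which is why the diagonal has to be filled before the ck,k+1 values.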
Remaining calculations
Trees with three nodes
c1,3 = min[c1,0 + c2,3, c1,1 + c3,3, c1,2 + c4,3] + p1 + p2 + p3
= min[0 + .48, .3 + .18, .6 + 0] + .3 + .15 + .18 = 1.11
c2,4 = min[c2,1 + c3,4, c2,2 + c4,4, c2,3 + c5,4] + p2 + p3 + p4
= min[0 + .61, .15 + .25, .48 + 0] + .15 + .18 + .25 = .98
c3,5 =
Trees with four nodes
c1,4 = min[c1,0 + c2,4, c1,1 + c3,4, c1,2 + c4,4, c1,3 + c5,4] + p1 + p2 + p3 + p4
= min[.98, .91, .85, 1.11] + .88 = 1.73
c2,5 =
c1,5 =
Constructing the actual tree
Two approaches used to avoid having a badly balanced tree:
1.
2.
Cost table for a nine-key example. Row j, column i gives ci,j and the root k of the best subtree on keys i through j.

j\i  1          2          3          4          5          6          7          8          9
1    .14 k=1
2    .22 k=1    .04 k=2
3    .54 k=1    .22 k=3    .14 k=3
4    .84 k=3    .52 k=3    .43 k=4    .15 k=4
5    1.08 k=3   .71 k=4    .59 k=4    .31 k=4    .08 k=5
6    1.16 k=3   .77 k=4    .65 k=4    .37 k=4    .12 k=5    .02 k=6
7    1.59 k=4   1.13 k=4   1.01 k=4   .70 k=5    .35 k=7    .17 k=7    .13 k=7
8    2.10 k=4   1.64 k=4   1.51 k=7   1.09 k=7   .69 k=7    .49 k=8    .43 k=8    .17 k=8
9    2.14 k=4   1.68 k=4   1.54 k=7   1.12 k=7   .72 k=7    .51 k=8    .45 k=8    .19 k=8    .01 k=9
Example adapted from Introduction to Algorithms in Pascal by Parsons, Wiley 1995
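Once the root k of every subtree is recorded, the tree itself can be read off recursively: the root of keys i..j is k, its left subtree covers i..k-1 and its right subtree covers k+1..j. The sketch below does this for the k= entries of the nine-key table, assuming each row lists the entries for keys 1..j, 2..j, and so on up to j..j. The names `ROOT_ROWS`, `build` and `show` are hypothetical helpers for this illustration.

```python
# k= entries of the nine-key table: ROOT_ROWS[j - 1][i - 1] is the root for keys i..j.
ROOT_ROWS = [
    [1],
    [1, 2],
    [1, 3, 3],
    [3, 3, 4, 4],
    [3, 4, 4, 4, 5],
    [3, 4, 4, 4, 5, 6],
    [4, 4, 4, 5, 7, 7, 7],
    [4, 4, 7, 7, 7, 8, 8, 8],
    [4, 4, 7, 7, 7, 8, 8, 8, 9],
]

def build(i, j):
    """Return the subtree on keys i..j as (root, left, right), or None if empty."""
    if j < i:
        return None
    k = ROOT_ROWS[j - 1][i - 1]
    return (k, build(i, k - 1), build(k + 1, j))

def show(tree, depth=0):
    """Print one key per line in preorder, indented two spaces per level."""
    if tree is None:
        return
    k, left, right = tree
    print("  " * depth + f"k{k}")
    show(left, depth + 1)
    show(right, depth + 1)

show(build(1, 9))
```

With this reading of the table, k4 is the root of the whole tree, with k1 heading its left subtree and k7 heading its right subtree.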
Balanced Trees
The internal path length of a tree
Let D(n) denote the internal path length for a tree with n nodes.
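As a small illustration of the quantity being defined, the sketch below computes the internal path length of a concrete tree. It assumes the usual convention that the internal path length is the sum of the depths of all nodes, with the root at depth 0. The notes do not state their convention, so treat this as an assumption. The function name and the tuple representation are hypothetical.

```python
def internal_path_length(tree, depth=0):
    """Sum of node depths for a tree given as (key, left, right) or None.

    Assumption: the root is at depth 0.
    """
    if tree is None:
        return 0
    _, left, right = tree
    return (depth
            + internal_path_length(left, depth + 1)
            + internal_path_length(right, depth + 1))

balanced = (2, (1, None, None), (3, None, None))
chain = (1, None, (2, None, (3, None, None)))

print(internal_path_length(balanced))  # 2
print(internal_path_length(chain))     # 3
```

The same three keys give a smaller path length when the tree is balanced than when it degenerates into a chain, which is the effect a balancing scheme tries to preserve.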
Recurrence relation for D(n)
Basis: D(1) =