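Optimal Binary Search Trees

A greedy strategy

Build a binary search tree with keys A, B, C, and D, which have probabilities p(A) = p(C) = 1/3 and p(B) = p(D) = 1/6. But if A is at the root, we get:

[Figure: the tree with A at the root]

Another Greedy Approach

key   k1  <  k2  <  k3  <  k4  <  k5
p     .3     .15    .18    .25    .12

A different strategy

key            k1    k2    k3    k4    k5
cost:  left
       right

Another approach for finding an optimal BST

A bottom up approach

Building the Optimal Tree

Let $T_{i,j}$ denote the subtree that contains keys $i$ through $j$ inclusive, and write $c_{i,j}$ for $\mathrm{cost}(T_{i,j})$. A single key costs its own probability, and an empty subtree costs nothing:

$$\mathrm{cost}(T_{i,i}) = c_{i,i} = p_i, \qquad \mathrm{cost}(T_{i,i-1}) = c_{i,i-1} = 0$$

$$\mathrm{cost}(T_{i,j}) = \min_{i \le k \le j}\left[\mathrm{cost}(T_{i,k-1}) + \mathrm{cost}(T_{k+1,j})\right] + \sum_{m=i}^{j} p_m$$

Making key $k$ the root pushes every key in $T_{i,k-1}$ and $T_{k+1,j}$ one level deeper, which adds each of their probabilities once. The root itself contributes $p_k$. Together these add $\sum_{m=i}^{j} p_m$, which does not depend on $k$. For the whole tree:

$$\mathrm{cost}(T_{1,n}) = \min_{1 \le k \le n}\left[\mathrm{cost}(T_{1,k-1}) + \mathrm{cost}(T_{k+1,n})\right] + \sum_{m=1}^{n} p_m$$

key   p      i\j   1      2      3      4      5
k1    .3     1     .3
k2    .15    2            .15
k3    .18    3                   .18
k4    .25    4                          .25
k5    .12    5                                 .12

$c_{1,2} = \min($
$c_{2,3} = \min($
$c_{3,4} =$
$c_{4,5} =$

Fill in the table as follows:
1. Fill in the $c_{i,i}$ values along the diagonal.
2. Next, calculate all values of the form $c_{k,k+1}$.

Remaining calculations

Trees with three nodes

$$\begin{aligned}
c_{1,3} &= \min[c_{1,0} + c_{2,3},\; c_{1,1} + c_{3,3},\; c_{1,2} + c_{4,3}] + p_1 + p_2 + p_3 \\
&= \min[0 + .48,\; .3 + .18,\; .6 + 0] + .3 + .15 + .18 = 1.11
\end{aligned}$$

$$\begin{aligned}
c_{2,4} &= \min[c_{2,1} + c_{3,4},\; c_{2,2} + c_{4,4},\; c_{2,3} + c_{5,4}] + p_2 + p_3 + p_4 \\
&= \min[0 + .61,\; .15 + .25,\; .48 + 0] + .15 + .18 + .25 = .98
\end{aligned}$$

$c_{3,5} =$

Trees with four nodes

$$\begin{aligned}
c_{1,4} &= \min[c_{1,0} + c_{2,4},\; c_{1,1} + c_{3,4},\; c_{1,2} + c_{4,4},\; c_{1,3} + c_{5,4}] + p_1 + p_2 + p_3 + p_4 \\
&= \min[.98,\; .91,\; .85,\; 1.11] + .88 = 1.73
\end{aligned}$$

$c_{2,5} =$
$c_{1,5} =$

Constructing the actual tree

Two approaches used to avoid having a badly balanced tree:
1.
2.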
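The bottom-up procedure from "Building the Optimal Tree" (diagonal first, then ranges of two keys, then longer ranges) can be sketched in Python as follows. This is a minimal sketch of the recurrence under a few assumptions. The names `optimal_bst` and `build_tree` are hypothetical. Cost is counted as expected comparisons with the root at depth 1, so $c_{i,i} = p_i$ as in the table. Ties are broken toward the smallest $k$. `Fraction` is used only so the decimal sums come out exact. The `root` table records the minimizing $k$ for each range, which is what allows the tree itself to be rebuilt.

```python
from fractions import Fraction


def optimal_bst(p):
    """Bottom-up cost and root tables for an optimal BST (hypothetical helper).

    p[0..n-1] are the probabilities of keys k1 < k2 < ... < kn.
    Returns (cost, root), dicts keyed by 1-based (i, j); cost[(i, i - 1)] = 0
    stands for an empty subtree. Ties go to the smallest k (an assumption).
    """
    n = len(p)
    # prefix[m] = p_1 + ... + p_m, so sum_{m=i}^{j} p_m = prefix[j] - prefix[i - 1]
    prefix = [0]
    for x in p:
        prefix.append(prefix[-1] + x)

    cost = {(i, i - 1): 0 for i in range(1, n + 2)}  # empty subtrees cost 0
    root = {}
    for size in range(1, n + 1):  # diagonal first, then c_{k,k+1}, and so on
        for i in range(1, n - size + 2):
            j = i + size - 1
            best_k, best = None, None
            for k in range(i, j + 1):  # try each key in i..j as the root
                total = cost[(i, k - 1)] + cost[(k + 1, j)]
                if best is None or total < best:
                    best_k, best = k, total
            cost[(i, j)] = best + prefix[j] - prefix[i - 1]
            root[(i, j)] = best_k
    return cost, root


def build_tree(root, i, j):
    """Rebuild the optimal subtree on keys i..j as nested (key, left, right) tuples."""
    if i > j:
        return None
    k = root[(i, j)]
    return (k, build_tree(root, i, k - 1), build_tree(root, k + 1, j))


probs = [Fraction(s) for s in ("0.3", "0.15", "0.18", "0.25", "0.12")]
cost, root = optimal_bst(probs)
print(float(cost[(1, 3)]), float(cost[(2, 4)]), float(cost[(1, 4)]))  # 1.11 0.98 1.73
print(build_tree(root, 1, 5))
```

Running it reproduces $c_{1,3} = 1.11$, $c_{2,4} = .98$, and $c_{1,4} = 1.73$ from the worked calculations. The remaining entries can be read from `cost` in the same way. The loop over `size` mirrors the order in which the table is filled: every value it needs on the right-hand side has already been computed.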
Cost $c_{i,j}$ and optimal root $k$ for the subtree on keys $i$ through $j$ in a nine-key example (row $j$, column $i$):

j\i  1          2          3          4          5          6          7          8          9
1    .14 k=1
2    .22 k=1    .04 k=2
3    .54 k=1    .22 k=3    .14 k=3
4    .84 k=3    .52 k=3    .43 k=4    .15 k=4
5    1.08 k=3   .71 k=4    .59 k=4    .31 k=4    .08 k=5
6    1.16 k=3   .77 k=4    .65 k=4    .37 k=4    .12 k=5    .02 k=6
7    1.59 k=4   1.13 k=4   1.01 k=4   .70 k=5    .35 k=7    .17 k=7    .13 k=7
8    2.10 k=4   1.64 k=4   1.51 k=7   1.09 k=7   .69 k=7    .49 k=8    .43 k=8    .17 k=8
9    2.14 k=4   1.68 k=4   1.54 k=7   1.12 k=7   .72 k=7    .51 k=8    .45 k=8    .19 k=8    .01 k=9

Example adapted from Introduction to Algorithms in Pascal by Parsons, Wiley 1995

Balanced Trees

The internal path length of a tree

Let $D(n)$ denote the internal path length of a tree with $n$ nodes.

Recurrence relation for $D(n)$:

Basis: $D(1) =$
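The following small Python sketch makes the definition of internal path length concrete. It assumes the common convention that the internal path length is the sum of the depths of all nodes, with the root at depth 0. The function name `internal_path_length` and the `(key, left, right)` tuple representation of a tree are illustrative choices and do not come from the original notes.

```python
def internal_path_length(tree, depth=0):
    """Sum of node depths, assuming the root is at depth 0 (illustrative helper).

    tree is None (empty) or a tuple (key, left, right).
    """
    if tree is None:
        return 0
    _, left, right = tree
    return (depth
            + internal_path_length(left, depth + 1)
            + internal_path_length(right, depth + 1))


balanced = (2, (1, None, None), (3, None, None))  # node depths 0, 1, 1
chain = (1, None, (2, None, (3, None, None)))     # node depths 0, 1, 2
print(internal_path_length(balanced), internal_path_length(chain))  # 2 3
```

For the same three keys, the balanced shape has the smaller internal path length. Because the average number of comparisons in a successful search grows with the internal path length, this is one way to see why a badly balanced tree is worth avoiding.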