CSCI 333: Optimal Binary Search Tree

Binary search trees

A binary search tree is a binary tree of items that come from an ordered set, such that

1. Each node contains one key.
2. The keys in the left sub-tree of a given node are less than or equal to the key in that node.
3. The keys in the right sub-tree of a given node are greater than or equal to the key in that node.

A tree is a graph with a specialized structure. The depth (or level) of a node in a tree is the number of edges in the unique path from the root to the node. The depth of a tree is the maximum depth of all nodes in the tree. A tree is said to be balanced if the depths of the two sub-trees of every node never differ by more than 1.

A binary search tree that is organized so that the average time it takes to locate (search for) a key is minimized is called optimal. Constructing the optimal binary search tree is an optimization problem, and the principle of optimality must apply; more about this later.

Searching the Tree

We will use a procedure search to locate a key in a binary search tree. The number of comparisons done by the procedure search to locate a key is called the search time. Our goal is to determine a tree for which the average search time is minimal.

    void search (node_pointer tree, keytype keyin, node_pointer & p)
    {
        bool found = false;

        p = tree;                       // start at the root
        while (! found)                 // assumes keyin is present in the tree
            if (p->key == keyin)
                found = true;
            else if (keyin < p->key)
                p = p->left;            // keyin can only be in the left sub-tree
            else
                p = p->right;           // keyin can only be in the right sub-tree
    }

Assume a branch statement is used to implement the nested if-else statement, so that only one comparison is performed in each iteration of the while loop. The search time (i.e., the number of comparisons) for a given key is then:

    depth(key) + 1

Let Key_1, Key_2, ..., Key_n be the n keys in order, and let p_i be the probability that Key_i is the search key.
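The search procedure and its search time of depth(key) + 1 can be checked with a small runnable variant. This is only a sketch under stated assumptions: integer keys, and a hypothetical Node struct and countComparisons function that are not part of these notes. It counts one comparison per loop iteration, following the convention stated above.

```cpp
#include <cassert>

// Hypothetical node type for illustration; the notes do not define nodetype.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the number of loop iterations (one comparison each, per the
// counting convention in the notes) needed to reach keyin.
// Assumption: keyin is present in the tree.
int countComparisons(const Node* tree, int keyin) {
    const Node* p = tree;
    int comparisons = 0;
    while (true) {
        assert(p != nullptr);            // violated only if keyin is absent
        ++comparisons;
        if (p->key == keyin) {
            return comparisons;          // equals depth(keyin) + 1
        }
        p = (keyin < p->key) ? p->left : p->right;
    }
}
```

For a chain in which 10 is the root, 20 is its right child and 30 is the right child of 20, the function returns 1, 2 and 3 for the three keys, matching their depths 0, 1 and 2.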
If c_i is the number of comparisons needed to find Key_i in a given tree, the average search time for that tree is:

    ∑_{i=1}^{n} c_i p_i,    where c_i = depth(Key_i) + 1

Example: Consider the following trees with n = 3. (Values of keys are not important; the only requirement is that they be ordered.) We have p_1 = 0.7, p_2 = 0.2 and p_3 = 0.1. The average search time for each tree is:

1. 3 (0.7) + 2 (0.2) + 1 (0.1) = 2.6    (Key_3 at the root, Key_2 its left child, Key_1 the left child of Key_2)
2. 2 (0.7) + 1 (0.2) + 2 (0.1) = 1.8    (Key_2 at the root, Key_1 and Key_3 its children)
3. 1 (0.7) + 2 (0.2) + 3 (0.1) = 1.4    (Key_1 at the root, Key_2 its right child, Key_3 the right child of Key_2)

The last tree is optimal.

Constructing the optimal binary search tree

Let A[i][j] represent the average number of comparisons for searching an optimal tree containing keys i through j:

    A[i][j] = ∑_{k=i}^{j} p_k c_k = ∑_{k=i}^{j} p_k (depth_k + 1)

This is the value that we seek to minimize in an optimal search tree. Because it takes one comparison to locate a key in a tree containing one node, A[i][i] = p_i · 1 = p_i.

Criterion for an optimal tree (principle of optimality): Each optimal binary search tree is composed of a root and (at most) two optimal subtrees, the left and the right. Or, stated in another way: in an optimal tree, all the subtrees must also be optimal with respect to the keys that they contain.

Let m_ij = ∑_{k=i}^{j} p_k, and let A[i][j] be the average number of comparisons carried out in an optimal subtree containing the keys key_i, key_{i+1}, key_{i+2}, …, key_j (as defined before). One of the keys, say key_k, must occupy the root of the subtree, as shown below.

                      key_k
                     /     \
                    L       R
        key_i … key_{k-1}   key_{k+1} … key_j

When we look for a key in the main tree, the probability that it is in the sequence key_i, key_{i+1}, key_{i+2}, …, key_j is m_ij. In this case, one comparison is made with key_k, and the others may then be made in L or R.
The average number of comparisons carried out is therefore:

    A[i][j] = m_ij + A[i][k-1] + A[k+1][j]

where an empty subtree contributes nothing, i.e. A[i][i-1] = 0 and A[j+1][j] = 0. To obtain a dynamic programming scheme, the root, k, must be chosen to minimize A[i][j]:

    A[i][j] = m_ij + min_{i<=k<=j} (A[i][k-1] + A[k+1][j])

Optimal Binary Search Tree Algorithm

Input: n, the number of keys, and an array of real numbers, p, indexed from 1 to n, where p[i] is the probability of searching for the ith key. (Provided the keys are sorted, we do not need their exact values.)

Output: A variable minavg, whose value is the average search time for an optimal binary search tree, and a 2-D array R whose rows are indexed from 1 to n+1 and whose columns are indexed from 0 to n. R[i][j] is the index of the key in the root of the optimal tree containing keys i through j.

    void optsearchtree (int n, const float p[], float & minavg, index R[][])
    {
        index i, j, k, diagonal;
        float A[1..n+1][0..n];

        for (i = 1; i <= n; i++) {
            A[i][i-1] = 0;              // empty subtree
            A[i][i] = p[i];             // single-node subtree
            R[i][i] = i;
            R[i][i-1] = 0;
        }
        A[n+1][n] = 0;
        R[n+1][n] = 0;
        for (diagonal = 1; diagonal <= n-1; diagonal++)
            for (i = 1; i <= n - diagonal; i++) {
                j = i + diagonal;
                A[i][j] = min_{i<=k<=j} (A[i][k-1] + A[k+1][j]) + ∑_{m=i}^{j} p_m;
                R[i][j] = k;            // the value of k that gave the minimum
            }
        minavg = A[1][n];
    }

Work problem 20 to illustrate this algorithm.

Time Complexity

In the above algorithm, we calculate the value of A[i][j] first for j-i = 1, then for j-i = 2, and so on. When j-i = m, there are n-m values of A[i][j] to calculate, each involving a choice among m+1 possibilities. The required computation time is therefore:

    ∑_{m=1}^{n-1} (n-m)(m+1) = Θ(n^3)

Building the Optimal Search Tree

Inputs: n, the number of keys, an array Key containing the n keys in order, and the array R produced previously.

Outputs: a pointer to an optimal binary search tree containing the n keys.
    node_pointer tree (index i, index j)
    {
        index k;
        node_pointer p;

        k = R[i][j];                    // root of the optimal tree on keys i..j
        if (k == 0)
            return NULL;                // empty range of keys
        else {
            p = new nodetype;
            p->key = Key[k];
            p->left = tree(i, k-1);
            p->right = tree(k+1, j);
            return p;
        }
    }

The whole tree is obtained from the top-level call tree(1, n).

Illustrate this algorithm using the results for problem 20 obtained above.

Work problem 23.
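The two procedures above can be combined into one compilable sketch. It is an illustration under stated assumptions rather than a reference implementation: keys are integers, probabilities are stored in a std::vector with index 0 unused so that the 1-based indexing of the notes carries over, and the names Node, OptResult, optSearchTree, buildTree and freeTree are hypothetical. Ties between candidate roots are broken in favor of the smallest k, a choice the notes leave open.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical node type; the notes do not define nodetype.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Hypothetical result bundle: minimal average search time and the root table.
struct OptResult {
    double minavg;
    std::vector<std::vector<int>> R;   // rows 1..n+1, columns 0..n (row 0 unused)
};

// p[1..n] holds the search probabilities; p[0] is unused.
OptResult optSearchTree(const std::vector<double>& p) {
    assert(!p.empty());
    const int n = static_cast<int>(p.size()) - 1;
    // Zero-initialized tables already encode A[i][i-1] = 0 and R[i][i-1] = 0.
    std::vector<std::vector<double>> A(n + 2, std::vector<double>(n + 1, 0.0));
    std::vector<std::vector<int>> R(n + 2, std::vector<int>(n + 1, 0));
    for (int i = 1; i <= n; ++i) {
        A[i][i] = p[i];                // single-node subtree
        R[i][i] = i;
    }
    for (int diagonal = 1; diagonal <= n - 1; ++diagonal) {
        for (int i = 1; i <= n - diagonal; ++i) {
            const int j = i + diagonal;
            double sum = 0.0;          // m_ij = p_i + ... + p_j
            for (int m = i; m <= j; ++m) sum += p[m];
            double best = std::numeric_limits<double>::infinity();
            for (int k = i; k <= j; ++k) {
                const double cost = A[i][k - 1] + A[k + 1][j];
                if (cost < best) {     // strict '<' keeps the smallest k on ties
                    best = cost;
                    R[i][j] = k;
                }
            }
            A[i][j] = best + sum;
        }
    }
    return {n >= 1 ? A[1][n] : 0.0, R};
}

// Builds the optimal tree for keys i..j; key[1..n] holds the sorted keys.
Node* buildTree(int i, int j, const std::vector<std::vector<int>>& R,
                const std::vector<int>& key) {
    const int k = R[i][j];
    if (k == 0) return nullptr;        // empty range of keys
    Node* left = buildTree(i, k - 1, R, key);
    Node* right = buildTree(k + 1, j, R, key);
    return new Node{key[k], left, right};
}

// Releases a tree built by buildTree.
void freeTree(Node* t) {
    if (t == nullptr) return;
    freeTree(t->left);
    freeTree(t->right);
    delete t;
}
```

Called with the probabilities 0.7, 0.2 and 0.1 from the earlier example, the sketch yields a minimal average of 1.4 (up to floating-point rounding) with R[1][3] = 1, so Key_1 sits at the root, in agreement with the third tree of that example. The innermost sum could be replaced by precomputed prefix sums, but the extra loop does not change the Θ(n^3) bound.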