CSCI 333: Optimal Binary Search Tree
Binary search trees
A binary search tree is a binary tree of items that come from an
ordered set, such that
1. Each node contains one key.
2. The keys in the left sub-tree of a given node are less than or
equal to the key in that node.
3. The keys in the right sub-tree of a given node are greater than
or equal to the key in that node.
A tree is a graph with a specialized structure. The depth (or level) of
a node in a tree is the number of edges in the unique path from the
root to the node. The depth of a tree is the maximum depth of all
nodes in the tree. A tree is said to be balanced if the depths of the two
sub-trees of every node never differ by more than 1.
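The definitions of depth and balance can be sketched in C++ as follows. This is only an illustration: the names Node, depth, and isBalanced are hypothetical, and the convention that an empty tree has depth -1 (so that a single node has depth 0) is an assumption chosen to match the edge-counting definition.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical node type for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Depth of a tree: the maximum number of edges from the root to any node.
// Assumption: an empty tree has depth -1, so a single node has depth 0.
int depth(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(depth(t->left), depth(t->right));
}

// Balanced: at every node, the depths of the two sub-trees differ by at most 1.
bool isBalanced(const Node* t) {
    if (t == nullptr) return true;
    if (std::abs(depth(t->left) - depth(t->right)) > 1) return false;
    return isBalanced(t->left) && isBalanced(t->right);
}
```

The balance check recomputes depths at every node, which is simple but not the fastest approach; it is meant to mirror the definition, not to be efficient.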
A binary search tree that is organized in a way such that the average
time it takes to locate (search for) a key is minimized is called optimal.
Constructing the optimal binary search tree is an optimization problem
and the principle of optimality must apply; more about this later.
Searching the Tree
We will use a procedure search to locate a key in a binary search tree.
The number of comparisons done by the procedure search to locate a
key is called the search time. Our goal is to determine a tree for which
the average search time is minimal.
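void search (node_pointer tree, keytype keyin, node_pointer & p)
{
    bool found = false;
    p = tree;                       // start at the root
    while (! found)                 // assumes keyin is in the tree
        if (p->key == keyin)
            found = true;
        else if (keyin < p->key)
            p = p->left;            // keyin can only be in the left sub-tree
        else
            p = p->right;           // keyin can only be in the right sub-tree
}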
Assume a branch statement is used to implement the nested if-else
statement, so that only one comparison is performed per iteration of
the while loop. The search time (i.e., the number of comparisons) for
a given key is then:
depth(key) + 1
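To make the depth(key) + 1 count concrete, here is a minimal sketch that counts loop iterations while searching. The names Node and searchTime are hypothetical, and returning -1 for an absent key is an added assumption (the procedure search assumes the key is present).

```cpp
// Hypothetical node type for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the number of comparisons (loop iterations) needed to find keyin,
// counting one three-way comparison per node visited.
// Assumption: returns -1 if keyin is not in the tree.
int searchTime(const Node* tree, int keyin) {
    int comparisons = 0;
    const Node* p = tree;
    while (p != nullptr) {
        ++comparisons;                          // one comparison at this node
        if (keyin == p->key) return comparisons;
        p = (keyin < p->key) ? p->left : p->right;
    }
    return -1;                                  // key not present
}
```

For a key at depth d the loop visits d + 1 nodes, so the returned count equals depth(key) + 1.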
Let Key_1, Key_2, . . . , Key_n be the n keys in order, and let p_i be the
probability that Key_i is the search key. If c_i is the number of
comparisons needed to find Key_i in a given tree, the average search
time for that tree is:
$\sum_{i=1}^{n} c_i p_i$, where $c_i = depth(Key_i) + 1$
Example:
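Consider three binary search trees with n = 3. (Values of keys are not
important; the only requirement is that they be ordered.) We have
p1 = 0.7, p2 = 0.2 and p3 = 0.1. The trees, in the order used below, are:
1. Key3 at the root, Key2 as its left child, Key1 as the left child of Key2.
2. Key2 at the root, with Key1 and Key3 as its children.
3. Key1 at the root, Key2 as its right child, Key3 as the right child of Key2.
The average search times for these trees are:
1. 3 (0.7) + 2 (0.2) + 1 (0.1) = 2.6
2. 2 (0.7) + 1 (0.2) + 2 (0.1) = 1.8
3. 1 (0.7) + 2 (0.2) + 3 (0.1) = 1.4
The last tree is optimal.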
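The arithmetic in this example can be checked with a short sketch. The helper averageSearchTime is hypothetical; it takes the probabilities and the depth of each key in a particular tree and returns the sum of p_i (depth_i + 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average search time for one tree: sum of p[i] * (depth[i] + 1).
// Hypothetical helper; depth[i] is the depth of the i-th key in that tree.
double averageSearchTime(const std::vector<double>& p,
                         const std::vector<int>& depth) {
    assert(p.size() == depth.size());
    double total = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i)
        total += p[i] * (depth[i] + 1);   // c_i = depth_i + 1 comparisons
    return total;
}
```

With p = {0.7, 0.2, 0.1}, the depth vectors {2, 1, 0}, {1, 0, 1}, and {0, 1, 2} correspond to the coefficients 3/2/1, 2/1/2, and 1/2/3 used in the three sums above.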
Constructing the optimal binary search tree
Let A[i][j] represent the average number of comparisons for searching
an optimal tree containing keys i through j:
$A[i][j] = \sum_{k=i}^{j} p_k c_k = \sum_{k=i}^{j} p_k (depth_k + 1)$
where depth_k is the depth of key k within that tree.
This is the value that we seek to minimize in an optimal search tree.
Because it takes one comparison to locate a key in a tree containing
one node, and that key is sought with probability p_i, A[i][i] = p_i.
Criterion for an optimal tree (principle of optimality):
Each optimal binary search tree is composed of a root and (at
most) two optimal subtrees, the left and the right. Or, stated in
another way: in an optimal tree, all the subtrees must also be
optimal with respect to the keys that they contain.
Let $m_{ij} = \sum_{k=i}^{j} p_k$ and A[i][j] be the average number of
comparisons carried out in an optimal subtree containing the keys
key_i, key_{i+1}, key_{i+2}, …, key_j (as defined before). One of the keys, say
key_k, must occupy the root of the subtree, as shown below.

                   key_k
                  /     \
                 L       R
   key_i … key_{k-1}     key_{k+1} … key_j
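When we look for a key in the main tree, the probability that it is in
the sequence key_i, key_{i+1}, key_{i+2}, …, key_j is m_ij. In this case, one
comparison is made at the root key_k, and the others may then be made in L
or R. Every key in L and R therefore sits one level deeper than it does in
L or R alone, which adds m_ij in total to the costs A[i][k-1] and A[k+1][j]
of the two subtrees. The average number of comparisons carried out is therefore:
A[i][j] = m_ij + A[i][k-1] + A[k+1][j]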
To obtain a dynamic programming scheme, the root, k, must be
chosen to minimize A[i][j]:
$A[i][j] = m_{ij} + \min_{i \le k \le j} (A[i][k-1] + A[k+1][j])$
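The recurrence can also be evaluated directly, top-down, with memoization. The sketch below is not the bottom-up algorithm of these notes; it is a hypothetical alternative (the names optCost and optimalAverage are made up) intended only to show the recurrence at work. It assumes p is indexed from 1 with p[0] as unused padding, and that an empty key range costs 0.

```cpp
#include <cassert>
#include <vector>

// Evaluates A[i][j] = m_ij + min over k of (A[i][k-1] + A[k+1][j]) top-down.
// p is indexed 1..n (p[0] is unused); memo[i][j] < 0 means "not yet computed".
double optCost(int i, int j, const std::vector<double>& p,
               std::vector<std::vector<double>>& memo) {
    if (j < i) return 0.0;                       // empty range costs nothing
    if (memo[i][j] >= 0.0) return memo[i][j];
    double m = 0.0;                              // m_ij = p_i + ... + p_j
    for (int t = i; t <= j; ++t) m += p[t];
    double best = -1.0;
    for (int k = i; k <= j; ++k) {               // try every key as the root
        double c = optCost(i, k - 1, p, memo) + optCost(k + 1, j, p, memo);
        if (best < 0.0 || c < best) best = c;
    }
    return memo[i][j] = m + best;
}

// Convenience wrapper: optimal average search time for keys 1..n.
double optimalAverage(const std::vector<double>& p) {
    int n = static_cast<int>(p.size()) - 1;
    assert(n >= 0);                              // p[0] must exist as padding
    std::vector<std::vector<double>> memo(n + 2,
                                          std::vector<double>(n + 2, -1.0));
    return optCost(1, n, p, memo);
}
```

Calling optimalAverage with p = {0, 0.7, 0.2, 0.1} evaluates the recurrence for the three-key example given earlier.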
Optimal Binary Search Tree Algorithm
Input: n, the number of keys and an array of real numbers, p, indexed
from 1 to n, where p[i] is the probability of searching for the ith key.
(Provided the keys are sorted, we do not need their exact values.)
Output: A variable minavg, whose value is the average search time for
an optimal binary search tree, and a 2-D array R where the row space
of R is indexed from 1 to n+1, and the column space is indexed from 0
to n. R[i][j] is the index of the key in the root of the tree containing
keys i through j.
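#include <vector>
typedef int index;    // the notes use "index" for integer subscripts

// A and R use rows 1..n+1 and columns 0..n; row 0 is allocated but unused.
void optsearchtree (int n, const float p[], float & minavg,
                    std::vector<std::vector<index> > & R)
{
    index i, j, k, diagonal;
    std::vector<std::vector<float> > A(n + 2, std::vector<float>(n + 1, 0));
    R.assign(n + 2, std::vector<index>(n + 1, 0));
    for (i = 1; i <= n; i++) {
        A[i][i-1] = 0;                  // empty tree
        A[i][i] = p[i];                 // tree with one key
        R[i][i] = i;
        R[i][i-1] = 0;
    }
    A[n+1][n] = 0;
    R[n+1][n] = 0;
    for (diagonal = 1; diagonal <= n-1; diagonal++)
        for (i = 1; i <= n - diagonal; i++) {
            j = i + diagonal;
            float sum = 0;              // sum of p[m] for m = i to j
            for (k = i; k <= j; k++)
                sum += p[k];
            A[i][j] = A[i][i-1] + A[i+1][j];    // start with k = i as the root
            R[i][j] = i;
            for (k = i + 1; k <= j; k++)        // try the remaining roots
                if (A[i][k-1] + A[k+1][j] < A[i][j]) {
                    A[i][j] = A[i][k-1] + A[k+1][j];
                    R[i][j] = k;        // the value of k that gave the minimum
                }
            A[i][j] += sum;
        }
    minavg = A[1][n];
}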
Work problem 20 to illustrate this algorithm.
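As a smaller hand trace, the three-key example from earlier (p1 = 0.7, p2 = 0.2, p3 = 0.1) can be worked through the same steps. The values below were computed by hand from the recurrence, using A[i][i-1] = 0 for empty trees; the label under each term is the root k being tried.

```latex
\begin{align*}
A[1][1] &= 0.7, \quad A[2][2] = 0.2, \quad A[3][3] = 0.1 \\
A[1][2] &= (0.7 + 0.2) + \min(\underbrace{0 + 0.2}_{k=1},\ \underbrace{0.7 + 0}_{k=2})
         = 0.9 + 0.2 = 1.1, & R[1][2] &= 1 \\
A[2][3] &= (0.2 + 0.1) + \min(\underbrace{0 + 0.1}_{k=2},\ \underbrace{0.2 + 0}_{k=3})
         = 0.3 + 0.1 = 0.4, & R[2][3] &= 2 \\
A[1][3] &= 1.0 + \min(\underbrace{0 + 0.4}_{k=1},\ \underbrace{0.7 + 0.1}_{k=2},\ \underbrace{1.1 + 0}_{k=3})
         = 1.0 + 0.4 = 1.4, & R[1][3] &= 1
\end{align*}
```

The result minavg = A[1][3] = 1.4 agrees with the average search time of the optimal tree in that example, and R[1][3] = 1 places the first key at the root.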
Time Complexity
In the above algorithm, we calculate the value of A[i][j] first for j-i=1,
then for j-i=2, and so on. When j-i=m, there are n-m values of A[i][j]
to calculate, each involving a choice among m+1 possibilities. The
required computation time is therefore:
$\sum_{m=1}^{n-1} (n-m)(m+1) \in \Theta(n^3)$
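The cubic bound can be confirmed by evaluating the sum in closed form, using the standard formulas for the sum of the first n-1 integers and the first n-1 squares:

```latex
\begin{align*}
\sum_{m=1}^{n-1} (n-m)(m+1)
  &= \sum_{m=1}^{n-1} \bigl( (n-1)\,m + n - m^2 \bigr) \\
  &= (n-1)\cdot\frac{(n-1)n}{2} + n(n-1) - \frac{(n-1)n(2n-1)}{6} \\
  &= n(n-1)\cdot\frac{3(n-1) + 6 - (2n-1)}{6} \\
  &= \frac{n(n-1)(n+4)}{6} \in \Theta(n^3)
\end{align*}
```

As a quick check, n = 3 gives 2(2) + 1(3) = 7 from the sum and 3(2)(7)/6 = 7 from the closed form.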
Building the Optimal Search Tree
Inputs: n, the number of keys; an array Key containing the n keys in
order; and the array R produced previously.
Outputs: a pointer to an optimal binary search tree containing the n keys.
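node_pointer tree(index i, index j)
{
    index k;
    node_pointer p;

    k = R[i][j];                    // root of the optimal tree for keys i through j
    if (k == 0)
        return NULL;                // empty tree
    else {
        p = new nodetype;
        p->key = Key[k];
        p->left = tree(i, k-1);     // optimal tree for the smaller keys
        p->right = tree(k+1, j);    // optimal tree for the larger keys
        return p;
    }
}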
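The construction can be tried on a small case with the self-contained sketch below. Several things here are assumptions made for illustration: R and Key are passed as parameters instead of being shared variables, the function is named buildTree, keys are strings, and exampleR holds hand-derived root values for a three-key instance with probabilities 0.7, 0.2, and 0.1 (row 0 is unused padding).

```cpp
#include <cassert>
#include <string>
#include <vector>

struct nodetype {
    std::string key;
    nodetype* left;
    nodetype* right;
};
typedef nodetype* node_pointer;

// Builds the optimal tree for keys i..j from the root table R.
// R[i][j] == 0 marks an empty range.
node_pointer buildTree(int i, int j,
                       const std::vector<std::vector<int>>& R,
                       const std::vector<std::string>& Key) {
    assert(i >= 1 && j >= 0);
    int k = R[i][j];
    if (k == 0) return nullptr;
    node_pointer p = new nodetype;
    p->key = Key[k];
    p->left = buildTree(i, k - 1, R, Key);    // keys smaller than Key[k]
    p->right = buildTree(k + 1, j, R, Key);   // keys larger than Key[k]
    return p;
}

// Releases every node of a tree built by buildTree.
void freeTree(node_pointer t) {
    if (t == nullptr) return;
    freeTree(t->left);
    freeTree(t->right);
    delete t;
}

// Hand-derived root table for n = 3 with p = 0.7, 0.2, 0.1.
// Rows 1..4 and columns 0..3 are meaningful; row 0 is padding.
std::vector<std::vector<int>> exampleR() {
    return {
        {0, 0, 0, 0},
        {0, 1, 1, 1},
        {0, 0, 2, 2},
        {0, 0, 0, 3},
        {0, 0, 0, 0}
    };
}
```

Calling buildTree(1, 3, exampleR(), Key) with Key = {"", "key1", "key2", "key3"} yields a right-leaning chain with key1 at the root, key2 as its right child, and key3 as the right child of key2.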
Illustrate this algorithm using the results for problem 20 obtained
above. Work problem 23.