K - PUC-Rio

Binary Search Trees
Cormen (ch. 12, 3rd edition)
Estruturas de Dados e seus Algoritmos (ch. 4)
Dictionary Data Structures
Goal:
Design a data structure to store a small set of keys S = {k1, k2, ..., kn} from a large universe U.
It shall efficiently support:
– Query(x): determine whether a key x is in S or not
– Insert(x): add x to the set S if x is not there
– Delete(x): remove x from S if x is there
Additional Goals
– Low memory consumption
– Efficient construction
Dictionary Data Structures
Linked Lists

Query(x): O(n) time

Insert(x): Insert at the beginning of the list: O(1) time

Delete(x): Find and then remove, O(n) time

Construction time: O(n) time

Space consumption: O(n)
Binary Search Trees

A binary search tree (BST) T for a list K = (k1 < ··· < kn) of n keys is a rooted tree that satisfies the following properties:
– It has n internal nodes and n+1 leaves
– Each internal node of T is associated with a distinct key
– [Search Property] If node v is associated with key ki then:
   • all nodes in the left subtree of v are associated with keys smaller than ki
   • all nodes in the right subtree of v are associated with keys larger than ki
How does a BST work?
Search Property: for a node with key x, every key y in its left subtree satisfies y < x, and every key z in its right subtree satisfies x < z.
Binary Search Trees
[Figure: example BST on the keys k1, ..., k5]
Binary Search Trees

Keys are elements from a totally ordered set U
– U can be the set of integers
– U can be the set of students from a university
Binary Search Trees

Additional Properties
– The key with minimum value is stored in the leftmost node of the tree
– The key with maximum value is stored in the rightmost node of the tree
Binary Search Trees

Basic Operations
– Query(x): Determine whether x belongs to T or not
– Insert(x): If x is not in T, then insert x in T
– Delete(x): If x is in T, then remove x from T
BST: Query(x)
Algorithm Query(x, T)
If root(T) is a leaf then
  Return “element was not found”
End If
If x = root.key then
  Return “element was found”
Else
  If x < root.key then
    Query(x, left tree of T)
  Else
    Query(x, right tree of T)
  End If
End If
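The Query algorithm above can be sketched in Python as follows. This is only an illustrative sketch: the names `Node` and `query` are assumptions of this example, and `None` plays the role of the external leaves of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None    # None stands for an external leaf
    right: Optional["Node"] = None


def query(root: Optional[Node], x: int) -> bool:
    """Return True if key x is stored in the BST rooted at `root`."""
    node = root
    while node is not None:
        if x == node.key:
            return True              # element was found
        # Search property: smaller keys are on the left, larger on the right.
        node = node.left if x < node.key else node.right
    return False                     # reached a leaf: element was not found


# A small hand-built example tree (hypothetical keys).
tree = Node(6, Node(2, Node(1), Node(4, Node(3))), Node(8))
found_3 = query(tree, 3)
found_5 = query(tree, 5)
```

The loop visits one node per level, so the number of steps is bounded by the height of the tree.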
Binary Search Trees
Query(x)
[Figure: search path of Query(x) in the example BST on the keys k1, ..., k5]
Inserting a new key
Add the new element to the tree at the correct position so that the search property is preserved.
Algorithm Insert(x, T)
If root(T) is a leaf then
  Associate the leaf with x
  Return
End If
If x = root.key then
  Return ‘x is already in T’
End If
If x < root.key then
  Insert(x, left tree of T)
Else
  Insert(x, right tree of T)
End If
Inserting a new key
Example: Insert(50), Insert(20), Insert(39), Insert(8), Insert(79), Insert(26)
[Figure: resulting BST: root 50 with children 20 and 79; 20 has children 8 and 39; 39 has left child 26]
Removing a node in a BST
SITUATIONS:
Removing a leaf
Removing an internal node with a single child
Removing an internal node with two children
Removing a Leaf
[Figure: BST with root 6 and children 2 and 8; 2 has children 1 and 4; 4 has left child 3. Removing the leaf 3 leaves the nodes 6, 8, 2, 1, 4 in place]
Removing an internal node with a single child
Fix the parent's pointer so that it “jumps over” the removed node: the only child of the removed node becomes the right (or left) child of the removed node's parent.
Removing an internal node with a single child
[Figure: in the BST with root 6 (children 2 and 8; 2 has children 1 and 4; 4 has left child 3), node 4 is removed and its only child 3 takes its place; the remaining nodes are 6, 8, 2, 1, 3]
Removing an internal node with two children
• Find the element that immediately precedes the element to be removed in the key ordering (this is the rightmost element of the left subtree)
• Switch the information of the node to be removed with the node found
• Remove the node found, which now holds the key to be deleted; it has no right child, so one of the two previous situations applies
Removing an internal node with two children
[Figure: in the BST with root 6 (children 2 and 8; 2 has children 1 and 4; 4 has left child 3), the root 6 is to be removed. Its predecessor 4 is the rightmost node of the left subtree. The keys 6 and 4 are switched, so the root now holds 4, and the node that now holds 6 is removed as a node with a single child]
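The three removal situations can be sketched in Python as follows. This is a sketch under stated assumptions: `None` represents a leaf, the names `Node`, `delete`, `inorder` and `example_tree` are illustrative, and instead of literally switching two nodes the code copies the predecessor's key up and then deletes the predecessor, which has the same effect.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def delete(root: Optional[Node], x: int) -> Optional[Node]:
    """Remove key x from the BST rooted at `root`; return the new subtree root."""
    if root is None:                      # x is not in the tree
        return None
    if x < root.key:
        root.left = delete(root.left, x)
    elif x > root.key:
        root.right = delete(root.right, x)
    elif root.left is None:               # no children, or only a right child
        return root.right
    elif root.right is None:              # only a left child
        return root.left
    else:                                 # two children
        pred = root.left
        while pred.right is not None:     # rightmost node of the left subtree
            pred = pred.right
        root.key = pred.key               # move the predecessor's key up
        root.left = delete(root.left, pred.key)
    return root


def inorder(root: Optional[Node]) -> List[int]:
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def example_tree() -> Node:
    """Root 6, children 2 and 8; 2 has children 1 and 4; 4 has left child 3."""
    return Node(6, Node(2, Node(1), Node(4, Node(3))), Node(8))


after_leaf = delete(example_tree(), 3)        # removing a leaf
after_single = delete(example_tree(), 4)      # node with a single child
after_two = delete(example_tree(), 6)         # node with two children
```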
Binary Search Trees: Operations Complexity
Basic Operations
– Query(x): Determine whether x belongs to T or not
   Number of operations = O(height of T)
– Insert(x): If x is not in T, then insert x in T
   Number of operations = O(height of T)
– Delete(x): If x is in T, then remove x from T
   Number of operations = O(height of T)
– Max(T) and Min(T)
   Number of operations = O(height of T)
Shallow trees are desirable
Binary Search Trees: Construction
• Simple Approach: let k1, …, kn be the set of keys, not necessarily ordered. Proceed as follows:
insert(k1), insert(k2), ..., insert(kn)
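Example: 50, 20, 39, 8, 79, 26, 58, 15, 88, 4, 85, 96, 71, 42, 53.
[Figure: the resulting BST has four full levels: root 50; second level 20, 79; third level 8, 39, 58, 88; fourth level 4, 15, 26, 42, 53, 71, 85, 96]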
Binary Search Trees: Construction
• With the simple approach, the resulting structure has height Θ(n) if the keys are inserted in sorted order.
• For a random permutation of the first n integers, its expected height is O(log n) (Cormen, Section 12.4)
Binary Search Trees: Construction
• Alternative approach: let k1, …, kn be the set of keys, not necessarily ordered. Proceed as follows:
• Sort the keys
BST(i, j)
  If i > j then return a leaf
  m <- ⌊(i+j)/2⌋ (position of the ‘median key’)
  root(T) <- k_m
  left(root) <- BST(i, m-1)
  right(root) <- BST(m+1, j)
End
• The initial call is BST(1, n)
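A possible Python sketch of the sorted-keys construction above: the median key becomes the root and the two halves are built recursively. The names `Node`, `build_balanced` and `height` are assumptions of this example, and `height` counts levels (an empty tree has height 0).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_balanced(keys: Sequence[int], lo: int = 0,
                   hi: Optional[int] = None) -> Optional[Node]:
    """Build a BST over the sorted slice keys[lo:hi] with the median at the root."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:                     # empty range: a leaf
        return None
    mid = (lo + hi) // 2             # position of the median key
    return Node(keys[mid],
                build_balanced(keys, lo, mid),
                build_balanced(keys, mid + 1, hi))


def height(node: Optional[Node]) -> int:
    """Number of levels of the tree rooted at `node`."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


keys = sorted([50, 20, 39, 8, 79, 26, 58, 15, 88, 4, 85, 96, 71, 42, 53])
root = build_balanced(keys)
```

After sorting, which costs O(n log n), the recursion itself touches each key once.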
Relation between #nodes and height of a binary tree
The number of nodes can at most double from one level to the next, so a binary tree with height h (counting levels) has at most
$2^0 + 2^1 + 2^2 + \dots + 2^{h-1} = 2^h - 1$ nodes
Or equivalently:
The height of every binary search tree with n nodes is at least $\log_2(n+1) > \log_2 n$
The tree may become unbalanced
Remove node 8, then remove node 1
[Figure: starting from the BST with root 6 (children 2 and 8; 2 has children 1 and 4; 4 has left child 3), removing 8 and then 1 leaves a chain with the nodes 6, 2, 4, 3, one per level]
The tree may become unbalanced
The binary tree may become degenerate after insertions and removals, turning into a list, for example.
Balanced Trees
Cormen (ch. 13, 3rd edition)
Estruturas de Dados e seus Algoritmos (ch. 5)
AVL TREES
(Adelson-Velskii and Landis 1962)
BSTs that are kept reasonably balanced at all times.
Key idea: if an insertion or deletion puts the tree out of balance, then fix it immediately
All operations (insert, delete, …) can be done on an AVL tree with N nodes in O(log N) time (worst case)
AVL TREES
AVL Tree Property: It is a BST in which the
heights of the left and right subtrees of the
root differ by at most 1 and in which the right
and left subtrees are also AVL trees
Height: length of the longest path from the root to a
leaf.
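AVL TREES: Example:
[Figure: an example of an AVL tree where the heights are shown next to the nodes: root 44 (height 4); 17 (height 2) with right child 32 (height 1); 78 (height 3) with children 50 (height 2) and 88 (height 1); 50 has children 48 (height 1) and 62 (height 1)]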
AVL TREES
Other Examples:
Relation between #nodes and height of an AVL tree
Let r be the root of an AVL tree T of height h, with left subtree Te and right subtree Td
Let $N_h$ denote the minimum number of nodes in an AVL tree of height h
One subtree of r (say Te) has height h-1; the other (Td) has height h-1 or h-2
$N_h \ge 1 + N_{h-1} + N_{h-2}$
It grows at least as fast as the Fibonacci sequence ⇒ $N_h \ge 1.618^{h-2}$
Height of AVL Tree <= 1.44 $\log_2 N$ (N is the number of nodes)
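The step from the recurrence to the logarithmic height bound can be spelled out by induction. The following is a sketch of the standard argument, with $\varphi = (1+\sqrt{5})/2 \approx 1.618$, which satisfies $\varphi^2 = \varphi + 1$. The base cases hold because $N_h \ge 1 \ge \varphi^{h-2}$ for the two smallest heights.

```latex
\begin{align*}
N_h &\ge 1 + N_{h-1} + N_{h-2} \\
    &> \varphi^{h-3} + \varphi^{h-4}
       && \text{(induction hypothesis } N_k \ge \varphi^{k-2}\text{)} \\
    &= \varphi^{h-4}(\varphi + 1) = \varphi^{h-4}\,\varphi^{2} = \varphi^{h-2}. \\[4pt]
N \ge N_h \ge \varphi^{h-2}
    &\;\Longrightarrow\;
    h \le 2 + \log_{\varphi} N
      = 2 + \frac{\log_2 N}{\log_2 \varphi}
      \approx 2 + 1.44\,\log_2 N .
\end{align*}
```

The constant 1.44 is $1/\log_2 \varphi$; the additive constant depends on how the height is counted.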
Height of AVL Tree
Height of the tree is O(logN)
Where N is the number of elements contained in the tree

This implies that tree search operations
Query(), Max(), Min() take O(logN) time.

Insertion in an AVL Tree
Insertion is as in a binary search tree (always done by expanding an external node)
Example: Insert node 54
[Figure: AVL tree with root 44; 17 with right child 32; 78 with children 50 and 88; 50 with children 48 and 62. Node 54 is inserted as the left child of 62. The heights along the path from 54 to the root are then updated, and node 78 is found to be unbalanced: its left subtree (rooted at 50) has height 3 and its right subtree (rooted at 88) has height 1]
Unbalanced!!
How does the AVL tree work?
After insertion and deletion we will examine the tree structure and see if
any node violates the AVL tree property

If the AVL property is violated at node x, it means the heights of
left(x) and right(x) differ by exactly 2
If it does violate the property we can modify the tree structure using
“rotations” to restore the AVL tree property
Rotations
Two types of rotations
Single rotations
– two nodes are “rotated”
Double rotations
– three nodes are “rotated”


Localizing the problem
Two principles:
• Imbalance will only occur on the path from the inserted/deleted node to the root (only these nodes have had their subtrees altered: local problem)
• Rebalancing should occur at the deepest unbalanced node (local solution too)
Single Rotation (Right): Case I
• Rotate x with left child y
• x and y satisfy the AVL property after the rotation
Single Rotation (Left): Case II
• Rotate x with right child y
• x and y satisfy the AVL property after the rotation
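The two single rotations can be sketched in Python as pointer updates. This is only an illustration: `Node`, `rotate_right` and `rotate_left` are names assumed for this example, and height bookkeeping is left out to keep the pointer changes visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(x: Node) -> Node:
    """Case I: rotate x with its left child y; y becomes the subtree root."""
    y = x.left
    assert y is not None, "a right rotation needs a left child"
    x.left = y.right     # the middle subtree moves from y to x
    y.right = x
    return y


def rotate_left(x: Node) -> Node:
    """Case II: rotate x with its right child y; y becomes the subtree root."""
    y = x.right
    assert y is not None, "a left rotation needs a right child"
    x.right = y.left     # the middle subtree moves from y to x
    y.left = x
    return y


# A left-leaning chain 3 -> 2 -> 1 becomes balanced after one right rotation.
chain = Node(3, Node(2, Node(1)))
balanced = rotate_right(chain)
```

Each rotation changes a constant number of pointers and keeps the in-order sequence of keys unchanged. The caller must store the returned node in place of x.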
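Single Rotation - Example
[Figure: a tree whose two subtrees have heights h and h+1. Tree is an AVL tree by definition.]
[Figure: node 02 is added and the subtree heights become h and h+2. Tree violates the AVL definition! Perform rotation.]
[Figure: the tree has this form: x is the unbalanced node and y its child on the taller side; A and B are the subtrees of y, C is the other subtree of x, and the heights shown are h, h+1 and h]
Example – After Rotation
[Figure: the tree has this form: y is the root of the subtree, with x as its child; the subtrees A, B and C are reattached in the same left-to-right order]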
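Single Rotation
Sometimes a single rotation fails to solve the problem
[Figure: nodes k1 and k2 with subtrees X, Y and Z, before and after a single rotation; the subtree heights h and h+2 appear both before and after, so the tree is still unbalanced]
In such cases, we need to use a double-rotation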
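Double Rotations: Case IV
Double Rotations
[Figure: a tree whose two subtrees have heights h and h+1. Tree is an AVL tree by definition. Delete node 94.]
[Figure: after the deletion the subtree heights are h+2 and h. AVL tree is violated.]
[Figure: the tree has this form: nodes x, y and z with subtrees A, B1, B2 and C, where B1 and B2 are the subtrees of z]
After Double Rotations
[Figure: the tree has this form: z is the root of the subtree, with x and y as its children; the subtrees A, B1, B2 and C are reattached in the same left-to-right order]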
Insertion
We keep the height of each node x to check the AVL property
Part 1. Perform normal BST insertion
Part 2. Check the AVL property and restore it if necessary.
To check whether the AVL property still holds, we only need to check the nodes on the path from the new leaf to the root of the BST, because the balance of the other nodes is not affected
– Check whether node x is balanced by comparing Height(left(x)) and Height(right(x))
– Update the heights of the visited nodes in this process, using the identity
  Height(x) = 1 + max { Height(left(x)), Height(right(x)) }
Insertion: Part 2 Detailed
For each x in the path from the inserted leaf towards the root:
  If the heights of left(x) and right(x) differ by at most 1
    Do ‘nothing’
  Else we know that one of the subtrees of x has height h and the other h+2
    If the height of left(x) is h+2 then
      – If the height of left(left(x)) is h+1, we single rotate with left child (case 1)
      – Otherwise, the height of right(left(x)) is h+1 and we double rotate with left child (case 3)
    Otherwise, the height of right(x) is h+2
      – If the height of right(right(x)) is h+1, then we rotate with right child (case 2)
      – Otherwise, the height of left(right(x)) is h+1 and we double rotate with right child (case 4)
    Break For
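The two parts of the insertion can be combined into one recursive Python sketch. The names (`Node`, `h`, `update`, `rebalance`, `insert`) are assumptions of this example, and heights count levels (a single node has height 1). Instead of an explicit "Break For", the sketch calls `rebalance` on every node of the path; above the first rotation these calls find nothing to fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1                      # levels in the subtree rooted here


def h(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(x: Node) -> Node:
    y = x.left
    x.left = y.right
    y.right = x
    update(x)
    update(y)
    return y


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def rebalance(x: Node) -> Node:
    """Restore the AVL property at x, assuming both subtrees are AVL trees."""
    update(x)
    if h(x.left) - h(x.right) == 2:
        if h(x.left.left) < h(x.left.right):      # case 3: double rotation
            x.left = rotate_left(x.left)
        return rotate_right(x)                    # case 1: single rotation
    if h(x.right) - h(x.left) == 2:
        if h(x.right.right) < h(x.right.left):    # case 4: double rotation
            x.right = rotate_right(x.right)
        return rotate_left(x)                     # case 2: single rotation
    return x


def insert(root: Optional[Node], key: int) -> Node:
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return rebalance(root)               # walk back towards the root


root = None
for k in [44, 17, 78, 32, 50, 88, 48, 62, 54]:
    root = insert(root, k)
```

In this run the last insertion (54) unbalances node 78 and triggers a double rotation with its left child, after which 62 is the root of that subtree.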
Insertion: Correctness
• Let x be the deepest node that does not satisfy the AVL property.
• Assume that case 2 occurs (the new element is inserted in tree C)
  • x and y satisfy the property after the rotation.
  • The ancestors of x satisfy the property because height(x) before the insertion is h+2 and height(y) after the rotation is also h+2
  • The nodes in the path between the new element and y also satisfy the AVL property due to the assumption that x is the deepest node for which the AVL property does not hold
  • Nodes that are not in the path from the root to the new element are not affected
Insertion: Correctness
• Let x be the deepest node that does not satisfy the AVL property.
• Assume that case 4 occurs (the new element is inserted in tree B1)
  • x, y and z satisfy the property after the rotation
  • The ancestors of x are balanced after the rotation because the height of x is h+2 before the insertion and the height of z is h+2 after the rotation.
  • The remaining nodes in the path between the new element and x also satisfy the property due to the assumption that x is the deepest node that does not satisfy the AVL property
  • The nodes that are not in the path between the new element and x are not affected.
Insertion: Complexity
• The time complexity to perform a rotation is O(1), since we just update a few pointers
• The time complexity to find a node that violates the AVL property depends on the height of the tree, which is O(log N)
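Deletion
Perform normal BST deletion
Perform checks similar to those used for insertion to restore the AVL property. Unlike insertion, a rotation after a deletion may reduce the height of the rebalanced subtree, so the check has to continue up to the root and more than one rotation may be needed.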
Summary AVL Trees
Maintains a Balanced Tree
Modifies the insertion and deletion routine
Performs single or double rotations to restore structure
Guarantees that the height of the tree is O(logn)
The guarantee directly implies that functions find(), min(), and max()
will be performed in O(logn)


Other Balanced trees
• Red-Black Trees (Cormen ch. 13, Jayme ch. 6)
• 2-3 Trees (Hopcroft)
Dictionary Problem: non uniform access probabilities
We want to keep a data structure to support a sequence of INSERT, QUERY, DELETE operations
– Some elements are accessed much more often than others ⇒ non-uniform access probabilities
Dictionary Problem: non uniform access probabilities
Consider the following AVL Tree
[Figure: AVL tree with root 44; 17 with right child 32; 78 with children 50 and 88; 50 with children 48 and 62]
Suppose we want to search for the following sequence of elements:
48, 48, 48, 48, 62, 62, 62, 48, 62.
In this case, is this a good structure?
[Figure: a tree on the same keys with root 48; left child 32 with children 17 and 44; right child 62 with children 50 and 78; 78 has right child 88]
This structure is much better!
Dictionary Problem: non uniform access probabilities
Application: Building Inverted indexes
Given a text T, we want to design an inverted index S for T, that is, a structure that maintains, for every word x of T, the list of positions where x occurs.
[Figure: text T = “ALO ALO MEU AMIGO ... ALO AMIGO MEU” with its character positions 1, 2, 3, ...; the inverted index stores ALO → 1, 4, 30; AMIGO → 12, 34; MEU → 9, 40]
We do not know the list of words beforehand; some words may occur much more frequently than others
Dictionary Problem: non uniform access probabilities
Static Case: the access probability distribution is known beforehand
• Lists
• Optimal Binary Search Trees
Dynamic Case: the access probability distribution is not known beforehand
• Self-Adjusting Lists
• Self-Adjusting Binary Search Trees
  • Splay Trees
Dictionary Problem with non uniform access probabilities
Problem
Given a sequence K = k1 < k2 < ··· < kn of n sorted keys, with a search probability pi for each key ki.
– We assume that we always search for an element that belongs to K. This assumption can be easily removed.
We want to design a data structure with minimum expected search cost.
Actual cost = # of items examined.
– For key ki, the number of elements accessed until ki is found
Optimal Binary Search Trees
Cormen (Section 15.5, 3rd edition)
Estruturas de Dados e seus Algoritmos (ch. 4)
Dictionary Problem: non uniform access probabilities
Approach 1: Linked lists
Put the elements with the highest probabilities of being accessed at the beginning of the list
Keys (1,2,3,4,5); p = (0.1, 0.3, 0.2, 0.05, 0.15)
Best possible linked list: 2 → 3 → 5 → 1 → 4
Expected cost of accessing an element
= 1 × 0.3 + 2 × 0.2 + 3 × 0.15 + 4 × 0.1 + 5 × 0.05 = 1.8
Dictionary Problem with non uniform access probabilities
Approach 2: Binary Search Tree
Given a sequence K = k1 < k2 < ··· < kn of n sorted keys, with a search probability pi for each key ki.
We want to build a binary search tree (BST) with minimum expected search cost.
Actual cost = # of items examined.
For key ki, cost = depthT(ki) + 1, where depthT(ki) = depth of ki in BST T (the root is at depth 0).
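Expected Search Cost
$$E[\text{search cost in } T] = \sum_{i=1}^{n} \bigl(\mathrm{depth}_T(k_i) + 1\bigr)\cdot p_i = \sum_{i=1}^{n} \mathrm{depth}_T(k_i)\cdot p_i + \sum_{i=1}^{n} p_i = 1 + \sum_{i=1}^{n} \mathrm{depth}_T(k_i)\cdot p_i$$
The last step uses the fact that the sum of probabilities is 1.
Identity (1)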
Example
Consider 5 keys with these search probabilities:
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[Figure: BST with root k2; k1 and k4 are the children of k2; k3 and k5 are the children of k4]
i      depthT(ki)   depthT(ki)·pi
1      1            0.25
2      0            0
3      2            0.1
4      1            0.2
5      2            0.6
Total               1.15
Therefore, E[search cost] = 2.15.
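The expected search cost of a given tree follows directly from the depths of its keys. A short Python sketch, with `expected_search_cost` as an assumed helper name and the depths read off the two example trees of these slides:

```python
from typing import Sequence


def expected_search_cost(depths: Sequence[int], probs: Sequence[float]) -> float:
    """Sum of (depth + 1) * probability over all keys (root at depth 0)."""
    return sum((d + 1) * p for d, p in zip(depths, probs))


p = [0.25, 0.2, 0.05, 0.2, 0.3]

# Tree with root k2, children k1 and k4, and k3, k5 below k4.
cost_first = expected_search_cost([1, 0, 2, 1, 2], p)

# Tree with root k2, children k1 and k5, k4 below k5 and k3 below k4.
cost_second = expected_search_cost([1, 0, 3, 2, 1], p)
```

The second tree is deeper but cheaper on average, because the frequently accessed key k5 sits closer to the root.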
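Example
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[Figure: BST with root k2; k1 is the left child and k5 the right child of k2; k4 is the left child of k5; k3 is the left child of k4]
i      depthT(ki)   depthT(ki)·pi
1      1            0.25
2      0            0
3      3            0.15
4      2            0.4
5      1            0.3
Total               1.10
Therefore, E[search cost] = 2.10.
This tree turns out to be optimal for this set of keys.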
Example
Observations:
– Optimal BST may not have smallest height.
– Optimal BST may not have highest-probability key at root.
Build by exhaustive checking?
– Construct each n-node BST.
– For each, assign keys and compute expected search cost.
– But there are $\Omega(4^n / n^{3/2})$ different BSTs with n nodes.
Optimal Substructure
Any subtree of a BST contains keys in a contiguous range ki, ..., kj for some 1 ≤ i ≤ j ≤ n.
[Figure: a BST T containing a subtree T′]
If T is an optimal BST and T contains a subtree T′ with keys ki, ..., kj, then T′ must be an optimal BST for keys ki, ..., kj.
Proof: Otherwise, we could obtain a tree better than T by replacing T′ with an optimal BST for keys ki, ..., kj.
Optimal Substructure
One of the keys in ki, …, kj, say kr, where i ≤ r ≤ j, must be the root of an optimal subtree for these keys.
Left subtree of kr contains ki, ..., kr−1.
Right subtree of kr contains kr+1, ..., kj.
[Figure: root kr with a left subtree spanning ki ... kr−1 and a right subtree spanning kr+1 ... kj]
To find an optimal BST:
– Examine all candidate roots kr, for i ≤ r ≤ j
– Determine all optimal BSTs containing ki, ..., kr−1 and containing kr+1, ..., kj
Recursive Solution
When the OPT subtree becomes a subtree of a node:
– Depth of every node in OPT subtree goes up by 1.
– Expected search cost increases by
$$w(i, j) = \sum_{l=i}^{j} p_l$$
from Identity (1)
[Figure: an optimal subtree shown on its own and attached below a new root, where each of its nodes is one level deeper]
Recursive Solution
e[i, j]: cost of the optimal BST for ki, ..., kj.
If kr is the root of an optimal BST for ki, ..., kj:
$$e[i, j] = p_r + \bigl(e[i, r-1] + w(i, r-1)\bigr) + \bigl(e[r+1, j] + w(r+1, j)\bigr) = e[i, r-1] + e[r+1, j] + w(i, j).$$
But we don’t know kr. Hence,
$$e[i, j] = \begin{cases} 0 & \text{if } j = i - 1 \\ \min\limits_{i \le r \le j} \{\, e[i, r-1] + e[r+1, j] + w(i, j) \,\} & \text{if } i \le j \end{cases}$$
Computing an Optimal Solution
For each subproblem (i, j), store:
– the expected search cost in a table e[1..n+1, 0..n]
  • Will use only entries e[i, j] where j ≥ i−1.
– root[i, j] = root of subtree with keys ki, ..., kj, for 1 ≤ i ≤ j ≤ n.
– w[1..n+1, 0..n] = sum of probabilities
  • w[i, i−1] = 0 for 1 ≤ i ≤ n+1.
  • w[i, j] = w[i, j−1] + pj for 1 ≤ i ≤ j ≤ n.
Pseudo-code
OPTIMAL-BST(p, n)
  for i ← 1 to n + 1
    do e[i, i−1] ← 0
       w[i, i−1] ← 0
  for len ← 1 to n                        // Consider all trees with len keys.
    do for i ← 1 to n − len + 1           // Fix the first key.
      do j ← i + len − 1                  // Fix the last key.
         e[i, j] ← ∞
         w[i, j] ← w[i, j−1] + p_j
         for r ← i to j                   // Determine the root of the optimal (sub)tree.
           do t ← e[i, r−1] + e[r+1, j] + w[i, j]
              if t < e[i, j]
                then e[i, j] ← t
                     root[i, j] ← r
  return e and root
Time: O(n^3)
Space: O(n^2)
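The dynamic program can be sketched in Python as below. This is an illustrative version, not a reference implementation: the function name `optimal_bst` and the 1-based table layout are choices of this example, and the table w is filled inside the main loop using w[i, j] = w[i, j−1] + pj.

```python
import math
from typing import List, Tuple


def optimal_bst(p: List[float]) -> Tuple[List[List[float]], List[List[int]]]:
    """Return tables (e, root) for keys k1..kn with probabilities p[0..n-1].

    e[i][j] is the expected search cost of an optimal BST for ki..kj and
    root[i][j] is the index of its root (tables are 1-based).
    """
    n = len(p)
    # Rows 0..n+1 and columns 0..n, so that e[i][i-1] and e[n+1][n] exist.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]

    for length in range(1, n + 1):            # number of keys in the subtree
        for i in range(1, n - length + 2):    # first key
            j = i + length - 1                # last key
            w[i][j] = w[i][j - 1] + p[j - 1]
            e[i][j] = math.inf
            for r in range(i, j + 1):         # try every candidate root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root


e, root = optimal_bst([0.25, 0.2, 0.05, 0.2, 0.3])
best_cost = e[1][5]
```

For the probabilities of the running example, the computed cost agrees with the value 2.10 stated for the second example tree. Several roots can be tied at the same cost, so with floating-point probabilities the reported root may depend on rounding.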
Speeding up the Algorithm
Knuth principle: Let kr be the root of an optimal BST for the set of keys ki < ... < kj. Furthermore, let kj+1 > kj and ki−1 < ki. Then,
(i) there is an optimal BST for the set of keys ki−1, ki, ..., kj with root smaller than or equal to kr
(ii) there is an optimal BST for the set of keys ki, ki+1, ..., kj+1 with root larger than or equal to kr
Knuth principle: Example
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[Figure: the optimal BST for these keys: root k2; k1 is the left child and k5 the right child of k2; k4 is the left child of k5; k3 is the left child of k4]
• Let k0 < k1 be a key with probability p0. Then there is an optimal BST for the set (k0, …, k5) with root smaller than or equal to k2.
• Let k6 > k5 be a key with probability p6. Then there is an optimal BST for the set (k1, …, k6) with root larger than or equal to k2.
Speeding up the Algorithm
OPTIMAL-BST-Revised(p, n)
  for i ← 1 to n + 1
    do e[i, i−1] ← 0
       w[i, i−1] ← 0
  for i ← 1 to n                          // Trees with a single key.
    do w[i, i] ← p_i
       e[i, i] ← p_i
       root[i, i] ← i
  for len ← 2 to n                        // Consider all trees with len keys: O(n) work per value of len.
    do for i ← 1 to n − len + 1
      do j ← i + len − 1
         e[i, j] ← ∞
         w[i, j] ← w[i, j−1] + p_j
         for r ← root[i, j−1] to root[i+1, j]    // Optimization.
           do t ← e[i, r−1] + e[r+1, j] + w[i, j]
              if t < e[i, j]              // Determine the root of the optimal (sub)tree.
                then e[i, j] ← t
                     root[i, j] ← r
  return e and root
Time: O(n^2)
Space: O(n^2)
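A Python sketch of the restricted root search, written so that the same function can also run the unrestricted search for comparison. The function name and the `knuth` flag are assumptions of this example. Integer weights (for instance, probabilities multiplied by 100) are used on purpose: with exact arithmetic, ties are always broken towards the leftmost root, which keeps the recorded roots monotone.

```python
from typing import List, Tuple


def optimal_bst(weights: List[int], knuth: bool = True) -> Tuple[int, List[List[int]]]:
    """Return (optimal cost, root table) for keys with the given integer weights.

    With knuth=True the root of ki..kj is searched only in the interval
    [root[i][j-1], root[i+1][j]]; with knuth=False all of i..j is tried.
    """
    n = len(weights)
    e = [[0] * (n + 1) for _ in range(n + 2)]
    w = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]

    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + weights[j - 1]
            if knuth and length > 1:
                lo, hi = root[i][j - 1], root[i + 1][j]
            else:
                lo, hi = i, j             # single key, or unrestricted search
            best, best_r = None, lo
            for r in range(lo, hi + 1):
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if best is None or t < best:   # strict <: leftmost root wins ties
                    best, best_r = t, r
            e[i][j], root[i][j] = best, best_r
    return e[1][n], root


# The running example with probabilities scaled by 100.
cost, root = optimal_bst([25, 20, 5, 20, 30])
```

Dividing the returned cost by 100 gives the expected search cost in the original units.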
Speeding up the Algorithm
• To calculate an optimal BST for keys ki, ..., kj it is enough to search for the root in the interval [root[i, j−1], root[i+1, j]]
• Therefore, the cost to find the root of the optimal BST for the set of keys ki, ..., kj is proportional to root[i+1, j] − root[i, j−1] + 1
• For a fixed len, the sum telescopes, so the cost is
$$\sum_{i=1}^{n-len+1} \bigl( root[i+1,\, i+len-1] - root[i,\, i+len-2] + 1 \bigr) \le 2n$$
• Adding over all possible values of len, we obtain O(n^2)
Lower Bound on the expected search cost
• Let l1, ..., ln be the levels of the leaves of a ternary tree. Then, the following inequality holds
$$\sum_{i=1}^{n} 3^{-l_i} \le 1$$
Proof: Induction on the size of the tree
Lower Bound on the expected search cost
• Let h1, ..., hn be the levels of the nodes of a BST. Then, the following inequality holds
$$\sum_{i=1}^{n} 3^{-(h_i + 1)} \le 1$$
Proof:
• Transform the BST into a ternary tree so that the depth of each node increases by at most one unit
• Use the previous inequality
Lower Bound on the expected search cost
• Let T be an optimal BST for the set of keys k1, ..., kn with probabilities p1, ..., pn. Then,
$$\mathrm{ExpectedCost}(T) \ge -1 + \sum_{i=1}^{n} -p_i \log_3 p_i$$
Proof:
ExpectedCost(T) ≥ z*, where
$$z^* = \min \sum_{i=1}^{n} p_i\, l_i \quad \text{subject to} \quad \sum_{i=1}^{n} 3^{-(l_i + 1)} \le 1$$
Using analytical methods one can prove that
$$z^* = -1 + \sum_{i=1}^{n} -p_i \log_3 p_i$$
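As a numerical illustration of the bound, the following Python sketch evaluates −1 + Σ −pi log3 pi for the probabilities of the running example. The function name `entropy_lower_bound` is an assumption of this example.

```python
import math
from typing import Sequence


def entropy_lower_bound(probs: Sequence[float]) -> float:
    """Evaluate -1 + sum(-p * log3(p)) over the positive probabilities."""
    return -1.0 + sum(-p * math.log(p, 3) for p in probs if p > 0)


bound = entropy_lower_bound([0.25, 0.2, 0.05, 0.2, 0.3])
```

For these probabilities the bound is well below the optimal expected search cost of 2.10 found earlier, as the inequality requires; it is a lower bound, not an estimate of the optimum.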