Search Algorithms

Analysis of Algorithms
Prepared by
John Reif, Ph.D.
Search Algorithms
a) Binary Search: average case
b) Interpolation Search
c) Unbounded Search (Advanced material)
Readings
• Reading Selection:
• CLR, Chapter 12
Binary Search Trees
• (in sorted Table of) keys $k_0, \dots, k_{n-1}$
• Binary Search Tree property: at each node $x$
  $\mathrm{key}(x) > \mathrm{key}(y)$ for all nodes $y$ in the left subtree of $x$
  $\mathrm{key}(x) < \mathrm{key}(z)$ for all nodes $z$ in the right subtree of $x$
Binary Search Trees (cont’d)
• Assume
1) Keys inserted into tree in random order
2) Search with all keys equally likely
• Path length = # of edges on the path
• n + 1 = number of leaves (for a tree holding n keys)
• Internal path length $I$
  = sum of the lengths of all internal paths (from root to nonleaves)
Binary Search Trees (cont’d)
• External path length $E$
  = sum of the lengths of all external paths (from root to leaves)
  $= I + 2n$
Successful Search
• Expected # comparisons
$C_n = 1 + \frac{I}{n} = \frac{1}{n} \sum_{i=0}^{n-1} (C'_i + 1)$
Unsuccessful Search
• Expected # comparisons
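$C'_n = \frac{E}{n+1} = \frac{I + 2n}{n+1} = \frac{n C_n + n}{n+1} = \frac{1}{n+1} \sum_{i=0}^{n-1} (C'_i + 2)$
• Solving this recurrence with $C'_0 = 0$ gives
$C'_n = \sum_{i=1}^{n} \frac{2}{i+1} \approx 2 \ln n \approx 1.386 \log_2 n$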
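The recurrence for the unsuccessful-search cost can be checked with exact arithmetic. The sketch below is not part of the slides. It assumes $C'_0 = 0$ (an empty tree needs no comparisons), and the names `unsuccessful_costs` and `closed_form` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def unsuccessful_costs(n_max):
    """Return [C'_0, ..., C'_n_max] from (n+1) * C'_n = sum_{i<n} (C'_i + 2)."""
    costs = [Fraction(0)]      # assumption: C'_0 = 0 for the empty tree
    running = Fraction(0)      # running value of sum_{i<n} (C'_i + 2)
    for n in range(1, n_max + 1):
        running += costs[n - 1] + 2
        costs.append(running / (n + 1))
    return costs


def closed_form(n):
    """Harmonic-type sum sum_{i=1}^{n} 2 / (i + 1), computed exactly."""
    return sum((Fraction(2, i + 1) for i in range(1, n + 1)), Fraction(0))


costs = unsuccessful_costs(200)
agree = all(costs[n] == closed_form(n) for n in range(201))
```

Here `agree` compares the recurrence and the harmonic-type sum term by term for every n up to 200; `float(costs[n])` can be compared with `2 * math.log(n)` to see how slowly the approximation tightens.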
Model of Random Real Inputs over an Interval
• Input
  Set S of n keys, each independently randomly chosen over the real interval [L, U] for 0 < L < U
• Operations
  • Comparison operations
  • $\lfloor \cdot \rfloor$, $\lceil \cdot \rceil$ operations:
    $\lfloor r \rfloor$ = largest integer below or equal to r
    $\lceil r \rceil$ = smallest integer above or equal to r
Sorting and Selection with Random Real Inputs
• Results
1) Sorting in O(n) expected time
2) Selection in O(log log n) expected time
Bucket Sorting with Random Inputs
Input: set S of n keys $x_1, \dots, x_n$, randomly chosen over [L, U]
Algorithm
BUCKET-SORT(S):
begin
  for i $\leftarrow$ 1 to n do B[i] $\leftarrow$ empty list
  for i $\leftarrow$ 1 to n do add $x_i$ to $B[\lfloor n(x_i - L)/(U - L) \rfloor + 1]$
  for i $\leftarrow$ 1 to n do sort(B[i])
  output $B[1] \cdot B[2] \cdots B[n]$ (the buckets concatenated in order)
end
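A minimal Python sketch of the bucket sort above, under a few assumptions that are not in the slides: buckets are numbered from 0 instead of 1, a key equal to the upper endpoint is clamped into the last bucket (the slide's formula would send it to bucket n + 1), and each bucket is sorted with the built-in list sort. The name `bucket_sort` and its parameters are illustrative.

```python
def bucket_sort(keys, low, high):
    """Sort keys drawn from [low, high] (low < high) using n buckets."""
    n = len(keys)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    width = high - low
    for x in keys:
        # floor(n * (x - low) / (high - low)); int() floors since the value is >= 0
        idx = int(n * (x - low) / width)
        # assumption: clamp x == high (index n) into the last bucket
        idx = min(idx, n - 1)
        buckets[idx].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()           # each bucket is expected to hold O(1) keys
        result.extend(bucket)   # concatenate buckets in increasing order
    return result


sample = [1.0 + ((i * 37) % 101) * 0.04 for i in range(101)]
sorted_sample = bucket_sort(sample, 1.0, 5.0)
```

Because the bucket index is nondecreasing in the key, concatenating the sorted buckets yields a sorted list regardless of how the keys are distributed; the distribution only affects the running time.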
Bucket Sorting with Random Inputs (cont’d)
• Theorem: The expected time T of BUCKET-SORT is O(n)
Proof
$|B[i]|$ is upper bounded by a Binomial variable with parameters $n$, $p = \frac{1}{n}$
Hence $\exists c > 1\ \forall i, j:\ \mathrm{Prob}\{|B[i]| > j\} < c^{-j}$
So $T \le n \sum_{j=0}^{n} c^{-j} (j \log j) = O(n)$
• Generalizes to the case where keys have a nonuniform distribution
Random Search Table
• Table $X = (X_0 < X_1 < \dots < X_n < X_{n+1})$
  where $X_1, \dots, X_n$ are random reals chosen independently from the real interval $(X_0, X_{n+1})$
• Selection Problem
  Input: key $Y$
  Problem: find index $k^*$ with $X_{k^*} = Y$
  Note: $k^*$ has Binomial distribution with parameters $n$, $p = (Y - X_0)/(X_{n+1} - X_0)$
Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y)
Random Table $X = (X_0, X_1, \dots, X_n, X_{n+1})$
[0] $k \leftarrow \lceil pn \rceil$ where $p = (Y - X_0)/(X_{n+1} - X_0)$
[1] if $Y = X_k$ then return $k$
Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)
[2] If $Y > X_k$ then
    for $k' = k,\ k + \sqrt{n},\ k + 2\sqrt{n}, \dots$
        if $Y < X_{k' + \sqrt{n}}$ then exit with output PSEUDO INTERPOLATION-SEARCH ($X'$, $Y$)
        where $X' = (X_{k'}, \dots, X_{k' + \sqrt{n}})$
Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) (cont’d)
[3] Else if $Y < X_k$ then
    for $k' = k,\ k - \sqrt{n},\ k - 2\sqrt{n}, \dots$
        if $Y > X_{k' - \sqrt{n}}$ then exit with output PSEUDO INTERPOLATION-SEARCH ($X''$, $Y$)
        where $X'' = (X_{k' - \sqrt{n}}, \dots, X_{k'})$
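Steps [0] to [3] can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: the recursion is written as a loop, the step is $\lceil \sqrt{n} \rceil$ so that indices stay integral, the first and last table entries play the role of $X_0$ and $X_{n+1}$, an equality test is added at each jump probe, and `None` is returned for an absent key (the slides assume the key is present). The function name is hypothetical.

```python
import math


def pseudo_interpolation_search(table, y):
    """Return an index k with table[k] == y, or None. table is strictly increasing."""
    if not table or y < table[0] or y > table[-1]:
        return None
    lo, hi = 0, len(table) - 1
    if table[lo] == y:
        return lo
    if table[hi] == y:
        return hi
    # Invariant from here on: table[lo] < y < table[hi].
    while hi - lo > 1:
        n = hi - lo - 1                            # keys strictly between lo and hi
        p = (y - table[lo]) / (table[hi] - table[lo])
        k = min(max(lo + math.ceil(p * n), lo + 1), hi - 1)   # step [0]
        if table[k] == y:                          # step [1]
            return k
        step = math.isqrt(n - 1) + 1               # ceil(sqrt(n)) for n >= 1
        if table[k] < y:                           # step [2]: jump right by sqrt(n)
            left = k
            while True:
                right = min(left + step, hi)
                if table[right] == y:
                    return right
                if y < table[right]:
                    break
                left = right
        else:                                      # step [3]: jump left by sqrt(n)
            right = k
            while True:
                left = max(right - step, lo)
                if table[left] == y:
                    return left
                if y > table[left]:
                    break
                right = left
        lo, hi = left, right                       # continue in the sqrt(n)-size subtable
    return None
```

Each pass of the outer loop corresponds to one call in the slides: it leaves a subtable of at most about $\sqrt{n}$ keys that still brackets the key.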
Probability Distribution of Pseudo Interpolation Search
Probabilistic Analysis of Pseudo Interpolation Search
• $k^*$ is Binomial with mean $pn$ and variance $\sigma^2 = p(1-p)n$
• So $\frac{k^* - pn}{\sigma}$ approximates a normal as $n \to \infty$
• Hence $\mathrm{Prob}\left( \frac{k^* - pn}{\sigma} \ge Z \right) \le \frac{\Psi(Z)}{Z}$
  where $\Psi(Z) = \frac{e^{-Z^2/2}}{\sqrt{2\pi}}$
Probabilistic Analysis of Pseudo Interpolation Search (cont’d)
• So $\mathrm{Prob}(\ge i \text{ probes used in given call}) < \mathrm{Prob}(|k^* - pn| > (i-2)\sqrt{n}) \le \frac{\Psi(Z_i)}{Z_i}$
• where $Z_i = \frac{(i-2)\sqrt{n}}{\sigma} = \frac{i-2}{\sqrt{p(1-p)}} \ge 2(i-2)$
  since $p(1-p) \le \frac{1}{4}$
Probabilistic Analysis of Pseudo Interpolation Search (cont’d)
• Lemma
  $C \le 2.03$ where $C$ = expected number of probes in given call
Proof
$C = \sum_{i \ge 1} i \cdot \mathrm{Prob}(i \text{ probes used}) = \sum_{i \ge 1} \mathrm{Prob}(\ge i \text{ probes used}) \le 2 + \sum_{i \ge 3} \frac{\Psi(Z_i)}{Z_i} \le 2.03$
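The constant 2.03 can be reproduced numerically. Since $\Psi(Z)/Z$ decreases in $Z$ and $Z_i \ge 2(i-2)$, substituting $Z_i = 2(i-2)$ gives an upper bound on each tail term. The short sketch below evaluates that bound; the names `psi` and `probe_bound`, and the truncation of the rapidly vanishing series at `max_i`, are choices made for this illustration.

```python
import math


def psi(z):
    """Standard normal density: exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def probe_bound(max_i=60):
    """Evaluate 2 + sum_{i=3}^{max_i} psi(Z_i) / Z_i with Z_i = 2 * (i - 2)."""
    tail = sum(psi(2 * (i - 2)) / (2 * (i - 2)) for i in range(3, max_i + 1))
    return 2.0 + tail


bound = probe_bound()
```

The sum is dominated by its first term (i = 3, Z = 2); later terms shrink like $e^{-2(i-2)^2}$, so the truncation point has no visible effect on the value.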
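Probabilistic Analysis of Pseudo Interpolation Search (cont’d)
• Theorem
  Pseudo Interpolation Search has expected time $T(n) \le C \log\log n$
Proof
$T(n) \le C + T(\sqrt{n}) \le C \log\log n$
(each call leaves a subtable of about $\sqrt{n}$ keys, and $\log\log\sqrt{n} = \log\log n - 1$ for base-2 logarithms)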
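The $\log\log n$ depth of the recurrence $T(n) \le C + T(\sqrt{n})$ can be seen by counting how many square roots it takes to bring n down to a constant. The helper below is a hypothetical illustration, with the stopping threshold 2 chosen as an assumption so that the count equals $\log_2\log_2 n$ exactly when n has the form $2^{2^k}$.

```python
import math


def sqrt_levels(n):
    """Count repeated square roots needed to reduce n to at most 2."""
    levels = 0
    x = float(n)
    while x > 2.0:
        x = math.sqrt(x)
        levels += 1
    return levels


levels_16_bits = sqrt_levels(2 ** 16)   # 65536 -> 256 -> 16 -> 4 -> 2
levels_32_bits = sqrt_levels(2 ** 32)
```

Doubling the number of bits in n adds only one level, which is why the expected cost grows as $C \log\log n$.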
Algorithm INTERPOLATION-SEARCH (X,Y)
1) Initialize $k \leftarrow \lceil np \rceil$   comment: $k = \lceil E(k^*) \rceil$
2) If $X_k = Y$ then return $k$
3) If $X_k < Y$ then
   output INTERPOLATION-SEARCH ($X'$, $Y$)
   where $X' = (X_k, \dots, X_{n+1})$
4) Else $X_k > Y$ and
   output INTERPOLATION-SEARCH ($X''$, $Y$)
   where $X'' = (X_0, \dots, X_k)$
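A minimal iterative sketch of steps 1) to 4), under assumptions not stated in the slides: the first and last entries of `table` act as $X_0$ and $X_{n+1}$, the recursion on $X'$ or $X''$ is replaced by narrowing an index range, $p$ is recomputed for each subtable as in the random-table model, and `None` is returned when the key is absent. The function name is illustrative.

```python
import math


def interpolation_search(table, y):
    """Return an index k with table[k] == y, or None. table is strictly increasing."""
    if not table or y < table[0] or y > table[-1]:
        return None
    lo, hi = 0, len(table) - 1
    if table[lo] == y:
        return lo
    if table[hi] == y:
        return hi
    # Invariant: table[lo] < y < table[hi].
    while hi - lo > 1:
        n = hi - lo - 1                                  # keys strictly inside (lo, hi)
        p = (y - table[lo]) / (table[hi] - table[lo])
        k = min(max(lo + math.ceil(n * p), lo + 1), hi - 1)   # step 1: k = ceil(np)
        if table[k] == y:                                # step 2
            return k
        if table[k] < y:                                 # step 3: keep (X_k, ..., X_{n+1})
            lo = k
        else:                                            # step 4: keep (X_0, ..., X_k)
            hi = k
    return None


multiples_of_three = list(range(0, 1000, 3))
hit = interpolation_search(multiples_of_three, 300)
miss = interpolation_search(multiples_of_three, 301)
```

On evenly spaced keys such as `multiples_of_three` the first probe lands on the key directly; the $\log\log n$ behaviour analysed in the slides concerns random tables.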
Probability Distribution of INTERPOLATION-SEARCH (X,Y)
Advanced Material: Probabilistic Analysis of Interpolation Search
• Lemma
  $\mathrm{Prob}(|k^* - pn| \ge O(\sqrt{n \log n})) \le \frac{1}{n^{\alpha}}$
  where $\alpha$ is constant
• Proof
  Since $k^*$ is Binomial with parameters $p, n$
  $\mathrm{Prob}(|k^* - pn| \ge Z\sigma) \le \frac{e^{-Z^2/2}}{Z\sqrt{2\pi}} \le \frac{1}{n^{\alpha}}$
  for $\sigma^2 = p(1-p)n$ and $Z = O(\sqrt{\log n})$
Probabilistic Analysis of Interpolation Search (cont’d)
• Theorem
• The expected number of comparisons of Interpolation Search is
  $T(n) \le \log\log n + c_1 (\log\log\log n)^2$
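Probabilistic Analysis of Interpolation Search (cont’d)
• Proof
$T(n) \le 1 + \left(1 - \frac{1}{n^{\alpha}}\right) T(O(\sqrt{n \log n})) + \frac{n}{n^{\alpha}}$
$\le 1 + \log\log\sqrt{n \log n} + c_1 (\log\log\log\sqrt{n \log n})^2 + o(1)$
$\le 1 + \log\left(\frac{1}{2} \log n\right) + c_1 (\log\log\log n)^2$
$\le \log\log n + c_1 (\log\log\log n)^2$ since $\log 2 = 1$ (base-2 logarithms, so $\log(\frac{1}{2} \log n) = \log\log n - 1$)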
Advanced Material: Unbounded Search
• Input table X[1], X[2], …
  where for j = 1, 2, …
  $X[j] = \begin{cases} 0 & j < n \\ 1 & j \ge n \end{cases}$
• Unbounded Search Problem
  • Find n such that X[n-1] = 0 and X[n] = 1
• Cost for algorithm A:
  • $C_A(n) = m$ if algorithm A uses m evaluations to determine that n is the solution to the unbounded search problem
Applications of Unbounded Search
1) Table look-up in an ordered, infinite table
2) Binary encoding of integers
   if $S_n$ represents integer $n$, then $S_n$ is not a prefix of any $S_j$ for $n \ne j$
   $\{S_1, S_2, \dots\}$ is called a prefix set
• Idea: use $S_n = (b_1, b_2, \dots, b_{C_A(n)})$
  where $b_m = 1$ if the m’th evaluation of X is 1 in algorithm A for unbounded search (and $b_m = 0$ otherwise)
Unary Unbounded Search Algorithm
• Algorithm $B_0$
• Try X[1], X[2], …, until X[n] = 1
• Cost $C_{B_0}(n) = n$
Binary Unbounded Search Algorithm
• Algorithm $B_1$
1st stage: try $X[2^i - 1]$ for $i = 1, 2, \dots, m$ until $X[2^m - 1] = 1$
  (cost $m = \lfloor \log n \rfloor + 1$ where $2^{m-1} \le n \le 2^m - 1$)
2nd stage: binary search over $2^{m-1}$ elements
  cost $\log(2^{m-1}) = m - 1 = \lfloor \log n \rfloor$
Total cost $C_{B_1}(n) = 2 \lfloor \log n \rfloor + 1$
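The two stages of $B_1$ can be sketched in Python as below. The table is modelled by a function `probe(j)` that returns a true value exactly when $j \ge n$, and every evaluation's outcome is recorded, so that the recorded bit sequence is the codeword $S_n$ from the prefix-set idea. The names `binary_unbounded_search`, `probe` and `codeword` are assumptions made for this illustration.

```python
def binary_unbounded_search(probe):
    """Algorithm B1: return (n, bits), where bits are the outcomes of all evaluations."""
    bits = []

    def ask(j):
        outcome = 1 if probe(j) else 0
        bits.append(outcome)          # one table evaluation, recorded as one bit
        return outcome

    # 1st stage: try X[2^i - 1] for i = 1, 2, ... until the answer is 1.
    m = 1
    while not ask(2 ** m - 1):
        m += 1
    # 2nd stage: binary search over the 2^(m-1) candidates 2^(m-1), ..., 2^m - 1.
    lo, hi = 2 ** (m - 1), 2 ** m - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, bits


def codeword(n):
    """Bit string S_n obtained by running B1 on the table whose step is at n."""
    _, bits = binary_unbounded_search(lambda j: j >= n)
    return "".join(str(b) for b in bits)
```

For example, `codeword(2)` records the answers for X[1], X[3] and X[2]. Because the algorithm is deterministic and stops as soon as n is identified, no codeword can be a proper prefix of another.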
Binary Unbounded Search Algorithm
(cont’d)
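Double Unbounded Search Algorithm
• Algorithm $B_2$
1st stage: find $m_1 = \lfloor \log n \rfloor + 1$ by running $B_1$ on the probes $X[2^i - 1]$,
  i.e. try $X[2^{(2^1 - 1)} - 1], X[2^{(2^2 - 1)} - 1], \dots$ until the answer is 1, then binary search on the exponent $i$
  (cost is $C_{B_1}(m_1) = 2 \lfloor \log m_1 \rfloor + 1$)
2nd stage: same as 2nd stage of $B_1$ after $m_1$ was found.
  Cost is $m_1 - 1 = \lfloor \log n \rfloor$
Total cost $C_{B_2}(n) = C_{B_1}(m_1) + \lfloor \log n \rfloor$
  $= 2 \lfloor \log(\lfloor \log n \rfloor + 1) \rfloor + 1 + \lfloor \log n \rfloor$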
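A sketch of $B_2$ under the reading that its first stage runs $B_1$ on the derived table $X_1[i] = X[2^i - 1]$, whose step is at $m_1 = \lfloor \log n \rfloor + 1$. That reading is an interpretation of the slide, and the function names `b1` and `b2` are hypothetical. As before, `probe(j)` is true exactly when $j \ge n$, and each function returns the number of evaluations it used.

```python
def b1(probe):
    """Algorithm B1: smallest j >= 1 with probe(j) true, and the evaluations used."""
    count = 0
    m = 1
    while True:
        count += 1
        if probe(2 ** m - 1):         # 1st stage: X[2^m - 1]
            break
        m += 1
    lo, hi = 2 ** (m - 1), 2 ** m - 1
    while lo < hi:                    # 2nd stage: binary search, m - 1 evaluations
        mid = (lo + hi) // 2
        count += 1
        if probe(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, count


def b2(probe):
    """Algorithm B2: find m1 with B1 on X1[i] = X[2^i - 1], then binary search."""
    m1, cost_stage1 = b1(lambda i: probe(2 ** i - 1))
    lo, hi = 2 ** (m1 - 1), 2 ** m1 - 1
    cost_stage2 = 0
    while lo < hi:
        mid = (lo + hi) // 2
        cost_stage2 += 1
        if probe(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, cost_stage1 + cost_stage2
```

With `n.bit_length()` equal to $\lfloor \log_2 n \rfloor + 1$ in Python, the returned cost can be compared against $2 \lfloor \log m_1 \rfloor + 1 + \lfloor \log n \rfloor$ for any range of n.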
Double Unbounded Search Algorithm
(cont’d)
Ultimate Unbounded Search Algorithm