Search Algorithms

Search Algorithms Analysis of Algorithms Prepared by John Reif, Ph.D. Search Algorithms a) Binary Search: average case b) Interpolation Search c) Unbounded Search (Advanced material) Readings • Reading Selection: • CLR, Chapter 12 Binary Search Trees • (in sorted Table of) keys k0, …, kn-1 • Binary Search Tree property: at each node x key (x) > key (y)  y nodes on left subtree of x key (x) < key (z)  z nodes on right subtree of x Binary Search Trees (cont’d) Binary Search Trees (cont’d) • Assume 1) Keys inserted into tree in random order 2) Search with all keys equally likely length = # of edges n + 1 = number of leaves • Internal path length I = sum of lengths of all internal paths of length > 1 (from root to nonleaves) Binary Search Trees (cont’d) • External path length E = sum of lengths of all external paths (from root to leaves) = I + 2n Successful Search • Expected # comparisons Cn = 1 + (I/n)   =  (C'i  1)  / n  i0  n-1 Unsuccessful Search • Expected # comparisons C 'n = E/(n+1) = (I+2n)/(n+1) = (n Cn +n)/(n+1)  n-1  =   (C'i  2)  / (n+1)  i=0  n 2 =  2 ln(n) i=0 (i+1) = 1.386 log n Model of Random Real Inputs over an Interval • • Input Set S of n keys each independently randomly chosen over real interval |L,U| for 0 < L < U Operations • Comparison operations •   ,   operations: • • r =largest integer below or equal to r r = smallest integer above of equal to r Sorting and Selection with Random Real Inputs • Results 1) Sorting in 0(n) expected time 2) Selection in 0(loglogn) expected time Bucket Sorting with Random Inputs Input set of n keys, S randomly chosen over [L,U] Algorithm BUCKEY-SORT(S): begin for i  1 to n do B[i]  empty list  n(x i -L)  for i  1 to n do add x i to B   1   (U-L)   for i  1 to n do sort (B[i]) output B[1]  B[2] B[n] end Bucket Sorting with Random Inputs (cont’d) • Theorem The expected time T of BUCKET-SORT is 0(n) Proof |B[i]| is upper bounded by a Binomial 1 variable with parameters n, p = n -j Hence c>1 i,j Prob {|B[i]| > j} < c n So T  n  c-j (jlogj) = O(n) j=0 • Generalizes to case keys have nonuniform distribution Random Search Table • Table X = (x0 < x1 < …< xn < xn+1) where x1, …, xn random reals chosen independently from real interval (x0, xn+1) • Selection Problem Input key Y Problem find index k* with Xk* = Y Note k* has Binomial distribution with parameters n,p = (Y-X0)/(Xn+1-X0) Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) Random Table X = (X0 , X1 , …, Xn , Xn+1) Algorithm pseudo interpolation search (X,Y) [0] k   pn where p = (Y-X0)/(Xn+1-X0) [1] if Y = Xk then return k Algorithm PSEUDO INTERPOLATIONSEARCH (X,Y) (cont’d) [2] If Y > Xk then for k' = k, k+ n , k+2 n ,... if Y < X   then exit with k'+ n output pseudo interpolation search (X’,Y) where X' = (Xk' ,..., Xk'+ n ) Algorithm PSEUDO INTERPOLATIONSEARCH (X,Y) (cont’d) [3] Else if Y < Xk then for k' = k, k- n , k-2 n ,... if Y > X   then exit with k'- n output pseudo interpolation search (X’,Y) where X" = (Xk'- n ,..., Xk' ) Probability Distribution of Pseudo Interpolation Search Probabilistic Analysis of Pseudo Interpolation Search • k* is Binomial with mean pn variance 2 = p(1-p)n *   k  pn • So   approximates normal as n   k *   pn   Prob   Z    (Z)/Z  • Hence   where  (Z) = e -Z2 2 2 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • So Prob (  i probes used in given call)   < Prob(|k  pn | > (i - 2) n )  (Zi )/Zi * • where Zi = (i - 2) n  (i - 2)   2(i - 2) p(i-p) 1 since p(1-p)  4 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • Lemma C  2.03 where C = expected number of probes in given call proof C=  i Prob(i probes used) i>1 =  Prob(  i probes used) i>1  2    (Zi )/Zi  2.03 i 3 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • Theorem Pseudo Interpolation Search has expected time T(n)  C loglogn proof T(n)  C + T( n )  C loglogn Algorithm INTERPOLATION-SEARCH (X,Y) 1) Initialize k   np comment k =  E(k*) 2) If Xk = Y then return k 3) If Xk < Y then output INTERPOLATION-SEARCH (X’,Y) where X’ = (xk, …, xn+1 ) 4) Else Xk > Y and output INTERPOLATION-SEARCH (X”,Y) where X” = (x0, …, xk ) Probability Distribution of INTERPOLATION-SEARCH (X,Y) Advanced Material: Probabilistic Analysis of Interpolation Search • Lemma 1 Prob(|k  pn |  0( nlogn))  α n where α is constant *   • Proof Since k* is Binomial with parameters p,n -Z2 2 2 e 1 * Prob(|k  pn|  Zσ)   α Z 2 n for  2 = p(1-p)n and Z = 0( logn ) Probabilistic Analysis of Interpolation Search (cont’d) • Theorem • The expected number of comparisons of Interpolation Search is T(n)  loglogn + c1 (logloglogn) 2 Probabilistic Analysis of Interpolation Search (cont’d) • Proof 1 n  T(n)  1+ (1- α ) T (O( n log n ))  α  n   n  1  loglog n log n + c1logloglog n log n ) 2  o(1) 1   1  log  logn  + c1 (logloglogn) 2 2   loglogn + c1 (logloglogn) 2 since log2=1 Advance Material: Unbounded Search • Input table X[1], X[2], … where for j = 1, 2, … 0 j < n X[j] =  1 j  n • Unbounded Search Problem • Find n such that X[n-1] = 0 and X[n]=1 • Cost for algorithm A: • CA(n)=m if algorithm A uses m evaluations to determine that n is the solution to the unbounded search problem Applications of Unbounded Search 1) Table Look-up in an ordered, infinite table 2) Binary encoding of integers if Sn represents integer n, then Sn is not a prefix of any Sj for n  j {S1, S2, …} called a prefix set • Idea: use Sn (b1, b2, …, bCA(n)) where bm = 1 if the m’th evaluation of X is 1 in algorithm A for unbounded search Unary Unbounded Search Algorithm • Algorithm B0 • Try X[1], X[2], …, until X[n] = 1 • Cost CB0(n) = n Binary Unbounded Search Algorithm • Algorithm B1 i 1st stage try X[2 -1] for i=1,2,...,m m until X[2 -1]=1 (cost m =  logn   1 where 2m-1  n  2m  1) 2nd stage binary search over 2 cost log(2 m-1 m-1 ) = m-1=  logn  Total Cost CB1 (n) = 2  logn   1 elements Binary Unbounded Search Algorithm (cont’d) Double Unbounded Search Search • Algorithm B2 (21 1) (2m1 1) 1st stage try X[2 -1], ..., X[2 where m1 =  logn   1 ]=1 (cost is C B1 (m1 ) = 2  logm1  1) 2nd stage same as 2nd stage of B1 after m was found. Cost is C B0 (n) = m-1=  logn  Total Cost CB2 (n) = CB1 (m1 ) + C B0 (n) = 2(  log (  logn+1)   1 )   logn  Double Unbounded Search Algorithm (cont’d) Ultimate Unbounded Search Algorithm Search Algorithms Analysis of Algorithms Prepared by John Reif, Ph.D.

Search Algorithms

Related documents

Products

Support

Search Algorithms

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib