Search Algorithms Analysis of Algorithms Prepared by John Reif, Ph.D. Search Algorithms a) Binary Search: average case b) Interpolation Search c) Unbounded Search (Advanced material) Readings • Reading Selection: • CLR, Chapter 12 Binary Search Trees • (in sorted Table of) keys k0, …, kn-1 • Binary Search Tree property: at each node x key (x) > key (y) y nodes on left subtree of x key (x) < key (z) z nodes on right subtree of x Binary Search Trees (cont’d) Binary Search Trees (cont’d) • Assume 1) Keys inserted into tree in random order 2) Search with all keys equally likely length = # of edges n + 1 = number of leaves • Internal path length I = sum of lengths of all internal paths of length > 1 (from root to nonleaves) Binary Search Trees (cont’d) • External path length E = sum of lengths of all external paths (from root to leaves) = I + 2n Successful Search • Expected # comparisons Cn = 1 + (I/n) = (C'i 1) / n i0 n-1 Unsuccessful Search • Expected # comparisons C 'n = E/(n+1) = (I+2n)/(n+1) = (n Cn +n)/(n+1) n-1 = (C'i 2) / (n+1) i=0 n 2 = 2 ln(n) i=0 (i+1) = 1.386 log n Model of Random Real Inputs over an Interval • • Input Set S of n keys each independently randomly chosen over real interval |L,U| for 0 < L < U Operations • Comparison operations • , operations: • • r =largest integer below or equal to r r = smallest integer above of equal to r Sorting and Selection with Random Real Inputs • Results 1) Sorting in 0(n) expected time 2) Selection in 0(loglogn) expected time Bucket Sorting with Random Inputs Input set of n keys, S randomly chosen over [L,U] Algorithm BUCKEY-SORT(S): begin for i 1 to n do B[i] empty list n(x i -L) for i 1 to n do add x i to B 1 (U-L) for i 1 to n do sort (B[i]) output B[1] B[2] B[n] end Bucket Sorting with Random Inputs (cont’d) • Theorem The expected time T of BUCKET-SORT is 0(n) Proof |B[i]| is upper bounded by a Binomial 1 variable with parameters n, p = n -j Hence c>1 i,j Prob {|B[i]| > j} < c n So T n c-j (jlogj) = O(n) j=0 • Generalizes to case keys have nonuniform distribution Random Search Table • Table X = (x0 < x1 < …< xn < xn+1) where x1, …, xn random reals chosen independently from real interval (x0, xn+1) • Selection Problem Input key Y Problem find index k* with Xk* = Y Note k* has Binomial distribution with parameters n,p = (Y-X0)/(Xn+1-X0) Algorithm PSEUDO INTERPOLATION-SEARCH (X,Y) Random Table X = (X0 , X1 , …, Xn , Xn+1) Algorithm pseudo interpolation search (X,Y) [0] k pn where p = (Y-X0)/(Xn+1-X0) [1] if Y = Xk then return k Algorithm PSEUDO INTERPOLATIONSEARCH (X,Y) (cont’d) [2] If Y > Xk then for k' = k, k+ n , k+2 n ,... if Y < X then exit with k'+ n output pseudo interpolation search (X’,Y) where X' = (Xk' ,..., Xk'+ n ) Algorithm PSEUDO INTERPOLATIONSEARCH (X,Y) (cont’d) [3] Else if Y < Xk then for k' = k, k- n , k-2 n ,... if Y > X then exit with k'- n output pseudo interpolation search (X’,Y) where X" = (Xk'- n ,..., Xk' ) Probability Distribution of Pseudo Interpolation Search Probabilistic Analysis of Pseudo Interpolation Search • k* is Binomial with mean pn variance 2 = p(1-p)n * k pn • So approximates normal as n k * pn Prob Z (Z)/Z • Hence where (Z) = e -Z2 2 2 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • So Prob ( i probes used in given call) < Prob(|k pn | > (i - 2) n ) (Zi )/Zi * • where Zi = (i - 2) n (i - 2) 2(i - 2) p(i-p) 1 since p(1-p) 4 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • Lemma C 2.03 where C = expected number of probes in given call proof C= i Prob(i probes used) i>1 = Prob( i probes used) i>1 2 (Zi )/Zi 2.03 i 3 Probabilistic Analysis of Pseudo Interpolation Search (cont’d) • Theorem Pseudo Interpolation Search has expected time T(n) C loglogn proof T(n) C + T( n ) C loglogn Algorithm INTERPOLATION-SEARCH (X,Y) 1) Initialize k np comment k = E(k*) 2) If Xk = Y then return k 3) If Xk < Y then output INTERPOLATION-SEARCH (X’,Y) where X’ = (xk, …, xn+1 ) 4) Else Xk > Y and output INTERPOLATION-SEARCH (X”,Y) where X” = (x0, …, xk ) Probability Distribution of INTERPOLATION-SEARCH (X,Y) Advanced Material: Probabilistic Analysis of Interpolation Search • Lemma 1 Prob(|k pn | 0( nlogn)) α n where α is constant * • Proof Since k* is Binomial with parameters p,n -Z2 2 2 e 1 * Prob(|k pn| Zσ) α Z 2 n for 2 = p(1-p)n and Z = 0( logn ) Probabilistic Analysis of Interpolation Search (cont’d) • Theorem • The expected number of comparisons of Interpolation Search is T(n) loglogn + c1 (logloglogn) 2 Probabilistic Analysis of Interpolation Search (cont’d) • Proof 1 n T(n) 1+ (1- α ) T (O( n log n )) α n n 1 loglog n log n + c1logloglog n log n ) 2 o(1) 1 1 log logn + c1 (logloglogn) 2 2 loglogn + c1 (logloglogn) 2 since log2=1 Advance Material: Unbounded Search • Input table X[1], X[2], … where for j = 1, 2, … 0 j < n X[j] = 1 j n • Unbounded Search Problem • Find n such that X[n-1] = 0 and X[n]=1 • Cost for algorithm A: • CA(n)=m if algorithm A uses m evaluations to determine that n is the solution to the unbounded search problem Applications of Unbounded Search 1) Table Look-up in an ordered, infinite table 2) Binary encoding of integers if Sn represents integer n, then Sn is not a prefix of any Sj for n j {S1, S2, …} called a prefix set • Idea: use Sn (b1, b2, …, bCA(n)) where bm = 1 if the m’th evaluation of X is 1 in algorithm A for unbounded search Unary Unbounded Search Algorithm • Algorithm B0 • Try X[1], X[2], …, until X[n] = 1 • Cost CB0(n) = n Binary Unbounded Search Algorithm • Algorithm B1 i 1st stage try X[2 -1] for i=1,2,...,m m until X[2 -1]=1 (cost m = logn 1 where 2m-1 n 2m 1) 2nd stage binary search over 2 cost log(2 m-1 m-1 ) = m-1= logn Total Cost CB1 (n) = 2 logn 1 elements Binary Unbounded Search Algorithm (cont’d) Double Unbounded Search Search • Algorithm B2 (21 1) (2m1 1) 1st stage try X[2 -1], ..., X[2 where m1 = logn 1 ]=1 (cost is C B1 (m1 ) = 2 logm1 1) 2nd stage same as 2nd stage of B1 after m was found. Cost is C B0 (n) = m-1= logn Total Cost CB2 (n) = CB1 (m1 ) + C B0 (n) = 2( log ( logn+1) 1 ) logn Double Unbounded Search Algorithm (cont’d) Ultimate Unbounded Search Algorithm Search Algorithms Analysis of Algorithms Prepared by John Reif, Ph.D.