Introduction to Algorithms
Jiafen Liu, Sept. 2013

Today's Tasks
• Order Statistics
  – Randomized divide and conquer
  – Analysis of expected time
  – Worst-case linear-time order statistics
  – Analysis

Order statistics
• Given n elements in an array, select the ith smallest of the n elements (the element with rank i).
• This has various applications:
  – i = 1: find the minimum element.
  – i = n: find the maximum element.
  – Find the median: i = (n+1)/2 when n is odd, or i = n/2 and i = n/2 + 1 when n is even.
  – This is useful in statistics.

How to find the ith element?
• Naïve algorithm:
  – Sort array A, then return the element A[i].
  – If we use merge sort or randomized quicksort, the worst-case running time is Θ(n lg n) + Θ(1) = Θ(n lg n).
• Can we do better than that?
  – Selection is related to sorting, but it is a different (and easier) problem.
  – Our goal is expected time Θ(n).

Randomized divide-and-conquer algorithm
• RAND-SELECT(A, p, q, i): pick a random pivot, partition A[p..q] around it with RAND-PARTITION(A, p, q), and recurse into the single side that contains the element of rank i.
• Why does RAND-PARTITION(A, p, q) seem familiar? Its core is the partitioning subroutine of quicksort:

    PARTITION(A, p, q)        // A[p..q], pivot = A[p]
        x ← A[p]
        i ← p
        for j ← p+1 to q do
            if A[j] ≤ x then
                i ← i + 1
                exchange A[i] ↔ A[j]
        exchange A[p] ↔ A[i]
        return i

Example
• Select the i = 7th smallest element of the sample array on the slide: partition around the pivot, then recurse into the side that contains the 7th smallest element.

Algorithm Analysis
• (All our analyses today assume that all elements are distinct.)
• Like quicksort, our algorithm depends on how well PARTITION splits the array.
• Recall: what is the lucky case of PARTITION?
  – A split at the median? A 1/10 : 9/10 split?
  – Every split is lucky except 0 : n−1 or n−1 : 0.

Lucky or Unlucky?
• Lucky:
  – Take the 1/10 : 9/10 partition as an example: T(n) = T(9n/10) + Θ(n).
  – How to solve it? By the Master Method (case 3): T(n) = Θ(n).
• Unlucky:
  – The 0 : n−1 or n−1 : 0 partition: T(n) = T(n−1) + Θ(n).
  – That is an arithmetic series, so T(n) = Θ(n²).
  – Even worse than sorting first and then selecting!

Analysis of Expected Time
• We have already dealt with the expected running time of quicksort in Lecture 4.
  – Recall how we handled it: the partition has n possible outcomes, and we express them all in one expression using indicator random variables.
• Let T(n) = the running time of RAND-SELECT on an input of size n, assuming the random numbers are independent.
• To obtain an upper bound, assume that the ith element always falls in the larger side of the partition.
• For k = 0, 1, …, n−1, define the indicator random variable
      X_k = 1 if the partition generates a k : n−k−1 split, and X_k = 0 otherwise.
• Then
      T(n) ≤ Σ_{k=0}^{n−1} X_k · ( T(max(k, n−k−1)) + Θ(n) ).

Computing the Expected Time
• Taking expectations:
      E[T(n)] = Σ_{k=0}^{n−1} E[X_k] · E[T(max(k, n−k−1))] + Θ(n)      — Independence!
              = (1/n) Σ_{k=0}^{n−1} E[T(max(k, n−k−1))] + Θ(n)
              ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + Θ(n).
• How to solve this? The substitution method:
  – We guess the answer is Θ(n) and prove E[T(n)] ≤ cn for some constant c > 0.
  – Try to do the rest of this by yourself; substituting the guess into the sum gives
        E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} ck + Θ(n) ≤ (3/4)cn + Θ(n) = cn − (cn/4 − Θ(n)) ≤ cn,
    if c is chosen large enough so that cn/4 dominates the Θ(n).
• Is that the end of the proof? Not quite — we still need the base case: for n below some constant, T(n) = Θ(1) ≤ cn provided c is large enough.

Summary of randomized order-statistic selection
• Works fast: linear expected time.
• But the worst case is bad: Θ(n²).
• Still an excellent algorithm in practice.
• Question: is there an algorithm that runs in linear time even in the worst case?
• Picking the pivot randomly is simple, but it gives no good worst-case guarantee.
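As a concrete companion to the slides above, here is a minimal Python sketch of randomized selection built on the PARTITION subroutine. The function names, the use of random.randint to pick the pivot, and the 1-indexed rank convention are illustrative choices, not code from the lecture.

    import random

    def partition(A, p, q):
        """Partition A[p..q] around the pivot A[p]; return the pivot's final index."""
        x = A[p]                      # pivot value
        i = p
        for j in range(p + 1, q + 1):
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[p], A[i] = A[i], A[p]
        return i

    def rand_select(A, p, q, i):
        """Return the i-th smallest element (1-indexed rank) of A[p..q]."""
        if p == q:
            return A[p]
        r = random.randint(p, q)      # random pivot: swap it to the front, then partition
        A[p], A[r] = A[r], A[p]
        k = partition(A, p, q)
        rank = k - p + 1              # rank of the pivot within A[p..q]
        if i == rank:
            return A[k]
        elif i < rank:
            return rand_select(A, p, k - 1, i)            # recurse into the lower part
        else:
            return rand_select(A, k + 1, q, i - rank)     # recurse into the upper part

    # Example: the 7th smallest of a 10-element array
    A = [6, 10, 13, 5, 8, 3, 2, 11, 9, 7]
    print(rand_select(A, 0, len(A) - 1, 7))   # prints 9

Unlike quicksort, only one recursive call is made per level, which is why the expected time drops from Θ(n lg n) to Θ(n).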
Improvement of randomized selection
• Due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
• IDEA: Generate a really good pivot recursively.
• How can we make this extra recursion cheap enough that the total complexity stays Θ(n)?

Worst-case linear-time order statistics
• SELECT(i, n):
  – Divide the n elements into ⌊n/5⌋ groups of 5 elements. Find the median of each 5-element group by rote (a constant amount of work per group).
  – Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
  – Partition around the pivot x. Let k = rank(x).
  – If i = k then return x.
    Else if i < k then recursively SELECT the ith smallest element in the lower part.
    Else recursively SELECT the (i−k)th smallest element in the upper part.

Choosing the pivot
• Divide the n elements into ⌊n/5⌋ groups of 5 elements.
• Reorganize the five elements within each group (drawn as a column in the figure) so that
  – the middle one is the median,
  – the two above it are less than the median,
  – the two below it are bigger.
• How much time does this take? Θ(n).

Choosing the pivot
• Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
• In the figure, the groups are then rearranged in order of their medians.
• If the whole SELECT(i, n) algorithm takes T(n), this step takes T(⌊n/5⌋) = T(n/5).
• Now, what do we know about all these elements relative to x?

Analysis
• Rest of the algorithm:
  – Partition around the pivot x. Let k = rank(x).
  – If i = k then return x; else if i < k then recursively SELECT the ith smallest element in the lower part; else recursively SELECT the (i−k)th smallest element in the upper part.
• The total cost we are aiming for is Θ(n), and we already spend T(n/5) on the recursive call that finds the pivot, so the cost of the remaining recursive call must be strictly less than T(4n/5). Why? Otherwise the two recursive fractions would sum to at least 1 and the recurrence could not solve to Θ(n).

Analysis
• Look at the figure carefully: the directed paths between elements give us more information than we just had.
  – All the elements in the marked block are ≤ x. How many elements are there?
• At least half of the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Each such group contributes at least 3 elements ≤ x (its median and the two elements below it), so at least 3⌊n/10⌋ elements are ≤ x.
• Symmetrically, all the elements in the other block are ≥ x: at least ⌊n/10⌋ group medians are ≥ x, so at least 3⌊n/10⌋ elements are ≥ x.

Analysis
• Then, what is the cost of the 3-case recursion at the end?
  – Each side of the partition contains at least 3⌊n/10⌋ elements,
  – so the side we recurse into contains at most n − 3⌊n/10⌋ ≈ 7n/10 elements.
• For n ≥ 50 we have 3⌊n/10⌋ ≥ n/4, so for n ≥ 50 the recursion is on at most 3n/4 elements.
  – T(3n/4) is even better than the T(4n/5) budget we allowed above.
• For n < 50, we simply have T(n) = Θ(1).

Total Running Time
• Putting the pieces together, for n ≥ 50:
      T(n) ≤ T(n/5) + T(3n/4) + Θ(n)
  – T(n/5) for recursively selecting the pivot among the group medians,
  – T(3n/4) for the final recursive call,
  – Θ(n) for forming the groups, finding the group medians, and partitioning.

Solving the recurrence
• How? The substitution method.
• Guess T(n) ≤ cn for some constant c > 0:
      T(n) ≤ c(n/5) + c(3n/4) + Θ(n)
           = (19/20)cn + Θ(n)
           = cn − (cn/20 − Θ(n))        // the desired cn minus a residual
           ≤ cn (as desired),
  if c is chosen large enough that the residual cn/20 handles the Θ(n).

Conclusions
• Since the total subproblem size shrinks by a constant fraction (to 19/20) at each level of recursion, the work across the levels forms a geometric series, and the total work is linear.
• In practice, this algorithm runs slowly, because the constant in front of n is large.
• The randomized algorithm is far more practical.

Further Thought
• Why did we use groups of five?
• Why not groups of three?
• How about groups of 7?
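To make the group-of-5 construction concrete, here is a minimal Python sketch of the worst-case linear-time SELECT described above. It assumes distinct elements (as in the lecture's analyses), uses Python's built-in sort on each constant-size group instead of finding the 5-element medians by rote, and the cut-off of 50 for the base case follows the slides; the helper name select and the list-comprehension partition are illustrative, not the lecture's own code.

    def select(A, i):
        """Return the i-th smallest element (1-indexed) of list A in worst-case linear time."""
        if len(A) < 50:                          # base case: constant-size input, Theta(1)
            return sorted(A)[i - 1]
        # 1. Divide into groups of 5 and take the median of each group.
        groups = [A[j:j + 5] for j in range(0, len(A), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        # 2. Recursively select the median x of the group medians as the pivot.
        x = select(medians, (len(medians) + 1) // 2)
        # 3. Partition around x (elements are assumed distinct).
        lower = [a for a in A if a < x]
        upper = [a for a in A if a > x]
        k = len(lower) + 1                       # rank of the pivot x
        if i == k:
            return x
        elif i < k:
            return select(lower, i)
        else:
            return select(upper, i - k)

    # Example: the median of 1..101
    print(select(list(range(1, 102)), 51))       # prints 51

On the closing questions: with groups of 3 the two recursive fractions (about 1/3 and 2/3) sum to 1, so the same argument no longer yields a linear bound, while groups of 7 (fractions 1/7 and roughly 5/7) work just as well as groups of 5.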