CSE 3101: Introduction to the Design and Analysis of Algorithms
Suprakash Datta
datta[at]cse.yorku.ca
4/8/2015

Quick Sort
• Characteristics
  – sorts "almost" in place, i.e., does not require an additional array, like insertion sort
  – divide-and-conquer, like merge sort
  – very practical: average-case running time $O(n \log n)$ (with small constant factors), but worst case $O(n^2)$ [CAVEAT: this is true for the CLRS version]

Quick Sort – the main idea
• To understand quicksort, let's look at a high-level description of the algorithm
• A divide-and-conquer algorithm
  – Divide: partition the array into 2 subarrays such that elements in the lower part <= elements in the higher part
  – Conquer: recursively sort the 2 subarrays
  – Combine: trivial, since sorting is done in place

Partitioning
• Linear-time partitioning procedure

Partition(A,p,r)
    x ← A[r]
    i ← p-1
    j ← r+1
    while TRUE
        repeat j ← j-1
            until A[j] ≤ x
        repeat i ← i+1
            until A[i] ≥ x
        if i<j
            then exchange A[i] ↔ A[j]
            else return j

Example with x = 10 (i and j are the positions where the two scans stop):
    17 12  6 19 23  8  5 10    i=1, j=8: exchange 17 ↔ 10
    10 12  6 19 23  8  5 17    i=2, j=7: exchange 12 ↔ 5
    10  5  6 19 23  8 12 17    i=4, j=6: exchange 19 ↔ 8
    10  5  6  8 23 19 12 17    i=5, j=4: i ≥ j, return j=4

[CAVEAT: with x ← A[r], Partition returns j = r whenever A[r] is the unique largest element of A[p..r], and the call Quicksort(A,p,q) below then recurses on the same range forever. The CLRS Hoare partition uses x ← A[p], which avoids this.]

Quick Sort Algorithm
• Initial call: Quicksort(A, 1, length[A])

Quicksort(A,p,r)
    if p<r
        then q ← Partition(A,p,r)
            Quicksort(A,p,q)
            Quicksort(A,q+1,r)

Analysis of Quicksort
• Assume that all input elements are distinct
• The running time depends on the distribution of splits

Best Case
• If we are lucky, Partition splits the array evenly
  $T(n) = 2T(n/2) + \Theta(n)$, which solves to $\Theta(n \log n)$

Using the median as a pivot
The recurrence in the previous slide works out, BUT…
Q: Can we find the median in linear time?
A: Yes! Chapter 9 of the text
Note: Most implementations do not use the median as pivot.

Worst Case
• What is the worst case?
• One side of the partition has only one element
  $T(n) = T(1) + T(n-1) + \Theta(n) = T(n-1) + \Theta(n) = \sum_{k=1}^{n} \Theta(k) = \Theta\left(\sum_{k=1}^{n} k\right) = \Theta(n^2)$

Worst Case (2)
[Figure not recovered]

Worst Case (3)
• When does the worst case appear?
  – input is sorted
  – input is reverse sorted
• Same recurrence as for the worst case of insertion sort
• However, sorted input yields the best case for insertion sort!

Analysis of Quicksort
• Suppose the split is 1/10 : 9/10
  $T(n) = T(n/10) + T(9n/10) + \Theta(n) = \Theta(n \log n)$!

An Average Case Scenario
• Suppose we alternate lucky and unlucky cases to get an average behavior
  $L(n) = 2U(n/2) + \Theta(n)$    (lucky)
  $U(n) = L(n-1) + \Theta(n)$    (unlucky)
  we consequently get
  $L(n) = 2(L(n/2 - 1) + \Theta(n/2)) + \Theta(n) = 2L(n/2 - 1) + \Theta(n) = \Theta(n \log n)$
[Figure: an unlucky split of n into 1 and n-1, followed by a lucky split of n-1 into (n-1)/2 and (n-1)/2, at combined cost $\Theta(n)$; next to it, a single split of n into (n-1)/2+1 and (n-1)/2, also at cost $\Theta(n)$.]

An Average Case Scenario (2)
• How can we make sure that we are usually lucky?
  – Partition around the "middle" (n/2th) element?
  – Partition around a random element (works well in practice)
• Randomized algorithm
  – running time is independent of the input ordering
  – no specific input triggers worst-case behavior
  – the worst case is determined only by the output of the random-number generator

Randomized Quicksort
• Assume all elements are distinct
• Partition around a random element
• Randomization is often used to design algorithms with good average-case complexity (the worst-case complexity may not be as good)

The optimality question
Q: Can we do better than worst-case $\Theta(n \log n)$ time for sorting?
A: In general no, but in some special cases yes!
Q: Why not?
A: The well-known $\Omega(n \log n)$ lower bound.

On Lower Bounds
• "The best any algorithm can do" for a problem
• The proof must be algorithm independent
• In general, lower bound proofs are difficult
• Must make some assumptions
  – the sorting lower bound assumes that sorting is comparison based. This will be covered later today, or by Prof. Ruppert next week
If we relax the "comparison-based" assumption, we can sort in linear time!

Next: Linear sorting
Q: How can we beat the $\Omega(n \log n)$ lower bound for sorting?
A: By making extra assumptions about the input

Non-Comparison Sort – Bucket Sort
• Assumption: uniform distribution
  – Input numbers are uniformly distributed in [0,1).
  – Suppose input size is n.
• Idea:
  – Divide [0,1) into n equal-sized subintervals (buckets).
  – Distribute the n numbers into the buckets.
  – Expect that each bucket contains few numbers.
  – Sort the numbers in each bucket (insertion sort as default).
  – Then go through the buckets in order, listing elements.
Can be shown to run in linear time on average

Example of BUCKET-SORT
[Figure not recovered]

Generalizing Bucket Sort
Q: What if the input numbers are NOT uniformly distributed in [0,1)?
A: Can be generalized in different ways, e.g., if the distribution is known, we can design (unequal-sized) bins that will hold roughly equal numbers of elements on average.

Non-Comparison Sort – Counting Sort
• Assumption: the n input numbers are integers in the range [0,k], k=O(n).
• Idea:
  – Determine the number of elements less than x, for each input x.
  – Place x directly in its position.

Counting Sort - pseudocode
Counting-Sort(A,B,k)
    for i ← 0 to k
        do C[i] ← 0
    for j ← 1 to length[A]
        do C[A[j]] ← C[A[j]]+1
    // C[i] contains number of elements equal to i.
    for i ← 1 to k
        do C[i] ← C[i]+C[i-1]
    // C[i] contains number of elements ≤ i.
    for j ← length[A] downto 1
        do B[C[A[j]]] ← A[j]
            C[A[j]] ← C[A[j]]-1

Counting Sort - example
[Figure not recovered]

Counting Sort - analysis
    for i ← 0 to k                      $\Theta(k)$
        do C[i] ← 0                     $\Theta(1)$   ($\Theta(1) \cdot \Theta(k) = \Theta(k)$)
    for j ← 1 to length[A]              $\Theta(n)$
        do C[A[j]] ← C[A[j]]+1          $\Theta(1)$   ($\Theta(1) \cdot \Theta(n) = \Theta(n)$)
    // C[i] contains number of elements equal to i.     (no cost)
    for i ← 1 to k                      $\Theta(k)$
        do C[i] ← C[i]+C[i-1]           $\Theta(1)$   ($\Theta(1) \cdot \Theta(k) = \Theta(k)$)
    // C[i] contains number of elements ≤ i.            (no cost)
    for j ← length[A] downto 1          $\Theta(n)$
        do B[C[A[j]]] ← A[j]            $\Theta(1)$   ($\Theta(1) \cdot \Theta(n) = \Theta(n)$)
            C[A[j]] ← C[A[j]]-1         $\Theta(1)$   ($\Theta(1) \cdot \Theta(n) = \Theta(n)$)
Total cost is $\Theta(k+n)$; if k=O(n), then the total cost is $\Theta(n)$.
So it beats the $\Omega(n \log n)$ lower bound, which applies only to comparison-based sorts!

Stable sort
• Preserves the order of elements with the same key.
• Counting sort is stable.
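As a rough illustration of the Counting-Sort pseudocode and of what stability means, here is a minimal Python sketch. It is not taken from the slides: the function name `counting_sort`, the optional `key` parameter, and the sample `records` are assumptions made for illustration. It uses 0-based indexing, so the counter is decremented before the element is placed rather than after.

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort; key(item) must be an integer in [0, k].

    Illustrative sketch only (names and the key parameter are assumptions).
    """
    count = [0] * (k + 1)
    for item in a:
        count[key(item)] += 1
    # count[i] now holds the number of elements with key equal to i.
    for i in range(1, k + 1):
        count[i] += count[i - 1]
    # count[i] now holds the number of elements with key <= i.
    out = [None] * len(a)
    # Scan right to left so that equal keys keep their input order.
    for item in reversed(a):
        count[key(item)] -= 1          # convert the count to a 0-based index
        out[count[key(item)]] = item
    return out


# Hypothetical records: (key, label). Equal keys must keep their input order.
records = [(2, "a"), (0, "b"), (2, "c"), (1, "d"), (0, "e")]
result = counting_sort(records, 2, key=lambda r: r[0])
print(result)
# [(0, 'b'), (0, 'e'), (1, 'd'), (2, 'a'), (2, 'c')]
```

The labels show the stability: "b" stays ahead of "e" and "a" stays ahead of "c", exactly as in the input.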
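Crucial question: can counting sort be used to sort large integers efficiently?

Radix sort
Radix-Sort(A,d)
    for i ← 1 to d
        do use a stable sort to sort A on digit i
(Digit 1 is the lowest-order digit.)
Analysis: Given n d-digit numbers where each digit takes on up to k values, Radix-Sort sorts these numbers correctly in $\Theta(d(n+k))$ time.

Radix sort - example
    Input:                    1019 3075 2225 2231
    After digit 1 (ones):     2231 3075 2225 1019
    After digit 2 (tens):     1019 2225 2231 3075

    Digit 3 (hundreds) sorted stably:
    After digit 3:            1019 3075 2225 2231
    After digit 4:            1019 2225 2231 3075    Sorted!

    Digit 3 (hundreds) sorted with equal digits reordered (2231 moved ahead of 2225):
    After digit 3:            1019 3075 2231 2225
    After digit 4:            1019 2231 2225 3075    Not sorted!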
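The Radix-Sort pseudocode can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' implementation: the names `stable_sort_on_digit` and `radix_sort` and the `base` parameter are hypothetical, inputs are assumed to be non-negative integers, and each pass uses a simple stable bucket distribution in place of counting sort (appending to per-digit lists preserves the order of equal digits).

```python
def stable_sort_on_digit(a, i, base=10):
    """Stably sort non-negative integers on digit i (i = 0 is the lowest-order digit)."""
    buckets = [[] for _ in range(base)]
    for x in a:
        # Appending keeps elements with the same digit in their current order.
        buckets[(x // base**i) % base].append(x)
    return [x for bucket in buckets for x in bucket]


def radix_sort(a, d, base=10):
    """Sort non-negative integers of at most d digits, lowest-order digit first."""
    for i in range(d):
        a = stable_sort_on_digit(a, i, base)
    return a


print(stable_sort_on_digit([1019, 3075, 2225, 2231], 0))
# [2231, 3075, 2225, 1019]
print(radix_sort([1019, 3075, 2225, 2231], 4))
# [1019, 2225, 2231, 3075]
```

Each pass costs time proportional to n + base, so d passes give the $\Theta(d(n+k))$ bound with k = base.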