Algorithm Design and Analysis (ADA) 242-535, Semester 1 2014-2015 5. Quicksort • Objective o describe the quicksort algorithm, it's partition function, and analyse its running time under different data conditions 242-535 ADA: 5. Quicksort 1 Overview 1. Quicksort 2. Partitioning Function 3. Analysis of Quicksort 4. Quicksort in Practice 242-535 ADA: 5. Quicksort 2 1. Quicksort • Proposed by Tony Hoare in 1962. • Voted one of top 10 algorithms of 20th century in science and engineering o http://www.siam.org/pdf/news/637.pdf • A divide-and-conquer algorithm. • Sorts “in place” -- rearranges elements using only the array, as in insertion sort, but unlike merge sort which uses extra storage. • Very practical (after some code tuning). 242-535 ADA: 5. Quicksort 3 Divide and conquer Quicksort an n-element array: 1. Divide: Partition the array into two subarrays around a pivot x such that elements in lower subarray ≤ x ≤ elements in upper subarray. 2. Conquer: Recursively sort the two subarrays. 3. Combine: Nothing to do. Key: implementing a linear-time partitioning function 242-535 ADA: 5. Quicksort 4 Pseudocode quicksort(int[] A, int left, int right) if (left < right) // If the array has 2 or more items pivot = partition(A, left, right) // recursively sort elements smaller than the pivot quicksort(A, left, pivot-1) // recursively sort elements bigger than the pivot quicksort(A, pivot+1, right) 242-535 ADA: 5. Quicksort 5 Quicksort Diagram pivot 242-535 ADA: 5. Quicksort 6 Fine Tuning the Code • quicksort will stop when the subarray is 0 or 1 element big. • When the subarray gets to a small size, switch over to dedicated sorting code rather than relying on recursion. • quicksort is tail-recursive, a recursive behaviour which can be optimized. 242-535 ADA: 5. Quicksort 7 Tail-Call Optimization • Tail-call optimization avoids allocating a new stack frame for a called function. o It isn't necesary because the calling function only returns the value that it gets from the called function. • The most common use of this technique is for optimizing tail-recursion o the recursive function can be rewritten to use a constant amount of stack space (instead of linear) 242-535 ADA: 5. Quicksort 8 Tail-Call Graphically • Before applying tail-call optimization: • After applying it: 242-535 ADA: 5. Quicksort 9 Pseudocode • Before: int foo(int n) { if (n == 0) return A(); else { int x = B(n); return foo(x); } } • After: int foo(int n) { if (n == 0) return A(); else { int x = B(n); goto start of foo() code with x as argument value } } 2. Partitioning Function PARTITION(A, p, q) x ← A[p] // pivot = A[p] i←p // index // A[p . . q] Running time = O(n) for n elements. for j ← p + 1 to q if A[ j] ≤ x then i←i+1 // move the i boundary exchange A[i] ↔ A[ j] // switch big and small exchange A[p] ↔ A[i] return i // return index of pivot 242-535 ADA: 5. Quicksort 11 Example of partitioning scan right until find something less than the pivot 242-535 ADA: 5. Quicksort 12 Example of partitioning 242-535 ADA: 5. Quicksort 13 Example of partitioning 242-535 ADA: 5. Quicksort 14 Example of partitioning swap 10 and 5 242-535 ADA: 5. Quicksort 15 Example of partitioning resume scan right until find something less than the pivot 242-535 ADA: 5. Quicksort 16 Example of partitioning 242-535 ADA: 5. Quicksort 17 Example of partitioning 242-535 ADA: 5. Quicksort 18 Example of partitioning swap 13 and 3 242-535 ADA: 5. Quicksort 19 Example of partitioning swap 10 and 2 242-535 ADA: 5. Quicksort 20 Example of partitioning 242-535 ADA: 5. Quicksort 21 Example of partitioning j runs to the end 242-535 ADA: 5. Quicksort 22 Example of partitioning swap pivot and 2 242-535 ADA: 5. Quicksort so in the middle 23 3. Analysis of Quicksort • The analysis is quite tricky. • Assume all the input elements are distinct o no duplicate values makes this code faster! o there are better partitioning algorithms when duplicate input elements exist (e.g. Hoare's original code) • Let T(n) = worst-case running time on an array of n elements. 242-535 ADA: 5. Quicksort 24 3.1. Worst-case of quicksort • QUICKSORT runs very slowly when its input array is already sorted (or is reverse sorted). o almost sorted data is quite common in the real-world • This is caused by the partition using the min (or max) element which means that one side of the partition will have has no elements. Therefore: T(n) = T(0) +T(n-1) + Θ(n) = Θ(1) +T(n-1) + Θ(n) = T(n-1) + Θ(n) = Θ(n2) (arithmetic series) no elements n-1 elements 242-535 ADA: 5. Quicksort 25 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn 242-535 ADA: 5. Quicksort 26 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn T(n) 242-535 ADA: 5. Quicksort 27 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn cn T(0) T(n-1) 242-535 ADA: 5. Quicksort 28 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn cn T(0) c(n-1) T(0) 242-535 ADA: 5. Quicksort T(n-2) 29 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn cn T(0) c(n-1) T(0) T(n-2) T(0) Θ(1) 242-535 ADA: 5. Quicksort 30 Worst-case recursion tree T(n) = T(0) +T(n-1) + cn 242-535 ADA: 5. Quicksort 31 Quicksort isn't Quick? • In the worst case, quicksort isn't any quicker than insertion sort. • So why bother with quicksort? • It's average case running time is very good, as we'll see. 242-535 ADA: 5. Quicksort 32 3.2. Best-case Analysis If we’re lucky, PARTITION splits the Case 2 of the array evenly: Master Method T(n) = 2T(n/2) + Θ(n) = Θ(n log n) (same as merge sort) 242-535 ADA: 5. Quicksort 33 3.3. Almost Best-case What if the split is always 1/10 : 9/10? T(n) = T(1/10n) + T(9/10n) + Θ(n) 242-535 ADA: 5. Quicksort 34 Analysis of “almost-best” case T(n) 242-535 ADA: 5. Quicksort 35 Analysis of “almost-best” case cn T(1/10n) 242-535 ADA: 5. Quicksort T(9/10n) 36 Analysis of “almost-best” case cn T(1/10n) T(1/100n ) T(9/100n) 242-535 ADA: 5. Quicksort T(9/10n) T(9/100n) T(81/100n) 37 Analysis of “almost-best” case 242-535 ADA: 5. Quicksort 38 Analysis of “almost-best” case short path long path cn * short path 242-535 ADA: 5. Quicksort cn * long path all leaves 39 Short and Long Path Heights • Short path node value: n (1/10)n (1/10)2n ... 1 n(1/10)sp • =1 • n = 10sp • log10n = sp sp steps // take logs • Long path node value: n (9/10)n (9/10)2n ... 1 n(9/10)lp • =1 • n = (10/9)lp • 242-535 log n = lp ADA:10/9 5. Quicksort lp steps // take logs 40 3.4. Good and Bad Suppose we alternate good, bad, good, bad, good, partitions …. G(n) = 2B(n/2) + Θ(n) good B(n) = L(n – 1) + Θ(n) bad Solving: G(n) = 2( G(n/2 – 1) + Θ(n/2) ) + Θ(n) = 2G(n/2 – 1) + Θ(n) = Θ(n log n) Good! How can we make sure we choose good partitions? 242-535 ADA: 5. Quicksort 41 Randomized Quicksort IDEA: Partition around a random element. • Running time is then independent of the input order. • No assumptions need to be made about the input distribution. • No specific input leads to the worst-case behavior. • The worst case is determined only by the output of a random-number generator. 242-535 ADA: 5. Quicksort 42 4. Quicksort in Practice • Quicksort is a great general-purpose sorting algorithm. o especially with a randomized pivot o Quicksort can benefit substantially from code tuning o Quicksort can be over twice as fast as merge sort • Quicksort behaves well even with caching and virtual memory. 242-535 ADA: 5. Quicksort 43 Timing Comparisons • Running time estimates: • Home PC executes 108 compares/second. • Supercomputer executes 1012 compares/second Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones. 242-535 ADA: 5. Quicksort 44