05. quicksort

Algorithm Design and Analysis (ADA)
242-535, Semester 1 2014-2015
5. Quicksort
• Objective
o describe the quicksort algorithm and its partition
function, and analyse its running time under
different data conditions
242-535 ADA: 5. Quicksort
Overview
1. Quicksort
2. Partitioning Function
3. Analysis of Quicksort
4. Quicksort in Practice
1. Quicksort
• Proposed by Tony Hoare in 1962.
• Voted one of the top 10 algorithms of the 20th century in
science and engineering
o http://www.siam.org/pdf/news/637.pdf
• A divide-and-conquer algorithm.
• Sorts “in place” -- rearranges elements using only
the array, as in insertion sort, but unlike merge sort
which uses extra storage.
• Very practical (after some code tuning).
Divide and conquer
Quicksort an n-element array:
1. Divide: Partition the array into two subarrays
around a pivot x such that elements in lower
subarray ≤ x ≤ elements in upper subarray.
2. Conquer: Recursively sort the two subarrays.
3. Combine: Nothing to do.
Key: implementing a linear-time partitioning function
Pseudocode
quicksort(int[] A, int left, int right)
    if (left < right)    // If the array has 2 or more items
        pivot = partition(A, left, right)
        // recursively sort elements smaller than the pivot
        quicksort(A, left, pivot-1)
        // recursively sort elements bigger than the pivot
        quicksort(A, pivot+1, right)
Quicksort Diagram
[Figure: quicksort diagram with the pivot labelled]
Fine Tuning the Code
• quicksort stops recursing when the subarray has 0 or 1
elements.
• When the subarray gets down to a small size, switch over
to dedicated sorting code rather than relying on
recursion.
• The second recursive call in quicksort is a tail call, a
recursive behaviour which can be optimized.
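The tuning ideas above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the course's code: the cutoff value CUTOFF = 16, the helper names insertion_sort and tuned_quicksort, and the choice to recurse on the smaller side are all made up for the example. The partition step uses the first-element-pivot scheme from the Partitioning Function section.

```python
CUTOFF = 16  # assumed threshold; a real value would be chosen by measurement


def insertion_sort(a, left, right):
    """Sort a[left..right] in place (dedicated code for small subarrays)."""
    for k in range(left + 1, right + 1):
        item = a[k]
        m = k - 1
        while m >= left and a[m] > item:
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = item


def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def tuned_quicksort(a, left=0, right=None):
    """Quicksort with a small-subarray cutoff and the tail call turned into a loop."""
    if right is None:
        right = len(a) - 1
    # The loop replaces the second (tail) recursive call.
    while right - left + 1 > CUTOFF:
        pivot = partition(a, left, right)
        # Recurse on the smaller side and loop on the larger one,
        # which keeps the recursion depth at O(log n).
        if pivot - left < right - pivot:
            tuned_quicksort(a, left, pivot - 1)
            left = pivot + 1
        else:
            tuned_quicksort(a, pivot + 1, right)
            right = pivot - 1
    # Small (or empty) remainder: finish with the dedicated sort.
    insertion_sort(a, left, right)
```

Recursing only on the smaller side is one common way to bound the stack depth; it does not change the worst-case running time.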
Tail-Call Optimization
• Tail-call optimization avoids allocating a new stack
frame for a called function.
o A new frame isn't necessary because the calling function only returns
the value that it gets from the called function.
• The most common use of this technique is for
optimizing tail-recursion
o the recursive function can be rewritten to use a constant
amount of stack space (instead of linear)
Tail-Call Graphically
[Figures: the call stack before and after applying tail-call optimization]
Pseudocode
• Before:
int foo(int n) {
    if (n == 0)
        return A();
    else {
        int x = B(n);
        return foo(x);
    }
}
• After:
int foo(int n) {
    if (n == 0)
        return A();
    else {
        int x = B(n);
        goto start of foo() code
            with x as argument value
    }
}
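The before/after pseudocode can be tried out in Python, where the "goto start of foo()" becomes a loop. This is a sketch: A and B are hypothetical stand-ins (the slides do not define them), and the names foo_recursive and foo_loop are chosen only for this example.

```python
def A():
    """Hypothetical base-case computation."""
    return 42


def B(n):
    """Hypothetical step that moves n towards 0."""
    return n - 1


def foo_recursive(n):
    if n == 0:
        return A()
    x = B(n)
    return foo_recursive(x)  # tail call: nothing is done after it returns


def foo_loop(n):
    # Same logic, with the tail call replaced by "update n and start again".
    while True:
        if n == 0:
            return A()
        n = B(n)
```

CPython does not perform tail-call optimization itself, so foo_recursive(100000) would exceed the default recursion limit, while foo_loop(100000) runs in constant stack space.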
2. Partitioning Function
PARTITION(A, p, q)                  // A[p . . q]
    x ← A[p]                        // pivot = A[p]
    i ← p                           // index
    for j ← p + 1 to q
        if A[j] ≤ x then
            i ← i + 1               // move the i boundary
            exchange A[i] ↔ A[j]    // switch big and small
    exchange A[p] ↔ A[i]
    return i                        // return index of pivot

Running time = O(n) for n elements.
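A direct Python transcription of PARTITION, together with the quicksort pseudocode from section 1, might look like the sketch below. The default arguments on quicksort are a convenience added for this example and are not part of the pseudocode.

```python
def partition(a, p, q):
    """Partition a[p..q] in place around the pivot a[p].

    Returns the final index of the pivot. Runs in O(n) for n = q - p + 1.
    """
    x = a[p]  # pivot value
    i = p     # a[p+1..i] holds the elements <= x seen so far
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1                    # move the i boundary
            a[i], a[j] = a[j], a[i]   # switch big and small
    a[p], a[i] = a[i], a[p]           # put the pivot between the two regions
    return i


def quicksort(a, left=0, right=None):
    """Sort a[left..right] in place."""
    if right is None:
        right = len(a) - 1
    if left < right:  # 2 or more items
        pivot = partition(a, left, right)
        quicksort(a, left, pivot - 1)
        quicksort(a, pivot + 1, right)
```

The loop touches each element of a[p+1..q] exactly once, which is where the O(n) partitioning cost comes from.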
Example of partitioning
[Figures: step-by-step partitioning of an example array]
• scan right until we find something less than the pivot
• swap 10 and 5
• resume scanning right until we find something less than the pivot
• swap 13 and 3
• swap 10 and 2
• j runs to the end
• swap the pivot and 2, so the pivot ends up in the middle
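The swaps in the example can be reproduced with a small trace. The input array is not given in the text; the array below is an assumption, chosen because it is consistent with the swaps listed (10 and 5, 13 and 3, 10 and 2, then the pivot and 2). The function name partition_with_trace is also made up for this sketch.

```python
def partition_with_trace(a, p, q):
    """First-element-pivot partition that also records each real swap."""
    swaps = []
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            if i != j:  # skip exchanges of an element with itself
                swaps.append((a[i], a[j]))
            a[i], a[j] = a[j], a[i]
    swaps.append((a[p], a[i]))  # final exchange places the pivot
    a[p], a[i] = a[i], a[p]
    return i, swaps


# Assumed input: not given in the text, but consistent with the listed swaps.
data = [6, 10, 13, 5, 8, 3, 2, 11]
index, swaps = partition_with_trace(data, 0, len(data) - 1)
print(swaps)  # [(10, 5), (13, 3), (10, 2), (6, 2)]
print(data)   # [2, 5, 3, 6, 8, 13, 10, 11]
print(index)  # 3
```

With this input the pivot 6 ends at index 3, with the smaller values to its left and the larger ones to its right.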
3. Analysis of Quicksort
• The analysis is quite tricky.
• Assume all the input elements are distinct
o no duplicate values makes this code faster!
o there are better partitioning algorithms when duplicate
input elements exist (e.g. Hoare's original code)
• Let T(n) = worst-case running time on an array of n
elements.
3.1. Worst-case of quicksort
• QUICKSORT runs very slowly when its input array is
already sorted (or is reverse sorted).
o almost sorted data is quite common in the real-world
• This is caused by the partition using the min (or max)
element as the pivot, which means that one side of the
partition will have no elements. Therefore:
T(n) = T(0) + T(n-1) + Θ(n)    // T(0): no elements, T(n-1): n-1 elements
     = Θ(1) + T(n-1) + Θ(n)
     = T(n-1) + Θ(n)
     = Θ(n^2)    (arithmetic series)
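The quadratic behaviour can be observed by counting comparisons. The sketch below assumes the first-element-pivot partition from section 2; the function name quicksort_count and the input size n = 200 are choices made for this example.

```python
def quicksort_count(a, left, right):
    """Sort a[left..right] in place; return the number of comparisons with the pivot."""
    if left >= right:
        return 0
    x = a[left]
    i = left
    for j in range(left + 1, right + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[left], a[i] = a[i], a[left]
    comparisons = right - left  # one comparison per value of j
    comparisons += quicksort_count(a, left, i - 1)
    comparisons += quicksort_count(a, i + 1, right)
    return comparisons


n = 200  # kept small: the recursion is about n levels deep on sorted input
sorted_input = list(range(n))
reverse_input = list(range(n, 0, -1))
print(quicksort_count(sorted_input, 0, n - 1))   # 19900 = n(n-1)/2
print(quicksort_count(reverse_input, 0, n - 1))  # 19900 = n(n-1)/2
```

In both cases every partition leaves one side empty, so the comparison counts add up to (n-1) + (n-2) + ... + 1 = n(n-1)/2.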
Worst-case recursion tree
T(n) = T(0) + T(n-1) + cn
[Figure: recursion tree built up step by step. The root costs cn and has children T(0) and T(n-1); the T(n-1) node costs c(n-1) and has children T(0) and T(n-2); and so on down the tree. Each T(0) leaf costs Θ(1).]
Quicksort isn't Quick?
• In the worst case, quicksort isn't any quicker than
insertion sort.
• So why bother with quicksort?
• Its average-case running time is very good, as we'll
see.
3.2. Best-case Analysis
If we’re lucky, PARTITION splits the array evenly:
T(n) = 2T(n/2) + Θ(n)
     = Θ(n log n)    (Case 2 of the Master Method; same as merge sort)
3.3. Almost Best-case
What if the split is always 1/10 : 9/10?
T(n) = T(n/10) + T(9n/10) + Θ(n)
Analysis of “almost-best” case
[Figure: recursion tree built up step by step. The root costs cn and has children T(n/10) and T(9n/10); these have children T(n/100), T(9n/100) and T(9n/100), T(81n/100). The leftmost branch is the short path and the rightmost branch is the long path.]
cn * (short path height) ≤ T(n) ≤ cn * (long path height) + cost of all leaves
Short and Long Path Heights
• Short path node value:
  n → (1/10)n → (1/10)^2 n → ... → 1    (sp steps)
• n(1/10)^sp = 1
• ⇒ n = 10^sp
• ⇒ log_10 n = sp    // take logs
• Long path node value:
  n → (9/10)n → (9/10)^2 n → ... → 1    (lp steps)
• n(9/10)^lp = 1
• ⇒ n = (10/9)^lp
• ⇒ log_(10/9) n = lp    // take logs
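To get a feel for the two heights, they can be evaluated for a concrete n. The value n = 1,000,000 below is an arbitrary choice for illustration.

```python
import math

n = 1_000_000  # example input size (an arbitrary choice)

sp = math.log(n, 10)      # short path: n shrinks by a factor of 10 per step
lp = math.log(n, 10 / 9)  # long path: n shrinks by a factor of 10/9 per step

print(sp)       # about 6
print(lp)       # about 131.1
print(lp / sp)  # about 21.85 = log(10) / log(10/9), the same for every n
```

Both heights are logarithms of n and differ only by a constant factor, so both bounds on T(n) are proportional to n log n.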
3.4. Good and Bad
Suppose we alternate good, bad, good, bad, good, ...
partitions.
G(n) = 2B(n/2) + Θ(n)     // good
B(n) = G(n - 1) + Θ(n)    // bad
Solving:
G(n) = 2( G(n/2 - 1) + Θ(n/2) ) + Θ(n)
     = 2G(n/2 - 1) + Θ(n)
     = Θ(n log n)
Good!
How can we make sure we choose good partitions?
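The alternating recurrence can also be checked numerically. The sketch below makes simplifying assumptions that are not in the slides: each Θ(n) term is taken to be exactly n, n/2 is rounded down, and the base cases cost 1.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def G(n):
    """Cost when the next partition is good (even split)."""
    if n <= 1:
        return 1
    return 2 * B(n // 2) + n


@lru_cache(maxsize=None)
def B(n):
    """Cost when the next partition is bad (one side empty)."""
    if n <= 1:
        return 1
    return G(n - 1) + n


for k in (10, 15, 20):
    n = 2 ** k
    # The ratio to n log n stays bounded as n grows.
    print(n, G(n) / (n * math.log2(n)))
```

Under these assumptions the ratio G(n) / (n log n) stays below a small constant, which is what Θ(n log n) predicts.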
Randomized Quicksort
IDEA: Partition around a random element.
• Running time is then independent of the input
order.
• No assumptions need to be made about the
input distribution.
• No specific input leads to the worst-case
behavior.
• The worst case is determined only by the output
of a random-number generator.
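One simple way to realise this idea is to swap a randomly chosen element to the front and then partition as before. The sketch below assumes the first-element-pivot partition from section 2; the names randomized_partition and randomized_quicksort are chosen for this example.

```python
import random


def randomized_partition(a, p, q):
    """Pick a random pivot in a[p..q], then partition as with a first-element pivot."""
    r = random.randint(p, q)  # inclusive on both ends
    a[p], a[r] = a[r], a[p]   # move the random pivot to the front
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def randomized_quicksort(a, left=0, right=None):
    """Sort a[left..right] in place using random pivots."""
    if right is None:
        right = len(a) - 1
    if left < right:
        pivot = randomized_partition(a, left, right)
        randomized_quicksort(a, left, pivot - 1)
        randomized_quicksort(a, pivot + 1, right)
```

With this change an already sorted input is no longer a bad case in particular: bad splits now depend only on the random choices.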
4. Quicksort in Practice
• Quicksort is a great general-purpose sorting
algorithm.
o especially with a randomized pivot
o Quicksort can benefit substantially from code tuning
o Quicksort can be over twice as fast as merge sort
• Quicksort behaves well even with caching
and virtual memory.
Timing Comparisons
• Running time estimates:
• Home PC executes 10^8 compares/second.
• Supercomputer executes 10^12 compares/second.
Lesson 1. Good algorithms are better than supercomputers.
Lesson 2. Great algorithms are better than good ones.
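Lesson 1 can be illustrated with a rough calculation using the two machine speeds above. The cost models (n^2 compares for a quadratic sort, n log2 n for a good one) and the input size n = 10^9 are assumptions made for this sketch, not figures from the slides.

```python
import math

HOME_PC = 10 ** 8         # compares per second (from the slide)
SUPERCOMPUTER = 10 ** 12  # compares per second (from the slide)


def seconds(compares, rate):
    """Estimated running time for a given number of compares."""
    return compares / rate


n = 10 ** 9  # assumed input size for the illustration

quadratic = n ** 2               # assumed cost model for a quadratic sort
linearithmic = n * math.log2(n)  # assumed cost model for an n log n sort

print(seconds(quadratic, SUPERCOMPUTER))  # 1000000.0 seconds (over 11 days)
print(seconds(linearithmic, HOME_PC))     # roughly 300 seconds
```

Under these assumptions the n log n algorithm on the home PC finishes long before the quadratic algorithm on the supercomputer.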