Quicksort Algorithm CS 583 Analysis of Algorithms 7/1/2016

CS583 Fall'06: Quicksort
Outline
• Quicksort Algorithm
• Performance Analysis
– Worst-case partitioning
– Best-case partitioning
– Balanced partitioning
• Randomized Quicksort
– Worst-case analysis
– Expected running time
• Self-Testing
– 7.1-1, 7.1-2, 7.2-3, 7.4-2
Quicksort Algorithm
• Quicksort is a sorting algorithm with a worst-case
running time of Θ(n^2).
– Often the best practical choice because it is very efficient
on average: O(n lg n) expected time for the randomized algorithm.
– Sorts in place.
• Uses the divide-and-conquer approach:
– Divide: Partition A[p..r] into A[p..q-1] and A[q+1..r] so
that
• e1 <= A[q] and e2 >= A[q] for any
• e1 ∈ A[p..q-1], e2 ∈ A[q+1..r].
– Conquer: Sort A[p..q-1] and A[q+1..r] by recursive calls
to quicksort.
– Combine: Subarrays are sorted in place, hence A[p..r] is
sorted.
Quicksort: Partitioning
[Figure: step-by-step trace of Partition on a small example array, showing the positions of the indices p, i, j, and r after each iteration.]
Quicksort: Pseudocode
Quicksort(A, p, r)
1 if p < r
2     q = Partition(A, p, r)
3     Quicksort(A, p, q-1)
4     Quicksort(A, q+1, r)

Partition(A, p, r)
1 x = A[r]
2 i = p-1
3 for j = p to r-1
4     if A[j] < x
5         i = i+1
6         <exchange A[i] with A[j]>
7 <exchange A[i+1] with A[r]>
8 return i+1
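The pseudocode above can be sketched in Python roughly as follows. This is an illustrative translation, not part of the original slides: it uses 0-based indexing, and the default arguments `p=0` and `r=None` are a convenience assumed here.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around the pivot a[r]; return the pivot's final index."""
    x = a[r]                      # pivot value (line 1)
    i = p - 1                     # a[p..i] holds the elements smaller than the pivot
    for j in range(p, r):         # j = p, ..., r-1 (line 3)
        if a[j] < x:              # the comparison at line 4
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two regions (line 7)
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; with the defaults, sort the whole list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [7, 4, 6, 3, 5]
quicksort(data)
print(data)
```

The call `quicksort(data)` sorts the list in place, so no value is returned.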
Quicksort: Correctness
• As the procedure runs, the array is partitioned into four
regions, which can be stated as a loop invariant:
1. p <= k <= i: A[k] < x
2. i+1 <= k <= j-1: A[k] >= x
3. k = r: A[k] = x
4. j <= k <= r-1: undetermined
• We need to show that this invariant is true:
– Initialization: i = p-1 and j = p, so regions 1 and 2 are empty; only 3 and 4 are relevant and they are true.
– Maintenance:
• A[j] >= x: i does not change, and a value not less than x becomes part of [i+1, j-1] once j is incremented.
• A[j] < x: i is incremented and A[i] is exchanged with A[j], so [p, i] is expanded with one legitimate element and [i+1, j-1] shifts right by one position.
– Termination: j = r: region 4 is empty and all three remaining regions are preserved.
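One way to make the invariant concrete is to assert it at the start of every loop iteration. The sketch below is an assumption-laden illustration: `partition_checked` is a hypothetical name, indexing is 0-based, and the assertions make each call quadratic, so it is meant for testing only.

```python
import random


def partition_checked(a, p, r):
    """Partition a[p..r] around a[r], asserting the four-region invariant as it runs."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of the iteration for this value of j:
        assert all(a[k] < x for k in range(p, i + 1))    # region 1: a[p..i] < x
        assert all(a[k] >= x for k in range(i + 1, j))   # region 2: a[i+1..j-1] >= x
        assert a[r] == x                                  # region 3: the pivot stays at r
        if a[j] < x:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Termination (j == r): region 4 is empty, regions 1-3 cover a[p..r].
    assert all(a[k] < x for k in range(p, i + 1))
    assert all(a[k] >= x for k in range(i + 1, r))
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


rng = random.Random(0)
for _ in range(200):
    values = [rng.randint(0, 9) for _ in range(rng.randint(1, 12))]
    q = partition_checked(values, 0, len(values) - 1)
    # After the final exchange the pivot separates the two regions.
    assert all(v < values[q] for v in values[:q])
    assert all(v >= values[q] for v in values[q + 1:])
```

The random inputs deliberately contain duplicates, which exercises the `>=` side of the invariant.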
Partition: Analysis
• At lines 7-8 the pivot is moved to the correct place
and its index is returned.
• To calculate the running time of Partition, let
n = r-p+1 be the size of the subarray A[p..r].
• The loop at line 3 is executed n-1 times, with steps 4-6
taking constant time Θ(1).
• Steps 7-8 take constant time as well.
• Hence the running time of Partition is Θ(n).
Quicksort: Performance
• The running time of quicksort depends on whether
partitioning is balanced or unbalanced.
– If it is balanced, the algorithm runs as fast as heapsort.
– Otherwise, it can run asymptotically as slowly as insertion
sort.
• The worst case occurs when each call to the partitioning
routine produces one subproblem with n-1
elements and one with 0 elements.
– T(n) = T(n-1) + T(0) + Θ(n) = T(n-1) + Θ(n)
– Intuitively, T(n) = n + (n-1) + ... + 1 = n(n+1)/2 =
Θ(n^2)
– This scenario occurs, for example, when the array is already sorted: the pivot A[r] is then always the largest element.
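The quadratic behaviour on sorted input can be observed by counting executions of line 4. The sketch below is illustrative only: `count_comparisons` is a hypothetical helper, and it uses an explicit stack instead of recursion (an assumption made here so that sorted inputs do not hit Python's recursion limit). Because each Partition call on m elements makes m-1 comparisons, the count on sorted input of distinct keys is (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is still Θ(n^2).

```python
def count_comparisons(values):
    """Run last-element-pivot quicksort on a copy of `values`; return the number of line-4 comparisons."""
    a = list(values)
    count = 0
    stack = [(0, len(a) - 1)]          # pending subarrays a[p..r]
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x = a[r]
        i = p - 1
        for j in range(p, r):
            count += 1                 # one execution of "if A[j] < x"
            if a[j] < x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        stack.append((p, i))           # left part  a[p..q-1], with q = i+1
        stack.append((i + 2, r))       # right part a[q+1..r]
    return count


n = 200
print(count_comparisons(range(n)), n * (n - 1) // 2)
```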
Quicksort: Performance (cont.)
• The best-case partitioning is the most even split in
which each subproblem has no more than n/2
elements. The recurrence is:
– T(n) <= 2T(n/2) + Θ(n)
– According to the master theorem's case 2, the above
recurrence has the solution T(n) = O(n lg n).
• The average-case running time of quicksort is much
closer to the best case than to the worst case.
– The key to understanding this is to understand how the
balance of partitioning is reflected on the recurrence.
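As a worked check of the master-theorem step for the best-case recurrence, the calculation can be written out as below. The symbols a, b, and f(n) follow the usual statement of the theorem and are not defined on the slides.

```latex
\begin{align*}
T(n) &\le 2\,T(n/2) + \Theta(n), \qquad a = 2,\quad b = 2,\quad f(n) = \Theta(n) \\
n^{\log_b a} &= n^{\log_2 2} = n
  \quad\Longrightarrow\quad f(n) = \Theta\bigl(n^{\log_b a}\bigr) \\
\text{case 2:}\quad T(n) &= O\bigl(n^{\log_b a} \lg n\bigr) = O(n \lg n)
\end{align*}
```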
Quicksort: Balanced Partitioning
• The average-case running time of quicksort is much
closer to the best case than to the worst case.
• For example, assume the partitioning algorithm
always produces 9:1 splitting ratio, which appears
unbalanced. We have:
– T(n) <= T(9n/10) + T(n/10) + c n
– The recursion tree shows that the cost at each level is
cn until the boundary level log_10 n = Θ(lg n); the levels
below it have cost at most cn. The recursion stops at depth
log_{10/9} n = Θ(lg n).
– In fact, any split of constant proportionality yields a
recursion tree of depth Θ(lg n), where the cost at each
level is O(n), so the total running time is O(n lg n).
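To see that both depths in the 9:1 example grow like lg n, one can simply evaluate the two logarithms. This is a small numeric sketch; `tree_depths` and its `ratio` parameter are names assumed here for illustration.

```python
import math


def tree_depths(n, ratio=0.9):
    """Return (shortest, longest) leaf depths of the recursion tree for a ratio : (1 - ratio) split."""
    longest = math.log(n, 1 / ratio)          # always follow the larger side
    shortest = math.log(n, 1 / (1 - ratio))   # always follow the smaller side
    return shortest, longest


for n in (10**3, 10**6, 10**9):
    shortest, longest = tree_depths(n)
    # The last column is the longest depth divided by lg n; it stays constant.
    print(n, round(shortest, 1), round(longest, 1), round(longest / math.log2(n), 2))
```

The constant in the last column equals 1 / lg(10/9), which is why the deeper boundary is still Θ(lg n).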
Quicksort: Average Case
• Make an assumption that all permutations of the input
numbers are equally likely.
– When we run quicksort on a random array, it is unlikely that the
partitioning always happens in the same way on each level.
– We expect some splits to be reasonably balanced and some to be fairly unbalanced.
• In the average case, Partition produces a mix of "good" and
"bad" splits, that are distributed randomly across the
recursion tree.
– For the sake of intuition, assume that the splits alternate: a best-case split
follows each worst-case split.
– Note that this achieves in two steps what the best partitioning can
achieve in one step.
– Intuitively, the Θ(n-1) cost of the extra split can be absorbed by the Θ(n)
cost, since Θ(n) + Θ(n-1) = Θ(n).
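The absorption argument can be spelled out as a pair of recurrences. This is a sketch of the intuition only, assuming a worst-case split at size n immediately followed by a best-case split at size n-1, with T(0) = Θ(1).

```latex
\begin{align*}
T(n)   &= T(0) + T(n-1) + \Theta(n)
       && \text{(worst-case split)} \\
T(n-1) &\le 2\,T\!\left(\tfrac{n-1}{2}\right) + \Theta(n-1)
       && \text{(best-case split)} \\
T(n)   &\le 2\,T\!\left(\tfrac{n-1}{2}\right) + \Theta(n) + \Theta(n-1)
        = 2\,T\!\left(\tfrac{n-1}{2}\right) + \Theta(n)
\end{align*}
```

The combined recurrence has the same form as the best-case one, only with a larger hidden constant, so it still solves to O(n lg n).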
Randomized Quicksort
In the randomized version of quicksort, we simply pick the pivot element
at random:

Randomized-Partition(A, p, r)
1 i = Random(p, r) // pick a random number in [p,r]
2 <swap A[r] with A[i]>
3 return Partition(A, p, r)

Randomized-Quicksort(A, p, r)
1 if p < r
2     q = Randomized-Partition(A, p, r)
3     Randomized-Quicksort(A, p, q-1)
4     Randomized-Quicksort(A, q+1, r)
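A Python rendering of the randomized procedures might look like the following. It is a sketch under a few assumptions not on the slides: 0-based indexing, default arguments for the bounds, and `random.randint(p, r)` standing in for Random(p, r) (both endpoints are inclusive).

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] < x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    """Move a uniformly chosen element of a[p..r] into position r, then partition."""
    i = random.randint(p, r)      # random index in [p, r], endpoints included
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None):
    """Sort a[p..r] in place using random pivots."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)


values = [9, 1, 8, 2, 7, 3, 6, 4, 5]
randomized_quicksort(values)
print(values)
```

The sorted result does not depend on the random choices; only the sequence of splits, and hence the running time, varies from run to run.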
Expected Running Time
• The running time of quicksort is dominated by the
time spent in Partition.
– Note that there can be at most n calls to Partition since
once the pivot element is selected it does not participate in
future calls to Partition.
• One call to Partition takes O(1) time plus time
proportional to the number of iterations of the for loop
at lines 3-6.
– Each iteration performs the comparison at line 4, so we need
to count the total number of times line 4 is executed.
Expected Running Time (cont.)
Lemma 7.1. Let X be the number of comparisons performed in line 4 of Partition
over the entire execution of quicksort on an n-element array. Then
the running time of quicksort is O(n+X).
Proof. There are at most n calls to Partition. Each call does a constant amount of
work and then executes the for loop some number of times. Each such iteration
executes line 4.
Hence we need to compute X. Rename the elements of A as z_1, ... , z_n, where z_i is the i-th smallest element.
Also, define Z_ij = {z_i, ... , z_j} to be the set of elements between z_i
and z_j, inclusive.
Note that elements are compared only to the pivot element in each call
of Partition, and after that call the pivot is never compared to any other element.
Expected Running Time (cont.)
Define the indicator random variable:
X_ij = I{z_i is compared to z_j} (= 1 if yes, = 0 otherwise)
Since each pair of elements is compared at most once, we have:
X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_ij
Taking expectations of both sides and using linearity of expectation, we have:
E[X] = E[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_ij]
     = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_ij]
     = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Pr{z_i is compared to z_j}
Expected Running Time (cont.)
To calculate Pr{ z_i is compared to z_j }, observe the following:
1) When a pivot x is chosen with: z_i < x < z_j, z_i and z_j will not
be compared.
2) When a pivot x is chosen with: x = z_i or x = z_j, z_i and z_j
will be compared.
3) When a pivot x is chosen with: x < z_i or x > z_j, it does not
affect the comparison event.
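Assume the elements are distinct and each pivot is chosen uniformly at random. Whether
z_i and z_j are compared is decided by the first element of Z_ij chosen as a pivot, and
each element of Z_ij is equally likely to be that first choice.
The probability of an event = (the number of successes) /
(total number of outcomes).
Note that the total number of relevant outcomes is j-i+1, the size of Z_ij; see 1) and 2)
above. The number of successes (z_i compared to z_j) is 2 according to rule
2 above. Hence,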
Expected Running Time (cont.)
Pr{z_i is compared to z_j} = 2/(j-i+1)
E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2/(j-i+1)
We use the change of variables k = j-i:
E[X] = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} 2/(k+1)
     < \sum_{i=1}^{n-1} \sum_{k=1}^{n} 2/k
Harmonic series: \sum_{k=1}^{n} 1/k = ln n + O(1)
Hence: \sum_{i=1}^{n-1} \sum_{k=1}^{n} 2/k =
\sum_{i=1}^{n-1} O(lg n) = O(n lg n) =>
E[X] = O(n lg n)
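The derivation above can be checked numerically by evaluating the double sum directly and comparing it with the average comparison count of randomized quicksort. The sketch below makes several assumptions for illustration: the helper names are hypothetical, the keys are distinct, the sort uses an explicit stack, and n = 50 with 2000 trials is an arbitrary choice.

```python
import random


def expected_comparisons(n):
    """Evaluate E[X] = sum over i < j of 2 / (j - i + 1) directly."""
    return sum(2 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))


def average_comparisons(n, trials, rng):
    """Average number of line-4 comparisons of randomized quicksort on n distinct keys."""
    total = 0
    for _ in range(trials):
        a = list(range(n))
        rng.shuffle(a)
        stack = [(0, n - 1)]
        while stack:
            p, r = stack.pop()
            if p >= r:
                continue
            k = rng.randint(p, r)          # random pivot index in [p, r]
            a[k], a[r] = a[r], a[k]
            x = a[r]
            i = p - 1
            for j in range(p, r):
                total += 1                 # one comparison at line 4
                if a[j] < x:
                    i += 1
                    a[i], a[j] = a[j], a[i]
            a[i + 1], a[r] = a[r], a[i + 1]
            stack.append((p, i))
            stack.append((i + 2, r))
    return total / trials


n = 50
exact = expected_comparisons(n)
bound = 2 * (n - 1) * harmonic(n)          # the looser sum used in the derivation
observed = average_comparisons(n, trials=2000, rng=random.Random(1))
print(exact, bound, observed)
```

The observed average should land close to the exact double sum, and both sit below the harmonic-series bound that gives O(n lg n).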