Sorting in Linear Time CS 583 Analysis of Algorithms 7/1/2016

CS583 Fall'06: Sorting in Linear Time
Outline
• Comparison Sort Algorithms
• Lower Bounds for Sorting
• Counting Sort
• Order Statistics
– Minimum and Maximum
– Selection
Comparison Sorts
• We have seen several algorithms that can sort n
numbers in O(n lg n) time.
– Merge sort and heapsort achieve this upper bound in the
worst case; quicksort achieves it on average.
– These algorithms have one common property: the sorted
order they determine is based only on comparisons
between the input elements. Such sorting algorithms are
called comparison sorts.
• We prove that any comparison sort must make Ω(n
lg n) comparisons in the worst case to sort n
elements.
Comparison Sort: Decision Tree
We assume without loss of generality that all input elements are distinct. In
this case, every comparison can take a single form, for example a_i <= a_j.
Comparison sorts can be viewed in terms of decision trees. For example,
sorting three elements using insertion sort looks as follows:
1:2
  <=: 2:3
      <=: <1,2,3>
      >:  1:3
          <=: <1,3,2>
          >:  <3,1,2>
  >:  1:3
      ...
Comparison Sort: Decision Tree (cont.)
• In a decision tree each internal node is annotated by
i:j for some i and j in the range 1 <= i,j <= n.
• Each leaf is annotated by a permutation <π(1), ... ,
π(n)>.
• The execution of the sorting algorithm corresponds
to tracing a path from the root to a leaf.
• Any correct sorting algorithm must be able to
produce each permutation of its input, hence all n!
leaves of the decision tree must be "reachable".
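The leaf count can be checked by brute force for small n. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the standard textbook form of insertion sort. It records the outcome of every comparison, so each distinct outcome sequence is one root-to-leaf path.

```python
from itertools import permutations

def insertion_sort_trace(values):
    """Sort a copy of values; return the outcomes of every comparison made."""
    a = list(values)
    trace = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            outcome = a[i] <= key      # one internal node of the decision tree
            trace.append(outcome)
            if outcome:
                break
            a[i + 1] = a[i]            # shift the larger element right
            i -= 1
        a[i + 1] = key
    return tuple(trace)

def decision_tree_leaves(n):
    """Map each root-to-leaf path to the input permutations that follow it."""
    leaves = {}
    for perm in permutations(range(1, n + 1)):
        leaves.setdefault(insertion_sort_trace(perm), []).append(perm)
    return leaves

leaves = decision_tree_leaves(3)
print(len(leaves))                         # 6 reachable leaves, one per permutation
print(max(len(path) for path in leaves))   # 3, the height of the tree
```

Every input permutation ends at its own leaf, which is why a correct sort needs at least n! reachable leaves.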
Lower Bound for Comparison Sorts
Theorem 8.1
Any comparison sort algorithm requires Ω(n lg n) comparisons
in the worst case.
Proof.
The length of the longest path from the root of a decision tree to
any of its reachable leaves represents the worst-case number of
comparisons that a sorting algorithm performs. Hence, we need
to determine the height of a decision tree in which each
permutation appears as a reachable leaf.
Lower Bound for Comparison Sorts (cont.)
Consider a decision tree of height h with l reachable leaves. Since
each permutation appears as a leaf, we have n! <= l. A binary tree
of height h has no more than 2^h leaves:
n! <= l <= 2^h =>
h >= lg(n!)
lg(n!) = Θ(n lg n) (see 3.18) =>
h = Ω(n lg n) ∎
Counting Sort
• This algorithm assumes that each of the n input
elements is an integer in the range 0 to k.
– When k = O(n), the sort runs in Θ(n) time.
• The basic idea is to determine for each input element
x, the number of elements less than x.
– This information can be used to place x directly into its
position in the output array.
– The algorithm requires an input array A, the output array
B, and an intermediate (“counting”) array C.
Counting Sort: Example
n=5, k=2:
A (indices 1..5):                  2 1 0 2 2
C after steps 1-4 (indices 0..2):  1 1 3
C after loop 6 (indices 0..2):     1 2 5
B in loop 9 (indices 1..5):        the first iteration (i = 5) places A[5] = 2 into B[5]
Counting Sort: Pseudocode
Counting-Sort (A,B,n,k)
1 for i = 0 to k
2     C[i] = 0
3 for i = 1 to n
4     C[A[i]]++
5 // C[i] contains the number of elements = i
6 for i = 1 to k
7     C[i] = C[i] + C[i-1]
8 // C[i] now contains number of elements <= i
9 for i = n downto 1
10     B[C[A[i]]] = A[i]
11     C[A[i]]--
12 return
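The pseudocode can be sketched in Python as follows. This is one possible translation, not a definitive implementation: it assumes 0-based lists, so the counter is decremented before placing an element rather than after, and it returns a new list instead of filling a caller-supplied B.

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose items are integers in 0..k."""
    count = [0] * (k + 1)
    for value in a:                 # count[v] = number of items equal to v
        count[value] += 1
    for v in range(1, k + 1):       # count[v] = number of items <= v
        count[v] += count[v - 1]
    out = [None] * len(a)
    for value in reversed(a):       # scan the input from right to left
        count[value] -= 1           # 0-based: decrement first, then place
        out[count[value]] = value
    return out

print(counting_sort([2, 1, 0, 2, 2], 2))  # [0, 1, 2, 2, 2]
```

The sample call uses a five-element input with k = 2, for which the counts are 1, 1, 3 and the cumulative counts are 1, 2, 5.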
Counting Sort: Performance
• After loop 6, C[i] holds the number of elements <= i, which is
also the last output position for an element with value i. Each
time an element with value i is placed into the output array,
C[i] is decremented, so the next element with value i is placed
just before the current one.
• To calculate the running time, observe that the number of
operations is (k+1) (loop 1) + n (loop 3) + k (loop 6) + n (loop 9)
= Θ(k+n). When k = O(n), the running time is Θ(n).
• An important property of counting sort is that it is stable:
numbers with the same value appear in the output array in the
same order as they do in the input array. Stability matters
when satellite data are carried around with the key.
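Stability is easiest to see with records that carry satellite data. The sketch below is an illustration under assumptions: the `key` callback and the `right_to_left` flag are hypothetical additions, included only to contrast the right-to-left scan of loop 9 with a left-to-right scan.

```python
def counting_sort_by_key(records, k, key, right_to_left=True):
    """Counting sort of records whose key(record) is an integer in 0..k."""
    count = [0] * (k + 1)
    for rec in records:
        count[key(rec)] += 1
    for v in range(1, k + 1):       # count[v] = number of records with key <= v
        count[v] += count[v - 1]
    out = [None] * len(records)
    order = reversed(records) if right_to_left else records
    for rec in order:
        count[key(rec)] -= 1        # each placement moves one slot to the left
        out[count[key(rec)]] = rec
    return out

records = [(2, "a"), (1, "b"), (2, "c"), (0, "d"), (2, "e")]
first = lambda rec: rec[0]

print(counting_sort_by_key(records, 2, first))
# [(0, 'd'), (1, 'b'), (2, 'a'), (2, 'c'), (2, 'e')]
print(counting_sort_by_key(records, 2, first, right_to_left=False))
# [(0, 'd'), (1, 'b'), (2, 'e'), (2, 'c'), (2, 'a')]
```

Because positions for equal keys are handed out from right to left, scanning the input from the end preserves the original order of equal keys, while scanning from the front reverses it.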
Order Statistics
• The ith order statistic of a set of n elements is the
ith smallest element.
– The minimum element is the first order statistic (i=1).
– The maximum element is the last order statistic (i=n).
– A median is the “halfway point” of the set (i=(n+1)/2 for odd n).
• The selection problem is finding the ith order
statistic from a set of n distinct numbers.
– It can be solved in O(n lg n) time by sorting the elements and
then selecting the ith element from the sorted array.
– The fastest algorithm runs in O(n) time in the worst case.
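The sorting-based approach can be sketched in a few lines. The function name is hypothetical, and Python's built-in `sorted` stands in for any O(n lg n) comparison sort.

```python
def select_by_sorting(items, i):
    """Return the i-th order statistic (1-based) by sorting a copy."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be in 1..n")
    return sorted(items)[i - 1]

data = [7, 3, 9, 1, 5]
print(select_by_sorting(data, 1))                     # minimum: 1
print(select_by_sorting(data, len(data)))             # maximum: 9
print(select_by_sorting(data, (len(data) + 1) // 2))  # median: 5
```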
Minimum/Maximum: Pseudocode
MINIMUM(A)
min = A[1]
for i = 2 to length[A]
    if A[i] < min
        min = A[i]
return min
The above algorithm makes n-1 comparisons. Finding the maximum can
be accomplished with n-1 comparisons as well:
MAXIMUM(A)
max = A[1]
for i = 2 to length[A]
    if A[i] > max
        max = A[i]
return max
Simultaneous Minimum and Maximum
MINMAX(A)
min = A[1]
max = A[1]
i = 2
while (i <= length[A])
    if (i+1) > length[A]
        x_min = A[i]; x_max = A[i]
    else
        if A[i] < A[i+1]
            x_min = A[i]; x_max = A[i+1]
        else
            x_min = A[i+1]; x_max = A[i]
    if x_max > max
        max = x_max
    if x_min < min
        min = x_min
    i += 2
return (min, max)
The above algorithm performs at most 3n/2 comparisons between elements (at most
5n/2 if the loop and bounds tests on i are counted too) to find both minimum and
maximum, and hence runs in Θ(n) time.
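A Python sketch of the pairwise scan follows. It assumes 0-based indexing, and it returns a comparison counter (a hypothetical addition) that tracks only comparisons between elements, not the index tests. That count never exceeds 3n/2, so it stays within the bound stated above.

```python
def min_max(a):
    """Return (min, max, element_comparisons) for a non-empty sequence."""
    lo = hi = a[0]
    comparisons = 0
    i = 1
    while i < len(a):
        if i + 1 < len(a):
            comparisons += 1            # order the pair first
            if a[i] < a[i + 1]:
                small, large = a[i], a[i + 1]
            else:
                small, large = a[i + 1], a[i]
        else:
            small = large = a[i]        # odd element left over at the end
        comparisons += 2                # one test against max, one against min
        if large > hi:
            hi = large
        if small < lo:
            lo = small
        i += 2
    return lo, hi, comparisons

print(min_max([5, 3, 8, 1, 9, 2]))  # (1, 9, 8)
```

Running MINIMUM and MAXIMUM separately on the same six elements would take 2(n-1) = 10 element comparisons instead of 8.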
General Selection
The general selection algorithm finds the ith order statistic. The algorithm
below is modeled after quicksort and has expected running
time O(n).
RANDOMIZED-SELECT (A,p,r,i)
if p = r
    return A[p]
q = RANDOMIZED-PARTITION(A,p,r)
k = q-p+1
if i = k // the pivot element is the answer
    return A[q]
else
    if i < k
        return RANDOMIZED-SELECT(A,p,q-1,i)
    else
        return RANDOMIZED-SELECT(A,q+1,r,i-k)
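The selection procedure can be sketched in Python as shown below. RANDOMIZED-PARTITION is not defined on these slides, so the version here is an assumption: it swaps a uniformly random element into the last position and then applies a Lomuto-style partition. Indices p and r are 0-based and inclusive, while i remains 1-based.

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    s = random.randint(p, r)            # inclusive on both ends
    a[s], a[r] = a[r], a[s]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                       # rank of the pivot within a[p..r]
    if i == k:                          # the pivot element is the answer
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)

data = [7, 3, 9, 1, 5]
print(randomized_select(list(data), 0, len(data) - 1, 3))  # 5
```

Unlike quicksort, only one side of the partition is processed after each split, which is what brings the expected running time down to O(n).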