CS 592

SHEN’S CLASS NOTES
Chapter 4
Non-Comparison Sorting
All comparison sorting algorithms discussed in Chapter 3 have a
worst-case running time of Ω(n lg n) or larger. Can we do better?
The answer is no if the sorting is based on comparisons. We will
prove this conclusion in this chapter. Moreover, we show that
faster running times are possible if we use non-comparison
sorting techniques.
4.1 Lower Bounds for Comparison Sorting
Any comparison sorting algorithm can be represented by a
binary decision tree. A decision tree models a process that
makes a decision based on a sequence of tests; which test takes
place at each step is determined by the outcomes of the
previous tests.
Example.
The following binary tree represents an execution of an
algorithm that sorts three numbers in A[1], A[2], and A[3]. Note
that any correct algorithm should clearly define how to
construct the decision tree for any input size n.
[Fig. 8-1: the decision tree for sorting three numbers. The root
compares positions 1:2; each internal node compares two positions
(1:2, 2:3, or 1:3) chosen according to the outcomes (≤ or >) of the
earlier tests; the six leaves are the permutations ⟨1,2,3⟩, ⟨1,3,2⟩,
⟨2,1,3⟩, ⟨2,3,1⟩, ⟨3,1,2⟩, ⟨3,2,1⟩.]
In the decision tree, each leaf corresponds to a decision that tells
us how to re-arrange the three elements such that they are in
increasing order. This arrangement corresponds to a permutation
among the three numbers.
In general, a decision tree for sorting n numbers must have at
least n! leaves, each of which represents a permutation of the n
numbers. Conducting the permutation produces a sorted
sequence. How to conduct the permutation is not shown by
the decision tree, but is clearly given by the algorithm.
Then, why do we need the decision tree?
The decision tree is usually used to evaluate the complexity. For
example, the longest path from the root to a leaf in the decision
tree corresponds to the worst case. This is because the length of
the path equals the number of tests performed to reach the
decision (leaf). The shortest path from the root to a leaf
corresponds to the best case. The average path length represents
the average complexity of the algorithm.
Lemma 1
In any rooted binary tree with height h and L
leaves, we have the relation L ≤ 2^h (or h ≥ lg L).
Proof. Fig. 8-2 illustrates the case for a complete binary tree,
for which the lemma holds with equality. Obviously, if the
tree is not complete, then the number of leaves is less than
2^h. Therefore, in any case, we have L ≤ 2^h (or h ≥ lg L). ∎
level    number of nodes
0        1
1        2
2        2^2
…        …
i        2^i
…        …
h        2^h

Fig. 8-2
Theorem 1 Any comparison sorting algorithm for n numbers
requires at least Ω(n lg n) comparisons in the worst case.
Proof. Because the decision tree corresponding to a comparison
sorting algorithm for n numbers must contain at least n! leaves,
one for each possible permutation of the n input numbers, by
Lemma 1 we have 2^h ≥ n!. That is, h ≥ lg(n!),
which means that the longest path has length lg(n!) or larger.

Because n! ≥ √(2πn) (n/e)^n (Stirling's approximation), we have

h ≥ lg(n!) ≥ 0.5 lg(2πn) + n lg(n/e)
  = 0.5 lg(2π) + 0.5 lg n + n lg n − n lg e
  = Ω(n lg n).

Therefore, h = Ω(n lg n). ∎
Theorem 1 is the famous theorem on (comparison) sorting lower
bound.
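As a quick numerical illustration of this lower bound (a Python sketch, not part of the original notes), we can compare lg(n!), the minimum worst-case number of comparisons, with n lg n:

```python
import math

def lg_factorial(n):
    """lg(n!) computed as a sum of logarithms (avoids huge integers)."""
    return sum(math.log2(i) for i in range(2, n + 1))

# lg(n!) is the minimum worst-case number of comparisons (Theorem 1);
# it grows like n lg n, as the Stirling-based derivation shows.
for n in (10, 100, 1000):
    print(n, round(lg_factorial(n), 1), round(n * math.log2(n), 1))
```

The printed values show lg(n!) staying between 0.5·n lg n and n lg n for these n, consistent with the derivation.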
Now, we study the average case.
As we analyzed earlier, given a comparison sorting algorithm,
its average complexity can be measured by the average path
length in the corresponding decision tree T, that is, the average
length of a path from the root to a leaf. In order to compute the
average length, we first compute the sum of the lengths of all
root-to-leaf paths. Because a leaf is also called an external node,
this sum is usually called the external path length (EPL).
Let L be the set of leaves. Then

EPL(T) = Σ_{x ∈ L} (length of the path from the root to x).

After EPL(T) is obtained, the average complexity = EPL(T)/|L|.
We will show that EPL(T)/|L| = Ω(n lg n).
Definition 1 A binary tree with L leaves is called the minimum
EPL tree if its EPL value is the smallest among all binary trees
with L leaves.

Obviously, the minimum EPL tree must be a full binary tree,
because whenever a node u has only one child v, we can reduce
the EPL by contracting the edge (u, v), as illustrated by Fig. 8-3.

[Fig. 8-3: contracting the edge (u, v) when u has a single child v.]
Moreover, we have the following lemma.
Lemma 2
In a minimum EPL tree, all leaves must be
on the bottom two levels.
Proof. Suppose a full binary tree T has height k, and a leaf x
occurs at level d, where d < k − 1, as illustrated in Fig. 8-4 (a).
[Fig. 8-4: (a) tree T, with leaf x at level d and two leaves a, b at
level k whose parent y is at level k − 1; (b) tree T′, in which a and
b have been moved to become children of x.]
We will prove that this tree cannot be a minimum EPL tree.
The reason is as follows. If we cut the two leaves a and b at
level k and hook them to the node x, we transform T into a new
binary tree T′ with the same number of leaves, as shown by
Fig. 8-4 (b). Node y becomes a leaf and node x becomes internal,
so the EPL of T′ is smaller than the EPL of T:

EPL(T′) = EPL(T) + length(y) − length(x) + (length change of {a, b})
        = EPL(T) + (k − 1) − d + 2(d + 1) − 2k
        = EPL(T) + (d + 1) − k
        < EPL(T),  because d < k − 1. ∎
Corollary 1 The EPL of the minimum EPL tree with L leaves is
larger than L(lg L − 1). (By Lemma 2, every leaf lies at depth at
least h − 1, and by Lemma 1, h ≥ lg L, so EPL ≥ L(h − 1) ≥
L(lg L − 1).)
Theorem 2 Any comparison sorting algorithm for n numbers
requires at least Ω(n lg n) comparisons in the average case.

Proof. Let T be the corresponding decision tree for the
comparison sorting algorithm. As we discussed, the average
number of comparisons can be measured by

A(n) = EPL(T)/L,

where L is the number of leaves in the tree T.
By Corollary 1, EPL(T) > L(lg L − 1), so A(n) = EPL(T)/L > lg L − 1.
Because L ≥ n!, we have A(n) > lg(n!) − 1 = Ω(n lg n). ∎
From the above discussion, we know that, in order to break the
Ω(n lg n) bound, we must design non-comparison sorting
algorithms. In the following, we discuss counting sort,
radix sort, and bucket sort.
4.2 Counting Sort
The counting sort does not rely upon comparisons between
numbers, but it requires that:
(1) The n input numbers, a1, a2, …, an, must be integers.
(2) The n input numbers must be in a limited range:
0 ≤ a1, a2, …, an ≤ k, where k = O(n).

Let the input numbers be stored in array A[1..n], satisfying
the above conditions. The following counting sort produces
the sorted sequence in array B[1..n].
Counting-Sort(A[1..n], B[1..n], k)
 1   for i ← 0 to k
 2       do C[i] ← 0
 3   for j ← 1 to n
 4       do C[A[j]] ← C[A[j]] + 1
 5   // C[i] = number of elements equal to i
 6   for i ← 1 to k
 7       do C[i] ← C[i] + C[i−1]
 8   // C[i] = number of elements less than or equal to i
 9   for j ← n downto 1
10       do { i ← A[j]
11            B[C[i]] ← i
12            C[i] ← C[i] − 1
13          }
14   End
A careful reader may notice that the for loop at line 9 runs
from n down to 1. Can it run from 1 to n instead? We leave
this question to the reader.
Example.
Input: A[1..8], k = 5

index:  1  2  3  4  5  6  7  8
A:      2  5  3  0  2  3  0  3
After line 4 of the algorithm, the array C becomes:

index:  0  1  2  3  4  5
C:      2  0  2  3  0  1
After line 7 of the algorithm, the array C becomes:

index:  0  1  2  3  4  5
C:      2  2  4  7  7  8
The following three steps show how the numbers A[8], A[7],
and A[6] are placed in array B and how the array C is updated
after each step.

(1) A[8] = 3 goes to B[C[3]] = B[7]; then C[3] is decremented.

index:    1  2  3  4  5  6  7  8
B:                          3
C[0..5]:  2  2  4  6  7  8

(2) A[7] = 0 goes to B[C[0]] = B[2]; then C[0] is decremented.

index:    1  2  3  4  5  6  7  8
B:           0              3
C[0..5]:  1  2  4  6  7  8

(3) A[6] = 3 goes to B[C[3]] = B[6]; then C[3] is decremented.

index:    1  2  3  4  5  6  7  8
B:           0           3  3
C[0..5]:  1  2  4  5  7  8
The final result is

index:  1  2  3  4  5  6  7  8
B:      0  0  2  2  3  3  3  5
The complexity of the counting sort is O(n + k), which is O(n)
when k = O(n), since each loop in the algorithm takes either
Θ(n) steps or Θ(k) steps.
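As a sketch (in Python, not part of the original notes), the pseudocode above translates directly to 0-based arrays; the right-to-left scan in the last loop keeps the sort stable:

```python
def counting_sort(A, k):
    """Stable counting sort of integers in the range 0..k.
    A Python sketch of the Counting-Sort pseudocode above (0-based arrays)."""
    C = [0] * (k + 1)
    for a in A:                   # C[i] = number of elements equal to i
        C[a] += 1
    for i in range(1, k + 1):     # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [0] * len(A)
    for a in reversed(A):         # scanning right to left keeps equal keys in order
        C[a] -= 1
        B[C[a]] = a               # C[a] is now the 0-based position for a
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
```

On the worked example above this produces the same final array B as the trace.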
4.3 Radix Sort
Assume each input number has d digits and each digit takes on
one of k possible values. The Radix Sort sorts the numbers digit
by digit from least significant (rightmost) digit to the most
significant (leftmost) digit.
Radix-Sort(A, d)
    for i ← 1 to d
        do use a stable sort to sort array A on the ith digit
    End
Example (each column shows the array after one more pass of the
stable sort, from the least significant digit to the most significant):

329    720    720    329
457    355    329    355
657    436    436    436
839    457    839    457
436    657    355    657
720    329    457    720
355    839    657    839
Theorem 8.3 Given n d-digit numbers in which each digit can
take one of k possible values, Radix-Sort correctly sorts them in
O(d(n+k)) time.

Proof. The correctness of Radix-Sort can be proved by
induction on the digit position (Exercise 8.3-3). The complexity
of Radix-Sort is O(d(n+k)) because we can use counting sort,
which takes O(n+k) time, on each of the d digits.
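A Python sketch of this idea (an assumption-laden illustration, not the notes' own code): run a stable counting sort on each base-k digit, least significant first.

```python
def radix_sort(A, d, k=10):
    """Sort d-digit non-negative base-k integers, least significant digit first.
    Each pass is a stable counting sort keyed on one digit."""
    for i in range(d):                       # i = 0 is the least significant digit
        digit = lambda x: (x // k ** i) % k
        C = [0] * k
        for a in A:                          # count occurrences of each digit value
            C[digit(a)] += 1
        for v in range(1, k):                # prefix sums: C[v] = #elements with digit <= v
            C[v] += C[v - 1]
        B = [0] * len(A)
        for a in reversed(A):                # right-to-left scan keeps the pass stable
            C[digit(a)] -= 1
            B[C[digit(a)]] = a
        A = B
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
```

Running it on the example above reproduces the final column of the trace.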
4.4 Bucket Sort
Bucket sort is another non-comparison sort. For bucket sort,
we assume the n input numbers in A[1..n] lie in the interval
[0, 1): 0 ≤ A[i] < 1 for 1 ≤ i ≤ n. Moreover, we divide the
interval [0, 1) into n equal-sized subintervals called buckets.

Then, the n numbers are distributed among the n buckets.
Because 0 ≤ A[i] < 1 for 1 ≤ i ≤ n, we have 0 ≤ nA[i] < n and
hence 0 ≤ ⌊nA[i]⌋ ≤ n − 1. Therefore, we place A[i] in bucket j
if ⌊nA[i]⌋ = j.

After the distribution, we sort the numbers in each bucket, and
concatenate the numbers in the n buckets in order.
Bucket-Sort(A[1..n])
    for i ← 1 to n
        do { j ← ⌊nA[i]⌋
             insert A[i] into list B[j]
           }
    for i ← 0 to n − 1
        do sort list B[i] with insertion sort
    concatenate the lists B[0], B[1], …, B[n−1] in order
    End
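A Python sketch of this pseudocode (assuming inputs in [0, 1); the per-bucket insertion sort is written out to mirror the notes):

```python
def bucket_sort(A):
    """Bucket sort for values in [0, 1): distribute the n inputs into n
    buckets, insertion-sort each bucket, then concatenate in order."""
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[int(n * x)].append(x)      # x goes to bucket floor(n*x)
    out = []
    for bucket in B:
        for j in range(1, len(bucket)):      # insertion sort within the bucket
            key, i = bucket[j], j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
        out.extend(bucket)                   # concatenate buckets in order
    return out

print(bucket_sort([.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]))
```

On the worked example that follows, this yields the concatenation of the sorted bucket lists.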
Example.

index:  1    2    3    4    5    6    7    8    9    10
A:      .78  .17  .39  .26  .72  .94  .21  .12  .23  .68

After distribution and sorting each bucket, the lists are:

B[0]:  (empty)
B[1]:  .12 → .17
B[2]:  .21 → .23 → .26
B[3]:  .39
B[4]:  (empty)
B[5]:  (empty)
B[6]:  .68
B[7]:  .72 → .78
B[8]:  (empty)
B[9]:  .94
Complexity of the Bucket Sort

T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i²),        (1)

where n_i is the number of elements in bucket i, so
Σ_{i=0}^{n−1} n_i = n.

Because

n² = (Σ_{i=0}^{n−1} n_i)² = Σ_{i=0}^{n−1} n_i² + 2 Σ_{i≠j} n_i n_j ≥ Σ_{i=0}^{n−1} n_i²,

we have

T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i²) = O(n²).
This is the worst case complexity.
It can be improved to O(nlgn). (Exercise 8.4-2).
Now, we prove that the average time is O(n).
We compute the expectation of (1):

E[T(n)] = E[Θ(n) + Σ_{i=0}^{n−1} O(n_i²)]
        = Θ(n) + Σ_{i=0}^{n−1} E[O(n_i²)]
        = Θ(n) + Σ_{i=0}^{n−1} O(E[n_i²]).

We will show that E[n_i²] = 2 − 1/n.
Let X_ij be the random variable such that
X_ij = 1 if A[j] falls in bucket i, and
X_ij = 0 otherwise.

So, n_i = Σ_{j=1}^{n} X_ij, and

E[n_i²] = E[(Σ_{j=1}^{n} X_ij)²] = E[(X_i1 + X_i2 + … + X_in)²]
        = E[Σ_{j=1}^{n} X_ij² + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} X_ij X_ik]
        = Σ_{j=1}^{n} E[X_ij²] + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} E[X_ij X_ik].

For j ≠ k, we can assume that X_ij and X_ik are independent.
Moreover, Pr[X_ij = 1] = 1/n, so E[X_ij²] = (1²)(1/n) = 1/n and
E[X_ij X_ik] = (1/n)².

We have

E[n_i²] = Σ_{j=1}^{n} 1/n + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} (1/n)²
        = 1 + n(n − 1)(1/n)²
        = 2 − 1/n.
Therefore,

E[T(n)] = Θ(n) + Σ_{i=0}^{n−1} O(E[n_i²])
        = Θ(n) + Σ_{i=0}^{n−1} O(2 − 1/n)
        = Θ(n).
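Since each key lands in bucket i independently with probability 1/n, n_i is Binomial(n, 1/n). The following Python sketch (an illustration, not from the notes) checks E[n_i²] = 2 − 1/n exactly against the binomial distribution:

```python
from math import comb

def expected_ni_squared(n):
    """E[n_i^2] for n_i ~ Binomial(n, 1/n), computed by summing m^2 times
    the binomial probability over all m; the derivation above predicts 2 - 1/n."""
    p = 1.0 / n
    return sum(m * m * comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(n + 1))

for n in (2, 5, 50):
    print(n, expected_ni_squared(n), 2 - 1 / n)
```

The two printed columns agree for every n, confirming the constant expected bucket cost behind the Θ(n) average case.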