SHEN’S CLASS NOTES

Chapter 4 Non-Comparison Sorting

As we observed, all comparison sorting algorithms discussed in Chapter 3 have a worst-case running time of Ω(n lg n) or larger. Can we do better? The answer is no if the sorting is based on comparisons. We will prove this conclusion in this chapter. Moreover, we will show that we can get a faster running time if we use non-comparison sorting techniques.

4.1 Lower Bounds for Comparison Sorting

Any comparison sorting algorithm can be represented by a binary decision tree. A decision tree models a process that makes a decision based on a sequence of tests. Moreover, which test takes place at each step is determined by the outcomes of the previous tests.

Example. The following binary tree represents an execution of an algorithm that sorts three numbers in A[1], A[2], and A[3]. An internal node i:j compares A[i] with A[j]; the left branch is followed when A[i] ≤ A[j], and the right branch (labeled >) when A[i] > A[j]. Note that any correct algorithm should clearly define how to construct the decision tree for any input size n.

                         1:2
                 ≤ /           \ >
                2:3             1:3
            ≤ /     \ >     ≤ /     \ >
       ⟨1,2,3⟩      1:3  ⟨2,1,3⟩    2:3
                ≤ /   \ >       ≤ /   \ >
           ⟨1,3,2⟩  ⟨3,1,2⟩ ⟨2,3,1⟩  ⟨3,2,1⟩

                       Fig. 8-1

In the decision tree, each leaf corresponds to a decision that tells us how to rearrange the three elements so that they are in increasing order. This arrangement corresponds to a permutation of the three numbers. In general, a decision tree for sorting n numbers must have at least n! leaves, each of which represents a permutation of the n numbers. Conducting the permutation will produce a sorted sequence. How to conduct the permutation is not shown by the decision tree, but is clearly given by the algorithm.

Then, why do we need the decision tree? The decision tree is usually used to evaluate the complexity. For example, the longest path from the root to a leaf in the decision tree corresponds to the worst case, because the length of the path equals the number of tests performed to reach the decision (leaf). The shortest path from the root to a leaf corresponds to the best case.
The average path length represents the average complexity of the algorithm.

Lemma 1 In any rooted binary tree with height h and L leaves, we have the relation L ≤ 2^h (or h ≥ lg L).

Proof. Fig. 8-2 illustrates the case for a complete binary tree: level i contains 2^i nodes, so the bottom level h contains 2^h nodes, and the lemma holds for the complete tree. Obviously, if the tree is not complete, then the number of leaves will be less than 2^h. Therefore, in any case, we have L ≤ 2^h (or h ≥ lg L).

    level    number of nodes
      0            1
      1            2
      2            2^2
      ...          ...
      i            2^i
      ...          ...
      h            2^h

            Fig. 8-2

Theorem 1 Any comparison sorting algorithm for n numbers requires Ω(n lg n) comparisons in the worst case.

Proof. Because the decision tree corresponding to a comparison sorting algorithm for n numbers must contain at least n! leaves, one for each possible permutation of the n input numbers, by Lemma 1 we have

    2^h ≥ n!.

That is, h ≥ lg(n!), which means that the longest path has length lg(n!) or larger. Because

    n! ≥ √(2πn) (n/e)^n,

we have

    h ≥ lg(n!) ≥ 0.5 lg(2πn) + n lg(n/e)
      = 0.5 lg(2π) + 0.5 lg n + n lg n − n lg e
      = Ω(n lg n).

Therefore, h = Ω(n lg n).

Theorem 1 is the famous theorem on the (comparison) sorting lower bound.

Now, we study the average case. As we analyzed earlier, given a comparison sorting algorithm, its average complexity can be measured by the average path length in the corresponding decision tree T, that is, the average length of a path from the root to a leaf. In order to compute the average length, we first compute the sum of the lengths of all root-to-leaf paths. Because a leaf is also called an external node, this sum is usually called the external path length (EPL). Let L be the set of leaves. Then

    EPL(T) = Σ_{x ∈ L} |path from root to x|.

After EPL is obtained, the average complexity = EPL(T)/|L|. We will show that EPL(T)/|L| = Ω(n lg n).

Definition 1 A binary tree with L leaves is called the minimum EPL tree if its EPL value is the smallest among all binary trees with L leaves.
Obviously, the minimum EPL tree must be a full binary tree, because if a node u has only one child v, we can reduce the EPL by shrinking the edge (u, v), i.e., contracting v into u, as illustrated by Fig. 8-3.

    [Fig. 8-3: the edge (u, v) is contracted; v’s subtree is hooked directly to u, shortening every path through v by one.]

Moreover, we have the following lemma.

Lemma 2 In a minimum EPL tree, all leaves must be on the bottom two levels.

Proof. Suppose a full binary tree T has height k, and a leaf x occurs at level d, where d < k − 1, as illustrated in Fig. 8-4 (a). Let y be an internal node at level k − 1 whose two children a and b are leaves at level k.

    [Fig. 8-4: (a) Tree T, with leaf x at level d and leaves a, b at level k, children of node y at level k − 1. (b) Tree T’, in which a and b have been moved to be the children of x at level d + 1, and y has become a leaf.]

We will prove that this tree cannot be a minimum EPL tree. The reason is as follows. If we cut the two leaves a and b at level k and hook them to the node x, we transform T into a new binary tree T’ with the same number of leaves, as shown by Fig. 8-4 (b): x is no longer a leaf, y becomes a leaf, and a and b move from level k to level d + 1. Now the EPL of tree T’ is smaller than the EPL of T:

    EPL(T’) = EPL(T) + length(y) − length(x) + length change of {a, b}
            = EPL(T) + (k − 1) − d + 2(d + 1) − 2k
            = EPL(T) + (d + 1) − k
            < EPL(T),

because d < k − 1 implies (d + 1) − k < 0.

Corollary 1 The EPL of the minimum EPL tree with L leaves is larger than L(lg L − 1). (By Lemma 2, every leaf lies at depth at least h − 1, where h is the height; since h ≥ lg L by Lemma 1, EPL ≥ L(lg L − 1), and the inequality is strict because either h > lg L or all leaves lie at depth h.)

Theorem 2 Any comparison sorting algorithm for n numbers requires Ω(n lg n) comparisons in the average case.

Proof. Let T be the corresponding decision tree for the comparison sorting algorithm. As we discussed, the average number of comparisons can be measured by A(n) = EPL(T)/L, where L is the number of leaves in the tree T. By Corollary 1, EPL(T) > L(lg L − 1). We have

    A(n) = EPL(T)/L > lg L − 1.

Because L ≥ n!, we have A(n) > lg(n!) − 1 = Ω(n lg n).

From the above discussion, we know that, in order to break the Ω(n lg n) bound, we must design non-comparison sorting algorithms. In the following, we will discuss Counting sort, Radix sort, and Bucket sort.

4.2 Counting Sort

The counting sort does not rely upon comparisons between numbers, but it requires that:

(1) The n input numbers, a1, a2, …, an, must be integers.
(2) The n input numbers must be in a limited range: 0 ≤ a1, a2, …, an ≤ k, and k = O(n).

Let the input numbers be stored in array A[1..n] satisfying the above conditions. The following counting sort will produce the sorted sequence in array B[1..n].

Counting-Sort(A[1..n], B[1..n], k)
 1   for i ← 0 to k
 2       do C[i] ← 0
 3   for j ← 1 to n
 4       do C[A[j]] ← C[A[j]] + 1
 5   // C[i] = number of elements equal to i
 6   for i ← 1 to k
 7       do C[i] ← C[i] + C[i − 1]
 8   // C[i] = number of elements less than or equal to i
 9   for j ← n downto 1
10       do { i ← A[j]
11            B[C[i]] ← i
12            C[i] ← C[i] − 1
13          }
14   End

A careful reader may notice that the for loop at line 9 runs from n down to 1. Can we run it from 1 to n instead? We leave this question to the reader.

Example. Input: A[1..8], k = 5.

    index:  1  2  3  4  5  6  7  8
    A:      2  5  3  0  2  3  0  3

After line 4 of the algorithm, the array C becomes:

    index:  0  1  2  3  4  5
    C:      2  0  2  3  0  1

After line 7 of the algorithm, the array C becomes:

    index:  0  1  2  3  4  5
    C:      2  2  4  7  7  8

The following three steps show how the numbers A[8], A[7], and A[6] are placed in array B and how the array C is updated after each step.

(1) A[8] = 3 is placed in B[C[3]] = B[7], and C[3] is decreased to 6:

    B:  _  _  _  _  _  _  3  _
    C:  2  2  4  6  7  8

(2) A[7] = 0 is placed in B[C[0]] = B[2], and C[0] is decreased to 1:

    B:  _  0  _  _  _  _  3  _
    C:  1  2  4  6  7  8

(3) A[6] = 3 is placed in B[C[3]] = B[6], and C[3] is decreased to 5:

    B:  _  0  _  _  _  3  3  _
    C:  1  2  4  5  7  8

The final result is:

    index:  1  2  3  4  5  6  7  8
    B:      0  0  2  2  3  3  3  5

The complexity of the counting sort is obviously O(n + k) = O(n), since each loop in the algorithm takes either O(n) steps or Θ(k) steps.

4.3 Radix Sort

Assume each input number has d digits and each digit takes on one of k possible values. The Radix Sort sorts the numbers digit by digit, from the least significant (rightmost) digit to the most significant (leftmost) digit.

Radix-Sort(A, d)
1   for i ← 1 to d
2       do use a stable sort to sort array A on the ith digit
3   End

Example. Each column shows the array after one pass (sorted on the 1s digit, then the 10s digit, then the 100s digit):

    input    1s digit   10s digit   100s digit
     329       720         720         329
     457       355         329         355
     657       436         436         436
     839       457         839         457
     436       657         355         657
     720       329         457         720
     355       839         657         839

Theorem 8.3 Given n d-digit numbers in which each digit can take one of k possible values, Radix-Sort correctly sorts them in O(d(n + k)) time.

Proof. The correctness of Radix-Sort can be proved by induction on the digit position (Exercise 8.3-3); an easy way is to argue by contradiction. The complexity of Radix-Sort is O(d(n + k)) because we can use counting sort to sort on each digit in O(n + k) time.

4.4 Bucket Sort

Bucket Sort is another non-comparison sorting algorithm. For the Bucket Sort, we assume the n input numbers in A[1..n] lie in the interval between 0 and 1:

    0 ≤ A[i] < 1,  1 ≤ i ≤ n.

Moreover, we divide the interval [0, 1) into n equal-sized subintervals called buckets. Then, the n numbers are distributed among the n buckets. Because 0 ≤ A[i] < 1 for 1 ≤ i ≤ n, we have 0 ≤ nA[i] < n and 0 ≤ ⌊nA[i]⌋ ≤ n − 1. Therefore, we place A[i] in bucket j if ⌊nA[i]⌋ = j. After the distribution, we sort the numbers in each bucket, and concatenate the numbers in the n buckets in order.

Bucket-Sort(A[1..n])
1   for i ← 1 to n
2       do { j ← ⌊nA[i]⌋
3            insert A[i] into list B[j]
4          }
5   for i ← 0 to n − 1
6       do sort list B[i] with insertion sort
7   concatenate the lists B[0], B[1], …, B[n−1] in order
8   End

Example.

    A:  .78  .17  .39  .26  .72  .94  .21  .12  .23  .68

    B[0]:  (empty)
    B[1]:  .12 → .17
    B[2]:  .21 → .23 → .26
    B[3]:  .39
    B[4]:  (empty)
    B[5]:  (empty)
    B[6]:  .68
    B[7]:  .72 → .78
    B[8]:  (empty)
    B[9]:  .94

Complexity of the Bucket Sort

Let n_i be the number of elements placed in bucket i. Then

    T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i²),        (1)

where Σ_{i=0}^{n−1} n_i = n. Because

    n² = (Σ_{i=0}^{n−1} n_i)² = Σ_{i=0}^{n−1} n_i² + 2 Σ_{i<j} n_i n_j,

we have

    T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i²) = O(n²).

This is the worst-case complexity. It can be improved to O(n lg n) (Exercise 8.4-2).

Now, we prove that the average time is O(n). We compute the expectation of (1):

    E[T(n)] = E[Θ(n) + Σ_{i=0}^{n−1} O(n_i²)]
            = Θ(n) + Σ_{i=0}^{n−1} E[O(n_i²)]
            = Θ(n) + Σ_{i=0}^{n−1} O(E[n_i²]).

We will show that E[n_i²] = 2 − 1/n.
Let X_ij be the random variable such that X_ij = 1 if A[j] falls in bucket i, and X_ij = 0 otherwise. So

    n_i = Σ_{j=1}^{n} X_ij.

Then

    E[n_i²] = E[(Σ_{j=1}^{n} X_ij)²]
            = E[(X_i1 + X_i2 + … + X_in)²]
            = E[Σ_{j=1}^{n} X_ij² + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} X_ij X_ik]
            = Σ_{j=1}^{n} E[X_ij²] + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} E[X_ij X_ik].

We can assume that X_ij and X_ik (j ≠ k) are independent. Moreover, because each A[j] falls in bucket i with probability 1/n, we have Pr[X_ij = 1] = 1/n, so E[X_ij²] = 1² · (1/n) = 1/n and E[X_ij X_ik] = (1/n)(1/n) = (1/n)². We have

    E[n_i²] = Σ_{j=1}^{n} 1/n + Σ_{1≤j≤n} Σ_{1≤k≤n, k≠j} (1/n)²
            = 1 + n(n − 1)(1/n)²
            = 2 − 1/n.

Therefore,

    E[T(n)] = Θ(n) + Σ_{i=0}^{n−1} O(E[n_i²])
            = Θ(n) + Σ_{i=0}^{n−1} O(2 − 1/n)
            = Θ(n).
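To make the chapter’s pseudocode concrete, the three algorithms can be sketched in Python as follows. This is a sketch, not the notes’ own code; the function names, the key parameter, and the use of Python lists as buckets are my choices.

```python
def counting_sort(A, k, key=lambda x: x):
    """Stable counting sort of A, where 0 <= key(a) <= k for every a in A."""
    C = [0] * (k + 1)
    for a in A:
        C[key(a)] += 1             # C[i] = number of elements equal to i
    for i in range(1, k + 1):
        C[i] += C[i - 1]           # C[i] = number of elements <= i
    B = [None] * len(A)
    for a in reversed(A):          # right-to-left pass keeps the sort stable
        C[key(a)] -= 1
        B[C[key(a)]] = a
    return B

def radix_sort(A, d, k=10):
    """Sort d-digit numbers in base k, least significant digit first."""
    for i in range(d):             # digit i of a is (a // k**i) % k
        A = counting_sort(A, k - 1, key=lambda a: (a // k ** i) % k)
    return A

def bucket_sort(A):
    """Sort numbers with 0 <= A[i] < 1 by distributing them into n buckets."""
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[int(n * x)].append(x)    # bucket j = floor(n * x)
    for bucket in B:
        bucket.sort()              # stands in for the insertion sort of line 6
    return [x for bucket in B for x in bucket]
```

Note that the right-to-left loop in counting_sort mirrors line 9 of the Counting-Sort pseudocode; that stability is exactly what makes radix_sort correct.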