Chapter 11
Sorting

Assumptions:
Mostly, comparison-based sorts.
We will sort sequences only: Array, ArrayList, LinkedList.
For now, assume we want to sort x [0], x [1], …, x [n – 1], an array of n int values.

Insertion sort:

public static void insertionSort (int[ ] x)
{
    for (int i = 1; i < x.length; i++)
    {
        // At this point, x [0] <= x [1] <= … <= x [i-1].
        // SIFT x [i] DOWN TO ITS PROPER PLACE.
    } // for
} // insertionSort

Example: 80, 60, 90, 75, 55, 90, …

    Sorted prefix before the first pass:  80
    After i = 1:  60, 80
    After i = 2:  60, 80, 90
    After i = 3:  60, 75, 80, 90
    After i = 4:  55, 60, 75, 80, 90
    After i = 5:  55, 60, 75, 80, 90, 90

Sift x [i] down by comparing it to x [i-1], x [i-2], …, until the proper place for x [i] is found.

/**
 * Sorts a specified array of int values into ascending
 * order.
 * The worstTime(n) is O(n * n).
 *
 * @param x – the array to be sorted.
 *
 */
public static void insertionSort (int[ ] x)
{
    for (int i = 1; i < x.length; i++)
        for (int j = i; j > 0 && x [j-1] > x [j]; j--)
            swap (x, j, j-1);
} // method insertionSort

Example, starting with i = 3:

    60, 80, 90, 75, 55, 90   // i = 3; j = 3; swap
    60, 80, 75, 90, 55, 90   // i = 3; j = 2; swap
    60, 75, 80, 90, 55, 90   // i = 3; j = 1; no swap
    60, 75, 80, 90, 55, 90   // i = 4; j = 4; swap
    60, 75, 80, 55, 90, 90   // i = 4; j = 3; swap
    and so on

Worst case: x in descending order. Let n = x.length.

    Outer-loop iterations = n-1
    Inner-loop iterations = 1 + 2 + 3 + ... + (n-2) + (n-1) = Σ_{i=1}^{n-1} i = n(n-1) / 2

worstTime(n) ≈ max number of loop iterations = n-1 + n(n-1) / 2
worstTime(n) is quadratic in n.

averageTime(n) ≈ average number of loop iterations ≈ n-1 + n(n-1) / 4
(on average, x [i] is sifted about halfway down, so the inner loop runs about half as often as in the worst case)
averageTime(n) is quadratic in n.
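The worst-case count above can be checked with a small, self-contained sketch. The class name InsertionSortDemo, the swap helper, and the swap counter are assumptions added for illustration; the two loops are the ones shown in insertionSort.

```java
import java.util.Arrays;

class InsertionSortDemo {
    /** Sorts x into ascending order and returns the number of swaps performed. */
    public static long insertionSort(int[] x) {
        long swaps = 0;
        for (int i = 1; i < x.length; i++) {
            // At this point, x[0] <= x[1] <= ... <= x[i-1].
            for (int j = i; j > 0 && x[j - 1] > x[j]; j--) {
                swap(x, j, j - 1);
                swaps++;
            }
        }
        return swaps;
    }

    /** Hypothetical helper: exchanges x[a] and x[b]. */
    private static void swap(int[] x, int a, int b) {
        int t = x[a];
        x[a] = x[b];
        x[b] = t;
    }

    public static void main(String[] args) {
        int[] descending = {5, 4, 3, 2, 1};   // worst case for n = 5
        long swaps = insertionSort(descending);
        // For n = 5 the worst case is n(n-1)/2 = 10 swaps.
        System.out.println(Arrays.toString(descending) + ", swaps = " + swaps);
    }
}
```

Each swap corresponds to one inner-loop iteration, so a descending array of n elements should report n(n-1)/2 swaps.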
Exercise: Determine the exact number of inner-loop iterations (= number of swaps) in applying insertion sort to the following array of int values:

    96, 86, 76, 56, 66

What if we want a version of insertionSort to sort an array of String elements into lexicographic order?

public static void insertionSort (Object[ ] x)
{
    for (int i = 1; i < x.length; i++)
        for (int j = i; j > 0 && x [j-1] > x [j]; j--)   // Illegal!
            swap (x, j, j-1);
} // method insertionSort

Illegal! The > operator is not defined for operands of type Object. Instead, cast to Comparable and call compareTo:

public static void insertionSort (Object[ ] x)
{
    for (int i = 1; i < x.length; i++)
        for (int j = i; j > 0 && (((Comparable)x [j-1]).compareTo (x [j]) > 0); j--)
            swap (x, j, j-1);
} // method insertionSort

To call this method:

    insertionSort (words);

The Comparable interface is appropriate when you want the elements sorted by the “natural” ordering, for example, String objects in lexicographic order.

What if you want the elements sorted into an unnatural order? For example, Integer objects in decreasing order, or String objects sorted by the length of the string.

public interface Comparator<T>
{
    /**
     * Compares two specified elements.
     *
     * @param element1 – one of the elements.
     * @param element2 – the other element.
     *
     * @return a negative integer, 0, or a positive
     *         integer, depending on whether element1
     *         is less than, equal to, or greater than
     *         element2.
     */
    int compare (T element1, T element2);
} // interface Comparator

For example, to sort an array intArray of Integer elements into decreasing order:

public class Decreasing implements Comparator<Integer>
{
    public int compare (Integer i, Integer j)
    {
        return j.compareTo (i);
    } // method compare
} // class Decreasing

public static <T> void insertionSort (T[ ] x, Comparator<T> comp)
{
    for (int i = 1; i < x.length; i++)
        for (int j = i; j > 0 && comp.compare (x [j-1], x [j]) > 0; j--)
            swap (x, j, j-1);
} // method insertionSort

insertionSort (intArray, new Decreasing());

Now suppose you want to sort stringArray, an array of strings, in increasing order of the lengths of the strings, with lexicographic comparison of equal-length strings, so that

    “yes” < “here” < “true” < “maybe”

public class ByLength implements Comparator<String>
{
    /**
     * Compares two specified String objects
     * lexicographically if they have the same length, and
     * otherwise returns the difference in their lengths.
     *
     * @param s1 – one of the specified String objects.
     * @param s2 – the other specified String object.
     *
     * @return s1.compareTo (s2) if s1 and s2 have the
     *         same length; otherwise, return
     *         s1.length() - s2.length().
     *
     */
    public int compare (String s1, String s2)
    {
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == len2)
            return s1.compareTo (s2);
        return len1 - len2;
    } // method compare
} // class ByLength

insertionSort (stringArray, new ByLength ());

The insertionSort method is unchanged!

The Comparator interface allows you to compare elements in a class any way you want, even if you cannot modify the class. For example, you cannot modify the Integer or String classes.

Exercise: Determine the ordering of “Yes”, “here”, “true”, “maybe”, and “yes” by the following version of compare:

public int compare (String s1, String s2)
{
    return s1.length() - s2.length();
} // method compare

How Fast Can We Sort?
A decision tree is a binary tree in which each non-leaf represents a comparison between two elements and each leaf represents a sorted sequence of those elements.

Left branch: Yes
Right branch: No

Example: Apply insertion sort to a1, a2, a3.

    a1 < a2?
      Yes: a2 < a3?
             Yes: a1 a2 a3
             No:  a1 < a3?
                    Yes: a1 a3 a2
                    No:  a3 a1 a2
      No:  a1 < a3?
             Yes: a2 a1 a3
             No:  a2 < a3?
                    Yes: a2 a3 a1
                    No:  a3 a2 a1

A decision tree has one leaf for each permutation of the n elements to be sorted. The number of permutations of n distinct elements is n!. So a decision tree to sort n elements must have n! leaves.

By the binary tree theorem, for any non-empty tree t,

    leaves(t) <= 2^height(t)

Since n! = leaves(t), we must have

    n! <= 2^height(t)

which implies that

    log2(n!) <= height(t)

In the context of a decision tree, height(t) represents the maximum number of comparisons needed to sort the n elements. So log2(n!) <= the maximum number of comparisons to sort n elements.

Therefore, worstTime(n) >= log2(n!)

By concept exercise 11.7, log2(n!) >= (n/2) log2(n/2)

So worstTime(n) >= (n/2) log2(n/2)

So worstTime(n) is Ω(n log n) for any comparison-based sort.

What can we say about averageTime(n)?

    averageTime(n) >= average number of comparisons
                    = total number of comparisons / n!

In a decision tree, what is the total number of comparisons equal to?

Hint: The length of each path from the root to a leaf equals the number of comparisons in that path.

The total number of comparisons is equal to the sum of all root-to-leaf path lengths.

E(t), the external path length of tree t, is the sum of all root-to-leaf path lengths in t. So the average number of comparisons is E(t) / n!

In a decision tree, the number of leaves is n!. So, by the external path length theorem,

    averageTime(n) >= average # comparisons
                    = E(t) / n!
                   >= (n! / 2) floor (log2(n!)) / n!
                    = (1 / 2) floor (log2(n!))
                   >= (1 / 4) log2(n!)
                   >= (n / 8) log2(n / 2)

For any comparison-based sort, averageTime(n) is Ω(n log n).

Exercise: Suppose, for some sort method, worstTime(n) is linear logarithmic in n. True or false:
1. averageTime(n) must be linear logarithmic in n.
2. averageTime(n) is O(n log n).
3. averageTime(n) is Ω(n log n).

Fast Sorts

Merge Sort

Given an array x of objects, keep splitting it into subarrays until the size of a subarray is less than 7. Apply insertion sort to each such subarray, and then merge the sorted subarrays back together.

For example, suppose x.length = 25. Here is an outline based on size:

    25: split into 12 and 13
        12: split into 6 and 6
             6: insertion sort
             6: insertion sort
            merge 12
        13: split into 6 and 7
             6: insertion sort
             7: split into 3 and 4
                 3: insertion sort
                 4: insertion sort
                merge 7
            merge 13
        merge 25

To simplify the merging, we’ll use an auxiliary array. Suppose we want to merge two sorted subarrays of 5 elements each:

    left:  20 30 44 71 95      right:  15 17 28 33 88

Compare 20 and 15. 15, the smaller of the two, is copied to the auxiliary array and the right index is incremented:

    auxiliary array: 15

Compare 20 and 17. 17, the smaller of the two, is appended to the auxiliary array and the right index is again incremented:

    auxiliary array: 15 17

Compare 20 and 28. 20, the smaller of the two, is copied to the auxiliary array and the left index is incremented:

    auxiliary array: 15 17 20

Compare 30 and 28. 28, the smaller of the two, is copied to the auxiliary array and the right index is incremented. And so on.

/**
 * Sorts a specified array of objects according to the
 * compareTo method in the specified class of elements.
 * The worstTime(n) is linear-logarithmic in n.
 *
 * @param a – the array of objects to be sorted.
 *
 */
public static void sort(Object[ ] a)
{
    Object aux[ ] = (Object[ ])a.clone();
    mergeSort(aux, a, 0, a.length);
}

/**
 * Sorts, by the Comparable interface, a specified range of a
 * specified array into the same range of another specified array.
 * The worstTime(k) is linear-logarithmic in k, where k is the
 * size of the subarray.
 *
 * @param src – the specified array whose elements are to be
 *              sorted into another specified array.
 * @param dest – the specified array whose subarray is to be
 *               sorted.
 * @param low – the smallest index in the range to be sorted.
 * @param high – 1 + the largest index in the range to be
 *               sorted.
 */
private static void mergeSort (Object src[ ], Object dest[ ], int low, int high)

Example with 10 elements:

    aux   59 46 32 80 46 55 87 43 44 81
          mergeSort (aux, a, 0, 10)

    a     59 46 32 80 46 | 55 87 43 44 81
          mergeSort (a, aux, 0, 5): insertion sort
          mergeSort (a, aux, 5, 10): insertion sort

    aux   32 46 46 59 80 | 43 44 55 81 87
          merge

    a     32 43 44 46 46 55 59 80 81 87

Example with 20 elements:

    aux   59 46 32 80 46 55 87 43 44 81 95 12 17 80 75 33 40 61 16 50
          mergeSort (aux, a, 0, 20)

    a     59 46 32 80 46 55 87 43 44 81 | 95 12 17 80 75 33 40 61 16 50
          mergeSort (a, aux, 0, 10)
          mergeSort (a, aux, 10, 20)

    aux   59 46 32 80 46 | 55 87 43 44 81 | 95 12 17 80 75 | 33 40 61 16 50
          mergeSort (aux, a, 0, 5): insertion sort
          mergeSort (aux, a, 5, 10): insertion sort
          mergeSort (aux, a, 10, 15): insertion sort
          mergeSort (aux, a, 15, 20): insertion sort

    a     32 46 46 59 80 | 43 44 55 81 87 | 12 17 75 80 95 | 16 33 40 50 61
          merge (first two runs)
          merge (last two runs)

    aux   32 43 44 46 46 55 59 80 81 87 | 12 16 17 33 40 50 61 75 80 95
          merge

    a     12 16 17 32 33 40 43 44 46 46 50 55 59 61 75 80 80 81 87 95

private static void mergeSort (Object src[ ], Object dest[ ], int low, int high)
{
    int length = high - low;

    // Use Insertion Sort for small subarrays.
    if (length < 7)
    {
        for (int i = low; i < high; i++)
            for (int j = i; j > low && ((Comparable)dest[j-1]).compareTo(dest[j]) > 0; j--)
                swap (dest, j, j-1);
        return;
    } // if length < 7

    // Sort left and right halves of src into dest.
    int mid = (low + high) / 2;
    mergeSort (dest, src, low, mid);
    mergeSort (dest, src, mid, high);

    // If the largest element of the left subarray is <= the smallest element
    // of the right subarray, the range is already sorted: copy src to dest.
    if (((Comparable)src [mid-1]).compareTo (src [mid]) <= 0)
    {
        System.arraycopy (src, low, dest, low, length);
        return;
    }

    // Merge sorted subarrays in src into dest.
    for (int i = low, p = low, q = mid; i < high; i++)
        if (q >= high || (p < mid && ((Comparable)src[p]).compareTo (src[q]) <= 0))
            dest [i] = src [p++];
        else
            dest [i] = src [q++];
} // method mergeSort

[Figure: recursion tree of subarray sizes (n; n/2, n/2; n/4, n/4, n/4, n/4; …), with max comparisons = n at each level.]

There are about log2 n levels and at most n comparisons per level, so for merge sort, worstTime(n) is O(n log n).

Merge sort is comparison-based, so averageTime(n) is Ω(n log n), and averageTime(n) <= worstTime(n). Therefore averageTime(n) IS linear-logarithmic in n.

Exercise: Show the steps, including the calls to mergeSort, in merge sorting the following:

    aux   27 26 25 … 2 1
          mergeSort (aux, a, 0, 27)

    a     ?

Arrays.java also has a version of merge sort that takes a Comparator parameter:

    public static <T> void sort(T[] a, Comparator<? super T> c)

To call this version:

    Arrays.sort (myArray, new ByLength ());

Collections.java also has two versions of merge sort: one to sort a List<T> in the natural order, and one, with a second parameter of type Comparator, to sort a List<T> in the order given by that comparator.

    Collections.sort (myList);
    Collections.sort (myList, new Reverse());

Quick Sort

/**
 * Sorts a into ascending order.
 * The worstTime(n) is O(n * n), and averageTime(n) is linear-
 * logarithmic in n.
 */
public static void sort (int[ ] a)
{
    sort1(a, 0, a.length);
} // method sort

/**
 * Sorts the array x, from index off (inclusive) to index
 * off + len (exclusive), into ascending order.
 */
private static void sort1(int x[ ], int off, int len)

If len < 7, use insertion sort.
Otherwise, partition about a pivot element. For now, take the pivot to be the median of the first, middle and last elements.

    59 46 32 80 46 55 87 43 44 81 95 12 17 80 75 33 40 61 16 50

    v = pivot = median of x [0], x [0 + 20 / 2], x [19]
              = median of {59, 95, 50}
              = ?

In a loop, we will partition x into a left subarray of items <= v and a right subarray of items >= v. We then recursively call sort1 for the left and right subarrays.
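The pivot choice described above can be sketched as a small helper that returns the index of the median of three array elements. The class name MedianOfThree and the method name medianIndex are assumptions made for this illustration, not names taken from the slides.

```java
class MedianOfThree {
    /**
     * Returns whichever of the indices a, b, c holds the median
     * of x[a], x[b] and x[c].
     */
    public static int medianIndex(int[] x, int a, int b, int c) {
        if (x[a] < x[b]) {
            // x[a] < x[b]: the median is x[b] unless x[c] falls below it.
            return x[b] < x[c] ? b : (x[a] < x[c] ? c : a);
        }
        // x[a] >= x[b]: the median is x[b] unless x[c] rises above it.
        return x[b] > x[c] ? b : (x[a] > x[c] ? c : a);
    }

    public static void main(String[] args) {
        int[] x = {59, 46, 32, 80, 46, 55, 87, 43, 44, 81,
                   95, 12, 17, 80, 75, 33, 40, 61, 16, 50};
        int off = 0, len = x.length;
        int m = medianIndex(x, off, off + len / 2, off + len - 1);
        System.out.println("pivot index = " + m + ", pivot value = " + x[m]);
    }
}
```

Returning an index rather than a value keeps the helper usable when the pivot position itself is needed.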
Here is the basic loop and calls:

int v = x[m];    // v is the pivot; m is the index of the pivot
int b = off, c = off + len - 1;
while (true)
{
    while (x[b] < v)
        b++;
    while (x[c] > v)
        c--;
    if (b > c)
        break;
    swap(x, b++, c--);
}
if (c - off + 1 > 1)
    sort1 (x, off, c - off + 1);
if (off + len - b > 1)
    sort1 (x, b, off + len - b);

Increment b until x [b] >= 59; decrement c until x [c] <= 59. Then swap x [b] with x [c] and bump b and c.

    59 46 32 80 46 55 87 43 44 81 95 12 17 80 75 33 40 61 16 50    b = 0,  c = 19: swap
    50 46 32 80 46 55 87 43 44 81 95 12 17 80 75 33 40 61 16 59    b = 3,  c = 18: swap
    50 46 32 16 46 55 87 43 44 81 95 12 17 80 75 33 40 61 80 59    b = 6,  c = 16: swap
    50 46 32 16 46 55 40 43 44 81 95 12 17 80 75 33 87 61 80 59    b = 9,  c = 15: swap
    50 46 32 16 46 55 40 43 44 33 95 12 17 80 75 81 87 61 80 59    b = 10, c = 12: swap
    50 46 32 16 46 55 40 43 44 33 17 12 95 80 75 81 87 61 80 59    b = 11, c = 11
    50 46 32 16 46 55 40 43 44 33 17 12 95 80 75 81 87 61 80 59    b = 12, c = 11: b > c, so break

c = 11 and b = 12
Every element in x [0 … 11] <= 59
Every element in x [12 … 19] >= 59

So we now call:

    sort1 (x, 0, 12);
    sort1 (x, 12, 8);

For a refinement, elements equal to the pivot will be moved to locations between c and b, so they are not involved in any further partitioning. During partitioning, elements equal to the pivot are stored at either end of the subarray, and then moved to the middle after partitioning.

int v = x[m];

// Establish Invariant: v* (<v)* (>v)* v*
int a = off, b = a, c = off + len - 1, d = c;
while (true)
{
    while (b <= c && x[b] <= v)
    {
        if (x[b] == v)
            swap(x, a++, b);
        b++;
    }
    while (c >= b && x[c] >= v)
    {
        if (x[c] == v)
            swap(x, c, d--);
        c--;
    }
    if (b > c)
        break;
    swap(x, b++, c--);
}

Before partitioning (pivot v = 59, b = 0, c = 19):

    59 46 59 80 46 55 87 43 44 81 95 12 17 80 75 33 40 59 16 50

After partitioning (c = 12, b = 13; the elements equal to 59 are at the two ends):

    59 59 46 50 46 55 16 43 44 40 33 12 17 80 75 95 81 80 87 59

After moving the elements equal to 59 to the middle:

    12 17 46 50 46 55 16 43 44 40 33 59 59 59 75 95 81 80 87 80

    sort1 (x, 0, 11);
    sort1 (x, 14, 6);

For another refinement, suppose len, the size of the subarray to be sorted, is > 40. Then split the subarray into 3 segments, and take the median of the first, middle and last elements of each segment.
pivot = median of the 3 medians

For example, suppose we have len = 90. Then we split x up into subarrays x [0 … 29], x [30 … 59] and x [60 … 89].

    segment    x [0 … 29]      x [30 … 59]     x [60 … 89]
    index       0   15   29    30   45   59    60   75   89
    element    55   87   22    92   33   12    21   46   67
    median         55              33              46

    pivot = median of {55, 33, 46} = 46

For the estimate of averageTime(n) and worstTime(n), we can imagine that quick sort creates a binary search tree. The root of the tree is the pivot. After the first partitioning, the pivot of the left subarray becomes the root of the left subtree, and so on.

Suppose x = {0, 1, 2, 3, … 98, 99}. Already in order!

    Best-Case Partitioning (top levels of the tree of pivots):

                          50
                25                  75
           12        38        63        88
          6  19    32  44    57  69    82  94

Best-Case Partitioning:
log2(n/7) levels; at most n comparisons per level
Total number of comparisons <= n log2 n

Average-Case Partitioning:
Average height of a binary search tree: logarithmic in n
≈ log2(n/7) levels; at most n comparisons per level
Total number of comparisons ≈ n log2 n
The averageTime(n): linear logarithmic in n.

Worst-Case Partitioning?

Suppose x = {1, 2, . . . , 18, 0, 38, 19, 20, 21, . . . , 37}

[Figure: tree of pivots for this x, rooted at 37, with the # of comparisons at successive levels: 39, 37, 35, 33, 31, …]

Worst-Case Partitioning:

    # of comparisons ≈ n + (n-2) + (n-4) + … + 8
                     = Σ_{i=4}^{n/2} (2i)
                     ≈ 2 * ((n/2) * (n/2 + 1) / 2)
                     ≈ n^2 / 4

The worstTime(n) is quadratic in n.

Exercise: Suppose x starts out as

    10, 9, 8, 7, 6, 16, 17, 18, 19, 20, 5, 4, 3, 2, 1, 11, 12, 13, 14, 15

The median of x [0], x [10], x [19] = 10. Just before the first swap of x [b] and x [c], b = 5 and c = 14. After all swaps, what does x contain? What are the recursive calls to quick sort?

    sort1 (x, ?, ?);
    sort1 (x, ?, ?);
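To experiment with the partitioning scheme from this section, here is a minimal runnable sketch of quick sort with the equal-to-pivot refinement. The slides show only the partitioning loop, so the rest is an assumption written to match the description "moved to the middle after partitioning": the class name QuickSortSketch, the helpers med3, swap and vecswap, and the swap-back step are all hypothetical. The median-of-3-medians refinement for len > 40 is omitted.

```java
import java.util.Arrays;

class QuickSortSketch {
    public static void sort(int[] a) {
        sort1(a, 0, a.length);
    }

    /** Sorts x[off .. off+len-1] into ascending order. */
    private static void sort1(int[] x, int off, int len) {
        if (len < 7) {                       // insertion sort on small subarrays
            for (int i = off; i < off + len; i++)
                for (int j = i; j > off && x[j - 1] > x[j]; j--)
                    swap(x, j, j - 1);
            return;
        }
        // Pivot: median of the first, middle and last elements.
        int v = x[med3(x, off, off + len / 2, off + len - 1)];

        // Establish invariant: v* (<v)* (>v)* v*
        int a = off, b = a, c = off + len - 1, d = c;
        while (true) {
            while (b <= c && x[b] <= v) {
                if (x[b] == v) swap(x, a++, b);
                b++;
            }
            while (c >= b && x[c] >= v) {
                if (x[c] == v) swap(x, c, d--);
                c--;
            }
            if (b > c) break;
            swap(x, b++, c--);
        }
        // Assumed step: move the pivot-valued elements from both ends to the middle.
        int n = off + len;
        int s = Math.min(a - off, b - a);
        vecswap(x, off, b - s, s);
        s = Math.min(d - c, n - d - 1);
        vecswap(x, b, n - s, s);

        // Recursively sort the elements < v and the elements > v.
        if ((s = b - a) > 1) sort1(x, off, s);
        if ((s = d - c) > 1) sort1(x, n - s, s);
    }

    /** Returns the index (a, b or c) of the median of x[a], x[b], x[c]. */
    private static int med3(int[] x, int a, int b, int c) {
        return x[a] < x[b]
            ? (x[b] < x[c] ? b : (x[a] < x[c] ? c : a))
            : (x[b] > x[c] ? b : (x[a] > x[c] ? c : a));
    }

    private static void swap(int[] x, int i, int j) {
        int t = x[i];
        x[i] = x[j];
        x[j] = t;
    }

    /** Swaps x[a .. a+n-1] with x[b .. b+n-1]. */
    private static void vecswap(int[] x, int a, int b, int n) {
        for (int i = 0; i < n; i++, a++, b++) swap(x, a, b);
    }

    public static void main(String[] args) {
        int[] x = {59, 46, 59, 80, 46, 55, 87, 43, 44, 81,
                   95, 12, 17, 80, 75, 33, 40, 59, 16, 50};
        sort(x);
        System.out.println(Arrays.toString(x));
    }
}
```

Adding a print statement at the top of sort1 is one way to see the recursive calls for a given input, for example when checking an answer to the exercise above.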