Advanced Sorting Methods: Shellsort

Shellsort is an extension of insertion sort that gains speed by allowing exchanges of elements that are far apart.

The idea: Rearrange the file so that taking every h-th element (starting anywhere) yields a sorted file. Such a file is called h-sorted. That is, an h-sorted file is h independent sorted files interleaved together.

Example: Let h = 13 during the first step, h = 4 during the second step, and h = 1 during the final step (ordinary insertion sort at this step).

Step 1 (h = 13): compare and exchange elements that are 13 positions apart
    15  8  7  3  2 14 11  1  5  9  4 12 13  6 10
     6  8  7  3  2 14 11  1  5  9  4 12 13 15 10

Step 2 (h = 4):
     6  8  7  3  2 14 11  1  5  9  4 12 13 15 10
     2  8  4  1  5  9  7  3  6 14 10 12 13 15 11

Step 3 (h = 1):
     2  8  4  1  5  9  7  3  6 14 10 12 13 15 11
     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

To implement Shell sort we need a helper method, SegmentedInsertionSort.

Input: A, input array; N, number of elements; H, distance between elements in the same segment.
Output: Array, A, H-sorted.

Algorithm SegmentedInsertionSort (A, N, H)
    for l := H + 1 to N do
        j := l - H                /* j counts down through the current segment */
        while j > 0 do
            if precedes (A[j + H], A[j]) then
                swap (A[j + H], A[j])
                j := j - H
            else
                j := 0
            endif
        endwhile
    endfor

The Shell sort method now becomes:

Input: A, input array; N, number of elements.
Output: Array, A, sorted.

Algorithm ShellSort (A, N)
    H := N / 2                    /* integer division */
    while H > 0 do
        SegmentedInsertionSort (A, N, H)
        H := H / 2
    endwhile

Notes:
1. H = H / 2 is a "bad" increment sequence, because it repeatedly compares the same values, while some values are not compared to each other until H = 1. For example, when the increments are powers of two, elements in even positions are never compared with elements in odd positions before the final pass.
2. Any increment sequence of values of H can be used, as long as the last value is 1. Here are examples of "good" increment sequences, generated upward from 1 and used in decreasing order:
   H = 3 * H + 1 gives the increment sequence ..., 1093, 364, 121, 40, 13, 4, 1.
   H = 2 * H + 1 gives the increment sequence ..., 127, 63, 31, 15, 7, 3, 1.

Efficiency of Shell sort

Let the increment sequence be H := H / 2, for example ..., 64, 32, 16, 8, 4, 2, 1. Then:
– The number of calls to SegmentedInsertionSort is O(log N).
– The outer loop of each SegmentedInsertionSort is O(N).
– The inner loop of each SegmentedInsertionSort depends on the current order of the data within that segment.
Therefore, the total number of comparisons in this case is O(A * N * log N), where A is an unknown factor that depends on the data.

Empirical results for a better increment sequence, H = 3 * H + 1, show the average efficiency of Shell sort in terms of the number of comparisons to be O(N * (log N)^2), which is almost O(N^1.5).

Advanced sorting: Merge sort

The idea: Given two files in ascending order, put them into a third file that is also arranged in ascending order.

Example:
    file A    3  7  9 12 13 14
    file B    1  5  8 10 17 19
    file C    1  3  5  7  8  9 10 12 13 14 17 19

The efficiency of this process is O(N).

The algorithm (let us call this procedure merge):
1. Compare the two current numbers, one from each file.
2. Transfer the smaller number to the third file.
3. Advance to the next number in the file the smaller number came from, and go to 1.
Repeat until one of the files is emptied. Then move the numbers left in the other file to the third file.

Algorithm merge (source, destination, lower, mid, upper)
Input: source array, and a copy of it, destination; lower, mid and upper are integers defining the sublists to be merged, source[lower..mid] and source[mid+1..upper], each already sorted.
Output: destination[lower..upper] sorted.

    int s1 := lower; int s2 := mid + 1; int d := lower
    while (s1 <= mid and s2 <= upper) {
        if (precedes (source[s1], source[s2])) {
            destination[d] := source[s1]; s1 := s1 + 1
        } else {
            destination[d] := source[s2]; s2 := s2 + 1
        } // end if
        d := d + 1
    } // end while
    if (s1 > mid) {
        while (s2 <= upper) { destination[d] := source[s2]; s2 := s2 + 1; d := d + 1 }
    } else {
        while (s1 <= mid) { destination[d] := source[s1]; s1 := s1 + 1; d := d + 1 }
    } // end if

Efficiency of merge: O(N), where N is the number of items in source and destination.

Note that merge takes two already sorted files. Therefore, we need another procedure, mergeSort, to actually sort these files. mergeSort is a recursive procedure, which at each step takes a file to be sorted, and produces two sorted halves of this file.
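The merge procedure described above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: it uses 0-based, inclusive indices instead of the 1-based indices of the pseudocode, and it assumes that precedes is an ordinary comparison, written here as `<=` so that ties are taken from the left sublist.

```python
def merge(source, destination, lower, mid, upper):
    """Merge sorted source[lower..mid] and source[mid+1..upper]
    into destination[lower..upper] (0-based, inclusive bounds)."""
    s1, s2, d = lower, mid + 1, lower
    # Repeatedly transfer the smaller of the two current items.
    while s1 <= mid and s2 <= upper:
        if source[s1] <= source[s2]:  # assumption: precedes behaves like <=
            destination[d] = source[s1]
            s1 += 1
        else:
            destination[d] = source[s2]
            s2 += 1
        d += 1
    # One sublist is exhausted; copy whatever is left of the other.
    while s1 <= mid:
        destination[d] = source[s1]
        s1 += 1
        d += 1
    while s2 <= upper:
        destination[d] = source[s2]
        s2 += 1
        d += 1


# Files A and B from the example, stored back to back in one array.
file_a = [3, 7, 9, 12, 13, 14]
file_b = [1, 5, 8, 10, 17, 19]
source = file_a + file_b
destination = [None] * len(source)
merge(source, destination, 0, len(file_a) - 1, len(source) - 1)
print(destination)  # [1, 3, 5, 7, 8, 9, 10, 12, 13, 14, 17, 19]
```

Each item is transferred exactly once, which is where the O(N) cost of merge comes from.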
Because mergeSort repeatedly calls merge, and merge reads from one array and writes into another of the same size, we must create a copy of the original array, source, which we will call destination.

Algorithm mergeSort (source, destination, lower, upper)
Input: source array; a copy of source, destination; lower and upper are integers defining the current sublist to be sorted.
Output: destination[lower..upper] sorted.

    if (lower < upper) {
        mid := (lower + upper) / 2
        mergeSort (destination, source, lower, mid)
        mergeSort (destination, source, mid + 1, upper)
        merge (source, destination, lower, mid, upper)
    }

Algorithm Sort (A, N)
Input: Array, A, of items to be sorted; integer N defining the number of items to be sorted.
Output: Array, A, sorted.

    create destination[N] and initialize it as a copy of A
    mergeSort (destination, A, 1, N)    /* A plays the destination role, so the final merge writes the sorted result into A */

Quick sort

The idea (assume the list of items to be sorted is represented as an array):
1. Select a data item, called the pivot, which will be placed in its proper place at the end of the current step. Remove it from the array.
2. Scan the array from right to left, comparing the data items with the pivot until an item with a smaller value is found. Put this item in the pivot’s place.
3. Scan the array from left to right, comparing data items with the pivot, and find the first item which is greater than the pivot. Place it in the position freed by the item moved at the previous step.
4. Continue alternating steps 2-3 until no more exchanges are possible. Place the pivot in the empty space, which is the proper place for that item.
5. Consider the sub-file to the left of the pivot, and repeat the same process.
6. Consider the sub-file to the right of the pivot, and repeat the same process.
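The six steps above can be sketched in Python as follows. This is one possible reading of the steps, not a definitive implementation: indices are 0-based, the pivot is assumed to be the leftmost item of the current sub-file, and items equal to the pivot are simply skipped by both scans. The names `partition` and `quick_sort` are chosen for this illustration.

```python
def partition(a, lo, hi):
    """Place the pivot a[lo] in its final position within a[lo..hi]
    and return that position (0-based, inclusive bounds)."""
    pivot = a[lo]                      # step 1: remove the pivot, leaving a hole at lo
    while lo < hi:
        # Step 2: scan right to left for an item smaller than the pivot.
        while lo < hi and a[hi] >= pivot:
            hi -= 1
        if lo < hi:
            a[lo] = a[hi]              # fill the hole at lo; the hole moves to hi
            lo += 1
        # Step 3: scan left to right for an item greater than the pivot.
        while lo < hi and a[lo] <= pivot:
            lo += 1
        if lo < hi:
            a[hi] = a[lo]              # fill the hole at hi; the hole moves to lo
            hi -= 1
    a[lo] = pivot                      # step 4: lo == hi is the remaining hole
    return lo


def quick_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place by partitioning and recursing (steps 5 and 6)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot_point = partition(a, lo, hi)
    quick_sort(a, lo, pivot_point - 1)   # sub-file to the left of the pivot
    quick_sort(a, pivot_point + 1, hi)   # sub-file to the right of the pivot


items = [15, 8, 7, 3, 2, 14, 11, 1, 5, 9, 4, 12, 13, 6, 10]
first_pass = list(items)
print(partition(first_pass, 0, len(first_pass) - 1))  # 14
print(first_pass)  # [10, 8, 7, 3, 2, 14, 11, 1, 5, 9, 4, 12, 13, 6, 15]
quick_sort(items)
print(items)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

With 15 as the first pivot, nothing in the list is greater than it, so the first partitioning pass only moves 10 to the front and puts 15 in the last position.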
Example

Consider the following list of items, and let the pivot be the leftmost item. An empty pair of parentheses, ( ), marks the position that is currently free.

Step 1:
    15 8 7 3 2 14 11 1 5 9 4 12 13 6 10
    10 8 7 3 2 14 11 1 5 9 4 12 13 6 15

Step 2:
    10 8 7 3 2 14 11 1 5 9 4 12 13 6 15
    6 8 7 3 2 14 11 1 5 9 4 12 13 ( ) 15
    6 8 7 3 2 ( ) 11 1 5 9 4 12 13 14 15
    6 8 7 3 2 4 11 1 5 9 ( ) 12 13 14 15
    6 8 7 3 2 4 ( ) 1 5 9 11 12 13 14 15
    6 8 7 3 2 4 9 1 5 ( ) 11 12 13 14 15
    6 8 7 3 2 4 9 1 5 10 11 12 13 14 15

Example (contd.)

Step 3:
    6 8 7 3 2 4 9 1 5 10 11 12 13 14 15
    5 8 7 3 2 4 9 1 ( ) 10 11 12 13 14 15
    5 ( ) 7 3 2 4 9 1 8 10 11 12 13 14 15
    5 1 7 3 2 4 9 ( ) 8 10 11 12 13 14 15
    5 1 ( ) 3 2 4 9 7 8 10 11 12 13 14 15
    5 1 4 3 2 ( ) 9 7 8 10 11 12 13 14 15
    5 1 4 3 2 6 9 7 8 10 11 12 13 14 15

Step 4 (the sub-files to the left and to the right of 6):
    5 1 4 3 2 6 9 7 8 10 11 12 13 14 15
    2 1 4 3 ( ) 6 8 7 9 10 11 12 13 14 15
    2 1 4 3 5 6 8 7 9 10 11 12 13 14 15

Example (contd.)

Step 5:
    2 1 4 3 5 6 8 7 9 10 11 12 13 14 15
    1 ( ) 4 3 5 6 7 ( ) 9 10 11 12 13 14 15
    1 2 4 3 5 6 7 8 9 10 11 12 13 14 15

Step 6:
    1 2 4 3 5 6 7 8 9 10 11 12 13 14 15
    1 2 3 ( ) 5 6 7 8 9 10 11 12 13 14 15
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

The partition method

Algorithm partition (A, lo, hi)
Input: Array, A, of items to be sorted; lo and hi, integers defining the scope of the array to be sorted.
Output: Taking A[lo] as the pivot value, array A is returned in partitioned form, together with pivotPoint, the index of the final position of the pivot.

    int pivot := A[lo]
    while (lo < hi) {
        while ((lo < hi) and precedes (pivot, A[hi]))
            hi := hi - 1
        if (hi <> lo) { A[lo] := A[hi]; lo := lo + 1 }
        while ((lo < hi) and precedes (A[lo], pivot))
            lo := lo + 1
        if (hi <> lo) { A[hi] := A[lo]; hi := hi - 1 }
    } // end while
    A[hi] := pivot; pivotPoint := hi
    return pivotPoint

The quickSort and sort procedures

Algorithm quickSort (A, lo, hi)
Input: Array, A, of items to be sorted; lo and hi, integers defining the scope of the array to be sorted.
Output: Array, A, sorted.
    int pivotPoint := partition (A, lo, hi)
    if (lo < pivotPoint) quickSort (A, lo, pivotPoint - 1)
    if (hi > pivotPoint) quickSort (A, pivotPoint + 1, hi)

Algorithm Sort (A, N)
Input: Array, A, of items to be sorted; integer N defining the number of items to be sorted.
Output: Array, A, sorted.

    quickSort (A, 1, N)

Example and the partitioning method modified

Consider the same list as in the previous example. Let the pivot be the rightmost item, and let us scan the file from both ends simultaneously, exchanging elements that are out of order. When the two pointers cross, exchange the pivot with the leftmost element of the right sub-file.

Step 1:
    15 8 7 3 2 14 11 1 5 9 4 12 13 6 10
    6 8 7 3 2 14 11 1 5 9 4 12 13 15 10
    6 8 7 3 2 4 11 1 5 9 14 12 13 15 10
    6 8 7 3 2 4 9 1 5 11 14 12 13 15 10
    6 8 7 3 2 4 9 1 5 10 14 12 13 15 11

Example modified (contd.)

Steps 2 - end (the first rows show only the left sub-file being partitioned):
    6 8 7 3 2 4 9 1 5 10 14 12 13 15 11
    1 8 7 3 2 4 9 6 5
    1 4 7 3 2 8 9 6 5
    1 4 2 3 7 8 9 6 5
    1 4 2 3 5 8 9 6 7 10 11 12 13 15 14
    1 2 4 3 5 6 9 8 7 10 11 12 13 14 15
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Static representation of the partitioning process: Example (original)
[Figure: tree of pivots, rooted at the first pivot, 15. Node values as extracted: 15 10 11 6 5 9 2 8 4 1 3 7 12 13 14]

Static representation of the partitioning process: Example (modified)
[Figure: tree of pivots, rooted at the first pivot, 10. Node values as extracted: 10 11 5 3 2 7 4 6 14 9 13 1 8 12 15]

Efficiency results

Note that in the best case, if at each partitioning stage the file is divided into 2 equal parts, we have:
– 1 call to quickSort with a segment of size N;
– 2 calls to quickSort with a segment of size N/2;
– 4 calls to quickSort with a segment of size N/4;
– 8 calls to quickSort with a segment of size N/8, etc.

That is, the tree of recursive calls has (log N) levels in this best case, and N comparisons are made at each level. Therefore, the total number of comparisons will be N log N.

The following recurrence relation describes this case:

    C(N) = 2 * C(N/2) + N    for N >= 2, with C(1) = 0

To solve this relation, assume N = 2^n, and divide both sides by 2^n:

    C(2^n) / 2^n = C(2^(n-1)) / 2^(n-1) + 1
                 = C(2^(n-2)) / 2^(n-2) + 1 + 1
                 = C(2^(n-3)) / 2^(n-3) + 1 + 1 + 1
                 = ...
                 = C(2^0) / 2^0 + n
                 = C(1) / 1 + n
                 = 0 + n
                 = n

Restoring the denominator, that is, multiplying both sides by 2^n = N with n = log N, gives C(N) = N * log N.

Efficiency results (cont.)

Result 1: The best case efficiency of quick sort is N log N (the pivot always divides the file into two equal halves).

Result 2: The worst case efficiency of quick sort is N^2 (for example, when the file is already sorted).

Result 3: The average case efficiency of quick sort is about 1.38 N log N.

This result makes quick sort a good "general-purpose" sort. Its inner loop is very short, which typically makes quick sort faster than other N log N sorting methods.

Also: quick sort is an "in-place" method, which uses only a small auxiliary stack for recursion.
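As a quick numerical check of the best-case analysis, the recurrence C(N) = 2 * C(N/2) + N with C(1) = 0 can be evaluated directly and compared with the closed form N * log N (base 2). The sketch below is only an illustration for N a power of two. The function name `best_case_comparisons` is hypothetical.

```python
def best_case_comparisons(n: int) -> int:
    """Evaluate C(N) = 2 * C(N/2) + N with C(1) = 0, for N a power of two."""
    if n == 1:
        return 0
    return 2 * best_case_comparisons(n // 2) + n


# Compare the recurrence with the closed form for N = 1, 2, 4, ..., 1024.
for exponent in range(11):
    size = 2 ** exponent
    closed_form = size * exponent        # N * log2(N), because log2(2^n) = n
    assert best_case_comparisons(size) == closed_form

print(best_case_comparisons(1024))  # 10240
```

For N = 1024 the recurrence gives 10240 = 1024 * 10, in agreement with Result 1.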