241-423 Advanced Data Structures and Algorithms
Semester 2, 2013-2014

3. Sorting Algorithms

• Objective
  – examine popular sorting algorithms, with an emphasis on divide and conquer

ADSA: Sorting/3

Contents
1. Insertion Sort
2. Divide and Conquer Algorithms
3. Merge Sort
4. Quicksort
5. Comparison of Sorting Algorithms
6. Finding the kth Largest Element

1. Insertion Sort
• Each pass inserts an element (x) into a sorted sublist (sub-array) on its left.
• Items larger than x move one position to the right to make room for its insertion.

[Figure: Insertion Sort Diagram]

Outline Algorithm
• Assume the first array element is in the right position.
• In the ith pass (1 ≤ i ≤ n-1), the elements in the range 0 to i-1 are already sorted.
• Insert the target (the element at position i) into its correct position j by moving the elements in the range [j, i-1] one place to the right, which frees arr[j].

Simple Insertion Sort

    public static void insertion_srt(int arr[])
    {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            int j = i;
            int target = arr[i];     // element to insert in this pass
            while ((j > 0) && (arr[j-1] > target)) {
                arr[j] = arr[j-1];   // move larger element right
                j--;
            }
            arr[j] = target;
        }
    }

insertionSort()

    public static <T extends Comparable<? super T>>
    void insertionSort(T[] arr)
    {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            int j = i;
            T target = arr[i];
            while (j > 0 && target.compareTo(arr[j-1]) < 0) {
                arr[j] = arr[j-1];
                j--;
            }
            arr[j] = target;
        }
    } // end of insertionSort()

Insertion Sort Efficiency
• The best case running time is O(n)
  – when the array is already sorted
• The worst and average case running times are O(n²).
• Insertion sort is very efficient when the array is "almost sorted".

2. Divide and Conquer Algorithms
• Divide a problem into smaller versions of the same problem, using recursion.
• Solve the smaller versions.
• Combine the solutions of the smaller versions to get an answer for the big, original problem.

Examples
• Binary search
• Merge sort and quicksort (here)
• Binary tree traversal

3. Merge Sort
• Sort an array with n elements by splitting it into two halves. Keep splitting in half recursively.
• Sort the small sublists.
• Merge the small sublists recursively back together into a single sorted array.

[Figure: Merge Sort Diagram]

General Sort Methods
• Ford and Topp's Arrays class provides two versions of the merge sort algorithm.
  – one version takes an Object array arr[] as input;
  – the other version is generic and specifies arr[] as an array of type T
• Both methods call msort() to carry out the merge sort.

sort() - with Object Array

    public static void sort(Object[] arr)
    {
        // create a temporary array
        Object[] tempArr = arr.clone();

        // sort the entire array (the index range [0, arr.length))
        msort(arr, tempArr, 0, arr.length);
    }

sort() - Generic Version

    public static <T extends Comparable<? super T>>
    void sort(T[] arr)
    {
        // create a temporary array
        T[] tempArr = (T[])arr.clone();

        msort(arr, tempArr, 0, arr.length);
    }

msort()
• Split into two lists by computing the midpoint of the index range:
    int midpt = (last + first)/2;
• Call msort() recursively on the index range [first, midpt) and on the index range [midpt, last).
• When the resulting lists are down to a single element, start merging them back together into sorted order.

[Figure: Tracing msort(), showing the split phase followed by the merge phase]

msort()

    private static void msort(Object[] arr, Object[] tempArr,
                              int first, int last)
    {
        // if sublist has more than 1 element
        if ((first + 1) < last) {
            int midpt = (last + first)/2;
            msort(arr, tempArr, first, midpt);
            msort(arr, tempArr, midpt, last);

            // if arr[] is now sorted, finish
            if (((Comparable)arr[midpt-1]).compareTo(arr[midpt]) <= 0)
                return;

            // indexA scans arr[] in range [first, midpt)
            int indexA = first;
            // indexB scans arr[] in range [midpt, last)
            int indexB = midpt;
            int indexC = first;   // for merged temp list

            /* while both sublists are not finished, compare
               arr[indexA] and arr[indexB]; copy the smaller
               into the temp list */
            while (indexA < midpt && indexB < last) {
                if (((Comparable)arr[indexA]).compareTo(arr[indexB]) < 0) {
                    tempArr[indexC] = arr[indexA];
                    indexA++;
                }
                else {
                    tempArr[indexC] = arr[indexB];
                    indexB++;
                }
                indexC++;
            }

            // copy over what's left of sublist A
            while (indexA < midpt) {
                tempArr[indexC] = arr[indexA];
                indexA++;
                indexC++;
            }

            // copy over what's left of sublist B
            while (indexB < last) {
                tempArr[indexC] = arr[indexB];
                indexB++;
                indexC++;
            }

            // copy temp array back to arr[]
            for (int i = first; i < last; i++)
                arr[i] = tempArr[i];
        }
    } // end of msort()

msort() Notes
• Continue only as long as first+1 < last
• Do not merge arr if arr[midpt-1] ≤ arr[midpt]

[Figure: Recursion Tree for Merge Sort]

Efficiency of Merge Sort
• Total number of comparisons = no. of levels * no. of comparisons at a level
• msort() starts with a list of size n
• msort() recurses until the sublist size is 1
• Each level roughly halves the sublist size:
  – n, n/2, n/4, ..., 1
  – no. of levels = log₂ n (roughly)

• No. of msort() calls at a level:
  – at level 0: 1 msort() call
  – at level 1: 2 calls
  – at level 2: 4 calls
  – ...
  – at level i: 2^i calls

• No. of comparisons in 1 msort() call at a level:
  – at level 0: an msort() call compares n elements
  – at level 1: n/2 comparisons
  – at level 2: n/4 comparisons
  – ...
  – at level i: n/2^i comparisons
• Total no.
of comparisons at a level:
  – no. of calls at a level * comparisons in 1 msort() call
  – 2^i * n/2^i = n

• Total number of comparisons = no. of levels * no. of comparisons at a level = log₂ n * n
• So the worst case running time is O(n log₂ n)

4. Quicksort
• Uses a divide-and-conquer strategy like merge sort.
• But, unlike merge sort, quicksort is an in-place sorting algorithm
  – elements are exchanged within the list without the need for temporary lists/arrays
  – space efficient

Quicksort Steps
• Pick an element, called a pivot, from the list.
• Reorder the list so that all elements which are less than the pivot come before the pivot and all elements greater than the pivot come after it.
• Recursively call quicksort on the sublist of lesser elements and the sublist of greater elements.
• The stopping cases for the recursion are lists of size zero or one, which are always sorted.

[Figure: Quicksort Diagram, with the pivot marked]

Partitioning a List
• The pivot is the element at index mid = (first + last)/2.
• Separate the elements of arr[] into two sublists, Sl and Sh.
  – Sl contains the elements ≤ pivot (l = low)
  – Sh contains the elements ≥ pivot (h = high)

• Exchange arr[first] and arr[mid], so the pivot is held at the start of the range.
• Scan the list with index range [first+1, last)
  – scanUp starts at first+1 and moves up the list, finding elements for Sl.
  – scanDown starts at position last-1 and moves down the list, finding elements for Sh.

• When arr[scanUp] ≥ pivot and arr[scanDown] ≤ pivot, the two elements are in the wrong sublists.
• Exchange the elements at the two positions and then resume scanning.

• scanUp and scanDown move toward each other until they meet or pass one another (scanDown ≤ scanUp).
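The scanning and exchange steps above can be traced with a small sketch. It is only an illustration under stated assumptions: it works on an int array, and the class name PartitionTrace and method name partition are hypothetical, not taken from Ford and Topp's library. The last two assignments in partition() move the pivot from arr[first] into position scanDown so that the sketch runs end to end.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class PartitionTrace {

    // Partition arr[first, last) about the value at the midpoint and
    // return the index where that pivot value ends up.
    static int partition(int[] arr, int first, int last) {
        if (last - first <= 1)          // empty or 1-element range
            return first;

        int mid = (first + last) / 2;
        int pivot = arr[mid];

        // hold the pivot at the bottom end of the range
        arr[mid] = arr[first];
        arr[first] = pivot;

        int scanUp = first + 1;
        int scanDown = last - 1;
        while (true) {
            // move up while the value belongs in the lower sublist
            while (scanUp <= scanDown && arr[scanUp] < pivot)
                scanUp++;
            // move down while the value belongs in the upper sublist;
            // arr[first] holds the pivot, so this scan stops at first
            while (arr[scanDown] > pivot)
                scanDown--;
            // indices have met or passed: scanning is finished
            if (scanUp >= scanDown)
                break;
            // both elements are in the wrong sublist: exchange them
            int temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }

        // place the pivot between the two sublists
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;
        return scanDown;
    }

    public static void main(String[] args) {
        int[] arr = {800, 150, 300, 650, 550, 500, 400, 350, 450, 900};
        int p = partition(arr, 0, arr.length);
        System.out.println("pivot index = " + p);
        System.out.println(Arrays.toString(arr));
    }
}
```

Printing the array after each exchange is an easy way to watch scanUp and scanDown close in on each other.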
• scanDown is at the place where the pivot should appear
  – exchange arr[first] and arr[scanDown] to correctly position the pivot

pivotIndex()
• The method

    public static <T extends Comparable<? super T>>
    int pivotIndex(T[] arr, int first, int last)

  takes array arr and index range [first, last) and returns the index of the pivot after partitioning arr[].

    public static <T extends Comparable<? super T>>
    int pivotIndex(T[] arr, int first, int last)
    {
        int mid;     // index for the midpoint
        T pivot;

        if (first == last)             // empty sublist
            return last;
        else if (first == (last-1))    // 1-element sublist
            return first;
        else {
            mid = (last + first)/2;
            pivot = arr[mid];

            // exchange pivot and bottom end of range
            arr[mid] = arr[first];
            arr[first] = pivot;

            // scanning indices
            int scanUp = first + 1;
            int scanDown = last - 1;

            while (true) {
                /* move up the lower sublist while scanUp is
                   less than or equal to scanDown and the
                   array value is less than pivot */
                while ((scanUp <= scanDown) &&
                       (arr[scanUp].compareTo(pivot) < 0))
                    scanUp++;

                /* move down upper sublist while array value
                   is greater than the pivot */
                while (pivot.compareTo(arr[scanDown]) < 0)
                    scanDown--;

                /* if the indices have met or passed each other,
                   partition is complete */
                if (scanUp >= scanDown)
                    break;

                // found two elements in wrong sublists; exchange
                T temp = arr[scanUp];
                arr[scanUp] = arr[scanDown];
                arr[scanDown] = temp;
                scanUp++;
                scanDown--;
            }

            // copy pivot to index posn (scanDown) that
            // partitions the sublists
            arr[first] = arr[scanDown];
            arr[scanDown] = pivot;
            return scanDown;
        }
    } // end of pivotIndex()

quicksort()
• quicksort() sorts a generic array arr[] by calling qsort() with the index range [0, arr.length).

    public static <T extends Comparable<?
            super T>> void quicksort(T[] arr)
    {
        qsort(arr, 0, arr.length);
    }

qsort()
• Recursively partition the elements in the index range into smaller and smaller sublists, terminating when the size of a list is 0 or 1.
• For efficiency, handle a list of size 2 by comparing the elements and exchanging them if necessary.
• For larger lists, call pivotIndex() to reorder the elements and determine the pivot.
• Make two calls to qsort():
  – the first call specifies the index range for the lower sublist
  – the second call specifies the index range for the upper sublist

[Figure: qsort() Diagram]

    private static <T extends Comparable<? super T>>
    void qsort(T[] arr, int first, int last)
    {
        // if range is less than two elements
        if ((last - first) <= 1)
            return;
        // if sublist has two elements
        else if ((last - first) == 2) {
            /* compare arr[first] and arr[last-1] and
               exchange if necessary */
            if (arr[last-1].compareTo(arr[first]) < 0) {
                T temp = arr[last-1];
                arr[last-1] = arr[first];
                arr[first] = temp;
            }
            return;
        }
        else {
            int pivotLoc = pivotIndex(arr, first, last);
            qsort(arr, first, pivotLoc);
            qsort(arr, pivotLoc+1, last);
        }
    } // end of qsort()

Running Time of Quicksort
• The average case running time is O(n log₂ n).
• With the midpoint pivot used here, the best case occurs when the array is already sorted, because every pivot splits its sublist into two halves.
• Quicksort is also efficient when the array is in descending order.
• The worst case occurs when the chosen pivot is always the largest or smallest element in its sublist.
  – the running time is O(n²)
  – highly unlikely

5. Comparison of Sorting Algorithms
• An inversion in an array, arr[], is an ordered pair (arr[i], arr[j]), i < j, where arr[i] > arr[j].
• When sorting in ascending order, arr[i] and arr[j] are out of order.

• The O(n²) sorting algorithms compare adjacent elements, and generally remove one inversion with each iteration
  – e.g.
    selection and insertion sort
• The O(n log₂ n) sorting algorithms compare non-adjacent elements, and generally remove more than one inversion with each iteration.
  – e.g. quicksort and merge sort

Timing Sorts

    import java.util.Random;
    import ds.util.Arrays;
    import ds.time.Timing;

    public class TimingSorts
    {
        public static void main(String[] args)
        {
            final int SIZE = 75000;
            Integer[] arr1 = new Integer[SIZE],
                      arr2 = new Integer[SIZE],
                      arr3 = new Integer[SIZE];
            Random rnd = new Random();

            /* load each array with the same sequence of
               random numbers in the range 0 to 999999 */
            int rndNum;
            for (int i=0; i < SIZE; i++) {
                rndNum = rnd.nextInt(1000000);
                arr1[i] = arr2[i] = arr3[i] = rndNum;
            }

            // call timeSort() for each sort
            timeSort(arr1, 0, "Merge sort");
            timeSort(arr2, 1, "Quick sort");
            timeSort(arr3, 2, "Insertion sort");
        } // end of main()

        public static <T extends Comparable<? super T>>
        void timeSort(T[] arr, int sortType, String sortName)
        {
            Timing t = new Timing();

            t.start();
            if (sortType == 0)
                Arrays.sort(arr);          // merge sort in F&T
            else if (sortType == 1)
                Arrays.quicksort(arr);
            else
                Arrays.insertionSort(arr);
            double timeRequired = t.stop();

            outputFirst_Last(arr);
            System.out.print(" " + sortName + " time is " +
                             timeRequired + "\n\n");
        } // end of timeSort()

        public static void outputFirst_Last(Object[] arr)
        // output first 3 elements and last 3 elements
        {
            for (int i=0; i < 3; i++)
                System.out.print(arr[i] + " ");
            System.out.print(". . . ");
            for (int i=arr.length-3; i < arr.length; i++)
                System.out.print(arr[i] + " ");
            System.out.println();
        }
    } // end of TimingSorts

Output

    26 38 47 . . . 999980 999984 999984
     Merge sort time is 0.109

    26 38 47 . . . 999980 999984 999984
     Quick sort time is 0.078

    26 38 47 . . . 999980 999984 999984
     Insertion sort time is 100.611

• Merge sort and quicksort: O(n log₂ n)
• Insertion sort: O(n²)

6. Finding the kth Largest Element
• Sort the array and then access the element at position k.
  – running time is O(n log₂ n) if we use quicksort or merge sort
• For a more efficient solution, locate the position of the kth-largest value by partitioning the elements into two sublists.

    index:   0 ... k-1               k              k+1 ... n-1
    values:  values ≤ kth-largest    kth-largest    values ≥ kth-largest

• The lower sublist contains k elements that are ≤ the kth-largest.
• The upper sublist contains elements that are ≥ the kth-largest.
• The elements in the sublists do not need to be ordered.

• Use the pivoting technique from the quicksort algorithm to create a partition.
• The algorithm is recursive:
  – index = pivotIndex()
  – if index == k, done, return arr[index];
  – otherwise, call findKth() again with range [first, index) if k < index, or with range [index+1, last) if k > index.
• Examine only one of the sublists at each step.

    public static <T extends Comparable<? super T>>
    T findKth(T[] arr, int first, int last, int k)
    {
        // empty range: nothing to find
        if (first >= last)
            return null;

        // partition range [first, last) in arr about the
        // pivot arr[index]
        int index = pivotIndex(arr, first, last);

        // if index == k, we are done. kth largest is arr[k]
        if (index == k)
            return arr[index];     // return array value
        else if (k < index)
            // search in lower sublist [first, index)
            return findKth(arr, first, index, k);
        else
            // search in upper sublist [index+1, last)
            return findKth(arr, index+1, last, k);
    }

Running Time of findKth()
• The average running time is O(n)
  – if each partition roughly halves the range, no. of comparisons = n + n/2 + n/4 + n/8 + ... ≈ 2n
  – as with quicksort, consistently bad pivots give a worst case of O(n²)
• This is faster than the O(n log₂ n) cost of sorting the array first
  – this is to be expected, since findKth() only uses one of its sublists at each recursive call, compared to quicksort or merge sort, which use both
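The selection idea can be tried out with a self-contained sketch. It is a minimal illustration under stated assumptions, not the library's implementation: it uses int arrays rather than generics, the class name KthSketch is hypothetical, and it carries its own copy of the midpoint-pivot partition so that it runs without the ds.util package. As in the slides, "kth" means the value that would sit at index k if the range were sorted in ascending order.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class KthSketch {

    // Partition arr[first, last) about its midpoint value and
    // return the final index of that pivot value.
    static int pivotIndex(int[] arr, int first, int last) {
        if (last - first <= 1)          // empty or 1-element range
            return first;
        int mid = (first + last) / 2;
        int pivot = arr[mid];
        arr[mid] = arr[first];
        arr[first] = pivot;
        int scanUp = first + 1;
        int scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && arr[scanUp] < pivot)
                scanUp++;
            while (arr[scanDown] > pivot)
                scanDown--;
            if (scanUp >= scanDown)
                break;
            int temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;
        return scanDown;
    }

    // Return the value that would be at index k if arr[first, last)
    // were sorted in ascending order. Reorders arr as a side effect.
    static int findKth(int[] arr, int first, int last, int k) {
        if (k < first || k >= last)
            throw new IllegalArgumentException("k is outside [first, last)");
        int index = pivotIndex(arr, first, last);
        if (index == k)
            return arr[index];
        else if (k < index)
            return findKth(arr, first, index, k);      // lower sublist only
        else
            return findKth(arr, index + 1, last, k);   // upper sublist only
    }

    public static void main(String[] args) {
        int[] data = {800, 150, 300, 650, 550, 500, 400, 350, 450, 900};
        for (int k = 0; k < data.length; k++) {
            int[] copy = Arrays.copyOf(data, data.length);
            System.out.println("k = " + k + ": " +
                               findKth(copy, 0, copy.length, k));
        }
    }
}
```

Each call works on a fresh copy because findKth() rearranges the array while it searches. Throwing an exception for an out-of-range k is a design choice of this sketch; the generic version above returns null for an empty range instead.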