Sorting
Devon M. Simmonds
University of North Carolina, Wilmington
TIME: Tuesday/Thursday 11:11:50am in 1012 & Thursday 3:30-5:10pm in 2006.
Office hours: TR 1-2pm or by appointment. Office location: CI2046.
Email: simmondsd[@]uncw.edu

Objectives
• To introduce basic sort algorithms:
  – Exchange/bubble sort
  – Insertion sort
  – Selection sort
  – Quicksort
  – Mergesort

Sorting: The Big Picture
Given n comparable elements in an array, sort them in increasing (or decreasing) order.
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting
CSE 326: Data Structures Sorting Ben Lerner Summer 2007

Exchange (Bubble) Sort
• Algorithm (for a list a with n elements)
  – Repeat until list a is sorted
    • for each i from n-1 downto 1
      – if (a[i] < a[i-1])
        » exchange (a[i], a[i-1])
  – Result: After the kth pass, the first k positions hold the k smallest elements in sorted order.

Example
• 17 21 6 10 16 12 15 4

Try it out: Bubble sort
• Sort 31, 16, 54, 4, 2, 17, 6

Bubble Sort Code
    def bubbleSort(self, a):
        # each pass bubbles the smallest remaining item toward the front
        for i in range(len(a)):
            for j in range(len(a)-1, 0, -1):
                if a[j] < a[j-1]:
                    # exchange items
                    temp = a[j-1]
                    a[j-1] = a[j]
                    a[j] = temp

Insertion Sort: Idea
• Algorithm (for a list a[0..n-1] with n elements)
  – for each i from 1 to n-1
    • put the ith element, a[i], in the correct place among the first i+1 elements a[0..i]
  – Counting input elements from 1: at the kth step, put the kth input element in the correct place among the first k elements
  – Result: After pass i, the first i+1 elements a[0..i] are sorted.
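The insertion sort idea above can be watched pass by pass. The following is a minimal sketch, not the course's code: the name insertion_sort_passes and the choice to yield a snapshot after each pass are assumptions made for illustration. It runs on the "try it out" input used throughout these slides.

```python
def insertion_sort_passes(items):
    """Yield (i, snapshot) after each insertion-sort pass (hypothetical helper)."""
    a = list(items)  # work on a copy so the caller's list is untouched
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for current.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
        yield i, list(a)


if __name__ == "__main__":
    for i, snapshot in insertion_sort_passes([31, 16, 54, 4, 2, 17, 6]):
        # After pass i, the prefix snapshot[:i + 1] is sorted;
        # the rest of the list has not been looked at yet.
        print(i, snapshot[:i + 1], snapshot[i + 1:])
```

Printing the sorted prefix and the untouched suffix separately makes the invariant visible: the prefix grows by one element per pass.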
Try it out: Insertion sort
• Insert 31, 16, 54, 4, 2, 17, 6

Insertion Sort Code
    def insertionSort(self, a):
        for i in range(1, len(a)):
            # insert a[i] in its correct position among a[0] .. a[i]
            temp = a[i]
            j = i - 1
            # shift larger items one position to the right
            while j >= 0 and a[j] > temp:
                a[j+1] = a[j]
                j -= 1
            a[j+1] = temp

Selection Sort: idea
• Find the smallest element, put it 1st
• Find the next smallest element, put it 2nd
• Find the next smallest, put it 3rd
• And so on …

Selection Sort: idea
• Algorithm (for a list a with n elements)
  – for each i from 1 to n-1
    • Find the smallest element among those not yet placed, put it in position i-1
  – Result: After the kth pass, the first k elements are sorted.

Selection Sort: Code
    void SelectionSort (Array a[0..n-1]) {
        for (i=0; i<n; ++i) {
            j = Find index of smallest entry in a[i..n-1]
            Swap(a[i],a[j])
        }
    }
Runtime: worst case: ____   best case: ____   average case: ____

Try it out: Selection sort
• Sort 31, 16, 54, 4, 2, 17, 6

Selection Sort Code
    def selectionSort(self, a):
        for i in range(len(a)-1):
            # find smallest of items i, i+1, i+2, .., len(a)-1 and
            # exchange smallest with item in position i
            sIndex = i
            smallest = a[i]
            for j in range(i+1, len(a)):
                if a[j] < smallest:
                    sIndex = j
                    smallest = a[j]
            # exchange items
            temp = a[i]
            a[i] = smallest
            a[sIndex] = temp

Merge Sort
MergeSort (Array [1..n])
  1. Split Array in half
  2. Recursively sort each half
  3. Merge two halves together

Merge: "The 2-pointer method" (a1[1..n], a2[1..n])
    i1=1, i2=1
    While (i1<=n and i2<=n) {
        if (a1[i1] < a2[i2]) {
            Next is a1[i1]
            i1++
        } else {
            Next is a2[i2]
            i2++
        }
    }
    Now throw in the dregs… (copy whatever remains of a1 or a2)

Mergesort example
    Input:      8 2 9 4 5 3 1 6
    Divide:     8 2 9 4 | 5 3 1 6
    Divide:     8 2 | 9 4 | 5 3 | 1 6
    1-element:  8 | 2 | 9 | 4 | 5 | 3 | 1 | 6
    Merge:      2 8 | 4 9 | 3 5 | 1 6
    Merge:      2 4 8 9 | 1 3 5 6
    Final:      1 2 3 4 5 6 8 9

Mergesort example: the recursive calls
    def mergeSort(self, alist):
        if len(alist)>1:
            mid = len(alist)//2                      # Divide
            lefthalf = alist[:mid]
            righthalf = alist[mid:]
            self.mergeSort(lefthalf)                 # call 1
            self.mergeSort(righthalf)                # call 2
            self.merge(lefthalf, righthalf, alist)   # call 3

    ms([8 2 9 4 5 3 1 6])   Divide: lh=[8 2 9 4], rh=[5 3 1 6]   Calls: 1 ms(lh), 2 ms(rh), 3 merge(lh, rh)
      ms([8 2 9 4])         Divide: lh=[8 2], rh=[9 4]           Calls: 1 ms(lh), 2 ms(rh), 3 merge(lh, rh)
        ms([8 2])           Divide: lh=[8], rh=[2]               merge(lh, rh) gives [2, 8]
        ms([9 4])           Divide: lh=[9], rh=[4]               merge(lh, rh) gives [4, 9]
        merge([2, 8], [4, 9]) gives [2, 4, 8, 9]
      What is next? ms([5 3 1 6])

Try it out: Merge sort
• Sort 31, 16, 54, 4, 2, 17, 6

MergeSort Code
    def mergeSort(self, alist):
        if len(alist)>1:
            mid = len(alist)//2
            lefthalf = alist[:mid]
            righthalf = alist[mid:]
            print("Splitting ", alist, "into", lefthalf, "and", righthalf)
            self.mergeSort(lefthalf)
            self.mergeSort(righthalf)
            self.merge(lefthalf, righthalf, alist)

    def merge(self, lefthalf, righthalf, alist):
        print("Merging ", lefthalf, "and", righthalf)
        i=0
        j=0
        k=0
        while i<len(lefthalf) and j<len(righthalf):
            # <= takes the left item on ties, which keeps the sort stable
            if lefthalf[i]<=righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1
        # copy whatever is left over in either half
        while i<len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1
        while j<len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1

The steps of QuickSort [Weiss]
    S:                                81 13 31 43 57 75 92 0 26 65
    select pivot value:               65
    partition S:                      S1 = {0, 13, 26, 31, 43, 57}   S2 = {75, 81, 92}
    QuickSort(S1) and QuickSort(S2):  0 13 26 31 43 57 | 65 | 75 81 92
    Presto! S is sorted:              0 13 26 31 43 57 65 75 81 92

QuickSort Example
    index:  0 1 2 3 4 5 6 7 8 9
    input:  8 1 4 9 6 3 5 2 7 0
    after:  8 1 4 9 0 3 5 2 7 6    (scan markers i and j)
• Choose the pivot as the median of three.
• Place the pivot and the largest at the right and the smallest at the left.

QuickSort Example
    8 1 4 9 0 3 5 2 7 6    (scan markers i and j)
  1) Move i to the right to the first element larger than the pivot.
  2) Move j to the left to the first element smaller than the pivot.
  3) Swap the elements at i and j.
  4) Repeat until i and j cross.
  5) Swap the pivot with the element at i.
  6) Repeat steps 1-5 for the two partitions:
     1) Elements > pivot
     2) Elements < pivot

Recursive Quicksort
    Quicksort(A[]: integer array, left, right : integer): {
        pivotindex : integer;
        if left + CUTOFF <= right then
            pivot := median3(A,left,right);
            pivotindex := Partition(A,left,right-1,pivot);
            Quicksort(A, left, pivotindex - 1);
            Quicksort(A, pivotindex + 1, right);
        else
            Insertionsort(A,left,right);
    }
Don't use quicksort for small arrays. CUTOFF = 10 is reasonable.

Try it out: Recursive quicksort
• Sort 31, 16, 54, 4, 2, 17, 6

QuickSort: Average case complexity
Turns out to be O(n log n).
See Section 7.7.5 for an idea of the proof.
Don't need to know proof details for this course.

QuickSort Code
    def quickSort(self, array, start, end):
        left = start
        right = end
        if right - left < 1:
            return
        # at least 2 elements to be sorted; the first one is the pivot
        pivot = array[start]
        while right > left:
            # move left forward past items <= pivot
            while array[left] <= pivot and left < right:
                left += 1
            # move right backward past items > pivot
            while array[right] > pivot and right >= left:
                right -= 1
            if right > left:
                # both items are on the wrong side: exchange them;
                # the scans above then step past them on the next iteration
                array[left], array[right] = array[right], array[left]
        # right now indexes the last item <= pivot: put the pivot there
        temp = array[start]
        array[start] = array[right]
        array[right] = temp
        self.quickSort(array, start, right - 1)
        self.quickSort(array, right + 1, end)

Features of Sorting Algorithms
• In-place
  – Sorted items occupy the same space as the original items. (No copying required, only O(1) extra space if any.)
• Stable
  – Items in input with the same value end up in the same order as when they began.

Your Turn: Sort Properties
Are the following stable? in-place? (Answer choices: Yes, No, Can Be)
    Sort              stable?    in-place?
    Bubble Sort       ____       ____
    Insertion Sort    ____       ____
    Selection Sort    ____       ____
    MergeSort         ____       ____
    QuickSort         ____       ____

How fast can we sort?
• Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time
• Can we do any better?
• No, if the basic action is a comparison.

Sorting Model
• Recall our basic assumption: we can only compare two elements at a time
  – each comparison can at best cut the set of possible orderings in half
• Suppose you are given N elements
  – Assume no duplicates
• How many possible orderings can you get?
  – Example: a, b, c (N = 3)

Permutations
• How many possible orderings can you get?
  – Example: a, b, c (N = 3)
  – (a b c), (a c b), (b a c), (b c a), (c a b), (c b a)
  – 6 orderings = 3•2•1 = 3! (i.e., "3 factorial")
  – All the possible permutations of a set of 3 elements
• For N elements
  – N choices for the first position, (N-1) choices for the second position, …, (2) choices, 1 choice
  – N(N-1)(N-2)⋯(2)(1) = N! possible orderings

BucketSort (aka BinSort)
If all values to be sorted are known to be between 1 and K, create an array count of size K, increment counts while traversing the input, and finally output the result.
Example: K=5. Input = (5,1,3,4,3,2,1,1,5,4,5)
    count array
    value:  1  2  3  4  5
    count:  _  _  _  _  _
Running time to sort n items?

BucketSort Complexity: O(n+K)
• Case 1: K is a constant
  – BinSort is linear time
• Case 2: K is variable
  – Not simply linear time
• Case 3: K is constant but large (e.g. 2^32)
  – ???
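The BucketSort description above (count occurrences of each value in 1..K, then write the values back out in order) can be sketched as follows. This is an illustrative sketch, not code from the slides: the function name bucket_sort, the parameter k, and the range check are assumptions.

```python
def bucket_sort(values, k):
    """Sort integers known to lie in 1..k by counting occurrences.

    Hypothetical helper for illustration; runs in O(n + k) time.
    """
    count = [0] * (k + 1)  # index 0 is unused so count[v] lines up with value v
    for v in values:
        if not 1 <= v <= k:
            raise ValueError(f"value {v} is outside 1..{k}")
        count[v] += 1  # one O(1) step per input item: O(n) in total

    result = []
    for v in range(1, k + 1):  # one step per possible value: O(k) in total
        result.extend([v] * count[v])
    return result


if __name__ == "__main__":
    print(bucket_sort([5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5], 5))
```

The count array has one slot per possible value, which is why Case 3 is a problem: with K = 2^32 the array dwarfs any realistic input even though K is technically a constant.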
Fixing impracticality: RadixSort
• Radix = "The base of a number system"
  – We'll use 10 for convenience, but could be anything
• Idea: BucketSort on each digit, least significant to most significant (lsd to msd)

Radix Sort Example (1st pass)
Input data: 478 537 9 721 3 38 123 67
Bucket sort by 1's digit:
    0:
    1: 721
    2:
    3: 3 123
    4:
    5:
    6:
    7: 537 67
    8: 478 38
    9: 9
After 1st pass: 721 3 123 537 67 478 38 9
This example uses B=10 and base 10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.

Radix Sort Example (2nd pass)
After 1st pass: 721 3 123 537 67 478 38 9
Bucket sort by 10's digit:
    0: 03 09
    1:
    2: 721 123
    3: 537 38
    4:
    5:
    6: 67
    7: 478
    8:
    9:
After 2nd pass: 3 9 721 123 537 38 67 478

Radix Sort Example (3rd pass)
After 2nd pass: 3 9 721 123 537 38 67 478
Bucket sort by 100's digit:
    0: 003 009 038 067
    1: 123
    2:
    3:
    4: 478
    5: 537
    6:
    7: 721
    8:
    9:
After 3rd pass: 3 9 38 67 123 478 537 721
Invariant: after k passes the low order k digits are sorted.

Your Turn: RadixSort
• Input: 126, 328, 636, 341, 416, 131, 328
BucketSort on lsd:
    0 1 2 3 4 5 6 7 8 9
BucketSort on next-higher digit:
    0 1 2 3 4 5 6 7 8 9
BucketSort on msd:
    0 1 2 3 4 5 6 7 8 9

Radixsort: Complexity
• How many passes?
• How much work per pass?
• Total time?
• Conclusion?
• In practice
  – RadixSort only good for large number of elements with relatively small values
  – Hard on the cache compared to MergeSort/QuickSort

Internal versus External Sorting
• So far assumed that accessing A[i] is fast
  – Array A is stored in internal memory (RAM)
  – Algorithms so far are good for internal sorting
• What if A is so large that it doesn't fit in internal memory?
  – Data on disk or tape
  – Delay in accessing A[i], e.g. need to spin disk and move head

Internal versus External Sorting
• Need sorting algorithms that minimize disk/tape access time
• External sorting: Basic Idea
  – Load chunk of data into RAM, sort, store this "run" on disk/tape
  – Use the Merge routine from Mergesort to merge runs
  – Repeat until you have only one run (one sorted chunk)
  – Text gives some examples

Summary of sorting
• Sorting choices:
  – O(N^2): Bubblesort, Insertion Sort
  – O(N log N) average case running time:
    • Heapsort: In-place, not stable.
    • Mergesort: O(N) extra space, stable.
    • Quicksort: claimed fastest in practice, but O(N^2) worst case. Needs extra storage for recursion. Not stable.
  – O(N): Radix Sort: fast and stable. Not comparison based. Not in-place.
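The RadixSort passes traced earlier (a stable BucketSort on each digit, least significant first) can be sketched as below. This is a minimal illustration under stated assumptions: it handles only non-negative integers, and the names radix_sort_passes and radix_sort are hypothetical rather than taken from the slides.

```python
def radix_sort_passes(values, base=10):
    """LSD radix sort for non-negative integers; yields the list after each pass.

    Hypothetical helper for illustration only.
    """
    a = list(values)
    if not a:
        return
    if min(a) < 0:
        raise ValueError("this sketch handles non-negative integers only")
    largest = max(a)
    exp = 1  # 1 selects the 1's digit, then base, then base**2, ...
    while largest // exp > 0:
        buckets = [[] for _ in range(base)]
        for v in a:
            # Appending keeps items with equal digits in their current order,
            # so each pass is stable and earlier passes are not undone.
            buckets[(v // exp) % base].append(v)
        a = [v for bucket in buckets for v in bucket]
        yield a
        exp *= base


def radix_sort(values, base=10):
    """Return a sorted copy of values using the passes above."""
    result = list(values)
    for result in radix_sort_passes(values, base):
        pass
    return result


if __name__ == "__main__":
    for number, state in enumerate(radix_sort_passes([478, 537, 9, 721, 3, 38, 123, 67]), 1):
        print("after pass", number, state)
```

Run on the example input, the three printed lists match the "After 1st pass", "After 2nd pass", and "After 3rd pass" rows of the worked example. The number of passes is the number of digits in the largest value, which is why the method suits many elements with relatively small values.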
Search Algorithms
• We now present several algorithms that can be used for searching and sorting lists
  – We first discuss the design of an algorithm,
  – We then show its implementation as a Python function, and,
  – Finally, we provide an analysis of the algorithm's computational complexity
• To keep things simple, each function processes a list of integers
Fundamentals of Python: From First Programs Through Data Structures

Search for a Minimum
• Python's min function returns the minimum or smallest item in a list
• Alternative version: n - 1 comparisons for a list of size n
• O(n)

Linear Search of a List
• Python's in operator is implemented as a method named __contains__ in the list class
  – Uses a sequential search or a linear search
• Unlike the search for a minimum, a linear search can stop as soon as it finds the target, so its analysis is different from the previous one

Best-Case, Worst-Case, and Average-Case Performance
• Analysis of a linear search considers three cases:
  – In the worst case, the target item is at the end of the list or not in the list at all
    • O(n)
  – In the best case, the algorithm finds the target at the first position, after making one iteration
    • O(1)
  – Average case: add number of iterations required to find target at each possible position; divide sum by n
    • O(n)

Binary Search of a List
• A linear search is necessary for data that are not arranged in any particular order
• When searching sorted data, use a binary search

Binary Search of a List (continued)
  – More efficient than linear search
    • Additional cost has to do with keeping list in order

Summary of Searching
• Linear versus binary
• O(n) vs O(log n)

Reading from course text:

Questions?
Devon M. Simmonds
Computer Science Department
University of North Carolina Wilmington

What is sorting?
Given n elements, arrange them in increasing or decreasing order by some attribute.
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting
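Returning to the searching slides: they compare linear and binary search by running time only. The sketch below shows one plausible way to write both. The function names, the argument order, and the convention of returning -1 when the target is absent are assumptions for illustration, not the textbook's code.

```python
def linear_search(target, items):
    """Return the index of target in items, or -1 if absent. O(n) comparisons."""
    for position, item in enumerate(items):
        if item == target:
            return position  # best case: found at position 0 after one iteration
    return -1  # worst case: every item was examined


def binary_search(target, sorted_items):
    """Return an index of target in sorted_items, or -1 if absent. O(log n).

    Assumes sorted_items is already in increasing order.
    """
    left, right = 0, len(sorted_items) - 1
    while left <= right:
        mid = (left + right) // 2
        if sorted_items[mid] == target:
            return mid
        if target < sorted_items[mid]:
            right = mid - 1  # discard the upper half
        else:
            left = mid + 1  # discard the lower half
    return -1


if __name__ == "__main__":
    data = [31, 16, 54, 4, 2, 17, 6]
    print(linear_search(17, data))          # works on unsorted data
    print(binary_search(17, sorted(data)))  # requires sorted data
```

Each iteration of binary_search halves the remaining range, which is where the O(log n) bound comes from; the price is that the list must be kept sorted.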