Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 10 April 8 2004 Today • How to determine Big-O • Compare data structures and algorithms • Sorting algorithms 2 How to determine Big-O • Partition algorithm into known pieces • Identify relationship between pieces – Sequential code (+) – Nested code (*) • Drop constants • Only keep the most dominant factors 3 Does Big-O tell the whole story? • Tx(n) = Ty(n) = O(lg n) T1(n)=50+3n+(10+5+15)n = 50+33n Algorithm 1 setup of algorithm read n elements into for i in 1..n loop do operation1 do operation2 do operation3 array -- takes 50 time units -- 3 units/element on A[i] on A[i] on A[i] -- takes 10 units -- takes 5 units -- takes 15 units Algorithm 2 T2(n)=200+3n+(10+5)n = 200+18n setup of algorithm -- takes 200 time units read n elements into array -- 3 units/element for i in 1..n loop do operation1 on A[i] -- takes 10 units do operation2 on A[i] -- takes 5 units 4 Data structure Traversal Unsorted L List N Sorted L List N Unsorted Array N Sorted Array N Binary Tree N BST N F&B BST N Search Insert 5 Searching • Linear (sequential) search – Checks every element of a list until a match is found – Can be used to search an unordered list • Binary search – Searches a set of sorted data for a particular data – Considerable faster than a linear search – Can be implemented using recursion or iteration 6 Linear Search • If data distributed randomly – Average case: • N/2 comparisons needed – Best case: • values is equal to first element tested – Worst case: • value not in list Æ N comparisons needed Linear Linear search search is is O(N) O(N) 7 Data structure Traversal Search Unsorted L List Sorted L List N N N N Unsorted Array N N Sorted Array Binary Tree N N N BST N N F&B BST N Insert 8 Full and Balanced Binary Search Tree 42 17 11 47 19 43 67 9 Binary Search 11 17 19 42 43 47 52 67 50 not found 3 comparisons 3 = Log (8) 10 Binary Search • Can be performed on – Sorted arrays – Full and balanced BSTs • Compares and cuts half the work – We cut work in ½ each time – How many times can we cut in half? Binary Binary search search is is O(Log O(Log N) N) 11 Data structure Traversal Search Insert Unsorted L List N N PSET Sorted L List N N Unsorted Array N N 1 Sorted Array Binary Tree N N Log N N 1 BST N N F&B BST N Log N PSET 12 Insertion/Shuffling Elements 11 17 19 42 43 47 19 35 42 43 35 11 17 47 Shuffle Shuffle is is O(N) O(N) 13 Insertion to a Sorted Array • Sorted Array – Finding the right spot – – Performing the shuffle – – Performing the insertion - O(Log N) O(N) O(1) – Total work: O(Log O(Log N N+ +N N+ + 1) 1) = = O(N) O(N) 14 Data structure Traversal Search Insert Unsorted L List N N PSET Sorted L List N N PSET Unsorted Array N N 1 Sorted Array N Log N N Binary Tree N N 1 BST N N F&B BST N Log N 15 Insertion into a F&B BST 42 17 11 47 19 43 67 18 16 Insertion into a F&B BST • Finding the right spot – O(Log N) • Performing the insertion – O(1) • Total work: O(Log O(Log N N+ + 1) 1) = = O(Log O(Log N) N) 17 Data structure Traversal Search Insert Unsorted L List N N PSET Sorted L List N N Unsorted Array N N 1 Sorted Array Log N N N Binary Tree N N BST N N 1 N F&B BST N Log N Log N PSET 18 Sorting Algorithms • • • • • • • • Insertion sort Bubble sort Selection sort … Merge sort Heap sort Quick sort … In In the the Worst Worst Case Case O(N2) or worse O(N Log N) or better 19 Insertion Sort • Sorted array/list is built one item at a time – – – – Simple to implement Efficient on small data sets Efficient on already almost ordered data sets Minimal memory requirements 20 Insertion Sort 8 2 4 9 3 6 2 2 2 2 2 8 4 4 3 3 4 8 8 4 4 9 9 9 8 6 3 3 3 9 8 6 6 6 6 9 21 Insertion Sort Statement Work InsertionSort(A, n) T(n) for j in 2..n do c1n key:= A[j] c2(n-1) i := j-1 c3(n-1) while i > 0 and A[i] > key c4X A[i+1]:= A[i] c5(X-(n-1)) i:= i-1 c6(X-(n-1)) A[i+1]:= key c7(n-1) X = x2 + x3 + … + xn where xi is number of while expression evaluations for the ith for loop iteration 22 Insertion Sort Analysis T(n) = c1n + c2(n-1) + c3(n-1) + c4X + c5(X - (n-1)) + c6(X - (n-1)) + c7(n-1) = c8X + c9n + c10 Running time – Best case: • inner loop never executed - Linear Function – Worst case: • inner loop always executed - X is a quadratic function in n – Average case: • all permutations equally likely 23 Insertion Sort – O(N2) • Assume you are sorting 250,000,000 item N = 250,000,000 N2 = 6.25 * 1016 Assume you can do 1 operation/nanosecond Æ 6.25 * 107 seconds = 1.98 years 24 Merge Sort MergeSort A[1..n] 1. If the input sequence has only one element – Return 2. Partition the input sequence into two halves 3. Sort the two subsequences using the same algorithm 4. Merge the two sorted subsequences to form the output sequence 25 Divide and Conquer • It is an algorithmic design paradigm that contains the following steps – Divide: Break the problem into smaller sub-problems – Recur: Solve each of the sub-problems recursively – Conquer: Combine the solutions of each of the sub-problems to form the solution of the problem 26 Merge Sort 47 47 47 11 11 17 5 12 12 67 17 42 5 96 67 5 96 5 96 12 47 5 96 67 5 47 96 67 12 42 12 67 12 17 17 11 12 17 42 42 17 17 42 47 11 42 42 11 11 47 11 67 67 5 96 96 27 Merge Sort – O(N * Log N) • Assume you are sorting 250,000,000 item N = 250,000,000 N*Log N = 250,000,000 * 28 Assume you can do 1 operation/nanosecond Æ 7.25 seconds 28 Merge Sort Analysis Statement Work MergeSort(A, left, right) if (left < right) mid := (left + right) / 2; MergeSort(A, left, mid); MergeSort(A, mid+1, right); Merge(A, left, mid, right); T(n) T(n) = = O(1) O(1) 2T(n/2) 2T(n/2) + + O(n) O(n) T(n) O(1) O(1) T(n/2) T(n/2) O(n) when when nn = = 1, 1, when when nn > > 11 Recurrence Recurrence Equation Equation 29