CS 3343: Analysis of Algorithms
Review for Final

Review for finals
• In chronological order
• Only the more important concepts – very likely to appear in your final
• Not meant to be exclusive

Asymptotic notations
• O: Big-Oh
• Ω: Big-Omega
• Θ: Theta
• o: Small-oh
• ω: Small-omega
• Intuitively: O is like ≤, o is like <, Ω is like ≥, ω is like >, Θ is like =

Big-Oh
• Math:
  – $O(g(n)) = \{f(n):$ there exist positive constants $c$ and $n_0$ such that $0 \le f(n) \le c\,g(n)$ for all $n > n_0\}$
  – Or: $\lim_{n\to\infty} g(n)/f(n) > 0$ (if the limit exists)
• Engineering:
  – g(n) grows at least as fast as f(n)
  – g(n) is an asymptotic upper bound of f(n)
• Intuitively it is like f(n) ≤ g(n)

Big-Oh
• Claim: $f(n) = 3n^2 + 10n + 5 \in O(n^2)$
• Proof: $3n^2 + 10n + 5 \le 3n^2 + 10n^2 + 5n^2 = 18n^2$ when $n > 1$. Therefore,
  – Let $c = 18$ and $n_0 = 1$
  – We have $f(n) \le c\,n^2$ for all $n > n_0$
  – By definition, $f(n) \in O(n^2)$

Big-Omega
• Math:
  – $\Omega(g(n)) = \{f(n):$ there exist positive constants $c$ and $n_0$ such that $0 \le c\,g(n) \le f(n)$ for all $n > n_0\}$
  – Or: $\lim_{n\to\infty} f(n)/g(n) > 0$ (if the limit exists)
• Engineering:
  – f(n) grows at least as fast as g(n)
  – g(n) is an asymptotic lower bound of f(n)
• Intuitively it is like g(n) ≤ f(n)

Big-Omega
• Claim: $f(n) = n^2/10 = \Omega(n)$
• Proof: with $f(n) = n^2/10$ and $g(n) = n$:
  – $g(n) = n \le n^2/10 = f(n)$ when $n > 10$
  – Therefore, $c = 1$ and $n_0 = 10$

Theta
• Math:
  – $\Theta(g(n)) = \{f(n):$ there exist positive constants $c_1$, $c_2$, and $n_0$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n > n_0\}$
  – Or: $\lim_{n\to\infty} f(n)/g(n) = c$ with $0 < c < \infty$
  – Or: f(n) = O(g(n)) and f(n) = Ω(g(n))
• Engineering:
  – f(n) grows in the same order as g(n)
  – g(n) is an asymptotic tight bound of f(n)
• Intuitively it is like f(n) = g(n)
• Θ(1) means constant time.

Theta
• Claim: $f(n) = 2n^2 + n = \Theta(n^2)$
• Proof:
  – We just need to find three constants $c_1$, $c_2$, and $n_0$ such that $c_1 n^2 \le 2n^2 + n \le c_2 n^2$ for all $n > n_0$
  – A simple solution is $c_1 = 2$, $c_2 = 3$, and $n_0 = 1$

Using limits to compare orders of growth
• If $\lim_{n\to\infty} f(n)/g(n) = 0$: $f(n) \in o(g(n))$ (and $f(n) \in O(g(n))$)
• If $\lim_{n\to\infty} f(n)/g(n) = c > 0$: $f(n) \in \Theta(g(n))$ (and $f(n) \in O(g(n))$, $f(n) \in \Omega(g(n))$)
• If $\lim_{n\to\infty} f(n)/g(n) = \infty$: $f(n) \in \omega(g(n))$ (and $f(n) \in \Omega(g(n))$)

Example
• Compare $2^n$ and $3^n$
• $\lim_{n\to\infty} 2^n/3^n = \lim_{n\to\infty} (2/3)^n = 0$
• Therefore, $2^n \in o(3^n)$, and $3^n \in \omega(2^n)$

L'Hopital's rule
• $\lim_{n\to\infty} f(n)/g(n) = \lim_{n\to\infty} f'(n)/g'(n)$, if both $\lim_{n\to\infty} f(n)$ and $\lim_{n\to\infty} g(n)$ go to $\infty$

Example
• Compare $n^{0.5}$ and $\log n$
• $\lim_{n\to\infty} n^{0.5}/\log n = ?$
• $(n^{0.5})' = 0.5\,n^{-0.5}$
• $(\log n)' = 1/n$
• $\lim_{n\to\infty} \frac{0.5\,n^{-0.5}}{1/n} = \lim_{n\to\infty} 0.5\,n^{0.5} = \infty$
• Therefore, $\log n \in o(n^{0.5})$

Stirling's formula
• $n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n (1 + o(1))$, i.e. $n! \approx c\, n^{n+1/2} e^{-n}$ for a constant $c$

Example
• Compare $2^n$ and $n!$
• $\lim_{n\to\infty} \frac{n!}{2^n} = \lim_{n\to\infty} \frac{c\,n^{n+1/2}}{e^n\,2^n} = \lim_{n\to\infty} c\,\sqrt{n}\left(\frac{n}{2e}\right)^n = \infty$
• Therefore, $2^n = o(n!)$

More advanced dominance ranking
[Figure: dominance ranking of common growth functions]

General plan for analyzing time efficiency of a non-recursive algorithm
• Decide parameter (input size)
• Identify most executed line (basic operation)
• worst-case = average-case?
• $T(n) = \sum_i t_i$
• $T(n) = \Theta(f(n))$

Analysis of insertion sort
Statement                                  cost   times
InsertionSort(A, n) {
  for j = 2 to n {                         c1     n
    key = A[j]                             c2     n-1
    i = j - 1;                             c3     n-1
    while (i > 0) and (A[i] > key) {       c4     S
      A[i+1] = A[i]                        c5     S-(n-1)
      i = i - 1                            c6     S-(n-1)
    }                                      0
    A[i+1] = key                           c7     n-1
  }                                        0
}

Best case
• Inner loop stops when A[i] <= key, or i = 0
• Array already sorted: $S = \sum_{j=1}^{n} 1 = n \in \Theta(n)$

Worst case
• Inner loop stops when A[i] <= key
• Array originally in reverse order: $S = \sum_{j=1}^{n} j = 1 + 2 + \dots + n = \frac{n(n+1)}{2} \in \Theta(n^2)$
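For concreteness, here is the insertion sort above as runnable code (a Python sketch; the slides' pseudocode is 1-based, this version uses 0-based indexing):

def insertion_sort(a):
    # Sort list a in place. S = total inner-loop tests:
    # best case (sorted input) Theta(n); worst case (reversed) Theta(n^2).
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:   # stops when a[i] <= key or i < 0
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # -> [1, 2, 3, 4, 5, 6]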
Average case
• Inner loop stops when A[i] <= key
• Array in random order: on average the key moves about halfway back, so
  $E(S) = \sum_{j=1}^{n} \frac{j}{2} = \frac{n(n+1)}{4} \in \Theta(n^2)$

Find the order of growth for sums
• $T(n) = \sum_{j=1}^{n} j = \Theta(n^2)$
• $T(n) = \sum_{j=1}^{n} \log j = ?$
• $T(n) = \sum_{j=1}^{n} 2^j = ?$
• $T(n) = \sum_{j=1}^{n} j\,2^j = ?$ ...
• How to find out the actual order of growth?
  – Remember some formulas
  – Learn how to guess and prove

Arithmetic series
• An arithmetic series is a sequence of numbers such that the difference of any two successive members of the sequence is a constant. e.g.: 1, 2, 3, 4, 5 or 10, 12, 14, 16, 18, 20
• In general: $a_j = a_{j-1} + d$ (recursive definition)
• Or: $a_j = a_1 + (j-1)d$ (closed form, or explicit formula)

Sum of arithmetic series
• If $a_1, a_2, \dots, a_n$ is an arithmetic series, then $\sum_{i=1}^{n} a_i = \frac{n(a_1 + a_n)}{2}$

Geometric series
• A geometric series is a sequence of numbers such that the ratio between any two successive members of the sequence is a constant. e.g.: 1, 2, 4, 8, 16, 32 or 10, 20, 40, 80, 160 or 1, 1/2, 1/4, 1/8, 1/16
• In general: $a_j = r\,a_{j-1}$ (recursive definition)
• Or: $a_j = r^j a_0$ (closed form, or explicit formula)

Sum of geometric series
$\sum_{i=0}^{n} r^i = \begin{cases} \dfrac{1 - r^{n+1}}{1 - r} & \text{if } r < 1 \\[4pt] \dfrac{r^{n+1} - 1}{r - 1} & \text{if } r > 1 \\[4pt] n + 1 & \text{if } r = 1 \end{cases}$
• e.g. $\sum_{i=0}^{n} 2^i = \frac{2^{n+1} - 1}{2 - 1} = 2^{n+1} - 1$
• $\lim_{n\to\infty} \sum_{i=0}^{n} \left(\tfrac{1}{2}\right)^i = \frac{1}{1 - 1/2} = 2$
• $\lim_{n\to\infty} \sum_{i=1}^{n} \left(\tfrac{1}{2}\right)^i = 2 - 1 = 1$

Important formulas
• $\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} = \Theta(n^3)$
• $\sum_{i=1}^{n} 1 = n = \Theta(n)$
• $\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \Theta(n^2)$
• $\sum_{i=0}^{n} r^i = \Theta(1)$ if $r < 1$; $\Theta(r^n)$ if $r > 1$
• $\sum_{i=1}^{n} i^k = \Theta(n^{k+1})$
• $\sum_{i=1}^{n} i\,2^i = (n-1)2^{n+1} + 2 = \Theta(n\,2^n)$
• $\sum_{i=1}^{n} \frac{1}{i} = \Theta(\lg n)$
• $\sum_{i=1}^{n} \lg i = \Theta(n \lg n)$

Sum manipulation rules
• $\sum (a_i + b_i) = \sum a_i + \sum b_i$
• $\sum c\,a_i = c \sum a_i$
• $\sum_{i=m}^{n} a_i = \sum_{i=m}^{x} a_i + \sum_{i=x+1}^{n} a_i$
• Example: $\sum_{i=1}^{n} (4i + 2) = 4\sum_{i=1}^{n} i + \sum_{i=1}^{n} 2 = 2n(n+1) + 2n$
• Example: $\sum_{i=1}^{n} 2^{n-i} = \sum_{i=0}^{n-1} 2^{i} = 2^n - 1$ (change of index)

Recursive algorithms
• General idea (Divide and Conquer):
  – Divide a large problem into smaller ones
    • By a constant ratio
    • By a constant or some variable
  – Solve each smaller one recursively or explicitly
  – Combine the solutions of smaller ones to form a solution for the original problem

How to analyze the time-efficiency of a recursive algorithm?
• Express the running time on input of size n as a function of the running time on smaller problems

Analyzing merge sort
MERGE-SORT A[1 . . n]                                    T(n)
1. If n = 1, done.                                       Θ(1)
2. Recursively sort A[1 . . n/2] and A[n/2+1 . . n].     2T(n/2)
3. "Merge" the 2 sorted lists.                           f(n)
Sloppiness: should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.

Analyzing merge sort
1. Divide: trivial.
2. Conquer: recursively sort 2 subarrays.
3. Combine: merge two sorted subarrays.
• T(n) = 2 T(n/2) + f(n) + Θ(1)
  – 2: # subproblems; n/2: subproblem size; f(n): work dividing and combining; Θ(1): constant
• Questions: 1. What is the time for the base case? 2. What is f(n)? 3. What is the growth order of T(n)?

Solving recurrence
• The running time of many algorithms can be expressed in one of the following two recursive forms:
  $T(n) = aT(n - b) + f(n)$  or  $T(n) = aT(n/b) + f(n)$
• Challenge: how to solve the recurrence to get a closed form, e.g. T(n) = Θ(n^2) or T(n) = Θ(n lg n), or at least some bound such as T(n) = O(n^2)?

Solving recurrence
1. Recurrence tree (iteration) method
   - Good for guessing an answer
2. Substitution method
   - Generic method, rigid, but may be hard
3. Master method
   - Easy to learn, useful in limited cases only
   - Some tricks may help in other cases
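Before moving on to the master method, here is the merge sort analyzed above as a runnable sketch (Python; slicing is used for clarity even though it copies):

def merge_sort(a):
    # T(n) = 2 T(n/2) + Theta(n) = Theta(n log n)
    if len(a) <= 1:                    # base case: Theta(1)
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # T(n/2)
    right = merge_sort(a[mid:])        # T(n/2)
    merged, i, j = [], 0, 0            # merge step: f(n) = Theta(n)
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 6, 1, 3]))   # -> [1, 2, 3, 4, 5, 6]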
The master method
• The master method applies to recurrences of the form T(n) = a T(n/b) + f(n), where a ≥ 1, b > 1, and f is asymptotically positive.
  1. Divide the problem into a subproblems, each of size n/b
  2. Conquer the subproblems by solving them recursively.
  3. Combine subproblem solutions
• Divide + combine takes f(n) time.

Master theorem
• T(n) = a T(n/b) + f(n). Key: compare f(n) with $n^{\log_b a}$
• CASE 1: $f(n) = O(n^{\log_b a - \epsilon})$ => $T(n) = \Theta(n^{\log_b a})$
• CASE 2: $f(n) = \Theta(n^{\log_b a})$ => $T(n) = \Theta(n^{\log_b a} \log n)$
• CASE 3: $f(n) = \Omega(n^{\log_b a + \epsilon})$ and $a\,f(n/b) \le c\,f(n)$ => $T(n) = \Theta(f(n))$
• e.g. merge sort: T(n) = 2 T(n/2) + Θ(n); a = 2, b = 2, $n^{\log_b a} = n$ => CASE 2 => T(n) = Θ(n log n)

Case 1
• Compare f(n) with $n^{\log_b a}$: $f(n) = O(n^{\log_b a - \epsilon})$ for some constant ε > 0
• f(n) grows polynomially slower than $n^{\log_b a}$ (by an $n^{\epsilon}$ factor)
• Solution: $T(n) = \Theta(n^{\log_b a})$, i.e., aT(n/b) dominates
• e.g. T(n) = 2T(n/2) + 1; T(n) = 4T(n/2) + n; T(n) = 2T(n/2) + log n; T(n) = 8T(n/2) + n^2

Case 3
• Compare f(n) with $n^{\log_b a}$: $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant ε > 0
• f(n) grows polynomially faster than $n^{\log_b a}$ (by an $n^{\epsilon}$ factor)
• Solution: T(n) = Θ(f(n)), i.e., f(n) dominates
• e.g. T(n) = T(n/2) + n; T(n) = 2T(n/2) + n^2; T(n) = 4T(n/2) + n^3; T(n) = 8T(n/2) + n^4

Case 2
• Compare f(n) with $n^{\log_b a}$: $f(n) = \Theta(n^{\log_b a})$
• f(n) and $n^{\log_b a}$ grow at similar rates
• Solution: $T(n) = \Theta(n^{\log_b a} \log n)$
• e.g. T(n) = T(n/2) + 1; T(n) = 2T(n/2) + n; T(n) = 4T(n/2) + n^2; T(n) = 8T(n/2) + n^3

Recursion tree
• Solve T(n) = 2T(n/2) + dn, where d > 0 is constant.
• Expand level by level: the root costs dn; its two children each cost dn/2; the four grandchildren each cost dn/4; and so on, down to the Θ(1) leaves.
• Each level sums to dn, and the height is h = log n
• #leaves = n, contributing Θ(n)
• Total: dn · log n + Θ(n) = Θ(n log n)

Substitution method
• The most general method to solve a recurrence (prove O and Ω separately):
  1. Guess the form of the solution (e.g. using recursion trees, or expansion)
  2. Verify by induction (inductive step).
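One way to generate a guess before doing the induction is to evaluate the recurrence numerically. A small illustrative sketch (not from the slides; it assumes T(1) = 1 and integer halving in place of exact n/2):

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # T(n) = 2 T(n/2) + n, with T(1) = 1
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# If T(n) = Theta(n log n), this ratio should stay bounded as n grows.
for n in [2**k for k in range(4, 21, 4)]:
    print(n, round(T(n) / (n * math.log2(n)), 3))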
Proof by substitution
• Recurrence: T(n) = 2T(n/2) + n.
• Guess: T(n) = O(n log n) (e.g. by the recurrence tree method)
• To prove, we have to show T(n) ≤ c n log n for some c > 0 and for all n > n0
• Proof by induction: assume it is true for T(n/2), prove that it is also true for T(n). This means:
  – Fact: T(n) = 2T(n/2) + n
  – Assumption: T(n/2) ≤ (cn/2) log(n/2)
  – Need to prove: T(n) ≤ c n log n

Proof
• Substitute T(n/2) into the recurrence:
  T(n) = 2 T(n/2) + n ≤ cn log(n/2) + n
  => T(n) ≤ c n log n - c n + n
  => T(n) ≤ c n log n (if we choose c ≥ 1)

Proof by substitution
• Recurrence: T(n) = 2T(n/2) + n.
• Guess: T(n) = Ω(n log n).
• To prove, we have to show T(n) ≥ c n log n for some c > 0 and for all n > n0
• Proof by induction: assume it is true for T(n/2), prove that it is also true for T(n). This means:
  – Fact: T(n) = 2T(n/2) + n
  – Assumption: T(n/2) ≥ (cn/2) log(n/2)
  – Need to prove: T(n) ≥ c n log n

Proof
• Substitute T(n/2) into the recurrence:
  T(n) = 2 T(n/2) + n ≥ cn log(n/2) + n
  => T(n) ≥ c n log n - c n + n
  => T(n) ≥ c n log n (if we choose c ≤ 1)

Quick sort
Quicksort an n-element array:
1. Divide: Partition the array into two subarrays around a pivot x such that elements in the lower subarray ≤ x ≤ elements in the upper subarray.
2. Conquer: Recursively sort the two subarrays.
3. Combine: Trivial.
Key: linear-time partitioning subroutine.

Partition
• All the action takes place in the partition() function
  – Rearranges the subarray in place
  – End result: two subarrays; all values in the first subarray ≤ all values in the second
  – Returns the index of the "pivot" element separating the two subarrays

Partition code
Partition(A, p, r)
  x = A[p];             // pivot is the first element
  i = p;
  j = r + 1;
  while (TRUE) {
    repeat i++; until A[i] > x or i >= j;
    repeat j--; until A[j] < x or j < i;
    if (i < j) Swap(A[i], A[j]);
    else break;
  }
  swap(A[p], A[j]);
  return j;
• What is the running time of partition()? partition() runs in O(n) time.

Partition example
• Trace on A = [6, 10, 5, 8, 13, 3, 2, 11] with pivot x = 6: the indices i and j sweep toward each other, swapping out-of-place pairs (10 with 2, then 8 with 3); the final swap places the pivot between the two subarrays, giving [3, 2, 5, 6, 13, 8, 10, 11] with q = 4.
• Quicksort then recurses on the two sides until the whole array is sorted: [2, 3, 5, 6, 8, 10, 11, 13].
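A runnable quicksort sketch using the pivot-first partition above (Python, 0-based; the scan conditions are phrased slightly differently from the pseudocode so equal keys and array bounds are handled safely):

def partition(a, p, r):
    # Partition a[p..r] around pivot x = a[p]; O(n) for n elements.
    x = a[p]
    i, j = p, r + 1
    while True:
        i += 1
        while i <= r and a[i] <= x:    # advance i until a[i] > x (or end)
            i += 1
        j -= 1
        while a[j] > x:                # retreat j until a[j] <= x
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            break
    a[p], a[j] = a[j], a[p]            # place pivot between the subarrays
    return j

def quicksort(a, p=0, r=None):
    # Note: recursion depth is Theta(n) on already-sorted input (worst case).
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a

print(quicksort([6, 10, 5, 8, 13, 3, 2, 11]))  # -> [2, 3, 5, 6, 8, 10, 11, 13]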
Quicksort runtimes
• Best case runtime: Tbest(n) ∈ O(n log n)
• Worst case runtime: Tworst(n) ∈ O(n^2)
• Worse than mergesort? Why is it called quicksort then?
• Its average runtime: Tavg(n) ∈ O(n log n)
• Better even: the expected runtime of randomized quicksort is O(n log n)

Randomized quicksort
• Randomly choose an element as pivot
  – Every time we need to do a partition, throw a die to decide which element to use as the pivot
  – Each element has 1/n probability to be selected
Partition(A, p, r)
  d = random();                     // a random number between 0 and 1
  index = p + floor((r-p+1) * d);   // p <= index <= r
  swap(A[p], A[index]);
  x = A[p];
  i = p; j = r + 1;
  while (TRUE) { ... }

Running time of randomized quicksort
• T(n) = T(0) + T(n-1) + dn     if a 0 : n-1 split
       = T(1) + T(n-2) + dn     if a 1 : n-2 split
       ...
       = T(n-1) + T(0) + dn     if an n-1 : 0 split
• The expected running time is an average of all cases:
  $T(n) = \frac{1}{n}\sum_{k=0}^{n-1}\big(T(k) + T(n-k-1)\big) + \Theta(n) = \Theta(n \log n)$

Heaps
• In practice, heaps are usually implemented as arrays, e.g. the max-heap with root 16:
  A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]

Heaps
• To represent a complete binary tree as an array:
  – The root node is A[1]
  – Node i is A[i]
  – The parent of node i is A[i/2] (note: integer divide)
  – The left child of node i is A[2i]
  – The right child of node i is A[2i + 1]

The heap property
• Heaps also satisfy the heap property: A[Parent(i)] ≥ A[i] for all nodes i > 1
  – In other words, the value of a node is at most the value of its parent
  – The value of a node should be greater than or equal to both its left and right children, and all of its descendants
  – Where is the largest element in a heap stored? (At the root.)

Heap operations: Heapify()
Heapify(A, i)
{ // precondition: subtrees rooted at l and r are heaps
  l = Left(i); r = Right(i);
  if (l <= heap_size(A) && A[l] > A[i])
    largest = l;                    // among A[l], A[i], A[r],
  else                              // which one is largest?
    largest = i;
  if (r <= heap_size(A) && A[r] > A[largest])
    largest = r;
  if (largest != i) {               // if violation, fix it
    Swap(A, i, largest);
    Heapify(A, largest);
  }
} // postcondition: subtree rooted at i is a heap

Heapify() example
• Start from A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]: node 2 (value 4) violates the heap property, so 4 is swapped with its larger child 14, then with 8, giving A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].

Analyzing Heapify(): formal
• T(n) ≤ T(2n/3) + Θ(1)
• By case 2 of the Master Theorem, T(n) = O(lg n)
• Thus, Heapify() takes logarithmic time
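Heapify() as a runnable sketch (Python, 0-based, so the children of node i are 2i+1 and 2i+2 instead of 2i and 2i+1):

def max_heapify(a, i, heap_size):
    # Precondition: the subtrees below i are heaps. Sift a[i] down: O(lg n).
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:                   # violation: fix it and recurse
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)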
Heap operations: BuildHeap()
• We can build a heap in a bottom-up manner by running Heapify() on successive subarrays
  – Fact: for an array of length n, all elements in the range A[⌊n/2⌋+1 .. n] are heaps (Why? They are leaves.)
  – So: walk backwards through the array from ⌊n/2⌋ to 1, calling Heapify() on each node
  – The order of processing guarantees that the children of node i are heaps when i is processed

BuildHeap()
// given an unsorted array A, make A a heap
BuildHeap(A)
{
  heap_size(A) = length(A);
  for (i = length[A]/2 downto 1)
    Heapify(A, i);
}

BuildHeap() example
• Work through example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}: heapifying nodes 5, 4, 3, 2, 1 in turn transforms A into the heap [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].

Analyzing BuildHeap(): tight
• To Heapify() a subtree takes O(h) time where h is the height of the subtree
  – h = O(lg m), m = # nodes in the subtree
  – The height of most subtrees is small
• Fact: an n-element heap has at most ⌈n/2^{h+1}⌉ nodes of height h
• CLR 7.3 uses this fact to prove that BuildHeap() takes O(n) time

Heapsort example
• Work through example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
• First: build a heap => A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
• Swap last and first => the last element (16) is now sorted
• Restore the heap on the remaining unsorted elements (Heapify), then repeat: swap the new last and first, shrink the heap, Heapify again
• After all iterations the array is sorted: A = [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]

Analyzing heapsort
• The call to BuildHeap() takes O(n) time
• Each of the n - 1 calls to Heapify() takes O(lg n) time
• Thus the total time taken by HeapSort() = O(n) + (n - 1) O(lg n) = O(n) + O(n lg n) = O(n lg n)

HeapExtractMax example
• Swap first and last, remove the last element (the max, 16), then Heapify the root: [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] becomes [14, 8, 10, 4, 7, 9, 3, 2, 1]

HeapChangeKey example
• Increase the key 8 to 15: the increased key bubbles up past smaller parents: [16, 14, 10, 15, 7, 9, 3, 2, 4, 1] becomes [16, 15, 10, 14, 7, 9, 3, 2, 4, 1]

HeapInsert example
• HeapInsert(A, 17): append -∞ (which keeps the heap valid), then call ChangeKey to raise it to 17; the new key bubbles up to the root:
  A = [17, 16, 10, 8, 14, 9, 3, 2, 4, 1, 7]

Heap operation costs
• Heapify: Θ(log n)
• BuildHeap: Θ(n)
• HeapSort: Θ(n log n)
• HeapMaximum: Θ(1)
• HeapExtractMax: Θ(log n)
• HeapChangeKey: Θ(log n)
• HeapInsert: Θ(log n)
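BuildHeap() and HeapSort() as runnable sketches (Python, 0-based; max_heapify is the function from the sketch above):

def build_max_heap(a):
    # Heapify nodes n/2-1 .. 0 bottom-up: O(n) total.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    # BuildHeap O(n), then n-1 extract steps of O(lg n) each: O(n lg n).
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]    # move current max to its final slot
        max_heapify(a, 0, end)         # restore the heap on a[0..end-1]
    return a

print(heapsort([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
# -> [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]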
Counting sort
1. for i ← 1 to k                      ⊳ initialize
     do C[i] ← 0
2. for j ← 1 to n                      ⊳ count
     do C[A[j]] ← C[A[j]] + 1          ⊳ C[i] = |{key = i}|
3. for i ← 2 to k                      ⊳ compute running sum
     do C[i] ← C[i] + C[i-1]           ⊳ C[i] = |{key ≤ i}|
4. for j ← n downto 1                  ⊳ re-arrange
     do B[C[A[j]]] ← A[j]
        C[A[j]] ← C[A[j]] - 1

Counting sort example
• A = [4, 1, 3, 4, 3] (n = 5, k = 4)
• After loop 2 (counts): C = [1, 0, 2, 2]
• After loop 3 (running sums): C' = [1, 1, 3, 5]

Loop 4: re-arrange
• Scanning A from right to left, each A[j] is placed at position C[A[j]] of B and that counter is decremented; e.g. the rightmost 3 of A goes to B[3]. Final result: B = [1, 3, 3, 4, 4].

Analysis
1. Θ(k)  initialize
2. Θ(n)  count
3. Θ(k)  running sum
4. Θ(n)  re-arrange
Total: Θ(n + k)

Stable sorting
• Counting sort is a stable sort: it preserves the input order among equal elements.
• A = [4, 1, 3, 4, 3] => B = [1, 3, 3, 4, 4]
• Why is this important? What other algorithms have this property?

Radix sort
• Similar to sorting address books
• Treat each digit as a key
• Start from the least significant digit
[Figure: a column of multi-digit numbers, sorted one digit position at a time from least significant to most significant]

Time complexity
• Sort each of the d digits by counting sort
• Total cost: d · Θ(n + k); with k = 10, total cost Θ(dn)
• Partition the d digits into groups of 3: total cost Θ((n + 10^3) d/3)
• We can work with binaries rather than decimals:
  – Partition a binary number into groups of r bits
  – Total cost: Θ((n + 2^r) d/r)
  – Choose r = log n: total cost Θ(dn / log n)
  – Compare with Θ(dn log n)
• Catch: faster than quicksort only when n is very large

Randomized selection algorithm
RAND-SELECT(A, p, q, i)        ⊳ ith smallest of A[p . . q]
  if p = q & i > 1 then error!
  r ← RAND-PARTITION(A, p, q)
  k ← r - p + 1                ⊳ k = rank(A[r])
  if i = k then return A[r]
  if i < k
    then return RAND-SELECT(A, p, r - 1, i)
    else return RAND-SELECT(A, r + 1, q, i - k)

Example
• Select the i = 6th smallest of [7, 10, 5, 8, 11, 3, 2, 13] (pivot 7)
• Partition: [3, 2, 5, 7, 11, 8, 10, 13], so k = 4
• Since i = 6 > k = 4, select the 6 - 4 = 2nd smallest in the upper part recursively.

Complete example: select the 6th smallest element
• The recursion continues on [11, 8, 10, 13]: pivot 11 gives k = 3 and i = 2 < k, so recurse on [10, 8]; pivot 10 gives k = 2 = i, returning 10.
• Note: here we always used the first element as pivot to do the partition (instead of RAND-PARTITION).

Intuition for analysis
(All our analyses today assume that all elements are distinct.)
• Lucky: T(n) = T(9n/10) + Θ(n); $n^{\log_{10/9} 1} = n^0 = 1$ => CASE 3 => T(n) = Θ(n)
• Unlucky: T(n) = T(n-1) + Θ(n) = Θ(n^2) (arithmetic series). Worse than sorting!

Running time of randomized selection
• T(n) ≤ T(max(0, n-1)) + n     if a 0 : n-1 split
       ≤ T(max(1, n-2)) + n     if a 1 : n-2 split
       ...
       ≤ T(max(n-1, 0)) + n     if an n-1 : 0 split
• For the upper bound, assume the ith element always falls in the larger side of the partition
• The expected running time is an average of all cases:
  $T(n) \le \frac{1}{n}\sum_{k=0}^{n-1} T(\max(k,\, n-k-1)) + \Theta(n) = \Theta(n)$

Worst-case linear-time selection
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k then recursively SELECT the ith smallest element in the lower part
   else recursively SELECT the (i-k)th smallest element in the upper part
(Step 4 is the same as in RAND-SELECT.)
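Randomized selection as a runnable sketch (Python; a three-way partition replaces the in-place RAND-PARTITION for clarity, and i is the global 1-based rank):

import random

def rand_select(a, i):
    # Expected Theta(n): each round discards one side of a random split.
    a = list(a)
    p, q = 0, len(a) - 1
    while True:
        if p == q:
            return a[p]
        x = a[random.randint(p, q)]    # random pivot: each element prob 1/n
        lo = [v for v in a[p:q + 1] if v < x]
        eq = [v for v in a[p:q + 1] if v == x]
        hi = [v for v in a[p:q + 1] if v > x]
        a[p:q + 1] = lo + eq + hi
        k = len(lo)                    # indices p..p+k-1 hold ranks p+1..p+k
        if i <= p + k:
            q = p + k - 1              # recurse into the lower part
        elif i <= p + k + len(eq):
            return x                   # the pivot has rank i
        else:
            p = p + k + len(eq)        # recurse into the upper part

print(rand_select([7, 10, 5, 8, 11, 3, 2, 13], 6))   # -> 10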
Developing the recurrence
SELECT(i, n)                                                       T(n)
1. Divide the n elements into groups of 5.                         Θ(n)
   Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians      T(n/5)
   to be the pivot.
3. Partition around the pivot x. Let k = rank(x).                  Θ(n)
4. if i = k then return x                                          T(7n/10 + 3)
   elseif i < k then recursively SELECT the ith smallest element in the lower part
   else recursively SELECT the (i-k)th smallest element in the upper part

Solving the recurrence
• $T(n) \le T\!\left(\frac{n}{5}\right) + T\!\left(\frac{7n}{10} + 3\right) + n$
• Assumption: T(k) ≤ ck for all k < n
• $T(n) \le \frac{cn}{5} + c\left(\frac{7n}{10} + 3\right) + n \le \frac{cn}{5} + \frac{3cn}{4} + n$ if n ≥ 60 (since $\frac{7n}{10} + 3 \le \frac{3n}{4}$ when n ≥ 60)
  $= \frac{19cn}{20} + n = cn - \left(\frac{cn}{20} - n\right) \le cn$ if c ≥ 20 and n ≥ 60

Elements of dynamic programming
• Optimal sub-structures
  – Optimal solutions to the original problem contain optimal solutions to sub-problems
• Overlapping sub-problems
  – Some sub-problems appear in many solutions

Two steps to dynamic programming
• Formulate the solution as a recurrence relation of solutions to subproblems.
• Specify an order to solve the subproblems so you always have what you need.

Optimal subpaths
• Claim: if a path start→goal is optimal, any sub-path, start→x, or x→goal, or x→y, where x, y are on the optimal path, is also the shortest.
• Proof by contradiction:
  – If the subpath b between x and y is not the shortest, we can replace it with a shorter one b' (b' < b), which reduces the total length of the new path: a + b' + c < a + b + c => the optimal path from start to goal is not the shortest => contradiction!
  – Hence, the subpath x→y must be the shortest among all paths from x to y

Dynamic programming illustration
[Figure: grid of street segments from S to G with segment lengths; the DP fills in the shortest distance from S to every intersection, ending with distance 20 at G]
F(i, j) = min { F(i-1, j) + dist(i-1, j, i, j),  F(i, j-1) + dist(i, j-1, i, j) }

Trace back
[Figure: the same grid with the optimal path recovered by walking backwards from G along the choices that achieved each minimum]

Longest common subsequence
• Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. ("a" longest, not "the" longest)
• x: A B C B D A B
• y: B D C A B A
• BCBA = LCS(x, y) (functional notation, but not a function)

Optimal substructure
• Notice that the LCS problem has optimal substructure: parts of the final solution are solutions of subproblems.
  – If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y.
• Subproblems: "find LCS of pairs of prefixes of x and y"

Finding length of LCS
• Let c[i, j] be the length of LCS(x[1..i], y[1..j]) => c[m, n] is the length of LCS(x, y)
• If x[m] = y[n]: c[m, n] = c[m-1, n-1] + 1
• If x[m] ≠ y[n]: c[m, n] = max { c[m-1, n], c[m, n-1] }
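The street-grid DP as a runnable sketch (Python; grid_shortest_path and the sample edge lengths below are hypothetical, since the slide's actual numbers are not recoverable here):

def grid_shortest_path(right, down):
    # F(i, j) = min(F(i-1, j) + down edge, F(i, j-1) + right edge).
    # right[i][j]: length of segment (i, j) -> (i, j+1)
    # down[i][j]:  length of segment (i, j) -> (i+1, j)
    rows, cols = len(down) + 1, len(right[0]) + 1
    INF = float("inf")
    F = [[INF] * cols for _ in range(rows)]
    F[0][0] = 0
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                F[i][j] = min(F[i][j], F[i - 1][j] + down[i - 1][j])
            if j > 0:
                F[i][j] = min(F[i][j], F[i][j - 1] + right[i][j - 1])
    return F[rows - 1][cols - 1]

right = [[3, 5], [2, 3], [3, 1]]       # 3 rows of horizontal segments
down = [[2, 6, 3], [4, 2, 3]]          # 2 rows of vertical segments
print(grid_shortest_path(right, down))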
DP algorithm
• Key: find out the correct order to solve the sub-problems
• Total number of sub-problems: m * n
• c[i, j] = c[i-1, j-1] + 1 if x[i] = y[j]; max{ c[i-1, j], c[i, j-1] } otherwise
• Fill the (m+1) × (n+1) table C(i, j) row by row, for i = 0..m and j = 0..n

LCS example
• X = ABCB; m = |X| = 4
• Y = BDCAB; n = |Y| = 5
• Allocate array c[5, 6]
• Initialize: c[i, 0] = 0 for i = 1 to m, and c[0, j] = 0 for j = 1 to n
• Then fill each entry with:
  if (X_i == Y_j) c[i, j] = c[i-1, j-1] + 1
  else c[i, j] = max( c[i-1, j], c[i, j-1] )
• The completed table:

  j        0   1(B) 2(D) 3(C) 4(A) 5(B)
  i=0      0   0    0    0    0    0
  i=1 (A)  0   0    0    0    1    1
  i=2 (B)  0   1    1    1    1    2
  i=3 (C)  0   1    1    2    2    2
  i=4 (B)  0   1    1    2    2    3

LCS algorithm running time
• The LCS algorithm calculates the values of each entry of the array c[m, n]
• So what is the running time? O(m*n), since each c[i, j] is calculated in constant time, and there are m*n elements in the array

How to find the actual LCS
• The algorithm just found the length of LCS(x, y), but not LCS itself.
• How to find the actual LCS? For each c[i, j] we know how it was acquired:
  c[i, j] = c[i-1, j-1] + 1 if x[i] = y[j]; max( c[i, j-1], c[i-1, j] ) otherwise
• A match happens only when the first equation is taken
• So we can start from c[m, n] and go backwards, remembering x[i] whenever c[i, j] = c[i-1, j-1] + 1 (for example, a cell with c[i, j] = c[i-1, j-1] + 1 = 2 + 1 = 3 records a matched character)

Finding LCS
• Trace back through the table from c[4, 5], following the matches diagonally.
• Time for trace back: O(m + n).
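The LCS table fill and trace back as one runnable sketch (Python, 0-based):

def lcs(x, y):
    # Fill c in O(mn); trace back in O(m + n).
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    out, i, j = [], m, n
    while i > 0 and j > 0:             # a match is recorded on diagonal steps
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCB", "BDCAB"))            # -> "BCB"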
Finding LCS (2)
• The trace back yields, in reversed order: B C B
• LCS (straight order): B C B (this string turned out to be a palindrome)

LCS as a longest path problem
[Figure: the LCS table drawn as a grid graph with diagonal edges of weight 1 at matching characters and horizontal/vertical edges of weight 0; an LCS corresponds to a longest path from the top-left corner to the bottom-right corner]

Restaurant location problem 1
• You work in the fast food business
• Your company plans to open up new restaurants in Texas along I-35
• Towns along the highway are called t1, t2, …, tn
• A restaurant at ti has estimated annual profit pi
• No two restaurants can be located within 10 miles of each other due to some regulation
• Your boss wants to maximize the total profit
• You want a big bonus

A DP algorithm
• Suppose you've already found the optimal solution
• It will either include tn or not include tn
• Case 1: tn not included in the optimal solution
  – Best solution is the same as the best solution for t1, …, tn-1
• Case 2: tn included in the optimal solution
  – Best solution is pn + best solution for t1, …, tj, where j < n is the largest index so that dist(tj, tn) ≥ 10

Recurrence formulation
• Let S(i) be the total profit of the optimal solution when the first i towns are considered (not necessarily selected)
  – S(n) is the optimal solution to the complete problem
• S(n) = max { S(n-1),  S(j) + pn }, where j < n & dist(tj, tn) ≥ 10
• Generalize: S(i) = max { S(i-1),  S(j) + pi }, where j < i & dist(tj, ti) ≥ 10
• Number of sub-problems: n. Boundary condition: S(0) = 0.
• Dependency: S(i) depends on S(i-1) and one earlier S(j)

Example
[Figure: a row of towns with pairwise distances in miles (plus a dummy town 100 miles out) and profits in units of $100k; filling in S(i) from left to right gives the optimal total profit 26]
• Optimal: 26, using S(i) = max { S(i-1), S(j) + pi }, j < i & dist(tj, ti) ≥ 10
• Natural greedy 1: 6 + 3 + 4 + 12 = 25
• Natural greedy 2: 12 + 9 + 3 = 24

Complexity
• Time: Θ(nk), where k is the maximum number of towns that are within 10 miles to the left of any town
  – In the worst case, Θ(n^2)
  – Can be improved to Θ(n) with some preprocessing tricks
• Memory: Θ(n)

Knapsack problem
• Each item has a value and a weight
• Objective: maximize value
• Constraint: knapsack has a weight limitation
• Three versions:
  – 0-1 knapsack problem: take each item or leave it
  – Fractional knapsack problem: items are divisible
  – Unbounded knapsack problem: unlimited supplies of each item
• Which one is easiest to solve? We study the 0-1 problem today.
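The restaurant-location recurrence as a runnable sketch (Python; best_profit and the sample positions/profits are hypothetical stand-ins, since the slide's example numbers are garbled):

def best_profit(pos, profit, min_dist=10):
    # pos: sorted mile markers of the towns; S[i] = optimal profit over the
    # first i towns. O(nk) with the inner scan; improvable with preprocessing.
    n = len(pos)
    S = [0] * (n + 1)                  # boundary condition: S[0] = 0
    for i in range(1, n + 1):
        j = i - 1                      # largest j < i with dist >= min_dist
        while j > 0 and pos[i - 1] - pos[j - 1] < min_dist:
            j -= 1
        S[i] = max(S[i - 1],                 # town i not selected
                   S[j] + profit[i - 1])     # town i selected
    return S[n]

print(best_profit([0, 5, 12, 14, 30], [6, 9, 3, 10, 12]))   # -> 28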
Formal definition (0-1 problem)
• Knapsack has weight limit W
• Items labeled 1, 2, …, n (arbitrarily)
• Items have weights w1, w2, …, wn
  – Assume all weights are integers
  – For practical reasons, only consider wi < W
• Items have values v1, v2, …, vn
• Objective: find a subset of items, S, such that $\sum_{i \in S} w_i \le W$ and $\sum_{i \in S} v_i$ is maximal among all such (feasible) subsets

A DP algorithm
• Suppose you've found the optimal solution S
• Case 1: item n is included
  – Find an optimal solution using items 1, 2, …, n-1 with weight limit W - wn
• Case 2: item n is not included
  – Find an optimal solution using items 1, 2, …, n-1 with weight limit W

Recursive formulation
• Let V[i, w] be the optimal total value when items 1, 2, …, i are considered for a knapsack with weight limit w => V[n, W] is the optimal solution
• V[n, W] = max { V[n-1, W-wn] + vn,  V[n-1, W] }
• Generalize:
  V[i, w] = max { V[i-1, w-wi] + vi  (item i is taken),  V[i-1, w]  (item i not taken) }
  V[i, w] = V[i-1, w] if wi > w (item i cannot be taken)
• Boundary conditions: V[i, 0] = 0, V[0, w] = 0. Number of sub-problems = n × W.

Example
• n = 6 (# of items), W = 10 (weight limit)
• Items (weight, value): (2, 2), (4, 3), (3, 3), (5, 6), (2, 4), (6, 9)
• The completed table:

  i  wi  vi | w=0  1  2  3  4  5  6  7   8   9   10
  0         | 0    0  0  0  0  0  0  0   0   0   0
  1  2   2  | 0    0  2  2  2  2  2  2   2   2   2
  2  4   3  | 0    0  2  2  3  3  5  5   5   5   5
  3  3   3  | 0    0  2  3  3  5  5  6   6   8   8
  4  5   6  | 0    0  2  3  3  6  6  8   9   9   11
  5  2   4  | 0    0  4  4  6  7  7  10  10  12  13
  6  6   9  | 0    0  4  4  6  7  9  10  13  13  15

• Optimal value: 15
• Items: 6, 5, 1; weight: 6 + 2 + 2 = 10; value: 9 + 4 + 2 = 15

Time complexity
• Θ(nW)
• Polynomial?
  – Pseudo-polynomial
  – Works well if W is small
• Consider the following items (weight, value): (10, 5), (15, 6), (20, 5), (18, 6), with weight limit 35
  – Optimal solution: items 2 and 4 (value = 12). Iterating over subsets: 2^4 = 16 subsets
  – Dynamic programming: fill up a 4 x 35 = 140-entry table
• What's the problem?
  – Many entries are unused: no such weight combination
  – Top-down may be better
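The 0-1 knapsack DP, with recovery of the chosen items, as a runnable sketch (Python):

def knapsack_01(weights, values, W):
    # V[i][w]: best value using items 1..i under weight limit w. Theta(nW).
    n = len(weights)
    V = [[0] * (W + 1) for _ in range(n + 1)]   # V[0][w] = V[i][0] = 0
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(W + 1):
            if wi > w:
                V[i][w] = V[i - 1][w]                   # item i can't fit
            else:
                V[i][w] = max(V[i - 1][w - wi] + vi,    # item i taken
                              V[i - 1][w])              # item i not taken
    chosen, w = [], W                  # trace back the chosen items
    for i in range(n, 0, -1):
        if V[i][w] != V[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return V[n][W], chosen

print(knapsack_01([2, 4, 3, 5, 2, 6], [2, 3, 3, 6, 4, 9], 10))
# -> (15, [6, 5, 1]): the same optimum as in the table above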
Longest increasing subsequence
• Given a sequence of numbers: 1 2 5 3 2 9 4 9 3 5 6 8
• Find a longest subsequence that is nondecreasing, e.g. 1 2 5 9
  – It has to be a subsequence of the original list
  – It has to be in sorted order => it is a subsequence of the sorted list
• So LIS reduces to LCS:
  Original list: 1 2 5 3 2 9 4 9 3 5 6 8
  Sorted list:   1 2 2 3 3 4 5 5 6 8 9 9
  LCS of the two: 1 2 3 4 5 6 8

Events scheduling problem
• A list of events to schedule (or shows to see)
  – ei has start time si and finishing time fi
  – Indexed such that fi < fj if i < j
• Each event has a value vi
• Schedule to make the largest value
  – You can attend only one event at any time
• Very similar to the restaurant location problem (see the sketch after this list)
  – Sort events according to their finish time
  – Consider: is the last event included or not?

Events scheduling problem
• V(i) is the optimal value that can be achieved when the first i events are considered
• V(n) = max { V(n-1)  (en not selected),  V(j) + vn  (en selected) }, where j < n and fj < sn

Coin change problem
• Given some denominations of coins (e.g., 2, 5, 7, 10), decide if it is possible to make change for a value (e.g., 13), or minimize the number of coins
• Version 1: unlimited number of coins of each denomination
  – Unbounded knapsack problem
• Version 2: use each denomination at most once
  – 0-1 knapsack problem

Use DP algorithms to solve new problems
• Directly map a new problem to a known problem
• Modify an algorithm for a similar task
• Design your own
  – Think about the problem recursively
  – The optimal solution to a larger problem can be computed from the optimal solutions of one or more subproblems
  – These sub-problems can be solved in a certain manageable order
  – Works nicely for naturally ordered data such as strings, trees, some special graphs
  – Trickier for general graphs
• The textbook has some very good exercises.
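The events-scheduling recurrence as a runnable sketch (Python; max_value_schedule and the sample events are hypothetical; binary search over the sorted finish times finds the j with fj < sn):

import bisect

def max_value_schedule(events):
    # events: (start, finish, value); V(i) = max(V(i-1), V(j) + v_i),
    # where j indexes the last event with f_j < s_i. O(n log n).
    events = sorted(events, key=lambda e: e[1])   # index by finish time
    finishes = [f for _, f, _ in events]
    V = [0] * (len(events) + 1)
    for i in range(1, len(events) + 1):
        s, f, v = events[i - 1]
        j = bisect.bisect_left(finishes, s)       # # of events with f_j < s_i
        V[i] = max(V[i - 1], V[j] + v)
    return V[len(events)]

print(max_value_schedule([(1, 3, 5), (2, 5, 6), (4, 7, 5), (6, 9, 4)]))  # -> 10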
Unit-profit restaurant location problem
• Now the objective is to maximize the number of new restaurants (subject to the distance constraint)
  – In other words, we assume that each restaurant makes the same profit, no matter where it is opened

A DP algorithm
• Exactly as before, but pi = 1 for all i:
  S(i) = max { S(i-1),  S(j) + pi }, j < i & dist(tj, ti) ≥ 10
  becomes
  S(i) = max { S(i-1),  S(j) + 1 }, j < i & dist(tj, ti) ≥ 10

Greedy algorithm for the restaurant location problem
select t1
d = 0;
for (i = 2 to n)
  d = d + dist(ti, ti-1);
  if (d >= min_dist)
    select ti
    d = 0;
  end
end
• Scanning left to right, a town is selected whenever the accumulated distance d since the last selected town reaches 10 miles; d is then reset to 0.

Complexity
• Time: Θ(n)
• Memory:
  – Θ(n) to store the input
  – Θ(1) for greedy selection

Optimal substructure
• Claim 1: if A = [m1, m2, …, mk] is the optimal solution to the restaurant location problem for a set of towns [t1, …, tn]
  – m1 < m2 < … < mk are indices of the selected towns
  – Then B = [m2, m3, …, mk] is the optimal solution to the sub-problem [tj, …, tn], where tj is the first town at least 10 miles to the right of tm1
• Proof by contradiction: suppose B is not the optimal solution to the sub-problem, which means there is a better solution B' to the sub-problem
  – Then A' = m1 || B' gives a better solution than A = m1 || B => A is not optimal => contradiction => B is optimal

Greedy choice property
• Claim 2: for the unit-profit restaurant location problem, there is an optimal solution that chooses t1
• Proof by contradiction: suppose that no optimal solution can be obtained by choosing t1
  – Say the first town chosen by the optimal solution S is ti, i > 1
  – Replacing ti with t1 will not violate the distance constraint, and the total profit remains the same => S' is also an optimal solution
  – Contradiction
  – Therefore claim 2 is valid

Fractional knapsack problem
• Each item has a value and a weight
• Objective: maximize value
• Constraint: knapsack has a weight limitation
• 0-1 knapsack problem: take each item or leave it
• Fractional knapsack problem: items are divisible
• Unbounded knapsack problem: unlimited supplies of each item
• Which one is easiest to solve? We can solve the fractional knapsack problem using a greedy algorithm.

Greedy algorithm for the fractional knapsack problem
• Compute the value/weight ratio for each item
• Sort items by their value/weight ratio into decreasing order
  – Call the remaining item with the highest ratio the most valuable item (MVI)
• Iteratively:
  – If adding the whole MVI does not exceed the weight limit, select it
  – Otherwise select the MVI partially, up to the weight limit

Example (weight limit: 10)
item  Weight (LB)  Value ($)  $ / LB
1     2            2          1
2     4            3          0.75
3     3            3          1
4     5            6          1.2
5     2            4          2
6     6            9          1.5

• Take item 5: 2 LB, $4
• Take item 6: 8 LB, $13
• Take 2 LB of item 4: 10 LB, $15.4
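The fractional-knapsack greedy as a runnable sketch (Python, using the item table above; ties in ratio are broken arbitrarily):

def fractional_knapsack(items, W):
    # Take items in decreasing value/weight order; the last one fractionally.
    total, remaining = 0.0, W
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:             # the whole MVI fits: take all of it
            total += v
            remaining -= w
        else:                          # take a fraction and stop
            total += v * remaining / w
            break
    return total

items = [(2, 2), (4, 3), (3, 3), (5, 6), (2, 4), (6, 9)]   # (weight, value)
print(fractional_knapsack(items, 10))  # -> 15.4 (items 5, 6, and 2 LB of 4)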
Why is the greedy algorithm for the fractional knapsack problem valid?
• Claim: the optimal solution must contain the MVI as much as possible (either up to the weight limit or until the MVI is exhausted)
• Proof by contradiction: suppose that the optimal solution does not use all available MVI (i.e., there are still w pounds of MVI left, w < W, while we choose other items)
  – We can replace w pounds of less valuable items with MVI
  – The total weight is the same, but the value is higher than the "optimal"
  – Contradiction

Graphs
• A graph G = (V, E)
  – V = set of vertices
  – E = set of edges = subset of V × V
  – Thus |E| = O(|V|^2)
• Example: vertices {1, 2, 3, 4}; edges {(1, 2), (2, 3), (1, 3), (4, 3)}

Graphs: adjacency matrix
• Example:

  A | 1  2  3  4
  1 | 0  1  1  0
  2 | 0  0  1  0
  3 | 0  0  0  0
  4 | 0  0  1  0

• How much storage does the adjacency matrix require? A: O(V^2)

Graphs: adjacency list
• Adjacency list: for each vertex v ∈ V, store a list of vertices adjacent to v
• Example:
  – Adj[1] = {2, 3}
  – Adj[2] = {3}
  – Adj[3] = {}
  – Adj[4] = {3}
• Variation: can also keep a list of edges coming into each vertex

Kruskal's algorithm: example
[Figure: graph on vertices a–h with weighted edges]
• Edges sorted by weight: c-d: 3, b-f: 5, f-e: 6, b-a: 7, b-d: 8, f-g: 9, d-e: 10, a-f: 12, b-c: 14, e-h: 15
• Scan the sorted list, accepting each edge whose endpoints are in different trees and rejecting each edge that would close a cycle: c-d, b-f, f-e, b-a, b-d, and f-g are accepted; d-e and a-f are rejected; e-h is accepted, completing the spanning tree.
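Kruskal's algorithm as a runnable sketch (Python; a union-find structure with path compression plays the role of the tree(u) != tree(v) test):

def kruskal(n, edges):
    # edges: (w, u, v) on vertices 0..n-1. Sort Theta(m log m), then a
    # near-constant-time find/union per edge: Theta(m log n) overall.
    parent = list(range(n))

    def find(x):                       # tree(x), with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                   # different trees: accept the edge
            parent[ru] = rv            # union the two trees
            mst.append((u, v, w))
    return mst

# The example graph, with a..h mapped to 0..7:
edges = [(3, 2, 3), (5, 1, 5), (6, 5, 4), (7, 1, 0), (8, 1, 3),
         (9, 5, 6), (10, 3, 4), (12, 0, 5), (14, 1, 2), (15, 4, 7)]
print(kruskal(8, edges))               # 7 edges, total weight 53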
Time complexity of Kruskal's algorithm
• Depending on implementation
• Pseudocode:
  sort all edges according to weights    // Θ(m log m) = Θ(m log n)
  T = {}; tree(v) = v for all v.
  for each edge (u, v)                   // m edges
    if tree(u) != tree(v)                // avg time spent per edge:
      T = T U {(u, v)};                  //   naive: Θ(n)
      union(tree(u), tree(v))            //   better: Θ(log n), using set union
• Overall time complexity: naive Θ(nm); better implementation Θ(m log n)

Prim's algorithm: example
[Figure: the same weighted graph on vertices a–h, run step by step. Starting from vertex c, all keys are initialized to ∞ and ChangeKey sets key[c] = 0; each ExtractMin removes the non-tree vertex with the smallest key and ChangeKey lowers its neighbors' keys whenever a cheaper connecting edge is found, until the queue is empty. Since all edge weights are distinct, the resulting minimum spanning tree is the same one Kruskal's algorithm found.]

Complete Prim's algorithm
MST-Prim(G, w, r)
  Q = V[G];
  for each u ∈ Q
    key[u] = ∞;
  key[r] = 0;
  T = {};
  while (Q not empty)                    // n vertices:
    u = ExtractMin(Q);                   //   ExtractMin is called Θ(n) times
    for each v ∈ Adj[u]
      if (v ∈ Q and w(u,v) < key[v])
        T = T U {(u, v)};                //   ChangeKey is called Θ(m) times
        ChangeKey(v, w(u,v));
• Overall running time: Θ(m log n), with Θ(log n) cost per ChangeKey
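Prim's algorithm as a runnable sketch (Python; heapq has no ChangeKey, so a cheaper key is simply pushed again and stale entries are skipped at pop time, which keeps the Θ(m log n) bound):

import heapq

def prim(adj, r=0):
    # adj[u] = [(v, w), ...] for an undirected graph on vertices 0..n-1.
    n = len(adj)
    in_tree = [False] * n
    pq = [(0, r, -1)]                  # (key, vertex, tree parent); key[r] = 0
    mst = []
    while pq:
        k, u, parent = heapq.heappop(pq)     # ExtractMin
        if in_tree[u]:
            continue                         # stale entry: skip it
        in_tree[u] = True
        if parent >= 0:
            mst.append((parent, u, k))
        for v, w in adj[u]:                  # "ChangeKey": push a cheaper key
            if not in_tree[v]:
                heapq.heappush(pq, (w, v, u))
    return mst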
Summary
• Kruskal's algorithm
  – Θ(m log n)
  – Possibly Θ(m + n log n) with counting sort
• Prim's algorithm
  – With priority queue: Θ(m log n)
    • Assumes the graph is represented by an adjacency list
  – With distance array: Θ(n^2)
    • Adjacency list or adjacency matrix
  – For sparse graphs the priority queue wins
  – For dense graphs the distance array may be better

Dijkstra's algorithm: example
[Figure: weighted graph on vertices a–i with source e, run step by step. Starting from d(e) = 0 with all other distances ∞, each ExtractMin settles the closest unsettled vertex and relaxes its edges; tentative distances tighten along the way (e.g. b: 14 → 11, a: 11 → 9, h: 20 → 19 → 18) until the final shortest distances are reached: a: 9, b: 11, c: 7, d: 5, e: 0, f: 12, g: 18, h: 18, i: 17.]

Prim's algorithm
MST-Prim(G, w, r)
  Q = V[G];
  for each u ∈ Q
    key[u] = ∞;
  key[r] = 0;
  T = {};
  while (Q not empty)
    u = ExtractMin(Q);
    for each v ∈ Adj[u]
      if (v ∈ Q and w(u,v) < key[v])
        T = T U {(u, v)};
        ChangeKey(v, w(u,v));
• Overall running time: Θ(m log n), with Θ(log n) cost per ChangeKey

Dijkstra's algorithm
Dijkstra(G, w, r)
  Q = V[G];
  for each u ∈ Q
    key[u] = ∞;
  key[r] = 0;
  T = {};
  while (Q not empty)
    u = ExtractMin(Q);
    for each v ∈ Adj[u]
      if (v ∈ Q and key[u] + w(u,v) < key[v])
        T = T U {(u, v)};
        ChangeKey(v, key[u] + w(u,v));
• The only change from Prim's algorithm is the relaxation test: key[u] + w(u,v) in place of w(u,v)
• The running time of Dijkstra's algorithm is the same as Prim's algorithm: Θ(m log n)

Good luck with your final!