CSE 5311 Notes 1: Mathematical Preliminaries (Last updated 3/6/16 2:46 PM) Chapter 1 - Algorithms & Computing Relationship between complexity classes, e.g. log n , n, n log n , n 2, 2 n, etc. Chapter 2 - Getting Started Loop Invariants and Design-by-Contract: http://dl.acm.org.ezproxy.uta.edu/citation.cfm?doid=2578702.2506375 RAM Model Assumed Reality Memory Access Arithmetic Word size Chapter 3 - Asymptotic Notation f(n) = O(g(n)) g(n) bounds f(n) above (by a constant factor) f(n) = (g(n)) g(n) bounds f(n) below (by a constant factor) f(n) = (g(n)) f(n) = O(g(n)) and f(n) = (g(n)) Iterated Logarithms logk n = (log n)k log(k) n = log(log(k-1) n) = log log . . . log n (log(1) n = log n) lg* n = Count the times that you can punch a log key on your calculator before value is 1 Arises for algorithms that run in “practically” or “nearly” constant time. Appendix A - Summations Review: Geometric Series (p. 1147) Harmonic Series (p. 1147) Approximation by integrals (p. 1154) 2 Chapter 4 - Recurrences SUBSTITUTION METHOD - Review 1. Guess the bound. 2. Prove using (strong) math. induction. Exercise: æ nö T ( n) = 4Tç ÷ + n è 2ø æ ö Try Oç n 3÷ and confirm by math induction è ø Assume T ( k ) £ ck 3 for k < n æ nö n3 Tç ÷ £ c è 2ø 8 æ nö T ( n ) = 4Tç ÷ + n è 2ø n3 £ 4c +n 8 c = n3 + n 2 c = cn 3 - n 3 + n 2 £ cn 3 c - n 3 + n £ 0 if n is sufficiently large 2 æ ö Improve bound to Oç n 2 ÷ and confirm è ø Assume T ( k ) £ ck 2 for k < n æ nö n2 Tç ÷ £ c è 2ø 4 æ nö n2 T ( n) = 4Tç ÷ + n £ 4c + n = cn 2 + n STUCK! è 2ø 4 3 as lower bound: Assume T ( k ) ³ ck 3 for k < n æ nö n3 Tç ÷ ³ c è 2ø 8 æ nö T ( n ) = 4Tç ÷ + n è 2ø n3 ³ 4c +n 8 c = n3 + n 2 c c = cn 3 - n 3 + n - n 3 + n ³ 0? 2 2 ³ cn 3 DID NOT PROVE!!! as lower bound: Assume T ( k ) ³ ck 2 for k < n æ nö n2 Tç ÷ ³ c è 2ø 4 æ nö n2 T ( n) = 4Tç ÷ + n ³ 4c + n = cn2 + n ³ cn 2 for 0 < c è 2ø 4 What’s going on . . . T (1) = d T (2) = 4T (1) + 2 = 4d + 2 T ( 4) = 4T (2) + 4 = 4 ( 4d + 2) + 4 = 16d + 8 + 4 = 16d + 12 T (8) = 4T ( 4) + 8 = 4 (16d + 12) + 8 = 64d + 48 + 8 = 64d + 56 T (16) = 4T (8) + 16 = 4 (64d + 56) + 16 = 256d + 224 + 16 = 256d + 240 Hypothesis : T ( n ) = dn 2 + n 2 - n = ( d + 1) n 2 - n = cn 2 - n 4 æ ö Oç n 2 ÷: è ø Assume T ( k ) £ ck 2 - k for k < n æ nö n2 n Tç ÷ £ c è 2ø 4 2 æ 2 ö æ nö n n T ( n) = 4Tç ÷ + n £ 4ççc - ÷÷ + n = cn 2 - 2n + n = cn2 - n è 2ø 2ø è 4 : [This was already proven.] Assume T ( k ) ³ ck 2 - k for k < n æ nö n2 n Tç ÷ ³ c è 2ø 4 2 Exercise: æ 2 ö æ nö n n÷ ç T ( n) = 4Tç ÷ + n ³ 4çc - ÷ + n = cn 2 - 2n + n = cn2 - n è 2ø 2ø è 4 T ( n) = T ( n ) +1 O(log n): Assume T ( k ) £ c lg k for k < n ( n ) £ c lg lg n 2 lg n lg n T ( n) = T n + 1 £ c + 1 = c lg n - c +1 2 2 £ c lg n if c ³ 2 T n =c ( ) O(loglog n) Assume T ( k ) £ c lglg k for k < n. (Note : log a log a n Î Q(logb logb n )) lg n T n £ c lglg n = c lg = c lglg n - c 2 ( ) T ( n) = T ( n ) + 1 £ c lglg n - c + 1 £ c lglg n if c ³ 1 5 Assume T ( k ) ³ c lglg k for k < n ( n ) ³ c lglg n = c lg lg2n = c lglg n - c T ( n) = T ( n ) + 1 ³ c lglg n - c + 1 T ³ c lglg n if 0 < c £ 1 Substitution Method is a general technique - See CSE 2320 Notes 08 for quicksort analysis RECURSION-TREE METHOD - Review and Prelude to Master Method Convert to summation and then evaluate () Example: Mergesort, p. 38 T ( n) = 2T n + n (case 2 for master method) 2 6 Exercise: æ nö T ( n) = 4Tç ÷ + n è 2ø T(n)Þn T(n/2)Þn/2 T(n/4)Þn/4 lg n + 1 levels T(n/8)Þn/8 T(n/16)Þn/16 . . . T(2)=T(n/2lg n-1)Þ2 T(1)Þ1 n 4n/2=2n 16n/4=4n 64n/8=8n 256n/16=16n . . . 4lg n-1•n/2 lg n-1 =2lg n-1•n 4lg n =nlg 4 =n2 Using definite geometric sum formula: lg n-1 2lg n -1 k 2 cn å 2 + cn = cn +cn 2 2 -1 k=0 = cn n -1 +cn 2 ( ) æ ö = cn2 - cn +cn 2 = Qç n 2 ÷ è ø (Case 1 of master method) t x t+1 -1 k Using å x = x -1 k=0 x ¹1 7 Exercise: (case 3 of master method) æ nö T ( n) = 2Tç ÷ + n 3 è 2ø T(n)Þn3 T(n/2)Þn3/8 T(n/4)Þn3/64 lg n + 1 levels T(n/8)Þn3/512 T(n/16)Þn3/4096 . . . T(2)=T(n/2lg n-1)Þ8 T(1)Þ1 n3 2n3/8=n3/4 4n3/64=n3/16 8n3/512=n3/64 16n3/4096=n3/256 . . . 2lg n-1(n/2lg n-1)3=n3/4lg n-1 2lg n=n Using indefinite geometric sum formula: lg n-1 1 ¥ 1 cn 3 å + cn £ cn 3 å + cn k k 4 4 k=0 k=0 = cn 3 1 + cn 1 14 4 = cn 3 + cn = O(n 3 ) 3 ¥ 1 F rom å x k = 1- x k=0 0 < x <1 Using definite geometric sum formula: () 1 lg n -1 lg n-1 1 t x t+1 -1 cn 3 å + cn = cn 3 4 + cn U sing å x k = k 1 -1 x -1 k=0 4 k=0 4 æ n -2 -1 1- n -2 4 1 ö = cn 3 + cn = cn 3 + cn = cn 3ç1- ÷ + cn 3 3 è n2 ø -3 4 4 æ ö = Qç n 3÷ è ø MASTER METHOD/THEOREM (“new”) - CLRS 4.5 x ¹1 8 () T ( n) = aT n + f ( n) b a1 b>1 Three mutually exclusive cases (proof sketched in 4.6.1): æ logb a ö f n = O ( ) ç n e ÷, e > 0 Þ T ( n) = Qæçè nlogb a ö÷ø 1. è n ø 2. æ ö æ ö f ( n) = Qç n logb a ÷ Þ T ( n ) = Qç n logb a log n÷ è ø è ø 3. (leaves dominate) (each level contributes equally) (root dominates) (Problem 4.6-3 shows that e > 0 in 3. follows from the existence of c and n0. By taking n = bk n0 logb n-logb n0 and the condition on f, a f ( n0 ) £ f ( n) may be established. Last step is c e £ logb 1 .) c () () Example: ænö T ( n) = 10Tç ÷ + n è10 ø a = 10 b = 10 f(n) = n.5 nlog10 10 = n Case 1: n.5 = O(n1-), 0 < 0.5 Þ T ( n) = Q( n) 9 Example: (recursion tree given earlier - leaves dominate) æ nö T ( n) = 4Tç ÷ + n è 2ø a = 4 b = 2 f(n) = n nlog2 4 = n2 Case 1: 2- ), 0 < 1 n = O(n æ ö Þ T ( n) = Qç n 2 ÷ è ø Example: æ nö T ( n) = 9Tç ÷ + 2n 2 è 3ø a = 9 b = 3 f(n) = 2n2 nlog3 9 = n2 Case 2: 2n2 = (n2) æ ö Þ T ( n) = Qç n 2 log n÷ è ø Recursion Tree æ ö T ( n) = 2n2 log3 n + cn 2 = Qç n 2 log n÷ è ø 10 Example: æ nö T ( n) = 2Tç ÷ + n 3 è 2ø a = 2 b = 2 f(n) = n3 nlg 2 = n Case 3: n3 = (n1+), 0 < 2 æ 3ö af n = 2ç n ÷ £ cn 3 = cf ( n), 1 £ c < 1 b 4 è 8 ø () æ ö Þ T ( n) = Qç n 3÷ è ø Example: ænö n T ( n ) = Tç ÷ + è2ø 2 a = 1 b = 2 f(n) = n/2 nlg 1 = n0 = 1 Case 3: n/2 = (n0n), 0 < 1 ænö n af = 1ç 2 ÷ = n £ c n = cf ( n ), 1 £ c < 1 b ç2÷ 4 2 2 è ø () Þ T ( n) = Q( n) From CLRS, p. 95: æ nö T ( n) = 2Tç ÷ + n lg n è 2ø a = 2 b = 2 f(n) = n lg n nlg 2 = n1 = n f ( n) = w ( n), but () for any e > 0 . . . () () af n = 2 f n = 2 n lg n = n lg n - n = cn lg n + (1- c ) n lg n - n b 2 2 2 £ cf ( n ) = cn lg n for c ³ 1 So master method (case 3) does not support T ( n) = Q( n lg n) 11 Using exercise 4.6-2 as a hint . . . (or a simple recursion tree with n = 2k ) æ ö Substitution method to show T ( n) = Qç n lg2 n÷ è ø PROBABILISTIC ANALYSIS Hiring Problem - Interview potential assistants, always hiring the best available. Input: Permutation of 1 . . . n Output: The number of sequence values that are larger than all previous sequence elements. 3 1 2 4 1 2 3 4 4 3 2 1 Worst-case = n Observation: For any k-prefix of sequence, the last element is the largest with probability ______. Summing for all k-prefixes: n ln n < å 1 = H n £ ln n + 1 k k=1 12 Hiring m-of-n Problem Observation: For any k-prefix of sequence, the last element is one of the m largest with probability ______. ______ if k m ______ otherwise Sum for all k-prefixes (expected number that are hired) m+ n å m = m + mH n - mH m k k=m+1 What data structure supports this application? Coupon Collecting (Knuth) n types of coupon. One coupon per cereal box. How many boxes of cereal must be bought (expected) to get at least one of each coupon type? Collecting the n coupons is decomposed into n steps: Step 0 = get first coupon Step 1 = get second coupon Step m = get m+1st coupon Step n - 1 = get last coupon Number of boxes for step m Let pi = probability of needing exactly i boxes (difficult) ¥ i-1 n-m m , so the expected number of boxes for coupon m + 1 is å ipi pi = n n i=1 ( ) Let qi = probability of needing at least i boxes = probability that previous i - 1 boxes are failures (much easier to use) So, pi = qi - qi+1 13 ¥ ¥ ip = å i å i(qi - qi+1) i=1 i=1 ¥ ¥ = å iqi - å iqi+1 i=1 i=1 ¥ ¥ = å iqi - å (i -1)qi i=1 i=2 ¥ ¥ ¥ = q1 + å iqi - å iqi + å qi i=2 i=2 i=2 ¥ = å qi i=1 q1 = 1 q2 = m n ( ) ( ) 2 q3 = m n k-1 qk = m n ¥ ¥ i n = Expected number of boxes for coupon m + 1 å qi = å mn = 1m = n-m 1i=1 i=0 n ( ) Summing over all steps gives n-1 å n = n 1+ 1 + 1 + ...+ 1 £ n (ln n + 1) n-i 2 3 n i=0 ( ) Analyze the following game: n sides on each die, numbered 1 through n. Roll dice one at a time. Keep a die only if its number > numbers on dice you already have. a. What is the expected number of rolls (including those not kept) to get a die numbered n? i ¥ n-1 n = 1 = =n å n n-1 nn-1) ( 1i=0 n ( ) 14 b. What number of dice do you expect (mathematically) to keep? Keep going until an n is encountered. Repeats of a number are discarded. Results in hiring problem ( Hn £ ln n +1) Suppose a gambling game involves a sequence of rolls from a standard six-sided die. A player wins $1 when the value rolled is the same as the previous roll. If a sequence has 1201 rolls, what is the expected amount paid out? Probability of a roll paying $1 = 1/6. 1200/6 = $200 Suppose a gambling game involves a sequence of rolls from a standard six-sided die. A player wins $1 when the value rolled is larger than the previous roll. If a sequence has 1201 rolls, what is the expected amount paid out? 6 Payout for next roll = 1 å Expected payout after roll of i 6 i=1 = 1 5+ 4 + 3+ 2+ 1+0 6 6 6 6 6 6 = 15 = 5 36 12 ( ) 1200*5/12 = $500 Suppose a gambling game involves a sequence of rolls from a standard six-sided die. A player wins k dollars when the value k rolled is smaller than the previous roll. If a sequence has 601 rolls, what is the expected amount paid out? ( ) 1 1· 5 + 2 · 4 + 3· 3 + 4 · 2 + 5 · 1 + 6 · 0 = 5+8+9+8+5 = 35 6 6 6 6 6 6 6 36 36 600*35/36 = $583.33 Suppose a gambling game involves a sequence of rolls from a standard six-sided die. What is the expected number of rolls until the first pair of consecutive sixes appears? æ ö i ¥ ç ÷ 6-1 1 6 Expected rolls to get first six = 6 = å = = ç 6 6-1 66-1 ( )÷ 1èi=0 ø 6 ( ) Probability that next roll is not a six = 5 6 15 k ¥ 5 Expected rolls for consecutive sixes = (6 + 1) å = (6 + 1) 1 = 42 6 1- 5 k=0 6 () Also, see: http://dl.acm.org.ezproxy.uta.edu/citation.cfm?doid=2408776.2408800 http://dl.acm.org.ezproxy.uta.edu/citation.cfm?doid=2428556.2428578 and along with the book by Grinstead & Snell, especially the first four chapters. GENERATING RANDOM PERMUTATIONS PERMUTE-BY-SORTING (p. 125, skim) Generates randoms in 1 . . . n3 and then sorts to get permutation in (n log n) time. Can use radix/counting sort (CSE 2320 Notes 8) to perform in (n) time. RANDOMIZE-IN-PLACE Array A must initially contain a permutation. Could simply be identity permutation: A[i] = i. for i=1 to n swap A[i] and A[RANDOM(i,n)] Code is equivalent to reaching in a bag and choosing a number to “lock” into each slot. Uniform - all n! permutations are equally likely to occur. Problem 5.3-3 PERMUTE-WITH-ALL for i=1 to n swap A[i] and A[RANDOM(1,n)] Produces n n outcomes, but n! does not divide into n n evenly. Not uniform - some permutations are produced more often than others. Assume n=3 and A initially contains identity permutation. RANDOM choices that give each permutation. 1 1 2 2 3 3 2 3 1 3 1 2 3: 2: 3: 1: 2: 1: 1 1 1 1 1 1 2 2 1 1 1 2 3 2 3 2 1 1 1 1 2 1 2 2 3 3 2 3 2 1 2 3 3 1 1 1 2 2 2 2 3 3 1 1 3 2 2 2 3 2 2 2 2 3 3 2 3 2 3 3 2 3 1 3 3 3 1 1 2 3 3 2 3 1 1 3 3 1 3 1 3 16 RANDOMIZED ALGORITHMS Las Vegas Output is always correct. Time varies depending on random choices (or randomness in input). Challenge: Determine expected time. Classic Examples: Randomized (pivot) quicksort takes expected time in Q( n log n) . Universal hashing This Semester: Treaps Perfect Hashing Rabin-Karp Text Search Asides: Randomized Binary Search - About 50% more probes List Ranking - Shared Memory and Distributed Versions (http://ranger.uta.edu/~weems/NOTES4351/09notes.pdf) Ethernet: http://en.wikipedia.org/wiki/Exponential_backoff Valiant-Brebner permutation routing Monte Carlo Correct solution with some (large) probability. Use repetition to improve odds. Usually has one-sided error - one of the outcomes can be wrong Example - testing primality without factoring To check N for being prime: Randomly generate some a with 1 < a < N) If N moda = 0, then report composite else report prime One-sided error - prime could be wrong. (Several observations from number theory are needed to make this robust for avoiding Carmichael numbers) Concept: Repeated application of Monte Carlo technique can lead to: Improved reliability Las Vegas algorithm that gives correct result “with high probability”