CS 2133: Data Structures
Mathematics Review and Asymptotic Notation

Arithmetic Series Review
1 + 2 + 3 + ... + n = ?
Write the general arithmetic series forward and backward:
S_n = a + (a+d) + (a+2d) + (a+3d) + ... + (a+(n-1)d)
S_n = (a+(n-1)d) + (a+(n-2)d) + (a+(n-3)d) + ... + a
Adding term by term gives n identical terms:
2S_n = [2a+(n-1)d] + [2a+(n-1)d] + [2a+(n-1)d] + ... + [2a+(n-1)d]
S_n = (n/2)[2a + (n-1)d]
Consequently 1 + 2 + 3 + ... + n = n(n+1)/2.

Problems
Find the sum of the following:
1 + 3 + 5 + ... + 121 = ?
The first 50 terms of -3 + 3 + 9 + 15 + ...
1 + 3/2 + 2 + 5/2 + ... + 25 = ?

Geometric Series Review
1 + 2 + 4 + 8 + ... + 2^n
1 + 1/2 + 1/4 + ... + 2^(-n)
Theorem: sum_{i=0}^{n} a*r^i = (a - a*r^(n+1)) / (1 - r) for r != 1.
Proof:
S_n = a + ar + ar^2 + ... + ar^n
r*S_n = ar + ar^2 + ... + ar^n + ar^(n+1)
S_n - r*S_n = a - a*r^(n+1)
S_n = (a - a*r^(n+1)) / (1 - r)
What about the case where -1 < r < 1? (Then r^(n+1) -> 0 as n grows, so the infinite series sums to a/(1-r).)

Geometric Problems
sum_{i=0}^{infinity} (1/2)^i = 1 + 1/2 + 1/4 + 1/8 + ... = ?
What is the sum of 3 + 9/4 + 27/16 + ... ?
1/2 - 1/4 + 1/8 - 1/16 + ... = ?

Harmonic Series
H_n = 1 + 1/2 + 1/3 + 1/4 + ... + 1/n
H_n ~ ln n + 0.577...
The constant 0.577... is Euler's constant.

Just an Interesting Question
What is the optimal base to use in the representation of numbers n?
Example: with base x we have _ _ _ _ _ _ _ _ _
Each slot holds one of x values, and floor(log_x n) + 1 slots are needed.
We minimize the cost c = x * (floor(log_x n) + 1).

Logarithm Review
ln = log_e is called the natural logarithm.
lg = log_2 is called the binary logarithm.
How many bits are required to represent the number n in binary? floor(log_2 n) + 1.

Logarithm Rules
The logarithm to the base b of x, denoted log_b x, is defined to be that number y such that b^y = x.
log_b(x1 * x2) = log_b x1 + log_b x2
log_b(x1 / x2) = log_b x1 - log_b x2
log_b x^c = c * log_b x
log_b x > 0 if x > 1
log_b x = 0 if x = 1
log_b x < 0 if 0 < x < 1

Additional Rules
For all real a > 0, b > 0, c > 0, and n:
log_b a = log_c a / log_c b
log_b (1/a) = -log_b a
log_b a = 1 / log_a b
a^(log_b n) = n^(log_b a)

Asymptotic Performance
In this course, we care most about the asymptotic performance of an algorithm: how does the algorithm behave as the problem size gets very large?
- Running time
- Memory requirements
Coming up: the asymptotic performance of two search algorithms, and a formal introduction to asymptotic notation.

Input Size
Time and space complexity are generally functions of the input size (e.g., for sorting or multiplication).
How we characterize input size depends on the problem:
- Sorting: number of input items
- Multiplication: total number of bits
- Graph algorithms: number of nodes and edges
- Etc.

Running Time
Running time is the number of primitive steps that are executed.
Except for the time of executing a function call, most statements require roughly the same amount of time:
y = m*x + b
c = 5/9 * (t - 32)
z = f(x) + g(y)
We can be more exact if need be.

Analysis
Worst case: provides an upper bound on running time; an absolute guarantee.
Average case: provides the expected running time. Very useful, but treat with care: what is "average"?
- Random (equally likely) inputs
- Real-life inputs

An Example: Insertion Sort
InsertionSort(A, n) {
  for i = 2 to n {
    key = A[i]
    j = i - 1
    while (j > 0) and (A[j] > key) {
      A[j+1] = A[j]
      j = j - 1
    }
    A[j+1] = key
  }
}
Two questions about this code:
- What is the precondition for the while loop?
- How many times will the while loop execute?
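To make the pseudocode concrete, here is a minimal runnable C translation; it is a sketch, not code from the course. C arrays are 0-based, so i runs from 1 to n-1 and the inner-loop guard becomes j >= 0. A counter T for while-test evaluations is added to preview the second question; the sample inputs in main are illustrative.

/* insertion.c -- compile with: cc insertion.c
   Insertion sort from the pseudocode above, 0-based indexing.
   Returns T, the total number of while-test evaluations. */
#include <stdio.h>

long insertion_sort(int A[], int n) {
    long T = 0;                       /* counts while-test evaluations */
    for (int i = 1; i < n; i++) {
        int key = A[i];
        int j = i - 1;
        T++;                          /* first test for this i */
        while (j >= 0 && A[j] > key) {
            A[j + 1] = A[j];          /* shift the larger element right */
            j = j - 1;
            T++;                      /* re-test after each shift */
        }
        A[j + 1] = key;               /* drop key into its final slot */
    }
    return T;
}

int main(void) {
    enum { N = 8 };
    int best[N], worst[N];
    for (int i = 0; i < N; i++) {
        best[i]  = i;                 /* already sorted: best case  */
        worst[i] = N - i;             /* reverse sorted: worst case */
    }
    printf("best case:  T = %ld (about n)\n", insertion_sort(best, N));
    printf("worst case: T = %ld (about n^2/2)\n", insertion_sort(worst, N));
    return 0;
}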
Insertion Sort: Statement Effort
InsertionSort(A, n) {                          Effort
  for i = 2 to n {                             c1*n
    key = A[i]                                 c2*(n-1)
    j = i - 1                                  c3*(n-1)
    while (j > 0) and (A[j] > key) {           c4*T
      A[j+1] = A[j]                            c5*(T-(n-1))
      j = j - 1                                c6*(T-(n-1))
    }                                          0
    A[j+1] = key                               c7*(n-1)
  }                                            0
}
T = t_2 + t_3 + ... + t_n, where t_i is the number of while-expression evaluations for the i-th for-loop iteration.

Analyzing Insertion Sort
T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*T + c5*(T - (n-1)) + c6*(T - (n-1)) + c7*(n-1)
     = c8*T + c9*n + c10
What can T be?
Best case: the inner loop body is never executed, so t_i = 1 and T(n) is a linear function.
Worst case: the inner loop body is executed for all previous elements, so t_i = i and T(n) is a quadratic function:
T = 1 + 2 + 3 + 4 + ... + (n-1) + n = n(n+1)/2

Analysis Simplifications
Ignore actual and abstract statement costs.
Order of growth is the interesting measure:
- The highest-order term is what counts.
- Remember, we are doing asymptotic analysis.
- As the input size grows larger, it is the high-order term that dominates.

Upper Bound Notation
We say InsertionSort's run time is O(n^2).
Properly we should say the run time is in O(n^2).
Read O as "Big-O" (you'll also hear it as "order").
In general, a function f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0.
Formally: O(g(n)) = { f(n) : there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0 }.

Big O Example
Show using the definition that 5n + 4 is in O(n), where g(n) = n.
First we must find a c and an n0; we then need to show that f(n) <= c*g(n) for every n >= n0.
Clearly 5n + 4 <= 5n + 5 <= 6n whenever n >= 5.
Hence c = 6 and n0 = 5 satisfy the requirements.

Insertion Sort Is O(n^2)
Proof: Suppose the runtime is a*n^2 + b*n + c.
If any of a, b, and c are less than 0, replace the constant with its absolute value. Then
a*n^2 + b*n + c <= (a+b+c)*n^2 + (a+b+c)*n + (a+b+c) <= 3(a+b+c)*n^2 for n >= 1.
Let c' = 3(a+b+c) and let n0 = 1.

Question
Is InsertionSort O(n^3)?
Is InsertionSort O(n)?

Big O Fact
A polynomial of degree k is O(n^k).
Proof: Suppose f(n) = b_k*n^k + b_(k-1)*n^(k-1) + ... + b_1*n + b_0. Let a_i = |b_i|. Then for n >= 1,
f(n) <= a_k*n^k + a_(k-1)*n^(k-1) + ... + a_1*n + a_0 <= n^k * (a_k + a_(k-1) + ... + a_1 + a_0) = c*n^k.

Lower Bound Notation
We say InsertionSort's run time is Ω(n).
In general, a function f(n) is Ω(g(n)) if there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0.
Proof: Suppose the run time is a*n + b. Assume a and b are positive (what if b is negative?). Then a*n <= a*n + b, so c = a and n0 = 1 work.

Asymptotic Tight Bound
A function f(n) is Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0.
Theorem: f(n) is Θ(g(n)) iff f(n) is both O(g(n)) and Ω(g(n)).
Proof: someday.

Notation
Θ(g) is the set of all functions f such that there exist positive constants c1, c2, and n0 such that 0 <= c1*g(n) <= f(n) <= c2*g(n) for every n > n0.
[Figure: f(n) sandwiched between c1*g(n) below and c2*g(n) above.]

Growth Rate Theorems
1. The power n^α is in O(n^β) iff α <= β (with α, β > 0), and n^α is in o(n^β) iff α < β.
2. log_b n is in o(n^α) for any b and any α > 0.
3. n^α is in o(c^n) for any α > 0 and c > 1.
4. log_a n is in O(log_b n) for any a and b.
5. c^n is in O(d^n) iff c <= d, and c^n is in o(d^n) iff c < d.
6. Any constant function f(n) = c is in O(1).

Big O Relationships
1. o(f) ⊆ O(f)
2. If f is in o(g), then O(f) ⊆ o(g).
3. If f is in O(g), then o(f) ⊆ o(g).
4. If f is in O(g), then f(n) + g(n) is in O(g).
5. If f is in O(f') and g is in O(g'), then f(n)*g(n) is in O(f'(n)*g'(n)).

Theorem: log(n!) is in Θ(n log n)
Case 1: n log n is in O(log(n!)).
log(n!) = log(n*(n-1)*(n-2)* ... *3*2*1)
        = log(n*(n-1)*(n-2)* ... *(n/2)*(n/2 - 1)* ... *2*1)
        >= log((n/2)*(n/2)* ... *(n/2)*1*1* ... *1)   [shrink the first n/2 factors to n/2 and the rest to 1]
        = log((n/2)^(n/2))
        = (n/2)*log(n/2), which is in Ω(n log n); hence n log n is in O(log(n!)).
Case 2: log(n!) is in O(n log n).
log(n!) = log n + log(n-1) + log(n-2) + ... + log 2 + log 1
        < log n + log n + log n + ... + log n
        = n log n
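As a quick numeric check of this theorem, the sketch below accumulates lg(n!) = lg 1 + lg 2 + ... + lg n and compares it with n*lg n at a few sample sizes; the sample sizes are illustrative, and this check is an addition to the notes, not part of them. The ratio climbs toward 1, consistent with log(n!) being Θ(n log n).

/* logfact.c -- compile with: cc logfact.c -lm
   Compare lg(n!) = lg 1 + lg 2 + ... + lg n against n*lg n. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double lg_fact = 0.0;                       /* running value of lg(n!) */
    for (int n = 1; n <= 1000000; n++) {
        lg_fact += log2((double)n);
        if (n == 10 || n == 1000 || n == 1000000) {
            double nlgn = n * log2((double)n);
            printf("n = %7d   lg(n!) = %12.1f   n lg n = %12.1f   ratio = %.3f\n",
                   n, lg_fact, nlgn, lg_fact / nlgn);
        }
    }
    return 0;
}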
The Little o
Theorem: If log(f) is in o(log(g)) and lim_{n->infinity} g(n) = infinity, then f is in o(g).
Note the above theorem does not apply to big O, for log(n^2) is in O(log n) but n^2 is not in O(n).
Application: Show that 2^n is in o(n^n).
Taking the log of the functions, we have log(2^n) = n*log_2 2 and log(n^n) = n*log_2 n. Hence
lim_{n->infinity} (n*log 2) / (n*log n) = lim_{n->infinity} (log 2) / (log n) = 0,
which implies that 2^n is in o(n^n).

Theorem: lg n is in o(sqrt(n))
lim_{n->infinity} (lg n) / sqrt(n)
  = lim_{n->infinity} (ln n) / (ln 2 * sqrt(n))
  = (1/ln 2) * lim_{n->infinity} (1/n) / (1/(2*sqrt(n)))   [L'Hopital's rule]
  = (2/ln 2) * lim_{n->infinity} 1/sqrt(n)
  = 0

Practical Complexity
[Four charts plotting f(n) = log(n), n, n log(n), n^2, n^3, and 2^n for n from 0 to 20, with the y-axis capped at 250, 500, 1000, and 5000 respectively.]
(A small sketch that regenerates this table numerically appears at the end of these notes.)

Other Asymptotic Notations
A function f(n) is o(g(n)) if for every positive constant c there exists an n0 such that f(n) < c*g(n) for all n >= n0.
A function f(n) is ω(g(n)) if for every positive constant c there exists an n0 such that c*g(n) < f(n) for all n >= n0.
Intuitively:
- o() is like <
- O() is like <=
- ω() is like >
- Ω() is like >=
- Θ() is like =

Comparing Functions
Definition: The function f is said to dominate g if f(n)/g(n) increases without bound as n increases without bound, i.e., for any c > 0 there exists n0 > 0 such that f(n) > c*g(n) for every n > n0.
Equivalently, lim_{n->infinity} f(n)/g(n) = infinity.
Examples: n! dominates 2^n, and n^2*log_2 n dominates n^2.

Little o Complexity
o(g) is the set of all functions that are dominated by g, i.e., the set of all f such that for every c > 0 there exists n_c > 0 such that f(n) < c*g(n) for every n > n_c.

Up Next
Solving recurrences:
- Substitution method
- Master theorem
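Finally, the Practical Complexity charts are easy to regenerate numerically. This closing sketch, again an illustrative addition rather than course code, tabulates the six legend functions for n up to 20 as in the charts; reading across a row, each column eventually dominates the one before it, in line with the growth rate theorems.

/* growth.c -- compile with: cc growth.c -lm
   Tabulate the growth functions from the Practical Complexity charts. */
#include <stdio.h>
#include <math.h>

int main(void) {
    printf("%4s %8s %6s %10s %8s %10s %12s\n",
           "n", "lg n", "n", "n lg n", "n^2", "n^3", "2^n");
    for (int n = 1; n <= 20; n++) {
        double lg = log2((double)n);
        printf("%4d %8.2f %6d %10.2f %8d %10d %12.0f\n",
               n, lg, n, n * lg, n * n, n * n * n, pow(2.0, n));
    }
    return 0;
}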