Mark Allen Weiss: Data Structures and Algorithm Analysis in Java
Chapter 2: Algorithm Analysis

Big-Oh and Other Notations in Algorithm Analysis
• Classifying Functions by Their Asymptotic Growth
• Theta, Little oh, Little omega
• Big Oh, Big Omega
• Rules to manipulate Big-Oh expressions
• Typical Growth Rates

Classifying Functions by Their Asymptotic Growth
Asymptotic growth: the rate of growth of a function.
Given a particular differentiable function f(n), all other differentiable functions fall into three classes:
• growing at the same rate
• growing faster
• growing slower

Theta
f(n) and g(n) have the same rate of growth if
lim( f(n) / g(n) ) = c, 0 < c < ∞, n → ∞
Notation: f(n) = Θ( g(n) ), pronounced "theta"

Little oh
f(n) grows slower than g(n) (or g(n) grows faster than f(n)) if
lim( f(n) / g(n) ) = 0, n → ∞
Notation: f(n) = o( g(n) ), pronounced "little oh"

Little omega
f(n) grows faster than g(n) (or g(n) grows slower than f(n)) if
lim( f(n) / g(n) ) = ∞, n → ∞
Notation: f(n) = ω( g(n) ), pronounced "little omega"

Little omega and Little oh
If g(n) = o( f(n) ), then f(n) = ω( g(n) ).
Examples: compare n and n²
lim( n / n² ) = 0, n → ∞, so n = o(n²)
lim( n² / n ) = ∞, n → ∞, so n² = ω(n)

Theta: Relation of Equivalence
R: "having the same rate of growth" is an equivalence relation. It partitions the set of all differentiable functions into equivalence classes. Functions in the same class are equivalent with respect to their growth.

Algorithms with Same Complexity
Two algorithms have the same complexity if the functions representing their numbers of operations have the same rate of growth. Among all functions with the same rate of growth, we choose the simplest one to represent the complexity.

Examples
Compare n and (n+1)/2:
lim( n / ((n+1)/2) ) = 2, same rate of growth
(n+1)/2 = Θ(n), the rate of growth of a linear function

Examples
Compare n² and n² + 6n:
lim( n² / (n² + 6n) ) = 1, same rate of growth.
n² + 6n = Θ(n²), the rate of growth of a quadratic function

Examples
Compare log n and log n²:
lim( log n / log n² ) = 1/2 (since log n² = 2 log n), same rate of growth
log n² = Θ(log n), logarithmic rate of growth

Examples
Θ(n³): n³, 5n³ + 4n, 105n³ + 4n² + 6n
Θ(n²): n², 5n² + 4n + 6, n² + 5
Θ(log n): log n, log n², log(n + n³)

Comparing Functions
• same rate of growth: g(n) = Θ(f(n))
• different rate of growth: either
  g(n) = o(f(n)): g(n) grows slower than f(n), and hence f(n) = ω(g(n)), or
  g(n) = ω(f(n)): g(n) grows faster than f(n), and hence f(n) = o(g(n))

The Big-Oh Notation
f(n) = O(g(n)) if f(n) grows at the same rate as g(n) or slower, i.e. f(n) = Θ(g(n)) or f(n) = o(g(n)).
Example: n + 5 = Θ(n) = O(n) = O(n²) = O(n³) = O(n⁵)
The closest estimate is n + 5 = Θ(n). The general practice is to use the Big-Oh notation: n + 5 = O(n).

The Big-Omega Notation
The inverse of Big-Oh is Ω: if g(n) = O(f(n)), then f(n) = Ω(g(n)).
f(n) = Ω(g(n)) means that f(n) grows faster than g(n) or at the same rate.

Rules to manipulate Big-Oh expressions
Rule 1:
a. If T1(N) = O(f(N)) and T2(N) = O(g(N)), then T1(N) + T2(N) = max( O(f(N)), O(g(N)) ), i.e. O( max(f(N), g(N)) ).
b. If T1(N) = O(f(N)) and T2(N) = O(g(N)), then T1(N) * T2(N) = O( f(N) * g(N) ).
Rule 2: If T(N) is a polynomial of degree k, then T(N) = Θ(Nᵏ).
Rule 3: logᵏ N = O(N) for any constant k.

Examples
n² + n = O(n²) (we disregard any lower-order term)
n log n = O(n log n)
n² + n log n = O(n²)

Typical Growth Rates
c          constant, we write O(1)
log N      logarithmic
log² N     log-squared
N          linear
N log N
N²         quadratic
N³         cubic
2ᴺ         exponential
N!         factorial

Exercise: True or False?
N² = O(N²)
2N = O(N²)
N = O(N²)
N² = O(N)
2N = O(N)
N = O(N)

Exercise: True or False?
N² = Θ(N²)
2N = Θ(N²)
N = Θ(N²)
N² = Θ(N)
2N = Θ(N)
N = Θ(N)

Running Time Calculations
The work done by an algorithm, i.e. its complexity, is determined by the number of basic operations necessary to solve the problem.
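The sketch below makes "number of basic operations" concrete. It is a minimal Java program that counts how many times the innermost statement of several loop shapes executes. Two things are assumptions for illustration: treating that innermost statement as the basic operation, and the class and method names (`OperationCount`, `singleLoop`, and so on), which are hypothetical.

```java
// Counts executions of the innermost statement (taken here as the "basic
// operation") for several loop shapes. All names are hypothetical.
public class OperationCount {

    // Single loop: the body runs n times, so F(n) = n, which is O(n).
    static long singleLoop(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++) ops++;
        return ops;
    }

    // Two nested loops of size n: F(n) = n * n, which is O(n^2).
    static long nestedLoops(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) ops++;
        return ops;
    }

    // Inner loop bounded by the outer index: F(n) = n(n-1)/2, still O(n^2).
    static long triangularLoops(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < i; j++) ops++;
        return ops;
    }

    // Inner loop of size n*n: F(n) = n^3, which is O(n^3).
    static long cubicLoops(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (long j = 0; j < (long) n * n; j++) ops++;
        return ops;
    }

    public static void main(String[] args) {
        // Doubling n shows how each count grows with the input size.
        for (int n = 10; n <= 40; n *= 2) {
            System.out.printf("n=%d single=%d nested=%d triangular=%d cubic=%d%n",
                    n, singleLoop(n), nestedLoops(n), triangularLoops(n), cubicLoops(n));
        }
    }
}
```

Doubling n doubles the single-loop count. It roughly quadruples the two quadratic counts and multiplies the cubic count by eight, which matches the O(n), O(n²) and O(n³) bounds.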
The Task
Determine how the number of operations depends on the size of the input:
N: size of the input
F(N): number of operations

Basic operations in an algorithm
Problem: Find x in an array
Operation: Comparison of x with an entry in the array
Size of input: The number of elements in the array

Basic operations (cont.)
Problem: Multiplying two matrices with real entries
Operation: Multiplication of two real numbers
Size of input: The dimensions of the matrices

Basic operations (cont.)
Problem: Sort an array of numbers
Operation: Comparison of two array entries, plus moving elements in the array
Size of input: The number of elements in the array

Counting the number of operations
A. for loops: O(n)
The running time of a for loop is at most the running time of the statements inside the loop times the number of iterations.

for loops
sum = 0;
for( i = 0; i < n; i++ )
    sum = sum + i;
The running time is O(n).

Counting the number of operations
B. Nested loops
The total running time is the running time of the inside statements times the product of the sizes of all the loops.

Nested loops
sum = 0;
for( i = 0; i < n; i++ )
    for( j = 0; j < n; j++ )
        sum++;
The running time is O(n²).

Counting the number of operations
C. Consecutive program fragments
Total running time: the maximum of the running times of the individual fragments.

Consecutive program fragments
sum = 0;                          // O(n)
for( i = 0; i < n; i++ )
    sum = sum + i;

sum = 0;                          // O(n²)
for( i = 0; i < n; i++ )
    for( j = 0; j < 2*n; j++ )
        sum++;
The maximum is O(n²).

Counting the number of operations
D. If statement
if( C )
    S1;
else
    S2;
The running time is at most the running time of the test C plus the maximum of the running times of S1 and S2.

EXAMPLES: what is the number of operations?
sum = 0;
for( i = 0; i < n; i++ )
    for( j = 0; j < n*n; j++ )
        sum++;

EXAMPLES: what is the number of operations?
sum = 0;
for( i = 0; i < n; i++ )
    for( j = 0; j < i; j++ )
        sum++;

EXAMPLES: what is the number of operations?
for( j = 0; j < n*n; j++ )
    compute_val( j );
The complexity of compute_val(x) is given to be O(n log n).

Search in an unordered array of elements
for( i = 0; i < n; i++ )
    if( a[ i ] == x )
        return 1;
return -1;

Search in a table n x m
for( i = 0; i < n; i++ )
    for( j = 0; j < m; j++ )
        if( a[ i ][ j ] == x )
            return 1;
return -1;

Max Subsequence Problem
• Given a sequence of integers A1, A2, …, An, find the maximum possible sum of a contiguous subsequence Ai, …, Aj.
• Numbers can be negative. You want the contiguous chunk with the largest sum.
• Example: -2, 11, -4, 13, -5, -2
  The answer is 20 (subsequence A2 through A4).
• We will discuss 4 different algorithms, with time complexities O(n³), O(n²), O(n log n), and O(n).
• With n = 10⁶, algorithm 1 may take more than 10 years; algorithm 4 will take a fraction of a second!

Algorithm 1 for Max Subsequence Sum
• Given A1, …, An, find the maximum value of Ai + Ai+1 + ··· + Aj (the answer is 0 if the max value is negative).

int maxSum = 0;                                  // O(1)
for( int i = 0; i < a.size( ); i++ )
    for( int j = i; j < a.size( ); j++ )
    {
        int thisSum = 0;                         // O(1)
        for( int k = i; k <= j; k++ )
            thisSum += a[ k ];                   // O(1)
        if( thisSum > maxSum )
            maxSum = thisSum;                    // O(1)
    }
return maxSum;

Time complexity: O(n³). The innermost loop runs j − i + 1 times, so
$$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} O(j-i+1) = O\left(\frac{n(n+1)(n+2)}{6}\right) = O(n^3)$$

Algorithm 2
• Idea: given the sum from i to j-1, we can compute the sum from i to j in constant time.
• This eliminates one nested loop and reduces the running time to O(n²).

int maxSum = 0;
for( int i = 0; i < a.size( ); i++ )
{
    int thisSum = 0;
    for( int j = i; j < a.size( ); j++ )
    {
        thisSum += a[ j ];
        if( thisSum > maxSum )
            maxSum = thisSum;
    }
}
return maxSum;

Algorithm 3
• This algorithm uses the divide-and-conquer paradigm.
• Suppose we split the input sequence at the midpoint.
• The max subsequence is entirely in the left half, entirely in the right half, or it straddles the midpoint.
• Example:
  left half | right half
  4 -3 5 -2 | -1 2 6 -2
• Max in left is 6 (A1 through A3); max in right is 8 (A6 through A7).
• But the straddling max is 11 (A1 through A7).

Algorithm 3 (cont.)
• Example:
  left half | right half
  4 -3 5 -2 | -1 2 6 -2
• The max subsequences in each half are found by recursion.
• How do we find the straddling max subsequence?
• Key observation:
  – The left half of the straddling sequence is the max subsequence ending with -2 (the last element of the left half).
  – The right half is the max subsequence beginning with -1 (the first element of the right half).
• A linear scan lets us compute these in O(n) time.

Algorithm 3: Analysis
• The divide-and-conquer algorithm is best analyzed through a recurrence:
  T(1) = 1
  T(n) = 2T(n/2) + O(n)
• This recurrence solves to T(n) = O(n log n).

Algorithm 4
Example input: 2, 3, -2, 1, -5, 4, 1, -3, 4, -1, 2

int maxSum = 0, thisSum = 0;
for( int j = 0; j < a.size( ); j++ )
{
    thisSum += a[ j ];
    if( thisSum > maxSum )
        maxSum = thisSum;
    else if( thisSum < 0 )
        thisSum = 0;
}
return maxSum;

• Time complexity: clearly O(n).
• But why does it work? That requires a proof of correctness.

Proof of Correctness
• The max subsequence cannot start or end at a negative Ai.
• More generally, the max subsequence cannot have a prefix with a negative sum.
  Ex: -2 11 -4 13 -5 -2
• Thus, if we ever find that Ai through Aj sums to < 0, we can advance i to j+1.
  – Proof. Suppose j is the first index after i at which the sum becomes < 0.
  – The max subsequence cannot do better by starting at any p between i and j. Ai through Ap-1 has a nonnegative sum, so starting at i would have been at least as good.

Algorithm 4 (cont.)
• The algorithm resets thisSum to 0 whenever the running prefix sum becomes < 0. Otherwise, it extends the current sum and updates maxSum, all in one pass.

Why Efficient Algorithms Matter
• Suppose N = 10⁶.
• A PC can read/process N records in 1 sec.
• But if some algorithm does N*N computations, then it takes 10⁶ seconds ≈ 11.6 days!
• 100-city Traveling Salesman Problem:
  – A supercomputer checking 100 billion tours/sec still requires more than 10¹⁰⁰ years!
• Fast factoring algorithms can break encryption schemes. Algorithms research determines what key length is safe (> 100 digits).

How to Measure Algorithm Performance
• What metric should be used to judge algorithms?
  – Length of the program (lines of code)
  – Ease of programming (bugs, maintenance)
  – Memory required
  – Running time
• Running time is the dominant standard:
  – Quantifiable and easy to compare
  – Often the critical bottleneck

Logarithms in Running Time
• Binary search
• Euclid's algorithm
• Exponentials
• Rules to count operations

Divide-and-conquer algorithms
Successively reducing the problem size by a factor of two, with constant work per step, requires O(log N) operations.

Why log N?
A complete binary tree with N leaves has log N levels. Each level in the divide-and-conquer algorithm corresponds to one operation. Hence the number of operations is O(log N).

Example: 8 leaves, 3 levels
[Figure: a complete binary tree with 8 leaves and 3 levels]

Binary Search
Solution 1: Scan all elements from left to right, each time comparing with X: O(N) operations.

Binary Search
Solution 2: O(log N), for a sorted list
• Find the middle element Amid in the list and compare it with X.
• If they are equal, stop.
• If X < Amid, consider the left part.
• If X > Amid, consider the right part.
• Repeat until X is found or the remaining part of the list is empty.

Euclid's algorithm
Finding the greatest common divisor (GCD):
for M > N, GCD(M, N) = GCD(N, M % N)

GCD and recursion
Recursion:
If M % N = 0, return N
Else return GCD(N, M % N)
The answer is the last nonzero remainder.
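The sketch below gives minimal Java versions of the two O(log N) algorithms just described: binary search on a sorted array, and the recursive GCD.

The method names are hypothetical, and the code differs from the slides in two small ways:
• The binary search returns the index of X (or -1) rather than just stopping.
• The GCD recursion uses gcd(m, 0) = m as its base case. This gives the same result as "if M % N = 0, return N".

```java
// Sketches of binary search and recursive Euclid; names are hypothetical.
public class LogAlgorithms {

    // Binary search on a SORTED array: returns an index of x, or -1 if absent.
    // Each iteration halves the range [low, high], so O(log N) iterations run.
    static int binarySearch(int[] a, int x) {
        int low = 0, high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids int overflow of low + high
            if (a[mid] < x) {
                low = mid + 1;        // X > Amid: continue in the right part
            } else if (a[mid] > x) {
                high = mid - 1;       // X < Amid: continue in the left part
            } else {
                return mid;           // equal: stop
            }
        }
        return -1;                    // remaining part is empty: X not present
    }

    // Recursive Euclid: gcd(m, n) = gcd(n, m % n), with gcd(m, 0) = m.
    // If m < n, the first call simply swaps the arguments.
    static long gcd(long m, long n) {
        return n == 0 ? m : gcd(n, m % n);
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 4, 7, 9, 12, 15, 20};
        System.out.println(binarySearch(sorted, 12)); // 5
        System.out.println(binarySearch(sorted, 8));  // -1
        System.out.println(gcd(24, 15));              // 3
    }
}
```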
Example: GCD(24, 15)
M     N     rem
24    15    9
15    9     6
9     6     3
6     3     0
3     0
GCD(24, 15) = 3

Euclid's Algorithm (non-recursive implementation)
long gcd( long m, long n )
{
    long rem;
    while( n != 0 )
    {
        rem = m % n;
        m = n;
        n = rem;
    }
    return m;
}

Why O(log N)
For M ≥ N, M % N < M / 2:
• If N ≤ M/2, the remainder is less than N ≤ M/2.
• If N > M/2, the remainder is M − N < M/2.
After the 1st iteration, N becomes the first argument. After the 2nd iteration, the remainder becomes the first argument, so the first argument has been reduced by more than a factor of two. Since this happens every two iterations, there are at most about 2 log N iterations: O(log N).

Computing Xᴺ
$$X^N = \begin{cases} (X^2)^{N/2}, & N \text{ even} \\ X \cdot (X^2)^{\lfloor N/2 \rfloor}, & N \text{ odd} \end{cases}$$

long pow( long x, int n )
{
    if( n == 0 )
        return 1;
    if( n % 2 == 0 )
        return pow( x * x, n / 2 );
    else
        return x * pow( x * x, n / 2 );
}

Why O(log N)
Each call halves n, so there are about log N calls. Each call does at most two multiplications (x * x, plus one more when n is odd). The total is therefore at most 2 log N multiplications: O(log N).

Another recursion for Xᴺ
Another recursive definition reduces the power by just 1:
Xᴺ = X * Xᴺ⁻¹
Here the number of multiplications is N − 1, i.e. O(N), so this algorithm is less efficient than the divide-and-conquer algorithm.

How to count operations
• single statements (not function calls): constant, O(1)
• sequential fragments: the maximum of the operation counts of the individual fragments

How to count operations
• single loop running up to N, with single statements in its body: O(N)
• single loop running up to N, with O(f(N)) operations in its body: O( N * f(N) )

How to count operations
• two nested loops, each running up to N, with single statements: O(N²)
• divide-and-conquer algorithms with input size N:
  – O(log N) when each step continues with only one half (e.g. binary search)
  – O(N log N) if each level also requires processing all N elements (e.g. Algorithm 3)

Example: What is the probability that two numbers are relatively prime?
tot = 0; rel = 0;
for( i = 0; i <= n; i++ )
    for( j = i+1; j <= n; j++ )
    {
        tot++;
        if( gcd( i, j ) == 1 )
            rel++;
    }
return ( (double) rel / tot );    // cast avoids integer division
Running time = ?
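To compare the two recursions for Xᴺ, the small Java sketch below counts multiplications. It follows the divide-and-conquer definition above, with two assumptions:
• It adds a base case for N = 1, as Weiss's own `pow` does, so that no x * x is wasted at the bottom of the recursion.
• It assumes N ≥ 0 and ignores `long` overflow, since only the counts matter here.
The static counter `mults` and the method names are hypothetical.

```java
// Counts multiplications for the divide-and-conquer power and for the
// linear recursion X^N = X * X^(N-1). Names and the counter are hypothetical.
public class PowerCount {

    static long mults = 0; // multiplications since the last reset

    // Divide-and-conquer: (x^2)^(n/2) if n is even, x * (x^2)^(n/2) if n is odd.
    static long fastPow(long x, int n) {
        if (n == 0) return 1;
        if (n == 1) return x;
        mults++;                             // the multiplication x * x
        long half = fastPow(x * x, n / 2);   // n / 2 is integer division
        if (n % 2 == 0) return half;
        mults++;                             // one extra multiplication for odd n
        return x * half;
    }

    // Linear recursion: x^n = x * x^(n-1), which uses n - 1 multiplications for n >= 1.
    static long slowPow(long x, int n) {
        if (n == 0) return 1;
        if (n == 1) return x;
        mults++;
        return x * slowPow(x, n - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            mults = 0;
            fastPow(1, n);
            long fast = mults;
            mults = 0;
            slowPow(1, n);
            long slow = mults;
            System.out.printf("N=%d: fastPow %d multiplications, slowPow %d%n", n, fast, slow);
        }
    }
}
```

With this base case, the divide-and-conquer version uses at most 2⌊log₂ N⌋ multiplications, while the linear recursion uses N − 1. For N = 10 that is 4 versus 9.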