Algorithm Analysis
CS 201 Fundamental Structures of Computer Science

Introduction
• Once an algorithm is given for a problem and shown to be correct, an important step is to determine how much in the way of resources (such as time or space) the algorithm will require.
• In this chapter, we shall discuss
  – How to estimate the time required for a program

Mathematical background
• The idea of the following definitions is to establish a relative order among functions.
• Given two functions, there are usually points where one function is smaller than the other.
  – So it does not make sense to claim f(N) < g(N) for all N
  – For example, compare f(N) = 1000N and g(N) = N^2: f is larger for small N, but g is larger for all large N
  – Thus, we compare their relative rates of growth

Mathematical background
• Definition 1: T(N) = O(f(N)) if there are positive constants c and n0 such that T(N) ≤ c·f(N) when N ≥ n0
  – T(N) = O(f(N)) says that the growth rate of T(N) is less than or equal to that of f(N)
  – T(N) grows at a rate no faster than f(N); thus f(N) is an upper bound on T(N)

Mathematical background
• Definition 2: T(N) = Ω(g(N)) if there are positive constants c and n0 such that T(N) ≥ c·g(N) when N ≥ n0
  – T(N) = Ω(g(N)) says that the growth rate of T(N) is greater than or equal to that of g(N)
  – T(N) grows at a rate no slower than g(N); thus g(N) is a lower bound on T(N)

Mathematical background
• Definition 3: T(N) = Θ(h(N)) if and only if T(N) = O(h(N)) and T(N) = Ω(h(N))
  – T(N) = Θ(h(N)) says that the growth rate of T(N) is equal to that of h(N)

Examples
• f(N) = N^3 grows faster than g(N) = N^2, so g(N) = O(f(N)) or, equivalently, f(N) = Ω(g(N))
• f(N) = N^2 and g(N) = 2N^2 grow at the same rate, so f(N) = O(g(N)) and f(N) = Ω(g(N))
• If g(N) = 2N^2, then g(N) = O(N^4), g(N) = O(N^3), and g(N) = O(N^2) are all technically correct, but the last one is the best answer.
• Do not say T(N) = O(2N^2) or T(N) = O(N^2 + N). The correct form is T(N) = O(N^2).
Mathematical background
• Rule 1: If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
  (a) T1(N) + T2(N) = O(f(N) + g(N))
  (b) T1(N) * T2(N) = O(f(N) * g(N))
• Rule 2: If T(N) is a polynomial of degree k, then T(N) = Θ(N^k)
• Rule 3: (log N)^k = O(N) for any constant k. This tells us that logarithms grow very slowly.

Examples
• Determine which of f(N) = N log N and g(N) = N^1.5 grows faster.
  – Dividing both by N, this is equivalent to determining which of log N and N^0.5 grows faster.
  – Squaring both, this is equivalent to determining which of log^2 N and N grows faster.
  – Since N grows faster than any power of a log, g(N) grows faster than f(N).

Examples
• Consider the problem of downloading a file over the Internet.
  – Setting up the connection: 3 seconds
  – Download speed: 1.5 Kbytes/second
• If a file is N kilobytes, the time to download is T(N) = N/1.5 + 3, i.e., T(N) = O(N).
  – A 1500K file takes 1003 seconds
  – A 750K file takes 503 seconds
• If the connection speed doubles, both times decrease, but downloading the 1500K file still takes approximately twice as long as downloading the 750K file.

Typical growth rates
  Function   Name
  c          Constant
  log N      Logarithmic
  log^2 N    Log-squared
  N          Linear
  N log N    Linearithmic
  N^2        Quadratic
  N^3        Cubic
  2^N        Exponential

Plots of various algorithms
(figure: running-time curves for the growth rates above)

Algorithm analysis
• The most important resource to analyze is generally the running time
  – We assume that simple instructions (such as addition, comparison, and assignment) take exactly one unit of time
• This is unlike the case with real computers
  – For example, I/O operations take more time than comparisons and arithmetic operations
  – Obviously, we do not make this assumption for fancy operations such as matrix inversion, list insertion, and sorting
  – We assume infinite memory
  – We do not include the time required to read the input

Algorithm analysis
• Typically, the size of the input is the main consideration
  – Worst-case performance represents a guarantee on performance for any possible input
  – Average-case performance often reflects typical behavior
  – Best-case performance is often of little interest
• Generally, the focus is on worst-case analysis
  – It provides a bound for all inputs
  – Average-case bounds are much more difficult to compute

Algorithm analysis
• Although using Big-Theta would be more precise, Big-Oh answers are typically given.
• A Big-Oh answer is not affected by the programming language
  – If a program runs much more slowly than the algorithm analysis suggests, there may be an implementation inefficiency

Algorithm analysis
• If two programs are expected to take similar times, probably the best way to decide which is faster is to code them both up and run them!
• On the other hand, we would like to eliminate bad algorithmic ideas early by algorithm analysis
  – Although different instructions can take different amounts of time (these would correspond to constants in Big-Oh notation), we ignore this difference in our analysis and try to find an upper bound for the algorithm

A simple example
• Estimate the running time of the following function

  int sum(int n){
      int partialSum = 0;
      for (int i = 0; i <= n; i++)
          partialSum += i * i * i;
      return partialSum;
  }

• The declarations and the return statement take O(1); the loop body takes O(1) per iteration and the loop runs n + 1 times, so the total running time is O(n).

General rules
• Rule 1 – for loops: The running time of a for loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations
• Rule 2 – nested loops: Analyze these inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops

  for (i = 0; i < n; i++)
      for (j = i; j < n; j++)
          k++;

• Here k++ executes n + (n-1) + ... + 1 = n(n+1)/2 times, so this fragment is O(n^2).

General rules
• Rule 3 – consecutive statements: just add.
  for (i = 0; i < n; i++)
      a[i] = 0;
  for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
          a[i] += a[j] + i + j;

• The first loop is O(n) and the nested loops are O(n^2), so the total is O(n^2).
• Rule 4 – if/else: For the following fragment, the running time is never more than the running time of the test plus the larger of the running times of S1 and S2

  if (condition)
      S1
  else
      S2

General rules
• If there are function calls, these must be analyzed first
• If there are recursive functions, be careful about their analyses. For some recursions, the analysis is trivial

  long factorial(int n){
      if (n <= 1)
          return 1;
      return n * factorial(n-1);
  }

General rules
• Let us analyze the following recursion

  long fib(int n){
      if (n <= 1)
          return 1;
      return fib(n-1) + fib(n-2);
  }

• T(N) = T(N-1) + T(N-2) + 2, where the 2 accounts for the test and the addition
• By induction, it is possible to show that fib(N) < (5/3)^N and fib(N) ≥ (3/2)^N
• So the running time grows exponentially. By using a for loop, the running time can be reduced substantially

Sequential search

  index = -1;
  for (i = 0; i < N; i++)
      if (A[i] == value){
          index = i;
          break;
      }

• Worst case: the value is the last element or is not found
  – N iterations, O(N)
• Best case: the value is the first element
  – 1 iteration, O(1)
• Average case: we may have N different cases
  – (N + 1)/2 iterations, O(N) (successful searches)
  – N iterations, O(N) (unsuccessful searches)

Searching a value in a sorted array using binary search

  int binarySearch(int *A, int N, int value){
      int low = 0, high = N - 1;
      while (low <= high){
          int mid = (low + high) / 2;
          if (A[mid] < value)
              low = mid + 1;
          else if (A[mid] > value)
              high = mid - 1;
          else
              return mid;
      }
      return -1;
  }

• Each iteration halves the search range, so binary search runs in O(log N) time

Iterative solution of raising an integer to a power

  long pow1(long x, long n){
      long result = 1;
      for (int i = 1; i <= n; i++)
          result = result * x;
      return result;
  }

Recursive solution of raising an integer to a power

  long pow2(long x, long n){
      if (n == 0)
          return 1;
      if (n == 1)
          return x;
      if (n % 2 == 0)
          return pow2(x*x, n/2);
      else
          return pow2(x*x, n/2) * x;
  }