Chapter One: Algorithm Analysis and Big-O Analysis

Data Structure: the way in which the data of a program are stored. A data structure answers questions such as:
- How are the data arranged in relation to each other?
- Which data are kept in memory?
- Which data are calculated when needed?
- Which data are kept in files, and how are the files arranged?

Algorithm: a well-defined sequence of computational steps to solve a problem. It typically accepts a set of values as input and produces a set of values as output.

Algorithm Analysis: determining the number of steps (instructions) and memory locations needed to solve a certain problem for any input of a particular size.

Big-O Analysis: a technique for estimating the time and space requirements of an algorithm in terms of order of magnitude.

Let us assume that we are working with a hypothetical computer that requires one microsecond (one millionth of a second) to perform one of its fundamental operations, such as comparing two numbers or moving the contents of one memory word to another. With execution speeds of this kind, it makes little sense to analyze the efficiency of those portions of a program that perform only initializations and final reporting of summary results. The key to analyzing a function's efficiency is to scrutinize its loops, especially its nested loops.
Consider the following two examples of nested loops intended to sum each of the rows of an N x N matrix, storing the row sums in the one-dimensional vector rows and the overall total in GrandTotal.

Example 1:

    GrandTotal = 0;
    for ( int k = 0 ; k < n ; ++k ) {
        rows[ k ] = 0;
        for ( int j = 0 ; j < n ; ++j ) {
            rows[ k ] = rows[ k ] + matrix[ k ][ j ];
            GrandTotal = GrandTotal + matrix[ k ][ j ];
        }
    }

Example 2:

    GrandTotal = 0;
    for ( int k = 0 ; k < n ; ++k ) {
        rows[ k ] = 0;
        for ( int j = 0 ; j < n ; ++j )
            rows[ k ] = rows[ k ] + matrix[ k ][ j ];
        GrandTotal = GrandTotal + rows[ k ];
    }

If we analyze the number of addition operations required by these two examples, it should be obvious that Example 2 is better in this respect: Example 1 requires 2N² additions, while Example 2 requires N² + N additions.

For a 1000 x 1000 matrix, Example 1 requires about two seconds to perform its additions, while Example 2 requires about one second. For a 100,000 x 100,000 matrix, Example 1 requires about six hours and Example 2 about three hours. The run times of the two algorithms are directly proportional to each other.

Example: Find the number of addition instructions in the following code segment.

    for ( int i = 1 ; i <= n/2 ; i++ ) {
        for ( int j = 1 ; j <= n ; j++ )
            a[i] = a[i] + b[i][j];
        for ( int k = 1 ; k <= n/2 ; k++ )
            c[k] = c[k] + d[i][k];
    }

Order of magnitude: a power of ten. Two numbers have the same order of magnitude if their representations in scientific notation have identical exponents designating the power of ten. Because of the phenomenal execution speeds and very large amounts of memory available on modern computers, proportionally small differences between algorithms usually have little practical impact. Such considerations have led computer scientists to devise a method of algorithm classification that makes the notion of order of magnitude more precise as it applies to time and space considerations.
This method of classification is referred to as big-O notation. Suppose there exists a function f(n), defined on the non-negative integers, such that the number of operations required by an algorithm for an input of size n is less than or equal to some constant C multiplied by f(n) for all but finitely many n. That is, the number of operations is at worst proportional to f(n) for all large values of n. Such an algorithm is said to be an O(f(n)) algorithm relative to the number of operations (or memory locations) it requires to execute.

O( f(n) ) = { g(n) : there exist constants c > 0 and n₀ > 0 such that g(n) ≤ c · f(n) for all n ≥ n₀ }

[Figure: graphical representation of O(f(n)) - g(n) lies below c · f(n) for all n ≥ n₀.]

Big-O notation: saying that an algorithm is O(f(n)) indicates that the function f(n) may be useful in characterizing how efficiently the algorithm performs for large n. For such n, we are assured that the number of operations required by the algorithm is bounded by the product of a constant and f(n). Note that any algorithm that is O(n²) is also O(n³). The constant C, known as the constant of proportionality, matters only when comparing algorithms that share the same function f(n); it makes no difference when comparing algorithms whose f(n) are of different magnitudes.

Example: use big-O analysis to characterize the two code segments from Examples 1 and 2. The algorithm of Example 1 performs 2N² additions, so it is characterized as O(N²) with 2 as its constant of proportionality. The algorithm of Example 2 performs N² + N additions. However, N² + N ≤ 1.1 N² for any N ≥ 10, so we can characterize it as O(N²) with 1.1 as its constant of proportionality (taking n₀ = 10).

Example: ⅓n² - 3n.
    ⅓n² - 3n ≤ cn²  implies  ⅓ - 3/n ≤ c
so ⅓n² - 3n is O(n²) with c = ⅓ for all n ≥ 1.

Complexity categories: the growth rates of some common complexity functions can be compared by their dominant terms.

Dominant term: the highest power of n in a polynomial. Example: in n² + 50n, the n² term dominates the 50n term since, for n ≥ 50, we have n² + 50n ≤ n² + n² = 2n². Thus n² + 50n leads to an O(n²) characterization.

In general:
- n dominates logₐn (a is often 2).
- n logₐn dominates n (a is often 2).
- n² dominates n logₐn (a is often 2).
- nᵐ dominates nᵏ, where m > k.
- aⁿ dominates nᵐ, for any a > 1 and m ≥ 0.

Example: use big-O notation to analyze the time efficiency of the following fragments of C++ code.

1)  for ( k = 1 ; k <= n/2 ; ++k ) {
        . . .
        for ( j = 1 ; j <= n*n ; ++j ) {
            . . .
        }
    }

The outer loop executes n/2 times and the inner loop n² times per outer iteration, so the inner body executes (n/2) · n² = n³/2 times: O(n³), with c = ½.

Note: for two loops with O(f1(n)) and O(f2(n)) efficiencies, the efficiency of the nesting of these two loops is O(f1(n) · f2(n)).

2)  for ( k = 1 ; k <= n/2 ; ++k ) {
        . . .
    }
    for ( j = 1 ; j <= n*n ; ++j ) {
        . . .
    }

The loops execute n/2 + n² times in total: O(n²).

Note: for two loops with O(f1(n)) and O(f2(n)) efficiencies, the efficiency of the sequencing of these two loops is O(fD(n)), where fD(n) is the dominant of the functions f1(n) and f2(n).

3)  while ( k > 1 ) {
        . . .
        k = k / 2 ;
    }

Because the loop control variable is cut in half each time through the loop, the statements inside the loop are executed about log₂k times; if k starts at n, the loop is O(log₂n).