1 Complexity Analysis

Given an algorithm, how do you find its time complexity? One method is to run the corresponding program and measure the time taken for inputs of various sizes. By analysing the graph you may be able to guess the complexity: linear, quadratic, exponential, etc. For example, if T(n) = n^3 then log T(n) = 3 log n, which is revealed by plotting log T(n) against log n; if T(n) = 2^n then log T(n) = n log 2, which is revealed by plotting log T(n) against n.

This empirical approach may be acceptable, but there are objections. It may provide good evidence about the complexity, but until you have a proof it is not guaranteed. Also, perhaps you are deciding amongst several algorithms: it would be time consuming to code each of them, run them, and collect timing statistics. Instead, it is better to perform a mathematical analysis first. If the algorithm contains loops, a loop analysis can be used. If the algorithm uses recursion, then recurrence equations can be set up and analysed.

2 Loop Analysis

Here is a case study to illustrate the method. Consider the problem of listing Pythagorean triples: integers x, y and z such that x^2 + y^2 = z^2, e.g. 3,4,5 and 5,12,13. (These are the lengths of the sides of a right-angled triangle.) The following simple algorithm generates all possible triples with values less than or equal to the input value, n, and tests whether the Pythagorean condition is met.

    pythag(n):
        for x ← 1 to n do
            for y ← 1 to n do
                for z ← 1 to n do
                    if x^2 + y^2 = z^2 then print x, y, z

How much time is taken? We could analyse the time taken by each machine instruction: the time to set x to 1, to test whether x ≤ n, to increment x, to compute x^2, to add x^2 and y^2, to compare, to print, etc. That would be very tedious! Instead, we focus on the number of times the inner "if" statement is executed; the total time will be proportional to that. The outer loop is executed n times. For each value of x, the y loop is executed n times. For each value of y, the z loop is executed n times. Thus, the "if" statement is executed n^3 times, so the time complexity is O(n^3). A more detailed analysis shows that the inner loop has (about) 6 "basic" operations, and the total number of basic operations, T(n), could be something like 6n^3 + 5n^2 + 7n + 9 (the exact constants are not important), which is still O(n^3).

A major improvement is to remove the z loop by testing whether x^2 + y^2 is a perfect square:

    pythag(n):
        for x ← 1 to n do
            for y ← 1 to n do
                z2 ← x^2 + y^2
                z ← round(sqrt(z2))
                if z*z = z2 then print x, y, z

The inner statements are now executed n^2 times, so the complexity of this improved algorithm is O(n^2). (Note that the square root should be rounded to the nearest integer rather than rounded down, because of the inherent lack of precision of floating point values.)

Another improvement is to avoid essentially identical output, such as both 3,4,5 and 4,3,5:

    pythag(n):
        for x ← 1 to n do
            for y ← x to n do
                z2 ← x^2 + y^2
                z ← round(sqrt(z2))
                if z*z = z2 then print x, y, z

When x = 1, y varies from 1 to n, so the inner statements are executed n times. When x = 2, y varies from 2 to n, requiring n−1 executions. When x = 3, there are n−2 executions. Finally, when x = n, there is 1 execution. Adding all these up we get n + (n−1) + (n−2) + … + 2 + 1. This is the familiar series with sum ½n(n+1) ≈ ½n^2. The complexity is still O(n^2), but the algorithm is about twice as fast as before.

3 Recursion Analysis

Merge Sort

Let's study recursive merge sort. Here we will count only the number of comparisons, T(n). (A similar analysis that counts all basic operations gives the same complexity.) If the list has n = 1 item, no comparisons are needed. Otherwise, split the list into one list of size ⌈n/2⌉ and another of size ⌊n/2⌋, sort each recursively, then merge. Merging the two sorted sublists requires at most n − 1 comparisons. The recurrence equations are:

    T(1) = 0
    T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n − 1

A detailed mathematical analysis is beyond this course, but we can make some educated guesses. Looking at the values of T(n), we (hopefully) notice some patterns when n is a power of 2:

    n      1   2   4   8    16   32    64
    T(n)   0   1   5   17   49   129   321

Notice that 321 is 1 more than a multiple of 64; in fact, every entry is 1 more than a multiple of n:

    T(2)  = 2*0 + 1
    T(4)  = 4*1 + 1
    T(8)  = 8*2 + 1
    T(16) = 16*3 + 1
    T(32) = 32*4 + 1
    T(64) = 64*5 + 1

Now, if we write n as a power of 2, these become T(2^1) = 2^1*0 + 1, T(2^2) = 2^2*1 + 1, ..., T(2^5) = 2^5*4 + 1, T(2^6) = 2^6*5 + 1, which generalises to T(2^m) = (m−1)2^m + 1. Using n = 2^m, so that m = log n, T(n) = (log n − 1)n + 1 = n log n − n + 1. (The log base is 2.) Even when n is not a power of 2, T(n) is very close to n log n − n + 1. The major term is n log n, so T(n) is O(n log n).
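If you want to check the pattern for yourself, here is a minimal Python sketch (an illustration added at this point, not part of the original notes) that tabulates the recurrence directly and compares it with the closed form at powers of 2:

    from functools import lru_cache
    from math import log2

    @lru_cache(maxsize=None)
    def T(n):
        # Comparison-count recurrence for merge sort:
        # T(1) = 0, T(n) = T(ceil(n/2)) + T(floor(n/2)) + n - 1
        if n == 1:
            return 0
        return T((n + 1) // 2) + T(n // 2) + n - 1

    for m in range(7):                    # n = 1, 2, 4, ..., 64
        n = 2 ** m
        closed = n * log2(n) - n + 1      # n log n - n + 1, exact at powers of 2
        print(n, T(n), closed)            # last line: 64 321 321.0

The printed values of T(n) reproduce the table above: 0, 1, 5, 17, 49, 129, 321.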
Integer Multiplication

The standard algorithm to multiply two large n-digit numbers requires O(n^2) time, because it performs essentially n additions of n-digit numbers. A faster method, the Karatsuba algorithm, is based on a clever recursive "divide and conquer" idea. Here is the idea in base 10: two 4-digit numbers can be multiplied by recursively multiplying 2-digit numbers:

    (100a + b)*(100c + d) = 10000 a*c + 100 [(a+b)*(c+d) − (a*c + b*d)] + b*d

Generalising, the left side involves 1 multiplication of two n-digit numbers. The right side recursively involves 3 multiplications of (n/2)-digit numbers: a*c, b*d and (a+b)*(c+d). Some additions and shifting are also required, taking O(n) time. So

    T(1) = a
    T(n) = 3T(n/2) + bn

for some constants a and b. The solution, beyond this course, is O(n^(log 3)) = O(n^1.59), with the log base 2. If you check that n^(log 3) = 3(n/2)^(log 3), which holds because 2^(log 3) = 3, you can see where the log 3 comes from.

The Karatsuba algorithm is faster than the standard one for numbers with more than about 100 digits. Instead of recursing all the way down to 1-digit numbers, you can, at some point, switch to the standard algorithm. (See Wikipedia for details.) There is a theoretically faster algorithm with complexity O(n log n log log n), which becomes faster than Karatsuba for numbers with more than about 20,000 digits. It is used in the GIMPS (Great Internet Mersenne Prime Search) software, which in August and September 2008 found two large Mersenne primes, each with more than 10 million digits, yielding a $100,000 prize!
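To make the divide-and-conquer step concrete, here is a minimal Python sketch of the Karatsuba idea (an illustration, not code from the notes; it assumes nonnegative integers and, unlike a practical version, recurses all the way down to single digits instead of switching to the standard algorithm):

    def karatsuba(x, y):
        # Sketch only: assumes x and y are nonnegative integers.
        if x < 10 or y < 10:
            return x * y                          # single-digit base case
        half = max(len(str(x)), len(str(y))) // 2
        p = 10 ** half
        a, b = divmod(x, p)                       # x = a*p + b
        c, d = divmod(y, p)                       # y = c*p + d
        ac = karatsuba(a, c)
        bd = karatsuba(b, d)
        mid = karatsuba(a + b, c + d) - ac - bd   # equals a*d + b*c
        # Three recursive multiplications instead of four:
        return ac * p * p + mid * p + bd

    print(karatsuba(1234, 5678) == 1234 * 5678)   # True

As noted above, a practical implementation would stop the recursion early, handing small operands to the standard algorithm, and would typically work in a base much larger than 10.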