CHAPTER 2  Analysis of Algorithms

2.1 Algorithm Efficiency

One of the quality characteristics discussed in Section 1.1 is the efficient use of resources. One of the most important resources is CPU time. The efficiency of an algorithm we use to accomplish a particular task is a major factor that determines how fast a program executes. Although the techniques that we will discuss here may also be used to analyze an algorithm relative to the amount of memory it uses, we will focus our discussion on the efficient use of processing time.

The analysis of algorithms is a fundamental computer science topic and involves a variety of techniques and concepts. It is a primary theme that we return to throughout this book. This chapter introduces the issues related to algorithm analysis and lays the groundwork for using analysis techniques.

KEY CONCEPT: Algorithm analysis is a fundamental computer science topic.

Let's start with an everyday example: washing dishes by hand. If we assume that washing a dish takes 30 seconds and drying a dish takes an additional 30 seconds, then we can see quite easily that it would take n minutes to wash and dry n dishes. This computation could be expressed as follows:

    Time(n dishes) = n * (30 seconds wash time + 30 seconds dry time)
                   = 60n seconds

or, written more formally:

    f(x) = 30x + 30x
    f(x) = 60x

On the other hand, suppose we were careless while washing the dishes and splashed too much water around. Suppose each time we washed a dish, we had to dry not only that dish but also all of the dishes we had washed before it. It would still take 30 seconds to wash each dish, but now it would take 30 seconds to dry the last dish (once), 2 * 30 or 60 seconds to dry the second-to-last dish (twice), 3 * 30 or 90 seconds to dry the third-to-last dish (three times), and so on.
This computation could be expressed as follows:

    Time(n dishes) = n * (30 seconds wash time) + sum from i = 1 to n of (i * 30)

Using the formula for the arithmetic series, sum from i = 1 to n of i = n(n + 1)/2, the function then becomes

    Time(n dishes) = 30n + 30 * n(n + 1)/2
                   = 15n^2 + 45n seconds

If there were 30 dishes to wash, the first approach would take 30 minutes, whereas the second (careless) approach would take 247.5 minutes. The more dishes we wash, the worse that discrepancy becomes. For example, if there were 300 dishes to wash, the first approach would take 300 minutes or 5 hours, whereas the second approach would take 22,725 minutes or roughly 379 hours!

2.2 Growth Functions and Big-Oh Notation

For every algorithm we want to analyze, we need to define the size of the problem. For our dishwashing example, the size of the problem is the number of dishes to be washed and dried. We also must determine the value that represents efficient use of time or space. For time considerations, we often pick an appropriate processing step that we'd like to minimize, such as our goal to minimize the number of times a dish has to be washed and dried. The overall amount of time spent on the task is directly related to how many times we have to perform that task. The algorithm's efficiency can be defined in terms of the problem size and the processing step.

Consider an algorithm that sorts a list of numbers into increasing order. One natural way to express the size of the problem would be the number of values to be sorted. The processing step we are trying to optimize could be expressed as the number of comparisons we have to make for the algorithm to put the values in order. The more comparisons we make, the more CPU time is used.

KEY CONCEPT: A growth function shows time or space utilization relative to the problem size.

A growth function shows the relationship between the size of the problem (n) and the value we hope to optimize.
This function represents the time complexity or space complexity of the algorithm. The growth function for our second dishwashing algorithm is

    t(n) = 15n^2 + 45n

However, it is not typically necessary to know the exact growth function for an algorithm. Instead, we are mainly interested in the asymptotic complexity of an algorithm. That is, we want to focus on the general nature of the function as n increases. This characteristic is based on the dominant term of the expression: the term that increases most quickly as n increases. As n gets very large, the value of the dishwashing growth function is dominated by the n^2 term because the n^2 term grows much faster than the n term. The constants, in this case 15 and 45, and the secondary term, in this case 45n, quickly become irrelevant as n increases. That is to say, the value of the n^2 term dominates the growth in the value of the expression.

The table in Figure 2.1 shows how the two terms and the value of the expression grow. As you can see from the table, as n gets larger, the 15n^2 term dominates the value of the expression. It is important to note that the 45n term is larger for very small values of n. Saying that a term is the dominant term as n gets large does not mean that it is larger than the other terms for all values of n.

    Number of dishes (n)               15n^2            45n          15n^2 + 45n
                       1                  15             45                   60
                       2                  60             90                  150
                       5                 375            225                  600
                      10               1,500            450                1,950
                     100             150,000          4,500              154,500
                   1,000          15,000,000         45,000           15,045,000
                  10,000       1,500,000,000        450,000        1,500,450,000
                 100,000     150,000,000,000      4,500,000      150,004,500,000
               1,000,000  15,000,000,000,000     45,000,000   15,000,045,000,000
              10,000,000 1,500,000,000,000,000  450,000,000 1,500,000,450,000,000

    FIGURE 2.1  Comparison of terms in the growth function

The asymptotic complexity is called the order of the algorithm. Thus, our second dishwashing algorithm is said to have order n^2 time complexity, written O(n^2).
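The figures above are easy to verify directly. The sketch below (the class and method names are our own, not from the text) simulates the careless scheme step by step and compares both schemes against the derivation:

```java
public class DishTime {
    // Careful scheme: each dish is washed (30 s) and dried (30 s) exactly once.
    public static long carefulSeconds(int n) {
        return 60L * n;                          // t(n) = 60n
    }

    // Careless scheme: after washing dish i, all i dishes washed so far are dried.
    public static long carelessSeconds(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += 30;                         // wash dish i
            total += 30L * i;                    // dry dishes 1 through i
        }
        return total;                            // matches 15n^2 + 45n
    }

    public static void main(String[] args) {
        System.out.println(carefulSeconds(30) / 60.0);   // 30.0 minutes
        System.out.println(carelessSeconds(30) / 60.0);  // 247.5 minutes
        System.out.println(carelessSeconds(10));         // 1950 seconds, the n = 10 row of Figure 2.1
    }
}
```

For every n, the simulated total equals 15n^2 + 45n exactly, confirming the arithmetic-series derivation.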
Our first, more efficient dishwashing example, with growth function t(n) = 60n, would have order n, written O(n). Thus, the reason for the difference between our O(n) original algorithm and our O(n^2) sloppy algorithm is the fact that each dish has to be dried multiple times.

This notation is referred to as O() or Big-Oh notation. A growth function that executes in constant time regardless of the size of the problem is said to be O(1). In general, we are only concerned with executable statements in a program or algorithm in determining its growth function and efficiency. Keep in mind, however, that some declarations may include initializations, and some of these may be complex enough to factor into the efficiency of an algorithm.

KEY CONCEPT: The order of an algorithm is found by eliminating constants and all but the dominant term in the algorithm's growth function.

As an example, assignment statements and if statements that are only executed once regardless of the size of the problem are O(1). Therefore, it does not matter how many of those you string together; the result is still O(1). Loops and method calls may result in higher-order growth functions because they may cause a statement or series of statements to be executed more than once based on the size of the problem. We will discuss these separately in later sections of this chapter. Figure 2.2 shows several growth functions and their asymptotic complexity.

More formally, saying that the growth function t(n) = 15n^2 + 45n is O(n^2) means that there exists a constant m and some value of n (n0) such that t(n) <= m * n^2 for all n > n0. Another way of stating this is that the order of an algorithm provides an upper bound to its growth function. It is also important to note that there are other related notations, such as omega (Ω), which refers to a function that provides a lower bound, and theta (Θ), which refers to a function that provides both an upper and a lower bound. We will focus our discussion on order.

KEY CONCEPT: The order of an algorithm provides an upper bound to the algorithm's growth function.

    Growth Function               Order         Label
    t(n) = 17                     O(1)          constant
    t(n) = 3 log n                O(log n)      logarithmic
    t(n) = 20n - 4                O(n)          linear
    t(n) = 12n log n + 100n       O(n log n)    n log n
    t(n) = 3n^2 + 5n - 2          O(n^2)        quadratic
    t(n) = 8n^3 + 3n^2            O(n^3)        cubic
    t(n) = 2^n + 18n^2 + 3n       O(2^n)        exponential

    FIGURE 2.2  Some growth functions and their asymptotic complexity

Because the order of the function is the key factor, the other terms and constants are often not even mentioned. All algorithms within a given order are considered to be generally equivalent in terms of efficiency. For example, while two algorithms to accomplish the same task may have different growth functions, if they are both O(n^2) then they are considered to be roughly equivalent with respect to efficiency.

2.3 Comparing Growth Functions

One might assume that, with the advances in the speed of processors and the availability of large amounts of inexpensive memory, algorithm analysis would no longer be necessary. However, nothing could be further from the truth. Processor speed and memory cannot make up for the differences in efficiency of algorithms. Keep in mind that in our previous discussion we have been eliminating constants as irrelevant when discussing the order of an algorithm. Increasing processor speed simply introduces a constant factor into the growth function. When possible, finding a more efficient algorithm is a better solution than finding a faster processor.

Another way of looking at the effect of algorithm complexity was proposed by Aho, Hopcroft, and Ullman (1974). If a system can currently handle a problem of size n in a given time period, what happens to the allowable size of the problem if we increase the speed of the processor tenfold? As shown in Figure 2.3, the linear case is relatively simple.
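The tenfold-speedup arithmetic can be sketched concretely. If t(n) bounds the work and the processor becomes ten times faster, the new maximum problem size n' satisfies t(n') = 10 * t(n); solving for n' gives the factors tabulated in Figure 2.3. The class and method names below are our own:

```java
public class Speedup {
    // Growth in maximum problem size after a tenfold processor speedup,
    // for work that grows as n, n^2, and n^3 respectively.
    public static double linearFactor()    { return 10.0; }
    public static double quadraticFactor() { return Math.sqrt(10.0); }  // about 3.16
    public static double cubicFactor()     { return Math.cbrt(10.0); }  // about 2.15

    // For 2^n the gain is additive: 2^(n + log2(10)) = 10 * 2^n,
    // so the allowable problem size grows by log2(10), about 3.3.
    public static double exponentialGain() {
        return Math.log(10.0) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("n   : size x %.2f%n", linearFactor());
        System.out.printf("n^2 : size x %.2f%n", quadraticFactor());
        System.out.printf("n^3 : size x %.2f%n", cubicFactor());
        System.out.printf("2^n : size + %.2f%n", exponentialGain());
    }
}
```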
Algorithm A, with a linear time complexity of n, is indeed improved by a factor of 10, meaning that this algorithm can process 10 times the data in the same amount of time given a tenfold speedup of the processor. However, algorithm B, with a time complexity of n^2, is only improved by a factor of 3.16. Why do we not get the full tenfold increase in problem size? Because the complexity of algorithm B is n^2, our effective speedup is only the square root of 10, or 3.16.

    Algorithm   Time Complexity   Max Problem Size   Max Problem Size
                                  Before Speedup     After Speedup
    A           n                 s1                 10s1
    B           n^2               s2                 3.16s2
    C           n^3               s3                 2.15s3
    D           2^n               s4                 s4 + 3.3

    FIGURE 2.3  Increase in problem size with a tenfold increase in processor speed

Similarly, algorithm C, with complexity n^3, is only improved by a factor of 2.15, or the cube root of 10. For algorithms with exponential complexity like algorithm D, in which the size variable is in the exponent of the complexity term, the situation is far worse. In this case the speedup is log2 10, or approximately 3.3. Note that this is not a factor of 3.3, but the original problem size plus 3.3. In the grand scheme of things, if an algorithm is inefficient, speeding up the processor will not help.

KEY CONCEPT: If the algorithm is inefficient, a faster processor will not help in the long run.

Figure 2.4 illustrates various growth functions graphically for relatively small values of n.

    [FIGURE 2.4  Comparison of typical growth functions (log n, n, n log n, n^2, n^3, 2^n) for small values of n, with input sizes up to about 25]

Note that when n is small, there is little difference between the algorithms. That is, if you can guarantee a very small problem size (5 or less), it doesn't really matter which algorithm is used. However, notice that in Figure 2.5, as n gets very large, the differences between the growth functions become obvious.

    [FIGURE 2.5  Comparison of typical growth functions (log n, n, n log n, n^2, n^3, 2^n) for large values of n, with input sizes up to about 500]

2.4 Determining Time Complexity

Analyzing Loop Execution

To determine the order of an algorithm, we have to determine how often a particular statement or set of statements gets executed. Therefore, we often have to determine how many times the body of a loop is executed. To analyze loop execution, first determine the order of the body of the loop, and then multiply that by the number of times the loop will execute relative to n. Keep in mind that n represents the problem size.

KEY CONCEPT: Analyzing algorithm complexity often requires analyzing the execution of loops.

Assuming that the body of a loop is O(1), then a loop such as this:

    for (int count = 0; count < n; count++)
    {
        /* some sequence of O(1) steps */
    }

would have O(n) time complexity. This is due to the fact that the body of the loop has O(1) complexity but is executed n times by the loop structure. In general, if a loop structure steps through n items in a linear fashion and the body of the loop is O(1), then the loop is O(n).
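One way to convince yourself of this is to instrument the loop and count how many times its O(1) body actually runs; a quick sketch (the class and method names are ours):

```java
public class LoopCount {
    // Returns the number of times the O(1) body of the loop executes.
    public static int bodyExecutions(int n) {
        int iterations = 0;
        for (int count = 0; count < n; count++)
        {
            iterations++;    // stands in for some sequence of O(1) steps
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(bodyExecutions(10));    // 10
        System.out.println(bodyExecutions(1000));  // 1000
    }
}
```

The count grows exactly linearly with n, which is what O(n) predicts.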
Even in a case where the loop is designed to skip some number of elements, as long as the progression of elements to skip is linear, the loop is still O(n). For example, if the preceding loop skipped every other number (e.g., count += 2), the growth function of the loop would be n/2, but since constants don't affect the asymptotic complexity, the order is still O(n).

Let's look at another example. If the progression of the loop is logarithmic, such as the following:

    count = 1;
    while (count < n)
    {
        count *= 2;
        /* some sequence of O(1) steps */
    }

then the loop is said to be O(log n). Note that when we use a logarithm in an algorithm complexity, we almost always mean log base 2. This can be explicitly written as O(log2 n). Since each time through the loop the value of count is multiplied by 2, the number of times the loop is executed is log2 n.

KEY CONCEPT: The time complexity of a loop is found by multiplying the complexity of the body of the loop by how many times the loop will execute.

Nested Loops

A slightly more interesting scenario arises when loops are nested. In this case, we must multiply the complexity of the outer loop by the complexity of the inner loop to find the resulting complexity. For example, the following nested loops:

    for (int count = 0; count < n; count++)
    {
        for (int count2 = 0; count2 < n; count2++)
        {
            /* some sequence of O(1) steps */
        }
    }

would have complexity O(n^2). The body of the inner loop is O(1) and the inner loop will execute n times. This means the inner loop is O(n). Multiplying this result by the number of times the outer loop will execute (n) results in O(n^2).

KEY CONCEPT: The analysis of nested loops must take into account both the inner and outer loops.

What is the complexity of the following nested loop?

    for (int count = 0; count < n; count++)
    {
        for (int count2 = count; count2 < n; count2++)
        {
            /* some sequence of O(1) steps */
        }
    }

In this case, the inner loop index is initialized to the current value of the index of the outer loop. The outer loop executes n times. The inner loop executes n times the first time, n - 1 times the second time, and so on. However, remember that we are only interested in the dominant term, not in constants or any lesser terms. If the progression is linear, regardless of whether some elements are skipped, the order is still O(n). Thus the resulting complexity for this code is O(n^2).

Method Calls

Let's suppose that we have the following segment of code:

    for (int count = 0; count < n; count++)
    {
        printsum(count);
    }

We know from our previous discussion that we find the order of the loop by multiplying the order of the body of the loop by the number of times the loop will execute. In this case, however, the body of the loop is a method call. Therefore, we must first determine the order of the method before we can determine the order of the code segment. Let's suppose that the purpose of the method is to print the sum of the integers from 1 to its parameter each time it is called. We might be tempted to create a brute force method such as the following:

    public void printsum(int count)
    {
        int sum = 0;
        for (int i = 1; i <= count; i++)
            sum += i;
        System.out.println(sum);
    }

What is the time complexity of this printsum method? Keep in mind that only executable statements contribute to the time complexity, so in this case, all of the executable statements are O(1) except for the loop. The loop, on the other hand, is O(n), and thus the method itself is O(n). Now to compute the time complexity of the original loop that called the method, we simply multiply the complexity of the method, which is the body of the loop, by the number of times the loop will execute. Our result, then, is O(n^2) using this implementation of the printsum method.

However, if you recall, we know from our earlier discussion that we do not have to use a loop to calculate the sum of the numbers from 1 to n. In fact, we know that the sum from i = 1 to n of i is n(n + 1)/2. Now let's rewrite our printsum method and see what happens to our time complexity:

    public void printsum(int count)
    {
        int sum = count * (count + 1) / 2;
        System.out.println(sum);
    }

Now the time complexity of the printsum method is made up of an assignment statement, which is O(1), and a print statement, which is also O(1). The result of this change is that the time complexity of the printsum method is now O(1), meaning that the loop that calls this method now goes from being O(n^2) to O(n). We know from our earlier discussion and from Figure 2.5 that this is a very significant improvement. Once again we see that there is a difference between delivering correct results and doing so efficiently.

What if the body of a method is made up of multiple method calls and loops? Consider the following code using our printsum method above:

    public void sample(int n)
    {
        printsum(n);    /* this method call is O(1) */

        for (int count = 0; count < n; count++)    /* this loop is O(n) */
            printsum(count);

        for (int count = 0; count < n; count++)    /* this loop is O(n^2) */
            for (int count2 = 0; count2 < n; count2++)
                System.out.println(count + ", " + count2);
    }

The initial call to the printsum method with the parameter n is O(1) since the method is O(1). The for loop containing the call to the printsum method with the parameter count is O(n) since the method is O(1) and the loop executes n times. The nested loops are O(n^2) since the inner loop will execute n times each time the outer loop executes, and the outer loop will also execute n times. The entire method is then O(n^2) since only the dominant term matters.
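The improvement can be demonstrated by counting operations. The sketch below is a standalone adaptation of the two printsum versions (ours, with the printing replaced by a returned value so the results are easy to check); both compute the same sum while doing very different amounts of work:

```java
public class SumDemo {
    // Brute-force version: O(n) additions per call.
    public static long sumLoop(int count) {
        long sum = 0;
        for (int i = 1; i <= count; i++)
            sum += i;
        return sum;
    }

    // Closed-form version: O(1) work per call.
    public static long sumFormula(int count) {
        return (long) count * (count + 1) / 2;
    }

    public static void main(String[] args) {
        // Same answers for every input, so a caller that loops n times is
        // O(n^2) with the first version but only O(n) with the second.
        for (int n = 0; n <= 5; n++)
            System.out.println(sumLoop(n) + " == " + sumFormula(n));
    }
}
```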
More formally, the growth function for the method sample is given by

    t(n) = 1 + n + n^2

Then, given that we eliminate constants and all but the dominant term, the time complexity is O(n^2).

There is one additional issue to deal with when analyzing the time complexity of method calls, and that is recursion, the situation in which a method calls itself. We will save that discussion for Chapter 7.

Summary of Key Concepts

• Software must make efficient use of resources such as CPU time and memory.

• Algorithm analysis is a fundamental computer science topic.

• A growth function shows time or space utilization relative to the problem size.

• The order of an algorithm is found by eliminating constants and all but the dominant term in the algorithm's growth function.

• The order of an algorithm provides an upper bound to the algorithm's growth function.

• If the algorithm is inefficient, a faster processor will not help in the long run.

• Analyzing algorithm complexity often requires analyzing the execution of loops.

• The time complexity of a loop is found by multiplying the complexity of the body of the loop by how many times the loop will execute.

• The analysis of nested loops must take into account both the inner and outer loops.