Assignment 1
Chapter 2: 1-6, 9, 13, 18, 26, 31, 32
John Visser
CS 2560 TA
1-13-2002

2.1 Which machine is faster for each program, and by how much?

Program    Time on M1    Time on M2
1          10 seconds    5 seconds
2          3 seconds     4 seconds

For program 1, M2 is twice as fast as M1 (10 s / 5 s = 2). For program 2, M1 is 1.33 times as fast as M2 (4 s / 3 s = 1.33).

2.2 Find the instruction rate for each machine for program 1, given the following information:

Program    Instructions on M1    Instructions on M2
1          200 x 10^6            160 x 10^6

Instructions per second for M1:

$$\frac{200 \times 10^{6}\ \text{instructions}}{10\ \text{s}} = 20 \times 10^{6}\ \text{instructions/s}$$

Instructions per second for M2:

$$\frac{160 \times 10^{6}\ \text{instructions}}{5\ \text{s}} = 32 \times 10^{6}\ \text{instructions/s}$$

2.3 Find the CPI for program 1 if the clock rate for M1 is 200 MHz and the clock rate for M2 is 300 MHz.

$$\text{CPI on M1} = \frac{10\ \text{s} \times 200 \times 10^{6}\ \text{cycles/s}}{200 \times 10^{6}\ \text{instructions}} = 10\ \text{cycles per instruction}$$

$$\text{CPI on M2} = \frac{5\ \text{s} \times 300 \times 10^{6}\ \text{cycles/s}}{160 \times 10^{6}\ \text{instructions}} = 9.375\ \text{cycles per instruction}$$

2.4 Assume that the CPI for M1 and M2 are the same for program 2 as they were for program 1. Find the instruction count for program 2 on M1 and M2.

$$\text{M1 instruction count} = 3\ \text{s} \times \frac{200 \times 10^{6}\ \text{cycles}}{1\ \text{s}} \times \frac{1\ \text{instruction}}{10\ \text{cycles}} = 60 \times 10^{6}\ \text{instructions}$$

$$\text{M2 instruction count} = 4\ \text{s} \times \frac{300 \times 10^{6}\ \text{cycles}}{1\ \text{s}} \times \frac{1\ \text{instruction}}{9.375\ \text{cycles}} = 128 \times 10^{6}\ \text{instructions}$$

2.5 Suppose that M1 costs $10,000 and M2 costs $15,000. If you are interested in running program 1 a large number of times, which machine should you buy in large quantities?

M2 is the more cost-effective purchase. It costs one and a half times as much as M1, but it runs program 1 twice as fast. To match the throughput of one M2 running program 1, two M1 machines would be required. Two M1 machines cost $20,000, which is $5,000 more than a single M2. M2 should therefore be purchased in large quantities.

2.6 Could you use cost multiplied by execution time as a metric to help in a purchasing decision?
How about the cost divided by the execution time?

Cost multiplied by execution time is a valid metric: we want to minimize both quantities, so we can simply choose the machine with the smallest product of cost and execution time. Cost divided by execution time, on the other hand, is an invalid metric. If we choose the machine with the highest quotient, we could end up selecting the most expensive machine; if we choose the machine with the lowest quotient, we could end up selecting the machine with the longest execution time.

2.9 Assume that program 1 must be executed 200 times per hour and that the rest of the time can be spent executing program 2. Which machine is faster for this workload? Which machine is more cost effective? (Performance, in this case, is measured as the throughput for program 2.)

$$\text{executions of P2 per hour on M1} = \frac{3600\ \text{s} - 200 \times 10\ \text{s}}{3\ \text{s per execution of P2}} = \frac{1600}{3} \approx 533$$

$$\text{executions of P2 per hour on M2} = \frac{3600\ \text{s} - 200 \times 5\ \text{s}}{4\ \text{s per execution of P2}} = \frac{2600}{4} = 650$$

So M2 is 650/533 = 1.2 times faster than M1 for this workload.

Cost effectiveness is measured as throughput per dollar (executions of P2 per hour per dollar), so:

M1 cost effectiveness = 533/10,000 = 0.053
M2 cost effectiveness = 650/15,000 = 0.043

M1 is more cost effective than M2 for this particular job.

2.13 The clock rate of M1 is 400 MHz and the clock rate of M2 is 200 MHz. The CPI of each instruction class on each machine, and the share of instructions each compiler generates from each class, are shown below.

Class    CPI on M1    CPI on M2    C1 usage    C2 usage    3rd-party usage
A        4            2            30%         30%         50%
B        6            4            50%         20%         30%
C        8            3            20%         50%         20%

Using C1 on both M1 and M2, how much faster does M1 appear to be than M2? (I assumed that the program was made up of 10 instructions for all calculations.)

$$\text{M1 cycles} = (3\ \text{class A instr.} \times 4\ \text{CPI}) + (5\ \text{class B instr.} \times 6\ \text{CPI}) + (2\ \text{class C instr.} \times 8\ \text{CPI}) = 58\ \text{cycles}$$

$$\text{M1 exec. time} = 58\ \text{cycles} \times \frac{1\ \text{s}}{400 \times 10^{6}\ \text{cycles}} = 0.145 \times 10^{-6}\ \text{s} = 145\ \text{ns}$$

$$\text{M2 cycles} = (3\ \text{class A instr.} \times 2\ \text{CPI}) + (5\ \text{class B instr.} \times 4\ \text{CPI}) + (2\ \text{class C instr.} \times 3\ \text{CPI}) = 32\ \text{cycles}$$

$$\text{M2 exec. time} = 32\ \text{cycles} \times \frac{1\ \text{s}}{200 \times 10^{6}\ \text{cycles}} = 0.160 \times 10^{-6}\ \text{s} = 160\ \text{ns}$$

$$\frac{\text{M2 exec. time}}{\text{M1 exec. time}} = \frac{160\ \text{ns}}{145\ \text{ns}} = 1.10$$

The makers of M1 can say that it is 10% faster than M2.

Using C2 on both M1 and M2, how much faster is M2 compared to M1?

$$\text{M1 cycles} = (3\ \text{class A instr.} \times 4\ \text{CPI}) + (2\ \text{class B instr.} \times 6\ \text{CPI}) + (5\ \text{class C instr.} \times 8\ \text{CPI}) = 64\ \text{cycles}$$

$$\text{M1 exec. time} = 64\ \text{cycles} \times \frac{1\ \text{s}}{400 \times 10^{6}\ \text{cycles}} = 0.160 \times 10^{-6}\ \text{s} = 160\ \text{ns}$$

$$\text{M2 cycles} = (3\ \text{class A instr.} \times 2\ \text{CPI}) + (2\ \text{class B instr.} \times 4\ \text{CPI}) + (5\ \text{class C instr.} \times 3\ \text{CPI}) = 29\ \text{cycles}$$

$$\text{M2 exec. time} = 29\ \text{cycles} \times \frac{1\ \text{s}}{200 \times 10^{6}\ \text{cycles}} = 0.145 \times 10^{-6}\ \text{s} = 145\ \text{ns}$$

$$\frac{\text{M1 exec. time}}{\text{M2 exec. time}} = \frac{160\ \text{ns}}{145\ \text{ns}} = 1.10$$

The makers of M2 can say that it is 10% faster than M1.

If you purchase M1, which compiler would you use? If you purchase M2, which compiler would you use? Which machine would you purchase if we assume that all other criteria are identical, including costs?

First, calculate the performance of M1 versus M2 with the 3rd-party compiler:

$$\text{M1 cycles} = (5\ \text{class A instr.} \times 4\ \text{CPI}) + (3\ \text{class B instr.} \times 6\ \text{CPI}) + (2\ \text{class C instr.} \times 8\ \text{CPI}) = 54\ \text{cycles}$$

$$\text{M1 exec. time} = 54\ \text{cycles} \times \frac{1\ \text{s}}{400 \times 10^{6}\ \text{cycles}} = 0.135 \times 10^{-6}\ \text{s} = 135\ \text{ns}$$

$$\text{M2 cycles} = (5\ \text{class A instr.} \times 2\ \text{CPI}) + (3\ \text{class B instr.} \times 4\ \text{CPI}) + (2\ \text{class C instr.} \times 3\ \text{CPI}) = 28\ \text{cycles}$$

$$\text{M2 exec. time} = 28\ \text{cycles} \times \frac{1\ \text{s}}{200 \times 10^{6}\ \text{cycles}} = 0.140 \times 10^{-6}\ \text{s} = 140\ \text{ns}$$

$$\frac{\text{M2 exec. time}}{\text{M1 exec. time}} = \frac{140\ \text{ns}}{135\ \text{ns}} = 1.037$$

With the 3rd-party compiler, M1 is just slightly (about 4%) faster than M2.

Given this data, I would use the 3rd-party compiler with BOTH M1 and M2, because on each machine its programs run faster (135 ns on M1, 140 ns on M2) than the programs produced by either C1 or C2. If M1 and M2 were identical in all other criteria, I would select M1 because it is slightly faster (1.04 times).

2.18 The instruction classes, CPI, and instruction frequencies for two different machines are shown below. If the first, Mbase, has a clock rate of 500 MHz, and the second, Mopt, has a clock rate of 600 MHz, what is the CPI for each machine?
Mbase (500 MHz)

Instr. class    CPI    Frequency
A               2      40%
B               3      25%
C               3      25%
D               5      10%

Mopt (600 MHz)

Instr. class    CPI    Frequency
A               2      40%
B               2      25%
C               3      25%
D               4      10%

$$\text{Mbase CPI} = (2 \times 0.40) + (3 \times 0.25) + (3 \times 0.25) + (5 \times 0.10) = 2.8\ \text{CPI overall for Mbase}$$

$$\text{Mopt CPI} = (2 \times 0.40) + (2 \times 0.25) + (3 \times 0.25) + (4 \times 0.10) = 2.45\ \text{CPI overall for Mopt}$$

2.26 Which machine is faster according to total execution time, and by how much?

Execution time in seconds:

Program    FLOP           Computer A    Computer B    Computer C
1          10,000,000     1             10            20
2          100,000,000    1000          100           20

Total execution time is (1000 + 1) = 1001 seconds for computer A, (100 + 10) = 110 seconds for computer B, and (20 + 20) = 40 seconds for computer C. Computer C is fastest. It is (1001/40) = about 25 times faster than computer A and (110/40) = 2.75 times faster than computer B.

2.31 Assume that multiply instructions take 12 cycles and account for 10% of the instructions in a typical program and that the other 90% of the instructions require an average of 4 cycles for each instruction. What percentage of time does the CPU spend doing multiplication?

Assume 100 instructions. The number of cycles is then

$$90 \times 4 + 10 \times 12 = 480\ \text{cycles}$$

Of these, 120 cycles are spent doing multiplication, so 120/480 = 25% of the time is spent doing multiplication.

2.32 Your hardware engineering team has indicated that it would be possible to reduce the number of cycles required for multiplication to 6 in exercise 2.31, but this will require a 20% increase in the cycle time. Nothing else will be affected. Should they proceed with the modification?

Without the modification, 100 instructions take 480 cycles. With the modification, they take only

$$90 \times 4 + 10 \times 6 = 420\ \text{cycles}$$

However, each cycle is now 20% longer, so the modified machine needs the equivalent of 1.2 x 420 = 504 original cycles. We have to conclude that we should NOT make the modification. The unmodified architecture is

$$\frac{1.2 \times 420}{480} = 1.05$$

times faster than the "improved" architecture.
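
The arithmetic in exercises 2.31 and 2.32 can be double-checked with a short Python sketch. It is only an illustration, not part of the assigned solution. The function name `total_cycles` and the variable names are made up for this example, and the 100-instruction program is the same assumption used in the answers above.

```python
# Sketch for checking exercises 2.31 and 2.32.
# Assumption (same as in the written answers): a 100-instruction program
# with 10 multiplies and 90 other instructions averaging 4 cycles each.

def total_cycles(mult_count, mult_cycles, other_count, other_cycles):
    """Total cycles for a program split into multiplies and everything else."""
    return mult_count * mult_cycles + other_count * other_cycles

MULT_COUNT, OTHER_COUNT, OTHER_CYCLES = 10, 90, 4

# 2.31: multiplies take 12 cycles each.
base_cycles = total_cycles(MULT_COUNT, 12, OTHER_COUNT, OTHER_CYCLES)
mult_fraction = (MULT_COUNT * 12) / base_cycles

# 2.32: multiplies take 6 cycles each, but every cycle is 20% longer.
mod_cycles = total_cycles(MULT_COUNT, 6, OTHER_COUNT, OTHER_CYCLES)
base_time = base_cycles * 1.0   # measured in units of the original cycle time
mod_time = mod_cycles * 1.2     # same units, with the 20% longer cycle
ratio = mod_time / base_time    # > 1 means the modified design is slower

print(f"2.31: {base_cycles} cycles, {mult_fraction:.0%} spent in multiplication")
print(f"2.32: {mod_cycles} cycles, modified/unmodified time = {ratio:.2f}")
```

Measuring both execution times in units of the original cycle time keeps the comparison independent of the actual clock rate, which the exercise does not give.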