(5 pts) Exercise 1-51

Program A runs in 10 seconds on a machine with a 100 MHz clock. How many clock cycles does program A require?

Cycle time = 1 / (100 * 10^6 Hz) = 10 ns
Time = #cycles * ClockCycleTime
10 s = #cycles * (10 * 10^-9 s)
#cycles = 1 * 10^9

Or:
Time = #cycles / ClockRate
#cycles = Time * ClockRate = 10 s * (100 * 10^6 Hz) = 1 * 10^9

(5 pts) Exercise 1-52

Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new (faster) machine B. The designer can use new technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate is therefore necessary for machine B if we want it to run our “favorite program” in just 6 seconds? Why might machine B need more clock cycles to run the program?

CPUtime_A = CPUClockCycles_A / ClockRate_A
10 s = CPUClockCycles_A / (400 * 10^6 Hz)
CPUClockCycles_A = 10 * (400 * 10^6) = 4 * 10^9

CPUtime_B = CPUClockCycles_B / ClockRate_B
6 s = 1.2 * CPUClockCycles_A / ClockRate_B
ClockRate_B = 1.2 * 10 * (400 * 10^6) / 6 = 800 * 10^6 Hz = 800 MHz

Why more cycles? Speeding up the clock rate may force some instructions to take more than one cycle (since there is less time in each cycle to get useful work done).

(10 pts) Exercise 1-53

We wish to compare the performance of two different computers: M1 and M2. The following measurements have been made on these computers:

             Time on M1      Time on M2
Program 1    2.0 seconds     1.5 seconds
Program 2    5.0 seconds     10.0 seconds

Which computer is faster for each program, and how many times as fast is it?

For P1, M2 is 4/3 (2.0 sec / 1.5 sec) times as fast as M1.
For P2, M1 is 2 (10.0 sec / 5.0 sec) times as fast as M2.
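The clock-cycle arithmetic in Exercises 1-51 and 1-52 can be checked with a short Python sketch. The function names `clock_cycles` and `required_clock_rate` are illustrative choices, not part of the course material; the code only restates Time = #cycles / ClockRate.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Cycles executed: Time = #cycles / ClockRate, solved for #cycles."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate needed to finish `cycles` cycles in `target_time_s` seconds."""
    return cycles / target_time_s


# Exercise 1-51: 10 s on a 100 MHz machine.
cycles_1_51 = clock_cycles(10, 100e6)

# Exercise 1-52: machine A runs 10 s at 400 MHz; machine B needs
# 1.2 times as many cycles and must finish in 6 s.
cycles_a = clock_cycles(10, 400e6)
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6)

print(f"Exercise 1-51: {cycles_1_51:.3g} cycles")
print(f"Exercise 1-52: {rate_b / 1e6:.0f} MHz")
```

The two results should agree with the hand calculations above (about 10^9 cycles and about 800 MHz).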
(10 pts) Exercise 1-56

Consider the machines from the previous exercise, and assume the following additional measurements were made:

             Instructions executed on M1    Instructions executed on M2
Program 1    5 x 10^9                       6 x 10^9

What is the instruction execution rate (instructions per second) for each computer when running program 1?

For M1, inst rate = (5 x 10^9 instructions / 2.0 seconds) = 2.5 x 10^9 IPS
For M2, inst rate = (6 x 10^9 instructions / 1.5 seconds) = 4 x 10^9 IPS

(10 pts) Exercise 1-57

Suppose that M1 from Exercise 1-53 costs $500 and M2 costs $800. If you needed to run Program 1 a large number of times, which computer would you buy in large quantities? Why?

M2 runs 4/3 as fast as M1, but it costs 8/5 as much. Since 8/5 is more than 4/3, M1 is the better value.

NOTE 1: Instructions per second (or MIPS) does not equal performance, as we discussed in class. So you should have used the results from Exercise 1-53 for this problem, not the inst/s numbers from Exercise 1-56.

NOTE 2: Some students tried to compare the machines using the ratio Time / Cost. That doesn’t work, because we want time AND cost to be small. The right ratio to compare is Performance / Cost (where Performance = 1/Time):

Perf. Ratio 1 = (1 / 2 seconds) / $500 = 0.001
Perf. Ratio 2 = (1 / 1.5 seconds) / $800 = 0.000833

So machine 1 has the better Perf. vs. Cost ratio.

(5 pts) Exercise 1-61: “MIPS”

Two different compilers are being tested for a 100 MHz machine that has three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

Compiler #1: code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
Compiler #2: code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.

Which sequence will be faster according to execution time?
Which sequence will be faster according to MIPS?

MIPS = Inst. Count / (ExecutionTime * 10^6)

Time #1: NumCycles / ClockRate = (5*1 + 1*2 + 1*3) * 10^6 / (100 * 10^6) = 10 / 100 = 0.1 s
Time #2: NumCycles / ClockRate = (10*1 + 1*2 + 1*3) * 10^6 / (100 * 10^6) = 15 / 100 = 0.15 s

MIPS #1: (5 + 1 + 1) * 10^6 / (0.1 s * 10^6) = 7 / 0.1 = 70 MIPS
MIPS #2: (10 + 1 + 1) * 10^6 / (0.15 s * 10^6) = 12 / 0.15 = 80 MIPS

Compiler #1's code is faster by execution time (0.1 s vs. 0.15 s), yet compiler #2's code has the higher MIPS rating. MIPS does not equal faster!

(5 pts) Exercise 1-62

Program A runs in 0.34 seconds on a 500 MHz machine. You know that this program requires 100 million instructions, of which:
- 10% are mult. instructions that take an unknown number of cycles
- 60% are other arithmetic instructions taking 1 cycle
- 30% are memory instructions taking 2 cycles
How many cycles does a multiplication take on this machine?

Strategy: compute the needed CPI, then solve for the multiply cycle count M.

Time = IC * CPI * CycTime
0.34 s = (100 * 10^6 inst) * CPI * (1 / (500 * 10^6) s/cyc)
0.34 = 100 * CPI / 500
CPI = 0.34 * 5 = 1.7

CPI = 0.1 * M + 0.6 * 1 + 0.3 * 2
1.7 = 0.1M + 0.6 + 0.6
0.5 = 0.1M
M = 5 cycles

(5 pts) Exercise 1-63

Program A runs in 2 seconds on a certain machine. You know that this program requires 500 million instructions, of which:
- 30% are multiplication instructions that take 10 cycles
- 40% are other arithmetic instructions taking 1 cycle
- 30% are memory instructions taking 2 cycles
Suppose multiplication could be improved to take just 1 cycle. How much faster would the new machine be compared to the old?

We can just compare CPI, since the instruction count and cycle time are the same.

CPI1 = 0.3 * 10 + 0.4 * 1 + 0.3 * 2 = 0.1 * (30 + 4 + 6) = 4
CPI2 = 0.3 * 1 + 0.4 * 1 + 0.3 * 2 = 0.1 * (3 + 4 + 6) = 1.3
Speedup = 4 / 1.3 = 3.08 times faster

(10 pts) Exercise 1-66

Consider two different implementations, P1 and P2, of the same instruction set.
There are five classes of instructions (A-E), which have the following average CPI on the two machines:

           CPI on P1    CPI on P2
Class A    1            2
Class B    2            2
Class C    3            2
Class D    4            4
Class E    3            4

P1 has a clock rate of 4 GHz; P2 has a clock rate of 6 GHz. If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

Average CPI of P1 = (2 * 1 + 2 + 3 + 4 + 3) / 6 = 7/3
Average CPI of P2 = (2 * 2 + 2 + 2 + 4 + 4) / 6 = 8/3

Performance is instructions per second = ClockRate / CPI, so P2 is
[ (6 x 10^9 cyc/sec) / (8/3 cyc/inst) ] / [ (4 x 10^9 cyc/sec) / (7/3 cyc/inst) ] = 21/16 = 1.3125 times faster than P1.

(10 pts) Exercise 1-67

Suppose you wish to run a program P with 7.5 x 10^9 instructions on a 5 GHz machine with a CPI of 0.8.

a. What is the expected CPU time?

Time = (seconds/cycle) * (cyc/inst) * (number of instructions)
     = (1 sec / 5 x 10^9 cyc) * (0.8 cyc/inst) * (7.5 x 10^9 inst) = 1.2 seconds

b. When you run P, it takes 3 seconds of wall clock time to complete. What is the percentage of the CPU time P received?

P received 1.2 seconds / 3 seconds = 40% of the total CPU time.

(5 pts) Exercise 1-71

Suppose we enhance a machine, making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if 4 seconds of the 10 seconds is spent executing floating-point instructions?

Formula: Time after Improvement = Exec. Time Unaffected + (Exec. Time Affected / Amount of Improvement)

Time = 6 s + (4 s / 5) = 6.8 s
Overall speedup = Original time / New time = 10 / 6.8 = 1.47 times faster

Note that there are two “speedups” involved here: the FP unit is sped up by a factor of 5, which results in an overall speedup of 1.47. In the formula, “amount of improvement” refers to the speedup of a particular component (like the FP unit).
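As a rough check of the CPI and CPU-time arithmetic in Exercises 1-66 and 1-67, here is a minimal Python sketch. The helper names `average_cpi` and `cpu_time` are illustrative assumptions; the code only applies Time = InstCount * CPI / ClockRate with a weighted-average CPI.

```python
def average_cpi(cpis, weights):
    """Weighted-average CPI; weights are relative instruction frequencies."""
    return sum(c * w for c, w in zip(cpis, weights)) / sum(weights)


def cpu_time(inst_count, cpi, clock_rate_hz):
    """CPU time in seconds: InstCount * CPI / ClockRate."""
    return inst_count * cpi / clock_rate_hz


# Exercise 1-66: class A occurs twice as often as each of B, C, D, E.
weights = [2, 1, 1, 1, 1]
cpi_p1 = average_cpi([1, 2, 3, 4, 3], weights)  # 14/6 = 7/3
cpi_p2 = average_cpi([2, 2, 2, 4, 4], weights)  # 16/6 = 8/3

# Both machines run the same instruction count, so it cancels in the
# ratio; any value works here (1 is used for simplicity).
p2_over_p1 = cpu_time(1, cpi_p1, 4e9) / cpu_time(1, cpi_p2, 6e9)

# Exercise 1-67: 7.5e9 instructions, CPI 0.8, 5 GHz, 3 s wall clock.
time_p = cpu_time(7.5e9, 0.8, 5e9)
cpu_share = time_p / 3.0

print(f"P2 is {p2_over_p1:.4f} times faster than P1")
print(f"CPU time {time_p:.2f} s, CPU share {cpu_share:.0%}")
```

The ratio is taken as time on P1 divided by time on P2, which is the same as the performance of P2 divided by the performance of P1.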
(5 pts) Exercise 1-72

We are looking for a benchmark to show off the new floating-point unit described above (which makes floating point 5 times faster), and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

Let x be the fraction of execution time spent in floating point, with improvement factor N = 5. The target time is 100 s / 3 = 33.3 s.

33.3 s = 100(1 - x) + 100x / 5
33.3 s = 100 - 100x + 20x
80x = 66.7
x = 66.7 / 80 = 83.3% (exactly 5/6)

(Rounding the target time down to 33 s gives 80x = 67 and x = 67/80 = 83.75%.)

(10 pts) Exercise 1-75 (use Amdahl’s Law)

You are going to enhance a computer, and there are two possible improvements: either make multiply instructions run four times faster than before, or make memory access instructions run two times faster than before. You repeatedly run a program that takes 100 seconds to execute. Of this time, 20% is used for multiplication, 50% for memory access instructions, and 30% for other tasks.

What will the speedup be if you improve only multiplication?

New time = (Time unaffected) + (Time affected / improvement) = 80 s + (20 s / 4) = 85 s
Speedup = (old time / new time) = 100 / 85 = 1.18 times faster
(See also Ex 1-71 for discussion on which speedups to use.)

What will the speedup be if you improve only memory access?

New time = 50 s + (50 s / 2) = 75 s
Speedup = (old time / new time) = 100 / 75 = 1.33 times faster

What will the speedup be if both improvements are made?

30 seconds is unaffected. 20 seconds speeds up by a factor of 4, and 50 seconds speeds up by a factor of 2. These are separate effects.

New time = 30 s + (20 s / 4) + (50 s / 2) = 60 s
Speedup = (old time / new time) = 100 / 60 = 1.67 times faster
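The Amdahl's Law calculations in Exercises 1-71, 1-72, and 1-75 can be reproduced with the sketch below. The function names (`new_time`, `speedup`, `required_fraction`) are hypothetical helpers introduced only for illustration. `required_fraction` solves 1/S = (1 - x) + x/N for x without rounding the target time, so for Exercise 1-72 it gives 5/6, about 83.3%; rounding the target time to 33 s by hand gives a slightly larger figure.

```python
def new_time(old_time, parts):
    """Time after improvement.

    parts is a list of (affected_seconds, improvement_factor) pairs;
    whatever time is not listed is treated as unaffected.
    """
    unaffected = old_time - sum(t for t, _ in parts)
    return unaffected + sum(t / factor for t, factor in parts)


def speedup(old_time, parts):
    """Overall speedup = old time / new time."""
    return old_time / new_time(old_time, parts)


def required_fraction(target_speedup, factor):
    """Fraction of time that must be affected to reach target_speedup.

    Solves 1/S = (1 - x) + x/N for x.
    """
    return (1 - 1 / target_speedup) / (1 - 1 / factor)


# Exercise 1-71: 4 s of 10 s is floating point, made 5x faster.
print(f"1-71: {speedup(10, [(4, 5)]):.2f}")

# Exercise 1-72: FP made 5x faster, overall speedup of 3 wanted.
print(f"1-72: {required_fraction(3, 5):.1%}")

# Exercise 1-75: 20 s multiply (4x), 50 s memory (2x), 30 s other.
print(f"1-75 multiply only: {speedup(100, [(20, 4)]):.2f}")
print(f"1-75 memory only:   {speedup(100, [(50, 2)]):.2f}")
print(f"1-75 both:          {speedup(100, [(20, 4), (50, 2)]):.2f}")
```

Passing several (seconds, factor) pairs to `speedup` treats the improvements as separate effects on disjoint parts of the run time, which is the assumption made in the last part of Exercise 1-75.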