(5 pts) Exercise 1-51

1. Program A runs in 10 seconds on a machine with a 100 MHz clock.
How many clock cycles does program A require?
Cycle time = 1 / (100 * 10^6 Hz) = 10 ns
Time = #cycles * ClockCycleTime
10 s = #cycles * 10 * 10^-9 s
#cycles = 1 * 10^9
Or: Time = #cycles / ClockRate
#cycles = Time * ClockRate = 10 * (100 * 10^6) = 1 * 10^9
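The cycle count can be double-checked with a short Python sketch. The function name `cycles_required` is only an illustrative choice, not something from the exercise.

```python
def cycles_required(time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = execution time * clock rate."""
    return time_s * clock_rate_hz


# Program A: 10 seconds on a 100 MHz machine.
cycles_a = cycles_required(10, 100e6)
print(f"{cycles_a:.0e} cycles")
```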
(5 pts) Exercise 1-52
2.) Our favorite program runs in 10 seconds on computer A, which has a
400 MHz clock. We are trying to help a computer designer build a new (faster)
machine B. The designer can use new technology to substantially increase the clock
rate, but has informed us that this increase will affect the rest of the CPU design,
causing machine B to require 1.2 times as many clock cycles as machine A for the same
program. What clock rate is therefore necessary for machine B, if we want it to be
able to run our “favorite program” in just 6 seconds?
3.) Why might machine B need more clock cycles to run the program?
CPUtime_A = CPUClockCycles_A / ClockRate_A
10 s = CPUClockCycles_A / (400 * 10^6 Hz)
CPUClockCycles_A = 10 * (400 * 10^6)
CPUtime_B = CPUClockCycles_B / ClockRate_B
6 s = 1.2 * CPUClockCycles_A / ClockRate_B
6 s = 1.2 * 10 * (400 * 10^6) / ClockRate_B
ClockRate_B = 1.2 * 10 * (400 * 10^6) / 6 = 800 * 10^6 Hz = 800 MHz
Why more cycles? Speeding up the clock rate may force some instructions to
take more than one cycle (since there is less time in each cycle to get useful
work done).
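A minimal Python sketch of the clock-rate calculation for machine B follows. The helper name `required_clock_rate` and its parameter names are assumptions made for illustration.

```python
def required_clock_rate(cycles_a: float, cycle_factor: float, target_time_s: float) -> float:
    """Clock rate needed so that cycle_factor * cycles_a finish in target_time_s."""
    return cycle_factor * cycles_a / target_time_s


# Machine A: 10 seconds at 400 MHz gives the cycle count for the program.
cycles_a = 10 * 400e6
# Machine B: 1.2 times as many cycles, target run time of 6 seconds.
rate_b = required_clock_rate(cycles_a, 1.2, 6)
print(f"{rate_b / 1e6:.0f} MHz")
```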
(10 pts) Exercise 1-53
We wish to compare the performance of two different computers: M1
and M2. The following measurements have been made on these
computers:
             Time on M1      Time on M2
Program 1    2.0 seconds     1.5 seconds
Program 2    5.0 seconds     10.0 seconds
Which computer is faster for each program, and how many times as
fast is it?
For P1, M2 is 4/3 (2 sec/1.5 sec) times as fast as M1.
For P2, M1 is 2 (10 sec/5.0 sec) times as fast as M2.
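The same comparison can be sketched in Python. The dictionary layout and the function name `faster_machine` are illustrative assumptions; the times are the measurements given in the exercise.

```python
times = {
    "Program 1": {"M1": 2.0, "M2": 1.5},
    "Program 2": {"M1": 5.0, "M2": 10.0},
}


def faster_machine(t: dict) -> tuple:
    """Return (faster machine, how many times as fast it is) from per-machine times."""
    fast = min(t, key=t.get)  # shortest time wins
    slow = max(t, key=t.get)
    return fast, t[slow] / t[fast]


for program, t in times.items():
    machine, ratio = faster_machine(t)
    print(f"{program}: {machine} is {ratio:.2f} times as fast")
```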
(10 pts) Exercise 1-56
Consider the machines from the previous exercise, and assume the
following additional measurements were made:
             Instructions executed on M1    Instructions executed on M2
Program 1    5 x 10^9                       6 x 10^9
What is the instruction execution rate (instructions per second) for
each computer when running program 1?
For M1, inst rate = (5 x 10^9 instructions / 2.0 seconds) = 2.5 x 10^9 IPS
For M2, inst rate = (6 x 10^9 instructions / 1.5 seconds) = 4 x 10^9 IPS
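As a quick check, the two rates can be computed in Python. The function name `instruction_rate` is an illustrative assumption.

```python
def instruction_rate(instructions: float, seconds: float) -> float:
    """Instructions executed per second (IPS)."""
    return instructions / seconds


rate_m1 = instruction_rate(5e9, 2.0)
rate_m2 = instruction_rate(6e9, 1.5)
print(f"M1: {rate_m1:.1e} IPS, M2: {rate_m2:.1e} IPS")
```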
(10 pts) Exercise 1-57
Suppose that M1 from Exercise 1-53 costs $500 and M2 costs $800. If
you needed to run Program 1 a large number of times, which
computer would you buy in large quantities? Why?
M2 runs 4/3 as fast as M1, but it costs 8/5 as much. Since 8/5 is more than 4/3, M1 is the better
value.
NOTE 1: instructions per second (or MIPS) does not equal performance, as we discussed in
class. So you should have used the results from Exercise 1-53 for this problem, not the inst/s
numbers from Exercise 1-56.
NOTE 2: Some students tried to compare the machines using the ratio Time / Cost.
That doesn’t work, because we want time AND cost to be small.
The right ratio would be to compare Performance / Cost (where Performance = 1/Time):
Perf. Ratio1 = (1 / 2 seconds) / $500 = 0.001
Perf. Ratio2 = (1 / 1.5 seconds) / $800 = 0.000833
So machine 1 has the better Perf. vs. Cost ratio.
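The Performance / Cost comparison can be sketched in Python as below. The function name `perf_per_dollar` is an illustrative assumption; the times and prices are those given for Program 1.

```python
def perf_per_dollar(time_s: float, cost_dollars: float) -> float:
    """Performance per dollar, with performance defined as 1 / execution time."""
    return (1.0 / time_s) / cost_dollars


ratio_m1 = perf_per_dollar(2.0, 500)
ratio_m2 = perf_per_dollar(1.5, 800)
better = "M1" if ratio_m1 > ratio_m2 else "M2"
print(f"M1: {ratio_m1:.6f}, M2: {ratio_m2:.6f}, better value: {better}")
```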
(5 pts) Exercise 1-61: “MIPS”
Two different compilers are being tested for a 100 MHz machine that has three different
classes of instructions: Class A, Class B, and Class C, which require one, two, and three
cycles (respectively). Both compilers are used to produce code for a large piece of
software.
Compiler #1: code uses 5 million Class A instructions, 1 million Class B instructions,
and 1 million Class C instructions.
Compiler #2: code uses 10 million Class A instructions, 1 million Class B instructions,
and 1 million Class C instructions.
Which sequence will be faster according to execution time?
Which sequence will be faster according to MIPS?
MIPS = Inst. Count / (ExecutionTime * 10^6)
Time #1: NumCycles / ClockRate = ((5*1 + 1*2 + 1*3) * 10^6) / (100 * 10^6) = 10 / 100 = 0.1 s
Time #2: NumCycles / ClockRate = ((10*1 + 1*2 + 1*3) * 10^6) / (100 * 10^6) = 15 / 100 = 0.15 s
MIPS #1: ((5 + 1 + 1) * 10^6) / (0.1 s * 10^6) = 7 / 0.1 = 70 MIPS
MIPS #2: ((10 + 1 + 1) * 10^6) / (0.15 s * 10^6) = 12 / 0.15 = 80 MIPS
Compiler #1's code is faster by execution time, but Compiler #2's code has the higher MIPS rating.
MIPS does not equal faster!
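The execution-time and MIPS figures can be reproduced with a small Python sketch. The names `exec_time` and `mips` and the dictionary layout are assumptions for illustration; the instruction counts and cycle costs are those stated in the exercise.

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # 100 MHz


def exec_time(counts: dict) -> float:
    """Execution time in seconds = total cycles / clock rate."""
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts: dict) -> float:
    """Millions of instructions per second."""
    return sum(counts.values()) / (exec_time(counts) * 1e6)


compiler1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, counts in (("Compiler #1", compiler1), ("Compiler #2", compiler2)):
    print(f"{name}: {exec_time(counts):.2f} s, {mips(counts):.0f} MIPS")
```

The second compiler's output scores higher on MIPS while taking longer to run, which is the point of the exercise.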
(5 pts) Exercise 1-62
Program A runs in 0.34 seconds on a 500 MHz machine. You know
that this program requires 100 million instructions of which:
– 10% are mult. instructions that take an unknown number of cycles
– 60% are other arithmetic instructions taking 1 cycle
– 30% are memory instructions taking 2 cycles
How many cycles does a multiplication take on this machine?
Strategy: compute the needed CPI, then solve for the multiply cycle count M
Time = IC * CPI * CycTime
0.34 = 100 x 10^6 inst * CPI * (1 / (500 * 10^6) s/cyc)
0.34 = 100 * CPI / 500
CPI = 0.34 * 5 = 1.7

CPI = 0.1 * M + 0.6 * 1 + 0.3 * 2
1.7 = 0.1M + 0.6 + 0.6
0.5 = 0.1M
M = 5 cycles
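The two steps (find the overall CPI, then solve for the unknown class) can be sketched in Python. The function name `unknown_class_cycles` and its parameters are illustrative assumptions.

```python
def unknown_class_cycles(time_s, clock_rate_hz, instr_count, unknown_frac, known_mix):
    """Solve CPI = unknown_frac * M + sum(frac * cycles) for M.

    known_mix is a list of (fraction of instructions, cycles per instruction).
    """
    cpi = time_s * clock_rate_hz / instr_count  # from Time = IC * CPI / ClockRate
    known_part = sum(frac * cycles for frac, cycles in known_mix)
    return (cpi - known_part) / unknown_frac


m = unknown_class_cycles(0.34, 500e6, 100e6, 0.10, [(0.60, 1), (0.30, 2)])
print(f"Multiply takes about {m:.1f} cycles")
```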
(5 pts) Exercise 1-63
Program A runs in 2 seconds on a certain machine. You know that
this program requires 500 million instructions of which:
– 30% are multiplication instructions that take 10 cycles
– 40% are other arithmetic instructions taking 1 cycle
– 30% are memory instructions taking 2 cycles
Suppose multiplication could be improved to take just 1 cycle. How
much faster would the new machine be compared to the old?
We can just compare CPI, since the instruction count and cycle time are the same.
CPI1 = 0.3 * 10 + 0.4 * 1 + 0.3 * 2 = 0.1 * (30 + 4 + 6) = 4
CPI2 = 0.3 * 1 + 0.4 * 1 + 0.3 * 2 = 0.1 * (3 + 4 + 6) = 1.3
Speedup = 4 / 1.3 = 3.08 times faster
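A short Python sketch of the CPI comparison follows. The function name `average_cpi` is an illustrative assumption; the instruction mix is the one given in the exercise.

```python
def average_cpi(mix):
    """Weighted average CPI from (fraction of instructions, cycles) pairs."""
    return sum(frac * cycles for frac, cycles in mix)


cpi_old = average_cpi([(0.3, 10), (0.4, 1), (0.3, 2)])  # multiply takes 10 cycles
cpi_new = average_cpi([(0.3, 1), (0.4, 1), (0.3, 2)])   # multiply takes 1 cycle

# Same instruction count and cycle time, so the speedup is the CPI ratio.
speedup = cpi_old / cpi_new
print(f"CPI {cpi_old:.1f} -> {cpi_new:.1f}, speedup = {speedup:.2f}")
```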
(10 pts) Exercise 1-66
Consider two different implementations, P1 and P2, of the same
instruction set. There are five classes of instructions (A-E), which
have the following average CPI on the two machines
           CPI on P1    CPI on P2
Class A    1            2
Class B    2            2
Class C    3            2
Class D    4            4
Class E    3            4
P1 has a clock rate of 4 GHz, P2 has a clock rate of 6 GHz.
If the number of instructions executed in a certain program is divided
equally among the classes of instructions except for class A, which
occurs twice as often as each of the others, how much faster is P2
than P1?
Average CPI of P1 = (2 * 1 + 2 + 3 + 4 + 3) / 6 = 7/3
Average CPI of P2 = (2 * 2 + 2 + 2 + 4 + 4) / 6 = 8/3
Instructions per second = ClockRate / CPI, so
P2 is then [ (6 x 10^9 cyc/sec) / (8/3 cyc/inst) ] / [ (4 x 10^9 cyc/sec) / (7/3 cyc/inst) ]
= 21/16 = 1.3125 times as fast as P1.
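The result can be verified exactly with Python's `fractions` module. The dictionary names and the helpers `average_cpi` and `instr_per_second` are illustrative assumptions; note that each machine's instruction rate is its clock rate divided by its CPI.

```python
from fractions import Fraction

# Class A occurs twice as often as each of the other classes.
WEIGHTS = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1}
CPI_P1 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 3}
CPI_P2 = {"A": 2, "B": 2, "C": 2, "D": 4, "E": 4}


def average_cpi(cpi: dict) -> Fraction:
    """Weighted average CPI over the instruction mix."""
    total = sum(WEIGHTS.values())
    return Fraction(sum(WEIGHTS[c] * cpi[c] for c in cpi), total)


def instr_per_second(clock_hz: int, cpi: dict) -> Fraction:
    """Instruction rate = clock rate / average CPI."""
    return clock_hz / average_cpi(cpi)


speedup = instr_per_second(6 * 10**9, CPI_P2) / instr_per_second(4 * 10**9, CPI_P1)
print(average_cpi(CPI_P1), average_cpi(CPI_P2), speedup, float(speedup))
```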
(10 pts) Exercise 1-67
Suppose you wish to run a program P with 7.5 x 10^9 instructions on a 5 GHz
machine with a CPI of 0.8.
a. What is the expected CPU time?
Time = (seconds/cycle) * (cyc / inst) * (Number of instructions)
= (1 sec / 5 x 10^9 cyc) * (0.8 cyc/inst) * (7.5 x 10^9 inst)
= 1.2 seconds
b. When you run P, it takes 3 seconds of wall clock time to complete. What is
the percentage of the CPU time P received?
P received 1.2 seconds / 3 seconds = 40% of the total CPU time.
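Both parts can be checked with a few lines of Python. The function name `cpu_time` is an illustrative assumption.

```python
def cpu_time(instr_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions * CPI / clock rate."""
    return instr_count * cpi / clock_rate_hz


expected_s = cpu_time(7.5e9, 0.8, 5e9)
wall_clock_s = 3.0
cpu_share = expected_s / wall_clock_s
print(f"CPU time = {expected_s:.1f} s, share of wall clock = {cpu_share:.0%}")
```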
(5 pts) Exercise 1-71
Suppose we enhance a machine making all floating-point instructions run
five times faster. If the execution time of some benchmark before the
floating-point enhancement is 10 seconds, what will the speedup be if 4
seconds of the 10 seconds is spent executing floating-point instructions?
Formula:
Time after Improve. = Exec. Time Unaffected + (Exec. Time Affected / Amount of Improvement)
Time = 6 s other + (4 s / 5) = 6.8 sec
Overall speedup = Original time / new time = 10/6.8 = 1.47 times faster
Note that there are two “speedups” involved here – the FP unit is sped up by a factor
of 5, which results in an overall speedup of 1.47. In the formula, “amount of
improvement” refers to the speedup of a particular component (like the FP unit).
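The formula translates directly into Python. This is a minimal sketch; the function name `time_after_improvement` and its parameter names are assumptions chosen to mirror the formula.

```python
def time_after_improvement(unaffected_s: float, affected_s: float, improvement: float) -> float:
    """Time after improvement = unaffected time + affected time / improvement."""
    return unaffected_s + affected_s / improvement


old_time_s = 10.0
new_time_s = time_after_improvement(unaffected_s=6.0, affected_s=4.0, improvement=5.0)
overall_speedup = old_time_s / new_time_s
print(f"new time = {new_time_s:.1f} s, overall speedup = {overall_speedup:.2f}")
```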
(5 pts) Exercise 1-72
We are looking for a benchmark to show off the new floating-point unit
described above (which makes floating point 5 times faster), and want the
overall benchmark to show a speedup of 3. One benchmark we are
considering runs for 100 seconds with the old floating-point hardware. How
much of the execution time would floating-point instructions have to account
for in this program in order to yield our desired speedup on this benchmark?
x is the fraction of execution time spent in floating point
N = 5 (floating-point speedup)
New time = 100 s / 3 = 33.3 s
33.3 s = 100(1 - x) + 100x / 5
33.3 = 100 - 100x + 20x
80x = 66.7
x = 66.7/80 = 83.3%
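The required floating-point fraction can also be solved in closed form. The sketch below uses exact fractions and the exact target time of 100/3 s, which gives 5/6 (about 83.3%); rounding the target to 33 s shifts the result to about 83.75%. The function name `required_fraction` is an illustrative assumption.

```python
from fractions import Fraction


def required_fraction(overall_speedup: int, component_speedup: int) -> Fraction:
    """Fraction x of the original time the enhanced component must account for.

    From 1/S = (1 - x) + x/N, it follows that x = (1 - 1/S) / (1 - 1/N).
    """
    s = Fraction(overall_speedup)
    n = Fraction(component_speedup)
    return (1 - 1 / s) / (1 - 1 / n)


x = required_fraction(3, 5)
print(x, f"= {float(x):.2%}")
```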
(10 pts) Exercise 1-75 (use Amdahl’s Law)
You are going to enhance a computer, and there are two possible improvements: either
make multiply instructions run four times faster than before, or make memory access
instructions run two times faster than before.
You repeatedly run a program that takes 100 seconds to execute. Of this time, 20% is
used for multiplication, 50% for memory access instructions, and 30% for other tasks.
What will the speedup be if you improve only multiplication?
New time = (Time unaffected) + (Time affected / improvement)
= 80s + (20s / 4) = 85s
Speedup = (old time / new time) = 100/85 = 1.18 times faster
(See also Ex. 1-71 for a discussion of which speedups to use.)
What will the speedup be if you improve only memory access?
New time = 50s + (50s / 2) = 75s
Speedup = (old time / new time) = 100/75 = 1.33 times faster
What will the speedup be if both improvements are made?
30 seconds is unaffected. 20 seconds speeds up by factor of 4, and
50 seconds speeds up by factor 2. These are separate effects.
New time = 30s + (20s / 4) + (50s / 2) = 60s
Speedup = (old time / new time) = 100/60 = 1.67 times faster
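All three scenarios follow the same pattern, so they can be checked together. In this sketch each component is a (seconds, improvement factor) pair, with a factor of 1 meaning unaffected; the function name `improved_time` and the scenario labels are illustrative assumptions.

```python
def improved_time(components):
    """Sum each component's time divided by its own improvement factor."""
    return sum(seconds / factor for seconds, factor in components)


OLD_TIME_S = 100
# (multiply, memory access, other) as (seconds, improvement factor)
scenarios = {
    "multiply only": [(20, 4), (50, 1), (30, 1)],
    "memory only": [(20, 1), (50, 2), (30, 1)],
    "both": [(20, 4), (50, 2), (30, 1)],
}

for name, parts in scenarios.items():
    t = improved_time(parts)
    print(f"{name}: new time = {t:.0f} s, speedup = {OLD_TIME_S / t:.2f}")
```

Because the improvements act on separate portions of the run time, the combined case simply applies both factors in the same sum.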