1 - Huaxia Xia

CSE240, Problem Set 1, 20001003
----By: Huaxia Xia (PID A03598487)
1.(H&P1.1)
Answer:
a. Let x be the “Percent vectorization” and y be the “Net speedup”. According to
Amdahl’s Law, we have:
y = 1/((1-x) + x/20)
= 1/(1-0.95x)
So, if x is 10%, then y = 1.10;
If x is 30%, then y = 1.40;
If x is 50%, then y = 1.90;
The maximum speedup is y = 20, when x equals 1.
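The formula in (a) can be checked numerically. The following is a minimal Python sketch; the function name amdahl_speedup is chosen here for illustration only, and the factor 20 is the vector-mode speedup used in the formula above.

```python
def amdahl_speedup(fraction, factor):
    """Net speedup when `fraction` of the original run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Evaluate y = 1/((1-x) + x/20) at the vectorization levels discussed in (a).
for x in (0.10, 0.30, 0.50, 1.00):
    print(f"x = {x:.0%}: y = {amdahl_speedup(x, 20):.3f}")
```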
b. A speedup of 2 means
1/(1-0.95x) = 2
=> 1 - 0.95x = 0.5
=> x = 0.526 = 52.6%
c. The maximum attainable speedup is 20 (we got it in (a)), so half of it is 10. To
reach it, the following equation must be satisfied:
1/(1-0.95x) = 10
=> 1 - 0.95x = 0.1
=> x = 0.947 = 94.7%
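Parts (b) and (c) both invert the same formula, solving 1/((1-x) + x/20) = y for x. A small Python sketch of that inversion, assuming the helper name required_fraction (not part of the original answer):

```python
def required_fraction(target_speedup, factor):
    """Fraction x that must be enhanced to reach `target_speedup`.

    Obtained by solving 1/((1-x) + x/factor) = target_speedup for x.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / factor)


x_b = required_fraction(2, 20)    # part (b): speedup of 2
x_c = required_fraction(10, 20)   # part (c): half of the maximum speedup of 20
print(f"(b) x = {x_b:.3f}, (c) x = {x_c:.3f}")
```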
d. The original percentage of vectorization is 70%.
If we double the vector rate, i.e., vector mode becomes 40 times as fast as
normal mode, then the speedup will be:
y1 = 1/(0.3 + 0.7/40) = 3.150
If we want to attain the same speedup by increasing the percentage of vectorization
instead, from (a) we get:
3.150 = 1/(1-0.95x)
=> x = 0.718 = 71.8%
Raising the percentage of vectorization from 70% to 71.8% is almost certainly
easier than doubling the speed of the vector mode. So, I will choose to improve the
percentage of vectorization.
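The comparison in (d) can be reproduced with a few lines of Python. This is only a sketch; the variable names are illustrative, and the numbers (70% vectorization, factors 20 and 40) are those used in the answer above.

```python
def amdahl_speedup(fraction, factor):
    """Net speedup when `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Option 1: keep 70% vectorization, double the vector rate (20x -> 40x).
speedup_faster_hw = amdahl_speedup(0.70, 40)

# Option 2: keep the 20x vector unit and find the vectorization level
# that gives the same speedup, by inverting Amdahl's Law.
needed_fraction = (1.0 - 1.0 / speedup_faster_hw) / (1.0 - 1.0 / 20)

print(f"speedup with 40x vector mode: {speedup_faster_hw:.3f}")
print(f"equivalent vectorization at 20x: {needed_fraction:.1%}")
```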
2.(H&P 1.2)
Answer:
a. There are two ways to get the answer. One is to let x be the original fraction
taken by the enhanced part; then (1-x) / (0.1x) = 50%/50% = 1, which gives x and
hence the speedup. Here we use the other method:
We know that in the enhanced system the enhanced part takes 50% of the time; so the
“speedup” going from the ENHANCED system back to the ORIGINAL system is:
1/((1-0.5) + 0.5/0.1) = 2/11
Thus, the speedup from the ORIGINAL system to the ENHANCED system is
11/2 = 5.5.
b. Let x be the original fraction of time taken by the enhanced part. From Amdahl’s
Law we get:
5.5 = 1/(1-x + x/10)
=> 1 - 0.9x = 1/5.5
=> x = 0.909 = 90.9%
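Both parts of this problem amount to converting a fraction measured on the enhanced system back to the original system. A hedged Python sketch of that conversion follows; the function names are made up for this illustration.

```python
def speedup_from_enhanced_fraction(f_enh, factor):
    """Overall speedup, given the fraction of ENHANCED run time spent in
    enhanced mode and the speedup factor of that mode.

    With enhanced time normalized to 1, the original time is
    (1 - f_enh) + f_enh * factor.
    """
    return (1.0 - f_enh) + f_enh * factor


def original_fraction(f_enh, factor):
    """Fraction of ORIGINAL run time that the enhancement applies to."""
    original_time = (1.0 - f_enh) + f_enh * factor
    return f_enh * factor / original_time


s = speedup_from_enhanced_fraction(0.5, 10)
x = original_fraction(0.5, 10)
print(f"speedup = {s:.1f}, original fraction = {x:.1%}")
```

As a cross-check, plugging x back into Amdahl's Law, 1/((1-x) + x/10), returns the same speedup.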
3.
Answer:
We’ve got:
Original: CPI = 2.0; original percentage of division = 1%. Let the instruction count
be IC_o and the clock cycle time be ClockCycle_o.
a. Modified: 1 division => 25 instructions; clock cycle time improves by 10%; CPI of
the modified system is 1 for 40% of the instructions and 2 for the other 60%.
CPUtime_original = IC_o * ClockCycle_o * CPI_o = 2 * IC_o * ClockCycle_o
CPUtime_modified = IC_1 * ClockCycle_modified * CPI_1 + IC_2 * ClockCycle_modified * CPI_2
= IC_o * (1-1%+25%) * 40% * ClockCycle_o * (1-10%) * 1
+ IC_o * (1-1%+25%) * 60% * ClockCycle_o * (1-10%) * 2
= 1.7856 * IC_o * ClockCycle_o
Speedup
= CPUtime_original / CPUtime_modified = 2/1.7856 = 1.12
Thus the modified system is 12% faster than the original one, so the
modification is worth making.
b. Modified: 1 division => 50 instructions; clock cycle time improves by 5%; CPI of
the modified system is 1 for 40% of the instructions and 2 for the other 60%.
CPUtime_modified = IC_1 * ClockCycle_modified * CPI_1 + IC_2 * ClockCycle_modified * CPI_2
= IC_o * (1-1%+50%) * 40% * ClockCycle_o * (1-5%) * 1
+ IC_o * (1-1%+50%) * 60% * ClockCycle_o * (1-5%) * 2
= 2.2648 * IC_o * ClockCycle_o
Speedup
= CPUtime_original / CPUtime_modified = 2/2.2648 = 0.883
That means performance drops by 11.7% after the modification. The
division instruction should not be removed.
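The two cases (a) and (b) follow the same pattern, so they can be evaluated with one helper. The sketch below expresses CPU time in units of IC_o * ClockCycle_o; the function name relative_cpu_time and its parameters are assumptions made for this illustration, while the numbers come from the answers above.

```python
def relative_cpu_time(ic_scale, cycle_scale, cpi_mix):
    """CPU time in units of (original IC) * (original clock cycle time).

    ic_scale:    instruction count relative to the original
    cycle_scale: clock cycle time relative to the original
    cpi_mix:     list of (fraction of instructions, CPI) pairs
    """
    average_cpi = sum(frac * cpi for frac, cpi in cpi_mix)
    return ic_scale * cycle_scale * average_cpi


mix = [(0.40, 1.0), (0.60, 2.0)]        # CPI 1 for 40%, CPI 2 for 60%
t_original = relative_cpu_time(1.0, 1.0, [(1.0, 2.0)])
t_case_a = relative_cpu_time(1 - 0.01 + 0.25, 0.90, mix)  # 25 instructions per division
t_case_b = relative_cpu_time(1 - 0.01 + 0.50, 0.95, mix)  # 50 instructions per division

print(f"(a) time = {t_case_a:.4f}, speedup = {t_original / t_case_a:.3f}")
print(f"(b) time = {t_case_b:.4f}, speedup = {t_original / t_case_b:.3f}")
```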
c. Let the true CPI of the 40% floating-point instructions be CPI_true. Then we
have:
CPUtime_true = IC_1 * ClockCycle_modified * CPI_true + IC_2 * ClockCycle_modified * CPI_2
= IC_o * (1-1%+50%) * 40% * ClockCycle_o * (1-5%) * CPI_true
+ IC_o * (1-1%+50%) * 60% * ClockCycle_o * (1-5%) * 2
= 1.49 * 0.95 * (0.4 * CPI_true + 1.2) * IC_o * ClockCycle_o
We also know that:
CPUtime_true = CPUtime_original / (1-27%) = 2.740 * IC_o * ClockCycle_o
Thus we get: 1.49 * 0.95 * (0.4 * CPI_true + 1.2) = 2.740
=> CPI_true = 1.84
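The last step in (c) is a one-variable linear equation. A short Python sketch of solving it, using the same figures as above (variable names are illustrative):

```python
ic_scale = 1 - 0.01 + 0.50        # instruction count relative to the original
cycle_scale = 1 - 0.05            # clock cycle time relative to the original
t_original = 2.0                  # original CPU time in units of IC_o * ClockCycle_o
t_true = t_original / (1 - 0.27)  # measured time, as stated in the answer above

# Solve ic_scale * cycle_scale * (0.4 * cpi_true + 0.6 * 2) = t_true for cpi_true.
cpi_true = (t_true / (ic_scale * cycle_scale) - 0.6 * 2.0) / 0.4
print(f"CPI_true = {cpi_true:.2f}")
```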
4.
Answer:
a. From
CPUtime = IC * ClockCycle * CPI
we get:
IC = CPUtime / (ClockCycle * CPI)
In the case without co-processor:
IC_noco = 12 sec / (10^-9 sec * 3.0) = 4 * 10^9
In the case with co-processor:
IC_co = 1 sec / (10^-9 sec * 5.0) = 2 * 10^8
Thus the MIPS rating of each run is:
MIPS_noco = 4 * 10^9 / (12 sec * 10^6) = 333.33
MIPS_co = 2 * 10^8 / (1 sec * 10^6) = 200
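The instruction counts and MIPS ratings in (a) can be recomputed as follows. This is a sketch only; the function names are assumptions, and the run times, cycle time, and CPI values are the ones used in the answer above.

```python
def instruction_count(cpu_time_s, cycle_time_s, cpi):
    """IC = CPUtime / (ClockCycle * CPI)."""
    return cpu_time_s / (cycle_time_s * cpi)


def mips(instructions, cpu_time_s):
    """Millions of instructions executed per second."""
    return instructions / (cpu_time_s * 1e6)


ic_noco = instruction_count(12.0, 1e-9, 3.0)  # without co-processor
ic_co = instruction_count(1.0, 1e-9, 5.0)     # with co-processor

print(f"IC_noco = {ic_noco:.3g}, MIPS_noco = {mips(ic_noco, 12.0):.2f}")
print(f"IC_co   = {ic_co:.3g}, MIPS_co   = {mips(ic_co, 1.0):.2f}")
```

Note that the run with the lower MIPS rating is the one that finishes 12 times sooner, which is why MIPS alone is a poor performance measure here.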
b. The number of integer operations other than those that simulate floating-point
operations can be obtained from the run with the co-processor:
2 * 10^8 - 2 * 10^6 = 1.98 * 10^8
In the run without the co-processor, the number of integer operations that
simulate floating-point operations is therefore:
4 * 10^9 - 1.98 * 10^8 = 3.802 * 10^9
Thus, the number of integer operations needed to simulate each floating-point
operation is
(3.802 * 10^9) / (2 * 10^6) = 1901
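The bookkeeping in (b) is easy to get wrong by a factor, so here is a brief Python sketch of it. The instruction counts come from part (a) and the 2 * 10^6 floating-point operations from the subtraction above; the variable names are illustrative.

```python
ic_co = 2e8      # instructions executed with the co-processor, from (a)
ic_noco = 4e9    # instructions executed without the co-processor, from (a)
fp_ops = 2e6     # floating-point operations in the program

other_int_ops = ic_co - fp_ops             # integer work common to both runs
simulating_ops = ic_noco - other_int_ops   # integer work that emulates FP
ops_per_fp = simulating_ops / fp_ops

print(f"integer operations per simulated FP operation: {ops_per_fp:.0f}")
```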