CSE240, Problem Set 1, 20001003
By: Huaxia Xia (PID A03598487)

1. (H&P 1.1)
Answer:
a. Let x be the "percent vectorization" and y be the "net speedup". According to Amdahl's Law, we have:
       y = 1/((1-x) + x/20) = 1/(1-0.95x)
   So, if x is 10%, then y = 1/0.905 = 1.10;
       if x is 30%, then y = 1/0.715 = 1.40;
       if x is 50%, then y = 1/0.525 = 1.90.
   The maximum speedup is y = 20, when x equals 1.
b. A speedup of 2 means:
       1/(1-0.95x) = 2
       1 - 0.95x = 0.5
       x = 0.526 = 52.6%
c. The maximum attainable speedup is 20 (from (a)), so half of it is 10. To reach it, the following equation must be satisfied:
       1/(1-0.95x) = 10
       1 - 0.95x = 0.1
       x = 0.947 = 94.7%
d. The original percentage of vectorization is 70%. If we double the vector rate, i.e., vector mode becomes 40 times as fast as normal mode, then the speedup will be:
       y1 = 1/(0.3 + 0.7/40) = 3.150
   If we want to attain the same speedup by increasing the percentage of vectorization instead, from (a) we get:
       3.150 = 1/(1-0.95x)
       x = 0.718 = 71.8%
   It is easier to improve the percentage of vectorization from 70% to 71.8% than to double the speed of vector mode, so I would choose to improve the percentage of vectorization.

2. (H&P 1.2)
Answer:
a. There are two ways to get the answer. One is to let x be the original fraction of time spent in the part to be enhanced; then
       (1-x) / (0.1x) = 50%/50% = 1
   which gives x and, from it, the final answer. Here we use another method. We know that in the enhanced system the enhanced part takes 50% of the time; then the "speedup" going from the enhanced system back to the original system is:
       1/((1-0.5) + 0.5/0.1) = 2/11
   Thus, the speedup from the original system to the enhanced system is 11/2 = 5.5.
b. Let x be the original percentage of time spent in the enhanced part. From Amdahl's Law we get:
       5.5 = 1/(1-x + x/10)
       1 - 0.9x = 1/5.5
       x = 0.909 = 90.9%

3.
Answer:
We are given:
Original: CPI = 2.0; original percentage of division instructions = 1%.
Let the instruction count be ICo and the clock cycle time be ClockCycleo.
a.
Modified: 1 division is replaced by 25 instructions; the clock cycle time improves by 10%; the CPI of the modified system is 1 for 40% of the instructions and 2 for the other 60%.
       CPUtime_original = ICo * ClockCycleo * CPIo = 2 * ICo * ClockCycleo
       CPUtime_modified = IC1 * ClockCycle_modified * CPI1 + IC2 * ClockCycle_modified * CPI2
                        = ICo * (1-1%+25%) * 40% * ClockCycleo * (1-10%) * 1
                          + ICo * (1-1%+25%) * 60% * ClockCycleo * (1-10%) * 2
                        = 1.7856 * ICo * ClockCycleo
       Speedup = CPUtime_original / CPUtime_modified = 2/1.7856 = 1.12
   Thus, the modified system will be 12% faster than the original one, so the modification is worth making.
b. Modified: 1 division is replaced by 50 instructions; the clock cycle time improves by 5%; the CPI of the modified system is 1 for 40% of the instructions and 2 for the other 60%.
       CPUtime_modified = IC1 * ClockCycle_modified * CPI1 + IC2 * ClockCycle_modified * CPI2
                        = ICo * (1-1%+50%) * 40% * ClockCycleo * (1-5%) * 1
                          + ICo * (1-1%+50%) * 60% * ClockCycleo * (1-5%) * 2
                        = 2.2648 * ICo * ClockCycleo
       Speedup = CPUtime_original / CPUtime_modified = 2/2.2648 = 0.883
   That means the performance decreases by 11.7% after the modification. The division instruction should not be removed.
c. Assume the true CPI of the 40% floating-point instructions is CPI_true. Then we have:
       CPUtime_true = IC1 * ClockCycle_modified * CPI_true + IC2 * ClockCycle_modified * CPI2
                    = ICo * (1-1%+50%) * 40% * ClockCycleo * (1-5%) * CPI_true
                      + ICo * (1-1%+50%) * 60% * ClockCycleo * (1-5%) * 2
                    = 1.49 * 0.95 * (0.4 * CPI_true + 1.2) * ICo * ClockCycleo
   We also know that:
       CPUtime_true = CPUtime_original / (1-27%) = 2.740 * ICo * ClockCycleo
   Thus we get:
       1.49 * 0.95 * (0.4 * CPI_true + 1.2) = 2.740
       CPI_true = 1.84

4.
Answer:
a. From
       CPUtime = IC * ClockCycle * CPI
   we get:
       IC = CPUtime / (ClockCycle * CPI)
   In the case without the co-processor:
       IC_noco = 12 sec / (10^-9 sec * 3.0) = 4 * 10^9
   In the case with the co-processor:
       IC_co = 1 sec / (10^-9 sec * 5.0) = 2 * 10^8
   Thus the MIPS rating of each run is:
       MIPS_noco = (4 * 10^9) / (12 sec * 10^6) = 333.33
       MIPS_co = (2 * 10^8) / (1 sec * 10^6) = 200
b.
The total number of integer operations other than those simulating floating-point operations is (from the case with the co-processor):
       2 * 10^8 - 2 * 10^6 = 1.98 * 10^8
   In the case without the co-processor, the number of integer operations that simulate floating-point operations is therefore:
       4 * 10^9 - 1.98 * 10^8 = 3.802 * 10^9
   Thus, simulating each floating-point operation takes
       (3.802 * 10^9) / (2 * 10^6) = 1901
   integer operations.
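
The arithmetic in problems 1 and 4 can be rechecked with a short script. This is only a sketch: the function and variable names are illustrative and not part of the problem set, and the inputs (vector mode 20 times faster, a 10^-9 sec clock cycle, CPI of 3.0 and 5.0, run times of 12 sec and 1 sec, 2 * 10^6 floating-point operations) are the values used in the answers above.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the original time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def instruction_count(cpu_time_s, cycle_time_s, cpi):
    """IC = CPUtime / (ClockCycle * CPI)."""
    return cpu_time_s / (cycle_time_s * cpi)


def mips(instr_count, cpu_time_s):
    """MIPS = IC / (CPUtime * 10^6)."""
    return instr_count / (cpu_time_s * 1e6)


# Problem 1: vector mode is assumed to be 20 times faster than normal mode.
for x in (0.10, 0.30, 0.50, 1.00):
    print(f"x = {x:.0%}: speedup = {amdahl_speedup(x, 20):.3f}")

# Problem 4: inputs copied from the answer above.
CYCLE_TIME_S = 1e-9   # clock cycle time in seconds
FP_OPS = 2e6          # floating-point operations in the program

ic_noco = instruction_count(12.0, CYCLE_TIME_S, 3.0)  # run without co-processor
ic_co = instruction_count(1.0, CYCLE_TIME_S, 5.0)     # run with co-processor

# Integer operations that are not simulating floating-point work.
other_int_ops = ic_co - FP_OPS
# Integer operations spent simulating floating-point work without the co-processor.
simulating_ops = ic_noco - other_int_ops
ops_per_fp = simulating_ops / FP_OPS

print(f"MIPS without co-processor: {mips(ic_noco, 12.0):.2f}")
print(f"MIPS with co-processor:    {mips(ic_co, 1.0):.2f}")
print(f"Integer operations per simulated floating-point operation: {ops_per_fp:.0f}")
```

Changing the arguments of amdahl_speedup or the constants at the top of the problem 4 section makes it easy to try other vectorization fractions, cycle times, or CPI values.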