1- Percentage of Vectorization

a- Let f = the fraction of the work that is sped up (the fraction vectorized) and s = the speedup on that fraction.

New_time = (1-f)*old_time + (f/s)*old_time
Speedup = old_time / new_time

Amdahl's Law: Speedup = 1/[(1-f)+(f/s)]

Label the Y axis "Net speedup" and the X axis "Percent Vectorization". Take s = 40 (vector mode is 40 times faster) and let X, the percentage vectorization, run from f = 0 to 1. Calculating the net speedup with Amdahl's Law gives the following results:

X : Percent Vectorization    Y : Net Speedup
0.00                         1.00
0.10                         1.11
0.20                         1.24
0.30                         1.41
0.40                         1.64
0.50                         1.95
0.60                         2.41
0.70                         3.15
0.80                         4.55
0.90                         8.16
1.00                         40.00

[Figure: Vector Mode Speedup Graph. Y axis: Net Speedup; X axis: Percent Vectorization. The curve reaches 40.00 at f = 1.00.]

b- Speedup = 2 and s = 40, f = ?

Speedup = 1/[(1-f)+(f/s)]
2 = 1/[(1-f)+(f/40)]

Solving this equation for f: 1 - 0.975*f = 0.5, so f = 0.5/0.975 = 0.513. The percent vectorization is f = 51%.

c- Since the maximum speedup attainable is 40, one-half of this value is 20.

Speedup = 20 and s = 40, f = ?

Speedup = 1/[(1-f)+(f/s)]
20 = 1/[(1-f)+(f/40)]

Solving this equation for f: 1 - 0.975*f = 0.05, so f = 0.95/0.975 = 0.974. The percentage vectorization needed to gain one-half of the maximum speedup attainable is 97%.

d- With the vector hardware doubled in speed, the equation becomes:

f = 55% and s = 80, Speedup = ?

Speedup = 1/[(1-f)+(f/s)] = 1/[(1-0.55)+(0.55/80)]
Speedup = 2.189

To find the percentage of vectorization that would be needed to match the hardware doubling:

Speedup = 2.189 and s = 40, f = ?

Speedup = 1/[(1-f)+(f/s)]
2.189 = 1/[(1-f)+(f/40)]

Solving this equation for f: 1 - 0.975*f = 1/2.189 = 0.4568, so f = 0.5432/0.975 = 0.5571. The percent vectorization is f = 55.71%.

2- Iron Law

Assume the code size is 100 instructions. The CPI can be calculated as

Assume the cycle time is 1 for the initial case. Then the performance will be

If the ALU instructions that accompany branch instructions are removed, then ALU instructions equal to 20% of the instruction count will be removed. We assumed a code size of 100 for simplicity, so 20 ALU instructions should be removed.
The new code size is reduced to 80 instructions: 30 ALU, 30 memory and 20 branch instructions. The new CPI can be calculated as

Then the performance can be calculated as

Even though the ALU instructions are removed, the performance of the new design is worse.
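
The Amdahl's Law calculations in part 1 can be checked with a short script. This is only a sketch: the function names amdahl_speedup and required_fraction are illustrative choices, not part of the original solution, and required_fraction simply inverts the Speedup formula for f.

```python
def amdahl_speedup(f, s):
    """Net speedup when a fraction f of the work runs s times faster."""
    return 1.0 / ((1.0 - f) + f / s)


def required_fraction(target, s):
    """Fraction f needed to reach a target net speedup.

    Obtained by solving target = 1/[(1-f)+(f/s)] for f.
    """
    if s <= 1.0:
        raise ValueError("s must be greater than 1")
    if not 1.0 <= target <= s:
        raise ValueError("target speedup must lie between 1 and s")
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / s)


s = 40

# Part a: table of net speedup for f = 0.0, 0.1, ..., 1.0
for i in range(11):
    f = i / 10
    print(f"{f:.2f}  {amdahl_speedup(f, s):.2f}")

# Part b: fraction needed for a net speedup of 2
print(f"b: f = {required_fraction(2, s):.4f}")

# Part c: fraction needed for half of the maximum speedup (20)
print(f"c: f = {required_fraction(s / 2, s):.4f}")

# Part d: speedup with doubled hardware (s = 80) at f = 0.55,
# then the fraction that matches it on the original hardware (s = 40)
doubled = amdahl_speedup(0.55, 2 * s)
print(f"d: speedup = {doubled:.3f}, f = {required_fraction(doubled, s):.4f}")
```

Working with f as a fraction between 0 and 1, rather than a percentage, keeps both functions consistent with the formula as written in part a.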