Erik Lee
EN 525.712 Advanced Computer Architecture
Charles B. Cameron
Homework Set 1 (P1.6, P1.12, P1.15)
2/7/12

Problem 1.6

A program's run time is determined by the product of instructions per program, cycles per instruction, and clock frequency. Assume the following instruction mix for a MIPS-like RISC instruction set: 15% stores, 25% loads, 15% branches, 35% integer arithmetic, 5% integer shift, and 5% integer multiply. Given that load instructions require two cycles, branches require four cycles, integer ALU instructions require one cycle, and integer multiplies require ten cycles, compute the overall CPI.

Stores = 15%, 1 cycle
Loads = 25%, 2 cycles
Branches = 15%, 4 cycles
Integer Arithmetic = 35%, 1 cycle
Integer Shift = 5%, 1 cycle
Integer Multiply = 5%, 10 cycles

The problem statement does not give a cycle count for stores; one cycle is assumed here.

To calculate the overall cycles per instruction, we weight the cycle count of each instruction type by the fraction of the instruction mix it represents and sum the results. Stores make up 15% of the mix at one cycle each, loads make up 25% at two cycles each, and so on until 100% of the mix is accounted for.

$$\mathrm{CPI} = 1(0.15) + 2(0.25) + 4(0.15) + 1(0.35) + 1(0.05) + 10(0.05)$$

$$\mathrm{CPI} = 0.15 + 0.50 + 0.60 + 0.35 + 0.05 + 0.50$$

$$\mathrm{CPI} = 2.15$$

Overall CPI = 2.15

Problem 1.12

Using Amdahl's law, compute speedups for a program that is 98% vectorizable for a system with 16, 64, 256, and 1024 processors. What would be a reasonable number of processors to build into a system for running such an application?

Amdahl's law for calculating relative speedup is below.

$$S = \frac{1}{(1 - f) + \frac{f}{N}}$$

In the speedup equation, f represents the portion of the program that can run in parallel, which is 98%, and N represents the number of processors available. The resulting speedup is relative to a system with one processor.

$$S(f = 0.98, N = 16) = \frac{1}{(1 - 0.98) + \frac{0.98}{16}} \approx 12.3$$

$$S(f = 0.98, N = 64) = \frac{1}{(1 - 0.98) + \frac{0.98}{64}} \approx 28.3$$

$$S(f = 0.98, N = 256) = \frac{1}{(1 - 0.98) + \frac{0.98}{256}} \approx 41.97$$

$$S(f = 0.98, N = 1024) = \frac{1}{(1 - 0.98) + \frac{0.98}{1024}} \approx 47.7$$

N = 16, Speedup = 12.3
N = 64, Speedup = 28.3
N = 256, Speedup = 41.97
N = 1024, Speedup = 47.7

Graphing the relative speedup against the number of processors shows how the speedup grows as processors are added. The knee of the graph marks the transition point beyond which we get diminishing returns. We take the knee to be the point where the slope of the curve falls to 0.5, that is, where each additional processor adds only half a unit of speedup. So we take the derivative of the speedup equation, set it equal to 0.5, and solve for N.

$$S = \frac{1}{(1 - 0.98) + \frac{0.98}{N}} = \frac{50N}{N + 49}$$

$$\frac{dS}{dN} = \frac{2450}{(N + 49)^2}$$

$$0.5 = \frac{2450}{(N + 49)^2}$$

$$(N + 49)^2 = 4900, \quad N = 21$$

A reasonable number of processors for this application is 21.

Problem 1.15

In 1995, the IBM AS/400 line of computers transitioned from a CISC instruction set to a RISC instruction set. Because of the simpler instruction set, the realizable clock frequency for a given technology generation and the CPI metric improved dramatically. However, for the same reason, the number of instructions per program also increased noticeably. Given the following parameters, compute the total performance improvement that occurred with this transition. Furthermore, compute the break-even clock frequency, break-even cycles per instruction, and break-even code expansion ratios for this transition, assuming the other two factors are held constant.

Performance factor                                  AS/400 CISC (IMPI) (Actual)   AS/400 RISC (PowerPC) (Actual)   Actual Ratio   Break-Even Ratio
Relative Frequency                                  50 MHz                        125 MHz                          2.5            ?
Cycles per Instruction                              7                             3                                0.43           ?
Relative Instructions per program (dynamic count)   1000                          3300                             3.3            ?

To compute the performance improvement, we evaluate performance (the reciprocal of execution time) before and after the transition.
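$$\text{Performance} = \frac{1}{\text{instruction count}} \times \frac{\text{instructions}}{\text{cycle}} \times \text{frequency}$$

$$\text{Performance}_{\text{old}} = \frac{1}{1000} \times \frac{1}{7} \times 50\,\text{M} \approx 7143$$

$$\text{Performance}_{\text{new}} = \frac{1}{3300} \times \frac{1}{3} \times 125\,\text{M} \approx 12626$$

$$\text{Improvement} = \frac{\text{new performance}}{\text{old performance}} = \frac{12626}{7143} \approx 1.77 \text{ times faster}$$

Break-even clock frequency (RISC CPI and instruction count held constant)

$$\frac{1}{3300} \times \frac{1}{3} \times f = 7143$$

$$f \approx 70.71 \text{ MHz}$$

Break-even cycles per instruction (RISC frequency and instruction count held constant)

$$\frac{1}{3300} \times \frac{1}{c} \times 125\,\text{M} = 7143$$

$$c \approx 5.3 \text{ cycles per instruction}$$

Break-even code expansion ratio (RISC frequency and CPI held constant)

$$\frac{1}{n} \times \frac{1}{3} \times 125\,\text{M} = 7143$$

$$n \approx 5833 \text{ instructions}$$

Relative to the CISC count of 1000 instructions, this is a code expansion ratio of about 5.83.

Performance Improvement = 1.77 times faster
Break-even clock frequency = 70.71 MHz (ratio of about 1.41 relative to 50 MHz)
Break-even cycles per instruction = 5.3 cycles per instruction (ratio of about 0.76 relative to 7)
Break-even code expansion ratio = 5.83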
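The Problem 1.15 arithmetic can be cross-checked with a short script. This is a minimal sketch rather than part of the assignment: the function name `performance` and the dictionary and variable names are illustrative assumptions, and the only inputs are the CISC and RISC parameters from the problem table. Each break-even value is found by holding two RISC factors fixed and solving for the third so that RISC performance equals the CISC performance.

```python
# Cross-check of the Problem 1.15 results.
# All names here are hypothetical; only the numeric parameters come from the problem table.

def performance(freq_hz, cpi, instr_count):
    """Relative performance = frequency / (CPI * instruction count)."""
    return freq_hz / (cpi * instr_count)

# Parameters from the problem table
CISC = {"freq_hz": 50e6, "cpi": 7, "instr_count": 1000}
RISC = {"freq_hz": 125e6, "cpi": 3, "instr_count": 3300}

perf_old = performance(**CISC)
perf_new = performance(**RISC)
improvement = perf_new / perf_old

# Break-even: keep two RISC factors fixed and solve for the third
# so that RISC performance equals the CISC performance.
breakeven_freq_hz = perf_old * RISC["cpi"] * RISC["instr_count"]
breakeven_cpi = RISC["freq_hz"] / (perf_old * RISC["instr_count"])
breakeven_instr = RISC["freq_hz"] / (perf_old * RISC["cpi"])

# Express each break-even value as a ratio to the CISC baseline
freq_ratio = breakeven_freq_hz / CISC["freq_hz"]
cpi_ratio = breakeven_cpi / CISC["cpi"]
code_expansion_ratio = breakeven_instr / CISC["instr_count"]

print(f"Improvement:            {improvement:.2f}x")
print(f"Break-even frequency:   {breakeven_freq_hz / 1e6:.2f} MHz (ratio {freq_ratio:.2f})")
print(f"Break-even CPI:         {breakeven_cpi:.2f} (ratio {cpi_ratio:.2f})")
print(f"Break-even instr count: {breakeven_instr:.0f} (ratio {code_expansion_ratio:.2f})")
```

Because the script carries full floating-point precision instead of the rounded intermediate value 7143, its last digits can differ slightly from hand calculations.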