HW1

Erik Lee
EN 525.712 Advanced Computer Architecture
Charles B. Cameron
Homework Set 1 (P1.6, P1.12, P1.15)
2/7/12

Problem 1.6

A program's run time is determined by the product of instructions per program, cycles per instruction, and clock cycle time (the reciprocal of clock frequency). Assume the following instruction mix for a MIPS-like RISC instruction set: 15% stores, 25% loads, 15% branches, 35% integer arithmetic, 5% integer shift, and 5% integer multiply. Given that load instructions require two cycles, branches require four cycles, integer ALU instructions require one cycle, and integer multiplies require ten cycles, compute the overall CPI.

Stores = 15%, 1 cycle (the problem gives no store latency, so one cycle is assumed)
Loads = 25%, 2 cycles
Branches = 15%, 4 cycles
Integer arithmetic = 35%, 1 cycle
Integer shift = 5%, 1 cycle
Integer multiply = 5%, 10 cycles

To calculate the overall cycles per instruction, we add up the cycles needed by each instruction type, each multiplied by that type's fraction of the instruction mix. For example, 15% of instructions are stores requiring one cycle, to which we add the 25% that are loads requiring two cycles, and so on until all 100% of the mix is accounted for.

$$CPI = 1(0.15) + 2(0.25) + 4(0.15) + 1(0.35) + 1(0.05) + 10(0.05)$$
$$CPI = 0.15 + 0.50 + 0.60 + 0.35 + 0.05 + 0.50$$
$$CPI = 2.15$$

Overall CPI = 2.15

Problem 1.12

Using Amdahl's law, compute speedups for a program that is 98% vectorizable for a system with 16, 64, 256, and 1024 processors. What would be a reasonable number of processors to build into a system for running such an application?

Amdahl's law for relative speedup is:

$$S = \frac{1}{T} = \frac{1}{(1 - f) + \frac{f}{N}}$$

In this equation, f is the fraction of the program that can run in parallel (here 0.98), N is the number of processors available, and T is the resulting execution time normalized to a single processor. The speedup is therefore relative to a system with one processor.

$$S(f = 0.98, N = 16) = \frac{1}{(1 - 0.98) + \frac{0.98}{16}} \approx 12.3$$
$$S(f = 0.98, N = 64) = \frac{1}{(1 - 0.98) + \frac{0.98}{64}} \approx 28.3$$
$$S(f = 0.98, N = 256) = \frac{1}{(1 - 0.98) + \frac{0.98}{256}} \approx 41.97$$
$$S(f = 0.98, N = 1024) = \frac{1}{(1 - 0.98) + \frac{0.98}{1024}} \approx 47.7$$

N = 16, speedup ≈ 12.3
N = 64, speedup ≈ 28.3
N = 256, speedup ≈ 41.97
N = 1024, speedup ≈ 47.7

Graphing the relative speedup against the number of processors shows that the speedup grows more and more slowly as N increases; it can never exceed 1/(1 − f) = 50. The knee of the graph marks the transition point beyond which we get diminishing returns. Here the knee is taken to be the point where the slope dS/dN falls to 0.5, that is, where each additional processor adds only half a unit of speedup. So we rewrite the speedup equation, take its derivative with respect to N, set it equal to 0.5, and solve for N.

$$S = \frac{1}{(1 - 0.98) + \frac{0.98}{N}} = \frac{50N}{N + 49}$$
$$\frac{dS}{dN} = \frac{2450}{(N + 49)^2}$$
$$0.5 = \frac{2450}{(N + 49)^2} \;\Rightarrow\; (N + 49)^2 = 4900 \;\Rightarrow\; N = 21$$

A reasonable number of processors for this application is 21.

Problem 1.15

In 1995, the IBM AS/400 line of computers transitioned from a CISC instruction set to a RISC instruction set. Because of the simpler instruction set, the realizable clock frequency for a given technology generation and the CPI metric improved dramatically. However, for the same reason, the number of instructions per program also increased noticeably. Given the following parameters, compute the total performance improvement that occurred with this transition. Furthermore, compute the break-even clock frequency, break-even cycles per instruction, and break-even code expansion ratios for this transition, assuming the other two factors are held constant.

Performance factor               | Relative frequency | Cycles per instruction | Relative instructions per program (dynamic count)
AS/400 CISC (IMPI) (Actual)      | 50 MHz             | 7                      | 1000
AS/400 RISC (PowerPC) (Actual)   | 125 MHz            | 3                      | 3300
Actual ratio                     | 2.5                | 0.43                   | 3.3
Break-even ratio                 | ?                  | ?                      | ?

To compute the performance improvement, we need to calculate the time to execute the program before and after the transition.
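The before-and-after comparison can be sketched in Python. This is a minimal illustration assuming the iron law, time = instructions × CPI / frequency, with performance taken as 1/time; the helper name `performance` is hypothetical, and the instruction counts are the table's relative dynamic counts, so only the ratio between the two machines is meaningful.

```python
# Minimal sketch (assumption): iron law, time = instructions * CPI / frequency,
# and performance = 1 / time. Instruction counts are the table's relative values.

def performance(instructions: float, cpi: float, freq_hz: float) -> float:
    """Hypothetical helper: return 1 / execution time for one program run."""
    time_s = instructions * cpi / freq_hz
    return 1.0 / time_s

old = performance(instructions=1000, cpi=7, freq_hz=50e6)   # AS/400 CISC (IMPI)
new = performance(instructions=3300, cpi=3, freq_hz=125e6)  # AS/400 RISC (PowerPC)

print(f"old performance: {old:.0f}")
print(f"new performance: {new:.0f}")
print(f"improvement: {new / old:.2f}x")
```

Run as-is, it should print 7143, 12626, and an improvement of 1.77x.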
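Performance is the reciprocal of execution time:

$$\text{Performance} = \frac{1}{\text{instruction count}} \times \frac{\text{instructions}}{\text{cycle}} \times \text{frequency}$$

$$\text{Performance}_{old} = \frac{1}{1000} \times \frac{1}{7} \times 50\,\text{M} \approx 7143$$
$$\text{Performance}_{new} = \frac{1}{3300} \times \frac{1}{3} \times 125\,\text{M} \approx 12626$$
$$\text{Performance improvement} = \frac{\text{Performance}_{new}}{\text{Performance}_{old}} \approx \frac{12626}{7143} \approx 1.77\ \text{times faster}$$

Break-even clock frequency (RISC CPI and instruction count held constant):

$$\frac{1}{3300} \times \frac{1}{3} \times f = \text{Performance}_{old} \;\Rightarrow\; f \approx 70.71\ \text{MHz}$$

Break-even cycles per instruction (RISC frequency and instruction count held constant):

$$\frac{1}{3300} \times \frac{1}{c} \times 125\,\text{M} = \text{Performance}_{old} \;\Rightarrow\; c \approx 5.3\ \text{cycles per instruction}$$

Break-even code expansion ratio (RISC frequency and CPI held constant):

$$\frac{1}{i} \times \frac{1}{3} \times 125\,\text{M} = \text{Performance}_{old} \;\Rightarrow\; i \approx 5833\ \text{instructions}$$

Since the CISC program executes 1000 instructions, the break-even code expansion ratio is 5833/1000 ≈ 5.83.

Performance improvement ≈ 1.77 times faster
Break-even clock frequency ≈ 70.71 MHz (ratio 70.71/50 ≈ 1.41)
Break-even cycles per instruction ≈ 5.3 cycles per instruction (ratio 5.3/7 ≈ 0.76)
Break-even code expansion ratio ≈ 5.83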
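The three break-even values can be checked with a short Python sketch. It assumes the same iron-law model: each break-even value solves instructions × CPI / frequency = T_old for one factor while the other two keep their RISC values. The names `exec_time`, `CISC`, and `RISC` are illustrative, not from the original solution.

```python
# Minimal sketch (assumption): iron-law model, time = instructions * CPI / frequency.
# Each break-even value holds two RISC factors fixed and solves for the third so
# that the RISC execution time equals the CISC baseline time.

CISC = {"instructions": 1000, "cpi": 7, "freq_hz": 50e6}   # AS/400 IMPI
RISC = {"instructions": 3300, "cpi": 3, "freq_hz": 125e6}  # AS/400 PowerPC

def exec_time(instructions: float, cpi: float, freq_hz: float) -> float:
    """Hypothetical helper: relative execution time of one program run."""
    return instructions * cpi / freq_hz

t_old = exec_time(**CISC)  # baseline time the RISC machine must match

# Solve instructions * cpi / freq = t_old for one unknown at a time.
breakeven_freq = RISC["instructions"] * RISC["cpi"] / t_old
breakeven_cpi = t_old * RISC["freq_hz"] / RISC["instructions"]
breakeven_instr = t_old * RISC["freq_hz"] / RISC["cpi"]

print(f"break-even frequency: {breakeven_freq / 1e6:.2f} MHz "
      f"(ratio {breakeven_freq / CISC['freq_hz']:.2f})")
print(f"break-even CPI: {breakeven_cpi:.2f} "
      f"(ratio {breakeven_cpi / CISC['cpi']:.2f})")
print(f"break-even instructions: {breakeven_instr:.0f} "
      f"(ratio {breakeven_instr / CISC['instructions']:.2f})")
```

Run as-is, it should print 70.71 MHz (ratio 1.41), a CPI of 5.30 (ratio 0.76), and 5833 instructions (ratio 5.83), matching the hand calculation. Dividing by `t_old` directly, rather than by a rounded performance figure such as 7143, avoids the small rounding drift in the last digit.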
