CS1104 2001/02 Semester II
Help Session IIA: Performance Measures
Colin Tan
S15-04-05
Ctank@comp.nus.edu.sg

Basic Concepts: Instruction Execution Cycles
• Processors execute instructions in several steps:
  – Instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), write back result (WB).
  – Each step must complete before the next step can proceed correctly!
  – The steps are coordinated by a series of “ticks” called “clock cycles” (CC). Clock cycle n is denoted CCn.
  – So in our processor:
    – CC1: IF
    – CC2: ID
    – CC3: EX
    – CC4: MEM
    – CC5: WB
• So each instruction takes a certain number of cycles to execute.
• If the processor is NOT pipelined, an instruction may skip stages it does not need and hence take fewer cycles.
• The average number of cycles required by a particular instruction is called the instruction CPI.
  – E.g. ADD may require 2 cycles and SUB may require 3 cycles. The instruction CPI of ADD is therefore 2, and that of SUB is 3.

Basic Concepts: Instruction Frequency
• A program (e.g. Microsoft Word) is made up of many instructions, drawn from each of the different instruction types.
  – The number of instructions in each class is called the “instruction frequency” of that class.
  – E.g. there may be 1017 ADDs, 763 MULs, 27839 SUBs, etc.
  – This is often expressed as a percentage or as a fraction of the total number of instructions.

Basic Concepts: Average Cycles Per Instruction
• The instruction frequencies and the number of cycles each instruction requires (the instruction CPI) can be used to compute the average Cycles Per Instruction, or simply the CPI, of a particular program.
  – Each type of instruction may take a different number of clock cycles.
  – A program consists of several different types of instructions.
  – The average CPI is the average number of cycles required to execute each instruction, across all types of instructions.
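The average CPI can also be computed directly from raw per-class instruction counts: total cycles divided by total instructions. The sketch below is a minimal illustration, not part of the original notes. `average_cpi_from_counts` is a hypothetical helper; the counts reuse the example figures from the Instruction Frequency slide, ADD and SUB use the 2- and 3-cycle CPIs from the earlier example, and the MUL CPI of 4 is an assumed value chosen only for illustration.

```python
def average_cpi_from_counts(counts, class_cpi):
    """Average CPI of a program (hypothetical helper).

    counts:    instruction class -> number of instructions of that class
    class_cpi: instruction class -> instruction CPI of that class
    """
    # Total cycles = sum over classes of (count x instruction CPI).
    total_cycles = sum(counts[c] * class_cpi[c] for c in counts)
    # Instruction count (IC) of the whole program.
    total_instructions = sum(counts.values())
    # Dividing by IC is the same as weighting each class CPI by its frequency.
    return total_cycles / total_instructions


counts = {"ADD": 1017, "MUL": 763, "SUB": 27839}
# ADD = 2 and SUB = 3 follow the earlier example; MUL = 4 is an assumed value.
class_cpi = {"ADD": 2, "MUL": 4, "SUB": 3}

print(round(average_cpi_from_counts(counts, class_cpi), 3))  # 2.991
```

Because the counts are divided by their own total, the implied frequencies always sum to 1.0 and no separate normalization step is needed.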
Calculating Average CPI
• Find the overall CPI of a program running on a processor with the class CPIs and instruction frequencies shown here:

  Type                    Add    Sub    Mul    Div
  CPI                     3      2      4      5
  Instruction frequency   0.40   0.25   0.15   0.20

• Let’s assume that the total number of instructions is IC. Then there are 0.4IC ADD instructions, 0.25IC SUB instructions, 0.15IC MUL instructions and 0.2IC DIV instructions.
• The total number of clock cycles used by ADD instructions is 0.4IC x 3; by SUB, 0.25IC x 2; by MUL, 0.15IC x 4; and by DIV, 0.2IC x 5.
  – Hence the total number of clock cycles used by this program is 0.4IC x 3 + 0.25IC x 2 + 0.15IC x 4 + 0.2IC x 5.
  – The number of instructions is IC. Hence the average number of cycles per instruction (average CPI) is (0.4IC x 3 + 0.25IC x 2 + 0.15IC x 4 + 0.2IC x 5) / 1.0IC.
• IC cancels out, leaving 0.4 x 3 + 0.25 x 2 + 0.15 x 4 + 0.2 x 5 = 1.2 + 0.5 + 0.6 + 1.0 = 3.3.
• Hence for this program, each instruction requires, on average, 3.3 cycles.

Exercise
• Find the average CPI of the following program:
• The ratio of instructions is shown below:
• This gives us the following relative frequencies:
• Hence our average CPI is:
  – 0.36 x 2 + 0.32 x 2 + 0.28 x 6 + 0.04 x 12 = 0.72 + 0.64 + 1.68 + 0.48 = 3.52
• Thus, on average, each instruction will take 3.52 clock cycles.

Why is this useful?
• Each cycle that an instruction takes consumes time.
• If the clock rate of a CPU is 500 MHz, then each second there are 500,000,000 cycles (note: 1 MHz is 10^6 cycles per second, NOT 2^20!).
• Therefore each cycle takes 1/(500 x 10^6) seconds.
  – This works out to 2 ns per cycle.
• If there are IC instructions in a program (called the instruction count of the program), and if the average CPI is C, then the total number of cycles used by this program is IC x C.
• Each cycle takes 2 ns, so the program will require (IC x C x 2) ns to execute.
• This is called the execution time of the program, and it forms the basis for performance comparison.
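The arithmetic on these slides can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names (`average_cpi`, `cycle_time_ns`, `execution_time_ns`); `average_cpi` divides by the total frequency, so it also works when the frequencies do not sum to exactly 1.0. The exercise's instruction classes are not named in the notes, so the labels A to D are placeholders, and the instruction count of 1,000,000 in the last line is an assumed value.

```python
def average_cpi(freq, class_cpi):
    """Frequency-weighted average CPI, normalized by the total frequency."""
    weighted = sum(freq[c] * class_cpi[c] for c in freq)
    return weighted / sum(freq.values())


def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in ns (1 MHz = 10**6 cycles per second)."""
    return 1e9 / (clock_mhz * 1e6)


def execution_time_ns(instruction_count, cpi, clock_mhz):
    """Execution time = IC x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns(clock_mhz)


# Worked example: Add, Sub, Mul, Div.
example_cpi = average_cpi(
    {"ADD": 0.40, "SUB": 0.25, "MUL": 0.15, "DIV": 0.20},
    {"ADD": 3, "SUB": 2, "MUL": 4, "DIV": 5},
)
print(round(example_cpi, 2))  # 3.3

# Exercise: relative frequencies and CPIs; class labels are placeholders.
exercise_cpi = average_cpi({"A": 0.36, "B": 0.32, "C": 0.28, "D": 0.04},
                           {"A": 2, "B": 2, "C": 6, "D": 12})
print(round(exercise_cpi, 2))  # 3.52

print(cycle_time_ns(500))  # 2.0
# Hypothetical program of 1,000,000 instructions on the 500 MHz machine.
print(round(execution_time_ns(1_000_000, example_cpi, 500)))  # 6600000
```

The last result is in nanoseconds, i.e. 6.6 ms for the assumed instruction count.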
  – We take a program and run it on machine M1, measuring the execution time TM1, then run the same program on machine M2, measuring the execution time TM2. If TM1 > TM2, then M2 is faster than M1, by a factor of TM1 / TM2 (the speedup of M2 over M1).

Exercise
• Find i) the average CPI and ii) the execution time of the program below for machines M1 and M2, then find the speedup of M2 over M1.

How Caches Affect Performance
• Sometimes the instruction or data required is not present in the cache.
  – This is a cache miss!
  – The cache system needs to go to main memory to remedy the miss.
    – This takes many cycles!
• If execution proceeded anyway, the results would be meaningless.
  – Either the required instruction has not been loaded yet because of the cache miss, or the data has not been loaded.
• The CPU responds by freezing (stalling) the instruction for many cycles.
  – This gives memory time to deliver the instruction/data to the cache.
• When the cache miss is remedied, the CPU re-reads the cache.
• Hence cache misses add cycles to the instruction, and thus increase the instruction CPI.
• The equation given in the lecture notes is:
  CPI_memory = Instruction frequency x L1 miss rate x (L1 miss penalty + L2 miss rate x L2 miss penalty) + Data access frequency x L1 miss rate x (L1 miss penalty + L2 miss rate x L2 miss penalty)
• Note that we do not use the cache hit figures, because the basic instruction CPI already factors them in.
  – The basic instruction CPI already includes reading from the instruction cache assuming a cache hit, and reading from the data cache assuming a cache hit.
• Hence here we are only concerned with the cycles added because of cache misses.

Exercise
• Given the following program and machine, assume that the L1 miss rate is 0.05, the L1 miss penalty is 12 cycles, the L2 miss rate is 0.03 and the L2 miss penalty is 40 cycles. Find the average CPI.

One Last Exercise
• Moral: Always ensure that the frequencies add up to 1.0 (100%); otherwise you need to normalize the answer by dividing by the total frequency.
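The memory-stall equation and the speedup ratio can be sketched as below. This is an illustration, not the notes' own code: the function names are hypothetical, and while the miss rates and penalties are the ones given in the cache exercise, the exercise's program table is not reproduced here, so the instruction frequency of 1.0, the data access frequency of 0.3 and the base CPI of 2.0 are assumed values. Following the equation as given, the same L1 miss rate is applied to both instruction and data accesses.

```python
def memory_stall_cpi(instr_freq, data_freq, l1_miss_rate, l1_penalty,
                     l2_miss_rate, l2_penalty):
    """Extra cycles per instruction caused by cache misses (lecture equation)."""
    # Cycles lost per L1 miss: the L1 miss penalty, plus the L2 miss penalty
    # for the fraction of L1 misses that also miss in L2.
    cycles_per_l1_miss = l1_penalty + l2_miss_rate * l2_penalty
    instr_stalls = instr_freq * l1_miss_rate * cycles_per_l1_miss
    data_stalls = data_freq * l1_miss_rate * cycles_per_l1_miss
    return instr_stalls + data_stalls


def speedup(t_m1, t_m2):
    """Speedup of M2 over M1 = TM1 / TM2 (a value > 1 means M2 is faster)."""
    return t_m1 / t_m2


# Miss figures from the exercise; 1.0, 0.3 and the base CPI are assumed.
stalls = memory_stall_cpi(1.0, 0.3, 0.05, 12, 0.03, 40)
base_cpi = 2.0
print(round(stalls, 3))             # 0.858
print(round(base_cpi + stalls, 3))  # 2.858

# Same program on the same clock: execution time is proportional to CPI, so
# the speedup of a miss-free machine over the real one is the ratio of CPIs.
print(round(speedup(base_cpi + stalls, base_cpi), 3))  # 1.429
```

Passing CPIs to `speedup` is only valid here because the instruction count and clock rate are the same on both sides; in general it should be given execution times.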
Summary
• Instructions are timed using a central clock. Each tick of the clock is called a clock cycle, or simply a cycle.
• Each instruction requires a certain number of cycles, on average, to execute. This is the instruction CPI.
• Different instructions within a program have different CPIs; however, we can compute the average CPI across all instructions in a given program.
• Performance can be measured by running the same program on different machines. If the execution time on M1 is TM1 and on M2 is TM2, then the speedup of M1 over M2 is TM2/TM1, and vice versa.
• Cache misses cause the CPI of an instruction, and the overall CPI of a program, to go up.
  – The processor needs to freeze the instruction to allow memory to deliver the missing instruction/data to the cache.
• Remember to normalize your CPI if the total frequency does not add up to 1.0!

Further Reading
• Please read Dr. Ankush’s notes as well!