Part I Notes

CS1104 2001/02 Semester II
Help Session IIA
Performance Measures
Colin Tan
S15-04-05
Ctank@comp.nus.edu.sg
Basic Concepts
Instruction Execution Cycles
• Processors execute instructions in several steps:
– Instruction fetch (IF), instruction decode (ID), execute (EX),
memory read (MEM), write result (WB).
– Previous step must complete before next step can proceed correctly!
– Coordination between steps relies on a series of “ticks” called
“clock cycles” (CC). Clock cycle n is denoted by CCn
– So in our processor:
– CC1: IF
– CC2: ID
– CC3: EX
– CC4: MEM
– CC5: WB
Basic Concepts
Instruction Execution Cycles
• So each instruction takes a certain number of
cycles to execute.
• If processor is NOT pipelined, then an instruction
may skip some stages and hence may have fewer
cycles.
• The average number of cycles required for a
particular instruction is called the instruction CPI.
– E.g. ADD may require 2 cycles, SUB may require 3
cycles. Instruction CPI of ADD is therefore 2, and SUB
is 3.
Basic Concepts
Instruction Frequency
• A program (e.g. Microsoft Word) is made up of many
instructions coming from each of the different types of
instructions.
– The number of instructions in each class is called the “instruction
frequency” of that class
– E.g. there may be 1017 ADDs, 763 MUL, 27839 SUB etc.
– This is often expressed as a percentage or as a fraction.
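As a minimal sketch of turning raw counts into instruction frequencies, the Python below uses the three counts quoted above. It assumes, purely for illustration, that these three classes make up the whole program (the "etc." above implies a real program has more); the function name `instruction_frequencies` is hypothetical.

```python
# Instruction counts per class, taken from the example above.
# Assumption: these three classes are the entire program.
counts = {"ADD": 1017, "MUL": 763, "SUB": 27839}

def instruction_frequencies(counts):
    """Return each class's share of the total instruction count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

freqs = instruction_frequencies(counts)
for name, f in freqs.items():
    # Show each frequency both as a fraction and as a percentage.
    print(f"{name}: {f:.4f} ({f:.2%})")
```

By construction the fractions sum to 1.0, which is the property the later CPI calculations rely on.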
Basic Concepts
Average Cycles Per Instruction
• The instruction frequency and the number of cycles an
instruction requires (instruction CPI) can be used to
compute the average Cycles Per Instruction, or
simply CPI, of a particular program.
– Each type of instruction would take a different number of clock
cycles.
– A program consists of several different types of instructions.
– The average CPI is the average number of cycles required to
execute each instruction, across all types of instructions.
Calculating Average CPI
• Find the overall CPI of a program running on a processor
with the class CPIs and instruction frequencies shown
here:
Type    CPI    Instruction Frequency
Add     3      0.40
Sub     2      0.25
Mul     4      0.15
Div     5      0.20
Calculating Average CPI
– Let’s assume that the total number of instructions is IC. Then there
are 0.4IC ADD instructions, 0.25IC SUB instructions, 0.15IC
MUL instructions and 0.2IC DIV instructions.
• Total number of clock cycles used by ADD instructions is
0.4IC x 3, SUB is 0.25IC x 2, MUL is 0.15IC x 4, DIV is
0.2IC x 5 cycles.
– Hence total number of clock cycles used by this program is 0.4IC x
3 + 0.25IC x 2 + 0.15IC x 4 + 0.2IC x 5
– Number of instructions is IC. Hence average number of cycles per
instruction (average CPI) is (0.4IC x 3 + 0.25IC x 2 + 0.15IC x 4 +
0.2IC x 5)/1.0IC
• IC cancels off, leaving 0.4 x 3 + 0.25 x 2 + 0.15 x 4 + 0.2 x 5
= 1.2 + 0.5 + 0.6 + 1.0, so the final answer is 3.3.
• Hence for this program, each instruction requires, on average, 3.3
cycles.
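The weighted sum above can be checked with a short Python sketch. The CPI and frequency values are the ones from the table in this example; the function name `average_cpi` and the dictionary layout are assumptions made for illustration.

```python
def average_cpi(classes):
    """Weighted average CPI.

    classes: iterable of (cpi, frequency) pairs whose frequencies
    are assumed to sum to 1.0.
    """
    return sum(cpi * freq for cpi, freq in classes)

# (class CPI, instruction frequency) for each instruction type.
table = {
    "Add": (3, 0.40),
    "Sub": (2, 0.25),
    "Mul": (4, 0.15),
    "Div": (5, 0.20),
}

avg = average_cpi(table.values())
print(f"average CPI = {avg:.2f}")
```

IC never appears in the code because it cancels out, exactly as in the derivation.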
Exercise
• Find the average CPI of the following
program:
Exercise
• Ratio of instructions is shown below:
• This gives us the following relative frequencies:
Exercise
• Hence our average CPI is:
– 0.36 x 2 + 0.32 x 2 + 0.28 x 6 + 0.04 x 12 = 0.72 + 0.64 + 1.68 + 0.48 = 3.52
• Thus, on average, each instruction will take 3.52 clock
cycles.
Why is this useful?
• Each cycle that an instruction takes consumes
time.
• If the clock rate of a CPU is 500 MHz, then each
second there will be 500,000,000 cycles (note: 1
MHz is 10^6 cycles, NOT 2^20 cycles!)
• Therefore each cycle requires 1/(500 x 10^6)
seconds
– This works out to 2 ns per cycle.
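A small Python sketch of the clock-rate to cycle-time conversion, using the 500 MHz figure from this slide. The names `MHZ` and `cycle_time_ns` are hypothetical helpers, not part of the notes.

```python
MHZ = 10**6  # 1 MHz is 10^6 cycles per second (decimal, not 2^20)

def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

period = cycle_time_ns(500 * MHZ)
print(f"{period} ns per cycle")  # prints "2.0 ns per cycle"
```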
But still..
Why is this useful?
• If there are IC instructions in a program (called the
instruction count of the program), and if the average CPI is
C, then the total number of cycles used by this program is
IC x C.
• Each cycle requires 2ns. So therefore the program will
require (IC x C x 2) ns to execute.
• This is called the execution time of the program, and forms
the basis for performance comparison.
– We take a program and run it on machine M1. Take the execution
time T_M1, then run the same program on machine M2, taking the
execution time T_M2. If T_M1 > T_M2, then machine M2 is faster than
M1, and it is faster by a factor of T_M1 / T_M2.
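The execution-time and speedup relations can be sketched in Python as follows. The instruction count, CPIs and clock rates are made-up values chosen only to illustrate the calculation; they do not come from the notes, and the function names are hypothetical.

```python
def execution_time_s(instruction_count, avg_cpi, clock_hz):
    """Execution time in seconds: (IC x CPI) cycles divided by cycles per second."""
    return instruction_count * avg_cpi / clock_hz

def speedup(t_slow, t_fast):
    """How many times faster the machine with time t_fast is."""
    return t_slow / t_fast

# Hypothetical program and machines (illustrative numbers only).
ic = 1_000_000                               # same program, same IC on both
t_m1 = execution_time_s(ic, 3.0, 500e6)      # M1: CPI 3.0 at 500 MHz
t_m2 = execution_time_s(ic, 2.0, 500e6)      # M2: CPI 2.0 at 500 MHz

print(f"T_M1 = {t_m1 * 1e3:.1f} ms, T_M2 = {t_m2 * 1e3:.1f} ms")
print(f"speedup of M2 over M1 = {speedup(t_m1, t_m2):.2f}")
```

With both machines at the same clock rate, the speedup reduces to the ratio of the two average CPIs.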
Exercise
• Find i) average CPI, ii) Execution time of the program below for
machines M1 and M2, then find the speedup of M2 over M1.
How Caches Affect Performance
• Sometimes the instruction/data required is not present in the cache
– This is a cache miss!
– Cache system needs to go to main memory to remedy the miss.
• This will take many many cycles!
• If execution proceeds, the results will be meaningless
– Either the required instruction is not loaded yet because of the cache miss,
or the data is not loaded.
• CPU responds by freezing the instruction for many cycles
– This is to give memory time to produce the instruction/data for the cache
• When cache miss is remedied, CPU re-reads the cache.
• Hence cache misses add cycles to the instruction, and thus affect the
instruction CPI.
How Caches Affect Performance
• Eqn given in lecture notes is:
• CPI_memory
= Instruction Frequency * L1 miss rate *
(L1 miss penalty + L2 miss rate * L2 miss penalty)
+ Data Access Frequency * L1 miss rate *
(L1 miss penalty + L2 miss rate * L2 miss penalty)
• Note that we do not use the cache hit figures because the basic
instruction CPI already factors this in
– The basic instruction CPI includes reading from the instruction cache
assuming a cache hit, or reading from data cache assuming a cache hit.
• Hence here we are only concerned with cycles added because of a
cache miss.
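A minimal Python sketch of the equation above. All numeric values are illustrative assumptions: an instruction frequency of 1.0 (every instruction is fetched once) and a data access frequency of 0.3 are made up for the example, as are the miss rates and penalties. The function names are hypothetical.

```python
def miss_cycles_per_access(l1_miss_rate, l1_penalty, l2_miss_rate, l2_penalty):
    """Average extra cycles per memory access caused by cache misses."""
    return l1_miss_rate * (l1_penalty + l2_miss_rate * l2_penalty)

def memory_cpi(instr_freq, data_freq,
               l1_miss_rate, l1_penalty, l2_miss_rate, l2_penalty):
    """Extra CPI from misses on instruction fetches plus data accesses."""
    per_access = miss_cycles_per_access(
        l1_miss_rate, l1_penalty, l2_miss_rate, l2_penalty)
    return instr_freq * per_access + data_freq * per_access

# Illustrative values only (assumptions, not a solution to any exercise).
extra = memory_cpi(instr_freq=1.0, data_freq=0.3,
                   l1_miss_rate=0.05, l1_penalty=12,
                   l2_miss_rate=0.03, l2_penalty=40)
print(f"CPI_memory = {extra:.3f}")
```

The result is added to the basic average CPI, which already covers the cache-hit case.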
Exercise
• Given the following program and machine, assume that L1 miss rate is
0.05, L1 miss penalty is 12 cycles, L2 miss rate is 0.03, L2 miss
penalty is 40 cycles, find the average CPI.
One Last Exercise
• Moral: Always ensure that the frequencies add up to 1.0
(100%), otherwise you need to normalize the answer by
dividing by the total frequency.
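The normalization step can be sketched in Python as below. The (CPI, frequency) pairs are hypothetical values whose frequencies deliberately sum to 1.2 rather than 1.0; the function name `normalized_cpi` is also an assumption.

```python
def normalized_cpi(classes):
    """Average CPI that stays correct when frequencies do not sum to 1.0.

    classes: list of (cpi, frequency) pairs.
    """
    total_freq = sum(freq for _, freq in classes)
    weighted = sum(cpi * freq for cpi, freq in classes)
    return weighted / total_freq

# Hypothetical mix: frequencies add up to 1.2, so normalization matters.
mix = [(2, 0.5), (4, 0.3), (6, 0.4)]
print(f"normalized CPI = {normalized_cpi(mix):.3f}")
```

When the frequencies already sum to 1.0 the division changes nothing, so the same function covers both cases.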
Summary
• Instructions are timed using a central clock. Each tick of
the clock is called a clock cycle, or simply a cycle.
• Each instruction will require a certain number of cycles on
average to operate. This is the instruction CPI.
• Different instructions within a program will have different
CPI, however we can compute the average CPI across all
instructions in a given program.
• Performance can be measured by running the same
program on different machines. If execution time on M1 is
T_M1, on M2 is T_M2, then the speedup of M1 over M2 is
T_M2 / T_M1, and vice-versa.
Summary
• Cache misses cause the CPI of an
instruction, and the overall CPI of a
program to go up.
– Processor needs to freeze instruction to allow
memory to deliver missing instruction/data to
cache.
• Remember to normalize your CPI if the
total frequency does not add up to 1.0!
Further Reading
• Please read Dr. Ankush’s notes as well!