Introduction - Dr. Iyad Jafar

Evaluating Performance
Chapter 1
Section 1.4
Dr. Iyad F. Jafar
Outline
• Introduction
• CPU Execution Time
• The Performance Equation
• Determinants of Performance
• SPEC Benchmark
• Other Performance Metrics
• Examples
Introduction
• Given a collection of computers, which one to buy?
  • Best performance?
  • Least cost?
  • Best cost/performance?
• How to define performance?
  • Time required to finish a task (individual users)
  • Number of tasks executed per unit time (throughput)
• Less time to finish implies better performance
Performance = 1 / Execution Time
• If X is n times faster than Y, then
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} = n$$
CPU Execution Time
• How to measure the execution time?
• Almost all modern computers are based on a clock
• The clock is a periodic square wave with a known period (cycle time)
• Period = 1 / Frequency
[Figure: clock waveform with one clock cycle marked]
• The base unit in measuring time is the cycle time, thus
Time = cycles * cycle time
Time = cycles / clock frequency
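The two forms of the time equation can be checked against each other numerically. The following is a minimal Python sketch; the function names and the sample values (10 million cycles on a 2 GHz clock) are illustrative assumptions, not taken from the slides.

```python
def cycle_time(frequency_hz):
    """Clock period in seconds: Period = 1 / Frequency."""
    return 1.0 / frequency_hz


def time_from_cycles(cycles, frequency_hz):
    """Time = cycles * cycle time, which equals cycles / clock frequency."""
    return cycles * cycle_time(frequency_hz)


# Assumed sample values: 10 million cycles on a 2 GHz clock.
period = cycle_time(2e9)                      # 0.5 ns
elapsed = time_from_cycles(10_000_000, 2e9)   # 5 ms

print(f"period = {period * 1e9:.2f} ns, time = {elapsed * 1e3:.2f} ms")
```

Doubling the clock frequency halves the cycle time, so the same number of cycles takes half as long.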
CPU Execution Time
• The time required to execute a program is essentially the time required to execute its instructions!
Time = #instructions x cycle time
• However, not all instructions take the same time!
• One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction
Time = #instructions x average cycles per instruction x cycle time
Time = IC x CPI x CC
CPU Execution Time
• The average CPI is computed by
$$\text{Effective CPI} = \frac{\sum_{k=1}^{N} IC_k \times CPI_k}{IC}$$
• Where
  • IC_k is the number of instructions of class k executed
  • CPI_k is the number of clock cycles per instruction for that instruction class
  • N is the number of instruction classes
• Note: The overall effective CPI varies by instruction mix, a measure of the dynamic frequency of instructions across one or many programs
The Performance Equation
Execution Time = IC x CPI x CC
Execution Time = IC x CPI / CR
Performance = 1 / Execution Time
• Notes
  • Three key factors for performance: IC, CPI, and CC
  • CC: the clock cycle time is usually given, often as the clock rate CR = 1 / CC
  • IC: the overall count of executed instructions, obtained using profilers or simulators
  • CPI: varies by instruction type and ISA implementation
The Performance Equation
• Example 1. In a certain program, 1000 instructions were executed on a CPU running at 1 GHz. If the instruction counts and CPI for each class are given below, how long does it take to execute the program?

Instruction Class   Instruction Count   Class CPI
1                   200                 2
2                   300                 3
3                   500                 1

Effective CPI = (200 x 2 + 300 x 3 + 500 x 1) / 1000 = 1.8
Time = 1000 x 1.8 x 1 ns = 1800 ns = 1.8 µs
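The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal Python sketch, not part of the original slides; the function names `effective_cpi` and `execution_time` are hypothetical helpers introduced here for illustration.

```python
def effective_cpi(mix):
    """Weighted average CPI; mix is a list of (instruction_count, class_cpi) pairs."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cpi for count, cpi in mix)
    return total_cycles / total_instructions


def execution_time(ic, cpi, clock_rate_hz):
    """Execution Time = IC x CPI / CR, in seconds."""
    return ic * cpi / clock_rate_hz


# Instruction counts and class CPIs from Example 1.
mix = [(200, 2), (300, 3), (500, 1)]

cpi = effective_cpi(mix)                 # 1800 cycles / 1000 instructions
time_s = execution_time(1000, cpi, 1e9)  # 1 GHz clock

print(f"Effective CPI = {cpi:.1f}")
print(f"Time = {time_s * 1e6:.1f} us")
```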
The Performance Equation
• Example 2. Suppose computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster?
TimeA = IC x 2.0 x 250 ps = 500 x IC ps
TimeB = IC x 1.2 x 500 ps = 600 x IC ps
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Time}_B}{\text{Time}_A} = \frac{600 \times IC}{500 \times IC} = 1.2$$
Computer A is 1.2 times faster than B
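Because both computers run the same program, the instruction count cancels in the ratio, so only CPI and cycle time matter. A minimal Python sketch of that comparison follows; the function name `relative_performance` is a hypothetical helper, and the cancellation assumes both machines execute the same number of instructions.

```python
def relative_performance(cpi_a, cycle_a, cpi_b, cycle_b):
    """Performance_A / Performance_B = Time_B / Time_A.

    Assumes the same instruction count on both machines, so IC cancels.
    """
    return (cpi_b * cycle_b) / (cpi_a * cycle_a)


# Values from Example 2 (cycle times in seconds).
ratio = relative_performance(cpi_a=2.0, cycle_a=250e-12,
                             cpi_b=1.2, cycle_b=500e-12)

print(f"A is {ratio:.1f} times faster than B")
```

A ratio above 1 means A is faster; a ratio below 1 would mean B is faster.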
The Performance Equation
• Example 3. A certain processor that has four instruction classes is to be modified using different approaches. The details of the program used in evaluating the different approaches are given in the table below.
What is the effective CPI for
  • The original processor
  • Approach 1. A cache is added, reducing the average load time to 2 cycles.
  • Approach 2. A branch prediction scheme is used, cutting the branch time by 1 cycle.
  • Approach 3. A second ALU is added to execute two ALU instructions at once.

                                 CPIk x F
Class    Frequency   Class CPI   Original   App1    App2    App3
ALU      50%         1           0.5        0.5     0.5     0.25
Load     20%         5           1.0        0.4     1.0     1.0
Store    10%         3           0.3        0.3     0.3     0.3
Branch   20%         2           0.4        0.4     0.2     0.4
Effective CPI                    2.2        1.6     2.0     1.95
Speedup                                     1.375   1.10    1.128
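The table entries can be regenerated from the class frequencies and per-class CPIs. The Python sketch below is an illustration under two assumptions that the slide leaves implicit: the instruction count and clock cycle are unchanged by each approach (so speedup is just the ratio of effective CPIs), and Approach 3 is modeled as an ALU CPI of 0.5, which matches the 0.25 entry in the table.

```python
# Instruction mix and original per-class CPI from Example 3.
frequency = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
original_cpi = {"ALU": 1, "Load": 5, "Store": 3, "Branch": 2}

# Each approach overrides the CPI of one class.
approaches = {
    "Original": original_cpi,
    "App1": {**original_cpi, "Load": 2},     # cache: loads take 2 cycles
    "App2": {**original_cpi, "Branch": 1},   # branch prediction: 1 cycle less
    "App3": {**original_cpi, "ALU": 0.5},    # two ALU instructions per cycle (assumed model)
}


def effective_cpi(freq, cpi):
    """Sum of CPI_k x F_k over all instruction classes."""
    return sum(freq[k] * cpi[k] for k in freq)


results = {name: effective_cpi(frequency, cpi) for name, cpi in approaches.items()}
# Speedup relative to the original, assuming IC and clock cycle stay the same.
speedup = {name: results["Original"] / value for name, value in results.items()}

for name in approaches:
    print(f"{name:8s} CPI = {results[name]:.2f}  speedup = {speedup[name]:.3f}")
```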
Determinants of Performance
Execution Time = IC x CPI x CC

                         IC    CPI   CC
Algorithm                X     X
Programming Language     X     X
Compiler                 X     X
ISA                      X     X     X
Processor Organization         X     X
Technology                           X
SPEC Benchmark
• What programs can be used to evaluate different computers? Can we cheat? Need a standard!
• SPEC Benchmark
  • Standard Performance Evaluation Corp (SPEC)
  • Programs used to measure performance (CPU, Web, I/O…)
  • Typical actual workloads
• SPEC CPU2006
  • Elapsed time to execute a selection of programs
  • Negligible I/O, so focuses on CPU performance
  • Summarize as geometric mean of performance ratios
  • CINT2006 (integer) and CFP2006 (floating-point)
$$\text{Geometric Mean} = \sqrt[n]{\prod_{i=1}^{n} \text{Execution Time Ratio}_i}$$
SPEC Benchmark
CINT2006 for Opteron X4 2356

Name         Description                     IC×10^9   CPI     Tc (ns)   Exec time (s)   Ref time (s)   SPECratio
perl         Interpreted string processing   2,118     0.75    0.40      637             9,777          15.3
bzip2        Block-sorting compression       2,389     0.85    0.40      817             9,650          11.8
gcc          GNU C Compiler                  1,050     1.72    0.40      724             8,050          11.1
mcf          Combinatorial optimization      336       10.00   0.40      1,345           9,120          6.8
go           Go game (AI)                    1,658     1.09    0.40      721             10,490         14.6
hmmer        Search gene sequence            2,783     0.80    0.40      890             9,330          10.5
sjeng        Chess game (AI)                 2,176     0.96    0.40      837             12,100         14.5
libquantum   Quantum computer simulation     1,623     1.61    0.40      1,047           20,720         19.8
h264avc      Video compression               3,102     0.80    0.40      993             22,130         22.3
omnetpp      Discrete event simulation       587       2.94    0.40      690             6,250          9.1
astar        Games/path finding              1,082     1.79    0.40      773             7,020          9.1
xalancbmk    XML parsing                     1,058     2.70    0.40      1,143           6,900          6.0
Geometric mean                                                                                          11.7

Note: high cache miss rates account for the largest CPI values (e.g., mcf)
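The summary figure of 11.7 can be reproduced from the SPECratio column. The Python sketch below is an illustration only: the function name `geometric_mean` is a hypothetical helper, and the mean is computed through logarithms, which is algebraically the same as taking the n-th root of the product.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# SPECratio column of the CINT2006 table, in order from perl to xalancbmk.
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5,
               14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

# A SPECratio is the reference time divided by the measured execution time.
# For perl: 9,777 s / 637 s.
perl_ratio = 9777 / 637

gm = geometric_mean(spec_ratios)

print(f"perl SPECratio = {perl_ratio:.1f}")
print(f"geometric mean = {gm:.1f}")
```

Unlike the arithmetic mean, the geometric mean gives the same relative ranking of two machines no matter which machine is used as the reference.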
SPEC Benchmark
CINT2000 Results for Various Processors
[Figure: CINT2000 rating versus clock speed (GHz) for Pentium 3, Pentium 4, P4 Extreme, Xeon, Athlon, Athlon 64, Opteron, Pmac G5, Athlon FX (DC), Core Duo, and Core 2 Duo]
SPEC Benchmark
CFP2000 Results for Various Processors
[Figure: CFP2000 rating versus clock speed (GHz) for Pentium 3, Pentium 4, P4 Extreme, Xeon, Athlon, Athlon 64, Opteron, Pmac G5, Athlon FX (DC), Core Duo, and Core 2 Duo]
Other Performance Factors
• Power consumption is one factor in evaluating performance
• This is especially important in the embedded market, where battery life and passive cooling matter
Examples
• Example 4. Given a program with 10^6 instructions with the following mix: 10% class A, 20% class B, 50% class C, and 20% class D. If this program is executed on two different processors with the specifications given below, then

Processor   CR (GHz)   CPI Class A   CPI Class B   CPI Class C   CPI Class D
1           1.5        1             2             3             4
2           2          2             2             2             2

  • What is the effective CPI for the program for each implementation?
  • Which implementation is faster?
  • What is the speedup?
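One way to work through the three questions is sketched below in Python. This is not a solution given in the slides; it assumes the instruction count is 10^6 (the exponent appears to have been lost in the source text) and the dictionary layout and names are illustrative.

```python
IC = 10**6  # assumed instruction count (10^6)

# Instruction mix from Example 4.
mix = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}

# Clock rate (Hz) and per-class CPI for each processor.
processors = {
    1: {"clock_rate": 1.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 4}},
    2: {"clock_rate": 2.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}

effective_cpi = {}
exec_time = {}
for name, spec in processors.items():
    # Effective CPI = sum of frequency x class CPI.
    effective_cpi[name] = sum(mix[k] * spec["cpi"][k] for k in mix)
    # Execution Time = IC x CPI / CR.
    exec_time[name] = IC * effective_cpi[name] / spec["clock_rate"]

faster = min(exec_time, key=exec_time.get)
slower = max(exec_time, key=exec_time.get)
speedup = exec_time[slower] / exec_time[faster]

for name in processors:
    print(f"Processor {name}: CPI = {effective_cpi[name]:.2f}, "
          f"time = {exec_time[name] * 1e3:.3f} ms")
print(f"Processor {faster} is faster, speedup = {speedup:.3f}")
```

Under these assumptions the sketch gives an effective CPI of 2.8 for processor 1 and 2.0 for processor 2, with processor 2 about 1.87 times faster.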
Examples
• Example 5. The information for some program that is executed on some processor is given below. If the processor is modified such that the CPI for Class 2 instructions is reduced to 2, would it be beneficial to adopt this modification if the modification
  • requires increasing the clock cycle by 10%?
  • does not affect the clock cycle but requires twice the amount of power to execute the program?

Class i   CPI_i   Frequency_i
1         2       0.3
2         5       0.2
3         3       0.5
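A possible way to evaluate both scenarios is sketched below in Python. It is not a solution from the slides. For the second scenario it assumes that energy equals power multiplied by execution time, which is one reasonable way to weigh the doubled power against the shorter run time; whether the trade is worthwhile still depends on what the designer values.

```python
# Instruction mix and per-class CPI from Example 5.
frequency = {1: 0.3, 2: 0.2, 3: 0.5}
cpi_original = {1: 2, 2: 5, 3: 3}
cpi_modified = {**cpi_original, 2: 2}   # Class 2 CPI reduced to 2


def effective_cpi(freq, cpi):
    """Sum of CPI_i x Frequency_i over all classes."""
    return sum(freq[k] * cpi[k] for k in freq)


cpi_old = effective_cpi(frequency, cpi_original)
cpi_new = effective_cpi(frequency, cpi_modified)

# Scenario 1: clock cycle grows by 10%; IC is unchanged, so
# speedup = (CPI_old x CC) / (CPI_new x 1.1 x CC).
speedup_slower_clock = cpi_old / (cpi_new * 1.10)

# Scenario 2: same clock cycle, twice the power.
speedup_same_clock = cpi_old / cpi_new
# Assumption: energy = power x time, so relative energy = 2 x (new time / old time).
relative_energy = 2.0 * (cpi_new / cpi_old)

print(f"CPI: {cpi_old:.2f} -> {cpi_new:.2f}")
print(f"Scenario 1 speedup: {speedup_slower_clock:.3f}")
print(f"Scenario 2 speedup: {speedup_same_clock:.3f}, "
      f"relative energy: {relative_energy:.3f}")
```

With these numbers the first scenario still yields a speedup above 1, while the second runs faster but, under the stated energy assumption, consumes roughly 1.6 times the energy.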