Computer Architecture

Recap
Measuring Performance
• A computer user cares about response time (execution time).
• A computer center manager cares about throughput: the total amount of work done in a period of time.
• CPU time is a very good and fair measure of performance.
• CPU time can be further divided into user CPU time (spent in the program) and system CPU time (spent in the OS).
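The distinction between response time, user CPU time, and system CPU time can be observed directly. The following is a minimal sketch, assuming a Python interpreter and a made-up workload (`busy_loop` is hypothetical, chosen only because it is pure computation); it uses the standard `time.perf_counter()` and `os.times()` calls.

```python
import os
import time

def measure(work):
    """Run work() and return (elapsed, user_cpu, system_cpu) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    work()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,              # response (elapsed) time
        cpu_end.user - cpu_start.user,      # user CPU time (program)
        cpu_end.system - cpu_start.system,  # system CPU time (OS)
    )

def busy_loop():
    # Hypothetical workload: pure computation, so mostly user CPU time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

elapsed, user_cpu, system_cpu = measure(busy_loop)
print(f"response time  : {elapsed:.4f} s")
print(f"user CPU time  : {user_cpu:.4f} s")
print(f"system CPU time: {system_cpu:.4f} s")
```

On a loaded machine the response time can be noticeably larger than the sum of the two CPU times, because it also includes time spent waiting for I/O or for other programs.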
Aspects of CPU Execution Time
CPU Time = Instruction count x CPI x Clock cycle
• Instruction Count I depends on: Program Used, Compiler, ISA
• CPI depends on: Program Used, Compiler, ISA, CPU Organization
• Clock Cycle C depends on: CPU Organization, Technology
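The CPU time equation is easy to evaluate numerically. The sketch below is illustrative only: the function name `cpu_time` and the program parameters (2 billion instructions, CPI of 1.5, 2 GHz clock) are assumptions, not values from the slides.

```python
def cpu_time(instruction_count, cpi, clock_cycle_s):
    """CPU time in seconds = I x CPI x C."""
    return instruction_count * cpi * clock_cycle_s

# Hypothetical program and machine (assumed values).
instruction_count = 2e9          # I: instructions executed
cpi = 1.5                        # average cycles per instruction
clock_rate_hz = 2e9              # 2 GHz clock
clock_cycle_s = 1 / clock_rate_hz  # C: 0.5 ns per cycle

t = cpu_time(instruction_count, cpi, clock_cycle_s)
print(f"CPU time = {t:.2f} s")
```

Note that C is the cycle time, the reciprocal of the clock rate, so a faster clock means a smaller C.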
Factors Affecting CPU Performance
CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                                     Instruction Count I   CPI   Clock Cycle C
Program                                       X             X
Compiler                                      X             X
Instruction Set Architecture (ISA)            X             X
Organization                                                X          X
Technology                                                             X
Example: Tradeoff Between C and CPI
Op         Frequency   Cycle Count
ALU ops    43%         1
Loads      21%         1
Stores     12%         2
Branches   24%         2
• Assume stores can be made to execute in 1 cycle at the cost of slowing the clock by 15%
• Should this be implemented?
Simple Example
• Old CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 2 + 0.24 x 2 = 1.36
• New CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 1 + 0.24 x 2 = 1.24
• Speedup = old time / new time
– = (I x old CPI x C) / (I x new CPI x 1.15 C)
– = 1.36 / (1.24 x 1.15) = 0.95
• Answer: Don’t make the change. A speedup below 1 means the modified design is about 5% slower.
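The calculation above can be checked with a few lines of code. This is a minimal sketch using the instruction mix from the example; the helper name `average_cpi` and the variable names are illustrative assumptions.

```python
# Instruction mix from the example: op -> (frequency, cycle count).
mix = {
    "ALU ops":  (0.43, 1),
    "Loads":    (0.21, 1),
    "Stores":   (0.12, 2),
    "Branches": (0.24, 2),
}

def average_cpi(mix):
    """Weighted average of cycle counts, weighted by instruction frequency."""
    return sum(freq * cycles for freq, cycles in mix.values())

old_cpi = average_cpi(mix)

# Proposed change: stores take 1 cycle, but the clock cycle grows by 15%.
new_mix = dict(mix, Stores=(0.12, 1))
new_cpi = average_cpi(new_mix)
clock_slowdown = 1.15

# The instruction count I and the original cycle time C cancel out.
speedup = old_cpi / (new_cpi * clock_slowdown)

# Largest clock slowdown for which the change would still break even.
break_even = old_cpi / new_cpi

print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}")
print(f"speedup = {speedup:.2f}")
print(f"break-even clock slowdown = {break_even:.3f}x")
```

Under the same assumptions, the change would only pay off if the clock slowed by less than roughly 9.7% (1.36 / 1.24 ≈ 1.097), which is why a 15% slowdown loses.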
Some Caveats
• Interdependence of I, CPI, and C: an improvement in one may hurt another
– Increasing pipeline depth tends to increase clock speed but may increase CPI
– Changing the ISA to reduce instruction count may require a design with a slower clock, so it may not improve performance
– CPI depends on the instruction mix, so a smaller instruction count may not improve performance
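To make the last two caveats concrete, the sketch below compares two designs. All numbers are hypothetical and chosen only for illustration: design B executes 20% fewer instructions, but its richer instructions raise CPI and lengthen the clock cycle.

```python
def cpu_time(instruction_count, cpi, clock_cycle_s):
    """CPU time in seconds = I x CPI x C."""
    return instruction_count * cpi * clock_cycle_s

# Hypothetical design A: simpler ISA, more instructions, faster clock.
time_a = cpu_time(instruction_count=1.0e9, cpi=1.2, clock_cycle_s=0.50e-9)

# Hypothetical design B: 20% fewer instructions, but higher CPI and slower clock.
time_b = cpu_time(instruction_count=0.8e9, cpi=1.5, clock_cycle_s=0.55e-9)

print(f"design A: {time_a:.2f} s")
print(f"design B: {time_b:.2f} s")
print("B is faster" if time_b < time_a else "B is slower despite fewer instructions")
```

Only the product of all three factors decides which design wins; looking at any single factor in isolation can be misleading.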
Code Size & Performance
[Figure: performance factor (axis 1 to 12) for the machines KDF9, B5500, ICL 1907, ATLAS, CDC 6600, and NU 1108, comparing instructions executed, code size in instructions, code size in bits, and time. The figure carries the annotation "ICL 1907 1.1 s".]
Benchmarks and Benchmarking
• Lacking a universal task, pick a set of programs that represent common tasks
• Use these representative programs to compare the performance of systems
• CAUTIONS:
– Comparisons are only as good as the benchmarks are at representing your real workload
– Many parameters affect measured performance
Example: We Must Use the Same Compiler
• Compiler “enhancements” and performance
[Figure: SPEC performance ratio (axis 0 to 800) for each benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the compiler with the enhanced compiler. 1998 Morgan Kaufmann Publishers]
Benchmark Suites
• A suite is a collection of representative benchmarks from different application domains
• A weakness of any one benchmark is likely to be compensated by another
• Standard Performance Evaluation Corporation (SPEC)
– Most popular benchmark suite
– Suite consists of kernels, small fragments, and large applications
– SPEC2006: CINT2006, CFP2006
– http://www.spec.org/
• Benchmark suites for servers
– SPECSFS: measures performance of file servers
– SPECWeb: measures performance of web servers
SPEC CPU2006 Programs
CINT2006 (Integer)
Benchmark        Language   Description
400.perlbench    C          Programming Language
401.bzip2        C          Compression
403.gcc          C          C Compiler
429.mcf          C          Combinatorial Optimization
445.gobmk        C          Artificial Intelligence: Go
456.hmmer        C          Search Gene Sequence
458.sjeng        C          Artificial Intelligence: chess
462.libquantum   C          Physics / Quantum Computing
464.h264ref      C          Video Compression
471.omnetpp      C++        Discrete Event Simulation
473.astar        C++        Path-finding Algorithms
483.xalancbmk    C++        XML Processing
Source: http://www.spec.org/osg/cpu2006/CINT2006/
SPEC CPU2006 Programs
CFP2006 (Floating Point)
Benchmark        Language     Description
410.bwaves       Fortran      Fluid Dynamics
416.gamess       Fortran      Quantum Chemistry
433.milc         C            Physics / Quantum Chromodynamics
434.zeusmp       Fortran      Physics / CFD
435.gromacs      C, Fortran   Biochemistry / Molecular Dynamics
436.cactusADM    C, Fortran   Physics / General
437.leslie3d     Fortran      Fluid Dynamics
444.namd         C++          Biology / Molecular Dynamics
447.dealII       C++          Finite Element Analysis
450.soplex       C++          Linear Programming, Optimization
453.povray       C++          Image Ray-tracing
454.calculix     C, Fortran   Structural Mechanics
459.GemsFDTD     Fortran      Computational Electromagnetics
465.tonto        Fortran      Quantum Chemistry
470.lbm          C            Fluid Dynamics
481.wrf          C, Fortran   Weather
482.sphinx3      C            Speech
Source: http://www.spec.org/osg/cpu2006/CFP2006/
Top 20 SPEC CPU2006 Results (As of August 2007)
    Top 20 SPECint2006                            Top 20 SPECfp2006
 #  MHz   Processor           int peak  int base  MHz   Processor               fp peak  fp base
 1  3000  Core 2 Duo E6850    22.6      20.2      4700  POWER6                  22.4     17.8
 2  4700  POWER6              21.6      17.8      3000  Core 2 Duo E6850        19.3     18.7
 3  3000  Xeon 5160           21.0      17.9      1600  Dual-Core Itanium 2     18.1     17.3
 4  3000  Xeon X5365          20.8      18.9      1600  Dual-Core Itanium 2     17.8     17.0
 5  2666  Core 2 Duo E6750    20.5      18.3      2666  Core 2 Duo E6750        17.7     17.1
 6  2667  Core 2 Duo E6700    20.0      17.9      3000  Xeon 5160               17.7     17.1
 7  2667  Core 2 Quad Q6700   19.7      17.6      3000  Opteron 2222            17.4     16.0
 8  2666  Xeon X5355          19.1      17.3      2667  Core 2 Duo E6700        16.9     16.3
 9  2666  Xeon 5150           19.1      17.3      2800  Opteron 2220            16.7     13.3
10  2666  Xeon X5355          18.9      17.2      3000  Xeon 5160               16.6     16.1
11  2667  Xeon X5355          18.6      16.8      2667  Xeon X5355              16.6     16.1
12  2933  Core 2              18.5      17.8      2667  Core 2 Quad Q6700       16.6     16.1
13  2400  Core 2 Quad Q6600   18.5      16.5      2666  Xeon X5355              16.6     16.1
14  2600  Core 2 Duo X7800    18.3      16.4      2933  Core 2 Extreme X6800    16.2     16.0
15  2667  Xeon 5150           17.6      16.6      2400  Core 2 Quad Q6600       16.0     15.4
16  2400  Core 2 Duo T7700    17.6      16.6      1400  Dual-Core Itanium 2     15.9     15.2
17  2333  Xeon E5345          17.5      15.9      2667  Xeon 5150               15.9     15.5
18  2333  Xeon 5148           17.4      15.9      2333  Xeon E5345              15.4     14.9
19  2333  Xeon 5140           17.4      15.7      2600  Opteron 2218            15.4     12.5
20  2660  Xeon X5355          17.4      15.7      2400  Xeon X3220              15.3     15.1
Source: http://www.spec.org/cpu2006/results/cint2006.html
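One way to read these results is to relate each score to clock rate, which shows that clock rate alone does not determine performance (CPI and instruction count matter too). The sketch below uses a few base scores copied from the results above; the "score per GHz" metric and the helper name `score_per_ghz` are illustrative assumptions, not SPEC metrics.

```python
# (processor, clock in MHz, SPECint2006 base) copied from the results above.
int_base = [
    ("Core 2 Duo E6850", 3000, 20.2),
    ("POWER6", 4700, 17.8),
    ("Xeon X5365", 3000, 18.9),
]

# (processor, clock in MHz, SPECfp2006 base) copied from the results above.
fp_base = [
    ("POWER6", 4700, 17.8),
    ("Core 2 Duo E6850", 3000, 18.7),
    ("Dual-Core Itanium 2", 1600, 17.3),
]

def score_per_ghz(rows):
    """Map processor name -> score divided by clock rate in GHz."""
    return {name: score / (mhz / 1000) for name, mhz, score in rows}

for label, rows in (("int base", int_base), ("fp base", fp_base)):
    print(label)
    for name, value in score_per_ghz(rows).items():
        print(f"  {name:<20} {value:5.2f} per GHz")
```

The 4700 MHz POWER6 has the highest clock rate in the list yet a lower integer base score than the 3000 MHz Core 2 Duo E6850, and the 1600 MHz Itanium 2 reaches a comparable floating-point base score at about a third of the POWER6 clock rate.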
Performance Evaluation Using Benchmarks
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
– Good benchmarks
– Good ways to summarize performance
• Because sales depend in large part on performance relative to the competition, there is heavy investment in improving products as measured by the reported performance summary
• If the benchmarks are inadequate, vendors must choose between improving the product for real programs and improving it to get more sales; sales almost always wins!
How to Summarize Performance
[Figure: SPEC performance ratio (axis 0 to 800) for each benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv.]
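One common answer to the question of how to summarize a set of per-benchmark ratios is a mean. SPEC reports the geometric mean of the per-benchmark ratios; the sketch below contrasts it with the arithmetic mean. The ratios are hypothetical (the values in the figure are not recoverable here), chosen so that one benchmark is an outlier.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product, computed through logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical performance ratios: three ordinary results and one outlier.
ratios = [100.0, 120.0, 90.0, 700.0]

am = arithmetic_mean(ratios)
gm = geometric_mean(ratios)

print(f"arithmetic mean: {am:.1f}")
print(f"geometric mean : {gm:.1f}")
```

The arithmetic mean is pulled strongly toward the single outlier, while the geometric mean is less sensitive to it. The geometric mean of ratios also has the property that the ranking of two machines does not depend on which reference machine the ratios were normalized to.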