CDA 3101
Spring 2016
Introduction to Computer Organization
Benchmarks
14 January 2016
Overview
• Benchmarks
• Popular benchmarks
– Linpack
– Intel’s iCOMP
• SPEC Benchmarks
• MIPS Benchmark
• Fallacies and Pitfalls
Benchmarks
• Benchmarks measure different aspects of component
and system performance
• Ideal situation: use real workload
• Types of benchmarks
  – Real programs
  – Kernels
  – Toy benchmarks
  – Synthetic benchmarks
• Risk: adjusting the design to the benchmarks' requirements
  – (Partial) solution: use real programs and update them constantly
    • Engineering or scientific applications
    • Software development tools
    • Transaction processing
    • Office applications
A / Benchmark Story
1. You create a benchmark called the vmark 
2. Run it on lots of different computers
3. Publish the vmarks in www.vmark.org
4. vmark and www.vmark.org become popular 
– Users start buying their PCs based on vmark
– Vendors would be banging on your door
5. Vendors examine the vmark code and fix up their
compilers and/or microarchitecture to run vmark
6. Your vmark benchmark has been broken 
7. Create vmark 2.0 
Performance Reports
• Reproducibility
– Include hardware / software configuration (SPEC)
– Evaluation process conditions
• Summarizing performance
  – Total time
  – Arithmetic mean:  AM = (1/n) * Σ exec_time_i
  – Harmonic mean:    HM = n / Σ (1/rate_i)
  – Weighted mean:    WM = Σ w_i * exec_time_i
  – Geometric mean:   GM = (Π exec_time_ratio_i)^(1/n)
    • Useful property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
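These means are easy to compute directly. Below is a minimal C sketch (the times, weights, and the two hypothetical machines X and Y are invented for illustration) that evaluates AM, HM, WM, and GM for a small workload:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double t[] = {2.0, 10.0, 4.0};   /* exec times on machine X (s) */
        double u[] = {4.0,  5.0, 8.0};   /* exec times on machine Y (s) */
        double w[] = {0.5, 0.25, 0.25};  /* workload weights, sum to 1  */
        int n = 3;

        double am = 0.0, hm = 0.0, wm = 0.0, gm = 1.0;
        for (int i = 0; i < n; i++) {
            double rate = 1.0 / t[i];    /* rate_i = work per second    */
            am += t[i] / n;              /* AM = (1/n) * sum(time_i)    */
            hm += 1.0 / rate;            /* accumulate sum(1/rate_i)    */
            wm += w[i] * t[i];           /* WM = sum(w_i * time_i)      */
            gm *= t[i] / u[i];           /* accumulate prod(X_i / Y_i)  */
        }
        hm = n / hm;                     /* HM = n / sum(1/rate_i)      */
        gm = pow(gm, 1.0 / n);           /* GM = prod(ratios)^(1/n)     */

        printf("AM=%.3f  HM=%.3f  WM=%.3f  GM(X/Y)=%.3f\n", am, hm, wm, gm);
        return 0;
    }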
Ex.1: Linpack Benchmark
• “Mother of all benchmarks”
• Time to solve a dense system of linear equations; the dominant
  inner kernel is the DAXPY loop (DY = DY + DA*DX):
      DO I = 1, N
         DY(I) = DY(I) + DA * DX(I)
      END DO
• Metrics
  – Rpeak: system peak Gflops
  – Nmax: matrix size that gives the highest Gflops
  – N_1/2: matrix size that achieves half the rated Rmax Gflops
  – Rmax: the Gflops achieved for the Nmax-size matrix
• Used in http://www.top500.org
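A rough sketch of how a Linpack-style rate can be measured (illustrative only, not the official Linpack driver; N, REPS, and the timing method are arbitrary choices): time the DAXPY loop and convert its 2 flops per element per pass into Mflops.

    #include <stdio.h>
    #include <time.h>

    #define N    1000000   /* vector length (arbitrary)           */
    #define REPS 100       /* repetitions to get measurable time  */

    static double dx[N], dy[N];

    int main(void) {
        for (int i = 0; i < N; i++) { dx[i] = 1.0; dy[i] = 2.0; }
        double da = 0.5;

        clock_t t0 = clock();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < N; i++)
                dy[i] = dy[i] + da * dx[i];      /* DAXPY: 1 mul + 1 add */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        double mflops = 2.0 * N * REPS / secs / 1e6; /* 2 flops/element */
        /* printing dy[0] keeps the loop from being optimized away */
        printf("dy[0]=%.1f  time=%.3f s  rate=%.1f Mflops\n",
               dy[0], secs, mflops);
        return 0;
    }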
Ex.2: Intel’s iCOMP Index 3.0
• New version (3.0) reflects:
  – Mix of instructions for existing and emerging software
  – Increasing use of 3D, multimedia, and Internet software
• Benchmarks
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet applications (25%)
  – Java application (10%)
• Weighted GM of relative performance
– Baseline processor: Pentium II processor at 350MHz
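The score can be reproduced as a weighted geometric mean, score = Π (relative_perf_i)^w_i. A minimal C sketch, using the category weights from the slide; the relative-performance ratios versus the Pentium II/350 baseline are invented:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* made-up speedups vs. the baseline CPU in each category */
        double ratio[]  = {1.8, 1.7, 2.1, 1.5, 1.9, 1.6};
        /* iCOMP 3.0 category weights from the slide */
        double weight[] = {0.20, 0.20, 0.20, 0.05, 0.25, 0.10};
        int n = 6;

        double score = 1.0;
        for (int i = 0; i < n; i++)
            score *= pow(ratio[i], weight[i]); /* weighted geometric mean */

        printf("iCOMP-style score (baseline = 1.00): %.2f\n", score);
        return 0;
    }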
Ex.3: SPEC CPU Benchmarks
• Standard Performance Evaluation Corporation
• Need to update/upgrade benchmarks
– Longer run time
– Larger problems
– Application diversity
www.spec.org
• Rules to run and report
– Baseline and optimized
– Geometric mean of normalized execution times
– Reference machine: Sun Ultra5_10 (300-MHz SPARC, 256MB)
• CPU2006: latest SPEC CPU benchmark (4th version)
– 12 integer and 17 floating point programs
• Metrics: response time and throughput
Ex.3: SPEC CPU Benchmarks
[Figure: timeline of SPEC CPU benchmark suites, 1989-2006; previous benchmarks now retired]
Ex.3: SPEC CPU Benchmarks
• Observe: we will use the SPEC 2000 and 2006 CPU
benchmarks in this set of notes.
• Task: you are also asked to read about the
SPEC 2006 CPU benchmark suite, described at
www.spec.org/cpu2006
• Result: compare the SPEC 2006 data with the SPEC
2000 data (www.spec.org/cpu2000) to answer
the extra-credit questions in Homework #2.
SPEC CINT2000 Benchmarks
1.  164.gzip     C    Compression
2.  175.vpr      C    FPGA Circuit Placement and Routing
3.  176.gcc      C    C Programming Language Compiler
4.  181.mcf      C    Combinatorial Optimization
5.  186.crafty   C    Game Playing: Chess
6.  197.parser   C    Word Processing
7.  252.eon      C++  Computer Visualization
8.  253.perlbmk  C    PERL Programming Language
9.  254.gap      C    Group Theory, Interpreter
10. 255.vortex   C    Object-oriented Database
11. 256.bzip2    C    Compression
12. 300.twolf    C    Place and Route Simulator
SPEC CFP2000 Benchmarks
1.  168.wupwise  F77  Physics / Quantum Chromodynamics
2.  171.swim     F77  Shallow Water Modeling
3.  172.mgrid    F77  Multi-grid Solver: 3D Potential Field
4.  173.applu    F77  Parabolic / Elliptic Partial Differential Equations
5.  177.mesa     C    3-D Graphics Library
6.  178.galgel   F90  Computational Fluid Dynamics
7.  179.art      C    Image Recognition / Neural Networks
8.  183.equake   C    Seismic Wave Propagation Simulation
9.  187.facerec  F90  Image Processing: Face Recognition
10. 188.ammp     C    Computational Chemistry
11. 189.lucas    F90  Number Theory / Primality Testing
12. 191.fma3d    F90  Finite-element Crash Simulation
13. 200.sixtrack F77  High Energy Nuclear Physics Accelerator Design
14. 301.apsi     F77  Meteorology: Pollutant Distribution
SPECINT2000 Metrics
• SPECint2000: The geometric mean of 12 normalized
ratios (one for each integer benchmark) when each
benchmark is compiled with "aggressive" optimization
• SPECint_base2000: The geometric mean of 12
normalized ratios when compiled with "conservative"
optimization
• SPECint_rate2000: The geometric mean of 12
normalized throughput ratios when compiled with
"aggressive" optimization
• SPECint_rate_base2000: The geometric mean of 12
normalized throughput ratios when compiled with
"conservative" optimization
SPECint_base2000 Results
[Chart: SPECint_base2000 for MIPS R12000 @ 400 MHz (IRIX), Intel Pentium III @ 733 MHz (NT 4.0), and Alpha 21264 @ 667 MHz (Tru64)]
SPECfp_base2000 Results
[Chart: SPECfp_base2000 for MIPS R12000 @ 400 MHz (IRIX), Alpha 21264 @ 667 MHz (Tru64), and Intel Pentium III @ 733 MHz (NT 4.0)]
Effect of CPI: SPECint95 Ratings
[Chart: SPECint95 ratings rising with microarchitecture improvements]
CPU time = IC * CPI * clock cycle time
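A quick worked example (the numbers are invented for illustration): with IC = 10^9 instructions on a 500-MHz clock (2 ns cycle),
    CPI = 2.0  -> CPU time = 10^9 * 2.0  * 2 ns = 4.0 s
    CPI = 1.25 -> CPU time = 10^9 * 1.25 * 2 ns = 2.5 s
so a microarchitectural change that cuts CPI from 2.0 to 1.25 at the same clock rate cuts CPU time by 37.5%.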
Effect of CPI: SPECfp95 Ratings
[Chart: SPECfp95 ratings rising with microarchitecture improvements]
SPEC Recommended Readings
• SPEC 2006 – Survey of Benchmark Programs
  http://www.spec.org/cpu2006/publications/CPU2006benchmarks.pdf
• SPEC 2006 Benchmarks – Journal Articles on
  Implementation Techniques and Problems
  http://www.spec.org/cpu2006/publications/SIGARCH-2007-03/
• SPEC 2006 Installation, Build, and Runtime Issues
  http://www.spec.org/cpu2006/issues/
Another Benchmark: MIPS
• Millions of Instructions Per Second
• MIPS = IC / (CPU time * 10^6)
• Comparing apples to oranges
• Flaw: 1 MIPS on one processor does not accomplish
  the same work as 1 MIPS on another
  – It is like determining the winner of a foot race by counting
    who took fewer steps
  – Some processors do FP in software (e.g., 1 FP operation may
    take ~100 integer instructions)
  – Different instructions take different amounts of time
• Useful for comparisons between 2 processors from the
  same vendor that support the same ISA with the same
  compiler (e.g., Intel's iCOMP benchmark)
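A small C sketch of why MIPS misleads (instruction counts and times invented): the machine with the higher MIPS rating can still be the slower one if its ISA needs more instructions for the same program.

    #include <stdio.h>

    int main(void) {
        /* the same program compiled for two hypothetical machines */
        double ic_a = 8.0e9,  time_a = 4.0;  /* A: 8e9 instr, 4 s   */
        double ic_b = 1.2e10, time_b = 5.0;  /* B: 12e9 instr, 5 s  */

        double mips_a = ic_a / (time_a * 1e6); /* MIPS = IC/(time*10^6) */
        double mips_b = ic_b / (time_b * 1e6);

        printf("A: %.0f MIPS, %.1f s\n", mips_a, time_a); /* 2000 MIPS */
        printf("B: %.0f MIPS, %.1f s\n", mips_b, time_b); /* 2400 MIPS */
        /* B posts the higher MIPS rating, yet A finishes first. */
        return 0;
    }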
Fallacies and Pitfalls
• Ignoring Amdahl's law
  [Photo: Gene Amdahl, 1922-2015]
• Using clock rate or MIPS as a performance metric
• Using the arithmetic mean of normalized
  CPU times (ratios) instead of the geometric mean
• Using hardware-independent metrics
  – Using code size as a measure of speed
• Synthetic benchmarks predict performance
  – They do not reflect the behavior of real programs
• The geometric mean of CPU time ratios is
  proportional to the total execution time [NOT!!]
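A classic two-program illustration of these normalized-mean pitfalls (times invented): the arithmetic mean of ratios picks a different winner depending on the reference machine, while the geometric mean is reference-independent but says nothing about total execution time.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double a[] = {1.0, 1000.0};   /* machine A's times (s) */
        double b[] = {10.0, 100.0};   /* machine B's times (s) */

        /* arithmetic means of ratios, normalized each way */
        double am_b_vs_a = (b[0]/a[0] + b[1]/a[1]) / 2.0;   /* = 5.05 */
        double am_a_vs_b = (a[0]/b[0] + a[1]/b[1]) / 2.0;   /* = 5.05 */
        double gm_b_vs_a = sqrt((b[0]/a[0]) * (b[1]/a[1])); /* = 1.00 */

        printf("AM (ref=A): B looks %.2fx slower than A\n", am_b_vs_a);
        printf("AM (ref=B): A looks %.2fx slower than B\n", am_a_vs_b);
        printf("GM of B/A ratios: %.2f (same either way)\n", gm_b_vs_a);
        printf("Total times: A = %.0f s, B = %.0f s\n",
               a[0] + a[1], b[0] + b[1]);
        /* AM flips the winner with the reference machine; GM calls it a
           tie even though B's total execution time is about 9x lower.  */
        return 0;
    }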
Conclusions
• Performance is specific to a particular program or set of programs
• CPU time: the only adequate measure of performance
• For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
• Your workload: the ideal benchmark
• You should not believe everything you read!
Happy & Safe Weekend 