Benchmarking

Lecture 2c: Benchmarks
A benchmark is a program that is run on a computer to measure its performance and to compare it with that of other machines.

The best benchmark is the users’ workload: the mixture of programs and operating system commands that users run on a machine.
• Not practical
• Standard benchmarks are used instead
Types of Benchmarks
• Synthetic benchmarks
• Toy benchmarks
• Microbenchmarks
• Program kernels
• Real applications
Synthetic benchmarks
Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs
• Whetstone benchmark
• Dhrystone benchmark
• Rhealstone benchmark
Synthetic benchmarks
Whetstone benchmark
• First written in Algol 60 in 1972; Fortran, C/C++, and Java versions are available today
• Represents the workload of numerical applications
• Measures floating-point arithmetic performance
• Unit is millions of Whetstone instructions per second (MWIPS)
• Shortcomings:
  • Does not represent constructs of modern languages, such as pointers
  • Does not consider cache effects
Synthetic benchmarks
Dhrystone benchmark
• First written in Ada in 1984; today a C version is available
• Represents the workload of system software: its statistics were collected on system software such as operating systems, compilers, and editors, plus a few numerical programs
• Measures integer and string performance; contains no floating-point operations
• Unit is the number of program iterations completed per second
• Shortcomings:
  • Does not represent real-life programs
  • Compiler optimization overstates system performance
  • The code is small and may fit entirely in the instruction cache
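Dhrystone results are often normalized against a reference machine. A common convention, not stated on the slide, is DMIPS: Dhrystones per second divided by 1757, the score of the VAX 11/780, which is taken as a 1-MIPS machine. The minimal sketch below assumes that convention; the function names are hypothetical.

```c
#include <assert.h>

/* Dhrystones per second of the VAX 11/780; by convention that machine is
   rated at 1 MIPS, so DMIPS = (Dhrystones per second) / 1757. */
#define VAX_11_780_DHRYSTONES_PER_SEC 1757.0

/* Benchmark rate: completed program iterations divided by elapsed seconds. */
double dhrystones_per_second(long iterations, double seconds)
{
    assert(iterations >= 0 && seconds > 0.0);
    return (double)iterations / seconds;
}

/* Rating normalized to the VAX 11/780 reference machine (assumed convention). */
double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC;
}
```

For example, 3,514,000 iterations in 2 seconds is 1,757,000 Dhrystones per second, or 1000 DMIPS.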
Synthetic benchmarks
Rhealstone benchmark
• Targets multitasking real-time systems
• Factors are:
  • Task switching time
  • Preemption time
  • Interrupt latency time
  • Semaphore shuffling time
  • Deadlock breaking time
  • Datagram throughput time
• Metric is Rhealstones per second:
  $\sum_{i=1}^{6} w_i \cdot \frac{1}{t_i}$, where $t_i$ is the measured time for factor $i$ and $w_i$ is its weight
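The Rhealstone metric is a weighted sum of reciprocal times. The sketch below evaluates it for arbitrary weights; the function name and the assumption that all times are given in seconds are illustrative, and the weights are left to the caller because the slide does not fix them.

```c
#include <assert.h>
#include <stddef.h>

/* Rhealstone rating: sum over the measured factors of w[i] * (1 / t[i]).
   With t[i] in seconds, the result is in Rhealstones per second. */
double rhealstones(const double *w, const double *t, size_t n)
{
    double rating = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(t[i] > 0.0);            /* every factor needs a positive time */
        rating += w[i] * (1.0 / t[i]);
    }
    return rating;
}
```

With all six weights equal to 1 and times of 0.5 s, 0.25 s, and four times of 1 s, the rating is 2 + 4 + 1 + 1 + 1 + 1 = 10 Rhealstones per second.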
Toy benchmarks
Short programs (10 to 100 lines of code) whose result is known before the program is run
• Quick sort
• Sieve of Eratosthenes
Finds prime numbers
http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif
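func sieve( var N )
  var PrimeArray as array of size N + 1 (indices 0 to N)
  initialize PrimeArray to all true
  set PrimeArray( 0 ) and PrimeArray( 1 ) = false
  for i from 2 to N
    if PrimeArray( i ) is true
      for each j from 2i to N in steps of i
        set PrimeArray( j ) = false
  return PrimeArray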
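The sieve pseudocode above can be sketched in C as follows. This is a minimal illustration, not part of the original slides: the function name `sieve` and the choice to return a heap-allocated flag array are assumptions. Striking starts at i*i because smaller multiples of i were already struck by smaller primes, and the outer loop can stop once i*i exceeds n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sieve of Eratosthenes: returns a heap-allocated array of n + 1 flags in
   which is_prime[k] is true exactly when k is prime (caller frees it).
   Returns NULL on allocation failure. */
bool *sieve(int n)
{
    assert(n >= 0);
    bool *is_prime = malloc((size_t)(n + 1) * sizeof *is_prime);
    if (is_prime == NULL)
        return NULL;
    for (int k = 0; k <= n; k++)
        is_prime[k] = (k >= 2);               /* 0 and 1 are not prime */
    for (int i = 2; (long)i * i <= n; i++) {
        if (!is_prime[i])
            continue;                         /* only primes strike multiples */
        for (int j = i * i; j <= n; j += i)   /* smaller multiples already done */
            is_prime[j] = false;
    }
    return is_prime;
}
```

For example, `sieve(30)` marks 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29 as prime.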
Microbenchmarks
Small, specially designed programs used to test a specific function of a system (e.g., floating-point execution, the I/O subsystem, or the processor-memory interface)
• Provide values for important parameters of a system
• Characterize the maximum performance achievable when overall performance is limited by that single component
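As an illustration of a microbenchmark that isolates one component, the sketch below times a chain of floating-point multiply-adds. It is hypothetical: the function `fp_mflops`, the loop body, and the use of `clock()` (processor time, chosen for portability) are assumptions. A real microbenchmark would also repeat runs, report the best or median, and guard against compiler optimizations that move or remove the timed loop.

```c
#include <assert.h>
#include <time.h>

/* Volatile sink so the compiler cannot discard the timed loop's result. */
static volatile double fp_sink;

/* Hypothetical floating-point microbenchmark: runs `iters` dependent
   multiply-add steps (2 flops each) and returns millions of floating-point
   operations per second of processor time, or 0.0 if the run was too
   short for clock() to resolve. */
double fp_mflops(long iters)
{
    assert(iters > 0);
    double x = 1.0;
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        x = x * 0.999999 + 0.000001;   /* each step depends on the previous */
    clock_t end = clock();
    fp_sink = x;
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? 2.0 * (double)iters / seconds / 1e6 : 0.0;
}
```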
Kernels
Key pieces of code extracted from real applications
• LINPACK and BLAS
• Livermore Loops
• NAS Parallel Benchmarks
Kernels
LINPACK and BLAS Libraries
• LINPACK: linear algebra package
  • Measures floating-point computing power
  • Solves a system of linear equations Ax = b with Gaussian elimination
  • Metric is MFLOP/s
  • DAXPY is the most time-consuming routine
  • Used as the measure for the TOP500 list
• BLAS: Basic Linear Algebra Subprograms
  • LINPACK makes use of the BLAS library
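To make the LINPACK kernel concrete, the sketch below solves Ax = b by Gaussian elimination with partial pivoting. It is a simplified illustration, not the LINPACK library code: the names `gauss_solve` and `linpack_flops` are hypothetical, the matrix is stored row-major (the Fortran LINPACK routines use column-major storage), and the row update is written as a plain loop where LINPACK would call DAXPY. `linpack_flops` returns the operation count 2n³/3 + 2n² that the classic LINPACK benchmark conventionally uses to turn run time into MFLOP/s.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without <math.h>, so no math library is needed. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Solves A x = b by Gaussian elimination with partial pivoting.
   A is n-by-n, row-major, and is overwritten; on success b holds x.
   Returns 0 on success, -1 if a zero pivot is found (A singular). */
int gauss_solve(size_t n, double *A, double *b)
{
    assert(A != NULL && b != NULL);
    for (size_t k = 0; k < n; k++) {
        /* Partial pivoting: choose the row with the largest |A[i][k]|. */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (abs_val(A[i * n + k]) > abs_val(A[p * n + k]))
                p = i;
        if (A[p * n + k] == 0.0)
            return -1;
        if (p != k) {                      /* swap rows k and p of A and b */
            for (size_t j = 0; j < n; j++) {
                double t = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* Eliminate below the pivot: row_i -= m * row_k (an axpy update). */
        for (size_t i = k + 1; i < n; i++) {
            double m = A[i * n + k] / A[k * n + k];
            for (size_t j = k; j < n; j++)
                A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    /* Back substitution on the resulting upper-triangular system. */
    for (size_t k = n; k-- > 0;) {
        double s = b[k];
        for (size_t j = k + 1; j < n; j++)
            s -= A[k * n + j] * b[j];
        b[k] = s / A[k * n + k];
    }
    return 0;
}

/* Operation count used by the classic LINPACK benchmark for an n-by-n solve. */
double linpack_flops(double n)
{
    return 2.0 * n * n * n / 3.0 + 2.0 * n * n;
}
```

Dividing `linpack_flops(n)` by the measured solve time in seconds, and then by 10^6, gives the MFLOP/s rate.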
Kernels
LINPACK and BLAS Libraries
• SAXPY: Scalar Alpha X Plus Y
  • Y = aX + Y, where X and Y are vectors and a is a scalar
  • SAXPY for single precision, DAXPY for double precision
  • Generic implementation:
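void saxpy(int m, int n, float a, const float *x, float *y) {
    for (int i = m; i < n; i++) {
        y[i] = a * x[i] + y[i];   /* one multiply and one add per element */
    }
}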
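A double-precision version of the SAXPY loop, together with the MFLOP/s figure it yields, might look like the sketch below. The names `daxpy` and `daxpy_mflops` are hypothetical here (this is not the reference BLAS calling convention, which also takes stride arguments), and counting 2 flops per element, one multiply and one add, is the usual convention.

```c
#include <assert.h>
#include <stddef.h>

/* DAXPY: double-precision y = a*x + y over n elements. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];          /* 2 floating-point operations */
}

/* MFLOP/s achieved by `calls` DAXPY calls on n-element vectors that took
   `seconds` in total, counting 2 flops (one multiply, one add) per element. */
double daxpy_mflops(size_t n, long calls, double seconds)
{
    assert(calls >= 0 && seconds > 0.0);
    return 2.0 * (double)n * (double)calls / seconds / 1e6;
}
```

For example, 10 calls on vectors of 10^6 elements that take 2 seconds in total correspond to 10 MFLOP/s.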
Kernels
Livermore Loops
• Developed at Lawrence Livermore National Laboratory (LLNL)
• Originally in Fortran, now also in C
• 24 numerical application kernels, such as:
  • hydrodynamics fragment
  • incomplete Cholesky conjugate gradient
  • inner product
  • banded linear systems solution, tridiagonal linear systems solution
  • general linear recurrence equations
  • first sum, first difference
  • 2-D particle in a cell, 1-D particle in a cell
  • Monte Carlo search
  • location of the first array minimum, etc.
• Metrics are the arithmetic, geometric, and harmonic means of the CPU rate
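The three summary metrics can be computed from per-kernel rates as sketched below. The function names are hypothetical; the geometric mean is found with Newton's method on y^n = (product of rates) instead of log/exp only so that the sketch needs no math library. For positive rates the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the harmonic mean is the most pessimistic of the three.

```c
#include <assert.h>
#include <stddef.h>

/* Summary statistics over n per-kernel rates r[0..n-1] (e.g. MFLOP/s). */
double arithmetic_mean(const double *r, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += r[i];
    return sum / (double)n;
}

double harmonic_mean(const double *r, size_t n)
{
    assert(n > 0);
    double inv_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i] > 0.0);
        inv_sum += 1.0 / r[i];
    }
    return (double)n / inv_sum;
}

/* y^k by repeated multiplication (keeps the sketch free of libm). */
static double int_pow(double y, size_t k)
{
    double result = 1.0;
    while (k-- > 0)
        result *= y;
    return result;
}

/* n-th root of the product of the rates via Newton's method on
   y^n = product, started from the arithmetic mean (never below the root),
   so the iterates decrease monotonically until they converge. */
double geometric_mean(const double *r, size_t n)
{
    assert(n > 0);
    double product = 1.0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i] > 0.0);
        product *= r[i];
    }
    double y = arithmetic_mean(r, n);
    for (int iter = 0; iter < 10000; iter++) {
        double next = ((double)(n - 1) * y + product / int_pow(y, n - 1)) / (double)n;
        if (!(next < y))
            break;                       /* no further decrease: converged */
        y = next;
    }
    return y;
}
```

For rates of 1, 4, and 16 MFLOP/s, the arithmetic mean is 7, the geometric mean is 4, and the harmonic mean is 16/7 ≈ 2.29.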
Kernels
NAS Parallel Benchmarks
• Developed at the NASA Advanced Supercomputing (NAS) Division
• Paper-and-pencil benchmarks: the problems are specified on paper, and implementations are left to the tester
• 11 benchmarks, such as:
  • Discrete Poisson equation
  • Conjugate gradient
  • Fast Fourier transform
  • Bucket sort
  • Embarrassingly parallel
  • Nonlinear PDE solution
  • Data traffic, etc.
Real Applications
Programs that are run by many users
• C compiler
• Text processing software
• Frequently used user applications
• Modified scripts used to measure particular aspects of system performance, such as interactive and multiuser behavior
Benchmark Suites
• Desktop benchmarks
  • SPEC benchmark suite
• Server benchmarks
  • SPEC benchmark suite
  • TPC
• Embedded benchmarks
  • EEMBC
SPEC Benchmark Suite
• Desktop benchmarks
  • CPU-intensive: SPEC CPU2000
    • 12 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks
    • Real application programs:
      • C compiler
      • Finite element modeling
      • Fluid dynamics, etc.
  • Graphics-intensive
    • SPECviewperf: measures rendering performance using OpenGL
    • SPECapc:
      • Pro/Engineer: 3D rendering with solid models
      • SolidWorks: 3D CAD/CAM design tool; CPU-intensive and I/O-intensive tests
      • Unigraphics: solid modeling for an aircraft design
• Server benchmarks
  • SPECweb: for web servers
  • SPECSFS: for NFS performance, throughput-oriented
TPC Benchmark Suite
• Server benchmarks
• Transaction processing (TP) benchmarks
• Real applications:
  • TPC-C: simulates a complex query environment
  • TPC-H: ad hoc decision support
  • TPC-R: business decision support system where users run a standard set of queries
  • TPC-W: business-oriented transactional web server
• Measures performance in transactions per second; throughput counts only when the response-time limit is met
• Allows cost-performance comparisons
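The rule that throughput counts only when the response-time limit is met can be sketched as below. This is a simplified, hypothetical version: it applies a single 90th-percentile threshold to all transactions (in the spirit of TPC-C, whose actual rules are per transaction type and more detailed), and the function names are invented for illustration. `price_per_tps` shows the cost-performance ratio.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles in ascending order. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Simplified, hypothetical validity rule: the run counts only if the
   90th-percentile response time is within `limit` seconds. Returns the
   throughput in transactions per second, 0.0 if the rule is violated,
   or -1.0 if memory allocation fails. */
double valid_tps(const double *resp, size_t n, double run_seconds, double limit)
{
    assert(n > 0 && run_seconds > 0.0);
    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, resp, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);
    size_t idx = (9 * n + 9) / 10 - 1;   /* ceil(0.9 * n) - 1 */
    double p90 = sorted[idx];
    free(sorted);
    return p90 <= limit ? (double)n / run_seconds : 0.0;
}

/* Cost-performance: total system cost per unit of throughput. */
double price_per_tps(double total_cost, double tps)
{
    assert(tps > 0.0);
    return total_cost / tps;
}
```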
EEMBC Benchmarks
• For embedded computing systems
• 34 benchmarks from 5 application classes:
  • Automotive/industrial
  • Consumer
  • Networking
  • Office automation
  • Telecommunications
Benchmarking Strategies
• Fixed-computation benchmarks
• Fixed-time benchmarks
• Variable-computation and variable-time benchmarks
Fixed-Computation benchmarks
W: fixed workload (number of instructions, number of floating-point operations, etc.)
T: measured execution time
R: speed, $R = \frac{W}{T}$
Compare using speedup:
$\text{Speedup} = \frac{R_1}{R_2} = \frac{W / T_1}{W / T_2} = \frac{T_2}{T_1}$
Fixed-Computation benchmarks
Amdahl’s Law
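Amdahl's Law bounds the speedup obtainable for a fixed workload. Assuming its usual form, the sketch below computes both the fixed-computation speedup T2/T1 and the Amdahl speedup 1 / ((1 - f) + f/s), where f is the fraction of the original execution time that is enhanced and s is the speedup of that fraction. The function names are hypothetical.

```c
#include <assert.h>

/* Fixed-computation speedup of system 1 over system 2 for the same
   workload W: R1 / R2 = (W / T1) / (W / T2) = T2 / T1. */
double fixed_computation_speedup(double t1, double t2)
{
    assert(t1 > 0.0 && t2 > 0.0);
    return t2 / t1;
}

/* Amdahl's Law: if a fraction f of the original execution time is sped up
   by a factor s and the rest is unchanged, the overall speedup is
   1 / ((1 - f) + f / s); as s grows it approaches 1 / (1 - f). */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

If half of the execution time is made twice as fast, the overall speedup is 1 / (0.5 + 0.25) ≈ 1.33; however large s becomes, it cannot exceed 1 / (1 - 0.5) = 2.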
Fixed-Time benchmarks
On a faster system, a larger workload can be processed in the same amount of time
T: fixed execution time
W: workload
R: speed, $R = \frac{W}{T}$
Compare using sizeup:
$\text{Sizeup} = \frac{R_1}{R_2} = \frac{W_1 / T}{W_2 / T} = \frac{W_1}{W_2}$
Fixed-Time benchmarks
Scaled Speedup
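For fixed-time benchmarking, the usual formulation of scaled speedup is Gustafson's law. Assuming that is what this slide refers to, the sketch computes the sizeup W1/W2 and the scaled speedup s + (1 - s)p, where s is the fraction of the parallel run time spent in serial code and p is the number of processors. The function names are hypothetical.

```c
#include <assert.h>

/* Fixed-time sizeup of system 1 over system 2 in the same time T:
   R1 / R2 = (W1 / T) / (W2 / T) = W1 / W2. */
double fixed_time_sizeup(double w1, double w2)
{
    assert(w1 >= 0.0 && w2 > 0.0);
    return w1 / w2;
}

/* Scaled speedup (Gustafson's law): on p processors, with a fraction s of
   the parallel run time spent in serial code, the scaled workload would
   take s + (1 - s) * p times as long on a single processor. */
double scaled_speedup(double s, double p)
{
    assert(s >= 0.0 && s <= 1.0 && p >= 1.0);
    return s + (1.0 - s) * p;
}
```

With s = 0.1 and p = 10, the scaled speedup is 0.1 + 0.9 × 10 = 9.1.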
Variable-Computation and Variable-Time benchmarks
In this type of benchmark, neither the workload nor the execution time is fixed; what is measured is how the quality of the solution improves.
Q: quality of the solution
T: execution time
Quality improvements per second: $\frac{Q}{T}$