The Problem With
The Linpack Benchmark 1.0
Matrix Generator
Jack J. Dongarra and Julien Langou
International Journal of High Performance Computing Applications 2009; 23; 5
Group Members
Ahmad Hamidi
Aneel
Zulkhairi Idris
Adam Malik Sahar
Key Words
Linpack Benchmark
The Linpack benchmark measures the floating point rate of
execution for solving a linear system of equations. This
benchmark is used to determine the number of Floating
Point Operations per Second (FLOPS) a super-computer
can achieve. Results are published periodically via the
Top500 list (http://www.top500.org).
LINPACK
A collection of Fortran subroutines that analyze and solve
linear equations and linear least-squares problems. The
package solves linear systems whose matrices are general,
banded, symmetric indefinite, symmetric positive definite,
triangular, and tridiagonal square. LINPACK uses column-oriented
algorithms to increase efficiency by preserving locality of reference.
Key Words
Pseudo-Random Number Generator
(PRNG)
An algorithm for generating a sequence of numbers that
approximates the properties of random numbers. The
sequence is not truly random in that it is completely
determined by a relatively small set of initial values, called
the PRNG's state. Although sequences that are closer to
truly random can be generated using hardware random
number generators, pseudorandom numbers are important
in practice for simulations (e.g., of physical systems with
the Monte Carlo method), and are central in the practice of
cryptography. Common classes of these algorithms are
linear congruential generators, Lagged Fibonacci
generators, linear feedback shift registers and generalised
feedback shift registers. Recent instances of pseudorandom
algorithms include Blum Blum Shub, Fortuna, and the
Mersenne twister.
Key Words
FLOPS (FLOP/S)
An acronym meaning FLoating point Operations Per Second.
The FLOPS is a measure of a computer's performance,
especially in fields of scientific calculations that make heavy
use of floating point calculations, similar to instructions per
second.
 Mflop/s: millions of floating point operations per second
 Gflop/s: billions of floating point operations per second
 Tflop/s: trillions of floating point operations per second
 Pflop/s: quadrillions (thousands of trillions) of floating point operations per second
Performance Development
[Figure: Top500 performance development, 1993 to 2007, on a logarithmic scale from 100 Mflop/s to 1 Pflop/s. Curves: SUM (1.17 TF/s in 1993, 4.92 PF/s in 2007), N=1 (59.7 GF/s in 1993, 281 TF/s in 2007), N=500 (0.4 GF/s in 1993, 4.0 TF/s in 2007). Labeled N=1 systems: Fujitsu 'NWT', Intel ASCI Red, IBM ASCI White, NEC Earth Simulator, IBM BlueGene/L.]
Contents
Introduction
History of Linpack Benchmark
Linpack Benchmark Matrix Generator
Problem Statement
How to Fix the Problem
Conclusion
Introduction
 Since 1993, the Top500 project (www.top500.org) has released
a list of the most powerful computer systems twice a year.
 The systems are ranked by their results on the HPL
benchmark.
 The HPL benchmark consists of solving a dense linear system in
double precision (64-bit floating point arithmetic) using
Gaussian elimination with partial pivoting.
 The supplied matrix generator, which uses a PRNG, must be
used when running HPL.
 The performance is reported in floating point
operations per second (flop/s).
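As a rough illustration of the solve-and-check cycle described above, the following pure-Python sketch solves a small dense system by Gaussian elimination with partial pivoting and then forms a scaled residual of the form ||Ax-b||_∞ / (eps * ||A||_1 * N). It is not HPL code: the function names solve_gepp and scaled_residual are hypothetical, and taking eps from sys.float_info.epsilon is an assumption (HPL's own definition of eps may differ).

```python
import sys

def solve_gepp(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (dense, O(n^3))."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs stay untouched.
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular to working precision")
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def scaled_residual(A, b, x):
    """||Ax-b||_inf / (eps * ||A||_1 * N), with eps assumed to be 2**-52."""
    n = len(A)
    eps = sys.float_info.epsilon  # assumption, see lead-in
    r_inf = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    a_one = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return r_inf / (eps * a_one * n)

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x = solve_gepp(A, b)
print(x, scaled_residual(A, b, x))
```

A well-conditioned system gives a scaled residual of order one or smaller; a value such as 9.224e+94 signals that elimination hit a (near-)zero pivot.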
History of Linpack Benchmark
1980: LINPACKD 1.0 – The initial
LINPACKD benchmark
1989: Numerical failure report. David
Hough observed a numerical failure for
matrix size n = 512
1989: LINPACKD 2.0 released
1992: LINPACKD 3.0 released
History of Linpack Benchmark
 2000: HPL 1.0, the first release of HPL. Its PRNG uses a
linear congruential algorithm:
X(n+1) = (a * X(n) + c) mod m
with m = 2^31, a = 1103515245, c = 12345.
The period of this PRNG is 2^31.
 2004: Numerical failure report. Gregory Bauer
observed a numerical failure with HPL and n = 2^17
= 131,072. The HPL developers recommended that
users testing matrices of size larger than 2^15
avoid powers of two.
History of Linpack Benchmark
2007: Numerical failure report. A large
manufacturer observed a numerical failure
with HPL and n = 2^13 x 271 = 2,220,032.
Note that n = 2,220,032 is not a
power of two.
2008: HPL 2.0. Piotr Luszczek
incorporated a new PRNG which uses a
linear congruential algorithm with a =
6364136223846793005, c=11, m = 2^64. The
period of this PRNG is 2^64.
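The period claims for the two generators in this history can be checked with the Hull-Dobell conditions: for a modulus m = 2^k with k >= 2, a linear congruential generator has full period m exactly when c is odd and a - 1 is divisible by 4. The sketch below is illustrative only. The function names are hypothetical, the HPL 1.0 increment is taken as 12345, the HPL 2.0 multiplier is taken as the 19-digit value 6364136223846793005, and the brute-force confirmation runs on a scaled-down modulus 2^16 rather than 2^31.

```python
from math import gcd

def has_full_period_pow2(a, c, k):
    """Hull-Dobell conditions specialised to the modulus m = 2**k, k >= 2."""
    m = 1 << k
    return gcd(c, m) == 1 and (a - 1) % 4 == 0

def measured_period(a, c, m, seed=0):
    """Count steps until the generator returns to its seed (None if it never does)."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x + c) % m
        if x == seed:
            return steps
    return None

# HPL 1.0 parameters as assumed in the lead-in; any odd c gives the same answer.
print(has_full_period_pow2(1103515245, 12345, 31))
# HPL 2.0 parameters as assumed in the lead-in.
print(has_full_period_pow2(6364136223846793005, 11, 64))
# Brute-force confirmation on a scaled-down modulus (2**16 instead of 2**31).
print(measured_period(1103515245, 12345, 1 << 16))
```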
Linpack Benchmark Matrix Generator
 The Linpack Benchmark 1.0 matrix generator builds the
pseudo-random coefficient matrix A in the HPL routine
HPL_pdmatgen (HPL_pdmatgen.c). The
PRNG uses a linear congruential algorithm:
X(n+1) = (a * X(n) + c) mod m
with m = 2^31, a = 1103515245, c = 12345.
The period of this PRNG is 2^31.
 HPL 1.0 fills its matrices with pseudo-random
numbers by columns, using this sequence s and
starting with A(1,1) = s(1), A(2,1) = s(2), A(3,1) = s(3),
and so on.
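The column-by-column fill described above can be sketched in a few lines of Python. This is a simplified model, not the HPL routine: the names lcg and fill_by_columns are hypothetical, the seed 0 and the increment 12345 are assumptions, and the raw integers are kept as matrix entries instead of being scaled to floating point values.

```python
from itertools import islice

def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield X(1), X(2), ... with X(n+1) = (a * X(n) + c) mod m and X(0) = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def fill_by_columns(n, seed=0):
    """Return A as a list of rows, filled column by column from the sequence s."""
    s = list(islice(lcg(seed), n * n))
    # 0-based A[i][j] = s[j*n + i], i.e. 1-based A(i,j) = s((j-1)*n + i).
    return [[s[j * n + i] for j in range(n)] for i in range(n)]

A = fill_by_columns(3)
for row in A:
    print(row)
```

Because the entries depend only on n and the position in the sequence, every run with the same n sees the same matrix, whatever the block size or process grid.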
Linpack Benchmark Matrix Generator
A(i,j) = s((j-1)*n + i), 1 ≤ i, j ≤ n
(1)
s(i + 2^31) = s(i) for any i ∈ N, and s(i) ≠ s(j)
for any 1 ≤ i < j ≤ 2^31
(2)
 The matrix generated by the Linpack Benchmark 1.0
matrix generator depends solely on the
dimension n. Benchmarkers therefore use the
same matrix for any block size, any number of
processors, and any grid size.
Linpack Benchmark Matrix Generator
 The computed pseudo-random numbers in the sequence s depend
only weakly on the computer system. Consequently,
the pivot pattern of the Gaussian elimination is
preserved from one computer system to another.
 Finally, the linear congruential algorithm for the
sequence s enables a scalable implementation
of the construction of
the matrix.
Problem Statement
 In May 2007, a large HPC manufacturer ran a
20-hour HPL (High Performance Linpack)
benchmark run, which failed:
||Ax-b||_∞ / ( eps * ||A||_1 * N )
= 9.224e+94 ...... FAILED
n = 2^13 x 271 = 2,220,032
 It was neither a hardware failure nor a
software failure but a predictable numerical
issue.
 Using n = 2,220,032, HPL 1.0 sometimes
passes all three tests, sometimes two of the
three, and sometimes none.
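One way to see why the issue is predictable: when a matrix is filled by columns from a sequence of period P, columns j and j+k coincide whenever k*n is a multiple of P, and a matrix with two identical columns is singular. The sketch below demonstrates this on a scaled-down model, with a period of 2^10 instead of 2^31 and n = 2^5 x 3 = 96 standing in for 2^13 x 271. The function names, the seed 0 and the increment 12345 are assumptions made for illustration.

```python
def lcg_values(count, m, a=1103515245, c=12345, seed=0):
    """First `count` values of X(n+1) = (a * X(n) + c) mod m, starting from seed."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values

def columns(n, m):
    """Columns of the n-by-n matrix filled by columns from the sequence."""
    s = lcg_values(n * n, m)
    return [tuple(s[j * n:(j + 1) * n]) for j in range(n)]

def first_duplicate_pair(cols):
    """Return 0-based indices (j1, j2) of the first repeated column, or None."""
    seen = {}
    for j, col in enumerate(cols):
        if col in seen:
            return seen[col], j
        seen[col] = j
    return None

# n = 96 shares the factor 2**5 with the period 2**10, so columns repeat.
print(first_duplicate_pair(columns(96, 2**10)))
# An odd n shares no factor with the period, so no column repeats here.
print(first_duplicate_pair(columns(95, 2**10)))
```

In exact arithmetic such a matrix has a zero pivot; in floating point the pivot may instead come out as a tiny nonzero number, which is consistent with runs that sometimes pass and sometimes fail.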
How to Fix the Problem
 The obvious recommendation is to choose any n, as long
as it is odd.
 A check can be added at the beginning of the execution
of the Linpack Benchmark matrix generator.
 If n ∈ S, simply change the variable jump from M to M+1 in
the HPL code (HPL_pdmatgen.c).
 Increase the period of the PRNG.
 A correctness check that is robust to ill-conditioned
matrices could be used.
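A pre-run check of the kind suggested above might look like the following sketch. It assumes the matrix is filled by columns from a sequence of period P with no repeated value inside one period, so that columns j and j+k are identical exactly when k*n is a multiple of P. The smallest such k is P / gcd(n, P), and a duplicate exists when that distance is smaller than n. The function names are hypothetical and this is not the check implemented in HPL.

```python
from math import gcd

def column_repeat_distance(n, period):
    """Smallest k > 0 such that k * n is a multiple of the PRNG period."""
    return period // gcd(n, period)

def has_identical_columns(n, period=2**31):
    """True if an n-by-n matrix filled by columns must contain two equal columns."""
    return column_repeat_distance(n, period) < n

sizes = {
    "2^15": 2**15,
    "2^17": 2**17,
    "2^13 x 271": 2**13 * 271,
    "2^13 x 271 + 1 (odd)": 2**13 * 271 + 1,
}
for label, n in sizes.items():
    print(label, n, column_repeat_distance(n, 2**31), has_identical_columns(n))

# With a period of 2**64 the same size is no longer flagged.
print(has_identical_columns(2**13 * 271, period=2**64))
```

Under these assumptions the sketch flags n = 2^17 and n = 2^13 x 271, clears n = 2^15 and odd sizes, and clears all of them once the period is raised to 2^64.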
Conclusion
The problem with the Linpack Benchmark
1.0 matrix generator is now corrected in
the Linpack Benchmark 2.0 matrix
generator, which uses a linear congruential
algorithm with a = 6364136223846793005,
c=11, m = 2^64. The period of this PRNG is 2^64.
The correction both extends the period
of the PRNG (Pseudo-Random Number
Generator) and adds a test for correctness
that is robust to ill-conditioned matrices.