The Problem With The Linpack Benchmark 1.0 Matrix Generator
Jack J. Dongarra and Julien Langou
International Journal of High Performance Computing Applications 2009; 23; 5

Group Members
Ahmad Hamidi Aneel Zulkhairi Idris Adam Malik Sahar

Key Words

Linpack Benchmark
The Linpack benchmark measures the floating-point rate of execution for solving a linear system of equations. It is used to determine the number of floating-point operations per second (FLOPS) that a supercomputer can achieve. Results are published periodically via the Top500 list (http://www.top500.org).

LINPACK
A collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal square. LINPACK uses column-oriented algorithms to increase efficiency by preserving locality of reference.

Pseudo-Random Number Generator (PRNG)
An algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state. Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom numbers are important in practice for simulations (e.g., of physical systems with the Monte Carlo method) and are central to the practice of cryptography. Common classes of these algorithms are linear congruential generators, lagged Fibonacci generators, linear feedback shift registers, and generalised feedback shift registers. Recent instances of pseudorandom algorithms include Blum Blum Shub, Fortuna, and the Mersenne Twister.

FLOPS (FLOP/S)
An acronym for FLoating point Operations Per Second.
FLOPS is a measure of a computer's performance, especially in fields of scientific computing that make heavy use of floating-point calculations. It is analogous to instructions per second.
Mflop/s: millions of floating-point operations per second
Gflop/s: billions of floating-point operations per second
Tflop/s: trillions of floating-point operations per second
Pflop/s: quadrillions (thousands of trillions) of floating-point operations per second

Performance Development
[Figure: Top500 performance development, 1993 to 2007, on a vertical axis from 100 Mflop/s to 1 Pflop/s. SUM grows from 1.17 TF/s to 4.92 PF/s, N=1 from 59.7 GF/s to 281 TF/s, and N=500 from 0.4 GF/s to 4.0 TF/s. Labeled systems: Fujitsu 'NWT', Intel ASCI Red, IBM ASCI White, NEC Earth Simulator, IBM BlueGene/L.]

Contents
Introduction
History of Linpack Benchmark
Linpack Benchmark Matrix Generator
Problem Statement
How to Fix the Problem
Conclusion

Introduction
Since 1993, the Top500 project (www.top500.org) has released a list of the most powerful computer systems twice a year.
Computer systems are ranked by the results they obtain on the HPL benchmark.
The HPL benchmark consists of solving a dense linear system in double precision (64-bit floating-point arithmetic) using Gaussian elimination with partial pivoting.
The supplied matrix generator, which uses a PRNG, must be used when running HPL.
Performance is reported in floating-point operations per second (flop/s).

History of Linpack Benchmark
1980: LINPACKD 1.0. The initial LINPACKD benchmark.
1989: Numerical failure report. David Hough observed a numerical failure for matrix size n = 512.
1989: LINPACKD 2.0 released.
1992: LINPACKD 3.0 released.
2000: HPL 1.0. First release of HPL. The PRNG uses a linear congruential algorithm, $X(n+1) = (a \cdot X(n) + c) \bmod m$, with $m = 2^{31}$, $a = 1103515245$, $c = 12345$. The period of this PRNG is $2^{31}$.
2004: Numerical failure report. Gregory Bauer observed a numerical failure with HPL and n = $2^{17}$ = 131,072.
HPL developers recommend that HPL users who want to test matrices of size larger than $2^{15}$ not use a power of two.
2007: Numerical failure report. A large manufacturer observed a numerical failure with HPL and n = $2^{13} \times 271$ = 2,220,032. Note that this n is not a power of two.
2008: HPL 2.0. Piotr Luszczek incorporated a new PRNG, which uses a linear congruential algorithm with $a = 6364136223846793005$, $c = 1$, $m = 2^{64}$. The period of this PRNG is $2^{64}$.

Linpack Benchmark Matrix Generator
The Linpack Benchmark 1.0 matrix generator produces the pseudo-random coefficient matrix A in the HPL subroutine HPL_pdmatgen.c.
The PRNG uses a linear congruential algorithm, $X(n+1) = (a \cdot X(n) + c) \bmod m$, with $m = 2^{31}$, $a = 1103515245$, $c = 12345$. The period of this PRNG is $2^{31}$.
HPL 1.0 fills its matrices with pseudo-random numbers by columns using this sequence s, starting with A(1,1) = s(1), A(2,1) = s(2), A(3,1) = s(3), and so on:
$A(i,j) = s((j-1) \cdot n + i), \quad 1 \le i, j \le n$   (1)
$s(i + 2^{31}) = s(i)$ for any $i \in \mathbb{N}$, and $s(i) \ne s(j)$ for any $1 \le i < j \le 2^{31}$   (2)
The matrix generated by the Linpack Benchmark 1.0 matrix generator depends solely on the dimension n. Benchmarkers are therefore required to use the same matrix for any block size, any number of processors, and any grid size.
The computed pseudo-random numbers in the sequence s depend only weakly on the computer system. Consequently, the pivot pattern of the Gaussian elimination is preserved from one computer system to another.
Finally, the linear congruential algorithm for the sequence s allows a scalable implementation of the construction of the matrix.

Problem Statement
In May 2007, a large HPC manufacturer ran a 20-hour HPL (High Performance Linpack) benchmark, which failed:
||Ax-b||_∞ / ( eps * ||A||_1 * N ) = 9.224e+94 ..... FAILED
n = $2^{13} \times 271$ = 2,220,032
It was neither a hardware failure nor a software failure but a predictable numerical issue.
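One way to see why the failure is predictable: the sequence s repeats after $2^{31}$ values, so when the matrix is filled by columns with $A(i,j) = s((j-1) \cdot n + i)$ and n is a multiple of a large power of two, two columns can start at the same point of the cycle and come out identical, which makes A singular. The Python sketch below reproduces this effect at small scale. It is an illustration only, not HPL code: the modulus is shrunk to $2^{10}$ (an assumption made so the example runs instantly), and the function names are hypothetical.

```python
def lcg_sequence(count, a=1103515245, c=12345, m=2**10, seed=0):
    """Return the first `count` values of X(k+1) = (a*X(k) + c) mod m.

    The multiplier and increment are those quoted for HPL 1.0; the modulus
    m = 2**10 is a scaled-down stand-in for 2**31 (assumption).
    """
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values


def fill_by_columns(n, m=2**10):
    """Build an n-by-n matrix as a list of columns, A(i,j) = s((j-1)*n + i)."""
    s = lcg_sequence(n * n, m=m)
    return [s[j * n:(j + 1) * n] for j in range(n)]


def first_duplicate_column(columns):
    """Return 1-based (j, k), j < k, for the first repeated column, or None."""
    seen = {}
    for k, col in enumerate(columns, start=1):
        key = tuple(col)
        if key in seen:
            return seen[key], k
        seen[key] = k
    return None


# n = 96 = 2**5 * 3 is a multiple of a large power of two relative to m.
print(first_duplicate_column(fill_by_columns(96)))  # (1, 33)
# n = 97 is odd, so no two columns start at the same point of the cycle.
print(first_duplicate_column(fill_by_columns(97)))  # None
```

With the real modulus $2^{31}$ and n = $2^{13} \times 271$, the same reasoning gives identical columns at distance $2^{31} / 2^{13} = 2^{18}$, which is smaller than n.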
Using n = 2,220,032, HPL 1.0 sometimes passes all three tests, sometimes two tests out of three, and sometimes none of the three.

How to Fix the Problem
The obvious recommendation is to choose any n as long as it is odd.
A check can be added at the beginning of the execution of the Linpack Benchmark matrix generator.
If n ∈ S, simply change the variable jump from M to M+1 in the HPL code (HPL_pdmatgen.c).
Increase the period of the PRNG.
A correctness check that is robust to ill-conditioned matrices could be used.

Conclusion
The problem with the Linpack Benchmark 1.0 matrix generator is corrected in the Linpack Benchmark 2.0 matrix generator, which uses a linear congruential algorithm with $a = 6364136223846793005$, $c = 1$, $m = 2^{64}$. The period of this PRNG is $2^{64}$.
The fix both extends the period of the PRNG (Pseudo-Random Number Generator) and adds a correctness test that is robust to ill-conditioned matrices.
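The check suggested under "How to Fix the Problem" could look roughly like the following Python sketch. It assumes a full-period sequence of period $2^{31}$ and a column-by-column fill, $A(i,j) = s((j-1) \cdot n + i)$, so that columns j and j + d coincide exactly when d · n is a multiple of the period. The function names are hypothetical and not part of HPL, and the test only detects exactly identical columns, not other sources of ill-conditioning.

```python
from math import gcd


def repeat_distance(n, period=2**31):
    """Smallest d > 0 such that d * n is a multiple of the PRNG period.

    Columns j and j + d of the matrix would then be identical.
    """
    return period // gcd(n, period)


def has_identical_columns(n, period=2**31):
    """True if an n-by-n matrix filled by columns repeats at least one column."""
    return repeat_distance(n, period) < n


# Sizes taken from the failure reports, plus two sizes that avoid the problem.
for n in (2**15, 2**17, 2**13 * 271, 2**13 * 271 + 1):
    print(n, repeat_distance(n), has_identical_columns(n))
```

For an odd n the repeat distance equals the full period, which is why choosing an odd n is the simplest recommendation.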