Numeric and Performance Issues of Java for Grid Computing!
Roldan Pozo
Leader, Mathematical Software Group
National Institute of Standards and Technology

Why Java?
• Portability of the Java Virtual Machine (JVM)
• Safe: minimizes memory leaks and pointer errors
• Security model
• Network-aware environment
• Parallel and distributed computing: threads, RMI
• Standard interfaces for graphics, GUIs, databases, etc.
• Widely adopted

Why not Java?
• Performance
  – interpreters too slow
  – poor optimizing compilers
  – virtual machine

Why not Java? (cont’d)
• lack of scientific software
  – computational libraries
  – numerical issues/interfaces
  – major effort to port from f77/C

Issues
• performance
• library/application support
• numeric concerns
• language features/limitations

Performance

Java Benchmarking Efforts
• CaffeineMark
• SPECjvm98
• Java Linpack
• Java Grande Forum Benchmarks
• SciMark
• Image/J benchmark
• BenchBeans
• VolanoMark
• Plasma benchmark
• RMI benchmark
• JMark
• JavaWorld benchmark
• ...

SciMark Benchmark
• Numerical benchmark for Java, C/C++
• composite results for five kernels:
  – FFT (complex, 1D)
  – Successive Over-relaxation
  – Monte Carlo integration
  – Sparse matrix multiply
  – dense LU factorization
• results in Mflops
• two sizes: small, large
http://math.nist.gov/scimark

SciMark results
• over 200 platforms (OS/CPU/JVM)
• deployed from one mouse click
  – numerical libraries
  – instrumentation/analysis
  – optional database collection
• impossible to do in Fortran/C/C++

SciMark 2.0 results
[Bar chart: composite SciMark 2.0 score in Mflops (axis 0 to 350), one bar per platform:]
  – AMD 1.5 GHz, IBM JDK 1.3, OS/2
  – Intel PIII (600 MHz), IBM 1.3, Linux
  – AMD Athlon (750 MHz), IBM 1.1.8, OS/2
  – Intel Celeron (464 MHz), MS 1.1.4, Win98
  – Sun UltraSparc 60, Sun 1.1.3, Sol 2.x
  – SGI MIPS (195 MHz), Sun 1.2, Unix
  – Alpha EV6 (525 MHz), NE 1.1.5, Unix

JVMs have improved over time
[Bar chart: SciMark score (axis 0 to 35) by JVM version: 1.1.6, 1.1.8, 1.2.1, 1.3]
SciMark: 333 MHz Sun Ultra 10

SciMark: Java vs. C (Sun UltraSPARC 60)
[Bar chart: C vs. Java (axis 0 to 90) for the FFT, SOR, MC, Sparse, and LU kernels]
* Sun JDK 1.3 (HotSpot), javac -O; Sun cc -O; SunOS 5.7

SciMark: Java vs. C (Intel PIII 500MHz, Linux)
[Bar chart: C vs. Java (axis 0 to 160) for the FFT, SOR, MC, Sparse, and LU kernels]
* RH Linux 6.2, gcc (v. 2.91.66) -O6, IBM JDK 1.3, javac -O

Current JVMs aren’t so bad...
• SciMark high score: 323 Mflops*
  – FFT: 271 Mflops
  – Jacobi: 353 Mflops
  – Monte Carlo: 60 Mflops
  – Sparse matmult: 308 Mflops
  – LU factorization: 621 Mflops
* 1.5 GHz AMD Athlon, IBM 1.3.1, OS/2

Making Java fast(er)
• Native methods (JNI)
• stand-alone compilers (.java -> .exe)
• modified JVMs
  – (fused multiply-adds, bypass array bounds checking)
• aggressive bytecode optimization
  – JITs, HotSpot, etc.
• bytecode transformers
• concurrency: threads, RMI

Numerical Software (Libraries)

Scientific Java Libraries
• Matrix library (JAMA)
  – NIST/MathWorks
  – LU, QR, SVD, eigenvalue solvers
• Java Numerical Toolkit (JNT)
  – special functions
  – quadrature
  – random numbers
  – BLAS subset
• IBM
  – Array class package
• Univ. of Maryland
  – Linear Algebra library
• JLAPACK
  – port of LAPACK
• COLT
• Visual Numerics
  – LINPACK
  – Complex

Java Numerics Group
• industry-wide consortium to establish tools, APIs, and libraries
  – IBM, Intel, Compaq/Digital/HP, Sun, MathWorks, VNI, NAG
  – NIST, Inria
  – Berkeley, UCSB, Austin, MIT, Indiana
• component of Java Grande Forum
  – Concurrency group

Scientific Java Resources
• Java Numerics Group
  – http://math.nist.gov/javanumerics
• Java Grande Forum
  – http://www.javagrande.org
• SciMark Benchmark
  – http://math.nist.gov/scimark

Java Numerics Issues
• Virtual Machine
  – reproducibility: IEEE floating point model
  – multidimensional arrays
  – complex data types
  – lightweight objects
• Language
  – operator overloading
  – generic typing (templates)

Floating point model
• binary reproducibility hard for numerics
• floating point model based on Sun architecture...
  – IEEE 754 standard, no extended formats
  – on other platforms (Intel, IBM, Alpha) must round down: up to 10x performance hit
  – solved by relaxing JVM model: extended exponents OK, use strictfp otherwise

Floating Point Model (cont’d)
• Elementary functions
  – loose definition based on Sun C lib
  – e.g. on x86: hardware sin(x) = x for x > 2^64
  – solved by allowing results to vary within 1 ulp
  – use StrictMath library otherwise
• Fused Multiply-Adds (FMAs)
  – available on many platforms, 50% performance hit
  – discussed, but not yet resolved

Java Arrays
• 1-d array size limited to 2^31 - 1 elements (int index)
• multidimensional arrays are arrays of arrays, à la C: A[i][j]
• need Fortran-like arrays for best optimization
  – no aliasing of subsections
  – map to 1-d vector
• proposed ‘row-major’ contiguous array class
  – use A.get(i,j): ugly & slow?

Java vs. Fortran Performance
[Bar chart: Fortran vs. Java in Mflops (axis 0 to 250) for the MATMULT, BSOM, and SHALLOW benchmarks]
* IBM RS/6000 67MHz POWER2 (266 Mflops peak), AIX Fortran, HPJC

Complex numbers / Lightweight Objects
• class has wrong semantics (x == y)
• can’t return/pass by value
• temporaries can clog garbage collection
• solutions:
  – add complex native type to JVM (?)
  – allow lightweight objects (C structs)
    • general solution for intervals, multiprecision, etc.
    • requires new category of variables in JVM
  – use preprocessor to unroll complex operations

Java Language Issues
• operator overloading
  – numeric community makes best case
  – unlikely to be introduced into the language
  – can be solved by external preprocessors
• templates (generics)
  – proposal for Generic Java (GJ) passed through standardization process
• Is Java a language issue at all...?

Conclusions
• Java’s biggest strength for scientific computing: binary portability, dynamic distribution
• Java numerics performance can be competitive
  – can achieve efficiency of optimized C/Fortran
  – whether it will depends on economics, not technology
• best Java performance on commodity platforms
• improving Java numerics:
  – integrate true array and complex into Java standard
  – more libraries and numerical software support
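
As an illustration of the ‘row-major’ contiguous array class mentioned under Java Arrays, the following is a minimal sketch of how a 2-d array could be mapped onto a single 1-d vector and accessed through get(i,j). The class name RowMajorArray2D and all of its methods are hypothetical, chosen for this example only; they are not taken from JAMA, COLT, the IBM Array package, or any Java Numerics Group proposal.

```java
// Hypothetical sketch: RowMajorArray2D is an illustrative name, not a class
// from any existing library or from a proposed Java standard.
final class RowMajorArray2D {
    private final int rows;
    private final int cols;
    private final double[] data; // one contiguous block, row-major order

    RowMajorArray2D(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.cols = cols;
        // multiplyExact throws ArithmeticException if rows * cols overflows int.
        this.data = new double[Math.multiplyExact(rows, cols)];
    }

    int rows() { return rows; }
    int cols() { return cols; }

    // Element (i, j) lives at offset i * cols + j in the 1-d vector.
    double get(int i, int j) {
        checkIndex(i, j);
        return data[i * cols + j];
    }

    void set(int i, int j, double value) {
        checkIndex(i, j);
        data[i * cols + j] = value;
    }

    private void checkIndex(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
    }

    // C = A * B written only with get/set, to show what client code looks like.
    static RowMajorArray2D multiply(RowMajorArray2D a, RowMajorArray2D b) {
        if (a.cols != b.rows) {
            throw new IllegalArgumentException("shape mismatch");
        }
        RowMajorArray2D c = new RowMajorArray2D(a.rows, b.cols);
        for (int i = 0; i < a.rows; i++) {
            for (int k = 0; k < a.cols; k++) {
                double aik = a.get(i, k);
                for (int j = 0; j < b.cols; j++) {
                    c.set(i, j, c.get(i, j) + aik * b.get(k, j));
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        RowMajorArray2D a = new RowMajorArray2D(2, 2);
        a.set(0, 0, 1.0); a.set(0, 1, 2.0);
        a.set(1, 0, 3.0); a.set(1, 1, 4.0);
        RowMajorArray2D p = multiply(a, a);
        System.out.println(p.get(0, 0) + " " + p.get(0, 1));
        System.out.println(p.get(1, 0) + " " + p.get(1, 1));
    }
}
```

Because every row shares one backing vector, no two rows can alias separately allocated storage, which is the property the slide asks for. The accessor calls in multiply also show the notational cost ("ugly") of doing without A[i][j] syntax or operator overloading; whether they are slow depends on how well a given JVM inlines them, which this sketch does not measure.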