Template Recognition of Bytecodes and Execution Using Hardware Architecture. Thesis Proposal for the MS ECE Program Written for Dr. Douglas Lyon By Ian Reddy. Electrical and Computer Engineering Department Fairfield University, Fairfield CT 06430 ian_snaheel@rediffmail.com Introduction The goal of this thesis is to explore methods for the vectorization of Java. The Java virtual machine does not take advantage of the vector-processing unit available in most of the CPU’s. Other languages (like Fortran and C/C++) are able to take advantage of these facilities because they do not make use of the more generic byte codes, generated by most java compilers. 1.1 Problem Statement Given bytecodes that have been output by Java compiler, we wish to find a way to generate ahead of time compilation that enables the use of Vector processing Units, common in today’s CPU’s. A series of experiments will be conducted to demonstrate the speed-ups that are available for the code that makes use of the SIMD architecture in the vectorprocessing unit. We anticipate speed-ups between 2 and 10 times depending on the communication overhead. 1.2 Approach The vector-processing unit has a set of registers, which are often unused. There are several subroutine libraries available that are already tuned to take advantage of these instructions. Performance speed-ups could be achieved by using these registers for computation intensive methods. The VECMATH class will be used for experiments and results. VECMATH class can be a classic example of a class with computation intensive methods and patterns such as loops and arrays [18], In PHASE-1 Java Native Interface will be generated for the VECMATH class (4 weeks), in PHASE-2 the VECMATH class library will be manipulated to use the Vector Processing Unit (8 weeks), In PHASE-3 wrappers for the VECMATH will be generated (6 weeks), In PHASE-4 all the invocations to VECMATH will be replaced with invocations referencing the manipulated VECMATH class (6 weeks). In PAHSE-5 experiments will be conducted to test the performance of the Vectorized class and speed-ups in execution will be measured (6 weeks). 1.3 Motivation Modern CPU’s are equipped with Vector processing Units, which are additional set of registers, which are rarely used. As in the case of JAVA slow performance is an important drawback, as the Java Virtual machine does not know how to use these Vector Processing Units. In this thesis we plan to introduce explicit calls to the Vector Processing Units for computation intensive methods. This will compel the Java Virtual machine to use the Vector Processing Unit; considerable speed-ups in execution can be achieved. This will lead to improvement in performance of applications written in JAVA. 1.4 Related Work Considerable research has been done to enhance Java performance. Java seems to have problems with the way it organizes and manages multidimensional arrays, which pose performance problems; customized array packages have been developed to address these problems. These arrays are used for high performance computing, they have their own set of invocations to perform array computing in optimized fashion [11, 13, 15, 17]. Exception management seem to add overheads to Java performance, techniques to eliminate unwanted exception have been proposed. In cases where exceptions are used in iterations, worst case in nested iterations; algorithms exist to modify the bytecodes, such that these iterations could be wrapped into a single exception, there by reducing the overheads, and contributing to performance speed-ups [4]. Since FORTRAN, C and C++ seem to perform better than java, techniques have been proposed to convert java source or bytecodes to native language, may be C, C++, ASSEMBLY or FORTRAN. Sometimes, profilers [18] are used to detect methods which are computational intensive, and use most of the execution time, such methods are converted to native code, wrappers are generated for such native and code and placed in the bytecodes, such that these invocations would use the native code, instead of the java code. Hybrid compilation models have been proposed, selection strategies are used there may be more than one method waiting to be translated to native code due to the concurrent execution, the decision as to which method must go first will effect the performance of the VM, such decisions are made on the basis of historical runs. On the other hand the selection could be made between JIT and interpretation [12, 20]. Other techniques include manipulation of the JIT to the working of the virtual machine [2]. Considerable research as to how the underlying hardware architecture could be used to enhance Java compilation techniques has been mentioned [1, 17]. Just In time compilation Heuristics have been proposed to manipulate bytecodes to take advantage of the hardware architecture organization. Such heuristics, target specific architecture like IA64 [9, 10]. Tools and algorithms exist which can automatically detect implicit loop parallelism and exploit them at bytecode level [3, 5, 6, 7, 8, 16]. References [1] Java Optimization for Super scalar and Vector Architectures Douglas Lyon. Fairfield University, Fairfield, CT. [2] Java Virtual Machine – Present and near Future Qiaoyin Li. Sony Distributed Systems Laboratory. [3] Java Virtual Machine Performance Analysis with Java Instruction Level Parallelism and Advanced Folding Scheme. Austin Kim and Morris Chang Dept. of Computer Science, Illinois Institute of Technology [4] Eliminating Exception Constraints of Java programs for IA-64 Kazuaki Ishizaki, Tatushi Inagaki, Hideaki Komatsu, Toshio Nakatani. [5] Semi-Automatic Parallelization of Java Applications. Pascal A. Felber. [6] Automatic Loop Transformation and Parallelization for Java. Pedro V. Artigas, Manish Gupta, Samuel P.Midkiff, Jose E Moreiera. [7] JAVAB – A prototype bytecode Parallelization tool. Aart J.C. Bik and Dennis B. Gannon. [8] JAVAB – A prototype bytecode Parallelization Manual. Aart J.C. Bik and Dennis B. Gannon. [9] Java Just In Time Compilation for IA-64 Architecture Svante Arvedahl. Ling Ping University [10] Towards An Optimizing JIT Compiler for IA-64 Samuel K. Sanseri. [11] High Performance and Computing in Java: Language and Compiler Issues P. V. Artigas, M. Gupta, S. P. Midkiff, J. E. Moreira. [12] Techniques for Obtaining High Performance in Java Programs Iffat H. Kazi, Howard. H. Chen, Berdenia Stanely, and David J. Lilja. University Of Minnesota. [13] A Standard Java Array Package for Technical Computing Jose E Moriera, Samuel. P. Midkiff, Manish Gupta. IBM T.J. Watson Research Center. [14] Java Virtual Machine Specification [book] Tim Lindholm and Frank Yellin. [15] Developing Numerical Libraries in JAVA Ronald. F. Boisvert, Jack. J. Dongarra, Roldan Pozo, Karin. A. Remington, G.W. Stewart [16] Optimizing direct Threaded code by selective In lining. Ian Piumarta and Fabio Riccardi. [17] Algebraic Java classes for Numerical Optimizations. Matthias Schwab, Joel Schroeder JEST (Java Estimates). [18] Java Performance Tuning [book] Jack K. Shirazi. [19] A Java Virtual Machine Design Based on Hybrid Concurrent Compilation Model Zheng Weimin, Zheng Fengzhou, Yang Bo, Wang Yanling [20] Vector Signal Image Processing Library www.vsipl.org