Template Recognition of Bytecodes and Execution

advertisement
Template Recognition of Bytecodes and Execution
Using Hardware Architecture.
Thesis Proposal for the MS ECE Program
Written for Dr. Douglas Lyon
By
Ian Reddy.
Electrical and Computer Engineering Department
Fairfield University, Fairfield CT 06430
ian_snaheel@rediffmail.com
Introduction
The goal of this thesis is to explore methods for
the vectorization of Java. The Java virtual machine does
not take advantage of the vector-processing unit available
in most of the CPU’s. Other languages (like Fortran and
C/C++) are able to take advantage of these facilities
because they do not make use of the more generic byte
codes, generated by most java compilers.
1.1
Problem Statement
Given bytecodes that have been output by Java
compiler, we wish to find a way to generate ahead of time
compilation that enables the use of Vector processing
Units, common in today’s CPU’s.
A
series
of
experiments
will
be
conducted
to
demonstrate the speed-ups that are available for the code
that makes use of the SIMD architecture in the vectorprocessing unit. We anticipate speed-ups between 2 and 10
times depending on the communication overhead.
1.2
Approach
The vector-processing unit has a set of registers,
which are often unused. There are several subroutine
libraries available that are already tuned to take
advantage of these instructions. Performance speed-ups
could be achieved by using these registers for computation
intensive methods.
The VECMATH class will be used for experiments and
results. VECMATH class can be a classic example of a class
with computation intensive methods and patterns such as
loops and arrays [18], In PHASE-1 Java Native Interface
will be generated for the VECMATH class (4 weeks), in
PHASE-2 the VECMATH class library will be manipulated to
use the Vector Processing Unit (8 weeks), In PHASE-3
wrappers for the VECMATH will be generated (6 weeks), In
PHASE-4 all the invocations to VECMATH will be replaced
with invocations referencing the manipulated VECMATH class
(6 weeks). In PAHSE-5 experiments will be conducted to test
the performance of the Vectorized class and speed-ups in
execution will be measured (6 weeks).
1.3 Motivation
Modern CPU’s are equipped with Vector processing
Units, which are additional set of registers, which are
rarely used. As in the case of JAVA slow performance is an
important drawback, as the Java Virtual machine does not
know how to use these Vector Processing Units. In this
thesis we plan to introduce explicit calls to the Vector
Processing Units for computation intensive methods. This
will compel the Java Virtual machine to use the Vector
Processing Unit; considerable speed-ups in execution can be
achieved. This will lead to improvement in performance of
applications written in JAVA.
1.4 Related Work
Considerable research has been done to enhance
Java performance. Java seems to have problems with the way
it organizes and manages multidimensional arrays, which
pose performance problems; customized array packages have
been developed to address these problems. These arrays are
used for high performance computing, they have their own
set of invocations to perform array computing in optimized
fashion [11, 13, 15, 17]. Exception management seem to add
overheads to Java performance, techniques to eliminate
unwanted exception have been proposed. In cases where
exceptions are used in iterations, worst case in nested
iterations; algorithms exist to modify the bytecodes, such
that these iterations could be wrapped into a single
exception,
there
by
reducing
the
overheads,
and
contributing to performance speed-ups [4]. Since FORTRAN, C
and C++ seem to perform better than java, techniques have
been proposed to convert java source or bytecodes to native
language, may be C, C++, ASSEMBLY or FORTRAN. Sometimes,
profilers [18] are used to detect methods which are
computational intensive, and use most of the execution
time, such methods are converted to native code, wrappers
are generated for such native and code and placed in the
bytecodes, such that these invocations would use the native
code, instead of the java code. Hybrid compilation models
have been proposed, selection strategies are used there may
be more than one method waiting to be translated to native
code due to the concurrent execution, the decision as to
which method must go first will effect the performance of
the VM, such decisions are made on the basis of historical
runs. On the other hand the selection could be made between
JIT and interpretation [12, 20]. Other techniques include
manipulation of the JIT to the working of the virtual
machine [2]. Considerable research as to how the underlying
hardware architecture could be used to enhance Java
compilation techniques has been mentioned [1, 17]. Just In
time
compilation
Heuristics
have
been
proposed
to
manipulate bytecodes to take advantage of the hardware
architecture organization. Such heuristics, target specific
architecture like IA64 [9, 10]. Tools and algorithms exist
which can automatically detect implicit loop parallelism
and exploit them at bytecode level [3, 5, 6, 7, 8, 16].
References
[1] Java Optimization for Super scalar and Vector
Architectures
Douglas Lyon. Fairfield University, Fairfield, CT.
[2] Java Virtual Machine – Present and near Future
Qiaoyin Li. Sony Distributed Systems Laboratory.
[3] Java Virtual Machine Performance Analysis with Java
Instruction Level Parallelism and Advanced Folding Scheme.
Austin Kim and Morris Chang
Dept. of Computer Science, Illinois Institute of Technology
[4] Eliminating Exception Constraints of Java programs for
IA-64
Kazuaki Ishizaki, Tatushi Inagaki, Hideaki Komatsu, Toshio
Nakatani.
[5] Semi-Automatic Parallelization of Java Applications.
Pascal A. Felber.
[6] Automatic Loop Transformation and Parallelization for
Java.
Pedro V. Artigas, Manish Gupta, Samuel P.Midkiff, Jose E
Moreiera.
[7] JAVAB – A prototype bytecode Parallelization tool.
Aart J.C. Bik and Dennis B. Gannon.
[8] JAVAB – A prototype bytecode Parallelization Manual.
Aart J.C. Bik and Dennis B. Gannon.
[9] Java Just In Time Compilation for IA-64 Architecture
Svante Arvedahl.
Ling Ping University
[10] Towards An Optimizing JIT Compiler for IA-64
Samuel K. Sanseri.
[11] High Performance and Computing in Java: Language and
Compiler Issues
P. V. Artigas, M. Gupta, S. P. Midkiff, J. E. Moreira.
[12] Techniques for Obtaining High Performance in Java
Programs
Iffat H. Kazi, Howard. H. Chen, Berdenia Stanely, and David
J. Lilja. University Of Minnesota.
[13] A Standard Java Array Package for Technical Computing
Jose E Moriera, Samuel. P. Midkiff, Manish Gupta.
IBM T.J. Watson Research Center.
[14] Java Virtual Machine Specification [book]
Tim Lindholm and Frank Yellin.
[15] Developing Numerical Libraries in JAVA
Ronald. F. Boisvert, Jack. J. Dongarra, Roldan Pozo,
Karin. A. Remington, G.W. Stewart
[16] Optimizing direct Threaded code by selective In
lining.
Ian Piumarta and Fabio Riccardi.
[17] Algebraic Java classes for Numerical Optimizations.
Matthias Schwab, Joel Schroeder
JEST (Java Estimates).
[18] Java Performance Tuning [book]
Jack K. Shirazi.
[19] A Java Virtual Machine Design Based on Hybrid
Concurrent Compilation Model
Zheng Weimin, Zheng Fengzhou, Yang Bo, Wang Yanling
[20] Vector Signal Image Processing Library
www.vsipl.org
Download