Inlining Java Bytecode
Fabio Pellacini and Ioannis Vetsikas
CS612 final project report
1 Problem statement
Given Java's object-oriented design, calls to small methods are very common in Java programs. For programs that make heavy use of small methods (such as arithmetic operations on complex numbers, or accessor methods), execution time can be improved considerably by inlining at the bytecode level. This is true in all OO languages, but especially in Java bytecode, since the overhead of invoking a method is extremely high compared to native code.
Our project aims at speeding up Java bytecode by inlining methods. To do that we first analyze the bytecode to determine the sets of loaded and instantiated classes. Using these sets we prepare the bytecode for the inlining phase using a type hierarchy analysis, and finally we inline the bytecode.
Using this very simple system, we obtained good speedups on programs such as complex matrix multiplication.
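As a concrete illustration, the kind of code that benefits most looks like the following hypothetical Complex class (our own example, not taken from our test suite): every arithmetic operation goes through tiny accessor and arithmetic methods, so call overhead dominates the actual computation.

```java
// Hypothetical accessor-heavy class: a prime target for bytecode inlining.
public class Complex {
    private double re, im;
    public Complex(double re, double im) { this.re = re; this.im = im; }
    public double re() { return re; }   // trivial accessor
    public double im() { return im; }   // trivial accessor
    public Complex times(Complex o) {   // small arithmetic method
        return new Complex(re * o.re() - im * o.im(),
                           re * o.im() + im * o.re());
    }
    public static void main(String[] args) {
        Complex c = new Complex(0, 1).times(new Complex(0, 1)); // i * i = -1
        System.out.println(c.re() + " " + c.im());              // -1.0 0.0
    }
}
```

In a 256x256 complex matrix multiplication, times() and the accessors are invoked millions of times, which is why inlining them pays off.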
2 Bytecode details
This chapter is a short introduction to the structure of the bytecode and to some details of JVM execution, meant to help in understanding the rest of the implementation (more information can be found in [1]). In the following discussion we assume that no reflection is used.
2.1 The class file format
For each class, the JVM requires a file containing the bytecode of that class.
A class file, schematically, contains information about the class hierarchy, an area called constant pool,
information about the fields that are declared in the Java class, information about the methods and a series
of attributes.
The class info section stores the name of the class defined, the name of its parent class, the access flags
for the current class and an array of the interfaces that the class implements. As can be easily seen, this
preserves all the information about the type structure that is in the Java source code.
The constant pool section stores all the constants needed in the class file. Whenever a constant is needed in the bytecode, it is replaced by an index into the constant pool. The constants are also typed, so we know whether we are referring to an integer, a class reference, a string, and so forth.
The field info section is an array of field information records. Each record basically contains a Java field declaration, i.e. the field name, type and modifiers, and an array of attributes relative to that field. As with all other attributes in Java class files, there are standard attributes declared in the specification, and one can also define one's own attributes (so the bytecode can easily be annotated).
The method section is very similar to the field one. It contains the Java declaration of the method
(signature and modifiers) and a set of attributes. One of the standard ones that is very important for us is the
Code attribute. It contains the compiled code for the method if the method is not abstract or native.
2.2 Class initialization, object instantiation and finalization
In order to perform type hierarchy analysis, we need to know when classes get loaded and when they are
instantiated. We will assume no specific order for loading and verifying Java bytecodes (unless required for
the initialization process).
Initialization of a class consists of executing its static initializers1 [1, section 2.17.4]. Initialization of an interface consists of executing the initializers for the fields declared in the interface. Before a class or interface is initialized, its direct superclass must be initialized, but the interfaces implemented by the class need not be. Similarly, the superinterfaces of an interface need not be initialized before the interface is initialized.
A class or interface type T will be initialized immediately before one of the following occurs:
- T is a class and an instance of T is created.
- T is a class and a static method of T is invoked.
- A nonconstant static field of T is used or assigned. A constant field is one that is (explicitly or implicitly) both final and static, and that is initialized with the value of a compile-time constant expression. A reference to such a field must be resolved at compile time to a copy of the compile-time constant value, so uses of such a field never cause initialization.
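The constant-field rule can be observed directly. In the following sketch (our own example), reading a compile-time constant does not trigger the class's static initializer, while reading a nonconstant static field does:

```java
// Demonstrates which field accesses trigger class initialization.
public class InitDemo {
    static StringBuilder log = new StringBuilder();
    static class K {
        static final int CONST = 42;   // compile-time constant: inlined by javac
        static int counter = 7;        // nonconstant static field
        static { InitDemo.log.append("K-init;"); }
    }
    public static void main(String[] args) {
        int a = K.CONST;               // resolved at compile time: K is NOT initialized
        System.out.println("after CONST: [" + InitDemo.log + "]");
        int b = K.counter;             // nonconstant access: triggers K's initialization
        System.out.println("after counter: [" + InitDemo.log + "]");
    }
}
```

Running main prints an empty log after the constant access and "K-init;" only after the nonconstant access.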
An object of a class is instantiated only when a NEW instruction is executed in the bytecode [1, section 2.17.6]. This instruction is followed by a call to the appropriate constructor (named <init> in the bytecode), so we don't need to deal with constructors explicitly2.
For the purpose of our analysis we also need to take finalization into account. Objects are finalized before they are collected by the garbage collector; to do this, Java executes the finalize method. Classes are finalized before they are unloaded by the JVM, and the finalization process is a call to the classFinalize method [1, section 2.17.7].
2.3 Bytecode method code
Every method call in the JVM has its own stack frame (the stack is not shared between methods as in standard native compilation). Also, local variables are not stored on the method stack as far as the bytecode instructions are concerned. Every method declares the maximum size of its stack and the maximum number of its locals.
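For example, a simple two-argument instance method compiles to a handful of instructions; the comment below shows the approximate bytecode javac emits for it, together with the declared maximums (slot 0 holds this):

```java
public class StackDemo {
    // javac compiles add to roughly:
    //   iload_1; iload_2; iadd; ireturn
    // with max_stack = 2 and max_locals = 3 (slot 0 = this, 1 = a, 2 = b)
    int add(int a, int b) { return a + b; }

    public static void main(String[] args) {
        System.out.println(new StackDemo().add(40, 2)); // 42
    }
}
```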
2.4 Bytecode method invocation
The procedure for calling a method is similar to the usual calling convention in native code. All arguments are put on the caller's stack, and then an invoke instruction jumps to the subroutine. After the execution of the callee has finished, the result is put on the stack.
The callee receives the parameters in its first n locals (where n is the number of arguments of the callee method). The first parameter, for non-static methods, is the reference to the object on which the method is called.
There are four different instructions in the bytecode to jump to a new method. They all need a method
reference that is an index in the constant pool. The method reference contains the signature (that also
contains the class name) of the method that must be executed.
The instructions for calling a method are:
- invokestatic: used to invoke static methods (statically linked, so can be inlined)
- invokespecial: used to call private methods, superclass methods, and constructors (statically linked, so can be inlined)
- invokevirtual: used for virtual calls (dynamically linked, cannot be inlined)
- invokeinterface: used to call a method defined in an interface (dynamically linked, cannot be inlined)
For virtual method calls, the class in the method reference will be the root of the subtree of possibly
overriding classes. For interface calls the reference will be to the method in the interface.
1. In our case, we are only interested in making sure we execute the code in the static initializer for the class, i.e. the <clinit> method.
2. Actually this is not true for String, but we assume that String is always considered loaded in the JVM, so for our specific application this doesn't matter.
3 Description of solution
3.1 Overall structure
Our system takes bytecode as input and outputs optimized bytecode; it does not require Java source code. We assume that all the needed class files are provided to the system, including all the libraries needed for the program to run.
The work is done in several phases, as portrayed in the following schema:

  Bytecode + System lib. --> [JAX] --> Class Hierarchy Analyzer --> Inliner --> Fixer --> [JAX] --> Optimized bytecode
The first stage takes the bytecode and the system libraries and feeds them into the class hierarchy analyzer, which analyzes the bytecode, reconstructs the class hierarchy, finds which classes are instantiated, builds the caller graph and resolves as many invocations as it can.
The Inliner is the next stage, which inlines the bytecode where it is deemed necessary and possible.
Then the resultant bytecode is passed through a “fixing” stage to correct any problems related to the
security mechanism.
We also found a tool from IBM called JAX3 [2], which can optionally be used before the analyzer to compress the class hierarchy and, after the final bytecode is produced, to strip the code of the methods which are not used and to clean up the constant pool.
3.2 Class hierarchy analysis
3.2.1 General
This stage aims at extracting enough information from the program being optimized to determine whether some method invocations which appear to be polymorphic4 are in fact monomorphic5.
According to the Java Virtual Machine specification, the invokestatic and invokespecial instructions are monomorphic invocations, whereas the invokevirtual and invokeinterface instructions are polymorphic. The inliner can only safely inline monomorphic calls. Therefore it is desirable to figure out whether a 'virtual' or 'interface' call is in fact monomorphic, in which case the instruction is changed to a 'special' call to indicate to the inliner that the call is monomorphic.
3.2.2 Main idea
We use a simple data flow analysis. What we need to determine is which classes get loaded and which of those get instantiated. We also need an approximation of the caller graph6. The information that needs to be propagated in the data flow analysis is therefore the class hierarchy tree and the set of instantiated classes. So what needs to be done is to start from the main function and follow the edges in the caller graph. We therefore keep track of classes that get instantiated, of accesses to static fields and of method invocations.
3. Its use is briefly described in a later part of this report.
4. An invocation site is called polymorphic if the invocation at that site can call more than one overridden method.
5. An invocation site is called monomorphic if only one particular method can get called.
6. The caller graph is a graph with a directed edge between two methods of the program if and only if the first method calls the second one. In our approximation we can have more than one edge per call site in case the call site is polymorphic.
3.2.3 Implementation
3.2.3.1 Input
Our program only needs the name of the class where the 'main' method is located. No a priori information about which classes and methods are loaded and used is necessary, since the analyzer finds all that information by itself.
3.2.3.2 How it works
- It starts from the 'main' method.
- For each method which gets analyzed, the following are done in this order:
  - If the corresponding class has not been loaded yet, it is imperative that all its superclasses and superinterfaces get loaded (when loading a class the methods <clinit>, finalize and classFinalize must be called7).
  - A node is inserted in the caller graph.
  - If the method is used for the first time:
    - it must be checked whether the method is a "dangerous reflection" method (e.g. forName, newInstance, loadClass etc.)
    - we must check all the instructions in the code of the method for the presence of the NEW instruction, in which case the set of instantiated classes must be updated with the newly instantiated classes
  - If the method is used for the first time, or the set of instantiated classes has grown (some more classes got instantiated) since the last use of the method, we need to check the invocation sites to make sure that all the methods that could potentially get called are indeed marked as called. This means of course that we need to check the whole hierarchy again, which could have changed since the last use of the method, and for each instantiated class in the subtree consider the appropriate method as used.
The set of loaded classes, the set of instantiated classes and the caller graph that the analyzer computes are in fact "conservative approximations" of the real ones. By that we mean that we might estimate that more classes are loaded or instantiated, and that more methods can get called, than is really the case. However, we will never underestimate the number of methods that can be called from a call site or the number of classes that get loaded, and therefore when we devirtualize we preserve the equivalence of the resulting program to the original one.
It should be noted that when a class or an interface is loaded, its superclasses or superinterfaces are "updated" to know which their "children" in the hierarchy are. Also, when a method gets used it "remembers" a few pieces of information (e.g. which methods it calls, or how big the instantiated class set was the last time it was used, etc.).
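The propagation loop described above can be sketched as a tiny worklist algorithm over toy caller-graph tables (all names and data structures here are our own simplification; the re-analysis triggered by growth of the instantiated-classes set is omitted):

```java
import java.util.*;

// Toy sketch of the analyzer's propagation from 'main' (hypothetical data).
public class Worklist {
    static Map<String, List<String>> calls = Map.of(   // method -> callees
        "main", List.of("A.f"),
        "A.f",  List.of("B.g"),
        "B.g",  List.of());
    static Map<String, List<String>> news = Map.of(    // method -> classes it instantiates
        "main", List.of("A"),
        "A.f",  List.of("B"),
        "B.g",  List.of());

    static Set<String> usedMethods  = new LinkedHashSet<>();
    static Set<String> instantiated = new TreeSet<>();

    static void analyze(String entry) {
        Deque<String> work = new ArrayDeque<>();
        work.add(entry);
        while (!work.isEmpty()) {
            String m = work.poll();
            if (!usedMethods.add(m)) continue;     // already analyzed (re-analysis omitted)
            instantiated.addAll(news.get(m));      // scan method body for NEW instructions
            work.addAll(calls.get(m));             // follow caller-graph edges
        }
    }

    public static void main(String[] args) {
        analyze("main");
        System.out.println(usedMethods);     // [main, A.f, B.g]
        System.out.println(instantiated);    // [A, B]
    }
}
```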
3.2.4 Running Time Complexity
Assume that:
M = total number of methods which are used
C = total number of classes which are instantiated
Li = length of code in method i
Hi = cost to traverse the hierarchical sub trees from each invocation site in the code of method i and
determine which methods get called
For each of the M methods:
- Search its code for NEW instructions. This takes O(Li) time.
- Run the analysis of which methods get invoked. This can be run C times in the worst case, since the analysis is run only if the instantiated-classes set has grown, C is the size the set reaches at the end, and that size obviously cannot decrease. Each run costs O(Hi + Li): the cost of checking all the method code plus the cost of traversing the hierarchical subtrees from each invocation site in the code of method i and determining which methods get called. Thus this step takes O(C * (Hi + Li)) time.

So the total cost is O(sum_{i=1..M} [Li + C * (Hi + Li)]) which, since C >= 1 in an OO program, gives O(sum_{i=1..M} C * (Hi + Li)).

But this is a worst-case analysis; in practice it is not really necessary to run the analysis C times for each method, but usually fewer.

7. Since the finalizers are called before the objects are collected, and we don't know when this is going to happen, we call them at the beginning of the analysis, since the exact time does not really matter.
3.2.5 Type hierarchy analysis and devirtualization
This is the aftermath of the class hierarchy analysis. For each polymorphic call site we go through the class hierarchy tree and determine which methods can get called; if there is only one, then the call is really monomorphic and the call is changed to invokespecial so that the inliner will know that this is so8.
3.2.6 Caveats
As the results of the analyzer on several programs have shown, it is unfortunately quite often impossible to know exactly what happens in each part of the program (every class and every method). This is due to the use of two Java platform features:
3.2.6.1 Reflection
The reflection API allows the programmer to do introspection of objects and classes in the current JVM. By means of it the programmer obtains references to fields and methods. Using the latter, the programmer can call a method in a way that the given analysis cannot track. It is also possible to dynamically load new classes, which can also change a class that is in memory. This breaks the analysis even more badly, since there is no way to detect it9.
The only way we deal with this is to at least detect that reflection has been used. If so, we cannot guarantee any part of the analysis, so we cannot devirtualize safely10.
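A minimal example (ours) of the kind of reflective call that defeats the analysis: the target of m.invoke cannot be determined from the call site, and the class name passed to Class.forName could just as well come from user input.

```java
import java.lang.reflect.Method;

public class ReflectDemo {
    public static int target() { return 7; }

    public static void main(String[] args) throws Exception {
        // The analyzer sees only a call to Method.invoke; it cannot statically
        // tell that 'target' is the method actually being executed.
        Method m = Class.forName("ReflectDemo").getMethod("target");
        System.out.println(m.invoke(null));
    }
}
```

This is why the analyzer flags forName, newInstance, loadClass etc. as "dangerous reflection" methods.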
3.2.6.2 Java Native Interface (JNI)
The Java Native Interface (JNI) is the standard Java interface to native libraries. A native method can create objects, modify fields, call Java methods and load new classes. All these possibilities break our analysis completely. Since we don't have the code for native methods, they are just black boxes that we cannot analyze.
Unfortunately they are used extensively (just printing a string uses one). We assume that a native method will not give us problems; this is true, at least, for all native methods in the system libraries. If we find a native method call in the user code, then we cannot devirtualize safely11.
It should be noted, however, that since native functions occur in almost every program, we have added an option to the analyzer for different levels of security. Depending on the security level given by the user, the analyzer will try to devirtualize anyway, but it will output a warning message to inform the user that the optimization might not work and the reason for it (existence of a native method, use of reflection etc.)

8. We can define a set of classes that we are not going to change during the inliner phase, nor do we change any call site to a method in that set of classes. From now on these are called system classes, since normally we include the system libraries in this set.
9. To our knowledge, there are no compilers or tools doing static analysis that are able to do anything about reflection.
10. We can always inline static and private methods, and superclass calls.
11. We can always inline static and private methods, and superclass calls.
3.3 Inliner
3.3.1 Inlining heuristic
In the current implementation the inliner inlines every static and special call; there is no heuristic for inlining. In the test cases we ran, inlining was always better than calling a function. We do not really have the code-growth problem of standard inlining procedures, since the cost of a function call is quite high in Java bytecode execution.
3.3.2 Support for static/special calls
Given a statically linked method, we can easily inline the callee code. To do this we follow this scheme:
1. Grow maxstack and maxlocals to the sum of the caller and callee maximum values12
2. Pop the arguments from the stack and store them into the new local variables
3. Insert code for a NullPointerException check, in case we are inlining a non-static method (the code we insert is the same as the compiled code for the statement
if (ref == null) throw new NullPointerException();)
4. Insert code for synchronization, in case of synchronized methods (MONITORENTER instruction)
5. Insert the callee code, renumbering all references to local variables; in case we are inlining a method from another class we also need to fix the constant pool
6. Change every return into an unconditional jump to the first instruction in the caller code after the call
7. Insert code for synchronization, in case of synchronized methods (MONITOREXIT instruction)
After inlining all the methods in a given caller, we also run a small optimization procedure. This cleans up the resulting code in two ways: we remove jump instructions that refer to the next bytecode instruction, and we remove the NOP instructions that we used as placeholders to make jumps easier to deal with.
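At the source level, steps 1-7 correspond roughly to the following hand-inlined sketch (the real transformation of course operates on bytecode, not source; names are ours):

```java
public class InlineSketch {
    static int add(int a, int b) { return a + b; }   // callee

    static int callerOriginal(int x) { return add(x, 1); }

    // Hand-inlined equivalent of callerOriginal:
    static int callerInlined(int x) {
        int a = x, b = 1;       // step 2: arguments become fresh locals
        int result = a + b;     // step 5: callee body with renumbered locals
        return result;          // step 6: the callee's return becomes a jump here
    }

    public static void main(String[] args) {
        System.out.println(callerOriginal(41)); // 42
        System.out.println(callerInlined(41));  // 42
    }
}
```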
3.3.3 Note about inlining by jump
We also tried to inline the code in the caller by using the jump-to-subroutine instruction normally used to compile the finally statement. We did this to avoid an unwanted growth of code size in case we need to inline the same function several times. The other reason we were interested in such an inlining procedure was that by doing so we basically compile the method call with a shared stack, so it would have been useful to understand the behavior of this kind of method call.
The code we got with this procedure is incredibly slower than the non-inlined code. The reason seems to be that the jsr and ret instructions we use are otherwise only used to compile the finally statement, and since the JVM is very slow in dealing with exceptions, this is much slower than a normal call.
Unfortunately there is no other way in the bytecode instruction set to implement a subroutine call and return. We need an instruction that jumps to a bytecode address found on the stack, and the only such instruction is ret. It expects to find on the stack a return address put there by the jsr instruction (remember that the JVM verifies the type of every element on the stack before using it, so we cannot just push an integer on the stack ourselves, because its type would be "int", not "return address").
3.3.4 Caveats
There are some caveats we have to deal with to inline correctly: cycles in the caller graph, exception handling, and problems due to the way Java enforces correctness in the bytecode.
12. If we inline more than one call, we share locals and stack slots.
3.3.4.1 Cycles in caller graph
The way we deal with cycles in the caller graph is simply to inline recursively a certain number of times and then stop. This is a correct solution and works well in practice.
3.3.4.2 Dealing with exceptions
Exceptions in the JVM are dealt with by means of exception tables. There is no explicit code for a jump when an exception occurs; instead there is a table that, for each try/catch block, reports the bytecode address range of the try block, the type of exception caught and the address of the catch code for it (finally is dealt with using jsr and ret).
If the callee throws an exception, its stack does not need to be empty when we exit the method; in this case we can accidentally "corrupt" the stack of the caller. Given that in the bytecode there is no instruction that lets us know the height of the stack (or anything similar), there is no way to check that the stack has not been corrupted. This does not allow us to inline when the call is in a try/catch block.
This conservative solution is not so bad, since exception handling is the dominant factor in the execution of a try/catch block (jumping to an exception handler is about 10 times slower than a function call).
In all the compiled code that we examined, there was no evidence of the corrupted-stack problem even when we inlined the code. Unfortunately, we couldn't find anything clarifying this point in the JVM specification, so we decided not to inline in that case, just to be on the safe side.
3.3.4.3 Violation of Java semantics
After the inlining procedure, we can be left with instructions that violate Java semantics in the optimized bytecode. There are two different problems that we must solve. The first is a violation of access privileges.
Consider for example the code in figure 1. We can safely inline A.f() in B.l(). But after we inline it, we will reference the private field i from class B. This is not allowed by the Java semantics, and the code will not be executed by the JVM, since class B will be verified before execution. This problem has a simple solution: we can easily change the access flags of the fields we need in the various class files.
class A {
  private int i;
  public final void f() { i++; }
  public final void h() { g(); }
  private void g() { …g();… }
}
class A1 extends A {
  public void g() { … }
}
class B {
  A1 a = new A1();
  public void l() { a.f(); a.h(); }
}
Figure 1
A subtler problem is changes in invocation type. Suppose we want to inline method A.h() into B.l(). We will then have a call to method A.g(), which is private in A. Since the method is recursive, we cannot avoid calling it at least once. To avoid the access violation we could change the access from private to public (or from private to package). But since we are using a special call, some JVMs will not execute the code13.
13. We could not find any specification that rules this out. The updated JVM specification for Java 2 makes no mention of this in the verifier, but JDK 1.2.1 throws a VerifierException if we try to execute the code.
But if we now change the linking to a virtual call, we will call the method A1.g() instead of A.g() (a similar example can be constructed for super calls). The only way to fix this is to change the access privileges and rename the method so that we maintain the Java semantics.
An easier way to deal with these problems is to switch off the Java verifier. This can be done by rewriting the class loader.
3.4 Library JavaClass
The whole project has been implemented using the JavaClass library [3]. The library lets us read bytecode files and modify them quite easily. It has a completely abstract way of dealing with bytecode that lets us build prototypes much faster. Of course this abstraction is a little slower than coding in another style, but it is worth it, since we do not have to write code for manipulating bytecode ourselves.
The most important features for our project were the management of the constant pool and of references
to it and the management of jump instructions in the code for the methods. Not having to deal with
bytecode indexes and jumping addresses made our life easier especially because we were modifying
existing bytecode, not writing new bytecode from scratch.
4 Experimental results
4.1 Class hierarchy analysis
Running the analyzer we get some very interesting results14.
Even when the analyzer is run on small programs, it turns out that a large number of classes is used. For example, even for an extremely small program consisting only of a 'main' function that just prints a line on the screen, 10 system classes are loaded (1 of those is instantiated) and 13 system methods in those 10 classes are used, whereas 93 are not used. It should also be noted that one of those methods is a "native" method.
When the analyzer was run on a slightly bigger program, which finds all the divisors of given numbers,
the results were more interesting:
                Classes Loaded   Methods Used   Methods Not Used
Libraries          26 (93%)        39 (8.6%)      417 (91.4%)
Other Classes       2 (7%)          5 (62.5%)       3 (37.5%)
Total              28              44 (9.5%)      420 (90.5%)
What can easily be seen is that most of the system methods are unused, whereas most of the user program methods are in fact used. The 3 methods in the divisor program that were not used were prototyping methods. Almost all of the classes loaded are system classes. Again a native method is found and, since we print the result to the screen, it is the same native method as in the trivial program case. The devirtualizer was able to resolve 3 polymorphic calls. In fact those 3 calls were all the 'invokevirtual' calls that the devirtualizer attempted to resolve15, which means that all the 'virtual' calls in the program are in fact monomorphic. There were no 'invokeinterface' calls.
To get a better feeling for what a "real" program would produce, we ran the analyzer taking as input the analyzer itself, which is a sizable program.
14. Almost all the results presented in this part have been obtained using the JDK 1.2.1 system library classes. The results using the Microsoft Visual J++ library classes differ in some cases, but only in the actual numbers and not in the general behavior observed by the analysis.
15. It should be noted that the analyzer does not try to resolve calls in system libraries or calls that invoke a system library method.
                Classes Loaded   Methods Used   Methods Not Used
Libraries         125 (62%)       286 (19%)      1254 (81%)
Other Classes      76 (38%)       315 (36%)       557 (64%)
Total             201             601 (25%)      1811 (75%)
As it turns out, most of the classes loaded are system classes (libraries), but to a smaller percentage than in the previous two cases. There is a low usage of methods in the system libraries (only 19%), but there is actually a rather low usage of methods in the remaining classes as well (36%). The latter might seem strange, but it is easily explained by the fact that the other classes include the "JavaClass" library, which is used for manipulating the bytecode files16. What should be noted, however, is that the program realizes that only 69 of the over 250 classes found in the "JavaClass" library are used (the remaining 7 loaded classes are the ones we wrote for the analyzer). Also, only 125 system classes are loaded out of the over 2000 classes in JDK 1.2.1. Native methods are again found, but this time dynamic class and method loading methods are also invoked (use of reflection), and thus it is uncertain whether the program can safely devirtualize. However, by setting the security level of the program to 'Optimize Always', we force the devirtualizer to work. The results are really striking and show why this kind of analysis is very useful:
                          Virtual Calls   Interface Calls
Calls attempted
to be resolved                 337                0
Resolved calls                 329                0
Percentage resolved           97.6%               -

                     Virtual Calls   Interface Calls   Special Calls   Static Calls
Initial number           1231               8               374            162
Final number              902               8               703            162
Change                   -329               0              +329              0
Percentage change       -26.7%             0%             +88.0%             -
The impressive result is that out of the 337 calls that the program attempts to resolve, 329 are monomorphic and only 8 are polymorphic, which means that 97.6% of the calls are successfully resolved and inlining can thus further optimize those calls. Overall, the number of special calls has nearly doubled, whereas the number of virtual calls has decreased by more than 25%.
4.2 Inliner
In this section we give some results to show how inlining speeds up Java bytecode. We ran the inliner on programs that execute the same short method a great number of times. All the test cases were run using the JDK 1.2.1 JVM and the latest MS JVM on Windows NT systems with the JIT active. The reason we use the JIT for our tests is that we want an overall speedup of Java bytecode, so our solution should work together with the JIT.
4.2.1 Trivial programs
The test cases were built to execute short methods that do some computation on data local to the referenced object. The results can be found in the following table (taken using the MS JVM).
16. We could have asked the analyzer to treat the JavaClass library classes as library classes. In that case, it would have counted them together with the system classes, and therefore it would not have analyzed several hundred calls to JavaClass methods, and we would not have been able to observe the huge number of calls that were resolved.
           Devirtualization   Inlined   Inlined recursively
Prog. 1          2%             31%            62%
Prog. 2          1%             30%             -
Prog. 3          1%             26%             -
As can easily be seen, there is almost no speedup from changing the call type from virtual to special, a result also found in [2]. For Java bytecode the reason is clearly that the work needed for calling a function depends largely on the need to create a new stack frame and verify the correctness of the call (and perhaps something else). The difference in the resolution of the function call is negligible compared to the rest of the work.
After inlining once we get clearly better running times, up to 30%; the speedup is even better when we inline more than once.
We ran another program in which we compare the execution times of two different JVMs given the inlining optimization. The results are in the following table.
            JDK 1.2.1    MS
1 Call          0%       42%
No calls        0%       42%
The row labeled “1 Call” contains the speedup obtained after inlining; the row labeled “No calls” contains
the speedup of Java source code that has no function calls, which is therefore the theoretical limit for the
inlined code. As shown, the JDK JVM seems not to be affected by the optimization, but more importantly
it is also unaffected on the code that uses no function calls. This means that in certain cases (and we
stress: certain limited cases) the JDK JVM is able to inline the code itself. On the MS JVM the speedup is
significant, and more importantly we also reach the theoretical limit when inlining the code. This means
that the stack manipulation our inliner introduces is negligible compared to the actual cost of a function
call, so no further analysis is needed to minimize that cost.
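To make the comparison concrete, the following is a minimal sketch of what the “1 Call” and “No calls” versions of such a test look like (the class and method names are ours, for illustration; this is not the actual benchmark code):

```java
// Sketch of the two benchmark variants: one small accessor call per
// iteration versus the same loop with the accessor inlined by hand.
public class CallBench {
    private int value;

    CallBench(int v) { value = v; }

    // Small accessor: the kind of call the inliner replaces with a getfield.
    int get() { return value; }

    // "1 Call" version: one method invocation per iteration.
    static int sumWithCalls(CallBench[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++)
            s += a[i].get();
        return s;
    }

    // "No calls" version: the accessor body inlined by hand.
    // This is the theoretical limit for what the inliner can achieve.
    static int sumNoCalls(CallBench[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++)
            s += a[i].value;
        return s;
    }
}
```

Both versions compute the same result; the only difference is whether each iteration pays the cost of an invocation.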
4.2.2 Complex matrix multiplication
We ran the inliner on a naïve complex matrix multiplication program, which multiplies two 256x256
matrices.
                        JDK 1.2.1 (ms)   JDK 1.2.1 inlined   MS JVM (ms)   MS JVM inlined
2 calls                     12500           12000 (4%)          22100         9300 (58%)
1 call                      12200           11200 (8%)          11200         8900 (19%)
No calls                    11300               -                4800             -
No calls, no objects        10000               -                5000             -
We wrote four versions of the code. The first three use matrices of complex objects and differ in the
number of functions called in the inner for loop. As can be seen, we actually get speedups on the two JVMs
that are even better than the theoretical limit. We have no explanation for this effect, since we are running
the code under a JIT and have no control over what it is doing.
Another important observation is that, if we use two matrices of real numbers instead of matrices of
complex numbers, the speed increases by a factor of two. This could be an effect of the null pointer check
performed on every element access, or simply of the indirection Java uses to refer to objects on the heap.
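As an illustration, a minimal sketch of the kind of complex matrix multiplication used (the names are ours and the sketch omits the timing harness; the actual benchmark uses 256x256 matrices):

```java
// Sketch of the "2 calls" variant: each inner-loop iteration performs
// two small method calls (mul and add) on Complex objects. The "No calls"
// variant expands these bodies into arithmetic on re/im directly.
class Complex {
    double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    Complex mul(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    Complex add(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    // Naive O(n^3) matrix multiplication over Complex objects.
    static Complex[][] matmul(Complex[][] a, Complex[][] b) {
        int n = a.length;
        Complex[][] c = new Complex[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                Complex s = new Complex(0, 0);
                for (int k = 0; k < n; k++)
                    s = s.add(a[i][k].mul(b[k][j]));  // the two calls
                c[i][j] = s;
            }
        return c;
    }
}
```

The "no objects" variant replaces the `Complex[][]` matrices with parallel `double[][]` arrays for the real and imaginary parts, which is what produced the factor-of-two difference noted above.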
5 Related work
There is a lot of work related to OOP optimization using inlining; some of it can be found in the papers
available on the course Web page [4]. As far as we know, little work has been done on optimizing Java
bytecode directly. One reason is that Java bytecode is in any case slower than natively compiled code.
In this section we cover two of the most significant examples of work in this area.
5.1 IBM JAX
JAX is a research project at IBM, freely available for download on the Web [2]. It uses an analysis
phase very similar to our own, and makes almost the same assumptions as far as reflection and JNI are
concerned. The goal of JAX is Java bytecode compression, achieved by eliminating unused fields, methods
and classes. In every test case their code is much slower than ours. The reason is that they aim at
compression, so they are less interested in inlining (although they sometimes do it), while we specifically
aim at speed.
5.2 Sun Java Hotspot
Hotspot is the new Sun Java compiler technology [5]. By the time this report is read, it should be
available for download. Hotspot performs full native code optimization at runtime. It differs from a JIT
in that a JIT optimizes only sparingly, because of time constraints, while Hotspot uses runtime profiling
information to optimize only the “hot spots” of the code. In this way they claim to be able to optimize the
hot spots as aggressively as they want, and since it is essentially a kind of JIT it also deals easily with
dynamic linking.
One of the various optimization techniques they claim to use in Hotspot is code inlining. This is
reasonable, since the time required to inline is small and the necessary information is available anyway
(gathered during the verification process). It also seems that the Symantec JIT (the JDK 1.2.1 JIT) does
some sort of inlining in the easy cases, but we think it would be a straightforward extension for a JIT to
include full type hierarchy analysis.
6 Further work
We believe that the analyzer could be improved somewhat by performing a data-flow analysis with both
intra-procedural and inter-procedural elements, although we have not seen anything like that done for
Java bytecode in the literature.
However, with the information that the analyzer derives, one could easily write a program that strips
the classes of all unused methods and rewrites the constant pool to reflect the changes.
We could also optimize polymorphic invocations by inserting bytecode that checks the type of the
receiver at runtime and then calls the appropriate method statically. This would be useful at call sites from
which only a small number of methods can be invoked. Given statistical information about how often each
method is called from a site, we could inline only the most frequent targets and leave the rest to be
invoked through the polymorphic call.
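At the source level, the runtime type test we have in mind would look roughly like the following sketch (the class names are illustrative; the actual transformation would be performed on the bytecode):

```java
// Sketch of guarded devirtualization: test for the most frequent
// concrete receiver type, take a direct (inlinable) path for it,
// and fall back to the normal polymorphic call otherwise.
abstract class Shape { abstract double area(); }

class Square extends Shape {
    double side;
    Square(double s) { side = s; }
    double area() { return side * side; }
}

class Circle extends Shape {
    double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

class Guarded {
    static double areaOf(Shape s) {
        // Guard: the most frequent receiver type is handled statically,
        // so its method body can be inlined at this site.
        if (s.getClass() == Square.class) {
            Square sq = (Square) s;
            return sq.side * sq.side;   // inlined Square.area()
        }
        return s.area();                // fallback: virtual dispatch
    }
}
```

The guard adds a cheap class-identity comparison on every call, which pays off only when the guarded type dominates the call site, hence the need for frequency information.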
7 Conclusions
We showed that it is possible to implement a very simple analysis and inlining procedure, and that for
CPU-intensive applications with many small function calls the optimization is noticeable when running on
a JIT. We also think that this analysis could easily be incorporated into a JIT compiler, which should
improve code speed significantly.
8 Bibliography
[1] Java specification web site: http://www.javasoft.com/docs/books/vmspec/index.html
[2] JAX web site: http://www.alphaworks.ibm.com/tech/JAX
[3] JavaClass web site: http://www.inf.fu-berlin.de/~dahm/JavaClass/index.html
[4] CS612 paper web page: http://simon.cs.cornell.edu/Info/Courses/Spring-99/CS612/papers/index.html
[5] Hotspot web site: http://www.javasoft.com/products/hotspot/index.html