Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report Inlining Java Bytecode Fabio Pellacini and Ioannis Vetsikas CS612 final project report 1 Problem statement Given the OO Java design, small function calls are heavily present in Java programs. In case of programs that use heavily small functions (such as the arithmetic operations on complex numbers, or accessor methods) the execution time can be improved a lot by inlining the bytecode. This is true in all OO languages but especially in Java bytecode since the overhead time required for a method to be invoked is extremely high compared to native code. Our project aims at speeding up Java bytecode by inlining functions. To do that we first analyze the bytecode to determine the set of loaded and instantiated classes. Using the latter we prepare the bytecode for the inlining phase using a type hierarchy analysis. And finally we inline the bytecode. Using this very simple system, we got some pretty nice results in case of programs like a complex matrix multiplication. 2 Bytecode details This chapter will be a small introduction on the structure of the bytecode and some details about JVM execution meant to be helpful in understanding the rest of the implementation (it is possible to find more information about this in [1]). In the following discussion we will assume that no reflection is used. 2.1 The class file format For each class, the JVM requires to have a file containing the bytecode of that class. A class file, schematically, contains information about the class hierarchy, an area called constant pool, information about the fields that are declared in the Java class, information about the methods and a series of attributes. The class info section stores the name of the class defined, the name of its parent class, the access flags for the current class and an array of the interfaces that the class implements. As can be easily seen, this preserves all the information about the type structure that is in the Java source code. The constant pool section stores all the constants needed in the class file. Whenever a constant is needed in the bytecode, the constant is replaced by an index in the constant pool. The constants are also typed, so we know if we are referring to an integer, a class reference, a string and so on so forth. The field info section is an array of fields’ information. They basically contain a Java field declaration, i.e. field name, type and modifiers and an array of attributes relative to that field. As in all the other attributes in Java class files, there are standard attributes declared in the specification, and one can also define ones own attributes (so that one can easily annotate the bytecode). The method section is very similar to the field one. It contains the Java declaration of the method (signature and modifiers) and a set of attributes. One of the standard ones that is very important for us is the Code attribute. It contains the compiled code for the method if the method is not abstract or native. 2.2 Class initialization, object instantiation and finalizations In order to perform type hierarchy analysis, we need to know when classes get loaded and when they are instantiated. We will assume no specific order for loading and verifying Java bytecodes (unless required for the initialization process). 1 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report Initialization of a class consists of executing its static initializers1[1 section 2.17.4]. Initialization of an interface consists of executing the initializers for fields declared in the interface. Before a class or interface is initialized, its direct superclass must be initialized, but interfaces implemented by the class need not be initialized. Similarly, the superinterfaces of an interface need not be initialized before the interface is initialized. A class or interface type T will be initialized immediately before one of the following occurs: T is a class and an instance of T is created. T is a class and a static method of T is invoked. A nonconstant static field of T is used or assigned. A constant field is one that is (explicitly or implicitly) both final and static, and that is initialized with the value of a compile-time constant expression. A reference to such a field must be resolved at compile time to a copy of the compile-time constant value, so uses of such a field never cause initialization. An object of a class is instatiated only when a NEW instruction is executed in the bytecode [1 section 2.17.6]. After this instruction there is a call to the appropriate constructor (named <init> in the bytecode), so we don’t need to deal explicitly with them 2. For the purpose of our analysis we also need to take into account finalization. Objects are finalized before they are collected by the garbage collector. To do this, Java execute the finalize method. Classes are finalized before they are unloaded by the JVM, and the finalization process is a call to the finalizeClass method [1 section 2.17.7]. 2.3 Bytecode method code Every method call in the JVM has its own stack frame (the stack is not shared between method like in standard native compilation). Also local variables are not stored on the method stack as far as the bytecode instruction are concerned. Every method declares the maximum size of the stack and the maximum number of locals. 2.4 Bytecode method invocation The procedure for calling a method is similar to the way we usually call when we compile to native code. All arguments are put on the stack of the caller, and then we invoke an instruction to jump to subroutine. After the execution of the callee is finished, the result is put on the stack. The callee receives the parameters in the first n locals (where n is the number of arguments of the callee method). The first parameter, in case of non-static functions, is the reference to the object from which the function is called. There are four different instructions in the bytecode to jump to a new method. They all need a method reference that is an index in the constant pool. The method reference contains the signature (that also contains the class name) of the method that must be executed. The instructions for calling a method are invokestatic: used to invoke static methods (statically linked, so can be inlined) invokespecial: used to call private, superclass methods, and constructors (statically linked, so can be inlined) invokevirtual: used for virtual calls (dynamic liked, cannot be inlined) invokeinterface: used to call a method defined in an interface (dynamic liked, cannot be inlined) For virtual method calls, the class in the method reference will be the root of the subtree of possibly overriding classes. For interface calls the reference will be to the method in the interface. 1 In our case, we are only interested in being sure we execute the code in the static initializer for the class, i.e. the <clinit> method. 2 Actually this is not true for String, but we assume that we will always consider String as loaded in the JVM, so for our specific application this doesn’t matter. 2 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report 3 Description of solution 3.1 Overall structure Our system takes as input bytecode and outputs optimized bytecode; it does not require Java code. We assume that all the class files needed are provided to the system and that includes all the libraries needed for the program to run. The work is being done in several phases as portrayed in the following schema: Bytecode System lib. J A X Class Hierarchy Analyzer Inliner Fixer J A X The first stage is to take the bytecode and the system libraries and insert them into the class hierarchy analyzer, which analyzes the bytecode, reconstructs the class hierarchy, finds which classes are instantiated, reproduces the caller graph and resolves as many invocations as it can. The Inliner is the next stage, which inlines the bytecode where it is deemed necessary and possible. Then the resultant bytecode is passed through a “fixing” stage to correct any problems related to the security mechanism. We have also found a tool from IBM called JAX 3 [2], which can be optionally used before the analyzer to compress the class hierarchy and, after the final bytecode is produced, to strip the code of the methods which are not used and to clear up the constant pool. 3.2 Class hierarchy analysis 3.2.1 General This stage is aimed at getting enough information out of the program to be optimized so that we can determine if some method invocations, which appear to be polymorphic4, are in fact monomorphic5. According to the Java Virtual Machine specifications the invokestatic and invokespecial commands are monomorphic invocations, whereas the invokevirtual and invokeinterface commands are polymorphic. The inliner can only safely inline monomorphic calls. Therefore it is desirable to figure out if ‘virtual’ or ‘interface’ call is in fact monomorphic in which case the command is changed to ‘special’ call to indicate to the inliner that the call is monomorphic. 3.2.2 Main idea We use a simple data flow analysis. What we need to determine is what classes get loaded and which of those get instantiated. We also need to get an approximation of the caller graph 6. The information therefore that needs to be propagated in the data flow analysis is the class hierarchy tree and the set of instantiated classes. So what needs to be done is start from the main function and follow the edges in the caller graph. We therefore need to keep track of classes that get instantiated, of accesses to static fields and of method invocations. 3 Its use is briefly described in a later part of this report. An invocation site is called polymorphic if the invocation at that site can call more than one overridden method. 5 An invocation site is called monomorphic if only one particular method can get called. 6 The caller graph is a graph where there is a directed edge between two methods of the program if and only if the first method calls the second one. In our approximation we can have more than one edges per call site in case the call site is polymorphic. 4 3 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report 3.2.3 Implementation 3.2.3.1 Input Our program only needs to get the name of the class where the ‘main’ method is located. No a priori information of which classes and methods are loaded and used is necessary, since the analyzer finds all that information for itself. 3.2.3.2 How it works It starts from the ‘main’ method For each method which gets analyzed the following are done in the following order: If the corresponding class does not exist it is imperative that all super classes and super interfaces get loaded (when loading a class the methods <clinit>, finalize and classFinalize must be called7). A node is inserted in the caller graph If the method is used for the first time: it must be checked whether the method is a “dangerous reflection” method (e.g. forName, newInstance, loadClass etc.) we must check all the instructions in the code of the method for presence of the NEW instruction, in which case the set of instantiated classes must be updated with the new classes which got instantiated If the method is used for the first time or the set of instantiated classes has grown (some more classes got instantiated) since the last use of the method we need to check the invocation sites to make sure that all the possible methods that could potentially get called are indeed called. This means of course that we need to check the whole hierarchy again, which could potentially have changed since the last use of the method, and for each class in the sub tree which is instantiated consider that the appropriate method gets used. The set of loaded classes, the set of instantiated classes and the caller graph that the analyzer computes are in fact “conservative approximations” of the real ones. What we mean by that is that we might actually estimate that more classes are loaded or are instantiated and that more methods can get called than they really are. However, we are being conservative in doing that and we will not underestimate the number of methods that get called from a call site or the number or classes that get loaded and therefore when we devirtualize we preserve the equivalence of the resulting program to the original one. It should be noted that when a class or an interface is loaded its super classes or super interfaces are “updated” to know which their “children” are in the hierarchy. Also when a method gets used it “remembers” a few pieces of information (e.g. which methods it calls, or how big the instantiated class set was the last time it was used etc.) 3.2.4 Running Time Complexity Assume that: M = total number of methods which are used C = total number of classes which are instantiated Li = length of code in method i Hi = cost to traverse the hierarchical sub trees from each invocation site in the code of method i and determine which methods get called For each of the M methods: Search its code for NEW instructions. This takes O ( Li ) time. The analysis for which methods get invoked (can be run) C times in the worst case, since the analysis is run only if the instantiated classes set has grown and C is the size the set gets to at the end and Since the finalizers are called before the objects are collected, and we don’t know when this is going to happen, we will call them at the beginning of the analysis since it does not really matter. 7 4 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report obviously that size cannot decrease. For each run the cost is O( H i L i ) equal to the sum of the cost of checking all the method code and the cost of traversing the hierarchical sub trees from each invocation site in the code of method i and determining which methods get called. Thus this takes O(C ( H i Li )) time. So total cost is: O i 1 Li C ( H i Li ) which, since C 1 in a OO program, gives M O i 1 C ( H i Li ) M But this is actually a worst case analysis, in practice it is not really necessary to run the analysis for each method C times but usually less. 3.2.5 Type hierarchical analysis and devirtualization This is the aftermath of the class hierarchy analysis. What is done is for each polymorphic call site to go through the class hierarchy tree and determine which methods can get called; if their number is one then the call is really monomorphic and thus the call should be changed to invokespecial so that the inliner will know that this is so8. 3.2.6 Caveats As the results of the analyzer on several programs have pointed out, it is unfortunately quite often the case that it is impossible to know exactly what happens in each part of the program (every class and every method). This is due to the use of two Java platform features: 3.2.6.1 Reflection The reflection API allows the programmer to do introspection of object and classes in the current JVM. By means of that the programmer has fields and methods references. Using the latter the programmer can call a method in a way that we cannot track the call using the given analysis. It is also possible to dynamically load new classes, which can also change a class that is in memory. This actually breaks down the analysis even worse, since there is no way to detect this 9. The unique way, in which we deal with that, is to at least detect that it has been used. If so, we cannot guarantee any part of the analysis, so we cannot devirtualize safely10. 3.2.6.2 Java Native Interface (JNI) The Java Native Interface JNI is the standard Java interface to native libraries. A native method can create objects, modify fields, call Java methods and load new classes. All these possibilities break down our analysis completely. Since we don’t have the code for native methods, these are just black boxes for us, that we cannot analyze. Unfortunately they are used extensively (just print a string). We assume that a native method will not give us problems; this is true, at least, for all native methods in system libraries. If we find a native method call in the user code, then we cannot devirtualize safely11. 8 We can define a set of classes than we are not going to change during the inliner phase, nor we change any call site to a method in that set of classes. From now on they will be called system classes, since normally we include the system libraries in this set. 9 To our knowledge, there are no compilers or tools that do static analysis that are able to do anything for reflection. 10 We can always inline static and private methods, and superclass calls. 11 We can always inline static and private methods, and superclass calls. It should be noted however that since the existence of native functions is the case in almost every program, we have added the option to the analyzer for different levels of security. This means that, depending on the security level given by the user, the analyzer will try to devirtualize anyway, but it will output a warning message to inform the user that the optimization might not work and the reason for it (existence of native method, use of reflection etc.) 5 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report 3.3 Inliner 3.3.1 Inlining heuristic In the current implementation the inliner will inline every static and special call; there is no heuristic for inlining. In the test cases we ran, inlining was always better than calling a function. We really don’t have the problem of having bigger code as in standard inlining procedures, since the cost of a function call is pretty high in Java bytecode execution. 3.3.2 Support for static/special calls Given a statically linked method, we can easily inline the callee code. To do this we will follow the following scheme: 1. Grow maxstack and maxlocals to the sum of the caller and the callee max values12 2. Pop arguments from the stack and store them into the new local variables 3. Insert code for NullPointerException check, in case we are inlining non-static methods (the code we insert is the same as the compiled code for the statement if(ref != null)throw new NullPointerException(); 4. Insert code for synchronization, in case of synchronized methods (MONITORENTER instruction) 5. Insert the callee code, renumbering all references to local variables; in case we are inlining a method that is in another class we also need to fix the constant pool 6. Change every return to unconditional jump to the instruction in the caller code which is the first one after the call 7. Insert code for synchronization, in case of synchronized methods (MONITOREXIT instruction) After inlining all the methods in a given caller, we also run a little optimization procedure. This will clean up the resulting code in two ways. We will take out the jump instruction that refer to the next bytecode instruction and we will clean up some NOP instruction that we used as placeholders for making jump easier to deal with. 3.3.3 Note about inlining by jump We also tried to inline the code in the caller by using the jump to subroutine instruction used normally to compile the finally statement. We did this to avoid an unwanted grow of code size in case we need to inline several times the same function. The other reason why we were interested in such an inlining procedure, was that by doing so we basically compile method call with a shared stack, so it would have been useful to understand the behavior of this method call. The results we got after running this code are incredibly slower than the non-inlined code. The reason seems to the fact that the jsr and ret instruction we use are only used to compile the finally statement, and since the JVM is very slow in dealing with exception, this is much slower than a normal call. Unfortunately in the bytecode instruction set there is no other way to implement a subroutine call and return. In fact we need an instruction to jump to a bytecode address found on the stack and the unique one is the ret instruction. This expects to find on the stack a return address put there by the jsr instruction (remember that the JVM verifies the type of every element on the stack before using it, so we cannot just push an integer on the stack by ourselves, because its type is not “return address”, but “int”). 3.3.4 Caveats There are some caveats that we have to deal with for inlining correctly. Those are dealing with cycles in the caller graph, dealing with exception handling and finally dealing with problems due to the way Java enforces correctness in the bytecode. 12 If we inline more than one call, we share locals and stack slots. 6 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report 3.3.4.1 Cycles in caller graph The way we deal with cycles in the caller graph is just to inline recursively a certain number of times, then stop doing it. This is surely a correct solution and works well in practice. 3.3.4.2 Dealing with exceptions The exceptions in the JVM are dealt by means of exception tables. There is no explicit code for a jump when an exception occurs, but there is a table that for each try/catch block reports the bytecode address that is in the try block, the type of exception caught and the address of the catch code for it (finally is dealt with jsr and ret)… If the callee throws an exception then its stack does not need to be empty when we exit the method; but in this case, we can accidentally “corrupt” the stack of the caller. Given that in the bytecode, there is no instruction that let us know the height of the stack (or something similar), there is no way to check that the stack has not been corrupted. This does not allow us to inline when the call is in a try/catch block. This conservative solution to this problem is not so bad, since exception handling is the dominant factor in the execution of a try/catch block (jumping to an exception handler is about 10 times slower than a function call). In any compiled code that we saw, there was no evidence of the corrupted stack problem even if we inlined the code. Unfortunately, we couldn’t find any reference to clarify this point in the JVM specification, so we decided not to inline in that case, just to be on the safe side. 3.3.4.3 Violation of Java semantics After the inlining procedure, we can leave instructions that violate the Java semantics in the optimized bytecode. There are two different problems that we must solve. The first one is a violation of access privileges. Consider for example the code in figure 1. We can safely inline A.f() in B.l(). But after we inline it, we will reference the private field i from class B. This is not possible in the Java semantics, and the code will not be executed by the JVM since the class B will be verified before execution. This problem has a simple solution, since we can easily change the access flag of the fields we need in the various class files. class A { private int I; final public f() { I ++; } final public h() { g(); } private g() {…g();…} } class A1 extends A { public g() { … } } class B { A1 a = new A1(); public l() { a.f(); a.h(); } } Figure 1 A subtler problem is the changes in invocation type. Suppose we want to inline method A.h() in B.l(). We will have a call to method A.g() that is private in A. Since the method is recursive, we cannot avoid to call it at least once. To avoid the access violation we could change the access from private to public (or private to package). But since we are using a special call, some JVM will not execute the code 13. 13 We could not find any specification that should rule this out. The updated JVM specification for Java 2 has no mention of this in the verifier, but the JDK 1.2.1 throws a VerifierException in case we try to execute the code. 7 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report But if we now change the linking to a virtual call, we will call the method A1.g() instead of A.g() (similar example can be found in case of super call). The unique way to fix this is to change the access privileges and rename the method such that we maintain Java semantics. An easier way to deal with these problems is to switch off the Java verifier. Rewriting the class loader can do this. 3.4 Library JavaClass The whole project has been implemented using the library JavaClass [3]. The library lets us read bytecode files, and modify them quite easily. It has a completely abstract way of dealing with bytecode that lets us build prototypes much faster. Of course this abstraction is a little slower than coding with another style, but it is worth it, since we do not have to write code for manipulating bytecode ourselves. The most important features for our project were the management of the constant pool and of references to it and the management of jump instructions in the code for the methods. Not having to deal with bytecode indexes and jumping addresses made our life easier especially because we were modifying existing bytecode, not writing new bytecode from scratch. 4 Experimental results 4.1 Class hierarchy analysis Running the analyzer we get some very interesting results 14. When the analyzer is run even on small programs, it turns out that a big number of classes are being used. For example even in an extremely small program that consists only of the ‘main’ function, which just prints a line on the screen, 10 system classes are loaded (1 of those is instantiated) and 13 system methods in those 10 classes are used whereas 93 are not used. It should be also noted that one of those methods is a “native” method. When the analyzer was run on a slightly bigger program, which finds all the divisors of given numbers, the results were more interesting: Libraries Classes Loaded Methods Used Methods Not Used Other Classes 2 7% 5 62.5% 3 37.5% 26 93% 39 8.6% 417 91.4% Total 28 44 9.5% 420 90.5% What can be easily seen is that most of the system methods are unused, whereas most of the user program methods were in fact used. The 3 methods in the divisor that were not used were prototyping methods. Almost all of the classes loaded are system classes. Again a native method is found and, since we print the result to the screen, it is the same native method as in the trivial program case. The devirtualizer was able to resolve 3 polymorphic calls. In fact the 3 calls where all the ‘invokevirtual’ calls that the inliner attempted to resolve15, which means that all the ‘virtual’ calls in the program are in fact monomorphic. There were no ‘invokeinterface’ calls. In order to get a better feeling of what a “real” program would produce we run the analyzer taking as input the analyzer itself, which is a sizable program. 14 Almost all the results, which are presented in this part, have been obtained by using the JDK 1.2.1 system library classes. The results using Microsoft Visual J++ library classes are different in some cases but only in the actual numbers and not in the general behavior observed by the analysis. 15 It should be noted that the analyzer does not try to resolve calls in system libraries or calls that invoke a system library method. 8 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report Libraries Classes Loaded Methods Used Methods Not Used Other Classes 76 38% 315 36% 557 64% 125 62% 286 19% 1254 81% Total 201 601 25% 1811 75% As it turns out most of the classes loaded are system classes (libraries), but to a smaller percentage than in the previous two cases. There is a low usage of methods in the system libraries (19% only), but there is actually a rather low usage of methods in the remaining classes also (36%). The latter might seem strange, but it can be easily explained by the fact that we include in the other classes the “JavaClass” library, which is used for manipulating the bytecode files 16. What should be noted however is the fact that the program realizes that only 69 classes, since 7 classes are the ones we wrote for the analyzer, are used from the over 250 classes found in the “JavaClass” library. Also only 125 system are loaded from the over 2000 classes in JDK 1.2.1. Native methods are again found, but this time also dynamic class and method loading methods are invoked (use of reflection) and thus it is uncertain whether the program can safely devirtualize. However, by setting the security level of the program to ‘Optimize Always’, we force the devirtualizer part to work. The results are really stunning and show why this kind of analysis is very useful: Virtual Calls Interface Calls Virtual Calls Interface Calls Special Calls Static Calls Total number of Calls attempted to be resolved 337 0 Initial Number 1231 8 374 162 Number of Resolved Calls 329 0 Final Number 902 8 703 162 Change -329 0 +329 0 Percentage Resolved 97.6% Percentage Change -26.7% 0% +88.0% - The impressive result is the fact that out of the 337 calls that the program attempts to resolve, 329 are monomorphic and only 8 are polymorphic, which means that 97.6% of the calls are successfully resolved and thus inlining can further optimize those calls. Overall, the number of special calls has nearly doubled, whereas the number of virtual calls has been decreased by more that 25%. 4.2 Inliner In this section we will give some results to show how inlining speeds up Java bytecode. We run the inliner on codes that execute a lot of times the same short method. All the test cases were run using the JDK1.2.1 JVM and the last MS JVM on Windows NT systems with the JIT active. The reason why we use the JIT for our test is that we want to get an overall speedup of Java bytecode, so our solution should work with the JIT. 4.2.1 Trivial programs The test cases were built to execute short methods that do some computation on data local to the object referenced. The results can be found in the following table (the results are taken using MS JVM). 16 We could have asked the analyzer to treat the JavaClass library classes as library classes. In that case, it would have counted them together with the system classes, and therefore the inliner would not have analyzed several hundred calls to JavaClass methods and thus we would not have been able to observe the huge number of calls that were resolved. 9 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report Devirtualization 2% 1% 1% Prog. 1 Prog. 2 Prog. 3 Inlined 31% 30% 26% Inlined recursively 62% - As can be easily seen, there is almost no speedup obtained for changing the call type from virtual to special, which has also been found in [2]. In dealing with Java bytecode, the reason is clearly the fact that the work needed for calling a function in Java depends a lot on the fact that we need to create a new stack frame, verify the correctness of the call (and maybe something else). The difference in the resolution of the function call is negligible compared to the rest of the work. After inlining once we get clearly better running times up to 30%; the speedup is even better in the case when we inline more than once. We run another code in which we try to compare the execution time of two different JVMs given the inlining optimization. The results are in the following table. JDK 1.2.1 0% 0% 1 Call No calls MS 42% 42% The row labeled “1 Call” contains the speedup obtained after inlining; the line labeled “No calls” contains the speedup of a Java source code that has no function calls, so it is the theoretical limit for the inlined code. As shown the JDK JVM seems not to be affected by the optimization, but more importantly it seems not to be affected also in the code that uses no function calls. This means that in certain cases (and we stress certain limited cases) the JVM is able to inline the code. In the MS JVM, the speedup is significant, but more importantly, we also achieve the theoretical limit in inlining the code. This means that the work we do on the stack, which is needed to inline, is negligible compared to the actual execution time of a function call and thus no further analysis is necessary to try to minimize that cost. 4.2.2 Complex matrix multiplication We execute the inliner on a complex naïve matrix multiplication program, which multiplies two matrices 256x256. JDK1.2.1 (ms) MS JVM (ms) 2 calls 12500 22100 1 call 12200 11200 11300 4800 10000 5000 No calls No calls, objects no JDK 1.2.1 inlined 12000 4% 11200 8% - MS JVM inlined 9300 58% 8900 19% - We wrote 4 versions of the code. The first three use matrices of complex objects, and differ by the number of function called in the inner “for loop”. As can be seen we actually get speedup for the two JVMs that are even better than the theoretical limit. We have no explanation for this effect, since we are running the code using JIT and we have no control on what it is doing. Another important aspect is that, if we use two matrices of real numbers instead a matrix of complex numbers, the speed increases by a factor of two. This could be an effect of null pointer exception checked on every element in the matrix or just the way Java uses in order to refer to an object on the heap. 5 Related works There is a lot of work related to OOP optimization using inlining. Some of it can be found in the papers available on the course Web page [4]. As far as we know, there is little work done in optimizing Java bytecode. One of the reasons is that anyway Java bytecode is slower than a native compilation. 10 Fabio Pellacini and Ioannis Vetsikas Inlining Java bytecode – CS612 final project report In this section we will cover two of the most significant example of work in this area. 5.1 IBM JAX JAX is a research project at IBM. It is freely available for download on the Web [2]. It uses an analysis phase very similar to our own, and makes almost the same assumption as far as reflection and JNI are concerned. The goal of JAX is Java bytecode compression, by eliminate unused fields, methods and classes. In every test case their code is much slower than ours is. The reason is that they aim at compression, so they are less interested in inlining (even if they sometimes do so), while we specifically aim at speed. 5.2 Sun Java Hotspot Hotspot is the new Sun Java compiler technology [5]. By the time this report is read, it should be available for download. Hotspot performs full native code optimization at runtime. It differs from a JIT since the JIT only optimizes sparsely for time constraints, while hotspot uses runtime information about execution time to optimize only the “hotspot” of the code. In this way they claim to be able to optimize the “hotspot” as much as they want, and they also deal easily with dynamic linking since it is sort of a JIT. One of the various optimization techniques they claim to use in Hotspot is code inlining. This is reasonable since the time required to inline is very little and the information needed is there anyway (gathered through the verification process). It also seems that Symantec JIT (JDK 1.2.1 JIT) is doing some sort of inlining in the easy cases, but we think it should be a straightforward extension to a JIT to include full type hierarchy analysis. 6 Further work We believe that the analyzer can be improved a little bit by doing a data flow analysis that includes intraprocedural and inter-procedural elements, although we have not seen something like that done for Java bytecode anywhere in the literature. However, with the information that the analyzer derives we can easily write a program, which can strip the classes of all unused methods, and rewrite the constant pool to reflect the changes. We could also have optimized the polymorphic invocations, by inserting code in the bytecode to check at runtime the type of the object used in the invocation and calling the appropriate method statically thereafter. This could be useful in the case where a small number of methods can be called from that site. If we had some statistical information about the frequency with which each method is called from that site, we could also inline only the most frequent methods and leave the rest to still be invoked by a polymorphic call. 7 Conclusions We showed that it is possible to implement a very easy analysis and inlining procedure and that in case of CPU intensive application with a lot of small function calls, the optimization is noticeable running on a JIT. We also think that this should be easily inserted in a JIT compiler, and that this should improve the code speed up a lot. 8 Bibliography [1] Java specification web site: http://www.javasoft.com/docs/books/vmspec/index.html [2] JAX web site: http://www.alphaworks.ibm.com/tech/JAX [3] JavaClass web site: http://www.inf.fu-berlin.de/~dahm/JavaClass/index.html [4] CS612 paper web page: http://simon.cs.cornell.edu/Info/Courses/Spring-99/CS612/papers/index.html [5] Hotspot web site: http://www.javasoft.com/products/hotspot/index.html 11