Design and Implementation of the Joeq Virtual Machine
John Whaley
Stanford University
Sun Microsystems Labs
Mountain View, CA
August 26, 2003

About me
• Worked on Java VMs since JDK 1.0
  – 1996: Extended AWT to support pen input
  – 1997: Clean-room Java VM written in C++
  – 1998: Jalapeno: designed opt compiler, …
  – 1999: MIT Flex: dataflow framework, etc.
  – 2000: IBM Tokyo JIT: x86 performance
  – 2001: joeq virtual machine

Key Features
• Implemented in 100% Java
  – Includes native methods to manipulate addresses, memory, registers directly.
• Native vs. hosted execution
  – Native: run directly on hardware
  – Hosted: run on top of another VM
• Bootstrap to native via reflection
• Supports both GC and explicit deallocation

Key Features
• Compiler and program analysis framework
• Multiple languages: Java, C, C++, …
  – Single intermediate representation
• Static, quasi-static, and dynamic compilation
  – Single unified compiler infrastructure
• Online and offline profiling system
• M:N thread scheduler

Motivation/Purpose
• Started Ph.D. studies, needed a research infrastructure
• Purpose:
  – Try out new ideas
  – Do research
  – Publish papers
• Not out to:
  – Compete with other VMs
  – Make a shippable product
  – Change the world

Other Options
• SUIF
  – Written in C++
  – Limited support for Java
  – No dynamic compilation or runtime system
  – EDG frontend: not 100% gcc compatible
• Jalapeno
  – Written in Java
  – Very familiar with the system
  – Supports Java only
  – Not available outside of IBM

Other Options
• MIT Flex compiler
  – Written in Java
  – Familiar with system
  – Open-source GPL
  – Statically-compiled Java only
• Kaffe, etc.
  – Written in C
  – Poor design, poor performance

Why Another VM?
• General problem with established projects:
  – Established users and code base made it difficult to make major changes.
  – Wanted to fix the design "mistakes" of Jalapeno and MIT Flex compiler
  – More productive in Java than in C++

Design Goals
• Ease of trying out new research ideas
  – Implemented in Java
  – Modularity.
  – Lots of reusable code, use of software patterns.
• Support Java and C/C++
  – A single intermediate representation
  – Support GC and explicit deallocation

Design Goals
• Support static, quasi-static, dynamic compilation.
  – Unified compiler framework.
  – Compiler implemented in Java.
  – Allow "maybe" responses due to incomplete information.
  – General code patching mechanism.
  – Profile framework allows online/offline profiling.

Design Goals
• Get something up and running quickly.
  – Make compiler, runtime easy to debug
  – Hijack class libraries from running VM
  – LGPL: can borrow code from other open-source projects
  – Goal: Self-bootstrapping after one month
• Make it available for others to use.
  – Documentation, etc.

Not Design Goals
• Performance leader
  – An endless pit, takes a lot of effort
  – Performance just needs to be “reasonable”
  – Should be designed for good performance if someone wanted to put in the effort
• 100% conformance to specification
  – If programs work, that’s good enough.
  – No access to good test suites, anyway.

System Overview
[Figure: block diagram of the system. Panels: FRONT-END, COMPILER, BACK-END, INTERPRETER, MEMORY MANAGER, DYNAMIC, RUN-TIME SUPPORT. Labeled boxes: Java class file loader, bytecode decoder, bytecode to Quad; SUIF file loader, SUIF to Quad; ELF binary loader, disassemble to Quad; Bytecode IR, Quad IR, optimizations and analyses; Quad backend, bytecode backend; bytecode/Quad interpreters; memory heaps, garbage collector; controller, profiler; class/member metadata; system interface; introspection, verification, type checking; thread scheduler, synchronization, stack walker. Inputs and outputs: Java class file, SUIF file, ELF object file, object file data section, ELF and COFF file code sections, executable code in memory, compiled code plus metadata, profile data file, external libraries.]

Consequences of 100% Java
• Implementation purity
  – Self-applicable
  – VM code is great for program analysis, makes a great test suite
• Portability
  – >95% of the code is system-independent
  – Hosted execution
• Easier software engineering
  – Exceptions, GC, software patterns, existing tools

Consequences of 100% Java
• Java is not a panacea of portability
  – Hosted execution works OK on most VMs
  – Native bootstrapping is horribly VM-dependent
• Internal class library changes cause Joeq to break
  – Supporting multiple JDK versions is difficult

Bootstrapping technique
• Use reflection and code analysis to determine root set of methods and objects
• Dump the objects and code into an object file (COFF or ELF format)
• Use a standard linker to generate an executable
• Easy support for static and quasi-static compilation, cross-language calls, dynamic linking, etc.
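
The first bootstrapping step, discovering the objects that must go into the image, can be sketched with plain java.lang.reflect. This is only an illustration under stated assumptions, not Joeq's actual code: the class name ReachableObjects, the method collect, and the Node demo type are hypothetical, the sketch follows instance fields and array elements only (it does not discover the root set of methods), and it silently skips any field the host VM refuses to expose.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class ReachableObjects {

    /** Tiny demo type (hypothetical), used only by main. */
    public static final class Node {
        public Node next;
        public Object[] extra;
    }

    /** Returns every object reachable from the roots via instance fields and array elements. */
    public static Set<Object> collect(Object... roots) {
        // Identity-based set: two equal-but-distinct objects must both be dumped.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> worklist = new ArrayDeque<Object>();
        for (Object root : roots) {
            enqueue(root, seen, worklist);
        }
        while (!worklist.isEmpty()) {
            Object obj = worklist.pop();
            Class<?> cls = obj.getClass();
            if (cls.isArray()) {
                // Primitive arrays hold no references, so only object arrays are scanned.
                if (!cls.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(obj); i++) {
                        enqueue(Array.get(obj, i), seen, worklist);
                    }
                }
                continue;
            }
            // Walk the class and all superclasses to see inherited private fields too.
            for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                        continue;
                    }
                    try {
                        f.setAccessible(true);
                        enqueue(f.get(obj), seen, worklist);
                    } catch (RuntimeException | IllegalAccessException e) {
                        // Assumption: fields the host VM will not expose are skipped.
                    }
                }
            }
        }
        return seen;
    }

    private static void enqueue(Object o, Set<Object> seen, Deque<Object> worklist) {
        if (o != null && seen.add(o)) {
            worklist.push(o);
        }
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a; // a cycle must not cause an infinite loop
        b.extra = new Object[] { new Node() };
        System.out.println(collect(a).size()); // prints 4: a, b, the array, and the third Node
    }
}
```

A real image writer would also have to assign addresses and emit relocations for each discovered object; the traversal above covers only the discovery part.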

Bootstrapping trickiness
• Custom class loaders
  – Have to hijack class loader and wrap it
• Files, etc. must be reinitialized
  – Some state stored in native code
• Objects created during image write
  – Finalizer threads, reflection caches, character encodings, …
• Reflection doesn’t work on all objects
  – Throwable backtrace, ThreadLocal, etc.

Consequences of bootstrapping technique
• Standard file formats very useful
  – Use existing tools and debuggers
• Big startup time improvement on applications (30x)
  – Skips all of the initialization code, JIT startup costs
• Large object files, number of relocations cause problems with some tools.

Consequences of bootstrapping technique
• Automatic discovery of necessary code: time-consuming, too conservative.
• Hardwired class list: smaller and faster, but breaks often.
• Problem: Instantiating an object means class is initialized, which brings in class initializer and many more objects

Consequences of bootstrapping technique
• Bootstrapping process is a major pain
  – Time-consuming: reflection is inefficient
  – Difficult to debug
  – Process breaks with different JDK versions, environment variables, command line options, locales, etc.

Class library implementation
• GNU Classpath: too incompatible, too buggy
• Hijack Sun class library by class merging
  – Make a “mirror” class with the same name.
  – Special class loader merges the classes.
• Easy implementation of native methods.
  – Native code is just normal Java code.
• Perfect compatibility, easy updates

Consequences of mirror classes
• Types don’t match, so javac complains
  – Cast to java.lang.Object, then back down.
• Doesn’t work on different class libraries.
• Many changes between subversions.
  – Use a hierarchy of mirror classes
• Incompatible changes lead to many hacks.

Multiple language support
• Joeq has support for:
  – Java class files
  – SUIF files
    • C, C++, Fortran, …
  – x86 object code
• All are translated into a single intermediate representation, the Quad.

Quad intermediate representation
• Analyses and optimizations are instantly applicable to all languages
• Cross-language inlining and optimization
  – Elimination of JNI overhead
• Support for raw address manipulation in Java falls out naturally
• Type-accurate garbage collection for well-behaved C/C++ programs

Quad intermediate representation
• Generic interfaces for operators
  – Lots of shared code
• Types are optional
  – Type analysis will construct type information
• Doesn’t support all esoteric C/C++ features
  – Computed labels, C++ nastiness, etc.
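
To make "generic interfaces for operators" and "types are optional" concrete, here is a minimal sketch of a quad-style instruction whose type may be left empty and filled in later by a simple forward pass. Everything in it is hypothetical and chosen for illustration: the names QuadSketch, Operator, Op, Quad, and inferTypes, the string-valued types, and the rule that an operation's result takes the common type of its operands. Joeq's real Quad classes and type analysis are not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QuadSketch {

    /** Generic operator interface (hypothetical): analyses talk to this, not to concrete operators. */
    public interface Operator {
        String mnemonic();
    }

    public enum Op implements Operator {
        CONST_INT, CONST_DOUBLE, MOVE, ADD;

        @Override
        public String mnemonic() {
            return name();
        }
    }

    /** One instruction: dst = op(srcs...). The type is optional. */
    public static final class Quad {
        public final Operator op;
        public final String dst;
        public final String[] srcs;
        public String type; // null means "not yet known"

        public Quad(Operator op, String dst, String... srcs) {
            this.op = op;
            this.dst = dst;
            this.srcs = srcs;
        }

        @Override
        public String toString() {
            return dst + ":" + type + " = " + op.mnemonic() + " " + String.join(", ", srcs);
        }
    }

    /** Forward pass over straight-line code that fills in missing types where they can be determined. */
    public static void inferTypes(List<Quad> code) {
        Map<String, String> regType = new HashMap<String, String>();
        for (Quad q : code) {
            if (q.type == null) {
                if (q.op == Op.CONST_INT) {
                    q.type = "int";
                } else if (q.op == Op.CONST_DOUBLE) {
                    q.type = "double";
                } else {
                    q.type = commonType(q.srcs, regType);
                }
            }
            if (q.type != null) {
                regType.put(q.dst, q.type);
            } else {
                regType.remove(q.dst); // redefinition with unknown type invalidates the old one
            }
        }
    }

    /** Returns the single type shared by all operands, or null if any is unknown or they disagree. */
    private static String commonType(String[] srcs, Map<String, String> regType) {
        String result = null;
        for (String s : srcs) {
            String t = regType.get(s);
            if (t == null || (result != null && !result.equals(t))) {
                return null;
            }
            result = t;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Quad> code = new ArrayList<Quad>();
        code.add(new Quad(Op.CONST_INT, "t0", "1"));
        code.add(new Quad(Op.CONST_INT, "t1", "2"));
        code.add(new Quad(Op.ADD, "t2", "t0", "t1"));
        inferTypes(code);
        for (Quad q : code) {
            System.out.println(q);
        }
    }
}
```

Because the pass depends only on the Operator interface and the optional type slot, the same code could in principle run over quads produced from any front end, which is the sharing the slide describes.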

Hierarchy of Operators
[Figure: operator hierarchy diagram]

Memory management
• Memory management is abstracted into different heaps
  – Each heap has its own allocation/deallocation policy
• Interface for querying garbage collection policies
  – Type-accurate, semi-accurate, conservative
  – GC-safe points or at any instruction
  – Thread-local allocation pools
• Working out an interface with JMTk

Consequences of memory management framework
• Debugging
  – Run under hosted execution mode
  – Image snapshots
  – 100% type-accurate is hard
• Coordinating threads for GC
  – Making a general interface is tricky

Thread scheduler
• M:N thread scheduler
  – Lightweight Java threads
  – Thread switch at any instruction
  – Uses local thread queues and work-stealing
• Timer ticks by using setitimer interrupts (Linux) or a separate thread (Windows)
• Thread-local information stored off of fs register

Consequences of Java thread scheduler
• Accessing threads in a machine-independent way is not easy
• Linux pthread implementation is broken
  – Lots of bugs, race conditions, inefficiencies
  – Changing stack pointer is not always supported
  – Use of fs register is not always supported
• Windows support is much nicer (?)

Running an Open-Source Project
• Lots of interest, but very few people actually follow through
• Not many people have the skills
  – Of those, not many have the time
• Of those, even fewer have the perseverance
  – The result is that there have only been minor contributions by others
• Documentation, testing, file releases, updating the web site all take time.

Running an Open-Source Project
• What’s needed:
  – Nightly build scripts and regression testing
  – Implementation hackers
  – People interested in GC

Conclusion: What I’ve learned
• Software patterns are useful
  – Joeq: 100K lines of code
• Modular design is key
  – Trying out new type checker: ~2 hours
• For maximum efficiency, design the system to be easily debuggable.
• Preemptively eliminate obvious problems.
• It’s more fun to write code when you also write the compiler.