Java Virtual Machine Instruction Set Architecture
Justin Dzeja

What is a Java Virtual Machine?
• The JVM is an abstract computing machine
  ▫ Like an actual computing machine, it has an instruction set and manipulates various memory areas at run time
• A JVM enables a set of computer software programs and data structures to use a virtual machine model for the execution of other computer programs and scripts
  ▫ Not just Java: the JVM now supports many languages
  ▫ Ada, C, LISP, Python

Why a Virtual Machine?
• The Java platform was initially developed to address the problems of building software for networked consumer devices
• It was designed to support multiple host architectures and to allow secure delivery of software components
• To meet these requirements, compiled code had to survive transport across networks, operate on any client, and assure the client that it was safe to run
• "Write Once, Run Anywhere"

Java Timeline
• 1991 – James Gosling begins work on the Java project
  ▫ Originally, the language is named “Oak”
• 1995 – Sun releases first public implementation as Java 1.0
• 1998 – JDK 1.1 release downloads top 2 million
• 1999 – Java 2 is released by Sun
• 2005 – Approximately 4.5 million developers use Java technology
• 2007 – Sun makes all of Java’s core code available under open-source distribution terms

Java Principles
• Sun set five primary goals in the creation of the Java language:
  ▫ It should be "simple, object oriented, and familiar".
  ▫ It should be "robust and secure".
  ▫ It should be "architecture neutral and portable".
  ▫ It should execute with "high performance".
  ▫ It should be "interpreted, threaded, and dynamic".
JVM Instruction Set Architecture
• Instructions
  ▫ A Java virtual machine instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation
  ▫ Operands are not required; many instructions consist of only the opcode
  ▫ One-byte opcodes allow for up to 256 instructions, but only about 200 are used in class files, leaving room for more instructions to be added
  ▫ Each instruction has a mnemonic name which is mapped to the binary one-byte opcode

JVM Instruction Set Architecture
• Instruction Format
  ▫ The mnemonic operation names often begin with a letter for the data type, followed by the operation name
    ▫ iadd (int), ladd (long), fadd (float), dadd (double)
  ▫ The JVM supports conversion operations that convert from one data type to another; their names include both data types
    ▫ i2l, i2f, i2d, l2f, l2d, f2d

Operation Types
• The JVM ISA is a CISC architecture, thus having many instructions
• They can be classified into broad groups
  ▫ Load and store
  ▫ Arithmetic and logic
  ▫ Type conversion
  ▫ Object creation and manipulation
  ▫ Operand stack management
  ▫ Control transfer
  ▫ Method invocation and return

Operation Types
• Load and store
  ▫ Used to move values between the local variable array and the operand stack (loads push onto the stack, stores pop from it)
  ▫ iload, istore
• Arithmetic and logic
  ▫ JVM supports addition, subtraction, division, multiplication, remainder, negation, increment
  ▫ irem, idiv, iinc
• Type conversion
  ▫ Allows converting from one primitive data type to another
  ▫ i2l, i2f, i2d, l2f, l2d, f2d
• Object creation and manipulation
  ▫ Instantiating objects and manipulating fields
  ▫ new, putfield
• Operand stack management
  ▫ swap, dup2
• Control transfer
  ▫ ifeq, goto
• Method invocation and return
  ▫ invokespecial, areturn

JVM Data Types
• The Java virtual machine operates on two kinds of types: primitive types and reference types
• Integral Types:
  ▫ Byte - 8-bit signed two's-complement
integers
  ▫ Short - 16-bit signed two's-complement integers
  ▫ Int - 32-bit signed two's-complement integers
  ▫ Long - 64-bit signed two's-complement integers
  ▫ Char - 16-bit unsigned integers representing Unicode characters

JVM Data Types
• Floating Point Types:
  ▫ Float - values are elements of the float value set (typically 32-bit single-precision but may vary with implementation)
  ▫ Double - values are elements of the double value set (64-bit double-precision)
• Boolean - values true and false
  ▫ JVM has very little support for the Boolean data type
  ▫ Boolean variables in a Java program are compiled to use values of the JVM int data type
• returnAddress - values are pointers to the opcodes of Java virtual machine instructions

JVM Data Types
• Three kinds of reference types
  ▫ Class types
  ▫ Array types
  ▫ Interface types
• Their values are references to dynamically created class instances, arrays, or instances of classes that implement interfaces
• A reference can also be null, meaning it refers to no object; null can be cast to any reference type

JVM Data Types
• The basic unit of size for data values in the Java virtual machine is the word
  ▫ a fixed size chosen by the designer of each Java virtual machine implementation
• The word size must be large enough to hold a value of type byte, short, int, char, float, returnAddress, or reference
  ▫ at least 32 bits

JVM Runtime Data Areas
• Since the JVM is a virtual machine, it doesn’t have any physical registers; instead it defines various runtime data areas that are used during execution of a program
• One of the areas defined is the program counter register
  ▫ Each thread of control has its own PC register
  ▫ The register is wide enough to contain a returnAddress or a native pointer on the specific platform

JVM Runtime Data Areas
• JVM Stack
  ▫ Each thread gets its own JVM stack when it is created
  ▫ Stacks store frames which hold data and play a role in method invocation and return
  ▫ The actual memory for a JVM stack does not need to be contiguous
  ▫ The stack can be either of a fixed size or
dynamically contracted and expanded as needed

JVM Runtime Data Areas
• JVM Frames
  ▫ A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions
  ▫ A new frame is created each time a method is invoked and destroyed when the method invocation completes
  ▫ Only one frame, for the executing method, is active at any point in a given thread
  ▫ Each frame contains a local variable array
    ▫ Local variables can store primitive or reference data types
    ▫ Variables are addressed by index, starting from zero
    ▫ Data types long and double occupy two consecutive local variables
  ▫ Each frame also contains an operand stack
    ▫ Last-in-first-out (LIFO)
    ▫ JVM loads values from local variables or fields onto the stack
    ▫ Then JVM instructions can take operands from the stack, operate on them, and then push the result back onto the stack
    ▫ The maximum depth of the operand stack is determined at compile time for the method associated with the frame

JVM Operand Stack
• Code:
    iload_0     // push the int in local variable 0
    iload_1     // push the int in local variable 1
    iadd        // pop two ints, add them, push result
    istore_2    // pop int, store into local variable 2

JVM Runtime Data Areas
• JVM Heap
  ▫ The heap is a data area shared by all JVM threads
  ▫ Memory from the heap is allocated for instances of classes and arrays
  ▫ Can be either of fixed size or dynamic
  ▫ Does not need to be in contiguous memory space
  ▫ Maintained by an automatic storage management system or garbage collector

JVM Runtime Data Areas
• Method Area
  ▫ The method area is also shared among all JVM threads
  ▫ It stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors, including the special methods used in class and instance initialization
  ▫ The method area is logically part of the heap, but depending on the implementation it may or may not be garbage collected or compacted

JVM Runtime Data Areas
• Runtime Constant Pool
  ▫ The runtime constant pool is a
per-class runtime representation of the constant pool table in a class file
  ▫ It contains numeric constants as well as method and field references that are resolved at runtime
  ▫ This is similar to a symbol table for a conventional programming language, although it stores a wider range of data

JVM Addressing Modes
• JVM supports three addressing modes
  ▫ Immediate addressing mode: the constant is part of the instruction
  ▫ Indexed addressing mode: accessing variables from the local variable array
  ▫ Stack addressing mode: retrieving values from the operand stack using pop

JVM Method Calls
• Sample Code
    int add12and13() {
        return addTwo(12, 13);
    }
• Compiles to
    Method int add12and13()
     0 aload_0             // Push local variable 0 (this)
     1 bipush 12           // Push int constant 12
     3 bipush 13           // Push int constant 13
     5 invokevirtual #4    // Method Example.addTwo(II)I
     8 ireturn             // Return int on top of operand stack; it
                           // is the int result of addTwo()

Design Principles
• Simplicity favors regularity
  ▫ Examples of this principle can be found throughout the JVM specification
  ▫ Every instruction begins with a standard opcode that is one byte in size
  ▫ The naming conventions for opcode mnemonics are standard across different types of operations
• Smaller is faster
  ▫ Data areas such as the heap can be dynamic in size, which saves memory space when it is not in use
  ▫ JVM has a large instruction set, which results in a smaller code size when converted to bytecode

Design Principles
• Make the common case fast
  ▫ JVM includes instructions to increment variables or to arithmetically shift values for fast execution of common operations
• Good design demands good compromises
  ▫ The JVM finds a good balance between high performance and being secure and robust

JVM Advantages/Disadvantages
• The JVM is a self-contained operating environment that behaves as if it were a separate computer, which gives it two main advantages
  ▫ System Independence – a Java application will run the same on any JVM, regardless of the underlying system
  ▫ Security – Since a Java program
operates in a self-contained environment, there is less risk to files and other applications
• The disadvantage is that running the virtual machine adds extra overhead on the system, which can impair performance in some situations

Sources
• http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html
• http://www.cis.cau.edu/121/lecture05-2.htm
• http://www.particle.kth.se/~lindsey/JavaCourse/Book/Part1/Supplements/Chapter01/JVM.html
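
Supplement: tracing the operand stack example
The four-instruction sequence on the JVM Operand Stack slide (iload_0, iload_1, iadd, istore_2) can be traced by hand with a small simulation. The sketch below is a teaching model only, not how any real JVM is implemented: the class name OperandStackDemo, the method run, and the use of strings for opcodes are all assumptions made for illustration. It models one frame as a local variable array plus a LIFO operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical teaching model of a single JVM frame (illustration only).
public class OperandStackDemo {

    // Executes iload_0, iload_1, iadd, istore_2 on a frame whose local
    // variables 0 and 1 hold a and b, then returns local variable 2.
    public static int run(int a, int b) {
        int[] locals = new int[3];                  // local variable array, indexed from zero
        locals[0] = a;
        locals[1] = b;
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack (LIFO)

        String[] code = {"iload_0", "iload_1", "iadd", "istore_2"};
        for (String op : code) {
            switch (op) {
                case "iload_0":                     // push the int in local variable 0
                    stack.push(locals[0]);
                    break;
                case "iload_1":                     // push the int in local variable 1
                    stack.push(locals[1]);
                    break;
                case "iadd": {                      // pop two ints, add them, push result
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    stack.push(value1 + value2);
                    break;
                }
                case "istore_2":                    // pop int, store into local variable 2
                    locals[2] = stack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode: " + op);
            }
        }
        return locals[2];
    }

    public static void main(String[] args) {
        System.out.println(run(12, 13));            // prints 25
    }
}
```

Note that no instruction in the sequence names its operands directly: iadd takes whatever two ints are on top of the stack, which is why the loads must come first.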