Course Overview

PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler

PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation

PART III: conclusion
8 Interpretation
9 Review

Supplementary material: Java’s runtime organization and the Java Virtual Machine

The Java Virtual Machine

What This Topic is About

We look at the JVM as an example of a real-world runtime system for a modern object-oriented programming language. The JVM is probably the most common and widely used VM in the world, so you’ll get a better idea of what a real VM looks like.

• JVM is an abstract machine.
• What is the JVM architecture?
• What is the structure of .class files?
• How are JVM instructions executed?
• What is the role of the constant pool in dynamic linking?

Also visit this site for more complete information about the JVM:
http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

Recap: Interpretive Compilers

Why? A tradeoff between fast(er) compilation and a reasonable runtime performance.

How? Use an “intermediate language” that is
• more high-level than machine code => easier to compile to
• more low-level than the source language => easy to implement as an interpreter

Example: A “Java Development Kit” for machine M
[Tombstone diagrams: a Java -> JVM compiler running on M, and a JVM interpreter running on M]

Abstract Machines

An abstract machine implements an intermediate language in between the high-level language (e.g. Java) and the low-level hardware (e.g. Pentium).

[Diagram, from high level to low level: Java; Java compiler (Java to JVM, implemented in Java: machine independent); JVM (.class files); JVM interpreter or JVM JIT compiler; Pentium]

Abstract Machines

An abstract machine is intended specifically as a runtime system for a particular (kind of) programming language.
• JVM is a virtual machine for Java programs.
• It directly supports object-oriented concepts such as classes, objects, methods, method invocation etc.
• Easy to compile Java to JVM =>
  1. easy to implement compiler
  2. fast compilation
• Another advantage: portability

Class Files and Class File Format

[Diagram: .class files (external representation, platform independent) are loaded into the JVM (internal representation, implementation dependent: classes, objects, primitive types, arrays, strings, methods)]

The JVM is an abstract machine in the truest sense of the word. The JVM specification does not give implementation details (these can depend on the target OS/platform, performance requirements, etc.).

The JVM specification defines a machine-independent “class file format” that all JVM implementations must support.

Data Types

JVM (and Java) distinguishes between two kinds of types:

Primitive types:
• boolean: boolean
• numeric integral: byte, short, int, long, char
• numeric floating point: float, double
• internal, for exception handling: returnAddress

Reference types:
• class types
• array types
• interface types

Note: Primitive types are represented directly; reference types are represented indirectly (as pointers to array or class instances).

JVM: Runtime Data Areas

Besides OO concepts, JVM also supports multi-threading. Threads are directly supported by the JVM.
=> Two kinds of runtime data areas:
1. shared between all threads
2. private to a single thread

[Diagram: Shared: garbage-collected heap, method area. Thread 1: pc, Java stack, native method stack. Thread 2: pc, Java stack, native method stack.]

Java Stacks

JVM is a stack-based machine, much like TAM.
JVM instructions
• implicitly take arguments from the stack top
• put their result on the top of the stack

The stack is used to
• pass arguments to methods
• return a result from a method
• store intermediate results while evaluating expressions
• store local variables

This works similarly to (but not exactly the same as) what we previously discussed about stack-based storage allocation and routines.

Stack Frames

The Java stack consists of frames. The JVM specification does not say exactly how the stack and frames should be implemented.

The JVM specification specifies that a stack frame has areas for:
• pointer to runtime constant pool
• args + local vars
• operand stack

A new call frame is created by executing a JVM instruction that invokes a method (e.g. invokevirtual, invokenonvirtual, ...).

The operand stack is initially empty, but grows and shrinks during execution.

Stack Frames

The role/purpose of each of the areas in a stack frame:

pointer to constant pool: Used implicitly when executing JVM instructions that contain indexes into the constant pool (more about this later).

args + local vars: Space where the arguments and local variables of a method are stored. This includes a space for the receiver (this) at position/offset 0.

operand stack: Stack for storing intermediate results during the execution of the method.
• Initially it is empty.
• The maximum depth is known at compile time.

Stack Frames

An implementation using registers such as SB, ST, and LB and a dynamic link is one possible implementation.
[Diagram: a stack frame between LB and ST on a stack that starts at SB. The frame holds a dynamic link (to the previous frame on the stack), a pointer to the runtime constant pool, args + local vars, and the operand stack.]

The JVM instructions store and load (for accessing args and locals) use addresses which are numbers from 0 to #args + #locals - 1.

JVM Interpreter

The core of a JVM interpreter is basically this:

    do {
      byte opcode = fetch an opcode;
      switch (opcode) {
        case opCode1:
          fetch operands for opCode1;
          execute action for opCode1;
          break;
        case opCode2:
          fetch operands for opCode2;
          execute action for opCode2;
          break;
        case ...
      }
    } while (more to do)

Instruction-set: typed instructions!

JVM instructions are explicitly typed: there are different opCodes for instructions for integers, floats, arrays, reference types, etc.

This is reflected by a naming convention in the first letter of the opCode mnemonics.

Example: different types of “load” instructions

    iload    integer load
    lload    long load
    fload    float load
    dload    double load
    aload    reference-type load

Instruction set: kinds of operands

JVM instructions have three kinds of operands:
- from the top of the operand stack
- from the bytes following the opCode
- part of the opCode itself

Each instruction may have different “forms” supporting different kinds of operands.

Example: different forms of “iload”

    Assembly code    Binary instruction code layout
    iload_0          26
    iload_1          27
    iload_2          28
    iload_3          29
    iload n          21 n
    wide iload n     196 21 n

Instruction-set: accessing arguments and locals

[Diagram: arguments and locals area inside a stack frame, with slots numbered 0:, 1:, 2:, 3:, ...]

args: indexes 0 .. #args - 1
locals: indexes #args .. #args + #locals - 1

Instruction examples: iload_1, istore_1, iload_3, astore_1, aload 5, fstore_3, aload_0

• A load instruction takes something from the args/locals area and pushes it onto the top of the operand stack.
• A store instruction pops something from the top of the operand stack and places it in the args/locals area.
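
The fetch/decode/execute loop and the three kinds of operands described above can be sketched as a toy interpreter. This is only a minimal sketch and not how any real JVM is implemented: the class name `MiniInterpreter`, the methods `run` and `demo`, and the use of an `ArrayDeque` as the operand stack are assumptions made for illustration. Only the opcode numbers are the real JVM values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter for a handful of JVM integer instructions (illustrative only). */
final class MiniInterpreter {
    // Real JVM opcode values for the few instructions handled here.
    static final int ILOAD = 21, ILOAD_0 = 26, ILOAD_3 = 29;
    static final int ISTORE = 54, ISTORE_0 = 59, ISTORE_3 = 62;
    static final int IADD = 96, IMUL = 104, IRETURN = 172;

    /** Runs code against the given args/locals area and returns the ireturn value. */
    static int run(byte[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack, initially empty
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF; // fetch the opcode as an unsigned byte
            if (opcode >= ILOAD_0 && opcode <= ILOAD_3) {
                // Operand is part of the opcode itself (iload_0 .. iload_3).
                stack.push(locals[opcode - ILOAD_0]);
            } else if (opcode >= ISTORE_0 && opcode <= ISTORE_3) {
                locals[opcode - ISTORE_0] = stack.pop();
            } else {
                switch (opcode) {
                    case ILOAD: // operand is the byte following the opcode
                        stack.push(locals[code[pc++] & 0xFF]);
                        break;
                    case ISTORE:
                        locals[code[pc++] & 0xFF] = stack.pop();
                        break;
                    case IADD: // operands come from the top of the operand stack
                        stack.push(stack.pop() + stack.pop());
                        break;
                    case IMUL:
                        stack.push(stack.pop() * stack.pop());
                        break;
                    case IRETURN:
                        return stack.pop();
                    default:
                        throw new IllegalStateException("unsupported opcode " + opcode);
                }
            }
        }
    }

    /** Hypothetical program computing a * b + a, with a in slot 1 and b in slot 2. */
    static int demo(int a, int b) {
        byte[] code = {
            ILOAD_0 + 1,     // iload_1
            ILOAD, 2,        // iload 2 (two-byte form)
            IMUL,            // imul
            ISTORE_3,        // istore_3
            ILOAD_3,         // iload_3
            ILOAD_0 + 1,     // iload_1
            IADD,            // iadd
            (byte) IRETURN   // ireturn
        };
        int[] locals = {0, a, b, 0}; // slot 0 would hold "this" in a real frame
        return run(code, locals);
    }

    public static void main(String[] args) {
        System.out.println(demo(6, 7));
    }
}
```

Calling `MiniInterpreter.demo(6, 7)` pushes 6 and 7, multiplies them, stores 42 in slot 3, reloads it, adds slot 1, and returns 48.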
Instruction-set: non-local memory access

In the JVM, the contents of different “kinds” of memory can be accessed by different kinds of instructions.

accessing locals and arguments: load and store instructions
accessing fields in objects: getfield, putfield
accessing static fields: getstatic, putstatic

Note: Static fields are a lot like global variables. They are allocated in the “method area”, which also holds the code for methods and the representations of classes (including method tables).
Note: getfield and putfield access memory in the heap.
Note: JVM doesn’t have anything similar to registers L1, L2, etc.

Instruction-set: operations on numbers

Arithmetic
add: iadd, ladd, fadd, dadd
subtract: isub, lsub, fsub, dsub
multiply: imul, lmul, fmul, dmul
etc.

Conversion
i2l, i2f, i2d, l2f, l2d, f2d, f2i, d2i, …

Instruction-set …

Operand stack manipulation
pop, pop2, dup, dup2, swap, …

Control transfer
Unconditional: goto, jsr, ret, …
Conditional: ifeq, iflt, ifgt, if_icmpeq, …

Instruction-set …

Method invocation:
invokevirtual: usual instruction for calling a method on an object.
invokeinterface: same as invokevirtual, but used when the called method is declared in an interface (requires a different kind of method lookup).
invokespecial: for calling things such as constructors, which are not dynamically dispatched (this instruction is also known as invokenonvirtual).
invokestatic: for calling methods that have the “static” modifier (these methods are sent to a class, not to an object).

Returning from methods:
return, ireturn, lreturn, areturn, freturn, …

Instruction-set: Heap Memory Allocation

Create new class instance (object):
new

Create new array:
newarray: for creating arrays of primitive types.
anewarray: for creating arrays of reference types.
multianewarray: for creating multidimensional arrays.
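
To tie the instruction families above back to source code, here is a small Java class in which each comment names the instructions a typical `javac` would be expected to emit for that line. The class `InstructionTour` and its members are hypothetical, and the exact bytecode can differ between compiler versions, so treat the comments as a guide and confirm with `javap -c`.

```java
/** Hypothetical example: comments name the JVM instructions typically emitted. */
final class InstructionTour {
    static int created = 0;   // static field, allocated in the method area
    int value;                // instance field, allocated in the heap object

    InstructionTour(int value) {
        this.value = value;       // aload_0, iload_1, putfield
        created = created + 1;    // getstatic, iconst_1, iadd, putstatic
    }

    int twice() {
        return value * 2;         // aload_0, getfield, iconst_2, imul, ireturn
    }

    static int tour() {
        InstructionTour t = new InstructionTour(21); // new, dup, bipush, invokespecial
        int[] ints = new int[3];                     // newarray
        String[] names = new String[2];              // anewarray
        int[][] grid = new int[2][4];                // multianewarray
        CharSequence text = "JVM";                   // ldc (constant pool string)
        int viaInterface = text.length();            // invokeinterface
        int viaClass = t.twice();                    // invokevirtual
        int viaStatic = Math.max(viaClass, viaInterface); // invokestatic
        // arraylength reads the length of each array
        return viaStatic + ints.length + names.length + grid[1].length;
    }

    public static void main(String[] args) {
        System.out.println(tour());
    }
}
```

Compiling this class and running `javap -c InstructionTour` is a quick way to compare the comments against what the installed compiler actually produces.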
Instructions and the “Constant Pool”

Many JVM instructions have operands which are indexes pointing to an entry in the so-called constant pool.

The constant pool contains all kinds of entries that represent “symbolic” references for “linking”. This is the way that instructions refer to things such as classes, interfaces, fields, methods, and constants such as string literals and numbers.

These are the kinds of constant pool entries that exist:
• Integer
• Float
• Long
• Double
• String
• Utf8_info (Unicode characters)
• Class_info
• Fieldref_info
• Methodref_info
• InterfaceMethodref_info
• Name_and_Type_info

Instructions and the “Constant Pool”

Example: We examine the getfield instruction in detail.

Format: 180 indexbyte1 indexbyte2

The two index bytes together select a Fieldref entry in the constant pool:

    CONSTANT_Fieldref_info {
      u1 tag;
      u2 class_index;
      u2 name_and_type_index;
    }

class_index selects a Class entry, whose name_index selects a Utf8Info entry holding the fully qualified class name:

    CONSTANT_Class_info {
      u1 tag;
      u2 name_index;
    }

name_and_type_index selects a Name_and_Type entry, whose name_index selects a Utf8Info entry holding the name of the field and whose descriptor_index selects a Utf8Info entry holding the field descriptor:

    CONSTANT_Name_and_Type_info {
      u1 tag;
      u2 name_index;
      u2 descriptor_index;
    }

Instructions and the “Constant Pool”

That previous picture is rather complicated, so let’s simplify it a little:

    Format: 180 indexbyte1 indexbyte2
      -> Fieldref
           -> Class
                -> Utf8Info: fully qualified class name
           -> Name_and_Type
                -> Utf8Info: name of field
                -> Utf8Info: field descriptor

Instructions and the “Constant Pool”

The format of the constant pool entries is part of the Java class file format.

Luckily, we have a Java assembler that allows us to write a kind of textual assembly code, which is then transformed into a binary .class file. This assembler takes care of creating the constant pool entries for us.

When an instruction operand expects a constant pool entry, the assembler allows you to enter the entry “in place” in an easy syntax.
Example: getfield mypackage/Queue i I

Instructions and the “Constant Pool”

Fully qualified class names and descriptors in constant pool UTF8 entries.

1. Fully qualified class name: a package + class name string. Note that this uses “/” instead of “.” to separate each level along the path.
2. Descriptor: a string that defines a type for a method or field.

    Java                   descriptor
    boolean                Z
    int                    I
    Object                 Ljava/lang/Object;
    String[]               [Ljava/lang/String;
    int foo(int,Object)    (ILjava/lang/Object;)I

Linking

In general, linking is the process of resolving symbolic references in binary files.

Most programming language implementations have what we call “separate compilation”. Modules or files can be compiled separately and transformed into some binary format. But since these separately compiled files may have connections to other files, they have to be linked.

=> The binary file is not yet executable, because it has some kind of “symbolic links” in it that point to things (classes, methods, functions, variables, etc.) in other files/modules.

Linking is the process of resolving these symbolic links and replacing them by real addresses so that the code can be executed.

Loading and Linking in JVM

In JVM, loading and linking of class files happens at runtime, while the program is running! Classes are loaded as needed.

The constant pool contains symbolic references that need to be resolved before a JVM instruction that uses them can be executed (this is the equivalent of linking).

In JVM, a constant pool entry is resolved the first time it is used by a JVM instruction.

Example: When a getfield is executed for the first time, the constant pool entry index in the instruction can be replaced by the offset of the field.

Closing Example

As a closing example on the JVM, we will take a look at the compiled code of the following simple Java class declaration.
    class Factorial {
      int fac(int n) {
        int result = 1;
        for (int i = 2; i < n; i++) {
          result = result * i;
        }
        return result;
      }
    }

Compiling and Disassembling

    % javac Factorial.java
    % javap -c -verbose Factorial
    Compiled from Factorial.java
    class Factorial extends java.lang.Object {
        Factorial();
            /* Stack=1, Locals=1, Args_size=1 */
        int fac(int);
            /* Stack=2, Locals=4, Args_size=2 */
    }

    Method Factorial()
       0 aload_0
       1 invokespecial #1 <Method java.lang.Object()>
       4 return

Compiling and Disassembling ...

Args/locals area throughout the method: 0 = this, 1 = n, 2 = result, 3 = i. The comment on each line shows the operand stack after the instruction has executed.

    // address:
    Method int fac(int)
       0 iconst_1         // stack: 1
       1 istore_2         // stack:
       2 iconst_2         // stack: 2
       3 istore_3         // stack:
       4 goto 14          // stack:
       7 iload_2          // stack: result
       8 iload_3          // stack: result i
       9 imul             // stack: result*i
      10 istore_2         // stack:
      11 iinc 3 1         // stack:
      14 iload_3          // stack: i
      15 iload_1          // stack: i n
      16 if_icmplt 7      // stack:
      19 iload_2          // stack: result
      20 ireturn

Writing Factorial in “jasmin”

Jasmin is a Java Assembler Interface. It takes ASCII descriptions for Java classes, written in a simple assembler-like syntax and using the Java Virtual Machine instruction set. It converts them into binary Java class files suitable for loading into a JVM implementation.
    .class package Factorial
    .super java/lang/Object

    .method package <init>()V
      .limit stack 50
      .limit locals 1
      aload_0
      invokenonvirtual java/lang/Object/<init>()V
      return
    .end method

Writing Factorial in “jasmin” (continued)

    .method package fac(I)I
      .limit stack 50
      .limit locals 4
      iconst_1
      istore 2
      iconst_2
      istore 3
    Label_1:
      iload 3
      iload 1
      if_icmplt Label_4
      iconst_0
      goto Label_5
    Label_4:
      iconst_1
    Label_5:
      ifeq Label_2
      iload 2
      iload 3
      imul
      dup
      istore 2
      pop
    Label_3:
      iload 3
      dup
      iconst_1
      iadd
      istore 3
      pop
      goto Label_1
    Label_2:
      iload 2
      ireturn
      iconst_0
      ireturn
    .end method
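
The method descriptors used in the Jasmin listing, such as `(I)I` for `fac`, follow the descriptor syntax introduced earlier. As a rough sketch, they can be computed from Java types with the standard `java.lang.invoke.MethodType` class. The helper class `DescriptorDemo` and its two methods are hypothetical names introduced here for illustration, and deriving a field descriptor by stripping the leading `()` is a convenience trick, not an official API.

```java
import java.lang.invoke.MethodType;

/** Hypothetical helper that prints JVM descriptors for Java types. */
final class DescriptorDemo {
    /** Builds the method descriptor for a return type and parameter types. */
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    /** Derives a field descriptor: a no-arg method descriptor is "()" + type. */
    static String fieldDescriptor(Class<?> type) {
        return methodDescriptor(type).substring(2);
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor(int.class, int.class));               // fac
        System.out.println(methodDescriptor(void.class));                         // <init>
        System.out.println(methodDescriptor(int.class, int.class, Object.class)); // foo
        System.out.println(fieldDescriptor(boolean.class));
        System.out.println(fieldDescriptor(String[].class));
    }
}
```

The printed strings can be compared directly against the descriptor table and against the `.method` lines of the Jasmin listing.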