Compiler Construction Project

Prof. Dr. Hanspeter Mössenböck
University of Linz
A-4040 Linz
moessenboeck@ssw.uni-linz.ac.at
In this project you will write a small compiler for a Java-like language (MicroJava).
You will learn how to apply the knowledge from the compiler construction module in
practice and study all the details involved in a real compiler implementation.
The project consists of three levels:
• Level 1 requires you to implement a scanner and a parser for the language MicroJava, specified in Appendix A of this document.
• If you are more ambitious, you can also try to implement level 2, which deals with
symbol table handling and type checking.
• If you want to go to full length with your compiler, you should also implement level
3, which deals with code generation for the MicroJava Virtual Machine specified in
Appendix B of this document. This level is (more or less) optional so that you can
get a good mark even if you do not implement it.
The marking will be as follows:
class test          up to 45 points
project level 1     + 25 points
project level 2     + 25 points
project level 3     + 5 points
total               100 points
The project should be implemented in Java using Sun Microsystems' Java
Development Kit (JDK, http://java.sun.com/j2se/1.5.0/index.jsp) or some other
development environment. Before you start with the implementation you should study
the specification of the language MicroJava and the MicroJava Virtual Machine that
executes the bytecodes (i.e. the machine program) generated from a MicroJava source
program.
Level 1: Scanning and Parsing
In this part of the project you will implement a scanner and a recursive descent parser
for MicroJava. Start with the implementation of the scanner and write a test program
that repeatedly requests the next input token from the scanner. The test program
should demonstrate that your scanner returns the correct tokens for a sample program
(you can use the sample program in the MicroJava specification).
Next you should write a recursive descent parser that uses your scanner to read the
input tokens. In the first step, your parser should be implemented without error
handling. If it finds an error it should report it and just terminate. Write a test program
that uses your parser to analyse the sample MicroJava program. In the second step
you should augment your parser with error handling. Test your parser with various
sample programs that contain syntax errors.
All classes of the compiler should belong to a package MJ.
Scanning
Study the specification of MicroJava carefully. What are the tokens of the MicroJava
grammar? What is the format of names, numbers, character constants and comments?
What keywords and predeclared names do you need?
The scanner should be implemented as a file Scanner.java and should have the
following interface:
package MJ;
import ...;
public class Scanner {
  // token codes
  public static final int
    none    = 0,
    ident   = 1,
    number  = 2,
    charCon = 3,
    ...
    if_     = 33, // "if" cannot be used as a name because it is a keyword
    new_    = 34,
    ...
    eof     = 41;
  // static variables
  private static Reader in;  // source file reader
  private static char ch;    // lookahead character
  public static int col;     // current column
  public static int line;    // current line
  private static void error(String msg) { ... } // print error message
  public static void init(Reader r) { ... }     // initialize scanner
  public static Token next() { ... }            // return next input token
}
The main method of the scanner is next() which is repeatedly called by the parser (or
by your test program) and delivers the next input token on every call. The type Token
should be implemented in a file Token.java and is specified as follows:
package MJ;
public class Token {
  public int kind;       // token kind
  public int line;       // token line
  public int col;        // token column
  public int val;        // token value (for number and charConst)
  public String string;  // token string
}
Every call of next() returns the next input token. If a symbol is not a valid token (e.g.,
& or $) next() should return the token value none. At the end of the input stream
next() should return the token value eof. Your scanner should skip blanks, end of line
characters, tabulator characters and comments.
The following situations represent lexical errors:
• The occurrence of an invalid character (e.g., $)
• Character constants with the following characteristics:
- a missing quote at the end of the character constant ('x)
- an empty character constant ('')
• Integer constants that are too large. The range of int is -2147483648..2147483647.
Note that the scanner only recognizes non-negative constants (i.e., 0 to 2147483647). A
negative number such as -3 is delivered as two distinct tokens for - and for 3.
In these situations next() should report an error and return a token with the value none.
At http://www.ssw.uni-linz.ac.at/Misc/CC/ you will find a fragment of the scanner
that you can use as a starting point of your implementation.
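The scanning rules above can be sketched as follows. This is only a minimal illustration, not the fragment from the Web page: it handles names, numbers (with the range check), invalid characters and end of input, and leaves out keywords, operators, character constants and comments. The class names SketchToken and SketchScanner, the end-of-input marker EOF_CH and the errors counter are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Simplified token with the same fields as the Token class of the handout.
class SketchToken {
    int kind, line, col, val;
    String string;
}

class SketchScanner {
    // none, ident, number and eof use the codes from the handout's listing.
    static final int none = 0, ident = 1, number = 2, eof = 41;
    private static final char EOF_CH = '\uffff'; // hypothetical end-of-input marker

    private static Reader in;
    private static char ch;   // lookahead character
    static int line, col;     // position of the lookahead character
    static int errors;        // hypothetical error counter

    static void init(Reader r) {
        in = r; line = 1; col = 0; errors = 0;
        nextCh();
    }

    private static void nextCh() {
        try {
            int c = in.read();
            if (c < 0) { ch = EOF_CH; return; }
            ch = (char) c;
            col++;
            if (ch == '\n') { line++; col = 0; }
        } catch (IOException e) {
            ch = EOF_CH;
        }
    }

    private static boolean isLetter(char c) { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static void error(SketchToken t, String msg) {
        errors++;
        System.out.println("line " + t.line + ", col " + t.col + ": " + msg);
    }

    static SketchToken next() {
        while (ch <= ' ') nextCh();   // skip blanks, tabs and end of line characters
        SketchToken t = new SketchToken();
        t.line = line; t.col = col;
        if (isLetter(ch)) {
            StringBuilder sb = new StringBuilder();
            while (isLetter(ch) || isDigit(ch) || ch == '_') { sb.append(ch); nextCh(); }
            t.kind = ident; t.string = sb.toString();
        } else if (isDigit(ch)) {
            long v = 0;               // long, so that the range check itself cannot overflow
            while (isDigit(ch)) {
                v = Math.min(v * 10 + (ch - '0'), Integer.MAX_VALUE + 1L);
                nextCh();
            }
            if (v > Integer.MAX_VALUE) { error(t, "number too large"); t.kind = none; }
            else { t.kind = number; t.val = (int) v; }
        } else if (ch == EOF_CH) {
            t.kind = eof;
        } else {
            error(t, "invalid character " + ch);
            nextCh();
            t.kind = none;
        }
        return t;
    }

    public static void main(String[] args) {
        init(new StringReader("abc 42\n$ 99999999999"));
        for (SketchToken t = next(); t.kind != eof; t = next())
            System.out.println("line " + t.line + ", col " + t.col + ": kind " + t.kind);
    }
}
```

Accumulating the value in a long and clamping it is one way to detect constants above 2147483647 without the accumulator overflowing; a real scanner may prefer a different check.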
In order to test your scanner, write a test program that repeatedly calls Scanner.next()
and dumps the information returned by this method. For the program
program Nonsense
  int dummy;
{
  int Nonsense() { dummy=0; return dummy; }
  void main() {
    dummy= Nonsense();
  }
}
the test program should produce the following output:
line 1, col 1:   program_
line 1, col 7:   ident      Nonsense
line 2, col 3:   ident      int
line 2, col 7:   ident      dummy
line 2, col 12:  semicolon
line 3, col 1:   lbrace
line 4, col 3:   ident      int
line 4, col 7:   ident      Nonsense
line 4, col 15:  lpar
line 4, col 16:  rpar
line 4, col 18:  lbrace
line 4, col 20:  ident      dummy
line 4, col 25:  assign
line 4, col 26:  number     0
...
Apply your test program also to the sample program in the MicroJava specification
and to a program containing lexical errors.
Parsing
The next step is to implement a recursive descent parser that should be implemented
as a class Parser in the file Parser.java. Every production of the MicroJava grammar
should be implemented as a static method of class Parser.
package MJ;
public class Parser {
  private static Token t;   // most recently recognized token
  private static Token la;  // lookahead token (still unrecognized)
  private static int sym;   // most recent token number (always holds la.kind)
  public static void parse() {...}               // starts the syntax analysis
  private static void scan() {...}               // gets a token from the scanner and stores it in sym
  private static void check(int expected) {...}  // tries to recognize the token "expected"
  private static void error(String msg) {...}    // reports a syntax error
  private static void Program() {...}            // parses the production for Program
  ...
  private static void Mulop() {...}              // parses the production for Mulop
}
The parser is started by calling the method parse(). It assumes that the scanner has
already been initialized; it requests the first token from the scanner and calls the
method for parsing the production of the start symbol of the MicroJava grammar
(Program). The methods scan(), check() and error() are as discussed in the lecture.
Your first version of the parser should not try to recover from syntax errors. The
method error() should simply report an error with its position (line and column) and a
meaningful error message and should then terminate the program.
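As an illustration of this first parser version, here is a minimal sketch of the scan()/check()/error() pattern applied to a subset of the expression grammar from Appendix A. It is not the lecture's code: the input is an array of token codes instead of a scanner, all codes except number and eof are hypothetical, the class name ParserSketch is an assumption, and error() throws an exception instead of terminating the program so that the sketch can be exercised from a test.

```java
class ParserSketch {
    // number and eof use the handout's codes; the remaining codes are hypothetical.
    static final int number = 2, eof = 41,
                     plus = 100, minus = 101, times = 102, slash = 103, rem = 104,
                     lpar = 105, rpar = 106;

    private final int[] tokens;  // stands in for the scanner; must end with eof
    private int pos;             // index of the next token to deliver
    private int sym;             // kind of the lookahead token

    ParserSketch(int... tokens) { this.tokens = tokens; }

    // Fetches the next token kind into sym.
    private void scan() {
        sym = tokens[pos];
        if (pos < tokens.length - 1) pos++;   // keep delivering the final eof
    }

    // Accepts the expected token or reports an error.
    private void check(int expected) {
        if (sym == expected) scan();
        else error("token " + expected + " expected, found " + sym);
    }

    // The handout's first version prints a message and terminates;
    // throwing keeps this sketch testable.
    private void error(String msg) {
        throw new IllegalStateException(msg);
    }

    void parse() {
        pos = 0;
        scan();
        expr();
        check(eof);
    }

    // Expr = ["-"] Term {Addop Term}.
    private void expr() {
        if (sym == minus) scan();
        term();
        while (sym == plus || sym == minus) { scan(); term(); }
    }

    // Term = Factor {Mulop Factor}.
    private void term() {
        factor();
        while (sym == times || sym == slash || sym == rem) { scan(); factor(); }
    }

    // Factor = number | "(" Expr ")".   (subset of the full production)
    private void factor() {
        if (sym == number) scan();
        else if (sym == lpar) { scan(); expr(); check(rpar); }
        else error("invalid start of factor");
    }

    // Convenience wrapper: true if the token sequence is a valid expression.
    static boolean accepts(int... tokens) {
        try {
            new ParserSketch(tokens).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Each production becomes one method, and each method only looks at sym to decide which alternative to take; the full parser follows the same shape with one method per MicroJava production.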
In addition to the parser you should also implement a main program (Compiler.java)
that reads the name of the source file to be compiled, initializes the scanner and calls
the parser.
package MJ;
public class Compiler {
public static void main(String [] arg) {...}
}
Test your parser using the sample program of the MicroJava specification. You
should also test it with programs that contain syntax errors in order to see if the parser
correctly reports the errors.
Syntax Error Handling
The final step of level 1 is to augment your parser with syntax error handling and
recovery. Use the method of special anchors discussed in the lecture. In case of an
error the parser should report it (calling error()) and continue parsing until it gets to
the next synchronisation point where it should read and skip input tokens until it
encounters a token that is a valid anchor. With this anchor the parser can continue. In
order not to produce spurious error messages you should use the heuristics of a
minimal error distance as discussed in the lecture.
The parser should also count the number of detected errors and make it accessible to
the main program. At the end of the compilation the main program should print a
message with the number of errors that were detected.
Test your error handler by inserting errors into a MicroJava program and by feeding it
to the parser. Try to find out how many errors your parser can report in a single run.
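The handout refers to the lecture for the details of anchors and the minimal error distance, so the following is only one plausible formulation, sketched for a toy statement grammar. The assumptions are: a statement is either ident "=" number ";" or ";", the anchors at the synchronisation point are the tokens that can start a statement plus eof, and an error is reported only if at least three tokens were scanned since the previous error. The class name RecoverySketch and the codes for assign and semicolon are hypothetical.

```java
class RecoverySketch {
    static final int ident = 1, number = 2, eof = 41,   // codes from the handout
                     assign = 100, semicolon = 101;      // hypothetical codes

    private final int[] tokens;  // stands in for the scanner; must end with eof
    private int pos, sym;
    private int errDist = 3;     // tokens scanned since the last error
    int errors;                  // number of reported errors

    RecoverySketch(int... tokens) { this.tokens = tokens; }

    private void scan() {
        sym = tokens[pos];
        if (pos < tokens.length - 1) pos++;
        errDist++;
    }

    // Reports an error only if the previous one is far enough away,
    // which suppresses spurious follow-up messages.
    private void error(String msg) {
        if (errDist >= 3) {
            errors++;
            System.out.println("error: " + msg);
        }
        errDist = 0;
    }

    private void check(int expected) {
        if (sym == expected) scan();
        else error("token " + expected + " expected");
    }

    int parse() {
        pos = 0; errors = 0; errDist = 3;
        scan();
        while (sym != eof) statement();
        return errors;
    }

    // Statement = ident "=" number ";" | ";".   (toy subset)
    private void statement() {
        // Synchronisation point: skip input until an anchor is found.
        if (sym != ident && sym != semicolon) {
            error("invalid start of statement");
            while (sym != ident && sym != semicolon && sym != eof) scan();
            errDist = 0;
        }
        if (sym == ident) {
            scan();
            check(assign);
            check(number);
            check(semicolon);
        } else if (sym == semicolon) {
            scan();
        }
    }
}
```

With this scheme the input x = = = ; yields a single message, because the follow-up mismatches occur within the error distance, while two errors in different statements are both reported.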
Level 2: Symbol Table Handling and Type Checking
The parser can now check if a program is syntactically correct. In order to detect
semantic errors, however, the compiler needs a symbol table in which it stores
information about all declared names. This table is used for checking context
conditions in the grammar.
Implement a symbol table as discussed in the lecture. It should use a class Obj to store
information about declared names, a class Struct for type information, and a class
Scope for maintaining nested scopes. The symbol table itself should be implemented
as a class Tab with proper methods to insert and find names as well as to open and
close scopes.
package MJ.SymTab;
public class Obj {
  public static final int  // object kinds
    Con = 0, Var = 1, Type = 2, Meth = 3, Prog = 4;
  public int kind;         // Con, Var, Type, Fld, Meth, Prog
  public String name;
  public Struct type;
  public Obj next;         // to the next Obj in this scope
  public int val;          // Con: constant value
  public int adr;          // Var, Meth: address
  public int level;        // Var: declaration level
  public int nPars;        // Meth: no. of parameters
  public Obj locals;       // Meth: to the local variables of this method
}
package MJ.SymTab;
public class Struct {
  public static final int  // structure kinds
    None = 0, Int = 1, Char = 2, Arr = 3, Class = 4;
  public int kind;         // kind of this type (None, Int, Char, Class, Arr)
  public Struct elemType;  // Arr: element type
  public int n;            // Class: number of fields
  public Obj fields;       // Class: list of fields
}
package MJ.SymTab;
public class Scope {
  public Scope outer;  // to the enclosing scope
  public Obj locals;   // to the objects of this scope
  public int nVars;    // number of variables in this scope
}
package MJ.SymTab;
public class Tab {
  public static Scope topScope;                // current scope
  public static Obj noObj, chrObj, ...;        // predeclared objects
  public static Struct intType, charType, ...; // predeclared types
  public static void closeScope() {...}
  public static void openScope() {...}
  public static void init() {...}
  public static Obj insert(int kind, String name, Struct type) {...}
  public static Obj find(String name) {...}
  public static Obj findField(String name, Struct type) {...}
}
Every class should be implemented in a separate file (Obj.java, Struct.java, etc.) in a
subpackage MJ.SymTab (i.e. in a subdirectory MJ/SymTab). The methods of class Tab
have the following meaning:
• insert() creates a new Obj, initializes it with the parameter values, and adds it to the
current scope. If this scope already contains an object with the same name an error
should be reported.
• find() looks up a name in all open scopes starting at the current (i.e. innermost)
scope topScope and returns the Obj node with this name. If the name was not found
find() should report an error and return the predeclared value noObj.
• findField() looks up a name in the field list of the specified class type and returns
the Obj node with this name. If the name was not found findField() should report an
error and return the predeclared value noObj.
• init() initializes the symbol table, in particular it sets up the data structures for the
predeclared objects and types (i.e. the universe) as shown in the lecture.
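The behaviour of openScope(), closeScope(), insert() and find() described above can be sketched as follows. This is a simplified illustration, not the lecture's implementation: type information (Struct) is left out, init() only opens an empty outermost scope instead of building the universe, and the class names ObjSketch, ScopeSketch and TabSketch as well as the errors counter are assumptions.

```java
// Simplified symbol table node (no type information in this sketch).
class ObjSketch {
    final int kind;
    final String name;
    ObjSketch next;   // to the next object in the same scope

    ObjSketch(int kind, String name) { this.kind = kind; this.name = name; }
}

class ScopeSketch {
    ScopeSketch outer;   // to the enclosing scope
    ObjSketch locals;    // to the objects of this scope
    int nVars;           // number of variables in this scope
}

class TabSketch {
    static final int Con = 0, Var = 1;   // object kinds as in the handout
    static ScopeSketch topScope;         // current scope
    static final ObjSketch noObj = new ObjSketch(Var, "noObj");
    static int errors;                   // hypothetical error counter

    private static void error(String msg) {
        errors++;
        System.out.println("error: " + msg);
    }

    // In this sketch init() only opens the outermost scope.
    static void init() {
        topScope = null;
        errors = 0;
        openScope();
    }

    static void openScope() {
        ScopeSketch s = new ScopeSketch();
        s.outer = topScope;
        topScope = s;
    }

    static void closeScope() {
        topScope = topScope.outer;
    }

    // Appends a new object to the current scope and reports duplicates.
    static ObjSketch insert(int kind, String name) {
        ObjSketch obj = new ObjSketch(kind, name);
        ObjSketch p = topScope.locals, last = null;
        while (p != null) {
            if (p.name.equals(name)) error(name + " declared twice");
            last = p;
            p = p.next;
        }
        if (last == null) topScope.locals = obj; else last.next = obj;
        if (kind == Var) topScope.nVars++;
        return obj;
    }

    // Searches all open scopes from the innermost to the outermost one.
    static ObjSketch find(String name) {
        for (ScopeSketch s = topScope; s != null; s = s.outer)
            for (ObjSketch p = s.locals; p != null; p = p.next)
                if (p.name.equals(name)) return p;
        error(name + " not found");
        return noObj;
    }
}
```

Returning noObj instead of null lets the parser continue after an undeclared name without extra null checks.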
Extend your parser so that it calls the methods of class Tab to create Obj, Struct and
Scope nodes for the declarations of the compiled program. In order to test if the
symbol table has been built correctly, implement auxiliary methods for class Tab that
dump the contents of the symbol table.
When you have checked that the symbol table is correct, extend your parser again to
check the context conditions described in the MicroJava specification (Appendix A).
For that you have to retrieve the information from the symbol table using the methods
find() and findField().
Level 3: Code Generation
The final task is to generate code for the MicroJava Virtual Machine. Before you start,
carefully study the specification of the VM (Appendix B) in order to become familiar
with the run time data structures, the addressing modes, and the instructions.
All classes of the code generator should be implemented as separate files in the
package MJ.CodeGen. At http://www.ssw.uni-linz.ac.at/Misc/CC/ you will find a
fragment of Code.java that you can use as a starting point of your code generator. Its
interface is as follows:
package MJ.CodeGen;
public class Code {
  public static final int  // instruction codes
    load = 1, load_n = 2, ...;
  public static final int  // compare operators
    eq = 0, ne = 1, ...;
  private static int[] inverse = {ne, eq, ge, gt, le, lt};
  private static byte[] buf;   // code buffer
  public static int pc;        // next free byte in the code buffer
  public static int mainPc;    // pc of main function (set by the parser)
  public static int dataSize;  // length of static data in words (set by the parser)
  //--------------- code buffer access ---------------------
  public static void put(int x) {...}
  public static void put2(int x) {...}
  public static void put2(int pos, int x) {...}
  public static void put4(int x) {...}
  public static int get(int pos) {...}
  public static int get2(int pos) {...}
  private static void error(String msg) {...}
  //----------------- instruction generation -------------
  public static void init() {...}                  // initialize the code buffer
  public static Item load(Item x) {...}            // load x on the expression stack
  public static void loadConst(int n) {...}        // load constant n on the expression stack
  public static void assign(Item x, Item y) {...}  // generate code for the assignment x = y
  public static void inc(Item x, int n) {...}      // generate code to increment x by n
  public static void write(OutputStream s) {...}   // write the code buffer to the output stream
  //------------- jumps --------------
  public static void jump(Label lab) {...}         // unconditional jump
  public static void tJump(Item x) {...}           // true jump
  public static void fJump(Item x) {...}           // false jump
}
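The inverse table in the listing suggests how a false jump can be derived from a compare operator. The sketch below makes that explicit under two assumptions that the handout does not spell out: the operators after eq and ne are numbered in the order lt, le, gt, ge (the order used for the conditional jumps in Appendix B), and the opcodes 43..48 follow the same order. The class name CondSketch and the helper names are hypothetical.

```java
class CondSketch {
    // eq and ne are given in the handout; lt..ge are assumed to continue
    // in the order eq, ne, lt, le, gt, ge used for j<cond> in Appendix B.
    static final int eq = 0, ne = 1, lt = 2, le = 3, gt = 4, ge = 5;

    // inverse[op] is the operator that holds exactly when op does not.
    static final int[] inverse = {ne, eq, ge, gt, le, lt};

    static final int jeq = 43;   // first of the opcodes 43..48 (assumed order)

    // Opcode of the jump taken when "x op y" is true.
    static int trueJumpOpcode(int op) { return jeq + op; }

    // Opcode of the jump taken when "x op y" is false.
    static int falseJumpOpcode(int op) { return jeq + inverse[op]; }
}
```

For example, the condition of while (i < size) is typically compiled into a false jump, which under these assumptions is the jge instruction.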
For maintaining Items implement a class Item as discussed in the lecture. Its interface
should look like this:
package MJ.CodeGen;
public class Item {
  public static final int  // item kinds
    Con = 0, Local = 1, Static = 2, Stack = 3, Fld = 4, Elem = 5, Meth = 6, Cond = 7;
  public int kind;              // Con, Local, Static, Stack, Fld, Elem, Meth, Cond
  public Struct type;           // item type
  public Obj obj;               // Meth: method object from the symbol table
  public int val;               // Con: constant value
  public int adr;               // Local, Static, Fld, Meth: address
  public int op;                // Cond: operator
  public Label tLabel, fLabel;  // Cond: true jumps and false jumps
  public Item(Obj o) {...}
  public Item(int kind, int adr, Struct typ) {...}
}
For maintaining labels and jumps implement a class Label as discussed in the lecture.
Its interface should look like this:
package MJ.CodeGen;
public class Label {
  private boolean defined;  // target address already defined?
  private int adr;          // target address or start of threading list
  public Label() {...}
  public void put() {...}
  public void here() {...}
}
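The comment "target address or start of threading list" hints at the usual fixup technique for forward jumps. The following sketch shows one way put() and here() might work; it is not the lecture's code. The assumptions are: the two-byte jump operand is stored high byte first, jump distances are relative to the start of the jump instruction (as stated in Appendix B), a link value of 0 terminates the list of unresolved operands, and the helper class CodeBuf with its 8 KByte buffer stands in for class Code.

```java
// Minimal stand-in for the code buffer part of class Code.
class CodeBuf {
    static byte[] buf = new byte[8192];
    static int pc;

    static void init() { buf = new byte[8192]; pc = 0; }
    static void put(int x) { buf[pc++] = (byte) x; }
    static void put2(int x) { put(x >> 8); put(x); }   // high byte first (assumed)
    static void put2(int pos, int x) { int old = pc; pc = pos; put2(x); pc = old; }
    static int get2(int pos) {
        return (short) (((buf[pos] & 0xff) << 8) | (buf[pos + 1] & 0xff));
    }
}

class LabelSketch {
    private boolean defined;   // target address already defined?
    private int adr;           // target address or start of the threading list

    // Emits the 2-byte operand of a jump whose opcode was just written.
    void put() {
        if (defined) {
            // backward jump: distance relative to the opcode at pc - 1
            CodeBuf.put2(adr - (CodeBuf.pc - 1));
        } else {
            // forward jump: store the previous list head, then link this operand in
            CodeBuf.put2(adr);
            adr = CodeBuf.pc - 2;
        }
    }

    // Defines the label at the current pc and patches all pending operands.
    void here() {
        if (defined) throw new IllegalStateException("label defined twice");
        while (adr != 0) {
            int pos = adr;
            adr = CodeBuf.get2(pos);                     // next unresolved operand
            CodeBuf.put2(pos, CodeBuf.pc - (pos - 1));   // distance from the opcode
        }
        defined = true;
        adr = CodeBuf.pc;
    }
}
```

Threading the unresolved operands through the code buffer avoids a separate fixup list; the value 0 can serve as the terminator because an operand never sits at address 0 (an opcode always precedes it).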
Extend your parser step by step so that it calls methods to create items and labels and
to emit instructions. You should implement the code generation for the various
language constructs in the following order:
• selectors (i.e. obj.f, arr[i])
• expressions
• assignments
• while statements, if statements, break statements
• conditional boolean expressions
• method calls and parameter passing
In order to check your generated code you can use a Decoder class available from
http://www.ssw.uni-linz.ac.at/Misc/CC/. Its interface is as follows:
package MJ.CodeGen;
public class Decoder {
public static void decode(byte[] c, int off, int len) {...}
}
On the Web page you will also find an implementation of the MicroJava Virtual
Machine that can be used to execute the programs generated by your compiler. You
have to download the file Run.java and compile it. The interpreter can be started by
the command
java MJ.Run objectFile [-DEBUG]
The object file must conform to the format of the MicroJava VM specification (the
method Code.write() in the fragment of the code generator produces exactly that
format). The option -DEBUG causes a trace of the interpretation to be printed on the
screen.
Appendix A. The MicroJava Language
This section describes the MicroJava language that is used in the practical part of the
compiler construction module. MicroJava is similar to Java but much simpler.
A.1 General Characteristics
• A MicroJava program consists of a single program file with static fields and static
methods. There are no external classes but only inner classes that can be used as
data types.
• The main method of a MicroJava program is always called main(). When a
MicroJava program is called this method is executed.
• There are
- Constants of type int (e.g. 3) and char (e.g. 'x') but no string constants.
- Variables: all variables of the program are static.
- Primitive types: int, char (ASCII)
- Reference types: one-dimensional arrays like in Java as well as classes with fields
but without methods.
- Static methods in the main class.
• There is no garbage collector (allocated objects are only deallocated when the
program ends).
• Predeclared procedures are ord, chr, len.
Sample program
program P
  final int size = 10;
  class Table {
    int[] pos;
    int[] neg;
  }
  Table val;
{
  void main()
    int x, i;
  { /*---------- Initialize val ------------*/
    val = new Table;
    val.pos = new int[size];
    val.neg = new int[size];
    i = 0;
    while (i < size) {
      val.pos[i] = 0; val.neg[i] = 0;
      i++;
    }
    /*---------- Read values ---------*/
    read(x);
    while (x != 0) {
      if (0 <= x && x < size) {
        val.pos[x]++;
      } else if (-size < x && x < 0) {
        val.neg[-x]++;
      }
      read(x);
    }
  }
}
A.2 Syntax
Program    = "program" ident {ConstDecl | VarDecl | ClassDecl}
             "{" {MethodDecl} "}".
ConstDecl  = "final" Type ident "=" (number | charConst) ";".
VarDecl    = Type ident {"," ident } ";".
ClassDecl  = "class" ident "{" {VarDecl} "}".
MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} Block.
FormPars   = Type ident {"," Type ident}.
Type       = ident ["[" "]"].
Block      = "{" {Statement} "}".
Statement  = Designator ("=" Expr | "(" [ActPars] ")" | "++" | "--") ";"
           | "if" "(" Condition ")" Statement ["else" Statement]
           | "while" "(" Condition ")" Statement
           | "break" ";"
           | "return" [Expr] ";"
           | "read" "(" Designator ")" ";"
           | "print" "(" Expr ["," number] ")" ";"
           | Block
           | ";".
ActPars    = Expr {"," Expr}.
Condition  = CondTerm {"||" CondTerm}.
CondTerm   = CondFact {"&&" CondFact}.
CondFact   = Expr Relop Expr.
Relop      = "==" | "!=" | ">" | ">=" | "<" | "<=".
Expr       = ["-"] Term {Addop Term}.
Term       = Factor {Mulop Factor}.
Factor     = Designator ["(" [ActPars] ")"]
           | number
           | charConst
           | "new" ident ["[" Expr "]"]
           | "(" Expr ")".
Designator = ident {"." ident | "[" Expr "]"}.
Addop      = "+" | "-".
Mulop      = "*" | "/" | "%".
Lexical structure
Terminal classes:
ident
= letter {letter | digit | "_"}.
number
= digit {digit}.
charConst = "'" char "'". // including '\r' and '\n'
Keywords:
program  class  if  else  void  final  while  new  read  print  return  break
Operators:
+  -  *  /  %  ==  !=  >  >=  <  <=  &&  ||  (  )  [  ]  {  }  =  ;  ,  .  ++  --
Comments:
// to the end of line
A.3 Semantics
All terms in this document that have a definition are underlined to emphasize their
special meaning. The definitions of these terms are given here.
Reference type
Arrays and classes are called reference types.
Type of a constant
• The type of an integer constant (e.g. 17) is int.
• The type of a character constant (e.g. 'x') is char.
Same type
Two types are the same
• if they are denoted by the same type name, or
• if both types are arrays and their element types are the same.
Type compatibility
Two types are compatible
• if they are the same, or
• if one of them is a reference type and the other is the type of null.
Assignment compatibility
A type src is assignment compatible with a type dst
• if src and dst are the same, or
• if dst is a reference type and src is the type of null.
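The three type relations defined above translate almost directly into code. The sketch below is one possible rendering under stated assumptions: every named type is represented by exactly one node, so "denoted by the same type name" becomes reference equality, and the type of null is modelled by a dedicated node nullType. The class name TypeSketch, the method names and the nullType representation are hypothetical; the kind constants follow class Struct.

```java
class TypeSketch {
    static final int None = 0, Int = 1, Char = 2, Arr = 3, Class = 4;

    final int kind;
    final TypeSketch elemType;   // Arr: element type, otherwise null

    TypeSketch(int kind, TypeSketch elemType) {
        this.kind = kind;
        this.elemType = elemType;
    }

    // Hypothetical node standing for the type of null.
    static final TypeSketch nullType = new TypeSketch(Class, null);

    boolean isRefType() { return kind == Class || kind == Arr; }

    // Same type: same type name (same node) or arrays with the same element type.
    boolean sameAs(TypeSketch other) {
        if (kind == Arr) return other.kind == Arr && elemType.sameAs(other.elemType);
        return this == other;
    }

    // Compatible: the same, or one is a reference type and the other is the type of null.
    boolean compatibleWith(TypeSketch other) {
        return sameAs(other)
            || (this == nullType && other.isRefType())
            || (other == nullType && isRefType());
    }

    // Assignment compatible (this is src): the same, or dst is a reference type and src is null.
    boolean assignableTo(TypeSketch dst) {
        return sameAs(dst) || (this == nullType && dst.isRefType());
    }
}
```

Note that assignment compatibility is not symmetric: null can be assigned to an array or class variable, but not the other way round.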
Predeclared names
int   the type of all integer values
char  the type of all character values
null  the null value of a class or array variable, meaning "pointing to no value"
chr   standard method; chr(i) converts the int expression i into a char value
ord   standard method; ord(ch) converts the char value ch into an int value
len   standard method; len(a) returns the number of elements of the array a
Scope
A scope is the textual range of a method or a class. It extends from the point after the
declaring method or class name to the closing curly bracket of the method or class
declaration. A scope excludes other scopes that are nested within it. We assume that
there is an (artificial) outermost scope, to which the main class is local and which
contains all predeclared names. The declaration of a name in an inner scope S hides
the declarations of the same name in outer scopes.
Note
• Indirect recursion is not allowed, since every name must be declared before it is
used. This would not be possible if indirect recursion were allowed.
• A predeclared name (e.g. int or char) can be redeclared in an inner scope (but this is
not recommended).
A.4 Context Conditions
General context conditions
• Every name must be declared before it is used.
• A name must not be declared twice in the same scope.
• A program must contain a method named main. It must be declared with a void
function type and must not have parameters.
Context conditions for standard methods
chr(e) e must be an expression of type int.
ord(c) c must be of type char.
len(a) a must be an array.
Context conditions for the MicroJava productions
Program = "program" ident {ConstDecl | VarDecl | ClassDecl} "{" {MethodDecl} "}".
ConstDecl = "final" Type ident "=" (number | charConst) ";".
• The type of number or charConst must be the same as the type of Type.
VarDecl = Type ident ["[" "]"] {"," ident ["[" "]"]} ";".
ClassDecl = "class" ident "{" {VarDecl} "}".
MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} "{" {Statement} "}".
• If a method is a function it must be left via a return statement (this is checked at run
time).
FormPars = Type ident ["[" "]"] {"," Type ident ["[" "]"]}.
Type = ident.
• ident must denote a type.
Statement = Designator "=" Expr ";".
• Designator must denote a variable, an array element or an object field.
• The type of Expr must be assignment compatible with the type of Designator.
Statement = Designator ("++" | "--") ";".
• Designator must denote a variable, an array element or an object field.
• Designator must be of type int.
Statement = Designator "(" [ActPars] ")" ";".
• Designator must denote a method.
Statement = "break" ";".
• The break statement must be contained in a while statement.
Statement = "read" "(" Designator ")" ";".
• Designator must denote a variable, an array element or an object field.
• Designator must be of type int or char.
Statement = "print" "(" Expr ["," number] ")" ";".
• Expr must be of type int or char.
Statement = "return" [Expr] ";".
• The type of Expr must be assignment compatible with the function type of the
current method.
• If Expr is missing the current method must be declared as void.
Statement = "if" "(" Condition ")" Statement ["else" Statement]
          | "while" "(" Condition ")" Statement
          | "{" {Statement} "}"
          | ";".
ActPars = Expr {"," Expr}.
• The numbers of actual and formal parameters must match.
• The type of every actual parameter must be assignment compatible with the type of
the formal parameter at the corresponding position.
Condition = CondTerm {"||" CondTerm}.
CondTerm = CondFact {"&&" CondFact}.
CondFact = Expr Relop Expr.
• The types of both expressions must be compatible.
• Classes and arrays can only be checked for equality or inequality.
Expr = Term.
Expr = "-" Term.
• Term must be of type int.
Expr = Expr Addop Term.
• Expr and Term must be of type int.
Term = Factor.
Term = Term Mulop Factor.
• Term and Factor must be of type int.
Factor = Designator | number | charConst | "(" Expr ")".
Factor = Designator "(" [ActPars] ")".
• Designator must denote a method.
Factor = "new" Type.
• Type must denote a class.
Factor = "new" Type "[" Expr "]".
• The type of Expr must be int.
Designator = Designator "." ident.
• The type of Designator must be a class.
• ident must be a field of Designator.
Designator = Designator "[" Expr "]".
• The type of Designator must be an array.
• The type of Expr must be int.
Relop = "==" | "!=" | ">" | ">=" | "<" | "<=".
Addop = "+" | "-".
Mulop = "*" | "/" | "%".
A.5 Implementation Restrictions
• There must not be more than 256 local variables.
• There must not be more than 65536 global variables.
• A class must not have more than 65536 fields.
• The code of the program must not be longer than 8 KBytes.
Appendix B. The MicroJava VM
This section describes the architecture of the MicroJava Virtual Machine that is used
in the practical part of this compiler construction module. The MicroJava VM is
similar to the Java VM but has fewer instructions. Some instructions were also
simplified. Whereas the Java VM uses operand names from the constant pool that are
resolved by the loader, the MicroJava VM uses fixed operand addresses. Java
instructions encode the types of their operands so that a verifier can check the
consistency of an object file. MicroJava instructions do not encode operand types.
B.1 Memory Layout
The memory areas of the MicroJava VM are as follows.
[Figure: the memory areas code (byte array), data, heap, pstack and estack (word arrays), with the registers pc, free, fp, sp and esp; a frame on pstack holds ra and dl.]
code
This area contains the code of the methods. The register pc contains the index
of the currently executed instruction. mainpc contains the start address of the
method main().
data
This area holds the (static or global) data of the main program. It is an array
of variables. Every variable occupies one word (32 bits). The addresses of
the variables are indexes into the array.
heap
This area holds the dynamically allocated objects and arrays. The blocks are
allocated consecutively. free points to the beginning of the still unused area
of the heap. Dynamically allocated memory is only returned at the end of the
program. There is no garbage collector. All object fields occupy a single
word (32 bits). Arrays of char elements are byte arrays. Their length is a
multiple of 4. Pointers are byte offsets into the heap. Array objects start with
an invisible word, containing the array length.
pstack
This area (the procedure stack) maintains the activation frames of the
invoked methods. Every frame consists of an array of local variables, each
occupying a single word (32 bits). Their addresses are indexes into the array.
ra is the return address of the method, dl is the dynamic link (a pointer to the
frame of the caller). A newly allocated frame is initialized with all zeroes.
estack
This area (the expression stack) is used to store the operands of the
instructions. After every MicroJava statement estack is empty. Method
parameters are passed on the expression stack and are removed by the Enter
instruction of the invoked method. The expression stack is also used to pass
the return value of the method back to the caller.
All data (global variables, local variables, heap variables) are initialized with a null
value (0 for int, chr(0) for char, null for references).
B.2 Instruction Set
The following tables show the instructions of the MicroJava VM together with their
encoding and their behaviour. The third column of the tables shows the contents of
estack before and after every instruction, for example
..., val, val
..., val
means that this instruction removes two words from estack and pushes a new word
onto it. The operands of the instructions have the following meaning:
b   a byte
s   a short int (16 bits)
w   a word (32 bits)
Variables of type char are stored in the lowest byte of a word and are manipulated
with word instructions (e.g. load, store). Array elements of type char are stored in a
byte array and are loaded and stored with special instructions.
Loading and storing of local variables
1       load b        ... -> ..., val                Load
        push(local[b]);
2..5    load_n        ... -> ..., val                Load (n = 0..3)
        push(local[n]);
6       store b       ..., val -> ...                Store
        local[b] = pop();
7..10   store_n       ..., val -> ...                Store (n = 0..3)
        local[n] = pop();
Loading and storing of global variables
11      getstatic s   ... -> ..., val                Load static variable
        push(data[s]);
12      putstatic s   ..., val -> ...                Store static variable
        data[s] = pop();
Loading and storing of object fields
13      getfield s    ..., adr -> ..., val           Load object field
        adr = pop()/4; push(heap[adr+s]);
14      putfield s    ..., adr, val -> ...           Store object field
        val = pop(); adr = pop()/4;
        heap[adr+s] = val;
Loading of constants
15..20  const_n       ... -> ..., val                Load constant (n = 0..5)
        push(n);
21      const_m1      ... -> ..., -1                 Load minus one
        push(-1);
22      const w       ... -> ..., val                Load constant
        push(w);
Arithmetic
23      add           ..., val1, val2 -> ..., val1+val2   Add
        push(pop() + pop());
24      sub           ..., val1, val2 -> ..., val1-val2   Subtract
        push(-pop() + pop());
25      mul           ..., val1, val2 -> ..., val1*val2   Multiply
        push(pop() * pop());
26      div           ..., val1, val2 -> ..., val1/val2   Divide
        x = pop(); push(pop() / x);
27      rem           ..., val1, val2 -> ..., val1%val2   Remainder
        x = pop(); push(pop() % x);
28      neg           ..., val -> ..., -val               Negate
        push(-pop());
29      shl           ..., val, x -> ..., val1            Shift left
        x = pop(); push(pop() << x);
30      shr           ..., val, x -> ..., val1            Shift right (arithmetically)
        x = pop(); push(pop() >> x);
31      inc b1, b2    ... -> ...                          Increment variable
        local[b1] = local[b1] + b2;
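To make the operand order of the arithmetic instructions concrete, here is a small sketch of an interpreter loop for the constant and arithmetic opcodes listed so far. It is not the Run.java from the Web page: the class name EstackSketch, the fixed stack size and the assumption that the 4-byte operand of const is stored high byte first are choices made for this illustration.

```java
class EstackSketch {
    // Executes a byte sequence of constant and arithmetic instructions
    // and returns the value left on top of the expression stack.
    static int run(byte[] code) {
        int[] estack = new int[64];
        int esp = 0, pc = 0;
        while (pc < code.length) {
            int op = code[pc++];
            int x;
            if (op >= 15 && op <= 20) {          // const_n (n = 0..5)
                estack[esp++] = op - 15;
                continue;
            }
            switch (op) {
                case 21:                          // const_m1
                    estack[esp++] = -1;
                    break;
                case 22:                          // const w (high byte first, assumed)
                    estack[esp++] = ((code[pc] & 0xff) << 24) | ((code[pc + 1] & 0xff) << 16)
                                  | ((code[pc + 2] & 0xff) << 8) | (code[pc + 3] & 0xff);
                    pc += 4;
                    break;
                case 23: x = estack[--esp]; estack[esp - 1] += x; break;   // add
                case 24: x = estack[--esp]; estack[esp - 1] -= x; break;   // sub: val1 - val2
                case 25: x = estack[--esp]; estack[esp - 1] *= x; break;   // mul
                case 26: x = estack[--esp]; estack[esp - 1] /= x; break;   // div: val1 / val2
                case 27: x = estack[--esp]; estack[esp - 1] %= x; break;   // rem: val1 % val2
                case 28: estack[esp - 1] = -estack[esp - 1]; break;        // neg
                default:
                    throw new IllegalArgumentException("unsupported opcode " + op);
            }
        }
        return estack[esp - 1];
    }
}
```

For instance, the sequence const_5, const_2, sub (bytes 20, 17, 24) leaves 3 on the stack, because the second operand is popped first and subtracted from the first.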
Object creation
32      new s         ... -> ..., adr                New object
        allocate area of s bytes;
        initialize area to all 0;
        push(adr(area));
33      newarray b    ..., n -> ..., adr             New array
        n = pop();
        if (b==0)
          alloc. array with n elems of byte size;
        else if (b==1)
          alloc. array with n elems of word size;
        initialize array to all 0;
        push(adr(array))
Array access
34      aload         ..., adr, i -> ..., val        Load array element
        i = pop(); adr = pop()/4+1;
        push(heap[adr+i]);
35      astore        ..., adr, i, val -> ...        Store array element
        val = pop(); i = pop(); adr = pop()/4+1;
        heap[adr+i] = val;
36      baload        ..., adr, i -> ..., val        Load byte array element
        i = pop(); adr = pop()/4+1;
        x = heap[adr+i/4];
        push(byte i%4 of x);
37      bastore       ..., adr, i, val -> ...        Store byte array element
        val = pop(); i = pop(); adr = pop()/4+1;
        x = heap[adr+i/4];
        set byte i%4 in x;
        heap[adr+i/4] = x;
38      arraylength   ..., adr -> ..., len           Get array length
        adr = pop();
        push(heap[adr]);
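The steps "byte i%4 of x" and "set byte i%4 in x" of baload and bastore can be sketched with shifts and masks. The specification does not say which byte of a word is byte 0, so this sketch assumes that byte 0 is the least significant one; the class and method names are hypothetical, and adr is taken to be the word index of the first element (i.e. the value after pop()/4+1).

```java
class ByteArraySketch {
    // Returns byte i of the byte array whose first word is heap[adr].
    static int loadByte(int[] heap, int adr, int i) {
        int x = heap[adr + i / 4];
        int shift = 8 * (i % 4);        // byte 0 = lowest 8 bits (assumed)
        return (x >> shift) & 0xff;
    }

    // Stores the low 8 bits of val as byte i of the byte array at heap[adr].
    static void storeByte(int[] heap, int adr, int i, int val) {
        int shift = 8 * (i % 4);
        int x = heap[adr + i / 4];
        x = (x & ~(0xff << shift)) | ((val & 0xff) << shift);   // clear, then set the byte
        heap[adr + i / 4] = x;
    }
}
```

Four char elements share one heap word, which is why the length of a char array in bytes is rounded up to a multiple of 4.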
Stack manipulation
39      pop           ..., val -> ...                Remove topmost stack element
        dummy = pop();
40      dup           ..., val -> ..., val, val      Duplicate topmost stack element
        x = pop(); push(x); push(x);
41      dup2          ..., v1, v2 -> ..., v1, v2, v1, v2   Duplicate top two stack elements
        y = pop(); x = pop();
        push(x); push(y); push(x); push(y);
Jumps (jump distance relative to the beginning of the jump instruction)
42      jmp s                                        Jump unconditionally
        pc = pc + s;
43..48  j<cond> s     ..., x, y -> ...               Jump conditionally (eq, ne, lt, le, gt, ge)
        y = pop(); x = pop();
        if (x cond y) pc = pc + s;
Method call (PUSH and POP work on pstack)
49      call s                                       Call method
        PUSH(pc+3); pc = pc + s;
50      return                                       Return
        pc = POP();
51      enter b1, b2                                 Enter method
        psize = b1; lsize = b2; // in words
        PUSH(fp); fp = sp; sp = sp + lsize;
        initialize frame to 0;
        for (i=psize-1;i>=0;i--) local[i] = pop();
52      exit                                         Exit method
        sp = fp; fp = POP();
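The interplay of call, enter, exit and return can be traced with a small simulation of pstack and estack. This is a sketch that follows the pseudocode above literally; the class name FrameSketch, the stack sizes and the method names pushP/popP (for PUSH/POP) and ret (for return, which is a Java keyword) are assumptions.

```java
class FrameSketch {
    final int[] pstack = new int[256];   // procedure stack
    final int[] estack = new int[32];    // expression stack
    int sp, fp, esp, pc;

    void pushP(int x) { pstack[sp++] = x; }   // PUSH in the specification
    int popP() { return pstack[--sp]; }       // POP in the specification
    void push(int x) { estack[esp++] = x; }
    int pop() { return estack[--esp]; }
    int local(int i) { return pstack[fp + i]; }

    // call s: save the address after the 3-byte call instruction, then jump.
    void call(int s) { pushP(pc + 3); pc = pc + s; }

    // return: continue at the saved return address.
    void ret() { pc = popP(); }

    // enter b1, b2: allocate a zeroed frame and move the parameters into it.
    void enter(int psize, int lsize) {
        pushP(fp);                                   // dynamic link
        fp = sp;
        sp = sp + lsize;
        for (int i = 0; i < lsize; i++) pstack[fp + i] = 0;
        for (int i = psize - 1; i >= 0; i--) pstack[fp + i] = pop();
    }

    // exit: release the frame and restore the caller's frame pointer.
    void exit() { sp = fp; fp = popP(); }
}
```

Because the last parameter is on top of the expression stack, enter pops the parameters in reverse order, so that the first parameter ends up in local[0].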
Input/Output
53      read          ... -> ..., val                Read
        readInt(x); push(x);
54      print         ..., val, width -> ...         Print
        width = pop(); writeInt(pop(), width);
55      bread         ... -> ..., val                Read byte
        readChar(ch); push(ch);
56      bprint        ..., val, width -> ...         Print byte
        width = pop(); writeChar(pop(), width);
Miscellaneous
57      trap b                                       Generate run time error
        print error message depending on b;
        stop execution;
B.3 Object File Format
2 bytes: "MJ"
4 bytes: code size in bytes
4 bytes: number of words for the global data
4 bytes: mainPC: the address of main() relative to the beginning of the code area
n bytes: the code area (n = code size specified in the header)
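The header layout above can be written with a DataOutputStream, as in the following sketch. It is not the Code.write() from the fragment on the Web page: the class and method names are hypothetical, and the byte order of the 4-byte fields is not stated in this section, so the big-endian order that DataOutputStream produces is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ObjectFileSketch {
    // Writes the header fields followed by the first codeSize bytes of code.
    static void write(OutputStream s, byte[] code, int codeSize, int dataSize, int mainPc)
            throws IOException {
        DataOutputStream out = new DataOutputStream(s);
        out.write('M');               // 2 bytes: "MJ"
        out.write('J');
        out.writeInt(codeSize);       // 4 bytes: code size in bytes
        out.writeInt(dataSize);       // 4 bytes: number of words for the global data
        out.writeInt(mainPc);         // 4 bytes: address of main() within the code area
        out.write(code, 0, codeSize); // n bytes: the code area
        out.flush();
    }

    // Convenience helper for experiments: returns the object file as a byte array.
    static byte[] toBytes(byte[] code, int dataSize, int mainPc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            write(bytes, code, code.length, dataSize, mainPc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The header is 14 bytes long, so an object file with n bytes of code has 14 + n bytes in total.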
B.4 Run Time Errors
1   Missing return statement in function.