HM Hw 4

CS 322
HM
LANGUAGES & COMPILER DESIGN
PSU
HW 4
HomeWork 4, L1 Compiler, 200 Points, +25 for Case Statement (5/10/2005)
Due Date: Monday May 16th, 2005
Subject: Design, implement, and test a compiler for language L1, L1 Compiler
General Rules: Implement Homework in C or C++. Any flavor of C or C++ that runs on a PSU
computer will do. Hand in [a listing of all] C/C++ source files plus include files, if any, plus all inputs,
and their generated outputs. Write your name, the HW number, completion date, and the current PSU
term into the header of each source file.
Summary and main work: (110/200) Design, implement, test, and debug a complete compiler for the
L1 language. Start with L1 Parse implemented earlier, and add semantic actions. These actions a) fill
the symbol table and allocate storage (for declarations) and b) generate code for executable statements.
(60/200) Make sure your generated L1 machine code is human-readable. Test your L1 compiler by
showing the sources and corresponding generated L1 assembly mnemonics for a large variety of L1
programs which you devise. A separate design document is not needed.
Testing: (30/200) Write numerous L1 source programs, compile them, and show for each source
program the corresponding L1 assembly output. Using small tests you write, show individually each
syntax construct, e.g. While Statement, If Statement, For Statement, etc. Also write numerous larger
programs that each show sequences of syntactic constructs and show nested statements. In 50% of
these test runs, turn on the ability to trace the recursive calling sequences, so that you can
visually follow each step of the parse.
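One possible way to make such a trace switchable is sketched below. It assumes a recursive-descent parser in which every parse function calls a pair of helpers on entry and exit; the names trace_on, trace_enter, and trace_leave are hypothetical and not prescribed by this homework.

```c
#include <assert.h>
#include <stdio.h>

static int trace_on = 0;     /* hypothetical flag, e.g. set by a compiler option */
static int trace_depth = 0;  /* current nesting depth of active parse functions */

/* Call at the start of each parse function, e.g. trace_enter("if_statement"). */
void trace_enter(const char *rule)
{
    if (trace_on)
        printf("%*s> %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
}

/* Call just before each parse function returns. */
void trace_leave(const char *rule)
{
    assert(trace_depth > 0);
    trace_depth--;
    if (trace_on)
        printf("%*s< %s\n", 2 * trace_depth, "", rule);
}
```

With trace_on set, each nested call is indented two more columns, so the printed lines mirror the shape of the parse.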
Commenting: (10/200) While a separate design document is not needed, all code must be suitably
commented. Each data structure, algorithm, and method of the L1 Compiler code generation phase
must be explained in the C/C++ source.
L1 Assembler Source: (50/200) Your L1 compiler either emits binary object code, in which case you also
implement a disassembler, or it generates L1 assembler ASCII output, which requires
that you also implement an assembler in order to generate the final binary code. Note that no linking is
necessary. The loading phase for execution by the simulator consists of just reading the object file into
the L1 Simulator’s code space. For this homework, show numerous L1 source programs with their
corresponding L1 assembler output.
L1 Binary Code, an Early Decision: Decide whether binary L1 object code is generated by the L1
compiler or by an L1 assembler. You have 2 choices.
1. If your compiler emits L1 binary code directly, then you must implement a disassembler that
shows, one line at a time per instruction, each L1 opcode plus all applicable operands in human
readable format. Only the operands actually used by an instruction are shown in the disassembly.
Advantage of this scheme: your compiler output is immediately executable. Disadvantage: You
have to implement a disassembler.
2. If your L1 compiler emits L1 assembler mnemonics in ASCII, then you must implement an L1
assembler that reads L1 assembly programs, generated by the compiler, and converts them into
executable binary L1 Arch instructions. Advantage of this method: your compiler output is
immediately readable by humans. Disadvantage: you have to implement an assembler to generate
binary code.
Symbol Table: Implement a symbol table scheme similar to the one discussed in CS321; even a
simple array of symbol-table structures is acceptable and very effective, as it simplifies your
implementation, while limiting the maximum size/complexity of L1 source programs somewhat. Make
sure that you can dump the symbol table in readable form, so you are sure you implement what you
think you design. Dumping symbol table data as an option of the compile step is crucial for your ability
to debug your design. Show in your homework that symbol table dumping works.
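A minimal sketch of such an array-based symbol table with a dump routine follows. The field layout, the limits MAX_SYMBOLS and MAX_NAME, and the function names are assumptions made for illustration; the scheme from CS321 may store more per symbol (type, scope, and so on).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 100   /* hypothetical limit on declared names */
#define MAX_NAME    32    /* hypothetical maximum identifier length + 1 */

typedef struct {
    char name[MAX_NAME];  /* identifier as written in the L1 source */
    int  address;         /* word address assigned by the compiler */
    int  size;            /* number of words occupied */
} Symbol;

static Symbol sym_table[MAX_SYMBOLS];
static int    sym_count = 0;

/* Return the index of name, or -1 if it is not in the table. */
int sym_lookup(const char *name)
{
    for (int i = 0; i < sym_count; i++)
        if (strcmp(sym_table[i].name, name) == 0)
            return i;
    return -1;
}

/* Enter a new name; return its index, or -1 if the table is full,
   the name is too long, or the name is already declared. */
int sym_insert(const char *name, int address, int size)
{
    assert(name != NULL);
    if (sym_count >= MAX_SYMBOLS || strlen(name) >= MAX_NAME)
        return -1;
    if (sym_lookup(name) >= 0)
        return -1;
    strcpy(sym_table[sym_count].name, name);
    sym_table[sym_count].address = address;
    sym_table[sym_count].size = size;
    return sym_count++;
}

/* Dump the table in readable form, one symbol per line. */
void sym_dump(FILE *out)
{
    fprintf(out, "%-4s %-16s %8s %6s\n", "idx", "name", "address", "size");
    for (int i = 0; i < sym_count; i++)
        fprintf(out, "%-4d %-16s %8d %6d\n", i, sym_table[i].name,
                sym_table[i].address, sym_table[i].size);
}
```

The linear search keeps the code short; the fixed array size is what limits the number of names an L1 source program may declare.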
Memory Allocation: Decide early, whether your target system has byte- or word-addressable memory.
For simplicity I recommend you just use words, each 32 bits in length, which likely matches the
arithmetic word size of your real host computer. In that case, even if you allocate a single character, it will waste 75% of
the bits in a word, but this is a simple run-time environment. Storage saving is not a high priority.
I recommend that memory starts at address 0. Each subsequent word has an address 1 greater than
the previous. This will allow easy inclusion of reals (AKA floats) later. It also makes it easy to add a
word-oriented run-time stack later.
If your source program defines more storage than your target computer supports, the compiler
needs to emit a suitable error message. Obviously the program will not execute/simulate in that case.
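The word-addressed allocation described above might look roughly like the following sketch. The memory size MEM_WORDS and the names next_free and alloc_words are hypothetical; the only assumptions taken from the text are that memory starts at address 0, that addresses grow by 1 per word, and that exceeding the target memory is reported as an error.

```c
#include <assert.h>
#include <stdio.h>

#define MEM_WORDS 1024     /* hypothetical size of the target data memory, in words */

static int next_free = 0;  /* next unallocated word address; memory starts at 0 */

/* Reserve `words` consecutive words for a declaration. Returns the first
   address, or -1 after printing an error if the target memory is exhausted. */
int alloc_words(int words)
{
    assert(words > 0);
    if (words > MEM_WORDS - next_free) {
        fprintf(stderr, "error: program needs more than %d words of storage\n",
                MEM_WORDS);
        return -1;
    }
    int addr = next_free;
    next_free += words;
    return addr;
}
```

The address returned here is what a declaration would record in its symbol table entry.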
Code Generation: Decide the type of object code. You can define a low-level, genuine binary
machine format, or you can use a higher-level abstraction expressed as C/C++ structs. The former
approach requires you to pack all opcode and operand bits in binary form into bytes or words, but
enables the use of the same run-time memory space for data as well as instructions. This approach is
closer to a real machine, is a bit harder to implement, and is suitable for execution on a real machine.
The second approach is a bit easier to implement and is suitable for simulation, where the machine
instructions do not need to follow an exact binary format. I recommend that you use the second
approach. A drawback of the second approach is that the space for data (perhaps an array of words)
and the space for instructions (an array of high-level C++ structures) must be separate; this is
inconsistent, however, with real computer architectures.
If you use the higher-level abstraction, then define your machine instructions in terms of C/C++
structs. One of the structure fields is the opcode, another is the first operand, yet another field is the
second operand, and a fourth field is the result. Note that all operand and result types can be structs and
unions themselves, as a source operand may be a register, a memory location, possibly even of one of
several sizes, or it can be an in-line operand (a literal value encoded in the instruction space).
Ultimately, you thus define quadruples, which bit-by-bit generally do not match real machine
instructions.
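As an illustration of the higher-level abstraction, the quadruple could be declared along the following lines. Everything here is an assumption for the sake of the example: the opcode set (only j and jf appear in this handout), the operand kinds, and the printed notation r1, [10], #5 are hypothetical. The formatting routine also shows the disassembler idea of printing only the operands an instruction actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes; a real L1 instruction set will differ. */
typedef enum { OP_ADD, OP_LOAD, OP_STORE, OP_J, OP_JF, OP_COUNT } Opcode;
typedef enum { OPND_NONE, OPND_REG, OPND_MEM, OPND_LIT } OperandKind;

typedef struct {
    OperandKind kind;          /* tells which union member is valid */
    union {
        int reg;               /* register number */
        int addr;              /* word address in data memory */
        int literal;           /* in-line value stored in the instruction */
    } u;
} Operand;

typedef struct {
    Opcode  op;
    Operand src1, src2, result;
} Quad;

Operand make_operand(OperandKind kind, int value)
{
    Operand o;
    memset(&o, 0, sizeof o);
    o.kind = kind;
    switch (kind) {
    case OPND_REG:  o.u.reg = value;     break;
    case OPND_MEM:  o.u.addr = value;    break;
    case OPND_LIT:  o.u.literal = value; break;
    case OPND_NONE: break;
    }
    return o;
}

/* Format one instruction as one line, showing only the operands in use. */
void format_quad(const Quad *q, char *buf, size_t n)
{
    static const char *const names[OP_COUNT] = { "add", "load", "store", "j", "jf" };
    const Operand *ops[3];
    size_t len;
    int shown = 0;

    assert(q->op < OP_COUNT && n > 0);
    ops[0] = &q->src1; ops[1] = &q->src2; ops[2] = &q->result;
    len = (size_t)snprintf(buf, n, "%s", names[q->op]);
    for (int i = 0; i < 3 && len < n; i++) {
        const char *sep = shown ? ", " : " ";
        int w = 0;
        switch (ops[i]->kind) {
        case OPND_REG:  w = snprintf(buf + len, n - len, "%sr%d", sep, ops[i]->u.reg);     break;
        case OPND_MEM:  w = snprintf(buf + len, n - len, "%s[%d]", sep, ops[i]->u.addr);   break;
        case OPND_LIT:  w = snprintf(buf + len, n - len, "%s#%d", sep, ops[i]->u.literal); break;
        case OPND_NONE: continue;   /* unused operand: print nothing */
        }
        len += (size_t)w;
        shown = 1;
    }
}
```

For a quadruple with opcode OP_ADD, sources r1 and literal 5, and result r2, format_quad produces the line "add r1, #5, r2".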
Decide whether your generated code is emitted directly onto a binary file or into a buffer. The file
will be read later by the simulator; reading it constitutes the load step at the start of execution. The
buffer will have to be flushed by the end of code generation, so the information is available to the
executing hardware, or the simulator. There are cases of forward references that dictate, for a
single-pass compilation process, that the object code be buffered anyway, since addresses or other
operands have to be back-patched when known at a future step. The fixed size of your code buffer will limit the
maximum amount of object code somewhat, as the distance between the first forward reference and its
final resolution is now fixed by the size of the code buffer.
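A buffered code area with back-patching can be sketched as follows. The instruction layout (an opcode number plus one target field), the size CODE_MAX, and the function names are assumptions for illustration only; a branch whose target is not yet known is emitted with -1 and fixed up later.

```c
#include <assert.h>
#include <stdio.h>

#define CODE_MAX 256            /* hypothetical fixed code buffer size */

typedef struct {
    int op;                     /* hypothetical opcode number */
    int target;                 /* branch target, or -1 while unresolved */
} Instr;

static Instr code_buf[CODE_MAX];
static int   code_len = 0;

/* Append one instruction; return its index, or -1 if the buffer is full. */
int emit(int op, int target)
{
    if (code_len >= CODE_MAX) {
        fprintf(stderr, "error: code buffer overflow\n");
        return -1;
    }
    code_buf[code_len].op = op;
    code_buf[code_len].target = target;
    return code_len++;
}

/* Resolve a forward reference: fill in the target of an earlier branch. */
void backpatch(int at, int target)
{
    assert(at >= 0 && at < code_len);
    code_buf[at].target = target;
}

/* Flush the finished buffer to the object file; returns 0 on success. */
int flush_code(FILE *out)
{
    size_t n = (size_t)code_len;
    return fwrite(code_buf, sizeof(Instr), n, out) == n ? 0 : -1;
}
```

The index returned by emit is what the compiler remembers for a forward branch, so that backpatch can be called once the target address is known.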
Statements: Each kind of high-level source statement has a natural code pattern, which when used
consistently, allows for correct code generation, no matter how complex the nesting structure of the
source program is. Use one logical module in your compiler for each unique high-level source
construct. For example, have one unique function for If Statements, another for Loop Statements, and a
third one for For Statements, etc. There will be cases in which one module generates a branch that
targets another instruction which is itself a branch, resulting in non-optimal code. Do not worry about
non-optimal code. Some object code patterns for high-level source statements are listed here:
Loop Statement, source:
    loop
        <stmt0>
    end loop;
Loop Statement object code pattern:
    l1:
        <object code for stmt0>
        j l1
If Statement, source:
    if <cond> then
        <stmt1>
    else
        <stmt2>
    end if;
If Statement object code pattern:
        <object code for cond>
        jf else
        <object code for stmt1>
        j endif
    else:
        <object code for stmt2>
    endif:
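When the compiler emits assembler text, the If Statement pattern needs unique labels so that nested statements do not clash. The sketch below assumes a recursive-descent parser that calls three hypothetical hooks around the condition, the then-part, and the else-part, and keeps the label number in a local variable of the parse function; the output buffer asm_out and all function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char   asm_out[2048];   /* hypothetical buffer for generated assembler text */
static size_t asm_len = 0;
static int    next_label = 0;  /* source of unique label numbers */

/* Account for w characters just written by snprintf. */
static void advance(int w)
{
    assert(w >= 0 && (size_t)w < sizeof asm_out - asm_len);
    asm_len += (size_t)w;
}

void emit_text(const char *text)
{
    advance(snprintf(asm_out + asm_len, sizeof asm_out - asm_len, "%s\n", text));
}

void emit_jump(const char *mnemonic, const char *label, int n)
{
    advance(snprintf(asm_out + asm_len, sizeof asm_out - asm_len,
                     "    %s %s%d\n", mnemonic, label, n));
}

void emit_label(const char *label, int n)
{
    advance(snprintf(asm_out + asm_len, sizeof asm_out - asm_len,
                     "%s%d:\n", label, n));
}

/* Call after the condition code is emitted; returns this statement's label number. */
int if_after_cond(void)
{
    int n = next_label++;
    emit_jump("jf", "else", n);
    return n;
}

/* Call after the then-part is emitted. */
void if_after_then(int n)
{
    emit_jump("j", "endif", n);
    emit_label("else", n);
}

/* Call after the else-part is emitted. */
void if_after_else(int n)
{
    emit_label("endif", n);
}
```

Because each active If Statement holds its own number, an inner statement gets else1/endif1 while the enclosing one keeps else0/endif0.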
While Statement, source:
    while <cond> loop
        <stmt>
    end loop;
While Statement object code pattern:
    w1:
        <object code for cond>
        jf endw
        <object code for stmt>
        j w1
    endw:
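If the compiler instead emits instructions into a buffer and works with numeric addresses, the While Statement pattern involves one backward branch (to w1, already known) and one forward branch (to endw, patched later). The following sketch assumes a hypothetical instruction record with a mnemonic and a target index; the hook names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    const char *op;   /* mnemonic, e.g. "j" or "jf" */
    int target;       /* instruction index, or -1 while unresolved */
} Ins;

static Ins prog[64];  /* hypothetical small code buffer */
static int pc = 0;    /* index of the next instruction to be emitted */

int emit_ins(const char *op, int target)
{
    assert(pc < 64);
    prog[pc].op = op;
    prog[pc].target = target;
    return pc++;
}

/* w1: remember where the condition code starts. */
int while_begin(void)
{
    return pc;
}

/* After the condition: emit jf with an unknown target (forward reference). */
int while_after_cond(void)
{
    return emit_ins("jf", -1);
}

/* After the body: jump back to w1, then patch jf to point past the loop (endw). */
void while_end(int top, int jf_at)
{
    emit_ins("j", top);
    prog[jf_at].target = pc;
}
```

The parse function for While Statements would keep top and jf_at in local variables, which is what makes nested loops work without any extra bookkeeping.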
L1 Case Statement: (25/225) If you implement Case Statements in L1, you can get up to 25 points of
extra credit for this homework. The extra credit is granted only after the simulator correctly
executes them.