Module #3

CS 5300
Compiler Design
The Symbol Table Generator
Fall 2003
due 10/6
During parsing, identifiers, constants, procedure names, etc. are recognized by token only. During code generation,
however, actual references will need to be resolved. Multiple references to the same variable will need to be
correlated. The scope of variables will also need to be resolved. Storage will need to be allocated to constants.
During this phase, we will generate a table where all pertinent information regarding these references will be stored.
Our symbol table will consist of five table segments (or sub-tables):
1. The ID segment is a table of identifier records. Each record contains five fields:
LEVEL NAME TYPE DIM OFFSET
LEVEL is the procedure nesting level of the declaration.
NAME is the identifier name string.
TYPE is R (real), I (integer), S (string), or P (proc). Ia, Ra, Sa refer to parameters.
DIM refers to dimensions: 0 for a scalar, or an index n (n > 0) to the dimension table entry (see below).
OFFSET is needed during memory allocation and management.
2. The second segment is the dimension table. Each entry consists of two parts: the number of dimensions followed by
a list of upper bounds. For example, an array declared as [7,6,9,5] would have the entry 4 7 6 9 5. The DIM field in
the ID segment for the array identifier would contain the index in a heap array where the entry (beginning with
the 4) is found.
3, 4, 5. The third and fourth segments are the integer and real constant tables (integer and real strings). The
last, or fifth, segment is the literal table.
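The five segments described above can be sketched as a small data structure. This is only one possible layout, not a required design. The names IdRecord, SymbolTable, and add_dims are hypothetical, as are the array name A and its nesting level. Slot 0 of the dimension heap is left unused here, on the assumption that a DIM value of 0 must always mean "scalar".

```python
from dataclasses import dataclass, field

@dataclass
class IdRecord:
    level: int       # procedure nesting level of the declaration
    name: str        # identifier name string
    type: str        # "R", "I", "S", "P", or "Ia"/"Ra"/"Sa" for parameters
    dim: int = 0     # 0 for a scalar, else index n > 0 into the dimension heap
    offset: int = 0  # filled in later, during memory allocation

@dataclass
class SymbolTable:
    ids: list = field(default_factory=list)          # segment 1
    # Segment 2. Slot 0 is a placeholder (assumption) so that DIM == 0 means scalar.
    dims: list = field(default_factory=lambda: [None])
    int_consts: list = field(default_factory=list)   # segment 3
    real_consts: list = field(default_factory=list)  # segment 4
    literals: list = field(default_factory=list)     # segment 5

    def add_dims(self, bounds):
        """Append [count, bound1, bound2, ...] and return the index of the count."""
        index = len(self.dims)
        self.dims.append(len(bounds))
        self.dims.extend(bounds)
        return index

table = SymbolTable()
dim_index = table.add_dims([7, 6, 9, 5])   # the [7,6,9,5] example from the text
table.ids.append(IdRecord(level=1, name="A", type="I", dim=dim_index))
print(table.dims[dim_index:dim_index + 5])  # [4, 7, 6, 9, 5]
```

The DIM field of the record for A holds the index at which the 4 is stored, so the bounds can be read back by taking that many entries after it.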
Many approaches to table construction are possible. One simple approach is to fill the table with semantic
actions during shift-reduce parsing. Each time a production containing a declaration is reduced, a new entry
should be made in the first symbol table segment (and in the dimension heap segment if it is an array). Each time a
production is parsed which contains a constant or literal, an entry is made in the appropriate segment. Constant
strings are simply added at the end of the respective table segments. In other words, the compiler does not attempt
to save space by merging duplicate constants. Unique identifiers should be represented in unique table entries to
allow resolution of multiple references to the same variable.
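The two insertion rules above (one entry per unique identifier, constants appended without a duplicate check) might look like the following when called from semantic actions. This is a sketch under assumptions: the function names declare and add_constant are hypothetical, record numbers are assumed to start at 1, and an identifier is treated as unique per (level, name) pair. A real compiler would probably report a redeclaration at the same level rather than silently reuse the record.

```python
ids = []         # segment 1: one record per unique (level, name) pair
int_consts = []  # segment 3: constant strings, appended without a duplicate check

def declare(level, name, type_code, dim=0):
    """Semantic action for a reduced declaration; returns a 1-based record number."""
    for number, rec in enumerate(ids, start=1):
        if rec["level"] == level and rec["name"] == name:
            return number  # assumption: reuse the record instead of reporting an error
    ids.append({"level": level, "name": name, "type": type_code,
                "dim": dim, "offset": 0})
    return len(ids)

def add_constant(segment, text):
    """Append the constant string as-is; duplicates get their own slot."""
    segment.append(text)
    return len(segment)

print(declare(1, "x", "I"))            # 1
print(declare(2, "x", "R"))            # 2: same name, different nesting level
print(add_constant(int_consts, "56"))  # 1
print(add_constant(int_consts, "56"))  # 2: the duplicate is not merged
```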
As a test of your symbol table mechanism, have your compiler resolve each reference to an identifier or
constant found in an executable statement by outputting the symbol table segment name and slot corresponding to
this identifier/constant. For example:
** reference to identifier D represented by segment #1, record 10
** reference to literal "hello" represented by segment #5, record 2
** reference to identifier x represented by segment #1, record 9
** reference to constant 56 represented by segment #3, record 16
** reference to real 3.14 represented by segment #4, record 5
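Output lines in the format shown above could be produced along these lines. The helper names find_identifier and reference_line are hypothetical, and the sample records are made up. The lookup simply takes the most recent declaration of a name, which assumes that records belonging to already-closed scopes have been removed or hidden. Full scope resolution needs more than this.

```python
# Word used in the message for each segment, taken from the examples in the text.
SEGMENT_KIND = {1: "identifier", 3: "constant", 4: "real", 5: "literal"}

def find_identifier(ids, name):
    """Return the 1-based record number of the most recent declaration, or None."""
    for number in range(len(ids), 0, -1):
        if ids[number - 1]["name"] == name:
            return number
    return None

def reference_line(segment, text, record):
    """Format one '** reference to ...' line; literals are shown in quotes."""
    shown = f'"{text}"' if segment == 5 else text
    return (f"** reference to {SEGMENT_KIND[segment]} {shown} "
            f"represented by segment #{segment}, record {record}")

# Made-up ID segment: x is declared at level 0 and again at level 1.
ids = [{"level": 0, "name": "x"},
       {"level": 1, "name": "D"},
       {"level": 1, "name": "x"}]

print(reference_line(1, "x", find_identifier(ids, "x")))
print(reference_line(5, "hello", 2))
```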
If any forward references are encountered, they may be simply noted. They will be resolved in a future
module.
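One way to "simply note" a forward reference is to record the name and source line when a lookup fails and then carry on. The sketch below assumes a flat list of declared names, and the message wording is invented for illustration. The names note_reference, forward_refs, and later_proc are hypothetical.

```python
declared = ["x", "D"]  # names already entered in the ID segment (made-up sample)
forward_refs = []      # (name, source line) pairs left for a later module

def note_reference(name, line):
    """Return the 1-based record number, or None after noting a forward reference."""
    if name in declared:
        return declared.index(name) + 1
    forward_refs.append((name, line))
    print(f"** forward reference to identifier {name} noted at line {line}")
    return None

note_reference("D", 4)           # resolves to record 2
note_reference("later_proc", 7)  # not declared yet, so it is only noted
```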
Please adequately test your compiler (to this point) using the scope-of-reference example and sufficient extra
YAL programs to demonstrate that you have adequately implemented the symbol table phase. Please print out all
tables with sufficient annotation to allow the grader to verify that they are correct.
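An annotated table listing might be generated with a small helper like the one below. The layout (a title line, a header row, and a slot number in front of each record) is only a suggestion, and dump_segment is a hypothetical name. The sample records are made up.

```python
def dump_segment(number, title, headers, rows):
    """Return the lines of one annotated segment listing, with 1-based slots."""
    lines = [f"Segment #{number}: {title}",
             "  slot  " + "  ".join(f"{h:<6}" for h in headers)]
    for slot, row in enumerate(rows, start=1):
        lines.append(f"  {slot:<4}  " + "  ".join(f"{str(v):<6}" for v in row))
    return lines

id_rows = [(1, "x", "I", 0, 0), (1, "A", "R", 1, 0)]
for line in dump_segment(1, "ID segment",
                         ["LEVEL", "NAME", "TYPE", "DIM", "OFFSET"], id_rows):
    print(line)
```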