CS 5300 Compiler Design
The Symbol Table Generator
Fall 2003, due 10/6

During parsing, identifiers, constants, procedure names, etc. are recognized by token only. During code generation, however, the actual references will need to be resolved: multiple references to the same variable must be correlated, the scope of each variable must be resolved, and storage must be allocated for constants. During this phase, we will build a table in which all pertinent information about these references is stored.

Our symbol table will consist of five table segments (sub-tables):

1. The ID segment is a table of identifier records. Each record contains five fields:

   LEVEL  NAME  TYPE  DIM  OFFSET

   LEVEL is the procedure nesting level of the declaration.
   NAME is the identifier name string.
   TYPE is R (real), I (integer), S (string), or P (procedure); Ia, Ra, and Sa refer to parameters.
   DIM is 0 for a scalar, or an index n (n > 0) to the identifier's entry in the dimension table (see below).
   OFFSET is needed during memory allocation and management.

2. The second segment is the dimension table. Each entry consists of two parts: the number of dimensions, followed by a list of upper bounds. For example, an array declared as [7,6,9,5] would have the entry 4 7 6 9 5. The DIM field in the ID segment record for that array identifier would contain the index in a heap array at which the entry (beginning with the 4) is found.

3. The third segment is the integer constant table (integer constant strings).

4. The fourth segment is the real constant table (real constant strings).

5. The fifth and last segment is the literal table.

Many approaches to table construction are possible. One simple approach is to fill the table with semantic actions during shift-reduce parsing. Each time a production containing a declaration is reduced, a new entry should be made in the first symbol table segment (and in the dimension heap segment if the identifier is an array).
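A minimal sketch of the five segments and of the declaration action follows, written in Python purely for illustration (the handout does not prescribe an implementation language). The names SymbolTable, IdRecord, and declare are hypothetical. Record numbers and DIM indices are assumed to be 1-based, so that a DIM of 0 can still mean "scalar" and every array's DIM is some n > 0 pointing at the count cell of its entry in the flat dimension heap.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IdRecord:
    level: int   # procedure nesting level of the declaration
    name: str    # identifier name string
    typ: str     # "I", "R", "S", "P", or a parameter form "Ia", "Ra", "Sa"
    dim: int     # 0 for a scalar, else 1-based index into the dimension heap
    offset: int  # placeholder; assigned later during memory allocation


@dataclass
class SymbolTable:
    ids: List[IdRecord] = field(default_factory=list)  # segment 1
    dim_heap: List[int] = field(default_factory=list)  # segment 2 (flat heap)
    ints: List[str] = field(default_factory=list)      # segment 3
    reals: List[str] = field(default_factory=list)     # segment 4
    literals: List[str] = field(default_factory=list)  # segment 5

    def declare(self, level: int, name: str, typ: str,
                bounds: Optional[List[int]] = None, offset: int = 0) -> int:
        """Semantic action for a reduced declaration (hypothetical helper).

        Appends an ID record and, for an array, a dimension-heap entry of the
        form [number of dimensions, ub1, ub2, ...]. Returns the 1-based
        record number of the new identifier.
        """
        dim = 0
        if bounds:
            # DIM points at the first cell of the entry (the dimension count).
            dim = len(self.dim_heap) + 1
            self.dim_heap.append(len(bounds))
            self.dim_heap.extend(bounds)
        self.ids.append(IdRecord(level, name, typ, dim, offset))
        return len(self.ids)


table = SymbolTable()
table.declare(0, "x", "I")                     # scalar: DIM stays 0
rec = table.declare(0, "a", "R", [7, 6, 9, 5])
print(rec, table.ids[1].dim, table.dim_heap)   # 2 1 [4, 7, 6, 9, 5]
```

Keeping the dimension table as one flat list mirrors the "heap array" described above: an entry's length is implied by its leading count, so no separate length field is needed.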
Each time a production containing a constant or literal is parsed, an entry is made in the appropriate segment. Constant strings are simply appended to the end of their respective table segments; in other words, the compiler does not attempt to save space by detecting duplicate constants. Each unique identifier, however, should be represented by its own table entry so that multiple references to the same variable can be resolved to it.

As a test of your symbol table mechanism, have your compiler resolve each reference to an identifier or constant found in an executable statement by printing the symbol table segment and slot (record number) corresponding to that identifier or constant. For example:

   ** reference to identifier D represented by segment #1, record 10
   ** reference to literal "hello" represented by segment #5, record 2
   ** reference to identifier x represented by segment #1, record 9
   ** reference to constant 56 represented by segment #3, record 16
   ** reference to real 3.14 represented by segment #4, record 5

If any forward references are encountered, they may simply be noted; they will be resolved in a future module.

Please test your compiler (to this point) adequately, using the scope-of-reference example and enough additional YAL programs to demonstrate that you have correctly implemented the symbol table phase. Print all tables with sufficient annotation to allow the grader to verify that they are correct.
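The append-only constant segments and the reference report can likewise be sketched in Python. This is an illustration under stated assumptions, not the required design: the Resolver class and its methods are hypothetical, identifier records are only counted here rather than stored, and scope resolution is done with a stack of name-to-record dictionaries (one per open procedure), so a reference binds to the innermost visible declaration and an unknown name is reported as a forward reference.

```python
from typing import Dict, List

# Labels used in the report lines; segment 1 holds identifiers.
SEGMENT_LABELS = {1: "identifier", 3: "constant", 4: "real", 5: "literal"}


class Resolver:
    """Hypothetical helper: appends constants/literals and reports references."""

    def __init__(self) -> None:
        # Segments 3, 4, 5 are append-only; duplicates are NOT merged.
        self.segments: Dict[int, List[str]] = {3: [], 4: [], 5: []}
        self.id_count = 0                          # records in segment 1
        self.scopes: List[Dict[str, int]] = [{}]   # innermost scope is last

    def enter_proc(self) -> None:
        self.scopes.append({})

    def exit_proc(self) -> None:
        # The records stay in segment 1; only their visibility ends.
        self.scopes.pop()

    def declare(self, name: str) -> int:
        """Give the declaration a new 1-based record in segment 1."""
        self.id_count += 1
        self.scopes[-1][name] = self.id_count
        return self.id_count

    def add_constant(self, segment: int, text: str) -> str:
        """Append a constant or literal, then report its new slot."""
        self.segments[segment].append(text)
        return self._report(SEGMENT_LABELS[segment], text, segment,
                            len(self.segments[segment]))

    def reference(self, name: str) -> str:
        """Resolve an identifier against the innermost enclosing scope."""
        for scope in reversed(self.scopes):
            if name in scope:
                return self._report("identifier", name, 1, scope[name])
        return f"** forward reference to identifier {name} noted"

    @staticmethod
    def _report(kind: str, text: str, segment: int, record: int) -> str:
        return (f"** reference to {kind} {text} represented by "
                f"segment #{segment}, record {record}")


r = Resolver()
r.declare("x")                    # record 1 at the outer level
r.enter_proc()
r.declare("x")                    # record 2 shadows the outer x
print(r.reference("x"))           # ... segment #1, record 2
r.exit_proc()
print(r.reference("x"))           # ... segment #1, record 1
print(r.add_constant(5, '"hello"'))
```

Because constants are appended without a duplicate check, the same constant 56 appearing twice produces two different record numbers, which matches the space-for-simplicity trade-off described above.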