• Review: syntax-directed translation.
  – Translation is done according to the parse tree.
    • Each production (when used in the parse) is a substructure of the parse tree.
• Attributes are associated with grammar symbols.
  – Each grammar symbol represents a construct of the program.
  – Attributes represent the results of translation for the construct.
    » E.g.: translation = construct the syntax tree; use a tree attribute with each symbol.
    » E.g.: translation = calculate the result of the expression; use a val attribute to represent the result.
• Semantic rules tell what to do (how to compute the related attributes) when the substructure is found.
• Two types of attributes:
  – Synthesized attribute:
    • Associated with the left-hand-side symbol of a production.
    • The value depends on the attributes associated with the symbols on the right-hand side of the production (attributes of its children in the parse tree).
  – Inherited attribute:
    • Associated with a symbol on the right-hand side of a production.
    • The value depends on the attributes of its parent or sibling nodes in the parse tree.
• Two ways to define the translation:
  – Syntax-directed definition.
    • Just define the attributes and semantic rules without specifying the order in which to evaluate the rules.
    • The order is implicit in the rules.
• To realize a general syntax-directed definition, the compiler needs to conceptually do the following:
  – Build the parse tree -> topologically sort the nodes based on the implicit order -> evaluate the attributes.
  – This is not efficient if it actually has to be done.
• Some special definitions can be implemented efficiently without actually building the parse tree:
  – S-attributed definitions.
  – L-attributed definitions.
• Two ways to define the translation (continued):
  – Syntax-directed translation scheme.
    • Not only define the attributes and the semantic rules, but also specify the order in which the semantic rules should be applied.
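The "build the parse tree -> topologically sort -> evaluate" recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: the Node class, the helper names, and the hand-built tree for 3 * 5 + 4 are hypothetical and not part of the slides, and only a synthesized val attribute is handled, so each node depends on its children alone.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Node:
    """Parse-tree node (hypothetical helper) with one synthesized attribute."""

    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol
        self.children = list(children)
        self.lexval = lexval   # set only on digit tokens
        self.val = None        # synthesized attribute, filled in by evaluate()


def dependency_graph(root):
    """Map every node to the nodes its attribute depends on (its children)."""
    graph = {}
    pending = [root]
    while pending:
        node = pending.pop()
        graph[node] = set(node.children)
        pending.extend(node.children)
    return graph


def apply_rule(node):
    """Apply the semantic rule of the production used at this node."""
    kids = node.children
    if not kids:                       # terminals carry no val attribute
        return
    if len(kids) == 1:                 # E -> T, T -> F, F -> digit
        child = kids[0]
        node.val = child.lexval if child.symbol == "digit" else child.val
    elif kids[0].symbol == "(":        # F -> ( E )
        node.val = kids[1].val
    elif kids[1].symbol == "+":        # E -> E1 + T
        node.val = kids[0].val + kids[2].val
    elif kids[1].symbol == "*":        # T -> T1 * F
        node.val = kids[0].val * kids[2].val


def evaluate(root):
    """Evaluate attributes in a topological order of the dependency graph."""
    order = TopologicalSorter(dependency_graph(root)).static_order()
    for node in order:                 # dependencies always come first
        apply_rule(node)
    return root.val


def example_tree():
    """Hand-built parse tree for 3 * 5 + 4 (an assumed input)."""
    f3 = Node("F", [Node("digit", lexval=3)])
    f5 = Node("F", [Node("digit", lexval=5)])
    f4 = Node("F", [Node("digit", lexval=4)])
    t35 = Node("T", [Node("T", [f3]), Node("*"), f5])
    return Node("E", [Node("E", [t35]), Node("+"), Node("T", [f4])])


if __name__ == "__main__":
    print(evaluate(example_tree()))
```

Because every dependency here runs from child to parent, any bottom-up order would do; that is the property that lets an S-attributed definition be evaluated during LR parsing without building the tree at all.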
• Realizing an S-attributed definition in an LR parser:
  – Extend the stack to have an additional field (val) for the S-attribute.

      parser stack (symbol, state)    val
      …                               …
      (X, sx)                         X.x
      (Y, sy)                         Y.y
      (Z, sz)                         Z.z     <- top
      …                               …

• Realizing an S-attributed definition in an LR parser (Example 5.17 on page 296):
  – The syntax-directed definition:

      L -> E n        {print(E.val)}
      E -> E1 + T     {E.val = E1.val + T.val}
      E -> T          {E.val = T.val}
      T -> T1 * F     {T.val = T1.val * F.val}
      T -> F          {T.val = F.val}
      F -> ( E )      {F.val = E.val}
      F -> digit      {F.val = digit.lexval}

  – The same rules written against the val stack (top is the stack top just before the reduction; n occupies the top slot in the first rule, so E.val is one below it; productions with no action leave the value where it already is):

      L -> E n        {print(val[top-1]);}
      E -> E1 + T     {val[top-2] = val[top-2] + val[top];}
      E -> T
      T -> T1 * F     {val[top-2] = val[top-2] * val[top];}
      T -> F
      F -> ( E )      {val[top-2] = val[top-1];}
      F -> digit

• YACC allows only synthesized attributes.
  – It can also handle special types of L-attributes.
    • An attribute can depend on the attributes of the siblings to its left.
  – Those attributes are already on the stack. How to access them: $i with i <= 0. See the example yacc_inherit.y.
  – Using this is somewhat tricky: you need to make sure the context of a production is exactly the same everywhere the production is used.
    » Markers are needed in many cases.
  – Alternatively, pass the attributes through global variables. This is also tricky.
• Static checking and symbol table
  • Chapter 6, Section 7.6 and Section 8.2.
• Static checking: check at compile time whether the program follows both the syntactic and the semantic conventions (as opposed to dynamic checking, which checks at run time).
• Examples of static checks:
  – Type checks:
      int a, b[10], c; … a = b + c;
  – Flow-of-control checks:
      main() { int i; …. i++; break; }
  – Uniqueness check:
      main() { int i, j; double i, j; …. }
  – Defined before use:
      main() { int i; i1 = 0; …. }
  – Name-related check:
      LOOPA: LOOP EXIT WHEN I=N I=I+1; END LOOP LOOPB;
  – Some checks can only be done at run time:
    • Array-bounds checking in Java: a[i] = 0;
  – To perform static checks, semantic information must be recorded somewhere: the symbol table.
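As a rough illustration of why static checks need a symbol table, the sketch below runs the uniqueness and defined-before-use checks over a flat list of statements. The tuple encoding of statements, the function name static_check, and the error messages are assumptions made for this example; a real compiler would do these checks inside semantic actions while parsing.

```python
def static_check(statements):
    """Return a list of error messages for a single, unnested scope.

    Each statement is a tuple (hypothetical encoding):
      ("decl", name, type)  declares a variable
      ("use", name)         reads or assigns a variable
    """
    table = {}      # the symbol table: name -> type
    errors = []
    for kind, name, *rest in statements:
        if kind == "decl":
            if name in table:
                # uniqueness check: a name may be declared only once
                errors.append(f"redeclaration of {name}")
            else:
                table[name] = rest[0]
        elif kind == "use":
            if name not in table:
                # defined-before-use check
                errors.append(f"{name} used before declaration")
    return errors


if __name__ == "__main__":
    # int i, j; double i, j;
    print(static_check([
        ("decl", "i", "int"), ("decl", "j", "int"),
        ("decl", "i", "double"), ("decl", "j", "double"),
    ]))
    # int i; i1 = 0;
    print(static_check([("decl", "i", "int"), ("use", "i1")]))
```

Both checks reduce to a lookup in the table at the moment the name is seen, which is why fast lookup is the basic requirement on the table's organization.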
• The grammar specifies the syntax; additional (semantic) information, sometimes called attributes, must be recorded in the symbol table for all identifiers.
• Typically, the attributes in a symbol table entry include the type and the offset (where in memory can this variable be found?).
  – struct {int id; int type; int offset;} stentry;
• Organization of a symbol table:
  – Basic requirement: must be able to find the information associated with a symbol (identifier) quickly.
  – Examples: array, linked list, hash table.
  – Provides two functions: enter(table, name, type, offset) and lookup(name).
  – Dealing with nested scopes:

      program sort(input, output)
        var a: array [0..10] of integer;
            x: integer;
        procedure readarray
          var x: real;
          begin …. x …. end
        procedure quicksort(i, j)
          begin … x … end
        begin … x … end

      main() {
        int a, b;
        a = 0;
        { int a; a = 1; }
        printf("a = %d\n", a);
      }

  – How should the symbol table be organized?
  – How should lookup and enter work?
    • One symbol table for each scope (procedures, blocks)?
    • Maintain a stack of symbol tables for lookup/enter.
• Symbol tables for sort:
  [Figure: the symbol table for sort has a header (marked nil) and entries a, x, readarray and quicksort; the symbol table for readarray has its own header and an entry x; the symbol table for quicksort has its own header.]
• How does the compiler create the symbol table?
  – First, consider the simple case: no nested scopes, with everything entered into one symbol table, table, by using
    • enter(table, id, type, offset)
  – Grammar:

      P -> D
      D -> D ; D
      D -> id : T
      T -> integer
      T -> real
      T -> array [num] of T
      T -> ^T

  – Example declarations and the resulting table:

      I : array [10] of integer; j : real; k : integer

      name   type                 offset
      I      array(10, integer)   0
      j      real                 40
      k      integer              48

  – Translation scheme:

      P -> {offset = 0;} D
      D -> D ; D
      D -> id : T               {enter(table, id.name, T.type, offset); offset = offset + T.width;}
      T -> integer              {T.type = integer; T.width = 4;}
      T -> real                 {T.type = real; T.width = 8;}
      T -> array [num] of T1    {T.type = array(num.val, T1.type); T.width = num.val * T1.width;}
      T -> ^T1                  {T.type = pointer(T1.type); T.width = 4;}

  – Now consider the case of nested procedures (blocks can be treated as special procedures):
    • A stack of symbol tables must be maintained; a new table is created when entering a new procedure.
    • The offset must be reset when entering a new procedure (a stack of offsets).
    • Let us also compute the total size of each table.
  – Grammar:

      P -> D
      D -> D ; D
      D -> id : T
      D -> proc id ; D ; S
      T -> integer | real | array [num] of T | ^T

    • mktable(previous): make a new table, properly setting all links and related information.
    • enter(table, name, type, offset): add an entry for name to table.
    • addwidth(table, width): record the total memory needed by the entries of the symbol table.
    • enterproc(table, name, newtable): enter the procedure name, together with its symbol table, into the enclosing table.
  – Translation scheme:

      P -> {t = mktable(nil); push(t, tblptr); push(0, offset);}
           D {addwidth(top(tblptr), top(offset));}
      D -> D ; D
      D -> id : T  {enter(top(tblptr), id.name, T.type, top(offset)); top(offset) = top(offset) + T.width;}
      D -> proc id ; {t = mktable(top(tblptr)); push(t, tblptr); push(0, offset);}
           D ; S   {t = top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t);}

• Dealing with structures (records):
  – T -> record D end
  – Make a new symbol table for all the fields in the record.

      T -> record  {t = mktable(nil); push(t, tblptr); push(0, offset);}
           D end   {T.type = record(top(tblptr)); T.width = top(offset); pop(tblptr); pop(offset);}

• Question: how does allowing variable declarations anywhere in a program (as in C++ and Java) affect the maintenance of the symbol tables?
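The nested-procedure scheme with its two stacks (tblptr and offset) can be mimicked outside a parser to see what the tables end up containing. The sketch below is an assumption-laden illustration, not the course's implementation: declarations arrive as an already-parsed nested list (a hypothetical encoding), the SymbolTable class and the lookup helper are invented for the example, and the widths 4 and 8 are the ones used in the translation scheme above.

```python
class SymbolTable:
    """One table per scope; 'previous' links to the enclosing scope's table."""

    def __init__(self, previous=None):
        self.previous = previous
        self.entries = {}   # name -> (type, offset) or ("proc", table)
        self.width = 0      # total width, recorded by addwidth


def mktable(previous):
    return SymbolTable(previous)


def enter(table, name, type_, offset):
    table.entries[name] = (type_, offset)


def addwidth(table, width):
    table.width = width


def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)


def lookup(table, name):
    """Search the current scope first, then the enclosing scopes."""
    while table is not None:
        if name in table.entries:
            return table.entries[name]
        table = table.previous
    return None


def width_of(type_):
    """Widths from the scheme: integer 4, real 8, pointer 4, array num * elem."""
    if type_ == "integer":
        return 4
    if type_ == "real":
        return 8
    if type_[0] == "pointer":
        return 4
    _, num, elem = type_            # ("array", num, element_type)
    return num * width_of(elem)


def process(decls, tblptr, offset):
    for decl in decls:
        if decl[0] == "var":        # D -> id : T
            _, name, type_ = decl
            enter(tblptr[-1], name, type_, offset[-1])
            offset[-1] += width_of(type_)
        else:                       # D -> proc id ; D ; S
            _, name, nested = decl
            tblptr.append(mktable(tblptr[-1]))
            offset.append(0)
            process(nested, tblptr, offset)
            t = tblptr.pop()
            addwidth(t, offset.pop())
            enterproc(tblptr[-1], name, t)


def translate(decls):               # P -> D
    tblptr, offset = [mktable(None)], [0]
    process(decls, tblptr, offset)
    addwidth(tblptr[-1], offset[-1])
    return tblptr[-1]


# Declarations loosely modelled on the sort program (parameters omitted).
SORT_DECLS = [
    ("var", "a", ("array", 11, "integer")),
    ("var", "x", "integer"),
    ("proc", "readarray", [("var", "x", "real")]),
    ("proc", "quicksort", []),
]
```

Calling translate(SORT_DECLS) returns the table for sort; looking up x starting from the readarray table finds the local real x, while the same lookup starting from the quicksort table falls through to the integer x declared in sort.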