Semantic Analysis The Structure of a compiler Character Stream Lexical Analyzer Target Code Token Stream M/c dependent Code Optimizer Syntax Analyzer Syntax tree Target Code Semantic Analyzer Code Generator Syntax tree IR Intermediate Code Generator Front End March 10 & 21, 2014 IR M/c Independent Code Optimizer Vandana, BITS Pilani, CSF363/IS F342 Back End 2 Syntax Directed Definition • The grammar and the set of semantic rules constitute the syntax directed definition. • A syntax-directed translation is used to define the translation of a sequence of tokens to some other value, based on a CFG for the input. • An information is associated as attributes to the grammar symbols • A syntax directed definition (SDD) specifies the values of attributes associating semantic rules with the grammar productions March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 3 Examples • Type of identifiers • Computation of expressions based on static values – X = 37 + 49*2 – 20; • Intermediate code generation • Construction of abstract syntax tree March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 4 Some definitions • Annotated Parse Trees A parse tree showing the attribute values at each node is called an annotated parse tree. • Synthesized Attributes An attribute is said to be synthesized if its value at a parse tree node is determined from attribute values at the children of the node • Inherited attributes An inherited attribute is one whose value at a node in a parse tree is defined in terms of attributes at the parent and/or siblings of that node. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 5 Synthesized Attributes • Synthesized attributes are attributes that are computed purely bottom-up • Such a grammar plus semantic actions is called an S-attributed definition March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 6 Understanding Translation: syntax directed definition for evaluating an expression Production Semantic rule 1 expr exprL + term expr.t:= exprL.t + term.t 2 expr exprL - term expr.t:= exprL.t - term.t 3 expr term expr.t := term.t 4 termdigit term.t := digitval (=getValue(ST, digit)) Example expr expr expr expr term digit - + - term term digit digit term digit expr.t = Example expr -3+9 = 6 expr.t = expr -2 - 1=-3 expr.t = expr2 2-4=– expr.t = expr 2 term.t term= 2 digit - term.t = 4 term digit + term.t term = 1 digit term.t = 9term digit Computing The Translation To compute the s ynthesized attributed translation of a string, • Build the parse tree, • Use the translation rules to compute the translation of each nonterminal in the tree, bottom-up • The translation of the string is the translation of the root nonterminal. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 10 Example : Compute the type of an expression • Compute the type of an expression that includes both arithmetic and boolean operators. (The type is INT, BOOL, or ERROR.) • First define CFG Rules • Determine meaning (semantic) of each rule March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 11 Context free grammar rules with SDD 1. exp -> exp + exp if ((exp2.t == INT) and (exp3.t == INT) then exp1.t = INT else exp1.t = ERROR 2. exp -> exp and exp if ((exp2.t == BOOL) and (exp3.t ==BOOL) then exp1.t = BOOL else exp1.t = ERROR 3. exp -> true exp.t = BOOL March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 12 Context free grammar rules with SDD 4. exp -> false exp.t = BOOL 5. exp -> int exp.t = INT 6. exp -> ( exp ) 7. exp -> exp == exp March 10 & 21, 2014 exp1.t = exp2.t if ((exp2.t == exp3.t) and (exp2.t != ERROR)) then exp1.t = BOOL else exp1.t = ERROR Vandana, BITS Pilani, CSF363/IS F342 13 Generate annotated parse tree for (2+2)==4 exp exp.t=BOOL exp == exp.t=INT ( exp exp.t=INT 2 March 10 & 21, 2014 exp exp.t=INT + ) exp exp.t=INT 2 Vandana, BITS Pilani, CSF363/IS F342 exp exp.t=INT 4 annotated Parse Tree Parse Tree 14 Translation Schemes • A translation scheme is a context free grammar embedded with semantic actions • Semantic actions are the program fragments embedded within the right sides of productions March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 15 Annotated Parse Tree • A parse tree for an S-attributed definition can always be annotated by evaluating the semantic rules for the attributes at each node bottom up, from the leaves to the root. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 16 Example of annotated parse tree for (2+2)==4 exp.t=BOOL == exp.t=INT (exp.t=INT) exp.t=INT 2 March 10 & 21, 2014 + exp.t=INT 4 exp.t=INT 2 Vandana, BITS Pilani, CSF363/IS F342 annotated Parse Tree 17 How to update the symbol table to keep the record of the type of the variables Lexeme Token Int INT num1 id num2 id char CHAR c id type ……… …… Use of inherited attributes evaluates the type Lexeme Token type Int INT integer num1 id integer num2 id integer char CHAR character c id character ……… …… Define Grammar rules for declaration construct (sample) 1. 2. 3. 4. 5. DTL Tint Tchar LL,id Lid March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 20 Attributes for the nonterminals • Terminals={int, char, id, ‘,’} • Non terminals={D,T,L} • Attribute for T:type (T.type represents the type of the declared variables) • Attribute for L: in (L.in represents the inherited attribute information) March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 21 Grammar rules and associated semantic rules 1. 2. 3. 4. 5. DTL Tint Tchar LL1,id Lid March 10 & 21, 2014 1. 2. 3. 4. L.in=T.type T.type=integer T.type=character L1.in=L.in addtype(id.entry,L.in) 5. addtype(id.entry,L.in) Vandana, BITS Pilani, CSF363/IS F342 22 Parse tree for int id,id D L T int L , id id March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 23 Annotated Parse tree for INT id,id D L L.in=integer T.type=integer T int L L.in=integer , id addtype(id.entry,L.in) id Symbol table is updated for both tokens ‘id’ with type of these as integers March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 24 Attributes dependency • Inherited attributes are convenient for expressing the dependence of a programming language construct on the context in which it appears. • If an attribute b at a node in a parse tree depends on an attribute c, then the semantic rule for b at that node must be evaluated after the semantic rule that defines c. • Dependency graph depicts the interdependencies among the inherited and synthesized attributes at a node March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 25 Dependency of attribute • expr.t depends on expr1.t and term.t (refer slide 5 , expression grammar ) • This dependency can be represented as expr.t=f(expr1.t,term.t) • In general, b=f(c1, c2 , c3 , .. , ck) where attribute b depends on attributes c1, c2 , c3 ,.. , ck March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 26 Dependency Graph • The graph has a node for each attribute and an edge to the node for b from the node for c if attribute b depends on attribute c. b c1 March 10 & 21, 2014 c2 c3 Vandana, BITS Pilani, CSF363/IS F342 ck 27 Algorithm to construct Dependency Graph for each node n in the parse tree do for each attribute a of the grammar symbol at n do construct a node in the dependency graph for a; for each node n in the parse tree do for each semantic rule b:= f(c1, c2,……..,ck) associated with the production used at n do for i:=1 to k do construct an edge from the node for ci to the node for b; March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 28 Evaluation Order • A topological sort of a directed acyclic graph is any ordering m1, m2,….mk of the node of the graph such that if mimj is an edge from mi to mj, then mi appears before mj in the ordering. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 29 Topological Sort • Any topological sort of a dependency graph gives a valid order in which the semantic rules associated with the nodes in a parse tree can be evaluated. • The dependent attributes c1, c2,……..,ck in a semantic rule b:= f(c1, c2,……..,ck) are available at a node before f is evaluated. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 30 Topological Sort 7 2 4 1 5 6 9 Directed graph 8 3 edge(i,j) indicates the dependency of node j on i March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 31 Topological Sort 7 2 4 1 5 6 9 Directed graph 8 3 Partial topological order: 1,2,3,4,5,6,8,7,9 March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 32 Semantic Analyzer • The principal job of the semantic analyzer is to enforce static semantic rules • Constructs a syntax tree (usually first) • Information gathered is then needed by the code generator • An attribute's value at a given node depends on those of other predecessor or successor nodes March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 33 Semantic Analyzer • The principal job of the semantic analyzer is to enforce static semantic rules • Constructs a syntax tree (usually first) • Information gathered is then needed by the code generator • An attribute's value at a given node depends on those of other predecessor or successor nodes March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 34 Methods for Evaluating Semantic Rules • Parse Tree Methods Most Flexible But fail if a cycle exists in the dependency graph • Rule Based Methods Analyze rules at compiler-generation time Determine a static ordering at that time Evaluate nodes in that order at compile time • Oblivious Methods Ignore the parse tree and grammar Choose a convenient order and use it March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 35 Parse-tree methods 1. Build the parse tree 2. Build the Abstract Syntax Tree (AST) to get attribute dependence 3. Build the dependency graph 4. Topological sort the graph 5. Evaluate it March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 36 Multi-Pass Approach • Separate AST construction from semantic checking phase • Traverse the AST and perform semantic checks (or other actions) only after the tree has been built and its structure is stable • This approach is less error-prone and is better when efficiency is not a critical issue • Attribute evaluation proceeds as tree-walk of the AST March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 37 Examples of semantic rules Variables must be defined before being used A variable should not be defined multiple times In an assignment stmt, the variable and the expression must have the same type The test expression of an ‘if statement’ must have boolean type March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 38 Abstract Syntax Trees • An Abstract Syntax Tree is a condensed form of parse tree. • It is useful for representing language constructs • The keywords and operators do not appear as the leaves • The syntax directed translation can be based on AST as well as the parse trees • The attributes can be attached to the nodes in similar way as is done in parse trees March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 39 Example Parse tree and an AST E E + E + E - E 7 5 5 7 2 2 Parse Tree March 10 & 21, 2014 - Abstract Syntax Tree Vandana, BITS Pilani, CSF363/IS F342 40 Construction of the Syntax Tree • Similar to the translation of the infix expression into postfix form • A node is created for each operator and operand • The sub-expressions are represented by sub trees whose roots are the children of the operator nodes March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 41 Construction of the Syntax Tree + id - To entry for b num id To entry for a 4 Syntax tree for a-4+b March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 42 Syntax Directed Definition To Construct Syntax Tree Production Semantic rules EE1+T E.nptr:=makenode(‘+’,E1.nptr,T.nptr) EE1-T E.nptr:=makenode(‘-’,E1.nptr,T.nptr) ET E.nptr:=T.nptr T(E) T.nptr:= E.nptr Tid T.nptr:=makeleaf(id,id.entry) Tnum T.nptr:=makeleaf(num,num.val) Rule no Constructing syntax tree x3120 NULL + x3048 E NULL x3048 E NULL T NULL x3048 + T NULL x3120 Semantic rules 1 EE1+T E.nptr:=makenode(‘+’,E1.nptr,T.nptr) 2 EE1-T E.nptr:=makenode(‘-’,E1.nptr,T.nptr) 3 ET E.nptr:=T.nptr 4 T(E) T.nptr:= E.nptr 5 Tid T.nptr:=makeleaf(id,id.entry) 6 Tnum T.nptr:=makeleaf(num,num.val) x3120 x3072 ( NULL E- x3096 NULL ) x3048 IDid Production E x3072 NULL T x3072 NULL - x3096 NULL T NUM num x3096 IDid x3072 • Evaluated Bottom up while generating the parser • The parser keeps the values of S-attributes along with the grammar symbols on its stack Class Assignment: Design Semantic Rules for AST <declarationStmt>===> <type> <var_list> SEMICOLON <var_list>===> ID <more_ids> <more_ids>===> COMMA <var_list> | <type>===> INT | REAL | STRING | MATRIX Input : string str1, s2; • • Construct the parse tree. Identify the information to retain and the one to throw out. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 45 <declarationStmt> <type> SEMICOLON <varList> STRING <moreIds> ID <varList> COMMA ID <moreIds> Parse tree EPSILON March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 46 <declarationStmt> <type> SEMICOLON <varList> STRING <moreIds> ID <varList> COMMA ID Important information and structure must be retained March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 <moreIds> EPSILON 47 <declarationStmt> <type> SEMICOLON <varList> STRING <moreIds> ID <varList> COMMA ID Important information and structure must be retained March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 <moreIds> EPSILON 48 <declarationStmt> STRING ID ID The subtree is traversed now to fetch type info from the extreme left node. Concrete Subtree March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 49 <declarationStmt> <type> STRING SEMICOLON <varList> <moreIds> ID <varList> COMMA ID <moreIds> Important information: Links Associate rules to each production to construct AST March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 EPSILON 50 Semantic Rules for AST <declarationStmt><type> <var_list> SEMICOLON <declarationStmt>.leftChild = <type>.address <declarationStmt>.rightChild = <var_list>.address <var_list> ID <more_ids> <var_list>.leftChild = ID.address <var_list>.rightChild = <more_ids>.address <more_ids> COMMA <var_list> <more_ids>.address = <var_list>.address March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 51 Attributes for different semantic activity • • • • AST Construction: links or addresses Type checking: type Expression evaluation: value Intermediate Code generation: Three address code • Code generation: sequence of lines of assembly code March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 52 Another Example of semantic rules • <assignmentStmt_type1>===> <leftHandSide_singleVar> ASSIGNOP <rightHandSide_type1> SEMICOLON • <leftHandSide_singleVar> ===> ID • <rightHandSide_type1> ===> <arithmeticExpression> | <sizeExpression> |<funCallStmt> • <sizeExpression> ==> SIZE ID March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 53 1. Preserve valuable information 2. Preserve structure AST ASSIGNOP ID SIZE ID Class Assignment: Design AST rules March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 54 AST vs Parse Tree • • • • • • • • AST is condensed form of a parse tree Operators appear at internal nodes, not at leaves. Chains of single productions are collapsed. Lists are "flattened“ into nodes with arbitrary number of children Omits details of concrete syntax (parenthesis, commas, semicolons etc.) AST is a better structure for later compiler stages Contains information about essential structure of the program Source code is only a linearization of the AST Source : google search March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 55 Semantic Analysis : other applications • Three Address Code The general form is x = y op z • x,y,and z are names, constants, compilergenerated temporaries • op stands for any operator such as +,-,… • (x*5)-y is translated as t1 = x * 5 t2 = t1 - y March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 56 Syntax tree vs. Three address code Expression: (A+B*C) + (-B*A) - B _ + B + * _ * A B C A B T1 := B * C T2 = A + T1 T3 = - B T4 = T3 * A T5 = T2 + T4 T6 = T5 – B Three address code is a linearized representation of a syntax tree in which explicit names correspond to the interior nodes of the graph. March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 57 Semantic Analysis Applications: a list • Generating Intermediate Representation (AST, Intermediate Code etc.) • Collecting type information • Type checking and error reporting • Semantic errors reporting for undeclared variables • Collecting Scope information • Expression Evaluation • Generating Code March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 58 Symbol table • Symbol Table is populated with type and scope information, which in turn associates the offset for each identifier and is required at run time for allocating memory to the identifiers March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 59 More on Semantic Analysis • Type checking, intermediate representation and code generation will be discussed later March 10 & 21, 2014 Vandana, BITS Pilani, CSF363/IS F342 60