ACSC373 – Compiler Writing
Chapter 6 – Syntax and Semantic Analysis
Dr. Stephania Loizidou Himona

The syntax analyser: the backbone of a compiler

It receives sequences of tokens from the lexical analyser and attempts to group these tokens to form syntactic structures as defined by the grammar of the language. The syntactic structures can then be used to generate low-level object code.

Another classification for grammars and parsers, apart from the Chomsky hierarchy (Type 2, the context-free grammars, are of interest here):

LL(K) grammars

A grammar is LL(K) if a top-down parser can be written for it which can decide which production to apply at any stage by simply examining at most the next K symbols of the input.

  LL: the first L stands for the left-to-right reading of the input.
  LL(1): the simplest but most important group. Only the next symbol of the input is examined, i.e. one-symbol lookahead.

e.g. productions of the form

  A → aB | bC

can be handled by an LL(1) parser:

  if the next symbol is 'a', apply production A → aB
  if the next symbol is 'b', apply production A → bC

Parsers for this type of grammar are efficient: no backtracking [single pass].

The introduction of extra non-terminals and productions in the grammar, as in

  A → X | Y
  X → aB
  Y → bC

has no effect on the grammar's classification, i.e. it is still LL(1).

Another example,

  A → aB | aC

cannot be handled by an LL(1) parser: both alternatives begin with 'a', so one symbol of lookahead cannot choose between them.

LR(K) grammars

A grammar is LR(K) if a bottom-up parser can be written for that grammar which makes a single left-to-right pass over the input while examining at most the next K symbols of the input. This is a bottom-up technique.

Top-down Parsing

Starts by trying to recognise the starting symbol of the grammar and works downwards until it reaches the level where it matches the terminal symbols.

General approach, e.g.

  S → AB
  A → C | D
  B → aD | cD

S is the starting symbol: recognise A in the input sentence, then recognise B in the remainder of the input sentence. To recognise A, use the second production, i.e. recognise C or D. To recognise B, recognise aD or cD.

Each non-terminal is recognised by a procedure, e.g. A → BC gives

  procedure A;
  begin
    B;
    C
  end;

Left Recursion

Left recursion should be avoided, e.g.
  A → Ba | a
  B → AB | b

i.e. the procedure A starts by calling procedure B. Procedure B then starts by calling procedure A, so the parser recurses without ever consuming any input.

You can rewrite grammars to eliminate left recursion.

Elimination of left recursion:

  E → E + T | T

can be transformed to

  E → TZ
  Z → +TZ | ε

or, in EBNF,

  E → T {+T}

Left factoring:

  A → αβ | αγ      (α ≠ null)

can be rewritten as

  A → αA'
  A' → β | γ

Example

Consider the following grammar that defines the syntax of an <assignment>:

  <assignment> → <identifier> = <expression> ;
  <expression> → <expression> + <term> | <term>
  <term>       → <identifier> | ( <expression> )
  <identifier> → x | y | z

The 2nd production (above) is left recursive, so transform it to

  <expression> → <term> {+ <term>}

A recursive parser:

Assume the existence of NextToken and error procedures.
  NextToken reads the next character token and places it in the variable token (one-character lookahead).
  error produces error messages.

  var token : char;

  procedure assignment;

    procedure expression; forward;

    procedure identifier;
    begin
      if token in ['x', 'y', 'z'] then NextToken
      else error('Identifier expected')
    end;

    procedure term;
    begin
      if token = '(' then
      begin
        NextToken;
        expression;
        if token <> ')' then error(') expected')
        else NextToken
      end
      else identifier
    end;

    procedure expression;
    begin
      term;
      while token = '+' do   (* the {+ <term>} repetition *)
      begin
        NextToken;
        term
      end
    end;

  begin (* body of assignment *)
    identifier;
    if token <> '=' then error('= expected')
    else
    begin
      NextToken;
      expression;
      if token <> ';' then error('; expected')
      else NextToken
    end
  end;

Similarly (covered somewhere else!):
  - LL(K) parsing: top-down parsers
  - LR(K) parsing: bottom-up parsing

Semantic Analysis

Lexical and syntax analysers are not concerned with the meaning or semantics of the programs they process. Once the analysis of the source program is complete, the synthesis of the object program can start, and this is where considerations of semantics become important.

Semantic analyser, e.g.
for Pascal:
  - Evaluation procedure for expressions by determining the type attributes of the components
  - Selecting appropriate forms of the operators
  - Issuing error messages if operands are incompatible, etc.
  - Ensuring that all context-sensitive rules of the language are upheld (using the symbol table 'belonging' to the semantic analyser)

The semantic analyser has to perform two distinguishable processes:
  1. Flatten the tree (i.e. the parse tree)
  2. Cope with type information

e.g. translation of an arithmetic expression:

  i + j - k * r      (i, j, k integer; r real)

Type checking is done by consulting the symbol table:
  - Types of all variables are inserted into the tree
  - Types of intermediate results are derived, checking for compatibility, i.e. integer * real → real

e.g. i div (j - k * r) → type conflict: div cannot handle an integer and a real argument.

The symbol table

Table in which symbols such as identifiers are stored and associated with other information such as their type, location, scope and so on. Fast access to this table is important (many references).

Symbol table contents
  1. TYPE: a name may refer to a constant, a variable, a type, a procedure or a function, e.g.
       if variable: variable's name
       if constant: value and type
  2. If the name refers to an object that can exist at run-time, such as a variable, then some means of identifying the name's run-time location must be included.
  3. If the name refers to a subprogram, then some means of identifying the starting address of the subprogram must be included.
  OR, store the source line numbers of each name's declaration and its uses.
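
The type-checking step and the symbol table contents described above can be sketched as follows. This is only an illustrative sketch, written in Python rather than the Pascal used for the parser, and it assumes a dictionary as the symbol table and nested tuples as the expression tree. The names SymbolEntry, SemanticError, result_type and check are hypothetical and not part of the notes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SymbolEntry:
    """One symbol table entry (hypothetical layout, for illustration only)."""
    kind: str                           # 'variable', 'constant', 'procedure', ...
    type_name: str                      # 'integer' or 'real' in this sketch
    address: Optional[int] = None       # run-time location or start address
    declared_at: Optional[int] = None   # source line of the declaration
    used_at: list = field(default_factory=list)  # source lines of uses


# A dictionary (hash table) gives the fast access the notes ask for.
symbols = {
    "i": SymbolEntry("variable", "integer", address=0, declared_at=1),
    "j": SymbolEntry("variable", "integer", address=1, declared_at=1),
    "k": SymbolEntry("variable", "integer", address=2, declared_at=1),
    "r": SymbolEntry("variable", "real", address=3, declared_at=2),
}


class SemanticError(Exception):
    """Raised for undeclared names and type conflicts."""


def result_type(op, left, right):
    """Return the type of 'left op right', or raise on a type conflict."""
    if op == "div":
        # div is only defined for two integer operands.
        if left == right == "integer":
            return "integer"
        raise SemanticError(f"type conflict: div applied to {left} and {right}")
    if op in ("+", "-", "*"):
        # Mixed integer/real arithmetic yields real.
        return "integer" if left == right == "integer" else "real"
    raise SemanticError(f"unknown operator {op}")


def check(node):
    """Type-check a tree node: an identifier or an (op, left, right) tuple."""
    if isinstance(node, str):
        if node not in symbols:
            raise SemanticError(f"undeclared identifier {node}")
        return symbols[node].type_name
    op, left, right = node
    return result_type(op, check(left), check(right))


# i + j - k * r, with the usual precedence: (i + j) - (k * r)
print(check(("-", ("+", "i", "j"), ("*", "k", "r"))))  # real

# i div (j - k * r) is rejected because the right operand is real
try:
    check(("div", "i", ("-", "j", ("*", "k", "r"))))
except SemanticError as err:
    print(err)
```

Each call to check returns the type of an intermediate result, so a conflict is reported at the innermost operator whose operands are incompatible. A fuller analyser would also record each use of a name in used_at and would select the integer or real form of each operator.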