Chapter 4  Syntax Analysis
Yu-Chen Kuo

4.1 The Role of The Parser
• A parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• We expect the parser to report any syntax errors in an intelligible fashion. It should also recover from commonly occurring errors so that it can continue processing the remainder of its input.

Three Types of Parsers
1. Universal methods (the CYK algorithm and Earley's algorithm): too inefficient to use in production compilers
2. Top-down methods
3. Bottom-up methods

Syntax Error Handling
• Lexical error: misspelling an identifier, keyword, or operator
• Syntactic error: an arithmetic expression with unbalanced parentheses
• Semantic error: an operator applied to an incompatible operand
• Logical error: an infinitely recursive call

Syntax Error Handling (Cont.)
• The error handler in a parser has simple-to-state goals:
  – It should report the presence of errors clearly and accurately
  – It should recover from each error quickly enough to be able to detect subsequent errors
  – It should not significantly slow down the processing of correct programs

Error-Recovery Strategies
• Panic mode
  – Discard input symbols until one of a designated set of synchronizing tokens is found
  – Synchronizing tokens: e.g., ; or end
  – Guaranteed not to go into an infinite loop
• Phrase level
  – The parser may perform local correction: replace a prefix of the remaining input by some string that allows parsing to continue
  – e.g., replace , by ; delete an extraneous ; or insert a missing ;
  – May lead to an infinite loop if we always insert something on the input ahead of the current input symbol

Error-Recovery Strategies (cont.)
• Error productions
  – Augment the grammar with productions that generate common erroneous constructs
• Global correction
  – Given an incorrect input string x and grammar G, find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible
  – Too costly in practice

4.2 Context-Free Grammars
• stmt → if expr then stmt else stmt
1. Terminals: tokens
   • if, then, else
2. Nonterminals: denote sets of strings
   • expr, stmt
3. Start symbol
   • stmt
4. Productions

Example 4.2
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → -
op → *
op → /
• Terminals: id, +, -, *, /
• Nonterminals: expr, op
• Start symbol: expr

Notational Conventions
1. These symbols are terminals:
   i) Lower-case letters early in the alphabet: a, b, c
   ii) Operator symbols: +, -, etc.
   iii) Punctuation symbols: parentheses, comma, etc.
   iv) Digits: 0, 1, …, 9
   v) Boldface strings: id, if
2. These symbols are nonterminals:
   i) Upper-case letters early in the alphabet: A, B, C
   ii) The letter S: the start symbol
   iii) Lower-case italic names: expr, stmt

Notational Conventions (cont.)
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols (terminals or nonterminals)
4. Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals
5. Lower-case Greek letters, such as α, β, γ, represent strings of grammar symbols
6. A-productions (all productions with A on the left): A → α1 | α2 | … | αk
7. Start symbol: the left side of the first production

Example 4.3
E → E A E | (E) | - E | id
A → + | - | * | /
By the notational conventions:
– Nonterminals: E, A
– Terminals: the remaining symbols

Derivations
E → E+E | E*E | (E) | - E | id
• E derives -E: E ⇒ -E
• The derivation of -(id) from E: E ⇒ -E ⇒ -(E) ⇒ -(id)
• αAβ ⇒ αγβ, if A → γ : one-step derivation
• ⇒* : zero or more derivation steps
• ⇒+ : one or more derivation steps

Derivations (cont.)
• α ⇒* α, for any string α
• If α ⇒* β and β ⇒ γ, then α ⇒* γ
• L(G) denotes the language generated by G; the strings of L(G) contain only terminal symbols
• w ∈ L(G) if S ⇒+ w
• String w is called a sentence of G.
• If S ⇒* α, α may contain nonterminals; we call α a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar, because E ⇒* -(id + id)

Leftmost & Rightmost Derivations
• Leftmost derivation (⇒lm):
  E ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm):
  E ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, we call α a left-sentential form of G.
• If S ⇒*rm α, we call α a right-sentential (canonical) form of G.

Parse Tree and Derivations
Parse Tree and Derivations (cont.)

Ambiguity
• More than one parse tree for some sentence
• More than one leftmost derivation for some sentence
• More than one rightmost derivation for some sentence

4.3 Regular Expression vs. Context-Free Grammar
• Every language that can be described by a regular expression can also be described by a context-free grammar
  – (a|b)*abb
  – A0 → a A0 | b A0 | a A1
    A1 → b A2
    A2 → b A3
    A3 → ε
• Every regular set is a context-free language

Why use regular expressions to define the lexical syntax of a language?
• Why not use a CFG for the lexical syntax?
1. Lexical rules of a language are frequently quite simple; we do not need a powerful grammar.
2. Regular expressions provide a more concise and easier-to-understand notation for tokens.
3. An efficient lexical analyzer can be constructed automatically from regular expressions.
4. It cleanly separates the syntactic structure of a language into lexical and nonlexical parts.

Why use regular expressions to define the lexical syntax of a language? (cont.)
• Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords.
• Grammars are most useful for describing nested structures such as balanced parentheses, matching begin-end's, and corresponding if-then-else's.
• Nested structures cannot be described by regular expressions.
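As a sketch (not from the slides), the nesting limitation can be made concrete in a few lines of Python: the balanced-parentheses language generated by S → (S)S | ε is recognized by a trivial recursive routine, while a regular expression would have to "count" unbounded nesting depth and therefore cannot express it.

```python
# Recognizing the balanced-parentheses language S -> (S)S | eps with a
# recursive routine.  No regular expression can do this, because it would
# need to count arbitrarily deep nesting.

def parse_S(s, i=0):
    """Try to match S -> (S)S | eps starting at position i.
    Returns the position after the match (eps if no '(' can be matched)."""
    if i < len(s) and s[i] == '(':
        j = parse_S(s, i + 1)          # inner S
        if j < len(s) and s[j] == ')':
            return parse_S(s, j + 1)   # trailing S
        return i                       # unmatched '(' -> match eps instead
    return i                           # eps

def balanced(s):
    return parse_S(s) == len(s)

print(balanced("(()())"))   # True
print(balanced("(()"))      # False
```

The recursion mirrors the grammar directly: one recursive call per occurrence of S on the right side of the production.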
Verifying the Language Generated by a Grammar
• Proof that L(G) = L:
  – Every string generated by G is in L
  – Every string in L can be generated by G
• S → (S)S | ε generates all strings of balanced parentheses
  – Every sentence derived from S is balanced (by induction on the number of derivation steps):
    • S ⇒ (S)S ⇒* (x)S ⇒* (x)y  (n steps)
    • S ⇒* x  (fewer than n steps, so x is balanced)
    • S ⇒* y  (fewer than n steps, so y is balanced)
  – Every balanced string of length 2n is derivable from S:
    • w = (x)y of length 2n
    • x and y have length less than 2n; they are both balanced and derivable from S
    • S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w

Eliminating Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Eliminating Ambiguity (cont.)
• Disambiguating rule: match each else with the closest previous unmatched then
• The statement between a then and an else must be matched
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

Eliminating Immediate Left Recursion
• A grammar is left recursive if it has a derivation A ⇒+ Aα
• Top-down parsing methods cannot handle left-recursive grammars, because top-down parsing corresponds to the leftmost derivation.
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
becomes
A  → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε

Eliminating Immediate Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda

Eliminating General Left Recursion
• Input: grammar G with no cycles (A ⇒+ A) and no ε-productions

Eliminating General Left Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
Substituting the S-productions into the A-productions gives
A → Ac | Aad | bd | ε
and eliminating the immediate left recursion gives
S  → Aa | b
A  → bdA' | A'
A' → cA' | adA' | ε

Left Factoring
• When it is not clear which of two alternative productions to use to expand a nonterminal A, we rewrite the A-productions to defer the decision until we have seen enough of the input.
stmt → if expr then stmt
     | if expr then stmt else stmt
becomes
stmt → if expr then stmt S'
S'   → else stmt | ε
• In general,
A → αβ1 | αβ2 | … | αβn | γ
becomes
A  → αA' | γ
A' → β1 | β2 | … | βn

Non-Context-Free Language Constructs
• L1 = {wcw | w is in (a|b)*} is not context-free
• L1' = {wcw^R | w is in (a|b)*} is context-free
  – S → aSa | bSb | c
• L2 = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1} is not context-free
• L2' = {a^n b^m c^m d^n | n ≥ 1, m ≥ 1} is context-free
  – S → aSd | aAd
    A → bAc | bc
• L2'' = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} is context-free

Non-Context-Free Language Constructs (cont.)
• L3 = {a^n b^n c^n | n ≥ 0} is not context-free
• L3' = {a^n b^n | n ≥ 1} is context-free
  – S → aSb | ab
• A context-free grammar can keep count of two items but not three.
• A regular expression cannot keep count at all.

Top-Down Parsing
• Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.
• It constructs a parse tree for the input string starting from the root, creating the nodes of the parse tree in preorder.

Recursive Descent Parsing
• A general top-down parsing technique that may involve backtracking
• E.g., S → cAd, A → ab | a, with input w = cad

Predictive Parsers
• By carefully writing a grammar, eliminating left recursion, and left factoring, we may obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking: a predictive parser.
• A predictive parser can be implemented by recursive procedures.

Predictive Parsers (cont.)
type → simple | ↑id | array [simple] of type
simple → integer | char | num dotdot num

Transition Diagrams for Predictive Parsers
• We can create a transition diagram for a predictive parser.
• For each nonterminal A:
  1. Create an initial and a final state
  2. For each production A → X1X2…Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn
• Based on the transition diagram, match terminals against the lookahead input symbols.

Transition Diagrams for Predictive Parsers (cont.)
Transition Diagrams for Predictive Parsers (cont.)

Nonrecursive Predictive Parsing
• It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The key problem during predictive parsing is determining the production to be applied for a nonterminal. The nonrecursive parser looks up the production to be applied in a parsing table.

Nonrecursive Predictive Parsing (Cont.)
• The parser has an input buffer, a stack, a parsing table, and an output stream.
• The input buffer contains the string to be parsed followed by $, a symbol used to indicate the end of the input string.
• The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack. Initially, the stack contains the start symbol S of the grammar on top of $.

Nonrecursive Predictive Parsing (Cont.)
• The output stream shows the derivation steps by which the grammar produces the input string.
• The parsing table is a two-dimensional array M[A, a] giving the stack action to take when nonterminal A is on top of the stack and the current input is terminal a or the symbol $.

Predictive Parsing Algorithm
• Input: a string w and a parsing table M for G
• Output: a leftmost derivation of w, if w ∈ L(G)
• Method:
  – Put $S on the stack, where S is the start symbol of G
  – Put w$ in the input buffer
  – Execute the predictive parsing program (Fig. 4.14)

Predictive Parsing Program

Example
• Consider the non-left-recursive grammar for arithmetic expressions:
E  → TE'
E' → +TE' | ε
T  → FT'
T' → *FT' | ε
F  → (E) | id

Example (parsing table M)
Example (Stack Moves)

FIRST and FOLLOW
• The construction of a predictive parser is aided by the FIRST and FOLLOW functions.
• These functions help us to construct the predictive parsing table.
• The FOLLOW function can also be used to provide synchronizing tokens during panic-mode error recovery.

FIRST function
• If α is a string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α.
• If α ⇒* ε, then ε ∈ FIRST(α).

FIRST Sets
• Compute FIRST(X) for all grammar symbols X by applying the following rules until no terminal or ε can be added to any FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε, then ε ∈ FIRST(X).
3. If X → Y1Y2…Yk, then a ∈ FIRST(X) if a ∈ FIRST(Yi) and ε is in each of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1), i.e., Y1Y2…Yi-1 ⇒* ε.

FIRST Sets (cont.)
3. (cont.) If ε ∈ FIRST(Yj) for all j = 1, 2, …, k, then ε ∈ FIRST(X).
• Everything in FIRST(Y1) is also in FIRST(X). If Y1 does not derive ε, nothing more is added to FIRST(X); otherwise, we add FIRST(Y2), and so on.
• For a string X1X2…Xn:
  FIRST(X1) - {ε} ⊆ FIRST(X1X2…Xn);
  FIRST(X2) - {ε} ⊆ FIRST(X1X2…Xn) if ε ∈ FIRST(X1), and so on;
  ε ∈ FIRST(X1X2…Xn) if ε ∈ FIRST(Xi) for all i.

FOLLOW function
• Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form.
• If S ⇒* αAaβ, then a ∈ FOLLOW(A).
• If A can be the rightmost symbol in some sentential form, then $ ∈ FOLLOW(A).

FOLLOW Sets
• Compute FOLLOW(A) for every nonterminal A by applying the following rules until nothing can be added to any FOLLOW set:
1. If S is the start symbol, $ ∈ FOLLOW(S).
2. If A → αBβ, then FIRST(β) - {ε} ⊆ FOLLOW(B).
3. If A → αB, or A → αBβ with ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(B). (Note that FOLLOW(B) need not be a subset of FOLLOW(A).)

Example
E  → TE'
E' → +TE' | ε
T  → FT'
T' → *FT' | ε
F  → (E) | id
• FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
• FIRST(E') = {+, ε}
• FIRST(T') = {*, ε}
• FOLLOW(E) = FOLLOW(E') = {), $}
• FOLLOW(T) = FOLLOW(T') = {+, ), $}
• FOLLOW(F) = {+, *, ), $}

Construction of Predictive Parsing Table
• Suppose A → α and a ∈ FIRST(α). Then the parser will expand A by α when the current input symbol is a.
• If A → α, where α = ε or α ⇒* ε, then the parser will expand A by α when the input symbol is in FOLLOW(A), or when $ on the input has been reached and $ ∈ FOLLOW(A).

Construction of Predictive Parsing Table (Algorithm)
• Input: grammar G.
• Output: parsing table M.
• Method:
1. For each production A → α, do steps 2 and 3.
2. For each terminal a ∈ FIRST(α), add A → α to M[A, a].
3. If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A). If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error.

Example

LL(1)
• A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
• The first "L" means scanning the input from left to right.
• The second "L" means producing a leftmost derivation.
• And "1" means using one input symbol of lookahead at each step.

Example (multiply-defined entry)
S  → iEtSS' | a   (ambiguous)
S' → eS | ε
E  → b
FIRST(S) = {i, a}, FIRST(S') = {e, ε}
FOLLOW(S) = {e, $}, FOLLOW(S') = {e, $}

LL(1) Properties
• No ambiguous or left-recursive grammar can be LL(1).
• A grammar G is LL(1) if and only if, for each pair of distinct productions A → α | β, the following conditions hold:
1. FIRST(α) ∩ FIRST(β) = ∅.
2. At most one of α ⇒* ε and β ⇒* ε.
3. If β ⇒* ε, then FIRST(α) ∩ FOLLOW(A) = ∅.
• The if-then-else statement violates condition 3, so it is not LL(1).

Error Recovery in Predictive Parsing
• In nonrecursive predictive parsing, an error is detected in one of the following two situations:
1. When the terminal on top of the stack does not match the next input symbol
2. When nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A, a] is empty

Error Recovery in Predictive Parsing (cont.)
• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
• Its effectiveness depends on the choice of the synchronizing set.
• Some heuristics are as follows.

Error Recovery in Predictive Parsing (cont.)
1.
We place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
2. There is a hierarchical structure on constructs in a language; e.g., expressions appear within statements, statements within blocks, and so on. We can add to the synchronizing set of a lower construct the symbols that begin higher constructs.
3. If we add symbols in FIRST(A) to the synchronizing set of nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed.

Error Recovery in Predictive Parsing (cont.)
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.

Example
• Add "sync" entries to indicate synchronizing tokens obtained from the FOLLOW sets.

Example
• If M[A, a] is empty, skip a.
• If M[A, a] = sync, pop A.
• If a token on top of the stack does not match the input, pop it.

Bottom-Up Parsing
• Shift-reduce parsing is a general style of bottom-up parsing.
• It attempts to construct a parse tree for an input string beginning at the leaves and working up towards the root.
• At each reduction step, a particular substring matching the right side of a production is replaced by the nonterminal on the left side of that production.

Bottom-Up Parsing (cont.)
• If the substring is chosen correctly at each reduction step, a rightmost derivation is traced out in reverse.
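As an illustration (a brute-force sketch, not how real parsers work), the reduction process can be phrased as a search: repeatedly replace some substring that matches the right side of a production by its left side, and ask whether some sequence of such reductions reaches the start symbol. The toy grammar below is the one used in the slides' example.

```python
# Brute-force search for a reduction sequence (a sketch; real shift-reduce
# parsers use tables to pick the handle deterministically).
# Toy grammar from the slides: S -> aABe, A -> Abc | b, B -> d

PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def reducible_to_start(s, start="S", seen=None):
    """True iff some sequence of reductions rewrites s to the start symbol."""
    if seen is None:
        seen = set()
    if s == start:
        return True
    if s in seen:          # already fully explored (or on the current path)
        return False
    seen.add(s)
    for lhs, rhs in PRODUCTIONS:
        i = s.find(rhs)
        while i != -1:     # try reducing at every occurrence of rhs
            if reducible_to_start(s[:i] + lhs + s[i + len(rhs):], start, seen):
                return True
            i = s.find(rhs, i + 1)
    return False

print(reducible_to_start("abbcde"))  # True:  abbcde -> aAbcde -> aAde -> aABe -> S
print(reducible_to_start("abcde"))   # False: not a sentence of the grammar
```

The search terminates because every reduction either shortens the string or replaces a terminal by a nonterminal, and the `seen` set prunes repeated strings.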
Example
• Consider the following grammar:
S → aABe
A → Abc | b
B → d
• The sentence "a b b c d e" can be reduced to S by the following reduction steps:
1. a b b c d e   (reduce by A → b; handle at position 2)
2. a A b c d e   (reduce by A → Abc)
3. a A d e       (reduce by B → d)
4. a A B e       (reduce by S → aABe)
5. S
• The reductions trace out the following rightmost derivation in reverse:
S ⇒rm a A B e ⇒rm a A d e ⇒rm a A b c d e ⇒rm a b b c d e

Handles
• Informally, a handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.
• Formally, a handle of a right-sentential form γ is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.

Handles (cont.)
• If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw.
• Note:
1. The string w to the right of a handle contains only terminal symbols.
2. If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle; otherwise, some right-sentential forms may have more than one handle.

Example
• Consider the following ambiguous grammar:
E → E+E | E*E | (E) | id
• Two rightmost derivations of id1+id2*id3:
1. E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3
   – id1 is a handle of the right-sentential form id1+id2*id3
   – With E → id, replacing id1 by E gives E+id2*id3
2. E ⇒ E*E ⇒ E*id3 ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3
• So id1+id2*id3 has two possible handles.

Handles (cont.)
• The handle represents the leftmost complete subtree consisting of a node and all its children.
• Reducing β to A in αβw can be thought of as "pruning the handle", i.e., removing the children of A from the parse tree.
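Handle pruning is usually driven by a stack of grammar symbols. As a sketch (an assumption-laden toy, since choosing between shift and reduce really requires the LR machinery), the loop below simply tries every reduction whose right side sits on top of the stack, falls back to shifting, and backtracks on failure:

```python
# Shift-reduce with an explicit stack, choosing moves by backtracking
# search (a sketch; a real parser consults an LR table instead).
# Grammar from the slides: S -> aABe, A -> Abc | b, B -> d

PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def parse(stack, rest):
    """Return the list of moves reaching (stack='S', input empty), or None."""
    if stack == "S" and rest == "":
        return []
    # try every reduction whose right side is on top of the stack
    for lhs, rhs in PRODUCTIONS:
        if stack.endswith(rhs):
            sub = parse(stack[:-len(rhs)] + lhs, rest)
            if sub is not None:
                return ["reduce %s -> %s" % (lhs, rhs)] + sub
    # otherwise try shifting the next input symbol
    if rest:
        sub = parse(stack + rest[0], rest[1:])
        if sub is not None:
            return ["shift " + rest[0]] + sub
    return None

for move in parse("", "abbcde"):
    print(move)
```

The printed move sequence reproduces the reduction steps of the example above: six shifts interleaved with the reductions A → b, A → Abc, B → d, and finally S → aABe.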
Stack Implementation of Shift-Reduce Parsing
• We implement shift-reduce parsing by using a stack to hold grammar symbols and an input buffer to hold the input string w.
• We use $ to mark the bottom of the stack and the end of the input buffer:
  STACK: $        INPUT: w$

Stack Implementation of Shift-Reduce Parsing (cont.)
• The parser shifts input symbols onto the stack until a handle β is on top of the stack.
• It then reduces β to the left side of a production A → β.
• It repeats this cycle until an error occurs, or until the stack contains S and the input buffer is empty:
  STACK: $S       INPUT: $

Example

Conflicts during Shift-Reduce Parsing
• There are context-free grammars for which shift-reduce parsing cannot be used.
• It is possible to reach a configuration in which, even knowing the entire stack contents and the next input symbol, we cannot decide whether to shift or to reduce (a shift/reduce conflict), or which of several reductions to make (a reduce/reduce conflict).

Example of Shift/Reduce Conflict
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
STACK: $ … if expr then stmt      INPUT: else …$
• Note that if we resolve the conflict in favor of shifting, the parser will behave naturally.

Example of Reduce/Reduce Conflict
(1) stmt → id (parameter_list)
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id (expr_list)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr
STACK: $ … id ( id      INPUT: , id ) …$

LR Parsers
• LR(k) parsing is an efficient, bottom-up parsing technique.
• The "L" stands for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the "k" for the number of input symbols of lookahead used in making parsing decisions.
• When (k) is omitted, k is assumed to be 1.

LR Parsers (cont.)
• LR parsing can be used to parse a larger class of context-free grammars than LL parsing.
• The principal drawback of LR parsing is that it is too much work to construct an LR parser by hand for a typical programming-language grammar.
• We need a specialized tool, an LR parser generator. Fortunately, many such generators are available.

The LR Parsing Algorithm
• The schematic form of an LR parser:

The LR Parsing Algorithm (cont.)
• An LR parser consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The driver program is the same for all LR parsers.
• The parsing table changes from one parser to another.
• The parsing program reads tokens from an input buffer one at a time.

The LR Parsing Algorithm (cont.)
• The parser uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state symbol. Each state symbol summarizes the information contained in the stack below it.
• The combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce parsing decision.

The LR Parsing Algorithm (cont.)
• The parsing table consists of two parts: a parsing action function and a goto function.
• An action table entry can have one of four values:
1. shift s, where s is a state
2. reduce by a grammar production A → β
3. accept
4. error
• The goto function takes a state and a grammar symbol as arguments and produces a state.

The LR Parsing Algorithm (cont.)
• A configuration of an LR parser is a pair whose first component is the stack contents and whose second component is the unexpended input:
  (s0X1s1X2s2…Xmsm , ai ai+1…an$)
• The next move of the parser is determined by reading ai, the current input symbol, and sm, the state on top of the stack, and then consulting the parsing action table entry action[sm, ai].

The LR Parsing Algorithm (cont.)
• The configurations resulting from each of the four types of move are as follows:
1. If action[sm, ai] = shift s, the parser executes a shift move, entering the configuration
   (s0X1s1X2s2…Xmsm ai s , ai+1…an$)
   Here the parser has shifted both the current input symbol ai and the next state s, which is given in action[sm, ai], onto the stack; ai+1 becomes the current input symbol.

The LR Parsing Algorithm (cont.)
2. If action[sm, ai] = reduce A → β, the parser executes a reduce move, entering the configuration
   (s0X1s1X2s2…Xm-r sm-r A s , ai ai+1…an$)
   where s = goto[sm-r, A] and r is the length of β. Here the parser first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm-r. The parser then pushed both A and s, the entry for goto[sm-r, A], onto the stack.

The LR Parsing Algorithm (cont.)
3. If action[sm, ai] = accept, parsing is completed.
4. If action[sm, ai] = error, the parser has discovered an error and calls an error-recovery routine.

The LR Parsing Program

Example
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id

Example (cont.)
Example (cont.)  See p. 220

Constructing LR Parsing Tables
• There are three methods for constructing an LR parsing table for a grammar:
(1) Simple LR (SLR) is the easiest to implement, but the least powerful.
(2) Canonical LR is the most powerful, and the most expensive.
(3) Lookahead LR (LALR) is intermediate in power and cost.
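The driver loop described above (shift, reduce, accept, error) is short enough to sketch directly. The grammar and tables below are an assumption made for illustration: a tiny hand-built SLR table for S → (S) | x, not the expression grammar of the slides; for brevity the stack holds only states, since each state determines the grammar symbol that led to it.

```python
# A sketch of the table-driven LR driver loop.  Grammar (chosen for
# illustration): 1: S -> (S)   2: S -> x.  ACTION/GOTO were built by hand.

PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}   # prod number -> (lhs, len(rhs))
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    stack = [0]                                  # state stack only
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        move = ACTION.get((stack[-1], tokens[i]))
        if move is None:
            return False                         # error entry
        kind, arg = move
        if kind == "shift":
            stack.append(arg)
            i += 1
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - rhs_len:]     # pop r states
            stack.append(GOTO[(stack[-1], lhs)]) # push goto[s_{m-r}, A]
        else:                                    # accept
            return True

print(lr_parse("(x)"))    # True
print(lr_parse("(x"))     # False
```

Note how a reduce move pops exactly r states and then consults GOTO with the newly exposed state, mirroring move type 2 above.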
Constructing SLR Parsing Tables
1) Compute FOLLOW(A) for every nonterminal A in G.
2) Form the augmented grammar G'.
3) Construct C, the canonical collection of sets of LR(0) items.
4) Draw the transition diagram for viable prefixes.
5) Fill in the parsing table (the action and goto functions).

Example
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id

Step 1: FOLLOW Sets for Nonterminals
• FOLLOW(E) = {+, ), $}
• FOLLOW(T) = FOLLOW(F) = {+, *, ), $}

Step 2: The Augmented Grammar
• If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and production S' → S.
• The augmented grammar is as follows:
E' → E
E  → E+T | T
T  → T*F | F
F  → (E) | id

Step 3: Sets of LR(0) Items
• An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the right side.
• The production A → XYZ yields the four items
  A → •XYZ
  A → X•YZ
  A → XY•Z
  A → XYZ•
• The production A → ε generates only one item, A → •

The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of items constructed from I by the following two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production in G, then add the item B → •γ to closure(I), if it is not already there.
• We apply the second rule until no more new items can be added to closure(I).

Example
• If I is the set containing the one item {[E' → •E]}, then closure(I) contains the items
E' → •E
E  → •E+T
E  → •T
T  → •T*F
T  → •F
F  → •(E)
F  → •id

The Goto Operation
• If I is a set of items and X is a grammar symbol, then goto(I, X) is the closure of the set of all items [A → αX•β] such that [A → α•Xβ] is in I.
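The closure and goto rules above translate almost line-for-line into code. As a sketch (representing an item as the triple (lhs, rhs, dot-position)), the following computes I0 and goto(I0, E) for the augmented expression grammar of this example:

```python
# Closure and goto on LR(0) items, for the augmented expression grammar.
# An item [A -> alpha . beta] is the tuple (A, rhs, dot_position).

GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            # rule 2: dot before a nonterminal B adds B -> .gamma
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for l2, r2 in GRAMMAR:
                    if l2 == rhs[dot] and (l2, r2, 0) not in result:
                        result.add((l2, r2, 0))
                        changed = True
    return result

def goto(items, x):
    # move the dot over x, then take the closure
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == x}
    return closure(moved)

i0 = closure({("E'", ("E",), 0)})
print(len(i0))            # 7 items, matching I0 on the slides
print(len(goto(i0, "E"))) # 2 items: [E' -> E.] and [E -> E.+T], i.e. I1
```

Repeatedly applying goto to every new set produced is exactly the sets-of-items construction that follows.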
The Goto Operation
• If I is the set of items {[E' → E•], [E → E•+T]}, then goto(I, +) consists of
E → E+•T
T → •T*F
T → •F
F → •(E)
F → •id

The Sets-of-Items Construction
• The algorithm to construct C, the canonical collection of sets of LR(0) items for an augmented grammar G', is shown below.

Example
• closure({[E' → •E]}) = I0:
  E' → •E
  E  → •E+T
  E  → •T
  T  → •T*F
  T  → •F
  F  → •(E)
  F  → •id
• goto(I0, E) = I1:
  E' → E•
  E  → E•+T

Example (cont.)
• goto(I0, T) = I2:
  E → T•
  T → T•*F
• goto(I0, F) = I3:
  T → F•
• goto(I0, () = I4:
  F → (•E)
  E → •E+T
  E → •T
  T → •T*F
  T → •F
  F → •(E)
  F → •id
• goto(I0, id) = I5:
  F → id•

Example (cont.)
• goto(I1, +) = I6:
  E → E+•T
  T → •T*F
  T → •F
  F → •(E)
  F → •id
• goto(I2, *) = I7:
  T → T*•F
  F → •(E)
  F → •id

Example (cont.)
• goto(I4, E) = I8:
  F → (E•)
  E → E•+T
• goto(I4, T) = I2
• goto(I4, F) = I3
• goto(I4, () = I4
• goto(I4, id) = I5
• goto(I6, T) = I9:
  E → E+T•
  T → T•*F

Example (cont.)
• goto(I6, F) = I3
• goto(I6, () = I4
• goto(I6, id) = I5
• goto(I7, F) = I10:
  T → T*F•
• goto(I7, () = I4
• goto(I7, id) = I5
• goto(I8, )) = I11:
  F → (E)•
• goto(I8, +) = I6
• goto(I9, *) = I7

Step 4: The Transition Diagram
• The goto functions for the canonical collection of sets of items can be shown as a transition diagram.

Step 4: The Transition Diagram (cont.)
Step 4: The Transition Diagram (cont.)

Step 5: The Parsing Table
• State i is constructed from Ii.
1) The parsing actions for state i are determined as follows:
a) If [A → α•aβ] is in Ii and goto(Ii, a) = Ij, set action[i, a] to "shift j". Here a must be a terminal.
b) If [A → α•] is in Ii, set action[i, a] to "reduce A → α" for all a in FOLLOW(A). Here A must not be S'.
c) If [S' → S•] is in Ii, set action[i, $] to "accept".

Step 5: The Parsing Table (cont.)
2) The goto transitions for state i are constructed for all nonterminals A using the rule:
• If goto(Ii, A) = Ij, then goto[i, A] = j.
3) All entries not defined by rules 1) and 2) are set to "error".
4) The start state of the parser is the one constructed from the set of items containing [S' → •S].

Step 5: The Parsing Table (cont.)

Example of an Unambiguous Grammar that is not SLR(1)
S' → S
S  → L = R
S  → R
L  → *R
L  → id
R  → L

Example of an Unambiguous Grammar that is not SLR(1) (cont.)

Example of an Unambiguous Grammar that is not SLR(1) (cont.)
• Consider the set of items I2:
  I2: S → L•=R
      R → L•
  – action[2, =] = "shift 6"
  – FOLLOW(R) contains =, so also set action[2, =] = "reduce R → L"
• A shift/reduce conflict, although the grammar is not ambiguous.
• In fact, no right-sentential form begins with R = …; the reduction is wrong when the viable prefix is L alone (as opposed to *L).
• Remedy: split states according to the actual lookaheads rather than the full FOLLOW sets.

Constructing Canonical LR Parsing Tables
• An LR(1) item is of the form [A → α•β, a], where A → αβ is a production and a is a terminal or $.
• The "1" refers to the length of the second component, called the lookahead of the item.
• The lookahead has no effect in an item of the form [A → α•β, a] where β is not ε, but an item of the form [A → α•, a] calls for a reduction by A → α only if the next input symbol is a.

Constructing Canonical LR Parsing Tables (cont.)
• Thus, we are compelled to reduce by A → α only on those input symbols a for which [A → α•, a] is an LR(1) item in the state on top of the stack.
• The set of such a's will always be a subset of FOLLOW(A), but it could be a proper subset.
• The method for constructing the collection of sets of LR(1) items is essentially the same as the way we built the collection of sets of LR(0) items; we need only modify the two procedures closure and goto.

Constructing Canonical LR Parsing Tables (closure function)
Constructing Canonical LR Parsing Tables (item & goto)

Example (cont.)
• Consider the following augmented grammar:
S' → S
S  → CC
C  → cC | d
• closure({[S' → •S, $]}) = I0:
  S' → •S, $
  S  → •CC, $
  C  → •cC, c/d
  C  → •d, c/d
• goto(I0, S) = I1:
  S' → S•, $

Example (cont.)
• goto(I0, C) = I2:
  S → C•C, $
  C → •cC, $
  C → •d, $
• goto(I0, c) = I3:
  C → c•C, c/d
  C → •cC, c/d
  C → •d, c/d
• goto(I0, d) = I4:
  C → d•, c/d
• goto(I2, C) = I5:
  S → CC•, $

Example (cont.)
• goto(I2, c) = I6:
  C → c•C, $
  C → •cC, $
  C → •d, $
• goto(I2, d) = I7:
  C → d•, $
• goto(I3, C) = I8:
  C → cC•, c/d
• goto(I3, c) = I3
• goto(I3, d) = I4
• goto(I6, C) = I9:
  C → cC•, $
• goto(I6, c) = I6
• goto(I6, d) = I7

Example (Transition Diagram)
• Compare I6 and I3: they contain the same LR(0) cores but different lookaheads.

Example (Transition Diagram)
Example (Parsing Table)

Canonical LR Parser vs. SLR Parser
• Every SLR(1) grammar is an LR(1) grammar.
• A canonical LR parser (built from an LR(1) grammar) may have more states than an SLR parser (built from an SLR(1) grammar) for the same grammar.
• Exercise: check whether the following grammar is an LR(1) grammar:
S' → S
S  → L = R | R
L  → *R | id
R  → L
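As a closing sketch (an illustration, not the book's pseudocode), the modified LR(1) closure rule — for an item [A → α•Bβ, a], add [B → •γ, b] for every terminal b in FIRST(βa) — can be coded directly for the S → CC grammar above; the computed I0 carries exactly the c/d and $ lookaheads shown on the slides.

```python
# LR(1) closure for the grammar  S' -> S,  S -> CC,  C -> cC | d.
# An item [A -> alpha . beta, a] is the tuple (A, rhs, dot, lookahead).

GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of_sequence(seq):
    """FIRST of a string of grammar symbols (this grammar has no eps-productions)."""
    sym = seq[0]
    if sym not in NONTERMINALS:
        return {sym}                     # terminal or $
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == sym:
            result |= first_of_sequence(rhs)
    return result

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot, look in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                # lookaheads for the new items: FIRST(beta a)
                lookaheads = first_of_sequence(rhs[dot + 1:] + (look,))
                for l2, r2 in GRAMMAR:
                    if l2 == rhs[dot]:
                        for b in lookaheads:
                            if (l2, r2, 0, b) not in result:
                                result.add((l2, r2, 0, b))
                                changed = True
    return result

i0 = closure({("S'", ("S",), 0, "$")})
for item in sorted(i0):
    print(item)
# I0 holds [S -> .CC, $] and the C-items with lookaheads c and d,
# matching the slides.
```

The goto operation is unchanged except that lookaheads are carried along with each item when the dot advances.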