Compiler Construction

Chapter 4: Top-Down Parsing

Objectives of Top-Down Parsing

- An attempt to find a leftmost derivation for an input string.
- An attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.

(Figure: a chain of leftmost derivation steps, each written =>lm, ending in the input string.)

Approaches of Top-Down Parsing

1. With backtracking (making repeated scans of the input; a general form of top-down parsing)

Method: create a procedure for each nonterminal.

e.g.
    S -> cAd
    A -> ab | a

    L = { cabd, cad }          input: c a d

    S( ) {
        if input-symbol == 'c' {
            Advance();
            if A()
                if input-symbol == 'd' {
                    Advance();
                    return true;
                }
        }
        return false;
    }

    A( ) {
        isave = input-pointer;              /* remember where A started */
        if input-symbol == 'a' {            /* try A -> ab */
            Advance();
            if input-symbol == 'b' {
                Advance();
                return true;
            }
        }
        input-pointer = isave;              /* backtrack, then try A -> a */
        if input-symbol == 'a' {
            Advance();
            return true;
        }
        else return false;
    }

Problems for top-down parsing with backtracking:

(1) Left recursion (can cause a top-down parser to go into an infinite loop).
    Def. A grammar is said to be left-recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some α.
(2) Backtracking: the parser must undo not only the movement of the input pointer but also any semantic actions, such as entries made in the symbol table.
(3) The order in which the alternatives are tried. (For the grammar shown above, try w = cabd with A -> a applied first: A succeeds after consuming only a, S then fails to match d against b, and cabd is rejected unless A is re-invoked with its other alternative.)

Elimination of Left-Recursion

With immediate left recursion:

    A -> Aα | β

transform into

    A  -> βA'
    A' -> αA' | ε

(Figure: the left-recursive parse tree deriving β α ... α and the equivalent right-recursive tree built from A'.)

e.g.
    E -> E + T | T
    T -> T * F | F
    F -> (E) | id

After transformation:

    E  -> TE'
    E' -> +TE' | ε
    T  -> FT'
    T' -> *FT' | ε
    F  -> (E) | id

General form (with left recursion):

    A -> Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm

After transformation:

    A  -> β1A' | β2A' | ... | βmA'
    A' -> α1A' | α2A' | ... | αnA' | ε

What about left recursion that arises through a derivation of two or more steps?
e.g.,
    S -> Aa | b
    A -> Ac | Sd | e

where S => Aa => Sda.

Algorithm: Eliminating left recursion

Input: a context-free grammar G with no cycles (i.e., no derivation A =>+ A) and no ε-productions.

Method:
1. Arrange the nonterminals in some order A1, A2, ..., An.
2. for i = 1 to n do {
       for j = 1 to i-1 do
           replace each production of the form Ai -> Aj γ by the productions
           Ai -> δ1 γ | δ2 γ | ... | δk γ,
           where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions;
       eliminate the immediate left recursion among the Ai-productions;
   }

An Example

e.g.
    S -> Aa | b
    A -> Ac | Sd | e

Step 1: ==> S -> Aa | b                  (no immediate left recursion among the S-productions)
Step 2: ==> A -> Ac | Aad | bd | e       (S replaced in A -> Sd)
Step 3: ==> A  -> bdA' | eA'             (immediate left recursion eliminated)
            A' -> cA' | adA' | ε

2. Non-backtracking (recursive-descent) parsing

Recursive descent: use a collection of mutually recursive routines to perform the syntax analysis.

Left Factoring:

    A -> αβ1 | αβ2   ==>   A  -> αA'
                           A' -> β1 | β2

Method:
1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all the A-productions
       A -> αβ1 | αβ2 | ... | αβn | others
   by
       A  -> αA' | others
       A' -> β1 | β2 | ... | βn
2. Repeat the transformation until no two alternatives for a nonterminal share a common prefix.

e.g.
    S -> iCtS | iCtSeS | a
    C -> b
==>
    S  -> iCtSS' | a
    S' -> eS | ε
    C  -> b

Predictive Parsing

Features:
- maintains a stack rather than using recursive calls
- table-driven

Components:
1. An input buffer with an endmarker ($).
2. A stack with an endmarker ($) on the bottom.
3. A parsing table, a two-dimensional array M[A,a], where A is a nonterminal symbol and a is the current input symbol (terminal/token).

Parsing Table

    M[A,a]   (            )          $
    S        S -> (S)S    S -> ε     S -> ε

Algorithm:

Input: An input string w and a parsing table M for grammar G.
Output: A leftmost derivation of w or an error indication.

Initially w$ is in the input buffer and S$ is on the stack, with S on top.
Method (S is the start symbol of the grammar):

    do {
        let a be the next input symbol of w and X be the top stack symbol;
        if X is a terminal {
            if X == a then pop X from the stack and remove a from the input;
            else ERROR();
        }
        else {
            if M[X, a] = X -> Y1Y2...Yn then
                1. pop X from the stack;
                2. push Yn Yn-1 ... Y1 onto the stack, with Y1 on top;
            else ERROR();
        }
    } while (X ≠ $)
    if (X == $) and (the next input symbol == $) then accept else ERROR();

An Example (figure)

Construction of the parsing table for a predictive parser

First and Follow

Def. First(α), where α denotes any string of grammar symbols, is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in First(α).

Def. Follow(A), where A is a nonterminal, is the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S =>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).

Compute First(X) for all grammar symbols X:

1. If X is a terminal, then First(X) = {X}.
2. If X -> ε is a production, then ε is in First(X).
3. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1); that is, Y1...Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, 2, ..., k, then add ε to First(X).

An Example

    E  -> TE'
    E' -> +TE' | ε
    T  -> FT'
    T' -> *FT' | ε
    F  -> (E) | id

    First(E) = First(T) = First(F) = { (, id }
    First(E') = { +, ε }
    First(T') = { *, ε }

Compute Follow(A) for all nonterminals A:

1. Place $ in Follow(S), where S is the start symbol and $ is the input buffer endmarker.
2. If there is a production A -> αBβ, then everything in First(β) except ε is placed in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
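The First and Follow rules above can be sketched as a fixed-point computation. This is a minimal illustration in Python, not taken from the slides: the names EPS, END, GRAMMAR, first_of_string, compute_first, and compute_follow are assumptions made for this example, and the grammar is encoded as a dictionary mapping each nonterminal to a list of right-hand sides (an empty list stands for an ε-production).

```python
EPS = "ε"   # marker for the empty string (hypothetical name)
END = "$"   # input endmarker

# Expression grammar from the example; [] encodes an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First set of a string of grammar symbols (First rule 3)."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # First of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                   # this symbol cannot vanish
    result.add(EPS)                         # every symbol derives ε
    return result

def compute_first(grammar):
    """Repeat the First rules until no set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    """Repeat the Follow rules until no set grows any further."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:          # only nonterminals have Follow
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}              # rule 2
                    if EPS in tail:                 # rule 3
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

Iterating until nothing changes is one simple way to handle the mutual dependence between the sets; a worklist would avoid rescanning every production on each pass.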
An Example

    E  -> TE'
    E' -> +TE' | ε
    T  -> FT'
    T' -> *FT' | ε
    F  -> (E) | id
    /* E is the start symbol */

    First(E) = First(T) = First(F) = { (, id }
    First(E') = { +, ε }
    First(T') = { *, ε }

    Follow(E)  = { $, ) }            // rules 1 & 2
    Follow(E') = { $, ) }            // rule 3
    Follow(T)  = { +, $, ) }         // rules 2 & 3
    Follow(T') = { +, $, ) }         // rule 3
    Follow(F)  = { *, +, $, ) }      // rules 2 & 3

Construct a Predictive Parsing Table

1. For each production A -> α of the grammar, do steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A, a].
3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].
4. Make each undefined entry of M an error.

LL(1) grammar

A grammar whose parsing table has no multiply-defined entries is said to be LL(1).

    First 'L' : scan the input from left to right.
    Second 'L': produce a leftmost derivation.
    '1'       : use one input symbol to determine the parsing action.

* No ambiguous or left-recursive grammar can be LL(1).

Properties of LL(1) grammar

A grammar G is LL(1) iff whenever A -> α | β are two distinct productions of G, the following conditions hold:

(1) For no terminal a do both α and β derive strings beginning with a (based on step 2).
    ⇒ First(α) ∩ First(β) = ∅
(2) At most one of α and β can derive the empty string ε (based on step 3).
(3) If β =>* ε, then α does not derive any string beginning with a terminal in Follow(A) (based on steps 2 and 3).
    ⇒ First(α) ∩ Follow(A) = ∅
    (i.e., if First(A) contains ε, then First(A) ∩ Follow(A) = ∅)

Multiply-defined entries

If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry.

e.g.
    S  -> iCtSS' | a
    S' -> eS | ε
    C  -> b

generates M[S', e] = { S' -> ε, S' -> eS }, a multiply-defined entry.
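The table-construction steps, and the way a multiply-defined entry reveals a grammar that is not LL(1), can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper names (first_of, first_sets, follow_sets, build_table, conflicts) and the dictionary encoding of the grammar are hypothetical, with an empty list standing for an ε-production. Each table cell holds a list of right-hand sides so that conflicts stay visible instead of being overwritten.

```python
EPS, END = "ε", "$"

# Left-factored if-then-else grammar from the example; [] encodes S' -> ε.
GRAMMAR = {
    "S":  [["i", "C", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "C":  [["b"]],
}

def first_of(symbols, first):
    """First set of a string of grammar symbols."""
    out = set()
    for s in symbols:
        f = first.get(s, {s})            # First of a terminal is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def size(sets):
    return sum(len(v) for v in sets.values())

def first_sets(g):
    first = {a: set() for a in g}
    while True:                          # iterate until no set grows
        before = size(first)
        for a, alts in g.items():
            for rhs in alts:
                first[a] |= first_of(rhs, first)
        if size(first) == before:
            return first

def follow_sets(g, first, start):
    follow = {a: set() for a in g}
    follow[start].add(END)
    while True:                          # iterate until no set grows
        before = size(follow)
        for a, alts in g.items():
            for rhs in alts:
                for i, b in enumerate(rhs):
                    if b in g:
                        tail = first_of(rhs[i + 1:], first)
                        follow[b] |= tail - {EPS}
                        if EPS in tail:
                            follow[b] |= follow[a]
        if size(follow) == before:
            return follow

def build_table(g, start):
    """M[(A, a)] is the list of right-hand sides selected for that cell."""
    first = first_sets(g)
    follow = follow_sets(g, first, start)
    table = {}
    for a, alts in g.items():            # step 1: every production A -> α
        for rhs in alts:
            f = first_of(rhs, first)
            lookahead = f - {EPS}        # step 2: terminals in First(α)
            if EPS in f:
                lookahead |= follow[a]   # step 3: Follow(A), which may hold $
            for t in lookahead:
                table.setdefault((a, t), []).append(rhs)
    return table                         # step 4: missing keys are errors

def conflicts(table):
    """Cells holding more than one production; empty iff the grammar is LL(1)."""
    return {cell: rhss for cell, rhss in table.items() if len(rhss) > 1}

TABLE = build_table(GRAMMAR, "S")
CONFLICTS = conflicts(TABLE)
```

For this grammar the only conflicting cell is ("S'", "e"), which holds both S' -> eS and S' -> ε.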
Parsing table with multiply-defined entry

    M[A,a]  a        b        e                   i              t    $
    S       S -> a                                S -> iCtSS'
    S'                        S' -> ε, S' -> eS                       S' -> ε
    C                C -> b

Difficulty in predictive parsing

Left recursion elimination and left factoring make the resulting grammar hard to read and difficult to use for translation purposes. Thus:

* Use a predictive parser for control constructs.
* Use operator precedence for expressions.

Assignment #3b

Do exercises 4.3, 4.10, 4.13, 4.15.
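As a closing illustration, the table-driven method from the Predictive Parsing section can be sketched in Python. This is a minimal sketch, not the slides' own code: the names TABLE, NONTERMINALS, and predictive_parse are assumptions for this example, the table is the one given for S -> (S)S | ε, and tokens are single characters. The loop tests for $ on the stack first, a slight restructuring of the do/while form.

```python
END = "$"

# Parsing table M for S -> (S)S | ε; [] encodes the ε right-hand side.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", END): [],
}
NONTERMINALS = {"S"}

def predictive_parse(tokens, table=TABLE, start="S"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    buffer = list(tokens) + [END]      # w$ in the input buffer
    stack = [END, start]               # S$ on the stack; top is the list's end
    pos = 0
    output = []
    while True:
        x, a = stack[-1], buffer[pos]
        if x == END:                   # stack exhausted: accept only at end of input
            if a == END:
                return output
            raise SyntaxError(f"unexpected {a!r} at position {pos}")
        if x not in NONTERMINALS:      # terminal on top: must match the input
            if x != a:
                raise SyntaxError(f"expected {x!r} but saw {a!r} at position {pos}")
            stack.pop()
            pos += 1
        else:                          # nonterminal on top: consult M[X, a]
            rhs = table.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yn ... Y1 so Y1 ends up on top
            output.append((x, rhs))
```

Calling predictive_parse("(())") returns the list of productions applied, in leftmost-derivation order, while predictive_parse("(") raises SyntaxError because ")" is left on the stack when the input runs out.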
