Chapter 4: Syntax Analysis Part 2: Top-Down Parsing

CSE4100
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818

Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

Motivation
• Source: we have a grammar.
• Product: we want a parsing algorithm.
• Idea: synthesize the algorithm from the grammar.
• Problem: how are we going to use the grammar?
• Hint: we can look at the beginning of the input.

Basic intuition
• Use a sliding window over the input stream.
• Benefit: reveal the input
  - slowly, a little bit at a time (1 token at a time, maybe 2 at a time?)
  - but systematically, left to right.
• What kind of derivation can use tokens in this way?

Lookahead
• Technique
  - Use a window of lookahead.
  - Guide a LEFTMOST derivation with the window content.
• So... how to guide?

Predictive Parsing
• Lookahead is a predictive tool: it helps to select the right production.
• Question: how? Answered with an example...

Example
• Consider the grammar
    stmt   → if expr then stmt else stmt
           | while expr do stmt
           | for ( expr ; expr ; expr ) stmt
           | { stmtList }
           | lvalue = expr
    lvalue → id
    expr   → id | integer
• And the input fragment
    while x do { if x then ....
• Which production should we start with? Why?

Key Idea
• Use the production body to know what can start a production.
• Selecting a production: pick the rule that can start with the symbol in the lookahead window!
• Using the production: what should we do with the chosen production?
    Production: while expr do stmt
    Input:      while x do { if x then ....
• Answer: matching.

Matching
• When the input matches, eat the matched symbols
  - in the production
  - in the input.
• Move the window further down.
• Deal with the rest of the production. How?
    Production: while expr do stmt
    Input:      while x do { if x then ....

No Matching?
• What does it mean? In the example below:
  - in the production: a NON-TERMINAL (expr)
  - in the window: a TERMINAL (x, an identifier token).
• What should we do?
    Production: while expr do stmt
    Input:      while x do { if x then ....

Dealing with non-terminals
• Simple...
  - The non-terminal is defined somewhere.
  - The non-terminal is defined by a set of productions.
• Corollary
  - Pop the non-terminal.
  - Use the lookahead to choose a production for the non-terminal.
  - Push and recur.

On the Example
• Recall the grammar and the current state
    stmt   → if expr then stmt else stmt ....
    lvalue → id
    expr   → id | integer
    Production: while expr do stmt
    Input:      while x do { if x then ....
  1. Pop.                           Result: expr
  2. Choose a production for expr.  Result: expr → id
  3. Push and recur.                Result:
    Production: while id do stmt
    Input:      while x do { if x then ....

Gain?
• Recall
  - the topmost symbol (id)
  - the lookahead x (an instance of id).
• Gain? The top symbol and the lookahead match!
• So... recur.
  - The recursive call will match them and pop them.
  - Keep on going.

Final outcome
• What can happen in the end?
  - We "eat" (match) the entire input. Meaning?
  - We get "stuck" at some point. What does it mean to be stuck? How can we get stuck?

Negative outcome
• We can get stuck: the lookahead window and the topmost non-terminal yield an empty prediction!
• This is the "classic" syntax error:
    Syntax error at file:line.
    Expecting xyz got abc
  - Expecting xyz: a list of tokens predicting the productions of the current non-terminal.
  - Got abc: "abc" is the actual content of the lookahead window.

Overall Top-Down Algorithm
• Data structures
  - A stack that holds the symbols to treat.
  - A lookahead window to choose a prediction.
• Algorithm
  - Startup
      Initialize the stack to the start symbol (a non-terminal).
      Initialize the lookahead window at the start of the token stream.
  - Recursive process
      Find out if the front of the window and the top of the stack match.
      If they match: consume the symbols.
      If they do not match: pop / select / push.

Are We Done?
• Almost.... A few remaining issues:
  - How do we get the predictions from the grammar?
  - What should we do if the same symbol predicts more than one rule?
  - How can we implement the algorithm above?
  - How large should the lookahead window be?
  - Why is this called "top-down"?
  - What can be automated about all this?

The Lookahead Window
• How large? Very good question.
• The art of counting: 1, 2, a lot.
• Conclusion
  - Usually, 1 token is plenty.
  - Occasionally, 2 tokens may be needed.
  - Beyond 2 is wild.
• In theory, any finite constant will do: 1, 2, 3, ..., k.

Lookahead Size and Languages
• With k tokens of lookahead
  - some languages can be parsed (this way)
  - some languages cannot be parsed.
• What about k+1 tokens?
  - More languages can be parsed....
  - So the set of languages recognized with k is a subset of the set of languages recognized with k+1!
  - We have a hierarchy!
• Still... in practice LL(1) should be enough.

Top-Down Parsing
• Identify a leftmost derivation for an input string.
• Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop the parse tree in a left-to-right fashion, consistent with scanning the input.
• A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c: accept!)
• Recursive-descent parsing concepts
• Predictive parsing
  - Recursive / brute-force technique
  - Non-recursive / table driven
• Error recovery
• Implementation

Recursive Descent Parsing Concepts
• General category of top-down parsing.
• Choose a production rule based on the input symbol.
• May require backtracking to correct a wrong choice.
• Example: S → cAd, A → ab | a, input: cad
  - Expand S → cAd and match c.
  - Try A → ab: a matches, but b does not match d.
  - Problem: backtrack. Undo A → ab, reset the input to a, and try A → a.
  - Now a matches and then d matches, so cad is accepted.

Predictive Parsing: Recursive
• To eliminate backtracking, the grammar must have:
  - no left recursion
  - left factoring applied
  - ε-moves removed.
• If so, we can use the current input symbol, together with the non-terminal to be expanded, to uniquely determine the next action.
• Utilize transition diagrams (TDs). For each non-terminal of the grammar:
  - create an initial and a final state
  - if A → X1X2…Xn is a production, add a path with edges X1, X2, …, Xn.
• TDs can be turned into a program.

Transition Diagrams (TDs)
• Unlike their lexical equivalents, each edge represents a token.
• A transition implies: if token, choose edge, call proc.
• Recall the earlier grammar and its associated TDs:
    E  → TE'
    E' → + TE' | ε
    T  → FT'
    T' → * FT' | ε
    F  → ( E ) | id

    E:   0 --T--> 1 --E'--> 2
    E':  3 --+--> 4 --T--> 5 --E'--> 6       3 --ε--> 6
    T:   7 --F--> 8 --T'--> 9
    T':  10 --*--> 11 --F--> 12 --T'--> 13   10 --ε--> 13
    F:   14 --(--> 15 --E--> 16 --)--> 17    14 --id--> 17
• How are transition diagrams used?
• Are ε-moves a problem?
• Can we simplify transition diagrams?
• Why is simplification critical?
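One way to read the transition diagrams above is as one procedure per non-terminal, where each edge is either a token to match or a call to another procedure. The Python sketch below is only an illustration under that reading: the class name Parser, the helper accepts, and the token spelling ("id", "+", "$") are assumptions of this sketch, not part of the course material.

```python
class Parser:
    """Recursive-descent recognizer: one method per transition diagram."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Follow a terminal edge: the lookahead must be exactly that token.
        if self.lookahead != expected:
            raise SyntaxError(f"expecting {expected}, got {self.lookahead}")
        self.pos += 1

    def E(self):             # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):       # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise take the ε-edge: return without consuming input

    def T(self):             # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):       # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise take the ε-edge

    def F(self):             # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """Return True if the token list is a sentence of the expression grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False
```

The lookahead is inspected without being consumed, so an ε-edge simply means returning from the procedure and leaving the token for the caller.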
How are Transition Diagrams Used?
    main()  { TD_E(); }
    TD_E()  { TD_T(); TD_E'(); }
    TD_T()  { TD_F(); TD_T'(); }
    TD_E'() { token = get_token();
              if token = '+' then { TD_T(); TD_E'(); } }
    TD_T'() { token = get_token();
              if token = '*' then { TD_F(); TD_T'(); } }
    TD_F()  { token = get_token();
              if token = '(' then { TD_E(); match(')'); }
              else if token.value <> id then { error + EXIT }
              else ... }
• What happened to ε-moves?
• NOTE: not all error conditions have been represented.

How can Transition Diagrams be Simplified?
(1) Start from the diagram for E':
    E':  3 --+--> 4 --T--> 5 --E'--> 6       3 --ε--> 6
(2) Replace the trailing E' edge with an ε-edge back to state 3:
    E':  3 --+--> 4 --T--> 5 --ε--> 3        3 --ε--> 6
(3) Remove state 5 by sending the T edge straight back to state 3:
    E':  3 --+--> 4 --T--> 3                 3 --ε--> 6
(4) Recall the diagram for E:
    E:   0 --T--> 1 --E'--> 2
(5) Substitute the simplified E' diagram into E:
    E:   0 --T--> 3 --+--> 4 --T--> 3        3 --ε--> 6
(6) Send the + edge back to state 0, so that a single T edge remains. How?
    E:   0 --T--> 3 --+--> 0                 3 --ε--> 6

Additional Transition Diagram Simplifications
• Similar steps for T and T'.
• Simplified transition diagrams:
    T:   7 --F--> 10 --*--> 7                10 --ε--> 13
    T':  10 --*--> 11 --F--> 10              10 --ε--> 13
    F:   14 --(--> 15 --E--> 16 --)--> 17    14 --id--> 17
• Why is simplification important?
• How does the code change?

Motivating Table-Driven Parsing
1. Scan the input left to right.
2. Find a leftmost derivation.
    Grammar:    E → TE'     E' → +TE' | ε     T → id
    Input:      id + id $   ($ is the terminator)
    Derivation: E ⇒
    Processing stack:

Non-Recursive / Table Driven
• Components of the parser:
  - Input: the string plus a terminator, e.g. a + b $
  - Stack: holds non-terminal and terminal symbols of the CFG (X, Y, Z, ...), with $ as the empty-stack symbol at the bottom
  - Predictive parsing program, which produces the output
  - Parsing table M[A,a]: which actions the parser should take based on the stack and the input.
• General parser behavior (X: top of stack, a: current input):
  1. When X = a = $: halt, accept, success.
  2. When X = a ≠ $: pop X off the stack, advance the input, go to 1.
  3. When X is a non-terminal, examine M[X,a]:
     - if it is an error, call the recovery routine
     - if M[X,a] = {X → UVW}, pop X and push W, V, U (so that U ends up on top). DO NOT expend any input.

Algorithm for Non-Recursive Parsing
    set ip to point to the first symbol of w$;      /* ip: input pointer */
    repeat
        let X be the top stack symbol and a the symbol pointed to by ip;
        if X is a terminal or $ then
            if X = a then
                pop X from the stack and advance ip
            else error()
        else    /* X is a non-terminal */
            if M[X,a] = X → Y1Y2…Yk then begin
                pop X from the stack;
                push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
                output the production X → Y1Y2…Yk
                /* may also execute other code based on the production used */
            end
            else error()
    until X = $    /* stack is empty */

Example
    E  → TE'
    E' → + TE' | ε
    T  → FT'
    T' → * FT' | ε
    F  → ( E ) | id
Our well-worn example!
Table M
    Non-                        INPUT SYMBOL
    terminal   id        +          *          (         )        $
    E          E→TE'                           E→TE'
    E'                   E'→+TE'                         E'→ε     E'→ε
    T          T→FT'                           T→FT'
    T'                   T'→ε       T'→*FT'              T'→ε     T'→ε
    F          F→id                            F→(E)

Trace of Example
    STACK       INPUT            OUTPUT
    $E          id + id * id$
    $E'T        id + id * id$    E → TE'
    $E'T'F      id + id * id$    T → FT'
    $E'T'id     id + id * id$    F → id
    $E'T'          + id * id$    (expend input)
    $E'            + id * id$    T' → ε
    $E'T+          + id * id$    E' → +TE'
    $E'T             id * id$
    $E'T'F           id * id$    T → FT'
    $E'T'id          id * id$    F → id
    $E'T'               * id$
    $E'T'F*             * id$    T' → *FT'
    $E'T'F                id$
    $E'T'id               id$    F → id
    $E'T'                   $
    $E'                     $    T' → ε
    $                       $    E' → ε

Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
    E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E'
      ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E'
      ⇒ id + id * id E' ⇒ id + id * id

What's the Missing Puzzle Piece?
Constructing the parsing table M!
    1st: Calculate First & Follow for the grammar.
    2nd: Apply the construction algorithm for the parsing table.
Conceptual perspective:
• First: let α be a string of grammar symbols. First(α) is the set of terminals that can appear first in any string derived from α.
  NOTE: If α ⇒* ε, then ε is in First(α).
• Follow: let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β).
  NOTE: If S ⇒* αA, then $ is in Follow(A).

Computing First(X): All Grammar Symbols
1. If X is a terminal, First(X) = {X}.
2. If X → ε is a production rule, add ε to First(X).
3. If X is a non-terminal and X → Y1Y2…Yk is a production rule:
     Place First(Y1) in First(X).
     If Y1 ⇒* ε, place First(Y2) in First(X).
     If Y2 ⇒* ε, place First(Y3) in First(X).
     …
     If Yk-1 ⇒* ε, place First(Yk) in First(X).
   Stop as soon as some Yi cannot derive ε.
NOTE: Steps 1, 2, and 3 above may have to be repeated for each Yj.

Computing First(X): All Grammar Symbols - continued
Informally, suppose we want to compute
    First(X1 X2 … Xn) = First(X1)
                        "+" First(X2) if ε is in First(X1)
                        "+" First(X3) if ε is also in First(X2)
                        …
                        "+" First(Xn) if ε is also in First(Xn-1)
Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i.
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!

Conceptually: What is First (E, T, …) in Derivation?
The leftmost derivation for the example is as follows:
    INPUT: id + id * id $
    E$ ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E'
       ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E'
       ⇒ id + id * id E' ⇒ id + id * id $

Example
Computing First for:
    E  → TE'
    E' → + TE' | ε
    T  → FT'
    T' → * FT' | ε
    F  → ( E ) | id
• First(E) = First(TE') = First(T) "+" First(E')
    Not First(E'), since T does not derive ε. So First(E) = First(T).
• First(T) = First(FT') = First(F) "+" First(T')
    Not First(T'), since F does not derive ε. So First(T) = First(F).
• First(F) = First((E)) "+" First(id) = "(" and "id"
Overall:
    First(E)  = First(T) = First(F) = { (, id }
    First(E') = { +, ε }
    First(T') = { *, ε }

Example 2
Given the production rules:
    S  → i E t SS' | a
    S' → eS | ε
    E  → b
Verify that
    First(S)  = { i, a }
    First(S') = { e, ε }
    First(E)  = { b }

Computing Follow(A): All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals the end of input.
2. If there is a production A → αBβ, then everything in First(β) is in Follow(B), except for ε.
3. If A → αB is a production, or A → αBβ and β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B).
   (Whatever followed A must follow B, since nothing follows B from the production rule.)
We'll calculate Follow for two grammars.

Conceptually: What is Follow in Derivation?
The leftmost derivation for the example is as follows:
    INPUT: id + id * id $
    E$ ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E'
       ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E'
       ⇒ id + id * id E' ⇒ id + id * id $

Example
Compute Follow for:
    E  → TE'
    E' → + TE' | ε
    T  → FT'
    T' → * FT' | ε
    F  → ( E ) | id
• Follow(E): contains $, since E is the start symbol. Also, since F → (E), First(")") is in Follow(E). Thus Follow(E) = { ), $ }.
• Follow(E'): E → TE' implies Follow(E) is in Follow(E'), so Follow(E') = { ), $ }.
• Follow(T): E → TE' implies put in First(E'); since E' ⇒* ε, also put in Follow(E). E' → +TE' implies put in First(E'); since E' ⇒* ε, also put in Follow(E'). Thus Follow(T) = { +, ), $ }.
• Follow(T') and Follow(F): you do these!

Computing Follow: 2nd Example
Recall:
    S  → i E t SS' | a      First(S)  = { i, a }
    S' → eS | ε             First(S') = { e, ε }
    E  → b                  First(E)  = { b }
• Follow(S):
  - Contains $, since S is the start symbol.
  - Since S → i E t SS', put in First(S'), but not ε.
  - Since S' ⇒* ε, put in Follow(S).
  - Since S' → eS, put in Follow(S').
  So... Follow(S) = { e, $ }
• Follow(S') = Follow(S). HOW?
• Follow(E) = { t }

First & Follow: One More Look
Consider the following derivation of ( id ) * id + id$:
    E ⇒ TE' ⇒ FT'E' ⇒ ( E ) T'E' ⇒ ( TE' ) T'E' ⇒ ( FT'E' ) T'E'
      ⇒ ( id T'E' ) T'E' ⇒ ( id E' ) T'E' ⇒ ( id ) T'E' ⇒ ( id ) * FT'E'
      ⇒ ( id ) * id T'E' ⇒ ( id ) * id E' ⇒ ( id ) * id + TE'
      ⇒ ( id ) * id + FT'E' ⇒ ( id ) * id + id T'E' ⇒* ( id ) * id + id$
What's First for each non-terminal?
    First(E)  = { (, id }
Reading the same derivation for the remaining First sets:
    First(T)  = { (, id }
    First(T') = { *, ε }     (the derivation uses T' ⇒ ε)
    First(E') = { +, ε }     (the derivation uses E' ⇒ ε)
    First(F)  = { (, id }    You do First(F)!
What's Follow for each non-terminal? Reading the same derivation:
    Follow(E)  = { ), $ }
    Follow(T)  = { +, ), $ }    (this "+" in Follow(T) comes from First(E'))
    Follow(T') = { +, ), $ }    (the derivation uses T' ⇒ ε)
    Follow(E') = { ), $ }       (the derivation uses E' ⇒ ε)
    Follow(F)  = { +, *, ), $ }    You do Follow(F)!

First & Follow: One More Look
What are the implications for the M table? Consider the same derivation of the input ( id ) * id + id$:
  1. M[E, (]  : E → TE', and ( is in First(E)
  2. M[T, (]  : T → FT', and ( is in First(T)
  3. M[F, (]  : F → (E), and ( is in First(F)
  4. M[E', )] : E' → ε, and ) is in Follow(E')
  5. M[T', $] : T' → ε, since $ is in Follow(T')
  6. M[E', $] : E' → ε, since $ is in Follow(E')

Motivation Behind First & Follow
• First: is used to indicate the relationship between non-terminals (in the stack) and input symbols (in the input stream).
  Example: if A → α and a is in First(α), then when a is the input, replace A with α. (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)
• Follow: is used when First cannot resolve the choice. When α → ε or α ⇒* ε, what follows A dictates the next choice to be made.
  Example: if A → α, b is in Follow(A), α ⇒* ε, and b is the input character, then we expand A with α, which will eventually expand to ε, which b follows! (Here α ⇒* ε, so First(α) contains ε.)

Constructing Parsing Table
Algorithm:
1. Repeat steps 2 & 3 for each rule A → α.
2. Terminal a in First(α)? Add A → α to M[A, a].
3.1 ε in First(α)? Add A → α to M[A, b] for all terminals b in Follow(A).
3.2 ε in First(α) and $ in Follow(A)? Add A → α to M[A, $].
4. All undefined entries are errors.
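The First, Follow, and table-construction rules above can be tried out on the expression grammar with a short fixed-point computation. The following Python sketch is one possible encoding, not the course's implementation: the dictionary layout of GRAMMAR and the function names first_of, compute_first, compute_follow, and build_table are assumptions made for illustration.

```python
EPS, END = "ε", "$"

# Expression grammar from the slides: non-terminal -> list of production bodies.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of(symbols, first):
    """First set of a string of grammar symbols X1 X2 ... Xn."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # stop: this symbol cannot derive ε
    result.add(EPS)                # every symbol can derive ε (or the string is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                 # repeat until no First set grows
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)         # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}         # rule 2
                    if EPS in rest:
                        new |= follow[nt]      # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    table = {}
    for nt, bodies in GRAMMAR.items():
        for body in bodies:
            body_first = first_of(body, first)
            targets = body_first - {EPS}       # step 2
            if EPS in body_first:
                targets |= follow[nt]          # steps 3.1 and 3.2 ($ is in Follow)
            for terminal in targets:
                if (nt, terminal) in table:
                    raise ValueError(f"conflict at M[{nt}, {terminal}]")
                table[(nt, terminal)] = body
    return table                               # missing keys are error entries


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
```

A ValueError from build_table signals a multiply-defined entry, which means the grammar is not LL(1).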
Constructing Parsing Table - Example
    E  → TE'
    E' → + TE' | ε
    T  → FT'
    T' → * FT' | ε
    F  → ( E ) | id

    First(E) = First(F) = First(T) = { (, id }    Follow(E) = Follow(E') = { ), $ }
    First(E') = { +, ε }                           Follow(F) = { *, +, ), $ }
    First(T') = { *, ε }                           Follow(T) = Follow(T') = { +, ), $ }

Expression example:
    E → TE': First(TE') = First(T) = { (, id }
        M[E, (]  : E → TE'    (by rule 2)
        M[E, id] : E → TE'    (by rule 2)
    E' → +TE': First(+TE') = { + }
        M[E', +] : E' → +TE'  (by rule 2)
    E' → ε: ε is in First(ε)            (due to Follow(E'))
        M[E', )] : E' → ε     (3.1)
        M[E', $] : E' → ε     (3.2)
    T' → ε: ε is in First(ε)
        M[T', +] : T' → ε     (3.1)
        M[T', )] : T' → ε     (3.1)
        M[T', $] : T' → ε     (3.2)

Constructing Parsing Table: Example 2
    S  → i E t SS' | a      First(S)  = { i, a }     Follow(S)  = { e, $ }
    S' → eS | ε             First(S') = { e, ε }     Follow(S') = { e, $ }
    E  → b                  First(E)  = { b }        Follow(E)  = { t }

    S  → i E t SS'    First(i E t SS') = { i }
    S  → a            First(a) = { a }
    E  → b            First(b) = { b }
    S' → eS           First(eS) = { e }
    S' → ε            First(ε) = { ε }, Follow(S') = { e, $ }

    Non-                        INPUT SYMBOL
    terminal   a       b       e                 i            t      $
    S          S→a                               S→iEtSS'
    S'                         S'→ε, S'→eS                           S'→ε
    E                  E→b

Example
Step 1: Compute First and Follow for
    S  → E $
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | Id
Overall:
    First(S) = First(E) = First(T) = First(F) = { (, id }
    First(E') = { +, ε }
    First(T') = { *, ε }
    Follow(E) = Follow(E') = { ), $ }
    Follow(T) = Follow(T') = { +, ), $ }
    Follow(F) = { +, *, ), $ }

Example
Step 2: Build the parser table.
Step 3: Input: Id + Id * Id $

    Parser Table
                              Input Symbols
    NT    Id         +          *          (          )         $
    S     S→E$                             S→E$
    E     E→TE'                            E→TE'
    E'               E'→+TE'                          E'→ε      E'→ε
    T     T→FT'                            T→FT'
    T'               T'→ε       T'→*FT'               T'→ε      T'→ε
    F     F→Id                             F→(E)

Parsing Process Over Time
    Input (consumed over time): Id + Id * Id $
    The parse is driven step by step by the parser table above.
LL(1) Grammars
    L: Scan the input from Left to right.
    L: Construct a Leftmost derivation.
    1: Use "1" input symbol as lookahead, in conjunction with the stack, to decide on the parsing action.
LL(1) grammars have no multiply-defined entries in the parsing table.
Properties of LL(1) grammars:
• The grammar can't be ambiguous or left recursive.
• The grammar is LL(1) when, for every pair of productions A → α | β:
  1. α and β do not derive strings starting with the same terminal a.
  2. At most one of α and β can derive ε.
  3. If β ⇒* ε, then α does not derive any string starting with a terminal in Follow(A).
Note: it may not be possible to manipulate a grammar into an LL(1) grammar.

Error Recovery
When do errors occur? Recall the predictive parser (input a + b $, stack X Y Z $, predictive parsing program, parsing table M[A,a], output):
  1. If X is a terminal and it doesn't match the input.
  2. If M[X, input] is empty: no allowable actions.
Consider two recovery techniques:
  A. Panic mode
  B. Phase-level recovery

Panic Mode Recovery
Augment the parsing table with actions that attempt to realign / synchronize the token stream with the expected input.
Suppose A, on top of the stack, doesn't mesh with the current input symbol:
  1. Use Follow(A) to synchronize: discard input tokens until a token in Follow(A) appears.
  2. Use First(A) to determine when to restart parsing.
  3. Incorporate higher-level language concepts (begin/end, while, repeat/until) into the sync actions, so that we don't skip tokens unnecessarily.
Other actions:
  4. When A → ε is available, use it to manipulate the stack and postpone error detection.
  5. Use a non-matching terminal on the stack as a token that is inserted into the input.

Revised Parsing Table / Example
    Non-                        INPUT SYMBOL
    terminal   id        +          *          (         )        $
    E          E→TE'                           E→TE'     synch    synch
    E'                   E'→+TE'                         E'→ε     E'→ε
    T          T→FT'     synch                 T→FT'     synch    synch
    T'                   T'→ε       T'→*FT'              T'→ε     T'→ε
    F          F→id      synch      synch      F→(E)     synch    synch
• synch entries come from the Follow sets: pop the stack entry (terminal or non-terminal).
• Blank entries: skip the input symbol.

Skip & Synch
• Meaning
  - Skip: discard the input symbol.
  - Synch: pop the top of the stack.
• Messages are constructed from the lookahead and the non-terminal.
• Example: NT = F, lookahead = +
    Expecting a FACTOR. Got + for a Term. So a factor is missing.

Revised Parsing Table / Example (2)
    STACK       INPUT            REMARK
    $E          ) id * + id$     error, skip )
    $E            id * + id$     id is in First(E)
    $E'T          id * + id$
    $E'T'F        id * + id$
    $E'T'id       id * + id$
    $E'T'            * + id$
    $E'T'F*          * + id$
    $E'T'F             + id$     error, M[F,+] = synch
    $E'T'              + id$     F has been popped
    $E'                + id$
    $E'T+              + id$
    $E'T                 id$
    $E'T'F               id$
    $E'T'id              id$
    $E'T'                  $
    $E'                    $
    $                      $

Phase-Level Recovery
• Fill in the blank entries of the parsing table with error-handling routines.
• These routines
  - modify the stack and / or the input stream
  - issue an error message.
• Problems:
  - Modifying the stack has to be done with care, so as not to create the possibility of derivations that aren't in the language.
  - Infinite loops must be avoided.
• Can be used in conjunction with panic mode for more complete error handling.

How Would You Implement TD Parser
• Stack: easy to handle. Write an ADT to manipulate its contents.
• Input stream: responsibility of the lexical analyzer.
• Key issue: how is the parsing table implemented?
One approach: assign unique IDs to the entries of the revised parsing table.
• All rules have unique IDs.
• Ditto for synch actions.
• Also for blanks, which handle errors.

Revised Parsing Table:
    Non-              INPUT SYMBOL
    terminal   id    +     *     (     )     $
    E          1     18    19    1     9     10
    E'         20    2     21    22    3     3
    T          4     11    23    4     12    13
    T'         24    6     5     25    6     6
    F          8     14    15    7     16    17

    1  E → TE'        5  T' → *FT'
    2  E' → +TE'      6  T' → ε
    3  E' → ε         7  F → (E)
    4  T → FT'        8  F → id
    9 - 17: sync actions
    18 - 25: error handlers

Revised Parsing Table: (2)
Each # (or set of #s) corresponds to a procedure that:
• Uses the stack ADT
• Gets tokens
• Prints error messages
• Prints diagnostic messages
• Handles errors

How is Parser Constructed?
One large CASE statement:
    state = M[ top(s), current_token ]
    switch (state) {
        case 1:  proc_E_TE'( );   break;
        …
        case 8:  proc_F_id( );    break;
        case 9:  proc_sync_9( );  break;
        …
        case 17: proc_sync_17( ); break;
        case 18:
        …                         /* procs to handle errors */
        case 25:
    }
• Cases 9 - 17 and 18 - 25 can be combined: put them in another switch.
• Some sync actions may be the same.
• Some error handlers may be similar.

Final Comments: Top-Down Parsing
So far,
• We've examined grammars and language theory and their relationship to parsing.
• Key concept: rewriting a grammar into an acceptable form.
• Examined top-down parsing:
  - Brute force: transition diagrams & recursion
  - Elegant: table driven
• We've identified its shortcomings: not all grammars can be made LL(1)!
• Bottom-up parsing: next up!
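As a closing illustration of the table-driven technique summarized above, the non-recursive algorithm can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the table M is written out by hand for the expression grammar, the function name parse and the "X -> body" output format are hypothetical, and error handling simply raises SyntaxError instead of performing panic-mode or phase-level recovery.

```python
EPS, END = "ε", "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M for the expression grammar; missing keys are error entries.
M = {
    ("E", "id"): ["T", "E'"],        ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],   ("E'", ")"): [EPS],   ("E'", END): [EPS],
    ("T", "id"): ["F", "T'"],        ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS],              ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS],              ("T'", END): [EPS],
    ("F", "id"): ["id"],             ("F", "("): ["(", "E", ")"],
}


def parse(tokens, start="E"):
    """Return the productions of the leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + [END]
    stack = [END, start]             # the top of the stack is the end of the list
    ip = 0                           # input pointer
    output = []
    while stack:
        top, a = stack[-1], tokens[ip]
        if top not in NONTERMINALS:  # terminal or $
            if top != a:
                raise SyntaxError(f"expecting {top}, got {a}")
            stack.pop()              # match: pop and advance the input
            ip += 1
        else:
            body = M.get((top, a))
            if body is None:
                expected = sorted(t for (nt, t) in M if nt == top)
                raise SyntaxError(f"expecting one of {expected}, got {a}")
            stack.pop()
            output.append(f"{top} -> {' '.join(body)}")
            # Push the body in reverse so that its first symbol ends up on top;
            # an ε body pushes nothing and expends no input.
            stack.extend(sym for sym in reversed(body) if sym != EPS)
    return output
```

Calling parse(["id", "+", "id", "*", "id"]) yields the same sequence of productions as the OUTPUT column of the trace for id + id * id$, while an input such as ) id raises SyntaxError because M[E, )] is blank in this table.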
