
CIS 461 Compiler Design & Construction, Fall 2012
Slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon
Lecture-Module #12: Parsing 4

Parsing Techniques
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree (the start symbol) and grow toward the leaves (similar to a derivation)
• Pick a production and try to match the input
• Bad "pick" → may need to backtrack
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward the root
• We can think of the process as reducing the input string to the start symbol
• At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left side of that production
• Bottom-up parsers handle a large class of grammars

[Figure: Top-down parsing grows the fringe of the parse tree down from the start symbol S (left-to-right scan, leftmost derivation, lookahead). Bottom-up parsing grows the upper fringe of the parse tree up from the input string (rightmost derivation in reverse, lookahead).]

Handle-pruning, Bottom-up Parsers
The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning.
Handle pruning forms the basis for a bottom-up parsing method.
To construct a rightmost derivation
  \(S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = w\)
apply the following simple algorithm:
  for i ← n to 1 by -1
    Find the handle \(\langle A_i \to \beta_i, k_i \rangle\) in \(\gamma_i\)
    Replace \(\beta_i\) with \(A_i\) to generate \(\gamma_{i-1}\)

Example
The expression grammar:
  1  S      → Expr
  2  Expr   → Expr + Term
  3         | Expr – Term
  4         | Term
  5  Term   → Term * Factor
  6         | Term / Factor
  7         | Factor
  8  Factor → num
  9         | id

Handles for the rightmost derivation of the input string x – 2 * y:
  Sentential Form                 Handle (Prod'n, Pos'n)
  S                               -
  Expr                            1, 1
  Expr – Term                     3, 3
  Expr – Term * Factor            5, 5
  Expr – Term * <id,y>            9, 5
  Expr – Factor * <id,y>          7, 3
  Expr – <num,2> * <id,y>         8, 3
  Term – <num,2> * <id,y>         4, 1
  Factor – <num,2> * <id,y>       7, 1
  <id,x> – <num,2> * <id,y>       9, 1

Handle-pruning, Bottom-up Parsers
One implementation technique is the shift-reduce parser:
  push $
  lookahead = get_next_token( )
  repeat until (top of stack == start symbol and lookahead == $)
    if the top of the stack is a handle A → β
      then /* reduce β to A */
        pop |β| symbols off the stack
        push A onto the stack
    else if (lookahead ≠ $)
      then /* shift */
        push lookahead
        lookahead = get_next_token( )
    else /* need to shift, but out of input */
      report an error
How do errors show up?
• failure to find a handle
• hitting $ and needing to shift (final else clause)
Either generates an error

Example, Corresponding Parse Tree
[Figure: Parse tree for x – 2 * y. S → Expr; Expr → Expr – Term; the left Expr derives Term → Factor → <id,x>; the right Term → Term * Factor, whose Term derives Factor → <num,2> and whose Factor derives <id,y>.]
1. Shift until top-of-stack is the right end of a handle
2. Find the left end of the handle & reduce
5 shifts + 9 reduces + 1 accept

Shift-reduce Parsing
Shift-reduce parsers are easily built and easily understood.
A shift-reduce parser has just four actions:
• Shift: next word is shifted onto the stack
• Reduce: right end of handle is at top of stack
  Locate left end of handle within the stack
  Pop handle off stack & push appropriate lhs
• Accept: stop parsing & report success
• Error: call an error reporting/recovery routine
Accept & Error are simple
Shift is just a push and a call to the scanner
Reduce takes |rhs| pops & 1 push
Handle finding is key
• handle is on stack
• finite set of handles → use a DFA!
If handle-finding requires state, put it in the stack

LR Parsers
• LR(k) parsers are table-driven, bottom-up, shift-reduce parsers that use a limited right context (k-token lookahead) for handle recognition
• LR(k): Left-to-right scan of the input, Rightmost derivation in reverse, with k-token lookahead
A grammar is LR(k) if, given a rightmost derivation
  \(S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = \text{sentence}\)
we can
1. isolate the handle of each right-sentential form \(\gamma_i\), and
2. determine the production by which to reduce,
by scanning \(\gamma_i\) from left to right, going at most k symbols beyond the right end of the handle of \(\gamma_i\)

LR Parsers
A table-driven LR parser looks like:
[Figure: source code → Scanner → Table-driven Parser (with its Stack) → IR. A Parser Generator reads the grammar and produces the ACTION & GOTO Tables that drive the parser.]

LR Shift-Reduce Parsers
  push($);       // $ is the end-of-file symbol
  push(s0);      // s0 is the start state of the DFA that recognizes handles
  lookahead = get_next_token();
  repeat forever
    s = top_of_stack();
    if ( ACTION[s,lookahead] == reduce A → β ) then
      pop 2*|β| symbols;
      s = top_of_stack();
      push(A);
      push(GOTO[s,A]);
    else if ( ACTION[s,lookahead] == shift si ) then
      push(lookahead);
      push(si);
      lookahead = get_next_token();
    else if ( ACTION[s,lookahead] == accept and lookahead == $ ) then
      return success;
    else error();
The skeleton parser
• uses ACTION & GOTO tables
• does |words| shifts
• does |derivation| reductions
• does 1 accept

LR Parsers (parse tables)
To make a parser for L(G), we need a set of tables.
The grammar:
  1  S → Z
  2  Z → Z z
  3    | z
The tables:
  ACTION                          GOTO
  State   $          z            State   Z
  0       -          shift 2      0       1
  1       accept     shift 3      1       -
  2       reduce 3   reduce 3     2       -
  3       reduce 2   reduce 2     3       -

Example Parses
The string "z":
  Stack             Input   Action
  $ s0              z$      shift 2
  $ s0 z s2         $       reduce 3
  $ s0 Z s1         $       accept
The string "zz":
  Stack             Input   Action
  $ s0              zz$     shift 2
  $ s0 z s2         z$      reduce 3
  $ s0 Z s1         z$      shift 3
  $ s0 Z s1 z s3    $       reduce 2
  $ s0 Z s1         $       accept

LR Parsers
How does this LR stuff work?
• Unambiguous grammar → unique rightmost derivation
• Keep upper fringe on a stack
  – All active handles include TOS
  – Shift inputs until TOS is right end of a handle
• Language of handles is regular
  – Build a handle-recognizing DFA
  – ACTION & GOTO tables encode the DFA
• To match subterms, recurse and leave DFA's state on stack
• Final states of the DFA correspond to reduce actions
  – New state is GOTO[state at TOS (after popping the handle), lhs]
  – For Z, this takes the DFA to S1
[Figure: Control DFA for the simple example. From S0, z leads to S2 and Z leads to S1; from S1, z leads to S3. S2 and S3 are final states that trigger reduce actions.]

Building LR Parsers
How do we generate the ACTION and GOTO tables?
• Use the grammar to build a model of the handle-recognizing DFA
• Use the DFA model to build ACTION & GOTO tables
• If construction succeeds, the grammar is LR
How do we build the handle-recognizing DFA?
• Encode the set of productions that can be used as handles in the DFA state: use LR(k) items
• Use two functions goto(s, X) and closure(s)
  – goto() is analogous to move() in the NFA-to-DFA (subset) construction
  – closure() is analogous to ε-closure
• Build up the states and transition function of the DFA
• Use this information to fill in the ACTION and GOTO tables

LR(k) items
An LR(k) item is a pair [P, δ], where
  P is a production A → β with a • at some position in the rhs
  δ is a lookahead string of length ≤ k (terminal symbols or $)
Examples, for A → B1B2B3 with lookahead a: [A → •B1B2B3, a], [A → B1•B2B3, a], [A → B1B2•B3, a], & [A → B1B2B3•, a]
The • in an item indicates the position of the top of the stack
• LR(0) items [A → β•γ] (no lookahead symbol)
• LR(1) items [A → β•γ, a] (one-token lookahead)
• LR(2) items [A → β•γ, a b] (two-token lookahead)
...
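The skeleton LR driver and the ACTION & GOTO tables for S → Z, Z → Zz | z can be combined into a short runnable sketch. The dictionary encoding of the tables, the names PRODUCTIONS, ACTION, GOTO and lr_parse, and the use of single characters as tokens with '$' as the end marker are assumptions for illustration. Like the skeleton, it pushes a grammar symbol and a state on each shift, so a reduce by A → β pops 2*|β| entries.

```python
# Hypothetical encoding of the ACTION and GOTO tables for
#   1: S -> Z    2: Z -> Z z    3: Z -> z
# Each production number maps to (lhs, length of rhs).
PRODUCTIONS = {1: ("S", 1), 2: ("Z", 2), 3: ("Z", 1)}

ACTION = {
    (0, "z"): ("shift", 2),
    (1, "$"): ("accept", None),
    (1, "z"): ("shift", 3),
    (2, "$"): ("reduce", 3),
    (2, "z"): ("reduce", 3),
    (3, "$"): ("reduce", 2),
    (3, "z"): ("reduce", 2),
}
GOTO = {(0, "Z"): 1}


def lr_parse(text):
    """Run the skeleton LR driver on text; return the list of actions taken.

    Raises SyntaxError when ACTION has no entry for (state, lookahead).
    """
    tokens = iter(list(text) + ["$"])  # '$' marks end of input
    stack = ["$", 0]                   # grammar symbols and states interleaved
    lookahead = next(tokens)
    trace = []
    while True:
        s = stack[-1]
        kind, arg = ACTION.get((s, lookahead), ("error", None))
        if kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * rhs_len:]  # pop 2*|beta| entries
            s = stack[-1]                          # state exposed by the pops
            stack.append(lhs)
            stack.append(GOTO[(s, lhs)])
            trace.append(f"reduce {arg}")
        elif kind == "shift":
            stack.append(lookahead)
            stack.append(arg)
            lookahead = next(tokens)
            trace.append(f"shift {arg}")
        elif kind == "accept" and lookahead == "$":
            trace.append("accept")
            return trace
        else:
            raise SyntaxError(f"no action for state {s} on {lookahead!r}")


print(lr_parse("zz"))  # ['shift 2', 'reduce 3', 'shift 3', 'reduce 2', 'accept']
```

The printed trace matches the "zz" example parse: the parser shifts once per word, reduces once per derivation step, and accepts once. An empty input raises SyntaxError because ACTION[0, $] is empty.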
LR(k) items
The • in an item indicates the position of the top of the stack
[A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack
[A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β
[A → βγ•, a] means that the parser has seen βγ, and that a lookahead a is consistent with reducing to A (for LR(k) parsers, a is a string of terminal symbols of length k)
The table construction algorithm uses items to represent valid configurations of an LR(1) parser

LR(1) Items
The production A → β, where β = B1B2B3, with lookahead a, generates 4 items:
  [A → •B1B2B3, a], [A → B1•B2B3, a], [A → B1B2•B3, a], & [A → B1B2B3•, a]
The set of LR(1) items for a grammar is finite
What's the point of all these lookahead symbols?
• Carry them along to choose the correct reduction
• Lookaheads are bookkeeping, unless the item has • at the right end
  – Has no direct use in [A → β•γ, a]
  – In [A → β•, a], a lookahead of a implies a reduction by A → β
  – For { [A → β•, a], [B → γ•δ, b] }: lookahead = a ⇒ reduce to A; lookahead ∈ FIRST(δ) ⇒ shift
• Limited right context is enough to pick the actions

Back to Finding Handles
Parser in a state where the stack (the fringe) was
  Expr – Term
with lookahead of *
How did it choose to expand Term rather than reduce to Expr?
• Lookahead symbol is the key
• With lookahead of + or –, parser should reduce to Expr
• With lookahead of * or /, parser should shift
• Parser uses lookahead to decide
• All this context from the grammar is encoded in the handle-recognizing mechanism

Back to x – 2 * y
[Figure: the shift-reduce trace for x – 2 * y, annotated "shift here" and "reduce here".]
1. Shift until TOS is the right end of a handle
2. Find the left end of the handle & reduce

LR(1) Table Construction
High-level overview
1 Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1) items), C = { I0, I1, ..., In }
  a Introduce a new start symbol S' which has only one production S' → S
  b Initial state, I0, should include
    • [S' → •S, $], along with any equivalent items
    • Derive equivalent items as closure(I0)
  c Repeatedly compute, for each Ik and each grammar symbol X, goto(Ik, X)
    • If the set is not already in the collection, add it
    • Record all the transitions created by goto()
    This eventually reaches a fixed point
2 Fill in the ACTION and GOTO tables using the DFA
The canonical collection completely encodes the transition diagram for the handle-finding DFA

Computing Closures
closure(I) adds all the items implied by items already in I
• Any item [A → β•Bδ, a] implies [B → •τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa)
• Since βBδ is valid, any way to derive βBδ is valid, too
The algorithm
  Closure( I )
    while ( I is still changing )
      for each item [A → β•Bδ, a] ∈ I
        for each production B → τ ∈ P
          for each terminal b ∈ FIRST(δa)
            if [B → •τ, b] ∉ I
              then add [B → •τ, b] to I
Fixpoint computation

Example Grammar
Initial step builds the item [S → •Z, $] and takes its closure()
  1  S → Z
  2  Z → Z z
  3    | z
Closure( [S → •Z, $] ):
  Item             From
  [S → •Z, $]      Original item
  [Z → •Zz, $]     1, δa is $
  [Z → •z, $]      1, δa is $
  [Z → •Zz, z]     2, δa is z$
  [Z → •z, z]      2, δa is z$
So, initial state s0 is { [S → •Z, $], [Z → •Zz, $], [Z → •z, $], [Z → •Zz, z], [Z → •z, z] }

Computing Gotos
goto(I, x) computes the state that the parser would reach if it recognized an x while in state I
• goto( { [A → β•xδ, a] }, x ) produces [A → βx•δ, a]
• It also includes closure( [A → βx•δ, a] ) to fill out the state
The algorithm
  Goto( I, x )
    new = Ø
    for each [A → β•xδ, a] ∈ I
      new = new ∪ [A → βx•δ, a]
    return closure(new)
• Not a fixpoint method
• Uses closure

Example Grammar
s0 is { [S → •Z, $], [Z → •Zz, $], [Z → •z, $], [Z → •Zz, z], [Z → •z, z] }
goto( s0, z )
• Loop produces
  Item             From
  [Z → z•, $]      Item 3 in s0
  [Z → z•, z]      Item 5 in s0
• Closure adds nothing since • is at end of rhs in each item
In the construction, this produces s2
  { [Z → z•, {$, z}] }
New, but obvious, notation for two distinct items [Z → z•, $] and [Z → z•, z]
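The closure and goto procedures, together with the worklist that builds the canonical collection, can be sketched in Python for the same grammar. This is a sketch under stated assumptions: items are hypothetical (lhs, rhs, dot, lookahead) tuples, the grammar has no ε-productions (so FIRST of a string is FIRST of its first symbol), and, as in the slide example, no extra S' is added because S already has a single production. Names such as GRAMMAR, first_sets, and canonical_collection are illustrative only.

```python
# Sketch of LR(1) closure/goto for the grammar  S -> Z ;  Z -> Z z | z
# Assumption: no epsilon-productions, so FIRST of a string is FIRST of its head.
GRAMMAR = {
    "S": [("Z",)],
    "Z": [("Z", "z"), ("z",)],
}
NONTERMINALS = set(GRAMMAR)


def first_sets(grammar):
    """Fixpoint computation of FIRST for each nonterminal (no epsilon rules)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def first_of_string(symbols):
    """FIRST of a non-empty symbol string such as delta + (a,)."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}


# Hypothetical LR(1) item encoding: (lhs, rhs tuple, dot position, lookahead).
def closure(items):
    items = set(items)
    changed = True
    while changed:  # repeat until I stops changing
        changed = False
        for lhs, rhs, dot, a in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                b_sym, delta = rhs[dot], rhs[dot + 1:]
                for tau in GRAMMAR[b_sym]:
                    for b in first_of_string(delta + (a,)):
                        item = (b_sym, tau, 0, b)
                        if item not in items:
                            items.add(item)
                            changed = True
    return frozenset(items)


def goto(items, x):
    """Advance the dot over x in every item where x follows the dot, then close."""
    moved = {(lhs, rhs, dot + 1, a)
             for lhs, rhs, dot, a in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)


def canonical_collection(start_item):
    """Build the sets of LR(1) items and the goto transitions between them."""
    symbols = NONTERMINALS | {s for rhss in GRAMMAR.values() for rhs in rhss for s in rhs}
    states = [closure({start_item})]
    transitions = {}
    i = 0
    while i < len(states):  # newly added states are processed in later iterations
        for x in sorted(symbols):
            target = goto(states[i], x)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = canonical_collection(("S", ("Z",), 0, "$"))
print(len(states))   # 4
print(transitions)   # {(0, 'Z'): 1, (0, 'z'): 2, (1, 'z'): 3}
```

Starting from [S → •Z, $], the closure of the start item has the five items of s0, goto(s0, z) gives the two items [Z → z•, $] and [Z → z•, z], and the four resulting states with transitions 0 -Z-> 1, 0 -z-> 2, 1 -z-> 3 agree with the GOTO entry for Z in state 0 and the shift 2 / shift 3 entries of the ACTION table.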
