COS 320 Compilers
David Walker

last time
• context free grammars (Appel 3.1)
  – terminals, non-terminals, rules
  – derivations & parse trees
  – ambiguous grammars
• recursive descent parsers (Appel 3.2)
  – parse LL(k) grammars
  – easy to write as ML programs
  – algorithms for automatic construction from a CFG

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
  1. S ::= IF E THEN S ELSE S
  2.     | BEGIN S L
  3.     | PRINT E
  4. L ::= END
  5.     | ; S L
  6. E ::= NUM = NUM

```sml
datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

(* getToken : unit -> token (the lexer) and error : unit -> 'a are assumed to be defined elsewhere *)
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if !tok = t then advance () else error ()

(* one function per non-terminal; the current token selects the rule *)
fun S () = (case !tok of
              IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
            | BEGIN => (eat BEGIN; S (); L ())
            | PRINT => (eat PRINT; E ())
            | _     => error ())
and L () = (case !tok of
              END  => eat END
            | SEMI => (eat SEMI; S (); L ())
            | _    => error ())
and E () = (eat NUM; eat EQ; eat NUM)
```

Constructing RD Parsers
• To construct an RD parser, we need to know which rule to apply when
  – we have seen a non-terminal X, and
  – we see the next terminal a in the input
• We apply rule X ::= s when
  – a is the first symbol of some string that can be generated by s, OR
  – s can derive the empty string (s is Nullable) and a is the first symbol of some string that can follow X

Computing Nullable Sets
• Nullable is computed by iterative analysis: start with no non-terminal marked Nullable and apply the following rules until nothing changes
  – base case:
    • if (X ::= ) is a rule, i.e. X derives the empty string directly, then X is Nullable
  – inductive case:
    • if (X ::= A B C ...) and A, B, C, ... are all Nullable, then X is Nullable

Computing First Sets
• First(X) is computed iteratively
  – base case:
    • if T is a terminal symbol, then First(T) = {T}
  – inductive case:
    • if X is a non-terminal and (X ::= A B C ...), then
      – First(X) = First(X) ∪ First(A B C ...)
      – where First(A B C ...) = F1 ∪ F2 ∪ F3 ∪ ... and
        » F1 = First(A)
        » F2 = First(B), if A is Nullable
        » F3 = First(C), if A is Nullable & B is Nullable
        » ...
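The parser above relies on getToken and error, which the slides do not show. A minimal self-contained sketch, assuming the tokens arrive as a list that ends in an extra EOF token (EOF, input, peek, ParseError and parse are hypothetical names, not from the course code), lets the same recursive descent functions run on their own:

```sml
datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ | EOF

exception ParseError of string

(* hypothetical driver state: the remaining input, ending in an explicit EOF token *)
val input : token list ref = ref []

fun peek () = case !input of t :: _ => t | [] => EOF
fun advance () = case !input of _ :: rest => input := rest | [] => ()
fun eat t = if peek () = t then advance () else raise ParseError "unexpected token"

(* one function per non-terminal; the next token selects the rule *)
fun S () =
      (case peek () of
         IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
       | BEGIN => (eat BEGIN; S (); L ())
       | PRINT => (eat PRINT; E ())
       | _     => raise ParseError "expected IF, BEGIN or PRINT")
and L () =
      (case peek () of
         END  => eat END
       | SEMI => (eat SEMI; S (); L ())
       | _    => raise ParseError "expected END or ;")
and E () = (eat NUM; eat EQ; eat NUM)

(* a whole program is one S followed by the end of the input *)
fun parse (toks : token list) : bool =
  (input := toks; S (); eat EOF; true)
  handle ParseError _ => false
```

For example, parse [BEGIN, PRINT, NUM, EQ, NUM, SEMI, PRINT, NUM, EQ, NUM, END, EOF] follows rules 2, 3, 6, 5, 3, 6 and 4 and returns true, while parse [PRINT, NUM, EOF] fails inside E and returns false.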
Computing Follow Sets
• Follow(X) is computed iteratively
  – base case:
    • initially, we assume nothing in particular follows X
      (Follow(X) is initially { })
  – inductive case:
    • if (Y ::= s1 X s2) for any strings s1, s2, then
      – Follow(X) = First(s2) ∪ Follow(X)
    • if (Y ::= s1 X s2) for any strings s1, s2, and s2 is Nullable, then
      – Follow(X) = Follow(Y) ∪ Follow(X)

building a predictive parser
Grammar:
  Z ::= X Y Z
  Z ::= d
  Y ::= c
  Y ::=
  X ::= a
  X ::= b Y e

nullable:
• base case: Y is Nullable (Y ::= ); Z and X are not
• after one round of induction, nothing changes: we have reached a fixed point

first:
• base case: First(Z) = {d}, First(Y) = {c}, First(X) = {a, b}
• after one round of induction: First(Z) = {d, a, b} (from Z ::= X Y Z); no fixed point yet
• after two rounds of induction, no more changes ==> fixed point

follow:
• base case: Follow(Z) = Follow(Y) = Follow(X) = { }
• after one round of induction: Follow(Y) = {e, d, a, b} and Follow(X) = {c, d, a, b}; no fixed point yet
• after two rounds of induction, no more changes ==> fixed point
  (in general, the number of rounds needed depends on the order in which the rules are visited; the final sets do not)

Computed sets:
        nullable   first      follow
  Z     no         d,a,b      { }
  Y     yes        c          e,d,a,b
  X     no         a,b        c,d,a,b
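The three iterative analyses can be sketched in ML as fixed-point loops over the rules. This is a minimal sketch, not the course's implementation: the representation (the hypothetical datatype sym, string names for symbols, sets as duplicate-free lists) is an assumption, and each loop simply repeats a round until the sets stop growing.

```sml
(* grammar symbols: terminals and non-terminals, named by strings *)
datatype sym = T of string | N of string

(* the example grammar; ("Y", []) is the rule Y ::= (empty) *)
val rules : (string * sym list) list =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]), ("Y", []),
    ("X", [T "a"]), ("X", [T "b", N "Y", T "e"]) ]
val nonterms = ["Z", "Y", "X"]

(* sets are duplicate-free lists; tables map each non-terminal to a set *)
fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) = List.foldl (fn (x, acc) => if member x acc then acc else acc @ [x]) ys xs
fun sameSet (xs, ys) = List.all (fn x => member x ys) xs andalso List.all (fn y => member y xs) ys
fun lookup tbl x = case List.find (fn (y, _) => y = x) tbl of SOME (_, s) => s | NONE => []
fun totalSize tbl = List.foldl (fn ((_, s), n) => n + length s) 0 tbl

(* repeat one round until nothing grows (sets only ever grow, so size detects change) *)
fun fixpoint step init =
  let val next = step init
  in if totalSize next = totalSize init then init else fixpoint step next end

fun nullableSet () =
  let
    fun step ns =
      List.foldl (fn ((x, rhs), acc) =>
                    if not (member x acc) andalso List.all (fn T _ => false | N y => member y ns) rhs
                    then x :: acc else acc)
                 ns rules
    fun loop ns = let val ns' = step ns in if length ns' = length ns then ns else loop ns' end
  in loop [] end

(* First of a string of symbols: F1 ∪ F2 ∪ ... as long as the prefix is Nullable *)
fun firstOfString (nullable, first) [] = []
  | firstOfString (nullable, first) (T t :: _) = [t]
  | firstOfString (nullable, first) (N x :: rest) =
      if member x nullable
      then union (lookup first x, firstOfString (nullable, first) rest)
      else lookup first x

fun firstSets nullable =
  let
    fun step first =
      List.map (fn x =>
                  (x, List.foldl (fn ((y, rhs), acc) =>
                                    if y = x then union (firstOfString (nullable, first) rhs, acc) else acc)
                                 (lookup first x) rules))
               nonterms
  in fixpoint step (List.map (fn x => (x, [])) nonterms) end

fun followSets (nullable, first) =
  let
    fun nullableString syms = List.all (fn T _ => false | N x => member x nullable) syms
    fun addTo (tbl, x, s) =
      List.map (fn (z, old) => if z = x then (z, union (s, old)) else (z, old)) tbl
    (* each occurrence Y ::= s1 X s2 adds First(s2), plus Follow(Y) when s2 is Nullable *)
    fun fromRule follow ((y, rhs), tbl) =
      let
        fun scan ([], tbl) = tbl
          | scan (T _ :: rest, tbl) = scan (rest, tbl)
          | scan (N x :: rest, tbl) =
              let
                val fromFirst = firstOfString (nullable, first) rest
                val fromFollow = if nullableString rest then lookup follow y else []
              in scan (rest, addTo (tbl, x, union (fromFirst, fromFollow))) end
      in scan (rhs, tbl) end
    fun step follow = List.foldl (fromRule follow) follow rules
  in fixpoint step (List.map (fn x => (x, [])) nonterms) end
```

Calling nullableSet (), then firstSets and followSets on the results, should reproduce the computed-sets table above.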
Building the parsing table
• Row X, column T tells the parser which clause to execute in function X when the next token is T:
  – if T ∈ First(s), then enter (X ::= s) in row X, column T
  – if s is Nullable and T ∈ Follow(X), then enter (X ::= s) in row X, column T
• Filling in the table for the grammar above, using the computed sets:

       a             b             c             d             e
  Z    Z ::= XYZ     Z ::= XYZ                   Z ::= d
  Y    Y ::=         Y ::=         Y ::= c       Y ::=         Y ::=
  X    X ::= a       X ::= b Y e

• What are the blanks? --> syntax errors
  – e.g., if function Z sees c or e as its next token, no string derivable from Z can start there
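Filling in the table can also be mechanized. The sketch below is a minimal one under stated assumptions: it copies the Nullable, First and Follow sets from the computed-sets table instead of recomputing them, numbers the rules from 0 in the order listed, and records every rule that lands in each cell, so a duplicate entry shows up as a cell holding more than one rule. buildTable, conflicts and the sym datatype are hypothetical names. Adding Z ::= d e leaves all three sets unchanged, so the same sets serve both grammars.

```sml
datatype sym = T of string | N of string

(* sets copied from the computed-sets table (an assumption: not recomputed here) *)
val nullable = ["Y"]
val firstOf = [("Z", ["d", "a", "b"]), ("Y", ["c"]), ("X", ["a", "b"])]
val followOf = [("Z", []), ("Y", ["e", "d", "a", "b"]), ("X", ["c", "d", "a", "b"])]

fun member x xs = List.exists (fn y => y = x) xs
fun lookup tbl x = case List.find (fn (y, _) => y = x) tbl of SOME (_, s) => s | NONE => []

fun nullableString syms = List.all (fn T _ => false | N x => member x nullable) syms
fun firstOfString [] = []
  | firstOfString (T t :: _) = [t]
  | firstOfString (N x :: rest) =
      if member x nullable then lookup firstOf x @ firstOfString rest else lookup firstOf x

(* the table is a list of ((row, column), rule numbers) cells *)
fun buildTable (rules : (string * sym list) list) =
  let
    (* add rule number r to a cell, ignoring repeats of the same rule *)
    fun add (key, r, cells) =
      case List.find (fn (k, _) => k = key) cells of
        NONE => cells @ [(key, [r])]
      | SOME (_, rs) =>
          if member r rs then cells
          else List.map (fn (k, rs') => if k = key then (k, rs' @ [r]) else (k, rs')) cells
    fun addRule ((i, (x, rhs)), cells) =
      let
        (* columns: First(s), plus Follow(X) when s is Nullable *)
        val cols = firstOfString rhs @ (if nullableString rhs then lookup followOf x else [])
      in List.foldl (fn (t, cs) => add ((x, t), i, cs)) cells cols end
    val numbered = ListPair.zip (List.tabulate (length rules, fn i => i), rules)
  in List.foldl addRule [] numbered end

(* any cell holding more than one rule is a duplicate entry: the grammar is not LL(1) *)
fun conflicts table = List.filter (fn (_, rs) => length rs > 1) table

(* rules 0-5 are the example grammar; rule 6 is the extra Z ::= d e *)
val grammar =
  [ ("Z", [N "X", N "Y", N "Z"]), ("Z", [T "d"]),
    ("Y", [T "c"]), ("Y", []),
    ("X", [T "a"]), ("X", [T "b", N "Y", T "e"]) ]
val withDE = grammar @ [("Z", [T "d", T "e"])]
```

Here conflicts (buildTable grammar) is empty, while conflicts (buildTable withDE) reports the single cell (Z, d) holding rules 1 and 6, i.e. Z ::= d and Z ::= d e.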
• Is it possible to put 2 grammar rules in the same box? Yes: add the rule Z ::= d e to the grammar. Since First(d e) = {d}, row Z now reads
    a: Z ::= XYZ    b: Z ::= XYZ    d: Z ::= d and Z ::= d e
  (rows Y and X are unchanged)

predictive parsing tables
• if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
  – Left-to-right parse, Left-most derivation, 1 symbol lookahead
• if it does contain duplicate entries, the grammar is not LL(1)
• in an LL(k) parsing table, the columns include every length-k sequence of terminals (for k = 2: aa ab ba bb ac ca ...)

another trick
• Previously, we saw that grammars with left recursion were problematic, but could be transformed into LL(1) in some cases
• the example non-LL(1) grammar we just saw:
    Z ::= X Y Z    Y ::= c    X ::= a
    Z ::= d        Y ::=      X ::= b Y e
    Z ::= d e
• how do we fix it?
• solution here is left-factoring: pull the common prefix d of the two Z rules into one rule and move the differing suffixes into a new non-terminal W
    Z ::= X Y Z    W ::=      Y ::= c    X ::= a
    Z ::= d W      W ::= e    Y ::=      X ::= b Y e

summary of RD parsing
• CFGs are good at specifying programming language structure
• parsing general CFGs is expensive, so we define parsers for simpler classes of CFG
  – LL(k), LR(k)
• we can build a recursive descent parser for LL(k) grammars by:
  – computing nullable, first and follow sets
  – constructing a parse table from the sets
  – checking for duplicate entries, which indicates failure
  – creating an ML program from the parse table
• if parser construction fails we can
  – rewrite the grammar (left factoring, eliminating left recursion) and try again
  – try to build a parser using some other method...such as using a bottom-up parsing technique
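The last step, creating an ML program from the parse table, can be sketched for the left-factored grammar. One assumption goes beyond the slides: the input ends in an explicit EOF token that follows the start symbol Z. Without some end marker, Follow(Z) and therefore Follow(W) are empty, so W ::= (empty) would get no entry in the table at all. With EOF, Follow(W) = {EOF} and Follow(Y) = {d, a, b, e}. The token names (Ta through Te, EOF), SyntaxError and accepts are hypothetical.

```sml
(* tokens for terminals a..e, plus the assumed end-of-input marker *)
datatype token = Ta | Tb | Tc | Td | Te | EOF

exception SyntaxError

val input : token list ref = ref []
fun peek () = case !input of t :: _ => t | [] => EOF
fun eat t =
  case !input of
    t' :: rest => if t' = t then input := rest else raise SyntaxError
  | [] => raise SyntaxError

(* one function per non-terminal; each case arm is a filled cell of the table,
   and every blank cell falls through to SyntaxError *)
fun z () =
      (case peek () of
         Ta => (x (); y (); z ())            (* Z ::= X Y Z *)
       | Tb => (x (); y (); z ())            (* Z ::= X Y Z *)
       | Td => (eat Td; w ())                (* Z ::= d W *)
       | _  => raise SyntaxError)
and w () =
      (case peek () of
         Te  => eat Te                       (* W ::= e *)
       | EOF => ()                           (* W ::= (empty), EOF in Follow(W) *)
       | _   => raise SyntaxError)
and y () =
      (case peek () of
         Tc => eat Tc                        (* Y ::= c *)
       | Ta => () | Tb => () | Td => () | Te => ()   (* Y ::= (empty), Follow(Y) *)
       | _  => raise SyntaxError)
and x () =
      (case peek () of
         Ta => eat Ta                        (* X ::= a *)
       | Tb => (eat Tb; y (); eat Te)        (* X ::= b Y e *)
       | _  => raise SyntaxError)

fun accepts toks = (input := toks; z (); eat EOF; true) handle SyntaxError => false
```

Both d and d e are now accepted: after eating d, function W decides on the single next token whether to eat e or stop.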
Bottom-up (Shift-Reduce) Parsing

shift-reduce parsing
• shift-reduce parsing
  – aka: bottom-up parsing
  – aka: LR(k): Left-to-right parse, Rightmost derivation, k-token lookahead
• more powerful than LL(k) parsers
• LALR variant:
  – the basis for the parsers of many programming languages
  – implemented in tools such as ML-Yacc

shift-reduce parsing example
Grammar:
  A ::= S EOF
  L ::= L ; S
  L ::= S
  S ::= ( L )
  S ::= id = num

Input from lexer: ( id = num ; id = num ) EOF

  State of parse so far    Yet to read                     Action
  (empty)                  ( id = num ; id = num ) EOF     SHIFT
  (                        id = num ; id = num ) EOF       SHIFT
  ( id                     = num ; id = num ) EOF          SHIFT
  ( id =                   num ; id = num ) EOF            SHIFT
  ( id = num               ; id = num ) EOF                REDUCE S ::= id = num
  ( S                      ; id = num ) EOF                REDUCE L ::= S
  ( L                      ; id = num ) EOF                SHIFT
  ( L ;                    id = num ) EOF                  SHIFT
  ( L ; id                 = num ) EOF                     SHIFT
  ( L ; id =               num ) EOF                       SHIFT
  ( L ; id = num           ) EOF                           REDUCE S ::= id = num
  ( L ; S                  ) EOF                           REDUCE L ::= L ; S
  ( L                      ) EOF                           SHIFT
  ( L )                    EOF                             REDUCE S ::= ( L )
  S                        EOF                             SHIFT
  S EOF                                                    REDUCE A ::= S EOF
  A                                                        ACCEPT

A successful parse!
Is this grammar LL(1)?
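The trace above can be replayed with a stack of symbols. The sketch below is an assumption-laden stand-in for a real parser: instead of consulting a parsing table, the hypothetical function reduce hard-codes when to reduce by matching the top of the stack against each rule's right-hand side (and, for L ::= S, checking that a ( sits just below the S). That policy happens to be correct for this grammar only; an LR parser gets the same decisions from its table. The names sym, reduce and parse are illustrative.

```sml
datatype sym = LP | RP | SEMI | ID | EQ | NUM | EOF | S | L | A

(* try to reduce the top of the stack (head of the list = top); NONE means "shift" *)
fun reduce stack =
  case stack of
    NUM :: EQ :: ID :: rest => SOME ("reduce S ::= id = num", S :: rest)
  | RP :: L :: LP :: rest   => SOME ("reduce S ::= ( L )", S :: rest)
  | S :: SEMI :: L :: rest  => SOME ("reduce L ::= L ; S", L :: rest)
  | S :: LP :: rest         => SOME ("reduce L ::= S", L :: LP :: rest)
  | [EOF, S]                => SOME ("reduce A ::= S EOF", [A])
  | _                       => NONE

(* run the parser, returning the actions taken and whether the input was accepted *)
fun parse input =
  let
    fun loop (stack, input, actions) =
      case (stack, input) of
        ([A], []) => (rev ("accept" :: actions), true)
      | _ =>
          (case reduce stack of
             SOME (act, stack') => loop (stack', input, act :: actions)
           | NONE =>
               (case input of
                  t :: rest => loop (t :: stack, rest, "shift" :: actions)
                | [] => (rev ("error" :: actions), false)))
  in loop ([], input, []) end
```

On the example input this produces the same 17 steps as the trace, with step 12 being the reduction by L ::= L ; S.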
Shift-reduce algorithm
• Parser keeps track of
  – position in current input (what input to read next)
  – a stack of terminal & non-terminal symbols representing the “parse so far”
• Based on the next input symbol & the stack, the parsing table indicates
  – shift: push next input onto top of stack
  – reduce R:
    • top of stack should match RHS of rule R
    • replace top of stack with LHS of rule R
  – error
  – accept (we shift EOF & can reduce what remains on stack to start symbol)

Shift-reduce algorithm (a detail)
• The parser summarizes the current “parse state” using an integer
  – the integer is actually a state in a finite automaton
  – the current parse state can be computed by running the automaton over the current parse stack
• Revised algorithm: based on the next input symbol & the parse state (as opposed to the entire stack), the parsing table indicates
  – shift s:
    • push next input onto top of stack and move automaton into state s
  – reduce R & goto s:
    • top of stack should match RHS of rule R
    • replace top of stack with LHS of rule R
    • move automaton into state s
  – error
  – accept

shift-reduce parsing
• Like LL parsing, shift-reduce parsing does not always work. What sort of grammar rules make shift-reduce parsing impossible?
  – Shift-Reduce errors: can’t decide whether to Shift or Reduce
  – Reduce-Reduce errors: can’t decide whether to Reduce by R1 or R2

shift-reduce errors
Grammar:
  A ::= S EOF
  S ::= S + S
  S ::= S * S
  S ::= id
Input from lexer: id + id * id EOF
State of parse so far: S + S    (yet to read: * id EOF)
• reduce by rule (S ::= S + S), or
• shift the * ???
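The revised algorithm keeps only a stack of automaton states. The sketch below runs it on the grammar from the shift-reduce parsing example (A ::= S EOF, L ::= L ; S | S, S ::= ( L ) | id = num). The ACTION and GOTO tables were worked out by hand from this grammar's LR(0) item sets; they are an assumption of the sketch, not tables from the slides or output from ML-Yacc. Accepting in state 3 on EOF stands for "shift EOF and reduce A ::= S EOF". All names (act, action, goto, rule, parse) are hypothetical.

```sml
datatype tok = LP | RP | SEMI | ID | EQ | NUM | EOF
datatype nonterm = NS | NL
datatype act = Shift of int | Reduce of int | Accept | Error

(* rules as (LHS, length of RHS):
   1: L ::= L ; S   2: L ::= S   3: S ::= ( L )   4: S ::= id = num *)
fun rule 1 = (NL, 3)
  | rule 2 = (NL, 1)
  | rule 3 = (NS, 3)
  | rule 4 = (NS, 3)
  | rule _ = raise Fail "no such rule"

(* ACTION table; states 5, 7, 9 and 10 hold one completed item and reduce on any token *)
fun action (0, LP) = Shift 1
  | action (0, ID) = Shift 2
  | action (1, LP) = Shift 1
  | action (1, ID) = Shift 2
  | action (2, EQ) = Shift 6
  | action (3, EOF) = Accept
  | action (4, RP) = Shift 7
  | action (4, SEMI) = Shift 8
  | action (5, _) = Reduce 2
  | action (6, NUM) = Shift 9
  | action (7, _) = Reduce 3
  | action (8, LP) = Shift 1
  | action (8, ID) = Shift 2
  | action (9, _) = Reduce 4
  | action (10, _) = Reduce 1
  | action _ = Error

(* GOTO table: the state to enter after reducing to a non-terminal *)
fun goto (0, NS) = 3
  | goto (1, NL) = 4
  | goto (1, NS) = 5
  | goto (8, NS) = 10
  | goto _ = raise Fail "no goto entry"

(* the top of the state stack is the current parse state *)
fun parse (input : tok list) : bool =
  let
    fun loop (states as s :: _, toks) =
          let val t = case toks of t :: _ => t | [] => EOF
          in
            case action (s, t) of
              Shift s' => loop (s' :: states, tl toks)
            | Reduce r =>
                let
                  val (lhs, n) = rule r
                  val states' = List.drop (states, n)   (* pop one state per RHS symbol *)
                in loop (goto (hd states', lhs) :: states', toks) end
            | Accept => true
            | Error => false
          end
      | loop ([], _) = false
  in loop ([0], input) end
```

Note that a reduce step never looks at the symbols themselves: it pops as many states as the rule's right-hand side has symbols and lets GOTO, from the state uncovered underneath, pick the next state.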
• notice, this is an ambiguous grammar
  – we are always going to need some mechanism for resolving the outstanding ambiguity before parsing

shift-reduce errors
• some unambiguous grammars can’t be parsed by LR(1) parsers either
Grammar:
  A ::= S id EOF
  S ::= E ;
  E ::= E ; id
  E ::= id
Input from lexer: id ; id EOF
State of parse so far: E ;    (next input: id)
• reduce by rule (S ::= E ;), or
• shift the id
  – the input might instead be id ; id ; id EOF, making shifting correct; one token of lookahead (id) cannot tell the two inputs apart

reduce-reduce errors
Grammar:
  A ::= S EOF
  S ::= ( E )
  S ::= E
  E ::= ( E )
  E ::= E + E
  E ::= id
Input from lexer: ( id ) EOF
State of parse so far: ( E )
• reduce by rule (S ::= ( E )), or
• reduce by rule (E ::= ( E ))

Summary
• Top-down Parsing
  – simple to understand and implement
  – you can code it yourself using nullable, first, follow sets
  – excellent for quick & dirty parsing jobs
• Bottom-up Parsing
  – more complex: uses stack & table
  – more powerful
  – Bonus: tools do the work for you ==> ML-Yacc
• but you need to understand how shift-reduce & reduce-reduce errors can arise
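For the ambiguous grammar S ::= S + S | S * S | id, Yacc-style tools let the grammar writer resolve the shift-reduce error with precedence and associativity declarations (for instance %left). The sketch below hand-codes that idea rather than using any tool: with S op S on the stack, it reduces if the stacked operator's precedence is at least that of the lookahead (equal precedence reduces, which gives left associativity) and shifts otherwise. The precedence levels and all names (tree, entry, prec, parse) are hypothetical.

```sml
datatype tree = Id | Plus of tree * tree | Times of tree * tree
datatype tok = ID | PLUS | TIMES | EOF

(* stack entries: a parsed S (with its tree) or an operator token *)
datatype entry = Sym of tree | Op of tok

(* hypothetical precedence levels: * binds tighter than +; anything else is lowest *)
fun prec PLUS = 1
  | prec TIMES = 2
  | prec _ = 0

fun build (PLUS, l, r) = Plus (l, r)
  | build (_, l, r) = Times (l, r)

fun parse (input : tok list) : tree option =
  let
    fun loop (stack, toks) =
      let val look = case toks of t :: _ => t | [] => EOF
      in
        case (stack, look) of
          (Sym r :: Op oper :: Sym l :: rest, _) =>
            if prec oper >= prec look
            then loop (Sym (build (oper, l, r)) :: rest, toks)   (* reduce S ::= S op S *)
            else loop (Op look :: stack, tl toks)                 (* shift the operator *)
        | ([Sym t], EOF) => SOME t                                (* accept *)
        | (_, ID) => loop (Sym Id :: stack, tl toks)              (* shift id, reduce S ::= id *)
        | (Sym _ :: _, PLUS) => loop (Op PLUS :: stack, tl toks)
        | (Sym _ :: _, TIMES) => loop (Op TIMES :: stack, tl toks)
        | _ => NONE                                               (* syntax error *)
      end
  in loop ([], input) end
```

On id + id * id, the state S + S with lookahead * shifts (since * has higher precedence), yielding Plus (Id, Times (Id, Id)); on id + id + id, equal precedence reduces first, yielding Plus (Plus (Id, Id), Id).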