COS 320 Compilers
David Walker

last time
• context free grammars (Appel 3.1)
  – terminals, non-terminals, rules
  – derivations & parse trees
  – ambiguous grammars
• recursive descent parsers (Appel 3.2)
  – parse LL(k) grammars
  – easy to write as ML programs
  – algorithms for automatic construction from a CFG

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
  1. S ::= IF E THEN S ELSE S
  2.     | BEGIN S L
  3.     | PRINT E
  4. L ::= END
  5.     | ; S L
  6. E ::= NUM = NUM

  datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

  (* getToken and error are assumed to be defined elsewhere *)
  val tok = ref (getToken ())
  fun advance () = tok := getToken ()
  fun eat t = if (!tok = t) then advance () else error ()

  (* each case arm is a parenthesized sequence; unexpected tokens are errors *)
  fun S () = case !tok of
      IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
    | BEGIN => (eat BEGIN; S (); L ())
    | PRINT => (eat PRINT; E ())
    | _     => error ()
  and L () = case !tok of
      END  => eat END
    | SEMI => (eat SEMI; S (); L ())
    | _    => error ()
  and E () = (eat NUM; eat EQ; eat NUM)

Constructing RD Parsers
• To construct an RD parser, we need to know what rule to apply when
  – we have seen a non-terminal X
  – we see the next terminal a in input
• We apply rule X ::= s when
  – a is the first symbol that can be generated by string s, OR
  – s can derive the empty string (is nullable) and a is the first symbol in any string that can follow X

Computing Nullable Sets
• Non-terminal X is Nullable only if the following constraints are satisfied (computed using iterative analysis)
  – base case:
      • if (X ::= ) (empty right-hand side) then X is Nullable
  – iterative/inductive case:
      • if (X ::= ABC...) and A, B, C, ... are all Nullable then X is Nullable

Computing First Sets
• First(X) is computed iteratively
  – base case:
      • if T is a terminal symbol then First(T) := {T}
      • otherwise (T is a non-terminal) First(T) := { }
  – iterative/inductive case:
      • if X is a non-terminal and (X ::= ABC...) then
          – First(X) := First(X) U First(ABC...)
            where First(ABC...) = F1 U F2 U F3 U ... and
              » F1 = First(A)
              » F2 = First(B), if A is Nullable
              » F3 = First(C), if A is Nullable & B is Nullable
              » ...
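
The Nullable and First computations above can be sketched in Python as two fixed-point loops. This is only an illustration under assumptions that are not in the slides: the grammar is encoded as a list of (lhs, rhs) tuples, an empty tuple stands for an empty right-hand side, and the names GRAMMAR, compute_nullable, first_of_string and compute_first are hypothetical. The example grammar is the small Z/Y/X grammar used in the worked example that follows.

```python
# Each rule is (lhs, rhs); an empty rhs tuple encodes an empty right-hand side.
GRAMMAR = [
    ("Z", ("X", "Y", "Z")),
    ("Z", ("d",)),
    ("Y", ("c",)),
    ("Y", ()),
    ("X", ("a",)),
    ("X", ("b", "Y", "e")),
]


def compute_nullable(grammar):
    """Iterate until no new non-terminal is found to be nullable."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            # all() over an empty rhs is True, which covers the base case.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_of_string(symbols, first, nullable):
    """First(ABC...) = First(A), plus First(B) if A is nullable, and so on."""
    result = set()
    for sym in symbols:
        result |= first[sym]
        if sym not in nullable:
            break
    return result


def compute_first(grammar, nullable):
    """Iterate the inductive rule until the First sets stop growing."""
    nonterminals = {lhs for lhs, _ in grammar}
    symbols = nonterminals | {s for _, rhs in grammar for s in rhs}
    # Base case: First(T) = {T} for terminals, {} for non-terminals.
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first_of_string(rhs, first, nullable)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


NULLABLE = compute_nullable(GRAMMAR)
FIRST = compute_first(GRAMMAR, NULLABLE)
```

For this grammar the sketch yields NULLABLE = {Y}, First(Z) = {a, b, d}, First(Y) = {c} and First(X) = {a, b}. The number of passes through the loop depends on the order in which rules are visited, but the final sets do not.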

Computing Follow Sets
• Follow(X) is computed iteratively
  – base case:
      • initially, we assume nothing in particular follows X
          – Follow(X) := { } for all X
  – inductive case:
      • if (Y ::= s1 X s2) for any strings s1, s2 then
          – Follow(X) := First(s2) U Follow(X)
      • if (Y ::= s1 X s2) for any strings s1, s2 then
          – Follow(X) := Follow(Y) U Follow(X), if s2 is Nullable

building a predictive parser

Grammar (Y ::= has an empty right-hand side):
  Z ::= X Y Z        Y ::= c        X ::= a
  Z ::= d            Y ::=          X ::= b Y e

nullable:
         base case    round 1
  Z      no           no
  Y      yes          yes
  X      no           no
after one round of induction, we realize we have reached a fixed point

first:
         base case    round 1    round 2    round 3
  Z      {}           d          d,a,b      d,a,b
  Y      {}           c          c          c
  X      {}           a,b        a,b        a,b
round 1, no fixed point; round 2, no fixed point; round 3, no more changes ==> fixed point

follow:
         base case    round 1     round 2
  Z      {}           {}          {}
  Y      {}           e,d,a,b     e,d,a,b
  X      {}           c,d,a,b     c,d,a,b
after one round of induction, no fixed point; after two rounds, fixed point
(in general, the order in which the sets are updated within a round can change how many rounds are needed)

Computed Sets:
         nullable    first     follow
  Z      no          d,a,b     {}
  Y      yes         c         e,d,a,b
  X      no          a,b       c,d,a,b

Build parsing table where row X, col T tells parser which clause to execute in function X with next-token T:
• if T ∈ First(s) then enter (X ::= s) in row X, col T
• if s is Nullable and T ∈ Follow(X) enter (X ::= s) in row X, col T

         a            b              c          d          e
  Z      Z ::= XYZ    Z ::= XYZ                 Z ::= d
  Y      Y ::=        Y ::=          Y ::= c    Y ::=      Y ::=
  X      X ::= a      X ::= b Y e

What are the blanks? --> syntax errors

Is it possible to put 2 grammar rules in the same box?
Add the rule Z ::= d e to the grammar. The computed sets do not change, but row Z, col d now holds two rules:

         a            b              c          d                     e
  Z      Z ::= XYZ    Z ::= XYZ                 Z ::= d, Z ::= d e
  Y      Y ::=        Y ::=          Y ::= c    Y ::=                 Y ::=
  X      X ::= a      X ::= b Y e

predictive parsing tables
• if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
  – Left-to-right parse, Left-most derivation, 1 symbol lookahead
• if not, the grammar is not LL(1)
• in an LL(k) parsing table, columns include every k-length sequence of terminals: aa ab ba bb ac ca ...

another trick
• Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases
• the example non-LL(1) grammar we just saw:
    Z ::= X Y Z        Y ::= c        X ::= a
    Z ::= d            Y ::=          X ::= b Y e
    Z ::= d e
• how do we fix it?
• solution here is left-factoring:
    Z ::= X Y Z        Y ::= c        X ::= a
    Z ::= d W          Y ::=          X ::= b Y e
    W ::=
    W ::= e

summary of RD parsing
• CFGs are good at specifying programming language structure
• parsing general CFGs is expensive so we define parsers for simpler classes of CFG
  – LL(k), LR(k)
• we can build a recursive descent parser for LL(k) grammars by:
  – computing nullable, first and follow sets
  – constructing a parse table from
    the sets
  – checking for duplicate entries, which indicates failure
  – creating an ML program from the parse table
• if parser construction fails we can
  – rewrite the grammar (left factoring, eliminating left recursion) and try again
  – try to build a parser using some other method... such as using a bottom-up parsing technique

Bottom-up (Shift-Reduce) Parsing

shift-reduce parsing
• shift-reduce parsing
  – aka: bottom-up parsing
  – aka: LR(k): Left-to-right parse, Rightmost derivation, k-token lookahead
• more powerful than LL(k) parsers
• LALR variant:
  – the basis for parsers for most modern programming languages
  – implemented in tools such as ML-Yacc

Shift-reduce algorithm
• Parser keeps track of
  – position in current input (what input to read next)
  – a stack of terminal & non-terminal symbols representing the “parse so far”
• Based on next input symbol & stack, parser table indicates
  – shift: push next input on to top of stack
  – reduce R:
      • top of stack should match RHS of rule
      • replace top of stack with LHS of rule
  – error
  – accept (we shift EOF & can reduce what remains on stack to start symbol)

shift-reduce parsing example

Grammar:
  A ::= S EOF
  L ::= L ; S
  L ::= S
  S ::= ( L )
  S ::= id = num

Input from lexer: ( id = num ; id = num ) EOF

  action                    state of parse so far    yet to read
  (start)                                            ( id = num ; id = num ) EOF
  SHIFT                     (                        id = num ; id = num ) EOF
  SHIFT                     ( id                     = num ; id = num ) EOF
  SHIFT                     ( id =                   num ; id = num ) EOF
  SHIFT                     ( id = num               ; id = num ) EOF
  REDUCE S ::= id = num     ( S                      ; id = num ) EOF
  REDUCE L ::= S            ( L                      ; id = num ) EOF
  SHIFT                     ( L ;                    id = num ) EOF
  SHIFT SHIFT SHIFT         ( L ; id = num           ) EOF
  REDUCE S ::= id = num     ( L ; S                  ) EOF
  REDUCE L ::= L ; S        ( L                      ) EOF
  SHIFT                     ( L )                    EOF
  REDUCE S ::= ( L )        S                        EOF
  SHIFT                     S EOF
  REDUCE A ::= S EOF        A
  ACCEPT                    A

A successful parse! Is this grammar LL(1)?

Shift-reduce algorithm
• Reinterpreting the entire stack on every iteration would be very slow
  – O(averageStackSize * input)
  – need an optimized algorithm that only looks at the top of the stack (plus the parsing table to figure out what to do): O(input)

Shift-reduce algorithm (details)
• The parser summarizes the current “parse state” using an integer
  – the integer is actually a state in a finite automaton
  – the current parse state can be computed by running the automaton over the current parse stack
• Revised algorithm: Based on next input symbol & the parse state (as opposed to the entire stack), parser table indicates
  – shift s:
      • push next input on to top of stack and move automaton into state s
  – reduce R & goto s:
      • top of stack should match RHS of rule
      • replace top of stack with LHS of rule
      • move automaton into state s
      • build parse tree corresponding to reduction R
  – error
  – accept

shift-reduce parsing

  Grammar: ????
  Input from lexer: ???? z??? EOF      (next token: z)
  State of parse so far: ??cd

Like LL parsing, shift-reduce parsing does not always work. What sort of grammar rules make shift-reduce parsing impossible?
• Shift-Reduce errors: can’t decide whether to Shift z or Reduce cd by a rule
• Reduce-Reduce errors: can’t decide whether to Reduce by R1 or R2

shift-reduce errors

Grammar:
  A ::= S EOF
  S ::= S + S
  S ::= S * S
  S ::= id

notice, this is an ambiguous grammar – we are always going to need some mechanism for resolving the outstanding ambiguity before parsing

Input from lexer: id + id * id EOF
State of parse so far: S + S      (next token: *)
• reduce by rule (S ::= S + S) or
• shift the * ???

reduce first:
  action                    state of parse so far
  REDUCE S ::= S + S        S
  SHIFT                     S *
  SHIFT                     S * id
  REDUCE S ::= id           S * S
  REDUCE S ::= S * S        S
  resulting parse tree groups the input as (id + id) * id

alternative parse (shift first):
  action                    state of parse so far
  SHIFT                     S + S *
  SHIFT                     S + S * id
  REDUCE S ::= id           S + S * S
  REDUCE S ::= S * S        S + S
  REDUCE S ::= S + S        S
  resulting parse tree groups the input as id + (id * id)

shift-reduce errors

Grammar:
  A ::= S id EOF
  S ::= E ;
  E ::= E ; E
  E ::= id

Input from lexer: id ; id EOF    or    id ; id ; id EOF
State of parse so far: E ;      (next token: id)
• reduce by rule (S ::= E ;) or
• shift the id
the input might be the second one, making shifting correct; for the first one, reducing is correct

reduce-reduce errors

Grammar:
  A ::= S EOF
  S ::= ( E )
  S ::= E
  E ::= ( E )
  E ::= E + E
  E ::= id

Input from lexer: ( id ) EOF
State of parse so far: ( E )
• reduce by rule (S ::= ( E )) or
• reduce by rule (E ::= ( E ))

Summary
• Top-down Parsing
  – simple to understand and implement
  – you can code it yourself using nullable, first, follow sets
  – excellent for quick & dirty parsing jobs
• Bottom-up Parsing
  – more complex: uses stack & table
  – more powerful
  – Bonus: tools do the work for you ==> ML-Yacc
      • but you need to understand how shift-reduce & reduce-reduce errors can arise
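
The top-down recipe mentioned in the summary (nullable, first and follow sets, then a parse table checked for duplicate entries) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the course's implementation: grammars are lists of (lhs, rhs) tuples with an empty tuple for an empty right-hand side, no end-of-input marker is modeled (so Follow(Z) stays empty, as in the slides), and the names analyze, build_table, conflicts, BASE, WITH_D_E and FACTORED are hypothetical.

```python
from collections import defaultdict


def analyze(grammar):
    """Compute nullable, First and Follow together by fixed-point iteration."""
    nonterminals = {lhs for lhs, _ in grammar}
    symbols = nonterminals | {s for _, rhs in grammar for s in rhs}
    nullable = set()
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    follow = {n: set() for n in nonterminals}

    def first_of(string):
        # First of a symbol string, skipping over nullable prefixes.
        out = set()
        for sym in string:
            out |= first[sym]
            if sym not in nullable:
                break
        return out

    def grow(target, extra):
        # Add extra to target in place; report whether target changed.
        if extra <= target:
            return False
        target |= extra
        return True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            changed |= grow(first[lhs], first_of(rhs))
            for i, sym in enumerate(rhs):
                if sym in nonterminals:
                    rest = rhs[i + 1:]
                    extra = first_of(rest)
                    if all(s in nullable for s in rest):
                        extra |= follow[lhs]
                    changed |= grow(follow[sym], extra)
    return nullable, first, follow, first_of


def build_table(grammar):
    """Map (non-terminal, next token) to the list of applicable rule bodies."""
    nullable, first, follow, first_of = analyze(grammar)
    table = defaultdict(list)
    for lhs, rhs in grammar:
        lookaheads = first_of(rhs)
        if all(s in nullable for s in rhs):
            lookaheads |= follow[lhs]
        for token in lookaheads:
            table[(lhs, token)].append(rhs)
    return table


def conflicts(table):
    """Cells holding more than one rule; empty means the table is LL(1)."""
    return {cell: rules for cell, rules in table.items() if len(rules) > 1}


BASE = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]
WITH_D_E = BASE + [("Z", ("d", "e"))]
FACTORED = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d", "W")),
    ("W", ()), ("W", ("e",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]
```

Calling conflicts(build_table(BASE)) and conflicts(build_table(FACTORED)) returns an empty dictionary, while conflicts(build_table(WITH_D_E)) reports the single cell (Z, d) holding both Z ::= d and Z ::= d e, which is the duplicate entry that left-factoring removes.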