LR(k) Parsing
CPSC 388
Ellen Walker
Hiram College

Bottom Up Parsing
• Start with tokens
• Build up rule RHS (right side)
• Replace RHS by LHS
• Done when stack is only start symbol
• (Working from leaves of tree to root)

Operations in Bottom-up Parsing
• Shift
  – Push the terminal from the beginning of the string to the top of the stack
• Reduce
  – Replace the string xyz at the top of the stack by a nonterminal A (assuming A -> xyz)
• Accept (when stack is $S’; empty input)

Sample Parse
• S’ -> S; S -> aSb | bSa | SS | ε
• String: abba
  – Stack = $, input = abba$; shift
  – Stack = $a, input = bba$; reduce S -> ε
  – Stack = $aS, input = bba$; shift
  – Stack = $aSb, input = ba$; reduce S -> aSb
  – Stack = $S, input = ba$; shift

Sample Parse (cont)
  – Stack = $S, input = ba$; shift
  – Stack = $Sb, input = a$; reduce S -> ε
  – Stack = $SbS, input = a$; shift
  – Stack = $SbSa, input = $; reduce S -> bSa
  – Stack = $SS, input = $; reduce S -> SS
  – Stack = $S, input = $; reduce S’ -> S
  – Stack = $S’, input = $; accept

LR(k) Parsing
• LR(0) grammars can be parsed with no lookahead (stack only)
• LR(1) grammars need 1 token of lookahead
• LR(k), k>1 use multi-token lookahead
• Most “real” grammars are LR(1)

Shift vs. Reduce
• First, build NFA of LR(0) items
• Transform NFA to DFA
• If no DFA state has a conflict, the grammar is LR(0): use DFA directly to parse (states indicate shift vs. reduce)
• Otherwise, use SLR(1) algorithm

LR(0) Items
• Rules with a marker (.) between what is on the stack and the remaining input
• For S -> (S) | a, the LR(0) items are:
  S -> .(S)   S -> (.S)   S -> (S.)   S -> (S).
  S -> .a     S -> a.
• S -> .(S) and S -> .a are initial items
• S -> (S). and S -> a. are complete items

Building NFA
• Each LR(0) item is a state
• Shift transitions
  – A -> .aB goes to A -> a.B on a
• Change of goal transitions
  – S -> x.Ay goes to A -> .aB on ε

More on NFA
• Initial state is “S’ -> .S”
• No final state, but acceptance happens in the S’ -> S.
  state
• Complete LR(0) items have no outbound transitions
  – We’ll worry about getting past them later
• No “reduce transitions”
  – “shift” on non-terminal used during reduce

NFA: S -> (S) | Ab ; A -> aA | ε
[Figure: NFA whose states are the LR(0) items S’ -> .S, S’ -> S., S -> .(S), S -> (.S), S -> (S.), S -> (S)., S -> .Ab, S -> A.b, S -> Ab., A -> .aA, A -> a.A, A -> aA., and A -> . (the item for A -> ε)]

NFA -> DFA
• Compute ε-closure (closure items)
  – All are initial items
• Use subset construction (kernel items)
• Grammar + kernel items are sufficient (closure items can be inferred)
• DFA is computed directly by YACC, etc.

DFA Construction Details
• For each symbol (terminal or nonterminal) after the marker, create a shift transition. The items reached this way are kernel items.
  – Example: S’ -> .S goes to S’ -> S. on S

DFA Construction Details
• If there are multiple shift transitions on the same symbol, these are combined into the same state.
• (Because the NFA will be in all those states at once).

Adding Closure Items
• When the marker is immediately before a non-terminal symbol, the closure items are all of the initial items for that symbol, e.g.
  – S’ -> .S (kernel item)
  – S -> .(S) (closure item)
  – S -> .Ab (closure item)
• These denote the change of goal transitions (which are all epsilon-transitions)

DFA “Final” States
• The DFA doesn’t actually accept the string, so the concept of “final” isn’t the same
• In JFLAP, mark any state where a reduction can take place as final

DFA: S -> (S) | Ab ; A -> aA | ε
[Figure: DFA for this grammar]

LR(0) Parsing
• At each step, push a state onto the stack, and do an action based on the current state
  – A -> a.xb (not a complete item): if x is a terminal, shift.
  – A -> aXb. (a complete item): reduce by A -> aXb

When Not LR(0)?
• Shift-reduce conflict
  – State contains both a complete item and a “shift” item (with leading terminal)
• Reduce-reduce conflict
  – State contains 2 or more complete items.
• Previous example is not LR(0)! (Why)?
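The closure and DFA construction steps above, together with the two conflict checks, can be sketched in Python. This is only an illustrative sketch, not what JFLAP or YACC actually do: the names `RULES`, `closure`, `goto`, `build_dfa`, and `conflicts` are hypothetical, an item is represented as a (rule number, marker position) pair, and the state numbers it produces need not match any numbering used elsewhere in these slides.

```python
# Grammar from the slides: S' -> S ; S -> (S) | Ab ; A -> aA | epsilon
# Each rule is (lhs, rhs); an empty tuple stands for epsilon.
RULES = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("A", "b")),
    ("A", ("a", "A")),
    ("A", ()),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def closure(items):
    """Add closure items: marker before nonterminal X adds every X -> .rhs."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = RULES[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(RULES):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the marker over `symbol` (kernel items), then add closure items."""
    kernel = {(r, d + 1) for r, d in state
              if d < len(RULES[r][1]) and RULES[r][1][d] == symbol}
    return closure(kernel)


def build_dfa():
    """Subset construction done directly on item sets."""
    start = closure({(0, 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {RULES[r][1][d] for r, d in state if d < len(RULES[r][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


def conflicts(state):
    """Return the kinds of LR(0) conflict present in one DFA state."""
    complete = [(r, d) for r, d in state if d == len(RULES[r][1])]
    shifts = [(r, d) for r, d in state
              if d < len(RULES[r][1]) and RULES[r][1][d] not in NONTERMINALS]
    found = []
    if complete and shifts:
        found.append("shift-reduce")
    if len(complete) > 1:
        found.append("reduce-reduce")
    return found


states, edges = build_dfa()
for number, state in enumerate(states):
    if conflicts(state):
        print(number, conflicts(state))
```

Under these assumptions, every state that holds the complete item A -> . next to an item with the marker before a terminal is reported as a shift-reduce conflict, and that includes the initial state.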
Simple LR(1)
• If a shift is possible, do it
• Else if there is a complete item for A, and the next terminal is in Follow(A), reduce A. Compute the next state by taking the A link from the last state left on the stack before pushing A
• Otherwise, there is a parse error

SLR(1) Table
• Rows are states, columns are symbols (terminal and nonterminal)
• Table entries (3 types):
  – sn: shift & goto state n (only for terminals)
  – rk: reduce using rule k (rule #’s start at 0 in JFLAP)
  – n: goto state n (only for nonterminals, after reduction)

Transitions and Table Entries
• Transition from state m to state n on terminal x
  – Put sn in table[m][x]
• Transition from state m to state n on nonterminal X
  – Put n in table[m][X]
• State m has a complete item for rule k, and terminal x is in Follow of the LHS of rule k
  – Put rk in table[m][x]
• State m contains “S’ -> S.”
  – Put acc (accept) in table[m][$]

SLR(1) Example
• Grammar
  – S -> (S) | Ab
  – A -> aA | ε
• Firsts
  – S: (, a, b
  – A: a, ε
• Follows
  – S: $, )
  – A: b

SLR(1) Example Table
State | (  | )  | a  | b  | $   | A | S
0     | s2 |    | s3 | r4 |     | 7 | 1
1     |    |    |    |    | acc |   |
2     | s2 |    | s3 | r4 |     | 7 | 4
3     |    |    | s3 | r4 |     | 5 |
4     |    | s6 |    |    |     |   |
5     |    |    |    | r3 |     |   |
6     |    | r1 |    |    | r1  |   |
7     |    |    |    | s8 |     |   |
8     |    | r2 |    |    | r2  |   |

SLR(1) Example
Stack       | Input  | Action
$0          | (aab)$ | shift
$0(2        | aab)$  | shift
$0(2a3      | ab)$   | shift
$0(2a3a3    | b)$    | reduce A -> ε
$0(2a3a3A5  | b)$    | reduce A -> aA
$0(2a3A5    | b)$    | reduce A -> aA
$0(2A7      | b)$    | shift

SLR(1) Example cont.
Stack       | Input  | Action
$0(2A7      | b)$    | shift
$0(2A7b8    | )$     | reduce S -> Ab
$0(2S4      | )$     | shift
$0(2S4)6    | $      | reduce S -> (S)
$0S1        | $      | reduce S’ -> S
$0S’        | $      | accept!

Another SLR(1) Grammar to Try
• S -> zMNz
• M -> aMa
• M -> z
• N -> bNb
• N -> z

Parsing Conflicts in SLR(1)
• Shift-reduce conflict
  – Prefer shift over reduce
• Reduce-reduce conflicts
  – Error in design of grammar (usually)
  – Possible to designate a grammar-specific choice

Dangling Else
• Remember: if C if C else S
  – Shift-preference puts else with inner if!
  – To put else with outer if, inner “if C” must be reduced to S first
• Good example of how language “evolved” to make it easy for the compiler!
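The table-driven loop behind the SLR(1) example can be sketched in Python as follows. The `ACTION` and `GOTO` dictionaries are my reading of the example table for S -> (S) | Ab ; A -> aA | ε, with rules numbered from 0, so treat those entries as an assumption. The names `RULES`, `ACTION`, `GOTO`, and `parse` are hypothetical, and the stack holds state numbers only.

```python
# Rules numbered from 0 as (lhs, length of rhs):
# 0: S' -> S   1: S -> (S)   2: S -> Ab   3: A -> aA   4: A -> epsilon
RULES = [("S'", 1), ("S", 3), ("S", 2), ("A", 2), ("A", 0)]

# ACTION[state][terminal] is ("s", n), ("r", k), or ("acc",)
ACTION = {
    0: {"(": ("s", 2), "a": ("s", 3), "b": ("r", 4)},
    1: {"$": ("acc",)},
    2: {"(": ("s", 2), "a": ("s", 3), "b": ("r", 4)},
    3: {"a": ("s", 3), "b": ("r", 4)},
    4: {")": ("s", 6)},
    5: {"b": ("r", 3)},
    6: {")": ("r", 1), "$": ("r", 1)},
    7: {"b": ("s", 8)},
    8: {")": ("r", 2), "$": ("r", 2)},
}

# GOTO[state][nonterminal] is the state to push after a reduction
GOTO = {0: {"S": 1, "A": 7}, 2: {"S": 4, "A": 7}, 3: {"A": 5}}


def parse(text):
    """Return the list of rule numbers used, or None on a parse error."""
    tokens = list(text) + ["$"]
    stack = [0]          # stack of state numbers
    pos = 0              # index of the next input token
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            return None  # empty table entry: parse error
        if action[0] == "s":
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":
            lhs, length = RULES[action[1]]
            if length:
                del stack[-length:]  # pop one state per RHS symbol
            # take the lhs link from the state now on top of the stack
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(action[1])
        else:
            return reductions


print(parse("(aab)"))
```

For the input (aab), this sketch reduces by rules 4, 3, 3, 2, 1 in that order before accepting, so the call prints [4, 3, 3, 2, 1].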
More than SLR(1)
• SLR(k) Parsing
  – Multiple-token lookahead (for shifts) and multiple-token follow information (for reductions)
• General LR(1) parsing
  – Include lookaheads in DFA construction
• LALR(1) parsing
  – Simplified state diagram for general LR(1)
  – What YACC / Bison uses
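To make “include lookaheads in DFA construction” concrete, here is a minimal Python sketch of the closure step for general LR(1) items, using the earlier grammar S -> (S) | Ab ; A -> aA | ε. It assumes the usual textbook rule: an item [A -> α.Bβ, a] adds [B -> .γ, b] for every b in FIRST(βa). The names `first_sets`, `first_of`, and `lr1_closure` are hypothetical, and an item is a (rule number, marker position, lookahead) triple.

```python
RULES = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("A", "b")),
    ("A", ("a", "A")),
    ("A", ()),            # A -> epsilon
]
NONTERMINALS = {lhs for lhs, _ in RULES}
EPS = ""                  # stands for epsilon inside FIRST sets


def first_of(symbols, first):
    """FIRST of a symbol string; includes EPS if the whole string can vanish."""
    out = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            out.add(sym)              # a terminal ends the scan
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_sets():
    """Iterate to a fixed point over all rules."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def lr1_closure(items, first):
    """Closure over (rule, marker, lookahead) items."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, look = work.pop()
        rhs = RULES[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # lookaheads of the new items: FIRST(beta followed by lookahead)
            looks = first_of(rhs[dot + 1:] + (look,), first)
            for i, (lhs, _) in enumerate(RULES):
                if lhs != rhs[dot]:
                    continue
                for b in looks:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return result


first = first_sets()
for item in sorted(lr1_closure({(0, 0, "$")}, first)):
    print(item)
```

In the initial state this gives the items for S the lookahead $ and the items for A the lookahead b. Starting instead from the kernel item S -> (.S) with lookahead $, the S items get the lookahead ). States that differ only in such lookaheads are the ones LALR(1) merges.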