LR0 and SLR(1) parsing (Louden 5.2-5.3, JFLAP

LR(k) Parsing
CPSC 388
Ellen Walker
Hiram College
Bottom Up Parsing
• Start with tokens
• Build up rule RHS (right side)
• Replace RHS by LHS
• Done when stack is only start symbol
• (Working from leaves of tree to root)
Operations in Bottom-up Parsing
• Shift:
– Push the terminal from the beginning of the
string to the top of the stack
• Reduce
– Replace the string xyz at the top of the
stack by a nonterminal A
(assuming A->xyz)
• Accept (when stack is $S’; empty input)
Sample Parse
• S’ -> S; S-> aSb | bSa | SS | e (e stands for the empty string ε)
• String: abba
– Stack = $, input = abba$; shift
– Stack = $a input = bba$; reduce S->e
– Stack = $aS input = bba$ ; shift
– Stack = $aSb input = ba$ ; reduce S->aSb
– Stack = $S input = ba$ ; shift
Sample Parse (cont)
– Stack = $S input = ba$ ; shift
– Stack = $Sb input = a$ ; reduce S->e
– Stack = $SbS input = a$ ; shift
– Stack = $SbSa input = $; reduce S->bSa
– Stack = $SS input = $; reduce S->SS
– Stack = $S input = $; reduce S’-> S
– Stack = $S’ input = $; accept
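The trace above can be replayed mechanically. The following is a minimal sketch, not a parser: the grammar is ambiguous, so the sequence of actions is supplied by hand and the code only checks that each reduction's right side really is on top of the stack. The names `replay` and `ACTIONS` are made up for this illustration, and `S'` is written with an ASCII apostrophe.

```python
def replay(tokens, actions):
    """Replay shift/reduce actions; return the (stack, input) snapshot after each step."""
    stack = ["$"]
    rest = list(tokens) + ["$"]
    trace = [("".join(stack), "".join(rest))]
    for action in actions:
        if action == "shift":
            # Move the next input terminal onto the stack.
            stack.append(rest.pop(0))
        else:
            lhs, rhs = action  # reduce by lhs -> rhs; rhs == "" is the empty rule
            if rhs:
                if stack[-len(rhs):] != list(rhs):
                    raise ValueError(f"{rhs} is not on top of the stack")
                del stack[-len(rhs):]
            stack.append(lhs)
        trace.append(("".join(stack), "".join(rest)))
    return trace


# The action sequence from the sample parse of "abba".
ACTIONS = [
    "shift", ("S", ""), "shift", ("S", "aSb"),
    "shift", ("S", ""), "shift", ("S", "bSa"),
    ("S", "SS"), ("S'", "S"),
]

for stack, rest in replay("abba", ACTIONS):
    print(f"Stack = {stack:<6} input = {rest}")
```

The last snapshot has only `$S'` on the stack and `$` as input, which is the accept condition.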
LR(k) Parsing
• LR(0) grammars can be parsed with no
lookahead (stack only)
• LR(1) grammars need 1 token of
lookahead
• LR(k), k>1 use multi-token
lookahead
• Most “real” grammars are LR(1)
Shift vs. Reduce
• First, build NFA of LR(0) items
• Transform NFA to DFA
• If no DFA state has a conflict, grammar is
LR(0) - use DFA directly to parse (states
indicate shift vs. reduce)
• Otherwise, use SLR(1) algorithm
LR(0) Items
• Rules with . between stack & input
• For S->(S) | a, the LR(0) items are:
S -> .(S)
S-> .a
S-> (.S)
S-> a.
S->(S.)
S->(S).
• S -> .(S) and S-> .a are initial items
• S-> (S). and S->a. are complete items
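The item list above can be generated by placing the marker at every position of every rule. A small sketch, assuming each grammar symbol is a single character and representing an item as a hypothetical `(lhs, rhs, dot)` tuple:

```python
def lr0_items(grammar):
    """Return every LR(0) item as (lhs, rhs, dot), where dot is the marker position."""
    return [
        (lhs, rhs, dot)
        for lhs, alternatives in grammar.items()
        for rhs in alternatives
        for dot in range(len(rhs) + 1)
    ]


def show(item):
    """Format an item the way the slides write it, e.g. S -> (.S)"""
    lhs, rhs, dot = item
    return f"{lhs} -> {rhs[:dot]}.{rhs[dot:]}"


GRAMMAR = {"S": ["(S)", "a"]}
ITEMS = lr0_items(GRAMMAR)
INITIAL = [show(i) for i in ITEMS if i[2] == 0]           # marker at the far left
COMPLETE = [show(i) for i in ITEMS if i[2] == len(i[1])]  # marker at the far right
print(INITIAL)
print(COMPLETE)
```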
Building NFA
• Each LR(0) item is a state
• Shift transitions
– A -> .aB --a--> A -> a.B
• Change of goal transitions
– S -> x.Ay --ε--> A -> .aB
More on NFA
• Initial state is “ S’ -> .S”
• No final state, but acceptance happens
in S’->S. state
• Complete LR(0) items have no
outbound transitions
– We’ll worry about getting past them later
• No “reduce transitions”
– “shift” on non-terminal used during reduce
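NFA: S -> (S) | Ab ; A -> aA | ε
[Figure: NFA of LR(0) items; states and transitions listed below]
• States (one per item)
– S’ -> .S, S’ -> S.
– S -> .(S), S -> (.S), S -> (S.), S -> (S).
– S -> .Ab, S -> A.b, S -> Ab.
– A -> .aA, A -> a.A, A -> aA., A -> .
• Shift transitions
– S’ -> .S --S--> S’ -> S.
– S -> .(S) --(--> S -> (.S)
– S -> (.S) --S--> S -> (S.)
– S -> (S.) --)--> S -> (S).
– S -> .Ab --A--> S -> A.b
– S -> A.b --b--> S -> Ab.
– A -> .aA --a--> A -> a.A
– A -> a.A --A--> A -> aA.
• Change of goal (ε) transitions
– From S’ -> .S and S -> (.S) to S -> .(S) and S -> .Ab
– From S -> .Ab and A -> a.A to A -> .aA and A -> .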
NFA -> DFA
• Compute ε-closure (closure items)
– All are initial items
• Use subset construction (kernel items)
• Grammar + kernel items are sufficient
(closure items can be inferred)
• DFA is computed directly by YACC, etc.
DFA Construction Details
• For each symbol (terminal or
nonterminal) after the marker, create a
shift transition. The items reached by the
shift are the kernel items of the new state.
– S’ -> .S --S--> S’ -> S.
DFA Construction Details
• If there are multiple shift transitions on
the same symbol, these are combined
into the same state.
• (Because the NFA will be in all those
states at once).
Adding Closure Items
• When the marker is immediately before a
non-terminal symbol, the closure items are all
of the initial forms for the new symbol, e.g.
– S’ -> .S (kernel item)
– S -> .(S) (closure item)
– S -> .Ab (closure item)
• These denote the change of goal transitions
(which are all epsilon-transitions)
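Adding closure items is a small worklist loop: whenever the marker sits before a nonterminal, add that nonterminal's initial items, and repeat for the items just added. A sketch under the same assumptions as the slides' example grammar (single-character symbols, `""` for the empty right side, ASCII `S'`); the `(lhs, rhs, dot)` item tuples are a representation chosen for this illustration:

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}


def closure(kernel):
    """Return the kernel items plus all closure items they imply."""
    items = set(kernel)
    todo = list(kernel)
    while todo:
        lhs, rhs, dot = todo.pop()
        # Marker immediately before a nonterminal: add its initial items.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], alternative, 0)
                if new_item not in items:
                    items.add(new_item)
                    todo.append(new_item)
    return frozenset(items)


START = closure({("S'", "S", 0)})
for lhs, rhs, dot in sorted(START):
    print(f"{lhs} -> {rhs[:dot]}.{rhs[dot:]}")
```

Because S -> .Ab has the marker before A, the start state also picks up A -> .aA and A -> . in addition to the two closure items listed above.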
DFA “Final” States
• The DFA doesn’t actually accept the
string, so the concept of “final” isn’t the
same
• In JFLAP, mark any state where a
reduction can take place as final
DFA: S -> (S) | Ab ; A -> aA | ε
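The DFA for this grammar can be computed directly from the grammar, without building the NFA first. The sketch below combines closure and shift ("goto") steps in a breadth-first loop. It is an illustration only: the state numbers it produces depend on the order in which symbols are visited and are not meant to match JFLAP's numbering. Symbols are assumed to be single characters, and `S'` uses an ASCII apostrophe.

```python
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}


def closure(kernel):
    """Kernel items plus the initial items of every nonterminal after a marker."""
    items, todo = set(kernel), list(kernel)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], alternative, 0)
                if new_item not in items:
                    items.add(new_item)
                    todo.append(new_item)
    return frozenset(items)


def goto(state, symbol):
    """Shift the marker over symbol in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa():
    """Return (states, edges); edges maps (state number, symbol) to state number."""
    start = closure({("S'", "S", 0)})
    states, edges, todo = [start], {}, [start]
    while todo:
        state = todo.pop(0)
        symbols = sorted({r[d] for (_, r, d) in state if d < len(r)})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:      # same item set = same DFA state
                states.append(target)
                todo.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


STATES, EDGES = build_dfa()
print(len(STATES), "states,", len(EDGES), "transitions")
```

Items that shift on the same symbol end up in one target state because `goto` moves the marker in all of them at once.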
LR(0) Parsing
• At each step, push a state onto the
stack, and do an action based on the
current state
– A->a.xb (not a complete item)
If x is terminal, shift.
– A->aXb. (a complete item)
Reduce by A->aXb
When Not LR(0)?
• Shift-reduce conflict
– State contains both a complete item and a
“shift” item (with leading terminal)
• Reduce-reduce conflict
– State contains 2 or more complete items.
• Previous example is not LR(0)! (Why?)
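The two conflict definitions above translate into a short check on a single DFA state. This sketch assumes items are `(lhs, rhs, dot)` tuples with single-character symbols; the function name `conflicts` and the hand-written item set for the start state of the previous example are for illustration only.

```python
TERMINALS = {"(", ")", "a", "b"}


def conflicts(state, terminals=TERMINALS):
    """Return the kinds of LR(0) conflict present in one set of items."""
    complete = [it for it in state if it[2] == len(it[1])]
    shifts = [it for it in state
              if it[2] < len(it[1]) and it[1][it[2]] in terminals]
    found = []
    if complete and shifts:
        found.append("shift-reduce")
    if len(complete) > 1:
        found.append("reduce-reduce")
    return found


# Start state of the previous example: S' -> .S plus its closure items.
STATE0 = {
    ("S'", "S", 0), ("S", "(S)", 0), ("S", "Ab", 0),
    ("A", "aA", 0), ("A", "", 0),
}
print(conflicts(STATE0))
```

The empty rule A -> . is complete as soon as it is added, so it sits in the same state as the shift items S -> .(S) and A -> .aA.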
Simple LR(1)
• If a shift is possible, do it
• Else if there is a complete item for A,
and the next terminal is in Follow(A),
reduce A. Compute the next state by
taking the A link from the last state left
on the stack before pushing A
• Otherwise, there is a parse error
SLR(1) Table
• Rows are states, columns are symbols
(terminal and nonterminal)
• Table entries (3 types):
– sn: shift & goto state n
(only for terminals)
– rk: reduce using rule k
(rule #’s start at 0 in JFLAP)
– n: goto state n
(only for nonterminals, after reduction)
Transitions and Table Entries
• Transition from state m to state n on terminal
x
– Put sn in table [m][x]
• Transition from state m to state n on
nonterminal X
– Put n in table [m][X]
• State m has a complete item for rule k, and
terminal x is in Follow of the LHS of rule k
– Put rk in table[m][x]
• State m contains the complete item S’ -> S.
– Put acc (accept) in table[m][$]
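The four rules above can be sketched as a table-filling function. To keep the input small, this example uses the grammar S -> (S) | a from the LR(0) Items slide with rules numbered 0: S' -> S, 1: S -> (S), 2: S -> a. The DFA below (`EDGES`, `COMPLETE`) was worked out by hand for this illustration and its state numbers are an assumption, not JFLAP output.

```python
TERMINALS = {"(", ")", "a", "$"}
RULES = [("S'", "S"), ("S", "(S)"), ("S", "a")]
FOLLOW = {"S": {"$", ")"}}

# Hand-built DFA: (state, symbol) -> next state.
EDGES = {
    (0, "S"): 1, (0, "("): 2, (0, "a"): 3,
    (2, "S"): 4, (2, "("): 2, (2, "a"): 3,
    (4, ")"): 5,
}
# State -> rule number of the complete item it contains.
COMPLETE = {1: 0, 3: 2, 5: 1}


def build_table():
    table = {}

    def put(key, entry):
        # Two different entries in one cell would be an SLR(1) conflict.
        if table.get(key, entry) != entry:
            raise ValueError(f"conflict at {key}: {table[key]} vs {entry}")
        table[key] = entry

    for (m, symbol), n in EDGES.items():
        put((m, symbol), f"s{n}" if symbol in TERMINALS else str(n))
    for m, k in COMPLETE.items():
        if k == 0:                       # complete item S' -> S.
            put((m, "$"), "acc")
        else:
            for x in FOLLOW[RULES[k][0]]:
                put((m, x), f"r{k}")
    return table


TABLE = build_table()
for key in sorted(TABLE):
    print(key, TABLE[key])
```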
SLR(1) Example
• Grammar
– S -> (S) | Ab
– A -> aA | ε
• Firsts
– S: (, a, b
– A: a, ε
• Follows
– S: $, )
– A: b
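The First and Follow sets above can be checked with the usual fixed-point computation: keep applying the rules until no set grows. This is a sketch of that textbook method, assuming single-character symbols, `""` for the empty right side, and ASCII `S'`; the function names are chosen for this illustration.

```python
EPS = "ε"
GRAMMAR = {"S'": ["S"], "S": ["(S)", "Ab"], "A": ["aA", ""]}


def first_of(symbols, first):
    """First set of a string of symbols, given the nonterminals' First sets."""
    out = set()
    for sym in symbols:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol can vanish (or the string is empty)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="S'"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:       # nothing (visible) after sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FIRST["S"], FIRST["A"], FOLLOW["S"], FOLLOW["A"])
```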
SLR(1) Example Table
State  (    )    a    b    $    A    S
0      s2        s3   r4        7    1
1                          acc
2      s2        s3   r4        7    4
3                s3   r4        5
4           s6
5                     r3
6           r1             r1
7                     s8
8           r2             r2
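A table like the one above is driven by a short loop that keeps a stack of states. The sketch below transcribes the table entries as read from the slide (shift, reduce and accept entries in `ACTION`, nonterminal columns in `GOTO`) and uses JFLAP-style rule numbers starting at 0. The function name `parse` and the dictionary layout are choices made for this illustration.

```python
# Rule numbers: 0: S' -> S, 1: S -> (S), 2: S -> Ab, 3: A -> aA, 4: A -> ε
RULES = [("S'", "S"), ("S", "(S)"), ("S", "Ab"), ("A", "aA"), ("A", "")]

ACTION = {
    (0, "("): "s2", (0, "a"): "s3", (0, "b"): "r4",
    (1, "$"): "acc",
    (2, "("): "s2", (2, "a"): "s3", (2, "b"): "r4",
    (3, "a"): "s3", (3, "b"): "r4",
    (4, ")"): "s6",
    (5, "b"): "r3",
    (6, ")"): "r1", (6, "$"): "r1",
    (7, "b"): "s8",
    (8, ")"): "r2", (8, "$"): "r2",
}
GOTO = {(0, "A"): 7, (0, "S"): 1, (2, "A"): 7, (2, "S"): 4, (3, "A"): 5}


def parse(text):
    """Return the list of reductions used, or None on a parse error."""
    tokens = list(text) + ["$"]
    states = [0]
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((states[-1], tokens[pos]))
        if entry is None:
            return None                       # empty cell: parse error
        if entry == "acc":
            return reductions
        if entry[0] == "s":                   # shift & goto state n
            states.append(int(entry[1:]))
            pos += 1
        else:                                 # reduce using rule k
            lhs, rhs = RULES[int(entry[1:])]
            if rhs:
                del states[-len(rhs):]        # pop one state per RHS symbol
            states.append(GOTO[(states[-1], lhs)])
            reductions.append(f"{lhs} -> {rhs or 'ε'}")


print(parse("(aab)"))
print(parse("(a)"))
```

After a reduction, the next state comes from the nonterminal column of the state left on top of the stack, as described on the Simple LR(1) slide.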
SLR(1) Example
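Stack           Input     Action
$0              (aab)$    shift
$0(2            aab)$     shift
$0(2a7          ab)$      shift
$0(2a7a7        b)$       reduce A -> ε
$0(2a7a7A8      b)$       reduce A -> aA
$0(2a7A8        b)$       reduce A -> aA
$0(2A5          b)$       shift
(The state numbers in this trace differ from those in the table: here a
shifts to state 7, while the table shifts a to state 3. The sequence of
shifts and reductions is the same.)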
SLR(1) Example cont.
Stack           Input     Action
$0(2A5          b)$       shift
$0(2A5b6        )$        reduce S -> Ab
$0(2S3          )$        shift
$0(2S3)4        $         reduce S -> (S)
$0S1            $         reduce S’ -> S
$0S’            $         accept!
Another SLR(1) Grammar to Try
• S -> zMNz
• M -> aMa
• M -> z
• N -> bNb
• N -> z
Parsing Conflicts in SLR(1)
• Shift-reduce conflict
– Prefer shift over reduce
• Reduce-reduce conflicts
– Error in design of grammar (usually)
– Possible to designate a grammar-specific
choice
Dangling Else
• Remember: if C if C else S
– Shift-preference puts else with inner if!
– To put else with outer if, inner “if C” must
be reduced to S first
• Good example of how language
“evolved” to make it easy for the
compiler!
More than SLR(1)
• SLR(k) Parsing
– Multiple-token lookahead (for shifts) and
multiple-token follow information (for
reductions)
• General LR(1) parsing
– Include lookaheads in DFA construction
• LALR(1) parsing
– Simplified state diagram for general LR(1)
– What YACC / Bison uses