More Parsing

COS 320
Compilers
David Walker
last time
• context free grammars (Appel 3.1)
– terminals, non-terminals, rules
– derivations & parse trees
– ambiguous grammars
• recursive descent parsers (Appel 3.2)
– parse LL(k) grammars
– easy to write as ML programs
– algorithms for automatic construction from a CFG
non-terminals:
S, E, L
terminals:
NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
1. S ::= IF E THEN S ELSE S
2.     | BEGIN S L
3.     | PRINT E
4. L ::= END
5.     | ; S L
6. E ::= NUM = NUM
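datatype token = NUM | IF | THEN | ELSE | BEGIN | END
               | PRINT | SEMI | EQ

(* getToken : unit -> token and error : unit -> unit are assumed to be
   supplied by the lexer and the error reporter *)
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (!tok = t) then advance () else error ()

(* one function per non-terminal; the next token selects the rule *)
fun S () = case !tok of
    IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
  | BEGIN => (eat BEGIN; S (); L ())
  | PRINT => (eat PRINT; E ())
  | _     => error ()
and L () = case !tok of
    END  => eat END
  | SEMI => (eat SEMI; S (); L ())
  | _    => error ()
and E () = (eat NUM; eat EQ; eat NUM)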
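To see the parser above run on a concrete input, here is a minimal Python sketch of the same three functions. The token names follow the datatype above; the `Parser` class, the list-based token stream, the `EOF` sentinel and the `parses` helper are assumptions made for illustration and are not part of the lecture's ML code.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent parser for the S/L/E grammar (illustrative sketch)."""

    def __init__(self, tokens):
        # Hypothetical token source: a list of token names plus an EOF sentinel.
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    @property
    def tok(self):
        return self.tokens[self.pos]

    def eat(self, t):
        # Consume the expected token or report a syntax error.
        if self.tok != t:
            raise ParseError(f"expected {t}, saw {self.tok}")
        self.pos += 1

    def S(self):
        if self.tok == "IF":
            self.eat("IF"); self.E(); self.eat("THEN"); self.S()
            self.eat("ELSE"); self.S()
        elif self.tok == "BEGIN":
            self.eat("BEGIN"); self.S(); self.L()
        elif self.tok == "PRINT":
            self.eat("PRINT"); self.E()
        else:
            raise ParseError(f"no rule for S on {self.tok}")

    def L(self):
        if self.tok == "END":
            self.eat("END")
        elif self.tok == "SEMI":
            self.eat("SEMI"); self.S(); self.L()
        else:
            raise ParseError(f"no rule for L on {self.tok}")

    def E(self):
        self.eat("NUM"); self.eat("EQ"); self.eat("NUM")


def parses(tokens):
    """Return True if the whole token list is a valid S."""
    p = Parser(tokens)
    try:
        p.S()
        p.eat("EOF")
        return True
    except ParseError:
        return False


print(parses("BEGIN PRINT NUM EQ NUM SEMI PRINT NUM EQ NUM END".split()))  # True
print(parses(["PRINT", "NUM", "EQ"]))                                      # False
```

Each method inspects only the current token before choosing a rule, which is what makes the grammar suitable for this style of parser.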
Constructing RD Parsers
• To construct an RD parser, we need to know
what rule to apply when
– we have seen a non terminal X
– we see the next terminal a in input
• We apply rule X ::= s when
– a is in First(s), i.e. a is the first symbol of some string that s can generate,
OR
– s can derive the empty string (s is nullable) and a is in Follow(X), i.e. a is the
first symbol in some string that can follow X
Computing Nullable Sets
• Non-terminal X is Nullable if and only if one of the
following rules makes it so
(computed using iterative analysis)
– base case:
• if (X ::= ) then X is Nullable
– inductive case:
• if (X ::= ABC...) and A, B, C, ... are all Nullable then
X is Nullable
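The iterative analysis can be sketched in Python as a loop that keeps applying the two cases until nothing changes. The rule representation (a list of `(lhs, rhs)` pairs with `[]` for an empty right-hand side) and the name `compute_nullable` are assumptions for illustration; the example grammar is the one the following slides work through.

```python
def compute_nullable(rules):
    """rules: list of (lhs, rhs) pairs; rhs is a list of symbols ([] if empty)."""
    nullable = set()
    changed = True
    while changed:                      # iterate until a fixed point is reached
        changed = False
        for lhs, rhs in rules:
            # all() over an empty rhs is True, which covers the base case
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


RULES = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]), ("Y", []),
    ("X", ["a"]), ("X", ["b", "Y", "e"]),
]

print(compute_nullable(RULES))   # {'Y'}
```

Terminals never appear as a left-hand side, so they are never added to the set.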
Computing First Sets
• First(X) is computed iteratively
– base case:
• if T is a terminal symbol then First (T) = {T}
– inductive case:
• if X is a non-terminal and (X ::= ABC...) then
– First (X) = First (X) U First (ABC...)
where First(ABC...) = F1 U F2 U F3 U ... and
» F1 = First (A)
» F2 = First (B), if A is Nullable
» F3 = First (C), if A is Nullable & B is Nullable
» ...
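A possible Python rendering of this computation is sketched below. It assumes the Nullable set is already known and passed in; the rule representation and the names `first_of_string` and `compute_first` are illustrative choices, and non-terminals start from the empty set rather than from the slides' base-case table.

```python
def first_of_string(symbols, first, nullable):
    """First(ABC...) = F1 U F2 U ..., stopping at the first non-nullable symbol."""
    result = set()
    for s in symbols:
        result |= first[s]
        if s not in nullable:
            break
    return result


def compute_first(rules, terminals, nullable):
    nonterminals = {lhs for lhs, _ in rules}
    first = {t: {t} for t in terminals}           # base case: First(T) = {T}
    first.update({n: set() for n in nonterminals})
    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for lhs, rhs in rules:
            new = first_of_string(rhs, first, nullable)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


RULES = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]), ("Y", []),
    ("X", ["a"]), ("X", ["b", "Y", "e"]),
]
FIRST = compute_first(RULES, {"a", "b", "c", "d", "e"}, nullable={"Y"})
print(sorted(FIRST["Z"]), sorted(FIRST["Y"]), sorted(FIRST["X"]))
# ['a', 'b', 'd'] ['c'] ['a', 'b']
```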
Computing Follow Sets
• Follow(X) is computed iteratively
– base case:
• initially, we assume nothing in particular follows X
– (Follow (X) is initially { })
– inductive case:
• if (Y ::= s1 X s2) for any strings s1, s2 then
– Follow (X) = First (s2) U Follow (X)
• if (Y ::= s1 X s2) for any strings s1, s2 and s2 is Nullable then
– Follow (X) = Follow (Y) U Follow (X)
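The two inductive cases can be sketched in Python as follows. The Nullable and First sets are assumed to be available already and are written out by hand here; the function names and the rule representation are illustrative, and, as in the slides, no end-of-file marker is added to the Follow set of the start symbol.

```python
def first_of_string(symbols, first, nullable):
    """First of a string of symbols, given First of each symbol."""
    result = set()
    for s in symbols:
        result |= first[s]
        if s not in nullable:
            break
    return result


def compute_follow(rules, first, nullable):
    follow = {lhs: set() for lhs, _ in rules}      # base case: Follow(X) = {}
    changed = True
    while changed:                                  # iterate to a fixed point
        changed = False
        for lhs, rhs in rules:                      # rule  lhs ::= s1 x s2
            for i, x in enumerate(rhs):
                if x not in follow:                 # terminals have no Follow set
                    continue
                s2 = rhs[i + 1:]
                new = first_of_string(s2, first, nullable)
                if all(s in nullable for s in s2):  # s2 is Nullable
                    new |= follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


RULES = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]), ("Y", []),
    ("X", ["a"]), ("X", ["b", "Y", "e"]),
]
NULLABLE = {"Y"}
FIRST = {"a": {"a"}, "b": {"b"}, "c": {"c"}, "d": {"d"}, "e": {"e"},
         "Z": {"a", "b", "d"}, "Y": {"c"}, "X": {"a", "b"}}

FOLLOW = compute_follow(RULES, FIRST, NULLABLE)
print(sorted(FOLLOW["Z"]), sorted(FOLLOW["Y"]), sorted(FOLLOW["X"]))
# [] ['a', 'b', 'd', 'e'] ['a', 'b', 'c', 'd']
```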
building a predictive parser
Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

     nullable (base case)   nullable (after one round)
Z    no                     no
Y    yes                    yes
X    no                     no

after one round of induction, we realize we have reached a fixed point
building a predictive parser
Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

     nullable   first (base case)   first (one round)   first (two rounds)
Z    no         d                   d,a,b               d,a,b
Y    yes        c                   c                   c
X    no         a,b                 a,b                 a,b

after one round of induction, no fixed point
after two rounds of induction, no more changes ==> fixed point
building a predictive parser
Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

     nullable   first    follow (base case)   follow (one round)   follow (two rounds)
Z    no         d,a,b    {}                   {}                   {}
Y    yes        c        {}                   e,d,a,b              e,d,a,b
X    no         a,b      {}                   c,d,a,b              c,d,a,b

after one round of induction, no fixed point
after two rounds of induction, fixed point
(but notice, computing Follow(X) before Follow (Y) would have required 3rd round)
Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
Computed Sets:
     nullable   first    follow
Z    no         d,a,b    {}
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b
Build parsing table where row X, col T
tells parser which clause to execute in
function X with next-token T:
• if T ∈ First(s) then
enter (X ::= s) in row X, col T
• if s is Nullable and T ∈ Follow(X) then
enter (X ::= s) in row X, col T

     a            b              c          d          e
Z    Z ::= XYZ    Z ::= XYZ                 Z ::= d
Y    Y ::=        Y ::=          Y ::= c    Y ::=      Y ::=
X    X ::= a      X ::= b Y e
Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e
Computed Sets:
     nullable   first    follow
Z    no         d,a,b    {}
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b

     a            b              c          d          e
Z    Z ::= XYZ    Z ::= XYZ                 Z ::= d
Y    Y ::=        Y ::=          Y ::= c    Y ::=      Y ::=
X    X ::= a      X ::= b Y e

What are the blanks? --> syntax errors
Is it possible to put 2 grammar rules in the same box?
Grammar (with the extra rule Z ::= d e):
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
Computed Sets:
     nullable   first    follow
Z    no         d,a,b    {}
Y    yes        c        e,d,a,b
X    no         a,b      c,d,a,b

     a            b              c          d                      e
Z    Z ::= XYZ    Z ::= XYZ                 Z ::= d, Z ::= d e
Y    Y ::=        Y ::=          Y ::= c    Y ::=                  Y ::=
X    X ::= a      X ::= b Y e
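The two table-filling rules, and the question of whether two rules can land in the same box, can be sketched in Python. The Nullable, First and Follow sets are copied in by hand rather than recomputed; `build_table` and `conflicts` are hypothetical helper names, and each table cell holds a list so that duplicates stay visible.

```python
from collections import defaultdict

NULLABLE = {"Y"}
FIRST = {"a": {"a"}, "b": {"b"}, "c": {"c"}, "d": {"d"}, "e": {"e"},
         "Z": {"a", "b", "d"}, "Y": {"c"}, "X": {"a", "b"}}
FOLLOW = {"Z": set(), "Y": {"a", "b", "d", "e"}, "X": {"a", "b", "c", "d"}}

RULES = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]), ("Y", []),
    ("X", ["a"]), ("X", ["b", "Y", "e"]),
]


def first_of_string(symbols):
    result = set()
    for s in symbols:
        result |= FIRST[s]
        if s not in NULLABLE:
            break
    return result


def build_table(rules):
    """Map (non-terminal, terminal) to the list of rules entered in that box."""
    table = defaultdict(list)
    for lhs, rhs in rules:
        cols = first_of_string(rhs)               # T in First(s)
        if all(s in NULLABLE for s in rhs):       # s is Nullable
            cols |= FOLLOW[lhs]                   # T in Follow(X)
        for t in cols:
            table[(lhs, t)].append((lhs, rhs))
    return table


def conflicts(table):
    """Boxes holding more than one rule."""
    return {cell: entries for cell, entries in table.items() if len(entries) > 1}


print(conflicts(build_table(RULES)))                               # {}
print(sorted(conflicts(build_table(RULES + [("Z", ["d", "e"])]))))  # [('Z', 'd')]
```

Any `(row, column)` pair missing from the table corresponds to a blank box, i.e. a syntax error.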
predictive parsing tables
• if a predictive parsing table constructed this way
contains no duplicate entries, the grammar is
called LL(1)
– Left-to-right parse, Left-most derivation, 1 symbol
lookahead
• if not, the grammar is not LL(1)
• in an LL(k) parsing table, columns include every k-length sequence of terminals:
aa
ab
ba
bb
ac
ca
...
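To give a feel for how quickly the number of columns grows with k, here is a small Python sketch that enumerates the k-length sequences. The function name `lookahead_columns` is an assumption for illustration, and the sketch ignores shorter sequences that end at end-of-file.

```python
from itertools import product


def lookahead_columns(terminals, k):
    """All k-length sequences of terminals, one table column per sequence."""
    return ["".join(seq) for seq in product(sorted(terminals), repeat=k)]


print(lookahead_columns({"a", "b", "c"}, 2))
# ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print(len(lookahead_columns({"a", "b", "c", "d", "e"}, 3)))   # 125
```

With n terminals there are n^k such columns, which is one reason k = 1 is the common case in practice.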
another trick
• Previously, we saw that grammars with
left-recursion were problematic, but could
be transformed into LL(1) in some cases
• the example non-LL(1) grammar we just
saw:
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
• how do we fix it?
another trick
• Previously, we saw that grammars with
left-recursion were problematic, but could
be transformed into LL(1) in some cases
• the example non-LL(1) grammar we just
saw:
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
• solution here is left-factoring:
Z ::= X Y Z
Z ::= d W
W ::=
W ::= e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
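The left-factoring step can be sketched in Python for the alternatives of a single non-terminal. This is a one-pass illustration under stated assumptions: `left_factor` and `common_prefix` are hypothetical names, `fresh` is an assumed supplier of unused non-terminal names, and the new rules are not themselves factored again.

```python
def common_prefix(alternatives):
    """Longest run of symbols shared at the start of every alternative."""
    prefix = []
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(lhs, alternatives, fresh):
    """Factor the alternatives of `lhs` that begin with the same symbol."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None            # group by leading symbol
        groups.setdefault(key, []).append(alt)
    rules = []
    for key, alts in groups.items():
        if key is None or len(alts) == 1:        # nothing to factor
            rules.extend((lhs, alt) for alt in alts)
            continue
        prefix = common_prefix(alts)
        new = fresh()                            # e.g. W in the slide
        rules.append((lhs, prefix + [new]))
        rules.extend((new, alt[len(prefix):]) for alt in alts)
    return rules


names = iter(["W"])
for rule in left_factor("Z", [["X", "Y", "Z"], ["d"], ["d", "e"]], lambda: next(names)):
    print(rule)
# ('Z', ['X', 'Y', 'Z'])
# ('Z', ['d', 'W'])
# ('W', [])
# ('W', ['e'])
```

After factoring, the choice between the two former `d` alternatives is postponed until the parser is inside W, where one token of lookahead is enough.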
summary of RD parsing
• CFGs are good at specifying programming language structure
• parsing general CFGs is expensive so we define parsers for
simpler classes of CFG
– LL(k), LR(k)
• we can build a recursive descent parser for LL(k) grammars by:
– computing nullable, first and follow sets
– constructing a parse table from the sets
– checking for duplicate entries, which indicates failure
– creating an ML program from the parse table
• if parser construction fails we can
– rewrite the grammar (left factoring, eliminating left recursion) and try
again
– try to build a parser using some other method...such as using a bottom-up parsing technique
Bottom-up (Shift-Reduce) Parsing
shift-reduce parsing
• shift-reduce parsing
– aka: bottom-up parsing
– aka: LR(k) Left-to-right parse, Rightmost
derivation, k-token lookahead
• more powerful than LL(k) parsers
• LALR variant:
– the basis for parsers for most modern
programming languages
– implemented in tools such as ML-Yacc
shift-reduce parsing example
Grammar:
A ::= S EOF
L ::= L ; S
L ::= S
S ::= ( L )
S ::= id = num
Input from lexer: ( id = num ; id = num ) EOF

Action                   State of parse so far   Yet to read
(start)                                          ( id = num ; id = num ) EOF
SHIFT                    (                       id = num ; id = num ) EOF
SHIFT                    ( id                    = num ; id = num ) EOF
SHIFT                    ( id =                  num ; id = num ) EOF
SHIFT                    ( id = num              ; id = num ) EOF
REDUCE S ::= id = num    ( S                     ; id = num ) EOF
REDUCE L ::= S           ( L                     ; id = num ) EOF
SHIFT                    ( L ;                   id = num ) EOF
SHIFT, SHIFT, SHIFT      ( L ; id = num          ) EOF
REDUCE S ::= id = num    ( L ; S                 ) EOF
REDUCE L ::= L ; S       ( L                     ) EOF
SHIFT                    ( L )                   EOF
REDUCE S ::= ( L )       S                       EOF
SHIFT                    S EOF
REDUCE A ::= S EOF       A
ACCEPT

A successful parse! Is this grammar LL(1)?
Shift-reduce algorithm
• Parser keeps track of
– position in current input (what input to read next)
– a stack of terminal & non-terminal symbols representing the
“parse so far”
• Based on next input symbol & stack, parser table
indicates
– shift: push next input on to top of stack
– reduce R:
• top of stack should match RHS of rule
• replace top of stack with LHS of rule
– error
– accept (we shift EOF & can reduce what remains on stack to
start symbol)
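The stack discipline described above can be sketched in Python by replaying a given sequence of actions. The slides do not show the parse table itself, so the action list is written out by hand from the worked example; the function name `run` and the encoding of actions are assumptions for illustration.

```python
def run(tokens, actions):
    """Replay shift/reduce actions over `tokens`; return the final stack."""
    stack, pos = [], 0
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos])            # push next input symbol
            pos += 1
        else:
            lhs, rhs = action                    # reduce by lhs ::= rhs
            start = len(stack) - len(rhs)
            if start < 0 or stack[start:] != rhs:
                raise ValueError(f"top of stack does not match {rhs}")
            del stack[start:]                    # pop the RHS ...
            stack.append(lhs)                    # ... and push the LHS
    if pos != len(tokens):
        raise ValueError("input not fully consumed")
    return stack


ASSIGN = ("S", ["id", "=", "num"])
ACTIONS = (
    ["shift"] * 4 + [ASSIGN, ("L", ["S"])]
    + ["shift"] * 4 + [ASSIGN, ("L", ["L", ";", "S"])]
    + ["shift", ("S", ["(", "L", ")"])]
    + ["shift", ("A", ["S", "EOF"])]
)

print(run("( id = num ; id = num ) EOF".split(), ACTIONS))   # ['A']
```

In a real shift-reduce parser the table chooses each action from the next input symbol and the stack; here the choices are supplied up front so that only the stack manipulation is shown.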
Shift-reduce algorithm (a detail)
• The parser summarizes the current “parse state” using an
integer
– the integer is actually a state in a finite automaton
– the current parse state can be computed by running the automaton over
the current parse stack
• Revised algorithm: Based on next input symbol & the parse
state (as opposed to the entire stack), parser table indicates
– shift s:
• push next input on to top of stack and move automaton into state s
– reduce R & goto s:
• top of stack should match RHS of rule
• replace top of stack with LHS of rule
• move automaton into state s
– error
– accept
shift-reduce parsing
Grammar:
????
Input from lexer: ???? ???? EOF
State of parse so far: ????
Like LL parsing, shift-reduce parsing does not always work.
What sort of grammar rules make shift-reduce parsing impossible?
• Shift-Reduce errors: can’t decide whether to Shift or Reduce
• Reduce-Reduce errors: can’t decide whether to Reduce by R1 or R2
shift-reduce errors
Grammar:
A ::= S EOF
S ::= S + S
S ::= S * S
S ::= id
Input from lexer: ???? ???? EOF
State of parse so far: ????
shift-reduce errors
Grammar:
A ::= S EOF
S ::= S + S
S ::= S * S
S ::= id
Input from lexer: id + id * id EOF
State of parse so far: S + S
• reduce by rule (S ::= S + S) or
• shift the * ???
notice, this is an ambiguous grammar: we are always going to
need some mechanism for resolving the outstanding ambiguity
before parsing
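The effect of the two choices can be seen in a small Python sketch that parses `id + id * id` twice, once always reducing and once always shifting when the conflict arises. The `prefer` parameter is a hypothetical knob standing in for whatever mechanism resolves the ambiguity; trees are nested tuples, and the step `S ::= id` is left implicit by treating `"id"` as its own tree.

```python
def parse(tokens, prefer):
    """Shift-reduce parse for S ::= S + S | S * S | id.

    prefer: "shift" or "reduce", used when the stack ends in S op S
    and more input remains.
    """
    stack, rest = [], list(tokens)

    def can_reduce():
        return len(stack) >= 3 and stack[-2] in ("+", "*")

    while rest or can_reduce():
        if can_reduce() and (not rest or prefer == "reduce"):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((left, op, right))      # reduce S ::= S op S
        else:
            stack.append(rest.pop(0))            # shift
    return stack[0]


tokens = "id + id * id".split()
print(parse(tokens, "reduce"))   # (('id', '+', 'id'), '*', 'id')
print(parse(tokens, "shift"))    # ('id', '+', ('id', '*', 'id'))
```

Both results are legal parse trees for the same input, which is exactly why the parser cannot decide without extra information.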
shift-reduce errors
Grammar:
A ::= S id EOF
S ::= E ;
E ::= E ; E
E ::= id
Input from lexer: id ; id EOF
State of parse so far: E ;
• reduce by rule (S ::= E ;) or
• shift the id
input might be this, making shifting correct: id ; id ; id EOF
some unambiguous grammars can’t be parsed by LR(1) parsers either
reduce-reduce errors
Grammar:
A ::= S EOF
S ::= ( E )
S ::= E
E ::= ( E )
E ::= E + E
E ::= id
Input from lexer: ( id ) EOF
State of parse so far: ( E )
• reduce by rule ( S ::= ( E ) ) or
• reduce by rule ( E ::= ( E ) )
Summary
• Top-down Parsing
– simple to understand and implement
– you can code it yourself using nullable, first, follow
sets
– excellent for quick & dirty parsing jobs
• Bottom-up Parsing
– more complex: uses stack & table
– more powerful
– Bonus: tools do the work for you ==> ML-Yacc
• but you need to understand how shift-reduce & reduce-reduce errors can arise