COP4020 Programming Languages Computing LL(1) parsing table

Prof. Xin Yuan
Overview


LL(1) parsing in action (Top-down parsing)
Computing LL(1) parsing table
5/29/2016
COP4020 Spring 2014

Using the parsing table, the predictive parsing program works with three things:

A stack of grammar symbols ($ on the bottom)
A string of input tokens ($ at the end)
A parsing table, M[NT, T], of productions

Algorithm:
Put ‘$ Start’ on the stack, with the start symbol on top ($ marks the bottom of the stack and the end of the input string).
1) if top == input == $ then accept
2) if top is a terminal and top == input then
   pop the top of the stack; advance to the next input symbol; goto 1
3) if top is a nonterminal
   if M[top, input] is a production then
      replace top with the right-hand side of the production, leftmost symbol on top; goto 1
   else error
4) else error
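
The algorithm above can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the names PRODUCTIONS, TABLE and parse are hypothetical, tokens are assumed to arrive as a list of strings, and the table entries are those of the expression grammar used in the example that follows.

```python
# Productions of the expression grammar, numbered as on the slides.
# An empty right-hand side stands for the empty string.
PRODUCTIONS = {
    1: ("E", ["T", "E'"]),
    2: ("E'", ["+", "T", "E'"]),
    3: ("E'", []),
    4: ("T", ["F", "T'"]),
    5: ("T'", ["*", "F", "T'"]),
    6: ("T'", []),
    7: ("F", ["(", "E", ")"]),
    8: ("F", ["id"]),
}

# M[nonterminal][token] = production number; a missing entry means error.
TABLE = {
    "E":  {"id": 1, "(": 1},
    "E'": {"+": 2, ")": 3, "$": 3},
    "T":  {"id": 4, "(": 4},
    "T'": {"+": 6, "*": 5, ")": 6, "$": 6},
    "F":  {"id": 8, "(": 7},
}


def parse(tokens, start="E"):
    """Return the production numbers used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # $ marks the end of the input
    stack = ["$", start]               # $ marks the bottom of the stack
    pos = 0
    used = []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":  # step 1: accept
            return used
        if top == tok:                 # step 2: match a terminal
            stack.pop()
            pos += 1
        elif top in TABLE:             # step 3: expand a nonterminal
            num = TABLE[top].get(tok)
            if num is None:
                raise SyntaxError(f"no entry M[{top}, {tok}]")
            stack.pop()
            # Push the right-hand side reversed so its leftmost symbol is on top.
            stack.extend(reversed(PRODUCTIONS[num][1]))
            used.append(num)
        else:                          # step 4: terminal mismatch
            raise SyntaxError(f"expected {top}, got {tok}")
```

For example, parse(["id", "+", "id", "*", "id"]) returns [1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3], the productions of a leftmost derivation in the order they are applied.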

Example:
(1) E->TE’
(2) E’->+TE’
(3) E’->ε
(4) T->FT’
(5) T’->*FT’
(6) T’->ε
(7) F->(E)
(8) F->id

Parsing table:
      id    +     *     (     )     $
E     (1)               (1)
E’          (2)               (3)   (3)
T     (4)               (4)
T’          (6)   (5)         (6)   (6)
F     (8)               (7)

Stack      input        production
$E         id+id*id$    E->TE’
$E’T       id+id*id$    T->FT’
$E’T’F     id+id*id$    F->id
$E’T’id    id+id*id$
$E’T’      +id*id$
…...
This produces a leftmost derivation:
E=>TE’=>FT’E’=>idT’E’=>….=>id+id*id
How to compute the parsing table for an LL(1) grammar?
Key: for every production, we need to decide which input tokens should select it.
When can E be expanded with production E->TE’?
Intuitively, on any token that can be the first token of a string derived from TE’.
This includes every first token of a string derived from T. What are they?
If T can derive the empty string (ε), we should also include the first tokens that can be derived from E’.
If E’ can also derive the empty string, we should include all tokens that can follow E.

When should E’ be expanded with production E’->ε?
How to compute the parsing table for an LL(1) grammar?
Intuition: for every production, we need to decide which input tokens should select it.
Case 1 (easy): E’->+TE’: expand on every token that can be the first token after expanding the right-hand side of the production (expanding +TE’).
Case 1 (harder): E->TE’: expand on every token that can be the first token after expanding TE’.
We call this the First set.

Case 2: E’->ε: there is no first token. Expand whenever we see a token that can potentially follow E’ in a sentential form (the Follow set).

For a production that can derive a string of tokens, find all possible first tokens.

A production N -> X Y Z should be expanded when the input token can be the first token of a string derived from X Y Z: First(X Y Z).

For a production that can derive the empty string, find all possible tokens that can follow the nonterminal.

When should we expand with E’->ε?
On any token that can potentially follow E’: Follow(E’).

First set and follow set
First(α): Here, α is a string of symbols. First(α) is the set of terminals that begin strings derived from α.

If α is the empty string or can derive the empty string, then the empty string (ε) is also in First(α).
Follow(A): Here, A is a nonterminal symbol. Follow(A) is the set of terminals that can immediately follow A in a sentential form.
Example:
S->iEtS | iEtSeS | a
E->b
First(a) = ?, First(iEtS) = ?, First(S) = ?
Follow(E) = ? Follow(S) = ?

Compute FIRST(X)

If a is a terminal, then FIRST(a) = {a}. (Case 1)
If X->ε, add ε to FIRST(X). (Case 2)
If X->Y1 Y2 ... Yk and Y1 Y2 ... Yi-1 =>* ε, add every non-ε symbol in FIRST(Yi) to FIRST(X). If Y1 Y2 ... Yk =>* ε, add ε to FIRST(X). (Case 3)

FIRST(Y1 Y2 ... Yk): similar to the third case.


E->TE’
E’->+TE’ | ε
T->FT’
T’->*FT’ | ε
F->(E) | id
FIRST(E) = ?
FIRST(E’)= ?
FIRST(T) = ?
FIRST(T’) = ?
FIRST(F) = ?
Computing first set
E->TE’
E’->+TE’ | ε
T->FT’
T’->*FT’ | ε
F->(E) | id

FIRST(E) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T) = {(, id}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
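
The three cases for FIRST can be sketched as a fixed-point iteration in Python. This is one possible formulation, not necessarily the one used in the course: the grammar encoding (a dictionary from nonterminal to a list of right-hand sides, with an empty list for ε) and the names EPS, first_of_string and compute_first are assumptions made for illustration.

```python
EPS = "ε"  # marker for the empty string (the name is an assumption)

# Expression grammar from the slides; an empty list is an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST(Y1 Y2 ... Yk), given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        # Case 1: a terminal's FIRST set is the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPS)                # every symbol can derive ε (Case 2 when k = 0)
    return result


def compute_first(grammar):
    """Repeat Case 3 over all productions until no FIRST set grows."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Calling compute_first(GRAMMAR) yields the same five sets as listed above.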

Compute Follow(A)

If S is the start symbol, add $ to Follow(S).
If A->αBβ, add First(β)-{ε} to Follow(B).
If A->αB, or A->αBβ and β =>* ε, add Follow(A) to Follow(B).

Note: you are looking at the right-hand side of productions!


E->TE’
E’->+TE’ | ε
T->FT’
T’->*FT’ | ε
F->(E) | id
First(E) = {(, id}, Follow(E) = {), $}
First(E’) = {+, ε}, Follow(E’) = {), $}
First(T) = {(, id}, Follow(T) = {+, ), $}
First(T’) = {*, ε}, Follow(T’) = {+, ), $}
First(F) = {(, id}, Follow(F) = {*, +, ), $}
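
The three Follow rules can be sketched the same way. In this sketch the First sets are copied from the lines above rather than recomputed, and the names GRAMMAR, FIRST, first_of_string and compute_follow are hypothetical. As the note says, the loop walks over the right-hand side of every production and looks at what comes after each nonterminal B.

```python
EPS = "ε"  # marker for the empty string (the name is an assumption)

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# First sets of the nonterminals, taken as given from the slide.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """First(β) for a string of grammar symbols β."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal is its own First set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                      # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                # Follow is defined for nonterminals only
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}    # rule 2: First(β) - {ε}
                    if EPS in beta_first:       # rule 3: β is empty or derives ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

Calling compute_follow(GRAMMAR, "E") reproduces the Follow sets listed above.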

How to construct the parsing table?
With First(α) and Follow(A), we can build the parsing table. For each production A->α:

Add A->α to M[A, t] for each terminal t in First(α).
If First(α) contains the empty string:
   Add A->α to M[A, t] for each t in Follow(A).
   If $ is in Follow(A), add A->α to M[A, $].

Make each undefined entry of M an error.
Construct the parsing table for the following grammar:
E->TE’
E’->+TE’ | ε
T->FT’
T’->*FT’ | ε
F->(E) | id
First(E) = {(, id}, Follow(E) = {), $}
First(E’) = {+, ε}, Follow(E’) = {), $}
First(T) = {(, id}, Follow(T) = {+, ), $}
First(T’) = {*, ε}, Follow(T’) = {+, ), $}
First(F) = {(, id}, Follow(F) = {*, +, ), $}
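
One way to check a hand-built table is to run the construction rule mechanically. The sketch below assumes the First and Follow sets listed above and uses hypothetical names (PRODUCTIONS, build_table); here $ is stored as an ordinary member of Follow(A), so the separate $ step of the rule is covered by the same loop. A table cell that would receive two productions raises an error.

```python
EPS = "ε"  # marker for the empty string (the name is an assumption)

# (number, left-hand side, right-hand side); [] is an ε-production.
PRODUCTIONS = [
    (1, "E", ["T", "E'"]),
    (2, "E'", ["+", "T", "E'"]),
    (3, "E'", []),
    (4, "T", ["F", "T'"]),
    (5, "T'", ["*", "F", "T'"]),
    (6, "T'", []),
    (7, "F", ["(", "E", ")"]),
    (8, "F", ["id"]),
]

# First and Follow sets taken as given from the slide.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}


def first_of_string(symbols):
    """First(α) for the right-hand side α of a production."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal is its own First set
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(productions):
    """Return {(nonterminal, token): production number}; absent keys are errors."""
    table = {}
    for num, head, rhs in productions:
        rhs_first = first_of_string(rhs)
        lookaheads = rhs_first - {EPS}
        if EPS in rhs_first:
            lookaheads |= FOLLOW[head]      # includes $ when $ is in Follow(A)
        for tok in lookaheads:
            if (head, tok) in table:
                raise ValueError(f"M[{head}, {tok}] is multiply defined")
            table[(head, tok)] = num
    return table
```

For this grammar build_table(PRODUCTIONS) fills 13 cells and raises no error.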

LL(1) grammar:

A grammar whose parsing table has no multiply-defined entries is an LL(1) grammar.

An LL(1) parser uses one input symbol of lookahead at each step to make a parsing decision.
No ambiguous or left-recursive grammar can be LL(1).
A grammar is LL(1) iff for each set of A-productions A -> α1 | α2 | ... | αn, the following conditions hold:
1. First(αi) ∩ First(αj) = {}, when 1 ≤ i ≤ n, 1 ≤ j ≤ n and i ≠ j.
2. If αi =>* ε, then
(a) no αj =>* ε, when i ≠ j;
(b) First(αj) ∩ Follow(A) = {}, when i ≠ j.
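
The two conditions can be tested pairwise over the alternatives of each nonterminal. The sketch below is an illustration under assumptions: is_ll1 and first_of are hypothetical names, the First and Follow sets are passed in rather than computed, those of the expression grammar are copied from the earlier slides, and those of the S->iEtS | iEtSeS | a grammar were worked out by hand for this example.

```python
from itertools import combinations

EPS = "ε"  # marker for the empty string (the name is an assumption)


def first_of(symbols, first):
    """First(α) for a string of symbols α, given First of each nonterminal."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})           # a terminal is its own First set
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def is_ll1(grammar, first, follow):
    for a, alternatives in grammar.items():
        for alpha_i, alpha_j in combinations(alternatives, 2):
            fi, fj = first_of(alpha_i, first), first_of(alpha_j, first)
            # Condition 1; ε in both sets also catches condition 2(a).
            if fi & fj:
                return False
            # Condition 2(b), checked in both directions.
            if EPS in fi and fj & follow[a]:
                return False
            if EPS in fj and fi & follow[a]:
                return False
    return True


EXPR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
EXPR_FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
              "T'": {"*", EPS}, "F": {"(", "id"}}
EXPR_FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
               "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

# If-then-else grammar from the First/Follow example; sets worked by hand.
IF_ELSE = {
    "S": [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]],
    "E": [["b"]],
}
IF_ELSE_FIRST = {"S": {"i", "a"}, "E": {"b"}}
IF_ELSE_FOLLOW = {"S": {"e", "$"}, "E": {"t"}}
```

With these inputs, is_ll1(EXPR, EXPR_FIRST, EXPR_FOLLOW) is True, while is_ll1(IF_ELSE, IF_ELSE_FIRST, IF_ELSE_FOLLOW) is False because the first two alternatives of S both begin with i.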