Discussion #5
LL(1) Grammars & Table-Driven Parsing
Topics
• Approaches to Parsing
– Full backtracking
– Deterministic
• Simple LL(1), table-driven parsing
• Improvements to simple LL(1) grammars
Prefix Expression Grammar
• Consider the following grammar (which yields prefix
expressions for binary operators):
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
• Here, prefix expressions associate an operator with the
next two operands.
– *+234 = (* (+ 2 3) 4) = (2 + 3) * 4 = 20
– *2+34 = (* 2 (+ 3 4)) = 2 * (3 + 4) = 14
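The rule that each operator takes the next two operands can be sketched as a short recursive evaluator. This is a minimal illustration, not part of the original slides: the name `eval_prefix` is hypothetical, and treating `/` as true division is an assumption.

```python
def eval_prefix(text):
    """Evaluate a prefix expression over the single-digit operands 0..4."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,   # assumption: true division
    }

    def expr(pos):
        # E -> N | O E E; returns (value, position after the expression)
        if pos >= len(text):
            raise ValueError("unexpected end of input")
        ch = text[pos]
        if ch in "01234":                 # E -> N
            return int(ch), pos + 1
        if ch in ops:                     # E -> O E E
            left, pos = expr(pos + 1)
            right, pos = expr(pos)
            return ops[ch](left, right), pos
        raise ValueError(f"unexpected symbol {ch!r}")

    value, end = expr(0)
    if end != len(text):
        raise ValueError("trailing input")
    return value
```

Under these assumptions, `eval_prefix("*+234")` gives 20 and `eval_prefix("*2+34")` gives 14, matching the two readings above.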
Top-Down Parsing with Backtracking
Input: *+342
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
[Figure: search tree for the input *+342, showing the alternatives for E, O, and N being tried one after another.]
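To make the cost of blind search concrete, here is a rough sketch of a full-backtracking recognizer for the prefix grammar that also counts how many productions it tries. The names `GRAMMAR` and `backtrack_parse` are illustrative, and trying alternatives in left-to-right order is an assumption.

```python
GRAMMAR = {
    "E": ["N", "OEE"],
    "O": ["+", "-", "*", "/"],
    "N": ["0", "1", "2", "3", "4"],
}

def backtrack_parse(text, start="E"):
    """Return (accepted, number_of_productions_tried)."""
    tried = 0

    def derive(symbols, pos):
        # Yield every input position reachable after matching `symbols` from `pos`.
        nonlocal tried
        if not symbols:
            yield pos
            return
        head, rest = symbols[0], symbols[1:]
        if head in GRAMMAR:
            # Non-terminal: try each alternative in turn, backing up on failure.
            for rhs in GRAMMAR[head]:
                tried += 1
                yield from derive(rhs + rest, pos)
        elif pos < len(text) and text[pos] == head:
            # Terminal: it must match the current input symbol.
            yield from derive(rest, pos + 1)

    accepted = any(end == len(text) for end in derive(start, 0))
    return accepted, tried
```

Calling `backtrack_parse("*+342")` accepts the string, but only after trying and discarding many productions that a single symbol of look-ahead would have ruled out immediately.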
What are the obvious problems?
• We never know what production to try.
• It appears to be terribly inefficient—and it is.
• Are there grammars for which we can always
know what rule to choose? Yes!
• Characteristics:
– Only single symbol look ahead
– Given a non-terminal and a current symbol, we
always know which production rule to apply
LL(1) Parsers
• An LL parser parses the input from Left to
right, and constructs a Leftmost derivation of
the sentence.
• An LL(k) parser uses k tokens of look-ahead.
• LL(1) parsers, although fairly restrictive, are
attractive because they only need to look at the
current non-terminal and the next token to
make their parsing decisions.
• LL(1) parsers require LL(1) grammars.
Simple LL(1) Grammars
For simple LL(1) grammars all rules have the form
A → a₁α₁ | a₂α₂ | … | aₙαₙ
where
• aᵢ is a terminal, 1 <= i <= n
• aᵢ ≠ aⱼ for i ≠ j, and
• αᵢ is a sequence of terminals and non-terminals or is empty, 1 <= i <= n
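The two conditions in this definition are mechanical to check. The sketch below assumes a grammar is stored as a dictionary from each non-terminal to a list of right-hand-side strings, with one character per symbol; this representation and the name `is_simple_ll1` are illustrative choices, not something the slides prescribe.

```python
def is_simple_ll1(grammar):
    """Check the simple LL(1) conditions for every rule of `grammar`."""
    for alternatives in grammar.values():
        leading = []
        for rhs in alternatives:
            # Every alternative must begin with a terminal.
            if not rhs or rhs[0] in grammar:
                return False
            leading.append(rhs[0])
        # The leading terminals of one rule must be pairwise distinct.
        if len(set(leading)) != len(leading):
            return False
    return True
```

For example, `is_simple_ll1({"E": ["N", "OEE"], "O": ["+", "*"], "N": ["0", "1"]})` is false, because both alternatives of E begin with a non-terminal.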
Creating Simple LL(1) Grammars
• Why is this not a simple LL(1) grammar?
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
• How can we change it to simple LL(1)?
• By making all production rules of the form:
A → a₁α₁ | a₂α₂ | … | aₙαₙ
• Thus,
E → 0 | 1 | 2 | 3 | 4 | +EE | -EE | *EE | /EE
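One way to carry out this rewrite is to substitute each leading non-terminal by its alternatives until every alternative starts with a terminal. The following is a sketch under stated assumptions: no left recursion, no empty right-hand sides, one character per symbol. The name `to_simple_form` is hypothetical.

```python
def to_simple_form(grammar, start):
    """Substitute leading non-terminals until each alternative starts with a terminal."""
    def expand(rhs):
        head, rest = rhs[0], rhs[1:]
        if head not in grammar:           # already starts with a terminal
            return [rhs]
        return [new for alt in grammar[head] for new in expand(alt + rest)]

    expanded = {lhs: [new for rhs in alts for new in expand(rhs)]
                for lhs, alts in grammar.items()}

    # Keep only the non-terminals still reachable from the start symbol.
    reachable, todo = set(), [start]
    while todo:
        lhs = todo.pop()
        if lhs not in reachable:
            reachable.add(lhs)
            todo.extend(s for rhs in expanded[lhs] for s in rhs if s in grammar)
    return {lhs: alts for lhs, alts in expanded.items() if lhs in reachable}


PREFIX = {
    "E": ["N", "OEE"],
    "O": ["+", "-", "*", "/"],
    "N": ["0", "1", "2", "3", "4"],
}
SIMPLE = to_simple_form(PREFIX, "E")
```

Here `SIMPLE` contains the single rule for E with nine alternatives. The substitution alone does not guarantee distinct leading terminals in general, so the result should still be checked against the simple LL(1) conditions.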
Example: LL(1) Parsing
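E → (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)-EE | (8)*EE | (9)/EE
• Input *+234: Success! Output = 8 6 3 4 5
[Figure: parse tree for *+234, with each E node labeled by the production applied (8, 6, 3, 4, 5).]
• Input 2*3: Fail! After rule (3) matches 2, the input *3 is left over with nothing to match it.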
Simple LL(1) Parse Table
A parse table is defined as follows:
(V ∪ {#}) × (VT ∪ {#}) → {(β, i), pop, accept, error}
where
– β is the right side of production number i
– # marks the end of the input string (# ∉ V)
If A ∈ (V ∪ {#}) is the symbol on top of the stack and a ∈ (VT ∪ {#}) is the current input symbol, then:
ACTION(A, a) =
– pop, if A = a for a ∈ VT
– accept, if A = # and a = #
– (aα, i), which means “pop, then push aα and output i” (A → aα is the ith production)
– error, otherwise
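The case analysis for ACTION translates almost line for line into code. This sketch uses a six-production simple LL(1) grammar for prefix expressions as sample data. The names `PRODUCTIONS` and `action`, and the encoding of table entries as strings and tuples, are assumptions made for illustration.

```python
# Productions of a simple LL(1) grammar, numbered from 1 (sample data).
PRODUCTIONS = [
    ("E", "0"), ("E", "1"), ("E", "2"), ("E", "3"),
    ("E", "+EE"), ("E", "*EE"),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
TERMINALS = {s for _, rhs in PRODUCTIONS for s in rhs} - NONTERMINALS
END = "#"

def action(top, symbol):
    """Return the ACTION entry for stack top `top` and input symbol `symbol`."""
    if top == END and symbol == END:
        return "accept"
    if top in TERMINALS and top == symbol:
        return "pop"
    for number, (lhs, rhs) in enumerate(PRODUCTIONS, start=1):
        # A -> a alpha is chosen when its right side begins with `symbol`.
        if lhs == top and rhs[0] == symbol:
            return (rhs, number)
    return "error"
```

With this data, `action("E", "*")` returns `("*EE", 6)` and `action("+", "+")` returns `"pop"`.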
Parse Table
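E → (1)0 | (2)1 | (3)2 | (4)3 | (5)+EE | (6)*EE
Rows are stack symbols in V ∪ {#}; columns are input symbols in VT ∪ {#}.
    | 0     | 1     | 2     | 3     | +       | *       | #
E   | (0,1) | (1,2) | (2,3) | (3,4) | (+EE,5) | (*EE,6) |
0   | pop   |       |       |       |         |         |
1   |       | pop   |       |       |         |         |
2   |       |       | pop   |       |         |         |
3   |       |       |       | pop   |         |         |
+   |       |       |       |       | pop     |         |
*   |       |       |       |       |         | pop     |
#   |       |       |       |       |         |         | accept
All blank entries are error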
Condensed parse table (pop applies when the stack symbol equals the input symbol):
            | 0     | 1     | 2     | 3     | +       | *       | #
E           | (0,1) | (1,2) | (2,3) | (3,4) | (+EE,5) | (*EE,6) |
0,1,2,3,+,* | pop   | pop   | pop   | pop   | pop     | pop     |
#           |       |       |       |       |         |         | accept
Parsing *+123 (each row shows the stack and the remaining input after the action):
Action                               | Stack | Input  | Output
Initialize                           | E#    | *+123# |
ACTION(E,*) = Replace [E,*EE], Out 6 | *EE#  | *+123# | 6
ACTION(*,*) = pop(*,*)               | EE#   | +123#  | 6
ACTION(E,+) = Replace [E,+EE], Out 5 | +EEE# | +123#  | 65
ACTION(+,+) = pop(+,+)               | EEE#  | 123#   | 65
ACTION(E,1) = Replace [E,1], Out 2   | 1EE#  | 123#   | 652
ACTION(1,1) = pop(1,1)               | EE#   | 23#    | 652
ACTION(E,2) = Replace [E,2], Out 3   | 2E#   | 23#    | 6523
ACTION(2,2) = pop(2,2)               | E#    | 3#     | 6523
ACTION(E,3) = Replace [E,3], Out 4   | 3#    | 3#     | 65234
ACTION(3,3) = pop(3,3)               | #     | #      | 65234
ACTION(#,#) = accept                 |       |        | Done!
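The trace follows a simple loop: look up the entry for the stack top and the current input symbol, then pop, replace, accept, or fail. Below is a minimal sketch of such a driver. The function names, the dictionary-based table, and the use of `SyntaxError` for blank entries are all illustrative assumptions.

```python
def build_table(productions, terminals, end="#"):
    """Build a simple LL(1) parse table as a dict keyed by (stack top, input)."""
    table = {}
    for number, (lhs, rhs) in enumerate(productions, start=1):
        table[(lhs, rhs[0])] = (rhs, number)
    for t in terminals:
        table[(t, t)] = "pop"
    table[(end, end)] = "accept"
    return table

def parse(text, table, start="E", end="#"):
    """Return the list of production numbers output while parsing `text`."""
    stack = [end, start]               # top of the stack is the last element
    tokens = list(text) + [end]
    pos, output = 0, []
    while True:
        entry = table.get((stack[-1], tokens[pos]), "error")
        if entry == "accept":
            return output
        if entry == "error":
            raise SyntaxError(f"no entry for ({stack[-1]}, {tokens[pos]})")
        stack.pop()
        if entry == "pop":
            pos += 1                   # matched a terminal: advance the input
        else:
            rhs, number = entry
            stack.extend(reversed(rhs))    # push so the first symbol ends up on top
            output.append(number)

PRODUCTIONS = [("E", "0"), ("E", "1"), ("E", "2"), ("E", "3"),
               ("E", "+EE"), ("E", "*EE")]
TABLE = build_table(PRODUCTIONS, terminals="0123+*")
```

Under these assumptions, `parse("*+123", TABLE)` returns `[6, 5, 2, 3, 4]`, the same sequence as the Output column of the trace.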
Simple LL(1): More Restrictive than Necessary
• Simple LL(1) grammars are very easy and
efficient to parse but also very restrictive.
• The good news: we can achieve the same
desirable results without being so restrictive.
• How? We only need to retain the restriction
that single-symbol look ahead uniquely
determines which rule to use.
Relaxing Simple LL(1) Restrictions
• Consider the following grammar, which is not
simple LL(1):
E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3
• What are the problem rules? (1) & (2)
• Observe that it is possible to distinguish between rules 1 and 2.
– N leads to {0, 1, 2, 3}
– O leads to {+, *}
– {0, 1, 2, 3} ∩ {+, *} = ∅
– Thus, if we see 0, 1, 2, or 3 we choose (1), and if we see + or *, we choose (2).
LL(1) Grammars
• FIRST(α) = { a | α ⇒* aβ and a ∈ VT }
• A grammar is LL(1) if for all rules of the form
A → α₁ | α₂ | … | αₙ
the sets
FIRST(α₁), FIRST(α₂), …, and FIRST(αₙ)
are pair-wise disjoint; that is,
FIRST(αᵢ) ∩ FIRST(αⱼ) = ∅ for i ≠ j
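The disjointness test can be sketched directly from this definition. The code assumes a grammar with no empty right-hand sides, stored as a dictionary of right-hand-side strings with one character per symbol. The names `first` and `is_ll1` are illustrative.

```python
def first(rhs, grammar, active=frozenset()):
    """FIRST of a right-hand side: the terminals that can begin a string derived from it."""
    head = rhs[0]
    if head not in grammar:            # a terminal begins only with itself
        return {head}
    if head in active:                 # already being expanded: stop on left recursion
        return set()
    result = set()
    for alternative in grammar[head]:
        result |= first(alternative, grammar, active | {head})
    return result

def is_ll1(grammar):
    """True if the FIRST sets of each rule's alternatives are pairwise disjoint."""
    for alternatives in grammar.values():
        seen = set()
        for rhs in alternatives:
            f = first(rhs, grammar)
            if seen & f:               # two alternatives share a leading terminal
                return False
            seen |= f
    return True

G = {
    "E": ["N", "OEE"],
    "O": ["+", "*"],
    "N": ["0", "1", "2", "3"],
}
```

For this grammar, `first("N", G)` is `{"0", "1", "2", "3"}` and `first("OEE", G)` is `{"+", "*"}`, so `is_ll1(G)` holds.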
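E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3
For (A, a), we select (α, i) if a ∈ FIRST(α) and α is the right hand side of rule i.
Rows are stack symbols in V ∪ {#}; columns are input symbols in VT ∪ {#}.
    | +       | *       | 0     | 1     | 2     | 3     | #
E   | (OEE,2) | (OEE,2) | (N,1) | (N,1) | (N,1) | (N,1) |
O   | (+,3)   | (*,4)   |       |       |       |       |
N   |         |         | (0,5) | (1,6) | (2,7) | (3,8) |
+   | pop     |         |       |       |       |       |
*   |         | pop     |       |       |       |       |
0   |         |         | pop   |       |       |       |
1   |         |         |       | pop   |       |       |
2   |         |         |       |       | pop   |       |
3   |         |         |       |       |       | pop   |
#   |         |         |       |       |       |       | accept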
Condensed parse table (pop applies when the stack symbol equals the input symbol):
            | +       | *       | 0     | 1     | 2     | 3     | #
E           | (OEE,2) | (OEE,2) | (N,1) | (N,1) | (N,1) | (N,1) |
O           | (+,3)   | (*,4)   |       |       |       |       |
N           |         |         | (0,5) | (1,6) | (2,7) | (3,8) |
+,*,0,1,2,3 | pop     | pop     | pop   | pop   | pop   | pop   |
#           |         |         |       |       |       |       | accept
Parsing *+123 (each row shows the stack and the remaining input after the action):
Action                               | Stack | Input  | Output
Initialize                           | E#    | *+123# |
ACTION(E,*) = Replace [E,OEE], Out 2 | OEE#  | *+123# | 2
ACTION(O,*) = Replace [O,*], Out 4   | *EE#  | *+123# | 24
ACTION(*,*) = pop(*,*)               | EE#   | +123#  | 24
ACTION(E,+) = Replace [E,OEE], Out 2 | OEEE# | +123#  | 242
ACTION(O,+) = Replace [O,+], Out 3   | +EEE# | +123#  | 2423
ACTION(+,+) = pop(+,+)               | EEE#  | 123#   | 2423
ACTION(E,1) = Replace [E,N], Out 1   | NEE#  | 123#   | 24231
ACTION(N,1) = Replace [N,1], Out 6   | 1EE#  | 123#   | 242316
ACTION(1,1) = pop(1,1)               | EE#   | 23#    | 242316
ACTION(E,2) = Replace [E,N], Out 1   | NE#   | 23#    | 2423161
ACTION(N,2) = Replace [N,2], Out 7   | 2E#   | 23#    | 24231617
ACTION(2,2) = pop(2,2)               | E#    | 3#     | 24231617
ACTION(E,3) = Replace [E,N], Out 1   | N#    | 3#     | 242316171
ACTION(N,3) = Replace [N,3], Out 8   | 3#    | 3#     | 2423161718
ACTION(3,3) = pop(3,3)               | #     | #      | 2423161718
ACTION(#,#) = accept                 |       |        | Done!
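The same trace can be produced mechanically by filling the table from FIRST sets and recording the parser state after every step. This is a sketch only: the names `GRAMMAR`, `first`, `build_table` and `trace` are illustrative, and `first` assumes no empty or left-recursive rules.

```python
GRAMMAR = {                      # right-hand sides paired with production numbers
    "E": [("N", 1), ("OEE", 2)],
    "O": [("+", 3), ("*", 4)],
    "N": [("0", 5), ("1", 6), ("2", 7), ("3", 8)],
}

def first(rhs):
    """FIRST of a right side (assumes no empty or left-recursive rules)."""
    head = rhs[0]
    if head not in GRAMMAR:
        return {head}
    return set().union(*(first(alt) for alt, _ in GRAMMAR[head]))

def build_table():
    table = {}
    for lhs, alternatives in GRAMMAR.items():
        for rhs, number in alternatives:
            for a in first(rhs):
                # For an LL(1) grammar no cell is ever filled twice.
                assert (lhs, a) not in table
                table[(lhs, a)] = (rhs, number)
    return table

def trace(text, start="E", end="#"):
    """Return (stack, remaining input, output) after each step of the parse."""
    table = build_table()
    stack, rest, out = start + end, text + end, ""
    rows = [(stack, rest, out)]
    while (stack, rest) != (end, end):
        top, a = stack[0], rest[0]
        if top == a and top not in GRAMMAR and top != end:
            stack, rest = stack[1:], rest[1:]            # pop a matched terminal
        elif (top, a) in table:
            rhs, number = table[(top, a)]
            stack, out = rhs + stack[1:], out + str(number)
        else:
            raise SyntaxError(f"no entry for ({top}, {a})")
        rows.append((stack, rest, out))
    return rows
```

Under these assumptions, `trace("*+123")` yields 16 rows, ending with the stack `#`, the input `#`, and the output `2423161718`.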
What does 2 4 2 3 1 6 1 7 1 8 mean?
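E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3
Parse tree (each node is labeled with the production applied to it):
E (2) → OEE
    O (4) → *
    E (2) → OEE
        O (3) → +
        E (1) → N
            N (6) → 1
        E (1) → N
            N (7) → 2
    E (1) → N
        N (8) → 3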
2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal.
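Because the numbers are listed in preorder, the tree can be rebuilt by expanding non-terminals left to right and consuming one production number per expansion. The sketch below assumes a nested-tuple tree representation. `RULES`, `build_tree` and `leaves` are hypothetical names.

```python
RULES = {
    1: ("E", "N"), 2: ("E", "OEE"),
    3: ("O", "+"), 4: ("O", "*"),
    5: ("N", "0"), 6: ("N", "1"), 7: ("N", "2"), 8: ("N", "3"),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

def build_tree(numbers, start="E"):
    """Rebuild a parse tree from production numbers listed in preorder."""
    remaining = iter(numbers)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                          # terminals are leaves
        number = next(remaining, None)
        if number is None or RULES[number][0] != symbol:
            raise ValueError(f"no valid production for {symbol}")
        # Expanding children left to right consumes the numbers in preorder.
        return (symbol, number, [expand(s) for s in RULES[number][1]])

    tree = expand(start)
    if next(remaining, None) is not None:
        raise ValueError("unused production numbers")
    return tree

def leaves(tree):
    """Concatenate the terminals at the leaves, left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(leaves(child) for child in tree[2])
```

Reading the leaves of `build_tree([2, 4, 2, 3, 1, 6, 1, 7, 1, 8])` from left to right gives back the input string `*+123`.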