Fall 2015-2016 Compiler Principles
Lecture 3: Parsing part 2
Roman Manevich
Ben-Gurion University
Tentative syllabus
• Front End: Scanning; Top-down Parsing (LL); Bottom-up Parsing (LR)
• Intermediate Representation: Operational Semantics; Lowering
• Optimizations: Dataflow Analysis; Loop Optimizations
• Code Generation: Register Allocation; Instruction Selection
2
Previously
• Role of syntax analysis
• Context-free grammars refresher
• Top-down (predictive) parsing
– Recursive descent
3
Functions for nonterminals
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error;
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}
OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
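The functions above can be tried out directly. What follows is a minimal runnable sketch in Python, assuming tokens are whitespace-separated strings and that a trailing "$" marks the end of input. The names Parser, ParseError and accepts are illustrative and do not come from the lecture.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end marker
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, got {self.current!r}")
        self.pos += 1

    def E(self):  # E -> LIT | ( E OP E ) | not E
        if self.current in ("true", "false"):
            self.LIT()
        elif self.current == "(":
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def LIT(self):  # LIT -> true | false
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def OP(self):  # OP -> and | or | xor
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current!r}")


def accepts(text):
    """Return True when the whole input is a sentence of E."""
    parser = Parser(text.split())
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For instance, accepts("( true and not false )") succeeds, while accepts("( true and )") fails inside the second call to E.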
4
Technical challenges
with recursive descent
5
Recursive descent: problem 1
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• With lookahead 1, the function for indexed_elem will
never be tried…
– What happens for input of the form ID[expr]?
6
Recursive descent: problem 2
S → A a b
A → a | ε
int S() {
  return A() && match(token('a')) && match(token('b'));
}
int A() {
  return match(token('a')) || 1;
}
• What happens for input “ab”?
• What happens if you flip the order of the alternatives and try “aab”?
7
Recursive descent: problem 3
p. 127
E → E - term | term
int E() {
  return E() && match(token('-')) && term();
}
• What happens when we execute this procedure?
• Recursive descent parsers cannot handle left-recursive grammars
8
Agenda
• Predicting productions via
FIRST/FOLLOW/NULLABLE sets
• Handling conflicts
• LL(k) via pushdown automata
9
How do we predict?
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
• How can we decide which production of ‘E’ to
take?
10
FIRST sets
• For a nonterminal A, FIRST(A) is the set of
terminals that can start a sentence derived
from A
– Formally: FIRST(A) = {t | A ⇒* t ω}
• For a sentential form α, FIRST(α) is the set of
terminals that can start a sentence derived
from α
– Formally: FIRST(α) = {t | α ⇒* t ω}
11
FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
• FIRST(E) = …?
• FIRST(LIT) = …?
• FIRST(OP) = …?
12
FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
• FIRST(E) = FIRST(LIT) ∪ FIRST(( E OP E )) ∪ FIRST(not E)
• FIRST(LIT) = { true, false }
• FIRST(OP) = { and, or, xor }
• A set of recursive equations
• How do we solve them?
13
Computing FIRST sets
Assume no null productions (A → ε)
1. Initially, for all nonterminals A, set
FIRST(A) = { t | A → t ω for some ω }
2. Repeat the following until no changes occur:
  for each nonterminal A
    for each production A → α1 | … | αk
      FIRST(A) = FIRST(α1) ∪ … ∪ FIRST(αk)
• This is known as a fixed-point algorithm
• We will see such iterative methods later in the
course and learn to reason about them
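The two steps can be sketched in Python as follows. This is only an illustration under the slide's assumption of no null productions, in which case FIRST(α) is determined by the first symbol of α. The dictionary encoding of the grammar and the name compute_first are assumptions made for this example.

```python
def compute_first(grammar):
    """grammar maps each nonterminal to a list of alternatives (lists of symbols).

    Assumes there are no null productions, so every alternative is non-empty.
    """
    # Step 1: terminals that literally start some alternative of A.
    first = {
        a: {alt[0] for alt in alts if alt[0] not in grammar}
        for a, alts in grammar.items()
    }
    # Step 2: propagate FIRST sets of leading nonterminals until nothing changes.
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                if head in grammar and not first[head] <= first[a]:
                    first[a] |= first[head]
                    changed = True
    return first


BOOL = {
    "E": [["LIT"], ["(", "E", "OP", "E", ")"], ["not", "E"]],
    "LIT": [["true"], ["false"]],
    "OP": [["and"], ["or"], ["xor"]],
}
FIRST = compute_first(BOOL)
```

Here FIRST["E"] ends up containing "(", "not", "true" and "false". The sets only ever grow and are bounded by the finite set of terminals, which is why the loop stops.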
14
Exercise: compute FIRST
STMT → if EXPR then STMT
| while EXPR do STMT
| EXPR ;
EXPR → TERM -> id
| zero? TERM
| not EXPR
| ++ id
| -- id
TERM → id
| constant
STMT:
EXPR:
TERM:
15
1. Initialization
STMT → if EXPR then STMT
| while EXPR do STMT
| EXPR ;
EXPR → TERM -> id
| zero? TERM
| not EXPR
| ++ id
| -- id
TERM → id
| constant
STMT: if, while
EXPR: zero?, not, ++, --
TERM: id, constant
16
2. Iterate 1
STMT → if EXPR then STMT
| while EXPR do STMT
| EXPR ;
EXPR → TERM -> id
| zero? TERM
| not EXPR
| ++ id
| -- id
TERM → id
| constant
STMT: if, while, zero?, not, ++, --
EXPR: zero?, not, ++, --
TERM: id, constant
17
2. Iterate 2
STMT → if EXPR then STMT
| while EXPR do STMT
| EXPR ;
EXPR → TERM -> id
| zero? TERM
| not EXPR
| ++ id
| -- id
TERM → id
| constant
STMT: if, while, zero?, not, ++, --
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant
18
2. Iterate 3 – fixed-point
STMT → if EXPR then STMT
| while EXPR do STMT
| EXPR ;
EXPR → TERM -> id
| zero? TERM
| not EXPR
| ++ id
| -- id
TERM → id
| constant
STMT: if, while, zero?, not, ++, --, id, constant
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant
19
Reasoning about the algorithm
Assume no null productions (A → ε)
1. Initially, for all nonterminals A, set
FIRST(A) = { t | A → t ω for some ω }
2. Repeat the following until no changes occur:
  for each nonterminal A
    for each production A → α1 | … | αk
      FIRST(A) = FIRST(α1) ∪ … ∪ FIRST(αk)
• Is the algorithm correct?
• Does it terminate? (complexity)
20
Reasoning about the algorithm
• Termination:
• Correctness:
21
LL(1) Parsing of grammars
without epsilon productions
22
Using FIRST sets
• Assume G has no epsilon productions and for
every non-terminal X and every pair of
productions X → α and X → β we have that
FIRST(α) ∩ FIRST(β) = {}
• No intersection between FIRST sets => can
always pick a single rule
23
Using FIRST sets
• In our Boolean expressions example
– FIRST( LIT ) = { true, false }
– FIRST( ( E OP E ) ) = { '(' }
– FIRST( not E ) = { not }
• If the FIRST sets intersect, may need longer
lookahead
– LL(k) = class of grammars in which production rule
can be determined using a lookahead of k tokens
– LL(1) is an important and useful class
• What if there are epsilon productions?
24
Extending LL(1) Parsing
for epsilon productions
25
FIRST, FOLLOW, NULLABLE sets
• For each non-terminal X
• FIRST(X) = set of terminals that can start a
sentence derived from X
– FIRST(X) = {t | X ⇒* t ω}
• NULLABLE(X) if X ⇒* ε
• FOLLOW(X) = set of terminals that can follow X
in some derivation
– FOLLOW(X) = {t | S ⇒* α X t β}
26
Computing the NULLABLE set
• Lemma: NULLABLE(α1 … αk) =
NULLABLE(α1) ∧ … ∧ NULLABLE(αk)
1. Initially NULLABLE(X) = false
2. For each non-terminal X, if there exists a production
X → ε
then NULLABLE(X) = true
3. Repeat
  for each production Y → α1 … αk
    if NULLABLE(α1 … αk) then
      NULLABLE(Y) = true
until NULLABLE stabilizes
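The three steps translate almost line by line into code. The following Python sketch assumes a grammar given as a dictionary from nonterminals to lists of alternatives, with the empty list standing for an ε alternative. That encoding and the name compute_nullable are choices made for this example only.

```python
def compute_nullable(grammar):
    """Return a dict mapping each nonterminal to True when it can derive ε."""
    # Step 1: nothing is nullable initially.
    nullable = {x: False for x in grammar}
    # Step 2: X -> ε makes X nullable (an empty alternative encodes ε).
    for x, alts in grammar.items():
        if any(len(alt) == 0 for alt in alts):
            nullable[x] = True
    # Step 3: repeat until NULLABLE stabilizes.
    changed = True
    while changed:
        changed = False
        for y, alts in grammar.items():
            if nullable[y]:
                continue
            for alt in alts:
                # Lemma: a sequence is nullable iff every symbol in it is.
                # Terminals are not keys of `nullable`, so they count as False.
                if all(nullable.get(sym, False) for sym in alt):
                    nullable[y] = True
                    changed = True
                    break
    return nullable


GRAMMAR = {
    "S": [["A", "a", "b"]],
    "A": [["a"], []],
    "B": [["A", "B"], ["C"]],
    "C": [["b"], []],
}
NULLABLE = compute_nullable(GRAMMAR)
```

On this grammar A and C are nullable directly, B becomes nullable through C, and S stays non-nullable because of the terminals a and b.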
27
Exercise: compute NULLABLE
S → A a b
A → a | ε
B → A B | C
C → b | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(a) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
NULLABLE(B) = (NULLABLE(A) ∧ NULLABLE(B)) ∨ NULLABLE(C)
NULLABLE(C) = NULLABLE(b) ∨ NULLABLE(ε)
28
FIRST with epsilon productions
• How do we compute FIRST(α1 … αk) when
epsilon productions are allowed?
– FIRST(α1 … αk) = ?
29
FIRST with epsilon productions
• How do we compute FIRST(α1 … αk) when
epsilon productions are allowed?
– FIRST(α1 … αk) =
if not NULLABLE(α1) then FIRST(α1)
else FIRST(α1) ∪ FIRST(α2 … αk)
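The rule above is recursive in the sequence, and it can be plugged into the same fixed-point iteration used for grammars without epsilon productions. The Python sketch below assumes that NULLABLE has already been computed and is passed in as a dictionary. The names first_of_sequence and compute_first, and the convention that an empty list encodes ε, are illustrative assumptions.

```python
def first_of_sequence(seq, first, nullable):
    """FIRST(α1 … αk); symbols that are not keys of `first` are terminals."""
    if not seq:
        return set()                      # FIRST(ε) contributes no terminal
    head, rest = seq[0], seq[1:]
    if head not in first:                 # a terminal starts the sequence
        return {head}
    if not nullable[head]:
        return set(first[head])
    return first[head] | first_of_sequence(rest, first, nullable)


def compute_first(grammar, nullable):
    """Iterate FIRST(X) = union of FIRST(alternative) until a fixed point."""
    first = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for x, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first, nullable)
                if not new <= first[x]:
                    first[x] |= new
                    changed = True
    return first


GRAMMAR = {"S": [["A", "c", "b"]], "A": [["a"], []]}  # S -> A c b, A -> a | ε
NULLABLE = {"S": False, "A": True}                     # computed by hand here
FIRST = compute_first(GRAMMAR, NULLABLE)
```

Because A is nullable, FIRST(S) picks up c in addition to everything in FIRST(A).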
30
Exercise: compute FIRST
S → A c b
A → a | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(c) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
FIRST(S) = FIRST(A) ∪ FIRST(cb)
FIRST(A) = FIRST(a) ∪ FIRST(ε)
FIRST(S) = FIRST(A) ∪ {c} = {a, c}
FIRST(A) = {a}
31
FOLLOW sets
p. 189
• if X → α Y β then
FOLLOW(Y) ⊇ ?
if NULLABLE(β) or β = ε then
FOLLOW(Y) ⊇ ?
32
FOLLOW sets
p. 189
• if X → α Y β then
FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) or β = ε then
FOLLOW(Y) ⊇ ?
33
FOLLOW sets
p. 189
• if X → α Y β then
FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) or β = ε then
FOLLOW(Y) ⊇ FOLLOW(X)
34
FOLLOW sets
p. 189
• if X → α Y β then
FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) or β = ε then
FOLLOW(Y) ⊇ FOLLOW(X)
• Allows predicting epsilon productions:
X → ε when the lookahead token is in
FOLLOW(X)
S → A c b
A → a | ε
• What should we predict for input “cb”?
• What should we predict for input “acb”?
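The two FOLLOW rules can also be iterated to a fixed point. The sketch below is a Python illustration that takes FIRST and NULLABLE as precomputed inputs (written out by hand for the grammar S → A c b, A → a | ε). The function names and the dictionary encoding are assumptions of this example, and no end-of-input marker is added to FOLLOW(S).

```python
def first_of_sequence(seq, first, nullable):
    """Return (FIRST(seq), whether seq can derive ε)."""
    result = set()
    for sym in seq:
        if sym not in first:              # terminal
            result.add(sym)
            return result, False
        result |= first[sym]
        if not nullable[sym]:
            return result, False
    return result, True                   # every symbol was nullable (or seq is empty)


def compute_follow(grammar, first, nullable):
    follow = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for x, alts in grammar.items():
            for alt in alts:
                for i, y in enumerate(alt):
                    if y not in grammar:  # only nonterminals have FOLLOW sets
                        continue
                    beta = alt[i + 1:]
                    new, beta_nullable = first_of_sequence(beta, first, nullable)
                    if beta_nullable:     # covers β = ε as well
                        new |= follow[x]
                    if not new <= follow[y]:
                        follow[y] |= new
                        changed = True
    return follow


GRAMMAR = {"S": [["A", "c", "b"]], "A": [["a"], []]}
FIRST = {"S": {"a", "c"}, "A": {"a"}}
NULLABLE = {"S": False, "A": True}
FOLLOW = compute_follow(GRAMMAR, FIRST, NULLABLE)
```

With FOLLOW(A) = {c}, the lookahead c on input “cb” predicts A → ε, while the lookahead a on input “acb” predicts A → a.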
35
LL(k) grammars
36
Conflicts
• FIRST-FIRST conflict
– X → α and X → β, and
– FIRST(α) ∩ FIRST(β) ≠ {}
• FIRST-FOLLOW conflict
– NULLABLE(X), and
– FIRST(X) ∩ FOLLOW(X) ≠ {}
37
LL(1) grammars
• A grammar is in the class LL(1) when it can be
derived via:
– Top-down derivation
– Scanning the input from left to right (L)
– Producing the leftmost derivation (L)
– With lookahead of one token
– For every two productions A → α and A → β we have
FIRST(α) ∩ FIRST(β) = {}
and if NULLABLE(A) then FIRST(A) ∩ FOLLOW(A) = {}
• A language is said to be LL(1) when it has an LL(1)
grammar
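The last condition can be checked mechanically once the three sets are known. The following Python sketch reports FIRST-FIRST and FIRST-FOLLOW conflicts. It takes FIRST, FOLLOW and NULLABLE as precomputed dictionaries, and the function names and result format are assumptions made for illustration.

```python
from itertools import combinations


def first_of_sequence(seq, first, nullable):
    """FIRST of a sequence; symbols that are not keys of `first` are terminals."""
    result = set()
    for sym in seq:
        if sym not in first:
            return result | {sym}
        result |= first[sym]
        if not nullable[sym]:
            break
    return result


def find_conflicts(grammar, first, follow, nullable):
    """Return a list of (kind, nonterminal) pairs, empty if no conflict is found."""
    conflicts = []
    for x, alts in grammar.items():
        for alpha, beta in combinations(alts, 2):
            fa = first_of_sequence(alpha, first, nullable)
            fb = first_of_sequence(beta, first, nullable)
            if fa & fb:
                conflicts.append(("FIRST-FIRST", x))
        if nullable[x] and first[x] & follow[x]:
            conflicts.append(("FIRST-FOLLOW", x))
    return conflicts


# S -> A a b, A -> a | ε, with the sets written out by hand.
GRAMMAR = {"S": [["A", "a", "b"]], "A": [["a"], []]}
FIRST = {"S": {"a"}, "A": {"a"}}
FOLLOW = {"S": set(), "A": {"a"}}
NULLABLE = {"S": False, "A": True}
CONFLICTS = find_conflicts(GRAMMAR, FIRST, FOLLOW, NULLABLE)
```

For this grammar the only reported conflict is a FIRST-FOLLOW conflict on A, since a is in both FIRST(A) and FOLLOW(A).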
38
LL(k) grammars
• Generalizes LL(1) for k lookahead tokens
• Need to generalize FIRST and FOLLOW for k
lookahead tokens
39
Agenda
• Predicting productions via
FIRST/FOLLOW/NULLABLE sets
• Handling conflicts
• LL(k) via pushdown automata
40
Handling conflicts
41
Back to problem 1
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• FIRST(term) = { ID }
• FIRST(indexed_elem) = { ID }
• FIRST-FIRST conflict
42
Solution: left factoring
• Rewrite the grammar to be in LL(1)
term → ID | indexed_elem
indexed_elem → ID [ expr ]
becomes
term → ID after_ID
after_ID → [ expr ] | ε
New grammar is more complex: it has an epsilon production
Intuition: just like factoring in algebra: x*y + x*z into x*(y+z)
43
Exercise: apply left factoring
S → if E then S else S
| if E then S
| T
44
Exercise: apply left factoring
S → if E then S else S
| if E then S
| T
becomes
S → if E then S S’
| T
S’ → else S | ε
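A single round of left factoring can be sketched in Python as below. This is an illustration only: it groups the alternatives of one nonterminal by their first symbol and pulls out the longest common prefix of each group. The names common_prefix and left_factor, the primed naming scheme for new nonterminals, and the use of an empty list for ε are assumptions of this example. The new tails may themselves need another round of factoring.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol sequences."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, alts):
    """Factor the alternatives of `nt` once; returns the new productions."""
    groups = {}
    for alt in alts:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {nt: []}
    for group in groups.values():
        if len(group) == 1:               # nothing to factor
            result[nt].append(group[0])
            continue
        prefix = common_prefix(group)
        tail = nt + "'" * len(result)     # S', S'', ... (hypothetical naming)
        result[nt].append(prefix + [tail])
        result[tail] = [alt[len(prefix):] for alt in group]
    return result


FACTORED = left_factor("S", [
    ["if", "E", "then", "S", "else", "S"],
    ["if", "E", "then", "S"],
    ["T"],
])
```

On the exercise grammar this yields S → if E then S S' | T and S' → else S | ε.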
45
Back to problem 2
S → A a b
A → a | ε
• FIRST(S) = { a }
• FIRST(A) = { a }
• FOLLOW(S) = { }
• FOLLOW(A) = { a }
• FIRST-FOLLOW conflict
46
Solution: substitution
S → A a b
A → a | ε
Substitute A in S
S → a a b | a b
47
Solution: substitution
S → A a b
A → a | ε
Substitute A in S
S → a a b | a b
Left factoring
S → a after_A
after_A → a b | b
48
Back to problem 3
E → E - term | term
• Left recursion cannot be handled with a
bounded lookahead
• What can we do?
49
Left recursion removal
p. 130
G1:
N → Nα | β
G2:
N → βN’
N’ → αN’ | ε
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
• For our 3rd example:
E → E - term | term
becomes
E → term TE
TE → - term TE | ε
Can be done algorithmically.
Problem 1: grammar becomes
mangled beyond recognition
Problem 2: grammar may not be LL(1)
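The transformation from G1 to G2 is mechanical for immediate left recursion. Here is a minimal Python sketch, assuming alternatives are lists of symbols and an empty list stands for ε. The function name and the primed name of the new nonterminal are illustrative, and indirect left recursion is not handled.

```python
def remove_immediate_left_recursion(nt, alts):
    """Rewrite N -> N α | β as N -> β N', N' -> α N' | ε."""
    # The α parts: what follows N in the left-recursive alternatives.
    alphas = [alt[1:] for alt in alts if alt and alt[0] == nt]
    # The β parts: alternatives that do not start with N.
    betas = [alt for alt in alts if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alts}                 # no immediate left recursion
    tail = nt + "'"                       # hypothetical name for N'
    return {
        nt: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],
    }


# E -> E - term | term
RESULT = remove_immediate_left_recursion("E", [["E", "-", "term"], ["term"]])
```

For the third example this gives E → term E' and E' → - term E' | ε, where E' plays the role of TE.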
50
Recap
• Given a grammar
• Compute for each non-terminal
– NULLABLE
– FIRST using NULLABLE
– FOLLOW using FIRST and NULLABLE
• Compute FIRST for each sentential form
appearing on right-hand side of a production
• Check for conflicts
– If exist: attempt to remove conflicts by rewriting
grammar
51
Agenda
• Predicting productions via
FIRST/FOLLOW/NULLABLE sets
• Handling conflicts
• LL(k) via pushdown automata
52
LL(1) parsing:
the automata approach
By MG (talk · contribs) (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons
53
Marking “end-of-file”
• Sometimes it will be useful to transform a
grammar G with start non-terminal S into a
grammar G’ with a new start non-terminal S’
and a new production rule
S’ → S $
where $ is not part of the set of tokens
• To parse an input α with G’ we change it into
α$
• Simplifies top-down parsing with null
productions and LR parsing
54
Another convention
• We will assume that all productions have been
consecutively numbered
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )
55
LL(1) Parsers
• Recursive Descent
– Manual construction
(parsing combinators make this easier, but…)
– Uses recursion
• Wanted
– A parser that can be generated automatically
– Does not use recursion
56
LL(1) parsing via pushdown automata
• Pushdown automaton uses
– Input stream
– Prediction stack
– Parsing table
• Nonterminal × token → production rule
• Entry indexed by nonterminal N and token t contains
the alternative of N that must be predicted when
current input starts with t
• Essentially, classic conversion from CFG to PDA
– The only difference is that we replace nondeterministic choice with the parsing table
57
Model of non-recursive
predictive parser
[Figure: the predictive parsing program reads the input buffer (a + b $), uses the stack (X Y Z $) and the parsing table, and produces the output]
58
LL(1) parsing algorithm
• Set stack=S$
• While true
– Prediction
• When top of stack is nonterminal N
• pop N, lookup table[N,t], where t is the next input token
• If table[N,t] is not empty, push table[N,t] on prediction stack
• Otherwise: return syntax error
– Match
• When top of prediction stack is a terminal t, must be equal to next
input token t’. If (t = t’), pop t and consume t’.
If (t ≠ t’): return syntax error
– End
• When prediction stack is empty
• If input is empty at that point: return success
• Otherwise: return syntax error
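The loop above can be written as a short table-driven parser. The Python sketch below assumes the parsing table is a dictionary keyed by (nonterminal, token) pairs whose values are right-hand sides, and that "$" is appended to the input as the end marker. The function name ll1_parse and this table encoding are assumptions of the example.

```python
def ll1_parse(tokens, table, start, nonterminals):
    """Return True when `tokens` is accepted, False on a syntax error."""
    stack = ["$", start]                  # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        t = tokens[pos]
        if top in nonterminals:
            # Prediction: replace the nonterminal by the predicted alternative.
            rhs = table.get((top, t))
            if rhs is None:
                return False              # empty table entry: syntax error
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        else:
            # Match: the terminal on the stack must equal the next token.
            if top != t:
                return False
            pos += 1
    # End: the stack is empty, so succeed only if all input was consumed.
    return pos == len(tokens)


# Table for A -> a A b | c
TABLE = {("A", "a"): ["a", "A", "b"], ("A", "c"): ["c"]}
OK = ll1_parse("aacbb", TABLE, "A", {"A"})
BAD = ll1_parse("abcbb", TABLE, "A", {"A"})
```

An explicit stack replaces the call stack of recursive descent, so the only grammar-specific part is the table.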
59
Example transition table
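(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Rows are nonterminals, columns are input tokens; each entry says which rule should be used (for example, ( ∈ FIRST(E), and the entry for E and ( is rule 2).
    | (  | )  | not | true | false | and | or | xor | $
E   | 2  |    | 3   | 1    | 1     |     |    |     |
LIT |    |    |     | 4    | 5     |     |    |     |
OP  |    |    |     |      |       | 6   | 7  | 8   |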
60
Running parser example
A → aAb | c
Input: aacbb$
Input suffix | Stack content | Move
aacbb$ | A$ | predict(A,a) = A → aAb
aacbb$ | aAb$ | match(a,a)
acbb$ | Ab$ | predict(A,a) = A → aAb
acbb$ | aAbb$ | match(a,a)
cbb$ | Abb$ | predict(A,c) = A → c
cbb$ | cbb$ | match(c,c)
bb$ | bb$ | match(b,b)
b$ | b$ | match(b,b)
$ | $ | match($,$) – success
Parsing table:
  | a       | b | c
A | A → aAb |   | A → c
61
Illegal input example
A → aAb | c
Input: abcbb$
Input suffix | Stack content | Move
abcbb$ | A$ | predict(A,a) = A → aAb
abcbb$ | aAb$ | match(a,a)
bcbb$ | Ab$ | predict(A,b) = ERROR
Parsing table:
  | a       | b | c
A | A → aAb |   | A → c
62
Creating the prediction table
• Let G be an LL(1) grammar
• Compute FIRST/NULLABLE/FOLLOW
• Check for conflicts
• For non-terminal N and token t predict: …
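One common textbook way to fill the table, which is not necessarily the formulation used in the lecture, is: for each production N → α, enter α at table[N,t] for every t in FIRST(α), and if α is nullable, also for every t in FOLLOW(N). The Python sketch below follows that rule. The FIRST, FOLLOW and NULLABLE inputs are written out by hand, and the function names and dictionary encoding are assumptions of this example.

```python
def first_of_sequence(seq, first, nullable):
    """Return (FIRST(seq), whether seq can derive ε)."""
    result = set()
    for sym in seq:
        if sym not in first:              # terminal
            result.add(sym)
            return result, False
        result |= first[sym]
        if not nullable[sym]:
            return result, False
    return result, True


def build_table(grammar, first, follow, nullable):
    """Map (nonterminal, token) to the predicted right-hand side."""
    table = {}
    for n, alts in grammar.items():
        for alt in alts:
            lookaheads, alt_nullable = first_of_sequence(alt, first, nullable)
            if alt_nullable:
                lookaheads |= follow[n]   # predict a nullable alternative on FOLLOW(N)
            for t in lookaheads:
                if (n, t) in table:
                    raise ValueError(f"conflict at ({n}, {t}): not LL(1)")
                table[(n, t)] = alt
    return table


# S -> A c b, A -> a | ε, with "$" assumed to follow S.
GRAMMAR = {"S": [["A", "c", "b"]], "A": [["a"], []]}
FIRST = {"S": {"a", "c"}, "A": {"a"}}
FOLLOW = {"S": {"$"}, "A": {"c"}}
NULLABLE = {"S": False, "A": True}
TABLE = build_table(GRAMMAR, FIRST, FOLLOW, NULLABLE)
```

A duplicate entry signals a FIRST-FIRST or FIRST-FOLLOW conflict, which is why the sketch raises an error instead of overwriting.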
63
Top-down parsing summary
• Recursive descent
• LL(k) grammars
• LL(k) parsing with pushdown automata
• Cannot deal with left recursion
– Left-recursion removal might result in a
complicated grammar
64
Next lecture:
Bottom-up parsing