Compiler Construction

Chapter 4: Top-Down Parsing
Objectives of Top-Down Parsing


- An attempt to find a leftmost derivation for an input string.
- An attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.
Input string: [figure not recoverable: a leftmost derivation of the input string, each step marked =>lm]
Approaches of Top-Down Parsing
1. With backtracking (may require repeated scans of the input; the general form of top-down parsing)
Method: create a procedure for each nonterminal.
e.g. S -> cAd
A -> ab | a
S( ) {
    if (input-symbol == 'c') {
        Advance();
        if (A( ))
            if (input-symbol == 'd') {
                Advance();
                return true;
            }
    }
    return false;
}
L = { cabd, cad }
A( ) {
    isave = input-pointer;            /* remember where A started */
    if (input-symbol == 'a') {        /* try A -> ab first */
        Advance();
        if (input-symbol == 'b') {
            Advance();
            return true;
        }
    }
    input-pointer = isave;            /* backtrack, then try A -> a */
    if (input-symbol == 'a') {
        Advance();
        return true;
    }
    return false;
}
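The two procedures above can be tried out with a minimal Python sketch. The class name Parser, the helpers peek and advance, and the function accepts are assumptions made for this illustration; the final check that all input was consumed is also an addition that the slide's pseudocode does not show.

```python
class Parser:
    """Backtracking recursive-descent sketch for S -> cAd, A -> ab | a."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # plays the role of input-pointer

    def peek(self):
        # Current input symbol, or None once the input is exhausted.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def advance(self):
        self.pos += 1

    def parse_S(self):
        if self.peek() == "c":
            self.advance()
            if self.parse_A() and self.peek() == "d":
                self.advance()
                return True
        return False

    def parse_A(self):
        saved = self.pos              # isave = input-pointer
        if self.peek() == "a":        # try A -> ab first
            self.advance()
            if self.peek() == "b":
                self.advance()
                return True
        self.pos = saved              # backtrack, then try A -> a
        if self.peek() == "a":
            self.advance()
            return True
        return False


def accepts(text):
    """True if text is derived from S and no input is left over."""
    parser = Parser(text)
    return parser.parse_S() and parser.pos == len(text)
```

Under these assumptions, accepts("cabd") and accepts("cad") both succeed, which matches L = { cabd, cad }.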
Problems for top-down parsing with backtracking:
(1) Left recursion (can cause a top-down parser to go into an infinite loop).
Def. A grammar is said to be left-recursive if it has a nonterminal A s.t. there is a derivation A =>+ Aα for some string α.
(2) Backtracking: the parser must undo not only the movement of the input pointer but also any semantic actions, such as entries made in the symbol table.
(3) The order in which the alternatives are tried. (For the grammar shown above, try w = cabd with A -> a applied first: A matches a and returns, S then sees b instead of d and rejects, although cabd is in the language.)
Elimination of Left-Recursion
With immediate left recursion: A -> Aα | β
==> transform into A -> βA'    A' -> αA' | ε
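[Figure: under A -> Aα | β the parse tree for β α ... α is a chain of A nodes growing down to the left; under the transformed grammar it becomes a chain of A' nodes growing down to the right and ending in ε.]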
e.g. E -> E + T | T
T -> T * F | F
F -> (E) | id
After transformation:
E -> TE'    E' -> +TE' | ε
T -> FT'    T' -> *FT' | ε
F -> (E) | id
General form (with left recursion):
A -> Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm
After transformation:
==> A -> β1A' | β2A' | ... | βmA'
A' -> α1A' | α2A' | ... | αnA' | ε
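The general transformation can be sketched in Python as follows. The representation is an assumption made for this illustration: each alternative is a list of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime. The function name eliminate_immediate is hypothetical.

```python
def eliminate_immediate(nt, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without left recursion.

    Returns a dict with the new productions for A and, if needed, for A'.
    """
    # The alpha parts: alternatives that start with A, with the leading A removed.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The beta parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}      # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is ε
    }
```

Applied to E -> E + T | T, written as eliminate_immediate("E", [["E", "+", "T"], ["T"]]), the sketch yields E -> TE' and E' -> +TE' | ε.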
What about left recursion that arises from a derivation of two or more steps?
e.g., S -> Aa | b    A -> Ac | Sd | e
where S => Aa => Sda
Algorithm: Eliminating left recursion
Input: A context-free grammar G with no cycles (i.e., no derivation A =>+ A) and no ε-productions.
Method:
1. Arrange the nonterminals in some order A1, A2, ... , An
2. for i = 1 to n do
{ for j = 1 to i-1 do
      replace each production of the form Ai -> Aj γ by the productions Ai -> δ1 γ | δ2 γ | ... | δk γ, where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions;
  eliminate the immediate left recursion among the Ai-productions;
}
An Example
e.g. S -> Aa | b
A -> Ac | Sd | e
Step 1: ==> S -> Aa | b
Step 2: ==> A -> Ac | Aad | bd | e
Step 3: ==> A -> bdA' | eA'    A' -> cA' | adA' | ε
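The three steps of this example can be reproduced with a short Python sketch of the algorithm. The grammar encoding (a dict from nonterminal to a list of alternatives, each a list of symbols, with the empty list standing for ε) and both function names are assumptions made for this illustration.

```python
def eliminate_immediate(nt, alternatives):
    """Remove immediate left recursion from the productions of one nonterminal."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is ε
    }


def eliminate_left_recursion(grammar, order):
    """Apply the algorithm to the nonterminals in the given order A1, ..., An."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Ai -> Aj gamma becomes Ai -> delta gamma for each Aj -> delta
                    expanded.extend(delta + alt[1:] for delta in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


GRAMMAR = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], ["e"]],
}
RESULT = eliminate_left_recursion(GRAMMAR, ["S", "A"])
```

With the order S, A the sketch leaves S unchanged and produces A -> bdA' | eA' and A' -> cA' | adA' | ε, as in Step 3.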
2. Non-backtracking (recursive-descent) parsing
Recursive descent: use a collection of mutually recursive routines to perform the syntax analysis.
Left factoring: A -> αβ1 | αβ2 ==> A -> αA'    A' -> β1 | β2
Method:
1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all the A-productions
A -> αβ1 | αβ2 | ... | αβn | others
by
A -> αA' | others    A' -> β1 | β2 | ... | βn
2. Repeat the transformation until no two alternatives for a nonterminal have a common prefix.
e.g. S -> iCtS | iCtSeS | a    C -> b
==> S -> iCtSS' | a
S' -> eS | ε
C -> b
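A rough Python sketch of the left-factoring method follows. The grammar encoding (lists of symbols, empty list for ε), the priming scheme for new nonterminals, and the names common_prefix and left_factor are assumptions made for this illustration.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(grammar):
    """Repeat the factoring step until no two alternatives share a prefix."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(g):
            alts = g[nt]
            # Longest prefix shared by at least two alternatives of nt.
            best = []
            for i in range(len(alts)):
                for j in range(i + 1, len(alts)):
                    prefix = common_prefix(alts[i], alts[j])
                    if len(prefix) > len(best):
                        best = prefix
            if not best:
                continue
            new_nt = nt + "'"
            while new_nt in g:          # keep the new name unique
                new_nt += "'"
            k = len(best)
            factored = [alt[k:] for alt in alts if alt[:k] == best]
            others = [alt for alt in alts if alt[:k] != best]
            g[nt] = [best + [new_nt]] + others
            g[new_nt] = factored        # an empty list here is ε
            changed = True
    return g


GRAMMAR = {
    "S": [["i", "C", "t", "S"], ["i", "C", "t", "S", "e", "S"], ["a"]],
    "C": [["b"]],
}
FACTORED = left_factor(GRAMMAR)
```

For the grammar of the example, the sketch gives S -> iCtSS' | a and an S' whose alternatives are eS and ε.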
Predictive Parsing
Features:
- maintains a stack rather than recursive calls
- table-driven
Components:
1. An input buffer with end marker ($)
2. A stack with endmarker ($) on the bottom
3. A parsing table, a two-dimensional array M[A,a],
where ‘A’ is a nonterminal symbol and ‘a’ is the
current input symbol (terminal/token).
Parsing Table M[A,a] for the grammar S -> (S)S | ε
     | (           | )        | $
S    | S -> (S)S   | S -> ε   | S -> ε
Algorithm:
Input: An input string w and a parsing table M for grammar G.
Output: A leftmost derivation of w or an error indication.
Initially w$ is in the input buffer and S$ is on the stack, with S, the start symbol of the grammar, on top.
Method:
while (the top stack symbol X ≠ $)
{ let a be the next input symbol;
  if X is a terminal
  { if X == a then pop X from the stack and remove a from the input;
    else ERROR(); }
  else
  { if M[X, a] = X -> Y1Y2...Yn then
      1. pop X from the stack;
      2. push Yn Yn-1 ... Y1 onto the stack, with Y1 on top;
      3. output the production X -> Y1Y2...Yn;
    else
      ERROR();
  }
}
if (the next input symbol == $) then accept else ERROR();
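The driver loop can be sketched in Python for the grammar S -> (S)S | ε. The table encoding (a dict keyed by (nonterminal, terminal), with the empty list for ε), the use of SyntaxError for ERROR(), and the name predictive_parse are assumptions made for this illustration.

```python
def predictive_parse(tokens, table, start, nonterminals):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    stack = ["$", start]               # the top of the stack is the end of the list
    buf = list(tokens) + ["$"]         # input buffer with endmarker
    pos = 0
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], buf[pos]
        if x not in nonterminals:      # X is a terminal
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()
            pos += 1
        else:
            body = table.get((x, a))
            if body is None:           # undefined entry of M
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(body))   # Y1 ends up on top
            output.append((x, body))
    if buf[pos] != "$":
        raise SyntaxError("input remains after the stack was emptied")
    return output


# M[A, a] for S -> (S)S | ε; [] stands for ε.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}

DERIVATION = predictive_parse("()", TABLE, "S", {"S"})
```

For the input () the sketch reports three steps: S -> (S)S, then S -> ε for the inner S, then S -> ε for the trailing S.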
Construction of the parsing table for a predictive parser
First and Follow
Def. First(α) /* α denotes a string of grammar symbols */ is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in First(α).
Def. Follow(A), where A is a nonterminal, is the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a s.t. there exists a derivation of the form S =>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).
Compute First(X) for all grammar symbols X:
1. If X is a terminal, then First(X) = {X}.
2. If X -> ε is a production, then ε is in First(X).
3. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in First(X) if for some i, a is in First(Yi), and ε is in all of First(Y1), ... , First(Yi-1); that is, Y1 ... Yi-1 =>* ε. If ε is in First(Yj) for all j = 1,2,...,k, then add ε to First(X).
An Example
E -> TE'    E' -> +TE' | ε    T -> FT'    T' -> *FT' | ε
F -> (E) | id
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}
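The three rules can be sketched in Python as a fixed-point iteration, here run on the grammar of this example. The encoding (dict of symbol lists, empty list for ε, the string "ε" as a marker) and the name first_sets are assumptions made for this illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute First(X) for every nonterminal X by iterating until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                add = set()
                for sym in body:
                    # Rule 1: First of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    add |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break
                else:
                    # Rules 2 and 3: every Yi derives ε, or the body is ε.
                    add.add(EPS)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
```

Under these assumptions FIRST["E"], FIRST["T"] and FIRST["F"] all come out as {(, id}, FIRST["E'"] as {+, ε} and FIRST["T'"] as {*, ε}.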
Compute Follow(A) for all nonterminals A
1. Place $ in Follow(S), where S is the start symbol and $ is the input buffer endmarker.
2. If there is a production A -> αBβ, then everything in First(β) except for ε is placed in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
An Example
E -> TE'    E' -> +TE' | ε    T -> FT'    T' -> *FT' | ε
F -> (E) | id    /* E is the start symbol */
Follow(E) = { $, ) }          // rules 1 & 2
Follow(E') = { $, ) }         // rule 3
Follow(T) = { +, $, ) }       // rules 2 & 3
Follow(T') = { +, $, ) }      // rule 3
Follow(F) = { *, +, $, ) }    // rules 2 & 3
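The Follow rules can also be sketched in Python. To keep the sketch short, the First sets are taken as given from the example rather than recomputed; the encoding and the names first_of_string and follow_sets are assumptions made for this illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# First sets as listed in the example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """First of a sequence of grammar symbols; First of the empty sequence is {ε}."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                            # rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:              # only nonterminals have Follow
                        continue
                    beta_first = first_of_string(body[i + 1:], first)
                    add = beta_first - {EPS}          # rule 2
                    if EPS in beta_first:             # rule 3
                        add |= follow[a]
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
```

Run on this grammar, the sketch reproduces the five Follow sets of the example.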
Construct a Predictive Parsing Table
1. For each production A -> α of the grammar, do steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A, a].
3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].
4. Make each undefined entry of M be error.
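Steps 1 to 4 can be sketched in Python for the expression grammar, using the First and Follow sets listed earlier as given inputs. The table encoding (a dict from (A, a) to a list of production bodies, a missing key meaning error) and the names first_of_string and build_table are assumptions made for this illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}


def first_of_string(symbols, first):
    """First of a sequence of grammar symbols; First of the empty sequence is {ε}."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Map (A, a) to the list of bodies placed in M[A, a]."""
    table = {}
    for a, bodies in grammar.items():                 # step 1
        for body in bodies:
            body_first = first_of_string(body, first)
            targets = body_first - {EPS}              # step 2
            if EPS in body_first:                     # step 3 ($ is part of Follow)
                targets |= follow[a]
            for terminal in targets:
                table.setdefault((a, terminal), []).append(body)
    return table                                      # step 4: missing key = error


TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
```

A cell that ends up holding more than one body is a multiply-defined entry; for this grammar every filled cell holds exactly one.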
LL(1) grammar
A grammar whose parsing table has no multiply-defined
entries is said to be LL(1).
First 'L': scan the input from left to right.
Second 'L': produce a leftmost derivation.
'1': use one input symbol of lookahead to determine the parsing action.
* No ambiguous or left-recursive grammar can be LL(1).
Properties of LL(1) grammar
A grammar G is LL(1) iff whenever A -> α | β are two distinct productions of G, the following conditions hold:
(1) For no terminal a do both α and β derive strings beginning with a (based on step 2 of the table construction).
i.e., First(α) ∩ First(β) = ∅
(2) At most one of α and β can derive the empty string ε (based on step 3).
(3) If β =>* ε, then α does not derive any string beginning with a terminal in Follow(A) (based on steps 2 and 3).
i.e., First(α) ∩ Follow(A) = ∅
(i.e., if First(A) contains ε, then First(A) ∩ Follow(A) = ∅)
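The three conditions can be checked mechanically. The Python sketch below does so for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id; the First set of each alternative is read off the First sets listed in the earlier example, and the encoding and the name is_ll1 are assumptions made for this illustration.

```python
from itertools import combinations

EPS = "ε"

# Each alternative is (body, First(body)).
ALTERNATIVES = {
    "E":  [("TE'", {"(", "id"})],
    "E'": [("+TE'", {"+"}), (EPS, {EPS})],
    "T":  [("FT'", {"(", "id"})],
    "T'": [("*FT'", {"*"}), (EPS, {EPS})],
    "F":  [("(E)", {"("}), ("id", {"id"})],
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}


def is_ll1(alternatives, follow):
    """Check conditions (1) to (3) for every pair of alternatives A -> α | β."""
    for head, alts in alternatives.items():
        for (_, fa), (_, fb) in combinations(alts, 2):
            if (fa - {EPS}) & (fb - {EPS}):          # condition (1)
                return False
            if EPS in fa and EPS in fb:              # condition (2)
                return False
            if EPS in fb and fa & follow[head]:      # condition (3)
                return False
            if EPS in fa and fb & follow[head]:      # condition (3), roles swapped
                return False
    return True


EXPRESSION_GRAMMAR_IS_LL1 = is_ll1(ALTERNATIVES, FOLLOW)
```

Only E', T' and F have two alternatives, and none of their pairs violates a condition, so the sketch reports this grammar as LL(1).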
Multiply-defined entries
If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry.
e.g.
S -> iCtSS' | a    S' -> eS | ε    C -> b
generates:
M[S',e] = { S' -> ε, S' -> eS }, a multiply-defined entry.
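The conflict can be reproduced with a small Python sketch. The First set of each production body and the Follow sets below were derived by hand for this illustration and are not given in the slides; the encoding and the name conflicts are likewise assumptions.

```python
EPS = "ε"

# (head, body, First(body)); First sets derived by hand for this sketch.
PRODUCTIONS = [
    ("S", "iCtSS'", {"i"}),
    ("S", "a", {"a"}),
    ("S'", "eS", {"e"}),
    ("S'", EPS, {EPS}),
    ("C", "b", {"b"}),
]
# Follow sets derived by hand: e follows S because of S -> iCtSS' with S' -> eS.
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "C": {"t"}}


def conflicts(productions, follow):
    """Build M and return only the cells that hold more than one production."""
    table = {}
    for head, body, first in productions:
        targets = first - {EPS}
        if EPS in first:
            targets |= follow[head]
        for terminal in targets:
            table.setdefault((head, terminal), []).append(body)
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}


CONFLICTS = conflicts(PRODUCTIONS, FOLLOW)
```

Under these assumptions the only multiply-defined cell is M[S', e], which holds both S' -> eS and S' -> ε.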
Parsing table with multiply-defined entry
     | a       | b       | e                   | i             | t | $
S    | S -> a  |         |                     | S -> iCtSS'   |   |
S'   |         |         | S' -> ε, S' -> eS   |               |   | S' -> ε
C    |         | C -> b  |                     |               |   |
Difficulty in predictive parsing
Left recursion elimination and left factoring make the resulting grammar hard to read and difficult to use for translation purposes.
Thus:
* Use a predictive parser for control constructs.
* Use operator precedence for expressions.
Assignment #3b
Do exercises 4.3, 4.10, 4.13, 4.15