COP4020 Programming Languages Parsing

Prof. Xin Yuan
Overview



Top-down and bottom-up parsing
Recursive descent parsing
Table driven LL(1) parsing
5/29/2016
COP4020 Spring 2014

Parsing: the process of determining whether the start symbol can
derive the program.



If parsing succeeds, the program is valid.
If it fails, the program is invalid.
There are two general approaches:
Top down: expand from the start symbol to the whole program.
Bottom up: reduce the whole program to the start symbol.
[Figure: parse tree rooted at <expression> for identifier * identifier + identifier, built with <expression> -> <expression> <operator> <expression>]

Top-down parsing

Builds the parse tree from the root to the leaves (using a leftmost
derivation; why?).
- Recursive descent parser
- LL parser
  - First L: left-to-right scan
  - Second L: leftmost derivation

Recursive descent parsing associates a procedure with each
nonterminal in the grammar. It may require backtracking over
the input string.

Example: <type> -> <simple> | ^ id | array [ <simple> ] of <type>
         <simple> -> integer | char | num dotdot num

    void type() {
        /* FIRST(<simple>) = {integer, char, num}: use <type> -> <simple> */
        if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
            simple();
        else if (lookahead == '^') {       /* <type> -> ^ id */
            match('^');
            match(ID);
        } else if (lookahead == ARRAY) {   /* <type> -> array [<simple>] of <type> */
            match(ARRAY);
            match('[');
            simple();
            match(']');
            match(OF);
            type();
        } else error();                    /* no production starts with lookahead */
    }

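    void simple() {
        if (lookahead == INTEGER) match(INTEGER);     /* <simple> -> integer */
        else if (lookahead == CHAR) match(CHAR);      /* <simple> -> char */
        else if (lookahead == NUM) {                  /* <simple> -> num dotdot num */
            match(NUM);
            match(DOTDOT);
            match(NUM);
        } else error();
    }

    /* Consume the expected token t and read the next one; otherwise report an error. */
    void match(token t) {
        if (lookahead == t) { lookahead = nexttoken(); }
        else error();
    }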
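The type(), simple() and match() fragments above rely on a lexer and an error routine that the slides do not show. As a minimal sketch, assuming hypothetical token codes (character tokens such as '^', '[' and ']' keep their character values), a token array terminated by a hypothetical END token standing in for real input, and error() implemented with setjmp/longjmp, the same procedures can be turned into a complete C translation unit:

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical token codes: single-character tokens ('^', '[', ']')
   use their character value; the others are numbered above 255. */
typedef int token;
enum { INTEGER = 256, CHAR, NUM, DOTDOT, ID, ARRAY, OF, END };

static const token *input;  /* token stream, terminated by END */
static token lookahead;      /* current lookahead token */
static jmp_buf on_error;     /* error() jumps back to parse_type() */

static void error(void) { longjmp(on_error, 1); }

/* Stand-in for a real lexer: step to the next token of the array. */
static token nexttoken(void) { return *++input; }

static void match(token t) {
    if (lookahead == t) lookahead = nexttoken();
    else error();
}

/* <simple> -> integer | char | num dotdot num */
static void simple(void) {
    if (lookahead == INTEGER) match(INTEGER);
    else if (lookahead == CHAR) match(CHAR);
    else if (lookahead == NUM) { match(NUM); match(DOTDOT); match(NUM); }
    else error();
}

/* <type> -> <simple> | ^ id | array [ <simple> ] of <type> */
static void type(void) {
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
        simple();
    else if (lookahead == '^') { match('^'); match(ID); }
    else if (lookahead == ARRAY) {
        match(ARRAY); match('['); simple(); match(']'); match(OF); type();
    } else error();
}

/* Returns 1 if toks (terminated by END) is a <type>, 0 otherwise. */
int parse_type(const token *toks) {
    assert(toks);
    input = toks;
    lookahead = *input;
    if (setjmp(on_error)) return 0;   /* reached via error() */
    type();
    return lookahead == END;          /* the whole input must be consumed */
}
```

For instance, the tokens of array [ num dotdot num ] of integer followed by END should make parse_type return 1, while array [ integer of char returns 0 because match(']') fails on of.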

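Recursive descent parsing may require backtracking over the
input string: try the productions one by one, and backtrack if necessary.
E.g., S -> cAd, A -> ab | a
Input string: cad
After matching c, the parser tries A -> ab: a matches, but b does not match d,
so it backtracks and tries A -> a, after which d matches and the parse succeeds.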
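A minimal C sketch of this backtracking for S -> cAd, A -> ab | a, assuming the input is a plain character string; the function names are hypothetical. Each procedure saves the input position before trying an alternative and restores it on failure. It only backtracks among the alternatives of A; a fully general backtracking parser would also have to revisit A's choice if a later symbol of S failed, which this example does not need.

```c
#include <assert.h>
#include <stddef.h>

/* Grammar: S -> c A d,  A -> a b | a.
   Each function tries to match at *pos; on failure it restores *pos
   so that the caller can try another alternative. */

static const char *in;   /* input string, e.g. "cad" */

/* Match one terminal character and advance on success. */
static int term(char c, size_t *pos) {
    if (in[*pos] == c) { (*pos)++; return 1; }
    return 0;
}

static int A(size_t *pos) {
    size_t save = *pos;                               /* where A starts */
    if (term('a', pos) && term('b', pos)) return 1;   /* try A -> a b */
    *pos = save;                                      /* backtrack */
    if (term('a', pos)) return 1;                     /* try A -> a */
    *pos = save;
    return 0;
}

static int S(size_t *pos) {
    size_t save = *pos;
    if (term('c', pos) && A(pos) && term('d', pos)) return 1;  /* S -> c A d */
    *pos = save;
    return 0;
}

/* Returns 1 if s is derivable from S, 0 otherwise. */
int parse_S(const char *s) {
    assert(s);
    size_t pos = 0;
    in = s;
    return S(&pos) && s[pos] == '\0';
}
```

On "cad", A first consumes a, fails on b versus d, resets the position, and then succeeds with A -> a, so parse_S returns 1.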
A special case of recursive-descent parser that needs no backtracking is
called a predictive parser.

The parser looks at the input string and must predict the right production
every time to avoid backtracking: LL(1).

It needs to know which first symbols can be generated by the right side of
each production; it looks ahead only one token.
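Applied to the <type> grammar from the earlier example, the first terminals that the three right sides of <type> can generate work out as follows:

```latex
\begin{aligned}
\mathrm{FIRST}(\langle\mathit{simple}\rangle) &= \{\,\texttt{integer},\ \texttt{char},\ \texttt{num}\,\}\\
\mathrm{FIRST}(\texttt{\^{}}\ \texttt{id}) &= \{\,\texttt{\^{}}\,\}\\
\mathrm{FIRST}(\texttt{array}\ [\,\langle\mathit{simple}\rangle\,]\ \texttt{of}\ \langle\mathit{type}\rangle) &= \{\,\texttt{array}\,\}
\end{aligned}
```

The three sets are pairwise disjoint, so one lookahead token always selects the production, which is exactly what the if/else chain in type() tests.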

Non-recursive predictive parsing (table-driven LL(1) parsing)

A predictive parser can be implemented by recursive-descent
parsing.


Requirement: by looking at the first terminal symbol that a
nonterminal symbol can derive, we should be able to choose the
correct production to expand the nonterminal symbol.
If the requirement is met, the parser can easily be implemented
using a non-recursive scheme by building a parsing table.

A parsing table example
(1) E -> TE'
(2) E' -> +TE'
(3) E' -> ε
(4) T -> FT'
(5) T' -> *FT'
(6) T' -> ε
(7) F -> (E)
(8) F -> id

        id     +      *      (      )      $
E       (1)                  (1)
E'             (2)                  (3)    (3)
T       (4)                  (4)
T'             (6)    (5)           (6)    (6)
F       (8)                  (7)
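The same grammar can also be parsed by recursive descent, with one C function per nonterminal choosing its production from the lookahead just as the table rows do; for example, E' picks (2) on + and the ε-production (3) on ) or $. A minimal sketch, assuming a hypothetical lexer that turns the text "id" into an ID token and reports END for $, and using an error flag instead of an error() routine:

```c
#include <assert.h>
#include <string.h>

/* Token codes: single-character tokens use their character value;
   ID and END are placed outside the char range (an assumption). */
enum { ID = 256, END = 257 };

static const char *src;   /* unread input text, e.g. "id+id*id" */
static int lookahead;     /* current token */
static int ok;            /* cleared on the first syntax error */

/* Tiny lexer: "id" -> ID, end of string -> END, else one character. */
static int next_token(void) {
    if (*src == '\0') return END;
    if (strncmp(src, "id", 2) == 0) { src += 2; return ID; }
    return (unsigned char)*src++;
}

static void match(int t) {
    if (lookahead == t) lookahead = next_token();
    else ok = 0;
}

static void E(void);

/* F: (7) on '(', (8) on id */
static void F(void) {
    if (!ok) return;
    if (lookahead == '(') { match('('); E(); match(')'); }
    else if (lookahead == ID) match(ID);
    else ok = 0;
}

/* T': (5) on '*', (6) T' -> ε on '+', ')' or $ */
static void Tprime(void) {
    if (!ok) return;
    if (lookahead == '*') { match('*'); F(); Tprime(); }
    else if (lookahead == '+' || lookahead == ')' || lookahead == END) return;
    else ok = 0;
}

/* T: (4) on id or '(' */
static void T(void) {
    if (!ok) return;
    if (lookahead == ID || lookahead == '(') { F(); Tprime(); }
    else ok = 0;
}

/* E': (2) on '+', (3) E' -> ε on ')' or $ */
static void Eprime(void) {
    if (!ok) return;
    if (lookahead == '+') { match('+'); T(); Eprime(); }
    else if (lookahead == ')' || lookahead == END) return;
    else ok = 0;
}

/* E: (1) on id or '(' */
static void E(void) {
    if (!ok) return;
    if (lookahead == ID || lookahead == '(') { T(); Eprime(); }
    else ok = 0;
}

/* Returns 1 if s is a sentence of the grammar, 0 otherwise. */
int parse_expr(const char *s) {
    assert(s);
    src = s;
    ok = 1;
    lookahead = next_token();
    E();
    return ok && lookahead == END;
}
```

Under these assumptions parse_expr("id+id*id") and parse_expr("(id+id)*id") should return 1, and parse_expr("id+*id") should return 0, because T finds * where the table has no entry for T.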

Using the parsing table, the predictive parsing program works like this.
It uses:
- A stack of grammar symbols ($ on the bottom)
- A string of input tokens ($ at the end)
- A parsing table M[NT, T] of productions
Algorithm:
Put '$ Start' on the stack ($ marks the end of the input string).
1) If top == input == $, then accept.
2) If top == input, then
   pop the top of the stack; advance to the next input symbol; goto 1.
3) If top is a nonterminal:
   if M[top, input] is a production, then
   replace top with the right-hand side of the production (pushed in
   reverse, so its leftmost symbol ends up on top); goto 1
   else error.
4) Else error.
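The algorithm above can be sketched in C for the E/T/F grammar and its parsing table. This is an illustration under assumptions: each grammar symbol is encoded as one hypothetical character (e for E', t for T', i for id), the table is a 2-D array whose columns follow the order id + * ( ) $, and the hypothetical function ll1_parse records which productions it applies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-character encoding: E, e = E', T, t = T', F are
   nonterminals; i = id; '$' ends the input. */
static const char NONTERMS[] = "EeTtF";
static const char TERMS[]    = "i+*()$";

/* Right-hand sides of productions (1)..(8); "" is an ε-production. */
static const char *const RHS[9] = {
    "", "Te", "+Te", "", "Ft", "*Ft", "", "(E)", "i"
};

/* Parsing table M[NT, T]: production number, or 0 for an error entry. */
static const int M[5][6] = {
    /*          id  +  *  (  )  $ */
    /* E  */ {  1,  0, 0, 1, 0, 0 },
    /* E' */ {  0,  2, 0, 0, 3, 3 },
    /* T  */ {  4,  0, 0, 4, 0, 0 },
    /* T' */ {  0,  6, 5, 0, 6, 6 },
    /* F  */ {  8,  0, 0, 7, 0, 0 },
};

/* Position of c in set, or -1 (the '\0' terminator never matches). */
static int index_of(const char *set, char c) {
    const char *p = (c == '\0') ? NULL : strchr(set, c);
    return p ? (int)(p - set) : -1;
}

#define STACK_MAX 256

/* Parses w, which must end in '$'. On accept, returns the number of
   productions applied and stores the first cap of them in out[];
   on error, returns -1. */
int ll1_parse(const char *w, int out[], int cap) {
    char stack[STACK_MAX];
    int sp = 0, n = 0;
    const char *ip = w;                      /* current input symbol */
    assert(w);
    stack[sp++] = '$';                       /* '$' on the bottom */
    stack[sp++] = 'E';                       /* start symbol on top */
    for (;;) {
        char top = stack[sp - 1], a = *ip;
        if (top == '$' && a == '$') return n;        /* 1) accept */
        int r = index_of(NONTERMS, top);
        if (r < 0) {                                 /* top is a terminal */
            if (top != a) return -1;                 /* 4) error */
            sp--; ip++;                              /* 2) pop, advance */
            continue;
        }
        int c = index_of(TERMS, a);
        int p = (c < 0) ? 0 : M[r][c];
        if (p == 0) return -1;                       /* 3) no table entry */
        if (n < cap) out[n] = p;
        n++;
        sp--;                                        /* pop the nonterminal */
        for (size_t k = strlen(RHS[p]); k > 0; k--) {/* push RHS reversed */
            assert(sp < STACK_MAX);
            stack[sp++] = RHS[p][k - 1];
        }
    }
}
```

On "i+i*i$" (that is, id+id*id$) it should accept after applying productions 1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3 in that order; the first three are E->TE', T->FT' and F->id. Pushing the right-hand side in reverse is what leaves its leftmost symbol on top, so the parser always expands the leftmost nonterminal.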

Example: parsing id+id*id with the grammar and parsing table above

Stack        Input          Production
$E           id+id*id$      E->TE'
$E'T         id+id*id$      T->FT'
$E'T'F       id+id*id$      F->id
$E'T'id      id+id*id$
$E'T'        +id*id$
…...

This produces a leftmost derivation:
E=>TE'=>FT'E'=>idT'E'=>….=>id+id*id
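To check that the productions chosen by the table really form a leftmost derivation, a short C sketch can replay the sequence the table selects for id+id*id (1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3, obtained by following the parsing table step by step) and rewrite the leftmost nonterminal each time. The single-character encoding (e for E', t for T', i for id) and the function name leftmost_derivation are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: e = E', t = T', i = id. Production (p) has
   left-hand side LHS[p] and right-hand side RHS[p]. */
static const char LHS[] = " EeeTttFF";
static const char *const RHS[] = {"", "Te", "+Te", "", "Ft", "*Ft", "", "(E)", "i"};

static int is_nonterminal(char c) {
    return c != '\0' && strchr("EeTtF", c) != NULL;
}

/* Print a sentential form, spelling e, t, i as E', T', id. */
static void print_form(const char *form) {
    for (; *form; form++) {
        switch (*form) {
        case 'e': fputs("E'", stdout); break;
        case 't': fputs("T'", stdout); break;
        case 'i': fputs("id", stdout); break;
        default:  putchar(*form);
        }
    }
}

/* form holds the starting sentential form (normally "E") in a buffer of
   cap bytes. Each production in prods[0..n-1] rewrites the leftmost
   nonterminal of form, and the steps are printed as E => ... => w.
   Returns 0 if a production does not apply to the leftmost nonterminal
   or the buffer is too small, else 1. */
int leftmost_derivation(const int *prods, int n, char *form, size_t cap) {
    assert(prods && form);
    print_form(form);
    for (int k = 0; k < n; k++) {
        int p = prods[k];
        size_t len = strlen(form), pos = 0;
        while (pos < len && !is_nonterminal(form[pos])) pos++;  /* leftmost NT */
        if (p < 1 || p > 8 || pos == len || form[pos] != LHS[p]) return 0;
        size_t rlen = strlen(RHS[p]);
        if (len + rlen > cap) return 0;      /* new length len-1+rlen, plus '\0' */
        memmove(form + pos + rlen, form + pos + 1, len - pos);  /* shift tail and '\0' */
        memcpy(form + pos, RHS[p], rlen);
        fputs(" => ", stdout);
        print_form(form);
    }
    putchar('\n');
    return 1;
}
```

Replaying {1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3} from "E" prints
E => TE' => FT'E' => idT'E' => idE' => id+TE' => id+FT'E' => id+idT'E' => id+id*FT'E' => id+id*idT'E' => id+id*idE' => id+id*id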