11CS10026

-Mandakinee Singh (11CS10026)

What is parsing?
◦ Discovering the derivation of a string, if one exists.
◦ Harder than generating strings.

Two major approaches
◦ Top-down parsing
◦ Bottom-up parsing

A parser is top-down if it discovers a parse tree top to bottom.
◦ A top-down parse corresponds to a preorder traversal (preorder
expansion) of the parse tree.
◦ A leftmost derivation is applied at each derivation step.

Start at the root of the parse tree and grow toward leaves.
Pick a production & try to match the input.

A bad “pick” may force the parser to backtrack.



Top-Down Parser: LL(1) Grammar
LL(1) parsers
◦ Left-to-right input
◦ Leftmost derivation
◦ 1 symbol of look-ahead
Grammars that these parsers can handle are called LL(1) grammars.
Preorder expansion: the leftmost non-terminal is expanded first.

Start with the root of the parse tree
◦ Root of the tree: node labeled with the start symbol.

Algorithm:
◦ Repeat until the fringe of the parse tree matches the input
string.
◦ Keep a pointer that marks the parser's current position
in the input string.
◦ Scan the fringe of the parse tree symbol by symbol, from left
to right, and match it against the input string.

If the scanned symbol is:
◦ Terminal: if it matches the input, advance the pointer by one.
◦ Non-terminal: pick a production and add a child node for each
symbol of the chosen production.

If a terminal symbol is added that doesn’t match, backtrack.
Find the next node to be expanded (a non-terminal).
Repeat the process.

Done when:
◦ Leaves of the parse tree match the input string (success)
◦ All productions are exhausted in backtracking (failure)
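
The steps above can be sketched as a small backtracking parser. This is only an illustration under stated assumptions: the grammar below is a hypothetical right-recursive expression grammar (not the one used in these notes), chosen because this naive strategy needs a grammar without left recursion to terminate, and the input is assumed to be already split into the token names "id" and "num" plus operator symbols. The names GRAMMAR, expand and parses are made up for this sketch.

```python
# Hypothetical right-recursive grammar (an assumption for this sketch).
GRAMMAR = {
    "E": [["T", "+", "E"], ["T", "-", "E"], ["T"]],
    "T": [["F", "*", "T"], ["F", "/", "T"], ["F"]],
    "F": [["num"], ["id"]],
}


def expand(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each production in turn.
        # Moving on to the next production is the backtracking step.
        for production in GRAMMAR[first]:
            yield from expand(production + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal that matches the input: advance the pointer by one.
        yield from expand(rest, tokens, pos + 1)
    # Terminal that does not match: yield nothing, so the caller backtracks.


def parses(tokens, start="E"):
    """Return True if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in expand([start], tokens, 0))


if __name__ == "__main__":
    print(parses(["id", "-", "num", "*", "id"]))  # x - 2 * y as token names
    print(parses(["id", "-"]))
```

Success corresponds to finding a choice of productions whose leaves match the entire input; failure corresponds to exhausting every production without such a match.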
Grammar
E → E + T (rule 1) | E - T (rule 2) | T (rule 3)
T → T * F (rule 4) | T / F (rule 5) | F (rule 6)
F → number (rule 7) | Id (rule 8)

Input string: x - 2 * y
Rule  Sentential form  Input string
-     E                x - 2 * y
1     E + T            x - 2 * y
3     T + T            x - 2 * y
6     F + T            x - 2 * y
8     <Id> + T         x - 2 * y
-     <Id,x> + T       x - 2 * y
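Problem:
◦ Can’t match the next terminal: the sentential form expects + but the input has -
◦ We guessed wrong at step 2 (choosing rule 1)
Parse tree so far: E → E + T, where the left E derives T → F → x.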
Rule  Sentential form  Input string
-     E                x - 2 * y
1     E + T            x - 2 * y
3     T + T            x - 2 * y
6     F + T            x - 2 * y
8     <Id> + T         x - 2 * y
?     <Id,x> + T       x - 2 * y
Undo all these productions and go for the next production.
Rule  Sentential form   Input string
-     E                 x - 2 * y
2     E - T             x - 2 * y
3     T - T             x - 2 * y
6     F - T             x - 2 * y
8     <Id> - T          x - 2 * y
-     <Id,x> - T        x - 2 * y
6     <Id,x> - F        x - 2 * y
7     <Id,x> - <num>    x - 2 * y
Problem:
◦ More input to read
◦ Another cause of backtracking
Parse tree so far: E → E - T, where the left E derives T → F → x and the right T derives F → 2.
Rule  Sentential form           Input string
-     E                         x - 2 * y
2     E - T                     x - 2 * y
3     T - T                     x - 2 * y
6     F - T                     x - 2 * y
8     <Id> - T                  x - 2 * y
-     <Id,x> - T                x - 2 * y
4     <Id,x> - T * F            x - 2 * y
6     <Id,x> - F * F            x - 2 * y
7     <Id,x> - <num> * F        x - 2 * y
-     <Id,x> - <num,2> * F      x - 2 * y
8     <Id,x> - <num,2> * <Id>   x - 2 * y

All terminals match: we are done.
Parse tree: E → E - T, where the left E derives T → F → x and the right T → T * F derives 2 * y (T → F → 2, F → y).
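
The successful derivation can be replayed mechanically: at each step, the leftmost non-terminal is replaced by the body of the chosen rule. The sketch below is illustrative only; it uses the rule numbers of the grammar in these notes, writes the terminals as the token names "num" and "id" (an assumption, so no attribute such as x or 2 is attached), and the names RULES and leftmost_derivation are made up here.

```python
# Rule numbers follow the grammar in these notes; "num" and "id" are
# assumed token names standing for number and Id.
RULES = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["E", "-", "T"]),
    3: ("E", ["T"]),
    4: ("T", ["T", "*", "F"]),
    5: ("T", ["T", "/", "F"]),
    6: ("T", ["F"]),
    7: ("F", ["num"]),
    8: ("F", ["id"]),
}
NONTERMINALS = {"E", "T", "F"}


def leftmost_derivation(rule_numbers, start="E"):
    """Apply each rule to the leftmost non-terminal; return all sentential forms."""
    form = [start]
    steps = [" ".join(form)]
    for number in rule_numbers:
        head, body = RULES[number]
        # Position of the leftmost non-terminal in the current form.
        index = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[index] != head:
            raise ValueError(f"rule {number} does not apply to {form[index]}")
        form[index:index + 1] = body
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    for line in leftmost_derivation([2, 3, 6, 8, 4, 6, 7, 8]):
        print(line)
```

Running it with the rule sequence 2, 3, 6, 8, 4, 6, 7, 8 reproduces the sentential forms of the table above, ending in id - num * id.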

Looking carefully, there is one more possibility:
Rule  Sentential form      Input string
-     E                    x - 2 * y
1     E + T                x - 2 * y
1     E + T + T            x - 2 * y
1     E + T + T + T        x - 2 * y
1     E + T + T + T + T    x - 2 * y
Problem: Termination
◦ A wrong choice leads to infinite expansion
(more importantly: without consuming any input!)
◦ It may not be as obvious as this
◦ Our grammar is left recursive
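
A tiny sketch of why this never terminates: a parser that always tries E → E + T first keeps rewriting the leftmost E, so the sentential form grows while the input pointer stays where it started. The function name expand_first_rule and the step limit are assumptions for illustration; a real parser would have no such limit and would loop forever.

```python
def expand_first_rule(steps):
    """Apply E -> E + T to the leftmost E `steps` times.

    Returns the sentential forms seen and the input position, which
    never moves because no terminal is ever reached at the left edge.
    """
    form = ["E"]
    position = 0  # input pointer: nothing below ever advances it
    forms = [" ".join(form)]
    for _ in range(steps):
        # The leftmost symbol is E again after every expansion.
        form = ["E", "+", "T"] + form[1:]
        forms.append(" ".join(form))
    return forms, position


if __name__ == "__main__":
    forms, position = expand_first_rule(4)
    for line in forms:
        print(line)
    print("input position:", position)
```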


Formally,
A grammar is left recursive if there exists a non-terminal A such that
A → A α | β (for some strings of symbols α and β, where β does not begin with A)
Repeatedly expanding the leftmost A gives
A ⇒ A α ⇒ A α α ⇒ A α α α ⇒ … ⇒ β α α … α
How to remove it:
A → β A’
A’ → α A’ | ε
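
The transformation can be sketched in a few lines. This is a minimal illustration that handles only immediate left recursion (productions of the form A → A α), assuming productions are given as lists of symbols and that the empty list stands for ε; the function name and the data layout are made up for this sketch.

```python
def remove_immediate_left_recursion(head, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon."""
    # The alphas: what follows the leading A in each left-recursive production.
    alphas = [p[1:] for p in productions if p and p[0] == head]
    # The betas: productions that do not start with A.
    betas = [p for p in productions if not p or p[0] != head]
    if not alphas:
        return {head: productions}  # nothing to remove
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        # The trailing empty list represents epsilon.
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


if __name__ == "__main__":
    e_rules = [["E", "+", "T"], ["E", "-", "T"], ["T"]]
    for head, bodies in remove_immediate_left_recursion("E", e_rules).items():
        for body in bodies:
            print(head, "->", " ".join(body) if body else "epsilon")
```

Applied to the E productions of the grammar above, this yields E → T E’ and E’ → + T E’ | - T E’ | ε; the T productions can be treated the same way.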

Up Next: Predictive Parser