Chapter 4: Syntax Analysis Part 2: Top-Down Parsing

advertisement
Chapter 4: Syntax Analysis
Part 2: Top-Down Parsing
CSE4100
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to:
Laurent Michel
Aggelos Kiayias
Robert LeBarre
CH4p2.1
Motivation

CSE4100 



Source
 We have a grammar
Product
 We want a parsing algorithm
Idea
 Synthesize the algorithm from the grammar
Problem
 How are we going to use the grammar ???
Hint...
 We can look at the beginning of the input...
CH4p2.2
Basic intuition

Use a sliding window over the input stream
The Input
CSE4100

Benefit
 Reveal the input
 Slowly
 A little bit at a time
– 1 token at a time
– Maybe 2 at a time ?
 But systematically
– Left to Right

What kind of derivation can use tokens in this way?
CH4p2.3
Lookahead

CSE4100

Technique
 Use a window of lookahead
 Guide a LEFTMOST derivation with the
window content
So...
 How to guide ?
CH4p2.4
Predictive Parsing

CSE4100


Lookahead
 Predictive tool!
 Helps to select the right production
Question
 How?
Answered with an example...
CH4p2.5
Example
 Consider the grammar
stmt → if expr then stmt else stmt
CSE4100
|while expr do stmt
| for ( expr ; expr ; expr) stmt
| { stmtList }
| lvalue = expr
lvalue →
expr

id
→ id | integer
And the input fragment
while x do { if x then ....

Which production should we start with ?
 Why?
CH4p2.6
Key Idea
CSE4100 

Use the production body
 To know what can start a production
Selecting a production
 Pick the rule that can start with the symbol in
the lookahead window!
Using the production
 What should we do with chosen production?
Production
while expr do stmt
Input
while x do { if x then ....
Matching

CH4p2.7
Matching

When input matches
 Eat the matched symbols
CSE4100
 In the production
 In the input


Move the window further down
Deal with the rest of the production
 How?
Production
while expr do stmt
Input
while x do { if x then ....
CH4p2.8
No Matching?


CSE4100
What does it mean?
Example below
 In production:
 NON-TERMINAL

In window:
 TERMINAL

expr
x [an identifier token]
What should we do ?
Production
while expr do stmt
Input
while x do { if x then ....
CH4p2.9
Dealing with non-terminals

CSE4100

Simple...
 The non-terminal is defined somewhere
 The non-terminal is defined by a set of
productions
Corollary
 Pop the non-terminal
 Use the lookahead to choose a production for
NT.
 Push and Recur
CH4p2.10
On the Example

Recall the grammar and the current state
stmt →
if expr then stmt else stmt
....
CSE4100
lvalue → id
expr
Production
while expr do stmt
Input
while x do { if x then ....
1. Pop
result: expr
2. Choose production for expr
result: expr → id
3. Push & Recur...
result:
Production
Input
→ id | integer
while id do stmt
while x do { if x then ....
CH4p2.11
Gain?

CSE4100


Recall
 The topmost symbol
( id )
 The lookahead
x [an instance of id]
Gain?
 The top symbol and lookahead match!
So....
 Recur
 Recursive call will match them and pop them
 Keep on going
CH4p2.12
Final outcome

What can happen in the end ?
 We “eat” (match) the entire input
CSE4100
 Meaning ?

We get “stuck” at some point
 What does it mean (to be stuck)
 How can we get stuck ?
CH4p2.13
Negative outcome

CSE4100
We can get stuck
 Lookahead window and topmost non-terminal
yield...
 An empty prediction!

This is the “classic” syntax error
Syntax error at file:line. Expecting xyz got abc
 Expecting xyz
– A list of tokens predicting the productions of the current
non-terminal
 Got abc
– “abc” is the actual content of the lookahead window
CH4p2.14
Overall Top-Down Algorithm

CSE4100

Data structures
 A stack Holds the symbols to treat
 A lookahead window to choose a prediction
Algorithm
 Startup
 Initialize stack to start symbol (a non-terminal)
 Initialize lookahead window at start of token stream

Recursive Process
 Find out if front of window and top of stack match
 If match
– Consume the symbols
 No match
– Pop / Select / Push
CH4p2.15
Are We Done ?


CSE4100
Almost....
A Few Remaining issues...
 How do we get the predictions from the
grammar ?
 What should we do if the same symbol predict
>1 rule ?
 How can we implement the algorithm above ?
 How large should the lookahead window be ?
 Why is this called “top-down” ?
 What can be automated about all this ?
CH4p2.16
The Lookahead Window
How large ?
 Very good question.
 The art of counting

CSE4100
1,2, a lot.
 Conclusion

Usually, 1 token is plenty
 Occasionally, 2 tokens may be needed
 Beyond 2 is wild
 In theory


Any finite constant will do 1,2,3....,k
CH4p2.17
Lookahead Size and Languages
With k tokens of lookahead
 Some languages can be parsed (this way)
 Some languages cannot be parsed
 What about k+1 tokens ?

CSE4100




More languages can be parsed....
So the set of languages recognized with k is a
subset of the set of languages recognized with
k+1!
We have a hierarchy!
Still...
 In practice LL(1) should be enough.
CH4p2.18
Top-Down Parsing


CSE4100




Identify a leftmost derivation for an input string
Why ?
 By always replacing the leftmost non-terminal
symbol via a production rule, we are guaranteed
of developing a parse tree in a left-to-right
fashion consistent with scanning the input.
 A  aBc  adDc  adec (scan a, scan d,
scan e, scan c - accept!)
Recursive-descent parsing concepts
Predictive parsing
 Recursive / Brute force technique
 non-recursive / table driven
Error recovery
Implementation
CH4p2.19
Top-Down Parsing


CSE4100




Identify a leftmost derivation for an input string
Why ?
 By always replacing the leftmost non-terminal
symbol via a production rule, we are guaranteed
of developing a parse tree in a left-to-right
fashion consistent with scanning the input.
 A  aBc  adDc  adec (scan a, scan d,
scan e, scan c - accept!)
Recursive-descent parsing concepts
Predictive parsing
 Recursive / Brute force technique
 non-recursive / table driven
Error recovery
Implementation
CH4p2.20
Recursive Descent Parsing Concepts
CSE4100
• General category of Parsing Top-Down
• Choose production rule based on input symbol
• May require backtracking to correct a wrong choice.
• Example:
S cAd
input: cad
A  ab | a
S
cad
c
cad
cad
d
A
S
c
a
S
c
a
A
A
d
b
cad
d
b
S
c
A
a
Problem: backtrack
cad
d
S
c
A
d
a
CH4p2.21
Predictive Parsing : Recursive

•
CSE4100


To eliminate backtracking, grammar must have:
 no left recursion
 apply left factoring
 remove -moves
If so, we can utilize current input symbol in
conjunction with non-terminals to be expanded to
uniquely determine the next action
Utilize transition diagrams (TDs):
 For each non-terminal of the grammar:
 Create an initial and final state

 If A X1X2…Xn is a production, add path
with edges X1, X2, … , Xn
TDs can be algorithmized into Program
CH4p2.22
Transition Diagrams (TDs)
CSE4100
• Unlike lexical equivalents, each edge represents a token
•Transition implies: if token, choose edge, call proc
• Recall earlier grammar and its associated TDs
F  ( E ) | id
T  FT’
T’  * FT’ | 
E  TE’
E’  + TE’ | 
E:
0
E’:
3
T
+
1
4
E’
T
2
5
E’
How are transition
diagrams used ?
6

T:
7
T’:
10
F:
14
F
*
(
8
11
15
T’
F

E
id
9
12
16
T’
)
Are -moves a
problem ?
Can we simplify
transition diagrams ?
13
Why is simplification
critical ?
17
CH4p2.23
How are Transition Diagrams Used ?
CSE4100
main()
{
TD_E();
}
TD_E()
{
TD_T();
TD_E’();
}
TD_T()
{
TD_F();
TD_T’();
}
TD_E’()
{
token = get_token();
if token = ‘+’ then
{ TD_T(); TD_E’(); }
}
TD_F()
{
token = get_token();
if token = ‘(’ then
{ TD_E(); match(‘)’); }
else
if token.value <> id then
{error + EXIT}
else
...
}
What
happened to
-moves?
NOTE: not
all error
conditions
have been
represented.
TD_E’()
{
token = get_token();
if token = ‘*’ then
{ TD_F(); TD_T’(); }
}
CH4p2.24
How can Transition Diagrams be
Simplified ?
E’:
3
+
4
T
5
E’
6

CSE4100
CH4p2.25
How can Transition Diagrams be
Simplified ? (2)
E’:
3
+
4
T
5
E’
6

CSE4100

E’:
3
+
4
T
5

6
CH4p2.26
How can Transition Diagrams be
Simplified ? (3)
E’:
3
+
4
T
5
E’
6

CSE4100
T

E’:
3
+
4
T
5
E’:
3
+
4


6
6
CH4p2.27
How can Transition Diagrams be
Simplified ? (4)
E’:
3
+
4
T
5
E’
6

CSE4100
T

E’:
3
+
4
T
5
T
+
4
6
6
0
3


E:
E’:
1
E’
2
CH4p2.28
How can Transition Diagrams be
Simplified ? (5)
E’:
3
+
4
T
5
E’
6

CSE4100
T

E’:
3
+
4
T
5
T
+
4
6
6
0
3


E:
E’:
E’
1
2
T
E:
0
T
3
+
4

6
CH4p2.29
How can Transition Diagrams be
Simplified ? (6)
E’:
3
+
4
T
5
E’
6

CSE4100
T

E’:
3
+
4
T
E’:
5
3
+
4


6
6
E:
T
0
E’
1
2
+
T
E:
0
T
3
+
E:
4

How ?
6
0
T
3

6
CH4p2.30
Additional Transition Diagram
Simplifications
• Similar steps for T and T’
CSE4100
• Simplified Transition diagrams:
*
T:
F
7
10

Why is simplification
important ?
13
F
T’:
10
*
How does code change?
11

13
F:
14
(
15
E
16
)
17
id
CH4p2.31
Motivating Table-Driven Parsing
1. Left to right scan input
CSE4100
2. Find leftmost derivation
Grammar: E  TE’
E’  +TE’ | 
T  id
Terminator
Input : id + id $
Derivation: E 
Processing Stack:
CH4p2.32
Non-Recursive / Table Driven
a + b $
CSE4100
Stack
X
NT + T
symbols of
CFG
Y
Empty stack
symbol
$
Z
Input
Predictive Parsing
Program
Output
What actions parser
should take based on
stack / input
Parsing Table
M[A,a]
General parser behavior: X : top of stack
(String + terminator)
a : current input
1. When X=a = $ halt, accept, success
2. When X=a  $ , POP X off stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]
if it is an error  call recovery routine
if M[X,a] = {X  UVW}, POP X, PUSH W,V,U
DO NOT expend any input
CH4p2.33
Algorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;
repeat
CSE4100
let X be the top stack symbol and a the symbol pointed to by ip;
if X is terminal or $ then
Input pointer
if X=a then
pop X from the stack and advance ip
else error()
else
/* X is a non-terminal */
if M[X,a] = XY1Y2…Yk then begin
pop X from stack;
push Yk, Yk-1, … , Y1 onto stack, with Y1 on top
output the production XY1Y2…Yk
end
else error()
May also execute other code
based on the production used
until X=$ /* stack is empty */
CH4p2.34
Example
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
CSE4100
Our well-worn example !
Table M
Nonterminal
E
INPUT SYMBOL
id
(
TFT’
$
E’
E’
T’
T’
TFT’
T’
Fid
)
ETE’
E’+TE’
T’
F
*
ETE’
E’
T
+
T’*FT’
F(E)
CH4p2.35
Trace of Example
STACK
CSE4100
$E
$E’T
$E’T’F
$E’T’id
$E’T’
$E’
$E’T+
$E’T
$E’T’F
$E’T’id
$E’T’
$E’T’F*
$E’T’F
$E’T’id
$E’T’
$E’
$
INPUT
id + id * id$
id + id * id$
id + id * id$
id + id * id$
+ id * id$
+ id * id$
+ id * id$
id * id$
id * id$
id * id$
* id$
* id$
id$
id$
$
$
$
OUTPUT
E TE’
T FT’
F  id
T’  
E’  +TE’
Expend Input
T FT’
F  id
T’  *FT’
F  id
T’  
E’  
CH4p2.36
Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
CSE4100
E  TE’  FT’E’  id T’E’  id E’  id + TE’  id + FT’E’
 id + id T’E’  id + id * FT’E’  id + id * id T’E’
 id + id * id E’  id + id * id
CH4p2.37
What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M !
CSE4100
1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table
Conceptual Perspective:
First: Let  be a string of grammar symbols. First() are
the first terminals that can appear in  in any possible
*
derivation. NOTE: If  
, then  is First( ).
Follow: Let A be a non-terminal. Follow(A) is the set of
terminals that can appear directly to the right of A in
*
some sentential form. (S 
Aa, for some  and ).
*
NOTE: If S  A, then $ is Follow(A).
CH4p2.38
Computing First(X) :
All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X  is a production rule, add  to First(X)
CSE4100
3. If X is a non-terminal, and X Y1Y2…Yk is a production rule
Place First(Y1) in First(X)
*
if Y1
,
Place First(Y2) in First(X)
* ,
if Y2 
Place First(Y3) in First(X)
…
* ,
if Yk-1 
Place First(Yk) in First(X)
*  , Stop.
NOTE: As soon as Yi 
May repeat 1, 2, and 3, above for each Yj
CH4p2.39
Computing First(X) :
All Grammar Symbols - continued
Informally, suppose we want to compute
CSE4100
First(X1 X2 … Xn ) = First (X1) “+”
First(X2) if  is in First(X1) “+”
First(X3) if  is in First(X2) “+”
…
First(Xn) if  is in First(Xn-1)
Note 1: Only add  to First(X1 X2 … Xn) if 
is in First(Xi) for all i
Note 2: For First(X1), if X1 Z1 Z2 … Zm ,
then we need to compute First(Z1 Z2 … Zm) !
CH4p2.40
Conceptually:
What is First (E, T, …) in Derivation?
CSE4100
The leftmost derivation for the example is as follows:
INPUT: id + id * id $
E $  TE’  FT’E’  id T’E’  id E’  id + TE’  id + FT’E’
 id + id T’E’  id + id * FT’E’  id + id * id T’E’
 id + id * id E’  id + id * id $
CH4p2.41
Example
Computing First for:
CSE4100
First(TE’)
First(E)
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
First(T) “+” First(E’)
* 
Not First(E’) since T 
First(T)
First(F) “+” First(T’)
First((E)) “+” First(id)
Overall:
First(F) Not First(T’) since F 
* 
“(“ and “id”
First(E) = { ( , id } = First(F)
First(E’) = { + ,  }
First(T’) = { * ,  }
First(T)  First(F) = { ( , id }
CH4p2.42
Example 2
Given the production rules:
CSE4100
S  i E t SS’ | a
S’  eS | 
E b
Verify that
First(S) = { i, a }
First(S’) = { e,  }
First(E) = { b }
CH4p2.43
Computing Follow(A) :
All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $
signals end of input
CSE4100
2. If there is a production A B, then everything in
First() is in Follow(B) except for .
* 
3. If A B is a production, or A B and  
(First() contains  ), then everything in Follow(A) is in
Follow(B)
(Whatever followed A must follow B, since nothing
follows B from the production rule)
We’ll calculate Follow for two grammars.
CH4p2.44
Conceptually:
What is Follow in Derivation?
The leftmost derivation for the example is as follows:
CSE4100
INPUT: id + id * id $
E$  TE’  FT’E’  id T’E’  id E’  id + TE’  id + FT’E’
 id + id T’E’  id + id * FT’E’  id + id * id T’E’
 id + id * id E’  id + id * id $
CH4p2.45
Example
Compute Follow for:
CSE4100
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
• Follow(E) - contains $ since E is the start symbol. Also, since F  (E) then
First(“)”) is in Follow(E). Thus Follow(E) = { ) , $ }
• Follow(E’) : E  TE’ implies Follow(E) is in Follow(E’), and
Follow(E’) = { ) , $ }
* , put in
• Follow(T) : E  TE’ implies put in First(E’). Since E’ 
* , put in
Follow(E). Since E’  +TE’ , Put in First(E’), and since E’ 
Follow(E’). Thus Follow(T) = { +, ), $ }.
• Follow(T’)
• Follow(F)
You do these !
CH4p2.46
Computing Follow : 2nd Example
Recall:
CSE4100
S  i E t SS’ | a
First(S) = { i, a }
S’  eS | 
First(S’) = { e,  }
E b
First(E) = { b }
Follow(S) – Contains $, since S is start symbol
Since S  i E t SS’ , put in First(S’) – not 
* , Put in Follow(S)
Since S’ 
Since S’  eS, put in Follow(S’)
So…. Follow(S) = { e, $ }
Follow(S’) = Follow(S) HOW?
Follow(E) = { t }
CH4p2.47
First & Follow – One More Look
Consider the following derivation:
CSE4100
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.48
First & Follow – One More Look
Consider the following derivation:
First(E) = { ( , id }
What’s First for each non-terminal ?
CSE4100
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.49
First & Follow – One More Look
Consider the following derivation:
First(T) = { ( , id }
What’s First for each non-terminal ?
CSE4100
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.50
First & Follow – One More Look
Consider the following derivation:
First(T’) = { * ,  }
What’s First for each non-terminal ?
CSE4100
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
T’  

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.51
First & Follow – One More Look
Consider the following derivation:
First(E’) = { + ,  }
What’s First for each non-terminal ?
CSE4100
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
E’  
CH4p2.52
First & Follow – One More Look
Consider the following derivation:
First(F) = { ( , id }
What’s First for each non-terminal ?
CSE4100
You do
First(F) !
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.53
First & Follow – One More Look
Consider the following derivation:
What’s First for each non-terminal ?
CSE4100
Still needs your
First(F)
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
T’  

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
E’  
CH4p2.54
First & Follow – One More Look
Consider the following derivation:
CSE4100
Follow(E) = { ( , id }
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.55
First & Follow – One More Look
Consider the following derivation:
CSE4100
Follow(T) = { + , ) , $ }
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
This “+” in Follow(T)
comes from the First(E’)
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.56
First & Follow – One More Look
Consider the following derivation:
CSE4100
Follow(T’) = { + , ) , $ }
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
T’  
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.57
First & Follow – One More Look
Consider the following derivation:
CSE4100
Follow(E’) = { ) , $ }
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
E’  
CH4p2.58
First & Follow – One More Look
Consider the following derivation:
CSE4100
Follow(F) = { +, *, ) , $ }
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 
You do
Follow(F) !
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
CH4p2.59
First & Follow – One More Look
Consider the following derivation:
CSE4100
What’s Follow for each non-terminal ?
Still needs your
Follow(F)
E  TE’  FT’E’  ( E ) T’E’ 
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
T’  
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
E’  
CH4p2.60
First & Follow – One More Look
Consider the following derivation:
CSE4100
Still needs your
First(F) and
Follow(F)
What’s First for each non-terminal ?
What’s Follow for each non-terminal ?
E  TE’  FT’E’  ( E ) T’E’ 

( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 

( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
T’  

* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
E’  
CH4p2.61
First & Follow – One More Look
Consider the following derivation:
What are implications ?
CSE4100
1. M [ E, ( ]
( id ) * id + id$ (input)
E  TE’  FT’E’  ( E ) T’E’ 
2. M [ T, ( ]
3. M [ F, ( ]
( TE’ ) T’E’  ( FT’E’ ) T’E’  ( id T’E’ ) T’E’ 
( id E’ ) T’E’  ( id ) T’E’  ( id ) * FT’E’ 
M - Table
1. E  TE’ and (
in First(E)
2. TFT’ and (
in First(T)
3. F (E) and (
in First(F)
4. E’  and )
in Follow(E’)
4. M [ E’, ) ]
( id ) * id T’E’  ( id ) * id E’  ( id ) * id + TE’ 
* ( id ) * id + id$
( id ) * id + FT’E’  ( id ) * id + T’E’ 
5. M [ T’, $ ]
6. M [ E’, $ ]
5. Since $ in
Follow(T’),
T’
6. Since $ in
Follow(E’),
E’
CH4p2.62
Motivation Behind First & Follow
First:
CSE4100
Is used to indicate the relationship between non-terminals
(in the stack) and input symbols (in input stream)
Example: If A   , and a is in First(), then when
a=input, replace with .
( a is one of first symbols of , so when A is on the stack
and a is input, POP A and PUSH .
Follow: Is used when First has a conflict, to resolve choices.
* , then what follows A dictates the
When    or  
next choice to be made.
Example: If A   , and b is in Follow(A ), then when
a * , and if b is an input character, then we expand A
with  , which will eventually expand to , of which b
follows!
( Above  * . Here First( ) contains .)
CH4p2.63
Constructing Parsing Table
Algorithm:
CSE4100
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A  to M[A, a ]
3.1  in First()? Add A  to M[A, a ] for all
terminals b in Follow(A).
3.2  in First() and $ in Follow(A)? Add A  to
M[A, $ ]
4. All undefined entries are errors.
CH4p2.64
Constructing Parsing Table - Example
E  TE’
E’  + TE’ | 
T  FT’
CSE4100 T’  * FT’ | 
F  ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +,  }
First(T’) = { *,  }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ),  }
Follow(T,T’) = { +, ),  }
Expression Example: E  TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E  TE’
M[E, id ] : E  TE’
by rule 2
(by rule 2) E’  +TE’ : First(+TE’) = + : M[E’, +] : E’  +TE’
(by rule 3) E’   :  in First( )
T’   :  in First( )
M[E’, )] : E’   (3.1)
M[T’, +] : T’   (3.1)
M[E’, $] : E’   (3.2)
M[T’, )] : T’   (3.1)
(Due to Follow(E’)
M[T’, $] : T’   (3.2)
CH4p2.65
Constructing Parsing Table – Example 2
CSE4100
S  i E t SS’ | a
First(S) = { i, a }
Follow(S) = { e, $ }
S’  eS | 
First(S’) = { e,  }
Follow(S’) = { e, $ }
E b
First(E) = { b }
Follow(E) = { t }
S  i E t SS’
Sa
Eb
First(i E t SS’)={i}
First(a) = {a}
First(b) = {b}
S’  eS
First(eS) = {e}
S
First() = {}
Follow(S’) = { e, $ }
INPUT SYMBOL
Nonterminal
a
S
S a
b
i
t
$
S iEtSS’
S’ 
S’ eS
S’
E
e
S 
E b
CH4p2.66
Example

Step 1
 Compute
CSE4100
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
 Follow
 First
Overall:
First(S) = {
First(E) = { ( , id } = First(F)
First(E’) = { + ,  }
First(T’) = { * ,  }
First(T)  First(F) = { ( , id }
Follow(E) = Follow(E’) = { ), $ }
Follow(T) = Follow(T’) = {+, ), $ }
Follow(F) = {+, *, ), $ }
CH4p2.67
Example

CSE4100 
Step 2
 Build the parser table
Step 3
 Input: Id + Id * Id $
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
Parser Table
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.68
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.69
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.70
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.71
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.72
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.73
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.74
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.75
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.76
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.77
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.78
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.79
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.80
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.81
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.82
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.83
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.84
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.85
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.86
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.87
Parsing Process Over Time
CSE4100
Time
Id + Id * Id $
Input Symbols
NT
Id
+
*
(
S
S → E$
S → E$
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
CH4p2.88
LL(1) Grammars
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
CSE4100
1 : Use “1” input symbol as lookahead in conjunction
with stack to decide on the parsing action
LL(1) grammars have no multiply-defined entries in
the parsing table.
Properties of LL(1) grammars:
• Grammar can’t be ambiguous or left recursive
• Grammar is LL(1) when A 
1.  &  do not derive strings starting with the
same terminal a
2. Either  or  can derive , but not both.
Note: It may not be possible for a grammar to be
manipulated into an LL(1) grammar
CH4p2.89
Error Recovery
When Do Errors Occur? Recall Predictive Parser Function:
a + b $
CSE4100
Stack
X
Y
Z
$
Predictive Parsing
Program
Input
Output
Parsing Table
M[A,a]
1. If X is a terminal and it doesn’t match input.
2. If M[ X, Input ] is empty – No allowable actions
Consider two recovery techniques:
A. Panic Mode
B. Phase-level Recovery
CH4p2.90
Panic Mode Recovery
Augment parsing table with action that attempts to realign
/ synchronize token stream with the expected input.
CSE4100
Suppose : A on top of stack doesn’t mesh with current input
symbol
1. Use Follow(A) to remove input tokens – sync (discard)
2. Use First(A) to determine when to restart parsing
3. Incorporate higher level language concepts (begin/end,
while, repeat/until) to sync actions  we don’t skip
tokens unnecessarily.
Other actions:
4. When A  , use it to manipulate stack to postpone
error detection
5. Use non-matching terminal on stack as token that is
inserted into input.
CH4p2.91
Revised Parsing Table / Example
Nonterminal
CSE4100
E
INPUT SYMBOL
id
(
)
ETE’
synch
E’+TE’
TFT’
T’
F
*
ETE’
E’
T
+
Fid
E’
synch
TFT’
T’
T’*FT’
synch
synch
From Follow sets. Pop
stack entry – T or NT
synch
T’
F(E)
synch
$
synch
E’
synch
T’
synch
Skip input symbol
CH4p2.92
Skip & Synch

Meaning
 Skip
CSE4100
Discard input symbol

Synch
Pop top of stack
Messages
 Constructed based on lookahead an non-terminal
 Example




NT = F
Lookahead = +
Expecting a FACTOR. Got + for a Term. So a
factor is missing.
CH4p2.93
Revised Parsing Table / Example(2)
STACK
CSE4100
$E
$E
$E’T
$E’T’F
$E’T’id
$E’T’
$E’T’F*
$E’T’F
$E’T’
$E’
$E’T+
$E’T
$E’T’F
$E’T’id
$E’T’
$E’
$
INPUT
) id * + id$
id * + id$
id * + id$
id * + id$
id * + id$
* + id$
* + id$
+ id$
+ id$
+ id$
+ id$
id$
id$
id$
$
$
$
Remark
error, skip )
id is in First(E)
error, M[F,+] = synch
F has been popped
CH4p2.94
Phase-Level Recovery

CSE4100 


Fill in blanks entries of parsing table with error
handling routines
These routines
 modify stack and / or input stream
 issue error message
Problems:
 Modifying stack has to be done with care, so as
to not create possibility of derivations that
aren’t in language
 Infinite loops must be avoided
Can be used in conjunction with panic mode to
have more complete error handling
CH4p2.95
How Would You Implement TD Parser
• Stack – Easy to handle. Write ADT to manipulate its contents
• Input Stream – Responsibility of lexical analyzer
CSE4100 • Key Issue – How is parsing table implemented ?
One approach: Assign unique IDS
INPUT SYMBOL
Nonterminal
E
id
(
)
ETE’
synch
E’+TE’
TFT’
T’
F
*
ETE’
E’
T
+
Fid
All rules have
unique IDs
E’
synch
TFT’
T’
T’*FT’
synch
synch
Ditto for synch
actions
synch
T’
F(E)
synch
$
synch
E’
synch
T’
synch
Also for blanks
which handle
errors
CH4p2.96
Revised Parsing Table:
Nonterminal
CSE4100
INPUT SYMBOL
id
+
*
(
)
E
1
18
19
1
9
E’
20
2
21
22
3
3
T
4
11
23
4
12
13
T’
24
6
5
25
6
6
F
8
14
15
7
16
17
1 ETE’
2 E’+TE’
3 E’
4 TFT’
5 T’*FT’
6 T’
7 F(E)
8 Fid
9 – 17 :
Sync
Actions
$
10
18 – 25 :
Error
Handlers
CH4p2.97
Revised Parsing Table: (2)
Each # ( or set of #s) corresponds to a procedure that:
CSE4100
• Uses Stack ADT
• Gets Tokens
• Prints Error Messages
• Prints Diagnostic Messages
• Handles Errors
CH4p2.98
How is Parser Constructed ?
One large CASE statement:
CSE4100
state = M[ top(s), current_token ]
switch (state)
{
case 1: proc_E_TE’( ) ;
break ;
…
case 8: proc_F_id( ) ;
break ;
case 9: proc_sync_9( ) ;
break ;
…
case 17: proc_sync_17( ) ;
break ;
case 18:
…
Procs to handle errors
case 25:
}
Combine  put
in another switch
Some sync
actions may be
same
Some error
handlers may be
similar
CH4p2.99
Final Comments – Top-Down Parsing
CSE4100
So far,
• We’ve examined grammars and language theory
and its relationship to parsing
• Key concepts: Rewriting grammar into an
acceptable form
• Examined Top-Down parsing:
Brute Force : Transition diagrams & recursion
Elegant : Table driven
• We’ve identified its shortcomings:
Not all grammars can be made LL(1) !
• Bottom-Up Parsing – Next Up!
CH4p2.100
Download