Midterm Exam Advice and Hints

advertisement
Midterm Exam Advice and Hints
CSE4100
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Dr. Robert LaBarre
United Technologies Research Center
411 Silver Lane
E. Hartford, CT 06018
LaBarrRE@utrc.utc.com
MTE.1
Core Material

CSE4100


Chapter 1: Introduction to Compilers
 Basic Compiler Ideas and Concepts
 The “Big Picture”
Chapter 2: A Simple One-Pass Compiler
 A Look at All Phases of Compilation Process
 From Lexical Analysis Thru Code Generation
FOCUS: Chapter 3: Lexical Analysis
 Specifying/Recognizing Tokens
 Patterns (Regular Expressions) and Lexemes
 Regular Expressions and DFA/NFA
 Algorithms for
 Regular Expression to NFA
 NFA to DFA
MTE.2
Core Material

FOCUS Chapter 4: Syntax Analysis
 Context Free Grammar: Defs and Concepts
CSE4100
 Derivations, Specification, Languages
 Writing Grammars
 Ambiguity, Left Recursion, Left Factoring,
Removing epsilon Moves
 Algorithm for Left Recursion Removal

Top-Down Parsing





Recursive Descent and Predictive Parsing
First and Follow Calculation
Constructing LL(1) Parsing Table
Ambiguity and Error Handling
Lex and Yacc will not be Tested!
MTE.3
Hints for Taking Exam


CSE4100 



Read the Questions Carefully!
Ask Questions if you are Confused!
Answer Questions in Any Order
 Organized to fit on minimum number of pages
 Answer “Easiest” questions for you!
Assess Points per Time Unit
 75 minutes = 75 points
 30 minutes = 30 points; 20 minutes = 20 points
Don't Be Afraid to Not Answer a Question
 60% Correct for 100 Points = 60 Points
 90% Correct For 80 Points = 72 Points
Partial Credit is the Norm
MTE.4
Possible Questions


CSE4100 




Open Notes and Open Book
5 to 6 Total Multi-Part Questions
Possibilities…
 Constructive and Algorithm Questions
 Writing and Using Grammar
 Understanding Significance and Relevance of
Concepts
Know your Algorithms and Constructs
(Regular Expressions, NFA, DFA, CFG)
Show All Work to Receive Partial (Any) Credit
Do Not Jump to Final Answer
Avoid Run-on Explanations
MTE.5
Chapter 3 Excerpted Material
Introducing Basic Terminology
Token
CSE4100
Sample Lexemes
Informal Description of Pattern
const
const
const
if
if
if
relation
<, <=, =, < >, >, >=
< or <= or = or < > or >= or >
id
pi, count, D2
letter followed by letters and digits
num
3.1416, 0, 6.02E23
any numeric constant
literal
“core dumped”
any characters between “ and “ except
“
Classifies
Pattern
Actual values are critical. Info is :
1. Stored in symbol table
2. Returned to parser
MTE.6
Language Concepts
A language, L, is simply any set of strings
over a fixed alphabet.
CSE4100
Alphabet
{0,1}
Languages
{0,10,100,1000,100000…}
{0,1,00,11,000,111,…}
{a,b,c}
{abc,aabbcc,aaabbbccc,…}
{A, … ,Z}
{TEE,FORE,BALL,…}
{FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…9, { All legal PASCAL progs}
+,-,…,<,>,…}
{ All grammatically correct
English sentences }
Special Languages:  - EMPTY LANGUAGE
 - contains  string only
MTE.7
Formal Language Operations
CSE4100 OPERATION
union of L and M
written L  M
concatenation of L
and M written LM
Kleene closure of L
written L*
DEFINITION
L  M = {s | s is in L or s is in M}
LM = {st | s is in L and t is in M}

L*=  Li
i 0
L* denotes “zero or more concatenations of “ L
positive closure of L
written L+

L+=
L
i
i 1
L+ denotes “one or more concatenations of “ L
MTE.8
Formal Language Operations
Examples
L = {A, B, C, D }
CSE4100
D = {1, 2, 3}
L  D = {A, B, C, D, 1, 2, 3 }
LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 }
L2 = { AA, AB, AC, AD, BA, BB, BC, BD, CA, … DD}
L4 = L2 L2 = ??
L* = { All possible strings of L plus  }
L+ = L* - 
L (L  D ) = ??
L (L  D )* = ??
MTE.9
Language & Regular Expressions

A Regular Expression is a Set of Rules /
Techniques for Constructing Sequences of Symbols
(Strings) From an Alphabet.

Let  Be an Alphabet, r a Regular Expression
CSE4100
Then L(r) is the Language That is Characterized
by the Rules of R
MTE.10
Rules for Specifying Regular Expressions:
1.  is a regular expression denoting { }
CSE4100
If a is in , a is a regular expression that denotes {a}
2.
3. Let r and s be regular expressions with languages L(r)
and L(s). Then
p
r
e
c
e
d
e
n
c
e
(a) (r) | (s) is a regular expression  L(r)  L(s)
(b) (r)(s) is a regular expression  L(r) L(s)
(c) (r)* is a regular expression  (L(r))*
(d) (r) is a regular expression  L(r)
All are Left-Associative.
MTE.11
EXAMPLES of Regular Expressions
L = {A, B, C, D }
D = {1, 2, 3}
CSE4100
A|B|C|D =L
(A | B | C | D ) (A | B | C | D ) = L2
(A | B | C | D )* = L*
(A | B | C | D ) ((A | B | C | D ) | ( 1 | 2 | 3 )) = L (L  D)
MTE.12
Algebraic Properties of
Regular Expressions
CSE4100
AXIOM
r|s=s|r
r | (s | t) = (r | s) | t
(r s) t = r (s t)
r(s|t)=rs|rt
(s|t)r=sr|tr
r = r
r = r
r* = ( r |  )*
r** = r*
DESCRIPTION
| is commutative
| is associative
concatenation is associative
concatenation distributes over |
 Is the identity element for concatenation
relation between * and 
* is idempotent
MTE.13
Automata & Language Theory

Terminology
 FSA
 A recognizer that takes an input string and determines whether it’s
a valid string of the language.
CSE4100

Non-Deterministic FSA (NFA)
 Has several alternative actions for the same input symbol

Deterministic FSA (DFA)
 Has at most 1 action for any given input symbol

Bottom Line
 expressive power(NFA) == expressive power(DFA)
 Conversion can be automated
MTE.14
Finite Automata & Language Theory
Finite Automata :
CSE4100
A recognizer that takes an input
string & determines whether it’s a
valid sentence of the language
Non-Deterministic : Has more than one alternative action
for the same input symbol.
Can’t utilize algorithm !
Deterministic :
Has at most one action for a given
input symbol.
Both types are used to recognize regular expressions.
MTE.15
NFAs & DFAs
CSE4100
Non-Deterministic Finite Automata (NFAs) easily
represent regular expression, but are somewhat less
precise.
Deterministic Finite Automata (DFAs) require more
complexity to represent regular expressions, but offer
more precision.
We’ll discuss both plus conversion algorithms, i.e.,
NFA  DFA and DFA  NFA
MTE.16
Non-Deterministic Finite Automata
An NFA is a mathematical model that consists of :
CSE4100
• S, a set of states
• , the symbols of the input alphabet
• move, a transition function.
• move(state, symbol)  state
• move : S    S
• A state, s0  S, the start state
• F  S, a set of final or accepting states.
MTE.17
Example NFA
a
S = { 0, 1, 2, 3 }
CSE4100
start
s0 = 0
a
0
F={3}
1
b
b
2
3
b
 = { a, b }
What Language is defined ?
What is the Transition Table ?
input
s
t
a
t
e
a
b
0
{ 0, 1 }
{0}
1
--
{2}
2
--
{3}
 (null) moves possible
i

j
Switch state but do not
use any input symbol
MTE.18
Epsilon-Transitions

CSE4100

Given the regular expression: (a (b*c)) | (a (b |c+)?)
 Find a transition diagram NFA that recognizes
it.
Solution ?
MTE.19
Deterministic Finite Automata
CSE4100
A DFA is an NFA with the following restrictions:
•  moves are not allowed
• For every state s S, there is one and only one
path from s for every input symbol a  .
Since transition tables don’t have any alternative options, DFAs are
easily simulated via an algorithm.
s  s0
c  nextchar;
while c  eof do
s  move(s,c);
c  nextchar;
end;
if s is in F then return “yes”
else return “no”
MTE.20
Example - DFA
b
a
CSE4100
start
a
0
b
1
b
2
3
a
b
a
What Language is Accepted?
Recall the original NFA:
a
start
a
0
1
b
2
b
3
b
MTE.21
Regular Expression to NFA Construction
We now focus on transforming a Reg. Expr. to an NFA
CSE4100
This construction allows us to take:
• Regular Expressions (which describe tokens)
• To an NFA (to characterize language)
• To a DFA (which can be computerized)
The construction process is componentwise
Builds NFA from components of the regular
expression in a special order with particular
techniques.
NOTE: Construction is syntax-directed translation, i.e., syntax of
regular expression is determining factor for NFA construction and
structure.
MTE.22
Motivation: Construct NFA For:
:
CSE4100
start

i
f
a:
b:
start
start
ab:
start
b
A
0
a
0
1
B
a
1

A
b
B
 | ab :
a*
( | ab )* :
MTE.23
Construction Algorithm : R.E.  NFA
Construction Process :
CSE4100
1st : Identify subexpressions of the regular expression

 symbols
r|s
rs
r*
2nd : Characterize “pieces” of NFA for each subexpression
MTE.24
Piecing Together NFAs
1. For  in the regular expression, construct NFA
CSE4100
start

i
f
L()
2. For a   in the regular expression, construct NFA
start
i
a
f
L(a)
MTE.25
Piecing Together NFAs – continued(1)
3.(a) If s, t are regular expressions, N(s), N(t) their NFAs
CSE4100 s|t has NFA:
N(s)

start
i

f

N(t)
L(s)  L(t)

where i and f are new start / final states, and -moves
are introduced from i to the old start states of N(s) and
N(t) as well as from all of their final states to f.
MTE.26
Piecing Together NFAs – continued(2)
3.(b) If s, t are regular expressions, N(s), N(t) their NFAs
st (concatenation) has NFA:
CSE4100
start
i
N(s)
N(t)
f
L(s) L(t)
overlap
Alternative:
start
i

N(s)

N(t)

f
where i is the start state of N(s) (or new under the
alternative) and f is the final state of N(t) (or new).
Overlap maps final states of N(s) to start state of N(t).
MTE.27
Piecing Together NFAs – continued(3)
3.(c) If s is a regular expressions, N(s) its NFA, s*
(Kleene star) has NFA:
CSE4100

start
i

N(s)

f

where : i is new start state and f is new final state
-move i to f (to accept null string)
-moves i to old start, old final(s) to f
-move old final to old start (WHY?)
MTE.28
Properties of Construction
Let r be a regular expression, with NFA N(r), then
CSE4100
1. N(r) has at most 2*(#symbols + #operators) of r
2. N(r) has exactly one start and one accepting state
3. Each state of N(r) has at most one outgoing edge
a and at most two outgoing ’s
4. BE CAREFUL to assign unique names to all states !
MTE.29
Detailed Example
See example 3.16 in textbook for (a | b)*abb
2nd Example - (ab*c) | (a(b|c*))
CSE4100
Parse Tree for this regular expression:
r13
r5
r3
r12
|
r4
r11
a
r1
r2
a
r10
(
r7
r0
*
)
r9
r8
|
c
b
r6
*
b
What is the NFA? Let’s construct it !
c
MTE.30
Detailed Example – Construction(1)
CSE4100
r0:
b
r3:
a
r2:
c


r1:
b



r4 : r1 r2


b
c


r5 : r3 r4
a

b

c

MTE.31
Detailed Example – Construction(2)

b
r7:

r8:
CSE4100 r :
11
a
c


b
c
r6:


r9 : r7 | r 8


c


r10 : r9
b

r12 : r11 r10
a



c


MTE.32
Detailed Example – Final Step
r13 : r5 | r12
CSE4100

a
2
3

4

b
5

6
c
7


17
1
b
10

a
9



8
11
12

13
c
14

15

16

MTE.33
Conversion : NFA  DFA Algorithm
• Algorithm Constructs a Transition Table for DFA
from NFA
CSE4100 • Each state in DFA corresponds to a SET of states of
the NFA
• Why does this occur ?
•  moves
• non-determinism
Both require us to characterize multiple situations
that occur for accepting the same string.
(Recall : Same input can have multiple paths in
NFA)
• Key Issue : Reconciling AMBIGUITY !
MTE.34
Converting NFA to DFA – 1st Look

CSE4100
a
2
b
3
4

0


1
5

8


6
c
7

From State 0, Where can we move without consuming any input ?
This forms a new state: 0,1,2,6,8 What transitions are defined for
this new state ?
MTE.35
The Resulting DFA
a
0, 1, 2, 6, 8
CSE4100
3
a
a
c
b
1, 2, 5, 6, 7, 8
c
1, 2, 4, 5, 6, 8
c
Which States are FINAL States ?
a
A
a
B
a
b
c
D
c
c
How do we handle
alphabet symbols not
defined for A, B, C, D ?
C
MTE.36
Algorithm Concepts
NFA
N = ( S, , s0, F, MOVE )
-Closure(S) : s S
CSE4100
: set of states in S that are reachable
No input is
from s via -moves of N that originate
consumed
from s.
-Closure of T : T  S
: NFA states reachable from all t  T
on -moves only.
move(T,a)
: T  S, a
: Set of states to which there is a
transition on input a from some t  T
These 3 operations are utilized by algorithms /
techniques to facilitate the conversion process.
MTE.37
Illustrating Conversion – An Example
Start with NFA:
CSE4100
(a | b)*abb

a
2
3

start
0



1
6

b
4

5
a
7
b
8
9
b
10

First we calculate: -closure(0) (i.e., state 0)
-closure(0) = {0, 1, 2, 4, 7} (all states reachable from 0
on -moves)
Let A={0, 1, 2, 4, 7} be a state of new DFA, D.
MTE.38
Chapter 4 Excerpted Material
Context Free Grammars
Definition: A Context Free Grammar, CFG, is described
by T, NT, S, PR, where:
CSE4100
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable
by the grammar & in the language
S: Start symbol, SNT, which defines all strings of the
language
PR: Production rules to indicate how T and NT are
combines to generate valid strings of the language.
PR: NT  (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free
Grammar is a mathematical model !
MTE.39
Context Free Grammars : A First Look
assign_stmt  id := expr ;
expr  term operator term
CSE4100
term  id
term  real
term  integer
What do “BLUE”
symbols represent?
What do “BLACK”
symbols represent?
operator  +
operator  -
Derivation: A sequence of grammar rule applications
and substitutions that transform a starting non-term
into a collection of terminals / tokens.
Simply stated: Grammars / production rules allow us to
“rewrite” and “identify” correct syntax.
MTE.40
How is Grammar Used ?
Given the rules on the previous slide, suppose
CSE4100
id := real + int;
is input. Is it syntactically correct? How do we know?
expr is represented as: expr  term operator term
Is this accurate / complete?
expr  expr operator term
expr  term
How does this affect
the derivations which
are possible?
MTE.41
Grammar Concepts
A step in a derivation is zero or one action that
replaces a NT with the RHS of a production rule.
CSE4100
EXAMPLE: E  -E (the  means “derives” in one
step) using the production rule: E  -E
EXAMPLE: E  E A E  E * E  E * ( E )
DEFINITION:  derives in one step
+
 derives in  one step
*

derives in  zero steps
EXAMPLES:  A      if A  is a production rule
*
*
1  2 …  n  1 
n ;  
 for all 
*  and    then  
* 
If  
MTE.42
Leftmost and Rightmost Derivations
Leftmost: Replace the leftmost non-terminal symbol
CSE4100
E  E A E  id A E  id * E  id * id
lm
lm
lm
lm
Rightmost: Replace the leftmost non-terminal symbol
E rm
 EAE 
E A id 
E * id 
id * id
rm
rm
rm
Important Notes:
A
If A     , what’s true about  ?
lm
If A     , what’s true about  ?
rm
Derivations: Actions to parse input can be represented
pictorially in a parse tree.
MTE.43
Examples of LM / RM Derivations
E  E A E | ( E ) | -E | id
CSE4100
A+|-|*| / |
A leftmost derivation of : id + id * id
A rightmost derivation of : id + id * id
MTE.44
Derivations & Parse Tree
E
E EAE
E
CSE4100
A
E
E
E*E
E
A
E
*
E
 id * E
 id * id
E
A
E
id
*
E
E
A
E
id
*
id
MTE.45
Parse Trees and Derivations
Consider the expression grammar:
E  E+E | E*E | (E) | -E | id
CSE4100
Leftmost derivations of
id + id * id
E
E
EE+E
E
+
E + E  id + E
E
E
+
E
id
E
id + E  id + E * E
E
+
E
id
E
*
E
MTE.46
Removing Ambiguity
Take Original Grammar:
CSE4100
stmt  if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Revise to remove ambiguity:
stmt  matched_stmt | unmatched_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
unmatched_stmt  if expr then stmt
| if expr then matched_stmt else unmatched_stmt
How does this grammar work ?
MTE.47
Resolving Difficulties : Left Recursion
A left recursive grammar has rules that support the
+
derivation : A 
A, for some .
CSE4100
Top-Down parsing can’t reconcile this type of grammar,
since it could consistently make choice which wouldn’t
allow termination.
A  A  A  A … etc. A A | 
Take left recursive grammar:
A  A | 
To the following:
A’  A’
A’  A’ | 
MTE.48
Why is Left Recursion a Problem ?
CSE4100
Consider:
EE+T | T
TT*F | F
F  ( E ) | id
Derive : id + id + id
EE+T
How can left recursion be removed ?
EE+T | T
What does this generate?
EE+TT+T
EE+TE+T+TT+T+T

How does this build strings ?
What does each string have to start with ?
MTE.49
Resolving Difficulties : Left Recursion (2)
Informal Discussion:
CSE4100
Take all productions for A and order as:
A  A1 | A2 | … | Am | 1 | 2 | … | n
Where no i begins with A.
Now apply concepts of previous slide:
A  1A’ | 2A’ | … | nA’
A’  1A’ | 2A’| … | m A’ | 
For our example:
EE+T | T
TT*F | F
F  ( E ) | id
E  TE’
E’  + TE’ | 
F  ( E ) | id
T  FT’
T’  * FT’ | 
MTE.50
Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep,
this isn’t enough
S  Aa | b
CSE4100
A  Ac | Sd | 
S  Aa  Sda
Algorithm:
Input: Grammar G with no cycles or -productions
Output: An equivalent grammar with no left recursion
1.
Arrange the non-terminals in some order A1,A2,…An
2.
for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai  Aj
by the productions Ai  1 | 2 | … | k
where Aj  1|2|…|k are all current Aj productions;
end
eliminate the immediate left recursion among Ai productions
end
MTE.51
Using the Algorithm
Apply the algorithm to:
A1  A2a | b
A2  A2c | A1d | 
CSE4100
i=1
For A1 there is no left recursion
i=2
for j=1 to 1 do
Take productions: A2  A1 and replace with
A2  1  | 2  | … | k |
where
A1 1 | 2 | … | k are A1 productions
in our case A2  A1d becomes A2  A2ad | bd
What’s left: A1 A2a | b
A2  A2 c | A2 ad | bd | 
Are we done ?
MTE.52
Using the Algorithm (2)
No ! We must still remove A2 left recursion !
CSE4100
A1 A2a | b
A2  A2 c | A2 ad | bd | 
Recall:
A  A1 | A2 | … | Am | 1 | 2 | … | n
A  1A’ | 2A’ | … | nA’
A’  1A’ | 2A’| … | m A’ | 
Apply to above case. What do you get ?
MTE.53
Removing Difficulties : -Moves
Very Simple: A  and B uAv implies add B uv
to the grammar G.
CSE4100
Why does this work ?
Examples:
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
A1  A2 a | b
A2  bd A2’ | A2’
A2’  c A2’ | bd A2’ | 
MTE.54
Removing Difficulties : Cycles
How would cycles be removed ?
CSE4100
S  SS | ( S ) | 
Has: S  SS  S
S
MTE.55
Removing Difficulties : Left Factoring
Problem : Uncertain which of 2 rules to choose:
CSE4100
stmt  if expr then stmt else stmt
| if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A  1 | 2
 : if expr then stmt
1: else stmt 2 : 
Transform to:
A   A’
A’  1 | 2
EXAMPLE:
stmt  if expr then stmt rest
rest  else stmt | 
MTE.56
Resolving Grammar Problems :
Another Look
• Forcing Matching Within Grammar
CSE4100
- if-then-else  else must go with most recent if
stmt  if expr then stmt | if expr then stmt else stmt | other
Leads to:
stmt  matched_stmt | unmatched_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
unmatched_stmt  if expr then stmt
| if expr then matched_stmt else unmatched_stmt
MTE.57
Resolving Grammar Problems :
Another Look (2)
• Left Factoring - Know correct choice in advance !
CSE4100
where_queries  where are_does declarations of symbol
| where are_does “search_option” appear
Rules given in Assignment or
where_queries  where rest_where
rest_where  are declarations of symbol
| does “search_option” appear
MTE.58
Resolving Grammar Problems :
Another Look (3)
• Left Recursion - Avoid infinite loop when choosing rules
CSE4100
A  A | 
EXAMPLE:
EE+T | T
TT*F | F
F  ( E ) | id
A  A’
A’  A’ | 
E  TE’
E’  + TE’ | 
F  ( E ) | id
T  FT’
T’  * FT’ | 
If we use the original production rules on input: id + id (with leftmost):
* T+T+T+…+T
EE+TE+T+TE+T+T+T
not assured
If we use the transformed production rules on input: id + id (with leftmost):
* id + TE’ 
* id + id E’  id + id
E  TE’  T + TE’ 
MTE.59
Overall Top-Down Algorithm

CSE4100

Data structures
 A stack Holds the symbols to treat
 A lookahead window to choose a prediction
Algorithm
 Startup
 Initialize stack to start symbol (a non-terminal)
 Initialize lookahead window at start of token stream

Recursive Process
 Find out if front of window and top of stack match
 If match
– Consume the symbols
 No match
– Pop / Select / Push
MTE.60
Lookahead Size and Languages
With k tokens of lookahead
 Some languages can be parsed (this way)
 Some languages cannot be parsed
 What about k+1 tokens ?

CSE4100




More languages can be parsed....
So the set of languages recognized with k is a
subset of the set of languages recognized with
k+1!
We have a hierarchy!
Still...
 In practice LL(1) should be enough.
MTE.61
Top-Down Parsing


CSE4100




Identify a leftmost derivation for an input string
Why ?
 By always replacing the leftmost non-terminal
symbol via a production rule, we are guaranteed
of developing a parse tree in a left-to-right
fashion consistent with scanning the input.
 A  aBc  adDc  adec (scan a, scan d,
scan e, scan c - accept!)
Recursive-descent parsing concepts
Predictive parsing
 Recursive / Brute force technique
 non-recursive / table driven
Error recovery
Implementation
MTE.62
Non-Recursive / Table Driven
a + b $
CSE4100
Stack
X
NT + T
symbols of
CFG
Y
Empty stack
symbol
$
Z
Input
Predictive Parsing
Program
Output
What actions parser
should take based on
stack / input
Parsing Table
M[A,a]
General parser behavior: X : top of stack
(String + terminator)
a : current input
1. When X=a = $ halt, accept, success
2. When X=a  $ , POP X off stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]
if it is an error  call recovery routine
if M[X,a] = {X  UVW}, POP X, PUSH W,V,U
DO NOT expend any input
MTE.63
Algorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;
repeat
CSE4100
let X be the top stack symbol and a the symbol pointed to by ip;
if X is terminal or $ then
Input pointer
if X=a then
pop X from the stack and advance ip
else error()
else
/* X is a non-terminal */
if M[X,a] = XY1Y2…Yk then begin
pop X from stack;
push Yk, Yk-1, … , Y1 onto stack, with Y1 on top
output the production XY1Y2…Yk
end
else error()
May also execute other code
based on the production used
until X=$ /* stack is empty */
MTE.64
Example
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
CSE4100
Our well-worn example !
Table M
Nonterminal
E
INPUT SYMBOL
id
(
TFT’
$
E’
E’
T’
T’
TFT’
T’
Fid
)
ETE’
E’+TE’
T’
F
*
ETE’
E’
T
+
T’*FT’
F(E)
MTE.65
Trace of Example
STACK
CSE4100
$E
$E’T
$E’T’F
$E’T’id
$E’T’
$E’
$E’T+
$E’T
$E’T’F
$E’T’id
$E’T’
$E’T’F*
$E’T’F
$E’T’id
$E’T’
$E’
$
INPUT
id + id * id$
id + id * id$
id + id * id$
id + id * id$
+ id * id$
+ id * id$
+ id * id$
id * id$
id * id$
id * id$
* id$
* id$
id$
id$
$
$
$
OUTPUT
E TE’
T FT’
F  id
T’  
E’  +TE’
Expend Input
T FT’
F  id
T’  *FT’
F  id
T’  
E’  
MTE.66
Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
CSE4100
E  TE’  FT’E’  id T’E’  id E’  id + TE’  id + FT’E’
 id + id T’E’  id + id * FT’E’  id + id * id T’E’
 id + id * id E’  id + id * id
MTE.67
What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M !
CSE4100
1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table
Conceptual Perspective:
First: Let  be a string of grammar symbols. First() are
the first terminals that can appear in  in any possible
derivation. NOTE: If *  , then  is First( ).
Follow: Let A be a non-terminal. Follow(A) is the set of
terminals that can appear directly to the right of A in
*
*
some sentential form. (S  Aa, for some  and ).
NOTE: If S  A, then $ is Follow(A).
MTE.68
Computing First(X) :
All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X  is a production rule, add  to First(X)
CSE4100
3. If X is a non-terminal, and X Y1Y2…Yk is a production rule
Place First(Y1) in First(X)
*
if Y1
,
Place First(Y2) in First(X)
* ,
if Y2 
Place First(Y3) in First(X)
…
* ,
if Yk-1 
Place First(Yk) in First(X)
*  , Stop.
NOTE: As soon as Yi 
May repeat 1, 2, and 3, above for each Yj
MTE.69
Computing First(X) :
All Grammar Symbols - continued
Informally, suppose we want to compute
CSE4100
First(X1 X2 … Xn ) = First (X1) “+”
First(X2) if  is in First(X1) “+”
First(X3) if  is in First(X2) “+”
…
First(Xn) if  is in First(Xn-1)
Note 1: Only add  to First(X1 X2 … Xn) if 
is in First(Xi) for all i
Note 2: For First(X1), if X1 Z1 Z2 … Zm ,
then we need to compute First(Z1 Z2 … Zm) !
MTE.70
Example
Computing First for:
CSE4100
First(TE’)
First(E)
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
First(T) “+” First(E’)
* 
Not First(E’) since T 
First(T)
First(F) “+” First(T’)
First((E)) “+” First(id)
Overall:
First(F) Not First(T’) since F 
* 
“(“ and “id”
First(E) = { ( , id } = First(F)
First(E’) = { + ,  }
First(T’) = { * ,  }
First(T)  First(F) = { ( , id }
MTE.71
Example 2
Given the production rules:
CSE4100
S  i E t SS’ | a
S’  eS | 
E b
Verify that
First(S) = { i, a }
First(S’) = { e,  }
First(E) = { b }
MTE.72
Computing Follow(A) :
All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $
signals end of input
CSE4100
2. If there is a production A B, then everything in
First() is in Follow(B) except for .
* 
3. If A B is a production, or A B and  
(First() contains  ), then everything in Follow(A) is in
Follow(B)
(Whatever followed A must follow B, since nothing
follows B from the production rule)
We’ll calculate Follow for two grammars.
MTE.73
Example
Compute Follow for:
CSE4100
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
• Follow(E) - contains $ since E is the start symbol. Also, since F  (E) then
First(“)”) is in Follow(E). Thus Follow(E) = { ) , $ }
• Follow(E’) : E  TE’ implies Follow(E) is in Follow(E’), and
Follow(E’) = { ) , $ }
* , put in
• Follow(T) : E  TE’ implies put in First(E’). Since E’ 
* , put in
Follow(E). Since E’  +TE’ , Put in First(E’), and since E’ 
Follow(E’). Thus Follow(T) = { +, ), $ }.
• Follow(T’)
• Follow(F)
You do these !
MTE.74
Computing Follow : 2nd Example
Recall:
CSE4100
S  i E t SS’ | a
First(S) = { i, a }
S’  eS | 
First(S’) = { e,  }
E b
First(E) = { b }
Follow(S) – Contains $, since S is start symbol
Since S  i E t SS’ , put in First(S’) – not 
* , Put in Follow(S)
Since S’ 
Since S’  eS, put in Follow(S’)
So…. Follow(S) = { e, $ }
Follow(S’) = Follow(S) HOW?
Follow(E) = { t }
MTE.75
Motivation Behind First & Follow
First:
CSE4100
Is used to indicate the relationship between non-terminals
(in the stack) and input symbols (in input stream)
Example: If A   , and a is in First(), then when
a=input, replace with .
( a is one of first symbols of , so when A is on the stack
and a is input, POP A and PUSH .
Follow: Is used when First has a conflict, to resolve choices.
* , then what follows A dictates the
When    or  
next choice to be made.
Example: If A   , and b is in Follow(A ), then when
a * , and if b is an input character, then we expand A
with  , which will eventually expand to , of which b
follows!
( Above  * . Here First( ) contains .)
MTE.76
Constructing Parsing Table
Algorithm:
CSE4100
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A  to M[A, a ]
3.1  in First()? Add A  to M[A, a ] for all
terminals b in Follow(A).
3.2  in First() and $ in Follow(A)? Add A  to
M[A, $ ]
4. All undefined entries are errors.
MTE.77
Constructing Parsing Table - Example
E  TE’
E’  + TE’ | 
T  FT’
CSE4100 T’  * FT’ | 
F  ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +,  }
First(T’) = { *,  }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ),  }
Follow(T,T’) = { +, ),  }
Expression Example: E  TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E  TE’
M[E, id ] : E  TE’
by rule 2
(by rule 2) E’  +TE’ : First(+TE’) = + : M[E’, +] : E’  +TE’
(by rule 3) E’   :  in First( )
T’   :  in First( )
M[E’, )] : E’   (3.1)
M[T’, +] : T’   (3.1)
M[E’, $] : E’   (3.2)
M[T’, )] : T’   (3.1)
(Due to Follow(E’)
M[T’, $] : T’   (3.2)
MTE.78
Constructing Parsing Table – Example 2
CSE4100
S  i E t SS’ | a
First(S) = { i, a }
Follow(S) = { e, $ }
S’  eS | 
First(S’) = { e,  }
Follow(S’) = { e, $ }
E b
First(E) = { b }
Follow(E) = { t }
S  i E t SS’
Sa
Eb
First(i E t SS’)={i}
First(a) = {a}
First(b) = {b}
S’  eS
First(eS) = {e}
S
First() = {}
Follow(S’) = { e, $ }
INPUT SYMBOL
Nonterminal
a
S
S a
b
i
t
$
S iEtSS’
S’ 
S’ eS
S’
E
e
S 
E b
MTE.79
Example

CSE4100
Step 1
 Compute
 Follow
 First
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
MTE.80
Example

CSE4100 
Step 2
 Build the parser table
Step 3
 Input: Id + Id * Id $
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
Parser Table
Input Symbols
NT
Id
+
*
(
S
E → TE’
E → TE’
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
MTE.81
Motivation Behind First & Follow
First:
CSE4100
Is used to indicate the relationship between non-terminals
(in the stack) and input symbols (in input stream)
Example: If A   , and a is in First(), then when
a=input, replace with .
( a is one of first symbols of , so when A is on the stack
and a is input, POP A and PUSH .
Follow: Is used when First has a conflict, to resolve choices.
* , then what follows A dictates the
When    or  
next choice to be made.
Example: If A   , and b is in Follow(A ), then when
a * , and if b is an input character, then we expand A
with  , which will eventually expand to , of which b
follows!
( Above  * . Here First( ) contains .)
MTE.82
Constructing Parsing Table
Algorithm:
CSE4100
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A  to M[A, a ]
3.1  in First()? Add A  to M[A, a ] for all
terminals b in Follow(A).
3.2  in First() and $ in Follow(A)? Add A  to
M[A, $ ]
4. All undefined entries are errors.
MTE.83
Constructing Parsing Table - Example
E  TE’
E’  + TE’ | 
T  FT’
CSE4100 T’  * FT’ | 
F  ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +,  }
First(T’) = { *,  }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ),  }
Follow(T,T’) = { +, ),  }
Expression Example: E  TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E  TE’
M[E, id ] : E  TE’
by rule 2
(by rule 2) E’  +TE’ : First(+TE’) = + : M[E’, +] : E’  +TE’
(by rule 3) E’   :  in First( )
T’   :  in First( )
M[E’, )] : E’   (3.1)
M[T’, +] : T’   (3.1)
M[E’, $] : E’   (3.2)
M[T’, )] : T’   (3.1)
(Due to Follow(E’)
M[T’, $] : T’   (3.2)
MTE.84
Constructing Parsing Table – Example 2
CSE4100
S  i E t SS’ | a
First(S) = { i, a }
Follow(S) = { e, $ }
S’  eS | 
First(S’) = { e,  }
Follow(S’) = { e, $ }
E b
First(E) = { b }
Follow(E) = { t }
S  i E t SS’
Sa
Eb
First(i E t SS’)={i}
First(a) = {a}
First(b) = {b}
S’  eS
First(eS) = {e}
S
First() = {}
Follow(S’) = { e, $ }
INPUT SYMBOL
Nonterminal
a
S
S a
b
i
t
$
S iEtSS’
S’ 
S’ eS
S’
E
e
S 
E b
MTE.85
Example

CSE4100
Step 1
 Compute
 Follow
 First
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
MTE.86
Example

CSE4100 
Step 2
 Build the parser table
Step 3
 Input: Id + Id * Id $
S→E$
T → F T’
E → T E’
T’ → * F T’
E’→ + T E’
→
→
F→(E)
→ Id
Parser Table
Input Symbols
NT
Id
+
*
(
S
E → TE’
E → TE’
E
E → TE’
E →TE’
E’
T
E’ →+TE’
T → FT’
F
F → Id
$
E’ → 
E’ → 
T’ → 
T’ → 
T →FT’
T’ → 
T’
)
T’ →*FT’
F → (E)
MTE.87
LL(1) Grammars
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
CSE4100
1 : Use “1” input symbol as lookahead in conjunction
with stack to decide on the parsing action
LL(1) grammars have no multiply-defined entries in
the parsing table.
Properties of LL(1) grammars:
• Grammar can’t be ambiguous or left recursive
• Grammar is LL(1) when A 
1.  &  do not derive strings starting with the
same terminal a
2. Either  or  can derive , but not both.
Note: It may not be possible for a grammar to be
manipulated into an LL(1) grammar
MTE.88
Error Recovery
When Do Errors Occur? Recall Predictive Parser Function:
a + b $
CSE4100
Stack
X
Y
Z
$
Predictive Parsing
Program
Input
Output
Parsing Table
M[A,a]
1. If X is a terminal and it doesn’t match input.
2. If M[ X, Input ] is empty – No allowable actions
Consider two recovery techniques:
A. Panic Mode
B. Phase-level Recovery
MTE.89
Panic Mode Recovery
Augment parsing table with action that attempts to realign
/ synchronize token stream with the expected input.
CSE4100
Suppose : A on top of stack doesn’t mesh with current input
symbol
1. Use Follow(A) to remove input tokens – sync (discard)
2. Use First(A) to determine when to restart parsing
3. Incorporate higher level language concepts (begin/end,
while, repeat/until) to sync actions  we don’t skip
tokens unnecessarily.
Other actions:
4. When A  , use it to manipulate stack to postpone
error detection
5. Use non-matching terminal on stack as token that is
inserted into input.
MTE.90
Revised Parsing Table / Example
Nonterminal
CSE4100
E
INPUT SYMBOL
id
(
)
ETE’
synch
E’+TE’
TFT’
T’
F
*
ETE’
E’
T
+
Fid
E’
synch
TFT’
T’
T’*FT’
synch
synch
From Follow sets. Pop
stack entry – T or NT
synch
T’
F(E)
synch
$
synch
E’
synch
T’
synch
Skip input symbol
MTE.91
Skip & Synch

Meaning
 Skip
CSE4100
Discard input symbol

Synch
Pop top of stack
Messages
 Constructed based on lookahead an non-terminal
 Example




NT = F
Lookahead = +
Expecting a FACTOR. Got + for a Term. So a
factor is missing.
MTE.92
Revised Parsing Table / Example(2)
STACK
CSE4100
$E
$E
$E’T
$E’T’F
$E’T’id
$E’T’
$E’T’F*
$E’T’F
$E’T’
$E’
$E’T+
$E’T
$E’T’F
$E’T’id
$E’T’
$E’
$
INPUT
) id * + id$
id * + id$
id * + id$
id * + id$
id * + id$
* + id$
* + id$
+ id$
+ id$
+ id$
+ id$
id$
id$
id$
$
$
$
Remark
error, skip )
id is in First(E)
error, M[F,+] = synch
F has been popped
MTE.93
Phase-Level Recovery

CSE4100 


Fill in blanks entries of parsing table with error
handling routines
These routines
 modify stack and / or input stream
 issue error message
Problems:
 Modifying stack has to be done with care, so as
to not create possibility of derivations that
aren’t in language
 Infinite loops must be avoided
Can be used in conjunction with panic mode to
have more complete error handling
MTE.94
Download