• Review:
– How do we define a grammar (what are the
components in a grammar)?
– What is a context free grammar?
– What is the language defined by a grammar?
– What is an ambiguous grammar?
– Why do we care about leftmost or rightmost derivations?
• Example:
<PROGRAM> -> 'program' id 'begin' <STMT_LIST> 'end'
<STMT> -> id '=' <EXPR>
<EXPR> -> <EXPR> <OP> <EXPR> | id
<OP> -> '+' | '-' | '*' | '/'
program test
t0 = t1 + t2;
t3 = t0 * t4
• Parsing:
– The process to determine whether the start
symbol can derive the program.
• If successful, the program is a valid program.
• If failed, the program is invalid.
– Two approaches in general.
• Expanding from the start symbol to the whole
program (top down)
• Reducing the whole program to the start symbol
(bottom up).
• Parsing methods:
– universal:
• There exist algorithms that can parse any context-free
grammar (e.g., CYK and Earley's algorithm), but they are too
inefficient to be used in practice.
• What is considered efficient? Scanning the program (from left to
right) once.
– Top-down parsing
• build the parse tree from root to leaves (using leftmost
derivation, why?).
• Recursive descent, and LL parser
– Bottom-up parsing
• build the parse tree from leaves to root.
• Operator precedence parsing, LR (SLR, canonical LR, LALR).
– Recursive descent parsing associates a procedure with each
nonterminal in the grammar; it may require backtracking over the
input string.
– Example: <type> -> <simple> | ^ id | array [ <simple> ] of <type>
<simple> -> integer | char | num dotdot num
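void type() {
    /* <type> -> <simple> | ^ id | array [ <simple> ] of <type> */
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM) {
        simple();
    } else if (lookahead == '^') {
        match('^');
        match(ID);
    } else if (lookahead == ARRAY) {
        match(ARRAY);
        match('[');
        simple();
        match(']');
        match(OF);
        type();
    } else error();
}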
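void simple() {
    /* <simple> -> integer | char | num dotdot num */
    if (lookahead == INTEGER) match(INTEGER);
    else if (lookahead == CHAR) match(CHAR);
    else if (lookahead == NUM) {
        match(NUM);
        match(DOTDOT);
        match(NUM);
    } else error();
}
void match(token t) {
    /* consume the current token if it is the expected one */
    if (lookahead == t) { lookahead = nexttoken(); }
    else error();
}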
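The procedures above can be exercised with a small driver. The following is a minimal sketch, not the course's reference code: the token codes, the END-terminated token array, the `failed` flag set by `error()`, and the entry point `parse_type` are all assumptions made for illustration.

```c
/* Token codes (assumed); single-character tokens such as '^' use their char value. */
enum { INTEGER = 256, CHAR, NUM, DOTDOT, ID, ARRAY, OF, END };

static const int *input;   /* current position in an END-terminated token array */
static int lookahead;      /* the one token the parser is allowed to look at */
static int failed;         /* set by error() instead of aborting (assumption) */

static int nexttoken(void) {
    if (*input != END) input++;   /* never move past the END marker */
    return *input;
}

static void error(void) { failed = 1; }

static void match(int t) {
    if (lookahead == t) lookahead = nexttoken();
    else error();
}

/* <simple> -> integer | char | num dotdot num */
static void simple(void) {
    if (lookahead == INTEGER) match(INTEGER);
    else if (lookahead == CHAR) match(CHAR);
    else if (lookahead == NUM) { match(NUM); match(DOTDOT); match(NUM); }
    else error();
}

/* <type> -> <simple> | ^ id | array [ <simple> ] of <type> */
static void type(void) {
    if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM) {
        simple();
    } else if (lookahead == '^') {
        match('^'); match(ID);
    } else if (lookahead == ARRAY) {
        match(ARRAY); match('['); simple(); match(']'); match(OF); type();
    } else error();
}

/* Returns 1 if the whole token array is a valid <type>, 0 otherwise. */
int parse_type(const int *tokens) {
    input = tokens;
    lookahead = *input;
    failed = 0;
    type();
    return !failed && lookahead == END;
}
```

Calling `parse_type` on the tokens for `array [ num dotdot num ] of integer` should succeed, while dropping `dotdot num` should make it fail. Note that each alternative is chosen by inspecting `lookahead` only, so no backtracking is needed for this grammar.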
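– Recursive descent parsing may require
backtracking over the input string
• try out all productions, backtrack if necessary.
• E.g., S -> cAd, A -> ab | a
• input string cad: after matching c, the parser tries A -> ab, matches a,
fails to match b against d, backtracks, and succeeds with A -> a.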
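The backtracking on S -> cAd, A -> ab | a can be sketched as follows. This is an illustrative sketch only: the names `term`, `parse_cad`, and the global cursor `pos` are assumptions, and the scheme commits to the first alternative of A that succeeds, which is enough for this grammar but is not a fully general backtracking strategy.

```c
#include <stddef.h>

static const char *in;   /* input string */
static size_t pos;       /* index of the next unread character */

/* Match one terminal; advance only on success. */
static int term(char c) {
    if (in[pos] == c) { pos++; return 1; }
    return 0;
}

/* A -> ab | a : try the alternatives in order, restoring pos between attempts. */
static int A(void) {
    size_t save = pos;
    if (term('a') && term('b')) return 1;
    pos = save;                 /* backtrack: undo whatever A -> ab consumed */
    if (term('a')) return 1;
    pos = save;
    return 0;
}

/* S -> cAd */
static int S(void) {
    return term('c') && A() && term('d');
}

/* Returns 1 if s is derived from S, 0 otherwise. */
int parse_cad(const char *s) {
    in = s;
    pos = 0;
    return S() && in[pos] == '\0';
}
```

On input `cad`, the first alternative of A consumes `a` and then fails on `b`, so the saved position is restored before A -> a is tried.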
– A special case of recursive-descent parser that needs
no backtracking is called a predictive parser.
• Looking at the input string, the parser must predict the right
production every time to avoid backtracking.
• It needs to know which first symbols can be generated
by the right side of each production (with only one token
of lookahead).
– First(α): the set of tokens that can appear as the first
symbol of some string derived from α. If α is the empty
string or can derive the empty string, then the empty
string is also in First(α).
– Given productions A -> α | β, predictive parsing (looking 1
token ahead) requires First(α) and First(β) to be
disjoint.
– Predictive parsing won't work on some types of
grammars:
• Left recursion: A -> Aw (expanding A results in an
infinite loop).
• Common left factors: A -> aB | aC (First(aB) and
First(aC) are not disjoint).
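The First sets and the disjointness test can be computed mechanically. Below is a minimal sketch under several assumptions that are not in the slides: every grammar symbol is a single ASCII character, uppercase letters are nonterminals, everything else is a terminal, an empty right-hand side stands for the empty string, and `'#'` marks the empty string inside a set. The names `Prod`, `compute_first`, and `disjoint` are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define EPS '#'   /* assumed marker for the empty string inside a First set */

typedef struct { char lhs; const char *rhs; } Prod;   /* rhs "" means empty string */

static unsigned char first[128][128];   /* first[X][t] != 0 iff t is in First(X) */

/* Add First(rhs) to out[]; returns 1 if out[] gained a new member. */
static int first_of_string(const char *rhs, unsigned char out[128]) {
    int changed = 0;
    for (; *rhs; rhs++) {
        unsigned char x = (unsigned char)*rhs;
        if (!isupper(x)) {                     /* terminal: it is the first symbol */
            if (!out[x]) { out[x] = 1; changed = 1; }
            return changed;
        }
        for (int t = 0; t < 128; t++)          /* nonterminal: add First(x) minus EPS */
            if (t != EPS && first[x][t] && !out[t]) { out[t] = 1; changed = 1; }
        if (!first[x][EPS]) return changed;    /* x cannot vanish, so stop here */
    }
    if (!out[EPS]) { out[EPS] = 1; changed = 1; }   /* all symbols can vanish */
    return changed;
}

/* Iterate over all productions until no First set grows any more. */
void compute_first(const Prod *g, int n) {
    memset(first, 0, sizeof first);
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++)
            changed |= first_of_string(g[i].rhs, first[(unsigned char)g[i].lhs]);
    }
}

/* 1 if First(a) and First(b) share no member, so one lookahead token can choose. */
int disjoint(const char *a, const char *b) {
    unsigned char fa[128] = {0}, fb[128] = {0};
    first_of_string(a, fa);
    first_of_string(b, fb);
    for (int t = 0; t < 128; t++)
        if (fa[t] && fb[t]) return 0;
    return 1;
}
```

With this sketch, `disjoint("ab", "a")` returns 0 for A -> ab | a, which is why that grammar needed backtracking, and `disjoint("aB", "aC")` returns 0 for the common-left-factor case.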
– Eliminating Left Recursion
• Immediate Left Recursion
– Replace A -> Aα | β with A -> βA' and A' -> αA' | ε
– Example: E -> E+T | T
T -> T*F | F
F -> (E) | id
becomes
E -> T E'     E' -> +T E' | ε
T -> F T'     T' -> *F T' | ε
F -> (E) | id
– In general,
A -> A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
can be replaced by
A -> β1 A' | β2 A' | ... | βn A'
A' -> α1 A' | α2 A' | ... | αm A' | ε
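Once immediate left recursion is removed, the expression grammar becomes E -> T E', E' -> +T E' | ε, T -> F T', T' -> *F T' | ε, F -> (E) | id, and each nonterminal can be turned into a procedure that never loops on itself. The sketch below assumes single-character tokens with `i` standing for id; the names `parse_expr`, `expr_rest`, and `term_rest` are illustrative, not from the slides.

```c
static const char *p;   /* next unread input character */
static int ok;          /* cleared on the first mismatch */

static void expr(void);

static void match(char c) {
    if (*p == c) p++;
    else ok = 0;
}

/* F -> ( E ) | id */
static void factor(void) {
    if (*p == '(') { match('('); expr(); match(')'); }
    else match('i');
}

/* T' -> * F T' | epsilon : choose the first alternative only when '*' is next */
static void term_rest(void) {
    if (*p == '*') { match('*'); factor(); term_rest(); }
}

/* T -> F T' */
static void term(void) { factor(); term_rest(); }

/* E' -> + T E' | epsilon */
static void expr_rest(void) {
    if (*p == '+') { match('+'); term(); expr_rest(); }
}

/* E -> T E' */
static void expr(void) { term(); expr_rest(); }

/* Returns 1 if s is a valid expression, 0 otherwise. */
int parse_expr(const char *s) {
    p = s;
    ok = 1;
    expr();
    return ok && *p == '\0';
}
```

A procedure written directly from E -> E+T would call itself before consuming any input; here every recursive call is preceded by a successful `match`, so the recursion always terminates.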
• Algorithm 4.1. Eliminating left recursion:
Arrange the nonterminals in some order A1, A2, …, An
for i = 1 to n do begin
for j = 1 to i-1 do begin
replace each production of the form Ai -> Aj w by
Ai -> d1 w | ... | dk w, where Aj -> d1 | ... | dk
are all the current Aj-productions
end for
eliminate the immediate left recursion among the Ai-productions
end for
(The algorithm can fail if the grammar has a cycle (A =>+ A) or an ε-production (A -> ε).)
Example 1:
S -> Aa | b
A -> Ac | Sd | ε
Example 2:
X -> YZ | a
Y -> ZX | Xb
Z -> XY | ZZ | a
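Algorithm 4.1 can be tried out on Example 1 with a small program. The sketch below rests on assumptions that are not part of the slides: symbols are single characters, alternatives are fixed-size strings with "" standing for the empty string, array bounds are not checked, and the new nonterminal A' is named with an unused letter taken from `Z` downward. The type and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAXR 16   /* max nonterminals */
#define MAXA 16   /* max alternatives per nonterminal */
#define MAXL 32   /* max length of one alternative, including NUL */

typedef struct { char name; int n; char alt[MAXA][MAXL]; } Rule;
typedef struct { int n; Rule r[MAXR]; } Grammar;

/* Append the alternative "a followed by b" to rule r. */
static void add_alt(Rule *r, const char *a, const char *b) {
    snprintf(r->alt[r->n++], MAXL, "%s%s", a, b);
}

/* Replace every Ai -> Aj w by Ai -> d w for each current alternative d of Aj. */
static void substitute(Rule *ai, const Rule *aj) {
    Rule out = { ai->name, 0, {{0}} };
    for (int k = 0; k < ai->n; k++) {
        if (ai->alt[k][0] == aj->name)
            for (int m = 0; m < aj->n; m++)
                add_alt(&out, aj->alt[m], ai->alt[k] + 1);
        else
            add_alt(&out, ai->alt[k], "");
    }
    *ai = out;
}

/* A -> A a | b  becomes  A -> b A',  A' -> a A' | empty, with A' named `fresh`. */
static void eliminate_immediate(Grammar *g, int i, char fresh) {
    Rule *a = &g->r[i];
    int rec = 0;
    for (int k = 0; k < a->n; k++) rec |= (a->alt[k][0] == a->name);
    if (!rec) return;                        /* nothing to do for this nonterminal */
    char tail[2] = { fresh, '\0' };
    Rule head = { a->name, 0, {{0}} }, prime = { fresh, 0, {{0}} };
    for (int k = 0; k < a->n; k++) {
        if (a->alt[k][0] == a->name) add_alt(&prime, a->alt[k] + 1, tail);
        else add_alt(&head, a->alt[k], tail);
    }
    add_alt(&prime, "", "");                 /* A' -> empty string */
    *a = head;
    g->r[g->n++] = prime;
}

void eliminate_left_recursion(Grammar *g) {
    int n = g->n;        /* only the original nonterminals are ordered A1..An */
    char fresh = 'Z';    /* assumed not to occur in the grammar */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) substitute(&g->r[i], &g->r[j]);
        int before = g->n;
        eliminate_immediate(g, i, fresh);
        if (g->n > before) fresh--;
    }
}
```

For Example 1 (order S, A), the inner loop first rewrites A -> Sd into A -> Aad | bd, and the immediate step then yields A -> bdZ | Z and Z -> cZ | adZ | ε, where Z plays the role of A'. S is left unchanged.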
– Left factoring (to produce a grammar suitable
for predictive parsing)
• replace productions
A -> αβ1 | αβ2 | ... | αβn | γ1 | ... | γm
with
A -> αA' | γ1 | ... | γm
A' -> β1 | ... | βn
Example: S -> iEtS | iEtSeS | a becomes S -> iEtSS' | a, S' -> eS | ε
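After left factoring, the example grammar reads S -> iEtSS' | a with S' -> eS | ε, and one token of lookahead is enough to pick a production. The sketch below assumes single-character tokens and, since the slides do not define E, assumes E -> b purely for illustration; `parse_stmt`, `stmt`, and `stmt_rest` are hypothetical names.

```c
static const char *p;   /* next unread input character */
static int ok;          /* cleared on the first mismatch */

static void stmt(void);

static void match(char c) {
    if (*p == c) p++;
    else ok = 0;
}

/* S' -> e S | epsilon : take the 'e' branch whenever 'e' is the lookahead */
static void stmt_rest(void) {
    if (*p == 'e') { match('e'); stmt(); }
}

/* S -> i E t S S' | a, with E -> b (assumed) */
static void stmt(void) {
    if (*p == 'i') {
        match('i'); match('b'); match('t');
        stmt();
        stmt_rest();
    } else {
        match('a');
    }
}

/* Returns 1 if s is a valid statement, 0 otherwise. */
int parse_stmt(const char *s) {
    p = s;
    ok = 1;
    stmt();
    return ok && *p == '\0';
}
```

Before factoring, both alternatives began with i, so the parser could not choose between them; now the decision is deferred to S', where the lookahead is either e or something else. Choosing the e branch whenever possible attaches each e to the nearest unmatched i.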