Chapter 4
Syntax Analysis
Yu-Chen Kuo
1
4.1 The Role of The Parser
• A parser obtains a string of tokens from the
lexical analyzer and verifies that the string can
be generated by the grammar for the source
language.
• We expect the parser to report any syntax
errors in an intelligible fashion. It should
also recover from commonly occurring
errors so that it can continue processing the
remainder of its input.
4.1 The Role of The Parser
Three Types of Parsers
1. CYK algorithm and Earley's algorithm: too
inefficient to use in production compilers
2. Top-down method
3. Bottom-up method
Syntax Error Handling
• Lexical error:
– misspelling an identifier, keyword, or operator
• Syntactic error:
– an arithmetic expression with unbalanced
parentheses
• Semantic error:
– an operator applied to an incompatible operand
• Logical error:
– an infinitely recursive call
Syntax Error Handling (Cont.)
• The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly
and accurately
– It should recover from each error quickly
enough to be able to detect subsequent errors
– It should not significantly slow down the
processing of correct programs
Error-Recovery Strategies
• Panic mode
– Discard input symbols until one of a designated set
of synchronizing tokens is found
– Typical synchronizing tokens: ; and end
– Guaranteed not to go into an infinite loop
• Phrase level
– The parser may perform local correction:
replace a prefix of the remaining input by some
string that allows parsing to continue
– E.g., replace , by ; delete an extraneous ; or insert a missing ;
– May lead to an infinite loop if we always insert
something on the input ahead of the current input symbol
Error-Recovery Strategies (cont.)
• Error productions
– Augment the grammar with productions that
generate common erroneous constructs
• Global correction
– Given an incorrect input string x and grammar
G, find a parse tree for a related string y, such
that the number of insertions, deletions, and
changes of tokens required to transform x into y
is as small as possible
– Too costly
4.2 Context-Free Grammars
• stmt if expr then stmt else stmt
1. Terminals: tokens
• if, then, else
2. Noterminals: set of strings
• expr, stmt
3. Start symbol
• stmt
4. Productions
Yu-Chen Kuo
9
Example 4.2
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → + | - | * | / | ↑
• Terminals: id, +, -, *, /, ↑
• Nonterminals: expr, op
• Start symbol: expr
Notational Conventions
1. These symbols are terminals:
i) Lower-case letters early in the alphabet: a, b, c
ii) Operator symbols: +, -, etc.
iii) Punctuation symbols: parentheses, comma
iv) Digits: 0, 1, …, 9
v) Boldface strings: id, if
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet: A, B, C
ii) The letter S: the start symbol
iii) Lower-case italic names: expr, stmt
Notational Conventions (cont.)
3. Upper-case letters late in the alphabet, such as
X, Y, Z, represent grammar symbols (terminals
or nonterminals)
4. Lower-case letters late in the alphabet, such as
u, v, …, z, represent strings of terminals
5. Lower-case Greek letters α, β, γ represent
strings of grammar symbols
6. A-productions (all productions for A): A → α1 | α2 | … | αk
7. The start symbol is the left side of the first production
Example 4.3
E  EAE | (E) | - E | id
A+|-|*|/|
By notational conventions −
Nonterminals: E, A
Terminals: remaining symbols
Yu-Chen Kuo
13
Derivations
E  E+E | E*E| (E) | - E | id
•
E derives -E
–
•
The derivation of -(id) from E
−
•
•
•
•
E-E
E  - E  - (E)  -(id)
A    , if A  
 : one step derivation
*
 : zero or more steps derivations

: one or more steps derivations

Yu-Chen Kuo
14
Derivations (cont.)
• α ⇒* α, for any string α
• If α ⇒* β and β ⇒* γ, then α ⇒* γ
• L(G) denotes the language generated by G; its
strings contain only terminal symbols. w ∈ L(G) if
S ⇒+ w. The string w is called a sentence of G.
• If S ⇒* α, then α may contain nonterminals; we call α
a sentential form of G.
• E.g., -(id + id) is a sentence of the grammar,
because E ⇒+ -(id + id)
Leftmost & Rightmost Derivations
• Leftmost derivation (⇒lm)
– -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
• Rightmost derivation (⇒rm)
– -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• If S ⇒*lm α, we call α a left-sentential form of G.
• If S ⇒*rm α, we call α a canonical (right-) sentential
form of G.
Parse Tree and Derivations
Parse Tree and Derivations (cont.)
Ambiguity
• More than one parse tree for some sentences
• More than one leftmost derivation for some
sentences
• More than one rightmost derivation for some
sentences
4.3 Regular Expression
vs. Context-free grammar
• Every language that can be described by a
regular expression can also be described by a
context-free grammar
– (a|b)*abb
– A0 → aA0 | bA0 | aA1
  A1 → bA2
  A2 → bA3
  A3 → ε
• Every regular set is a context-free language
Why use regular expression to define
the lexical syntax of a language ?
• Why not use a CFG for the lexical syntax?
1. Lexical rules of a language are frequently quite
simple; we do not need a powerful grammar.
2. Regular expressions provide a more concise and
easier-to-understand notation for tokens.
3. An efficient lexical analyzer can be constructed
automatically from regular expressions.
4. It separates the syntactic structure of a language
into lexical and nonlexical parts.
Why use regular expression to define
the lexical syntax of a language ?
• Regular expressions are most useful for
describing the structure of lexical constructs such
as identifiers, constants, and keywords.
• Grammars are most useful for describing
nested structure, such as balanced parentheses,
matching begin-end's, and corresponding
if-then-else's.
• Nested structures cannot be described by
regular expressions.
Verifying the Language
Generated by a Grammar
• Proof that L(G) = L:
– Every string generated by G is in L
– Every string in L can be generated by G
• S → (S)S | ε generates all strings of balanced ( )
– Every sentence derived from S is balanced (induction on the
number of derivation steps):
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y (n steps)
• S ⇒* x (fewer than n steps, so x is balanced)
• S ⇒* y (fewer than n steps, so y is balanced)
– Every balanced string of length 2n is derivable from S
(induction on n):
• w = (x)y of length 2n
• x and y have length less than 2n; they are both balanced and
derivable from S
• S ⇒ (S)S ⇒* (x)S ⇒* (x)y = w
Eliminating Ambiguity
stmt  if expr then stmt
| if expr then stmt else stmt
| other
Yu-Chen Kuo
24
Eliminating Ambiguity (cont.)
• Disambiguating rule: match each else with the
closest previous unmatched then
• The statement between a then and an else must be
matched

stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
unmatched_stmt → if expr then matched_stmt else unmatched_stmt
| if expr then stmt
Eliminating Immediate Left
Recursion
• A grammar is left recursive if it has a derivation
A ⇒+ Aα
• Top-down parsing methods cannot handle left-recursive
grammars, because top-down parsing corresponds
to a leftmost derivation.

A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
⇒ A → β1A' | β2A' | ... | βnA'
   A' → α1A' | α2A' | ... | αmA' | ε
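The transformation above is mechanical, so it can be sketched in a few lines of Python (my own illustration, not from the slides; the encoding of productions as tuples is an assumption):

```python
def eliminate_immediate_left_recursion(head, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | eps
    Each production right side is a tuple of symbols; () is epsilon."""
    alphas = [p[1:] for p in productions if p and p[0] == head]   # A alpha_i
    betas = [p for p in productions if not p or p[0] != head]     # beta_j
    if not alphas:
        return {head: productions}        # no immediate left recursion
    new_head = head + "'"
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }
```

For example, E → E + T | T becomes E → T E' and E' → + T E' | ε.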
Eliminating Immediate Left
Recursion (cont.)
• Non-immediate left recursion:
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda
Eliminating General Left
Recursion
• Input: a grammar G with no cycles (A ⇒+ A) or
ε-productions
Eliminating General Left
Recursion (cont.)
•
Non-immediate left recursion
S  Aa | b
A  Ac | Sd | 
 A  Ac | Aad | bd | 
 S  Aa | b
A  bdA’
A’  cA’ | adA’ | 
Yu-Chen Kuo
29
Eliminating Left Factoring
• When it is not clear which of two alternative
productions to use to expand a nonterminal A,
we rewrite the A-productions to defer the decision
until we have seen enough of the input.

stmt → if expr then stmt
| if expr then stmt else stmt
⇒ stmt → if expr then stmt S'
   S' → else stmt | ε

• A → αβ1 | αβ2 |…| αβn | γ
⇒ A → αA' | γ
   A' → β1 | β2 |…| βn
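One step of this rewriting can be sketched in Python (my own illustration, not from the slides; productions are encoded as tuples of symbols):

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def left_factor(head, alternatives):
    """One left-factoring step: pull out the longest prefix alpha shared
    by two or more alternatives of A, producing A -> alpha A' | ... and
    A' -> beta_1 | beta_2 | ...  Alternatives are tuples; () is epsilon."""
    best = ()
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            p = common_prefix(alternatives[i], alternatives[j])
            if len(p) > len(best):
                best = p
    if not best:
        return {head: alternatives}       # nothing to factor
    new_head = head + "'"
    factored = [a[len(best):] for a in alternatives if a[:len(best)] == best]
    rest = [a for a in alternatives if a[:len(best)] != best]
    return {head: [best + (new_head,)] + rest, new_head: factored}
```

Applied to the dangling-else productions, the shared prefix "if expr then stmt" is factored out, exactly as in the rewrite above.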
Non-Context-Free Language
Constructs
• L1 = {wcw | w is in (a|b)*} is not context-free
• L1' = {wcw^R | w is in (a|b)*} is context-free
– S → aSa | bSb | c
• L2 = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1} is not context-free
• L2' = {a^n b^m c^m d^n | n ≥ 1, m ≥ 1} is context-free
– S → aSd | aAd
  A → bAc | bc
• L2'' = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} is context-free
Non-Context-Free Language
Constructs
• L3 = {a^n b^n c^n | n ≥ 0} is not context-free
• L3' = {a^n b^n | n ≥ 1} is context-free
– S → aSb | ab
• A context-free grammar can keep count of
two items, but not three.
• A regular expression cannot keep count at all.
Top-Down Parsing
• Top-down parsing can be viewed as an
attempt to find a leftmost derivation for an
input string.
• It constructs a parse tree for the input string
starting from the root and creating the nodes of
the parse tree in preorder.
Recursive Descent Parsing
• A general top-down parsing method that may involve
backtracking
• E.g., S → cAd
       A → ab | a,  w = cad
Predictive Parsers
• By carefully writing a grammar, eliminating
left recursion, and left factoring, we can obtain a
grammar that can be parsed by a recursive-descent
parser that needs no backtracking
(a predictive parser).
• A predictive parser is implemented by
recursive procedures.
Predictive Parsers (cont.)
type  simple | id | array [simple] of type
simple  integer | char | num dotdot num
Yu-Chen Kuo
36
Transition Diagrams for Predictive
Parsers
• We can create a transition diagram for a
predictive parser
• For each nonterminal A:
1. Create an initial and a final state
2. For each production A → X1X2…Xn, create a path
from the initial to the final state, with edges labeled
X1, X2, …, Xn
• The parser uses the transition diagrams to match
terminals against lookahead input symbols
Transition Diagrams for Predictive
Parsers (cont.)
Transition Diagrams for Predictive
Parsers (cont.)
Transition Diagrams for Predictive
Parsers (cont.)
Nonrecursive Predictive Parsing
• It is possible to build a nonrecursive
predictive parser by maintaining a stack
explicitly, rather than via recursive calls.
• The key problem during predictive parsing
is that of determining the production to be
applied for a nonterminal. The nonrecursive
parser looks up the production to be applied
in a parsing table.
Nonrecursive Predictive Parsing (Cont.)
Nonrecursive Predictive Parsing (Cont.)
• The parser has an input buffer, a stack, a parsing
table, and an output stream.
• The input buffer contains the string to be parsed,
followed by $, a symbol used to indicate the end
of the input string.
• The stack contains a sequence of grammar symbols
with $ on the bottom, indicating the bottom of the
stack. Initially, the stack contains the start symbol
S of the grammar on top of $.
Nonrecursive Predictive Parsing (Cont.)
• The output stream shows the derivation steps
by which the grammar produces the input string.
• The parsing table is a two-dimensional array
M[A, a] giving the stack action for a
nonterminal A on top of the stack when the next
input symbol is a terminal a or the symbol $.
Predictive Parsing Algorithm
• Input. A string w and a parsing table M for G
• Output. A leftmost derivation of w, if w ∈ L(G);
otherwise, an error indication
• Method.
– Put $S on the stack, where S is the start symbol of G
– Put w$ in the input buffer
– Execute the predictive parsing program (Fig. 4.14)
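The table-driven loop can be sketched in Python for the expression grammar used in the example below (my own illustration; the table M is written out by hand, and the empty tuple stands for ε):

```python
# M[(nonterminal, lookahead)] -> right side to expand; () is epsilon.
M = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the list of productions used (a leftmost derivation),
    or None on a syntax error (terminal mismatch or empty M entry)."""
    tokens = list(tokens) + ["$"]
    stack, output, i = ["$", start], [], 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[i]
        if top == a:                                   # terminal: match it
            stack.pop(); i += 1
        elif top in NONTERMINALS and (top, a) in M:    # expand by M[A, a]
            stack.pop()
            rhs = M[(top, a)]
            output.append((top, rhs))
            stack.extend(reversed(rhs))                # leftmost symbol on top
        else:
            return None                                # error entry
    return output if tokens[i] == "$" else None
```

On input id + id the first expansion recorded is E → T E', matching the stack-moves table shown later in the slides.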
Predictive Parsing Program
Example
• Consider the non-left-recursive grammar for
arithmetic expressions:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Example (parsing table M)
Example (Stack Moves)
FIRST and FOLLOW
• The construction of a predictive parser is
aided by the FIRST and FOLLOW functions.
• These functions help us to construct the
predictive parsing table.
• FOLLOW sets can also be used as
synchronizing tokens during panic-mode
error recovery.
FIRST function
• If α is a string of grammar symbols,
FIRST(α) is the set of terminals that begin
the strings derived from α.
• If α ⇒* ε, then ε ∈ FIRST(α).
FIRST Sets
• Compute FIRST(X) for all grammar symbols
X by applying the following rules until no terminal
or ε can be added to any FIRST(X):
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then ε ∈ FIRST(X).
3. If X → Y1Y2…Yk is a production, then a ∈ FIRST(X)
whenever a ∈ FIRST(Yi) and ε is in all of FIRST(Y1),
FIRST(Y2), …, FIRST(Yi-1), i.e., Y1Y2…Yi-1 ⇒* ε.
FIRST sets (cont.)
3. (cont.) If ε ∈ FIRST(Yj) for all j = 1, 2, …, k, then
ε ∈ FIRST(X).
• Everything in FIRST(Y1) is also in FIRST(X). If Y1
does not derive ε, nothing more is added to FIRST(X);
otherwise, we add FIRST(Y2), and so on.
• For FIRST(X1X2…Xn):
FIRST(X1) ⊆ FIRST(X1X2…Xn);
FIRST(X2) ⊆ FIRST(X1X2…Xn) if ε ∈ FIRST(X1), and
so on; ε ∈ FIRST(X1X2…Xn) if ε ∈ FIRST(Xi) for all i.
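The fixed-point computation of the rules above can be sketched in Python, applied to the expression grammar from this chapter (my own illustration; "" stands for ε, and an empty tuple is an ε-production):

```python
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
TERMINALS = {"+", "*", "(", ")", "id"}

def compute_first(grammar, terminals):
    """Iterate the FIRST rules until no set changes (a fixed point)."""
    first = {t: {t} for t in terminals}          # rule 1
    first.update({A: set() for A in grammar})
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                before = len(first[A])
                all_eps = True
                for X in rhs:                    # rule 3: scan Y1 Y2 ... Yk
                    first[A] |= first[X] - {""}
                    if "" not in first[X]:
                        all_eps = False
                        break
                if all_eps:                      # rule 2 / every Yi nullable
                    first[A].add("")
                if len(first[A]) != before:
                    changed = True
    return first

FIRST = compute_first(GRAMMAR, TERMINALS)
```

The result matches the sets listed in the example below: FIRST(E) = {(, id}, FIRST(E') = {+, ε}, FIRST(T') = {*, ε}.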
FOLLOW function
• Define FOLLOW(A), for a nonterminal A, to
be the set of terminals a that can appear
immediately to the right of A in some sentential form.
• If S ⇒* αAaβ, then a ∈ FOLLOW(A).
• If A can be the rightmost symbol in some
sentential form, then $ ∈ FOLLOW(A).
FOLLOW Sets
• Compute FOLLOW(A), for every nonterminal A,
by applying the following rules until nothing can be
added to any FOLLOW set:
1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If A → αBβ is a production, then everything in
FIRST(β) except ε is in FOLLOW(B).
3. If A → αB, or A → αBβ with ε ∈ FIRST(β), then
everything in FOLLOW(A) is in FOLLOW(B).
(Note that FOLLOW(B) need not be a subset of FOLLOW(A).)
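These rules can also be run to a fixed point. The Python sketch below (my own illustration, not from the slides) uses the expression grammar with its FIRST sets written out by hand; "" stands for ε:

```python
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"},
         "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"}}

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols."""
    result = set()
    for X in symbols:
        result |= first[X] - {""}
        if "" not in first[X]:
            return result
    result.add("")                     # empty string or all symbols nullable
    return result

def compute_follow(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                            # rule 1
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                for k, B in enumerate(rhs):
                    if B not in grammar:              # FOLLOW only for nonterminals
                        continue
                    beta = first_of_string(rhs[k + 1:], first)
                    before = len(follow[B])
                    follow[B] |= beta - {""}          # rule 2
                    if "" in beta:                    # rule 3
                        follow[B] |= follow[A]
                    if len(follow[B]) != before:
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

The output matches the sets listed in the example that follows.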
Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
• FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
• FIRST(E') = {+, ε}
• FIRST(T') = {*, ε}
• FOLLOW(E) = FOLLOW(E') = {), $}
• FOLLOW(T) = FOLLOW(T') = {+, ), $}
• FOLLOW(F) = {+, *, ), $}
Construction of Predictive Parsing
Table
• Suppose A → α is a production with a ∈ FIRST(α). Then the
parser will expand A by α when the current
input symbol is a.
• If A → α with α = ε or α ⇒* ε, then the parser will
expand A by α when the input symbol is in
FOLLOW(A), or when the $ at the end of the input has been
reached and $ ∈ FOLLOW(A).
Construction of Predictive Parsing
Table (Algorithm)
• Input. Grammar G.
• Output. Parsing table M.
• Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a ∈ FIRST(α), add A → α to
M[A, a].
3. If ε ∈ FIRST(α), add A → α to M[A, b] for each
terminal b ∈ FOLLOW(A).
If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to
M[A, $].
4. Make each undefined entry of M an error.
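The algorithm translates almost line for line into Python. This sketch (my own illustration; the grammar, FIRST, and FOLLOW sets are written out by hand for the expression grammar, with "" for ε) also reports multiply-defined entries, which is the LL(1) test discussed later:

```python
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"},
         "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols, first):
    result = set()
    for X in symbols:
        result |= first[X] - {""}
        if "" not in first[X]:
            return result
    result.add("")
    return result

def build_table(grammar, first, follow):
    """Steps 1-3 of the algorithm; undefined entries (step 4) are simply
    absent from the dict, which plays the role of 'error'."""
    M = {}
    for A, alts in grammar.items():
        for rhs in alts:
            fa = first_of_string(rhs, first)
            # step 2 plus step 3 (FOLLOW(A) already contains $ when needed)
            targets = (fa - {""}) | (follow[A] if "" in fa else set())
            for a in targets:
                if (A, a) in M and M[(A, a)] != rhs:
                    raise ValueError(f"not LL(1): conflict at M[{A}, {a}]")
                M[(A, a)] = rhs
    return M

M = build_table(GRAMMAR, FIRST, FOLLOW)
```

For this grammar the construction succeeds with 13 defined entries, matching the parsing table shown earlier.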
Example
LL(1)
• A grammar whose parsing table has no
multiply-defined entries is said to be LL(1).
• The first "L" means scanning the input from
left to right.
• The second "L" means producing a leftmost derivation.
• The "1" means using one input symbol of
lookahead at each step.
Example (multiply-defined entry)
S  iEtSS’ | a (ambiguous)
S’  eS |  FIRST(S)={i,a}, FIRST(S’)={e, }
E b
FOLLOW(S)={e,$}, FOLLOW(S’}={e,$}
Yu-Chen Kuo
61
LL(1) Properties
• No ambiguous or left-recursive grammar can be
LL(1).
• A grammar G is LL(1) if and only if, whenever
A → α | β are two distinct productions, the following
conditions hold:
1. FIRST(α) ∩ FIRST(β) = ∅.
2. At most one of α ⇒* ε and β ⇒* ε.
3. If β ⇒* ε, then FIRST(α) ∩ FOLLOW(A) = ∅.
• The if-then-else statement violates condition 3, so
that grammar is not LL(1).
Error Recovery in Predictive Parsing
• In nonrecursive predictive parsing, an error is
detected in one of the following two situations:
1. When the terminal on top of the stack does not
match the next input symbol
2. When a nonterminal A is on top of the stack, a is
the next input symbol, and the parsing table
entry M[A, a] is empty
Error Recovery in Predictive Parsing
(cont.)
• Panic-mode error recovery is based on the
idea of skipping symbols on the input until a
token in a selected set of synchronizing
tokens appears.
• Its effectiveness depends on the choice of the
synchronizing set.
• Some heuristics are as follows.
Error Recovery in Predictive Parsing
(cont.)
1. We place all symbols in FOLLOW(A) into the
synchronizing set for nonterminal A. If we skip
tokens until an element of FOLLOW(A) is seen
and pop A from the stack, it is likely that parsing
can continue.
2. There is a hierarchical structure on constructs in a
language; e.g., expressions within blocks, and so
on. We can add to the synchronizing set of a lower
construct the symbols that begin higher constructs.
Error Recovery in Predictive Parsing
(cont.)
3. If we add symbols in FIRST(A) to the
synchronizing set of nonterminal A, then it may
be possible to resume parsing according to A if a
symbol in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string,
then the production deriving ε can be used as a
default. Doing so may postpone some error
detection, but cannot cause an error to be missed.
Error Recovery in Predictive Parsing
(cont.)
5. If a terminal on top of the stack cannot be
matched, a simple idea is to pop the
terminal, issue a message saying that the
terminal was inserted, and continue
parsing.
Example
• Add "sync" entries to indicate synchronizing
tokens obtained from the FOLLOW sets.
Example
• If M[A, a] is empty, skip the input symbol a.
• If M[A, a] = sync, pop A from the stack.
• If a token on top of the stack does not match
the input symbol, pop it.
Bottom-Up Parsing
• Shift-reduce parsing is a general style of
bottom-up parsing.
• It attempts to construct a parse tree for an
input string beginning at the leaves and
working up towards the root.
• At each reduction step a particular substring
matching the right side of a production is
replaced by the nonterminal on the left side
of that production.
Bottom-Up Parsing (cont.)
• If the substring is chosen correctly at each
reduction step, a rightmost derivation is
traced out in reverse.
Example
• Consider the following grammar:
S → aABe
A → Abc | b
B → d
• The sentence "abbcde" can be reduced
to S by the following reduction steps:
Example
1. abbcde    (A → b, handle at position 2)
2. aAbcde    (A → Abc)
3. aAde      (B → d)
4. aABe      (S → aABe)
5. S
• The reductions trace out the following rightmost
derivation in reverse:
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
Handles
• Informally, a handle of a string is a substring that
matches the right side of a production, and whose
reduction to the nonterminal on the left side of the
production represents one step along the reverse
of a rightmost derivation.
• Formally, a handle of a right-sentential form γ is a
production A → β and a position of γ where the
string β may be found and replaced by A to
produce the previous right-sentential form in a
rightmost derivation of γ.
Handles (cont.)
• If S ⇒*rm αAw ⇒rm αβw, then A → β in the position
following α is a handle of αβw.
• Note:
1. The string w to the right of the handle contains only
terminal symbols.
2. If a grammar is unambiguous, then every right-sentential
form of the grammar has exactly one
handle; otherwise, some right-sentential forms
may have more than one handle.
Example
• Consider the following ambiguous grammar:
E → E+E | E*E | (E) | id
• Two rightmost derivations of id1+id2*id3:
1. E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3
⇒ id1+id2*id3
– id1 is a handle of the right-sentential form id1+id2*id3
– with E → id, replacing id1 by E gives E+id2*id3
2. E ⇒ E*E ⇒ E*id3 ⇒ E+E*id3 ⇒ E+id2*id3
⇒ id1+id2*id3
• So the right-sentential form id1+id2*id3 has two possible handles.
Handles (cont.)
Handles (cont.)
• The handle represents the leftmost
complete subtree consisting of a node and
all its children.
• Reducing β to A in αβw can be thought of
as "pruning the handle", removing the
children of A from the parse tree.
Stack Implementation of Shift-Reduce Parsing
• We implement shift-reduce parsing by
using a stack to hold grammar symbols and
an input buffer to hold the input string w.
• We use $ to mark the bottom of the stack and
the end of the input buffer:
STACK        INPUT
$            w$
Stack Implementation of Shift-Reduce Parsing (cont.)
• The parser shifts input symbols onto the
stack until a handle β is on top of the stack.
• It then reduces β to the left side of a production A → β.
• It repeats this cycle until an error occurs or the
stack contains S and the input buffer is
empty:
STACK        INPUT
$S           $
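This shift-reduce cycle can be sketched in Python for the grammar S → aABe, A → Abc | b, B → d from the earlier example. The sketch is my own illustration, not the slides' method: since we have no parsing table yet to pick the correct handle, it simply tries every possible reduction and a shift at each configuration, backtracking on failure (exponential in general; real parsers use tables instead):

```python
PRODUCTIONS = [("S", ("a", "A", "B", "e")),
               ("A", ("A", "b", "c")),
               ("A", ("b",)),
               ("B", ("d",))]

def shift_reduce(tokens):
    """Return the sequence of reductions (a rightmost derivation in
    reverse) that reduces tokens to S, or None if there is none."""
    tokens = tuple(tokens)

    def step(stack, i, steps):
        if stack == ("S",) and i == len(tokens):
            return steps                          # accepted
        for head, rhs in PRODUCTIONS:             # try every reduce move
            n = len(rhs)
            if stack[-n:] == rhs:
                r = step(stack[:-n] + (head,), i, steps + [(head, rhs)])
                if r is not None:
                    return r
        if i < len(tokens):                       # then try a shift move
            return step(stack + (tokens[i],), i + 1, steps)
        return None

    return step((), 0, [])
```

On "abbcde" the reductions found are exactly those of the example: A → b, A → Abc, B → d, S → aABe.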
Example
Conflicts during Shift-Reduce
Parsing
• There are context-free grammars for which
shift-reduce parsing cannot be used.
• It is possible to reach a configuration in which
the parser, knowing the entire stack contents and
the next input symbol, cannot decide
whether to shift or to reduce (a shift/reduce
conflict), or which of several reductions to
make (a reduce/reduce conflict).
Example of Shift/Reduce Conflict
stmt  if expr then stmt
| if expr then stmt else stmt
| other
STACK
INPUT
$ … if expr then stmt else …$
• Note that if we resolve the conflict in favor
of shifting, the parser will behave naturally.
Example of Reduce/Reduce Conflict
(1) stmt → id ( parameter_list )
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id ( expr_list )
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

STACK                INPUT
$ … id ( id          , id ) …$
LR Parsers
• LR(k) parsing is an efficient, bottom-up
parsing technique.
• The “L” stands for left-to-right scanning of
the input, the “R” for constructing a
rightmost derivation in reverse, and the “k”
for the number of input symbols of
lookahead that are used in making parsing
decisions.
• When (k) is omitted, k is assumed to be 1.
LR Parsers (cont.)
• LR parsing can be used to parse a larger class of
context-free grammars than LL parsing.
• The principal drawback of LR parsing is that it is
too much work to construct an LR parser by hand
for a typical programming-language grammar.
• We need a specialized tool, an LR parser
generator. Fortunately, many such generators are
available.
The LR Parsing Algorithm
• The schematic form of an LR parser:
The LR Parsing Algorithm (cont.)
• An LR parser consists of an input, an output, a stack,
a driver program, and a parsing table that has two
parts (action and goto).
• The driver program is the same for all LR parsers.
• The parsing table changes from one parser to
another.
• The parsing program reads tokens from an input
buffer one at a time.
The LR Parsing Algorithm (cont.)
• The parser uses a stack to store a string of the form
s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a
grammar symbol and each si is a state symbol. Each
state symbol summarizes the information contained
in the stack below it.
• The combination of the state symbol on top of the
stack and the current input symbol is used to index
the parsing table and determine the shift-reduce decision.
The LR Parsing Algorithm (cont.)
• The parsing table consists of two parts: a parsing
action function and a goto function.
• An action table entry can have one of four values:
1. shift s, where s is a state
2. reduce by a grammar production A → β
3. accept
4. error
• The goto function takes a state and a grammar
symbol as arguments and produces a state.
The LR Parsing Algorithm (cont.)
• A configuration of an LR parser is a pair whose
first component is the stack contents and whose
second component is the unexpended input:
(s0X1s1X2s2…Xmsm, ai ai+1…an$)
• The next move of the parser is determined by
reading ai, the current input symbol, and sm, the
state on top of the stack, and then consulting the
parsing action table entry action[sm, ai].
The LR Parsing Algorithm (cont.)
• The configurations resulting after each of the four
types of move are as follows:
1. If action[sm, ai] = shift s, the parser executes a
shift move, entering the configuration
(s0X1s1X2s2…Xmsm ai s, ai+1…an$)
Here the parser has shifted both the current input
symbol ai and the next state s, which is given in
action[sm, ai], onto the stack; ai+1 becomes the
current input symbol.
The LR Parsing Algorithm (cont.)
2. If action[sm, ai] = reduce A → β, the parser
executes a reduce move, entering the configuration
(s0X1s1X2s2…Xm-rsm-r A s, ai ai+1…an$)
where s = goto[sm-r, A] and r is the length of β.
Here the parser first pops 2r symbols off the
stack (r state symbols and r grammar symbols),
exposing state sm-r. The parser then pushes both A
and s, the entry for goto[sm-r, A], onto the stack.
The LR Parsing Algorithm (cont.)
3. If action[sm, ai] = accept, parsing is completed.
4. If action[sm, ai] = error, the parser has discovered
an error and calls an error recovery routine.
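The four moves translate into a short driver loop. The Python sketch below is my own illustration, not the slides' program: it uses the small grammar S → CC, C → cC | d (the grammar of the canonical-LR example later in this chapter) with hand-built action and goto tables, and it stores only states on the stack, since the grammar symbols are redundant for the algorithm itself.

```python
# Productions numbered for reduce moves: (left side, length of right side)
PRODS = {1: ("S", 2),   # (1) S -> C C
         2: ("C", 2),   # (2) C -> c C
         3: ("C", 1)}   # (3) C -> d
# ACTION[state][terminal] = ("s", j) shift, ("r", p) reduce, or ("acc",)
ACTION = {
    0: {"c": ("s", 3), "d": ("s", 4)},
    1: {"$": ("acc",)},
    2: {"c": ("s", 6), "d": ("s", 7)},
    3: {"c": ("s", 3), "d": ("s", 4)},
    4: {"c": ("r", 3), "d": ("r", 3)},
    5: {"$": ("r", 1)},
    6: {"c": ("s", 6), "d": ("s", 7)},
    7: {"$": ("r", 3)},
    8: {"c": ("r", 2), "d": ("r", 2)},
    9: {"$": ("r", 2)},
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def lr_parse(tokens):
    """Return the list of production numbers used to reduce, i.e. a
    rightmost derivation in reverse, or None on error."""
    tokens = list(tokens) + ["$"]
    stack, i, output = [0], 0, []
    while True:
        move = ACTION.get(stack[-1], {}).get(tokens[i])
        if move is None:
            return None                            # error entry
        if move[0] == "s":
            stack.append(move[1]); i += 1          # shift: push state, advance
        elif move[0] == "r":
            head, r = PRODS[move[1]]
            del stack[-r:]                         # pop r states
            stack.append(GOTO[(stack[-1], head)])  # push goto[s, A]
            output.append(move[1])
        else:
            return output                          # accept
```

On input cdd the driver reduces by productions 3, 2, 3, 1, i.e. C → d, C → cC, C → d, S → CC.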
The LR Parsing Program
Example
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id
Example (cont.)
Example (cont.)
See p.220
Constructing LR Parsing Tables
• There are three methods for constructing an
LR parsing table for a grammar:
(1) Simple LR (SLR) is the easiest to
implement, but the least powerful.
(2) Canonical LR is the most powerful, and
the most expensive.
(3) Lookahead LR (LALR) is intermediate in
power and cost.
Constructing SLR Parsing Tables
1) FOLLOW(A) for every nonterminal A in G
2) The augmented grammar G'
3) The canonical collection of sets of LR(0)
items, C
4) The transition diagram for viable prefixes
5) The parsing table's action and goto functions
Example
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → id
Step 1: FOLLOW sets for
Nonterminals
• FOLLOW(E) = {+, ), $}
• FOLLOW(T) = FOLLOW(F) = {+, *, ), $}
Step 2: The Augmented Grammar
• If G is a grammar with start symbol S, then G', the
augmented grammar for G, is G with a new start
symbol S' and production S' → S.
• The augmented grammar is as follows:
E' → E
E → E+T | T
T → T*F | F
F → (E) | id
Step 3: Sets of LR(0) Items
• An LR(0) item (item for short) of a grammar
G is a production of G with a dot at some
position of the right side.
• The production A → XYZ yields the four items
A → •XYZ    A → X•YZ
A → XY•Z    A → XYZ•
• The production A → ε generates only one item,
A → •
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then
closure(I) is the set of items constructed from I by
the following rules:
1. Initially, every item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production
in G, then add the item B → •γ to closure(I), if it is
not already there.
3. Apply rule 2 until no more new items can be
added to closure(I).
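The closure rules are easy to implement directly. This Python sketch (my own illustration; items are encoded as (left side, right side, dot position) triples) applies them to the augmented expression grammar:

```python
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items, grammar):
    """LR(0) closure of a set of (head, rhs, dot) items."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in grammar:  # dot before nonterminal B
                for gamma in grammar[rhs[dot]]:
                    item = (rhs[dot], gamma, 0)         # add B -> .gamma
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

I0 = closure({("E'", ("E",), 0)}, GRAMMAR)   # the initial item set
```

I0 contains the seven items listed in the example that follows.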
Example
• If I is the set containing the one item {[E' → •E]}, then
closure(I) contains the items
E' → •E
E → •E+T
E → •T
T → •T*F
T → •F
F → •(E)
F → •id
The Goto Operation
• If I is a set of items and X is a grammar
symbol, then goto(I, X) is the closure of the
set of all items [A → αX•β] such that
[A → α•Xβ] is in I.
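A Python sketch of goto (my own illustration; the closure helper is restated so the snippet stands alone, and items are (head, rhs, dot) triples):

```python
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items, grammar):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in grammar:
                for gamma in grammar[rhs[dot]]:
                    item = (rhs[dot], gamma, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

def goto(items, X, grammar):
    """Move the dot over X in every item that permits it, then close."""
    moved = {(head, rhs, dot + 1)
             for head, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved, grammar)

I1 = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
I6 = goto(I1, "+", GRAMMAR)   # the example computed on the next slide
```

The result I6 contains E → E+•T plus the closure items for T, matching the next example.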
The Goto Operation
• If I is the set of items {[E' → E•], [E → E•+T]},
then goto(I, +) consists of
E → E+•T
T → •T*F
T → •F
F → •(E)
F → •id
The Sets-of-Items Construction
• The algorithm to construct C, the canonical
collection of sets of LR(0) items (all possible
sets of items) for an augmented grammar G', is shown
below.
Example
• closure({[E' → •E]}) = I0:
E' → •E
E → •E+T
E → •T
T → •T*F
T → •F
F → •(E)
F → •id
• goto(I0, E) = I1:
E' → E•
E → E•+T
Example (cont.)
• goto(I0, T) = I2:
E → T•
T → T•*F
• goto(I0, F) = I3:
T → F•
• goto(I0, () = I4:
F → (•E)
E → •E+T
E → •T
T → •T*F
T → •F
F → •(E)
F → •id
• goto(I0, id) = I5:
F → id•
Example (cont.)
• goto(I1, +) = I6:
E → E+•T
T → •T*F
T → •F
F → •(E)
F → •id
• goto(I2, *) = I7:
T → T*•F
F → •(E)
F → •id
Example (cont.)
• goto(I4, E) = I8:
F → (E•)
E → E•+T
• goto(I4, T) = I2
• goto(I4, F) = I3
• goto(I4, () = I4
• goto(I4, id) = I5
• goto(I6, T) = I9:
E → E+T•
T → T•*F
Example (cont.)
• goto(I6, F) = I3
• goto(I6, () = I4
• goto(I6, id) = I5
• goto(I7, F) = I10:
T → T*F•
• goto(I7, () = I4
• goto(I7, id) = I5
• goto(I8, )) = I11:
F → (E)•
• goto(I8, +) = I6
• goto(I9, *) = I7
Step 4: The Transition Diagram
• The goto functions for the canonical collection of
sets of items can be shown as a transition diagram.
Step 4: The Transition Diagram
(cont.)
Step 4: The Transition Diagram
(cont.)
Step 5: The Parsing Table
• State i is constructed from Ii.
1) The parsing actions for state i are determined as
follows:
a) If [A → α•aβ] is in Ii and goto(Ii, a) = Ij, set
action[i, a] to "shift j". Here a must be a terminal.
b) If [A → α•] is in Ii, set action[i, a] to "reduce
A → α" for all a in FOLLOW(A). Here A must
not be S'.
c) If [S' → S•] is in Ii, set action[i, $] to "accept".
Step 5: The Parsing Table (cont.)
2) The goto transitions for state i are
constructed for all nonterminals A using
the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
3) All entries not defined by rules 1) and 2) are set
to "error".
4) The start state of the parser is the one
constructed from the set of items
containing [S' → •S].
Step 5: The Parsing Table (cont.)
Example for a not SLR(1) and
unambiguous grammar
S → L = R
S → R
L → * R
L → id
R → L
Example for a not SLR(1) and
unambiguous grammar (cont.)
Example for a not SLR(1) and
unambiguous grammar (cont.)
• Consider the set of items I2:
I2: S → L•=R
    R → L•
– Item S → L•=R gives action[2, =] = "shift 6"
– FOLLOW(R) contains =, so item R → L• sets
action[2, =] = "reduce R → L"
• A shift/reduce conflict, although the grammar is not ambiguous.
• In fact, no right-sentential form begins with R = …,
so the reduction is wrong when the viable prefix is
just L (it is right for viable prefixes such as *L).
• The fix is to split the state according to the lookahead
(FOLLOW) context.
Construction Canonical LR Parsing
Tables
• An LR(1) item is of the form [A → α•β, a],
where A → αβ is a production and a is a
terminal or $.
• The "1" refers to the length of the second
component, called the lookahead of the item.
• The lookahead has no effect in an item of the
form [A → α•β, a], where β is not ε, but an item
of the form [A → α•, a] calls for a reduction by
A → α only if the next input symbol is a.
Construction Canonical LR Parsing
Tables
• Thus, we are compelled to reduce by A → α only
on those input symbols a for which
[A → α•, a] is an LR(1) item in the state on top of
the stack.
• The set of such a's will always be a subset of
FOLLOW(A), but it could be a proper subset.
• The method for constructing the collection of
sets of LR(1) items is essentially the same as the
way we built the collection of sets of LR(0)
items; we only need to modify the two procedures
closure and goto.
Construction Canonical LR Parsing
Tables (closure function )
Construction Canonical LR Parsing
Tables (item & goto)
Example (cont.)
• Consider the following augmented grammar
S' → S
S → CC
C → cC | d
• closure({[S' → •S, $]}) = I0:
S' → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d
• goto(I0, S) = I1:
S' → S•, $
Example (cont.)
• goto(I0, C) = I2:
S → C•C, $
C → •cC, $
C → •d, $
• goto(I0, c) = I3:
C → c•C, c/d
C → •cC, c/d
C → •d, c/d
• goto(I0, d) = I4:
C → d•, c/d
• goto(I2, C) = I5:
S → CC•, $
Example (cont.)
• goto(I2, c) = I6:
C → c•C, $
C → •cC, $
C → •d, $
• goto(I2, d) = I7:
C → d•, $
• goto(I3, C) = I8:
C → cC•, c/d
• goto(I3, c) = I3
• goto(I3, d) = I4
• goto(I6, C) = I9:
C → cC•, $
• goto(I6, c) = I6
• goto(I6, d) = I7
Example (Transition Diagram)
• Compare I6 and I3: the same core items, but
different lookaheads.
Example (Transition Diagram)
Example (Parsing Table)
Canonical LR Parser vs. SLR Parser
• Every SLR(1) grammar is an LR(1) grammar.
• A canonical LR parser (for an LR(1) grammar) may have
more states than the SLR parser for the
same grammar.
• Exercise: check whether the following grammar is an LR(1)
grammar or not:
S → L = R
S → R
L → * R
L → id
R → L