CIS 461
Compiler Design & Construction
Fall 2012
slides derived from Tevfik Bultan, Keith Cooper, and
Linda Torczon
Lecture-Module #12
Parsing 4
1
Parsing Techniques
Top-down parsers
(LL(1), recursive descent)
•
Start at the root of the parse tree from the start symbol and grow toward leaves
(similar to a derivation)
•
Pick a production and try to match the input
•
Bad “pick” ⇒ may need to backtrack
•
Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers
(LR(1), operator precedence)
•
Start at the leaves and grow toward root
•
We can think of the process as reducing the input string to the start symbol
•
At each reduction step a particular substring matching the right-side of a
production is replaced by the symbol on the left-side of the production
•
Bottom-up parsers handle a large class of grammars
2
Top-down Parsing
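[Figure: top-down parsing. The parse tree grows from the start symbol S at the root toward the leaves; the input is scanned left to right with a lookahead, and the fringe of the parse tree traces a left-most derivation.]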
Bottom-up Parsing
[Figure: bottom-up parsing. The parse tree grows from the input string toward S; the upper fringe of the parse tree, followed by the lookahead, traces a right-most derivation in reverse.]
3
Handle-pruning, Bottom-up Parsers
The process of discovering a handle & reducing it to the
appropriate left-hand side is called handle pruning
Handle pruning forms the basis for a bottom-up parsing method
To construct a rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn = w
Apply the following simple algorithm
for i ← n to 1 by -1
  Find the handle < Ai → βi , ki > in γi
  Replace βi with Ai to generate γi-1
4
Example
The expression grammar
1  S      → Expr
2  Expr   → Expr + Term
3         | Expr – Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → num
9         | id
Handles for rightmost derivation of input string: x – 2 * y
Sentential Form                  Handle (Prod’n , Pos’n)
S                                -
Expr                             1,1
Expr – Term                      3,3
Expr – Term * Factor             5,5
Expr – Term * <id,y>             9,5
Expr – Factor * <id,y>           7,3
Expr – <num,2> * <id,y>          8,3
Term – <num,2> * <id,y>          4,1
Factor – <num,2> * <id,y>        7,1
<id,x> – <num,2> * <id,y>        9,1
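The handle table can be replayed mechanically. The following is a minimal sketch, not part of the original slides: it assumes tokens are represented only by their categories (id, num), writes the minus sign as an ASCII "-", and reads the position in each handle as the 1-based right end of the handle. The names PRODUCTIONS, HANDLES and prune are illustrative.

```python
# Productions of the expression grammar, numbered as on the slide.
PRODUCTIONS = {
    1: ("S", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
}

def prune(form, prod, pos):
    """Reduce the handle <prod, pos>; pos is the 1-based right end of the handle."""
    lhs, rhs = PRODUCTIONS[prod]
    start = pos - len(rhs)
    if form[start:pos] != rhs:
        raise ValueError(f"production {prod} does not match at position {pos}")
    return form[:start] + [lhs] + form[pos:]

# Token categories for the input x - 2 * y.
form = ["id", "-", "num", "*", "id"]

# Handles from the table, read bottom to top (the order in which they are pruned).
HANDLES = [(9, 1), (7, 1), (4, 1), (8, 3), (7, 3), (9, 5), (5, 5), (3, 3), (1, 1)]

trace = [form]
for prod, pos in HANDLES:
    form = prune(form, prod, pos)
    trace.append(form)

for step in trace:
    print(" ".join(step))
```

Each printed line is one sentential form of the table, from the input string back up to S, so the trace is the rightmost derivation in reverse.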
5
Handle-pruning, Bottom-up Parsers
One implementation technique is the shift-reduce parser
push $
lookahead = get_next_token( )
repeat until (top of stack == start symbol and lookahead == $)
  if the top of the stack is a handle A → β
    then /* reduce β to A */
      pop |β| symbols off the stack
      push A onto the stack
  else if (lookahead ≠ $)
    then /* shift */
      push lookahead
      lookahead = get_next_token( )
  else /* error */
    report a syntax error
How do errors show up?
• failure to find a handle
• hitting $ and needing to shift (final else clause)
Either generates an error
6
Example, Corresponding Parse Tree
S
└─ Expr
   ├─ Expr
   │  └─ Term
   │     └─ Fact.
   │        └─ <id,x>
   ├─ –
   └─ Term
      ├─ Term
      │  └─ Fact.
      │     └─ <num,2>
      ├─ *
      └─ Fact.
         └─ <id,y>
1. Shift until top-of-stack is the right end of a handle
2. Pop the left end of the handle & reduce
5 shifts + 9 reduces + 1 accept
7
Shift-reduce Parsing
Shift-reduce parsers are easily built and easily understood
A shift-reduce parser has just four actions
• Shift — next word is shifted onto the stack
• Reduce — right end of handle is at top of stack
Locate left end of handle within the stack
Pop handle off stack & push appropriate lhs
• Accept — stop parsing & report success
• Error — call an error reporting/recovery routine
Handle finding is key
• handle is on stack
• finite set of handles
⇒ use a DFA!
Accept & Error are simple
Shift is just a push and a call to the scanner
Reduce takes |rhs| pops & 1 push
If handle-finding requires state, put it in the stack
8
LR Parsers
• LR(k) parsers are table-driven, bottom-up, shift-reduce parsers
that use a limited right context (k-token lookahead) for handle
recognition
• LR(k): Left-to-right scan of the input, Rightmost derivation in reverse
with k token lookahead
A grammar is LR(k) if, given a rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn-1 ⇒ γn = sentence
We can
1. isolate the handle of each right-sentential form γi , and
2. determine the production by which to reduce,
by scanning γi from left-to-right, going at most k symbols beyond
the right end of the handle of γi
9
LR Parsers
A table-driven LR parser looks like
[Figure: source code → Scanner → Table-driven Parser (with its Stack) → IR. Separately, grammar → Parser Generator → ACTION & GOTO Tables, which the table-driven parser consults.]
10
LR Shift-Reduce Parsers
push($); // $ is the end-of-file symbol
push(s0); // s0 is the start state of the DFA that recognizes handles
lookahead = get_next_token();
repeat forever
  s = top_of_stack();
  if ( ACTION[s,lookahead] == reduce A → β ) then
    pop 2*|β| symbols;
    s = top_of_stack();
    push(A);
    push(GOTO[s,A]);
  else if ( ACTION[s,lookahead] == shift si ) then
    push(lookahead);
    push(si);
    lookahead = get_next_token();
  else if ( ACTION[s,lookahead] == accept and lookahead == $ )
    then return success;
  else error();
The skeleton parser
• uses ACTION & GOTO
• does |words| shifts
• does |derivation| reductions
• does 1 accept
11
LR Parsers (parse tables)
To make a parser for L(G), we need a set of tables
The grammar
1  S → Z
2  Z → Z z
3    | z
The tables
ACTION
State   $          z
0       -          shift 2
1       accept     shift 3
2       reduce 3   reduce 3
3       reduce 2   reduce 2
GOTO
State   Z
0       1
1
2
3
12
Example Parses
The string “z”
Stack            Input   Action
$ s0             z $     shift 2
$ s0 z s2        $       reduce 3
$ s0 Z s1        $       accept
The string “zz”
Stack            Input   Action
$ s0             z z $   shift 2
$ s0 z s2        z $     reduce 3
$ s0 Z s1        z $     shift 3
$ s0 Z s1 z s3   $       reduce 2
$ s0 Z s1        $       accept
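The two example parses can be reproduced with a small table-driven driver. This is a sketch under stated assumptions rather than the course's implementation: the ACTION and GOTO entries are transcribed from the tables for the grammar S → Z, Z → Z z | z, each input character is treated as one token, and the names lr_parse, ACTION, GOTO and PRODUCTIONS are illustrative.

```python
# Productions as (lhs, length of rhs): 1: S -> Z, 2: Z -> Z z, 3: Z -> z
PRODUCTIONS = {1: ("S", 1), 2: ("Z", 2), 3: ("Z", 1)}

# Missing (state, token) pairs are error entries.
ACTION = {
    (0, "z"): ("shift", 2),
    (1, "$"): ("accept", None),
    (1, "z"): ("shift", 3),
    (2, "$"): ("reduce", 3),
    (2, "z"): ("reduce", 3),
    (3, "$"): ("reduce", 2),
    (3, "z"): ("reduce", 2),
}
GOTO = {(0, "Z"): 1}

def lr_parse(text):
    """Return the list of actions taken, or raise SyntaxError."""
    tokens = list(text) + ["$"]
    stack = ["$", 0]          # alternating symbols and states, as in the skeleton parser
    pos = 0
    actions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, lookahead), ("error", None))
        if kind == "shift":
            stack += [lookahead, arg]
            pos += 1
            actions.append(f"shift {arg}")
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * rhs_len:]      # pop 2*|rhs| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]    # state exposed by the pop picks GOTO
            actions.append(f"reduce {arg}")
        elif kind == "accept":
            actions.append("accept")
            return actions
        else:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")

print(lr_parse("z"))
print(lr_parse("zz"))
```

The returned action lists correspond to the Action columns of the two traces; an input such as the empty string hits an error entry in state 0.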
13
LR Parsers
How does this LR stuff work?
• Unambiguous grammar ⇒ unique rightmost derivation
• Keep upper fringe on a stack
– All active handles include TOS
– Shift inputs until TOS is right end of a handle
• Language of handles is regular
– Build a handle-recognizing DFA
– ACTION & GOTO tables encode the DFA
• To match subterms, recurse and leave DFA’s state on stack
• Final states of the DFA correspond to reduce actions
– New state is GOTO[state at TOS , lhs]
– For Z, this takes the DFA to S1
Control DFA for the simple example
S0 --z--> S2 (reduce action)
S0 --Z--> S1
S1 --z--> S3 (reduce action)
14
Building LR Parsers
How do we generate the ACTION and GOTO tables?
• Use the grammar to build a model of the handle recognizing DFA
• Use the DFA model to build ACTION & GOTO tables
• If construction succeeds, the grammar is LR
How do we build the handle-recognizing DFA ?
• Encode the set of productions that can be used as handles in the DFA
state: Use LR(k) items
• Use two functions goto( s, X ) and closure( s )
– goto() is analogous to move() in the NFA to DFA conversion
– closure() is analogous to ε-closure
• Build up the states and transition functions of the DFA
• Use this information to fill in the ACTION and GOTO tables
15
LR(k) items
An LR(k) item is a pair [A , B], where
A is a production α → β γ δ with a • at some position in the rhs
B is a lookahead string of length ≤ k
(terminal symbols or $)
Examples: [α → • β γ δ , a], [α → β • γ δ , a], [α → β γ • δ , a], & [α → β γ δ • , a]
The • in an item indicates the position of the top of the stack
• LR(0) items [ α → γ • δ ] (no lookahead symbol)
• LR(1) items [ α → γ • δ , a ] (one token lookahead)
• LR(2) items [ α → γ • δ , a b ] (two token lookahead) ...
16
LR(k) items
The • in an item indicates the position of the top of the stack
[α → • β γ δ , a] means that the input seen so far is consistent with the use of α → β γ δ
immediately after the symbol on top of the stack
[α → β γ • δ , a] means that the input seen so far is consistent with the use of α → β γ δ
at this point in the parse, and that the parser has already recognized β γ.
[α → β γ δ • , a] means that the parser has seen β γ δ, and that a lookahead a is
consistent with reducing to α (for LR(k) parsers a is a string of terminal
symbols of length k)
The table construction algorithm uses items to represent valid
configurations of an LR(1) parser
17
LR(1) Items
The production α → β γ δ, with lookahead a, generates 4 items
[α → • β γ δ , a], [α → β • γ δ , a], [α → β γ • δ , a], & [α → β γ δ • , a]
The set of LR(1) items for a grammar is finite
What’s the point of all these lookahead symbols?
• Carry them along to choose correct reduction
• Lookaheads are bookkeeping, unless item has • at right end
– Has no direct use in [α → β γ • δ , a]
– In [α → β γ δ • , a], a lookahead of a implies a reduction by α → β γ δ
– For { [α → γ • , a], [β → γ • δ , b] }
lookahead = a ⇒ reduce to α;
lookahead ∈ FIRST(δ) ⇒ shift
⇒ Limited right context is enough to pick the actions
18
Back to Finding Handles
Parser in a state where the stack (the fringe) was
Expr – Term
With lookahead of *
How did it choose to expand Term rather than reduce to Expr?
• Lookahead symbol is the key
• With lookahead of + or –, parser should reduce to Expr
• With lookahead of * or /, parser should shift
• Parser uses lookahead to decide
• All this context from the grammar is encoded in the handle-recognizing mechanism
19
Back to x - 2 * y
[Figure: the parse of x – 2 * y, marked with the points where the parser shifts and the points where it reduces]
1. Shift until TOS is the right end of a handle
2. Find the left end of the handle & reduce
20
LR(1) Table Construction
High-level overview
1 Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1)
items), C = { I0 , I1 , ... , In }
a Introduce a new start symbol S’ which has only one production
S’ → S
b Initial state, I0 , should include
• [S’ → • S , $], along with any equivalent items
• Derive equivalent items as closure( I0 )
c Repeatedly compute, for each Ik , and each grammar symbol X, goto(Ik , X)
• If the set is not already in the collection, add it
• Record all the transitions created by goto( )
This eventually reaches a fixed point
2 Fill in the ACTION and GOTO tables using the DFA
The canonical collection completely encodes the
transition diagram for the handle-finding DFA
21
Computing Closures
closure(I) adds all the items implied by items already in I
• Any item [α → β • γ δ , a] implies [γ → • τ , x] for each production
with γ on the lhs, and x ∈ FIRST(δa)
• Since β γ δ is valid, any way to derive β γ δ is valid, too
The algorithm
Closure( I )
  while ( I is still changing )
    for each item [α → β • γ δ , a] ∈ I
      for each production γ → τ ∈ P
        for each terminal b ∈ FIRST(δa)
          if [γ → • τ , b] ∉ I
            then add [γ → • τ , b] to I
Fixpoint computation
22
Example Grammar
Initial step builds the item [S → • Z , $]
and takes its closure( )
1  S → Z
2  Z → Z z
3    | z
Closure( [S → • Z , $] )
   Item               From
1  [S → • Z , $]      Original item
2  [Z → • Z z , $]    1, δa is $
3  [Z → • z , $]      1, δa is $
4  [Z → • Z z , z]    2, δa is z $
5  [Z → • z , z]      2, δa is z $
So, initial state s0 is
{ [S → • Z , $], [Z → • Z z , $], [Z → • z , $], [Z → • Z z , z], [Z → • z , z] }
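The closure of the initial item can be checked with a short fixpoint loop. This is a minimal sketch and not the slides' own code: it assumes an item is encoded as a tuple (lhs, rhs, dot position, lookahead), and its FIRST helper relies on the fact that no production in this grammar derives the empty string. The names GRAMMAR, TERMINALS, first and closure are illustrative.

```python
GRAMMAR = {"S": [("Z",)], "Z": [("Z", "z"), ("z",)]}
TERMINALS = {"z", "$"}

def first(symbols):
    """FIRST of a symbol string; only the leading symbol matters without empty productions."""
    head = symbols[0]
    if head in TERMINALS:
        return {head}
    result, seen, work = set(), set(), [head]
    while work:
        nt = work.pop()
        if nt in seen:
            continue
        seen.add(nt)
        for rhs in GRAMMAR[nt]:
            if rhs[0] in TERMINALS:
                result.add(rhs[0])
            else:
                work.append(rhs[0])
    return result

def closure(items):
    """Add every item implied by the items already present, until nothing changes."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot, a in list(items):
            if dot == len(rhs) or rhs[dot] in TERMINALS:
                continue                      # nothing to expand after the dot
            rest = rhs[dot + 1:] + (a,)       # the string "delta a"
            for prod in GRAMMAR[rhs[dot]]:
                for b in first(rest):
                    item = (rhs[dot], prod, 0, b)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

s0 = closure({("S", ("Z",), 0, "$")})
for item in sorted(s0):
    print(item)
```

The five tuples printed correspond to the five items of s0, with lookahead $ coming from the original item and lookahead z coming from the item with the dot before Z z.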
23
Computing Gotos
goto(I , x) computes the state that the parser would reach
if it recognized an x while in state I
• goto( { [α → β • γ δ , a] }, γ ) produces [α → β γ • δ , a]
• It also includes closure( [α → β γ • δ , a] ) to fill out the state
The algorithm
Goto( I, x )
  new = Ø
  for each [α → β • x δ , a] ∈ I
    new = new ∪ [α → β x • δ , a]
  return closure(new)
• Not a fixpoint method
• Uses closure
24
Example Grammar
s0 is { [S → • Z , $], [Z → • Z z , $], [Z → • z , $], [Z → • Z z , z], [Z → • z , z] }
goto( s0 , z )
• Loop produces
Item              From
[Z → z • , $]     Item 3 in s0
[Z → z • , z]     Item 5 in s0
• Closure adds nothing since • is at end of rhs in each item
In the construction, this produces s2
{ [Z → z • , {$ , z}] }
New, but obvious, notation for two distinct items
[Z → z • , $] and [Z → z • , z]
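Goto and the canonical collection for this grammar can be sketched in a few lines. The code below is an illustration under assumptions, not the course's implementation: items are tuples (lhs, rhs, dot position, lookahead), closure is written as a worklist rather than the slide's while loop, FIRST only handles grammars without empty productions, and symbols are tried in the order Z then z so that the state numbers happen to match the slides. All function names are hypothetical.

```python
GRAMMAR = {"S": [("Z",)], "Z": [("Z", "z"), ("z",)]}
TERMINALS = {"z", "$"}

def first(symbol, seen=frozenset()):
    """FIRST of one symbol (no production here derives the empty string)."""
    if symbol in TERMINALS:
        return {symbol}
    if symbol in seen:
        return set()              # already being expanded (left recursion)
    out = set()
    for rhs in GRAMMAR[symbol]:
        out |= first(rhs[0], seen | {symbol})
    return out

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] not in TERMINALS:
            rest = rhs[dot + 1:] + (a,)           # the string "delta a"
            for prod in GRAMMAR[rhs[dot]]:
                for b in first(rest[0]):
                    item = (rhs[dot], prod, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)

def goto(state, x):
    """Move the dot over x in every item that allows it, then take the closure."""
    moved = {(lhs, rhs, dot + 1, a)
             for lhs, rhs, dot, a in state
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved)

def canonical_collection():
    start = closure({("S", ("Z",), 0, "$")})
    states, transitions, work = [start], {}, [start]
    while work:
        state = work.pop()
        for x in ("Z", "z"):
            target = goto(state, x)
            if not target:
                continue                          # no item has the dot before x
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), x)] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
print(len(states), transitions)
```

With this ordering, goto(s0, z) becomes state 2 and contains exactly the two items [Z → z • , $] and [Z → z • , z]; the recorded transitions are the edges of the handle-recognizing DFA.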
25