Document

advertisement
Compiler Construction
강의 8
1
Context-Free Languages
Context-free languages
LR(k) ≡ LR(1)
Deterministic languages (LR(k))
LL(k) languages
Simple precedence
languages
LL(1) languages
Operator precedence
languages
The inclusion hierarchy for
context-free languages
2
Context-Free Grammars
Context-free grammars
Floyd-Evans
Parsable
Unambiguous
CFGs
Operator
Precedence
LR(k)
LL(k)
LR(1)
LALR(1)
• Operator precedence
includes some ambiguous
grammars
• LL(1) is a subset of SLR(1)
SLR(1)
LL(1)
LR(0)
The inclusion hierarchy for
context-free grammars
3
Beyond Syntax
There is a level of correctness that is deeper than grammar
fie(a,b,c,d)
int a, b, c, d;
{…}
fee() {
int f[3],g[0],
h, i, j, k;
char *p;
fie(h,i,“ab”,j, k);
k = f * i + j;
h = g[17];
printf(“<%s,%s>.\n”,
p,q);
p = 10;
}
What is wrong with this program?
(let me count the ways …)
4
Semantics
There is a level of correctness that is deeper than grammar
fie(a,b,c,d)
int a, b, c, d;
{…}
fee() {
int f[3],g[0],
h, i, j, k;
char *p;
fie(h,i,“ab”,j, k);
k = f * i + j;
h = g[17];
printf(“<%s,%s>.\n”,
p,q);
p = 10;
}
What is wrong with this program?
(let me count the ways …)
• declared g[0], used g[17]
• wrong number of args to fie()
• “ab” is not an int
• wrong dimension on use of f
• undeclared variable q
• 10 is not a character string
All of these are “deeper than
syntax”
5
Syntax-directed translation
In syntax-directed translation, we attach ATTRIBUTES to grammar
symbols.
The values of the attributes are computed by SEMANTIC RULES
associated with grammar productions.
Conceptually, we have the following flow:
In practice, however, we do everything in a single pass.
6
Syntax-directed translation
There are two ways to represent the semantic rules we
associate with grammar symbols.
− SYNTAX-DIRECTED DEFINITIONS (SDDs) do not specify the
order in which semantic actions should be executed
− TRANSLATION SCHEMES explicitly specify the ordering of
the semantic actions.
SDDs are higher level; translation schemes are closer to
an implementation
7
Syntax-Directed Definitions
8
Syntax-directed definitions
The SYNTAX-DIRECTED DEFINITION (SDD) is a
generalization of the context-free grammar.
Each grammar symbol in the CFG is given a set of
ATTRIBUTES.
Attributes can be any type (string, number, memory loc,
etc)
SYNTHESIZED attributes are computed from the values
of the CHILDREN of a node in the parse tree.
INHERITED attributes are computed from the attributes
of the parents and/or siblings of a node in the parse
tree.
9
Attribute dependencies
Given a SDD, we can describe the dependencies
between the attributes with a DEPENDENCY GRAPH.
A parse tree annotated with attributes is called an
ANNOTATED parse tree.
Computing the attributes of the nodes in a parse tree is
called ANNOTATING or DECORATING the tree.
10
Form of a syntax-directed definition
In a SDD, each grammar production A -> α has
associated with it semantic rules
b := f( c1, c2, …, ck )
where f() is a function, and either
1. b is a synthesized attribute of A, and c1, c2, …,
are attributes of the grammar symbols of α, or
2. b is an inherited attribute of one of the
symbols on the RHS, and c1, c2, … are
attributes of the grammar symbols of α
in either case, we say b DEPENDS on c1, c2, …, ck.
11
Semantic rules
Usually we actually write the semantic rules with
expressions instead of functions.
If the rule has a SIDE EFFECT, e.g. updating the symbol
table, we write the rule as a procedure call.
When a SDD has no side effects, we call it an
ATTRIBUTE GRAMMAR (Programming Languages
course에서 설명).
12
Synthesized attributes
Synthesized attributes depend only on the attributes of
children. They are the most common attribute type.
If a SDD has synthesized attributes ONLY, it is called a SATTRIBUTED DEFINITION.
S-attributed definitions are convenient since the
attributes can be calculated in a bottom-up
traversal of the parse tree.
13
Example SDD: desk calculator
Production
L -> E newline
E -> E1 + T
E -> T
T -> T1 * F
T -> F
F -> ( E )
F -> number
Semantic Rule
print( E.val )
E.val := E1.val + T.val
E.val := T.val
T.val := T1.val x F.val
T.val := F.val
F.val := E.val
F.val := number.lexval
Notice the similarity to the yacc spec from last lecture.
14
Calculating synthesized attributes
Input string: 3*5+4 newline
Annotated tree:
15
Inherited attributes
An inherited attribute is defined in terms of the
attributes of the node’s parents and/or siblings.
Inherited attributes are often used in compilers for
passing contextual information forward, for example,
the type keyword in a variable declaration statement.
16
Example SDD with an inherited attribute
Suppose we want to describe decls like “real x, y, z”
Production
D -> T L
T -> int
T -> real
L -> L1, id
L -> id
Semantic Rules
L.in := T.type
T.type := integer
T.type := real
L1.in := L.in
addtype( id.entry, L.in )
addtype( id.entry, L.in )
addtype() is just
a procedure that
sets the type
field in the
symbol table.
L.in is inherited since it depends on a sibling or parent.
17
Annotated parse tree for real id1, id2, id3
What are the
dependencies?
18
Dependency Graphs
19
Dependency graphs
If an attribute b depends on attribute c, then attribute b
has to be evaluated AFTER c.
DEPENDENCY GRAPHS visualize these requirements.
− Each attribute is a node
− We add edges from the node for attribute c to the node for
attribute b, if b depends on c.
− For procedure calls, we introduce a dummy synthesized
attribute that depends on the parameters of the procedure
calls.
20
Dependency graph example
Production
E -> E1 + E2
Semantic Rule
E.val := E1.val + E2.val
Wherever this rule appears in the parse, tree we draw:
21
Example dependency graph
22
Example dependency graph
23
Finding a valid evaluation order
A TOPOLOGICAL SORT of a directed acyclic graph
orders the nodes so that for any nodes a and b such
that a -> b, a appears BEFORE b in the ordering.
There are many possible topological orderings for a
DAG.
Each of the possible orderings gives a valid order for
evaluation of the semantic rules.
24
Example dependency graph
25
Syntax Trees
26
Application: syntax tree construction
One thing SDDs are useful for is construction of
SYNTAX TREES.
Recall from Lecture 1 that a syntax tree is a condensed
form of parse tree.
Syntax trees are useful for representing programming
language constructs like expressions and statements.
They help compiler design by decoupling parsing from
translation.
27
Syntax trees
Leaf nodes for operators and keywords are removed.
Internal nodes corresponding to uninformative non-terminals are
replaced by the more meaningful operators.
28
SDD for syntax tree construction
We need some functions to help us build the syntax
tree:
− mknode(op,left,right) constructs an operator node with label
op, and two children, left and right
− mkleaf(id,entry) constructs a leaf node with label id and a
pointer to a symbol table entry
− mkleaf(num,val) constructs a leaf node with label num and
the token’s numeric value val
Use these functions to build a syntax tree for a-4+c:
− P1 := mkleaf( id, st_entry_for_a )
− P2 := …
29
SDD for syntax tree construction
Production
Semantic Rules
E -> E1 + T
E -> E1 - T
E -> T
T -> ( E )
T -> id
T -> num
E.nptr := mknode( ‘+’, E1.nptr,T.nptr)
E.nptr := mknode( ‘-’, E1.nptr,T.nptr)
E.nptr := T.nptr
T.nptr := E.nptr
T.nptr := mkleaf( id, id.entry )
T.nptr := mkleaf( num, num.val )
Note that this is a S-attributed definition.
Try to derive the annotated parse tree for a-4+c.
30
Evaluating SDDs Bottom-up
31
Bottom-up evaluation of S-attributed defn
How can we build a translator for a given SDD?
For S-attributed definitions, it’s pretty easy!
A bottom-up shift-reduce parser can evaluate the
(synthesized) attributes as the input is parsed.
We store the computed attributes with the grammar
symbols and states on the stack.
When a reduction is made, we calculate the values of
any synthesized attributes using the alreadycomputed attributes from the stack.
32
Bottom-up evaluation of S-attributed defns
In the scheme, our parser’s stack now stores grammar
symbols AND attribute values.
For every production A -> XYZ with semantic rule
A.a := f( X.x, Y.y, Z.z ), before XYZ is reduced to A,
we should already have X.x Y.y and Z.z on the stack.
33
Desk calculator example
If attribute values are placed on the stack as described, it is now
easy to implement the semantic rules for the desk calculator.
Production
Semantic Rule
Code
L -> E newline
E -> E1 + T
E -> T
T -> T1 * F
T -> F
F -> ( E )
F -> number
print( E.val )
E.val := E1.val + T.val
E.val := T.val
T.val := T1.val x F.val
T.val := F.val
F.val := E.val
F.val := number.lexval
print val[top-1]
val[newtop] = val[top-2]+val[top]
/*newtop==top, so nothing to do*/
val[newtop] = val[top-2]+val[top]
/*newtop==top, so nothing to do*/
val[newtop] = val[top-1]
/*newtop==top, so nothing to do*/
34
Desk calculator example
For input 3 * 5 + 4 newline, what happens?
Assume when a terminal’s attribute is shifted when it is.
Input
3*5+4n
*5+4n
States
number
Values
3
Action
shift
reduce
reduce
shift
shift
reduce
reduce
reduce
shift
shift
reduce
reduce
reduce
shift
reduce
F->number
T->F
F->number
T->T*F
E->T
F->number
T->F
E->E+T
E->En
35
L-attributed definitions
S-attributed definitions only allow synthesized attributes.
We saw earlier that inherited attributes are useful.
But we prefer definitions that can be evaluated in one
pass.
L-ATTRIBUTED definitions are the set of SDDs whose
attributes can be evaluated in a DEPTH-FIRST
traversal of the parse tree.
36
Depth-first traversal
algorithm dfvisit( node n ) {
for each child m of n, in left-to-right order, do {
evaluate the inherited attributes of m
dfvisit( m )
}
evaluate the synthesized attributes of n
}
37
L-attributed definitions
If a definition can be evaluated by dfvisit() we say it is
L-attributed.
Another way of putting it: a SDD is L-attributed if each
INHERITED attribute of Xi on the RHS of a production
A -> X1 X2 … Xn depends ONLY on:
− The attributes of X1, …, Xi-1 (to the LEFT of Xi in the
production)
− The INHERITED attributes of A.
Since S-attributed definitions have no inherited
attributes, they are necessarily L-attributed.
38
L-attributed definitions
Is the following SDD L-attributed?
Production
A -> L M
A -> Q R
Semantic Rules
L.i := l(A.i)
M.i := m(L.s)
A.s := f(M.s)
R.i := r(A.i)
Q.i := q(R.s)
A.s := f(Q.s)
39
Translation Schemes
40
Translation schemes
Translation schemes are another way to describe
syntax-directed translation.
Translation schemes are closer to a real implementation
because the specify when, during the parse,
attributes should be computed.
Example, for conversion of INFIX expressions to
POSTFIX:
E -> T R
R -> addop T { print ( addop.lexeme ) } | ε
T -> num { print( num.val ) }
This translation scheme will turn 9-5+2 into 95-2+
41
Turning a SDD into a translation scheme
For a translation scheme to work, it must be the case that an
attribute is computed BEFORE it is used.
If the SDD is S-attributed, it is easy to create the translation
scheme implementing it:
Production
T -> T1 * F
Semantic Rule
T.val := T1.val x F.val
Translation scheme:
T -> T1 * F { T.val = T1.val * F.val }
That is, we just turn the semantic rule into an action and add at the
far right hand side. This DOES NOT WORK for inherited attribs!
42
Turning a SDD into a translation scheme
With inherited attributes, the translation scheme
designer needs to follow three rules:
1. An inherited attribute for a symbol on the RHS MUST
be computed in an action BEFORE the occurrence
of the symbol.
2. An action MUST NOT refer to the synthesized
attribute of a symbol to the right of the action.
3. A synthesized attribute for the LHS nonterminal can
ONLY be computed in an action FOLLOWING the
symbols for all the attributes it references.
43
Example
This translation scheme does NOT follow the rules:
S -> A1 A2
A -> a
{ A1.in = 1; A2.in = 2 }
{ print( A.in ) }
If we traverse the parse tree depth first, A1.in has not
been set when referred to in the action print( A.in )
S -> { A1.in = 1 } A1 { A2.in = 2 } A2
A -> a { print( A.in ) }
44
Bottom-up evaluation of inherited attributes
The first step is to convert the SDD to a valid translation
scheme.
Then a few “tricks” have to be applied to the translation
scheme.
It is possible, with the right tricks, to do one-pass
bottom-up attribute evaluation for ALL LL(1)
grammars and MOST LR(1) grammars, if the SDD is
L-attributed.
This means when adding semantic actions to your yacc
specifications, you might run into trouble. See
section 5.6 of the text!
45
Beyond Syntax
These questions are part of context-sensitive analysis
Answers depend on values, not parts of speech
Questions & answers involve non-local information
Answers may involve computation
How can we answer these questions?
Use formal methods
− Context-sensitive grammars?
− Attribute grammars?
(attributed grammars?)
Use ad-hoc techniques
− Symbol tables
− Ad-hoc code
(action routines)
In scanning & parsing, formalism won; different story here.
46
Beyond Syntax
Telling the story
The attribute grammar formalism is important
− Succinctly makes many points clear
− Sets the stage for actual, ad-hoc practice
The problems with attribute grammars motivate practice
− Non-local computation
− Need for centralized information
Some folks in the community still argue for attribute grammars
We will cover attribute grammars, then move on to ad-hoc ideas
47
Attribute Grammars
What is an attribute grammar?
A context-free grammar augmented with a set of rules
Each symbol in the derivation has a set of values, or attributes
The rules specify how to compute a value for each attribute
Example grammar
Number → Sign List
Sign
→+
| –
List
→ List Bit
| Bit
Bit
→0
|1
This grammar describes
signed binary numbers
We would like to augment
it with rules that compute
the decimal value of each
valid input string
48
Examples
For “-1”
Number
For “-101”
→
→
→
→
Sign List
– List
– Bit
–1
Number
Sign
–
Number →
→
→
→
→
→
→
→
Sign List
Sign List Bit
Sign List 1
Sign List Bit 1
Sign List 1 1
Sign Bit 0 1
Sign 1 0 1
– 101
Number
Sign
List
List
–
Bit
List
List
Bit
Bit
0
1
Bit
1
1
49
Attribute Grammars
Add rules to compute the decimal value of a signed binary number
Productions
Attribution Rules
Symbol
Attributes
Number → Sign List
List.pos ← 0
If Sign.neg
then Number.val ← – List.val
else Number.val ← List.val
Number
val
Sign
neg
List
pos, val
→+
Sign.neg ← false
Bit
pos, val
| –
Sign.neg ← true
Sign
List0
Bit
→ List1 Bit
List1.pos ← List0.pos + 1
Bit.pos ← List0.pos
List0.val ← List1.val + Bit.val
| Bit
Bit.pos ← List.pos
List.val ← Bit.val
→ 0
| 1
Bit.val ← 0
Bit.val ← 2Bit.pos
50
Back to The Examples
Rules + parse tree
imply an attribute
dependence graph
For “-1”
Number.val ← –List.val≡ –1
One possible evaluation order:
1.
2.
3.
4.
5.
6.
Number
neg ← true
Sign
List
List.pos ← 0
List.val ← Bit.val≡ 1
–
Bit
Bit.pos ← 0
Bit.val ← 2Bit.pos ≡ 1
List.pos
Sign.neg
Bit.pos
Bit.val
List.val
Number.val
Other orders are possible
1
Knuth suggested a data-flow model for evaluation
Independent attributes first
Others in order as input values become available
Evaluation order
must be consistent
with the attribute
dependence graph
51
Back to the Examples
This is the complete
attribute dependence
graph for “–101”.
It shows the flow of all
attribute values in the
example.
Some flow downward
→ inherited attributes
Some flow upward
→ synthesized attributes
A rule may use attributes
in the parent, children, or
siblings of a node
52
The Rules of The Game
Attributes associated with nodes in parse tree
Rules are value assignments associated with productions
Attribute is defined once, using local information
Label identical terms in production for uniqueness
Rules & parse tree define an attribute dependence graph
− Graph must be non-circular
This produces a high-level, functional specification
Synthesized attribute
− Depends on values from children
Inherited attribute
− Depends on values from siblings & parent
53
Using Attribute Grammars
Attribute grammars can specify context-sensitive actions
Take values from syntax
Perform computations with values
Insert tests, logic, …
Synthesized Attributes
• Use values from children
& from constants
• S-attributed grammars
• Evaluate in a single
bottom-up pass
Good match to LR parsing
Inherited Attributes
• Use values from parent,
constants, & siblings
• directly express context
• can rewrite to avoid them
• Thought to be more natural
Not easily done at parse time
We want to use both kinds of attribute
54
Evaluation Methods
Dynamic, dependence-based methods
Build the parse tree
Build the dependence graph
Topological sort the dependence graph
Define attributes in topological order
Rule-based methods
Analyze rules at compiler-generation time
Determine a fixed (static) ordering
Evaluate nodes in that order
(treewalk)
Oblivious methods
(passes, dataflow)
Ignore rules & parse tree
Pick a convenient order (at design time) & use it
55
Back to the Example
Number
Sign
–
List
List
Bit
List
Bit
1
0
Bit
1
For “-101”
56
Back to the Example
Number
Sign
–
List
Bit
1
val:
neg:
List
pos:
val:
pos:
val:
pos:0
val:
List
pos:
val:
Bit
Bit
pos:
val:
pos:
val:
1
0
For “-101”
57
Back to the Example
Number
Sign
–
List
Bit
1
val:-5
Inherited Attributes
neg:true
List
pos:2
val:4
pos:2
val:4
List
pos:0
val:5
pos:1
val:4
Bit
Bit
pos:1
val:0
pos:0
val:1
1
0
For “-101”
58
Back to the Example
Number
Sign
–
List
Bit
1
val:-5
Synthesized Attributes
neg:true
List
pos:2
val:4
pos:2
val:4
List
pos:0
val:5
pos:1
val:4
Bit
Bit
pos:1
val:0
pos:0
val:1
1
0
For “-101”
59
Back to the Example
Number
Sign
–
List
Bit
1
val:-5
Synthesized Attributes
neg:true
List
pos:2
val:4
pos:2
val:4
List
pos:0
val:5
pos:1
val:4
Bit
Bit
pos:1
val:0
pos:0
val:1
1
0
For “-101”
60
Back to the Example
Number
Sign
–
List
Bit
1
val:-5
If we show the computation …
neg:true
List
pos:2
val:4
pos:2
val:4
List
pos:0
val:5
pos:1
val:4
Bit
Bit
pos:1
val:0
pos:0
val:1
& then peel away the parse tree …
1
0
For “-101”
61
Back to the Example
All that is left is the attribute
dependence graph.
val:-5
neg:true
pos:2
val:4
pos:2
val:4
1
pos:0
val:1
pos:1
val:4
–
This succinctly represents the
flow of values in the problem
instance.
pos:0
val:5
pos:1
val:0
The dynamic methods sort this
graph to find independent
values, then work along graph
edges.
1
The rule-based methods try to
discover “good” orders by
analyzing the rules.
0
For “-101”
The oblivious methods ignore
the structure of this graph.
The dependence graph must be acyclic
62
Circularity
We can only evaluate acyclic instances
We can prove that some grammars can only generate instances with
acyclic dependence graphs
Largest such class is “strongly non-circular” grammars (SNC )
SNC grammars can be tested in polynomial time
Failing the SNC test is not conclusive
Many evaluation methods discover circularity dynamically
⇒ Bad property for a compiler to have
SNC grammars were first defined by Kennedy & Warren
63
A Circular Attribute Grammar
Productions
Attribution Rules
Number → List
List.a ← 0
List0
→ List1 Bit
List1.a ← List0.a + 1
List0.b ← List1.b
List1.c ← List1.b + Bit.val
| Bit
List0.b ← List0.a + List0.c + Bit.val
Bit
→ 0
| 1
Bit.val ← 0
Bit.val ← 2Bit.pos
64
An Extended Example
(§ 4.3.3)
Grammar for a basic block
Block0 →
|
Assign →
Expr0 →
|
|
Term0 →
|
|
Factor →
|
|
Block1 Assign
Assign
Ident = Expr ;
Expr1 + Term
Expr1 – Term
Term
Term1 * Factor
Term1 / Factor
Factor
( Expr )
Number
Identifier
Let’s estimate cycle counts
• Each operation has a COST
• Add them, bottom up
• Assume a load per value
• Assume no reuse
Simple problem for an AG
Hey, this looks useful !
65
An Extended Example
Adding attribution rules
Block0
→ Block1 Assign
Assign
| Assign
→ Ident = Expr
Expr0
;
→ Expr1 + Term
| Expr1 – Term
Term0
| Term
→ Term1 * Factor
| Term1 / Factor
Factor
|
→
|
|
Factor
( Expr )
Number
Identifier
Block0.cost ← Block1.cost +
Assign.cost
Block0.cost ← Assign.cost
Assign.cost ← COST(st ore) +
Expr.cost
Expr0.cost ← Expr1.cost +
COST(add) + Term.cost
Expr0.cost ← Expr1.cost +
COST(add) + Term.cost
Expr0.cost ← Term.cost
Term0.cost ← Term1.cost +
COST(mult ) + Factor.cost
Term0.cost ← Term1.cost +
COST(div) + Factor.cost
Term0.cost ← Factor.cost
Factor.cost ← Expr.cost
Factor.cost ← COST(loadI)
Factor.cost ← COST(load)
All these
attributes
are
synthesized!
66
An Extended Example
Properties of the example grammar
All attributes are synthesized ⇒ S-attributed grammar
Rules can be evaluated bottom-up in a single pass
− Good fit to bottom-up, shift/reduce parser
Easily understood solution
Seems to fit the problem well
What about an improvement?
Values are loaded only once per block (not at each use)
Need to track which values have been already loaded
67
A Better Execution Model
Adding load tracking
Need sets Before and After for each production
Must be initialized, updated, and passed around the tree
Factor → ( Expr )
| Number
| Identifier
Factor.cost ← Expr.cost ;
Expr.Before ← Factor.Before ;
Factor.After ← Expr.After
Factor.cost ← COST(loadi) ;
Factor.After ← Factor.Before
if(Identifier.name ∉ Factor.Before)
then
Factor.cost ← COST(load);
Factor.After ← Factor.Before
∪ Identifier.name
else
Factor.cost ← 0
Factor.After ← Factor.Before
This looks more complex!
68
A Better Execution Model
Load tracking adds complexity
But, most of it is in the “copy rules”
Every production needs rules to copy Before & After
A Sample Production
Expr0 → Expr1 + Term
Expr0 .cost ← Expr1.cost +
COST(add) + Term.cost ;
Expr1.Before ← Expr0.Before ;
Term.Before ← Expr1.After;
Expr0.After ← Term.After
These copy rules multiply rapidly
Each creates an instance of the set
Lots of work, lots of space, lots of rules to write
69
An Even Better Model
What about accounting for finite register sets?
Before & After must be of limited size
Adds complexity to Factor→Identifier
Requires more complex initialization
Jump from tracking loads to tracking registers is small
Copy rules are already in place
Some local code to perform the allocation
70
The Extended Example
Tracking loads
Introduced Before and After sets to record loads
Added ≥ 2 copy rules per production
− Serialized evaluation into execution order
Made the whole attribute grammar large & cumbersome
Finite register set
Complicated one production (Factor → Identifier)
Needed a little fancier initialization
Changes were quite limited
Why is one change hard and the other easy?
71
The Moral of the Story
Non-local computation needed lots of supporting rules
Complex local computation was relatively easy
The Problems
Copy rules increase cognitive overhead
Copy rules increase space requirements
− Need copies of attributes
− Can use pointers, for even more cognitive overhead
Result is an attributed tree
− Must build the parse tree
− Either search tree for answers or copy them to the root
72
Addressing the Problem
Ad-hoc techniques
Introduce a central repository for facts
Table of names
− Field in table for loaded/not loaded state
Avoids all the copy rules, allocation & storage headaches
All inter-assignment attribute flow is through table
−
−
−
−
Clean, efficient implementation
Good techniques for implementing the table
When its done, information is in the table !
Cures most of the problems
(hashing, § B.4)
Unfortunately, this design violates the functional paradigm
Do we care?
73
The Realist’s Alternative
Ad-hoc syntax-directed translation
Associate a snippet of code with each production
At each reduction, the corresponding snippet runs
Allowing arbitrary code provides complete flexibility
− Includes ability to do tasteless & bad things
To make this work
Need names for attributes of each symbol on lhs & rhs
− Typically, one attribute passed through parser + arbitrary code
(structures, globals, statics, …)
− Yacc introduced $$, $1, $2, … $n, left to right
ANTLR allows defining variables
Need an evaluation scheme
− Should fit into the parsing algorithm
74
Reworking the Example
Block0
Assign
Expr0
Term0
Factor
→ Block1 Assign
| Assign
→ Ident = Expr ;
→ Expr1 + Term
| Expr1 – Term
| Term
→ Term1 * Factor
| Term1 / Factor
| Factor
→ ( Expr )
| Number
| Identifier
cost← cost + COST(store);
cost← cost + COST(add);
cost← cost + COST(sub);
cost← cost + COST(mult);
cost← cost + COST(div);
cost← cost + COST(loadi);
{ i← hash(Identifier);
if(Table[i].loaded = false)
then {
cost ← cost + COST(load);
Table[i].loaded ← true;
}
}
This looks
cleaner &
simpler than
the AG sol’n !
One missing detail:
initializing cost
75
Reworking the Example
Start
Init
Block0
→ Init Block
→ε
→ Block1 Assign
| Assign
Assign → Ident = Expr ;
cost ← 0;
cost← cost + COST(store);
… and so on as in the previous version of the example …
Before parser can reach Block, it must reduce Init
Reduction by Init sets cost to zero
This is an example of splitting a production to create a reduction
in the middle — for the sole purpose of hanging an action routine
there!
76
Reworking the Example
Block0
Assign
Expr0
Term0
Factor
→ Block1 Assign
| Assign
→ Ident = Expr ;
→ Expr1 + Term
| Expr1 – Term
| Term
→ Term1 * Factor
| Term1 / Factor
| Factor
→ ( Expr )
| Number
| Identifier
$$ ← $1 + $2 ;
$$ ← $1 ;
$$ ← COST(store) + $3;
$$ ← $1 + COST(add) + $3;
$$ ← $1 + COST(sub) + $3;
$$ ← $1;
$$ ← $1 + COST(mult) + $3;
$$ ← $1 + COST(div) + $3;
$$ ← $1;
$$ ← $2;
$$ ← COST(loadi);
{ i← hash(Identifier);
if (Table[i].loaded = false)
then {
$$ ← COST(load);
Table[i].loaded ← true;
}
else $$ ← 0
}
This version
passes the
values through
attributes. It
avoids the need
for initializing
“cost”
77
Reworking the Example
Assume constructors for each node
Assume stack holds pointers to nodes
Assume yacc syntax
Goal
Expr
→
→
|
|
Term →
|
|
Factor →
|
|
Expr
Expr + Term
Expr – Term
Term
Term * Factor
Term / Factor
Factor
( Expr )
number
id
$$
$$
$$
$$
$$
$$
$$
$$
$$
$$
=
=
=
=
=
=
=
=
=
=
$1;
MakeAddNode($1,$3);
MakeSubNode($1,$3);
$1;
MakeMulNode($1,$3);
MakeDivNode($1,$3);
$1;
$2;
MakeNumNode(token);
MakeIdNode(token);
78
Reality
Most parsers are based on this ad-hoc style of context-sensitive
analysis
Advantages
Addresses the shortcomings of the AG paradigm
Efficient, flexible
Disadvantages
Must write the code with little assistance
Programmer deals directly with the details
Annotate action code with grammar rules
79
Typical Uses
Building a symbol table
− Enter declaration information as processed
− At end of declaration syntax, do some post processing
− Use table to check errors as parsing progresses
Simple error checking/type checking
−
−
−
−
Define before use → lookup on reference
Dimension, type, ... → check as encountered
Type conformability of expression → bottom-up walk
Procedure interfaces are harder
Build a representation for parameter list & types
Create list of sites to check
Check offline, or handle the cases for arbitrary orderings
80
Is This Really “Ad-hoc” ?
Relationship between practice and attribute grammars
Similarities
Both rules & actions associated with productions
Application order determined by tools, not author
(Somewhat) abstract names for symbols
Differences
Actions applied as a unit; not true for AG rules
Anything goes in ad-hoc actions; AG rules are functional
AG rules are higher level than ad-hoc actions
81
Making Ad-hoc SDT Work
What about a rule that must work in mid-production?
Can transform the grammar
− Split it into two parts at the point where rule must go
− Apply the rule on reduction to the appropriate part
Can also handle reductions on shift actions
− Add a production to create a reduction
Was: fee → fum
Make it: fee → fie → fum and tie action to this reduction
ANTLR supports the above automatically
Together, these let us apply rule at any point in the parse
82
Limitations of Ad-hoc SDT
Forced to evaluate in a given order: postorder
− Left to right only
− Bottom up only
Implications
− Declarations before uses
− Context information cannot be passed down
How do you know what rule you are called from within?
Example: cannot pass bit position from right down
− Could you use globals?
Requires initialization & some re-thinking of the solution
− Can we rewrite it in a form that is better for the ad-hoc sol’n
83
Alternative Strategy
Build an abstract syntax tree
Use tree walk routines
Use “visitor” design pattern to add functionality
TreeNodeVisitor
VisitAssignment(AssignmentNode)
VisitVariableRef(VariableRefNode)
TypeCheckVisitor
AnalysisVisitor
VisitAssignment(AssignmentNode)
VisitAssignment(AssignmentNode)
VisitVariableRef(VariableRefNode)
VisitVariableRef(VariableRefNode)
84
Visitor Treewalk
Parallel structure of tree:
Separates treewalk code from node handling code
Facilitates change in processing without change to tree structure
TreeNode
Accept(NodeVisitor)
AssignmentNode
VariableRefNode
Accept(NodeVisitor v)
Accept(NodeVisitor v)
v.VisitAssignment(this)
v.VisitVariableRef(this)
85
Summary
Wrap-up of parsing
− More example to build LR(1) table
Attribute Grammars
− Pros: Formal, powerful, can deal with propagation strategies
− Cons: Too many copy rules, no global tables, works on parse tree
Ad-hoc SDT
− Annotate production with ad-hoc action code
− Postorder Code Execution
Pros: Simple and functional, can be specified in grammar, does not require parse
tree
Cons: Rigid evaluation order, no context inheritance
− Generalized Tree Walk
Pros: Full power and generality, operates on abstract syntax tree (using Visitor
pattern)
Cons: Requires specific code for each tree node type, more complicated
Powerful tools like ANTLR can help with this
86
Download