Abstract Syntax Trees
COMP2010: Compiler Engineering
Bernd Fischer
b.fischer@ecs.soton.ac.uk
Parse trees represent derivations.
[Figure: parse tree of (a+1)*b, with inner nodes E, EC, F, FC, T and leaves (, ID(a), +, NUM(1), ), *, ID(b), ε]
• each path corresponds to a possible call stack
• contains "punctuation" tokens: (, ), begin, ...
  ⇒ concrete syntax tree
• contains redundant non-terminal symbols
  ⇒ chain rules: E → F → T
⇒ too much detail!
E  → F EC
EC → + F EC | – F EC | ε
F  → T FC
FC → * T FC | / T FC | ε
T  → ( E ) | ID | NUM
[Figure: the tree we want for (a+1)*b: a * node with subtrees ID(a) + NUM(1) and ID(b)]
How do we get there?
Abstract syntax trees represent the essential structure of derivations.
Abstract syntax drops detail:
• punctuation tokens
• chain productions
Concrete syntax:
E  → F EC
EC → + F EC | – F EC | ε
F  → T FC
FC → * T FC | / T FC | ε
T  → ( E ) | ID | NUM
[Figure: AST for (a+1)*b: a * node with children + (over ID(a) and NUM(1)) and ID(b)]
Abstract syntax:
E → E+E | E–E | E*E | E/E | ID | NUM
Abstract syntax rules can be ambiguous:
• they only describe the structure of legal trees
• they are not meant for parsing
• they usually allow unparsing (text reconstruction)
⇒ the abstract syntax tree (AST) is a clean interface between the parser and the later compiler phases
Manually building ASTs in Java
Design principle based on abstract syntax grammar:
• One abstract class per non-terminal
• One concrete class per rule
– One field per non-terminal on rhs
public abstract class Expr {}

public class Num extends Expr {
    public int val;
    public Num(int v) { val = v; }
}

public class Sum extends Expr {
    public Expr left, right;
    public Sum(Expr l, Expr r) { left = l; right = r; }
}

public class Diff extends Expr {
…

Alternatively, use a single class for all binary operators:

public class Binop extends Expr {
    public Expr left, right;
    public int op;   // operator code, e.g. PLUS or MINUS
    public Binop(Expr l, Expr r, int o) { left = l; right = r; op = o; }
}
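The class design above, together with the earlier remark that abstract syntax usually allows unparsing, can be sketched as follows. This is only an illustration: the `Id` class, the `TIMES` and `DIV` codes, the `unparse()` method and the `UnparseDemo` class are assumptions added for this example and do not appear in the slides.

```java
// Sketch only: node classes follow the slides; Id, unparse() and the
// operator codes are hypothetical additions for illustration.
abstract class Expr {
    abstract String unparse();
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    String unparse() { return name; }
}

class Num extends Expr {
    final int val;
    Num(int val) { this.val = val; }
    String unparse() { return Integer.toString(val); }
}

class Binop extends Expr {
    static final int PLUS = 0, MINUS = 1, TIMES = 2, DIV = 3;
    private static final String[] SYMBOL = {"+", "-", "*", "/"};
    final Expr left, right;
    final int op;
    Binop(Expr left, Expr right, int op) {
        this.left = left; this.right = right; this.op = op;
    }
    // The AST holds no brackets, so the unparser re-inserts them
    // around every binary node to keep the structure unambiguous.
    String unparse() {
        return "(" + left.unparse() + SYMBOL[op] + right.unparse() + ")";
    }
}

class UnparseDemo {
    // AST for (a+1)*b, built by hand.
    static Expr example() {
        Expr sum = new Binop(new Id("a"), new Num(1), Binop.PLUS);
        return new Binop(sum, new Id("b"), Binop.TIMES);
    }

    public static void main(String[] args) {
        System.out.println(example().unparse()); // prints ((a+1)*b)
    }
}
```

The reconstructed text differs from the input only in redundant brackets, which is the sense in which the AST keeps the essential structure and drops the punctuation.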
For error reporting, store source positions in the base class:

public abstract class Expr {
    public FilePos start, end;
}

Each constructor then derives its positions from its children:

public class Sum extends Expr {
    public Expr left, right;
    public Sum(Expr l, Expr r) {
        left = l; right = r;
        start = l.start; end = r.end;
    }
}
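A minimal sketch of how such positions might be used in an error message. The slides use `FilePos` without defining it, so the line/column representation below is an assumption, as are the `Num` constructor that takes positions, the `describe` helper and the `PosDemo` class.

```java
// Hypothetical position type: the slides do not define FilePos.
class FilePos {
    final int line, col;
    FilePos(int line, int col) { this.line = line; this.col = col; }
    @Override public String toString() { return line + ":" + col; }
}

abstract class Expr {
    FilePos start, end;
}

class Num extends Expr {
    final int val;
    // Leaves take their positions from the scanner (assumed here).
    Num(int val, FilePos start, FilePos end) {
        this.val = val; this.start = start; this.end = end;
    }
}

class Sum extends Expr {
    final Expr left, right;
    // Inner nodes span from the start of the left child
    // to the end of the right child, as on the slide.
    Sum(Expr l, Expr r) {
        left = l; right = r;
        start = l.start; end = r.end;
    }
}

class PosDemo {
    // Hypothetical helper: format a message with the node's source range.
    static String describe(Expr e, String msg) {
        return msg + " at " + e.start + "-" + e.end;
    }

    // "12+345" on line 1: 12 occupies columns 1-2, 345 columns 4-6.
    static String example() {
        Expr sum = new Sum(
            new Num(12, new FilePos(1, 1), new FilePos(1, 2)),
            new Num(345, new FilePos(1, 4), new FilePos(1, 6)));
        return describe(sum, "overflow");
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints overflow at 1:1-1:6
    }
}
```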
Manually building ASTs in Java (II)
/* T -> ( E ) | Num */
public static Expr T() throws ParseException {
    Expr r;
    switch (token) {
        case '(':
            advance(); r = E(); eat(')'); return r;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            return Num();   // no break: a statement after return is unreachable
        default:
            throw new ParseException("in T");
    }
}
Changes compared with a parser that only recognizes the input:
• change the return type from void to Expr
• add explicit returns
• add auxiliary variables for the results of recursive calls
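Problem: left-factorization moves left arguments upwards.
E  → T EC
EC → + T EC | – T EC | ε
T  → ( E ) | ID | NUM
[Figure: parse tree for 3+2-1: E has children T (NUM(3)) and EC; that EC has children +, T (NUM(2)) and EC; the inner EC has children –, T (NUM(1)) and EC (ε). Next to it, the wanted AST (a - node above a + node, with leaves 3, 2 and 1) is marked "???".]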
Manually building ASTs in Java (III)
/* E -> F EC */
public static Expr E() throws ParseException {
    Expr left = F();
    return EC(left);
}

/* EC -> + F EC | - F EC | epsilon */
public static Expr EC(Expr left) throws ParseException {
    Expr right;
    switch (token) {
        case ')': case '\n':
            return left;
        case '+':
            advance(); right = F();
            return EC(new Binop(left, right, PLUS));
        case '-':
            advance(); right = F();
            return EC(new Binop(left, right, MINUS));
        default: ...
    }
}
• add the semantic value as an argument to the functions for left-factorized symbols
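The technique of passing the left operand down as an argument can be tried out with a small self-contained parser. This is a sketch under several assumptions that are not in the slides: it uses the grammar E → T EC, EC → + T EC | - T EC | ε, T → ( E ) | NUM, reads characters directly from a string instead of a scanner, stores the operator as a `char`, throws `IllegalStateException` instead of the slides' `ParseException`, and adds a `show()` method for printing.

```java
abstract class Expr {
    abstract String show();
}

class Num extends Expr {
    final int val;
    Num(int val) { this.val = val; }
    String show() { return Integer.toString(val); }
}

class Binop extends Expr {
    final Expr left, right;
    final char op;
    Binop(Expr left, Expr right, char op) {
        this.left = left; this.right = right; this.op = op;
    }
    String show() { return "(" + left.show() + " " + op + " " + right.show() + ")"; }
}

class Parser {
    private final String input;
    private int pos = 0;

    Parser(String input) { this.input = input; }

    // Current character; '\n' marks the end of the input.
    private char token() { return pos < input.length() ? input.charAt(pos) : '\n'; }
    private void advance() { pos++; }
    private void eat(char expected) {
        if (token() != expected)
            throw new IllegalStateException("expected " + expected + " at " + pos);
        advance();
    }

    // E -> T EC
    Expr E() { return EC(T()); }

    // EC -> + T EC | - T EC | epsilon
    // The tree built so far is passed in as the left operand.
    Expr EC(Expr left) {
        char t = token();
        if (t == '+' || t == '-') {
            advance();
            Expr right = T();
            return EC(new Binop(left, right, t));
        }
        return left; // epsilon: nothing more to attach
    }

    // T -> ( E ) | NUM
    Expr T() {
        if (token() == '(') {
            advance(); Expr r = E(); eat(')'); return r;
        }
        if (Character.isDigit(token())) {
            int v = 0;
            while (Character.isDigit(token())) { v = 10 * v + (token() - '0'); advance(); }
            return new Num(v);
        }
        throw new IllegalStateException("in T at " + pos);
    }

    static Expr parse(String s) {
        Parser p = new Parser(s);
        Expr e = p.E();
        if (p.pos != s.length()) throw new IllegalStateException("trailing input at " + p.pos);
        return e;
    }

    public static void main(String[] args) {
        System.out.println(parse("3+2-1").show()); // prints ((3 + 2) - 1)
    }
}
```

Because each call to `EC` wraps the tree it receives before recursing, the result is left-associative: 3+2-1 becomes (3 + 2) - 1 rather than 3 + (2 - 1).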
ANTLR automates building ASTs.
Design principle: add tree building instructions to rules
rule: rule-elems1 -> build-instr1
| rule-elems2 -> build-instr2
...
| rule-elemsn -> build-instrn
;
• build instructions are automatically executed when the rule is applied
• build instructions return an AST node or a list of AST nodes
• use with options{output=AST;ASTLabelType=CommonTree;}
Basic AST building instructions
• reference: use AST node from parse element
    trm: '(' exp ')' -> exp;
  returns the exp AST and ignores the brackets
• named reference: resolve ambiguities
    add: l=exp '+' r=exp -> $l $r;
  returns a list with both exp ASTs
• node construction: build tagged node
    ext: 'exit' exp -> ^('exit' exp);
  'exit' is the tag token, exp is the child
    dcl: type ID -> ^(VARDCL ID type);
  VARDCL is a virtual tag token (must be defined in tokens), ID and type are the children
Collecting and duplicating elements
• list elements can be collected into a single list:
    args: arg (',' arg)* -> arg+;
• individual elements can be copied into lists:
    dcl: type ID (',' ID)*
         -> ^(VARDCL type ID+);
  builds one VARDCL node with children type and [ID, ID, ID, ...]
  vs.
    dcl: type ID (',' ID)*
         -> ^(VARDCL type ID)+;
  builds a list of VARDCL nodes, each with its own copy of type and one ID
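The difference between the two rewrite forms can be mimicked by hand in Java. The sketch below is not ANTLR's generated code or its tree API: `Node`, `DeclDemo`, `collected` and `duplicated` are hypothetical names used only to show the two resulting shapes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical generic tree node, used only for illustration.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    // Lisp-style rendering: a leaf is its label, an inner node is (label child ...).
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class DeclDemo {
    // Analogue of  -> ^(VARDCL type ID+) : one node, all IDs collected as children.
    static Node collected(String type, List<String> ids) {
        Node dcl = new Node("VARDCL", new Node(type));
        for (String id : ids) dcl.children.add(new Node(id));
        return dcl;
    }

    // Analogue of  -> ^(VARDCL type ID)+ : one node per ID, type copied into each.
    static List<Node> duplicated(String type, List<String> ids) {
        List<Node> dcls = new ArrayList<>();
        for (String id : ids) dcls.add(new Node("VARDCL", new Node(type), new Node(id)));
        return dcls;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("x", "y", "z");
        System.out.println(collected("int", ids).toStringTree()); // prints (VARDCL int x y z)
        for (Node n : duplicated("int", ids))
            System.out.println(n.toStringTree());                 // prints (VARDCL int x), then y, then z
    }
}
```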
Building alternative trees
• nodes can be null:
init: exp? -> ^(INIT exp)?;
• nodes can be built for empty input:
skip: -> ^(SKIP);
• sub-trees can be added:
    for: 'for' '(' dcl? ';' c=exp? ';' i=exp? ')' stmts
         -> ^('for' dcl? ^(COND $c)? ^(ITER $i)? stmts);
  [Figure: 'for' node with children dcl, COND (child c), ITER (child i) and stmts]
• nodes can be built in rule alternatives:
    if: 'if' '(' expr ')' s1=stmt
        ( 'else' s2=stmt -> ^(IFELSE expr $s1 $s2)
        |                -> ^('if' expr $s1)
        );
Updating trees
• nodes can be initialized in rule parts and updated:
    exp: (INT -> INT)
         ('+' i=INT -> ^('+' $exp $i))*;
  Resulting trees ($exp refers to the tree built so far):
    1:      INT(1)
    1+2:    ^('+' INT(1) INT(2))
    1+2+3:  ^('+' ^('+' INT(1) INT(2)) INT(3))
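The initialize-then-update pattern of this rule corresponds to a simple loop in a hand-written parser. The sketch below is an analogue written for illustration, not the code ANTLR generates: `Node` and `UpdateDemo` are hypothetical, and the input is assumed to be a well-formed token array such as {"1", "+", "2"}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical generic tree node, used only for illustration.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    // Lisp-style rendering: a leaf is its label, an inner node is (label child ...).
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class UpdateDemo {
    // tokens alternates INT and "+", e.g. {"1", "+", "2", "+", "3"} (assumed well-formed).
    static Node exp(String[] tokens) {
        Node tree = new Node(tokens[0]);                           // (INT -> INT)
        for (int i = 1; i + 1 < tokens.length; i += 2) {           // ('+' i=INT ...)*
            tree = new Node("+", tree, new Node(tokens[i + 1]));   // -> ^('+' $exp $i)
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(exp(new String[] {"1"}).toStringTree());                     // prints 1
        System.out.println(exp(new String[] {"1", "+", "2"}).toStringTree());           // prints (+ 1 2)
        System.out.println(exp(new String[] {"1", "+", "2", "+", "3"}).toStringTree()); // prints (+ (+ 1 2) 3)
    }
}
```

The variable `tree` plays the role of `$exp`: each iteration makes the tree built so far the left child of a new '+' node, which yields a left-associative result.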