Code Generation

CPSC 388
Ellen Walker
Hiram College
Intermediate Representations
• Source code
– Parse tree (or abstract syntax tree)
– Symbol table
– Intermediate code
• Target code
Why Intermediate Code?
• Easier analysis for optimization
• Multiple target machines
• Direct interpretation (e.g. Java bytecode, Pascal P-code)
3-Address Code
• Statements like x = y op z
• Generous use of temp. variables
– One for each internal node of the parse tree (or abstract syntax tree)
• Closely related to arithmetic expression
– Example: a = b*(c+d) becomes:
tmp1 = c+d
a = b*tmp1
Beyond Math Operations
• No standardized 3 address code
• Other operators in textbook
– Comparison operators (e.g. x = y == z)
– I/O (read x and write x)
– Conditional & unconditional branch
operators (if_true x goto L1, goto L2)
– Label instructions (label L1)
– Halt instruction (halt)
Representing 3-address code
• Quadruple implementation
– 4 fields: (op,y,z,x) for x=y op z
– Fields are null if not needed, e.g. (rd,x,_,_)
– Instead of names, put pointers into symbol
table
• Triple implementation
– 3 fields: (op,y,z); the 4th (result) field is dropped because it is always a temp
– Don’t name the temp; refer to it by the index of its triple instead
Example: a = b+(c*d)
[quadruple]
(rd,c,_,_)
(rd,d,_,_)
(mul,c,d,t1)
(rd,b,_,_)
(add,b,t1,t2)
(asn,a,t2,_)
[triple]
1: (rd,c,_)
2: (rd,d,_)
3: (mul,c,d)
4: (rd,b,_)
5: (add,b,3)
6: (asn,a,5)
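The two layouts above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the `Quad` and `Triple` structs, the `toTriples` helper, and the choice to write a reference to triple 3 as "(3)" are hypothetical names and conventions introduced here, not part of the slides. The helper rewrites a list of quadruples as triples by replacing each temp name with the index of the triple that computes it.

```cpp
#include <map>
#include <string>
#include <vector>

// Quadruple: four fields; unused fields are left empty (written _ above).
struct Quad {
    std::string op, arg1, arg2, arg3;
};

// Triple: no result field; the result is named by the triple's own index.
struct Triple {
    std::string op, arg1, arg2;
};

// Hypothetical helper: rewrite quadruples as triples (indices are 1-based).
// Assumes a name appears in the last field only when that quad defines it.
std::vector<Triple> toTriples(const std::vector<Quad>& quads) {
    std::map<std::string, int> definedAt;  // temp name -> index of defining triple
    std::vector<Triple> triples;

    // Replace a temp name by a reference to the triple that computes it.
    auto resolve = [&](const std::string& operand) -> std::string {
        auto it = definedAt.find(operand);
        if (it == definedAt.end()) return operand;  // ordinary variable name
        return "(" + std::to_string(it->second) + ")";
    };

    for (const Quad& q : quads) {
        triples.push_back({q.op, resolve(q.arg1), resolve(q.arg2)});
        if (!q.arg3.empty()) {
            definedAt[q.arg3] = static_cast<int>(triples.size());
        }
    }
    return triples;
}
```

Fed the six quadruples listed above, this sketch yields the six triples listed above, with the references rendered as (3) and (5) rather than bare 3 and 5 so they cannot be confused with integer constants.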
P-Code
• Developed for Pascal compilers
• Code for hypothetical P-machine
• P-machine is a stack (0-address) machine: most instructions take no
operands, but load instructions take one address
– Load = push, Store = pop
– Operators act on top element(s) of stack
– No temp. variable names needed
P-Code operators
LDC x - load constant x
LDA x - load address of x
LOD x - load value of variable x
STO - store value at address (pops both)
STN - store, but leave the value on the stack
MPI - multiply integers
SBI - subtract integers
ADI - add integers
RDI - read integer
WRI - write integer
LAB L - define label L
FJP L - jump to L if top of stack is false
GRT - greater than (>)
EQU - equal (=)
STP - stop
Example: a = b+(c*d)
LDA a
LOD d
LOD c
MPI
LOD b
ADI
STO
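To see why this sequence computes the assignment, it can help to run it on a toy P-machine. The sketch below is an assumption-laden illustration, not the real P-machine: `Cell` and `runPcode` are hypothetical names, memory is modeled as a map from variable names to integers, an "address" is just a variable name, and only the load, store, and arithmetic operators are handled (anything else is silently skipped).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One stack cell: either an integer value or the address (name) of a variable.
struct Cell {
    bool isAddr;
    std::string addr;
    int val;
};

// Hypothetical interpreter for a small subset of P-code.
void runPcode(const std::vector<std::string>& program,
              std::map<std::string, int>& mem) {
    std::vector<Cell> stack;

    // Pop the top cell, which must hold a value rather than an address.
    auto popVal = [&]() -> int {
        assert(!stack.empty() && !stack.back().isAddr);
        int v = stack.back().val;
        stack.pop_back();
        return v;
    };

    for (const std::string& line : program) {
        std::istringstream in(line);
        std::string op, arg;
        in >> op >> arg;  // arg stays empty for 0-address instructions

        if (op == "LDC") {
            stack.push_back({false, "", std::stoi(arg)});
        } else if (op == "LDA") {
            stack.push_back({true, arg, 0});
        } else if (op == "LOD") {
            stack.push_back({false, "", mem[arg]});
        } else if (op == "ADI" || op == "SBI" || op == "MPI") {
            int top = popVal();
            int second = popVal();
            int result = op == "ADI" ? second + top
                       : op == "SBI" ? second - top
                                     : second * top;
            stack.push_back({false, "", result});
        } else if (op == "STO" || op == "STN") {
            int value = popVal();
            assert(!stack.empty() && stack.back().isAddr);
            std::string target = stack.back().addr;
            stack.pop_back();
            mem[target] = value;
            if (op == "STN") stack.push_back({false, "", value});
        }
    }
}
```

With b = 2, c = 3, d = 4 in memory, running the seven instructions above leaves 14 in a: the address of a sits at the bottom of the stack until the final STO pops it together with the computed value.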
P-Code as attribute
• Include code (so far) as attribute in
attribute grammar
– exp -> id = exp
• $$.code = LDA $1.name; $3.code; STN
– aexp -> aexp+factor
• $$.code = $1.code;$3.code;ADI
– factor -> id
• $$.code = LOD $1.name
Generating 3 address code
• Need a meta-function to generate temp
names (newtemp())
– exp -> id = exp
• $$.code = $3.code; “$1.name = $3.name”
– aexp -> aexp+factor
• $$.name = newtemp()
• $$.code =
“$1.code;$3.code;$$.name=$1.name+$3.name”
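The rules above can be sketched as a recursive function over an expression tree. Everything named here is hypothetical and chosen for the example: `Node`, the builders `id`, `bin`, and `assign`, and the `TacGen` class. The return value of `gen` plays the role of the `name` attribute, and statements are appended to a single list rather than concatenated as strings.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Node {
    char op;           // '+', '*' for operators, '=' for assignment, 'i' for an id
    std::string name;  // identifier (leaf) or assignment target ('=')
    std::shared_ptr<Node> left, right;
};
using NodePtr = std::shared_ptr<Node>;

// Small builders so that trees can be written inline.
NodePtr id(const std::string& n) {
    return std::make_shared<Node>(Node{'i', n, nullptr, nullptr});
}
NodePtr bin(char op, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{op, "", l, r});
}
NodePtr assign(const std::string& target, NodePtr rhs) {
    return std::make_shared<Node>(Node{'=', target, rhs, nullptr});
}

struct TacGen {
    int counter = 0;
    std::vector<std::string> code;  // generated 3-address statements, in order

    // The meta-function from the slide: returns a fresh temp name.
    std::string newtemp() { return "tmp" + std::to_string(++counter); }

    // Emits code for n and returns the name that holds its value.
    std::string gen(const NodePtr& n) {
        if (n->op == 'i') return n->name;        // factor -> id: no code
        if (n->op == '=') {                       // exp -> id = exp
            std::string rhs = gen(n->left);
            code.push_back(n->name + " = " + rhs);
            return n->name;
        }
        std::string l = gen(n->left);             // aexp -> aexp op factor
        std::string r = gen(n->right);
        std::string t = newtemp();
        code.push_back(t + " = " + l + " " + n->op + " " + r);
        return t;
    }
};
```

For a = b*(c+d) this produces "tmp1 = c + d", "tmp2 = b * tmp1", "a = tmp2". That is one temp more than the hand-written version earlier in the slides, because every operator node gets its own temp and the assignment is a separate copy.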
Why real compilers don’t do this
• Generating strings is inefficient
– Lots of copying
– Code, when generated, isn’t saved; just
copied around until done
– Code generation depends on inherited (not
just synthesized) attributes
• E.g. object type for assignment
• This complicates grammars!
Practical code generation
• Modified postorder traversal of syntax
tree
• Remember postorder:
– Act on the children recursively
– Act on the parent directly
• In this case, the action is “generate
code”
Code Generation
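void gen_code(node *n) {
  switch (n->op) {
  case '+':
    gen_code(n->first);        // code for the left operand
    gen_code(n->first->next);  // code for the right operand (next sibling)
    cout << "ADI" << endl;     // add the two values on top of the stack
    break;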
More Code Generation
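  case '=':
    cout << "LDA " << n->name << endl;  // push the address of the target
    gen_code(n->first);                 // code for the right-hand side
    cout << "STN" << endl;              // store, leaving the value on the stack
    break;
  …
  }
}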
Nothing new!
• Postorder traversal executes in the
same order as LALR parsing!
• Code for code generation looks almost
like the attribute grammar
– $i.code --> gen_code(i-th child);
– $$.attr --> n->attr; (where n is the node parameter)
Code Gen in YACC
• Looks like attribute grammar, almost
• Use an action embedded inside the rule for
assignment
exp : id { /* generate LDA code */ } '=' exp
{ /* generate rest */ }
• Can we combine code generation with
other attribute computation?
Intermediate -> Target Code
• Macro expansion
– Direct replacement of intermediate
statement with target statement(s)
– Prepend a definition file to the code, then
assemble
– But it’s not as easy as it seems
• Different data types require different code
• Compiler tracks locations, etc. separately
Intermediate -> Target Code (cont)
• Static simulation
– Simulate results of intermediate code (i.e.
interpret it)
– Then generate equivalent assembly code
to get results
– Might include abstract interpretation (e.g.
symbolic algebra)
P-code -> 3 address code
• We must “run” the p-code to see what is on
the stack for the 3 address code
• Use a stack data structure during translation
– “new top” = “old top” + “old second”
– New temp. for “new top”
– Temp or variable names stored in stack elements
• Code is generated when stack is popped
(only)
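A minimal sketch of this translation, under some assumptions: `pcodeToTac` is a hypothetical name, only the load, store, and arithmetic operators are handled, and temps are named t1, t2, and so on. The stack holds names rather than values, pushes emit nothing, and a 3-address statement is emitted each time operands are popped.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical translator: statically simulate the P-machine stack, keeping
// names (not values) in the stack cells.
std::vector<std::string> pcodeToTac(const std::vector<std::string>& pcode) {
    std::vector<std::string> stack;  // variable, constant, or temp names
    std::vector<std::string> tac;    // generated 3-address statements
    int temps = 0;

    auto pop = [&]() -> std::string {
        assert(!stack.empty());
        std::string top = stack.back();
        stack.pop_back();
        return top;
    };

    for (const std::string& line : pcode) {
        std::istringstream in(line);
        std::string op, arg;
        in >> op >> arg;

        if (op == "LDA" || op == "LOD" || op == "LDC") {
            stack.push_back(arg);  // pushes generate no code
        } else if (op == "ADI" || op == "SBI" || op == "MPI") {
            std::string top = pop();
            std::string second = pop();
            std::string sym = op == "ADI" ? "+" : op == "SBI" ? "-" : "*";
            std::string t = "t" + std::to_string(++temps);  // new temp for "new top"
            tac.push_back(t + " = " + second + " " + sym + " " + top);
            stack.push_back(t);
        } else if (op == "STO" || op == "STN") {
            std::string value = pop();
            std::string target = pop();
            tac.push_back(target + " = " + value);
            if (op == "STN") stack.push_back(value);  // value stays on the stack
        }
    }
    return tac;
}
```

Applied to the P-code for a = b+(c*d) (LDA a, LOD d, LOD c, MPI, LOD b, ADI, STO), the sketch emits "t1 = d * c", "t2 = t1 + b", "a = t2". The operand that was pushed first is written on the left, which matters for subtraction.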
3 address code -> P-code
• Each instruction a = b op c translates to:
– LDA a
– LOD b
– LOD c
– ADI -- or other operator based on “op”
– STO
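The five-instruction pattern above amounts to a macro expansion, which could be sketched as below. The `Tac` struct, the `load` helper, and `tacToPcode` are hypothetical names for this example; the sketch also assumes that operands starting with a digit are constants (loaded with LDC) and that an empty `op` means a plain copy.

```cpp
#include <cctype>
#include <string>
#include <vector>

// One 3-address statement: target = left op right.
// For a plain copy (target = left), op and right are empty.
struct Tac {
    std::string target, left, op, right;
};

// Pick LDC for constants and LOD for variables (assumed naming convention).
static std::string load(const std::string& operand) {
    bool isConst = !operand.empty() &&
                   std::isdigit(static_cast<unsigned char>(operand[0]));
    return (isConst ? "LDC " : "LOD ") + operand;
}

// Hypothetical macro expander: each statement is translated in isolation.
std::vector<std::string> tacToPcode(const std::vector<Tac>& code) {
    std::vector<std::string> out;
    for (const Tac& s : code) {
        out.push_back("LDA " + s.target);
        out.push_back(load(s.left));
        if (!s.op.empty()) {
            out.push_back(load(s.right));
            out.push_back(s.op == "+" ? "ADI" : s.op == "-" ? "SBI" : "MPI");
        }
        out.push_back("STO");
    }
    return out;
}
```

For the two statements tmp1 = c+d and a = b*tmp1 this yields ten P-code instructions, including a store to tmp1 followed shortly by a load of tmp1.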
Too much P-code!
• 3 address code has many temps
• Temps are simply loaded & stored
without changing!
• Sequence “lda x, lod x, sto” is useless!
• Similarly, “lda x, lda t1, … sto, sto”
doesn’t really need t1
Cleaning it up
• Instead, use a tree form
– Parent is op, has label of variable name
– Children are id, num, or another op
• Assignment statements generate no
code, only an alternative label
• P-code is generated from the eventual tree
(which is essentially an expression tree)
– Extra tmp names are ignored (p. 416)
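One way to picture this clean-up is the sketch below. It rests on assumptions that the slides do not state: the statements form a single straight-line block, no operand is reassigned after it has been used, and only one final variable needs to be stored. `Tac`, `Tree`, `emit`, and `tacToPcodeViaTree` are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// target = left op right; op and right are empty for a plain copy.
struct Tac {
    std::string target, left, op, right;
};

struct Tree {
    std::string op;    // "+", "-", "*" for interior nodes; empty for a leaf
    std::string name;  // variable name for a leaf
    std::shared_ptr<Tree> left, right;
};
using TreePtr = std::shared_ptr<Tree>;

// Postorder walk: operands first, then the operator.
static void emit(const TreePtr& t, std::vector<std::string>& out) {
    if (t->op.empty()) {
        out.push_back("LOD " + t->name);
        return;
    }
    emit(t->left, out);
    emit(t->right, out);
    out.push_back(t->op == "+" ? "ADI" : t->op == "-" ? "SBI" : "MPI");
}

// Build one tree from the statements, then generate P-code for `result` only.
std::vector<std::string> tacToPcodeViaTree(const std::vector<Tac>& code,
                                           const std::string& result) {
    std::map<std::string, TreePtr> label;  // name -> tree that computes it

    auto lookup = [&](const std::string& name) -> TreePtr {
        auto it = label.find(name);
        if (it != label.end()) return it->second;
        return std::make_shared<Tree>(Tree{"", name, nullptr, nullptr});
    };

    for (const Tac& s : code) {
        TreePtr node;
        if (s.op.empty()) {
            node = lookup(s.left);  // assignment: only an alternative label
        } else {
            node = std::make_shared<Tree>(
                Tree{s.op, "", lookup(s.left), lookup(s.right)});
        }
        label[s.target] = node;
    }

    std::vector<std::string> out;
    out.push_back("LDA " + result);
    emit(lookup(result), out);
    out.push_back("STO");
    return out;
}
```

For tmp1 = c+d, tmp2 = b*tmp1, a = tmp2 with result a, the sketch emits LDA a, LOD b, LOD c, LOD d, ADI, MPI, STO. The temp names never appear in the output because they only served as labels on interior nodes of the tree.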