Code Generation
CPSC 388
Ellen Walker
Hiram College

Intermediate Representations
• Source code
  – Parse tree (or abstract syntax tree)
  – Symbol table
  – Intermediate code
• Target code

Why Intermediate Code?
• Easier analysis for optimization
• Multiple target machines
• Direct interpretation (e.g. Java bytecode, Pascal P-code)

3-Address Code
• Statements like x = y op z
• Generous use of temp. variables
  – One for each internal node of the (abstract) parse tree
• Closely related to arithmetic expressions
  – Example: a = b*(c+d) becomes:
      tmp1 = c+d
      a = b*tmp1

Beyond Math Operations
• No standardized 3-address code
• Other operators in textbook
  – Comparison operators (e.g. x = y == z)
  – I/O (read x and write x)
  – Conditional & unconditional branch operators (if_true x goto L1, goto L2)
  – Label instructions (label L1)
  – Halt instruction (halt)

Representing 3-address code
• Quadruple implementation
  – 4 fields: (op,y,z,x) for x = y op z
  – Fields are null if not needed, e.g. (rd,x,_,_)
  – Instead of names, put pointers into the symbol table
• Triple implementation
  – The 4th element of an operator quadruple is always a temp, so drop it
  – Don't name the temp; use the triple's index instead

Example: a = b+(c*d)
[quadruple]        [triple]
(rd,c,_,_)         1: (rd,c,_)
(rd,d,_,_)         2: (rd,d,_)
(mul,c,d,t1)       3: (mul,c,d)
(rd,b,_,_)         4: (rd,b,_)
(add,b,t1,t2)      5: (add,b,3)
(asn,a,t2,_)       6: (asn,a,5)

P-Code
• Developed for Pascal compilers
• Code for a hypothetical P-machine
• P-machine is a stack (0-address) machine [Load inst. takes 1 address]
  – Load = push, Store = pop
  – Operators act on top element(s) of stack
  – No temp. variable names needed

P-Code operators
LDC x - load const. x
LDA x - load addr. x
LOD x - load var. x
STO   - store val in addr
STN   - store & push
MPI   - multiply integers
SBI   - subtract integers
ADI   - add integers
RDI   - read int
WRI   - write int
LAB   - label
FJP   - jump on false
GRT   - >
EQU   - =
STP   - stop

Example: a = b+(c*d)
LDA a
LOD d
LOD c
MPI
LOD b
ADI
STO

P-Code as attribute
• Include code (so far) as an attribute in an attribute grammar
  – exp -> id = exp
    • $$.code = LDA $1.name; $3.code; STN
  – aexp -> aexp + factor
    • $$.code = $1.code; $3.code; ADI
  – factor -> id
    • $$.code = LOD $1.name

Generating 3 address code
• Need a meta-function to generate temp names (newtemp())
  – exp -> id = exp
    • $$.code = $3.code; "$1.name = $3.name"
  – aexp -> aexp + factor
    • $$.name = newtemp()
    • $$.code = "$1.code; $3.code; $$.name = $1.name + $3.name"

Why real compilers don't do this
• Generating strings is inefficient
  – Lots of copying
  – Code, when generated, isn't saved; just copied around until done
  – Code generation depends on inherited (not just synthesized) attributes
    • E.g. object type for assignment
    • This complicates grammars!

Practical code generation
• Modified postorder traversal of syntax tree
• Remember postorder:
  – Act on the children recursively
  – Act on the parent directly
• In this case, the action is "generate code"

Code Generation
    void gen_code(node *n) {
      switch (n->op) {
        case '+':
          gen_code(n->first);         // left operand
          gen_code(n->first->next);   // right operand (sibling of first)
          cout << "ADI" << endl;
          break;

More Code Generation
        case '=':
          cout << "LDA " << n->name << endl;   // address of the target
          gen_code(n->first);                  // right-hand side
          cout << "STN" << endl;
          break;
        …
      }
    }

Nothing new!
• Postorder traversal executes in the same order as LALR parsing!
• Code for code generation looks almost like the attribute grammar
  – $n.code --> gen_code(child N);
  – $$.attr --> n->attr; (where n is the parameter)

Code Gen in YACC
• Looks like attribute grammar, almost
• Use an action embedded inside the rule for assignment
    exp : id { /* generate LDA code */ } '=' exp { /* generate rest */ }
• Can we combine code generation with other attribute computation?

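The postorder traversal from the "Practical code generation" and "Code Generation" slides can be sketched as a complete program. This is a minimal sketch under stated assumptions, not the course's implementation: the Node layout (left/right children instead of first/next), the helper names leaf, bin, assign, genCode and exampleCode, and the choice to collect instructions in a vector instead of printing them are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical syntax-tree node. op is '+', '*', '=' or 'i' (identifier leaf).
struct Node {
    char op;
    std::string name;                    // identifier name, or target of '='
    std::unique_ptr<Node> left, right;   // '=' uses only left (the right-hand side)
    Node(char o, std::string n, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : op(o), name(std::move(n)), left(std::move(l)), right(std::move(r)) {}
};
using Tree = std::unique_ptr<Node>;

// Small constructors so the example tree reads like the expression.
Tree leaf(const std::string& name) { return Tree(new Node('i', name, nullptr, nullptr)); }
Tree bin(char op, Tree l, Tree r) { return Tree(new Node(op, "", std::move(l), std::move(r))); }
Tree assign(const std::string& target, Tree rhs) {
    return Tree(new Node('=', target, std::move(rhs), nullptr));
}

// Postorder traversal: generate code for the children, then for the parent.
// For '=', the LDA is emitted before the child, as on the slide.
void genCode(const Node* n, std::vector<std::string>& out) {
    assert(n != nullptr);
    switch (n->op) {
    case 'i':
        out.push_back("LOD " + n->name);
        break;
    case '+':
    case '*':
        genCode(n->left.get(), out);
        genCode(n->right.get(), out);
        out.push_back(n->op == '+' ? "ADI" : "MPI");
        break;
    case '=':
        out.push_back("LDA " + n->name);
        genCode(n->left.get(), out);
        out.push_back("STN");
        break;
    default:
        assert(false && "operator not handled in this sketch");
    }
}

// P-code for a = b+(c*d).
std::vector<std::string> exampleCode() {
    Tree tree = assign("a", bin('+', leaf("b"), bin('*', leaf("c"), leaf("d"))));
    std::vector<std::string> out;
    genCode(tree.get(), out);
    return out;
}
```

Calling exampleCode() yields LDA a, LOD b, LOD c, LOD d, MPI, ADI, STN. The hand-written P-code example earlier loads the operands in a different order and ends with STO; the operand order is equivalent here because + and * commute, and STN differs from STO only in leaving the stored value on the stack.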
Intermediate -> Target Code
• Macro expansion
  – Direct replacement of intermediate statement with target statement(s)
  – Prepend a definition file to the code, then assemble
  – But it's not as easy as it seems
    • Different data types require different code
    • Compiler tracks locations, etc. separately

Intermediate -> Target Code (cont)
• Static simulation
  – Simulate results of intermediate code (i.e. interpret it)
  – Then generate equivalent assembly code to get results
  – Might include abstract interpretation (e.g. symbolic algebra)

P-code -> 3 address code
• We must "run" the p-code to see what is on the stack for the 3 address code
• Use a stack data structure during translation
  – "new top" = "old top" + "old second"
  – New temp. for "new top"
  – Temp or variable names stored in stack elements
• Code is generated when stack is popped (only)

3 address code -> pcode
• Each instruction a = b op c translates to:
  – LDA a
  – LOD b
  – LOD c
  – ADI (or other operator, based on "op")
  – STO

Too much Pcode!
• 3 address code has many temps
• Temps are simply loaded & stored without changing!
• Sequence "lda x, lod x, sto" is useless!
• Similarly, "lda x, lda t1, … sto, sto" doesn't really need t1

Cleaning it up
• Instead, use a tree form
  – Parent is op, has label of variable name
  – Children are id, num, or another op
• Assignment statements generate no code, only an alternative label
• Pcode generated from the eventual tree (which is essentially an expression tree)
  – Extra tmp names are ignored (p. 416)
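
The stack-based translation described under "P-code -> 3 address code" can be sketched as follows. This is an illustrative sketch only: the function name pcodeToThreeAddress, the t1, t2, ... temp naming, and the restriction to straight-line code with LOD, LDC, LDA, ADI, SBI, MPI, STO and STN are assumptions made for the example. Stack elements hold names (variables, constants, temps), not values, and a 3-address statement is emitted only when operands are popped.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Translate straight-line P-code to 3-address code by statically
// simulating the P-machine stack with names instead of values.
std::vector<std::string> pcodeToThreeAddress(const std::vector<std::string>& pcode) {
    std::vector<std::string> stack;   // names currently "on the stack"
    std::vector<std::string> out;     // generated 3-address statements
    int tempCount = 0;

    for (const std::string& line : pcode) {
        std::istringstream in(line);
        std::string op, arg;
        in >> op >> arg;              // arg stays empty for 0-address instructions

        if (op == "LOD" || op == "LDC" || op == "LDA") {
            stack.push_back(arg);     // a load just pushes a name; no code yet
        } else if (op == "ADI" || op == "SBI" || op == "MPI") {
            assert(stack.size() >= 2);
            std::string top = stack.back();    stack.pop_back();
            std::string second = stack.back(); stack.pop_back();
            std::string temp = "t" + std::to_string(++tempCount);
            const char* sym = op == "ADI" ? "+" : op == "SBI" ? "-" : "*";
            // The operand pushed first is the left operand (matters for SBI).
            out.push_back(temp + " = " + second + " " + sym + " " + top);
            stack.push_back(temp);    // the new temp becomes the new top
        } else if (op == "STO" || op == "STN") {
            assert(stack.size() >= 2);
            std::string value = stack.back();  stack.pop_back();
            std::string target = stack.back(); stack.pop_back();
            out.push_back(target + " = " + value);
            if (op == "STN") stack.push_back(value);   // store & push
        } else {
            assert(false && "instruction not handled in this sketch");
        }
    }
    return out;
}
```

Fed the P-code example for a = b+(c*d) (LDA a, LOD d, LOD c, MPI, LOD b, ADI, STO), this produces t1 = d * c, then t2 = t1 + b, then a = t2. The final copy through t2 is the kind of extra temp that the tree form under "Cleaning it up" is meant to avoid.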