COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Lecture Outline [ALSU07] Chapter 6 Intermediate Code Representation – Section 6.2 [ALSU07] Chapter 7 Run-Time Environments – Sections 7.1, 7.2, 7.3 Copyright (c) 2011 Ioanna Dionysiou Chapter 6 Lecture Outline Intermediate Code Representation – 3-address code essentials – 3-address code implementation techniques Copyright (c) 2011 Ioanna Dionysiou Intermediate Code Representation Syntax tree Parsing Type-checking Intermediate Form Generation Assumption: Source code is parsed and type-checked Copyright (c) 2011 Ioanna Dionysiou Intermediate Languages So far, we have seen one kind of intermediate representation – Syntax trees (2nd week of classes) • Natural hierarchical structure of program A new kind – 3-address code • Abstract form Copyright (c) 2011 Ioanna Dionysiou 3-address code It is a sequence of statements of the general form x := y op z Arithmetic operator Logical operator Names Constants temporaries Copyright (c) 2011 Ioanna Dionysiou 3-address code /* source language expression */ /* 3-address statements */ x+y*z t1 := y * z t2 := x + t1 Where t1, t2 are compilergenerated temporary names Variable names can appear directly in 3-address statements Copyright (c) 2011 Ioanna Dionysiou 3-address code /* source language expression */ /* 3-address statements */ a := b * -c + b * -c t1 := -c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5 := + a * b * unimus c b unimus In other words, for each interior node in the syntax tree we create a new temporary variable c Copyright (c) 2011 Ioanna Dionysiou Types of 3-address statements – Assignment statement x := y op z – Assignment instruction x := op z – Copy statement x := y – Unconditional jump Procedural call p(x1,x2,…,xn) param x1 param x2 param xn call p, n Indexed assignments x := y[i] , x[i] := y goto L – Conditional jump if x relop y goto L Address and pointer assignments x := &y, x := *y, *x := y Copyright (c) 2011 Ioanna Dionysiou Implementation of 3-address 3-address statement is an abstract form of intermediate code Need a concrete form – Records with fields for operator and operands • Quadruples • Triples • Indirect triples Copyright (c) 2011 Ioanna Dionysiou Quadruples It is a record structure with 4 fields • • • • Operator Argument 1 Argument 2 Result a:= b * -c + b * -c t1 := -c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5 op arg1 unimus c * b unimus c * b t3 t4 + t2 t4 t5 := t5 Copyright (c) 2011 Ioanna Dionysiou arg2 result t1 t1 t2 t3 a Triples It is a record structure with 3 fields • Operator • Argument 1 (pointer to symbol table or triple) • Argument 2 (pointer to symbol table or triple) – Idea – avoid entering temporary names to symbol table – Refer to a temp value by the position of that statement that computes it a:= b * -c + b * -c t1 := -c t2 := b * t1 t3 := -c t4 := b * t3 t5 := t2 + t4 a := t5 op arg1 unimus c * b unimus c * b (2) + (1) (3) assign a (4) Copyright (c) 2011 Ioanna Dionysiou arg2 (0) Indirect Triples It is a record structure with 3 fields • Operator • Argument 1 (pointer to symbol table or triple) • Argument 2 (pointer to symbol table or triple) – Idea • List pointers to triples instead listing triples statement op arg1 (0) (14) unimus c (1) (15) * b (2) (16) unimus c (3) (17) * b (16) (4) (18) + (15) (17) (5) (19) assign a (18) Copyright (c) 2011 Ioanna Dionysiou arg2 (14) In-class Exercise Show the 3-address code for the following statements: a := 5 + 6 c := a + 3 Try to optimize the 3-address code generated Copyright (c) 2011 Ioanna Dionysiou Chapter 7 Lecture Outline Chapter 7 – Source language issues • Activation trees, control stacks, scope of declaration, bindings – Storage Organization • Activation records – Storage Allocation Strategies • Static allocation, stack allocation, heap allocation Copyright (c) 2011 Ioanna Dionysiou Source Language Issues Before considering code generation – Need to relate static program source to actions that must take place at runtime to implement this program • Relationship between names and data objects • Allocation and deallocation of data objects is managed by runtime support package – Routines loaded with the generated target code • Depending on the language, the runtime environment may be – Fully static, stack-based or dynamic – Type of environment determines the need to use stack, heap or both Copyright (c) 2011 Ioanna Dionysiou Program and Procedures Procedure definition procedure name associated with procedure body Procedure call PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; Formal parameters BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Actual parameters Copyright (c) 2011 Ioanna Dionysiou Program and Procedures Procedure Activation An execution of a procedure body Lifetime of Procedure Activation Sequence of steps between the first and last statements in the execution of the procedure body, including time spent executing other procedures called by p PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; Formal parameters BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Actual parameters Copyright (c) 2011 Ioanna Dionysiou Flow of control Assumptions about the flow of control among procedures during program execution – Control flows sequentially – Execution of a procedure starts at the beginning of procedure body and eventually returns control to the point immediately following the place where the procedure was called • In this case we can use trees to illustrate the flow of control in a program • goto statement (transfers control to another point) Copyright (c) 2011 Ioanna Dionysiou Procedure Activation Each time control enters a procedure q from procedure p, it eventually returns to p (if no errors occur) – Activation of procedure p – Activation of procedure q – Return to activation of procedure p Consider – lifetime(p) and lifetime(q) • Lifetime(p) and lifetime(q) are non-overlapping • Lifetime(p) and lifetime(q) are nested – See example above Copyright (c) 2011 Ioanna Dionysiou Recursive Procedures A recursive procedure is a procedure that calls itself – A new activation begins before an earlier activation of the same procedure has ended – Several activations for procedure p exist at time t A recursive procedure is a procedure p that calls another procedure q, which in turns calls procedure p – Procedure p Procedure q • Call to procedure q Copyright (c) 2011 Ioanna Dionysiou Call to procedure p Activation tree We can use a tree (activation tree) to depict the way control enters and leaves activations How to build an activation tree – Each node represents an activation of a procedure • The root represents the activation of the main program • The node for a is the parent of node b iff control flows from activation a to b • The node for a is to the left of the node for b iff the lifetime(a) occurs before lifetime(b) Copyright (c) 2011 Ioanna Dionysiou Activation Tree for a program PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Can you build the activation tree for this program? Use s - activation of main program sort r - activation of readarray p - activation of partition q(x,y) - activation of quicksort Start with s as the root of the tree r and q(1,9) are its children, etc… Copyright (c) 2011 Ioanna Dionysiou Activation Tree for a program PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. s r q(1,9) p(1,9) q(1,3) p(1,3) p(2,3) q(1,0) q(5,9) q(2,3) p(5,9) q(2,1) Copyright (c) 2011 Ioanna Dionysiou q(3,3) p(7,9) q(5,5) q(7,9) q(7,7) q(9,9) Control Stack and Activation Tree Flow of control in a program – Depth-first traversal of the activation tree – Use a stack called control stack to keep track of the live procedure activations • Push node for an activation onto the stack as the activation begins • Pop node for an activation off the stack when the activation ends • Control Stack contents – Node n is at top of the control stack – Stack contains the nodes along the path from n to the root Copyright (c) 2011 Ioanna Dionysiou Activation Tree for a program s r p(1,9) p(1,3) r, p(1,9), p(1,3) and q(1,0) have executed to completion (dashed lines) q(1,9) q(1,3) q(1,0) q(2,3) Stack contains q(2,3) q(1,3) q(1,9) s Copyright (c) 2011 Ioanna Dionysiou top Scope of declaration PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Scope of variables depends on the language scope rules (1) Local (2) Nonlocal Same name, different scope Copyright (c) 2011 Ioanna Dionysiou Bindings of names Environment – Function that maps a name to a storage location • f: name storage location State – Function that maps a storage location to a value • g: storage location value g( f(name) ) = value Copyright (c) 2011 Ioanna Dionysiou Bindings of names Consider storage location 100 is associated with variable x and it currently holds value 0 – Environment • x is bound to storage location 100 – State • Value held is 0 Consider assignment statement x:=10 – Environment • x is still bound to storage location 100 – State • Value held is 10 Copyright (c) 2011 Ioanna Dionysiou Bindings of names A binding is the dynamic counterpart of a declaration – Pascal local variable name in a procedure is bound to a different storage location in each activation of a procedure STATIC NOTION DYNAMIC COUNTERPART Definition of procedure Activations of the procedure Declaration of name Bindings of the name Scope of declaration Lifetime of binding Copyright (c) 2011 Ioanna Dionysiou Lecture Outline Source language issues – Activation trees, control stacks, scope of declaration, bindings Storage Organization – Activation records Storage Allocation Strategies – Static allocation, stack allocation, heap allocation Copyright (c) 2011 Ioanna Dionysiou Storage Organization Suppose that the compiler obtains memory from the OS so that it can execute the compiled program – Program gets loaded on a newly created process This runtime storage must hold – Generated target code – Data objects – A counterpart of the control stack to keep track of procedure activations Copyright (c) 2011 Ioanna Dionysiou Runtime Memory code for function 1 code for function 2 PASCAL and C use extensions of the control stack to manage activations of procedures ... code for function n global / static area stack free space Stack contains information about register values, value of program counter and data objects whose lifetimes are contained in that of an activation Heap holds all other information. For example, activations that cannot be represented as a tree. By convention, stack grows down and the top of the stack is drawn towards the bottom of this slide (value of top is usually kept in a register) heap Copyright (c) 2011 Ioanna Dionysiou Runtime Memory code for function 1 code for function 1 code for function 2 code for function 2 ... ... code for function n code for function n global / static area global / static area stack stack Stack grows free space free space heap heap Copyright (c) 2011 Ioanna Dionysiou Activation Record Information needed by a single execution of a procedure is managed using an activation record or frame – Not all compilers use all of the fields – Pascal and C push activation record on the runtime stack when procedure is called and pop the activation record off the stack when control returns to the caller Copyright (c) 2011 Ioanna Dionysiou returned value actual parameters optional control link optional access link saved machine status local data temporaries Activation Record 1) Temporary values e.g. those arising in the evaluation of expressions 2) actual parameters Local data Data that is local to an execution of the procedure 3) returned value Saved machine status State of the machine info before procedure is called. Values of program counter and machine registers that have to be restored when control returns from the procedure optional control link optional access link saved machine status local data temporaries Copyright (c) 2011 Ioanna Dionysiou Activation Record 4) Access Link refer to non-local data held in other activation records 5) Control link points to the activation record of the caller 6) Actual parameters actual parameters optional control link used by the calling procedure to supply parameters to the called procedure (in practice these are passed in registers) 7) returned value Returned value optional access link saved machine status used by the called procedure to return a value to the calling procedure (in practice it is returned in a register) local data temporaries Copyright (c) 2011 Ioanna Dionysiou Local data The field for local data is set when declarations in a procedure are examined during compile time – Variable-length data is not stored here – Keep a count of the memory locations that have been allocated so far – Determine a relative address (offset) of the storage for a local with respect to some position (e.g. beginning of the frame) • Multibyte objects are stored in consecutive bytes and given the address of the first byte Copyright (c) 2011 Ioanna Dionysiou Lecture Outline Source language issues – Activation trees, control stacks, scope of declaration, bindings Storage Organization – Activation records Storage Allocation Strategies – Static allocation, stack allocation, heap allocation Copyright (c) 2011 Ioanna Dionysiou Storage-allocation strategies There are three storage allocation strategies: – Static allocation – Stack-based allocation – Dynamic (or heap-based) allocation Copyright (c) 2011 Ioanna Dionysiou Static Allocation In a static environment (Fortran 77) there are a number of restrictions: – Size of data objects are known at compile time – No recursive procedures – No dynamic memory allocation Only one copy of each procedure activation record exists at time t – We can allocate storage at compile time • Bindings do not change at runtime • Every time a procedure is called, the same bindings occur Copyright (c) 2011 Ioanna Dionysiou Static Allocation int i = 10; code for function 1 int f(int j) { int k; int m; … } main() { int k; f(k); } code main() code for function 2 ... code f() i (int) main() Activation record code for function n global / static area stack k (int) free space f() Activation record k (int) m (int) Copyright (c) 2011 Ioanna Dionysiou heap Stack-based Allocation In a stack-based allocation, the previous restrictions are lifted (Pascal, C, etc) – procedures are allowed to be called recursively • Need to hold multiple activation records for the same procedure • Created as required and placed on the stack – Each record will maintain a pointer to the record that activated it – On completion, the current record will be deleted from the stack and control is passed to the calling record – Dynamic memory allocation is allowed – Pointers to data locations are allowed Copyright (c) 2011 Ioanna Dionysiou Stack-based Allocation PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Position in Activation Tree s Copyright (c) 2011 Ioanna Dionysiou Activation Records On Stack s a (array) Stack-based Allocation PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Position in Activation Tree Activation Records On Stack s s r Copyright (c) 2011 Ioanna Dionysiou a (array) r i (integer) Stack-based Allocation PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Position in Activation Tree Activation Records On Stack s s r q(1,9) Copyright (c) 2011 Ioanna Dionysiou a (array) q(1,9) i (integer) Stack-based Allocation PROGRAM sort(input,output); VAR a : array[0..10] of Integer; PROCEDURE readarray; VAR i : Integer; BEGIN for i:= 1 to 9 do read(a[i]); END; FUNCTION partition(y,z : Integer): Integer; VAR i,j,x,v : Integer; BEGIN … END; PROCEDURE quicksort(m,n : Integer); VAR i : Integer; BEGIN if (n > m) then BEGIN i := partition(m,n); quicksort(m, i-1); quicksort(i+1,n) END END; BEGIN /* of main */ a[0] := -9999; a[10] := 9999; readarray; quicksort(1,9) END. Position in Activation Tree Activation Records On Stack s s r q(1,9) p(1,9) q(1,3) Copyright (c) 2011 Ioanna Dionysiou a (array) q(1,9) i (integer) q(1,3) i (integer) Calling Sequences Procedure calls are implemented by generating calling sequences in the target code – Call sequence: allocates activation record and enters information into fields – Return sequence: restores the state of the machine so that the calling procedure can continue execution Copyright (c) 2011 Ioanna Dionysiou returned value actual parameters optional control link optional access link saved machine status local data temporaries Calling Sequences returned value Why placing returned value and actual parameters next to the activation record of the caller? – Caller can access these values using offsets from its own activation record – No need to know the middle part of the callee’s activation record actual parameters optional control link optional access link saved machine status local data temporaries returned value actual parameters optional control link optional access link saved machine status local data temporaries Copyright (c) 2011 Ioanna Dionysiou caller callee Calling Sequences returned value How do we calculate offset? actual parameters – Maintain a register that points to the end of the machine status field in an activation record – Top_sp is known to the caller, so it can be responsible for setting it before control flows to the called procedure – Callee can access its temporaites and local data using offsets from top_sp optional control link optional access link saved machine status local data temporaries returned value actual parameters optional control link optional access link top_sp saved machine status local data temporaries Copyright (c) 2011 Ioanna Dionysiou caller callee Call Sequence The caller evaluates actuals The caller – stores a return address and the old value of top_sp into the callee’s activation record – increments top_sp; that is moved past the caller’s local data and temporaries and the callee’s parameter and status fields The callee – saves register values and other status information The callee – initializes its local data and begins execution Copyright (c) 2011 Ioanna Dionysiou Return Sequence The callee places a return value next to the activation record of the caller Using the information in the status field, the callee – restores top_sp and other registers – braches to a return address in the caller’s code Although top_sp has been decremented, the caller can copy the returned value into its own activation record and use it to evaluate an expression Copyright (c) 2011 Ioanna Dionysiou Dangling References Whenever storage is deallocated, the problem of dangling references arises – Occurs when there is a reference to storage that has been deallocated – Logical error • Mysterious bugs can appear Copyright (c) 2011 Ioanna Dionysiou Dangling References int *dangle() { int i = 23; return &i; } What’s the problem? main() { int *p; p = dangle(); } Copyright (c) 2011 Ioanna Dionysiou Dangling References Local variable i only exists in dangle() int *dangle() { int i = 23; return &i; } main() { int *p; p = dangle(); } When procedure completes execution and control is transferred to main(), the space for i does not exist anymore (pop activation record for dangle off the stack) Pointer p is a dangling reference Copyright (c) 2011 Ioanna Dionysiou Dynamic Allocation Returning the address of a local variable is defined to be a logical error (e.g. in C) In a dynamic environment there is no such restriction – All variables and activation records must be maintained for as long as there are references to them • Callee outlives the caller – It is also possible to return pointers to local functions – Must deallocate space when procedures and variables are no longer needed (garbage collection) Copyright (c) 2011 Ioanna Dionysiou Dynamic Allocation Use a heap to maintain these records – Also called free store – Heap management is challenging Copyright (c) 2011 Ioanna Dionysiou