3 Variables and Storage
• A simple storage model.
• Simple and composite variables.
• Copy semantics vs reference semantics.
• Lifetime.
• Pointers.
© 2004, D.A. Watt, University of Glasgow

3-1 Variables and Storage (cont’d)
• Commands.
• Expressions with side effects.
• Implementation notes.

3-2 An abstract model of storage (1)
In functional/logic PLs (as in mathematics), a “variable” stands for a fixed but unknown value.
In imperative/OO PLs, a variable is a container for a value, which may be inspected and updated as often as desired. Such a variable can be used to model a real-world object whose state changes over time.

3-3 An abstract model of storage (2)
To understand such variables, assume a simple abstract model of storage:
• A store is a collection of storage cells. Each storage cell has a unique address.
• Each storage cell is either allocated or unallocated.
• Each allocated storage cell contains either a simple value or undefined.
(Figure: a store containing unallocated cells and allocated cells; the allocated cells contain 7, true, 3.14, ‘X’, and ? for undefined.)

3-4 Simple vs composite variables
A simple value is one that can be stored in a single storage cell (typically a primitive value or a pointer).
A simple variable occupies a single allocated storage cell.
A composite variable occupies a group of allocated storage cells.

3-5 Simple variables
When a simple variable is declared, a storage cell is allocated for it.
Assignment to the simple variable updates that storage cell.
At the end of the block, that storage cell is deallocated.
Animation (C++):
    { int n;
      n = 0;
      n = n+1;
    }
(Figure: the cell for n contains ? after the declaration, then 0, then 1.)

3-6 Composite variables (1)
A variable of a composite type has the same structure as a value of that type. For instance:
• A record variable is a tuple of component variables.
• An array variable is a mapping from an index range to a group of component variables.
The component variables can be inspected and updated either totally or selectively.
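The abstract storage model above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the names Value, Cell and Store are hypothetical (they do not come from the slides), addresses are modelled as vector indices, and a composite variable is assumed to occupy adjacent cells.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <variant>
#include <vector>

// A simple value: one of the kinds of primitive value mentioned in the slides.
using Value = std::variant<int, bool, double, char>;

// One storage cell: allocated or unallocated; an allocated cell may be undefined.
struct Cell {
    bool allocated = false;
    std::optional<Value> content;  // an empty optional models "undefined"
};

// Hypothetical toy store; an address is simply an index into the vector.
class Store {
public:
    explicit Store(std::size_t size) : cells_(size) {}

    // Allocate `count` adjacent cells (1 for a simple variable, more for a
    // composite variable) and return the address of the first cell.
    std::size_t allocate(std::size_t count = 1) {
        for (std::size_t base = 0; base + count <= cells_.size(); ++base) {
            std::size_t run = 0;
            while (run < count && !cells_[base + run].allocated) ++run;
            if (run == count) {
                for (std::size_t i = 0; i < count; ++i) {
                    cells_[base + i].allocated = true;
                    cells_[base + i].content.reset();  // starts out undefined
                }
                return base;
            }
        }
        throw std::runtime_error("store exhausted");
    }

    // Return `count` cells starting at `base` to the unallocated state.
    void deallocate(std::size_t base, std::size_t count = 1) {
        for (std::size_t i = 0; i < count; ++i) {
            Cell& cell = cell_at(base + i);
            cell.allocated = false;
            cell.content.reset();
        }
    }

    // Assignment: overwrite the content of one allocated cell.
    void update(std::size_t address, Value v) { cell_at(address).content = v; }

    // Inspection: an empty optional means the cell is still undefined.
    std::optional<Value> inspect(std::size_t address) {
        return cell_at(address).content;
    }

private:
    // Only allocated cells may be inspected or updated.
    Cell& cell_at(std::size_t address) {
        if (address >= cells_.size() || !cells_[address].allocated)
            throw std::logic_error("cell is not allocated");
        return cells_[address];
    }

    std::vector<Cell> cells_;
};
```

In this sketch a simple variable such as n corresponds to store.allocate(), and a three-component record corresponds to store.allocate(3); inspecting a cell after deallocate throws, which mirrors the rule that a variable occupies storage only while it is allocated.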
3-7 Composite variables (2)
Animation (C++):
    struct Date { int y, m, d; };
    Date xmas, today;
    {
      xmas.d = 25;
      xmas.m = 12;
      xmas.y = 2004;
      today = xmas;
    }
(Figure: xmas and today each occupy three cells, all initially undefined (?). The three selective updates fill xmas with 2004, dec, 25; the assignment today = xmas then copies all three components into today.)

3-8 Total vs selective update
Total update of a composite variable means updating it with a new (composite) value in a single step:
    today = xmas;
Selective update of a composite variable means updating a single component:
    today.y = 2004;

3-9 Static vs dynamic vs flexible arrays
A static array is an array variable whose index range is fixed by the program code.
A dynamic array is an array variable whose index range is fixed at the time when the array variable is created.
• In Ada, the definition of an array type must fix the index type, but need not fix the index range. Only when an array variable is created must its index range be fixed. Ada arrays are therefore dynamic.
A flexible array is an array variable whose index range is not fixed at all, but may change whenever a new array value is assigned.

3-10 Example: C static arrays
Array variable declarations:
    float v1[] = {2.0, 3.0, 5.0, 7.0};   // index range is {0, …, 3}
    float v2[10];                        // index range is {0, …, 9}
Function:
    void print_vector (float v[], int n) {
      // Print the array v[0], …, v[n-1] in the form “[… … …]”.
      int i;
      printf("[%8.2f", v[0]);
      for (i = 1; i < n; i++)
        printf(" %8.2f", v[i]);
      printf("]");
    }
    …
    print_vector(v1, 4); print_vector(v2, 10);
A C array doesn’t know its own length!

3-11 Example: Ada dynamic arrays
Array type and variable declarations:
    type Vector is array (Integer range <>) of Float;
    v1: Vector(1 .. 4) := (1.0, 0.5, 5.0, 3.5);
    v2: Vector(0 .. m) := (0 .. m => 0.0);
Procedure:
    procedure print_vector (v: in Vector) is
      -- Print the array v in the form “[… … …]”.
    begin
      put('['); put(v(v'first));
      for i in v'first + 1 .. v'last loop
        put(' '); put(v(i));
      end loop;
      put(']');
    end;
    …
    print_vector(v1); print_vector(v2);

3-12 Example: Java flexible arrays
Array variable declarations:
    float[] v1 = {1.0f, 0.5f, 5.0f, 3.5f};   // index range is {0, …, 3}
    float[] v2 = {0.0f, 0.0f, 0.0f};         // index range is {0, …, 2}
    …
    v1 = v2;                                 // v1’s index range is now {0, …, 2}
Method:
    static void printVector (float[] v) {
      // Print the array v in the form “[… … …]”.
      System.out.print("[" + v[0]);
      for (int i = 1; i < v.length; i++)
        System.out.print(" " + v[i]);
      System.out.print("]");
    }
    …
    printVector(v1); printVector(v2);

3-13 Copy semantics vs reference semantics
What exactly happens when a composite value is assigned to a variable of the same type?
Copy semantics: All components of the composite value are copied into the corresponding components of the composite variable.
Reference semantics: The composite variable is made to contain a pointer (or reference) to the composite value.
C and Ada adopt copy semantics.
Java adopts copy semantics for primitive values, but reference semantics for objects.

3-14 Example: C++ copy semantics (1)
Declarations:
    struct Date { int y, m, d; };
    Date dateA = {2004, 1, 1};
    Date dateB;
Effect of copy semantics:
    dateB = dateA;
    dateB.y = 2005;
(Figure: dateB is initially undefined; after the assignment it holds its own copy of 2004, jan, 1; after the update, dateB.y is 2005 while dateA.y is still 2004.)

3-15 Example: Java reference semantics (1)
Declarations:
    class Date {
      int y, m, d;
      public Date (int y, int m, int d) { … }
    }
    Date dateR = new Date(2004, 1, 1);
    Date dateS = new Date(2004, 12, 25);
Effect of reference semantics:
    dateS = dateR;
    dateR.y = 2005;
(Figure: after the assignment, dateR and dateS both refer to the same object, which then holds 2005, 1, 1; the object holding 2004, 12, 25 is no longer referred to by dateS.)

3-16 Example: C++ copy semantics (2)
We can achieve the effect of reference semantics in C++ by using explicit pointers:
    Date* dateP = new Date;
    Date* dateQ = new Date;
    *dateP = dateA;
    dateQ = dateP;

3-17 Example: Java reference semantics (2)
We can achieve the effect of copy semantics in Java by cloning:
    Date dateR = new Date(2004, 4, 1);
    Date dateT = dateR.clone();   // assumes Date overrides clone() to return a Date

3-18 Lifetime (1)
Every variable is created (or allocated) at some definite time, and destroyed (or deallocated) at some later time when it is no longer needed.
A variable’s lifetime is the interval between its creation and destruction.
A variable occupies storage cells only during its lifetime. When the variable is destroyed, the storage cells that it occupied may be deallocated (and subsequently allocated for some other purpose).

3-19 Lifetime (2)
A global variable’s lifetime is the program’s run-time. It is created by a global declaration.
A local variable’s lifetime is an activation of a block. It is created by a declaration within that block, and destroyed on exit from that block.
A heap variable’s lifetime is arbitrary, but bounded by the program’s run-time. It can be created at any time, by a command or expression, and may be destroyed at any later time. It is accessed through a pointer.
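The three kinds of lifetime described above can be observed directly in C++. The following is a minimal sketch, not taken from the slides: Tracked, event_log, make_heap_variable and lifetime_demo are hypothetical names, and the constructor and destructor simply record when each variable is created and destroyed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Global variable: its lifetime is the whole run of the program.
std::vector<std::string> event_log;

// Hypothetical helper type that logs its own creation and destruction.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {
        event_log.push_back("create " + name);
    }
    ~Tracked() { event_log.push_back("destroy " + name); }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};

Tracked* make_heap_variable() {
    Tracked local("local");      // local variable: lives for this activation only
    return new Tracked("heap");  // heap variable: outlives the block that created it
}                                // "local" is destroyed here, on exit from the block

std::vector<std::string> lifetime_demo() {
    event_log.clear();
    Tracked* p = make_heap_variable();  // the heap variable is reached through p
    event_log.push_back("back in caller");
    delete p;                           // explicit destruction ends its lifetime
    return event_log;
}
```

Calling lifetime_demo() returns the recorded events in order; the heap variable is created inside the block but is only destroyed later, when the caller applies delete.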
3-20 Example: C++ global and local variables (1)
Outline of C++ program:
    int g1;
    int main() {
      int g2;
      … P(); … Q(); …
    }
    void P() {
      float p1; int p2;
      … Q(); …
    }
    void Q() {
      int q;
      …
    }

3-21 Example: C++ global and local variables (2)
Lifetimes of global and local variables:
(Figure: timeline with the events start, call P, call Q, return from Q, return from P, call Q, return from Q, stop. The lifetime of g1 and g2 spans start to stop; the lifetime of p1 and p2 spans call P to return from P; each activation of Q has its own lifetime of q.)
Global and local variables’ lifetimes are nested.

3-22 Example: C++ local variables of recursive procedure (1)
Outline of C++ program:
    int main() {
      int g;
      … R(); …
    }
    void R() {
      int r;
      … R(); …
    }

3-23 Example: C++ local variables of recursive procedure (2)
Lifetimes of global and local variables (assuming 3-deep recursive activation of R):
(Figure: timeline with the events start, call R, call R, call R, return from R, return from R, return from R, stop. The lifetime of g spans start to stop; the three lifetimes of r, one per activation of R, are nested inside one another.)

3-24 Example: Ada heap variables (1)
Outline of Ada program:
    procedure main is
      type IntNode;
      type IntList is access IntNode;
      type IntNode is record
        elem: Integer;
        succ: IntList;
      end record;
      odds, primes: IntList := null;
      function cons (h: Integer; t: IntList) return IntList is
      begin
        return new IntNode'(h, t);
      end;

3-25 Example: Ada heap variables (2)
Outline of Ada program (continued):
      procedure A is
      begin
        odds := cons(3, cons(5, cons(7, null)));
        primes := cons(2, odds);
      end;
      procedure B is
      begin
        odds.succ := odds.succ.succ;
      end;
    begin
      … A; … B; …
    end;

3-26 Example: Ada heap variables (3)
After call and return from A:
(Figure: primes points to the 2-node, whose successor is the 3-node; odds points to the 3-node, which is followed by the 5-node and the 7-node. All four nodes are heap variables.)
After call and return from B:
(Figure: the 3-node’s successor is now the 7-node, so the 5-node is unreachable.)

3-27 Example: Ada heap variables (4)
Lifetimes of global and heap variables:
(Figure: timeline with the events start, call A, return from A, call B, return from B, stop, showing the lifetimes of primes, odds, the 7-node, the 5-node, the 3-node and the 2-node.)
Heap variables’ lifetimes have no particular pattern.
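For comparison, the Ada heap-variable example above can be transliterated into C++. This is a sketch under stated assumptions: the helper functions elements and destroy are hypothetical additions, and B is changed to return the unlinked node, because C++ has no garbage collector and a node that merely became unreachable would never be reclaimed.

```cpp
#include <vector>

struct IntNode {
    int elem;
    IntNode* succ;
};
using IntList = IntNode*;

// Allocator call: creates a heap variable and yields a pointer to it.
IntList cons(int h, IntList t) { return new IntNode{h, t}; }

// Hypothetical helper: collect the elements reachable from `list`.
std::vector<int> elements(IntList list) {
    std::vector<int> result;
    for (IntNode* node = list; node != nullptr; node = node->succ)
        result.push_back(node->elem);
    return result;
}

IntList odds = nullptr;
IntList primes = nullptr;

void A() {
    odds = cons(3, cons(5, cons(7, nullptr)));
    primes = cons(2, odds);
}

// Unlinks the node after the first odd number. The Ada version leaves that
// node unreachable; here it is returned so that the caller can destroy it.
IntNode* B() {
    IntNode* unlinked = odds->succ;
    odds->succ = unlinked->succ;
    return unlinked;
}

// Hypothetical helper: destroy every node still reachable from `list`.
void destroy(IntList list) {
    while (list != nullptr) {
        IntNode* next = list->succ;
        delete list;
        list = next;
    }
}
```

After A() the list reachable from primes is 2, 3, 5, 7; after B() it is 2, 3, 7, and the 5-node can be reached only through the pointer that B returned.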
3-28 Allocators and deallocators
An allocator is an operation that creates a heap variable, yielding a pointer to that heap variable.
• Ada and Java’s allocator is an expression of the form “new …”.
• C’s allocator is a library function, malloc.
A deallocator is an operation that explicitly destroys a designated heap variable.
• Ada’s deallocator is a library (generic) procedure, unchecked_deallocation.
• C’s deallocator is a library function, free.
• Java has no deallocator at all.

3-29 Reachability
A heap variable remains reachable as long as it can be accessed by following pointers from a global or local variable.
A heap variable’s lifetime extends from its creation until:
• it is destroyed by a deallocator, or
• it becomes unreachable, or
• the program stops.

3-30 Pointers (1)
A pointer is a reference to a particular variable. (In fact, pointers are sometimes called references.)
A pointer’s referent is the variable to which it refers.
A null pointer is a special pointer value that has no referent.
In terms of our abstract model of storage, a pointer is essentially the address of its referent in the store. However, each pointer also has a type, and the type of a pointer allows us to infer the type of its referent.

3-31 Pointers (C++, Ada)
C++:
    struct IntNode {
      int elem;
      IntNode* succ;
    };
    IntNode* p;
Ada:
    type IntNode;
    type IntPointer is access IntNode;
    type IntNode is record
      elem: Integer;
      succ: IntPointer;
    end record;
    p: IntPointer;

3-32 Pointers (2)
Pointers and heap variables can be used to represent recursive values such as lists and trees.
But the pointer itself is a low-level concept. Manipulation of pointers is notoriously error-prone and hard to understand.
For example, the assignment “p.succ := q;” appears to manipulate a list, but which list? Also:
• Does it delete nodes from the list?
• Does it stitch together parts of two different lists?
• Does it introduce a cycle?

3-33 Dangling pointers (1)
A dangling pointer is a pointer to a variable that has been destroyed.
Dangling pointers arise from the following situations:
• where a pointer to a heap variable still exists after the heap variable is destroyed by a deallocator
• where a pointer to a local variable still exists at exit from the block in which the local variable was declared.
A deallocator immediately destroys a heap variable; all existing pointers to that heap variable then become dangling pointers. Thus deallocators are inherently unsafe.

3-34 Dangling pointers (2)
C is highly unsafe:
• After a heap variable is destroyed, pointers to it might still exist.
• At exit from a block, pointers to its local variables might still exist (e.g., stored in global variables).
Ada is safer:
• After a heap variable is destroyed, pointers to it might still exist.
• But pointers to local variables may not be stored in global variables.
Java is very safe:
• It has no deallocator.
• Pointers to local variables cannot be obtained.

3-35 Example: C dangling pointers
Consider this C code:
    struct Date {int y, m, d;};
    struct Date* dateP; struct Date* dateQ;
    dateP = (struct Date*)malloc(sizeof(struct Date));   // allocates a new heap variable
    dateP->y = 2004; dateP->m = 1; dateP->d = 1;
    dateQ = dateP;              // makes dateQ point to the same heap variable as dateP
    free(dateQ);                // deallocates that heap variable (dateP and dateQ are now dangling pointers)
    printf("%4d", dateP->y);    // fails
    dateP->y = 2005;            // fails

3-36 Commands
A command (or statement) is a PL construct that will be executed to update variables.
Commands are characteristic of imperative and OO (but not functional) PLs.
Forms of commands:
• skips
• assignments
• procedure calls
• sequential commands
• conditional commands
• iterative commands.

3-37 Skips
A skip is a command with no effect.
Typical forms:
• “;” in C and Java
• “null;” in Ada.
Skips are useful mainly within conditional commands.

3-38 Assignments
An assignment stores a value in a variable.
Single assignment:
• “V = E;” in C and Java
• “V := E;” in Ada
– the value of expression E is stored in variable V.
Multiple assignment:
• “V1 = … = Vn = E;” in C and Java
– the value of E is stored in each of V1, …, Vn.
Assignment combined with binary operator:
• “V ⊕= E;” in C and Java means the same as “V = V ⊕ E;” (where ⊕ stands for a binary operator such as + or *).

3-39 Procedure calls
A procedure call achieves its effect by applying a procedure to some arguments.
Typical form:
    P(E1, …, En);
Here P determines the procedure to be applied, and E1, …, En are evaluated to determine the arguments. Each argument may be either a value or (sometimes) a reference to a variable.
The net effect of the procedure call is to update variables. The procedure achieves this effect by updating variables passed by reference, and/or by updating global variables. (But updating its local variables has no net effect.)

3-40 Sequential commands
Sequential, conditional, and iterative commands (found in all imperative/OO PLs) are ways of composing commands to achieve different control flows.
Control flow matters because commands update variables, so the order in which they are executed makes a difference.
A sequential command specifies that two (or more) commands are to be executed in sequence. Typical form:
    C1 C2
– command C1 is executed before command C2.

3-41 Conditional commands
A conditional command chooses one of its subcommands to execute, depending on a condition.
An if-command chooses from two subcommands, using a boolean condition.
A case-command chooses from several subcommands.

3-42 If-commands (1)
Typical forms (Ada and C/Java, respectively):
    if E then C1 else C2 end if;
    if (E) C1 else C2
E must be of type Boolean
– if E yields true, C1 is executed; otherwise C2 is executed.
Common abbreviation (Ada):
    if E then C1 end if;
is equivalent to
    if E then C1 else null; end if;

3-43 If-commands (2)
Generalisation to multiple conditions (in Ada):
    if E1 then C1
    elsif E2 then C2
    …
    elsif En then Cn
    else C0
    end if;
E1, …, En must be of type Boolean
– if E1, …, Ei-1 all yield false but Ei yields true, then Ci is executed; if all of E1, …, En yield false, C0 is executed.
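The multi-condition if-command above has a direct counterpart in C++, written as a chain of else if. The following is only an illustration: the function name classify and its thresholds are hypothetical. It shows that the conditions are tested in order, so each Ei is reached only when all earlier conditions yielded false.

```cpp
#include <string>

// Hypothetical example of an if-command with multiple conditions.
std::string classify(int temperature_celsius) {
    std::string label;
    if (temperature_celsius < 0)          // E1
        label = "freezing";               // C1
    else if (temperature_celsius < 15)    // E2: tested only if E1 yielded false
        label = "cold";                   // C2
    else if (temperature_celsius < 25)    // E3: tested only if E1 and E2 yielded false
        label = "mild";                   // C3
    else                                  // all conditions yielded false
        label = "hot";                    // C0
    return label;
}
```

Because the tests are ordered, the second condition does not need to repeat the lower bound: a value of 0 fails E1 and is then classified by E2.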
3-44 Case-commands (1)
In Ada:
    case E is
      when v1 => C1
      …
      when vn => Cn
      when others => C0
    end case;
E must be of some primitive type other than Float; v1, …, vn must be distinct values of that type
– if the value of E equals some vi, then Ci is executed; otherwise C0 is executed.

3-45 Case-commands (2)
In C and Java:
    switch (E) {
      case v1: C1
      …
      case vn: Cn
      default: C0
    }
E must be of integer type; v1, …, vn must be distinct integer constants
– if the value of E equals some vi, then Ci, …, Cn, C0 are all executed; otherwise only C0 is executed.

3-46 Example: Ada case-command
Code:
    today: Date;
    case today.m is
      when jan => put("JAN");
      when feb => put("FEB");
      …
      when nov => put("NOV");
      when dec => put("DEC");
    end case;

3-47 Example: Java switch-command
Code:
    Date today;
    switch (today.m) {
      case 1: System.out.print("JAN"); break;
      case 2: System.out.print("FEB"); break;
      …
      case 11: System.out.print("NOV"); break;
      case 12: System.out.print("DEC");
    }
The breaks are essential.

3-48 Iterative commands
An iterative command (or loop) repeatedly executes a subcommand, which is called the loop body.
Each execution of the loop body is called an iteration.
Classification of iterative commands:
• Indefinite iteration: the number of iterations is not predetermined.
• Definite iteration: the number of iterations is predetermined.

3-49 Indefinite iteration (1)
Indefinite iteration is most commonly supported by the while-command.
Typical forms (Ada and C/Java):
    while E loop C end loop;
    while (E) C
Meaning (defined recursively):
    while E loop C end loop;
is equivalent to
    if E then C while E loop C end loop; end if;

3-50 Indefinite iteration (2)
Indefinite iteration is also supported in some PLs by the do-while-command.
Typical form (C/Java):
    do C while (E);
Meaning:
    do C while (E);
is equivalent to
    C if (E) { do C while (E); }
and also to
    C while (E) C

3-51 Definite iteration (1)
Definite iteration is characterized by a control sequence, a predetermined sequence of values that are successively assigned (or bound) to a control variable.
Ada for-command:
    for V in R loop C end loop;
R must be of some primitive type other than Float
– the control sequence consists of all values in the range R, in ascending order.

3-52 Definite iteration (2)
Java 1.5’s new-style for-command can iterate over an array, list, or set:
    for (T V : E) C
– the control sequence consists of all component values of the array/list/set yielded by E.
NB: Java’s old-style for-command is just an abbreviation for a while-command (indefinite iteration):
    for (C1; E; C2) C3
is equivalent to
    C1 while (E) { C3 C2 }

3-53 Example: definite iteration over arrays
In Ada:
    dates: array (…) of Date;
    …
    for i in dates'range loop
      put(dates(i));
    end loop;
In Java:
    Date[] dates;
    …
    for (int i = 0; i < dates.length; i++)   // old-style
      System.out.println(dates[i]);
    for (Date dat : dates)                   // new-style
      System.out.println(dat);

3-54 Expressions with side effects
The primary purpose of evaluating an expression is to yield a value.
But in many imperative/OO PLs, evaluating an expression can also update variables – side effects.
In Ada, C, and Java, the body of a function is a command. If that command updates global variables, calling the function has side effects.
In C and Java, assignments are in fact expressions with side effects: “V = E” stores the value of E in V as well as yielding that value. Similarly “V ⊕= E” (for a binary operator ⊕).

3-55 Example: side effects in C
The C function getc(f) reads a character and updates the file variable that f points to.
The following code is correct and concise:
    int ch;
    while ((ch = getc(f)) != EOF)
      putchar(ch);
The following code is incorrect (why?):
    enum Gender {female, male};
    enum Gender g;
    if (getc(f) == 'F') g = female;
    else if (getc(f) == 'M') g = male;
    else …

3-56 Implementation notes
Each variable occupies storage space throughout its lifetime. That storage space must be allocated at the start of the variable’s lifetime (or before), and deallocated at the end of the variable’s lifetime (or later).
The amount of storage space occupied by each variable depends on its type.
Assume that the PL is statically typed: all variables’ types are declared explicitly, or the compiler can infer them.

3-57 Storage for global and local variables (1)
A global variable’s lifetime is the program’s entire run-time. So the compiler can allocate a fixed storage space to each global variable.
A local variable’s lifetime is an activation of the block in which the variable is declared. The lifetimes of local variables are nested. So the compiler allocates storage space to local variables on a stack.

3-58 Storage for global and local variables (2)
At any given time, the stack contains several activation frames. Each activation frame contains enough space for the local variables of a particular procedure.
(Figure: an activation frame, consisting of housekeeping data and local variables.)
An activation frame is:
• pushed on to the stack when a procedure is called
• popped off the stack when the procedure returns.
Storage can be allocated to local variables of recursive procedures in exactly the same way.

3-59 Example: storage for global and local variables (1)
Outline of C++ program:
    int g1;
    int main() {
      int g2;
      … P(); … Q(); …
    }
    void P() {
      float p1; int p2;
      … Q(); …
    }
    void Q() {
      int q;
      …
    }

3-60 Example: storage for global and local variables (2)
Storage layout as the program runs:
    Event            Storage contents
    (start)          g1 g2
    call P           g1 g2 p1 p2
    call Q           g1 g2 p1 p2 q
    return from Q    g1 g2 p1 p2
    return from P    g1 g2
    call Q           g1 g2 q

3-61 Storage for heap variables (1)
A heap variable’s lifetime starts when the heap variable is created and ends when it is destroyed or becomes unreachable. There is no pattern in heap variables’ lifetimes.
Heap variables occupy a storage region called the heap. At any given time, the heap contains all currently-live heap variables, interspersed with unallocated storage space.
• When a new heap variable is to be created, some unallocated storage space is allocated to it.
• When a heap variable is to be destroyed, its storage space reverts to being unallocated.
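The two heap operations just described can be sketched as a tiny heap manager. This is an illustration only: the class name ToyHeap and the first-fit search policy are assumptions (the slides do not prescribe how unallocated space is chosen), and the heap tracks only which cells are in use, not their contents.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical toy heap: one flag per cell, true meaning "allocated".
class ToyHeap {
public:
    explicit ToyHeap(std::size_t size) : in_use_(size, false) {}

    // Create a heap variable of `count` cells using a first-fit search:
    // claim the first run of `count` adjacent unallocated cells.
    std::size_t allocate(std::size_t count) {
        if (count == 0) throw std::invalid_argument("count must be positive");
        std::size_t run = 0;  // length of the current run of free cells
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            run = in_use_[i] ? 0 : run + 1;
            if (run == count) {
                std::size_t base = i + 1 - count;
                for (std::size_t j = base; j <= i; ++j) in_use_[j] = true;
                return base;
            }
        }
        throw std::runtime_error("no unallocated run is large enough");
    }

    // Destroy a heap variable: its cells revert to being unallocated.
    void deallocate(std::size_t base, std::size_t count) {
        for (std::size_t j = base; j < base + count && j < in_use_.size(); ++j)
            in_use_[j] = false;
    }

    // Number of currently unallocated cells.
    std::size_t free_cells() const {
        std::size_t total = 0;
        for (bool used : in_use_)
            if (!used) ++total;
        return total;
    }

private:
    std::vector<bool> in_use_;
};
```

Allocating three two-cell variables in a six-cell heap and then destroying the middle one leaves unallocated space between live variables; a later request for one cell reuses that gap, while a request for three adjacent cells fails.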
3-62 Storage for heap variables (2)
A heap manager (part of the run-time system) keeps track of allocated and unallocated storage space.
If the programming language has no explicit deallocator, the heap manager must be able to find any unreachable heap variables. (Otherwise heap storage will eventually be exhausted.) This is called garbage collection.
A garbage collector must visit all heap variables in order to find the unreachable ones. This is time-consuming.
But garbage collection eliminates some common errors:
• omitting to destroy unreachable heap variables
• destroying heap variables that are still reachable.

3-63 Example: storage for heap variables (1)
Consider the Ada program on slides 3-24 and 3-25. Storage layout as the program runs:

3-64 Example: storage for heap variables (2)
    Event                     Heap contents (odds and primes point into the heap)
    (start)                   (initially unallocated)
    call and return from A    2-node, 3-node, 5-node, 7-node
    call and return from B    2-node, 3-node, 5-node, 7-node
    collect garbage           2-node, 3-node, 7-node

3-65 Mark-scan garbage collection algorithm
To collect garbage:
1. For each variable v in the heap:
   1.1. Mark v as unreachable.
2. For each pointer p in the stack:
   2.1. Scan all variables that can be reached from p.
3. For each variable v in the heap:
   3.1. If v is marked as unreachable:
        3.1.1. Deallocate v.
To scan all variables that can be reached from p:
1. Let variable v be the referent of p.
2. If v is marked as unreachable:
   2.1. Mark v as reachable.
   2.2. For each pointer q in v:
        2.2.1. Scan all variables that can be reached from q.

3-66 Representation of dynamic/flexible arrays (1)
The array indexing operation will behave unpredictably if the index value is out-of-range. To avoid this, in general, we need a run-time range check on the index value.
A static array’s index range is known at compile-time. So the compiler can easily generate object code to perform the necessary range check.
However, a dynamic/flexible array’s index range is known only at run-time.
So it must be stored as part of the array’s representation:
• If the lower bound is fixed, only the length need be stored.
• Otherwise, both lower and upper bounds must be stored.

3-67 Representation of dynamic/flexible arrays (2)
Example (Ada):
    type Vector is array (Integer range <>) of Float;
    v1: Vector(1 .. 4);
    v2: Vector(0 .. 2);
Representation:
    v1                 v2
    lower  1           lower  0
    upper  4           upper  2
    1      1.0         0      0.0
    2      0.5         1      0.0
    3      5.0         2      0.0
    4      3.5

3-68 Representation of dynamic/flexible arrays (3)
Example (Java):
    float[] v1 = new float[4];
    float[] v2 = new float[3];
Representation:
    v1                   v2
    tag     float[]      tag     float[]
    length  4            length  3
    0       1.0          0       0.0
    1       0.5          1       0.0
    2       5.0          2       0.0
    3       3.5

3-69
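The Ada-style representation above, in which both bounds are stored with the elements, can be sketched in C++ as follows. This is a minimal illustration under stated assumptions: the class name BoundedVector and its member names are hypothetical, and the run-time range check is expressed as an exception rather than as compiler-generated object code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dynamic array whose bounds are fixed when it is created and
// stored alongside its elements, as in the Ada representation.
class BoundedVector {
public:
    BoundedVector(int lower, int upper, float initial = 0.0f)
        : lower_(lower),
          upper_(upper),
          elems_(upper >= lower ? static_cast<std::size_t>(upper - lower + 1) : 0,
                 initial) {}

    int first() const { return lower_; }  // corresponds to v'first in Ada
    int last() const { return upper_; }   // corresponds to v'last in Ada

    // Indexing with a run-time range check against the stored bounds.
    float& at(int index) {
        if (index < lower_ || index > upper_)
            throw std::out_of_range("index outside the array's index range");
        return elems_[static_cast<std::size_t>(index - lower_)];
    }

private:
    int lower_;                 // stored lower bound
    int upper_;                 // stored upper bound
    std::vector<float> elems_;  // the component variables
};
```

With these definitions, BoundedVector v1(1, 4) plays the role of v1: Vector(1 .. 4); v1.at(1) selects the first component and v1.at(0) is rejected by the range check. A Java-style representation would store only the length, because the lower bound is always 0.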