Chapter 9
Functions

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
A. Perlis

9.1 Basic Terminology
9.2 Function Call and Return
9.3 Parameters
9.4 Parameter Passing Mechanisms
9.5 Activation Records
9.6 Recursive Functions
9.7 Run Time Stack

“Subprogram”: an independent, reusable program unit that performs a single logical task.

Subprogram Types:
◦ Functions: modeled after mathematical functions, which return a single value; e.g., f(x) = x^2 + x
  Purpose: use in an expression; e.g., y = f(x) * x;
◦ Subroutines: do not return values directly; they work through side effects; e.g., compute(X, Y)
  Called as independent statements.

Value-returning functions:
◦ The “non-void functions/methods” in C/C++/Java
◦ Called from within an expression; e.g., x = (b*b - sqrt(4*a*c))/2*a

Non-value-returning functions:
◦ Known as “procedures” in Ada, “subroutines” in Fortran, “void functions/methods” in C/C++
◦ Called as a separate statement; e.g., strcpy(s1, s2);

Fig 9.1: Example C/C++ Program

    int h, i;
    void B(int w) {
        int j, k;
        i = 2*w;
        w = w+1;
    }
    void A(int x, int y) {
        bool i, j;
        B(h);
    }
    int main() {
        int a, b;
        h = 5; a = 3; b = 2;
        A(a, b);
    }

Definitions
◦ A parameter is an identifier that appears in a function declaration.
◦ An argument is an expression that appears in a function call.

Example: in Figure 9.1
◦ The function declaration A(int x, int y) has parameters x and y.
◦ The call A(a, b) has arguments a and b.

Arguments are usually matched to parameters by number and by position.
◦ I.e., any call to A must have two arguments, and they must match the corresponding parameters’ types.

Exceptions:
◦ Python: parameters aren’t typed.
◦ Perl: parameters aren’t declared in a function header. Instead, parameters are stored in an array @_, and are accessed using an array index.
◦ Ada: arguments and parameters can be linked by name; e.g., the call A(y=>b, x=>a) is the same as A(a,b).
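The terminology above can be sketched in C++. This is a minimal illustration, not part of Figure 9.1: the names power, recordPower, and lastResult are hypothetical, and power assumes a non-negative exponent. It shows a value-returning function used inside an expression, a void function called as a statement that works through a side effect on a global, and positional matching of arguments to parameters.

```cpp
#include <cassert>

int lastResult = 0;  // hypothetical global, changed only by recordPower

// Value-returning function: 'base' and 'exponent' are its parameters.
// Assumes exponent >= 0.
int power(int base, int exponent) {
    int result = 1;
    for (int k = 0; k < exponent; ++k) {
        result = result * base;
    }
    return result;
}

// Non-value-returning function: called as a separate statement and
// works through a side effect on the global lastResult.
void recordPower(int base, int exponent) {
    lastResult = power(base, exponent);
}

// Arguments are matched to parameters by position, so swapping the
// arguments changes the meaning of the call.
int positionalDifference() {
    int a = 2, b = 5;
    return power(a, b) - power(b, a);  // 2^5 - 5^2
}
```

Here power(a, b) binds a to base and b to exponent; the reversed call power(b, a) is a different computation, which is why positional matching requires care.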
Parameter passing mechanisms:
◦ By value
◦ By reference
◦ By value-result
◦ By result
◦ By name

Pass by value: Compute the value of the argument at the time of the call and assign that value to the parameter.
◦ E.g., in the call A(a, b) in Fig. 9.1, a and b are passed by value, so the values of parameters x and y become 3 and 2, respectively, when the call begins.
◦ Pass by value doesn’t allow the called function to modify an argument’s value in the caller’s environment.
◦ Technically, all arguments in C and Java are passed by value. But references (addresses of arguments) can be passed to allow argument values to be modified.

Pass by reference: Compute the address of the argument at the time of the call and assign it to the parameter.

    int h, i;
    void B(int* w) {
        int j, k;
        i = 2*(*w);
        *w = *w+1;
    }
    void A(int* x, int* y) {
        bool i, j;
        B(&h);
    }
    int main() {
        int a, b;
        h = 5; a = 3; b = 2;
        A(&a, &b);
    }

Pass by reference means the memory address of the argument is copied to the corresponding parameter, so the parameter is an indirect reference (a pointer) to the actual argument.
Assignments to the parameter affect the value of the argument directly, rather than a copy of the value. This is an example of a side effect.

In languages like C++ that support both value and reference parameters, there must be a way to indicate which is which.
◦ In C++, this is done by preceding the parameter name in the function definition with an ampersand (&) if the parameter is a reference parameter. Otherwise, it is a value parameter.

Pass by value-result: Pass by value at the time of the call and copy the result back to the argument at the end of the call.
◦ E.g., Ada’s in out parameter can be implemented as value-result.
◦ Value-result is often called copy-in-copy-out.

Pass by result: Copy the final value of the parameter out to the argument at the end of the function call.

Reference and value-result are the same, except when aliasing occurs.
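The mechanisms above can be compared side by side in C++. This is a sketch under stated assumptions: C++ has no built-in value-result mode, so incBothByValueResult simulates copy-in-copy-out by hand with local copies, and all function names here are hypothetical. The last two functions differ only when the same variable is passed for both parameters.

```cpp
#include <cassert>

// Pass by value: w is a copy, so the caller's argument is unchanged.
void incByValue(int w) { w = w + 1; }

// Pass by reference: w refers to the caller's argument itself.
void incByReference(int& w) { w = w + 1; }

// Two reference parameters: each assignment updates the argument directly.
void incBothByReference(int& x, int& y) {
    x = x + 1;
    y = y + 1;
}

// Simulated value-result (copy-in-copy-out), written by hand because
// C++ has no such parameter mode.
void incBothByValueResult(int& xArg, int& yArg) {
    int x = xArg;  // copy in
    int y = yArg;
    x = x + 1;
    y = y + 1;
    xArg = x;      // copy out
    yArg = y;
}
```

With distinct arguments a = 1 and b = 1, both two-parameter versions leave a = 2 and b = 2. Passing the same variable twice separates them: incBothByReference(a, a) increments a twice, giving 3, while incBothByValueResult(a, a) copies in 1 twice and copies out 2 twice, giving 2.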
Aliasing: referring to the same variable by two names; e.g.,
◦ The same variable is both passed as an argument and referenced globally from the called function.
◦ The same variable is passed to two different parameters using a parameter method other than pass by value.
◦ Two pointers hold the same location.

Example parameter aliases in C++:

    void shift(int &a, int &b, int &c) {
        a = b;
        b = c;
    }

◦ The result of shift(x, y, z) is that x is set to y and y is set to z.
◦ The result of shift(x, y, x) is that x is set to y but y is unchanged, because c aliases a, which already holds y’s value.

C++ (reference parameters):

    void f (int& x, int& y) {
        x = x + 1;
        y = y + 1;
    }

Ada (in out parameters):

    procedure f (x, y: in out Integer) is
    begin
        x := x + 1;
        y := y + 1;
    end f;

f(a,b) versus f(a,a), starting from a = 1 and b = 1 (the starting values implied by the results):

                                    f(a,b)          f(a,a)
    C++: reference parameters       a = 2, b = 2    a = 3, b = 1
    Ada: in-out parameters          a = 2, b = 2    a = 2, b = 1

Pass by name: Textually substitute the argument for every instance of its corresponding parameter in the function body.
◦ Originated with Algol 60, but was dropped by Algol’s successors (Pascal, Ada, Modula).
◦ Exemplifies late binding, since evaluation of the argument is delayed until its occurrence in the function body is actually executed.
◦ Associated with lazy evaluation in functional languages (see, e.g., Haskell discussion in Chapter 14).

    procedure swap(a, b);
        integer a, b;       // declare parameter types
    begin
        integer t;          // declare local variable
        t = a;              // t = i      (t = 3)
        a = b;              // i = a[i]   (i = 1)
        b = t;              // a[i] = t   (a[1] = 3)
    end;

Consider the call swap(i, a[i]) where i = 3 and a = 9 4 -1 1 14 (indices start at 0, so a[3] = 1).
Instead of the expected result i = 1 and a[3] = 3, we get i = 1 and a[1] = 3: by the time b is assigned, i has already changed, so a[i] names a different element.

C/C++ macros (also called defines) adopt call by name.
◦ For example: #define max(a,b) ( (a)>(b) ? (a) : (b) )
◦ A "call" to the macro replaces the macro by the body of the macro (called macro expansion); for example, max(n+1, m) is replaced by ((n+1)>(m)?(n+1):(m)) in the program text.
◦ Macro expansion is applied to the program source text and amounts to the substitution of the formal parameters with the actual parameter expressions.

Some methods of parameter passing cause side effects, in this case meaning a change to the nonlocal environment.
◦ Call by value is “safe”: passing the argument causes no side effects.
◦ Pass by reference can cause side effects.

Side effects may compromise readability and reliability.
◦ Example: p = (y*z) + f(x, y);
◦ If y is a reference parameter, results could depend on the operand evaluation order, which is not specified in any grammar.

Using global variables in functions is dangerous. Parameter lists and calls using actual arguments clarify the effects of the function.

Example: p = (y*x) + f(x, y);
Suppose f(x, y) returns x + y and increments y. Assume that when the call executes, x = 2 and y = 3.
◦ Sub-expressions evaluated left-to-right: y*x = 6, f(x,y) returns 5 and sets y to 4, so p = 6 + 5 = 11.
◦ Sub-expressions evaluated right-to-left: f(2, 3) sets y to 4 and returns 5, y*x = 4 * 2 = 8, so p = 8 + 5 = 13.
Remember there are no grammar rules that specify the order of evaluation between (y*x) and f(x,y).

Activation Records And The Run-time Stack

Two kinds of subprograms:
◦ Those that act like a mathematical function
◦ Those that act by causing side effects

Parameter passing mechanisms:
◦ Call by value
◦ Call by reference
◦ Call by value-result
◦ Call by result
◦ Call by name

Dangers of side effects

Types of Data Storage
• Static: permanent allocation (e.g., h and i in the sample program)
• Stack (stack-dynamic allocation): storage that contains information about memory allocated due to function calls. When a function is called, a block of storage is pushed onto the stack; when the function exits, this storage is popped.
• Heap: storage that contains dynamically allocated (pointer-referenced) objects, allocated and deallocated in a less predictable order (dynamic memory allocation). More about this in Chapter 10.

• Stack storage: a collection of activation records.
• Function activation: a particular execution of a function.
• If the function is recursive, several activations may exist at the same time.
• When a function is called, storage is allocated to hold information about that activation; when the function terminates, the storage is deallocated. (The stack top pointer is adjusted.)
• What should be in the activation records?

• Function call semantics requires the program to
  ◦ Save the state of the calling function
  ◦ Compute and pass parameters and the return address
  ◦ Pass control to the function
• Function return semantics
  ◦ Values of pass-by-value-result or out-parameters are copied back to the arguments.
  ◦ For value-returning functions, the returned value is made available to the caller.
  ◦ Restore state and pass control to the calling function.

Activation record (AR): a block of information associated with each function call, which includes some or all of:
• Parameters and local variables
• Return address
• Saved registers
• Temporary variables
• Return value (if any)
• Static link
• Dynamic link
Usually the format of the AR is known at compile time.

Static link: points to the bottom of the Activation Record (AR) of the static parent; used to resolve non-local references.
◦ Needed in languages that allow nested function definitions (Pascal, Algol, Ada, Java’s inner classes) and for languages that have global variables or nested blocks.
◦ Static link reflects static scope rules.

Dynamic link: points to the top of the AR of the calling function; used to reset the runtime stack.

[Figure: Simplified structure of a Called Method’s Stack Frame]

Activation records are created, based on a template prepared by the compiler, when a function (or block) is entered, and deleted when the function returns to its caller.
The stack is a natural structure for storing the activation records (sometimes called stack frames).
The AR at the top of the stack contains information about the currently executing function/block.

Early languages did not use this approach: all data needed for a function’s activation was allocated statically at compile time.
Result: only one set of locations for each function
◦ One set of locations for parameters
◦ One set of locations for local variables
◦ One set of locations for return addresses
◦ Etc.
What about recursive functions?

A function that can call itself, either directly or indirectly, is a recursive function; e.g.,

    int factorial (int n) {
        if (n < 2)
            return 1;
        else return n*factorial(n-1);   // recursive self-call
    }

When the first call is made, an activation record is created to hold its information.
Each recursive call from the else branch causes another activation record to be added.

When a function call is made, the runtime system
◦ Allocates space for the stack frame (activation record) by adjusting the stack top pointer
◦ Stores argument values (if any) in the frame
◦ Stores the return address
◦ Stores a pointer to the static memory or enclosing scope (the static link)
◦ Stores a pointer to the stack frame of the calling method (the dynamic link)

Figure 9.5: Structure of a Called Function’s Activation Record
    Static Link
    Dynamic Link
    Parameters
    Local Variables
    Return Address
    Saved Registers
    Temporary Values
    Return Value

Consider the call factorial(3).
• This places one activation record onto the stack and generates a second call factorial(2).
• This call generates the call factorial(1), so that the stack has three activation records.
Another call, say factorial(6), would require 6 ARs.
With static storage allocation (no stack), there is only one AR per function, so recursion isn’t supported.

[Figure: stack of activation records during factorial(3); link fields represented by blank entries]

    Stack contents (bottom to top)      Event
    n = 3                               First call
    n = 3, n = 2                        Second call
    n = 3, n = 2, n = 1                 Third call returns 1
    n = 3, n = 2                        Second call returns 2*1 = 2
    n = 3                               First call returns 3*2 = 6

Consider the program from Figure 9.1: main calls A, A calls B.
The stack grows and shrinks based on the dynamic calling sequence.
On the next slide, we see the stack when B is executing.
As each function finishes, its AR is popped from the stack.

Figure 9.8: Run-Time Stack with Stack Frames for Method Invocations
(Note: h shouldn’t be undefined; it is initialized when a and b are.)
Three versions of the stack: one after main() is called but before it calls A, one after A is called, one after B is called.
Consider lifetime and scope. Any variable not in the current activation record must be in static memory to be in scope.

Passing an Argument by Reference Example
Suppose, in our sample program, w had been a reference parameter.
Now, when A calls B and passes in h as an argument, the address of h is copied onto the stack.
The statement w = w + 1 will change the actual value of h.

Static versus dynamic scoping
Concrete syntax for functions

Static links implement static scoping (nested scopes):
◦ In statically scoped languages, when B assigns to i, the reference is to the global i.
Dynamic scoping is based on the calling sequence, shown in the dynamic linkage.
◦ In dynamically scoped languages, when B assigns to i, the reference would be to the i defined in A (most recent in the calling chain).
In either case the links allow a function to refer to non-local variables.

    Program          → { Type Identifier FunctionOrGlobal } MainFunction
    Type             → int | boolean | float | char | void
    FunctionOrGlobal → ( Parameters ) { Declarations Statements } | Global
    Parameters       → [ Parameter { , Parameter } ]
    Global           → { , Identifier } ;
    MainFunction     → int main ( ) { Declarations Statements }

    Statement        → ; | Block | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
    CallStatement    → Call ;
    ReturnStatement  → return Expression ;

    Factor           → Identifier | Literal | ( Expression ) | Call
    Call             → Identifier ( Arguments )
    Arguments        → [ Expression { , Expression } ]
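To make the grammar rules above concrete, here is a small program in the shape they describe, written as C++. It is only a sketch: the names total, square, addToTotal, and run are hypothetical, and run stands in for the grammar's MainFunction (int main ( ) { ... }) so that its result can be inspected by a caller. Comments name the grammar rule each construct is meant to illustrate.

```cpp
#include <cassert>

int total;                      // Type Identifier Global

int square(int n) {             // Type Identifier ( Parameters ) { ... }
    return n * n;               // ReturnStatement: return Expression ;
}

void addToTotal(int amount) {   // void function: no value returned
    total = total + amount;     // Assignment to the global
}

int run() {                     // stands in for MainFunction
    int x;                      // Declarations come before Statements
    total = 0;
    x = square(3);              // Call used as a Factor in an Expression
    addToTotal(x + 1);          // CallStatement: Call ;
    return total;
}
```

The two calls show the two places the grammar allows a Call: inside an expression, through the Factor rule, and as a statement of its own, through the CallStatement rule. Calling run() leaves total at 10.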