Functions and Recursion

advertisement
Chapter 14/17
Functions & Activation Records
Recursion
Function


Smaller, simpler, subcomponent of program
Provides abstraction
– hide low-level details
– give high-level structure to program,
easier to understand overall program flow
– enables separable, independent development

C functions
–
–
–
–

zero or multiple arguments passed in
single result returned (optional)
return value is always the same type
by convention, only one function named main (this defines the initial PC)
In other languages, called procedures, subroutines, ...
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
2
Functions in C

A function consists of a
– Declaration (a.k.a prototype)

includes return value, function name, and the order and data type of all
arguments – names of argument are optional)
– Calls
– Definition






Names of variables do not need to match prototype, but must match
order/type
Defines functionality (source code) and returns control (and a value) to caller
int Factorial(int n){
May produce side-effects
Declaration
Call
Definition
College of Charleston
Dr. Anderson, Computer Science
int i;
int result = 1;
for (i = 1; i <= n; i++)
result *= i;
return result;
int Factorial (int n);
a = x + Factorial (f + g);
}
CS 250
Comp. Org. & Assembly
3
Why Declaration?


Since function definition also includes return and argument types, why
is declaration needed?
Use might be seen before definition.
– Compiler needs to know return and arg types and number of arguments.

Definition might be in a different file, written by a different
programmer.
– include a "header" file with function declarations only
– compile separately, link together to make executable


What if we need to pass more parameters than we have registers?!?
All the compiler needs to know
– (1) what symbol to call the function (its name),
– (2) the type of the return value (for proper usage/intermediate storage)
– (3) how to set up the stack to call the function (arguments)

This allows the compiler to inserted the necessary code for function
activation when it is called
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
4
Implementing a Function Call: Overview

Making a function call involves three basic steps
– The parameters of the call are passed from the caller to the callee
– The callee does its task
– A return value is returned to the caller

Functions in C are caller-independent
– Every function must be callable from any other function

Activation record
– information about each function, including arguments and local variables
– also includes bookkeeping information

values that must always be saved by the caller before doing the JSR
– arguments and bookeeping info pushed on run-time stack by caller
– callee responsible for other uses of stack (such as local variables)
– popped of the stack by caller after getting return value

R5 (the frame pointer) is the “bottom” of the stack as far as the callee
function definition is aware.
– It should not modify anything existing on the stack before its execution!
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
5
Implementing a Function Call: Overview
Calling function
Called function
allocate memory for activation
record (push)
copy arguments into stack
arguments
allocate space for return value
[bookkeeping] (push)
store mandatory callee save
registers [bookkeeping] (push)
allocate local variables (push)
call function
execute code
get result from stack (pop)
deallocate memory for
activation record (pop)
College of Charleston
Dr. Anderson, Computer Science
put result in return value space
deallocate local variables (pop)
load callee save registers (pop)
return
CS 250
Comp. Org. & Assembly
6
Activation Record

int funName(int a, int b)
{
int w, x, y;
R6
.
.
R5
.
return y;
bookkeeping
}
Name
a
b
w
x
y
Type
Offset
Scope
int
int
int
int
int
4
5
0
-1
-2
funName
funName
funName
funName
funName
College of Charleston
Dr. Anderson, Computer Science
y
x
w
dynamic link
return address
return value
a
b
locals
args
CS 250
Comp. Org. & Assembly
7
Activation Record Bookkeeping

Return value
– space for value returned by function
– allocated even if function does not return a value

Return address
– save pointer to next instruction in calling function
– convenient location to store R7 to protect it from change in case
another function (JSR) is called

Dynamic link
– caller’s frame pointer (basically a caller save of its own frame pointer R5)
– used to “pop” this activation record from stack

In the LC-3 the format for an activation record includes only (and
always) these three words pushed in that order!
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
8
Run-Time Stack
Memory
Memory
Callee
Caller
Before call
College of Charleston
Dr. Anderson, Computer Science
R6
R5
Memory
R6
R5
Caller
Caller
During call
After call
R6
R5
CS 250
Comp. Org. & Assembly
9
Example Function Call
int Callee (int q, int r) {
int k, m;
...
return k;
}
int Caller (int a) {
int w = 25;
w = Callee (w,10);
return w;
}
R6
R5
25
w
dyn link
ret addr
ret val
a
xFD00
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
10
The Function Call: Preparation to Jump



int Callee (int q, int r) …
int Caller (int a) …
… w = Callee(w, 10);…
; push second arg
AND R0, R0, #0
ADD R0, R0, #10
New R6
ADD R6, R6, #-1
STR R0, R6, #0
R6
; push first argument
R5
LDR R0, R5, #0
ADD R6, R6, #-1
STR R0, R6, #0
 JSR CALLEE
; Jump!
Note: Caller needs to know number and type of arguments,
xFD00
doesn't know about local variables.

College of Charleston
Dr. Anderson, Computer Science
25
10
25
q
r
w
dyn link
ret addr
ret val
a
CS 250
Comp. Org. & Assembly
11
Starting the Callee Function


; push space for return value
ADD R6, R6, #-1
; push return address
new R6
ADD R6, R6, #-1
new R5
STR R7, R6, #0
; push dyn link (caller’s R5)
ADD R6, R6, #-1
STR R5, R6, #0
R6
; set new frame pointer
ADD R5, R6, #-1
R5
; allocate space for locals
ADD R6, R6, #-2
… execute body of function
Note: Callee makes room for bookkeeping variables
(standard format) and its local variables (as necessary) xFD00
College of Charleston
Dr. Anderson, Computer Science
xFCFB
x3100
25
10
25
m
k
Dyn lnk (R5)
ret addr
ret value
q
r
w
dyn link
ret addr
ret val
a
CS 250
Comp. Org. & Assembly
12
Ending the Callee Function


return k;
; copy k into return value
LDR R0, R5, #0
STR R0, R5, #3
; pop local variables
ADD R6, R5, #1
; pop dynamic link (into R5)
LDR R5, R6, #0
ADD R6, R6, #1
; pop return addr (into R7)
LDR R7, R6, #0
ADD R6, R6, #1
; return control to caller
RET
College of Charleston
Dr. Anderson, Computer Science
R6
R5
new R6
new R5
-43
217
xFCFB
x3100
217
25
10
25
m
k
dyn link
ret addr
ret val
q
r
w
dyn link
ret addr
ret val
a
xFD00
CS 250
Comp. Org. & Assembly
13
Resuming the Caller Function

w = Callee(w,10);

; JSR Callee
; LAST COMMAND
; load return value (top of stack)
LDR R0, R6, #0
R6
; perform assignment of return value
STR R0, R5, #0
; pop return value and argumentsnew R6
R5
ADD R6, R6, #3
…
Continue caller code
217
25
10
217
ret val
q
r
w
dyn link
ret addr
ret val
a
xFD00
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
14
Summary of LC-3 Function Call
Implementation
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
Caller pushes arguments (last to first).
Caller invokes subroutine (JSR).
Callee allocates return value, pushes R7 and R5.
Callee allocates space for local variables (first to last).
Callee executes function code.
Callee stores result into return value slot.
Callee pops local vars, pops R5, pops R7.
Callee returns (JMP R7).
Caller loads return value and pops arguments.
Caller resumes computation…
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
15
Group Problem
main() {
int x = 1, y = 10, z = 20;
…
Assume JSR FOO
z = foo(x, y);
ends up at
…
address x3C00
}
int foo(int a, int b) {
int i, j;
…
Show the
i = j + b;
assembly code
…
for this
return i;
instruction.
}
R6
R5
xFCFC
xFCFD
xFCFE
xFCFF
x0014
x000A
x0001
z
y
x
dyn link
ret addr
ret val
xFD00
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
16
What is Recursion?


A recursive function is one that solves its task by calling itself on
smaller pieces of data.
– Similar to recurrence function in mathematics.
– Like iteration -- can be used interchangeably;
sometimes recursion results in a simpler solution.
Standard example: Fibonacci numbers
– The n-th Fibonacci number is the sum of the previous two Fibonacci
numbers.
– F(n) = F(n – 1) + F(n – 2) where F(1) = F(0) = 1
int Fibonacci(int n){
if ((n == 0) || (n == 1))
return 1;
else
return Fibonacci(n-1) + Fibonacci(n-2);
}
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
17
Activation Records

Whenever Fibonacci is invoked, a new activation record is pushed onto
the stack.
main calls
Fibonacci(3)
Fibonacci(3) calls
Fibonacci(2)
Fibonacci(2) calls
Fibonacci(1)
R6
Fib(1)
R6
Fib(2)
Fib(2)
Fib(3)
Fib(3)
Fib(3)
main
main
main
R6
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
18
Activation Records (cont.)
Fibonacci(1) returns,
Fibonacci(2) calls
Fibonacci(0)
Fibonacci(2) returns,
Fibonacci(3) calls
Fibonacci(1)
Fibonacci(3)
returns
R6
Fib(0)
R6
Fib(2)
Fib(1)
Fib(3)
Fib(3)
R6
main
College of Charleston
Dr. Anderson, Computer Science
main
main
CS 250
Comp. Org. & Assembly
19
Tracing the Function Calls

If we are debugging this program,
we might want to trace all the calls of Fibonacci.
– Note: A trace will also contain the arguments
passed into the function.



For Fibonacci(3), a trace looks like:
Fibonacci(3)
Fibonacci(2)
Fibonacci(1)
Fibonacci(0)
Fibonacci(1)
What would trace of Fibonacci(4) look like?
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
20
Fibonacci: LC-3 Code

Activation Record
bookkeeping
College of Charleston
Dr. Anderson, Computer Science
temp
dynamic link
return address
return value
n
local
arg
Compiler generates
temporary variable to hold
result of first Fibonacci call.
CS 250
Comp. Org. & Assembly
21
In Summary: The Stack


Since our program usually starts at a low memory address and grows
upward, we start the stack at a high memory address and work
downward.
Purposes
– Temporary storage of variables
– Temporary storage of program addresses
– Communication with subroutines




Push variables on stack
Jump to subroutine
Clean stack
Return
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
22
Parameter passing on the stack

If we use registers to pass our parameters:
– Limit of 8 parameters to/from any subroutine.
– We use up registers so they are not available to our program.

So, instead we push the parameters onto the stack.
– Parameters are passed on the stack
– Return values can be provided in registers (such as R0) or on the stack.
– Generally, only R6 should be changed by a subroutine.



Other registers that are changed should must be callee saved/restored.
Subroutines should be transparent
Both the subroutine and the main program must know how many
parameters are being passed!
– In C we would use a prototype:


int power (int number, int exponent);
In assembly, you must take care of this yourself.
After you return from a subroutine, you must also clear the stack.
– Clean up your mess!
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
23
Characteristics of good subroutines

Generality – can be called with any arguments
– Passing arguments on the stack does this.

Transparency – you have to leave the registers like you found them, except R6.
– Registers must be callee saved.


Readability – well documented.
Re-entrant – subroutine can call itself if necessary
– Store all information relevant to specific execution to non-fixed memory locations

The stack!
– This includes temporary callee storage of register values!
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
24
Know how to…



Push parameters onto the stack
Access parameters on the stack using base + offset addressing mode
Draw the stack to keep track of subroutine execution
– Parameters
– Return address

Clean the stack after a subroutine call
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
25
String Library Code
– Implementation of C/C++ function gets

No way to specify limit on number of characters to read
/* Get string from stdin */
char *gets(char *dest)
{
int c = getc();
char *p = dest;
while (c != EOF && c != '\n') {
*p++ = c;
c = getc();
}
*p = '\0';
return dest;
}
– Similar problems with other C/C++ functions


strcpy,scanf, fscanf, sscanf, when given %s conversion specification
cin (as commonly used in C++)
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
26
Vulnerable Buffer Code
/* Echo Line */
void echo()
{
char buf[4];
gets(buf);
puts(buf);
}
/* Way too small! */
int main()
{
printf("Type a string:");
echo();
return 0;
}
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
27
Buffer Overflow Executions
unix>./bufdemo
Type a string:123
123
unix>./bufdemo
Type a string:12345
Segmentation Fault
unix>./bufdemo
Type a string:12345678
Segmentation Fault
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
28
Malicious Use of Buffer Overflow
Stack after call to gets()
return
address
A
void foo(){
bar();
...
}
void bar() {
char buf[64];
gets(buf);
...
}
foo stack frame
data
written
by
gets()
B
B
pad
exploit
code
bar stack frame
– Input string contains byte representation of executable code
– Overwrite return address with address of buffer
– When bar() executes ret, will jump to exploit code
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
29
Exploits Based on Buffer Overflows


Buffer overflow bugs allow remote machines to execute arbitrary code
on victim machines.
Internet worm
– Early versions of the finger server (fingerd) used gets() to read the
argument sent by the client:

finger droh@cs.cmu.edu
– Worm attacked fingerd server by sending phony argument:



finger “exploit-code padding new-return-address”
exploit code: executed a root shell on the victim machine with a direct TCP
connection to the attacker.
IM War
– AOL exploited existing buffer overflow bug in AIM clients
– exploit code: returned 4-byte signature (the bytes at some location in the
AIM client) to server.
– When Microsoft changed code to match signature, AOL changed signature
location.
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
30
Code Red Worm

History
– June 18, 2001. Microsoft announces buffer overflow vulnerability in IIS
Internet server
– July 19, 2001. over 250,000 machines infected by new virus in 9 hours
– White house must change its IP address. Pentagon shut down public
WWW servers for day

When We Set Up CS:APP Web Site
– Received strings of form
GET
/default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN....NNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd3%u7801%u9090%u6
858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u
0003%u8b00%u531b%u53ff%u0078%u0000%u00=a
HTTP/1.0" 400 325 "-" "-"
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
31
Code Red Exploit Code
– Starts 100 threads running
– Spread self


Generate random IP addresses & send
attack string
Between 1st & 19th of month
– Attack www.whitehouse.gov

Send 98,304 packets; sleep for 4-1/2
hours; repeat
– Denial of service attack

Between 21st & 27th of month
– Deface server’s home page

After waiting 2 hours
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
32
Avoiding Overflow Vulnerability
/* Echo Line */
void echo()
{
char buf[4]; /* Way too small! */
fgets(buf, 4, stdin);
puts(buf);
}

Use Library Routines that Limit String Lengths
– fgets instead of gets
– strncpy instead of strcpy
– Don’t use scanf with %s conversion specification
 Use fgets to read the string
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
33
Practice problems


14.2, 14.4, 14.9, 14.10, 14.15 (good!)
The convention in LC-3 assembly is that all registers are callee-saved
except for R5 (the frame pointer) R6 (the stack pointer) and R7 (the
return link).
– Why is R5 not callee-saved?
– Why is R6 not callee-saved?
– Why is R7 not callee-saved?

Is it true that any problem that can be solved recursively can be solved
iteratively using a stack data structure? Why or why not?
College of Charleston
Dr. Anderson, Computer Science
CS 250
Comp. Org. & Assembly
34
Download