Advances in Programming Languages:
Addresses, pointers and reference types
Stephen Gilmore
The University of Edinburgh
January 24, 2007
Background
Computer programs manipulate data values stored in memory and in registers. Data values
held in memory have an associated address whereas data values held in registers do not. In some
programming languages addresses can be manipulated as values and in others not. Allowing
programs to modify addresses in an unrestricted way leads to unpredictable behaviour. Modern
programming languages attempt to control modification and update of addresses.
Addresses, pointers and references
C provides an address-of operator, &, which allows us to take the address of a variable. A
holder for such an address is a pointer to the data stored at that address. Given a pointer to
an address in memory, we can dereference the pointer (using *). Java provides no means to
obtain the address of an object. Caml provides references created using ref and dereferenced
(using !).
Using addresses: implementing call by reference
One use of the address-of operator is passing the address of a variable to a function. The address
can be used to refer to the contents of the variable which is the actual parameter of the call.
In this way a variable can be shared between the caller, who owns it, and the callee, who uses
it. Passing the address of a variable provides a way to implement call by reference in a call by
value language (such as C).
/* File: programs/c/Addresses.c */
#include <stdio.h>
#include <stdlib.h>
void updateByRef(int* x) { *x = *x + 13; }
void updateByValue(int x) { x += 12; }
int main() {
    int x = 3;              /* x is an integer variable */
    updateByValue(x);       /* this call has no effect on x */
    updateByRef(&x);        /* this call updates x */
    printf("x is %d\n", x); /* prints "x is 16" */
    exit(0);
}
UG4 Advances in Programming Languages — 2005/2006
Call by reference in C#
Other programming languages consider calling by reference to be an elementary part of the
language, not something which must be simulated by taking addresses and using pointers. C#
allows parameters to be passed by reference by declaring them to be reference parameters in
the method header. A value parameter is declared as int x, as in Java. A reference parameter
is declared as ref int x.
/* File: programs/cs/Addresses.cs */
using System;
class Addresses {
    static void updateByRef(ref int x) { x = x + 13; }
    static void updateByValue(int x) { x += 12; }
    static void Main() {
        int x = 3;                        /* x is an integer variable */
        updateByValue(x);                 /* this has no effect on x */
        updateByRef(ref x);               /* this call updates x */
        Console.WriteLine("x is {0}", x); /* "x is 16" */
    }
}
Handling addresses
Taking the address of a variable may seem entirely unproblematic, but there are some complications.
One complication is that programming languages such as C allow arithmetic operations to
be performed on addresses, for example to obtain an adjacent address. Another is that in
block-structured programming languages variables are declared in a block and are destroyed on
exit from that block. This can lead to incorrectly retaining stale addresses. The following C
program demonstrates one of the problems of address arithmetic: we can obtain a good address,
such as the address of the variable x, but after we move away from this address we point to a
storage location with arbitrary content, or can point to a location which we do not own.
/* File: programs/c/Pointers.c */
#include <stdio.h>
#include <stdlib.h>
int main() {
    int x = 3;                /* x is an integer variable */
    int *y = &x;              /* y is a pointer to an integer */
    y--;                      /* pointer arithmetic */
    printf("*y is %d\n", *y); /* dereference y */
    y++;                      /* pointer arithmetic */
    printf("*y is %d\n", *y); /* dereference y */
    y=0;                      /* pointer assignment */
    printf("*y is %d\n", *y); /* segmentation fault */
    exit(0);
}
Cyclone: a modern dialect of C
The Cyclone programming language is a modern dialect of C which attempts to provide C-like
constructs, and to match the efficiency of C, while providing secure type checking comparable
with that of Java and Caml. Cyclone devotes considerable attention to the use of pointers, in
order to avoid errors such as those seen above, and forbids arithmetic on *-pointers.
/* File: programs/cyclone/Pointers.cyc */
#include <stdio.h>
#include <stdlib.h>
int main() {
/* x is an integer variable */
int x = 3;
/* y is a pointer to an integer */
int *y = &x;
/* Arithmetic is not allowed on these pointers
so the compilation is faulted at this point */
y--;
exit(0);
}
Of course, pointer arithmetic is not always problematic, otherwise it would probably not be in
C at all. There are some relatively simple uses of pointer arithmetic, for example walking through
an array such as the argument vector (argv) passed to the main function. The argument count
(argc) is decremented on each iteration of the loop, and the pointer into the argument vector
is advanced. The loop terminates when argc reaches 0.
/* File: programs/c/Arguments.c */
#include <stdio.h>
/* char* denotes a string.
char** is an array of strings. */
int main(int argc, char** argv) {
    argc--; argv++; /* skip the command name */
    while (argc > 0) {
        printf("%s ",*argv); /* print a string */
        argc--; argv++;
    }
    printf("\n");
    return 0;
}
Fat pointers
One of the difficulties of using C-style pointers is that they are thin, with no additional bounds
information. We need a separate integer counter to know the length of the array. An alternative
is to package the length and content information together in an object, as Java does with its
arrays and strings.
Cyclone introduces fat pointers which retain bounds information (hence they are ‘fat’: more
than just a single address). The Cyclone notation for a fat pointer is ?. Fat pointers allow
address arithmetic, supporting the C programming model, but out-of-bounds accesses and null
pointer dereferences are trapped at run-time.
/* File: programs/cyclone/Arguments.cyc */
#include <stdio.h>
/* char? is a fat pointer to a string.
char?? is a fat pointer to an array of strings.
Both have null and bounds checks. */
int main(int argc, char?? argv) {
    argc--; argv++; /* skip command name */
    while (argc > 0) {
        printf("%s ",*argv); /* print a string */
        argc--; argv++;
    }
    printf("\n");
    return 0;
}
Efficient code and pointer safety
The reason that Cyclone supports both thin pointers and fat pointers is that it allows the
programmer to avoid unnecessary bounds checks when pointer arithmetic is not being used.
Bounds checking is an overhead at run-time so it is good to be able to avoid it sometimes, when
it is safe to do so. A language which permits only safe pointer and array use, such as Java,
inevitably inserts some run-time checks which turn out to be redundant, but in return it
guarantees memory safety, which is a huge gain.
Run-time errors with pointers
In C there are no bounds checks, so out-of-bounds errors simply corrupt unexpected areas of
memory, leading to unexpected behaviour later, possibly a segmentation fault, at a point in the
execution far from where the error in the program logic caused things to start to go wrong. This
makes diagnosing the cause of the error more difficult and more expensive.
In contrast, bounds checking on strings and arrays raises an exception when the violation
occurs. Bounds checks do not prevent run-time errors; they simply trap them when they occur.
/* File: programs/cyclone/Pointers2.cyc */
#include <stdio.h>
#include <stdlib.h>
int main() {
    /* x is an integer variable */
    int x = 3;
    /* y is a fat pointer to an integer */
    int ?y = &x;
    /* fat pointer arithmetic is allowed */
    y--;
    printf("*y is %d\n", *y); /* This attempt to
        dereference y leads to a run-time exception. */
    exit(0);
}
Cyclone error report
Uncaught exception Cyc_Array_bounds thrown from around Pointers2.cyc:13
Stale addresses
It is tempting to think of the & operator as “returning the address of a variable”, but in truth
it is more correct to think of it as “returning an address (which is currently associated with
this variable)”. As a program calls functions, a frame for each call is pushed onto the run-time
stack and the function's local variables occupy addresses in that frame. When a function exits,
its storage area is freed so that the memory may be re-used by the next function which is called.
If we attempt to return the address of a local variable in C we will be warned about this by the
compiler, but warnings, unlike errors, can be ignored.
/* File: programs/c/StaleAddresses.c */
#include <stdio.h>
#include <stdlib.h>
int* makeCounter() {
    int counter = 0;
    return &counter; /* return address of local variable */
}
int main() {
    int* p;
    p = makeCounter();
    printf("*p is %d\n", *p); /* prints "*p is 0" */
    exit(0);
}
This program compiles (with warnings) and appears to run successfully, although dereferencing
the stale pointer is undefined behaviour. An inexperienced C programmer might conclude that it
is possible to take the address of a local variable in this way, and thereby lengthen its
lifetime because it needs to be retained while we still hold a pointer to it. To see that this
is not the case, we only need to make the program a little more complex by adding in an
additional function.
/* File: programs/c/StaleAddresses2.c */
#include <stdio.h>
#include <stdlib.h>
int* makeCounter() {
    int counter = 0;
    return &counter; /* return address of local variable */
}
/* no use of pointers or addresses in this function */
void dummy() {
    int x = 13; /* simple integer variable */
}
int main() {
    int* p;
    p = makeCounter();
    printf("*p is %d\n", *p); /* prints "*p is 0" */
    dummy();
    printf("*p is %d\n", *p); /* prints "*p is 13" */
    exit(0);
}
In Cyclone both of these programs are faulted by the compiler with a type error message.
Cyclone considers the type of a variable to include information about the place where it is
allocated (and thus an integer pointer declared in the local function makeCounter cannot be
assigned to an integer pointer declared in main).
Variable lifetimes in Caml
Caml is not a block-structured language, so locally-created variables can outlive the function
invocation which created them. In Caml the idiom of creating an initialised counter and
returning it from a function can be programmed in the expected way and no memory-related errors
occur.
(* File: programs/caml/StaleAddresses2.ml *)
let makeCounter() =
let counter = ref 0 in counter;;
let dummy() =
let x = 13 in ();;
let main() =
let p = makeCounter()
in print_string("!p is " ^ string_of_int(!p) ^ "\n");
dummy(); (* does not change p: still zero *)
print_string("!p is " ^ string_of_int(!p) ^ "\n");
exit(0);;
main();;
Summary
We discussed addresses, pointers and references.
• In C we can manipulate addresses and pointers as data. There are no checks and there
are no guarantees of safety or security.
• In C# passing by reference can eliminate some uses of addresses and pointers.
• Cyclone separates thin and fat pointers. It allows arithmetic only on fat pointers and
performs bounds checking.
• In Caml and Java addresses and pointers are not exposed to the programmer.