Lecture 10: *&!%[]++ CS216: Program and Data Representation

advertisement
CS216: Program and Data Representation
University of Virginia Computer Science
Spring 2006
David Evans
Lecture 10:
*&!%[]++
http://www.cs.virginia.edu/cs216
C Bounds Non-Checking
int main (void) {
int x = 9;
char s[4];
}
gets(s);
printf ("s is: %s\n“, s);
printf ("x is: %d\n“, x);
Note: your results
may vary
(depending on
machine, compiler,
what else is
running, time of
day, etc.). This is
what makes C fun!
> gcc -o bounds bounds.c
> bounds
abcdefghijkl
(User input)
s is: abcdefghijkl
x is: 9
> bounds
abcdefghijklm
s is: abcdefghijklmn
x is: 1828716553
= 0x6d000009
> bounds
abcdefghijkln
s is: abcdefghijkln
x is: 1845493769 = 0x6e000009
> bounds
aaa... [a few thousand characters]
crashes shell
What does this kind of mistake
look like in a popular server?
UVa CS216 Spring 2006 - Lecture 10: Pointers
2
Code Red
UVa CS216 Spring 2006 - Lecture 10: Pointers
3
Reasons Not to Use C
• No bounds checking
– Programs are vulnerable to buffer
overflow attacks
• No automatic memory management
– Lots of extra work to manage memory
manually
– Mistakes lead to hard to find and fix bugs
• No support for data abstraction,
objects, exceptions
UVa CS216 Spring 2006 - Lecture 10: Pointers
4
So, why would anyone
use C today?
UVa CS216 Spring 2006 - Lecture 10: Pointers
5
Good Reasons to Use C
• Legacy Code: Linux, apache, etc.
• Simple, small
– Embedded systems, often only have a C compiler
• Low-level abstractions
– Performance: typically 20-30x faster than
interpreted Python
– Sometimes we need to manipulate machine state
directly: device drivers
• Lots of experience
– We know pitfalls for C programming
– Tools available that catch them
UVa CS216 Spring 2006 - Lecture 10: Pointers
6
What are those arrows really?
Stack
s = “hello”
Heap
s
“hello
”
UVa CS216 Spring 2006 - Lecture 10: Pointers
7
Pointers
• In Python, an object reference is
really just an address in memory
Heap
Stack
0x80496f0
0x80496f4
0x80496f8
s
0x80496f8
0x80496fb
hell
o\0\0\0
0x8049700
0x8049704
0x8049708
UVa CS216 Spring 2006 - Lecture 10: Pointers
8
Pointers in C
• Addresses in memory
• Programs can manipulate addresses
directly
&expr
*expr
Evaluates to the address of the
location expr evaluates to
Evaluates to the value stored in
the address expr evaluates to
UVa CS216 Spring 2006 - Lecture 10: Pointers
9
&*%&@#*!
int f
int
int
int
int
int
(void) {
s = 1;
t = 1;
*ps = &s;
**pps = &ps;
*pt = &t;
s == 1, t == 1
**pps = 2;
s == 2, t == 1
}
pt = ps;
*pt = 3;
t = s;
s == 3, t == 1
s == 3, t == 3
UVa CS216 Spring 2006 - Lecture 10: Pointers
10
Rvalues and Lvalues
What does = really mean?
int f (void) {
int s = 1; left side of = is an “lvalue”
it evaluates to a location (address)!
int t = 1;
right side of = is an “rvalue”
it evaluates to a value
t = s;
t = 2;
There is an implicit *
}
when a variable is
used as an rvalue!
UVa CS216 Spring 2006 - Lecture 10: Pointers
11
Parameter Passing in C
• Actual parameters are rvalues
void swap (int a, int b) {
int tmp = b; b = a; a = tmp;
}
int main (void) {
int i = 3;
int j = 4;
swap (i, j);
The value of i (3) is passed, not its location!
…
swap does nothing
}
UVa CS216 Spring 2006 - Lecture 10: Pointers
12
Parameter Passing in C
void swap (int *a, int *b) {
int tmp = *b; *b = *a; *a = tmp;
}
int main (void) {
int i = 3;
int j = 4;
swap (&i, &j);
The value of &i is passed, which is the address of i
…
}
Is it possible to define swap in Python?
UVa CS216 Spring 2006 - Lecture 10: Pointers
13
int *value (void)
{
int i = 3;
return &i;
}
Beware!
void callme (void)
{
int x = 35;
}
int main (void) {
int *ip;
ip = value ();
printf (“*ip == %d\n", *ip);
callme ();
printf ("*ip == %d\n", *ip);
}
UVa CS216 Spring 2006 - Lecture 10: Pointers
*ip == 3
*ip == 35
But it could really be anything!
14
Manipulating Addresses
char s[6];
s[0] = ‘h’;
s[1] = ‘e’;
s[2]= ‘l’;
s[3] = ‘l’;
s[4] = ‘o’;
s[5] = ‘\0’;
printf (“s: %s\n”, s);
expr1[expr2] in C is just
syntactic sugar for
*(expr1 + expr2)
s: hello
UVa CS216 Spring 2006 - Lecture 10: Pointers
15
Obfuscating C
char s[6];
*s = ‘h’;
*(s + 1) = ‘e’;
2[s] = ‘l’;
3[s] = ‘l’;
*(s + 4) = ‘o’;
5[s] = ‘\0’;
printf (“s: %s\n”, s);
s: hello
UVa CS216 Spring 2006 - Lecture 10: Pointers
16
Fun with Pointer Arithmetic
int match (char *s, char *t) {
int count = 0;
while (*s == *t) { count++; s++; t++; }
return count;
}
int main (void)
{
char s1[6] = "hello"; The \0 is invisible!
char s2[6] = "hohoh";
}
&s2[1]
&(*(s2 + 1))
 s2 + 1
printf ("match: %d\n", match (s1, s2));
printf ("match: %d\n", match (s2, s2 + 2));
printf ("match: %d\n", match (&s2[1], &s2[3]));
UVa CS216 Spring 2006 - Lecture 10: Pointers
match: 1
match: 3
match: 2
17
Condensing match
int match (char *s, char *t) {
int count = 0;
while (*s == *t) { count++; s++; t++; }
return count;
}
int match (char *s, char *t) {
char *os = s;
while (*s++ == *t++);
return s – os - 1;
}
s++ evaluates to spre, but changes the value of s
Hence, C++ has the same value as C, but has
unpleasant side effects.
UVa CS216 Spring 2006 - Lecture 10: Pointers
18
Quiz
• What does s = s++; do?
It is undefined!
If your C programming contains
it, a correct interpretation of your
program could make s = spre + 1,
s = 37, or blow up the computer.
UVa CS216 Spring 2006 - Lecture 10: Pointers
19
Type Checking in C
• Java: only allow programs the
compiler can prove are type safe
Exception: run-time type errors for downcasts
and array element stores.
• C: trust the programmer. If she
really wants to compare apples and
oranges, let her.
• Python: don’t trust the programmer
or compiler – check everything at
runtime.
UVa CS216 Spring 2006 - Lecture 10: Pointers
20
Type Checking
int main (void) {
char *s = (char *) 3;
printf ("s: %s", s);
}
Windows XP (SP 2)
UVa CS216 Spring 2006 - Lecture 10: Pointers
21
Type Checking
int main (void) {
char *s = (char *) 3;
printf ("s: %s", s);
}
Windows 2000
(earlier versions of Windows would just crash the whole machine)
UVa CS216 Spring 2006 - Lecture 10: Pointers
22
Python’s List Implementation
(A Whirlwind Tour)
http://svn.python.org/view/python/
trunk/Objects/listobject.c
UVa CS216 Spring 2006 - Lecture 10: Pointers
23
listobject.c
/* List object implementation */
We’ll get back to this…
#include "Python.h"
but you should be convinced that
#ifdef STDC_HEADERS
you are lucky to have been using
#include <stddef.h>
#else
a language with automatic
#include <sys/types.h> /* For size_t */
memory management so far!
#endif
/* Ensure ob_item has room for at least newsize elements, and set ob_size to newsize. If newsize
> ob_size on entry, the content of the new slots at exit is undefined heap trash; it's the caller's
responsiblity to overwrite them with sane values. The number of allocated elements may grow,
shrink, or stay the same. Failure is impossible if newsize <= self.allocated on entry, although that
partly relies on an assumption that the system realloc() never fails when passed a number of
bytes <= the number of bytes last allocated (the C standard doesn't guarantee this, but it's hard
to imagine a realloc implementation where it wouldn't be true).
Note that self->ob_item may change, and even if newsize is less than ob_size on entry.
*/
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
…
UVa CS216 Spring 2006 - Lecture 10: Pointers
24
listobject.h
typedef struct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants:
*
0 <= ob_size <= allocated
Now
we know
our answer to PS1 #6 (Python’ s
*
len(list)
== ob_size
*
ob_item
== NULL impliesis
ob_size
== allocated ==
implementation
continuous)
is 0correct!
* list.sort() temporarily sets allocated to -1 to detect mutations.
*
* Items must normally not be NULL, except during construction when
* the list is not yet visible outside the function that builds it.
*/
list
Py_ssize_t allocated;
} PyListObject;
http://svn.python.org/view/python/trunk/Include/listobject.h
UVa CS216 Spring 2006 - Lecture 10: Pointers
25
Append
int
PyList_Append(PyObject *op, PyObject *newitem)
{
if (PyList_Check(op) && (newitem != NULL))
return app1((PyListObject *)op, newitem);
PyErr_BadInternalCall();
return -1;
}
UVa CS216 Spring 2006 - Lecture 10: Pointers
26
app1
static int
app1(PyListObject *self, PyObject *v)
{
Py_ssize_t n = PyList_GET_SIZE(self);
assert (v != NULL);
if (n == INT_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
if (list_resize(self, n+1) == -1)
return -1;
}
Py_INCREF(v);
PyList_SET_ITEM(self, n, v);
return 0;
UVa CS216 Spring 2006 - Lecture 10: Pointers
Checks there is
enough space to
add 1 more element
(and resizes if
necessary)
27
app1
static int
app1(PyListObject *self, PyObject *v)
{
Py_ssize_t n = PyList_GET_SIZE(self);
assert (v != NULL);
if (n == INT_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to list");
return -1;
}
if (list_resize(self, n+1) == -1)
return -1;
}
Py_INCREF(v);
PyList_SET_ITEM(self, n, v);
return 0;
UVa CS216 Spring 2006 - Lecture 10: Pointers
Complicated macro
for memory
management: needs to
keep track of how many
references there are to
object v
28
Set Item (in listobject.h)
/* Macro, trading safety for speed */
#define PyList_SET_ITEM(op, i, v) \
(((PyListObject *)(op))->ob_item[i] = (v))
Macro: text replacement, not
procedure calls.
PyList_SET_ITEM(self, n, v);
(((PyListObject *)(self))->ob_item[n] = (v))
UVa CS216 Spring 2006 - Lecture 10: Pointers
29
Set Item (in listobject.h)
(((PyListObject *)(self))->ob_item[n] = (v))
typedef struct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants: … */
Py_ssize_t allocated;
} PyListObject;
Now we can be (slightly) more confident that our
answer to PS1 #4 (append is O(1)) was correct!
UVa CS216 Spring 2006 - Lecture 10: Pointers
30
list_resize
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
...
/* This over-allocates proportional to the
list size, making room for additional growth.
The over-allocation is mild, but is enough to
give linear-time amortized behavior over
a long sequence of appends()... */
Monday’s class will look at list_resize
UVa CS216 Spring 2006 - Lecture 10: Pointers
31
Charge
• This is complicated, difficult code
– We could (but won’t) spend the rest of the
semester without understanding it all completely
• Now we trust PS1 #4
– But...only amortized O(1) – some appends will be
worse than average!
– We shouldn’t trust Python’s developers’ comments
• Exam 1 is out now, due Monday
– Work alone, read rules on first page carefully
• No regularly scheduled Small Hall and office
hours while Exam 1 is out
UVa CS216 Spring 2006 - Lecture 10: Pointers
32
Download