Low-Level Programming in C

Lecture 22: Low-Level Programming in C
CS201j: Engineering Software
University of Virginia
Computer Science
David Evans
http://www.cs.virginia.edu/evans
Menu
• PS5
• C Programming Language
• Pointers in C
– Pointer Arithmetic
• Type checking in C
• Why is garbage collection hard in C?
20 November 2003
CS 201J Fall 2003
PS5
• Will return in section tomorrow
• Some very impressive projects!
– Will be posted on the course web site soon
– Many people demonstrated ability to figure
out complicated new things on their own (not
a requirement for PS5)
• Stapling penalty for PS6 will be 25 points
Programming Languages Phylogeny
Fortran (1954)
Algol (1958)
LISP (1957)
Scheme (1975)
CPL (1963), U Cambridge
Combined Programming Language
Simula (1967)
BCPL (1967), MIT
Basic Combined Programming Language
B (1969), Bell Labs
C (1970), Bell Labs
C++ (1983), Bell Labs
Objective C
Java (1995), Sun
C Programming Language
• Developed to build Unix operating system
• Main design considerations:
– Compiler size: needed to run on PDP-11 with
24KB of memory (Algol60 was too big to fit)
– Code size: needed to implement the whole
OS and applications with little memory
– Performance
– Portability
• Little (if any) consideration:
– Security, robustness, maintainability
C Language
• No support for:
– Array bounds checking
– Null dereferences checking
– Data abstraction, subtyping, inheritance
– Exceptions
– Automatic memory management
• Program crashes (or worse) when something
bad happens
• Lots of syntactically legal programs have
undefined behavior
Example C Program
void test (int x) {
  while (x = 1) {
    printf ("I'm an imbecile!");
    x = x + 1;
  }
}
Weak type checking:
In C, there is no boolean type.
Any value can be the test expression.
x = 1 assigns 1 to x, and has the value 1.
I’m an imbecile!
I’m an imbecile!
I’m an imbecile!
I’m an imbecile!
I’m an imbecile!
In Java:
void test (int x) {
  while (x = 1) {
    printf ("I'm an imbecile!");
    x = x + 1;
  }
}
> javac Test.java
Test.java:21: incompatible types
found : int
required: boolean
while (x = 1) {
^
1 error
Type Checking isn’t Enough…
void test (boolean x) {
  while (x = true) {
    printf ("I'm an imbecile!");
    x = !x;
  }
}
Language                                                  Assignment
Fortran (1954)                                            LET
Algol (1958)                                              :=
CPL (1963), U Cambridge, Combined Programming Language    :=
BCPL (1967), MIT, Basic Combined Programming Language     :=
B (1969), Bell Labs                                       =
C (1970), Bell Labs                                       =
C++ (1983), Bell Labs                                     =
Java (1995), Sun                                          =
= vs. :=
• Why does Java use = for assignment?
– Algol (designed for elegance for presenting
algorithms) used :=
– CPL and BCPL based on Algol, used :=
– Thompson and Ritchie had a small computer to
implement B, saved space by using = instead
– C was successor to B (also on small computer)
– C++’s main design goal was backwards
compatibility with C
– Java’s main design goal was surface similarity
with C++
C/C++ Bounds NonChecking
#include <iostream.h>
int main (void) {
  int x = 9;
  char s[4];
  cin >> s;
  cout << "s is: " << s << endl;
  cout << "x is: " << x << endl;
}
> g++ -o bounds bounds.cc
> bounds
cs          (User input)
s is: cs
x is: 9
> bounds
cs201
s is: cs201
x is: 49
> bounds
cs201j
s is: cs201j
x is: 27185
> bounds
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
s is: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
x is: 1633771873
Segmentation fault (core dumped)
So, why would anyone use C today?
Reasons to Use C
• Legacy Code
– Linux, most open source applications are in C
• Simple to write compiler
– Programming embedded systems, often only
have a C compiler
• Performance
– Typically 50x faster than interpreted Java
• Smaller, simpler, lots of experience
User-Defined Structure Types
• Use struct to group data
• Dot (.) operator to access fields of a struct
• Fields are accessible everywhere (no way
to make them private)
typedef struct {
  char name[10];
  int count;
} Tally;
Abstract Types in C
• How can we get most of the benefits of
data abstraction in C?
• Distinguish between client code and implementation code
• In client code:
– Check types by name instead of by structure
– Don’t allow client code to depend on the representation of a type:
make struct fields inaccessible and don’t allow use of C operators
Enforcing Abstract Types
• Implementation Code
– Where datatype is defined (also naming
conventions to allow access)
– Rep and abstract type are interchangeable
• Client Code
– Everywhere else
– ADT is type name only: cannot access fields,
use C operators, treat as rep
– Only manipulate by passing to procedures
What are those arrows really?
[Figure: sb on the stack, with an arrow to the string “hello” on the heap]
Pointers
• In Java, an object reference is really just
an address in memory
– But Java doesn’t let programmers manipulate
addresses directly (unless they have a hair
dryer to break type safety)
[Figure: memory diagram labeled Stack and Heap, with addresses 0x80496f0 through 0x8049708. sb holds the address 0x80496f8; the bytes stored there are “hell” followed by “o\0\0\0”.]
Pointers in C
• Addresses in memory
• Programs can manipulate addresses
directly
&expr    Evaluates to the address of the location expr evaluates to
*expr    Evaluates to the value stored in the address expr evaluates to
&*%&@#*!
int f (void) {
  int s = 1;
  int t = 1;
  int *ps = &s;
  int **pps = &ps;
  int *pt = &t;      /* s == 1, t == 1 */
  **pps = 2;         /* s == 2, t == 1 */
  pt = ps;
  *pt = 3;           /* s == 3, t == 1 */
  t = s;             /* s == 3, t == 3 */
  return t;
}
Rvalues and Lvalues
What does = really mean?
int f (void) {
  int s = 1;
  int t = 1;
  t = s;
  t = 2;
}
The left side of = is an “lvalue”: it evaluates to a location (address)!
The right side of = is an “rvalue”: it evaluates to a value.
There is an implicit * when a variable is used as an rvalue!
Parameter Passing in C
• Actual parameters are rvalues
void swap (int a, int b) {
  int tmp = b; b = a; a = tmp;
}
int main (void) {
  int i = 3;
  int j = 4;
  swap (i, j);   /* The value of i (3) is passed, not its location! */
  …              /* swap does nothing */
}
Parameter Passing in C
• Can pass addresses around
void swap (int *a, int *b) {
  int tmp = *b; *b = *a; *a = tmp;
}
int main (void) {
  int i = 3;
  int j = 4;
  swap (&i, &j);   /* The value of &i is passed, which is the address of i */
  …
}
Beware!
int *value (void)
{
  int i = 3;
  return &i;
}
void callme (void)
{
  int x = 35;
}
int main (void) {
  int *ip;
  ip = value ();
  printf ("*ip == %d\n", *ip);
  callme ();
  printf ("*ip == %d\n", *ip);
}
*ip == 3
*ip == 35
But it could really be anything!
Manipulating Addresses
char s[6];
s[0] = 'h';
s[1] = 'e';
s[2] = 'l';
s[3] = 'l';
s[4] = 'o';
s[5] = '\0';
printf ("s: %s\n", s);
s: hello
expr1[expr2] in C is just syntactic sugar for *(expr1 + expr2)
Obfuscating C
char s[6];
*s = 'h';
*(s + 1) = 'e';
2[s] = 'l';
3[s] = 'l';
*(s + 4) = 'o';
5[s] = '\0';
printf ("s: %s\n", s);
s: hello
Fun with Pointer Arithmetic
int match (char *s, char *t) {
  int count = 0;
  while (*s == *t) { count++; s++; t++; }
  return count;
}
int main (void)
{
  char s1[6] = "hello";   /* The \0 is invisible! */
  char s2[6] = "hohoh";
  printf ("match: %d\n", match (s1, s2));           /* match: 1 */
  printf ("match: %d\n", match (s2, s2 + 2));       /* match: 3 */
  printf ("match: %d\n", match (&s2[1], &s2[3]));   /* match: 2 */
}
&s2[1] is &(*(s2 + 1)), which is s2 + 1
Condensing match
int match (char *s, char *t) {
  int count = 0;
  while (*s == *t) { count++; s++; t++; }
  return count;
}
int match (char *s, char *t) {
  char *os = s;
  while (*s++ == *t++);
  return s - os - 1;
}
s++ evaluates to the value s had before the increment, but changes the value of s.
Hence, C++ has the same value as C, but has unpleasant side effects.
Type Checking in C
• Java: only allow programs the compiler
can prove are type safe
Exception: run-time type errors for downcasts and array element stores.
• C: trust the programmer. If she really
wants to compare apples and oranges, let
her.
Type Checking
int main (void)
{
  char *s = (char *) 3;
  printf ("s: %s", s);
}
[Figure: the result of running this program on Windows 2000]
(earlier versions of Windows would just crash the whole machine)
In Praise of Type Checking
int match (int *s, int *t) {
  int *os = s;
  while (*s++ == *t++);
  return s - os;
}
int main (void)
{
  char s1[6] = "hello";
  char s2[6] = "hello";
  printf ("match: %d\n", match (s1, s2));
}
match: 2
Different Matching
int different (int *s, int *t) {
  int *os = s;
  while (*s++ != *t++);
  return s - os;
}
int main (void)
{
  char s1[6] = "hello";
  printf ("different: %d\n", different ((int *)s1, (int *)s1 + 1));
}
different: 29
So, why is it hard to garbage collect C?
Mark and Sweep (Java version)
active = all objects on stack
while (!active.isEmpty ())
   newactive = { }
   foreach (Object a in active)
      mark a as reachable
      foreach (Object o that a points to)
         if o is not marked
            newactive = newactive U { o }
   active = newactive
sweep () // remove unmarked objects on heap
Mark and Sweep (C version?)
active = all pointers on stack
while (!active.isEmpty ())
   newactive = { }
   foreach (pointer a in active)
      mark *a as reachable
      foreach (address p that a points to)
         if *p is not marked
            newactive = newactive U { *p }
   active = newactive
sweep () // remove unmarked objects on heap
GC Challenges
char *f (void) {
  char *s = (char *) malloc (sizeof (char) * 100);
  s = s + 20;
  *s = 'a';
  return s - 20;
}
There may be objects that only have pointers to their middle!
GC Challenges
char *f (void) {
  char *s = (char *) malloc (sizeof (char) * 100);
  int x = (int) s;
  s = 0;
  return (char *) x;
}
There may be objects that are reachable through values
that have non-pointer apparent types!
GC Challenges
char *f (void) {
  char *s = (char *) malloc (sizeof (char) * 100);
  int x = (int) s;
  x = x - (int) &f;
  s = 0;
  return (char *) (x + (int) &f);
}
There may be objects that are reachable through values
that have non-pointer apparent types and have values that don’t
even look like addresses!
Why not just do reference counting?
Where can you store the references?
Remember C programs can access memory
directly, better not change how objects are
stored!
Summary
• Garbage collection depends on:
– Knowing which values are addresses
– Knowing that objects without references
cannot be reached
• Both of these are problems in C
• Nevertheless, there are some garbage
collectors for C.
– Change meaning of some programs
– Slow down programs a lot
– Are not able to find all garbage
Charge
• Friday’s section: practice problems on
subtyping and concurrency
• If you send me questions by Monday,
Tuesday’s class will be a quiz review
• PS6 due Tuesday
– Either staple your assignment before class,
or you can use my stapler for $5 per staple