A short introduction to system programming in C Overview

advertisement
TDDB68
Concurrent programming and
operating systems
Overview
History
Common organization of a C program
A short introduction to
system programming in C
Data and function definitions
Pointers
Static and automatic variables
Dynamically allocated memory
Pointer programming, Storage Classes,
Compiling, Linking, Loading, Debugging
Linking
Debugging (as time permits)
Christoph Kessler, IDA,
Linköpings universitet.
2.2
TDDB68, C. Kessler, IDA, Linköpings universitet.
Java vs. C
A Short History of C
Java
C
C was developed in the early 1970’s by Dennis Ritchie at Bell
Labs (-> UNIX)
Objective: structured but flexible programming language
e.g. for writing device drivers or operating systems
For application programming only
Design goals:
For system programming mainly
Design goals:
Programmer productivity
Safety
Hardware completely hidden
Comfortable
E.g. automatic memory
management
Protection
E.g., array bound checking
Slow
Time-unpredictable
TDDB68, C. Kessler, IDA, Linköpings universitet.
Direct control of hardware
High performance / real-time
Book 1978 by Brian W. Kernighan and Dennis M. Ritchie
(”K&R-C”)
”ANSI-C” 1989 standard by ANSI (”C89”)
Minimalistic design
Less comfortable
Little protection
”low-level”
The C standard for many programmers (and compilers...)
Became the basis for standard C++
Java borrowed much of its syntax
The GNU C compiler implemented a superset (”GNU-C”)
”ISO-C” 1999 standard by ISO, only minor changes (”C99”)
2.3
Common organization of a C program
2.4
TDDB68, C. Kessler, IDA, Linköpings universitet.
Common organization of a C program (2)
Example
A C program is a collection of C source files
Module files (extension .c)
contain implementations of functionality (similar .java)
Header files (.h)
contain
#include
abc.c
#include
#include
preprocess
declarations of global variables, functions, types
from all module files that need to see them,
using the C preprocessor (missing in Java)
stdio.h
glob.h
xy.h
mymain.c
def.c
#include’d
compile
When a module file is compiled successfully,
an object code file (object module, binary module) is created
(.o, corresponds to Java .class files)
An executable (program) is a single binary file (a.out)
built by the linker by merging all needed object code files
There must be exactly one function
called main()
2.5
TDDB68, C. Kessler, IDA, Linköpings universitet.
abc.o
compile
compile
mymain.o
def.o
libc.a
link
C run-time library is linked
with the user code
a.out (executable)
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.6
Common organization of a C program (3)
Data types in C
Example
/*Comment: declaration of
#include ”glob.h”
globally visible functions
and variables: */
extern int incr( int );
Primitive types
integral types: char, short, int, long; enum
can be signed or unsigned, e.g. unsigned int counter;
sizes are implementation dependent (compiler/platform),
use sizeof( datatype ) operator to write portable code
mymain.c
abc.c
glob.h
#include <stdio.h>
#include ”glob.h”
// definition of var. initval:
int initval = 17;
int counter; // locally def.
// definition of func. incr:
void main( void )
int incr( int k )
{
floatingpoint types: float, double, long double
extern int initval;
Note the difference:
Declarations announce
signatures (names+types).
Definitions declare (if
not already done) AND
allocate memory.
Header files should never
contain
definitions!
TDDB68,
C. Kessler,
IDA, Linköpings universitet.
{
pointers
Composite data types
arrays
structs
counter = initval;
return k+1;
printf(“new counter: %d”,
incr( counter ) );
}
unions
Programmer can define new type names with typedef
}
2.7
TDDB68, C. Kessler, IDA, Linköpings universitet.
Constants and Enumerations
2.8
Composite data types (1): structs
Constant variables:
const int red = 2;
const int blue = 4;
const int green = 5;
const int yellow = 6;
struct My_IComplex {
Enumerations:
enum { red = 2, blue = 4, green, yellow } color;
color = green;
// expanded by compiler to: color = 5;
icplx y;
x:
int re, im;
x.re
x.im
};
struct My_IComplex x;
// definition of x
typedef struct My_IComplex icplx;
// introduces new type name icplx
void main ( void )
{
With the preprocessor:
symbolic names, textually expanded before compilation
#define red 2
...
(no =, no semicolon)
2.9
Unions break type safety
can interpret same memory location with different types
If handled with care, useful in low-level programming
c:
// integer complex (re, im) – see above
float f; // floatingpoint value – overlaps with ic
2.10
Composite data types (3): Arrays
Unions implement variant records:
all attributes share the same storage (start at offset 0)
struct My_IComplex ic;
printf(”x needs %d bytes, an icplx needs %d bytes\n”, sizeof( x ), sizeof (icplx) );
}
TDDB68, C. Kessler, IDA, Linköpings universitet.
Composite data types (2): unions
union My_flexible_Complex {
y.im = 3 * x.re;
// Remark: the sizeof operator is evaluated at compile time.
In C, constants are often capitalized: RED, BLUE, ...
TDDB68, C. Kessler, IDA, Linköpings universitet.
x.re = 2;
c.ic.re,
c.f
c.ic.im
} c;
c.ic.re = 1141123417;
printf(”Float value interpreted from this memory address contents is %f\n”, c.f );
Declaration/definition very similar to Java:
int a[20];
3
6
8
4
3
int b[] = { 3, 6, 8, 4, 3 };
icplx c[4];
float matrix [3] [4];
Addressing:
Location of element a[i] starts at: (address of a) + i * elsize
where elsize is the element size in bytes
Use:
a[3] = 1234567;
a[21] = 345; // ??, there is no array bound checking in C
c[1].re = c[2].im;
Dynamic arrays: see later
Arrays are just a different view of pointers
// writes 528.646057
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.11
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.12
Pointers
Pointer arithmetics
Each data item of a program exists somewhere in memory
Hence, it has a (start) memory address.
A variable of type pointer_to_type_X may take as a value the
memory address of a variable of type X
int a;
int *p; // defines just the pointer p, not its contents!
p = 0x14a236f0; // initialize p
Dereferencing a pointer with the * operator:
*p = 17; // writes 17 to whatever address contained in p
The address of a variable can be obtained by the & operator:
p = &a; // now, p points to address of a
Use the NULL pointer (which is just an alias for address 0)
to denote invalid addresses, end of lists etc.
2.13
TDDB68, C. Kessler, IDA, Linköpings universitet.
Integral values can be added to / subtracted from pointers:
int *q = p + 7;
// new value of q is (value of p) + 7 * sizeof(pointee-type of p)
Arrays are simply constant pointers to their first element:
Notation b[3] is ”syntactic sugar” for *(b + 3)
b[0] is the same as *b
b:
6
8
4
3
A pointer can be subtracted from another pointer:
unsigned int offset = q – p;
TDDB68, C. Kessler, IDA, Linköpings universitet.
Pointers and structs
3
2.14
Why do we need pointers in C?
Defining recursive data structures (lists, trees ...) – as in Java
Argument passing to functions
struct My_IComplex { int re, im; } c, *p;
p = &c;
p is a pointer to a struct
simulates reference parameters – missing in C (not C++)
void foo ( int *p ) { *p = 17; }
Call: foo ( &i );
p->re is shorthand for *(p + (offset of re) )
Example: as p points to c, &(p->re) is the same as &(c.re)
Example: elem->next = NULL;
p: 0xafb024
c:
0xafb024
c.re
c.im
Arrays are pointers
Handle to access dynamically allocated arrays
A 2D array is an array of pointers to arrays:
int m[3][4]; // m[2] is a pointer to start of third row of m
hence, m is a pointer to a pointer to an int
For representing strings – missing in C as basic data type
char *s = ”hello”; // s points to char-array {’h’,’e’,’l’,’l’,’o’, 0 }
For dirty hacks (low-level manipulation of data)
2.15
TDDB68, C. Kessler, IDA, Linköpings universitet.
Pointer type casting
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.16
Pointers to functions (1)
Pointer types can be casted to other pointer types
int i = 1147114711;
int *pi = &i;
printf ( ”%f\n”, * ( (float *) pi ) );
// prints 894.325623
All pointers have the same size (1 address word)
But no conversion of the pointed data! (cf. unions)
Compare this to:
printf( ”%f\n”, (float) i );
A source of type unsafety,
but often needed in low-level programming
Function declaration
extern int f( float );
Function call: f( x )
f is actually a (constant) pointer
to the first instruction of function f in program memory
Call f( x ) dereferences pointer f
push
argument x; save PC and other reg’s; PC := f;
Function pointer variable
int (*pf)(float);
Generic pointer type:
void *
Pointee type is undefined
Always requires a pointer type cast before dereferencing
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.17
// pf is a pointer to a function
// that takes a float and returns an int
pf = f; // pf now contains start address of f
pf( x ); // or (*pf)(x) dereferencing (call): same effect as f(x)
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.18
Pointers to functions (2)
Pointers to functions (3)
Most frequent use: generic functions
Most frequent use: generic functions
Example: Ordinary sort routine
Example: Generic sort routine
void bubble_sort( int arr[], int asize )
{ int i, j;
for (i=0; i<asize-1; i++)
for (j=i+1; j<asize; j++)
if ( arr[i] > arr[j] )
... // interchange arr[i] and arr[j]
}
void bubble_sort( int arr[], int asize, int (*cmp)(int,int) )
{ int i, j;
for (i=0; i<asize-1; i++)
for (j=i+1; j<asize; j++)
if ( cmp ( arr[i] , arr[j] ) )
... // interchange arr[i] and arr[j]
}
Need to rewrite this for sorting in a different order?
int gt ( int a, int b ) { if (a > b) return 1; else return 0; }
Idea: Make bubble_sort generic in the compare function
bubble_sort ( somearray, 100, gt );
bubble_sort ( otherarray, 200, lt );
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.19
TDDB68, C. Kessler, IDA, Linköpings universitet.
Storage classes in C (1)
Storage classes in C (2)
Automatic variables
Local variables and formal parameters of a function
Exist once per call
Visible only within that function (and function call)
Space allocated automatically on the function’s stack frame
Live only as long as the function executes
int *foo ( void ) // function returning a pointer to an int.
{
int t = 3; // local variable
return &t; // ?? t is deallocated on return from foo,
// so its address should not make sense to the caller...
}
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.21
Unless made globally visible by an extern declaration,
visible only within this module (file scope)
int x; // at top level – outside any function
extern int y; // y seen from all modules; only declaration
int y = 9; // only 1 definition of y for all modules seeing y
Space allocated automatically when the program is loaded
Static variables
static int counter;
Allocated once for this module (i.e., not on the stack)
even if declared within a function!
Value will survive function return: next call sees it
2.22
Run-Time Memory Organization
malloc( N )
and returns a generic (void *) pointer to it;
this pointer can be type-casted to the expected pointer type
free( p )
deallocates heap memory block pointed to by p
Can be used e.g. for simulating dynamic arrays:
Recall: arrays are pointers
a[3] = 17;
Memory space for
this user program
NB: This is a simplification.
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.23
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.24
Addr. 0:
Heap limit
Stack limit
fp+8: Space for local var. t
Old fp (of main)
fp+4: some local var. of main
_y: .double 9
_x: .space 4
...
_main:
... (code for main)
_foo:
... (code for foo)...
(OS Kernel
data and code)
Stack
icplx *p = (icplx *)malloc ( sizeof ( icplx ));
Automatic variables
(formal parameters, local variables of functions)
allocated on the run-time stack at function call
and removed on return
indexed by frame pointer (or stack pointer)
Global and static variables
sp
Stack frame for current funreside in the .data segment
ction foo called from main
indexed by absolute address
sp fp
Stack frame for
(or globals pointer)
initial call to main
fp
Dynamically allocated objects
.data segment
reside on the heap
(copied in by loader)
indexed via pointer variables
Compiled program code (for function bodies)
.text segment
reside in the .text segment
(copied in by loader)
branch target addresses:
by absolute address or PC-relative
Heap
allocates a block of N bytes on the heap
int *a = (int *) malloc ( k * sizeof( int ) );
Global variables
Declared and defined outside any function
TDDB68, C. Kessler, IDA, Linköpings universitet.
Dynamic allocation of memory in C
Example:
2.20
C: There is much more to say...
Compiling and Linking
Type conversions and casting
stdio.h
glob.h
xy.h
Bit-level operations
#include
Operator precedence order
#include
#include
preprocess
Variadic functions
(with a variable number of arguments, e.g. printf() )
abc.c
mymain.c
def.c
C standard library
compile
C preprocessor macros
I/O in C
compile
abc.o
compile
mymain.o
def.o
...
libc.a
link
Now: Compiling, Linking, Loading, Debugging
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.25
Compiling C Programs
Needed in both static linking and loading
Patching
Single module: Compile and link (build executable):
gcc mymain.c
executable: a.out
automatizes building process for several modules
Check the man pages!
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.27
Linking Multiple Modules (1)
2.26
How to relocate code?
Example: GNU C Compiler: gcc (Open Source)
One command calls preprocessor, compiler, assembler, linker
gcc –o myprog mymain.c
rename executable
Multiple modules: Compile and link separately
gcc –c –o mymain.o mymain.c
compiler only
gcc –c –o other.o other.c
compiles other.c
gcc other.o mymain.o
call the linker, -> a.out
make (e.g. GNU gmake)
C run-time library is linked
with the user code
a.out (executable)
TDDB68, C. Kessler, IDA, Linköpings universitet.
Absolute Jumps
Adresses of global variables
Relocation table:
- line 3, patch base+1
- line 3, patch base+5
Symbolic addressing:
Relative addressing:
Absolute addressing:
…
0: …
243: …
Definition of data X
1: Space for data
244: Space for data
…
2: …
245: …
If (X) goto P
3: If (*1) goto 5
…
as
P: …
TDDB68, C. Kessler, IDA, Linköpings universitet.
5: …
248: …
2.28
Linking Multiple Modules (2)
Compiler (-c) created an (object) module file (.o, .bin)
Binary format, e.g. COFF (UNIX), ELF (Linux)
Non-executable (yet)
Segments for code, global data, stack / heap space
List of global symbols (e.g., functions, extern variables)
Addresses in each segment start at 0
Relocation table:
List of addresses/instructions that the linker must patch
when changing the start address of the module
Static relocation (at compile/link time):
Merge all object modules to a single object module,
with consecutive addresses in each segment type
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.29
246: If (*244) goto 248
Static
247: …
loading
4: …
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.30
Background: How the Linker Works
Sources of information
Read all object modules to be linked
(including library archive modules if necessary)
Merge the code, data, stack/heap segments of these
into a single code, data, stack/heap segment
Resolve global symbols (e.g., global functions, variables):
check for duplicate globals, undefined globals
Write the resulting object module,
with a new relocation table
and mark it executable.
Variants of static linking: (need hardware support)
Dynamic linking (on demand at run time, as in Java)
Shared libraries
DDD tutorial
Books on C and C++
B. Kernighan, D. Ritchie: The C Programming Language, 2nd ed.,
Prentice Hall, 1988. Covers ANSI-C.
P. Darnell, P. Margolis: Software Engineering in C. Springer, 1988.
H.M. Deitel, P.J. Deitel: C How to Program. Prentice-Hall, 1992.
Bjarne Stroustroup: The C++ Programming Language, 3rd ed.,
Addison-Wesley, 2000.
... and many other good books
Open IDA course on C and C++
TDDB89 Advanced Programming in C++. Given every semester.
2.31
TDDB68, C. Kessler, IDA, Linköpings universitet.
Lab-Kickoff Session
Programming exercise:
C pointer programming and debugging with DDD
2.32
TDDB68, C. Kessler, IDA, Linköpings universitet.
TDDB68
Concurrent programming and
operating systems
Homework
Practise!
Attend the Lab-Kickoff session
Read the DDD tutorial on the course homepage
The following material may also be of some interest.
Appendix: Debugging
Debugging vs. Testing
Debugging Methods and Tools
Debugger Technology
Debugging Concurrent Programs
Literature on Debugging
Debugging vs. Testing
Debugging Techniques and Tools (1)
Testing: may detect existence of a programming error
(but cannot guarantee absence of errors!)
Manual Methods
Static: Code-Inspection
Compare output of testing candidate with reference output
(e.g. of an older, working version – Regression testing, e.g. DEJAGNU)
Debugging: localise error:
Cause
Effect
Symbolic Debugger
Systematic Isolation
Hypothesis
Set
Dynamic: printf-statements,
Validation of assertions (in C: assert(p) macro in assert.h)
Tools for manual debugging:
Iterative Process
Error
detected Initial
e.g. dbx,
Modify
Hypothesis
Set
Choose
Hypothesis
Verify
Hypothesis
no
gdb, jdb, ddd
Debug-Problem Documentation
e.g. BUGZILLA
Error
gone?
(Bug database + Web interface)
Runtime-Protection Monitoring, esp. for memory access errors:
ElectricFence,
VALGRIND, Java VM, INSURE++, PURIFY, …
yes
TDDB68, C. Kessler, IDA, Linköpings universitet.
Christoph Kessler, IDA,
Linköpings universitet.
2.33
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.35
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.36
Symbolic Debugger (1)
Debugging Techniques and Tools (2)
Automatic Debugging:
Formal Verification against a formal specification of the program
Often no or incomplete specification available
Perhaps can derive specification (but then itself error-prone)
Search source program for language-specific error idioms
e.g. lint, splint, jlint
incomplete
Reduce scope of searching by static program analysis:
Program Slicing [Weiser’82] [Lyle, Weiser’87]
Program Dicing (Difference of 2 slices) [Lyle, Weiser’87]
e.g. UNRAVEL slicer [Lyle’95]
Needs good static analysis (data flow analysis, points-to-analysis)
Conservative static approximation – Slices quickly get large
Delta-Debugging
Automatic scope reduction by binary partitioning
(input data, code)
2.37
TDDB68, C. Kessler, IDA, Linköpings universitet.
Needs information about name and type of memory locations
on source code level
i.e., the symbol table and type table of the compiler
added to generated binary file on request (cc –g … )
Needs coordinates of the program points in source code (e.g. line nr.)
Needs tight similarity of control flow
between source code structure and generated machine code
Incompatible with aggressive compiler optimizations
e.g. Prefetching, Loop-invariant code hoisting,
Loop transformations, Scheduling
Trade-Off Code-Efficiency  Debugger-Transparency
Can under certain circumstances lead to ”Heisenbugs”
Error does not occur if run with debugger
(may even happen with printf-debugging)
Graphic User Interface (e.g. ddd, Eclipse Debug-View)
on top of a command-line debugger (z.B. dbx, gdb, jdb)
2.38
TDDB68, C. Kessler, IDA, Linköpings universitet.
Symbolic Debugger (2)
Debugger-Technique with OS/HW-Support
Debugger-Process
Post-Mortem-Debugging
After crash: Read core-file; inspect memory contents, variables values
fork() (via OS)
Stop computation
signal() (via OS): ”stop”
wait() (via OS)
Set / clear breakpoints
ptrace() (via OS)
read, write values in address space;
add breakpoints (special instructions)
…
Step-by-step execution
Print values,
evaluate expressions
(Interpreter)
signal() (via OS): ”resume”
Change values of variables
Find breakpoint:
…
Walk call chain upwards /
downwards
TDDB68, C. Kessler, IDA, Linköpings universitet.
signal(): ”continue”
2.39
TDDB68, C. Kessler, IDA, Linköpings universitet.
Debugging concurrent programs
Thread 2
Nondeterminism
Thread 2
Thread 1
Thread 1
Thread 2
2.40
Debugging - Summary and Literature
Problem: Occurrence of a bug can depend on the schedule
Run 2: CPU
t
Thread 2 Thread 1
Testing vs. Debugging
Debugging Methods
Debugger Technology
Debugging Concurrent Programs
hard to reproduce the error
Solution 1: Deterministic replay
Record inputs and schedule
e.g. DEJAVU for Java
Solution 2: Static Analysis (potential concurrency, ”Data-races”)
Solution 3: Dynamic Analysis
identifies shared-memory accesses at runtime
DDD: www.gnu.org/software/ddd/
see also the DDD tutorial on the course home page
M. Scott: Programming Language Pragmatics, Morgan Kaufmann 2000.
Chapter on Debugging
Srikant, Shankar: Compiler Design Handbook, CRC press 2003,
Chapter 9 on debugger technology (by Aggarwal und Kumar)
J. Rosenberg: How Debuggers Work. Wiley, 1996.
A. Zeller: Why Programs Fail. A Guide to Systematic Debugging.
Morgan Kaufmann, 2005.
Solution 4: Test-based Approach with Delta-Debugging [Choi, Zeller ’02]
In combination with DEJAVU
TDDB68, C. Kessler, IDA, Linköpings universitet.
trap
signal(): ”Breakpoint”
Inspect stack contents
(call chain)
Thread 1
OS-IRC
ptrace() (via OS): ”trace me”
Interactive Debugging
Run 1: CPU
Process being debugged
OS
2.41
TDDB68, C. Kessler, IDA, Linköpings universitet.
2.42
Download