TDDB68 Concurrent programming and operating systems

A short introduction to system programming in C
Pointer programming, Storage Classes, Compiling, Linking, Loading, Debugging

Christoph Kessler, IDA, Linköpings universitet.
TDDB68, C. Kessler, IDA, Linköpings universitet.

Overview
- History
- Common organization of a C program
- Data and function definitions
- Pointers
- Static and automatic variables
- Dynamically allocated memory
- Linking
- Debugging (as time permits)

Java vs. C

Java:
- For application programming only
- Design goals: programmer productivity, safety
- Hardware completely hidden
- Comfortable, e.g. automatic memory management
- Protection, e.g. array bound checking
- Slow, time-unpredictable

C:
- For system programming mainly
- Design goals: direct control of hardware, high performance / real-time
- Minimalistic design
- Less comfortable
- Little protection
- "Low-level"

A Short History of C
- C was developed in the early 1970s by Dennis Ritchie at Bell Labs (-> UNIX)
- Objective: a structured but flexible programming language, e.g. for writing device drivers or operating systems
- Book 1978 by Brian W. Kernighan and Dennis M. Ritchie ("K&R-C")
- "ANSI-C": 1989 standard by ANSI ("C89")
    - The C standard for many programmers (and compilers...)
    - Became the basis for standard C++
    - Java borrowed much of its syntax
    - The GNU C compiler implemented a superset ("GNU-C")
- "ISO-C": 1999 standard by ISO, only minor changes ("C99")

Common organization of a C program
- A C program is a collection of C source files:
    - Module files (extension .c) contain implementations of functionality (similar to .java files).
    - Header files (.h) contain declarations of global variables, functions and types. They are #include'd by all module files that need to see them, using the C preprocessor (which has no counterpart in Java).
- When a module file is compiled successfully, an object code file (object module, binary module) is created (.o, corresponds to Java .class files).
- An executable (program) is a single binary file (a.out by default) built by the linker by merging all needed object code files.
- There must be exactly one function called main().

Common organization of a C program (2): Example

[Figure: the header files stdio.h, glob.h and xy.h are #include'd into the module files abc.c, mymain.c and def.c by the preprocessor; each module file is compiled into an object file (abc.o, mymain.o, def.o); the linker merges the object files with libc.a into the executable a.out.]

The C run-time library is linked with the user code.

Common organization of a C program (3): Example

glob.h:

    /* Comment: declaration of globally visible functions and variables: */
    extern int incr( int );
    extern int initval;

mymain.c:

    #include <stdio.h>
    #include "glob.h"

    int counter;    // locally defined

    int main( void )
    {
        counter = initval;
        printf( "new counter: %d", incr( counter ) );
        return 0;
    }

abc.c:

    #include "glob.h"

    // definition of variable initval:
    int initval = 17;

    // definition of function incr:
    int incr( int k )
    {
        return k+1;
    }

Note the difference:
- Declarations announce signatures (names + types).
- Definitions declare (if not already done) AND allocate memory.
- Header files should never contain definitions! Otherwise every module that includes the header defines the same symbol again, which typically makes linking fail with duplicate definitions.

Data types in C
- Primitive types:
    - integral types: char, short, int, long; enum
        - can be signed or unsigned, e.g. unsigned int counter;
        - sizes are implementation dependent (compiler/platform); use the sizeof( datatype ) operator to write portable code
    - floating-point types: float, double, long double
    - pointers
- Composite data types: arrays, structs, unions
- The programmer can define new type names with typedef

Constants and Enumerations
- Constant variables:

      const int red = 2;
      const int blue = 4;
      const int green = 5;
      const int yellow = 6;

- Enumerations:

      enum { red = 2, blue = 4, green, yellow } color;
      color = green;    // expanded by compiler to: color = 5;

- With the preprocessor: symbolic names, textually expanded before compilation:

      #define red 2
      ...               (no =, no semicolon)

- In C, constants are often capitalized: RED, BLUE, ...

Composite data types (1): structs

    struct My_IComplex {
        int re, im;
    };

    struct My_IComplex x;               // definition of x
    typedef struct My_IComplex icplx;   // introduces new type name icplx
    icplx y;

    int main ( void )
    {
        x.re = 2;
        y.im = 3 * x.re;
        // sizeof yields a size_t, which is printed with %zu
        printf( "x needs %zu bytes, an icplx needs %zu bytes\n",
                sizeof( x ), sizeof( icplx ) );
        return 0;
    }

[Figure: x consists of the two adjacent fields x.re and x.im.]

Remark: the sizeof operator is evaluated at compile time.

Composite data types (2): unions
- Unions implement variant records: all attributes share the same storage (they all start at offset 0).
- Unions break type safety: the same memory location can be interpreted with different types.
- If handled with care, useful in low-level programming.

    union My_flexible_Complex {
        struct My_IComplex ic;   // integer complex (re, im), see above
        float f;                 // floating-point value, overlaps with ic
    } c;

    c.ic.re = 1141123417;
    printf( "Float value interpreted from this memory address contents is %f\n", c.f );
    // writes 528.646057

[Figure: c.ic.re and c.f start at the same address of c; c.ic.im follows c.ic.re.]

Composite data types (3): Arrays
- Declaration/definition similar to Java:

      int a[20];
      int b[] = { 3, 6, 8, 4, 3 };
      icplx c[4];
      float matrix [3] [4];

- Addressing: the location of element a[i] starts at (address of a) + i * elsize, where elsize is the element size in bytes.
- Use:

      a[3] = 1234567;
      a[21] = 345;       // ??, there is no array bound checking in C
      c[1].re = c[2].im;

- Dynamic arrays: see later.
- Arrays are just a different view of pointers.

Pointers
- Each data item of a program exists somewhere in memory. Hence, it has a (start) memory address.
- A variable of type pointer_to_type_X may take as a value the memory address of a variable of type X:

      int a;
      int *p;                    // defines just the pointer p, not its contents!
      p = (int *) 0x14a236f0;    // initialize p (arbitrary address; an integer needs a cast)

- Dereferencing a pointer with the * operator:

      *p = 17;    // writes 17 to whatever address is contained in p

- The address of a variable can be obtained by the & operator:

      p = &a;     // now, p points to the address of a

- Use the NULL pointer (which is just an alias for address 0) to denote invalid addresses, end of lists etc.

Pointer arithmetic
- Integral values can be added to / subtracted from pointers:

      int *q = p + 7;   // new value of q is (value of p) + 7 * sizeof(pointee-type of p)

- In expressions, an array name acts as a constant pointer to its first element:
    - The notation b[3] is "syntactic sugar" for *(b + 3)
    - b[0] is the same as *b
    - [Figure: b points to the first element of the array 3 6 8 4 3.]
- A pointer can be subtracted from another pointer; the result counts elements, not bytes, and has the signed type ptrdiff_t (from stddef.h):

      ptrdiff_t offset = q - p;

Pointers and structs

    struct My_IComplex { int re, im; } c, *p;
    p = &c;

- p is a pointer to a struct.
- p->re is shorthand for (*p).re, i.e. the int stored at (address in p) + (offset of re).
- Example: as p points to c, &(p->re) is the same as &(c.re).
- Example: elem->next = NULL;
- [Figure: p contains 0xafb024, the start address of c, where c.re and c.im are stored.]

Why do we need pointers in C?
- Defining recursive data structures (lists, trees ...), as in Java
- Argument passing to functions: simulates reference parameters, which are missing in C (not in C++):

      void foo ( int *p ) { *p = 17; }
      Call: foo ( &i );

- Arrays are pointers:
    - Handle to access dynamically allocated arrays
    - A 2D array is an array of arrays:

          int m[3][4];    // m[2] converts to a pointer to the start of the third row of m

      Hence, m itself converts to a pointer to its first row (type int (*)[4]); it is not a pointer to a pointer to an int.
- For representing strings, which are missing in C as a basic data type:

      char *s = "hello";    // s points to char-array {'h','e','l','l','o', 0 }

- For dirty hacks (low-level manipulation of data)

Pointer type casting
- Pointer types can be cast to other pointer types:

      int i = 1147114711;
      int *pi = &i;
      printf ( "%f\n", * ( (float *) pi ) );    // prints 894.325623

- On typical platforms all data pointers have the same size (1 address word).
- But there is no conversion of the pointed-to data! (cf. unions)
- Compare this to the value conversion:

      printf( "%f\n", (float) i );

- A source of type unsafety, but often needed in low-level programming. (Formally, reading an int object through a float * is undefined behaviour in standard C; the example shows what typically happens.)
- Generic pointer type: void *
    - Pointee type is undefined
    - Always requires a pointer type cast before dereferencing

Pointers to functions (1)
- Function declaration:

      extern int f( float );

- Function call: f( x )
    - f is actually a (constant) pointer to the first instruction of function f in program memory
    - The call f( x ) dereferences pointer f: push argument x; save PC and other registers; PC := f;
- Function pointer variable:

      int (*pf)(float);   // pf is a pointer to a function
                          // that takes a float and returns an int
      pf = f;             // pf now contains the start address of f
      pf( x );            // or (*pf)(x): dereferencing (call), same effect as f(x)
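As a small illustration of two of the ideas above (simulated reference parameters and function pointer variables), the following sketch may help. The names set_to_17, truncate_to_int, sign_of and apply are hypothetical and do not come from the slides.

```c
#include <stddef.h>

/* Simulated reference parameter: the caller passes the address of an int,
   and the function writes through the pointer (cf. foo above). */
void set_to_17(int *p)
{
    if (p != NULL)      /* guard against the NULL pointer */
        *p = 17;
}

/* Two hypothetical functions with the same signature: float -> int. */
int truncate_to_int(float x)
{
    return (int)x;                  /* value conversion, drops the fraction */
}

int sign_of(float x)
{
    return (x > 0.0f) - (x < 0.0f); /* yields -1, 0 or 1 */
}

/* pf is a pointer to a function that takes a float and returns an int;
   apply() calls whatever function pf currently points to. */
int apply(int (*pf)(float), float x)
{
    return pf(x);       /* same effect as (*pf)(x) */
}
```

For instance, after `int i = 0; set_to_17(&i);` the variable i holds 17, and `apply(sign_of, -3.5f)` calls sign_of through the pointer without apply knowing which function it received.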
Pointers to functions (2)

Most frequent use: generic functions

Example: Ordinary sort routine

    void bubble_sort( int arr[], int asize )
    {
        int i, j;
        for (i=0; i<asize-1; i++)
            for (j=i+1; j<asize; j++)
                if ( arr[i] > arr[j] ) {
                    // interchange arr[i] and arr[j]
                    int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                }
    }

Need to rewrite this for sorting in a different order?
Idea: Make bubble_sort generic in the compare function.

Pointers to functions (3)

Most frequent use: generic functions

Example: Generic sort routine

    void bubble_sort( int arr[], int asize, int (*cmp)(int,int) )
    {
        int i, j;
        for (i=0; i<asize-1; i++)
            for (j=i+1; j<asize; j++)
                if ( cmp ( arr[i] , arr[j] ) ) {
                    // interchange arr[i] and arr[j]
                    int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                }
    }

    int gt ( int a, int b )
    {
        if (a > b) return 1;
        else return 0;
    }

    bubble_sort ( somearray, 100, gt );
    bubble_sort ( otherarray, 200, lt );    // lt is defined analogously to gt

Storage classes in C (1)

Automatic variables
- Local variables and formal parameters of a function
- Exist once per call
- Visible only within that function (and function call)
- Space allocated automatically on the function's stack frame
- Live only as long as the function executes

    int *foo ( void )    // function returning a pointer to an int
    {
        int t = 3;       // local variable
        return &t;       // ?? t is deallocated on return from foo,
                         // so its address should not make sense to the caller...
    }

Storage classes in C (2)

Global variables
- Declared and defined outside any function
- Other modules can access a global variable only through an extern declaration; without one, its name is visible only within the defining module (file scope)

      int x;           // at top level, outside any function
      extern int y;    // y seen from all modules; only declaration
      int y = 9;       // only 1 definition of y for all modules seeing y

- Space allocated automatically when the program is loaded

Static variables

    static int counter;

- Allocated once for this module (i.e., not on the stack), even if declared within a function!
- The value survives function return: the next call sees it

Dynamic allocation of memory in C
- malloc( N ) allocates a block of N bytes on the heap and returns a generic (void *) pointer to it; this pointer can be type-cast to the expected pointer type
- free( p ) deallocates the heap memory block pointed to by p
- Example:

      icplx *p = (icplx *) malloc ( sizeof ( icplx ) );

- Can be used e.g. for simulating dynamic arrays (recall: arrays are pointers):

      int *a = (int *) malloc ( k * sizeof( int ) );
      a[3] = 17;

Run-Time Memory Organization
- Automatic variables (formal parameters, local variables of functions)
    - allocated on the run-time stack at function call and removed on return
    - indexed by frame pointer (or stack pointer)
- Global and static variables
    - reside in the .data segment
    - indexed by absolute address (or globals pointer)
- Dynamically allocated objects
    - reside on the heap
    - indexed via pointer variables
- Compiled program code (for function bodies)
    - resides in the .text segment
    - branch target addresses: by absolute address or PC-relative

[Figure: memory space for this user program, starting at address 0. It shows the OS kernel data and code; the .text segment (copied in by the loader) with "_main: ... (code for main)" and "_foo: ... (code for foo)"; the .data segment (copied in by the loader) with "_y: .double 9" and "_x: .space 4"; the heap with its heap limit; and the stack with its stack limit. The stack holds the frame for the initial call to main (fp+4: some local variable of main) and the frame for the current function foo called from main (old fp of main; fp+8: space for local variable t), each delimited by sp and fp. NB: This is a simplification.]

C: There is much more to say...
- Type conversions and casting
- Bit-level operations
- Operator precedence order
- Variadic functions (with a variable number of arguments, e.g. printf() )
- C standard library
- C preprocessor macros
- I/O in C
- ...

Now: Compiling, Linking, Loading, Debugging

Compiling and Linking

[Figure: same build pipeline as before. stdio.h, glob.h and xy.h are #include'd into abc.c, mymain.c and def.c (preprocess); the modules are compiled into abc.o, mymain.o and def.o; these are linked with libc.a into a.out (executable).]

The C run-time library is linked with the user code.

Compiling C Programs
- Example: GNU C Compiler gcc (Open Source)
- One command calls preprocessor, compiler, assembler, linker
- Single module: compile and link (build executable):

      gcc mymain.c                   executable: a.out
      gcc -o myprog mymain.c         rename executable

- Multiple modules: compile and link separately:

      gcc -c -o mymain.o mymain.c    compiler only
      gcc -c -o other.o other.c      compiles other.c
      gcc other.o mymain.o           call the linker, -> a.out

- make (e.g. GNU gmake) automates the building process for several modules
- Check the man pages!

Linking Multiple Modules (1)

How to relocate code?
- Needed in both static linking and loading
- Patching:
    - Absolute jumps
    - Addresses of global variables

    Symbolic addressing:       Relative addressing:     Absolute addressing:
      ...                        0: ...                   243: ...
      Definition of data X       1: Space for data        244: Space for data
      ...                        2: ...                   245: ...
      If (X) goto P              3: If (*1) goto 5        246: If (*244) goto 248
      ...                        4: ...                   247: ...
      P: ...                     5: ...                   248: ...

    (symbolic -> relative: assembler "as"; relative -> absolute: static loading)

    Relocation table:
    - line 3, patch base+1
    - line 3, patch base+5

Linking Multiple Modules (2)
- Compiler (-c) created an (object) module file (.o, .bin)
    - Binary format, e.g. COFF (UNIX), ELF (Linux)
    - Non-executable (yet)
    - Segments for code, global data, stack / heap space
    - List of global symbols (e.g., functions, extern variables)
    - Addresses in each segment start at 0
    - Relocation table: list of addresses/instructions that the linker must patch when changing the start address of the module
- Static relocation (at compile/link time):
    - Merge all object modules to a single object module, with consecutive addresses in each segment type
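The patching step driven by a relocation table can be sketched as a toy model. This is only an illustration under simplifying assumptions: the instruction layout (struct toy_instr), the relocation entry layout (struct toy_reloc) and the function relocate() are hypothetical and do not correspond to any real object file format such as ELF or COFF.

```c
#include <stddef.h>

/* Toy "instruction": if (*data_addr) goto jump_addr.
   Both fields hold module-relative addresses before relocation. */
struct toy_instr {
    long data_addr;   /* address of the tested data word */
    long jump_addr;   /* branch target address */
};

/* Which address field of an instruction must be patched. */
enum toy_field { DATA_ADDR, JUMP_ADDR };

/* One relocation table entry: "line N, patch this field by adding base". */
struct toy_reloc {
    size_t line;            /* index of the instruction within the module */
    enum toy_field field;   /* which address field to patch */
};

/* Apply all relocation entries for a module placed at start address base. */
void relocate(struct toy_instr *code, size_t ncode,
              const struct toy_reloc *table, size_t nrelocs, long base)
{
    for (size_t r = 0; r < nrelocs; r++) {
        if (table[r].line >= ncode)
            continue;       /* ignore entries that lie outside the module */
        struct toy_instr *ins = &code[table[r].line];
        if (table[r].field == DATA_ADDR)
            ins->data_addr += base;
        else
            ins->jump_addr += base;
    }
}
```

With the example from the relocation slide, instruction 3 is "If (*1) goto 5" and the table has two entries for line 3. Relocating with base 243 turns it into "If (*244) goto 248", while instructions without table entries stay unchanged.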
Background: How the Linker Works
- Read all object modules to be linked (including library archive modules if necessary)
- Merge the code, data, stack/heap segments of these into a single code, data, stack/heap segment
- Resolve global symbols (e.g., global functions, variables): check for duplicate globals, undefined globals
- Write the resulting object module, with a new relocation table, and mark it executable
- Alternatives to static linking (need hardware support):
    - Dynamic linking (on demand at run time, as in Java)
    - Shared libraries

Sources of information
- DDD tutorial
- Books on C and C++:
    - B. Kernighan, D. Ritchie: The C Programming Language, 2nd ed., Prentice Hall, 1988. Covers ANSI-C.
    - P. Darnell, P. Margolis: Software Engineering in C. Springer, 1988.
    - H.M. Deitel, P.J. Deitel: C How to Program. Prentice-Hall, 1992.
    - Bjarne Stroustrup: The C++ Programming Language, 3rd ed., Addison-Wesley, 2000.
    - ... and many other good books
- Open IDA course on C and C++: TDDB89 Advanced Programming in C++. Given every semester.

Lab-Kickoff Session
- Programming exercise: C pointer programming and debugging with DDD

Homework
- Practise!
- Attend the Lab-Kickoff session
- Read the DDD tutorial on the course homepage

Appendix: Debugging

The following material may also be of some interest.
- Debugging vs. Testing
- Debugging Methods and Tools
- Debugger Technology
- Debugging Concurrent Programs
- Literature on Debugging

Debugging vs. Testing
- Testing: may detect the existence of a programming error (but cannot guarantee absence of errors!)
    - Compare the output of the testing candidate with reference output (e.g. of an older, working version: regression testing, e.g. DEJAGNU)
- Debugging: localise the error (cause -> effect)
    - Systematic isolation
    - Iterative process
    - [Figure: flowchart of the iterative process. Error detected -> initial hypothesis set -> choose hypothesis -> verify hypothesis -> error gone? If no: modify hypothesis set and choose again; if yes: done.]

Debugging Techniques and Tools (1)
- Manual methods
    - Static: code inspection
    - Dynamic: printf statements, validation of assertions (in C: assert(p) macro in assert.h)
- Tools for manual debugging:
    - Symbolic debugger, e.g. dbx, gdb, jdb, ddd
    - Debug-problem documentation, e.g. BUGZILLA (bug database + Web interface)
    - Runtime-protection monitoring, esp. for memory access errors: ElectricFence, VALGRIND, Java VM, INSURE++, PURIFY, ...

Debugging Techniques and Tools (2)
- Automatic debugging:
    - Formal verification against a formal specification of the program
        - Often no or only an incomplete specification is available
        - Perhaps a specification can be derived (but then it is itself error-prone)
    - Search the source program for language-specific error idioms, e.g. lint, splint, jlint (incomplete)
    - Reduce the scope of searching by static program analysis:
        - Program Slicing [Weiser'82] [Lyle, Weiser'87]
        - Program Dicing (difference of 2 slices) [Lyle, Weiser'87]
        - e.g. UNRAVEL slicer [Lyle'95]
        - Needs good static analysis (data flow analysis, points-to analysis)
        - Conservative static approximation: slices quickly get large
    - Delta-Debugging: automatic scope reduction by binary partitioning (input data, code)

Symbolic Debugger (1)
- Needs information about name and type of memory locations on source code level, i.e., the symbol table and type table of the compiler, added to the generated binary file on request (cc -g ... )
- Needs coordinates of the program points in source code (e.g. line nr.)
- Needs tight similarity of control flow between source code structure and generated machine code
    - Incompatible with aggressive compiler optimizations, e.g. prefetching, loop-invariant code hoisting, loop transformations, scheduling
    - Trade-off: code efficiency vs. debugger transparency
- Can under certain circumstances lead to "Heisenbugs": the error does not occur if run with the debugger (may even happen with printf-debugging)
- Graphical user interface (e.g. ddd, Eclipse Debug view) on top of a command-line debugger (e.g. dbx, gdb, jdb)

Symbolic Debugger (2)
- Interactive debugging
    - Stop computation
    - Set / clear breakpoints
    - Step-by-step execution
    - Print values, evaluate expressions (interpreter)
    - Change values of variables
    - Inspect stack contents (call chain)
    - Walk call chain upwards / downwards
- Post-mortem debugging
    - After a crash: read the core file; inspect memory contents, variable values

Debugger Technique with OS/HW-Support

[Figure: interaction between the debugger process, the OS and the process being debugged, with the labels:
- fork() (via OS)
- ptrace() (via OS): "trace me"
- signal() (via OS): "stop"
- wait() (via OS)
- ptrace() (via OS): read, write values in address space; add breakpoints (special instructions)
- signal() (via OS): "resume"
- Find breakpoint: trap; OS-IRC; signal(): "Breakpoint"
- signal(): "continue"]

Debugging concurrent programs
- Nondeterminism
- [Figure: Run 1 and Run 2 show different interleavings of Thread 1 and Thread 2 on the CPU over time t.]
- Problem: the occurrence of a bug can depend on the schedule, so the error is hard to reproduce
- Solution 1: Deterministic replay
    - Record inputs and schedule
    - e.g. DEJAVU for Java
- Solution 2: Static analysis (potential concurrency, "data races")
- Solution 3: Dynamic analysis
    - identifies shared-memory accesses at runtime
- Solution 4: Test-based approach with Delta-Debugging [Choi, Zeller '02]
    - In combination with DEJAVU

Debugging - Summary and Literature
- Testing vs. Debugging
- Debugging Methods
- Debugger Technology
- Debugging Concurrent Programs
- DDD: www.gnu.org/software/ddd/ (see also the DDD tutorial on the course home page)
- M. Scott: Programming Language Pragmatics. Morgan Kaufmann, 2000. Chapter on Debugging.
- Srikant, Shankar: Compiler Design Handbook. CRC Press, 2003. Chapter 9 on debugger technology (by Aggarwal and Kumar).
- J. Rosenberg: How Debuggers Work. Wiley, 1996.
- A. Zeller: Why Programs Fail. A Guide to Systematic Debugging. Morgan Kaufmann, 2005.