CS252: Systems Programming Ninghui Li Based on Slides by Prof. Gustavo Rodriguez-Rivera Topic 1: Introduction to the Course and Review of C Programming General Information Web Page: http://www.cs.purdue.edu/homes/cs252 Office: LWSN2142K Office hours: Tentative: Mondays between 3:30 and 4:45. Talk with me after class Drop by for short questions E-mail: ninghui@cs.purdue.edu Textbook Textbook: No textbook. We will use my notes and selected material in the web Recommended (not required): The Linux Programming Interface: A Linux and UNIX System Programming Handbook Useful for several projects, and good as a reference book The Linux Command Line: A Complete Introduction How Linux Works: What Every Superuser Should Know These are not substitute for the lectures Course Communications Use Piazza for questions/answers, announcements, etc. Students will be added to Piazza by Thursday’s class Use Blackboard Learn for grade distribution Roles of This Course • Cover some introduction to the following CS topics • • • • • • Compilers (CS 352) Programming language (CS 456) Operating Systems (CS 354) Networking (CS 422) Security (CS 426) Information Systems (CS 348) Roles of This Course • Learn what to expect in upper-division CS courses • Ability to complete (which includes debug) large programming projects independently Policy Regarding Cheating • Cheating defined as • Copy your answers or code for the lab from another source • • Receives 0 in the lab Give your code to others to copy • Receives ½ of your original grade in the lab • Consequence beyond grade • Referred to Dean of Students for record • What happened in the past? Labs There is no lab the first week. The projects will be explained in the lab sessions. E-mail administrative questions to cs252-ta@cs.purdue.edu Post project questions on Piazza TAs office hours will be posted in the web page. Grading Grade allocation Midterm 1 (in class): 12% Midterm 2 (in class): 12% Final: 24% Projects/labs: 52% Six labs, weighted by #of weeks (total is 13) Exams also include questions about the labs. Course Organization: Part 1: C Programming Review • • • • Review of pointers, arrays, pointers to functions Process of Compiling, Assembling, Linking, Loading, Runtime Linker, Static and Shared Libraries; Address space. Structure of a Program. Debugging using GDB. Stack frames and buffer-overflow vulnerabilities Lab 1 (1 week): C/C++ Programming and GDB Course Organization: Part 2: Language Interpreter • • • Source Control Systems (CVS, SVN) and distributed (GIT, Mercurial) Learn a simple function programming language that uses only non-negative integers, and has 5 keywords: • inc, dec, ifz, define, halt Using Lex and YACC to create parsers Lab 2 (3 weeks): Write an interpreter for the language. Course Organization: Part 3: Unix Systems Programming I • • • • • Unix history & file system fundamentals Common Unix utilities Unix Shell scripting Structure of a Unix Shell Useful system calls: file creation, read, write, close, file mode, IO redirection, pipes, fork, wait, waitpid, signals, etc. Lab 3 (3 weeks): Writing a Unix shell. Course Organization: Part 4: Unix Systems Programming II • • • • • Introduction to operating systems OS kernel fundamentals: user mode, kernel mode, interrupts, system calls Processes, scheduling, Programming with threads, thread creation. Race Conditions, Mutex locks. Other synchronization methods Lab 4 (1 week): Introduction to Threads. Course Organization: Part 5: Unix Network Programming • • • Internet overview: ARP, IP, DNS, TCP, UDP, NAT Socket Programming. Server architecture Lab 5 (3 weeks): Implementing a Web Server Course Organization: Part 6: Miscellaneous topics • More UNIX Systems Programming • Introduction to SQL • Introduction to Software Engineering Lab 6 (Team Project, 2 or 3 weeks): Writing a Process Gladiator Problem of the Week • Questions taken from the books • Cracking the Coding Interview: 150 Programming Questions and Solutions • Elements of Programming Interviews • Not officially part of the grade equation • A site will be set up for students to post code and review other solutions POINTERS IN C Memory and Pointers (1) • 32-bit versus 64-bit • Processor: all registers have 32 bits for 32bit architecture • Operating systems • Programs can be compiled in either 32-bit or 64-bit • Main difference, amount of memory • 32-bit equals 4G • 64-bit needed to effectively use more than 4G memory Memory and Pointers A pointer is a variable that contains an address in memory. In a 32 bit architectures, the size of a pointer is 4 bytes independent on the type of the pointer. (4GB-1) 232-1 Char c = ‘A’; //ascii 65 char * p = &c; p:20: 12 c:12: 65 0 Address space Ways to get a pointer value (1) 1. Assign a numerical value into a pointer char * p = (char *) 0x1800; *p = 5; // Store a 5 in location 0x1800; What is likely to happen for the above code? Who would use something like this? What is Likely to Happen Using Absolute Value Pointers When programmer doesn’t what he is doing • Immediate core dump. • Mysterious program crash/misbehaving in the future When programmer knows • Program function as expected, as this is part of OS kernel or device driver • This is part of exploit code for breaking into a computer Note: Assigning a numerical value to a pointer isn't recommended and only left to programmers of OS, kernels, device drivers, or exploitation Ways to get a pointer value (2) 2. Get memory address from another variable: int *p; 220: buff[29]:216: int buff[ 30]; p = &buff[1]; buff[1]:104: buff[0]:100: *p =78; P: 96: 78 104 Ways to get a pointer value (3) 3. Allocate memory from the heap int *p p = new int; int *q; q = (int*)malloc(sizeof(int)); Using pointers You can pass a pointer as a parameter to a function if you want the function to modify the content of the parameters void swap (int *a, int *b){ int temp; temp=*a; *a=*b; *b=temp; } In main: swap(&x, &y) Understanding Call by Value and Call by Reference • Concepts of programming language design: • Call by Value: • • The value of the argument is passed into the function Call by reference: • The address of the argument is passed into the function • Specific coding strategy for a function • Call by reference typically means using pointers when define a function, and using addresses when calling it, see previous slide for an example • Is the C language call by value or call by reference? What about Java? Common Problems with Pointers When using pointers, make sure the pointer is pointing to valid memory before assigning or getting any value from the location Many string utility functions do not allocate memory for you: char *s; strcpy(s, "hello"); --> SEGV(uninitialized pointer) The only string function that allocates memory is strdup (it calls malloc with the length of the string and copies it) Printing Pointers It is useful to print pointers for debugging char*i; char buff[10]; printf("ptr=%d\n", &buff[5]) Or In hexadecimal printf("ptr=0x%x\n", &buff[5]) Instead of using printf, it is better to use fprintf(stderr, …) since stderr is unbuffered and it is guaranteed to be printed on the screen. sizeof() operator in Pointers The size of a pointer is always 4 bytes in a 32 bit architecture independent of the type of the pointer: sizeof(int)==4 bytes sizeof(char)==1 byte sizeof(int*)==4 bytes sizeof(char*)==4 bytes Array and Pointer: When they are equivalent? The following function adds the sum of integers in an array using array indexing. int sum(int * array, int n){ int s=0; for(int i=0; i<n; i++) { s+=array[i]; // Equivalent to //*(int*)((char*)array+i*sizeof(int)) } return s; } Array and Pointer: When they are equivalent? Now the equivalent code using pointers int sum(int* array, int n) { int s=0; int *p=&array[0]; int *pend=&array[n]; while (p < pend) { s+=*p; p++; } return s; } Array and Pointer: When they are equivalent? When you increment a pointer to integer it will be incremented by 4 units because sizeof(int)==4. Using pointers is more efficient because no indexing is required and indexing require multiplication. Note: An optimizer may substitute the multiplication by a “<<“ operator if the size is a power of two. However, the array entries may not be a power of 2 and integer multiplication may be needed. Array Operator Equivalence We have the following equivalences: int a[20]; a[i] - is equivalent to *(a+i) - is equivalent to *(&a[0]+i) – is equivalent to *((int*)((char*)&a[0]+i*sizeof(int))) You may substitute array indexing a[i] by *((int*)((char*)&a[0]+i*sizeof(int))) and it will work! C was designed to be machine independent Array and Pointer: When They are not Equivalent? They are mostly equivalent when referencing element in the code The are not equivalent when defining variables E.g., one file contains int data[]; Another contains extern int*data; Compiler will not treat them as the same Compiler generates different instructions for b[1]=a[1] for the two versions below (one more level of indirection needed for “int *b”). What will happen for code in red? void func (int *a) { int b[10]; … b[1] = a[1]; *(&b) = a; } void func (int *a) { int *b; … b[1] = a[1]; *(& b) = a; } 2D Array. st 1 a[3][2]:144: a[3][1]:140: a[3][0]:136: a[2][2]:132: a[2][1]:128: a[2][0]:124: a[1][2]:120: a[1][1]:116: a[1][0]:112: a[0][2]:108: a[0][1]:104: a: a[0][0]:100: 1st approach Normal 2D array. int a[4][3]; a[i][j] == *(int*)((char*)a + i*3*sizeof(int) + j*sizeof(int)) Implementation 2D Array nd 2 Implementation 2nd approach Array of pointers to rows int* (a[4]); for(int i=0; i<4; i++){ a[i]=(int*)malloc(sizeof(int)*3); assert(a[i]!=NULL); } 2D Array nd 2 Implementation 2nd approach Array of pointers to rows (cont) a: a[3][0] a[3][1]a[3][2] a[3]:112: a[2]:108: a[1]:104: a[0]:100: a[2][0] a[2][1]a[2][2] a[1][0]a[1][1]a[1][2] a[0][0] a[0][1]a[0][2] int*(a[4]); a[3][2]=5 2D Array rd 3 Implementation 3rd approach. a is a pointer to an array of pointers to rows. int **a; a=(int**)malloc(4*sizeof(int*)); assert( a!= NULL) for(int i=0; i<4; i++) { a[i]=(int*)malloc(3*sizeof(int)); assert(a[i] != NULL) } 2D Array rd 3 Implementation a is a pointer to an array of pointers to rows. (cont.) int **a; a:100: 900 a[3]:912: a[2]:908: a[1]:904: a[0]:900: a[3][2]=5 a[3][0] a[3][1]a[3][2] a[2][0] a[2][1]a[2][2] a[1][0]a[1][1]a[1][2] a[0][0] a[0][1]a[0][2] Advantages of Pointer Based Arrays You don’t need to know in advance the size of the array (dynamic memory allocation) You can define an array with different row sizes You do not need to know how many rows there are when writing the program Advantages of Pointer Based Arrays Example: Triangular matrix a:100: 900: a[3]:912: a[2]:908: a[1]:904: a[0]:900: int **a; a[3][0] a[2][0] a[2][1] a[1][0]a[1][1]a[1][2] a[0][0] a[0][1]a[0][2]a[0][3] Pointers to Functions Pointers to functions are often used to implement Polymorphism in “C”. Polymorphism: Being able to use the same function with arguments of different types. Example of function pointer: typedef void (*FuncPtr)(int a); FuncPtr is a type of a pointer to a function that takes an “int” as an argument and returns “void”. An Array Mapper typedef void (*FuncPtr)(int a); void intArrayMapper( int *array, int n, FuncPtr func ) { for( int = 0; i < n; i++ ) { (*func)( array[ i ] ); } } int s = 0; void sumInt( int val ){ s += val; } void printInt( int val ) { printf("val = %d \n", val); } Using the Array Mapper int a[ ] = {3,4,7,8}; main( ){ // Print the values in the array intArrayMapper(a, sizeof(a)/sizeof(int), printInt); // Print the sum of the elements in the array s = 0; intArrayMapper(a, sizeof(a)/sizeof(int), sumInt); printf(“total=%d\”, s); } Review: aka, topics that May Appear in Exams Size of data types under 32-bit and 64-bit. Concept of call-by-value and call-by-reference Design of programming language Method of coding (passing in value or address) Able to identify pointer usage and memory allocation errors Array vs pointers 2-D arrays General C programming questions Not strictly limited to things explicitly covered in this class. Upcoming • Topic 2: Program Compilation and Using GDB