CS252-Slides-2015-to..

advertisement
CS252: Systems Programming
Ninghui Li
Based on Slides by Prof. Gustavo Rodriguez-Rivera
Topic 1: Introduction to the Course
and Review of C Programming
General Information
Web Page:
http://www.cs.purdue.edu/homes/cs252
Office: LWSN2142K
Office hours:
Tentative: Mondays between 3:30 and
4:45.
Talk with me after class
Drop by for short questions
E-mail: ninghui@cs.purdue.edu
Textbook
Textbook:

No textbook. We will use my notes and selected
material in the web
Recommended (not required):
The Linux Programming Interface: A Linux and UNIX
System Programming Handbook
 Useful for several projects, and good as a reference
book
 The Linux Command Line: A Complete Introduction
 How Linux Works: What Every Superuser Should
Know
These are not substitute for the lectures


Course Communications
Use Piazza for questions/answers, announcements,
etc.
Students will be added to Piazza by
Thursday’s class
Use Blackboard Learn for grade distribution
Roles of This Course
• Cover some introduction to the following
CS topics
•
•
•
•
•
•
Compilers (CS 352)
Programming language (CS 456)
Operating Systems (CS 354)
Networking (CS 422)
Security (CS 426)
Information Systems (CS 348)
Roles of This Course
• Learn what to expect in upper-division CS
courses
• Ability to complete (which includes debug)
large programming projects independently
Policy Regarding Cheating
• Cheating defined as
•
Copy your answers or code for the lab from another
source
•
•
Receives 0 in the lab
Give your code to others to copy
•
Receives ½ of your original grade in the lab
• Consequence beyond grade
•
Referred to Dean of Students for record
• What happened in the past?
Labs
There is no lab the first week.
The projects will be explained in the lab
sessions.
E-mail administrative questions to
cs252-ta@cs.purdue.edu
Post project questions on Piazza
TAs office hours will be posted in the web
page.
Grading
Grade allocation
Midterm 1 (in class): 12%
 Midterm 2 (in class): 12%
 Final:
24%
 Projects/labs: 52%
 Six labs, weighted by #of weeks (total is
13)

Exams also include questions about the
labs.
Course Organization: Part 1: C
Programming Review
•
•
•
•
Review of pointers, arrays, pointers to
functions
Process of Compiling, Assembling,
Linking, Loading, Runtime Linker, Static
and Shared Libraries; Address space.
Structure of a Program.
Debugging using GDB.
Stack frames and buffer-overflow
vulnerabilities
Lab 1 (1 week): C/C++ Programming and GDB
Course Organization: Part 2:
Language Interpreter
•
•
•
Source Control Systems (CVS, SVN) and
distributed (GIT, Mercurial)
Learn a simple function programming
language that uses only non-negative
integers, and has 5 keywords:
• inc, dec, ifz, define, halt
Using Lex and YACC to create parsers
Lab 2 (3 weeks): Write an interpreter for the language.
Course Organization: Part 3:
Unix Systems Programming I
•
•
•
•
•
Unix history & file system fundamentals
Common Unix utilities
Unix Shell scripting
Structure of a Unix Shell
Useful system calls: file creation, read,
write, close, file mode, IO redirection,
pipes, fork, wait, waitpid, signals, etc.
Lab 3 (3 weeks): Writing a Unix shell.
Course Organization: Part 4:
Unix Systems Programming II
•
•
•
•
•
Introduction to operating systems
OS kernel fundamentals: user mode,
kernel mode, interrupts, system calls
Processes, scheduling,
Programming with threads, thread
creation.
Race Conditions, Mutex locks. Other
synchronization methods
Lab 4 (1 week): Introduction to Threads.
Course Organization: Part 5:
Unix Network Programming
•
•
•
Internet overview: ARP, IP, DNS, TCP,
UDP, NAT
Socket Programming.
Server architecture
Lab 5 (3 weeks): Implementing a Web Server
Course Organization: Part 6:
Miscellaneous topics
• More UNIX Systems Programming
• Introduction to SQL
• Introduction to Software Engineering
Lab 6 (Team Project, 2 or 3 weeks): Writing a
Process Gladiator
Problem of the Week
• Questions taken from the books
• Cracking the Coding Interview: 150
Programming Questions and Solutions
• Elements of Programming Interviews
• Not officially part of the grade equation
• A site will be set up for students to post
code and review other solutions
POINTERS IN C
Memory and Pointers (1)
•
32-bit versus 64-bit
•
Processor: all registers have 32 bits for 32bit architecture
•
Operating systems
•
Programs can be compiled in either 32-bit
or 64-bit
•
Main difference, amount of memory
•
32-bit equals 4G
•
64-bit needed to effectively use more than 4G
memory
Memory and Pointers
A pointer is a variable that contains an
address in memory.
In a 32 bit architectures, the size of a pointer
is 4 bytes independent on the type of the
pointer.
(4GB-1) 232-1
Char c = ‘A’; //ascii 65
char * p = &c;
p:20:
12
c:12:
65
0
Address space
Ways to get a pointer value (1)
1. Assign a numerical value into a pointer
char * p = (char *) 0x1800;
*p = 5; // Store a 5 in location 0x1800;
What is likely to happen for the above code?
Who would use something like this?
What is Likely to Happen Using
Absolute Value Pointers
When programmer doesn’t what he is doing
•
Immediate core dump.
•
Mysterious program crash/misbehaving in the future
When programmer knows
•
Program function as expected, as this is part of OS
kernel or device driver
•
This is part of exploit code for breaking into a computer
Note: Assigning a numerical value to a pointer isn't
recommended and only left to programmers of OS, kernels,
device drivers, or exploitation
Ways to get a pointer value (2)
2. Get memory address from another variable:
int *p;
220:
buff[29]:216:
int buff[ 30];
p = &buff[1]; buff[1]:104:
buff[0]:100:
*p =78;
P: 96:
78
104
Ways to get a pointer value (3)
3. Allocate memory from the heap
int *p
p = new int;
int *q;
q = (int*)malloc(sizeof(int));
Using pointers
You can pass a pointer as a parameter to a
function if you want the function to
modify the content of the parameters
void swap (int *a, int *b){
int temp;
temp=*a;
*a=*b;
*b=temp;
}
In main: swap(&x, &y)
Understanding Call by Value and
Call by Reference
• Concepts of programming language design:
•
Call by Value:
•
•
The value of the argument is passed into the function
Call by reference:
•
The address of the argument is passed into the function
• Specific coding strategy for a function
•
Call by reference typically means using pointers when
define a function, and using addresses when calling it,
see previous slide for an example
• Is the C language call by value or call by
reference? What about Java?
Common Problems with Pointers
When using pointers, make sure the pointer is
pointing to valid memory before assigning or
getting any value from the location
Many string utility functions do not allocate
memory for you:
char *s;
strcpy(s, "hello"); --> SEGV(uninitialized pointer)
The only string function that allocates memory is
strdup (it calls malloc with the length of the string
and copies it)
Printing Pointers
It is useful to print pointers for debugging
char*i;
char buff[10];
printf("ptr=%d\n", &buff[5])
Or In hexadecimal
printf("ptr=0x%x\n", &buff[5])
Instead of using printf, it is better to use
fprintf(stderr, …) since stderr is unbuffered
and it is guaranteed to be printed on the screen.
sizeof() operator in Pointers
The size of a pointer is always 4 bytes in a
32 bit architecture independent of the type
of the pointer:
sizeof(int)==4 bytes
sizeof(char)==1 byte
sizeof(int*)==4 bytes
sizeof(char*)==4 bytes
Array and Pointer: When they are
equivalent?
The following function adds the sum of integers in an
array using array indexing.
int sum(int * array, int n){
int s=0;
for(int i=0; i<n; i++) {
s+=array[i]; // Equivalent to
//*(int*)((char*)array+i*sizeof(int))
}
return s;
}
Array and Pointer: When they are
equivalent?
Now the equivalent code using pointers
int sum(int* array, int n) {
int s=0;
int *p=&array[0];
int *pend=&array[n];
while (p < pend) {
s+=*p;
p++;
}
return s;
}
Array and Pointer: When they are
equivalent?
When you increment a pointer to integer it will be
incremented by 4 units because sizeof(int)==4.
Using pointers is more efficient because no
indexing is required and indexing require
multiplication.
Note: An optimizer may substitute the
multiplication by a “<<“ operator if the size is a
power of two. However, the array entries may not
be a power of 2 and integer multiplication may be
needed.
Array Operator Equivalence
We have the following equivalences:
int a[20];
a[i]
- is equivalent to
*(a+i)
- is equivalent to
*(&a[0]+i) – is equivalent to
*((int*)((char*)&a[0]+i*sizeof(int)))
You may substitute array indexing a[i] by
*((int*)((char*)&a[0]+i*sizeof(int)))
and it will work!
C was designed to be machine independent
Array and Pointer: When They
are not Equivalent?
They are mostly equivalent when referencing element in the code
The are not equivalent when defining variables
E.g., one file contains
int data[];
Another contains
extern int*data;
Compiler will not treat them as the same
Compiler generates different instructions for b[1]=a[1] for the two
versions below (one more level of indirection needed for “int *b”).
What will happen for code in red?
void func (int *a) {
int b[10];
…
b[1] = a[1];
*(&b) = a;
}
void func (int *a) {
int *b;
…
b[1] = a[1];
*(& b) = a;
}
2D Array.
st
1
a[3][2]:144:
a[3][1]:140:
a[3][0]:136:
a[2][2]:132:
a[2][1]:128:
a[2][0]:124:
a[1][2]:120:
a[1][1]:116:
a[1][0]:112:
a[0][2]:108:
a[0][1]:104:
a: a[0][0]:100:
1st approach
Normal 2D array.
int a[4][3];
a[i][j] ==
*(int*)((char*)a +
i*3*sizeof(int) +
j*sizeof(int))
Implementation
2D Array
nd
2
Implementation
2nd approach
Array of pointers to rows
int* (a[4]);
for(int i=0; i<4; i++){
a[i]=(int*)malloc(sizeof(int)*3);
assert(a[i]!=NULL);
}
2D Array
nd
2
Implementation
2nd approach
Array of pointers to rows (cont)
a:
a[3][0] a[3][1]a[3][2]
a[3]:112:
a[2]:108:
a[1]:104:
a[0]:100:
a[2][0] a[2][1]a[2][2]
a[1][0]a[1][1]a[1][2]
a[0][0] a[0][1]a[0][2]
int*(a[4]);
a[3][2]=5
2D Array
rd
3
Implementation
3rd approach. a is a pointer to an array of pointers
to rows.
int **a;
a=(int**)malloc(4*sizeof(int*));
assert( a!= NULL)
for(int i=0; i<4; i++)
{
a[i]=(int*)malloc(3*sizeof(int));
assert(a[i] != NULL)
}
2D Array
rd
3
Implementation
a is a pointer to an array of pointers to rows.
(cont.)
int **a;
a:100:
900
a[3]:912:
a[2]:908:
a[1]:904:
a[0]:900:
a[3][2]=5
a[3][0] a[3][1]a[3][2]
a[2][0] a[2][1]a[2][2]
a[1][0]a[1][1]a[1][2]
a[0][0] a[0][1]a[0][2]
Advantages of Pointer Based Arrays
You don’t need to know in advance the size
of the array (dynamic memory allocation)
You can define an array with different row
sizes
You do not need to know how many rows
there are when writing the program
Advantages of Pointer Based Arrays
Example: Triangular matrix
a:100: 900:
a[3]:912:
a[2]:908:
a[1]:904:
a[0]:900:
int **a;
a[3][0]
a[2][0] a[2][1]
a[1][0]a[1][1]a[1][2]
a[0][0] a[0][1]a[0][2]a[0][3]
Pointers to Functions
Pointers to functions are often used to implement
Polymorphism in “C”.
Polymorphism: Being able to use the same
function with arguments of different types.
Example of function pointer:
typedef void (*FuncPtr)(int a);
FuncPtr is a type of a pointer to a function that
takes an “int” as an argument and returns
“void”.
An Array Mapper
typedef void (*FuncPtr)(int a);
void intArrayMapper( int *array, int n, FuncPtr func ) {
for( int = 0; i < n; i++ ) {
(*func)( array[ i ] );
}
}
int s = 0;
void sumInt( int val ){
s += val;
}
void printInt( int val ) {
printf("val = %d \n", val);
}
Using the Array Mapper
int a[ ] = {3,4,7,8};
main( ){
// Print the values in the array
intArrayMapper(a, sizeof(a)/sizeof(int), printInt);
// Print the sum of the elements in the array
s = 0;
intArrayMapper(a, sizeof(a)/sizeof(int), sumInt);
printf(“total=%d\”, s);
}
Review: aka, topics that May
Appear in Exams
Size of data types under 32-bit and 64-bit.
Concept of call-by-value and call-by-reference
Design of programming language
Method of coding (passing in value or address)
Able to identify pointer usage and memory allocation
errors
Array vs pointers
2-D arrays
General C programming questions
Not strictly limited to things explicitly covered in this
class.
Upcoming
• Topic 2: Program Compilation and
Using GDB
Download