Computer Organization and Design Pointers, Arrays and Strings in C Montek Singh Feb 5, 2016 Lab 4 supplement C vs. Java For our purposes C is almost identical to Java except: C has “functions”, JAVA has “methods” function == method without “class” i.e., a global method C has “pointers” explicitly Java has them (called “references”) but hides them under the covers JVM takes care of handling pointers, so the programmer doesn’t have to C++ is sort of in-between C and Java In this class, we will see how pointers/references are implemented in C What is a “pointer” in C? A pointer is an explicit memory address Example int i Memory address i 1056 1060 1064 1068 i is an integer variable in memory located at, say, address 1056 int *p p is a variable that “points to” an integer p is located at, say, address 2004 p = &i the value in p is now equal to the address of variable i i.e., the value stored in Mem[2004] is 1056 (the location of i) p 1056 2000 2004 Referencing and Dereferencing Referencing an object means … taking its address and assigning it to a pointer variable (e.g., p = &i) Dereferencing a pointer means … going to the memory address pointed to by the pointer, and accessing the value there (e.g., *p) Memory address i 5 1056 1060 1064 1068 Example int i; int *p; p = &i; *p = 5; // // // // // // i is an int variable p is a pointer to int referencing i p is assigned 1056 dereference p i assigned 5 p 1056 2000 2004 Pointer expressions and arrays Dereferencing could be done to an expression So, not just *p, but can also write *(p+400) accesses memory location that is 400th int after i Arrays in C are really pointers underneath! int a[10]; // array of integers a itself simply refers to the address of the a is the same as &a[0] a is a constant of type “int *” a[0] is the same as *a a[1] is the same as *(a+1) a[k] is the same as *(a+k) a[j] = a[k]; is the same as *(a+j) = *(a+k); start of the array // address of a[0] Pointer arithmetic and object size IMPORTANT: Pointer expressions automatically account for the size of object pointed to Memory address i 1056 1060 1064 1068 Example 1 if p is of type “int *” and an int is 4 bytes long if p points to address 1056, (p+2) will point to address 1064 C compiler automatically does the multiply-by-4 BUT… in machine language, we will have to explicitly do the multiply-by-4 Example 2 char *q; // char is 1 byte q++; // really does add 1 p 1056 1064 2000 2004 Pointer examples int i; int a[10]; int *p; // simple integer variable // array of integers // pointer to integer p = &i; p = a; p = &a[5]; *p *p = 1; *(p+1) = 1; p[1] = 1; p++; // // // // // // // // & means address of a means &a[0] address of 6th element of a value at location pointed by p change value at that location change value at next location exactly the same as above step pointer to the next element Pointer pitfalls int i; // simple integer variable int a[10]; // array of integers int *p; // pointer to integer(s) So what happens when p = &i; What is value of p[0]? What is value of p[1]? Very easy to exceed bounds C has no bounds checking! Iterating through an array 2 ways to iterate through an array using array indices void clear1(int array[], int size) { for(int i=0; i<size; i++) array[i] = 0; } using pointers void clear2(int *array, int size) { for(int *p = &array[0]; p < &array[size]; p++) *p = 0; } or, also using pointers, but more concise (more cryptic!) void clear3(int *array, int size) { int *arrayend = array + size; while(array < arrayend) *array++ = 0; } Pointer summary In the “C” world and in the “machine” world: a pointer is just the address of an object in memory size of pointer itself is fixed regardless of size of object to get to the next object: in machine code: increment pointer by the object’s size in bytes in C: increment pointer by 1 to get the ith object: in machine code: add i*sizeof(object) to pointer in C: add i to pointer Examples: int R[5]; // 20 bytes storage R[i] is same as *(R+i) int *p = &R[3] is same as p = (R+3) (p points 12 bytes after start of R) Strings in C There is no string type in C 😞 What?! Only low-level support for strings as character arrays char s[] char *s // array of characters // pointer to the beginning of a string But a rich library of string processing functions (string.h) reading: scanf(), fgets(), etc. printing: printf(), puts(), etc. processing: strcpy(), strlen(), strcat(), strcmp() Strings in C: NULL termination All C functions assume string terminates with a NULL NULL = ASCII 0 character also written as ‘\0’ Example: “hello” is actually { ‘h’ ‘e’ ‘l’ ‘l’ ‘o’ ‘\0’ } uses 6 characters, not 5 so, for a string with N real characters, declare it as char S[N+1] but C functions define its length as 5 strlen(“hello”) returns 5 fputs(), printf(), etc. will print the string until they see a ‘\0’ Be mindful of the terminating character! Declaring strings statically Statically (if size is known at compile time) char S[6] = “hello”; char S[] = “hello”; // compiler puts in the size char string_array[5][10]; an array of 5 strings, each up to 9 characters long (plus NULL) string_array[0] is the 0-th (very first) string string_array[4] is the 4th (last) string string_array[0][4] is the 4th character of the first string etc. Declaring strings dynamically Dynamically (if size is known only at run time) 1. declare a pointer 2. determine size (at run time), e.g., 99 real characters + NULL 3. char *s; size is 100 allocate space in memory s = malloc(100); BUT remember: there is no bounds checking, so do not put a string with more than 99 real characters in s… Comparison to Java char *s = malloc(100); char[] s = new char[100]; // in C // in Java When finished with string free(s); // free up the 100 bytes Java: no need to free; JVM does garbage collection Declaring strings dynamically Examples Declare an array of NUM strings, each at most LEN long (incl. the terminating NULL), where NUM and LEN are known at compile time char string_array[NUM][LEN]; Declare an array of NUM strings, where only NUM is known at compile time char *string_array[NUM]; string_array[0] = malloc(length of string 0…); string_array[1] = malloc(length of string 1…); etc. Declare an array of strings, where we don’t know how many strings there will be, and how long each will be, at compile time char **string_array; string_array = malloc(how many strings … * sizeof(char *)); string_array[0] = malloc(length of string 0…); string_array[1] = malloc(length of string 1…); etc. An array of strings Declare an array of NUM strings, where only NUM is known at compile time char *string_array[NUM]; string_array[0] = malloc(length of string 0…); string_array[1] = malloc(length of string 1…); etc. For sorting: Swap strings by swapping pointers, not by copying contents! char *temp; // pointer to char temp = string_array[i]; // swap pointers! string_array[i] = string_array[j]; // no copying chars string_array[j] = temp; One more thing… Passing pointers as arguments to functions Passing arguments by reference Example 1: reading values into variables int x, y, z; scanf(“%d%d%d”, &x, &y, &z); scanf() changes the values of x, y and z! How? we provide it the addresses where x, y and z are located Example 2: a function to double the value given to it void double_it(int x) { x = x*2; } doesn’t work… because x is “passed by value” a copy of x is modified inside the function, not the original Passing arguments by reference Example 2 take 2: a function to double the value given to it void double_it(int *x) { *x = (*x) * 2; } int main() { int y = 10; double_it(&y); // } y is now 20 works! Because y is “passed by reference” the address in memory where y is stored is sent to the function, which modifies it correctly at that location Passing arguments: C vs. Java Which of these will work in Java? WHY? 1. Modify an integer double_it(int x) { x = x*2; } 2. Modify an object double_it(Point p) { p.x = p.x*2; } 3. Swap two objects swap(Point p1, Point p2) { Point temp = p1; p1 = p2; p2 = temp; } Passing arguments: C vs. Java Java: passes arguments by value for all the primitive data types int, double, float, etc. double_it(int x) { x = x*2; } // won’t work passes objects by reference so member data of an object can be modified double_it(Point p) { p.x = p.x*2; } // works! BUT: references are passed by value to methods… so cannot swap two objects swap(Point p1, Point p2) { Point temp = p1; p1 = p2; p2 = temp; } // won’t work! Understanding C pointers will demystify everything!