Handy Guide to Unix/Linux Commands: (we will be using unix.andrew.cmu.edu)

cd <directoryname>
    Changes the current directory to <directoryname>. Use “cd ..” to change directory to the parent directory.

cp <filename1> <filename2>
    Copies <filename1> to <filename2>. If <filename2> already exists as a file, it is destroyed.

logout
    Ends your Unix/Linux session.

ls [<directoryname>]
    Lists the contents (files and subdirectories) of the directory <directoryname>. Use “ls” by itself to list the contents of the current directory.

mkdir <directoryname>
    Creates a subdirectory named <directoryname> in the current directory. If <directoryname> already exists, mkdir reports an error and nothing is changed.

more <filename>
    Displays the contents of a text-based file one “page” at a time. Attempting to use more with a non-text-based file might create some interesting side effects.

mv <filename1> <filename2>
    Renames <filename1> to <filename2>. This command can be used to move files if the filename includes a path. If <filename2> already exists as a file, it is destroyed.

pwd
    Shows the complete absolute path to the current directory.

rm <filename>
    Removes <filename>.

rmdir <directoryname>
    Removes the subdirectory named <directoryname> if it is empty. If <directoryname> contains any files or subdirectories, rmdir reports an error and nothing is removed.

!<letter or letters>
    Repeats the last command beginning with the <letter or letters> supplied.

Compiling a C program:

gcc <Cfilename> [-o <outputfilename>] [-lm]
    Compiles the C code in <Cfilename> and creates an executable file named <outputfilename>. If no <outputfilename> is specified, a file named “a.out” is created. If <outputfilename> (or “a.out”) already exists as a file, it is destroyed. Add “-lm” to link the mathematics library.

Handy Guide to C:

What every program should have:

In the first lines, include the libraries you will need.
For now, all we need are the input and output functions, so your first line of code should be:

    #include <stdio.h>

Every program needs a main block, which indicates where the program starts execution:

    int main ()
    {
        // this is where your code goes
        printf("Hello, world!\n");
        return 0;                         // 0 tells the operating system all went well
    }

Comments are a very good idea, especially when people like hard-working teaching assistants are trying to evaluate or help debug your code for you. Any text on a line following a double slash will be ignored by the compiler.

Key differences between C and C++ or Java:

To print to the screen, use the printf() function. To print a string constant, simply use quotes:

    printf("Hello, world!\n");

The “\n” is called a “new line” character. It tells the output to go to the next line.

To print the value of integers on the screen, use the printf() function with “%d” in the place you would like the integer to appear. Then insert, as additional parameters, the variable (or variables) to use in place of each “%d”. For example:

    int main ()
    {
        int i;
        i = 12;
        printf("Hello, world... for the %dth time!!\n", i);
        return 0;
    }

will print “Hello, world... for the 12th time!!”. The additional parameters for printf() may be variables, constants, or expressions.

In older versions of C (the C89 standard, which some compilers and course machines still use), variables must be declared at the beginning of each code block; you may not mix declarations with code. C99 and later relax this rule, but this guide sticks to the older style. For example, the following for loop in C++ (or similar construction in Java) will not compile as C89 code.

    int main ()
    {
        for (int i = 0; i < 10; i++) {
            printf("%d\n", i);
        }
        return 0;
    }

Written the following way, it compiles under either standard:

    int main ()
    {
        int i;
        for (i = 0; i < 10; i++) {
            printf("%d\n", i);
        }
        return 0;
    }

Handy Guide to Control Structures (which are the same in C, C++, and Java):

If/Then/Else: Basic construction (the else clause is optional).
    if (expression that is true or false) {
        statements to execute if the expression is true
    } else {
        statements to execute if the expression is false
    }

Example:

    if (i > 5) {
        printf("%d is a number bigger than five!!", i);
    } else {
        printf("%d is smaller than or equal to five.", i);
    }

While or Do/While: Basic constructions.

    while (expression that is true or false) {
        statements to execute if the expression is true

        after the statements are executed, check the expression again,
        and rerun the statements if it is still true.
    }

OR

    do {
        statements to execute at least once

        after the statements are executed, check the expression (again),
        and rerun the statements if it is (still) true.
    } while (expression that is true or false);

Example:

    while (i > 5) {
        printf("%d is still a number bigger than five!!", i);
        i = i - 10;
    }

For: Basic construction.

    for (initializing statement; expression that is true or false; incrementing statement) {
        statements to execute if the expression is true after the
        initializing statement is executed

        after the statements are executed, execute the incrementing
        statement and check the expression again. Rerun the statements
        if it is still true.
    }

Example:

    for (i = 0; i < 10; i++) {
        printf("%d\n", i);
    }

Note: The statement “i++” is shorthand for “i = i+1”. The first C programmers were not good typists and developed many shortcuts such as this one.

Handy Guide to some of the interesting quirks of C, C++, and Java:

Uninitialized variables

When you declare a variable inside a function without giving it a value, it holds whatever happened to be in that piece of memory. That will often, but not always, be zero. Never rely on the contents of an uninitialized variable.

Assignment vs. Equality

“=” is the assignment operator, while “==” is the comparison operator.

    int main ()
    {
        int i;
        i = 12;                           // this statement sets the variable i to 12
        if (i == 12) {                    // this starts an if statement
            printf("it's 12!\n");         // this will execute if i is 12.
        }
        if (i = 14) {                     // this will not do what you think,
            printf("it's 14!\n");         // but it will not generate a
        }                                 // compiler error.
        return 0;
    }

The second if assigns 14 to i and then tests the result. Since 14 is not zero, it counts as true, and “it's 14!” is always printed.

Integer Division

Performing operations on integers will always yield integers, even if you think they shouldn’t.

    int main ()
    {
        int i, j;
        i = 14;
        j = 3;
        printf("%d\n", i/j);              // remainders are dropped: this prints 4
        return 0;
    }

Compound Conditionals with and (&&) and or (||) and not (!).

If you wish to construct more complicated true or false expressions using the logical operators and and or, you use a double ampersand for and (&&) and a double vertical bar for or (||).

For an expression that is true when i is between 5 and 10, use

    if (i >= 5 && i <= 10) {

For an expression that is true when either i or j is larger than 100 (or both larger), use

    if (i > 100 || j > 100) {

For an expression that is true when i is not equal to 57 you could use

    if (!(i == 57)) {

or, in this case, you could use the operator for “not equal to”, which is “!=”

    if (i != 57) {

Exercise 1: Write a program that will print all the squares of the integers from 1 to 20 except 6². The output should look like:

    1 squared is 1.
    2 squared is 4.
    3 squared is 9.
    4 squared is 16.
    5 squared is 25.
    7 squared is 49.
    8 squared is 64.
    9 squared is 81.
    10 squared is 100.
    etc...

Handy Guide to Writing C Functions:

C functions are what give a C program its structure. A function can return a single value, such as an int, a char, or a double (C also allows a function to return a struct, but we will stick to simple types here). If the function will not return anything, programmers indicate this with the void type. Technically, the main block of code is itself a function that returns an int to the operating system. Writing just plain “main” relies on an old rule that a missing return type means int; newer compilers may insist that “int main” be written out, and “void main” is not standard C even though many compilers accept it. A function can take input parameters, specified in a parameter list following the declaration.
    #include <stdio.h>

    int Squarer (int param) {
        return (param*param);
    }

    int main ()
    {
        int i, j;
        i = Squarer(4);
        j = 7;
        printf("i is equal to %d.\n", i);
        printf("%d squared is %d.\n", j, Squarer(j));
        printf("j is still equal to %d.\n", j);
        return 0;
    }

Note that when variables are passed to a function as parameters, they are not modified. This is because copies of the variables are made for the function to use.

Variables declared within a function cannot be seen or used outside the function. These are called local variables.

    #include <stdio.h>

    int Factorial (int param) {
        int returnValue;                  // this is a local variable
        returnValue = 1;
        while (param > 0) {
            returnValue = returnValue * param;
            param--;                      // shorthand for param = param - 1;
        }
        return (returnValue);
    }

    int main ()
    {
        int i;
        i = 5;
        printf("%d factorial is %d.\n", i, Factorial(i));
        printf("i is still equal to %d.\n", i);
        return 0;
    }

Functions with multiple parameters should have each parameter in the parameter list separated with commas. Each parameter still must be given its type individually:

    void MultiParamFunction (int p1, int p2, int p3) {

Functions with no parameters still need parentheses with an empty parameter list:

    int NoParamFunction () {

Handy Guide to structs in C.

structs in C (from which came their more robust descendants: classes of C++ and Java) are compound data types, useful for binding multiple pieces of information into a single record. I recommend first creating a type for your structs, and then you can use this type anywhere you might use other types, like int, char, or double.
To declare a type for PGSS student records, for example:

    #include <stdio.h>

    typedef struct {
        int IDnumber;                     // ID number for the student
        int HomeZipCode;                  // Zip Code for student’s residence
        char Lab;                         // B=Bio, C=Chem, F=Forens, S=Comp Sci, P=Phys
    } stuType;

    int main ()
    {
        stuType me;                       // declares variable "me" of type stuType
        // main code goes here
        return 0;
    }

To access the pieces of this struct, use the period operator, just as you would in C++ or Java:

    me.IDnumber = 47;                     // I’m now the 47th student
    if (me.HomeZipCode > 18900) {
        printf("I live in the Philadelphia Area!\n");
    }
    if (me.Lab == 'S') {                  // Use single quotes for chars
        printf("I have chosen wisely!\n");
    }

Note that I didn’t include a field for your name… only because I do not wish to teach you about strings and character arrays in C at this point in time.

Exercise 2: Write a program with the PGSS student type from this page, then write a function which will print whether the student, passed as a parameter, is in the Computer Science lab or not. If you name your function CSstatus, and the main code looks like:

    int main ()
    {
        stuType s1, s2;
        s1.IDnumber = 12;
        s2.IDnumber = 47;
        s1.Lab = 'C';
        s2.Lab = 'S';
        CSstatus(s1);
        CSstatus(s2);
        return 0;
    }

your program should print something like:

    Student 12 is foolish not to take the Computer Science Lab.
    Student 47 is wise to take the Computer Science Lab.

Handy Guide to Pointers (a.k.a. Pointers on Pointers)

A pointer is a type of variable that stores the location of data, not the data itself. Of course, locations of data are really just another kind of data, but to keep from confusing pointers and data, let us promise never to reference locations as data. These locations are actually memory addresses within your running program, but you really don’t need to know that.

Pointers come in different types, just like data. For example, you can have int pointers, char pointers, double pointers. You also can make pointers to structs or classes.
If you really feel brave, you can have pointers pointing to other pointers. The type determines how the data being pointed at is to be interpreted.

Declaring a pointer: To declare a pointer of a specific type, use an asterisk before the variable name:

    o int *p; creates an integer pointer named p.
    o char *myPtr; creates a character pointer named myPtr.
    o StructType *qq; creates a pointer named qq that points to an object of type StructType, assuming it is a struct.

Note that creating a pointer does not simultaneously create an object for it to point toward. Before using your pointer, you should assign it the location of an appropriate object.

Finding an address: Use the & operator to get an address of an object:

    int *p;       // create an integer pointer named p
    int j = 12;   // create an integer named j initialized to 12
    p = &j;       // assign to p the address of j.

The pointer p now points to the variable j.

    p --> j [ 12 ]

Following a pointer: Use the * operator to get to the object that a pointer points toward:

    int *p;       // create an integer pointer named p
    int j,k;      // declare two integers
    j = 12;       // j is assigned to 12
    k = 14;       // k is assigned to 14
    p = &j;       // assign to p the address of j.
    k = *p;       // k gets the value of the object p points at: 12
    *p = 16;      // the object p points at is assigned the value 16
    k = j;        // k gets the value of j, which is 16 from the previous line.
    k = p;        // this is bad news, and should create a compiler warning.
                  // You should never mix locations and data types in this way.

Counting from the first line of this example, the variables hold:

                    j     k
    after line 4:   12    14
    after line 6:   12    12
    after line 7:   16    12

NULL pointers: If you do not wish your pointer to point at anything (yet), but you do not wish to leave it uninitialized, you may set it to NULL. This can be useful for testing pointers before trying to follow them.

    int *p;       // create an integer pointer named p
    p = NULL;     // set it to NULL

It should be noted that this is how functions can modify variables.
If a pointer to a variable is passed as a parameter, the function makes a copy of the pointer, but what is being pointed at is still the actual variable. For example, this Doubler procedure will actually double the value of a variable:

    #include <stdio.h>

    void Doubler (int *p) {
        *p = 2 * (*p);
    }

    int main ()
    {
        int i;
        i = 7;
        Doubler (&i);
        printf("i is now equal to %d.\n", i);
        return 0;
    }

This principle is employed by the scanf function, which allows the program to read input from the keyboard.

    #include <stdio.h>

    void Doubler (int *p) {
        *p = 2 * (*p);
    }

    int main ()
    {
        int i;
        printf("type an integer, then press enter: ");
        scanf ("%d", &i);
        printf("%d doubled is", i);
        Doubler (&i);
        printf(" %d.\n", i);
        return 0;
    }

Exercise 3: Write a program that will prompt the user for an integer. Then ask the user whether the number should be doubled or squared, then show the user the result of the chosen operation. (How you design the formatting of the input and output is up to you.)

Handy Guide to malloc

Allocating memory for your pointer as needed: You can use the malloc function to create memory for your pointers to point at instead of having them point at existing variables. To use malloc (and free), add #include <stdlib.h> to the top of your program. malloc takes one parameter, which is the number of bytes needed for the object you are malloc-ing. If you don’t know how many bytes your data type uses (especially if it is a struct you created), fear not, the sizeof operator is here to come to your rescue.
    int *p;                    // create an integer pointer named p
    p = malloc(sizeof(int));   // p now points at an integer-size piece of memory

Keeping track of malloc-ed memory

It is easy to lose track of malloc-ed memory by reassigning pointers, for example:

    int *p, *q;                // create two integer pointers
    p = malloc(sizeof(int));   // p now points at an integer-size piece of memory
    q = malloc(sizeof(int));   // q now points at an integer-size piece of memory
    *p = 12;                   // the integer-size piece of memory p points at
                               // now is assigned to 12
    *q = 14;                   // similarly for q, but 14
    q = p;                     // this is called a memory leak

Counting from the first line of this example:

    after line 5:   p --> [ 12 ]     q --> [ 14 ]
    after line 6:   p --> [ 12 ] <-- q     [ 14 ] (nothing points here)

    o You might think that line 6 should simply assign the memory that q points at to the same value that p points at, but it doesn’t. It actually makes q point at the same piece of memory p points at, meaning that we now have nothing pointing at what q used to point at. There is no way to get it back; it is lost forever.
    o The statement should have been: *q = *p;
    o C does not collect garbage for you. Malloc-ed memory stays allocated after the function that called malloc completes; it is reclaimed only when you release it or when the whole program ends. You release memory for recycling (and this is good practice) using the free function.

    int *p;                    // create an integer pointer named p
    p = malloc(sizeof(int));   // p now points at an integer-size piece of memory
    *p = 12;                   // the integer-size piece of memory now is assigned to 12
    free (p);                  // the memory used by p is now free for future mallocs

Handy Guide to linked lists

What is a linked list? A linked list is a group of objects. Each object has at least one piece of data and a link to the next object. In C, these objects can be created using structs. The struct needs a pointer to its own type, but the type’s final name does not exist until the definition is finished, so you must give the structure a temporary name while you are creating it.

    typedef struct emma {      // use whatever name you like.
        int data;
        struct emma *next;
    } NodeType;

A linked list also needs a pointer to the beginning of the list, traditionally named the head.

    NodeType *listHead;

A linked list containing the first three primes could be constructed like this.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct emma {
        int data;
        struct emma *next;
    } NodeType;                           // structure for linked list.

    int main ()
    {
        NodeType *listHead;               // points to the head of the list.
        listHead = malloc(sizeof(NodeType));
        (*listHead).data = 2;
        (*listHead).next = malloc(sizeof(NodeType));
        (*(*listHead).next).data = 3;
        (*(*listHead).next).next = malloc(sizeof(NodeType));
        (*(*(*listHead).next).next).data = 5;
        (*(*(*listHead).next).next).next = NULL;
        printf("%d %d %d\n", (*listHead).data,
                             (*(*listHead).next).data,
                             (*(*(*listHead).next).next).data);
        return 0;
    }

A picture of this list might look something like this:

    listHead --> [ 2 ] --> [ 3 ] --> [ 5 ] --> NULL

It doesn’t take long to see that all these asterisks and parentheses will soon get overwhelming. One important shortcut is the arrow operator, which is typed by combining a hyphen and a greater than sign. The arrow is equivalent to “follow the pointer to the struct and get this element”.

    o (*listHead).data = 2; is equivalent to listHead->data = 2;
    o (*(*(*listHead).next).next).next = NULL; is equivalent to listHead->next->next->next = NULL; which is a little better

Using a temporary pointer to traverse the list, we can print the contents using a while loop.

    NodeType *trav;
    trav = listHead;                      // trav now points to the head of the list, too.
    while (trav != NULL) {
        printf("%d ", trav->data);
        trav = trav->next;                // trav now points at the next node of the list.
    }
    printf("\n");

Exercise 4: Use a while loop (or for loop if you want to get cute) to create a linked list containing the first ten even integers, then write a function containing the while loop above to print the contents.
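The same traversal pattern can also hand a whole list back to free once you are finished with it. The sketch below is only an illustration: the function name FreeList and its return value (the number of nodes released) are assumptions made for this example and are not part of the guide. The one trap is that a node's next pointer has to be copied before the node itself is freed.

```c
#include <stdlib.h>

typedef struct emma {
    int data;
    struct emma *next;
} NodeType;

/* FreeList is a hypothetical helper (not from the guide). It frees every
   node of a NULL-terminated list and returns how many nodes it freed. */
int FreeList (NodeType *listHead) {
    NodeType *trav;      /* the node about to be freed */
    NodeType *after;     /* remembers the rest of the list */
    int count;

    count = 0;
    trav = listHead;
    while (trav != NULL) {
        after = trav->next;   /* save the link before the node disappears */
        free (trav);
        trav = after;
        count++;
    }
    return (count);
}
```

After calling FreeList (listHead), set listHead to NULL so that nothing follows the stale pointer. This sketch assumes the list ends in NULL; it would never stop on a circular list.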
Inserting in a linked list: Should you decide that you no longer want a list of primes, and that you really want a “4” between the three and five, you do not need to overwrite your data (which would get computationally expensive if there were lots of elements following the insertion place). Create (malloc) a new node and insert it into the list by setting the pointers appropriately.

    Before insertion:

        listHead --> [ 2 ] --> [ 3 ] --> [ 5 ] --> NULL

        new malloc-ed node: [ 4 ]

    After insertion:

        listHead --> [ 2 ] --> [ 3 ] --> [ 4 ] --> [ 5 ] --> NULL

Deleting in a linked list: Should you decide that you now find “3” offensive, you can remove this from the list. Don’t forget to free the space. Don’t lose track of the memory before freeing it.

    After deletion:

        listHead --> [ 2 ] --> [ 4 ] --> [ 5 ] --> NULL

Note that you need to access the pointer from the previous node to perform the link across the deleted node, so we need to pass the node before the node to delete to a function:

    void DeleteAfter (NodeType *nodeBefore) {
        NodeType *temp;                   // temporary pointer for deleted node
        temp = nodeBefore->next;
        if (temp == NULL) {
            printf("Hey, this is the end of the list already!\n");
            return;                       // nothing to delete, so stop here
        }
        nodeBefore->next = temp->next;
        free (temp);
    }

This function cannot be used to delete the first node of a list; you will need a different routine, or you will need to redesign the function.

Exercise 5: Write a function, named InsertAfter, which will take two input parameters: one will be a pointer to a new node (with data already assigned) to be inserted, the other will be the node that the new node should follow. Use this insertion routine to insert odd primes between your even number list of exercise 4 before printing. Do you need special cases (different functions) for inserting at the beginning or end of the list? Can you design your routine to avoid the need for any special cases?

Why not call malloc in the insertion routine: You could. C has no garbage collection, so memory malloc-ed inside a function stays allocated after the function ends, until you free it. Having the caller create the node and fill in its data simply keeps the insertion routine down to its one job, which is setting pointers. Whichever routine calls malloc, some part of your program must eventually free the node.

Exercise 6:
a) Write two special functions to operate on your linked list:
    Push should take two parameters, the list head and the node to be added, and it should add the node to the front of the list and update the head to point at the front of the list.
    Pop should take one parameter, the head of the list, and it should retrieve the data from the top of the list, update the head to point at the second node in the list, free the node that used to be on top of the list for recycling, and return the data that used to be on top of the list.
b) Bonus material: Pop should print an error message if the list is already empty. Technically, Push should print an error message if there is no memory left to be malloc-ed. If malloc fails to allocate memory, it will return NULL.

In exercise 6, you have created a data structure known as a stack, where the last item you put on the list is the first item that comes off (much like a stack of cafeteria trays).

If you are an experienced programmer and you complete this work before the end of the lab period, think about how you would employ a stack structure to solve a maze. Imagine you have access to a function that takes as input parameters an x-coordinate, a y-coordinate, and an integer representing a direction (0=north, 1=east, 2=south, 3=west). The function returns “1” if there is a wall there, and “0” if not.

Handy Guide to variant linked lists:

There are two variations on a linked list. A circular linked list is a linked list where the next pointer of the last item points back to the first item. A picture would be:

    listHead --> [ 2 ] --> [ 3 ] --> [ 5 ] --+
                   ^                         |
                   +-------------------------+

A doubly linked list is a linked list where each node not only has a pointer to the next item, but also a pointer to the previous item. The structure could be defined like this:

    typedef struct emma {      // use whatever name you like.
        int data;
        struct emma *next;
        struct emma *prev;
    } NodeType;

    listHead --> [ 2 ] <--> [ 3 ] <--> [ 5 ]

    (each <--> is a next pointer going right and a prev pointer going left)

Both variations can be combined to create a circular doubly linked list, where the next pointer of the last item points at the first item and the prev pointer of the first item points at the last item. This is what we are going to need in our final project.

    [Figure: the same three nodes, with the next pointer of 5 pointing at 2 and the prev pointer of 2 pointing at 5.]

Exercise 7:
a) Implement a struct for a doubly linked list, and modify the code from exercise 4 to create this list. Make this list circular, so the next pointer on the end of the list should point to the first node, and the prev pointer at the head should point to the last node in the list.
b) Construct such a list containing the integers from 0 to 99.
c) Modify your print routine from exercise 4 to print this new list (since the implementation is circular, we will never encounter a NULL pointer, and the print routine would run forever… or at least until the next reboot or power outage). You may assume that data in these nodes will be unique, but this is not necessary to create this piece of code.

Exercise 8: Create a routine that will traverse the list (much like your print routine) to find a particular piece of data. The routine should return a pointer to the node containing the desired data. If the data is not in the list, don’t make it keep looping around the list forever; it should stop, like your print routine. If you are unable to find the data you were looking for, return NULL. Your function declaration should look like:

    NodeType *FindNode (int target, NodeType *trav) {
        // code goes here!!
    }

Exercise 9: Create a routine that will “reverse” the direction of all nodes between two given nodes (given by supplying pointers to the nodes). If your function declaration looks like

    void ReverseList (NodeType *startPtr, NodeType *endPtr) {
        // code goes here!!
    }

and your list looks like:

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

then the following code:

    ptr1 = FindNode (5, listHead);
    ptr2 = FindNode (9, listHead);
    ReverseList (ptr1, ptr2);
    PrintList (listHead);

should print a list that looks like:

    1 2 3 4 9 8 7 6 5 10 11 12 13 14 15
            ---------

Notice the underlined nodes are reversed. (No, your program should not underline the nodes.)

Exercise 10: (completing this exercise will make for a better final project, but it is not required) Create a routine that will move part of the list someplace else in the list (again, given by supplying pointers to the nodes). If your function declaration looks like

    void MoveList (NodeType *startPtr, NodeType *endPtr, NodeType *destPtr) {
        // code goes here!!
    }

and your list looks like:

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

then the following code:

    ptr1 = FindNode (5, listHead);
    ptr2 = FindNode (9, listHead);
    ptr3 = FindNode (13, listHead);
    MoveList (ptr1, ptr2, ptr3);
    PrintList (listHead);

should print a list that looks like:

    1 2 3 4 10 11 12 13 5 6 7 8 9 14 15
                        ---------

Notice the underlined nodes have been moved. (No, your program still should not underline.)

Exercise 11: (not needed for the final project) Package these procedures in a program that prints a linked list containing integers from 1 to 20, then prompts the user to reverse, move, or move & reverse portions of the list. If you really want to get fancy, catch potential disasters before they happen, like trying to move a piece of the list inside itself (moving the portion of the list from 5 to 9 after 7 will create some serious list issues).

Handy Guide to arrays:

Array declarations. Array declarations are similar to pointer declarations in that an array name refers to a location in memory. Array declarations are different from pointer declarations in that an array declaration also creates memory to hold the information... no malloc required. An array declaration creates a block of a pre-specified number of cells that can be accessed by index.
    int i;
    int myArray[5];            // creates space for five integers, indexed from zero to four
    myArray[0] = 3;            // sets the first element to three
    myArray[3] = 7;            // sets the fourth element (yes, fourth element) to seven
    for (i=0; i<5; i++) {      // after this loop, all five elements have been assigned
        myArray[i] = 10+i;
    }

Unlike a linked list, you can access any piece of information by asking for a particular index. Unlike a linked list, you cannot insert an item in the middle of an array without moving all of the entries past the insertion point down one index.

After declaring the array, we have uninitialized data:

    index:      0    1    2    3    4
    myArray:    ?    ?    ?    ?    ?

After the first two assignments, some entries have values:

    index:      0    1    2    3    4
    myArray:    3    ?    ?    7    ?

After the loop, all items have been assigned:

    index:      0    1    2    3    4
    myArray:   10   11   12   13   14

Exercise 12: Create a small block of code to create an array of ten integers, then prompt the user to enter ten integers, one at a time, to be put into the array, then print the contents of the array. Use for loops to read the data from the keyboard and to print the data to the screen.

Array data can be initialized when declared by supplying a list inside curly braces.

    int myArray[5] = {10,11,12,13,14};   // initializes the contents of the array

This construction can only be used when the array is first declared; you cannot assign values into an already-declared array with this construction.

Adding the line “#include <math.h>” to the beginning of the code, and adding “-lm” to the compile command, will allow us to use functions in the math library, like square root. For example:

    #include <stdio.h>
    #include <math.h>

    int main ()
    {
        int i;
        double d;
        printf("Enter an integer: ");
        scanf("%d", &i);
        d = sqrt(i);
        printf("The square root is: %f\n", d);   // use %f to print doubles
        return 0;
    }

Two dimensional arrays (arrays of arrays)

If you wish to create a table of data that is organized in two dimensions, create a two dimensional array.
    int myArray[3][4];         // creates space for 3 arrays, each of size 4
    myArray[1][2] = 12;

This array could be pictured like this:

                     0    1    2    3
    myArray[0]       ?    ?    ?    ?
    myArray[1]       ?    ?   12    ?
    myArray[2]       ?    ?    ?    ?

Exercise 13: Create a five by five, two-dimensional array of doubles. In each entry, store the distance between two points on the following grid (use Pythagoras, with the square root function). For example, if your array is named distances, then distances[0][2] should be the distance from point zero to point two, which should be the square root of 13. Your program should print this information in a readable format. Use the following declarations for the x-coordinates and y-coordinates for the points.

    int xCoords[5] = {0,1,3,4,6};
    int yCoords[5] = {0,3,2,5,1};
    double distances[5][5];

    [Figure: the five points, numbered 0 to 4, plotted on a grid.]

Handy Guide to sorting arrays:

Bubble sort: The easiest to implement and slowest of the sorting algorithms. It looks at adjacent pairs of items, and swaps them if they are in the wrong order. Notice that when you pass an array as a parameter, you do not need to specify the number of items in the array in its declaration.

    #include <stdio.h>

    void BubbleSort (int theArray[], int numItems) {
        int loop1, loop2, swapspace;
        for (loop1=0; loop1<numItems-1; loop1++) {
            for (loop2=numItems-1; loop2>loop1; loop2--) {
                if (theArray[loop2-1] > theArray[loop2]) {
                    swapspace = theArray[loop2-1];
                    theArray[loop2-1] = theArray[loop2];
                    theArray[loop2] = swapspace;
                }
            }
        }
    }

    int main ()
    {
        int myArray[8] = {6,2,5,8,1,7,3,4};
        int c;
        printf("Unsorted: ");
        for (c=0; c<8; c++) {
            printf("%d ", myArray[c]);
        }
        BubbleSort (myArray, 8);
        printf("\nSorted: ");
        for (c=0; c<8; c++) {
            printf("%d ", myArray[c]);
        }
        printf("\n");
        return 0;
    }

If you take a close look at the sorting routine, you will see the inner for loop (loop2) travels from the back of the list to the front of the list, swapping items that are out of order.
The result of this is a slightly more sorted list, but we can be guaranteed that the smallest item has “bubbled” to the front of the list.

    [Figure: bar charts of the array when unsorted, after one, two, and three passes of the inner loop, and as the final sorted list.]

Insertion sort: Still slow, but better than bubble sort. This is how humans tend to sort.

Outline:
    o Consider the first element of the array as a sorted list of size one. (odd, I know)
    o Add the second element to the sorted list by “bubbling” it into its proper location.
    o Add the third element into this sorted list of size two by “bubbling” it as well.
    o Continue until the entire list is sorted.

Note that some items may “bubble” a great distance, while others only need to “bubble” a short distance; this is what gives the insertion sort a better running time.

Picture:

    [Figure: bar charts of the array when unsorted, after one pass, after two & three passes, and after four passes of the inner loop, and as the final sorted list.]

Selection sort: Still slow, but better than bubble sort. This is useful when swaps are expensive.

Outline:
    o Find the lowest item in the list and swap it with the first element.
    o Find the lowest item in the remaining part of the list; swap it with the second element.
    o Continue until the entire list is sorted.

Requires as many comparisons as bubble sort, but far fewer swaps. This is also convenient if you only need the first few items to be sorted.

Picture:

    [Figure: bar charts of the array when unsorted, after one & two passes, and after three passes of the inner loop.]

Average case analysis, list size n

                        Comparisons:    Swaps:
    Bubble Sort         (n² - n)/2      (n² - n)/4
    Insertion Sort      (n² - n)/4      (n² - n)/4
    Selection Sort      (n² - n)/2      n - 1

Merge Sort: This sort breaks the O(n²) barrier by using a divide and conquer approach. Defined recursively.
Outline:
    o Break the list in half, sort each half (using merge sort), and then merge the lists back into one sorted list.
    o The “base case” is given by assuming that lists of size one are already sorted.

Runs in O(n lg n) time, but requires extra space to be allocated for the merge.

Picture:

    [Figure: bar charts of the array when unsorted, after sorting the two halves, and as the sorted list.]

Quick Sort: Like merge sort, this usually runs in O(n lg n) time by a recursive divide and conquer approach.

Outline:
    o Using the first element as a pivot point, separate the list into all items less than this pivot value (to the left) and all items more than the pivot point (to the right).
        Set the “target” to the last element and the “pivot” to the first element.
        As long as the pivot is less than the target, move the target location left. As soon as the pivot is greater than the target, swap the pivot and target.
        Now the pivot is on the right side of the target, so as long as the pivot is greater than the target, move the target location right. As soon as the pivot is less than the target, swap again.
        Continue until the target meets the pivot.
    o Sort the remaining two pieces of the list (with quick sort).

Will run in O(n²) if the list is already sorted (or nearly sorted).

Picture of using the pivot (each row shows the list before the step described):

    6 2 5 8 1 7 3 4    pivot = 6, target = 4. Since pivot > target, swap.
    4 2 5 8 1 7 3 6    pivot = 6, target moves from 4 to 2 then 5 then 8. When target = 8, swap.
    4 2 5 6 1 7 3 8    pivot = 6, target moves from 8 to 3. When target = 3, swap.
    4 2 5 3 1 7 6 8    pivot = 6, target moves from 3 to 1 then 7. When target = 7, swap.
    4 2 5 3 1 6 7 8    pivot = 6, target moves from 7 to 6. When target = 6, stop.

The pivot is in the correct place. Use quick sort to sort the rest.

Exercise 14: (see: www.andrew.cmu.edu/~neils/PGSS) You will find a data file containing these points from exercise 13 (with point 4 moved up a few units). Grid marks are now 100 units.
You will also find the code for a routine that will read the data into two arrays of doubles, one for x-coordinates and one for y-coordinates. Store the distances in a matrix of integers, rounding each distance to the nearest integer. This is done by:
    value = (int)(0.5+sqrt(blah blah blah))
(Using sqrt() requires #include <math.h> and linking with -lm.)
Implement the sort of your choice (for the best final project, I recommend quick sort (harder) or selection sort (easier)), and sort each row of your distance matrix with this sort. Note that if you have a sort function that takes the array to be sorted as a parameter, you can access the (n+1)th row of your matrix by passing distances[n] to your routine.

Indexed Sorting:
In some cases, like with our distance matrix, we wish to have a list of nodes ordered by which node is closest to another node, but we do not wish to reorder the actual data to be sorted. In this case, we create an index list, initialized to an ascending list of integers from 0 to n-1. Where we would ordinarily swap items in the list for sorting, we swap the indices instead. The BubbleSort function would look like this instead:

void BubbleSortIndex (int data[], int index[], int numItems)
{
    int loop1, loop2, swapspace;
    for (loop1=0; loop1<numItems; loop1++)
    {
        index[loop1] = loop1; // initialize the index list
    }
    for (loop1=0; loop1<numItems-1; loop1++)
    {
        for (loop2=numItems-1; loop2>loop1; loop2--)
        {
            // compare the data, but swap only the indices
            if (data[index[loop2-1]] > data[index[loop2]])
            {
                swapspace = index[loop2-1];
                index[loop2-1] = index[loop2];
                index[loop2] = swapspace;
            }
        }
    }
}

Exercise 15:
Declare a matrix of indexes to go with your matrix of distances. Modify your sort routine in exercise 14 to create an indexed sort. Each row of the index matrix should be the sorted index list for the corresponding row of the distance matrix.
Exercise 16: (Nearest Neighbor)
Create a list of integers by starting with zero, and then adding the nearest unused number (according to the distance matrix) to the list, until the list contains all the points. In the example above, the list would be “0 1 2 3 4”: point 1 is closest to 0, point 2 is closest to 1, point 3 is closest to 2 (we are not allowed to revisit point 1), and finally, point 4 is the only point left. You will need to keep track of which points have been used as you build the list. Another array, filled with zeros at the start, could be used by setting the nth element to 1 when you visit point n. Try your nearest neighbor routine on other test data sets. We will want to use the resulting list as the data for the doubly linked list we implemented in exercise 7.

Exercise 17: (2-Opt: flipping part of the tour)
Now that you can create a nearest neighbor tour, use this ordering to initialize your linked list. By using two traversal pointers (in two nested loops), test for potential distance improvements by examining whether flipping part of the tour (exercise 9) would shorten the total distance. Consider the test data in the figure.
[figure not recoverable; two panels showing points 0 through 4, labeled Nearest Neighbor and 2-Opt]
The nearest neighbor tour will be 0-1-2-3-4-0, but reversing part of the tour gives a shorter distance: 0-1-3-2-4-0. Data for this instance (ex17.tsp) is also available at: www.andrew.cmu.edu/~neils/PGSS
When looking to determine if a flip will give you a shorter distance, consider only the edges that are removed and added by making the switch. In this example, we are removing the edge from 1 to 2 (cost: 224) and the edge from 3 to 4 (cost: 510), and we are adding the edge from 1 to 3 (cost: 316) and the edge from 2 to 4 (cost: 300). Since 224+510 > 316+300, this flip saves us (224+510) - (316+300) = 118. Once you find a flip that grants savings, you should not give up the search; there may be more flips to do.
There are two options to consider when implementing this algorithm: consider all flips and take the best one, or take a flip as soon as you find an improvement and then restart your search. Both approaches will usually result in the same final answer, and the latter will run a little faster. When you can no longer find improvements to the tour, print the final sequence and the cost of the round trip (don’t forget to include the cost to go from the last point to the first point).

Exercise 18: (3-Opt: moving part of the tour)
For this exercise you will need three traversal pointers, and a working exercise 10. Consider the data from exercise 13 (seen here).
[figure not recoverable; two panels showing points 0 through 4, labeled Nearest Neighbor and 3-Opt]
The nearest neighbor tour is 0-1-2-3-4-0, but by moving node 2, we get a shorter tour: 0-1-3-4-2-0. When looking to determine if a move will give you a shorter distance, again only consider edges that are removed and added. In this example, edges from 1 to 2 (cost 224), 2 to 3 (cost 316), and 4 to 0 (cost 721) are removed; and edges from 1 to 3 (cost 361), 4 to 2 (cost 361) and 2 to 0 (cost 361) are added. Since 224+316+721 > 361+361+361, we save (224+316+721) - (361+361+361) = 178. When you can no longer find improvements to the tour, print the final sequence and the cost of the round trip (don’t forget to include the cost to go from the last point to the first point).

Exercise 19: (2-Opt and 3-Opt, also moving and reversing)
Your code should first find all flipping improvements. When there are no longer any flipping improvements, look for moving improvements. When you are moving a piece of the tour that is more than one point, you should consider the possibility of inserting it in its original order, or in a reversed order.
[figure not recoverable; it showed an original tour becoming either “moved” or “moved & flipped”]

Exercise 20: (Or-Opt)
In his 1976 Ph.D. thesis, Ilhan Or noted that checking moves for only small pieces of the tour performed about as well as checking for all possible moves.
Your running time should significantly improve if you only consider moving pieces of up to, say, 10 or 15 points when doing larger problems.

Exercise 21: (Randomized nearest neighbor)
The results of exercises 17-20 will always be the same, but if you modify your nearest neighbor to include a smidge of randomness, you can generate different starting tours from which to run your Or-Opt improvements. For example, you could randomly choose between the two nearest neighbors, or roll a 10-sided die, taking the nearest neighbor 60% of the time, the second nearest 30%, and the third nearest 10%. (Be sure your code does not get stuck when it gets to the end of the tour, when there may only be one point left to choose from.) Set up your code to run a randomized nearest neighbor, followed by Or-Opt, save the tour and the cost if it is the best so far, and then continue with a brand new nearest neighbor tour. Try running about 1000 or so of these and see if your best solution matches the best solution in the TSP library. Information for random numbers appears here:

Handy Guide to Random Numbers in C:
The random number generator:
A call to rand() will return a pseudo-random integer between 0 and RAND_MAX. (RAND_MAX is at least 32767, and is about 2 billion on most Linux systems.) Both rand() and srand() are declared in <stdlib.h>.
Combining the call with the modulo operator “%” will generate a random non-negative integer in any range you need:
o i = rand()%10 //generates a random number between 0 and 9 inclusive
Unless you seed the generator, the (pseudo-)random sequence will be the same each time you run the program. You can start anywhere in this sequence by seeding the random number generator with srand().
o Until you finish debugging your code, I recommend seeding the generator with the same value every run so you can reproduce errors in order to track them down.
To do this, call srand() once at the beginning of your main (after declarations), with a constant seed, like:
    srand(1);
o Once your code is working, you can seed your random generator in an apparently random location by feeding the computer’s timer as the parameter (time() is declared in <time.h>):
    srand(time(NULL));