COP 3540 Data Structures with OOP Simple Sorting Chapter 3 1/47 Major Topics Introductory Remarks Bubble Sort Selection Sort Insertion Sort Sorting Objects Comparing Sorts 2/47 Introductory Remarks So many lists are better dealt if ordered. Will look at simple sorting first. Note: there are books written on many advanced sorting techniques. • E.g. shell sort; quicksort; heapsort; etc. Will start with ‘simple’ sorts. Relatively slow, Easy to understand, and Excellent performance under circumstances. All these are O(n2) sorts. 3/47 Basic idea of these simple sorts: Compare two items Swap or Copy over Depending on the specific algorithm… 4/47 Bubble Sort Very slow but simple Basic idea: Start at one end of a list - say, an array Compare the first items in the first two positions • If the first one is larger, swap with the second and move to the right; otherwise, move right. At end, the largest item is in the last position. 5/47 Bubble Sort process After first pass, made n-1 comparisons and between 0 and n-1 swaps. Continue this process. Next time we do not check the last entry (n1), because we know it is in the right spot. We stop comparing at (n-2). You see how it goes. Let’s look at some code. Author’s applets are available, if desired. 6/47 class ArrayBub { private long[] a; // ref to array a private int nElems; // number of data items //-------------------------------------------------------------public ArrayBub(int max) // constructor - sets size of array up via argument passed by client. { a = new long[max]; // create the array nElems = 0; // no items yet } public void insert(long value) // put element into array at the end { a[nElems] = value; // insert it nElems++; // increment size } public void display() // displays array contents { for(int j=0; j<nElems; j++) // for each element, System.out.print(a[j] + " "); // display it System.out.println(""); } public void bubbleSort() Here’s the real sort itself! { int out, in; for(out=nElems-1; out>1; out--) // outer loop (backward) for(in=0; in<out; in++) // inner loop (forward) if( a[in] > a[in+1] ) // out of order? swap(in, in+1); // swap them } // end bubbleSort() private void swap(int one, int two) Note: this method is private. So what does this tell you? { long temp = a[one]; a[one] = a[two]; a[two] = temp; 7/47 } } // end class ArrayBub // Normally should also contain find() and delete() methods too… The Bubble Sort Application class BubbleSortApp { public static void main(String[] args) { int maxSize = 100; // array size // note; merely setting up a size ArrayBub arr; // reference to array // creating a pointer to the array arr = new ArrayBub(maxSize); // create the array // creating an instance of the array // class which will contain an array of maxSize. arr.insert(77); arr.insert(99); arr.insert(44); arr.insert(55); arr.insert(22); arr.insert(88); arr.insert(11); arr.insert(00); arr.insert(66); arr.insert(33); // insert 10 items arr.display(); // display items arr.bubbleSort(); don’t care how or where! It is up to the ArrayBub. Asking the object to display the array elements // bubble sort them Asking the array to sort itself arr.display(); // display them again Asking the array to display itself again } // end main() // note all the ‘services’ asked of the array object. } // end class BubbleSortApp 8/47 //////////////////////////////////////////////////////////////// Discussion of Bubble Sort - More public void bubbleSort() { int out, in; Here’s the real sort itself! for(out=nElems-1; out>1; out--) // outer loop (backward) for(in=0; in<out; in++) // inner loop (forward) if( a[in] > a[in+1] ) // out of order? swap(in, in+1); // swap them } // end bubbleSort() Note the elements probed…. private void swap(int one, int two) Note: this method is private. What does this tell you? long temp = a[one]; a[one] = a[two]; a[two] = temp; } If the input numbers are: 77 99 44 55 22 88 11 0 66 33 The sorted numbers will be: 0 11 22 33 44 55 66 77 88 99 Clearly this sort is ‘ascending.’ It doesn’t have to be. Note the loop counter in the outer loop starts at end of array and decrements each cycle. Why? Note: the inner loop starts at the beginning of the array each time and increments until it hits to the current ‘limit,’ which is out-1. (See above) 9/47 Efficiency of the Bubble Sort - Comparisons Can readily see that there are fewer comparisons in each successive ‘pass.’ Thus number of comparisons is computed as: (n-1)+(n-2)+… + 1 = n(n-1)/2; For 10 elements, the number is 10*9/2 = 45. So, the algorithm makes about n2/2 comparisons (ignoring the n which is negligible especially if n is large. (If n = 100, then n(n-1)/2 is (10000 – 100)/2 or 9900/2. So you can see the n2/2 is “close enough”) 10/47 Efficiency of the Bubble Sort - Swaps Clearly there will be fewer ‘swaps’ than comparisons, since every comparison does not result in a swap. We can say that in general, a swap will occur half the time. Given have n2/2 comparisons; Half the time will result in swaps; Conclusion: have (1/2)*(n2/2) = n2/4 swaps. • Worst case, every compare results in a swap. But we will take average. 11/47 Overall – Bubble Sort Note both swaps and compares are proportional to n2. Constants don’t count in Big O notation. Ignore the 2 and 4; say Bubble Sort runs in O(n2) time. This is rather slow. Heuristic: Whenever in algorithms we see nested loops (say, nested-for loops), you can infer O(n2) performance. (Outer loop executes n times and inner loop executes in n times PER execution of the outer n times: hence n2 .) 12/47 Selection Sort 13/47 How Does the Selection Sort Work? Principle: Scan all elements to be sorted selecting the smallest (or largest) item and put this in the first position of an array. Specifically, compare zeroth to first, zeroth to second, zeroth to third, zeroth to fourth, etc. swapping if the first, second, third, or fourth, (etc.) are smaller than the zeroth element in the array. At end of ‘pass 1’, smallest item is found first in list (array, …) in position zero. Next pass, start at position 1 (instead of position 0) Loop until all are sorted. Can improve efficiency by including a switch that is set if at least one swap occurs. 14/47 Selection Sort Can see fewer swaps; same number of comparisons as bubble. But: Swaps reduced from O(n2) to O(n)! Number of comparisons remains the same, O(n2). Reduction in swap time tends to be more important than the comparison time No data movement in comparisons; is in swapping. Significant improvement for movement of large records physically moved around in memory. Think table sorting in COBOL. If moving large records and can reduce the number moved, then performance is improved (Good for COBOL, etc. Not good for Java, since we typically move references and not entire records (objects)). 15/47 Selection Sort – in more detail So, in one pass, you have made n comparisions but possibly ONLY ONE Swap! With each succeeding pass, one more item is sorted and in place; one fewer items needs to be considered. Java code for the Selection Sort: 16/47 // SelectSort.java class ArraySel { private long[ ] a; // ref to array a private int nElems; // number of data items public ArraySel(int max) // constructor { a = new long[max]; // create an array named ‘a’. nElems = 0; // no items yet } public void insert(long value) { // put element into array // Please note I have sacrificed some form for space a[nElems] = value; // insert it nElems++; // increment size } public void display() { // displays array contents for(int j=0; j<nElems; j++) // for each element, System.out.print(a[j] + " "); // display it System.out.println(""); } public void selectionSort() { See separate slide ahead… int out, in, min; for(out=0; out<nElems-1; out++) // outer loop { min = out; // minimum for(in=out+1; in<nElems; in++) // inner loop if(a[in] < a[min] ) // if min greater, min = in; // we have a new min swap(out, min); // swap them } // end for(out) } // end selectionSort() private void swap(int one, int two) { long temp = a[one]; a[one] = a[two]; a[two] = temp; } } // end class ArraySel 17/47 // note scope terminators! The Selection Sort - Client class SelectSortApp { public static void main(String[] args) { int maxSize = 100; // array size ArraySel arr; // creates reference to an object of type ArraySel called arr, arr = new ArraySel(maxSize); // creates the object w/argument for its Constructor arr.insert(77); arr.insert(99); arr.insert(44); arr.insert(55); arr.insert(22); arr.insert(88); arr.insert(11); arr.insert(00); arr.insert(66); arr.insert(33); // insert 10 items // this client doesn’t even know that there is an array in arr! // the ‘insert’ could be inserting into some other kind of container… // could be inserting into, perhaps, a linked list. Don’t know. Don’t care! arr.display(); // display items arr.selectionSort(); // selection-sort them arr.display(); // display them again } // end main() } // end class SelectSortApp 18/47 //////////////////////////////////////////////////////////////// Selection Sort itself: public void selectionSort() { int out, in, min; for(out=0; out<nElems-1; out++) // outer loop { min = out; // assert minimum for(in=out+1; in<nElems; in++) // inner loop if(a[in] < a[min] ) // if min greater, min = in; // we have a new min swap(out, min); // swap them at end of pass } // end outerfor // Walk through this!! } // end selectionSort() 19/47 Selection Sort itself - more Algorithm implies it is an O(n2) sort (and it is). How did we see this? Container class is called ArraySel vice ArrayBub Loops start at beginning and goes to end; Inner loop starts at outer loop’s lowest index+1. Both proceed ‘left to right.’ Outer loop merely Controls the inner loop and keeps track of the smallest (or largest) index of the item per pass. At end of inner pass, swap is called to ‘Swap’ the two elements based on their indices. 20/47 Selection Sort itself - a bit more; Efficiency Same number of comparisons as Bubble Sort – still n2/2 comparisons Fewer swaps Consider: for large N: Comparison times will dominate because there are so many even though there wil be fewer swaps. Yet, with so many fewer swaps, it will still be much faster than the Bubble Sort anyway. Selection Sort still runs in O(n2) time. Consider: for small N, Swap times would dominate; the selection sort may well be much faster, especially if there are a lot of swaps! 21/47 Insertion Sort 22/47 Insertion Sort In many cases, this sort is considered the best of these elementary sorts. Still an O(n2) but: about twice as fast as bubble sort and somewhat faster than selection sort in most situations. Easy, but a bit more complex than the others Insertion sort is sometimes used as final stage of some more sophisticated sorts, such as a QuickSort (coming). 23/47 Insertion Sort – Here’s how it works: 1 of 3 Approach this sort by thinking ‘half’ of the list of items to be sorted is, say, sorted. Pick a position in the middle of an array, say, that you visualize going left to right. Assume items ‘to the left’ of your picked position are sorted (among themselves) recognizing that those elements to the right are not sorted and that those to the left, while sorted, or likely not in their final positions because of the other remaining ‘unsorted’ items have not been inserted into this evolving subset of the original ‘numbers.’. Consider the position you picked as the ‘marked’ position, and that the number is not in its final position. 24/47 Insertion Sort – how it works: 2 of 3 The ‘marked item’ and all items to the right are unsorted. We now want to insert the marked item in the appropriate place in the partially sorted group (the group to the left). But to do this, Need to shift ‘some’ of the items on the left - to the right to make room for the marked item. Remove marked item from list. (Leaves a hole). Look for position in sorted list to place marked item into: Move largest ‘sorted’ item into marked item’s slot; (recall it is empty) Move the ‘next’ largest item into the (former) largest item’s slot, etc. But all the while, you are comparing the marked item with the one to be moved to the right. Shifting stops when you’ve shifted the last item larger than marked item. This opens up a space where the marked item, when inserted, will be in sorted order. 25/47 Insertion Sort – description 3 of 3. Result: partially-ordered list is one item larger and the unsorted list is one item smaller. Marked item moves one slot to the right, so once more it is again in front of the leftmost unsorted item. Continue process until all unsorted items have been inserted. Hence the name ‘insertion sort.’ Code 26/47 // insertSort.java class ArrayIns { private long[] a; // ref to array a private int nElems; // number of data items public ArrayIns(int max) { // constructor a = new long[max]; // creates an array for an object (here, obj name is arr – from client) nElems = 0; // no items yet } public void insert(long value) { // put element into array. Inside this object, we see list is an array. a[nElems] = value; // insert it nElems++; // increment size } public void display() { // displays array contents for(int j=0; j<nElems; j++) // for each element, System.out.print(a[j] + " "); // display it System.out.println(""); } public void insertionSort() { // will examine ahead separately… int in, out; for(out=1; out<nElems; out++) // out is dividing line { long temp = a[out]; // remove marked item in = out; // start shifts at out while(in>0 && a[in-1] >= temp) // until one is smaller, { a[in] = a[in-1]; // shift item to right --in; // go left one position } a[in] = temp; // insert marked item } // end for } // end insertionSort() 27/47 } // end class ArrayIns Insertion Sort Client class InsertSortApp { public static void main(String[] args) { int maxSize = 100; // array size that will be passed to Constructor in arr object. ArrayIns arr; // reference to object of type ArrayIns. arr = new ArrayIns(maxSize); // creates the object and passes argument to its Constructor. arr.insert(77); arr.insert(99); arr.insert(44); arr.insert(55); arr.insert(22); arr.insert(88); arr.insert(11); arr.insert(00); arr.insert(66); arr.insert(33); // insert 10 items arr.display(); // display items arr.insertionSort(); // Please note: no notion of the structure of this list! // Could be implemented as an array, linked list, tree, … // get the object to do the work on its data! // insertion-sort them arr.display(); // display them again } // end main() } // end class InsertSortApp 28/47 Code for Insertion Sort method public void insertionSort() { int in, out; for(out=1; out<nElems; out++) // out is dividing line. { // a[out] is marked item… long temp = a[out]; // remove/move marked item to temp.. in = out; // start shifts at “out” while(in>0 && a[in-1] >= temp) // until one is smaller, { a[in] = a[in-1]; // shift item to right filling the hold --in; // then go left one position } a[in] = temp; // insert marked item; done with this } // end for // marked item. Get next item on the right } // end insertionSort() 29/47 Another Source: On your OWN to read. In abstract terms, every iteration of an insertion sort removes an element from the input data, inserting it at the correct position in an already sorted list, and this continues until no elements are left in the input. The choice of which element to remove from the input is arbitrary and can be made using almost any choice algorithm. Sorting is typically done in-place. The resulting array after k iterations contains the first k entries of the input array and is sorted. In each step, the first remaining entry of the input is removed, inserted into the result at the right position, thus extending the result: 30/47 What we really have is shown below: becomes: . with each element > x copied to the right as it is compared against x 31/47 Discussion of Insertion Sort – How really implemented!! You’ve seen the theory. Theory is general. Implementation is specific. So, here’s what we do: Start with out = 1, which means there is only a single element to its ‘left.’ We infer that this item to its left is sorted unto itself. Hard to argue this is not true. (This is out = 0) a[out] is the marked item, and it is moved into temp. a[out] is the leftmost unsorted item. The inner loop compares the marked item to the one to the left of it. If the marked item is smaller, a[out] and a[out-1] are swapped and ‘in’ is decremented. (Incidentally, note that the algorithm structure itself tells us that this is an O(n2) sort.) Let’s consider some data…32/47 First (of three): Brute Force Approach 1 of 3 You are responsible for going through the algorithm. This is the only way to really convince yourself that you understand it. See algorithm for first two elements only: 40 and 20. The for: out = 1; long temp = a[out] => a[1] = 20 (marked item; create hole) in = out; (in = out = 1) ‘while’ is true, ( in > 0 it is 1; a[in-1] or a[0] >= temp. Yes. that is 40 > 20. so we want: a[in] = a[in-1] that is, a[1] = 40. // moved large to hole ‘in’ is decremented (to 0) and the ‘while’ will be false, so fall through to next stmt in the for: a[in] to temp; that is, a[0] to 20. Go back to outer loop for more serious ‘n’ But for two elements, this is it. 33/47 Second (of three): Let’s make it larger…Brute Force Here, Out = 2 Assume we now have 20 40 30 long temp = a[out] = a[2] = 30. Move 30 out. Create hole. in = out; in = out = 2. while (in > 0 && a[in-1] >= temp, a[1] > 30. True. a[in] = a[in-1] a[2] = a[1] a[2] = 40. (move to hole) decrement ‘in’ to 1; loop within while this time... while (in > 0 && a[0] >= temp) 20 >= 30? False. end while a[in] = temp …. a[1] = 30 a[0] remains = to 20. We now have 20 30 40 in order) Back to outer loop…. One more…. 34/47 Third (of three): Brute Force Approach So, we have 20 30 40 and, say, 25 (marked) (25 is now ‘rightmost’ From for loop out = 3; long temp = 25. in = out = 3 while (in>0 (T) && a[in-1] > temp a[2] > temp = T; loop a[in] = a[in-1] a[3] = a[2] a[3] = 40 decrement ‘in’ to 2 and loop within while while (in>0 (T) and a[in-1] > temp a[1] > temp = True; loop a[2] = a[1] or a[2] = 30. decrement ‘in’ to 1 and loop within while. while (in > 0 (True) and a[in-1] > temp a[0] > temp = False! go to outer loop: a[in] = temp a[1] = 25 a[0] remains = 20…. We now have: 20 25 30 40 See the trend?? 35/47 Efficiency of the Insertion Sort Comparisons: How it really works: Remember, we start on the left… On pass one, it compares a max of one; pass two, it compares a max of two, etc. Up to a max of n-1 comparisons. This equals 1+2+3+…+n-1 = n(n-1)/2 comparisons. But, on average, only half of the maximum number of items are actually compared before the insertion point is found. This gives us: n(n-1)/4. 36/47 Efficiency of the Insertion Sort (2 of 3) Copies: Have lots of ‘copies’ (same as number of comparisons – about) But a copy is not nearly as time-consuming as a swap. Think about this!! Thus, for random data, this algorithm runs about twice as fast as the bubble sort and faster than the selection sort. Yet, it still runs on O(n2) time for random data. 37/47 Efficiency of the Insertion Sort (3 of 3) Lastly: If data is nearly sorted insertion sort does quite well. The condition in while loop is rarely true, so it almost runs in O(n) time; much better than O(n2) time! Thus to order a file that is nearly in order, an insertion sort will be a very efficient sort. But if data is in very unsorted order (nearly backward), performance runs no faster than the bubble sort, as every possible comparison and shift is carried out. 38/47 Sorting Objects 39/47 Sorting Objects Very important to be able to sort objects. Let’s sort an array of Person objects. Must be careful here, especially in noting that ‘that’ which is in an array are objects, and The sort is based on values of String objects inside of the object. • Will use the inherited String method, compareTo. 40/47 Person Class class Person { private String lastName; private String firstName; private int age; //----------------------------------------------------------public Person(String last, String first, int a) { // constructor lastName = last; firstName = first; age = a; } //----------------------------------------------------------public void displayPerson() { System.out.print(" Last name: " + lastName); System.out.print(", First name: " + firstName); System.out.println(", Age: " + age); } //----------------------------------------------------------public String getLast() // get last name { return lastName; } } // end class Person 41/47 //////////////////////////////////////////////////////////////// Array In Object – Here’s the important slide!!!!!! class ArrayInOb { private Person[] a; // ref to array a of Person objects (to be defined) private int nElems; // number of data items public ArrayInOb(int max) { // constructor a = new Person[max]; // create a ref to an array of ‘max’ Person objects. But since Person is nElems = 0; // not a primitive data type, we define it as a class and we create } // an array of Person objects, ‘a.’ public void insert(String last, String first, int age) { // creates individual Person objects a[nElems] = new Person(last, first, age); nElems++; // increment size } public void display() { // displays array contents for (int j=0; j<nElems; j++) // for each element, a[j].displayPerson(); // display it } public void insertionSort() { int in, out; for(out=1; out<nElems; out++) { Person temp = a[out]; // out is dividing line Here’s our marked item (object) in = out; // start shifting at out while(in>0 && a[in-1].getLast().compareTo(temp.getLast())>0) { complicated. >0 + a[in] = a[in-1]; // shift item to the right --in; // go left one position What does this really do? } Why do we need getLast()? a[in] = temp; // insert marked item } // end for 42/47 } // end insertionSort() } // end class ArrayInOb Object Sort Application – Client application class ObjectSortApp { public static void main(String[] args) { int maxSize = 100; // array size ArrayInOb arr; // reference to object that will contain our array arr = new ArrayInOb(maxSize); // creates the object, which creates an array of 100 Person objects. arr.insert("Evans", "Patty", 24); arr.insert("Smith", "Doc", 59); arr.insert("Smith", "Lorraine", 37); arr.insert("Smith", "Paul", 37); arr.insert("Yee", "Tom", 43); arr.insert("Hashimoto", "Sato", 21); arr.insert("Stimson", "Henry", 29); Nothing really different! arr.insert("Velasquez", "Jose", 72); The public interface of ArrayInOb indicates that the insert requires arr.insert("Vang", "Minh", 22); several attributes. But still, the arr.insert("Creswell", "Lucinda", 18); insert could be almost any implementing structure. System.out.println("Before sorting:"); arr.display(); // display items arr.insertionSort(); // insertion-sort them System.out.println("After sorting:"); arr.display(); // display them again } // end main() 43/47 } // end class ObjectSortApp Discussion Note: we are sorting based on last name. Last name is a String Need to use the compareTo method available for String objects. We are not comparing primitives! We are comparing attributes in a specific array element, which happens to be a Person object. Recall: the ‘compareTo’ method returns an integer: a negative integer, zero, or a positive integer. The ‘compareTo’ method is inherited from the String class. 44/47 Secondary Sort Fields? ‘Stable’ sort? What does this mean? Original orders are preserved… All of these algorithms so far are stable Sort ‘keys’ Think Windows: Arrange by Type; then by data modified… 45/47 Comparing the Simple Sorts Bubble Sort – simplest. Use only if you don’t have other algorithms available and ‘n’ is small. Selection Sort Minimizes number of swaps, but number of comparisons still high. Useful when amount of data is small and swapping is very time consuming – like when sorting records in tables – internal sorts. Insertion Sort The most versatile, and is usually best bet in most situations, if amount of data is small or data is almost sorted. For large n, other sorts are better. We will cover advanced sorts later… 46/47 Comparing the Simple Sorts All require very little space and they sort in place. Can’t see the real efficiencies / differences unless you apply the different sources to large amounts of data. 47/47