CS 163 Data Structures Chapter 5 Sorting Herbert G. Mayer, PSU Status 5/23/2015 1 Syllabus Sorting Constraints Complexity Bubble Sort Insertion Sort 2 Sorting Constraints Sorting rearranges elements of a data structure in a defined order; but does not change the total content One order of the arrangement is descending Another order is ascending Without loss of generality we focus on ascending order only; the other order is purely complementary Also, data structures can store information repeatedly, or just once per unique element If repeated, the duplicates may be stored sequentially; or else a count at one instance indicates the total number of occurrences Without loss of generality we focus on unique occurrence only 3 Complexity The cost of a search for 1 elements in a data structure of n unordered elements is n, or O(n) in Big-O notation, with n being the distinct number of elements Cost of a sort is generally higher than n, as each element’s position is considered versus all n, hence cost can be O(n2) Purpose of the sort is often to allow for more efficient searching algorithms, more efficient than O(n) for 1 element This weighs, when the number of lookups is large, i.e. large vs. the cost of the initial sort In Big-O notation, only the complexity n of the data structure is considered, i.e. the number n of elements included 4 Bubble Sort The Bubble Sort is the most intuitive sorting method, but also a most costly For a data structure of size n the cost to sort is O(n2) To sort ascendingly, each element in turn is compared against all other n elements to determine the correct position; uniqueness assumed That means, each of n elements is compared against O(n) other elements In reality, only comparison against n-i are needed, with i = 1..n-1, but in Big-O notation such an effective reduction factor ½ does not have any impact on the Big-O cost function 5 Bubble Sort Implementation // bubble sort sorts in ascending order // assume elements to be unique in data structure a[] // there are MAX integers included in a[] // .. Core of some bubble sort algorithm for ( int outer = 0; outer < MAX-1; outer++ ) { for ( int inner = outer+1; inner < MAX; inner++ ) { if ( a[ outer ] > a[ inner ] ) { // element at a[ outer ] is larger! Exchange! swap( a[ inner ], a[ outer ] ); // C++ } //end if } //end for } //end for 6 Bubble Sort, swap() with & Parameter // swap works with & ref parameters // else resort to de-tour via pointers // swap() makes no assumption, where // val1 and val2 are stored // Also, since val1 and val2 are reference parameters // actuals do not need to be passed with “address of” void swap( int & val1, int & val2 ) { // swap int temp = val1; val1 = val2; val2 = temp; } //end swap 7 // must be C++ Bubble Sort, swap() in Situ // sort in ascending order // assume elements to be unique in data structure a[] // there are MAX integers inside a[] // .. Core of some bubble sort algorithm for ( int outer = 0; outer < MAX-1; outer++ ) { for ( int inner = outer+1; inner < MAX; inner++ ) { if ( a[ outer ] > a[ inner ] ) { // element at lower index outer is larger! int temp = a[ inner ]; a[ inner ] = a[ outer ]; a[ outer ] = temp; // swapping done in situ! } //end if } //end for } //end for 8 Bubble Sort, ptr_swap() // sort in ascending order // assume elements to be unique in data structure a[] // there are MAX integers inside a[] // .. Core of some bubble sort algorithm for ( int outer = 0; outer < MAX-1; outer++ ) { for ( int inner = outer+1; inner < MAX; inner++ ) { if ( a[ outer ] > a[ inner ] ) { // element at lower index outer is larger! ptr_swap( & a[ inner ], & a[ outer ] ); } //end if } //end for } //end for 9 Bubble Sort, ptr_swap() // can be C or C++ // val1 and val2 are *int parameters // This is how C programmer can get around ref parameters void ptr_swap( int * val1, int * val2 ) { // swap int temp = *val1; *val1 = *val2; *val2 = temp; } //end ptr_swap 10 Insertion Sort Insertion sort is a sorting algorithm that is relatively efficient for mostly sorted lists Elements from the list are removed, and then placed, one at a time and inserted in their correct position in a new sorted list The remaining list is moved up (or down) by one position, possible due to the place freed by the moved element Partially sorted Ai Unsorted data > Ai Ai … Insert Partially sorted Ai Ai Unsorted data > Ai 11 … Insertion Sort If the original list is largely unsorted, the cost for insertion sort becomes similar, even equal to the bubble sort For lists that are almost totally sorted, the cost for insertion sort can be low, even O(1) in Big-O notation 12 Insertion Sort, Method Goal is a list in ascending order: Start at index i=1, fetch value = list[i]; then all the way up the last element i=MAX-1 Set j = i-1 and compare value against list[j] As long as element list[j] is larger than value, it is out of place, it must be shifted to a higher index, up to where value was fetched In the end, value is placed into the slot freed 13 Insertion Sort . . . // very clever for ( i = 1; i < MAX-1; i++ ) { value = list[ i ]; j = i - 1; while( ( j >= 0 ) && ( list[ j ] > value ) ) { list[ j+1 ] = list[ j ]; // push up --j; // check next } //end while list[ j + 1 ] = value; // the right place } //end for 14 Insertion Sort The simplicity of the algorithm is striking The cost is not worse than that of the bubble sort For lucky cases, the cost function can be way lower than the O(n2) of the bubble sort In rare cases it may be O(1), something not possible with the bubble sort 15