Simple Sorting

As soon as you create a significant database, you’ll probably think of reasons to sort it in various ways. You need to arrange names in alphabetical order, students by grade, home sales by price, cities in order of increasing population, countries by GNP, stars by magnitude, and so on.

Sorting data may also be a preliminary step to searching it. As we saw in “Arrays”, a binary search, which can be applied only to sorted data, is much faster than a linear search.

Because sorting is so important and potentially so time-consuming, it has been the subject of extensive research in computer science, and some very sophisticated methods have been developed. We’ll look at three of the simpler algorithms: the bubble sort, the selection sort, and the insertion sort.

The three algorithms all involve two steps, executed over and over until the data is sorted:
1. Compare two items.
2. Swap two items, or copy one item.

Bubble Sort

The bubble sort is notoriously slow, but it’s conceptually the simplest of the sorting algorithms and for that reason is a good beginning for our exploration of sorting techniques.

Step-by-step example

Let us take the array of numbers "5 1 4 2 8" and sort it from lowest to greatest using the bubble sort algorithm. In each step, two adjacent elements are compared, moving from left to right.

First Pass:
  ( 5 1 4 2 8 ) -> ( 1 5 4 2 8 ), the algorithm compares the first two elements and swaps them since 5 > 1
  ( 1 5 4 2 8 ) -> ( 1 4 5 2 8 ), swap since 5 > 4
  ( 1 4 5 2 8 ) -> ( 1 4 2 5 8 ), swap since 5 > 2
  ( 1 4 2 5 8 ) -> ( 1 4 2 5 8 ), these elements are already in order (8 > 5), so the algorithm does not swap them
Second Pass:
  ( 1 4 2 5 8 ) -> ( 1 4 2 5 8 )
  ( 1 4 2 5 8 ) -> ( 1 2 4 5 8 ), swap since 4 > 2
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
Now the array is already sorted, but the algorithm does not know that it is finished. It needs one whole pass without any swap to know the array is sorted.
Third Pass:
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
  ( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
Finally, the array is sorted, and the algorithm can terminate.

This is why it’s called the bubble sort: as the algorithm progresses, the biggest items “bubble up” to the top end of the array.

Implementation

Pseudocode implementation

The algorithm can be expressed as:

  procedure bubbleSort( A : list of sortable items )
    do
      swapped = false
      for each i in 1 to length(A) - 1 inclusive do:
        if A[i-1] > A[i] then
          swap( A[i-1], A[i] )
          swapped = true
        end if
      end for
    while swapped
  end procedure

After the first pass through all the data, you’ve made N-1 comparisons and somewhere between 0 and N-1 swaps. The item at the end of the array is sorted and won’t be moved again.

Now you go back and start another pass from the left end of the line. Again, you go toward the right, comparing and swapping when appropriate. This rule could be stated as: when you reach the first sorted number, start over at the left end of the line. You continue this process until all the numbers are in order.

Selection Sort

The selection sort improves on the bubble sort by reducing the number of swaps necessary from O(N^2) to O(N). Unfortunately, the number of comparisons remains O(N^2). However, the selection sort can still offer a significant improvement for large records that must be physically moved around in memory, causing the swap time to be much more important than the comparison time. (Typically, this isn’t the case in Java, where references are moved around, and not entire objects.)

A Brief Description

The selection sort makes a pass through all the numbers and picks (or selects, hence the name of the sort) the smallest one. This smallest number is then swapped with the number on the left end of the line, at position 0. Now the leftmost number is sorted and won’t need to be moved again.
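
Repeating this pass, each time starting one position further to the right, yields the whole sort. The following is a minimal sketch in Java, assuming an array of int values sorted in ascending order; the class and method names are illustrative and do not come from the text.

```java
import java.util.Arrays;

// Illustrative sketch only: the names SelectionSortSketch and selectionSort are assumptions.
final class SelectionSortSketch {

    // Sorts the array in place, in ascending order.
    static void selectionSort(int[] a) {
        // Everything to the left of 'start' is already in its final position.
        for (int start = 0; start < a.length - 1; start++) {
            int min = start; // index of the smallest item found so far in this pass
            for (int i = start + 1; i < a.length; i++) {
                if (a[i] < a[min]) {
                    min = i;
                }
            }
            // At most one swap per pass: move the minimum to position 'start'.
            if (min != start) {
                int temp = a[start];
                a[start] = a[min];
                a[min] = temp;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {64, 25, 12, 22, 11};
        selectionSort(data);
        System.out.println(Arrays.toString(data)); // prints [11, 12, 22, 25, 64]
    }
}
```

The inner loop only compares; the single swap happens after the pass, which is where the O(N) swap count comes from.
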
Notice that in this algorithm the sorted numbers accumulate on the left (lower indices), whereas in the bubble sort they accumulated on the right. The next time you pass down the row of numbers, you start at position 1 and, finding the minimum, swap it with position 1. This process continues until all the numbers are sorted.

Algorithm

The algorithm works as follows:
1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position and advancing each time).

Effectively, the list is divided into two parts: the sublist of items already sorted, which is built up from left to right and is found at the beginning, and the sublist of items remaining to be sorted, occupying the remainder of the array.

Here is an example of this sort algorithm sorting five elements:

  64 25 12 22 11
  11 25 12 22 64
  11 12 25 22 64
  11 12 22 25 64
  11 12 22 25 64

(Nothing appears changed on the last line because the last two numbers were already in order.)

Selection sort can also be used on list structures that make add and remove efficient, such as a linked list. In this case it’s more common to remove the minimum element from the remainder of the list and then insert it at the end of the values sorted so far.

Insertion Sort

In most cases the insertion sort is the best of the elementary sorts described in this sorting lecture. It still executes in O(N^2) time, but it’s about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It’s also not too complex, although it’s slightly more involved than the bubble and selection sorts. It’s often used as the final stage of more sophisticated sorts, such as quicksort.

Insertion Sort on the Baseball Players

To begin the insertion sort, start with your baseball players lined up in random order. It’s easier to think about the insertion sort if we begin in the middle of the process, when the team is half sorted.

Partial Sorting

At this point there’s an imaginary marker somewhere in the middle of the line. The players to the left of this marker are partially sorted. This means that they are sorted among themselves; each one is taller than the person to his or her left. However, the players aren’t necessarily in their final positions because they may still need to be moved when previously unsorted players are inserted between them.

Note that partial sorting did not take place in the bubble sort and selection sort. In these algorithms a group of data items was completely sorted at any given time; in the insertion sort a group of items is only partially sorted.

The Marked Player

The player where the marker is, whom we’ll call the “marked” player, and all the players on her right are as yet unsorted. This is shown in Figure 1.a.

Algorithm

The insertion sort checks each element in turn and puts it in the right place in the sorted list. Every repetition removes an element from the input data and inserts it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the input is arbitrary and can be made using almost any choice algorithm.

Sorting is typically done in place. After k iterations the resulting array has the property that the first k + 1 entries are sorted. In each iteration the first remaining entry of the input, call it x, is removed and inserted into the result at the correct position, thus extending the result:

  [ sorted, <= x ][ sorted, > x ][ x ][ unsorted ... ]

becomes

  [ sorted, <= x ][ x ][ sorted, > x ][ unsorted ... ]

with each element greater than x copied to the right as it is compared against x.

The most common variant of insertion sort, which operates on arrays, can be described as follows:

1. Suppose there exists a function called Insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element.
The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array.
2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.

Figure 1. The insertion sort on baseball players.

What we’re going to do is insert the marked player in the appropriate place in the (partially) sorted group. However, to do this, we’ll need to shift some of the sorted players to the right to make room. To provide a space for this shift, we take the marked player out of line. (In the program this data item is stored in a temporary variable.) This step is shown in Figure 1.b.

Now we shift the sorted players to make room. The tallest sorted player moves into the marked player’s spot, the next-tallest player into the tallest player’s spot, and so on.

When does this shifting process stop? Imagine that you and the marked player are walking down the line to the left. At each position you shift another player to the right, but you also compare the marked player with the player about to be shifted. The shifting process stops when you’ve shifted the last player that’s taller than the marked player. The last shift opens up the space where the marked player, when inserted, will be in sorted order. This step is shown in Figure 1.c.

Now the partially sorted group is one player bigger, and the unsorted group is one player smaller. The marker is moved one space to the right, so it’s again in front of the leftmost unsorted player. This process is repeated until all the unsorted players have been inserted (hence the name insertion sort) into the appropriate place in the partially sorted group.
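
The steps above (take the marked item out of line, shift larger items to the right, drop the marked item into the gap) can be sketched in Java as follows. This is a minimal illustration, assuming an array of int values in place of players; the class, method, and variable names are hypothetical and not taken from the text.

```java
import java.util.Arrays;

// Illustrative sketch only: the names InsertionSortSketch and insertionSort are assumptions.
final class InsertionSortSketch {

    // Sorts the array in place, in ascending order.
    static void insertionSort(int[] a) {
        // 'marker' is the leftmost unsorted item; items to its left are sorted among themselves.
        for (int marker = 1; marker < a.length; marker++) {
            int temp = a[marker]; // take the marked item out of line
            int pos = marker;     // the open slot, initially the marked item's own spot
            // Walk left, shifting each larger item one place to the right.
            while (pos > 0 && a[pos - 1] > temp) {
                a[pos] = a[pos - 1];
                pos--;
            }
            a[pos] = temp; // insert the marked item into the space the shifts opened
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        insertionSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```

Note that the inner loop copies items rather than swapping them. Using a strict comparison (>) when shifting also leaves equal items in their original relative order.
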