Sorting
Lecture 8
16-4-2013

• We’ll look at three of the simpler algorithms: the bubble sort, the selection sort, and the insertion sort.
• All three algorithms involve two steps, executed over and over until the data is sorted:
  1. Compare two items.
  2. Swap two items, or copy one item.

Bubble Sort
• The bubble sort is notoriously slow, but it’s conceptually the simplest of the sorting algorithms, and for that reason it is a good beginning for our exploration of sorting techniques.
• Step-by-step example: let us take the array of numbers "5 1 4 2 8" and sort it from lowest number to greatest number using the bubble sort algorithm. In each step, the two elements being compared are shown in brackets.

Bubble Sort
• First Pass:
  ( [5 1] 4 2 8 ) → ( [1 5] 4 2 8 ): the algorithm compares the first two elements and swaps them, since 5 > 1.
  ( 1 [5 4] 2 8 ) → ( 1 [4 5] 2 8 ): swap, since 5 > 4.
  ( 1 4 [5 2] 8 ) → ( 1 4 [2 5] 8 ): swap, since 5 > 2.
  ( 1 4 2 [5 8] ) → ( 1 4 2 [5 8] ): these elements are already in order (8 > 5), so the algorithm does not swap them.

Bubble Sort
• Second Pass:
  ( [1 4] 2 5 8 ) → ( [1 4] 2 5 8 )
  ( 1 [4 2] 5 8 ) → ( 1 [2 4] 5 8 ): swap, since 4 > 2.
  ( 1 2 [4 5] 8 ) → ( 1 2 [4 5] 8 )
  ( 1 2 4 [5 8] ) → ( 1 2 4 [5 8] )
• Now the array is already sorted, but the algorithm does not know that it is finished. It needs one whole pass without any swap to know that the array is sorted.

Bubble Sort
• Third Pass:
  ( [1 2] 4 5 8 ) → ( [1 2] 4 5 8 )
  ( 1 [2 4] 5 8 ) → ( 1 [2 4] 5 8 )
  ( 1 2 [4 5] 8 ) → ( 1 2 [4 5] 8 )
  ( 1 2 4 [5 8] ) → ( 1 2 4 [5 8] )
• Finally, the array is sorted, and the algorithm can terminate.

Sorting
• Sorting takes an unordered collection and makes it an ordered one.
(Figure: the unordered collection 77 42 35 12 101 5 becomes the ordered collection 5 12 35 42 77 101.)

"Bubbling Up" the Largest Element
• Traverse a collection of elements
  – Move from the front to the end
  – “Bubble” the largest value to the end using pairwise comparisons and swapping
• Trace on the collection above:
  77 42 35 12 101 5   (start)
  42 77 35 12 101 5   (77 > 42: swap)
  42 35 77 12 101 5   (77 > 35: swap)
  42 35 12 77 101 5   (77 > 12: swap)
  42 35 12 77 101 5   (77 < 101: no need to swap)
  42 35 12 77 5 101   (101 > 5: swap)
  Largest value correctly placed.

Items of Interest
• Notice that only the largest value is correctly placed.
• All other values are still out of order.
• So we need to repeat this process.

Repeat “Bubble Up” How Many Times?
• If we have N elements…
• And if each time we bubble an element, we place it in its correct location…
• Then we repeat the “bubble up” process N – 1 times.
• This guarantees we’ll correctly place all N elements.

“Bubbling” All the Elements
(Figure: the example collection after each of the N – 1 = 5 bubble-up passes:
  start:  77 42 35 12 101 5
  pass 1: 42 35 12 77 5 101
  pass 2: 35 12 42 5 77 101
  pass 3: 12 35 5 42 77 101
  pass 4: 12 5 35 42 77 101
  pass 5: 5 12 35 42 77 101)

• This is why it’s called the bubble sort: as the algorithm progresses, the biggest items “bubble up” to the top end of the array.

Selection Sort
• The selection sort improves on the bubble sort by reducing the number of swaps necessary from O(N²) to O(N). Unfortunately, the number of comparisons remains O(N²). However, the selection sort can still offer a significant improvement for large records that must be physically moved around in memory, where the swap time is much more important than the comparison time. (Typically this isn’t the case in Java, where references are moved around, not entire objects.)

Selection Sort
• A Brief Description
• The selection sort makes a pass through all the numbers and picks (or selects, hence the name of the sort) the smallest one. This smallest number is then swapped with the number at the left end of the line, at position 0. Now the leftmost number is sorted and won’t need to be moved again. Notice that in this algorithm the sorted numbers accumulate on the left (lower indices), whereas in the bubble sort they accumulated on the right.
• The next time you pass down the row of numbers, you start at position 1 and, after finding the minimum, swap it with position 1. This process continues until all the numbers are sorted.

Selection Sort
• Effectively, the list is divided into two parts: the sublist of items already sorted, which is built up from left to right at the beginning of the array, and the sublist of items remaining to be sorted, occupying the remainder of the array.
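The smallest-to-the-left procedure just described can be sketched in Java as follows. This is a minimal sketch, not the lecture’s own code; the class name SelectionSort and the method selectionSort are hypothetical names chosen for illustration, and the trace printing is there only to show the array after each pass.

```java
import java.util.Arrays;

// Minimal selection sort sketch (hypothetical class name): each pass selects
// the smallest remaining item and swaps it into the leftmost unsorted position.
public class SelectionSort {

    // Sorts a in ascending order in place, printing the array after each pass.
    // Returns the number of swaps performed.
    public static int selectionSort(int[] a) {
        int swaps = 0;
        for (int out = 0; out < a.length - 1; out++) {
            int min = out; // index of the smallest item found so far in a[out..]
            for (int in = out + 1; in < a.length; in++) {
                if (a[in] < a[min]) {
                    min = in;
                }
            }
            // At most one swap per pass, so at most N - 1 swaps overall: O(N).
            if (min != out) {
                int temp = a[out];
                a[out] = a[min];
                a[min] = temp;
                swaps++;
            }
            System.out.println(Arrays.toString(a));
        }
        return swaps;
    }

    public static void main(String[] args) {
        int[] a = {64, 25, 12, 22, 11};
        System.out.println(Arrays.toString(a)); // initial order
        selectionSort(a);
    }
}
```

Running main prints the array before sorting and after each of the four passes; for the input 64 25 12 22 11 these are the same five rows as the five-element example. The inner loop still makes O(N²) comparisons in total, while the `min != out` check simply skips a pointless swap when the minimum is already in place.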
• Here is an example of this algorithm sorting five elements:
  64 25 12 22 11
  11 25 12 22 64
  11 12 25 22 64
  11 12 22 25 64
  11 12 22 25 64

Selection Sort
• Another example sorts 5 1 3 4 6 2 using the mirror image of the procedure above: each pass finds the largest item in the unsorted part and swaps it into the rightmost unsorted position, so the sorted items accumulate on the right.
  5 1 3 4 6 2: the largest item, 6, is swapped with 2 → 5 1 3 4 2 6
  5 1 3 4 2 | 6: the largest item, 5, is swapped with 2 → 2 1 3 4 5 6
  2 1 3 4 | 5 6: the largest item, 4, is already in place → 2 1 3 4 5 6
  2 1 3 | 4 5 6: the largest item, 3, is already in place → 2 1 3 4 5 6
  2 1 | 3 4 5 6: the largest item, 2, is swapped with 1 → 1 2 3 4 5 6
  DONE!

Insertion Sort
• In most cases the insertion sort is the best of the elementary sorts described in this lecture. It still executes in O(N²) time, but it’s about twice as fast as the bubble sort and somewhat faster than the selection sort in normal situations. It’s also not too complex, although it’s slightly more involved than the bubble and selection sorts.
• It’s often used as the final stage of more sophisticated sorts, such as quicksort.

Insertion Sort
• Insertion Sort on the Baseball Players
• To begin the insertion sort, start with your baseball players lined up in random order. It’s easier to think about the insertion sort if we begin in the middle of the process, when the team is half sorted.

Insertion Sort
• Partial Sorting
• At this point there’s an imaginary marker somewhere in the middle of the line. The players to the left of this marker are partially sorted. This means that they are sorted among themselves; each one is taller than the person to his or her left. However, the players aren’t necessarily in their final positions, because they may still need to be moved when previously unsorted players are inserted between them.
• Note that partial sorting did not take place in the bubble sort and selection sort. In these algorithms a group of data items was completely sorted at any given time; in the insertion sort a group of items is only partially sorted.
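To make the difference between a completely sorted group and a partially sorted group concrete, the following hypothetical Java sketch runs only the first k passes of a selection sort (smallest item to the left) and of an insertion sort on the same array, 5 1 3 4 6 2. The class PartialVsComplete and its methods are illustrative names, not code from this lecture.

```java
import java.util.Arrays;

// Hypothetical illustration: stop selection sort and insertion sort after k passes
// and compare the left-hand groups they have built.
public class PartialVsComplete {

    // First k passes of selection sort; works on a copy of the input.
    public static int[] selectionPasses(int[] input, int k) {
        int[] a = input.clone();
        for (int out = 0; out < k && out < a.length; out++) {
            int min = out;
            for (int in = out + 1; in < a.length; in++) {
                if (a[in] < a[min]) {
                    min = in;
                }
            }
            int temp = a[out];
            a[out] = a[min];
            a[min] = temp;
        }
        return a;
    }

    // First k passes of insertion sort (the marker starts at index 1); works on a copy.
    public static int[] insertionPasses(int[] input, int k) {
        int[] a = input.clone();
        for (int out = 1; out <= k && out < a.length; out++) {
            int marked = a[out];
            int in = out;
            while (in > 0 && a[in - 1] > marked) {
                a[in] = a[in - 1]; // shift a larger item one place right
                in--;
            }
            a[in] = marked;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 3, 4, 6, 2};
        System.out.println(Arrays.toString(selectionPasses(data, 2)));
        System.out.println(Arrays.toString(insertionPasses(data, 2)));
    }
}
```

After two passes the selection sort gives [1, 2, 3, 4, 6, 5]: the group 1 2 is completely sorted, and both items are in their final positions. The insertion sort gives [1, 3, 5, 4, 6, 2]: the group 1 3 5 is sorted among itself, but 2 and 4 must still be inserted between its members, so 3 and 5 are not yet in their final positions.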
• The Marked Player
• The player where the marker is, whom we’ll call the “marked” player, and all the players on her right are as yet unsorted. This is shown in Figure 1.a.

Algorithm
(Figure: an example of insertion sort.)
• Check each element and put it in the right order in the sorted list.
• Every repetition of insertion sort removes an element from the input data and inserts it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the input is arbitrary and can be made using almost any choice algorithm.
• Sorting is typically done in place. After k iterations, the first k + 1 entries of the array are sorted.
• In each iteration the first remaining entry of the input, x, is removed and inserted into the result at the correct position, thus extending the result:
  [ sorted partial result | x | unsorted data ]
  becomes
  [ items ≤ x | x | items > x | unsorted data ]
  with each element greater than x copied one place to the right as it is compared against x.
• The most common variant of insertion sort, which operates on arrays, can be described as follows:
• Suppose there exists a function called Insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array.
• To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array, in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.

• What we’re going to do is insert the marked player in the appropriate place in the (partially) sorted group.
However, to do this, we’ll need to shift some of the sorted players to the right to make room. To provide a space for this shift, we take the marked player out of line. (In the program this data item is stored in a temporary variable.) This step is shown in Figure 1.b.
• Now we shift the sorted players to make room. The tallest sorted player moves into the marked player’s spot, the next-tallest player into the tallest player’s spot, and so on.
• When does this shifting process stop? Imagine that you and the marked player are walking down the line to the left. At each position you shift another player to the right, but you also compare the marked player with the player about to be shifted.
• The shifting process stops when you’ve shifted the last player that’s taller than the marked player. The last shift opens up the space where the marked player, when inserted, will be in sorted order. This step is shown in Figure 1.c.
• Now the partially sorted group is one player bigger, and the unsorted group is one player smaller. The marker is moved one space to the right, so it’s again in front of the leftmost unsorted player. This process is repeated until all the unsorted players have been inserted (hence the name insertion sort) into the appropriate place in the partially sorted group.
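The marked-player procedure (take the marked item out into a temporary variable, shift larger sorted items one place right, then insert) can be sketched in Java as follows. This is a minimal sketch under the assumption that players are represented by int heights; the class InsertionSort, the method insertionSort, and the variable names out, in, and marked are hypothetical choices for illustration.

```java
import java.util.Arrays;

// Minimal insertion sort sketch (hypothetical class name) following the
// marked-player description above.
public class InsertionSort {

    // Sorts a in ascending order in place, printing the array after each insertion.
    public static void insertionSort(int[] a) {
        // out is the marker: a[0..out-1] is partially sorted, a[out..] is unsorted.
        for (int out = 1; out < a.length; out++) {
            int marked = a[out]; // take the marked item "out of line"
            int in = out;
            // Walk left, shifting each sorted item larger than marked one place
            // to the right; stop at the first item that is not larger.
            while (in > 0 && a[in - 1] > marked) {
                a[in] = a[in - 1];
                in--;
            }
            a[in] = marked; // insert the marked item into the opened space
            System.out.println(Arrays.toString(a));
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8};
        insertionSort(a);
    }
}
```

For the input 5 1 4 2 8 the sketch prints [1, 5, 4, 2, 8], [1, 4, 5, 2, 8], [1, 2, 4, 5, 8], and [1, 2, 4, 5, 8]; the last insertion moves nothing because 8 is already larger than every sorted item. Because the shift loop uses a strict `>`, equal items are never moved past one another, so this sketch keeps equal keys in their original order.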