Simple Sorting

As soon as you create a significant database, you’ll probably think of reasons to sort it in various
ways. You need to arrange names in alphabetical order, students by grade, home sales by price,
cities in order of increasing population, countries by GNP, stars by magnitude, and so on.
Sorting data may also be a preliminary step to searching it. As we saw in “Arrays,” a binary
search, which can be applied only to sorted data, is much faster than a linear search.
Because sorting is so important and potentially so time-consuming, it has been the subject of
extensive research in computer science, and some very sophisticated methods have been
developed. We’ll look at three of the simpler algorithms: the bubble sort, the selection sort,
and the insertion sort.
The three algorithms all involve two steps, executed over and over until the data is sorted:
1. Compare two items.
2. Swap two items, or copy one item.
Bubble Sort
The bubble sort is notoriously slow, but it’s conceptually the simplest of the sorting algorithms
and for that reason is a good beginning for our exploration of sorting techniques.
Step-by-step example
Let us take the array of numbers "5 1 4 2 8" and sort it from lowest number to greatest
number using the bubble sort algorithm. In each step, two adjacent elements are compared,
starting with the leftmost pair and moving one position to the right each time.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements and swaps them since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it is completed. The
algorithm needs one whole pass without any swap to know it is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Finally, the array is sorted, and the algorithm can terminate.
This is why it’s called the bubble sort: As the algorithm progresses, the biggest items “bubble
up” to the top end of the array.
Implementation
Pseudocode implementation
The algorithm can be expressed as:
procedure bubbleSort( A : list of sortable items )
    do
        swapped = false
        for each i in 1 to length(A) - 1 inclusive do:
            if A[i-1] > A[i] then
                swap( A[i-1], A[i] )
                swapped = true
            end if
        end for
    while swapped
end procedure
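The pseudocode above could be translated into Java roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `BubbleSortSketch`, the method name `bubbleSort`, and the choice of an `int[]` array are assumptions made for illustration.

```java
import java.util.Arrays;

class BubbleSortSketch {
    // Repeats full left-to-right passes until one pass finishes without a swap.
    static void bubbleSort(int[] a) {
        boolean swapped;
        do {
            swapped = false;
            for (int i = 1; i < a.length; i++) {
                if (a[i - 1] > a[i]) {
                    // Adjacent pair is out of order: swap it.
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    swapped = true;
                }
            }
        } while (swapped);
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8}; // same numbers as the step-by-step example
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```

As in the step-by-step example, the loop needs one extra pass with no swaps before it can stop.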
After the first pass through all the data, you’ve made N-1 comparisons and somewhere between
0 and N-1 swaps. The item at the end of the array is sorted and won’t be moved again.
Now you go back and start another pass from the left end of the line. Again, you go toward the
right, comparing and swapping when appropriate. This time, however, you can stop one item short
of the end, because the last position already holds the largest item.
This rule could be stated as:
When you reach the first sorted number, start over at the left end of the line.
You continue this process until all the numbers are in order.
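One way to read the rule above is that each pass can stop where the already-sorted items begin. The Java sketch below assumes that reading. The names `ShrinkingBubbleSortSketch`, `out`, and `in` are hypothetical, and the method returns the number of comparisons only so the cost can be observed. Unlike the pseudocode, this variant has no early exit when a pass makes no swaps.

```java
class ShrinkingBubbleSortSketch {
    // Sorts 'a' in ascending order and returns how many comparisons were made.
    static int bubbleSort(int[] a) {
        int comparisons = 0;
        // 'out' marks the last unsorted position; everything to its right is final.
        for (int out = a.length - 1; out > 0; out--) {
            for (int in = 0; in < out; in++) {
                comparisons++;
                if (a[in] > a[in + 1]) {
                    int tmp = a[in];
                    a[in] = a[in + 1];
                    a[in + 1] = tmp;
                }
            }
        }
        return comparisons;
    }
}
```

For N items the passes make (N-1) + (N-2) + ... + 1 = N(N-1)/2 comparisons, so the five-number example costs 10 comparisons.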
Selection Sort
The selection sort improves on the bubble sort by reducing the number of swaps necessary
from O(N²) to O(N). Unfortunately, the number of comparisons remains O(N²). However,
the selection sort can still offer a significant improvement for large records that must be
physically moved around in memory, causing the swap time to be much more important than the
comparison time. (Typically, this isn’t the case in Java, where references are moved around, and
not entire objects.)
A Brief Description
What’s involved in the selection sort is making a pass through all the numbers and picking (or
selecting, hence the name of the sort) the smallest one. This smallest number is then swapped
with the number on the left end of the line, at position 0. Now the leftmost number is sorted
and won’t need to be moved again. Notice that in this algorithm the sorted numbers accumulate
on the left (lower indices), whereas in the bubble sort they accumulated on the right.
The next time you pass down the row of numbers, you start at position 1, and, finding the
minimum, swap with position 1. This process continues until all the numbers are sorted.
Algorithm
The algorithm works as follows:
1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position and
advancing each time).
Effectively, the list is divided into two parts: the sublist of items already sorted, which is built
up from left to right and is found at the beginning, and the sublist of items remaining to be
sorted, occupying the remainder of the array.
Here is an example of this sort algorithm sorting five elements:
64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64
(nothing appears changed on this last line because the last 2 numbers were already in order)
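The steps of the selection sort can be sketched in Java as follows. This is an illustrative sketch under assumptions: the names `SelectionSortSketch`, `selectionSort`, `out`, `in`, and `min` do not come from the text, and the print statement is included only to trace each pass.

```java
import java.util.Arrays;

class SelectionSortSketch {
    static void selectionSort(int[] a) {
        // Positions to the left of 'out' are already in their final place.
        for (int out = 0; out < a.length - 1; out++) {
            int min = out; // index of the smallest value seen so far
            for (int in = out + 1; in < a.length; in++) {
                if (a[in] < a[min]) {
                    min = in;
                }
            }
            // At most one swap per pass, even if it swaps an item with itself.
            int tmp = a[out];
            a[out] = a[min];
            a[min] = tmp;
            System.out.println(Arrays.toString(a)); // trace after each pass
        }
    }

    public static void main(String[] args) {
        selectionSort(new int[]{64, 25, 12, 22, 11});
    }
}
```

Run on the five elements above, the trace shows the four passes that follow the starting arrangement, ending with 11 12 22 25 64. The inner loop only compares, and the single swap per pass is what keeps the number of swaps at O(N).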
Selection sort can also be used on list structures that make add and remove efficient, such as a
linked list. In this case it's more common to remove the minimum element from the remainder of
the list, and then insert it at the end of the values sorted so far.
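The linked-list variant might look like the Java sketch below. It assumes `java.util.LinkedList` as the list structure; the class and method names are hypothetical. For brevity it finds the minimum with `Collections.min` and then removes it with `removeFirstOccurrence`, which walks the list a second time. A hand-written list could do both in one traversal.

```java
import java.util.Collections;
import java.util.LinkedList;

class ListSelectionSortSketch {
    // Empties 'input' and returns a new list holding its elements in ascending order.
    static LinkedList<Integer> selectionSort(LinkedList<Integer> input) {
        LinkedList<Integer> sorted = new LinkedList<>();
        while (!input.isEmpty()) {
            Integer min = Collections.min(input); // smallest remaining element
            input.removeFirstOccurrence(min);     // remove it from the remainder
            sorted.addLast(min);                  // append after the values sorted so far
        }
        return sorted;
    }
}
```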
Insertion Sort
In most cases the insertion sort is the best of the elementary sorts described in this sorting
lecture. It still executes in O(N²) time, but it’s about twice as fast as the bubble sort and
somewhat faster than the selection sort in normal situations. It’s also not too complex, although
it’s slightly more involved than the bubble and selection sorts.
It’s often used as the final stage of more sophisticated sorts, such as quicksort.
Insertion Sort on the Baseball Players
To begin the insertion sort, start with your baseball players lined up in random order. It’s
easier to think about the insertion sort if we begin in the middle of the process, when the team
is half sorted.
Partial Sorting
At this point there’s an imaginary marker somewhere in the middle of the line. The players to
the left of this marker are partially sorted. This means that they are sorted among themselves;
each one is taller than the person to his or her left. However, the players aren’t necessarily in
their final positions because they may still need to be moved when previously unsorted players
are inserted between them.
Note that partial sorting did not take place in the bubble sort and selection sort. In these
algorithms a group of data items was completely sorted at any given time; in the insertion sort a
group of items is only partially sorted.
The Marked Player
The player where the marker is, whom we’ll call the “marked” player, and all the players on her
right are as yet unsorted. This is shown in Figure 1.a.
Algorithm
Insertion sort checks each element in turn and puts it in the right position in the sorted list.
Every repetition of insertion sort removes an element from the input data and inserts it into the
correct position in the already-sorted list, until no input elements remain. The choice of which
element to remove from the input is arbitrary, and can be made using almost any choice
algorithm.
Sorting is typically done in-place. The resulting array after k iterations has the property that
the first k + 1 entries are sorted. In each iteration the first remaining entry of the input, x, is
removed and inserted into the result at the correct position, thus extending the result by one
entry, with each element greater than x copied to the right as it is compared against x.
The most common variant of insertion sort, which operates on arrays, can be described as
follows:
1. Suppose there exists a function called Insert designed to insert a value into a sorted
sequence at the beginning of an array. It operates by beginning at the end of the sequence and
shifting each element one place to the right until a suitable position is found for the new
element. The function has the side effect of overwriting the value stored immediately after the
sorted sequence in the array.
2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert
to insert each element encountered into its correct position. The ordered sequence into which the
element is inserted is stored at the beginning of the array in the set of indices already
examined. Each insertion overwrites a single value: the value being inserted.
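The two-part description above can be sketched in Java as follows. The text names a function Insert but gives no signature, so the parameters used here (`a`, `length`, `value`) and the class name are assumptions for illustration only.

```java
class InsertHelperSketch {
    // Inserts 'value' into the sorted run a[0..length-1].
    // Side effect: overwrites a[length], the slot just after the sorted run.
    static void insert(int[] a, int length, int value) {
        int i = length - 1;
        // Start at the end of the run and shift larger elements one place right.
        while (i >= 0 && a[i] > value) {
            a[i + 1] = a[i];
            i--;
        }
        a[i + 1] = value;
    }

    static void insertionSort(int[] a) {
        // Begin at the left-most element; the first call finds an empty run and is trivial.
        for (int i = 0; i < a.length; i++) {
            // a[i] is read before the call, so the only value overwritten is the one inserted.
            insert(a, i, a[i]);
        }
    }
}
```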
Figure 1 The insertion sort on baseball players.
What we’re going to do is insert the marked player in the appropriate place in the (partially)
sorted group. However, to do this, we’ll need to shift some of the sorted players to the right to
make room. To provide a space for this shift, we take the marked player out of line. (In the
program this data item is stored in a temporary variable.) This step is shown in Figure 1.b.
Now we shift the sorted players to make room. The tallest sorted player moves into the marked
player’s spot, the next-tallest player into the tallest player’s spot, and so on.
When does this shifting process stop? Imagine that you and the marked player are walking down
the line to the left. At each position you shift another player to the right, but you also compare
the marked player with the player about to be shifted.
The shifting process stops when you’ve shifted the last player that’s taller than the marked
player. The last shift opens up the space where the marked player, when inserted, will be in
sorted order. This step is shown in Figure 1.c.
Now the partially sorted group is one player bigger, and the unsorted group is one player smaller.
The marker is moved one space to the right, so it’s again in front of the leftmost unsorted
player. This process is repeated until all the unsorted players have been inserted (hence the name
insertion sort) into the appropriate place in the partially sorted group.
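The walk-through with the marked player maps onto a single loop, sketched below in Java. The variable names `marker`, `marked`, and `in` are chosen to echo the description and are not taken from any program in the text. Player heights are assumed to be plain `int` values.

```java
class InsertionSortSketch {
    static void insertionSort(int[] a) {
        // 'marker' divides the partially sorted group (left) from the unsorted group.
        for (int marker = 1; marker < a.length; marker++) {
            int marked = a[marker]; // take the marked item out of line (temporary variable)
            int in = marker;        // position of the open space
            // Walk left, shifting every item that is larger than the marked one.
            while (in > 0 && a[in - 1] > marked) {
                a[in] = a[in - 1];
                in--;
            }
            a[in] = marked; // the last shift opened the space where the marked item belongs
        }
    }
}
```

The loop condition combines the two stopping cases: the left end of the line has been reached, or the next item to the left is not larger than the marked one.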