03. Sorting

241-423 Advanced Data Structures
and Algorithms Semester 2, 2013-2014
3. Sorting Algorithms
• Objective
– examine popular sorting algorithms, with
an emphasis on divide and conquer
ADSA: Sorting/3
Contents
1. Insertion Sort
2. Divide and Conquer Algorithms
3. Merge Sort
4. Quicksort
5. Comparison of Sorting Algorithms
6. Finding the kth Largest Element
1. Insertion Sort
• Each pass inserts an element (x) into a
sorted sublist (sub-array) on the left.
• Items larger than x move to the right to
make room for its insertion.
Insertion Sort Diagram
Outline Algorithm
• Assume the first array element is in the
right position.
• In the ith pass (1 ≤ i ≤ n-1), the elements in
the range 0 to i-1 are already sorted.
• Insert the target (the element at position i) into
its correct position j by moving the elements in the
range [j, i-1] one place to the right, which frees
arr[j].
Simple Insertion Sort
public static void insertion_srt(int[] arr)
{
  int n = arr.length;
  for (int i = 1; i < n; i++) {
    int j = i;
    int target = arr[i];   // sort ith elem
    while ((j > 0) && (arr[j-1] > target)) {
      arr[j] = arr[j-1];   // move right
      j--;
    }
    arr[j] = target;
  }
}
insertionSort()
public static <T extends Comparable<? super T>>
void insertionSort(T[] arr)
{
  int n = arr.length;
  for (int i = 1; i < n; i++) {
    int j = i;
    T target = arr[i];
    while (j > 0 && target.compareTo(arr[j-1]) < 0) {
      arr[j] = arr[j-1];
      j--;
    }
    arr[j] = target;
  }
} // end of insertionSort()
Insertion Sort Efficiency
• Best case running time is O(n)
– when the array is already sorted
• The worst and average case running times
are O(n^2).
• Insertion sort is very efficient when the
array is "almost sorted".
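The gap between the best and worst cases can be seen by counting how many elements insertion sort moves to the right. The sketch below is illustrative only: the class name InsertionShiftCount and the method sortAndCountShifts() are assumptions made for this example and are not part of the course code.

```java
// Sketch: insertion sort on an int array that also counts element shifts.
// The class and method names are hypothetical, chosen for this example.
final class InsertionShiftCount {

    // Sorts arr in place and returns how many elements were moved right.
    static long sortAndCountShifts(int[] arr) {
        long shifts = 0;
        for (int i = 1; i < arr.length; i++) {
            int target = arr[i];
            int j = i;
            while (j > 0 && arr[j - 1] > target) {
                arr[j] = arr[j - 1];   // move right
                j--;
                shifts++;
            }
            arr[j] = target;
        }
        return shifts;
    }
}
```

An already sorted array needs no shifts at all, so each pass costs a single comparison. An array of n distinct values in descending order needs n(n-1)/2 shifts, which is where the O(n^2) worst case comes from.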
2. Divide and Conquer Algorithms
• Divide a problem into smaller versions of
the same problem, using recursion.
• Solve the smaller versions.
• Combine the solutions of the smaller versions
to get an answer for the big,
original problem.
Examples
• Binary search
• Merge sort and quicksort (here)
• Binary tree traversal
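As a small illustration of the divide-and-conquer pattern, here is a minimal sketch of a recursive binary search over a half-open index range [first, last), the same range convention used by the sorting methods later on. The class name BinarySearchSketch is an assumption for this example only.

```java
// Sketch: recursive binary search as a divide-and-conquer example.
// The class name is hypothetical; the array must already be sorted.
final class BinarySearchSketch {

    // Returns an index of key within arr[first..last), or -1 if absent.
    static int search(int[] arr, int first, int last, int key) {
        if (first >= last)                 // empty range: nothing to search
            return -1;
        int mid = first + (last - first) / 2;
        if (arr[mid] == key)
            return mid;
        else if (key < arr[mid])           // continue in the lower half
            return search(arr, first, mid, key);
        else                               // continue in the upper half
            return search(arr, mid + 1, last, key);
    }
}
```

Binary search only ever recurses into one half, so there is nothing to combine. Merge sort and quicksort recurse into both halves.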
3. Merge Sort
• Sort an array with n elements by splitting it
into two halves. Keep splitting in half
recursively.
• Sort the small sublists (a sublist of one
element is already sorted).
• Merge the small sublists recursively back
together into a single sorted array.
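The merge step is the heart of the algorithm. The following is a minimal sketch that merges two already sorted int arrays into a new array. It is a simplified stand-in for illustration, not the msort() code used in this course, and the name MergeSketch is an assumption.

```java
// Sketch: merge two sorted int arrays into one sorted array.
// The class name is hypothetical; both inputs must already be sorted.
final class MergeSketch {

    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        // while both inputs have elements left, copy the smaller one
        while (i < a.length && j < b.length)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        // copy over what is left of a, then what is left of b
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}
```

Each element is copied exactly once, so merging two sublists with n elements in total takes O(n) time.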
Merge Sort Diagram
General Sort Methods
• Ford and Topp's Arrays class provides two
versions of the merge sort algorithm.
– one version takes an Object array arr[] as input;
– the other version is generic and specifies arr[]
as an array of type T
• Both methods call msort() to carry out the
merge sort.
sort() - with Object Array
public static void sort(Object[] arr)
{
  // create a temporary array
  Object[] tempArr = arr.clone();
  // sort the entire array (the index range [0, arr.length))
  msort(arr, tempArr, 0, arr.length);
}
sort() - Generic Version
public static <T extends Comparable<? super T>>
void sort(T[] arr)
{
  // create a temporary array
  T[] tempArr = (T[])arr.clone();
  msort(arr, tempArr, 0, arr.length);
}
msort()
• Split into two lists by computing the
midpoint of the index range:
int midpt = (last + first)/2;
• Call msort() recursively on the index
range [first, midpt) and on the index range
[midpt, last).
• When the resulting lists are small, start
merging them back together into sorted
order.
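A side note on the midpoint formula: in Java, last + first is an int sum, so for index values above roughly one billion it can overflow and produce a negative midpoint. This is harmless for the array sizes used in this course. The sketch below contrasts the formula above with a common overflow-safe variant. The class and method names are assumptions for this example.

```java
// Sketch: two ways to compute the midpoint of the index range [first, last).
// The class and method names are hypothetical.
final class MidpointSketch {

    // The formula used in msort(); the int sum can overflow for huge indices.
    static int naiveMid(int first, int last) {
        return (last + first) / 2;
    }

    // Overflow-safe variant: the difference (last - first) always fits in an int
    // when 0 <= first <= last.
    static int safeMid(int first, int last) {
        return first + (last - first) / 2;
    }
}
```

Both methods return the same value whenever the sum does not overflow.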
Tracing msort()
[Figure: trace of msort(), showing the split phase followed by the merge phase]
msort()
private static void msort(Object[] arr, Object[] tempArr,
                          int first, int last)
{
  // if sublist has more than 1 elem.
  if ((first + 1) < last) {
    int midpt = (last + first)/2;
    msort(arr, tempArr, first, midpt);
    msort(arr, tempArr, midpt, last);

    // if arr[] is now sorted, finish
    if (((Comparable)arr[midpt-1]).compareTo(arr[midpt]) <= 0)
      return;

    // indexA scans arr[] in range [first, midpt)
    int indexA = first;
    // indexB scans arr[] in range [midpt, last)
    int indexB = midpt;
    int indexC = first;   // for merged temp list

    /* while both sublists are not finished,
       compare arr[indexA] and arr[indexB];
       copy the smaller into the temp list */
    while (indexA < midpt && indexB < last) {
      if (((Comparable)arr[indexA]).compareTo(arr[indexB]) < 0) {
        tempArr[indexC] = arr[indexA];
        indexA++;
      }
      else {
        tempArr[indexC] = arr[indexB];
        indexB++;
      }
      indexC++;
    }

    // copy over what's left of sublist A
    while (indexA < midpt) {
      tempArr[indexC] = arr[indexA];
      indexA++;
      indexC++;
    }

    // copy over what's left of sublist B
    while (indexB < last) {
      tempArr[indexC] = arr[indexB];
      indexB++;
      indexC++;
    }

    // copy temp array back to arr[]
    for (int i = first; i < last; i++)
      arr[i] = tempArr[i];
  }
} // end of msort()
msort() Notes
• Continue only as long as first+1 < last
• Do not merge arr if arr[midpt-1] ≤ arr[midpt]
Recursion Tree for Merge Sort
Efficiency of Merge Sort
• Total number of comparisons
= no. of levels * no. of comparisons at a level
• msort() starts with a list of size n
• msort() recurses until the sublist size is 1
• Each level roughly halves the sublist size:
– n, n/2, n/4, ..., 1
– no. of levels = log_2 n (roughly)
• No. of msort() calls at a level:
– at level 0: 1 msort() call
– at level 1: 2 calls
– at level 2: 4 calls
– ...
– at level i: 2^i calls
• No. of comparisons in 1 msort() call at a level:
– at level 0: a msort() call compares n elements
– at level 1: n/2 elements
– at level 2: n/4 elements
– ...
– at level i: n/2^i elements
• Total no. of comparisons at a level:
– no. of calls at a level * comparisons in 1 msort() call
– 2^i * n/2^i = n
• Total number of comparisons
= no. of levels * no. of comparisons at a level
= log_2 n * n
• So the worst case running time is O(n log_2 n)
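The bound can be checked empirically by counting comparisons. The sketch below is a simplified int-array merge sort written for this purpose. It is not Ford and Topp's msort(), and the names MergeSortComparisons, countFor() and the static counter are assumptions for this example.

```java
import java.util.Random;

// Sketch: count the element comparisons made by a top-down merge sort.
// All names here are hypothetical; this is not the course's msort().
final class MergeSortComparisons {
    static long comparisons = 0;

    static void msort(int[] arr, int[] tmp, int first, int last) {
        if (first + 1 >= last)          // 0 or 1 element: already sorted
            return;
        int mid = first + (last - first) / 2;
        msort(arr, tmp, first, mid);
        msort(arr, tmp, mid, last);
        int a = first, b = mid, c = first;
        while (a < mid && b < last) {
            comparisons++;              // one comparison per merged element
            tmp[c++] = (arr[a] <= arr[b]) ? arr[a++] : arr[b++];
        }
        while (a < mid) tmp[c++] = arr[a++];
        while (b < last) tmp[c++] = arr[b++];
        System.arraycopy(tmp, first, arr, first, last - first);
    }

    // Sorts n pseudo-random ints and returns the number of comparisons used.
    static long countFor(int n, long seed) {
        Random rnd = new Random(seed);
        int[] arr = new int[n];
        for (int i = 0; i < n; i++)
            arr[i] = rnd.nextInt();
        comparisons = 0;
        msort(arr, new int[n], 0, n);
        return comparisons;
    }
}
```

For n = 1024 there are log_2 n = 10 levels, so countFor(1024, seed) cannot exceed 10 * 1024 = 10240 comparisons for any seed.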
4. Quicksort
• Uses a divide-and-conquer strategy like
merge sort.
• But, unlike merge sort, quicksort is an
in-place sorting algorithm
– elements are exchanged within the list
without the need for temporary lists/arrays
– space efficient
Quicksort Steps
• Pick an element, called a pivot, from the
list.
• Reorder the list so that all elements which
are less than the pivot come before the pivot
and so that all elements greater than the
pivot come after it
• Recursively call quicksort on the sublist of
lesser elements and the sublist of greater
elements.
• The stopping cases for the recursion are lists
of size zero or one, which are always sorted.
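The steps above can be sketched directly with lists. Note that this version builds new sublists, so it is not in-place and is not the quicksort developed in the rest of this section. It is only meant to show the pick, reorder and recurse structure. The class name QuicksortSteps is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: quicksort written to mirror the three steps, using extra lists.
// Not in-place; the class name is hypothetical.
final class QuicksortSteps {

    static List<Integer> quicksort(List<Integer> list) {
        if (list.size() <= 1)                       // stopping case
            return list;
        int pivot = list.get(list.size() / 2);      // step 1: pick a pivot
        List<Integer> lesser = new ArrayList<>();
        List<Integer> equal = new ArrayList<>();
        List<Integer> greater = new ArrayList<>();
        for (int x : list) {                        // step 2: reorder around it
            if (x < pivot) lesser.add(x);
            else if (x > pivot) greater.add(x);
            else equal.add(x);
        }
        // step 3: recurse on the sublists and join the results
        List<Integer> result = new ArrayList<>(quicksort(lesser));
        result.addAll(equal);
        result.addAll(quicksort(greater));
        return result;
    }
}
```

Keeping the elements equal to the pivot in their own list guarantees that both recursive calls work on strictly smaller lists.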
Quicksort Diagram
[Figure: quicksort diagram, with the pivot labelled]
Partitioning a List
• The pivot is the element at index
– mid = (first + last)/2.
• Separate the elements of arr[] into two sublists,
Sl and Sh.
– Sl (l = low) contains the elements ≤ pivot
– Sh (h = high) contains the elements ≥ pivot
• Exchange arr[first] and arr[mid]
• Scan the list with index range [first+1, last)
– scanUp starts at first+1 and moves up the list,
finding elements for Sl.
– scanDown starts at position last -1 and moves
down the list, finding elements for Sh.
• When arr[scanUp] ≥ pivot and
arr[scanDown] ≤ pivot then the two
elements are in the wrong sublists.
• Exchange the elements at the two positions
and then resume scanning.
• scanUp and scanDown move toward each other until
they meet or pass one another (scanDown ≤ scanUp).
• scanDown is at the place where the pivot
should appear
– exchange arr[first] (arr[0] in this example) and
arr[scanDown] to correctly position the pivot
pivotIndex()
• The method
public static <T extends Comparable<? super T>>
int pivotIndex(T[] arr, int first, int last)
takes array arr and index range [first, last)
and returns the index of the pivot after
partitioning arr[].
public static <T extends Comparable<? super T>>
int pivotIndex(T[] arr, int first, int last)
{
  int mid;     // index for the midpoint
  T pivot;

  if (first == last)            // empty sublist
    return last;
  else if (first == (last-1))   // 1-element sublist
    return first;
  else {
    mid = (last + first)/2;
    pivot = arr[mid];

    // exchange pivot and bottom end of range
    arr[mid] = arr[first];
    arr[first] = pivot;

    // scanning indices
    int scanUp = first + 1;
    int scanDown = last - 1;

    while (true) {
      /* move up the lower sublist while scanUp
         is less than or equal to scanDown and
         the array value is less than pivot */
      while ((scanUp <= scanDown) &&
             (arr[scanUp].compareTo(pivot) < 0))
        scanUp++;

      /* move down upper sublist while array
         value is greater than the pivot */
      while (pivot.compareTo(arr[scanDown]) < 0)
        scanDown--;

      /* if indices are not in their sublists,
         partition is complete */
      if (scanUp >= scanDown)
        break;

      // found two elements in wrong sublists; exchange
      T temp = arr[scanUp];
      arr[scanUp] = arr[scanDown];
      arr[scanDown] = temp;
      scanUp++;
      scanDown--;
    }

    // copy pivot to index posn (scanDown) that
    // partitions the sublists
    arr[first] = arr[scanDown];
    arr[scanDown] = pivot;
    return scanDown;
  }
} // end of pivotIndex()
quicksort()
• quicksort() sorts a generic array arr[] by
calling qsort() with the index range [0,
arr.length).
public static <T extends Comparable<? super T>>
void quicksort(T[] arr)
{ qsort(arr, 0, arr.length); }
qsort()
• Recursively partition the elements in the
index range into smaller and smaller
sublists, terminating when the size of a list
is 0 or 1.
• For efficiency, handle a list of size 2 by
comparing the elements and exchanging
them if necessary.
• For larger lists, call pivotIndex() to reorder
the elements and determine the pivot.
• Make two calls to qsort():
– the first call specifies the index range for the
lower sublist
– the second call specifies the index range for the
upper sublist
qsort() Diagram
private static <T extends Comparable<? super T>>
void qsort(T[] arr, int first, int last)
{
  // if range is less than two elements
  if ((last - first) <= 1)
    return;
  // if sublist has two elements
  else if ((last - first) == 2) {
    /* compare arr[first] and arr[last-1] and
       exchange if necessary */
    if (arr[last-1].compareTo(arr[first]) < 0) {
      T temp = arr[last-1];
      arr[last-1] = arr[first];
      arr[first] = temp;
    }
    return;
  }
  else {
    int pivotLoc = pivotIndex(arr, first, last);
    qsort(arr, first, pivotLoc);
    qsort(arr, pivotLoc + 1, last);
  }
} // end of qsort()
Running Time of Quicksort
• The average case running time is O(n log_2 n).
• The best case occurs when the array is
already sorted, because the midpoint pivot then
splits every sublist into two nearly equal halves.
• Quicksort is efficient even when the array
is in descending order.
• The worst-case occurs when the chosen pivot
is always the largest or smallest element in its
sublist.
– the running time is O(n^2)
– highly unlikely
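To see how the pivot choice drives the worst case, the sketch below measures the recursion depth of quicksort on an already sorted int array, once with the first element as pivot and once with the midpoint. It uses an int version of the scanning scheme from pivotIndex(). The class name PivotChoiceDepth, the method names and the useMiddle flag are assumptions for this example.

```java
// Sketch: recursion depth of quicksort for two pivot choices.
// All names are hypothetical; the scanning follows pivotIndex().
final class PivotChoiceDepth {

    // Partitions arr[first..last) around arr[pivotAt]; returns the pivot's final index.
    static int partition(int[] arr, int first, int last, int pivotAt) {
        int pivot = arr[pivotAt];
        arr[pivotAt] = arr[first];
        arr[first] = pivot;
        int up = first + 1, down = last - 1;
        while (true) {
            while (up <= down && arr[up] < pivot) up++;
            while (pivot < arr[down]) down--;
            if (up >= down) break;
            int t = arr[up]; arr[up] = arr[down]; arr[down] = t;
            up++;
            down--;
        }
        arr[first] = arr[down];
        arr[down] = pivot;
        return down;
    }

    // Sorts arr[first..last) and returns the deepest recursion level reached.
    static int depth(int[] arr, int first, int last, boolean useMiddle) {
        if (last - first <= 1)
            return 0;
        int pivotAt = useMiddle ? first + (last - first) / 2 : first;
        int p = partition(arr, first, last, pivotAt);
        return 1 + Math.max(depth(arr, first, p, useMiddle),
                            depth(arr, p + 1, last, useMiddle));
    }
}
```

On a sorted array of n distinct values, the first-element pivot is always the smallest element, so each call removes only the pivot and the depth is n-1, which gives O(n^2) work. With the midpoint pivot the depth stays close to log_2 n.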
5. Comparison of Sorting Algorithms
• An inversion in an array, arr[], is an ordered
pair (arr[i], arr[j]), i < j, where arr[i] > arr[j].
• When sorting in ascending order, arr[i] and
arr[j] are out of order.
• The O(n^2) sorting algorithms compare
adjacent elements, and generally remove one
inversion with each iteration
– e.g. selection and insertion sort
• The O(n log_2 n) sorting algorithms compare
non-adjacent elements, and generally remove
more than one inversion with each iteration
– e.g. quicksort and merge sort
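For insertion sort the link to inversions is exact: every shift moves one larger element past the target, which removes exactly one inversion. The sketch below counts inversions by brute force and counts insertion sort's shifts, so the two numbers can be compared. The class and method names are assumptions for this example.

```java
// Sketch: inversions in an array versus shifts made by insertion sort.
// The class and method names are hypothetical.
final class InversionSketch {

    // Counts pairs (i, j) with i < j and arr[i] > arr[j]; O(n^2) brute force.
    static long countInversions(int[] arr) {
        long count = 0;
        for (int i = 0; i < arr.length; i++)
            for (int j = i + 1; j < arr.length; j++)
                if (arr[i] > arr[j])
                    count++;
        return count;
    }

    // Insertion-sorts arr in place and returns the number of shifts made.
    static long insertionSortShifts(int[] arr) {
        long shifts = 0;
        for (int i = 1; i < arr.length; i++) {
            int target = arr[i];
            int j = i;
            while (j > 0 && arr[j - 1] > target) {
                arr[j] = arr[j - 1];   // removes exactly one inversion
                j--;
                shifts++;
            }
            arr[j] = target;
        }
        return shifts;
    }
}
```

For the array {3, 1, 4, 1, 5, 9, 2, 6} both methods return 8. An array in descending order has n(n-1)/2 inversions, which is why removing them one at a time costs O(n^2).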
Timing Sorts
import java.util.Random;
import ds.util.Arrays;   // Ford and Topp's Arrays class
import ds.time.Timing;

public class TimingSorts
{
  public static void main(String[] args)
  {
    final int SIZE = 75000;
    Integer[] arr1 = new Integer[SIZE],
              arr2 = new Integer[SIZE],
              arr3 = new Integer[SIZE];
    Random rnd = new Random();

    /* load each array with the same sequence of
       random numbers in the range 0 to 999999 */
    int rndNum;
    for (int i=0; i < SIZE; i++) {
      rndNum = rnd.nextInt(1000000);
      arr1[i] = arr2[i] = arr3[i] = rndNum;
    }

    // call timeSort() for each sort
    timeSort(arr1, 0, "Merge sort");
    timeSort(arr2, 1, "Quick sort");
    timeSort(arr3, 2, "Insertion sort");
  } // end of main()

  public static <T extends Comparable<? super T>>
  void timeSort(T[] arr, int sortType, String sortName)
  {
    Timing t = new Timing();
    t.start();
    if (sortType == 0)
      Arrays.sort(arr);           // merge sort in F&T
    else if (sortType == 1)
      Arrays.quicksort(arr);
    else
      Arrays.insertionSort(arr);
    double timeRequired = t.stop();

    outputFirst_Last(arr);
    System.out.print(" " + sortName + " time is " +
                     timeRequired + "\n\n");
  } // end of timeSort()

  public static void outputFirst_Last(Object[] arr)
  // output first 3 elements and last 3 elements
  {
    for (int i=0; i < 3; i++)
      System.out.print(arr[i] + " ");
    System.out.print(". . . ");
    for (int i=arr.length-3; i < arr.length; i++)
      System.out.print(arr[i] + " ");
    System.out.println();
  }
} // end of TimingSorts
Output
26 38 47 . . . 999980 999984 999984
Merge sort time is 0.109
26 38 47 . . . 999980 999984 999984
Quick sort time is 0.078
26 38 47 . . . 999980 999984 999984
Insertion sort time is 100.611
• Merge sort and quicksort are O(n log_2 n); insertion sort is O(n^2).
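The Timing class comes from Ford and Topp's library and is not part of the JDK. Where that library is not available, a similar measurement can be sketched with System.nanoTime() and the JDK's own java.util.Arrays.sort(). The class NanoTimingSketch and its methods are assumptions for this example, and a single run like this is only a rough indication, not a careful benchmark.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: time a task with System.nanoTime() instead of ds.time.Timing.
// The class and method names are hypothetical.
final class NanoTimingSketch {

    // Runs task once and returns the elapsed wall-clock time in seconds.
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // Builds an array of pseudo-random values in the range 0 to 999999.
    static Integer[] randomArray(int size, long seed) {
        Random rnd = new Random(seed);
        Integer[] arr = new Integer[size];
        for (int i = 0; i < size; i++)
            arr[i] = rnd.nextInt(1000000);
        return arr;
    }

    // Times the JDK sort on a fresh random array of the given size.
    static double timeJdkSort(int size, long seed) {
        Integer[] arr = randomArray(size, seed);
        return timeSeconds(() -> Arrays.sort(arr));
    }
}
```

A call such as NanoTimingSketch.timeJdkSort(75000, 1L) returns the time in seconds for one sort of 75000 elements. The result varies from run to run and from machine to machine.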
6. Finding the kth Largest Element
• Sort the array and then access the element at
position k.
– running time is O(n log_2 n) if we use quicksort
or merge sort
• For a more efficient solution, locate the
position of the kth-largest value by
partitioning the elements into two sublists.
index range      contents
0 ... k-1        values ≤ kth-largest
k                kth-largest
k+1 ... n-1      values ≥ kth-largest
• The lower sublist contains k elements that
are ≤ the kth-largest.
• The upper sublist contains elements that are
≥ the kth-largest.
• The elements in the sublists do not need to
be ordered.
• Use the pivoting technique from the
quicksort algorithm to create a partition.
• The algorithm is recursive:
– index = pivotIndex()
– If index == k, done, return arr[index];
– otherwise, call pivotIndex() with range [first,
index) if k < index, or with range [index+1,
last) if k > index.
• Each step examines only one of the two sublists.
public static <T extends Comparable<? super T>>
T findKth(T[] arr, int first, int last, int k)
{
  // empty range: k is not inside [first, last)
  if (first >= last)
    return null;

  // partition range [first, last) in arr about the
  // pivot arr[index]
  int index = pivotIndex(arr, first, last);

  // if index == k, we are done. kth largest is arr[k]
  if (index == k)
    return arr[index];    // return array value
  else if (k < index)
    // search in lower sublist [first, index)
    return findKth(arr, first, index, k);
  else
    // search in upper sublist [index+1, last)
    return findKth(arr, index+1, last, k);
}
Running Time of findKth()
• The average running time is O(n)
– no. of comparisons = n + n/2 + n/4 + n/8 + ... ≈ 2n,
assuming each partition roughly halves the range
• This is faster than the O(n log_2 n) cost of
sorting the whole array first
– this is to be expected since findKth() only uses one
of its sublists at each recursive call, whereas
quicksort and merge sort use both
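The 2n figure comes from a geometric series. As a short worked derivation, assuming that every partition splits its range exactly in half (an idealisation; real partitions are only roughly even):

```latex
n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots
  \;=\; n \sum_{i=0}^{\infty} \left(\frac{1}{2}\right)^{i}
  \;=\; n \cdot \frac{1}{1 - \frac{1}{2}}
  \;=\; 2n
```

The real sum stops after about log_2 n terms, so it is slightly less than 2n, which is still O(n).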