Slides - People Server at UNCW - University of North Carolina

Sorting
Devon M. Simmonds
University of North Carolina, Wilmington

TIME: Tuesday/Thursday 11:11:50am in 1012 &
Thursday 3:30-5:10pm in 2006.



Office hours: TR 1-2pm or by appointment.
Office location: CI2046.
Email: simmondsd[@]uncw.edu
1
Objectives
• To introduce basic sort algorithms:
– Exchange/bubble sort
– Insertion sort
– Selection sort
– Quicksort
– Mergesort
2
Sorting: The Big Picture
Given n comparable elements in an array,
sort them in an increasing (or decreasing)
order.
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting
3
CSE 326: Data Structures Sorting Ben Lerner Summer 2007
Exchange (Bubble) Sort
• Algorithm (for a list a with n elements)
  – Repeat until list a is sorted
    • for each i from n-1 downto 1
      – if (a[i] < a[i-1])
        » exchange (a[i], a[i-1])
  – Result: After the kth pass, the first k elements are sorted.
4
Example
• 17 21 6 10 16 12 15 4
5
Try it out: Bubble sort
• Insert 31, 16, 54, 4, 2, 17, 6
6
Bubble Sort Code
def bubbleSort(self, a):
    for i in range(len(a)):
        for j in range(len(a)-1, 0, -1):
            if(a[j] < a[j-1]):
                #exchange items
                temp = a[j-1]
                a[j-1] = a[j]
                a[j] = temp
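The algorithm slide says to repeat until the list is sorted, whereas the code above always makes len(a) passes. A minimal sketch of the "repeat until sorted" variant follows. The standalone function name bubble_sort and the swapped flag are assumptions made for illustration, not part of the original slides.

```python
def bubble_sort(a):
    """Sort list a in place and return the number of passes made."""
    passes = 0
    swapped = True
    while swapped:                            # repeat until a pass makes no exchange
        swapped = False
        passes += 1
        for i in range(len(a) - 1, 0, -1):    # i from n-1 down to 1
            if a[i] < a[i - 1]:
                a[i], a[i - 1] = a[i - 1], a[i]
                swapped = True
    return passes


data = [17, 21, 6, 10, 16, 12, 15, 4]         # the list from the Example slide
bubble_sort(data)
print(data)  # [4, 6, 10, 12, 15, 16, 17, 21]
```

On an already sorted list this version stops after a single pass.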
7
Insertion Sort: Idea
• Algorithm (for a list a[0..n-1] with n elements)
  – for each i from 1 to n-1
    • put the ith element in the correct place among the first i+1 elements
  – At the kth step, put the kth input element in the correct place among the first k elements
  – Result: After the kth pass, the first k elements are sorted.
8
Example
9
Example
10
Try it out: Insertion sort
• Insert 31, 16, 54, 4, 2, 17, 6
11
Insertion Sort Code
def insertionSort(self, a):
    for i in range(1, len(a)):
        #insert a[i] in correct position in a[0] .. a[i]
        temp = a[i]
        j = i-1
        #shift larger items one place right to open a slot for temp
        while (j >= 0 and a[j] > temp):
            a[j+1] = a[j]
            j -= 1
        a[j+1] = temp
12
Selection Sort: idea
• Find the smallest element, put it 1st
• Find the next smallest element, put it 2nd
• Find the next smallest, put it 3rd
• And so on …
13
Selection Sort: idea
• Algorithm (for a list a with n elements)
  – for each i from 1 to n-1
    • Find the smallest element among positions i-1 to n-1, put it in position i-1
  – Result: After the kth pass, the first k elements are sorted.
14
Selection Sort: Code
void SelectionSort (Array a[0..n-1]) {
  for (i=0; i<n; ++i) {
    j = Find index of smallest entry in a[i..n-1]
    Swap(a[i],a[j])
  }
}
Runtime:
  worst case   :
  best case    :
  average case :
15
Try it out: Selection sort
• Insert 31, 16, 54, 4, 2, 17, 6
16
Selection Sort Code
def selectionSort(self, a):
    for i in range(len(a)-1):
        #find smallest of items i, i+1, i+2, .., len(a)-1 and
        #exchange smallest with item in position i
        sIndex = i
        smallest = a[i]
        for j in range(i+1, len(a)):
            if(a[j] < smallest):
                sIndex = j
                smallest = a[j]
        #exchange items
        temp = a[i]
        a[i] = smallest
        a[sIndex] = temp
17
Merge Sort
MergeSort (Array [1..n])
1. Split Array in half
2. Recursively sort each half
3. Merge two halves together

Merge (a1[1..n], a2[1..n]): “The 2-pointer method”
i1=1, i2=1
While (i1<=n and i2<=n) {
  if (a1[i1] < a2[i2]) {
    Next is a1[i1]
    i1++
  } else {
    Next is a2[i2]
    i2++
  }
}
Now throw in the dregs…
18
Mergesort example
Input:      8 2 9 4 5 3 1 6
Divide:     8 2 9 4 | 5 3 1 6
Divide:     8 2 | 9 4 | 5 3 | 1 6
1-element:  8 | 2 | 9 | 4 | 5 | 3 | 1 | 6
Merge:      2 8 | 4 9 | 3 5 | 1 6
Merge:      2 4 8 9 | 1 3 5 6
Final:      1 2 3 4 5 6 8 9
19
Mergesort example
def mergeSort(self, alist):
    if len(alist)>1:
        #Divide
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        self.mergeSort(lefthalf)                 #call 1
        self.mergeSort(righthalf)                #call 2
        self.merge(lefthalf, righthalf, alist)   #call 3

Each call divides its list into lh and rh, then makes 3 function calls: 1 ms(lh), 2 ms(rh), 3 merge(lh, rh).

ms([8 2 9 4 5 3 1 6])   Divide: lh=[8 2 9 4] rh=[5 3 1 6]
  ms([8 2 9 4])         Divide: lh=[8 2] rh=[9 4]
    ms([8 2])           Divide: lh=[8] rh=[2]   merge(lh, rh) gives [2, 8]
    ms([9 4])           Divide: lh=[9] rh=[4]   merge(lh, rh) gives [4, 9]
20
Mergesort example
def mergeSort(self, alist):
    if len(alist)>1:
        #Divide
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        self.mergeSort(lefthalf)                 #call 1
        self.mergeSort(righthalf)                #call 2
        self.merge(lefthalf, righthalf, alist)   #call 3

ms([8 2 9 4 5 3 1 6])   Divide: lh=[8 2 9 4] rh=[5 3 1 6]
  ms([8 2 9 4])         Divide: lh=[8 2] rh=[9 4]
    ms([8 2])           gives [2, 8]
    ms([9 4])           gives [4, 9]
    merge([2, 8], [4, 9]) gives [2, 4, 8, 9]
What is next? ms([5 3 1 6])
21
Try it out: Merge sort
• Insert 31, 16, 54, 4, 2, 17, 6
22
MergeSort Code
def mergeSort(self, alist):
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        print("Splitting ",alist, "into", lefthalf, "and", righthalf)
        self.mergeSort(lefthalf)
        self.mergeSort(righthalf)
        self.merge(lefthalf, righthalf, alist)

def merge(self, lefthalf, righthalf, alist):
    print("Merging ", lefthalf, "and", righthalf)
    i=0
    j=0
    k=0
    while i<len(lefthalf) and j<len(righthalf):
        #<= takes the left item on ties, which keeps the sort stable
        if lefthalf[i]<=righthalf[j]:
            alist[k]=lefthalf[i]
            i=i+1
        else:
            alist[k]=righthalf[j]
            j=j+1
        k=k+1
    #copy whatever remains in either half
    while i<len(lefthalf):
        alist[k]=lefthalf[i]
        i=i+1
        k=k+1
    while j<len(righthalf):
        alist[k]=righthalf[j]
        j=j+1
        k=k+1
23
The steps of QuickSort
S = {81, 13, 31, 43, 57, 75, 92, 0, 26, 65}
select pivot value: 65
partition S:
  S1 = {0, 13, 26, 31, 43, 57}   pivot 65   S2 = {75, 81, 92}
QuickSort(S1) and QuickSort(S2):
  S1: 0 13 26 31 43 57   65   S2: 75 81 92
Presto! S is sorted:
  S: 0 13 26 31 43 57 65 75 81 92
[Weiss]
24
QuickSort Example
index:   0 1 2 3 4 5 6 7 8 9
before:  8 1 4 9 6 3 5 2 7 0
after:   8 1 4 9 0 3 5 2 7 6   (i and j mark the scan positions)
• Choose the pivot as the median of three.
• Place the pivot and the largest at the right and the smallest at the left
25
QuickSort Example
8 1 4 9 0 3 5 2 7 6   (i and j mark the scan positions)
1) Move i to the right to first element larger than pivot.
2) Move j to the left to first element smaller than pivot.
3) Swap elements at i and j
4) Repeat until i and j cross
5) Swap pivot with element at i
6) Repeat steps 1-5 for the two partitions
   1) Elements > pivot
   2) Elements < pivot
26
Recursive Quicksort
Quicksort(A[]: integer array, left,right : integer): {
  pivotindex : integer;
  if left + CUTOFF ≤ right then
    pivot := median3(A,left,right);
    pivotindex := Partition(A,left,right-1,pivot);
    Quicksort(A, left, pivotindex - 1);
    Quicksort(A, pivotindex + 1, right);
  else
    Insertionsort(A,left,right);
}
Don’t use quicksort for small arrays.
CUTOFF = 10 is reasonable.
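The pseudocode above can be sketched in Python as follows. This is one possible reading, not the course's reference implementation: the helper names median3 and insertion_sort_range, the choice to hide the pivot at position right-1, and the inlined partition loop are assumptions modeled on the common textbook formulation.

```python
CUTOFF = 10  # value suggested on the slide


def insertion_sort_range(a, left, right):
    """Insertion sort restricted to a[left..right] (inclusive)."""
    for i in range(left + 1, right + 1):
        temp = a[i]
        j = i - 1
        while j >= left and a[j] > temp:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = temp


def median3(a, left, right):
    """Order a[left], a[center], a[right], park the median at a[right-1], return it."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def quicksort(a, left=0, right=None):
    """Sort a[left..right] in place; small ranges fall back to insertion sort."""
    if right is None:
        right = len(a) - 1
    if left + CUTOFF <= right:
        pivot = median3(a, left, right)
        i, j = left, right - 1
        while True:
            i += 1
            while a[i] < pivot:      # stops at the pivot in a[right-1] at the latest
                i += 1
            j -= 1
            while a[j] > pivot:      # stops at a[left] at the latest
                j -= 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                break
        a[i], a[right - 1] = a[right - 1], a[i]   # move the pivot to its final position
        quicksort(a, left, i - 1)
        quicksort(a, i + 1, right)
    else:
        insertion_sort_range(a, left, right)


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0, 31, 16, 54]
quicksort(data)
print(data)
```

With CUTOFF = 10, any range of fewer than 11 elements is handed straight to insertion sort, so the 7-element "Try it out" input never reaches the partition step.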
27
Try it out: Recursive quicksort
• Insert 31, 16, 54, 4, 2, 17, 6
28
QuickSort:
Average case complexity
Turns out to be O(n log n)
See Section 7.7.5 for an idea of the proof.
Don’t need to know proof details for this course.
29
QuickSort Code
def quickSort(self, array, start, end):
    left = start
    right = end
    if (right - left < 1):
        return
    else:  #at least 2 elements to be sorted
        pivot = array[start]
        while (right > left):
            #move left forward past items <= pivot
            while (array[left] <= pivot and left < right):
                left += 1
            #move right back past items > pivot
            while (array[right] > pivot and right >= left):
                right -= 1
            if (right > left):
                #exchange the out-of-place pair; the scans above then step past it
                array[left], array[right] = array[right], array[left]
        #swap array[start] and array[right] to put the pivot in its final position
        array[start], array[right] = array[right], array[start]
        self.quickSort(array, start, right - 1)
        self.quickSort(array, right + 1, end)
30
Features of Sorting Algorithms
• In-place
– Sorted items occupy the same space as the
original items. (No copying required, only O(1)
extra space if any.)
• Stable
– Items in input with the same value end up in
the same order as when they began.
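To make the stability definition concrete, the sketch below sorts (key, label) records with two in-place algorithms and compares the order of records that share a key. The function names and the record format are assumptions chosen for illustration.

```python
def insertion_sort_by_key(records, key):
    """In-place insertion sort; the strict > means equal keys never pass each other."""
    for i in range(1, len(records)):
        temp = records[i]
        j = i - 1
        while j >= 0 and key(records[j]) > key(temp):
            records[j + 1] = records[j]
            j -= 1
        records[j + 1] = temp


def selection_sort_by_key(records, key):
    """In-place selection sort; the long-distance swap can reorder equal keys."""
    for i in range(len(records) - 1):
        smallest = i
        for j in range(i + 1, len(records)):
            if key(records[j]) < key(records[smallest]):
                smallest = j
        records[i], records[smallest] = records[smallest], records[i]


records = [(2, "a"), (2, "b"), (1, "c")]

stable = list(records)
insertion_sort_by_key(stable, key=lambda r: r[0])
print(stable)    # [(1, 'c'), (2, 'a'), (2, 'b')]  "a" still precedes "b"

unstable = list(records)
selection_sort_by_key(unstable, key=lambda r: r[0])
print(unstable)  # [(1, 'c'), (2, 'b'), (2, 'a')]  "a" and "b" have swapped order
```

Both functions use only a constant amount of extra storage, so both are in-place in the sense defined above.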
31
Your Turn
Sort Properties
Are the following:   stable?             in-place?
Bubble Sort?         No / Yes / Can Be   No / Yes
Insertion Sort?      No / Yes / Can Be   No / Yes
Selection Sort?      No / Yes / Can Be   No / Yes
MergeSort?           No / Yes / Can Be   No / Yes
QuickSort?           No / Yes / Can Be   No / Yes
32
How fast can we sort?
• Heapsort, Mergesort, and Quicksort all run
in O(N log N) best case running time
• Can we do any better?
• No, if the basic action is a comparison.
33
Sorting Model
• Recall our basic assumption: we can only
compare two elements at a time
– we can only reduce the possible solution space by
half each time we make a comparison
• Suppose you are given N elements
– Assume no duplicates
• How many possible orderings can you get?
– Example: a, b, c (N = 3)
34
Permutations
• How many possible orderings can you get?
– Example: a, b, c (N = 3)
– (a b c), (a c b), (b a c), (b c a), (c a b), (c b a)
– 6 orderings = 3•2•1 = 3! (i.e., “3 factorial”)
– All the possible permutations of a set of 3 elements
• For N elements
– N choices for the first position, (N-1) choices for the
second position, …, (2) choices, 1 choice
– N(N-1)(N-2)(2)(1)= N! possible orderings
35
BucketSort (aka BinSort)
If all values to be sorted are known to be
between 1 and K, create an array count of size
K, increment counts while traversing the input,
and finally output the result.
Example K=5. Input = (5,1,3,4,3,2,1,1,5,4,5)
count array
index: 1 2 3 4 5
count:
Running time to sort n items?
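The description above can be sketched as a short function. The name bucket_sort and the decision to allocate K+1 slots (leaving index 0 unused so that value v maps to count[v]) are assumptions made for readability.

```python
def bucket_sort(values, k):
    """Sort integers known to lie in 1..k using an array of counts."""
    count = [0] * (k + 1)            # index 0 is unused
    for v in values:                 # one pass over the n inputs
        count[v] += 1
    result = []
    for v in range(1, k + 1):        # one pass over the K possible values
        result.extend([v] * count[v])
    return result


data = [5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5]
print(bucket_sort(data, 5))  # [1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5]
```

The two loops visit the n inputs once and the K counts once.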
36
BucketSort Complexity: O(n+K)
• Case 1: K is a constant
– BinSort is linear time
• Case 2: K is variable
– Not simply linear time
• Case 3: K is constant but large (e.g. 2^32)
– ???
37
Fixing impracticality: RadixSort
• Radix = “The base of a number system”
– We’ll use 10 for convenience, but could be
anything
• Idea: BucketSort on each digit,
least significant to most significant
(lsd to msd)
38
Radix Sort Example (1st pass)
Input data: 478, 537, 9, 721, 3, 38, 123, 67
Bucket sort by 1’s digit:
0:
1: 721
2:
3: 3, 123
4:
5:
6:
7: 537, 67
8: 478, 38
9: 9
After 1st pass: 721, 3, 123, 537, 67, 478, 38, 9
This example uses B=10 and base 10 digits for simplicity of demonstration.
Larger bucket counts should be used in an actual implementation.
39
Radix Sort Example (2nd pass)
After 1st pass: 721, 3, 123, 537, 67, 478, 38, 9
Bucket sort by 10’s digit:
0: 03, 09
1:
2: 721, 123
3: 537, 38
4:
5:
6: 67
7: 478
8:
9:
After 2nd pass: 3, 9, 721, 123, 537, 38, 67, 478
40
Radix Sort Example (3rd pass)
After 2nd pass: 3, 9, 721, 123, 537, 38, 67, 478
Bucket sort by 100’s digit:
0: 003, 009, 038, 067
1: 123
2:
3:
4: 478
5: 537
6:
7: 721
8:
9:
After 3rd pass: 3, 9, 38, 67, 123, 478, 537, 721
Invariant: after k passes the low order k digits are sorted.
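The three passes can be reproduced with a short least-significant-digit radix sort. This is a minimal sketch that assumes non-negative integers; the function name radix_sort and the base parameter are illustrative choices, not taken from the slides.

```python
def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    values = list(values)
    if not values:
        return values
    largest = max(values)
    place = 1                                  # 1's digit, then 10's, then 100's, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in values:
            # appending preserves arrival order, so each pass is stable
            buckets[(v // place) % base].append(v)
        values = [v for bucket in buckets for v in bucket]
        place *= base
    return values


data = [478, 537, 9, 721, 3, 38, 123, 67]
print(radix_sort(data))  # [3, 9, 38, 67, 123, 478, 537, 721]
```

The invariant relies on every pass being stable: items that tie on the current digit keep the order established by the lower-order digits.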
41
Your Turn
RadixSort
• Input: 126, 328, 636, 341, 416, 131, 328
BucketSort on lsd:
0 1 2 3 4 5 6 7 8 9
BucketSort on next-higher digit:
0 1 2 3 4 5 6 7 8 9
BucketSort on msd:
0 1 2 3 4 5 6 7 8 9
42
Radixsort: Complexity
• How many passes?
• How much work per pass?
• Total time?
• Conclusion?
• In practice
– RadixSort only good for large number of elements with relatively
small values
– Hard on the cache compared to MergeSort/QuickSort
43
Internal versus External Sorting
• So far assumed that accessing A[i] is fast
– Array A is stored in internal memory
(RAM)
– Algorithms so far are good for internal sorting
• What if A is so large that it doesn’t fit in
internal memory?
– Data on disk or tape
– Delay in accessing A[i] – e.g. need to spin
disk and move head
44
Internal versus External
Sorting
• Need sorting algorithms that minimize disk/tape
access time
• External sorting – Basic Idea:
– Load chunk of data into RAM, sort, store this “run”
on disk/tape
– Use the Merge routine from Mergesort to merge
runs
– Repeat until you have only one run (one sorted
chunk)
– Text gives some examples
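The basic idea can be simulated in memory. In the sketch below, plain Python lists stand in for runs stored on disk or tape, and chunk_size stands in for the amount of data that fits in RAM; both are assumptions made for illustration, and a real external sort would stream the runs from files instead.

```python
import heapq


def make_runs(items, chunk_size):
    """Sort each chunk of at most chunk_size items; each result is one run."""
    return [sorted(items[i:i + chunk_size])
            for i in range(0, len(items), chunk_size)]


def merge_runs(runs):
    """Merge runs pairwise, pass after pass, until a single run remains."""
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]                 # a leftover single run is passed through
            merged.append(list(heapq.merge(*pair)))
        runs = merged
    return runs[0] if runs else []


data = [31, 16, 54, 4, 2, 17, 6]
runs = make_runs(data, chunk_size=3)
print(runs)               # [[16, 31, 54], [2, 4, 17], [6]]
print(merge_runs(runs))   # [2, 4, 6, 16, 17, 31, 54]
```

heapq.merge reads its inputs sequentially, which mirrors the access pattern that makes merging suitable for disk or tape.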
45
Summary of sorting
• Sorting choices:
– O(N^2) – Bubblesort, Insertion Sort
– O(N log N) average case running time:
• Heapsort: In-place, not stable.
• Mergesort: O(N) extra space, stable.
• Quicksort: claimed fastest in practice, but O(N^2)
worst case. Needs extra storage for recursion. Not
stable.
– O(N) – Radix Sort: fast and stable. Not
comparison based. Not in-place.
46
Search Algorithms
• We now present several algorithms that can be
used for searching and sorting lists
– We first discuss the design of an algorithm,
– We then show its implementation as a Python
function, and,
– Finally, we provide an analysis of the algorithm’s
computational complexity
• To keep things simple, each function processes a
list of integers
Fundamentals of Python: From First Programs Through Data Structures
47
Search for a Minimum
• Python’s min function returns the minimum or
smallest item in a list
• Alternative version:
n – 1 comparisons for a list of size n
• O(n)
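The slide's own code for the alternative version did not survive extraction. A minimal sketch of such a search is shown here; the function name index_of_min and the choice to return an index rather than the item are assumptions.

```python
def index_of_min(lyst):
    """Return the index of the smallest item in a non-empty list."""
    min_index = 0
    for current in range(1, len(lyst)):      # n - 1 comparisons for n items
        if lyst[current] < lyst[min_index]:
            min_index = current
    return min_index


data = [31, 16, 54, 4, 2, 17, 6]
print(index_of_min(data))        # 4
print(data[index_of_min(data)])  # 2, the same value min(data) returns
```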
48
Linear Search of a List
• Python’s in operator is implemented as a method
named __contains__ in the list class
– Uses a sequential search or a linear search
• Python code for a linear search function:
– Analysis is different from previous one
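The linear search code is missing from the extracted slide. A minimal sketch follows; the function name, the argument order, and the convention of returning -1 when the target is absent are assumptions.

```python
def linear_search(target, lyst):
    """Return the position of target in lyst, or -1 if it is not present."""
    for position, item in enumerate(lyst):
        if item == target:
            return position      # may stop early, unlike the search for a minimum
    return -1


data = [31, 16, 54, 4, 2, 17, 6]
print(linear_search(54, data))   # 2
print(linear_search(99, data))   # -1
```

Because the loop can return as soon as the target is found, the number of iterations depends on where the target sits.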
49
Best-Case, Worst-Case, and Average-Case Performance
• Analysis of a linear search considers three cases:
– In the worst case, the target item is at the end of the
list or not in the list at all
• O(n)
– In the best case, the algorithm finds the target at the
first position, after making one iteration
• O(1)
– Average case: add number of iterations required to
find target at each possible position; divide sum by n
• O(n)
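The average-case bullet can be checked numerically. Finding a target at position p (counting from 1) takes p iterations, so the average over all n positions is (1 + 2 + … + n) / n = (n + 1) / 2, which grows linearly with n. The helper name below is an assumption made for illustration.

```python
def average_iterations(n):
    """Average iterations of a linear search over all n positions of a found target."""
    return sum(range(1, n + 1)) / n


print(average_iterations(9))     # 5.0
print(average_iterations(100))   # 50.5
```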
50
Binary Search of a List
• A linear search is necessary for data that are not
arranged in any particular order
• When searching sorted data, use a binary search
51
Binary Search of a List (continued)
– More efficient than linear search
• Additional cost has to do with keeping list in order
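The binary search code is also missing from the extracted slides. A minimal sketch, assuming the list is already sorted in increasing order, is shown here; the function name and the -1 return convention are assumptions.

```python
def binary_search(target, sorted_list):
    """Return the position of target in sorted_list, or -1 if it is not present."""
    left, right = 0, len(sorted_list) - 1
    while left <= right:
        midpoint = (left + right) // 2
        if sorted_list[midpoint] == target:
            return midpoint
        elif target < sorted_list[midpoint]:
            right = midpoint - 1         # discard the upper half
        else:
            left = midpoint + 1          # discard the lower half
    return -1


data = [2, 4, 6, 16, 17, 31, 54]
print(binary_search(17, data))   # 4
print(binary_search(5, data))    # -1
```

Each iteration halves the remaining range, so the loop runs about log2(n) times for a list of size n.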
52
Summary of Searching
• Linear versus binary
• O(n) vs O(log n)
53

Reading from course text:
Questions?
______________________
Devon M. Simmonds
Computer Science Department
University of North Carolina Wilmington
_____________________________________________________________
54
What is sorting?
Given n elements, arrange them in an increasing
or decreasing order by some attribute.
• Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
• Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
• Comparison lower bound: Ω(n log n)
• Specialized algorithms, O(n): Bucket sort, Radix sort
• Handling huge data sets: External sorting
55