Uploaded by Melih

03 sorting

Sorting Algorithms
Sorting
• Sorting is a process that organizes a collection of data
into either ascending or descending order.
• Formally
• Input: A sequence of n numbers <a_1, a_2, …, a_n>
• Output: A reordering <a'_1, a'_2, …, a'_n> of the sequence such
that a'_1 ≤ a'_2 ≤ … ≤ a'_n
• Given the input <6, 3, 1, 7>, the algorithm should produce
<1, 3, 6, 7>
• Such an input is called an instance of the problem
Sorting
• We encounter sorting almost everywhere:
– Sorting prices from lowest to highest
– Sorting flights from earliest to latest
– Sorting grades from highest to lowest
– Sorting songs based on artist, album, playlist,
etc.
Sorting Algorithms
• There are many sorting algorithms (as of
05.10.2020 there are 44 Wikipedia entries)
• In this class we will learn:
– Selection Sort
– Insertion Sort
– Bubble Sort
– Merge Sort
– Quick Sort
• These are among the most fundamental
sorting algorithms
Sorting Algorithms
• As we learnt in the Analysis lecture (time
complexity), a naive approach uses up
computing power faster than you might think.
• Sorting a million numbers: https://youtu.be/kPRA0W1kECg
Selection Sort
• Partition the input list into a sorted and unsorted
part (initially sorted part is empty)
• Select the smallest element and put it to the end of
the sorted part
• Increase the size of the sorted part by one
• Repeat this n-1 times to sort a list of n elements
(sorted part | unsorted part)
Original list: | 23 78 45 8 32 56
After pass 1: 8 | 78 45 23 32 56
After pass 2: 8 23 | 45 78 32 56
After pass 3: 8 23 32 | 78 45 56
After pass 4: 8 23 32 45 | 78 56
After pass 5: 8 23 32 45 56 78
Selection Sort (cont.)
Pseudo-code:
SelectionSort(A[0...n-1])
    for i ← 0 to n-2
        min ← i
        for j ← i+1 to n-1
            if (A[j] < A[min]) min ← j
        swap A[i] and A[min]
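The pseudo-code above can be sketched in Python roughly as follows. This is a direct translation, not a tuned implementation; the names `selection_sort` and `min_idx` are chosen here for illustration.

```python
def selection_sort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    for i in range(n - 1):                 # i = 0 .. n-2
        min_idx = i                        # index of the smallest element in a[i..n-1] so far
        for j in range(i + 1, n):          # j = i+1 .. n-1
            if a[j] < a[min_idx]:
                min_idx = j
        # Move the smallest unsorted element to the end of the sorted part.
        a[i], a[min_idx] = a[min_idx], a[i]
    return a


print(selection_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```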
Selection Sort -- Analysis
• What is the complexity of selection sort?
• Does it have different best, average, and worst case
complexities?
Selection Sort – Analysis (cont.)
• Selection sort is O(n^2) in all three cases (prove this)
• Therefore it is not very efficient for large inputs
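One way to see the quadratic bound is to count the comparisons A[j] < A[min] made by the SelectionSort pseudo-code. For a fixed i the inner loop runs n-1-i times whatever the input looks like, so the count C(n) (a symbol introduced here) is the same in the best, average, and worst case:

```latex
C(n) = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} 1
     = \sum_{i=0}^{n-2} (n - 1 - i)
     = (n-1) + (n-2) + \dots + 1
     = \frac{n(n-1)}{2} \in \Theta(n^2)
```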
Insertion Sort
• Insertion sort is a simple sorting algorithm that is
appropriate for small inputs.
– Most common sorting technique used by card players.
• Again, the list is divided into two parts: sorted and
unsorted.
• In each pass, the first element of the unsorted part
is picked up, transferred to the sorted sublist, and
inserted at the appropriate place.
• A list of n elements will take at most n-1 passes to
sort the data.
Insertion Sort
(sorted part | unsorted part)
Original list: 23 | 78 45 8 32 56
After pass 1: 23 78 | 45 8 32 56
After pass 2: 23 45 78 | 8 32 56
After pass 3: 8 23 45 78 | 32 56
After pass 4: 8 23 32 45 78 | 56
After pass 5: 8 23 32 45 56 78
Insertion Sort Algorithm
Pseudo-code:
InsertionSort(A[0...n-1])
    for i ← 1 to n − 1 do
        v ← A[i]
        j ← i − 1
        while j ≥ 0 and A[j] > v do
            A[j + 1] ← A[j]
            j ← j − 1
        A[j + 1] ← v
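A minimal Python sketch of the same pseudo-code, keeping the variable names v and j; the function name `insertion_sort` is an assumption made for this example.

```python
def insertion_sort(a):
    """Sort the list a in place in ascending order and return it."""
    for i in range(1, len(a)):
        v = a[i]                  # first element of the unsorted part
        j = i - 1
        # Shift larger elements of the sorted part one slot to the right.
        while j >= 0 and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v              # insert v at its place in the sorted part
    return a


print(insertion_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```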
Insertion Sort – Analysis
• Running time depends not only on the size of the array but also on
the contents of the array.
• Best-case: O(n)
– Array is already sorted in ascending order.
• Worst-case: O(n^2)
– Array is in reverse order.
• Average-case: O(n^2)
– We have to look at all possible initial data organizations.
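The best and worst cases can be checked experimentally by counting key comparisons. The sketch below is an instrumented variant of insertion sort written for this purpose; the helper name `insertion_sort_comparisons` is hypothetical and the while loop is restructured slightly so that every comparison A[j] > v is counted.

```python
def insertion_sort_comparisons(a):
    """Sort a copy of a with insertion sort and return the number of key comparisons."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        v = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of a[j] with v
            if a[j] <= v:
                break                 # v is already in place relative to a[j]
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v
    return comparisons


n = 100
print(insertion_sort_comparisons(range(n)))         # sorted input: n - 1 = 99
print(insertion_sort_comparisons(range(n, 0, -1)))  # reversed input: n(n-1)/2 = 4950
```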
Analysis of insertion sort
• Which running time will be used to characterize this
algorithm?
– Best, worst or average?
• Worst:
– Longest running time (this is the upper limit for the algorithm)
– It is guaranteed that the algorithm will not be worse than this.
• Sometimes we are interested in the average case. But there are
some problems with the average case.
– It is difficult to figure out the average case, i.e. what is an average
input?
– Are we going to assume all possible inputs are equally likely?
– In fact, for many algorithms the average case has the same order of growth as the worst case.
Bubble Sort
• Repeatedly swap adjacent elements that are out of
order.
• https://youtu.be/vxENKlcs2Tw
• Sort pairs of elements far apart from each other,
then progressively reduce the gap between
elements to be compared. Starting with far apart
elements, it can move some out-of-place elements
into position faster than a simple nearest neighbor
exchange. This generalization is called Shell Sort.
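The Shell Sort idea can be sketched as below. Two assumptions are made that the slide does not fix: each round runs an insertion sort over elements that are `gap` positions apart (a common formulation), and the gap starts at n/2 and is halved every round.

```python
def shell_sort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    gap = n // 2                      # assumed gap sequence: n/2, n/4, ..., 1
    while gap > 0:
        # Insertion sort restricted to elements that are `gap` apart.
        for i in range(gap, n):
            v = a[i]
            j = i - gap
            while j >= 0 and a[j] > v:
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = v
        gap //= 2                     # progressively reduce the gap
    return a


print(shell_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```

The last round uses gap 1, which is plain insertion sort, so the result is sorted for any gap sequence that ends in 1.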
Bubble Sort
(each row differs from the previous one by a single swap of adjacent elements)
23 78 45 8 32 56
23 45 78 8 32 56
23 45 8 78 32 56
23 45 8 32 78 56
23 45 8 32 56 78
23 8 45 32 56 78
8 23 45 32 56 78
8 23 32 45 56 78
Bubble Sort Algorithm
Pseudo-code:
BubbleSort(A[0...n-1])
    for i ← 0 to n − 2 do
        for j ← 0 to n − 2 − i do
            if A[j + 1] < A[j]
                swap A[j] and A[j + 1]
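A direct Python rendering of this pseudo-code, given as a sketch; the function name `bubble_sort` is chosen here.

```python
def bubble_sort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    for i in range(n - 1):              # i = 0 .. n-2
        for j in range(n - 1 - i):      # j = 0 .. n-2-i
            if a[j + 1] < a[j]:
                # Adjacent elements are out of order: swap them.
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


print(bubble_sort([23, 78, 45, 8, 32, 56]))  # [8, 23, 32, 45, 56, 78]
```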
Bubble Sort – Analysis
• Best-case: O(n)
– Array is already sorted in ascending order.
– This requires stopping as soon as a pass makes no swaps; the pseudo-code
above always makes n(n-1)/2 comparisons.
• Worst-case: O(n^2)
– Array is in reverse order.
• Average-case: O(n^2)
– We have to look at all possible initial data organizations.
• In fact, any sorting algorithm which sorts elements by swapping
adjacent elements can be proved to need Ω(n^2) time in the average case,
i.e. it cannot do better than quadratic
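The O(n) best case depends on an early exit that the BubbleSort pseudo-code does not contain. The sketch below adds a `swapped` flag (an addition made here, not part of the slide's pseudo-code) and counts comparisons to show the difference between sorted and reversed input.

```python
def bubble_sort_early_exit(a):
    """Sort a copy of a; return (sorted list, number of comparisons made)."""
    a = list(a)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:        # a full pass without swaps: the list is sorted
            break
    return a, comparisons


print(bubble_sort_early_exit(range(10))[1])         # sorted input: 9 comparisons
print(bubble_sort_early_exit(range(10, 0, -1))[1])  # reversed input: 45 comparisons
```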
Theorem
• Any sorting algorithm which sorts elements by swapping
adjacent elements can be proven to have an Ω(n^2) average case
complexity
• https://youtu.be/vxENKlcs2Tw
• Inversion: any pair (a, b) where a > b and index(a) < index(b)
• Example: [3, 2, 1] has 3 inversions: (3, 2), (3, 1), (2, 1)
• All inversions must be fixed to sort the input array.
• There exist inputs that have Θ(n^2) inversions: a reverse-sorted array has
n(n-1)/2, and a random permutation has n(n-1)/4 on average.
• A single swap of adjacent elements removes at most one inversion.
⇒ Ω(n^2) swaps are needed, both in the worst case and on average!
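The inversion argument can be checked on small inputs. In the sketch below, `count_inversions` (brute force) and `bubble_sort_swaps` are helper names invented for this example; since bubble sort only swaps adjacent out-of-order elements, its swap count should equal the number of inversions.

```python
def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] by brute force."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def bubble_sort_swaps(a):
    """Bubble sort a copy of a and return the number of adjacent swaps."""
    a = list(a)
    n = len(a)
    swaps = 0
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1          # each swap removes exactly one inversion
    return swaps


print(count_inversions([3, 2, 1]), bubble_sort_swaps([3, 2, 1]))  # 3 3
data = [23, 78, 45, 8, 32, 56]
print(count_inversions(data), bubble_sort_swaps(data))            # 7 7
```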
Recursive Insertion Sort
• To sort A[0..n-1], we recursively sort A[0..n-2] and then insert
A[n-1] into the sorted array A[0..n-2]
• A good practice of recursion
• Code?
Recursive Insertion Sort
InsertionSortRec(A[0..n-1])
    if (n ≤ 1) return
    InsertionSortRec(A[0...n-2]) //recursive call
    j ← n-1 //begin insertion of A[n-1]
    while (j > 0 and A[j] < A[j-1])
        swap A[j] and A[j-1]
        j ← j-1
    return
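A possible Python version of InsertionSortRec. The optional parameter `n` (the length of the prefix to sort) is an assumption introduced here so the recursion can work on a prefix without copying the list.

```python
def insertion_sort_rec(a, n=None):
    """Recursively sort the first n elements of a in place (all of a by default)."""
    if n is None:
        n = len(a)
    if n <= 1:
        return a
    insertion_sort_rec(a, n - 1)          # recursive call: sort a[0..n-2]
    j = n - 1                             # begin insertion of a[n-1]
    while j > 0 and a[j] < a[j - 1]:
        a[j], a[j - 1] = a[j - 1], a[j]
        j -= 1
    return a


print(insertion_sort_rec([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

The recursion depth equals n, so with Python's default recursion limit this sketch is only suitable for lists of up to several hundred elements.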
Recursive Insertion Sort
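• A = 5 2 4 6 1 3
insertKeyIntoSubarray (sorted subarray, then the key inserted into it):
5 ← insert 2
2 5 ← insert 4
2 4 5 ← insert 6
2 4 5 6 ← insert 1
1 2 4 5 6 ← insert 3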
Recursive Insertion Sort
• T(n) = T(n-1) + O(n) is the time complexity of this algorithm
• Reduce the problem size to n-1 w/ O(n) extra work
• Seems to be doing O(n) work n-1 times; so O(n^2) guess?
• Assume T(k) ≤ ck^2 holds for all k < n and let the extra work be at most dn:
T(n) ≤ c(n-1)^2 + dn
= cn^2 - 2cn + c + dn
= cn^2 - c(2n-1) + dn
≤ cn^2 //holds when c(2n-1) ≥ dn, e.g. for any c ≥ d and n ≥ 1
More Recurrences
• How about the complexity of T(n) = T(n/2) + O(1)?
• Reduce the problem size to half w/ O(1) extra work
• Seems to be doing O(1) work log n times; so O(log n) guess?
• Assume T(k) ≤ c log k holds for all k < n and let the extra work be at most d:
T(n) ≤ c log(n/2) + d
= c log n - c log 2 + d
= c log n - e //e = c log 2 - d; c can always be selected so that e ≥ 0
≤ c log n
More Recurrences
• How about the complexity of T(n) = 2T(n/2) + O(n)?
• Reduce the problem to 2 half-sized problems w/ O(n) extra work
• Seems to be doing O(n) work log n times; so O(n log n) guess?
• Assume T(k) ≤ ck log k holds for all k < n and let the extra work be at most dn:
T(n) ≤ 2c(n/2) log(n/2) + dn
= cn log n - cn log 2 + dn
= cn log n - n(c' - d) //c' = c log 2; select c so that c' ≥ d
≤ cn log n
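More Recurrences
• How about the complexity of T(n) = T(n-1) + T(n-2)?
• Here the problem is not reduced: one problem of size n is replaced by two
problems of total size 2n-3 (nearly doubled), w/ no extra work
• For nondecreasing T, T(n-1) + T(n-2) ≥ 2T(n-2): the work at least doubles
every time n decreases by 2
• Likewise T(n-1) + T(n-2) ≤ 2T(n-1): the work at most doubles every time n
decreases by 1
• Seems to be doubling the work at each step; so O(2^n) is a good guess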
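The guesses for these recurrences can be sanity-checked numerically. The sketch below assumes the base case T(n) = 1 for n ≤ 1, replaces each O(·) term by its simplest representative (n or 1), and uses integer division for n/2; all function names are hypothetical.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def t_insertion(n):        # T(n) = T(n-1) + n
    return 1 if n <= 1 else t_insertion(n - 1) + n


@lru_cache(maxsize=None)
def t_halving(n):          # T(n) = T(n/2) + 1
    return 1 if n <= 1 else t_halving(n // 2) + 1


@lru_cache(maxsize=None)
def t_merge(n):            # T(n) = 2T(n/2) + n
    return 1 if n <= 1 else 2 * t_merge(n // 2) + n


@lru_cache(maxsize=None)
def t_fib(n):              # T(n) = T(n-1) + T(n-2)
    return 1 if n <= 1 else t_fib(n - 1) + t_fib(n - 2)


# If a guess is a valid upper bound, the ratio T(n) / guess(n) stays bounded.
for n in (16, 64, 256):
    print(n,
          t_insertion(n) / n**2,
          t_halving(n) / log2(n),
          t_merge(n) / (n * log2(n)))

for n in (10, 20, 30):
    print(n, t_fib(n) / 2**n)
```

For the last recurrence the ratio shrinks towards zero: 2^n is a valid upper bound, but the true growth rate is slower (about 1.618^n, as for the Fibonacci numbers).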
MergeSort
• MergeSort algorithm is one of two important divide-and-conquer
sorting algorithms (the other is QuickSort).
• It is a recursive algorithm.
– Divides the list into halves,
– Sorts each half separately (recursively), and
– Then merges the sorted halves into one sorted array.
• https://youtu.be/es2T6KY45cA
Mergesort - Example
6 3 9 1 5 4 7 2
divide: 6 3 9 1 | 5 4 7 2
divide: 6 3 | 9 1 | 5 4 | 7 2
divide: 6 | 3 | 9 | 1 | 5 | 4 | 7 | 2
merge: 3 6 | 1 9 | 4 5 | 2 7
merge: 1 3 6 9 | 2 4 5 7
merge: 1 2 3 4 5 6 7 9
Merge
Pseudo-code:
Merge(B[0..p − 1], C[0..q − 1], A[0..p + q − 1])
    i ← 0; j ← 0; k ← 0
    while i < p and j < q do
        if B[i] ≤ C[j]
            A[k] ← B[i]
            i ← i + 1
        else
            A[k] ← C[j]
            j ← j + 1
        k ← k + 1
    if i = p
        copy C[j..q − 1] to A[k..p + q − 1]
    else
        copy B[i..p − 1] to A[k..p + q − 1]
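The Merge pseudo-code translates to Python almost line by line. In this sketch `a` is assumed to be a list that already has len(b) + len(c) slots, mirroring A[0..p + q − 1].

```python
def merge(b, c, a):
    """Merge the sorted lists b and c into a, which must have len(b) + len(c) slots."""
    p, q = len(b), len(c)
    i = j = k = 0
    while i < p and j < q:
        if b[i] <= c[j]:          # "<=" takes from b on ties, which keeps the merge stable
            a[k] = b[i]
            i += 1
        else:
            a[k] = c[j]
            j += 1
        k += 1
    # One input is exhausted: copy the rest of the other one.
    if i == p:
        a[k:p + q] = c[j:q]
    else:
        a[k:p + q] = b[i:p]


result = [None] * 8
merge([1, 3, 6, 9], [2, 4, 5, 7], result)
print(result)  # [1, 2, 3, 4, 5, 6, 7, 9]
```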
Mergesort
Pseudo-code:
MergeSort(A[0..n − 1])
    if n > 1
        copy A[0..⌊n/2⌋ − 1] to B[0..⌊n/2⌋ − 1]
        copy A[⌊n/2⌋..n − 1] to C[0..⌈n/2⌉ − 1]
        MergeSort(B[0..⌊n/2⌋ − 1])
        MergeSort(C[0..⌈n/2⌉ − 1])
        Merge(B, C, A)
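Putting the two pieces together, a MergeSort sketch in Python might look like this. Slicing creates the copies B and C; the helper `merge` is repeated so the block runs on its own.

```python
def merge(b, c, a):
    """Merge the sorted lists b and c into a (len(a) == len(b) + len(c))."""
    i = j = k = 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            a[k] = b[i]
            i += 1
        else:
            a[k] = c[j]
            j += 1
        k += 1
    # Exactly one of these two slices is non-empty.
    a[k:] = b[i:] + c[j:]


def merge_sort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    if n > 1:
        b = a[:n // 2]        # copy of the first half  (floor(n/2) elements)
        c = a[n // 2:]        # copy of the second half (ceil(n/2) elements)
        merge_sort(b)
        merge_sort(c)
        merge(b, c, a)
    return a


print(merge_sort([6, 3, 9, 1, 5, 4, 7, 2]))  # [1, 2, 3, 4, 5, 6, 7, 9]
```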
Analysis of MergeSort
• What is the complexity of the merge operation for merging two
lists of size n/2?
• O(n), as we need to copy all elements
• Then the complexity of MergeSort follows
T(n) = 2T(n/2) + n
• Solving this recurrence gives us O(n log n) complexity //see More Recurrences
• The complexity is the same for the best, worst, and average
cases
• The disadvantage of MergeSort is that we need to use an extra
array for the merge operation (not memory efficient)
Analysis of Mergesort
• Solving T(n) = 2T(n/2) + n gives us O(nlogn) complexity
Merging Step (Linear time)
• Extra array for merge operation
Demo
• On array = {5, 2, 4, 7, 1, 3, 2, 6}
QuickSort
• Like MergeSort, quicksort is also based on the divide-and-conquer paradigm.
• But it uses this technique in a somewhat opposite manner,
as all the hard work is done before the recursive calls.
• It works as follows:
1. First selects a pivot element,
2. Then it partitions the array into two parts (elements smaller
than and greater than or equal to the pivot)
3. Then, it sorts the partitions independently (recursively)
https://youtu.be/vxENKlcs2Tw
Partition
• Partitioning places the pivot in its correct (final) position within the array.
• Arranging the array elements around the pivot p generates two smaller sorting
problems.
– sort the left section of the array, and sort the right section of the array.
– when these two smaller sorting problems are solved recursively, our bigger
sorting problem is solved.
Pivot Selection
• Which array item should be selected as pivot?
– Somehow we have to select a pivot, and we hope that we will
get a good partitioning.
– If the items in the array are arranged randomly, we choose a pivot
randomly.
– We can choose the first or last element as the pivot (it may not
give a good partitioning).
– We can choose the middle element as the pivot
– We can use a combination of the above to select the pivot (in
each recursive call a different technique can be used)
Partitioning
Loop invariant:
A is segmented into 3 parts:
(1) Items < p (S_1)
(2) Items >= p (S_2)
(3) Items whose state is unknown (to be inspected)
Initially, every item is in «unknown»
Partitioning
Segment bounds
S_1 starts at first+1 up to (and including) lastS_1
S_2 starts at lastS_1+1 up to (excluding) firstUnknown
Unknown starts at firstUnknown up to (and including) last
What should be the initial values of lastS_1 & firstUnknown?
Partitioning
If the next unknown (A[firstUnknown]) is < p
Move it into S_1 by swapping A[firstUnknown] with A[lastS_1+1]
Increment lastS_1 and firstUnknown
Partitioning
If the next unknown (A[firstUnknown]) is >= p
Move it into S_2 by incrementing firstUnknown
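The partitioning rules above can be sketched as follows. Several points are assumptions rather than something the slides state: the pivot p is taken to be the first element A[first], the initial values are lastS_1 = first and firstUnknown = first + 1 (so S_1 and S_2 start empty), and a final swap moves the pivot between S_1 and S_2.

```python
def partition(a, first, last):
    """Partition a[first..last] around p = a[first]; return the pivot's final index."""
    p = a[first]
    last_s1 = first                # S_1 = a[first+1..last_s1] is empty at the start
    first_unknown = first + 1      # every other item is still unknown
    while first_unknown <= last:
        if a[first_unknown] < p:
            # Item belongs to S_1: swap it with the first item of S_2.
            last_s1 += 1
            a[first_unknown], a[last_s1] = a[last_s1], a[first_unknown]
        # Otherwise the item belongs to S_2 and only the boundary moves.
        first_unknown += 1
    # Place the pivot between S_1 and S_2 (assumed final step).
    a[first], a[last_s1] = a[last_s1], a[first]
    return last_s1


data = [5, 2, 4, 7, 1, 3, 2, 6]
s = partition(data, 0, len(data) - 1)
print(s, data)  # 5 [2, 2, 4, 1, 3, 5, 7, 6]
```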
QuickSort
Pseudo-code:
QuickSort(A[l..r])
    if l < r //at least two elements
        s ← Partition(A[l..r]) //s is a split position
        QuickSort(A[l..s − 1])
        QuickSort(A[s + 1..r])
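A runnable sketch of the QuickSort pseudo-code. The pseudo-code leaves Partition unspecified, so this example assumes a first-element pivot and the S_1 / S_2 / unknown scheme; the default arguments for `l` and `r` are a convenience added here.

```python
def partition(a, l, r):
    """Partition a[l..r] around the pivot a[l]; return the split position."""
    p = a[l]
    last_s1 = l
    for k in range(l + 1, r + 1):
        if a[k] < p:
            last_s1 += 1
            a[k], a[last_s1] = a[last_s1], a[k]
    a[l], a[last_s1] = a[last_s1], a[l]     # pivot goes to its final position
    return last_s1


def quick_sort(a, l=0, r=None):
    """Sort a[l..r] in place (the whole list by default) and return a."""
    if r is None:
        r = len(a) - 1
    if l < r:                                # at least two elements
        s = partition(a, l, r)               # s is a split position
        quick_sort(a, l, s - 1)
        quick_sort(a, s + 1, r)
    return a


print(quick_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```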
QuickSort – Analysis
• If we always select the smallest or largest element as the pivot,
we'll not be able to divide the array into similar sized partitions
• Recurrence: T(n) = n + T(1) + T(n-1)
• O(n^2) complexity (worst case) //same form as the Recursive Insertion Sort recurrence
• If our partitions are equal sized we have:
• Recurrence: T(n) = n + 2T(n/2) //same as MergeSort
• O(n log n) complexity (best case)
• On average, quicksort has been proven to be in O(n log n)
• It also does not need an extra array like MergeSort
• Therefore, it is among the most popular sorting algorithms
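The gap between the worst case and the typical case can be observed by counting comparisons with the pivot. The sketch below assumes a first-element pivot, so an already sorted input always picks the smallest element; it uses an explicit stack instead of recursion only to avoid Python's recursion limit on such inputs. The function name is hypothetical.

```python
import random


def quick_sort_comparisons(a):
    """Quicksort a copy of a (first-element pivot); return the number of pivot comparisons."""
    a = list(a)
    count = 0
    stack = [(0, len(a) - 1)]          # sub-arrays a[l..r] still to be sorted
    while stack:
        l, r = stack.pop()
        if l >= r:
            continue
        p, last_s1 = a[l], l
        for k in range(l + 1, r + 1):
            count += 1                 # one comparison of a[k] with the pivot
            if a[k] < p:
                last_s1 += 1
                a[k], a[last_s1] = a[last_s1], a[k]
        a[l], a[last_s1] = a[last_s1], a[l]
        stack.append((l, last_s1 - 1))
        stack.append((last_s1 + 1, r))
    return count


n = 1000
print(quick_sort_comparisons(range(n)))   # sorted input: n(n-1)/2 = 499500
shuffled = random.sample(range(n), n)
print(quick_sort_comparisons(shuffled))   # shuffled input: typically far fewer
```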