Sorting2

CS6045: Advanced Algorithms
Sorting Algorithms
Review
• Insertion Sort
– Time complexity
– Space complexity
• Merge Sort
– Time complexity
– Space complexity
• Comparisons
Review: Solving Recurrences
• Substitution method
• Iteration method
• Master method
Review: The Master Theorem
• if T(n) = aT(n/b) + f(n), then
  T(n) = Θ(n^(log_b a))           if f(n) = O(n^(log_b a − ε)) for some ε > 0
  T(n) = Θ(n^(log_b a) · log n)   if f(n) = Θ(n^(log_b a))
  T(n) = Θ(f(n))                  if f(n) = Ω(n^(log_b a + ε)) for some ε > 0,
                                  and a·f(n/b) ≤ c·f(n) for some c < 1 and all large n
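A quick illustration of case 2, using the merge sort recurrence from the review:

$$T(n) = 2T(n/2) + \Theta(n): \quad a = 2,\ b = 2,\ n^{\log_b a} = n^1 = \Theta(f(n)) \;\Rightarrow\; T(n) = \Theta(n \log n)$$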







Quicksort
• Sorts in place, like insertion sort but unlike merge sort
• Divide the array into two parts such that
– elements of the left part < elements of the right part
• Conquer: recursively sort each part separately
• Combine: trivial - nothing to do
Quicksort(A, p, r)
  if p < r
    then q ← Partition(A, p, r)   // divide
         Quicksort(A, p, q - 1)   // conquer left
         Quicksort(A, q + 1, r)   // conquer right
Quicksort
• Another divide-and-conquer algorithm
– The array A[p..r] is partitioned into two subarrays A[p..q] and A[q+1..r]
• Invariant: all elements in A[p..q] are less than or equal to all elements in A[q+1..r]
– The subarrays are recursively sorted by calls to quicksort
– Unlike merge sort, no combining step: the two sorted subarrays already form a sorted array
Partition
• Clearly, all the action takes place in the partition() function
– Rearranges the subarray in place
– End result:
• Two subarrays
• All values in the first subarray ≤ all values in the second
– Returns the index of the "pivot" element separating the two subarrays
Divide = Partition
PARTITION(A, p, r)
// Partition the subarray A[p..r] using A[r] as the pivot
// Result: all elements ≤ the original A[r] end up at indices ≤ i + 1
x = A[r]
i = p - 1
for j = p to r - 1
  if A[j] ≤ x
    i = i + 1
    exchange A[i] ↔ A[j]
exchange A[i+1] ↔ A[r]
return i + 1
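A runnable Python sketch of this partition scheme and the quicksort recursion (function names are illustrative):

import random  # used later for the randomized variant

def partition(A, p, r):
    """Partition A[p..r] in place around the pivot A[r]; return the pivot's final index."""
    x = A[r]                       # pivot
    i = p - 1                      # right edge of the "<= pivot" region
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r):
    """Sort A[p..r] in place."""
    if p < r:
        q = partition(A, p, r)     # divide
        quicksort(A, p, q - 1)     # conquer left
        quicksort(A, q + 1, r)     # conquer right

# usage
data = [5, 2, 9, 1, 7]
quicksort(data, 0, len(data) - 1)
print(data)   # [1, 2, 5, 7, 9]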
Loop Invariant
• At the start of each iteration of the for loop, for any array index k:
– if p ≤ k ≤ i, then A[k] ≤ x
– if i + 1 ≤ k ≤ j - 1, then A[k] > x
– if k = r, then A[k] = x
Runtime of Quicksort
• Worst case:
– Partition produces one subproblem with n - 1 elements and one with 0 elements
– T(1) = Θ(1)
  T(n) = T(n - 1) + Θ(n)
– T(n) = O(n^2)
[Recursion diagram on already-sorted input 0 1 2 3 4 5 6 7 8 9: each partition splits off a single element]
Runtime of Quicksort
• Best case:
– every partition splits the array into (almost) equal parts
– T(n) = 2T(n/2) + O(n)
– T(n) = O(n log n)
• Average case:
– O(n log n)
Improving Quicksort
• Book discusses two solutions:
– Randomize the input array, OR
– Pick a random pivot element
Randomized Quicksort
• Idea: select a randomly chosen element as the pivot
• Randomized algorithms:
– include a (pseudo) random-number generator
– behavior depends not only on the input but also on the random-number generator
• Simple alternative: randomly permute the input
– same effect, but more difficult to analyze
Randomized Quicksort
• Partitioning around a fixed element (e.g., the first): O(n^2) worst case
• Average case: O(n log n)
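A minimal sketch of the random-pivot variant in Python, assuming the partition function from the quicksort sketch above; the only change is swapping a randomly chosen element into the pivot slot before partitioning:

def randomized_partition(A, p, r):
    # pick a random index in [p, r] and move that element into the pivot slot A[r]
    k = random.randint(p, r)
    A[k], A[r] = A[r], A[k]
    return partition(A, p, r)       # reuse the deterministic partition

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)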
Heap Sort
• So far we’ve talked about three algorithms
to sort an array of numbers
– What is the advantage of merge sort?
– What is the advantage of insertion sort?
– What is the advantage of quick sort?
• Heap sort
Heaps
• A heap can be seen as a nearly complete binary tree:
[Tree diagram: 16 at the root; children 14 and 10; next level 8, 7, 9, 3; leaves 2, 4, 1]
– What makes a binary tree complete?
• every level is completely filled, except possibly the lowest, which is filled from the left
– Is the example above complete?
Heap Data Structure
• A heap (nearly complete binary tree) can be stored as an array A
– Root of the tree is A[1]
– Parent of A[i] = A[⌊i/2⌋]
– Left child of A[i] = A[2i]
– Right child of A[i] = A[2i + 1]
– Computing these indices is fast with a binary-representation implementation
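With this 1-based indexing, the parent and child positions are simple arithmetic; a small Python sketch (helper names are illustrative):

def parent(i):
    return i // 2        # floor(i/2); with a binary representation, a right shift

def left(i):
    return 2 * i         # a left shift

def right(i):
    return 2 * i + 1     # a left shift, then set the low bit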
Heaps
• Max-heap property: A[Parent(i)] ≥ A[i] for every node i other than the root
• Min-heap property: A[Parent(i)] ≤ A[i] for every node i other than the root
Heap Operations: Heapify()
• Heapify(): maintain the heap property
– Given: a node i in the heap with children l and r
– Given: two subtrees rooted at l and r, assumed
to be heaps
– Problem: The subtree rooted at i may violate
the heap property (How?)
– Action: let the value of the parent node “float
down” so subtree at i satisfies the heap property
Maintain the Heap Property
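A Python sketch of this float-down step, assuming the 1-based index helpers above and an unused A[0] slot so the slide's indexing carries over; this is a sketch, not the book's exact pseudocode:

def max_heapify(A, i, heap_size):
    """Float A[i] down until the subtree rooted at i satisfies the max-heap property.
    The heap occupies A[1..heap_size]; A[0] is unused padding."""
    l, r = left(i), right(i)
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # swap parent with its larger child
        max_heapify(A, largest, heap_size)    # continue floating down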
Heapify() Example
[Tree diagrams omitted; the same heap is shown before, during, and after the fix-up]
• Start: A = 16 4 10 14 7 9 3 2 8 1 (node i = 2, value 4, violates the max-heap property)
• Exchange 4 with its larger child 14: A = 16 14 10 4 7 9 3 2 8 1
• Exchange 4 with its larger child 8: A = 16 14 10 8 7 9 3 2 4 1 (the max-heap property now holds)
Analyzing Heapify(): Informal
• Aside from the recursive call, what is the running time of Heapify()?
• How many times can Heapify() recursively call itself?
• What is the worst-case running time of Heapify() on a heap of size n?
Analyzing Heapify(): Formal
• Fixing up the relationships among i, l, and r takes Θ(1) time
• If the heap at i has n elements, how many elements can the subtrees at l or r have?
– Answer: 2n/3 (worst case: bottom row half full)
• So the time taken by Heapify() is given by
  T(n) ≤ T(2n/3) + Θ(1)
Analyzing Heapify(): Formal
• So we have
  T(n) ≤ T(2n/3) + Θ(1)
• By case 2 of the Master Theorem,
  T(n) = O(log n)
Heap Operations: BuildHeap()
• We can build a heap in a bottom-up manner by running Heapify() on successive subarrays
– Fact: for an array of length n, all elements in the range A[⌊n/2⌋ + 1 .. n] are heaps (Why?)
– So:
• Walk backwards through the array from ⌊n/2⌋ down to 1, calling Heapify() on each node
• The order of processing guarantees that the children of node i are heaps when i is processed
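A sketch of this bottom-up construction in Python, assuming the max_heapify sketch above (heap stored in A[1..n], A[0] unused):

def build_max_heap(A, n):
    # A[n//2 + 1 .. n] are leaves, hence already heaps;
    # walk backwards from n//2 down to 1, fixing each subtree
    for i in range(n // 2, 0, -1):
        max_heapify(A, i, n)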
Build a Heap
Correctness
• Loop invariant: at the start of every iteration of the for loop, each of the nodes i+1, i+2, …, n is the root of a max-heap
Analyzing BuildHeap()
• Each call to Heapify() takes O(log n) time
• There are O(n) such calls (specifically, ⌊n/2⌋ of them)
• Thus the running time is O(n log n)
• A tighter bound is O(n)
Heapsort
• Given BuildHeap(), an in-place sorting
algorithm is easily constructed:
– Maximum element is at A[1]
– Discard by swapping with element at A[n]
• Decrement heap_size[A]
• A[n] now contains correct value
– Restore heap property at A[1] by calling
Heapify()
– Repeat, always swapping A[1] for A[heap_size(A)]
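Putting the pieces together, a sketch of the full in-place sort, assuming the build_max_heap and max_heapify sketches above:

def heapsort(A, n):
    build_max_heap(A, n)
    heap_size = n
    for i in range(n, 1, -1):
        A[1], A[i] = A[i], A[1]        # move the current maximum to its final slot
        heap_size -= 1
        max_heapify(A, 1, heap_size)   # restore the heap property at the root

# usage (A[0] is padding for 1-based indexing)
A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(A, len(A) - 1)
print(A[1:])   # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]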
Heapsort
Analyzing Heapsort
• The call to BuildHeap() takes O(n) time
• Each of the n - 1 calls to Heapify() takes O(log n) time
• Thus the total time taken by HeapSort()
  = O(n) + (n - 1) · O(log n)
  = O(n) + O(n log n)
  = O(n log n)
Min-Heap Operations
[Tree diagrams omitted]
• Insert(S, x): place the new key at the next free leaf position and float it up; O(height) = O(log n)
• Extract-min(S): return the head (root), replace the head key with the last key, float it down; O(log n)
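For comparison, Python's standard heapq module provides exactly these min-heap operations on a plain list:

import heapq

h = []
for key in [13, 4, 8, 12, 2, 6, 10, 11, 9, 7]:
    heapq.heappush(h, key)        # insert: float the new key up, O(log n)

smallest = heapq.heappop(h)       # extract-min: replace root with last key, float down, O(log n)
print(smallest)                   # 2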
Priority Queues
• Heapsort is a nice algorithm, but in practice
Quicksort usually wins
• But the heap data structure is incredibly
useful for implementing priority queues
– A data structure for maintaining a set S of
elements, each with an associated value or key
– Supports the operations Insert(),
Maximum(), and ExtractMax()
– What might a priority queue be useful for?
Priority Queue Operations
• Insert(S, x) inserts the element x into set S
• Maximum(S) returns the element of S with
the maximum key
• ExtractMax(S) removes and returns the
element of S with the maximum key
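A sketch of these three operations on top of the max-heap routines above (1-based heap in A[1..heap_size]); Insert floats the new key up toward the root:

def heap_maximum(A):
    return A[1]

def heap_extract_max(A, heap_size):
    maximum = A[1]
    A[1] = A[heap_size]               # move the last key to the root
    max_heapify(A, 1, heap_size - 1)  # float it down; caller decrements heap_size
    return maximum

def max_heap_insert(A, key, heap_size):
    # assumes the list currently holds exactly A[0..heap_size]
    A.append(key)                     # new leaf at index heap_size + 1
    i = heap_size + 1
    while i > 1 and A[parent(i)] < A[i]:
        A[i], A[parent(i)] = A[parent(i)], A[i]   # float the new key up
        i = parent(i)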
Priority Queues
• Applications
– job scheduling on a shared computer
– Dijkstra's algorithm for finding shortest paths in graphs
– Prim's algorithm for minimum spanning trees