Heapsort Algorithm CS 583 Analysis of Algorithms 7/1/2016

CS583 Fall'06: Heapsort
Outline
• Sorting Problem
• Heaps
– Definition
– Maintaining heap property
– Building a heap
• Heapsort Algorithm
Sorting Problem
• Sorting is usually performed not on isolated values but on
records.
– Each record contains a key, which is the value to be
sorted.
– The remainder is called satellite data.
– When a sorting algorithm permutes the keys, it must
permute the satellite data as well.
– If each record carries a large amount of satellite data, we
often permute pointers to the records instead.
• This level of detail is usually irrelevant in the study
of algorithms, but is important when converting an
algorithm to a program.
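A minimal Python sketch of the two options above. The records and their contents are hypothetical, and Python's built-in `sorted` stands in for whatever sorting algorithm is used; only the key (the first field) is compared.

```python
# Hypothetical records: (key, satellite data). Only the key is compared.
records = [
    (42, "report for security 42"),
    (7, "report for security 7"),
    (19, "report for security 19"),
]

# Option 1: permute the records themselves; satellite data moves with its key.
by_key = sorted(records, key=lambda rec: rec[0])

# Option 2: leave the records in place and sort "pointers" (here, indices).
order = sorted(range(len(records)), key=lambda idx: records[idx][0])

print([rec[0] for rec in by_key])          # [7, 19, 42]
print(order)                                # [1, 2, 0]
print([records[idx][0] for idx in order])   # [7, 19, 42]
```

Python lists already hold references to the tuples, so Option 1 does not copy the satellite data either; the index version mirrors what one would do when records are stored by value, for example in an array of C structs.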
Sorting Problem: Importance
• Sorting is arguably the most fundamental problem in
the study of algorithms for the following reasons:
– The need to sort information is often a key part of an
application. For example, sorting the financial reports by
security IDs.
– Algorithms often use sorting as a key subroutine. For
example, to match a security against a set of benchmarks,
the benchmarks must first be sorted by some key.
– There is a wide variety of sorting algorithms, and they use
a rich set of techniques.
Heaps
• The heapsort algorithm sorts in place, and its running
time is O(n lg n).
– It combines the better attributes of insertion sort (sorting
in place) and merge sort (O(n lg n) running time).
– It is based on a data structure called a heap.
• The (binary) heap data structure is an array object
that can be viewed as a nearly complete binary tree.
– An array A that represents a heap is an object with two
attributes:
• length[A], the number of elements in the array, and
• heap-size[A], the number of elements of the heap stored
within the array A (heap-size[A] <= length[A]).
Heaps: Example
A = {10, 8, 6, 5, 7, 3, 2}
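[Figure: A drawn as a binary tree. Root 10; children of 10: 8 and 6; children of 8: 5 and 7; children of 6: 3 and 2.]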
The root of the tree is A[1]. The children of node i are determined as follows:
Left(i)
return 2i
Right(i)
return 2i+1
Heaps: Example (cont.)
The formula for Left(i) is proven by induction on i (nodes are numbered level
by level, left to right):
1. Base case: the root's left child is node 2 = 2*1.
2. Inductive step: assume Left(n) = 2n, so Right(n) = 2n+1.
3. The left child of node n+1 immediately follows the right child of
node n, so Left(n+1) = 2n + 1 + 1 = 2(n+1).
The parent p of node i satisfies either i = 2p or i = 2p+1. Hence
Parent(i)
return floor(i/2)
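The index arithmetic can be checked with a short Python sketch. Python lists are 0-based, so this sketch assumes a placeholder in `A[0]` to keep the slide's 1-based numbering; the lowercase function names are hypothetical translations of Left, Right and Parent.

```python
def left(i):
    """Index of the left child of node i (1-based numbering)."""
    return 2 * i


def right(i):
    """Index of the right child of node i."""
    return 2 * i + 1


def parent(i):
    """Index of the parent of node i (i >= 2): floor(i/2)."""
    return i // 2


A = [None, 10, 8, 6, 5, 7, 3, 2]   # A[0] is a placeholder so the root is A[1]
n = len(A) - 1                      # 7 elements

for i in range(1, n + 1):
    children = [A[c] for c in (left(i), right(i)) if c <= n]
    print(i, A[i], children)
# 1 10 [8, 6]
# 2 8 [5, 7]
# 3 6 [3, 2]
# 4 5 []
# 5 7 []
# 6 3 []
# 7 2 []
```

The printed children reproduce the tree of the example: nodes 4 through 7 have no children inside the array.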
Max-Heaps
• In a max-heap, for every node i other than the root:
– A[Parent(i)] >= A[i]
• For the heapsort algorithm, we use max-heaps.
– The height of a node is the number of edges on the longest
downward path from the node to a leaf, and the height of the
heap is the height of its root. Since a heap is a nearly
complete binary tree, its height is Θ(lg n).
• We will consider the following basic procedures on
the heap:
– Max-Heapify to maintain the max-heap property.
– Build-Max-Heap to produce a max-heap from an
unordered input array.
– Heapsort to sort an array in place.
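The max-heap property and the height of a heap translate directly into Python. This is a sketch under the same assumption as before (1-based indexing with an unused `A[0]`); `is_max_heap` and `heap_height` are hypothetical helper names.

```python
def is_max_heap(A, heap_size):
    """True if A[Parent(i)] >= A[i] for every node i = 2..heap_size.

    A is 1-based: A[0] is an unused placeholder.
    """
    return all(A[i // 2] >= A[i] for i in range(2, heap_size + 1))


def heap_height(n):
    """Height of an n-node heap (n >= 1), i.e. floor(lg n)."""
    return n.bit_length() - 1


A = [None, 10, 8, 6, 5, 7, 3, 2]            # the example heap
print(is_max_heap(A, 7))                     # True
print(is_max_heap([None, 10, 8, 6, 9], 4))   # False: 9 is a child of 8
print(heap_height(7))                        # 2
```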
Maintaining the Heap Property
• The Max-Heapify procedure takes an array A and an
index i into the array.
• It assumes that the binary trees rooted at Left(i) and
Right(i) are already max-heaps, while A[i] may be smaller
than its children.
• The procedure lets the value of A[i] "float down" in
the max-heap so that the subtree rooted at index i
becomes a max-heap.
Max-Heapify: Algorithm
Max-Heapify(A, i)
    l = Left(i)
    r = Right(i)
    if l <= heap-size[A] and A[l] > A[i]
        largest = l
    else
        largest = i
    if r <= heap-size[A] and A[r] > A[largest]
        largest = r
    if largest <> i
        <exchange A[i] with A[largest]>
        Max-Heapify(A, largest)
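A Python translation of Max-Heapify, offered as a sketch. It assumes 1-based indexing with an unused `A[0]`, passes the heap size as a parameter instead of the attribute heap-size[A], and (a design choice of this sketch, not of the slide) replaces the tail-recursive call with a loop.

```python
def max_heapify(A, i, heap_size):
    """Let A[i] float down so that the subtree rooted at i becomes a max-heap.

    Assumes the subtrees rooted at Left(i) = 2i and Right(i) = 2i+1 are
    already max-heaps. A is 1-based: A[0] is an unused placeholder.
    """
    while True:
        l, r = 2 * i, 2 * i + 1
        largest = l if l <= heap_size and A[l] > A[i] else i
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return                            # A[i] is at least as large as its children
        A[i], A[largest] = A[largest], A[i]  # exchange A[i] with A[largest]
        i = largest                           # continue one level down


A = [None, 4, 10, 6, 5, 7, 3, 2]   # only the root violates the max-heap property
max_heapify(A, 1, 7)
print(A[1:])  # [10, 7, 6, 5, 4, 3, 2]
```

Here the value 4 floats down two levels, first exchanging with 10 and then with 7.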
Max-Heapify: Analysis
It takes (1) to find A[largest], plus the time to run the
procedure recursively on at most 2n/3 elements. (This is the
maximum size of a child tree. It occurs when the last row of
the tree is exactly half full.)
Assume there n nodes and x levels in the tree that has half of the
last row. This means:
n = 1 +
2^x – 1
2^(x-1)
2^(x-1)
7/1/2016
2
+
=
=
+ ... + 2^(x-1) + 2^x/2
2^x/2 = n
a => 2a + a = n+1 =>
(n+1)/3
CS583 Fall'06: Heapsort
11
Max-Heapify: Analysis (cont.)
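The largest child subtree is the left subtree of the root, which contains every
node on the half-full last row. Its size is (half of the elements on levels
1, ..., x-1) + (all elements on the last level):
$$\frac{2^x - 2}{2} + \frac{2^x}{2} = \left(2^{x-1} - 1\right) + 2^{x-1} = 2 \cdot \frac{n+1}{3} - 1 = \frac{2n - 1}{3} \le \frac{2n}{3}.$$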
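Therefore the running time of Max-Heapify is described by the following
recurrence:
$$T(n) \le T(2n/3) + \Theta(1).$$
By case 2 of the master theorem, with $a = 1$, $b = 3/2$ and
$f(n) = \Theta(1) = \Theta(n^{\log_{3/2} 1})$, the recurrence
$T(n) = T(2n/3) + \Theta(1)$ has the solution $T(n) = \Theta(\lg n)$. Since this
recurrence only bounds the worst case from above, the running time of
Max-Heapify is O(lg n).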
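The 2n/3 bound can be checked numerically. This Python sketch (with a hypothetical helper `subtree_size`) counts the nodes of the subtree rooted at node 2 level by level and confirms both the bound and the equality case derived above.

```python
def subtree_size(i, n):
    """Number of nodes in the subtree rooted at node i of an n-node heap."""
    size, lo, hi = 0, i, i
    while lo <= n:
        size += min(hi, n) - lo + 1   # subtree nodes on the current level
        lo, hi = 2 * lo, 2 * hi + 1   # descend one level
    return size


# In a nearly complete tree the left child subtree (rooted at 2) is the larger
# one, so checking it for every n covers the 2n/3 bound.
print(all(3 * subtree_size(2, n) <= 2 * n for n in range(2, 5000)))  # True

# Equality case of the derivation: last row exactly half full.
for x in range(1, 12):
    n = (2 ** x - 1) + 2 ** (x - 1)
    assert subtree_size(2, n) == (2 * n - 1) // 3
```

The comparison uses integers (3 * size <= 2 * n) to avoid floating-point rounding near 2/3.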
Building a Heap
• We can use the procedure Max-Heapify in a bottom-up manner to convert the whole array A[1..n] into a
max-heap.
• Note that the elements A[floor(n/2)+1..n] are all leaves:
the last element that is not a leaf is the parent of the
last node, namely floor(n/2).
• The procedure Build-Max-Heap goes through all
non-leaf nodes and runs Max-Heapify on each of
them.
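The claim about leaves can be checked exhaustively for small heaps. In this Python sketch, `is_leaf` is a hypothetical helper: a node is a leaf exactly when its left child index 2i falls past the end of the heap.

```python
def is_leaf(i, n):
    """Node i of an n-node heap is a leaf iff its left child 2i is past the end."""
    return 2 * i > n


print([i for i in range(1, 8) if is_leaf(i, 7)])  # [4, 5, 6, 7]

for n in range(1, 1000):
    last_internal = n // 2                # floor(n/2), equal to Parent(n) when n >= 2
    assert all(is_leaf(i, n) for i in range(last_internal + 1, n + 1))
    assert all(not is_leaf(i, n) for i in range(1, last_internal + 1))
```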
Build-Max-Heap: Algorithm
Build-Max-Heap(A, n)
1  heap-size[A] = n
2  for i = floor(n/2) downto 1
3      Max-Heapify(A, i)
Loop invariant:
At the start of each iteration of the for loop of lines 2-3, each node i+1, i+2, ..., n is the root of a max-heap.
Proof.
• Initialization: before the first iteration, i = floor(n/2). Each node
floor(n/2)+1, ..., n is a leaf and hence the root of a trivial max-heap.
Build-Max-Heap: Correctness
• Maintenance: the children of node i are numbered
higher than i, so by the loop invariant they are
roots of max-heaps.
– This is exactly the condition required by Max-Heapify(A, i).
– Moreover, Max-Heapify preserves the property that
nodes i+1, ..., n are roots of max-heaps.
– Decrementing i by 1 re-establishes the loop invariant for
the next iteration.
• Termination: i = 0, hence each node 1, 2, ..., n is the
root of a max-heap; in particular, node 1 is.
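A Python sketch of Build-Max-Heap that can optionally assert the loop invariant at the start of every iteration and at termination. It assumes 1-based indexing with an unused `A[0]`; `is_max_heap_at` and the `check_invariant` flag are hypothetical additions for illustration, and the invariant checks make the sketch much slower, so they are off by default.

```python
def max_heapify(A, i, heap_size):
    """Float A[i] down; assumes the subtrees at 2i and 2i+1 are max-heaps."""
    while True:
        l, r = 2 * i, 2 * i + 1
        largest = l if l <= heap_size and A[l] > A[i] else i
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest


def is_max_heap_at(A, j, heap_size):
    """True if the subtree rooted at node j satisfies the max-heap property."""
    stack = [j]
    while stack:
        k = stack.pop()
        for c in (2 * k, 2 * k + 1):
            if c <= heap_size:
                if A[c] > A[k]:
                    return False
                stack.append(c)
    return True


def build_max_heap(A, n, check_invariant=False):
    """Turn A[1..n] into a max-heap in place and return the heap size n."""
    for i in range(n // 2, 0, -1):  # i = floor(n/2) downto 1
        if check_invariant:
            # Loop invariant: each node i+1, ..., n is the root of a max-heap.
            assert all(is_max_heap_at(A, j, n) for j in range(i + 1, n + 1))
        max_heapify(A, i, n)
    if check_invariant:
        # Termination (i = 0): node 1, and hence the whole array, is a max-heap.
        assert is_max_heap_at(A, 1, n)
    return n


A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(A, 10, check_invariant=True)
print(A[1:])  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```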
Build-Max-Heap: Performance
• Each call to Max-Heapify takes O(lg n) time, and
there are O(n) such calls (floor(n/2) of them).
– Therefore the running time of Build-Max-Heap is O(n lg n).
This upper bound is correct but not asymptotically tight.
• To derive a tighter bound, we observe that the
running time of Max-Heapify depends on the height of
the node it is called on.
– An n-element heap has height floor(lg n). There are at
most ceil(n/2^(h+1)) nodes of any height h. To see why,
assume for simplicity that the tree is complete (every level
full) and that the nodes of height h are at depth x (the root
is at depth 0). Then we have:
Build-Max-Heap: Performance (cont.)
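$$1 + 2 + \dots + 2^x + \dots + 2^{x+h} = n$$
$$2^{x+h+1} - 1 = n \quad\Rightarrow\quad 2^{x+h+1} = n + 1$$
$$2^x = \frac{n+1}{2^{h+1}} = \left\lceil \frac{n}{2^{h+1}} \right\rceil$$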
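The time required by Max-Heapify when called on a node of height h is
O(h). Hence the total cost of Build-Max-Heap is bounded by
$$\sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h) = O\left(n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^h}\right).$$
By formula (A.8), $\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2}$ for $|x| < 1$. Setting $x = 1/2$ gives
$$\sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1 - 1/2)^2} = 2.$$
Thus, the running time of Build-Max-Heap can be bounded by
$$O\left(n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^h}\right) = O\left(n \sum_{h=0}^{\infty} \frac{h}{2^h}\right) = O(n).$$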
The Heapsort Algorithm
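The heapsort algorithm first uses Build-Max-Heap to turn A[1..n] into a max-heap. The
maximum element of the array is then at A[1], so it can be put into its correct final
position by exchanging it with A[n]. After the heap size is decremented, the children of
the root are still roots of max-heaps, so A[1..(n-1)] can be made a max-heap again by
calling Max-Heapify(A, 1).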
Heapsort(A, n)
1  Build-Max-Heap(A, n)
2  for i = n downto 2
3      <swap A[1] with A[i]>
4      heap-size[A] = heap-size[A] - 1
5      Max-Heapify(A, 1)
Step 1 takes O(n) time. The loop in line 2 runs n-1 times, and each iteration is
dominated by step 5, which takes O(lg n) time. Hence the running time of heapsort
is O(n) + O(n lg n) = O(n lg n).
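Putting the pieces together, here is a self-contained Python sketch of Heapsort. As in the earlier sketches, it assumes 1-based indexing with an unused `A[0]`, keeps the heap size in a local variable instead of the attribute heap-size[A], and uses a loop in place of Max-Heapify's tail recursion; the comments map each line to the numbered steps of the pseudocode.

```python
def max_heapify(A, i, heap_size):
    """Float A[i] down; assumes the subtrees at 2i and 2i+1 are max-heaps."""
    while True:
        l, r = 2 * i, 2 * i + 1
        largest = l if l <= heap_size and A[l] > A[i] else i
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest


def build_max_heap(A, n):
    for i in range(n // 2, 0, -1):   # i = floor(n/2) downto 1
        max_heapify(A, i, n)


def heapsort(A, n):
    """Sort A[1..n] in place in ascending order (A[0] is an unused placeholder)."""
    build_max_heap(A, n)                 # step 1
    heap_size = n
    for i in range(n, 1, -1):            # step 2: i = n downto 2
        A[1], A[i] = A[i], A[1]          # step 3: move the current maximum to A[i]
        heap_size -= 1                   # step 4: A[i..n] is sorted, outside the heap
        max_heapify(A, 1, heap_size)     # step 5: restore the max-heap on A[1..heap_size]


A = [None, 5, 13, 2, 25, 7, 17, 20, 8, 4]
heapsort(A, 9)
print(A[1:])  # [2, 4, 5, 7, 8, 13, 17, 20, 25]
```

To sort an ordinary 0-based Python list `xs`, one could build `B = [None] + xs`, call `heapsort(B, len(xs))` and read `B[1:]`. Python's standard library also provides the `heapq` module, which maintains a 0-based min-heap rather than the max-heap used here.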