The Heap data structure and Heap Sort

A heap (or maxheap) is a complete binary tree in which the value (key value) of any node is
greater than or equal to (in a minheap, less than or equal to) the values of its children. If
"heap" is used without the min- or max- qualifier, we normally mean a maxheap. Don't confuse
this definition of heap as a data structure with the system heap (free store) from which memory
is allocated when the C++ new operator is invoked.
For example, this is a heap:
           44            level 0
         /    \
       38      12        level 1
      /  \     /
    31    2   10         level 2
For each node, its value is greater than or equal to the values of its children. Notice that a value
at one level may be smaller than a value at a lower level: 12 (level 1) is smaller than 31 (level 2).
We only care about the values of each node's own children, not its sibling's children. Note that
the largest value is always at the root of a heap.
Heaps are useful for several applications in computer science. One application is a priority
queue, a queue in which the highest priority (largest value) node is extracted during a dequeue
operation, not the oldest. One example of where priority queues can be used is for process
control in operating systems, where the highest priority process runs next, not necessarily the one
that has been waiting the longest. For this reason, adding a node to a heap is often called
enqueueing the node, and removing a node is often called dequeueing the node.
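C++'s standard library already packages this idea: std::priority_queue (in <queue>) is specified in terms of the standard heap algorithms, and with its default comparator std::less it behaves as a maxheap, so top() returns the largest value. The minimal sketch below, using the hypothetical helper name drainInPriorityOrder, enqueues values in arrival order and shows that they are dequeued largest-first:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Enqueue every value, then dequeue until empty. std::priority_queue<int>
// uses std::less<int>, so top() is always the largest remaining value
// (a maxheap), regardless of the order in which values arrived.
std::vector<int> drainInPriorityOrder(const std::vector<int>& arrivals) {
    std::priority_queue<int> pq;
    for (int v : arrivals) {
        pq.push(v);                  // enqueue
    }
    std::vector<int> order;
    while (!pq.empty()) {
        order.push_back(pq.top());   // highest priority, not the oldest
        pq.pop();                    // dequeue
    }
    assert(order.size() == arrivals.size());
    return order;
}
```

With arrivals {11, 17, 8, 12, 5, 1, 19, 3}, the values should come back as {19, 17, 12, 11, 8, 5, 3, 1}: priority order, not arrival order.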
Note also that a heap is complete, meaning that all levels, except possibly the lowest, are full,
and the nodes on the lowest level are filled in from the left. At each level n, except perhaps the
last level, a complete binary tree has 2^n nodes, and only the last level may have fewer than that:
i.e., level zero has one node, level one has two, level two has four, level three has eight, etc.
This doubling of the number of nodes at each level, and the fact that the tree is complete, means
that the heap can be stored in an array, where the nodes can find their children or parent using
simple algebraic formulae. For any node at index i, its left child is at index (2*i)+1, its right
child is at index (2*i)+2 (assuming these indices are within the heap's data), and (except for the
root node) the parent is at index (i-1)/2 (remember: integer division).
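The index formulas translate directly into code. This is a sketch, not a definitive implementation: the names leftChild, rightChild, parent and isMaxHeap are hypothetical, and the heap is assumed to occupy the first n elements of a std::vector<int>. isMaxHeap checks every non-root element against its parent, which is equivalent to checking every node against its children:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index arithmetic for a heap stored in an array (0-based indices).
std::size_t leftChild(std::size_t i)  { return 2 * i + 1; }
std::size_t rightChild(std::size_t i) { return 2 * i + 2; }
std::size_t parent(std::size_t i) {
    assert(i > 0);          // the root has no parent
    return (i - 1) / 2;     // integer division
}

// True if no element among the first n is larger than its parent.
bool isMaxHeap(const std::vector<int>& a, std::size_t n) {
    assert(n <= a.size());
    for (std::size_t i = 1; i < n; ++i) {
        if (a[parent(i)] < a[i]) {
            return false;   // a child is larger than its parent
        }
    }
    return true;
}
```

Stored level by level, the example tree above is {44, 38, 12, 31, 2, 10}; isMaxHeap accepts it, and leftChild(1) and rightChild(1) give 3 and 4, the indices of 31 and 2.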
Creating a heap from an unordered data set is called heapifying. To heapify an array, notice that
a leaf node is always a heap, since it has no children at all. Also, notice that we build a heap by
adding nodes across each level of the heap from left to right (since the heap must always be
complete). From these two observations we can deduce how to heapify a data set stored in an array.
For example, assume this array (element indices are below):
Value:  11  17   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
We can create a heap by adding one element of the array at a time, from the 0th element going
right, and then heapifying after adding the element to the heap. We call this heapifying
algorithm heap-up, since the heaping action for each added element moves up through the array,
or towards lower indices.
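Heap-up can be sketched in C++ as follows, assuming the array is a std::vector<int>; heapUp and buildHeapUp are illustrative names, not from any library. buildHeapUp treats element 0 as a one-element heap and then heaps up each later element in turn, as the step-by-step example does:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Heap-up (sift-up): the element at index i has just been added to the
// heap a[0..i-1]. Swap it with its parent while it is larger than the parent.
void heapUp(std::vector<int>& a, std::size_t i) {
    assert(i < a.size());
    while (i > 0) {
        std::size_t p = (i - 1) / 2;
        if (!(a[p] < a[i])) {
            break;              // parent >= child: heap property holds
        }
        std::swap(a[p], a[i]);
        i = p;                  // continue checking from the new position
    }
}

// Heapify by adding elements 1, 2, ..., n-1 one at a time
// (element 0 on its own is already a heap).
void buildHeapUp(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        heapUp(a, i);
    }
}
```

Applied to {11, 17, 8, 12, 5, 1, 19, 3}, buildHeapUp should produce {19, 12, 17, 11, 5, 1, 8, 3}, the heap reached at the end of the step-by-step trace.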
First add element 0 to the heap (at this point the heap is just element 0):
Value:  11  17   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
This is a heap, so no more action need be taken to heapify element 0. Then add element 1:
Value:  11  17   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Now, 17 is the left child of 11, which violates the heap principle. Therefore exchange the
contents of elements 0 and 1:
Value:  17  11   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Now elements 0 and 1 form a heap. Add element 2 to the heap:
Value:  17  11   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Element 2 (value 8) is smaller than its parent, element 0 (value 17), so no further action is
needed. Add element 3 to the heap:
Value:  17  11   8  12   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Element 3 (value 12), the left child of element 1 (value 11), is larger than its parent, so this
violates the heap principle. So exchange the contents of elements 3 and 1:
Value:  17  12   8  11   5   1  19   3
Index:   0   1   2   3   4   5   6   7
The new value in element 1 (12) is smaller than its parent, element 0 (value 17), so this meets
the heap principle. Now, add element 4 to the heap:
Value:  17  12   8  11   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Element 4 (value 5) is smaller than its parent, element 1 (value 12), so no further action is
needed. Add element 5 to the heap:
Value:  17  12   8  11   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Element 5 (value 1) is smaller than its parent, element 2 (value 8), so no further action is needed.
Add element 6 to the heap:
Value:  17  12   8  11   5   1  19   3
Index:   0   1   2   3   4   5   6   7
Element 6 (value 19) is larger than its parent, element 2 (value 8), so exchange the contents of
element 6 and element 2:
Value:  17  12  19  11   5   1   8   3
Index:   0   1   2   3   4   5   6   7
Now look at element 2 (value 19). It's now larger than its parent, element 0 (value 17), so
exchange the contents of element 2 and element 0:
Value:  19  12  17  11   5   1   8   3
Index:   0   1   2   3   4   5   6   7
We now have a heap in the first seven elements of the array, so add the last element, number 7:
Value:  19  12  17  11   5   1   8   3
Index:   0   1   2   3   4   5   6   7
Element 7 (value 3) is smaller than its parent, element 3 (value 11), so no further action is
needed. We now have a heap in the array.
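The C++ standard library provides heap-up as std::push_heap (in <algorithm>), which assumes that all but the last element of a range already form a maxheap and moves the last element into place. The sketch below (function names hypothetical) replays the trace above by calling it on growing prefixes of the array, and also shows priority-queue enqueueing. One caveat: the standard only guarantees that the result is a valid heap, not that it matches the hand trace element for element, so the sketch checks the result with std::is_heap rather than comparing it to the trace:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Build a maxheap the heap-up way with the standard library.
// std::push_heap(first, last) assumes [first, last - 1) is already a heap
// and moves the element at last - 1 up into place.
void buildHeapWithPushHeap(std::vector<int>& a) {
    for (std::size_t n = 2; n <= a.size(); ++n) {
        std::push_heap(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(n));
    }
    assert(std::is_heap(a.begin(), a.end()));
}

// Priority-queue enqueue on an array that is already a heap:
// append the new value at the end, then heap it up.
void enqueue(std::vector<int>& heap, int value) {
    heap.push_back(value);
    std::push_heap(heap.begin(), heap.end());
}
```

After buildHeapWithPushHeap on {11, 17, 8, 12, 5, 1, 19, 3}, element 0 should hold 19; enqueueing 42 afterwards should make 42 the new root while the array remains a heap.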
The heap may be used as-is for a priority queue, with heap-up applied to each new element as it is
added. It's the same algorithm whether we're moving along an array as in this example, or adding
a new element at the end of an array that is already a heap.
For a priority queue, the dequeue algorithm is to swap the 0th element with the last (index n-1),
then heap-down to restore the heap, which now has size n-1. Heap-down works in the opposite
direction to heap-up: while the moved element is smaller than the larger of its children, swap it
with that child. The largest value will then be in the (old) last position. This fact may also be
used to build a sorting algorithm, heapsort:
Here's our heap from before:
Value:  19  12  17  11   5   1   8   3
Index:   0   1   2   3   4   5   6   7
We swap the 0th and the 7th element:
Value:   3  12  17  11   5   1   8  19
Index:   0   1   2   3   4   5   6   7
Element 7 is now sorted, and the array from index 0 to index 6 is not a heap. So we have to
"heap-down" to restore the heap. Look at the children of element 0, elements 1 and 2. The
larger of these is 17, which is larger than 3, so we swap:
Value:  17  12   3  11   5   1   8  19
Index:   0   1   2   3   4   5   6   7
The children of element 2 (the one we swapped into) are elements 5 and 6. The larger value in
these elements is 8, which is larger than 3, so we swap:
Value:  17  12   8  11   5   1   3  19
Index:   0   1   2   3   4   5   6   7
The children of element 6 would be at indices 13 and 14, off the end of the heap, so we're done.
Our heap is the first 7 elements. If we're using the heap as a priority queue, we've just finished
dequeueing an element from the priority queue: its value, 19, is now in element 7.
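Heap-down and dequeue might look like this in C++, again as a sketch over a std::vector<int> in which the heap occupies the first n elements; heapDown and dequeue are hypothetical names. When the two children are equal, this version picks the left one, which does not affect correctness:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Heap-down (sift-down) within the heap a[0..n-1]: swap the element at i
// with its larger child until neither child is larger or it has no children.
void heapDown(std::vector<int>& a, std::size_t i, std::size_t n) {
    assert(n <= a.size());
    while (true) {
        std::size_t left = 2 * i + 1;
        if (left >= n) {
            break;                      // no children inside the heap
        }
        std::size_t larger = left;
        std::size_t right = left + 1;
        if (right < n && a[left] < a[right]) {
            larger = right;
        }
        if (!(a[i] < a[larger])) {
            break;                      // already >= both children
        }
        std::swap(a[i], a[larger]);
        i = larger;
    }
}

// Dequeue from the heap a[0..n-1]: move the root (the maximum) to index n-1,
// then restore the heap in a[0..n-2]. Returns the new heap size.
std::size_t dequeue(std::vector<int>& a, std::size_t n) {
    assert(n > 0 && n <= a.size());
    std::swap(a[0], a[n - 1]);
    heapDown(a, 0, n - 1);
    return n - 1;
}
```

Starting from the heap {19, 12, 17, 11, 5, 1, 8, 3}, one call to dequeue should return 7 and leave {17, 12, 8, 11, 5, 1, 3, 19}, matching the arrays above; a second call should leave {12, 11, 8, 3, 5, 1, 17, 19}, as in the trace that follows.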
To sort an array using a heap (heapsort), create a heap as above, then repeat this dequeue process
for all the remaining elements. After the first dequeue, element 7 holds the largest value, so the
rightmost part of the array is sorted:
Value:  17  12   8  11   5   1   3  19
Index:   0   1   2   3   4   5   6   7
Swap the contents of element 0 and element 6:
Value:   3  12   8  11   5   1  17  19
Index:   0   1   2   3   4   5   6   7
Elements 6 and 7 are sorted, and the first 6 elements are no longer a heap. So we heap-down.
The largest child of element 0 is element 1 (value 12), and this is larger than 3 so we swap:
Value:  12   3   8  11   5   1  17  19
Index:   0   1   2   3   4   5   6   7
The children of element 1 are elements 3 and 4, and the larger value in them is 11 (element 3),
which is larger than 3, so we swap:
Value:  12  11   8   3   5   1  17  19
Index:   0   1   2   3   4   5   6   7
Now, the children of element 3 are off the end of the heap, so we have a heap in elements 0
through 5. Notice that every time we have a heap again, the largest value is in the 0th element of
the array. So we swap elements 0 and 5, making the last three elements sorted and breaking the
heap again:
Value:   1  11   8   3   5  12  17  19
Index:   0   1   2   3   4   5   6   7
So we heap-down. The larger of the children of element 0 is element 1, which is 11, and it's
larger than 1, so we swap:
Value:  11   1   8   3   5  12  17  19
Index:   0   1   2   3   4   5   6   7
Element 1's children are elements 3 and 4. The larger of those is 5, which is bigger than 1, so we
swap:
Value:  11   5   8   3   1  12  17  19
Index:   0   1   2   3   4   5   6   7
Element 4's children are off the end of the heap, so we have restored the heap property to the first
5 elements and the last 3 are sorted. So continue by swapping element 0 and element 4:
Value:   1   5   8   3  11  12  17  19
Index:   0   1   2   3   4   5   6   7
The last 4 elements are sorted and we've broken the heap again, so we heap-down. The larger of
element 0's children is 8 and 8 is greater than 1, so we swap:
Value:   8   5   1   3  11  12  17  19
Index:   0   1   2   3   4   5   6   7
Element 2 has no children in the heap (its children would be elements 5 and 6), so we're done
re-heaping. So swap element 0 with element 3:
Value:   3   5   1   8  11  12  17  19
Index:   0   1   2   3   4   5   6   7
Now the last 5 elements of the array are sorted, and our heap is broken. So we heap-down. The
larger of the children of element 0 has the value 5, which is larger than 3, so we swap:
Value:   5   3   1   8  11  12  17  19
Index:   0   1   2   3   4   5   6   7
Element 1 has no children in the heap, so we have a heap. So swap element 0 and element 2:
Value:   1   3   5   8  11  12  17  19
Index:   0   1   2   3   4   5   6   7
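Now the last six elements of the array are sorted, and the heap is just elements 0 and 1. Element 0
(value 1) has only one child, element 1 (value 3), and that child is larger, so the first 2 elements
are not a heap. Heap-down swaps them, putting 3 in element 0 and 1 in element 1. Then, as in every
round, we swap element 0 with the last element of the heap, element 1, which moves the 3 into its
sorted position: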
Value:   1   3   5   8  11  12  17  19
Index:   0   1   2   3   4   5   6   7
The last 7 elements of the array are sorted, and the remaining element forms a one-element heap
whose value is no larger than any of the rest, so the whole array is sorted.
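Putting the two phases together gives a complete heapsort. The sketch below assumes std::vector<int> input and uses illustrative names (siftUp, siftDown, heapSort); the helpers repeat the heap-up and heap-down steps traced above so that the block stands alone:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace {

// Heap-up: move a[i] toward the root while it is larger than its parent.
void siftUp(std::vector<int>& a, std::size_t i) {
    while (i > 0 && a[(i - 1) / 2] < a[i]) {
        std::swap(a[(i - 1) / 2], a[i]);
        i = (i - 1) / 2;
    }
}

// Heap-down: within a[0..n-1], move a[i] down while a child is larger.
void siftDown(std::vector<int>& a, std::size_t i, std::size_t n) {
    assert(n <= a.size());
    for (std::size_t child = 2 * i + 1; child < n; child = 2 * i + 1) {
        if (child + 1 < n && a[child] < a[child + 1]) {
            ++child;                        // pick the larger child
        }
        if (!(a[i] < a[child])) {
            return;                         // heap property restored
        }
        std::swap(a[i], a[child]);
        i = child;
    }
}

}  // namespace

// Heapsort as in the walk-through: build a maxheap with heap-up, then
// repeatedly swap the root with the last heap element and heap-down.
void heapSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        siftUp(a, i);                       // phase 1: heapify
    }
    for (std::size_t n = a.size(); n > 1; --n) {
        std::swap(a[0], a[n - 1]);          // largest goes to its final slot
        siftDown(a, 0, n - 1);              // restore the heap in a[0..n-2]
    }
}
```

Sorting {11, 17, 8, 12, 5, 1, 19, 3} should give {1, 3, 5, 8, 11, 12, 17, 19}. In practice, the standard library's std::make_heap and std::sort_heap (in <algorithm>) perform the same two phases, although std::make_heap may build a different (equally valid) heap than repeated heap-up does.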
There is an interesting pattern to the indices of the children of each node in a heap represented in
an array. Since there are two children per node, and they are adjacent, you can draw a pattern on
the heaped array's indices like this:
Index:            0   1   2   3   4   5   6   7   8
Parent's index:   -   0   0   1   1   2   2   3   3
In other words, the parent of each element can be determined by skipping the 0th element
(element 0 has no parent), then counting 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6...