Building Java Programs
Chapter 18
Advanced Data Structures:
Hashing and Heaps
Copyright (c) Pearson 2013.
All rights reserved.
Hashing
Reading: 18.1
Recall: ADTs
• abstract data type (ADT): A specification of a collection of
data and the operations that can be performed on it.
– Describes what a collection does, not how it does it.
• Java's collection framework describes ADTs with interfaces:
– Collection, Deque, List, Map, Queue, Set, SortedMap
• An ADT can be implemented in multiple ways by classes:
– ArrayList and LinkedList implement List
– HashSet and TreeSet implement Set
– LinkedList, ArrayDeque, etc. implement Queue
SearchTree as a set
• We implemented a class SearchTree to store a BST of ints:
• Our BST is essentially a set of integers.
– Operations we support: add, contains, remove, ...
(Figure: a BST storing 55, 29, 87, -3, 42, 60, 91, with overallRoot referring to the node 55.)
• But there are other ways to implement a set...
Sets
• set: A collection of unique values (no duplicates allowed)
that can perform the following operations efficiently:
– add, remove, search (contains)
– The client doesn't think of a set as having indexes; we just
add things to the set in general and don't worry about order.
set.contains("to")    // true
set.contains("be")    // false
(Figure: a set containing "if", "the", "to", "of", "down", "from", "by", "she", "you", "in", "why", "him".)
Int Set ADT interface
• Let's think about how to write our own implementation of a
set.
– To simplify the problem, we only store ints in our set for now.
– As is (usually) done in the Java Collection Framework, we will
define sets as an ADT by creating a Set interface.
– Core operations are: add, contains, remove.
public interface IntSet {
void add(int value);
boolean contains(int value);
void clear();
boolean isEmpty();
void remove(int value);
int size();
}
Unfilled array set
• Consider storing a set in an unfilled array.
– It doesn't really matter what order the elements appear in a set,
so long as they can be added and searched quickly.
– What would make a good ordering for the elements?
• If we store them in the next available index, as in a list, ...
– set.add(9);
set.add(23);
set.add(8);
set.add(-3);
set.add(49);
set.add(12);
index:   0   1   2   3   4   5   6   7   8   9
value:   9  23   8  -3  49  12   0   0   0   0
size:    6
– How efficient is add? contains? remove?
• O(1), O(N), O(N)
• (contains must loop over the array; remove must shift elements.)
Sorted array set
• Suppose we store the elements in an unfilled array, but
in sorted order rather than order of insertion.
– set.add(9);
set.add(23);
set.add(8);
set.add(-3);
set.add(49);
set.add(12);
index:   0   1   2   3   4   5   6   7   8   9
value:  -3   8   9  12  23  49   0   0   0   0
size:    6
– How efficient is add? contains? remove?
• O(N), O(log N), O(N)
• (You can do an O(log N) binary search to find elements in contains,
and to find the proper index in add/remove; but add/remove still
need to shift elements right/left to make room, which is O(N) on
average.)
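The sorted-array idea above can be sketched as follows. This is a minimal illustration, not the slides' code: the class name SortedArrayIntSet and the toArray helper are hypothetical, and Arrays.binarySearch stands in for a hand-written binary search.

```java
import java.util.Arrays;

// Hypothetical sketch of a sorted-array set of ints (not the slides' code).
class SortedArrayIntSet {
    private int[] elements = new int[10];
    private int size = 0;

    // O(log N): binary search over the filled prefix [0, size).
    public boolean contains(int value) {
        return Arrays.binarySearch(elements, 0, size, value) >= 0;
    }

    // O(N): finding the index is O(log N), but shifting elements right is O(N).
    public void add(int value) {
        int pos = Arrays.binarySearch(elements, 0, size, value);
        if (pos >= 0) {
            return; // already present; a set holds no duplicates
        }
        int insertAt = -(pos + 1); // binarySearch encodes the insertion point
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        System.arraycopy(elements, insertAt, elements, insertAt + 1, size - insertAt);
        elements[insertAt] = value;
        size++;
    }

    public int size() {
        return size;
    }

    // Copy of the stored values, in sorted order.
    public int[] toArray() {
        return Arrays.copyOf(elements, size);
    }
}
```

Adding 9, 23, 8, -3, 49, 12 in that order leaves the values stored as -3, 8, 9, 12, 23, 49, matching the table above.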
A strange idea
• Silly idea: When client adds value i, store it at index i in the
array.
– Would this work?
– Problems / drawbacks of this approach? How to work around
them?
set.add(7);
set.add(1);
set.add(9);
index:   0   1   2   3   4   5   6   7   8   9
value:   0   1   0   0   0   0   0   7   0   9
size:    3
...
set.add(18);
set.add(12);
index:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
value:   0   1   0   0   0   0   0   7   0   9   0   0  12   0   0   0   0   0  18   0
size:    5
Hashing
• hash: To map a large domain of values to a smaller fixed
domain.
– Typically, mapping a set of elements to integer indexes in an
array.
– Idea: Store any given element value in a particular predictable
index.
• That way, adding / removing / looking for it are constant-time (O(1)).
– hash table: An array that stores elements via hashing.
• hash function: An algorithm that maps values to indexes.
– hash code: The output of a hash function for a given value.
– On the previous slide, our "hash function" was: hash(i) → i
• Potentially requires a large array (a.length > i).
Improved hash function
• To deal with negative numbers: hash(i) → abs(i)
• To deal with large numbers: hash(i) → abs(i) % length
set.add(37);   // abs(37) % 10 == 7
set.add(-2);   // abs(-2) % 10 == 2
set.add(49);   // abs(49) % 10 == 9
index:   0   1   2   3   4   5   6   7   8   9
value:   0   0  -2   0   0   0   0  37   0  49
size:    3
// inside HashIntSet class
private int hash(int i) {
    return Math.abs(i) % elements.length;
}
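One edge case is worth knowing about this hash function: Math.abs(Integer.MIN_VALUE) is still negative, so abs(i) % length can produce a negative index for that single value. The sketch below compares the slides' formula with Math.floorMod, which is offered here only as one possible alternative (an assumption, not the slides' method). The class and method names are hypothetical.

```java
// Sketch: where Math.abs alone can yield a negative index, and one alternative.
class HashIndexDemo {
    // The slides' hash function, with the table length passed in explicitly.
    static int absHash(int value, int length) {
        return Math.abs(value) % length;
    }

    // Alternative (not from the slides): floorMod is never negative
    // when length is positive.
    static int floorModHash(int value, int length) {
        return Math.floorMod(value, length);
    }

    public static void main(String[] args) {
        System.out.println(absHash(37, 10));                     // 7
        System.out.println(absHash(-2, 10));                     // 2
        System.out.println(absHash(Integer.MIN_VALUE, 10));      // -8
        System.out.println(floorModHash(Integer.MIN_VALUE, 10)); // 2
    }
}
```

Note that floorMod places negative values at different indexes than abs does (for example, -2 maps to 8 instead of 2); either choice works as long as the same function is used for every operation.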
Sketch of implementation
public class HashIntSet implements IntSet {
private int[] elements;
...
public void add(int value) {
elements[hash(value)] = value;
}
public boolean contains(int value) {
return elements[hash(value)] == value;
}
public void remove(int value) {
elements[hash(value)] = 0;
}
}
– Runtime of add, contains, and remove: O(1) !!
• Are there any problems with this approach?
Collisions
• collision: When hash function maps 2 values to same index.
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24!
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  54   0   0  37   0  49
size:    5
• collision resolution: An algorithm for fixing collisions.
Probing
• probing: Resolving a collision by moving to another index.
– linear probing: Moves to the next available index (wraps if
needed).
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24; must probe
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54   0  37   0  49
size:    5
– variation: quadratic probing moves increasingly far away: +1, +4, +9, ...
Implementing HashIntSet
• Let's implement an int set using a hash table with linear
probing.
– For simplicity, assume that the set cannot store 0s for now.
public class HashIntSet implements IntSet {
private int[] elements;
private int size;
// constructs new empty set
public HashIntSet() {
elements = new int[10];
size = 0;
}
// hash function maps values to indexes
private int hash(int value) {
return Math.abs(value) % elements.length;
}
...
The add operation
• How do we add an element to the hash table?
– Use the hash function to find the proper bucket index.
– If we see a 0, put it there.
– If not, move forward until we find an empty (0) index to store it.
– If we see that the value is already in the table, don't re-add it.
set.add(54);   // client code
set.add(14);
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54  14  37   0  49
size:    6
Implementing add
• How do we add an element to the hash table?
public void add(int value) {
    int h = hash(value);
    while (elements[h] != 0 &&
           elements[h] != value) {        // linear probing
        h = (h + 1) % elements.length;    // for empty slot
    }
    if (elements[h] != value) {           // avoid duplicates
        elements[h] = value;
        size++;
    }
}
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54   0  37   0  49
size:    5
The contains operation
• How do we search for an element in the hash table?
– Use the hash function to find the proper bucket index.
– Loop forward until we either find the value, or an empty index
(0).
– If we find the value, it is contained (true). If we find 0, it is not
(false).
set.contains(24)   // true
set.contains(14)   // true
set.contains(35)   // false
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54  14  37   0  49
size:    6
Implementing contains
public boolean contains(int value) {
    int h = hash(value);
    while (elements[h] != 0) {
        if (elements[h] == value) {       // linear probing
            return true;                  // to search
        }
        h = (h + 1) % elements.length;
    }
    return false;                         // not found
}
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54   0  37   0  49
size:    5
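To trace where each value lands, the add and contains methods above can be put into one small runnable class. This is a sketch under the slides' assumptions (0 marks an empty slot, so 0 cannot be stored) plus one more: the table never fills, since neither loop would terminate on a full table. The class name ProbingIntSet and the table() accessor are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of the linear-probing set (no remove, no resizing).
class ProbingIntSet {
    private final int[] elements = new int[10];
    private int size = 0;

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    public void add(int value) {
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length; // probe the next index, wrapping around
        }
        if (elements[h] != value) {
            elements[h] = value;
            size++;
        }
    }

    public boolean contains(int value) {
        int h = hash(value);
        while (elements[h] != 0) {
            if (elements[h] == value) {
                return true;
            }
            h = (h + 1) % elements.length;
        }
        return false;
    }

    public int size() {
        return size;
    }

    // Copy of the underlying table, for inspection only.
    public int[] table() {
        return elements.clone();
    }

    public static void main(String[] args) {
        ProbingIntSet set = new ProbingIntSet();
        for (int v : new int[] {11, 49, 24, 37, 54, 14}) {
            set.add(v);
        }
        // 54 and 14 both hash to index 4 and are probed into indexes 5 and 6.
        System.out.println(Arrays.toString(set.table()));
        // prints [0, 11, 0, 0, 24, 54, 14, 37, 0, 49]
    }
}
```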
The remove operation
• We cannot remove by simply zeroing out an element:
set.remove(54);     // set index 5 to 0
set.contains(14)    // false??? oops
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24   0  14  34   0  49
size:    5
• Instead, we replace it by a special "removed" placeholder value
– (can be re-used on add, but keep searching on contains)
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  XX  14  34   0  49
size:    5
Implementing remove
public void remove(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] == value) {
        elements[h] = -999;   // "removed" flag value
        size--;
    }
}
set.remove(54);   // client code
set.remove(11);
set.remove(34);
Table after set.remove(54):
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24 -999 14  34   0  49
size:    5
Patching add, contains
private static final int REMOVED = -999;
public void add(int value) {
    if (contains(value)) {
        return;   // already present, possibly beyond a REMOVED slot
    }
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != REMOVED) {
        h = (h + 1) % elements.length;
    }
    elements[h] = value;   // reuse the first empty or REMOVED slot
    size++;
}
// contains does not need patching;
// it should keep going on a -999, which it already does
public boolean contains(int value) {
int h = hash(value);
while (elements[h] != 0 && elements[h] != value) {
h = (h + 1) % elements.length;
}
return elements[h] == value;
}
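One subtlety when reusing "removed" slots: an add that stops at the first placeholder without looking further could store a second copy of a value that already sits later in the probe sequence. The sketch below is one way to avoid that, by checking contains before inserting. It assumes the table never fills and that 0 and -999 are never stored. The class name TombstoneIntSet and the EMPTY constant are hypothetical.

```java
// Sketch of linear probing with a "removed" placeholder value.
class TombstoneIntSet {
    private static final int EMPTY = 0;
    private static final int REMOVED = -999;
    private final int[] elements = new int[10];
    private int size = 0;

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    public boolean contains(int value) {
        int h = hash(value);
        while (elements[h] != EMPTY && elements[h] != value) {
            h = (h + 1) % elements.length; // keeps going past REMOVED slots
        }
        return elements[h] == value;
    }

    public void add(int value) {
        if (contains(value)) {
            return; // rules out duplicates beyond a REMOVED slot
        }
        int h = hash(value);
        while (elements[h] != EMPTY && elements[h] != REMOVED) {
            h = (h + 1) % elements.length;
        }
        elements[h] = value; // first empty or REMOVED slot is reused
        size++;
    }

    public void remove(int value) {
        int h = hash(value);
        while (elements[h] != EMPTY && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        if (elements[h] == value) {
            elements[h] = REMOVED; // leave a placeholder so later probes continue
            size--;
        }
    }

    public int size() {
        return size;
    }
}
```

For example, after adding 24, 54, 14 and then removing 54, contains(14) still returns true, and calling add(14) again leaves the size at 2.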
Problem: full array
• clustering: Clumps of elements at neighboring indexes.
– Slows down the hash table lookup; you must loop through them.
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24
set.add(14);   // collides with 24, then 54
set.add(86);   // collides with 14, then 37
index:   0   1   2   3   4   5   6   7   8   9
value:   0   0   0   0   0   0   0   0   0   0
size:    0
• Where does each value go in the array?
• How many indexes must be examined to answer contains(94)?
• What will happen if the array completely fills?
Rehashing
• rehash: Growing to a larger array when the table is too full.
– Cannot simply copy the old array to a new one. (Why not?)
• load factor: ratio of (# of elements) / (hash table length)
– many collections rehash when load factor ≈ 0.75
Before rehashing:
index:   0   1   2   3   4   5   6   7   8   9
value:  95  11   0   0  24  54  14  37  66  48
size:    8
After rehashing:
index:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
value:   0   0   0   0  24   0  66   0  48   0   0  11   0   0  54  95  14  37   0   0
size:    8
Implementing rehash
// Grows hash table to twice its original size.
private void rehash() {
int[] old = elements;
elements = new int[2 * old.length];
size = 0;
for (int value : old) {
if (value != 0 && value != REMOVED) {
add(value);
}
}
}
public void add(int value) {
if ((double) size / elements.length >= 0.75) {
rehash();
}
...
}
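To see why the old array cannot simply be copied, the sketch below re-adds every element of a length-10 table into a length-20 table. It is a standalone illustration using static methods on plain arrays rather than the HashIntSet fields; the names RehashDemo, insert, and rehash are hypothetical.

```java
import java.util.Arrays;

// Sketch: rehashing a linear-probing table into one of twice the length.
class RehashDemo {
    // Inserts value into table using linear probing (0 = empty slot).
    static void insert(int[] table, int value) {
        int h = Math.abs(value) % table.length;
        while (table[h] != 0) {
            h = (h + 1) % table.length;
        }
        table[h] = value;
    }

    // Returns a table of twice the length with every element re-added,
    // because each index depends on the table length.
    static int[] rehash(int[] old) {
        int[] bigger = new int[2 * old.length];
        for (int value : old) {
            if (value != 0) {
                insert(bigger, value);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        int[] old = {95, 11, 0, 0, 24, 54, 14, 37, 66, 48}; // load factor 8/10
        System.out.println(Arrays.toString(rehash(old)));
        // prints [0, 0, 0, 0, 24, 0, 66, 0, 48, 0, 0, 11, 0, 0, 54, 95, 14, 37, 0, 0]
    }
}
```

A plain copy would leave 11 at index 1, but in a table of length 20 a lookup for 11 starts at index 11 and would stop at the empty slot there.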
Hash table sizes
• Can use prime numbers as hash table sizes to reduce collisions.
• Also improves spread / reduces clustering on rehash.
set.add(11);    // 11 % 13 == 11
set.add(39);    // 39 % 13 == 0
set.add(21);    // 21 % 13 == 8
set.add(29);    // 29 % 13 == 3
set.add(71);    // 71 % 13 == 6
set.add(41);    // 41 % 13 == 2
set.add(101);   // 101 % 13 == 10
index:   0   1   2   3   4   5   6   7   8   9  10  11  12
value:  39   0  41  29   0   0  71   0  21   0 101  11   0
size:    7
Other details
• How would we implement toString on our HashIntSet?
index:   0   1   2   3   4   5   6   7   8   9
value:   0  11   0   0  24  54   0  37   0  49
size:    5
System.out.println(set);
// [11, 24, 54, 37, 49]
Separate chaining
• separate chaining: Solving collisions by storing a list at each
index.
– add/contains/remove must traverse lists, but the lists are short
– impossible to "run out" of indexes, unlike with probing
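(Figure: a table of 10 buckets, each holding a linked list: index 1 holds 11; index 4 holds 24, 54, 14; index 7 holds 7; index 9 holds 49; all other buckets are empty.)
private class Node {
    public int data;
    public Node next;
    ...
}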
Implementing HashIntSet
• Let's implement a hash set of ints using separate chaining.
public class HashIntSet implements IntSet {
// array of linked lists;
// elements[i] = front of list #i (null if empty)
private Node[] elements;
private int size;
// constructs new empty set
public HashIntSet() {
elements = new Node[10];
size = 0;
}
// hash function maps values to indexes
private int hash(int value) {
return Math.abs(value) % elements.length;
}
...
The add operation
• How do we add an element to the hash table?
– When you want to modify a linked list, you must either change
the list's front reference, or the next field of a node in the list.
– Where in the list should we add the new element?
– Must make sure to avoid duplicates.
– set.add(24);
(Figure: a new node 24 is to be added to bucket 4, whose list holds 54 and 14; bucket 1 holds 11, bucket 7 holds 7, and bucket 9 holds 49.)
Implementing add
public void add(int value) {
    if (!contains(value)) {
        int h = hash(value);
        Node newNode = new Node(value);   // add to front
        newNode.next = elements[h];       // of list #h
        elements[h] = newNode;
        size++;
    }
}
The contains operation
• How do we search for an element in the hash table?
– Must loop through the linked list for the appropriate hash index,
looking for the desired value.
– Looping through a linked list requires a "current" node reference.
set.contains(14)   // true
set.contains(84)   // false
set.contains(53)   // false
(Figure: a "current" reference walks the list in the bucket being searched; bucket 1 holds 11, bucket 4 holds 24, 54, 14, bucket 7 holds 7, and bucket 9 holds 49.)
Implementing contains
public boolean contains(int value) {
Node current = elements[hash(value)];
while (current != null) {
if (current.data == value) {
return true;
}
current = current.next;
}
return false;
}
The remove operation
• How do we remove an element from the hash table?
– Cases to consider: front (24), non-front (14), not found (94),
null (32)
– To remove a node from a linked list, you must either change the
list's front reference, or the next field of the previous node in
the list.
– set.remove(54);
(Figure: a "current" reference walks the list in bucket 4, which holds 24, 54, 14; bucket 1 holds 11, bucket 7 holds 7, and bucket 9 holds 49.)
Implementing remove
public void remove(int value) {
int h = hash(value);
if (elements[h] != null && elements[h].data == value) {
elements[h] = elements[h].next; // front case
size--;
} else {
Node current = elements[h];
// non-front case
while (current != null && current.next != null) {
if (current.next.data == value) {
current.next = current.next.next;
size--;
return;
}
current = current.next;
}
}
}
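The three chaining operations can be combined into one compact runnable class. This is a sketch, not the slides' exact code: it uses a fixed table of 10 buckets with no rehashing, gives Node a two-argument constructor, and tracks a prev reference in remove instead of looking ahead with current.next. The class name ChainedIntSet is hypothetical.

```java
// Sketch of a separate-chaining int set (fixed table size, no rehashing).
class ChainedIntSet {
    private static class Node {
        final int data;
        Node next;

        Node(int data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    private final Node[] elements = new Node[10];
    private int size = 0;

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    public boolean contains(int value) {
        for (Node cur = elements[hash(value)]; cur != null; cur = cur.next) {
            if (cur.data == value) {
                return true;
            }
        }
        return false;
    }

    public void add(int value) {
        if (!contains(value)) {
            int h = hash(value);
            elements[h] = new Node(value, elements[h]); // add to front of list #h
            size++;
        }
    }

    public void remove(int value) {
        int h = hash(value);
        Node prev = null;
        for (Node cur = elements[h]; cur != null; prev = cur, cur = cur.next) {
            if (cur.data == value) {
                if (prev == null) {
                    elements[h] = cur.next; // front case
                } else {
                    prev.next = cur.next;   // non-front case
                }
                size--;
                return;
            }
        }
        // not found (or empty bucket): nothing to do
    }

    public int size() {
        return size;
    }
}
```

Adding 11, 14, 54, 24, 7, 49 builds the lists used in the earlier examples (bucket 4 holds 24, 54, 14), so the four remove cases can be tried with 24 (front), 14 (non-front), 94 (not found), and 32 (empty bucket).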
Rehashing w/ chaining
• Separate chaining handles rehashing similarly to linear probing.
– Loop over the list in each hash bucket; re-add each element.
– An optimal implementation re-uses node objects, but this is
optional.
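(Figure: before rehashing, a table of length 10 holds 11, 24, 54, 14, 7, 49 in chained buckets; after rehashing, a table of length 20 holds the same values, with 24 at index 4, 7 at index 7, 49 at index 9, 11 at index 11, and 54 and 14 chained at index 14.)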
Hash set of objects
public class HashSet<E> implements Set<E> {
...
private class Node {
public E data;
public Node next;
}
}
• It is easy to hash an integer i (use index abs(i) % length ).
– How can we hash other types of values (such as objects)?
The hashCode method
• All Java objects contain the following method:
public int hashCode()
Returns an integer hash code for this object.
– We can call hashCode on any object to find its preferred index.
– HashSet, HashMap, and the other built-in "hash" collections call
hashCode internally on their elements to store the data.
• We can modify our set's hash function to be the following:
private int hash(E e) {
return Math.abs(e.hashCode()) % elements.length;
}
Issues with generics
• You must make an unusual cast on your array of generic
nodes:
public class HashSet<E> implements Set<E> {
private Node[] elements;
...
public HashSet() {
elements = (Node[]) new HashSet.Node[10];
}
• Perform all element comparisons using equals:
public boolean contains(E value) {
...
// if (current.data == value) {
if (current.data.equals(value)) {
return true;
}
...
Implementing hashCode
• You can write your own hashCode methods in classes you
write.
– All classes come with a default version based on memory address.
– Your overridden version should somehow "add up" the object's
state.
• Often you scale/multiply parts of the result to distribute the results.
public class Point {
private int x;
private int y;
...
public int hashCode() {
// better than just returning (x + y);
// spreads out numbers, fewer collisions
return 137 * x + 23 * y;
}
}
Good hashCode behavior
• A well-written hashCode method is:
– Consistent with itself (must produce the same result on each call):
o.hashCode() == o.hashCode(), if o's state doesn't change
– Consistent with equality:
a.equals(b) must imply that a.hashCode() == b.hashCode(),
but !a.equals(b) does NOT necessarily imply that
a.hashCode() != b.hashCode() (why not?)
• When your class has an equals or hashCode, it should have both.
– Well distributed:
• For a large set of objects with distinct states, they will generally
return unique hash codes rather than all colliding into the same hash
bucket.
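A small sketch of the "consistent with equality" rule: the class below defines equals and hashCode together from the same fields, so two equal points always land in the same bucket. The class name HashedPoint is hypothetical (chosen to avoid clashing with other Point classes); the hash formula is the one from the earlier Point example.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: equals and hashCode written as a pair over the same state.
final class HashedPoint {
    private final int x;
    private final int y;

    HashedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof HashedPoint)) {
            return false;
        }
        HashedPoint other = (HashedPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 137 * x + 23 * y; // equal x and y always give an equal hash code
    }

    public static void main(String[] args) {
        Set<HashedPoint> points = new HashSet<>();
        points.add(new HashedPoint(3, 4));
        // A different object with the same state is found in the set.
        System.out.println(points.contains(new HashedPoint(3, 4))); // true
    }
}
```

If hashCode were left at its default while equals compared x and y, the lookup above would usually search the wrong bucket and report false.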
Example: String hashCode
• The hashCode function inside a String object looks like this:
public int hashCode() {
int hash = 0;
for (int i = 0; i < this.length(); i++) {
hash = 31 * hash + this.charAt(i);
}
return hash;
}
– As with any general hashing function, collisions are possible.
• Example: "Ea" and "FB" have the same hash value.
– Early versions of Java examined only the first 16 characters.
For some common data this led to poor hash table performance.
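The collision mentioned above can be checked by hand with the same formula. The sketch below re-implements it in a static helper (the names StringHashDemo and hashOf are hypothetical) so the arithmetic is visible: 'E' is 69, 'a' is 97, 'F' is 70, and 'B' is 66.

```java
// Sketch: verifying that "Ea" and "FB" share a hash code.
class StringHashDemo {
    // Same formula as the String hashCode shown above.
    static int hashOf(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        // 31 * 69 + 97 == 2236 and 31 * 70 + 66 == 2236
        System.out.println(hashOf("Ea")); // 2236
        System.out.println(hashOf("FB")); // 2236
    }
}
```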
hashCode tricks
• If one of your object's fields is an object, call its hashCode:
public int hashCode() {   // Student
    return 531 * firstName.hashCode() + ...;
}
• To incorporate a double or boolean, use the hashCode
method from the Double or Boolean wrapper classes:
public int hashCode() {   // BankAccount
    return 37 * Double.valueOf(balance).hashCode() +
           Boolean.valueOf(isCheckingAccount).hashCode();
}
• Guava includes an Objects.hashCode(...) method that
takes any number of values and combines them into one hash
code.
public int hashCode() {   // BankAccount
    return Objects.hashCode(name, id, balance);
}
Implementing a hash map
• A hash map is like a set where the nodes store key/value pairs:
public class HashMap<K, V> implements Map<K, V> {
...
}
//       key      value
map.put("Marty", 14);
map.put("Jeff", 21);
map.put("Kasey", 20);
map.put("Stef", 35);
(Figure: a table of 10 buckets whose nodes each store a key and a value: "Stef" 35, "Marty" 14, "Jeff" 21, "Kasey" 20.)
– Must modify your Node class to store a key and a value
Map ADT interface
• Let's think about how to write our own implementation of a
map.
– As is (usually) done in the Java Collection Framework, we will
define map as an ADT by creating a Map interface.
– Core operations: put (add), get, contains key, remove
public interface Map<K, V> {
void clear();
boolean containsKey(K key);
V get(K key);
boolean isEmpty();
void put(K key, V value);
void remove(K key);
int size();
}
Hash map vs. hash set
– The hashing is always done on the keys, not the values.
– The contains method is now containsKey; there and in
remove, you search for a node whose key matches a given key.
– The add method is now put; if the given key is already there,
you must replace its old value with the new one.
• map.put("Bill", 66);   // replace 49 with 66
(Figure: a table of 10 buckets holding "Stef" 35, "Marty" 14, "Abby" 57, "Bill" 49 (replaced by 66), "Jeff" 21, "Kasey" 20.)
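The differences listed above (hash on the key, search by key, replace on put) can be sketched in a small chained map. This is an illustration under several assumptions: a fixed table of 10 buckets, no rehashing, no null keys, and no remove. The class name ChainedHashMap and the private find helper are hypothetical, and Math.floorMod is used for the index so that negative hash codes are handled.

```java
// Sketch of a separate-chaining hash map; nodes store a key and a value.
class ChainedHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] elements;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashMap() {
        elements = (Node<K, V>[]) new Node[10];
    }

    // Hashing is done on the key, never on the value.
    private int hash(K key) {
        return Math.floorMod(key.hashCode(), elements.length);
    }

    // Returns the node whose key equals the given key, or null if none.
    private Node<K, V> find(K key) {
        for (Node<K, V> cur = elements[hash(key)]; cur != null; cur = cur.next) {
            if (cur.key.equals(key)) {
                return cur;
            }
        }
        return null;
    }

    public boolean containsKey(K key) {
        return find(key) != null;
    }

    public V get(K key) {
        Node<K, V> node = find(key);
        return node == null ? null : node.value;
    }

    public void put(K key, V value) {
        Node<K, V> node = find(key);
        if (node != null) {
            node.value = value; // key already present: replace the old value
        } else {
            int h = hash(key);
            elements[h] = new Node<>(key, value, elements[h]);
            size++;
        }
    }

    public int size() {
        return size;
    }
}
```

With this sketch, put("Bill", 49) followed by put("Bill", 66) leaves one entry for "Bill" whose value is 66.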
Priority Queues
and Heaps
Reading: 18.2
Prioritization problems
• print jobs: CSE lab printers constantly accept and complete
jobs from all over the building. We want to print faculty jobs
before staff before student jobs, and grad students before
undergrad, etc.
• ER scheduling: Scheduling patients for treatment in the ER.
A gunshot victim should be treated sooner than a guy with a
cold, regardless of arrival time. How do we always choose the
most urgent case when new patients continue to arrive?
• key operations we want:
– add an element (print job, patient, etc.)
– get/remove the most "important" or "urgent" element
Priority Queue ADT
• priority queue: A collection of ordered elements that
provides fast access to the minimum (or maximum) element.
– add: adds in order
– peek: returns minimum or "highest priority" value
– remove: removes/returns minimum value
– isEmpty, clear, size, iterator: O(1)
pq.add("if");
pq.add("from");
...
pq.remove()   // "by"
(Figure: a priority queue holding "the", "of", "if", "to", "down", "from", "by", "she", "you", "in", "why", "him"; remove returns "by".)
Unfilled array?
• Consider using an unfilled array to implement a priority
queue.
– add: Store it in the next available index, as in a list.
– peek: Loop over elements to find minimum element.
– remove: Loop over elements to find min. Shift to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
State after the six add calls:
index:   0   1   2   3   4   5   6   7   8   9
value:   9  23   8  -3  49  12   0   0   0   0
size:    6
– How efficient is add? peek? remove?
• O(1), O(N), O(N)
• (peek must loop over the array; remove must shift elements)
Sorted array?
• Consider using a sorted array to implement a priority queue.
– add: Store it in the proper index to maintain sorted order.
– peek: Minimum element is in index [0].
– remove: Shift elements to remove min from index [0].
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
State after the six add calls:
index:   0   1   2   3   4   5   6   7   8   9
value:  -3   8   9  12  23  49   0   0   0   0
size:    6
– How efficient is add? peek? remove?
• O(N), O(1), O(N)
• (add and remove must shift elements)
Linked list?
• Consider using a doubly linked list to implement a priority
queue.
– add: Store it at the end of the linked list.
– peek: Loop over elements to find minimum element.
– remove: Loop over elements to find min. Unlink to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
(Figure: linked list holding 9, 23, 8, -3, 49, 12 from front to back.)
– How efficient is add? peek? remove?
• O(1), O(N), O(N)
Sorted linked list?
• Consider using a sorted linked list to implement a priority
queue.
– add: Store it in the proper place to maintain sorted order.
– peek: Minimum element is at the front.
– remove: Unlink front element to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
(Figure: linked list holding -3, 8, 9, 12, 23, 49 from front to back.)
– How efficient is add? peek? remove?
• O(N), O(1), O(1)
Binary search tree?
• Consider using a binary search tree to implement a PQ.
– add: Store it in the proper BST L/R-ordered spot.
– peek: Minimum element is at the far left edge of the tree.
– remove: Unlink far left element to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
(Figure: BST with root 9; left subtree 8 with left child -3; right subtree 23 with children 12 and 49.)
– How efficient is add? peek? remove?
• O(log N), O(log N), O(log N)...?
• (good in theory, but the tree tends to become unbalanced to the
Unbalanced binary tree
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
queue.add(16);
queue.add(34);
queue.remove();
queue.remove();
queue.add(42);
queue.add(45);
queue.remove();
(Figure: a tree containing 12, 23, 16, 49, 34, 42, 45.)
– Simulate these operations. What is the tree's shape?
– A tree that is unbalanced has a height close to N rather than log
N, which breaks the expected runtime of many operations.
Heaps
• heap: A complete binary tree with vertical ordering.
– complete tree: Every level is full except possibly the lowest
level, which must be filled from left to right
• (i.e., a node may not have any children until all possible siblings exist)
Heap ordering
• heap ordering: P ≤ X for every element X with parent P.
– A parent's value is never larger than those of its children.
– Implies that the minimum element is always the root (a "min-heap").
• variation: a "max-heap" stores the largest element at the root and
reverses the ordering
– Is a heap a BST? How are they related?
Which are min-heaps?
(Figure: example binary trees to classify as min-heaps or not; four of them are labeled "no".)
Which are max-heaps?
(Figure: example binary trees to classify as max-heaps or not; two of them are labeled "no".)
Heap height and runtime
• The height of a complete tree with N nodes is always $\lfloor \log_2 N \rfloor$.
– How do we know this for sure?
• Because of this, if we implement a priority queue using a
heap, we can provide the following runtime guarantees:
– add: O(log N)
– peek: O(1)
– remove: O(log N)
• An n-node complete tree of height h satisfies $2^h \le n \le 2^{h+1} - 1$, so $h = \lfloor \log_2 n \rfloor$.
The add operation
• When an element is added to a heap, where should it go?
– Must insert a new node while maintaining heap properties.
– queue.add(15);
(Figure: a min-heap with root 10 containing 20, 80, 40, 60, 85, 99, 50, 700, 65; a new node 15 is to be added.)
The add operation
• When an element is added to a heap, it should be initially
placed as the rightmost leaf (to maintain the completeness
property).
– But the heap ordering property becomes broken!
(Figure: the heap before and after 15 is placed as the new rightmost leaf on the bottom level; 15 is now smaller than its parent.)
"Bubbling up" a node
• bubble up: To restore heap ordering, the newly added
element is shifted ("bubbled") up the tree until it reaches its
proper place.
– Weiss: "percolate up" by swapping with its parent
– How many bubble-ups are necessary, at most?
(Figure: the heap before and after bubbling up; 15 moves up until it sits directly below the root 10.)
Bubble-up exercise
• Draw the tree state of a min-heap after adding these elements:
– 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2
(Figure: the resulting heap, level by level: 2 / 19, 6 / 25, 42, 11, 104 / 76, 50, 55, 88, 20.)
The peek operation
• A peek on a min-heap is trivial to perform.
– because of heap properties, minimum element is always the root
– O(1) runtime
• Peek on a max-heap would be O(1) as well (return max, not
min)
(Figure: a min-heap with root 10 containing 20, 80, 40, 60, 85, 99, 50, 76, 65; peek returns 10.)
The remove operation
• When an element is removed from a heap, what should we do?
– The root is the node to remove. How do we alter the tree?
– queue.remove();
(Figure: a min-heap with root 10 containing 20, 80, 40, 60, 85, 99, 50, 700, 65.)
The remove operation
• When the root is removed from a heap, it should be initially
replaced by the rightmost leaf (to maintain completeness).
– But the heap ordering property becomes broken!
(Figure: the heap before and after the root 10 is replaced by the rightmost leaf 65; 65 is now larger than its children.)
"Bubbling down" a node
• bubble down: To restore heap ordering, the new improper
root is shifted ("bubbled") down the tree until it reaches its
proper place.
– Weiss: "percolate down" by swapping with its smaller child
(why?)
– How many bubble-downs are necessary, at most?
(Figure: the heap before and after bubbling down; the misplaced root 65 moves down past 20, 40, and 50, and 20 becomes the new root.)
Bubble-down exercise
• Suppose we have the min-heap shown below.
• Show the state of the heap tree after remove has been called 3
times, and which elements are returned by the removal.
(Figure: the min-heap, level by level: 2 / 19, 6 / 25, 42, 11, 104 / 76, 50, 55, 88, 20.)
Array heap implementation
• Though a heap is conceptually a binary tree,
since it is a complete tree, when implementing it
we actually can "cheat" and just use an array!
– index of root = 1 (leave 0 empty to simplify the math)
– for any node n at index i :
• index of n.left = 2i
• index of n.right = 2i + 1
• parent index of n?
– This array representation
is elegant and efficient (O(1))
for common tree operations.
Implementing HeapPQ
• Let's implement an int priority queue using a min-heap array.
public class HeapIntPriorityQueue
implements IntPriorityQueue {
private int[] elements;
private int size;
// constructs a new empty priority queue
public HeapIntPriorityQueue() {
elements = new int[10];
size = 0;
}
...
}
Helper methods
• Since we will treat the array as a complete tree/heap, and walk
up/down between parents/children, these methods are helpful:
// helpers for navigating indexes up/down the tree
private int parent(int index)     { return index / 2; }
private int leftChild(int index)  { return index * 2; }
private int rightChild(int index) { return index * 2 + 1; }
private boolean hasParent(int index) { return index > 1; }
private boolean hasLeftChild(int index) {
return leftChild(index) <= size;
}
private boolean hasRightChild(int index) {
return rightChild(index) <= size;
}
private void swap(int[] a, int index1, int index2) {
int temp = a[index1];
a[index1] = a[index2];
a[index2] = temp;
}
Implementing add
• Let's write the code to add an element to the heap:
public void add(int value) {
...
}
(Figure: the heap before and after adding 15; 15 starts as the rightmost leaf and bubbles up until it sits directly below the root 10.)
Implementing add
// Adds the given value to this priority queue in order.
public void add(int value) {
elements[size + 1] = value; // add as rightmost leaf
// "bubble up" as necessary to fix ordering
int index = size + 1;
boolean found = false;
while (!found && hasParent(index)) {
int parent = parent(index);
if (elements[index] < elements[parent]) {
swap(elements, index, parent(index));
index = parent(index);
} else {
found = true; // found proper location; stop
}
}
size++;
}
Resizing a heap
• What if our array heap runs out of space?
– We must enlarge it.
– When enlarging hash sets, we needed to carefully rehash the
data.
– What must we do here?
– (We can simply copy the data
into a larger array.)
Modified add code
// Adds the given value to this priority queue in order.
public void add(int value) {
// resize to enlarge the heap if necessary
if (size == elements.length - 1) {
elements = Arrays.copyOf(elements,
2 * elements.length);
}
...
}
Implementing peek
• Let's write code to retrieve the minimum element in the heap:
public int peek() {
...
}
(Figure: a min-heap with root 10 containing 15, 80, 40, 20, 85, 99, 50, 700, 65, 60.)
Implementing peek
// Returns the minimum element in this priority queue.
// precondition: queue is not empty
public int peek() {
return elements[1];
}
Implementing remove
• Let's write code to remove the minimum element in the heap:
public int remove() {
...
}
(Figure: the heap before and after the root 10 is replaced by the rightmost leaf 65.)
Implementing remove
public int remove() {
// precondition: queue is not empty
int result = elements[1];
// last leaf -> root
elements[1] = elements[size];
size--;
int index = 1;
// "bubble down" to fix ordering
boolean found = false;
while (!found && hasLeftChild(index)) {
int left = leftChild(index);
int right = rightChild(index);
int child = left;
if (hasRightChild(index) &&
elements[right] < elements[left]) {
child = right;
}
if (elements[index] > elements[child]) {
swap(elements, index, child);
index = child;
} else {
found = true; // found proper location; stop
}
}
return result;
}
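The add and remove code can be combined into one compact runnable class to check the bubble-up and bubble-down exercises. This is a sketch rather than the slides' exact code: it inlines the index helpers, throws NoSuchElementException on an empty queue (an assumption; the slides state a precondition instead), and adds a hypothetical levelOrder() accessor for inspection. The class name MinHeapSketch is also hypothetical.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch of an array-based min-heap of ints with the root at index 1.
class MinHeapSketch {
    private int[] elements = new int[10];
    private int size = 0;

    public void add(int value) {
        if (size == elements.length - 1) {
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        int index = ++size;
        elements[index] = value; // add as the rightmost leaf
        while (index > 1 && elements[index] < elements[index / 2]) {
            swap(index, index / 2); // bubble up past a larger parent
            index /= 2;
        }
    }

    public int remove() {
        if (size == 0) {
            throw new NoSuchElementException("heap is empty");
        }
        int result = elements[1];
        elements[1] = elements[size--]; // last leaf -> root
        int index = 1;
        while (2 * index <= size) {
            int child = 2 * index;
            if (child + 1 <= size && elements[child + 1] < elements[child]) {
                child++; // pick the smaller child
            }
            if (elements[index] <= elements[child]) {
                break; // found proper location; stop
            }
            swap(index, child); // bubble down
            index = child;
        }
        return result;
    }

    public int size() {
        return size;
    }

    // Heap contents in level order (array indexes 1..size).
    public int[] levelOrder() {
        return Arrays.copyOfRange(elements, 1, size + 1);
    }

    private void swap(int i, int j) {
        int temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }
}
```

Adding 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2 in that order gives the level order 2, 19, 6, 25, 42, 11, 104, 76, 50, 55, 88, 20, and three calls to remove then return 2, 6, and 11.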
Int PQ ADT interface
• Let's write our own implementation of a priority queue.
– To simplify the problem, we only store ints in our priority queue for now.
– As is (usually) done in the Java Collection Framework, we will
define priority queues as an ADT by creating an interface.
– Core operations are: add, peek (at min), remove (min).
public interface IntPriorityQueue {
    void add(int value);
    void clear();
    boolean isEmpty();
    int peek();     // return min element
    int remove();   // remove/return min element
    int size();
}
Generic PQ ADT
• Let's modify our priority queue so it can store any type of data.
– As with past collections, we will use Java generics (a type
parameter).
public interface PriorityQueue<E> {
    void add(E value);
    void clear();
    boolean isEmpty();
    E peek();     // return min element
    E remove();   // remove/return min element
    int size();
}
Generic HeapPQ class
• We can modify our heap priority queue class to use generics as
usual...
public class HeapPriorityQueue<E>
implements PriorityQueue<E> {
private E[] elements;
private int size;
// constructs a new empty priority queue
public HeapPriorityQueue() {
elements = (E[]) new Object[10];
size = 0;
}
...
}
Problem: ordering elements
// Adds the given value to this priority queue in order.
public void add(E value) {
...
int index = size + 1;
boolean found = false;
while (!found && hasParent(index)) {
int parent = parent(index);
if (elements[index] < elements[parent]) {   // error
swap(elements, index, parent(index));
index = parent(index);
} else {
found = true; // found proper location; stop
}
}
}
– Even changing the < to a compareTo call does not work.
• Java cannot be sure that type E has a compareTo method.
Comparing objects
• Heaps rely on being able to order their elements.
• Operators like < and > do not work with objects in Java.
– But we do think of some types as having an ordering (e.g.
Dates).
– (In other languages, we can enable <, > with operator
overloading.)
• natural ordering: Rules governing the relative placement of
all values of a given type.
– Implies a notion of equality (like equals) but also < and > .
– total ordering: All elements can be arranged in A ≤ B ≤ C ≤ ...
order.
– The Comparable interface provides a natural ordering.
The Comparable interface
• The standard way for a Java class to define a comparison
function for its objects is to implement the Comparable
interface.
public interface Comparable<T> {
public int compareTo(T other);
}
• A call of A.compareTo(B) should return:
a value < 0 if A comes "before" B in the ordering,
a value > 0 if A comes "after" B in the ordering,
or exactly 0 if A and B are considered "equal" in the ordering.
• Effective Java Tip #12: Consider implementing Comparable.
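As a small illustration of the compareTo contract, the sketch below defines a class ordered by a numeric priority. The PrintJob class, its fields, and the convention that a smaller number is more urgent are all hypothetical, loosely based on the print job example from the prioritization slide.

```java
import java.util.PriorityQueue;

// Hypothetical class with a natural ordering: lower priority number comes first.
final class PrintJob implements Comparable<PrintJob> {
    private final String owner;
    private final int priority; // assumption: smaller number = more urgent

    PrintJob(String owner, int priority) {
        this.owner = owner;
        this.priority = priority;
    }

    String owner() {
        return owner;
    }

    @Override
    public int compareTo(PrintJob other) {
        // < 0 if this comes before other, > 0 if after, 0 if tied
        return Integer.compare(priority, other.priority);
    }

    public static void main(String[] args) {
        PriorityQueue<PrintJob> queue = new PriorityQueue<>();
        queue.add(new PrintJob("student", 3));
        queue.add(new PrintJob("faculty", 1));
        queue.add(new PrintJob("staff", 2));
        System.out.println(queue.remove().owner()); // faculty
    }
}
```

Integer.compare is used instead of subtracting the two priorities, since subtraction can overflow for values far apart.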
Bounded type parameters
<Type extends SuperType>
– An upper bound; accepts the given supertype or any of its
subtypes.
– Works for multiple superclasses/interfaces with & :
<Type extends ClassA & InterfaceB & InterfaceC & ...>
<? super Type>
– A lower bound; accepts the given type or any of its
supertypes.
– Lower bounds are allowed only on wildcards (?), not on named
type parameters.
• Example:
// can be instantiated with any animal type
public class Nest<T extends Animal> {
...
}
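An upper bound can also be placed on the type parameter of a single method. The following is a minimal sketch; the class name BoundedTypeDemo and the method largest are made up for this example, and the method assumes a non-empty list.

```java
import java.util.Arrays;
import java.util.List;

class BoundedTypeDemo {
    // T must implement Comparable<T>, so the compiler knows that
    // every T has a compareTo method. Assumes values is non-empty.
    static <T extends Comparable<T>> T largest(List<T> values) {
        T best = values.get(0);
        for (T v : values) {
            if (v.compareTo(best) > 0) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largest(Arrays.asList(42, 7, 91, 60)));       // 91
        System.out.println(largest(Arrays.asList("to", "be", "she")));   // to
    }
}
```

Without the bound, the call v.compareTo(best) would not compile, which is the same problem the heap's add method runs into.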
Corrected HeapPQ class
public class HeapPriorityQueue<E extends Comparable<E>>
        implements PriorityQueue<E> {
    private E[] elements;
    private int size;

    // constructs a new empty priority queue
    public HeapPriorityQueue() {
        // E now erases to Comparable, so the array must be a Comparable[];
        // casting an Object[] to E[] here would fail at runtime
        elements = (E[]) new Comparable[10];
        size = 0;
    }
    ...
    public void add(E value) {
        ...
        while (...) {
            if (elements[index].compareTo(elements[parent]) < 0) {
                swap(...);
            }
        }
    }
}
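For reference, here is one way the elided parts could be filled in. It is a sketch under assumptions, not the textbook's class: the name MinHeapSketch, the resizing policy, and the use of index / 2 in place of the parent helper are choices made for this example. It keeps the root at index 1 to match the index = size + 1 convention used in add.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch only: a min-heap with the root at index 1 (index 0 is unused).
class MinHeapSketch<E extends Comparable<E>> {
    private E[] elements;
    private int size;

    @SuppressWarnings("unchecked")
    MinHeapSketch() {
        elements = (E[]) new Comparable[10];   // Comparable[], since E erases to Comparable
        size = 0;
    }

    // Adds the value at the end, then bubbles it up past larger parents.
    void add(E value) {
        if (size + 1 == elements.length) {
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        int index = size + 1;
        elements[index] = value;
        size++;
        while (index > 1 && elements[index].compareTo(elements[index / 2]) < 0) {
            swap(index, index / 2);
            index = index / 2;
        }
    }

    // Removes and returns the minimum, then bubbles the moved element down.
    E remove() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        E min = elements[1];
        elements[1] = elements[size];
        elements[size] = null;
        size--;
        int index = 1;
        while (2 * index <= size) {
            int child = 2 * index;
            if (child + 1 <= size
                    && elements[child + 1].compareTo(elements[child]) < 0) {
                child++;   // the right child is the smaller one
            }
            if (elements[index].compareTo(elements[child]) <= 0) {
                break;     // heap order is restored
            }
            swap(index, child);
            index = child;
        }
        return min;
    }

    boolean isEmpty() {
        return size == 0;
    }

    private void swap(int i, int j) {
        E temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }
}

class MinHeapSketchDemo {
    public static void main(String[] args) {
        MinHeapSketch<String> pq = new MinHeapSketch<String>();
        for (String s : new String[] {"meet", "you", "sir", "hello"}) {
            pq.add(s);
        }
        while (!pq.isEmpty()) {
            System.out.print(pq.remove() + " ");   // hello meet sir you
        }
    }
}
```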
Ordering and Comparators
What's the "natural" order?
public class Rectangle implements Comparable<Rectangle> {
    private int x, y, width, height;

    public int compareTo(Rectangle other) {
        // ...?
    }
}
• What is the "natural ordering" of rectangles?
– By x, breaking ties by y?
– By width, breaking ties by height?
– By area? By perimeter?
• Do rectangles have any "natural" ordering?
– Might we want to arrange rectangles into some order anyway?
Comparator interface
public interface Comparator<T> {
    public int compare(T first, T second);
}
• A Comparator is an external object that specifies a
comparison function over some other type of objects.
– Allows you to define multiple orderings for the same type.
– Allows you to define an ordering for a type even if there
is no obvious "natural" ordering for that type.
– Allows you to externally define an ordering for a class that, for
whatever reason, you are not able to modify to make it
Comparable:
• a class that is part of the Java class libraries
• a class that is final and can't be extended
• a class from another library or author, that you don't control
Comparator examples
public class RectangleAreaComparator
        implements Comparator<Rectangle> {
    // compare in ascending order by area (WxH)
    public int compare(Rectangle r1, Rectangle r2) {
        // Integer.compare avoids the overflow that subtraction can cause
        return Integer.compare(r1.getArea(), r2.getArea());
    }
}

public class RectangleXYComparator
        implements Comparator<Rectangle> {
    // compare by ascending x, break ties by y
    public int compare(Rectangle r1, Rectangle r2) {
        if (r1.getX() != r2.getX()) {
            return Integer.compare(r1.getX(), r2.getX());
        } else {
            return Integer.compare(r1.getY(), r2.getY());
        }
    }
}
Using Comparators
• TreeSet, TreeMap, and PriorityQueue can use a Comparator:
Comparator<Rectangle> comp = new RectangleAreaComparator();
Set<Rectangle> set = new TreeSet<Rectangle>(comp);
Queue<Rectangle> pq = new PriorityQueue<Rectangle>(10, comp);
• Searching and sorting methods can accept Comparators.
Arrays.binarySearch(array, value, comparator)
Arrays.sort(array, comparator)
Collections.binarySearch(list, value, comparator)
Collections.max(collection, comparator)
Collections.min(collection, comparator)
Collections.sort(list, comparator)
• Methods are provided to reverse an ordering:
Collections.reverseOrder()             // reverse of the natural ordering
Collections.reverseOrder(comparator)   // reverse of the given Comparator
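The calls listed above can be tried out with a comparator that already exists in the Java class libraries, String.CASE_INSENSITIVE_ORDER. The sketch below is illustrative only; the class name ComparatorUsageDemo and the helper method descending are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparatorUsageDemo {
    // Returns a new list holding the words in descending order under comp.
    static List<String> descending(List<String> words, Comparator<String> comp) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, Collections.reverseOrder(comp));
        return copy;
    }

    public static void main(String[] args) {
        Comparator<String> ignoreCase = String.CASE_INSENSITIVE_ORDER;
        String[] words = {"to", "Be", "she", "Him", "of"};

        Arrays.sort(words, ignoreCase);
        System.out.println(Arrays.toString(words));   // [Be, Him, of, she, to]

        // binarySearch requires the list to be sorted by the same comparator
        List<String> list = Arrays.asList(words);
        System.out.println(Collections.binarySearch(list, "HIM", ignoreCase));   // 1
        System.out.println(Collections.max(list, ignoreCase));                    // to
        System.out.println(descending(list, ignoreCase));   // [to, she, of, Him, Be]

        Set<String> set = new TreeSet<String>(ignoreCase);
        set.add("you");
        set.add("YOU");   // treated as a duplicate under this ordering
        System.out.println(set);   // [you]
    }
}
```

Note that a TreeSet built with a Comparator uses that comparator, not equals, to decide what counts as a duplicate.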
PQ and Comparator
• Our heap priority queue currently relies on the Comparable
natural ordering of its elements:
public class HeapPriorityQueue<E extends Comparable<E>>
implements PriorityQueue<E> {
...
public HeapPriorityQueue() {...}
}
• To allow other orderings, we can add a constructor that
accepts a Comparator so clients can arrange elements in any
order:
...
public HeapPriorityQueue(Comparator<E> comp) {...}
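One possible way to support both constructors is to route every comparison in the heap through a single helper. The sketch below shows only that ordering logic, not a full priority queue; the class name OrderingSketch, the field comp, and the convention that null means "use the natural ordering" are assumptions made for this example.

```java
import java.util.Collections;
import java.util.Comparator;

// Sketch of the ordering logic only (no heap storage).
class OrderingSketch<E extends Comparable<E>> {
    private final Comparator<E> comp;   // null means "use the natural ordering"

    OrderingSketch() {
        this(null);
    }

    OrderingSketch(Comparator<E> comp) {
        this.comp = comp;
    }

    // add and remove would call this instead of calling compareTo directly
    int compare(E a, E b) {
        if (comp == null) {
            return a.compareTo(b);
        } else {
            return comp.compare(a, b);
        }
    }
}

class OrderingSketchDemo {
    public static void main(String[] args) {
        OrderingSketch<Integer> natural = new OrderingSketch<Integer>();
        OrderingSketch<Integer> reversed =
                new OrderingSketch<Integer>(Collections.<Integer>reverseOrder());
        System.out.println(natural.compare(3, 5) < 0);    // true: 3 before 5
        System.out.println(reversed.compare(3, 5) > 0);   // true: 3 after 5
    }
}
```

With a reversed comparator, the same min-heap code behaves as a max-heap, since "smallest" is now defined by the comparator.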
PQ Comparator exercise
• Write code that stores strings in a priority queue and reads
them back out in ascending order by length.
– If two strings are the same length, break the tie by ABC order.
Queue<String> pq = new PriorityQueue<String>(...);
pq.add("you");
pq.add("meet");
pq.add("madam");
pq.add("sir");
pq.add("hello");
pq.add("goodbye");
while (!pq.isEmpty()) {
System.out.print(pq.remove() + " ");
}
// sir you meet hello madam goodbye
PQ Comparator answer
• Use the following comparator class to organize the strings:
public class LengthComparator
        implements Comparator<String> {
    public int compare(String s1, String s2) {
        if (s1.length() != s2.length()) {
            // if lengths are unequal, compare by length
            return s1.length() - s2.length();
        } else {
            // break ties by ABC order
            return s1.compareTo(s2);
        }
    }
}
...
Queue<String> pq = new PriorityQueue<String>(100,
        new LengthComparator());
Heap sort
• heap sort: An algorithm to sort an array of N elements by
turning the array into a heap, then calling remove N times.
– The elements will come out in sorted order.
– We can put them into a new sorted array.
– What is the runtime?
Heap sort implementation
public static void heapSort(int[] a) {
    PriorityQueue<Integer> pq =
            new HeapPriorityQueue<Integer>();
    for (int n : a) {
        pq.add(n);   // add each element, not the array itself
    }
    for (int i = 0; i < a.length; i++) {
        a[i] = pq.remove();
    }
}
– This code is correct and runs in O(N log N) time but wastes
memory.
– It makes an entire copy of the array a into the internal heap of
the priority queue.
– Can we perform a heap sort without making a copy of a?
Improving the code
• Idea: Treat a itself as a max-heap, whose data starts at 0 (not
1).
– a is not actually in heap order.
– But if you repeatedly "bubble down" each non-leaf node, starting
from the last one, you will eventually have a proper heap.
• Now that a is a valid max-heap:
– Call remove repeatedly until the heap is empty.
– But make it so that when an element is "removed", it is moved to
the end of the array instead of completely evicted from the array.
– When you are done, voila! The array is sorted.
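The two phases described above can be sketched as follows. This is one possible implementation, not necessarily the textbook's; the names InPlaceHeapSort, buildHeap, and bubbleDown are chosen for this example. With the root at index 0, the children of index i are at 2i + 1 and 2i + 2.

```java
import java.util.Arrays;

class InPlaceHeapSort {
    // Sorts a in ascending order without using a second array.
    static void heapSort(int[] a) {
        buildHeap(a);
        // "Remove" the max repeatedly: swap it to the end of the heap,
        // shrink the heap by one, and bubble the new root down.
        for (int end = a.length - 1; end > 0; end--) {
            swap(a, 0, end);
            bubbleDown(a, 0, end);
        }
    }

    // Turns a into a max-heap by bubbling down each non-leaf node,
    // starting from the last one.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
    }

    // Moves a[index] down within a[0 .. size-1], swapping it with its
    // larger child until it is at least as large as both children.
    static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;   // the right child is the larger one
            }
            if (a[index] >= a[child]) {
                break;
            }
            swap(a, index, child);
            index = child;
        }
    }

    static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```

A max-heap is used (rather than a min-heap) so that each removed maximum lands in the slot just vacated at the end of the heap, which leaves the array in ascending order.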
Step 1: Build heap in-place
• "Bubble" down non-leaf nodes until the array is a max-heap:
– int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
– Swap each node with its larger child as needed.

                21
        66              40
    10      70      81      30
  22  45  95  88  38

index   0   1   2   3   4   5   6   7   8   9  10  11  12  ...
value  21  66  40  10  70  81  30  22  45  95  88  38   0  ...
size   12
Build heap in-place answer
– 30: nothing to do
– 81: nothing to do
– 70: swap with 95
– 10: swap with 45
– 40: swap with 81
– 66: swap with 95, then 88 (66 ends at index 10)
– 21: swap with 95, then 88, then 70 (21 ends at index 9)

                95
        88              81
    45      70      40      30
  22  10  21  66  38

index   0   1   2   3   4   5   6   7   8   9  10  11  12  ...
value  95  88  81  45  70  40  30  22  10  21  66  38   0  ...
size   12
Remove to sort
• Now that we have a max-heap, remove elements repeatedly
until we have a sorted array.
– Move each removed element to the end, rather than tossing it.

                95
        88              81
    45      70      40      30
  22  10  21  66  38

index   0   1   2   3   4   5   6   7   8   9  10  11  12  ...
value  95  88  81  45  70  40  30  22  10  21  66  38   0  ...
size   12
Remove to sort answer
– 95: move 38 up, swap with 88, 70, 66
– 88: move 38 up, swap with 81, 40
– 81: move 21 up, swap with 70, 66
– 70: move 10 up, swap with 66, 45, 22
– ...
– (Notice that after 4 removes, the last 4 elements in the
array are sorted. If we remove every element, the entire
array will be sorted.)

                66
        45              40
    22      21      38      30
  10  70  81  88  95

index   0   1   2   3   4   5   6   7   8   9  10  11  12  ...
value  66  45  40  22  21  38  30  10  70  81  88  95   0  ...
size    8   (indexes 8 to 11 now hold the sorted values 70, 81, 88, 95)