Building Java Programs
Chapter 18
Advanced Data Structures: Hashing and Heaps
Copyright (c) Pearson 2013. All rights reserved.

Hashing
Reading: 18.1

Recall: ADTs
• abstract data type (ADT): A specification of a collection of data and the operations that can be performed on it.
  – Describes what a collection does, not how it does it.
• Java's collection framework describes ADTs with interfaces:
  – Collection, Deque, List, Map, Queue, Set, SortedMap
• An ADT can be implemented in multiple ways by classes:
  – ArrayList and LinkedList implement List
  – HashSet and TreeSet implement Set
  – LinkedList, ArrayDeque, etc. implement Queue

SearchTree as a set
• We implemented a class SearchTree to store a BST of ints.
  [Figure: a BST referenced by overallRoot, containing 55, 29, -3, 87, 42, 60, 91]
• Our BST is essentially a set of integers. Operations we support:
  – add
  – contains
  – remove
  ...
• But there are other ways to implement a set...

Sets
• set: A collection of unique values (no duplicates allowed) that can perform the following operations efficiently:
  – add, remove, search (contains)
  – The client doesn't think of a set as having indexes; we just add things to the set in general and don't worry about order.
  [Figure: a set containing "if", "the", "to", "of", "down", "from", "by", "she", "you", "in", "why", "him"; set.contains("to") returns true, set.contains("be") returns false]

Int Set ADT interface
• Let's think about how to write our own implementation of a set.
  – To simplify the problem, we only store ints in our set for now.
  – As is (usually) done in the Java Collection Framework, we will define sets as an ADT by creating a Set interface.
  – Core operations are: add, contains, remove.

    public interface IntSet {
        void add(int value);
        boolean contains(int value);
        void clear();
        boolean isEmpty();
        void remove(int value);
        int size();
    }

Unfilled array set
• Consider storing a set in an unfilled array.
  – It doesn't really matter what order the elements appear in a set, so long as they can be added and searched quickly.
  – What would make a good ordering for the elements?
• If we store them in the next available index, as in a list, ...
  – set.add(9); set.add(23); set.add(8); set.add(-3); set.add(49); set.add(12);

    index: 0   1   2   3   4   5   6   7   8   9
    value: 9   23  8   -3  49  12  0   0   0   0
    size:  6

  – How efficient is add? contains? remove?
    • O(1), O(N), O(N)
    • (contains must loop over the array; remove must shift elements.)

Sorted array set
• Suppose we store the elements in an unfilled array, but in sorted order rather than order of insertion.
  – set.add(9); set.add(23); set.add(8); set.add(-3); set.add(49); set.add(12);

    index: 0   1   2   3   4   5   6   7   8   9
    value: -3  8   9   12  23  49  0   0   0   0
    size:  6

  – How efficient is add? contains? remove?
    • O(N), O(log N), O(N)
    • (You can do an O(log N) binary search to find elements in contains, and to find the proper index in add/remove; but add/remove still need to shift elements right/left to make room, which is O(N) on average.)

A strange idea
• Silly idea: When client adds value i, store it at index i in the array.
  – Would this work?
  – Problems / drawbacks of this approach? How to work around them?

    set.add(7); set.add(1); set.add(9);

    index: 0  1  2  3  4  5  6  7  8  9
    value: 0  1  0  0  0  0  0  7  0  9
    size:  3

    ...
    set.add(18); set.add(12);

    index: 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
    value: 0  1  0  0  0  0  0  7  0  9  0   0   12  0   0   0   0   0   18  0
    size:  5

Hashing
• hash: To map a large domain of values to a smaller fixed domain.
  – Typically, mapping a set of elements to integer indexes in an array.
  – Idea: Store any given element value in a particular predictable index.
    • That way, adding / removing / looking for it are constant-time (O(1)).
  – hash table: An array that stores elements via hashing.
• hash function: An algorithm that maps values to indexes.
  – hash code: The output of a hash function for a given value.
  – In the previous slide, our "hash function" was: hash(i) → i
    • Potentially requires a large array (a.length > i).
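The identity mapping hash(i) → i can be tried out directly. The following is a minimal sketch, not code from the slides: the class name DirectIndexDemo and its helper methods are hypothetical, and it assumes, as the slides do, that 0 marks an empty slot.

```java
import java.util.Arrays;

class DirectIndexDemo {
    // The identity "hash function" from the slide: a value is its own index.
    static int hash(int i) {
        return i;
    }

    // Stores each value at index hash(value); 0 marks an empty slot.
    static int[] addAll(int length, int... values) {
        int[] elements = new int[length];
        for (int value : values) {
            // Throws ArrayIndexOutOfBoundsException if value < 0 or value >= length.
            elements[hash(value)] = value;
        }
        return elements;
    }

    // A lookup is a single array access, so it runs in O(1) time.
    static boolean contains(int[] elements, int value) {
        return elements[hash(value)] == value;
    }

    public static void main(String[] args) {
        int[] elements = addAll(10, 7, 1, 9);
        System.out.println(Arrays.toString(elements));
        System.out.println(contains(elements, 9));
        System.out.println(contains(elements, 4));
        // Storing 18 needs a table with at least 19 slots.
        System.out.println(Arrays.toString(addAll(19, 7, 1, 9, 18, 12)));
    }
}
```

The sketch makes the drawback concrete: the array must be longer than the largest value stored, and negative values cannot be stored at all.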
Improved hash function
• To deal with negative numbers: hash(i) → abs(i)
• To deal with large numbers: hash(i) → abs(i) % length

    set.add(37);    // abs(37) % 10 == 7
    set.add(-2);    // abs(-2) % 10 == 2
    set.add(49);    // abs(49) % 10 == 9

    index: 0  1  2   3  4  5  6  7   8  9
    value: 0  0  -2  0  0  0  0  37  0  49
    size:  3

    // inside HashIntSet class
    private int hash(int i) {
        return Math.abs(i) % elements.length;
    }

Sketch of implementation

    public class HashIntSet implements IntSet {
        private int[] elements;
        ...
        public void add(int value) {
            elements[hash(value)] = value;
        }
        public boolean contains(int value) {
            return elements[hash(value)] == value;
        }
        public void remove(int value) {
            elements[hash(value)] = 0;
        }
    }

  – Runtime of add, contains, and remove: O(1) !!
    • Are there any problems with this approach?

Collisions
• collision: When a hash function maps 2 values to the same index.

    set.add(11);
    set.add(49);
    set.add(24);
    set.add(37);
    set.add(54);    // collides with 24!

    index: 0  1   2  3  4   5  6  7   8  9
    value: 0  11  0  0  54  0  0  37  0  49
    size:  5

• collision resolution: An algorithm for fixing collisions.

Probing
• probing: Resolving a collision by moving to another index.
  – linear probing: Moves to the next available index (wraps if needed).

    set.add(11);
    set.add(49);
    set.add(24);
    set.add(37);
    set.add(54);    // collides with 24; must probe

    index: 0  1   2  3  4   5   6  7   8  9
    value: 0  11  0  0  24  54  0  37  0  49
    size:  5

  – variation: quadratic probing moves increasingly far away: +1, +4, +9, ...

Implementing HashIntSet
• Let's implement an int set using a hash table with linear probing.
  – For simplicity, assume that the set cannot store 0s for now.

    public class HashIntSet implements IntSet {
        private int[] elements;
        private int size;

        // constructs new empty set
        public HashIntSet() {
            elements = new int[10];
            size = 0;
        }

        // hash function maps values to indexes
        private int hash(int value) {
            return Math.abs(value) % elements.length;
        }
        ...

The add operation
• How do we add an element to the hash table?
  – Use the hash function to find the proper bucket index.
  – If we see a 0, put it there.
  – If not, move forward until we find an empty (0) index to store it.
  – If we see that the value is already in the table, don't re-add it.

    // client code
    set.add(54);
    set.add(14);

    index: 0  1   2  3  4   5   6   7   8  9
    value: 0  11  0  0  24  54  14  37  0  49
    size:  6

Implementing add
• How do we add an element to the hash table?

    public void add(int value) {
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {    // linear probing
            h = (h + 1) % elements.length;                    // for empty slot
        }
        if (elements[h] != value) {                           // avoid duplicates
            elements[h] = value;
            size++;
        }
    }

    index: 0  1   2  3  4   5   6  7   8  9
    value: 0  11  0  0  24  54  0  37  0  49
    size:  5

The contains operation
• How do we search for an element in the hash table?
  – Use the hash function to find the proper bucket index.
  – Loop forward until we either find the value, or an empty index (0).
  – If we find the value, it is contained (true). If we find 0, it is not (false).

    set.contains(24)    // true
    set.contains(14)    // true
    set.contains(35)    // false

    index: 0  1   2  3  4   5   6   7   8  9
    value: 0  11  0  0  24  54  14  37  0  49
    size:  6

Implementing contains

    public boolean contains(int value) {
        int h = hash(value);
        while (elements[h] != 0) {
            if (elements[h] == value) {       // linear probing
                return true;                  // to search
            }
            h = (h + 1) % elements.length;
        }
        return false;                         // not found
    }

    index: 0  1   2  3  4   5   6  7   8  9
    value: 0  11  0  0  24  54  0  37  0  49
    size:  5

The remove operation
• We cannot remove by simply zeroing out an element:

    set.remove(54);     // set index 5 to 0
    set.contains(14)    // false???
    // oops: the probe for 14 now stops at the zeroed index 5

    index: 0  1   2  3  4   5  6   7   8  9
    value: 0  11  0  0  24  0  14  34  0  49
    size:  5

• Instead, we replace it by a special "removed" placeholder value
  – (can be re-used on add, but keep searching on contains)

    index: 0  1   2  3  4   5   6   7   8  9
    value: 0  11  0  0  24  XX  14  34  0  49
    size:  5

Implementing remove

    public void remove(int value) {
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        if (elements[h] == value) {
            elements[h] = -999;    // "removed" flag value
            size--;
        }
    }

    // client code
    set.remove(54);
    set.remove(11);
    set.remove(34);

    (state after set.remove(54):)
    index: 0  1   2  3  4   5     6   7   8  9
    value: 0  11  0  0  24  -999  14  34  0  49
    size:  5

Patching add, contains

    private static final int REMOVED = -999;

    public void add(int value) {
        if (contains(value)) {
            return;    // value may sit past a REMOVED slot; avoid a duplicate
        }
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != REMOVED) {
            h = (h + 1) % elements.length;
        }
        elements[h] = value;    // re-use the first empty or REMOVED slot
        size++;
    }

    // contains does not need patching;
    // it should keep going on a -999, which it already does
    public boolean contains(int value) {
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        return elements[h] == value;
    }

Problem: full array
• clustering: Clumps of elements at neighboring indexes.
  – Slows down the hash table lookup; you must loop through them.

    set.add(11);
    set.add(49);
    set.add(24);
    set.add(37);
    set.add(54);    // collides with 24
    set.add(14);    // collides with 24, then 54
    set.add(86);    // collides with 14, then 37

    index: 0  1  2  3  4  5  6  7  8  9
    value: 0  0  0  0  0  0  0  0  0  0    (fill in as an exercise)

• Where does each value go in the array?
• How many indexes must be examined to answer contains(94)?
• What will happen if the array completely fills?

Rehashing
• rehash: Growing to a larger array when the table is too full.
  – Cannot simply copy the old array to a new one. (Why not?)
• load factor: ratio of (# of elements) / (hash table length)
  – many collections rehash when load factor ≅ .75

    before:
    index: 0   1   2  3  4   5   6   7   8   9
    value: 95  11  0  0  24  54  14  37  66  48
    size:  8

    after:
    index: 0  1  2  3  4   5  6   7  8   9  10  11  12  13  14  15  16  17  18  19
    value: 0  0  0  0  24  0  66  0  48  0  0   11  0   0   54  95  14  37  0   0
    size:  8

Implementing rehash

    // Grows hash table to twice its original size.
    private void rehash() {
        int[] old = elements;
        elements = new int[2 * old.length];
        size = 0;
        for (int value : old) {
            if (value != 0 && value != REMOVED) {
                add(value);
            }
        }
    }

    public void add(int value) {
        if ((double) size / elements.length >= 0.75) {
            rehash();
        }
        ...
    }

Hash table sizes
• Can use prime numbers as hash table sizes to reduce collisions.
• Also improves spread / reduces clustering on rehash.

    set.add(11);     // 11 % 13 == 11
    set.add(39);     // 39 % 13 == 0
    set.add(21);     // 21 % 13 == 8
    set.add(29);     // 29 % 13 == 3
    set.add(71);     // 71 % 13 == 6
    set.add(41);     // 41 % 13 == 2
    set.add(101);    // 101 % 13 == 10

    index: 0   1  2   3   4  5  6   7  8   9  10   11  12
    value: 39  0  41  29  0  0  71  0  21  0  101  11  0
    size:  7

Other details
• How would we implement toString on our HashIntSet?

    index: 0  1   2  3  4   5   6  7   8  9
    value: 0  11  0  0  24  54  0  37  0  49
    size:  5

    System.out.println(set);
    // [11, 24, 54, 37, 49]

Separate chaining
• separate chaining: Solving collisions by storing a list at each index.
  – add/contains/remove must traverse lists, but the lists are short
  – impossible to "run out" of indexes, unlike with probing

    [Figure: array of 10 buckets; index 1 holds 11; index 4 holds the list 24, 54, 14; index 7 holds 7; index 9 holds 49; all other buckets are empty]

    private class Node {
        public int data;
        public Node next;
        ...
    }

Implementing HashIntSet
• Let's implement a hash set of ints using separate chaining.
    public class HashIntSet implements IntSet {
        // array of linked lists;
        // elements[i] = front of list #i (null if empty)
        private Node[] elements;
        private int size;

        // constructs new empty set
        public HashIntSet() {
            elements = new Node[10];
            size = 0;
        }

        // hash function maps values to indexes
        private int hash(int value) {
            return Math.abs(value) % elements.length;
        }
        ...

The add operation
• How do we add an element to the hash table?
  – When you want to modify a linked list, you must either change the list's front reference, or the next field of a node in the list.
  – Where in the list should we add the new element?
  – Must make sure to avoid duplicates.
  – set.add(24);

    [Figure: index 1 holds 11; index 4 holds the list 54, 14, with the new node 24 being attached at its front; index 7 holds 7; index 9 holds 49]

Implementing add

    public void add(int value) {
        if (!contains(value)) {
            int h = hash(value);              // add to front
            Node newNode = new Node(value);   // of list #h
            newNode.next = elements[h];
            elements[h] = newNode;
            size++;
        }
    }

The contains operation
• How do we search for an element in the hash table?
  – Must loop through the linked list for the appropriate hash index, looking for the desired value.
  – Looping through a linked list requires a "current" node reference.

    set.contains(14)    // true
    set.contains(84)    // false
    set.contains(53)    // false

    [Figure: index 1 holds 11; index 4 holds the list 24, 54, 14, traversed by a "current" reference; index 7 holds 7; index 9 holds 49]

Implementing contains

    public boolean contains(int value) {
        Node current = elements[hash(value)];
        while (current != null) {
            if (current.data == value) {
                return true;
            }
            current = current.next;
        }
        return false;
    }

The remove operation
• How do we remove an element from the hash table?
  – Cases to consider: front (24), non-front (14), not found (94), null (32)
  – To remove a node from a linked list, you must either change the list's front reference, or the next field of the previous node in the list.
  – set.remove(54);

    [Figure: index 1 holds 11; index 4 holds the list 24, 54, 14, with a "current" reference on the node before 54; index 7 holds 7; index 9 holds 49]

Implementing remove

    public void remove(int value) {
        int h = hash(value);
        if (elements[h] != null && elements[h].data == value) {
            elements[h] = elements[h].next;    // front case
            size--;
        } else {
            Node current = elements[h];        // non-front case
            while (current != null && current.next != null) {
                if (current.next.data == value) {
                    current.next = current.next.next;
                    size--;
                    return;
                }
                current = current.next;
            }
        }
    }

Rehashing w/ chaining
• Separate chaining handles rehashing similarly to linear probing.
  – Loop over the list in each hash bucket; re-add each element.
  – An optimal implementation re-uses node objects, but this is optional.

    [Figure: before, 10 buckets: index 1 holds 11; index 4 holds 24, 54, 14; index 7 holds 7; index 9 holds 49. After, 20 buckets: index 4 holds 24; index 7 holds 7; index 9 holds 49; index 11 holds 11; index 14 holds 54, 14]

Hash set of objects

    public class HashSet<E> implements Set<E> {
        ...
        private class Node {
            public E data;
            public Node next;
        }
    }

• It is easy to hash an integer i (use index abs(i) % length).
  – How can we hash other types of values (such as objects)?

The hashCode method
• All Java objects contain the following method:

    public int hashCode()
    Returns an integer hash code for this object.

  – We can call hashCode on any object to find its preferred index.
  – HashSet, HashMap, and the other built-in "hash" collections call hashCode internally on their elements to store the data.
• We can modify our set's hash function to be the following:

    private int hash(E e) {
        return Math.abs(e.hashCode()) % elements.length;
    }

Issues with generics
• You must make an unusual cast on your array of generic nodes:

    public class HashSet<E> implements Set<E> {
        private Node[] elements;
        ...
        public HashSet() {
            elements = (Node[]) new HashSet.Node[10];
        }

• Perform all element comparisons using equals:

    public boolean contains(E value) {
        ...
        // if (current.data == value) {
        if (current.data.equals(value)) {
            return true;
        }
        ...
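One edge case in the generic hash function is worth checking: Math.abs(Integer.MIN_VALUE) is still negative, so an object whose hashCode is Integer.MIN_VALUE would produce a negative index. The sketch below compares the slides' formula with Math.floorMod (available since Java 8). Using floorMod is an alternative suggested here, not something the slides do, and the class and method names are hypothetical.

```java
class BucketIndexDemo {
    // The formula from the slides. Math.abs(Integer.MIN_VALUE) overflows
    // and stays negative, so the result can be a negative index.
    static int hashWithAbs(Object e, int length) {
        return Math.abs(e.hashCode()) % length;
    }

    // Alternative (an assumption, not from the slides): Math.floorMod
    // always returns a value in 0..length-1 when length is positive.
    static int hashWithFloorMod(Object e, int length) {
        return Math.floorMod(e.hashCode(), length);
    }

    public static void main(String[] args) {
        // An Integer's hashCode is its own int value.
        Integer tricky = Integer.MIN_VALUE;
        System.out.println(hashWithAbs(tricky, 10));        // negative: unusable as an index
        System.out.println(hashWithFloorMod(tricky, 10));   // a valid index
        // For non-negative hash codes the two versions agree.
        System.out.println(hashWithAbs("Ea", 10) == hashWithFloorMod("Ea", 10));
    }
}
```

For ordinary non-negative hash codes the two functions return the same bucket, so swapping one for the other only changes behavior in the overflow case.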
Implementing hashCode
• You can write your own hashCode methods in classes you write.
  – All classes come with a default version based on memory address.
  – Your overridden version should somehow "add up" the object's state.
    • Often you scale/multiply parts of the result to distribute the results.

    public class Point {
        private int x;
        private int y;
        ...
        public int hashCode() {
            // better than just returning (x + y);
            // spreads out numbers, fewer collisions
            return 137 * x + 23 * y;
        }
    }

Good hashCode behavior
• A well-written hashCode method has:
  – Consistency with itself (must produce same results on each call):
    o.hashCode() == o.hashCode(), if o's state doesn't change
  – Consistency with equality:
    a.equals(b) must imply that a.hashCode() == b.hashCode(),
    !a.equals(b) does NOT necessarily imply that a.hashCode() != b.hashCode() (why not?)
    • When your class has an equals or hashCode, it should have both.
  – Good distribution of hash codes:
    • For a large set of objects with distinct states, they will generally return unique hash codes rather than all colliding into the same hash bucket.

Example: String hashCode
• The hashCode function inside a String object looks like this:

    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < this.length(); i++) {
            hash = 31 * hash + this.charAt(i);
        }
        return hash;
    }

  – As with any general hashing function, collisions are possible.
    • Example: "Ea" and "FB" have the same hash value.
  – Early versions of Java examined only the first 16 characters. For some common data this led to poor hash table performance.
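The "Ea" / "FB" collision can be checked by hand: 31 * 'E' + 'a' = 31 * 69 + 97 = 2236, and 31 * 'F' + 'B' = 31 * 70 + 66 = 2236. A small sketch that recomputes the formula (the class name StringHashDemo and the method polyHash are hypothetical names used only for illustration):

```java
class StringHashDemo {
    // Same formula as on the slide: hash = 31 * hash + next character.
    static int polyHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("Ea"));   // 2236
        System.out.println(polyHash("FB"));   // 2236
        // The hand-written version agrees with the built-in method.
        System.out.println(polyHash("Ea") == "Ea".hashCode());   // true
    }
}
```

Two unequal strings sharing one hash code is allowed by the hashCode contract; a hash table simply resolves it as a collision.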
hashCode tricks
• If one of your object's fields is an object, call its hashCode:

    public int hashCode() {    // Student
        return 531 * firstName.hashCode() + ...;
    }

• To incorporate a double or boolean, use the hashCode method from the Double or Boolean wrapper classes:

    public int hashCode() {    // BankAccount
        return 37 * Double.valueOf(balance).hashCode()
                + Boolean.valueOf(isCheckingAccount).hashCode();
    }

• Guava includes an Objects.hashCode(...) method that takes any number of values and combines them into one hash code.

    public int hashCode() {    // BankAccount
        return Objects.hashCode(name, id, balance);
    }

Implementing a hash map
• A hash map is like a set where the nodes store key/value pairs:

    public class HashMap<K, V> implements Map<K, V> {
        ...
    }

    //       key      value
    map.put("Marty", 14);
    map.put("Jeff",  21);
    map.put("Kasey", 20);
    map.put("Stef",  35);

    [Figure: array of 10 buckets whose nodes hold the pairs "Stef" 35, "Marty" 14, "Jeff" 21, "Kasey" 20]

  – Must modify your Node class to store a key and a value

Map ADT interface
• Let's think about how to write our own implementation of a map.
  – As is (usually) done in the Java Collection Framework, we will define map as an ADT by creating a Map interface.
  – Core operations: put (add), get, contains key, remove

    public interface Map<K, V> {
        void clear();
        boolean containsKey(K key);
        V get(K key);
        boolean isEmpty();
        void put(K key, V value);
        void remove(K key);
        int size();
    }

Hash map vs. hash set
  – The hashing is always done on the keys, not the values.
  – The contains method is now containsKey; there and in remove, you search for a node whose key matches a given key.
  – The add method is now put; if the given key is already there, you must replace its old value with the new one.
    • map.put("Bill", 66);    // replace 49 with 66

    [Figure: array of 10 buckets whose nodes hold the pairs "Stef" 35, "Marty" 14, "Abby" 57, "Bill" 49 (replaced by 66), "Jeff" 21, "Kasey" 20]

Priority Queues and Heaps
Reading: 18.2

Prioritization problems
• print jobs: CSE lab printers constantly accept and complete jobs from all over the building. We want to print faculty jobs before staff before student jobs, and grad students before undergrad, etc.
• ER scheduling: Scheduling patients for treatment in the ER. A gunshot victim should be treated sooner than a guy with a cold, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive?
• key operations we want:
  – add an element (print job, patient, etc.)
  – get/remove the most "important" or "urgent" element

Priority Queue ADT
• priority queue: A collection of ordered elements that provides fast access to the minimum (or maximum) element.
  – add: adds in order
  – peek: returns minimum or "highest priority" value
  – remove: removes/returns minimum value
  – isEmpty, clear, size, iterator: O(1)

    pq.add("if");
    pq.add("from");
    ...
    pq.remove()    // returns "by"

    [Figure: a priority queue containing "the", "of", "if", "to", "down", "from", "by", "she", "you", "in", "why", "him"]

Unfilled array?
• Consider using an unfilled array to implement a priority queue.
  – add: Store it in the next available index, as in a list.
  – peek: Loop over elements to find minimum element.
  – remove: Loop over elements to find min. Shift to remove.

    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();

    (state after the six adds:)
    index: 0  1   2  3   4   5   6  7  8  9
    value: 9  23  8  -3  49  12  0  0  0  0
    size:  6

  – How efficient is add? peek? remove?
    • O(1), O(N), O(N)
    • (peek must loop over the array; remove must shift elements)

Sorted array?
• Consider using a sorted array to implement a priority queue.
  – add: Store it in the proper index to maintain sorted order.
  – peek: Minimum element is in index [0].
  – remove: Shift elements to remove min from index [0].
    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();

    (state after the six adds:)
    index: 0   1  2  3   4   5   6  7  8  9
    value: -3  8  9  12  23  49  0  0  0  0
    size:  6

  – How efficient is add? peek? remove?
    • O(N), O(1), O(N)
    • (add and remove must shift elements)

Linked list?
• Consider using a doubly linked list to implement a priority queue.
  – add: Store it at the end of the linked list.
  – peek: Loop over elements to find minimum element.
  – remove: Loop over elements to find min. Unlink to remove.

    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();

    [Figure: linked list from front to back: 9, 23, 8, -3, 49, 12]

  – How efficient is add? peek? remove?
    • O(1), O(N), O(N)

Sorted linked list?
• Consider using a sorted linked list to implement a priority queue.
  – add: Store it in the proper place to maintain sorted order.
  – peek: Minimum element is at the front.
  – remove: Unlink front element to remove.

    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();

    [Figure: linked list from front to back: -3, 8, 9, 12, 23, 49]

  – How efficient is add? peek? remove?
    • O(N), O(1), O(1)

Binary search tree?
• Consider using a binary search tree to implement a PQ.
  – add: Store it in the proper BST L/R-ordered spot.
  – peek: Minimum element is at the far left edge of the tree.
  – remove: Unlink far left element to remove.

    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();

    [Figure: BST with root 9; left child 8, whose left child is -3; right child 23, whose children are 12 and 49]

  – How efficient is add? peek? remove?
    • O(log N), O(log N), O(log N)...?
    • (good in theory, but the tree tends to become unbalanced to the right, because remove always takes the leftmost element)

Unbalanced binary tree

    queue.add(9);
    queue.add(23);
    queue.add(8);
    queue.add(-3);
    queue.add(49);
    queue.add(12);
    queue.remove();
    queue.add(16);
    queue.add(34);
    queue.remove();
    queue.remove();
    queue.add(42);
    queue.add(45);
    queue.remove();

    [Figure: the resulting unbalanced tree, containing 12, 23, 16, 49, 34, 42, 45]

  – Simulate these operations. What is the tree's shape?
  – A tree that is unbalanced has a height close to N rather than log N, which breaks the expected runtime of many operations.

Heaps
• heap: A complete binary tree with vertical ordering.
  – complete tree: Every level is full except possibly the lowest level, which must be filled from left to right
    • (i.e., a node may not have any children until all possible siblings exist)

Heap ordering
• heap ordering: P ≤ X for every element X with parent P.
  – Parents' values are never larger than those of their children.
  – Implies that minimum element is always the root (a "min-heap").
    • variation: "max-heap" stores largest element at root, reverses ordering
  – Is a heap a BST? How are they related?

Which are min-heaps?
  [Figure: several candidate binary trees, some marked "no" because they violate completeness or heap ordering]

Which are max-heaps?
  [Figure: several candidate binary trees, some marked "no" because they violate completeness or max-heap ordering]

Heap height and runtime
• The height of a complete tree is always about log N.
  – How do we know this for sure?
    An n-node complete tree of height h satisfies $2^h \le n \le 2^{h+1} - 1$, so $h = \lfloor \log_2 n \rfloor$.
• Because of this, if we implement a priority queue using a heap, we can provide the following runtime guarantees:
  – add: O(log N)
  – peek: O(1)
  – remove: O(log N)

The add operation
• When an element is added to a heap, where should it go?
  – Must insert a new node while maintaining heap properties.
  – queue.add(15);

    [Figure: min-heap containing 10, 20, 80, 40, 60, 85, 99, 50, 700, 65, with 15 as the new node to place]

The add operation
• When an element is added to a heap, it should be initially placed as the rightmost leaf (to maintain the completeness property).
  – But the heap ordering property becomes broken!

    [Figure: the same heap before and after 15 is attached as the rightmost leaf on the lowest level]

"Bubbling up" a node
• bubble up: To restore heap ordering, the newly added element is shifted ("bubbled") up the tree until it reaches its proper place.
  – Weiss: "percolate up" by swapping with its parent
  – How many bubble-ups are necessary, at most?

    [Figure: the heap before and after 15 bubbles up past 60 and 20, stopping below the root 10]

Bubble-up exercise
• Draw the tree state of a min-heap after adding these elements:
  – 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2

    [Figure: the resulting min-heap; in level order: 2 / 19, 6 / 25, 42, 11, 104 / 76, 50, 55, 88, 20]

The peek operation
• A peek on a min-heap is trivial to perform.
  – because of heap properties, minimum element is always the root
  – O(1) runtime
• Peek on a max-heap would be O(1) as well (return max, not min)

    [Figure: a min-heap with root 10]

The remove operation
• When an element is removed from a heap, what should we do?
  – The root is the node to remove. How do we alter the tree?
  – queue.remove();

    [Figure: min-heap containing 10, 20, 80, 40, 60, 85, 99, 50, 700, 65; the root 10 is to be removed]

The remove operation
• When the root is removed from a heap, it should be initially replaced by the rightmost leaf (to maintain completeness).
  – But the heap ordering property becomes broken!

    [Figure: the heap before and after the rightmost leaf 65 replaces the root 10]

"Bubbling down" a node
• bubble down: To restore heap ordering, the new improper root is shifted ("bubbled") down the tree until it reaches its proper place.
  – Weiss: "percolate down" by swapping with its smaller child (why?)
  – How many bubble-downs are necessary, at most?

    [Figure: the heap before and after 65 bubbles down from the root, leaving 20 as the new root]

Bubble-down exercise
• Suppose we have the min-heap shown below.
• Show the state of the heap tree after remove has been called 3 times, and which elements are returned by the removal.

    [Figure: min-heap; in level order: 2 / 19, 6 / 25, 42, 11, 104 / 76, 50, 55, 88, 20]

Array heap implementation
• Though a heap is conceptually a binary tree, since it is a complete tree, when implementing it we actually can "cheat" and just use an array!
  – index of root = 1 (leave 0 empty to simplify the math)
  – for any node n at index i:
    • index of n.left = 2i
    • index of n.right = 2i + 1
    • parent index of n?
  – This array representation is elegant and efficient (O(1)) for common tree operations.

Implementing HeapPQ
• Let's implement an int priority queue using a min-heap array.

    public class HeapIntPriorityQueue implements IntPriorityQueue {
        private int[] elements;
        private int size;

        // constructs a new empty priority queue
        public HeapIntPriorityQueue() {
            elements = new int[10];
            size = 0;
        }
        ...
    }

Helper methods
• Since we will treat the array as a complete tree/heap, and walk up/down between parents/children, these methods are helpful:

    // helpers for navigating indexes up/down the tree
    private int parent(int index) { return index / 2; }
    private int leftChild(int index) { return index * 2; }
    private int rightChild(int index) { return index * 2 + 1; }
    private boolean hasParent(int index) { return index > 1; }
    private boolean hasLeftChild(int index) {
        return leftChild(index) <= size;
    }
    private boolean hasRightChild(int index) {
        return rightChild(index) <= size;
    }
    private void swap(int[] a, int index1, int index2) {
        int temp = a[index1];
        a[index1] = a[index2];
        a[index2] = temp;
    }

Implementing add
• Let's write the code to add an element to the heap:

    public void add(int value) {
        ...
    }

    [Figure: the heap before and after adding 15, which bubbles up to become a child of the root 10]

Implementing add

    // Adds the given value to this priority queue in order.
    public void add(int value) {
        elements[size + 1] = value;    // add as rightmost leaf

        // "bubble up" as necessary to fix ordering
        int index = size + 1;
        boolean found = false;
        while (!found && hasParent(index)) {
            int parent = parent(index);
            if (elements[index] < elements[parent]) {
                swap(elements, index, parent);
                index = parent;
            } else {
                found = true;    // found proper location; stop
            }
        }
        size++;
    }

Resizing a heap
• What if our array heap runs out of space?
  – We must enlarge it.
  – When enlarging hash sets, we needed to carefully rehash the data.
  – What must we do here?
  – (We can simply copy the data into a larger array.)
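A plain copy is enough because a heap element's parent and children are found from its index alone (i/2, 2i, 2i + 1), never from the array's length; a hash table index, by contrast, depends on % length. The sketch below checks this on the heap computed for the bubble-up exercise. The class HeapResizeDemo and the method isMinHeap are hypothetical helpers written for this illustration.

```java
import java.util.Arrays;

class HeapResizeDemo {
    // Checks min-heap ordering for a 1-based array heap holding `size` elements:
    // every element at index i >= 2 must be >= its parent at index i / 2.
    static boolean isMinHeap(int[] elements, int size) {
        for (int i = 2; i <= size; i++) {
            if (elements[i / 2] > elements[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Index 0 is unused; the heap data lives at indexes 1..12.
        int[] elements = {0, 2, 19, 6, 25, 42, 11, 104, 76, 50, 55, 88, 20};
        int size = 12;
        System.out.println(isMinHeap(elements, size));   // true

        // Enlarge by copying; every element keeps its index,
        // so every parent/child relationship is preserved.
        int[] bigger = Arrays.copyOf(elements, 2 * elements.length);
        System.out.println(isMinHeap(bigger, size));     // true
        System.out.println(bigger.length);               // 26
    }
}
```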
Modified add code

    // Adds the given value to this priority queue in order.
    // (requires: import java.util.Arrays;)
    public void add(int value) {
        // resize to enlarge the heap if necessary
        if (size == elements.length - 1) {
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        ...
    }

Implementing peek
• Let's write code to retrieve the minimum element in the heap:

    public int peek() {
        ...
    }

    [Figure: a min-heap with root 10]

Implementing peek

    // Returns the minimum element in this priority queue.
    // precondition: queue is not empty
    public int peek() {
        return elements[1];
    }

Implementing remove
• Let's write code to remove the minimum element in the heap:

    public int remove() {
        ...
    }

    [Figure: the heap before and after the rightmost leaf 65 replaces the root 10]

Implementing remove

    public int remove() {    // precondition: queue is not empty
        int result = elements[1];         // last leaf -> root
        elements[1] = elements[size];
        size--;

        int index = 1;    // "bubble down" to fix ordering
        boolean found = false;
        while (!found && hasLeftChild(index)) {
            int left = leftChild(index);
            int right = rightChild(index);
            int child = left;
            if (hasRightChild(index) && elements[right] < elements[left]) {
                child = right;
            }
            if (elements[index] > elements[child]) {
                swap(elements, index, child);
                index = child;
            } else {
                found = true;    // found proper location; stop
            }
        }
        return result;
    }

Int PQ ADT interface
• Let's write our own implementation of a priority queue.
  – To simplify the problem, we only store ints in our priority queue for now.
  – As is (usually) done in the Java Collection Framework, we will define priority queues as an ADT by creating an IntPriorityQueue interface.
  – Core operations are: add, peek (at min), remove (min).

    public interface IntPriorityQueue {
        void add(int value);
        void clear();
        boolean isEmpty();
        int peek();      // return min element
        int remove();    // remove/return min element
        int size();
    }

Generic PQ ADT
• Let's modify our priority queue so it can store any type of data.
  – As with past collections, we will use Java generics (a type parameter).
    public interface PriorityQueue<E> {
        void add(E value);
        void clear();
        boolean isEmpty();
        E peek();      // return min element
        E remove();    // remove/return min element
        int size();
    }

Generic HeapPQ class
• We can modify our heap priority class to use generics as usual...

    public class HeapPriorityQueue<E> implements PriorityQueue<E> {
        private E[] elements;
        private int size;

        // constructs a new empty priority queue
        public HeapPriorityQueue() {
            elements = (E[]) new Object[10];
            size = 0;
        }
        ...
    }

Problem: ordering elements

    // Adds the given value to this priority queue in order.
    public void add(E value) {
        ...
        int index = size + 1;
        boolean found = false;
        while (!found && hasParent(index)) {
            int parent = parent(index);
            if (elements[index] < elements[parent]) {    // error
                swap(elements, index, parent(index));
                index = parent(index);
            } else {
                found = true;    // found proper location; stop
            }
        }
    }

  – Even changing the < to a compareTo call does not work.
    • Java cannot be sure that type E has a compareTo method.

Comparing objects
• Heaps rely on being able to order their elements.
• Operators like < and > do not work with objects in Java.
  – But we do think of some types as having an ordering (e.g. Dates).
  – (In other languages, we can enable <, > with operator overloading.)
• natural ordering: Rules governing the relative placement of all values of a given type.
  – Implies a notion of equality (like equals) but also < and >.
  – total ordering: All elements can be arranged in A ≤ B ≤ C ≤ ... order.
  – The Comparable interface provides a natural ordering.

The Comparable interface
• The standard way for a Java class to define a comparison function for its objects is to implement the Comparable interface.

    public interface Comparable<T> {
        public int compareTo(T other);
    }

• A call of A.compareTo(B) should return:
    a value < 0 if A comes "before" B in the ordering,
    a value > 0 if A comes "after" B in the ordering,
    or exactly 0 if A and B are considered "equal" in the ordering.
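As an illustration of the compareTo contract, here is a minimal sketch of a class with a natural ordering. CalendarDay is a hypothetical class invented for this example (it is not from the slides); it orders days by month and breaks ties by day.

```java
import java.util.Arrays;

class CalendarDay implements Comparable<CalendarDay> {
    private final int month;
    private final int day;

    CalendarDay(int month, int day) {
        this.month = month;
        this.day = day;
    }

    // Negative if this day comes before other, positive if after, 0 if equal.
    @Override
    public int compareTo(CalendarDay other) {
        if (month != other.month) {
            return Integer.compare(month, other.month);
        }
        return Integer.compare(day, other.day);
    }

    @Override
    public String toString() {
        return month + "/" + day;
    }

    public static void main(String[] args) {
        CalendarDay[] days = {
            new CalendarDay(7, 4), new CalendarDay(1, 15), new CalendarDay(7, 1)
        };
        Arrays.sort(days);   // sorts using compareTo
        System.out.println(Arrays.toString(days));   // [1/15, 7/1, 7/4]
    }
}
```

Integer.compare is used instead of subtraction so the result cannot overflow for extreme int values.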
• Effective Java Tip #12: Consider implementing Comparable.

Bounded type parameters

    <Type extends SuperType>

  – An upper bound; accepts the given supertype or any of its subtypes.
  – Works for multiple superclass/interfaces with & :

    <Type extends ClassA & InterfaceB & InterfaceC & ...>

    <? super SuperType>

  – A lower bound; accepts the given type or any of its supertypes. (Java allows super bounds only on wildcards, not on declared type parameters.)
• Example:

    // can be instantiated with any animal type
    public class Nest<T extends Animal> {
        ...
    }

Corrected HeapPQ class

    public class HeapPriorityQueue<E extends Comparable<E>>
            implements PriorityQueue<E> {
        private E[] elements;
        private int size;

        // constructs a new empty priority queue
        public HeapPriorityQueue() {
            // E now erases to Comparable, so the array must be a Comparable[];
            // casting an Object[] here would fail with a ClassCastException
            elements = (E[]) new Comparable[10];
            size = 0;
        }
        ...
        public void add(E value) {
            ...
            while (...) {
                if (elements[index].compareTo(elements[parent]) < 0) {
                    swap(...);
                }
            }
        }
    }

Ordering and Comparators

What's the "natural" order?

    public class Rectangle implements Comparable<Rectangle> {
        private int x, y, width, height;

        public int compareTo(Rectangle other) {
            // ...?
        }
    }

• What is the "natural ordering" of rectangles?
  – By x, breaking ties by y?
  – By width, breaking ties by height?
  – By area? By perimeter?
• Do rectangles have any "natural" ordering?
  – Might we want to arrange rectangles into some order anyway?

Comparator interface

    public interface Comparator<T> {
        public int compare(T first, T second);
    }

• A Comparator is an external object that specifies a comparison function over some other type of objects.
  – Allows you to define multiple orderings for the same type.
  – Allows you to define a specific ordering(s) for a type even if there is no obvious "natural" ordering for that type.
– Allows you to externally define an ordering for a class that, for whatever reason, you are not able to modify to make it Comparable:
  • a class that is part of the Java class libraries
  • a class that is final and can't be extended
  • a class from another library or author, that you don't control

Comparator examples

    public class RectangleAreaComparator implements Comparator<Rectangle> {
        // compare in ascending order by area (WxH)
        public int compare(Rectangle r1, Rectangle r2) {
            // Integer.compare avoids the overflow that subtraction can cause
            return Integer.compare(r1.getArea(), r2.getArea());
        }
    }

    public class RectangleXYComparator implements Comparator<Rectangle> {
        // compare by ascending x, break ties by y
        public int compare(Rectangle r1, Rectangle r2) {
            if (r1.getX() != r2.getX()) {
                return Integer.compare(r1.getX(), r2.getX());
            } else {
                return Integer.compare(r1.getY(), r2.getY());
            }
        }
    }

Using Comparators
• TreeSet, TreeMap, and PriorityQueue can use a Comparator:

    Comparator<Rectangle> comp = new RectangleAreaComparator();
    Set<Rectangle> set = new TreeSet<Rectangle>(comp);
    Queue<Rectangle> pq = new PriorityQueue<Rectangle>(10, comp);

• Searching and sorting methods can accept Comparators.

    Arrays.binarySearch(array, value, comparator)
    Arrays.sort(array, comparator)
    Collections.binarySearch(list, value, comparator)
    Collections.max(collection, comparator)
    Collections.min(collection, comparator)
    Collections.sort(list, comparator)

• Methods are provided to reverse a Comparator's ordering:

    public static Comparator Collections.reverseOrder()
    public static Comparator Collections.reverseOrder(comparator)

PQ and Comparator
• Our heap priority queue currently relies on the Comparable natural ordering of its elements:

    public class HeapPriorityQueue<E extends Comparable<E>>
            implements PriorityQueue<E> {
        ...
        public HeapPriorityQueue() {...}
    }

• To allow other orderings, we can add a constructor that accepts a Comparator so clients can arrange elements in any order:

    ...
    public HeapPriorityQueue(Comparator<E> comp) {...}

PQ Comparator exercise
• Write code that stores strings in a priority queue and reads them back out in ascending order by length.
– If two strings are the same length, break the tie by ABC order.

    Queue<String> pq = new PriorityQueue<String>(...);
    pq.add("you");
    pq.add("meet");
    pq.add("madam");
    pq.add("sir");
    pq.add("hello");
    pq.add("goodbye");
    while (!pq.isEmpty()) {
        System.out.print(pq.remove() + " ");
    }
    // sir you meet hello madam goodbye

PQ Comparator answer
• Use the following comparator class to organize the strings:

    public class LengthComparator implements Comparator<String> {
        public int compare(String s1, String s2) {
            if (s1.length() != s2.length()) {
                // if lengths are unequal, compare by length
                return s1.length() - s2.length();
            } else {
                // break ties by ABC order
                return s1.compareTo(s2);
            }
        }
    }
    ...
    Queue<String> pq = new PriorityQueue<String>(100, new LengthComparator());

Heap sort
• heap sort: An algorithm to sort an array of N elements by turning the array into a heap, then calling remove N times.
– The elements will come out in sorted order.
– We can put them into a new sorted array.
– What is the runtime?

Heap sort implementation

    public static void heapSort(int[] a) {
        PriorityQueue<Integer> pq = new HeapPriorityQueue<Integer>();
        for (int n : a) {
            pq.add(n);   // add each element, not the array itself
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = pq.remove();
        }
    }

– This code is correct and runs in O(N log N) time but wastes memory.
– It makes an entire copy of the array a into the internal heap of the priority queue.
– Can we perform a heap sort without making a copy of a?

Improving the code
• Idea: Treat a itself as a max-heap, whose data starts at 0 (not 1).
– a is not actually in heap order.
– But if you repeatedly "bubble down" each non-leaf node, starting from the last one, you will eventually have a proper heap.
• Now that a is a valid max-heap:
– Call remove repeatedly until the heap is empty.
– But make it so that when an element is "removed", it is moved to the end of the array instead of completely evicted from the array.
– When you are done, voila! The array is sorted.

Step 1: Build heap in-place
• "Bubble" down non-leaf nodes until the array is a max-heap:
– int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
– Swap each node with its larger child as needed.

    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  21  66  40  10  70  81  30  22  45  95  88  38
    size   12

    (Tree view, level by level: 21 | 66 40 | 10 70 81 30 | 22 45 95 88 38)

Build heap in-place answer
– 30: nothing to do
– 81: nothing to do
– 70: swap with 95
– 10: swap with 45
– 40: swap with 81
– 66: swap with 95, then 88
– 21: swap with 95, then 88, then 70

    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  95  88  81  45  70  40  30  22  10  21  66  38
    size   12

    (Tree view, level by level: 95 | 88 81 | 45 70 40 30 | 22 10 21 66 38)

Remove to sort
• Now that we have a max-heap, remove elements repeatedly until we have a sorted array.
– Move each removed element to the end, rather than tossing it.

Remove to sort answer
– 95: move 38 up, swap with 88, 70, 66
– 88: move 38 up, swap with 81, 40
– 81: move 21 up, swap with 70, 66
– 70: move 10 up, swap with 66, 45, 22
– ...
– (Notice that after 4 removes, the last 4 elements in the array are sorted. If we remove every element, the entire array will be sorted.)

    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  66  45  40  22  21  38  30  10  70  81  88  95
    size    8

    (Tree view of the remaining heap, level by level: 66 | 45 40 | 22 21 38 30 | 10)
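The two steps above (build a max-heap in place, then repeatedly move the maximum to the end) can be sketched in code as follows. This is a minimal sketch under stated assumptions rather than the chapter's own implementation: the class name InPlaceHeapSort and the helpers buildHeap, bubbleDown, and swap are hypothetical names chosen for illustration, and the heap is 0-based, so the children of index i are at 2i + 1 and 2i + 2.

```java
import java.util.Arrays;

// Hypothetical sketch of an in-place heap sort on an int array.
public class InPlaceHeapSort {
    // Sorts a into ascending order without copying it.
    public static void heapSort(int[] a) {
        buildHeap(a);
        // Step 2: "remove" the max by swapping it to the end of the heap,
        // then shrink the heap by one and restore heap order.
        for (int size = a.length; size > 1; size--) {
            swap(a, 0, size - 1);
            bubbleDown(a, 0, size - 1);
        }
    }

    // Step 1: bubble down each non-leaf node, starting from the last one.
    public static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
    }

    // Swaps a[index] with its larger child until neither child is larger.
    // Only the first size elements are treated as part of the heap.
    private static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;   // right child is the larger one
            }
            if (a[index] >= a[child]) {
                break;     // found proper location; stop
            }
            swap(a, index, child);
            index = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```

Because a max-heap is used, each removed maximum lands just past the shrinking heap, so the array ends up in ascending order with no second array needed.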