Java Methods Object-Oriented Programming and Data Structures 3rd AP edition Maria Litvin ● Gary Litvin Chapter Hash Function Danger Keep Out 25 Lookup Tables and Hashing Copyright © 2015 by Maria Litvin, Gary Litvin, and Skylight Publishing. All rights reserved. Objectives: • Learn about lookup tables • Learn about hashing • Review java.util.HashSet and java.util.HashMap 25-2 Lookup Tables • A lookup table is an array that helps to find data very quickly. • The array stores references to data records (or some values). • A data record is identified by some key. • The value of a key is directly translated into an array index using a simple formula. 25-3 Lookup Tables (cont’d) • Only one key can be mapped onto a particular index (no collisions). • The index that corresponds to a key must fall into the valid range (from 0 to array.length-1). • Access to data is “instantaneous” (O(1)). 25-4 Lookup Tables: Example 1 Zip codes 0 1 ... 600 601 ... 1004 1005 1006 1007 1008 1009 ... 99950 99951 ... 99998 99999 <null> <null> Corresponding locales <null> Adjuntas, PR Amherst, MA Barre, MA <null> Belchertown, MA Blanford, MA Bondsville, MA Some table entries remain unused Ketchikan, AK <null> <null> <null> 25-5 Lookup Tables: Example 2 private static final int [ ] n_thPowerOf3 = { 1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683 }; ... // precondition: 0 <= n < 10 public int powOf3 (int n) { return n_thPowerOf3 [ n ]; } 25-6 Lookup Tables: Example 3 256 colors used in a particular image; each of the palette entries corresponds to a triplet of RGB values 25-7 Applications of Lookup Tables • • • • Data retrieval Data compression and encryption Tabulating functions Color mapping 25-8 Hash Tables • A hash table is similar to a lookup table. • The value of a key is translated into an array index using a hash function. • The index computed for a key must fall into the valid range. • The hash function can map different keys onto the same array index — this situation is called a collision. 25-9 Hash Tables (cont’d) • The hash function should map the keys onto the array indices randomly and uniformly. • A well-designed hash table and hash function minimize the number of collisions. • There are two common techniques for resolving collisions: chaining and probing. 25-10 Each element in the array is itself a collection, called a bucket (a list or a BST), which is searched for the desired key Chaining Hash Function "Peach" 126 Danger Keep Out 122 123 124 125 126 127 "Peach" Buckets 25-11 If the place where we want to store the key is occupied by a different key, we store the former in another location in the same array, computed using a certain probing formula Probing "Peach" Hash Function 126 Danger Keep Out The probing function recalculates the index Peach? 125 126 127 128 Apple Banana Lemon Orange Nope... Nope... 129 130 Peach Pl Yes! 25-12 java.util.HashSet<E> and java.util.HashMap<K,V> Classes • These classes implement the Set<E> and Map<K,V> interfaces, respectively, using hash tables (with chaining). • This implementation may be more efficient than TreeSet and TreeMap. 25-13 HashSet and HashMap (cont’d) • Collisions are resolved through chaining. • The sizes of buckets must remain relatively small. • Load factor: Load factor = Total number of items Number of buckets 25-14 HashSet and HashMap (cont’d) • Fine tuning: Load factor too large lots of collisions Load factor too small wasted space and slow iterations over the whole set • If the load factor exceeds the specified limit, the table is automatically rehashed into a larger table; if possible this should be avoided. 25-15 HashSet and HashMap (cont’d) • Objects in a HashSet or keys in a HashMap must have a reasonable int hashCode method that overrides Object’s hashCode and helps calculate the hashing function. • The hashCode method returns an int from the entire int range; it is later mapped on the range of indices in a particular hash table. • String, Integer, Double: each has a reasonable hashCode defined. 25-16 hashCode Examples • For String: hashCode s0 31n1 s1 31n2 sn1 (where si is Unicode for the i-th character in the string) • For Person: public int hashCode ( ) { return getFirstName( ).hashCode( ) + getLastName( ).hashCode( ); } 25-17 Consistency • HashSet / HashMap first use hashCode, then equals. • TreeSet / TreeMap use only compareTo (or a comparator) • For consistent performance, these methods should agree with each other: x.equals (y) x.compareTo (y) == 0 x.equals (y) x.hashCode( ) == y.hashCode( ) 25-18 HashSet<E> Constructors 25-19 HashMap<K,V> Constructors 25-20 Review: • What is the main difference between a lookup table and a hash table? • What is a collision? • Name two common techniques for resolving collisions. • What is a bucket? • What is a load factor? 25-21 Review (cont’d): • How is hash table performance affected when the load factor is too high? Too low? • What happens to a HashSet or a HashMap when the load factor exceeds the specified limit? • HashSet’s no-args constructor sets the initial capacity to 16 and the load factor limit to 0.75. How many values can be stored in this table before it is rehashed? 25-22 Review (cont’d): • What is the sequence of values returned by an iterator for a HashSet? • What is the range of values for the hashCode method? • Which method(s) of an object are used to find it in a TreeSet? A HashSet? 25-23