Hash tables

Definition: a data structure that uses a hash function to map each key to the index of an array element.

[Figure: keys k1 to k5 mapped to array elements]

Some properties of a hash table

- Size of the hash table (an example will be shown).
- Hash function: maps keys to indices of array elements. (To be continued…)
  - Multiplication hash
  - Division hash
- Input to build a hash table: an array of keys to store in the hash table, e.g.
  int[] input = {1, 2, 3, 4, 5, 6, 7, 8};

[Figure: the input keys stored as four chains: 1 -> 2 /, 3 -> 4 /, 5 -> 6 /, 7 -> 8 /]

Example

Hash table size is 10.

[Figure: a hash table of size 10 holding the keys 20, 110, 103, 13, 10, 69 and 53 in chained locations]

Division hash

h1(k) = k mod m

- Returns the index into the array.
- k is the key.
- m is the size of the hash table.
- Good values of m: the prime number smaller than and closest to the size of the input. See Table 1.
- The Java syntax for mod is %.

Table 1. Input size and m value.

Input size   m value
500          499
1000         997
2000         1999
4000         3989

Multiplication hash

h2(k) = floor(m * (k * A mod 1))

- m is the size of the hash table. Good values of m: the prime number smaller than and closest to the size of the input. See Table 1.
- k is the key.
- A = 0.61803, which comes from (sqrt(5) - 1) / 2.
- k * A mod 1 is the fractional part of k * A.
- Hint: use the decimal value of A in your program; it may reduce bugs.

Collisions

When a key is hashed and a collision happens, the new key is stored in the linked list at that location.

Number of collisions at a location = number of elements at that location - 1 (for a non-empty location)

  20 -> 110 /          # of collisions = 2 - 1 = 1
  103 -> 13 -> 53 /    # of collisions = 3 - 1 = 2

The 3 metrics

- maxCollisions: the maximum number of collisions over all locations in a hash table
- minCollisions: the minimum number of collisions over all locations in a hash table
- totalCollisions: the total number of collisions over all locations in a hash table

Example:

  20 -> 110 /          # of collisions = 1
  103 -> 13 -> 53 /    # of collisions = 2
  105 -> 15 /          # of collisions = 1

maxCollisions = 2
minCollisions = 1
totalCollisions = 4

Note: minCollisions is at least 1 if any location has a collision, even if other locations have 0 collisions. If there are no collisions at all, return 0.

Discussion

Why metrics?
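The metrics tell us which hash function is better, judged by the collisions it produces.

Why 3 metrics? Why not just measure totalCollisions? Let's see an example. Which hash table is better?

[Figure: Hash table 1, with its collisions spread over four chains of two keys each (20 -> 110 /, and three chains 103 -> 13 /); totalCollisions = 4]

[Figure: Hash table 2, with its collisions concentrated in one long chain and the remaining keys alone in their locations; totalCollisions = 4]

We want not only fewer collisions but also collisions that are distributed evenly across the hash table. Both tables have totalCollisions = 4, but hash table 1 spreads its collisions evenly while hash table 2 piles them into one location. That is why hash table 1 is better than hash table 2.

This lab is to implement two hash functions, division and multiplication, and to use the collision metrics to demonstrate which hash is better.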
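
The pieces of the lab can be sketched in Java roughly as follows. This is a minimal sketch under assumptions, not the required solution: the class name HashLabSketch, the method names, the boolean that selects the hash, and returning the three metrics as an int array are all hypothetical choices that the slides do not specify. Keys are assumed to be non-negative. Only the two formulas, the value of A, and the metric definitions come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical names throughout; only the formulas and the metric
// definitions are taken from the slides.
class HashLabSketch {
    // A = (sqrt(5) - 1) / 2, written as the decimal the slides suggest.
    static final double A = 0.61803;

    // Division hash: h1(k) = k mod m. Assumes k >= 0.
    static int divisionHash(int k, int m) {
        return k % m;
    }

    // Multiplication hash: h2(k) = floor(m * (k * A mod 1)). Assumes k >= 0.
    static int multiplicationHash(int k, int m) {
        double fractional = (k * A) % 1.0; // fractional part of k * A
        return (int) Math.floor(m * fractional);
    }

    // Builds a chained hash table of size m. Colliding keys are appended
    // to the linked list at their location.
    static List<LinkedList<Integer>> build(int[] keys, int m, boolean useDivision) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            table.add(new LinkedList<>());
        }
        for (int k : keys) {
            int index = useDivision ? divisionHash(k, m) : multiplicationHash(k, m);
            table.get(index).add(k);
        }
        return table;
    }

    // Returns {maxCollisions, minCollisions, totalCollisions}.
    // minCollisions ignores locations with 0 collisions and is 0 only
    // when the table has no collisions at all.
    static int[] metrics(List<LinkedList<Integer>> table) {
        int max = 0;
        int min = 0;
        int total = 0;
        for (LinkedList<Integer> chain : table) {
            int collisions = Math.max(chain.size() - 1, 0);
            if (collisions == 0) {
                continue; // empty or single-key location
            }
            total += collisions;
            max = Math.max(max, collisions);
            min = (min == 0) ? collisions : Math.min(min, collisions);
        }
        return new int[] {max, min, total};
    }

    public static void main(String[] args) {
        // Keys from the metrics example, with a table of size 10.
        int[] keys = {20, 110, 103, 13, 53, 105, 15};

        int[] d = metrics(build(keys, 10, true));
        // Division hash reproduces the slide example: max=2 min=1 total=4
        System.out.println("division: max=" + d[0] + " min=" + d[1] + " total=" + d[2]);

        int[] mu = metrics(build(keys, 10, false));
        System.out.println("multiplication: max=" + mu[0] + " min=" + mu[1] + " total=" + mu[2]);
    }
}
```

To compare the two hashes for the lab, one could run both on the same input array with m taken from Table 1 (for example 499 for 500 keys) and look at all three metrics side by side, not only at totalCollisions.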