Hashing
Jeff Chastine

Hashing
• Many applications require INSERT, SEARCH and DELETE functions
• On average, hashing can do all of these in O(1) time
• Based on keys
• Falls under two general categories:
  – Direct-Address Tables
  – Hash Tables

Direct-Addressing
• Good when the universe U of keys is small
  – U = {0, 1, …, m – 1}, where m is not large
  – All elements have unique keys
• Table T[0..m – 1], where each slot corresponds to a key
• All operations take only O(1) time

Direct Implementation
Figure: direct-address table T with slots 0 to 9. The actual keys K = {2, 3, 5, 8}, drawn from the universe U = {0, 1, …, 9}, each occupy the slot equal to their key; an occupied slot holds the key and its satellite data.

Direct-Addressing Operations
DIRECT-ADDRESS-SEARCH (T, k)
    return T[k]
DIRECT-ADDRESS-INSERT (T, x)
    T[key[x]] ← x
DIRECT-ADDRESS-DELETE (T, x)
    T[key[x]] ← NIL

Hash Tables
• What are potential problems with direct addressing?
  – |U| may be impractically large
  – The set of actual keys may be small relative to U
  – Example: SSNs
• Here, hash tables require much less storage
• Only catch: O(1) is the average time instead of the worst case!

How it works
• With direct addressing, an element with key k goes into slot k
• With hashing, it goes into slot h(k), where h is a hash function
• Hash functions try to “randomize”
• A hash function maps U to the slots of T[0..m – 1]
    h : U → {0, 1, …, m – 1}
• Instead of |U| values, need only m values

Hash Implementation
Figure: hash function h maps the actual keys k1, …, k5 in K (a subset of the universe U) to slots of table T[0..m – 1]. Keys k2 and k5 collide because h(k2) = h(k5).

Collisions
• A collision occurs when two keys hash to the same slot
• Because |U| > m, the pigeonhole principle applies
  – Therefore, collisions must exist
  – We often talk of the load factor α = n/m (n stored elements, m slots)
• Pick a good hash function
  – Near random, yet deterministic
• Can chain collisions together
  – This is where the worst case comes from
• Can use open addressing

Chaining
Figure: table T in which each slot points to a linked list of the keys from K that hash to that slot; colliding keys share one list.

Hash Functions
• What makes a good hash function?
  – Each key is equally likely to hash to any of the m slots
  – If keys are random real numbers in [0, 1), take h(k) = floor(k m)
  – Convert strings to ASCII to hash?
  – Most hash functions involve mod

Hash Functions
• Division method:
    h(k) = k mod m
• Multiplication method: let 0 < A < 1
    h(k) = floor(m (k A mod 1)), where k A mod 1 is the fractional part of k A

Open Addressing
• Systematically examine, or probe, slots until the item is found or an empty slot is reached
• No lists and no elements stored outside the table; thus α ≤ 1
• Instead of following pointers, we compute the sequence of slots to probe
  – The sequence is not a fixed order; it depends on the key

Kinds of Open Addressing
• Linear Probing
    h(k, i) = (h'(k) + i) mod m
• Quadratic Probing
    h(k, i) = (h'(k) + c1 i + c2 i^2) mod m
• Double Hashing
    h(k, i) = (h1(k) + i h2(k)) mod m
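
The division and multiplication methods and the three probe sequences above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation. The function names are made up for this example, and the slides do not specify a value for A, the constants c1 and c2, or the secondary hash h2. The values used below, A = (√5 – 1)/2, c1 = 1, c2 = 3 and h2(k) = 1 + (k mod (m – 1)), are assumptions chosen only for demonstration.

```python
import math

# Assumed constant: the slides only require 0 < A < 1.
GOLDEN_A = (math.sqrt(5) - 1) / 2


def h_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def h_multiplication(k: int, m: int, A: float = GOLDEN_A) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1))."""
    fractional = (k * A) % 1.0  # fractional part of k*A
    return math.floor(m * fractional)


def linear_probe(k: int, i: int, m: int) -> int:
    """h(k, i) = (h'(k) + i) mod m, with h' taken to be the division method."""
    return (h_division(k, m) + i) % m


def quadratic_probe(k: int, i: int, m: int, c1: int = 1, c2: int = 3) -> int:
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; c1 and c2 are assumed values."""
    return (h_division(k, m) + c1 * i + c2 * i * i) % m


def double_hash_probe(k: int, i: int, m: int) -> int:
    """h(k, i) = (h1(k) + i*h2(k)) mod m.

    Assumed choices: h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)),
    so h2 is never zero. Requires m >= 2.
    """
    h1 = k % m
    h2 = 1 + (k % (m - 1))
    return (h1 + i * h2) % m


if __name__ == "__main__":
    m = 7
    for name, probe in [("linear", linear_probe),
                        ("quadratic", quadratic_probe),
                        ("double", double_hash_probe)]:
        print(name, [probe(10, i, m) for i in range(m)])
```

With a prime table size m, the linear and double-hashing sequences in this sketch visit every slot exactly once over i = 0, …, m – 1. The quadratic sequence is not guaranteed to do so for arbitrary c1, c2 and m.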
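
Collision resolution by chaining, mentioned earlier in the slides, can be sketched in the same way. The class below is a hypothetical illustration: the name `ChainedHashTable`, its methods, and the use of the division method as the hash function are assumptions, and Python lists stand in for the linked lists drawn on the slide.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m: int = 8):
        self.m = m                            # number of slots
        self.n = 0                            # number of stored elements
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key: int) -> int:
        # Assumed hash function: the division method, h(k) = k mod m.
        return key % self.m

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[idx] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key: int):
        # Walks one chain only; the cost grows with that chain's length.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key: int) -> bool:
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:
                del chain[idx]
                self.n -= 1
                return True
        return False

    def load_factor(self) -> float:
        """alpha = n / m."""
        return self.n / self.m


if __name__ == "__main__":
    t = ChainedHashTable(m=4)
    t.insert(1, "a")
    t.insert(5, "b")   # 1 and 5 collide: both hash to slot 1 when m = 4
    print(t.search(5), t.load_factor())
```

If every key lands in the same slot, a search has to scan a chain of length n, which is the worst case the slides refer to.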