Introduction to Hashing
Hash Functions
• Sections 5.1, 5.2, and 5.6

Hashing
• Data items are stored in an array of some fixed size
  – The hash table
• Search is performed using some part of the data item
  – The key
• Used for performing insertions, deletions, and finds in constant average time
• Operations requiring ordering information are not supported efficiently
  – Such as findMin, findMax

Hash Table Example
[Figure: hash table example]

Hash Table Applications
• Comparing search efficiency of different data structures:
  – Vector, list: O(N)
  – AVL search tree: O(log(N))
  – Hash table: O(1) expected time
• Compilers, to keep track of declared variables
  – Symbol tables
  – Mapping from name to id
• On-line spelling checkers

Hash Functions
• Map keys to integers (which represent table indices)
  – Hash(Key) = Integer
  – Index values should be evenly distributed
    • Even if the input data is not evenly distributed
• What happens if multiple keys are mapped to the same integer (same position)?
  – Collision management (discussed in detail later)
  – Collisions are likely to be reduced if keys are evenly distributed over the hash table

Simple Hash Functions
• Assumptions:
  – K: an unsigned 32-bit integer
  – M: the number of buckets (the number of entries in the hash table)
• Goal:
  – If one bit of K changes, every bit of Hash(K) is equally likely to change
  – So that items are evenly distributed in the hash table

A Simple Function
• What if
  – Hash(K) = K % M
  – Where M is an arbitrary integer
• What is wrong?
• Values of K may not be evenly distributed
  – But Hash(K) needs to be evenly distributed
• Suppose
  – M = 10
  – K = 10, 20, 30, 40
• Then K % M = 0, 0, 0, 0 (every key lands in the same bucket)

Another Simple Function
• If
  – Hash(K) = K % P, where P is a prime number
• Suppose
  – P = 11
  – K = 10, 20, 30, 40
• Then K % P = 10, 9, 8, 7
• More uniform distribution
• So hash tables often have a prime number of entries

A Simple Hash for Strings
    unsigned int Hash(const string& Key)
    {
        unsigned int hash = 0;
        // Sum the character codes of the key.
        for (size_t j = 0; j != Key.size(); ++j) {
            hash += Key[j];
        }
        return hash;
    }
• Problem: short keys may not use a large fraction of a large hash table
  – For example, keys of at most 8 ASCII characters sum to at most 8 × 127 = 1016

Another Simple Hash Function
    // Uses only the first three characters; assumes Key.size() >= 3.
    unsigned int Hash(const string& Key)
    {
        return Key[0] + 27*Key[1] + 729*Key[2];
    }
• Problem: English does not use random strings, so the hash values are not uniformly distributed
  – Using more characters of the key can improve the hash function

A Better Hash Function
    unsigned int Hash(const string& Key)
    {
        unsigned int hash = 0;
        for (size_t j = 0; j != Key.size(); ++j)
            hash = 37*hash + (Key[j] - 'a' + 1);
        return hash % TableSize;
    }
• For a key of n+1 characters, the for loop computes $\sum_{i=0}^{n} a_i \cdot 37^{n-i}$ using Horner's rule, where $a_i$ has the value 1 for 'a', 2 for 'b', etc.
  – $a_3 + 37 a_2 + 37^2 a_1 + 37^3 a_0 = 37(37(37 a_0 + a_1) + a_2) + a_3$
• The for loop implicitly performs arithmetic modulo $2^k$, where k is the number of bits in an unsigned int

STL Hash Tables
• STL extensions
  – hash_set
  – hash_map
• The key type, hash function, and equality operator may need to be provided
• Available in the new standard as unordered_set and unordered_map
  – <tr1/unordered_map> or <unordered_map>
  – <tr1/unordered_set> or <unordered_set>
• Example: Lec24/hashmapex.cpp
  – Reference
    • www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
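
Sketch: unordered_map with a user-supplied hash function
• The STL slide notes that a hash function may need to be provided; the following is a minimal sketch of what that could look like with std::unordered_map, assuming a C++11 compiler
• It is not the course's Lec24/hashmapex.cpp example; the names PolyHash, SymbolTable, intern, and symbol_table_demo are hypothetical and chosen only for illustration
• The hasher reuses the multiply-by-37 Horner loop, but adds the raw character code rather than (Key[j] - 'a' + 1), and omits the final % TableSize because the container maps the hash value to a bucket itself

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical hasher (not from the slides): multiply-by-37 polynomial hash
// evaluated with Horner's rule. Unsigned overflow wraps around, so the
// arithmetic is implicitly modulo 2^k for a k-bit std::size_t.
struct PolyHash {
    std::size_t operator()(const std::string& key) const {
        std::size_t hash = 0;
        for (unsigned char c : key)
            hash = 37 * hash + c;
        return hash;  // the container reduces this to a bucket index
    }
};

// Symbol table in the sense of the applications slide: name -> integer id.
// The default equality (std::equal_to<std::string>) is used for key comparison.
using SymbolTable = std::unordered_map<std::string, int, PolyHash>;

// Returns the id of `name`, assigning the next unused id on first sight.
int intern(SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    if (it != table.end())
        return it->second;
    int id = static_cast<int>(table.size());
    table.emplace(name, id);
    return id;
}

// Small self-check of the behaviour described above.
void symbol_table_demo() {
    SymbolTable table;
    assert(intern(table, "count") == 0);
    assert(intern(table, "total") == 1);
    assert(intern(table, "count") == 0);  // a repeated name keeps its id
}
```

• Passing the hasher as the third template argument is what replaces the default std::hash<std::string>; insertions and lookups then run in expected constant time as long as the hash values spread the keys evenly over the buckets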