Introduction to Hashing - Hash Functions • Sections 5.1, 5.2, and 5.6 1

advertisement
Introduction to Hashing Hash Functions
• Sections 5.1, 5.2, and 5.6
1
Hashing
• Data items stored in an array of some fixed size
– Hash table
• Search performed using some part of the data item
– key
• Used for performing insertions, deletions, and finds in
constant average time
• Operations requiring ordering information not
supported efficiently
– Such as findMin, findMax
2
Hash Table Example
3
Hash Table Applications
• Comparing search efficiency of different data
structures:
– Vector, list: O(N)
– AVL search tree: O(log(N))
– Hash table: O(1) expected time
• Compilers to keep track of declared variables
– Symbol tables
– Mapping from name to id
• On-line spelling checkers
4
Hash Functions
• Map keys to integers (which represent table indices)
– Hash(Key) = Integer
– Evenly distributed index values
• Even if the input data is not evenly distributed
• What happens if multiple keys mapped to the same
integer (same position)?
– Collision management (discussed in detail later)
– Collisions are likely to be reduced if keys are evenly
distributed over the hash table
5
Simple Hash Functions
• Assumptions:
– K: an unsigned 32-bit integer
– M: the number of buckets (the number of entries in a hash
table)
• Goal:
– If a bit is changed in K, all bits are equally likely to change for
Hash(K)
– So that items are evenly distributed in the hash table
6
A Simple Function
• What if
– Hash(K) = K % M
– Where M is of any integer value
• What is wrong?
• Values of K may not be evenly distributed
– But Hash(K) needs to be evenly distributed
• Suppose
– M = 10,
– K = 10, 20, 30, 40
• Then K % M = 0, 0, 0, 0, 0…
7
Another Simple Function
• If
– Hash(K) = K % P, P = prime number
• Suppose
– P = 11
– K = 10, 20, 30, 40
• K % P = 10, 9, 8, 7
• More uniform distribution…
• So hash tables often have prime number of entries
8
A Simple Hash for Strings
unsigned int Hash(const string& Key) {
unsigned int hash = 0;
for (int j = 0; j != Key.size(); ++j) {
hash += Key[j]
}
return hash;
}
• Problem: Small sized keys may not use a large
fraction of a large hash table
9
Another Simple Hash Function
unsigned int Hash(const string& Key) {
return Key[0] + 27*Key[1] + 729*Key[2];
}
• Problem: English does not use random strings; so,
the hash values are not uniformly distributed
– Using more characters of the key can improve the hash
function
10
A Better Hash Function
unsigned int Hash(const string &Key) {
unsigned int hash = 0;
for (int j = 0; j != Key.size(); ++j)
hash = 37*hash + (Key[j]-’a’+1);
return hash%TableSize;
}
• The for loop computes ai37n-i using Horner’s rule,
where ai has the value 1 for ‘a’, 2 for ‘b’, etc
– a3 + 37a2 + 372a1 + 373a0 = 37(37(37a0 + a1)+ a2) + a3
• The for implicitly performs arithmetic modulo 2k,
where k is the number of bits in an unisigned int
11
STL Hash Tables
• STL extensions
– hash_set
– hash_map
• The key type, hash function, and equality
operator may need to be provided
• Available in new standard as unordered set
and map
– <tr1/unordered_map> or <unordered_map>
– <trl/unordered_set> or <unordered_set>
• Example: Lec24/hashmapex.cpp
– Reference
•
www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
12
Download