Homework on Hash Tables Key

1. What are the characteristics of a “good” hash function?
Computation is fast and easy, it minimizes collisions, and it should be uniform (it spreads keys evenly over the table).

2. Describe each of the following hash functions:
a. Division method
hash(x) = x modulo tableSize. The tableSize should be a prime number so that the hash function is more uniform, thus reducing collisions.
b. Extraction
hash(12345678) = 18 (extract the first and last digits). The hash value is formed by selecting digits from the key; the digits are not necessarily successive. Select digits that vary among the group of keys to reduce collisions.
c. Shift folding
Divide the key into groups of digits, shift each group under the other so that they line up, then add the groups together.

3. Define the following:
a. Collision
A collision occurs when key1 ≠ key2 but h(key1) = h(key2).
b. Perfect hash function
A perfect hash function has zero collisions.
c. Uniform hash function
A uniform hash function is equally likely to select any index in the table. In other words, if k is an index in the table, the probability that k will be selected is 1/tableSize.
d. Open addressing
A collision resolution method in which, when one key collides with another, the collision is resolved by finding an address that is open (no key is stored there).
e. Primary clustering
Primary clustering is a problem with linear probing in which the table contains groups of consecutively occupied locations after several collisions have occurred.
f. Secondary clustering
Secondary clustering is a problem with quadratic probing in which items that hash to the same location follow the same probe sequence whenever a collision occurs.
g. Load factor (α)
The load factor is defined as α = (current number of table items) / (table size). It measures how full the hash table is.

4. If a table is 4/5 full, what is the approximate average number of comparisons that a search requires (for a successful search) for linear probing?
With load factor α = 4/5: ½[1 + 1/(1 - α)] = ½[1 + 1/(1 - 4/5)] = ½(1 + 5) = 3
For quadratic probing?
-ln(1 - α) / α = -ln(1 - 4/5) / (4/5) = ln(5) / 0.8 ≈ 2.012

5. If h(x) = x mod 7 and chaining resolves collisions, what does the hash table look like after the following insertions occur: 8, 10, 24, 15, 32, 17?
(Each new key is placed at the front of its chain.)
0: empty
1: 15 -> 8
2: empty
3: 17 -> 24 -> 10
4: 32
5: empty
6: empty

6. Redo #5 if linear probing is used to resolve collisions.
0: empty
1: 8
2: 15
3: 10
4: 24
5: 32
6: 17

7. Redo #5 if quadratic probing is used to resolve collisions.
0: 17
1: 8
2: 15
3: 10
4: 24
5: 32
6: empty

8. What is the load factor of the table in #6?
Load factor = (current number of table items) / table size = 6/7 ≈ 0.857

9. Suppose shift folding is used for the hash function and the table size is 100. Where would the following keys be placed in the table? Assume chaining is used to resolve collisions.
41389217, 21634289, 15161718, 42356117
41389217 = 41|38|92|17 goes into index 41+38+92+17 = 188, so 188 % 100 = 88 of the array.
21634289 = 21|63|42|89 goes into index 21+63+42+89 = 215, so 215 % 100 = 15 of the array.
15161718 = 15|16|17|18 goes into index 15+16+17+18 = 66 of the array.
42356117 = 42|35|61|17 goes into index 42+35+61+17 = 155, so 155 % 100 = 55 of the array.
No collisions needed to be resolved using shift folding.

10. Redo #9 if folding on the boundaries is used for the hash function and the table size is 100.
(Every second group is reversed before the groups are added.)
41389217 = 41|38|92|17 goes into index 41+83+92+71 = 287, so 287 % 100 = 87 of the array.
21634289 = 21|63|42|89 goes into index 21+36+42+98 = 197, so 197 % 100 = 97 of the array.
15161718 = 15|16|17|18 goes into index 15+61+17+81 = 174, so 174 % 100 = 74 of the array.
42356117 = 42|35|61|17 goes into index 42+53+61+71 = 227, so 227 % 100 = 27 of the array.
No collisions needed to be resolved using folding on the boundaries.

11. Write pseudocode for the table operation TableDelete when the implementation uses hashing and chaining is used to resolve collisions.
This is not pseudocode; it is code using STL lists.
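    template<class T>
    void HashTable<T>::tableDelete(int key, bool& found)
    {
        // delete is a C++ keyword, so the member is named tableDelete; found is
        // passed by reference so the caller sees the result.
        int index = h(key);  // bucket whose chain may hold the key
        typename std::list<T>::iterator itr = items[index].begin();

        // Walk the chain until the key is found or the chain ends.
        while (itr != items[index].end() && itr->getKey() != key)
            ++itr;

        found = (itr != items[index].end());
        if (found)
            items[index].erase(itr);  // remove the matching item from the chain
    }

12. What table size should be used if the division method is used as the hash function?
A prime number should be used as the table size if the division method is used as the hash function.
Why? If the table size shares a factor with many keys (for example, even keys with an even table size), those keys can only map to a fraction of the indices. A prime table size has no such factors, so keys spread more uniformly and collisions are reduced.

13. What table size should be used if quadratic probing is used for collision resolution?
When quadratic probing is used for collision resolution, a prime number of the form 4k + 3 for some integer k is used as the table size.
Why? This is done so that all indices in the table are considered during collision resolution when the probe sequence alternates h(x) + i² and h(x) - i².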
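The answers to questions 4 through 10 can be rechecked with a short C++ sketch. It assumes non-negative integer keys, uses a hypothetical EMPTY marker (-1) for unused open-addressing slots, puts each new key at the front of its chain (matching the layout in question 5), probes h(x) + i for linear probing and h(x) + i² for quadratic probing, and takes folding keys as decimal strings split into two-digit groups from the left. All function names here (divisionHash, buildChained, probeInsert, buildProbed, loadFactor, linearProbingSuccess, quadraticProbingSuccess, foldHash) are hypothetical helpers for illustration, not part of the HashTable class above.

```cpp
#include <cassert>
#include <cmath>
#include <list>
#include <string>
#include <vector>

// Hypothetical marker for an unused open-addressing slot (keys are non-negative).
const int EMPTY = -1;

// Division method (question 2a): hash(x) = x mod tableSize.
int divisionHash(int key, int tableSize) {
    assert(tableSize > 0);
    return key % tableSize;
}

// Separate chaining (question 5): each new key goes to the front of its chain.
std::vector<std::list<int>> buildChained(const std::vector<int>& keys, int tableSize) {
    std::vector<std::list<int>> table(tableSize);
    for (int key : keys)
        table[divisionHash(key, tableSize)].push_front(key);
    return table;
}

// Open addressing (questions 6 and 7): try h(x) + i (linear) or h(x) + i*i
// (quadratic) for i = 0, 1, 2, ... and store the key in the first empty slot.
// Returns false if no empty slot is found within table.size() probes.
bool probeInsert(std::vector<int>& table, int key, bool quadratic) {
    const int size = static_cast<int>(table.size());
    const int home = divisionHash(key, size);
    for (int i = 0; i < size; ++i) {
        const int slot = (home + (quadratic ? i * i : i)) % size;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return true;
        }
    }
    return false;
}

// Builds an open-addressing table; a key that finds no slot is simply dropped here.
std::vector<int> buildProbed(const std::vector<int>& keys, int tableSize, bool quadratic) {
    std::vector<int> table(tableSize, EMPTY);
    for (int key : keys)
        probeInsert(table, key, quadratic);
    return table;
}

// Load factor (questions 3g and 8): items stored / table size.
double loadFactor(int itemCount, int tableSize) {
    return static_cast<double>(itemCount) / tableSize;
}

// Approximate comparisons for a successful search (question 4); alpha = load factor.
double linearProbingSuccess(double alpha) {
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}
double quadraticProbingSuccess(double alpha) {
    return -std::log(1.0 - alpha) / alpha;
}

// Folding (questions 9 and 10): split the key into two-digit groups from the
// left and add them; with boundary folding every second group is reversed first.
int foldHash(const std::string& key, int tableSize, bool boundary) {
    int sum = 0;
    int groupIndex = 0;
    for (std::string::size_type pos = 0; pos < key.size(); pos += 2, ++groupIndex) {
        std::string group = key.substr(pos, 2);
        if (boundary && groupIndex % 2 == 1)
            group = std::string(group.rbegin(), group.rend());  // e.g. "38" -> "83"
        sum += std::stoi(group);
    }
    return sum % tableSize;
}
```

For example, buildProbed({8, 10, 24, 15, 32, 17}, 7, true) should reproduce the table in question 7, and foldHash("41389217", 100, true) should give 87 as in question 10. Probing is capped at tableSize attempts so that a full table cannot loop forever.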