Data Structures and Algorithms
Hashing
First Year
M. B. Fayek, CUFE 2010

Hashing
1. What is Hashing?
2. Problems in Hashing
3. Collision Resolution Strategies

1. What is Hashing?
Hashing is a quick and efficient searching technique. So far, the efficiency of a search has depended on the number of key comparisons. In hashing, the keys themselves point directly to records: a hashing function is applied to the key to compute the record's address. All possible key values are mapped into addresses (buckets) in the hash table. The same hashing function is used both for storing and for searching.
The hash table is sequential and contiguous. Each slot is called a bucket, and a bucket may hold more than one key.
Hashing methods:
- Direct and subtraction
- Modulo-division (division remainder), using the list size as the divisor (preferably a prime; why?)
- Digit extraction
- Midsquare
- Folding (fold shift, fold boundary)
- Pseudo-random (seed)

2. Problems in Hashing
A collision occurs whenever a hash function maps two distinct keys to the same bucket. The hashing function must generate bucket addresses quickly and efficiently, with a minimum of collisions. Because the domain of keys is usually much larger than the number of buckets, collisions are very likely to happen no matter how good the hashing function is.

3. Collision Resolution Strategies
Definitions:
- Load factor = number of elements in the list / list size
- Clustering (primary, secondary)
Strategies:
- Open addressing (resolves collisions within the prime area):
  - Probing (linear, quadratic)
  - Double hashing
  - Pseudo-random
  - Key offset
- Linked lists (separate chaining)
- (Bucket hashing)
- Rehashing
Open Addressing: Probing:
- Linear probing: search at constant intervals from the collision address (typically 1).
- Quadratic probing: search at quadratically increasing intervals, using the collision function f(i) = i^2, i.e.
on a collision, search the locations 1, 4, 9, … slots past the home address.
[Figure: Linear Probing]

Open Addressing: Double Hashing:
Apply a second hashing function, hash2(x), and probe at distances hash2(x), 2 * hash2(x), 3 * hash2(x), … from the home address (taken modulo the list size).

Linked Lists (Separate Chaining):
- Separate chaining: each bucket heads a chain of all keys that hash to it (the scheme may be modified by keeping each chain sorted).
- Modified hash table: the first probe is eliminated by storing the first record of each chain in the table itself, so the hash table becomes an array of records instead of an array of pointers to records.
[Figure: Linked List (Separate Chaining)]

Rehashing:
When the table becomes too full, operations start taking too long.
Solution: build another hash table of about double the size, with an associated new hashing function, then scan down the entire original hash table and insert each element into the new table.
[Figure: curves labelled "successful search" and "unsuccessful search"]
When is the table too full?
- Rehash when the table is half full.
- Rehash when an insertion fails.
- Rehash when the table reaches a certain load factor (best).

End of Hashing

Probing
Definition: each calculation of an address and test for success is known as probing.

Key offset collision resolution
Offset = key / list size (integer division)
Address = (offset + old address) % list size
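The modulo-division, digit-extraction and midsquare methods from the list of hashing methods can be sketched in Python as follows. This is a minimal illustration, not the course's reference code. The function names, the chosen digit positions and the address width are illustrative assumptions.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Modulo-division (division remainder): address = key mod list_size."""
    return key % list_size


def digit_extraction(key: int, positions: list[int]) -> int:
    """Digit extraction: keep the digits at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


def midsquare(key: int, width: int) -> int:
    """Midsquare: square the key and keep `width` digits from the middle."""
    squared = str(key * key).zfill(width)  # pad so short squares still have `width` digits
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])
```

For example, modulo_division(121267, 307) gives 2 and digit_extraction(379452, [1, 3, 4]) gives 394. midsquare(9452, 4) squares 9452 to 89340304 and keeps the middle digits 3403.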
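The two folding variants listed under hashing methods (fold shift and fold boundary) differ only in whether the outer parts of the key are reversed before adding. Here is a minimal Python sketch, assuming the key is split left to right into parts of `width` digits and that the address keeps only the low `width` digits of the sum (the carry is discarded). Both function names are hypothetical.

```python
def fold_shift(key: int, width: int) -> int:
    """Fold shift: split into width-digit parts, add them, keep the low width digits."""
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % 10 ** width


def fold_boundary(key: int, width: int) -> int:
    """Fold boundary: as fold shift, but the leftmost and rightmost parts are reversed first."""
    digits = str(key)
    parts = [digits[i:i + width] for i in range(0, len(digits), width)]
    parts[0] = parts[0][::-1]
    if len(parts) > 1:
        parts[-1] = parts[-1][::-1]
    return sum(int(p) for p in parts) % 10 ** width
```

With key 123456789 and width 3, fold shift adds 123 + 456 + 789 = 1368 and yields 368. Fold boundary adds 321 + 456 + 987 = 1764 and yields 764.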
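The linear and quadratic probing rules from "Open Addressing: Probing" can be sketched in Python as a pair of probe functions plugged into one open-addressing insert/search routine. The sketch assumes integer keys, a modulo-division home address, no deletions (so a search may stop at the first empty slot), and hypothetical names (`linear_probe`, `quadratic_probe`, `insert`, `search`).

```python
from typing import Callable, Optional

# A probe function maps (home address, attempt number i, table size) to a slot.
Probe = Callable[[int, int, int], int]


def linear_probe(home: int, i: int, size: int) -> int:
    """Linear probing, f(i) = i: try home, home+1, home+2, ..."""
    return (home + i) % size


def quadratic_probe(home: int, i: int, size: int) -> int:
    """Quadratic probing, f(i) = i^2: try home, home+1, home+4, home+9, ..."""
    return (home + i * i) % size


def insert(table: list[Optional[int]], key: int, probe: Probe) -> int:
    """Insert key into an open-addressing table and return the slot used."""
    size = len(table)
    home = key % size  # modulo-division hash
    for i in range(size):
        slot = probe(home, i, size)
        if table[slot] is None:
            table[slot] = key
            return slot
    # Quadratic probing may not reach every slot, so insertion can fail here.
    raise RuntimeError("no free slot found; the table should be rehashed")


def search(table: list[Optional[int]], key: int, probe: Probe) -> Optional[int]:
    """Return the slot holding key, or None once an empty slot is reached."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = probe(home, i, size)
        if table[slot] is None:
            return None  # valid only because this sketch never deletes keys
        if table[slot] == key:
            return slot
    return None
```

In a table of size 11, the keys 22, 33 and 44 all hash to 0. Linear probing places them in slots 0, 1, 2, a contiguous run that illustrates primary clustering. Quadratic probing places them in slots 0, 1, 4. Keys sharing a home address still follow the same quadratic sequence, which is secondary clustering.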
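Double hashing, as described under "Open Addressing: Double Hashing", probes at multiples of a second hash value. The slides do not specify hash2. This Python sketch assumes the common textbook choice hash2(x) = R - (x mod R), with R a prime smaller than the (prime) table size. That choice never yields 0, so the probe always moves. The names and the value R = 7 are illustrative.

```python
from typing import Optional


def hash1(key: int, size: int) -> int:
    """Primary hash: modulo-division."""
    return key % size


def hash2(key: int, r: int) -> int:
    """Assumed secondary hash: R - (key mod R), always in 1..R, never 0."""
    return r - (key % r)


def double_hash_insert(table: list[Optional[int]], key: int, r: int) -> int:
    """Probe home, home + hash2, home + 2*hash2, ... (mod size); return the slot used."""
    size = len(table)
    home = hash1(key, size)
    step = hash2(key, r)
    for i in range(size):
        slot = (home + i * step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found; the table should be rehashed")
```

With a table of size 11 and R = 7, the keys 22, 33 and 44 all collide at address 0. Their step sizes are 6, 2 and 5, so they land in slots 0, 2 and 5 instead of forming a cluster. Because the table size is prime and every step is smaller than it, each probe sequence can reach every slot.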
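The pseudo-random strategy listed under open addressing can be sketched with a linear congruential step y = (a * x + c) mod list size, where x is the address that collided. The constants a = 3 and c = 5 below are hypothetical; the slides give no values. The function names are also illustrative. This generator depends only on the colliding address, so all keys that collide at the same address follow the same sequence (secondary clustering). Depending on a and c, the sequence may also cycle without visiting every slot.

```python
from typing import Optional


def pseudo_random_next(old_address: int, size: int, a: int = 3, c: int = 5) -> int:
    """Next probe: y = (a * x + c) mod size, with x the address that collided."""
    return (a * old_address + c) % size


def pseudo_random_insert(table: list[Optional[int]], key: int, a: int = 3, c: int = 5) -> int:
    """Insert key, following the pseudo-random sequence on collision; return the slot."""
    size = len(table)
    address = key % size  # home address by modulo-division
    for _ in range(size):
        if table[address] is None:
            table[address] = key
            return address
        address = pseudo_random_next(address, size, a, c)
    raise RuntimeError("no free slot found within size probes")
```

In a table of size 11, the keys 22, 33 and 44 all hash to 0. With a = 3 and c = 5 they are stored at 0, 5 (3*0 + 5) and 9 ((3*5 + 5) mod 11).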
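The key offset formulas (offset = key / list size, address = (offset + old address) % list size) can be sketched in Python as follows. The guard for an offset that is a multiple of the list size is an assumption added here. Without it, such a key would keep probing the same address. The function names are hypothetical.

```python
from typing import Optional


def key_offset_next(key: int, old_address: int, list_size: int) -> int:
    """Key offset: offset = key // list_size; address = (offset + old) % list_size."""
    offset = key // list_size
    if offset % list_size == 0:
        offset = 1  # assumption: fall back to a step of 1 so the probe always moves
    return (offset + old_address) % list_size


def key_offset_insert(table: list[Optional[int]], key: int) -> int:
    """Insert key, applying key offset on each collision; return the slot used."""
    size = len(table)
    address = key % size  # home address by modulo-division
    for _ in range(size):
        if table[address] is None:
            table[address] = key
            return address
        address = key_offset_next(key, address, size)
    raise RuntimeError("no free slot found within size probes")
```

For key 166702 and list size 307, the home address is 1 and the offset is 166702 // 307 = 543. If slot 1 is taken, the next address is (543 + 1) % 307 = 237. Unlike the pseudo-random step, the offset depends on the key, so keys that collide at the same address usually diverge.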
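Separate chaining with sorted chains, as described under "Linked Lists (Separate Chaining)", can be sketched in Python as shown here. For brevity the sketch uses a Python list per bucket in place of a hand-built linked list, with the standard `bisect` module keeping each chain sorted. The class and method names are hypothetical.

```python
import bisect


class ChainedHashTable:
    """Separate chaining: each bucket holds a sorted chain of the keys that hash to it."""

    def __init__(self, size: int) -> None:
        self.buckets: list[list[int]] = [[] for _ in range(size)]

    def _home(self, key: int) -> int:
        return key % len(self.buckets)  # modulo-division hash

    def insert(self, key: int) -> None:
        chain = self.buckets[self._home(key)]
        i = bisect.bisect_left(chain, key)
        if i == len(chain) or chain[i] != key:  # skip duplicates
            chain.insert(i, key)  # insert in order to keep the chain sorted

    def search(self, key: int) -> bool:
        for item in self.buckets[self._home(key)]:
            if item == key:
                return True
            if item > key:
                return False  # sorted chain: an unsuccessful search can stop early
        return False

    def load_factor(self) -> float:
        """Number of elements / list size (can exceed 1 with chaining)."""
        return sum(len(c) for c in self.buckets) / len(self.buckets)
```

In a table of size 7, inserting 17, 3 and 10 puts all three in bucket 3 as the sorted chain [3, 10, 17]. A search for 24 (which also hashes to 3) examines 3, 10 and 17, reaches the end of the chain, and reports failure. A search for 5 would stop as soon as it reached a chain element greater than 5.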
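The rehashing procedure (build a table of about double the size, then scan the old table and reinsert every element) can be sketched in Python as follows. The sketch assumes linear probing, a rehash trigger of load factor > 0.5 (one of the options on the "When is the table too full?" slide), and a new size equal to the smallest prime not below twice the old size. All names are hypothetical.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small table sizes)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


class RehashingTable:
    """Open addressing (linear probing) that rehashes past a load-factor limit."""

    def __init__(self, size: int = 7, max_load: float = 0.5) -> None:
        self.slots: list[Optional[int]] = [None] * size
        self.count = 0
        self.max_load = max_load  # assumed threshold; must stay below 1

    def load_factor(self) -> float:
        # Load factor = number of elements / list size.
        return self.count / len(self.slots)

    def _place(self, key: int) -> None:
        # Linear probing from the modulo-division home address. The table is
        # rehashed before it fills, so a free slot always exists.
        size = len(self.slots)
        slot = key % size
        while self.slots[slot] is not None:
            if self.slots[slot] == key:
                return  # key already present
            slot = (slot + 1) % size
        self.slots[slot] = key
        self.count += 1

    def _rehash(self) -> None:
        # New table of about double size, then scan down the old table and
        # reinsert every key using the new size in the hash function.
        old = self.slots
        self.slots = [None] * next_prime(2 * len(old))
        self.count = 0
        for key in old:
            if key is not None:
                self._place(key)

    def insert(self, key: int) -> None:
        self._place(key)
        if self.load_factor() > self.max_load:
            self._rehash()
```

Starting from size 7, three insertions leave the load factor at 3/7 (about 0.43). The fourth raises it to 4/7 (about 0.57), which triggers a rehash into a table of size 17. Reinserting every key is what makes rehashing expensive, which is why it is done rarely and in one batch.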