Data Structures and Algorithms Hashing First Year

M. B. Fayek
CUFE 2010
Hashing
1. What is Hashing?
2. Problems in hashing
3. Collision Resolution Strategies
1. What is Hashing?





Hashing is a quick and efficient searching technique.
So far, the efficiency of a search depended on the number of comparisons.
In hashing, the keys themselves point directly to records by applying a hashing function.
All possible key values are mapped into the hash table.
The hashing function is used for searching as well as for storing.
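
As a minimal sketch of the last point, the Python fragment below uses one hashing function both to store a record and to find it again. The table size of 7, the names `hash_key`, `store` and `search`, and the choice of modulo-division as the hashing function are assumptions made only for illustration.

```python
TABLE_SIZE = 7  # assumed table size, chosen only for illustration

def hash_key(key):
    """Map an integer key to a bucket address (modulo-division, assumed)."""
    return key % TABLE_SIZE

def store(table, key, record):
    # The hashing function decides where the record is stored ...
    table[hash_key(key)].append((key, record))

def search(table, key):
    # ... and the same function tells us which bucket to look in.
    for stored_key, record in table[hash_key(key)]:
        if stored_key == key:
            return record
    return None

table = [[] for _ in range(TABLE_SIZE)]  # each bucket may hold several keys
store(table, 15, "record A")
store(table, 23, "record B")
print(search(table, 15))  # record A
print(search(table, 99))  # None (99 was never stored)
```

No key comparison is made outside the single bucket the hashing function points to.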
1. What is Hashing?



The hash table is sequential and contiguous.
Each slot is called a bucket.
Buckets may hold more than one key.
1. What is Hashing?

Hashing methods:
• Direct and Subtraction
• Modulo-division (or division remainder) using the list size as divisor (preferably a prime, since a prime divisor tends to produce fewer collisions)
• Digit extraction
• Midsquare
• Folding (fold shift, fold boundary)
• Pseudo random (seed)
Problems in Hashing



A collision occurs whenever a hash function maps two distinct keys to the same bucket.
The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions.
As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how efficient the hashing function is.
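
A small Python sketch makes the point concrete: with eight distinct keys and only seven buckets, at least two keys must share a bucket. The bucket count, the sample keys, and the modulo-division hashing function are assumptions made for illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 7  # assumed number of buckets

def hash_key(key):
    return key % NUM_BUCKETS  # modulo-division, assumed for this sketch

keys = [10, 17, 24, 3, 8, 30, 41, 5]  # 8 distinct keys, only 7 buckets

buckets = defaultdict(list)
for key in keys:
    buckets[hash_key(key)].append(key)

# Keep only the buckets that received more than one key.
collisions = {addr: ks for addr, ks in buckets.items() if len(ks) > 1}
print(collisions)  # {3: [10, 17, 24, 3]}
```

Here the keys 10, 17, 24 and 3 all leave remainder 3, so they collide in bucket 3.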
3. Collision Resolution Strategies
• Definitions:
  • Load factor = number of elements in list / list size
  • Clustering (primary, secondary)
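
As a quick illustration of the load factor, taken here as the number of stored elements divided by the list size, the following Python sketch computes it for a small table. The table contents and the name `load_factor` are assumptions for the example; `None` marks an empty slot.

```python
def load_factor(table):
    """Fraction of slots in use: stored elements / list size."""
    used = sum(1 for slot in table if slot is not None)
    return used / len(table)

# An 11-slot table holding 6 keys.
table = [22, None, 13, 3, None, 27, None, None, 30, 9, None]
print(round(load_factor(table), 2))  # 0.55
```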
3. Collision Resolution Strategies

Open Addressing: (using prime area)
• Probing (Linear, quadratic)
• Double Hashing
• Pseudo-random
• Key offset
Linked Lists (Separate Chaining)
• (Bucket Hashing)
• Re-hashing

3. Collision Resolution Strategies
• Open Addressing:
  • Probing:
    • Linear Probing: Search at constant intervals from the collision (typically 1)
    • Quadratic Probing: Search at quadratically increasing intervals, i.e. collision function f(i) = i^2; on collision, search the 1st, 4th, 9th, … location after the home address
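
The two probing rules can be compared with a short Python sketch. The table size of 11, the modulo-division home address, and the names `linear_step`, `quadratic_step` and `insert` are assumptions made for illustration; the keys 22, 33 and 44 are chosen because they all hash to address 0.

```python
def linear_step(i):
    return i          # offsets 0, 1, 2, 3, ... from the home address

def quadratic_step(i):
    return i * i      # offsets 0, 1, 4, 9, ... from the home address

def insert(table, key, step):
    """Place key in the first free slot along its probe sequence."""
    size = len(table)
    home = key % size                      # home address (assumed hash)
    for i in range(size):
        addr = (home + step(i)) % size     # i = 0 tries the home address
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("no free slot found within the probe limit")

linear_table = [None] * 11
print([insert(linear_table, k, linear_step) for k in (22, 33, 44)])
# [0, 1, 2]

quadratic_table = [None] * 11
print([insert(quadratic_table, k, quadratic_step) for k in (22, 33, 44)])
# [0, 1, 4]
```

With linear probing the colliding keys pile up in adjacent slots, while quadratic probing spreads them further from the home address.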

Linear Probing
3. Collision Resolution Strategies
Open Addressing
• Double Hashing: Apply a second hashing function and probe at offsets from the collision address given by multiples of its result:
hash2(x), 2*hash2(x), 3*hash2(x), . . .
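
A minimal Python sketch of double hashing follows. The slides do not specify the second function, so the common textbook form hash2(x) = R - (x mod R), with R a prime smaller than the table size, is assumed here; the table size 11, R = 7, and the function names are likewise illustrative.

```python
TABLE_SIZE = 11  # assumed prime table size
R = 7            # assumed prime smaller than TABLE_SIZE

def hash1(key):
    return key % TABLE_SIZE

def hash2(key):
    # Always in 1..R, so the probe step is never zero.
    return R - (key % R)

def insert(table, key):
    home = hash1(key)
    step = hash2(key)
    for i in range(TABLE_SIZE):
        # Probe home, home + hash2, home + 2*hash2, ... (wrapping around).
        addr = (home + i * step) % TABLE_SIZE
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("table is full")

table = [None] * TABLE_SIZE
print([insert(table, k) for k in (22, 33, 44)])  # [0, 2, 5]
```

All three keys have home address 0, but each follows its own probe step (2 for key 33, 5 for key 44), so they do not trail one another through the table.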
3. Collision Resolution Strategies
• Linked lists (Separate Chaining):
  • Separate chaining (may be modified by keeping the chain sorted!)
  • Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records)
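
Separate chaining with sorted chains can be sketched in Python as below. The class names `Node` and `ChainedHashTable`, the table size of 7, and the modulo-division hashing function are assumptions for illustration. The table is an array of pointers to chains, which is the unmodified variant described above.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node

class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.heads = [None] * size       # array of pointers to chains

    def _address(self, key):
        return key % self.size           # assumed hashing function

    def insert(self, key):
        addr = self._address(key)
        head = self.heads[addr]
        if head is None or key < head.key:
            self.heads[addr] = Node(key, head)   # new smallest key
            return
        node = head
        while node.next is not None and node.next.key < key:
            node = node.next
        node.next = Node(key, node.next)         # keep the chain sorted

    def search(self, key):
        node = self.heads[self._address(key)]
        # A sorted chain lets the search stop once it passes the key.
        while node is not None and node.key <= key:
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, addr):
        keys, node = [], self.heads[addr]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys

table = ChainedHashTable()
for k in (24, 10, 17, 8):
    table.insert(k)
print(table.chain(3))     # [10, 17, 24]
print(table.search(17))   # True
print(table.search(3))    # False, stops at the first node (10 > 3)
```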
Linked List (Separate Chaining)
3. Collision Resolution Strategies
• Rehashing: When the table becomes too full, operations will start taking too long.
• Solution: Build another hash table of about double the size, with an associated hashing function, then scan down the entire original hash table and insert each element into the new table.
[Figure: curves labelled "successful search" and "unsuccessful search"]

3. Collision Resolution Strategies
• Rehashing: When is the table too full?
  • Rehash when the table is half full
  • Rehash when an insertion fails
  • Rehash when the table reaches a certain load factor (the best option)
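
The load-factor trigger and the rebuild step can be sketched together in Python. Several details are assumptions not given in the slides: the threshold of 0.5, linear probing as the collision strategy, the initial size of 5, rounding the doubled size up to the next prime, and all names used.

```python
def next_prime(n):
    """Smallest prime >= n (simple trial division)."""
    def is_prime(m):
        if m < 2:
            return False
        i = 2
        while i * i <= m:
            if m % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n

class ProbingTable:
    MAX_LOAD = 0.5  # assumed load-factor threshold

    def __init__(self, size=5):
        self.slots = [None] * size
        self.count = 0

    @staticmethod
    def _place(slots, key):
        addr = key % len(slots)            # hashing function tied to table size
        while slots[addr] is not None:     # linear probing (assumed)
            addr = (addr + 1) % len(slots)
        slots[addr] = key

    def insert(self, key):
        # Rehash first if this insertion would exceed the threshold.
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._rehash()
        self._place(self.slots, key)
        self.count += 1

    def _rehash(self):
        # New table of about double size, then scan the whole old table.
        new_slots = [None] * next_prime(2 * len(self.slots))
        for key in self.slots:
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots

table = ProbingTable()
sizes = []
for k in (12, 7, 22, 3, 18, 40):
    table.insert(k)
    sizes.append(len(table.slots))
print(sizes)  # [5, 5, 11, 11, 11, 23]
```

Because the hashing function depends on the table size, every key has to be placed again in the new table; copying slot contents across would leave keys at addresses the new function never checks.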

End of Hashing
Probing
• Definition: Each calculation of an address and test for success is known as probing.
Key offset collision resolution
• Offset = key / list size (integer quotient)
• Address = (Offset + old address) % list size
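
The key offset rule can be tried out with a few lines of Python. The function name, the sample key, and the list size of 307 are assumptions for illustration; the division is taken as integer division so that the offset is a whole number.

```python
def key_offset_address(key, old_address, list_size):
    """Next address to try after a collision at old_address."""
    offset = key // list_size                  # integer quotient
    return (offset + old_address) % list_size

key, list_size = 166702, 307
home = key % list_size                         # home address by modulo-division
print(home)                                    # 1
print(key_offset_address(key, home, list_size))  # 237 (offset 543, 544 % 307)
```

Two keys that collide at the same home address usually have different quotients, so they continue along different paths. If the offset happens to be a multiple of the list size, the address does not change, so a real implementation would need to guard against that case.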