Lecture B - Hashing.ppt

Hashing
Jeff Chastine
Hashing
• Many applications require INSERT, SEARCH and
DELETE functions
• Hashing can perform all of these in O(1) time
on average
• Based on keys
• Falls under two general categories:
– Direct-Address Tables
– Hash Tables
Direct-Addressing
• Good for when universe U of keys is small
– U = {0, 1, …, m – 1}, where m is not large
– All elements have unique keys
• Table T[0..m – 1], where each slot corresponds to
a key
• All operations take only O(1) time, even in the worst case
Direct Implementation
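[Figure: direct-address table T with slots 0 through 9. The universe of keys is U = {0, 1, …, 9}; the actual keys K = {2, 3, 5, 8} each occupy their own slot, which holds the key and its satellite data. The remaining slots are empty.]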
Direct-Addressing Operations
DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]
DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x
DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL
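The three operations above can be sketched in Python roughly as follows. This is a minimal illustration, not part of the original slides: the class name `DirectAddressTable`, the use of `None` for NIL, and the `(key, data)` tuple for an element are all assumptions made for the example.

```python
class DirectAddressTable:
    """Direct-address table T[0..m-1]: slot k holds the element whose key is k."""

    def __init__(self, m):
        # None plays the role of NIL (assumption for this sketch).
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: a single array lookup.
        return self.slots[k]

    def insert(self, key, data):
        # DIRECT-ADDRESS-INSERT: the element goes into the slot named by its key.
        self.slots[key] = (key, data)

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to NIL.
        self.slots[key] = None


table = DirectAddressTable(10)
for key in (2, 3, 5, 8):
    table.insert(key, f"data-{key}")
table.delete(3)
```

Each method touches exactly one slot, which is why every operation is O(1) regardless of how many keys are stored.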
Hash Tables
• What are potential problems with direct addressing?
– |U| may be impractically large
– Set of actual keys may be small relative to U
– Example: Social Security numbers (SSNs)
• Here, hash tables require much less storage
• Only catch: O(1) is the average-case time instead of the worst-case time!
How it works
• With direct addressing, an element with key k
goes into slot k
• With hashing, it goes into slot h(k), where h is a hash
function
• Hash functions try to “randomize” where keys land
• Hash function maps U to the slots of T[0..m – 1]
h : U → {0, 1, …, m – 1}
• Instead of |U| slots, we need only m slots
Hash Implementation
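[Figure: hash table T with slots 0 through m – 1. A hash function h maps the actual keys K = {k1, k2, k3, k4, k5}, drawn from the universe U, to slots h(k1), h(k4), h(k3), and so on. Keys k2 and k5 collide: h(k2) = h(k5).]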
Collisions
• A collision occurs when two keys hash to the same slot
• Because |U| > m, the pigeonhole principle applies
– Therefore, collisions must exist
– We often talk of the load factor α = n/m (n stored elements, m slots)
• Pick a good hash function
– Near random, yet deterministic
• Can chain colliding elements together in a list
– This is where the worst case comes from: every key may land in the same chain
• Can use open addressing
Chaining
[Figure: collision resolution by chaining. Each slot of T points to a linked list holding all of the keys from K that hash to that slot.]
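Chaining can be sketched in Python as below. This is an illustrative sketch only: the class name `ChainedHashTable`, the choice of `k mod m` as the hash function, and the use of Python lists in place of linked lists are assumptions for the example, not something the slides prescribe.

```python
class ChainedHashTable:
    """Hash table with m slots; each slot holds a chain of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one empty chain per slot
        self.n = 0                           # number of stored elements

    def _h(self, k):
        # Hash function assumed for illustration: k mod m.
        return k % self.m

    def insert(self, k, value):
        chain = self.slots[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:              # key already present: overwrite its value
                chain[i] = (k, value)
                return
        chain.append((k, value))
        self.n += 1

    def search(self, k):
        # Walk the chain for slot h(k); cost is proportional to the chain length.
        for key, value in self.slots[self._h(k)]:
            if key == k:
                return value
        return None

    def delete(self, k):
        chain = self.slots[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m  # alpha = n / m


t = ChainedHashTable(7)
for k in (10, 17, 3, 24):  # all four keys are congruent to 3 mod 7
    t.insert(k, str(k))
```

The sample keys were picked so that every one lands in slot 3. That is the worst case: a search has to scan a chain holding all n elements, so it costs O(n) rather than O(1).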
Hash Functions
• What makes a good hash function?
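– Each key is equally likely to hash to any of the m slots
– If keys are random real numbers uniformly distributed in [0, 1), then take
h(k) = floor(km)
– Strings can be converted to integers (e.g., from their ASCII codes) and then hashed
– Most hash functions involve mod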
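The two ideas in the list above, scaling a key in [0, 1) and turning a string into an integer, can be sketched as follows. The function names are hypothetical, and reading a string as a radix-128 number is one common way to do the ASCII conversion, assumed here for illustration.

```python
import math


def hash_unit_interval(k, m):
    """For a real key k in [0, 1): slot = floor(k * m)."""
    return math.floor(k * m)


def string_to_int(s, radix=128):
    """Interpret an ASCII string as an integer written in base `radix`."""
    value = 0
    for ch in s:
        value = value * radix + ord(ch)
    return value


def hash_string(s, m):
    """Convert the string to an integer, then reduce it with mod m."""
    return string_to_int(s) % m
```

For instance, `string_to_int("pt")` is 112 · 128 + 116 = 14452, since `ord("p")` is 112 and `ord("t")` is 116.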
Hash Functions
• Division method:
h(k) = k mod m
• Multiplication method:
Let A be a constant with 0 < A < 1
h(k) = floor(m (kA mod 1)), where kA mod 1 is the fractional part of kA
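Both methods are short enough to write out directly. In this sketch the function names are hypothetical, and the default A = (√5 − 1)/2 is a commonly suggested constant rather than something the slides specify; any A strictly between 0 and 1 fits the definition.

```python
import math


def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)).

    The default A is an assumption for this sketch, not a value from the slides.
    """
    frac = (k * A) % 1.0          # fractional part of k*A
    return math.floor(m * frac)   # scale into 0..m-1
```

The floating-point version is fine for illustration; with very large keys, rounding error in `k * A` can affect the result, and fixed-width integer arithmetic is the usual alternative.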
Open Addressing
• Systematically examine, or probe, slots until the
item is found or an empty slot shows that it is absent
• No lists and no elements stored outside the
table; thus α <= 1
• Instead of following pointers, we compute the
sequence of slots to probe
– The probe order is not fixed; it depends on the key
Kinds of Open Addressing
• Linear Probing
h(k, i) = (h′(k) + i) mod m
• Quadratic Probing
h(k, i) = (h′(k) + c₁i + c₂i²) mod m
• Double Hashing
h(k, i) = (h₁(k) + i h₂(k)) mod m
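The three probe formulas, together with open-address insert and search, might look like this in Python. Everything named here is an assumption for illustration: the function names, the constants c1 = c2 = 1, and the auxiliary hash functions `h1(k) = k mod m` and `h2(k) = 1 + (k mod (m − 1))`. Deletion is deliberately left out, since simply emptying a slot would break later searches.

```python
def linear_probe(h_prime, k, i, m):
    """h(k, i) = (h'(k) + i) mod m"""
    return (h_prime(k) + i) % m


def quadratic_probe(h_prime, k, i, m, c1=1, c2=1):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m (c1, c2 assumed for the sketch)"""
    return (h_prime(k) + c1 * i + c2 * i * i) % m


def double_hash_probe(h1, h2, k, i, m):
    """h(k, i) = (h1(k) + i*h2(k)) mod m"""
    return (h1(k) + i * h2(k)) % m


def insert(table, k, probe):
    """Probe slots h(k, 0), h(k, 1), ... and store k in the first empty one."""
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def search(table, k, probe):
    """Follow the same probe sequence; an empty slot means k is absent."""
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None:
            return None
        if table[j] == k:
            return j
    return None


m = 13


def h1(k):
    return k % m


def h2(k):
    return 1 + (k % (m - 1))  # never 0, so the probe always advances


def probe(k, i):
    return double_hash_probe(h1, h2, k, i, m)


table = [None] * m
for key in (79, 69, 98, 72, 14):
    insert(table, key, probe)
```

In this run, 72 collides with 98 at slot 7 and moves to slot 8, while 14 collides at slots 1, 4 and 7 before settling in slot 10. Because the step size h2(k) depends on the key, two keys that start at the same slot usually follow different probe sequences.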