Midterm will be given in Week 11’s lecture (2 hours)

Part E
Hash Tables
[Figure: a hash table with cells 0 to 4. Cells 1, 2 and 4 hold the keys 025-612-0001, 981-101-0002 and 451-229-0004; cells 0 and 3 are empty.]
• Question: how can we design a data structure in which all of the following operations take O(1) time?
– Deletion
– Search
– Insertion
– Replace
Motivations of Hash Tables
• We have n items, each containing a key and a value: (k, value).
– The key uniquely determines the item.
• A key could be anything, e.g., a number in $[0, 2^{32}]$, a string of length 32, etc.
• How can we store the n items so that, given a key k, we can find the position of the item with key k in O(1) time?
– Another constraint: the space required is O(n).
• Linked list? Space O(n), but time O(n).
• Array indexed by the key? Time O(1), but the space is far too large, e.g.,
• if the key is an integer in $[0, 2^{32}]$, the array needs about $2^{32}$ cells;
• if the key is a string of 30 lowercase letters, the array needs $26^{30}$ cells.
• Hash table: space O(n) and expected time O(1).
Basic ideas of Hash Tables
• A hash function h maps keys of a given type with a wide range to integers in a fixed interval [0, N - 1], where N is the size of the hash table, such that
• if $k \neq k'$ then $h(k) \neq h(k')$.   (1)
Problem: it is hard to design a function h such that (1) always holds.
What we can do:
• We can design a function h so that (1) holds with high probability,
• i.e., (1) may not always hold, but it holds for most of the n keys.
Hash Functions
• A hash function h maps keys of a given type to integers in a
fixed interval [0, N - 1]
• Example:
h(x) = x mod N
is a hash function for integer keys
• The integer h(x) is called the hash value of key x
• A hash table for a given key type consists of
– Hash function h
– Array (called table) of size N
• The goal is to store item (k, o) at index i = h(k)
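A minimal Python sketch of these two ingredients, assuming integer keys and the division hash h(x) = x mod N from the example above; the class name `SimpleHashTable` is hypothetical. It deliberately has no collision handling and raises an error when two different keys map to the same cell, which is the problem the later slides solve.

```python
class SimpleHashTable:
    """Array of size N plus the hash function h(x) = x mod N; no collision handling."""

    def __init__(self, n_cells: int) -> None:
        self.N = n_cells
        self.table = [None] * n_cells  # each cell holds one (key, value) pair or None

    def h(self, key: int) -> int:
        return key % self.N  # the hash value of the key

    def put(self, key: int, value) -> None:
        i = self.h(key)
        if self.table[i] is not None and self.table[i][0] != key:
            raise KeyError(f"collision at cell {i}")  # two keys share one cell
        self.table[i] = (key, value)  # store item (k, o) at index i = h(k)

    def get(self, key: int):
        entry = self.table[self.h(key)]
        return entry[1] if entry is not None and entry[0] == key else None


t = SimpleHashTable(13)
t.put(18, "x")    # 18 mod 13 = 5, so the item goes to cell 5
print(t.get(18))  # x
```

Calling `t.put(44, "y")` afterwards raises `KeyError`, because 44 mod 13 is also 5.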
Example
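[Figure: a hash table with cells 0, 1, 2, 3, 4, …, 9997, 9998, 9999. Cells 1, 2, 4 and 9998 hold the keys 025-612-0001, 981-101-0002, 451-229-0004 and 200-751-9998; the other cells shown are empty.]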
• We design a hash table storing entries as (HKID, Name), where HKID is a nine-digit positive integer.
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
• We need a method to handle collisions.
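Taking the last four digits is the same as computing x mod 10,000. The Python sketch below places the four keys from the figure, assuming the dashes are simply dropped when a key is turned into an integer (the helper `to_int` is hypothetical).

```python
N = 10_000  # table size used in the example


def h(hkid: int) -> int:
    """Return the last four digits of the key, i.e. hkid mod 10,000."""
    return hkid % N


def to_int(key: str) -> int:
    """Drop the dashes, e.g. "025-612-0001" -> 256120001."""
    return int(key.replace("-", ""))


table = [None] * N
for key in ["025-612-0001", "981-101-0002", "451-229-0004", "200-751-9998"]:
    table[h(to_int(key))] = key  # store the key in cell h(key)

print([i for i, v in enumerate(table) if v is not None])  # [1, 2, 4, 9998]
```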
Example
[Figure: a hash table with cells 0, 1, 2, 3, 4, …, 97, 98, 99. Cell 1 holds 50xxxx01, cell 2 holds both 50xxxx02 and 51xxxx02 (a collision), cell 4 holds 86xxxx04 and cell 98 holds 51xxxx98.]
• We design a hash table storing entries as (studentID, Name), where studentID is an eight-digit positive integer.
• Our hash table uses an array of size N = 100 for our class of n = 49 students and the hash function h(x) = last two digits of x.
• We need a method to handle collisions.
Task: store the n items so that, given a key k, we can find the position of the item with key k in O(1) time. As long as the chance of a collision is low, we can achieve this goal. Setting N = 1000 and using the last three digits reduces the chance of collision.
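To see why a larger N helps, the sketch below counts collisions under h(x) = x mod N for N = 100 and N = 1000. The student IDs are hypothetical (the real ones are masked on the slide) and were chosen so that two of them share their last two digits.

```python
from collections import Counter


def collisions(keys, n_cells):
    """Number of keys that land in an already-used cell under h(x) = x mod n_cells."""
    counts = Counter(k % n_cells for k in keys)
    return sum(c - 1 for c in counts.values())


# Hypothetical eight-digit student IDs.
ids = [50123401, 50987602, 51234502, 86000104, 51777798]
print(collisions(ids, 100))   # 1: two IDs end in 02
print(collisions(ids, 1000))  # 0: their last three digits (602, 502) differ
```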
How to design a Hash Function
• A hash function is usually specified as the composition of two functions:
Hash code:
$h_1: \text{keys} \to \text{integers}$
– The key could be anything, e.g., your name, an object, etc.
Compression function:
$h_2: \text{integers} \to [0, N - 1]$
– The size N of the array cannot be too large, in order to save space.
– Trade-off between space and time.
• The hash code is applied first, and the compression function is then applied to the result, i.e., $h(x) = h_2(h_1(x))$.
• The goal of the hash function is to “disperse” the keys in an apparently random way so that, in most cases, (1) holds.
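A small Python sketch of the composition h(x) = h2(h1(x)) for string keys. Both choices are assumptions made for illustration: h1 sums the characters' ASCII codes (one simple hash code among many) and h2 is the division method with the prime N = 101.

```python
N = 101  # assumed prime table size


def h1(key: str) -> int:
    """Hash code: sum of the characters' ASCII codes."""
    return sum(ord(ch) for ch in key)


def h2(y: int) -> int:
    """Compression function: division method, y mod N."""
    return y % N


def h(key: str) -> int:
    return h2(h1(key))  # hash code first, then compression


print(h("AB"))  # (65 + 66) mod 101 = 30
```

Summing codes is easy to follow but weak: anagrams such as "AB" and "BA" always collide, which is why practical hash codes also take each character's position into account.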
Integer cast:
– We reinterpret the bits of the key as an integer.
Example: each character is mapped to its ASCII code: A = 65 = 01000001, B = 66 = 01000010, a = 97, b = 98, ‘,’ = 44, ‘.’ = 46.
– Suitable for keys whose length is less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
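In Python, `ord` gives a character's code and `int.from_bytes` reinterprets a byte string as one integer. The sketch below assumes ASCII keys and big-endian byte order; `integer_cast` is a hypothetical helper name.

```python
def integer_cast(key: str) -> int:
    """Reinterpret the ASCII bytes of a short key as one unsigned integer (big-endian)."""
    return int.from_bytes(key.encode("ascii"), "big")


for ch in "ABab,.":
    print(ch, ord(ch), format(ord(ch), "08b"))  # e.g. "A 65 01000001"

print(format(integer_cast("AB"), "016b"))  # 0100000101000010
```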
Component sum:
– We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) $a_0 a_1 \ldots a_{n-1}$ and we sum the components, ignoring overflows: $a_0 + a_1 + a_2 + \cdots + a_{n-1}$
– Example 1 (8-bit components): AB = 0100000101000010
h(AB) = 01000001 + 01000010 = 10000011.
– Example 2 (8-bit components):
h(100000011000001001001000) = 10000001 + 10000010 + 01001000 = 101001011; ignoring the overflow bit leaves 01001011.
– Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)
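A Python sketch of the component sum, assuming 8-bit components as in both examples and a key length that is a multiple of the component width. Overflow is "ignored" by keeping only the low 8 bits of the total.

```python
def component_sum(key_bits: int, total_bits: int, width: int = 8) -> int:
    """Sum the width-bit components of a total_bits-bit key, ignoring overflows.

    Assumes total_bits is a multiple of width.
    """
    mask = (1 << width) - 1
    total = 0
    for shift in range(0, total_bits, width):
        total += (key_bits >> shift) & mask  # extract one component
    return total & mask  # keep only the low `width` bits (drop overflow)


print(format(component_sum(0b0100000101000010, 16), "08b"))          # 10000011
print(format(component_sum(0b100000011000001001001000, 24), "08b"))  # 01001011
```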
Compression Functions
• Division:
– $h_2(y) = y \bmod N$
– The size N of the hash table is usually chosen to be a prime.
– The reason has to do with number theory and is beyond the scope of this course.
– Example: keys {200, 205, 210, 215, 220, 600}.
If N = 100, then 200 and 600 have the same hash value, 0.
It is better to choose N = 101: the six keys then map to 99, 3, 8, 13, 18 and 95, which are all distinct.
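The same comparison in Python, using the keys from the example; the helper name `division` is ours.

```python
keys = [200, 205, 210, 215, 220, 600]


def division(y: int, n: int) -> int:
    """Division compression function h2(y) = y mod n."""
    return y % n


print([division(k, 100) for k in keys])  # [0, 5, 10, 15, 20, 0]: 200 and 600 collide
print([division(k, 101) for k in keys])  # [99, 3, 8, 13, 18, 95]: all distinct
```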
Compression Functions
• Multiply, Add and Divide (MAD):
– $h_2(y) = (ay + b) \bmod N$
– a and b are nonnegative integers such that $a \bmod N \neq 0$, and N is a prime number.
Example: keys = {200, 205, 210, 215, 220, 600}, N = 101, a = 3 and b = 7.
h(200) = (600 + 7) mod 101 = 607 mod 101 = 1.
h(205) = (615 + 7) mod 101 = 622 mod 101 = 16.
h(210) = (630 + 7) mod 101 = 637 mod 101 = 31.
h(215) = (645 + 7) mod 101 = 652 mod 101 = 46.
h(220) = (660 + 7) mod 101 = 667 mod 101 = 61.
h(600) = (1800 + 7) mod 101 = 1807 mod 101 = 90.
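A Python sketch of the MAD compression function with the parameters from the example (a = 3, b = 7, N = 101), useful for checking the arithmetic above.

```python
def mad(y: int, a: int = 3, b: int = 7, n: int = 101) -> int:
    """Multiply, Add and Divide: (a*y + b) mod n, with a mod n != 0."""
    if a % n == 0:
        raise ValueError("a must not be a multiple of n")
    return (a * y + b) % n


keys = [200, 205, 210, 215, 220, 600]
print([mad(k) for k in keys])  # [1, 16, 31, 46, 61, 90]
```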
Collision Handling
• Collisions occur when
different elements are
mapped to the same cell
• Separate Chaining: let
each cell in the table
point to a linked list of
entries that map there
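[Figure: separate chaining in a table with cells 0 to 4. Cell 1 points to a list holding 025-612-0001; cell 4 points to a list holding 451-229-0004 and 981-101-0004; cells 0, 2 and 3 are empty.]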
• Separate chaining is
simple, but requires
additional memory
outside the table
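A minimal Python sketch of separate chaining, assuming integer keys and h(x) = x mod N. As a simplification, each chain is a Python list rather than a hand-built linked list; the class name and the stored names are hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: cell i holds a list of all (key, value) pairs with h(key) = i."""

    def __init__(self, n_cells: int) -> None:
        self.N = n_cells
        self.cells = [[] for _ in range(n_cells)]  # one (initially empty) chain per cell

    def _h(self, key: int) -> int:
        return key % self.N

    def put(self, key: int, value) -> None:
        chain = self.cells[self._h(key)]
        for j, (k, _) in enumerate(chain):
            if k == key:
                chain[j] = (key, value)  # key already present: replace
                return
        chain.append((key, value))  # otherwise add to the end of the chain

    def get(self, key: int):
        for k, v in self.cells[self._h(key)]:
            if k == key:
                return v
        return None

    def remove(self, key: int):
        chain = self.cells[self._h(key)]
        for j, (k, v) in enumerate(chain):
            if k == key:
                del chain[j]
                return v
        return None


# Keys from the figure (dashes dropped); N = 10,000 so h(x) = last four digits.
t = ChainedHashTable(10_000)
for key, name in [(256120001, "name A"), (4512290004, "name B"), (9811010004, "name C")]:
    t.put(key, name)
print(len(t.cells[4]))  # 2: both keys ending in 0004 share cell 4
```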
Open Addressing
• The colliding item is placed in a different cell of the table.
• Load factor: n/N, where n is the number of items to store and N is the size of the hash table.
• Since each cell holds at most one item, n/N ≤ 1.
• To get reasonable performance, keep n/N < 0.5.
Linear Probing
• Linear probing handles
collisions by placing the
colliding item in the next
(circularly) available table
cell
• Each table cell inspected is
referred to as a “probe”
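• Colliding items lump together, so future collisions cause longer sequences of probes
• Example:
– h(x) = x mod 13
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
– For instance, 31 hashes to cell 5 and probes cells 5 to 9 (all occupied) before landing in cell 10.
– Resulting table: cell 2: 41, cell 5: 18, cell 6: 44, cell 7: 59, cell 8: 32, cell 9: 22, cell 10: 31, cell 11: 73; cells 0, 1, 3, 4 and 12 are empty.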
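A Python sketch of linear-probing insertion that reproduces the example, assuming h(x) = x mod 13 and keys only (no values); the function name is ours.

```python
def linear_probe_insert(keys, n=13):
    """Insert keys in order with linear probing, h(x) = x mod n."""
    table = [None] * n
    for key in keys:
        i = key % n
        for _ in range(n):  # at most n probes
            if table[i] is None:
                table[i] = key
                break
            i = (i + 1) % n  # next cell, wrapping around circularly
        else:
            raise RuntimeError("table is full")
    return table


print(linear_probe_insert([18, 41, 22, 44, 59, 32, 31, 73]))
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```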
Search with Linear Probing
• Consider a hash table A that
uses linear probing
• get(k)
– We start at cell h(k)
– We probe consecutive
locations until one of the
following occurs
• An item with key k is found,
or
• An empty cell is found, or
• N cells have been
unsuccessfully probed
– For efficiency when k is not in the table, we want to reach an empty cell as soon as possible, so the load factor must NOT be close to 1.
Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null
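A direct Python translation of get(k), assuming each cell holds a (key, value) tuple or `None` for an empty cell, and h(x) = x mod N. The table is the one from the linear-probing example, filled in by hand with placeholder values.

```python
def get(table, k):
    """Linear-probing search; cells hold (key, value) tuples or None (empty)."""
    n = len(table)
    i = k % n                 # i <- h(k)
    for _ in range(n):        # stop after N unsuccessful probes
        c = table[i]
        if c is None:         # empty cell: k is not in the table
            return None
        if c[0] == k:         # found the key
            return c[1]
        i = (i + 1) % n       # next cell, circularly
    return None


# Table from the linear-probing example (h(x) = x mod 13), values are placeholders.
table = [None] * 13
for cell, key in [(2, 41), (5, 18), (6, 44), (7, 59), (8, 32), (9, 22), (10, 31), (11, 73)]:
    table[cell] = (key, f"item{key}")

print(get(table, 73))  # item73
print(get(table, 15))  # None (probes cells 2 and 3)
```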
Linear Probing
• Example:
– h(x) = x mod 13
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 12, 20 in this order.
– Resulting table: cell 0: 20, cell 2: 41, cell 5: 18, cell 6: 44, cell 7: 59, cell 8: 32, cell 9: 22, cell 10: 31, cell 11: 73, cell 12: 12.
• Search for key = 20.
– h(20) = 20 mod 13 = 7.
– Probe cells 7, 8, …, 12 and then 0, where 20 is found.
• Search for key = 15.
– h(15) = 15 mod 13 = 2.
– Probe cells 2 and 3; cell 3 is empty, so return null.
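The two searches can be traced in Python by recording which cells get(k) inspects. This sketch assumes keys only, h(x) = x mod 13, and no deletions; both helper names are ours.

```python
def insert_all(keys, n=13):
    """Linear-probing insertion of keys (no deletions), h(x) = x mod n."""
    table = [None] * n
    for key in keys:
        i = key % n
        while table[i] is not None:  # assumes the table never becomes full
            i = (i + 1) % n
        table[i] = key
    return table


def probe_sequence(table, k):
    """Cells inspected by get(k) before it finds k or an empty cell."""
    n = len(table)
    i = k % n
    visited = []
    for _ in range(n):
        visited.append(i)
        if table[i] is None or table[i] == k:
            break
        i = (i + 1) % n
    return visited


table = insert_all([18, 41, 22, 44, 59, 32, 31, 73, 12, 20])
print(probe_sequence(table, 20))  # [7, 8, 9, 10, 11, 12, 0]
print(probe_sequence(table, 15))  # [2, 3]
```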
Updates with Linear Probing
• To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.
• remove(k)
– We search for an entry with key k.
– If such an entry (k, o) is found, we replace it with the special item AVAILABLE and return element o.
– Else, we return null.
– Other methods have to be modified to skip AVAILABLE cells: get(k), for example, must probe past an AVAILABLE cell instead of stopping there as it would at an empty cell.
• put(k, o)
– We throw an exception if the table is full.
– We start at cell h(k).
– We probe consecutive cells until one of the following occurs:
• a cell i is found that is either empty or stores AVAILABLE, or
• N cells have been unsuccessfully probed.
– We store entry (k, o) in cell i.
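A Python sketch of remove(k) with an AVAILABLE marker, assuming (key, value) tuples, `None` for empty cells and h(x) = x mod N. The sentinel object and the helper `find_index` are illustrative choices; the search skips AVAILABLE cells but stops at truly empty ones.

```python
AVAILABLE = object()  # sentinel that replaces a deleted entry


def find_index(table, k):
    """Index of the cell holding key k, or -1. AVAILABLE cells are skipped, not treated as empty."""
    n = len(table)
    i = k % n
    for _ in range(n):
        c = table[i]
        if c is None:
            return -1  # a truly empty cell ends the search
        if c is not AVAILABLE and c[0] == k:
            return i
        i = (i + 1) % n
    return -1


def remove(table, k):
    """Replace the entry with key k by AVAILABLE and return its element (None if absent)."""
    i = find_index(table, k)
    if i == -1:
        return None
    element = table[i][1]
    table[i] = AVAILABLE
    return element


# Table from the linear-probing example, h(x) = x mod 13 (values are placeholders).
table = [None] * 13
for cell, key in [(2, 41), (5, 18), (6, 44), (7, 59), (8, 32), (9, 22), (10, 31), (11, 73)]:
    table[cell] = (key, f"item{key}")

print(remove(table, 44))      # item44; cell 6 now stores AVAILABLE
print(find_index(table, 31))  # 10: the search for 31 (h = 5) probes past cell 6
```

If cell 6 were simply emptied (set to `None`), the search for 31 would stop at cell 6 and wrongly report that 31 is missing; this is why deletion needs the AVAILABLE marker.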
Updates with Linear Probing
Algorithm put(k, o)
  i ← h(k); p ← 0
  av ← -1                       // first AVAILABLE cell seen so far (-1: none)
  repeat
    c ← A[i]
    if c = ∅                    // k is not in the table
      if av = -1 then A[i] ← (k, o)
      else A[av] ← (k, o)       // reuse the first AVAILABLE cell
      return
    else if c ≠ AVAILABLE and c.key() = k
      A[i].element ← o          // k is already present: replace its element
      return
    else if c = AVAILABLE and av = -1
      av ← i
    i ← (i + 1) mod N
    p ← p + 1
  until p = N
  if av > -1 then A[av] ← (k, o)
  else throw an exception       // the table is full
• Example:
– h(x) = x mod 13
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 20, 12 in this order.
– To insert 12, we look at cell 12 (already holding 20) and then cell 0.
– Resulting table: cell 0: 12, cell 2: 41, cell 5: 18, cell 6: 44, cell 7: 59, cell 8: 32, cell 9: 22, cell 10: 31, cell 11: 73, cell 12: 20.
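A Python translation of put(k, o) as written above, assuming (key, value) tuples, `None` for empty cells and h(x) = x mod N. The sentinel `AVAILABLE` and the exception class `FullTableError` are hypothetical names chosen for this sketch.

```python
AVAILABLE = object()  # sentinel for a deleted cell


class FullTableError(Exception):
    """Hypothetical exception type for a table with no free cell."""


def put(table, k, o):
    """Linear-probing put(k, o); h(x) = x mod N."""
    n = len(table)
    i = k % n
    av = -1  # first AVAILABLE cell on the probe path, -1 if none yet
    for _ in range(n):
        c = table[i]
        if c is None:  # empty cell: k is not in the table
            table[i if av == -1 else av] = (k, o)
            return
        if c is not AVAILABLE and c[0] == k:  # k already present: replace element
            table[i] = (k, o)
            return
        if c is AVAILABLE and av == -1:
            av = i
        i = (i + 1) % n
    if av != -1:
        table[av] = (k, o)  # no empty cell found, but an AVAILABLE one was seen
    else:
        raise FullTableError("no empty or AVAILABLE cell")


table = [None] * 13
for key in [18, 41, 22, 44, 59, 32, 31, 73, 20, 12]:
    put(table, key, f"item{key}")
print([None if c is None else c[0] for c in table])
# [12, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, 20]
```

Remembering only the first AVAILABLE cell, and writing to it only after an empty cell confirms that k is absent, keeps the same key from being stored twice.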
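A complete example
• Example:
– h(x) = x mod 13
– Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 20, 12 in this order (20 ends up in cell 12 and 12 in cell 0).
– remove(20) and remove(12): cells 12 and 0 now store AVAILABLE.
– get(11): starting at cell 11, the search must probe past the AVAILABLE cells 12 and 0; it stops at the empty cell 1 and returns null.
– Insert keys 10 and 11: 10 is stored in cell 12 and 11 in cell 0, the first AVAILABLE cells on their probe paths.
• AVAILABLE cells are hard to deal with; the separate chaining approach is simpler.
[Figure: cells 0 to 12 after the two removals: cell 0: A (formerly 12), cell 2: 41, cells 5 to 11: 18, 44, 59, 32, 22, 31, 73, cell 12: A (formerly 20), where A marks AVAILABLE.]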
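The whole example can be replayed with a small linear-probing map. This is a sketch under stated assumptions: integer keys, h(x) = x mod 13, placeholder values, and hypothetical names (`LinearProbingMap`, `keys_by_cell`). Unlike the single-pass put above, this put uses two passes: it first searches for k, then stores the entry in the first empty or AVAILABLE cell; for this example both give the same result.

```python
class LinearProbingMap:
    """Open addressing with linear probing and an AVAILABLE marker for deletions."""

    _AVAILABLE = object()

    def __init__(self, n_cells: int = 13) -> None:
        self.N = n_cells
        self.A = [None] * n_cells

    def _h(self, k: int) -> int:
        return k % self.N

    def _find(self, k: int) -> int:
        """Index of the cell holding key k, or -1; AVAILABLE cells are probed past."""
        i = self._h(k)
        for _ in range(self.N):
            c = self.A[i]
            if c is None:
                return -1
            if c is not self._AVAILABLE and c[0] == k:
                return i
            i = (i + 1) % self.N
        return -1

    def get(self, k):
        i = self._find(k)
        return None if i == -1 else self.A[i][1]

    def remove(self, k):
        i = self._find(k)
        if i == -1:
            return None
        o = self.A[i][1]
        self.A[i] = self._AVAILABLE
        return o

    def put(self, k, o) -> None:
        i = self._find(k)
        if i != -1:
            self.A[i] = (k, o)  # key already present: replace its value
            return
        i = self._h(k)
        for _ in range(self.N):
            if self.A[i] is None or self.A[i] is self._AVAILABLE:
                self.A[i] = (k, o)  # first empty or AVAILABLE cell
                return
            i = (i + 1) % self.N
        raise OverflowError("hash table is full")

    def keys_by_cell(self):
        """Key per cell, "A" for AVAILABLE, None for empty."""
        return ["A" if c is self._AVAILABLE else (None if c is None else c[0]) for c in self.A]


m = LinearProbingMap(13)
for k in [18, 41, 22, 44, 59, 32, 31, 73, 20, 12]:
    m.put(k, f"item{k}")
m.remove(20)
m.remove(12)
print(m.get(11))  # None
m.put(10, "item10")
m.put(11, "item11")
print(m.keys_by_cell())
# [11, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, 10]
```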
Performance of Hashing
• In the worst case, searches, insertions and removals on a hash table take O(n) time.
• The worst case occurs when all the keys inserted into the map collide.
• The load factor $\alpha = n/N$ affects the performance of a hash table.
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is
$1/(1 - \alpha)$
• The expected running time of all the operations in a hash table is O(1).
• In practice, hashing is very fast provided the load factor is not close to 100%.
• Applications of hash tables:
– small databases
– compilers
– browser caches
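To see why the load factor should stay well below 1, the sketch below evaluates the bound 1/(1 - α) quoted above for a few load factors. It only evaluates the formula (under the random-hash assumption) and does not simulate a table.

```python
def expected_probes(load_factor: float) -> float:
    """Expected probes for an open-addressing insertion, 1 / (1 - alpha)."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1 / (1 - load_factor)


for a in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha = {a:.2f}: about {expected_probes(a):.1f} probes")
```

Going from α = 0.5 (2 probes) to α = 0.9 (10 probes) multiplies the expected cost by five, which matches the advice to keep n/N below 0.5.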