Hash_Table

Hash Tables
Group Members:
Syed Husnain Bukhari (SP10-BSCS-92)
Ahmad Inam (SP10-BSCS-06)
M. Umair Sharif (SP10-BSCS-38)
Description
• A hash table is a data structure that stores
items and allows insertions, lookups, and
deletions to be performed in O(1) time on average.
• A hash function converts a key, typically a
string, to a number (the hash code). The number is
then compressed according to the size of the table
and used as an index.
• Distinct keys may be mapped to the same
index. This is called a collision and must be
resolved.
Figure: Key ("Smith") -> Hash Code Generator -> Number -> Compression -> Index (7). The record for Bob Smith (123 Main St., Orlando, FL 327816, 407-555-1111, bob@myisp.com) is stored at index 7 of a ten-slot table (indexes 0-9).
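The key-to-index pipeline in the figure can be sketched in Python as follows. This is a minimal sketch, not the generator used on the slide: the hash code generator is an assumed polynomial string hash with multiplier 31, and compression is assumed to be the remainder modulo the table size. Because the generator is an assumption, the index it computes for "Smith" need not be 7.

```python
def hash_code(key: str) -> int:
    """Hash code generator (assumed): polynomial string hash, h = 31*h + char code."""
    h = 0
    for ch in key:
        h = 31 * h + ord(ch)  # Python ints are unbounded, so no overflow handling is needed
    return h


def compress(code: int, table_size: int) -> int:
    """Compression: reduce any integer to a valid index 0..table_size-1."""
    return code % table_size


def index_for(key: str, table_size: int = 10) -> int:
    """Key -> hash code generator -> number -> compression -> index."""
    return compress(hash_code(key), table_size)


# Store a record in a ten-slot table, as in the figure.
table = [None] * 10
record = {"name": "Bob Smith", "phone": "407-555-1111", "email": "bob@myisp.com"}
table[index_for("Smith")] = record
```

Splitting the work into two functions mirrors the figure: the generator can produce any integer, and only the compression step depends on the table size.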
Definition:
Hashing is a key-to-address mapping process.
Key terms:
Collision: A collision occurs when a hashing algorithm produces an
address for an insertion key and that address is already occupied.
Home address: The address produced by the hashing algorithm is
known as the home address.
Prime area: The memory that contains all of the home addresses is
known as the prime area.
Probe: Each calculation of an address and test for success is known
as a probe.
Collision Resolution
• There are two kinds of collision resolution:
1 – Chaining makes each entry a linked list so
that when a collision occurs the new entry is
added to the end of the list.
2 – Open Addressing uses probing to discover
an empty spot.
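• With chaining, the table never becomes full, so it
does not have to be resized, although long chains
slow lookups down. With open addressing, every
element occupies its own slot, so the table must be
resized before the number of elements exceeds its
capacity (in practice, well before it is full, since
probe sequences grow long as the table fills).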
Smith  7
Chaining
0
1
2
3
4
5
6
7
8
9
Bob Smith
123 Main St.
Orlando, FL 327816
407-555-1111
bob@myisp.com
Jim Smith
123 Elm St.
Orlando, FL 327816
407-555-2222
jim@myisp.com
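Chaining can be sketched in Python with a list per slot standing in for the linked list described above. This is a minimal sketch assuming integer keys and Python's built-in hash(); in CPython, hash(n) equals n for small non-negative integers, so keys 7 and 17 both reduce to index 7 in a ten-slot table and collide as in the figure. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: every slot holds a chain of (key, value) pairs."""

    def __init__(self, size: int = 10):
        self.buckets = [[] for _ in range(size)]  # one empty chain per slot

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def insert(self, key, value) -> None:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: replace its value
                return
        chain.append((key, value))  # new key (possibly a collision): add to end of chain

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key) -> None:
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)


chained = ChainedHashTable(10)
chained.insert(7, "Bob Smith")
chained.insert(17, "Jim Smith")  # 17 % 10 == 7: collides with key 7, appended to the same chain
```

Because a chain can grow without limit, the table never runs out of room; the cost of a collision is a longer scan of one chain.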
Smith  7
Probing
0
1
2
3
4
5
Bob Smith
123 Main St.
Orlando, FL 327816
407-555-1111
bob@myisp.com
6
7
8
9
Jim Smith
123 Elm St.
Orlando, FL 327816
407-555-2222
jim@myisp.com
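Probing can be sketched as open addressing with linear probing, which is only one of several probe strategies; the slide does not say which one it uses. This minimal sketch makes the same assumptions as the chaining example (integer keys, Python's hash(), hypothetical names). Each loop iteration is one probe in the sense defined earlier: an address is calculated and tested.

```python
class LinearProbingTable:
    """Open addressing with linear probing: try the home address, then the next slots."""

    def __init__(self, size: int = 10):
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key):
        """Yield slot indexes starting at the home address, wrapping around the table."""
        home = hash(key) % len(self.keys)
        for step in range(len(self.keys)):
            yield (home + step) % len(self.keys)

    def insert(self, key, value) -> int:
        for slot in self._probe(key):
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return slot
        raise RuntimeError("table is full; it must be resized")

    def lookup(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is None:
                break  # an empty slot ends the probe sequence: key is absent
            if self.keys[slot] == key:
                return self.values[slot]
        raise KeyError(key)


probed = LinearProbingTable(10)
probed.insert(7, "Bob Smith")   # home address 7 is empty
probed.insert(17, "Jim Smith")  # home address 7 is taken, so probing moves on to slot 8
```

Deletion is deliberately omitted: simply emptying a slot would cut the probe sequence for keys stored after it, so open-addressing tables usually mark deleted slots with a special "tombstone" value instead.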
Hashing Methods
Eight common hashing methods are:
1: Direct method
2: Subtraction method
3: Modulo-division
4: Mid square
5: Digit extraction
6: Rotation
7: Folding
8: Pseudorandom generation
• Direct Method
In direct hashing the key is the address,
without any algorithmic manipulation.
Direct hashing is limited, because every key
must fall within the address range of the table,
but it can be very powerful because it guarantees
that there are no synonyms and therefore no
collisions.
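Direct hashing can be sketched as an array indexed by the key itself. This is a minimal sketch assuming small non-negative integer keys; the class name and the 0-99 employee-number range are hypothetical. The range check makes the method's limitation explicit.

```python
class DirectAddressTable:
    """Direct hashing: the key is used as the address with no computation."""

    def __init__(self, size: int):
        self.slots = [None] * size

    def _check(self, key: int) -> None:
        # The limitation of direct hashing: every key must be a valid address.
        if not 0 <= key < len(self.slots):
            raise ValueError(f"key {key} is outside the address range 0..{len(self.slots) - 1}")

    def insert(self, key: int, value) -> None:
        self._check(key)
        self.slots[key] = value  # distinct keys always get distinct slots: no collisions

    def lookup(self, key: int):
        self._check(key)
        return self.slots[key]


employees = DirectAddressTable(100)  # assumed: employee numbers run from 0 to 99
employees.insert(42, "Employee 42")
```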
• Modulo-division Method
This is also known as the division-remainder method.
This algorithm works with any list size, but a list
size that is a prime number produces fewer
collisions than other list sizes.
The formula to calculate the address is:
Address = (key MODULO listsize) + 1
where listsize is the number of elements in the
array. Adding 1 gives addresses in the range
1 to listsize.
• Example:
Given a list size of 19 and the keys
137456, 214562, and 140145:
137456 % 19 + 1 = 11
214562 % 19 + 1 = 15
140145 % 19 + 1 = 2
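The modulo-division formula can be sketched in Python as follows, assuming the list size of 19, which reproduces the example's addresses. The function name is hypothetical.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Modulo-division hashing: address = (key MOD list_size) + 1, in 1..list_size."""
    return key % list_size + 1  # % binds tighter than +, so this is (key % list_size) + 1


for key in (137456, 214562, 140145):
    print(key, "->", modulo_division(key, 19))
# 137456 -> 11
# 214562 -> 15
# 140145 -> 2
```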
• Digit-extraction Method
Using digit extraction, selected digits are extracted from
the key and used as the address.
Example: To hash a six-digit employee number to a
three-digit address (000-999), we could select the
first, third, and fourth digits (from the left) and use
them as the address.
The keys are:
379452 -> 394
121267 -> 112
378845 -> 388
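Digit extraction can be sketched as below, using the first, third, and fourth digits as in the example. The address is returned as a string so that leading zeros in the 000-999 range survive; the function name and the zero-padding of short keys are assumptions of this sketch.

```python
def digit_extraction(key: int, positions=(1, 3, 4), key_width: int = 6) -> str:
    """Build an address from selected digits of the key (positions are 1-based, from the left)."""
    digits = f"{key:0{key_width}d}"  # zero-pad so every key has the same number of digits
    return "".join(digits[p - 1] for p in positions)


for key in (379452, 121267, 378845):
    print(key, "->", digit_extraction(key))
# 379452 -> 394
# 121267 -> 112
# 378845 -> 388
```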
• Folding Method
There are two folding methods:
1: Fold shift
2: Fold boundary
1: Fold Shift
In fold shift the key value is divided into
parts whose size matches the size of the required
address. Then the left and right parts are shifted
and added to the middle part. Digits that carry
beyond the address size are discarded.
2: Fold Boundary
In fold boundary the left and right parts are
folded on a fixed boundary between them and
the center part. The digits of the two outside
parts are thus reversed before they are added
to the middle part.
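The two folding methods can be sketched in Python as below. This sketch assumes the key's digits are split from the left into parts as wide as the address, that "reversed" means the digit order of the first and last parts is reversed, and that any carry beyond the address width is discarded. The key 123456789 and the three-digit address are illustrative values, not taken from the slides.

```python
def split_parts(key: int, width: int) -> list[str]:
    """Split the key's digits, from the left, into parts of `width` digits."""
    digits = str(key)
    return [digits[i:i + width] for i in range(0, len(digits), width)]


def fold_shift(key: int, width: int) -> int:
    """Add the parts as they are, then keep only the last `width` digits."""
    total = sum(int(part) for part in split_parts(key, width))
    return total % 10 ** width  # discard carry digits beyond the address size


def fold_boundary(key: int, width: int) -> int:
    """Reverse the two outside parts, add all parts, keep the last `width` digits."""
    parts = split_parts(key, width)
    parts[0] = parts[0][::-1]    # left part folded over the boundary
    parts[-1] = parts[-1][::-1]  # right part folded over the boundary
    total = sum(int(part) for part in parts)
    return total % 10 ** width


print(fold_shift(123456789, 3))     # 123 + 456 + 789 = 1368 -> 368
print(fold_boundary(123456789, 3))  # 321 + 456 + 987 = 1764 -> 764
```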
• Midsquare Method
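In midsquare hashing the key is squared and the
address is selected from the middle of the
square number.
The main limitation is the size of the key: the
square has about twice as many digits as the key,
so a large key can overflow a fixed-size integer.
Example:
9452^2 = 89340304: address is 3403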
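The midsquare method can be sketched as follows, taking a four-digit address from the middle of the square as in the example. How to center the window when the digit counts do not split evenly is an assumption of this sketch (extra digits go to the right), as is the function name.

```python
def midsquare(key: int, width: int) -> int:
    """Square the key and take `width` digits from the middle of the result."""
    square = str(key * key)
    start = (len(square) - width) // 2  # number of digits skipped on the left
    return int(square[start:start + width])


print(midsquare(9452, 4))  # 9452^2 = 89340304 -> 3403
```

Python integers never overflow, so this sketch sidesteps the key-size limitation mentioned above; in a language with fixed-size integers the square would have to be computed in a wider type.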
• Rotation Method
The rotation method is generally not used by itself
but rather in combination with other hashing
methods.
It is most useful when keys are assigned serially.
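The slides do not spell out the rotation step. One common form, assumed here, moves the key's last digit to the front, so that serially assigned keys, which differ mainly in their last digit, differ in their leading digit after rotation. The sketch then pairs rotation with the modulo-division formula from earlier; that pairing, the function names, and the sample keys are illustrative.

```python
def rotate(key: int) -> int:
    """Rotate the rightmost digit of the key to the front, e.g. 600101 -> 160010."""
    digits = str(key)
    return int(digits[-1] + digits[:-1])


def rotation_modulo_hash(key: int, list_size: int) -> int:
    """Rotation combined with modulo division: (rotate(key) MOD list_size) + 1."""
    return rotate(key) % list_size + 1


for key in range(600101, 600106):  # serially assigned keys
    print(key, "->", rotate(key))
# 600101 -> 160010
# 600102 -> 260010
# 600103 -> 360010
# 600104 -> 460010
# 600105 -> 560010
```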
• Pseudorandom Hashing
A common random-number generator is shown below.
y= ax + c
To use the pseudorandom-number generator as a
hashing method, we set x to the key, multiply it by the
coefficient a, and then add the constant c. The result is
then divided by the list size, with the remainder being the
hashed address.
Example (a = 17, c = 7, list size 307):
y = ((17 * 121267) + 7) modulo 307
y = (2061539 + 7) modulo 307
y = 2061546 modulo 307
y = 41
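The pseudorandom method can be sketched as one line of arithmetic, using the example's coefficient a = 17, constant c = 7, and list size 307 as defaults. The function name and parameter names are hypothetical.

```python
def pseudorandom_hash(key: int, a: int = 17, c: int = 7, list_size: int = 307) -> int:
    """y = (a * key + c) MOD list_size: the generator y = ax + c with x set to the key."""
    return (a * key + c) % list_size


print(pseudorandom_hash(121267))  # (17 * 121267 + 7) mod 307 = 2061546 mod 307 = 41
```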
Hash Table Uses
• Compilers use hash tables for symbol
storage.
• The Linux kernel uses hash tables to
manage memory pages and buffers.
• High-speed routing tables use hash tables.
• Database systems use hash tables for hash
indexes and hash joins.
Summary
• A hash table is a convenient data structure for
storing items that provides O(1) average access time.
• The concepts that drive selection of the hash code
generation and compression functions can be
complicated, but they have been extensively
researched.
• There are many areas where hash tables are
used.
• Modern programming languages typically
provide a hash table implementation ready for
use in applications.
Any Questions?