Hashing Techniques

Some Hashing Techniques
(used to randomize the relative addresses)
Prepared by Perla P. Cosme
Some Hashing Techniques
1. Prime Number Division Remainder Method
2. Digit Extraction
3. Folding
4. Radix Conversion
5. Mid-Square
Prime Number Division Remainder Method
• Similar to the % (modulo, or mod) operator, also known as the
integer division remainder method
• The hash function is applied to the key of the record:
h(x) = x % PN
where x = primary key of the record
% = mod operator
PN = prime number
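The formula above can be sketched in Python as follows. The function name `prime_division_hash` and the value PN = 97 are assumptions chosen only for illustration; the slide itself does not fix a particular prime.

```python
def prime_division_hash(key: int, pn: int) -> int:
    """Return the relative address h(x) = x % PN for a non-negative integer key."""
    return key % pn


# PN = 97 is an assumed prime, used here only as an example.
print(prime_division_hash(24964, 97))  # → 35
```

Because the remainder of a division by PN is always in 0..(PN-1), the result is guaranteed to be a valid position as long as PN does not exceed the number of relative positions.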
Some notes to ponder
1. Why do we use the modulo function
when we could choose any user-defined
function?
2. Why would we choose a prime number
when we could choose any positive
integer?
3. Is the hash function h(x) = x % PN
the only function we can use?
Prime Number Division Remainder Method
Notes:
1. Choose PN to be the largest prime number
that lies within the range of relative
positions. Why?
2. Relative positions are predefined by the
operating system (OS). For purposes of
illustration, however, we shall assume in our
class that the relative positions are numbered
0..(N-1).
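One possible way to pick PN under note 1 is to search downward from the largest relative position until a prime is found. This is only a sketch: the helper names `is_prime` and `largest_prime_at_most` are hypothetical, and trial division is used because it is simple, not because the slides prescribe it.

```python
def is_prime(n: int) -> bool:
    """Trial division: good enough for the small table sizes used in class."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def largest_prime_at_most(limit: int) -> int:
    """Return the largest prime p with p <= limit."""
    for candidate in range(limit, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("no prime number is <= limit")


# For N = 100 relative positions numbered 0..99, search from N - 1 = 99.
print(largest_prime_at_most(99))  # → 97
```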
Just a simple mental exercise ...
Question:
If the relative positions are labelled as 1..10,
what would be the best choice for a prime
number? Justify your answer.
Another simple mental exercise ...
Question:
If the relative positions are labelled as 1..99,
what would be the best choice for a prime
number? Justify your answer.
Let’s try this ...
Assuming that there are 100 relative positions
labeled as 0..99, and suppose we have the
following key values: 24964, 25936, 32179,
39652, 40851, 53455, 53758, 54603, 63388,
81347
Questions:
1. Find the relative positions of these records
using the hashing strategy called the prime
number division remainder method.
2. Determine the number of synonyms, if any.
Answer
Key Values    Relative Positions
24964         35
25936         37
32179         72
39652         76
40851         14
53455         8
53758         20
54603         89
63388         47
81347         61
No. of Synonyms: 0
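The answer can be checked with a short script. Two assumptions are made here: PN = 97 (the largest prime within 0..99, which reproduces the positions listed above), and a synonym is counted each time a key maps to a position that an earlier key already occupies. The helper name `count_synonyms` is hypothetical.

```python
KEYS = [24964, 25936, 32179, 39652, 40851,
        53455, 53758, 54603, 63388, 81347]
PN = 97  # assumed: largest prime within the positions 0..99


def count_synonyms(addresses):
    """Count keys that land on a position already taken by an earlier key."""
    seen = set()
    synonyms = 0
    for address in addresses:
        if address in seen:
            synonyms += 1
        else:
            seen.add(address)
    return synonyms


addresses = [key % PN for key in KEYS]
print(addresses)                  # → [35, 37, 72, 76, 14, 8, 20, 89, 47, 61]
print(count_synonyms(addresses))  # → 0
```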
Digit Extraction
This technique is advisable only if you have
prior knowledge of how the digits are
distributed within the records' primary
keys. Why?
Digit Extraction
Algorithm:
1. Lay out the primary keys of all records to be
placed within the relative positions.
2. By examining the keys column by column, choose
the positions (columns) of the digits to be extracted.
3. The relative position of a record is the
concatenation of its digits from the chosen
columns.
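Step 3 of the algorithm can be sketched as below. The function name `digit_extraction_hash` is hypothetical, and the sketch assumes digit positions are counted from the left starting at 1 and concatenated in the order given.

```python
from typing import Sequence


def digit_extraction_hash(key: int, positions: Sequence[int]) -> int:
    """Concatenate the digits of `key` found at the given 1-based positions.

    Positions are counted from the leftmost digit (an assumption of this
    sketch) and are concatenated in the order supplied.
    """
    digits = str(key)
    extracted = "".join(digits[p - 1] for p in positions)
    return int(extracted)


# Extract the 5th digit, then the 3rd digit, of the key 24964.
print(digit_extraction_hash(24964, [5, 3]))  # → 49
```

Step 2, choosing which columns to extract, is left to inspection of the data: columns whose digits vary widely across the keys tend to spread the records better than columns that repeat the same digit.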
Let’s try this ...
Assuming that there are 100 relative positions
labeled as 0..99, and suppose we have the
following key values: 24964, 25936, 32179,
39652, 40851, 53455, 53758, 54603, 63388,
81347
Questions:
1. Find the relative positions of these records using
the hashing strategy called digit extraction. Let us
extract the 5th and 3rd digits (counting from the
left) and concatenate them in that order.
2. Determine the number of synonyms, if any.
Answer
Key Values    Relative Positions
24964         49
25936         69
32179         91
39652         26
40851         18
53455         54
53758         87
54603         36
63388         83
81347         73
No. of Synonyms: 0
Folding
Algorithm:
1. Treat the key value as a sequence of
digits.
2. "Fold" the sequence of digits at a chosen
point, which divides the digits into 2 parts.
3. Add the two parts: the digits before the
fold form the first addend and the digits
after the fold form the second addend.
4. If the sum has more digits than the relative
positions allow, keep only its last digits (the
last 2 digits for positions 0..99).
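A minimal sketch of folding follows. The function name `folding_hash` and the parameters `fold_after` and `table_size` are hypothetical. The sketch also assumes that a sum too large for the table is reduced by keeping its trailing digits, which for 100 positions is the same as taking the sum modulo 100.

```python
def folding_hash(key: int, fold_after: int, table_size: int = 100) -> int:
    """Split the key's digits after `fold_after` digits and add the two parts.

    The sum is reduced modulo `table_size` (an assumption of this sketch),
    which keeps the last 2 digits when table_size is 100.
    """
    digits = str(key)
    if not 0 < fold_after < len(digits):
        raise ValueError("fold point must fall strictly inside the key")
    first = int(digits[:fold_after])   # digits before the fold
    second = int(digits[fold_after:])  # digits after the fold
    return (first + second) % table_size


# 24964 folded after the 3rd digit: 249 + 64 = 313, last 2 digits are 13.
print(folding_hash(24964, 3))  # → 13
```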
Let’s try this ...
Assuming that there are 100 relative positions
labeled as 0..99, and suppose we have the
following key values: 24964, 25936, 32179,
39652, 40851, 53455, 53758, 54603, 63388,
81347
Questions:
1. Find the relative positions of these records using
the hashing strategy called folding. Let us
assume that the demarcation line (where the
fold is made) is after the 3rd digit.
2. Determine the number of synonyms, if any.
Answer
Key Values    Relative Positions
24964         13
25936         95
32179         0
39652         48
40851         59
53455         89
53758         95
54603         49
63388         21
81347         60
No. of Synonyms: 1
Radix Conversion
Algorithm: (similar to converting a number from
one number system to another)
1. Multiply each digit of the primary key by a
power of the chosen base number (or radix).
The exponent starts at 0 for the rightmost
digit and increases by 1 for each digit to
the left.
2. Take the sum of all the products.
3. The last 2 digits of the computed sum are the
relative address.
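The three steps above can be sketched as follows. The function name `radix_conversion_hash` and the `table_size` parameter are hypothetical. The sketch assumes the rightmost digit carries exponent 0, and it keeps the last 2 digits by taking the sum modulo 100.

```python
def radix_conversion_hash(key: int, radix: int, table_size: int = 100) -> int:
    """Read the decimal digits of `key` as digits in base `radix`,
    sum the products, and keep the trailing digits of the sum."""
    total = 0
    # enumerate over the reversed digits so the rightmost digit gets exponent 0
    for exponent, digit in enumerate(reversed(str(key))):
        total += int(digit) * radix ** exponent
    return total % table_size


# 2*12^4 + 4*12^3 + 9*12^2 + 6*12 + 4 = 49756, last 2 digits are 56.
print(radix_conversion_hash(24964, 12))  # → 56
```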
Example
Assume that our radix is 8. The octal
number 12345 is converted to base 10
as follows:
12345₈ = __________₁₀
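For readers who want to check their work on the blank above, the conversion can be spelled out term by term. This is a sketch only; the variable names are arbitrary, and Python's built-in `int(text, base)` is used as an independent check.

```python
digits = "12345"
radix = 8

# One product per digit; the rightmost digit is multiplied by 8**0.
terms = [int(d) * radix ** e for e, d in enumerate(reversed(digits))]
print(terms)       # → [5, 32, 192, 1024, 4096]
print(sum(terms))  # → 5349

# Cross-check against Python's built-in base conversion.
assert sum(terms) == int(digits, radix)
```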
Let’s try this ...
Assuming that there are 100 relative positions
labeled as 0..99, and suppose we have the
following key values: 24964, 25936, 32179,
39652, 40851, 53455, 53758, 54603, 63388,
81347
Questions:
1. Find the relative positions of these records using
the hashing strategy called radix conversion. Let
us assume that the radix is 12.
2. Determine the number of synonyms, if any.
Answer
Key Values    Relative Positions
24964         56
25936         50
32179         1
39652         86
40851         57
53455         5
53758         40
54603         59
63388         36
81347         3
No. of Synonyms: 0
Mid-Square
Algorithm:
As the name implies, the randomization is
done by taking the middle digits of the key
and then squaring that middle value. The
result is the relative address of the record.
Point of Order
If the relative positions range from 0..99,
then take the last 2 digits of the result as the
relative address of the record.
Questions:
1. Why do we take the last 2 digits of the result as
the relative address of the record? Why not the
first 2 digits, the middle digits, etc.?
2. If the relative positions are labelled as 0..999,
which digits of the result (of the mid-square
operation) are taken as the relative address?
Why?
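The mid-square procedure, together with the rule of keeping the last 2 digits for positions 0..99, can be sketched as below. The function name `mid_square_hash` and its parameters are hypothetical. Taking the 2nd through 4th digits as the middle is an arbitrary choice made for the example, and keeping the trailing digits is implemented as a modulo by `table_size`.

```python
def mid_square_hash(key: int, start: int, end: int, table_size: int = 100) -> int:
    """Square the digits from 1-based position `start` through `end`
    (counted from the left) and keep the trailing digits of the square."""
    digits = str(key)
    middle = int(digits[start - 1:end])  # e.g. "496" for key 24964, start=2, end=4
    return (middle * middle) % table_size


# Middle digits of 24964 are 496; 496 * 496 = 246016, last 2 digits are 16.
print(mid_square_hash(24964, 2, 4))  # → 16
```

Changing `table_size` to 1000 keeps the last 3 digits instead, which is one way to experiment with question 2.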
Notes to Ponder
1. It is not advisable to take only one digit as the
middle value. Why?
2. If the number of digits in the key value is
even, which digit positions are considered
the middle digits? Why?
Let’s try this ...
Assuming that there are 100 relative positions
labeled as 0..99, and suppose we have the
following key values: 24964, 25936, 32179,
39652, 40851, 53455, 53758, 54603, 63388,
81347
Questions:
1. Find the relative positions of these records using
the hashing strategy called mid-square. Let us
take the 2nd through 4th digits as our middle value.
2. Determine the number of synonyms, if any.
Answer
Key Values    Relative Positions
24964         16
25936         49
32179         89
39652         25
40851         25
53455         25
53758         25
54603         0
63388         44
81347         56
No. of Synonyms: 3
That ends our discussion on the different
hashing techniques.
Are there any questions?
Coming up next … rehashing techniques