CS 3100 Programming Project #4

advertisement
CS 3100 Programming Project #4
Fall 2001
HASHING SOCIAL SECURITY NUMBERS
DUE DATES:

Level 03 Program is due Monday, December 3. Submit all your source files plus a
script showing your test runs. You must demonstrate adequate testing of the
program. Use the same procedure we have been using to send your files to me
electronically. Send your file as an email attachment before midnight.

Final Level Program is due Monday, December 10. Submit all your source files
plus a script showing your test runs. You must demonstrate adequate testing of the
program. Use the same procedure we have been using to send your files to me
electronically. Send your file as an email attachment before midnight.
Write a program that implements a hash table of social security numbers.
The lists of social security numbers will be provided to you. Except for the input file of
social security numbers, all input will come from standard input and all output will go to
standard output. Here are the contents of a sample input file.
ssn15 – file containing
20
15
259-97-6223 808-93-0709
041-39-9350 397-67-7732
927-06-2355 087-75-3173
910-26-9707 080-42-6115
15 social security numbers
636-86-3660 399-88-2968
522-23-1899 762-23-8562
296-59-8912 052-61-3718
917-34-3630
Here are some sample (interactive) operations:
s
r
r
i
d
q
917-34-3630
911-34-3623
911-43-6298
808-93-0709
The first number in the input file (20 in the example) is an integer, minTableSize. After
reading it the program will calculate the table size, and allocate a hash table of that size.
The table size must be the smallest prime number not less than minTableSize. In the
example above, the table size will be 23.
Page 1 of 4
The second number in the input (15 in the example) is an integer, numKeys. After
reading numKeys, the program will read numKeys keys (social security numbers) from
the input, formatted as shown, and insert each one into the hash table. (No two of these
keys are allowed to be equal.) There must be exactly numKeys keys following numKeys.
In the example, there are exactly 15 keys.
The program will calculate the hash address by treating the key as a 9-digit number N and
taking N % tableSize as the address. In the example above, 041-39-9350 hashes to the
address
41399350 % 23 = 17
N.B.: C++ interprets an integer with a leading zero as an octal constant, and so the
expression above would be incorrect if written as 041399350 % 23 = 17.
The program will use counters to keep track of how many keys have hashed to each
possible address. The possible addresses are 0..tableSize-1. Note that these counter values
are independent of the collision resolution policy. When a key is hashed, a counter
corresponding to the original hash address is incremented, regardless of where the
program decides to actually place the key.
After processing the numKeys key values, the program will output a tabular report
showing how many of the locations in the hash table had their counter end up equal to
0, 1, 2,.. up to the largest size of any counter. The report will also compare that data with
the numbers predicted by the "perfect randomizer" expression:
exp(-alpha) * (alpha ** k) / k!
This expression, simply put, is the expected frequency of locations in the table whose
counter stops at k, if the hash function has no "bias." The frequency is expressed as a
decimal fraction of the table size. The quantity alpha is the "load factor", i.e., the ratio of
numKeys to tableSize.
For example, suppose that minTableSize is 1000 and numKeys is 800. Then tableSize is
1009, alpha = 800/1009 = 0.793, and for k = 3 the formula exp(-alpha) * (alpha ** k) / k!
= (0.453) * (0.498) / 6 = 0.038 (approximately).
Thus, if we are hashing 800 keys into a table with 1009 slots, we would expect that a
hash function that is a perfect randomizer would result in about 38 positions with counter
equal to 3.
Your program's output will show the number 38 and the actual number of locations that
had counter size 3, so that a comparison of the experimental and ideal values can be
made. The report will be set up so that this comparison can be done for all counter sizes
that occur.
Page 2 of 4
One good way to do this is to print a 4-column report: first column is for the number k,
second column for the predicted values, third column for the actual values, and fourth
column expressing the actual value as a percentage of the ideal value.
After printing the tabular report, the program will process commands coming from the
input. The possible commands are s, r, i, d, and q.
The s command stands for show. In response to this command, the program will print a
representation of what is in the hash table. This should be a two-column report: first
column is table indices, and second column is key(s) stored in the location given by the
index.
A couple of things to note here: First, in this case you will be reporting on the locations
where the keys are actually stored. If you are using open addressing, this is not
necessarily the address the key hashes to initially. Second, if you are doing chaining,
there could be more than one key in a given slot.
The s command is not meant to be used for large tables, so just write the code so it uses
one line for each occupied array slot. Also, write the code so it entirely skips any empty
slot. No index is written in that case and no line is used up on the output. The program
has to write the social security number in "standard form" -- that is with dashes in the
right places.
The r command stands for retrieve. This command requires a single social security
number (a key) to appear right after the r. The program will find the key in the table,
report which address the key hashes to, the current value of the counter for that address,
and also which address the key is actually stored in. Each of these three pieces of
information must be appropriately labeled. For example:
hash address: 4 ; counter: 3 ; location: 6
If the key is not found, an appropriate message must be printed.
The i command stands for insert. This command, just like r, requires a key value next in
the input. The program will insert the key into the table, unless the table already contains
such an element, or unless the table is full. If insertion fails the program will print an
appropriate message. If insertion succeeds the program will print the same three pieces of
information as the r command. Of course the counter value will be the new value: one
greater than the old value, because of the insertion.
The d command stands for delete. This command requires a parameter just like r and i.
In this case the program will delete the element if it is present. Like r and i, it will print
the hash address, new counter value, and actual location. If the operation fails because the
key was not found (the only valid reason) then the program will print an appropriate
message.
Page 3 of 4
The q command stands for quit.
There can be any number of s, r, i, and d commands after the initial input. They can
appear mixed into any order. The numbers minTableSize and numKeys will both be
positive integers and numKeys will be less than minTableSize.
ECHOING
Except for the list of keys that comes after numKeys, the program will echo each line of
input before writing the output called for by that line of input. The output for different
commands will be appropriately delimited by lines of pound signs (#) or some such
character.
CALCULATION OF PRIMES
In this program you need to have a function that inputs a number m and returns the
smallest prime number p >= m. We can discuss in class too, and I can give you some
simple pseudo code.
DYNAMIC ALLOCATION OF ARRAYS
This program has to allocate space for the hash table based on the size value from the
input. See page 171 in your textbook for information on how to allocate an array
dynamically.
CHOICE OF IMPLEMENTATION
If you want to save time writing this program, consider using chaining as your collision
resolution policy. Unless you get a lot of help by reusing classes from somewhere,
chaining is probably the easiest way to implement. Deletion is a little complex when open
addressing is used. You can use the linked list class from the textbook to implement
chains.
Page 4 of 4
Download