Questions

advertisement
CPSC 319
Assignment 3
Hash Tables
You are to create a hash table to store the words contained in an input file; the resulting
table must be at least 70% full. Create and implement your own hash function that
minimizes the number of collisions; experiment until you find a function that gives you at
least 30% hashing efficiency. You must use open addressing and linear probing to resolve
collisions. Your program must also calculate and print out statistics about its efficiency
for the given input.
Create a Java program that does the following:
1. Read in a set of words from the input file one at a time, in a loop.
2. Use your hash function to calculate the table address for the word, and then insert it
into the table. Use open addressing and linear probing to resolve any collisions.
3. Once all the words have been inserted into the hash table, reread the input file, using
each word as a search item, and search for the word in the table; you will use the
same hash function and linear probing technique as above. As you do the search, keep
track of the total number of reads you make as you search through the table, as well
as the longest chain of reads you make as you search for a word. Calculate and print
out to the output file the average number of reads per record, the load factor, the
hashing efficiency, and the size of the longest chain when searching.
The program will be invoked from the command line as follows:
java Assign3 input output
where Assign3 is the name of the file containing executable bytecode for your program,
input is the name of the input text file containing the list of words, and output is the name
of the text file containing the statistics. If the command line arguments are improperly
specified, the program should print out a "usage" message and abort. Be sure the
specified files can be opened or created.
Create your own input files to test your code. An official input file will be made available
to the class before the due date; use this file to create your output file. You must program
all the data structures from scratch; in other words, do not use any classes from the Java
libraries to implement the hash table.
Bonus (optional)
Create a second version of your program where you use quadratic probing instead of
linear probing. Use the command-line flag -q to indicate this when invoking your
program. Print out the second set of statistics and compare this technique with linear
probing. Is the performance better or worse?
Questions
1. Briefly describe your hash function and how it works.
2. Analyze the quality of your hash function based on the statistics reported. How could
you improve it?
3. Compare your hash function with an ideal hash function. How close is it to ideal
performance?
4. If the input data set contained numerals (for example, student ID numbers) instead of
alphabetical characters, how would it influence the performance of your hash
function? Justify your answer with a brief analysis.
Submit electronically:
1.
2.
3.
4.
The readme file.
Your source code files.
A printout of the output file(s).
The answers to the above questions.
Collaboration
The assignment must be done individually so everything that you hand in must be your
original work, except for the code copied from the text or course directory, or that
supplied by your TA. When someone else's code is used like this, you must acknowledge
the source explicitly. Copying another student's work will be deemed academic
misconduct. Contact your TA if you have problems getting your code to work.
Download