The Efficiency of Binary Search with Arrays and Hash Tables

CS46B
Project - The Efficiency of Binary Search with Arrays and Hash Tables
You can do this assignment in teams of up to three, with at least two members per team.
The overall goal of this assignment is to see whether, for hash tables and for binary search with ordered arrays, the actual computer run times fit the theory described in class and in the book. An additional goal is to determine which data structure is best for the jumble program.
This assignment involves fairly small variations of your tree Jumble program.
Part 1: Replace the TreeSet class with java.util’s Hashtable class. Let the key be the word stored in the dictionary, and let the associated data (the “value” in the Java documentation) be "", the empty string, since only the keys matter. Use containsKey to look up permutations in the Hashtable. Call this file JumbleHash.java.
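A minimal sketch of Part 1, assuming the dictionary has already been read into a list of words. The class name JumbleHashSketch, the methods buildTable, findWords, and permute, and the three-word sample dictionary are hypothetical illustrations, not the required structure of JumbleHash.java. It shows the key/"" pairs going into a java.util.Hashtable and one containsKey call per generated permutation.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class JumbleHashSketch {
    // Loads the words into a Hashtable: each lowercase word is a key and the
    // value is the empty string "", because lookups only examine the keys.
    static Hashtable<String, String> buildTable(List<String> words) {
        Hashtable<String, String> table = new Hashtable<>();
        for (String w : words) {
            table.put(w.toLowerCase(), "");
        }
        return table;
    }

    // Returns every permutation of the jumbled letters that is a key in the
    // table. Repeated letters (as in "eekp") yield repeated matches.
    static List<String> findWords(String jumble, Hashtable<String, String> table) {
        List<String> found = new ArrayList<>();
        permute("", jumble.toLowerCase(), table, found);
        return found;
    }

    // Builds permutations one letter at a time; each complete permutation
    // costs one containsKey call, expected constant time in a hash table
    // (ignoring the cost of hashing the word itself).
    private static void permute(String prefix, String rest,
                                Hashtable<String, String> table, List<String> found) {
        if (rest.isEmpty()) {
            if (table.containsKey(prefix)) {
                found.add(prefix);
            }
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            permute(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1), table, found);
        }
    }

    public static void main(String[] args) {
        // Hypothetical three-word dictionary; the real program reads a dictionary file.
        Hashtable<String, String> table = buildTable(List.of("Grain", "crown", "keep"));
        System.out.println(findWords("rigan", table)); // [grain]
    }
}
```

Because each permutation triggers exactly one containsKey call, the total search time for a fixed jumble should, in theory, depend little on the dictionary size.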
Part 2: Store the dictionary data in an ordinary array. The dictionaries are already ordered, so your array will be ordered if you change every word from a dictionary to lower case before placing it in the array. Be sure to do this. Implement the binary search method described in Chapter 11 (either with recursion, as described in the book, or with loops, as described in class). Use this search method to look up permutations in the array. Call this file JumbleBinarySearch.java.
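The loop version of binary search can be sketched as follows. The class name BinarySearchSketch and the sample words are hypothetical, and this is only one way to write it. The size parameter assumes a fixed-size array that may be larger than the dictionary, so only the filled slots are searched. The last line of main shows why lowercasing matters: String.compareTo orders every uppercase letter before every lowercase one, so a mixed-case dictionary in alphabetical order is not in compareTo order.

```java
public class BinarySearchSketch {
    // Iterative binary search on words[0..size-1], which must be sorted
    // according to String.compareTo. Returns the index of target, or -1.
    static int binarySearch(String[] words, int size, String target) {
        int low = 0;
        int high = size - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // midpoint without int overflow
            int cmp = words[mid].compareTo(target);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                low = mid + 1;  // target can only be in the upper half
            } else {
                high = mid - 1; // target can only be in the lower half
            }
        }
        return -1; // each pass halves the range, so at most about log2(n) + 1 passes
    }

    public static void main(String[] args) {
        String[] words = {"crown", "grain", "keep", "orange"}; // hypothetical, already lowercase
        System.out.println(binarySearch(words, words.length, "keep"));  // 2
        System.out.println(binarySearch(words, words.length, "rigan")); // -1
        System.out.println("Zebra".compareTo("apple") < 0);             // true: 'Z' sorts before 'a'
    }
}
```

java.util.Arrays also provides a binarySearch method, but the assignment asks you to implement the search yourself.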
To test your code for correctness, do runs with “rigan”, “rownc”, “cotony”, “rahnge”, and “eekp”. You can turn in "gincrncoonge" (or an alternate word chosen as described earlier), but this is not required for this assignment. Turn in your source code and printouts of these runs. On a disk, include JumbleHash.java, JumbleBinarySearch.java, and an electronic version of your report (for example, as a Word document). If some of the report is not electronic (for example, if the graphs are done by hand), include whatever you have in electronic form. Call your report report.(whatever extension is appropriate for your format).
The report should compare the timing results for the three jumble programs (Jumble with a tree, JumbleHash, and JumbleBinarySearch). You will need to repeat the timing runs for dictionary load time versus dictionary size and for permutation search time versus dictionary size. You can skip the timing runs that relate time to string length.
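One way to collect the repeated timing runs is a small harness around System.nanoTime, sketched below. The class name TimingSketch, the method timeRuns, and the made-up workload (loading 100,000 synthetic words into a Hashtable) are hypothetical stand-ins; in your programs the task would be loading one of the dictionaries or searching for all permutations. Repeating each measurement is worthwhile because the first run in a JVM is often slower due to class loading and JIT compilation.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class TimingSketch {
    // Runs the task `runs` times and returns each run's elapsed time in
    // milliseconds. System.nanoTime is intended for measuring intervals,
    // unlike System.currentTimeMillis, which reports wall-clock time.
    static List<Double> timeRuns(Runnable task, int runs) {
        List<Double> millis = new ArrayList<>();
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return millis;
    }

    public static void main(String[] args) {
        int n = 100_000; // hypothetical stand-in for a dictionary size
        // Hypothetical workload: load n made-up words into a Hashtable.
        Runnable load = () -> {
            Hashtable<String, String> table = new Hashtable<>();
            for (int i = 0; i < n; i++) {
                table.put("word" + i, "");
            }
        };
        List<Double> times = timeRuns(load, 3);
        double avg = times.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        System.out.println("n = " + n + ", runs (ms) = " + times + ", average = " + avg);
    }
}
```

Recording all three run times, not only the average, makes it easier to spot an unusually slow first run in your tables.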
Suggested Report Template
1. Introductory paragraph. Summarize the goals of the report. Don’t include specific details.
2. Include a paragraph on the time to load a dictionary into a hash table. For simplicity, let Java automatically select the hash table size. For each of the five dictionaries (or six if you want to use the sixth dictionary, half_dictionary.txt, on my web site), do three runs. Include a table, a graph, a discussion of what you expect the load time to be (based on theory in the book and class), and a discussion of whether your experimental results support this thesis.
3. Include a paragraph on the time to load a dictionary into an ordered array. For simplicity, fix the array size big enough to hold the largest dictionary or, alternatively if you wish, use vectors and let Java automatically select the vector size. For each of the five dictionaries (or six if you want to use the sixth dictionary, half_dictionary.txt, on my web site), do three runs. Include a table, a graph, a discussion of what you expect the load time to be (based on theory in the book and class), and a discussion of whether your experimental results support this thesis. Include a comment on what you expect the load time would be if the text file were unordered and the words were placed into an ordered array with insertion sort.
4. Include a paragraph on the time to search for all permutations using a hash table. For each of the five dictionaries (or six if you want to use the sixth dictionary, half_dictionary.txt, on my web site), do three runs. Include a table, a graph, a discussion of what you expect the search time to be (based on theory in the book and class), and a discussion of whether your experimental results support this thesis.
5. Include a paragraph on the time to search for all permutations using binary search with an ordered array. For each of the five dictionaries (or six if you want to use the sixth dictionary, half_dictionary.txt, on my web site), do three runs. Include a table, two graphs (time versus n and time versus log n), a discussion of what you expect the search time to be (based on theory in the book and class), and a discussion of whether your experimental results support this thesis.
6. Based on your results in this assignment and the previous assignment, make a recommendation on which data structure (tree, ordered array with binary search, or hash table) is best for a program that solves jumble puzzles. Include reasons for your choice. Note that if you compare run times from the last assignment and this one, all the runs should be done on the same computer, if possible.
7. Have a concluding paragraph. Briefly summarize the highlights of your results.
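Item 3 of the template asks what the load time would be if the text file were unordered and the words were placed into an ordered array with insertion sort. The sketch below, with the hypothetical class name InsertionLoadSketch and made-up sample words, shows the insertion step so the shifting work is visible. Under these assumptions, each insertion shifts every larger word one slot to the right, so an unordered file (in the worst case, one in reverse order) can cost on the order of n^2 shifts in total, while an already-ordered file stops after one comparison per word.

```java
public class InsertionLoadSketch {
    // Inserts word into the sorted prefix words[0..size-1] by shifting every
    // larger word one slot right, then returns the new size. The array must
    // have room for at least size + 1 entries.
    static int insertSorted(String[] words, int size, String word) {
        int i = size - 1;
        while (i >= 0 && words[i].compareTo(word) > 0) {
            words[i + 1] = words[i]; // shift a larger word right
            i--;
        }
        words[i + 1] = word;
        return size + 1;
    }

    public static void main(String[] args) {
        String[] unordered = {"keep", "crown", "orange", "grain"}; // hypothetical unordered input
        String[] words = new String[unordered.length];
        int size = 0;
        for (String w : unordered) {
            size = insertSorted(words, size, w.toLowerCase());
        }
        System.out.println(String.join(" ", words)); // crown grain keep orange
    }
}
```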