Homework – 1

advertisement
Homework – 1
Posted: Jan 20, 2011
Due: Feb 1, 2011
Assignment due at the beginning of class. Follow the instructions on the submit form. I will
not accept late assignments.
Objectives
• Develop understanding of object-oriented programming in Java (especially code reuse).
• First look at basic parsing and regular expressions.
• Familiarize yourself with the programming style requirements and the homework submission
process
Related Files
Homework submission form:
https://wwwx.cs.unc.edu/Courses/comp524-s11/lib/exe/fetch.php/homework/submission-form.
pdf
Text of Herman Melvilles novel Moby Dick, or, the whale (Project Gutenberg version):
https://wwwx.cs.unc.edu/Courses/comp524-s11/lib/exe/fetch.php/homework/assignment/
mobydick.txt
Part 1 You will implement a small subset of the functionality, the unix tool grep provides.
grep is used to search for the occurrence of a given string or a given regular expression. For this
assignment you will implement jgrep that will satisfy the following goals:
• jgrep accepts one or two arguments, < string > to search for and a file to search on. If
the second argument is omitted then the program should process standard input stream
(System.in). Find all occurrences of the string in the file or System.in and output :
“< string > Found N times”, where N represents the number of occurrences.
Your implementation of jgrep should handle special cases (either file empty) and error cases
(invalid filenames, either file not readable) gracefully (e.g., display an error message and exit
• Now, add support for two regex operators * and ?.
– *: Matches the preceding element zero or more times. For example, ab*c, can match to
ac, abc, abbc or abbbbc etc.
– ?: Matches the preceding element zero or one time. For example, ab?c, can match to
ac, abc
1
• Final result: jgrep “< string >”|“< regex >” < f ile >. So, I can use jgrep as follows,
jgrep “Ishmael” moby-dick.txt, should display ”Ishmael” Found N times. Or jgrep “Ish*”
moby-dick.txt should display:
“Is” found N times
“Ish” found N times
“Ishh” found N times etc.
You are not allowed to use the built in Java or any other Regexp library, but instead implement
the parsing required for “*” and “+”. You can however use any other string and substring matching
methods.
Part 2 Implement a Java program, named Frequency, that lists the most-frequently used words
in a text file (or, alternatively, the standard input stream) and the number of times that they occur.
Frequency must accept one or two command line arguments. The first argument specifies a limit
on the number of words that should be displayed, e.g., if the first argument is 20, then only the 20
most-frequently occurring words should be displayed. The second argument specifies the name of
a text file. If the second argument is omitted, then the program should process the standard input
stream (System.in). Your implementation of Frequency should handle special cases (file empty) and
error cases (invalid numbers, file not readable) gracefully (e.g., display an error message and exit).
For the purpose of this assignment, a word is simply a sequence of non-whitespace, nonpunctation,
non-bracket characters, i.e., a word contains neither a space, a newline separator, nor one of the
following characters: . :; , ?! − ”0 ()[]. Consecutive words are separated by at least one non-word
character.
Part 3 Use your implementations of jgrep and Frequency to examine Herman Melvilles novel
“Moby Dick”. Specifically, determine
1. the top 20 most-frequently used words are.
2. find words that have atmost one character changed in the end for the top-10 mostly frequently
used words. e.g. Whale vs Whales.
3. how many times are those words are present in the text.
4. how many times the words “harpoon”, “whale”, “blood”, “love”, “peace”, and “flowers”
occur.
Guidelines You may discuss possible approaches with other students, but you cannot share
source code. Hints:
• Make use of the Java Collections Framework and the Iterator/Iterable interfaces.
• Have a look at the StringTokenizer class
Deliverables The Java source code files that implement jgrep and Frequency. Make sure that
your code can be compiled manually with javac. A text file (plain text or PDF) that contains the
answers to Part 3.
2
Grading
• 25 points Functionality. Do jgrep and Frequency work as specified? (This covers Part 3).
• 05 points Is the source code documented?
• 05 points Style. Does the code look like a clean Java program?
• 05 points Is the design properly object-oriented? Is code reused among the two programs?
Style Guide Use a clean and consistent coding style. The standard Eclipse style is ok. Spaces
are preferred over tabulators. See [2] or [1] for detailed style guides that are acceptable.
Make sure you structure classes consistently (i.e., either first attributes, then constructors, followed by methods, or the other way around, but the order should be consistent across classes).
Do not use unstructured control flow constructs (break, continue). However, you may use System.exit() to simplify argument checking in the main method.
Implement proper exception handling (the main method should not throw any exceptions).
Write the program in an object-oriented style. The problem should be solved by collaborating
objects. Make only sparse use of static methods; ideally, main should be the only static method.
Write brief and concise comments. Just get the idea across; do not spend too much time on
documentation.
References
[1] Apache. http://incubator.apache.org/ace/java-coding-style-guide.html.
[2] Sun.
http://developers.sun.com/sunstudio/products/archive/whitepapers/
java-style.pdf.
3
Download