Computer Science 2210

Lab Assignment 1
Purpose:
The purpose of this assignment is to get additional experience with Visual C++, some STL classes and
features (string, vector, algorithm, etc.) and analysis and design issues. This is in the context of a text
analysis example. The solution should use good object-oriented design.
Assignment:
You will process information from a text file consisting of several paragraphs of text. The program should
allow the user to designate the input file, input the text, isolate the individual tokens (words; the punctuation marks period, comma, exclamation point, colon, semicolon, and question mark; and new-line characters), and produce a
report. The report should include the original text, counts for the number of words (not all of the tokens),
the number of distinct words, the number of sentences, and the number of paragraphs. It should include
the individual distinct words (in alphabetical order) with the number of times each appears in the text.
Next should be each sentence listed separately followed by the number of words (and other statistics – if
any) in that sentence. The sentence should be formatted appropriately. Finally, there should be a listing of
separate paragraphs with the numbers of words and sentences in each along with the average word length
of the sentences in the paragraph.
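The token-isolation step described above could be sketched as follows. This is only an illustration, not the required design: the function name Tokenize is hypothetical, and it assumes that spaces, tabs, and carriage returns merely separate tokens and are not kept.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: breaks text into word, punctuation, and "\n" tokens.
// Assumption: spaces, tabs, and '\r' only separate tokens and are discarded.
std::vector<std::string> Tokenize(const std::string& text) {
    const std::string punctuation = ".,!:;?";   // the six marks named in the handout
    std::vector<std::string> tokens;
    std::string word;                           // the word currently being built
    for (char ch : text) {
        const bool isPunct = punctuation.find(ch) != std::string::npos;
        const bool isSpace = (ch == ' ' || ch == '\t' || ch == '\r');
        if (isPunct || isSpace || ch == '\n') {
            // Any separator ends the current word, if there is one.
            if (!word.empty()) {
                tokens.push_back(word);
                word.clear();
            }
            // Punctuation and new-lines are tokens in their own right.
            if (isPunct || ch == '\n') {
                tokens.push_back(std::string(1, ch));
            }
        } else {
            word += ch;
        }
    }
    if (!word.empty()) {
        tokens.push_back(word);
    }
    return tokens;
}
```

With this sketch, the text "Hi, you!" followed by a new-line and "Bye." yields the seven tokens Hi , you ! (new-line) Bye and the final period. A real solution would probably build this on top of your own Split function.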
Specifications:
Use your Split function from the homework assignment. Make sure it has a signature similar to the
following:
static vector<string> Split(const string& strIn, const string& strDelims);
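For reference, one possible function with this signature is sketched below. Only the signature comes from this handout. The enclosing class name TextTools is hypothetical, and the behavior shown (delimiters are dropped and empty pieces are skipped) is an assumption; your homework version may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical utility class; only the Split signature comes from the handout.
class TextTools {
public:
    // Splits strIn at any character found in strDelims.
    // Assumption: delimiters are dropped and empty pieces are skipped.
    static std::vector<std::string>
    Split(const std::string& strIn, const std::string& strDelims) {
        std::vector<std::string> pieces;
        // Skip leading delimiters to find the start of the first piece.
        std::string::size_type start = strIn.find_first_not_of(strDelims);
        while (start != std::string::npos) {
            // The piece runs up to the next delimiter (or to the end of strIn).
            std::string::size_type end = strIn.find_first_of(strDelims, start);
            pieces.push_back(strIn.substr(start, end - start));
            // Skip the run of delimiters that follows the piece.
            start = strIn.find_first_not_of(strDelims, end);
        }
        return pieces;
    }
};
```

For example, TextTools::Split("one, two  three", ", ") would return the three strings one, two, and three. Because this version discards delimiters, punctuation and new-line tokens would have to be recovered in a separate step.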
For this assignment, a sentence is a series of tokens ending in a period, question mark, or exclamation
point. For the purpose of this assignment, you may assume that these sentence-ending punctuation marks
will only appear at the end of sentences (e.g., every period ends a sentence rather than an abbreviation
such as Mrs. or Dr.). A paragraph is ended by two consecutive new-line characters or by the end of the
input file.
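The sentence and paragraph rules above can be illustrated with a short sketch that works on a vector of tokens. The function names are hypothetical, and it assumes each new-line character is stored as its own "\n" token and that extra blank lines do not create empty paragraphs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// True for the three sentence-ending tokens named in the handout.
bool IsSentenceEnd(const std::string& token) {
    return token == "." || token == "?" || token == "!";
}

// Each sentence-ending token closes exactly one sentence.
std::size_t CountSentences(const std::vector<std::string>& tokens) {
    return static_cast<std::size_t>(
        std::count_if(tokens.begin(), tokens.end(), IsSentenceEnd));
}

// A paragraph ends at two consecutive "\n" tokens or at the end of the input.
// Assumption: runs of more than two new-lines do not create empty paragraphs.
std::size_t CountParagraphs(const std::vector<std::string>& tokens) {
    std::size_t paragraphs = 0;
    std::size_t newlines = 0;   // length of the current run of "\n" tokens
    bool hasContent = false;    // has the current paragraph any other token?
    for (const std::string& token : tokens) {
        if (token == "\n") {
            ++newlines;
            if (newlines == 2 && hasContent) {
                ++paragraphs;
                hasContent = false;
            }
        } else {
            newlines = 0;
            hasContent = true;
        }
    }
    if (hasContent) {
        ++paragraphs;           // the end of the input closes the last paragraph
    }
    return paragraphs;
}
```

A single new-line inside a paragraph does not end it; only the second consecutive new-line does.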
The report should be placed into a file for printing. The test program should permit the user to choose a
file to process. It should also display the information on the screen, so the report need not be printed
unless the user wishes. The output should be formatted to fit an 80-column screen, and it should be neat
and easy to read and understand.
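One way to keep report lines within 80 columns is a simple greedy word wrap, sketched below. The function name WrapWords is hypothetical, and the sketch assumes that a single word longer than the width is simply placed on a line of its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy word wrap: packs words into lines no longer than width characters.
// Assumption: a word longer than width is placed alone on its own line.
std::vector<std::string> WrapWords(const std::vector<std::string>& words,
                                   std::size_t width = 80) {
    std::vector<std::string> lines;
    std::string current;        // the line currently being filled
    for (const std::string& word : words) {
        // Start a new line if adding a space and this word would overflow.
        if (!current.empty() && current.size() + 1 + word.size() > width) {
            lines.push_back(current);
            current.clear();
        }
        if (!current.empty()) {
            current += ' ';
        }
        current += word;
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

This version puts a space before every token, so punctuation tokens would need special handling to attach them to the preceding word when sentences are printed.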
Use the STL vector class with appropriate iterators and templated functions from <algorithm> as needed
in this assignment. Though the design details are largely left to you, you should store the text in a vector
of tokens (see the static Split method above). Sentences and paragraphs may be designated by a vector of
pointers (or subscripts) or you may store the text itself in a vector of Sentence objects. The distinct words
may be stored in a vector of objects of an appropriate Word class.
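As an illustration of storing distinct words in a vector, the sketch below sorts the words with std::sort from <algorithm> and then merges equal neighbors. The DistinctWord interface and the CountDistinct function are hypothetical, and the comparison is assumed to be case-sensitive; a real solution might normalize case first.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical class pairing a distinct word with the number of times it occurs.
class DistinctWord {
public:
    explicit DistinctWord(const std::string& text) : m_text(text), m_count(1) {}
    const std::string& Text() const { return m_text; }
    int Count() const { return m_count; }
    void Increment() { ++m_count; }
private:
    std::string m_text;
    int m_count;
};

// Builds an alphabetically ordered vector of distinct words with their counts.
// Assumption: words are compared case-sensitively, exactly as given.
std::vector<DistinctWord> CountDistinct(std::vector<std::string> words) {
    // Sorting a copy places equal words next to each other.
    std::sort(words.begin(), words.end());
    std::vector<DistinctWord> distinct;
    for (const std::string& word : words) {
        if (!distinct.empty() && distinct.back().Text() == word) {
            distinct.back().Increment();          // same word as the previous one
        } else {
            distinct.push_back(DistinctWord(word));
        }
    }
    return distinct;
}
```

Given the words the, cat, the, this returns cat with a count of 1 followed by the with a count of 2. The size of the returned vector is the number of distinct words.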
Use exceptions to handle exceptional conditions if needed; use templates if and where appropriate.
Define and use classes (for example: Paragraph, Sentence, DistinctWord, etc.) as appropriate. Re-use
existing code, classes, and functionality where possible. The grade is dependent on how well you design
your solution.
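One exceptional condition that is likely to arise is an input file that cannot be opened. A minimal sketch of reporting it with an exception follows; the function name ReadTextFile and the wording of the message are hypothetical.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads a whole text file into a string.
// Throws std::runtime_error if the file cannot be opened.
std::string ReadTextFile(const std::string& fileName) {
    std::ifstream input(fileName.c_str());
    if (!input) {
        throw std::runtime_error("Cannot open input file: " + fileName);
    }
    std::ostringstream buffer;
    buffer << input.rdbuf();    // copy the entire file contents
    return buffer.str();
}
```

The test program could call this inside a try block, catch std::runtime_error, show the message from what(), and ask the user for another file name.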
Deliverables:
Submit a zipped copy of your entire project folder plus a sample of the report generated by your program.
You may work in groups of two or three on this assignment, but EVERYONE will be tested on all
aspects of the assignment. Submit only one copy of the assignment, but be sure that the name and
contact information for all team members are in the submittal e-mail message as well as in the
internal documentation for the project.
Grading:
The assignment will be graded on correctness, documentation, design, and user-friendliness. These issues
are inter-related, but their approximate weights are:
Correctness ....................................... 20%
Documentation ..................................... 20%
Design ............................................ 40%
User-friendliness ................................. 20%
Total ............................................. 100%
Lab assignment 2 – Data Structures