LA6 Assignment Description

CS 1120 Spring 2013
LA 6
Sentence Generation
Lab Assignment 6
Sentence Generation
Concepts
• Stacks
• Pair Programming Model
Background
Natural language processing is an exciting area that merges the fields of computer science and
linguistics. It includes many practical applications from using computers to translate text or speech, to
enabling human-computer conversations, to automatically creating a summary of a newspaper article.
For your assignment, you will be using a simple grammar to generate random English sentences. The
study of formal grammars for languages (both natural and computer) is a complex topic to which whole
graduate-level courses are devoted. You will not need to understand all of that. For your
assignment, all you need to understand are simple symbol replacement rules.
The rules for an example grammar are shown below.
Example Grammar:
<START> : The <OBJECT> <VERB> tonight.
<OBJECT> : big yellow flowers
<OBJECT> : slugs
<VERB> : sigh <ADVERB>
<ADVERB> : warily
<ADVERB> : grumpily
The symbol on the left-hand side of the colon can be replaced by the symbol or sequence of symbols on
the right-hand side of the colon. There are two types of symbols: non-terminals (here enclosed in angle
brackets) and terminals. Non-terminals can be replaced further using the rules, while terminals remain in
their final form. The symbols used in any given grammar are arbitrary, so your application does not need
to know what they mean, only how they can be replaced.
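The distinction between the two kinds of symbols can be sketched in Java. This is only an illustration, not part of the assignment: the class name GrammarSymbols, the method names, and the choice of a Map keyed by left-hand symbol are all assumptions. A symbol is treated as a non-terminal exactly when at least one rule has it on the left-hand side, so the code never relies on the angle brackets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used only for this illustration.
class GrammarSymbols {
    // The example grammar held in memory:
    // left-hand symbol -> every right-hand side that may replace it.
    static Map<String, List<String>> exampleGrammar() {
        Map<String, List<String>> grammar = new LinkedHashMap<>();
        grammar.put("<START>", Arrays.asList("The <OBJECT> <VERB> tonight."));
        grammar.put("<OBJECT>", Arrays.asList("big yellow flowers", "slugs"));
        grammar.put("<VERB>", Arrays.asList("sigh <ADVERB>"));
        grammar.put("<ADVERB>", Arrays.asList("warily", "grumpily"));
        return grammar;
    }

    // A symbol is a non-terminal if some rule can replace it;
    // anything else is a terminal and goes straight to output.
    static boolean isNonTerminal(Map<String, List<String>> grammar, String symbol) {
        return grammar.containsKey(symbol);
    }

    public static void main(String[] args) {
        Map<String, List<String>> grammar = exampleGrammar();
        for (String symbol : "The <OBJECT> <VERB> tonight.".split(" ")) {
            String kind = isNonTerminal(grammar, symbol) ? "non-terminal" : "terminal";
            System.out.println(symbol + " is a " + kind);
        }
    }
}
```

Looking symbols up in the rule set, rather than checking for a leading "<", keeps the program independent of how a particular grammar file chooses to spell its symbols.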
Here is an example of how this grammar works (you may want to work this out on paper to understand it
better):
1. A sentence starts as the symbol <START>
2. From the first rule, we see that this symbol can be replaced by the string of terminals and non-terminals: The <OBJECT> <VERB> tonight.
3. A terminal, like The, does not have any further rules that allow it to be replaced, so it just goes
straight to output.
4. A non-terminal, like <OBJECT>, does have more rules. We can see that there are two rules for
replacing <OBJECT> – big yellow flowers and slugs
a. We can choose one at random, e.g. big yellow flowers
b. These are each terminals, so again they just go straight to output.
5. Similarly, the non-terminal <VERB> can be replaced too, by the string sigh <ADVERB>
a. The terminal sigh goes to output
b. The non-terminal <ADVERB> must be replaced. There are again two rules for
<ADVERB> – warily and grumpily.
i. We randomly choose grumpily
ii. This is a terminal, so it goes to output
6. The last word in our original rule is the terminal tonight. (period included), which goes to output.
7. We have now replaced all non-terminals, and our final output is:
The big yellow flowers sigh grumpily tonight.
Problem Specification
You will be writing an application that generates random sentences according to a specified grammar.
The rules for this grammar will be contained in a text file in the format shown above.
There will only ever be one symbol on the left-hand side, but one or more symbols on the right-hand
side. Each symbol (including the colon) will be separated by a space. You may assume that the symbol
which represents a whole sentence (and from where you will start) will always be <START>.
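One possible way to read this format is sketched below. It assumes that every symbol is separated by whitespace, that the second token on each line is the colon, that no symbol contains a space, and that blank lines can be skipped. The class GrammarLoader and its methods parse and load are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader, used only for this illustration.
class GrammarLoader {
    // Groups rules by left-hand symbol. Each right-hand side is stored
    // as an array of symbols in their original left-to-right order.
    static Map<String, List<String[]>> parse(List<String> lines) {
        Map<String, List<String[]>> rules = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no rule
            }
            String[] symbols = trimmed.split("\\s+");
            // Expected shape: LEFT : RIGHT1 RIGHT2 ...
            if (symbols.length < 3 || !symbols[1].equals(":")) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            String[] rightSide = Arrays.copyOfRange(symbols, 2, symbols.length);
            rules.computeIfAbsent(symbols[0], key -> new ArrayList<>()).add(rightSide);
        }
        return rules;
    }

    // Reads the whole grammar file and hands its lines to parse().
    static Map<String, List<String[]>> load(String fileName) throws IOException {
        return parse(Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<START> : The <OBJECT> <VERB> tonight.",
                "<OBJECT> : big yellow flowers",
                "<OBJECT> : slugs");
        Map<String, List<String[]>> rules = parse(lines);
        System.out.println(rules.get("<OBJECT>").size() + " rules for <OBJECT>");
        // prints: 2 rules for <OBJECT>
    }
}
```

Keeping parse separate from load means the parsing logic can be tested with a handful of strings, without needing a file on disk.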
Your program must allow the user to choose a text file that includes the grammar to be used. It should
read the grammar from the file. Once a file is chosen, the program should allow the user to generate
random sentences. With each generated sentence, you must display all the specific rules actually used in
generating the sentence, one rule per line. See the example output below:
Example Output:
Design Phase
Work with your partner, following the design process outlined here. Together, you must submit a hard
copy of the Design Phase of your lab report by the end of lab today (only one copy needed for each pair).
Note: Your application must use a stack data structure as described below.
Basic Structure
The stack is the heart of this whole assignment. You will use it as the structure on which to build a
sentence from the grammar rules. You can utilize the work you’ve already done with linked lists from the
previous assignment with some simple modifications.
First of all, the data in your stack will not be BaseballPlayer objects. Instead, the data will consist of
Strings. Additionally, stacks are restricted to only two ways of manipulating the list – push and pop. Push
inserts data, and pop removes it. These work just like the insert and remove methods you made for the
linked list, but they operate at the same end (usually called the top). These are the only methods allowed
to insert or remove data. You may include other methods (such as Peek) if you like, but they are not
necessary.
Show the structure for the StringStack and Node classes in your lab report, including their attributes
(variables) and methods. Give a brief description of each method.
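As a rough guide to what such a structure might look like, here is a minimal linked-list stack of Strings. It is a sketch under assumptions, not the required design: the field names, the optional peek and isEmpty methods, and the choice to throw NoSuchElementException on an empty stack are all illustrative, and your own Node class from the previous assignment may differ.

```java
import java.util.NoSuchElementException;

// One link in the chain: a String plus a reference to the node beneath it.
class Node {
    final String data;
    final Node next;

    Node(String data, Node next) {
        this.data = data;
        this.next = next;
    }
}

// A stack of Strings. All insertions and removals happen at the top.
class StringStack {
    private Node top; // null when the stack is empty

    // Inserts a String at the top of the stack.
    void push(String value) {
        top = new Node(value, top);
    }

    // Removes and returns the String at the top of the stack.
    String pop() {
        if (top == null) {
            throw new NoSuchElementException("stack is empty");
        }
        String value = top.data;
        top = top.next;
        return value;
    }

    // Optional: returns the top String without removing it.
    String peek() {
        if (top == null) {
            throw new NoSuchElementException("stack is empty");
        }
        return top.data;
    }

    // Optional: lets a loop know when there is nothing left to pop.
    boolean isEmpty() {
        return top == null;
    }
}
```

For example, after push("a") and then push("b"), the first call to pop() returns "b" and the second returns "a".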
The other class you should have is the SentenceGenerator class itself.
• The constructor for this class should take in either the name of the grammar file, or some other
data structure (such as an array or list) which contains the grammar rules. Then every time you
instantiate a SentenceGenerator, it will be initialized with some set of rules.
• The class should also have a public method called generate() which generates the sentence
using the specified grammar.
• Finally, it should have two public getter methods:
  o getUsedRules() which returns a String array of the specific rules actually used in
    generating the sentence
  o getFinalSentence() which returns the generated sentence String
Show the structure for the SentenceGenerator class and its methods, including brief descriptions, in
your lab report.
Pseudocode
The steps below will ask you to write pseudocode or answer a question. Do this in your lab report.
1. The SentenceGenerator class will generate a sentence by using a stack. The stack will
represent the working sentence. Start by pushing the <START> symbol. Then you will
repeatedly pop off the top symbol and process it as follows:
If the symbol is a terminal, it is simply added to the final sentence. If it is a non-terminal, you
must find a matching rule, and use it to replace the symbol. For example, in the above grammar,
if the symbol is <VERB>, you will replace it by pushing sigh and <ADVERB> back onto the stack.
Note that if there is more than one matching rule, you will need to choose one at random.
Hint: You may want to push the symbols back on the stack in reverse order (the rightmost symbol
is pushed first, the leftmost symbol is pushed last) so that the front end of the sentence is always
on top. This is not necessary, but otherwise you will end up building the sentence backwards.
Before getting to the code (or even pseudocode) for this algorithm, work through the process of
generating a sentence using the above grammar by drawing out each step on your lab report.
Begin with the following:

Stack (top to bottom): <START>
<START> is pushed onto the stack.

Stack (top to bottom): The, <OBJECT>, <VERB>, tonight.
<START> is popped off the stack and replaced with the symbols The <OBJECT> <VERB> tonight. (pushed in reverse order).

Stack (top to bottom): <OBJECT>, <VERB>, tonight.
The is popped off the stack. Since it is a terminal, it is added to the final sentence output but not replaced on the stack.
Output so far: The
2. Write pseudocode for the process described in question #1 (hint: use a while loop)
3. When a symbol has more than one possible replace rule, such as <OBJECT>, you will need to
randomly choose one to use. How can you find all of the matching rules for a given symbol, and
then choose one randomly? Describe this process using pseudocode.
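After you have written your own pseudocode, you may find it useful to compare it with the sketch below, which turns the three steps above into Java. It is one possible reading of the process, not the expected solution. Several details are assumptions: rules are passed in as a String array in the file format shown earlier, a java.util.Random is passed to the constructor so that runs can be repeated, and java.util.ArrayDeque stands in for the StringStack class that your application must use. The private helper matchingRules is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

class SentenceGenerator {
    private final String[] rules; // each entry looks like "<VERB> : sigh <ADVERB>"
    private final Random random;  // assumption: injected so tests can be repeatable
    private final List<String> usedRules = new ArrayList<>();
    private String finalSentence = "";

    SentenceGenerator(String[] rules, Random random) {
        this.rules = rules.clone();
        this.random = random;
    }

    // Step 3: collect every rule whose left-hand side is exactly this symbol.
    private List<String> matchingRules(String symbol) {
        List<String> matches = new ArrayList<>();
        for (String rule : rules) {
            if (rule.startsWith(symbol + " : ")) {
                matches.add(rule);
            }
        }
        return matches;
    }

    // Steps 1 and 2: pop a symbol, then either output it or replace it.
    public void generate() {
        usedRules.clear();
        StringBuilder sentence = new StringBuilder();
        Deque<String> stack = new ArrayDeque<>(); // stand-in for StringStack
        stack.push("<START>");
        while (!stack.isEmpty()) {
            String symbol = stack.pop();
            List<String> matches = matchingRules(symbol);
            if (matches.isEmpty()) {
                // Terminal: no rule replaces it, so it goes to the output.
                if (sentence.length() > 0) {
                    sentence.append(' ');
                }
                sentence.append(symbol);
                continue;
            }
            // Non-terminal: choose one matching rule at random and record it.
            String rule = matches.get(random.nextInt(matches.size()));
            usedRules.add(rule);
            String[] rightSide = rule.substring(symbol.length() + 3).trim().split("\\s+");
            // Push the rightmost symbol first so the leftmost ends up on top.
            for (int i = rightSide.length - 1; i >= 0; i--) {
                stack.push(rightSide[i]);
            }
        }
        finalSentence = sentence.toString();
    }

    public String[] getUsedRules() {
        return usedRules.toArray(new String[0]);
    }

    public String getFinalSentence() {
        return finalSentence;
    }

    public static void main(String[] args) {
        String[] grammar = {
            "<START> : The <OBJECT> <VERB> tonight.",
            "<OBJECT> : big yellow flowers",
            "<OBJECT> : slugs",
            "<VERB> : sigh <ADVERB>",
            "<ADVERB> : warily",
            "<ADVERB> : grumpily"
        };
        SentenceGenerator generator = new SentenceGenerator(grammar, new Random());
        generator.generate();
        for (String rule : generator.getUsedRules()) {
            System.out.println(rule);
        }
        System.out.println(generator.getFinalSentence());
    }
}
```

Note that this loop only stops when the stack becomes empty, so a grammar whose rules can keep replacing a symbol with itself may never finish. That is the kind of issue worth raising in the Maintenance Phase.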
Implementation Phase
The rest of the assignment will be due in one week at the beginning of your lab.
Create a project in Eclipse named LA6name (where name is YOUR last name). In this project, create a
package named edu.wmich.cs1120.la6, and the classes from the Design Phase above, along with
their attributes and any necessary constructors. Add all of the methods designed earlier, translating your
above pseudocode into Java code.
Once again, you will be given a simple GUI class which can be used to run your program. The only thing
you should need to change in this class is adding code to the button ActionListeners to call your
SentenceGenerator methods. This will be indicated in the code.
Testing Phase
Your application should work not only for the example grammar given, but for any grammar which
follows the format of the text file described above. So you should try modifying the grammar rules for
testing purposes.
In your lab report, discuss how you tested your project and show your output.
Maintenance Phase
Discuss what features could be added as the development cycle continues. What are some issues that
could potentially arise because the current application does not account for certain situations or
behaviors? These can be feature-related or structural weaknesses.
Extra Credit


The example grammar shown above is very simple, and it automatically includes proper
capitalization and punctuation. A much larger example grammar will be provided for you which
cannot guarantee such a properly formatted output. After generating your sentence, do some
post-processing to make the sentence look nice – ensure that the first letter is capitalized and
that spaces around punctuation are correct (e.g. no space before a comma, but a space after it).
As mentioned above, the grammar symbols are actually very arbitrary, and can range from the
very syntax-based, like <noun> and <conjunction>, to the more idea-based, such as <absurd plot
premise> or <James Bond villain>. Be creative and write your own grammar file, centered
around a specific theme or using a certain style of words. Don’t just modify a few words in the
sample grammars. Try to make your grammar produce sentences (or even whole paragraphs)
that would be very different from everyone else’s.
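For the post-processing item above, a possible starting point is sketched here. It is an assumption-laden example rather than a complete solution: the class name SentenceTidy and the method tidy are hypothetical, only the punctuation marks listed in the regular expressions are handled, and cases such as quotation marks or times like 3:30 are not considered.

```java
// Hypothetical helper, used only for this illustration.
class SentenceTidy {
    static String tidy(String raw) {
        // Collapse runs of whitespace into single spaces.
        String s = raw.trim().replaceAll("\\s+", " ");
        // Remove a space that appears directly before punctuation.
        s = s.replaceAll(" ([,.;:!?])", "$1");
        // Make sure a comma, semicolon, or colon is followed by a space.
        s = s.replaceAll("([,;:])(?=\\S)", "$1 ");
        // Capitalize the first letter of the sentence.
        if (!s.isEmpty()) {
            s = Character.toUpperCase(s.charAt(0)) + s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(tidy("the slugs , warily ,sigh tonight ."));
        // prints: The slugs, warily, sigh tonight.
    }
}
```

Running the tidy step on the finished sentence, rather than on individual symbols, keeps it separate from the stack logic in generate().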
Pair Programming
You will work in two-person teams for this assignment and must utilize the pair programming model to
accomplish this task. Please refer to the additional pair programming explanations below as well as the
instructions on pair programming received from your course and lab instructors to complete this lab
assignment.
Two specific and well-defined roles, which require an equal amount of pair member contributions but
have different responsibilities, are required:


Driver: ‘Drives’ the keyboard and mouse. Sits in front of the keyboard and screen, changes any
removable media but does not handle hardcopy documentation.
Navigator: Does not touch the computer or any hardware. Handles documentation, makes
observations and suggestions, points out potential problems, finds solutions to driver questions.
Swapping roles is required. This should be done equally and on a regular basis. Pair programming at a
distance is possible (although restricted) through the use of communication software, email, cell phones,
etc.
You must keep a pair programming log that describes how you followed the pair programming model. It
should describe each session, including who took which role, how much time you spent during that
session, and what you worked on as a pair. Each time you switch roles, create a new entry in the log.
Assignment Submission



• Generate a .zip file that contains all of your files, including:
  o Program files, including any input or output files
  o The lab report document (including the design phase you completed during lab)
• Submit the .zip file via E-learning
• Submit a hard copy of the lab report at the beginning of your next lab
Each pair member will submit the full assignment solution, lab report, and pair programming log on E-learning individually, as usual; however, both partners will receive exactly the same grade. Include the
names of both partners on all submissions. You only need to submit one paper copy of the lab report.