CS 1120 Spring 2013 LA 6 Sentence Generation

Lab Assignment 6
Sentence Generation

Concepts
o Stacks
o Pair Programming Model

Background
Natural language processing is an exciting area that merges the fields of computer science and linguistics. It includes many practical applications, from using computers to translate text or speech, to enabling human-computer conversations, to automatically creating a summary of a newspaper article. For your assignment, you will be using a simple grammar to generate random English sentences.

The study of formal grammars for languages (both natural and computer) is a complex topic to which whole graduate-level courses can be devoted. You will not need to understand all of that. For your assignment, all you need to understand are simple symbol replacement rules. The rules for an example grammar are shown below.

Example Grammar:
    <START> : The <OBJECT> <VERB> tonight.
    <OBJECT> : big yellow flowers
    <OBJECT> : slugs
    <VERB> : sigh <ADVERB>
    <ADVERB> : warily
    <ADVERB> : grumpily

The symbol on the left-hand side of the colon can be replaced by the symbol or sequence of symbols on the right-hand side of the colon. There are two types of symbols: non-terminals (here enclosed in angle brackets) and terminals. Non-terminals can be further replaced using rules, while terminals remain in their final forms. The symbols used in any given grammar are arbitrary, so your application will not need to worry about what they mean, only how they can be replaced.

Here is an example of how this grammar works (you may want to work this out on paper to understand it better):

1. A sentence starts as the symbol <START>.
2. From the first rule, we see that this symbol can be replaced by the string of terminals and non-terminals The <OBJECT> <VERB> tonight.
3. A terminal, like The, does not have any further rules that allow it to be replaced, so it just goes straight to output.
4. A non-terminal, like <OBJECT>, does have more rules.
   We can see that there are two rules for replacing <OBJECT>: big yellow flowers and slugs.
   a. We can choose one at random, e.g. big yellow flowers
   b. These are each terminals, so again they just go straight to output.
5. Similarly, the non-terminal <VERB> can be replaced too, by the string sigh <ADVERB>
   a. The terminal sigh goes to output.
   b. The non-terminal <ADVERB> must be replaced. There are again two rules for <ADVERB>: warily and grumpily.
      i. We randomly choose grumpily.
      ii. This is a terminal, so it goes to output.
6. The last word in our original rule is the terminal tonight. (the period is part of the symbol), which goes to output.
7. We have now replaced all non-terminals, and our final output is:
   The big yellow flowers sigh grumpily tonight.

Problem Specification
You will be writing an application that generates random sentences according to a specified grammar. The rules for this grammar will be contained in a text file in the format shown above. There will only ever be one symbol on the left-hand side, but one or more symbols on the right-hand side. Each symbol (including the colon) will be separated by a space. You may assume that the symbol which represents a whole sentence (and from where you will start) will always be <START>.

Your program must allow the user to choose a text file that includes the grammar to be used. It should read the grammar from the file. Once a file is chosen, the program should allow the user to generate random sentences. With each generated sentence, you must display all the specific rules actually used in generating the sentence, one rule per line. See the example output below:

Example Output:
[figure: example program output]

Design Phase
Work with your partner, following the design process outlined here. Together, you must submit a hard copy of the Design Phase of your lab report by the end of lab today (only one copy needed for each pair).
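As a planning aid, the rule format given in the Problem Specification (one left-hand symbol, a colon, then one or more right-hand symbols, all separated by spaces) could be read into memory roughly as follows. This is only a minimal sketch under stated assumptions, not the required design: the class name GrammarParsingSketch, the method parseRules, and the choice of a map from each left-hand symbol to a list of right-hand sides are all hypothetical. It also assumes that no symbol contains an internal space and that blank lines may be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns grammar lines into a lookup table of rules. */
class GrammarParsingSketch {

    /**
     * Parses lines of the form "LHS : sym1 sym2 ..." into a map from each
     * left-hand symbol to all of its right-hand sides.
     */
    static Map<String, List<String[]>> parseRules(List<String> lines) {
        Map<String, List<String[]>> rules = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no rule
            }
            // Every symbol, including the colon, is separated by whitespace.
            String[] tokens = trimmed.split("\\s+");
            if (tokens.length < 3 || !tokens[1].equals(":")) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            String[] rightSide = Arrays.copyOfRange(tokens, 2, tokens.length);
            // A symbol may have several rules, so each one is appended to a list.
            rules.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(rightSide);
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<START> : The <OBJECT> <VERB> tonight.",
                "<OBJECT> : big yellow flowers",
                "<OBJECT> : slugs",
                "<VERB> : sigh <ADVERB>",
                "<ADVERB> : warily",
                "<ADVERB> : grumpily");
        Map<String, List<String[]>> rules = parseRules(lines);
        for (Map.Entry<String, List<String[]>> entry : rules.entrySet()) {
            System.out.println(entry.getKey() + " has " + entry.getValue().size() + " rule(s)");
        }
    }
}
```

In a real program the lines would come from the file the user chooses, for example through java.nio.file.Files.readAllLines. Grouping the rules by left-hand symbol makes it straightforward to see every rule that matches a given non-terminal.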
Note: Your application must use a stack data structure as described below.

Basic Structure
The stack is the heart of this whole assignment. You will use it as the structure on which to build a sentence from the grammar rules. You can reuse the work you’ve already done with linked lists from the previous assignment with some simple modifications. First of all, the data in your stack will not be BaseballPlayer objects. Instead, the data will consist of Strings. Additionally, stacks are restricted to only two ways of manipulating the list: push and pop. Push inserts data, and pop removes it. These work just like the insert and remove methods you made for the linked list, but they operate at the same end (usually called the top). These are the only methods allowed to insert or remove data. You may include other methods (such as Peek) if you like, but they are not necessary.

Show the structure for the StringStack and Node classes in your lab report, including their attributes (variables) and methods. Give a brief description of each method.

The other class you should have is the SentenceGenerator class itself. The constructor for this class should take in either the name of the grammar file, or some other data structure (such as an array or list) which contains the grammar rules. Then every time you instantiate a SentenceGenerator, it will be initialized with some set of rules. The class should also have a public method called generate() which generates the sentence using the specified grammar. Finally, it should have two public getter methods:
o getUsedRules() which returns a String array of the specific rules actually used in generating the sentence
o getFinalSentence() which returns the generated sentence String

Show the structure for the SentenceGenerator class and its methods, including brief descriptions, in your lab report.

Pseudocode
The steps below will ask you to write pseudocode or answer a question. Do this in your lab report.

1.
   The SentenceGenerator class will generate a sentence by using a stack. The stack will represent the working sentence. Start by pushing the <START> symbol. Then you will repeatedly pop off the top symbol and process it as follows:
   o If the symbol is a terminal, it is simply added to the final sentence.
   o If it is a non-terminal, you must find a matching rule, and use it to replace the symbol. For example, in the above grammar, if the symbol is <VERB>, you will replace it by pushing sigh and <ADVERB> back onto the stack. Note that if there is more than one matching rule, you will need to choose one at random.

   Hint: You may want to push the symbols back on the stack in reverse order (the rightmost symbol is pushed first, the leftmost symbol is pushed last) so that the front end of the sentence is always on top. This is not necessary, but otherwise you will end up building the sentence backwards.

   Before getting to the code (or even pseudocode) for this algorithm, work through the process of generating a sentence using the above grammar by drawing out each step on your lab report. Begin with the following:

   <START> is pushed onto the stack.
      Stack (top first): <START>

   <START> is popped off the stack and replaced with the symbols: The <OBJECT> <VERB> tonight. (in reverse order)
      Stack (top first): The, <OBJECT>, <VERB>, tonight.

   The is popped off the stack. Since it is a terminal, it is added to the final sentence output but not replaced on the stack.
      Stack (top first): <OBJECT>, <VERB>, tonight.
      Output so far: The

2. Write pseudocode for the process described in question #1 (hint: use a while loop).
3. When a symbol has more than one possible replacement rule, such as <OBJECT>, you will need to randomly choose one to use. How can you find all of the matching rules for a given symbol, and then choose one randomly? Describe this process using pseudocode.

Implementation Phase
The rest of the assignment will be due in one week at the beginning of your lab.
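To make the translation from pseudocode to Java more concrete, the stack-based process from the Pseudocode section might look roughly like the sketch below. It is one possible shape under stated assumptions, not the required solution. Node and StringStack are the class names from the Design Phase, but isEmpty, the class StackGenerationSketch, the helper addRule, the static generate signature, and the map-of-lists rule table are all hypothetical. The sketch also assumes that any symbol with no matching rule is a terminal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** One link in the stack's singly linked list. */
class Node {
    String data;
    Node next;

    Node(String data, Node next) {
        this.data = data;
        this.next = next;
    }
}

/** Linked-list stack of Strings; push and pop both operate at the top. */
class StringStack {
    private Node top;

    void push(String s) {
        top = new Node(s, top);
    }

    String pop() {
        if (top == null) {
            throw new IllegalStateException("pop on empty stack");
        }
        String s = top.data;
        top = top.next;
        return s;
    }

    boolean isEmpty() { // hypothetical convenience method
        return top == null;
    }
}

class StackGenerationSketch {
    /** Hypothetical helper: registers one rule "lhs : rhs..." in the table. */
    static void addRule(Map<String, List<String[]>> rules, String lhs, String... rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(rhs);
    }

    /** Expands <START> into a sentence and records each rule that was used. */
    static String generate(Map<String, List<String[]>> rules, Random rng, List<String> usedRules) {
        StringStack stack = new StringStack();
        stack.push("<START>");
        StringBuilder sentence = new StringBuilder();
        while (!stack.isEmpty()) {
            String symbol = stack.pop();
            List<String[]> matches = rules.get(symbol); // all rules for this symbol
            if (matches == null) {
                // Assumption: no matching rule means the symbol is a terminal.
                if (sentence.length() > 0) {
                    sentence.append(' ');
                }
                sentence.append(symbol);
            } else {
                // Choose one matching rule at random.
                String[] rhs = matches.get(rng.nextInt(matches.size()));
                usedRules.add(symbol + " : " + String.join(" ", rhs));
                // Push in reverse order so the leftmost symbol ends up on top.
                for (int i = rhs.length - 1; i >= 0; i--) {
                    stack.push(rhs[i]);
                }
            }
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> rules = new HashMap<>();
        addRule(rules, "<START>", "The", "<OBJECT>", "<VERB>", "tonight.");
        addRule(rules, "<OBJECT>", "big", "yellow", "flowers");
        addRule(rules, "<OBJECT>", "slugs");
        addRule(rules, "<VERB>", "sigh", "<ADVERB>");
        addRule(rules, "<ADVERB>", "warily");
        addRule(rules, "<ADVERB>", "grumpily");

        List<String> usedRules = new ArrayList<>();
        String sentence = generate(rules, new Random(), usedRules);
        for (String rule : usedRules) {
            System.out.println(rule);
        }
        System.out.println(sentence);
    }
}
```

Because the right-hand side is pushed in reverse, terminals come off the stack in reading order and can be appended directly to the output. In your own SentenceGenerator, the rule table and the list of used rules would more naturally be instance fields exposed through generate(), getUsedRules(), and getFinalSentence().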
Create a project in Eclipse named LA6name (where name is YOUR last name). In this project, create a package named edu.wmich.cs1120.la6 containing the classes from the Design Phase above, along with their attributes and any necessary constructors. Add all of the methods designed earlier, translating your above pseudocode into Java code.

Once again, you will be given a simple GUI class which can be used to run your program. The only thing you should need to change in this class is adding code to the button ActionListeners to call your SentenceGenerator methods. This will be indicated in the code.

Testing Phase
Your application should work not only for the example grammar given, but for any grammar which follows the format of the text file described above, so you should try modifying the grammar rules for testing purposes. In your lab report, discuss how you tested your project and show your output.

Maintenance Phase
Discuss what features could be added as the development cycle continues. What are some issues that could potentially arise because the current application does not account for certain situations or behaviors? These can be feature-related or structural weaknesses.

Extra Credit
The example grammar shown above is very simple, and it automatically includes proper capitalization and punctuation. A much larger example grammar will be provided for you which cannot guarantee such a properly formatted output. After generating your sentence, do some post-processing to make the sentence look nice: ensure that the first letter is capitalized and that spaces around punctuation are correct (e.g. no space before a comma, but a space after it).

As mentioned above, the grammar symbols are actually very arbitrary, and can range from the very syntax-based, like <noun> and <conjunction>, to the more idea-based, such as <absurd plot premise> or <James Bond villain>.
Be creative and write your own grammar file, centered around a specific theme or using a certain style of words. Don’t just modify a few words in the sample grammars. Try to make your grammar produce sentences (or even whole paragraphs) that would be very different from everyone else’s.

Pair Programming
You will work in two-person teams for this assignment and must utilize the pair programming model to accomplish this task. Please refer to the additional pair programming explanations below as well as the instructions on pair programming received from your course and lab instructors to complete this lab assignment.

Two specific and well-defined roles, which require an equal amount of pair member contributions but have different responsibilities, are required:
o Driver: ‘Drives’ the keyboard and mouse. Sits in front of the keyboard and screen, changes any removable media but does not handle hardcopy documentation.
o Navigator: Does not touch the computer or any hardware. Handles documentation, makes observations and suggestions, points out potential problems, finds solutions to driver questions.

Swapping roles is required. This should be done equally and on a regular basis. Pair programming at a distance is possible (although restricted) through the use of communication software, email, cell phones, etc.

You must keep a pair programming log that describes how you followed the pair programming model. It should describe each session, including who took which role, how much time you spent during that session, and what you worked on as a pair. Each time you switch roles, create a new entry in the log.
Assignment Submission
Generate a .zip file that contains all of your files, including:
o Program Files including any input or output files
o The lab report document (including the design phase you completed during lab)
Submit the .zip file via E-learning.
Submit a hard copy of the lab report at the beginning of your next lab.

Each pair member will submit the full assignment solution, lab report, and pair programming log on E-learning individually, as usual; however, both partners will receive exactly the same grade. Include the names of both partners on all submissions. You only need to submit one paper copy of the lab report.