114-37.1

COMP 114, Fall 2001
Program 5: Jumble unscrambler using backtracking

Assigned: November 28, 2001
Due: December 7, 2001

Turn your program in at your recitation or at the Sitterson Hall front desk. Programs will be accepted until 4:30pm. No late programs will be accepted.

Objectives

• Use backtracking to solve a problem.
• Use pruning to make backtracking more efficient.
• Use your word list objects to solve a real problem.

Word Jumbles

Word Jumbles appear on the comics page of many newspapers. They consist of four scrambled words. You have to unscramble the words and then, using the letters in the unscrambled words, solve another puzzle (often a dumb play on words). For example, one day the jumbled words were NUGOY, UNORM, ENVEAL, and NURTAT. These unscrambled to YOUNG, MOURN, LEAVEN, and TRUANT. The second puzzle, which used the letters in the underlined positions of the unscrambled words, was "What the actor hoped the play about a marathon would have." The answer is LONG RUN. We will solve only the first part: unscrambling the words. See http://puzzles.webpoint.com/puzz/games/1,1037,heraldsun-jclassic,00.html for an example.

Program

Write a program that solves word jumbles using backtracking with pruning. Your program will first read in the dictionary and construct 208 (26 x 8) word lists for 3, 4, 5, 6, 7, 8, 9, and 10-letter words using linked lists. Just as we did in program 4, each word list will contain only words of the appropriate length that begin with the appropriate first letter. Then the program will accept strings from the user and, for each, display all the valid words that can be formed by permuting the letters of that string.

For some jumbled strings, there may be more than one valid word. For example, entering “opst” should generate “stop,” “post,” and “spot.” Some solutions won’t be found because the dictionary does not contain all inflected forms of words.
For example, “pots” will not appear as a solution to “opst” because while “pot” is in the dictionary, its plural, “pots,” is not. Another example: the dictionary contains “hide” but neither “hides” nor “hiding.” Don’t worry about this; your program should generate only those solutions that are in the dictionary.

Input

Your program should read the full (25,000 word) dictionary file. Valid words (all letters, and length within the specified bounds) should be inserted into the appropriate word list. Invalid words should be discarded.

The rest of the input will come from the keyboard and will consist of a series of lines, each a string terminated by <ENTER>. The line “$QUIT” terminates the program. Input strings can contain any characters. Uppercase letters should be converted to lower case; strings with non-letters and strings that are too short or too long should be rejected.

We will be testing the programs automatically, so it is essential that you do not require any other input besides the strings to be unjumbled and the $QUIT.

Output

Display something interesting while you are reading the dictionary so that we know that your program is progressing, but keep it brief: too much output will slow down the program. Displaying a dot for every thousand words read is sufficient. Or you could display every thousandth word read.

For each jumbled word entered, display all the solutions plus the number of calls to the filter (okToAdd) and the number of calls to the backtrack method that were required. If the word is invalid (out of range or containing invalid characters), display an appropriate error message.

Superset

Given a jumbled string w with length n, the superset is the set of all strings of lower case letters whose length is n. Hence the cardinality of the superset is 26^n, a really big number!
    n    26^n
    1    26
    2    676
    3    17,576
    4    456,976
    5    11,881,376
    6    308,915,776
    7    8,031,810,176
    8    208,827,064,576

Even if we could test a million strings per second (which we can’t), trying to unjumble an eight-letter string would take more than two days. But we can do better with pruning. For example, I was able to solve “dfntyeii” (whose only solution is “identify”) in less than one second with only 2,158 calls to the filter and 83 calls to the backtracker. This is about one hundred million times faster than without pruning.

Algorithm

Use the "test before extend" version of the backtrack algorithm.

Reassurance

This is actually a very short program. My backtracking method is 16 lines long (of which 6 are brackets); my filter is 11 lines long (excluding comments and assertions). The lesson here is that backtracking is really pretty easy, and if it weren’t so *!%@* slow, then it would be a very attractive problem solving strategy.

Approved shortcut

It is ok to use global variables to count the number of calls to the filter and to the backtracker. You can also count calls with parameters, but that is much messier.

Word, WordNode, and WordList classes

You will use the Word, WordNode, and WordList classes written for Program 4. Examples of these classes will be put on the web on November 29. Feel free to use them at no penalty.

Magic numbers

Your program should have no magic numbers. In particular, the maximum and minimum valid word lengths should be implemented as constants. You should be able to change your program to allow words in the range 2 to 11 in no more than a minute.

Extra challenge, small extra credit (overnight run)

Run the dictionary against itself and find all the words that can be rearranged to form at least two other dictionary words.
For example, “eastern” has two other anagrams in the dictionary; “beard” has three other anagrams:

    “eastern”     “beard”
    “earnest”     “bread”
    “nearest”     “debar”
                  “debra”

Some of the dictionary words, such as “debra,” are proper nouns or other unusual strings. Don’t worry. This is a good program to run overnight, sending the output to a file. Which words have the most anagrams?
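
To make the anagram examples concrete, here is a minimal sketch of a "test before extend" unjumbler run on a toy word list. It is an illustration only, not the instructor's solution. Several things in it are assumptions: Python is used purely for brevity, the ten-word WORDS list stands in for the dictionary file, a sorted list searched with bisect replaces the linked WordList classes, and the names ok_to_add, backtrack, unjumble, and other_anagrams are hypothetical. The filter accepts a partial string only if some word of the right length starts with it, so dead branches are pruned before the recursive call is made.

```python
from bisect import bisect_left

# Toy stand-in for the dictionary file (assumption: the real program reads
# about 25,000 words into per-length, per-letter linked lists).
WORDS = ["beard", "bread", "debar", "debra",
         "earnest", "eastern", "nearest", "post", "spot", "stop"]

# Global counters, the shortcut the handout explicitly allows.
filter_calls = 0
backtrack_calls = 0


def ok_to_add(candidate, words):
    """Filter: is candidate a prefix of some word in the sorted list?"""
    global filter_calls
    filter_calls += 1
    i = bisect_left(words, candidate)
    return i < len(words) and words[i].startswith(candidate)


def backtrack(partial, remaining, words, found):
    """Extend partial with each distinct unused letter that passes the filter."""
    global backtrack_calls
    backtrack_calls += 1
    if not remaining:
        # All letters are placed. Every word in the list has this length,
        # so the last prefix test was an exact match.
        found.append(partial)
        return
    for letter in sorted(set(remaining)):   # set() avoids duplicate solutions
        candidate = partial + letter
        if ok_to_add(candidate, words):     # test before extend
            backtrack(candidate, remaining.replace(letter, "", 1), words, found)


def unjumble(jumble):
    """Return all words in WORDS that are permutations of jumble."""
    global filter_calls, backtrack_calls
    filter_calls = backtrack_calls = 0
    jumble = jumble.lower()
    same_length = sorted(w for w in WORDS if len(w) == len(jumble))
    found = []
    backtrack("", jumble, same_length, found)
    return found


def other_anagrams(word):
    """Anagrams of word in WORDS, excluding the word itself."""
    return [w for w in unjumble(word) if w != word]


print(unjumble("opst"))            # ['post', 'spot', 'stop']
print(other_anagrams("eastern"))   # ['earnest', 'nearest']
print(other_anagrams("beard"))     # ['bread', 'debar', 'debra']
print(filter_calls, backtrack_calls)
```

Calling other_anagrams on every dictionary word and keeping those with at least two results would give the list the extra challenge asks for. The counters are reset at the start of each unjumble call, so the last line reports the cost of the most recent query only.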