Word Jumbles

114-37.1
COMP 114, Fall 2001
Program 5: Jumble unscrambler using backtracking
Assigned: November 28, 2001
Due: December 7, 2001
Turn your program in at your recitation or at the Sitterson Hall front desk. Programs will
be accepted until 4:30pm. No late programs will be accepted.
Objectives
• Use backtracking to solve a problem.
• Use pruning to make backtracking more efficient.
• Use your word list objects to solve a real problem.
Word Jumbles
Word Jumbles appear on the comics page of many newspapers. They consist of four
scrambled words. You have to unscramble the words and then, using the letters in the
unscrambled words, solve another puzzle (often a dumb play on words). For example,
one day the jumbled words were NUGOY, UNORM, ENVEAL, and NURTAT. These
unscrambled to YOUNG, MOURN, LEAVEN, and TRUANT. The second puzzle,
which used the letters in the underlined positions of the unscrambled words, was "What
the actor hoped the play about a marathon would have." The answer is LONG RUN. We
will solve only the first part: unscrambling the words.
See http://puzzles.webpoint.com/puzz/games/1,1037,heraldsun-jclassic,00.html for an example.
Program
Write a program that solves word jumbles using backtracking with pruning. Your
program will first read in the dictionary and construct 208 (26x8) word lists for 3, 4, 5, 6,
7, 8, 9, and 10-letter words using linked lists. Just as we did in program 4, each word list
will contain only words of the appropriate length and that begin with the appropriate first
letter. Then the program will accept strings from the user and for each, display all the
valid words that can be formed by permuting the letters of that string. For some jumbled
strings, there may be more than one valid word. For example, entering “opst” should
generate: “stop,” “post,” and “spot.” Some solutions won’t be found because the
dictionary does not contain all inflected forms of words. For example, “pots” will not
appear as a solution to “opst” because while “pot” is in the dictionary, its plural, “pots,” is
not. Another example: the dictionary contains “hide” but neither “hides” nor “hiding.”
Don’t worry about this; your program should generate only those solutions that are in the
dictionary.
Input
Your program should read the full (25,000 word) dictionary file. Valid words (all letters,
and length within the specified bounds) should be inserted into the appropriate word list.
Invalid words should be discarded. The rest of the input will come from the keyboard
and will consist of a series of lines, each of which is a string terminated by <ENTER>.
The line “$QUIT” terminates the program. Input strings can contain any characters.
Uppercase letters should be converted to lower case; strings with non-letters and strings
that are too short or too long should be rejected. We will be testing the programs
automatically, so it is essential that you do not require any other input besides the strings
to be unjumbled and the $QUIT.
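The input rules above can be sketched as a small validation helper. This is only an illustration, not part of the required design: the class name JumbleInput, the method normalize, and the constant names are all assumptions, and the length bounds are the 3 to 10 range given in the Program section.

```java
import java.util.Optional;

class JumbleInput {
    // Bounds taken from the handout (3- to 10-letter words); the names are assumptions.
    static final int MIN_WORD_LENGTH = 3;
    static final int MAX_WORD_LENGTH = 10;
    static final String QUIT_COMMAND = "$QUIT";

    /** Returns the lower-cased string if it is valid, or empty if it must be rejected. */
    static Optional<String> normalize(String line) {
        if (line.length() < MIN_WORD_LENGTH || line.length() > MAX_WORD_LENGTH) {
            return Optional.empty();              // too short or too long
        }
        StringBuilder lower = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                lower.append((char) (c - 'A' + 'a'));   // convert upper case to lower case
            } else if (c >= 'a' && c <= 'z') {
                lower.append(c);
            } else {
                return Optional.empty();          // any non-letter invalidates the string
            }
        }
        return Optional.of(lower.toString());
    }
}
```

A main loop would compare each line with QUIT_COMMAND before calling normalize, then print an error message whenever the result is empty.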
Output
Display something interesting while you are reading the dictionary so that we know that
your program is progressing, but keep it brief: too much output will slow down the program.
Displaying a dot for every thousand words read is sufficient. Or you could display every
thousandth word read.
For each jumbled word entered, display all the solutions plus the number of calls to the
filter (okToAdd) and the number of calls to the backtrack method that were required. If
the word is invalid (out of range or containing invalid characters), display an appropriate
error message.
Superset
Given a jumbled string w with length n, the superset is the set of all strings of lower case
letters whose length is n. Hence the cardinality of the superset is 26^n; a really big
number!
n    26^n
1    26
2    676
3    17,576
4    456,976
5    11,881,376
6    308,915,776
7    8,031,810,176
8    208,827,064,576
Even if we could test a million strings per second (which we can’t), trying to unjumble an
eight-letter string would take more than two days. But we can do better with pruning.
For example, I was able to solve “dfntyeii” (whose only solution is “identify”) in less than
one second with only 2,158 calls to the filter and 83 calls to the backtracker. This is about
one hundred million times faster than without pruning.
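The arithmetic behind these estimates can be checked with a few lines of Java. This is a sketch for illustration only; the class and method names are assumptions.

```java
class SupersetSize {
    static final int ALPHABET_SIZE = 26;

    /** Number of lower-case strings of length n, that is, 26 multiplied by itself n times. */
    static long supersetSize(int n) {
        long size = 1;
        for (int i = 0; i < n; i++) {
            size *= ALPHABET_SIZE;
        }
        return size;
    }

    public static void main(String[] args) {
        long strings = supersetSize(8);
        // Days needed at the hypothetical rate of one million tests per second.
        double days = strings / 1_000_000.0 / 86_400.0;
        System.out.printf("%,d strings, about %.1f days%n", strings, days);
        // Ratio of the superset size to the 2,158 filter calls reported above.
        System.out.printf("ratio to 2,158 filter calls: %,d%n", strings / 2158);
    }
}
```

A long is wide enough here: the superset for the largest allowed length, 10 letters, is still well below the maximum long value.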
Algorithm
Use the "test before extend" version of the backtrack algorithm.
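One possible shape for a test-before-extend backtracker is sketched below. It is not the instructor's solution and will not reproduce the call counts quoted in this handout. The filter name okToAdd and the use of global counters come from the handout; everything else (the class name JumbleSketch, the unjumble helper, and the tiny stand-in WORDS list used in place of the WordList objects) is an assumption made so the example is self-contained. The filter prunes by asking whether the extended prefix still begins some dictionary word of the right length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JumbleSketch {
    // Global counters, which the handout permits as a shortcut.
    static int filterCalls = 0;
    static int backtrackCalls = 0;

    // Stand-in dictionary (an assumption); the real program would search its word lists.
    static final List<String> WORDS = Arrays.asList("stop", "post", "spot", "pot", "identify");

    /** Filter: can letter c extend partial toward some dictionary word of targetLength? */
    static boolean okToAdd(String partial, char c, int targetLength) {
        filterCalls++;
        String candidate = partial + c;
        for (String w : WORDS) {
            if (w.length() == targetLength && w.startsWith(candidate)) {
                return true;
            }
        }
        return false;
    }

    static void backtrack(String partial, char[] letters, boolean[] used, List<String> solutions) {
        backtrackCalls++;
        if (partial.length() == letters.length) {
            solutions.add(partial);   // every prefix passed the filter, so this is a word
            return;
        }
        for (int i = 0; i < letters.length; i++) {
            if (used[i]) continue;
            // Letters are sorted, so this skips a repeated letter at the same depth.
            if (i > 0 && letters[i] == letters[i - 1] && !used[i - 1]) continue;
            // Test before extend: recurse only if the filter accepts the letter.
            if (okToAdd(partial, letters[i], letters.length)) {
                used[i] = true;
                backtrack(partial + letters[i], letters, used, solutions);
                used[i] = false;
            }
        }
    }

    /** Resets the counters and returns every stand-in dictionary word formed from the jumble. */
    static List<String> unjumble(String jumble) {
        filterCalls = 0;
        backtrackCalls = 0;
        char[] letters = jumble.toCharArray();
        Arrays.sort(letters);
        List<String> solutions = new ArrayList<>();
        backtrack("", letters, new boolean[letters.length], solutions);
        return solutions;
    }
}
```

With this stand-in list, JumbleSketch.unjumble("opst") finds “post,” “spot,” and “stop,” and the two counters can be printed afterwards. The linear scan inside okToAdd is only for brevity; restricting the search to the list for the candidate's first letter and length is where the word lists from Program 4 would come in.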
Reassurance
This is actually a very short program. My backtracking method is 16 lines long (of which
6 are brackets); my filter is 11 lines long (excluding comments and assertions). The lesson
here is that backtracking is really pretty easy, and if it weren’t so *!%@* slow, it
would be a very attractive problem-solving strategy.
Approved shortcut
It is ok to use global variables to count the number of calls to the filter and to the
backtracker. You can also count calls with parameters, but that is much messier.
Word, WordNode, and WordList classes
You will use the Word, WordNode, and WordList classes written for Program 4.
Examples of these classes will be put on the web on November 29. Feel free to use them
at no penalty.
Magic numbers
Your program should have no magic numbers. In particular, the maximum and minimum
valid word lengths should be implemented as constants. You should be able to change
your program to allow words in the range 2...11 in no more than a minute.
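As an illustration of what this might look like, the sketch below derives the number of word lists and the index of a word's list from two length constants, so that changing the range means editing only those two lines. The class name WordListIndex, the constant names, and the layout of the index (grouped by first letter, then by length) are assumptions, not requirements.

```java
class WordListIndex {
    // The only two values to edit when the allowed range changes.
    static final int MIN_WORD_LENGTH = 3;
    static final int MAX_WORD_LENGTH = 10;

    static final int ALPHABET_SIZE = 26;
    // Everything below is derived, so no literal 8 or 208 appears in the program.
    static final int LENGTH_COUNT = MAX_WORD_LENGTH - MIN_WORD_LENGTH + 1;
    static final int LIST_COUNT = ALPHABET_SIZE * LENGTH_COUNT;

    /** Index of the list for a valid lower-case word: by first letter, then by length. */
    static int listIndex(String word) {
        int letter = word.charAt(0) - 'a';
        int length = word.length() - MIN_WORD_LENGTH;
        return letter * LENGTH_COUNT + length;
    }
}
```

With the bounds 3 and 10, LIST_COUNT works out to the 208 lists described in the Program section.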
Extra challenge, small extra credit (overnight run)
Run the dictionary against itself and find all the words that can be rearranged to form at
least two other dictionary words. For example, “eastern” has two other anagrams in the
dictionary; “beard” has three other anagrams:
“eastern”: “earnest,” “nearest”
“beard”: “bread,” “debar,” “debra”
Some of the dictionary words, such as “debra,” are proper nouns or other unusual strings.
Don’t worry.
This is a good program to run overnight, sending the output to a file. Which words have
the most anagrams?
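One way to check the output of such a run is to group words by a sorted-letter signature. Note that this is a different technique from the backtracking the assignment asks for, shown here only as a cross-check: two words are anagrams exactly when their letters, once sorted, are identical. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AnagramGroups {
    /** The letters of the word in sorted order, e.g. "bread" gives "abder". */
    static String signature(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    /** Groups words by signature and keeps the groups with at least minSize members. */
    static List<List<String>> groups(List<String> words, int minSize) {
        Map<String, List<String>> bySignature = new TreeMap<>();
        for (String w : words) {
            bySignature.computeIfAbsent(signature(w), k -> new ArrayList<>()).add(w);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> group : bySignature.values()) {
            if (group.size() >= minSize) {
                result.add(group);
            }
        }
        return result;
    }
}
```

A word with at least two other anagrams belongs to a group of three or more, so calling groups with a minSize of 3 selects the words the challenge asks about.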