Solving Crossword Puzzles - The College of Saint Rose

Artificial Intelligence
CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.
March 6, 2009
Crossword Puzzle Construction

Given:
– Dictionary of valid words and phrases
– Empty crossword grid
Problem:
– Fill the crossword grid such that all words, both across and down, are valid
– Assign clues
Crossword Puzzle Construction

Depth-First Search (DFS)
– Fill in words until a solution is found or a dead-end is encountered
– Backtrack from dead-ends
– Questions:
    – Where do we start?
    – What word do we fill in next?
    – What backtracking strategies do we use?
    – How do we avoid repetition (boring puzzles)?
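
A minimal sketch of the depth-first fill described above. The representation is an assumption, not taken from the slides: each slot is a list of (row, col) cells, the grid is a dict from cell to letter, and `fill_grid` is a hypothetical helper name. Slots are filled in list order and no word is used twice.

```python
def fill_grid(slots, words, index=0, grid=None, used=None):
    """Depth-first fill. Returns a dict (row, col) -> letter, or None at a dead-end."""
    grid = {} if grid is None else grid
    used = set() if used is None else used
    if index == len(slots):              # every slot holds a word: solved
        return grid
    cells = slots[index]
    for word in words:
        if word in used or len(word) != len(cells):
            continue
        # The word fits if each cell is empty or already holds the same letter.
        if any(grid.get(c, ch) != ch for c, ch in zip(cells, word)):
            continue
        newly_set = [c for c in cells if c not in grid]
        for c, ch in zip(cells, word):
            grid[c] = ch
        used.add(word)
        result = fill_grid(slots, words, index + 1, grid, used)
        if result is not None:
            return result
        # Dead-end further down: backtrack by undoing this placement.
        used.discard(word)
        for c in newly_set:
            del grid[c]
    return None


# A 2x2 grid: two across slots, then two down slots.
slots = [[(0, 0), (0, 1)], [(1, 0), (1, 1)],
         [(0, 0), (1, 0)], [(0, 1), (1, 1)]]
solution = fill_grid(slots, ["AT", "AS", "SO", "TO"])
```

The slot order and the word order are where the questions on this slide come in: both are fixed and naive here.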
Crossword Puzzle Construction

Optimize the DFS:
– Add longer (most constrained) words first
– Associate weights with words in dictionary based on frequency of letters
    – Friendly crossword puzzle words include letters: S, R, E, T, D, A, I, L
    – Unfriendly crossword puzzle words include letters: J, Q, X, Z, F, V, W
    – e.g. quiz, fix, jazz, quaff, xylophone, wax
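
One way the letter-based weighting could look. The scoring rule is an assumption (the slides give only the two letter lists): +1 for each friendly letter, -1 for each unfriendly letter. `word_weight` and `rank_candidates` are hypothetical names.

```python
FRIENDLY = set("SRETDAIL")      # letter lists from the slide
UNFRIENDLY = set("JQXZFVW")


def word_weight(word):
    """Assumed score: +1 per friendly letter, -1 per unfriendly letter."""
    return sum((ch in FRIENDLY) - (ch in UNFRIENDLY) for ch in word.upper())


def rank_candidates(words):
    """Order dictionary words so the friendliest are tried first."""
    return sorted(words, key=lambda w: -word_weight(w))


ranked = rank_candidates(["jazz", "trade", "quiz"])
```

With this rule "trade" scores 5, "quiz" scores -1 and "jazz" scores -2, so the DFS would try "trade" first.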
Crossword Puzzle Construction

Genetic Algorithm (GA)
– Evolve a solution by crossovers and mutations through many generations
– Initial population of crossword grids:
    – Random letters?
    – Random letters based on Scrabble® frequencies?
    – Random words from dictionary?
– Fitness of each grid is number of valid words

[Figure: crossover and mutation on 4-bit chromosomes]
Generation i
    X1i    1 1 0 0    f = 36
    X2i    0 1 0 0    f = 44
    X3i    0 0 0 1    f = 14
    X4i    1 1 1 0    f = 14
    X5i    0 1 1 1    f = 56
    X6i    1 0 0 1    f = 54
Crossover pairs: X6i with X2i, X1i with X5i, X2i with X5i
Mutation: X1'i becomes X1"i, X2i becomes X2"i
Generation (i + 1)
    X1i+1    1 0 0 0    f = 56
    X2i+1    0 1 0 1    f = 50
    X3i+1    1 0 1 1    f = 44
    X4i+1    0 1 0 0    f = 44
    X5i+1    0 1 1 0    f = 54
    X6i+1    0 1 1 1    f = 56
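
A small sketch of the two GA operators on bit strings like those in the figure. The fitness values in the figure match f(x) = 15x - x^2 when each chromosome is read as a binary integer; that formula is inferred from the numbers and is not stated in the slides. For crossword grids the fitness would instead be the number of valid words. Function names are hypothetical.

```python
import random


def fitness(bits):
    """Inferred toy fitness: f(x) = 15x - x^2, x being the chromosome as an integer."""
    x = int(bits, 2)
    return 15 * x - x * x


def crossover(a, b, point):
    """Single-point crossover: swap the tails of two parents after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )


# Parents X6i and X2i from the figure, cut after the third bit (assumed cut point).
child_a, child_b = crossover("1001", "0100", 3)
```

With that cut point the children are 1000 and 0101, which are the first two chromosomes listed in generation (i + 1).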
Solving Crossword Puzzles

Given:
– Crossword grid
– Clues
Problem:
– Fill the grid such that all words correctly answer the given clues
Solving Crossword Puzzles

Obtain candidate answers for each clue
– Assign a confidence value to each candidate
– Are we guaranteed to have the correct answer?
Place candidate answers in grid until a solution is found or a dead-end occurs
– Which backtracking strategies should we use?
Solving Crossword Puzzles

PROVERB — Duke University, 1999
– Modules provide candidate answers from dictionaries, encyclopedias, movie databases, etc.
– Module sources a Crossword Puzzle Database of exactly 5142 previously solved puzzles
    – Pivotal in PROVERB’s success
– Another module generates all combinations of letters (ouch!)
Solving Crossword Puzzles

Google CruciVerbalist (GCV)
Solving Crossword Puzzles

GCV solved 13x13 puzzle with 68 clues
– Many clues are fill-in-the-blank or pop-culture clues
– Candidate answers obtained from Google results page (top 50)
– Solved using 559 Google queries
– Queries yielded 68 correct answers
    – 44 correct answers had highest confidence
Solving Crossword Puzzles
Clue Preprocessing

Categorize clues based on text and type of clues:
– Fill-in-the-blank clues
– Synonyms/Antonyms
– “Type of” (or “Kind of”) clues
– Abbreviations
– Clues with “and” or “or”
– Singular or plural
– Number of words in answer
Clue Preprocessing

Translate clues to Google-friendly forms
– “To ___ is human”
    – “To * is human”
    – “To * * is human”
– “Mary ___ little lamb” (2 words)
    – “Mary * * little lamb”
– “___ to Joy” by Beethoven
    – “* to Joy” by Beethoven
    – “* * to Joy” by Beethoven
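
A sketch of the fill-in-the-blank translation shown above, assuming the blank is written as two or more underscores and an optional "(N words)" hint ends the clue. Without a hint it tries one and two wildcards, as in the examples. `blank_queries` is a hypothetical name, and quoting the phrase for the search engine is left to the caller.

```python
import re


def blank_queries(clue):
    """Turn a fill-in-the-blank clue into wildcard query strings."""
    hint = re.search(r"\((\d+) words\)\s*$", clue)
    counts = [int(hint.group(1))] if hint else [1, 2]
    text = clue[:hint.start()].strip() if hint else clue.strip()
    # One "*" per expected answer word, separated by spaces.
    return [re.sub(r"_{2,}", " ".join("*" * n), text) for n in counts]


queries = blank_queries("To ___ is human")
```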
Clue Preprocessing

Translate clues to Google-friendly forms
– Diplomacy
    – synonyms of Diplomacy
– Not dry
    – opposite of dry
    – antonyms of dry
– Joy
    – synonyms of Joy
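
The synonym and antonym rewrites above could be sketched as follows. The rule for detecting an antonym clue (exactly two words, the first being "Not") is an assumption based on the single example; `meaning_queries` is a hypothetical name.

```python
def meaning_queries(clue):
    """Rewrite a definition-style clue as synonym or antonym queries."""
    words = clue.split()
    if len(words) == 2 and words[0].lower() == "not":
        return [f"opposite of {words[1]}", f"antonyms of {words[1]}"]
    return [f"synonyms of {clue}"]


queries = meaning_queries("Not dry")
```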
Clue Preprocessing

Translate clues to Google-friendly forms
– Type of dancing [or Kind of dancing]
    – * dancing
– Second sight (abbr.)
    – Second sight
    – abbreviations of Second sight
– Superman’s admirer
    – admirer of Superman
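
A sketch covering the three rewrites above with regular expressions. The patterns are assumptions generalized from one example each; `rewrite_clue` is a hypothetical name.

```python
import re


def rewrite_clue(clue):
    """Rewrite type-of, abbreviation and possessive clues into query strings."""
    m = re.match(r"(?:Type|Kind) of (.+)$", clue, re.IGNORECASE)
    if m:
        return [f"* {m.group(1)}"]
    m = re.match(r"(.+?)\s*\(abbr\.\)$", clue, re.IGNORECASE)
    if m:
        return [m.group(1), f"abbreviations of {m.group(1)}"]
    m = re.match(r"(\w+)[’']s (.+)$", clue)   # "X's Y" becomes "Y of X"
    if m:
        return [f"{m.group(2)} of {m.group(1)}"]
    return [clue]                             # no rule applies: query as-is


queries = rewrite_clue("Second sight (abbr.)")
```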
Clue Preprocessing

Translate clues to Google-friendly forms
– Couldn’t move
    – Could not move
    – Could opposite of move
    – Could antonyms of move
– Knight or Danson
    – Knight
    – Danson
Clue Preprocessing

Translate clues to Google-friendly forms
– Bosley and Arnold
    – Bosley
    – Arnold
    – Append an ‘s’
– Henson, and others [or Henson, and namesakes]
    – Henson
    – Append an ‘s’
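
A sketch of the "and"/"or" handling: the clue is split into separate queries, and for "and" clues the candidate answer gets an 's' appended. The function names and the uppercase answer convention are assumptions.

```python
import re


def split_conjunction(clue):
    """Return (queries, pluralize) for clues joined by "and" or "or"."""
    m = re.match(r"(.+?),? and (?:others|namesakes)$", clue)
    if m:
        return [m.group(1)], True
    if " and " in clue:
        return clue.split(" and "), True
    if " or " in clue:
        return clue.split(" or "), False
    return [clue], False


def pluralize(answer):
    """Append an 's' to a candidate answer, as the slide suggests."""
    return answer + "S"


queries, plural = split_conjunction("Bosley and Arnold")
```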
Results of Google-Querying

GCV excels at solving fill-in-the-blank and pop-culture clues
– Why?
Though results are encouraging, using keyword-based searching is limited
– Why?
Populating the Crossword Grid

Use a Depth-First Search (DFS) algorithm:
– Fill in the crossword grid based on confidence values of candidate words
– At each iteration:
    – Select candidate word with highest confidence value amongst clues not yet placed
    – Attempt to fit candidate word into grid
– Halt when a solution is found or a dead-end occurs
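
A minimal sketch of the confidence-driven DFS above, using plain chronological backtracking at dead-ends. The data layout is an assumption: `slots` maps a clue id to its grid cells, and `candidates` maps a clue id to (word, confidence) pairs. `solve` is a hypothetical name. The search revisits many orderings, so it only suits toy inputs.

```python
def solve(slots, candidates, grid=None, placed=None):
    """Return {clue id: word} for a consistent fill, or None at a dead-end."""
    grid = {} if grid is None else grid
    placed = {} if placed is None else placed
    if len(placed) == len(slots):
        return placed
    # Highest-confidence candidates first, across all clues not yet placed.
    options = sorted(
        ((conf, clue, word)
         for clue, cands in candidates.items() if clue not in placed
         for word, conf in cands),
        reverse=True,
    )
    for conf, clue, word in options:
        cells = slots[clue]
        if len(word) != len(cells):
            continue
        if any(grid.get(c, ch) != ch for c, ch in zip(cells, word)):
            continue                      # clashes with a letter already placed
        new_cells = [c for c in cells if c not in grid]
        grid.update(zip(cells, word))
        placed[clue] = word
        if solve(slots, candidates, grid, placed) is not None:
            return placed
        del placed[clue]                  # dead-end below: remove the last word
        for c in new_cells:
            del grid[c]
    return None


slots = {"1A": [(0, 0), (0, 1), (0, 2)], "1D": [(0, 0), (1, 0), (2, 0)]}
candidates = {"1A": [("CAT", 5), ("DOG", 9)], "1D": [("COW", 8)]}
answer = solve(slots, candidates)
```

Here DOG is tried first because it has the highest confidence, fails against COW, and is removed before CAT is placed.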
Populating the Crossword Grid

When a dead-end occurs, what do we do?
– Backtrack: Remove last word placed in grid
    – Disadvantages?
– Backjump: Identify culprit and remove all words back to culprit word
    – Disadvantages?
Populating the Crossword Grid

When a dead-end occurs, what do we do?
– Extricating Backjump: Identify and remove the culprit
    – Disadvantages?
– How do we identify the culprit?
Extricating Backjumping

Assign weights to the squares of the grid
– Square weights correspond to confidence values of candidate words placed
– e.g. Place TWAIN with confidence value of 10 at 5-Across
Extricating Backjumping

Weights of interlocking words are multiplied
Extricating Backjumping

Define grid weight of a word as the sum of its individual square weights
– e.g. TWAIN = 100, NOW = 72
Extricating Backjumping

When a dead-end occurs, the culprit is the word with the lowest grid weight
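
A sketch of the grid-weight rule. Each square starts at weight 1 and is multiplied by the confidence of every word that covers it, so an interlocking square carries the product. The example assumes NOW has confidence 6 and shares one square with TWAIN; neither detail is stated in the slides, but together they reproduce the quoted weights: TWAIN = 10 + 10 + 10 + 10 + 60 = 100, NOW = 60 + 6 + 6 = 72. Function names are hypothetical.

```python
from collections import defaultdict


def grid_weights(placements):
    """placements: word -> (cells, confidence). Returns word -> grid weight."""
    square = defaultdict(lambda: 1)
    for cells, conf in placements.values():
        for cell in cells:
            square[cell] *= conf          # interlocking squares multiply
    return {word: sum(square[c] for c in cells)
            for word, (cells, _) in placements.items()}


def culprit(placements):
    """The word with the lowest grid weight is removed at a dead-end."""
    weights = grid_weights(placements)
    return min(weights, key=weights.get)


placements = {
    "TWAIN": ([(0, i) for i in range(5)], 10),     # across
    "NOW": ([(0, 4), (1, 4), (2, 4)], 6),          # down, sharing the N (assumed)
}
weights = grid_weights(placements)
```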
A Sampling of Crossword Puzzles

– New York Times
– TV Guide #42
– TV Guide #63
– Mensa Kids Puzzle #3
Results of Grid Solving
Limitations of Keyword-Based Search

Google and GCV use keyword-based tricks to artificially improve result sets
– Word frequency & proximity to other words
– Additional keywords to help direct queries to good candidate answers
    – e.g. synonyms of
– Grammatical and structural rearrangements
Limitations of Keyword-Based Search

Lack of precision in keyword-based search
– Irrelevant results in candidate answer lists
– Confidence values based on word frequency produce many false positives
– Correct answer is often buried among mediocre (and incorrect!) candidates
In Conclusion....

Other uses of the Web as an automated information source?
– Keyword-based search is insufficient
– Lacks the means for machine-interpretable information
– Semantic Web