Artificial Intelligence
CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.
March 6, 2009

Crossword Puzzle Construction

Given:
– A dictionary of valid words and phrases
– An empty crossword grid
Problem:
– Fill the crossword grid so that every word, both across and down, is valid
– Assign clues

Depth-First Search (DFS):
– Fill in words until a solution is found or a dead-end is encountered
– Backtrack from dead-ends
Questions:
– Where do we start?
– Which word do we fill in next?
– Which backtracking strategies do we use?
– How do we avoid repetition (boring puzzles)?

Optimizing the DFS:
– Add longer (most constrained) words first
– Associate a weight with each dictionary word based on the frequency of its letters
  – Friendly crossword puzzle words contain letters such as S, R, E, T, D, A, I, L
  – Unfriendly crossword puzzle words contain letters such as J, Q, X, Z, F, V, W (e.g. quiz, fix, jazz, quaff, xylophone, wax)

Genetic Algorithm (GA):
– Evolve a solution through crossover and mutation over many generations
– Initial population of crossword grids:
  – Random letters?
  – Random letters based on Scrabble® frequencies?
  – Random words from the dictionary?

[Figure: Crossover. Bit-string chromosomes X1i to X6i of generation i, each labeled with its fitness f, are paired and crossed over to form generation (i + 1).]
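The GA slides leave the grid encoding open, so what follows is only a minimal Python sketch under stated assumptions: an open N×N grid with no black squares (so every row and column must be a word), fitness counted as the number of valid across and down words, an initial population of random letters weighted by English Scrabble® tile counts, fitness-proportionate parent selection, single-point crossover at a row boundary (a grid analogue of the bit-string crossover in the figure), and a mutation that replaces one square with a new weighted letter. The function names and parameter defaults are hypothetical.

```python
import random

# English Scrabble tile counts (blanks omitted), used as letter weights (assumption).
SCRABBLE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
}
LETTERS = list(SCRABBLE_COUNTS)
WEIGHTS = list(SCRABBLE_COUNTS.values())


def random_letter(rng):
    """Draw one letter with probability proportional to its tile count."""
    return rng.choices(LETTERS, weights=WEIGHTS)[0]


def random_grid(size, rng):
    """Initial individual: an open size x size grid of weighted random letters."""
    return ["".join(random_letter(rng) for _ in range(size)) for _ in range(size)]


def fitness(grid, dictionary):
    """Number of valid words: every row (across) and every column (down) is checked."""
    columns = ["".join(row[c] for row in grid) for c in range(len(grid[0]))]
    return sum(word in dictionary for word in grid + columns)


def crossover(a, b, rng):
    """Single-point crossover: top rows from parent a, remaining rows from parent b."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(grid, rng, rate=0.2):
    """With probability `rate`, replace one random square with a new weighted letter."""
    if rng.random() >= rate:
        return grid
    r, c = rng.randrange(len(grid)), rng.randrange(len(grid[0]))
    row = grid[r][:c] + random_letter(rng) + grid[r][c + 1:]
    return grid[:r] + [row] + grid[r + 1:]


def evolve(dictionary, size=3, pop_size=50, generations=500, seed=0):
    """Run the GA and return the fittest grid seen (not necessarily fully valid)."""
    rng = random.Random(seed)
    population = [random_grid(size, rng) for _ in range(pop_size)]
    best = max(population, key=lambda g: fitness(g, dictionary))
    for _ in range(generations):
        scores = [fitness(g, dictionary) for g in population]
        # Fitness-proportionate selection; +1 keeps zero-fitness grids selectable.
        parents = rng.choices(population, weights=[s + 1 for s in scores], k=2 * pop_size)
        population = [mutate(crossover(parents[2 * i], parents[2 * i + 1], rng), rng)
                      for i in range(pop_size)]
        best = max(population + [best], key=lambda g: fitness(g, dictionary))
        if fitness(best, dictionary) == 2 * size:  # every row and column is valid
            break
    return best


if __name__ == "__main__":
    words = {"BAT", "ARE", "TEN"}
    print(fitness(["BAT", "ARE", "TEN"], words))  # prints: 6 (a 3x3 word square)
    print(evolve(words))
```

Nothing guarantees that `evolve` reaches a fully valid grid within its generation budget; it simply returns the fittest grid it has seen.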
– Fitness of each grid is the number of valid words it contains

[Figure: Mutation. Selected offspring (e.g. X1'i, X2'i) are altered to produce X1"i and X2"i before joining generation (i + 1).]

Solving Crossword Puzzles

Given:
– A crossword grid
– Clues
Problem:
– Fill the grid so that every word correctly answers its clue

Approach:
– Obtain candidate answers for each clue
  – Assign a confidence value to each candidate
  – Are we guaranteed to have the correct answer among the candidates?
– Place candidate answers in the grid until a solution is found or a dead-end occurs
  – Which backtracking strategies should we use?

PROVERB (Duke University, 1999):
– Modules provide candidate answers from dictionaries, encyclopedias, movie databases, etc.
– One module draws on a Crossword Puzzle Database of 5142 previously solved puzzles
  – Pivotal to PROVERB's success
– Another module generates all combinations of letters (ouch!)

Google CruciVerbalist (GCV):
– GCV solved a 13x13 puzzle with 68 clues
  – Many clues are fill-in-the-blank or pop-culture clues
– Candidate answers were obtained from the Google results page (top 50 results)
– The puzzle was solved using 559 Google queries
  – The queries yielded 68 correct answers
  – 44 correct answers had the highest confidence value

Clue Preprocessing

Categorize clues based on their text and type:
– Fill-in-the-blank clues
– Synonyms/antonyms
– “Type of” (or “Kind of”) clues
– Abbreviations
– Clues with “and” or “or”
– Singular or plural
– Number of words in the answer

Translate clues to Google-friendly forms (clue → queries):
– “To ___ is human” → “To * is human”, “To * * is human”
– “Mary ___ little lamb” (2 words) → “Mary * * little lamb”
– “___ to Joy” by Beethoven → “* to Joy” by Beethoven, “* * to Joy” by Beethoven
– Diplomacy → synonyms of Diplomacy
– Not dry → opposite of dry, antonyms of dry
– Joy → synonyms of Joy
– Type of dancing [or Kind of dancing] → * dancing
– Second sight (abbr.) → Second sight, abbreviations of Second sight
– Superman’s admirer → admirer of Superman
– Couldn’t move → Could not move, Could opposite of move, Could antonyms of move
– Knight or Danson → Knight Danson
– Bosley and Arnold → Bosley Arnold, then append an ‘s’ to the answer
– Henson, and others [or Henson, and namesakes] → Henson, then append an ‘s’ to the answer

Results of Google-Querying
– GCV excels at solving fill-in-the-blank and pop-culture clues
  – Why?
– Though the results are encouraging, keyword-based searching is limited
  – Why?

Populating the Crossword Grid

Use a depth-first search (DFS) algorithm:
– Fill in the crossword grid based on the confidence values of candidate words
– At each iteration:
  – Select the candidate word with the highest confidence value among the clues not yet placed
  – Attempt to fit the candidate word into the grid
– Halt when a solution is found or a dead-end occurs

When a dead-end occurs, what do we do?
– Backtrack: remove the last word placed in the grid
  – Disadvantages?
– Backjump: identify the culprit and remove all words back to the culprit word
  – Disadvantages?
– Extricating backjump: identify and remove just the culprit
  – Disadvantages?
  – How do we identify the culprit?

Extricating Backjumping
– Assign weights to the squares of the grid
  – Square weights correspond to the confidence values of the candidate words placed
  – e.g. place TWAIN, with a confidence value of 10, at 5-Across
– Where words interlock, their weights are multiplied
– Define the grid weight of a word as the sum of its individual square weights
  – e.g. TWAIN = 100, NOW = 72
– When a dead-end occurs, the culprit is the word with the lowest grid weight

A Sampling of Crossword Puzzles
– New York Times
– TV Guide #42
– TV Guide #63
– Mensa Kids Puzzle #3

Results of Grid Solving

Limitations of Keyword-Based Search
– Google and GCV use keyword-based tricks to artificially improve result sets:
  – Word frequency and proximity to other words
  – Additional keywords that help direct queries toward good candidate answers (e.g. “synonyms of”)
  – Grammatical and structural rearrangements
– Keyword-based search lacks precision:
  – Irrelevant results appear in candidate answer lists
  – Confidence values based on word frequency produce many false positives
  – The correct answer is often buried among other mediocre (and incorrect!) candidates

In Conclusion
– Other uses of the Web as an automated information source?
  – Keyword-based search is insufficient
  – It lacks the means for machine-interpretable information
  – Semantic Web
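The extricating-backjumping rules from the grid-solving slides can be sketched in Python as follows. This is a minimal illustration, not GCV's actual code: the `Placement` type, the (row, col) coordinates, and the helper names are hypothetical, and "weights of interlocking words are multiplied" is read as follows: a square's weight is the product of the confidence values of every placed word covering it. A word's grid weight is the sum of its square weights, and at a dead-end only the word with the lowest grid weight is removed.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Placement:
    """A candidate word placed in the grid (hypothetical representation)."""
    word: str
    confidence: float
    cells: tuple  # (row, col) squares the word occupies, in order


def cells_for(row, col, length, across):
    """Squares covered by a word starting at (row, col)."""
    return tuple((row, col + i) if across else (row + i, col) for i in range(length))


def square_weights(placements):
    """Square weight = product of the confidence values of every placed word covering it."""
    covering = {}
    for p in placements:
        for cell in p.cells:
            covering.setdefault(cell, []).append(p.confidence)
    return {cell: prod(confs) for cell, confs in covering.items()}


def grid_weight(placement, weights):
    """Grid weight of a word = sum of its individual square weights."""
    return sum(weights[cell] for cell in placement.cells)


def extricate(placements):
    """At a dead-end, remove only the culprit: the word with the lowest grid weight."""
    weights = square_weights(placements)
    culprit = min(placements, key=lambda p: grid_weight(p, weights))
    remaining = [p for p in placements if p is not culprit]
    return culprit, remaining


if __name__ == "__main__":
    twain = Placement("TWAIN", 10, cells_for(0, 0, 5, across=True))
    now = Placement("NOW", 6, cells_for(0, 4, 3, across=False))  # assumed confidence
    weights = square_weights([twain, now])
    print(grid_weight(twain, weights), grid_weight(now, weights))  # prints: 100 72
    culprit, remaining = extricate([twain, now])
    print(culprit.word, [p.word for p in remaining])  # prints: NOW ['TWAIN']
```

The example assumes NOW has confidence 6 and runs down from TWAIN's final N; under that assumption the sketch reproduces the slides' grid weights (TWAIN = 100, NOW = 72), and NOW is chosen as the culprit.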