Why are computers so stupid and what can be done about it?
Artificial intelligence and commonsense knowledge
Ernest Davis
Science on Saturday
March 3, 2012

Two well-known truths about computers
Computers are great and amazing and a lot of fun to deal with.
Computers are stupid and frustrating, and it can be a huge amount of work to get what you want out of them.

Chess
• Computers can play chess better than the greatest chess masters.
But
• They’re no use for answering a question like “Give me an example where White can take Black’s queen, but if he does, Black can immediately checkmate.”

Physics
• Computers can compute the interaction of two galaxies colliding.
• Wolfram Alpha can answer “How far was Jupiter from Saturn on Dec. 17, 1604?”
But
• They’re no use for answering a question like “Can you ever get a solar eclipse one day and a lunar eclipse the next day?”
• You can’t answer the question “When is the next sunrise over crater Aristarchus?” any faster than an 18th-century astronomer could.

Movies
You can get a list of all Frank Capra’s movies, but no computer can answer the question “What are Ellie and Peter doing here?”

Textual Analysis
Computers can tell which of the Federalist Papers were written by Hamilton and which by Madison.
But
They can’t answer the question “Why is F.P. #54 no longer directly relevant?” (It discussed the 3/5 rule for slaves.)

A different kind of computer stupidity
(Boring, banal example, but that’s the point.)
NYU “upgraded” its software for student registration. Endless problems.
We want to have a rule that a student can register for at most 4 classes.
Answers: Can’t be done / Will cost a lot of money.
Incompetent software engineering.

Artificial Intelligence
“Then somehow it achieved self-awareness, and in a few nanoseconds had enslaved the human race.”

Many tasks that are very easy for people are extremely difficult for computers:
Vision
Natural Language
Operating in a rich environment (kitchen).
Simple reasoning (chess question)

Why is Natural Language Hard?
Many reasons. One of the hardest aspects is ambiguity.
Lexical disambiguation:
“This gift is for Stuart.”
“This gift is for Christmas.”
“This bowl is for soup.”
The O.E.D. lists 36 primary meanings of “for”, with more than 100 subcategories.

Ambiguity is ubiquitous
The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a cover-shoot.
"The beauty business", The Economist, Feb. 11, 2012.

Ambiguous words
The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a cover-shoot.
Black – unambiguous.
Blue – most frequent meaning.
Red – not most frequent meaning.

Reference disambiguation: Winograd schemas
“Jane knocked on Susan’s door, but she didn’t answer.”
“Jane knocked on Susan’s door, but she didn’t get an answer.”
“The trophy doesn’t fit in the suitcase, because it’s too small.”
“The trophy doesn’t fit in the suitcase, because it’s too large.”

Why is computer vision hard?
• Two images of the same thing may be very different depending on viewpoint, lighting, etc.
• Two things in the same category may be geometrically very different.
• Context is used to interpret objects for which there is actually very little image information.

Concert / party.
Warmish spring afternoon.
Street in suburban neighborhood.
Wooded hill behind.
Canvas awning above, mended with duct tape.
Large mug front center.
People in back.

Messy kitchen
Bottle is empty
Fridge in corner
Toaster oven, paper towel on counter.
Plant is hung from curtain rod.
Daytime

Two approaches to artificial intelligence
• Corpus-based machine learning
• Knowledge-based techniques

Corpus-based machine learning
You have:
• Large body of data (text, pictures, etc.)
• A task
Find many patterns of superficial features that are relevant to the task.
Determine how to combine them to carry out the task.
Critical: These are done without any real understanding of the task or content.

Notable successes
• Speech understanding
  – Automatic dictation
  – SIRI etc.
• Google translate
• Autonomous vehicles
• Automatic check reading

Automatic dictation
Start with: Corpus of recorded speech and transcription
Extract patterns/rules:
• Sounds => phoneme
• Sequence of phonemes => word
• Common sequences of words
All labelled with probabilities
Compute: Most probable interpretation of a sequence of sounds.

Google translate
Start with:
• French/English dictionary
• Information about grammars
• “Bi-texts” e.g. Canadian parliamentary proceedings.
Extract:
• Translations of words to words or phrases to phrases, with probabilities.
• Rules for reorganizing sentence structure.

Limitations of Corpus-Based Approach
• Task-specific. Learning to translate French does not enable the program to answer questions about a story in French.
• Corpus limitations. If your corpus is Parliamentary proceedings, you end up with a Parliamentary vocabulary.
• Data limitation. No huge corpus of bitexts.
• Errors can be weird.

Google translate: To French and back
The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a cover-shoot.

The price [sic] is more juicy to become the face of a luxury brand like Dior and Burberry. To have a chance, a model must first be magazine shoots under his belt designer. This fact can pay peanuts fashion magazines, even coverage for rickshaws.
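
As an aside on the automatic dictation recipe above ("common sequences of words", "all labelled with probabilities", "most probable interpretation"), the last step can be sketched with a toy bigram model. This is only an illustration, not how any real dictation system is built: the five-sentence corpus, the add-one smoothing, and the function names `train_bigrams`, `log_prob`, and `best_transcription` are all assumptions made up for this sketch. The acoustic side (sounds to phonemes to words) is skipped entirely; each slot is simply given as a list of sound-alike candidate words.

```python
from collections import Counter
from itertools import product
import math

# Toy stand-in for the transcription side of a speech corpus (invented for illustration).
CORPUS = [
    "i want to go to the store",
    "i want two apples",
    "we want to read the book",
    "two apples are too many",
    "it is too late to go",
]

def train_bigrams(sentences):
    """Count how often each word follows each context word ("<s>" marks sentence start)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words[1:])
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # (previous word, next word) pairs
    return contexts, bigrams, vocab

def log_prob(words, contexts, bigrams, vocab):
    """Log-probability of a word sequence under the bigram model, with add-one smoothing."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))
        prev = word
    return total

def best_transcription(candidates_per_slot, contexts, bigrams, vocab):
    """Try every combination of candidate words and return the most probable one."""
    return max(
        product(*candidates_per_slot),
        key=lambda words: log_prob(words, contexts, bigrams, vocab),
    )

if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    # The third slot is a set of homophones that the sounds alone cannot separate.
    slots = [["i"], ["want"], ["to", "two", "too"], ["go"]]
    print(" ".join(best_transcription(slots, *model)))
```

The model picks among "to", "two", and "too" purely from counts of neighbouring words, which is the sense in which the slides say this is done without any real understanding of the content. Enumerating every combination is only workable for a toy example; it is used here to keep the sketch short.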
Google translate: To Japanese and back
The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a cover-shoot.

The juicy prize is to be a face of brands such as Dior and luxury, such as Burberry. In order to have any chance, the model must have the shooting of her first magazine under designer belt. This fact, and further, fashion magazines, you can pay peanuts cover shoot.

Google translate: To Azerbaijani and back
The juiciest prize is to become the face of a luxury brand such as Dior or Burberry. To have any chance, a model must first have magazine shoots under her designer belt. This fact allows fashion magazines to pay peanuts, even for a cover-shoot.

The juiciest a premium luxury brands such as Dior, or to face Burberry. For any chance, a model should be the first in the bottom of the magazine tumurcuqlar designer belt. This fact is even, the fashion magazines to pay peanuts cover-shoot.

Knowledge-based approach
• Determine the knowledge needed for reasoning in a domain.
• Develop a notation that is clearly defined and that can express that knowledge.
• Encode all the domain knowledge.
• Find ways to automate reasoning with this knowledge.
• Integrate the knowledge with the task.
Knowledge involves deep features of domain and task.
Manually constructed.

Commonsense Knowledge
The knowledge about the world that everyone has by age 7.
Learned by living in the world, not book-learning.
Time, Space, Physical objects, People, Animals and Plants …
“If an open bottle full of liquid is turned upside down, the contents will pour out.”
“Hitting someone will not make them like you.”
“An animal is the same species as its parents.”
So obvious that it’s not worth talking about.

“The trophy doesn’t fit in the suitcase because it’s too large”.
Interpretation of “trophy is too large”:
The trophy does not fit in the suitcase, and any larger trophy will also not fit, but some smaller trophy would fit.
Interpretation of “suitcase is too large”:
The trophy does not fit in the suitcase and would not fit in any larger suitcase, but would fit in some smaller suitcase.
Fact: If an object fits in a container, it fits in any larger container.
So we can rule out the second reading.
In logical notation (reading Larger(c1,c2) as “c2 is larger than c1”):
∀o,c1,c2 FitsIn(o,c1) ∧ Larger(c1,c2) ⇒ FitsIn(o,c2)
Commonsense spatial reasoning

“Jane knocked on Susan’s door, but she didn’t [get an] answer.”
Much more difficult:
• Social interactions are more complex than geometry.
• Narrative coherence, rather than plausibility. Neither woman answered or got an answer.

How far have we gotten?
A lot is known about representing: Ontology, general reasoning methods, time.
A fair amount is known about: Space, knowledge and belief, interactions between people, plans and goals.
A little is known about: physical reasoning.
Not much is known about other categories.

Successes
• Planning. Mars rover.
• Debugging. Find quite subtle bugs in very complex programs (operating systems, aircraft control, etc.) and hardware design.
• Theorem proving: A couple of original mathematical theorems have been proven.

Obstacles
• Commonsense reasoning is a small, complex part of any AI task.
• Little payoff until there is a lot of commonsense knowledge.
• Software development starts with simple useful systems, and adds features. It is unwelcoming to systems that need to be very complex from the start.
• Shortcuts lead to chaos.

Combined approaches
• Information extraction from text (partial success).
A suicide car bomber struck at the gates of Baghdad’s police academy Sunday afternoon, as recruits were leaving the compound, punctuating weeks of relative calm here after a particularly violent January.
Extract:
Event: TerroristAttack.
Place: Baghdad.
Date: 2/19/12 PM.
Method: CarBomb

Combined approaches
• In a street photograph, a human must be at street level, not floating next to windows. (Alyosha Efros, CMU)

The Zipf Distribution: Bane of AI
• AKA: Inverse power distribution, long tail, fat tail.
The kth [largest/most common] item has [size/frequency] proportional to 1/k^α, where 1 ≤ α ≤ 2.5 or so.
Zipf’s law: Lots of things follow the Zipf distribution: income, city population, number of inlinks, number of occurrences of a word …

Consequences of the Zipf distribution
A few rich people have most of the money.
A significant fraction of words in a corpus appear very rarely (long tail).
In the BNC (British National Corpus, 10^8 words), the 20 most common words account for 28% of the tokens.
0.5% of the tokens are words that occur only once. 2.3% are words that occur no more than 20 times.
Reducing the miss rate by a fixed percentage requires reading an exponentially increasing corpus. E.g. to reduce the miss rate by 5%, you have to double the size of the corpus.
Getting mediocre “promising” results is easy.
Getting good results is a lot of work.
Getting really excellent results is a huge amount of work.

Why this is bad for AI
• Machine learning: Hard to get all the patterns, e.g. all sequences of three words that may occur.
• Knowledge-based systems: Hard to get all the facts you may need.

How to proceed
• Look in depth at a variety of different domains.
• Get good solutions to basic issues.
• Natural language texts must be used with caution.
• Patience. This is a large, difficult project, which may take centuries.
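
The consequences of the Zipf distribution listed above can be checked on an idealized model. The sketch below is not a reproduction of the BNC figures. It assumes a vocabulary of 100,000 word types, an exponent α = 1, and tokens drawn independently, none of which come from the slides. The function names `zipf_probabilities`, `top_k_share`, and `expected_miss_rate` are made up for this illustration. "Miss rate" is taken here to mean the chance that the next token is a word type never seen in the corpus so far.

```python
def zipf_probabilities(vocab_size, alpha=1.0):
    """Probability of the k-th most common word type, proportional to 1/k^alpha."""
    weights = [1.0 / (k ** alpha) for k in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def top_k_share(probs, k):
    """Fraction of all tokens accounted for by the k most common word types."""
    return sum(probs[:k])

def expected_miss_rate(probs, corpus_size):
    """Expected probability that the next token is a type absent from a corpus of
    `corpus_size` tokens, assuming tokens are drawn independently (an idealization)."""
    return sum(p * (1.0 - p) ** corpus_size for p in probs)

if __name__ == "__main__":
    # Assumed parameters, chosen only for illustration.
    probs = zipf_probabilities(100_000, alpha=1.0)
    print(f"share of the 20 most common types: {top_k_share(probs, 20):.3f}")
    for corpus_size in (10_000, 20_000, 40_000, 80_000):
        rate = expected_miss_rate(probs, corpus_size)
        print(f"corpus of {corpus_size:>6} tokens: expected miss rate {rate:.3f}")
```

Under these assumptions the 20 most common types cover roughly 30% of the tokens, and each doubling of the corpus lowers the expected miss rate by only a few percentage points. That is the pattern the slides describe: a fixed improvement costs a doubling of the data.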