Formal Semantics
Slides by Julia Hockenmaier, Laura McGarrity, Bill McCartney, Chris Manning, and Dan Klein

Question Answering: IBM's Watson
Jeopardy challenge: https://www.youtube.com/watch?v=seNkjYyG3gI

Question Answering: IBM's Watson
What components does Watson need?
– Named-entity recognition
– Named-entity disambiguation
– Phrase chunking
– Relation extraction
– Word sense disambiguation

Formal Semantics
It comes in two flavors:
• Lexical semantics: the meaning of words
• Compositional semantics: how the meanings of individual units combine to form the meaning of larger units

What is meaning?
• Meaning ≠ dictionary entries
  Dictionaries define words using words. Circularity!

Reference
• Referent: the thing/idea in the world that a word refers to
• Reference: the relationship between a word and its referent

Reference
Barack Obama = the president, so:
  The president is the commander-in-chief. = Barack Obama is the commander-in-chief.

Reference
Barack Obama = the president, and yet:
  I want to be the president. ≠ I want to be Barack Obama.

Reference
What do these refer to?
• Tooth fairy?
• Phoenix?
• Winner of the 2016 presidential election?

What is meaning?
• Meaning ≠ dictionary entries
• Meaning ≠ reference

Sense
• Sense: the mental representation of a word or phrase, independent of its referent.

Sense ≠ Mental Image
• A word may have different mental images for different people.
  – E.g., "mother"
• A word may conjure a typical mental image (a prototype), but can signify atypical examples as well.

Sense v. Reference
• A word/phrase may have sense, but no reference:
  – King of the world
  – The camel in CIS 3203
  – The greatest integer
  – The
• A word may have reference, but no sense:
  – Proper names: Dan McCloy, Kristi Krein (who are they?!)

Sense v. Reference
• A word may have the same referent, but more than one sense:
  – The morning star / the evening star (Venus)
• A word may have one sense, but multiple referents:
  – Dog, bird

Some semantic relations between words
• Hyponymy: subclass
  – Poodle < dog
  – Crimson < red
  – Red < color
  – Dance < move
• Hypernymy: superclass
• Synonymy:
  – Couch / sofa
  – Manatee / sea cow
• Antonymy:
  – Dead / alive
  – Married / single

Lexical Decomposition
• Word senses can be represented with semantic features, e.g., "woman" = [+human, +adult, +female]. (A small executable sketch appears after the idiom slides below.)

Compositional Semantics

Compositional Semantics
• The study of how meanings of small units combine to form the meaning of larger units
  The dog chased the cat ≠ The cat chased the dog.
  I.e., the whole does not equal the sum of the parts.
  The dog chased the cat = The cat was chased by the dog.
  I.e., syntax matters in determining meaning.

Principle of Compositionality
The meaning of a sentence is determined by the meanings of its words in conjunction with the way they are syntactically combined.

Exceptions to Compositionality
• Anomaly: when phrases are well-formed syntactically, but not semantically
  – Colorless green ideas sleep furiously. (Chomsky)
  – That bachelor is pregnant.

Exceptions to Compositionality
• Metaphor: the use of an expression to refer to something that it does not literally denote, in order to suggest a similarity
  – Time is money.
  – The walls have ears.

Exceptions to Compositionality
• Idioms: phrases with fixed meanings not composed of the literal meanings of the words
  – Kick the bucket = die (*The bucket was kicked by John.)
  – When pigs fly = 'it will never happen' (*She suspected pigs might fly tomorrow.)
  – Bite off more than you can chew = 'to take on too much' (*He chewed just as much as he bit off.)

Idioms in other languages
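Lexical decomposition, revisited: a minimal sketch in Python. The binary-feature inventory (human, adult, female) and the word list are a common textbook illustration chosen for this sketch, not the feature table from the original slides:

```python
# Toy lexical decomposition: word senses as bundles of binary semantic features.
# The feature inventory here is an illustrative assumption.
FEATURES = {
    "man":   {"human": True,  "adult": True,  "female": False},
    "woman": {"human": True,  "adult": True,  "female": True},
    "boy":   {"human": True,  "adult": False, "female": False},
    "girl":  {"human": True,  "adult": False, "female": True},
}

def differs_only_on(w1, w2, feature):
    """True if the two senses differ on exactly the given feature."""
    diffs = [f for f in FEATURES[w1] if FEATURES[w1][f] != FEATURES[w2][f]]
    return diffs == [feature]

print(differs_only_on("man", "woman", "female"))  # True: a minimal antonym pair
print(differs_only_on("man", "girl", "female"))   # False: they differ on 'adult' too
```

One payoff of this representation is that relations like antonymy fall out as simple feature comparisons rather than stipulated word pairs.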
Logical Foundations for Compositional Semantics
• We need a language for expressing the meaning of words, phrases, and sentences
• Many possible choices; we will focus on
  – First-order predicate logic (FOPL) with types
  – Lambda calculus

Truth-conditional Semantics
• Linguistic expressions
  – "Bob sings."
• Logical translations
  – sings(bob)
  – but could be p_5789023(a_257890)
• Denotation:
  – [[bob]] = some specific person (in some context)
  – [[sings(bob)]] = true, in situations where Bob is singing; false, otherwise
• Types on translations:
  – bob : e(ntity)
  – sings(bob) : t(rue or false, a boolean type)

Truth-conditional Semantics
Some more complicated logical descriptions of language:
– "All girls like a video game."
  ∀x . ∃y . girl(x) → [video-game(y) ∧ likes(x,y)]
– "Alice is a former teacher."
  (former(teacher))(Alice)
– "Alice saw the cat before Bob did."
  ∃x, y, z, t1, t2 . cat(x) ∧ see(y) ∧ see(z) ∧ agent(y, Alice) ∧ patient(y, x) ∧ agent(z, Bob) ∧ patient(z, x) ∧ time(y, t1) ∧ time(z, t2) ∧ <(t1, t2)

FOPL Syntax Summary
• A set of constants C = {c1, …}
• A set of relations R = {r1, …}, where each ri is a subset of Cⁿ for some n
• A set of variables X = {x1, …}
• Connectives and quantifiers: ¬, ∧, ∨, →, ↔, ∀, ∃.

Truth-conditional Semantics
• Proper names:
  – Refer directly to some entity in the world
  – Bob : bob
• Sentences:
  – Are either t or f, so they are FOL sentences
  – Bob sings : sings(bob)
• So what about verbs and VPs?
  – sings must combine with bob to produce sings(bob)
  – The λ-calculus is a notation for functions whose arguments are not yet filled.
  – sings : λx.sings(x)
  – This is a predicate, a function that returns a truth value. In this case, it takes a single entity as an argument, so we can write its type as e → t.

Lambda calculus
• FOL + λ (a new variable-binding operator) will be our lambda calculus
• Intuitively, λ is just a way of creating a function
  – E.g., girl(·) is a relation symbol; but λx . girl(x) is a function that takes one argument.
• New inference rule: function application
  (λx . L1(x)) (L2) → L1(L2)
  E.g., (λx . x²) (3) → 3²
  E.g., (λx . sings(x)) (bob) → sings(bob)
• Lambda calculus lets us describe the meaning of words individually.
  – Function application (and a few other rules) then lets us combine those meanings to come up with the meaning of larger phrases or sentences.

Quiz: Lambda calculus
For each lambda calculus expression below, find a simplified form:
• (λx . x) (-19)
• (λx . canFly(x)) (PollyParrot)
• (λf . f(PollyParrot)) (λx . canFly(x))

Answer: Lambda calculus
• (λx . x) (-19) → -19
• (λx . canFly(x)) (PollyParrot) → canFly(PollyParrot)
• (λf . f(PollyParrot)) (λx . canFly(x)) → (λx . canFly(x)) (PollyParrot) → canFly(PollyParrot)

Quiz: Lambda calculus 2
For each lambda calculus expression below, find a factored form, where each factor contains some portion of the original:
• canFly(PollyParrot)
• likes(SuzySueMae, JimmyJoeBob)
• 2

Answer: Lambda calculus 2
• canFly(PollyParrot):
  λx . canFly(x), PollyParrot
  OR λf . f(PollyParrot), λx . canFly(x)
• likes(SuzySueMae, JimmyJoeBob):
  λx . likes(x, JimmyJoeBob), SuzySueMae
  OR λx . likes(SuzySueMae, x), JimmyJoeBob
  OR λx . λy . likes(x, y), SuzySueMae, JimmyJoeBob
  OR EVEN λf . λx . λy . f(x, y), SuzySueMae, JimmyJoeBob, λa.λb.likes(a, b)
• 2: Can't do it. The only real option is λx . x, 2, but then the first factor has nothing of the original.
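These reductions can be checked directly in any language with first-class functions. A minimal sketch in Python; the toy "world" (the set FLIERS) and the entity names are illustrative assumptions standing in for the quiz's canFly and PollyParrot:

```python
# A hypothetical toy model: the set of things that can fly in our world.
FLIERS = {"PollyParrot", "Tweety"}

# Word meanings as lambda terms.
can_fly = lambda x: x in FLIERS              # λx . canFly(x), type e → t
identity = lambda x: x                       # λx . x
apply_to_polly = lambda f: f("PollyParrot")  # λf . f(PollyParrot)

# Function application is exactly beta-reduction.
print(identity(-19))            # -19
print(can_fly("PollyParrot"))   # True, i.e. canFly(PollyParrot) holds here
print(apply_to_polly(can_fly))  # True again: (λf.f(PollyParrot))(λx.canFly(x))
```

Note that the third call reduces in two steps, just as in the quiz answer: the higher-order function consumes the predicate, then the predicate consumes the entity.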
Compositional Semantics with the λ-calculus
Associate a combination rule with each grammar rule:
– S : β(α) → NP : α  VP : β   (function application)
– VP : λx. α(x) ∧ β(x) → VP : α  and : ∅  VP : β   (intersection)
• Example:

Composition: Some more examples
• Transitive verbs:
  – likes : λx.λy.likes(y,x)
  – VP "likes Amy" : λy.likes(y, Amy) is just a one-place predicate
• Quantifiers:
  – What does "everyone" mean?
  – Everyone : λf.∀x.f(x)
  – Some problems:
    • Have to change our NP/VP rule
    • Won't work for "Amy likes everyone"
  – What about "Everyone likes someone"?
  – Gets tricky quickly!

Composition: Some more examples
• Indefinites
  – The wrong way:
    • "Bob ate a waffle" : ate(bob, waffle)
    • "Amy ate a waffle" : ate(amy, waffle)
  – Better translation:
    • ∃x.waffle(x) ∧ ate(bob, x)

Composition Example
∃x.waffle(x) ∧ ate(bob, x)
Use factoring to determine the meaning of each node in the tree.

Quiz: Composition
Partial tree:
  S : ∃x.waffle(x) ∧ ate(bob, x)
    NP "Bob" : bob
    VP "ate a waffle" : λy. ∃x.waffle(x) ∧ ate(y, x)
By repeatedly applying factoring, what is the lambda calculus form for
• ate?
• waffle?
• a?

Answer: Composition
  S : ∃x.waffle(x) ∧ ate(bob, x)
    NP "Bob" : bob
    VP "ate a waffle" : λy. ∃x.waffle(x) ∧ ate(y, x)
      V "ate" : λa. λb. ate(a, b)
      NP "a waffle" : λf. λy. ∃x.waffle(x) ∧ f(y, x)
        Det "a" : λg. λf. λy. ∃x.g(x) ∧ f(y, x)
        N "waffle" : λc. waffle(c)
By repeatedly applying factoring, the lambda calculus forms are:
• ate : λa. λb. ate(a, b)
• waffle : λc. waffle(c)
• a : λg. λf. λy. ∃x.g(x) ∧ f(y, x)

Denotation
• What do we do with the logical form?
  – It has fewer (no?) ambiguities
  – Can check the truth-value against a database
  – More usefully: can add new facts, expressed in language, to an existing relational database
  – Question answering: can check whether a statement in a corpus entails a question-answer pair:
    "Bob sings and dances" → Q: "Who sings?" has answer A: "Bob"
  – Can chain together facts for story comprehension

Grounding
• What does the translation likes : λx. λy. likes(y,x) have to do with actual liking?
• Nothing! (unless the denotation model says it does)
• Grounding: relating linguistic symbols to perceptual referents
  – Sometimes a connection to a database entry is enough
  – Other times, you might insist on connecting "blue" to the appropriate portion of the visual EM spectrum
  – Or connect "likes" to an emotional sensation
• Alternative to grounding: meaning postulates
  – You could insist, e.g., that likes(y,x) ⇒ knows(y,x)

More representation issues
• Tense and events
  – In general, you don't get far with verbs as predicates
  – Better to have event variables e:
    "Alice danced" : danced(Alice)
    vs.
    "Alice danced" : ∃e.dance(e) ∧ agent(e, Alice) ∧ (time(e) < now)
  – Event variables let you talk about non-trivial tense/aspect structures:
    "Alice had been dancing when Bob sneezed"

More representation issues
• Propositional attitudes (modal logic)
  – "Bob thinks that I am a gummi bear"
    • thinks(bob, gummi(me))?
    • thinks(bob, "He is a gummi bear")?
  – Usually, the solution involves intensions (^p), which are, roughly, the set of possible worlds in which predicate p is true.
    • thinks(bob, ^gummi(me))
  – Computationally challenging
    • Each agent has to model every other agent's mental state
  – This comes up all the time in language
    • E.g., if you want to talk about what your bill claims that you bought, vs. what you think you bought, vs. what you actually bought.
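The factored entries from the composition quiz can be executed directly. A minimal sketch in Python over a hypothetical toy model; the domain, the ATE relation, and the entity names are illustrative assumptions, while the lambda terms are transcribed from the answer above:

```python
# Toy model: a small domain of entities and an 'ate' relation over it.
DOMAIN = {"bob", "amy", "waffle1", "pancake1"}
ATE = {("bob", "waffle1"), ("amy", "pancake1")}

# Lexical entries, mirroring the factored forms (curried, so f(y)(x) ~ f(y, x)).
ate    = lambda a: lambda b: (a, b) in ATE       # λa.λb.ate(a,b)
waffle = lambda c: c == "waffle1"                # λc.waffle(c)
a      = lambda g: lambda f: lambda y: any(      # λg.λf.λy.∃x. g(x) ∧ f(y,x)
    g(x) and f(y)(x) for x in DOMAIN)

# Composition follows the syntax tree node by node:
a_waffle     = a(waffle)       # NP "a waffle" : λf.λy.∃x.waffle(x) ∧ f(y,x)
ate_a_waffle = a_waffle(ate)   # VP "ate a waffle" : λy.∃x.waffle(x) ∧ ate(y,x)
print(ate_a_waffle("bob"))     # True:  ∃x.waffle(x) ∧ ate(bob, x)
print(ate_a_waffle("amy"))     # False: in this model Amy ate a pancake
```

The existential quantifier becomes a search over the domain (`any(... for x in DOMAIN)`), which is exactly what "checking the truth-value against a database" amounts to in the Denotation slide.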
More representation issues
• Multiple quantifiers:
  "In this country, a woman gives birth every 15 minutes. Our job is to find her, and stop her." -- Groucho Marx
• Deciding between readings
  – "Bob bought a pumpkin every Halloween."
  – "Bob put a warning in every window."

More representation issues
• Other tricky stuff
  – Adverbs
  – Non-intersective adjectives
  – Generalized quantifiers
  – Generics
    • "Cats like naps."
    • "The players scored a goal."
  – Pronouns and anaphora
    • "If you have a dime, put it in the meter."
  – … etc., etc.

Mapping Sentences to Logical Forms

CCG Parsing
• Combinatory Categorial Grammar
  – Lexicalized PCFG
  – Categories encode argument sequences
    • A/B means a category that can combine with a B to the right to form an A
    • A\B means a category that can combine with a B to the left to form an A
  – A syntactic parallel to the lambda calculus

Learning to map sentences to logical form
• Zettlemoyer and Collins (IJCAI 05, EMNLP 07)

Some Training Examples

CCG Lexicon

Parsing Rules (Combinators)
Application:
  Right: X/Y : f   Y : a   ⇒   X : f(a)
  Left:  Y : a   X\Y : f   ⇒   X : f(a)
Additional rules:
• Composition
• Type-raising

CCG Parsing Example

Lexical Generation
Input training example:
  Sentence: Texas borders Kansas.
  Logical form: borders(texas, kansas)

GENLEX
• Input: a training example (Si, Li)
• Computation:
  – Create all substrings of consecutive words in Si
  – Create categories from Li
  – Create lexical entries that are the cross product of these two sets
• Output: lexicon Λ

GENLEX Cross Product
Input training example:
  Sentence: Texas borders Kansas.
  Logical form: borders(texas, kansas)
Output substrings:
  Texas; borders; Kansas; Texas borders; borders Kansas; Texas borders Kansas
× (cross product)
Output categories:
  NP : texas
  NP : kansas
  (S\NP)/NP : λx.λy.borders(y,x)

GENLEX Output Lexicon
  Words                  Category
  Texas                  NP : texas
  Texas                  NP : kansas
  Texas                  (S\NP)/NP : λx.λy.borders(y,x)
  borders                NP : texas
  borders                NP : kansas
  borders                (S\NP)/NP : λx.λy.borders(y,x)
  …                      …
  Texas borders Kansas   NP : texas
  Texas borders Kansas   NP : kansas
  Texas borders Kansas   (S\NP)/NP : λx.λy.borders(y,x)

Example Learned Lexical Entries

Geo880 Test Set
  System                        Precision   Recall   F1
  Zettlemoyer & Collins 2007    95.49       83.20    88.93
  Zettlemoyer & Collins 2005    96.25       79.29    86.95
  Wong & Mooney 2007            93.72       80.00    86.31

Challenge revisited
Suppose this is your training data. How well will your semantic parser process a question like this one?
  Who scored the most points in the 2005-2006 NHL season?

Building a Large-scale QA System
How can we generalize a QA system to all of the information in Wikipedia? In Freebase? (www.freebase.com) In IMDB, ESPN, Yahoo! Finance, Twitter, and …?

Learning meanings of words without labeled examples
directed_by table:
  Director      Film
  Ang Lee       Life of Pi
  Luca Boni     Zombie Massacre
  Kay Hawtrey   Face-Off
  …             …
Known semantics: λx. λy. directed_by(y,x)
Want lexical entries:
  directed by : λx. λy. directed_by(y,x)
  directing : λx. λy. directed_by(y,x)
Web search results:
  "Life of Pi," directed by Ang Lee and based on the novel by Yann Martel, features a young man, a tiger and lots of talk about God.
  A review of "Life of Pi," directed by Ang Lee.
  Ang Lee poses with his award for best directing for "Life of Pi" during the Oscars at the Dolby Theatre on Feb.
  Taiwanese-born Ang Lee won his second Oscar for Best Directing on Sunday for Life of Pi.
Extract critical words from the snippets and create new lexical entries: directed by, directing, director of, …
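The GENLEX cross product is mechanical enough to sketch in a few lines of Python. This is a simplified illustration, not the Zettlemoyer & Collins implementation: categories are represented as plain strings rather than structured CCG category objects, and the subsequent pruning/learning step is omitted:

```python
# GENLEX sketch: pair every substring of consecutive words in the sentence
# with every category derived from the logical form.
from itertools import combinations

def substrings(words):
    """All substrings of consecutive words in the sentence."""
    return [" ".join(words[i:j])
            for i, j in combinations(range(len(words) + 1), 2)]

def genlex(sentence, categories):
    """Cross product of substrings and categories: the candidate lexicon."""
    return {(span, cat)
            for span in substrings(sentence.split())
            for cat in categories}

# Categories one might derive from the logical form borders(texas, kansas):
cats = ["NP : texas", "NP : kansas", r"(S\NP)/NP : λx.λy.borders(y,x)"]
lexicon = genlex("Texas borders Kansas", cats)
print(len(lexicon))  # 18: 6 substrings x 3 categories
print(("borders", r"(S\NP)/NP : λx.λy.borders(y,x)") in lexicon)  # True
```

Most of the 18 candidate entries are wrong (e.g., Texas : NP : kansas); the point of the learning procedure is to keep only the entries that support correct parses of the training pairs.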
Summing Up
• Hypothesis: Principle of Compositionality
  – The semantics of NL sentences and phrases can be composed from the semantics of their subparts
• Rules can be derived which map syntactic analyses to semantic representations (Rule-to-Rule Hypothesis)
  – Lambda notation provides a way to extend FOPL to this end
  – But coming up with rule-to-rule mappings is hard
• Idioms, metaphors, and other non-compositional aspects of language make things tricky (e.g., fake gun)