Formal Semantics
Slides by Julia Hockenmaier, Laura McGarrity, Bill McCartney, Chris Manning, and Dan Klein

Formal Semantics
• It comes in two flavors:
  – Lexical semantics: the meaning of words
  – Compositional semantics: how the meanings of individual units combine to form the meaning of larger units

What is meaning?
• Meaning ≠ dictionary entries
  – Dictionaries define words using words. Circularity!

Reference
• Referent: the thing/idea in the world that a word refers to
• Reference: the relationship between a word and its referent

Reference
• "The president" and "Barack Obama" can share a referent:
  – The president is the commander-in-chief. = Barack Obama is the commander-in-chief.

Reference
• But co-referring expressions cannot always be swapped:
  – I want to be the president. ≠ I want to be Barack Obama.

Reference
• And what do these refer to?
  – Tooth fairy?
  – Phoenix?
  – Winner of the 2016 presidential election?

What is meaning?
• Meaning ≠ dictionary entries
• Meaning ≠ reference

Sense
• Sense: the mental representation of a word or phrase, independent of its referent.

Sense ≠ Mental Image
• A word may have different mental images for different people.
  – E.g., "mother"
• A word may conjure a typical mental image (a prototype), but can signify atypical examples as well.

Sense vs. Reference
• A word/phrase may have sense, but no reference:
  – King of the world
  – The camel in CIS 8538
  – The greatest integer
  – The
• A word may have reference, but no sense:
  – Proper names: Dan McCloy, Kristi Krein (who are they?!)

Sense vs. Reference
• A word may have the same referent, but more than one sense:
  – The morning star / the evening star (Venus)
• A word may have one sense, but multiple referents:
  – Dog, bird

Some semantic relations between words
• Hyponymy: subclass
  – Poodle < dog
  – Crimson < red
  – Red < color
  – Dance < move
• Hypernymy: superclass
• Synonymy:
  – Couch / sofa
  – Manatee / sea cow
• Antonymy:
  – Dead / alive
  – Married / single

Lexical Decomposition
• Word sense can be represented with semantic features, e.g. binary features such as [±human], [±adult], [±married]. (The feature table from the original slide is not reproduced here; a toy code sketch appears at the end of this part.)

Compositional Semantics
• The study of how meanings of small units combine to form the meaning of larger units
  – The dog chased the cat ≠ The cat chased the dog.
    i.e., the whole does not equal the sum of its parts.
  – The dog chased the cat = The cat was chased by the dog.
    i.e., syntax matters in determining meaning.

Principle of Compositionality
• The meaning of a sentence is determined by the meanings of its words in conjunction with the way they are syntactically combined.

Exceptions to Compositionality
• Anomaly: phrases that are well-formed syntactically, but not semantically
  – Colorless green ideas sleep furiously. (Chomsky)
  – That bachelor is pregnant.

Exceptions to Compositionality
• Metaphor: the use of an expression to refer to something that it does not literally denote, in order to suggest a similarity
  – Time is money.
  – The walls have ears.

Exceptions to Compositionality
• Idioms: phrases with fixed meanings that are not composed of the literal meanings of the words
  – Kick the bucket = 'die' (*The bucket was kicked by John.)
  – When pigs fly = 'it will never happen' (*She suspected pigs might fly tomorrow.)
  – Bite off more than you can chew = 'take on too much' (*He chewed just as much as he bit off.)
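Here is the toy sketch promised under Lexical Decomposition: word senses as sets of binary features, with hyponymy approximated as feature subsumption. The feature inventory, the lexical entries, and the helper names are illustrative assumptions, not the table from the original slide.

    # Toy lexical decomposition: word senses as sets of binary semantic features.
    # Feature names and entries are illustrative, not the slides' actual table.
    SENSES = {
        "man":    {"human": True, "adult": True,  "male": True},
        "woman":  {"human": True, "adult": True,  "male": False},
        "boy":    {"human": True, "adult": False, "male": True},
        "dog":    {"animate": True, "canine": True},
        "poodle": {"animate": True, "canine": True, "curly_coat": True},
    }

    def minimal_opposition(w1, w2):
        """Same feature set, differing in exactly one value (e.g., man/woman)."""
        f1, f2 = SENSES[w1], SENSES[w2]
        diffs = [f for f in f1 if f in f2 and f1[f] != f2[f]]
        return set(f1) == set(f2) and len(diffs) == 1

    def is_hyponym(w1, w2):
        """Rough hyponymy test: w1's features include all of w2's (poodle < dog)."""
        f1, f2 = SENSES[w1], SENSES[w2]
        return all(f in f1 and f1[f] == v for f, v in f2.items())

    print(minimal_opposition("man", "woman"))  # True: they differ only in [male]
    print(is_hyponym("poodle", "dog"))         # True: poodle subsumes dog's features
    print(is_hyponym("dog", "poodle"))         # False: dog lacks [curly_coat]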
Idioms in other languages
• (Examples from the original slide are not reproduced here.)

Logical Foundations for Compositional Semantics
• We need a language for expressing the meaning of words, phrases, and sentences
• There are many possible choices; we will focus on:
  – First-order predicate logic (FOPL) with types
  – Lambda calculus

Truth-conditional Semantics
• Linguistic expressions:
  – "Bob sings."
• Logical translations:
  – sings(bob)
  – but it could just as well be p_5789023(a_257890)
• Denotation:
  – [[bob]] = some specific person (in some context)
  – [[sings(bob)]] = true in situations where Bob is singing; false otherwise
• Types on translations:
  – bob : e (entity)
  – sings(bob) : t (true or false; a boolean type)

Truth-conditional Semantics
• Some more complicated logical descriptions of language:
  – "All girls like a video game."
    ∀x:e. ∃y:e. girl(x) → [video-game(y) ∧ likes(x,y)]
  – "Alice is a former teacher."
    (former(teacher))(alice)
  – "Alice saw the cat before Bob did."
    ∃x:e, y:e, z:e, t1:e, t2:e. cat(x) ∧ see(y) ∧ see(z) ∧ agent(y, alice) ∧ patient(y, x) ∧ agent(z, bob) ∧ patient(z, x) ∧ time(y, t1) ∧ time(z, t2) ∧ t1 < t2

FOPL Syntax Summary
• A set of types T = {t1, …}
• A set of constants C = {c1, …}, each associated with a type from T
• A set of relations R = {r1, …}, where each ri is a subset of C^n for some n
• A set of variables X = {x1, …}
• Logical symbols: ∧, ∨, ¬, →, ∀, ∃, '.', ':'

Truth-conditional Semantics
• Proper names:
  – Refer directly to some entity in the world
  – Bob : bob
• Sentences:
  – Are either t(rue) or f(alse)
  – Bob sings : sings(bob)
• So what about verbs and VPs?
  – sings must combine with bob to produce sings(bob)
  – The λ-calculus is a notation for functions whose arguments are not yet filled.
  – sings : λx.sings(x)
  – This is a predicate: a function that returns a truth value. In this case it takes a single entity as an argument, so we can write its type as e → t
• Adjectives?

Lambda calculus
• FOPL + λ (a new variable binder) will be our lambda calculus
• Intuitively, λ is just a way of creating a function
  – E.g., girl(·) is a relation symbol, but λx.girl(x) is a function that takes one argument.
• New inference rule: function application
  – (λx . L1(x)) (L2) → L1(L2)
  – E.g., (λx . x²)(3) → 3² = 9
  – E.g., (λx . sings(x))(bob) → sings(bob)
• The lambda calculus lets us describe the meaning of words individually.
  – Function application (and a few other rules) then lets us combine those meanings to come up with the meaning of larger phrases or sentences.

Compositional Semantics with the λ-calculus
• So now we have meanings for the words
• How do we know how to combine the words?
• Associate a combination rule with each grammar rule:
  – S → NP VP gives S : β(α), where NP : α and VP : β (function application)
  – VP → VP and VP gives VP : λx. α(x) ∧ β(x), where the daughter VPs are α and β, and "and" contributes nothing (∅) (intersection)
• Example: (the worked derivation from the original slide is not reproduced here; see the first code sketch below)

Composition: Some more examples
• Transitive verbs:
  – likes : λx.λy.likes(y,x)
  – A two-place predicate, of type e → (e → t)
  – The VP "likes Amy" : λy.likes(y, amy) is then just a one-place predicate
• Quantifiers:
  – What does "everyone" mean?
  – Everyone : λf.∀x.f(x)
  – Some problems:
    • We have to change our NP/VP rule
    • It won't work for "Amy likes everyone"
  – What about "Everyone likes someone"?
  – It gets tricky quickly!

Composition: Some more examples
• Indefinites
  – The wrong way:
    • "Bob ate a waffle" : ate(bob, waffle)
    • "Amy ate a waffle" : ate(amy, waffle)
  – A better translation:
    • ∃x. waffle(x) ∧ ate(bob, x)
    • What does the translation of "a" have to be?
    • What about "the"?
    • What about "every"? (See the second code sketch below.)
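The combination rules above can be prototyped directly with Python's own lambdas. The following is a minimal sketch (the toy model WORLD and the names s_rule and vp_and_rule are my assumptions, not the slides'): word meanings are functions, S → NP VP is function application, and VP coordination is intersection.

    # Word meanings as Python functions; truth is computed against the toy
    # "world" below. All names here are illustrative, not from the slides.
    WORLD = {"sings": {"bob"}, "dances": {"bob", "amy"}}

    bob = "bob"                            # NP : an entity
    sings = lambda x: x in WORLD["sings"]  # VP : a predicate of type e -> t
    dances = lambda x: x in WORLD["dances"]

    # S -> NP VP : apply the VP meaning (beta) to the NP meaning (alpha)
    def s_rule(alpha, beta):
        return beta(alpha)

    # VP -> VP and VP : intersection, i.e. lambda x. alpha(x) AND beta(x)
    def vp_and_rule(alpha, beta):
        return lambda x: alpha(x) and beta(x)

    print(s_rule(bob, sings))                       # True:  "Bob sings"
    print(s_rule(bob, vp_and_rule(sings, dances)))  # True:  "Bob sings and dances"
    print(s_rule("amy", sings))                     # False: "Amy sings"

A transitive verb such as likes : λx.λy.likes(y,x) is just one more level of currying on this pattern, e.g. likes = lambda x: lambda y: (y, x) in WORLD["likes"].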
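Continuing the sketch, one standard answer to "what does 'a' mean?" is the generalized-quantifier translation a : λP.λQ.∃x.P(x) ∧ Q(x), which can be checked by brute force against a small finite model. The model below (DOMAIN, WAFFLES, ATE) is an illustrative assumption.

    # "a" as a generalized quantifier: lambda P. lambda Q. exists x. P(x) & Q(x),
    # evaluated by brute force over a finite domain (a toy model, my assumption).
    DOMAIN = {"bob", "amy", "w1", "w2"}
    WAFFLES = {"w1", "w2"}
    ATE = {("bob", "w1")}                 # pairs (eater, eaten)

    waffle = lambda x: x in WAFFLES
    a = lambda P: lambda Q: any(P(x) and Q(x) for x in DOMAIN)

    # "Bob ate a waffle" : exists x. waffle(x) & ate(bob, x)
    print(a(waffle)(lambda x: ("bob", x) in ATE))   # True
    # "Amy ate a waffle"
    print(a(waffle)(lambda x: ("amy", x) in ATE))   # False

On the same pattern, every : λP.λQ.∀x.P(x) → Q(x) becomes every = lambda P: lambda Q: all(not P(x) or Q(x) for x in DOMAIN), swapping existential conjunction for universal implication.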
Denotation
• What do we do with the logical form?
  – It has fewer (no?) ambiguities
  – We can check its truth-value against a database
  – More usefully: we can add new facts, expressed in language, to an existing relational database
  – Question answering: we can check whether a statement in a corpus entails a question-answer pair: "Bob sings and dances" entails that Q: "Who sings?" has answer A: "Bob"
  – We can chain facts together for story comprehension

Grounding
• What does the translation likes : λx.λy.likes(y,x) have to do with actual liking?
• Nothing! (unless the denotation model says it does)
• Grounding: relating linguistic symbols to perceptual referents
  – Sometimes a connection to a database entry is enough
  – Other times, you might insist on connecting "blue" to the appropriate portion of the visual EM spectrum
  – Or on connecting "likes" to an emotional sensation
• Alternative to grounding: meaning postulates
  – You could insist, e.g., that likes(y,x) → knows(y,x)

More representation issues
• Tense and events
  – In general, you don't get far with verbs as predicates
  – It is better to have event variables e:
    • "Alice danced" : danced(alice) vs.
    • "Alice danced" : ∃e. dance(e) ∧ agent(e, alice) ∧ time(e) < now
  – Event variables let you talk about non-trivial tense/aspect structures: "Alice had been dancing when Bob sneezed"

More representation issues
• Propositional attitudes (modal logic)
  – "Bob thinks that I am a gummi bear"
    • thinks(bob, gummi(me))?
    • thinks(bob, "He is a gummi bear")?
  – Usually, the solution involves intensions (^p), which are, roughly, the set of possible worlds in which predicate p is true:
    • thinks(bob, ^gummi(me))
  – Computationally challenging: each agent has to model every other agent's mental state
  – This comes up all the time in language
    • E.g., if you want to talk about what your bill claims that you bought, vs. what you think you bought, vs. what you actually bought.

More representation issues
• Multiple quantifiers:
  – "In this country, a woman gives birth every 15 minutes. Our job is to find her, and stop her." -- Groucho Marx
• Deciding between readings:
  – "Bob bought a pumpkin every Halloween."
  – "Bob put a warning in every window."

More representation issues
• Other tricky stuff:
  – Adverbs
  – Non-intersective adjectives
  – Generalized quantifiers
  – Generics
    • "Cats like naps."
    • "The players scored a goal."
  – Pronouns and anaphora
    • "If you have a dime, put it in the meter."
  – … etc., etc.

Mapping Sentences to Logical Forms

CCG Parsing
• Combinatory Categorial Grammar
  – A lexicalized PCFG
  – Categories encode argument sequences:
    • A/B is a category that combines with a B to its right to form an A
    • A\B is a category that combines with a B to its left to form an A
  – A syntactic parallel to the lambda calculus

Learning to map sentences to logical form
• Zettlemoyer and Collins (IJCAI 05, EMNLP 07)

Some Training Examples
CCG Lexicon
• (The example figures from these two slides are not reproduced here.)

Parsing Rules (Combinators)
• Application:
  – Right (>): X/Y : f   Y : a   ⇒   X : f(a)
  – Left (<):  Y : a   X\Y : f   ⇒   X : f(a)
• Additional rules:
  – Composition
  – Type-raising

CCG Parsing Example
Parsing a Question
• (The worked derivations are not reproduced here; a code sketch follows.)
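To make the application combinators concrete, here is a minimal sketch of forward and backward application over categories paired with λ-terms, deriving borders(texas,kansas) for "Texas borders Kansas". The string-based category representation and the tiny hand-written lexicon are my assumptions, not the paper's actual code.

    # Minimal CCG application: a category is a string like "NP" or "(S\NP)/NP",
    # and each lexical item pairs a category with a meaning (a constant or a
    # curried function). These representation choices are illustrative only.
    def fwd(left, right):
        """Forward application (>): X/Y : f + Y : a => X : f(a); None if inapplicable."""
        cat_l, f = left
        cat_r, a = right
        if cat_l.endswith("/" + cat_r):
            return (cat_l[: -len("/" + cat_r)].strip("()"), f(a))

    def bwd(left, right):
        """Backward application (<): Y : a + X\\Y : f => X : f(a); None if inapplicable."""
        cat_l, a = left
        cat_r, f = right
        if cat_r.endswith("\\" + cat_l):
            return (cat_r[: -len("\\" + cat_l)].strip("()"), f(a))

    LEX = {
        "Texas": ("NP", "texas"),
        "Kansas": ("NP", "kansas"),
        # (S\NP)/NP : lambda x. lambda y. borders(y, x)
        "borders": ("(S\\NP)/NP", lambda x: lambda y: f"borders({y},{x})"),
    }

    vp = fwd(LEX["borders"], LEX["Kansas"])  # S\NP : lambda y. borders(y, kansas)
    s = bwd(LEX["Texas"], vp)                # S : borders(texas, kansas)
    print(s)                                 # ('S', 'borders(texas,kansas)')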
Lexical Generation
• Input training example:
  – Sentence: Texas borders Kansas.
  – Logical form: borders(texas, kansas)

GENLEX
• Input: a training example (Si, Li)
• Computation:
  – Create all substrings of consecutive words in Si
  – Create categories from Li
  – Create lexical entries that are the cross product of these two sets
• Output: lexicon Λ
• (Code sketches of GENLEX and of the parameter updates appear at the end of these notes.)

GENLEX Cross Product
• Input training example:
  – Sentence: Texas borders Kansas.
  – Logical form: borders(texas, kansas)
• Output substrings:
  – Texas; borders; Kansas; Texas borders; borders Kansas; Texas borders Kansas
• × (cross product)
• Output categories:
  – NP : texas
  – NP : kansas
  – (S\NP)/NP : λx.λy.borders(y,x)

GENLEX Output Lexicon

  Words                  Category
  Texas                  NP : texas
  Texas                  NP : kansas
  Texas                  (S\NP)/NP : λx.λy.borders(y,x)
  borders                NP : texas
  borders                NP : kansas
  borders                (S\NP)/NP : λx.λy.borders(y,x)
  …                      …
  Texas borders Kansas   NP : texas
  Texas borders Kansas   NP : kansas
  Texas borders Kansas   (S\NP)/NP : λx.λy.borders(y,x)

Weighted CCG
• Given a log-linear model with a CCG lexicon Λ, a feature vector f, and weights w, the best parse is
  – y* = argmax_y w · f(x, y)
  – where we consider all possible parses y of the sentence x given the lexicon Λ.

Parameter Estimation for Weighted CCG Parsing
• Inputs: training set {(Si, Li) : i = 1, …, n}; initial lexicon Λ; initial weights w; number of iterations T
• Computation: for t = 1 … T and i = 1 … n:
  – Step 1 (check correctness): if y* = argmax_y w · f(Si, y) yields logical form Li, skip to the next i
  – Step 2 (lexical generation):
    • Set λ = Λ ∪ GENLEX(Si, Li)
    • Let y′ = argmax w · f(Si, y) over parses y under λ such that L(y) = Li
    • Define λi to be the lexical entries in y′; set Λ = Λ ∪ λi
  – Step 3 (update parameters):
    • Let y′′ = argmax_y w · f(Si, y)
    • If L(y′′) ≠ Li, set w = w + f(Si, y′) − f(Si, y′′)
• Output: lexicon Λ and parameters w

Example Learned Lexical Entries
Challenge Revisited
Disharmonic Application
Missing Content Words
Missing Content-Free Words
A Complete Parse
• (The figures from these slides are not reproduced here.)

Geo880 Test Set

  System                       Precision   Recall   F1
  Zettlemoyer & Collins 2007   95.49       83.20    88.93
  Zettlemoyer & Collins 2005   96.25       79.29    86.95
  Wong & Mooney 2007           93.72       80.00    86.31

Summing Up
• Hypothesis: Principle of Compositionality
  – The semantics of NL sentences and phrases can be composed from the semantics of their subparts
• Rules can be derived which map syntactic analyses to semantic representations (the rule-to-rule hypothesis)
  – Lambda notation provides a way to extend FOPL to this end
  – But coming up with rule-to-rule mappings is hard
• Idioms, metaphors, and other non-compositional aspects of language make things tricky (e.g., "fake gun")
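As promised above, here is a minimal sketch of the GENLEX cross product for the "Texas borders Kansas" example. The candidate categories are hard-coded for this one logical form; Zettlemoyer and Collins derive them from Li with a set of rules, which this sketch does not reproduce.

    # GENLEX sketch: cross product of (consecutive-word substrings) x (categories).
    # Categories are hard-coded for borders(texas, kansas); the real system
    # derives them from the logical form by rule (not reproduced here).
    def substrings(words):
        """All substrings of consecutive words in the sentence."""
        return [" ".join(words[i:j])
                for i in range(len(words))
                for j in range(i + 1, len(words) + 1)]

    def genlex(sentence, categories):
        """Lexicon = every substring paired with every candidate category."""
        return {(span, cat) for span in substrings(sentence.split())
                            for cat in categories}

    CATEGORIES = [
        "NP : texas",
        "NP : kansas",
        "(S\\NP)/NP : lx.ly.borders(y,x)",
    ]
    lexicon = genlex("Texas borders Kansas", CATEGORIES)
    print(len(lexicon))                        # 6 substrings x 3 categories = 18
    print(("Texas", "NP : texas") in lexicon)  # True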
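And a sketch of the Step 3 perceptron-style update from the parameter-estimation slide, with sparse feature vectors as Counters. The feature extractor and the argmax searches are stubbed out, since they depend on the full parser; only the update arithmetic w = w + f(Si, y′) − f(Si, y′′) is shown, and the feature names are invented for illustration.

    from collections import Counter

    # Structured-perceptron update (Step 3 of the slide's algorithm). f_good is
    # f(Si, y') for the best correct parse and f_pred is f(Si, y'') for the
    # current best parse overall; both would come from the parser (stubbed out).
    def perceptron_update(w, f_good, f_pred):
        """w = w + f(Si, y') - f(Si, y'')."""
        for feat, val in f_good.items():
            w[feat] += val
        for feat, val in f_pred.items():
            w[feat] -= val
        return w

    w = Counter()
    f_good = Counter({"lex:Texas=NP:texas": 1, "lex:borders=(S\\NP)/NP": 1})
    f_pred = Counter({"lex:Texas borders=NP:texas": 1})
    perceptron_update(w, f_good, f_pred)
    print(w)  # rewards entries used in the correct parse, penalizes the others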