A Best-Fit Approach for Productive Analysis of Omitted Arguments
Eva Mok & John Bryant
University of California, Berkeley
International Computer Science Institute

Simplify grammar by exploiting the language understanding process
  Omission of arguments in Mandarin Chinese
  Construction grammar framework
  Model of language understanding
  Our best-fit approach

Productive Argument Omission (in Mandarin)
  1. ma1+ma gei3 ni3 zhei4+ge
     mother give 2PS this+CLS
     Mother (I) give you this (a toy).
  2. ni3 gei3 yi2
     2PS give auntie
     You give auntie [the peach].
  3. ao ni3 gei3 ya
     EMP 2PS give EMP
     Oh (go on)! You give [auntie] [that].
  4. gei3
     give
     [I] give [you] [some peach].
  CHILDES Beijing Corpus (Tardif, 1993; Tardif, 1996)

Arguments are omitted with different probabilities
  [Bar chart: % elided (98 total utterances) for Giver, Recipient, and Theme]
  All arguments omitted: 30.6%
  No arguments omitted: 6.1%

Construction grammar approach
  Kay & Fillmore 1999; Goldberg 1995
  Grammaticality: form and function
  Basic unit of analysis: construction, i.e. a pairing of form and meaning constraints
  Not purely lexically compositional
  Implies early use of semantics in processing
  Embodied Construction Grammar (ECG) (Bergen & Chang, 2005)

Problem: Proliferation of constructions
  Subj   Verb      Obj1       Obj2
   ↓      ↓         ↓          ↓
  Giver  Transfer  Recipient  Theme

  Verb      Obj1       Obj2
   ↓         ↓          ↓
  Transfer  Recipient  Theme

  Subj   Verb      Obj2
   ↓      ↓         ↓
  Giver  Transfer  Theme

  Subj   Verb      Obj1
   ↓      ↓         ↓
  Giver  Transfer  Recipient
  …

If the analysis process is smart, then...
  Subj   Verb      Obj1       Obj2
   ↓      ↓         ↓          ↓
  Giver  Transfer  Recipient  Theme
  The grammar needs only state one construction
  Omission of constituents is flexibly allowed
  The analysis process figures out what was omitted

Best-fit analysis process takes burden off the grammar representation
  Utterance; Discourse & Situational Context; Constructions
  Analyzer: incremental, competition-based, psycholinguistically plausible
  Semantic Specification: image schemas, frames, action schemas
  Simulation

Competition-based analyzer finds the best analysis
  An analysis is made up of:
    A constructional tree
    A set of resolutions
    A semantic specification
  The best fit has the highest combined score

Combined score that determines best-fit
  Syntactic Fit:
    Constituency relations
    Combine with preferences on non-local elements
    Conditioned on syntactic context
  Antecedent Fit:
    Ability to find referents in the context
    Conditioned on syntactic information, feature agreement
  Semantic Fit:
    Semantic bindings for frame roles
    Frame roles’ fillers are scored

Analyzing ni3 gei3 yi2 (You give auntie)
  Two of the competing analyses:
    Analysis 1 (Theme omitted):
      ni3    gei3      yi2        omitted
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
    Analysis 2 (Recipient omitted):
      ni3    gei3      omitted    yi2
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
  Syntactic Fit:
    P(Theme omitted | ditransitive cxn) = 0.65
    P(Recipient omitted | ditransitive cxn) = 0.42
    Analysis 1: (1-0.78)*(1-0.42)*0.65 = 0.08
    Analysis 2: (1-0.78)*(1-0.65)*0.42 = 0.03
    (0.78 is the Giver's omission probability; an expressed argument contributes 1 minus its omission probability)

Using frame and lexical information to restrict type of reference
  The Transfer Frame: Giver, Recipient, Theme, Manner, Purpose, Means, Reason, Place, Time
  Lexical Unit gei3: Giver (DNI), Recipient (DNI), Theme (DNI)

Can the omitted argument be recovered from context?
  Antecedent Fit:
    Analysis 1:
      ni3    gei3      yi2        omitted
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
    Analysis 2:
      ni3    gei3      omitted    yi2
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
  Discourse & Situational Context: child, peach, table, mother, auntie
  omitted = ?

How good of a theme is a peach? How about an aunt?
  Semantic Fit:
    Analysis 1:
      ni3    gei3      yi2        omitted
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
    Analysis 2:
      ni3    gei3      omitted    yi2
       ↓      ↓         ↓          ↓
      Giver  Transfer  Recipient  Theme
  The Transfer Frame
    Giver (usually animate)
    Recipient (usually animate)
    Theme (usually inanimate)

The argument omission patterns shown earlier can be covered with just ONE construction
  [Bar chart: % elided (98 total utterances) for Giver, Recipient, and Theme]
  P(omitted|cxn):
    Subj   Verb      Obj1       Obj2
     ↓      ↓         ↓          ↓
    Giver  Transfer  Recipient  Theme
    0.78             0.42       0.65
  Each cxn is annotated with probabilities of omission
  Language-specific default probability can be set

Research goal
  A computationally-precise modeling framework for learning early constructions
  [Diagram: Language Data, Learning, New Construction, Linguistic Knowledge]

Frequent argument omission in pro-drop languages
  Mandarin example: ni3 gei3 yi2 (“you give auntie”)
Even in English, there are often no spoken antecedents to pronouns in conversations
Learner must integrate cues from intentions, gestures, prior discourse, etc.

A short dialogue
  bie2 mo3 wai4+tou2 a: #1_3 !
  (別抹外頭啊)
  NEG-IMP apply forehead
  Don’t apply [lotion to your] forehead

  mo3 wai4+tou2 ke3 jiu4 bu4 hao3+kan4 le a: .
  (抹外頭可就不好看了啊)
  apply forehead LINKER LINKER NEG good looking CRS SFP
  [If you] apply [lotion to your] forehead then [you will] not be pretty

  …

  ze ya a: # bie2 gei3 ma1+ma wang3 lian3 shang4 mo:3 e: !
  (嘖呀啊 # 別給媽媽往臉上抹呃)
  INTERJ # NEG-IMP BEN mother CV-DIR face on apply INTERJ #
  Don’t apply [the lotion] on [your mom’s] face (for mom)

  [- low pitch motherese]
  ma1+ma bu4 mo:3 you:2 .
  (媽媽不抹油)
  mother NEG apply lotion
  Mom doesn’t apply (use) lotion

Goals, refined
  Demonstrate learning given
    embodied meaning representation
    structured representation of context
  Based on
    Usage-based learning
    Domain-general statistical learning mechanism
    Generalization / linguistic category formation

Towards a precise computational model
  Modeling early grammar learning
  Context model & Simulation
  Data annotation
  Finding the best analysis for learning
  Hypothesizing and reorganizing constructions
  Pilot results

Embodied Construction Grammar
  construction yi2-N
    subcase of Morpheme
    form
      constraints
        self.f.orth <-- "yi2"
    meaning : @Aunt
      evokes RD as rd
      constraints
        self.m <--> rd.referent
        self.m <--> rd.ontological_category

“you” specifies discourse role
  construction ni3-N
    subcase of Morpheme
    form
      constraints
        self.f.orth <-- "ni3"
    meaning : @Human
      evokes RD as rd
      constraints
        self.m <--> rd.referent
        self.m <--> rd.ontological_category
        rd.discourse_participant_role <-- @Addressee
        rd.set_size <-- @Singleton

The meaning of “give” is a schema with roles
  construction gei3-V2
    subcase of Morpheme
    form
      constraints
        self.f.orth <-- "gei3"
    meaning : Give

  schema Transfer
    subcase of Action
    roles
      giver : @Entity
      recipient : @Entity
      theme : @Entity
    constraints
      giver <--> protagonist

  schema Give
    subcase of Transfer
    constraints
      inherent_aspect <-- @Inherent_Achievement
      giver <-- @Animate
      recipient <-- @Animate
      theme <-- @Manipulable_Inanimate_Object

Finally, you-give-aunt links up the roles
  construction ni3-gei3-yi2
    subcase of Finite_Clause
    constructional
      constituents
        n : ni3-N
        g : gei3-V2
        y : yi2-N
    form
      constraints
        n.f meets g.f
        g.f meets y.f
    meaning : Give
      constraints
        self.m <--> g.m
        self.m.giver <--> n.m
        self.m.recipient <--> y.m

The learning loop: Hypothesize & Reorganize
  [Diagram: Utterance, World Knowledge, Discourse & Situational Context, Linguistic Knowledge, Analysis, Context Fitting, Partial SemSpec; links labeled reorganize, reinforcement, hypothesize]

If the learner has a ditransitive cxn
  Form:
    ni3 meets gei3 meets yi2
  Meaning: Addressee; Give (giver, recipient, theme omitted); Aunt
  Context: XIXI, MOT, INV, Peach; Discourse Segment (speaker, addressee)

Context fitting recovers more relations
  Form: ni3 meets gei3 meets yi2
  Meaning: Addressee; Give (giver, recipient, theme omitted); Aunt
  Context: XIXI, MOT, INV, Peach; Discourse Segment (speaker, addressee, attentional focus); Give (giver, recipient, theme)

But the learner does not yet have phrasal cxns
  Form: ni3 meets gei3 meets yi2
  Meaning: Addressee; Give; Aunt
  Context: XIXI, MOT, INV, Peach; Discourse Segment (speaker, addressee, attentional focus); Give (giver, recipient, theme)

Context bootstraps learning
  Form: ni3 meets gei3 meets yi2
  Meaning: Addressee; Give (giver, recipient); Aunt
  construction ni3-gei3-yi2
    subcase of Finite_Clause
    constructional
      constituents
        n : ni3
        g : gei3
        y : yi2
    form
      constraints
        n.f meets g.f
        g.f meets y.f
    meaning : Give
      constraints
        self.m <--> g.m
        self.m.giver <--> n.m
        self.m.recipient <--> y.m

A model of context is key to learning
  The context model makes it possible for the learning model to:
    learn new constructions using contextually available information
    learn argument-structure constructions in pro-drop languages

Understanding an utterance in context
  [Diagram: Transcripts, Events + Utterances, Schemas + Constructions, Context Model, Recency Model, Simulation, Analysis + Resolution, Context Fitting, Semantic Specification]

Context model: Events + Utterances
  Setting: participants, entities, & relations
  [Diagram: Start, Event, Sub-Event, Event, DS, Sub-Event]

Entities and Relations are instantiated
  Setting: CHI, MOT (incl. body parts), living room (incl.
  ground, ceiling, chair, etc.), lotion
  Start
  caused_motion01: forceful_motion, motion
  ds04
  admonishing05: speaker = MOT, addressee = CHI, forcefulness = normal
  apply02: applier = CHI, substance = lotion, surface = face(CHI)
  translational_motion03: mover = lotion, spg = SPG

The context model is updated dynamically
  [Diagram: Events, Context Model, Recency Model, Simulation]
  Extended transcript annotation: speech acts & events
  Simulator inserts events into context model & updates it with the effects
  Some relations persist over time; some don’t.

Competition-based analyzer finds the best analysis
  An analysis is made up of:
    A constructional tree
    A semantic specification
    A set of resolutions
  Example: Bill gave Mary the book
    Constructional tree: A-GIVE-B-X with constituents subj, v, obj2, obj1 (Ref-Exp, Give, Ref-Exp, Ref-Exp)
    Semantic specification: Give-Action with giver @Man, recipient @Woman, theme @Book
    Resolutions: Bill, Mary, book01

Combined score that determines best-fit
  Syntactic Fit:
    Constituency relations
    Combine with preferences on non-local elements
    Conditioned on syntactic context
  Antecedent Fit:
    Ability to find referents in the context
    Conditioned on syntactic information, feature agreement
  Semantic Fit:
    Semantic bindings for frame roles
    Frame roles’ fillers are scored

Context Fitting goes beyond resolution
  Form: ni3 meets gei3 meets yi2
  Meaning: Addressee; Give (giver, recipient, theme omitted); Aunt
  Context: XIXI, MOT, INV, Peach; Discourse Segment (speaker, addressee, attentional focus); Give (giver, recipient, theme)

Context Fitting, a.k.a. intention reading
  Context Fitting takes resolution a step further
    considers entire context model, ranked by recency
    considers relations amongst entities
    heuristically fits from top down, e.g.
      • discourse-related entities
      • complex processes
      • simple processes
      • other structured and unstructured entities
    more heuristics for future events (e.g.
    in cases of commands or suggestions)

Adult grammar size
  ~615 constructions total
    ~100 abstract cxns (26 to capture lexical variants)
    ~70 phrasal/clausal cxns
    ~440 lexical cxns (~260 open class)
  ~195 schemas (~120 open class, ~75 closed class)

Starter learner grammar size
  No grammatical categories (except interjections)
  Lexical items only
  ~440 lexical constructions
    ~260 open class: schema / ontology meanings
    ~40 closed class: pronouns, negation markers, etc.
    ~60 function words: no meanings
  ~195 schemas (~120 open class, ~75 closed class)

The process hierarchy defined in schemas
  [Tree diagram rooted at Process; nodes: State, State_Change, Action, Proto_Transitive, Complex_Process, Intransitive_State, Two_Participant_State, Mental_State, Serial_Processes, Concurrent_Processes, Cause_Effect, Joint_Motion, Caused_Motion]

The process hierarchy defined in schemas
  [Tree diagram rooted at Action; nodes: Intransitive_Action, Motion, Translational_Motion, Expression, Self_Motion, Translational_Self_Motion, Forceful_Motion, Force_Application, Continuous_Force_Application, Agentive_Impact]

The process hierarchy defined in schemas
  [Tree diagram rooted at Action; nodes: Cause_Change, Communication, Obtainment, Transfer, Ingestion, Perception, Other_Transitive_Action]

Understanding an utterance in context
  [Diagram: Transcripts, Events + Utterances, Schemas + Constructions, Context Model, Recency Model, Simulation, Analysis + Resolution, Context Fitting, Semantic Specification; links labeled reorganize, reinforcement, hypothesize]

Hypothesize & Reorganize
  Hypothesize:
    utterance-driven; relies on the analysis (SemSpec & context)
    operations: compose
  Reorganize:
    grammar-driven; can be triggered by usage (to be determined)
    operations: generalize

Composing new constructions
  Form: ni3, gei3
  Meaning: Addressee; Give (giver, recipient, theme)
  Context: MOT, XIXI, Peach, INV
  Compose operation: If roles from different constructions point to the same context element, propose a new construction and set up a meaning binding.
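The compose operation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `RoleRef`, `compose`, and the `resolved` mapping are hypothetical names, and the bindings to XIXI, INV, and Peach are assumed from the context diagrams rather than taken from real analyzer output.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RoleRef:
    """A meaning role contributed by one construction (hypothetical structure)."""
    cxn: str   # construction whose meaning owns the role
    role: str  # role name, or "" for the construction's meaning itself

    def path(self) -> str:
        return f"{self.cxn}.m.{self.role}" if self.role else f"{self.cxn}.m"


def compose(resolved):
    """Propose (constituents, binding) pairs for roles of *different*
    constructions that context fitting mapped to the same context element."""
    proposals = []
    for (a, elem_a), (b, elem_b) in combinations(resolved.items(), 2):
        if elem_a == elem_b and a.cxn != b.cxn:
            proposals.append(((a.cxn, b.cxn), f"{a.path()} <--> {b.path()}"))
    return proposals


# Assumed context-fitting result for the utterance fragment "ni3 gei3".
resolved = {
    RoleRef("ni3", ""): "XIXI",           # the addressee is the child
    RoleRef("gei3", "giver"): "XIXI",
    RoleRef("gei3", "recipient"): "INV",
    RoleRef("gei3", "theme"): "Peach",
}

proposals = compose(resolved)
print(proposals)
```

Only the meaning of ni3 and the giver role of gei3 share a context element here, so the sketch proposes a single two-constituent construction with one meaning binding. Roles that belong to the same construction are skipped, since binding them would not relate two constituents.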
Creating pivot constructions
  Form: ni3 meets gei3 meets yi2    Meaning: Addressee = giver; Give; @Aunt = recipient
  Form: ni3 meets gei3 meets wo3    Meaning: Addressee = giver; Give; @Human = recipient
  Pivot generalization: Given a phrasal cxn, look for another cxn that shares 1+ constituents. Line up roles and bindings. Create new cxn category for the slot.

Resulting constructions
  construction ni3-gei3-cat01
    constituents ni3, gei3, cat01
    meaning : Give
      constraints
        self.m.recipient <--> g.m

  general construction cat01
    subcase of Morpheme
    meaning: @Human

  construction wo3
    subcase of cat01
    meaning: @Human

  construction yi2
    subcase of cat01
    meaning: @Aunt

Pilot Results: Sample constructions learned
  Composed:
    chi1_fan4               eat rice
    ni3_chuan1_xie2         you wear shoe
    ni3_shuo1               you say
    bu4_na2                 NEG take
    wo3_qu4                 I go
    ni3_ping2zi_gei3_wo3    you bottle give me
    ni3_gei3_yi2            you give aunt
    wo3_bu4_chi1            I NEG eat
  Pivot Cxns:
    ni3 {shuo1, chuan1}     you {say, wear}
    ni3 {shuo1, hua4}       you {say, draw}
    wo3 {zhao3, qu4}        I {find, go}
    bu4 {na2, he1}          NEG {take, drink}
    {wo3, ma1} cheng2       {I, mom} scoop

Challenge #1: Non-compositional meaning
  [Diagram: forms bake, you, a cake; meanings Bake (baker, baked), Addressee, @Cake; Context: Bake-Event, Give-Event, MOT, CHI, Cake]
  Non-compositional meaning: Search for additional meaning schemas (in context or in general) that relate the meanings of the individual constructions

Challenge #2: Function words
  [Diagram: forms bake, a cake, for you; meanings Bake (baker, baked), @Cake, Addressee; Context: Bake-Event, Benefaction, MOT, CHI, Cake]
  Function words tend to indicate relations rather than events or entities

Challenge #3: How far up to generalize
  Utterances: Eat rice; Eat apple; Eat watermelon; Want rice; Want apple; Want chair
  Ontology:
    Inanimate Object
      Manipulable Objects
        Food
          Fruit: apple, watermelon
          Savory: rice
      Unmovable Objects
        Furniture: Chair, Sofa

Challenge #4: Beyond pivot constructions
  Pivot constructions: indexing on particular constituent type
    Eat rice; Eat apple; Eat watermelon
  Abstract constructions: indexing on role-filler relations between constituents
    Eat catX, with catX bound to food
    Schema Eat
      roles
        eater <--> agent
        food <--> patient
    Want catY, with catY bound to wanted
    Schema Want
      roles
        wanter <--> agent
        wanted <--> patient

Challenge #5: Omissible constituents
  Intuition: Same context, two expressions that differ by one constituent → a general construction with the constituent being omissible
  May require verbatim memory traces of utterances + “relevant” context

When does the learning stop?
  Bayesian Learning Framework
    \hat{G} = \arg\max_G P(G \mid U, Z) = \arg\max_G P(U \mid G, Z) \, P(G)
  [Diagram: learning loop with Schemas + Constructions, Analysis + Resolution, Context Fitting, SemSpec; links labeled reorganize, reinforcement, hypothesize]
  Most likely grammar given utterances and context
  The grammar prior is a preference for the “kind” of grammar
  In practice, take the log and minimize cost → Minimum Description Length (MDL)

Intuition for MDL
  S -> Give me NP        S -> Give me NP
  NP -> the book         NP -> DET book
  NP -> a book           DET -> the
                         DET -> a
  Suppose that the prior is inversely proportional to the size of the grammar (e.g. number of rules)
  It’s not worthwhile to make this generalization

Intuition for MDL
  S -> Give me NP        S -> Give me NP
  NP -> the book         NP -> DET N
  NP -> a book           DET -> the
  NP -> the pen          DET -> a
  NP -> a pen            N -> book
  NP -> the pencil       N -> pen
  NP -> a pencil         N -> pencil
  NP -> the marker       N -> marker
  NP -> a marker

How to calculate the prior of this grammar
  (Yet to be determined)
  There is evidence that the lexicalized constructions do not completely go away
  If the more lexicalized constructions are retained, the size of grammar is a bad indication of degree of generality
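The rule-counting intuition from the two MDL slides can be checked with a short Python sketch. It assumes, as the slides suggest, that grammar size is simply the number of rules and that the unnormalized prior is 1/size. The function names are hypothetical, and the sketch compares priors only, ignoring the likelihood term P(U | G, Z).

```python
def lexicalized(dets, nouns):
    """One NP rule per determiner-noun pair, plus the S rule."""
    return ["S -> Give me NP"] + [f"NP -> {d} {n}" for n in nouns for d in dets]


def det_category(dets, nouns):
    """Generalize over determiners only: NP -> DET <noun>."""
    return (["S -> Give me NP"]
            + [f"NP -> DET {n}" for n in nouns]
            + [f"DET -> {d}" for d in dets])


def det_and_noun_categories(dets, nouns):
    """Generalize over determiners and nouns: NP -> DET N."""
    return (["S -> Give me NP", "NP -> DET N"]
            + [f"DET -> {d}" for d in dets]
            + [f"N -> {n}" for n in nouns])


def prior(grammar):
    """Unnormalized prior, assumed inversely proportional to the rule count."""
    return 1.0 / len(grammar)


dets = ["the", "a"]
one_noun = ["book"]
four_nouns = ["book", "pen", "pencil", "marker"]

# (lexicalized size, generalized size) for each slide
small = (len(lexicalized(dets, one_noun)), len(det_category(dets, one_noun)))
large = (len(lexicalized(dets, four_nouns)),
         len(det_and_noun_categories(dets, four_nouns)))
print(small, large)
```

With a single noun the generalized grammar is larger than the lexicalized one, so the prior favors keeping the lexicalized rules. With four nouns the generalized grammar is smaller, and the gap widens with every additional noun, since the lexicalized grammar grows by one rule per determiner-noun pair.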