The origin of syntactic bootstrapping: A computational model
Michael Connor, Cynthia Fisher, & Dan Roth
University of Illinois at Urbana-Champaign

How do children begin to understand sentences?
"Topid rivvo den marplox."

Two problems of ambiguity
• "The language": Topid rivvo den marplox.
• "The world": eating(sheep, food), feed(Johanna, sheep), help(Daddy, Johanna), scared(Johanna, sheep), are-nice(sheep), want(sheep, food), getting-away(brother)

Typical view: knowledge of meaning drives knowledge of syntax/grammar
• The child pairs the sentence "Topid rivvo den marplox." with a meaning inferred from the scene, such as feed(Johanna, sheep)
• Such sentence-meaning pairs then drive grammar learning, with or without assumed innate constraints on syntax-semantics linking (e.g., Pinker, 1984, 1989; Tomasello, 2003)

The problem of sentence and verb meanings
• Predicate terms don't label events, but adopt different perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette, Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985)
  "I'll put this here." versus "This goes here."
• If we take seriously the ambiguity of scenes, many candidate meanings compete for a single sentence: eating(sheep, food), feed(Johanna, sheep), help(Daddy, Johanna), scared(Johanna, sheep), are-nice(sheep), want(sheep, food), getting-away(brother)

Syntactic bootstrapping (Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985; Naigles, 1990)
• Sentence structure guides learning of verbs' semantic predicate-argument structure
  "I'll put this here." versus "This goes here."
• How could this be? How could aspects of syntactic structure guide early sentence interpretation, before the child has learned much about the syntax and morphology of the language?

Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher & Snedeker, in press)
(2) Early abstraction: Children are biased toward abstract representations of language experience. (Gertner, Fisher & Eisengart, 2006; Thothathiri & Snedeker, 2008)
(3) Independent encoding of sentence structure: Children gather distributional facts about verbs from listening experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009; Yuan & Fisher, 2009)

A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children can learn which words are verbs by tracking their syntactic argument-taking behavior in sentences.

A computational model of syntactic bootstrapping
• Computational experiments simulate a learner whose knowledge of sentence structure is under our control
• We equip the model with an unlearned bias to map nouns onto abstract semantic roles, and ask:
  – Are partial representations based on sets of nouns useful for learning more about the native language?
• First case study: learning English word order
  – Could a learner identify verbs as verbs by noting their tendency to occur with particular numbers of arguments?
Computational Model of Semantic Role Labeling (SRL)
• A semantic analysis of sentences at the level of who does what to whom
• For each verb in the sentence:
  – the SRL system tries to identify all constituents that fill a semantic role
  – and to assign them roles (agent, patient, goal, etc.)
[Example: "Remember what Daddy said": remember (V), what (A1: patient); said (V), Daddy (A0: agent), what (A1: patient)]

Semantic Role Labeling (SRL): The basic idea
[Figure: five-stage SRL pipeline for "I like dark bread.": 1 Parsing, 2 Argument Identification, 3 Predicate Identification, 4 Feature Extraction (e.g., arg = I, verb = like, NP < S, NP < VP), 5 Argument Classification; external labeled feedback supplies "I is an agent (A0)" and "bread is a patient (A1)"]

The nature of the semantic roles
• A key assumption of the SRL is that the semantic roles are abstract:
  – This is a property of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (Palmer, Gildea, & Kingsbury, 2005; cf. Dowty, 1991)
    Like: Arg0 = Liker, Arg1 = Object of affection
    Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient
    Have: Arg0 = Owner, Arg1 = Possession
  – Across verbs, the Arg0 roles pattern together as Agent, and the Arg1 roles as Patient/Theme
• So, like children, an SRL learns to identify agents and patients in sentences, not givers and things-given.

This SRL knows the grammar ... & reads minds
• A standard SRL system starts from a full parse ("knows the grammar") and from externally labeled role feedback ("reads minds": I is an agent, bread is a patient)
• Faced with "Topid rivvo den marplox.", it can do neither: it cannot parse the sentence, and it cannot tell what the speakers are talking about

Baby SRL
• The Baby SRL replaces the full parser with shallower machinery: 1 HMM-based word clustering, 2 Argument Identification via seed nouns, 3 Predicate Identification via argument-count statistics, 4 Feature Extraction over a partial sentence representation (e.g., '1st of two nouns', 'before verb'), 5 Argument Classification
• Feedback is either external labeled feedback (I is an agent, bread is a patient) or internal animacy-based feedback (I is animate, so A0; bread is unknown, and the agent is I, not bread)
[Figure: Baby SRL pipeline for "I like dark bread.", with HMM states 58, 78, 50, 48 and argument-count statistics such as state 78: 13% 1-noun, 54% 2-noun vs. state 50: 41% 1-noun, 31% 2-noun]
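To make the pipeline concrete, here is a minimal Python sketch of stages 2-5 on "I like dark bread.", assuming the HMM tagging (stage 1) has already been done. The seed-noun list, the hard-coded state assignments, the argument-count numbers, the feature names, and the classifier weights are toy stand-ins for illustration, not the authors' implementation.

```python
# A minimal, illustrative sketch of the Baby SRL pipeline (not the authors' code).
from collections import defaultdict

SEED_NOUNS = {"i", "you", "bread", "juice", "milk"}        # stage 2: known nouns
STATE_OF = {"i": 58, "like": 78, "dark": 50, "bread": 48}  # stage 1: toy stand-in for HMM tagging
# stage 3: P(n noun arguments | HMM state), estimated from the HMM training corpus (toy numbers)
ARG_COUNT_STATS = {58: {1: 0.05, 2: 0.05}, 78: {1: 0.13, 2: 0.54},
                   50: {1: 0.41, 2: 0.31}, 48: {1: 0.10, 2: 0.10}}

def identify_arguments(words):
    """Stage 2: every known (seed) noun is treated as a candidate argument."""
    return [i for i, w in enumerate(words) if w in SEED_NOUNS]

def identify_predicate(words, args):
    """Stage 3: pick the non-argument word whose HMM state most often
    occurs with the number of arguments found in this sentence."""
    n_args = len(args)
    candidates = [i for i, _ in enumerate(words) if i not in args]
    return max(candidates,
               key=lambda i: ARG_COUNT_STATS.get(STATE_OF.get(words[i], -1), {}).get(n_args, 0.0))

def extract_features(words, args, verb):
    """Stage 4: partial-sentence-representation features for each argument."""
    feats = {}
    for rank, i in enumerate(args):
        feats[i] = {f"word={words[i]}", f"verb={words[verb]}",
                    f"noun_{rank + 1}_of_{len(args)}",                 # noun-pattern feature
                    "before_verb" if i < verb else "after_verb"}       # verb-position feature
    return feats

def classify(feats, weights):
    """Stage 5: score each role with a linear model; return the best role per argument."""
    roles = ("A0", "A1")
    return {i: max(roles, key=lambda r: sum(weights[r][f] for f in fs))
            for i, fs in feats.items()}

# Usage on the running example, with toy weights standing in for a trained model
weights = {"A0": defaultdict(float, {"noun_1_of_2": 1.0, "before_verb": 1.0}),
           "A1": defaultdict(float, {"noun_2_of_2": 1.0, "after_verb": 1.0})}
words = "i like dark bread .".split()
args = identify_arguments(words)          # [0, 3]
verb = identify_predicate(words, args)    # index of "like"
print(classify(extract_features(words, args, verb), weights))   # {0: 'A0', 3: 'A1'}
```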
Unsupervised part-of-speech clustering using a Hidden Markov Model (HMM)
• Yields a context-sensitive clustering of words based on sequential dependencies (cf. Bannard et al., 2004; Chang et al., 2006; Mintz, 2003; Solan et al., 2005)
• A priori split between content and function words, based on phonological differences (e.g., Shi et al., 1998; Connor et al., 2010)
  – 80 hidden states, 30 pre-allocated to function words
• Trained on ~1 million words of unlabeled text from CHILDES
[Figure: HMM state assignment for "I like dark bread .": I = 58, like = 78, dark = 50, bread = 48; sample cluster members: 58 = I, you, ya, Kent; 78 = like, got, did, had; 50 = many, nice, good; 48 = more, juice, milk]

Linking rules
• But these clusters are unlabeled
• How do we link unlabeled clusters with particular grammatical categories, and thus with particular roles in sentence interpretation (SRL)?
  – nouns, which are candidate arguments
  – and verbs, which take arguments

Linking rules: Nouns first
• Syntactic bootstrapping is grounded in noun learning (e.g., Gillette et al., 1999)
  – Children learn the meanings of some nouns without syntactic knowledge
• Each known noun is assumed to be a candidate argument
  – A simple form of semantic bootstrapping (Pinker, 1984)
• 10 to 75 seed nouns, sampled from the M-CDI

Linking rules: What about verbs?
• Verbs take noun arguments
• Via syntactic bootstrapping, children identify a word as a verb by tracking its argument-taking behavior in sentences
• Collect statistics over the HMM training corpus: how often each state occurs with a particular number of arguments (e.g., state 78: 13% with one noun, 54% with two; state 50: 41% with one noun, 31% with two)
• For each SRL input sentence, choose as the verb the word whose state is most likely to appear with the number of arguments found in the current sentence (a sketch of this statistic appears at the end of this section)

Baby SRL: the partial sentence representation
• Feature extraction yields features such as arg = I, verb = like, '1st of two nouns', 'before verb' for the first argument, and arg = bread, verb = like, '2nd of two nouns', 'after verb' for the second
• This is a partial sentence representation: an ordered set of nouns plus a probable verb, not a full parse
• Argument classification is trained on this representation, here with external labeled feedback (I is an agent, A0; bread is a patient, A1)
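A minimal sketch of the verb-linking statistic described above: for each sentence in the HMM training corpus, count the seed nouns it contains, record how often each HMM state occurs with each argument count, and at test time treat as the verb the word whose state best matches the current sentence's argument count. The tagged-corpus format, the restriction of verb candidates to non-noun words, and the toy data are assumptions of this sketch, not the authors' code.

```python
# Illustrative sketch: estimating P(number of noun arguments | HMM state)
# from the HMM training corpus, then using it to guess the verb.
from collections import defaultdict

def collect_arg_count_stats(tagged_corpus, seed_nouns):
    """tagged_corpus: iterable of sentences, each a list of (word, hmm_state) pairs.
    Returns stats[state][n] = proportion of the state's occurrences in sentences
    containing n seed nouns."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_corpus:
        n_nouns = sum(1 for word, _ in sentence if word in seed_nouns)
        for word, state in sentence:
            if word not in seed_nouns:          # assumption: only non-noun words are verb candidates
                counts[state][n_nouns] += 1
    stats = {}
    for state, hist in counts.items():
        total = sum(hist.values())
        stats[state] = {n: c / total for n, c in hist.items()}
    return stats

def guess_verb(tagged_sentence, seed_nouns, stats):
    """Pick the non-noun word whose state most often co-occurs with this
    sentence's number of seed-noun arguments."""
    n_nouns = sum(1 for word, _ in tagged_sentence if word in seed_nouns)
    candidates = [(word, state) for word, state in tagged_sentence if word not in seed_nouns]
    return max(candidates, key=lambda ws: stats.get(ws[1], {}).get(n_nouns, 0.0))[0]

# Toy usage: three training sentences tagged with made-up HMM states
seed_nouns = {"i", "you", "bread", "juice"}
corpus = [[("i", 58), ("like", 78), ("dark", 50), ("bread", 48)],   # 2 seed nouns
          [("you", 58), ("want", 78), ("juice", 48)],               # 2 seed nouns
          [("dark", 50), ("juice", 48)]]                            # 1 seed noun
stats = collect_arg_count_stats(corpus, seed_nouns)
print(guess_verb([("i", 58), ("like", 78), ("dark", 50), ("bread", 48)], seed_nouns, stats))  # like
```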
Semantic-role feedback???
• Theories of language acquisition assume that learning to understand sentences is a partially-supervised task:
  – use existing knowledge of words & syntax to assign a meaning to a sentence
  – the appropriateness of that meaning in context then provides the feedback
  [Johanna rivvo den sheep.]

Baby SRL with animacy-based feedback
• Bootstrapped animacy feedback (from a list of known concrete nouns):
  1. Animates are agents; inanimates are non-agents
  2. Each verb has at most one agent
  3. If there are two known animate nouns, use the current SRL classifier to choose the best agent (and use this decision as feedback to train the SRL)
• Example: "I like dark bread": I is animate, so A0; bread is unknown, and the agent is I, so bread is not A0
• Note: Animacy feedback trains a simplified classifier, with only an agent/non-agent role distinction (a sketch of these feedback rules appears at the end of this section)

Training the Baby SRL
• Three corpora of child-directed speech: parental utterances annotated using the PropBank scheme; speech to Adam, Eve, and Sarah (Brown, 1973)
  – Adam 01-23 (2;3 - 3;2): train on 01-20, 3951 propositions, 8107 arguments
  – Eve 01-20 (1;6 - 2;3): train on 01-18, 4029 propositions, 8499 arguments
  – Sarah 01-90 (2;3 - 4;1): train on 01-83, 8570 propositions, 15599 arguments

Testing the Baby SRL
• Constructed test sentences like those used in experiments with children: an unknown verb and two animate nouns ("Adam krads Daddy!", with both nouns drawn from Adam, Mommy, Daddy, Ursula, ...) force the SRL to rely on syntactic knowledge (a sketch of this test appears at the end of this section)
• Compare systems trained on different syntactic features derived from the partial sentence representation:
  – Lexical baseline: verb and argument word features only (arg = I, verb = like)
  – Noun Pattern: lexical features plus features such as '1st of 2 nouns', '2nd of 2 nouns'
  – Verb Position: lexical features plus 'before verb', 'after verb'
  – Combined: all feature types

Question 1: Are partial sentence representations useful, in principle, for learning new syntactic knowledge?
• Learning English word order
• Start with "gold standard" part-of-speech tagging and "gold standard" semantic role feedback

Are partial sentence representations useful as a starting point for learning?
• We assume that representations as simple as an ordered set of nouns can yield useful information about verb argument structure and sentence interpretation.
• Is this true, in ordinary samples of language use?
• Despite all the noise this simple representation will add to the data?

Result 1: Partial sentence representations are useful for further syntax learning (A krads B)

Question 2: Can partial sentence representations be built via unsupervised clustering and the proposed linking rules?
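A minimal sketch of the animacy-based feedback rules listed under "Baby SRL with animacy-based feedback". The animate/inanimate noun lists, the data representation, and the classifier-score hook are illustrative assumptions; only the three numbered rules come from the poster.

```python
# Illustrative sketch of the internal animacy-based feedback rules.
# Produces agent / non-agent training labels for the identified arguments
# of one predicate, without any externally supplied role annotation.

ANIMATE_NOUNS = {"i", "you", "mommy", "daddy", "adam", "kitty"}   # assumed known concrete animates
INANIMATE_NOUNS = {"juice", "milk", "ball"}                       # assumed known concrete inanimates

def animacy_feedback(arguments, classifier_score):
    """arguments: the argument nouns for one predicate, in sentence order.
    classifier_score(noun) -> current SRL score for labeling this noun as agent.
    Returns a (possibly partial) {noun: 'A0' or 'not-A0'} map to use as feedback."""
    labels = {}
    animates = [a for a in arguments if a in ANIMATE_NOUNS]

    # Rule 1: animates are agents; inanimates are non-agents
    for a in arguments:
        if a in ANIMATE_NOUNS:
            labels[a] = "A0"
        elif a in INANIMATE_NOUNS:
            labels[a] = "not-A0"
        # unknown nouns receive no direct label from Rule 1

    # Rules 2 & 3: each verb has at most one agent; with two known animates,
    # let the current classifier pick the best agent and demote the rest
    if len(animates) > 1:
        best = max(animates, key=classifier_score)
        for a in animates:
            labels[a] = "A0" if a == best else "not-A0"

    # Rule 2, applied once more (one reading of the poster's example): once an
    # agent is found, any remaining unlabeled argument is treated as non-agent
    if any(label == "A0" for label in labels.values()):
        for a in arguments:
            labels.setdefault(a, "not-A0")
    return labels

# Usage: "I like dark bread" -> arguments [i, bread]; bread is an unknown noun
print(animacy_feedback(["i", "bread"], classifier_score=lambda n: 0.0))
# {'i': 'A0', 'bread': 'not-A0'}
```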
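And a small sketch of the test regime under "Testing the Baby SRL": generate "A krads B" items from a pool of animate nouns, classify each argument from its partial-representation features, and score how often the pre-verbal noun is labeled A0 and the post-verbal noun A1. The hard-coded weights stand in for whatever a trained model has learned; in the actual experiments the classifier is trained on the CHILDES corpora first.

```python
# Illustrative test harness for the "Adam krads Daddy!" generalization test.
from itertools import permutations

ANIMATES = ["Adam", "Mommy", "Daddy", "Ursula"]

# Stand-in weights for a trained classifier over partial-representation features
WEIGHTS = {"A0": {"noun_1_of_2": 1.0, "before_verb": 1.0},
           "A1": {"noun_2_of_2": 1.0, "after_verb": 1.0}}

def classify(feats):
    """Return the higher-scoring role for one argument's feature set."""
    score = lambda role: sum(WEIGHTS[role].get(f, 0.0) for f in feats)
    return "A0" if score("A0") >= score("A1") else "A1"

def test_word_order():
    """For every ordered pair of animate nouns in 'X krads Y', check whether
    the pre-verbal noun is labeled A0 and the post-verbal noun A1."""
    items = list(permutations(ANIMATES, 2))
    correct = 0
    for first, second in items:
        feats_first = {f"word={first}", "verb=krad", "noun_1_of_2", "before_verb"}
        feats_second = {f"word={second}", "verb=krad", "noun_2_of_2", "after_verb"}
        # With a novel verb, the lexical features carry no learned weight, so the
        # classifier must rely on the structural features: the point of the test.
        correct += classify(feats_first) == "A0" and classify(feats_second) == "A1"
    return correct / len(items)

print(test_word_order())   # 1.0 with these stand-in weights
```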
Result 2: Partial sentence representations can be built via unsupervised clustering
[Figure: results as a function of the number of known seed nouns]

Question 3: Now for the Baby SRL
• Are partial sentence representations built via unsupervised clustering useful for learning new syntactic knowledge? With animacy-based feedback?
• Learning English word order, starting from scratch ...

Result 3: Partial sentence representations based on unsupervised clustering are useful

Summary
• Result 1: Partial sentence representations are useful for further syntax learning (gold arguments & feedback)
• Result 2: Partial sentence representations can be built via unsupervised clustering and the proposed linking rules:
  – Semantic bootstrapping for nouns, 'real' syntactic bootstrapping for verbs
• Result 3: Partial sentence representations based on unsupervised clustering are useful for further syntax learning
  – Robust learning based on a noisy unsupervised 'parse' of sentences and noisy animacy-based feedback
  – e.g., verbs precede objects in English

Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with an unlearned bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms.
(2) Early abstraction: Children are biased toward usefully abstract representations of language experience.
(3) Independent encoding of sentence structure: Children gather distributional facts about new verbs from listening experience.

A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children learn which words are the verbs by tracking their syntactic argument-taking behavior in sentences.
  "Hey, she pushed her." "Will you push me on the swing?" "John pushed the cat off the sofa ..."
  → Verb = push [noun1, noun2], a 2-participant relation

Acknowledgements
Yael Gertner, Jesse Snedeker, Lila Gleitman, Soondo Baek, Kate Messenger, Sylvia Yuan, Rose Scott
NSF, NICHD