BabySRL BUCLD 2011 - Cognitive Computation Group

The origin of syntactic bootstrapping:
A computational model
Michael Connor, Cynthia Fisher, & Dan Roth
University of Illinois at Urbana-Champaign
How do children begin to understand
sentences?
Topid rivvo den marplox.
Two problems of ambiguity
"the language"
"the world"
Topid rivvo den marplox.
eating (sheep, food)
feed (Johanna, sheep)
help (Daddy, Johanna)
scared (Johanna, sheep)
are-nice (sheep)
want (sheep, food)
getting away (brother)
Typical view: knowledge of meaning drives knowledge of syntax
"the sentence"
"the world"
Topid rivvo den marplox.
feed (Johanna, sheep)
Sentence-meaning pairs
With or without assumed innate constraints on syntax-semantics linking (e.g., Pinker, 1984, 1989; Tomasello, 2003)
The problem of sentence and verb meanings
• Predicate terms don't label events, but adopt different
perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette,
Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985)
"I'll put this here."
versus
"This goes here."
If we take seriously the ambiguity of scenes ...
"the sentence"
"the world"
Topid rivvo den marplox.
eating (sheep, food)
feed (Johanna, sheep)
help (Daddy, Johanna)
scared (Johanna, sheep)
are-nice (sheep)
want (sheep, food)
getting away (brother)
Syntactic bootstrapping
(Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985;
Naigles, 1990)
sentence structure → verbs' semantic predicate-argument structure
"I'll put this here."
versus
"This goes here."
How could this be?
How could aspects of syntactic structure guide early sentence
interpretation ... before the child has learned much about the
syntax and morphology of this language?
Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with a
bias toward one-to-one mapping between nouns in
sentences and semantic arguments of predicate terms.
(Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan,
Fisher & Snedeker, in press)
(2) Early abstraction: Children are biased toward abstract
representations of language experience. (Gertner, Fisher &
Eisengart, 2006; Thothathiri & Snedeker, 2008)
(3) Independent encoding of sentence structure: Children
gather distributional facts about verbs from listening
experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009;
Yuan & Fisher, 2009)
A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children can learn which
words are verbs by tracking their syntactic argument-taking behavior in sentences.
A computational model of syntactic
bootstrapping
• Computational experiments simulate a learner whose
knowledge of sentence structure is under our control
• We equip the model with an unlearned bias to map nouns
onto abstract semantic roles, and ask:
– Are partial representations based on sets of nouns
useful for learning more about the native language?
• First case study: learning English word order
– Could a learner identify verbs as verbs by noting their
tendency to occur with particular numbers of
arguments?
Computational Model of Semantic Role
Labeling (SRL)
• A semantic analysis of sentences at the level of who does
what to whom
• For each verb in the sentence:
– SRL system tries to identify all constituents that fill a
semantic role
– and to assign them roles (agent, patient, goal, etc.)
[Example: "Remember what Daddy said." Two predicates: remember (V), with A1 (patient) = "what Daddy said"; say (V), with A0 (agent) = "Daddy" and A1 (patient) = "what".]
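To make the target representation concrete, here is a minimal sketch, in Python with hypothetical class names, of the kind of predicate-argument structure an SRL system outputs; this is illustrative, not the actual system's code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Argument:
    text: str    # the constituent filling the role
    label: str   # PropBank-style macro-role label: "A0", "A1", ...

@dataclass
class Proposition:
    predicate: str                      # the verb
    arguments: List[Argument] = field(default_factory=list)

# "Remember what Daddy said." yields two propositions:
props = [
    Proposition("remember", [Argument("what Daddy said", "A1")]),
    Proposition("say", [Argument("Daddy", "A0"), Argument("what", "A1")]),
]
for p in props:
    print(p.predicate, [(a.label, a.text) for a in p.arguments])
```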
Semantic Role Labeling (SRL): The basic idea
[Pipeline diagram, illustrated on "I like dark bread.":
1. Parsing
2. Argument Identification (tree features such as NP < S, NP < VP)
3. Predicate Identification
4. Feature Extraction (arg=I, arg=bread, verb=like)
5. Argument Classification (A0 for "I", A1 for "bread")
Training uses external labeled feedback: "I is an agent", "bread is a patient".]
The nature of the semantic roles
• A key assumption of the SRL is that the semantic roles are
abstract:
– This is a property of the PropBank annotation scheme:
verb-specific roles are grouped into macro-roles (Palmer,
Gildea, & Kingsbury, 2005; cf. Dowty, 1991)

        Arg0 (Agent)   Arg1 (Patient/Theme)   Arg2
Like:   Liker          Object of affection
Give:   Giver          Thing given            Recipient
Have:   Owner          Possession
• So, like children, an SRL learns to identify agents and
patients in sentences, not givers and things-given.
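As a toy illustration of this grouping (a hypothetical rendering, not PropBank's actual frame-file format), verb-specific roles collapse into shared macro-roles like this:

```python
# Verb-specific role descriptions, keyed by macro-role label
FRAMES = {
    "like": {"A0": "liker", "A1": "object of affection"},
    "give": {"A0": "giver", "A1": "thing given", "A2": "recipient"},
    "have": {"A0": "owner", "A1": "possession"},
}
MACRO = {"A0": "agent", "A1": "patient/theme"}   # the learner's target

for verb, roles in FRAMES.items():
    for label, specific in roles.items():
        print(f"{verb}: {specific} -> {MACRO.get(label, label)}")
```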
This SRL knows the grammar ... & reads minds
[Same pipeline diagram, for "I like dark bread.": the system is handed a gold-standard analysis at every stage, and external labeled feedback ("I is an agent", "bread is a patient") supervises argument classification.]
This SRL knows the grammar ... & reads minds
[Same diagram for the child's situation, "Topid rivvo den marplox.": every stage is unknown (arg=?, verb=???), and the only feedback question is "what are they talking about?"]
Baby SRL
[Pipeline diagram for "I like dark bread.":
1. HMM (trained on CDS) tags the words with states 58, 78, 50, 48, 1
   (58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk)
2. Argument Identification: states containing seed nouns are marked N ("I", "bread")
3. Predicate Identification: argument-count statistics (78: 13% 1-N, 54% 2-N; 50: 41% 1-N, 31% 2-N) mark "like" as V
4. Feature Extraction: arg=I, verb=like, 1st of two, before verb; arg=bread, verb=like, 2nd of two, after verb
5. Argument Classification: A0 for "I", A1 for "bread"
Training uses external labeled feedback: "I is an agent", "bread is a patient".]
Baby SRL
[Same pipeline, now trained with internal animacy feedback instead of external labels: "I is animate" → A0; "bread is unknown, agent is I" → not A0.]
Unsupervised Part-of-Speech Clustering using
a Hidden Markov Model (HMM)
• Yields context-sensitive clustering of words based on
sequential dependencies (cf. Bannard et al., 2004; Chang et al.,
2006; Mintz, 2003; Solan et al., 2005)
• A priori split between content and function words based on
phonological differences (e.g., Shi et al., 1998; Connor et al.,
2010)
– 80 hidden states, 30 pre-allocated to function words
• Trained on ~ 1 million words of unlabeled text from
CHILDES
[Example: the HMM tags "I like dark bread ." with states 58, 78, 50, 48, 1.
58: I, you, ya, Kent
78: like, got, did, had
50: many, nice, good
48: more, juice, milk]
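As a rough sketch of this clustering step, the following uses the third-party hmmlearn package; the tiny corpus, the state count, and all names here are stand-ins, and the pre-allocation of states to function words is omitted:

```python
import numpy as np
from hmmlearn import hmm

sentences = [["I", "like", "dark", "bread"], ["you", "got", "more", "juice"]]
vocab = {w: i for i, w in enumerate({w for s in sentences for w in s})}

# Concatenate all sentences into one symbol sequence, with lengths
X = np.concatenate([[vocab[w] for w in s] for s in sentences]).reshape(-1, 1)
lengths = [len(s) for s in sentences]

model = hmm.CategoricalHMM(n_components=8, n_iter=50, random_state=0)
model.fit(X, lengths)                  # Baum-Welch EM over word sequences

states = model.predict(X, lengths)     # each word token gets a hidden state
print(list(zip([w for s in sentences for w in s], states)))
```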
Linking rules
• But these clusters are unlabeled
• How do we link unlabeled clusters to particular
grammatical categories, and thus to particular roles in
sentence interpretation (SRL)?
– nouns, which are candidate arguments
– and verbs, which take arguments
Linking rules: Nouns first
• Syntactic bootstrapping is grounded in noun learning (e.g.,
Gillette et al., 1999)
– Children learn the meanings of some nouns without
syntactic knowledge
• Each noun is assumed to be a candidate argument
– A simple form of semantic bootstrapping (Pinker, 1984)
• 10 to 75 seed nouns, sampled from the M-CDI (see the sketch after the diagram below)
[Diagram: HMM states containing seed nouns are marked N, so "I" (state 58) and "bread" (state 48) become candidate arguments in "I like dark bread."]
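A hedged sketch of this noun linking rule (SEED_NOUNS and the threshold are hypothetical stand-ins): an HMM state counts as a noun state if enough of its tokens are known seed nouns, and every token tagged with a noun state then becomes a candidate argument.

```python
from collections import Counter

SEED_NOUNS = {"I", "you", "juice", "milk", "bread"}   # sampled from the M-CDI

def noun_states(tagged_corpus, min_frac=0.3):
    """tagged_corpus: iterable of (word, state) pairs from the HMM."""
    total, hits = Counter(), Counter()
    for word, state in tagged_corpus:
        total[state] += 1
        if word in SEED_NOUNS:
            hits[state] += 1
    return {s for s in total if hits[s] / total[s] >= min_frac}

tagged = [("I", 58), ("like", 78), ("dark", 50), ("bread", 48),
          ("you", 58), ("got", 78), ("more", 50), ("juice", 48)]
print(noun_states(tagged))   # states 58 and 48 emit seed nouns -> argument slots
```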
Linking rules: What about verbs?
• Verbs take noun arguments
• Via syntactic bootstrapping, children identify a word as a
verb by tracking its argument-taking behavior in sentences
[Pipeline diagram: HMM tagging (stage 1) and seed-noun argument identification (stage 2) feed predicate identification (stage 3) for "I like dark bread."]
Linking rules: What about verbs?
• Collect statistics over the HMM training corpus: how often
each state occurs with a particular number of arguments
• For each SRL input sentence, choose the word whose state
is most likely to appear with the number of arguments
found in the current sentence (see the sketch after the diagram below)
[Diagram: argument-count statistics (78: 13% 1-N, 54% 2-N; 50: 41% 1-N, 31% 2-N) mark "like" (state 78) as V in "I like dark bread."]
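A minimal sketch of this verb linking rule (function and variable names are hypothetical; the counts echo the slide, where state 78 favors 2-noun frames and state 50 favors 1-noun frames):

```python
ARG_COUNT_STATS = {          # P(n arguments | state), estimated from corpus
    78: {1: 0.13, 2: 0.54},
    50: {1: 0.41, 2: 0.31},
}

def pick_predicate(tokens, noun_states):
    """tokens: list of (word, state); returns the best predicate word."""
    n_args = sum(1 for _, s in tokens if s in noun_states)
    candidates = [(w, s) for w, s in tokens if s not in noun_states]
    return max(candidates,
               key=lambda ws: ARG_COUNT_STATS.get(ws[1], {}).get(n_args, 0.0))[0]

tokens = [("I", 58), ("like", 78), ("dark", 50), ("bread", 48)]
print(pick_predicate(tokens, noun_states={58, 48}))   # -> "like"
```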
Baby SRL
[Feature extraction over the partial sentence representation of "I like dark bread.": for "I": arg=I, verb=like, 1st of two, before verb; for "bread": arg=bread, verb=like, 2nd of two, after verb.]
This is a partial sentence representation.
Baby SRL
[The same pipeline, trained with external labeled feedback: "I is an agent" → A0; "bread is a patient" → A1.]
This is a partial sentence representation.
Semantic-role feedback???
• Theories of language acquisition assume learning to
understand sentences is a partially-supervised task.
– use existing knowledge of words & syntax to assign a
meaning to a sentence
– the appropriateness of the meaning in context then
provides the feedback
[Johanna rivvo den sheep.]
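As a rough, purely illustrative sketch of how such feedback could drive learning (a generic mistake-driven perceptron update over a linear agent classifier, not the authors' actual training procedure):

```python
from collections import defaultdict

weights = defaultdict(float)           # feature -> weight for the role "A0"

def score(features):
    return sum(weights[f] for f in features)

def update(features, is_agent, lr=0.1):
    predicted = score(features) > 0
    if predicted != is_agent:          # mistake-driven update
        for f in features:
            weights[f] += lr if is_agent else -lr

# "I like dark bread.": feedback says "I" is an agent, "bread" is not.
update(["arg=I", "1st of two", "before verb"], is_agent=True)
update(["arg=bread", "2nd of two", "after verb"], is_agent=False)
print(dict(weights))
```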
Baby SRL with Animacy-based Feedback
[Internal animacy feedback: "I is animate" → A0; "bread is unknown, agent is I" → not A0]
Bootstrapped animacy feedback (a list of concrete nouns):
1. Animates are agents; inanimates are non-agents
2. Each verb has at most one agent
3. If there are two known animate nouns
– Use current SRL classifier to choose the best agent
(and use this decision as feedback to train the SRL)
• Note: Animacy feedback trains a simplified classifier, with
only an agent/non-agent role distinction
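Here is a minimal sketch of these three rules in Python; ANIMATE, INANIMATE, and choose_best_agent are hypothetical stand-ins for the model's known-noun lists and its current classifier:

```python
ANIMATE = {"I", "you", "Daddy", "Johanna"}
INANIMATE = {"juice", "milk"}

def animacy_feedback(nouns, choose_best_agent):
    animates = [n for n in nouns if n in ANIMATE]
    if len(animates) > 1:
        # Rule 3: with two known animates, let the current SRL classifier
        # pick the best agent (and that decision becomes the training signal)
        animates = [choose_best_agent(animates)]
    labels = {}
    for n in nouns:
        if n in animates:
            labels[n] = "A0"            # Rule 1: animates are agents
        elif n in INANIMATE or animates:
            labels[n] = "not A0"        # Rule 1 / Rule 2: at most one agent
        else:
            labels[n] = None            # no evidence either way
    return labels

print(animacy_feedback(["I", "bread"], choose_best_agent=lambda xs: xs[0]))
# {'I': 'A0', 'bread': 'not A0'}  -- matches the slide's feedback example
```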
Baby SRL
[Full pipeline again, now trained end-to-end with internal animacy feedback: "I is animate" → A0; "bread is unknown, agent is I" → not A0.]
Training the Baby SRL
• 3 corpora of child-directed speech
– Parental utterances annotated using the PropBank annotation scheme
– Speech to Adam, Eve, and Sarah (Brown, 1973)
Adam 01-23 (2;3 - 3;2)
• Train on 01-20: 3951 propositions, 8107 arguments
Eve 01-20 (1;6 - 2;3)
• Train on 01-18: 4029 propositions, 8499 arguments
Sarah 01-90 (2;3 - 4;1)
• Train on 01-83: 8570 propositions, 15599 arguments
Testing the Baby SRL
• Constructed test sentences like those used in experiments
with children
– Unknown verbs & two animate nouns force the SRL to
rely on syntactic knowledge
"Adam krads Daddy!"
Adam
Mommy
Daddy
Ursula
...
krad
Adam
Mommy
Daddy
Ursula
...
Testing the Baby SRL
• Compare systems trained on different syntactic features
derived from the partial sentence representation
• Lexical baseline: verb and argument word features only
(arg=I, verb=like)
• Noun Pattern: lexical features plus features such as
'1st of 2 nouns', '2nd of 2 nouns'
• Verb Position: lexical features plus 'before verb', 'after verb'
• Combined: all feature types
(arg=I, verb=like, 1st of two, before verb)
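A small sketch of how the four feature sets differ (the function and its parameters are hypothetical; the feature names follow the slides):

```python
def extract_features(arg, verb, arg_index, n_args, arg_before_verb, mode):
    feats = [f"arg={arg}", f"verb={verb}"]          # lexical baseline
    if mode in ("noun_pattern", "combined"):
        feats.append(f"{arg_index + 1} of {n_args} nouns")
    if mode in ("verb_position", "combined"):
        feats.append("before verb" if arg_before_verb else "after verb")
    return feats

# "Adam krads Daddy!": features for the first noun under each mode
for mode in ("lexical", "noun_pattern", "verb_position", "combined"):
    print(mode, extract_features("Adam", "krad", 0, 2, True, mode))
```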
Question 1: Are partial sentence representations
useful, in principle, for learning new syntactic
knowledge?
Learning English word order
Start with "gold standard" part-of-speech tagging
and "gold standard" semantic role feedback
Are partial sentence representations useful as a
starting point for learning?
• We assume that representations as simple as an ordered set
of nouns can yield useful information about verb argument
structure and sentence interpretation.
• Is this true in ordinary samples of language use, despite all
the noise this simple representation adds to the data?
Result 1: Partial sentence representations are
useful for further syntax learning (A krads B)
Question 2: Can partial sentence
representations be built via unsupervised
clustering ... and the proposed linking rules?
[Same linking-rules pipeline diagram as above: HMM tagging, seed-noun argument identification, and argument-count predicate identification for "I like dark bread."]
Result 2: Partial sentence representations can
be built via unsupervised clustering
[Plot: performance as a function of the number of known seed nouns.]
Question 3: Now for the Baby SRL
Are partial sentence representations built via unsupervised
clustering useful for learning new syntactic knowledge? With
animacy-based feedback?
Learning English word order
Starting from scratch ...
Result 3: Partial sentence representations based
on unsupervised clustering are useful
SRL Summary
• Result 1: Partial sentence representations are useful for
further syntax learning (gold arguments & feedback)
• Result 2: Partial sentence representations can be built via
unsupervised clustering, and the proposed linking rules:
– Semantic bootstrapping for nouns, 'real' syntactic
bootstrapping for verbs
• Result 3: Partial sentence representations based on
unsupervised clustering are useful for further syntax learning
– Robust learning based on a noisy unsupervised 'parse' of
sentences, and noisy animacy-based feedback
– e.g., learning that verbs precede their objects in English
Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with
an unlearned bias toward one-to-one mapping between
nouns in sentences and semantic arguments of predicate
terms.
(2) Early abstraction: Children are biased toward usefully
abstract representations of language experience.
(3) Independent encoding of sentence structure: Children
gather distributional facts about new verbs from listening
experience.
A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children learn which
words are the verbs by tracking their syntactic argument-taking behavior in sentences.
Hey, she pushed her.
Will you push me on the swing?
John pushed the cat off the sofa.
...
Verb = push: [noun1, noun2] → a 2-participant relation
Acknowledgements
Yael Gertner
Jesse Snedeker
Lila Gleitman
NSF
NICHD
Soondo Baek
Kate Messenger
Sylvia Yuan
Rose Scott