Statistical Language Learning:
Mechanisms and Constraints
Jenny R. Saffran
Department of Psychology & Waisman Center
University of Wisconsin-Madison
What kinds of learning mechanisms
do infants possess?
• How do infants master complex bodies of knowledge?
• Learning requires both experience & innate structure → a bridge between nature & nurture?
– Constraints on learning: computational, perceptual, input-driven, maturational… all neural, though we are not working at that level of analysis
Language acquisition:
Experience versus innate structure
• How much of language acquisition can be
explained by learning?
– Language-specific linguistic structures
• Learning does not offer transparent explanations…
– How is abstract linguistic structure acquired?
– Why are human languages so similar?
– Why can’t non-human learners acquire human
language?
Today’s talk:
Consider a new approach to language
learning that may begin to address
some of these central outstanding issues
in the study of language & beyond
Statistical Learning
P(Y | X) = freq(XY) / freq(X)
• What computations are performed?
• What are the units over which computations are performed?
• Are these the right computations & units given the structure of human languages?
Breaking into language
Word segmentation
Word segmentation cues
• Words in isolation
• Pauses/utterance boundaries
• Prosodic cues (e.g., word-initial stress in
English)
• Correlations with objects in the environment
• Phonotactic/articulatory cues
• Statistical cues
Statistical learning
PRE → TTY → BA → BY
High likelihood within words (PRE→TTY, BA→BY); low likelihood across the word boundary (TTY→BA)
Continuations within words are systematic
Continuations between words are arbitrary
Transitional probabilities
PRETTY BABY
P(ty | pre) = freq(pretty) / freq(pre) = .80
versus
P(ba | ty) = freq(tyba) / freq(ty) = .0002
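As a concrete sketch of this computation (illustrative only; the toy stream below is hypothetical, not the original corpus), transitional probabilities can be tallied from syllable bigram counts:

```python
# A minimal sketch of the transitional probability computation:
# P(Y | X) = freq(XY) / freq(X), tallied over a syllabified stream.
# The toy stream is hypothetical, chosen so that "pre" reliably
# predicts "ty" while "ty" continues in many different ways.
from collections import Counter

def transitional_probs(syllables):
    """Map each adjacent pair (X, Y) to P(Y | X) = freq(XY) / freq(X)."""
    pair_freq = Counter(zip(syllables, syllables[1:]))
    first_freq = Counter(syllables[:-1])  # count X only where some Y follows
    return {(x, y): n / first_freq[x] for (x, y), n in pair_freq.items()}

stream = ["pre", "ty", "ba", "by", "pre", "ty", "do", "ggy",
          "pre", "ty", "ki", "tty", "pre", "ty", "ba", "by"]
tp = transitional_probs(stream)
print(tp[("pre", "ty")])  # 1.0 -- within a word: "pre" always continues to "ty"
print(tp[("ty", "ba")])   # 0.5 -- across a word boundary: far less predictable
```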
Infants can use statistical cues to
find word boundaries
• Saffran, Aslin, & Newport (1996)
– 2 minute exposure to a nonsense language
(tokibu, gopila, gikoba, tipolu)
– Only statistical cues to word boundaries
– Tested on discrimination between words and
part-words (sequences spanning word boundaries)
Experimental setup
Headturn Preference Procedure
tokibugikobagopilatipolutokibu
gopilatipolutokibugikobagopila
gikobatokibugopilatipolugikoba
tipolugikobatipolugopilatipolu
tokibugopilatipolutokibugopila
tipolutokibugopilagikobatipolu
tokibugopilagikobatipolugikoba
tipolugikobatipolutokibugikoba
gopilatipolugikobatokibugopila
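A sketch of how a learner could exploit such a stream (assumed setup for illustration; not the actual stimuli or analysis code): concatenate the four nonsense words in random order and compare within-word to across-boundary transitional probabilities.

```python
# Minimal sketch: build a random stream from the four nonsense words
# (tokibu, gopila, gikoba, tipolu), then compute syllable-to-syllable
# transitional probabilities. Within-word transitions approach 1.0;
# transitions spanning a word boundary hover near 1/3, marking the
# "part-words" used as test items. The no-immediate-repeat rule and
# the stream length are assumptions for illustration.
import random
from collections import Counter

words = ["to ki bu", "go pi la", "gi ko ba", "ti po lu"]
random.seed(0)
stream, prev = [], None
for _ in range(300):
    w = random.choice([x for x in words if x != prev])  # no immediate repeats
    stream += w.split()
    prev = w

pair_freq = Counter(zip(stream, stream[1:]))
first_freq = Counter(stream[:-1])
tp = {(x, y): n / first_freq[x] for (x, y), n in pair_freq.items()}

print(tp[("to", "ki")], tp[("ki", "bu")])  # within "tokibu": both 1.0
print(tp[("bu", "go")])                    # boundary into "gopila": ~0.33
```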
Results
[Bar graph: looking times (sec, 0–8) for Words vs. Part-words; * = significant difference]
Detecting sequential probabilities
• Statistical learning for word segmentation
– Infants track transitional probabilities, not frequencies of co-occurrence (Aslin, Saffran, & Newport, 1997)
– The first usable cue to word boundaries: use of statistical cues precedes use of lexical stress cues (Thiessen & Saffran, 2003)
– Statistical learning is facilitated by the intonation contours of infant-directed speech (Thiessen, Hill, & Saffran, 2005)
– Infants treat “tokibu” as an English word (Saffran, 2001)
– Emerging “words” feed into syntax learning (Saffran & Wilson, 2003)
• Other statistics are useful for learning phonetic categories, lexical categories, etc.
• Beyond language: Domain generality
– Tone sequences (Saffran et al., 1999; Saffran & Griepentrog, 2001)
golabupabikututibudaropi… → A C# E D G F C B G# A# F# D#…
– Visuospatial & visuomotor sequences (Hunt & Aslin, 2000; Fiser & Aslin, 2003)
– Even non-human primates can do it! (Hauser, Newport, & Aslin, 2001)
So does statistical learning really tell
us anything about language learning?
Language acquisition:
Experience versus innate structure
• How much of language acquisition can be
explained by learning?
– Language-specific linguistic structures

• Learning does not offer transparent explanations…
– How is abstract linguistic structure acquired?
– Why are human languages so similar?
– Why can’t non-human learners acquire human
languages?
Acquisition of basic phrase structure
• Words occur serially, but representations of sentences contain clumps
of words (phrases)
How is this structure acquired? Where does it come from?
• Innately endowed as part of Universal Grammar (X-bar theory)?
• Prosodic cues? (probabilistically available)
• Predictive dependencies as cues to phrase units cross-linguistically (cf. mid-20th-century structural linguistics: phrasal diagnostics)
– Nouns often occur without articles, but articles usually require nouns:
*The walked down the street.
– NP often occurs without prepositions, but P usually requires NP
*She walked among.
– NP often occurs without Vtrans, but Vtrans usually requires object NP
*The man hit.
Statistical cue to phrase boundaries
• Unidirectional predictive dependencies → high conditional probabilities (see the worked example below)
• Can humans use predictive dependencies to find
phrase units? (Saffran, 2001)
– Artificial grammar learning task
– Dependencies were the only phrase structure cues
– Adults & kids learned the basic structure of the language
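To make the unidirectionality concrete, here is a worked toy example (all counts hypothetical): the forward conditional probability stays high while a phrase continues and drops where a phrase may end.

```python
# Hypothetical counts, for illustration only: an article almost always
# continues to a noun (high forward probability, same phrase), but a
# noun's continuation is far less constrained (lower forward
# probability, possible phrase boundary).
freq = {
    "the": 1000,        # occurrences of "the"
    "the noun": 990,    # "the" immediately followed by a noun
    "noun": 3000,       # occurrences of nouns
    "noun verb": 600,   # a noun immediately followed by a verb
}
p_within_phrase = freq["the noun"] / freq["the"]      # 0.99
p_across_boundary = freq["noun verb"] / freq["noun"]  # 0.20
print(p_within_phrase, p_across_boundary)
```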
Statistical cue to phrase boundaries
• Predictive dependencies assist learners in the
discovery of abstract underlying structure.
→ Predicts better phrase structure learning when predictive dependencies are available than when they are not.
**Constraint on learning: Provides potential
learnability explanation for why languages so
frequently contain predictive dependencies**
Do predictive dependencies enhance learning?
Methodology: Contrast the acquisition of two
artificial grammars (Saffran, 2002)
• Predictive language
- Contains predictive dependencies between
word classes as a cue to phrasal units
• Non-predictive language
- No predictive dependencies between
word classes
Predictive language
S → AP + BP + (CP)
AP → A + (D)    [surface strings: A, A D]
BP → CP + F
CP → C + (G)    [surface strings: C, C G]
A = BIFF, SIG, RUD, TIZ
Note: Dependencies run in the opposite direction from English (head-final language)
Non-predictive language
S → AP + BP
AP → {(A) + (D)}    [surface strings: A, D, A D]
BP → CP + F
CP → {(C) + (G)}    [surface strings: C, G, C G]
e.g., in English: *NP → {(Det) + (N)}    [surface strings: Det, N, Det N]
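A sketch of the two grammars as sentence generators (the A words are from the slide above; the other word lists and the probabilities of optional elements are hypothetical assumptions):

```python
# Minimal generators for the two artificial grammars. In the predictive
# language, D only ever follows A and G only ever follows C, so each
# word class predicts its phrase; in the non-predictive language the
# optional elements can also appear alone. Word lists for D, C, G, F
# are hypothetical placeholders (only category A is given on the slide).
import random

A = ["BIFF", "SIG", "RUD", "TIZ"]
D, C, G, F = ["KLOR", "PELL"], ["CAV", "LUM"], ["NEB", "TAM"], ["DUPP", "HEP"]

def maybe(xs):
    """An optional constituent: present with assumed probability 0.5."""
    return [random.choice(xs)] if random.random() < 0.5 else []

def one_of(xs, ys):
    """{(X) + (Y)} with at least one element: X, Y, or X Y."""
    while True:
        out = maybe(xs) + maybe(ys)
        if out:
            return out

def cp_predictive():
    return [random.choice(C)] + maybe(G)              # CP -> C + (G)

def predictive_sentence():
    # S -> AP + BP + (CP); AP -> A + (D); BP -> CP + F
    ap = [random.choice(A)] + maybe(D)                # D never appears alone
    bp = cp_predictive() + [random.choice(F)]
    cp = cp_predictive() if random.random() < 0.5 else []
    return " ".join(ap + bp + cp)

def nonpredictive_sentence():
    # S -> AP + BP; AP -> {(A) + (D)}; BP -> CP + F; CP -> {(C) + (G)}
    return " ".join(one_of(A, D) + one_of(C, G) + [random.choice(F)])

print(predictive_sentence())
print(nonpredictive_sentence())
```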
Predictive vs. Non-predictive language comparison

                       Predictive   Non-predictive
Sentence types             12              9
Five-word sentences        33%            11%
Three-word sentences       11%            44%
Lexical categories          5              5
Vocabulary size            16             16
Experiment 1
• Participants: Adults & 6- to 9-year-olds
• Predictive versus Non-predictive phrase structure languages
– Language: Between-subject variable
– Incidental learning task
– 40 min. auditory exposure, with descending sentential prosody
BIFF HEP LUM DUPP.  RUD KLOR CAV LUM TIZ.
• Auditory forced-choice test
– Novel grammatical vs. novel ungrammatical
– Same test items for all participants
Results
[Bar graph: mean score (chance = 15, scale 0–30) for Adults and Children, Predictive language vs. Non-predictive language; * = significant difference]
Experiment 2: Effect of predictive dependencies
beyond the language domain?
• Same grammars, different vocabulary:
• Nonlinguistic materials: Alert sounds
• Exp. 1 materials (Predictive & Non-predictive grammars
and test items), translated into non-linguistic vocabulary
• Adult participants
Linguistic versus non-linguistic
[Bar graph: mean score (chance = 15, scale 0–30), Predictive vs. Non-predictive language, for Linguistic (Experiment 1) and Non-linguistic (Experiment 2) materials; * = significant difference]
New auditory non-linguistic task:
Predictive vs. Non-predictive languages
Non-linguistic replication
[Bar graph: mean score (chance = 15, scale 0–30), Predictive vs. Non-predictive language, for Linguistic (Exp 1), Non-linguistic (Exp 2), and Non-linguistic replication (Exp 3); * = significant difference]
Predictive language > Non-predictive language
• Predictive dependencies play a role in learning
– For both linguistic & non-linguistic auditory materials
• Also seen for simultaneous visual displays
• But not for sequential visual displays → modality effects
• Human languages may contain predictive dependencies because they assist the learner in finding structure.
• The structure of human languages may have been shaped by human learning mechanisms.
→ Predicts different patterns of learning for appropriately aged human learners versus non-human learners.
Infant/Tamarin comparison: Methodology
(with Marc Hauser @ Harvard)
Headturn Preference Procedure:
Laboratory exposure
Test: Measure looking times
Orienting Procedure:
Home cage exposure
Test: Measure % orienting responses
Paired methods previously used in studies of word segmentation, simple grammars, etc.
(Hauser, Newport, & Aslin, 2001; Hauser, Weiss, & Marcus, 2002; etc.)
Materials
• Predictive vs. Non-Predictive languages (between subjects)
• Small Grammar: Used to validate methodology
– Grammars written over individual words, not categories
(one A word, one C word, etc.)
– 8 sentences, repeated
– 2 min. exposure (infants) or 2 hrs. exposure (tamarins)
– Grammatical (familiar) vs. ungrammatical test items
• Large Grammar: Languages from adult studies
– Grammars written over categories (category A, C, etc.)
– 50 sentences, repeated
– 21 min. exposure (infants) or 2 hrs. exposure (tamarins)
– Grammatical (novel) vs. ungrammatical test items
Tamarin results
[Bar graphs: % orienting responses (0–100) to Grammatical (G) vs. Ungrammatical (UG) items, Predictive and Non-Predictive conditions. A. Small grammar (* = significant G/UG difference). B. Large grammar.]
Infant results (12-month-olds, 12 per group)
[Bar graphs: looking times (sec, 0–10) to Grammatical vs. Ungrammatical items, Predictive vs. Non-predictive conditions, for the Small grammar and the Large grammar; * = significant difference]
Cross-species differences
• Small grammar vs. large grammar
– Tamarins only learned the small grammar
• Difficulty with generalization? Memory for sentence exemplars?
• Can learn patterns over individual elements but not categories?
– Infants learned both systems, despite size of large grammar
• Availability of predictive dependencies
– Only affected the tamarins learning the small grammar
– Affected the infants regardless of the size of the grammar
• Consistent with the constrained statistical learning hypothesis → human learning mechanisms may have shaped the structure of natural languages
Constrained statistical learning as a theory of
language acquisition?
• Word segmentation, aspects of phonology, aspects of syntax
• Developing the theory
– Scaling up: Multiple probabilistic cues in the input (e.g., prosodic cues), multiple
levels of language in the input, more realistic speech (e.g., IDS)
– Mapping to meaning: Are statistically-segmented ‘words’ good labels?
– Critical period effects: Exogenous constraints on statistical learning
– Modularity: Distinguishing domain-specific & domain-general factors
• e.g., statistical learning of “musical syntax”
– Bilingualism: Separating languages & computing separate statistics
– Relating to real acquisition outcomes: Individual differences
• Congenital amusia study with Isabelle Peretz, U. de Montréal
• Specific Language Impairment study with Dr. Julia Evans, UW-Madison
Conclusions
• Infants are powerful language learners: Rapid acquisition
of complex structure without external reinforcement
• However, humans are constrained in the types of patterns
they readily acquire
• Understanding what is *not* learnable may be just as
valuable as cataloging what infants *can* learn
→ These predispositions may be among the factors that have shaped the structure of human language
Acknowledgements
Infant Learning Lab, UW-Madison
• National Institutes of Health R01 HD37466, P30 HD03352
• National Science Foundation PECASE BCS-9983630
• UW-Madison Graduate School
• UW-Madison Waisman Center
• Members of the Infant Learning Lab
• All the parents and babies who have participated!