Continuous acoustic detail affects
spoken word recognition
Implications for
cognition,
development
and language
disorders.
Bob McMurray
University of Iowa
Dept. of Psychology
Collaborators
Richard Aslin
Michael Tanenhaus
David Gow
J. Bruce Tomblin
Joe Toscano
Cheyenne Munson
Dana Subik
Julie Markant
Why Speech and Word Recognition
1) Interface between perception and cognition.
- Basic Categories
- Meaning
- Continuous Input -> Discrete representations.
2) Meaningful stimuli are almost always temporal.
- Music
- Visual Scenes (across saccades)
- Language
3) We understand the:
- Cognitive processes (word recognition)
- Perceptual processes (speech perception)
- Ecology of the input (phonetics)
4) Speech is important: disordered language.
Divisions, Divisions…
[Diagram: fields arrayed between Psychology, Linguistics, and Speech & Hearing: Speech Perception, Phonetics, Phonology & The Lexicon, Word Recognition & Sentence Processing, Speech/Language Pathology, Perception (& Action), Cognition, Language.]
Divisions, Divisions…
Divisions useful for framing research and focusing
questions.
But:
Divisions between domains of study
can become…
Implicit models of cognitive processing.
Divisions in Spoken Language Understanding
Speech Perception
• Categorization of acoustic input into sublexical units.
Word Recognition
• Identification of target word from active sublexical units.
[Diagram: Acoustic input -> Sublexical Units (/a/, /la/, /b/, /ip/, /l/, /p/) -> Lexicon.]
Divisions yield processes
Speech Perception
• Pattern Recognition
• Normalization Processes
• Stream Segregation
Word Recognition
• Competition
• Activation
• Constraint Satisfaction
[Diagram: Acoustic input -> Sublexical Units -> Lexicon.]
Processes yield models
Speech Perception
• Extract invariant phonemes and features.
• Discard continuous variation.
Word Recognition
• Identify single referent.
• Ignore competitors.
[Diagram: Acoustic input -> (reduce continuous variance) -> Sublexical Units -> (reduce variance) -> Lexicon.]
The Variance Reduction Model
[Diagram: Acoustic input -> (remove variance) -> Phonemes (etc.) -> (remove variance) -> Words.]
Variance Reduction Model (VRM)
Understanding speech is a process of
progressively extracting invariant,
discrete representations from
variable, continuous input.
Continuous speech cues play a minimal role in word recognition (and probably wouldn't be helpful anyway).
Temporal Integration
The VRM might apply if speech were static.
"Goon". Goal: Identify /u/.
Signal: Low F1, F2; High F3.
Noise (for Variance Reduction Mechanisms):
• Initially: F2 decreasing.
• Later: F2 increasing.
• Presence of anti-formant.
Temporal Integration
But the dynamic properties make it more difficult.
"Goon". Goal: Identify /u/.
Signal: Low F1, F2; High F3.
Noise:
• Initially: F2 decreasing (already gone; maybe in STM?).
• Later: F2 increasing; presence of anti-formant (hasn't happened yet).
Temporal Integration
But the dynamic properties make it more difficult.
"Goon". Goal: Identify /u/.
Signal: Low F1, F2; High F3.
Signal′ (Variance Utilization Mechanisms):
• Initially: F2 decreasing (reflects the prior /g/).
• Later: F2 increasing; presence of anti-formant (anticipates the upcoming /n/).
Goals
1) Replace the Variance Reduction Model with the Variance Utilization Model.
[Diagram: the VRM pipeline (Phonemes (etc.) -> Words, removing variance at each step) to be replaced.]
2) Normal lexical activation processes can serve as variance utilization mechanisms.
3) Speculatively (and not so speculatively) examine the consequences for:
• Temporal Integration / Short-Term Memory.
• Development.
• Non-normal Development.
Outline
1) Review
• Origins of the VRM.
• Spoken Word Recognition.
2) Empirical Test
3) The VUM
• Lexical Locus
• Temporal Integration
• SLI proposal
4) Developmental Consequences
• Empirical Tests
• Computational Model
• CI proposal
Word Recognition
Online Spoken Word Recognition
• Information arrives sequentially.
• Fundamental problem: at early points in time, the signal is temporarily ambiguous.
[Diagram: "ba…" is consistent with basic, bakery, barrier, bait, barricade, baby; all but the target are eventually ruled out.]
• Later-arriving information disambiguates the word.
Word Recognition
Current models of spoken word recognition
• Immediacy: Hypotheses formed from the earliest
moments of input.
• Activation Based: Lexical candidates (words) receive
activation to the degree they match the input.
• Parallel Processing: Multiple items are active in
parallel.
• Competition: Items compete with each other for
recognition.
Word Recognition
[Diagram: as the input "b… u… tt… e… r" unfolds over time, beach, butter, bump and putter (but not dog) are activated and winnowed.]
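The four principles above (immediacy, activation, parallelism, competition) lend themselves to a toy sketch. The lexicon, the prefix-matching rule, and the normalization step below are illustrative assumptions, not the mechanics of any published model:

```python
# A toy sketch of immediacy, graded activation, parallelism, and
# competition -- hypothetical lexicon and matching rule.
LEXICON = ["beach", "butter", "bump", "putter", "dog"]

def activations(heard, lexicon=LEXICON):
    """Graded activation: fraction of the heard prefix each word matches."""
    acts = {}
    for word in lexicon:
        matched = 0
        for h, w in zip(heard, word):
            if h != w:
                break
            matched += 1
        acts[word] = matched / len(heard)
    # Competition: normalize, so strongly active words suppress the rest.
    total = sum(acts.values()) or 1.0
    return {w: a / total for w, a in acts.items()}

# Immediacy + parallelism: candidates are active from the first segment on.
for prefix in ["b", "bu", "but", "butt", "butter"]:
    acts = activations(prefix)
    top = max(acts, key=acts.get)
    print(f"{prefix:>6}: best={top} ({acts[top]:.2f})")
```

Early in the input several candidates tie; each additional segment sharpens the distribution toward the target.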
Word Recognition
These processes have been well defined for a phonemic representation of the input (a string of discrete phoneme symbols).
Considerably less ambiguity if we consider subphonemic information.
• Bonus: processing dynamics may solve problems in speech perception.
Example: subphonemic effects of motor processes.
Coarticulation
Any action reflects future actions as it unfolds.
Example: Coarticulation
Articulation (lips, tongue…) reflects current, future and past events.
Subtle subphonemic variation in speech reflects temporal organization.
[Diagram: overlapping articulations (e.g., n…e…t vs. n…e…ck).]
Sensitivity to these perceptual details might yield earlier disambiguation.
Lexical activation could retain these perceptual details.
Review:
These processes have largely been ignored
because of a history of evidence that perceptual
variability gets discarded.
Example: Categorical Perception
Categorical Perception
[Figure: identification (% /pa/) as a function of VOT shows a sharp B-P boundary; discrimination peaks at the boundary.]
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
Categorical Perception
Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)
Variance Reduction Model
CP enabled a fundamental independence of speech perception & spoken word recognition.
Evidence against CP seen as supporting the VRM (auditory vs. phonological processing mode).
Critical prediction: continuous variation in the signal should not affect word recognition.
[Diagram: Phonemes (etc.) -> (remove variance) -> Words.]
Experiment 1
Does within-category acoustic detail
systematically affect higher level
language?
Is there a gradient effect of subphonemic
detail on lexical activation?
McMurray, Aslin & Tanenhaus (2002)
A gradient relationship would yield systematic effects of
subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be
preserved over time.
Need a design sensitive to both acoustic detail and detailed
temporal dynamics of lexical activation.
Acoustic Detail
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
9-step VOT continua (0-40 ms), 6 pairs of words:
beach/peach, bump/pump, bale/pale, bomb/palm, bear/pear, butter/putter
Fillers (l- and sh- words):
lock, shoe, lip, sheep, lamp, shark, leg, shell, ladder, ship, leaf, shirt
Acoustic Detail
Temporal Dynamics
How do we tap on-line recognition?
With an on-line task: Eye-movements
Subjects hear spoken language and manipulate objects in
a visual world.
Visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items.
Eye-movements to each object are monitored throughout
the task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Temporal Dynamics
Why use eye-movements and visual world paradigm?
• Relatively natural task.
• Eye-movements generated very fast (within 200ms of
first bit of information).
• Eye movements time-locked to speech.
• Subjects aren’t aware of eye-movements.
• Fixation probability maps onto lexical activation.
Task
Subjects get a moment to view the items, then hear the target word (e.g., "Bear") and select the matching picture. Repeat 1080 times.
Identification Results
[Figure: proportion /p/ responses as a function of VOT (0-40 ms), rising sharply from B to P at the boundary.]
High agreement across subjects and items for category boundary.
By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms
Eye-Movement Analysis
[Diagram: % fixations computed across trials, aligned to word onset with a 200 ms oculomotor delay.]
Target = Bear; Competitor = Pear; Unrelated = Lamp, Ship.
Eye-Movement Results
[Figure: fixation proportions over time (0-2000 ms) for VOT=0 (response B) and VOT=40 (response P); target fixations rise while competitor and unrelated fixations stay low.]
More looks to competitor than unrelated items.
Eye-Movement Results
Given that the subject heard "bear" and clicked on "bear"… how often was the subject looking at the "pear"?
[Schematic: categorical results predict identical competitor fixations across within-category VOTs; a gradient effect predicts competitor fixations that vary with VOT.]
Eye-Movement Results
[Figure: competitor fixations over time (0-2000 ms), one curve per VOT step (0-40 ms in 5 ms steps), split by response (B vs. P).]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Eye-Movement Results
[Figure: competitor fixations (area under the curve) as a function of VOT (0-40 ms); looks to the P-competitor and B-competitor cross at the category boundary.]
Area under the curve, clear effects of VOT: B p=.017*, P p<.001***.
Linear trend: B p=.023*, P p=.002***.
Eye-Movement Results
[Figure: as above, restricted to unambiguous stimuli.]
Unambiguous stimuli only, clear effects of VOT: B p=.014*, P p=.001***.
Linear trend: B p=.009**, P p=.007**.
Summary
Subphonemic acoustic differences in VOT have a gradient effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
• Effect holds even for unambiguous stimuli.
• Seems to be long-lasting.
Consistent with growing body of work using priming
(Andruski, Blumstein & Burton, 1994; Utman, Blumstein &
Burton, 2000; Gow, 2001, 2002).
Extensions
Basic effect has been extended to other phonetic cues; a general property of word recognition:
• Voicing (b/p)¹
• Laterality (l/r), Manner (b/w), Place (d/g)¹
• Vowels (i/I, …)²
• Natural Speech (VOT)³
X Metalinguistic Tasks³
[Display: pictures of B-, P-, L-, and Sh- items; spoken target, e.g. "Bear".]
¹ McMurray, Clayards, Tanenhaus & Aslin (2004)
² McMurray & Toscano (in prep)
³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
Lexical Sensitivity
Basic effect has been extended to other phonetic cues; a general property of word recognition:
• Voicing (b/p)¹
• Laterality (l/r), Manner (b/w), Place (d/g)¹
• Vowels (i/I, …)²
• Natural Speech (VOT)³
X Metalinguistic Tasks³
[Figure: competitor fixations as a function of VOT (0-40 ms); looks to B under response=P and under response=B cross at the category boundary.]
¹ McMurray, Clayards, Tanenhaus & Aslin (2004)
² McMurray & Toscano (in prep)
³ McMurray, Aslin, Tanenhaus, Spivey and Subik (submitted)
The Variance Utilization Model
1) Word recognition is systematically sensitive to
subphonemic acoustic detail.
2) Acoustic detail is represented as gradations in activation
across the lexicon.
3) Normal word recognition processes do the work of:
• Maintaining detail
• Sharpening categories
• Anticipating upcoming material
• Resolving prior ambiguity.
The Variance Utilization Model
[Diagram: input "b… u… m… p…" over time activates bump, pump, dump, bun, bumper, bomb; the b/p cue modulates relative activation.]
Gradations in phonetic cues preserved as relative lexical activation.
The Variance Utilization Model
[Diagram: the same input, with the b/d cue modulating bump vs. dump.]
Gradations in phonetic cues preserved as relative lexical activation.
The Variance Utilization Model
[Diagram: vowel length cues modulate activation among the candidates (e.g., bump vs. bumper).]
Non-phonemic distinctions preserved (e.g. vowel length: Gow & Gordon, 1995; Salverda, Dahan & McQueen, 2003).
The Variance Utilization Model
[Diagram: the n/m information is lost once bun vs. bump is resolved.]
Material only retained until it is no longer needed. Words are a conveniently sized unit.
The Variance Utilization Model
[Diagram: candidate activations persist as the input unfolds.]
No need for explicit short-term memory: lexical activation persists over time.
The Variance Utilization Model
[Diagram: active candidates inhibit one another.]
Lexical competition: perceptual warping (à la CP) results from natural competition processes.
The Variance Utilization Model
Current models of spoken word recognition
• Immediacy: phonetic cues are not simultaneous; activation retains early cues.
• Activation Based: graded response to graded input.
• Parallel Processing: preserves alternative interpretations until confident; anticipatory activation for future possibilities.
• Competition: non-linear transformation of perceptual space.
The Variance Utilization Model
Current models of spoken word recognition
• Immediacy: phonetic cues are not simultaneous; activation retains early cues.
• Parallel Processing: preserves alternative interpretations until confident; anticipatory activation for future possibilities.
Can lexical activation help integrate continuous acoustic cues over time?
• Regressive ambiguity resolution.
• Anticipation of upcoming material.
Experiment 2: Regressive Ambiguity Resolution
How long are gradient effects of within-category detail maintained?
Can subphonemic variation play a role in ambiguity resolution?
How is information at multiple levels integrated?
Misperception
What if the initial portion of a stimulus was misperceived?
Competitor still active: easy to activate it the rest of the way.
Competitor completely inactive: system will "garden-path".
P(misperception) varies with distance from the boundary.
Gradient activation allows the system to hedge its bets.
Misperception
/ beIrəkeId / vs. / peIrəkit / (barricade vs. parakeet)
[Diagram: as "p/b eI r ə k i t…" unfolds, a categorical lexicon commits to one candidate, while gradient sensitivity keeps both parakeet and barricade partially active.]
Methods (McMurray, Tanenhaus & Aslin, in prep)
10 pairs of b/p items:
Voiced         Voiceless      Overlap
Bumpercar      Pumpernickel   6
Barricade      Parakeet       5
Bassinet       Passenger      5
Blanket        Plankton       5
Beachball      Peachpit       4
Billboard      Pillbox        4
Drain Pipes    Train Tracks   4
Dreadlocks     Treadmill      4
Delaware       Telephone      4
Delicatessen   Television     4
Methods
[Screenshot of the task display.]
Eye Movement Results
Barricade -> Parricade
[Figure: fixations to target over time (0-900 ms), one curve per VOT (0-35 ms).]
Faster activation of target as VOTs near the lexical endpoint, even within the non-word range.
Eye Movement Results
Barricade -> Parricade; Parakeet -> Barakeet
[Figure: fixations to target over time for both directions, one curve per VOT (0-35 ms).]
Faster activation of target as VOTs near the lexical endpoint, even within the non-word range.
Eye Movement Results
[Figure: VOT and lexical effect sizes over time (0-1600 ms); the VOT effect shrinks as the lexical effect grows.]
Effect of VOT reduced as lexical information takes over.
Experiment 2b
Are results driven by the presence of the visual competitor, or is this a natural process of lexical activation?
[Display without the competitor picture: "Look, Ma, no parakeet!"]
Experiment 2b: Results
[Figures: looks to barricade (for Barricade -> Parricade) and looks to parakeet (for Parakeet -> Barakeet) over time (0-1400 ms), one curve per VOT (0-45 ms).]
• Effect found even without visual competitor.
• Regressive ambiguity resolution is a general property of lexical processes.
Experiment 2 Conclusions
Gradient effect of within-category variation without minimal pairs.
Gradient effect long-lasting: mean point of disambiguation (POD) = 240 ms.
Effect is not driven by visual context.
Regressive ambiguity resolution:
• Subphonemic gradations maintained until more information arrives.
• Subphonemic gradation not maintained after POD.
• Subphonemic gradation can improve (or hinder) recovery from garden path.
The Variance Utilization Model
Current models of spoken word recognition
• Immediacy: phonetic cues are not simultaneous; activation retains early cues.
• Parallel Processing: preserves alternative interpretations until confident; anticipatory activation for future possibilities.
Can lexical activation help integrate continuous acoustic cues over time?
• Regressive ambiguity resolution. ✓
• Anticipation of upcoming material. ?
Progressive Expectation Formation
Can within-category detail be used to predict
future acoustic/phonetic events?
Yes: Phonological regularities create systematic
within-category variation.
• Predicts future events.
(Gow & McMurray, in press)
Experiment 3: Anticipation
Word-final coronal consonants (n, t, d) assimilate the place of the following segment:
"Maroong Goose" vs. "Maroon Duck"
Place assimilation -> ambiguous segments that anticipate upcoming material.
[Diagram: as "m… a… rr… oo… ng… g… oo… s…" unfolds, goose is favored over goat and duck.]
Methods
Subject hears:
"select the maroon duck"
"select the maroon goose"
"select the maroong goose"
"select the maroong duck" *
We should see faster eye movements to "goose" after assimilated consonants.
Results
[Figure: fixation proportion to "goose" over time, from the onset of "goose" plus oculomotor delay (0-600 ms), assimilated vs. non-assimilated.]
Looks to "goose" as a function of time: anticipatory effect on looks to the non-coronal.
Results
[Figure: fixation proportion to "duck" over the same window, assimilated vs. non-assimilated.]
Looks to "duck" as a function of time: inhibitory effect on looks to the coronal (duck, p=.024).
Summary
Sensitivity to subphonemic detail:
• Increases priors on likely upcoming events.
• Decreases priors on unlikely upcoming events.
• An active temporal integration process.
Occasionally assimilation creates ambiguity:
• Resolves prior ambiguity: "mudg drinker".
• Similar to Experiment 2…
• Progressive effect delayed 200 ms by lexical competition, supporting a lexical locus.
Adult Summary
Lexical activation is exquisitely sensitive to within-category detail.
This sensitivity is useful for integrating material over time:
• Regressive ambiguity resolution.
• Progressive facilitation.
Underpins a potential lexical role in speech perception.
Consequences for Language Disorders
Word Recognition: not separable from speech perception.
Specific Language Impairment => Deficits in:
• Speech Perception: Less categorical perception
(some debate: Thibodeaux & Sussman, 1979; Coady, Kluender &
Evans, in press; Manis et al, 1997; Serniclaes et al, 2004; Van Alphen
et al, 2004)
• Word Recognition: Slower recognition.
(Montgomery, 2002; Dollaghan, 1998)
Could word recognition deficits account for apparent
perceptual deficits?
The Variance Utilization Model
[Diagram: input "b… u… m… p…" over time activates bump, pump, dump, bun, bumper, bomb; active candidates inhibit one another.]
Lexical competition: perceptual warping (à la CP) results from natural competition processes.
The Variance Utilization Model
Categorical perception:
• Stimuli in the same category become closer in perceptual space (e.g. Goldstone, 2001).
Lexical competition:
• Most active lexical candidate inhibits alternatives.
• Becomes more active.
• More similar to prototype…
• Feeds back to alter phoneme representations (Magnuson, McMurray, Tanenhaus & Aslin, 2003).
• Two versions of the same word (category) become more similar.
The Variance Utilization Model
If competition is suppressed (e.g. by a low-familiarity word), we should see less CP and greater sensitivity to within-category detail.
Example (beach vs. peach):
Input: [b: 80, p: 20]
Activates: beach (80), peach (20)
Competes: beach suppresses peach -> [90, 10]
Feedback: phoneme layer updated to [b: 90, p: 10]
Critical step: the input is warped. [90 10] is more similar to the prototype, [100 0]. Perceptual space warped.
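A minimal numeric sketch of this competition-and-feedback warping, using the slide's illustrative numbers ([80, 20] -> [90, 10]); the compete rule and its strength parameter are assumptions, not the model's actual dynamics:

```python
# Toy winner-take-all step for two lexical competitors; 'strength' is an
# assumed parameter, and the numbers mirror the slide's example.
def compete(b, p, strength=0.5):
    """The more active unit absorbs a share of the other's activation."""
    if b >= p:
        return b + strength * p, p - strength * p
    return b - strength * b, p + strength * b

# Input [b=80, p=20] activates 'beach' over 'peach'; competition yields
# [90, 10], and feedback pushes the phoneme layer toward that pattern,
# closer to the /b/ prototype [100, 0] than the input was.
print(compete(80, 20))  # -> (90.0, 10.0)
```

Iterating the step (with feedback re-presenting the warped pattern as input) drives the representation toward the category prototype, which is the proposed source of CP-like warping.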
Consequences for Language Disorders
Visual World Paradigm: ideal test
• Simple task: usable with many populations.
• No meta-linguistic knowledge required.
• Used to examine:
- Lexical Activation (Allopenna et al, 1998)
- Lexical Competition (Dahan et al, 2001)
- Within-category sensitivity (McMurray et al, 2002)
Consequences for Language Disorders
Proposed Research Program (with J. Bruce Tomblin, V. Samelson, and S. Lee)
Population: SLI & normal adolescents, 16-17 y.o., Iowa Longitudinal Study (Tomblin et al).
Step 1: Word familiarity (~200 words).
Step 2: Basic word recognition. Stimuli: Beaker, Beetle, Speaker, etc.
Step 3: Frequency effects: familiar words more active than unfamiliar.
Step 4: Gradiency (sensitivity to VOT) suppressed for familiar words (high competition)?
Step 5: How do we buttress lexical activation?
Consequences of VUM
Word recognition is sensitive to perceptual detail.
• Temporal integration.
Word recognition supports perceptual processes.
• Hypothesis: related to SLI.
Continuous variability is NOT discarded during recognition.
Does this change how we think about development?
Development
Historically, work in speech perception has been linked to development.
Sensitivity to subphonemic detail must revise our view of development.
Use: infants face additional problems:
• No lexicon available to clean up noisy input: they must rely on acoustic regularities.
• Extracting a phonology from the series of utterances.
Development
Sensitivity to subphonemic detail:
For 30 years, virtually all attempts to address this
question have yielded categorical discrimination (e.g.
Eimas, Siqueland, Jusczyk & Vigorito, 1971).
Exception: Miller & Eimas (1996).
• Only at extreme VOTs.
• Only when habituated to nonprototypical token.
Use?
Nonetheless, infants possess abilities that would
require within-category sensitivity.
• Infants can use allophonic differences at word
boundaries for segmentation (Jusczyk, Hohne
& Bauman, 1999; Hohne, & Jusczyk, 1994)
• Infants can learn phonetic categories from
distributional statistics (Maye, Werker &
Gerken, 2002; Maye & Weiss, 2004).
Statistical Category Learning
Speech production causes clustering along contrastive phonetic dimensions.
E.g. Voicing / Voice Onset Time: B: VOT ~ 0 ms; P: VOT ~ 40 ms.
Within a category, VOT forms a Gaussian distribution.
Result: a bimodal distribution over VOT, with modes near 0 ms and 40 ms.
Statistical Category Learning
To statistically learn speech categories, infants must:
• Record frequencies of tokens at each value along a stimulus dimension.
• Extract categories (+voice vs. -voice) from the distribution over VOT (0-50 ms).
• This requires the ability to track specific VOTs.
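The tracking requirement above can be sketched directly: draw tokens from the bimodal VOT distribution described earlier (Gaussians near 0 and 40 ms; the 5 ms SDs and bin width are assumptions for illustration), tally frequencies, and read off the modes:

```python
import random

# Sketch of the statistical-learning premise: tokens from two Gaussian
# categories (B near 0 ms, P near 40 ms) form a bimodal VOT distribution.
random.seed(1)
tokens = ([random.gauss(0, 5) for _ in range(500)] +
          [random.gauss(40, 5) for _ in range(500)])

# Step 1: record frequencies of tokens along the VOT dimension (5 ms bins).
counts = {}
for vot in tokens:
    b = round(vot / 5) * 5
    counts[b] = counts.get(b, 0) + 1

# Step 2: extract categories -- here, crudely, the two most frequent bins.
modes = sorted(sorted(counts, key=counts.get, reverse=True)[:2])
print(modes)  # two peaks, one near 0 ms and one near 40 ms
```

A real learner obviously does more than pick the two biggest bins, but even this crude version shows that category extraction presupposes tracking specific VOT values.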
Statistical Category Learning
Known statistical learning abilities (Maye et al) predict:
• Within-category sensitivity.
• Graded structure to categories.
Why no demonstrations?
Statistical Category Learning
Why no demonstrations of sensitivity?
• Habituation: discrimination, not ID; possible selective adaptation; possible attenuation of sensitivity.
• Synthetic speech: not ideal for infants.
• Single exemplar/continuum: not necessarily a category representation.
Experiment 4: Reassess the issue with improved methods.
HTPP
Head-Turn Preference Procedure
(Jusczyk & Aslin, 1995)
Infants exposed to a chunk of language:
• Words in running speech.
• Stream of continuous speech (ala statistical learning
paradigm).
• Word list.
Memory for exposed items (or abstractions) assessed:
• Compare listening time between consistent and
inconsistent items.
HTPP
Test trials start with all lights off.
HTPP
Center Light blinks.
HTPP
Brings infant’s attention to center.
HTPP
One of the side-lights blinks.
HTPP
Beach…
Beach…
Beach…
When infant looks at side-light…
…he hears a word
HTPP
…as long as he keeps looking.
Methods
7.5-month-old infants exposed to either 4 b- or 4 p-words (80 repetitions total), forming a category of the exposed class of words.
Word pairs: Bomb/Palm, Bear/Pear, Bail/Pail, Beach/Peach.
Measure listening time on…
                          b-exposed   p-exposed
Original words:           Bear        Pear
Competitors:              Pear        Bear
VOT closer to boundary:   Bear*       Pear*
McMurray & Aslin, 2005
Methods
Stimuli constructed by cross-splicing naturally produced tokens of each endpoint:
B:  M = 3.6 ms VOT     P:  M = 40.7 ms VOT
B*: M = 11.9 ms VOT    P*: M = 30.2 ms VOT
B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%; P*: 96%).
Novelty or Familiarity?
Novelty/familiarity preference varies across infants and experiments. We're only interested in the middle stimuli (B*, P*), so infants were classified as novelty- or familiarity-preferring by performance on the endpoints:
             Novelty   Familiarity
B-exposed      36          16
P-exposed      21          12
Within each group, will we see evidence for gradiency?
Novelty or Familiarity?
After being exposed to bear… beach… bail… bomb…, infants who show a novelty effect will look longer for pear than bear.
What about in between?
[Schematic: a categorical pattern predicts Bear* listening times equal to Bear; a gradient pattern predicts Bear* intermediate between Bear and Pear.]
Results
Novelty infants (B: 36, P: 21)
[Figure: listening times (ms) for Target, Target*, and Competitor, by exposure group (B, P).]
Target vs. Target*: p<.001
Competitor vs. Target*: p=.017
Results
Familiarity infants (B: 16, P: 12)
[Figure: listening times (ms) for Target, Target*, and Competitor, by exposure group.]
Target vs. Target*: p=.003
Competitor vs. Target*: p=.012
Results
Infants exposed to /p/
[Figures: listening times (ms) for P, P*, and B. Novelty group (N=21): significant contrasts at p=.009** and p=.024*. Familiarity group (N=12): p=.028* and p=.018*.]
Results
Infants exposed to /b/
[Figures: listening times (ms) for B, B*, and P, for Novelty (N=36) and Familiarity (N=16) groups; reported contrasts: >.1, >.2, <.001**, .06, .15.]
Experiment 4 Conclusions
Contrary to all previous work: 7.5-month-old infants show gradient sensitivity to subphonemic detail.
• Clear effect for /p/.
• Effect attenuated for /b/.
Reduced effect for /b/… but is it a null effect?
[Schematic: the expected result has Bear* intermediate between Bear and Pear; the actual result has Bear* patterning with Pear.]
• Bear* ≈ Pear.
• Category boundary lies between Bear & Bear* (between 3 ms and 11 ms?).
• Within-category sensitivity in a different range?
Experiment 5
Same design as Experiment 4, with VOTs shifted away from the hypothesized boundary.
Train:
  Bomb, Beach, Bear, Bale at -9.7 ms
  Palm, Peach, Pear, Pail at 40.7 ms
Test:
  B-: Bomb, Beach, Bear, Bale (-9.7 ms)
  B:  Bomb*, Beach*, Bear*, Bale* (3.6 ms)
  P:  Palm, Peach, Pear, Pail (40.7 ms)
Results
Familiarity infants (34 infants)
[Figure: listening times (ms) for B-, B, and P; reported contrasts: p=.01**, p=.05*.]
Results
Novelty infants (25 infants)
[Figure: listening times (ms) for B-, B, and P; reported contrasts: p=.002**, p=.02*.]
Experiment 5 Conclusions
• Within-category sensitivity in /b/ as well as /p/.
• Shifted category boundary in /b/: not consistent with the adult boundary (or prior infant work).
• Graded structure supports statistical learning.
Will an implementation of this model allow us to understand the developmental mechanism?
Computational Model
Distributional learning model
1) Model the distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g. VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the "category".
3) Each Gaussian has three parameters: a mean (μ), a standard deviation (σ), and a weight (Φ).
Statistical Category Learning
1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find the best description of the input.
3) Start with more Gaussians than necessary: the model doesn't innately know how many categories there are. Φ -> 0 for unneeded categories.
Training: Lisker & Abramson (1964) distribution of VOTs.
• Not successful with large K.
• [Successful with K=2… but what if we were learning Hindi?]
Solution: Competition (winner-take-all)
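The procedure above can be sketched as a small online mixture-of-Gaussians learner with winner-take-all competition. This is a hedged illustration: the learning rates, the pruning threshold, and the specific update rules are my assumptions, not the model's published equations.

```python
import math
import random

# Online mixture-of-Gaussians sketch: more categories (K) than needed,
# one update per token, winner-take-all competition.
random.seed(0)
K, LR = 6, 0.05
mu  = [random.uniform(0, 50) for _ in range(K)]  # means (ms VOT)
sig = [5.0] * K                                  # standard deviations
phi = [1.0 / K] * K                              # mixing weights

def score(x, k):
    """Unnormalized posterior of category k for token x."""
    return phi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * sig[k] ** 2)) / sig[k]

# Bimodal training distribution (illustrative stand-in for real VOT data).
tokens = ([random.gauss(0, 5) for _ in range(2000)] +
          [random.gauss(40, 5) for _ in range(2000)])
random.shuffle(tokens)

for x in tokens:
    k = max(range(K), key=lambda j: score(x, j))   # competition: winner only
    mu[k]  += LR * (x - mu[k])                     # nudge mean toward token
    sig[k] += LR * (abs(x - mu[k]) - sig[k])       # track average deviation
    phi[k] += LR * (1 - phi[k])                    # winner's weight grows
    for j in range(K):
        if j != k:
            phi[j] *= 1 - LR / 10                  # losers' weights decay

survivors = sorted(m for m, f in zip(mu, phi) if f > 0.1)
print([round(m) for m in survivors])  # high-weight means, near the clusters
```

Dropping the winner-take-all step (updating every Gaussian on every token) tends to leave redundant categories alive, which is the failure mode the competition comparison quantifies.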
Mechanism #1: Competition
                 1 Category   2 Categories   >4 Categories   % in right place
Competition          5%           95%             0%               95%
No Competition       0%            0%           100%               66%
Competition required. Validated with a neural network.
What about the nature of the initial state?
Classic view (e.g. Werker & Tees, 1984):
• Infants start with many small (nonnative) categories.
• Lose distinctions that are not used in the native language.
Small (nonnative) categories => large native categories: combining small categories is easy.
What about the reverse? Large (overgeneralized) categories => smaller native categories: dividing large categories is hard.
Mechanism #2: Combining small categories is easier than dividing large ones.
Related to adult non-native speech perception findings?
Question: What does reduced auditory acuity in cochlear implant users imply?
Answer: A larger region in which stimuli are not discriminable, i.e. larger initial categories. A problem for learning?
Assess non-native discrimination in CI users:
• Small initial categories: auditory acuity not that bad.
• Large categories: suggest different learning mechanisms.
(with J. Bruce Tomblin & B. Barker)
Infant Summary
Infants show graded sensitivity to subphonemic detail.
• Supports variance utilization model.
• Variance used for statistical learning.
Model suggests aspects of developmental mechanism:
• Competition.
• Starting state (large vs. small)
Remaining questions
• Unexpected VOT boundary: may require 2AFC task
(anticipatory eye-movement methods)
• Role of initial category size and learning (possible CI
application).
Conclusions
Infants and adults are sensitive to subphonemic detail.
Continuous detail is not discarded by perception / word recognition.
✗ Variance Reduction -> ✓ Variance Utilization
Normal SWR mechanisms yield:
1) Temporal integration
2) Perceptual warping
Conclusions
Infants and adults are sensitive to subphonemic detail.
Infant sensitivity allows long-term phonology learning.
• Potentially reveals a developmental mechanism.
Competition processes:
1) Potentially responsible for CP (locus of SLI?)
2) Essential for learning.
Conclusions
Spoken language is defined by change. But the information to cope with it is in the signal, if lexical processes don't discard it.
Within-category acoustic variation is signal, not noise.
[Apparatus diagram: IR head-tracker emitters, head-tracker camera, and monitor; two eye cameras; eyetracker and subject computers connected via Ethernet.]
Misperception: Additional Results
Identification Results
[Figures: response rates (Voiced, Voiceless, NW) as a function of VOT (0-35 ms) for the Barricade -> Parricade and Barakeet -> Parakeet continua.]
Significant target responses even at the extreme.
Graded effects of VOT on correct response rate.
Phonetic "Garden-Path"
"Garden-path" effect: difference between looks to each target (b vs. p) at the same VOT.
[Figure: fixations to target over time for VOT = 0 (/b/) and VOT = 35 (/p/), Barricade vs. Parakeet.]
Garden-Path Effect (Barricade - Parakeet)
[Figures: garden-path effect as a function of VOT (0-35 ms) for target and competitor fixations.]
GP effect: gradient effect of VOT.
Target: p<.0001
Competitor: p<.0001
Assimilation: Additional Results
"runm picks" vs. "runm takes"
When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/).
When /t/ is heard, the bilabial feature is likely to be from an underlying /m/.
Exp 3 & 4: Conclusions
Within-category detail used in recovering from assimilation: temporal integration.
• Anticipate upcoming material.
• Bias activations based on context.
- Like Exp 2: within-category detail retained to resolve ambiguity.
Phonological variation is a source of information.
Subject hears:
"select the mud drinker"
"select the mudg gear"
"select the mudg drinker" <- Critical Pair
[Figure: fixation proportion over time (0-2000 ms) from the onset of "gear" (avg. offset 402 ms): Initial Coronal "Mud Gear" vs. Initial Non-Coronal "Mug Gear".]
"Mudg Gear" is initially ambiguous, with a late bias towards "Mud".
[Figure: fixation proportion over time from the onset of "drinker" (avg. offset 408 ms): Initial Coronal "Mud Drinker" vs. Initial Non-Coronal "Mug Drinker".]
"Mudg Drinker" is also ambiguous, with a late bias towards "Mug" (the /g/ has to come from somewhere).
[Figure: fixation proportion to the non-coronal (gear) over time from the onset of "gear" (0-600 ms), following assimilated vs. non-assimilated consonants.]
In the same stimuli/experiment there is also a progressive effect!
Feedback
Ganong (1980): Lexical information biases perception of ambiguous phonemes.
[Figure: % /t/ responses along a d-t continuum shift between the "doot/toot" and "duke/tuke" contexts.]
Phoneme restoration (Warren, 1970; Samuel, 1997).
Lexical feedback (words -> phonemes): McClelland & Elman (1986); Magnuson, McMurray, Tanenhaus & Aslin (2003).
Scales of temporal integration in word recognition
• A Word: ordered series of articulations.
- Build abstract representations.
- Form expectations about future events.
- Fast (online) processing.
• A phonology:
- Abstract across utterances.
- Expectations about possible future events.
- Slow (developmental) processing
Sparseness
Overgeneralization:
• large σ
• costly: lose distinctiveness.
Undergeneralization:
• small σ
• not as costly: maintain distinctiveness.
To increase the likelihood of successful learning: err on the side of caution and start with small σ.
[Figure: P(success) as a function of starting σ for 2- and 3-category models (39,900 models run); success is highest for small starting σ.]
Sparseness coefficient: % of the space not strongly mapped to any category.
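This coefficient can be sketched directly: sweep the VOT dimension and count the fraction of points where no category responds strongly. The Gaussian category strengths, the 0.25 threshold, and the grid below are illustrative assumptions, not the model's actual values:

```python
import math

# Sparseness coefficient: fraction of the phonetic dimension not strongly
# mapped to any category. Each category is (mu, sigma).
def sparseness(categories, lo=-20.0, hi=60.0, step=0.5, threshold=0.25):
    n = int((hi - lo) / step) + 1
    unmapped = 0
    for i in range(n):
        x = lo + i * step
        # Strength of the best-fitting category at this VOT value.
        best = max(math.exp(-(x - mu) ** 2 / (2 * sig ** 2))
                   for mu, sig in categories)
        if best < threshold:
            unmapped += 1
    return unmapped / n

# Small sigma: much of the space is unmapped (sparse, undergeneralized).
# Large sigma: most of the space is covered (overgeneralized).
print(round(sparseness([(0, 2), (40, 2)]), 2),
      round(sparseness([(0, 15), (40, 15)]), 2))
```

With narrow categories most of the dimension stays unmapped, matching the claim that small starting σ yields sparse category structure early in learning.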
[Figures: average sparseness coefficient over training epochs (0-12,000) for starting σ ranges .5-1, 3-11, 12-17, and 20-40; small and intermediate starting σ leave much of the space unmapped early in training, while large starting σ does not.]
Model Conclusions
To avoid overgeneralization, it is better to start with small estimates for σ.
Small or even medium starting σ values lead to a sparse category structure during infancy: much of phonetic space is unmapped.
Sparse categories:
• Similar temporal integration to Exp 2.
• Retain ambiguity (and partial representations) until more input is available.
AEM Paradigm
Examination of the sparseness/completeness of categories needs a two-alternative task: Anticipatory Eye Movements (McMurray & Aslin, 2005).
Infants are trained to make anticipatory eye movements in response to an auditory or visual stimulus.
Also useful with: color, shape, spatial frequency, faces.
Post-training, generalization can be assessed with respect to both targets.
Experiment 6
Anticipatory Eye Movements
Train: Bear (0 ms VOT) -> Left; Pail (35 ms VOT) -> Right
Test: Bear at 0, 5, 10, 15 ms; Pear at 40, 35, 30, 25 ms; palm; beach
Same naturally-produced tokens from Exps 4 & 5.
Expected results
[Schematic: with sparse categories, performance is high near the trained endpoints (Bear, Pail) and falls off in the unmapped space; the adult boundary marks the crossover.]
Results
Training tokens: 67% correct; 9/16 infants better than chance.
[Figure: % correct as a function of VOT (0-40 ms), with beach and palm trials marked.]