- Department of Psychological & Brain Sciences

Speech perception as a window into language processing:
Real-time spoken word recognition, specific language impairment, and CIs
Bob McMurray
Dept. of Psychology
Dept. of Communication Sciences and Disorders
Thanks to
Richard N. Aslin, Meghan Clayards, Joe Toscano, Gwyn Rost, Marcus Galle, Dan McEchron, Jessica Walker, Keith Apfelbaum, Ashley Farris-Trimble, Cheyenne Munson, Joel Dennhardt, Lea Greiner, Jennifer Cole, Michael K. Tanenhaus, Allard Jongman, Steve Luck
Intellectual Community
Larissa Samuelson, John Spencer, Mark Blumberg, Ed Wasserman, J. Bruce Tomblin, Marlea O’Brien, Prahlad Gupta, Eliot Hazeltine, Vicki Samelson, Karla McGregor, Amanda Owen, Jean Gordon, Karen Iler Kirk, Chris Turner, Colleen Mitchell, Bruce Gantz, Matt Howard, Carolyn Brown & Jerry Zimmerman
Funding
The National Institute on Deafness and Other Communication Disorders
Introductions
“I do ba’s and pa’s”
Neuroscience
Cognitive Science
Diverse fields are united by their commitment to
understand the basic mechanisms or processes
that underlie perception, cognition, and
language… wherever they occur.
Language disorders
 Useful way to justify basic research to NIH...
Introductions
[Diagram: nested levels of analysis over developmental time, from a developmental-science view: environment (culture, physical/social environment), behavior, brain/body, cells, genes]
Development is:
• Multiply determined.
• A product of interactions between levels of analysis.
• Characterized by non-obvious causation.
• Has no single end-state.
Language disorders
 Useful way to justify basic research to NIH...
 Useful reminder of the multi-potential nature
of language development…
…but not a rigorous way to approach basic
theoretical questions.
Carl Seashore: “Clinical Cognitive Psychology”
Cora Busey Hillis & Beth Wellman: Child Welfare Research Station
Boyd McCandless & Charlie Spiker: Experimental Developmental Psych.
Wendell Johnson: Independent Speech Pathology
E.F. Lindquist: Iowa Test of Basic Skills
Bruce Tomblin: ???
Language disorders
 Useful way to justify basic research to NIH...
 Useful reminder of the multi-potential nature
of language development…
…but not a rigorous way to approach basic
theoretical questions.
Individual differences (including disorders) in
language development and outcomes:
 Reveal range of variation that our theories
must account for.
 Allow us to examine the consequences of
variation in the internal structure of the system.
Braitenberg (1984)
“Vehicles”
Light Sensor
Wire
Wheel/Motor
Light Avoiding
Light Seeking
Simple mechanisms give rise to complex behavior.
But many such mechanisms are possible.
- Easier to understand mechanism by building
outward, rather than observing inward.
Disordered language users allow us to observe
consequences of a change in mechanism.
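Braitenberg’s point can be made concrete in a few lines. The following is a minimal sketch with a hypothetical sensor and wiring model (not from the talk): merely crossing the sensor-to-motor wires flips a light-avoiding vehicle into a light-seeking one.

```python
def sensor_reading(sensor_pos, light_pos):
    """Light intensity falls off with squared distance (toy model)."""
    d2 = (sensor_pos[0] - light_pos[0]) ** 2 + (sensor_pos[1] - light_pos[1]) ** 2
    return 1.0 / (1.0 + d2)

def wheel_speeds(left_sensor, right_sensor, crossed):
    """Crossed wiring (each sensor drives the opposite wheel) yields a
    light seeker; uncrossed wiring yields a light avoider."""
    if crossed:
        return right_sensor, left_sensor  # left wheel driven by right sensor
    return left_sensor, right_sensor

# Light ahead and to the right: the right sensor reads stronger.
light = (1.0, 1.0)
left_s = sensor_reading((-0.1, 0.0), light)
right_s = sensor_reading((0.1, 0.0), light)

seek = wheel_speeds(left_s, right_s, crossed=True)
avoid = wheel_speeds(left_s, right_s, crossed=False)

# Crossed: left wheel spins faster, so the vehicle turns toward the light.
print(seek[0] > seek[1])    # True
# Uncrossed: right wheel faster, so the vehicle turns away.
print(avoid[1] > avoid[0])  # True
```

One wiring difference, two opposite “behaviors”: exactly the sense in which observing behavior alone underdetermines mechanism.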
Individual differences (including disorders) in
language development and outcomes:
 Reveal range of variation that our theories
must account for.
 Allow us to examine the consequences of
variation in the internal structure of the system.
[Slide images: neuroscience and education; The Future of Cognitive Science, UC Merced, May 2008]
Individual differences (including disorders) in
language development and outcomes:
 Reveal range of variation that our theories
must account for.
 Allow us to examine the consequences of
variation in the internal structure of the system.
But simultaneously
 Detailed understanding of the process of
language use and development may enable us
to better understand disorders.
A process-oriented approach to
individual differences.
beach
1) Define the process:
• What steps does the brain/mind/language
system/child take to get from some clearly defined
input to some clearly defined output?
2) How can we measure this process as it happens?
3) Identify a population:
• What will we relate variation in process to?
4) What dimensions can vary within that process?
• Which covary with outcome variables?
Goal today:
Show how understanding the real-time (and developmental) processes that underlie language in normal listeners can offer an important (complementary) viewpoint on individual differences.
Disclaimer
Much of this work examines adults or older kids:
1) Easier to measure real-time process using more complex tasks.
2) Easier to conceptualize process without worrying about development (as much).
3) Consequently, we take an individual differences approach, rather than a developmental one (but ask me about development).
But first: I have to show you what those processes look like (and dispel a few misconceptions about speech perception).
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
The Domain: Speech & Words
Speech perception, word recognition
and their development are an ideal
domain for these questions.
• Excellent understanding of input
-Acoustics of a single word.
-Statistical properties of a language
beach
[Spectrogram of “beach”: frequency over time, with labeled cues: aspiration/VOT, voicing, first formant, F2]
[Histogram: number of tokens by VOT (ms), from -5 to 145 ms]
Major theoretical issue: lack of invariance.
Acoustic cues do not directly distinguish categories due to
• Talker variation (Allen & Miller, 2003; Jongman, Wayland & Wang, 2000;
Peterson & Barney, 1955).
• Influence of neighboring phonemes (coarticulation)
(Fowler & Smith, 1986; Delattre, Liberman & Cooper, 1955)
• Speaking rate variation (Miller, Green & Reeves, 1986; Summerfield, 1981)
• Dialect variation (Clopper, Pisoni & De Jong, 2006)
[Histograms: number of tokens by VOT (ms) for different talkers; the distributions shift across talkers]
Allen & Miller, 1999
[Scatterplot: F1 x F2 (Hz) for ɛ and ʌ tokens; the categories overlap heavily]
Cole, Linebaugh, Munson & McMurray, 2010, J. Phon
The Domain: Speech & Words
Speech perception, word recognition
and their development are an ideal
domain for these questions.
• Excellent understanding of input
-Acoustics of a single word.
-Statistical properties of a language
-But difficult problem to solve
• Tractable output units.
beach
The Domain: Speech & Words
[Pipeline: Extract acoustic cues → Identify phonemes → Activate words]
But phonemes:
• Have no meaning in isolation.
• Theoretically controversial (Port, 2007; Pisoni, 1997)
• Hard to measure directly… (e.g., Norris, McQueen & Cutler, 2000; Pisoni & Tash, 1974; Schouten, Gerrits & Van Hessen, 2003)
… particularly in populations with poor phoneme awareness, metalinguistic ability.
… particularly in a way that gives online (moment-by-moment) measurement.
The Domain: Speech & Words
Activate words
Identify phonemes
Extract acoustic cues
The Domain: Speech & Words
[Diagram: words feed meaning (semantics), sentence processing (syntax), and reference (pragmatics)]
• Functionally relevant: Crucial for semantics, sentence processing, reference.
• Most everyone agrees on them (but see Elman, 2008, SRCLD)
[Pipeline: Extract acoustic cues → Identify phonemes → Activate words]
Online Word Recognition
Major theoretical issue in word recognition: time
• Information arrives sequentially
• At early points in time, signal is temporarily ambiguous.
[Diagram: as “ba…kery” unfolds, the candidates basic, bacon, bait, barricade, and baby are ruled out, leaving bakery]
• Later arriving information disambiguates the word.
Online Word Recognition
If input is phonemic, word recognition is characterized by:
• Immediacy
• Parallel Processing
• Activation Based
• Competition
[Diagram: as the input “s… æ… n… d… ə… l” unfolds over time, soup, sandal, sack, candle, and dog are activated in parallel and compete]
Measuring Temporal Dynamics
How do we measure unfolding activation?
Eye-movements in the Visual World Paradigm
Subjects hear spoken language and manipulate objects in a
visual world.
Visual world includes set of objects with interesting
linguistic properties.
a sandal, a sandwich, a candle and an unrelated item.
Eye-movements to each object monitored throughout task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
Allopenna, Magnuson & Tanenhaus, 1998
Task
A moment to view the items; then a spoken word (“Sandal”, “Bear”, …) and a click on the named item.
Repeat 200-1000 times… (new words, locations, etc)
Why use eye-movements and visual world paradigm?
• Relatively natural task.
Easy to use with clinical populations:
- Children with dyslexia (Desroches, Joanisse, & Robertson, 2006),
- Autistic children (Brock, Norbury, Einav, & Nation, 2008;
Campana, Silverman, Tanenhaus, Bennetto, & Packard, 2005)
- People with aphasia (Yee, Blumstein, & Sedivy, 2004, 2008).
- Children with SLI (Nation, Marshall, & Altmann, 2003)
Why use eye-movements and visual world paradigm?
• Relatively natural task.
Easy to use with clinical populations:
• Eye-movements generated very fast (within 200ms of
first bit of information).
• Eye movements time-locked to speech.
• Subjects aren’t aware of eye-movements.
• Fixation probability maps onto lexical activation.
Measures a functional language ability.
Eye movement analysis
On each trial, fixations (lagging the signal by roughly 200 ms) are coded by object: Target = Sandal, Cohort = Sandwich, Rhyme = Candle, Unrelated = Necklace; averaging across trials gives % fixations over time.
[Figure: fixation proportions over time (0-2000 ms) for target, cohort, rhyme, and unrelated as “s æ n d ə l” unfolds; the target rises steeply, the cohort rises then falls, the rhyme shows a smaller later rise, unrelated stays low]
Allopenna, Magnuson & Tanenhaus, 1998
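The averaging step behind curves like these is simple to sketch. Below is a minimal illustration on made-up trial records (real analyses work on raw eye-tracker samples, not this simplified encoding): at each time bin, compute the proportion of trials fixating each object role.

```python
from collections import defaultdict

def fixation_proportions(trials, bin_ms=200, max_ms=2000):
    """Average, at each time bin, the proportion of trials fixating each
    object role (target / cohort / rhyme / unrelated).

    `trials` is a list of dicts mapping time-bin index -> fixated role;
    a simplified stand-in for real eye-tracking records."""
    n_bins = max_ms // bin_ms
    counts = defaultdict(lambda: [0] * n_bins)
    for trial in trials:
        for b in range(n_bins):
            role = trial.get(b)
            if role is not None:
                counts[role][b] += 1
    n = len(trials)
    return {role: [c / n for c in row] for role, row in counts.items()}

# Toy data: 4 trials; looks converge on the target over time.
trials = [
    {0: "cohort", 1: "target", 2: "target"},
    {0: "unrelated", 1: "cohort", 2: "target"},
    {0: "cohort", 1: "target", 2: "target"},
    {0: "target", 1: "target", 2: "target"},
]
props = fixation_proportions(trials, bin_ms=200, max_ms=600)
print(props["target"])  # [0.25, 0.75, 1.0]
```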
McMurray, Samelson, Lee & Tomblin, 2010
[Figure: fixation proportions over time (0-2000 ms), same four curve types]
[Diagram: words feed meaning (semantics), sentence processing (syntax), and reference (pragmatics)]
• Functionally relevant: Crucial for semantics, sentence processing, reference.
• Most everyone agrees on them (but see Elman, 2009, SRCLD)
• Easy to measure directly… (e.g., Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995; Allopenna, Magnuson & Tanenhaus, 1998)
… even in populations with poor phoneme awareness, metalinguistic ability.
… online (moment-by-moment) data.
Measuring speech perception through the lens of spoken word recognition:
• Ensures that whatever differences we find matter for the next level up.
• Theoretically more grounded.
• Multi-dimensional online measure.
[Pipeline: Extract acoustic cues → Identify phonemes → Activate words]
The Domain: Speech & Words
Speech perception, word recognition
and their development are an ideal
domain for these questions.
• Excellent understanding of input
- Acoustics of a single word.
- Statistical properties of a language
- But difficult problem to solve
• Tractable output units.
- Spoken word recognition
- But problem of time
beach
The Domain: Speech & Words
Speech perception, word recognition
and their development are an ideal
domain for these questions.
• Excellent understanding of input
- Acoustics of a single word.
- Statistical properties of a language
- But difficult problem to solve
• Tractable output units.
- Spoken word recognition
- But problem of time
• Associated with many impairments.
beach
Auditory, Speech or Lexical Deficits have been reported
in a variety of clinical populations
• Specific / Non-specific Language Impairment
• Dyslexia / Struggling Readers
• Autism
• Cerebellar Damage
• Broca’s Aphasia
• Down Syndrome
• Hard of Hearing
• Cochlear Implant Users
• Cognitive Decline
• Schizophrenia
So what’s the process?
?
How do listeners map a highly variable
acoustic input onto lexical candidates as the
input unfolds over time?
Variance Reduction in Speech
Activate words
Competition
Graded Activation
Suppress competing
interpretations
Identify phonemes
Categorical Perception
Discard withincategory detail
Extract acoustic cues
Normalization
Warping perceptual space
Discard irrelevant
variation
Problems with Variance Reduction
Problems
• Continuous detail could be useful (Martin & Bunnell, 1981; Gow, 2001; McMurray et al, 2009).
[Diagram: in “ba…”, bak ≠ bas ≠ bar; fine-grained detail already favors bakery over basic, bacon, bait, barricade, baby]
Problems with Variance Reduction
Problems
• Continuous detail could be useful (Martin & Bunnell, 1981; Gow, 2001; McMurray et al, 2009).
• Some useful variation is not phonemic (Salverda, Dahan &
McQueen, 2003; Gow & Gordon, 1995)
• Acoustic cues are spread out over time – how do you
know when you are done with a phoneme and ready
for word recognition?
[Spectrogram: /d ɹ æ g ə n/; the cues for each phoneme overlap in time with its neighbors]
Fowler, 1984; Cole, Linebaugh, Munson & McMurray, 2010
The alternative
• Fine-grained detail can bias lexical activation.
• Let lexical competition sort it out.
Advantages
• Helps with invariance – not making a firm commitment
on any given cue. Lexicon may offer more support.
• Helps with time – use fine-grained detail to make earlier
commitments.
But
• This stands in stark contrast to findings of
categorical perception (Liberman, Harris, Hoffman & Griffith, 1957)
Categorical Perception
[Figure: identification (% /p/) rises sharply across the B-P VOT continuum; discrimination peaks at the category boundary and is poor elsewhere]
• Sharp identification of tokens on a continuum.
• Discrimination poor within a phonetic category.
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
Categorical Perception
Evidence against categorical perception from
• Discrimination task variants (Schouten, Gerrits & Van
Hessen, 2003; Carney, Widden & Viemeister, 1977)
• Training studies (Carney et al., 1977; Pisoni & Lazarus, 1974)
• Rating tasks (Massaro & Cohen, 1983)
But no evidence that this fine-grained detail actually affects higher-level (lexical) processes.
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
Integrating speech and words
My intuition:
• word recognition mechanisms can cope with variability.
• sensitivity to gradient acoustic detail can help solve
problem of time.
But only if word recognition and perception are continuously coupled:
If activation for lexical candidates gradiently reflects continuous acoustic detail,
then these mechanisms can help sort it out.
Does activation for lexical competitors
reflect continuous detail?
- during online recognition
Need:
• tiny acoustic gradations
• online, temporal word recognition task
McMurray, Tanenhaus & Aslin (2002)
McMurray, Tanenhaus, Aslin
& Spivey (2003)
See Also
Andruski, Blumstein & Burton (1994)
Utman, Blumstein & Burton (2002)
Gradations in the Signal
beach/peach, bear/pear, bomb/palm, bale/pale, bump/pump, butter/putter
Task: hear a word (e.g., “Bear”), click the named picture; repeat 1080 times.
Identification Results
[Figure: proportion /p/ responses vs. VOT (0-40 ms), rising from B to P]
High agreement across subjects and items for category boundary.
By subject: 17.25 +/- 1.33 ms
By item: 17.24 +/- 1.24 ms
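One simple way to estimate such a boundary can be sketched on toy identification data. The values below are invented to resemble the shape of the reported curves; the actual boundaries above come from the experiment itself (typically via a fitted ogive rather than this interpolation).

```python
def category_boundary(vots, p_resp):
    """Estimate the category boundary as the VOT at which the
    identification function crosses 50% /p/ responses, by linear
    interpolation between the two continuum steps that straddle 0.5."""
    pairs = list(zip(vots, p_resp))
    for (v0, p0), (v1, p1) in zip(pairs, pairs[1:]):
        if p0 <= 0.5 <= p1:
            return v0 + (0.5 - p0) / (p1 - p0) * (v1 - v0)
    raise ValueError("identification function never crosses 50%")

# Toy identification data shaped like the reported curves.
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
p_resp = [0.01, 0.02, 0.08, 0.35, 0.70, 0.93, 0.98, 0.99, 1.00]
print(round(category_boundary(vots, p_resp), 2))  # 17.14
```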
[Figure: fixation proportions over time (0-2000 ms) for VOT = 0 (response “b”) and VOT = 40 (response “p”); competitor fixations exceed unrelated fixations in both panels, p < .001]
More looks to competitor than unrelated items.
Gradiency?
Given that
• the subject heard bear
• clicked on “bear”…
How often was the subject looking at the “pear”?
[Figure: two predictions for competitor fixations over time. Categorical Results: identical competitor curves for all within-category VOTs. Gradient Effect: competitor fixations increase as VOT approaches the boundary]
Competitor Fixations
[Figure: competitor fixations over time (0-2000 ms), one curve per VOT step (0 to 40 ms in 5 ms steps), plotted separately for “b” and “p” responses]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Competitor Fixations
[Figure: looks to “p” and looks to “b” (area under the curve) as a function of VOT (0-40 ms), increasing toward the category boundary]
Area under the curve:
B: VOT p=.017; linear trend p=.023
P: VOT p<.001; linear trend p=.002
Competitor Fixations
[Figure: same analysis restricted to unambiguous tokens]
Unambiguous Tokens:
B: VOT p=.014; linear trend p=.009
P: VOT p=.001; linear trend p=.007
Summary
Subphonemic acoustic differences in VOT have gradient
effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
- Refutes strong forms of categorical perception
• Fine-grained information in the signal is not
discarded prior to lexical activation.
Summary
Subphonemic acoustic differences in VOT have gradient
effect on lexical activation.
• Extends to vowels, l/r, d/g, b/w, s/z (Clayards, Toscano,
McMurray, Tanenhaus & Aslin, in prep; Galle & McMurray, in prep)
• Does not work with phoneme decision task (McMurray,
Aslin, Tanenhaus, Spivey & Subik, 2008)
• 8.5 month old infants (McMurray & Aslin, 2005)
• Color Categories
(Huette & McMurray, 2010)
Activate words: Competition, Graded Activation
Identify phonemes: Categorical Perception → replaced by gradient sensitivity to fine-grained detail
Extract acoustic cues: Normalization, warping perceptual space
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
Variance Reduction in Speech
Categorical perception predicts a warping in the sensory encoding of the stimulus.
[Figure: psychological response vs. continuous perceptual cue (e.g., VOT): a step-like mapping between “Extract acoustic cues” and “Identify phonemes”]
Variance Reduction in Speech
Continuous perception allows the system to veridically encode what was heard.
[Figure: psychological response increases linearly with the continuous perceptual cue (e.g., VOT)]
How can we measure perceptual encoding of continuous cues?
Categorical Perception
It is difficult to measure cue encoding behaviorally: behavior reflects discrete categories, not the underlying encoding of continuous cues. (Pisoni, 1973; Pisoni & Tash, 1974)
Solution: go direct to the brain. Event-related potentials.
The Electroencephalogram (EEG)
Systematic fluctuations in voltage
over time can be measured at the
scalp (Berger, 1929)
• Related to underlying brain
activity (though with a lot of
filtering and scattering).
Event-Related Potentials (ERPs)
Consistent patterns of EEG are triggered by a stimulus and are embedded in the overall EEG.
[Figure: EEG epochs time-locked to Stim 1 … Stim N are averaged into the ERP waveform, with components P1, N1, P2, N2, P3 over 0-600 ms]
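The averaging logic can be sketched directly. A toy illustration on synthetic epochs (not real EEG): baseline-correct each stimulus-locked epoch, then average across epochs, so that random EEG cancels while the stimulus-locked deflection survives.

```python
import random

def average_erp(epochs, baseline_samples=10):
    """Baseline-correct each stimulus-locked epoch (subtract its mean
    pre-stimulus voltage), then average across epochs: activity that is
    time-locked to the stimulus survives, while unrelated EEG averages
    toward zero."""
    corrected = []
    for ep in epochs:
        base = sum(ep[:baseline_samples]) / baseline_samples
        corrected.append([v - base for v in ep])
    n_samples = len(epochs[0])
    return [sum(ep[i] for ep in corrected) / len(corrected)
            for i in range(n_samples)]

# Synthetic data: a fixed stimulus-locked deflection plus random noise.
random.seed(0)
signal = [0.0] * 10 + [1.0, 2.0, 1.0, 0.0]
epochs = [[s + random.gauss(0, 0.5) for s in signal] for _ in range(200)]
erp = average_erp(epochs)
print(abs(erp[11] - 2.0) < 0.3)  # the deflection survives averaging
```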
Perception vs. Categorization
[ERP waveform with components N1, P2, N2, P3]
Auditory N1: Low level auditory processes
- Generated in Heschl’s gyrus (auditory cortex / STG)
- Responds to pure tones and speech.
- Responds to change.
How does the auditory N1 respond to continuous changes in VOT?
Toscano, McMurray, Dennhardt & Luck, 2010, PsychSci
N1 (Auditory Encoding) shows linear effect of VOT.
[Figure: ERP voltage (μV) over time (-50 to 150 ms), one curve per step of the stimulus continuum (VOT 0-40 ms); N1 amplitude varies linearly with VOT for both beach/peach and dart/tart continua]
Linear effect of VOT.
• Not an artifact of averaging across subjects.
• Affected by place of articulation.
• No effect of target-type or response.
Experiment 1: Summary
Early brain responses encode speech cues veridically.
• N1: low-level encoding is not affected by categories at all.
Veridical encoding leads to graded categorization.
• Eye-movement results: categories are graded.
Gradiency in the input is preserved throughout the processing stream.
Variance Reduction in Speech
Activate words: Competition, Graded Activation
Identify phonemes: gradient sensitivity to fine-grained detail
Extract acoustic cues: Normalization, warping perceptual space; gradient sensitivity to fine-grained detail
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
Why Speech Perception?
Problems
• Continuous detail could be useful (Martin & Bunnell, 1981; Gow, 2001; McMurray et al, 2009).
• Some useful variation is not phonemic (Salverda, Dahan & McQueen, 2003; Gow & Gordon, 1995)
• Acoustic cues are spread out over time – how do you know when you are done with a phoneme and ready for word recognition?
[Spectrogram: /d ɹ æ g ə n/; cues overlap across neighboring phonemes]
Cole, Linebaugh, Munson & McMurray, 2010
[Pipeline: Extract acoustic cues → Identify phonemes → Activate words]
Is phoneme recognition done before word recognition begins?
Temporal Integration
Example:
Asynchronous cues to voicing:
VOT
Vowel Length
VOT
Vowel Length
McMurray, Clayards, Tanenhaus & Aslin (2008, PB&R)
Toscano & McMurray (submitted)
“Buffer” model: asynchronous cues (VOT, then vowel length) accumulate in a buffer over time before being passed to the lexicon.
Problems
• Vowel length may not be available until the end of the word.
• How do you know when the buffer has enough information?
• What about early lexical commitments?
Alternative: integration at the lexicon. Each cue (VOT, vowel length) updates lexical activation directly as it arrives over time.
When do effects on lexical activation occur?
• VOT effect co-occurs with vowel length effect → buffered integration.
• VOT effect precedes vowel length effect → lexical integration.
McMurray, Clayards, Tanenhaus & Aslin (2008, PB&R)
Toscano & McMurray (submitted)
2 vowel lengths × 9-step VOT continua (0-40 ms) × beach/peach, beak/peak, bees/peas.
The usual task; 1080 trials.
Mouse click results
[Figure: % /p/ responses vs. VOT (0-40 ms), rising from /b/ to /p/, with separate curves for long and short vowels]
Compute 2 effect sizes at each 20 ms time slice.
• VOT: Regression slope of competitor fixations as a function of VOT.
[Figure: at t = 320 ms, looks to “p” plotted against distance from boundary (VOT); the regression Fix = M320·VOT + B is flat, M320 = 0]
Compute 2 effect sizes at each 20 ms time slice.
• VOT: Regression slope of competitor fixations as a function of VOT.
[Figure: at t = 720 ms, looks to “p” plotted against distance from boundary (VOT); the regression Fix = M720·VOT + B]
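The effect-size computation described above can be sketched as follows, on toy fixation curves (the actual analysis regresses competitor fixations on VOT, or distance from boundary, within each 20 ms slice):

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def effect_timecourse(fixations_by_vot, vots):
    """At each time slice, regress competitor fixations on VOT; the
    slope at slice t plays the role of M_t in Fix = M_t * VOT + B.

    `fixations_by_vot[v]` is the fixation curve (one value per time
    slice) for continuum step v."""
    n_slices = len(next(iter(fixations_by_vot.values())))
    return [slope(vots, [fixations_by_vot[v][t] for v in vots])
            for t in range(n_slices)]

# Toy curves: no VOT effect in the first slice, a gradient one later.
vots = [0, 5, 10, 15]
fix = {0:  [0.05, 0.05, 0.06],
       5:  [0.05, 0.07, 0.09],
       10: [0.05, 0.09, 0.12],
       15: [0.05, 0.11, 0.15]}
m = effect_timecourse(fix, vots)
print(m[0])  # 0.0 -- flat early, like M320 = 0 in the figure
print(m[1] > 0 and m[2] > m[1])  # the VOT effect grows over time
```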
[Figure: effect size over time (0-1000 ms) for VOT and vowel length on voicing decisions]
VOT: 228 ms
Vowel: 548 ms
Temporal Integration Summary
VOT used as soon as it is available:
• Replicates with b/w.
• Replicates for natural continua (Toscano & McMurray,
submitted)
• Also shown when the primary cue comes after the
secondary cue (Galle & McMurray, in prep)
Preliminary decisions cascade all the way to lexical
processes.
• Make a partial (lexical) commitment
• Update as new information arrives.
• Lexical competition processes are primary.
Variance Reduction in Speech
Lexical activation is sensitive to information that should have been lost during categorization.
Integrating low-level material seems to occur at the lexical level.
What is the role of phonemic representations in speech perception?
[Pipeline: Extract acoustic cues → Identify phonemes → Activate words]
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
How do we approach the lack of invariance?
1) Hedge our bets: make graded commitments and wait
for more information.
2) Use multiple sources of information.
How far can this get us?
McMurray & Jongman (2011, Psychological Review)
• Collected 2880 recordings of the 8 fricatives.
- 20 speakers, 6 vowels.
• Measured 24 different cues for each token.
• Humans classified a subset of 240 tokens.
[Figure: measurement landmarks (frication onset, vowel offset, frication offset) and the cues measured on each token: DURF, DURV, F5AMPF, F5AMPV, F3AMPF, F3AMPV, F2, F1, LFAMP, W1, W2, W3, formant transitions]
24 Cues → Logistic Regression → 8 Categories
Is the information present in the input sufficient to distinguish categories?
• All cues reported in literature (+5 new ones)
• Overly powerful learning model: asymptotic statistical learning model.
Human performance: 91.2% correct.
[Figure: proportion correct by fricative (f, v, θ, ð, s, z, ʃ, ʒ) for listeners vs. model, with the 10 best cues and with all 24 cues]
10 best cues: 74.5% – 83.3%. All 24 cues: 79.2% – 85.0%.
More information → better performance, but still not as good as listeners.
Why shouldn’t it be?
• We measured everything.
• Supervised learning.
• Optimal statistical classifier.
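A toy version of such a classifier can be sketched from scratch: a multinomial logistic regression trained by gradient descent on two made-up cues for two categories. This is an illustration of the technique only, not the actual model, which used 24 measured cues and 8 fricatives.

```python
import math
import random

def zscore_columns(X):
    """Standardize each cue column (keeps gradient descent stable)."""
    stats = []
    for col in zip(*X):
        m = sum(col) / len(col)
        sd = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
        stats.append((m, sd))
    return [[(v - m) / sd for v, (m, sd) in zip(row, stats)] for row in X]

def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """Multinomial logistic regression trained by stochastic gradient
    descent on the cross-entropy loss."""
    n_feats = len(X[0])
    W = [[0.0] * (n_feats + 1) for _ in range(n_classes)]  # last weight = bias
    for _ in range(epochs):
        for x, target in zip(X, y):
            xb = x + [1.0]
            scores = [sum(w * xi for w, xi in zip(row, xb)) for row in W]
            mx = max(scores)
            exps = [math.exp(s - mx) for s in scores]
            z = sum(exps)
            for c in range(n_classes):
                grad = exps[c] / z - (1.0 if c == target else 0.0)
                for j in range(n_feats + 1):
                    W[c][j] -= lr * grad * xb[j]
    return W

def predict(W, x):
    xb = x + [1.0]
    return max(range(len(W)),
               key=lambda c: sum(w * xi for w, xi in zip(W[c], xb)))

# Two made-up cues (a spectral cue and a duration cue) for two categories,
# overlapping but largely separable -- loosely inspired by /s/ vs. /ʃ/.
random.seed(1)
X, y = [], []
for _ in range(100):
    X.append([random.gauss(7.0, 0.7), random.gauss(120, 15)]); y.append(0)
    X.append([random.gauss(4.0, 0.7), random.gauss(130, 15)]); y.append(1)
Xz = zscore_columns(X)
W = train_softmax(Xz, y, n_classes=2)
acc = sum(predict(W, x) == t for x, t in zip(Xz, y)) / len(X)
print(round(acc, 2))
```

With enough training, a classifier like this approaches the best accuracy the cues themselves allow, which is exactly why it can be used to ask whether the information in the input suffices.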
Still need to compensate for variability in cues due to speaker, vowel.
[Figure: raw F1 x F2 values (Hz) for ɛ and ʌ; heavy overlap]
Simple compensation scheme:
• Listener identifies speaker, vowel.
• Recodes cues relative to expectations for that speaker/vowel.
Crucially: this maintains a continuous representation and does not discard fine-grained detail.
Cole, Linebaugh, Munson & McMurray (2010)
see also Fowler & Smith (1986), Gow (2003)
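The recoding step can be sketched as computing, for each cue, the deviation from the speaker-and-vowel mean. Here the sample mean stands in for the listener's "expectation", and the tokens are hypothetical values, not data from the study.

```python
from collections import defaultdict

def recode_cues(tokens):
    """Express each cue value relative to what is expected for its
    speaker and vowel (value minus the speaker x vowel mean).
    Tokens are (speaker, vowel, {cue: value}) triples; a sketch of the
    compensation scheme, not the paper's exact parsing model."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for speaker, vowel, cues in tokens:
        counts[(speaker, vowel)] += 1
        for cue, val in cues.items():
            sums[(speaker, vowel)][cue] += val
    recoded = []
    for speaker, vowel, cues in tokens:
        n = counts[(speaker, vowel)]
        recoded.append((speaker, vowel,
                        {cue: val - sums[(speaker, vowel)][cue] / n
                         for cue, val in cues.items()}))
    return recoded

# Two speakers with very different raw F1 ranges: after recoding, the
# same relative deviation means the same thing for both speakers.
tokens = [
    ("s1", "ɛ", {"F1": 610}), ("s1", "ɛ", {"F1": 590}),
    ("s2", "ɛ", {"F1": 710}), ("s2", "ɛ", {"F1": 690}),
]
rec = recode_cues(tokens)
print([t[2]["F1"] for t in rec])  # [10.0, -10.0, 10.0, -10.0]
```

Note that the recoded values are still continuous: nothing is binned or discarded, only re-centered.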
Still need to compensate for variability in cues due to speaker, vowel.
[Figure: raw F1 x F2 values (Hz) vs. the same ɛ and ʌ tokens recoded relative to expected values; the categories separate after recoding]
Crucially: this maintains a continuous representation and does not discard fine-grained detail.
Cole, Linebaugh, Munson & McMurray (2010)
see also Fowler & Smith (1986), Gow (2003)
Measurements used as input to a logistic regression classifier.
• Matched to human performance on the same recordings: 91.2% correct.
[Figure: proportion correct by fricative (f, v, θ, ð, s, z, ʃ, ʒ), listeners vs. model]
All 24 cues: 79.2% – 85.0%. With compensation: 87.0% – 92.9%.
Measurements used as input to a logistic regression classifier.
• Matched to human performance on the same recordings: 91.2% correct.
[Figure: proportion correct by vowel (i, u, ɑ), listeners (complete) vs. the cue-integration model (all 24 cues) and the parsing model (+compensation)]
We can match human performance with a simple
model as long as:
1) System codes many sources of information.
- No single cue is crucial.
- Redundancy is the key.
2) Cues are encoded veridically and continuously
- Need to preserve as much information as possible.
3) Cues are encoded relative to expected values
derived from context (e.g. speaker and vowel).
Speech and Word Recognition
So what is the process of speech perception?
1) Early perceptual processes are continuous.
- Many many cues are used.
- Cues are coded relative to expectations about talker,
neighboring phonemes (etc).
2) Make graded commitment at lexical level.
- Update when more information arrives.
3) Competition between lexical items sorts it out.
- Language processes are essential for speech perception.
Online Word Recognition
[Diagram: as the input “s… æ… n… d… ə… l” unfolds over time, soup, sandal, sack, candle, and dog are activated and compete]
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
A process-oriented approach to
individual differences.
beach
1) Define the process:
• What steps does the brain/mind/language
system/child take to get from some clearly defined
input to some clearly defined output?
2) How can we measure this process as it happens?
3) Identify a population:
• What will we relate variation in process to?
4) What dimensions can vary within that process?
• Which covary with outcome variables?
The Domain: Speech & Words
What type of individual differences should we be studying?
Variation that is:
 Widespread?
• Related to broad-based language skills?
• Empirically correlated with speech perception?
Language Impairment
Specific language impairment (SLI) has often been
associated with phonological deficits (Bishop & Snowling,
2004; Joanisse & Seidenberg, 2003; Sussman, 1993)
Generalized language deficits (morphology, word learning, perception) without any obvious causal factors:
• Normal non-verbal IQ
• No speech motor problems
• No hearing impairment
• No developmental disorder
• No neurological problems
• Affects 7-8% of children.
• Remarkably stable over development.
[Figure: Language level (above kindergarten minimum) plotted against age for Normal and PLD groups; the gap between groups remains constant across development.]
A wealth of evidence suggests a perceptual /
phonological deficit associated with SLI.
• Impaired categorical perception
Godfrey et al (1981); Thibodeau & Sussman (1987); Werker & Tees
(1987); Leonard & McGregor (1992); Manis et al (1997); Nittrouer
(1999); Blomert & Mitterer (2001); Serniclaes et al (2001); Sussman
(2001); Van Alphen et al (2004); Serniclaes et al (2004); but see Coady,
Kluender & Evans (2005), Gupta & Tomblin (in prep);
• Impaired categorical perception, appearing as:
- Poor endpoint identification.
- Shallower identification slopes.
- Flat discrimination functions.
- No difference (or impairment) in within-category discrimination.
[Figure: schematic identification and discrimination functions for Normal vs. Impaired listeners.]
Dimensions of Individual Differences
But, given evidence against categorical
perception as an organizing principle of speech
perception, what does this mean?
Dimensions of Individual Differences
Candidate dimensions for individual differences:
1) Auditory processes responsible for encoding cues.
But: signal is highly redundant.
Listeners don’t rely on any single cue (or type of cue).
Auditory disruption would have to be massive.
2) Processes of
• Gradually committing to a word.
• Updating activation as new information arrives.
• Competition between words.
Methods
• 41 word sets.
• Words known to our subjects (familiarity survey).
• All items appear as targets.
• Natural recordings.
McMurray, Samelson, Lee & Tomblin (2010, Cognitive Psychology)
Individual differences approach: cross language ability with performance IQ to separate the effects of language impairment and cognitive impairment.
• Controls: N=40
• SLI (low language, normal IQ): N=20
• NLI (low language, low IQ): N=17
• SCI (normal language, low IQ): N=16
Group     % Correct   RT (ms)
Normal    99.2        1429
SCI       99.0        1493
SLI       98.2        1450
NLI       98.2        1635
Normal Subjects
[Figure: Fixation proportions over time (0-2000 ms) for Target, Cohort, Rhyme, and Unrelated items.]
1) All four groups perform well in the task.
2) All four groups show:
- Incremental processing.
- Parallel activation of cohorts/rhymes.
NLI (Language + Cognition Impaired)
[Figure: Target fixation proportions over time (0-2000 ms) for TD, SCI, SLI, and NLI groups.]
Logistic Function
[Figure: logistic fit to target fixations, with parameters baseline (b), peak (p), crossover (c), and slope; target fixation timecourses for TD, SCI, SLI, and NLI groups.]
Effects on logistic parameters:
            Slope    Asymptote   Cross-over
Language    p=.002   p=.004      n.s.
IQ          n.s.     n.s.        n.s.
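A minimal sketch of fitting such a four-parameter logistic to a fixation timecourse. The data here are simulated; the parameter names (baseline, peak, crossover, slope) follow the slides, everything else is illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, b, p, c, s):
    """Four-parameter logistic: baseline b, peak p, crossover c (ms), slope s."""
    return b + (p - b) / (1.0 + np.exp(-s * (t - c)))

# Simulated target-fixation proportions, sampled every 50 ms
t = np.arange(0.0, 2000.0, 50.0)
rng = np.random.default_rng(1)
obs = logistic(t, 0.05, 0.85, 600.0, 0.01) + rng.normal(0.0, 0.02, t.size)

# Fit the curve; the recovered parameters are what group comparisons run on
(b, p, c, s), _ = curve_fit(logistic, t, obs, p0=[0.1, 0.8, 500.0, 0.01])
```

Comparing groups on b, p, c, and s (rather than on raw fixation proportions at each time bin) is what lets the analysis localize a difference to, say, the asymptote versus the crossover point.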
Effects on the target were unexpected.
Why would subjects fixate the target less, given that they correctly identified it?
Not due to:
• Calibration accuracy of the eye-tracker.
• Knowledge of target words.
• Inability to recognize competitors.
Suggests the target may be less active.
[Figure: Target fixation proportions over time (0-2000 ms) for TD, SCI, SLI, and NLI groups.]
[Figure: Cohort fixation proportions over time (0-2000 ms) for Normal, SCI, SLI, and NLI groups.]
Asymmetric Gaussian Function
[Figure: asymmetric Gaussian fit to competitor fixations, with parameters onset baseline (b1), onset slope (σ1), peak height (h), peak location (μ), offset slope (σ2), and offset baseline (b2); cohort fixation timecourses for Normal, SCI, SLI, and NLI groups.]
Effects on asymmetric Gaussian parameters (cohort):
            Onset slope   Peak location   Peak   Offset slope   Baseline
Language    n.s.          n.s.            n.s.   p=.005         p=.064 (marginal)
IQ          n.s.          n.s.            n.s.   n.s.           n.s.
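The asymmetric Gaussian can be written as two Gaussian halves that share a peak but have independent slopes and baselines. A sketch, with parameter names following the slides and illustrative values:

```python
import numpy as np

def asym_gauss(t, h, mu, s1, s2, b1, b2):
    """Asymmetric Gaussian: rises from onset baseline b1 with width s1
    to peak height h at peak location mu, then falls with width s2
    toward offset baseline b2."""
    onset = b1 + (h - b1) * np.exp(-((t - mu) ** 2) / (2 * s1 ** 2))
    offset = b2 + (h - b2) * np.exp(-((t - mu) ** 2) / (2 * s2 ** 2))
    return np.where(t < mu, onset, offset)

# Evaluate an example competitor curve over a 2000 ms window
t = np.linspace(0, 2000, 401)
y = asym_gauss(t, h=0.2, mu=700, s1=150, s2=300, b1=0.02, b2=0.05)
```

The separate offset slope and offset baseline are what allow the fits to capture competitors that are activated normally but linger too long, which is exactly the pattern at issue here.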
[Figure: Rhyme fixation proportions over time (0-2000 ms) for Normal, SCI, SLI, and NLI groups.]
Effects on asymmetric Gaussian parameters (rhyme):
            Onset slope   Peak location   Peak   Offset slope   Baseline
Language    n.s.          n.s.            n.s.   n.s.           p=.045
IQ          n.s.          n.s.            n.s.   n.s.           n.s.
Summary
IQ showed few effects.
Target: lower peak fixations/activation for LI
Cohort: higher peak fixations for LI.
Rhyme: higher peak fixations for LI.
What computational differences could account for this
timecourse of activation?
[Figure: Cohort fixation proportions over time (0-2000 ms) for Normal, SCI, SLI, and NLI groups.]
TRACE
[Figure: TRACE architecture. Three layers: features (power, voiced, acute, diffuse, grave), phonemes (b, e, k, t, r, l), and words (beaker, beetle, lamp). Excitatory connections run between layers; inhibitory connections (lateral competition) run within a layer.]
Fixation probability maps onto lexical activation
(transformed via a simple linking hypothesis).
(Allopenna, Magnuson & Tanenhaus, 1998; Dahan, Magnuson & Tanenhaus, 2001;
McMurray, Samelson, Lee & Tomblin, 2010)
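Allopenna et al. (1998) implemented this linking hypothesis as a Luce choice rule over exponentiated activations. A sketch; the activation values and the scaling constant k are illustrative, not the published settings:

```python
import numpy as np

def fixation_probs(activations, k=7.0):
    """Luce choice rule linking hypothesis: each item's response strength
    is exp(k * activation); normalizing yields predicted fixation
    probabilities that sum to 1 over the items in the display."""
    strength = np.exp(k * np.asarray(activations, dtype=float))
    return strength / strength.sum()

# Hypothetical TRACE activations for target, cohort, rhyme, unrelated
probs = fixation_probs([0.8, 0.4, 0.2, 0.05])
```

Because the transform is monotonic, ordinal differences in lexical activation carry through to predicted fixation curves, which is what makes fixation timecourses usable as a read-out of activation dynamics.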
[Figure: Empirical fixation proportions (left) and TRACE activations transformed into predicted fixation probabilities (right), over time. TRACE activations account for 99% of the variance.]
TRACE
[Figure: TRACE architecture, repeated, annotated with the parameter families listed below.]
Global Parameters
• Maximum Activation
• # of known words
Lexical Parameters
• Lexical Inhibition
• Phoneme->Word
• Decay
Phonological Parameters
• Phoneme Inhibition
• Feature->Phoneme
• Phoneme Decay
Perceptual Parameters
• Input Noise
• Feature Spread
• Feature Decay
Strategy:
1) Vary parameter.
2) Does it yield the same
kind of variability we
observed in SLI?
Summary:
• Most parameters failed.
Global Parameters
• Generalized slowing
• # of known words
Perceptual Parameters
• Input Noise
• Feature Spread
• Feature Decay
Phonological Parameters
• Phoneme Inhibition
• Feature->Phoneme
• Phoneme Decay
Lexical Parameters
• Lexical Inhibition
• Phoneme->Word
• Lexical Decay
[Figure: Simulated target fixation probability over time (frames) as phoneme activation (robustness of phonology) is lowered from 0.025 through the 0.02 default/normal value to 0.005 (impaired), and as feature decay (sensory memory / organization) is raised from the 0.01 default/normal value to 0.04 (impaired).]
[Figure: Simulated target fixation probability over time with varying phoneme inhibition (categorical perception), Normal vs. LI.]
Higher-level processes (e.g., word recognition) are largely immune to variation in phoneme processing.
[Figure: Varying lexical decay in TRACE reproduces the LI pattern: lower target fixation probability and elevated competitor fixation probability relative to Normal.]
[Figure: Model fit (RMS error) for each manipulated TRACE parameter, for target fixations and for competitor fixations, with lexical decay providing the best fit in both cases.]
Robust deficit in lexical competition processes associated with SLI.
• Late in processing.
• Too much competitor activation / not enough target.
TRACE modeling indicates a lexical, not perceptual, locus.
• Dynamics / stability of lexical activation over time.
This provides indirect evidence against a speech perception deficit as the account of word recognition deficits in SLI.
Can we ask more directly:
1) Are SLI listeners' speech categories structured gradiently (like those of typical listeners)?
2) Are SLI listeners overly sensitive, or insensitive, to within-category detail?
9-step VOT continua
beach/peach
bear/pear
bomb/palm
bale/pale
bump/pump
butter/putter
Munson, McMurray & Tomblin (submitted)
Hypothesis 1: Poorly structured phonological categories?
[Figure: predicted competitor fixations as a function of VOT (0-40 ms) under this hypothesis.]
Hypothesis 2: Improperly tuned lexical competition?
[Figure: predicted competitor fixations as a function of VOT (0-40 ms) under this hypothesis.]
Subjects (IQ uncontrolled): 42 normal, 35 language impaired.
Subjects were run in a mobile lab at their homes and schools.
[Figure: % /p/ responses as a function of VOT (0-40 ms) for LI and TD groups.]
Normal-looking identification (mouse-click) functions.
Few observable differences.
[Figure: Competitor fixations as a function of VOT relative to the category boundary (rVOT, -30 to 30 ms) for LI and Normal groups.]
LI: more looks to the competitor.
No effect on sensitivity to VOT.
The problem is not:
• Sensitivity to VOT.
• The nature of phonetic categories.
Summary
Robust deficit in lexical competition associated with SLI.
• Late in processing.
• Too much competitor activation / not enough target.
TRACE modeling indicates a lexical, not perceptual locus.
• Dynamics / stability of lexical activation over time.
LI listeners do not show unique differences in their response
to phonetic cues (as reflected in lexical activation).
What is the source of their deficit?
Are they just developmentally delayed?
Development
Do the changes in lexical activation dynamics over development match the changes with SLI?
• N=17 TD adolescents.
• Target/Cohort/Rhyme/Unrelated paradigm.
[Figure: Target fixations over time (0-2000 ms) for SLI, Normal/LI, and TD 16- and 9-year-olds.]
McMurray, Walker & Greiner (in preparation)
McMurray, Walker & Greiner (in preparation)
[Figure: Cohort fixations over time (0-2000 ms) for SLI, Normal/LI, and TD 16- and 9-year-olds.]
Summary
Robust deficit in lexical competition associated with SLI (see
also Dollaghan, 1998; Montgomery, 2000; Mainela-Arnold, Evans & Coady, 2008).
• Late in processing.
• Too much competitor activation / not enough target.
TRACE modeling indicates a lexical, not perceptual locus.
• Dynamics / stability of lexical activation over time.
LI listeners do not show unique differences in their response
to phonetic cues (as reflected in lexical activation).
There is still development in basic word recognition
processes between 9 and 16.
But development affects the speed of target activation and early competitor activation; this differs from the LI pattern.
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
Reliability
Work on SLI and TD adolescents suggests that measuring the timecourse of word recognition can be sensitive to different causes of differences.
• Listeners can get to the same outcome (the word) via
different routes.
1) To what extent is this measure reliable across tests?
Farris-Trimble & McMurray (in preparation)
Reliability
1) To what extent is this measure reliable across tests?
2) To what extent is this measure about fixations and visual processes?
Design: Test 1 → (+1 week) → Test 2 → (+1 week) → Test 3, using the same word sets (e.g., sandal).
Curve-fit parameters (recap):
• Logistic: baseline (b), crossover (c), peak (p), slope.
• Asymmetric Gaussian: onset baseline (b1), onset slope (σ1), peak height (h), peak location (μ), offset slope (σ2), offset baseline (b2).
R² by predictor:
                     Auditory   Visual
Target: Cross-over   .63**      .30**
Target: Slope        .43**      .01
Target: Max          .28**      .18**
Cohort: Peak         .52**      .43**
Cohort: Peak Time    .37**      .11*
Cohort: Baseline     .35**      .17**
[Scatterplots: test-retest scatter for Max (Auditory test 2 vs. Auditory test 1) and for Slope (Auditory test 2 vs. Auditory test 1, and vs. Visual test 1).]
Summary
Work on SLI and TD adolescents suggests that measuring the timecourse of word recognition can be sensitive to different profiles of online processing.
• Listeners can get to the same outcome (the word) via different routes.
1) This measure is reliable across tests.
• Some components had correlations upward of .8.
2) Visual processes (eye movements, visual search, decision making) account for some of this.
• But some is uniquely due to auditory/lexical processes.
Overview
1) Speech perception as a language process
• Problems of Speech and word recognition
• Fine-grained detail and word recognition.
• Revisiting categorical perception
• Using acoustic detail over time.
• The beginnings of a comprehensive approach.
2) Individual differences
• A process view of individual differences.
• Case study 1: SLI
• Eye-movement methods for individual differences.
• Case study 2: Cochlear Implants.
Speech and Word Recognition
Candidate dimensions for individual differences in processing:
1) Auditory processes responsible for encoding cues. → Cochlear Implants?
But: signal is highly redundant. Listeners don't rely on any single cue (or type of cue). Auditory disruption would have to be massive.
2) Processes of: → SLI
• Gradually committing to a word.
• Updating activation as new information arrives.
• Competition between words.
Speech and Word Recognition
Cochlear Implant users
• Should show a deficit in spoken word recognition
(Helms et al., 1997; Balkany et al., 2007; Sommers, Kirk & Pisoni, 1996)
• Temporal dynamics of lexical activation may follow
a different profile of online activation.
In addition: to what extent are differences driven by
• Poor signal encoding?
• Adapting / learning to cope with the implant?
Speech and Word Recognition
• 29 adult CI users (postlingually deafened).
• 26 NH listeners.
• 29 word sets × 5 reps (580 trials).
Farris-Trimble & McMurray (submitted)
Speech and Word Recognition
[Figure: Target fixations over time (0-2000 ms) for CI adults and NH adults.]
Significant differences in:
• Slope (p<.001)
• Cross-over / delay (p<.001)
• Maximum (p=.01)
Speech and Word Recognition
[Figure: Cohort fixations over time (0-1500 ms) for CI adults and NH adults.]
Significant differences in:
• Slope (p=.001)
• Peak location (p=.004)
• Offset slope (p=.007)
• Baseline (p<.001)
Speech and Word Recognition
CI listeners…
• show effects both early and late in the timecourse.
• are delayed getting started: they require more information to start activating words.
• maintain competitor activation more than NH listeners.
Which of these is driven by the poor signal, and which by adaptation?
31 NH listeners:
• Normal words (N=15)
• 8-channel CI simulation (N=16)
Speech and Word Recognition
[Figure: Target fixations over time (0-2000 ms) for the 8-channel simulation group and NH adults.]
Significant difference in:
• Cross-over / delay (p<.001)
Marginal effect on:
• Slope (p=.07)
No effect on maximum (t<1).
Speech and Word Recognition
[Figure: Cohort fixations over time (0-1500 ms) for the 8-channel simulation group and NH adults.]
Significant difference in:
• Slope (p=.015)
Marginal effects in:
• Peak location (p=.067)
• Baseline (p=.058)
No effect on offset slope (t<1).
CI Adult Summary
CI listeners…
• show effects both early and late in the timecourse.
• require more information to start activating words.
• maintain competitor activation more than NH listeners.
1) A degraded signal slows the growth of activation for targets and competitors.
• It also increases the chance of misidentifying segments.
2) Listeners adapt by keeping competitors around, in case they need to revise due to later material.
What about pediatrically deafened child users?
They face an additional problem: learning language with a degraded signal.
CI Kids
Ongoing work
              N    Age (range)
CI users      24   17 (12-26)
NH controls   13   15.5 (12-17)
[Figure: Target fixations over time (0-2000 ms) for NH children, CI children, and CI adults.]
Looks to target show the same effects as in adults: slower, later, lower.
• Cross-over / delay (p<.001)
• Slope (p<.001)
• Maximum (p=.006)
Farris-Trimble & McMurray (in prep)
CI Kids
[Figure: Cohort fixations over time (0-2000 ms) for NH children, CI children, and CI adults.]
Looks to cohorts are similar to adults, but with reduced peak fixation.
• Slope (p<.001)
• Peak location (p<.001)
• Peak height (p=.058)
• Baseline (p<.001)
CI Summary
Degraded input affects early portions of the timecourse of processing:
• A delay in getting started.
• Slower activation growth.
Adaptation to the input affects later components:
• Increased competitor activation (hedging your bets).
Children show all these effects in the extreme.
• And with reduced competitor activation.
Conclusions
Basic speech perception findings
1) Fine-grained detail is crucial for word recognition.
• Available in sensory encoding of cues.
• Preserved up to level of lexical activation.
• Compensating for speaker/coarticulation in a way
that preserves it allows for excellent speech
recognition.
Perception is not about coping with irrelevant variation.
2) Lexical activation makes a graded commitment on the basis of partial information and waits for more.
• Do people need to make a discrete phoneme decision as a precursor to word recognition?
3) Speech perception must harness massively redundant sources of information.
• Only by harnessing 24 cues + compensation could we achieve listener performance on fricative categorization.
4) Implications for impairment
• Single-cue explanations of SLI don't make sense.
• Impairments in categorical perception may be impairments in the ability to do that task.
• Are phonological representations causally related to word recognition?
Conclusions
Specific Language Impairment
1) SLI (functional language outcomes) is more related to lexical deficits than to perceptual/phonological ones.
- Consistent with work challenging a causal role for phonology in word recognition.
- Different effects than for development.
2) Could this have effects downstream (e.g., syntax/morphology/learning)?
- If word recognition is not outputting a single candidate, this would make parsing much harder (and see Levy, Bicknell, Slattery & Rayner, 2008).
- Generalized deficit in decay/maintaining activation in multiple components of the system.
Conclusions
Cochlear Implants
1) The timecourse of word recognition is shaped by:
- Degraded input.
- Listeners' adaptation to it at a lexical level.
- Development.
2) CI outcomes are as much a cognitive (e.g., lexical) issue as a perceptual one (see Conway, Pisoni & Kronenberger, 2009).
3) Cascading processes can have unexpected consequences.
- Child CI users activate words so slowly they appear to have less competition!
Conclusions
Individual Differences more broadly
1) Different populations get to the same outcome via vastly different mechanisms.
[Figure: target fixation timecourses across populations: typical development (9 vs. 16 y.o.), language impairment (Normal vs. LI), and cochlear implants (NH children, CI children, CI adults).]
- Need to use measures sensitive to online processing (in conjunction with speech and language outcomes).
- Need to consider how children accomplish a language goal, rather than language as a measurable outcome.
2) Gradations in the dynamics of lexical activation / competition can be a good way to describe individual differences at a mechanistic level.
What's the developmental cause of such differences?
3) Multiple-population work can reveal broader
mechanisms at play over language development
- SLI: known language deficit, maybe perceptual deficit.
- CI users: known perceptual deficit, maybe language.
- Child CI users: Both?
Conclusions
By looking at how children use language in real time, we might better understand how language develops.
Processing or Development?
Only by studying both can we form a model of change we can believe in.