INTERSPEECH2007SKS

The Neural Basis of Speech Perception – a view from functional imaging
Sophie Scott
Institute of Cognitive Neuroscience, University College London
This approach to speech
perception
• Speech is an auditory signal
• It is possible to address the neural processing
of speech within the framework of auditory
cortical processing.
• This is not synonymous with the entire
language system.
• If one is a skilled speaker of a language, then
speech perception is obligatory.
Functional imaging
• Where neural activity occurs, blood is
directed.
• Neural activity is measured indirectly, by tracking these
changes in local blood flow.
• This therefore reflects mass synaptic activity
• Temporal resolution is poor
• Essentially a comparison of blood flow
changes across conditions - so the baseline
comparisons are critical
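The point about baselines can be illustrated with a minimal sketch. The function name and all values below are hypothetical and chosen only for illustration; they are not data from any study cited here. The same "speech" measurements give a different answer depending on which baseline they are compared with.

```python
from statistics import mean


def condition_contrast(task_scans, baseline_scans):
    """Mean signal in the task condition minus mean signal in the baseline."""
    return mean(task_scans) - mean(baseline_scans)


# Hypothetical regional blood-flow values (arbitrary units), illustration only.
speech = [52.0, 54.0, 53.0]
silence = [50.0, 50.0, 50.0]   # low-level baseline
noise = [52.0, 53.0, 54.0]     # acoustically matched baseline

vs_silence = condition_contrast(speech, silence)  # 3.0: region looks "speech responsive"
vs_noise = condition_contrast(speech, noise)      # 0.0: no difference from noise
print(vs_silence, vs_noise)
```

With a silent baseline the region appears to respond to speech; with a noise baseline the difference disappears, so the conclusion depends on the comparison chosen.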
Listening
Wise et al, Lancet, 2001
Neuroanatomy of speech
Speech production
Speech perception
Scott and Johnsrude, 2003, from Romanski et al, 1999
[Figure: primate auditory cortex divided into core, belt and parabelt fields (labels include A1, R, RT, CM, RM, RTM, RTL, ML, AL, CL, MC, CBP, RBP, RP, CP, Tpt, TS1, TS2, TS3, Pro, Ins, STGc, STGr, sts), with prefrontal cortex regions labelled: dorsal prearcuate (8a), dorsal principal sulcus (46), inferior convexity (12), orbital, polar. Orientation labels: medial, lateral, caudal, rostral.]
From Kaas and Hackett, 1999
Spatial representations
tonotopy
bandwidth
Conspecific vocalisations
[Figure: human and monkey auditory cortex compared. Labels: HG, PT, Tpt, STP, STS, C, B, PB, Assoc. Orientation labels: anterior, posterior, ventral.]
Scott and Johnsrude, 2003
[Figure: human auditory fields (A1, AA, MA, LA, PA, STA, ALA, LP). Orientation labels: anterior, posterior, medial, lateral.]
Scott and Johnsrude, 2003
• Sounds with harmonic structure against pure tones: Hall, Johnsrude et al., 2002
• Frequency modulated tones against unmodulated tones: Hall, Johnsrude et al., 2002
• Amplitude modulated noise against unmodulated noise: Giraud et al, 1999
• Spectral change against steady state sounds: Thivard et al, 2000
Hierarchical processing
• Structure in sound is computed beyond
primary auditory cortex
• More complex structure (e.g. spectral
change) processed further from PAC
• How does this relate to speech
processing?
Stimuli: speech (Sp), rotated speech (RSp), noise vocoded speech (VCo), rotated noise vocoded speech (RVCo)
Left hemisphere
(Sp + VCo + RSp) - RVCo: -60 -4 -10, Z = 6.6
(Sp + VCo + RSp) - RVCo: -64 -38 0, Z = 5.7
(Sp + VCo) - (RSp + RVCo): -54 +6 -16, Z = 4.7 (anterior)
(Sp + VCo) - (RSp + RVCo): -62 -12 -12, Z = 5.5
Right hemisphere
(Sp + RSp) - (VCo + RVCo): +66 -12 0, Z = 6.7 (anterior)
[Plots: response at each peak across the conditions Sp, VCo, RSp, RVCo]
Scott, Blank, Rosen and Wise, 2000
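Contrasts such as (Sp + VCo + RSp) - RVCo can be read as comparisons between groups of conditions. The sketch below assumes each side of the contrast is averaged, so the weights balance to zero; the source does not state the weights actually used, and the condition means are hypothetical values chosen for illustration.

```python
def contrast_value(means, plus, minus):
    """Average of the 'plus' conditions minus average of the 'minus' conditions."""
    pos = sum(means[c] for c in plus) / len(plus)
    neg = sum(means[c] for c in minus) / len(minus)
    return pos - neg


# Hypothetical mean responses at one location (arbitrary units), illustration only.
means = {"Sp": 2.0, "VCo": 2.0, "RSp": -1.0, "RVCo": -1.0}

# (Sp + VCo) - (RSp + RVCo): unrotated against rotated stimuli
unrotated_vs_rotated = contrast_value(means, ("Sp", "VCo"), ("RSp", "RVCo"))

# (Sp + RSp) - (VCo + RVCo): unvocoded against noise vocoded stimuli
unvocoded_vs_vocoded = contrast_value(means, ("Sp", "RSp"), ("VCo", "RVCo"))

# (Sp + VCo + RSp) - RVCo: three conditions against the remaining one
three_vs_rvco = contrast_value(means, ("Sp", "VCo", "RSp"), ("RVCo",))

print(unrotated_vs_rotated, unvocoded_vs_vocoded, three_vs_rvco)
```

With these made-up means, the location separates unrotated from rotated stimuli but shows no difference between unvocoded and vocoded stimuli.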
Intelligibility
Plasticity within this system
Naïve subjects were scanned before they could
understand noise vocoded speech, then they were
trained, then scanned again.
Flexibility in speech perception: learning to
understand noise vocoded speech
Activity to noise vocoded (NVC) speech after a training period, relative to activity to
the same stimuli before training. Narain, Wise, Rosen, Matthews, Scott, under
review.
As well as left lateralised STS, there is involvement
of left premotor cortex and the left anterior thalamus
(which receive projections from the belt and
parabelt).
Spectrograms of the stimuli
(speech): 16, 8, 4, 3, 2, 1
(rotated speech): 16R, 3R
Intelligibility - behavioural data
[Plots: response by condition (1, 2, 3, 4, 8, 16, 3R, 16R) at four peaks, left and right hemisphere]
Z=5.6 x=-62 y=-10 z=80
Z=4.52 x=-64 y=-28 z=8
Z=5.96 x=64 y=-4 z=-2
Z=4.73 x=-48 y=-16 z=-16
Scott, Rosen, Lang and Wise, 2006
Scott and Johnsrude, 2003
[Figure: auditory cortex responses. Orientation labels: anterior, posterior, medial, lateral.]
• Sounds with harmonic structure against pure tones: Hall, Johnsrude et al., 2002
• Frequency modulated tones against unmodulated tones: Hall, Johnsrude et al., 2002
• Amplitude modulated noise against unmodulated noise: Giraud et al, 1999
• Spectral change against steady state sounds: Thivard et al, 2000
• Peak responses to intelligibility (Scott et al, 2006)
Speech specific processing
• Does not occur in primary auditory
cortex
• Begins early in auditory cortex - in areas
that also respond to AM
• As we move forward down the STS, the
responses become less sensitive to
acoustic structure and come to resemble
the behavioural profile
Speech comprehension - The
role of context
• e.g., words recognised more easily in
sentences
• “The ship sailed the sea” > “Paul discussed
the dive”.
• Can we identify the neural basis of this
contextual modulation of speech
comprehension?
(Miller et al., 1951; Boothroyd and Nittrouer, 1988; Grant and Seitz, 2000;
Stickney and Assmann, 2001; Davis et al., 2005)
(noise vocoding:
Shannon et al., 1995
predictability:
Kalikow et al., 1977)
Low predictability:
log increase with more channels
…‘Sue was interested in the bruise’…
jonas obleser
High predictability:
influence at intermediate number of channels
…‘Sue was interested in the bruise’…
…‘He caught the fish in his net’…
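A "log increase with more channels" can be written as score = a + b * log(channels). The sketch below is a minimal least-squares fit of that form. The base-2 logarithm, the channel counts, and the values a = 10 and b = 20 are all assumptions made for illustration. The scores are generated from the model itself and are not behavioural data.

```python
import math


def fit_log2(channels, scores):
    """Least-squares fit of score = a + b * log2(channels); returns (a, b)."""
    xs = [math.log2(c) for c in channels]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical channel counts; synthetic scores produced by the model (a=10, b=20).
channels = [1, 2, 4, 8, 16]
scores = [10 + 20 * math.log2(c) for c in channels]

a, b = fit_log2(channels, scores)
print(a, b)  # the fit recovers the intercept and slope used to generate the scores
```

Under this form, each doubling of the number of channels adds the same amount (b) to the score.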
Bottom-up processes:
correlations with number of channels
(cf. e.g. Binder et al. 2000; Scott et al., 2000; Davis & Johnsrude 2003; Zekveld et al., 2006)
RFX p<0.005 uncorrected, k>30
Obleser, Wise, Dresner, & Scott, 2007
Left-hemispheric array of brain regions
when context affects comprehension
Lateral Prefrontal (BA 8)
Medial Prefrontal (BA 9)
Angular Gyrus (BA 39)
Ventral IFG (BA 47)
Posterior Cingulate (BA 30)
RFX p<0.005 uncorrected, k>30
Obleser, Wise, Dresner, & Scott, 2007
Findings
• A range of brain areas outwith auditory
cortex contribute to ‘top down’ semantic
influences on speech perception
• Further studies will be able to dissociate
the contributions of different linguistic
factors
Words are not the only things we say
Non speech sounds?
x=54
Regions in red respond to noises and rotated noises
Regions in yellow respond to noises and rotated noises
Right hemisphere
(Sp + RSp) - (VCo + RVCo): +66 -12 0, Z = 6.7 (anterior)
[Plot: response across the conditions Sp, VCo, RSp, RVCo]
What drives lateral
asymmetry?
• Previous studies have not generally used
‘speech-like’ acoustic modulations
• We aimed to manipulate speech stimuli to
vary the amplitude and spectral properties of
speech independently
• Control for intelligibility
• Do we see additive effects of amplitude and
spectral modulations?
• Are these left lateralised?
Steady spectrum, steady amplitude
Steady spectrum, varying amplitude
Varying spectrum, steady amplitude
Varying spectrum, varying amplitude
Ideal additive effects (effect size):
• Significantly more activated by stimuli with both AM and SpM
• Similar response to AM and SpM
• Down for flat amplitude and spectrum
Additive effects
[Plots: effect size across the conditions Flat, AM, SpM, SpMAM]
PET scanning, 16 runs, N=13, thresholded at p<0.0001, 40 voxels
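One way to make "additive effects" concrete: if amplitude and spectral modulation contribute independently, the response to SpMAM should equal the Flat response plus the separate AM and SpM effects. The sketch below assumes that simple definition. The function name and the effect sizes are hypothetical and are not values from the study.

```python
def additivity_check(flat, am, spm, spm_am):
    """Return (predicted SpMAM response under additivity, observed minus predicted)."""
    am_effect = am - flat      # effect of amplitude modulation alone
    spm_effect = spm - flat    # effect of spectral modulation alone
    predicted = flat + am_effect + spm_effect
    interaction = spm_am - predicted
    return predicted, interaction


# Hypothetical effect sizes (arbitrary units), illustration only.
predicted, interaction = additivity_check(flat=-1.0, am=0.5, spm=0.5, spm_am=2.0)
print(predicted, interaction)
```

An interaction near zero is what the ideal additive pattern looks like; a clearly positive or negative value would indicate a response to the combination that the two single modulations do not explain.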
But…
• Is there a problem - were these stimuli
really processed as speech?
• To address this, 6 of the 13 subjects
were pretrained on speech exemplars,
and the speech stimuli were included as
a 5th condition.
[Figure: panels A to E, with the speech condition labelled]
Speech conditions
[Plots: response across the conditions Flat, AM, SpM, SpMAM]
N=6, thresholded at p<0.0001, 40 voxels
Asymmetries in speech
perception
• Exist!
• Are not driven by simple properties of
the speech signal
• Right - preferentially processes speech-like sounds - voices?
• Left - processes linguistically relevant
information
Posterior auditory areas
• In primates, medial posterior areas
show auditory and tactile responses
• What do these areas do in speech
processing in humans?
Wise et al, 2001, Brain
Speaking and mouthing
Wise, Scott, Blank, Murphy, Mummery and Warburton, 2001
This region, in the left posterior temporal-parietal junction, responds when subjects
repeat a phrase, mouth the phrase silently,
or go ‘uh uh’, relative to mentally rehearsing the
phrase
Amount of delayed auditory feedback (DAF): 0, 50, 125, 200 ms
Listening over silence
DAF peak on right
[Plot: response by delay (0, 50, 125, 200 ms)]
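Delayed auditory feedback plays a speaker's own voice back after a fixed lag. A minimal sketch of such a delay, using the four delays listed above: the function names and the 16 kHz sample rate are assumptions for illustration, not details of the experimental setup.

```python
def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


def delay_feedback(samples, delay_ms, sample_rate_hz):
    """Return the signal delayed by delay_ms, padded with silence, same length."""
    n = delay_in_samples(delay_ms, sample_rate_hz)
    padded = [0.0] * n + list(samples)
    return padded[:len(samples)]


SAMPLE_RATE_HZ = 16000  # hypothetical sample rate
delays_ms = [0, 50, 125, 200]  # the delays listed on the slide
lags = {ms: delay_in_samples(ms, SAMPLE_RATE_HZ) for ms in delays_ms}
print(lags)

# Tiny example at a hypothetical 100 Hz rate: 50 ms is a lag of 5 samples.
voice = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
heard = delay_feedback(voice, 50, 100)
print(heard)
```

At a delay of 0 ms the feedback is identical to the input, which corresponds to the undelayed listening condition.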
Neural basis of speech
perception
• Hierarchical processing of sound in auditory cortex
• The anterior ‘what’ pathway is important in the
perceptual processing of speech
• Activity in this system can be modulated by top
down linguistic factors
• There are hemispheric asymmetries in speech
perception - the left is driven by phonetic, lexical
and linguistic properties: the right is driven by pitch
variation, emotion and indexical properties
• There are sensory motor links in posterior auditory
areas - part of a ‘how’ pathway?
Scott, Current Opinion in Neurobiology, 2005
[Figure: ‘what’, ‘where’ and ‘how’ pathways]
Scott, in press
Carolyn McGettigan
Disa Sauter
Charlotte Jacquemot
Sophie Scott
Frank Eisner
Richard Wise
Charvy Narain
Andrew Faulkner
Hideki Takaso
Narly Golestani
Jonas Obleser
Stuart Rosen