
Neurocomputational models of speech production

Models of speech production
Dzhuma Abakarova
25 September 2018
Neurocomputational model
Computational neuroscience employs mathematical models to understand the principles
that govern the development, structure, physiology and cognitive abilities of the nervous
system.
It focuses on the description of biologically plausible neurons (and neural systems), their physiology and dynamics; it is thus distinct from biologically unrealistic approaches such as connectionism, machine learning, artificial neural networks, artificial intelligence and computational learning theory.
Why neurocomputational models
➔ Models capture the essential features of the biological system at multiple spatiotemporal scales, from membrane currents, proteins, and chemical coupling to network oscillations, columnar and topographic architecture, and learning and memory
➔ Models frame hypotheses that can be directly tested by biological or psychological
experiments
Neurocomputational speech processing
What
Computer simulation of speech production and speech perception that refers to the natural neuronal processes of speech production and speech perception as they occur in the human nervous system
Models of speech production

Psycholinguistic models
➔ describe the whole process of speech: utterance planning → generation of a phonological representation → generation of articulatory speech patterns
➔ e.g.
❖ the Levelt model (Levelt et al., 1999)

Sensorimotor models
➔ describe the path from a phonological representation → generation of articulatory movement trajectories & an acoustic speech signal
➔ e.g.
● the DIVA model (Guenther et al., 2006)
● the TADA model (Saltzman & Munhall, 1989)
● the ACT model (Kröger et al., 2009)
● the Warlaumont model (Warlaumont et al., 2013)
● the Hickok model (Hickok, 2012)
Sensorimotor production models are quantitative
➔ they generate measurable articulatory movement patterns and an acoustic speech signal
➔ they include knowledge gained from brain imaging & behavioral experiments → neuroscience-based
Structure vs. Knowledge
The structure of the sensorimotor models is based on neurophysiological and neuropsychological facts, whereas speech knowledge is acquired by training artificial neural networks in a process that imitates the early stages of speech acquisition.
DIVA model
(figure)
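The figure that belongs here shows the DIVA architecture, in which a feedforward command is combined with auditory and somatosensory feedback control (Guenther et al., 2006). Below is a minimal Python sketch of that combination; the gains, dimensionalities, and the identity stand-in for the learned sensory-to-motor transformations are illustrative assumptions, not the published implementation.

import numpy as np

# Toy sketch of DIVA-style control: the motor command combines a learned
# feedforward command with feedback corrections derived from auditory and
# somatosensory errors. All names and parameter values are assumptions.

def diva_motor_command(feedforward, auditory_target, auditory_state,
                       somato_target, somato_state,
                       alpha_aud=0.5, alpha_som=0.3):
    """Combine feedforward and feedback control signals (toy version)."""
    auditory_error = auditory_target - auditory_state   # heard vs. intended
    somato_error = somato_target - somato_state         # felt vs. intended
    # Feedback corrections would be mapped back into motor coordinates;
    # here an identity mapping stands in for the learned inverse models.
    feedback_command = alpha_aud * auditory_error + alpha_som * somato_error
    return feedforward + feedback_command

# Toy usage: a 3-dimensional "motor" space standing in for articulator commands.
ff = np.array([0.2, -0.1, 0.4])
cmd = diva_motor_command(ff,
                         auditory_target=np.array([1.0, 0.5, 0.0]),
                         auditory_state=np.array([0.8, 0.6, 0.1]),
                         somato_target=np.array([0.0, 0.0, 0.0]),
                         somato_state=np.array([0.1, 0.0, -0.1]))
print(cmd)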
Task Dynamics model
(figure)
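The Task Dynamics (TADA) framework models each speech gesture as a critically damped second-order point-attractor system that drives a tract variable toward its target (Saltzman & Munhall, 1989). The sketch below integrates such a system for a single gesture; the parameter values and the simple Euler integration are illustrative assumptions.

import numpy as np

# Toy task-dynamic gesture: a damped second-order system driving a tract
# variable z (e.g. lip aperture) toward its target z0, with mass normalized to 1.

def simulate_gesture(z0_target, z_init=0.0, k=100.0, duration=0.3, dt=0.001):
    b = 2.0 * np.sqrt(k)          # critical damping, as assumed in task dynamics
    z, v = z_init, 0.0
    trajectory = []
    for _ in range(int(duration / dt)):
        a = -k * (z - z0_target) - b * v   # z'' = -k (z - z0) - b z'
        v += a * dt
        z += v * dt
        trajectory.append(z)
    return np.array(trajectory)

# Toy usage: a bilabial closure gesture moving lip aperture from 10 mm toward 0 mm.
traj = simulate_gesture(z0_target=0.0, z_init=10.0)
print(traj[::50])  # sampled points along the movement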
ACT model (Kröger et al., 2009)
- an action-based model of speech production, speech perception, and speech acquisition
- includes an artificial, computer-implemented vocal tract as a front-end module that generates articulatory speech movements and acoustic speech signals
ACT model
(figures)
Cortical correlates of neural maps in the ACT model
(figure)
➔ locations of primary motor and primary sensory maps
➔ locations of the motor plan state map and sensory state maps
➔ locations of the mirrored phonetic map
Double arrows indicate neuronal mappings.
ACT model: STRUCTURE
- an artificial, computer-implemented vocal tract as a front-end module, which is capable of generating articulatory speech movements and acoustic speech signals
- the structure of this neurocomputational model is based on neurophysiological and neuropsychological facts
- the structure comprises motor and sensory processing pathways (see the sketch below)
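As a rough illustration of the structure just listed, the sketch below represents the neural maps as activation vectors and the mappings between them (the double arrows of the cortical-correlates figure) as weight matrices. The map names follow the slides, but the sizes, the random weights, and the simple propagation rule are illustrative assumptions rather than the actual ACT implementation.

import numpy as np

# Schematic sketch of an ACT-style structure: maps as activation vectors,
# mappings between maps as weight matrices. All sizes are assumptions.

rng = np.random.default_rng(0)

map_sizes = {
    "phonemic": 40,        # phonemic state map
    "phonetic": 225,       # (self-organizing) phonetic map, e.g. 15 x 15
    "motor_plan": 120,     # motor plan state map
    "auditory": 100,       # auditory state map
    "somatosensory": 100,  # somatosensory state map
}

# One weight matrix per mapping; double arrows in the figure correspond to
# mappings used in both directions (here: W and W.T).
mappings = {
    ("phonemic", "phonetic"): rng.normal(size=(map_sizes["phonetic"], map_sizes["phonemic"])),
    ("phonetic", "motor_plan"): rng.normal(size=(map_sizes["motor_plan"], map_sizes["phonetic"])),
    ("phonetic", "auditory"): rng.normal(size=(map_sizes["auditory"], map_sizes["phonetic"])),
    ("phonetic", "somatosensory"): rng.normal(size=(map_sizes["somatosensory"], map_sizes["phonetic"])),
}

def propagate(source_name, target_name, source_activation):
    """Feed activation from one map to another through its mapping."""
    W = mappings[(source_name, target_name)]
    return np.tanh(W @ source_activation)

# Toy forward pass of the production pathway: phonemic -> phonetic -> motor plan.
phonemic_state = np.zeros(map_sizes["phonemic"]); phonemic_state[3] = 1.0
phonetic_state = propagate("phonemic", "phonetic", phonemic_state)
motor_plan_state = propagate("phonetic", "motor_plan", phonetic_state)
print(motor_plan_state.shape)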
ACT model: KNOWLEDGE
- collected during training stages which imitate the early stages of speech acquisition
- stored in artificial self-organizing maps (a minimal sketch follows below)
- the current neurocomputational model is capable of producing and perceiving vowels, VC-, and CV-syllables (V = vowel, C = voiced plosive)
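Since the slide states that knowledge is stored in artificial self-organizing maps, the following minimal self-organizing-map sketch shows the kind of structure meant; the grid size, learning rate, and input coding are illustrative assumptions, not the parameters used by Kröger et al. (2009).

import numpy as np

# Minimal self-organizing map (SOM): the structure that an ACT-style phonetic
# map uses to store sensorimotor knowledge. All parameters are assumptions.

class SelfOrganizingMap:
    def __init__(self, grid=(15, 15), dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(size=(grid[0] * grid[1], dim))
        self.coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])

    def best_matching_unit(self, x):
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train_step(self, x, lr=0.1, radius=2.0):
        bmu = self.best_matching_unit(x)
        dist = np.linalg.norm(self.coords - self.coords[bmu], axis=1)
        h = np.exp(-(dist ** 2) / (2 * radius ** 2))          # neighbourhood function
        self.weights += lr * h[:, None] * (x - self.weights)  # pull neighbours toward x

# Toy usage: each training item is a joint motor-plan/sensory feature vector,
# so neighbouring map nodes come to represent similar speech items.
som = SelfOrganizingMap()
rng = np.random.default_rng(1)
for _ in range(500):
    som.train_step(rng.uniform(size=8))
print(som.best_matching_unit(rng.uniform(size=8)))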
Acquisition, Step 1: babbling training
Knowledge:
➔ the vocal tract apparatus ↔ its neural control
(i.e. various motor plan states ↔ their resulting somatosensory and auditory states)
➔ no phonemic activation, just language-independent, general phonetic knowledge
Training sets: pre-linguistic speech items, i.e. proto-vocalic and proto-syllabic speech items
Flow:
➔ the model randomly produces proto-vocalic and proto-syllabic speech items
➔ the model listens to its own productions
➔ sensory-to-motor mappings grow (see the sketch after this list)
Result:
➔ adjustment of the link weights of the mapping between the phonetic map and the sensory maps, and of the mapping between the phonetic map and the motor plan map
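A minimal sketch of the babbling flow described above: random proto-speech motor plans are produced, the model "listens" to its own output through a stand-in vocal tract, and a sensory-to-motor mapping is learned from the self-generated pairs. The linear vocal-tract stand-in and the least-squares learner are illustrative assumptions; ACT itself uses neural maps rather than a single matrix.

import numpy as np

# Toy babbling loop: random motor plans -> self-perception -> inverse mapping.

rng = np.random.default_rng(0)
MOTOR_DIM, SENSORY_DIM = 6, 10

# Unknown articulatory-to-auditory relation, played by a fixed random matrix.
true_vocal_tract = rng.normal(size=(SENSORY_DIM, MOTOR_DIM))

def produce(motor_plan):
    """Front-end vocal tract: motor plan -> sensory (auditory) state."""
    return true_vocal_tract @ motor_plan + 0.01 * rng.normal(size=SENSORY_DIM)

# Babbling: random proto-vocalic / proto-syllabic motor plans, self-perception.
motor_samples, sensory_samples = [], []
for _ in range(2000):
    m = rng.uniform(-1.0, 1.0, size=MOTOR_DIM)  # random proto-speech item
    s = produce(m)                               # the model listens to itself
    motor_samples.append(m)
    sensory_samples.append(s)

# Grow the sensory-to-motor (inverse) mapping from the self-generated pairs.
S = np.array(sensory_samples)
M = np.array(motor_samples)
inverse_mapping, *_ = np.linalg.lstsq(S, M, rcond=None)

# After babbling, a heard sensory state can be mapped back to a motor plan.
target_sensory = produce(rng.uniform(-1.0, 1.0, size=MOTOR_DIM))
recovered_motor = target_sensory @ inverse_mapping
print(recovered_motor.round(2))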
Acquisition, Step 2: imitation training
Knowledge: links between neurons which represent different phonemes or phonemic descriptions of syllables and the motor plan states and sensory states of their typical realizations
Training sets: language-specific vocalic and syllabic speech items
Flow: the link weights of the mapping between the phonetic map and the phonemic map are adjusted
Result:
(i) specification of regions of typical phoneme realizations (phone regions) within the phonetic map, i.e. regions of neurons within the phonetic map which represent typical realizations of a phoneme or of a syllable phoneme chain
(ii) fine-tuning of the sensorimotor link weights already trained during babbling; this fine-tuning mainly occurs at the phone regions, so the knowledge gained during imitation is language-dependent (a minimal sketch of this stage follows below)
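A minimal sketch of the imitation stage as described above: link weights between a phonemic map and a phonetic map are strengthened for the phonetic-map nodes activated by language-specific training items, so that phone regions emerge. The winner-take-all activation, the Hebbian-style update, and all sizes are illustrative assumptions.

import numpy as np

# Toy imitation training: strengthen phonemic-to-phonetic link weights so that
# "phone regions" emerge on the phonetic map. All parameters are assumptions.

rng = np.random.default_rng(0)
N_PHONEMES, N_PHONETIC = 5, 225          # e.g. a 15 x 15 phonetic map

link_weights = np.zeros((N_PHONEMES, N_PHONETIC))
codebook = rng.uniform(size=(N_PHONETIC, 8))   # phonetic map trained during babbling

def phonetic_activation(item_features, codebook):
    """Winner-take-all activation of the phonetic map for one training item."""
    winner = int(np.argmin(np.linalg.norm(codebook - item_features, axis=1)))
    act = np.zeros(len(codebook))
    act[winner] = 1.0
    return act

# Imitation training: each item is a (phoneme label, sensory/motor features) pair.
for _ in range(1000):
    phoneme = rng.integers(N_PHONEMES)
    features = rng.uniform(size=8) * 0.2 + phoneme / N_PHONEMES  # toy realizations
    act = phonetic_activation(features, codebook)
    link_weights[phoneme] += 0.05 * act        # Hebbian-style strengthening

# A "phone region" is the set of phonetic-map nodes most strongly linked
# to one phoneme node after training.
print(np.argsort(link_weights[2])[-5:])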
ACT model: coarticulation
1. Coarticulation results from the fact that the exact coordination of the articulators for executing a speech gesture is controlled by the motor execution module, and that a speech gesture is not encoded in full detail at the motor plan level; this leads to context-dependent variability in gesture execution (see the sketch below).
2. Coarticulation also results from the fact that gesture specifications can vary even at the level of the motor plan.
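A minimal sketch of the first source of coarticulation listed above: the motor plan specifies only the gestural goal (here, a bilabial closure), while a hypothetical motor execution module decides how jaw and lower lip share the task, so articulator trajectories vary with the vowel context. The numbers and the sharing rule are illustrative assumptions, not the ACT model's execution equations.

# Toy execution of one underspecified closure gesture in different vowel contexts.

def execute_closure(vowel_context_jaw_opening_mm):
    """Distribute one underspecified closure gesture over jaw and lower lip."""
    goal_aperture_mm = 0.0                       # motor plan: "close the lips"
    # Execution: the jaw contributes what its vowel context allows, the lip
    # makes up the remainder, so articulator trajectories vary with context.
    jaw_contribution = min(vowel_context_jaw_opening_mm, 6.0)
    lip_contribution = vowel_context_jaw_opening_mm - jaw_contribution
    return {
        "goal_aperture_mm": goal_aperture_mm,
        "jaw_raising_mm": jaw_contribution,
        "lip_raising_mm": lip_contribution,
    }

# Same planned gesture /b/, different execution in /i/ vs. /a/ context.
print(execute_closure(vowel_context_jaw_opening_mm=4.0))   # close vowel, e.g. /i/
print(execute_closure(vowel_context_jaw_opening_mm=12.0))  # open vowel, e.g. /a/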
THANK YOU!
References
Guenther FH, Ghosh SS, Tourville JA (2006). Neural modeling and imaging of the cortical interactions
underlying syllable production. Brain and Language 96, 280-301
Hickok G (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience 13,
135-145
Levelt WJM, Roelofs A, Meyer AS (1999). A theory of lexical access in speech production. Behavioral and
Brain Sciences 22, 1-75
Kröger BJ, Kannampuzha J, Neuschaefer-Rube C (2009). Towards a neurocomputational model of speech production and perception. Speech Communication 51, 793-809
Warlaumont AS, Westermann G, Buder EH, Oller DK (2013). Prespeech motor learning in a neural
network using reinforcement. Neural Networks 38, 64-75
Saltzman EL, Munhall KG (1989). A dynamical approach to gestural patterning in speech production.
Ecological Psychology 1, 333-382