Models of speech production
Dzhuma Abakarova
25 September 2018

Neurocomputational model
Computational neuroscience employs mathematical models to understand the principles that govern the development, structure, physiology and cognitive abilities of the nervous system.
It focuses on the description of biologically plausible neurons (and neural systems) and their physiology and dynamics; it is not concerned with biologically unrealistic disciplines such as connectionism, machine learning, artificial neural networks, artificial intelligence and computational learning theory.

Why neurocomputational models
➔ Models capture the essential features of the biological system at multiple spatial-temporal scales, from membrane currents, proteins, and chemical coupling to network oscillations, columnar and topographic architecture, and learning and memory
➔ Models frame hypotheses that can be directly tested by biological or psychological experiments

Neurocomputational speech processing
What: computer simulation of speech production and speech perception by referring to the natural neuronal processes of speech production and speech perception, as they occur in the human nervous system

Models of speech production
Psycholinguistic models
the whole process of speech production: utterance planning → generation of a phonological representation → generation of articulatory speech patterns
e.g.
❖ the Levelt model (Levelt et al., 1999)
Sensorimotor models
phonological representation → generation of articulatory movement trajectories & an acoustic speech signal
e.g.
● the DIVA model (Guenther et al., 2006)
● the TADA model (Saltzman & Munhall, 1989)
● the ACT model (Kröger et al., 2009)
● the Warlaumont model (Warlaumont et al., 2013)
● the Hickok model (Hickok, 2012)

Sensorimotor production models are quantitative
➔ generate measurable articulatory movement patterns and an acoustic speech signal
➔ include some knowledge gained from brain imaging & behavioral experiments → neuroscience-based

Structure vs. Knowledge
The structure of the sensorimotor models is based on neurophysiological and neuropsychological facts, whereas speech knowledge is acquired by training artificial neural networks in a process that imitates early stages of speech acquisition.

DIVA model

Task Dynamics model

ACT model (Kröger et al., 2009)
- action-based model of speech production, speech perception, and speech acquisition
- includes an artificial computer-implemented vocal tract as a front-end module that generates articulatory speech movements and acoustic speech signals

ACT model

Cortical correlates of neural maps in ACT model
- locations of primary motor and primary sensory maps
- locations of motor plan state map and sensory state maps
- locations for the mirrored phonetic map
Double arrows indicate neuronal mappings

ACT model: STRUCTURE
- An artificial computer-implemented vocal tract serves as a front-end module that is capable of generating articulatory speech movements and acoustic speech signals
- The structure of this neurocomputational model is based on neurophysiological and neuropsychological facts
- The structure comprises motor and sensory processing pathways

ACT model: KNOWLEDGE
- collected during training stages which imitate early stages of speech acquisition
- stored in artificial self-organizing maps
- The current neurocomputational model is capable of producing and perceiving vowels, VC-, and CV-syllables (V = vowels and C = voiced plosives)

Acquisition, Step 1: babbling training
Knowledge:
➔ the vocal tract apparatus ↔ its neural control (i.e. various motor plan states ↔ their resulting somatosensory and auditory states)
➔ No phonemic activation, just language-independent general phonetic knowledge
Training sets: pre-linguistic speech items, i.e. proto-vocalic and proto-syllabic speech items
Flow:
➔ The model randomly produces proto-vocalic and proto-syllabic speech items
➔ The model listens to its own productions
➔ Sensory-to-motor mappings grow
Result:
➔ adjustment of the link weights of the mapping between the phonetic map and the sensory maps
➔ adjustment of the link weights of the mapping between the phonetic map and the motor plan map

Acquisition, Step 2: imitation training
Knowledge: linking neurons that represent different phonemes or phonemic descriptions of syllables with the motor plan states and the sensory states of their typical realizations
Training sets: language-specific vocalic and syllabic speech items
The link weights of the mapping between the phonetic map and the phonemic map are adjusted.
Result:
(i) specifying regions of typical phoneme realizations (phone regions) within the phonetic map, i.e. regions of neurons within the phonetic map that represent typical realizations of a phoneme or of a syllable phoneme chain
(ii) fine-tuning of the sensorimotor link weights already trained during babbling. This fine-tuning mainly occurs at the phone regions.
Thus the knowledge gained during imitation is language dependent.

ACT model: coarticulation
1. Coarticulation results from the fact that the exact coordination of articulators for executing a speech gesture is controlled by the motor execution module, and that a speech gesture is not encoded in full detail at the motor plan level. This leads to context-dependent variability in gesture execution.
2. Coarticulation results from the fact that gesture specifications can vary even at the level of the motor plan.

THANK YOU!

References
Guenther FH, Ghosh SS, Tourville JA (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language 96, 280-301.
Hickok G (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience 13, 135-145.
Levelt WJM, Roelofs A, Meyer AS (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences 22, 1-75.
Kröger BJ, Kannampuzha J, Neuschaefer-Rube C (2009). Towards a neurocomputational model of speech production and perception. Speech Communication 51, 793-809.
Warlaumont AS, Westermann G, Buder EH, Oller DK (2013). Prespeech motor learning in a neural network using reinforcement. Neural Networks 38, 64-75.
Saltzman EL, Munhall KG (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1, 333-382.
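
Supplementary sketch: babbling training as a self-organizing map
The babbling step (Acquisition, Step 1) says that the model produces random speech items, listens to its own productions, and adjusts the link weights between the phonetic map and the motor plan and sensory maps. The Python sketch below illustrates that idea only. It is not the ACT implementation: it assumes a generic Kohonen-style self-organizing map update, and the function `toy_vocal_tract`, the 2-D motor and sensory states, the grid size, and the learning-rate schedule are all hypothetical choices made for illustration.

```python
import math
import random


def toy_vocal_tract(motor):
    """Hypothetical stand-in for the articulatory front end: maps a 2-D
    motor plan state to a 2-D 'sensory' state. Not the ACT vocal tract."""
    x, y = motor
    return (math.sin(math.pi * x / 2), x * y)


def train_babbling(grid=6, steps=3000, seed=0):
    """Train a grid x grid map on random (motor, sensory) pairs."""
    rng = random.Random(seed)
    # Each map neuron stores 4 link weights: motor (2) + sensory (2).
    weights = {(i, j): [rng.random() for _ in range(4)]
               for i in range(grid) for j in range(grid)}
    for t in range(steps):
        frac = t / steps
        lr = 0.5 * (1 - frac) + 0.01             # decaying learning rate
        radius = (grid / 2) * (1 - frac) + 0.5   # shrinking neighbourhood
        motor = (rng.random(), rng.random())     # random babble
        item = list(motor) + list(toy_vocal_tract(motor))  # own production
        # Best-matching neuron for the combined motor + sensory item.
        bmu = min(weights, key=lambda n: sum(
            (w - v) ** 2 for w, v in zip(weights[n], item)))
        # Pull the winner and its grid neighbours towards the item.
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2 * radius ** 2))
            for k in range(4):
                w[k] += lr * h * (item[k] - w[k])
    return weights


def motor_for_sensory(weights, target):
    """Sensory-to-motor lookup: find the neuron whose sensory weights are
    closest to the target and return its motor weights."""
    best = min(weights.values(), key=lambda w: (
        (w[2] - target[0]) ** 2 + (w[3] - target[1]) ** 2))
    return tuple(best[:2])


if __name__ == "__main__":
    trained = train_babbling()
    target = toy_vocal_tract((0.3, 0.8))
    print("sensory target:", target)
    print("recovered motor state:", motor_for_sensory(trained, target))
```

Because each neuron stores motor and sensory weights together, a sensory target can be mapped back to a motor state by a nearest-neighbour lookup, which is one simple way to picture how sensory-to-motor mappings grow during babbling. Imitation training (Step 2) would add a phonemic map on top and is not sketched here.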