Cognitive Neuroscience and Embodied Intelligence
Hearing and Speech
Based on the book Cognition, Brain and Consciousness, ed. Bernard J. Baars
Janusz A. Starzyk
EE141

Sound and hearing basics

Figure: A time-domain sinewave signal and the same signal in the time-frequency domain.

Complex sound signals can be decomposed into a series of sinewave signals of various frequencies.
The human auditory system detects sounds in the range of 20 Hz to 20 kHz; bats and whales can hear up to 100 kHz.
Musicians can detect the difference between 1000 Hz and 1001 Hz.

Speech unfolds on several time scales:
20 ms is needed for the onset of a consonant.
200 ms is the duration of an average syllable.
2000 ms is needed for a sentence.
These time scales, along with other parameters of the sound such as timbre and intensity, must be properly processed to recognize speech or music.

Figure: A spectrogram of a speech signal; frequency is represented on the y-axis.

Typical sound levels:
Near total silence - 0 dB
A whisper - 15 dB
Normal conversation - 60 dB
A lawnmower - 90 dB
A car horn - 110 dB
A rock concert - 120 dB
A gunshot - 140 dB

Figure: Human and cat hearing sensitivity.

The dynamic range of the human hearing system is very broad: from 1 (the sound pressure level at which hearing just occurs) to 10^15 in relative intensity, that is 10 log10(10^15) = 150 dB SPL.

A sound wave caused by a vibrating object moves through the air and enters the external auditory canal, reaching the tympanic membrane, or eardrum.
Because of its length, the ear canal amplifies sounds with frequencies of approximately 3000 Hz.
Vibrations propagate through the middle ear by the mechanical action of three bones: the hammer, anvil and stirrup (malleus, incus and stapes).
There are two cochlear windows, oval and round. The stapes conveys sound vibrations through the oval window to the inner ear fluids.

The cochlea and the semicircular canals are filled with a water-like fluid. The cochlea in the inner ear contains the basilar membrane.
A traveling wave of sound moves along the basilar membrane, moving the small hair-like nerve cells.

The inner surface of the cochlea is lined with over 16,000 hair-like nerve cells, which perform one of the most critical roles in our ability to hear.
Each hair cell has a natural sensitivity to a particular frequency of vibration.
The brain decodes sound frequencies based on which hair cells along the basilar membrane are activated; this is known as the place principle.

Figure: Pathways at the auditory brainstem.

Inner ear details

Figures, including Figure 30-5, from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.

The central auditory system

The auditory system has many stages: from the ear, to the brainstem, to subcortical nuclei, and to cortex.
Ascending (afferent) pathways transmit information from the periphery to cortex.
Neural signals travel from the auditory nerve to the lower (ventral) cochlear nucleus, and then through the lateral lemniscus, inferior colliculus and thalamus to auditory cortex.
A key task of the ascending pathway is to localize sound in space.
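As an engineering analogy for the place principle described earlier in this part (not a physiological model of the cochlea), the sketch below treats each hair cell as a detector tuned to one frequency and reports which detectors a sound activates. It also illustrates the earlier point that a complex sound decomposes into sinewaves. The function names, the detector frequencies and the 0.5 threshold are assumptions made for illustration; they do not come from the slides.

```python
import math

def tone_mixture(freqs_hz, sample_rate_hz, n_samples):
    """Sum of unit-amplitude sinewaves, one per frequency in freqs_hz."""
    return [sum(math.sin(2.0 * math.pi * f * i / sample_rate_hz) for f in freqs_hz)
            for i in range(n_samples)]

def detector_response(signal, freq_hz, sample_rate_hz):
    """Amplitude of the signal component at freq_hz.

    This is a single-frequency Fourier correlation: the signal is projected
    onto a cosine and a sine at freq_hz and the two projections are combined.
    """
    n = len(signal)
    re = 0.0
    im = 0.0
    for i, s in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * i / sample_rate_hz
        re += s * math.cos(phase)
        im += s * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n

def active_detectors(signal, detector_freqs_hz, sample_rate_hz, threshold=0.5):
    """Frequencies whose detector response exceeds the (assumed) threshold."""
    return [f for f in detector_freqs_hz
            if detector_response(signal, f, sample_rate_hz) > threshold]

# Hypothetical example: a mixture of 500 Hz and 2000 Hz tones, 0.1 s at 8 kHz.
sound = tone_mixture([500, 2000], 8000, 800)
print(active_detectors(sound, [250, 500, 1000, 2000, 3000], 8000))
```

In this example only the 500 Hz and 2000 Hz detectors exceed the threshold, so the set of active detectors identifies the frequencies present, much as the set of active hair cells does under the place principle.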
The descending (efferent) pathways run from auditory cortex down to the periphery under cortical control. This control extends all the way to the hair cells in the cochlea.
The descending pathway provides ‘top-down’ information that is critical for selective attention and for perception in a noisy environment.
Besides the ascending and descending pathways, the left and right auditory pathways are connected through the corpus callosum and other brain regions.

Auditory cortex

Auditory cortex specializes in sound processing.
It serves as a hub for sound processing and interacts with other systems within cortex and back down the descending path to the cochlea.
These processes provide a wide range of perceptual abilities, such as selecting a single person's voice in a crowded space or recognizing a melody even when it is played off-key.

In humans, primary auditory cortex is located within Heschl’s gyrus, which corresponds to Brodmann’s area 41.
Another important region of auditory cortex is the planum temporale, located posterior to Heschl’s gyrus.
In right-handed individuals the planum temporale is much larger (up to 10 times) in the left hemisphere. It plays an important role in language understanding.
Posterior to the planum temporale is Brodmann’s area 22, which Carl Wernicke associated with speech comprehension (Wernicke’s area).

There are several types of neurons in the auditory system.
They have different response properties for coding frequency, intensity, and timing information in sounds, as well as for encoding the spatial information used to localize sounds in space.

Figure: Main cells of the cochlear nucleus and their corresponding post-stimulus time (PST) histograms. The sound stimulus is typically a 25 ms tone burst at the center frequency, at a sound level 30 dB above threshold.
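The PST histograms mentioned in the figure caption above are built by aligning spike times to stimulus onset and counting spikes per time bin across repeated presentations. Here is a minimal sketch of that bookkeeping; the spike times, the 5 ms bin width and the function name are hypothetical and chosen only for illustration, while the 25 ms window matches the tone-burst duration given in the caption.

```python
def pst_histogram(trials, window_ms=25.0, bin_ms=5.0):
    """Mean firing rate (spikes per second) in each post-stimulus time bin.

    trials: list of trials, each a list of spike times in milliseconds
            measured from stimulus onset (hypothetical data in this sketch).
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_bins = int(round(window_ms / bin_ms))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if 0.0 <= t < window_ms:
                # Clamp guards against floating-point edge cases at the window end.
                counts[min(int(t // bin_ms), n_bins - 1)] += 1
    # Convert pooled counts to a rate: spikes / (bin length in s * number of trials).
    scale = 1000.0 / (bin_ms * len(trials))
    return [c * scale for c in counts]

# Hypothetical spike times (ms after tone onset) for three presentations.
example_trials = [[0.5, 1.2, 7.0], [0.7, 12.0], [0.2, 1.9, 30.0]]
print(pst_histogram(example_trials))
```

A cell that fires mainly at stimulus onset, as in this made-up data, produces a tall first bin followed by low bins; other cell types would show different histogram shapes.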
Receptive fields of auditory neurons differ in their sensitivity to the location of the sound source (azimuth angle) and to its loudness (in dB).
The top neuron responds to a broad range of sound intensities located to the right, with greater sensitivity to louder signals.
The lower neuron is more narrowly tuned, to sound levels of 30-60 dB located slightly to the left of center.
Broadly tuned neurons are useful for detecting a sound source, while narrowly tuned neurons give the more precise information needed to locate it, such as the exact direction of the sound and its loudness level.

Figure: Auditory tonotopic cortical fields of a cat. (a) Lateral view. (b) Lateral view ‘unfolded’ to show parts hidden within sulci.
The four tonotopic fields are Anterior (A), Primary (AI), Posterior (P) and Ventroposterior (VP).
The positions of the lowest and highest center frequencies in these fields are indicated in (b).
Other cortical areas show little tonotopy: secondary (AII), ventral (V), temporal (T), and dorsoposterior (DP).

Functional mapping of auditory processing

The location of the planum temporale (PT), close to Wernicke’s area for speech comprehension, points towards its role as the site for auditory speech and language processing.
However, neuroimaging studies provide evidence that the functional role of PT is not limited to speech.
PT is a hub for auditory scene analysis, decoding sensory inputs and comparing them to memories and past experiences.
PT further directs cortical processing to decode spatial location and to identify auditory objects.

Figure: The planum temporale and its major associations: lateral superior temporal gyrus (STG), superior temporal sulcus (STS), middle temporal gyrus (MTG), parieto-temporal operculum (PTO), inferior parietal lobe (IPL).

PT as a hub for auditory and spatial analysis.
In a crowded environment it is important to decode auditory objects such as a friend’s voice, an alarm signal or a squeaking wheel.
To do so, the auditory system must determine where sounds are occurring in space and what they represent.
All of this is associated with other sensory inputs, such as vision, smell or touch, and with memory associations.

To determine where a sound is coming from, two cues are used:
Interaural (between-ear) time difference (ITD)
Interaural level difference (ILD)
The time differences involved are smaller than a millisecond: with the ears roughly 0.2 m apart and sound traveling at about 343 m/s in air, the largest possible difference is about 0.2 / 343 ≈ 0.6 ms.
The head produces a ‘sound shadow’, so the sound reaching the farther ear is slightly weaker.

Figure: Neurons’ responses to interaural time difference (ITD) and interaural level difference (ILD).
Abbreviations: CN, cochlear nucleus; MSO, medial superior olive; LSO, lateral superior olive; MNTB, medial nucleus of the trapezoid body.

It was demonstrated that musical conductors were better able to locate sound sources in a musical score.
They showed higher sensitivity to sounds presented in peripheral listening than other groups, including other musicians.

Auditory objects are categorized into human voices, musical instruments, animal sounds, etc.
Auditory objects are learned over our lifetime, and their associations are stored in memory.
Auditory areas in superior temporal cortex are activated by both recognized and unrecognized sounds.
Recognized sounds also activate the superior temporal sulcus and the middle temporal gyrus (MTG).
Fig. (c) shows the difference between activations for recognized and unrecognized sounds.

Binder and colleagues propose that the middle temporal gyrus (MTG) is the region that associates sounds and images.
This is in agreement with case studies of patients who suffered from auditory agnosia (the inability to recognize sounds).
Research results showed that auditory object perception is a complex process that involves multiple brain regions in both hemispheres.

Figure: Brain activity in auditory processing; cross sections at different depths.

Cocktail party effect

How does the auditory system separate sounds coming from different sources?
Bregman (1990) proposed a model for such segregation. It contains four elements:
The source
The stream
Grouping
Stream segregation

The source is the sound signal. It represents physical features such as frequency, intensity and spatial location.
The stream is the percept of the sound. It represents psychological aspects that depend on the individual.
Grouping creates the stream:
Simultaneous grouping, e.g. instruments in an orchestra
Sequential grouping, e.g. grouping sounds across time
Stream segregation separates the streams into objects.

Bregman's grouping principles:
Proximity: sounds that are close in time are grouped.
Closure: sounds that do not belong to the stream (like a cough during a lecture) are excluded.
Good continuation: sounds that follow each other smoothly are grouped (similar to proximity).
Common fate: sounds that come from the same location or coincide in time are grouped (as in an orchestra).
Exclusive allocation: selective listening (focus on one stream).

Figure: Cortical areas of auditory stream analysis. The intraparietal sulcus (IPS) is involved in binding multimodal information (vision, touch, sound).

There is growing evidence that, as in the visual system, ‘what’ and ‘where’ information in sound is decoded by separate but highly interactive cortical processing streams.
Figure: Audio (blue) and visual (pink) processing areas in the macaque brain, and the ‘what’ and ‘where’ audio processing streams.

Figure: Human brain processing. Blue: language-specific phonological structure. Lilac: phonetic cues and speech features. Purple: intelligible speech. Pink: verbal short-term memory. Green: auditory spatial tasks.

Speech perception

There is no agreement on how speech is coded in the brain. What are the ‘building blocks’ of speech?
A natural way would be to code words based on phonemes: the word ‘dig’ would be obtained by identifying a sequence of phonemes.
Perhaps the syllable is the appropriate unit?
We must decode not only ‘what’ but also ‘who’ and ‘when’ in order to understand the temporal order of phonemes, syllables, words, and sentences.
The speech signal must be evaluated on time scales from 20 ms to 2000 ms, independently of whether the voice is high-pitched (a child) or low-pitched (a man), loud or quiet, fast or slow.

Early attempts to simplify speech processing were made at Bell Labs by Homer Dudley, who developed the vocoder.
The vocoder (voice + coder) reduced the speech signal for transmission over long telephone circuits by analyzing and recoding speech.
Cochlear implants, which stimulate the auditory system in some types of hearing loss, are based on vocoder technology.

A second invention, the spectrograph, developed at Bell Labs during World War II, produced a picture of the voice with frequency on the y-axis, time on the x-axis, and intensity as a level of grey.
Problems in analyzing spectrograms:
Gaps or silences do not mark where a word begins and ends.
Individual phonemes change depending on which phonemes come before and after them.

What is wrong with the short-term spectrum?
It is inconsistent: the same message can have different representations.
Shannon (1998) showed that the minimum information needed for speech decoding is contained in the shape of the speech signal, called the temporal envelope.

Figure: Short-term spectrum (frequency axis).

The lack of invariant features in the speech spectrogram forced researchers to look for other accounts of speech perception.
The motor theory developed by Liberman (1985) takes a domain-specific approach to speech.
This theory suggests that speech perception is tightly coupled with speech production.
While the acoustics of phonemes lack invariance, the motor gestures that produce speech are invariant and can be accessed in speech perception.
Another theory, developed by Tallal, assumes that speech and language are domain-general.
In this theory, left-hemisphere language organization is not the result of domain-specific development, but results from a domain-general bias of the left hemisphere for decoding rapidly changing sounds (such as those contained in speech).
It is likely that the neural system uses a combination of domain-specific and domain-general processing for speech perception.

Figure: A process model for word comprehension.
Figure: Language areas.

Figure: Brain response to words, pseudowords and reversed speech.

Binder and colleagues (1997) studied the activation of brain areas by words, reversed speech and pseudowords, and found that Heschl’s gyrus and the planum temporale were activated similarly by all stimuli.
This supports the notion of hierarchical processing of sounds, with Heschl’s gyrus representing early sensory analysis.
Speech signals activated a larger portion of auditory cortex than nonspeech sounds in the posterior superior temporal gyrus and superior temporal sulcus, but there was no difference in activation between words, pseudowords and reversed speech.
The conclusion is that these regions do not reflect semantic processing of the words but reflect phonological processing of the speech sounds.
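Returning to the temporal envelope mentioned at the start of this part: one common way to approximate an envelope is to rectify the signal and smooth it with a moving average. The sketch below does this and is offered only as an illustration of what ‘the shape of the speech signal’ means. It is not the procedure used in the cited study, and the 10 ms window, the synthetic test tone and the function names are assumptions.

```python
import math

def am_tone(carrier_hz, sample_rate_hz, amplitudes):
    """Sinewave carrier whose amplitude follows the given per-sample values."""
    return [a * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate_hz)
            for i, a in enumerate(amplitudes)]

def temporal_envelope(signal, sample_rate_hz, window_ms=10.0):
    """Rectify the signal, then smooth it with a centered moving average."""
    half = max(1, int(sample_rate_hz * window_ms / 1000.0) // 2)
    rectified = [abs(s) for s in signal]
    # Prefix sums make each windowed average a constant-time lookup.
    prefix = [0.0]
    for r in rectified:
        prefix.append(prefix[-1] + r)
    n = len(rectified)
    envelope = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope

# Hypothetical signal: a 1000 Hz tone that is loud for 0.1 s, then quiet for 0.1 s.
tone = am_tone(1000, 8000, [1.0] * 800 + [0.1] * 800)
env = temporal_envelope(tone, 8000)
print(round(env[400], 2), round(env[1200], 2))
```

The envelope discards the fast 1000 Hz oscillation and keeps only the slow change from loud to quiet, which is the kind of slowly varying information the slide refers to.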
Speech perception and production

Speech perception and production are tightly coupled. One explanation is that when we speak, we hear our own voice.
Wernicke proposed a model for language processing that links a pathway from auditory speech perception to motor speech production.
The verbal signal enters the primary auditory cortex (A) and then Wernicke’s area (WA).
The response is formulated in Broca’s area (B) and the primary motor cortex (M).
We can listen and respond to our own speech using the same brain regions.
Producing an internal response to a question results in silently speaking to ourselves.

Damage to speech perceptual system

Damage to the speech perceptual system may be caused by strokes that block the blood flow to a brain area and cause the death of neurons.
When a stroke impairs language functions, the condition is called aphasia.
Paul Broca linked aphasia to a region in the frontal lobe important for speech production.
Carl Wernicke discovered a region in the temporal lobe important for speech perception.
Experiments by Blumstein tested phonetic and semantic deficits by giving patients four choices in the test: the correct word, a semantic foil, a phonetic foil and an unrelated foil (e.g. peas, carrots, keys, and calculator).

Figure: Phonetic foils.

Learning and plasticity

An important theme in studying human cognition is to find out how new information is encoded during learning and how the brain adapts (plasticity).
Much of what is known about the plasticity of the auditory system comes from deprivation studies in animals.
Both the cochlea and the brainstem are organized tonotopically, and this organization is reflected in auditory cortex.
After the cochlea or brainstem is lesioned, some frequencies are no longer transmitted to auditory cortex, and the cortex is then studied for changes reflecting neural plasticity.
Changes in neural response in auditory cortex have been observed in humans after sudden hearing loss.
Children with hearing loss showed some maturational lag compared with typical development; however, after receiving cochlear implants, their auditory systems continued to mature in a typical fashion.
This indicates plasticity of the auditory cortex.

Plasticity due to learning was observed in laboratory animals using classical conditioning: presented tones were paired with a mild electrical shock, so the animal learned the sounds more relevant to survival (avoiding shock).
Plasticity-related changes were more pronounced at higher motivational levels.

Figure: Cortical area change (untrained versus trained) for the desired signal frequency at different motivational levels. Trained tones were 4.1-8 kHz; motivational levels were high (red), medium (black) and low (blue).

Auditory awareness

The auditory system is the last to fall asleep and the first to wake up.
Sleeping people respond to their own names better than to other sounds.

Figure: Responses in auditory cortex during awake and sleep states.

Auditory imagery

Figure: Brain areas active for imagined sounds.

Sounds play in our heads all day, even when we do not hear them.
Some are involuntary and uncalled for, like a melody or your inner voice.
Some are planned, as when you rehearse a verse or a telephone number in your head.
Halpern and colleagues (2004) showed that non-primary auditory cortex is active during imagined (and not heard) sounds.

Related results were obtained by Jancke and colleagues (2005).
They used fMRI to compare neural responses to real sounds and to imagined sounds.
Imagined sounds activate regions of auditory cortex similar to those activated by real ones.
Summary

We discussed the organization of the auditory system:
Learned sound and hearing basics
Traced auditory pathways
Analyzed the organization of auditory cortex
Observed functional mapping of auditory processing
Discussed sound and music perception
Discussed the effect of learning on sound processing

Research on animals confirmed the existence of ‘what’ and ‘where’ pathways in the auditory system; however, these pathways may be organized differently in humans.
When you hear an uncalled-for melody in your head, think about which of your brain areas are activated.