CHAPTER 21
SOUND LOCALIZATION AND TEMPORAL PATTERN ANALYSIS

21.1. PERCEPTION OF SPACE USING AUDITORY CUES, OR "WHERE" A SOUND IS

Unlike the visual and somatosensory systems, the auditory system receives no information that is directly correlated with the spatial location of a sound source, and there is no representation of space at the auditory periphery. A number of different cues must be analyzed in the central nervous system to arrive at an estimate of where a sound originates. These cues include:

Interaural intensity differences, or differences in loudness at the two ears. Remember that if the wavelength of a sound is less than the diameter of an object in its path, the object creates a "sound shadow" on the side opposite the sound source. This is what our heads do for high frequency sounds, above about 2000 Hz.

Interaural time differences, or differences between the time a sound arrives at one ear and the time it arrives at the other. For low frequency sounds, the intensity is approximately the same on both sides of the head. However, because the speed of sound is relatively slow, there is a small difference between the arrival times of the sound at the two ears. This difference is only on the order of a few hundred microseconds, but our auditory systems are able to detect it and use it for sound localization. Not only is there a time difference between the onset of the sound at the two ears; there is also a continuing phase difference for each cycle of a tone.

Figure 21-1. Interaural intensity differences are created when the head casts a "sound shadow" on the side opposite the sound source (left). Interaural time differences arise because sound takes slightly longer to travel to the ear on the side opposite the sound source than to the ear on the same side as the source (right).

Spectral peaks and notches. As already described, the outer ear introduces amplitude peaks at some frequencies and amplitude minima at others. These peaks and valleys vary as a function of sound location, especially in the vertical plane.

Patterns of echoes, reverberations, and other time-based cues. Sounds always occur in some sort of surrounding environment. The surroundings may absorb or reflect sound, giving rise to characteristic patterns of echoes. These subtle patterns give us considerable information about what kind of space we are in.

Multisensory integration, or comparison of auditory information with information obtained through other modalities, especially vision.

In this chapter we will focus on the neural mechanisms through which the auditory system compares intensity and timing information at the two ears to derive an estimate of the location of a sound source in the horizontal plane.
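To get a feel for the size of the interaural time difference cue just described, the sketch below estimates ITD as a function of azimuth using the classic spherical-head (Woodworth) approximation. The function name, head radius, and speed of sound are illustrative assumptions, not values given in this chapter.

```python
import math

def interaural_time_difference(azimuth_deg, head_radius_m=0.0875,
                               speed_of_sound_m_s=343.0):
    """Approximate ITD for a spherical head (Woodworth approximation).

    azimuth_deg: source angle in the horizontal plane, 0 = straight ahead,
                 positive toward the right ear.
    Returns the time difference in seconds (right ear leads for positive azimuth).
    """
    theta = math.radians(azimuth_deg)
    # Path difference = r*(theta + sin(theta)): the wave wraps around the
    # near side of the head (arc length r*theta) plus the straight-line
    # component r*sin(theta) to the far ear.
    return (head_radius_m / speed_of_sound_m_s) * (theta + math.sin(theta))

# A source 45 degrees to the right gives an ITD of roughly 380 microseconds,
# consistent with the "few hundred microseconds" cited in the text.
print(f"{interaural_time_difference(45) * 1e6:.0f} us")
```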
21.2. THE SUPERIOR OLIVARY COMPLEX AND SOUND LOCALIZATION IN THE HORIZONTAL PLANE

The superior olivary complex (superior olive) receives bilateral input from the right and left cochlear nuclei. It contains two principal cell groups that are responsible for processing the main cues for sound localization: the lateral superior olive (LSO) and the medial superior olive (MSO).

Figure 21-2. A diagram of the central auditory pathways showing the location of the superior olivary complex and its relationship to other auditory structures. AVCN = anteroventral cochlear nucleus; PVCN = posteroventral cochlear nucleus; DCN = dorsal cochlear nucleus; LSO = lateral superior olive; MSO = medial superior olive; MNTB = medial nucleus of the trapezoid body; NLL = nuclei of the lateral lemniscus; IC = inferior colliculus; MGB = medial geniculate body. For clarity, projections from only the left cochlea are shown. Those from the right cochlea would be the mirror image of those from the left.

21.2.1. The lateral superior olive (LSO) and interaural intensity differences.

The lateral superior olive (LSO) is the first stage of a circuit for computing interaural intensity differences (IIDs). Interaural intensity difference varies systematically as a function of sound position in the horizontal plane (azimuth), and is greatest for high frequency sounds.

Each LSO cell receives direct excitatory input from the ipsilateral ear via the cochlear nucleus. It also receives indirect inhibitory input from the contralateral ear via a relay in the medial nucleus of the trapezoid body (MNTB), a group of cells that use the inhibitory neurotransmitter glycine.

Figure 21-3. An alternate, much more simplified view of the binaural auditory system. Each superior olivary complex receives input from the left ear (white arrows) and the right ear (black arrows). At the level of the inferior colliculus and above, the information transmitted reflects the processing that has already taken place in the superior olive, i.e., comparison of sound at the two ears.

Figure 21-4. Diagram of the main nuclei of the superior olivary complex on the left side of the brainstem, showing excitatory and inhibitory connections. CN = cochlear nucleus; LSO = lateral superior olive; MSO = medial superior olive; MNTB = medial nucleus of the trapezoid body.

Figure 21-5. Excitatory input to the LSO cell comes from the ipsilateral (same side) cochlear nucleus and causes an EPSP. Inhibitory input arrives from the contralateral (opposite side) cochlear nucleus via a relay in the MNTB, a nucleus of glycinergic neurons, and causes an IPSP. Both inputs arrive at approximately the same time and add algebraically. The output of the LSO cell therefore depends on the difference between the sound intensity at the ipsilateral ear and at the contralateral ear.

The output of an LSO cell depends on the difference between the amount of ipsilateral excitation and the amount of contralateral inhibition. Because the excitatory and inhibitory inputs may not be exactly matched in their effectiveness in depolarizing or hyperpolarizing the LSO cell when the sound level is identical on both sides, different LSO cells respond best to different interaural intensity differences. Thus, the outputs of the cells that make up the LSO can be thought of as forming a crude map of space in the horizontal plane, mainly for high frequency sounds.

Figure 21-6. In the LSO, the representation of high frequencies is expanded relative to the cochlear frequency map (left). At right angles to the frequency axis (shaded plane on the left LSO), it is possible to imagine a representation of interaural intensity difference (right). This representation would arise through a systematic organization in which different neurons receive different ratios of ipsilateral excitation to contralateral inhibition, with the result that different neurons respond best to specific interaural intensity differences.
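The excitation-minus-inhibition computation described above can be caricatured in a few lines of code. This is a minimal rate-model sketch, not a published LSO model; the gains and stimulus levels are invented for illustration.

```python
def lso_rate(ipsi_db, contra_db, excit_gain, inhib_gain):
    """Toy LSO cell: ipsilateral excitation minus contralateral inhibition
    (relayed through the glycinergic MNTB), rectified so that the firing
    rate cannot fall below zero."""
    drive = excit_gain * ipsi_db - inhib_gain * contra_db
    return max(drive, 0.0)

# Move a source across azimuth by trading level between the ears while
# holding the overall level fixed.  IID = ipsi - contra, in dB.
for iid in (-20, -10, 0, 10, 20):
    ipsi, contra = 60 + iid / 2, 60 - iid / 2
    # Cells with different excitation/inhibition balances begin responding
    # at different IIDs -- the basis of the crude map described above.
    cells = [(1.0, 0.8), (1.0, 1.0), (0.8, 1.0)]
    print(iid, [round(lso_rate(ipsi, contra, e, i), 1) for e, i in cells])
```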
In the LSO, the representation of high frequencies is expanded relative to the cochlear frequency representation, or tonotopy. This is not surprising, given that interaural intensity differences are produced mainly by high frequency sounds.

21.2.2. The medial superior olive (MSO) and interaural time differences.

The MSO is the first stage of a circuit for computing interaural time differences (ITDs). Interaural time difference varies as a function of sound position in azimuth, and is greatest for low frequency sounds.

Each MSO cell receives excitatory input from both ears through direct projections from the right and left cochlear nuclei. The two inputs to a given MSO cell may differ in their latency, so that the cell responds maximally to a specific difference in the time of onset of the sound at the two ears, or to a specific difference in the time of response to each cycle of an ongoing sound.

Figure 21-7. Each MSO cell receives excitatory input from both the right and left cochlear nuclei. However, for most MSO cells, the input from one side (in response to a stimulus that reaches both ears simultaneously) is slightly delayed with respect to that from the other. Both inputs cause an EPSP, but the cell's response is not maximal unless the two EPSPs coincide and summate.

Computation of interaural time differences depends on neural circuits that include delay lines (pathways that introduce time delays) and coincidence detectors (cells that fire only when two inputs arrive simultaneously or nearly simultaneously, so that the resulting EPSPs add together). A schematic simulation of this arrangement appears at the end of this section.

Figure 21-8. In the MSO, the representation of low frequencies is expanded relative to the cochlear frequency map (left). At right angles to the frequency axis (dotted line on left MSO), it is possible to imagine a representation of interaural time difference (right). This representation would arise through a systematic organization in which different neurons receive different latency combinations from the ipsilateral and contralateral ears, with the result that different neurons respond best to specific interaural time differences.

Different MSO cells respond best to different interaural time differences. Thus, the outputs of the cells that make up the MSO can be thought of as forming a crude map of space in the horizontal plane, mainly for low frequency sounds. In the MSO, the representation of low frequencies is expanded relative to the cochlear tonotopic map. This would be expected, given that interaural time differences are produced mainly by low frequency sounds.

The use of two different cues to localize sound is sometimes referred to as the duplex theory of sound localization. The two cues operate over different frequency ranges, and there is a "gap" between these ranges (at around 2000 Hz) where our ability to localize a sound is relatively poor. Psychophysical measurements show that our ability to localize a sound is best in front (where our eyes are directed) and poorer at the periphery.
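The delay-line and coincidence-detector arrangement described above can be simulated in miniature. The sketch below is a caricature of this idea (essentially the classic Jeffress scheme); the delays, coincidence window, and ITD value are invented for illustration, not measured values.

```python
# A schematic delay-line / coincidence-detector array for ITD, the idea
# behind the MSO circuit described above.  All times are in microseconds.

COINCIDENCE_WINDOW_US = 50  # EPSPs must arrive within this window to summate

def mso_array_response(itd_us, internal_delays_us):
    """Each model cell adds a fixed internal delay to the ipsilateral input.
    A cell 'fires' when that delay compensates the external ITD, so that
    the two EPSPs coincide.  Returns a list of (internal_delay, fired)."""
    responses = []
    for delay in internal_delays_us:
        ipsi_arrival = 0 + delay   # ipsilateral input, after its delay line
        contra_arrival = itd_us    # contralateral input, offset by the ITD
        fired = abs(ipsi_arrival - contra_arrival) <= COINCIDENCE_WINDOW_US
        responses.append((delay, fired))
    return responses

# A source off to one side produces, say, a 300-microsecond ITD.  Only the
# cell whose internal delay matches that ITD fires: the position of the
# active cell within the array encodes the location of the source.
for delay, fired in mso_array_response(300, range(0, 601, 100)):
    print(f"internal delay {delay:3d} us: {'fires' if fired else 'silent'}")
```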
21.3. LOCALIZATION IN THE VERTICAL PLANE

The cues that we use to localize sound in the vertical plane are not as well understood as those used for azimuthal localization. One cue that we clearly do use, however, is the pattern of peaks and valleys in the spectrum of a broadband sound caused by the characteristics of the outer ear (pinna) and the ear canal. This is a cue that we probably have to learn to use, just as we have to learn to understand speech.

21.4. CUES RELATED TO THE DISTANCE OF A SOUND SOURCE

Very little is known about how we determine the distance of a sound source. A number of cues might be relevant. One is sound intensity, since faint sounds tend to be far away. This cue works best if we know approximately how loud the sound would be at its source. Another cue that can provide information about distance is the pattern of echoes a sound generates. The time that elapses between a sound and its echo or echoes in a space of a given size, shape, and composition is correlated with the distance of the sound source from the listener. Another important cue to sound location, including distance, is the comparison of the sound with visual (or other) information. A sound will always be localized to the most plausible source, based on visual information. For a distant source, the time that elapses between the movement that is seen and the sound that is heard is proportional to the distance from the viewer/listener. Lightning and thunder are a familiar example: at roughly 343 m/s, each second of delay corresponds to about a third of a kilometer of distance.

21.5. ANALYSIS OF TEMPORAL PATTERNS, OR "WHAT" A SOUND IS

The cochlear nuclei and the monaural nuclei of the lateral lemniscus receive input from one ear only. Cells in these parts of the auditory brainstem transform their input in various ways, including the conversion of discharge patterns from "primary-like" to onset, offset, or other types, the introduction of time delays, and the conversion of excitatory input to inhibitory output. Some neurons in the nuclei of the lateral lemniscus respond best to specific sequences of sounds, for example a sound that changes from high frequency to low frequency. However, it is likely that much of the processing that ultimately results in our perception of "what" a sound is occurs at the level of the midbrain, thalamus, or cortex.

21.5.1. The inferior colliculus.

The inferior colliculus is the main midbrain auditory center. It receives convergent projections from the "where" and "what" pathways in the lower brainstem and integrates information about interaural intensity differences and interaural time differences. This information is sent to the superior colliculus for reflex orientation to a sound source, and to the medial geniculate nucleus of the thalamus for transmission to the auditory cortex.

Neurons in the inferior colliculus are tuned to a number of different features of sound. The basic organizing principle within the inferior colliculus is a tonotopic map, with low frequencies represented dorsally and high frequencies ventrally. Within the tonotopic map are other organizations in which neurons are tuned to sound intensity, sound duration, the direction of a change in sound frequency (frequency sweep direction), the rate at which amplitude or frequency changes (modulation rate), and other more complex sound patterns. Not all neurons in the inferior colliculus are tuned to every parameter mentioned here, but all show some degree of selectivity.

Figure 21-9. An example of sound processing and analysis in the inferior colliculus: tuning to the direction of a frequency modulated (FM) sweep. The hypothetical neuron on the left receives inputs from three cells at a lower level, each of which is tuned to a different frequency (F1, F2, and F3, with F1 being low and F3 high). The input from F3 has the shortest latency and arrives first. The input from F2 has a longer latency than that from F3, with Δt representing the difference between the two. The input from F1 has the longest latency. If F1, F2, and F3 occur in the appropriate sequence, so that the inputs from the three different pathways coincide, the inputs summate and a large EPSP is generated. If they occur in the reverse order, there is no summation and little or no response. The cell on the right has inputs from the same frequencies, but with latencies arranged in the opposite sequence, so that it responds best to a sequence from high to low frequency.

21.5.2. Tuning to the direction of a frequency sweep.

Tuning to the direction of a frequency sweep is one example of a form of neural selectivity that arises through computational mechanisms in the central nervous system. The sweep-direction-sensitive cell receives subthreshold excitatory input from two or more sources tuned to different frequencies and with different latencies. Just like a cell in the MSO, it responds best to a stimulus in which the component driving the longer-latency pathway precedes the component driving the shorter-latency pathway, so that the two inputs coincide at the cell. In this case, however, the two inputs come from neurons tuned to high and low frequencies, not from the right and left ears. You will remember that the MSO cell responds when right precedes left (or vice versa), whereas the inferior colliculus cell in our example responds when high frequency precedes low frequency (or vice versa).
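The same coincidence logic used in the MSO sketch above, with frequency channels in place of ears, yields a sweep-direction detector like the one in Figure 21-9. This is a schematic illustration with invented channel latencies, not a model of real inferior colliculus circuitry.

```python
# Schematic FM sweep-direction detector: three frequency channels converge
# on one cell, each with a different conduction latency (in microseconds,
# invented for illustration).  As in Figure 21-9, the low channel (F1) is
# slowest and the high channel (F3) is fastest.
LATENCY_US = {"F1": 500, "F2": 300, "F3": 100}
COINCIDENCE_WINDOW_US = 50

def responds(stimulus_times_us):
    """stimulus_times_us maps each channel to the time its frequency occurs
    in the sweep.  The cell fires only if all three EPSPs (stimulus time +
    channel latency) arrive within the coincidence window."""
    arrivals = [t + LATENCY_US[ch] for ch, t in stimulus_times_us.items()]
    return max(arrivals) - min(arrivals) <= COINCIDENCE_WINDOW_US

upward = {"F1": 0, "F2": 200, "F3": 400}    # low-to-high sweep
downward = {"F1": 400, "F2": 200, "F3": 0}  # high-to-low sweep

print("upward sweep:  ", responds(upward))    # True: the delays compensate
print("downward sweep:", responds(downward))  # False: arrivals spread apart
```

Reversing the latency assignments, as for the right-hand cell in Figure 21-9, would make the detector prefer the downward sweep instead.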
21.5.3. Importance of delay lines and coincidence detectors in neural information processing.

Inferior colliculus cells sensitive to the direction of a frequency sweep are just one more example of a neural circuit that uses delay lines and coincidence detectors. A neural circuit made up of delay lines and coincidence detectors can be used to analyze many different patterns of information distributed over time, not just in the auditory system but in other systems as well.

_____________________________________________________________________________

Thought question: Using a system of delay lines and coincidence detectors similar to those used in the example of auditory cells tuned to a specific direction of frequency change, how could you construct neurons sensitive to the direction in which a sound source moves? Where would the inputs come from? How would they be organized? What parameters might be mapped on the resulting array of cells?

_____________________________________________________________________________

Thought question: One of the challenges for our auditory systems is separating simultaneously occurring streams of sound that originate from different sources. How might the auditory processing mechanisms you have learned about help in this task?

_____________________________________________________________________________

Thought question: Most bats emit high frequency sounds and listen to the echoes reflected from objects in their environment. The time between the emission of the call and the return of the echo is proportional to the distance of the object from the bat. What kinds of coincidence detector cells might one expect to find in the brains of bats? What sorts of neural circuits do you think might be responsible for producing the specialized response properties of these cells?

_____________________________________________________________________________