CHAPTER 21
SOUND LOCALIZATION AND TEMPORAL PATTERN ANALYSIS
21.1. PERCEPTION OF SPACE USING AUDITORY CUES, OR "WHERE" A SOUND IS.
Unlike the visual and somatosensory systems, the auditory system receives no
information that is directly correlated with the spatial location of a sound source, and there is no
representation of space at the auditory periphery. A number of different cues must be analyzed in
the central nervous system to arrive at an estimate of where a sound originates. These cues
include:
Interaural intensity differences, or differences in loudness at the two ears. Remember
that if the wavelength of sound is less than the diameter of an object in its path, the object creates
a "sound shadow" on the side opposite the sound source. This is what our heads do for high
frequency sounds, above about 2000 Hz.
Interaural time differences, or differences in the time a sound arrives at one ear versus
the time it arrives at the other. For low frequency sounds, the intensity is approximately the same
on both sides of the head. However, because the speed of sound is relatively slow, there will be a
small time difference between the sound at the two ears. This time difference is only on the order
of a few hundred microseconds, but our auditory systems are able to detect it and use it for sound
localization. Not only is there a time difference between the onset of the sound at the two ears, but there is also a continuing phase difference for each cycle of a tone.
Figure 21-1. Interaural intensity differences are created when the head creates a "sound shadow" on the side
opposite the sound source (left). Interaural time differences arise because it takes sound slightly
longer to travel to the ear on the side opposite the sound source than to the ear on the same side as the sound source
(right).
Spectral peaks and notches. As already described, the outer ear introduces amplitude
peaks at some frequencies and amplitude minima at other frequencies. These peaks and valleys
vary as a function of sound location, especially in the vertical plane.
Patterns of echoes, reverberations, and other time-based cues. Sounds always occur in
some sort of surrounding environment. The surroundings may absorb or reflect sound, giving
rise to characteristic patterns of echoes. These subtle patterns give us considerable information
about what kind of space we are in.
Multisensory integration, or comparison of auditory information with information
obtained through other modalities, especially vision.
In this chapter we will focus on the neural mechanisms through which the auditory
system compares intensity and timing information at the two ears to derive an estimate of the
location of a sound source in the horizontal plane.
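The sizes of the two main cues described above can be checked with a little arithmetic. The following Python sketch is only an illustration under stated assumptions: a speed of sound of 343 m/s, an ear-to-ear distance of 0.18 m, and a simplified straight-line path model that ignores the curvature of the head. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly room temperature
EAR_SEPARATION_M = 0.18         # assumed: approximate adult ear-to-ear distance


def wavelength_m(frequency_hz):
    """Wavelength of a tone in air, in meters."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz


def interaural_time_difference_us(azimuth_deg):
    """Straight-line estimate of the interaural time difference, in microseconds.

    azimuth_deg is the angle of the source away from straight ahead
    (0 = directly in front, 90 = directly to one side).
    """
    extra_path_m = EAR_SEPARATION_M * math.sin(math.radians(azimuth_deg))
    return 1e6 * extra_path_m / SPEED_OF_SOUND_M_PER_S


for azimuth in (0, 15, 45, 90):
    print(azimuth, "degrees:", round(interaural_time_difference_us(azimuth)), "microseconds")

print("wavelength at 2000 Hz:", round(100 * wavelength_m(2000)), "cm")
```

Under these assumptions, the largest interaural time difference (source directly to one side) is a little over 500 microseconds, and the wavelength of a 2000 Hz tone is about 17 cm, which is comparable to the width of a human head.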
21.2. THE SUPERIOR OLIVARY COMPLEX AND SOUND LOCALIZATION IN THE HORIZONTAL PLANE
The superior olivary complex (superior olive) receives bilateral input from the right and
left cochlear nuclei. It contains two principal cell groups that are responsible for processing the
main cues for sound localization. These are the lateral superior olive (LSO) and the medial
superior olive (MSO).
Figure 21-2. A diagram of the central auditory pathways showing the location of the superior olivary complex and
its relationship to other auditory structures. AVCN = anteroventral cochlear nucleus; PVCN = posteroventral
cochlear nucleus, DCN = dorsal cochlear nucleus, LSO = lateral superior olive, MSO = medial superior olive,
MNTB = medial nucleus of the trapezoid body, NLL = nuclei of the lateral lemniscus, IC = inferior colliculus,
MGB = medial geniculate body. For clarity, projections from only the left cochlea are shown. Those from the right
cochlea would be the mirror image of those from the left.
21.2.1. The lateral superior olive (LSO) and interaural intensity differences.
The lateral superior olive (LSO) is the first stage of a circuit for computing interaural
intensity differences (IIDs). Interaural intensity difference varies systematically as a function of
sound position in the horizontal plane (azimuth). Interaural intensity differences are greatest for
high frequency sounds.
Each LSO cell receives direct excitatory input from the ipsilateral ear via the cochlear
nucleus. It also receives indirect inhibitory input from the contralateral ear via a relay in the
medial nucleus of the trapezoid body (MNTB), a group of cells that use the inhibitory
neurotransmitter glycine.
Figure 21-3. An alternate, much more simplified view of the binaural auditory system. Each superior olivary
complex receives input from the left ear (white arrows) and the right ear (black arrows). At the level of the inferior
colliculus and above, the information transmitted reflects the processing that has already taken place in the superior
olive, i.e., comparison of sound at the two ears.
Figure 21-4. Diagram of the main nuclei of the superior olivary complex on the left side of the brainstem, showing
excitatory and inhibitory connections. CN = cochlear nucleus, LSO = lateral superior olive, MSO = medial superior
olive, MNTB = medial nucleus of the trapezoid body.
Figure 21-5. Excitatory input to the LSO cell comes from the ipsilateral (same side) cochlear nucleus and causes an
EPSP. Inhibitory input to the LSO cell arrives from the contralateral (opposite side) cochlear nucleus via a relay in
the MNTB, a nucleus of glycinergic neurons, and causes an IPSP. Both inputs arrive at approximately the same time
and add algebraically. The output of the LSO cell depends on the difference between the sound intensity at the
ipsilateral ear and the contralateral ear.
The output of an LSO cell depends on the difference between the amount of ipsilateral excitation
and the amount of contralateral inhibition. Because the excitatory and inhibitory inputs may not
be exactly matched in their effectiveness in depolarizing or hyperpolarizing the LSO cell given
the identical sound level on both sides, different LSO cells respond best to different interaural
intensity differences. Thus, the outputs of the cells that make up the LSO could be thought of as
forming a crude map of space in the horizontal plane, mainly for high frequency sounds.
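The subtraction performed by an LSO cell can be sketched in a few lines of Python. This is a toy model, not a description of real LSO physiology: it assumes that excitation and inhibition scale linearly with sound level, that the output is simply the rectified difference in arbitrary units, and that cells differ only in a hypothetical "inhibitory gain" parameter.

```python
def lso_response(ipsi_db, contra_db, excitatory_gain=1.0, inhibitory_gain=1.0):
    """Toy LSO cell: ipsilateral excitation minus contralateral inhibition.

    The result is rectified at zero because a cell cannot fire at a
    negative rate. Units are arbitrary.
    """
    drive = excitatory_gain * ipsi_db - inhibitory_gain * contra_db
    return max(0.0, drive)


def response_curve(inhibitory_gain, mean_level_db=60.0, iids_db=range(-20, 21, 10)):
    """Response of one model cell across a set of interaural intensity differences.

    A positive IID means the sound is louder at the ipsilateral ear.
    """
    curve = {}
    for iid in iids_db:
        ipsi = mean_level_db + iid / 2
        contra = mean_level_db - iid / 2
        curve[iid] = lso_response(ipsi, contra, inhibitory_gain=inhibitory_gain)
    return curve


# Three hypothetical cells whose inhibitory input differs in effectiveness.
for gain in (0.8, 1.0, 1.2):
    print("inhibitory gain", gain, response_curve(gain))
```

In this sketch, changing the balance between excitation and inhibition shifts the interaural intensity difference at which a cell begins to respond, so a population of such cells with graded balances would cover a range of intensity differences.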
Figure 21-6. In the LSO, the representation of high frequencies is expanded relative to the cochlear frequency map
(left). At right angles to the frequency axis (shaded plane on the left LSO), it is possible to imagine that there is a
representation of interaural intensity difference (right). This representation would arise through a systematic
organization in which different neurons receive different ratios of ipsilateral excitation to contralateral inhibition,
with the result that different neurons respond best to specific interaural intensity differences.
In the LSO, the representation of high frequencies is expanded relative to the cochlear
frequency representation, or tonotopy. This is not surprising given that interaural intensity
differences are mainly produced by high frequency sounds.
21.2.2. The medial superior olive (MSO) and interaural time differences.
The MSO is the first stage of a circuit for computing interaural time differences (ITDs).
Interaural time difference varies as a function of sound position in azimuth. Interaural time
differences are greatest for low frequency sounds.
Each MSO cell receives excitatory input from both ears through direct projections from
the right and left cochlear nuclei. The two inputs to each MSO cell may differ in their latency, so
that a given MSO cell responds maximally to a specific time difference in the onset of sound at
the two ears, or a difference in the time of response to each cycle of an ongoing sound.
Figure 21-7. Each MSO cell receives excitatory input from both the right and left cochlear nuclei. However, for
most MSO cells, the input from one side (in response to a stimulus that reaches both ears simultaneously) is slightly
delayed with respect to that from the other. Both inputs cause an EPSP, but the cell's response is not maximal unless
the two EPSPs coincide and summate.
Computation of interaural time differences depends on neural circuits that include delay
lines (pathways that introduce time delays) and coincidence detectors (cells that fire only when
two inputs arrive simultaneously or nearly simultaneously so that the resulting EPSPs add
together).
Figure 21-8. In the MSO, the representation of low frequencies is expanded relative to the cochlear frequency map
(left). At right angles to the frequency axis (dotted line on left MSO), it is possible to imagine that there is a
representation of interaural time difference (right). This representation would arise through a systematic
organization in which different neurons receive different latency combinations from the ipsilateral and contralateral
ears, with the result that different neurons respond best to specific interaural time differences.
Different MSO cells respond best to different interaural time differences. Thus, the
output of the cells that make up the MSO could be thought to form a crude map of space in the
horizontal plane, mainly for low frequency sounds.
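The combination of delay lines and coincidence detectors can be illustrated with a small Python sketch. It is a simplified model under assumptions that are not in the text: each cell's two inputs have fixed delays chosen for illustration, and the degree of EPSP summation is approximated by a Gaussian function of the mismatch in arrival times with an assumed 100 microsecond width. All names are hypothetical.

```python
import math

COINCIDENCE_WINDOW_US = 100.0  # assumed width of the window in which EPSPs summate


def mso_response(itd_us, left_delay_us, right_delay_us):
    """Toy MSO cell: responds most when the two inputs arrive together.

    itd_us > 0 means the sound reaches the left ear first, so the right
    ear's input starts itd_us later.
    """
    left_arrival = 0.0 + left_delay_us
    right_arrival = itd_us + right_delay_us
    mismatch = left_arrival - right_arrival
    return math.exp(-(mismatch ** 2) / (2 * COINCIDENCE_WINDOW_US ** 2))


# Hypothetical array of cells: as the left-side delay grows, the right-side
# delay shrinks, so each cell is matched to a different interaural time difference.
cells = [(left_delay, 600.0 - left_delay) for left_delay in range(0, 601, 100)]


def best_cell_index(itd_us):
    """Index of the cell in the array that responds most strongly."""
    responses = [mso_response(itd_us, left, right) for left, right in cells]
    return responses.index(max(responses))


for itd in (-600, -200, 0, 400):
    print("ITD", itd, "microseconds -> most active cell:", best_cell_index(itd))
```

In this model, the position of the most active cell in the array changes systematically with the interaural time difference, which is one way to picture the crude map of horizontal space described above.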
In the MSO, the representation of low frequencies is expanded relative to the cochlear
tonotopic map. This would be expected given that interaural time differences are mainly
produced by low frequency sounds.
The use of two different cues to localize sound is sometimes referred to as the duplex
theory of sound localization. The two different cues operate over different frequency ranges, and
there is a "gap" between these two ranges (at around 2000 Hz) where our ability to localize a
sound is relatively poor.
Psychophysical measurements show that our ability to localize a sound is best in front
(where our eyes are directed), and poorer at the periphery.
21.3. LOCALIZATION IN THE VERTICAL PLANE
The cues that we use to localize sound in the vertical plane are not as well understood as
those used for azimuthal localization. One cue that we clearly do use, however, is the pattern of
peaks and valleys in the spectrum of a broadband sound caused by the characteristics of the outer
ear (pinna) and the ear canal. This is a cue that we probably have to learn to use, just as we have
to learn to understand speech.
21.4. CUES RELATED TO THE DISTANCE OF A SOUND SOURCE
Very little is known about how we determine the distance of a sound source. A number of
cues might be relevant. One is sound intensity, since faint sounds tend to be far away. This cue
works best if we know what the approximate loudness of the sound would be at its source.
Another cue that can provide information about distance is the pattern of echoes that a sound generates.
The time that elapses between a sound and its echo or echoes in a space of a certain size, shape,
and composition is correlated with the distance of the sound source from the listener. Another
important cue about sound location, including distance, is the comparison of the sound with
visual (or other) information. Sound will always be localized to the most plausible source, based
on visual information. For a distant source, the time that elapses between the movement that is
seen and the resulting sound is proportional to the distance from the viewer/listener.
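Two of these distance cues lend themselves to a quick calculation. The Python sketch below is illustrative only and rests on assumptions not stated in the text: a speed of sound of 343 m/s, light arriving effectively instantaneously, and an idealized point source in open space whose level falls by about 6 dB for each doubling of distance. The function names are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly room temperature


def distance_from_delay_m(delay_s):
    """Distance implied by the delay between seeing an event and hearing it."""
    return SPEED_OF_SOUND_M_PER_S * delay_s


def distance_from_level_m(level_at_1m_db, observed_level_db):
    """Distance implied by how much fainter a sound is than its known level at 1 m.

    Assumes an idealized point source in open space (inverse-square law),
    with no echoes or absorption.
    """
    return 10 ** ((level_at_1m_db - observed_level_db) / 20)


print(distance_from_delay_m(3.0), "m for a 3 second delay")
print(distance_from_level_m(80.0, 60.0), "m for a sound 20 dB fainter than at 1 m")
```

The second function makes the limitation of the intensity cue explicit: without a good estimate of the level at the source, the observed level alone does not determine the distance.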
21.5. ANALYSIS OF TEMPORAL PATTERNS, OR "WHAT" A SOUND IS
The cochlear nuclei and monaural nuclei of the lateral lemniscus receive input from one
ear only. Cells in these parts of the auditory brainstem transform their input in various ways.
These include the conversion of discharge patterns from “primary-like” to onset, offset, or other
types, the introduction of time delays, and the conversion of excitatory input to inhibitory output.
Some neurons in the nuclei of the lateral lemniscus respond best to specific sequences of sounds,
for example a sound that changes from high frequency to low frequency. However, it is likely
that much of the processing that ultimately results in our perception of “what” a sound is occurs
at the level of the midbrain, thalamus, or cortex.
21.5.1. The inferior colliculus.
The inferior colliculus is the main midbrain auditory center. It receives convergent
projections from the "where" and "what" pathways in the lower brainstem. It integrates
information about interaural intensity differences and interaural time differences. This
information is sent to the superior colliculus for reflex orientation to a sound source. It is also
sent to the medial geniculate nucleus in the thalamus for transmission to the auditory cortex.
Neurons in the inferior colliculus are tuned to a number of different features of sound.
The basic organizing principle within the inferior colliculus is a tonotopic map, with low
frequencies represented dorsally and high frequencies ventrally. Within the tonotopic map are
other organizations in which neurons are tuned to sound intensity, sound duration, the direction
of a change in sound frequency (frequency sweep direction), the rate at which amplitude or
frequency changes (modulation rate) and other more complex sound patterns. Not all neurons in
the inferior colliculus are tuned to every parameter mentioned here, but all show some degree of
selectivity.
Figure 21-9. An example of sound processing and analysis in the inferior colliculus: Tuning to the direction of a
frequency modulated (FM) sweep. The hypothetical neuron on the left receives inputs from 3 cells at a lower level,
each of which is tuned to a different frequency (F1, F2 and F3, with F1 being low and F3 high). The input from F3
has the shortest latency and arrives first. The input from F2 has a longer latency than F3, with Δt representing the
difference between the two. F1 has the longest latency. If F1, F2 and F3 occur in the appropriate sequence so that
the inputs from the three different pathways coincide, then the inputs summate and a large EPSP is generated. If
they occur in the reverse order, then there is no summation and little or no response. The cell on the right has inputs
from the same frequencies, but with latencies arranged in the opposite sequence so that it responds best to a
sequence from high to low frequency.
21.5.2. Tuning to the direction of a frequency sweep.
Tuning to the direction of a frequency sweep is one example of a form of neural
selectivity that arises through computational mechanisms in the central nervous system. The
sweep direction sensitive cell receives subthreshold excitatory input from two or more sources
tuned to different frequencies, and with different latencies. Just like a cell in the MSO, it will
respond best to a stimulus in which the input through the pathway with the longer latency
precedes input from the pathway with the shorter latency so that the two inputs coincide. In this
case, however, the two inputs are from neurons tuned to high and low frequencies, not from
the right and left ears. You will remember that the MSO cell responds when right precedes left
(or vice versa), whereas the inferior colliculus cell in our example responds when high frequency
precedes low frequency (or vice versa).
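The same logic can be written out as a short Python sketch. This is a toy model under assumptions made for illustration: three inputs labeled F1 (low), F2 and F3 (high), latencies that differ in 10 ms steps, and a cell that responds only if all inputs arrive within an assumed 2 ms coincidence window. None of these values come from the text.

```python
WINDOW_MS = 2.0  # assumed coincidence window


def responds(latencies_ms, onsets_ms):
    """True if the inputs from all frequency channels arrive close enough to summate.

    latencies_ms: pathway latency for each frequency channel of one cell.
    onsets_ms: time at which each frequency occurs in the stimulus.
    """
    arrivals = [onsets_ms[channel] + latencies_ms[channel] for channel in latencies_ms]
    return max(arrivals) - min(arrivals) <= WINDOW_MS


# Hypothetical cells: the channel that must occur first has the longest latency.
upward_cell = {"F1": 20.0, "F2": 10.0, "F3": 0.0}
downward_cell = {"F1": 0.0, "F2": 10.0, "F3": 20.0}

# Stimuli: the time at which each frequency occurs during the sweep.
up_sweep = {"F1": 0.0, "F2": 10.0, "F3": 20.0}    # low to high
down_sweep = {"F1": 20.0, "F2": 10.0, "F3": 0.0}  # high to low

print("upward cell, up sweep:", responds(upward_cell, up_sweep))
print("upward cell, down sweep:", responds(upward_cell, down_sweep))
print("downward cell, down sweep:", responds(downward_cell, down_sweep))
print("downward cell, up sweep:", responds(downward_cell, up_sweep))
```

Each model cell responds only to the sweep direction in which the stimulus timing cancels out the differences in pathway latency, so that all of its inputs arrive together.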
21.5.3. Importance of delay lines and coincidence detectors in neural information processing.
Inferior colliculus cells sensitive to the direction of a frequency sweep are just one
more example of a neural circuit that uses delay lines and coincidence detectors. A neural circuit
made up of delay lines and coincidence detectors can be used to analyze many different patterns
of information distributed over time, not just in the auditory system, but in other systems as well.
____________________________________________________________________________
Thought question: Using a system of delay lines and coincidence detectors similar to those used
in the example of auditory cells tuned to a specific direction of frequency change, how could you
construct neurons sensitive to the direction in which a sound source moves? Where would the
inputs come from? How would they be organized? What parameters might be mapped on the
resulting array of cells?
_____________________________________________________________________________
_____________________________________________________________________________
Thought question: One of the challenges for our auditory systems is separating simultaneously
occurring streams of sound that originate from different sources. How might the auditory
processing mechanisms you have learned about help in this task?
_____________________________________________________________________________
_____________________________________________________________________________
Thought question: Most bats emit high frequency sounds and listen to the echoes reflected from
objects in their environment. The time between the emission of the call and the return of the echo
is proportional to the distance of the object from the bat. What kinds of coincidence detector
cells might one expect to find in the brains of bats? What sorts of neural circuits do you think
might be responsible for producing the specialized response properties of these cells?
_____________________________________________________________________________