Neural Processing of Amplitude-Modulated Sounds

Physiol Rev
84: 541–577, 2004; 10.1152/physrev.00029.2003.
P. X. JORIS, C. E. SCHREINER, AND A. REES
Laboratory of Auditory Neurophysiology, Division of Neurophysiology, K.U. Leuven, Leuven, Belgium;
Coleman Laboratory, Department of Otolaryngology, Keck Center for Integrative Neuroscience, University of
California at San Francisco, San Francisco, California; and School of Neurology, Neurobiology, and
Psychiatry, The Medical School, University of Newcastle upon Tyne,
Newcastle upon Tyne, United Kingdom
I. Temporal Dimensions of Sound
II. Human Sensitivity to Amplitude Modulation
III. Neural Response Measures
IV. Auditory Nerve: Bottleneck to the Central Nervous System
    A. Basic auditory nerve properties
    B. Average response rate and magnitude of synchronization
    C. Phase of synchronization
V. Cochlear Nucleus: Parallel Channels
    A. Basic organization of the CN
    B. AM responses of neuronal types in the CN
VI. Superior Olivary Complex: An Example of Time-to-Rate Conversion
VII. The Nuclei of the Lateral Lemniscus
VIII. Amplitude Modulation Encoding in the Inferior Colliculus: A Center for Convergence
    A. Basic organization of the IC
    B. Modulation transfer functions for IC units: synchronization
    C. Modulation transfer functions for IC units: average rate
    D. What determines the MTF upper limit in the IC?
    E. Is AM encoded in the IC by rate or synchronization?
    F. Relationship between AM responses and other neuronal properties
    G. Is modulation frequency represented topographically in the IC?
    H. Responses to interaural time disparities in modulation envelopes
    I. Contribution of nonlinearities
IX. Amplitude Modulation Encoding in Auditory Thalamus and Cerebral Cortex
    A. Basic layout of the thalamocortical system
    B. Temporal responses in the MGB
    C. Responses to AM in primary auditory cortex: synchronization
    D. Responses to AM in primary auditory cortex: average rate
    E. Responses to AM in primary auditory cortex: influence of modulation parameters
    F. Differences of temporal coding between cortical fields
    G. Cortical mechanisms
    H. Temporal coding of complex sounds
    I. Plasticity of temporal coding properties in auditory cortex
X. Neurophysiological and Psychological Studies in Humans
XI. Conclusion
Joris, P. X., C. E. Schreiner, and A. Rees. Neural Processing of Amplitude-Modulated Sounds. Physiol Rev 84: 541–577, 2004; 10.1152/physrev.00029.2003.
Amplitude modulation (AM) is a temporal feature of most natural acoustic signals. A long psychophysical tradition has shown that AM is important in a variety of perceptual tasks, over a range of time scales. Technical possibilities in stimulus synthesis have reinvigorated this field and brought the modulation dimension back into focus. We address the question whether specialized neural mechanisms exist to extract AM information, and thus whether consideration of the modulation domain is essential in understanding the neural architecture of the auditory system. The available evidence suggests that this is the case. Peripheral neural structures not only transmit envelope information in the form of neural activity synchronized to the modulation waveform but are often tuned so that they only respond over a limited range of modulation frequencies. Ascending the auditory neuraxis, AM tuning persists but increasingly takes the form of tuning in average firing rate, rather than synchronization, to modulation frequency. There is a decrease in the highest modulation frequencies that influence the neural response, either in average rate or synchronization, as one records at higher and higher levels along the neuraxis. In parallel, there is an increasing tolerance of modulation tuning for other stimulus parameters such as sound pressure level, modulation depth, and type of carrier. At several anatomical levels, consideration of modulation response properties assists the prediction of neural responses to complex natural stimuli. Finally, some evidence exists for a topographic ordering of neurons according to modulation tuning. The picture that emerges is that temporal modulations are a critical stimulus attribute that assists us in the detection, discrimination, identification, parsing, and localization of acoustic sources and that this wide-ranging role is reflected in dedicated physiological properties at different anatomical levels.
www.prv.org
0031-9333/04 $15.00 Copyright © 2004 the American Physiological Society
I. TEMPORAL DIMENSIONS OF SOUND
Among the sensory systems, audition excels in its
speed of operation. This is perhaps not too surprising,
since our entire sense of hearing depends on the analysis
of rapid changes in acoustic pressure at the two ears. The
importance of the temporal dimension is manifest in
many structural and functional specializations, starting at
the peripheral sense organ and carried through the subsequent stages in the central nervous system. The striking
sensitivity of auditory structures to temporal features of
the acoustic stimulus has been observed since the earliest
electrophysiological recordings, and this sensitivity is
equally prominent in behavioral observations of humans
and experimental animals.
Importantly, there are multiple temporal dimensions
in acoustic stimuli (238). It is useful to distinguish “fine-structure” and “envelope” as two components of a time
waveform. The fast pressure variations that determine the
spectral content constitute the fine-structure. This fine-structure waxes and wanes in amplitude, and the contour
of this amplitude modulation (AM) is the envelope. For
example, the waveform of a speech utterance shows
bursts of energy that correspond to phonemes. The temporal characteristics of these bursts carry much information (44, 108, 214, 265, 272, 281), but their dominant
modulation frequency is rather slow (typically 3–4 Hz,
extending up to ~20 Hz) vis-à-vis the temporal capabilities of the peripheral auditory system. Faster modulations
of several hundred Hertz are also very common, e.g., in
segments of voiced speech where they are perceptually
associated with voice pitch. These envelope components
arise from interactions between fine-structure components and are not present as such, i.e., as acoustic energy,
in the waveform. This is illustrated by the superposition
of two sine waves, equal in amplitude but separated by a
small difference frequency (fd): constructive and destructive interference of the two components generate AM in
the form of “beating” at frequency fd. The same principle
extends to environmental sound sources, which commonly produce quasi-periodic signals consisting of a
range of frequency components (harmonics) that are multiples of a fundamental frequency: the combination of
even a limited number of components, e.g., within a cochlear filter, reconstitutes the fundamental frequency in
the form of a temporal envelope modulation. (For examples of spectrograms, waveforms, and treatment of AM,
see Refs. 99, 100, 177, 180, 302.)
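The beating described above follows from a standard sum-to-product identity. The short derivation below is a sketch added for illustration; the symbols f_1 and f_2 (the two tone frequencies, with f_2 = f_1 + f_d) are notation assumed here and do not appear in the original text.

```latex
% Two equal-amplitude tones separated by a small difference frequency f_d = f_2 - f_1
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos(\pi f_d t)\,\sin\!\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right)
% The slowly varying factor has magnitude |2\cos(\pi f_d t)|,
% which repeats every 1/f_d seconds, i.e., the beat rate is f_d.
```

The fast factor oscillates at the mean of the two frequencies, while the magnitude of the slow factor acts as the envelope, so the waveform beats at f_d even though its spectrum contains energy only at f_1 and f_2.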
The laboratory stimulus most often used in physiological studies of modulation is a pure tone (sinusoid)
modulated by another tone. Figure 1A and Equation 1
represent the waveform [s(t)] of a tone with frequency fc
(the carrier), whose amplitude is modulated by a lower
frequency fm (the modulator) at a modulation depth m
(0 ≤ m ≤ 1)
$$s(t) = [1 + m \sin(2\pi f_m t)]\,\sin(2\pi f_c t) \qquad (1)$$
For fc ≫ fm the first term $[1 + m \sin(2\pi f_m t)]$ is the
time-varying amplitude or envelope (see footnote 1). Using the product-to-sum trigonometric
identity, s(t) can be rewritten as the sum of three components at fc and at fc ± fm (the upper and lower sidebands)
$$s(t) = \sin(2\pi f_c t) + \frac{m}{2}\left[\cos 2\pi(f_c - f_m)t - \cos 2\pi(f_c + f_m)t\right] \qquad (2)$$
This signal does not contain energy at fm (Fig. 1, A and B);
the modulation in the time waveform is due to the interaction of the components in the signal which are separated by a difference frequency fm.
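A short numerical check can make the spectral content of the sinusoidally amplitude-modulated tone concrete. The following is a minimal sketch, not the authors' analysis: it synthesizes Equation 1 with the parameter values of Figure 1A and measures component amplitudes with a single-bin discrete Fourier projection. The sampling rate, the 100-ms duration, and the helper names `sam_tone` and `amplitude_at` are assumptions made for illustration.

```python
import math


def sam_tone(t, fc, fm, m):
    """Equation 1: sinusoidally amplitude-modulated tone at time t (seconds)."""
    return (1.0 + m * math.sin(2 * math.pi * fm * t)) * math.sin(2 * math.pi * fc * t)


def amplitude_at(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz, nonzero).

    Single-bin DFT projection; exact when the signal spans an integer
    number of cycles of every component present.
    """
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


fs = 20000                        # sampling rate in Hz (assumed)
fc, fm, m = 1000.0, 100.0, 1.0    # carrier, modulator, depth as in Fig. 1A
n = 2000                          # 100 ms: whole cycles of fm, fc, and fc +/- fm
signal = [sam_tone(k / fs, fc, fm, m) for k in range(n)]

carrier = amplitude_at(signal, fs, fc)
lower = amplitude_at(signal, fs, fc - fm)
upper = amplitude_at(signal, fs, fc + fm)
at_fm = amplitude_at(signal, fs, fm)
sideband_db = 20 * math.log10(upper / carrier)  # sideband level re: carrier

print(carrier, lower, upper, at_fm, sideband_db)
```

With m = 1 the sidebands come out at half the carrier amplitude (about 6 dB down), and the component at fm is zero to within rounding error, in line with the idealized spectrum of Figure 1B.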
The sinusoidal AM stimulus is special because its
envelope consists of a single sinusoidal component. In
real-world stimuli, a range of modulations is usually
present, which can be summarized by the modulation
spectrum: the distribution of modulation energy for the
whole waveform or for a selected band of carrier frequencies in the waveform. The subjectively experienced quality of a modulated signal depends on modulation frequency so that the modulation spectrum also defines different perceptual ranges (see sect. II).
Footnote 1. The relationship of m to the waveform is the same as that of the Rayleigh or Michelson contrast ratio used in vision research: m equals the difference between the maximum and minimum luminance divided by their sum.
FIG. 1. A: superimposed waveforms of an unmodulated 1,000-Hz tone (thin line) and the same tone sinusoidally amplitude modulated (AM) (thick line) at 100% with a modulation frequency of 100 Hz, according to Equation 1. Dashed lines indicate the envelope. The amplitude is referenced to the peak amplitude of the unmodulated tone. B: idealized spectrum of the AM tone in A. At 100% modulation, the amplitude of the sidebands is half that of the carrier, i.e., a difference of 6 dB. C: average response in the form of a poststimulus time (PST) histogram of a nerve fiber to the signal shown in A (stimulus duration, 50 ms). D: spectrum of the PST histogram in C. The components at carrier frequency (fc) and fc ± modulation frequency (fm) indicate that there is phase-locking to the fine-structure of the stimulus waveform. The component at fm is prominently present in the response but is absent in the stimulus (B). The small circle on the ordinate indicates the average firing rate.
The impetus in early physiological studies to use
modulated stimuli (57, 62, 78, 183, 196) was a desire to go
beyond the arsenal of simple stimuli (pure tones, clicks,
noise) that dominated much of the research at that time.
Somewhat similar to gratings in the visual domain, AM
and frequency modulation (FM) were regarded as elementary features of natural stimuli, which could reveal dynamic properties of the auditory system not addressed
with simpler stimuli. Interest in responses to AM was
rekindled in the 1980s and 1990s through a convergence
of different lines of research concerned with the “dynamic
range problem,” speech coding, pitch, and spatial localization of high-frequency sounds, among others. However,
AM signals are more than just a convenient laboratory
tool to study a diversity of psychophysical and physiological phenomena. The question that we are concerned with
here is whether envelope processing is embedded in the
auditory system, as may be expected from the ecological
prominence of envelopes.
Given the theory of natural selection, one can assume
that animals are well adapted to their specific acoustic
environment and that the statistical structure of the natural auditory environment or the “acoustic ecology” (5) is
reflected in the structure and function of the auditory
system. Acoustic ecology can be defined as the total
ensemble of sounds present in an animal’s environment,
from both inanimate as well as biological sources. Indeed,
the auditory systems of acoustically specialized animals
have revealed the existence of highly developed adaptations. Prominent examples include the echolocation system of bats (e.g., Ref. 61), the mating call detection system in frogs (245), and the alarm call differentiation in
vervet monkeys (275). Common to these examples is that
particular behaviors are elicited by a small set of signals
with specific, fairly invariant acoustic properties. Characterization of these lower order physical sound attributes
led to the discovery of special neuronal mechanisms.
Relatively little work has been done on the quantitative analysis of amplitude modulation statistics in acoustic ecologies and their consequences for neuronal processing. Not only overtly specialized but all animals are
likely to exploit consistencies in statistical properties of
the acoustical environment. Nelken et al. (194) found that
low-frequency amplitude modulations are prominent in
natural environments and are often coherent over different frequency regions, and may be exploited by the auditory system in signal detection. Voss and Clarke (288)
computed temporal correlations of music passages and
discovered a 1/f scaling relation over a few decades. More
recently, Attias and Schreiner (6) decomposed music,
speech, and animal vocalizations into narrow-band frequency channels and studied the statistics of the amplitude and phase distributions for each channel. They also
found a distribution of modulation frequencies following
a power-law, indicating that the amplitude modulation
statistics of natural sound are non-Gaussian, cover a wide
range of modulation frequencies, and scale universally,
i.e., the frequency dependence is similar over different
frequency ranges. Using a mutual information metric between stimulus and spike trains, it was also found (7) that
neurons in the cat inferior colliculus are more efficient at
coding naturalistic stimuli than nonnaturalistic stimuli:
the information rate per spike for naturalistic stimuli was
more than 60% higher than for nonnaturalistic signals.
Similar results have been seen in the frog (232). This
implies that neural processing is adapted and perhaps
optimized for the encoding of naturally occurring modulation information.
Our purpose is to review physiological mechanisms
that may be important for the processing of temporal
envelope information. We first briefly highlight findings
from human psychophysics to illustrate some of the perceptual consequences of AM, but we refrain from a more
substantial discussion of the relationship between physiological mechanisms and perception. Rather, our focus is
on a simpler and more basic question: within
what limits is AM encoded by single auditory neurons,
and does the form of encoding suggest that the temporal
envelope dimension is a fundamental organizing principle
in the auditory system, in the manner that tuning to
orientation, direction, or spatial frequency is considered
fundamental in vision?
For reasons of space, only occasional reference will
be made to the extensive research in bats or nonmammalian vertebrates, even though AM is often an important
feature in echolocation signals (156, 198, 258) and their
study often preceded the research reviewed here.
II. HUMAN SENSITIVITY TO AMPLITUDE MODULATION
The ability of human listeners to detect and discriminate AM has been a topic of study since the 18th century.
The earliest means of producing a sound with a fluctuating amplitude envelope was to mix two pure tones differing slightly in frequency to generate beats. Thomas Young
and Helmholtz (287) both described the sensation of fluctuating amplitude experienced when listening to beats,
and Helmholtz described the changing quality of the
sound as the beat frequency was increased. He noted that
“the ear easily follows slow beats of not more than 4 to 6
in a second” while at 30 beats/s it is still possible to hear
the pulses of the tone, but it is no longer possible to hear
them as distinct events and they have a “jarring and
rough” quality.
With improvements in technology, subsequent studies (see Ref. 131 for historical review) extended and
quantified these findings. Zwicker (324) showed that the
threshold for detecting AM is very small at low modulation frequencies (threshold m ~2% for fm of 1–4 Hz and fc
of 1 kHz) and increases to a maximum with increasing fm
(m ~5% for fm of 32 Hz and fc of 250 Hz; and for fm of 125
Hz and fc of 4 kHz). Above this maximum, threshold
decreases and falls below the values obtained at low
modulation frequencies, but in this range subjects perceive the carrier and the modulation frequency as distinct
tones. Zwicker (324) also determined that, for a given
carrier, thresholds for the detection of AM and FM measured in terms of their modulation depths coincide on the
upper side of the maximum at a modulation frequency he
termed the Phasengrenzfrequenz. This led Zwicker to postulate that above the Phasengrenzfrequenz [now termed
the critical modulation frequency (CMF) (250, 263)] the
carrier and sideband components are analyzed in different critical bands (auditory filters), and thus subjects are
not sensitive to differences in the relative phase of the
modulation components that enable them to distinguish
AM from FM below the CMF. More recent evidence suggests that the situation is more complex than this (180,
263), but nevertheless, it appears that when listening to
AM imposed on pure tone carriers detection may rely on
spectral rather than temporal cues over some ranges of
modulation frequency.
One means of eliminating spectral cues, and therefore estimating the temporal resolving power of the auditory system, is to measure the detection of sinusoidal
modulation imposed on noise rather than a tonal carrier.
The broadband spectrum of the noise precludes the listener detecting the individual spectral components of the
stimulus spectrum. The use of such stimuli (9, 285) demonstrated that the relationship between threshold and
modulation frequency (the psychophysical temporal modulation transfer function) is essentially a low-pass function with a 3-dB cut-off around 50 Hz and a slope of
−4 dB/octave. The minimum threshold modulation depth
is ~5% at low modulation frequencies (<10 Hz) where
subjects detect the individual amplitude changes in the
stimulus. The upper limit of modulation detection extends
to ~2.2 kHz (68, 285, 286). As will become apparent later,
this coincides with the very highest limits of neural phase-locking to envelopes obtained for some neurons in the
auditory periphery in cats (Fig. 2, Refs. 127, 229) and
exceeds the limit for phase-locking to envelopes in more
central neurons. This raises questions as to the nature of
modulation encoding in the central auditory system, even
when one takes into account the encoding of modulations
by changes in average rate that become apparent at more
central sites.
FIG. 2. Amplitude modulation (AM) stimuli generate different percepts that encompass several regions of modulation and carrier frequencies. At very low fm, most strongly near 4 Hz and disappearing around 20 Hz, a sensation of fluctuation or rhythm is produced (hatched). The rate at which the temporal envelope of fluent speech varies is also typically 4 Hz (syllables/s). Fluctuation makes a smooth transition to a percept of roughness, which starts at ~15 Hz (bottom curved line), is strongest near 70 Hz, and disappears below 300 Hz (top curved line). Harmonic complex tones produce a pitch that corresponds to a frequency close to the fundamental frequency. However, the lower harmonics can be removed without affecting the pitch, resulting in “residue pitch” if fc and fm are chosen within the shaded region. Finally, small interaural time differences (ITD) can be detected between modulated stimuli to the two ears for a region of combinations of fm and fc that overlaps with the region for residue pitch (thick line). Note that these are regions in stimulus space where modulation is perceptually relevant, but the precise relationship of these percepts to physiological response modulation is usually unclear. For reference, the small dots indicate −10 dB cutoff values for modulation transfer functions (MTFs) of auditory nerve fibers (cf. Fig. 3C) [based on further analysis of data reported by Joris and Yin (127)]. Delineation of psychophysical regions is based on References 16, 104, 233, 278, 325. The ordinate is truncated at 4 Hz.
Although, as Zwicker noted, a distinct pitch at the
frequency of modulation is perceived when components
of the stimulus spectrum can be resolved, weaker but
nevertheless clear pitches are also perceived with modulations containing no resolved components (179, 233).
Even modulations imposed on noise carriers can generate
pitches which though weaker than those generated with
tonal stimuli are able to support melody recognition (21,
22). Taken together, these findings demonstrate that the
periodicity or residue pitches of some modulations must
result solely from temporal analysis, but when resolved
components are present, pitch salience is increased. Figure 2 schematically indicates the combinations of carrier
and modulation frequencies resulting in the percepts of
fluctuation, roughness, and residue pitch. (Sensitivity to
binaural envelope disparities is discussed in section VI.)
Two competing models have been proposed to explain the detection of AM. The first consists of a bandpass
filter and half-wave rectifier representing processing by
the cochlea, followed by a low-pass filter (285). Some
measure of the output of this filter provides the basis for
the subject’s response (see Ref. 181 for discussion). In
essence, therefore, this model is an envelope detector.
The second scheme models the detection of modulation
by a bank of bandpass filters that are sensitive to different
ranges of modulation frequency. A channel or filterbank
model of modulation analysis was first proposed by Kay
and colleagues (84, 132) on the basis of adaptation studies
with FM and AM. Subsequently, the adaptation paradigm
was questioned (178, 289), but the concept of a modulation filterbank persists because studies using different
psychophysical paradigms have since reported findings
which support the concept of modulation frequency tuning. Evidence for such selectivity comes from modulation
masking experiments (8, 107), and modulation detection
interference (MDI), a phenomenon in which the detection
of AM is influenced by modulation at the same frequency
but on a very different carrier (318). Dau et al. (36)
invoked a model consisting of a modulation filterbank
associated with each auditory filter to account for the
detection and masking of sinusoidally amplitude-modulated narrowband noise. The latter model was extended
(283) to account for comodulation masking release, another phenomenon, like MDI, that indicates some element
of modulation waveform analysis across different carrier
frequencies (96) (see Ref. 180 for review). Such across-frequency interactions between similar modulation envelopes are likely to contribute to grouping and the construction of auditory images (90). Despite different lines
of evidence favoring some form of modulation filterbank,
the concept remains controversial, and the experimental
findings discussed above do not concur in their estimates
of the bandwidths for these putative channels.
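The first of the two models described above, the envelope detector, can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions rather than the model of Ref. 285: the cochlear bandpass stage is omitted because a tonal carrier is already narrowband, the low-pass stage is a one-pole filter with an assumed 50-Hz cutoff, and the decision statistic (AC root-mean-square divided by the mean of the filter output) is a hypothetical choice. The function name and all parameter values are illustrative.

```python
import math


def envelope_detector_statistic(fm, m=1.0, fc=4000.0, fs=40000,
                                duration_s=1.0, cutoff_hz=50.0):
    """Half-wave rectify an AM tone, low-pass filter it, and return the
    relative fluctuation (AC rms / mean) of the filter output."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff_hz / fs)  # one-pole coefficient
    y = 0.0
    out = []
    for k in range(int(fs * duration_s)):
        t = k / fs
        x = (1.0 + m * math.sin(2 * math.pi * fm * t)) * math.sin(2 * math.pi * fc * t)
        x = max(x, 0.0)          # half-wave rectification
        y += alpha * (x - y)     # one-pole low-pass filter
        out.append(y)
    steady = out[len(out) // 2:]  # discard the onset transient
    mean = sum(steady) / len(steady)
    ac_rms = math.sqrt(sum((v - mean) ** 2 for v in steady) / len(steady))
    return ac_rms / mean


# Output fluctuation for a slow, an intermediate, and a fast modulation
stats = {fm: envelope_detector_statistic(fm) for fm in (10.0, 200.0, 1000.0)}
for fm, value in stats.items():
    print(f"fm = {fm:6.0f} Hz  relative fluctuation = {value:.3f}")
```

In this sketch the output fluctuation shrinks as fm rises above the filter cutoff, which is the qualitative low-pass behavior such a detector predicts. A modulation filterbank would instead replace the single low-pass stage with a set of bandpass filters tuned to different modulation frequencies.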
III. NEURAL RESPONSE MEASURES
In neurophysiology, one can generally think of a
variety of ways in which stimulus features may be “encoded” and processed (208), and it is not immediately
obvious which aspects of neuronal behavior are the most
relevant for the perceptual task at hand. With few exceptions, the response measures used in studies of AM are
average discharge rate (i.e., the number of spikes evoked
over several modulation cycles), or some measure of
synchronization of the timing of action potentials to the
envelope waveform.
The earliest single-unit studies of peripheral auditory
neurons already reported synchronization to the fine-structure of tones, in the sense that discharges occur at a
particular phase of the cyclical waveform. For example,
auditory nerve fibers have the striking capability to
“phase-lock” to low-frequency tones up to several kilohertz [4–5 kHz in the cat (121), but the upper limit is
species dependent (298)]. Phase-locking also occurs to the
stimulus envelope; both forms of phase-locking are immediately apparent in the poststimulus time (PST) histogram (Fig. 1C) of the response to the AM stimulus of Figure 1A. The fine
spacing of peaks at intervals of 1 ms indicates phase-locking to the 1-kHz fine-structure; the grouping into
broader peaks spaced by 10 ms indicates phase-locking to
the 100-Hz envelope. In contrast to the stimulus spectrum
(Fig. 1B), the response spectrum (Fig. 1D) shows energy
at fm, i.e., the AM signal is demodulated. Several cochlear
nonlinearities with asymmetry between the positive and
negative part of the transfer function can contribute to
this demodulation, the most important being half-wave
rectification in the relationship between displacement of
hair cell stereocilia and receptor potential, and in the
absence of negative firing rates (135). The response spectrum also shows a value at 0 Hz (Fig. 1D: small circle on
ordinate) which equals the average firing rate. In this
review, we will use the terms envelope synchronization
and envelope phase-locking synonymously to refer to synchronization of the response to the stimulus envelope
waveform, and use the term rate coding for changes in
average firing rate during manipulation of the stimulus
modulation parameters.
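The demodulation described above can be reproduced with the simplest of the nonlinearities mentioned, half-wave rectification. The sketch below is an illustration only and not a cochlear model: it rectifies the AM tone of Figure 1A and compares the component at fm before and after rectification. The sampling rate, duration, and the helper name `amplitude_at` are assumptions.

```python
import math


def amplitude_at(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz, nonzero)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


fs, n = 20000, 2000               # assumed sampling rate (Hz) and 100-ms duration
fc, fm, m = 1000.0, 100.0, 1.0    # stimulus of Fig. 1A

stimulus = [
    (1.0 + m * math.sin(2 * math.pi * fm * k / fs)) * math.sin(2 * math.pi * fc * k / fs)
    for k in range(n)
]
rectified = [max(x, 0.0) for x in stimulus]   # half-wave rectification

fm_in_stimulus = amplitude_at(stimulus, fs, fm)
fm_in_rectified = amplitude_at(rectified, fs, fm)
mean_rectified = sum(rectified) / n   # 0-Hz term, the analog of average firing rate

print(fm_in_stimulus, fm_in_rectified, mean_rectified)
```

The stimulus itself has no component at fm, whereas the rectified waveform has a clear one (roughly m/π in these units) together with a nonzero mean, mirroring the difference between Figure 1B and Figure 1D.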
Different synchronization measures have been used,
sometimes leading to seemingly contradictory statements. The most popular metric is “vector strength” R,
also called synchronization index (81). Each spike is
treated as a vector of unit length and with phase θ_i between 0 and 2π measured as the spike time modulo the
stimulus period of interest. The x- and y-components of
the vector are x_i = cos θ_i and y_i = sin θ_i. The n spikes in a
response are combined by vector addition, and the resultant vector is normalized to n
$$R = \frac{\sqrt{\left(\sum_{i=1}^{n} x_i\right)^2 + \left(\sum_{i=1}^{n} y_i\right)^2}}{n} \qquad (3)$$
which takes values between 0 and 1. R can also be obtained from the Fourier spectrum of the PST or period
histogram, in which case it equals the magnitude of the
first harmonic, normalized by the DC component (average
firing rate). Phase φ is also retrieved with either technique. Statistical significance of synchronization is usually
quantified with the Rayleigh test (23, 168).
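Equation 3 translates directly into code. The following minimal sketch computes vector strength and mean phase from a list of spike times; the function name `vector_strength` and the two toy spike trains are illustrative assumptions, not data from any study.

```python
import math


def vector_strength(spike_times, freq):
    """Return (R, mean phase in radians) for spike times in seconds,
    relative to a periodic stimulus component at `freq` (Hz)."""
    if not spike_times:
        raise ValueError("at least one spike is required")
    period = 1.0 / freq
    # Phase of each spike: spike time modulo the period, scaled to 0..2*pi
    phases = [2 * math.pi * ((t % period) / period) for t in spike_times]
    sum_x = sum(math.cos(p) for p in phases)
    sum_y = sum(math.sin(p) for p in phases)
    r = math.hypot(sum_x, sum_y) / len(phases)
    mean_phase = math.atan2(sum_y, sum_x) % (2 * math.pi)
    return r, mean_phase


# Toy example 1: one spike per 10-ms cycle, always a quarter cycle after onset
locked = [0.0025 + 0.01 * k for k in range(50)]
r_locked, phase_locked = vector_strength(locked, 100.0)

# Toy example 2: spikes spread evenly over the cycle
flat = [0.001 * k for k in range(10)]
r_flat, _ = vector_strength(flat, 100.0)

print(r_locked, phase_locked, r_flat)
```

Perfectly aligned spikes give R close to 1 with a mean phase of a quarter cycle, whereas spikes spread evenly across the period give R close to 0.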
As will become clear in this review, envelope coding
at peripheral stages is predominantly temporal rather
than rate-based, but these two aspects of the response
progressively reverse in prominence at successive stages
along the neuraxis. Because both average firing rate and
synchronization may contribute to the impact that a neuron has on its postsynaptic targets, many experimenters
have combined the two metrics by multiplication (nR,
with n = total number of spikes, variously called “modulated rate,” “phase-locked rate,” “synchronized rate”), or,
equivalently, by reporting the unnormalized Fourier component, expressed in spikes per second (33, 141, 224, 314).
Recently, some authors have used 2nR², which is also the
statistic used in the Rayleigh test of significance (157,
266). Finally, envelope synchronization is often reported
as a gain value (in dB), defined as 20 log10(2R/m), which
relates output directly to input and facilitates comparison
across studies which use different modulation depth m.
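The derived measures listed above are simple functions of R, n, and m. The sketch below collects them; the function names are hypothetical, and the p-value uses the common large-sample approximation exp(−nR²) for the Rayleigh test, which is an assumption added here and is not stated in the text.

```python
import math


def synchronized_count(r, n_spikes):
    """n * R: vector strength weighted by the total number of spikes."""
    return n_spikes * r


def rayleigh_statistic(r, n_spikes):
    """2 * n * R**2, the statistic used in the Rayleigh test."""
    return 2.0 * n_spikes * r * r


def rayleigh_p_value(r, n_spikes):
    """Large-sample approximation p ~ exp(-n * R**2) (assumed here)."""
    return math.exp(-n_spikes * r * r)


def modulation_gain_db(r, m):
    """Gain in dB, 20 * log10(2R / m), relating response to stimulus modulation."""
    return 20.0 * math.log10(2.0 * r / m)


# Example values: R = 0.5 from 100 spikes, in response to 100% modulation (m = 1)
r, n_spikes, m = 0.5, 100, 1.0
print(synchronized_count(r, n_spikes))
print(rayleigh_statistic(r, n_spikes), rayleigh_p_value(r, n_spikes))
print(modulation_gain_db(r, m))
```

A gain of 0 dB corresponds to R = m/2, that is, response modulation equal to stimulus modulation. Positive gains indicate a response more deeply modulated than the stimulus envelope.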
The vector strength metric, often under different
names (e.g., selectivity index), has found general use in
the quantification of periodic neural signals in sensory
and even motor physiology (43). Despite its pervasive use,
it is important to be aware of its limitations. First, the
metric gives only the degree to which the response is
modulated to the frequency at which R is calculated (we
use the subscripts m and c to indicate modulation frequency and carrier frequency, respectively). It does not
capture the full harmonic content of the cycle histogram
at fm so that histograms with a rather different shape can
result in the same Rm value (see Ref. 127 for an example).
An Rm value of one only results from perfect alignment of
all spikes at one phase, but a value of zero does not
necessarily indicate a random distribution of spike times.
For example, if spike times are equally divided between
phase φ and φ + π, the average vector has zero magnitude. Thus a low vector strength should not necessarily be
equated to absence of temporal structure in the spike
train, but rather is an indication of lack of energy at the
frequency for which R was calculated. Second, high R
values indicate that spikes are distributed over a narrow
time window relative to the period of interest, but such
values do not imply a faithful replica of the stimulus
modulation waveform in the probability of discharge. As a
reference, a PST histogram that closely resembles a half-wave rectified sinusoidal AM signal with m = 1 gives R =
0.5. Higher R values are obtained when the period histograms are more “peaked” than the original sinusoidal
modulation signal. Third, R is a compressive metric and is
therefore sometimes graphed on an expansive scale (120).
Finally, a problem at a more general level is that calculation of Rm requires knowledge of fm, a strategy that the
brain cannot use. It may be argued that a “clock” signal is
available in the form of the highly synchronized discharge
of some types of cochlear nucleus neurons, which could
be used to perform a vector strength type calculation in
which degree of synchronization is translated into average firing rate, e.g., as suggested in the periodicity extraction scheme by Langner (150). Some authors have used
interspike interval or autocorrelation analysis to bring out
the time structure of responses that may be more relevant
to the operations performed by the central processor (27,
85, 123, 141, 226, 301). In this context it is important to
remember that the envelope of most natural sounds is not
strictly periodic in the first place and that the raw acoustic
waveform is not available as such to the auditory nervous
system. Rather, this waveform is decomposed into a multitude of waveforms by virtue of cochlear narrowband
filtering (reviewed in Refs. 206, 234). This process profoundly affects the modulation spectrum present in each
frequency channel, which is thus determined jointly by
the spectrotemporal properties of the acoustic stimulus
and of those of the peripheral filtering process (for illustrations, see Ref. 286). In summary, while most studies
discussed here have used deterministic stimuli with periodic envelopes and have applied the R metric, it is important to keep in mind that, for natural stimuli, the relationship between neural response modulation and stimulus
modulation is more complex and that the neural operations by which the central processor extracts envelope
information likely differ fundamentally from the analytical ways of the experimenter.
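Two of the limitations of vector strength noted above are easy to verify numerically. The sketch below is an illustration with synthetic inputs; the function names, the phase value 0.3 rad, and the 64-bin period histograms are arbitrary assumptions.

```python
import cmath
import math


def vector_strength_from_phases(phases, harmonic=1):
    """R computed from spike phases (radians) at a given harmonic of fm."""
    resultant = sum(cmath.exp(1j * harmonic * p) for p in phases)
    return abs(resultant) / len(phases)


def vector_strength_from_histogram(counts, harmonic=1):
    """R computed from a period histogram with equally spaced bins."""
    nbins = len(counts)
    resultant = sum(c * cmath.exp(2j * math.pi * harmonic * k / nbins)
                    for k, c in enumerate(counts))
    return abs(resultant) / sum(counts)


# Limitation 1: spikes split equally between phase phi and phi + pi
phi = 0.3
split = [phi] * 50 + [phi + math.pi] * 50
r_split_fm = vector_strength_from_phases(split, harmonic=1)    # at fm
r_split_2fm = vector_strength_from_phases(split, harmonic=2)   # at 2 * fm

# Limitation 2: histogram shaped like the envelope 1 + m*sin(theta), m = 1
nbins = 64
sinusoidal = [1.0 + math.sin(2 * math.pi * k / nbins) for k in range(nbins)]
peaked = [c * c for c in sinusoidal]   # a more "peaked" histogram
r_sinusoidal = vector_strength_from_histogram(sinusoidal)
r_peaked = vector_strength_from_histogram(peaked)

print(r_split_fm, r_split_2fm, r_sinusoidal, r_peaked)
```

The split spike train has highly regular timing yet yields R near zero at fm, while the same spikes give R near one at twice that frequency. The histogram that mirrors a fully modulated sinusoidal envelope gives R = 0.5, and the sharper histogram gives a larger value.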
The bulk of studies on AM coding have used the same
stimulus strategy, which is to tailor the stimulus to the
cell under study. Early work (78, 183) established that
peripheral neurons display envelope phase-locking only if
the stimulus energy falls within a cell’s tuning curve. For
example, Javel (114) shows the lack of response of an
auditory-nerve fiber tuned to 800 Hz to a high-frequency
AM complex (fc ⫽ 5 kHz) modulated at 800 Hz. Most
studies using AM stimuli with tonal carriers match fc to
the neuron’s characteristic frequency (CF, frequency of
lowest rate threshold), and usually also optimize other
stimulus parameters for the cell under study. The complementary approach, in which the population response
of cells at many different CFs is studied to a limited set of
stimuli, has been little used (27, 293).
A description employed in acoustics, psychophysics, and physiology alike is the modulation transfer
function (MTF), which is response modulation relative to
input modulation as a function of modulation frequency.
Schroeder (257) predicted more than 20 years ago that the
concept of MTF would increase in importance because
the modulation rather than the carrier usually contains
the important information and because highly nonlinear
transmission systems often exhibit a quasi-linear response to modulation. Physiologically, MTFs are usually
measured as the phase-locking to AM tones of fixed m and
fc presented at consecutive modulation frequencies, but
other methods have been employed (see sect. IXB).
Marked effects on average rate occur so that a distinction
between temporal MTF (tMTF) and rate MTF (rMTF) is
usually drawn.
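As a minimal sketch of how one point of a tMTF and an rMTF might be obtained from spike times, the following assumes that synchronization is quantified as the vector strength R of spikes relative to the envelope period (the function names and the spike trains are hypothetical and purely illustrative). Repeating the calculation at consecutive modulation frequencies would give R and average rate as functions of fm.

```python
import cmath
import math

def vector_strength(spike_times, fm):
    """Synchronization index R of spike times (s) to modulation frequency fm (Hz)."""
    if not spike_times:
        return 0.0
    # Map each spike to a unit vector at its envelope phase, then average.
    total = sum(cmath.exp(2j * math.pi * fm * t) for t in spike_times)
    return abs(total) / len(spike_times)

def mtf_point(spike_times, fm, duration):
    """Return (average rate in spikes/s, R) for one modulation frequency."""
    return len(spike_times) / duration, vector_strength(spike_times, fm)

# Hypothetical data: one spike per envelope cycle, always at the same phase.
fm = 100.0
locked = [k / fm + 0.002 for k in range(100)]
# Hypothetical data: same number of spikes, spread evenly over the cycle.
flat = [k / fm + (k % 10) / (10 * fm) for k in range(100)]

rate_locked, r_locked = mtf_point(locked, fm, duration=1.0)
rate_flat, r_flat = mtf_point(flat, fm, duration=1.0)
```

The two hypothetical trains have the same average rate but opposite extremes of synchronization, which is the reason the temporal and rate MTFs have to be reported separately.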
IV. AUDITORY NERVE: BOTTLENECK TO THE
CENTRAL NERVOUS SYSTEM
A. Basic Auditory Nerve Properties
Activity in the auditory nerve represents both the
output of the cochlea and the input to the central nervous
system, and studies of envelope phase-locking have been
conducted both to gain more insight into cochlear processing and to define the limits within which the central
processor has to operate. Compared with optic and peripheral somatic nerves, the auditory nerve is highly uniform both morphologically (in caliber and branching pattern) and physiologically. We only discuss type I auditory
nerve fibers, which form the bulk of the nerve, since next
to nothing is known about the physiology of the unmyelinated type II fibers. Because each type I nerve fiber contacts only a single inner hair cell, its activity can, to a first
approximation, be understood from basilar membrane
motion at a single point in the cochlea followed by further
signal modifications by the inner hair cell and hair cell/
nerve synapse (76, 136, 137, 243). The most salient properties are 1) sharp V-shaped tuning to a narrow range of
frequencies; 2) a limited dynamic range of ~20–30 dB,
reflected in a sigmoidal rate-level function; 3) adaptation
of firing rate to sustained stimuli, rather modest compared with adaptation of peripheral nerve fibers in other
systems; and 4) phase-locking to low-frequency pure
tones (<4–5 kHz in the cat).
Auditory nerve fibers show a bimodal distribution of
spontaneous rate (SR), on the basis of which several
classes of fibers are defined that differ in a number of
properties (158, 246, 305). Fibers with high SR (>18
spikes/s), which in cat form ~60% of the total population,
have low thresholds and limited dynamic range. Fibers
with medium and low SR have higher thresholds and tend
to have “sloping” saturation, i.e., their rate-level functions
show a decrease in slope at ~30 dB above threshold but
do not fully saturate. Also, low-SR fibers show less adaptation than high-SR fibers (230). Differences between the
SR classes have been documented mostly with pure tone
and spectrally complex stimuli, but AM stimuli have revealed response differences in the time domain as well.
We first discuss how the basic AM parameters m, sound
pressure level (SPL), fm, and fc (Fig. 3) influence synchronization and average rate, then describe the response
phase.
FIG. 3. Basic dimensions and manipulations in an
AM signal and their effect on auditory nerve activity.
The relationship of an auditory filter (curve) and AM
spectrum are shown schematically for variations in
modulation depth m (A), sound pressure level (SPL)
(B), modulation frequency (fm) (C), and carrier frequency (fc) (D). For each manipulation, three measures of the responses of an auditory nerve fiber are
shown: average rate (rate, dashed line), synchronization magnitude (R, solid line), and synchronization
phase (φ, thin line).
B. Average Response Rate and Magnitude
of Synchronization
When a tone is presented at a fiber’s CF at a fixed
suprathreshold level and is modulated with increasing
depth, the nerve fiber shows a monotonic, saturating increase in synchronization Rm (Fig. 3A). Although Rm increases with m in absolute terms, synchronization magnitude decreases in relative terms, i.e., the gain (response
modulation relative to stimulus modulation) decreases
(127). The gain can be as large as 10 dB for m of 10% and
decreases to values near 0 dB for m of 100%.
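The two gain values quoted above can be turned into synchronization values with a short worked calculation. This assumes the commonly used convention that modulation gain in dB is 20 log10(2Rm/m); that formula is an assumption here and is not restated in this passage.

```latex
G = 20\log_{10}\!\left(\frac{2R_m}{m}\right)
\quad\Longrightarrow\quad
R_m = \frac{m}{2}\,10^{G/20}

m = 0.1,\; G = 10\ \mathrm{dB}: \quad R_m = 0.05 \times 10^{0.5} \approx 0.16

m = 1.0,\; G = 0\ \mathrm{dB}: \quad R_m = 0.5 \times 10^{0} = 0.5
```

Under this convention, Rm roughly triples between m of 10% and 100% while the gain falls by 10 dB, which is the sense in which synchronization increases in absolute terms but decreases in relative terms.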
Responses to AM as a function of stimulus intensity
have been studied extensively in a variety of animals
(guinea pig, Ref. 33; chinchilla, Ref. 114; cat, Refs. 127,
135, 294; gerbil, Ref. 270). The rate-level function with AM
shows only small differences relative to the function obtained with an unmodulated carrier wave (127, 270). The
synchronization-level (Rm vs. SPL) function shows a stereotypic nonmonotonic shape; a maximum is reached at
low suprathreshold levels, with a decrease in Rm for
further increases in SPL (Fig. 3B). It is easy to see how
this relationship is expected from the compressive relationship between firing rate and SPL, especially when the
modulation depth m is small; maximal modulation of
firing rate should occur for amplitude changes centered
on the steepest part of the rate-level function, between
firing threshold and saturation. At high SPLs, amplitude
fluctuations should not translate into fluctuations in firing
rate because firing rate is saturated. Qualitatively the
synchronization-level function does indeed show the expected nonmonotonic shape. However, compared with
quantitative predictions based on the rate-level function,
the observed synchronization shows 1) larger maximal R
values, 2) a maximum that is displaced towards a higher
SPL, and 3) higher synchronization values at high SPLs
and a shallow downward slope. These deviations are
predicted when adaptation over a short time scale is
taken into account (33, 270, 311). Basically, adaptation
boosts the coding of stimulus changes so that the operating range over which changes in SPL result in changes
in firing rate is larger for responses to AM than for steady-state responses to pure tones.
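The static part of this argument can be sketched numerically. The example below assumes a hypothetical sigmoidal rate-level function (all parameter values are invented for illustration) and passes the instantaneous level of a sinusoidally modulated tone through it, without any adaptation. It therefore illustrates only the qualitative nonmonotonic shape, not the larger and level-shifted synchronization observed in real fibers.

```python
import cmath
import math

def rate_level(level_db, sr=10.0, rmax=200.0, l50=20.0, slope=4.0):
    """Hypothetical sigmoidal rate-level function (spikes/s)."""
    return sr + (rmax - sr) / (1.0 + math.exp(-(level_db - l50) / slope))

def predicted_sync(mean_level_db, m=0.5, n=360):
    """R of the firing-rate waveform over one envelope cycle (static model)."""
    num = 0j
    den = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        # Instantaneous level under a sinusoidal envelope of depth m.
        level = mean_level_db + 20.0 * math.log10(1.0 + m * math.cos(theta))
        r = rate_level(level)
        num += r * cmath.exp(1j * theta)
        den += r
    return abs(num) / den

levels = list(range(-20, 81, 5))          # mean SPL in dB (arbitrary reference)
sync = [predicted_sync(level) for level in levels]
peak_level = levels[sync.index(max(sync))]
```

In this sketch the predicted synchronization is close to zero well below threshold and in saturation, and peaks at a level within the steep part of the assumed rate-level function.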
There are systematic differences in AM responses of
the different SR classes of auditory nerve fibers. One
descriptor commonly used to compare envelope phase-locking across cell populations is the maximal R value of
the synchronization-level function (Rmax). Cells with low
and medium SR tend to have higher Rmax values than cells
with high SR, and this difference is particularly marked at
low CFs (<5 kHz) (127, 294). However, the difference in
synchronization between these different auditory nerve
classes strongly depends on the synchronization metric
used (33, 127, 183, 295). In contrast to earlier reports,
Cooper et al. (33) concluded that fibers with high SR
showed larger envelope synchronization values than low
SR fibers. Their result is less of a conflict than it appears
if it is taken into account that the metric used by these
authors was (unnormalized) modulated rate rather than
Rm, that the average discharge rate of fibers with low SR
is generally lower than that of fibers with high SR (158),
and that the sample of Cooper et al. is biased to high CFs
(>8 kHz).
Synchronization is robust in high SR cells at low SPLs
and in low and medium SR cells at mid and high SPLs
(294). However, the different fiber populations reach
maximal synchronization at the same level relative to rate
threshold (33, 294). Low SR fibers have a larger dynamic
range over which significant modulation is present (33),
lending further support to the general hypothesis that
these fibers are particularly important for hearing at high
SPLs.
The narrow bandpass filtering by the cochlea limits
the range of modulation frequencies transmitted by nerve
fibers. As schematized in Figure 3C, increase of fm causes
the sidebands in the stimulus spectrum to move away
from fc. If fc is centered at the CF of the fiber studied, the
energy in the sidebands is increasingly attenuated, resulting in a loss of modulation at the output of the peripheral
filter. The response as a function of fm is usually referred
to as the modulation transfer function (MTF) and again
one should clearly distinguish effects on average rate
(rMTF) from effects on synchronization to fm (tMTF). The
rMTF is usually flat but may show some decrease in rate
with increasing fm, particularly in low-SR fibers (127). In
contrast, tMTFs all have a low-pass shape (guinea pig,
Ref. 203; cat, Ref. 127; rat, Ref. 186; Fig. 3C). These
functions are smooth and do not show any structure
related to harmonic ratios, i.e., whether or not the AM
components (fc and the two sidebands) are integer multiples of fm is inconsequential. The absolute bandwidth of
frequency tuning curves, e.g., at 10 dB above threshold,
increases with CF (59, 86, 230), and the cut-off frequency
of tMTFs shows a concomitant increase with CF (Fig. 2).
At very low CFs (a few hundred Hz), a tMTF cut-off
frequency can often not be determined because of the
broad frequency tuning. Interestingly, for CFs above ~10
kHz, the increase in cut-off frequency is not commensurate with the increase in bandwidth of frequency tuning at
these high CFs. This presumably reflects temporal filtering at the hair cell/synaptic level rather than spatial filtering at the mechanical level (86, 127). The highest modulation frequency at which significant envelope phase-locking is observed, in high-CF nerve fibers, is ~2 kHz (127,
229). A less marked feature of many tMTFs is a shallow
positive slope in the low-frequency skirt (94, 127). According to Cooper et al. (33), this slope tends to become
steeper at high SPLs, consistent with models that include
effects of response adaptation (311).
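The sideband argument for the low-pass shape of tMTFs can be illustrated with a small calculation. The sketch below assumes a hypothetical symmetric, zero-phase filter with a Gaussian magnitude response (not a measured tuning curve) and a carrier placed exactly at CF. Under those assumptions the filtered signal is again a sinusoidally modulated tone, with its depth scaled by the ratio of sideband gain to carrier gain.

```python
import math

def filter_gain(f, cf, bw):
    """Hypothetical symmetric Gaussian magnitude response (linear gain)."""
    return math.exp(-0.5 * ((f - cf) / bw) ** 2)

def output_modulation_depth(m, fm, cf, bw):
    """Modulation depth at the filter output for an AM tone with fc = CF.

    The AM tone has a carrier at CF and sidebands at CF +/- fm, each with
    relative amplitude m/2. Because the assumed filter is symmetric, both
    sidebands are attenuated equally, so the depth scales by the
    sideband-to-carrier gain ratio.
    """
    sideband = filter_gain(cf + fm, cf, bw)
    carrier = filter_gain(cf, cf, bw)
    return m * sideband / carrier

m = 1.0
cf = 10000.0   # hypothetical CF (Hz)
bw = 500.0     # hypothetical filter width parameter (Hz)
fms = [50.0, 100.0, 200.0, 400.0, 800.0, 1600.0]
depths = [output_modulation_depth(m, fm, cf, bw) for fm in fms]
```

A wider assumed filter (larger `bw`) moves the fall-off to higher fm, in line with the increase of tMTF cut-off frequency with CF. The sketch contains no hair cell or synaptic stage, so it cannot reproduce the limit observed at very high CFs.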
Clearly, the extent of envelope phase-locking in the
auditory nerve is sufficiently wide to encompass psychophysical existence regions (Fig. 2). Javel and Mott (115)
attributed the disappearance of residue pitch at fc >5 kHz
to increased sharpness of tuning of high-CF fibers (59,
230). However, while bandwidth limitations may contribute to the upper fm limit of ~800 Hz, they do not explain
the disappearance of residue pitch altogether.
The dependence of envelope phase-locking on carrier frequency, relative to CF, has not been explored in
great detail (114, 127, 295). It merits further study because
the available data suggest an important effect. If fc is
moved away from CF, the synchronization-level function
shifts to higher SPLs. Consequently, for moderate to loud
stimuli, strongest phase-locking is present in fibers with
CFs that differ from fc, provided that the stimulus is able
to excite these fibers (Fig. 3D). Thus, for all but the
weakest signals, the representation of stimulus envelope
may be carried mainly by fibers tuned to frequencies that
differ from fc.
C. Phase of Synchronization
Few studies reported phase or latency data for AM
stimuli. For a given fiber, the phase of response to the
envelope shows a slight lead with increasing SPL (127)
and, at fixed suprathreshold levels, varies little with
changes in carrier frequency (122). In contrast, response
envelope phase increases nearly linearly with fm. The
slope of this relationship has been used as an estimate of
the total delay accrued between the acoustic stimulus and
the site of recording, similar to earlier such measurements on responses to pure tones in low-CF fibers (4).
The linearity of the phase-fm relationship indicates that it
is mostly determined by fixed mechanical and neural
transmission delays. Consistent with other delay or onset
latency measures, the values obtained vary systematically
and inversely with CF (127, 294), as expected from the
travelling wave on the basilar membrane which starts at
the base of the cochlea and reaches its more apically
located maximum after some delay. However, many processes contribute to the total delay (242, 244). Gummer
and Johnstone (93) scanned envelope delay of nerve fibers near their tuning curve threshold, using AM complexes of fixed fm and low modulation depth over a large
range of carrier frequencies. They found a delay component that was large for carrier frequencies near CF and
smaller in the tuning curve tail, and the authors provide
several arguments to suggest that this component reflects
a delay associated with cochlear bandpass filtering.
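The delay estimate from the phase-fm relationship amounts to a straight-line fit. The following sketch assumes that response phase is expressed as a lag in cycles and sampled at modulation frequencies closely enough spaced that successive phase steps stay below half a cycle. The data are synthetic and the function names are hypothetical.

```python
def unwrap_cycles(phases):
    """Unwrap phases (in cycles) so that successive steps lie in [-0.5, 0.5)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = (p - out[-1]) % 1.0
        if step >= 0.5:
            step -= 1.0
        out.append(out[-1] + step)
    return out

def delay_from_phase(fms, phase_lags):
    """Least-squares slope of unwrapped phase lag (cycles) versus fm (Hz).

    A slope in cycles per Hz is a time in seconds, i.e., the delay estimate.
    """
    y = unwrap_cycles(phase_lags)
    n = len(fms)
    mean_x = sum(fms) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in fms)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(fms, y))
    return sxy / sxx

# Synthetic example: 3 ms total delay plus a constant phase offset.
true_delay = 0.003
fms = [50.0 * k for k in range(1, 21)]
lags = [(f * true_delay + 0.1) % 1.0 for f in fms]
estimated = delay_from_phase(fms, lags)
```

With real data the phase measured at each fm is noisy and only defined modulo one cycle, so the spacing of modulation frequencies limits the longest delay that can be recovered unambiguously.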
The preceding descriptions are based on synchronization of the response to the envelope frequency. Again, it
is important to bear in mind that such descriptions are
incomplete. The shape of cycle histograms can depart
severely from the shape (usually sinusoidal) of the stimulus envelope, particularly at high SPLs and at large modulation depths. Therefore, the spectrum of the cycle histogram typically consists of a number of spectral peaks,
of which the peak at fm is only one, and not necessarily
the largest, component (135, 294). Also, the most salient
temporal information present in the discharge patterns is
not necessarily revealed by calculation of synchronization
to stimulus components. For example, robust phase-locking to fm does not imply that the most common interspike
intervals are at the period of fm: for envelope periods of
several tens of milliseconds multiple spikes occur per
envelope cycle, while periods shorter than a few milliseconds succeed each other too fast to allow a spike in every
envelope cycle. An interesting discrepancy between envelope phase-locking and dominant interspike intervals is
in “pitch-shift” effects of changes in fc (27, 114): phase-locking to fm stays roughly constant, while the most dominant interspike interval shifts in a direction which parallels the subjective pitch of the AM stimulus.
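The distinction between synchronization to fm and the dominant interspike interval can be made concrete with a toy spike train. The train below is hypothetical: three spikes, 2 ms apart, at the start of every 50-ms envelope cycle.

```python
import cmath
import math
from collections import Counter

fm = 20.0                      # hypothetical envelope frequency (Hz)
period = 1.0 / fm              # 50 ms
# Hypothetical spike train: three spikes, 2 ms apart, in each envelope cycle.
spikes = [k * period + j * 0.002 for k in range(200) for j in range(3)]

# Synchronization to fm (vector strength).
r = abs(sum(cmath.exp(2j * math.pi * fm * t) for t in spikes)) / len(spikes)

# First-order interspike intervals, binned to the nearest millisecond.
isis_ms = [round((b - a) * 1000.0) for a, b in zip(spikes, spikes[1:])]
modal_isi_ms = Counter(isis_ms).most_common(1)[0][0]
```

Here synchronization to fm is nearly perfect, yet the most common interspike interval is the 2-ms within-burst interval rather than the 50-ms envelope period.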
In summary, envelope information is abundantly
available in auditory nerve discharges in temporal form.
Each nerve fiber transmits envelope information over a
stereotypical range of modulation frequencies, carrier frequencies, and intensities. These ranges are consistent, at
least at a qualitative level, with known auditory nerve
properties of frequency tuning, compression, adaptation,
and spontaneous activity, and computer models incorporating these properties reproduce the main features of AM
responses (105, 117, 271). The main way in which the
auditory nerve is a bottleneck to the central nervous
system for AM signals is in the extent of modulation
frequencies over which synchronization occurs. This
range cannot be enlarged centrally, except possibly for
frequencies at which fine-structure information is available (<4–5 kHz), because AM arises from a time-domain
interaction of stimulus components.
V. COCHLEAR NUCLEUS: PARALLEL CHANNELS
The key dynamic properties of cells in the cochlear
nucleus (CN) and the differences with the auditory nerve
were described in the pioneering studies of Møller (183,
184, 187): enhanced gain over a large dynamic range, low
levels of distortion to sinusoidal modulation, i.e., a rather
faithful tracking of the sinusoidal envelope, presence of
bandpass tMTFs particularly at high SPLs, and similar
tMTF shape for different forms of modulation (sinusoidal
AM of pure tone or noise carriers, noise-modulated tones,
noise-modulated noise). However, the marked diversity of
CN cells supports a variety of AM response patterns,
evident in the earliest CN studies (78), and necessitates a
discussion of AM responses per cell type rather than
global statements about the CN or its subdivisions. Limited attempts have been made (not reviewed here) to
uncover the mechanisms underlying the auditory nerve to
CN transformations, for gain enhancement in particular
(72, 228, 296, 323).
A. Basic Organization of the CN
An important insight that emerged from study of the
CN with simple stimuli was that a limited number of
response patterns or “classes” could be discerned and
that these patterns are related to morphological cell
classes (18, 202). Especially through the technique of
intracellular labeling, many of the structure-function relationships that were surmised earlier on the basis of
indirect evidence were solidified. The physiological diversity of these different cell types, combined with the diversity of their central projections (297), led to the concept
of functionally specialized, parallel pathways (for review,
see Refs. 26, 69, 112, 227, 319).
Briefly, three subnuclei are defined on the basis of
the bifurcation pattern of the auditory nerve. The anteroventral cochlear nucleus (AVCN) has three principal cell
types. Stellate cells project to the inferior colliculus (IC)
and respond to tones with a burst of regularly spaced
action potentials called a “chopper” pattern. Bushy cells,
which derive their name from their small and confined
dendritic tree and which are remarkable for their strong
inputs from the auditory nerve, occur in two types. Spherical bushy cells receive large calyceal auditory nerve terminals (end bulbs of Held) and show responses similar to
auditory nerve fibers and are therefore called “primary-like” (PL). Their main projection is to binaural nuclei in
the superior olivary complex. Globular bushy cells also
receive large nerve terminals in the form of modified end
bulbs of Held, and show a characteristic “primary-like-with-notch” (PLN) pattern in response to tones. Their
main projection is contralaterally in the superior olivary
complex where they give rise to giant calyceal endings on
cells in the medial nucleus of the trapezoid body, which
are inhibitory on binaural cells in the lateral superior olive
(LSO). The posteroventral cochlear nucleus (PVCN) contains octopus cells that project to the ventral nucleus of
the lateral lemniscus (VNLL) and show pure onset (Oi)
responses to tones. It also contains inhibitory multipolar
cells that project to the dorsal cochlear nucleus (DCN)
and the contralateral CN and which show onset-chopper
(Oc) responses. The principal neurons of the DCN are
the fusiform cells, which project to the IC and display
remarkably nonlinear spectral properties. These properties arise through local inhibitory interactions with interneurons in DCN (type II cells) and presumably with the Oc
cells (195).
The classification of CN cells is mostly based on
subjective criteria, which contributes to discrepancies in
conclusions of different studies. Although there is by no
means an agreed upon “task” for each of these circuits, it
is clear that each cell type performs a different analysis of
the auditory nerve input and conveys its output to a
different part of the auditory brain stem. The bushy cells
are clearly involved in binaural analysis important for
spatial localization of sounds. Stellate cells are able to
represent vowel spectrum over a wide range of intensities. Fusiform cells integrate somatosensory and spectral
information and may signal important auditory events.
Responses to AM offer another illustration of how CN cell
types differ in their processing of auditory nerve input.
B. AM Responses of Neuronal Types in the CN
The relationship between AM coding and physiological cell class, as defined by the response to pure tones,
was first examined by Frisina and co-workers in the gerbil
(70, 71). These authors found that envelope phase-locking
in ventral cochlear nucleus (VCN) was generally enhanced relative to the auditory nerve, and they described
a hierarchy of enhancement that correlated with the precision of timing of response onset to pure tones. Of the
four physiological VCN cell types studied, cells with well-timed onset responses showed the highest gains, followed
by choppers, PLN, and PL. The decrease in synchronization with increasing intensity is less than in the auditory
nerve and in some cell types depends on fm, resulting in a
peaked or tuned tMTF at high SPLs. Particularly these
latter two response features, extended dynamic range and
selectivity to fm, received much attention in later studies
(Fig. 4). The general behavior of synchronization as a
function of SPL and fm described by Frisina et al. (71) was
confirmed and extended to other cell types in many subsequent studies, even though not all studies agree on the
exact hierarchical ordering and the discreteness of the
ordering.
FIG. 4. Two important transformations between the auditory nerve
(dashed lines) and cochlear nucleus (solid lines). A: enhancement of
envelope synchronization and extended dynamic range is present in
many cell types. B: some cell types show bandpass tMTFs.
Some of the most interesting responses were observed in cells with chopper responses. Choppers are
temporally tuned for fm, as reflected in bandpass tMTFs
particularly at higher SPLs (gerbil, Ref. 71; cat, Ref. 229).
A small percentage of choppers also shows bandpass
tuning in their rMTFs (228). The fm causing the strongest
synchronization is called the temporal best modulation
frequency (tBMF). The occurrence of bandpass tuning is
of obvious importance to the concept of a “modulation
frequency filter bank” or “modulation channels” (131).
This concept has some popularity, particularly in the
psychophysical literature (see sect. II), and will be taken
up again in our discussion of IC and auditory cortex.
As mentioned, “chopping” reflects the intrinsic tendency to fire a regular burst of spikes at the beginning or
sometimes entire duration of the stimulus, and these cells
have therefore been viewed as resonators or intrinsic
oscillators (150). SPL-dependent bandpass tuning and oscillatory responses were also described earlier by Møller
(187) in the rat. In a subclass of cells in the guinea pig, the
intrinsic behavior is invariant with SPL and affects the
temporal characteristics of the response to nondeterministic stimuli (301). There is a possibility that the intrinsic
properties make these cells function as envelope filters
that decompose the envelope spectrum, much in the way
that inner hair cells in the turtle cochlea decompose
stimulus frequency by virtue of an intrinsic electrical
resonance mechanism (63). Several authors have therefore looked for correlations between AM and intrinsic
oscillation behavior. Frisina et al. (71) compared the frequency of chopping with the tBMF for a sample of sustained choppers in VCN. The tBMFs spanned a range
(170–700 Hz) roughly similar to the range of chopping
frequencies (80–520 Hz), but the correlation between the
two response properties was poor. There was a suggestion of interaction between chopping frequency and fm in
that the tBMF only rarely exceeded the chopping frequency, which therefore seemed to set an upper bound. In
a subpopulation of choppers (sustained choppers with a
well-defined tBMF between 150 and 450 Hz), Rhode and
Greenberg (229) noted a tendency for maximal envelope
synchronization when fm matched the discharge rate to a
tone at the same intensity.
A strong and more general relationship, not restricted to choppers, was found by Kim et al. (141) in
DCN/PVCN neurons of the unanesthetized decerebrate
cat. In this study, the “intrinsic oscillation” frequency of a
neuron was measured from the autocorrelation of its
responses to pure or AM tones. Frequency of intrinsic
oscillation and BMF were well correlated (r = 0.86) with
regression close to the diagonal of equality, and the frequency ranges were roughly similar (50–500 Hz) to those
reported for VCN choppers (71, 229). Importantly, the
remarkably good correlation arose from the pooling of
different cell groups, rather than from a within-population
trend, complicating any AM-coding scheme based on intrinsic oscillators. At least five cell types contributed to
the data, surprisingly also including auditory nerve fibers.
Besides choppers, the other main constituent cell
types of the AVCN are the two types of bushy cells with
PL and PLN responses. As expected from their powerful
auditory nerve inputs, PL and PLN cells resemble auditory
nerve fibers in many regards, and indeed, their Rmax and
tMTF cut-off frequency distributions at different CFs
largely overlap that of the auditory nerve (129, 229). For
PL cells this overlap is virtually complete, but for CFs
below ~7 kHz, PLN cells synchronize much better to
envelopes than auditory nerve fibers. At very low CFs
some bushy cells have enhanced synchronization to both
fine-structure and envelopes (124).
Comparisons of cell types across studies illustrate
that one has to be careful with simple characterizations to
multi-dimensional stimuli like AM. As remarked by Rhode
and Greenberg (229), a single response parameter is not
sufficient to characterize envelope synchronization. The
highest gains found in choppers exceed those of PL cells
but are mostly at fm values below 500 Hz (129, 229) so that
at higher modulation frequencies PL cells are superior to
choppers in transmitting envelope information. Consequently, the hierarchy of modulation enhancement
strongly depends on the range of modulation frequencies
of interest and also, as pointed out earlier (see sect. IVB),
on the chosen metric (266). Rather than providing an
exhaustive listing of response parameters for all cell
types, we emphasize here the properties by which different CN cells stand out most from the auditory nerve and
from each other. For chopper cells this is the bandpass
tuning of tMTFs; for bushy cells it is the extent of the
tMTF (high cut-off frequencies).
The two main response types found in PVCN are
onset (Oi and Oc), associated with the octopus and multipolar morphology, respectively. Both cell types show
remarkable envelope phase-locking, in line with the precision of their onset response to pure tones. Oc cells have
been particularly well-studied (cat, Refs. 125, 140, 228,
229). These cells show some of the highest gains, over the
widest fm and SPL range, which is why Kim et al. (140)
proposed that these cells have a special role in the extraction of the fundamental frequency of voiced speech
sounds. Moreover, large changes in fc and even use of a
wideband carrier have little effect on magnitude of synchronization (228). Oi cells have been studied very little,
but the few existing data reveal interesting properties, in
line with their biophysical specializations (199). These
cells show the highest gains of all CN cells, reaching Rm
values near 1 (228). Moreover, their tMTFs are high in
gain and invariant for SPL, but all-pass. The rMTFs of
these two classes of onset cells also appear unique among
CN cell classes because they can be sharply bandpass.
It is unclear whether these bandpass rMTFs can sustain
a rate code for modulation frequency: among the handful of Oi cells reported, the range of rBMFs was only
350–450 Hz.
Onset units have wider frequency tuning than auditory nerve fibers (80, 118, 231). They therefore provide a
test case of the suggestion that is sometimes made that
tMTF bandwidths may broaden centrally by virtue of
convergence of cells tuned to different CFs (180, 286).
However, this would require phase information on the
individual spectral components of the AM stimulus, and
for frequencies above the pure-tone phase-locking range
(>4–5 kHz in cat), such information is not available to the
central processor. Indeed, despite their wider frequency
tuning, tMTF cut-off frequencies of onset cells do not
exceed the limits imposed by the auditory nerve (125,
228, 229).
The DCN has traditionally been regarded as a part of
the CN which has poor timing properties (79, 82, 154), and
initial studies with AM seemed consistent with that view
(horseshoe bat, Ref. 282; kangaroo rat, Ref. 29). However,
more recent studies emphasized good AM coding in DCN
(cat, Refs. 125, 229, 254; guinea pig, Refs. 322, 323) and
specific roles for DCN in temporal processing have been
proposed [pitch (150); extraction of envelopes in background noise (73) or at high SPLs (229)]. The tMTFs are
typically low-pass or bandpass and differ from other CN
cell types in their upper fm limit of phase-locking which
never exceeds 800 Hz. To some extent, differences between studies reflect the complexity of this nucleus, both
in diversity of response types and in nonlinearity of behavior (319). First, Oc cells can be found in deep DCN and may
explain some of the high-gain responses to AM reported
for DCN. Second, simple measures like maximum synchronization or cut-off frequency do not reveal the full
complexity of DCN responses and give DCN a misleading
“AVCN-like” appearance. Even though DCN interneurons
and principal neurons can display high gain responses to
AM stimuli, their response often shows strong nonmonotonicities, not only in average rate but also in magnitude
and phase of envelope synchronization (125, 254, 322).
These nonmonotonicities are likely a manifestation in the
temporal domain of the intricate inhibitory and excitatory
interactions that have been invoked to explain similar
complexities in the frequency domain.
A preliminary study by Frisina et al. (73) in the chinchilla suggests that envelope synchronization of DCN
neurons can be enhanced by background noise, but more
systematic data and comparisons with auditory nerve and
VCN are needed to evaluate whether DCN neurons are
special in this regard. Rhode and Greenberg (229) studied
envelope synchronization in the presence of wide-band
noise in different CN cell types of the cat and found that
in general there is remarkable preservation of envelope
synchronization even at high noise levels.
As in the auditory nerve, few authors have systematically reported envelope phase data. Cells in the CN also
show a linear increase in envelope phase with increasing
fm, but the slopes are systematically steeper than in the
auditory nerve, consistent with additional time delays
required for conduction and synaptic transmission (125,
129). Delays calculated from response envelope phase are
more tightly distributed and shorter than traditional measures of latency based on response onset (94, 185), as is
the case for delay estimates based on fine-structure (65).
Most CN studies of AM coding considered only tMTF
magnitude and not phase when trying to infer functional
consequences of AM tuning for the perception of natural
stimuli. Delgutte et al. (40) used both tMTF magnitude
and phase of responses in auditory nerve, CN, and IC to
predict responses of the same neurons to speech utterances (see below) and stressed the importance of incorporating phase, particularly at very low modulation frequencies, to make successful predictions.
To summarize, the CN shows marked differences in
AM coding relative to its auditory nerve input: wider
dynamic ranges, higher gains, appearance of bandpass
tMTFs, and less sensitivity to the presence of background
noise. Furthermore, different cell types show marked diversity in their synchronization and average rate behavior
to AM signals. A simple hierarchical ranking does not do
justice to the differences among cell types and depends
on whether one emphasizes Rmax values (71, 295), breadth
of the tMTF (129), or statistical reliability of phase-locking
(266). As in the nerve, AM coding is almost entirely temporal: bandpass rMTFs occur rarely, in a few cell classes.
Our knowledge of CN responses to AM is still lacking
in many ways and basically does not go far beyond phenomenology. Perhaps the most pressing question is the
robustness and relevance of bandpass tMTFs, which
many investigators regard as genuine envelope filters.
More studies are needed to determine how invariant tMTF
tuning is with stimulus parameters, what range of tBMFs
is spanned at different CFs, and whether tMTF tuning
indeed supports filtering of envelope energy in natural
stimuli. Such information would be particularly valuable
for carrier frequencies in the range of phase-locking to
fine-structure (<4–5 kHz), which is poorly sampled in
most studies in small animal species with higher-frequency hearing than humans. There are other lacunae.
Data are sparse for certain cell types, most notably pure
onset units in PVCN. In most studies, the stimulus is
optimized for the cell under study; there is a need for
population studies in which the response to a limited set
of stimuli is examined for an entire population. Finally,
there is currently no evidence for any kind of within-class
topographic organization (e.g., within an isofrequency
strip) of AM response properties in the CN.
VI. SUPERIOR OLIVARY COMPLEX: AN
EXAMPLE OF TIME-TO-RATE CONVERSION
Part of the CN output is directed toward nuclei in the
superior olivary complex (SOC). This is an amalgam of
large and small nuclei, some of which take part in well-studied circuits whose function is in feedback to the
periphery (middle ear reflex and the olivocochlear efferent systems) or in the extraction of binaural differences
important in spatial hearing. The preceding and following
sections illustrate that, with some notable exceptions,
envelope coding in the CN is largely temporally based
while at the level of the IC partial conversion to a rate
code is apparent. In our discussion of SOC physiology we
highlight one aspect of these circuits: the conversion of an
envelope time code to an average rate code.
The duplex theory of sound localization holds that
the azimuthal spatial position of low-frequency signals is
determined primarily on the basis of the minute differences in time at which the acoustic waveform reaches the
two ears, interaural time differences (ITDs), while high-frequency signals are localized on the basis of interaural
SPL or level differences (ILDs). This classical psychophysical theory seems to be embodied anatomically and
physiologically in two binaural circuits in the SOC of most
mammals. The circuit centered on the medial superior
olive (MSO) detects ITDs and contains primarily low-frequency cells. Another circuit, centered on the lateral
superior olive (LSO), detects ILDs and has a bias towards
high CFs. The detailed physiology of these circuits and
their afferents is beyond the scope of this review (see
Refs. 279, 312, 316).
Starting in the mid-1970s, a number of investigators
reported that humans can reliably discriminate ITDs of
high-frequency signals at thresholds approaching those
for low-frequency signals, i.e., <20 μs, provided that the
signals are not pure tones but have a time-varying envelope, as in AM sounds with the parameters illustrated in
Figure 2. Clearly, subjects can detect with high precision the ongoing envelope differences that occur when complex stimuli are
delayed between the two ears. Physiological studies in the IC of cat (317) and rabbit (12)
provided evidence for ITD sensitivity to AM signals but
indicated that this sensitivity was probably generated at a
lower level. Subsequent recordings in the SOC indeed
revealed cells that were sensitive to interaural delays of
AM signals, and this ITD sensitivity could be understood
from the binaural interactions known to occur in these
nuclei and the AM coding properties of their afferents.
In the MSO, ITD sensitivity to AM signals is generated
by a multiplicative, cross-correlation type operation.
These cells behave as coincidence detectors, which has
been particularly well-documented for low-frequency signals (81, 126, 313) but holds for modulated signals as well.
The average firing rate of high-CF MSO cells to AM signals
varies with ITD (Fig. 5A). Moreover, the optimal ITD is
predicted from the phases measured from the monaural
response to an ipsi- or contralaterally presented AM signal:
the firing rate is high when the envelope signals from the two
ears arrive in-phase at the site of convergence (10, 122, 313).
In the LSO, ITD sensitivity to AM signals is generated
by a subtractive rather than a multiplicative process (Fig.
5B). These cells have ILD sensitivity by virtue of excitatory signals from the ipsilateral ear and inhibitory ones
from the contralateral ear. Again bushy cells constitute
both contra- and ipsilateral pathways. For ITDs at which
the inhibitory and excitatory phase-locked signals reach
the LSO cell coincidently, the signals cancel each other
and the cell remains silent. At other ITDs cancellation is
not perfect and the excitatory ear is now able to drive the
cell. Thus the ILD sensitivity of the LSO cell combined
with the envelope phase-locking in its afferents generates
overall changes in discharge rate with ITD (10, 11, 122,
128, 129). Interestingly, in anesthetized cats LSO neurons
show a “chopper” pattern to ipsilateral tone bursts, but
unlike choppers in the CN, they lack tuning in the tMTFs
(or rMTFs) to ipsilateral stimulation (129).
FIG. 5. Example of sensitivity to envelope interaural time differences (ITDs) in medial and lateral superior olive (MSO, row A; and LSO, row B). Sensitivity to ITDs of binaural AM stimuli (left column) shows a complementary pattern in MSO vs. LSO and is consistent with the response phase to monaural modulation (middle column): ITDs which bring the monaural responses in-phase cause a high firing rate. The complementarity arises from opposite signs of binaural interaction (right column): in MSO a process of coincidence detection operates on excitatory inputs from both sides, while in LSO a subtractive process operates on excitatory ipsi- and inhibitory contralateral inputs. The LSO response to contralateral modulation (middle column, bottom) would be as obtained by presenting an unmodulated stimulus to the ipsilateral, excitatory ear and a modulated response to the contralateral, inhibitory ear.
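The complementary envelope ITD tuning of MSO and LSO described above can be illustrated with a toy calculation. The sketch below is not a model from the cited studies. It assumes that each ear supplies an envelope-locked drive shaped as a raised cosine, uses the cycle-averaged product of the two drives as a stand-in for coincidence detection (MSO-like), and uses the cycle-averaged rectified difference as a stand-in for ipsilateral excitation minus contralateral inhibition (LSO-like). All function names and parameter values are hypothetical.

```python
import math

def envelope_drive(t, fm, delay=0.0):
    """Envelope-locked input drive (arbitrary units): raised cosine at fm (Hz)."""
    return 1.0 + math.cos(2.0 * math.pi * fm * (t - delay))

def mso_like(ipsi, contra):
    # Multiplicative (coincidence-type) interaction of two excitatory inputs.
    return ipsi * contra

def lso_like(ipsi, contra):
    # Subtractive interaction: ipsilateral excitation minus contralateral
    # inhibition, rectified because a firing rate cannot be negative.
    return max(0.0, ipsi - contra)

def mean_rate(itd, fm, interaction, n=1000):
    """Output averaged over one modulation cycle for a given ITD (s)."""
    period = 1.0 / fm
    total = 0.0
    for k in range(n):
        t = k * period / n
        ipsi = envelope_drive(t, fm)
        contra = envelope_drive(t, fm, delay=itd)
        total += interaction(ipsi, contra)
    return total / n

fm = 200.0  # modulation frequency in Hz (arbitrary choice)
for cycle_fraction in (0.0, 0.25, 0.5):
    itd = cycle_fraction / fm
    print(cycle_fraction,
          round(mean_rate(itd, fm, mso_like), 3),
          round(mean_rate(itd, fm, lso_like), 3))
```

In this sketch the MSO-like output is largest when the two envelopes arrive in phase, whereas the LSO-like output is zero at that delay and largest when the envelopes are half a cycle apart, which mirrors the complementary pattern described in the text.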
The simple time-to-rate conversion that occurs in
binaural SOC nuclei may have analogs in monaural processing, e.g., rMTFs in the SOC of the mustache bat
appear to be shaped by monaural excitatory and inhibitory interactions and delays similar to the binaural interactions described in cat and rabbit (91). The envelope ITD
sensitivity in MSO and LSO also illustrates the general
point that it is probably beneficial for a time-to-rate conversion (or more generally a recoding of a stimulus-locked temporal code into another form) to occur at a
peripheral neural level. Indeed, the upper frequency limit
(though not necessarily the gain) of phase-locking tends
to decrease with subsequent integrative stages so that a
de novo comparison of monaural phases by neurons at a
higher level in the neuraxis would yield a more restricted
ITD sensitivity. The frequency and modulation frequency
range over which ITD sensitivity occurs in the IC and
higher levels is comparable to that in the SOC but is
rate-based (66, 219). For example, the ITD sensitivity of
high-frequency cells in the IC extends to modulation frequencies to which the cells no longer phase-lock when
tested monaurally [on average 600 Hz binaurally vs. 250
Hz monaurally (12), see also sect. VIII, B and H]. Also,
envelope phase-locking in the monaural inputs to the LSO
extends to modulation frequencies more than an octave
higher than the highest fm at which LSO neurons show
ITD sensitivity (≤800 Hz) (129). The use of temporal
information may thus be one evolutionary reason for the
extensive subcortical processing in the auditory system
relative to the other sensory systems.
Little is known about envelope sensitivity in other
nuclei of the SOC. Olivocochlear efferent neurons in the
guinea pig are surprisingly well phase-locked to AM signals below ~400 Hz (94), with bandpass tMTFs peaking at
~100 Hz. Maximal gains were ~8 dB higher than for
auditory nerve fibers recorded in the same experiments. It
is not known whether modulation differentially affects
the targets of the medial olivocochlear neurons (the cochlear outer hair cells), although AM signals have been
reported to be effective signals to suppress evoked otoacoustic emissions in humans (162).
Remarkable AM responses were described in monaural cells in the SOC of awake rabbits (145). Cells with
sustained responses showed responses to AM similar in
several respects to CN choppers, but an unusual class of
“off” cells was inhibited during the presentation of pure
tones and responded vigorously after stimulus termination. These cells were strongly driven by AM stimuli and
showed high gains over a wide range of modulation frequencies, resulting in low-pass tMTFs and rMTFs. Several
properties suggested that the responses were in effect a
rebound from inhibition phase-locked to the stimulus envelope, a mechanism also observed in the SOC of the bat (91).
VII. THE NUCLEI OF THE LATERAL LEMNISCUS
The nuclei of the lateral lemniscus (NLL) are embedded in the lemniscal fibers that connect the lower brain
stem nuclei with the IC (166, 262). As described in these
reviews, ventral (VNLL), intermediate (INLL), and dorsal
nuclei (DNLL) have been identified, although in some
accounts the intermediate nucleus is treated as part of the
ventral nucleus with all NLL neurons located ventral to
the DNLL referred to as the ventral complex of the NLL
(165, 166). Despite considerable progress over the last
decade in understanding the physiology of these nuclei,
only two accounts (both in echolocating bats) have described responses to amplitude modulation (Ref. 109, big
brown bat, Eptesicus fuscus; Ref. 310, mustache bat,
Pteronotus parnellii parnellii). In the big brown bat
synchronization to the modulation envelope occurred in
nearly all unit types in VNLL, INLL, and DNLL (109). Both
low- and band-pass tMTFs were reported. Neurons in
VNLL and INLL responded to the highest frequencies of
modulation with BMFs between 100 and 1,000 Hz and a
preponderance of low-pass tMTFs. In contrast, a narrower range of tBMFs (100–500 Hz) was observed in
DNLL units with a high proportion of bandpass tMTFs in
sustained units. Responses to a similar range of modulation frequencies were recorded in the DNLL of mustache
bat but with differences in the responses of onset and
sustained units (310). Most onset units synchronized
equally well to modulation frequencies between 100 and
300 Hz but showed markedly bandpass rMTFs. Sustained
units responded up to 800 Hz with low-pass tMTFs and
flat rMTFs. Inhibition contributes very differently to these
responses. Blockade of GABAA receptors led to a reduction in synchronization at all modulation frequencies in
sustained neurons while onset units either increased their
modulation frequency cut-off to that of sustained neurons
or revealed synchronization where none existed before.
The shapes of the rMTFs were not changed by blocking
inhibition (310).
VIII. AMPLITUDE MODULATION ENCODING IN
THE INFERIOR COLLICULUS: A CENTER
FOR CONVERGENCE
A. Basic Organization of the IC
The several parallel pathways that diverge in the
cochlear nucleus from the common input of the cochlear
nerve converge again in the IC, the principal midbrain
nucleus in the auditory pathway. The IC is an obligatory
processing center for most information ascending via the
medial geniculate body to the auditory cortex. Anatomical
investigations of the IC in several species have identified
a broadly consistent arrangement of subdivisions: a central nucleus (CNIC) receiving most of the main ascending
afferent input from many brain stem nuclei is surrounded
dorsally, laterally, and rostrally by dorsal (DCIC) and
external cortices (ECIC) (166, 200, 201). The CNIC is
distinguished from the other subdivisions by its laminar
organization. It is composed of two main cell types
termed disc-shaped or flat cells interspersed with stellate
or less-flat cells (164, 182). This cytoarchitecture gives
rise in three dimensions to twisted laminae of cells and
fibers (167) that constitute the substrate for the highly
tonotopic frequency organization in the IC (173, 237, 252,
264). The frequency-band laminae are oriented so that
neuronal best frequency increases along the dorsolateral
to ventromedial axis of the nucleus. A defining feature of
CNIC is the convergence of temporal, spectral, and spatial
information extracted in parallel earlier in the pathway
onto this laminar structure. However, the full details of
how these converging inputs map onto individual neurons
have yet to be elucidated, and it is not known to what
extent the different strands of information are processed
independently in the IC.
The DCIC and ECIC differ from the CNIC not only in their
cytoarchitecture but also in their inputs and
outputs. Descending projections from the cortex terminate, predominantly (304), although not exclusively, in
the cortical divisions (248). The IC is an important source
of both ascending fibers to the thalamus and descending
connections to lower brain stem structures (110).
The monaural and binaural response properties of
single neurons in the IC have been extensively documented (see Refs. 24, 112, 113). Despite the limitations in
our knowledge about its cellular organization, it is clear
that the output of the IC is considerably modified relative
to its input. This is exemplified by the response patterns
of IC neurons to complex sounds including AM. For the
most part, such knowledge is derived from studies in
anesthetized animals that have focused on neurons recorded in response to monaural stimulation of the ear
contralateral to the side of recording, and in what follows
monaural stimulation should be assumed unless specified.
Most of the studies discussed here describe recordings
attributed to the central nucleus, but depending on the
age of the study and the parcellation adopted, in many
cases this will have included at least part of the DCIC and
ECIC as well as the CNIC. Therefore, in this review the
term IC is used to indicate all subdivisions.
B. Modulation Transfer Functions for IC
Units: Synchronization
IC neurons show strongly modulated responses that
for many modulation frequencies greatly exceed the modulation in the stimulus (144, 222–224). Modulation gains
calculated from synchronized responses in the IC are
often 15–20 dB (144, 222, 224) and so are larger than
equivalent measurements obtained in the auditory nerve
and for most neuron types in the CN. The shape of the
tMTF depends on the parameters of the stimulus (see
below) but is invariably either bandpass or low pass (144,
152, 191, 222–224).
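Modulation gain compares the depth of modulation in the response with that in the stimulus. The following is a minimal sketch, assuming the definitions commonly used in this literature: the synchronization index (vector strength) R is computed from spike phases relative to the modulation period, and gain is taken as 20 log10(2R/m) for a stimulus modulation depth m. The spike times are synthetic and the function names are hypothetical.

```python
import math

def vector_strength(spike_times, fm):
    """Synchronization index R (0..1) of spike times (s) to modulation at fm (Hz)."""
    if not spike_times:
        return 0.0
    phases = [2.0 * math.pi * fm * t for t in spike_times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(spike_times)

def modulation_gain_db(r, m):
    """Gain in dB of response modulation (2R) relative to stimulus depth m (0..1]."""
    return 20.0 * math.log10(2.0 * r / m)

# Synthetic example: three spikes per cycle, confined to a narrow phase
# window of a 100-Hz envelope.
fm = 100.0
spikes = [k / fm + jitter for k in range(50) for jitter in (0.0, 0.0005, 0.001)]
r = vector_strength(spikes, fm)
print(round(r, 3), round(modulation_gain_db(r, m=0.3), 1))
```

Under this assumed definition the gain cannot exceed about 6 dB when m = 1, because R is at most 1. Gains of 15–20 dB therefore imply strong synchronization at small stimulus modulation depths.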
Modulation gain may be enhanced in the IC, but
modulation frequencies that elicit a synchronized response are restricted to a lower range than in the periphery. This is manifest in both the tBMFs of neurons in the
IC and the range of frequencies over which there is significant modulation of the response (Fig. 9). In the rat,
Rees and Møller (223) obtained a modal tBMF in the range
of 100–120 Hz. The tBMF never exceeded 200 Hz, and the
high-frequency cut-off of the tMTF (measured 10 dB down
from the BMF) did not exceed 320 Hz. In guinea pig,
tBMFs fall below 150 Hz with most peaking between 50
and 100 Hz (224). Broadly similar values have been obtained in gerbil (144) and squirrel monkey (191). In the
latter, 73% of neurons showed a bandpass tMTF for AM
with tBMFs between 32 and 64 Hz. In rabbit, single units
and multiunit clusters had a mean tBMF of 87 Hz (12).
However, it is worth noting that one unit synchronized to
a modulation frequency of 925 Hz. For samples of phasic
neurons in both young and old mice, tBMFs were all
below 200 Hz (291). Similarly in mustache bat, the majority of units (~70%) only synchronized their firing to modulation frequencies below 300 Hz, but a small proportion
(4.5%) synchronized up to 500 Hz (20). While these values
are broadly similar, the differences that exist more likely
reflect species differences rather than the presence or
absence of anesthetic, since there is no segregation of the
values consistent with anesthetic status.
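The tBMF and the high-frequency cut-off quoted above can both be read from a sampled tMTF. The sketch below assumes a tMTF expressed as modulation gain in dB at discrete modulation frequencies, takes the tBMF as the frequency of maximal gain, and finds the cut-off by interpolating on a log-frequency axis to the point 10 dB below the peak. The data values are invented for illustration, and the interpolation scheme is an assumption rather than the procedure of the cited studies.

```python
import math

def tbmf_and_cutoff(freqs, gains_db, drop_db=10.0):
    """Return (tBMF, high-frequency cut-off) from a sampled tMTF.

    freqs: ascending modulation frequencies (Hz); gains_db: gain at each.
    The cut-off is None if the gain never falls drop_db below the peak.
    """
    i_peak = max(range(len(gains_db)), key=lambda i: gains_db[i])
    target = gains_db[i_peak] - drop_db
    for i in range(i_peak, len(freqs) - 1):
        g0, g1 = gains_db[i], gains_db[i + 1]
        if g0 >= target > g1:
            # Interpolate linearly in gain against log2(frequency).
            frac = (g0 - target) / (g0 - g1)
            octaves = math.log2(freqs[i + 1]) - math.log2(freqs[i])
            return freqs[i_peak], 2.0 ** (math.log2(freqs[i]) + frac * octaves)
    return freqs[i_peak], None

# Invented bandpass tMTF, for illustration only.
freqs = [10, 20, 50, 100, 200, 400]
gains = [2.0, 6.0, 12.0, 15.0, 9.0, -3.0]
print(tbmf_and_cutoff(freqs, gains))
```

Interpolating in log frequency is one reasonable choice because modulation frequencies are usually sampled on a logarithmic grid; a linear-frequency interpolation would give a slightly different cut-off.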
Rees and Møller (223) demonstrated that the shape of
the tMTF is highly dependent on stimulus level as in some
cochlear nucleus neurons. When stimulus intensity is
close to threshold, tMTFs are usually low-pass functions
but become more bandpass as the mean intensity of the
stimulus is increased. This change may be accompanied
by an upward shift in the tBMF. For neurons with nonmonotonic rate-level functions, however, the tMTF becomes low pass at sound levels falling on the negatively
sloping limb of the rate-level function (224). So the relationship between tMTF shape and sound level is indirect,
with firing rate, perhaps reflecting the net excitatory drive
to the neuron, being the better predictor of the low-frequency slope of the tMTF. Why the effect of stimulus
level is only apparent at low modulation frequencies is not
clear and may depend on a number of factors including
adaptation. Another possibility is that the neuron’s probability of firing at low stimulus intensities is only high near
the peak of the modulation cycle resulting in highly synchronized firing. As intensity is increased, threshold is
exceeded for a larger fraction of the modulation cycle
leading to a reduction in synchronization. This effect
might not be apparent at high modulation frequencies
because the frequency of modulation approaches the neuron’s maximum firing rate, so ultimately only a single
spike occurs in each cycle giving a high degree of synchronization whose upper limit is determined by temporal
resolution. Such effects become more apparent in the
cochlear nucleus and IC than the auditory nerve because
of the enhancing effects of time-dependent inhibition,
membrane properties, and other nonlinearities in more
central neurons, evidenced by their lower spike rates.
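The threshold argument in this paragraph can be made concrete with a toy calculation. It is an illustration only, not a model taken from the cited work. It assumes that the probability of firing at each phase of the modulation cycle is proportional to the amount by which a sinusoidally modulated envelope exceeds a fixed threshold, and it computes the synchronization index of the resulting period histogram. All names and values are hypothetical.

```python
import math

def sync_index(level, threshold=1.0, m=0.5, n=2000):
    """Synchronization of a threshold-rectified AM envelope.

    level: mean envelope amplitude (linear units); m: modulation depth.
    Firing probability in each phase bin is max(0, envelope - threshold).
    """
    c = s = total = 0.0
    for k in range(n):
        phase = 2.0 * math.pi * k / n
        p = max(0.0, level * (1.0 + m * math.cos(phase)) - threshold)
        c += p * math.cos(phase)
        s += p * math.sin(phase)
        total += p
    return math.hypot(c, s) / total if total > 0 else 0.0

# Levels from just above threshold at the envelope peak to well above
# threshold throughout the cycle.
for level in (0.7, 1.0, 2.0, 8.0):
    print(level, round(sync_index(level), 3))
```

In this toy model synchronization is close to 1 when only the envelope peak exceeds threshold and declines toward m/2 as the whole cycle rises above threshold, which is the direction of change the text describes.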
Further evidence for a relationship between tMTF
shape and firing rate is provided by the effect of background noise. Bandpass tMTFs become low pass with the
addition of progressively higher levels of background
noise (223). Rees and Palmer (224) showed this change
correlated with the noise-induced shift in the neuron’s
input/output function along the level axis and its consequent effect on the firing rate elicited by a stimulus.
C. Modulation Transfer Functions for IC Units:
Average Rate
The most striking change in AM responses between
the IC and its peripheral inputs is in the tuning of rMTFs;
the dependence of average firing rate in the IC on modulation frequency is stronger, more common, and has a
much wider diversity of patterns than is the case in the
CN or the SOC (Fig. 6). (But it is important to note that
we have only limited information about rate responses
to modulation in the nuclei of the SOC and lateral
lemniscus.)
rMTFs show a wider range of patterns than is usually
observed for tMTFs. In the cat, Langner and Schreiner
(152) identified specific patterns of rMTF in a population
of single- and multi-unit clusters. These included bandpass, low-pass, high-pass, band-reject, or complex types.
The majority were bandpass (70% of single units, 58% of
multiunits). Similar response patterns are also found in
bat (32) and mouse (291). In guinea pig, 45% of rMTFs
were bandpass; the remainder included a variety of different shapes, with some units showing little effect of
modulation frequency on firing rate (224). Units whose
average firing rate did not change with modulation frequency were the most common type encountered in squirrel monkey, making up almost half of the total (191). The
most detailed study of rMTFs in the IC is that of Krishna
and Semple (144) in gerbil. In addition to confirming the
rMTF shapes described previously, Krishna and Semple
(144) noted that many rMTFs were characterized by distinct ranges of modulation frequency over which firing
rate was enhanced or suppressed. In some, regions of
enhancement were separated by a marked region of suppression that defined a worst modulation frequency separating the two maxima.
Like synchronized responses, rate responses to modulation depend on the mean level of the stimulus (144,
224). Where units have bandpass rMTFs and monotonic
rate level functions, the heights of the peaks in the rMTFs
increase and then decrease with the average level. They
are highest when measured at sound levels on the sloping
portion of the rate level function and decline as the stimulus level rises into the saturating region of the rate level
function (224). Across a population of neurons with
peaked rMTFs, increases and decreases in BMF with level
were observed (144). In units with rMTFs containing regions of suppression, the suppression often becomes
more prominent as stimulus level or modulation depth is
increased. In some instances, regions of firing rate enhancement changed to suppression at high stimulus levels. Krishna and Semple (144) postulate that inhibition is
an important contributor to these effects.
FIG. 6. Transformations between the cochlear nucleus (CN) and inferior colliculus (IC). Both nuclei show a wide variety of AM responses; each column highlights only one of the types of responses observed and how these are affected by parametric stimulus variations (in SPL, m, fc) in single cells. The most striking difference between CN and IC is in the rate modulation transfer functions (rMTFs), which are only rarely sharply peaked in the CN (C) but frequently so in the IC (D), where they can also show a degree of invariance with AM parameters. This is also the case for temporal modulation transfer functions (tMTFs) in the IC (B), which reach higher maximal synchronization values than in the CN (A) and often show a degree of bandpass selectivity, but their maxima occur at lower fm than in the CN.
There is general agreement across species in the
modal value of the rBMF distribution in the IC. In the cat,
the modal value for rBMF lies between 30 and 100 Hz
(152). These values are in keeping with those reported in
rat (222, 223), guinea pig (224), gerbil (144), and bat (32).
In the primate, the peak of the distribution of rBMFs of
multi-units was 128 Hz (191).
There is less agreement over the upper frequency
limit for rBMFs. In the cat, almost 20% of multiunit clusters had rBMFs greater than 200 Hz as did ~5% of single
units (152). A few units had rBMFs as high as 1,000 Hz.
rBMFs of up to 800 Hz were also reported for some units
in bat (32) and mouse (291). In contrast, the maximum
rBMFs recorded for single units in gerbil did not exceed
300 Hz (103, 144), and in squirrel monkey, the maximum
rBMF value reported was 256 Hz (191). It is quite likely
that the differences between these studies reflect true
species differences, with there being no such creature as
the average mammal. However, other factors might be
contributory. The cat data show that rBMFs >300 Hz
were more prevalent in multi-unit recordings. As Langner
and Schreiner (152) comment, multi-unit recordings may
contain responses from the fiber inputs to the IC as well
as its neurons. Given that some of these inputs originate
from nuclei in which neurons synchronize to higher modulation frequencies than in the IC, their contribution
could be misleading. On the other hand, units with high
rBMFs may be more difficult to record as single units, and
a small number of single units with high BMFs were
reported. Krishna and Semple (144) suggest that misclassifying the secondary peak of enhancement as the BMF in
those units with more than one rMTF peak might explain
the high rBMFs reported in cat. Apart from species differences, the presence or absence of anesthesia is another
factor that could account for the observed differences in
the ranges of rBMFs. However, it seems unlikely that
anesthesia is the only factor, since some of the largest
differences are seen when comparing data from different
species where no anesthetic was used [compare values
above for squirrel monkey (191), bat (32), and mouse
(291)]. On the other hand, similar values were obtained in
some anesthetized and unanesthetized preparations, e.g.,
cat (152) and mouse (291). Unfortunately, definitive experiments comparing the presence and absence of anesthetic have yet to be performed.
D. What Determines the MTF Upper Limit
in the IC?
Lower cut-off frequencies for both tBMF and tMTF in
the IC than at more peripheral stages of the pathway are
generally observed across species. The reasons for this
are not clear. In the auditory nerve, filter bandwidth is one
limiting factor as evidenced by the correlation between
the upper limit of the response to modulation and a fiber’s
CF (see sect. IVB and Fig. 2). However, evidence for a
similar relationship between the response to AM and CF
in the IC is weak. In the cat, the upper boundary of the
rBMF distribution (and presumably the tMTF distribution
since rBMFs and tBMFs are reported to be similar) for
multiunits increases with CF (152). But evidence of such
a correlation was not apparent in single-unit data recorded in other species [rat tBMF (223), squirrel monkey
(rate or synchronization not specified) (191), bat rBMFs
and tBMFs (32), or gerbil (144)]. Krishna and Semple
(144) examined a large data set and failed to find any
correlation between CF and rBMF or between CF and the
cut-off frequency of either rMTFs or tMTFs. Furthermore,
the frequency bandwidths of most IC neurons are sufficiently wide to accommodate the stimulus spectrum.
Thus it seems something other than frequency bandwidth
is primarily responsible for setting the upper frequency
limit of the response to AM in the IC.
An alternative possibility is that the shift in the response to lower modulation frequencies in the IC reflects
a reduction in temporal resolution. Such a reduction is
suggested by an upper frequency limit of 600 Hz for
phase-locking to pure tones in the IC, a substantially
lower value than pertains in auditory nerve fibers (147).
The mechanisms responsible have not been identified, but
intrinsic membrane properties and synaptic mechanisms
are possible candidates, as is the accumulated loss of
temporal resolution en route from the periphery. The
contribution of synaptic processing is now being investigated, but thus far blockade of inhibitory or excitatory
mechanisms has failed to show any significant influence
on the upper limit of synchronization. Neurons in the IC
of the mustache bat seldom responded to a wider range of
modulation frequencies following the blockade of
GABAA, GABAB, or glycinergic inhibition (20). This finding is in contrast to the marked increase in the upper limit
of synchronization in DNLL neurons in the same species
with GABAergic blockade (310). Similarly, neither blockade of N-methyl-D-aspartate (NMDA) (20, 321) nor DL-α-amino-3-hydroxy-5-methylisoxazole-propionic acid (AMPA)
excitatory receptors (321) resulted in changes in the upper
limit of synchronization. Likewise, in chinchilla, Caspary
et al. (28) found no change in the temporal response to
AM with blockade of GABAA receptors, but they did
report changes selectively affecting the low-frequency
limb of rMTFs in some units.
E. Is AM Encoded in the IC by Rate
or Synchronization?
Whether AM is encoded in the IC by synchronization
or by average firing rate remains an open question. Of
course, both measures may be important either independently or combined as synchronized rate. tMTFs and
rMTFs and BMFs match in many units, but in a significant
percentage of neurons they are different, with, in some
cases, no obvious dependence of rate on modulation frequency despite a clearly tuned tMTF (144, 152, 191, 222,
224). Population data on the MTF types obtained using
synchronized or average rate measurements were reported in the cat (152). Seventy percent of units had
bandpass rMTFs, and only 7% were low pass. In contrast,
a much larger proportion of tMTFs showed low-pass functions (48%) compared with bandpass functions (33%),
such that 60% of units with low-pass tMTFs had bandpass
rMTFs.
Nevertheless, the relationship between firing rate and
modulation frequency that emerges in IC might signal a
transformation in the encoding of AM from a temporal to
a rate-based representation, and models have been proposed explaining how this might be achieved (105, 149,
160). A common approach invokes coincidence detection
in IC neurons operating on synchronized responses to
modulation from stellate cells in the cochlear nucleus.
Although elegantly simulating many modulation responses of neurons in the IC, current implementations
match the BMFs of the IC neuron and its inputs from the
cochlear nucleus despite experimental data (cf. sects. V
and VII and Fig. 9) which suggest that the BMF ranges are
not the same.
As this discussion has shown, synchronized responses to the modulation envelope are well maintained
in the colliculus, and rMTFs are not simple reflections of
tMTFs. It is premature, therefore, to conclude that temporally based encoding of the modulation envelope has no
significance in the IC. Both rate and synchronized coding
might be retained with different functional consequences.
A rate code could allow the encoding of modulation frequencies that exceed the synchronization limit in the IC,
and the data of Schreiner and Langner (251) support this
conjecture as does the finding in squirrel monkey that the
distribution of rMTFs peaks at a higher frequency than the
distribution of tMTFs (191). On the other hand, some
studies show that synchronization and rate measures extend over broadly similar ranges of modulation frequency
(see sect. VIIIC).
F. Relationship Between AM Responses and Other
Neuronal Properties
Possible functional relationships between response
to AM and other physiological properties have not been
well explored in the IC (at least partly because there is no
generally accepted physiological classification scheme, as
is the case for the CN). A variety of firing patterns to tones
are recorded in the IC, and most authors have distinguished onset and sustained responses (see Refs. 112, 113
for review), which can be further subdivided into distinct
classes (e.g., Refs. 221, 290). Such patterns depend on the
state of intrinsic membrane conductances that in turn are
modulated by inhibition (155, 209, 268). Both sustained
and onset units can respond to continuous AM stimuli
that last several seconds (144, 222, 224). Although some
onset units fail to respond to AM, those that do respond do so at
modulation depths well below 100%, negating the argument that the response is effectively to a series of tone
bursts. It does seem that onset units are the least likely to
respond to AM. In both bat and rat, most of the units
failing to respond to modulation were onset types (32,
204). Other differences in the response to AM between
different unit types are also beginning to emerge. In bat,
average rBMFs increased progressively when comparing
the responses of tonic, chopper, and onset neurons (32).
Sinex et al. (267) report differences between unit types
and their responses to sinusoidal and trapezoidal AM.
Krishna and Semple (144) describe rMTFs with two peaks
separated by a region of suppression. These were predominantly seen in units with sustained or pauser PST
histograms. Onset or onset-sustained neurons showed
only a single peak of enhancement.
Another property of IC neurons correlating with the
response to modulation is regularity of firing. Regular
firing, as measured by calculating the coefficient of variation (320), is apparent in a number of different neuronal
types (221). A preliminary report (225) shows that units
with highly regular intrinsic oscillations show a strong
correlation between tBMF and the oscillation frequency.
On the other hand, cells with peaked rMTFs are mainly
limited to neurons that fire irregularly to tones.
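Regularity of firing is usually quantified from interspike intervals. The following is a minimal sketch, assuming the coefficient of variation is taken as the standard deviation of the interspike intervals divided by their mean over the whole response. Published analyses often compute it in successive time windows after stimulus onset, which is not attempted here. The spike trains are synthetic and the function name is hypothetical.

```python
import statistics

def isi_cv(spike_times):
    """Coefficient of variation of interspike intervals (SD / mean)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes")
    return statistics.pstdev(isis) / statistics.mean(isis)

# Synthetic spike trains (times in seconds), for illustration only.
regular = [0.010 * k for k in range(20)]  # periodic firing, as in an oscillating unit
irregular = [0.0, 0.002, 0.015, 0.018, 0.040, 0.047, 0.080]
print(round(isi_cv(regular), 3), round(isi_cv(irregular), 3))
```

A value near 0 indicates highly regular firing, while values approaching 1 indicate irregular, Poisson-like firing.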
G. Is Modulation Frequency Represented
Topographically in the IC?
Some of the responses discussed so far, in CN and IC,
provide suggestive evidence for a physiological implementation of a modulation filter bank. This view would be
strengthened if neurons were found to be spatially organized according to their AM tuning properties, since the
creation of spatial maps is a common strategy in nervous
systems. Evidence for a topographic representation of
modulation frequency in the IC of cat was reported by
Schreiner and Langner (251). rBMFs and tBMFs were
determined for units encountered in multiple penetrations through the IC at recording sites reconstructed from
the coordinates of the electrode penetration and the recording depth. The measured values, together with interpolated points, were assembled to create a map of BMF.
Two patterns of rBMF organization emerged. First, a gradient of rBMF extended along the dorsoventral axis of the
colliculus with CF. Measurements of rBMF along such
electrode penetrations revealed a progressive increase in
rBMF with depth, although the overall trend was accompanied by discontinuities and reversals of rBMF. In addition, a map of BMF extended across the plane of the
frequency-band laminae. The highest BMFs were found
caudally in the lateral half of the lamina. Regions representing the highest BMFs were surrounded by “quasiconcentric” iso-BMF contours representing progressively
lower BMFs. The diameter of the contour representing
each BMF and the upper limit of BMF increased with CF.
Thus, considered in three dimensions, each modulation
frequency is represented on the surface of a cone having
its base located in the high-frequency region of the IC and
its long axis aligned with the dorsoventrally orientated
tonotopic axis of the IC (Fig. 7). Schreiner and Langner
(251) propose that this map demonstrates the importance
of the IC in the perception of periodicity pitch and that
such a representation could facilitate the integration of
periodicity information across carrier frequency. In support of the map, they cite the corroborative evidence that
response latency is spatially mapped across the frequency
band laminae in the IC (153) and that BMF is negatively
correlated with response latency. This implies that there
should be a mapping of BMF along the same axis as the
latency map. Evidence for a mapping of modulation frequency has also been reported in a developmental study
in the gerbil with responses to the highest modulation
frequencies found most laterally as in the cat (103).
The publication of such a mapping of BMF has been
influential in the development of theories and models of
temporal processing in the auditory pathway (35–37, 105,
149). However, a correlation of BMF with location or with
CF has not been confirmed in other studies; indeed, as
discussed above, there is still debate about the range of
modulation frequencies represented in the IC. Given the
concentric organization of the modulation map described
in the cat, it is unlikely that a pattern of such complexity
would be found unless it were the primary objective of the
study. But, as discussed in section VIIIC, the determination
of BMFs from multiunit data, on which most of the mapping is based, must proceed with caution. On the other
hand, it is difficult in single-unit studies to achieve the
necessary sampling density that such mapping ideally
requires. An additional complicating factor in this discussion is the lack of invariance of both tBMFs and rBMFs
with stimulus level (144, 223). Resolution of this issue
may depend on the development of techniques that enable the modulation response properties of large populations of neurons to be determined with high spatial and
temporal resolution. Finally, it should be emphasized that
the absence of a map would not invalidate the existence
of a modulation filter bank. As an analogy, there is some
evidence for a map of ITD tuning in the MSO (14, 269,
313), but a spatial organization in the IC has not been
convincingly demonstrated (315). Nevertheless, the relevance of ITD tuning for binaural hearing is not in question.
FIG. 7. Bandpass temporal (A) and rate modulation transfer functions (B) in the inferior colliculus (IC), with indication of best modulation frequencies (tBMF and rBMF) and cutoff frequencies. Various definitions have been used for cutoff frequency, usually based on a decrease in gain (e.g., the frequency at which the synchronization value is 3 dB down from the maximal gain at the BMF) or statistical significance (e.g., the highest frequency at which significant synchronization is observed). C: schematic illustration of the proposed map of BMFs in the IC. The concentric circles indicate iso-BMF contours within an iso-frequency plane. Dashed lines connect contours of the same BMF.
H. Responses to Interaural Time Disparities
in Modulation Envelopes
Human subjects can localize sounds using on-going
ITDs generated by the amplitude envelope, even when the
carrier frequency of the sound is above 1.5 kHz and
subjects can no longer localize using interaural time differences in the carrier (see sect. VI). Physiological responses to such binaurally disparate amplitude modulations were first investigated systematically by Yin et al.
(317). Firing varied cyclically as a function of ITD, at a
period equal to that of fm, indicating that the neurons
were responding to the interaural delay of the modulation
waveform, not of the carrier. In many respects, ITD sensitivity in the IC strongly resembles that in the SOC, e.g.,
it reflects the same two basic forms of interaction (see
sect. VI and Fig. 5). There are also differences, indicating
an elaboration of response properties between SOC and
IC, but these are outside the scope of this review (13, 66,
172).
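A short worked example may help to show why the ITD function repeats at the period of fm. It assumes, purely for illustration, identical sinusoidal envelopes at the two ears with modulation depth m and interaural delay τ, and an idealized binaural cell whose average rate is proportional to the time-averaged product of the two envelopes. This cross-correlation idealization is an assumption of the sketch, not a model taken from the cited studies.

```latex
\begin{align*}
e_L(t) &= 1 + m\cos(2\pi f_m t), \qquad
e_R(t) = 1 + m\cos\bigl(2\pi f_m (t-\tau)\bigr),\\
R(\tau) &\propto \bigl\langle e_L(t)\, e_R(t) \bigr\rangle_t
        = 1 + \frac{m^2}{2}\cos(2\pi f_m \tau).
\end{align*}
```

Under these assumptions the rate varies cyclically with τ at a period of 1/fm (10 ms for fm = 100 Hz) and does not depend on the carrier frequency.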
The width of ITD tuning to sinusoidal signals is basically determined by the period of the stimulus. Low
frequencies are weighted more heavily in responses based
on envelope than in those based on fine structure, because envelope MTFs of IC cells typically extend further
to low frequencies than their tuning to fine structure. ITD
tuning therefore is typically broader to AM signals than to
tones. However, even at high CFs, where phase-locking to
fine structure is completely lacking, the ITD tuning to
broadband noise can be surprisingly sharp (123). The
presence of such tuning, in the absence of any ITD sensitivity to pure tones, indicates that envelope fluctuations
generated by the interaction of the cochlear bandpass
filters with the broadband stimulus can effectively be
used in the computation of ITDs.
I. Contribution of Nonlinearities
For all but the lowest modulation depths, the response to a sinusoidal AM in the IC is not sinusoidal but
more peaked with firing restricted to only part of the
modulation cycle (144, 196, 222). As modulation depth is
increased, changes also occur in the phase of the response histograms relative to the stimulus (144, 196, 222).
Such changes are consistent with the response following
the amplitude envelope at low modulation depths but
changing to one which is sensitive to the rate of amplitude
change at high depths. Sometimes this is associated with
the appearance of a smaller second peak in the histogram
indicative of a response to the downward amplitude
change in the modulation cycle (222). Direct evidence for
such responses comes from experiments using modulations with exponential envelopes (215).
Similarly, asymmetries have been reported in both
the rate and temporal responses of IC neurons in guinea
pig to exponentially ramped and damped sinusoids (197).
When such ramped and damped stimuli have the same
half-life, their long-term spectra are identical, but their
different temporal structures generate quite distinct percepts (205). The percentage of units showing asymmetry
in the magnitude of their temporal or rate responses to
these stimuli is greater than obtained using similar analyses in the VCN (216), and the proportion of neurons
showing response asymmetry at each stimulus half-life
closely matched human psychophysical performance
(205).
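The statement that ramped and damped stimuli share the same long-term spectrum can be checked numerically. The sketch below assumes that the ramped stimulus is the time-reversed copy of a single damped sinusoid; the sampling rate, carrier frequency, and half-life are arbitrary illustrative values, not those of the cited experiments.

```python
import numpy as np

fs = 20000.0          # sampling rate in Hz (illustrative)
fc = 1000.0           # carrier frequency in Hz (illustrative)
half_life = 0.004     # envelope half-life in s (illustrative)
t = np.arange(int(0.05 * fs)) / fs

# Damped sinusoid: the envelope halves every `half_life` seconds.
damped = 0.5 ** (t / half_life) * np.sin(2 * np.pi * fc * t)
# Ramped sinusoid: the same waveform reversed in time (assumption of this sketch).
ramped = damped[::-1]

# Magnitude spectra are identical; only the phase spectra differ.
mag_damped = np.abs(np.fft.rfft(damped))
mag_ramped = np.abs(np.fft.rfft(ramped))
same_spectrum = np.allclose(mag_damped, mag_ramped)
same_waveform = np.allclose(damped, ramped)
```

Time reversal of a real signal leaves the magnitude spectrum unchanged, so any response asymmetry between the two stimuli must reflect sensitivity to temporal structure rather than to the long-term spectrum.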
A few studies have investigated nonlinearities in the
responses of IC neurons to AM using more complex modulation waveforms. Møller and Rees (189) recorded spike
histograms synchronized to the period of a pseudorandom noise used to modulate a tone carrier. Cross-correlation of the pseudorandom noise with the histogram to
obtain the impulse response, followed by Fourier transformation, generates the tMTF. This estimate of the linear
component of the response correlates well with responses obtained using sinusoidal modulation. An estimate of the nonlinear component can be obtained by
using the impulse response to model the neuron, with the
difference between the neuronal and model outputs providing a measure of the nonlinearities present in the neuronal response. The nonlinearities were predominantly
even order, perhaps representing asymmetry in the response to increasing and decreasing sound intensity. Application of this technique to the owl IC similarly demonstrated the presence of significant nonlinearity (133).
Such nonlinearities are more prominent in the response of
IC neurons than those in the cochlear nucleus (184, 188).
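The cross-correlation procedure described above can be sketched on simulated data. The example assumes a purely linear model neuron with a made-up impulse response (`h_true`), a Gaussian pseudorandom modulator, and a noise-free period histogram; none of these values come from the cited studies, and real recordings would add spiking noise and nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000.0                      # sampling rate of the modulator in Hz (assumed)
n = 2 ** 14                      # samples in one period of the pseudorandom noise

# Hypothetical "true" linear envelope filter: a damped oscillation near 50 Hz.
k = np.arange(64)
h_true = np.exp(-k / 10.0) * np.sin(2 * np.pi * 50.0 * k / fs)

# Zero-mean pseudorandom modulator and the period histogram it would evoke in
# a purely linear neuron (circular convolution, because the noise repeats).
x = rng.standard_normal(n)
x -= x.mean()
y = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h_true, n), n)

# Circular cross-correlation of response and stimulus, normalized by the
# stimulus power, estimates the impulse response.
xcorr = np.fft.irfft(np.fft.rfft(y) * np.conj(np.fft.rfft(x)), n)
h_est = xcorr[: k.size] / np.sum(x ** 2)

# Fourier transformation of the impulse response gives the linear tMTF.
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
tmtf_gain = np.abs(np.fft.rfft(h_est, n))
best_fm = freqs[np.argmax(tmtf_gain)]
```

Convolving `h_est` with a new modulator gives the linear prediction; the residual between that prediction and the measured response is then one possible measure of the nonlinear component.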
The AM stimulus that ultimately holds the greatest
interest for auditory neuroscience is human speech. Delgutte et al. (40) compared the encoding of modulated
noise and a speech utterance at the levels of the auditory
nerve, CN, and IC. Step responses derived from the responses to modulation indicate that responses to amplitude changes in the IC are more phasic than those in the
auditory nerve and, to a lesser extent, the CN. This was
borne out by the responses to speech sounds that were
characterized by bursts of activity at the onsets of syllables. When the responses to the speech waveform were
estimated with the linear component of the modulation,
the model accurately predicted the neural response for
neurons in the auditory nerve and cochlear nucleus, but
the match for the IC was poor.
Although much less abundant than reports using sinusoidal modulation, these studies indicate that the emergence of nonlinear responses to modulated stimuli is a
defining characteristic of processing in the IC, and the
greater application of such nonsinusoidal AM stimuli is
likely to add substantially to our knowledge of nonlinear
mechanisms in the IC.
IX. AMPLITUDE MODULATION ENCODING
IN AUDITORY THALAMUS AND
CEREBRAL CORTEX
A. Basic Layout of the Thalamocortical System
The medial geniculate body (MGB) of the thalamus is
an obligatory station for auditory information from the
midbrain to the cerebral cortex. Based on cytoarchitecture, connectivities, and physiological response properties, three main thalamic regions can be defined (304).
Similarly, auditory cortex consists of several distinct
fields that can be grouped into core, belt, and parabelt
regions according to connectivity and physiology (130,
218). We discuss the projection systems set up in the
thalamus and their relationship with the parcellation of
auditory cortex.
The ventral division of the MGB (MGBv) is considered the principal part and is functionally distinguished
by a clear tonotopy that is related to its laminar dendritic
organization. The MGBv is functionally homogeneous
with sharp frequency selectivity, short latencies, and low
response thresholds. Several properties, such as the density of inhibitory interneurons, sharpness of tuning, onset
latency, and strength of pure-tone phase-locking, vary
systematically along the anterior-posterior axis, i.e., orthogonal to the frequency gradient (236). The axons from
the ventral division terminate predominantly in tonotopically organized “core” areas of auditory cortex, specifically the primary auditory cortex (AI) as well as the
anterior and posterior auditory fields (AAF and PAF, respectively) in the cat and field R in the macaque monkey.
The projections from MGBv also reflect the anterior-posterior gradients so that, for example, AAF in the cat
receives stronger input from the anterior pole, whereas
PAF and the ventroposterior auditory field (VPAF) are
chiefly connected with the posterior pole. The same holds
for the numerous corticothalamic feedback projections
from the cortical core regions to the MGBv.
Two further projection systems parallel to the tonotopic system have been identified. One “diffuse” or nontonotopic system is routed through the dorsal division of
the MGB (MGBd). MGBd and its subdivisions are characterized by broad tuning, weak responses to tones, and
some preference for more complex sounds. The dominant
neurons are stellate cells, and the cortical projection is
predominantly to nontonotopical fields in the belt and
parabelt regions of auditory cortex such as the second
auditory field (AII) in cat and CM in the macaque monkey.
The third projection system is associated with the medial
division of the MGB (MGBm). This “magnocellular” area
is characterized by fairly large multipolar cells and receives polysensory inputs. No clear tonotopic organization is evident, and the neurons are usually broadly tuned
or have multiple response areas. MGBm projects to a
wide range of cortical fields including areas in the core,
belt, and parabelt regions, and it also receives widespread
corticothalamic feedback. In addition, the dorsal and medial projection systems are distinguished by their termination predominantly in layers I and VI, while inputs from
the main tonotopic system end in layers IV and III.
Functional differences between the three projection
systems and their associated regions have been mainly
explored using spectral properties, such as frequency and
intensity. Again, the importance of temporal dimensions
in the perception of complex sounds suggests that much
can be gained from the study of temporal response features in the different parts of auditory thalamus and cortex (31, 101, 210).
B. Temporal Responses in the MGB
Relatively few studies have addressed the capability
of thalamic neurons to encode temporal information. A
study of thalamic neurons in the awake guinea pig (34)
revealed that some neurons phase-lock to AM tones with
modulation frequencies up to 200 Hz. A more systematic
study in the awake squirrel monkey (217) showed that
most tMTFs were bandpass with tBMFs between 2 and
128 Hz. The most commonly encountered tBMF was at 32
Hz. MGBm had a higher median tBMF (16 Hz) than MGBv
(8 Hz). Over the range of modulation frequencies tested,
no significant difference was observed between rBMFs
and tBMFs. This suggests that AM coding in the thalamus,
at least below ~100 Hz, is mostly conveyed by a temporal
code accompanied by rate changes due to the phasic
nature of the responses. To date, there is little information
available that directly contributes to the question of the
increasing prominence of rate-coding in the more central
auditory stations.
Changes in modulation depth affect rate and synchronization differently; synchronization increased with
increase in m, while the firing rate showed a nonmonotonic dependence. Changes in overall intensity of the AM
signal resulted in either monotonic or nonmonotonic
changes in firing rate and synchronization, with a higher
percentage of nonmonotonic changes in synchronization.
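Synchronization to the modulation is commonly quantified by the vector strength of the spike phases relative to fm. The sketch below is one minimal way to compute it, together with the Rayleigh statistic 2nR², which is often compared with a criterion of 13.8 (p < 0.001). The function name and the toy spike trains are hypothetical.

```python
import numpy as np

def vector_strength(spike_times, fm):
    """Return (vector strength, mean phase in cycles, Rayleigh statistic)."""
    t = np.asarray(spike_times, dtype=float)
    if t.size == 0:
        raise ValueError("no spikes")
    phasors = np.exp(2j * np.pi * fm * t)       # one unit vector per spike
    resultant = phasors.mean()
    vs = np.abs(resultant)                       # 0 = no locking, 1 = perfect
    phase = (np.angle(resultant) / (2 * np.pi)) % 1.0
    rayleigh = 2.0 * t.size * vs ** 2            # often compared with 13.8
    return vs, phase, rayleigh

fm = 32.0                                        # modulation frequency in Hz
# One spike per cycle at a fixed phase (quarter cycle): perfect locking.
locked = (np.arange(100) + 0.25) / fm
# Spikes spread evenly over the modulation cycle: no locking.
uniform = np.arange(100) / (100 * fm)

vs_locked, phase_locked, rayleigh_locked = vector_strength(locked, fm)
vs_uniform, _, _ = vector_strength(uniform, fm)
```

Plotting vector strength (or vector strength multiplied by average rate) against fm yields a tMTF, whereas plotting the spike count against fm yields an rMTF, which is why the two measures can depend differently on modulation depth.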
Recently, a number of studies in a variety of structures have utilized complex auditory spectra to estimate
the spectrotemporal receptive field (STRF) of neurons
using reverse correlation methods (e.g., Refs. 2, 38, 56, 58,
142, 143, 148). The STRF can be interpreted as the average
signal preceding an action potential, corresponding to the
spectrotemporal impulse response of the neuron. STRF
estimates of temporal resolution can be directly related to
estimates using isolated AM sounds and would yield the
same result in a linear system. Additionally, the use of
complex spectra can reveal nonlinearities such as the
dependence of the estimated filter shape on spectral and
temporal depth of modulation and overall intensity. A
recent analysis of temporal filter properties derived from
STRFs in MGBv of ketamine-anesthetized cats (175) (Fig.
8) revealed a similar range of tBMFs (35 ± 30 Hz) to that
observed in the awake guinea pig and squirrel monkey
(34, 217). As seen with isolated AM signals, individual
neurons could follow modulation frequencies above 100
Hz. Compared with AM responses in the IC, it appears
that the overall range of temporal following capacity in
the auditory thalamus is considerably reduced (Fig. 9).
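The reverse-correlation estimate of the STRF described above amounts to a response-weighted average of the stimulus preceding each spike. The following sketch assumes a Gaussian white-noise "spectrogram" and a hypothetical linear-rectifying model neuron with a made-up kernel; it is meant only to show the averaging step, not the stimulus ensembles or corrections used in the cited studies.

```python
import numpy as np

def spike_triggered_average(stim, response, n_lags):
    """Response-weighted average of the stimulus preceding each time bin.

    stim     : array (n_freq, n_time), spectrogram-like stimulus
    response : array (n_time,), spike count or rate per time bin
    returns  : array (n_freq, n_lags); column j is the stimulus j bins earlier
    """
    n_freq, n_time = stim.shape
    w = response[n_lags - 1:]
    sta = np.empty((n_freq, n_lags))
    for lag in range(n_lags):
        segment = stim[:, n_lags - 1 - lag : n_time - lag]
        sta[:, lag] = segment @ w / w.sum()
    return sta

# Toy check with a hypothetical model neuron (all values illustrative).
rng = np.random.default_rng(1)
n_freq, n_time, n_lags = 8, 20000, 10
stim = rng.standard_normal((n_freq, n_time))
kernel = np.zeros((n_freq, n_lags))
kernel[3, 2] = 1.0      # excitation: channel 3, two bins before the spike
kernel[4, 4] = -0.5     # delayed inhibition in a neighboring channel

drive = np.zeros(n_time)
for lag in range(n_lags):
    drive[n_lags - 1:] += kernel[:, lag] @ stim[:, n_lags - 1 - lag : n_time - lag]
response = np.maximum(drive, 0.0)       # half-wave rectified "firing rate"

strf = spike_triggered_average(stim, response, n_lags)
similarity = np.corrcoef(strf.ravel(), kernel.ravel())[0, 1]
```

Taking the Fourier transform of the STRF along the lag axis gives a temporal modulation transfer function, which is how STRF-based and AM-based estimates of temporal resolution can be compared.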
A number of studies that have explored the coding of
click trains in the auditory thalamus contribute significantly to our knowledge of temporal coding in the MGB.
Changes in fm of an AM stimulus result in the systematic
change of two potentially confounding aspects of the
stimulus, namely, a change in the period between events
and a change in the rise time of each event. To avoid the
effects of rise-time changes with repetition rate, click
trains have been widely used to explore temporal coding
properties. While these two methods are not totally equivalent, they do capture closely related aspects of repetition
rate coding. One of the first studies of temporal coding in
the thalamus was carried out using click trains (284) in
the awake, paralyzed cat. As in AM studies, maximum
limiting rates (i.e., the highest click rate that showed any
evidence of phase-locking) varied widely between 6 and
200 Hz. These findings were confirmed and expanded in a
series of studies by Rouiller and colleagues (240, 241) in
nitrous oxide-anesthetized cats. These investigators distinguished neurons by differences in the temporal precision of the responses. The largest group of neurons
(“lockers,” 71%) showed tight temporal locking to the
clicks. “Groupers” (8%) responded with weak temporal
synchrony, and “special responders” (21%) showed no
clear phase-locked responses although changes in firing
rate did occur, occasionally resulting in strongest responses for click rates between 200 and 400 Hz. Overall,
limiting rates between 10 and 800 Hz were observed, and
~50% of lockers had a limiting rate greater than 100 Hz.
Keeping in mind that these limiting rates were not extracted at the 50% value of the transfer functions (the
traditional measure of limiting rate), and the inherent
differences between click-train analysis and AM analysis,
the actual range of temporal resolution estimated by this
method appears to be compatible with that observed in
AM studies.
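The confound between repetition period and rise time in sinusoidal AM, noted earlier in this paragraph, can be made explicit with a short calculation. The envelope form and the 10–90% definition of rise time are assumptions chosen for illustration.

```latex
\begin{align*}
E(t) &= A\,\bigl[1 + m\sin(2\pi f_m t)\bigr], \\
\max_t \left|\frac{dE}{dt}\right| &= 2\pi f_m\, m\, A, \\
t_{10\text{--}90} &= \frac{2\arcsin(0.8)}{2\pi f_m} \approx \frac{0.295}{f_m}.
\end{align*}
```

Under this definition the rise time shrinks from about 29.5 ms at fm = 10 Hz to about 2.95 ms at fm = 100 Hz, whereas the rise time of each click in a click train is the same at every repetition rate.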
Rouiller and De Ribaupierre (240) reported some
differences between thalamic subdivisions regarding the
percentage of lockers. More lockers were located in the
anterior region of MGBv than in the posterior portion, and
the highest limiting rates were also encountered in the
anterior part. They observed no clear CF dependency for
the distribution of lockers but noticed that the lockers
had shorter latencies than groupers and special responders. Furthermore, lockers with limiting rates above 100 Hz
had response latencies ~2–3 ms shorter than lockers with
limiting rates below 100 Hz, similar to the latency-BMF
correlation found in the IC (153). No obvious differences
in the distribution and range of limiting rates were found
between recordings made in the nitrous oxide-anesthetized and awake preparations.
In summary, AM phase-locking in thalamic neurons
varies over a wide range from a few Hertz to several
hundred Hertz. Some neurons can follow high rates, but
the majority of neurons appear to peak at rates below 100
Hz. A subgroup of neurons may respond to temporal
information with changes in firing rate rather than in
phase-locking; however, the proportion of such a group
and its properties are still unexplored. It appears that the
majority of neurons show limiting rates below that of the
IC, but a detailed comparative study of the transformation
of temporal coding from the IC to the MGB is still lacking.
FIG. 8. tMTFs in the medial geniculate body (MGB) and primary auditory cortex (AI). Typical example tMTFs (synchronized firing rate) from neurons in the ventral division of the MGB (A) and in AI (anesthetized cat) (B). C: composite tMTFs for thalamus (dashed line) and cortex. By averaging all tMTFs for thalamic and cortical units separately, the temporal modulation filters of these two stations are approximated. The dotted lines indicate the 6-dB upper cut-off frequency. [Adapted from Miller et al. (176).]
FIG. 9. An overview of rMTF (left panel) and tMTF (right panel) properties at different anatomical levels. Each entry shows means or medians (circles) ± SD (lines) and lowest and highest values (bar). Dark bars, thick lines, and solid circles are for rBMFs (left) and tBMFs (right); light bars, lines, and empty circles are for upper tMTF cutoff frequencies (right). For convenient comparison, the left panel is arranged mirror-symmetric with respect to the right. The population measures are taken from published data for one anatomical level, sublevel, or cell class; the numbered reference to the publication is shown next to the data, followed by a letter indicating the species (b, bat; c, cat; g, gerbil; gp, guinea pig; m, marmoset; r, rabbit; s, squirrel monkey), and the letter “U” if unanesthetized. Note that part of the differences between studies reflects differences in the metrics used (in particular upper cutoff, which is often defined as a corner frequency or alternatively as the upper limit of significant phase-locking). Approximate ranges of perceptual and sound classes are indicated below the abscissa.
C. Responses to AM in Primary Auditory
Cortex: Synchronization
A number of studies provided initial evidence that
temporal coding in auditory cortical neurons may be substantially reduced compared with subcortical levels (Fig.
9). Studies with FM and AM in the awake cat (300) and
guinea pig (34) showed neurons had maximum following
rates of <30 Hz. In later studies, the range of synchronization of AI neurons to AM was systematically explored in
a variety of species. A high percentage of neurons showed
band-pass tMTFs (53, 75, 157, 256). The tBMF values in AI
were found to be independent of the CF of the neurons
(53, 157, 256). Accordingly, temporal information in different frequency channels can be processed independently from each other; within each spectral band, AM
information can be decomposed by different neurons into
different AM ranges. Much attention has therefore been
given to the distribution of optimal modulation frequencies. Preferred modulation frequencies commonly vary
between 1 and 40 Hz with the vast majority of tBMFs
below 20 Hz. Across all studies, tBMFs above 50 Hz were
encountered in only a very small percentage of neurons
but could occasionally be as high as 100 Hz (17, 157, 255,
256). The composite tMTF in cat AI (ketamine anesthesia), constructed as the weighted sum of all tMTFs measured, shows a tBMF of 12.8 Hz and a 50% cut-off frequency of 37.4 Hz (Fig. 8) (176).
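To make the notions of BMF and upper cut-off frequency concrete, the sketch below extracts both from a sampled MTF. The cut-off is taken as the frequency above the BMF at which the gain first falls to a criterion fraction of its maximum, interpolated on a logarithmic frequency axis. The function name, the interpolation scheme, and the sample values are illustrative assumptions, not the procedure of any cited study.

```python
import numpy as np

def bmf_and_upper_cutoff(freqs, gain, criterion=0.5):
    """Best modulation frequency and upper cutoff of an MTF.

    The cutoff is the frequency above the BMF at which the gain first falls
    to `criterion` times its maximum, interpolated on a log-frequency axis.
    Returns (bmf, cutoff); cutoff is None if the gain never falls that far.
    """
    freqs = np.asarray(freqs, dtype=float)
    gain = np.asarray(gain, dtype=float)
    i_max = int(np.argmax(gain))
    target = criterion * gain[i_max]
    for j in range(i_max + 1, gain.size):
        if gain[j] < target:
            frac = (gain[j - 1] - target) / (gain[j - 1] - gain[j])
            lo, hi = np.log2(freqs[j - 1]), np.log2(freqs[j])
            return float(freqs[i_max]), float(2.0 ** (lo + frac * (hi - lo)))
    return float(freqs[i_max]), None

# Made-up bandpass tMTF sampled at octave steps (values are illustrative only).
fm = [2, 4, 8, 16, 32, 64]
sync_rate = [0.2, 0.6, 1.0, 0.8, 0.4, 0.1]
bmf, cutoff = bmf_and_upper_cutoff(fm, sync_rate)
```

A criterion of 0.5 corresponds to 20 log10(0.5) ≈ −6 dB; a 3-dB-down definition would use a criterion of about 0.708, which is one reason why cut-off values from different studies are not directly comparable.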
It is tempting to regard the presence of modulation
tuning and the range of BMFs as a physiological implementation of a modulation filterbank (e.g., Ref. 35), but
the functional consequences of these cortical (and subcortical) observations are at present unclear and should
not be overstated. When “spatial-frequency channels”
were first described in visual psychophysics and spatial-frequency tuning was later found physiologically, it was
suggested that these channels formed the basis for a
visual Fourier analysis of the retinal image, but this notion
has been discredited (303). There is currently no unequivocal evidence that modulation tuning underlies an analysis of the modulation spectrum in the sense that the
cochlea performs an analysis of stimulus spectrum. For
example, will an envelope with a low fundamental (e.g., to
speech syllables) but fast components (i.e., broad envelope spectrum) recruit neurons tuned to high modulation
frequencies? Is the relative phase of different envelope
components somehow reflected in neural synchronization
or average rate? Even if modulation-tuned neurons do not
perform a full envelope decomposition in the Fourier
sense, it is easy to see that such envelope tuning could be
useful in other ways. For example, modulation tuned
channels could parse spectral stimulus components according to their dominant modulation frequency so that
the spectral components with a common modulation frequency can be grouped in a further step.
Differences in temporal processing between cortical
neurons and their thalamic inputs are not only evident
from population comparisons but were directly observed
in functionally connected thalamocortical neuron pairs
(34, 175) and were also evident in current source density
analysis of the thalamic input and cortical output layers of
AI (274). While these correlation studies reveal a reduction of temporal following capacities from MGBv to AI,
the temporal modulation preferences in thalamus and
cortex are not correlated by rank (175), i.e., thalamic cells
with high (low) BMFs do not preferentially project to
cortical cells with high (low) BMFs. These findings
strongly suggest that a transformation of temporal response properties takes place at the thalamocortical interface.
The width of the transfer function provides a measure of response selectivity. For individual neurons, the
bandwidth of tMTFs, estimated at 50% of the maximum, is
in the range of the BMF values but can vary by a factor of
>5 (53, 176, 256) in the anesthetized cat. Bandwidth
variations of tMTFs in the awake marmoset monkey (157)
are of similar magnitude. This means that AM selectivity
varies considerably among cortical neurons but that overall the selectivity is relatively poor.
Variations in species, anesthetic state, and estimation
method between the different studies do not permit an
easy comparison to sort out these different influences on
envelope processing. However, it appears that neither
anesthesia nor species-specific effects provide strong influences on the tBMF distribution of cortical neurons.
This is not to say that there are no anesthetic effects;
however, given the fairly large range of variability in the
conditions of these studies, a simple group evaluation is
unlikely to provide such evidence.
The range for time-locked AM coding appears to be
limited to the envelope frequencies underlying the perception of rhythm, roughness, and the following rate of
syllables in communication sounds. The cortical coding of
higher modulation frequencies, important for voicing or
periodicity pitch information, does not seem to fully utilize the same temporal code.
D. Responses to AM in Primary Auditory Cortex:
Average Rate
In view of the successive reduction in envelope synchronization already discussed for the different synaptic
stages leading up to cortex, it is not too surprising to find
the reduction in tBMF. Adverse effects on synchronization should however not necessarily affect rate tuning.
For example, exquisite frequency and ITD selectivity in
average rate is found at the cortical level and can be
sharper than in the brain stem. Therefore, we expect to
find envelope tuning in rMTFs, as it is already prominently
present in the IC.
Bandpass rMTFs are indeed found but appear less
common than bandpass tMTFs. In the rat, >90% of the
tMTFs showed bandpass characteristics while only 30% of
the rMTFs were bandpass (75). In AI of the awake squirrel
monkey (17), this difference was less pronounced, with
bandpass behavior for 49% of the tMTFs compared with
39% of rMTFs. The remaining neurons were either low
pass, high pass, all pass, or had complex filter shapes.
Similar results were reported for the cat (48). In awake
marmosets, 73% of AI units had bandpass rMTFs, and
many neurons were only driven when temporal modulations were present (157).
An important difference with tMTF tuning is the consistent observation that the tuning for rMTFs extends to
higher modulation frequencies, although it is still quite
limited compared with the brain stem. There is also a
fairly large variance, possibly related to the use of anesthesia, in the reported range of rBMFs and upper cut-off
frequencies (e.g., as defined by a 50% reduction in rate)
obtained across the various studies in AI (Fig. 9). The
majority of rBMFs in anesthetized studies are below 50 Hz
(46, 49, 53, 75, 256). Studies in awake animals (17, 34, 157,
190, 247, 260) yielded rBMFs that were either not substantially different from those in anesthetized animals or differed by less than a factor of two. Anesthesia
seems to affect the strength of the response (sustained in
unanesthetized animals, onset under anesthesia) more
than the range of BMFs. The reduction of the upper
cut-off frequencies by anesthesia may be more
substantial in tMTFs than in rMTFs (52, 83, 163) and may affect the
temporal coding capacity for the highest temporally
coded AM frequencies including the range of AM frequencies associated with the perceptual attributes of roughness and periodicity pitch (64).
The general finding that BMFs and upper cut-off frequencies are higher in the rMTF than in the tMTF led
Bieser and Müller-Preuss (17) to suggest that “low modulation rates were mostly encoded by phase-locked neural responses and the higher AM sounds by non-phase-locked spike rate variations.” While the experimental evidence for this claim was suggestive but not conclusive,
Lu et al. (161) demonstrated more forcefully that this
notion might indeed be true and proposed a two-stage
model in which temporal modulations are combined over
an integration window of ~30 ms; temporal patterns separated by intervals longer than 30 ms are coded explicitly
in temporal form, while more rapid patterns are coded
implicitly by average rate.
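The repetition rate implied by a ~30-ms integration window follows from simple arithmetic. Treating the window as a sharp boundary is a simplification made here for illustration only.

```latex
\[
f_{\text{boundary}} \approx \frac{1}{T_{\text{window}}} = \frac{1}{0.030\ \text{s}} \approx 33\ \text{Hz}
\]
```

On this reading, repetition rates below roughly 33 Hz would be carried by spike timing and higher rates by average rate.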
It is not entirely clear whether this scheme can fully
account for the coding of modulations since, even in
awake animals and for only a small fraction of the cells,
rBMFs reach maximal values of only a few hundred Hertz.
This is only an octave above the highest tBMFs (even
when measured on the same cells, e.g., Ref. 157) and
lower than the upper limit for periodicity pitch (~800 Hz)
and modulation detection (~2.2 kHz). The markedly reduced cortical upper limit, particularly compared with the
brain stem, is in stark contrast to the upper limit for ITD
sensitivity to AM signals, which appears not to differ
between cortex and brain stem and extends to modulation frequencies up to 1,000 Hz (awake rabbit, Ref. 67).
Thus envelope-based ITD tuning created in the brain stem
is relayed without degradation or recoding to AI, whereas
this does not appear to be the case for AM bandpass
tuning.
Schulze and Langner (259, 261) suggested an alternative coding strategy; in AI of the awake as well as the
anesthetized gerbil, these investigators showed rate tuning of cortical neurons to AM between 50 and 3,000 Hz,
clearly outside the range of cortical phase-locking, but
only when the carrier frequency was placed far above the
cell’s CF. A preliminary study (171) reported similar sensitivity in the IC but attributed the mechanism to difference tones generated in the cochlea, i.e., interpreted it as
a spectral rather than a temporal effect. Since psychophysical studies indicate that the perception of periodicity pitch does not depend on difference tones, it is unclear
whether the mechanism proposed by Schulze and Langner provides its neural basis, although the authors raise
several indirect counterarguments against the role of difference tones as the explanation for their observations.
Overall, then, the timing of cortical discharge encodes low modulation frequencies corresponding to the
perceptual ranges characterized by rhythm and fluctuation strength (48, 53, 60) and, potentially, roughness (64,
255). A code based on the mean firing rate may represent
fast AMs such as those associated with periodicity pitch,
but it remains unclear whether these two coding strategies adequately explain AM coding over the entire perceptual range.
E. Responses to AM in Primary Auditory Cortex:
Influence of Modulation Parameters
The results discussed above were mostly derived
with a modulation depth (m) of 100%. Decrease in m
results in monotonically reduced synchronization (60),
especially for m < 0.5 (49). In the awake squirrel monkey,
86% of the neurons had maximum synchronization for
80–100% modulation and showed a monotonic decrease
with reduction of m. Average firing rate was essentially
constant as a function of modulation depth (17). Values of
rBMF and tBMF were little affected by m in the awake
marmoset (157).
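The dependence of synchronization on modulation depth can be illustrated with a toy calculation. The Python sketch below computes vector strength, one common synchronization measure, for an inhomogeneous Poisson unit whose rate follows a sinusoidal envelope of depth m. The Poisson model, the function names, and all parameter values (8-Hz modulation, 50 spikes/s mean rate, 200-s duration) are assumptions made for illustration and are not taken from the studies cited; cortical neurons are not expected to behave this simply.

```python
import numpy as np

def vector_strength(spike_times, fm):
    """Length of the mean resultant vector of spike phases at modulation frequency fm (Hz)."""
    phases = 2.0 * np.pi * fm * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

def poisson_spikes_sam(m, fm, mean_rate, duration, rng, dt=1e-4):
    """Spike times (s) from a Poisson process with rate mean_rate * (1 + m * sin(2*pi*fm*t)).

    Hypothetical toy model: one Bernoulli draw per time bin of width dt.
    """
    t = np.arange(0.0, duration, dt)
    rate = mean_rate * (1.0 + m * np.sin(2.0 * np.pi * fm * t))
    fired = rng.random(t.size) < rate * dt
    return t[fired]

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
for depth in (1.0, 0.5, 0.25):
    spikes = poisson_spikes_sam(depth, fm=8.0, mean_rate=50.0, duration=200.0, rng=rng)
    print(f"m = {depth:4.2f}  spikes = {spikes.size:6d}  "
          f"vector strength = {vector_strength(spikes, 8.0):.3f}")
```

In this toy model the expected vector strength is m/2 and the mean spike count is independent of m by construction, so it reproduces only qualitatively the monotonic loss of synchronization with decreasing depth described above.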
Changes in the overall intensity resulted in minor
influences on BMF, cut-off frequency, and shape of the
MTF (46, 157, 255). However, the firing rate depended
strongly on intensity, revealing a limited range of
best levels (49). Phillips and colleagues (211, 212) noticed
intensity-specific differences between the responses to
low and high modulation frequencies. Better responses
were observed for higher modulation frequencies at low
intensities and for low modulation frequencies at higher
intensities; that is, the shape of MTFs can be level dependent. The rMTF appears to be more resistant to changes
in SPL than the tMTF (157).
In a few studies, the effect of the modulation waveform was investigated. Rectangular AM resulted in stronger response
synchrony than sinusoidal AM, but the tBMFs were similar (255, 256). Modulation with an exponential sine-wave
envelope increased the sharpness of modulation tuning
with decreasing duty cycle but showed no dramatic effects on BMF or cut-off frequency (49). Temporal synchronization to binaural beats (generated by binaural interaction in the brain stem, see sect. VI) also revealed
cut-off frequencies of <40 Hz (219). Moreover, results
from the awake primate (157) indicate that BMFs for AM
and FM are often closely matched for single neurons. These observations suggest a common temporal window within which afferent signals are
integrated.
Dynamic ripple spectra, i.e., spectral envelopes
that are periodic along the frequency axis, have been used to determine
the temporal impulse response properties in AI by reverse
correlation in anesthetized ferrets (142) and cats (175); these measurements
revealed tBMFs that essentially overlapped with the value
range seen in several other species estimated with AM
tones. Direct comparison between two carrier types
showed either no significant difference in the tBMFs for
tonal and noise carriers (53, 217) or an average tBMF that
is slightly lower for tonal carriers (49). This suggests that
the carrier bandwidth may have little influence on temporal coding properties.
F. Differences of Temporal Coding Between
Cortical Fields
In view of the differences between thalamic subdivisions in terms of thalamocortical connectivity (see sect.
IXA) and temporal responses (see sect. IXB), it is of interest whether neurons in different cortical fields also differ
in their ability to code temporal information (Fig. 9). Field
AAF in the cat, a component of the core area like AI,
shows evidence of higher BMFs and limiting rates than AI
(111, 255, 256). There is some evidence of spatial clustering in AAF with faster following neurons more abundant
for CFs above 10 kHz (53, 111, 255). Further evidence of
faster following rates in AAF over AI has been obtained
from STRF measurements in mice (159). The duration of
STRFs from AAF was found to be shorter than in AI.
Because STRF duration is inversely related to the BMF of
tMTFs, it follows that AAF neurons have higher BMFs
compared with AI. Another predictor for repetition following capacity is the onset latency of isolated CF tones
or clicks. Schreiner and Raggio (253) reported a weak but
significant negative correlation in cat AI between click latency
and BMF, similar to results in the IC (153) and MGB (240).
Onset latencies in AAF of cats (111) and mice (159) are
shorter than in AI, further supporting the notion that AAF
has a higher following capacity than AI.
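The inverse relation between STRF duration and tBMF is what the scaling property of the Fourier transform predicts: compressing a temporal impulse response in time shifts the peak of its transfer function to proportionally higher frequencies. The Python sketch below illustrates this with a hypothetical damped-oscillation kernel. The kernel shape, the sampling rate, and the 20- and 40-ms time scales are assumptions chosen for illustration; they are not measured STRF slices from AAF or AI.

```python
import numpy as np

FS = 1000.0      # sampling rate in Hz (assumed)
DURATION = 4.0   # analysis window in s; gives 0.25-Hz frequency resolution

def temporal_kernel(time_scale):
    """Hypothetical impulse response: a damped oscillation stretched by time_scale (s)."""
    t = np.arange(0.0, DURATION, 1.0 / FS)
    x = t / time_scale
    return x * np.exp(-x) * np.sin(2.0 * x)

def best_modulation_frequency(kernel):
    """Frequency (Hz) at which the magnitude of the kernel's Fourier transform peaks."""
    gain = np.abs(np.fft.rfft(kernel))
    freqs = np.fft.rfftfreq(kernel.size, d=1.0 / FS)
    return freqs[np.argmax(gain)]

bmf_long = best_modulation_frequency(temporal_kernel(0.040))   # longer kernel
bmf_short = best_modulation_frequency(temporal_kernel(0.020))  # shorter kernel
print(f"long kernel BMF:  {bmf_long:.2f} Hz")
print(f"short kernel BMF: {bmf_short:.2f} Hz")
```

Halving the time scale of the kernel roughly doubles the peak frequency, which is the sense in which shorter STRFs imply higher BMFs under a linear description.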
Cortical fields outside the core areas seem to perform
at even lower temporal fidelity than that found in AI. In
the cat, tBMFs and rBMFs of cortical fields AII, PAF, and
VPAF were 20–80% of those seen for AI (53, 256). Similar
results were found in the awake squirrel monkey (17). In
the latter study, three groups of cortical fields could be
distinguished based on their temporal properties. A group
containing AI had average BMFs of ~8 Hz; a group that
included the rostral field and the insula had BMFs of 4 Hz
and below, and a group containing the anterior-lateral
field had a predominance of BMFs around 2 Hz. Combined, these findings suggest that hierarchically “higher”
auditory cortical fields primarily receiving input from thalamic
projections other than the ventral nucleus appear to
show slightly but consistently slower following capacity
when tested with AM stimuli than primary cortical fields.
G. Cortical Mechanisms
The cause for the reduced temporal following capacity of cortical neurons compared with subcortical stations
is still not entirely clear. A diversity of cellular and network properties are likely to affect cortical temporal behavior. These include mechanisms of adaptation and postexcitation suppression (19, 25, 116), postsuppression rebound (42, 47, 75), intrinsic oscillation (42, 75, 106, 134,
249), and synaptic depression (1, 169, 170). It has been
suggested that tBMFs are largely determined by processes
intrinsic to the cortical-thalamic network while cut-off
frequency seems to be influenced by intrinsic pyramidal
cell mechanisms (51). Models that include dynamic synaptic processes have been proposed that can account for
many aspects of cortical responses to various repetitive
signal envelopes, including sinusoidal AM stimuli (41, 54,
55). Eggermont (55) demonstrated that the envelope synchronization of cortical activity can be modeled based on
two main components: the degree of input or presynaptic
synchrony and the shape of a temporal filter that is determined by properties of synaptic dynamics. The input
synchrony is highly dependent on the shape of the envelope waveform and reflects peripheral integrative mechanisms that determine response latency and spiking jitter
(102). The properties of the synaptic dynamics are less
stimulus dependent and reflect cortical synaptic activity
changes after repeated stimulation that cause short-term
synaptic depression or facilitation (1, 169, 170). The synaptic dynamic acts as a temporal low-pass filter on the
synchronized input and is dominated by synaptic depression. This two-stage model of cortical modulation transformation holds great promise in unifying many aspects of
temporal envelope processing (55) and other temporal
behaviors of cortical neurons (41). It is likely, however,
that other, conceivably nonlinear, influences also contribute to the shaping of MTFs. This is indicated by the
observed relationships of onset latency and the period of
intrinsic oscillations with BMF as well as the effects of
spectral and temporal stimulus composition on cortical
adaptation behavior (19, 280).
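The low-pass action of synaptic depression in such a two-stage scheme can be sketched numerically. The Python example below iterates a generic depressing synapse driven by a perfectly periodic input and reports the steady-state response per event relative to a rested synapse. It is a minimal sketch and not the model fitted in Ref. 55: the resource-depletion rule, the function names, and the parameter values (use fraction 0.5, recovery time constant 0.3 s) are assumptions.

```python
import numpy as np

def steady_state_efficacy(rate_hz, use_fraction=0.5, tau_rec=0.3, n_events=200):
    """Relative response per event after many periodic events at rate_hz.

    Assumed rule: each event consumes use_fraction of the available resources,
    which then recover exponentially toward 1 with time constant tau_rec (s).
    """
    interval = 1.0 / rate_hz
    resources = 1.0
    response = use_fraction
    for _ in range(n_events):
        response = use_fraction * resources   # response to the current event
        resources -= response                 # depletion by the event
        # exponential recovery toward 1 during the inter-event interval
        resources = 1.0 - (1.0 - resources) * np.exp(-interval / tau_rec)
    return response / use_fraction            # 1.0 corresponds to a rested synapse

def closed_form_efficacy(rate_hz, use_fraction=0.5, tau_rec=0.3):
    """Fixed point of the same update rule, used as a cross-check."""
    decay = np.exp(-1.0 / (rate_hz * tau_rec))
    return (1.0 - decay) / (1.0 - (1.0 - use_fraction) * decay)

for rate in (2.0, 5.0, 10.0, 20.0, 50.0):
    print(f"{rate:5.1f} Hz  relative response per event = {steady_state_efficacy(rate):.3f}")
```

With these assumed parameters the response per event falls steadily as the repetition rate rises, which is the low-pass behavior attributed to the synaptic stage; the input-synchrony stage of the model is not represented here.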
H. Temporal Coding of Complex Sounds
Most studies of complex multisyllable or multi-“phrase” communication sounds in auditory cortex noted
that neuronal responses were predominantly located at
the beginning of each phrase provided that the phrases
did not follow each other at rates of more than 20–30 Hz.
This effect was not dependent on the species-specific
nature of the calls and was seen for speech sounds as well
(50). For example, responses to bird songs in cat auditory
cortex (273) showed preferred response intervals corresponding to ~10 Hz. Responses to species-specific calls in
awake squirrel monkey (74), anesthetized squirrel monkey (192), and anesthetized marmoset (292) all showed
“phrase”-locking in the response to repetitive call phrases
around 8–12 Hz. Similar values were obtained in the
awake guinea pig to various bird and guinea pig vocalizations (34). Wang et al. (292) tested whether the temporal
response to complex sounds was tuned like the response
to more elemental sounds by using stretched and compressed natural vocalizations of marmosets, without
changes in the spectral content of the calls. The responsiveness to the calls was maximal at the natural repetition
rate of the phrases near 8 Hz. In other words, the tMTF of
most neurons was tuned to the repetition rate of the
natural call. Similarly, Nagarajan et al. (192) reported that
the response modulation rates of cortical neurons activated by vocalizations in the marmoset monkey were
highly correlated with the BMFs found for AM tones.
The pulse repetitions in echolocation calls of bats are
another example of temporal structures that require detailed processing by the auditory system. Phase-locked
responses of cortical neurons in the bat occur over similar
ranges as found for AM and click trains in other mammalian species. Sixty percent of BMFs in AI of Eptesicus
fuscus were at or below 10 Hz but could be as high as 83
Hz (116). Pulse repetition coding in the awake FM bat
Myotis lucifugus and the mustached bat Pteronotus
parnellii had limiting rates of ~100 Hz (308) and up to
300 Hz (276), respectively, commensurate with the behaviorally relevant range of timing used in echolocation.
A likely strategy for encoding of complex sounds in
auditory cortex is by the temporal-spatial discharge pattern of distributed neuronal populations across the cortical fields (34, 207; see also Refs. 30, 39). Initial studies of
the response of cortical neurons to vocalizations (34, 306,
307) combined with more recent studies of the detailed
representation of species-specific vocalizations (192, 292)
and speech sounds (309) in the primary auditory cortex of
New World monkeys and cats provide evidence that behaviorally relevant vocalizations are well represented by
spatially distributed but temporally highly coherent neuronal discharges. At major transitions during the course
of the signal, a temporally coherent activation of specific
neuronal subpopulations across the cortical fields is created. The synchronous timing of responses across many
sites in primary auditory cortex (and in parallel in other
cortical fields) may provide the necessary means for appropriate grouping or segregation of sequential elements
in ongoing foreground and background sounds. The range
of modulation frequencies spanned by cortical tMTFs of
generally moderate selectivity may be sufficient to provide representational and, perhaps, perceptual invariPhysiol Rev • VOL
ances of complex sound sequences despite potentially
large variations in phoneme rate or in the sequence rate of
musical tones. The distributed representation of temporal
envelope information in each carrier frequency band allows a segregated processing of different temporal phenomena within a given frequency “channel” as well as
processing of similar temporal aspects across frequency
channels (194).
I. Plasticity of Temporal Coding Properties
in Auditory Cortex
Studies of representational plasticity in auditory cortex of adult animals have largely focused on spectral
properties, but several studies have recently examined
temporal properties and reported use-dependent changes
in the tMTF. Beitel et al. (15) trained owl monkeys to
discriminate between two different, sequentially presented, AM rates and rewarded the animals when they
correctly indicated that the second stimulus had a higher
AM rate. The modulation frequencies were chosen to be
in a range (4–40 Hz) where they could induce phase-locked cortical responses. Over the course of the training,
AM discrimination thresholds gradually improved. Analysis of the tMTFs of the trained animals revealed that the
shape of the transfer function changed dramatically. As a
consequence, average limiting rate more than doubled
from 12 Hz to >30 Hz, and BMF increased from 8 to 15 Hz.
This result indicates that temporal coding properties of
cortical neurons can be modified by learning.
Studies in rat AI investigated the influence of the
statistics of the input signal on the reorganization of
auditory cortex (138, 139). Stimulation of the nucleus
basalis in the basal forebrain has been shown to increase
the potential for cortical plasticity without explicit behavioral training of the animals (45, 92, 98, 174). Pairing of
nucleus basalis stimulation with acoustic stimulation
(139) caused pronounced changes in the tMTFs which
depended on the temporal properties of the stimuli paired
with the electrical stimulation (Fig. 10). A 20–40% increase of the BMF and cut-off frequency was observed
when the modulation frequency of the acoustic stimulus
was slightly higher than the normally observed values of
the tMTFs. Pairing of electrical stimulation with acoustic stimuli at modulation
frequencies below the normal tBMF values caused a decrease in the neuronal cut-off frequencies.
These results indicate that important aspects of temporal properties of the cortex undergo plastic reorganization, reflect aspects of the temporal statistics in the input
stimuli, and can be modified by mechanisms involved in
learning to match specific auditory tasks even in fully
mature animals.
FIG. 10. Temporal plasticity in primary auditory cortex. Prolonged
pairing of an AM stimulus (15-Hz trains of tones, random carrier frequencies) with electrical stimulation of the nucleus basalis resulted in a
shift of the population tMTF to higher AM frequencies (dashed lines)
compared with unstimulated control animals (solid line). [Adapted from
Kilgard et al. (139).]
X. NEUROPHYSIOLOGICAL AND
PSYCHOLOGICAL STUDIES IN HUMANS
A number of techniques are beginning to provide
information about the analysis of AM in the human brain.
MTFs generated from steady-state evoked potentials and
magnetic responses to the envelope of modulated sounds
(e.g., Refs. 146, 213, 220, 235, 239) are at least qualitatively
similar to modulation sensitivity demonstrated psychophysically. It is noteworthy, however, that neither psychophysical nor event-related potential measures show much
evidence of the bandpass tuning to modulation that is a
feature of many single-unit responses. Estimates of group
delay in evoked potentials (119, 146) suggest that responses to low fm are predominantly generated at the
cortical level and those to high fm in the brain stem.
Magnetic responses in auditory cortex suggest a mapping
of modulation frequency that lies orthogonal to the tonotopic axis (151). These magnetic responses lock to the
temporal envelope of speech signals, and the degree of
locking correlates with speech comprehension (3). Functional imaging of the brain with functional magnetic resonance imaging (fMRI) has also been applied to the study
of modulation: the repetition rate at which a tone burst
best elicits a BOLD (blood oxygen level dependent) response decreases progressively from midbrain to thalamus to cortex, with values not dissimilar to those found in
single-unit recordings from these structures in other
mammals (97). A progressive shift in favor of low modulation frequencies at more central locations was also reported in an fMRI study using sinusoidally amplitude
modulated white noise (77). In addition, this study also
reported some evidence for restricted cortical regions
responding better to low or high modulation frequencies
but no systematic topographic representation of modulation frequency. At the cortical level other nonsensory
factors are likely to play a role in the processing of
modulation. Hall et al. (95) have demonstrated that activation of the planum temporale caudal to primary auditory cortex is influenced by attention to modulation.
Ablations and lesions of auditory cortex have been
shown to interfere with the processing of temporal tasks,
such as the order of events (193), discrimination between
10- and 300-Hz trains of noise bursts (277), the detection
of AM frequencies below but not above ~30 Hz (89), and
the perception of periodicity pitch (299), to name a few
examples. Studies in patients with primary cortical lesions resulting in “word deafness” also show evidence for
deteriorated temporal processing capacities (88). In addition, it has been argued (87) that the pathway up to and
including primary auditory cortex is not sufficient for the
detection of continuous AM in humans. The range of
these perceptual deficits encompasses the cortical range
of temporally encoded as well as rate-encoded AM frequencies,
corroborating the importance of the coding of envelope
phenomena in auditory cortex and in some of the cortical
regions to which it connects.
XI. CONCLUSION
Our examination of modulation processing at different anatomical levels reveals a patchy picture with many
unsolved issues. As is generally the case in sensory systems, the representation evolves from isomorphic in the
periphery to abstracted at the cortical level. Two general
trends are clearly discernable with ascending levels: 1) a
recoding of modulation selectivity from temporal form to
average rate and 2) a decrease in the highest modulation
frequencies encoded (either temporally or in average
rate). While the first trend seems sensible, the second
trend is puzzling, in particular the low upper frequency
limit at the cortical level. This observation, as well as
others, may lead to the skeptical view that modulation
encoding and selectivity at the different anatomical levels
is epiphenomenal, in the sense that it is a necessary
outcome of other properties (e.g., frequency tuning, adaptation, sensitivity to rise time, connectivity, membrane
properties, synaptic dynamics) and that the gradual
changes with anatomical level merely reflect change in
these properties but do not indicate processing (e.g., the
assembly of higher-order selectivities or the recoding of
envelope synchronization into a spatially distributed rate
code).
However, if we ignore differences along the auditory
neuraxis for a moment and take stock of the variety of
responses reviewed, a rather optimistic view emerges of
neural mechanisms dedicated to AM processing. Indeed,
these responses show some of the key properties that are
generally considered indicative of the coding of stimulus
parameters. Tuning to modulation frequency is prominently present temporally and in average rate, and the
range of optimal modulation frequencies so represented
spans perceptually relevant ranges. The tuning can show
invariance with SPL, modulation depth, and type of carrier and be predictive of the response to complex modulation waveforms in natural stimuli. There is even suggestive evidence for topographic mapping of modulation
frequency. Selectivity to modulation waveforms or modulation paradigms more complex than the basic sinusoidally modulated tone is beginning to be reported.
There are several neurobiological avenues to further
explore and strengthen the case for dedicated modulation
mechanisms and their link to perception. Review of the
available data suggests that the most immediate gain, with
existing tools, can be expected from inventive stimulus
paradigms. Although sinusoidal AM may be considered a
complex stimulus in the frequency domain, it is an elementary and simple stimulus in the modulation domain.
The vast majority of studies of modulation processing
have used single sinusoidal AM tones and have focused on
modulation tuning. This is a necessary starting point, but
to make a convincing case for the relevance of the tuning
observed, the stimulus arsenal should be expanded. Current technology enables synthesis of more complex stimuli that are amenable to parametric exploration yet a step
closer to natural stimuli. There are still basic unanswered
questions to be addressed with sinusoidal AM, but it is
equally clear that important properties and selectivities
are only manifest with the use of nonsinusoidal envelopes
or stimulus paradigms that involve modulation in ways
that are closer to real-world tasks faced by the auditory
system. Clever use of such paradigms is likely to make
either the skeptical or optimistic view prevail.
We are very grateful to A. Palmer and R. Batra for critical
reading of the manuscript.
During the preparation of this review, P. X. Joris was
supported by the Fund for Scientific Research-Flanders Grants
G.0297.98 and G.0083.02 and Research Fund K.U. Leuven Grant
OT/01/42; C. E. Schreiner was supported by National Institutes
of Health Grants DC-02260 and NS-34835; and A. Rees was
supported by the Wellcome Trust.
Address for reprint requests and other correspondence:
P. X. Joris, Laboratory of Auditory Neurophysiology, K.U. Leuven, Campus Gasthuisberg, B-3000 Leuven, Belgium (E-mail:
Philip.Joris@med.kuleuven.ac.be).
REFERENCES
1. Abbott LF, Varela JA, Sen K, and Nelson SB. Synaptic depression and cortical gain control. Science 275: 220–224, 1997.
2. Aertsen AM and Johannesma PIM. The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol
Cybern 42: 133–143, 1981.
3. Ahissar E, Nagarajan S, Ahissar M, Protopapas A, Mahncke
H, and Merzenich MM. Speech comprehension is correlated with
temporal response patterns recorded from auditory cortex. Proc
Natl Acad Sci USA 98: 13367–13372, 2001.
4. Anderson DJ, Rose JE, Hind JE, and Brugge JF. Temporal
position of discharges in single auditory nerve fibers within the
cycle of a sine-wave stimulus: frequency and intensity effects. J
Acoust Soc Am 49: 1131–1139, 1971.
5. Atick J. Could information theory provide an ecological theory of
sensory processing? Network 3: 213–251, 1992.
6. Attias H and Schreiner CE. Low-order temporal statistics of
natural sounds. In: Advances in Neural Information Processing
Systems 9, edited by M. C. Mozer, M. I. Jordan, and T. Petsche.
Cambridge, MA: MIT Press, 1997, p. 27–33.
7. Attias H and Schreiner CE. Coding of naturalistic stimuli by
auditory midbrain neurons. In: Advances in Neural Information
Processing Systems 10, edited by M. I. Jordan, M. Kearns, and S.
Solla. Cambridge, MA: MIT Press, 1998, p. 103–109.
8. Bacon SP and Grantham DM. Modulation masking: effects of
modulation frequency, depth, and phase. J Acoust Soc Am 85:
2575–2588, 1989.
9. Bacon SP and Viemeister NF. Temporal modulation transfer
functions in normal hearing and hearing-impaired listeners. Audiology 24: 117–134, 1985.
10. Batra R, Kuwada S, and Fitzpatrick DC. Sensitivity to interaural
temporal disparities of low- and high-frequency neurons in the
superior olivary complex. I. Heterogeneity of responses. J Neurophysiol 78: 1222–1236, 1997.
11. Batra R, Kuwada S, and Fitzpatrick DC. Sensitivity to interaural
temporal disparities of low- and high-frequency neurons in the
superior olivary complex. II. Coincidence detection. J Neurophysiol 78: 1237–1247, 1997.
12. Batra R, Kuwada S, and Stanford TR. Temporal coding of
envelopes and their interaural delays in the inferior colliculus of
the unanesthetized rabbit. J Neurophysiol 61: 257–268, 1989.
13. Batra R, Kuwada S, and Stanford TR. High-frequency neurons in
the inferior colliculus that are sensitive to interaural delays of
amplitude-modulated tones— evidence for dual binaural influences. J Neurophysiol 70: 64 – 80, 1993.
14. Beckius GE, Batra R, and Oliver DL. Axons from anteroventral
cochlear nucleus that terminate in medial superior olive of cat:
observations related to delay lines. J Neurosci 19: 3146–3161, 1999.
15. Beitel R, Schreiner CE, Wang X, Cheung S, Jenkins W, and
Merzenich MM. Effects of psychophysical training on the entrainment of primary auditory cortical neurons to amplitude modulated
tones. Soc Neurosci Abstr 21: 1180, 1995.
16. Bernstein LR and Trahiotis C. Detection of interaural delay in
high-frequency sinusoidally amplitude-modulated tones, two-tone
complexes, and bands of noise. J Acoust Soc Am 95: 3561–3567,
1994.
17. Bieser A and Müller-Preuss P. Auditory responsive cortex in the
squirrel monkey: neural responses to amplitude-modulated sounds.
Exp Brain Res 108: 273–284, 1996.
18. Brawer JR, Morest DK, and Kane EC. The neuronal architecture
of the cochlear nucleus of the cat. J Comp Neurol 155: 251–282,
1974.
19. Brosch M and Schreiner CE. Sequence selectivity of neurons in
cat primary auditory cortex. Cereb Cortex 10: 1155–1167, 2000.
20. Burger RM and Pollak GD. Analysis of the role of inhibition in
shaping responses to sinusoidally amplitude-modulated signals in
the inferior colliculus. J Neurophysiol 80: 1686–1701, 1998.
21. Burns EM and Viemeister NF. Nonspectral pitch. J Acoust Soc
Am 60: 863–869, 1976.
22. Burns EM and Viemeister NF. Played-again SAM: further observations on the pitch of amplitude-modulated noise. J Acoust Soc
Am 70: 1655–1660, 1981.
23. Buunen TJF and Rhode WS. Responses of fibers in the cat’s
auditory nerve to the cubic difference tone. J Acoust Soc Am 64:
772–781, 1978.
24. Caird D. Processing in the colliculus. In: The Neurobiology of
Hearing: The Central Auditory System, edited by R. A. Altschuler,
R. P. Bobbin, B. M. Clopton, and D. W. Hoffman. New York: Raven,
1991, p. 253–292.
25. Calford MB and Semple MN. Monaural inhibition in cat auditory
cortex. J Neurophysiol 73: 1876–1891, 1995.
26. Cant NB and Benson CG. Parallel auditory pathways: projection
patterns of the different neuronal populations in the dorsal and
ventral cochlear nuclei. Brain Res Bull 60: 457–474, 2003.
27. Cariani PA and Delgutte B. Neural correlates of the pitch of
complex tones. II. Pitch shift, pitch ambiguity, phase invariance,
pitch circularity, rate pitch, and the dominance region for pitch.
J Neurophysiol 76: 1717–1734, 1996.
28. Caspary DM, Palombi PS, and Hughes LF. GABAergic inputs
shape responses to amplitude modulated stimuli in the inferior
colliculus. Hear Res 168: 163–173, 2002.
29. Caspary DM, Rupert AL, and Moushegian G. Neuronal coding
of vowel sounds in the cochlear nuclei. Exp Neurol 54: 414–431,
1977.
30. Chistovich LA, Lublinskaja VV, Malinnikova EA, Ogorodnikova EA, Stoljarova EI, and Zhukov SJ. Temporal processing of
peripheral auditory patterns of speech. In: The Representation of
Speech in the Peripheral Auditory System, edited by R. Carlson
and B. Granstrom. Amsterdam: Elsevier, 1982, p. 165–180.
31. Clarey JC, Barone P, and Imig TJ. Physiology of thalamus and
cortex. In: The Mammalian Auditory Pathway: Neurophysiology,
edited by A. N. Popper and R. R. Fay. New York: Springer, 1992, p.
232–334.
32. Condon CJ, White KR, and Feng AS. Neurons with different
temporal firing patterns in the inferior colliculus of the little brown
bat differentially process sinusoidal amplitude-modulated signals.
J Comp Physiol A Sens Neural Behav Physiol 178: 147–157, 1996.
33. Cooper NP, Robertson D, and Yates GK. Cochlear nerve fiber
responses to amplitude-modulated stimuli: variations with spontaneous rate and other response characteristics. J Neurophysiol 70:
370–386, 1993.
34. Creutzfeldt OD, Hellweg FC, and Schreiner CE. Thalamocortical transformation of responses to complex auditory stimuli. Exp
Brain Res 39: 87–104, 1980.
35. Dau T, Kollmeier B, and Kohlrausch A. Modelling auditory
processing of amplitude modulation. I. Detection and masking with
narrow-band carriers. J Acoust Soc Am 102: 2892–2905, 1997.
36. Dau T, Kollmeier B, and Kohlrausch A. Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration. J Acoust Soc Am 102: 2906–2919, 1997.
37. Dau T, Verhey J, and Kohlrausch A. Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise
carriers. J Acoust Soc Am 106: 2752–2760, 1999.
38. Decharms RC, Blake DT, and Merzenich MM. Optimizing sound
features for cortical neurons. Science 280: 1439–1443, 1998.
39. Delgutte B. Auditory neural processing of speech. In: The Handbook of Phonetic Sciences, edited by W. J. Hardcastle and J. Laver.
Oxford, UK: Blackwell, 1997, p. 507–538.
40. Delgutte B, Hammond BM, and Cariani PA. Neural coding of
the temporal envelope of speech: relation to modulation transfer
functions. In: Psychophysical and Physiological Advances in
Hearing, edited by A. R. Palmer, A. Rees, A. Q. Summerfield, and R.
Meddis. London: Whurr, 1997, p. 595–603.
41. Denham SL and Denham MJ. An investigation into the role of
cortical synaptic depression in auditory processing. In: Emergent
Neural Computational Architectures Based on Neuroscience: Towards Neuroscience-Inspired Computing, edited by S. Wermter,
D. J. Willshaw, and J. Austin. Berlin: Springer, 2001, p. 494–506.
42. Dinse HR, Krueger K, Akhavan AC, Spengler F, Schoenor G,
and Schreiner CE. Low-frequency oscillations of visual, auditory
and somatosensory cortical neurons evoked by sensory stimulation. Int J Psychophysiol 26: 205–227, 1997.
43. Drew T and Doucet S. Application of circular statistics to the
study of neuronal discharge during locomotion. J Neurosci Methods 38: 171–181, 1991.
44. Drullman R, Festen JM, and Houtgast T. Effect of temporal
modulation reduction on spectral contrasts in speech. J Acoust Soc
Am 99: 2358–2364, 1996.
45. Edeline JM, Hars B, Maho C, and Hennevin E. Transient and
prolonged facilitation of tone-evoked responses induced by basal
forebrain stimulations in the rat auditory cortex. Exp Brain Res 97:
373–386, 1994.
46. Eggermont JJ. Rate and synchronization measures of periodicity
coding in cat primary auditory cortex. Hear Res 56: 153–167, 1991.
47. Eggermont JJ. Stimulus induced and spontaneous rhythmic firing
of single units in cat primary auditory cortex. Hear Res 61: 1–11,
1992.
48. Eggermont JJ. Differential effects of age on click-rate and amplitude modulation-specific coding in primary auditory cortex of the
cat. Hear Res 65: 175–192, 1993.
49. Eggermont JJ. Temporal modulation transfer function for AM and
FM stimuli in cat auditory cortex. Effects of carrier type, modulating waveform and intensity. Hear Res 74: 51–66, 1994.
50. Eggermont JJ. Representation of a voice onset time continuum in
primary auditory cortex of the cat. J Acoust Soc Am 98: 911–920,
1995.
51. Eggermont JJ. How homogeneous is cat primary auditory cortex?
Evidence from simultaneous single-unit recordings. Audit Neurosci 2: 79–96, 1996.
52. Eggermont JJ. Firing rate and firing synchrony distinguish dynamic from steady state sound. Neuroreport 8: 2709–2713, 1997.
53. Eggermont JJ. Representation of spectral and temporal sound
features in three cortical fields of the cat. Similarities outweigh
differences. J Neurophysiol 80: 2743–2764, 1998.
54. Eggermont JJ. The magnitude and phase of temporal modulation
transfer functions in cat auditory cortex. J Neurosci 19: 2780–2788,
1999.
55. Eggermont JJ. Temporal modulation transfer functions in cat
primary auditory cortex: separating stimulus effects from neural
mechanisms. J Neurophysiol 87: 305–321, 2002.
56. Eggermont JJ, Johannesma PIM, and Aertsen AMHJ. Reverse-correlation methods in auditory research. Q Rev Biophys 16: 341–
414, 1983.
57. Erulkar SD, Butler RA, and Gerstein GL. Excitation and inhibition in cochlear nucleus. II. Frequency-modulated tones. J Neurophysiol 31: 537–548, 1968.
58. Escabi MA, Schreiner CE, and Miller LM. Dynamic time-frequency processing in the cat midbrain, thalamus, and auditory
cortex: spectrotemporal receptive fields obtained using dynamic
ripple spectra. Soc Neurosci Abstr 24: 1879, 1998.
59. Evans EF. Cochlear nerve and cochlear nucleus. In: Handbook of
Sensory Physiology, edited by W. D. Keidel and W. D. Neff. Berlin:
Springer, 1975, p. 1–108.
60. Fastl H, Hesse A, Schorer E, Urbas J, and Müller-Preuss P.
Searching for neural correlates of the hearing sensation fluctuation
strength in the auditory cortex of squirrel monkeys. Hear Res 23:
199 –203, 1986.
61. Fenton MB. Natural history and biosonar signals. In: Hearing by
Bats, edited by R. R. Fay and A. N. Popper. New York: Springer,
1995, p. 37– 86.
62. Fernald RD and Gerstein GL. Response of cat cochlear nucleus
neurons to frequency and amplitude modulated tones. Brain Res
45: 417– 435, 1972.
63. Fettiplace R and Fuchs PA. Mechanisms of hair cell tuning.
Annu Rev Physiol 61: 809 – 834, 1999.
64. Fishman YI, Reser DH, Arezzo JC, and Steinschneider M.
Complex tone processing in primary auditory cortex of the awake
monkey. I. Neural ensemble correlates of roughness. J Acoust Soc
Am 108: 235–246, 2000.
65. Fitzgerald JV, Burkitt AN, Clark GM, and Paolini AG. Delay
analysis in the auditory brainstem of the rat: comparison with click
latency. Hear Res 159: 85–100, 2001.
66. Fitzpatrick DC, Batra R, Stanford TR, and Kuwada S. A neuronal population code for sound localization. Nature 388: 871– 874,
1997.
67. Fitzpatrick DC, Kuwada S, and Batra R. Neural sensitivity to
interaural time differences: beyond the Jeffress model. J Neurosci
20: 1605–1615, 2000.
68. Forest TG and Green DM. Detection of partially filled gaps in
noise and the temporal modulation transfer function. J Acoust Soc
Am 82: 1933–1943, 1987.
69. Frisina RD. Subcortical neural coding mechanisms for auditory
temporal processing. Hear Res 158: 1–27, 2001.
70. Frisina RD, Smith RL, and Chamberlain SC. Differential encoding of rapid changes in sound amplitude by second-order auditory
neurons. Exp Brain Res 60: 417–422, 1985.
71. Frisina RD, Smith RL, and Chamberlain SC. Encoding of amplitude modulation in the gerbil cochlear nucleus. I. A hierarchy of
enhancement. Hear Res 44: 99–122, 1990.
72. Frisina RD, Smith RL, and Chamberlain SC. Encoding of amplitude modulation in the gerbil cochlear nucleus. II. Possible
neural mechanisms. Hear Res 44: 123–142, 1990.
73. Frisina RD, Walton JP, and Karcich KJ. Dorsal cochlear nucleus single neurons can enhance temporal processing capabilities
in background noise. Exp Brain Res 102: 160–164, 1994.
74. Funkenstein HH and Winter P. Responses to acoustic stimuli of
units in the auditory cortex of awake squirrel monkeys. Exp Brain
Res 18: 464–488, 1973.
75. Gaese BH and Ostwald J. Temporal coding of amplitude and
frequency modulation in the rat auditory cortex. Eur J Neurosci 7:
438–450, 1995.
76. Geisler CD. From Sound to Synapse. Oxford, UK: Oxford Univ.
Press, 1998.
77. Giraud A, Lorenzi C, Ashburner J, Wable J, Johnsrude I,
Frackowiak RSJ, and Kleinschmidt A. Representation of the
temporal envelope of sounds in the human brain. J Neurophysiol
84: 1588–1598, 2000.
78. Glattke TJ. Unit responses of the cat cochlear nucleus to amplitude-modulated stimuli. J Acoust Soc Am 45: 419–425, 1968.
79. Godfrey DA, Kiang NYS, and Norris BE. Single unit activity in
the dorsal cochlear nucleus of the cat. J Comp Neurol 162: 269–
284, 1975.
80. Godfrey DA, Kiang NYS, and Norris BE. Single unit activity in
the posteroventral cochlear nucleus of the cat. J Comp Neurol 162:
247–268, 1975.
81. Goldberg JM and Brown PB. Response of binaural neurons of
dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization. J Neurophysiol 32:
613–636, 1969.
82. Goldberg JM and Brownell WE. Discharge characteristics of
neurons in anteroventral and dorsal cochlear nuclei of cat. Brain
Res 64: 35–54, 1973.
83. Goldstein MH Jr, De Ribaupierre F, and Brown RM. Responses
of the auditory cortex to repetitive acoustic stimuli. J Acoust Soc
Am 31: 356–364, 1959.
84. Green GG and Kay RH. Channels in the human auditory system
concerned with the waveform of modulation present in amplitude
and frequency-modulated tones. J Physiol 241: 50–52, 1974.
85. Greenberg S. Possible role of low and medium spontaneous rate
cochlear nerve fibers in the encoding of waveform periodicity. In:
Auditory Frequency Selectivity, edited by B. C. J. Moore and R. D.
Patterson. New York: Plenum, 1986, p. 241–248.
86. Greenwood DD and Joris PX. Mechanical and “temporal” filtering as codeterminants of the response by cat primary fibers to
amplitude-modulated signals. J Acoust Soc Am 99: 1029–1039,
1996.
87. Griffiths TD, Penhune V, Peretz I, Dean JL, Patterson RD,
and Green GG. Frontal processing and auditory perception. Neuroreport 11: 919–922, 2000.
88. Griffiths TD, Rees A, and Green GG. Disorders of human complex sound processing. Neurocase 5: 365–378, 1999.
89. Grigoreva TI, Figurina II, and Vasilev AG. Role of the medial
geniculate body in the production of conditioned reflexes to amplitude-modulated stimuli in rats. Zh Vyssh Nervn Deyat 37: 265–
271, 1988.
90. Grimault N, Bacon SP, and Micheyl C. Auditory stream segregation on the basis of amplitude-modulation rate. J Acoust Soc Am
111: 1340–1348, 2002.
91. Grothe B. Interaction of excitation and inhibition in processing of
pure tone and amplitude-modulated stimuli in the medial superior
olive of the mustached bat. J Neurophysiol 71: 706–721, 1994.
92. Gu Q and Singer W. Effects of intracortical infusion of anticholinergic drugs on neuronal plasticity in kitten striate cortex. Eur
J Neurosci 5: 475–485, 1993.
93. Gummer AW and Johnstone BM. Group delay measurement
from spiral ganglion cells in the basal turn of the guinea pig
cochlea. J Acoust Soc Am 76: 1388–1400, 1984.
94. Gummer M, Yates GK, and Johnstone BM. Modulation transfer
function of efferent neurones in the guinea pig cochlea. Hear Res
36: 41–52, 1988.
95. Hall DA, Haggard MP, Akeroyd MA, Summerfield AQ, Palmer
AR, Elliott MR, and Bowtell RW. Modulation and task effects in
auditory processing measured using fMRI. Hum Brain Mapp 10:
107–119, 2000.
96. Hall JW, Haggard MP, and Fernandes MA. Detection in noise by
spectro-temporal pattern analysis. J Acoust Soc Am 76: 50–56,
1984.
97. Harms MP and Melcher JR. Sound repetition rate in the human
auditory pathway: representations in the waveshape and amplitude
of fMRI activation. J Neurophysiol 88: 1433–1450, 2002.
98. Hars B, Maho C, Edeline JM, and Hennevin E. Basal forebrain
stimulation facilitates tone-evoked responses in the auditory cortex of awake rat. Neuroscience 56: 61–74, 1993.
99. Hartmann WM. The physical description of signals. In: Hearing,
edited by B. C. J. Moore. San Diego, CA: Academic, 1995, p. 1–40.
100. Hartmann WM. Signals, Sound, and Sensation. New York:
Springer, 1997.
101. Heil P. Representation of sound onsets in the auditory system.
Audiol Neuro-otolaryngol 6: 167–172, 2001.
102. Heil P and Neubauer H. Temporal integration of sound pressure
determines thresholds of auditory-nerve fibers. J Neurosci 21:
7404–7415, 2001.
103. Heil P, Schulze H, and Langner G. Ontogenetic development of
periodicity in the inferior colliculus of the Mongolian gerbil. Audit
Neurosci 1: 363–383, 1995.
104. Henning GB and Ashton J. The effect of carrier and modulation
frequency on lateralization based on interaural phase and interaural group delay. Hear Res 4: 185–194, 1981.
105. Hewitt MJ and Meddis R. A computer model of amplitude-modulation sensitivity of single units in the inferior colliculus. J Acoust
Soc Am 95: 2145–2159, 1994.
106. Horikawa J, Tanahashi A, and Suga N. After-discharges in the
auditory cortex of the mustached bat: no oscillatory discharges for
binding auditory information. Hear Res 76: 45–52, 1994.
107. Houtgast T. Frequency selectivity in amplitude-modulation detection. J Acoust Soc Am 85: 1676–1680, 1989.
108. Houtgast T and Steeneken HJM. The modulation transfer function in room acoustics as a predictor of speech intelligibility.
Acustica 28: 66–73, 1973.
109. Huffman RF, Argeles PC, and Covey E. Processing of sinusoidally amplitude modulated signals in the nuclei of the lateral lemniscus of the big brown bat, Eptesicus fuscus. Hear Res 126:
181–200, 1998.
110. Huffman RF and Henson OW Jr. The descending auditory pathway and acousticomotor systems: connections with the inferior
colliculus. Brain Res Rev 15: 295–323, 1990.
111. Imaizumi K, Priebe NJ, Crum PAC, Bedenbaugh PH, Cheung
SW, and Schreiner CE. Modular functional organization in cat
anterior auditory field (Abstract). Program No. 488.6. 2003 Abstract Viewer/Itinerary Planner. Washington, DC: Soc. Neurosci,
2003, Online.
112. Irvine DRF. The Auditory Brainstem: A Review of the Structure
and Function of Auditory Brainstem Processing Mechanisms.
Berlin: Springer-Verlag, 1986.
113. Irvine DRF. Physiology of the auditory brainstem. In: The Mammalian Auditory Pathway: Neurophysiology, edited by A. N. Popper and R. R. Fay. New York: Springer-Verlag, 1992, p. 153–231.
114. Javel E. Coding of AM tones in the chinchilla auditory nerve:
implications for the pitch of complex tones. J Acoust Soc Am 68:
133–146, 1980.
115. Javel E and Mott JB. Physiological and psychophysical correlates of temporal processes in hearing. Hear Res 34: 275–294, 1988.
116. Jen PHS, Hou T, and Wu M. Neurons in the inferior colliculus,
auditory cortex and pontine nuclei of the FM bat, Eptesicus fuscus,
respond to pulse repetition rates differently. Brain Res 613: 152–
155, 1993.
117. Jenison RL. A Dynamic Model of the Auditory Periphery Based
on the Responses of Single Auditory-Nerve Fibers (PhD thesis).
Madison: Univ. of Wisconsin, 1991.
118. Jiang D, Palmer AR, and Winter IM. Frequency extent of two-tone facilitation in onset units in the ventral cochlear nucleus.
J Neurophysiol 75: 380–395, 1996.
119. John MS and Picton TW. Human auditory steady-state responses
to amplitude-modulated tones: phase and latency measurements.
Hear Res 141: 57–79, 2000.
120. Johnson DH. The Response of Single Auditory-Nerve Fibers in
the Cat to Single Tones: Synchrony and Average Discharge Rate
(PhD thesis). Cambridge, MA: MIT, 1974.
121. Johnson DH. The relationship between spike rate and synchrony
in responses of auditory-nerve fibers to single tones. J Acoust Soc
Am 68: 1115–1122, 1980.
122. Joris PX. Envelope coding in the lateral superior olive. II. Characteristic delays and comparison with responses in the medial
superior olive. J Neurophysiol 76: 2137–2156, 1996.
123. Joris PX. Interaural time sensitivity dominated by cochlea-induced
envelope patterns. J Neurosci 23: 6345–6350, 2003.
124. Joris PX, Carney LHC, Smith PH, and Yin TCT. Enhancement
of synchronization in the anteroventral cochlear nucleus. I. Responses to tonebursts at the characteristic frequency. J Neurophysiol 71: 1022–1036, 1994.
125. Joris PX and Smith PH. Temporal and binaural properties in
dorsal cochlear nucleus and its output tract. J Neurosci 18: 10157–
10170, 1998.
126. Joris PX, Smith PH, and Yin TCT. Coincidence detection in the
auditory system: 50 years after Jeffress. Neuron 21: 1235–1238,
1998.
127. Joris PX and Yin TCT. Responses to amplitude-modulated tones
in the auditory nerve of the cat. J Acoust Soc Am 91: 215–232, 1992.
128. Joris PX and Yin TCT. Envelope coding in the lateral superior
olive. I. Sensitivity to interaural time differences. J Neurophysiol
73: 1043–1062, 1995.
129. Joris PX and Yin TCT. Envelope coding in the lateral superior
olive. III. Comparison with afferent pathways. J Neurophysiol 79:
253–269, 1998.
130. Kaas JH and Hackett TA. Subdivisions of auditory cortex and
processing streams in primates. Proc Natl Acad Sci USA 97: 11793–
11799, 2000.
131. Kay RH. Hearing of modulation in sounds. Physiol Rev 62: 894–
975, 1982.
132. Kay RH and Matthews DR. On the existence in human auditory
pathways of channels selectively tuned to the modulation present
in frequency-modulated tones. J Physiol 225: 657–677, 1972.
133. Keller CH and Takahashi TT. Representation of temporal features of complex sounds by the discharge patterns of neurons in
the owl’s inferior colliculus. J Neurophysiol 84: 2638–2650, 2000.
134. Kenmochi M and Eggermont JJ. Autonomous cortical rhythms
affect temporal modulation transfer functions. Neuroreport 8:
1589–1593, 1997.
135. Khanna SM and Teich MC. Spectral characteristics of the responses of primary auditory-nerve fibers to amplitude-modulated
signals. Hear Res 39: 143–158, 1989.
136. Kiang NYS. Peripheral neural processing of auditory information.
In: Handbook of Physiology. The Nervous System. Sensory Processes. Bethesda, MD: Am Physiol Soc, 1984, sect. 1, vol. III, pt. 2,
chapt. 15, p. 639–674.
137. Kiang NYS. Curious oddments of auditory-nerve studies. Hear Res
49: 1–16, 1990.
138. Kilgard MP and Merzenich MM. Plasticity of temporal information processing in the primary auditory cortex. Nature Neurosci 1:
727–731, 1998.
139. Kilgard MP, Pandya PK, Vazquez J, Gehi A, Schreiner CE, and
Merzenich MM. Sensory input directs spatial and temporal plasticity in primary auditory cortex. J Neurophysiol 86: 326–338, 2001.
140. Kim DO, Rhode WS, and Greenberg S. Responses of cochlear
nucleus neurons to speech signals: neural encoding of pitch, intensity and other parameters. In: Auditory Frequency Selectivity,
edited by B. C. J. Moore and R. D. Patterson. New York: Plenum,
1986, p. 281–288.
141. Kim DO, Sirianni JG, and Chang SO. Responses of DCN-PVCN
neurons and auditory nerve fibers in unanesthetized cats to AM and
pure tones: analysis with autocorrelation/power-spectrum. Hear
Res 45: 95–113, 1990.
142. Kowalski N, Depireux DA, and Shamma SA. Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics
of single-unit responses to moving ripple spectra. J Neurophysiol
76: 3505–3523, 1996.
143. Kowalski N, Depireux DA, and Shamma SA. Analysis of dynamic spectra in ferret primary auditory cortex. II. Prediction of
unit responses to arbitrary dynamic spectra. J Neurophysiol 76:
3524–3534, 1996.
144. Krishna SB and Semple MN. Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in the inferior
colliculus. J Neurophysiol 84: 255–273, 2000.
145. Kuwada S and Batra R. Coding of sound envelopes by inhibitory
rebound in neurons of the superior olivary complex in the unanesthetized rabbit. J Neurosci 19: 2273–2287, 1999.
146. Kuwada S, Batra R, and Maher VL. Scalp potentials of normal
and hearing-impaired subjects in response to sinusoidally amplitude-modulated tones. Hear Res 21: 179–192, 1986.
147. Kuwada S, Yin TCT, Syka J, Buunen TJF, and Wickesberg RE.
Binaural interaction in low-frequency neurons in inferior colliculus
of the cat. IV. Comparison of monaural and binaural response
properties. J Neurophysiol 51: 1306–1325, 1984.
148. Kvale M and Schreiner CE. Perturbative M-sequences for auditory systems identification. Acustica 83: 653–658, 1997.
149. Langner G. Periodicity coding in the auditory system. Hear Res 60:
115–142, 1992.
150. Langner G. Neural processing and representation of periodicity
pitch. Acta Otolaryngol Suppl 532: 68–76, 1997.
151. Langner G, Sams M, Heil P, and Schulze H. Frequency and
periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetencephalography. J Comp
Physiol A Sens Neural Behav Physiol 181: 665–676, 1997.
152. Langner G and Schreiner CE. Periodicity coding in the inferior
colliculus of the cat. I. Neuronal mechanisms. J Neurophysiol 60:
1799–1822, 1988.
153. Langner G, Schreiner CE, and Merzenich MM. Co-variation of
latency and temporal resolution in the inferior colliculus of the cat.
Hear Res 31: 197–202, 1987.
154. Lavine RA. Phase-locking in response of single neurons in cochlear nuclear complex of the cat to low-frequency tonal stimuli.
J Neurophysiol 34: 467–483, 1971.
155. Le Beau FEN, Rees A, and Malmierca MS. Contribution of
GABA- and glycine-mediated inhibition to the monaural temporal
response properties of neurons in the inferior colliculus. J Neurophysiol 75: 902–919, 1996.
156. Lesser HD, O’Neill WE, Frisina RD, and Emerson RC. On-off
units in the moustached bat inferior colliculus are selective for
transients resembling “acoustic glint” from fluttering insect targets.
Exp Brain Res 82: 137–148, 1990.
157. Liang L, Lu T, and Wang X. Neural representations of sinusoidal
amplitude and frequency modulations in the primary auditory cortex of awake primates. J Neurophysiol 87: 2237–2261, 2002.
158. Liberman MC. Auditory-nerve response from cats raised in a
low-noise chamber. J Acoust Soc Am 63: 442–455, 1978.
159. Linden JF, Liu RC, Sahani M, Schreiner CE, and Merzenich
MM. Spectrotemporal structure of receptive fields in areas AI and
AAF of mouse auditory cortex. J Neurophysiol 90: 2660–2675,
2003.
160. Lorenzi C, Micheyl C, and Berthommier F. Neuronal correlates
of perceptual amplitude-modulation detection. Hear Res 90: 219–
227, 1995.
161. Lu T, Liang L, and Wang X. Neural representations of temporally
asymmetric stimuli in the auditory cortex of awake primates.
J Neurophysiol 85: 2364–2380, 2001.
162. Maison S, Micheyl C, and Collet L. Medial olivocochlear efferent
system in humans studied with amplitude-modulated tones. J Neurophysiol 77: 1759–1768, 1997.
163. Makela JP, Karmos G, Molnar M, Csepe V, and Winkler I.
Steady-state responses from the cat auditory cortex. Hear Res 45:
41–50, 1990.
164. Malmierca MS, Blackstad TW, Osen KK, and Molowny RL. The
central nucleus of the inferior colliculus in rat—a Golgi and computer reconstruction study of the neuronal and laminar structure.
J Comp Neurol 333: 1–27, 1993.
165. Malmierca MS, Leergaard TB, Bajo VM, Bjaalie JG, and Merchan MA. Anatomic evidence of a three-dimensional mosaic pattern of tonotopic organization in the ventral complex of the lateral
lemniscus in cat. J Neurosci 18: 10603–10618, 1998.
166. Malmierca MS and Merchán MA. The auditory system. In: The
Rat Nervous System, edited by G. Paxinos. San Diego, CA: Academic, 2004, p. 995–1080.
167. Malmierca MS, Rees A, Le Beau FEN, and Bjaalie JG. Laminar
organization of frequency-defined axons within and between the
inferior colliculi of the guinea pig. J Comp Neurol 357: 124–144,
1995.
168. Mardia KV and Jupp PE. Directional Statistics. New York: Wiley,
1999.
169. Markram H, Lübke J, Frotscher M, and Sakmann B. Regulation
of synaptic efficacy by coincidence of postsynaptic APs and EPSPs.
Science 275: 213–215, 1997.
170. Markram H and Tsodyks M. Redistribution of synaptic efficacy
between neocortical pyramidal neurons. Nature 382: 807–810,
1996.
171. McAlpine D. Are pitch neurones the result of difference tones on
the basilar membrane? Ass Res Otolaryngol Abstr 25: 40, 2002.
172. McAlpine D, Jiang D, Shackleton TM, and Palmer AR. Convergent input from brainstem coincidence detectors onto delay-sensitive neurons in the inferior colliculus. J Neurosci 18: 6026–6039,
1998.
173. Merzenich MM and Reid MD. Representation of the cochlea
within the inferior colliculus of the cat. Brain Res 77: 397–415,
1974.
174. Metherate R and Weinberger NM. Cholinergic modulation of
responses to single tones produces tone-specific receptive field
alterations in cat auditory cortex. Synapse 6: 133–145, 1990.
175. Miller LM, Escabi MA, Read HL, and Schreiner CE. Functional
convergence of response properties in the auditory thalamocortical
system. Neuron 32: 151–160, 2001.
176. Miller LM, Escabi MA, Read HL, and Schreiner CE. Spectrotemporal receptive fields in the lemniscal auditory thalamus and
cortex. J Neurophysiol 87: 516–527, 2002.
177. Miller MI and Sachs MB. Representation of voice pitch in discharge patterns of auditory-nerve fibers. Hear Res 14: 257–279,
1984.
178. Moody DB, Cole D, Davidson LM, and Stebbins WC. Evidence
for a reappraisal of the psychophysical selective adaption paradigm. J Acoust Soc Am 76: 1076–1079, 1984.
179. Moore BCJ. Effects of relative phase of the components on the
pitch of three component complex tones. In: Psychophysics and
Physiology of Hearing, edited by E. F. Evans and J. P. Wilson.
London: Academic, 1977, p. 349–358.
180. Moore BCJ. An Introduction to the Psychology of Hearing. San
Diego, CA: Academic, 2003.
181. Moore BCJ and Sek A. Effects of relative phase and frequency
spacing on the detection of three-component amplitude modulation. J Acoust Soc Am 108: 2337–2344, 2001.
182. Morest DK and Oliver DL. The neuronal architecture of the
inferior colliculus in the cat: defining the functional anatomy of the
auditory midbrain. J Comp Neurol 222: 209–236, 1984.
183. Møller AR. Coding of amplitude and frequency modulated sounds
in the cochlear nucleus of the rat. Acta Physiol Scand 86: 223–238,
1972.
184. Møller AR. Responses of units in the cochlear nucleus to sinusoidally amplitude-modulated tones. Exp Neurol 45: 104–117, 1974.
185. Møller AR. Latency of unit responses in cochlear nucleus determined in two different ways. J Neurophysiol 38: 812–821, 1975.
186. Møller AR. Dynamic properties of primary auditory fibers compared with cells in the cochlear nucleus. Acta Physiol Scand 98:
157–167, 1976.
187. Møller AR. Dynamic properties of the responses of single neurones in the cochlear nucleus of the rat. J Physiol 259: 63–82, 1976.
188. Møller AR. Coding of increments and decrements in stimulus
intensity in single units in the cochlear nucleus of the rat. J Neurosci Res 4: 1–8, 1979.
189. Møller AR and Rees A. Dynamic properties of the responses of
single neurons in the inferior colliculus of the rat. Hear Res 24:
203–215, 1986.
190. Müller-Preuss P. On the mechanisms of call coding through auditory neurons in the squirrel monkey. Eur Arch Psychiatry Neurol
Sci 236: 50–55, 1986.
191. Müller-Preuss P, Flachskamm C, and Bieser A. Neural encoding of amplitude modulation within the auditory midbrain of squirrel monkeys. Hear Res 80: 197–208, 1994.
192. Nagarajan S, Cheung S, Bedenbaugh P, Beitel R, Schreiner
CE, and Merzenich MM. Representation of spectral and temporal
envelope of twitter vocalizations in common marmoset primary
auditory cortex. J Neurophysiol 87: 1723–1737, 2002.
193. Neff WD, Diamond DM, and Casseday JH. Behavioral studies of
auditory discrimination. In: Handbook of Sensory Physiology, edited by W. D. Keidel and W. D. Neff. New York: Springer, 1975, p.
307–400.
194. Nelken I, Rotman Y, and Bar Yosef O. Responses of auditory-cortex neurons to structural features of natural sounds. Nature
397: 154–157, 1999.
195. Nelken I and Young ED. Two separate inhibitory mechanisms
shape the responses of dorsal cochlear nucleus type IV units to
narrowband and wideband stimuli. J Neurophysiol 71: 2446–2462,
1994.
196. Nelson PG, Erulkar SD, and Bryan JS. Responses of units of the
inferior colliculus to time-varying acoustic stimuli. J Neurophysiol
29: 834–860, 1966.
197. Neuert V, Pressnitzer D, Patterson RD, and Winter IM. The
responses of single units in the inferior colliculus of the guinea pig
to damped and ramped sinusoids. Hear Res 159: 36–52, 2001.
198. Neuweiler G. Auditory adaptations for prey capture in echolocating bats. Physiol Rev 70: 615–641, 1990.
199. Oertel D, Bal R, Gardner SM, Smith PH, and Joris PX. Detection of synchrony in the activity of auditory nerve fibers by octopus
cells of the mammalian cochlear nucleus. Proc Natl Acad Sci USA
97: 11773–11779, 2000.
200. Oliver DL and Huerta MF. Inferior and superior colliculi. In: The
Mammalian Auditory Pathway: Neuroanatomy, edited by D. B.
Webster, A. N. Popper, and R. R. Fay. New York: Springer-Verlag,
1992, p. 168–221.
201. Oliver DL and Shneiderman A. The anatomy of the inferior
colliculus: a cellular basis for integration of monaural and binaural
information. In: The Neurobiology of Hearing: The Central Auditory System, edited by R. A. Altschuler, R. P. Bobbin, B. M. Clopton,
and D. W. Hoffman. New York: Raven, 1991, p. 195–222.
202. Osen KK. Cytoarchitecture of the cochlear nuclei in the cat.
J Comp Neurol 136: 453–483, 1969.
203. Palmer AR. Encoding of rapid amplitude fluctuations by cochlear-nerve fibres in the guinea-pig. Arch Oto-Rhino-Laryngol 236: 197–
202, 1982.
204. Palombi PS, Backoff PM, and Caspary DM. Responses of young
and aged rat inferior colliculus neurons to sinusoidally amplitude
modulated stimuli. Hear Res 153: 174–180, 2001.
205. Patterson RD. The sound of a sinusoid: spectral models. J Acoust
Soc Am 96: 1409–1418, 1994.
206. Patuzzi RB and Robertson D. Tuning in the mammalian cochlea.
Physiol Rev 68: 1009–1082, 1988.
207. Pelleg-Toiba R and Wollberg Z. Discrimination of communication calls in the squirrel monkey: “call detectors” or “cell assemblies.” J Basic Clin Physiol Pharmacol 2: 257–271, 1991.
208. Perkel DH and Bullock TH. Neural coding. Neurosci Res Program 6: 221–348, 1968.
209. Peruzzi D, Sivaramakrishnan S, and Oliver DL. Identification of
cell types in brain slices of the inferior colliculus. Neuroscience
101: 403–416, 2000.
210. Phillips DP. Neural representation of stimulus time in the primary
auditory cortex. Ann NY Acad Sci 682: 104–118, 1993.
211. Phillips DP and Hall SE. Responses of single neurons in cat
auditory cortex to time-varying stimuli: linear amplitude modulations. Exp Brain Res 67: 479–492, 1987.
212. Phillips DP, Hall SE, and Hollett JL. Repetition rate and signal
level effects on neuronal responses to brief tone pulses in cat
auditory cortex. J Acoust Soc Am 85: 2537–2549, 1989.
213. Picton TW, Dauman R, and Aran JM. Steady-state responses
produced in humans using sinusoidal frequency-modulation. J Otolaryngol 16: 140–145, 1987.
214. Plomp R. The role of modulation in hearing. In: Hearing: Physiological Bases and Psychophysics, edited by R. Klinke and R.
Hartmann. Berlin: Springer-Verlag, 1983, p. 270–276.
215. Poon PW and Chiu TW. Single cell responses to AM tones of
different envelopes at the auditory midbrain. In: Acoustical Signal
Processing in the Central Auditory System, edited by J. Syka. New
York: Plenum, 1997, p. 253–261.
216. Pressnitzer D, Winter IM, and Patterson RD. The responses of
single units in the ventral cochlear nucleus of the guinea pig to
damped and ramped sinusoids. Hear Res 149: 155–166, 2000.
217. Preuss A and Müller-Preuss P. Processing of amplitude modulated sounds in the medial geniculate body of the squirrel monkey.
Exp Brain Res 79: 201–211, 1990.
218. Read HL, Winer JA, and Schreiner CE. Functional architecture
of auditory cortex. Curr Opin Neurobiol 12: 433–440, 2002.
219. Reale RA and Brugge JF. Auditory cortical neurons are sensitive
to static and continuously changing interaural phase cues. J Neurophysiol 64: 1247–1260, 1990.
220. Rees A, Green GGR, and Kay RH. Steady-state evoked responses
to sinusoidally amplitude-modulated sounds recorded in man. Hear
Res 23: 123–133, 1986.
221. Rees A, Malmierca MS, and Le Beau FEN. Regularity of firing of
neurons in the inferior colliculus. J Neurophysiol 77: 2945–2965,
1997.
222. Rees A and Møller AR. Responses of neurons in the inferior
colliculus of the rat to AM and FM tones. Hear Res 10: 301–330,
1983.
223. Rees A and Møller AR. Stimulus properties influencing the responses of inferior colliculus neurons to amplitude-modulated
sounds. Hear Res 27: 129–143, 1987.
224. Rees A and Palmer AR. Neuronal responses to amplitude-modulated and pure-tone stimuli in the guinea pig inferior colliculus, and
their modification by broadband noise. J Acoust Soc Am 85: 1978–
1994, 1989.
225. Rees A and Sarbaz A. The influence of intrinsic oscillations on the
encoding of amplitude modulation by neurons in the inferior colliculus. In: Acoustical Signal Processing in the Central Auditory
System, edited by J. Syka. New York: Plenum, 1997, p. 239–252.
226. Rhode WS. Interspike intervals as a correlate of periodicity pitch
in cat cochlear nucleus. J Acoust Soc Am 97: 2414–2429, 1995.
227. Rhode WS. Physiological-morphological properties of the cochlear
nucleus. In: Neurobiology of Hearing: the Central Auditory System, edited by R. A. Altschuler, R. P. Bobbin, B. M. Clopton, and
D. W. Hoffman. New York: Raven, 1991, p. 47–77.
228. Rhode WS. Temporal coding of 200% amplitude modulated signals
in the ventral cochlear nucleus of the cat. Hear Res 77: 43–68, 1994.
229. Rhode WS and Greenberg S. Encoding of amplitude modulation
in the cochlear nucleus of the cat. J Neurophysiol 71: 1797–1825,
1994.
230. Rhode WS and Smith PH. Characteristics of tone-pip response
patterns in relationship to spontaneous rate in cat auditory nerve
fibers. Hear Res 18: 159–168, 1985.
231. Rhode WS and Smith PH. Encoding timing and intensity in the
ventral cochlear nucleus of the cat. J Neurophysiol 56: 261–286,
1986.
232. Rieke F, Bodnar D, and Bialek W. Naturalistic stimuli increase
the rate and efficiency of information transmission by primary
auditory neurons. Proc R Soc Lond B Biol Sci 262: 259–265, 1995.
233. Ritsma RJ. Existence region of the tonal residue. J Acoust Soc Am
34: 1224–1229, 1962.
234. Robles L and Ruggero MA. Mechanics of the mammalian cochlea.
Physiol Rev 81: 1305–1352, 2001.
235. Rodenburg M, Verveij C, and Van Den Brink G. Analysis of
evoked responses in man elicited by sinusoidally amplitude modulated noise. Audiology 11: 283–293, 1972.
236. Rodrigues-Dagaeff C, Simm G, De Ribaupierre Y, Villa A, De
Ribaupierre F, and Rouiller EM. Functional organization of the
ventral division of the medial geniculate body of the cat: evidence
for a rostro-caudal gradient of response properties and cortical
projections. Hear Res 39: 103–126, 1989.
237. Rose JE, Greenwood DD, Goldberg JM, and Hind JE. Some
discharge characteristics of single neurons in the inferior colliculus
of the cat. I. Tonotopical organization, relation of spike-counts to
tone intensity, and firing patterns of single elements. J Neurophysiol 26: 294–320, 1963.
238. Rosen S. Temporal information in speech: auditory and linguistic
aspects. Philos Trans R Soc Lond B Biol Sci 336: 367–373, 1992.
239. Ross B, Borgmann C, Draganova R, Roberts LE, and Pantev C.
A high-precision magnetoencephalographic study of human auditory steady-state responses to amplitude-modulated tones. J
Acoust Soc Am 108: 679–691, 2000.
240. Rouiller E and De Ribaupierre F. Neurons sensitive to narrow
ranges of repetitive acoustic transients in the medial geniculate
body of the cat. Brain Res 48: 323–326, 1982.
241. Rouiller E, De Ribaupierre Y, Toros-Morel A, and De Ribaupierre F. Neural coding of clicks in the medial geniculate body of
cat. Hear Res 5: 81–100, 1981.
242. Ruggero MA. Systematic errors in indirect estimates of basilar
membrane travel times. J Acoust Soc Am 67: 707–710, 1980.
243. Ruggero MA. Physiology and coding of sound in the auditory
nerve. In: The Mammalian Auditory Pathway: Neurophysiology,
edited by R. R. Fay and A. N. Popper. New York: Springer-Verlag,
1992, p. 34–93.
244. Ruggero MA and Rich NC. Timing of spike initiation in cochlear
afferents: dependence on site of innervation. J Neurophysiol 58:
379–403, 1987.
245. Ryan MJ and Rand AS. Phylogenetic inference and the evolution
of communication in tungara frogs. In: The Design of Animal
Communication, edited by M. D. Hauser and M. Konishi. Cambridge, MA: MIT, 1999, p. 535–557.
246. Sachs MB and Abbas PJ. Rate versus level functions for auditorynerve fibers in cats: tone-burst stimuli. J Acoust Soc Am 56: 1835–
1847, 1974.
247. Saitoh K, Maruyama N, and Kudoh M. Sustained response of
auditory cortex units in the cat. In: Brain Mechanisms of Sensation, edited by Y. Katsuki, R. Norgren, and M. Sato. New York:
Wiley, 1981, p. 31– 43.
248. Saldana E, Feliciano M, and Mugnaini E. Distribution of descending projections from primary auditory neocortex to inferior
colliculus mimics the topography of intracollicular projections.
J Comp Neurol 371: 15– 40, 1996.
249. Sally SL and Kelly JB. Organization of auditory cortex in the
albino rat: sound frequency. J Neurophysiol 59: 1627–1638, 1988.
250. Schorer E. Critical modulation frequency based on detection of
AM versus FM tones. J Acoust Soc Am 79: 1054–1057, 1986.
251. Schreiner CE and Langner G. Periodicity coding in the inferior
colliculus of the cat. II. Topographical organization. J Neurophysiol 60: 1823–1840, 1988.
252. Schreiner CE and Langner G. Laminar fine structure of frequency organization in auditory midbrain. Nature 388: 383–386,
1997.
253. Schreiner CE and Raggio MW. Neuronal responses in cat primary auditory cortex to electrical cochlear stimulation. II. Repetition rate coding. J Neurophysiol 75: 1283–1300, 1996.
254. Schreiner CE and Snyder RL. Modulation transfer characteristics of neurons in the dorsal cochlear nucleus of the cat. Soc
Neurosci Abstr 13: 1258, 1987.
255. Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of the cat. I. The anterior auditory
field (AAF). Hear Res 21: 227–241, 1986.
256. Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of the cat. II. Comparison between
cortical fields. Hear Res 32: 49–64, 1988.
257. Schroeder MR. Modulation transfer functions: definition and measurement. Acustica 49: 179–182, 1981.
258. Schuller G. Natural ultrasonic echoes from wing beating insects
are encoded by collicular neurons in the CF-FM bat, Rhinolophus
ferrumequinum. J Comp Physiol A Sens Neural Behav Physiol 155:
121–128, 1984.
259. Schulze H and Langner G. Periodicity coding in the primary
auditory cortex of the Mongolian gerbil (Meriones unguiculatus):
two different coding strategies for pitch and rhythm? J Comp
Physiol A Sens Neural Behav Physiol 181: 651–664, 1997.
260. Schulze H and Langner G. Representation of periodicity pitch in
the primary auditory cortex of the Mongolian gerbil. Acta Otolaryngol Suppl 532: 89–95, 1997.
261. Schulze H and Langner G. Auditory cortical responses to amplitude modulations with spectra above frequency receptive fields:
evidence for wide band spectral integration. J Comp Physiol A
Sens Neural Behav Physiol 185: 493–508, 1999.
262. Schwartz IR. Superior olivary complex and lateral lemniscal nuclei. In: The Mammalian Auditory Pathway: Neuroanatomy, edited by D. B. Webster, A. N. Popper, and R. R. Fay. New York:
Springer-Verlag, 1992, p. 117–167.
263. Sek A and Moore BCJ. The critical modulation frequency and its
relationship to auditory filtering at low frequencies. J Acoust Soc
Am 95: 2606–2615, 1994.
264. Semple MN and Aitkin LM. Representation of sound frequency
and laterality by units in central nucleus of cat inferior colliculus.
J Neurophysiol 42: 1626–1639, 1979.
265. Shannon RV, Zeng FG, Kamath V, Wygonski J, and Ekelid M.
Speech recognition with primarily temporal cues. Science 270:
303–304, 1995.
266. Shofner WP, Sheft S, and Guzman SJ. Responses of ventral
cochlear nucleus units in the chinchilla to amplitude modulation by
low-frequency, two-tone complexes. J Acoust Soc Am 99: 3592–3605, 1996.
267. Sinex DG, Henderson J, Li HZ, and Chen GD. Responses of
chinchilla inferior colliculus neurons to amplitude-modulated
tones with different envelopes. J Assoc Res Otolaryngol 3: 390–402, 2002.
268. Sivaramakrishnan S and Oliver DL. Distinct K currents result in
physiologically distinct cell types in the inferior colliculus of the
rat. J Neurosci 21: 2861–2877, 2001.
269. Smith PH, Joris PX, and Yin TCT. Projections of physiologically
characterized spherical bushy cell axons from the cochlear nucleus
of the cat: evidence for delay lines to the medial superior olive.
J Comp Neurol 331: 245–260, 1993.
270. Smith RL and Brachman ML. Response modulation of auditory-nerve fibers by AM stimuli: effects of average intensity. Hear Res 2:
123–133, 1980.
271. Smith RL and Brachman ML. Adaptation in auditory-nerve fibers:
a revised model. Biol Cybern 44: 107–120, 1982.
272. Smith ZM, Delgutte B, and Oxenham AJ. Chimaeric sounds
reveal dichotomies in auditory perception. Nature 416: 87–90, 2002.
273. Sovijarvi ARA. Detection of natural complex sounds by cells in
the primary auditory cortex of the cat. Acta Physiol Scand 93:
318–335, 1975.
274. Steinschneider M, Reser DH, Fishman YI, Schroeder CE, and
Arezzo JC. Click train encoding in primary auditory cortex of the
awake monkey: evidence for two mechanisms subserving pitch
perception. J Acoust Soc Am 104: 2935–2955, 1998.
275. Struhsaker CT. Auditory communication among vervet monkeys
(Cercopithecus aethiops). In: Social Communication Among Primates, edited by S. A. Altmann. Chicago, IL: Univ. of Chicago Press,
1967, p. 281–324.
276. Suga N, O’Neill WE, Kujirai K, and Manabe T. Specificity of
combination sensitive neurons for processing of complex biosonar
signals in auditory cortex of the mustached bat. J Neurophysiol 49:
1573–1626, 1983.
277. Symmes D. Discrimination of intermittent noise by macaques following lesions of the temporal lobe. Exp Neurol 16: 201–214, 1966.
278. Terhardt E. Über die durch amplitudenmodulierte Sinustöne hervorgerufene Hörempfindung. Acustica 20: 210–214, 1968.
279. Tsuchitani C and Johnson DH. Binaural cues and signal processing in the superior olivary complex. In: Neurobiology of Hearing:
The Central Auditory System, edited by R. A. Altschuler, R. P.
Bobbin, B. M. Clopton, and D. W. Hoffman. New York: Raven, 1991,
p. 163–193.
280. Ulanovsky N, Las L, and Nelken I. Processing of low-probability
sounds by cortical neurons. Nature Neurosci 6: 391–398, 2003.
281. Van Tassell DJ, Soli SD, Kirby VM, and Widin GP. Speech
waveform envelope cues for consonant recognition. J Acoust Soc
Am 82: 1152–1161, 1987.
282. Vater M. Single unit responses in cochlear nucleus of horseshoe
bats to sinusoidal frequency and amplitude modulated signals.
J Comp Physiol A Sens Neural Behav Physiol 149: 369–388, 1982.
283. Verhey JL, Dau T, and Kollmeier B. Within-channel cues in
comodulation masking release (CMR): experiments and model predictions using a modulation-filterbank model. J Acoust Soc Am 106:
2733–2745, 1999.
284. Vernier VG and Galambos R. Response of single medial geniculate units to repetitive click stimuli. Am J Physiol 188: 233–237,
1957.
285. Viemeister NF. Temporal modulation transfer functions based
upon modulation thresholds. J Acoust Soc Am 66: 1364–1380, 1979.
286. Viemeister NF and Plack CJ. Time analysis. In: Human Psychophysics, edited by W. A. Yost, A. N. Popper, and R. R. Fay. New
York: Springer, 1993, p. 116–154.
287. Von Helmholtz HLF. Die Lehre von den Tonempfindungen als
physiologische Grundlage für die Theorie der Musik. Trans. Ellis
AJ 1954. On the Sensations of Tone as a Physiological Basis for
the Theory of Music. New York: Dover, 1863.
288. Voss RF and Clarke J. 1/f noise in music and speech. Nature 258:
317–318, 1975.
289. Wakefield GH and Viemeister NF. Selective adaptation to linear
frequency-modulated sweeps: evidence for direction-specific FM
channels. J Acoust Soc Am 75: 1588–1592, 1984.
290. Walton JP, Frisina RD, and O’Neill WE. Age-related alteration in
processing of temporal sound features in the auditory midbrain of
the CBA mouse. J Neurosci 18: 2764–2776, 1998.
291. Walton JP, Simon H, and Frisina RD. Age-related alterations in
the neural coding of envelope periodicities. J Neurophysiol 88:
565–578, 2002.
292. Wang X, Merzenich MM, Beitel R, and Schreiner CE. Representation of a species-specific vocalization in the primary auditory
cortex of the common marmoset: temporal and spectral characteristics. J Neurophysiol 74: 2685–2706, 1995.
293. Wang X, Merzenich MM, Beitel R, and Schreiner CE. Representation of a species-specific vocalization in the primary auditory
cortex of the common marmoset: temporal and spectral characteristics. J Neurophysiol 74: 2685–2706, 1995.
294. Wang X and Sachs MB. Neural encoding of single-formant stimuli
in the cat. I. Responses of auditory nerve fibers. J Neurophysiol 70:
1054–1075, 1993.
295. Wang X and Sachs MB. Neural encoding of single-formant stimuli
in the cat. II. Responses of anteroventral cochlear nucleus units.
J Neurophysiol 71: 59–78, 1994.
296. Wang X and Sachs MB. Transformation of temporal discharge
patterns in a ventral cochlear nucleus stellate cell model: implications for physiological mechanisms. J Neurophysiol 73: 1600–1616,
1995.
297. Warr WB. Parallel ascending pathways from the cochlear nucleus:
neuroanatomical evidence of functional specialization. In: Contributions to Sensory Physiology, edited by W. D. Neff. New York:
Academic, 1982, p. 1–38.
298. Weiss TF and Rose C. A comparison of synchronization filters in
different auditory receptor organs. Hear Res 33: 175–180, 1988.
299. Whitfield IC. Auditory cortex and the pitch of complex tones. J
Acoust Soc Am 67: 644–647, 1980.
300. Whitfield IC and Evans EF. Responses of auditory cortical neurons to stimuli of changing frequency. J Neurophysiol 28: 655–672,
1965.
301. Wiegrebe L and Winter IM. Temporal representation of iterated
rippled noise as a function of delay and sound level in the ventral
cochlear nucleus. J Neurophysiol 85: 1206–1219, 2001.
302. Wightman FL and Green DM. The perception of pitch. Am Sci 62:
208–215, 1974.
303. Wilson HR and Wilkinson F. Evolving concepts of spatial channels in vision: from independence to nonlinear interactions. Perception 26: 939–960, 1997.
304. Winer JA. The functional architecture of the medial geniculate
body and primary auditory cortex. In: The Mammalian Auditory
Pathway: Neuroanatomy, edited by D. B. Webster, A. N. Popper,
and R. R. Fay. New York: Springer-Verlag, 1992, p. 222–409.
305. Winter IM, Robertson D, and Yates GK. Diversity of characteristic frequency rate-intensity functions in guinea pig auditory nerve
fibers. Hear Res 45: 191–202, 1990.
306. Winter P and Funkenstein HH. The effects of species-specific
vocalization on the discharge of auditory cortical cells in the awake
squirrel monkey (Saimiri sciureus). Exp Brain Res 18: 489–504,
1973.
307. Wollberg Z and Newman JD. Auditory cortex of squirrel monkey:
response patterns of single cells to species-specific vocalisations.
Science 175: 212–214, 1972.
308. Wong D, Maekawa M, and Tanaka H. The effects of pulse
repetition rate on the delay sensitivity of neurons in the auditory
cortex of the FM bat, Myotis lucifugus. J Comp Physiol A Sens
Neural Behav Physiol 170: 393–402, 1992.
309. Wong SW and Schreiner CE. Representation of CV-sounds in cat
primary auditory cortex: intensity dependence. Speech Communication 41: 93–106, 2003.
310. Yang L and Pollak GD. Differential response properties to amplitude modulated signals in the dorsal nucleus of the lateral lemniscus of the mustache bat and the roles of GABAergic inhibition.
J Neurophysiol 77: 324–340, 1997.
311. Yates GK. Dynamic effects in the input/output relationship of
auditory nerve. Hear Res 27: 221–230, 1987.
312. Yin TCT. Neural mechanisms of encoding binaural localization
cues in the auditory brainstem. In: Integrative Functions in the
Mammalian Auditory Pathway, edited by D. Oertel, A. N. Popper,
and R. R. Fay. New York: Springer, 2002, p. 99–159.
313. Yin TCT and Chan JCK. Interaural time sensitivity in medial
superior olive of cat. J Neurophysiol 64: 465–488, 1990.
314. Yin TCT, Chan JCK, and Irvine DRF. Effects of interaural time
delays of noise stimuli on low-frequency cells in the cat’s inferior
colliculus. I. Responses to wideband noise. J Neurophysiol 55:
280–300, 1986.
315. Yin TCT, Chan JK, and Kuwada S. Characteristic delays and
their topographic distribution in the inferior colliculus of the cat.
In: Mechanisms of Hearing, edited by W. R. Webster and L. M.
Aitkin. Clayton, Victoria, Australia: Monash Univ. Press, 1983, p.
94–99.
316. Yin TCT, Joris PX, Smith PH, and Chan JCK. Neuronal processing for coding interaural time disparities. In: Binaural and
Spatial Hearing in Real and Virtual Environments, edited by R.
Gilkey and T. Anderson. New York: Lawrence Erlbaum, 1997, p.
427–445.
317. Yin TCT, Kuwada S, and Sujaku Y. Interaural time sensitivity of
high-frequency neurons in the inferior colliculus. J Acoust Soc Am
76: 1401–1410, 1984.
318. Yost WA, Sheft S, and Opie J. Modulation interference in detection and discrimination of amplitude modulation. J Acoust Soc Am
86: 2138–2147, 1989.
319. Young ED. The cochlear nucleus. In: The Synaptic Organization
of the Brain, edited by G. M. Shepherd. New York: Oxford Univ.
Press, 1998, p. 121–158.
320. Young ED, Robert JM, and Shofner WP. Regularity and latency
of units in ventral cochlear nucleus: implications for unit classification and generation of response properties. J Neurophysiol 60:
1–29, 1988.
321. Zhang HM and Kelly JB. AMPA and NMDA receptors regulate
responses of neurons in the rat’s inferior colliculus. J Neurophysiol
86: 871–880, 2001.
322. Zhao HB and Liang ZA. Processing of modulation frequency in
the dorsal cochlear nucleus of the guinea pig: amplitude modulated
tones. Hear Res 82: 244–256, 1995.
323. Zhao HB and Liang ZA. Temporal encoding and transmitting of
amplitude and frequency modulations in dorsal cochlear nucleus.
Hear Res 106: 83–94, 1997.
324. Zwicker E. Die Grenzen der Hörbarkeit der Amplitudenmodulation und der Frequenzmodulation eines Tones. Acustica 2: 125–133,
1952.
325. Zwicker E and Fastl H. Psychoacoustics. Berlin: Springer, 1999.