Auditory Brain Stem Response to Complex Sounds: A Tutorial

Erika Skoe1 and Nina Kraus1,2
This tutorial provides a comprehensive overview of the methodological approach to collecting and analyzing auditory brain stem responses to complex sounds (cABRs). cABRs provide a window into
how behaviorally relevant sounds such as speech and music are
processed in the brain. Because temporal and spectral characteristics of sounds are preserved in this subcortical response, cABRs can
be used to assess specific impairments and enhancements in
auditory processing. Notably, subcortical auditory function is neither
passive nor hardwired but dynamically interacts with higher-level
cognitive processes to refine how sounds are transcribed into neural
code. This experience-dependent plasticity, which can occur on a
number of time scales (e.g., life-long experience with speech or
music, short-term auditory training, on-line auditory processing),
helps shape sensory perception. Thus, by being an objective and
noninvasive means for examining cognitive function and experience-dependent processes in sensory activity, cABRs have considerable
utility in the study of populations where auditory function is of
interest (e.g., auditory experts such as musicians, and persons with
hearing loss, auditory processing disorders, and language disorders). This
tutorial is intended for clinicians and researchers seeking to integrate
cABRs into their clinical or research programs.
Early studies of the ABR used simple stimuli such as clicks
and sinusoidal tones to tap into and maximize these transient
and sustained ABRs. Although clicks and tones have been
instrumental in defining these basic response patterns, they are
poor approximations of the behaviorally relevant sounds that
we encounter outside the laboratory (e.g., speech and music,
nonspeech vocal sounds, and environmental sounds). Moreover, although complex sounds include both sustained and
transient features, the response to a complex sound is not
necessarily predictable from the responses to clicks and tones
(Palmer & Shamma 2004; Song et al. 2006; Johnson et al.
2008a). For these reasons, auditory neuroscience has gradually
transitioned to using sounds that are more complex.
Greenberg (1980) was one of the first to adopt complex
stimuli for recording ABRs. After the seminal work by Young
and Sachs (1979) showing that speech formants are preserved
in the discharge pattern of the auditory nerve, Greenberg
(1980) observed that speech-specific information (i.e., vowel
formants) is also faithfully encoded in the ABR. This stimulus
fidelity was further supported by Galbraith et al. (1995), who
demonstrated that cABRs to words can be heard as intelligible
speech when converted from a neural signal into an audio
signal (refer to supplemental audio files). Importantly, because
cABRs occur several milliseconds after the stimulus onset,
they reflect a response of neural origin and not the cochlear
microphonic (CM) (Galbraith et al. 1995), which occurs nearly
simultaneously with the stimulus onset (see cABR Collection
section for more details on the CM).
A plethora of complex stimuli has now been used to
examine how the temporal and spectral features of sounds are
preserved in the ABR (Table 1). The two most extensively
studied are the consonant-vowel (CV) syllable /dɑ/ (Cunningham et al. 2001; Plyler & Ananthanarayan 2001; King et al.
2002; Russo et al. 2004, 2005; Wible et al. 2004, 2005; Kraus
& Nicol 2005; Musacchia et al. 2007; Johnson et al. 2008a;
Banai et al. 2009; Burns et al. 2009; Parbery-Clark et al. 2009a;
Hornickel et al. 2009b; Chandrasekaran et al. 2009) and
Mandarin syllables with differing pitch contours (i.e., lexical
tones) (Krishnan et al. 2004, 2005, 2009b; Xu et al. 2006;
Wong et al. 2007; Song et al. 2008; reviewed by Krishnan &
Gandour 2009). The ABR to /dɑ/ has been investigated under
different recording conditions: monaural (Cunningham et al.
2001; Banai et al. 2009) and binaural (Musacchia et al. 2008;
Parbery-Clark et al. 2009a) stimulation; left ear and right ear
stimulation (Hornickel et al. 2009a); audiovisual and auditory-only stimulation (Musacchia et al. 2006, 2007); and in the
presence of background noise (Cunningham et al. 2001; Russo
et al. 2004, 2005, 2008; Parbery-Clark et al. 2009a). Moreover,
in addition to manipulating stimulus parameters (e.g., duration
of the stimulus, duration of the formant transition, and formant
frequency settings), cABRs to /dɑ/ have been evaluated before
and after auditory training (Russo et al. 2008b; Song et al.
INTRODUCTION
The human soundscape is characterized by complex sounds
with rich harmonic structures, dynamic amplitude modulations,
and rapid spectrotemporal fluctuations. This complexity is
represented by an exceptionally precise temporal and spectral
neural code within the auditory brain stem, an ensemble of
nuclei belonging to the efferent and afferent auditory systems.
Within the brain stem, two broad classes of time-locked
responses can be defined, namely, transient and sustained. As
the names suggest, brief, nonperiodic stimulus features evoke
transient responses, whereas periodic features elicit sustained
phase-locked responses (Fig. 1). Discovered nearly 40 years
ago (Jewett et al. 1970; Jewett & Williston 1971; Moushegian
et al. 1973), auditory brain stem responses (ABRs) can be
measured using scalp electrodes that pick up electrical potentials generated by the synchronous activity of populations of
neurons in the brain stem. Because these aggregate neural
responses can be recorded objectively and passively, they offer
an excellent means to assess auditory function in a clinical
setting. It is for this reason that the click-evoked ABR has
enjoyed wide-scale clinical use as a metric for determining
auditory thresholds and detecting neuropathologies (Sininger
1993; Starr et al. 1996; Hood 1998; Hall 2006).
Departments of 1Communication Sciences and 2Neurobiology and Physiology, Otolaryngology; Northwestern University, Evanston, Illinois. Auditory
Neuroscience Laboratory www.brainvolts.northwestern.edu.
Supplemental digital content is available for this article. Direct URL citations
appear in the printed text, and links to the digital files are provided in the
HTML text of this article on the journal’s Web site (www.ear-hearing.com).
Fig. 1. Transient and sustained features in response to /dɑ/. Time domain
representation of a 40 msec stimulus /dɑ/ (gray) and response (black). The
cABR to /dɑ/ includes both transient and sustained response features. This
stimulus evokes seven characteristic response peaks that we have termed V,
A, C, D, E, F, and O. As can be seen in this figure, these peaks relate to
major acoustic landmarks in the stimulus. Peaks occur ~7 to 8 msec after
the corresponding stimulus landmark, which is consistent with neural
transmission time between the cochlea and rostral brain stem. In this figure,
the stimulus waveform is shifted in time to account for this transmission
time and to maximize the visual coherence between the two signals. The
V-A complex, often referred to as the onset response, is analogous to the
click-evoked wave V-Vn complex. This sharp onset response arises from
the broadband stop burst associated with /d/. Along with V and A, C and O
are considered transient responses because they correspond to transient
stimulus features, the beginning and end of voicing, respectively. The
region bounded by D and F forms the frequency following response. Peaks
D, E, and F and the small voltage fluctuations between them correspond to
sustained stimulus features, namely the fundamental frequency (F0) and its
harmonics within the consonant-vowel formant transition. The D-E and E-F
interpeak intervals (~8 to 9 msec in duration, arrows) occur at the period of
the F0 of the stimulus, which ramps from 103 to 125 Hz. We have
developed a systematic approach for identifying these peaks and have
established normative data for 3- to 4-yr olds, 5- to 12-yr olds, and young
adults (Johnson et al. 2008b; Dhar et al. 2009). Here, and in all figures
showing a stimulus waveform, the stimulus plot is scaled to match the size
of the response. Hence, the microvolt bar refers only to the response.
2008); across the lifespan (Johnson et al. 2008b; Burns et al.
2009; Anderson et al. 2010); and in a number of different
populations including musicians (Musacchia et al. 2007, 2008;
Parbery-Clark et al. 2009a) and children with dyslexia, specific
language impairment, and autism spectrum disorders (Cunningham et al. 2001; Banai et al. 2005; Banai & Kraus 2008;
Banai et al. 2009; Hornickel et al. 2009b; Chandrasekaran et al.
2009; Russo et al. 2009). Similarly, ABRs to syllables with
Mandarin pitch contours have been studied from numerous
perspectives, including in native and nonnative Mandarin
speakers (Krishnan et al. 2005; Xu et al. 2006); in musicians
and nonmusicians (Bidelman et al. 2009; Wong et al. 2007);
before and after auditory training (Song et al. 2008); under
speech and nonspeech conditions (i.e., Mandarin and musical
pitch contours embedded in iterated rippled noise) (Bidelman
et al. 2009; Swaminathan et al. 2008; Krishnan et al. 2009a);
and using native (curvilinear; Krishnan et al. 2005) and
non-native (linear; Xu et al. 2006) Mandarin pitch contours
(Table 1).
The list of stimuli used to evoke cABRs extends beyond
speech syllables and includes words and phrases (e.g., “car,”
“minute,” “chair,” “rose” [Galbraith et al. 1995], “dani” (Wang et
al. 2010), and "chicken pot pie" [Galbraith et al. 2004]). Investigators have also started to explore the use of environmental
sounds, affective nonspeech vocal sounds (e.g., a baby’s cry; Strait
et al. 2009b) and musical sounds as viable stimuli for brain
stem-evoked recordings. Work on music-evoked ABRs has included a bowed cello note (Musacchia et al. 2007, 2008), a
five-note musical melody (Skoe & Kraus 2009), as well as
consonant and dissonant two-note intervals synthesized from an
electric piano (Lee et al. 2009) and tone complexes (Greenberg et
al. 1997; Bidelman & Krishnan 2009). Despite being a relatively
new endeavor, we anticipate that this arena of research will
experience a surge in the upcoming years.
The study of cABRs is a young science. There are many
more stimuli, populations, and experimental paradigms yet to
be explored. Although interest in this topic is growing, most of
the existing cABR research has come from a handful of
laboratories around the world. Each laboratory has taken a
slightly different approach to collecting and analyzing cABRs,
and although the exact methodologies may differ, this work has
led to a complete rethinking of what the human ABR represents
and how it can be used to study auditory function. Taken as a
whole, this work demonstrates that cABRs provide an objective
and noninvasive means for studying auditory function in expert
(e.g., musicians, native speakers), typically developing, and impaired populations (e.g., persons with hearing loss, auditory
processing disorders, and language impairments). Perhaps most
crucially, this work has revealed that subcortical processing of
sound is not hardwired. It is instead malleable with experience and
inextricably linked to cognitive functions involving language and
music. This retuning of subcortical function likely involves the
corticofugal pathway, an expansive tract of efferent connections
that are even more abundant than afferent connections (Galbraith
and Doan 1995; Krishnan et al. 2005; Banai & Kraus 2008; see
Banai et al. 2009 and Kraus et al. 2009 for treatments of language
and music work, respectively, and Tzounopoulos and Kraus 2009
for a review of experience-dependent processes).
It should be pointed out that we use the term “auditory brain
stem response" to describe both transient and sustained responses originating in the auditory brain stem (see Chandrasekaran & Kraus 2009 for an in-depth discussion of the
origins of the cABR). Because the acronym ABR has been
widely adopted to refer to click-evoked ABRs, ABRs to
complex sounds are often differentiated by other names including speech-ABR or music-ABR. However, because this tutorial
focuses on complex sounds in a more general sense that
includes nonspeech vocal sounds and environmental sounds, we
adopt the phrase complex-ABR or cABR when referring to
subcortical responses evoked by complex sounds of any variety.
This tutorial, which represents nearly a decade of accumulated knowledge, was written to encourage researchers and
clinicians to adopt cABRs into their clinical or research
programs. To help answer frequently-asked questions and to
prevent often-encountered stumbling blocks, we provide a
comprehensive overview of our systematic approach to stimulus selection, data collection, and analyses. Although we
primarily focus on the methodologies used in our laboratory,
we also discuss alternative approaches that others have found
successful. Because there are a number of systems on the
market that can be used to collect evoked potentials (EPs), we
frame this tutorial generally and do not provide instructions
that are specific to a particular system. More advanced topics
are covered in footnotes and figure captions.
TABLE 1. Survey of speech stimuli used to evoke cABRs

Vowels
  Synthetic: /ɑ/, /u/ (Krishnan 2002)
  Natural: /ɛ/, /ı/, /i/, /ɑ/, /æ/, //, /u/ (Greenberg et al. 1980; Dajani et al. 2005; Aiken & Picton 2006, 2008)
Consonant-vowel syllables
  Synthetic: /dɑ/ (Cunningham et al. 2001; Plyler & Ananthanarayan 2001; King et al. 2002; Wible et al. 2004, 2005; Russo et al. 2004, 2005; Kraus & Nicol 2005; Johnson et al. 2007, 2008; Banai et al. 2005, 2009; Burns et al. 2009; Chandrasekaran et al. 2009; Parbery-Clark et al. 2009a)
  Natural: /bɑ/ (Akhoun et al. 2008a,b)
  Hybrid: bɑ-dɑ-gɑ continuum (Plyler & Ananthanarayan 2001; Johnson et al. 2008; Hornickel et al. 2009b)
Mandarin pitch contours
  /yi/ (Krishnan et al. 2005; Xu et al. 2006)
  /mi/ (Wong et al. 2008; Song et al. 2008)
  /yɑ/ with linearly rising and falling pitch contours (Russo et al. 2008)
Recommended background reading • This tutorial serves
as a companion to recent review articles produced by our
laboratory (Kraus & Nicol 2005; Banai & Kraus 2008; Chandrasekaran & Kraus, Reference Note 1; Kraus et al. 2009;
Tzounopoulos & Kraus 2009). If you are not trained in electrophysiology, we suggest that you review the literature on the
electrophysiological responses to clicks and tones, because this
work formed the foundation for cABR research and still guides
our interpretation and analysis today. Also, because ABRs to
complex sounds are recorded using essentially the same data
acquisition procedures as ABRs to clicks and tones, and because
many of the experimental considerations are also shared, we
abbreviate our treatment of these topics and refer the reader to the
following resources where these subjects are explored in great
depth: Hood (1998), Hall (2006), Burkard et al. (2007), and
Krishnan (2007).
ROADMAP
Clinical Considerations
Stimulus Selection and Creation
Stimulus Presentation
Intensity
Monaural and binaural stimulation
Left and right ear stimulation
Stimulus polarity
Presentation rate
Transducer
Detecting stimulus jitter
Multiple stimulus conditions
cABR Collection
Electrodes and electrode montage
Filters
Sampling rate
Signal averaging
Simultaneous cABR-cortical EP recordings
Avoiding, detecting and eliminating artifact
Active versus passive test conditions
Data Analysis
Analyzing transient responses
peak latency and amplitude
differences in latency over time
Analyzing sustained responses
static and sliding window analyses
RMS amplitude
cross-correlation
autocorrelation
Fourier analysis
Summary
Conclusion
CLINICAL CONSIDERATIONS
Brain stem responses to complex sounds are well suited for
clinical applications. In addition to being sensitive biological
markers of maturation (Anderson et al. 2010; Johnson et al.
2008a; Burns et al. 2009) and auditory training (Russo et al.
2005; Song et al. 2008), cABRs are highly replicable across
test sessions and reliably measured under passive conditions
using a small number of electrodes (Russo et al. 2004, 2005).
By providing information about the biological basis of hearing
and language disorders, cABRs can also help to identify those
individuals who are most likely to benefit from auditory
training (Hayes et al. 2003; Russo et al. 2005). Thus, in the
assessment of hearing and language function, cABRs complement existing technologies, such as click-ABRs, distortion
product otoacoustic emissions (Elsisy & Krishnan 2008; Dhar
et al. 2009), and behavioral tests of language and auditory
processing, including tests of speech in noise (SIN) perception.
Our work with clinical populations has revealed a link
between cABRs and higher-level language processes such as
reading and SIN perception (Banai et al. 2009; Hornickel et al.
2009b; Chandrasekaran et al. 2009; Parbery-Clark et al.
2009a). For children with language-based learning and reading
impairments, brain stem deficits are specific to the fast spectrotemporal elements of speech (Cunningham et al. 2001; King
et al. 2002; Wible et al. 2004, 2005; Banai et al. 2005, 2009;
Johnson et al. 2007; Hornickel et al. 2009b). This is in contrast
to the more pervasive encoding deficits seen in children with
autism, which also include abnormal subcortical pitch tracking
(Russo et al. 2008b, 2009). Subcortical responses to speech also
show a longer developmental trajectory than click-ABRs (Johnson
et al. 2008a), suggesting that cABRs could provide an objective
index in the early identification of auditory processing deficits that
lead to learning or literacy disorders. This work with clinical populations led to the development of BioMARK (Biological Marker
of Auditory Processing; Natus Medical http://www.natus.com/; see
also http://www.brainvolts.northwestern.edu under “Clinical Technologies”), a clinical measure of speech-sound encoding.
In clinical practice, poor speech perception in noise is a
commonly encountered complaint. Although everyone experiences reduced speech perception in noise, children with auditory processing disorders and language-based learning impairments, older adults, and individuals with sensorineural hearing
loss often experience excessive difficulty in suboptimal listening situations (Dubno et al. 1984; Pichora-Fuller et al. 1995;
Kraus et al. 1996; Bradlow et al. 2003; Ziegler et al. 2009).
These perceptual findings are reflected in the electrophysiological response when noise (e.g., white noise or multitalker
babble) masks the stimulus. In addition to showing that
acoustic noise disrupts cABRs in normal populations (Russo et
al. 2004), our work has revealed that cABR abnormalities,
which can often emerge in responses recorded with masking
noise (Cunningham et al. 2001), are linked to poorer SIN
perception (Hornickel et al. 2009b; Chandrasekaran et al.
2009). This relationship between subcortical function and
speech perception in noise is also evident in musicians who
demonstrate better performance on clinical tests of SIN perception relative to nonmusician controls, as well as more robust
cABR representation of stimulus features in background noise.
Thus, by counteracting the deleterious effects of masking noise
(Parbery-Clark et al. 2009a,b; Strait et al. 2009a), musical
training may provide a potential remediation strategy for
individuals with SIN impairments.
Although cABRs can be used to assess a possible disorder,
they do not provide the specificity needed to pinpoint the site
of the disorder. This is because an abnormal outcome on a
single measure may reflect more than one underlying cause or
disorder. The same can be said for click-ABRs (Hood 1998).
Thus, no single cABR measure should be considered in
isolation when forming a clinical interpretation of the results
(see Data Analysis section). However, when interpreted collectively, cABR measures provide both an objective means for
delineating the nature of the suspected disorder and an index of
training outcome.
STIMULUS SELECTION AND CREATION
Stimulus selection
Stimulus selection should factor in (1) the population being
studied, (2) the specific research questions at hand, (3) the
electrophysiological properties of the auditory brain stem, and
(4) the acoustic features that maximize transient and sustained
responses (Table 2).
As a practical guide to stimulus selection, we start with our
rationale for selecting /dɑ/ as our primary stimulus. We then
describe the transient and sustained aspects of complex stimuli
and how they manifest in the cABR. Because the acoustic
features of the stimulus are so transparently preserved in their
subcortical transcription, a basic knowledge of
acoustics is necessary for both selecting the stimulus and
analyzing the response. Although it is beyond the scope of this
tutorial to provide a comprehensive overview, we include in
this section brief descriptions of the complex nature of speech
and music.
If you are new to cABRs, you are strongly advised to start
with stimuli that have been thoroughly characterized (e.g., /dɑ/,
vowels, Mandarin pitch contours) to ensure that your collection
system is functioning properly (see Stimulus Presentation
section). When using novel stimuli, pilot studies are mandatory. During the piloting phase, several different stimulus
tokens should be used to determine whether robust and reliable
cABRs can be obtained. (For a general overview of the
techniques used for detecting and assessing EPs, refer to
TABLE 2. Recommended stimulus and presentation parameters for cABRs

Stimulus
  Type. Recommendation: speech, music, nonspeech vocal sounds, environmental sounds, etc. Rationale/Comments: examine how behaviorally relevant sounds are turned into neural code.
  Characteristics, transient. Recommendation: well-defined temporal features such as strong attacks and amplitude bursts. Rationale/Comments: maximize transient responses.
  Characteristics, sustained. Recommendation: F0 <300 Hz. Rationale/Comments: maximize sustained responses.
  Creation. Recommendation: natural, synthetic, or hybrid. Rationale/Comments: cABR stimuli can be created with many different software packages.
  Duration. Recommendation: short (40 to 100 msec) or long (>100 msec). Rationale/Comments: short stimuli minimize recording time; long stimuli maximize naturalness.

Stimulus Presentation
  Intensity. Recommendation: well above hearing threshold (60 to 80 dB SPL). Rationale/Comments: stimuli should be precisely calibrated before each test session using a sound level meter.
  Monaural stimulation. Recommendation: separate norms should be collected for each ear. Rationale/Comments: monaural is preferred for children.
  Binaural stimulation. Recommendation: maximizes response characteristics. Rationale/Comments: binaural is more realistic than monaural.
  Transducer. Recommendation: magnetically shielded ear inserts. Rationale/Comments: minimizes stimulus artifact.
  Rate and ISI. Recommendation: rate dependent on stimulus duration; ISI ≥30% of stimulus duration. Rationale/Comments: see Table 3 for recording-based issues that impact rate and ISI decisions.
  Presentation software. Recommendation: perform thorough testing to ensure precise, nonjittered stimulus presentation. Rationale/Comments: because of the temporal sensitivity of the cABR, a small amount of jitter will spoil the response.

cABRs, auditory brain stem responses to complex sounds; ISI, interstimulus interval.
Elberling & Don 2007). Tips for maximizing cABRs are
provided throughout this section and summarized in Table 2.
Why /dɑ/? • Although we currently use a large repertoire of
sounds, our early cABR work focused on the syllable /dɑ/
(Cunningham et al. 2001; Russo et al. 2004), an acoustically
complex sound, which begins with a stop burst, characterized
by an aharmonic and broadband frication, followed by a
harmonically rich and spectrally dynamic formant transition.
This CV syllable was chosen for a number of reasons. First,
/dɑ/ is a relatively universal syllable that is included in the
phonetic inventories of most European languages (Maddieson
1984). Second, the syllable consists of a transient segment
followed by a sustained periodic segment. It is, in a sense,
much like a click followed by a tone—two acoustic signals
whose brain stem response properties have been extensively
characterized. Because of these acoustic similarities, the transient onset response to the stop burst is similar to the click-ABR, and the sustained response to the vowel is similar to
the tone-evoked frequency following response (FFR). Third, stop
consonants pose great perceptual challenges to clinical populations such as the hearing and learning impaired (Tallal &
Stark 1981; Turner et al. 1992; Kraus et al. 1996). However,
because stop bursts are rapid and low in amplitude compared to
vowels, even normal-hearing adults and children can find it
difficult to discriminate between contrastive stop consonants
(e.g., “dare” versus “bare”) in noisy environments. Finally, we
continue to use this syllable as our primary stimulus because it
elicits clear and replicable ABRs.
Transient features • Transient responses, which are characterized by fast response peaks lasting fractions of milliseconds,
are evoked by brief, nonsustained stimulus features such as the
onset and offset of sounds. In the case of speech syllables,
transient features also include the onset of vocal fold vibration (i.e., voicing) (Fig. 1). For a simple musical stimulus, such
as a bowed note of a cello, transient features include the initial
burst of sound created by the bow contacting the string and the
offset of sound. The morphology of the cABR onset is dictated
by the attack characteristics (i.e., how quickly the sound
reaches full volume) of the specific acoustic token. Stimuli
with sharper rise times (i.e., abrupt onset/amplitude bursts) are
more broadband (i.e., less frequency specific) and cause
broader and more simultaneous activation of the cochlea,
which enlists a larger population of neurons to fire synchronously and leads to more robust (i.e., larger and earlier)
transient responses.
For both speech and music, attack characteristics are important for imparting timbre (sound quality), and they contribute to the identification of specific speech sounds (Rosen 1992)
and instruments (Grey 1977; Howard & Angus 2001). Within
the classes of speech sounds, obstruent stop consonants (e.g.,
/d/, /p/, /k/) have, by definition, sharper stimulus onsets than
nasals and glides (e.g., /m/ and /y/, respectively) and produce
more robust onset responses. Although fricatives and affricates
have not been used to elicit cABR (to the best of our
knowledge), on the basis of stimulus characteristics, we hypothesize the following continuum: earlier and larger onsets for
obstruent stops, with affricates (e.g., /tʃ/ pronounced “ch”),
fricatives (e.g., /z/), and sonorants (a class of sounds comprising nasals, glides, and liquids; e.g., /r/ and /l/) having
increasingly smaller and later onsets. Similarly, when choosing a
musical stimulus for eliciting cABRs, the attack properties of
the instrument should be taken into consideration. For example,
percussive instruments, such as the piano, have fast, steep
attacks, and bowed string instruments have comparatively
smoother attacks (Fig. 2). Likewise, the mode of playing an
instrument affects the attack (e.g., a plucked string has a shorter
rise time than a bowed string). In addition, abrupt changes in
the amplitude envelope of the sound also trigger onset-like
transient responses. For example, Strait et al. (2009b) recorded
cABRs to the sound of a baby crying—this particular token
included multiple amplitude bursts that produced a series of
sharp transient responses (Fig. 2).
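The attack of a candidate stimulus can be quantified directly from its waveform before any recording takes place. The sketch below is a minimal illustration in Python, not a prescribed procedure: it estimates rise time as the interval between the points where the Hilbert amplitude envelope first reaches 10% and 90% of its peak. The thresholds, the 1 msec smoothing window, and the two synthetic test tones are all illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert

def rise_time_ms(stimulus, fs, lo=0.10, hi=0.90):
    """Estimate attack (rise) time as the interval between the points where
    the amplitude envelope first reaches lo*peak and hi*peak."""
    env = np.abs(hilbert(stimulus))              # Hilbert amplitude envelope
    win = max(1, int(0.001 * fs))                # ~1 msec moving average to suppress ripple
    env = np.convolve(env, np.ones(win) / win, mode="same")
    peak = env.max()
    t_lo = np.argmax(env >= lo * peak)           # first sample above 10% of peak
    t_hi = np.argmax(env >= hi * peak)           # first sample above 90% of peak
    return 1000.0 * (t_hi - t_lo) / fs

# Example: a sharp, pluck-like attack (5 msec ramp) vs. a gradual,
# bow-like attack (60 msec ramp), both 100 Hz tones.
fs = 20000
t = np.arange(int(0.2 * fs)) / fs
sharp = np.sin(2 * np.pi * 100 * t) * np.minimum(t / 0.005, 1.0)
gradual = np.sin(2 * np.pi * 100 * t) * np.minimum(t / 0.060, 1.0)
print(rise_time_ms(sharp, fs), rise_time_ms(gradual, fs))
```

Tokens with shorter rise times would be expected, by the reasoning above, to evoke larger and earlier onset responses.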
Sustained features • Sounds containing continuous acoustic
features such as sinusoidal tones, harmonically complex vowels, and musical notes elicit sustained brain stem responses
reflecting synchronous population-wide neural phase locking.
Moushegian et al. (1973) were the first to describe the
sustained response in human scalp-recorded brain stem potentials. Using tones (sinusoids ranging from 250 to 2000 Hz)
they demonstrated that each frequency evokes a unique response in which the pattern of neural discharge is time locked
to the temporal structure of the eliciting sound. For example,
the brain stem response to a 250 Hz tone follows the periodicity
of the tone such that response peaks occur at 4 msec intervals
(period = 1/frequency; 4 msec = 1/250 Hz). For this reason,
sustained brain stem responses are often called frequency
following responses. Scalp-recorded FFRs can be recorded to
frequencies as high as 1.5 kHz (Moushegian et al. 1973;
Krishnan 2002; Aiken & Picton 2008), although phase locking
becomes weaker with increasing frequency (Greenberg 1980;
Greenberg et al. 1987; Krishnan 2007), reflecting the low-pass
nature of brain stem phase locking. Thus, subcortical phase
locking provides a mechanism for representing low frequencies
contributing to pitch and timbre (Greenberg 1980; Wile &
Balaban 2007; Kraus et al. 2009; Bidelman & Krishnan 2009)
(Fig. 3), whereas a place code likely underlies the neural
encoding of frequencies that are too high to elicit an FFR
(Langner & Schreiner 1988; Chandrasekaran & Kraus 2009).
To obtain sustained responses, the cABR stimulus should
have a low pitch with a fundamental frequency (F0) in the
range of 80 to 300 Hz. In speech, the F0 ranges from ~80 Hz
for a deep male voice to ~400 Hz for a child. Although speech
can contain spectral information up to 10 kHz, the spectral
information necessary for distinguishing different consonants
and vowels is largely <3 kHz. When selecting a speech
phoneme, keep in mind that some speech formants, including
the second formant of many vowels, are above the range of
brain stem phase locking (Moushegian et al. 1973; Greenberg
1980) and may not be observable in the phase-locked response
but may be observable in the timing and the spectral phase of
the response (Johnson et al., 2008b; Hornickel et al., 2009b;
Skoe et al., 2009; Nicol et al., 2009; Fig. 9).
A wide range of frequencies is also encountered in music.
For example, the lowest note on a standard 88-key piano (A0) occurs
at 27.5 Hz and the highest (C8) at 4186 Hz (see Everest 2001 for
the frequency ranges of various instruments). Because the F0s
of instruments are often higher than those of speech, they often
fall outside the limits of strong brain stem phase locking.
Therefore, it may be preferable to use an instrument within this
cABR target range, such as the trombone.
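Whether a candidate note falls within this target range can be checked from its equal-tempered frequency. A minimal sketch, assuming standard A4 = 440 Hz tuning and an illustrative set of notes, compares each frequency against the 80 to 300 Hz F0 window suggested above:

```python
A4 = 440.0  # reference tuning (assumed)

def note_freq(semitones_from_a4):
    """Equal-tempered frequency of a note a given number of semitones from A4."""
    return A4 * 2 ** (semitones_from_a4 / 12.0)

# Illustrative notes: the piano extremes (A0, C8), the cello G2 used in
# Figure 2, and two mid-range notes.
notes = {"A0": -48, "G2": -26, "A2": -24, "C4": -9, "C8": 39}
for name, semitones in notes.items():
    f = note_freq(semitones)
    verdict = "within" if 80.0 <= f <= 300.0 else "outside"
    print(f"{name}: {f:7.1f} Hz  ({verdict} the suggested F0 range)")
```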
Fig. 2. Transient responses. To maximize the visual coherence between the
stimulus (gray) and response (black), stimulus waveforms are shifted in time to
align the stimulus with the response onset. Arrows indicate major transient
features. In the response, these transient features are represented as large peaks.
(Top) The brainstem response to a cello note with a low pitch (G2, 100 Hz). The
sound onset occurs when the bow contacts the string and causes a brief
transience before the string starts to vibrate in a periodic manner. This leads to
a strong onset, followed by a more sustained response. Because of the gradual
decay of this sound, a strong offset response is not apparent. For more information
see Musacchia et al. 2007. (To listen to the stimulus, go to Audio file, Supplemental
Digital Content 1, http://links.lww.com/EANDH/A1; to listen to the response, go to
Audio file, Supplemental Digital Content 2, http://links.lww.com/EANDH/A2.) (Middle) Percussive instruments, such as the piano, have fast attacks and rapid decays.
These features are evident in this five-note piano melody. Large response peaks
coincide with the onset of each piano note. The stimulus amplitude envelope is also
preserved in the response. (To listen to the stimulus, go to Audio file, Supplemental
Digital Content 3, http://links.lww.com/EANDH/A3; to listen to the response, go to
Audio file, Supplemental Digital Content 4, http://links.lww.com/EANDH/A4.) (Bottom) Sounds with abrupt changes in the amplitude envelope also trigger multiple
onset-like transient responses. This is illustrated here using the sound of a crying
baby. For more information see Strait et al. 2009b. In the top and bottom plots, the
stimulus was presented binaurally, and in the middle plot, it was presented
monaurally (see Stimulus Presentation section).
Fig. 3. Sustained phase-locked responses. Low frequencies, including those
associated with pitch and timbre perception, are preserved in the auditory
brain stem responses to complex sounds (cABR). For complex sounds, the
pitch corresponds (in large part) to the lowest resonant frequency, also
known as the fundamental frequency (F0). Timbre enables two sounds with
the same pitch to be differentiated. Timbre is a multidimensional property
resulting from timing cues of attack and decay, and the interaction of
spectral and temporal properties associated with the harmonics of the F0.
These timbral features together give rise to the characteristic sound quality
associated with a given instrument or voice. (Top) The full view of the 200
msec time-domain stimulus /dɑ/ (gray) and its cABR (black). The spectrotemporal features of the stimulus, including the F0 and harmonics, are
evident in the response. The gray box demarcates six cycles of the F0. This
section is magnified in the middle panel. (Middle) The smallest repeating
unit of the stimulus has a duration of 10 msec (i.e., the periodicity of the
100 Hz F0). (Bottom) The left panel shows a close-up of a single F0 cycle.
The harmonics of the F0 (frequencies at multiples of 100 Hz) are represented as small fluctuations between the major F0 peaks in both the
stimulus and response. In the right panel, the stimulus and cABR are plotted
in the frequency domain.
Time-varying and harmonically complex sounds • Real-life sounds, unlike sine waves, have nonstable F0s and complex
harmonic structures. For time-varying stimuli, such as diphthongs, CV formant trajectories, musical glissandos, and linguistic pitch contours, cABRs follow the frequency contour of
the stimulus with interpeak intervals systematically increasing
or decreasing with changing frequency (changes as small as 1
Fig. 4. Distortion products (DPs) in the cABR. Stimulus (top) and response
(bottom) spectra for a consonant musical interval (major 6th). This musical
stimulus was created from G2 and E3 notes produced by an electric piano.
When two harmonically complex notes are played simultaneously, the
F0s and harmonics interact via nonlinear auditory processes to create
DPs that are measurable in the response but not present in the stimulus.
In this figure, italics denote the DPs, f1 denotes the lower tone (G2, red),
and f2 denotes the upper tone (E3, blue). For more information see
Lee et al. 2009 (To listen to the stimulus, go to Audio file, Supplemental Digital Content 5, http://links.lww.com/EANDH/A5; to listen
to the response, go to Audio file, Supplemental Digital Content 6,
http://links.lww.com/EANDH/A6).
Hz) (Figs. 11 and 13). For harmonically complex sounds, phase
locking is observed to the frequencies physically present in the
stimulus, and to the frequencies introduced via nonlinear
processes within the auditory pathway. Examples include
phase locking to the amplitude envelope (Hall 1979; Aiken &
Picton 2006, 2008) and distortion products (Elsisy & Krishnan
2008; Abel & Kossl 2009; Lee et al. 2009) (Fig. 4).
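One way to confirm that a response (or a stimulus) carries the intended frequency contour is a short sliding-window autocorrelation, the same class of analysis listed under Data Analysis in the roadmap. The sketch below is a minimal illustration: the window, step, and search-range values are arbitrary choices, and the test signal is a synthetic 100 to 125 Hz glide rather than a recorded response.

```python
import numpy as np

def f0_contour(signal, fs, win_ms=40, step_ms=10, fmin=80, fmax=400):
    """Rough F0 track: in each window, find the autocorrelation peak within
    the allowed period range and convert that lag to a frequency in Hz."""
    win = int(win_ms * fs / 1000)
    step = int(step_ms * fs / 1000)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    contour = []
    for start in range(0, len(signal) - win, step):
        frame = signal[start:start + win]
        frame = frame - np.mean(frame)
        ac = np.correlate(frame, frame, mode="full")[win - 1:]   # lags >= 0
        best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
        contour.append(fs / best_lag)
    return np.array(contour)

# Example: a tone gliding from 100 to 125 Hz over 200 msec (a rising contour).
fs = 20000
t = np.arange(int(0.2 * fs)) / fs
glide = np.sin(2 * np.pi * (100 * t + (25 / (2 * 0.2)) * t ** 2))
print(np.round(f0_contour(glide, fs)))   # values climb from ~100 toward ~125 Hz
```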
Speech • During speech production, sound is produced when
air leaving the lungs causes the vocal folds to vibrate. For
speech, the F0 is determined by the rate of this vibration.
Because the vocal folds close more slowly than they open, the
sound that is produced is not a sinusoid. It is instead a complex
waveform resembling a triangle or saw-tooth wave, containing
harmonic energy at integer multiples of the F0. This harmonically-rich sound is then filtered (i.e., shaped) by the speech
articulators (i.e., teeth, tongue, lips) to form different speech
sounds. Different articulator configurations change the resonance properties of the vocal tract, causing certain harmonics to
be amplified and others to be attenuated. Formants, which
correspond to peaks in the speech spectrum, arise from this
filtering. Each speech sound can be uniquely identified by its
characteristic formant pattern, with the first two or three
formants being sufficient for identifying most speech sounds
(Liberman 1954). The cABR, which synchronizes to the F0 and
harmonics of the speech waveform, contains greater energy for
harmonics coinciding with formants (Krishnan 2002; Aiken &
Picton 2008), because there is more energy in the signal at
these frequencies. This has been described as a type of
“formant capture” (Young & Sachs 1979; Krishnan 2002),
whereby harmonics adjacent to the formant regions are emphasized. Also, note that in the speech spectrum, the F0 has less
energy than the speech formants (Fig. 5). However, because the
opening and closing of the vocal folds produces a signal that is
naturally amplitude modulated, the F0 and other modulating
frequencies are amplified or introduced into the neural system
during nonlinear cochlear processing (Brugge et al. 1969; Regan
& Regan 1988; Lins & Picton 1995; Aiken & Picton 2008).
Music • In contrast to speech, which is dominated by fast
spectrotemporal transitions, music has more sustained temporal
and spectral elements, slower transitions, and finer frequency
spacing (Zatorre et al. 2002; Shannon 2005). In music, the
mechanism of the F0 generation depends on the instrument. For
example, the reed is the source of the F0 vibration for the oboe
and clarinet, whereas the string is the source for the violin and
guitar. In the same way that speech sounds are characterized by
unique formant configurations, instruments also have characteristic harmonic structures that impart timbre. Specifically, the
timbre of a musical sound is determined by the rise time of the
attack (discussed above), the spectral flux (i.e., change in
harmonics over time), and the spectral centroid (i.e., the
distribution of the harmonics) (Grey 1977). The clarinet, for
example, has a harmonic structure dominated by lower frequency odd harmonics (the even harmonics have been attenuated). The flute, saxophone, trombone, and tuba, which are all
characterized by strong odd and even harmonics, can be
differentiated by the distribution of the harmonics (e.g., the
energy of the tuba is concentrated in the lower harmonics).
As can be seen in Figure 5, the harmonic structure of
musical sounds is partially preserved in the response. Generally speaking, phase locking is more robust when there is less
spectral flux (i.e., brass and woodwind families; Grey 1977).
The timbre of a musical instrument also depends on how
quickly the sound decays (e.g., a piano has both a fast onset
and quick decay, whereas an electric piano has a slower onset
and decay). For the purposes of eliciting an FFR, sounds with
longer decays elicit responses that are more sustained (Fig. 2).
For more information on the rather complex and multifaceted
topic of musical timbre refer to Fletcher and Rossing (1991)
and Howard and Angus (2001).
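The spectral centroid mentioned above is straightforward to compute and can help when screening candidate instrumental tokens. A minimal sketch, using two synthetic 100 Hz complex tones as illustrative stand-ins for a "dark" and a "bright" timbre:

```python
import numpy as np

def spectral_centroid_hz(signal, fs):
    """Amplitude-weighted mean frequency of the magnitude spectrum: a coarse
    index of how energy is distributed across the harmonics."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

# Two 100 Hz complex tones (harmonics 1-8) with different harmonic weightings.
fs = 20000
t = np.arange(int(0.5 * fs)) / fs
dark = sum((1.0 / h) * np.sin(2 * np.pi * 100 * h * t) for h in range(1, 9))    # energy in low harmonics
bright = sum((h / 8.0) * np.sin(2 * np.pi * 100 * h * t) for h in range(1, 9))  # energy in high harmonics
print(spectral_centroid_hz(dark, fs), spectral_centroid_hz(bright, fs))
```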
Stimulus duration • Within the speech-ABR literature, stimulus
duration has varied from 60 msec to 2 sec for vowels
(Krishnan 2002; Dajani et al. 2005; Aiken & Picton 2006,
2008) and for CVs from 40 to 500 msec (Musacchia et al.
2007; Banai et al. 2009). In experiments using musical stimuli,
the duration has ranged from 170 msec for a musical interval
(Lee et al. 2009; Fig. 4) to 1.1 sec for a five-note musical
melody (Skoe & Kraus 2009; Fig. 2).
Fig. 5. cABRs to harmonically complex signals. The sustained aspects of cABRs (right) and their evoking stimuli
(left) can be visualized using spectrograms (see Data Analysis section and Fig. 13). These graphs represent a 200 msec steady-state (unchanging) segment of the vowel /ɑ/
(top) and the cello note (bottom, see also Fig. 2) used in
Musacchia et al. 2007. In this example, the speech (top)
and musical stimulus (bottom) have the same pitch (F0 =
100 Hz; arrows), yet have different harmonic structures and
consequently different timbres. These acoustic differences
account for the different response patterns. For the cello
(bottom), the dominant frequency bands occur at 200 and
400 Hz in both the stimulus and response. For the speech
signal (top), the harmonics around the first formant (700 Hz)
have more energy than the F0. Yet, lower frequencies
dominate the response. This reflects the low-pass nature of
brain stem phase locking and the nonlinear processes that
amplify the energy of the F0 and the lower harmonics. For
more information see Musacchia et al. 2007.
Stimulus creation
presentation systems require a specific sampling rate, the recordings may need to be resampled. Likewise, when comparing the
stimulus and the response via cross-correlation (see Data Analysis
section), the two signals must have the same sampling rate. For
these reasons, it is best to sample the stimulus recordings at a high
rate so that upsampling is not necessary.
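As an illustration of the resampling step, suppose the stimulus was recorded at 44.1 kHz but the evoked-potential system digitizes the response at 20 kHz (both rates are examples, not recommendations). A minimal sketch that brings the stimulus to the response rate before cross-correlation, using a synthetic tone as a stand-in for a recorded token:

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

fs_stimulus, fs_response = 44100, 20000   # illustrative rates

# Stand-in stimulus: a 40 msec, 100 Hz tone. A real token would instead be
# loaded from file (e.g., with scipy.io.wavfile.read).
t = np.arange(int(0.040 * fs_stimulus)) / fs_stimulus
stimulus = np.sin(2 * np.pi * 100 * t)

# Polyphase resampling by the rational factor fs_response / fs_stimulus so that
# stimulus and response share one sampling rate, as cross-correlation requires.
ratio = Fraction(fs_response, fs_stimulus)
stimulus_matched = resample_poly(stimulus, ratio.numerator, ratio.denominator)

print(len(stimulus), "samples at", fs_stimulus, "Hz ->",
      len(stimulus_matched), "samples at", fs_response, "Hz")
```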
Speech • Although natural speech and music tokens are ideal in
the sense that they represent real-world sounds, they are inherently
more complex, variable, and aperiodic. Consequently, with natural
tokens, it is difficult to study how specific physical characteristics
are represented at a subcortical level. Having precise control over
the stimulus parameters is especially important when multiple
stimuli are compared across a single dimension. For example,
/bɑ/, /dɑ/, and /gɑ/ can be distinguished based on their differing
second formant trajectories (F2) (Liberman 1954; Fig. 9). However, natural utterances of /bɑ/, /dɑ/, and /gɑ/ vary on more
parameters than simply F2 (as discussed in Johnson et al. 2008b).
In these cases, investigators rely on speech synthesizers like Klatt
(1976) to create stimuli with precisely defined time-varying and
sustained features.*
In the case of the F0, programs such as STRAIGHT
(Kawahara 2009) and Praat can be used to remove aperiodicities, raise or lower the F0, or apply a particular time-varying
pitch contour (Wong et al. 2007; Russo et al. 2008b). To
generate stimuli with pitch contours, hybrid stimuli can be
made by manipulating the F0 of a natural speech token or by
combining two natural speech tokens using the PSOLA method
(Moulines & Charpentier 1990) in a program like Praat
(Boersma & Weenink 2009). See Wong et al. (2007) (supplement) and Russo et al. (2008) for more details.
With modern computers, recording natural sounds is relatively
simple. The process (ideally) requires a sound-attenuated chamber, a microphone, a high-resolution sound card, and software for
recording (e.g., Adobe Audition [Adobe Systems, San Jose, CA],
Praat [Boersma & Weenink 2009]) (see Aiken & Picton 2008 and
Wong et al. 2007 [supplement] for more details). To ensure that a
viable token can be extracted, multiple recordings and, when
possible, multiple speakers/instruments should be used. Both
natural and synthetic sounds should be created with a high
digitization rate (>20 kHz). However, because some stimulus
*Klatt, which can function as both a cascade and parallel synthesizer
(Holmes 2001), facilitates the manipulation of dozens of features including
duration, output sampling rate, the amplitude of frication, the number of
formants, and the frequency and bandwidth of each formant. In our
experience, the process of creating a synthetic speech sound with the
desired percept requires patience and a lot of trial and error. In addition,
although you do have control over many parameters, the output may
deviate from the input because of the complex interaction among parameters. To confirm that the stimulus meets the desired specifications, the
synthetic sound should be acoustically analyzed in a program such as Praat
or Adobe Audition (Adobe Systems, Inc., San Jose, CA).
Because of the sheer number of stimulus presentations
required to obtain a robust response, there is an obvious
trade-off between stimulus duration and the length of the
recording session. For example, to record 6000 trials to a
synthesized 40 msec /dɑ/ takes ~9 min, assuming an interstimulus interval (ISI) of 50 msec. Yet, natural sounds generally occur on the order of seconds and not fractions of seconds,
which necessarily requires longer recording sessions. In our
experience, the only factor limiting the stimulus duration is the
feasibility of having a subject remain still for a long time. Thus,
stimulus duration may need to be restricted to present the
desired number of stimuli in a reasonable amount of time. For
speech syllables, one tactic is to record ABRs to a stimulus
containing the consonant and CV transition without a steady-state vowel (Russo et al. 2004; Johnson et al. 2007, 2008a;
Banai et al. 2009; Dhar et al. 2009; Hornickel et al. 2009a)
(Fig. 1 versus Fig. 10). Because each CV has a unique formant
transition, the steady state vowel can be removed with little
impact on the percept. In fact, musical timbre and vowel
identity can be accurately determined from one to four cycles
of the F0 (Gray 1942; Robinson 1995; Robinson & Patterson
1995) but pitch identification requires at least four cycles
(Robinson 1995; Robinson & Patterson 1995). Stimulus duration greatly affects pitch because lower frequencies have longer
periods than higher frequencies (e.g., a 20 msec stimulus can
have no meaningful frequency representation below 50 Hz).
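The duration/recording-time trade-off can be estimated before testing from the number of trials and the stimulus onset asynchrony (stimulus duration plus ISI). A minimal sketch that reproduces the ~9 min figure cited above; the 300 msec comparison token is an illustrative value:

```python
def session_minutes(n_trials, stim_ms, isi_ms):
    """Approximate presentation time: trials x (stimulus duration + ISI)."""
    soa_ms = stim_ms + isi_ms          # stimulus onset asynchrony
    return n_trials * soa_ms / 1000.0 / 60.0

# 6000 trials of a 40 msec /da/ with a 50 msec ISI, as in the example above.
print(round(session_minutes(6000, 40, 50), 1))    # -> 9.0 minutes
# A 300 msec natural token at the same ISI and trial count takes far longer.
print(round(session_minutes(6000, 300, 50), 1))   # -> 35.0 minutes
```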
Music • Because of the increased prevalence of computer-made
music, a large number of tools are currently available to generate
music stimuli. The choice of the right tool depends on the desired
trade-off between acoustic control and sound naturalness. Strict
acoustic control of the stimuli can be best achieved through
additive synthesis in programming environments such as
MATLAB (The Mathworks, Natick, MA). Acoustic samples of
real instruments, which can be found in some music software
packages (e.g., Garritan Personal Orchestra in Finale software;
MakeMusic, Inc.), are another source for music stimuli. An intermediate solution is to use synthesizers, many of which are available as
plugins for music software such as Cubase Studio (Steinberg Media
Technologies).†
Other sounds • It can be difficult to construct synthetic
sounds with strong affective quality. Thus, natural recordings
such as those available from the Center for the Study of
Emotions and Attention (University of Florida, Gainesville,
FL) can be used to study paralinguistic aspects of sounds (Strait et
al. 2009b). Similarly, for environmental sounds, we suggest
selecting a stimulus from a corpus of natural sounds (e.g., Series
6000 General Sound Effect Library, a royalty-free CD of environmental sounds; Sound Ideas, Richmond Hill, Ontario, Canada).
evoked FFR reflect distinct neural processes (Hoorman et al.
1992). Using a similar design, Krishnan recorded cABRs to
steady-state vowels between 55 and 85 dB nHL (in 10 dB
increments) and found that the harmonics in the formant range
were clearly represented for each intensity. Although the
amplitudes of the individual harmonics tended to increase with
increasing intensity, the trajectory was not identical for all
harmonics, nor was the increase always linear. Taken together,
this work suggests that different components of the cABR are
distinctively impacted by intensity level.
Monaural and binaural stimulation
This section covers topics relating to stimulus presentation,
including stimulus intensity, monaural and binaural stimulation, left and right ear stimulation, stimulus polarity, stimulation rate, transducers (i.e., earphones and loudspeakers), jitter
in the stimulus presentation, and multiple stimulus conditions.
A summary is provided in Table 2.
It is well established that when a sound is heard with both
ears it is perceived to be louder than when the same sound
is presented at the same intensity to just one ear (binaural
loudness summation is estimated to be 6 dB). Because the
auditory brain stem plays an integral role in binaural processing
(reviewed in Moore 1991), binaural interaction effects have
been widely studied in the click-ABR and tone-FFR literature
(Dobie & Berlin 1979; Ballachanda et al. 1994; Krishnan &
McDaniel 1998; Ballachanda & Moushegian 2000). Although
similar parametric experiments have not been conducted for
complex stimuli, the same principles are assumed to apply. For
practical reasons, binaural stimulation is preferable when
testing adults not only because it leads to larger and more robust
responses but also because it is more realistic in that we usually
listen with both ears. However, monaural stimulation is used for
individuals with asymmetric hearing thresholds, children and
other populations who have difficulty sitting still during testing, or
when the subject must attend to another sound.
Intensity
Left and right ear stimulation
Speech, music, and other complex sounds are typically
delivered suprathreshold within the “conversational” range of
60 to 85 dB SPL. Similar to the familiar click-ABR, cABRs are
also intensity dependent. This necessitates that the intensity be
stable across subjects and recording sessions. Before each test
session, the output intensity should be calibrated using a
sound-level meter with a coupler that enables the output to be
measured directly from the insert earphones (see below).
The effects of increasing intensity have been examined in
the cABR literature (Krishnan 2002; Akhoun et al. 2008a).
Using a /bɑ/ syllable, Akhoun et al. explored how the timing of
the speech-evoked onset response and FFR (elicited by the
same stimulus) varied as a function of intensity (0 to 60 dB SL,
in 10 dB increments). Consistent with the literature on clicks
and tones, both response components showed systematic latency shifts with increasing intensity. However, the FFR peaks
showed a steeper latency-intensity function than the onset
response, suggesting that the onset response and speech-
Left- and right-ear stimulation produce similar but not
identical ABRs (Akhoun et al. 2008b). In fact, the well-established right-ear advantage for speech is evident for discrete components of the cABR (Hornickel et al. 2009a). For a
review of the click-ABR and tone-FFR literature relating to left
vs. right ear stimulation, see Hornickel et al. (2009a).
STIMULUS PRESENTATION
†Unfortunately, these synthesizers are often black boxes. Although they
offer control over certain acoustic features, the sound they produce is generally of lower quality than samples of real instruments. For all methods of stimulus generation, the acoustic properties of the stimulus should be checked with a sound
analyzer before proceeding with the experiment. Acoustic analyzer software
dedicated to music sounds includes the MIR toolbox for MATLAB
(http://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox),
the IPEM toolbox for MATLAB (http://www.ipem.ugent.be/Toolbox/),
and PsySound3 (http://psysound.wikidot.com).
Stimulus polarity
Periodic sound waves traveling through air consist of
alternating regions of compression (i.e., condensation) and
decompression (i.e., rarefaction) of air molecules. In a time-amplitude plot of a sound wave, condensation and rarefaction
manifest themselves as positive or negative deflections (respectively) from the baseline. Because clicks consist of a single
positive or negative deflection, they are defined as either
having condensation or rarefaction polarity. However, because
periodic sounds oscillate between condensation and rarefaction
states, the same terminology is not used. In Figures 6 and 8, we
have adopted “A” and “B” to refer to the two different
polarities. To convert a stimulus from one polarity to another,
the waveform is shifted by 180 degrees (i.e., multiplied by -1).
When collecting cABRs, two different approaches can be
followed: (1) record the response to a single-stimulus polarity
(Krishnan 2007; Aiken & Picton 2008) or (2) record responses
to both polarities and either add (Russo et al. 2004; Akhoun et
al. 2008a) or subtract responses (Greenberg 1980; Greenberg et
al. 1987; Wile & Balaban 2007; Krishnan 2002) to the two
vanced considerations of stimulus polarity see Figure 6, footnote ‡, Don et al. (1996), and Aiken and Picton (2008).
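In practice, the add and subtract operations are simple element-wise combinations of the averaged responses to the two polarities. A minimal sketch, assuming the two averaged responses are already available as equal-length arrays:

```python
import numpy as np

def combine_polarities(resp_a, resp_b):
    """Combine averaged responses to the two stimulus polarities.
    Adding emphasizes the envelope-related (lower-frequency) energy and reduces
    stimulus artifact and cochlear microphonic; subtracting emphasizes the
    spectral response but can also emphasize artifact (see text)."""
    resp_a = np.asarray(resp_a, dtype=float)
    resp_b = np.asarray(resp_b, dtype=float)
    added = (resp_a + resp_b) / 2.0        # ADD = (A + B) / 2
    subtracted = (resp_a - resp_b) / 2.0   # SUB = (A - B) / 2
    return added, subtracted

# Usage (hypothetical equal-length arrays, one averaged response per polarity):
# add_resp, sub_resp = combine_polarities(avg_response_A, avg_response_B)
```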
Presentation rate
Presentation rate depends on the length of the stimulus and
the ISI (defined as the period of silence between the offset of
one stimulus and the onset of the next). A second way to
express the presentation interval is by defining stimulus onset
asynchrony (SOA), which is measured from the onset of one
stimulus to the onset of the next. The two measures are
essentially the same for click stimulation, as a click virtually
has no duration, but ISI and SOA are different for cABRs. In
surveying the cABR literature, the ISI has varied from ~30%
of the stimulus length to more than double the length.
When choosing an ISI the following considerations should
be made. First, changing the ISI can alter the perception of a
complex sound. Second, if the ISI is not sufficiently long, the
response to one stimulus may not fully conclude before the
next stimulus is presented. Thus, the ISI and the duration of
the averaging window should be long enough to allow for the
response to return to baseline. The ISI should also allow for an
ample sample of the baseline (i.e., nonstimulus related) activity
so that signal to noise ratios (SNRs) can be evaluated (see Data
Analysis section). Third, latencies and amplitudes, particularly
of onset responses, are affected by the rate of presentation (Hall
2006). In contrast, it appears that FFR latencies in adults are
less susceptible to rate changes than onset responses
(Parthasarathy & Moushegian 1993; Krizman et al., Reference
Fig. 6. Stimulus polarities. Responses to the two polarities of the /dɑ/
stimulus from Figure 1. For shorthand, they are referred to as polarity A (red)
and B (blue). The cABRs to A and B are quite similar, especially for the
prominent negative-going peaks corresponding to the onset, offset, and F0
(top). By adding or subtracting A and B, envelope and spectral components
of the response, respectively, can be separated (see footnote ‡).
Adding (gray) accentuates the lower frequency components of the response, including the temporal envelope, and minimizes stimulus artifact
and the cochlear microphonic (see Fig. 8 and Data Analysis section for a
discussion of artifacts). Subtracting (black) emphasizes the higher frequency
components by maximizing the spectral response; however, this process
can also maximize artifact contamination. In the bottom panel, the ADD
and SUB responses are plotted in the frequency domain. In contrast to the
ADD response, which has peaks occurring at F0 (~100 Hz) and the
harmonics of the F0, the SUB response has well-defined peaks in the 200 to
700 Hz range. This range corresponds to the first formant trajectory of this
stimulus. In this figure, ADD = (A + B)/2; SUB = (A - B)/2.
stimulus polarities. The process of adding will accentuate the
lower-frequency components of the response including phase
locking to the amplitude envelope and minimize stimulus
artifact and the CM (see Data Analysis section for a further
discussion of artifacts). Subtracting will bias the higher-frequency components by maximizing the spectral response,
although this process can also maximize artifact contamination.
It should be noted that while we use the addition method for
many of our published analyses, our results have been internally replicated with single-polarity stimuli. For more ad-
‡Phase locking to spectral energy in the stimulus follows the phase of the
stimulus; thus, it inverts when the stimulus polarity is inverted. In Figure 6, this
inversion is evident in the regions between the large peaks of the A and B
responses. Because of this inversion, adding A and B will theoretically cancel
out the spectral response. Subtraction, in contrast, enhances the spectral
response (including formant frequencies and temporal fine structure) and
attenuates the envelope response. Phase locking to the amplitude envelope is
independent of phase because the energy of the envelope is not present in the
speech signal but is introduced into the auditory pathway during nonlinear
cochlear processing. Thus, phase locking to the amplitude envelope does not
invert between A and B and it is, therefore, maximized when the responses to
A and B are added. Although the two polarities of an auditory stimulus do not
sound different, the two polarities do not elicit identical responses, especially
in cases where the stimulus waveform is asymmetric (Greenberg 1980). An
advantage to using the addition and subtraction methods is that they represent
the average of two polarities, thus relieving the necessity to choose which
polarity to use. One criticism of the addition method is that although it will
minimize artifacts, it will also create a response at twice the frequency of the
actual response (Aiken & Picton 2008). Although this is the case for simple
sinusoids, the polarity effects are more complicated for time-varying spectrally
complex stimuli. For a complex stimulus, involving multiple frequency
components, a polarity reversal will theoretically affect the response latency to
a given frequency. This is because the 180-degree inversion will shift the phase
of the frequency component by one half-period, resulting in different single
unit responses for each polarity. However, because of the complex interaction
of temporally dynamic frequency components within a population-wide
response, the doubling of frequency for any particular frequency (such as the
F0) is not evident in the cABR, and frequency-dependent latency shifts may be
obscured or washed out (Don et al. 1996). Consequently, although the
responses to the two polarities are not strictly identical, the differences are not
always as extensive as might be predicted. Moreover, although the effects of
stimulus polarity have been extensively explored with simple stimuli, the
observations based on click stimuli do not generalize to low-frequency tone
bursts (Don et al. 1996) nor do the polarity effects for simple stimuli generalize
to complex sounds. For a more in-depth description and discussion of these
cochlear processes and the effect of stimulus polarity on the click-evoked
response, see Hall (2006) and Starr and Don (1988).
Using a speech stimulus, Krizman and colleagues also
found that the magnitude of the higher frequency components
of the response diminished with increasing rate, but the F0 did
not. Fourth, to avoid contamination from the AC line frequency
(60 Hz in North America, 50 Hz elsewhere), a presentation rate
should be chosen such that the line frequency divided by the
rate is not an integer (e.g., for both 50 and 60 Hz line noise,
10/sec is a bad choice, but 10.3 or 11 is okay). Alternatively, a
variable ISI might be used. Fifth, when conducting simultaneous cABR-cortical EP recordings (see Data Analysis section)
longer ISIs may be required to obtain robust cortical auditory
EPs.
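For readers who script their protocols, the line-frequency consideration above can be checked before a rate is programmed into the delivery system. The following minimal sketch (in Python; the function name and tolerance are ours and purely illustrative) flags rates for which the line frequency divided by the presentation rate is close to an integer.

    # Screen candidate presentation rates against the AC line frequency.
    # When line_freq / rate is (nearly) an integer, line noise has the same
    # phase in every epoch and therefore does not average out.

    def rate_is_safe(rate_hz: float, line_freq_hz: float = 60.0, tol: float = 0.05) -> bool:
        """Return True if line_freq_hz / rate_hz is not close to an integer."""
        ratio = line_freq_hz / rate_hz
        return abs(ratio - round(ratio)) > tol

    if __name__ == "__main__":
        for rate in (10.0, 10.3, 11.0):
            for line in (50.0, 60.0):
                status = "OK" if rate_is_safe(rate, line) else "avoid"
                print(f"rate {rate:5.1f}/sec, line {line:.0f} Hz -> {status}")

Consistent with the example in the text, 10/sec is flagged for both 50- and 60-Hz line noise, whereas 10.3/sec and 11/sec are not.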
An alternative approach is to record cABRs in several
blocks of continuous stimulation (i.e., no silence between
stimuli) using the same procedure used to record auditory
steady-state responses (Dajani et al. 2005; Aiken & Picton
2006, 2008). This technique maximizes spectral resolution at
the expense of temporal resolution.
Transducer
Because circumaural headphones increase the chances for
stimulus artifact contamination (see Data Analysis section), we
strongly advise against using them and instead recommend electromagnetically shielded insert earphones (e.g., E•A•RTONE3A
[Aearo Technologies, Minneapolis, MN], ER-3a [Etymotic Research, Elk Grove Village, IL], Bio-logic insert-earphones [Biologic Systems Corp., Mundelein, IL]). When testing persons with
hearing aids or other populations not suited for inserts (e.g.,
cochlear implant wearers), loudspeakers can be used to deliver the
stimulus. However, sound field delivery causes the latency of the
response to be more variable because the sound intensity changes
subtly with head movements. To minimize head movements, we
have the subject focus on a movie or another visual image
positioned directly in front of him or her. Also, because the
intensity is dependent on the distance between the loudspeakers
and the subject, we carefully measure and mark the location of the
chair and speakers, and position the left and right speakers
equidistantly.
Detecting stimulus jitter
One of the defining characteristics of the ABR is that it
reflects extremely fast neural activity synchronized across
populations of neurons, with minute disruptions in neural
precision being indicative of brain stem pathologies (Hood
1998; Hall 2006). For this reason, the delivery and recording
units must be precisely time locked to each other.§ Even a
small amount of jitter in this synchronization can ruin an ABR
recording. If the timing of the stimulus does not always occur
at the same time with respect to the triggering of the recording
system, the response is canceled, or at the very least distorted,
when trials are averaged. Thus, when a new recording system
is acquired, it is important to confirm that the delivery system
is properly calibrated (see below) to ensure that there is not an unexpected stimulus delay or jitter. A system that has been optimized for collecting cortical responses should also undergo testing before it can be cleared for brain stem testing. Because of timing and duration differences between brain stem and cortical responses, jitter may only be evident when recording brain stem responses.

§Depending on the recording system, the software used for stimulus delivery is either integrated into the collection system or installed on a computer separate from the collection computer. In the latter case, to synchronize the presentation of the stimulus and the recording of the response, the delivery computer sends a digital trigger to the recording computer every time a sound is played.
To determine whether the stimulus presentation is jittered,
couple the output of the delivery system into the electrode box, as
if recording cABRs from a subject. Next, play a click stimulus and
record the output into the recording system in continuous (non-averaging) mode. Adjust the output intensity if the waveform is
clipped in the recording. It is important to record a sizeable
number of sweeps (100+) to ensure that the jitter does not creep
in over time. After the recording is complete, check that each click
occurs at the same time relative to the trigger across the recording.
For a properly functioning system, the deviation should not
exceed 0.1 msec. This is also an opportunity to determine whether
the stimulus is actually simultaneous with the trigger or whether
there is a delay that needs to be taken into account when
processing and analyzing cABRs.¶

¶Please note that this testing procedure may not be feasible for every recording system. Because the source of the jitter can come from one or a combination of several sources and because each system will have its own peculiarities, we advise contacting the manufacturer before adjusting any settings.
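The timing check described above can also be automated once the loop-back recording has been exported. The sketch below is one possible implementation (Python/NumPy); it assumes the continuous recording and the trigger sample indices are available as arrays, and the variable and function names are illustrative rather than those of any particular collection system.

    import numpy as np

    def check_trigger_jitter(recording, trigger_samples, fs, win_ms=5.0):
        """Estimate click-onset latency relative to each trigger and report jitter.

        recording       -- 1-D array: continuous (non-averaged) loop-back recording of the click
        trigger_samples -- sample indices at which triggers were registered
        fs              -- sampling rate of the recording (Hz)
        win_ms          -- window after each trigger in which to look for the click
        """
        win = int(win_ms / 1000 * fs)
        latencies = []
        for t in trigger_samples:
            segment = np.abs(recording[t:t + win])
            # First sample exceeding half of the local maximum = click onset.
            onset = np.argmax(segment > 0.5 * segment.max())
            latencies.append(onset / fs * 1000)  # msec
        latencies = np.asarray(latencies)
        # A peak-to-peak spread above ~0.1 msec would warrant further investigation.
        print(f"mean delay {latencies.mean():.3f} ms, "
              f"peak-to-peak jitter {np.ptp(latencies):.3f} ms over {len(latencies)} sweeps")
        return latencies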
Multiple stimulus conditions
When an experiment includes multiple stimulus conditions, a
block or interleaved paradigm can be used. In a block paradigm,
each condition is presented separately (i.e., block 1: P P P; block
2: Q Q Q) (Johnson et al. 2008b), and in an interleaved paradigm,
the stimulus conditions are intermixed (i.e., P Q P Q P Q or P Q
P P Q P) (Wong et al. 2007). In the block design, state (i.e.,
alertness) or expectancy effects may confound comparisons across
stimulus conditions. However, if the delivery system is not
designed to play multiple stimulus tokens, interleaving stimulus
conditions may not be possible. Although the presentation software might limit the number of stimuli that can be interleaved,
there does not seem to be a corresponding neurophysiologic limit
(e.g., in one experiment, we interleave eight different stimulus
conditions, two polarities for each, for a total of 16 different
sounds). In the case where multiple stimuli are to be directly
compared, it may be desirable to normalize the duration and
amplitude across the stimulus set. This can be carried out in
programs such as Level 16 (Tice and Carrell, University of
Nebraska, Lincoln, NE) and Praat (Boersma & Weenink 2009).
Recent work suggests that block and interleaved designs may
invoke different on-line subcortical encoding mechanisms. Chandrasekaran et al. (2009) compared the response to /dɑ/ collected in
a block condition and the response to the same stimulus when it
was presented with a pseudo-random probability within a mix of
seven other speech stimuli. The response to the interleaved
condition was found to have smaller spectral amplitudes compared with the block condition, which the authors interpret to be
an indication of weaker stimulus “tagging” when the stimulus is
presented less frequently.
cABR COLLECTION
Issues relating to electrodes, filtering, sampling rate, signal averaging, simultaneous cABR-cortical EP recording, artifact reduction, and recording conditions are reviewed below and summarized in Table 3.
TABLE 3. Recommended cABR recording parameters

Electrode placement
  Recommendation: Vertical montage (active: Cz; reference: earlobe(s); ground: forehead)
  Rationale/Comments: For rostral brain stem recordings; a horizontal montage is used for recording from more peripheral structures

Sampling rate
  Recommendation: 6000–20000 Hz
  Rationale/Comments: Better temporal precision with higher sampling rates; more defined transient peaks

Filtering
  Recommendation: Low-pass cutoff: 2000–3000 Hz; high-pass cutoff: 30–100 Hz. If possible, collect cABR with open filters (1–3000 Hz) and band-pass filter off-line using digital filters
  Rationale/Comments: Depends on spectral characteristics of stimulus; digital filters minimize temporal phase shifts

Signal averaging
  Recommendation: 2 or more subaverages of 2000–3000 sweeps
  Rationale/Comments: Spectral-domain averaging will increase spectral estimates and require fewer sweeps

Averaging window
  Recommendation: Begin 10–50 msec before stimulus onset; extend 10–50 msec after stimulus offset
  Rationale/Comments: An adequate sample of the baseline is needed to determine whether a particular response peak is above the noise floor; for running window analysis, the pre-stimulus time window should be greater than or equal to the duration of the analysis window; neural activity should return to baseline

Simultaneous cABR-cortical response recording
  Recommendation: Only if large files can be accommodated and longer sessions are appropriate

Minimizing artifacts
  Recommendation: Passive collection protocol; electromagnetically shielded insert earphones; both stimulus polarities; use electrically shielded test booth; project movie into test booth; artifact rejection criterion: >20 μV; determine response replicability
  Rationale/Comments: Minimizes myogenic artifacts; minimizes stimulus artifact; enables adding of responses to minimize both stimulus artifact and cochlear microphonic; minimizes electrical artifact; exclude trials exceeding typical neural response size (criterion depends on high-pass filter setting)

cABRs, auditory brain stem responses to complex sounds.
Electrodes and electrode montage
For cABRs, a vertical one-channel montage is the most
common configuration. This configuration requires only three
electrodes corresponding to the active (noninverting), reference
(inverting), and ground electrodes. In our laboratory, the
preferred electrode placements are Cz (active), ipsilateral
earlobe (reference), and forehead or contralateral earlobe
(ground). We prefer to use the earlobe rather than the mastoid
because it is a noncephalic site that causes fewer artifacts from
bone vibration (Hall 2006). For researchers who intend to
record subcortical and cortical potentials simultaneously (see
below) or who wish to collect them within the same session,
cABRs can be recorded with an electrode cap.
Filters
Filtering is used to isolate subcortical activity from cortical potentials and to increase the SNR of the response. For cABRs, the band-pass filters match the range of settings used for click-ABRs and typically fall in the range of 100 to 3000 Hz. This frequency range has been found to maximize the detection of the high-frequency transient peaks, such as the click-evoked peaks I-V, which have sharp slopes. For stimuli containing frequencies below 100 Hz (or which produce distortion products below 100 Hz; Lee et al. 2009; Fig. 4), the high-pass cutoff should be lowered to ensure that these lower-frequency features are not lost. Another approach is to record with more open filters such as 30 to 3000 Hz (Galbraith & Doan 1995; Galbraith et al. 2004).‖

‖A more advanced recording technique uses a two-channel (or more) montage to either simultaneously record horizontal and vertical montages (Galbraith 1994) or to record multiple recording parameters from a single site (e.g., different filter bandwidths; see below). We commonly record in continuous (i.e., nonaveraged) mode using open filters (e.g., 0.1 to 3000 Hz) and then refilter off-line using more narrowly defined digital band-passes. Analog filters, generally used at the time of data collection, are more likely to introduce distortions (e.g., phase distortions) in the response, especially when cutoff frequencies are near the frequency range of the response. In addition, filter choice may be restricted by the recording system to only a handful of preset values. Thus, for recording systems that include the option to record with open filters, subsequent digital filtering is preferred. With off-line digital filtering, you have the capability to set cutoff values more precisely and to optimize the filter settings for a particular stimulus by systematically adjusting the band-pass to be more restrictive or more encompassing. However, open filters, because of their susceptibility to cortical activity and noise contamination, can be unsatisfactory for monitoring the quality of the response during acquisition. A two-channel solution, with a second channel using a more restricted band-pass (e.g., 100 to 2000 Hz) for on-line viewing, solves this problem.
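As a concrete illustration of the off-line digital filtering described in the footnote, the following sketch (Python/SciPy) applies a zero-phase Butterworth band-pass. The cutoff values shown are examples rather than prescriptions and should be tailored to the stimulus and recording.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass_zero_phase(x, fs, low_hz=100.0, high_hz=2000.0, order=2):
        """Band-pass filter a cABR waveform off-line without introducing phase shifts.

        filtfilt runs the filter forward and backward, so the pass-band incurs
        no net delay -- the property that motivates off-line digital filtering
        over on-line analog filtering.
        """
        nyq = fs / 2.0
        b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
        return filtfilt(b, a, x)

    # Example: refilter a continuously acquired recording sampled at 20 kHz.
    fs = 20000
    t = np.arange(0, 0.05, 1 / fs)
    raw = np.sin(2 * np.pi * 100 * t) + 0.3 * np.random.randn(t.size)  # toy signal
    refiltered = bandpass_zero_phase(raw, fs, 30, 3000)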
Sampling rate

Sampling rate (Fs), also referred to as the digitization rate,
determines how many times per second the neural signal is
digitally sampled by the recording system. In cases where only
low-frequency components of the response are of interest, a
low Fs (1000 to 2000 Hz) may be appropriate (Dajani et al.
2005; Aiken & Picton 2006). However, many researchers opt
to over-sample cABR recordings (rates range from 7 to 50
kHz) by sampling well above the Nyquist frequency (i.e., twice
the highest frequency in the stimulus) (Krishnan et al. 2005;
Musacchia et al. 2007; Akhoun et al. 2008a; Banai et al. 2009).
In addition to reducing sample errors, a higher Fs, by definition, increases the temporal precision of the recording and
allows for finer differentiation of response peaks. Because
cABR disruptions and enhancements occur on the order of
tenths of milliseconds, fine-grained temporal precision is essential. Although a higher sampling rate is desirable, the choice may be limited by the particular recording system. For example, some recording systems use a fixed number of sample points. In this case, the Fs is dependent on the duration of the recording window (Fs = sample points/duration).
Signal averaging
Number of sweeps • An age-old question in the EP literature
is how many sweeps must be averaged to obtain a robust and
reliable response. It is well established that for higher intensity
stimuli roughly 1000 to 2000 sweeps are needed to collect
click-ABRs and tone-FFRs (Hood 1998; Krishnan 2007;
Thornton 2007). For cABRs, a comparable but sometimes
greater number of sweeps are obtained (1000 to 6000). However, if analyses are only carried out in the frequency domain,
then spectral maxima may be detected (i.e., statistically above
the noise floor) with fewer sweeps (Dajani et al. 2005; Aiken
& Picton 2006, 2008; Chandrasekaran et al. 2009).
Our laboratory takes a conservative approach by collecting more stimulus trials rather than fewer, typically ~2000 to 3000 per
polarity (i.e., 4000 to 6000 total sweeps). There are several
reasons for this strategy. First, this allows for the creation of
subaverages that can be used to determine response repeatability and/or track how the response evolves over time. Second,
we are often interested in subtle response characteristics and
small group differences that may not be apparent until additional sweeps are collected and/or repeatability is confirmed. A
general principle of EP signal averaging is that the SNR is
proportional to the square root of the number of sweeps (Hood
1998; Hall 2006; Thornton 2007). Thus, the overall SNR
increases quickly at first and then begins to plateau with more
sweeps. However, an individual component of the cABR (e.g.,
a specific peak in the time domain or a spectral peak that is near
the phase locking limits of the brain stem) may show its own
SNR progression with different response components requiring
greater or fewer sweeps. Although it may not be possible to
determine the “optimal” number of sweeps for a given stimulus
and population before the start of an experiment, the optimal
range can be deduced a posteriori using an iterative off-line
averaging technique based on a handful of subjects from
whom a large number of sweeps have been collected (i.e.,
compare subaverages of 1000 sweeps, 1500 sweeps, 2000
sweeps, . . . 6000 sweeps). In the future, we envision that
better characterization of the cABR will enable the number
of sweeps to be reduced while still maintaining spectral and
temporal precision.
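Because the SNR grows roughly with the square root of the number of sweeps, the "optimal" sweep count can be explored a posteriori by averaging progressively larger subsets of the collected trials. A minimal sketch of that comparison is given below (Python/NumPy); the array layout and function names are assumptions for illustration only.

    import numpy as np

    def rms(x):
        return np.sqrt(np.mean(np.square(x)))

    def snr_by_sweep_count(sweeps, baseline_idx, response_idx,
                           counts=(1000, 1500, 2000, 3000, 4000, 6000)):
        """Estimate how the time-domain SNR grows as more sweeps enter the average.

        sweeps        -- array of shape (n_sweeps, n_samples)
        baseline_idx  -- slice covering the prestimulus (noise) period
        response_idx  -- slice covering the response period
        """
        out = {}
        for n in counts:
            if n > len(sweeps):
                break
            avg = sweeps[:n].mean(axis=0)
            out[n] = rms(avg[response_idx]) / rms(avg[baseline_idx])
        return out

    # The sqrt-N principle: doubling the sweep count improves the SNR by only
    # about sqrt(2) (~3 dB), which is why the gains flatten out quickly.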
Averaging window • In the time domain, the averaging window should be long enough to include a prestimulus baseline period, the response period, and a poststimulus period. The length of the poststimulus period needs to account for the stimulus transmission delay and neural conduction time. A poststimulus period between 10 and 50 msec is recommended to ensure that the response returns to baseline. The prestimulus baseline reflects the ambient EEG before the response, thereby assisting in the interpretation of the response. For example, when identifying prominent peaks in the response waveform, peak amplitudes are compared with the amplitude of the prestimulus period. For a given peak, if the amplitude does not exceed the baseline amplitude, it is not considered a valid (i.e., reliable) peak. The baseline period can also be used to determine the SNR (in the time and frequency domains) (see Data Analysis section). For running window analyses (see Data Analysis section), it is helpful to have a prestimulus period that is long enough to include one full analysis window. Because we typically perform running window analyses on 40 msec bins, we use a prestimulus window of at least 40 msec.

Simultaneous cABR-cortical EP recordings
Although we have had success simultaneously recording
cABR and cortical responses (Musacchia et al. 2008), there are
a number of practical limitations to this practice that arise from
cABRs and cortical EPs having different optimal recording
parameters. First, cABRs require a much higher Fs than
cortical responses (often a 10-fold or more difference)
(Burkard et al. 2007). Second, because cortical responses are
optimally obtained using slower stimulation rates than ABRs
(Burkard et al. 2007), the presentation rate must be slow for
simultaneous recordings. Yet, because cABRs are much
smaller in amplitude (typically <1 μV), many more trials must be collected for a robust cABR than for a cortical response, often leading to long recording sessions. These factors aggregate to create extremely large files, especially when high-density electrode caps are used, leading to concerns about both
computer processing power and data storage. For these reasons,
we usually opt to collect brain stem and cortical-evoked
responses in separate recording sessions, optimizing recording
lengths, numbers of channels, and sampling rates for each.
Avoiding, detecting, and eliminating artifact
There are four types of artifacts that can distort ABR
recordings: external (i.e., nonbiological) electrical noise, myogenic (muscular) artifact, CM, and stimulus artifact. Although
artifacts can be minimized, it is best to remove the contamination at its source.
Electrical • When combating electrical artifact such as line
noise (60 or 50 Hz), the best tactic is to record within an
electrically shielded booth and remove all electrical sources
from the booth including televisions, and CRT and LCD
computer monitors. Light dimmers are another serious source
of noise. If the experimenter wishes to play a movie or another
visual stimulus during the experiment, two different approaches can be taken. The cheaper option is to use a portable
battery-powered DVD player that is placed on a table in front
of the subject. The second option, and the one we use most
often, is to use an LCD projector located outside the booth that
projects the visual stimulus through a booth window onto a
screen inside the booth.
Another type of electrical artifact comes from the electrical
trigger pulse that is used to synchronize stimulus presentation
and response averaging. This artifact appears at time zero. If a
long trigger is used, a second artifact may appear when the
trigger turns off. If the duration of the trigger pulse can be
manually set within the stimulus presentation software, this
type of artifact can be reduced by either shortening the trigger
pulse so it occurs before the onset of the response (e.g., <5 msec), or by making it longer than the stimulus.

Fig. 7. Detecting stimulus artifact. Stimulus artifacts are easily discernable in the response. Unlike the response (bottom), the artifact (middle) contains frequencies that are higher than the phase-locking capabilities of the brain stem (Moushegian et al. 1973). In contrast to the cABR, which occurs within 6 to 10 msec after the stimulus (top) is played, the stimulus artifact exhibits no delay. In addition, the artifact is often larger than a typical cABR. In this example, the artifact to a 40 msec /dɑ/ (Fig. 1) is 10 times larger than the response. Stimulus artifact can be minimized by using electromagnetically shielded insert earphones and adding the responses to alternating polarities (Fig. 8).
Myogenic • Given that cABRs are typically recorded with
wide band-pass filters (see above), myogenic artifacts (e.g.,
neck tension, smiling) are often not filtered out. Because
myogenic artifacts produce potentials that can be many times
larger than the brain stem response, trials for which the
amplitude exceeds a specific threshold should be excluded
from the final average (either on-line or off-line). In the cABR
literature, this threshold ranges from ±20 to ±75 μV (Galbraith & Doan 1995; Akhoun et al. 2008b). Although this
technique removes large artifacts, it does not completely
expunge all myogenic contamination from the recording. For
this reason, it is important to keep the subject relaxed and still
during the recording session.
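In code, amplitude-based rejection reduces to discarding any sweep whose peak absolute amplitude exceeds the criterion, as in the following sketch (Python/NumPy). The default criterion is only an example and, as noted above, depends on the high-pass filter setting.

    import numpy as np

    def reject_artifacts(sweeps, threshold_uv=20.0):
        """Drop sweeps whose peak absolute amplitude exceeds an artifact criterion.

        sweeps       -- array of shape (n_sweeps, n_samples), in microvolts
        threshold_uv -- rejection criterion; values of roughly 20 to 75 uV appear
                        in the cABR literature
        """
        keep = np.max(np.abs(sweeps), axis=1) <= threshold_uv
        print(f"rejected {np.count_nonzero(~keep)} of {len(sweeps)} sweeps")
        return sweeps[keep]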
Cochlear microphonic • The CM is a potential generated by
the cochlear hair cells that, similar to the FFR, mimics the
temporal waveform of the acoustic stimulus. Because of its
similarity to the neural response, care must be taken to prevent
or remove the CM from the recordings. The CM can be
distinguished from the brain stem response in a number of
ways. Unlike the cABR, which occurs at ~6 to 10 msec
poststimulus onset, the onset of the CM is nearly coincident
with the stimulus. The CM and cABR are also differentially
affected by rate, intensity, and noise. For example, although
cABRs break down with increases in presentation rate and
simultaneous masking intensity, the CM remains unaffected.
Furthermore, in contrast to cABR amplitude, which plateaus at
suprathreshold levels, the size of the CM usually increases
linearly with moderate increases in intensity. For more information on the CM and how it can be isolated from the ABR, we refer
the reader to the following studies: Starr and Don (1988), Aiken
and Picton (2008), and Chandrasekaran and Kraus (2009).

Fig. 8. Adding polarities (A and B) minimizes stimulus artifact and cochlear microphonic. Responses to the A and B polarities of the 40-msec /dɑ/ from Figure 1. (Top) The response to polarity A (inset) is magnified (-5 to 10 msec) to illustrate the stimulus artifact. When the A response (red) is compared with the stimulus (light gray), the two waveforms align in phase for ~4 msec. This is because the stimulus artifact (and CM) follow the temporal pattern of the stimulus waveform. (Middle) The response to polarity B (blue) is inverted with respect to the A response in this region. (Bottom) By adding A and B responses (gray), the artifact is canceled. In contrast, the artifact is accentuated when the two polarities are subtracted (black). Thus, although the analysis of the subtracted waveform or the single polarity response can be complicated by unwanted artifacts, the added response ensures a response of neurogenic origins (Aiken et al. 2008). In this figure, ADD = (A + B)/2; SUB = (A - B)/2.
Stimulus artifact • Given that the cABR occurs within a
matter of milliseconds after stimulation, and the fact that the
cABR closely mimics the stimulating waveform, stimulus
artifact is a major concern. Fortunately, this type of artifact is
easy to detect (Fig. 7) and can be minimized with the right
recording techniques (Fig. 8).
In most modern EP collection systems, the stimulus waveform is sent as an electrical signal to a transducer where it is
converted to an acoustic signal. If the transducer is not properly
shielded, the electrical signal can “leak” and get picked up by
the electrodes and get recorded by the EP system along with the
response (Fig. 8). In addition to using electromagnetically
shielded earphones (Akhoun et al. 2008a,b), it is good practice
to double check that the electrode leads and transducer cables
are not touching and to position the electrodes and transducer
as far apart as possible. This can be achieved with insert
earphones that use a plastic tube to separate the transducer and
foam earplug. Using longer tubes or positioning the transducer
outside the test booth can further minimize stimulus artifact.
For an in-depth discussion of stimulus artifacts in cABR
recordings, refer to Aiken and Picton (2008) and Akhoun et al.
(2008a,b).
Minimizing the CM and stimulus artifact • Given that
both artifacts follow the phase of the stimulus exactly, stimulus
artifact and CM can be minimized from the response by adding
responses to alternating stimulus polarities (Fig. 8, see Stimulus Presentation section).
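The adding and subtracting of polarities shown in Figures 6 and 8 amounts to simple waveform arithmetic on the two averaged responses; a minimal sketch (Python/NumPy; the function name is ours) is given below.

    import numpy as np

    def add_sub_polarities(resp_a, resp_b):
        """Combine averaged responses to the two stimulus polarities.

        ADD = (A + B) / 2 emphasizes envelope phase locking and cancels stimulus
        artifact and cochlear microphonic, which invert with the stimulus.
        SUB = (A - B) / 2 emphasizes the spectral (fine-structure) response but
        can accentuate those same artifacts.
        """
        resp_a = np.asarray(resp_a, dtype=float)
        resp_b = np.asarray(resp_b, dtype=float)
        add = (resp_a + resp_b) / 2.0
        sub = (resp_a - resp_b) / 2.0
        return add, sub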
Active versus passive test conditions
Because ABRs are not greatly affected by sleep, click-ABRs
are often collected while the patient is asleep or sedated. Similarly,
to reduce myogenic artifact, many cABR researchers allow or
even encourage their subjects to fall asleep on a cot or to recline
comfortably in a chair (Dajani et al. 2005; Krishnan et al. 2005;
Aiken & Picton 2008). However, to rule out differences in state as
a potential confound, our subjects stay awake during testing. To
promote relaxation and stillness, subjects watch a movie or read a
book. For monaural stimulation, the movie soundtrack is played at
a low level (~40 dB SPL), so that it can be heard in the nontest
ear without masking the auditory stimulation. Subtitles are displayed for binaural recordings.
cABRs can also be recorded under active test conditions in
which the subject performs a task (e.g., detecting/counting
oddball stimulus tokens; Musacchia et al. 2007). For example,
using an audiovisual paradigm, Musacchia et al. revealed that
active multisensory integration can shape how subcortical
sensory processes respond to speech and music (Musacchia et
al. 2006, 2007). Although there is some disagreement in the
literature as to whether attention modulates the click-ABR
(Picton & Hillyard 1974; reviewed by Rinne et al. 2008),
Galbraith and others demonstrated that attentional state can
govern the FFR to tones and speech (Galbraith & Arroyo 1993;
Galbraith & Kane 1993; Galbraith & Doan 1995; Galbraith et
al. 1998, 2003; Hoormann et al. 2004). This is consistent with
recent functional MRI work showing that selective auditory
attention tasks can modulate the activation of subcortical
structures (Rinne et al. 2008).
Notably, to study the dynamic nature of auditory processing, the subject need not be performing an active task during
data collection. A growing body of research supports the use of
passive recording conditions to study how brain stem function
is fine tuned by experience. Although the subject is not actively
processing the sounds evoking the response, cABRs tap into
how previous active engagement with sound that has occurred
during the course of lifelong or short-term auditory experiences
has shaped brain stem processes. This refinement of the
sensory system likely results from an interplay between subcortical structures and high-order cognitive processes via the
corticofugal system (Banai et al. 2005, 2009; Krishnan et al.
2005; Russo et al. 2005; Musacchia et al. 2007; Wong et al.
2007; Song et al. 2008; Lee et al. 2009; Strait et al. 2009b;
reviewed by Kraus et al. 2009; Tzounopoulos & Kraus 2009).
DATA ANALYSIS
To analyze the transient and sustained aspects of cABRs,
our laboratory uses a battery of measures to appraise the timing
and magnitude of neural synchrony, as well as the strength and
precision of phase locking. Because cABRs are rich in temporal and spectral information, the use of multiple measures
allows us to (1) dissect individual components of the response
and how they reflect distinct aspects of processing and (2)
describe brain stem encoding in a holistic manner. Because of
the transparency between the temporal and spectral features of
the stimulus and the response, our analyses are largely stimulus
driven. That is to say, we base our analyses and interpretation
on the acoustic make-up of the stimulus. Because of this
stimulus-response fidelity, commonly used digital signal-processing tools such as cross-correlation and Fourier analysis can
be used to analyze both the stimulus and response. Each of
these techniques comes in many variants and each belongs to a
large family of analysis methods; however, we generally use
each in its most basic form. For more information on digital
signal processing, we refer the reader to van Drongelen (2007),
Wallisch (2009), and Porat (1997).
This section includes an overview and illustration of the most
common signal-processing techniques used to evaluate cABRs,
namely peak latency and amplitude measurements, root mean
square (RMS) amplitude, cross-correlation, and Fourier analysis.
A summary is provided in Table 4. The analyses described below
are typically performed off-line on the averaged time-domain
response or subaverages. Although some of these measurements
can be made directly by the EP collection system, others require
the use of computational software packages such as MATLAB.
For researchers and clinicians who are not in a position to
code their own algorithms, we have developed an open
source MATLAB-based toolbox (The Brainstem Toolbox)
that is available for free upon request under the GNU
General Public License (contact eeskoe@northwestern.edu
for more information).
Analyzing transient responses
Peak latency and amplitude • To characterize the transient
features of the response, individual peaks relating to major
acoustic landmarks in the stimulus are identified (Fig. 1). For
each peak, latency (time relative to stimulus onset) and
amplitude measurements are obtained. Interpeak measurements
are also calculated; these include interpeak amplitude, duration,
slope, and area (Russo et al. 2004). In general, transient peaks
occur within 6 to 10 msec after the corresponding stimulus
landmark. Automated peak-picking algorithms can be used to
objectively identify maxima (peaks) or minima (troughs)
known to occur within a given latency range. To be considered
a reliable peak, the absolute amplitude must be larger than the
baseline activity recorded before the onset of the stimulus.
Confidence in selection of ambiguous peaks is aided by
referring to subaverages. Once the peaks have been identified,
they are visually confirmed or rejected by multiple raters who
are blind to subject group or stimulus contrasts. When the
raters disagree, the selection is determined by the most experienced rater. However, bear in mind that agreement among
raters may reflect common training in peak identification.
Consequently, if peaks cannot be identified by objective methods, an external rater should also be consulted whenever feasible.
TABLE 4. Methods for analyzing cABRs

Transient features

Peak latency and amplitude
  Description: Delineation of transient response peaks

Sustained features

RMS amplitude
  Description: Global measure of magnitude
  Rationale/Comments: Used to calculate SNRs

Fourier analysis
  Description: Frequency domain representation
  Rationale/Comments: Used to measure the precision and magnitude of neural phase locking at specific frequencies and frequency ranges; amplitude and phase are recorded

Cross-correlation
  Description: Compares the timing and morphology of two signals
  Rationale/Comments: Signal 1 is shifted in time relative to signal 2 to find the shift that produces the strongest correlation. Examples: stimulus-to-response and quiet-to-noise cross-correlation. If the correlation coefficient (r) = 1, the signals are identical; if r = 0, the signals are completely dissimilar

Autocorrelation
  Description: A signal is cross-correlated with itself
  Rationale/Comments: Used to find (1) repeating patterns in signals, such as phase-locked activity to the F0 and the amplitude envelope, and (2) the strength of phase locking

Sliding window analyses
  Description: Small time bins (i.e., windows) of the signal are analyzed in succession to create a three-dimensional representation of the response (e.g., spectrograms and autocorrelograms)
  Rationale/Comments: Used to evaluate and visualize how cABRs change over time

cABRs, auditory brain stem responses to complex sounds; RMS, root mean square; SNRs, signal to noise ratios.
A number of techniques have been developed to aid in the identification of difficult-to-identify or low-amplitude peaks.
These include wavelet denoising (Quian Quiroga et al. 2001;
Russo et al. 2004) and high-pass filtering (Johnson et al.
2008b; Hornickel et al. 2009b). When determining which
peaks to pick in the cABR to a novel stimulus, start by
generating a grand average response of all subjects, and then
compare the grand average with the stimulus waveform to
determine where the two waveforms align. Once this has
been performed, the individual waveforms should be reviewed to determine which peaks have high replicability
across subjects (i.e., <1 msec deviation across subjects/
groups).
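An automated peak-picking routine of the kind mentioned above can be sketched as follows (Python/SciPy). The latency window, the baseline criterion, and the function name are illustrative assumptions, not a description of any published algorithm, and a prestimulus baseline is assumed to be present.

    import numpy as np
    from scipy.signal import find_peaks

    def pick_peak(avg, fs, t0_ms, search_ms, polarity=1):
        """Locate the largest peak (polarity=+1) or trough (-1) in a latency range.

        avg       -- averaged response; its first sample corresponds to t0_ms
                     (negative when a prestimulus baseline is included)
        fs        -- sampling rate (Hz)
        search_ms -- (start, stop) of the expected latency range, in msec
        Returns (latency_ms, amplitude) or None if no pick beats the baseline.
        """
        def to_idx(ms):
            return int(round((ms - t0_ms) / 1000 * fs))

        lo, hi = to_idx(search_ms[0]), to_idx(search_ms[1])
        segment = polarity * avg[lo:hi]
        candidates, _ = find_peaks(segment)
        if candidates.size == 0:
            return None
        best = candidates[np.argmax(segment[candidates])]

        # Accept the pick only if it exceeds the prestimulus (baseline) activity.
        baseline = avg[:to_idx(0)]
        if np.abs(avg[lo + best]) <= np.abs(baseline).max():
            return None
        return t0_ms + (lo + best) / fs * 1000, avg[lo + best]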
Differences in latency over time • When multiple stimulus
conditions are compared, a more advanced technique involves
calculating how the latency changes between/among conditions
as a function of time. For example, recent work from our
laboratory (Johnson et al. 2008b; Hornickel et al. 2009b)
showed that the formant frequency differences differentiating
the stop consonants /bɑ/, /dɑ/, and /gɑ/ are represented by
systematic and progressive latency differences in the cABR
with /gɑ/ responses occurring first, followed by /dɑ/, and then
by /bɑ/ (i.e., higher stimulus frequencies yield earlier response
latencies). These latency differences can be visualized using a
latency-tracking plot (Fig. 9).

Fig. 9. Tracking latency differences over time. The frequency trajectories that differentiate the consonant-vowel stop syllables /bɑ/, /dɑ/, and /gɑ/ are represented in the responses by latency differences, with /gɑ/ responses occurring first, followed by /dɑ/ and then /bɑ/ (i.e., higher frequencies yield earlier peak latencies than lower frequencies). In the stimulus, the frequency differences diminish during the course of the 50-msec formant transition. (Top) This pattern is reflected in the timing of the cABR (/gɑ/ < /dɑ/ < /bɑ/) and is most apparent at five discrete response peaks. Peaks ~55 msec are magnified in the inset. (Bottom) The normalized latency difference between responses is plotted as a function of time (see Johnson et al. 2008b; Hornickel et al. 2009b, for details).
Analyzing sustained responses
Static and sliding window analyses • The response to
periodic features (e.g., steady-state vowels, formant transitions,
pitch contours, steady-state musical notes, and glissandos) can
be analyzed using RMS, cross-correlation, and Fourier analysis. Each of the analysis techniques described below can be
used to perform “static” window or “sliding” window (also
called running window) analyses. A single region of the
time-amplitude waveform is evaluated in a static window analysis. For sliding-window analyses, small time bins (i.e., windows) of the signal are analyzed in succession. This technique captures how the signal changes over time and it is often used to create a three-dimensional representation of the signal, such as spectrograms (Figs. 5 and 13) and correlograms (Figs. 10 and 11). For time-frequency varying stimuli, such as
Mandarin pitch contours, diphthongs, and glissandos, frequency-tracking plots are generated using sliding window analysis
(Krishnan et al. 2005; Wong et al. 2007) to capture how the
changing F0 or harmonic is tracked in the response over time.
RMS amplitude • For cABRs, RMS amplitude represents the
magnitude of neural activation over a given time period (Russo
et al. 2004; Akhoun et al. 2008a). RMS is calculated by (1)
squaring each point, (2) finding the mean of the squared values,
and then (3) taking the square root of the mean. The quotient of
response RMS amplitude (i.e., signal) and prestimulus baseline
RMS amplitude (i.e., noise) can serve as a measure of SNR
(Russo et al. 2004). If the SNR is <1, the prestimulus activity is larger than the “response” activity. In cases where the SNR of the cABR is <1.5, we recollect the cABR or exclude the
subject. A typical cABR has an SNR in the range of 2.5 to 3,
although SNRs as high as 6 are not uncommon.
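The RMS-based SNR can be computed in a few lines; the sketch below (Python/NumPy; the function names are ours) assumes the averaged waveform begins with a prestimulus baseline of known duration.

    import numpy as np

    def rms(x):
        """Root mean square: square each point, average, take the square root."""
        return np.sqrt(np.mean(np.square(x)))

    def time_domain_snr(avg, fs, prestim_ms=40.0):
        """Quotient of response RMS and prestimulus (baseline) RMS.

        avg -- averaged response beginning prestim_ms before stimulus onset
        fs  -- sampling rate (Hz)
        """
        onset = int(round(prestim_ms / 1000 * fs))
        noise = rms(avg[:onset])
        signal = rms(avg[onset:])
        return signal / noise  # values below ~1.5 would prompt recollection here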
Cross-correlation • Correlation is a useful tool for comparing the overall morphology and timing of two signals (e.g.,
stimulus versus response; Fig. 10; Russo et al. 2004; Akhoun et
al. 2008a). In general terms, cross-correlation determines the
extent to which two signals are correlated, as a function of the
time shift between them. At a given time displacement, if two
signals are identical, the cross-correlation coefficient (r) is 1. If
the signals are identical but 180 degrees out of phase, it is -1. However, if the signals are completely dissimilar, r = 0. In
addition to using cross-correlation to determine the degree of
similarity, it can also be used to quantify the time delay
between the two signals (i.e., time displacement that produces
the greatest r value). The onset of the response can be
objectively determined in this manner (Akhoun et al. 2008a) by
correlating the stimulus and the response. In addition, two
responses can be cross-correlated to determine how much the
response has been degraded in noise (Russo et al. 2004, 2005)
or how the response changes for different stimulus conditions,
for example, left versus right ear stimulation (Hornickel et al.
2009a). When performing stimulus-to-response correlations,
the stimulus is low-pass filtered to remove the high frequencies
that are not present in the response (Russo et al. 2004; Akhoun
et al. 2008a) and then resampled, if necessary, to match the
sample rate of the response (see Stimulus Presentation section).

Fig. 10. Stimulus to response cross-correlation. Cross-correlation is used to compare the timing and morphology of two signals (A). The response (black, bottom) to a 170-msec /dɑ/ (gray, top) is compared with a low-pass filtered version (dark gray, middle) of the evoking stimulus. The stimulus consists of an onset stop burst, consonant-vowel formant transition, and a steady-state (i.e., unchanging) vowel. (B) This plot represents the degree to which the low-pass stimulus and response are correlated as a function of the time shift. The maximal correlation is reached at an 8.5-msec time displacement, an indication of the neural transmission delay (rmax = 0.60; rmax = 0.32 for the unfiltered stimulus [correlogram not shown]). An alternative approach is to cross-correlate the response with the stimulus envelope (Akhoun et al. 2008a), which can often lead to higher correlation values. (C) Running-window analyses can be used to visualize and quantify the similarity of two signals across time. In this example, when the same low-pass stimulus and response are compared in this manner (40 msec windows), the two signals are more similar during the steady-state region, although the delay is consistent across time. (To listen to the stimulus, go to Audio file, Supplemental Digital Content 7, http://links.lww.com/EANDH/A7; to listen to the response, go to Audio file, Supplemental Digital Content 8, http://links.lww.com/EANDH/A8.)
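A stimulus-to-response cross-correlation of the kind shown in Figure 10 can be sketched as follows (Python/SciPy). The low-pass cutoff, lag range, and normalization are illustrative choices, and the stimulus is assumed to have been resampled to the response rate beforehand.

    import numpy as np
    from scipy.signal import butter, filtfilt, correlate, correlation_lags

    def stim_to_response_xcorr(stim, resp, fs, lowpass_hz=1000.0, lag_range_ms=(3, 15)):
        """Cross-correlate a low-passed stimulus with the response and return the lag
        of maximal correlation, one estimate of the neural transmission delay.

        stim, resp   -- 1-D arrays already resampled to the same rate fs (Hz)
        lag_range_ms -- lag range (msec) over which to search for the peak
        """
        # Remove stimulus frequencies above the phase-locking range of the brain stem.
        b, a = butter(2, lowpass_hz / (fs / 2), btype="low")
        stim_lp = filtfilt(b, a, stim)

        # Roughly normalized cross-correlation as a function of lag
        # (an exact Pearson r would require per-lag normalization).
        s = (stim_lp - stim_lp.mean()) / stim_lp.std()
        r = (resp - resp.mean()) / resp.std()
        cc = correlate(r, s, mode="full") / len(r)
        lags_ms = correlation_lags(len(r), len(s), mode="full") / fs * 1000

        window = (lags_ms >= lag_range_ms[0]) & (lags_ms <= lag_range_ms[1])
        best = np.argmax(cc[window])
        return lags_ms[window][best], cc[window][best]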
Autocorrelation • Cross-correlation can also be used to find
repeating patterns (i.e., periodicities) within a signal, such as
the fundamental periodicity (Krishnan et al. 2005; Wong et al.
2007) and the temporal envelope (Lee et al. 2009). This class
of cross-correlations is called autocorrelation because a signal
is correlated with itself. Autocorrelations are performed by
making a copy of a signal and then shifting the copy forward
in time (Fig. 11).
The fundamental frequency is represented in the stimulus
and cABR by peaks occurring at the period of the F0 (period = 1/frequency). The interpeak interval (period) can be found by
calculating the time shift at which the signal best correlates
with itself. Thus, autocorrelation is an objective way for
determining interpeak intervals and it can be used to estimate
the F0 of the response (calculated as 1/d, where d is the time shift
needed to obtain the maximum autocorrelation). The strength of
phase locking to the F0 can also be estimated by this maximal
correlation coefficient value. In addition, autocorrelation functions
that have broader morphology reflect less robust phase-locked
responses, and steeper functions reflect sharper and more robust
phase locking (Krishnan et al. 2005; Lee et al. 2009).
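As an illustration, the fundamental periodicity of a single analysis window can be estimated by the sketch below (Python/NumPy); the search range of plausible F0 values is an assumption.

    import numpy as np

    def f0_by_autocorrelation(x, fs, fmin=80.0, fmax=400.0):
        """Estimate the fundamental periodicity of a response segment.

        The segment is correlated with a time-shifted copy of itself; the shift
        (within a plausible F0 range) giving the highest correlation is taken as
        the fundamental period, and the correlation at that shift indexes the
        strength of phase locking.
        """
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)  # lags in samples
        r = np.array([np.corrcoef(x[:-lag], x[lag:])[0, 1] for lag in lags])
        best = np.argmax(r)
        return fs / lags[best], r[best]  # (F0 in Hz, correlation strength)

    # e.g., a 40-msec window whose best shift is 10.45 msec corresponds to ~96 Hz.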
Fig. 11. An illustration of frequency tracking by autocorrelation. By cross-correlating a response waveform with itself, the time interval between peaks can
be determined. The frequency of the F0 and other periodic aspects of the response, including the temporal envelope (Krishnan et al. 2004; Lee et al. 2009),
can be derived from an autocorrelogram. This technique can also be used to calculate the strength of phase locking to these features. (A) In this example, the
response to a syllable /mi/ with a dipping F0 contour (Mandarin Tone 3; black line in B) is plotted. (B) By applying the autocorrelation technique on 40-msec
sliding windows, a frequency contour can be tracked over time. Colors represent the strength of correlation; white is highest. (C and D) An illustration of
cross-correlation performed on a single time window (100 to 140 msec; demarcated in A). When a copy of this window is shifted by 10.45 msec, the first
peak of the copy lines up with the second peak of the original (C). A correlogram (D) represents the degree of correlation as a function of the time shift. The
highest correlation occurs at 10.45 msec; thus, the fundamental periodicity of this window is 1/10.45 msec or 96 Hz. The strength of the correlation at 10.45
msec is r = 0.98, indicating strong phase locking to 96 Hz in this time window.
An autocorrelogram, created via sliding window analysis, is
a visual representation of how well the signal correlates with
itself across time. In the three-dimensional graph (Fig. 11B),
the degree of correlation is represented in color, with the
vertical axis representing the time shift and the horizontal axis
representing time. Autocorrelograms can be used to evaluate
frequency tracking to the F0 and amplitude envelope. Phase
locking can then be described in terms of consistency (i.e., how
much the maximum r value deviates over time) and strength over
time (i.e., the average maximum r value over time; Fig. 11;
Krishnan et al. 2005; Wong et al. 2007; Lee et al. 2009).
Fourier analysis • A frequency domain representation of the
cABR can be generated using Fourier analysis. This method
can be used to measure the precision and magnitude of neural
phase locking at specific frequencies or frequency ranges.
Fourier analyses can be used to generate a frequency-domain
average. An alternative, less computationally demanding technique is to perform an FFT on the time-amplitude average.
One of the basic properties of periodic waveforms is that
when two or more waves interact, the resulting waveform is the
sum of the individual components (assuming a linear system).
This is the principle underlying Fourier analysis. Using Fourier
analysis, a complex waveform consisting of many frequency
components is decomposed into a set of sine waves. The
magnitude of each sine wave corresponds to the amount of
energy contained in the complex waveform at that frequency.
The spectral composition of a complex wave can then be
represented by plotting the frequency of the sine wave on the
x axis and the magnitude on the y axis (Figs. 4 and 12). The fast
Fourier transform (FFT) (Cooley & Tukey 1965) is the most
common algorithm for performing spectral analysis although
other Fourier-based methods have been used by cABR researchers (Dajani et al. 2005; Aiken & Picton 2008). The FFT
is most efficient (i.e., faster) when the signal N (defined as the number of points) is a power of 2. However, software such as MATLAB and Mathematica (Wolfram Research, Inc., Champaign, IL) do not require the input to be a set length.

Fig. 12. Fast Fourier analysis of a complex signal with time-varying features. This response was evoked by a 40-msec /dɑ/ sound, comprising an onset stop-burst followed by a consonant-vowel formant transition period. A frequency-domain representation of the frequency following response was generated using the fast Fourier transform (FFT). As a measure of phase locking, spectral amplitudes are calculated over a range of frequencies corresponding to the F0 (103 to 125 Hz) and the first formant (F1; 220 to 720 Hz). The noise floor is plotted in gray. The time-domain representation of this response is plotted in Fig. 1.
When dealing with finite signals, such as cABRs, the
frequency resolution is dependent on the duration of the sample
being analyzed (resolution = 1/T, where T = duration in seconds). For a 50-msec signal, the frequency resolution is
1/0.05 or 20 Hz. The resulting frequency spectrum contains
information only at integer multiples of 20 Hz (i.e., 0, 20, 40,
60 . . . Nyquist frequency; 0 Hz = DC component). If the signal contains a frequency component at 130 Hz, the amplitude of the 130 Hz component, which is not an integer multiple of the frequency resolution, “leaks” into the neighboring components (i.e.,
120, 140). This leakage can be reduced by increasing T. As a
general rule of thumb, T should be long enough to include a
minimum of two to four cycles of the lowest frequency of
interest. For example, if you wish to characterize a 100-Hz
frequency component, the duration of the signal should be at
least 20 msec (i.e., [1/100] × 2). One trick for “increasing” T,
without actually taking a longer sample of the signal, is to add
a series of zeros to the end of the original sample (often called
zero padding; Dajani et al. 2005; Wong et al. 2007). For
example, if the 50 msec sample has a 20-kHz sample rate (1000
point sample), to increase the resolution from 20 to 1 Hz,
19,000 zeros are added onto the end of the sample before
performing the FFT.
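The effect of zero padding on frequency resolution can be demonstrated with a toy signal, as in the sketch below (Python/NumPy). Note that padding interpolates the spectrum; it does not add information beyond what the original sample contains.

    import numpy as np

    fs = 20000                                                  # 20-kHz sampling rate
    x = np.sin(2 * np.pi * 130 * np.arange(0, 0.05, 1 / fs))    # 50-msec, 130-Hz component

    # Without padding: 1000 points -> 1/0.05 s = 20-Hz resolution, so the
    # 130-Hz energy leaks into the 120- and 140-Hz bins.
    spec = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)

    # Zero padding to 20,000 points "increases" T to 1 s -> 1-Hz resolution.
    n_padded = 20000
    spec_padded = np.abs(np.fft.rfft(x, n=n_padded)) / len(x)
    freqs_padded = np.fft.rfftfreq(n_padded, 1 / fs)

    print(freqs[np.argmax(spec)], freqs_padded[np.argmax(spec_padded)])  # ~120-140 Hz vs ~130 Hz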
Another thing to consider when performing Fourier analyses is that the FFT treats the sample as if it were a continuous
loop in which the first and last samples are contiguous. Thus,
if the starting and ending amplitudes are not the same, the
amplitude difference gets reflected in the FFT output. When
the discontinuity is large, it creates a click-like feature in the
response. Because clicks are broadband, this discontinuity
results in frequency splattering that contaminates the accuracy
of the spectral analysis. To prevent this splatter, a common
countermeasure is to multiply the signal by a windowing
function, which tapers the amplitudes on both ends so that the sample begins and ends with zero amplitude. Although
window functions come in many different shapes, we typically
use a Hanning window, which has a bell-shaped function.
For cABRs, frequency spectra are analyzed with respect to
the frequency composition of the stimulus. Because stimulus
and response amplitudes occur on different scales, the amplitudes must be normalized to plot the two spectra on the same
plot. This can be achieved by converting both spectra to
decibels (Aiken & Picton 2008) or by dividing each spectral
amplitude by the corresponding spectral maximum (Fig. 4; Lee
et al. 2009). When analyzing the response in the frequency
domain, spectral maxima corresponding to the stimulus F0 and
its harmonics are identified, and the phase and amplitude
(modulus of the FFT) of the maxima are recorded. Fourier
analysis is also useful for calculating the amplitude over a
range of frequencies, especially in cases when the stimulus has
time-varying features such as formant transitions (Fig. 12;
Banai et al. 2009). By performing an FFT on the prestimulus
time window, the spectral noise floor can be estimated (Fig. 12)
and used to calculate spectral SNRs.
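Putting these pieces together, the sketch below (Python/NumPy; the function name is ours) applies a Hanning window, computes the FFT, and averages spectral amplitude over F0 and F1 ranges for both the response and the prestimulus baseline; the band edges simply mirror the 40-msec /dɑ/ example in Figure 12.

    import numpy as np

    def spectral_amplitudes(avg, fs, prestim_ms=40.0, f0_band=(103, 125), f1_band=(220, 720)):
        """Hanning-windowed FFT of the response with a prestimulus noise floor.

        Returns mean spectral amplitude over the F0 and F1 ranges for the
        response and for the prestimulus period, from which spectral SNRs can
        be formed.
        """
        onset = int(round(prestim_ms / 1000 * fs))
        segments = {"response": avg[onset:], "baseline": avg[:onset]}
        out = {}
        for name, seg in segments.items():
            win = np.hanning(len(seg))          # taper so the segment starts/ends near zero
            spec = np.abs(np.fft.rfft(seg * win)) / len(seg)
            freqs = np.fft.rfftfreq(len(seg), 1 / fs)
            out[name] = {
                "F0": spec[(freqs >= f0_band[0]) & (freqs <= f0_band[1])].mean(),
                "F1": spec[(freqs >= f1_band[0]) & (freqs <= f1_band[1])].mean(),
            }
        return out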
If performed as part of a sliding-window analysis, the FFT
can be used to generate a spectrogram, a three-dimensional
graph of the frequency spectrum as a function of time. This
type of analysis is often referred to as short-term Fourier
transform. In these plots, the horizontal axis (x axis) represents
time, the vertical axis (y axis) represents frequency, and the
third dimension represents the amplitude at a given time-frequency point. The third dimension is usually represented
using a color continuum. Frequency tracks can be derived from
short-term Fourier transforms of the response (Fig. 13; Musacchia et al. 2007; Song et al. 2008). Wavelets provide an
alternative method for performing time-frequency analyses
(see Addison et al. 2009 for an overview of the emerging role
of wavelets in biological signal analysis).

Fig. 13. An illustration of frequency tracking by the short-term Fourier transform (STFT) method. STFT is a method for examining frequency tracking, which enables the tracking of the stimulus F0 and harmonics. (Top) The response (black) to a /mi/ syllable (gray), with a rising pitch contour (Mandarin Tone 2). The rising pitch is evident in the increasingly smaller interpeak intervals in the stimulus and response over time. (Middle) The estimated response F0 (yellow) contour is plotted against the known F0 of the stimulus (black). Each point represents the spectral maximum within a single 40-msec window of a sliding-window STFT analysis. The precision of phase locking can be measured by calculating the frequency error between the stimulus and response trajectories (Wong et al. 2007; Russo et al. 2008). (Bottom) Plotting the resulting spectrogram of the STFT procedure enables a visualization of the response’s tracking of F0 and its harmonics. (To listen to the stimulus, go to Audio file, Supplemental Digital Content 9, http://links.lww.com/EANDH/A9; to listen to the response, go to Audio file, Supplemental Digital Content 10, http://links.lww.com/EANDH/A10.)
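A frequency-tracking plot like the one in Figure 13 can be approximated with a sliding-window FFT. The sketch below (Python/SciPy) extracts, for each 40-msec window, the frequency of maximal spectral amplitude within a plausible F0 range; all parameter values are illustrative.

    import numpy as np
    from scipy.signal import stft

    def track_f0(resp, fs, win_ms=40.0, fmin=80.0, fmax=400.0):
        """Sliding-window FFT (spectrogram) and a simple F0 track for a cABR.

        For every window, the frequency with maximal spectral amplitude inside
        a plausible F0 range is taken as the response F0 for that window.
        """
        nperseg = int(win_ms / 1000 * fs)
        freqs, times, spec = stft(resp, fs=fs, window="hann",
                                  nperseg=nperseg, noverlap=nperseg // 2)
        band = (freqs >= fmin) & (freqs <= fmax)
        f0_track = freqs[band][np.argmax(np.abs(spec[band, :]), axis=0)]
        return times, f0_track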
SUMMARY
1. ABRs provide an objective and noninvasive means for examining how behaviorally relevant sounds such as speech and music are transcribed into neural code. The brain stem response is ideal for studying this process because stimulus features are preserved in the response. Notably, this process is not “hard coded.” Brain stem encoding of speech and other complex signals is shaped by short-term and lifelong musical and language experience and is thereby tightly coupled with cognitive processes. Aspects of the response are selectively impaired or enhanced in impaired and expert populations (e.g., children with reading impairments and musicians), facilitating the delineation of the underlying neural processes.

2. Similar to click-evoked ABRs, cABRs are well suited for clinical applications because they can be meaningfully applied to individuals.

3. A variety of speech and musical stimuli have been used to evoke ABRs. When choosing a stimulus, the acoustic properties of the stimulus matter. To maximize transient responses, the sound should have sharp onsets or amplitude bursts. A low-pitched stimulus (<300 Hz), or a stimulus with its fundamental periodicity in this range, is needed to obtain strong phase-locked (i.e., sustained) responses to the F0 and its harmonics.

4. cABRs are generally elicited at suprathreshold levels (60 to 80 dB SPL) using monaural or binaural stimulation via electromagnetically shielded insert earphones. If the stimulus presentation is jittered by even a small amount, cABRs are canceled when trials are averaged.

5. cABRs can be recorded using the same data acquisition procedures as click-ABRs and tone-FFRs. Additionally, manipulations of stimulus polarity can be used to enhance different aspects of the response and to minimize stimulus artifacts and the CM.

6. Because of the transparency between the stimulus and response, digital signal processing tools (e.g., cross-correlation, Fourier analysis) can be used to analyze both the stimulus and response. Sliding-window analysis is used to track how the response changes over time.
CONCLUSION
Neural transcription of sound in the auditory brain stem is
an objective measure of auditory processing and as such can be
applied to research and clinical assessment whenever auditory
processing is of interest. This includes the investigation of
auditory specialization (e.g., musicians, native language speakers) and the management of auditory disorders (e.g., auditory
processing disorders, language-based learning impairments
such as dyslexia, specific language impairment, autism, hearing loss, and age-related hearing decline) that often result in
pervasive difficulties with speech perception especially in
noise. ABRs to complex sounds provide an objective neural
metric for determining the effectiveness of remediation strategies, providing the outcome measures that clinicians need to
strengthen their role in advocating for auditory training and
remediation across the lifespan. Together with converging lines
of research (Fritz et al. 2007; Weinberger 2007; Luo et al.
2008; Atiani et al. 2009), the cABR has reinforced the notion
that a contemporary view of the auditory system must include
its cognitive and sensory functions. That is, subcortical function inherently reflects a confluence of sensory and cognitive
processes that likely operate in a reciprocally interactive
manner. This view can help the field of audiology more
effectively address socially and clinically meaningful aspects
of human communication. It is hoped that the methodological information in this tutorial will move forward our knowledge and clinical management of auditory processing.
ACKNOWLEDGMENTS
The authors thank Mary Ann Cheatham, Trent Nicol, Bharath Chandrasekaran, and the members of the Auditory Neuroscience Laboratory for
their careful review of this manuscript. They extend their gratitude to
Alexandra Parbery-Clark and Frederic Marmel for their contributions to
the music-related aspects of the Stimulus Selection and Creation section.
This work was supported by NSF 0842376 and NIH R01 DC01510, F32
DC008052.
Co-author Nina Kraus currently serves as consultant to Natus Corp. for
BioMARK, a clinical measure of auditory function, and receives royalties
through Northwestern University, Evanston, IL.
Address for correspondence: Erika Skoe, Northwestern University, 2240
Campus Drive, Evanston, IL 60208. E-mail: eeskoe@northwestern.edu.
Received June 19, 2009; accepted November 13, 2009.
REFERENCES
Abel, C., & Kossl, M. (2009). Sensitive response to low-frequency
cochlear distortion products in the auditory midbrain. J Neurophysiol,
101, 1560 –1574.
Addison, P., Walker, J., Guido, R. (2009). Time-frequency analysis of
biosignals. IEEE Eng Med Biol Mag, 28, 14 –29.
Aiken, S. J., & Picton, T. W. (2006). Envelope following responses to
natural vowels. Audiol Neurootol, 11, 213–232.
Aiken, S. J., & Picton, T. W. (2008). Envelope and spectral frequency-following responses to vowel sounds. Hear Res, 245, 35–47.
Akhoun, I., Gallego, S., Moulin, A., et al. (2008a). The temporal relationship between speech auditory brainstem responses and the acoustic
pattern of the phoneme /ba/ in normal-hearing adults. Clin Neurophysiol, 119, 922–933.
Akhoun, I., Moulin, A., Jeanvoine, A., et al. (2008b). Speech auditory
brainstem response (speech ABR) characteristics depending on recording conditions, and hearing status: An experimental parametric study.
J Neurosci Methods, 175, 196 –205.
Anderson, S. & Kraus, N. (2010). Neural Signatures of Speech-in-Noise
Perception in Older Adults. Anaheim, CA: Mid-Winter Meeting for the
Association for Research in Otolaryngology.
Atiani, S., Elhilali, M., David, S. V., et al. (2009). Task difficulty and
performance induce diverse adaptive patterns in gain and shape of
primary auditory cortical receptive fields. Neuron, 61, 467– 480.
Ballachanda, B. B., & Moushegian, G. (2000). Frequency-following
response: Effects of interaural time and intensity differences. J Am Acad
Audiol, 11, 1–11.
Ballachanda, B. B., Rupert, A., Moushegian, G. (1994). Asymmetric
frequency-following responses. J Am Acad Audiol, 5, 133–137.
Banai, K., & Kraus, N. (2008). The Dynamic Brainstem: Implications for
APD. San Diego, CA: Plural Publishing Inc.
Banai, K., Abrams, D., Kraus, N. (2007). Sensory-based learning disability: Insights from brainstem processing of speech sounds. Int J Audiol,
46, 524 –532.
Banai, K., Nicol, T., Zecker, S. G., et al. (2005). Brainstem timing:
Implications for cortical processing and literacy. J Neurosci, 25,
9850 –9857.
Banai, K., Hornickel, J., Skoe, E., et al. (2009). Reading and subcortical
auditory function. Cereb Cortex, 19, 2699 –2707.
Bidelman, G., & Krishnan, A. (2009). Neural correlates of consonance,
dissonance, and the hierarchy of musical pitch in the human brainstem.
J Neurosci, 29, 13165–13171.
Bidelman, G. M., Gandour, J. T. & Krishnan, A. (2009). Cross-domain
effects of music and language experience on the representation of pitch
in the human auditory brainstem. J Cogn Neurosci, DOI:10.1162/
jocn.2009.21362.
Boersma, P., & Weenink, D. (2009). Praat: Doing phonetics by computer
(version 5.1.05).
Bradlow, A. R., Kraus, N., Hayes, E. (2003). Speaking clearly for children
with learning disabilities: Sentence perception in noise. J Speech Lang
Hear Res, 46, 80 –97.
Brugge, J. F., Anderson, D. J., Hind, J. E., et al. (1969). Time structure of
discharges in single auditory nerve fibers of the squirrel monkey in
response to complex periodic sounds. J Neurophysiol, 32, 386 – 401.
Burkard, R. F., Eggermont, J. J., & Don, M. (Eds). (2007). Auditory
Evoked Potentials: Basic Principles and Clinical Application. Philadelphia, PA: Lippincott Williams and Wilkins.
Burns, K., Soll, A., Vander Werff, K. (2009). Brainstem responses to
speech in younger and older adults, American Auditory Society Annual
Meeting Scottsdale, AZ, March 2009.
Chandrasekaran, B., & Kraus, N. (2009). The scalp-recorded brainstem
response to speech: Neural origins and plasticity. Psychophysiology.
Chandrasekaran, B., Gandour, J. T., Krishnan, A. (2007). Neuroplasticity
in the processing of pitch dimensions: A multidimensional scaling
analysis of the mismatch negativity. Restor Neurol Neurosci, 25,
195–210.
Chandrasekaran, B., Hornickel, J., Skoe, E., et al. (2009). Context-dependent encoding in the human auditory brainstem relates to hearing
speech in noise: Implications for developmental dyslexia. Neuron, 64,
311–319.
Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine
calculation of complex Fourier series. Math Comput, 19, 297–301.
Cunningham, J., Nicol, T., Zecker, S. G., et al. (2001). Neurobiologic
responses to speech in noise in children with learning problems: Deficits
and strategies for improvement. Clin Neurophysiol, 112, 758 –767.
Dajani, H. R., Purcell, D., Wong, W., et al. (2005). Recording human
evoked potentials that follow the pitch contour of a natural vowel. IEEE
Trans Biomed Eng, 52, 1614 –1618.
Dhar, S., Abel, R., Hornickel, J., et al. (2009). Exploring the relationship
between physiological measures of cochlear and brainstem function.
Clin Neurophysiol, 120, 959 –966.
Dobie, R. A., & Berlin, C. I. (1979). Binaural interaction in brainstem-evoked responses. Arch Otolaryngol, 105, 391–398.
Don, M., Vermiglio, A. J., Ponton, C. W., et al. (1996). Variable effects of
click polarity on auditory brain-stem response latencies: Analyses of
narrow-band ABRs suggest possible explanations. J Acoust Soc Am,
100, 458 – 472.
Dubno, J. R., Dirks, D. D., Morgan, D. E. (1984). Effects of age and mild
hearing loss on speech recognition in noise. J Acoust Soc Am, 76, 87–96.
Elberling, C., & Don, M. (2007). Detecting and Assessing Synchronous
Neural Activity in the Temporal Domain (SNR, Response Detection). In
R. F. Burkard, J. J. Eggermont, M. Don (Eds). Auditory Evoked
Potentials: Basic Principles and Clinical Application (pp. 102–123).
Philadelphia, PA: Lippincott Williams & Wilkins.
Elsisy, H., & Krishnan, A. (2008). Comparison of the acoustic and neural
distortion product at 2f1–f2 in normal-hearing adults. Int J Audiol, 47,
431– 438.
Everest, F. A., & Ebrary Inc. (2001). The Master Handbook of Acoustics
(4th ed., pp. xix, 615). New York: McGraw-Hill.
Fletcher, N. H., & Rossing, T. D. (1991). The Physics of Musical
Instruments. New York: Springer-Verlag.
Fritz, J. B., Elhilali, M., David, S. V., et al. (2007). Auditory attention–
focusing the searchlight on sound. Curr Opin Neurobiol, 17, 437– 455.
Galbraith, G. C. (1994). Two-channel brain-stem frequency-following
responses to pure tone and missing fundamental stimuli. Electroencephalogr Clin Neurophysiol, 92, 321–330.
Galbraith, G. C., & Arroyo, C. (1993). Selective attention and brainstem
frequency-following responses. Biol Psychol, 37, 3–22.
Galbraith, G. C., & Kane, J. M. (1993). Brainstem frequency-following
responses and cortical event-related potentials during attention. Percept
Mot Skills, 76, 1231–1241.
Galbraith, G. C., & Doan, B. Q. (1995). Brainstem frequency-following
and behavioral responses during selective attention to pure tone and
missing fundamental stimuli. Int J Psychophysiol, 19, 203–214.
Galbraith, G. C., Olfman, D. M., Huffman, T. M. (2003). Selective
attention affects human brain stem frequency-following response.
Neuroreport, 14, 735–738.
Galbraith, G. C., Arbagey, P. W., Branski, R., et al. (1995). Intelligible
speech encoded in the human brain stem frequency-following response.
Neuroreport, 6, 2363–2367.
Galbraith, G. C., Bhuta, S. M., Choate, A. K., et al. (1998). Brain stem
frequency-following response to dichotic vowels during attention.
Neuroreport, 9, 1889 –1893.
Galbraith, G. C., Amaya, E. M., de Rivera, J. M., et al. (2004). Brain stem
evoked response to forward and reversed speech in humans. Neuroreport,
15, 2057–2060.
Gray, G. W. (1942). Phonemic microtomy: The minimum duration of
perceptible speech sounds. Speech Monographs, 9, 75.
Greenberg, S. (1980). WPP, No. 52: Temporal neural coding of pitch and
vowel quality. Working Papers in Phonetics. Department of Linguistics,
UCLA.
Greenberg, S., Marsh, J. T., Brown, W. S., et al. (1987). Neural temporal
coding of low pitch. I. Human frequency-following responses to
complex tones. Hear Res, 25, 91–114.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical
timbres. J Acoust Soc Am, 61, 1270 –1277.
Hall, J. W. (2006). New Handbook of Auditory Evoked Responses. Boston:
Allyn and Bacon.
Hall, J. W., III. (1979). Auditory brainstem frequency following responses
to waveform envelope periodicity. Science, 205, 1297–1299.
Hayes, E. A., Warrier, C. M., Nicol, T. G., et al. (2003). Neural plasticity
following auditory training in children with learning problems. Clin
Neurophysiol, 114, 673– 684.
Holmes, J. (2001). Speech Synthesis and Recognition. Danvers, MA: Taylor & Francis.
Holmes, J., & Holmes, W. (2002). Speech Synthesis and Recognition.
Philadelphia, PA: Taylor Francis, Inc.
Hood, L. J. (1998). Clinical Applications of the Auditory Brainstem
Response. San Diego, CA: Singular Publication Group.
Hoormann, J., Falkenstein, M., Hohnsbein, J., Blanke, L. (1992). The
human frequency-following response (FFR): normal variability and
relation to the click-evoked brainstem response. Hear Res, 59, 179 –188.
Hoormann, J., Falkenstein, M., Hohnsbein, J. (2004). Effects of spatial
attention on the brain stem frequency-following potential. Neuroreport,
15, 1539 –1542.
Hornickel, J., Skoe, E., Kraus, N. (2009a). Subcortical laterality of speech
encoding. Audiol Neurootol, 14, 198 –207.
Hornickel, J., Skoe, E., Nicol, T., et al. (2009b). Subcortical differentiation
of stop consonants relates to reading and speech-in-noise perception.
Proc Natl Acad Sci USA, 106, 13022–13027.
Howard, D. M., & Angus, J. A. S. (2001). Acoustics and Psychoacoustics
(2nd ed.). Oxford, Boston: Focal Press.
Jewett, D. L., & Williston, J. S. (1971). Auditory-evoked far fields
averaged from the scalp of humans. Brain, 94, 681– 696.
Jewett, D. L., Romano, M. N., Williston, J. S. (1970). Human auditory
evoked potentials: Possible brain stem components detected on the
scalp. Science, 167, 1517–1518.
Johnson, K. L., Nicol, T. G., Zecker, S. G., et al. (2007). Auditory
brainstem correlates of perceptual timing deficits. J Cogn Neurosci, 19,
376 –385.
Johnson, K. L., Nicol, T., Zecker, S. G., et al. (2008a). Developmental
plasticity in the human auditory brainstem. J Neurosci, 28, 4000 – 4007.
Johnson, K. L., Nicol, T., Zecker, S. G., et al. (2008b). Brainstem encoding
of voiced consonant–vowel stop syllables. Clin Neurophysiol, 119,
2623–2635.
Kawahara, H. (2009). STRAIGHT. Retrieved from http://www.wakayama-u.ac.jp/~kawahara/straighttrial/.
King, C., Warrier, C. M., Hayes, E., et al. (2002). Deficits in auditory
brainstem pathway encoding of speech sounds in children with learning
problems. Neurosci Lett, 319, 111–115.
Klatt, D. (1976). Software for cascade/parallel formant synthesizer.
J Acoust Soc Am, 67, 971–975.
Kraus, N., & Nicol, T. (2005). Brainstem origins for cortical ‘what’ and
‘where’ pathways in the auditory system. Trends Neurosci, 28, 176 –
181.
Kraus, N., & Banai, K. (2007). Auditory processing malleability: Focus on
language and music. Curr Dir Psychol Sci, 16, 105–109.
Kraus, N., Skoe, E., Parbery-Clark, A., et al. (2009). Experience-induced
malleability in neural encoding of pitch, timbre, and timing. Ann N Y
Acad Sci, 1169, 543–557.
Kraus, N., McGee, T. J., Carrell, T. D., et al. (1996). Auditory neurophysiologic responses and discrimination deficits in children with learning
problems. Science, 273, 971–973.
Krishnan, A. (2002). Human frequency-following responses: Representation of steady-state synthetic vowels. Hear Res, 166, 192–201.
Krishnan, A. (2007). Frequency-Following Response. In R. F. Burkard,
J. J. Eggermont, M. Don (Eds.). Auditory Evoked Potentials: Basic
Principles and Clinical Application (pp. 313–335). Philadelphia, PA:
Lippincott Williams & Wilkins.
Krishnan, A., & McDaniel, S. S. (1998). Binaural interaction in the human
frequency-following response: Effects of interaural intensity difference.
Audiol Neurootol, 3, 291–299.
Krishnan, A., & Gandour, J. T. (2009). The role of the auditory brainstem
in processing linguistically-relevant pitch patterns. Brain Lang, 110,
135–148.
Krishnan, A., Swaminathan, J., Gandour, J. T. (2009a). Experience-dependent enhancement of linguistic pitch representation in the brainstem is not specific to a speech context. J Cogn Neurosci, 21,
1092–1105.
Krishnan, A., Xu, Y., Gandour, J. T., et al. (2004). Human frequency-following response: Representation of pitch contours in Chinese tones.
Hear Res, 189, 1–12.
Krishnan, A., Xu, Y., Gandour, J., et al. (2005). Encoding of pitch in the
human brainstem is sensitive to language experience. Brain Res Cogn
Brain Res, 25, 161–168.
Krishnan, A., Gandour, J. T., Bidelman, G. M., et al. (2009b). Experience-dependent neural representation of dynamic pitch in the brainstem.
Neuroreport, 20, 408 – 413.
Langner, G., & Schreiner, C. E. (1988). Periodicity coding in the inferior
colliculus of the cat. I. Neuronal mechanisms. J Neurophysiol, 60,
1799 –1822.
Lee, K. M., Skoe, E., Kraus, N., et al. (2009). Selective subcortical
enhancement of musical intervals in musicians. J Neurosci, 29, 5832–
5840.
Liberman, A. M. (1954). The Role of Consonant-Vowel Transitions in the
Perception of the Stop and Nasal Consonants. Washington, DC:
American Psychological Association.
Lins, O. G., & Picton, T. W. (1995). Auditory steady-state responses to
multiple simultaneous stimuli. Electroencephalogr Clin Neurophysiol,
96, 420 – 432.
Luo, F., Wang, Q., Kashani, A., et al. (2008). Corticofugal modulation of
initial sound processing in the brain. J Neurosci, 28, 11615–11621.
Maddieson, I. (1984). Patterns of Sounds. Cambridge [Cambridgeshire],
NY: Cambridge University Press.
Moore, D. R. (1991). Anatomy and physiology of binaural hearing.
Audiology, 30, 125–134.
Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform
processing techniques for text-to-speech synthesis using diphones.
Speech Commun, 9, 453– 467.
Moushegian, G., Rupert, A. L., Stillman, R. D. (1973). Laboratory note.
Scalp-recorded early responses in man to frequencies in the speech
range. Electroencephalogr Clin Neurophysiol, 35, 665– 667.
Musacchia, G., Strait, D., Kraus, N. (2008). Relationships between
behavior, brainstem and cortical encoding of seen and heard speech in
musicians and non-musicians. Hear Res, 241, 34 – 42.
Musacchia, G., Sams, M., Nicol, T., et al. (2006). Seeing speech affects
acoustic information processing in the human brainstem. Exp Brain Res,
168, 1–10.
Musacchia, G., Sams, M., Skoe, E., et al. (2007). Musicians have enhanced
subcortical auditory and audiovisual processing of speech and music.
Proc Natl Acad Sci USA, 104, 15894 –15898.
Nicol, T., Abrams, D., Kraus, N. (2009). Low-frequency phase coherence
in near- and far-field midbrain responses reveals high frequency
formant structure in consonant-vowel syllables. Baltimore, MD: Association for Research in Otolaryngology Symposium.
Nunez, P. L., & Srinivasan, R. (2006). Electric Fields of the Brain: The
Neurophysics of EEG (2nd ed.). Oxford, NY: Oxford University Press.
Palmer, A., & Shamma, S. (2004). Physiological Representations of
Speech. In S. Greenberg, W. A. Ainsworth, A. N. Popper, R. R. Fay
(Eds). Speech Processing in the Auditory System: Springer Handbook of
Auditory Research (Vol. 18, pp. xiv, 476). New York: Springer.
Parbery-Clark, A., Skoe, E., Kraus, N (2009a). Musical experience limits
the degradative effects of background noise on the neural processing of
sound. J Neurosci, 29, 14100 –14107.
Parbery-Clark, A., Skoe, E., Lam, C., et al. (2009b). Musician enhancement for speech-in-noise. Ear Hear, 30, 653– 661.
Parthasarathy, T., & Moushegian, G. (1993). Rate, frequency, and intensity
effects on early auditory evoked potentials and binaural interaction
component in humans. J Am Acad Audiol, 4, 229 –37.
Pichora-Fuller, M. K., Schneider, B. A., Daneman, M. (1995). How young
and old adults listen to and remember speech in noise. J Acoust Soc Am,
97, 593– 608.
Picton, T. W., & Hillyard, S. A. (1974). Human auditory evoked potentials.
II. Effects of attention. Electroencephalogr Clin Neurophysiol, 36,
191–199.
Plyler, P. N., & Ananthanarayan, A. K. (2001). Human frequency-following responses: Representation of second formant transitions in
normal-hearing and hearing-impaired listeners. J Am Acad Audiol, 12,
523–533.
Porat, B. (1997). A Course in Digital Signal Processing. New York: John
Wiley.
Quian Quiroga, R., Sakowitz, O. W., Basar, E., et al. (2001). Wavelet
transform in the analysis of the frequency composition of evoked
potentials. Brain Res Brain Res Protoc, 8, 16 –24.
Regan, D., & Regan, M. P. (1988). The transducer characteristic of hair
cells in the human ear: A possible objective measure. Brain Res, 438,
363–365.
Rinne, T., Balk, M. H., Koistinen, S., et al. (2008). Auditory selective
attention modulates activation of human inferior colliculus. J Neurophysiol, 100, 3323–3327.
Robinson, K. (1995). The duration required to identify the instrument, the
octave, or the pitch chroma of a musical note. Music Perception, 13,
1–15.
Robinson, K., & Patterson, R. D. (1995). The stimulus duration required to
identify vowels, their octave, and their pitch chroma. J Acoust Soc Am,
98, 1858 –1865.
Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and
linguistic aspects. Philos Trans R Soc Lond B Biol Sci, 336, 367–373.
Russo, N., Nicol, T., Musacchia, G., et al. (2004). Brainstem responses to
speech syllables. Clin Neurophysiol, 115, 2021–2030.
Russo, N., Nicol, T., Trommer, B., et al. (2009). Brainstem transcription of
speech is disrupted in children with autism spectrum disorders. Dev Sci,
12, 557–567.
Russo, N., Nicol, T., Zecker, S., et al. (2005). Auditory training improves
neural timing in the human brainstem. Behav Brain Res, 156, 95–103.
Russo, N., Skoe, E., Trommer, B., et al. (2008). Deficient brainstem
encoding of pitch in children with autism spectrum disorders. Clin
Neurophysiol, 119, 1720 –1731.
Shannon, R. V. (2005). Speech and music have different requirements for
spectral resolution. Int Rev Neurobiol, 121–134.
Sininger, Y. S. (1993). Auditory brain stem response for objective
measures of hearing. Ear Hear, 14, 23–30.
Skoe, E., Nicol, T., Kraus, N. Phase coherence and the human brainstem
response to consonant-vowel syllables. Baltimore, MD: Association for
Research in Otolaryngology Symposium.
Skoe, E., & Kraus, N. (2009). The Effects of Musical Training on
Subcortical Processing of a Missing Fundamental Piano Melody.
Chicago, IL: Society for Neuroscience.
Song, J. H., Banai, K., Russo, N. M., et al. (2006). On the relationship
between speech- and nonspeech-evoked auditory brainstem responses.
Audiol Neurootol, 11, 233–241.
Song, J. H., Skoe, E., Wong, P. C., et al. (2008). Plasticity in the adult
human auditory brainstem following short-term linguistic training.
J Cogn Neurosci, 20, 1892–1902.
Starr, A., & Don, M. (1988). Brain Potentials Evoked by Acoustic Stimuli.
In T. W. Picton (Ed.). Handbook of Electroencephalography and
Clinical Neurophysiology (Revised Volume 3). Human Event-Related
Potentials (pp. 97–157). Amsterdam: Elsevier.
Starr, A., Picton, T. W., Sininger, Y., et al. (1996). Auditory neuropathy.
Brain, 119 (Pt 3), 741–753.
Strait, D., Kraus, N., Parbery-Clark, A., et al. (2009a). Musical experience
shapes top-down mechanisms: evidence from masking and auditory
attention performance. [Published ahead of print December 21, 2009.]
Hear Res, doi:10.1016/j.heares.2009.12.021.
Strait, D., Kraus, N., Skoe, E., et al. (2009b). Musical experience and
neural efficiency: Effects of training on subcortical processing of vocal
expressions of emotion. Eur J Neurosci, 29, 661– 668.
Swaminathan, J., Krishnan, A., Gandour, J. T., et al. (2008). Applications of static and dynamic iterated rippled noise to evaluate pitch
encoding in the human auditory brainstem. IEEE Trans Biomed Eng,
55, 281–287.
Tallal, P., & Stark, R. E. (1981). Speech acoustic-cue discrimination
abilities of normally developing and language-impaired children.
J Acoust Soc Am, 69, 568 –574.
Thornton, A. R. D. (2007). Instrumentation and Recording Parameters. In
R. F. Burkard, J. J. Eggermont, M. Don (Eds). Auditory Evoked
Potentials: Basic Principles and Clinical Application (pp. 42–72).
Philadelphia, PA: Lippincott Williams & Wilkins.
Turner, C. W., Fabry, D. A., Barrett, S., et al. (1992). Detection and
recognition of stop consonants by normal hearing and hearing impaired
listeners. J Speech Hear Res, 35, 942–949.
Tzounopoulos, T., & Kraus, N. (2009). Learning to encode timing:
Mechanisms of plasticity in the auditory brainstem. Neuron, 62, 463–
469.
van Drongelen, W. (2007). Signal Processing for Neuroscientists: Introduction to the Analysis of Physiological Signals. Burlington, MA:
Academic Press.
Wallisch, P. (2009). Matlab for Neuroscientists: An Introduction to
Scientific Computing in Matlab. Amsterdam, Boston: Elsevier/Academic Press.
Wang, J., Nicol, T., Skoe, E., Sams, M., Kraus, N. (2010). Emotion and
brainstem responses to speech. Neuroscience Letters. doi:10.1016/
j.neulet.2009.12.018.
Weinberger, N. M. (2007). Associative representational plasticity in the
auditory cortex: A synthesis of two disciplines. Learn Mem, 14, 1–16.
Wible, B., Nicol, T., Kraus, N. (2004). Atypical brainstem representation
of onset and formant structure of speech sounds in children with
language-based learning problems. Biol Psychol, 67, 299 –317.
Wible, B., Nicol, T., Kraus, N. (2005). Correlation between brainstem and
cortical auditory processes in normal and language-impaired children.
Brain, 128, 417– 423.
Wile, D., & Balaban E. (2007). An auditory neural correlate suggests a
mechanism underlying holistic pitch perception. PLoS One, 2, e369.
Wong, P. C., Skoe, E., Russo, N. M., et al. (2007). Musical experience
shapes human brainstem encoding of linguistic pitch patterns. Nat
Neurosci, 10, 420 – 422.
Xu, Y., Krishnan, A., Gandour, J. T. (2006). Specificity of experiencedependent pitch representation in the brainstem. Neuroreport, 17,
1601–1605.
Young, E. D., & Sachs, M. B. (1979). Representation of steady-state
vowels in the temporal aspects of the discharge patterns of populations
of auditory-nerve fibers. J Acoust Soc Am, 66, 1381–1403.
Zatorre, R. J., Belin, P., Penhune, V. B. (2002). Structure and function of
auditory cortex: Music and speech. Trends Cogn Sci, 6, 37– 46.
Ziegler, J., Pech-Georgel, C., Florence, G., et al. (2009). Speech-perception-in-noise deficits in dyslexia. Dev Sci, 12, 732–745.
REFERENCE NOTES
1. Chandrasekaran, B., & Kraus, N., (in press). Music and noise exclusion:
Implications for individuals with learning disabilities. Music Perception.
2. Krizman, J., Skoe, E., Kraus, N. (in press). Stimulus rate and subcortical
auditory processing of speech. Audiol Neurotol.