Sound temporal envelope and time-patterns of activity in the
human auditory pathway: an fMRI study
by
Michael Patrick Harms
B.S., Electrical Engineering
Rice University, 1994
Submitted to the Harvard-M.I.T. Division of Health Sciences and Technology
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2002
© 2002 Michael P. Harms. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce
and to distribute publicly paper and electronic
copies of this thesis document in whole or in part.
Signature of Author:........................................................................................................................................
Harvard-M.I.T. Division of Health Sciences and Technology
March 4, 2002
Certified By:....................................................................................................................................................
Jennifer R. Melcher, Ph.D.
Assistant Professor of Otology and Laryngology, Harvard Medical School
Thesis Supervisor
Accepted By: ...................................................................................................................................................
Martha L. Gray, Ph.D.
Edward Hood Taplin Professor of Medical Engineering and Electrical Engineering
Co-director, Harvard-M.I.T. Division of Health Sciences and Technology
Sound temporal envelope and time-patterns of activity in the human auditory
pathway: an fMRI study
by
Michael Patrick Harms
Submitted to the Harvard-M.I.T. Division of Health Sciences and Technology on
March 4, 2002 in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Abstract
The temporal envelope of sound strongly influences the intelligibility of speech, pattern
analysis, and the grouping of sequential stimuli. This thesis examined the coding of sound
temporal envelope in the time-patterns of population neural activity of the human auditory
pathway. Traditional microelectrode recordings capture the fine time-pattern of neural spiking in
individual neurons, but do not necessarily provide a good assay of temporal coding in neural
populations. In contrast, functional magnetic resonance imaging (fMRI), the technique chosen
for the present study, provides an indicator of population activity over a time-scale of seconds,
with the added advantage that it can be used routinely in human listeners.
In a first study, it was established that the time-pattern of cortical activity is heavily
influenced by sound repetition rate, whereas the time-pattern of subcortical activity is not. In
the inferior colliculus, activity to prolonged noise burst trains (30 s) increased with increasing
rate (2/s – 35/s), but was always sustained throughout the train. In contrast, the most striking
sound rate dependence of auditory cortex was seen in the time-pattern of activity. Low rates
elicited sustained activity, whereas high rates elicited “phasic” activity, characterized by strong
adaptation early in the train and a robust response to train offset. These results for auditory
cortex suggested that certain sound temporal envelope characteristics are encoded over multiple
seconds in the time-patterns of cortical population activity.
A second study tested this idea more fully by using a wider variety of sounds (e.g., speech,
music, clicks, tones) and by systematically varying different sound features. Important for this
test was the development of a new set of basis functions for use in a general linear model that
enabled the detection and quantification of the full range of cortical activity patterns. This study
established that the time-pattern of cortical activity is strongly dependent on sound temporal
envelope, but not sound level or bandwidth. Namely, as either rate or sound-time fraction
increases, the time-pattern shifts from sustained to phasic.
Thus, shifts in the time-pattern of cortical activity from sustained to phasic signal subsecond
differences in sound temporal envelope. These shifts may be fundamental to the perception of
successive acoustic transients as either distinct or grouped acoustic events.
Thesis Supervisor: Jennifer R. Melcher, Ph.D.
Assistant Professor of Otology and Laryngology, Harvard Medical School
ACKNOWLEDGEMENTS
I will forever be grateful to the many people who contributed either directly to this thesis or to my
personal development over the last seven-plus years at MIT. I have benefited tremendously from your advice,
and most importantly your friendship.
This thesis never would have been possible without the guidance and nurturing of my advisor,
Jennifer Melcher. She contributed immensely to my scientific development. She always challenged me
to produce my best, but at the same time remained a good friend (quite an accomplishment for two Type
A personalities). I very much enjoyed our frequent interaction, and her ever “open door”, which made it
easy to get comments and feedback. Jennifer willingly shouldered the burden of assuring that we were
funded, and that human subjects approval was obtained, thus freeing me to focus blissfully on my
research. I fully realize that not all graduate students are so fortunate. Thanks Jennifer!
My other thesis committee members, John Guinan, Mark Tramo, and Anders Dale, also provided
many helpful comments and insights. The fact that they did not ask to be excused from my committee
following an initial 4-hour thesis proposal meeting-marathon speaks volumes about their patience! As
thesis chairman, John was always available to critique a line of thought, a poster, or a draft of an abstract or
thesis chapter, which I more than took advantage of given his easy accessibility right down the hall.
I probably would never have entered the Speech and Hearing Sciences Program were it not for
Nelson Kiang, whose vision and enthusiasm convinced me that joining this new program was a risk worth
taking. Conversations with Nelson are always insightful and provocative, and I thank him for many
fruitful career-related discussions.
I was fortunate to be able to interact extensively with outstanding scientists from both the NMR
Imaging Center of Massachusetts General Hospital and the Eaton Peabody Laboratory of the
Massachusetts Eye and Ear Infirmary. At the NMR Center, I was welcomed by Bruce Rosen, Ken
Kwong, Bruce Jenkins, Robert Weisskoff, Hans Breiter, Randy Gollub, Joe Mandeville, Rick Hoge, and
Sean Marrett. I was equally welcomed at EPL by Charlie Liberman, Chris Brown, Bertrand Delgutte,
Bill Peake, John Rosowski, Peter Cariani, Barbara Fullerton, Mike Ravicz, and Chris Shera. Thanks to
both groups of scientists for their comments and advice, and for fostering an enjoyable work environment.
I was very fortunate to be able to interact on a daily basis with a very high-caliber group of other
students and post-docs. Pankaj Oberoi, Susan Voss, Mark Oster, Diane Ronan, Janet Slifka, Joyce
Rosenthal, Chandran Seshagiri, Alan Groff, Annette Taberner, Ona Wu, Whitney Edmister, and Greg
Zaharchuk have been good friends and colleagues. Additional good friends, who were also frequent
critics of my posters, practice talks, and manuscripts, include Irina Sigalovsky, Monica Hawley, Tom
Talavage, Martin McKinney, John Iversen, Sridhar Kalluri, and Courtney Lane. Ben Hammond’s
untimely death cost me a dear friend, and deprived the auditory community of a great thinker. Doug
Greve of the NMR Center graciously let me use his Matlab implementation of the general linear model,
and helped explain its inner workings so that I could modify it for my purposes. Outside of the lab, Adelle
Smith, Mike Lohse, Phil Bradley, and Julie Bradley all helped to assure that my time in Boston was a
great experience.
I freely admit that I was spoiled as a graduate student in terms of the administrative support that I
received. Barbara Norris was absolutely instrumental in poster and figure preparation. Dianna Sands in
the EPL front office kept things lively and handled many things that she was fully entitled to tell me to do
myself. The EPL engineering staff, Dave Steffens, Ish Stefanov-Wagner, and Frank Cardarelli ensured
that I had the computer support necessary to do my work. Terry Campbell and Mary Foley of the NMR
Center were always available to explain how to run the magnet, and how to get it to do just what we
wanted.
I thank my many subjects for agreeing to lie still in the magnet for 2 hours, while being instructed
continually to “listen attentively” to rather boring acoustic stimuli. Many of my friends were subjects at
some point, for which I am tremendously grateful.
Somehow my parents got the impression (probably from a lack of clarity on my part) that this “PhD
thing” was just a four-year process. I’m sure that they are ecstatic that they can now give a satisfactory
answer to the questions from friends and relatives about when I would be done with “school”. I thank
them for their unending love and support through it all. My brother Brian and sister Erin were also a key
component of my support structure. It was a real treat having Brian here in Boston the past three and a
half years.
Finally, I thank my wonderful wife, Nicole, for her unconditional love and support. The incredible
depth of her love gives me a glimpse of God’s abounding love for humankind, by which He sent His Son,
Jesus Christ, to be our Savior. Meeting and marrying her will always be the pinnacle of my Boston
experience. I know that this whole thesis was very emotional and personal for her. My pains were her
pains, and my joys were her joys. I can’t wait to set out on our next adventure together. Nicole is my
earthly angel, sent by God, to be my companion and soul-mate. Neither of us could have done this
without God’s serenity, peace, courage, and wisdom. We thank Him from our heart for His many
blessings.
Michael Harms
March 4, 2002
The work in this thesis was supported by NIH/NIDCD P01DC00119, R03DC03122, T32DC00038, and
a Martinos Scholarship.
They that wait upon the Lord shall renew their strength.
They will soar on wings like eagles.
Isaiah 40:31
Table of Contents
Chapter 1 Introduction
Overview ..........................................................................................................................................11
Thesis structure and chapter overview .........................................................................................13
References........................................................................................................................................16
Chapter 2 Sound repetition rate in the human auditory pathway:
Representations in the waveshape and amplitude of fMRI
activation
Abstract............................................................................................................................................19
Introduction.....................................................................................................................................20
Methods............................................................................................................................................22
Experiments I and II: Noise burst trains with different burst repetition rates...........................23
Subjects................................................................................................................................23
Acoustic stimulation ............................................................................................................23
Task......................................................................................................................................25
Imaging ................................................................................................................................26
Analysis ...............................................................................................................................28
Experiment III: Small numbers of noise bursts.........................................................................33
Experiment IV: Noise burst trains with different durations......................................................35
Results ..............................................................................................................................................35
Response to noise burst trains: effect of burst repetition rate ...................................................35
Inferior Colliculus................................................................................................................35
Medial Geniculate Body ......................................................................................................39
Heschl's gyrus and superior temporal gyrus ........................................................................43
Response to small numbers of noise bursts...............................................................................47
Response to high rate (35/s) noise burst trains: effect of train duration....................................50
Discussion ........................................................................................................................................51
Role of rate per se in determining fMRI responses ..................................................................52
fMRI responses and underlying neural activity ........................................................................53
fMRI response onset and neural adaptation ..............................................................................55
Phasic response “off-peak” and neural off responses ...............................................................60
Phasic response recovery ..........................................................................................................61
Comparison to previous fMRI and PET studies – auditory and non-auditory..........................62
Relationship between fMRI response waveshape and sound perception..................................64
References........................................................................................................................................66
Chapter 3 Detection and quantification of a wide range of fMRI temporal
responses using a physiologically-motivated basis set
Abstract............................................................................................................................................75
Introduction.....................................................................................................................................76
Methods............................................................................................................................................80
fMRI data ..................................................................................................................................80
Basis functions ..........................................................................................................................82
Response and noise estimation under the general linear model................................................86
Examination of residuals .....................................................................................................88
Practical implementation .....................................................................................................91
Activation map formation .........................................................................................................92
Waveshape index ......................................................................................................................93
Results ..............................................................................................................................................94
Activation detection: OSORU vs. sustained-only and sinusoidal basis functions ....................94
Relative importance of the OSORU basis functions.................................................................97
Assessment of correspondence between OSORU components and actual waveforms ............99
Using the OSORU basis functions to probe response physiology ..........................................101
Discussion ......................................................................................................................................105
Successful response detection with the OSORU basis set ......................................................105
A challenging database provided a strong test of the OSORU basis set.................................106
Detecting and mapping response dynamics ............................................................................106
Previous implementations of the general linear model within a physiological framework ....108
Physiologically-based implementations of the GLM: broad applicability to any brain system ........109
References......................................................................................................................................111
Chapter 4 The temporal envelope of sound determines the time-pattern of
fMRI responses in human auditory cortex
Introduction...................................................................................................................................115
Methods..........................................................................................................................................118
Stimuli.....................................................................................................................................119
Stimulus level..........................................................................................................................120
Task.........................................................................................................................................121
Sound delivery ........................................................................................................................121
Acoustic stimulation paradigm ...............................................................................................122
Handling scanner acoustic noise .............................................................................................122
Imaging ...................................................................................................................................124
Image pre-processing ..............................................................................................................126
Response detection..................................................................................................................126
Waveshape quantification..................................................................................................129
Calculating response waveforms .......................................................................................131
Defining regions of interest.....................................................................................................132
Results ............................................................................................................................................133
Waveshape dependence on stimulus type in posterior auditory cortex...................................133
Waveshape dependence on modulation rate in posterior auditory cortex...............................137
Waveshape dependence on sound-time fraction in posterior auditory cortex.........................139
Insensitivity of waveshape to sound level in posterior auditory cortex ..................................142
Insensitivity of waveshape to sound bandwidth in posterior auditory cortex .........................146
Response waveshapes throughout auditory cortex for music and 35/s noise bursts ...............147
Differences in response waveshape between cortical areas ....................................................155
Left-right differences in response waveshape.........................................................................158
Discussion ......................................................................................................................................161
Response waveshape: hemodynamic vs. neural factors.........................................................162
Response waveshape: neural adaptation and off-responses...................................................163
Response waveshape and sound temporal envelope characteristics: rate and sound time
fraction ...............................................................................................................................164
Relationship between sound perception, fMRI time-pattern, and neural activity...................167
References......................................................................................................................................170
Appendix
Subjects ..........................................................................................................................................175
HG Data .........................................................................................................................................177
STG Data .......................................................................................................................................183
IC Data...........................................................................................................................................189
Biography
Chapter 1
Introduction
OVERVIEW
A primary goal of auditory neuroscience is to understand how human speech and
environmental sounds are represented in neural activity, and how this information is processed and
transformed at the various stages of the auditory pathway. Over the past 50 years, microelectrode
recordings in animals have yielded detailed information regarding the spatial and temporal patterns
of neural activity evoked by acoustic stimuli. This animal work has provided considerable insight
into the coding of various sound features (e.g., frequency, intensity, amplitude modulation) in the
activity of individual neurons. However, because sampling from many neurons across a region of
tissue can be difficult and time-consuming, microelectrode recordings are generally insufficient for
revealing how sound features are represented in population neural activity. Ultimately, knowing
how systems of neurons encode sound features in their population activity may be as relevant and
important for understanding aspects of speech processing and auditory perception as a detailed
knowledge of how sound is represented in individual neurons.
Commonly employed techniques for studying population activity include evoked potentials,
electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography
(PET) and functional magnetic resonance imaging (fMRI). One advantage of these techniques is that
they can be applied routinely to humans. This is important, since the degree to which animal
findings extend to humans remains uncertain, due to interspecies differences, possible effects of
anesthesia, and a paucity of data in humans that can serve as a link to the animal work. Additionally,
some of the neural processes relevant to human speech processing and auditory perception may be
altogether unique to humans.
Ultimately, direct neurophysiological data in human listeners is
important if we are to understand how sound features are coded in the activity patterns of the human
brain.
This thesis studies population neural activity of the human auditory system using fMRI.
Since its emergence in the early 1990s (Kwong et al. 1992; Ogawa et al. 1992), fMRI has been
widely adopted as a technique for studying human brain activity. A particular strength of fMRI is its
ability to map brain activity directly to anatomy with a high spatial resolution (~1 mm) compared to
other neuroimaging techniques. While the vast majority of fMRI studies to date have focused on
cortical activity, fMRI can successfully examine activity in structures throughout the auditory
pathway (Guimaraes et al. 1998; Melcher et al. 1999). Since multiple levels of the auditory pathway
can be studied simultaneously with fMRI, the transformation of neural activity across different levels
of the pathway can be examined directly within individual subjects. While most fMRI studies have
focused on the spatial patterns of brain activity, fMRI also has the temporal resolution necessary to
uncover changes in the temporal patterns of population neural activity that occur over a span of
seconds, as this thesis amply illustrates.
The fMRI response arises from localized hemodynamic changes that ultimately reflect
changes in “neural activity” (broadly defined as neural spiking, and excitatory and inhibitory
synaptic activity; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995; Heeger et
al. 2000; Rees et al. 2000; Logothetis et al. 2001). Because the hemodynamic system responds in a
“sluggish” manner to changes in neural activity, the fMRI response can be thought of as reflecting
the time-envelope of population neural activity in a local region of the brain. This thesis focuses
particularly on how this time-envelope of activity relates to sound features and how it changes across
different levels of the auditory pathway.
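To make the idea of hemodynamic "sluggishness" concrete, the following sketch (illustrative only, and not drawn from the thesis) convolves a hypothetical neural activity time course, driven by a 30 s train of 25 ms noise bursts at 10/s, with a gamma-shaped hemodynamic impulse response. The kernel shape and all parameter values are assumptions; the point is simply that the predicted fMRI signal follows the seconds-scale envelope of the activity rather than its 10/s fine structure.

```python
import numpy as np
from scipy.stats import gamma

dt = 0.005                                    # 5 ms simulation step
t = np.arange(0.0, 70.0, dt)                  # 70 s of simulated time

# Hypothetical population neural activity: 25 ms bursts at 10/s for 30 s (t = 5-35 s), then silence.
neural = np.zeros_like(t)
in_train = (t >= 5.0) & (t < 35.0)
neural[in_train] = ((t[in_train] % 0.1) < 0.025).astype(float)

# Gamma-shaped hemodynamic impulse response peaking near 5 s (parameters are illustrative).
hrf = gamma.pdf(np.arange(0.0, 25.0, dt), a=6.0, scale=1.0)
hrf /= hrf.sum()

# Predicted fMRI signal: the convolution smooths away the 10/s fine structure, leaving a
# response that rises several seconds after train onset and tracks the 30 s envelope.
bold = np.convolve(neural, hrf)[:len(t)]
```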
The focus on the time-pattern (i.e., dynamics) of fMRI responses arose from a discovery
early in this thesis of a novel fMRI response in auditory cortex. Early concern in the fMRI literature
regarding response dynamics focused on whether or not the response to very prolonged stimuli (e.g.,
minutes long) decreased over time solely due to changes in the coupling between hemodynamic and
metabolic factors (Frahm et al. 1996; Bandettini et al. 1997; Chen et al. 1998; Howseman et al.
1998). Other fMRI studies subsequently observed transient response features that occurred over
shorter time spans and which are likely related to adaptation of neural activity (Hoge et al. 1999;
Jäncke et al. 1999; Giraud et al. 2000; Sobel et al. 2000). Nonetheless, the transient aspects of
cortical fMRI responses reported in this thesis are particularly dramatic, including a rapid decline in
signal to near baseline following an initial response to stimulus onset, and a prominent response
following the termination of the stimulus.
This phasic response, in contrast to the sustained
responses typically observed with fMRI, indicates that the time-pattern of auditory fMRI responses
contains information about robust sound-dependent variations in the population neural activity of the
auditory pathway.
THESIS STRUCTURE AND CHAPTER OVERVIEW
The thesis is composed of three main chapters, each written in the style of a self-contained
paper. An overview of each of the three chapters follows.
The phasic fMRI response was first discovered in a study investigating how repetition rate is
represented in the activity patterns of multiple auditory structures in the human brain (Chapter 2). At
the commencement of this thesis, there were very few studies exploring the relationship between the
fMRI response and fundamental stimulus parameters such as repetition rate, stimulus level, or
bandwidth for the types of simple acoustic stimuli used routinely in auditory electro- and
neurophysiology, such as noise bursts, tone bursts, clicks, and continuous noise. Responses to noise
bursts presented at repetition rates ranging from 1/s to 35/s were collected from the inferior
colliculus, medial geniculate body, and both primary and non-primary auditory cortex. This study
revealed that the time-pattern of the fMRI response was highly dependent on repetition rate, in a
manner that itself was dependent on auditory structure. In particular, responses in the inferior
colliculus were sustained at all rates, although they increased in amplitude with increasing rate. In
contrast, sustained responses were only elicited at low rates in auditory cortex (e.g., 2/s), whereas the
highest rate (35/s) elicited a response with a highly phasic time-pattern.
The DISCUSSION of
Chapter 2 links these transient response features to neural adaptation and the generation of neural
off-responses, and includes a more detailed discussion of the relationship between the fMRI signal
and neural activity.
The discovery of a novel temporal fMRI response in auditory cortex necessitated a
reevaluation of the statistical model employed for detecting regions (i.e., voxels) of the brain
responsive to a given stimulus. In Chapter 3, I develop a method capable of detecting responses with
a wide variety of temporal dynamics, while simultaneously extracting information about individual
temporal features of the response. Specifically, I implemented the general linear model using a
novel set of “physiologically-motivated” basis functions chosen to reflect temporal features of
auditory cortical fMRI responses. The performance of this basis set in detecting responses is
compared against two other basis sets that have been commonly employed in fMRI analyses.
Additionally, I establish that this physiologically-motivated basis set proves effective in exploring
brain physiology.
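As a rough illustration of the general approach (a sketch under simplifying assumptions, not the thesis implementation, whose basis functions, timing, and statistics differ in detail), the code below builds a design matrix from hypothetical onset, sustained, and offset regressors for a 30 s on / 30 s off paradigm and fits a voxel time course by ordinary least squares. The relative sizes of the fitted coefficients then indicate how sustained or phasic a response is, and the F statistic can serve as a detection measure.

```python
import numpy as np

dt = 2.0                                      # nominal image spacing (s)
t = np.arange(0.0, 240.0, dt)                 # one 240 s imaging run

# Three hypothetical basis functions, one per temporal response feature.
onset = np.zeros_like(t)
sustained = np.zeros_like(t)
offset = np.zeros_like(t)
for start in np.arange(0.0, 240.0, 60.0):     # 30 s train-on / 30 s off cycles
    onset[(t >= start + 4) & (t < start + 10)] = 1.0        # peak shortly after train onset
    sustained[(t >= start + 10) & (t < start + 34)] = 1.0   # plateau (hemodynamically delayed)
    offset[(t >= start + 34) & (t < start + 40)] = 1.0      # peak shortly after train offset

X = np.column_stack([np.ones_like(t), onset, sustained, offset])   # design matrix

def fit_voxel(y, X):
    """Least-squares GLM fit of one voxel's time course; returns the coefficients and an
    F statistic testing whether the stimulus regressors (all but the constant) add variance."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    rss_full = resid @ resid
    rss_null = np.sum((y - y.mean()) ** 2)
    n_stim_regressors = X.shape[1] - 1
    f_stat = ((rss_null - rss_full) / n_stim_regressors) / (rss_full / dof)
    return beta, f_stat
```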
Equipped with a new approach for detecting and quantifying a wide variety of responses,
Chapter 4 proceeds to rigorously establish which particular sound features are primarily coded in the
time-pattern of auditory cortical fMRI responses. I establish that the time-pattern of auditory fMRI
responses is primarily determined by sound temporal envelope, but not sound level or bandwidth. In
particular, as either repetition rate or sound-time fraction increases, the time-pattern shifts from
sustained to phasic. I further establish that sound temporal envelope characteristics are strongly
represented in the time-pattern of fMRI responses throughout auditory cortex.
Overall, the relationship between sound temporal envelope and the time-pattern of neural
activity is particularly interesting in light of the perceptual changes that occur as sound envelope is
varied. Successive stimuli in a low-rate train can be discerned individually, whereas those of higher
rate trains begin to fuse into a continuous percept, and may be grouped into a single auditory
“event”.
The changes of response time-pattern in auditory cortex are correlated with these
perceptual changes – low-rate trains evoke sustained responses, consistent with successive neural
responses to each burst in a train, whereas high rate trains evoke phasic responses, consistent with
neural activity primarily concentrated at the onset and offset of the overall train. Across levels of the
auditory pathway, the results of this thesis indicate that lower levels in the auditory pathway can
respond to successive acoustic transients up to higher rates than higher levels of the auditory
pathway. At lower levels of the pathway, population neural activity codes the occurrence of each
successive acoustic transient in an ongoing sound. In contrast, population neural activity in cortex
may reflect whether or not successive acoustic transients are perceptually grouped into a single
auditory event.
REFERENCES
Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity
and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516,
1983.
Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW,
Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes
during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997.
Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation
of fMRI BOLD response in primary visual cortex of human brain during sustained visual stimulation.
Magn Reson Med 39: 520-527, 1998.
Frahm J, Kruger G, Merboldt KD and Kleinschmidt A. Dynamic uncoupling and recoupling of
perfusion and oxidative metabolism during focal brain activation in man. Magn Reson Med 35: 143-148, 1996.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt
A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S,
Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain
Mapp 6: 33-41, 1998.
Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does
neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD
and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999.
Howseman AM, Porter DA, Hutton C, Josephs O and Turner R. Blood oxygenation level
dependent signal time courses during prolonged visual stimulation. Magn Reson Imaging 16: 1-11,
1998.
Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the
BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res
Cogn Brain Res 8: 117-124, 1999.
Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect
synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy
DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic
magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl
Acad Sci 89: 5675-5679, 1992.
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological
investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001.
Melcher JR, Talavage TM and Harms MP. Functional MRI of the auditory system. In: Functional
MRI, edited by Moonen CTW and Bandettini PA. Berlin: Springer, 1999, p. 393-406.
Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic
activity in the central auditory system. J Comp Neurol 245: 553-565, 1986.
Ogawa S, Tank DW, Menon R, Ellermann JM, Kim SG, Merkle H and Ugurbil K. Intrinsic
signal changes accompanying sensory stimulation: Functional brain mapping with magnetic
resonance imaging. Proc Natl Acad Sci U S A 89: 5951-5955, 1992.
Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties
of human and macaque V5. Nat Neurosci 3: 716-723, 2000.
Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE.
Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol
83: 537-551, 2000.
Chapter 2
Sound repetition rate in the human auditory
pathway: Representations in the waveshape and
amplitude of fMRI activation
ABSTRACT
Sound repetition rate plays an important role in stream segregation, temporal pattern
recognition, and the perception of successive sounds as either distinct or fused. The present study
was aimed at elucidating the neural coding of repetition rate and its perceptual correlates. We
investigated the representations of rate in the auditory pathway of human listeners using functional
magnetic resonance imaging (fMRI), an indicator of population neural activity. Stimuli were trains
of noise bursts presented at rates ranging from low (1-2/s; each burst is perceptually distinct) to high
(35/s; individual bursts are not distinguishable). There was a systematic change in the form of fMRI
response rate-dependencies from midbrain, to thalamus, to cortex.
In the inferior colliculus,
response amplitude increased with increasing rate while response waveshape remained unchanged
and sustained. In the medial geniculate body, increasing rate produced an increase in amplitude and
some change in waveshape at higher rates (from sustained to one showing a moderate peak just after
train onset). In auditory cortex (Heschl's gyrus and the superior temporal gyrus), amplitude changed
some with rate, but a far more striking change occurred in response waveshape – low rates elicited a
sustained response, whereas high rates elicited an unusual phasic response that included prominent
peaks just after train onset and offset. The shift in cortical response waveshape from sustained to
phasic with increasing rate corresponds to a perceptual shift from individually resolved bursts to
fused bursts forming a continuous (but modulated) percept. Thus, at high rates, a train forms a single
perceptual “event”, the onset and offset of which are delimited by the on and off peaks of phasic
cortical responses. While auditory cortex showed a clear, qualitative correlation between perception
and response waveshape, the medial geniculate body showed less correlation (since there was less
change in waveshape with rate), and the inferior colliculus showed no correlation at all. Overall, our
results suggest a population neural representation of the beginning and the end of distinct perceptual
events that is weak or absent in the inferior colliculus, begins to emerge in the medial geniculate
body, and is robust in auditory cortex.
INTRODUCTION
It is well-known from human psychophysical experiments that the perception of a succession
of sounds depends strongly on the rate of sound presentation. For instance, when bursts of noise are
presented repeatedly at a low rate (e.g., < 10/s), each burst can be separately resolved (Miller and
Taylor 1948; Symmes et al. 1955). In contrast, bursts presented at a higher rate fuse to form a single,
modulated percept. In experiments where multiple series of sounds are presented simultaneously
(e.g., a series of high and a series of low frequency tone bursts), the rate of sound presentation
influences whether the series are perceived as single or separate streams, as well as the perceived
temporal pattern within each stream (Royer and Robin 1986; Bregman 1990). The dependencies on
rate observed in controlled psychophysical experiments such as these suggest that rate plays an
important role in the perception of the more complex acoustic conditions encountered in everyday
life.
Since repetition rate plays so basic a role in determining how sounds are heard, it is not
surprising that there have been numerous neurophysiological studies of rate in animals. Broad trends
concerning the coding of rate in the auditory pathway have emerged from this work. For instance,
the highest repetition rates at which neurons respond faithfully to each successive sound in a train (or
each successive cycle of amplitude-modulated stimuli) tend to decrease from brainstem to thalamus
to cortex (e.g., Creutzfeldt et al. 1980; Schreiner and Langner 1988; Langner 1992). In cortex, the
neural coding of low and high rates may be accomplished by different populations of neurons, one
coding low rate stimuli through stimulus-synchronized activity and the other coding high rates in the
overall amount of discharge activity (Lu and Wang 2000; Lu et al. 2001). While the animal work
has shed light on the neural representations of repetition rate, the degree to which the animal findings
extend to humans remains uncertain because of interspecies differences, anesthesia differences, and a
paucity of data in humans that can serve as a link to the animal work.
In the end, direct
neurophysiological data in human listeners is important if we are to understand how repetition rate is
represented in the activity patterns of the human brain.
Most previous neurophysiological studies of repetition rate in humans have used noninvasive
techniques for probing brain function, such as evoked potential and evoked magnetic field
measurements. The evoked response work has examined averaged responses at short, middle, and
long latencies to various types of brief stimuli (e.g., clicks, tone and noise bursts) presented at
different rates (Picton et al. 1974; Thornton and Coleman 1975; Näätänen and Picton 1987). A
particular strength of evoked potential and magnetic field measurements is that they can be used to
examine responses to individual stimuli within a train up to much higher rates than with other
noninvasive brain imaging techniques (see below). A limitation, however, is that the sites of
response generation cannot always be reliably localized. Evoked magnetic field examinations of
repetition rate are further limited in that they provide information mainly concerning cortical areas
because of inherent limitations in probing subcortical function using this technique (Erne and Hoke
1990).
Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI),
two techniques for spatially mapping brain activity, have also been used to examine the dependence
of human brain activation on repetition rate. Compared to evoked potential and magnetic field
measurement, fMRI lacks the temporal resolution needed to separately resolve the responses
produced by individual stimuli in a train (except at extremely low rates, e.g., ~0.1/s), and the
temporal resolution of PET is even less. An important advantage, however, is that both PET and
fMRI enable activation to be directly localized to brainstem, thalamic, and cortical structures of the
auditory pathway (Guimaraes et al. 1998; Lockwood et al. 1999; Melcher et al. 1999; Griffiths et al.
2001). The localization provided by fMRI is particularly precise because of the technique's high
spatial resolution and direct mapping to anatomy. Despite the fact that fMRI and PET can show
activation at different stages of the auditory pathway, most rate studies using these approaches have
focused exclusively on cortical areas. All but one (Giraud et al. 2000) have also focused on low
repetition rates (< 2.5/s; Price et al. 1992; Binder et al. 1994; Frith and Friston 1996; Dhankhar et al.
1997; Rees et al. 1997). Overall, there is limited PET or fMRI data concerning the representations of
rate within the human auditory pathway. Specifically, there is little information concerning the
transformation of rate representations from structure to structure within the pathway for a wide range
of psychophysically relevant rates.
The present fMRI study compared the representation of repetition rate across cortical and
subcortical structures of the human auditory pathway using a wide range of rates. Stimuli were
trains of repeated noise bursts with repetition rates ranging from low (where each burst could be
resolved individually) to high (where individual bursts were not distinguishable and the train was
perceived as a continuous, but modulated, sound). Noise bursts were chosen as the elemental
stimulus based on the assumption that broadband sound would elicit robust responses by activating
neurons across a wide range of characteristic frequencies. fMRI was selected for its high spatial
resolution, its localizing capabilities, and its higher temporal resolution (~2 s) compared to PET (>10
s).
The latter feature proved important because one of the most striking differences in rate
representation across structures occurred in the temporal dynamics of the fMRI response.
METHODS
Four series of experiments were conducted. The first two examined the effect of repetition
rate on the response to a noise burst train in the inferior colliculus (IC), Heschl's gyrus (HG), and the
superior temporal gyrus (STG; Experiment I) or the IC and medial geniculate body (MGB;
Experiment II). The remaining Experiments (III, IV) were aimed at understanding one of the
findings from Exps. I and II, namely an unusual form of temporal response in the cortex to trains
with a high repetition rate.
This study was approved by the institutional committees on the use of human subjects at the
Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts
General Hospital. All subjects gave their written informed consent.
Experiments I and II: Noise burst trains with different burst repetition rates
Subjects
Nine subjects participated in a total of 11 imaging sessions for Experiments I and II (Exp. I:
5 sessions, subject #'s 1-5; Exp. II: 6 sessions, subject #'s 2,5,6-9). Two subjects participated once in
each Experiment. Subjects ranged in age from 19 to 35 years (mean = 25.6). Eight of the nine
subjects were male. Eight of the nine were right-handed. Subjects had no known audiological or
neurological disorders.
Acoustic stimulation
The stimuli were bursts of uniformly distributed white noise. The bursts were presented at
repetition rates of 1, 2, 10, 35/s (Exp. I) or 2, 10, 20, 35/s (Exp. II). The 1/s rate was used in only 3
of the 5 sessions of Exp. I. Individual noise bursts in all four Experiments were always 25 ms in
duration (full width half maximum), with a rise/fall time of 2.5 ms. The spectrum of the noise
stimulus at the subject's ears was low-pass (6 kHz cutoff), reflecting the frequency response of the
acoustic system.
Noise bursts were presented in 30 s long trains alternated with 30 s “off” periods, during
which no auditory stimulus was presented (Figure 2-1, top). Four alternations between “train on”
and “off” periods constituted a single scanning “run” (total duration 240 s). For all but two sessions
(in Exp. I), each of the four rates was presented once during each run, and their order was varied
across runs. Within a train, the repeated noise bursts were identical (i.e., “frozen”), but the noise
bursts differed across trains and runs. For the other two sessions, the same rate was presented
throughout a run, and this rate was varied across runs. For these two sessions, the noise burst was
frozen throughout the entire run, but differed across runs. In each session, the total number of train
presentations at each rate was between 8 and 13 (mean: 11.2).
Separately for each ear, the subject's threshold of hearing to 10/s noise bursts was
determined in the scanner room immediately prior to the imaging session. Noise bursts were
presented binaurally at 55 dB above this threshold.
During both threshold determination and
functional imaging, there was an on-going low-frequency background noise produced primarily by
the pump for the liquid helium (used to supercool the magnet coils). This sound reaches levels of
∼80 dB SPL in the frequency range of 50-300 Hz (Ravicz et al. 2000).
Additionally during
functional imaging, each image acquisition generated a “beep” of approximately 115 dB SPL at 1.0
kHz (∼130 dB SPL at 1.4 kHz for Exps. III and IV).
Noise bursts were delivered through a headphone assembly that provided approximately 30
dB of attenuation at the primary frequency of the scanner-generated sounds (1.0 or 1.4 kHz; Ravicz
and Melcher 2001). Specifically, the noise bursts were produced by a D/A board (running under
LabView), amplified, and fed to a pair of audio transducers housed in a shielded box adjacent to the
scanner. The output of the transducers reached the subject's ears via air-filled tubes that were
incorporated into sound attenuating earmuffs.
[Figure 2-1 appears here: schematic of the noise burst train paradigm for Experiments I and II and the noise burst trial paradigm for Experiment III; see caption below.]
Figure 2-1: Schematic of the stimulus paradigm for Exps. I-III. In Exps. I and II trains of noise
bursts at a given repetition rate were presented for 30 s, followed by a 30 s “off” period. This
alternation was repeated four times for each imaging “run”, typically using a different repetition rate
for each “train on” period. Tick marks represent an image acquisition (approximately every 2 s).
The expanded view uses a smaller time scale to illustrate the stimulus – in this case a portion of a
prolonged 10/s noise burst train. In Exp. III, “trials” of noise bursts, consisting of either 1, 2 or 5
noise bursts, were presented once every 18 s (15-16 trials per run). The interstimulus interval for the
two noise bursts was either 500 ms or 28.6 ms (i.e., 2 NBs@2/s and 2 NBs@35/s; in this case the
expanded view shows the complete stimulus for a given trial). The trials with five noise bursts used
an interstimulus interval of 28.6 ms. In one imaging session, all four trial types were presented in
randomized order. In the other two sessions, the same trial type was used throughout a run. In all
experiments, individual noise bursts were always 25 ms in duration.
Task
Subjects were instructed to listen to the noise burst trains. For Exp. II, subjects performed an
additional, simple task to further ensure that they remained attentive. They indicated whenever they
detected an occasional 6 dB increment or decrement in intensity by raising or lowering their index
finger. Intensity changes persisted for all the noise bursts that occurred in a 1 s interval. Subject
responses were monitored by the experimenter, who could see the subject's finger from the imager
control room. Each subject identified more than 90% of the intensity changes. At the end of each
scanning run (for all Exps.), subjects reported their alertness on a qualitative scale ranging from 1
(fell asleep during run) to 5 (highly alert). Alertness ratings were almost always in the 3-5 range,
and were never 1. No data were discarded because of inadequate subject alertness.
Imaging
Subjects were imaged using a 1.5 Tesla whole-body scanner (General Electric) and a head
coil (transmit/receive; General Electric). The scanner was retrofitted for high-speed imaging (i.e.,
single-shot echo-planar imaging; Advanced NMR Systems, Inc.). Subjects rested supine in the
scanner. To avoid head motion, they were fitted with a bite bar custom-molded to their teeth and
mounted to the head coil.¹ Each imaging session lasted ~2 hours and included the following
procedures:
1. Contiguous sagittal images of the whole head were acquired.
2. An automated, echo-planar based shimming procedure was performed to increase
magnetic field homogeneity within the brain regions to be functionally imaged (Reese et al. 1995).
3. The brain slice to be functionally imaged was selected using the sagittal images as a
reference. For Exp. I, the selected slice intersected the IC and the posterior aspect of HG and the
STG (Figure 2-2, left and middle). When there appeared to be multiple transverse temporal gyri, we
selected the anterior one as HG (Penhune et al. 1996; Leonard et al. 1998). For Exp. II, the slice
intersected the IC and MGB (located just ventral and lateral to the cerebral aqueduct; Figure 2-2,
right). A single slice, rather than multiple slices, was imaged to reduce the impact of scanner-generated acoustic noise on auditory activation.
¹ In Exp. IV, and two (of three) sessions in Exp. III, we used a simpler set-up in which a pillow and foam were packed snugly around the head to reduce head motion, rather than using a bite bar.
[Figure 2-2 appears here: sagittal anatomical images (3 mm and 39 mm from midline for Exps. I, III, IV; midline for Exp. II) showing the functional imaging planes; see caption below.]
Figure 2-2: Functional imaging planes superimposed on sagittal, anatomical images. In Exps. I, III,
and IV the plane (thick white line) passed through the inferior colliculi (top left panel) and Heschl's
gyri (top right panel). In Exp. II, the plane passed through the inferior colliculi (located just lateral to
the brachium of the inferior colliculi) and the medial geniculate bodies of the thalamus (located just
ventral and lateral to the cerebral aqueduct; bottom panel).
4. A T1-weighted, high-resolution anatomical image was acquired of the selected brain slice
for subsequent overlay of the functional data (TR = 10 s, TI = 1200 ms, TE = 40 ms, in-plane
resolution = 1.6 x 1.6 mm, thickness = 7 mm). A second high-resolution anatomical image was
acquired at the end of the session after functional imaging. A comparison of the initial and final T1
images allowed for a gross check of subject movement over the session.
5. Functional images of the selected slice were acquired using a blood oxygenation level
dependent (BOLD) sequence (asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°,
thickness = 7 mm, in-plane resolution = 3.1 x 3.1 mm). The beginning of each scanning “run”
included four discarded images to ensure that image signal level had approached a steady state.
During the remainder of the run, functional images of the selected slice were acquired repeatedly
while the noise burst trains were alternately turned on for 30 seconds and off for 30 seconds (Figure
2-1, top).
Functional imaging was performed using a cardiac gating method that increases the
detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were
synchronized to every other QRS complex in the subject's electrocardiogram, and the interimage
interval (TR) was recorded. The average TR across all sessions was 2082 ms (the average within a
session varied from 1521 to 2650 ms). Fluctuations in heart rate lead to variations in TR that result
in image-to-image variations in image signal strength (i.e., T1 effects). Using the measured TR
values, image signal was corrected to account for these variations (Guimaraes et al. 1998).
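One common form of such a correction, sketched below under simplifying assumptions (the exact procedure of Guimaraes et al. 1998 may differ), assumes saturation-recovery behavior, in which signal is proportional to 1 - exp(-TR/T1), and rescales each image's signal to a fixed reference TR using an assumed tissue T1.

```python
import numpy as np

def t1_correct(signal, tr_measured, tr_reference=2.0, t1=1.4):
    """Rescale each image's signal to what it would be at a fixed reference TR.

    signal       : image signal values, one per acquisition
    tr_measured  : measured inter-image intervals (s), one per acquisition
    tr_reference : nominal TR (s) to which all images are referenced (assumed)
    t1           : assumed longitudinal relaxation time of the imaged tissue (s)
    """
    signal = np.asarray(signal, dtype=float)
    tr_measured = np.asarray(tr_measured, dtype=float)
    scale = (1.0 - np.exp(-tr_reference / t1)) / (1.0 - np.exp(-tr_measured / t1))
    return signal * scale
```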
Analysis
Image pre-processing
The images for each scanning run were corrected for any movements of the head that may
have occurred over the course of the imaging session. Each functional image of a session was
translated and rotated to fit the first image of the first functional run using standard software
(SPM95; without spin history correction; Friston et al. 1995; Friston et al. 1996). Because only one
functional slice was acquired, these corrections for motion were necessarily limited to adjustments
within the imaging plane. In most cases, the motion correction algorithm was well-behaved and
resulted in an improvement in image alignment. However, for one session, the algorithm introduced
some clearly artifactual movement, so the uncorrected (pre-motion-correction) data were used. Additionally, we
did not include the MGB of one subject in the analysis, because the image translations calculated by
the motion-correction algorithm were smaller than the movement evident at the location of the MGB
in the T1 anatomical images acquired pre- and post- functional imaging. A similar discrepancy did
not occur for the IC of this subject, so the IC data were included. The images for each run were
further processed in two ways to enhance the likelihood of detecting activation. (1) Image signal vs.
time for each voxel was corrected for linear or quadratic drifts in signal strength over each run (i.e.,
drift-corrected). (2) Image signal vs. time for each voxel was normalized such that the time-average
signal had the same (arbitrary) value for all voxels and runs. (Specifically, the signal vs. time data
were ratio normalized to the intercept of a least square quadratic fit to the data). This normalization
was done to eliminate artificial discontinuities in the signal level between runs in the subsequently
concatenated data. All subsequent analyses were performed on the drift-corrected, normalized
images.
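A minimal per-voxel sketch of these two pre-processing steps follows; the fitting details and the arbitrary target value are assumptions rather than the exact procedure used here.

```python
import numpy as np

def drift_correct_and_normalize(y, target=1000.0):
    """Remove a quadratic drift from one voxel's time course and ratio-normalize it.

    y      : image signal vs. time for one voxel within one run
    target : arbitrary common value assigned to the normalized baseline
    """
    y = np.asarray(y, dtype=float)
    n = np.arange(len(y))
    coeffs = np.polyfit(n, y, deg=2)            # least-squares quadratic fit
    drift = np.polyval(coeffs, n)
    intercept = coeffs[-1]                      # fitted signal at the first time point
    detrended = y - (drift - intercept)         # remove drift, keep the baseline level
    return detrended / intercept * target       # ratio-normalize to the fit intercept
```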
Generating activation maps
Maps of activation were derived as follows. First, each image in the file was assigned to
either a “train on” or “off” period. Stimulus-evoked changes in image signal typically have a delay
of 4-6 s (Kwong et al. 1992; Bandettini et al. 1993; Buckner et al. 1996). To account for this
(hemodynamic) delay, the first three images taken after the onset of a noise burst train were assigned
to the preceding “off” period, and the first three images after the train offset were assigned to the
preceding “train on” period. For each rate, the images assigned to each “train on” period and its
following “off” period were concatenated into a single file. For each voxel in the functional images,
image signal strength during train on vs. off periods was compared using an unpaired t-test (Press et
al. 1992). The p-value result of this statistical test, plotted as a function of position, constituted an
activation map. P-values were not corrected to account for the correlated nature of fMRI time-series
(Purdon and Weisskoff 1998), nor were they adjusted for the repeated application (voxel-by-voxel)
of a statistical test (Friston et al. 1994).
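The sketch below illustrates this map computation for data arranged as an images-by-voxels array; the label shifting and the unpaired t-test follow the description above, but the array layout and helper names are assumptions.

```python
import numpy as np
from scipy import stats

def activation_map(data, is_on, shift=3):
    """Voxel-by-voxel unpaired t-test of "train on" vs. "off" images.

    data  : array of shape (n_images, n_voxels), drift-corrected and normalized
    is_on : boolean array of length n_images, True while the noise burst train is on
    shift : number of images by which labels are delayed to model hemodynamic lag (~4-6 s)
    """
    is_on = np.asarray(is_on, dtype=bool)
    labels = np.roll(is_on, shift)              # delay the on/off labels by ~3 images
    labels[:shift] = is_on[0]                   # keep the run's initial state for shifted-in samples
    on_images = data[labels]
    off_images = data[~labels]
    t_stat, p = stats.ttest_ind(on_images, off_images, axis=0, equal_var=True)
    return p                                    # p-value per voxel; plotted vs. position as the map
```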
Defining regions of interest
Responses were analyzed quantitatively within four anatomically-defined regions of interest
(ROIs): the IC, MGB, HG, and STG. Independent of the activation maps, the borders of these
structures were identified directly in the high-resolution anatomical images of the functional imaging
plane. These border-delimited “high resolution” regions of interest were then down-sampled to the
same resolution as the functional images for the subsequent analysis. The borders in the high-resolution anatomical images were defined as follows:
IC: In Exp. I, the IC were readily identified as distinct anatomical circular areas (e.g., Figure
2-3). For Exp. II, only the caudal edge of each IC was distinguishable (e.g., Figure 2-6), so the area
of each IC ROI was defined as a circle sized to fit this visible edge. The circle was displaced
caudally (by approximately 1.5 mm) relative to the IC to ensure that the IC activation was fully
encompassed by the ROI even after downsampling. The shift was necessary because activation in
the imaging plane for Exp. II frequently abutted, or even overlapped, the caudal IC edge.
MGB: Standard anatomical atlases were used to delimit a ROI enclosing the MGB, since the
MGB were not directly identifiable in the anatomical images. The caudal border of each MGB ROI
was defined as the edge between the brain and the ambient cistern. The distance from the region's
caudal edge to its rostral edge was determined from measurements of the caudorostral extent of the
MGB in the atlases. The distance between the midline and the medial edge of the MGB ROI was determined similarly. Distances were computed by first normalizing the atlas measurements to maximum brain
width, and then multiplying the normalized atlas measurements by the maximum width of the
individual imaged brain slice. The lateral edge of the MGB ROI was a line extended rostrally from
the lateral edge of the ambient cistern. The resulting MGB ROI probably included a portion of the
lateral geniculate in some subjects. However, activation generally did not occur at this lateral-most
edge.
HG: When HG was visible as a “mushroom” protruding from the surface of the superior
temporal plane, the lateral edge of this mushroom defined the lateral edge of the HG ROI. The
medial edge of the ROI was the medial-most aspect of the Sylvian fissure.
When a distinct
mushroom was not present, the HG ROI covered approximately the medial third of the superior
temporal plane (extending from the medial-most aspect of the Sylvian fissure). In the superior-
inferior dimension, the HG ROI extended superiorly to the edge of the overlying parietal lobe, and
inferiorly so as to entirely encompass any activation centered on HG.
STG: The STG ROI was defined as the superior temporal cortex lateral to the HG ROI. The
definition of the inferior and superior borders was the same as for the HG ROI.
Calculating response time courses
Specific voxels were chosen for computing the time course of response within each
anatomically defined region of interest. The voxels were chosen based on the activation maps for a
particular “reference rate”: 35/s for IC, 20/s for MGB, 10/s for HG, and 2/s for STG. The reference
rates were those that typically produced the strongest activation in the maps. For each IC and MGB,
we used the single voxel with the lowest p-value in the activation map at the reference rate. For each
HG and STG, we averaged the responses of the four voxels with the lowest p-values at the reference
rate. Note that for a given structure, session, and hemisphere, the same voxels were used in
computing the response time course at each rate.
Response time courses were computed as follows. Because cardiac gating results in an
irregular temporal sampling, the time series for each imaging “run” and voxel was linearly
interpolated to a consistent 2 s interval between images, using recorded interimage intervals to
reconstruct when each image occurred. These data were then temporally smoothed using a three
point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). A response “block” was defined as a 70 s
window (35 images) that included 10 s prior to a noise burst train, the 30 s coinciding with the train,
and the 30 s off period following the train. These response blocks were averaged according to rate to
give an average signal vs. time waveform for each rate, session, and hemisphere. The signal at each
time point was then converted to a percent change in signal relative to a baseline. The baseline was
defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the
noise burst train. In Exps. I and II, there was some uncertainty in the timing of the stimulus relative
to the images (up to a maximum of about 2 s in a given run). For the analyses performed in this
paper, this level of uncertainty is negligible.
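The steps in this computation can be summarized in a short sketch (an illustrative reconstruction under the assumptions stated in the text; names are hypothetical):

    import numpy as np

    def percent_change_time_course(signal, image_times, train_onsets):
        # Resample the cardiac-gated (irregularly sampled) signal onto a 2 s grid.
        grid = np.arange(image_times[0], image_times[-1], 2.0)
        resampled = np.interp(grid, image_times, signal)
        # Three-point, zero-phase smoothing with coefficients 0.25, 0.5, 0.25.
        smoothed = np.convolve(resampled, [0.25, 0.5, 0.25], mode="same")
        # 70 s response blocks: 10 s pre-train, 30 s train, 30 s post-train.
        block_t = np.arange(-10.0, 60.0, 2.0)          # 35 time points
        blocks = [np.interp(onset + block_t, grid, smoothed)
                  for onset in train_onsets]
        mean_block = np.mean(blocks, axis=0)
        # Percent change relative to the t = -6 to 0 s pre-train baseline.
        baseline = mean_block[(block_t >= -6) & (block_t <= 0)].mean()
        return block_t, 100.0 * (mean_block - baseline) / baseline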
In a supplementary analysis, we determined that response waveshape, averaged across
sessions and hemispheres (i.e., Figure 2-4), was a fair representation of the trends in the individual
responses.
In particular, the waveshape of the responses was unaffected when the individual
responses were first normalized (by dividing by their maximum value) prior to averaging. This
result indicates that average response waveshape was not unduly influenced by just a small subset of
the individual responses.
In a second supplementary analysis, we determined that response waveshape was not
sensitive to our voxel selection criteria. For comparison, time courses were computed as above,
except using all voxels with a p-value less than 0.01 in the activation map at the reference rate
(instead of just the voxels showing the strongest activation). As expected, the resulting percent
change time courses (averaged across sessions and hemispheres) were reduced in magnitude.
However, the waveshape of the responses was unaffected.
A third supplementary analysis examined whether response waveshape at a given rate might
have changed during the experimental sessions. This analysis focused on HG and STG since the
most dramatic variations in waveshape occurred in these structures (e.g., see Figure 2-4).
Specifically, we computed response time courses for each session based on the three initial and three
final presentations of the 2/s and 35/s trains. The initial and final time courses for each rate were
then averaged across sessions. For each rate and structure, the average initial and final time courses
were qualitatively similar. They were also quantitatively similar in that there was a high degree of
correlation between the “initial” and “final” waveforms. [When the “initial” and “final” waveforms
for each rate and structure were cross-correlated with one another, the correlation coefficients were:
0.92 (HG, 2/s), 0.86 (HG, 35/s), 0.93 (STG, 2/s), 0.90 (STG, 35/s)].
In contrast, there was
considerably less correlation between the responses at the two different rates. [When the “initial”
waveforms for the two rates were cross-correlated, the correlation coefficients were: 0.54 (HG) and
0.55 (STG). Similarly, for the “final” waveforms, the coefficients were 0.25 (HG) and 0.35 (STG)].
This analysis indicates that, on average, there was no dramatic change in cortical response
waveshape during experimental sessions, and any change was substantially less than the change in
response waveshape with rate.
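One common way to compute such a waveform correlation coefficient is shown below (a minimal sketch, not the original analysis code):

    import numpy as np

    def waveform_correlation(initial_waveform, final_waveform):
        # Pearson correlation coefficient between two response waveforms
        # sampled on the same time grid.
        return np.corrcoef(initial_waveform, final_waveform)[0, 1]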
Quantifying response magnitude
Response magnitude in each auditory structure was quantified using two measures computed
from the percent change time courses. “Time-average” percent change, a measure of the overall
response strength, was computed as the mean percent change from t = 4 to 30 s. “Onset” percent
change, a measure of the response amplitude near the beginning of the noise burst train, was
computed as the maximum percent change from t = 4 to 10 s. Since “time-average” and “onset”
percent change were calculated from the percent change time courses, they indicate image signal
deviations relative to a 6 s baseline immediately preceding the stimulus (i.e., the baseline period used
in calculating the time courses).
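A small sketch of the two measures (illustrative; assumes a percent change time course sampled every 2 s with t = 0 at train onset):

    import numpy as np

    def response_magnitudes(t, pct_change):
        t = np.asarray(t)
        pct_change = np.asarray(pct_change)
        # "Time-average" percent change: mean over t = 4 to 30 s.
        time_average = pct_change[(t >= 4) & (t <= 30)].mean()
        # "Onset" percent change: maximum over t = 4 to 10 s.
        onset = pct_change[(t >= 4) & (t <= 10)].max()
        return time_average, onset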
Experiment III: Small numbers of noise bursts
To investigate a strong signal decrease that occurred in cortex following the onset of high
(but not low) rate trains (e.g., see Figure 2-4), we examined the responses to a single noise burst and
short clusters of noise bursts. Responses were collected in three imaging sessions with three subjects
(Exp. III; subject #'s 2,5,10). Two of these individuals also participated in Exps. I and II.
Either one noise burst or a cluster of noise bursts (2 or 5) was presented once every 18 s,
constituting a single “trial” (Figure 2-1, bottom).
For the clusters of five noise bursts, the
interstimulus interval (ISI, onset-to-onset) between noise bursts was 28.6 ms, equivalent to the ISI
for a rate of 35/s. For clusters of two noise bursts, two different ISIs were used: 500 ms (2/s rate)
and 28.6 ms (35/s rate). For two sessions, there was no task, and the same stimulus was used in all
of the trials for a given run (12 runs; 270 s per run; 45 total repetitions per trial type). The subjects
for these sessions reported difficulties in maintaining a high level of alertness due to the sparseness
and uniformity of the stimulus trials. Therefore, to help maintain alertness, in the third session the
subject (#10) was asked to count the number of trials per run, and the stimulus was randomized
across trials (7 runs; 288 s per run; 28 repetitions per trial type). Stimuli were presented binaurally at
55 dB above the threshold to a 10/s noise burst train (as in Exps. I and II).
The imaging methods were identical to those for Exp. I with the following exceptions: A 3T,
instead of a 1.5T, scanner was used to improve the ability to detect small amplitude responses
(General Electric, outfitted for echo-planar imaging by ANMR Inc.).
The parameters used in
acquiring the high-resolution anatomical image of the “plane of interest” were: TR = 10 s, TI = 1200 ms, TE = 57 ms, in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm. The functional imaging parameters were: gradient echo, TE = 40 ms, flip = 90°, in-plane resolution = 3.1 x 3.1 mm, thickness = 7 mm. The first session used a fixed interimage interval (TR) of 2 s. The second and
third sessions used cardiac gating (parameters as in Exps. I and II) in an attempt to detect single trial
responses in the IC. Convincing responses were generally not seen in the IC. Therefore, only
cortical data are reported for Exp. III.
Images were analyzed and time courses for each stimulus were computed as in Exps. I and
II, with the following exceptions: (1) For the one session with a fixed TR, no linear interpolation was
necessary; (2) No temporal smoothing was applied (in order to avoid disproportionally altering the
responses, which were expected to be brief in duration); (3) The activation map for determining the
reference voxels was based on a single run of music (4 repetitions of the first 30 s of the fourth
movement in Beethoven Symphony No. 7). Because music typically evokes larger magnitude
responses than trains of either 2/s or 10/s noise bursts, we were able to obtain robust activation maps
with a single run, thereby allowing more time for collecting responses to the primary stimuli of
interest for the experiment.2 As in Exps. I and II, the four reference voxels selected from HG and
STG were those with the lowest p-values in the t-test activation map; (4) The baseline signal level
for converting time courses to percent change was based on the average of just two time points, t =
-2 to 0 s (since the “off” period between stimuli (18 s) was less for this experiment than for Exps. I
and II (30 s) and we wanted to avoid including time points where the response may have not yet
returned to baseline from the preceding stimulus).
2 In several sessions (not included in this paper), in which we presented both music and trains of 2/s and 35/s
noise bursts, we obtained similar responses for the noise burst trains irrespective of whether the reference
voxels were chosen using activation maps based on music or 2/s noise bursts. Typically, at least two of the
four reference voxels were in common between the two activation maps. Importantly, the dynamics of the
responses to music are similar to the dynamics of the responses to 2/s noise burst trains.
Experiment IV: Noise burst trains with different durations
The effect of train duration was examined in two imaging sessions with two subjects (Exp.
IV; subject #'s 11,12). Trains of four different durations (15, 30, 45, and 60 s) were presented with
an “off” period of 40 s following each train. Noise burst repetition rate within each train was always
35/s. Each train duration was presented once per run (310 s per run; 8-9 runs) with the order of
durations randomized across runs. Imaging parameters were the same as Exp. III, except the
gradient echo functional images used a TE of 30 ms and a 60° flip angle.3 Both sessions used cardiac
gating so that the effect of train duration in cortex could be compared to the effect in the IC. Time
courses were computed as in Exp. I, except the activation map for determining the reference voxels
was based on a single run of music (as in Exp. III).
Supplementary information concerning the effects of train duration was obtained in two
additional experiments that used a single, long train duration (60 s) and 35/s noise bursts. One of
these experiments was conducted at 1.5 T, and the other at 3 T using the imaging parameters from
Exps. I and III, respectively.
RESULTS
Response to noise burst trains: effect of burst repetition rate
Inferior Colliculus
Activation maps for the IC showed an increase in activation with increasing burst repetition
rate. Figure 2-3 demonstrates this increase for two sessions from Exp. I. The maps show activation
that is least at 2/s, greater at 10/s, and greatest at 35/s. The volume of the inferior colliculus (2-4 voxels) is only slightly greater than the spatial resolution of the activation maps, so the main difference across rate is in the strength of activation (greater activation is reflected in the maps as a lower p-value from the statistical comparison of image signal level during train “on” and “off” periods). Greater IC activation at higher rates is also demonstrated by the maps in Figure 2-6, which correspond to two sessions from Exp. II.
3 We lowered the TE to reduce the potential for susceptibility-induced signal losses. The flip angle was reduced because of a tendency of the magnet to “overflip” past the nominal value.
Figure 2-4 (left column) shows the time course of the responses in the IC averaged across all
sessions. At all rates, the response was “sustained” in that image signal increased when the noise
burst train was turned on, remained elevated while the train was on, and decreased once the train was
turned off. The amplitude of the sustained response during the “train on” period increased with
increasing rate.
The increase in response amplitude was quantified using two measures: peak percent signal
change near the beginning of the “train on” period (“onset” percent change), and percent signal
change time-averaged over the on period (“time-average” percent change; defined in Methods). On
average, both measures increased with increasing rate (Figure 2-5, top left). Onset and time-average
percent change showed a significant increase from 2/s to 10/s (p = 0.01, onset; p = 0.05, average;
paired t-test), and from 10/s to 35/s (p = 0.02, onset; p = 0.006, average). Plots of percent change vs.
rate for individual IC also showed an overall trend of increasing percent change with increasing rate
(Figure 2-5, top middle and right). For 19 of 22 IC, the response at 35/s was greater than the
response at 2/s (for both measures).
For the rates that overlapped between Exps. I and II (2, 10, 35/s) there was no significant
difference between the percent signal change values (p > 0.1, t-test), suggesting that the two main
differences between these Experiments (imaging plane and intensity detection task) did not have a
strong effect on inferior colliculus responses. There was no significant difference between the values
obtained from the left and right IC (p > 0.3, paired t-test, collapsing the data across all rates).
Summary: The IC showed a sustained response to noise burst trains. The amplitude of this
response increased with increasing burst repetition rate.
[Figure 2-3 image: IC activation maps for two subjects (Subjects 3 and 5, Exp. I) at repetition rates of 2/s, 10/s, and 35/s; color scale spans p = 0.01 to p = 2 × 10⁻⁹; scale bar, 5 mm.]
Figure 2-3: Activation maps for the IC (two subjects, Exp. I). Stimuli were noise burst trains with
repetition rates of 2, 10, or 35/s. Each panel shows a T1-weighted anatomic image (grayscale) and
superimposed activation map (color) for a particular subject. Rectangle superimposed on the
diagrammatic image (bottom, right) indicates the area shown in each panel. For the activation maps,
regions are colored according to the result of a t-test comparison of image signal strength during
“train on” and “off” periods. In this and all subsequent figures, blue and yellow correspond to the
lowest (p = 0.01) and highest (p = 2 × 10⁻⁹) significance levels, respectively. (Areas with p > 0.01
are not colored). Activation maps (based on functional images with an in-plane resolution of 3.1 x
3.1 mm) have been interpolated to the resolution of the anatomic images (1.6 x 1.6 mm). Images are
displayed in radiological convention, so the subject's right is displayed on the left. R, right; L, left.
[Figure 2-4 image: response time courses (percent signal change vs. time, 0-60 s) in the IC, MGB, HG, and STG for repetition rates of 1, 2, 10, 20, and 35/s; the “train ON” period, ON peak, and OFF peak are marked; traces show the mean ± standard error.]
Figure 2-4: Response time courses averaged across sessions and hemispheres (solid lines; IC: n = 22
for 2,10,35/s, n = 12 for 20/s; MGB: n = 10 for all rates; HG and STG: n = 10 for 2,10,35/s, n = 6 for
1/s). Dashed lines give the mean ± one standard error at each time point. Note that the vertical scale
for the IC and MGB responses differs slightly from the scale for HG and STG.
Medial Geniculate Body
In contrast with the IC, activation maps for the MGB usually showed a nonmonotonic
change in activation with rate. The trend for the MGB is illustrated by the maps for two sessions in
Figure 2-6.4 The maps show an increase in MGB activation with increasing rate in the 2/s – 20/s
range, but a decrease from 20/s to 35/s.
The trend in the activation maps parallels the rate-dependence of time-average percent signal
change in the MGB, but not onset percent change. The close correspondence between time-average
percent change and activation maps is to be expected since the maps are based on a comparison of
time-average signal levels during “train on” and “off” periods. In the average across sessions, time-average percent change increased significantly from 2/s to 20/s (p = 0.005, paired t-test), and decreased from 20/s to 35/s (p = 0.03; Figure 2-5, left). On average, onset percent signal change showed the same trend from 2/s to 20/s (i.e., a significant increase from 2/s to 20/s; p < 0.001), but not at high rates in that there was no difference between 20/s and 35/s (p = 0.9). The different rate-dependence for onset vs. time-average percent change is also apparent overall in the plots for
individual MGBs (Figure 2-5, middle, right), despite the intersession variability in the precise trends
between the rates. Neither onset nor time-average percent change differed significantly between the
left and right MGBs (p > 0.15, paired t-test, collapsing the data across all rates).
The different rate-dependencies for onset and time-average percent change indicate that the
time course of the MGB response varies with rate. This variation is illustrated in Figure 2-4 (second
column). On average, responses to a 35/s train peaked just after train onset, then declined by
approximately 50% during the remainder of the train. This moderate decrease in the response differs
from the largely sustained responses at the lower rates of 2/s, 10/s, and 20/s. A quantitative
comparison of onset percent change to the percent change at the end of the train (i.e., at 30 s in the
time courses of Figure 2-4) confirmed the response difference at the highest rate. For the 35/s train,
4 In general, activation in the MGB was not as strong as in the IC. Consistently within subjects, the standard
deviations used in calculating the t-statistic for the activation maps were greater for the MGB than the IC.
Figure 2-5: Response magnitude vs. repetition rate in the IC, MGB, HG, and STG. Left: Time-average and onset percent change averaged across sessions and hemispheres. Bars indicate the
standard error.5 (See caption of Figure 2-4 for the number of sessions and hemispheres represented
by each data point). Middle and right: Time-average and onset percent change for each session and
hemisphere vs. rate. To facilitate comparison of the trends across rate, each curve has been displaced
vertically by adding a constant (specific to each curve), such that the resulting mean of the values for
2, 10, and 35/s is always the same [and equal to the population mean for these rates (left column)].
In all plots, the repetition rate axis uses a categorical scale. Note that there are no data at 20/s for all
of the HG and STG curves, and for 10 of the IC curves.
5 The relatively larger standard errors for the MGB and STG, both in this figure and the response time courses,
arise from one instance in each structure (left hemisphere for MGB, right hemisphere for STG) in which the
response at all rates was noticeably larger than the responses from other subjects. Exclusion of this MGB
“outlier” reduced the mean time-average percent change, onset percent change, and time course values (during
the “train on” period) by 20-30%, and the standard errors by 30-50%. Exclusion of the STG outlier reduced
the mean of these same measures by 25-35%, and their standard errors by 40-60%. Precisely because of the
potential for large intersubject variations in response magnitude, we have used paired statistical tests
throughout the text whenever appropriate. The trends across rate for both the MGB and STG outlier were
consistent with the results reported in this paper, and analyses conducted by excluding these outliers did not
change any of the primary conclusions in this paper.
[Figure 2-5 image: “Response vs. Repetition Rate”: time-average and onset percent signal change vs. noise burst repetition rate (1, 2, 10, 20, 35/s) in the IC, MGB, HG, and STG, shown averaged across subjects and for individual subjects and hemispheres (normalized).]
[Figure 2-6 image: MGB and IC activation maps for two subjects (Subjects 5 and 6, Exp. II) at repetition rates of 2, 10, 20, and 35/s; scale bar, 5 mm.]
Figure 2-6: Activation maps for the MGB and IC (two subjects, Exp. II). Stimuli were noise burst
trains with repetition rates of 2, 10, 20, or 35/s. See Figure 2-3 caption.
response amplitude was significantly less at the end of the train (p = 0.04, paired t-test), consistent
with a response decrease. In contrast, there was no significant difference at the lower rates (p > 0.1),
consistent with a sustained response.6 Thus, MGB responses varied over the course of high, but not
lower rate trains.
The rate dependencies seen in time-average percent change and the activation maps can be
explained in terms of the time course and onset amplitude of MGB responses. Between 2/s and 20/s,
the increase in time-average percent change (and activation in the maps) is largely attributable to the
increase in sustained response amplitude, which is simultaneously reflected as an increase in onset
percent change. Given that 20/s and 35/s evoke equal onset responses, the decrease in time-average
percent change (and in the maps) between these two rates can be primarily attributed to a change in
the response to the latter portion of the train (i.e., the change from a sustained response to one with a
moderate decrease following onset).
Summary: Between 2/s and 20/s, onset percent change increased with increasing rate in
MGB, while response time courses remained primarily sustained. Between 20/s and 35/s, there was
no change in onset amplitude, but the response dynamics changed from sustained to moderately
decreasing following the train onset.
Heschl's gyrus and superior temporal gyrus
A nonmonotonic relationship between rate and activation was apparent in the activation
maps for HG and STG (Figure 2-7). The maps showed an activation increase from 1/s to 2/s, and a
decrease from 10/s to 35/s.
6 The average time course of the MGB response at 2/s (Figure 2-4) shows an increase from the onset to the end
of the train. However, this increase was only present in 5 of 10 MGB, and was not significant in a paired t-test.
[Figure 2-7 image: auditory cortex (Heschl's gyrus and superior temporal gyrus) activation maps for two subjects (Subjects 3 and 5, Exp. I) at repetition rates of 1, 2, 10, and 35/s; scale bar, 1 cm.]
Figure 2-7: Activation maps for HG and STG (two subjects, Exp. I). Stimuli were noise burst trains
with repetition rates of 1, 2, 10, or 35/s. See Figure 2-3 caption.
As expected, the trends in the activation maps paralleled the rate-dependence of time-average percent signal change. In HG, time-average percent change increased from 2/s to 10/s (p =
0.05, paired t-test), but decreased markedly from 10/s to 35/s (p < 0.001; Figure 2-5, left). These
trends were observed consistently in individual HG (Figure 2-5, middle). In STG, the rate of greatest
time-average percent change (2/s) was less than in HG (10/s; Figure 2-5, left). For the 6 STG with
1/s data, time-average percent change at 2/s was greater than 1/s (p = 0.003, paired t-test). Time-average percent change tended to decrease from 2/s to 10/s and from 10/s to 35/s, so that the overall
decrease from 2/s to 35/s was significant (p = 0.002; Figure 2-5).
Onset percent change again showed differences compared to time-average percent change.
In HG, the difference was primarily one of degree. Onset percent change at 10/s was significantly
greater than both 2/s and 35/s (p = 0.01, paired t-test; Figure 2-5, left and right), but the decrease
from 10/s to 35/s averaged only 20% for onset percent change compared to 50% for time-average
percent change.
In STG, onset and time-average percent change had overall different trends.
Whereas time-average percent change decreased from 2/s to 35/s (p = 0.002), onset percent change
was unchanged over this range (p = 0.4; Figure 2-5, left and right).
A dramatic rate-dependent change in response waveshape accounts for the differences
between onset and time-average percent change in HG and STG. At low rates, responses were
sustained, whereas at high rates they were not (Figure 2-4, third and fourth columns). At the highest
rate of 35/s, image signal increased to a peak occurring 6 s following train onset (“on-peak”),
declined substantially over the next 8 s, increased slightly over the remainder of the on period, and
peaked again 6 s following train offset (“off-peak”). The most prominent features of this “phasic”
response are the peaks just after train onset and offset. In HG, the reduction in time-average percent
change between 10/s and 35/s was partly because of 1) the decrease in onset percent change, and 2)
the more dramatic signal decline during the on period for the 35/s train. In STG, onset percent
change did not vary significantly with rate, so the decline in time-average percent change at high
rates was primarily due to the change in response waveshape.
While the rate-dependencies of the HG and STG responses paralleled each other, there were
also clear differences between the two structures. In both HG and STG, the signal decline during the
on period became increasingly pronounced with increasing repetition rate, so responses had an
increasingly phasic appearance. However, at any given rate, the magnitude of the signal decline was
greater in STG. In STG, the magnitude of the decline (measured as a percentage decrease from the
onset peak to the value at 14 s in the average response time courses; Figure 2-4) was 22, 25, 58, and
93% for 1/s, 2/s, 10/s and 35/s respectively. In HG, the corresponding values were all less: 15, 13,
32, and 78%.
In light of the phasic response for high rate trains, it is not surprising that the activation maps
frequently showed little evidence of activity at the 35/s rate (e.g., Figure 2-7, top left).
The
activation maps were based on the difference in time-average signal between “train on” and “off”
periods. It is clear that for the response to the 35/s train (Figure 2-4), the difference between the
time-average of these two periods will be close to zero, even though cortex is responding robustly
(albeit transiently).7 Thus, for cortex, the activation maps, calculated using a standard method,
provide only a partial picture of cortical rate dependencies.
In HG and STG there were right-left differences in response magnitude. In HG, both time-average and onset percent change were greater on the right for 16 of 18 possible comparisons (p < 0.001, paired t-test, collapsing the data across all rates). In STG, the same trend was apparent, but
was weaker (right greater in 12 of 18 cases; p < 0.02). These right-left differences may reflect a
functional difference between right and left auditory cortex. Alternatively, they may reflect a
functional difference in auditory cortex in the anterior/posterior dimension. HG tends to be shifted
more anteriorly on the right as compared to the left (Penhune et al. 1996; Leonard et al. 1998),
raising the possibility that the imaged slice sampled different cortical areas on the two sides. In
support of there being a true right-left functional difference, a post-hoc reexamination of our imaging
plane (relative to the sagittal reference images) confirmed that the slice always intersected the
postero-medial aspect of HG and, in cases with two gyri, sampled the more anterior one. Thus, in all cases, the slices likely sampled primary auditory cortex (Rademacher et al. 1993), as well as immediately lateral non-primary areas. In both HG and STG, response waveshape showed the same changes with rate in the left and right hemispheres, although the signal decline following the onset peak tended to be greater on the left compared to the right hemisphere.8 Overall, the right-left difference was primarily one of response magnitude.
7 Although we accounted for hemodynamic delay in generating the activation maps (see Methods), both the “train on” and “off” periods were shifted by three images. Consequently, since both the “on” and “off-peak” occur approximately 6 sec following train onset and offset, the two peaks will essentially nullify each other in the t-statistic calculation.
Summary: Responses in HG were sustained at low rates, but became phasic at high rates.
The most prominent features of this phasic response are signal peaks just after train onset and offset.
The amplitude of the on-peak (onset percent change) was greatest at 10/s. Responses in STG also
showed a progression from sustained to phasic with increasing rate. However, the amplitude of the
on-peak did not vary significantly with rate.
Response to small numbers of noise bursts
To investigate the rapid decline in signal shortly after train onset for cortical responses, we
compared the responses to a single noise burst, and clusters of two or five noise bursts with a burst-to-burst interstimulus interval (ISI) of 28.6 ms (“35/s” rate) or 500 ms (“2/s” rate). Both single and
clustered noise bursts elicited measurable responses in HG and STG. The responses, averaged
across subjects and hemispheres, peaked 4-6 s after the stimulus and then returned to baseline by 8-10 s (Figure 2-8, top). After 8-10 s, the average response dipped below baseline. However, this
response feature, unlike the others, was dominated by the data for only one of the three subjects
(subj. #2).
8 Specifically, in HG the magnitude of the signal decline for the 35/s train (measured as a percentage decrease
from the onset peak to the value at 14 s in average response time courses computed separately for each
hemisphere) was 84% and 73% in the left and right hemisphere, respectively. In STG, the corresponding
values were 125% (i.e., a decline below baseline) and 71% for the 35/s train, and 76% and 44% for the 10/s
train.
[Figure 2-8 image: “Responses to Single and Multiple Noise Bursts”. Top: percent signal change vs. time (-4 to 16 s) in HG and STG for 1 NB, 2 NBs @ 35/s, 5 NBs @ 35/s, and 2 NBs @ 2/s, averaged across subjects. Bottom: normalized peak response vs. number of noise bursts for each subject (2, 5, 10) and hemisphere, with the linear-growth prediction indicated.]
Figure 2-8: Top: Average response time courses in HG and STG to either a single noise burst, or a
cluster of two or five noise bursts. Each trace is an average across both hemispheres of three
subjects. Bottom: Normalized peak response for each subject and hemisphere. Dashed line indicates
the prediction from a model in which each successive noise burst evokes a response identical to the 1
NB response, and the responses to each burst add.
Figure 2-8 (bottom) shows normalized peak response vs. number of noise bursts for each
subject and hemisphere. These normalized responses were quantified as the peak percent signal
change in the response time course (which always occurred at t = 4 or 6 s), divided by the peak
percent change for a single noise burst. The normalized peak response generally increased with
increasing number of noise bursts (Figure 2-8, bottom). However, the response increase was always
less than would be predicted by a model in which each successive noise burst evokes a response
equivalent to the 1 NB response and the responses to each burst add (i.e., linear growth). Similarly,
for every subject and hemisphere, the peak response to 5 NBs@35/s was less than 2.5 times the
response to 2 NBs@35/s. These results are consistent with a model in which the responses to noise
bursts at the beginning of a train are greater than those occurring later. The fact that the peak
response for 2 NBs@2/s was always greater than for 2 NBs@35/s indicates that the decline in
response from the first burst to the second was greater at high, as compared to low rates.
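The normalization and the linear-growth prediction used in Figure 2-8 (bottom) can be written compactly (an illustrative sketch; the example numbers are hypothetical):

    def normalized_peak(peak_cluster, peak_single_burst):
        # Peak percent change for a burst cluster, expressed relative to the
        # peak percent change for a single noise burst (same subject/hemisphere).
        return peak_cluster / peak_single_burst

    # Linear-growth prediction: if each burst evoked a response identical to the
    # 1 NB response and the responses simply added, the normalized peak for n
    # bursts would be approximately n (e.g., 5.0 for 5 NBs). A hypothetical
    # observed value of, say, 1.8 for 5 NBs falls well short of that prediction,
    # consistent with adaptation.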
We compared the mean peak percent change for single and multiple noise bursts to the mean
onset percent change for 35/s trains from Exp. I9 to gain an appreciation for the proportion of the on-peak accounted for by the earliest noise bursts in the train. In STG, we estimated that the mean peak
percent change for 1 NB and 5 NBs@35/s was approximately 40% and 65%, respectively, of the
mean onset percent change. In HG, the corresponding estimates were approximately 25% and 40%.
These values indicate that the earliest noise bursts of a high rate train account for a substantial
portion of the on-peak, especially in STG.
9 In making these comparisons, we took into account the two main differences between Exps. I and III. First,
because the responses in Exp. III were computed without temporal smoothing, we recomputed the onset
percent change values from Exp. I without temporal smoothing. The resulting mean onset percent change
values for a 35/s train increased by 20% for both HG and STG. Second, we took into account the difference in
imaging parameters (Exp. I: 1.5T scanner, ASE sequence; Exp. III: 3T scanner, GE sequence). The effect of
this difference was estimated empirically using responses to music obtained under Exp. I (11 sessions) and
Exp. III conditions (11 sessions). Time-average percent change was, on average, about 25% greater for the
Exp. III conditions, so the onset amplitude from Exp. I (calculated without temporal smoothing) was revised up
by this percentage before comparison with the single and multiple noise burst responses.
[Figure 2-9 image: “Response Dependence on Train Duration”: percent signal change vs. time in HG and IC for 35/s noise burst trains of 15, 30, 45, and 60 s duration (Subjects 11 and 12); the “train ON” period and OFF peaks are marked.]
Figure 2-9: Response time courses in HG (top) and IC (bottom) to 35/s noise burst trains, with
durations of 15, 30, 45, and 60 s. Each trace is an average across hemispheres for a given subject.
The off-peak in each HG response is indicated by an arrow.
Response to high rate (35/s) noise burst trains: effect of train duration
By considering noise burst trains with different durations, we tested whether the off-peak in
cortical phasic responses is specifically linked to train termination. Two subjects were studied using
35/s noise burst trains with durations of 15, 30, 45, and 60 s. For both subjects and all durations, HG
responses showed a distinct off-peak after train offset (Figure 2-9, top). Regardless of train duration,
the off-peak occurred approximately 6 s after train offset, indicating a strong coupling between off-peak and train termination. A similarly strong coupling between off-peak and train termination was
also found in STG for both subjects (not shown). One subject (#11) was unusual in that the response
in STG did not show a clear off-peak for voxels selected by our standard criteria. Nevertheless, there
was a clear off-peak for other, nearby voxels, and this off-peak always occurred approximately 6 s
after train offset, regardless of train duration. In contrast to cortex, IC responses were largely
sustained for all durations and showed no sign of an off-peak (Figure 2-9, bottom).
Data in two additional subjects further support the strong coupling between cortical off-peak
and train termination. These subjects, tested with a single train duration of 60 s, showed off-peaks in
both HG and STG that occurred approximately 6 s after train offset. All of the train duration data
taken together indicate that the cortical off-peak is specifically evoked by the termination of high-rate noise burst trains.
DISCUSSION
fMRI responses to trains of noise bursts changed substantially with burst repetition rate in
every studied structure, although the nature of the changes was highly structure-dependent. In the
IC, response amplitude near train onset increased with increasing rate while response waveshape
remained unchanged (i.e., sustained). In the MGB, increasing rate produced an increase in onset
amplitude up to a point where a further increase in rate instead produced a change in waveshape
(from a largely sustained response to one showing a distinct peak just after train onset). In HG, the
site of primary auditory cortex, onset amplitude changed some with rate, but the most striking
change occurred in response waveshape. At low rates the waveshape was sustained, while at high
rates it was strongly phasic in that there were prominent response peaks just after train onset and
offset. In STG, which includes secondary auditory areas, onset amplitude showed no systematic
dependence on rate, whereas response waveshape showed a strong and dramatic rate-dependence
paralleling that in HG. Overall, from midbrain, to thalamus, to cortex, there was a systematic shift in
the form of response rate-dependencies from one of amplitude to one of waveshape.
The sustained response waveshapes seen in subcortical structures and for low rates in
auditory cortex are typical of the fMRI literature. In contrast, the phasic responses seen for higher
rates in auditory cortex are not, nor are their individual signature features. One signature feature –
the prominent peak following stimulus onset – has been reported for a few prolonged acoustic,
odorant, and visual stimuli (Bandettini et al. 1997; Jäncke et al. 1999; Giraud et al. 2000; Sobel et al.
2000), but is nevertheless a fairly uncommon feature for responses in the fMRI literature. A second
signature feature of phasic responses – the peak following stimulus offset – is highly unusual. To
our knowledge, the only other reported “off-response” occurred in a subregion of primary visual
cortex following the transition from steady white light to darkness (Bandettini et al. 1997). The
paucity of previous reports of phasic fMRI responses may be partly an issue of detection since phasic
responses are poorly detected by some of the most commonly-used analysis approaches (e.g., a t-test
comparison of stimulus “on” and “off” periods or, equivalently, correlation or analyses using the
SPM software package that assume a sustained response; Bandettini et al., 1993; Sobel et al., 2000).
It is also possible that phasic responses have not been seen because they reflect neurophysiological
mechanisms that are only invoked in particular, largely unexplored stimulus regimes.
It is widely assumed that different sound features (e.g., frequency, bandwidth, repetition
rate) are represented in the amplitude of fMRI activation or amplitude variations with position (e.g.,
Giraud et al. 2000; Talavage et al. 2000; Yang et al. 2000; Wessinger et al. 2001). In contrast, the
possibility of representations in the temporal dimension is not generally entertained, and this makes
the wide variations in cortical response waveshape of the present study especially intriguing. A few
other studies have also reported covariations between sound characteristics and temporal fMRI
activation patterns in the auditory system. For instance, Gaschler-Markefski et al. (1997) examined
the degree of temporal stationarity of auditory cortical fMRI responses and reported regional
variations depending on stimulus and task. In studying fMRI responses to amplitude modulated
noise, Giraud et al. (2000) found an increasingly prominent peak at stimulus onset with increasing
modulation rate, a result that strongly parallels the findings of the present study (see “Comparison to
previous fMRI and PET studies” below). The present study and these previous reports suggest that
fMRI temporal patterns – or more specifically the temporal variations in neural activity underlying
these patterns – may be an important way in which sound is represented in the auditory system.
Role of rate per se in determining fMRI responses
In the present study, noise burst duration was held constant while rate was varied, so overall
stimulus energy and sound-time fraction (STF) covaried with rate (resulting in an ~12 dB differential
in sound pressure level for 2/s vs. 35/s noise burst trains). While this raises the possibility that the
wide range of response waveshapes in auditory cortex was due primarily to changes in parameters
other than rate, we do not believe this to be the case for two reasons. First, in a separate study, we
have found that varying the intensity of 2/s or 35/s noise bursts over a 20-30 dB range has no effect
on response waveshape (see Chap. 4). Second, we have found that changing rate from 2/s to 35/s
while holding STF constant (and therefore varying burst duration) still produces a change in
waveshape from sustained to phasic (although STF does have some influence on response
waveshape; Chap. 4). In the case of response amplitude, the precise rate dependencies might be
somewhat different if stimulus energy and STF were held constant instead of burst duration, because
varying energy alone can produce changes in response amplitude (Hall et al. 2001; Sigalovsky et al.
2001), as may also be the case for changes in STF.
fMRI responses and underlying neural activity
To understand the significance of the different fMRI response waveshapes, it is necessary to
first consider the extent to which waveshape is governed by neural, metabolic, and hemodynamic
factors. While the relationship between neural activity and fMRI responses is not fully understood, it
is generally accepted that neural activity and image signal are ultimately linked through a chain of
metabolic and hemodynamic events. For the form of fMRI in the present study (“blood-oxygen level
dependent”, or BOLD fMRI), this linkage is as follows. When there is an increase in neural activity
in the form of synaptic events or neural discharges10, there is a corresponding increase in local brain
metabolism and oxygen consumption (Sokoloff 1989). The increase in oxygen consumption is accompanied by an increase in blood flow and blood volume in the active brain region. However, the increase in flow dominates, such that the local concentration of deoxygenated hemoglobin actually decreases, which is important because deoxygenated hemoglobin is paramagnetic (Pauling and Coryell 1936) and thus influences local image signal levels. The net effect of a decrease in deoxygenated hemoglobin is an increase in image signal. When the entire chain of events is considered together, increases and decreases in neural activity result in concordant changes in image signal strength (Kwong et al. 1992; Ogawa et al. 1993; Springer et al. 1999). Since hemodynamic changes occur over the course of seconds, fMRI effectively provides a temporally low-pass filtered view of neural activity. More specifically, since fMRI is sampling activity over small volumes of brain (i.e., voxels), the responses can be thought of as showing the time-envelope of population neural activity on a voxel-by-voxel basis.
10 We consider both synaptic events and neural discharges as “neural activity”, because there is evidence in favor of each as a contributor. 2DG studies suggest that synaptic events may dominate the metabolic response (and hence the fMRI response; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995). A recent study by Logothetis et al. (2001), which simultaneously recorded intracortical activity and fMRI responses, also suggests that synaptic activity may dominate. However, there are also reports of a strong coupling between discharges and fMRI responses (Heeger et al. 2000; Rees et al. 2000), possibly because discharges and synaptic activity were themselves strongly correlated in those studies. It seems likely that the relative contribution of synaptic events versus discharges to the fMRI response will depend on the specifics of the local neural circuitry (Bandettini and Ungerleider 2001), and thence may in fact vary across different regions of the brain (Auker et al. 1983; Mathiesen et al. 1998), or even for different types of stimuli. For the purposes of the present discussion, we leave open the possibility that either or both synaptic activity and discharges “contribute” to the fMRI responses we measured.
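This low-pass view can be illustrated by convolving a hypothetical population-activity envelope with a gamma-shaped hemodynamic impulse response (all waveform shapes and parameter values below are assumptions chosen only for illustration):

    import numpy as np

    def gamma_impulse_response(t, tau=1.25, n=3.0):
        # Assumed hemodynamic impulse response: peaks a few seconds after an
        # impulse of neural activity, then decays.
        h = (t / tau) ** n * np.exp(-t / tau)
        return h / h.sum()

    dt = 0.5
    t = np.arange(0, 80, dt)
    neural = np.zeros_like(t)
    neural[(t >= 10) & (t < 12)] = 1.0   # hypothetical onset burst of activity
    neural[(t >= 12) & (t < 40)] = 0.3   # adapted, sustained activity
    neural[(t >= 40) & (t < 42)] = 0.8   # hypothetical burst at stimulus offset
    hrf = gamma_impulse_response(np.arange(0, 30, dt))
    bold_like = np.convolve(neural, hrf)[: t.size]   # smoothed, delayed envelope

The convolved trace rises and falls seconds after the underlying envelope does, which is why the fMRI time courses are read here as the time-envelope of population activity rather than as spike-level temporal detail.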
Previous work has shown that the relative timing and magnitude of stimulus-evoked changes
in blood flow, blood volume, and oxygen consumption can influence the waveshape of the fMRI
response (Buxton et al. 1998; Mandeville et al. 1998). While this raises the possibility that changes
in waveshape from sustained to phasic reflect changes in hemodynamics rather than underlying
neural activity, we believe this to be unlikely for both of the main components of the phasic
response, namely the off-peak and the on-peak. It is particularly unlikely that a hemodynamic
explanation accounts for the off-peak. Previous hemodynamic modeling and experimentation have
not predicted an off-peak following stimulus termination, and we know of no plausible model that
could generate such a component.
Therefore, the emergence of an off-peak with increasing
repetition rate in auditory cortex is almost certainly attributable to a rate-dependent increase in neural
activity at stimulus offset.
The other major feature that distinguishes phasic from sustained responses, namely the sharp
decline in signal that forms the prominent onset peak, requires more detailed consideration because it
is known that declines in signal can theoretically occur over the course of a prolonged stimulus for
completely hemodynamic reasons (Buxton et al. 1998). However, measurements of BOLD signal,
blood flow, and blood volume responses have failed to illustrate a case for which purely
hemodynamic features could generate a signal decline as dramatic as those seen here (e.g., Hoge et
al. 1999a; Mandeville et al. 1999). Additionally, separate evidence works against the idea that the
signal decline is driven primarily by hemodynamic rather than neural influences. The reasoning
follows from the fact that the same voxels were capable of showing either a phasic response (and the
associated dramatic signal decline) or a sustained response depending on the stimulus. Since the
time course of the phasic and sustained responses is very similar over the first 6-8 seconds, the
“operational history” of the hemodynamic system is presumably similar as well. In light of this
common initial response, a hemodynamic system that could subsequently generate grossly different
response waveshapes seems unlikely, unless the differences reflect differences in underlying neural
activity.
While response waveshape varied with rate within a structure, it also varied across structures
for a given rate. It is known that there can be spatial heterogeneity in tissue hemodynamics (Chen et
al. 1998; Davis et al. 1998), so the possibility that regional variations in hemodynamics play some
role in the waveshape differences across structures cannot be discounted. Still, the heterogeneities in
hemodynamics that have been documented are not sufficient to account for the dramatic waveshape
changes that occur across the pathway as a whole (from the inferior colliculus to cortex).
fMRI response onset and neural adaptation
Given that fMRI responses reflect the time-envelope of population neural activity, the
prominent declines in fMRI signal that occur at high rates in MGB, HG, and STG provide clear
evidence for an overall decline in neural activity during the first seconds of a train (< 10 s). This
decline likely includes decreases in synaptic, as well as discharge activity since both forms of
activity are reflected in fMRI signals.
While an overall decrease in neural activity early in high-rate trains is clear, the subsecond
temporal details of activity during this decrease remain unresolved because fMRI provides a low-pass
forms for the temporal details underlying the overall decline in neural activity. For instance, it may
be that the fMRI signal decline reflects a burst-to-burst adaptation in neural activity in which each
successive burst early in a train elicits progressively less activity (Ritter et al. 1968; Roth and Kopell
1969). Alternatively, a variant of this may occur in which activity does not always decrease in a
strictly progressive fashion across consecutive bursts, but sometimes shows an increase from one
burst to the next (e.g., facilitation or enhancement; Loveless et al. 1989; Budd and Michie 1994;
Brosch et al. 1999). (As long as decreases from burst to burst occur more often than increases, the
time-envelope of neural activity would still decrease.) Another possibility is that population neural
activity is not synchronized to individual bursts (Lu and Wang 2000; Lu et al. 2001), but instead
occurs in response to the train as a whole with an initial peak in activity followed by a lower-level of
activity. All of these possibilities would result in a decline in the time-envelope of population neural
activity, and are therefore consistent with the prominent declines seen in fMRI signal.
While we cannot conclusively determine the temporal details of activity during the declines
in fMRI signal, it is worth recognizing that several aspects of our data are consistent with the idea
that there is a burst-to-burst adaptation in neural activity. In addition to the decline in fMRI response
early during high-rate trains, the fMRI responses to small numbers of noise bursts are also suggestive
of an adaptation process in that fMRI response amplitude did not increase in proportion to the
number of bursts, but rather showed a slower than linear growth. Whether neural activity and fMRI
signal are coupled in an approximately linear manner (and under what circumstances) is an open
question under active investigation (Boynton et al. 1996; Dale and Buckner 1997; Vazquez and Noll
1998; Hoge et al. 1999b; Gratton et al. 2001; Logothetis et al. 2001). However, if they are, the
slower than linear growth in the fMRI responses to small numbers of noise bursts would indicate
diminishing increases in neural activity as more and more bursts are added to a train (i.e.,
adaptation). Another aspect of the data consistent with neural adaptation is the growth in onset
amplitude with rate. If neural activity and fMRI response amplitude vary in direct proportion, onset
amplitude may be viewed roughly as an indicator of the time-average neural activity during the first
seconds of a train. If there were no adaptation and each successive burst in a train produced an
identical increase in neural activity, the time-average neural activity during the first seconds of a
train would increase in direct proportion to rate, and onset amplitude would be expected to do the
same. Instead, a proportional increase in onset amplitude did not occur in any structure. This is
most obvious at high rates in MGB, HG, and STG where onset amplitude did not change, or even
declined with increasing rate.11 However, it can also be seen at lower rates and in the IC. For
instance, an increase in rate from 2/s to 10/s increased onset amplitude by less than two-fold in every
structure, well short of the five-fold increase expected if growth were proportional to rate. This
result is consistent with neural adaptation occurring in all of the studied structures, even the IC where
fMRI response waveshapes are largely sustained and do not immediately suggest an underlying
adaptation.
Looking across structures, the data indicate that any neural adaptation increased with
increasing position in the pathway. For instance, at any given rate, the percentage decline in signal
following the on-peak increased progressively from IC to MGB to auditory cortex (HG and STG),
suggesting an increasing degree of adaptation in the underlying population neural activity. An
increase in adaptation across structures is also suggested by the fact that the growth in onset
amplitude with rate falls increasingly short of predictions assuming no adaptation. For instance, the
increase in (average) onset amplitude from 2/s to 10/s falls increasingly short of the five-fold
increase predicted in the absence of adaptation as one moves from IC (1.69), to MGB (1.42), to
auditory cortex (HG: 1.26; STG: 0.92). Similarly, the increase in onset amplitude from 10/s to 35/s
falls increasingly short of the 3.5-fold prediction [IC: 1.42; MGB: 1.36; auditory cortex: 0.80 (HG),
0.99 (STG)]. Thus, if there is burst-to-burst adaptation in population neural activity early in a train,
it increases from IC to MGB to auditory cortex.
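The shortfall relative to the no-adaptation prediction can be made explicit with a small calculation using the growth factors just quoted (a simple illustrative check, not an analysis performed in the thesis):

    # Observed growth in onset amplitude, relative to the growth predicted if
    # amplitude increased in direct proportion to rate (no adaptation).
    observed_2_to_10 = {"IC": 1.69, "MGB": 1.42, "HG": 1.26, "STG": 0.92}
    observed_10_to_35 = {"IC": 1.42, "MGB": 1.36, "HG": 0.80, "STG": 0.99}

    for label, observed, predicted in [("2/s to 10/s", observed_2_to_10, 10 / 2),
                                       ("10/s to 35/s", observed_10_to_35, 35 / 10)]:
        for structure, growth in observed.items():
            print(f"{label} {structure}: observed/predicted = {growth / predicted:.2f}")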
11 One neural interpretation of the trends in onset amplitude follows from a simple model in which there is a
neural response to each successive burst in a train, and two competing effects of increasing rate: (1) an
increased number of bursts per unit time (working to increase total neural activity during the initial seconds of
the train, and hence onset amplitude), and (2) a decreased response to individual bursts because of increased
adaptation (working to decrease total neural activity). If this latter effect due to adaptation were relatively
unimportant at low rates, but dominant at high rates, the result would be an increase in onset percent change
with increasing rate at low rates and a plateau or decrease at high rates – the trends we observed in MGB and
HG.
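A toy version of this two-effect model shows that it can reproduce the observed pattern (the saturating per-burst response and its parameter values are purely illustrative assumptions):

    def early_train_activity(rate_per_s, half_rate=15.0, steepness=2.0):
        # Total activity in the first seconds of a train = (bursts per second)
        # x (response per burst), where the per-burst response shrinks with
        # rate as adaptation becomes stronger (assumed saturating form).
        per_burst = 1.0 / (1.0 + (rate_per_s / half_rate) ** steepness)
        return rate_per_s * per_burst

    # Rising at low rates, then a plateau/decline at high rates:
    # rates of 1, 2, 10, 20, 35/s give roughly 1.0, 2.0, 6.9, 7.2, 5.4.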
Relationship to electrophysiological data in animals – A trend of increasing adaptation with
increasing position in the auditory pathway has emerged from several animal neurophysiological
studies explicitly designed to compare responses across structures. For instance, microelectrode
recordings of the response to paired stimuli with different interstimulus intervals have shown an
increase in recovery time with increasing level in the auditory pathway. In unanesthetized animals
(cats and rabbits), the average interval required for 50% recovery of the response to the second of
two clicks is 2 ms in the auditory nerve, cochlear nucleus, and superior olivary complex, but 7 ms in
the inferior colliculus, and 20 ms in auditory cortex (Fitzpatrick and Kuwada 1999).
In unanesthetized guinea pig, Creutzfeldt et al. (1980) recorded responses to amplitude modulated tones
simultaneously from thalamic and cortical neurons (specifically nine thalamo-cortical unit pairs for
which the correlation of spontaneous activity suggested a direct synaptic connection). Activity in the
cortical neurons declined more rapidly over successive cycles of the AM tone than did the activity in
the thalamic neurons, indicating greater adaptation in the cortical neurons. Finally, recording near-field potentials from the IC and auditory cortex in response to brief noise bursts in unanesthetized
chinchilla, Burkard et al. (1999) found that the mean response amplitude (averaged across noise
burst presentations) decreased more in cortex than IC as repetition rate was increased. Their results
are again consistent with greater adaptation in cortex than lower structures in the auditory pathway.
The extensive animal literature regarding modulation transfer functions (MTFs) also
suggests a change in temporal response properties from the IC to auditory cortex. Here we focus on
studies that quantify their results in terms of rate MTFs (rMTF; average firing rate vs. modulation
frequency), rather than temporal MTFs, since changes in the “synchronization” or phase locking of
neural activity (in the absence of average rate changes) are unlikely to be reflected in fMRI activity.
Furthermore, since most animal studies use short duration stimulus trains (~1 s), the most
appropriate measure of the present study for comparison to the animal results is onset amplitude. In
the IC, the best modulation frequency (BMF; the frequency at which the rMTF has its largest value)
for individual neurons is generally greater than ~30 Hz (Langner and Schreiner 1988; Muller-Preuss
et al. 1994; Krishna and Semple 2000). In contrast, BMFs in auditory cortex tend to be less than ~20
Hz (Schreiner and Urbas 1988; Eggermont 1991; Bieser and Muller-Preuss 1996; Schreiner and
Raggio 1996). These values are consistent with the present study in that onset amplitude steadily
increased in the IC for noise burst rates up to 35/s (the highest rate employed), but peaked in HG at a
lower rate (10/s). While the variation in onset amplitude with rate in HG was rather small (and in
STG, onset amplitude did not vary at all), a similarly weak “tuning” also holds in population neural
activity, in that the rMTF averaged across cortical neurons is primarily low-pass, or only weakly
band-pass (Schreiner and Urbas 1986; Eggermont 1994; Eggermont and Smith 1995; Eggermont
1998). The relatively flat nature of the average rMTF in cortex probably reflects a weak tuning in
the rMTFs of many individual neurons (Schreiner and Raggio 1996; Eggermont 1998), but could
also be due in part to the summation of activity across sharply tuned units having a wide range of
BMFs. Overall, in both IC and HG, changes in onset amplitude as a function of repetition rate were
consistent with what might be predicted based on microelectrode recordings in animals of neural
spiking in response to amplitude modulated trains.
Relationship to electric recordings in humans – A trend of generally greater adaptation at
cortical vs. brainstem levels of the auditory pathway fits with data concerning two of the most
studied components of human auditory evoked potentials: wave V of the brainstem evoked potential
and the long latency potential N1. Wave V is likely generated by neurons projecting to the IC
(Melcher and Kiang 1996; Moller 1998), while the primary generators of N1 have been localized to
auditory cortex (e.g., Näätänen and Picton 1987; Reite et al. 1994). Wave V and N1 show different
sensitivities to stimulus repetition rate.
For example, in responses averaged over many click
presentations, a high click rate (> 50/s) is required to generate a 30-50% amplitude decrement in
wave V of the brainstem evoked potential (Thornton and Coleman 1975; Suzuki et al. 1986; Jiang et
al. 1991). In contrast, similar decrements in N1 occur at much lower rates (0.5 – 2/s), as seen in
averaged responses and by comparing the individual responses to successive stimuli in a train12
(Davis et al. 1966; Ritter et al. 1968; Roth and Kopell 1969; Fruhstorfer et al. 1970; Fruhstorfer
1971; Davis et al. 1972; Picton et al. 1977). The substantially different rate sensitivities for wave V
and N1 indicate greater stimulus-to-stimulus adaptation in the cortical neurons generating N1 than in
the brainstem neurons generating wave V.

12 Interestingly, for stimulus pairs separated by 0.5 – 2 s (corresponding to rates of 2 – 0.5/s), the adaptation in
N1 is in close agreement with the data from Exp. III, in which the average increase in peak magnitude from
1 NB to 2 NBs@2/s was about 60% in both HG and STG.
Phasic response “off-peak” and neural off responses
The off-peak in the phasic response may be related to electrophysiological “off-responses”
observed intracortically following the termination of stimulus trains. For instance, following 175 ms
click trains, Steinschneider et al. (1998) found transient increases in cortical multiunit activity in
awake monkeys.
Using depth electrodes implanted in the vicinity of human auditory cortex,
Chatrian et al. (1960) showed gross potential responses following 3 s click trains. In both of these
electrophysiological studies, the rate-dependence of the off-response resembled our fMRI cortical
data in that an off-response was present for high, but not low repetition rates. Given this similarity, it
seems quite possible that the fMRI and electrophysiological off-responses to stimulus trains arise
from similar underlying neural mechanisms.
The fMRI off-peak to stimulus trains may also be physiologically related to off-responses
that occur in evoked potential and magnetic field recordings following the cessation of prolonged
individual stimuli. Support for this idea comes from the fact that a cortical fMRI off-peak can be
elicited by a prolonged noise burst (30 s duration; Sigalovsky et al. 2001), as well as by high rate
stimulus trains. In humans, extracranial evoked potential (Spychala et al. 1969; Pfefferbaum et al.
1971; Hillyard and Picton 1978; Picton et al. 1978) and magnetic field recordings (Hari et al. 1987;
Pantev et al. 1996; Lammertmann and Lutkenhoner 2001) have also shown responses to the offset of
prolonged noise or tone bursts (with durations ranging from 0.5 to 10 s). The generation site of
electrophysiological off-responses has been localized to auditory cortex (Hari et al. 1987; Pantev et
al. 1996; Noda et al. 1998). Therefore, both the electrophysiological and fMRI off-responses imply
increased activity in auditory cortical neurons following sound offset, a form of response that has
ample precedent at the single neuron level in auditory cortex (Goldstein et al. 1968; Abeles and
Goldstein 1972; Howard et al. 1996; He et al. 1997; Recanzone 2000).
The fMRI and electrophysiological off-responses resemble one another functionally in that
both show a decrease in magnitude with decreasing sound duration. For instance, evoked potential
off-responses following tone bursts have been shown to decrease in magnitude as burst duration is
reduced from 9 s to 1 s (Hillyard and Picton 1978; Picton et al. 1978), or from 2.5 s to 0.5 s
(Pfefferbaum et al. 1971). The fMRI off-response to high-rate trains is lower in magnitude for a 15 s
train, as compared to longer duration trains (≥ 30 s; Figure 2-9). The duration dependence of fMRI
and electrophysiological off-responses may indicate that a diminishing percentage of cortical
neurons respond to sound offset as sound duration decreases, or that the neurons responding to the
offset of sound do so less robustly as duration decreases.
The fact that the fMRI and
electrophysiological data were obtained for very different regimes of duration (15 – 60 s and 0.5 – 9
s) suggests that changes in sound duration may be reflected in cortical population neural activity in
the same basic way over a broad range of sound durations.
Phasic response recovery
In addition to the prominent on and off-peaks of fMRI phasic responses, there is a third
component, namely the steady signal increase that begins approximately 10 s after train onset and
continues to the end of the train. A similar “signal recovery” has been observed in auditory cortex in
response to tone bursts (Robson et al. 1998) and in the supplementary motor area in response to a
finger tapping task (Nakai et al. 2000). This signal recovery13, which in the present study can be
seen in the responses to intermediate (10/s) and high-rate (35/s) trains of various durations (e.g., 30 –
60 s; Figures 2-4 and 2-9), has no obvious analog in electric or magnetic evoked responses to
prolonged auditory stimuli, since these responses do not generally increase over the course of the
stimulus, but rather remain fairly constant, or even decrease (Picton et al. 1978; Lammertmann and
Lutkenhoner 2001). However, most of the stimulus durations used in this literature are far less than
those used in the present study, so it is possible that an electric or magnetic analog would be
identified if longer stimulus durations were tried. It is also possible that the phasic response signal
recovery is related to subjects' anticipation of the termination of high-rate trains (and thus may be
loosely analogous to electrophysiological correlates of expectation such as the contingent negative
variation; Donchin et al. 1978). To explore this idea, we performed two pilot experiments in which,
for some runs, the subject's attention was diverted to the visual domain with a highly demanding,
ongoing visual task, while presenting a 35/s noise burst train in our standard "30 s on, 30 s off"
protocol. Comparison of task vs. no task showed the expected decrease in auditory response
amplitude in the visual task condition (Woodruff et al. 1996), but no obvious change in the form of
the signal recovery. If the signal recovery does indeed reflect anticipation of train termination, it
would appear to be a "low-level" expectation that is not readily eliminated by overt diversion of
attention.

13 It is unlikely that the signal recovery is the manifestation of low-frequency "oscillations" in the metabolic or
vascular systems, since the "frequency" required to explain a signal increase consistent with the longest
duration train (< 0.01 Hz) would be below the lowest frequencies generally considered in the literature on this
topic (Hudetz et al. 1998; Obrig et al. 2000).
Comparison to previous fMRI and PET studies – auditory and non-auditory
Several imaging studies have examined rate effects in auditory cortex using stimuli limited
to low rates. fMRI studies presenting blocks of syllables or words at rates up to 2.5/s show primarily
sustained responses in auditory cortex and an increase in response amplitude with increasing rate
(Binder et al. 1994; Dhankhar et al. 1997; Rees et al. 1997). Both the time courses and amplitude
variations are consistent with the results of the present study at low rates. PET studies have also
investigated the effect of changes in the repetition rate of words or long duration (500 ms) tone
bursts presented at rates up to 1.5/s (Price et al. 1992; Frith and Friston 1996; Rees et al. 1997).
Because PET requires the collection of photon counts over an extended duration (e.g., tens of
seconds), response time courses cannot be obtained, and results are reported only in terms of the
overall percent signal change between stimulus and control periods.
Computed in this “time-
averaged” manner, these PET rate studies report a monotonic relationship between rate and percent
change. The findings of the present study are in agreement with this trend at low repetition rates.
The fMRI study of Giraud et al. (2000) examined a wider range of rates, and for high rates
identified a cortical response component that is likely analogous to the “on-peak” in the phasic
responses of the present study. The stimulation paradigm presented amplitude modulated (AM)
noise and unmodulated noise in alternating 30 s blocks. Because unmodulated noise was used as the
comparison condition, the signal changes that occur following the onset of the AM noise may reflect
a combination of responses: (1) to the AM noise itself, and (2) to the offset of the preceding,
unmodulated noise (since the offset of unmodulated noise can elicit a change in cortical signal;
Sigalovsky et al., 2001). Nevertheless, a relationship between AM rate and changes in cortical
response waveshape was found. For auditory cortex and high AM rates (> ~32 Hz), image signal
changes during AM presentations were modeled best by a transient response at AM noise onset. In
contrast, for low rates, the signal changes were better modeled by a sustained response. Thus, the
Giraud et al. study, like the present one, indicates the emergence of a prominent on-peak with
increasing rate in auditory cortex. The activation reported by Giraud et al. for subcortical centers
was relatively weak, perhaps because the data were taken without the benefit of cardiac gating.
Nevertheless, the data suggest that higher rates generally yield greater responses than low ones in
subcortical areas, a trend consistent with the results of the present study.
Robson et al. (1998) reported fMRI responses to an ~30 s train of 5/s tone bursts (100 ms
burst duration) that are remarkably similar to what we measured to 10/s noise bursts in cortex, in that
they include (1) an initial on-peak characterized by a 30-50% signal decline, (2) a subsequently
slowly increasing signal, and (3) a possible small signal increase or prolongation following train
termination. Robson et al. (1998) found that the cortical fMRI response to the tone burst train could
be fairly well-described by a model in which there is a response to each burst in a train, but the
magnitude of the response decreases progressively with each successive burst until a steady state
response level is reached.
PET and fMRI studies of rate effects in the visual, somatosensory, and motor systems are
generally consistent with the rate-dependence of “time-averaged” percent change in auditory cortex.
Over the low rates (< 5 Hz) used in studies of the motor system, activation in primary motor cortex
increases with increasing rate of finger tapping or finger flexion-extension movements (Blinkenberg
et al. 1996; Rao et al. 1996; Schlaug et al. 1996; Jäncke et al. 1998). For primary visual (Fox and
Raichle 1984, 1985; Kwong et al. 1992; Mentis et al. 1997; Thomas and Menon 1998; Zhu et al.
1998) and somatosensory (Ibanez et al. 1995; Takanashi et al. 2001) cortex, stimuli have been
presented over a wider rate range, comparable to the range used in the present study. A consistent
result in these studies is that “time-averaged” activation increases at low rates, peaks around 8 Hz,
and then declines, a trend similar to the one seen here for primary auditory cortex. The visual and
somatosensory studies do not provide information concerning the time course of activation, so it is
not clear whether the decline in time-average activation at high rates reflects a change in response
waveshape, as is the case for primary auditory cortex. Nevertheless, the general trend in time-average activation raises the possibility that changes in response waveshape from sustained to phasic
with increasing rate are not unique to auditory cortex, but instead are a general property of sensory
cortical areas.
Relationship between fMRI response waveshape and sound perception
The fact that the various noise burst trains used in the present study differ considerably from
a perceptual standpoint raises the possibility that some of the observed trends in the fMRI responses
are correlates of perception, rather than simply rate. The most striking of the observed trends, the
change in cortical response waveshape, showed a qualitative correlation with the perceptual
attributes of the stimuli. For the low rate that elicited sustained responses (i.e., 2/s), the individual
bursts in the train were distinguishable. For the high rate that elicited the most phasic responses (i.e.,
35/s), the noise bursts fused so that the overall train sounded continuous (although modulated).
Thus, distinctly different cortical response waveshapes occurred for two distinctly different
perceptual regimes.
The possible correlation between fMRI response waveshape and perception fits with the idea
that population neural activity in auditory cortex encodes the beginning and end of perceptually
distinct acoustic “events”. At a 2/s rate, individual noise bursts of a train are the distinguishable
events, so there would be successive neural responses to each burst, resulting in a sustained fMRI
response. In contrast, 35/s noise burst trains are perceptually continuous and would be treated as a
single event, thus producing neural responses primarily at the onset and offset of the train, and
thence the phasic fMRI response.
Coordinated fMRI and psychophysical experiments will be
necessary to more fully explore the potential coupling between perception and neural activity as seen
through fMRI response waveshape.
The trend in fMRI response waveshape across structures suggests that the coding of
perceptually distinct acoustic events in population neural activity occurs to different degrees
depending on structure. While auditory cortex showed a clear, qualitative correlation between
perception and response waveshape, the MGB showed less correlation in that there was less
difference in waveshape for 2/s vs. 35/s bursts. In the IC, there was no apparent correlation since
waveshape was the same regardless of whether the individual bursts of a train were distinct (as at
2/s) or fused (35/s). Thus, any correlation between perception and response waveshape diminished
from high to lower stages of the pathway.
Overall, our results suggest a population neural representation of the beginning and the end
of distinct perceptual events that, while weak or absent in the midbrain, begins to emerge in the
thalamus and is robust in auditory cortex.
Acknowledgements
The authors gratefully thank John Guinan, Mark Tramo, Irina Sigalovsky, Courtney Lane,
Peter Cariani, and Monica Hawley for numerous helpful comments and suggestions, and Barbara
Norris for considerable assistance in figure preparation. Support was provided by NIH/NIDCD
PO1DC00119, RO3DC03122, T32DC00038, and a Martinos Scholarship.
REFERENCES
Abeles M and Goldstein MH, Jr. Responses of single units in the primary auditory cortex of the cat
to tones and to tone pairs. Brain Res 42: 337-352, 1972.
Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity
and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516,
1983.
Bandettini PA, Jesmanowicz A, Wong EC and Hyde JS. Processing strategies for time-course
data sets in functional MRI of the human brain. Magn Reson Med 30: 161-173, 1993.
Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW,
Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes
during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997.
Bandettini PA and Ungerleider LG. From neuron to BOLD: New connections. Nat Neurosci 4:
864-866, 2001.
Bieser A and Muller-Preuss P. Auditory responsive cortex in the squirrel monkey: Neural
responses to amplitude-modulated sounds. Exp Brain Res 108: 273-284, 1996.
Binder JR, Rao SM, Hammeke TA, Frost JA, Bandettini PA and Hyde JS. Effects of stimulus
rate on signal responses during functional magnetic resonance imaging of auditory cortex. Brain Res
Cogn Brain Res 2: 31-38, 1994.
Blinkenberg M, Bonde C, Holm S, Svarer C, Andersen J, Paulson OB and Law I. Rate
dependence of regional cerebral activation during performance of a repetitive motor task: A PET
study. J Cerebral Blood Flow and Metabolism 16: 794-803, 1996.
Boynton GM, Engel SA, Glover GH and Heeger DJ. Linear systems analysis of functional
magnetic resonance imaging in human V1. J Neurosci 16: 4207-4221, 1996.
Bregman A. Auditory scene analysis. The perceptual organization of sound. Cambridge: MIT Press,
1990.
Brosch M, Schulz A and Scheich H. Processing of sound sequences in macaque auditory cortex:
Response enhancement. J Neurophysiol 82: 1542-1559, 1999.
Buckner RL, Bandettini PA, O'Craven KM, Savoy RL, Peterson SE, Raichle ME and Rosen
BR. Detection of cortical activation during averaged single trials of a cognitive task using functional
magnetic resonance imaging. Proc Natl Acad Sci 93: 14878-14883, 1996.
Budd TW and Michie PT. Facilitation of the N1 peak of the auditory ERP at short stimulus
intervals. Neuroreport 5: 2513-2516, 1994.
Burkard RF, Secor CA and Salvi RJ. Near-field responses from the round window, inferior
colliculus, and auditory cortex of the unanesthetized chinchilla: Manipulations of noise burst level
and rate. J Acoust Soc Am 106: 304-312, 1999.
Buxton RB, Wong EC and Frank LR. Dynamics of blood flow and oxygenation changes during
brain activation: The balloon model. Magn Reson Med 39: 855-864, 1998.
Chatrian GE, Petersen MC and Lazarte JA. Responses to clicks from the human brain: Some
depth electrographic observations. Electroencephalogr Clin Neurophysiol 12: 479-489, 1960.
Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation
of fMRI BOLD response in primary visual cortex of human brain during sustained visual stimulation.
Magn Reson Med 39: 520-527, 1998.
Creutzfeldt O, Hellweg FC and Schreiner C. Thalamocortical transformation of responses to
complex auditory stimuli. Exp Brain Res 39: 87-104, 1980.
Dale AM and Buckner RL. Selective averaging of rapidly presented individual trials using fMRI.
Hum Brain Mapp 5: 329-340, 1997.
Davis H, Mast T, Yoshie N and Zerlin S. The slow response of the human cortex to auditory
stimuli: Recovery process. Electroencephalogr Clin Neurophysiol 21: 105-113, 1966.
Davis H, Osterhammel A, Wier CC and Gjerdingen DB. Slow vertex potentials: Interactions
among auditory, tactile, electric and visual stimuli. Electroencephalogr Clin Neurophysiol 33: 537-545, 1972.
Davis TL, Kwong KK, Weisskoff RM and Rosen BR. Calibrated functional MRI: Mapping the
dynamics of oxidative metabolism. Proc Natl Acad Sci 95: 1834-1839, 1998.
Dhankhar A, Wexler BE, Fulbright RK, Halwes T, Blamire AM and Shulman RG. Functional
magnetic resonance imaging assessment of the human brain auditory cortex response to increasing
word presentation rates. J Neurophysiol 77: 476-483, 1997.
Donchin E, Ritter W and McCallum WC. Cognitive psychophysiology: The endogenous
components of the ERP. In: Event-related brain potentials in man, edited by Callaway E, Tueting P
and Koslow SH. New York: Academic Press, 1978, p. 349-411.
Eggermont JJ. Rate and synchronization measures of periodicity coding in cat primary auditory
cortex. Hear Res 56: 153-167, 1991.
Eggermont JJ. Temporal modulation transfer functions for AM and FM stimuli in cat auditory
cortex. Effects of carrier type, modulating waveform and intensity. Hear Res 74: 51-66, 1994.
Eggermont JJ and Smith GM. Synchrony between single-unit activity and local field potentials in
relation to periodicity coding in primary auditory cortex. J Neurophysiol 73: 227-245, 1995.
Eggermont JJ. Representation of spectral and temporal sound features in three cortical fields of the
cat. Similarities outweigh differences. J Neurophysiol 80: 2743-2764, 1998.
Erne SN and Hoke M. Short-latency evoked magnetic fields from the human auditory brainstem.
Adv Neurol 54: 167-176, 1990.
Fitzpatrick DC and Kuwada S. Responses of neurons to click-pairs as simulated echoes: Auditory
nerve to auditory cortex. J Acoust Soc Am 106: 3460-3472, 1999.
Fox PT and Raichle ME. Stimulus rate dependence of regional cerebral blood flow in human striate
cortex, demonstrated by positron emission tomography. J Neurophysiol 51: 1109-1120, 1984.
Fox PT and Raichle ME. Stimulus rate determines regional brain blood flow in striate cortex. Ann
Neurol 17: 303-305, 1985.
Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC and Evans AC. Assessing the
significance of focal activations using their spatial extent. Hum Brain Mapp 1: 210-220, 1994.
Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial
registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995.
Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in
fMRI time-series. Magn Reson Med 35: 346-355, 1996.
Frith CD and Friston KJ. The role of the thalamus in "top down" modulation of attention to sound.
Neuroimage 4: 210-215, 1996.
Fruhstorfer H, Soveri P and Jarvilehto T. Short-term habituation of the auditory evoked response
in man. Electroencephalogr Clin Neurophysiol 28: 153-161, 1970.
Fruhstorfer H. Habituation and dishabituation of the human vertex response. Electroencephalogr
Clin Neurophysiol 30: 306-312, 1971.
Gaschler-Markefski B, Baumgart F, Tempelmann C, Schindler F, Stiller D, Heinze HJ and
Scheich H. Statistical methods in functional magnetic resonance imaging with respect to
nonstationary time-series: Auditory cortex activity. Magn Reson Med 38: 811-820, 1997.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt
A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Goldstein MH, Jr., Hall JL, II and Butterfield BO. Single-unit activity in the primary auditory
cortex of unanesthetized cats. J Acoust Soc Am 43: 444-455, 1968.
Gratton G, Goodman-Wood MR and Fabiani M. Comparison of neuronal and hemodynamic
measures of the brain response to visual stimulation: An optical imaging study. Hum Brain Mapp 13:
13-25, 2001.
Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O and Patterson RD. Encoding of the
temporal regularity of sound in the human brainstem. Nat Neurosci 4: 633-637, 2001.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S,
Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain
Mapp 6: 33-41, 1998.
Hall DA, Haggard MP, Summerfield AQ, Akeroyd MA, Palmer AR and Bowtell RW.
Functional magnetic resonance imaging measurements of sound-level encoding in the absence of
background scanner noise. J Acoust Soc Am 109: 1559-1570, 2001.
Hari R, Pelizzone M, Makela JP, Hallstrom J, Leinonen L and Lounasmaa OV. Neuromagnetic
responses of the human auditory cortex to on- and offsets of noise bursts. Audiology 26: 31-43, 1987.
He J, Hashikawa T, Ojima H and Kinouchi Y. Temporal integration and duration tuning in the
dorsal zone of cat auditory cortex. J Neurosci 17: 2615-2625, 1997.
Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does
neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000.
Hillyard SA and Picton TW. On and off components in the auditory evoked potential. Percept
Psychophys 24: 391-398, 1978.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD
and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999a.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Linear coupling between
cerebral blood flow and oxygen consumption in activated human cortex. Proc Natl Acad Sci 96:
9403-9408, 1999b.
Howard MA, III, Volkov IO, Abbas PJ, Damasio H, Ollendieck MC and Granner MA. A
chronic microelectrode investigation of the tonotopic organization of human auditory cortex. Brain
Res 724: 260-264, 1996.
Hudetz AG, Biswal BB, Shen H, Lauer KK and Kampine JP. Spontaneous fluctuations in
cerebral oxygen supply. An introduction. Adv Exp Med Biol 454: 551-559, 1998.
Ibanez V, Deiber MP, Sadato N, Toro C, Grissom J, Woods RP, Mazziotta JC and Hallett M.
Effects of stimulus rate on regional cerebral blood flow after median nerve stimulation. Brain 118:
1339-1351, 1995.
Jäncke L, Specht K, Mirzazade S, Loose R, Himmelbach M, Lutz K and Shah NJ. A parametric
analysis of the 'rate effect' in the sensorimotor cortex: A functional magnetic resonance imaging
analysis in human subjects. Neurosci Lett 252: 37-40, 1998.
Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the
BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res
Cogn Brain Res 8: 117-124, 1999.
Jiang ZD, Wu YY and Zhang L. Amplitude change with click rate in human brainstem auditory-evoked responses. Audiology 30: 173-182, 1991.
Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect
synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995.
Krishna BS and Semple MN. Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus. J Neurophysiol 84: 255-273, 2000.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy
DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic
magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl
Acad Sci 89: 5675-5679, 1992.
Lammertmann C and Lutkenhoner B. Near-DC magnetic fields following a periodic presentation
of long-duration tonebursts. Clin Neurophysiol 112: 499-513, 2001.
Langner G and Schreiner CE. Periodicity coding in the inferior colliculus of the cat. I. Neuronal
mechanisms. J Neurophysiol 60: 1799-1822, 1988.
Langner G. Periodicity coding in the auditory system. Hear Res 60: 115-142, 1992.
Leonard CM, Puranik C, Kuldau JM and Lombardino LJ. Normal variation in the frequency
and location of human auditory cortex landmarks. Heschl's gyrus: Where is it? Cereb Cortex 8: 397-406, 1998.
Lockwood AH, Salvi RJ, Coad ML, Arnold SA, Wack DS, Murphy BW and Burkard RF. The
functional anatomy of the normal human auditory system: Responses to 0.5 and 4.0 kHz tones at
varied intensities. Cereb Cortex 9: 65-76, 1999.
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological
investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001.
Loveless N, Hari R, Hamalainen M and Tiihonen J. Evoked responses of human auditory cortex
may be enhanced by preceding stimuli. Electroencephalogr Clin Neurophysiol 74: 217-227, 1989.
Lu T and Wang X. Temporal discharge patterns evoked by rapid sequences of wide- and
narrowband clicks in the primary auditory cortex of cat. J Neurophysiol 84: 236-246, 2000.
Lu T, Liang L and Wang X. Temporal and rate representations of time-varying signals in the
auditory cortex of awake primates. Nat Neurosci 4: 1131-1138, 2001.
Mandeville JB, Marota JJA, Kosofsky BE, Keltner JR, Weissleder R, Rosen BR and Weisskoff
RM. Dynamic functional imaging of relative cerebral blood volume during rat forepaw stimulation.
Magn Reson Med 39: 615-624, 1998.
Mandeville JB, Marota JJA, Ayata C, Moskowitz MA, Weisskoff RM and Rosen BR. MRI
measurement of the temporal evolution of relative CMRO2 during rat forepaw stimulation. Magn
Reson Med 42: 944-951, 1999.
Mathiesen C, Caesar K, Akgoren N and Lauritzen M. Modification of activity-dependent
increases of cerebral blood flow by excitatory synaptic activity and spikes in rat cerebellar cortex. J
Physiol (Lond) 512: 555-566, 1998.
Melcher JR and Kiang NY. Generators of the brainstem auditory evoked potential in cat. III:
Identified cell populations. Hear Res 93: 52-71, 1996.
Melcher JR, Talavage TM and Harms MP. Functional MRI of the auditory system. In: Functional
MRI, edited by Moonen CTW and Bandettini PA. Berlin: Springer, 1999, p. 393-406.
Mentis MJ, Alexander GE, Grady CL, Horwitz B, Krasuski J, Pietrini P, Strassburger T,
Hampel H, Schapiro MB and Rapoport SI. Frequency variation of a pattern-flash visual stimulus
during PET differentially activates brain from striate through frontal cortex. Neuroimage 5: 116-128,
1997.
Miller GA and Taylor WG. The perception of repeated bursts of noise. J Acoust Soc Am 20: 171-182, 1948.
Moller AR. Neural generators of the brainstem auditory evoked potentials. Seminars in Hearing 19:
11-27, 1998.
Muller-Preuss P, Flachskamm C and Bieser A. Neural encoding of amplitude modulation within
the auditory midbrain of squirrel monkeys. Hear Res 80: 197-208, 1994.
Näätänen R and Picton T. The N1 wave of the human electric and magnetic response to sound: A
review and an analysis of the component structure. Psychophysiology 24: 375-425, 1987.
Nakai T, Matsuo K, Kato C, Takehara Y, Isoda H, Moriya T, Okada T and Sakahara H. Post-stimulus response in hemodynamics observed by functional magnetic resonance imaging--difference
between the primary and sensorimotor area and the supplementary motor area. Magn Reson Imaging
18: 1215-1219, 2000.
Noda K, Tonoike M, Doi K, Koizuka I, Yamaguchi M, Seo R, Matsumoto N, Noiri T, Takeda N
and Kubo T. Auditory evoked off-response: Its source distribution is different from that of on-response. Neuroreport 9: 2621-2625, 1998.
Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic
activity in the central auditory system. J Comp Neurol 245: 553-565, 1986.
Obrig H, Neufang M, Wenzel R, Kohl M, Steinbrink J, Einhaupl K and Villringer A.
Spontaneous low frequency oscillations of cerebral hemodynamics and metabolism in human adults.
Neuroimage 12: 623-639, 2000.
Ogawa S, Menon RS, Tank DW, Kim SG, Merkle H, Ellerman JM and Ugurbil K. Functional
brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging: A
comparison of signal characteristics with a biophysical model. Biophys J 64: 803-812, 1993.
Pantev C, Eulitz C, Hampson S, Ross B and Roberts LE. The auditory evoked "off" response:
Sources and comparison with the "on" and the "sustained" responses. Ear Hear 17: 255-265, 1996.
Pauling L and Coryell CD. The magnetic properties and structure of hemoglobin, oxyhemoglobin
and carbonmonoxyhemoglobin. Proc Natl Acad Sci 22, 1936.
Penhune VB, Zatorre RJ, MacDonald JD and Evans AC. Interhemispheric anatomical
differences in human primary auditory cortex: Probabilistic mapping and volume measurement from
magnetic resonance scans. Cereb Cortex 6: 661-672, 1996.
Pfefferbaum A, Buchsbaum M and Gips J. Enhancement of the average evoked response to tone
onset and cessation. Psychophysiology 8: 332-339, 1971.
Picton TW, Hillyard SA, Krausz HI and Galambos R. Human auditory evoked potentials. I:
Evaluation of components. Electroencephalogr Clin Neurophysiol 36: 179-190, 1974.
Picton TW, Woods DL, Baribeau-Braun J and Healey TMG. Evoked potential audiometry. J
Otolaryngol 6: 90-119, 1977.
Picton TW, Woods DL and Proulx GB. Human auditory sustained potentials. II. Stimulus
relationships. Electroencephalogr Clin Neurophysiol 45: 198-210, 1978.
Press WH, Teukolsky SA, Vetterling WT and Flannery BP. Numerical recipes in C: The art of
scientific computing. Cambridge: Cambridge University Press, 1992.
Price C, Wise R, Ramsay S, Friston K, Howard D, Patterson K and Frackowiak R. Regional
response differences within the human auditory cortex when listening to words. Neuroscience Letters
146: 179-182, 1992.
Purdon PL and Weisskoff RM. Effect of temporal autocorrelation due to physiological noise and
stimulus paradigm on voxel-level false-positive rates in fMRI. Hum Brain Mapp 6: 239-249, 1998.
Rademacher J, Caviness VS, Jr., Steinmetz H and Galaburda AM. Topographical variation of
the human primary cortices: Implications for neuroimaging, brain mapping, and neurobiology. Cereb
Cortex 3: 313-329, 1993.
Rao SM, Bandettini PA, Binder JR, Bobholz JA, Hammeke TA, Stein EA and Hyde JS.
Relationship between finger movement rate and functional magnetic resonance signal change in
human primary motor cortex. J Cerebral Blood Flow and Metabolism 16: 1250-1254, 1996.
Ravicz ME, Melcher JR and Kiang NY-S. Acoustic noise during functional magnetic resonance
imaging. J Acoust Soc Am 108: 1683-1696, 2000.
Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional
magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and
body. J Acoust Soc Am 109: 216-231, 2001.
Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving
macaque monkeys. Hear Res 150: 104-118, 2000.
Rees G, Howseman A, Josephs O, Frith CD, Friston KJ, Frackowiak RSJ and Turner R.
Characterizing the relationship between BOLD contrast and regional cerebral blood flow
measurements by varying the stimulus presentation rate. Neuroimage 6: 270-278, 1997.
Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties
of human and macaque V5. Nat Neurosci 3: 716-723, 2000.
Reese TG, Davis TL and Weisskoff RM. Automated shimming at 1.5 T using echo-planar image
frequency maps. J Magn Reson Imaging 5: 739-745, 1995.
Reite M, Adams M, Simon J, Teale P, Sheeder J, Richardson D and Grabbe R. Auditory M100
component 1: Relationship to Heschl's gyri. Brain Res Cogn Brain Res 2: 13-20, 1994.
Ritter W, Vaughan HG, Jr. and Costa LD. Orienting and habituation to auditory stimuli: A study
of short term changes in average evoked responses. Electroencephalogr Clin Neurophysiol 25: 550-556, 1968.
Robson MD, Dorosz JL and Gore JC. Measurements of the temporal fMRI response of the human
auditory cortex to trains of tones [published erratum appears in Neuroimage 8: 228, 1998].
Neuroimage 7: 185-198, 1998.
Roth WT and Kopell BS. The auditory evoked response to repeated stimuli during a vigilance task.
Psychophysiology 6: 301-309, 1969.
Royer FL and Robin DA. On the perceived unitization of repetitive auditory patterns. Percept
Psychophys 39: 9-18, 1986.
Schlaug G, Sanes JN, Thangaraj V, Darby DG, Jancke L, Edelman RR and Warach S.
Cerebral activation covaries with movement rate. Neuroreport 7: 879-883, 1996.
Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of cat.
I. The anterior auditory field (AAF). Hear Res 21: 227-241, 1986.
Schreiner CE and Langner G. Coding of temporal patterns in the central auditory nervous system.
In: Auditory function: Neurobiological bases of hearing, edited by Edelman GM, Gall WE and
Cowan WM. New York: John Wiley and Sons, 1988, p. 337-361.
Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of the
cat. II. Comparison between cortical fields. Hear Res 32: 49-64, 1988.
Schreiner CE and Raggio MW. Neuronal responses in cat primary auditory cortex to electrical
cochlear stimulation. II. Repetition rate coding. J Neurophysiol 75: 1283-1300, 1996.
Sigalovsky I, Hawley ML, Harms MP and Melcher JR. Sound level representations in the human
auditory pathway investigated using fMRI. Neuroimage 13: S939, 2001.
Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE.
Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol
83: 537-551, 2000.
Sokoloff L. In: Basic neurochemistry, edited by Siegel G, Agranoff B, Albers RW and Molinoff P.
New York: Raven, 1989, p. 565-590.
Springer CS, Jr., Patlak CS, Palyka I and Huang W. Principles of susceptibility contrast-based
functional MRI: The sign of the functional MRI response. In: Functional MRI, edited by Moonen
CTW and Bandettini PA. Berlin: Springer, 1999, p. 91-102.
Spychala P, Rose DE and Grier JB. Comparison of the "on" and "off" characteristics of the
acoustically evoked response. International Audiology 8: 416-423, 1969.
Steinschneider M, Reser DH, Fishman YI, Schroeder CE and Arezzo JC. Click train encoding in
primary auditory cortex of the awake monkey: Evidence for two mechanisms subserving pitch
perception. J Acoust Soc Am 104: 2935-2955, 1998.
Suzuki T, Kobayashi K and Takagi N. Effects of stimulus repetition rate on slow and fast
components of auditory brain-stem responses. Electroencephalogr Clin Neurophysiol 65: 150-156,
1986.
Symmes D, Chapman LF and Halstead WC. The fusion of intermittent white noise. J Acoust Soc
Am 27: 470-473, 1955.
Takanashi M, Abe K, Yanagihara T, Oshiro Y, Watanabe Y, Tanaka H, Hirabuki N,
Nakamura H and Fujita N. Effects of stimulus presentation rate on the activity of primary
somatosensory cortex: A functional magnetic resonance imaging study in humans. Brain Res Bull
54: 125-129, 2001.
Talavage TM, Ledden PJ, Benson RR, Rosen BR and Melcher JR. Frequency-dependent
responses exhibited by multiple regions in human auditory cortex. Hear Res 150: 225-244, 2000.
Thomas CG and Menon RS. Amplitude response and stimulus presentation frequency response of
human primary visual cortex using BOLD EPI at 4 T. Magn Reson Med 40: 203-209, 1998.
Thornton ARD and Coleman MJ. The adaptation of cochlear and brainstem auditory evoked
potentials in humans. Electroencephalogr Clin Neurophysiol 39: 399-406, 1975.
Vazquez AL and Noll DC. Nonlinear aspects of the BOLD response in functional MRI.
Neuroimage 7: 108-118, 1998.
Wessinger CM, VanMeter J, Tian B, Van Lare J, Pekar J and Rauschecker JP. Hierarchical
organization of the human auditory cortex revealed by functional magnetic resonance imaging. J
Cogn Neurosci 13: 1-7, 2001.
Woodruff PWR, Benson RR, Bandettini PA, Kwong KK, Howard RJ, Talavage T, Belliveau J
and Rosen BR. Modulation of auditory and visual cortex by selective attention is modality-dependent. Neuroreport 7: 1909-1913, 1996.
Yang Y, Engelien A, Engelien W, Xu S, Stern E and Silbersweig DA. A silent event-related
functional MRI technique for brain activation studies without interference of scanner acoustic noise.
Magn Reson Med 43: 185-190, 2000.
Zhu XH, Kim SG, Andersen P, Ogawa S, Ugurbil K and Chen W. Simultaneous oxygenation
and perfusion imaging study of functional activity in primary visual cortex at different visual
stimulation frequency: Quantitative correlation between BOLD and CBF changes. Magn Reson Med
40: 703-711, 1998.
Chapter 3
Detection and quantification of a wide range of
fMRI temporal responses using a physiologically-motivated basis set
ABSTRACT
The temporal dynamics of fMRI responses can span a broad range, indicating a rich
underlying physiology, but also posing a significant challenge for detection. For instance, in human
auditory cortex, prolonged sound stimuli (~30 s) can evoke responses ranging from sustained to
highly phasic (i.e., characterized by prominent peaks just after sound onset and offset). In the
present study, we developed a method capable of detecting a wide variety of responses, while
simultaneously extracting information about individual response components, which may have
different physiological underpinnings. Specifically, we implemented the general linear model using
a novel set of basis functions chosen to reflect temporal features of cortical fMRI responses. This
physiologically-motivated basis set (the “OSORU” basis set) was tested against (1) the commonly
employed “sustained-only” basis “set” (i.e., a single smoothed “boxcar” function), and (2) a
sinusoidal basis set, which is capable of detecting a broad range of responses, but lacks a direct
relationship to individual response components. On data that included many different temporal
responses, the OSORU basis set performed far better overall than the sustained-only set, and as well
or better than the sinusoidal basis set. The OSORU basis set also proved effective in exploring brain
physiology. As an example, we demonstrate that the OSORU basis functions can be used to
spatially map the relative amount of transient vs. sustained activity within auditory cortex. The
OSORU basis set provides a powerful means for response detection and quantification that should be
broadly applicable to any brain system and to both human and non-human species.
INTRODUCTION
It is well recognized that sites of brain activation could escape detection with fMRI if the
temporal dynamics of activation and the a priori temporal assumptions of the detection technique are
poorly matched. Numerous approaches have been proposed attempting to minimize the assumptions
concerning activation dynamics in order to avoid the possibility of missed activation (e.g., Brammer
1998; Golay et al. 1998; Andersen et al. 1999). However, for the most part, these techniques have
remained more of theoretical interest than practical significance because there have not been striking
examples of their necessity.
The wide range of temporal fMRI responses recently found in human auditory cortex to
prolonged (e.g., 30 s) stimuli provides a clear illustration of the need for methods capable of detecting
an extensive range of response waveshapes.
Depending on the sound stimulus, the temporal
dynamics of auditory cortical activation can vary from the sustained waveshapes seen typically in
fMRI to atypical “phasic” waveshapes that include prominent peaks just after sound onset and offset
(e.g., see Figure 3-3). The demonstrated capacity of auditory cortex to show these variations in
fMRI waveshape raises the possibility of similarly dramatic, but as yet unidentified variations in
other cortical areas. It also raises the possibility of additional, as yet undetected modes of response
both within and outside the auditory system.
That some forms of activation can be easily missed is clearly illustrated using one of the
most statistically powerful, but also the most constrained of detection methods – cross-correlation
with an assumed response waveshape. For a commonly used cross-correlating function, a smoothed
boxcar, the cross-correlation approach performs well when activation is sustained, but poorly when it
is not. This point is well-illustrated by the poor detection of phasic responses in auditory cortex, as
compared to the good detection of sustained responses (e.g., Figure 3-3, bottom activation maps). In
cases like auditory cortex, where the extremes of response waveshape are quite different, a detection
technique with highly constrained assumptions concerning waveshape is bound to fail at one end of
the waveshape spectrum or the other. A method that allows for a wide range of temporal dynamics
is therefore essential if activation is to be detected reliably. The overall objective of the present
study was to develop such a method. We specifically sought an approach that would provide
information about the underlying waveshape of activation as an automatic by-product of detection.
Of the various approaches with the potential to detect a range of response waveshapes, we
identified the general linear model (GLM) as having, in principle, the characteristics needed to meet
our goals, and rejected several other alternatives because they did not satisfy our requirements. In
theory, a signal decomposition based on wavelet analysis may support the detection of responses
with dynamics that vary over different temporal scales (Brammer 1998; von Tscharner and Thulborn
2001). However, a wavelet transformation does not necessarily facilitate the extraction of directly
pertinent, interpretable information concerning the temporal dynamics of activation. Other possible
alternatives, such as autocorrelation analysis (Paradis et al. 1996), fuzzy clustering (Baumgartner et
al. 1998; Golay et al. 1998; Chuang et al. 1999; Fadili et al. 2000), and principal component analysis
(Sychra et al. 1994; Andersen et al. 1999) lack a well-defined statistic that supports inference on a
univariate, voxel-by-voxel basis. The GLM, on the other hand, does not suffer from this limitation
and can provide direct information concerning the temporal properties of activation for individual
voxels.
The basic idea behind the GLM is that the response can be modeled as a weighted sum of
“basis functions”. The amplitudes of the basis functions are estimated so as to give the best overall
fit to the measured response. As a by-product of detection under the GLM, the basis functions and
their corresponding amplitudes can provide direct information about the underlying temporal
responses, provided the basis functions are chosen to relate directly to specific features of the
waveshape of activation. Additional strengths of the GLM include: 1) the capability to handle the
correlated nature of fMRI time-series, 2) the existence of a well-defined, easily-computed statistic
(F-statistic) for estimating significance relative to the null-hypothesis, and 3) good statistical power
characteristics (Ardekani and Kanno 1998).
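As a concrete illustration of this framework (a minimal sketch, not the implementation used in this thesis), the fit of a voxel time-series to a weighted sum of basis functions, and the accompanying F-test against a constant-only null model, can be written in a few lines. Temporal autocorrelation, which a full GLM treatment would model, is ignored here for simplicity.

```python
# Minimal ordinary-least-squares GLM sketch: model a voxel's time-series as a
# constant plus a weighted sum of basis functions, and test the fit with an
# F-statistic against a constant-only null model. 'basis' can be any
# (n_timepoints x k) design matrix of basis functions.
import numpy as np

def glm_fit(y, basis):
    n, k = basis.shape
    X_full = np.column_stack([np.ones(n), basis])        # constant + basis functions
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    resid_full = y - X_full @ beta_full
    resid_null = y - y.mean()                            # null model: constant only
    rss_full = float(resid_full @ resid_full)
    rss_null = float(resid_null @ resid_null)
    df1, df2 = k, n - (k + 1)
    F = ((rss_null - rss_full) / df1) / (rss_full / df2)
    return beta_full[1:], F                              # basis-function amplitudes and F-statistic
```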
An important element of the present study was devising a flexible, yet concise, set of basis
functions for modeling a range of cortical responses within the GLM framework. The vast majority
of previous implementations of the GLM have used a single, “sustained” basis function (equivalent
to cross-correlation with a smoothed boxcar or, similarly, a t-test). Some studies have used a two-element basis set, supplementing the standard sustained function with its temporal derivative or a
component with an exponential decay (Friston et al. 2000; Giraud et al. 2000). However, this
implementation remains limited in terms of the range of response waveshapes that can be handled.
The opposite extreme is to use basis functions that lead to a direct estimate of all time-points of the
response (i.e., finite impulse response models), thereby allowing for complete flexibility in possible
response dynamics (and truly unbiased response estimation; Burock and Dale 2000; Miezin et al.
2000). However, such an approach will typically be ill advised for epoch-related studies with
prolonged stimulus presentation, since the direct estimation of many time-points will result in a
statistical test with drastically reduced power relative to a test based on a small, well-chosen basis
set.
An example of a small, well-chosen basis set is a series of sinusoids (i.e., a truncated Fourier
series; Friston et al. 1995b; Bullmore et al. 1996; Ardekani et al. 1999). The GLM using this basis
set is generally acknowledged to be powerful in terms of detection, and flexible in that it can handle
a wide range of temporal responses. In theory, this approach is capable of detecting any response
with frequency components in the range of the basis set. A downside, however, is that sinusoidal
basis functions do not necessarily have physiological meaning. Thus, while providing a powerful
means for detecting a variety of responses, a sinusoidal basis set does not meet our objective of
providing direct information about different response components.
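For reference, a truncated Fourier basis of this kind is straightforward to construct. The sketch below assumes a 2 s sampling interval, a 60 s on/off period, and four harmonics; these values are chosen purely for illustration and are not taken from the cited studies.

```python
# Illustrative truncated Fourier (sinusoidal) basis for one stimulus period.
import numpy as np

TR, period, n_harmonics = 2.0, 60.0, 4          # assumed sampling interval, on/off period, harmonics
t = np.arange(0.0, period, TR)                  # 30 timepoints per 60 s period
cols = []
for h in range(1, n_harmonics + 1):
    cols.append(np.sin(2 * np.pi * h * t / period))
    cols.append(np.cos(2 * np.pi * h * t / period))
sinusoidal_basis = np.column_stack(cols)        # shape (30, 8): 8 regressors
```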
The present study took a different approach from previous GLM work in that the choice of
basis functions was neurophysiologically motivated. Specifically, the form of each function was
chosen to mirror the general shape of a particular component in actual fMRI responses, the idea
being that certain components may indicate particular aspects of underlying neural activity (e.g., the
prominent peaks after stimulus onset and offset in the phasic responses of auditory cortex likely
reflect neural activity in response to stimulus onset and offset). Thus, the basis functions, together
with their amplitudes, should provide direct, readily interpretable information about the temporal
dynamics of responses, and hence the neural activity underlying them. A key difference between the
present approach and some previous physiologically-driven detection methods (Purdon et al. 2001) is
that our choice of basis functions was oriented toward understanding the neural activity behind fMRI
responses, rather than modeling the hemodynamics. Here, five basis functions were chosen. These
will be referred to as the “OSORU” basis set, a name that derives from descriptions of the individual
components (Onset, Sustained, Offset, Ramp, Undershoot; see Figure 3-1).
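The exact shapes of the five functions are given in Figure 3-1. Purely as a schematic stand-in, the sketch below builds five OSORU-like regressors for a 30 s "on" / 30 s "off" paradigm sampled every 2 s; the Gaussian transients, boxcar, ramp, and post-stimulus undershoot are assumptions intended only to convey the idea of one regressor per response component, not to reproduce the basis set used in this thesis.

```python
# Schematic OSORU-like basis set (shapes assumed, not taken from Figure 3-1).
import numpy as np

TR = 2.0
t = np.arange(0.0, 70.0, TR)               # 70 s window: 10 s pre, 30 s "on", 30 s "off" (35 timepoints)
on, off = 10.0, 40.0                       # stimulus onset and offset times (s)

def bump(center, width=4.0):
    """Smooth transient centered at 'center' seconds."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

onset      = bump(on + 4.0)                                        # transient shortly after sound onset
sustained  = ((t >= on) & (t < off)).astype(float)                 # boxcar over the stimulus epoch
offset     = bump(off + 4.0)                                       # transient shortly after sound offset
ramp       = np.clip((t - on) / (off - on), 0.0, 1.0) * sustained  # slow rise during the stimulus
undershoot = -bump(off + 12.0, width=6.0)                          # post-stimulus dip below baseline
osoru_basis = np.column_stack([onset, sustained, offset, ramp, undershoot])  # shape (35, 5)
```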
The present study examined the utility of the OSORU basis set in three ways. First, the
detection capability of the OSORU basis set was assessed by testing whether the extent of detected
activation was (1) greater than or equal to the extent obtained using one of the most common
detection methods, i.e., comparison with a (smoothed) boxcar reference waveform, and (2)
comparable to that obtained using a sinusoidal basis set – an alternative basis set generally
acknowledged to be powerful, yet flexible, in terms of response detection. Second, the ability of the
OSORU basis set to match different waveshape components was assessed by comparing the
amplitude of different OSORU basis functions (or combinations thereof) with direct measurements
(e.g., baseline to peak amplitudes) from the waveforms themselves.
Both the detection and
waveform matching tests were conducted using an extensive and challenging database composed of
auditory cortical responses to a variety of sounds. Finally, we examined the utility of the OSORU
basis set for extracting physiological information concerning responding brain areas. As an example,
we derived one particular measure from the OSORU basis functions that summarizes the degree to
which auditory cortex responds to sound in a transient vs. sustained manner. We show that the
relative amounts of transient and sustained activity can be captured in this measure, and can be
spatially mapped across cortical areas.
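The specific measure is derived later in the chapter. Purely as an illustration of the kind of summary that can be computed from fitted OSORU amplitudes, one hypothetical index is the fraction of transient (onset plus offset) amplitude relative to the total of transient and sustained amplitudes:

```python
# Hypothetical transient-vs-sustained index computed from fitted OSORU amplitudes;
# this is an illustrative construction, not necessarily the measure used in the thesis.
import numpy as np

def transience_index(amp_onset, amp_sustained, amp_offset):
    """Returns 0 for a purely sustained response, 1 for a purely transient one."""
    transient = max(amp_onset, 0.0) + max(amp_offset, 0.0)
    sustained = max(amp_sustained, 0.0)
    total = transient + sustained
    return np.nan if total == 0.0 else transient / total
```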
Overall, the GLM method using the OSORU basis set provided reliable detection and
straightforward characterization of the temporal dynamics of auditory cortical activation. Although
developed and tested on the human auditory system, this approach should also be applicable, both in
concept and detail, to other brain systems and species.
METHODS
fMRI data
The data used in this paper are pooled across experiments that examined the representation
of sound in the fMRI responses of the human auditory system. The overall database comes from 25
subjects and 39 total imaging sessions. All studies were approved by the institutional committees on
the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear
Infirmary, and Massachusetts General Hospital, and all subjects gave their written informed consent.
Stimuli were trains of broadband noise bursts with various rates (1, 2, 10, and 35/s) and
duty-cycles (5, 25, 50, and 88%), trains of narrowband noise bursts (third octave bandwidth, center
frequency 500 Hz or 4 kHz; rate = 2 or 35/s), trains of tone bursts (500 Hz or 4 kHz, rate = 2 or
35/s), continuous broadband noise, trains of clicks (35 or 100/s), orchestral music, and running
speech. This repertoire of stimuli produced a wide range of response waveshapes and thus provided
a challenging database for developing and testing the detection and quantification approach of the
present study. Three to six stimuli were included in each imaging session, giving a database with a
total of 177 cases for which we constructed activation maps and response estimates.
Stimuli were always presented for 30 s “on” epochs alternated with 30 s “off” epochs, during
which no auditory stimulus was presented. The on/off “period” corresponding to a given stimulus
was repeated 4 – 13 times during an imaging session. The actual functional imaging was organized
into individual “runs” consisting of 4 – 5 on/off periods. Typically, the different stimuli in a session
were presented once each run, and their order was varied across runs. However, in 13 sessions the
same stimulus was used repeatedly throughout a given run, as was also the case for all cases of the
music stimulus. Stimuli were delivered binaurally through a headphone assembly that provided
approximately 30 dB of attenuation at the primary frequency of the scanner noise (Ravicz and
Melcher 2001). In 26 of the sessions, stimulus levels were set to approximately 55 dB above
threshold (SL); in the remaining 13 sessions stimulus level was varied between 35 and 75 dB SL.
Imaging was performed on five different systems: 1.5 and 3.0 T General Electric magnets
retrofitted for high-speed, echo-planar imaging (by Advanced NMR Systems, Inc.), a 1.5 T General
Electric Signa Horizon magnet, and 1.5 and 3.0 T Siemens System Sonata and Allegra magnets. For
functional imaging, the selected slice intersected the inferior colliculus and the posterior aspect of
Heschl's gyrus and the superior temporal gyrus (which include auditory cortical areas). A single
slice, rather than multiple slices, was imaged to reduce the impact of scanner-generated acoustic
noise on auditory activation without sacrificing temporal resolution. Slice thickness was always 7
mm with an in-plane resolution of 3.1 x 3.1 mm. Functional images of the selected slice were
acquired using a blood oxygenation level dependent (BOLD) sequence. For the 1.5 T experiments,
the sequence parameters were: asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°. For the
3.0 T experiments the parameters were: gradient echo, TE = 30 ms (except one session used 40
ms and another used 50 ms), flip = 60° or 90°. A T1-weighted anatomical image (in-plane resolution
= 1.6 x 1.6 mm, thickness = 7 mm) of the functionally imaged slice was also obtained and used to
localize auditory cortex.
The present study focuses on responses from auditory cortex, although the data comes from
studies that also examined the inferior colliculus. Therefore, functional images were generally
collected using a cardiac gating method that increases the detectability of activation in the inferior
colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS
complex in the subject's electrocardiogram, resulting in an average interimage interval (TR) of
approximately 2 s. Image signal was corrected to account for the image-to-image variations in signal
strength (i.e., T1 effects) that result due to fluctuations in heart rate (Guimaraes et al. 1998). In the 3
sessions that did not use cardiac gating, the TR was 2 s.
Prior to response estimation using the general linear model, two additional image preprocessing steps were performed. First, the images for each scanning run were corrected for any in-plane movements of the head that may have occurred over the course of the imaging session using
standard software (SPM95 software package; without spin history correction; Friston et al. 1995a;
Friston et al. 1996). Then, because cardiac gating results in irregular temporal sampling, the time-series for each imaging "run" and voxel was linearly interpolated to a consistent 2 s interval between
images, using recorded interimage intervals to reconstruct where each image occurred in time.
While it would be relatively straightforward to create basis functions sampled at the actual times at
which images were acquired, there is no straightforward way to rigorously model and estimate the
noise covariance given irregular temporal sampling. Therefore, we interpolate the time-series for
each voxel and treat the data as if it were originally sampled with a regular (2 s) interval.
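As an illustration of this resampling step, a minimal sketch (written in Python/NumPy rather than the Matlab code actually used, with hypothetical variable names) might look like the following:

import numpy as np

def resample_to_regular_grid(signal, acq_times, tr=2.0):
    # Linearly interpolate a cardiac-gated (irregularly sampled) voxel
    # time-series onto a regular grid with a fixed 2 s inter-image interval.
    regular_times = np.arange(acq_times[0], acq_times[-1] + tr / 2, tr)
    return regular_times, np.interp(regular_times, acq_times, signal)

# Example: inter-image intervals fluctuate around 2 s with heart rate
rng = np.random.default_rng(0)
times = np.cumsum(2.0 + 0.1 * rng.standard_normal(150))   # reconstructed image times (s)
raw = rng.standard_normal(times.size)                      # one voxel's raw time-series
t_reg, y_reg = resample_to_regular_grid(raw, times)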
Independent of the GLM implementation, we computed empirical response time courses by
averaging across repeated presentations of a given stimulus in a given imaging session. Specifically,
the time-series for each imaging run and voxel was corrected for linear or quadratic drifts in signal,
normalized to an (arbitrary) mean intensity, and temporally smoothed using a three point, zero-phase
filter (with coefficients 0.25, 0.5, 0.25).1 A response “block” was defined as a 70 s window (35
images) that included 10 s prior to stimulus onset, the 30 s coinciding with the stimulus “on” period,
and the 30 s “off” period following the stimulus. These response blocks were averaged according to
stimulus to give an average signal vs. time waveform for a given stimulus in a session. For each
stimulus and session, we further averaged signal vs. time across “active” voxels (i.e., all voxels in
auditory cortex with p < 0.001 according to the OSORU GLM). The resulting “grand-average”
waveforms were then converted to percent change in signal relative to baseline. The baseline was
defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the
stimulus.

1 Technically, the drift correction and normalization were performed prior to interpolating the time series to a consistent 2 s inter-image interval.
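A minimal sketch of this empirical averaging, again in Python with illustrative names, and assuming drift correction and normalization have already been applied:

import numpy as np

def grand_average_response(run_series, onset_indices, tr=2.0):
    # Three-point, zero-phase smoothing (coefficients 0.25, 0.5, 0.25)
    kernel = np.array([0.25, 0.5, 0.25])
    smoothed = np.convolve(run_series, kernel, mode="same")

    # 70 s response block: 10 s pre-onset, 30 s "on", 30 s "off" (35 images)
    pre, block_len = int(10 / tr), int(70 / tr)
    blocks = [smoothed[i - pre : i - pre + block_len] for i in onset_indices]
    avg = np.mean(blocks, axis=0)

    # Convert to percent change relative to the t = -6 to 0 s baseline
    baseline = avg[pre - int(6 / tr) : pre].mean()
    return 100.0 * (avg - baseline) / baseline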
Basis functions
OSORU basis set – The OSORU basis set was composed of five physiologically-motivated components, shown in Figure 3-1. The design of these components was guided by a small subset of
the responses in the overall database (i.e., ~20 out of 177). The components include: 1) A transient
response to the onset of the stimulus. The time course of this component peaks at 6 s post-stimulus
onset and returns to near baseline by 14-16 s; 2) A sustained response during the stimulus. This
component represents the convolution (normalized to unit amplitude) of the onset component with a
boxcar waveform.2 The beginning of the sustained component was designed to overlap the onset
component, so that the onset component would primarily reflect a transient response that was above
and beyond any sustained response; 3) A slowly changing (approximately linear) response during
the stimulus, included because responses in auditory cortex sometimes exhibit a “signal recovery”
following the signal decline that characterizes the onset component.
This “ramp” component
represents the convolution of the onset component with a ramp that increases linearly from t = 8 to
30 s; 4) A transient “offset” response to stimulus termination, which is a time-shifted version of the
onset component. Offset responses occur frequently in our auditory database. Yet, the possibility of
offset responses has generally not been factored into fMRI response detection; 5) An additional
transient component (with a simple Gaussian waveshape) following the offset component was
included primarily to help model responses with a post-stimulus undershoot. Like the other basis
functions, this “undershoot” function was defined as a positive deviation from baseline (Figure 3-1).
It was therefore expected to have a negative amplitude for responses with a clear undershoot (i.e., a
negative deviation from baseline). Altogether, these five components define the OSORU basis set
(Onset, Sustained, Offset, Ramp, Undershoot). The five components of the OSORU set form a
concise, yet flexible, basis set that is capable of naturally modeling the prominent features of a
variety of response waveshapes.
2 Technically, the amplitude of the boxcar waveform for the first sample (t = 0 s) was twice that of the remainder of the 30 s "on" period, in order to impart a more rapid signal increase to the sustained component.
Figure 3-1: The five physiologically-motivated functions of the OSORU basis set (Onset, Sustained, Offset, Ramp, Undershoot), plotted as amplitude vs. time (sec). The shaded area indicates the 30 s period of sound stimulation.
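For concreteness, the five components might be constructed roughly as follows. This is a sketch only: the gamma-variate expression assumed for the onset function (peaking near 6 s), the undershoot timing, and all variable names are illustrative stand-ins, since the text specifies the components' qualitative form rather than closed-form expressions.

import numpy as np

TR = 2.0                       # sampling interval (s)
t = np.arange(0, 60 + TR, TR)  # one 60 s on/off period

def normalize(x):
    return x / np.abs(x).max()

# 1) Onset: transient peaking ~6 s after stimulus onset (gamma-variate guess)
onset = normalize((t / 6.0) ** 3 * np.exp(3.0 * (1.0 - t / 6.0)))

# 2) Sustained: onset convolved with a 30 s boxcar (first sample doubled to
#    speed the initial rise), normalized to unit amplitude
boxcar = np.zeros_like(t)
boxcar[t <= 30] = 1.0
boxcar[0] = 2.0
sustained = normalize(np.convolve(onset, boxcar)[: t.size])

# 3) Ramp: onset convolved with a ramp rising linearly from t = 8 to 30 s
ramp_in = np.clip((t - 8.0) / 22.0, 0.0, None) * (t <= 30)
ramp = normalize(np.convolve(onset, ramp_in)[: t.size])

# 4) Offset: the onset function time-shifted to stimulus termination (t = 30 s)
offset = np.roll(onset, int(30 / TR))
offset[: int(30 / TR)] = 0.0

# 5) Undershoot: a Gaussian following the offset component (positive-going, so
#    a post-stimulus undershoot shows up as a negative fitted amplitude)
undershoot = np.exp(-0.5 * ((t - 46.0) / 4.0) ** 2)

osoru_basis = np.column_stack([onset, sustained, offset, ramp, undershoot])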
Sustained-only basis "set" – For one comparison to the OSORU basis set, we used a basis
“set” that consisted of just a single component, namely, the sustained component of the OSORU
basis set. We included this comparison since methods assuming a sustained response (whether a raw
“boxcar” waveform, e.g. a t-test, or a “hemodynamically smoothed” version thereof) have
historically played a prominent role in the analysis of fMRI data acquired in an “epoch-related”
paradigm. Note that computing activation maps from the sustained-only basis set is not the same as
computing maps from just the magnitude of the sustained component in the OSORU basis set. This
is because the basis functions in the OSORU set are correlated (correlation coefficients: –0.5 to 0.6).
Consequently, the estimated amplitudes for a given component are dependent on precisely which
other components are included in the complete model (Draper and Smith 1981; Andrade et al. 1999).
Sinusoidal basis set – For the second comparison to the OSORU basis set, we examined a
sinusoidal basis set that consisted of the 1st through 4th harmonics of a truncated Fourier series
(including both sine and cosine terms). The 1st harmonics had a fundamental frequency defined by
the “on/off” stimulation period (i.e., 1/60 s). Previous studies have used just the 1st-3rd harmonics,
under the assumption that these harmonics were sufficient for modeling the response space
(Bullmore et al. 1996; Ardekani et al. 1999). Here, we added a higher-order harmonic because
preliminary work indicated it was needed to ensure optimal detection of phasic responses. In
particular, we simulated an artificial data set consisting of a phasic response in Gaussian white noise,
and examined the performance of sinusoidal basis sets in which the maximum harmonic varied from
one to eight (including both the sine and cosine terms for all harmonics up to and including the
maximum). Receiver-operator-characteristic curves (Constable et al. 1995; Skudlarski et al. 1999)
showed that the basis set with the 1st-4th harmonics was most powerful in detecting the phasic
response.
The important contribution of the 4th harmonics is not surprising given that these
harmonics together account for about 25% of the variance in a signal with a phasic waveshape,
second only to the 2nd harmonics, which together account for 45% of the signal variance. (Because
sinusoids are orthogonal, the “explained signal variance” can be uniquely partitioned among the
individual components).
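A sketch of the corresponding design-matrix columns, assuming the 2 s sampling interval and 60 s on/off period used here (names are illustrative):

import numpy as np

TR, period = 2.0, 60.0             # sampling interval and on/off period (s)
t = np.arange(0, period, TR)
f0 = 1.0 / period                  # fundamental frequency (1/60 Hz)

# Sine and cosine terms for the 1st through 4th harmonics
harmonics = np.arange(1, 5)
sines = np.sin(2 * np.pi * f0 * np.outer(t, harmonics))
cosines = np.cos(2 * np.pi * f0 * np.outer(t, harmonics))
sinusoidal_basis = np.hstack([sines, cosines])   # shape: (30 samples, 8 columns)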
“Trend” functions – In implementing the GLM, for each of the three basis sets (i.e.,
OSORU, sustained-only, sinusoidal), we included three additional functions in the design matrix for
each imaging run to handle low-frequency “trends” in the signal vs. time. These included the
standard column vector of ones (for estimating the mean over the run), and both a linear and
quadratic vector for estimating any signal drift during the course of the run. Although these three
vectors had non-zero amplitudes, their estimated amplitudes were ignored in the creation of the
activation maps. Note that we assumed response waveshape and magnitude were constant across
repeated presentations of a given stimulus, so none of the implementations of the GLM incorporated
a progressive “habituation” in the amplitude of the basis functions across stimulus repetitions.
Response and noise estimation under the general linear model
The GLM was implemented separately for each of the three basis sets: OSORU, sustained-only, and sinusoidal.3
It is well documented that the noise in typical fMRI time-series is
autocorrelated (i.e., is not white; Bullmore et al. 1996; Locascio et al. 1997; Zarahn et al. 1997;
Purdon and Weisskoff 1998), typically resulting in an inflation of the false-positive rate above its
theoretically predicted value.
Therefore, in implementing the GLM, an estimate of the noise
autocorrelation was used to pre-whiten the data.4
Under the general linear model (e.g., Friston et al. 1995c; Burock and Dale 2000), the time-series y for a single voxel is modeled as a weighted linear sum of basis functions plus a noise term: y = Xβ + η, where the columns of the design matrix X contain the basis functions used to model the response space, β is a vector of basis function amplitudes to be estimated, and η is a noise sequence with arbitrary covariance matrix Ω. If the covariance Ω is known, the generalized least squares (GLS) estimator (which is also the maximum likelihood estimator if the noise is multivariate Gaussian) of β is $\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y$. This estimator is the optimally efficient estimator, meaning that it has the smallest possible variance, among the class of linear, unbiased estimators of β. In our situation, the noise covariance Ω is unknown, but can be estimated (described below). Given such an estimate Ω̂, the feasible generalized least squares (FGLS) estimator is obtained by simply replacing Ω with Ω̂, i.e., $\hat{\beta}_{FGLS} = (X^T \hat{\Omega}^{-1} X)^{-1} X^T \hat{\Omega}^{-1} y$.
3 For the present study, the GLM was implemented by programs originally developed by Doug Greve and colleagues (at the Martinos Imaging Center), but which we modified for our purposes. It should be possible to incorporate the OSORU and sinusoidal basis sets within any package that implements the GLM (the "sustained-only" set or an equivalent is a standard part of GLM packages). Note that some packages (e.g., Statistical Parametric Mapping) automatically orthonormalize their basis sets, which would have to be disabled in order to preserve the physiological meaning of the OSORU functions.

4 An alternative would be to "pre-color" the data by replacing the unknown endogenous autocorrelation with an exogenous autocorrelation of known form by applying a smoothing filter. This approach is theoretically less efficient than pre-whitening, but also less prone to invalid statistical inference if the noise is incorrectly estimated (Friston et al. 2000; Bullmore et al. 2001).
Under fairly general conditions, β̂_FGLS will have desirable asymptotic properties. Generally, if Ω̂ is a consistent estimator of the noise, then β̂_FGLS will be an asymptotically efficient estimator of β (Fomby et al. 1984). Depending on the mismatch between the estimated and actual autocorrelations, β̂_FGLS will have nearly minimum variance.
To estimate the noise covariance we adopt an iterative approach (Bullmore et al. 1996; Burock and Dale 2000) in which we first compute the ordinary least squares (OLS) estimate of β, $\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$. The underlying noise structure is then estimated from the residual error given by $e = y - X\hat{\beta}_{OLS}$. In its most general form, Ω̂ has more free parameters than the number of observations available in the residual error vector e. Therefore, to make the estimation of Ω tractable, it is necessary to make some assumptions about the covariance structure of the noise. In particular, we assume: 1) The noise is temporally stationary within an imaging run, meaning that a given diagonal in Ω has a constant value. 2) The noise structure does not vary over a given imaging session, so a given Ω applies to all imaging runs in a session. 3) Diagonals beyond d_max in Ω are zero, meaning that any noise correlation effectively decays to zero for lags greater than a certain value. To ensure that potential long-term correlations are captured, we use a generous maximum lag of 30 s, yielding d_max = 15. 4) The noise is assumed to be locally constant across auditory cortex, within a scale factor – i.e., we assume $\hat{\Omega} = \hat{\sigma}^2 \hat{\Lambda}$, where Λ̂ is the "normalized" covariance matrix (with ones on the diagonal) that is constant across voxels, and σ̂² is the scalar variance that is
estimated separately for each voxel. Since the noise covariance can vary across the brain (Bullmore
et al. 1996; Friston et al. 2000), we limited the noise estimation to voxels from auditory cortex, in
order to avoid using voxels that were altogether unrelated to the structure of interest. Auditory
cortex was defined (from anatomical images) as all cortex lateral to the medial-most aspect of the
Sylvian fissure, between the superior temporal sulcus and the inferior edge of the parietal lobe.
Our approach for estimating the noise differs from some previous papers in that we use the
estimated autocorrelation out to a generous lag (30 s; i.e., 15 samples) to directly form the estimate
of the normalized covariance matrix Λ̂, rather than assuming a potentially limiting parametric form
for the autocorrelation, such as an AR(1) or AR(3) model (Bullmore et al. 1996; Bullmore et al.
2001), AR(1) plus white noise (Burock and Dale 2000; Purdon et al. 2001), or a 1/f model (Zarahn et
al. 1997). Theoretically, an AR(p) model can exactly match any given autocorrelation function out
to lag p (e.g., when the AR(p) parameters are estimated using the “Yule-Walker” equations; Percival
and Walden, 1993). Consequently, given a sufficiently rapid decay in the autocorrelation (which is
supported by Figure 3-2), our approach is essentially equivalent to the AR(16) approach that Friston
et al. (2000) used as their “gold standard” for estimating the noise autocorrelation.
The actual empirical noise autocorrelation functions (ACFs), estimated under GLM using
the OSORU basis set, are plotted in Figure 3-2. There was a clear trend for greater autocorrelation in
the sessions conducted with a 3 T magnet, compared to those at 1.5 T. Furthermore, for extended
lags there was a small, but consistent, bias toward negative autocorrelations, which was again more
pronounced for the sessions conducted at 3 T. This negative autocorrelation could reflect genuine
long-term correlations in the noise process, although it seems more likely that it reflects a small bias
in our procedure for estimating the noise. The ACFs estimated under the sinusoidal and sustained-only basis sets were very similar to the ACFs under the OSORU set – the largest absolute difference in the ACFs (across all sessions and lag values) was 0.04 between the OSORU and
sinusoidal basis sets, and 0.07 between the OSORU and sustained-only sets.
Examination of residuals
To investigate the validity of our assumptions about the structure of the noise covariance, we
examined the residuals from the GLM. In particular, we computed the Ljung-Box diagnostic (Ljung
and Box 1978), which tests whether the residuals are consistent with white noise. The Ljung-Box
diagnostic, Q_K, is given by

$$Q_K = N(N+2) \sum_{k=1}^{K} \frac{r_k^2}{N-k}$$

where r_k is the autocorrelation function computed from the "whitened" residuals $e^* = \hat{\Lambda}^{-1/2}(y - X\hat{\beta}_{FGLS})$, and N is the length of the residual vector.
Figure 3-2: Estimated autocorrelation functions (ACFs) of the noise in auditory cortex. The main panel plots the ACFs from lags 1 – 16 for each session (x's for sessions using a 1.5 T magnet, n = 27; o's for 3.0 T, n = 12). The autocorrelation for lags 16 and greater was assumed to be zero. The inset plots the ACFs averaged across sessions, according to field strength.
Under the null hypothesis that Λ̂ is estimated correctly, the whitened residuals are serially independent, and Q_K is distributed as χ² with K degrees of freedom.5 We chose K = 10, and computed Q_10 on a voxel-by-voxel and run-by-run basis. To summarize the results, we computed, for each run, the percentage of voxels in auditory
cortex that failed the Ljung-Box test at a p-value of 0.01, and then averaged these percentages across
runs to form a single summary measure for each experimental session (PercentLB).
Larger
percentages indicate that the residuals of voxels in auditory cortex were more frequently inconsistent
with the null hypothesis of white noise, indicating that the nominal p-values computed from β̂_FGLS
are less likely to reflect the true false-positive rate. For comparison, we also computed PercentLB
based on the ordinary least square residuals used originally to estimate the noise covariance.
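A sketch of the Ljung-Box computation for a single voxel and run, assuming the residuals have already been whitened by Λ̂^(-1/2); the use of SciPy's χ² survival function and all names are illustrative:

import numpy as np
from scipy.stats import chi2

def ljung_box(whitened_residuals, K=10):
    # Ljung-Box statistic Q_K and its p-value for one voxel and one run
    e = whitened_residuals - whitened_residuals.mean()
    N = e.size
    denom = np.dot(e, e)
    r = np.array([np.dot(e[:-k], e[k:]) / denom for k in range(1, K + 1)])
    Q = N * (N + 2) * np.sum(r ** 2 / (N - np.arange(1, K + 1)))
    return Q, chi2.sf(Q, df=K)

def percent_lb(whitened_residuals_by_voxel, alpha=0.01, K=10):
    # Percentage of auditory-cortex voxels failing the test at p < alpha in one run
    fails = [ljung_box(e, K)[1] < alpha for e in whitened_residuals_by_voxel]
    return 100.0 * np.mean(fails)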
Estimating and accounting for the noise covariance resulted in a dramatic decrease in the
number of voxels whose residuals failed to exhibit serial independence. Prior to noise estimation
(i.e., using the OLS residuals), PercentLB, computed under the OSORU basis set, averaged 40% for
the sessions conducted with a 1.5 T magnet (n=27) and 76% for the sessions with a 3 T magnet
(n=12). After noise estimation, PercentLB was reduced to 6% and 19%, respectively. Under the
sustained-only and sinusoidal basis sets, PercentLB after noise estimation was within a percentage
point of the values obtained for the OSORU basis set. The fact that PercentLB remained somewhat
elevated for the 3 T sessions indicates that our assumptions regarding the structure of the noise were
violated more frequently at the higher field strength (e.g., Bullmore et al. 2001). At 1.5 T, the
autocorrelations were generally lower, so a “floor effect” may mask some of the noise variations
across space (i.e., voxels) and time (e.g., across imaging runs), whereas the higher autocorrelations
seen at 3 T allow for greater spatial and temporal heterogeneity. While PercentLB indicates that
statistical inference will not be fully accurate in all voxels, we proceed to use all the voxels and their
associated p-values (under the FGLS approach), given the dramatic improvement in accuracy in a
large percentage of voxels.
5 In effect, less than 1 degree of freedom was used in estimating the noise at a given voxel, since the 15 parameters required for Λ̂ were estimated using all the voxels in auditory cortex. Therefore, we used K as the degrees of freedom for the Ljung-Box test.
Practical implementation
In actually computing β̂_FGLS and Λ̂ we utilized the fact that the data were acquired in separate imaging runs, in order to avoid working with unnecessarily large data matrices. Specifically, we computed

$$\hat{\beta}_{OLS} = \left( \sum_{r=1}^{N_r} X_r^T X_r \right)^{-1} \left( \sum_{r=1}^{N_r} X_r^T y_r \right)$$

where X_r is the design matrix for run r, y_r is the fMRI time-series from run r (pre-processed as specified above), and N_r is the number of runs. Using the residual errors for each run $e_r = y_r - X_r \hat{\beta}_{OLS}$, we estimated the noise autocorrelation (up to lag k = 15) as $\rho_k = \delta_k / \delta_0$, where

$$\delta_k = \frac{1}{N_{total}} \sum_{\text{runs}} \; \sum_{\text{AC voxels}} \; \sum_{t=1}^{T-k} e_{r,t}\, e_{r,t+k} .$$

N_total gives the total number of terms in the above triple summation, and e_{r,t} is the tth OLS residual from run r. Given ρ_k, we constructed Λ̂ = Toeplitz(ρ_k), where the Toeplitz operator returns a matrix that is symmetrical about the leading diagonal. Finally, we estimated

$$\hat{\beta}_{FGLS} = \left( \sum_{r=1}^{N_r} X_r^T \hat{\Lambda}^{-1} X_r \right)^{-1} \left( \sum_{r=1}^{N_r} X_r^T \hat{\Lambda}^{-1} y_r \right) .$$

The scalar variance at each voxel was estimated from the whitened residual error $e_{r*} = \hat{\Lambda}^{-1/2}(y_r - X_r \hat{\beta}_{FGLS})$ as

$$\hat{\sigma}^2 = \frac{\sum_{r=1}^{N_r} e_{r*}^T e_{r*}}{N_{dof}} ,$$
where Ndof is the total number of time points (for a given voxel) summed across all runs minus the
total number of estimated parameters (i.e., the number of columns in the design matrix X).
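The run-wise accumulation described above could be sketched as follows (Python rather than the original Matlab implementation; equal run lengths and illustrative names are assumed):

import numpy as np
from scipy.linalg import toeplitz

def fgls_per_voxel(X_runs, Y_runs, max_lag=15):
    # X_runs: list of (T x P) design matrices, one per run
    # Y_runs: list of (T x V) data matrices (time x auditory-cortex voxels)

    # OLS estimate pooled over runs
    A = sum(X.T @ X for X in X_runs)
    b = sum(X.T @ Y for X, Y in zip(X_runs, Y_runs))
    beta_ols = np.linalg.solve(A, b)

    # Residual autocorrelation pooled over runs and voxels (lags 0..max_lag)
    resid = [Y - X @ beta_ols for X, Y in zip(X_runs, Y_runs)]
    delta = np.zeros(max_lag + 1)
    count = np.zeros(max_lag + 1)
    for e in resid:
        T = e.shape[0]
        for k in range(max_lag + 1):
            delta[k] += np.sum(e[: T - k] * e[k:])
            count[k] += (T - k) * e.shape[1]
    rho = (delta / count) / (delta[0] / count[0])

    # Normalized covariance: Toeplitz matrix built from rho, zero beyond max_lag
    acf = np.zeros(X_runs[0].shape[0])
    acf[: max_lag + 1] = rho
    Lam_inv = np.linalg.inv(toeplitz(acf))

    # FGLS estimate pooled over runs
    A_w = sum(X.T @ Lam_inv @ X for X in X_runs)
    b_w = sum(X.T @ Lam_inv @ Y for X, Y in zip(X_runs, Y_runs))
    beta_fgls = np.linalg.solve(A_w, b_w)
    return beta_ols, beta_fgls, Lam_inv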
Activation map formation
For each stimulus in a given imaging session, we created an “omnibus” map that tested
against the null hypothesis that none of the estimated basis function amplitudes (for a given basis set)
were significantly different from zero. To generate these activation maps, statistical inference was
drawn from the estimated amplitudes β̂_FGLS using the generalized hypothesis test. Specifically, we test against linear hypotheses of the form $H_o : R\beta = q$, where R is a restriction (or contrast) matrix, and q is a deterministic vector equal to an appropriately sized column vector of zeros for our analyses. Since β̂_FGLS is asymptotically distributed as a multivariate Gaussian if the noise autocorrelation is estimated consistently, an F-statistic can be used for generalized inference (Fomby et al. 1984). Specifically,

$$F[J, N_{dof}] = \frac{(R\hat{\beta}_{FGLS} - q)^T \left[ \hat{\sigma}^2 R (X^T \hat{\Lambda}^{-1} X)^{-1} R^T \right]^{-1} (R\hat{\beta}_{FGLS} - q)}{J} ,$$
J
where J is the number of rows in R and Ndof is the total number of time points (for a given voxel)
summed across all runs minus the total number of estimated parameters. For the jth stimulus out of
Nstim total stimuli in a session, R = [a1 a2 … aNstim], where ai=j = I (the identity matrix) and ai≠j = 0.
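A sketch of the omnibus F computation at a single voxel, following the expression above; the p-value comes from SciPy's F survival function, and the names are illustrative:

import numpy as np
from scipy.stats import f as f_dist

def omnibus_f(beta_fgls, R, X, Lam_inv, sigma2, n_dof):
    # F-statistic for H_o: R beta = 0 at one voxel (q is a vector of zeros)
    contrast = R @ beta_fgls
    cov = sigma2 * R @ np.linalg.inv(X.T @ Lam_inv @ X) @ R.T
    J = R.shape[0]
    F = contrast.T @ np.linalg.solve(cov, contrast) / J
    return F, f_dist.sf(F, J, n_dof)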
We compare the relative performance of the OSORU, sustained-only, and sinusoidal basis
sets by comparing the “active” voxels at two p-value thresholds (p = 0.001, 0.05 for individual
voxels; not corrected for multiple comparisons). For such comparisons to be informative, it is
important that the statistical inference be equally valid for the GLM based on each of the three basis
sets. We know that the nominal p-values do not strictly match their theoretical values for any of the
basis sets. For instance, the sustained-only set will result in biased estimation in cases with phasic
responses. Also, the Ljung-Box diagnostic indicated that our assumed structure for the noise was not
accurate at all voxels (albeit a small minority). Importantly however, the Ljung-Box diagnostic did
have a similar failure rate under all three basis sets (i.e., similar PercentLB), implying that the p-values produced by the three sets are valid to roughly equal degrees. Note that our comparison of
activation maps quantifies the ability of the three basis sets to detect a response in the practical
situation in which the actual response and the noise are unknown, and consequently a ROC-type
analysis (receiver-operator-characteristic) of the true statistical power cannot be conducted.
(Although a family of possible ROC curves could be derived given assumptions about the type II
error rate for missed activation; e.g., Bullmore et al. 1996).
Waveshape index
Using the OSORU basis functions, we devised a “waveshape index” capable of broadly
distinguishing between sustained and phasic responses. The basic idea was to compare the total
amount of “transient activity” (defined as the sum of the onset and offset amplitudes) to the activity
at the midpoint of the response. Secondarily, the waveshape index was designed to distinguish
responses with transient activity that was primarily limited to either the onset or offset component
from responses with onset and offset components that approached each other in magnitude. The
exact formulation of the waveshape index was chosen so as to yield a robust measure that stayed
within a finite range. Reasonable behavior for the waveshape index was confirmed by examining
how well the index qualitatively sorted the waveforms of the present study (see Results).
Specifically, the waveshape index was defined as:

$$\text{waveshape index} = \frac{1}{2}\left( \frac{On + Off}{Mid + \max(On, Off)} \right) \;\in\; [0, 1] \qquad (1)$$

where On and Off represent the magnitudes of the onset and offset components of the OSORU basis set (as estimated under the GLM), and Mid is defined as the sum of the magnitude of the sustained component plus one-half of the ramp component.6
Using this definition, maximum values for the waveshape index (i.e., near 1) can only result
if the two transient components are similar in magnitude and are large relative to the midpoint
response.
Values near one-half can reflect a response consisting of solely an onset or offset
response, or alternatively a combination of onset and offset activity in a response also having some
midpoint activity. Values near zero reflect a response dominated by the midpoint response (i.e., by the sustained and/or ramp components of the OSORU basis set).

6 Technically, On, Off, and Mid were all rectified prior to their use in Eq. (1) (i.e., negative values were converted to zero). Consequently, if On, Off, and Mid were all negative (or zero) prior to rectification, the denominator of Eq. (1) would be zero, and hence the waveshape index would be undefined. In practice, this situation never occurred when the magnitudes of the basis functions were first averaged across active voxels in auditory cortex (i.e., the "summary waveshape index"). Nor did this situation occur for any of the individual voxels active in left auditory cortex for the spatial maps of waveshape index shown in Figure 3-6.
Eq. (1) was utilized in two different ways in the present study. At the level of individual
voxels, Eq. (1) was used to create spatial “maps” of the waveshape index for a given stimulus in a
session (Figure 3-6). We also calculated a summary waveshape index for each stimulus of each
session using the magnitudes of the basis functions averaged across all the active voxels (p < 0.001)
in auditory cortex. Despite the non-linearity of the waveshape index calculation, the resulting
summary waveshape indices were quite similar to a second possible summary measure, defined as
the mean of the waveshape indices computed separately for each of the active voxels (R² = 0.97 in a
linear regression comparing the two alternatives).
Thus, the summary waveshape index was
insensitive to the precise method of summarizing across voxels.
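A sketch of Eq. (1), including the rectification described in the footnote (illustrative names; whether Mid is rectified before or after forming the sum is an assumption here):

import numpy as np

def waveshape_index(onset, sustained, offset, ramp):
    # Waveshape index of Eq. (1) from voxel- or ROI-averaged OSORU amplitudes
    on, off = max(onset, 0.0), max(offset, 0.0)      # rectify negative values
    mid = max(sustained + 0.5 * ramp, 0.0)           # response near the midpoint
    denom = mid + max(on, off)
    return np.nan if denom == 0 else 0.5 * (on + off) / denom

# Example: a strongly phasic response (large onset/offset, little midpoint activity)
print(waveshape_index(onset=2.0, sustained=0.2, offset=1.6, ramp=0.1))  # ~0.8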
RESULTS
Activation detection: OSORU vs. sustained-only and sinusoidal basis functions
OSORU vs. sustained-only – From the standpoint of activation detection, the OSORU basis
set was as effective as the sustained-only set, and was far more effective in many cases. The
difference in performance is illustrated qualitatively by the maps in Figure 3-3, which show
activation for two stimuli (2/s and 35/s noise burst trains) studied in the same imaging session. The
OSORU and sustained-only basis sets detected approximately equal extents of activation for 2/s
noise bursts, which produce sustained response waveshapes (Figure 3-3, left). In contrast, for 35/s
noise bursts, the OSORU set detected extensive activation while the sustained-only set gave the
impression of little activity in auditory cortex (Figure 3-3, right). That this impression from the
sustained-only set was erroneous could be seen by inspecting the responses of individual voxels. In
response to 35/s noise bursts, voxels throughout auditory cortex showed signal peaks that were
clearly time-locked to sound onset and offset.
Figure 3-3: Top three rows: Activation maps obtained using the OSORU, sinusoidal, and sustained-only basis sets, for two different stimuli that elicit sustained (left) or phasic (right) responses. The OSORU and sinusoidal basis sets perform well, regardless of underlying response waveshape. In contrast, the sustained-only basis set only performs well when responses are sustained. For each stimulus, the three sets of maps were created using the same underlying data. The data for the two stimuli were obtained in the same imaging session. Stimuli were noise burst trains with repetition rates of 2/s (left) or 35/s (right) (burst duration = 25 ms). Each panel is an enlargement of right (R) or left (L) auditory cortex in a near coronal plane. Color activation maps (based on functional images with an in-plane resolution of 3.1 x 3.1 mm) have been interpolated to the resolution of the grayscale anatomic images (1.6 x 1.6 mm). Bottom row: The responses to each stimulus, averaged over the "active" voxels (p < 0.001 in the OSORU maps) in auditory cortex of both hemispheres. Auditory cortex included both Heschl's gyrus and the superior temporal gyrus.
These "phasic" responses to 35/s noise bursts were missed by the sustained-only set, whereas they were detected by the OSORU set, hence the greater
extent of detected activation for the OSORU basis set.
The trends seen qualitatively in Figure 3-3 were also borne out in quantitative comparisons
of maps based on the OSORU and sustained-only basis sets. For most imaging sessions and stimuli
(i.e., 117 of 177 cases), more voxels were defined as “active” (p < 0.001) using the OSORU basis
functions. To quantify the nature of the overlap of activated voxels obtained with the two basis sets,
we computed the number of voxels that were designated as active in the OSORU-based map, but not
the sustained-only map (NOSORU-only; i.e., an “exclusive or” operation), or vice-versa (Nsust-only). This
number was then divided by the number of voxels designated as active in either map (Nboth, an "or"
operation, in which voxels active in both maps were counted only once). Across the 117 cases that
had more "active" voxels in the OSORU-based map, NOSORU-only/Nboth averaged 0.44, while Nsust-only/Nboth
averaged only 0.04. Thus, the voxels detected by the sustained-only set were essentially a
subset of the voxels detected by the OSORU set. Moreover, they were an appreciably smaller
subset.
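This overlap quantification can be sketched directly from two boolean activation maps (illustrative names):

import numpy as np

def overlap_fractions(active_a, active_b):
    # Fraction of voxels active only in map A (or only in B), relative to the
    # number active in either map ("or"), counting shared voxels once
    n_both = np.sum(active_a | active_b)
    n_a_only = np.sum(active_a & ~active_b)
    n_b_only = np.sum(~active_a & active_b)
    return n_a_only / n_both, n_b_only / n_both

# Example usage with hypothetical p-value maps thresholded at 0.001:
# frac_osoru_only, frac_sust_only = overlap_fractions(p_osoru < 0.001, p_sust < 0.001)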
Further analysis of voxel overlap showed that even for the cases in which the sustained-only
basis set detected an equal or greater number of voxels (60 cases, presumably with primarily
sustained underlying responses7), the number of such voxels was only marginally greater than those
detected by the OSORU set. Specifically, for these cases, Nsust-only/Nboth averaged only 0.16. Thus,
any loss of power in detecting sustained responses under the OSORU set (due to the increased
number of basis functions) was, in practice, quite small.

7 The average waveshape index of the 60 cases for which the OSORU basis set detected fewer voxels than the sustained-only set was 0.20.
OSORU vs. sinusoidal – The OSORU basis set was as effective as, or slightly better than, the
sinusoidal set in terms of activation detection. Their similarity in performance is illustrated by the
activation maps in Figure 3-3. For both stimuli (2/s and 35/s noise burst trains), the extent of
activation with the OSORU basis set (Figure 3-3, top row) was comparable to that with the
sinusoidal set (second row). Notably, both basis sets revealed widespread activation in auditory
cortex for both stimuli, despite the substantial difference in underlying response waveform –
sustained in one case (2/s), and phasic in the other (35/s; Figure 3-3, bottom row). Thus, both the
OSORU and sinusoidal basis sets have the flexibility to detect the extremes of response waveshape
seen in auditory cortex.
A further quantitative analysis revealed that more voxels were generally detected with the
OSORU basis set than with the sinusoidal set. On average, across all imaging sessions and stimuli
(i.e., all 177 cases), 31% of voxels within auditory cortex had p-values less than 0.001 for the
OSORU set, compared to 25% using the sinusoidal set. In 164 of the 177 cases, more voxels were
defined as active using the OSORU basis set. The fraction of voxels detected only with the OSORU
set (NOSORU-only/Nboth) was, on average, 0.26, whereas the fraction detected only with the sinusoidal
set (Nsins-only/Nboth) averaged 0.03 (the nomenclature and quantification parallels the comparison of
the OSORU and sustained-only basis sets above). In other words, the OSORU basis set detected
more “active” voxels in the vast majority of cases, and rarely missed voxels detected with the
sinusoidal basis set.
For the small minority of cases in which the number of voxels active in the OSORU map
was less than or equal to the number in the sinusoidal map (13 of 177), NOSORU-only/Nboth and Nsins-only/Nboth
were similar: 0.13 and 0.19. This result indicates that in a small fraction of cases the two
basis sets were complementary in that each detected a comparable fraction of voxels missed by the
other.
Insensitivity of comparisons to p-value – As a check that our comparison of the OSORU,
sustained-only, and sinusoidal basis sets was not unduly sensitive to the specific p-value criterion
chosen, we repeated our quantification of activation map overlap using a liberal p-value threshold of
0.05. While more voxels were subsequently designated “active”, the relative performance of the
three basis sets, as defined by the percentage overlap of the active voxels, was essentially unaltered.
Relative importance of the OSORU basis functions
To gain a sense of the importance of the individual basis functions in the OSORU basis set
relative to each other and the overall response fit, we conducted the following analysis. For each of
the 177 cases, F-maps were computed for each of the five basis functions in the OSORU set, based
on the estimated amplitude of each function and its associated variance. Note that this is equivalent
to computing t-maps and proceeding to work with both the positive and negative tails of the
distribution, since F(1,ν) = t²(ν).
Then, for the voxels that were “active” in auditory cortex
according to the full OSORU basis set (i.e., p < 0.001), we counted the number of voxels that had p
< 0.1 in the map computed for a given, single basis function. (A higher p-value cutoff was used for
the individual functions to avoid an overly restrictive portrayal of their contribution to the overall
response fit). In the literature of linear regression, this procedure is sometimes called a partial F-test, and is used in "backward elimination" procedures for determining the "best" set of variables for
a regression (Draper and Smith 1981). Conceptually, one can think of the partial F-test as treating a
given basis function as if it were the last to enter the regression equation, and then testing the null
hypothesis that the full set of basis functions does not (statistically) explain more of the variance than
a reduced basis set without the given function.8 Rejection of the null hypothesis means that the
given basis function significantly improves the response fit, relative to a reduced basis set without
this function.

8 This "analysis of variance" interpretation of the partial F-test is strictly true only in a framework with independent, uncorrelated errors (i.e., ordinary least squares).
All of the components of the OSORU basis set were important to the response estimation,
although with varying degrees of frequency. Of the total of 6731 “active” voxels as determined by
the full OSORU set (across all 177 cases), the individual basis functions were significant in their
respective F-maps (at p < 0.1) for the following percentages of voxels: 68, 68, 51, 37, and 25%, for
the onset, sustained, offset, ramp, and undershoot functions, respectively. If we had implemented a
true backward elimination procedure on the 6731 active voxels, in which we eliminated the single
basis function with the highest (i.e., least significant) p-value according to the partial F-test (provided
this p-value was greater than the 0.1 threshold), the onset function would have been eliminated in
10% of the voxels, the sustained in 12%, the offset in 17%, the ramp in 24%, and the undershoot in
35% of the voxels. (In 3% of the active voxels, all the individual basis functions had p < 0.1, so
none would have been eliminated). These results indicate that some basis functions were more
useful than others. More importantly however, the analyses indicate that all five basis functions of
the OSORU set were important for fitting the responses in a substantial number of voxels.
Assessment of correspondence between OSORU components and actual waveforms
Figure 3-4 compares the magnitude of three response measures estimated using two
methods, one based on the amplitudes estimated for the different OSORU basis functions, and an
alternative based on measurements from response waveforms. The left panel plots the amplitude of
the onset basis function versus an alternative waveform-based measure of “onset” amplitude, defined
as the maximum response from t = 4-10 s minus the mean response between t = 12-14 s. A linear
regression line relating the data was highly significant (y = 1.12x + 0.23; p < 0.001, R² = 0.87).
Similarly, an OSORU-based measure of the response near its midpoint (the amplitude of the
sustained basis function plus one-half the amplitude of the ramp basis function) was a good
description of the waveform amplitude just after the midpoint of the stimulus “On” epoch (defined as
the mean response between t = 18-24 s; y = 0.97x + 0.04; p < 0.001, R² = 0.92; Figure 3-4, middle).
Finally, there was good correlation between the amplitude of the offset basis function and a
waveform-based measure of “offset” amplitude, defined as the maximum response in the 4 – 8 s
following stimulus termination (34 ≤ t ≤ 38 s) minus the response amplitude at stimulus termination
(t = 30 s; y = 0.93x + 0.26; p < 0.001, R² = 0.74; Figure 3-4, right). Overall, there was a good
correspondence between the two methods of amplitude quantification, which makes sense in view of
the qualitatively good match between the fitted and actual responses (Figure 3-5). Our results
indicate that the amplitudes of the OSORU basis functions can provide as accurate an assessment of
waveshape features as direct measurements from the waveforms themselves.
Figure 3-4: Comparison of OSORU-based vs. waveform-based measures of three different response
amplitudes: onset, midpoint, and offset. Each ‘+’ represents a value based on all the “active” voxels
in auditory cortex (defined as voxels with p < 0.001 in activation maps generated using the OSORU
basis set) for each stimulus of each session. For the OSORU-based measures, the amplitudes of the
basis functions were averaged across the “active” voxels, and then converted to percent change by
dividing by the estimated signal mean (i.e., the average amplitude of the “trend” basis function
comprised of a vector of 1’s) and multiplying by 100. Values for the waveform-based measures
were taken from waveforms also expressed in terms of percent change. The solid line is the linear
regression line relating the two measures. The dashed line represents a one-to-one correspondence
between the two measures.
Using the OSORU basis functions to probe response physiology
The utility of the OSORU basis set for extracting information concerning the physiology of
responding brain areas is illustrated by the following example. Here, we examine the relative
amounts of transient and sustained activity in auditory cortex using the waveshape index (defined in
Methods), a measure derived directly from the OSORU basis functions. To qualitatively evaluate
the effectiveness of this measure in capturing the relative amounts of transient vs. sustained activity,
we examined actual response waveforms sorted according to their waveshape index. Figure 3-5
illustrates the sorting, and simultaneously provides a sense of the overall variety of responses in the
database. Across the complete data set, the waveshape index did a good job of sorting the responses,
as determined by visual inspection.
Responses spanned a wide range, extending from highly
sustained (indicating activity throughout the stimulus presentation) to highly phasic (indicating
transient activity at sound onset and offset). As a rough approximation, responses with waveshape
indices less than ~0.25 were primarily sustained, while responses with indices greater than ~0.6 were
predominately phasic. Intermediate waveshape indices (between ~0.25 and ~0.6) indicated a blend
of sustained and transient activity.
In a small minority of cases, the measured response differed noticeably from the sum of the
OSORU basis functions (dashed lines in Figure 3-5), resulting in an apparent mismatch between
response waveform and waveshape index (6-7 waveforms out of 177). In these cases, the response
showed an appreciable signal decline, but the waveshape index was less than 0.25, indicating a
sustained response (e.g., Figure 3-5, fourth column, first row). The mismatches occurred because the
early part of the response was not effectively captured by the onset basis function, either because the
signal decline was not as rapid as that of the onset basis function, or because the response peaked
slightly later than the onset function.
(The latter effect accounts for the mismatch between
waveshape index and waveform in the fourth column of the first row in Figure 3-5). Despite
occasional mismatches, the waveshape index was successful in broadly indicating the relative
amounts of transient and sustained cortical activity in the overwhelming majority of cases.
Figure 3-5: Response waveforms (solid lines) sorted according to their summary waveshape index.
Every sixth waveform of the complete, sorted database (177 waveforms) is displayed. The shaded
region indicates the 30 s stimulus “on” period. Each response is an average across “active” voxels in
auditory cortex (p < 0.001 in maps constructed from the OSORU basis set) and across all
presentations of a given stimulus in a given session. Responses were converted to percent change,
and then normalized to a maximum of one. The number of “active” voxels varied from 3 – 78
(mean: 39), and the maximum response (prior to normalization) varied from 0.7 – 3.6% (mean:
1.7%). For an indication of the fit of the OSORU basis set to the responses, the sum of the OSORU
basis functions is also plotted (dashed lines). Specifically, this “fitted” response was computed by 1)
averaging the estimated amplitudes of a given OSORU basis function across “active” voxels, 2)
converting to percent change, 3) summing the OSORU basis functions as weighted by these percent
change values, and 4) normalizing the fitted response to a maximum of one.
[Figure 3-5 appears here: panels of normalized signal vs. time (sec) showing each measured waveform and the corresponding sum of the OSORU basis functions, sorted by summary waveshape index (0.03 – 0.85).]
Figure 3-6: Two cases that showed clear changes in waveshape with position. The cases correspond
to two different subjects and stimuli (top: 35/s clicks; bottom: continuous noise). Each panel shows
a color map of waveshape index superimposed on a grayscale anatomical image of the left superior
temporal lobe. A waveshape index is indicated for all voxels with p < 0.001 using the OSORU basis
set.
Figure 3-6 illustrates how positional variations in the relative amounts of transient and
sustained activity can be explored by spatially mapping the waveshape index on a voxel-by-voxel
basis within auditory cortex. The two cases in the figure were selected because they show clear
changes in waveshape with position. Specifically, there is an increase in waveshape index from
Heschl's gyrus medially to superior temporal gyrus laterally. This trend indicates that primary
auditory cortex (located on Heschl's gyrus) responded to the sound stimuli in a fairly sustained
fashion, whereas more lateral, non-primary areas responded more transiently.
This example
illustrates one of several ways (see Discussion) that spatial variations in the temporal patterns of
activation can be explored using the OSORU basis set.
DISCUSSION
Successful response detection with the OSORU basis set
The physiologically-motivated, OSORU basis set successfully detected a variety of response
waveforms in auditory cortex, as measured against two alternatives – a sustained-only basis “set”,
and a sinusoidal basis set that has been previously used for flexible response estimation under the
general linear model (Ardekani et al. 1999). The OSORU basis set in many instances dramatically
outperformed the sustained-only “set”, since the OSORU set is able to fit a variety of underlying
response waveforms. Frequently, the sustained-only basis set failed to identify voxels that had a
consistent and repeatable, but “non-sustained”, response to a given stimulus, thus providing a
misleading picture of brain activation. But importantly, the ability of the OSORU basis set to detect
a wide variety of response waveforms did not come at the cost of poor detection of sustained
responses – even for sustained responses the OSORU set detected as many or nearly as many voxels
as the sustained-only set.
The fact that the OSORU basis set consistently designated more voxels as “active” than the
sinusoidal set suggests that the OSORU set had greater statistical power for detecting the responses
of the test database.
The slight detection superiority of the OSORU basis set is particularly
impressive given that several factors in the present study acted to boost the likelihood of response
detection under the sinusoidal approach. For instance, we purposely extended the sinusoidal set used
previously in other studies to include the 4th harmonics, so as to increase the likelihood of detecting
phasic responses. In addition, the actual responses were such that the signal power tended to be
concentrated at either the 1st, or the 2nd and 4th harmonics. This concentration of signal power at
certain frequencies occurred because (1) responses tended to be either sustained, or biphasic (with
onset and offset transients), and (2) the “on” and “off” stimulus epochs were of equal length, thus
yielding responses with a certain “symmetry” that increases the power at the harmonics. In contrast,
for responses consisting of mainly a single transient (e.g., Figure 3-5, 4th row, 1st column), or in
paradigms with unequal duration “on” and “off” epochs, more of the signal power will be dispersed
to higher frequencies, and consequently, the OSORU basis set would enjoy an even greater
advantage over the sinusoidal set.9 Overall, in terms of activation detection, the OSORU basis set
performed slightly better than the sinusoidal alternative, which itself was optimized to do well in the
current context.
Altogether, the comparison of the OSORU basis set with the sinusoidal and
sustained-only sets indicates that the OSORU set had the flexibility to model a variety of response
waveforms, and yet was concise enough to maintain good statistical power.

9 Note that these comments apply equally well to Fourier analyses that are applied in the frequency domain. Even though a stimulus paradigm may be completely periodic, this does not imply that a Fourier F-test (e.g., Purdon and Weisskoff 1998; Marchini and Ripley 2000) will detect all periodic responses with equal statistical power. The important criterion is that the frequencies represented in the response be concentrated at the frequencies used for the F-test. Indeed, a Fourier F-test at the (single) frequency representing the paradigm periodicity is mathematically equivalent to the application of the general linear model using just a single sine and cosine term – an approach which has relatively poor power for detecting phasic or transient (yet periodic) responses.
A challenging database provided a strong test of the OSORU basis set
The auditory cortical responses used to test the OSORU basis set were an important element
of the present study. The varied temporal dynamics of these data posed a significant challenge for
detection. Despite this challenge, the OSORU basis set performed well.
Whether other approaches with the potential for identifying responses with a range of
dynamics (e.g., wavelet analysis, fuzzy clustering, or principal component analysis) would also
perform well on auditory cortical data is an open question. For the most part, these techniques have
been tested on less challenging data, typically sustained responses, either simulated or measured
(Sychra et al. 1994; Baumgartner et al. 1998; Brammer 1998; Golay et al. 1998; Chuang et al. 1999;
Fadili et al. 2000; von Tscharner and Thulborn 2001). Applying these techniques to a broader range
of waveforms, such as those found in auditory cortex, would provide a far stronger test of their
detection capabilities.
Detecting and mapping response dynamics
One advantage of the OSORU basis set is that it can facilitate the detection of as yet
unidentified responses with forms that might be logically hypothesized based on already identified
waveforms. For instance, the sharp decline in signal that forms the phasic response onset component
may reflect underlying neural adaptation, and the off component likely represents neural responses to
the offset of sound (Chap. 2). It is possible that neuronal populations exhibiting these responses are
not entirely co-localized, or that onset and offset responses do not occur together for every sound
(e.g., Figure 3-5, 4th row, 1st column). A GLM implemented with the OSORU basis set provides a
straightforward way to test these ideas, both qualitatively and statistically. Since the GLM provides
estimates of the response components complete with variances, statistical inference can be
performed, such as examining whether a given component is significantly different from zero on a
voxel-by-voxel basis. This statistical rigor is one advantage of using the OSORU basis set for both
detection and response quantification, rather than first detecting responses (e.g., using a "non-physiological model" such as the sinusoidal basis set) and then extracting different measures from
response waveforms. There is also a certain parsimony, as well as computational efficiency, in
obtaining response measures from the same model used to identify responding brain areas.
Using the OSORU basis set, it is a straightforward matter to construct a variety of maps that
may reveal spatial variations in response physiology. Mapping the relative amounts of transient and
sustained activity using a parameter such as the “waveshape index” (as in Figure 3-6) is one such
example. Activation maps could also be constructed based on individual OSORU basis functions or
linear combinations thereof, e.g., only the sustained component, or the sum (or difference) of the
onset and offset components. The correlation between basis functions does not invalidate such an
approach, but simply requires that maps constructed from isolated components be interpreted
appropriately within the context of the full basis set in which the parameters were originally
estimated (Andrade et al. 1999). In the case of non-linear combinations of components, such as the
waveshape index, statistical inference is not automatically supported. However, using the estimated
covariance of the basis functions and Monte-Carlo simulation techniques it would be possible to
estimate, for instance, whether two waveshape indices are statistically different from one another.
An alternative to mapping different response components would be to visually examine the
responses for individual voxels and then attempt a mental synthesis across voxels to extract any
spatial trends. However, in cases with a large number of “active” voxels, this approach becomes
unwieldy and the sheer volume of data to be synthesized makes it difficult to identify any trends with
position. In contrast, the approach of extracting and mapping particular OSORU components (or
combinations of components) provides a consolidated view of brain responses that can readily reveal
positional variations in underlying physiology (e.g., Figure 3-6).
In searching for responses with unknown temporal dynamics, there is no single, optimal
basis set, and using two or more complementary detection approaches will likely be beneficial.
GLMs implemented with the OSORU and sinusoidal basis sets form one such complementary pair.
Since the OSORU basis set performed better than the sinusoidal set on the database of the present
study, it may also perform better in cases where similar responses are produced. However, the slight
superiority of the OSORU set derives from tailoring the basis functions to known response features.
Therefore, the relative effectiveness of the OSORU and sinusoidal sets might well reverse if the
underlying responses differ sufficiently from those represented in the current database. The OSORU
basis set, unlike the sinusoidal set, has the advantage that it is easily extended to paradigms that are
not strictly periodic (e.g., paradigms in which the durations of the epochs may vary somewhat across
stimulus presentations, or may vary between the different stimuli). However, unlike the OSORU
basis set, the sinusoidal set offers flexibility for detecting responses with variable temporal delays.
Similar “temporal flexibility” could be incorporated into an OSORU-like physiological basis set by
allowing variations in the latency of the different functions. However, this would require non-linear,
iterative estimation techniques, which are computationally intensive, and one would no longer be
able to take advantage of the well-developed theoretical framework for linear models that allows
estimation of appropriate statistics given correlated noise.
Thus, the OSORU and sinusoidal
approaches are complementary in several respects. So, using both, rather than either alone, will
increase the likelihood that all viable response waveforms are detected.
Previous implementations of the general linear model within a physiological
framework
Within the context of the general linear model, there appears to be little previous discussion
regarding the selection of a set of basis functions that might be advantageous from a physiological
perspective. Sobel et al. (2000) confirmed the presence of response habituation over the course of a
prolonged odorous stimulus in olfactory cortex using a reference waveform that modeled
habituation. However, they did not attempt to define a broader set of more general basis functions
that could simultaneously model multiple forms of response. Giraud et al. (2000) observed transient
responses in auditory cortex at the transition between amplitude-modulated and continuous noise.
These authors handled the situation with an analysis that included three entirely distinct response
models (a sustained response model, a transient response model, and a mixed model), rather than
synthesizing all the possible responses into a single basis set. In short, there have been only a few
previous attempts to implement the GLM within any kind of physiological framework, and none,
prior to the present study, has attempted a comprehensive basis set capable of accommodating an
extensive range of response forms.
Physiologically-based implementations of the GLM: broad applicability to any brain
system
While the present study focused specifically on the auditory system, it is important to
recognize that the GLM implemented with the OSORU basis set is equally applicable to any brain
system. There are certainly documented cases of non-sustained responses outside the auditory
system (Bandettini et al. 1997; Hoge et al. 1999; Nakai et al. 2000; Sobel et al. 2000). There are also
hints that responses ranging from sustained to phasic may occur for other sensory systems. For
instance, measurements of time-average signal as a function of increasing repetition rate show a
downturn at high rates in the auditory (Chap. 2), somatosensory (Ibanez et al. 1995; Takanashi et al.
2001), and visual systems (Fox and Raichle 1984, 1985; Kwong et al. 1992; Mentis et al. 1997;
Thomas and Menon 1998; Zhu et al. 1998). In the auditory system, this decrease occurs because the
response changes from sustained to phasic. A similar change in response waveform may explain the
decrease in time-average signal in other sensory systems (rather than just a sustained response that
decreases in amplitude with increasing rate). Whether this fundamental change in response “mode”
indeed occurs in other sensory systems could be tested by detecting and quantifying responses using
the OSORU basis set. At a broader level, one can imagine devising alternative physiological basis
sets that are tailored to the particular temporal dynamics of a given brain system, or “tuning” the
OSORU set to the particular properties of different brain systems (e.g., by adjusting the timing of the
onset and offset basis functions).
Additional basis functions may also be added, reflecting
physiological processes not conceptualized in the current framework. Overall, physiologically-motivated basis sets should prove useful in detecting and quantifying responses throughout the brain.
Acknowledgements
The authors gratefully thank John Guinan, Anders Dale, Patrick Purdon, Irina Sigalovsky,
Monica Hawley, and Rick Hoge for numerous helpful comments and suggestions; Doug Greve for
providing the Matlab code that formed the basis of our GLM implementation; and Barbara Norris,
for considerable assistance in figure preparation.
Support for this study was provided by
NIH/NIDCD P01DC00119, T32DC00038, and a Martinos Scholarship.
REFERENCES
Andersen AH, Gash DM and Avison MJ. Principal component analysis of the dynamic response
measured by fMRI: A generalized linear systems framework. Magn Reson Imaging 17: 795-815,
1999.
Andrade A, Paradis AL, Rouquette S and Poline JB. Ambiguous results in functional
neuroimaging data analysis due to covariate correlation. Neuroimage 10: 483-486, 1999.
Ardekani BA and Kanno I. Statistical methods for detecting activated regions in functional MRI of
the brain. Magn Reson Imaging 16: 1217-1225, 1998.
Ardekani BA, Kershaw J, Kashikura K and Kanno I. Activation detection in functional MRI
using subspace modeling and maximum likelihood estimation. IEEE Trans Med Imaging 18: 101-114, 1999.
Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW,
Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes
during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997.
Baumgartner R, Windischberger C and Moser E. Quantification in functional magnetic
resonance imaging: Fuzzy clustering vs. correlation analysis. Magn Reson Imaging 16: 115-125,
1998.
Brammer MJ. Multidimensional wavelet analysis of functional magnetic resonance images. Hum
Brain Mapp 6: 378-382, 1998.
Bullmore E, Brammer M, Williams SCR, Rabe-Hesketh S, Janot N, David A, Mellers J,
Howard R and Sham P. Statistical methods of estimation and inference for functional MR image
analysis. Magn Reson Med 35: 261-277, 1996.
Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter TA and Brammer
M. Colored noise and computational inference in neurophysiological (fMRI) time series analysis:
Resampling methods in time and wavelet domains. Hum Brain Mapp 12: 61-78, 2001.
Burock MA and Dale AM. Estimation and detection of event-related fMRI signals with temporally
correlated noise: A statistically efficient and unbiased approach. Hum Brain Mapp 11: 249-260,
2000.
Chuang KH, Chiu MJ, Lin CC and Chen JH. Model-free functional MRI analysis using kohonen
clustering neural network and fuzzy c-means. IEEE Trans Med Imaging 18: 1117-1128, 1999.
Constable RT, Skudlarski P and Gore JC. An ROC approach for evaluating functional brain MR
imaging and postprocessing protocols. Magn Reson Med 34: 57-64, 1995.
Draper NR and Smith H. Applied regression analysis. New York: John Wiley & Sons, 1981.
Fadili MJ, Ruan S, Bloyet D and Mazoyer B. A multistep unsupervised fuzzy clustering analysis
of fMRI time series. Hum Brain Mapp 10: 160-178, 2000.
Fomby TB, Hill RC and Johnson SR. Advanced econometric methods. New York: Springer-Verlag, 1984.
Fox PT and Raichle ME. Stimulus rate dependence of regional cerebral blood flow in human striate
cortex, demonstrated by positron emission tomography. J Neurophysiol 51: 1109-1120, 1984.
Fox PT and Raichle ME. Stimulus rate determines regional brain blood flow in striate cortex. Ann
Neurol 17: 303-305, 1985.
Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial
registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995a.
Friston KJ, Frith CD, Frackowiak RSJ and Turner R. Characterizing dynamic brain responses
with fMRI: A multivariate approach. Neuroimage 2: 166-172, 1995b.
Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Frith CD and Frackowiak RSJ. Statistical
parametric maps in functional imaging: A general linear approach. Hum Brain Mapp 2: 189-210,
1995c.
Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in
fMRI time-series. Magn Reson Med 35: 346-355, 1996.
Friston KJ, Zarahn E, Holmes AP, Rouquette S and Poline J-B. To smooth or not to smooth?
Bias and efficiency in fMRI time-series analysis. Neuroimage 12: 196-208, 2000.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt
A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Golay X, Kollias S, Stoll G, Meier D, Valavanis A and Boesiger P. A new correlation-based fuzzy
logic clustering algorithm for fMRI. Magn Reson Med 40: 249-260, 1998.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S,
Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain
Mapp 6: 33-41, 1998.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD
and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999.
Ibanez V, Deiber MP, Sadato N, Toro C, Grissom J, Woods RP, Mazziotta JC and Hallett M.
Effects of stimulus rate on regional cerebral blood flow after median nerve stimulation. Brain 118:
1339-1351, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy
DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic
magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl
Acad Sci 89: 5675-5679, 1992.
Ljung GM and Box GEP. On a measure of lack of fit in time series models. Biometrika 65: 297-303, 1978.
Locascio JJ, Jennings PJ, Moore CI and Corkin S. Time series analysis in the time domain and
resampling methods for studies of functional magnetic resonance brain imaging. Hum Brain Mapp 5:
168-193, 1997.
Marchini JL and Ripley BD. A new statistical approach to detecting significant activation in
functional MRI. Neuroimage 12: 366-380, 2000.
Mentis MJ, Alexander GE, Grady CL, Horwitz B, Krasuski J, Pietrini P, Strassburger T,
Hampel H, Schapiro MB and Rapoport SI. Frequency variation of a pattern-flash visual stimulus
during PET differentially activates brain from striate through frontal cortex. Neuroimage 5: 116-128,
1997.
Miezin FM, Maccotta L, Ollinger JM, Petersen SE and Buckner RL. Characterizing the
hemodynamic response: Effects of presentation rate, sampling procedure, and the possibility of
ordering brain activity based on relative timing. Neuroimage 11: 735-759, 2000.
Nakai T, Matsuo K, Kato C, Takehara Y, Isoda H, Moriya T, Okada T and Sakahara H. Post-stimulus response in hemodynamics observed by functional magnetic resonance imaging - difference
between the primary and sensorimotor area and the supplementary motor area. Magn Reson Imaging
18: 1215-1219, 2000.
Paradis AL, Mangin JF, Bloch I, Cornilleau-Peres V, Moulines E, Frouin V and Le Bihan D.
Detection of periodic signals in brain echo-planar functional images. 18th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society, Amsterdam, 1996, p. 696-697.
Percival DB and Walden AT. Spectral analysis for physical applications: Multitaper and
conventional univariate techniques. Cambridge: Cambridge University Press, 1993.
Purdon PL and Weisskoff RM. Effect of temporal autocorrelation due to physiological noise and
stimulus paradigm on voxel-level false-positive rates in fMRI. Hum Brain Mapp 6: 239-249, 1998.
Purdon PL, Solo V, Weisskoff RM and Brown EN. Locally regularized spatiotemporal modeling
and model comparison for functional MRI. Neuroimage 14: 912-923, 2001.
Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional
magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and
body. J Acoust Soc Am 109: 216-231, 2001.
Skudlarski P, Constable RT and Gore JC. ROC analysis of statistical methods used in functional
MRI: Individual subjects. Neuroimage 9: 311-329, 1999.
Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE.
Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol
83: 537-551, 2000.
Sychra JJ, Bandettini PA, Bhattacharya N and Lin Q. Synthetic images by subspace transforms.
I. Principal components images and related filters. Med Phys 21: 193-201, 1994.
Takanashi M, Abe K, Yanagihara T, Oshiro Y, Watanabe Y, Tanaka H, Hirabuki N,
Nakamura H and Fujita N. Effects of stimulus presentation rate on the activity of primary
somatosensory cortex: A functional magnetic resonance imaging study in humans. Brain Res Bull
54: 125-129, 2001.
Thomas CG and Menon RS. Amplitude response and stimulus presentation frequency response of
human primary visual cortex using BOLD EPI at 4 T. Magn Reson Med 40: 203-209, 1998.
von Tscharner V and Thulborn KR. Specified-resolution wavelet analysis of activation patterns
from BOLD contrast fMRI. IEEE Trans Med Imaging 20: 704-714, 2001.
Zarahn E, Aguirre GK and D'Esposito M. Empirical analyses of BOLD fMRI statistics. I.
Spatially unsmoothed data collected under null-hypothesis conditions. Neuroimage 5: 179-197,
1997.
Zhu XH, Kim SG, Andersen P, Ogawa S, Ugurbil K and Chen W. Simultaneous oxygenation
and perfusion imaging study of functional activity in primary visual cortex at different visual
stimulation frequency: Quantitative correlation between BOLD and CBF changes. Magn Reson Med
40: 703-711, 1998.
Chapter 4
The temporal envelope of sound determines the
time-pattern of fMRI responses in human auditory
cortex
INTRODUCTION
Spatial and temporal codes are two general schemes for representing various features of
sensory stimuli in the brain. Prominent examples of spatial coding include the orderly tonotopic,
retinotopic, and somatotopic mappings in cortical and subcortical structures of the auditory, visual,
and somatosensory systems. For example, in the auditory system, neurons in the auditory nerve are
preferentially tuned to a particular sound frequency due to the filtering properties of the cochlea, and
this tuning is maintained in orderly spatial maps of “best frequency” in structures ranging from the
cochlear nucleus to auditory cortical areas. In addition to any existing spatial organization, the time
course of neural activity encodes information about the temporal aspects of a stimulus, such as its
duration, or the timing between successive stimuli. An example of temporal coding is the manner in
which neural firing synchronizes to the amplitude modulation in an acoustic stimulus, up to certain
limiting modulation rates (e.g., Schreiner and Urbas 1988; Phillips et al. 1989; Langner 1992).
Much of this knowledge regarding the spatio-temporal encoding of stimulus features comes from
microelectrode recordings in animals, which may not directly apply to humans in all respects, due to
anesthesia and interspecies differences. Thus, it is important to study stimulus coding directly in
humans.
Functional magnetic resonance imaging (fMRI) is one non-invasive technique for studying
the activity patterns of the human brain. The fMRI response reflects hemodynamic changes that
arise from changes in neural activity (i.e., neural spiking, and excitatory and inhibitory synaptic
activity; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995; Heeger et al.
2000; Rees et al. 2000; Logothetis et al. 2001). These hemodynamic changes occur on a time-scale
of seconds. Thus, the fMRI response essentially reflects the time-envelope of population neural
activity in a local region (i.e., voxel) of the brain. Because of the high spatial resolution (~1 mm) of
fMRI compared to other neuroimaging techniques, neurophysiological investigations using fMRI
have primarily focused on the representation of information in spatial patterns of brain activity.
Indeed, fMRI studies have provided direct evidence in humans regarding tonotopic (Talavage et al.
2000), retinotopic (Sereno et al. 1995), and somatotopic mappings (Rao et al. 1995). In contrast, the
potential for fMRI to uncover representations in the temporal patterns of activity has generally been
ignored.
That fMRI can provide information concerning temporal coding is strongly suggested by
two recent studies showing dramatic, sound-dependent changes in the waveshape of fMRI responses
from human auditory cortex (Chap. 2; Giraud et al. 2000). For instance, a low rate (2/s) noise burst
train generates cortical fMRI responses that are primarily “sustained”, with a signal level that stays
elevated throughout the train duration. In contrast, noise burst trains at higher rates (35/s) elicit a
“phasic” response, characterized by signal peaks after train onset and offset (Chap. 2). While
sustained responses are well-known to occur for various sounds (e.g., speech, music), phasic
responses are a relatively new discovery, and the types of sounds that elicit them are largely
unknown.
The wide range of fMRI response waveshapes seen in the previous studies indicates
substantial differences in the time-pattern of population neural activity for low vs. high-rate noise
burst trains, and hence differences in the neural coding of these sounds. The sustained responses for
low rate trains suggest ongoing neural activity throughout the train. In contrast, the signal decline
that forms the initial peak of phasic responses is highly suggestive of strong neural adaptation during
the first seconds of a high-rate train, while the peak after train offset suggests strong neural off-responses. The resulting concentration of neural activity at the onset and offset of high, but not low,
rate trains is especially interesting in light of the perceptual differences between these sounds. The
noise bursts of low rate trains can be discerned individually, whereas those of high-rate trains fuse to
form a continuous (but modulated) percept. Thus, the peaks in neural activity at sound onset and
offset in the high-rate case delineate the endpoints of a sound sequence whose elements are grouped
across time so as to form a single auditory object. This qualitative connection between perception
and brain activity raises the possibility that auditory objects are generally delineated in the temporal
envelope of population neural activity, with neural adaptation and off-responses subserving this
delineation. The present study begins to investigate these ideas in human listeners using fMRI
response waveshape as an assay of population neural activity in auditory cortex. We specifically
explore the relationship between fMRI waveshape and physical sound features, since these features
ultimately determine a sound’s perceptual characteristics.
Since so little is known about phasic fMRI responses, we began by establishing that this
novel response form is actually produced by a variety of sounds. We then proceeded to determine
exactly which aspects of sound most strongly influence fMRI response waveshape, a fundamental
issue left unresolved by the previous work. For instance, in our previous study, noise burst duration
was held constant while rate was varied, so stimulus sound-time fraction (STF) co-varied with rate,
raising the possibility that rate was not the primary sound feature coded in the changing fMRI time-patterns. Both our previous study and that of Giraud et al. (2000) considered only broadband stimuli
at one stimulus level, leaving open the additional possibility that sound bandwidth or level is also
coded via dramatic temporal changes in population neural activity. The present study systematically
examined the relationship between specific sound features and the time-envelope of population
neural activity seen with fMRI.
Two complementary sets of experiments were performed for the present study. The main set
explored responses produced by sounds ranging from simple stimuli like tone or noise burst trains to
more complex stimuli like speech and music. For these experiments, the acoustic noise during fMRI
was handled by essentially trading spatial coverage of widespread auditory cortical areas for the
ability to investigate many stimuli per experiment. The approach involved imaging a single slice
that targeted posterior auditory cortex, including primary auditory cortex on Heschl's gyrus and the
immediately lateral non-primary areas of the superior temporal gyrus. Through these experiments
we established that phasic, as well as sustained responses occur for a variety of different stimuli. We
further determined that sound temporal envelope characteristics (rate and sound-time fraction), but
not level or bandwidth, are strongly coded in the time-pattern of fMRI activation in posterior
auditory cortex.
The second set of experiments studied two sounds with very different temporal
characteristics to test whether the coding of stimulus temporal characteristics in response waveshape
applied for other cortical areas besides posterior auditory cortex. In these experiments the fMRI
acoustic noise was handled using an approach that favored the coverage of widespread areas over the
use of a large number of stimuli. Specifically, the approach involved clustered volume acquisition
with a long (8 s) intercluster interval and a temporal sampling scheme that enabled the reliable
reconstruction of fMRI response waveshape (e.g., Robson et al. 1998; Belin et al. 1999). These
experiments showed that phasic, as well as sustained responses occur in widespread cortical areas.
Moreover, they indicate that sound temporal envelope characteristics are strongly represented in the
time-pattern of fMRI activation throughout auditory cortex.
METHODS
Twenty-six subjects participated in forty total imaging sessions. Subjects ranged in age from
21 to 38 years (mean ~26 years). Seventeen of the subjects were male and twenty-two were right-handed. Subjects had no known audiological or neurological disorders.
The majority of imaging sessions (37) examined response waveshape in posterior auditory
cortex (“single-slice experiments”), using a variety of stimuli.
The three remaining sessions
examined waveshape throughout auditory cortex (“multislice experiments”) using two of these
stimuli.
Most of the imaging sessions (25) were conducted expressly for the present study.
However, some were part of our previous investigation of repetition rate (5 sessions; Chap. 2) or a
separate study that examined the dependencies of activation on sound level (10 sessions; Sigalovsky
et al. 2001). The data from these sound level experiments were acquired while subjects performed a
task that was not included for the other experiments. Therefore, these data are only compared with
each other (i.e., in RESULTS: “Insensitivity of waveshape to sound level in posterior auditory
cortex”), except to note a possible effect of task on waveshape. All studies were approved by the
institutional committees on the use of human subjects at the Massachusetts Institute of Technology,
Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital, and all subjects gave
their written informed consent.
Stimuli
All stimuli were presented binaurally and had a total duration of 30 s. They consisted of
trains of broadband noise bursts with various rates and sound-time fractions, trains of narrowband
noise bursts, trains of tone bursts, trains of clicks, continuous broadband noise, orchestral music, and
speech.
Trains of broadband noise bursts – Bursts of uniformly distributed white noise were
presented in a 30 s train at repetition rates of 2/s, 10/s, and 35/s. The bursts had a rise and fall time
of 2.5 ms. They were usually 25 ms in duration (full width half maximum), resulting in sound-time
fractions (STFs) of 5%, 25%, and 88% for the 2/s, 10/s, and 35/s trains, respectively. However, in
some sessions, 2/s and/or 35/s trains with other STFs were studied – specifically: 2/s and 35/s trains
with an STF of 50% (burst duration for the 2/s train: 250 ms; for the 35/s train: 14.3 ms) and 35/s
trains with an STF of 25% (burst duration: 7.1 ms). The repeated bursts within a train were identical
(i.e., “frozen”), but they differed across trains (except for two sessions from our previous
investigation of repetition rate, in which the noise burst was frozen throughout an entire imaging
“run”, but differed across runs).
Trains of narrowband bursts – Two types of narrowband stimuli were examined: tone bursts
and narrowband (third octave) noise bursts. The bursts were presented in a train at either 2/s or 35/s.
Burst center frequency was either 500 Hz or 4 kHz (rise and fall time: 2.5 ms; duration 25 ms). The
repeated bursts were identical within a train, but differed across trains (and runs).
Continuous noise – The continuous noise was uniformly distributed and white (i.e.,
uncorrelated) across its entire 30 s duration. Thus, there was no repetition in the temporal fine
structure of the continuous noise, in contrast with the “frozen” noise burst trains.
Trains of clicks – Clicks were presented in a train at rates of either 35/s or 100/s. The
duration of the individual clicks was ~100 µs.
Running speech – The speech stimulus was created by concatenating “conversational”
sentences taken from the Harvard IEEE Corpus (IEEE 1969).1 The same male professional speaker
spoke all sentences. The amplitude envelope of the speech was low-pass, with a power spectral
density 10 dB down at 5 Hz relative to its peak at 1.3 Hz.
Orchestral music – The music stimulus was the first 30 s of the fourth movement in
Beethoven Symphony No. 7. The maximum power in the music amplitude envelope occurred at
0.69 Hz, with harmonics at 1.2, 2.5, and 4.9 Hz that were all within 10 dB of the power at 0.69 Hz.
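The envelope statistics quoted above for the speech and music stimuli can be obtained from a recorded waveform roughly as follows. This sketch (Python/SciPy) uses a Hilbert-transform envelope and a Welch spectrum, which is one common approach and is only assumed here; it is not necessarily the exact procedure used to characterize the stimuli.

```python
# Hedged sketch: power spectrum of a stimulus amplitude envelope.
import numpy as np
from scipy.signal import hilbert, welch

def envelope_spectrum(x, fs):
    """Return (frequencies, envelope power in dB re. the spectral peak)."""
    env = np.abs(hilbert(x))                                 # amplitude envelope
    f, pxx = welch(env - env.mean(), fs=fs, nperseg=8 * fs)  # ~0.125 Hz resolution
    return f, 10 * np.log10(pxx / pxx.max())

# Example with a synthetic 30 s "burst train" at 2/s (not an actual stimulus file).
fs, dur, rate = 8000, 30.0, 2.0
t = np.arange(int(fs * dur)) / fs
burst = (np.mod(t, 1.0 / rate) < 0.025).astype(float)        # 25 ms on per cycle
x = burst * np.random.default_rng(0).standard_normal(t.size)
f, p_db = envelope_spectrum(x, fs)
print("envelope spectrum peak near %.2f Hz" % f[np.argmax(p_db)])
```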
Stimulus level
Except in sessions that examined the effects of sound level, levels were approximately 55 dB
above threshold (SL), determined separately for each ear and stimulus (to within 5 dB) in the scanner
room immediately prior to the imaging session. The resulting sound pressure levels ranged from ~60
to 90 dB SPL, with the majority of cases (> 70%) falling in the range of 70 – 85 dB SPL. (SPL was
computed based on the root-mean-square of the entire 30 s stimulus, after first filtering by the
frequency response of the sound delivery system).
During both threshold determination and
functional imaging, there was an on-going low-frequency background noise produced primarily by
the pump for the liquid helium (see “Sound delivery” below).
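A minimal sketch of the SPL computation described above is given below (Python/SciPy). The low-pass filter standing in for the delivery-system frequency response and the calibration constant relating digital units to pascals are placeholders; the actual measured system response and calibration are not reproduced here.

```python
# Hedged sketch: sound pressure level from the RMS of a system-filtered stimulus.
import numpy as np
from scipy.signal import butter, lfilter

P_REF = 20e-6          # reference pressure, 20 uPa
CAL_PA_PER_UNIT = 0.1  # placeholder: pascals per digital unit (from calibration)

def stimulus_spl(x, fs, cutoff_hz=6000):
    """Approximate SPL of stimulus x after a low-pass model of the delivery system."""
    b, a = butter(4, cutoff_hz / (fs / 2))   # crude stand-in for the measured response
    y = lfilter(b, a, x)
    rms_pa = np.sqrt(np.mean(y ** 2)) * CAL_PA_PER_UNIT
    return 20 * np.log10(rms_pa / P_REF)

fs = 44100
x = np.random.default_rng(0).standard_normal(30 * fs)   # 30 s of broadband noise
print(f"~{stimulus_spl(x, fs):.1f} dB SPL (with placeholder calibration)")
```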
1 The recordings were obtained from the Research Laboratory of Electronics, Massachusetts Institute of Technology.
In sessions that examined the effects of stimulus level (a total of 12), level was varied over a
30 to 40 dB range. The stimuli that were varied in level were (1) 35/s (88% STF) noise burst trains
(studied at 40, 55, 70 dB SL; 2 sessions), (2) orchestral music (30, 50, 60 dB SL; 5 sessions), and (3)
continuous noise (35, 45, 55, 65, 75 dB SL; 5 sessions; 3 – 4 levels were studied in any given
session2). The latter data (for music and continuous noise) were collected in our separate study of
sound level. The absolute sound pressure levels for the level sessions ranged from ~60 to 100 dB
SPL.
Task
Subjects were instructed to listen “attentively” to the stimuli. Subjects were monitored to
ensure that they remained alert throughout an experiment (typically via a non-verbal signal from the
subject at the end of each imaging run in response to a question from the experimenter).
In sessions from our separate study of sound level (using continuous noise and music),
subjects performed an additional task. Specifically, at the beginning and end of each 30 s stimulus
“on” period, subjects controlled a knob to turn on or off an array of lights (Melcher et al. 2000).
Sound delivery
Stimuli were delivered binaurally through a headphone assembly that also attenuated
scanner-generated sounds. Specifically, stimuli were produced by a D/A board (running under
LabView), amplified, and fed to a pair of piezoelectric transducers. For our previous study of sound
level using continuous noise and music, the transducers were incorporated directly into sound
attenuating earmuffs placed over the subject’s ears (sound delivery system I; custom built by GEC
Marconi, Inc.). For all remaining sessions, the transducers were housed in a shielded box adjacent to
the scanner (system II). In this latter setup, the output of the transducers reached earmuffs placed
over the subject's ears via air-filled tubes. The frequency response of both systems, measured at the
subject's ears, was low pass with a cutoff frequency of 10 kHz (system I) or 6 kHz (system II).
2 Specifically, three of these five sessions used levels of 35, 55, 65, and 75 dB SL, one session used 35, 45, 55, and 65 dB SL, and one session used 35, 55, and 75 dB SL.
Acoustic stimulation paradigm
In each imaging session, responses were measured for between 2 to 6 different stimuli.
Stimuli were presented during 30 second “on” periods, alternated with 30 s “off” periods during
which no auditory stimulus was presented. Stimulus presentation was organized into individual
“runs” composed of 4 – 5 such on/off “blocks”. The different stimuli in an imaging session were
typically presented once each run, and their order was varied across runs. However, in the following
cases the same stimulus was repeated within an imaging run (but consecutive runs used different
stimuli): 1) all repetitions of the orchestral music, 2) our previous sound level experiments using
continuous noise or music (i.e., same level and stimulus type was presented throughout each run), 3)
two of the sessions from our previous study of repetition rate, and 4) the multislice experiments. For
the single-slice experiments, the various stimuli within a session were repeated an equal number of
times (7 to 13). An exception was the music stimulus, which was typically repeated just 4 times (in a
single imaging run collected at either the beginning or end of the functional imaging).3 For the
multislice experiments, there were 32 – 40 repetitions of the high rate (35/s) noise burst train, and 8
repetitions of the music stimulus. The reason for the fewer music repetitions in both the single and
multislice experiments is that music generally evokes robust responses, so fewer stimulus repetitions
were necessary.
Handling scanner acoustic noise
The earmuffs of the sound delivery systems attenuated the two main types of scanner
acoustic noise. The two types of scanner noise are: (1) an ongoing low-frequency background noise
produced primarily by the pump for the liquid helium (used to supercool the magnet coils), and (2)
gradient noise generated by flexing of the gradient coils. Accounting for the attenuation provided by
the earmuffs, the pump noise reaches levels of ~60 dB SPL in the frequency range of 50-300 Hz
(Ravicz et al. 2000; Ravicz and Melcher 2001). The short duration "beep" of the gradient noise
reached peak levels of approximately 85 dB SPL at ~1.0 kHz on the 1.5T scanners and 95 dB SPL at
~1.4 kHz on the 3.0T scanners (both values are again estimates for the SPL at subjects' ears, after the
attenuation provided by earmuffs).
3 However, for the instances of music from our previous study of sound level, there were always eight repetitions of the music (i.e., eight on/off blocks) at a given level.
The gradient noise was further handled in two complementary ways. For the majority of
sessions (“single-slice experiments”), a single slice was imaged in order to reduce the impact of the
scanner-generated acoustic noise on auditory activation while still allowing two other goals to be met
simultaneously. These goals were to 1) investigate many stimuli per session, and 2) maintain a
temporal sampling sufficient to capture the time-pattern of the fMRI response. For the three
experiments imaging multiple slices (“multislice experiments”), all the slices of the functional
volume were acquired within a brief interval (< 1 s) once every 8 s (TR).
This “clustered
acquisition” of the slices within a volume, in conjunction with a long interval between clusters,
reduces the impact of the scanner noise on the responses to sound stimuli (Belin et al. 1999;
Edmister et al. 1999; Elliott et al. 1999; Hall et al. 1999). To maintain “temporal sampling” in spite
of the long interval between volume acquisitions, the onset of the stimulus relative to the first
volume acquisition was staggered by 2 s increments from run to run (e.g., Robson et al. 1998; Belin
et al. 1999). Consequently, across the multiple runs for a given stimulus, the functional data in toto
included samples acquired every 2 s relative to the stimulus. However, restoring temporal sampling
in this manner required an increased number of presentations per stimulus, thus limiting the number
of stimuli that could be studied per session to just two (music and 35/s, 88% STF noise burst trains).
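The staggered-onset scheme can be illustrated with a short sketch (Python/NumPy; the array names and the stand-in response are illustrative assumptions). Each run samples the 70 s response window only every 8 s, but at a different 2 s offset; pooling the runs and sorting by time relative to stimulus onset yields a waveform sampled every 2 s.

```python
# Hedged sketch: reconstructing 2 s sampling from clustered acquisitions (TR = 8 s)
# by staggering the stimulus onset 0, 2, 4, 6 s across runs.
import numpy as np

TR, STEP, BLOCK = 8.0, 2.0, 70.0          # s: volume interval, stagger, response window
offsets = np.arange(0.0, TR, STEP)        # stimulus onset shift for each run

def true_response(t):
    """Stand-in 'ground truth' response (a smoothed boxcar) for demonstration."""
    return np.clip(np.minimum(t / 6.0, (36.0 - t) / 6.0), 0, 1)

# Each run samples the response only every 8 s, at times shifted by its offset.
runs = []
for off in offsets:
    t_acq = np.arange(0.0, BLOCK, TR) + off   # acquisition times re. stimulus onset
    runs.append((t_acq, true_response(t_acq)))

# Interleave runs: pool all (time, sample) pairs and sort by time -> 2 s sampling.
t_all = np.concatenate([t for t, _ in runs])
y_all = np.concatenate([y for _, y in runs])
order = np.argsort(t_all)
t_recon, y_recon = t_all[order], y_all[order]
print(np.all(np.diff(t_recon) == STEP), y_recon[:4])   # True: effective 2 s sampling
```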
Several pieces of evidence indicate that scanner gradient noise had little or no effect on
response waveshape. For the multislice experiments, the image acquisitions mainly reflect the
response to the sound stimulus for two reasons: (1) the long interval between clusters allows the
response evoked by the acoustic noise of a given cluster to decay appreciably before the next cluster
(Edmister et al. 1999; Hall et al. 2000), and (2) the time-delay of the fMRI response means that the
response to a given cluster does not occur until after the cluster has already ended (Talavage et al.
1999; Hall et al. 2000). The fact that the multislice and single-slice experiments of the present study
showed no discernable difference in waveshape for either music or 35/s noise bursts (c.f., Figures
4-9 and 4-10 to Figure 4-2) indicates that the gradient noise during the single-slice experiments also
had little or no effect on waveshape. Finally, in a previous examination of the effect of TR duration
on waveshape (in two subjects), we found that 35/s noise burst trains evoked phasic responses
regardless of whether the response was constructed from runs (all single-slice) with a TR of either 2
s or 8 s (Harms and Melcher 1999).
Imaging
Subjects were imaged using whole-body scanners and standard head coils (either transmit/
receive or receive-only) while resting supine. Head motion was limited using either 1) a bite bar
custom-molded to the subject’s teeth and mounted to the head coil, or 2) pillow and foam padding
packed snugly around the subject’s head. Each imaging session lasted ~ 2 hours.
Single-slice experiments – Imaging was performed on several different scanners, due to both
equipment changes beyond the authors’ control and a desire to take advantage of higher field
strength magnets when possible. Specifically, imaging was performed on five different systems: a
1.5T or 3.0T General Electric scanner retrofitted for high-speed, echo-planar imaging (by Advanced
NMR Systems, Inc.; 1.5T: 10 sessions; 3.0T: 6 sessions), a 1.5T General Electric Signa Horizon
scanner (10 sessions), a 1.5T Siemens Sonata scanner (5 sessions) and a 3.0T Siemens Allegra
scanner (6 sessions). To check whether response waveshape was influenced by the imaging system
employed, we computed average responses for each imaging system for the three stimuli that were
each studied in at least three sessions on at least four systems: 2/s noise bursts (5% STF), 35/s noise
bursts (88% STF), and orchestral music. To focus on response waveshape, we then normalized these
average responses to a peak of one, so as to remove possible amplitude differences related to
imaging system (Ogawa et al. 1993; Gati et al. 1997; Fujita 2001).
There were no obvious
differences in waveshape between systems. To the extent that there was any hint of inter-system
differences, they were far smaller than the differences in waveshape that result from changes in
stimulus temporal characteristics (e.g., from 2/s to 35/s noise bursts). Furthermore, there did not
appear to be any systematic relationship between waveshape and either field strength or magnet
manufacturer. Importantly, our analysis of the dependence of response waveshape on particular
stimulus variables is based on intra-session comparisons across stimuli, for which the imaging
system was obviously constant.
The brain slice to be functionally imaged was selected using contiguous sagittal images of
the whole head that were acquired at the beginning of each imaging session. The selected slice
intersected the inferior colliculus and the posterior aspect of Heschl's gyrus and the superior temporal
gyrus (a predominantly coronal slice plane). When there appeared to be multiple Heschl's gyri, we
used the anterior one (which includes primary auditory cortex) to position the slice (Penhune et al.
1996; Leonard et al. 1998).
Functional images of the selected slice were acquired using a blood oxygenation level
dependent (BOLD) sequence. For the 1.5T experiments, the sequence parameters were: asymmetric
spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°. For the 3.0T experiments the parameters were:
gradient echo, TE = 30 ms (except one session used 40 ms and another used 50 ms), flip = 60° or
90°. The beginning of each functional run included four discarded images to ensure that image
signal level had approached a steady state. Slice thickness was always 7 mm with an in-plane
resolution of 3.1 x 3.1 mm. A T1-weighted anatomical image (in-plane resolution = 1.6 x 1.6 mm,
thickness = 7 mm) of the functionally imaged slice was also obtained in all sessions and used to
localize auditory cortex.
While the present paper focuses on responses from auditory cortex, our experiments imaging
a single slice were designed to also examine the inferior colliculus. Therefore, functional images
were generally collected using a cardiac gating method that increases the detectability of activation
in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every
other QRS complex in the subject's electrocardiogram, resulting in an average interimage interval
(TR) of approximately 2 s. Image signal was corrected to account for the image-to-image variations
in signal strength (i.e., T1 effects) that result from fluctuations in heart rate (Guimaraes et al. 1998).
In the 2 sessions that did not use cardiac gating, the TR was 2 s.
Multislice experiments –The multislice experiments were conducted on the 3.0T General
Electric scanner. The imaged volume consisted of 10 contiguous slices (in-plane resolution: 3.1 x
3.1 mm; slice thickness: 7 mm), one of which passed through the same “inferior colliculus, Heschl’s
gyrus” slice plane used in the single-slice experiments. The beginning of each functional run
included one discarded image volume. Functional images were acquired with a gradient echo
sequence (TE = 30 ms, flip = 60°).
Image pre-processing
Prior to response detection, the following pre-processing steps were performed. First, the
images for each scanning run were corrected for any in-plane movements of the head that may have
occurred over the course of the imaging session. Specifically, each functional image of a session
was translated and rotated to fit the first image of the first functional run using standard software
(SPM95 software package; without spin history correction; Friston et al. 1995; Friston et al. 1996).
For the single slice experiments using cardiac gating, there was an additional pre-processing step.
Because cardiac gating results in irregular temporal sampling, the time series for each imaging “run”
and voxel was linearly interpolated to a consistent 2 s interval between images, using recorded
interimage intervals to reconstruct where each image occurred in time.
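A minimal sketch of this interpolation step, assuming simulated (rather than recorded) inter-image intervals, is:

```python
# Hedged sketch: resampling an irregularly sampled (cardiac-gated) voxel time
# series onto a uniform 2 s grid by linear interpolation.
import numpy as np

rng = np.random.default_rng(0)
n_images = 120
# Simulated inter-image intervals: nominally 2 s, jittered by heart-rate variability.
intervals = 2.0 + 0.2 * rng.standard_normal(n_images)
t_acq = np.cumsum(intervals) - intervals[0]           # irregular acquisition times
signal = np.sin(2 * np.pi * t_acq / 60.0) + 0.05 * rng.standard_normal(n_images)

t_uniform = np.arange(0.0, t_acq[-1], 2.0)            # consistent 2 s grid
signal_uniform = np.interp(t_uniform, t_acq, signal)  # linear interpolation
print(t_uniform[:5], signal_uniform[:5])
```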
Response detection
Responses were detected using a general linear model (GLM) and a set of basis functions
designed to detect the wide range of response waveshapes known to occur in auditory cortex. This
approach has been described and tested in detail previously (Chap. 3). The basic idea behind the
GLM is to model the signal vs. time within each voxel as a weighted sum of basis functions, and
then to identify “active” (i.e., “sound-sensitive”) voxels based on the goodness of fit of this model.
Briefly, the basis set consists of five components, designed to provide direct information
about response waveshape: Onset, Sustained, Ramp, Offset, and Undershoot (Figure 4-1). The onset
component reflects the magnitude of an initial transient response that is above and beyond the level
of any sustained response (reflected in the sustained component). The ramp component provides
flexibility for modeling response changes that occur over the latter two-thirds of the “sound on”
period. The offset component models transient signal increases following stimulus termination. The
undershoot component is so-named because this component is included primarily to help model
responses in which the signal decreases below baseline following a sustained response.4 These basis
functions, appropriately weighted, are able to capture response waveshapes ranging from sustained
to phasic, as illustrated in Figure 4-1. For the “sustained” waveform in Figure 4-1, the sustained
component has the largest amplitude. For the “phasic” waveform, the onset and offset components
have the largest amplitudes, and there is an appreciable ramp component (consistent with the signal
increase seen over the latter half of the “sound on” period).
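The following sketch (Python/NumPy) illustrates the idea of the basis set; the simple boxcar and ramp shapes below are schematic approximations of the OSORU functions (whose exact forms are defined in Chap. 3), and the weights are made-up values chosen to mimic the sustained and phasic examples of Figure 4-1.

```python
# Hedged sketch: schematic OSORU-like basis functions sampled every 2 s over one
# 60 s on/off block.  The exact shapes used in the thesis are defined in Chap. 3;
# the boxcars and ramp below are illustrative approximations only.
import numpy as np

t = np.arange(0, 60, 2.0)          # seconds; stimulus "on" during 0-30 s
on_dur = 30.0

def box(t, start, stop):
    return ((t >= start) & (t < stop)).astype(float)

onset      = box(t, 0, 8)                                    # early transient
sustained  = box(t, 0, on_dur + 5)                           # elevated during "on" (+ lag)
ramp       = np.clip((t - 10) / (on_dur - 10), 0, 1) * box(t, 0, on_dur + 5)
offset     = box(t, on_dur + 2, on_dur + 10)                 # transient after sound "off"
undershoot = -box(t, on_dur + 8, on_dur + 20)                # post-stimulus dip

basis = np.column_stack([onset, sustained, ramp, offset, undershoot])

# Weighted sums of the same basis functions approximate both shapes (cf. Figure 4-1):
# phasic = large onset/offset weights plus a ramp; sustained = dominant sustained weight.
phasic_fit    = basis @ np.array([1.0, 0.3, 0.4, 0.9, 0.0])
sustained_fit = basis @ np.array([0.2, 1.0, 0.0, 0.1, 0.3])
print(basis.shape, phasic_fit.shape, sustained_fit.shape)
```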
The GLM was implemented separately for each imaging session, with each stimulus within a
session being represented by its own complete basis set. In constructing the design matrix, the basis
functions for each component were sampled at the appropriate times, as determined by the temporal
relationship between the stimulus and imaging for a given functional run (i.e., every 2 s for the
single-slice experiments; every 8 s for the multislice experiments). For all sessions, we assumed that
response waveshape and magnitude were constant across repeated presentations of a given stimulus
(within a session), so the GLM did not incorporate the possibility of a progressive “habituation” in
the amplitude of the basis functions across stimulus repetitions. (This assumption is justified by an
analysis performed in Chap. 2). In implementing the GLM, we included three additional functions in
the design matrix for each imaging run to handle low-frequency “trends” in the signal vs. time.
These included the standard column vector of ones (for estimating the mean), and both a linear and
quadratic vector for estimating any signal drift during the course of the run.
The estimated
amplitudes of these three functions were ignored in the creation of the activation maps. Lastly, as
part of the GLM, an estimate of the noise autocorrelation was used to pre-whiten the data from the
single-slice experiments, so as to bring the false-positive rate closer in line with its theoretically
predicted value (see Chap. 3 for details). No pre-whitening was applied to the data from the
multislice experiments, since the residuals without pre-whitening were already consistent with a
hypothesis of "white" (uncorrelated) noise, presumably due to the long interval (8 s) between volume
acquisitions in a given imaging run.
4 The preceding descriptions of the five components represent their "physiological" rationale. From a strictly mathematical perspective, the basis functions are simply weighted (positive or negative) to give the best (i.e., least square error) fit to the fMRI signal vs. time.
[Figure 4-1 graphic: "Fitting of OSORU Basis Functions to Two Different Response Waveshapes". Two columns (Sustained Response, Phasic Response) show the scaled onset, sustained, ramp, offset, and undershoot basis functions, the measured waveform, and the sum of the basis functions, plotted as percent signal change vs. time (0-60 s).]
Figure 4-1: An example illustrating the “best fit” amplitudes of the five basis functions to
prototypical “sustained” (left) and “phasic” (right) responses. These prototypical responses (dashed
lines in bottom row) represent the average response of Heschl’s gyrus to 2/s and 35/s noise burst
trains, respectively (Figure 4-2). The amplitudes of the basis functions were determined using a
general linear model (i.e., linear regression). Summation of the basis functions for each response
type yields the solid line in the bottom row. For this particular example, a vector of ones was not
included in the linear model. (If included, the basis function amplitudes would be slightly different,
and there would have been less of a difference between the actual and fitted responses during the
final 15 s).
For each stimulus in a given imaging session, we created an “omnibus” activation map
(using an F-statistic) that tested against the null hypothesis that none of the estimated amplitudes of
the basis functions were significantly different from zero (Chap. 3). “Active” voxels were defined as
those with p-values less than 0.001 (not corrected for multiple comparisons).
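The omnibus test amounts to a standard F-test comparing the full model against a reduced model containing only the trend regressors. A generic least-squares sketch (Python/NumPy/SciPy), assuming white noise rather than the pre-whitened estimator actually used for the single-slice data, is:

```python
# Hedged sketch: omnibus F-test that no basis-function amplitude differs from zero,
# for one voxel time series y, given design matrices (white-noise assumption).
import numpy as np
from scipy import stats

def omnibus_f(y, X_full, X_reduced):
    """F-statistic and p-value comparing full vs. reduced (trend-only) GLM fits."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)
    rss_full, rss_red = rss(X_full), rss(X_reduced)
    df1 = X_full.shape[1] - X_reduced.shape[1]        # number of basis functions tested
    df2 = len(y) - X_full.shape[1]
    f = ((rss_red - rss_full) / df1) / (rss_full / df2)
    return f, stats.f.sf(f, df1, df2)

# Toy example: 150 time points, 5 basis regressors + 3 trend regressors.
rng = np.random.default_rng(0)
n = 150
trend = np.column_stack([np.ones(n), np.linspace(-1, 1, n), np.linspace(-1, 1, n) ** 2])
basis = rng.standard_normal((n, 5))
X_full, X_red = np.hstack([basis, trend]), trend
y = basis @ np.array([1.0, 0.5, 0.0, 0.8, -0.2]) + rng.standard_normal(n)
f, p = omnibus_f(y, X_full, X_red)
print(f"F = {f:.1f}, p = {p:.2e}  ('active' if p < 0.001)")
```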
Waveshape quantification
Quantification of the response to a given stimulus was generally performed by combining
the active voxels within a given anatomically-defined region of interest (Heschl's gyrus – HG,
superior temporal gyrus – STG, or antero-medial region – AM; see “Defining regions of interest”).
For these analyses, the amplitudes of a given basis function were averaged across all the active
voxels in a given region (across both hemispheres for the single-slice experiments; across a given
hemisphere and slice for the multi-slice experiments).
These average amplitudes were then
converted to a “percent change” scale by dividing by the estimated signal mean (i.e., the average
amplitude over the same active voxels of the “trend” basis function comprised of a vector of 1’s) and
multiplying by 100. We denote the resulting amplitudes of the onset, sustained, ramp, and offset
components as On, Sust, Ramp, and Off, respectively. Mid, a measure of the response amplitude
near the middle of the “sound on” period, was defined as the sum of Sust plus one-half of Ramp.
(The undershoot component was included in the basis set to help fit responses with that particular
feature, but was not used to quantify response waveshape). For a given stimulus and region, we
required a minimum of at least three active voxels (in total across the left and right hemispheres for
the single-slice sessions) in order to include that stimulus/region combination in the subsequent
analyses. Our overall database collected over all the single-slice sessions included 161 responses for
both HG and STG. Of these, 5 responses for STG and one for HG were excluded because the three-voxel criterion was not met.5
The overall waveshape of responses was quantified using a summary measure, the
“waveshape index”, capable of broadly distinguishing between sustained and phasic responses. The
basic idea behind the “waveshape index” (WI) was to compare the total amount of “transient
activity” (defined as the sum of the onset and offset amplitudes) to the activity at the midpoint of the
response. Secondarily, the WI was designed to yield smaller (more “sustained”) values when the
transient activity was primarily limited to either the onset or offset component, and larger (more
“transient”) values as the onset and offset components approached each other in magnitude. The
exact formulation of the WI was chosen so as to yield a robust measure that stayed within a finite
range. While a single number obviously cannot encapsulate all the dynamics of an fMRI response,
the WI is a convenient measure for summarizing the overall dynamics of a response waveform.
Reasonable behavior of the WI was previously confirmed by examining how well the WI
qualitatively sorted various waveforms (derived from the same underlying database as the present
study; Chap. 3).
Specifically, the WI was defined as:

WI = (1/2) × (On + Off) / [Mid + max(On, Off)],  with WI ∈ [0, 1]        (1)6
Note that the WI depends only on the response waveshape, and is unchanged if a response is scaled
throughout by a constant factor. In some instances (i.e., the figures displaying spatial maps of the
WI), the WI was calculated on a voxel-by-voxel, rather than regional basis, using measures
analogous to On, Mid, and Off for individual voxels.
5 Specifically, the following cases were eliminated from analysis: one instance in STG of the 2/s (5% STF)
noise burst train, one instance in STG of the 35/s noise burst train with a 50% STF, one instance in STG of
music, and both the 2/s and 35/s tone burst trains (4 kHz) in HG and STG of one session (thereby removing
this session entirely from the analysis of the effect of sound bandwidth on response waveshape).
6 Technically, prior to their use in this equation, On, Off, and Mid were all rectified (i.e., negative values were
converted to zero). However, in all other instances in this paper referring to these measures (i.e., outside of the
WI calculation), their amplitudes were not rectified (e.g., the calculations of ∆On, ∆Mid, and ∆Off in Figures
4-6 and 4-7).
Using this definition, maximum values for the WI (i.e., near 1) can only result if the two
transient components are similar in magnitude and are large relative to the midpoint response.
Values near one-half can reflect a response consisting of solely an onset or offset response, or
alternatively a combination of onset and offset activity in a response also having some midpoint
activity. Values near zero reflect a response dominated by the midpoint response (i.e., by the
sustained and/or ramp components).
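For concreteness, the WI calculation of Eq. 1, including the rectification described in footnote 6 and the definition Mid = Sust + Ramp/2, can be written as the short function below (Python); the amplitudes in the examples are made-up values, not measured ones.

```python
# Waveshape index (Eq. 1), with rectification of On, Off, and Mid (footnote 6)
# and Mid defined as Sust + Ramp/2.  Example amplitudes are illustrative only.
def waveshape_index(on, sust, ramp, off):
    mid = max(sust + 0.5 * ramp, 0.0)
    on, off = max(on, 0.0), max(off, 0.0)
    denom = mid + max(on, off)
    return 0.0 if denom == 0 else 0.5 * (on + off) / denom

print(waveshape_index(on=1.0, sust=0.1, ramp=0.2, off=0.9))  # ~0.79: "phasic"
print(waveshape_index(on=0.6, sust=0.0, ramp=0.0, off=0.0))  # 0.50: onset only
print(waveshape_index(on=0.1, sust=1.0, ramp=0.0, off=0.1))  # ~0.09: "sustained"
```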
Additionally, we examined the effect of stimulus changes on the three individual elements
that together constituted the WI – namely, On, Mid, and Off.
Because these measures are not
normalized, they differ from the WI in that they also incorporate information regarding the actual
amplitude of these response features.
Calculating response waveforms
Single-slice experiments – We computed empirical response waveforms by averaging across
repeated presentations of a given stimulus in a given imaging session. Specifically, following
motion correction, image signal vs. time for each voxel was corrected for linear or quadratic drifts in
signal strength over each run, and then normalized so that each voxel had the same (arbitrary) mean
intensity. The resulting time series for each imaging run and voxel was linearly interpolated to a
consistent 2 s interval between images (for the cardiac gated experiments), and then temporally
smoothed using a three point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). A response
“block” was defined as a 70 s window (35 images) that included 10 s prior to stimulus onset, the 30 s
coinciding with the stimulus “on” period, and the 30 s “off” period following the stimulus. These
response blocks were averaged according to stimulus to give an average signal vs. time waveform
for a given stimulus in a session. For each stimulus and session, we further averaged signal vs. time
across the “active” voxels in either HG or STG. The resulting “grand-average” waveform was then
converted to percent change in signal relative to baseline. The baseline was defined as the average
signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the stimulus. Response
waveforms are included to illustrate the signal vs. time of the actual data. However, all of the actual
response quantification was based on the amplitudes of the basis functions as estimated under the
general linear model (i.e., WI, On, Mid, and Off).
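The waveform-averaging steps described above can be sketched as follows (Python/NumPy) for a single voxel; the synthetic data, run lengths, and onset indices are illustrative assumptions only.

```python
# Hedged sketch of the waveform-averaging steps for one voxel: detrend each run,
# normalize, smooth with the 3-point zero-phase filter (0.25, 0.5, 0.25),
# cut 70 s response blocks, average, and convert to percent signal change.
import numpy as np

def average_response(runs, stim_onsets, tr=2.0):
    """runs: list of 1-D signal arrays (one per run, already on a 2 s grid).
    stim_onsets: list of lists of stimulus-onset sample indices for each run."""
    blocks = []
    n_block = int(70 / tr)            # 70 s window = 35 images
    pre = int(10 / tr)                # 10 s before stimulus onset
    for sig, onsets in zip(runs, stim_onsets):
        t = np.arange(len(sig))
        # Remove linear/quadratic drift and normalize to a common mean of 100.
        coefs = np.polyfit(t, sig, 2)
        detrended = sig - np.polyval(coefs, t) + sig.mean()
        detrended = 100.0 * detrended / detrended.mean()
        # Temporal smoothing: symmetric 3-point (zero-phase) filter.
        smoothed = np.convolve(detrended, [0.25, 0.5, 0.25], mode="same")
        for onset in onsets:
            start = onset - pre
            if start >= 0 and start + n_block <= len(smoothed):
                blocks.append(smoothed[start:start + n_block])
    avg = np.mean(blocks, axis=0)
    baseline = avg[pre - 3:pre + 1].mean()       # t = -6 to 0 s
    return 100.0 * (avg - baseline) / baseline   # percent signal change

# Toy usage with synthetic data (two runs, four onsets per run).
rng = np.random.default_rng(0)
runs = [1000 + rng.standard_normal(140) for _ in range(2)]
onsets = [[5, 35, 65, 95], [5, 35, 65, 95]]
print(average_response(runs, onsets)[:5])
```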
Multi-slice experiments – Response waveforms for a given stimulus were again computed by
averaging data across runs to form a single average response over a 70 s window. This averaging
accounted for the staggered timing between stimulus and volume acquisition from run to run, such
that after appropriately interleaving the data, the response was sampled every 2 s relative to the
stimulus. The average response was temporally smoothed and converted to percent change in signal
as described above.
Defining regions of interest
For the single-slice experiments examining posterior auditory cortex, response waveshape
was quantified for two anatomically-defined regions of interest: Heschl’s gyrus (HG) and the
superior temporal gyrus (STG). The borders of these regions were defined in both hemispheres from
the T1-weighted anatomical images (and subsequently down-sampled to the same spatial resolution
as the functional images). When Heschl's gyrus was visible as a “mushroom” protruding from the
surface of the superior temporal plane, the lateral edge of this mushroom defined the lateral edge of
the HG region. The medial edge was the medial-most aspect of the Sylvian fissure. When a distinct
mushroom was not present, the HG region covered approximately the medial third of the superior
temporal plane (extending from the medial-most aspect of the Sylvian fissure). In the superior-inferior dimension, the HG region extended superiorly to the edge of the overlying parietal lobe, and
inferiorly to the superior edge of the superior temporal sulcus (or medial extension thereof). In
general, activation was confined to the superior two-thirds of this region (i.e., clearly centered on the
“mushroom-like” protrusion). The STG region was defined as the superior temporal cortex lateral to
the HG region. The definition of the inferior and superior borders was the same as for the HG
region. Again, a clear preponderance of the active voxels generally occurred within the superior
two-thirds of the STG region.
For the multislice experiments, auditory cortex was divided into three anatomically-defined
regions of interest: Heschl’s gyrus (HG), superior temporal gyrus (STG), and an anterior-medial
region (AM). Conceptually, given the typical antero-lateral to postero-medial course of Heschl’s
gyrus along the superior temporal plane, STG was defined as the cortex lateral and posterior to
Heschl’s gyrus, while AM was defined as the cortex medial and anterior to Heschl’s gyrus.
Therefore, before defining STG and AM, we first identified the medial-lateral and anterior-posterior
extent of Heschl’s gyrus. Unlike the single-slice experiments, the medial edge of the HG region was
not necessarily the medial-most aspect of the Sylvian fissure. Rather, when there was a clear circular
sulcus, the medial aspect of the HG region ended approximately one-half the distance from the
crown of Heschl’s gyrus to the depth of the circular sulcus. The tissue in the depth of this sulcus to
the medial-most aspect of the Sylvian fissure was defined as the AM region. In general, there is little
or no AM in the posterior auditory cortex studied in the single-slice experiments, hence the exclusion
of this region from the single-slice analyses. For the more anterior slices in which no HG was
present, the lateral limit of the AM region was determined by assigning to the AM region a distance
along the cortical surface that was equivalent to that distance in the slice with the most anterior HG
region. The STG region was defined as the cortex lateral to the HG region, or if HG was not present,
as the cortex lateral to AM. (For the most posterior slices in which neither HG nor AM were present,
STG was defined as the entire medial-lateral extent of the superior temporal plane). The formal
definition of the HG and STG regions also differed slightly from the single-slice experiments in one
other respect, which was that the inferior limit of these regions did not include the tissue that was
immediately superior to the superior temporal sulcus. The HG, STG, and AM regions were defined
separately for each hemisphere.
RESULTS
Waveshape dependence on stimulus type in posterior auditory cortex
The waveshape of fMRI responses from posterior auditory cortex depended strongly on the
type of stimulus, as illustrated in Figure 4-2. The eight stimuli represented in this figure have quite
different temporal characteristics, but were similar in spectrum (all were broadband), and sensation
level (all were approximately 55 dB SL). Altogether, the responses in Figure 4-2 range from phasic
(left) to sustained (right).
Figure 4-2: Average responses to eight different stimuli, along with corresponding waveshape
indices (WIs) from individual imaging sessions. Results are shown separately for HG and STG. The
average responses for each stimulus are computed from the same sessions represented by the WI
values. Dashed lines give the standard error of the responses across sessions. WIs near one indicate
a response with approximately equal onset and offset components, and comparatively low signal
near the “midpoint” of the “sound on” period (i.e., “phasic” responses). Conversely, WIs near zero
indicate a response with small onset and offset components, and comparatively large signal near the
midpoint (i.e., “sustained” responses). The data in this figure is restricted to the responses collected
at a stimulus level of ~55 dB SL and from sessions that did not involve a task. Individual noise
bursts were always broadband, and were 25 ms in duration.
[Figure 4-2 appears here: for each of the eight stimuli (100/s clicks, 35/s noise bursts, continuous noise, 35/s clicks, 10/s noise bursts, 2/s noise bursts, speech, and music), individual-session WIs and the average response (% signal change vs. time over the 30-s “sound on” period), shown separately for Heschl’s gyrus and the superior temporal gyrus (N = 2 to 27 sessions per stimulus).]
Figure 4-2 shows that phasic and sustained responses can be elicited by a variety of stimuli.
For instance, both 100/s clicks and 35/s noise bursts produced highly phasic responses in HG and
STG. These phasic responses were characterized by a prominent signal decline (80-120%) following
an initial signal peak (at ~ 6 s), and a clear peak after sound offset (at ~ 36 s). These stimuli
consistently evoked phasic responses in individual sessions, as evidenced by the preponderance of
WIs greater than 0.6 – values indicative of responses with distinctly phasic waveforms (Chap. 3). At
the other end of the waveshape spectrum, 2/s noise bursts, speech, and music elicited primarily
sustained responses. For these stimuli, the average waveforms show only a small signal decline
following the initial peak (25-30% declines, except for speech in STG, which showed a 40%
decline), so the response near the midpoint of the “sound on” period remains elevated. Additionally,
the waveforms lack a distinct peak after sound offset. In some individual sessions, the responses to
these stimuli were more phasic than indicated in the average waveforms, due primarily to a larger
signal decline following the initial peak, although there was also a small “off-peak” in some
instances. Overall however, the individual responses to 2/s noise bursts, speech, and music were
primarily sustained, as indicated by the typically low values for their WIs.
The responses evoked by 35/s clicks, 10/s noise bursts, and continuous noise were
“intermediate” in waveshape in that they displayed a blend of sustained and transient activity. The
average waveforms for these stimuli exhibited an “intermediate” degree of signal decline (50-75%),
and displayed either a small “off-peak” (e.g., 35/s clicks and continuous noise in STG) or evidence
for a possible “hidden” off-response in the form of a slightly prolonged elevation in signal following
stimulus termination (e.g., 35/s clicks and continuous noise in HG, and 10/s noise bursts).
Consistent with the average waveforms, the WIs for the individual sessions also fell within an
intermediate range.
While Figure 4-2 demonstrates that response waveshape depends strongly on stimulus, a
comparison with the data taken in our separate level experiments (not included in Figure 4-2)
suggests that certain tasks may also affect response waveshape. For music, there was a tendency for
responses from our previous level experiments to be slightly more phasic [mean WIs of 0.17 (HG)
and 0.27 (STG) for responses at 50 dB SL, as compared to mean WIs of 0.09 (HG) and 0.12 (STG)
in Figure 4-2]. For continuous noise, responses from the level experiments were clearly more phasic
[mean WIs of 0.58 (HG) and 0.74 (STG) for responses at 55 dB SL, compared to 0.34 (HG) and 0.49
(STG) in Figure 4-2]. The finding of more phasic responses in these level experiments may be
highly specific to the particular task required, which was performed only at stimulus onset and
offset.7 Thus, we cannot conclude that all tasks will affect response waveshape, or will do so in the
same way. While these comparisons suggest that factors in addition to stimulus characteristics may
influence response waveshape, the wide variations in waveshape that occur across stimuli when task
is held constant (e.g., Figure 4-2) indicate that stimulus characteristics are a major determinant of
response waveshape.
Waveshape dependence on modulation rate in posterior auditory cortex
Since the stimuli in Figure 4-2 primarily differ in their temporal characteristics, the data
strongly suggest that stimulus temporal characteristics play a prominent role in determining the
dynamics of auditory fMRI responses. For instance, stimulus modulation rate had a clear effect on
response waveshape in that, on average, higher rate stimuli (i.e., 100/s clicks and 35/s noise bursts)
typically elicited phasic responses, whereas the stimuli dominated by low modulation rates (2/s noise
bursts, speech, and music) elicited more sustained responses.
The dependence of waveshape on rate held within individual sessions, as well as on average.
In our experiments, rate was varied within session in three ways: (1) between 2/s and 35/s for noise
bursts (STF = 5% for 2/s, 88% for 35/s), (2) from 2/s to 10/s to 35/s for noise bursts (STFs of 5%,
25%, and 88%, respectively), and (3) between 35/s and 100/s for clicks. In the 23 sessions that used
both 2/s and 35/s noise burst trains, the WI was greater for 35/s in all but one instance in HG, and in
every instance in STG (Table 4-1). This consistent difference in WI was also reflected in the
individual elements that together defined the WI. Specifically, the transient components of the
physiological basis set – onset (On) and offset (Off) – were almost always greater at 35/s than 2/s,
whereas the midpoint signal level (Mid) was almost always less at 35/s than 2/s (Table 4-1). For all
seven sessions using 2/s, 10/s, and 35/s noise burst trains8 (a subset of the preceding 23 sessions), the
WI in both HG and STG for the 10/s train was greater than the value for the 2/s train, and less than
the value for the 35/s train. For the two sessions that used both 35/s and 100/s clicks, the WI for
100/s clicks was greater than that for 35/s clicks in both HG and STG. Thus, the intrasession data
strongly indicate a change toward more phasic responses with increasing rate.
[Footnote 7: We believe that the comparisons reflect a task effect, rather than a motion artifact, because (1) there was no sign of motion correlated with the onset and offset of the stimuli, and (2) the waveshape difference between the task and no-task conditions was greater for one stimulus (continuous noise) than for the other (music), despite the fact that the task was always the same.]
35/s vs. 2/s noise burst trains

                      ∆WI            ∆On            ∆Mid            ∆Off
HG   N+/Ntotal        22/23          22/23          1/23            22/23
     mean ± ste       0.41 ± 0.03    1.14 ± 0.14    -0.78 ± 0.13    1.00 ± 0.12
STG  N+/Ntotal        22/22          19/22          1/22            22/22
     mean ± ste       0.42 ± 0.03    0.99 ± 0.17    -1.01 ± 0.13    1.04 ± 0.12
Table 4-1: Differences in WI, On, Mid, and Off between responses to a 35/s and 2/s noise burst train
that were obtained in the same imaging session. Ntotal is the total number of sessions for which such
a comparison was available. N+ is the number of sessions for which the difference (35/s minus 2/s)
was positive. In both HG and STG, the difference between the two trains was significant for all
measures at p < 0.001 (signed rank test). ste = standard error of the mean.
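As a concrete illustration of the within-session comparisons summarized in Table 4-1, the following is a minimal sketch (Python with NumPy and SciPy; not the analysis code used in this thesis) of how paired WI values for two trains could be differenced and tested with a signed rank test. The WI values shown are placeholders, not the thesis data.

    import numpy as np
    from scipy.stats import wilcoxon

    # Hypothetical per-session waveshape indices for the same imaging sessions
    wi_35 = np.array([0.72, 0.65, 0.80, 0.58, 0.61])   # 35/s noise burst train
    wi_2  = np.array([0.25, 0.30, 0.35, 0.22, 0.28])   # 2/s noise burst train

    delta_wi = wi_35 - wi_2                    # within-session difference (35/s minus 2/s)
    n_pos = int(np.sum(delta_wi > 0))          # N+ : sessions with a positive difference
    mean = delta_wi.mean()
    ste = delta_wi.std(ddof=1) / np.sqrt(delta_wi.size)

    stat, p = wilcoxon(wi_35, wi_2)            # signed rank test on the paired values
    print(f"N+/Ntotal = {n_pos}/{delta_wi.size}, mean ± ste = {mean:.2f} ± {ste:.2f}, p = {p:.3f}")

The same pattern applies to the On, Mid, and Off component amplitudes.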
In the comparisons just described, burst or click duration remained constant across rates, so
rate co-varied with STF. Since this raises the possibility that waveshape is controlled by STF, and
not rate, we examined the effects of rate while holding STF constant. Figure 4-3 shows that even
when STF was matched at 50%, the average response waveform for a 35/s noise burst train was still
more phasic than that for a 2/s train. In every session (5/5 for HG, 4/4 for STG), the WI for 35/s was
greater than for 2/s, indicating more phasic responses at 35/s. However, the response difference
between these two noise burst trains with equal STF was less pronounced (average ∆WI ~ 0.24) than
the difference between trains of the same two rates but for which STF also co-varied markedly with
rate (from 88% for 35/s bursts to 5% for 2/s bursts; average ∆WI between these latter two trains was
~ 0.44 for the same sessions). Thus, these results provide further support for the view that response
waveshape depends strongly on stimulus rate, but suggest that STF may also influence response
waveshape, as we examine in more detail next.
[Footnote 8: The seven sessions represent five subjects, since two subjects were studied twice.]
Figure 4-3: Average responses to 2/s and 35/s noise burst trains, each having a sound-time fraction of
50%. The averages are based on responses collected in the same imaging sessions (N = 5 for HG, 4
for STG). [Figure 4-3 appears here: percent signal change vs. time (0–30 s “sound on”) in Heschl’s
gyrus and the superior temporal gyrus, with the 2/s and 35/s traces overlaid.]
Waveshape dependence on sound-time fraction in posterior auditory cortex
STF is a second stimulus temporal characteristic that influences auditory response dynamics.
This was determined from 7 sessions (6 subjects) that varied STF while holding rate constant.
[Figure 4-4 appears here: “Waveshape vs. STF, 35/s Noise Bursts” — WIs and average responses (percent signal change vs. time, 0–30 s) for sound-time fractions of 25%, 50%, and 88%, shown separately for Heschl’s gyrus and the superior temporal gyrus.]
Figure 4-4: Examination of the effect of sound-time fraction on response dynamics for 35/s noise
burst trains with STFs of 25%, 50%, or 88%. Results are shown separately for HG and STG. The
left panels plot the WIs for individual sessions. The values for the 88% STF are limited to the
sessions with data at either of the other two STFs. The right panels plot the response for each STF,
averaged across the same sessions represented by the WI values (N = 2, 6, and 7 for 25%, 50%, and
88% in HG; N = 2, 5, 6 in STG). One subject contributes twice (i.e., two sessions) to the results for
the STFs of 50% and 88% in both HG and STG.
Figure 4-4 shows WIs and average response waveforms for 35/s noise burst trains with STFs of 25%,
50%, or 88%. For the 6 sessions with data for STFs of 50% and 88%, there was clearly an effect of
STF in HG, where the WI was greater for the 88% STF in all sessions (p = 0.03, signed rank test;
∆WI = 0.16 ± 0.04). Changes in On, Mid, and Off also occurred in a consistent direction (On and
Off: larger for the 88% STF in either 5 or 6 of 6 cases; Mid: smaller for the 88% STF in 5 of 6 cases;
p ≤ 0.06 for each comparison). However, the changes in On, Mid, and Off were generally small,
consistent with the fact that there were only small differences between the average response
waveforms for the 50% and 88% STFs (∆On = 0.39 ± 0.07; ∆Mid = -0.35 ± 0.09; ∆Off = 0.43 ±
0.16). In STG, the WI was greater for the 88% STF in 4 of 5 cases9 (p = 0.2), but none of the
changes in On, Mid, or Off approached statistical significance (p > 0.3). However, in the two
sessions with data for a larger STF differential of 25% and 88%, responses were more phasic (i.e.,
higher WI) at the higher STF in both HG and STG. This suggests that a larger STF differential
might have resulted in larger, more robust changes in response dynamics. Given the consistent
(albeit small) difference between the 50% and 88% STFs in HG, and in light of the results for the
25% STF in both HG and STG, the data overall indicate a tendency for responses to a 35/s noise
burst train to become less phasic with decreasing STF.
There was also evidence of an effect of STF on the response dynamics of a low rate (2/s)
noise burst train. In particular, in five sessions with responses to a 2/s noise burst train with STFs of
5% and 50%, On was greater at the 50% STF in all cases, in both HG and STG (p = 0.06, signed
rank test; HG: ∆On = 0.68 ± 0.33; STG: ∆On = 0.57 ± 0.12). However, a consistent change in either
Mid or Off was not observed (p > 0.6). Overall, the net effect on the average response waveform was
a slightly more pronounced “on-peak” at the 50% STF [cf. the 2/s (50% STF) average waveforms
in Figure 4-3 to the 2/s (5% STF) average waveforms in Figure 4-2]. Again, more pronounced
changes in response waveshape might have been observed using a larger STF differential (e.g., 80%
vs. 5%). Altogether, the experiments varying modulation rate and STF indicate that rate and STF are
two temporal characteristics of a stimulus that influence the waveshape of responses from auditory
cortex.
[Footnote 9: There was one less case available for comparison of the 50% and 88% STFs in STG than HG, due to an insufficient number of “active” voxels for the 50% STF stimulus in one session.]
[Figure 4-5 appears here: “Waveshape vs. Level (35/s noise bursts)” and “Waveshape vs. Rate (70 dB SL)” — percent signal change vs. time (0–30 s) for each condition.]
Figure 4-5: Left: Responses to a 35/s noise burst train at three different stimulus levels (40, 55, and
70 dB SL). Right: Comparison of the responses to 2/s and 35/s trains at a common level of ~70 dB
SL. Individual noise bursts were broadband, and were 25 ms in duration. All responses are taken
from a single imaging session.
Insensitivity of waveshape to sound level in posterior auditory cortex
Unlike changes in stimulus temporal characteristics, variations of stimulus level over a 30 to
40 dB range did not result in strong, systematic changes in response waveshape. This is illustrated in
Figure 4-5, which shows responses from Heschl’s gyrus for one session. The left panel shows that
phasic responses were elicited by 35/s noise burst trains (88% STF) regardless of level (40, 55, and
70 dB SL), whereas the right panel illustrates the markedly different responses produced by 35/s and
2/s noise burst trains of comparable level (70 dB SL). Any change in waveshape with level was far
less than the change in waveshape with rate.
Altogether, three sets of experiments examined the influence of stimulus level on response
waveshape, and their results support the impression from Figure 4-5. Two sets of experiments
examined the effect of sound level on waveshape and compared it with the effect of a change in
stimulus temporal characteristics. One of these sets used 35/s noise bursts (88% STF) of various
levels (40, 55, 70 dB SL) and 2/s noise bursts (5% STF) for comparison. The second set used
continuous noise of various levels (35 – 75 dB SL) and music (50 dB SL) as the comparison
stimulus. A third set of experiments examined the effect of stimulus level on waveshape using music
as the stimulus, but did not include a comparison stimulus with different temporal characteristics.
The effect of level on WI, onset component, midpoint activity, and offset component was quantified
as follows. For each session and measure, we subtracted the minimum value for a given measure
across all levels from the maximum value across all levels, thus obtaining the largest absolute
difference across all levels.
This difference was assigned either a positive or negative sign depending on whether the maximum
value occurred for the higher or lower level, respectively.
The resulting values are plotted in the “Level Change” columns in Figure 4-6 (unfilled symbols).
For the sessions that also included a “comparison” stimulus, the difference in WI, onset, midpoint,
and offset between the standard and comparison stimuli (at comparable levels) was calculated.10 A
sign was assigned based on the following convention: continuous noise minus music, and 35/s minus
2/s noise bursts. The resulting values are plotted in the “Temporal Change” columns in Figure 4-6
(filled symbols).
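A minimal sketch of this sign convention follows (an interpretation of the procedure described above, not the original analysis code; the level and measure values are hypothetical).

    import numpy as np

    def signed_level_change(levels_db, values):
        # Largest max-minus-min difference across levels, signed positive if the
        # maximum value occurred at a higher level than the minimum value.
        levels_db = np.asarray(levels_db, dtype=float)
        values = np.asarray(values, dtype=float)
        i_max, i_min = int(np.argmax(values)), int(np.argmin(values))
        diff = values[i_max] - values[i_min]
        return diff if levels_db[i_max] > levels_db[i_min] else -diff

    # Hypothetical WI for a 35/s noise burst train at 40, 55, and 70 dB SL
    print(signed_level_change([40, 55, 70], [0.68, 0.74, 0.71]))   # +0.06 (maximum at the higher level)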
Overall waveshape, as summarized by the WI, was not affected in a systematic manner by
changes in stimulus level. The changes in waveshape index (∆WI) due to the changes in level were
distributed roughly equally between positive and negative values in both HG and STG (p > 0.15,
signed rank test; Figure 4-6, top row). In contrast, for the same sessions, the ∆WI due to a stimulus
temporal change were always the same sign (p = 0.004). Furthermore, in all but one instance (in
HG), the ∆WI due to the temporal change were greater than the ∆WI due to the level change
(irrespective of sign), indicating a clear effect in the magnitude of ∆WI as well.
[Footnote 10: Since stimuli were compared for a given sensation level, their energy differed slightly. For continuous noise and music, this difference amounted to less than 5 dB; for 35/s and 2/s noise bursts the difference was ~12 dB.]
Figure 4-6: Changes in four different measures, due to either a change in stimulus level (unfilled
symbols) or a change in stimulus temporal characteristics (filled symbols). All values represent a
difference based on a within-session comparison. The values in the “Level Change” columns
represent the largest change in a given measure between 3 – 4 levels presented over a 30-40 dB
range. (Diamonds: 35/s noise burst train; Up triangles: continuous noise; Down triangles: music).
Four of the five subjects in the sessions using music were also subjects in the sessions using
continuous noise. The values in the “Temporal Change” columns represent the change in a given
measure due to a large change in stimulus temporal characteristics. [Circles: Difference between
values for 35/s vs. 2/s noise burst trains at either 55 dB SL (solid circles) or 70 dB SL (circles with
‘+’ symbol); Squares: Difference between values for continuous noise vs. music (50-55 dB SL)].
The “Temporal Change” values were obtained in a subset of the sessions that yielded the “Level
Change” values (specifically, the sessions that varied the level of 35/s noise burst trains and
continuous noise).
[Figure 4-6 appears here: “Effect of Level on Waveshape” — ∆WI, ∆On, ∆Mid, and ∆Off for the “Level Change” vs. “Temporal Change” comparisons, plotted separately for Heschl’s gyrus and the superior temporal gyrus.]
Examination of the individual waveshape components revealed a similar lack of a systematic
level effect for the onset and midpoint. The changes in both measures (∆On and ∆Mid) due to the
changes in level were equally distributed around zero for both HG and STG (p > 0.3; Figure 4-6,
second and third rows). In contrast, the ∆On and ∆Mid due to a stimulus temporal change were
almost always the same sign (p ≤ 0.01). In terms of the magnitude of the changes, there was
appreciable overlap of the ∆On between the “Level Change” vs. “Temporal Change” comparisons.
However, there was little such overlap for ∆Mid, suggesting that the similar lack of overlap of ∆WI
between the two comparisons was largely driven by the changes in midpoint activity.
Unlike the onset and midpoint, there was evidence for a consistent effect of level on the
offset. The changes in offset (∆Off) due to level were typically positive (p = 0.01 in HG, 0.15 in
STG). As with ∆On, there was considerable overlap of the ∆Off between the “Level Change” and
“Temporal Change” comparisons. Nevertheless, the ∆Off due to a stimulus temporal change were
always positive, and, on average, larger than the changes due to level. Thus, while level appears to
have an effect on response offset, the influence of stimulus temporal properties was both larger (on
average) and more consistent in the direction of the effect. Altogether, these analyses of ∆WI, ∆On,
∆Mid, and ∆Off indicate that response waveshape is affected in a more systematic manner by
changes in stimulus temporal characteristics than by changes in stimulus level.
Insensitivity of waveshape to sound bandwidth in posterior auditory cortex
In another set of experiments we demonstrated that response waveshape was not
systematically affected by changes in stimulus bandwidth. Altogether, these experiments were
designed to allow for comparisons both within and across the factors of rate and bandwidth, for two
different center frequencies of band-limited stimuli. In particular, responses were collected to
broadband noise burst trains and narrowband trains composed of either tone bursts (3 sessions) or
filtered (1/3 octave) noise bursts (2 sessions). The repetition rates were either 2/s or 35/s, and the
center frequencies of the narrowband bursts were either 500 Hz or 4 kHz. In one session the
narrowband stimuli (4 kHz tone bursts at 2/s and 35/s) did not satisfy our criterion of at least three
active voxels (in either HG or STG), so this session was excluded from the sensitivity analysis.
Changes in stimulus bandwidth from broadband to narrowband did not have a consistent
effect on the response dynamics. The changes in waveshape index (∆WI) between broadband and
narrowband trains of the same rate were clustered around zero in both HG and STG (p > 0.4, signed
rank test; Figure 4-7, top row, “Bandwidth Change” columns). In contrast, for the same sessions, the
∆WI due to a change of rate for trains of the same bandwidth were always positive (p = 0.02 in HG
and STG; “Rate Change” columns in Figure 4-7), and in all but one instance (in STG) were greater
than the ∆WI due to the bandwidth change. The lack of a consistent effect of bandwidth on ∆WI
also held for the individual waveshape components (Figure 4-7, bottom three rows). The ∆On,
∆Mid, and ∆Off due to a change in bandwidth were approximately evenly distributed around zero (p
= 0.06 for ∆On in STG; p = 0.05 for ∆Off in HG; otherwise, p > 0.35). In contrast, the ∆On, ∆Mid,
and ∆Off due to a change in rate were consistently the same sign (p ≤ 0.03). These results indicate
that bandwidth did not have a consistent effect on the sustained vs. phasic nature of a response
waveform, in contrast to the highly consistent and robust effects of rate.
Response waveshapes throughout auditory cortex for music and 35/s noise bursts
Altogether, the results of the single-slice experiments indicate that response waveshape
depends strongly on stimulus temporal characteristics, but not level or bandwidth. However, the data
supporting these conclusions were obtained for a limited region of auditory cortex – namely, the
most posterior aspect of HG and immediately lateral STG. Three multislice experiments using a
low-rate stimulus (music) and a high-rate stimulus (35/s noise bursts, 88% STF) were designed to
test whether or not stimulus temporal characteristics have a profound influence on response
waveshape throughout auditory cortex.
By imaging multiple slices, we ensured that the imaged volume included the full array of
cytoarchitectonically- or histologically-defined auditory areas
in humans, including both primary and non-primary auditory cortex (Galaburda and Sanides 1980;
Rivier and Clarke 1997; Morosan et al. 2001).
Figure 4-7: Changes in four different measures, due to either a change in stimulus bandwidth
(unfilled symbols) or a change in rate (filled symbols). All values represent a difference based on a
within-session comparison. The values in the “Bandwidth Change” columns are the difference in
each measure between a broadband noise burst train and a narrowband train of the same rate. The
narrowband stimuli were either tone bursts or filtered noise bursts, with center frequencies (Fc) of
either 500 Hz or 4 kHz. (Empty diamonds: Rate of the broadband and narrowband trains was 35/s
and Fc of the narrowband train was 500 Hz; Diamonds with ‘+’: Rate = 35/s, Fc = 4 kHz; Empty
squares: Rate = 2/s, Fc = 500 Hz; Squares with ‘+’: Rate = 2/s, Fc = 4 kHz). The values in the “Rate
Change” columns are the difference in each measure between a 35/s and 2/s train. (Circles: Trains at
both rates were broadband; Up Triangles: Trains were narrowband with Fc = 500 Hz; Down
Triangles: Trains were narrowband with Fc = 4 kHz). The “Rate Change” values were obtained in
the same sessions that yielded the “Bandwidth Change” values. [Figure 4-7 appears here: “Effect of
Bandwidth on Waveshape” — ∆WI, ∆On, ∆Mid, and ∆Off for the “Bandwidth Change” vs. “Rate
Change” comparisons, plotted separately for Heschl’s gyrus and the superior temporal gyrus.]

Figure 4-8: Maps of WI for music and 35/s noise bursts (88% STF) for a single subject across a
broad expanse of auditory cortex. Each panel shows a color WI map (with in-plane resolution of 3.1
x 3.1 mm) superimposed on a T1-weighted anatomic image (acquired at a resolution of 1.6 x 1.6
mm). The WI is displayed for each “active” voxel (p < 0.001 in the activation maps). Slice position
is referenced relative to the most posterior slice with a distinct Heschl’s gyrus. This slice (denoted
“0 mm”) most likely encompassed a sizable portion of primary auditory cortex in the anterior-posterior
dimension (Liegeois-Chauvel et al. 1991; Rademacher et al. 1993; Rademacher et al. 2001), and was
the slice-plane employed in the single-slice experiments. In this subject, the posteromedial aspect of
Heschl’s gyrus occurred in the same imaging slice for both hemispheres. Images are displayed in
radiological convention, so the subject's right is displayed on the left. [Figure 4-8 appears here:
“Spatial Maps of Waveshape Index” for music and 35/s noise bursts at slice positions from 28 mm to
-7 mm relative to posterior HG, with a color scale running from sustained to phasic responses.]

Figure 4-8 displays a spatial map of the WI for one of the sessions using multiple slices. The
results are representative of what was observed in the other two subjects. For both music and 35/s
noise bursts, active voxels occurred across a wide expanse of the superior temporal plane. However,
the WIs for music and 35/s noise bursts occupied distinct ranges in all slices, indicating that the
response waveshape to these two stimuli was dramatically different throughout cortex. In particular,
the majority of active voxels for music had a WI less than 0.2, indicating primarily sustained
responses. In contrast, for the 35/s noise burst train, the majority of active voxels had a WI greater
than 0.4, indicating “intermediate” or phasic responses. The voxels activated by the two stimuli
were largely overlapping, indicating that the same regions of cortex could show either phasic or
sustained responses depending on stimulus.11
Figure 4-9 illustrates the difference in response
waveshape between these two stimuli for HG and STG from the left hemisphere of slice “0 mm” in
Figure 4-8 (i.e., the slice employed in the “single-slice” experiments).
Similar differences in
waveshape were observed in the other slices and subjects (consistent with the spatial maps of WI).
Overall, the dramatic difference in waveshape for music vs. 35/s noise bursts in widespread cortical
areas indicates that stimulus temporal characteristics strongly influence waveshape throughout
auditory cortex.
To further quantify response waveshape across cortex, we computed the average WI for the
three multislice sessions as a function of slice position, for each of three anatomically defined
regions: HG, STG, and AM. The results, shown in Figure 4-10, confirm that the WIs for music and
35/s noise bursts occupied distinct ranges in all slices, with music eliciting sustained responses
throughout cortex, and the noise bursts eliciting phasic responses throughout cortex. At a more
refined level of analysis, Figure 4-10 suggests the possibility of small variations in WI for a given
stimulus, either 1) across slices for a given region, or 2) across regions for a given slice (e.g., the
differences between HG, STG, and AM for 35/s noise bursts). While no conclusions could be drawn
in this regard, due to the small number of total hemispheres and variability across hemispheres, the
data do suggest a waveshape difference between HG and STG that proved to be significant in the
more extensive single-slice database.
[Footnote 11: The activation maps for the 35/s noise burst trains were based on approximately four times more data than for the music maps. If equal amounts of data had been obtained for each stimulus, it is likely that there would have been more active voxels in the music map relative to the noise bursts (for a constant p-value threshold).]
[Figure 4-9 appears here: responses to music and 35/s noise bursts (percent signal change vs. time, 0–30 s) for Heschl’s gyrus and the superior temporal gyrus.]
Figure 4-9: Responses to music and 35/s noise bursts (88% STF) for HG and STG, averaged across
the “active” voxels in the left hemisphere of slice “0 mm” in Figure 4-8.
[Figure 4-10 appears here: “Waveshape Index as a Function of Slice Position and Region” — WI vs. position relative to (probable) primary auditory cortex (mm) for HG, STG, and AM, plotted separately for 35/s noise bursts and music.]
Figure 4-10: Average WI for the three multislice sessions as a function of slice position, for each of
three anatomically defined regions: HG, STG, and AM. These regions were defined on a slice-by-slice basis, separately for each hemisphere. The WI for each region/slice/hemisphere combination
was computed for the “active” voxels (p < 0.001) for each stimulus (provided there were at least
three such voxels). The resulting WIs were then averaged across hemispheres, after first aligning the
hemispheres in the anterior-posterior dimension according to the most posterior slice with a distinct
Heschl’s gyrus (denoted “0 mm”; see Figure 4-8 caption). In some hemispheres, for the slice
posterior to “0 mm”, there appeared to be a remnant of Heschl’s gyrus emerging from insular cortex
(rather than the superior temporal plane). This cortex was classified as HG (thence the data point for
HG at “-7 mm”). For all regions and slices, only data points that represent an average of WIs across
at least three hemispheres (out of six total) are included. (This criterion, in conjunction with the
“three active voxels” requirement, resulted in no values for the AM region for the music stimulus).
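A minimal sketch of the slice-by-slice aggregation described in this caption might look like the following (illustrative only; not the original analysis code, and with placeholder WI values).

    from collections import defaultdict

    # (region, slice position in mm, hemisphere) -> WI over the "active" voxels of that hemisphere
    wi_by_hemisphere = {
        ("HG", 0, "S1-left"): 0.55, ("HG", 0, "S1-right"): 0.60,
        ("HG", 0, "S2-left"): 0.52, ("HG", 0, "S2-right"): 0.58,
        ("STG", 0, "S1-left"): 0.70, ("STG", 0, "S1-right"): 0.66,
    }

    grouped = defaultdict(list)
    for (region, slice_mm, _hemi), wi in wi_by_hemisphere.items():
        grouped[(region, slice_mm)].append(wi)

    MIN_HEMISPHERES = 3   # keep a point only if at least three of the six hemispheres contribute
    average_wi = {key: sum(vals) / len(vals)
                  for key, vals in grouped.items() if len(vals) >= MIN_HEMISPHERES}
    print(average_wi)     # {('HG', 0): 0.5625}; the STG point is dropped (only two hemispheres)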
Differences in response waveshape between cortical areas
In the single slice experiments examining posterior auditory cortex, there was clear evidence
that responses in STG tended to be slightly more phasic than those in HG. This difference between
HG and STG is already suggested by several aspects of the average response waveforms in Figure
4-2. For example, the percentage decline following the on-peak was noticeably greater in STG than
HG for continuous noise, 35/s clicks, and 10/s noise bursts.
Additionally, the off-peak was
noticeably larger or more distinct in STG for 35/s noise bursts, continuous noise, and 35/s clicks.
A statistically significant trend for more phasic responses in STG relative to HG was
confirmed for the stimuli that typically evoked “intermediate” or phasic responses, but not for the
stimuli that typically evoked sustained responses. Figure 4-11 (top left) plots the WI in HG vs. STG.
The stimuli were divided into five broad “groups”: (1) a “high rate or STF” group that included the
100/s clicks and the 35/s bursts (both broadband and narrowband) with a STF of 88% (filled
diamonds), (2) continuous noise (filled squares), (3) an “intermediate rate or STF” group that
included the 10/s noise bursts, 35/s clicks, and 35/s noise bursts with a STF of 25% or 50% (filled
stars), (4) a music and speech group (unfilled triangles), and (5) the 2/s bursts (including tone and
noise bursts, with STFs of 5% and 50%; unfilled circles). For the “high rate or STF” group,
continuous noise, and the “intermediate rate or STF” group, the WI was greater in STG than HG in
most cases (Table 4-2). But for the music and speech group, and the 2/s bursts, the difference in WI
between STG and HG was not significant. Thus, the evidence for more phasic responses in STG was
strongest for the stimuli that tended to evoke phasic or “intermediate” responses.
[Figure 4-11 appears here: scatterplots of WI, On, Mid, and Off, with values for Heschl’s gyrus on the abscissa and the superior temporal gyrus on the ordinate.]
Figure 4-11: Scatterplots of WI, On, Mid, and Off in HG vs. STG. A separate point is plotted for
each stimulus of each imaging session (provided there were at least three “active” voxels in both HG
and STG for a given stimulus). The data set consists of all the responses obtained in the single-slice
experiments that did not involve a task. Filled diamonds: 100/s clicks or the 35/s bursts (broadband
and narrowband) with a STF of 88%; filled squares: continuous noise; filled stars: 10/s noise bursts,
35/s clicks, or 35/s noise bursts with a STF of 25 or 50%; unfilled circles: all tone and noise bursts
with a rate of 2/s; unfilled triangles: music or speech. The following pairs of data points (HG,STG)
did not fall within the chosen axis limits (but were included as part of the statistical analysis in Table
4-2) – On: (-0.82,-0.66), (3.16,4.11); Mid: (2.9,4.9), (-1.30,-1.32), (-0.95,-1.77).
STG vs. HG

Stimulus group             ∆WI               ∆On               ∆Mid              ∆Off
High rate or STF           31/40 (<0.001)    23/40 (0.15)      18/40 (0.14)      32/40 (<0.001)
                           0.062 ± 0.013     0.125 ± 0.071     -0.050 ± 0.05     0.259 ± 0.055
Continuous noise           5/5 (0.06)        5/5 (0.06)        2/5 (1.0)         5/5 (0.06)
                           0.148 ± 0.033     0.371 ± 0.067     0.000 ± 0.200     0.386 ± 0.074
Intermediate rate or STF   15/18 (<0.001)    17/18 (<0.001)    7/18 (0.47)       14/18 (0.01)
                           0.113 ± 0.024     0.473 ± 0.085     -0.106 ± 0.154    0.353 ± 0.114
Music or Speech            13/21 (0.13)      10/21 (0.59)      17/21 (0.01)      19/21 (<0.001)
                           0.030 ± 0.017     -0.032 ± 0.081    0.398 ± 0.153     0.530 ± 0.099
2/s bursts                 19/33 (0.14)      22/33 (0.02)      24/33 (0.006)     28/33 (<0.001)
                           0.025 ± 0.012     0.122 ± 0.064     0.229 ± 0.074     0.248 ± 0.065
Table 4-2: Differences in various measures between responses in STG and HG that were obtained in
the same imaging session. See text for stimuli included in each stimulus group. First row in each
cell is N+/Ntotal (see Table 4-1 caption) and the p-value (in parentheses) resulting from a signed rank test.
Second row is the mean difference (STG minus HG) ± standard error.
Examination of the individual response components (On, Mid, and Off) provided further
insight into the nature of the response differences between STG and HG. In general, both On and
Off tended to be larger in STG than HG, for all five broad “groups” of stimuli (a clear exception
being On for the music and speech group; Table 4-2 and Figure 4-11).12 However, Mid only showed
a difference between STG and HG for certain groups (music and speech, 2/s noise bursts; Figure
4-11, lower left). Overall, the following picture emerges (even if the details do not hold exactly, in a
statistical sense, for all three components of all five stimulus groups; Table 4-2). For the stimuli that
typically evoked sustained responses, the amplitude of On and/or Off tended to be larger in STG.
However, Mid also increased in STG for these stimuli, so that the end result was simply larger
sustained responses in STG, and no consistent difference in the WI between the two regions. In
contrast, for the stimuli that typically evoked “intermediate” or phasic responses, On and Off tended
to be larger in STG, but Mid showed no difference, hence the more phasic responses in STG.
[Footnote 12: Note that Off was frequently negative in both HG and STG for the 2/s bursts, and the music and speech group (although less often in STG; Figure 4-11, lower right). These negative values must be interpreted in the framework of the general linear model from which the component amplitudes were derived. Specifically, a negative value for Off acts to model a response with a faster signal decline following stimulus offset than the signal decline modeled in the sustained and ramp components of the basis set. In this sense, one interpretation of the larger values for Off in STG is that sustained responses in STG tend to have slightly slower signal declines than those in HG.]
We computed two additional, normalized measures for quantifying the differences in signal
decline between STG and HG, as well as the strength of the off-response in the two regions. The
first was the ratio of Mid to the amplitude of the on-peak (OnPeak; defined as the amplitude of the
onset (On) plus sustained (Sust) components of the basis set). Conceptually, this ratio reflects the
amount of ongoing activity approximately midway through the “sound on” period relative to the
activity evoked by the onset of the stimulus. For the “high-rate or STF” group, continuous noise,
and the “intermediate rate or STF” group, this ratio was significantly lower in STG than HG (p ≤
0.06 for all three groups, signed rank test; Figure 4-12, left). However, no significant difference was
observed for 2/s bursts or the music and speech group (p > 0.6). These results are consistent with
what might be expected based on the population data for On and Mid. The second normalized
measure was the ratio of Off to OnPeak. (It is difficult to assess from Figure 4-11 whether this ratio
might differ between STG and HG). This ratio was consistently higher in STG than HG for all
stimulus groups (p ≤ 0.06; Figure 4-12, right). Together, these ratios provide greater detail regarding
the two transient peaks of the phasic response. Namely, for the intermediate and phasic responses,
the signal decline from the on-peak was typically larger in STG than HG, and the relative amount of
activity evoked by stimulus termination versus stimulus onset was also typically larger in STG.
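For clarity, a minimal sketch of these two normalized measures follows (the component amplitudes are hypothetical, not fitted values from this study).

    def waveshape_ratios(on, sust, mid, off):
        # Return (Mid/OnPeak, Off/OnPeak), where OnPeak = On + Sust (as defined in the text).
        on_peak = on + sust
        return mid / on_peak, off / on_peak

    # Hypothetical amplitudes for an "intermediate"/phasic response
    print(waveshape_ratios(on=1.2, sust=0.8, mid=0.4, off=0.9))   # STG-like: (0.20, 0.45)
    print(waveshape_ratios(on=1.0, sust=1.0, mid=0.8, off=0.6))   # HG-like:  (0.40, 0.30)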
[Footnote 13: Recall that the onset component of the basis set models the transient aspect of the response following stimulus onset. The overall amplitude of the response following stimulus onset is best modeled by the sum of the onset and sustained components of the basis set (i.e., On plus Sust).]
Figure 4-12: Scatterplots of Mid/OnPeak and Off/OnPeak in HG vs. STG. OnPeak was defined as
the sum of onset and sustained components of the basis set (i.e., On plus Sust). The symbols are the
same as Figure 4-11, as is the data included, except two data points were excluded that had a
negative OnPeak in STG. One data point did not fall within the chosen axes for Mid/OnPeak
(-0.95,-1.25). [Figure 4-12 appears here: scatterplots of Mid/OnPeak and Off/OnPeak, with Heschl’s
gyrus on the abscissa and the superior temporal gyrus on the ordinate.]
Left-right differences in response waveshape
Responses from the left hemisphere tended to be more phasic than those from the right
hemisphere for the “high rate or STF” group of stimuli, but not for the other groups of stimuli.
Figure 4-13 plots the WI in the left vs. right hemisphere for the single slice experiments examining
posterior auditory cortex. For the “high rate or STF” group, the WI was consistently greater in the
left hemisphere, for both HG and STG (HG: L > R in 29 of 34 cases; p < 0.001, signed rank test;
∆WI = 0.120 ± 0.020; STG: L > R in 27 of 38 cases; p = 0.002; ∆WI = 0.074 ± 0.021). However, for
all the other groups of stimuli, there was no significant difference in the WI between left and right
hemisphere (p > 0.2). The trend for more phasic responses in left hemisphere for the “high rate or
STF” group was due to a greater signal decline in the left hemisphere following the on-peak [i.e.,
Mid was consistently smaller in the left hemisphere for the “high rate or STF” group (p < 0.01 for
both HG and STG), but On and Off were not different between the two hemispheres (p > 0.1)].
[Figure 4-13 appears here: scatterplots of WI in the left vs. right hemisphere for Heschl’s gyrus and the superior temporal gyrus.]
Figure 4-13: Scatterplots of WI in the left vs. right hemisphere for HG and STG. The symbols are
the same as Figure 4-11, as is the data included (provided there were at least three “active” voxels in
each hemisphere for a given region).
DISCUSSION
The present study considerably extends our knowledge regarding the sound features that are
coded in the time-pattern of fMRI responses in human auditory cortex. In particular, we found that
response waveshape in posterior auditory cortex is primarily determined by sound temporal envelope
characteristics such as rate and sound-time fraction, rather than sound level or bandwidth. Several
multislice experiments confirmed that the influence of sound temporal characteristics on response
waveshape applied throughout widespread regions of auditory cortex, in that a low rate stimulus (i.e.,
music) evoked sustained responses throughout cortex, while a high rate stimulus (35/s noise burst
train) evoked phasic responses throughout cortex.
Several aspects of the study should be kept in mind regarding the extent to which our results
allow predictions of the time-pattern of fMRI responses for other stimuli or cortical areas. First, our
conclusion that sound level did not have a systematic effect on response waveshape was based on a
30-40 dB variation in level, ranging from moderate to loud intensities. It is possible that changes in
waveshape might have been observed if a larger range of levels had been examined (i.e., soft vs. loud
intensities). Changes in the time-pattern of population neural activity as a function of level are not
easily predictable, since there are competing influences – higher levels can lead to increased neural
entrainment to successive stimuli in a train (Phillips et al. 1989), which might be expected to yield
more sustained fMRI responses, but higher levels can also increase the duration of forward inhibition
(Brosch and Schreiner 1997), potentially resulting in more pronounced signal adaptation and hence
more phasic responses. Our results empirically demonstrate that waveshape is not strongly affected
by stimulus level for the levels commonly employed in fMRI experiments.
Secondly, it is
conceivable that changes in level or bandwidth could affect response waveshape in areas outside
posterior auditory cortex, since the examination of these two sound characteristics was restricted to
experiments studying posterior cortex. Nonetheless, it seems quite likely that sound temporal envelope
will have a dramatic influence on the time-pattern of fMRI responses throughout auditory cortex for
a whole range of stimulus levels and bandwidths.
Response waveshape: hemodynamic vs. neural factors
While the relationship between neural activity and fMRI responses is not fully understood, it
is generally accepted that neural activity and image signal are ultimately linked through a chain of
metabolic and hemodynamic events (e.g., Jueptner and Weiller 1995; Villringer 1999; Heeger et al.
2000; Rees et al. 2000; Logothetis et al. 2001). These events include changes in blood flow, blood
volume, and oxygen consumption (Fox and Raichle 1986; Kwong et al. 1992; Ogawa et al. 1993;
Malonek et al. 1997).
Since hemodynamic changes occur over the course of seconds, fMRI
effectively provides a temporally low-pass filtered view of neural activity. More specifically, since
fMRI is sampling activity over small volumes of brain (i.e., voxels), the responses can be thought of
as showing the time-envelope of population neural activity on a voxel-by-voxel basis.
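As a purely illustrative sketch of this low-pass view (a generic gamma-shaped impulse response and made-up neural envelopes, not a model fit to the present data), convolving a sustained versus an adapting envelope with a hemodynamic impulse response reproduces the qualitative distinction between sustained and phasic fMRI waveforms.

    import numpy as np

    dt = 0.5                                    # sampling interval (s)
    t = np.arange(0, 60, dt)                    # 60-s epoch; sound on from 0 to 30 s

    hrf_t = np.arange(0, 20, dt)
    hrf = (hrf_t / 5.0) ** 2 * np.exp(-hrf_t / 2.5)   # generic gamma-shaped impulse response
    hrf /= hrf.sum()

    sound_on = (t < 30).astype(float)
    sustained = sound_on.copy()                                   # envelope that stays elevated
    adapting = sound_on * (0.3 + 0.7 * np.exp(-t / 3.0))          # strong adaptation after onset
    adapting += 0.8 * np.exp(-(t - 30) / 2.0) * (t >= 30)         # transient off-response

    bold_sustained = np.convolve(sustained, hrf)[: t.size]
    bold_phasic = np.convolve(adapting, hrf)[: t.size]
    # bold_sustained remains elevated throughout the 30 s of sound; bold_phasic shows an
    # on-peak, a decline toward the midpoint, and a second peak after sound offset.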
Even though hemodynamic factors can influence the waveshape of fMRI responses (Buxton
et al. 1998; Mandeville et al. 1998; Mandeville et al. 1999), we do not believe that hemodynamics
account for the differences seen between stimuli, nor are hemodynamics a particularly plausible
explanation for the differences between STG and HG. The reasoning for this first conclusion
follows from the fact that the same voxels are capable of showing either phasic or sustained
responses depending on the stimulus, which is unlikely unless these waveshape differences are
mediated by differences in underlying neural activity. This argument does not work for ascribing the
difference between STG and HG to neural factors, since these are two separate regions, and it is
known that there can be spatial heterogeneity in tissue hemodynamics (Chen et al. 1998; Davis et al.
1998). However, a statistically significant difference in WI between the two regions was only
observed for the stimuli that generally evoked “intermediate” or phasic responses (i.e., 10/s noise
bursts, 35/s bursts and clicks, 100/s clicks, and continuous noise). The lack of a similar difference
for the stimuli that generally evoked sustained responses (i.e., 2/s bursts, music, and speech) argues
against a simple hemodynamic explanation, since that would require that the hemodynamics
themselves be a function of the response waveshape.
More specifically, it seems unlikely that all aspects of the response differences between STG
and HG could be hemodynamic in origin. For example, assume for the sake of argument that the
tendency for the amplitude of On and Off to be greater in STG than HG is due to a higher
hemodynamic “gain” in STG (or any other “non-neural” factor that can be characterized by a single
scalar coefficient). This hemodynamic difference could also then explain the tendency for Mid to be
greater in STG for the stimuli that evoke sustained responses. However, by necessity, the decrease
in Mid in STG relative to HG for the stimuli that evoke phasic responses must then be due to some
other factor than just a simple difference in hemodynamic gain. One logical and straightforward
explanation is that there is a true difference between STG and HG in population neural activity
during the midpoint of the stimulus. Alternatively, to maintain that all the response differences
between the two regions are solely hemodynamic requires invoking more complicated hemodynamic
factors, such as an interplay between the dynamics of blood flow and volume changes that is itself
dependent on the previous response history (i.e., highly nonlinear). All-in-all, while we cannot
conclusively refute a possible hemodynamic contribution to the observed differences in transient
onset and offset activity between STG and HG (i.e., one can still construct, with a little imagination,
scenarios for which hemodynamics might suffice to explain all the differences), the data suggest
that the differences between these two regions likely reflect underlying differences in the time-patterns of population neural activity between the two regions. Compared to HG, it appears that
neural activity in STG is concentrated more at sound onset and offset, relative to the activity during
the middle of the sound.
Response waveshape: neural adaptation and off-responses
Given this framework, we can interpret the waveshape of phasic responses in light of known
neurophysiological mechanisms (Chap. 2). The pronounced signal decline that creates a distinct on-peak (i.e., high On value relative to Mid) indicates that a general process of neural adaptation is
prominently shaping fMRI responses to stimuli within certain temporal regimes. Similarly, the
signal increase following stimulus termination that creates an off-peak (i.e., high Off value relative to
Mid) indicates the presence of neural off-responses. Together, the degree of adaptation and the
presence or absence of an off-response constitute the two main neural mechanisms that determine
whether the response to a prolonged stimulus is sustained or phasic.
Both of these neural
mechanisms appear to be in operation for a variety of stimuli and across multiple cortical areas.
The results further indicate greater adaptation and more robust off-responses in population
neural activity for STG compared to HG. The ratio of Mid to OnPeak (the response amplitude
following stimulus onset) was consistently smaller in STG for the stimuli that typically evoked
“intermediate” or phasic responses, indicating greater response adaptation in STG following stimulus
onset. (No similar difference between STG and HG was observed for 2/s bursts, music, and speech,
which, as above, argues that this regional difference in signal adaptation was not simply
hemodynamic). Increases in the strength of adaptation from primary (HG) to non-primary (STG)
areas in humans can be similarly inferred from evoked potentials recorded by Howard et al. (2000)
using implanted electrodes. The ratio of Off to OnPeak indicated that, relative to the activity evoked
by the stimulus onset, the activity evoked by the stimulus offset was consistently greater in STG. In
alert macaque monkey, a greater percentage of neurons in the non-primary caudomedial field (CM)
have an excitatory response after the offset of tone and noise bursts than in primary auditory cortex
(AI; Recanzone 2000). In summary, there appear to be two differences between the responses of
STG and HG to high-rate or continuous stimuli: 1) a tendency for enhanced transient activity at
sound onset and offset in STG relative to HG, and 2) a particular enhancement of the response
following sound offset (relative to onset) in STG.
Response waveshape and sound temporal envelope characteristics: rate and sound
time fraction
Variations in the temporal attributes of a stimulus had a more systematic effect on response
waveshape than changes in sound level or bandwidth in posterior auditory cortex. Two particular
stimulus temporal characteristics that we varied were the repetition rate and sound-time fraction
(STF) of a noise burst train. For the combinations of rate and STF employed in this study, the results
indicate that for a fixed STF, higher repetition rates result in fMRI responses with more prominent
transient components, as is also the case for higher STFs at a fixed rate. The direction of these
changes suggests that the silent interval between individual bursts may be particularly important in
determining response waveshape. This follows from the fact that for both cases (i.e., increasing rate
for a fixed STF, and increasing STF for a fixed rate), the silent interval changes in a consistent
direction (specifically, it decreases), whereas the duration of the individual noise bursts changes in
opposing directions (decreasing in the first case, but increasing in the second).
The likely
importance of the silent interval as a primary factor influencing the amount of ongoing neural
activity to a repetitive stimulus is consistent with microelectrode recordings in animals (de
Ribaupierre et al. 1972; Hocherman and Gilant 1981; Phillips 1985; Brosch and Schreiner 1997).
Psychophysically, the silent interval (more so than rate or STF per se) is related to the fusion of a
noise burst train into continuous noise (Symmes et al. 1955) (although the Symmes et al. study also
shows that the silent interval alone is not sufficient to describe fusion threshold for all rate-STF
combinations). An important role for the silent interval is not surprising given that this interval
obviously controls the available “recovery” time between bursts, and thence the degree of adaptation
to successive bursts. One way to more explicitly examine the effect of silent duration on fMRI
responses would be to use noise burst trains with a fixed silent duration between bursts (e.g., 25 ms).
The resulting stimuli would have a negative correlation between rate and STF, and would therefore
complement the results obtained using noise burst trains with a fixed burst duration (i.e., positive
correlation between rate and STF).
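The relevant bookkeeping is simple; the short sketch below (illustrative only) makes the point that both manipulations used here shrink the silent interval while moving burst duration in opposite directions.

    def train_timing(rate_hz, stf):
        period = 1.0 / rate_hz
        burst = stf * period            # duration of each burst (s)
        silent = (1.0 - stf) * period   # silent interval between bursts (s)
        return burst, silent

    print(train_timing(2, 0.50))    # (0.250, 0.250)   2/s, 50% STF
    print(train_timing(35, 0.50))   # (0.014, 0.014)   raising rate at fixed STF: silent interval shrinks
    print(train_timing(35, 0.88))   # (0.025, 0.003)   raising STF at fixed rate: silent interval shrinks further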
While the present study was not designed to parametrically explore the entire rate-STF space
(or alternatively, burst duration and silent interval) in a systematic manner, we can nevertheless
make predictions about the response waveforms that will be evoked by a variety of noise or tone
burst trains. The results clearly indicate that phasic responses are likely for trains having both high
rates and STFs, whereas sustained responses are likely for trains in the opposite corner of the rate-STF space (i.e., low rate and low STF). For trains in the other two corners of the rate-STF space, the
effects of rate and STF on response waveform may be in opposition, resulting in responses that are
“intermediate” to the two extremes of phasic and sustained. This was indeed the case for the
sessions with responses to a high rate (35/s) noise burst train with a relatively low STF (25%).
Additionally, intermediate responses were evoked by 35/s clicks (high rate, low “STF”). No
responses were ever obtained for trains with a low rate, but high STF, so we have no information
regarding fMRI responses to stimuli in this corner of the temporal space. At a sufficiently high STF
(i.e., short silent duration between bursts), the response to a low rate train should approach that of
continuous noise.14 However, it is possible that this transition does not occur until very high STFs
(i.e., silent durations so brief that the “train” becomes perceptually indistinguishable from continuous
noise; see below).
A study by Jäncke et al. (1999) suggests that fMRI responses with prominent transient
activity can occur for low-rate trains with intermediate STFs. In that study, low-rate (1/s) tone bursts
trains with STFs ranging from 20% to 80% evoked fMRI responses with a prominent onset
component. They report a signal decline in both primary and secondary auditory cortex of about
60% by 12 seconds following train onset. In the present study, the equivalent declines in the average
responses in HG and STG following the onset of 2/s noise burst trains with STFs of 50% were
smaller (40-50% declines).
While we found no systematic effect of bandwidth on response
waveshape for trains with an 88% STF, the larger percentage signal declines in the Jäncke et al. study
could reflect an interaction between the effects of bandwidth and STF on response waveshape that
the present study was not designed to uncover (since we did not explore the effect of bandwidth on
waveshape for multiple STFs). Additionally, Jäncke et al. report no differences in response between
STFs of 20, 40, 60, and 80%, which appears inconsistent with our observation of an increase in the
On component for a 2/s noise burst train with a 50% STF relative to one with a 5% STF. One
possibility is that the analysis employed by Jäncke et al. was not sensitive to the magnitude of the
changes in transient onset activity that might have actually occurred. Alternatively, it is possible that
there is a “nonlinear” relationship between changes in response waveshape and changes in STF, such
that changes in onset activity occur when STF is increased from 5% to 50%, but not when STF is
increased from 20% to 80%. Finally, an interaction between bandwidth and STF could again
account for the inconsistent effect of changes in STF on response waveshape between the present
study and Jäncke et al. Altogether, further experiments appear necessary to resolve the changes in
response dynamics that occur as STF is varied for low-rate trains.
[Footnote 14: Note that the noise burst trains employed in the present study would not have “merged” into continuous noise even for very high STFs, since 1) the individual noise bursts within a train were “frozen” repetitions of each other, and 2) the rise/fall time of the noise bursts was not instantaneous. Indeed, in five sessions with responses to both continuous noise and a 35/s noise burst train (88% STF), the WI was always larger for the noise burst train in both HG and STG (mean difference ~0.24 in both structures). The most striking difference between the response waveshapes for these two stimuli was a larger off-response for the noise burst train. This raises the intriguing possibility that the periodicity inherent in the fine-time structure of the “frozen” noise burst train, or perhaps the modulated percept of this train relative to continuous noise, may have contributed to the generation of larger off-responses.]
At a more general level, extrapolating beyond just STF and rate, the amplitude envelope of
many stimuli composed of repetitive tokens can be broadly described by three temporal factors: the
dominant modulation rate, the depth of the amplitude modulation, and the rapidity of the envelope
changes. Conceptually, these latter two factors broadly constitute the “continuity” of the stimulus.
Stimuli with low modulation depths and/or slow envelope changes have a high degree of stimulus
“continuity”. In the present study, the modulation depth of all the noise and tone bursts trains was
100%, and the individual bursts had a rapid rise/fall time.
Thus, sound-time fraction was a
convenient single parameter for summarizing stimulus continuity. However, by replacing STF with
the more general notion of stimulus continuity, we can hypothesize about the response waveshapes
that would be evoked by a broader range of stimuli than just noise bursts. For example, we expect
that sinusoidally amplitude modulated (SAM) noise with a low modulation rate (e.g., 2 Hz), but only
a shallow modulation depth (e.g., 20%), would evoke more phasic responses than SAM noise having
the same low rate, but full (100%) modulation, since the amplitude envelope of the former stimulus
displays more continuity (i.e., it approaches continuous noise). Consistent with the importance of
modulation rate in the present study, Giraud et al. (2000) showed that the fMRI response in auditory
cortex to fully modulated (100%) SAM noise changed from a sustained response at low modulation
rates (≤ 8 Hz) to a transient response at high rates (in particular, a response with a transient onset
component). Altogether, across a wide variety of stimuli, we propose that modulation rate and
stimulus continuity are two attributes of the temporal envelope of a stimulus that together determine
the time-pattern of cortical fMRI responses.
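To make the relation among repetition rate, sound-time fraction, and modulation depth concrete, the following sketch (in Python with NumPy) generates the amplitude envelopes of the two stimulus classes discussed above. It is illustrative only: the sampling rate, rise/fall time, and durations are placeholder values, not the exact parameters of the stimuli used in this thesis.

import numpy as np

def burst_train_envelope(rate_hz, stf, dur_s=1.0, fs=16000, rise_fall_s=0.0025):
    """Amplitude envelope of a noise/tone burst train.

    The sound-time fraction (STF) is the fraction of each period during which
    sound is on, so burst duration = STF / rate.  Linear rise/fall ramps stand
    in for the (non-instantaneous) gating applied to real bursts.
    """
    t = np.arange(int(dur_s * fs)) / fs
    period = 1.0 / rate_hz
    burst_dur = stf * period
    phase = t % period                        # time since the start of the current period
    env = (phase < burst_dur).astype(float)   # rectangular gate: on for burst_dur, off otherwise
    # apply linear onset/offset ramps within each burst
    rise = np.clip(phase / rise_fall_s, 0.0, 1.0)
    fall = np.clip((burst_dur - phase) / rise_fall_s, 0.0, 1.0)
    return env * np.minimum(rise, fall), t

def sam_noise(mod_rate_hz, mod_depth, dur_s=1.0, fs=16000, seed=0):
    """Sinusoidally amplitude-modulated (SAM) noise.

    mod_depth = 1.0 gives full (100%) modulation; smaller depths leave the
    envelope closer to continuous noise (greater stimulus "continuity").
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    carrier = rng.standard_normal(t.size)           # broadband noise carrier
    env = 1.0 + mod_depth * np.sin(2 * np.pi * mod_rate_hz * t)
    return carrier * env / (1.0 + mod_depth), t     # normalize peak envelope to 1

# Example: 25 ms bursts at 2/s give STF = 0.05; at 35/s the same bursts give STF ~ 0.875 (~88%).
env_2s, _ = burst_train_envelope(rate_hz=2, stf=0.05)
env_35s, _ = burst_train_envelope(rate_hz=35, stf=0.875)
shallow_sam, _ = sam_noise(mod_rate_hz=2, mod_depth=0.2)   # high "continuity"
full_sam, _ = sam_noise(mod_rate_hz=2, mod_depth=1.0)      # low "continuity"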
Relationship between sound perception, fMRI time-pattern, and neural activity
The changes in fMRI response dynamics as a function of stimulus temporal characteristics
may correlate with corresponding changes in the perception of the stimulus. The perceptual factor
most closely related to response waveshape was whether or not the stimulus was composed of
distinct and countable elements. Previously, Miller and Taylor (1948) described four perceptual
phases as rate is increased for repeated bursts of noise (at a constant STF): (1) successive, distinct
bursts, (2) a train of bursts having a pitch character, (3) a noise differing slightly in quality from
continuous noise, and (4) a continuous noise. For the purpose of the present discussion, we suggest
an additional phase (“1b”) beginning for rates around 10/s, where the stimuli in a train begin to fuse
perceptually. Stimuli at these rates do not evoke a perception of pitch, but rather a percept of
“rhythm”, “unitization”, or “roughness” (Fastl 1977, 1983; Royer and Robin 1986; Robin et al. 1990;
Fishman et al. 2000) depending on the precise combination of rate and sound-time fraction. It was
for stimuli in this rate regime that response waveshape began changing from sustained to phasic.
There was no indication that waveshape was related to pitch perception, in that trains of 35/s noise
bursts and 100/s clicks both evoked phasic responses, but only the 100/s clicks evoked a pitch
percept. In summary, response waveshape was most closely correlated to the ability to perceive
distinct elements in an ongoing train.
More generally, the generation of phasic and sustained responses may reflect the operation
of some of the same processes involved in the segmentation or grouping of stimuli. For stimuli with
low repetition or modulation rates, such as 2/s noise burst trains, music, and speech, the fundamental
“events” of the stimulus may consist of individual noise bursts, musical notes or beats, or words or
syllables. Successive neural responses to each individual event (either at onset, offset, or both)
would then result in a sustained elevation of time-averaged population neural activity for these low-rate stimuli, and thence a sustained fMRI response. In contrast, for stimulus trains with high
repetition rates, for which the individual bursts are no longer distinct, the train itself may be the
perceptual “event”. The phasic response to high rate trains is consistent with such an interpretation,
in that cortex appears to be responding just at the beginning and end of the overall train.
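As a purely illustrative sketch of this interpretation (and not a model that was fit to the data in this thesis), the short simulation below places a brief burst of population activity at each hypothetical "event" and then smooths the result on the multi-second time scale relevant to fMRI. Events at every burst of a low-rate train yield a sustained elevation, whereas events only at train onset and offset yield a phasic, onset/offset-dominated time course. The event duration, smoothing window, and event times are arbitrary example values.

import numpy as np

def event_activity(event_times_s, train_dur_s=30.0, fs=100, event_dur_s=0.2):
    """Population activity as brief boxcar responses locked to each event."""
    t = np.arange(int(train_dur_s * fs)) / fs
    act = np.zeros_like(t)
    for ev in event_times_s:
        act[(t >= ev) & (t < ev + event_dur_s)] += 1.0
    return t, act

def seconds_scale_average(act, fs=100, window_s=2.0):
    """Smooth with a boxcar of a few seconds, mimicking the sluggish fMRI time scale."""
    n = int(window_s * fs)
    win = np.ones(n) / n
    return np.convolve(act, win, mode="same")

# Low-rate train: an "event" at every burst (2/s for 30 s) -> sustained elevation.
low_rate_events = np.arange(0, 30, 0.5)
t, act_low = event_activity(low_rate_events)
sustained = seconds_scale_average(act_low)

# High-rate train: individual bursts fuse, so the only "events" are the beginning
# and end of the train -> activity concentrated at train onset and offset.
high_rate_events = np.array([0.0, 30.0 - 0.2])
t, act_high = event_activity(high_rate_events)
phasic = seconds_scale_average(act_high)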
Previously, based on differences in the time-patterns of fMRI activity as a function of
repetition rate in subcortical and cortical structures (Chap. 2), we suggested that a population neural
representation of the beginning and end of distinct perceptual events is weak or absent in the
midbrain, begins to emerge in the thalamus, and is robust in auditory cortex. The fact that the
activity evoked by high-rate and continuous stimuli in the present study was concentrated more at
sound onset and offset in STG than HG suggests a further progression within cortex, in which the
neural coding of the beginning and end of distinct perceptual events is more accentuated in non-primary (STG) as compared to primary auditory cortex (HG).
REFERENCES
Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity
and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516,
1983.
Belin P, Zatorre RJ, Hoge R, Evans AC and Pike B. Event-related fMRI of the auditory cortex.
Neuroimage 10: 417-429, 1999.
Brosch M and Schreiner CE. Time course of forward masking tuning curves in cat primary
auditory cortex. J Neurophysiol 77: 923-943, 1997.
Buxton RB, Wong EC and Frank LR. Dynamics of blood flow and oxygenation changes during
brain activation: The balloon model. Magn Reson Med 39: 855-864, 1998.
Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation
of fMRI BOLD response in primary visual cortex of human brain during sustained visual stimulation.
Magn Reson Med 39: 520-527, 1998.
Davis TL, Kwong KK, Weisskoff RM and Rosen BR. Calibrated functional MRI: Mapping the
dynamics of oxidative metabolism. Proc Natl Acad Sci 95: 1834-1839, 1998.
de Ribaupierre F, Goldstein MH, Jr. and Yeni-Komshian G. Cortical coding of repetitive
acoustic pulses. Brain Res 48: 205-225, 1972.
Edmister WB, Talavage TM, Ledden PJ and Weisskoff RM. Improved auditory cortex imaging
using clustered volume acquisitions. Hum Brain Mapp 7: 89-97, 1999.
Elliott MR, Bowtell RW and Morris PG. The effect of scanner sound in visual, motor, and
auditory functional MRI. Magn Reson Med 41: 1230-1235, 1999.
Fastl H. Roughness and temporal masking patterns of sinusoidally amplitude modulated broadband
noise. In: Psychophysics and physiology of hearing, edited by Evans EF and Wilson JP. London:
Academic Press, 1977, p. 403-417.
Fastl H. Fluctuation strength of modulated tones and broadband noise. In: Hearing-physiological
bases and psychophysics, edited by Klinke R and Hartmann R. Berlin: Springer-Verlag, 1983, p.
282-288.
Fishman YI, Reser DH, Arezzo JC and Steinschneider M. Complex tone processing in primary
auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness. J Acoust Soc Am
108: 235-246, 2000.
Fox PT and Raichle ME. Focal physiological uncoupling of cerebral blood flow and oxidative
metabolism during somatosensory stimulation in human subjects. Proc Natl Acad Sci 83: 1140-1144,
1986.
Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial
registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995.
Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in
fMRI time-series. Magn Reson Med 35: 346-355, 1996.
Fujita N. Extravascular contribution of blood oxygenation level-dependent signal changes: A
numerical analysis based on a vascular network model. Magn Reson Med 46: 723-734, 2001.
Galaburda A and Sanides F. Cytoarchitectonic organization of the human auditory cortex. J Comp
Neurol 190: 597-610, 1980.
Gati JS, Menon RS, Ugurbil K and Rutt BK. Experimental determination of the BOLD field
strength dependence in vessels and tissue. Magn Reson Med 38: 296-302, 1997.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt
A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S,
Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain
Mapp 6: 33-41, 1998.
Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney EM
and Bowtell RW. "Sparse" temporal sampling in auditory fMRI. Hum Brain Mapp 7: 213-223,
1999.
Hall DA, Summerfield AQ, Goncalves MS, Foster JR, Palmer AR and Bowtell RW. Time-course of the auditory BOLD response to scanner noise. Magn Reson Med 43: 601-606, 2000.
Harms MP and Melcher JR. Understanding novel fMRI time courses to rapidly presented noise
bursts. Neuroimage 9: S847, 1999.
Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does
neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000.
Hocherman S and Gilat E. Dependence of auditory cortex evoked unit activity on interstimulus
interval in the cat. J Neurophysiol 45: 987-997, 1981.
Howard MA, Volkov IO, Mirsky R, Garell PC, Noh MD, Granner M, Damasio H,
Steinschneider M, Reale RA, Hind JE and Brugge JF. Auditory cortex on the human posterior
superior temporal gyrus. J Comp Neurol 416: 79-92, 2000.
IEEE. IEEE recommended practice for speech quality measurements. IEEE Trans on Audio and
Electroacoustics 17: 225-246, 1969.
Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the
BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res
Cogn Brain Res 8: 117-124, 1999.
Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect
synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy
DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic
magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl
Acad Sci 89: 5675-5679, 1992.
Langner G. Periodicity coding in the auditory system. Hear Res 60: 115-142, 1992.
Leonard CM, Puranik C, Kuldau JM and Lombardino LJ. Normal variation in the frequency
and location of human auditory cortex landmarks. Heschl's gyrus: Where is it? Cereb Cortex 8: 397-406, 1998.
Liegeois-Chauvel C, Musolino A and Chauvel P. Localization of the primary auditory area in man.
Brain 114: 139-153, 1991.
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological
investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001.
Malonek D, Dirnagl U, Lindauer U, Yamada K, Kanno I and Grinvald A. Vascular imprints of
neuronal activity: Relationships between the dynamics of cortical blood flow, oxygenation, and
volume changes following sensory stimulation. Proc Natl Acad Sci 94: 14826-14831, 1997.
Mandeville JB, Marota JJA, Kosofsky BE, Keltner JR, Weissleder R, Rosen BR and Weisskoff
RM. Dynamic functional imaging of relative cerebral blood volume during rat forepaw stimulation.
Magn Reson Med 39: 615-624, 1998.
Mandeville JB, Marota JJA, Ayata C, Zaharchuk G, Moskowitz MA, Rosen BR and Weisskoff
RM. Evidence of a cerebrovascular postarteriole windkessel with delayed compliance. J Cereb
Blood Flow Metab 19: 679-689, 1999.
Melcher JR, Sigalovsky IS, Guinan JJ, Jr. and Levine RA. Lateralized tinnitus studied with
functional magnetic resonance imaging: Abnormal inferior colliculus activation. J Neurophysiol 83:
1058-1072, 2000.
Miller GA and Taylor WG. The perception of repeated bursts of noise. J Acoust Soc Am 20: 171-182, 1948.
Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T and Zilles K. Human
primary auditory cortex: Cytoarchitectonic subdivisions and mapping into a spatial reference system.
Neuroimage 13: 684-701, 2001.
Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic
activity in the central auditory system. J Comp Neurol 245: 553-565, 1986.
Ogawa S, Menon RS, Tank DW, Kim SG, Merkle H, Ellerman JM and Ugurbil K. Functional
brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging: A
comparison of signal characteristics with a biophysical model. Biophys J 64: 803-812, 1993.
Penhune VB, Zatorre RJ, MacDonald JD and Evans AC. Interhemispheric anatomical
differences in human primary auditory cortex: Probabilistic mapping and volume measurement from
magnetic resonance scans. Cereb Cortex 6: 661-672, 1996.
Phillips DP. Temporal response features of cat auditory cortex neurons contributing to sensitivity to
tones delivered in the presence of continuous noise. Hear Res 19: 253-268, 1985.
Phillips DP, Hall SE and Hollett JL. Repetition rate and signal level effects on neuronal responses
to brief tone pulses in cat auditory cortex. J Acoust Soc Am 85: 2537-2549, 1989.
Rademacher J, Caviness VS, Jr., Steinmetz H and Galaburda AM. Topographical variation of
the human primary cortices: Implications for neuroimaging, brain mapping, and neurobiology. Cereb
Cortex 3: 313-329, 1993.
Rademacher J, Morosan P, Schormann T, Schleicher A, Werner C, Freund HJ and Zilles K.
Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13:
669-683, 2001.
Rao SM, Binder JR, Hammeke TA, Bandettini PA, Bobholz JA, Frost JA, Myklebust BM,
Jacobson RD and Hyde JS. Somatotopic mapping of the human primary motor cortex with
functional magnetic resonance imaging. Neurology 45: 919-924, 1995.
Ravicz ME, Melcher JR and Kiang NY-S. Acoustic noise during functional magnetic resonance
imaging. J Acoust Soc Am 108: 1683-1696, 2000.
Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional
magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and
body. J Acoust Soc Am 109: 216-231, 2001.
Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving
macaque monkeys. Hear Res 150: 104-118, 2000.
Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties
of human and macaque V5. Nat Neurosci 3: 716-723, 2000.
Rivier F and Clarke S. Cytochrome oxidase, acetylcholinesterase, and NADPH-diaphorase staining
in human supratemporal and insular cortex: Evidence for multiple auditory areas. Neuroimage 6:
288-304, 1997.
Robin DA, Abbas PJ and Hug LN. Neural responses to auditory temporal patterns. J Acoust Soc
Am 87: 1673-1682, 1990.
Robson MD, Dorosz JL and Gore JC. Measurements of the temporal fMRI response of the human
auditory cortex to trains of tones. Neuroimage 7: 185-198, 1998.
Royer FL and Robin DA. On the perceived unitization of repetitive auditory patterns. Percept
Psychophys 39: 9-18, 1986.
Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of the
cat. II. Comparison between cortical fields. Hear Res 32: 49-64, 1988.
Sereno MI, Dale AM, Reppas JB, Kwong KK, Belliveau JW, Brady TJ, Rosen BR and Tootell
RB. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging.
Science 268: 889-893, 1995.
Sigalovsky I, Hawley ML, Harms MP and Melcher JR. Sound level representations in the human
auditory pathway investigated using fMRI. Neuroimage 13: S939, 2001.
Symmes D, Chapman LF and Halstead WC. The fusion of intermittent white noise. J Acoust Soc
Am 27: 470-473, 1955.
Talavage TM, Edmister WB, Ledden PJ and Weisskoff RM. Quantitative assessment of auditory
cortex responses induced by imager acoustic noise. Hum Brain Mapp 7: 79-88, 1999.
Talavage TM, Ledden PJ, Benson RR, Rosen BR and Melcher JR. Frequency-dependent
responses exhibited by multiple regions in human auditory cortex. Hear Res 150: 225-244, 2000.
Villringer A. Physiological changes during brain activation. In: Functional MRI, edited by Moonen
CTW and Bandettini PA. Berlin: Springer, 1999, p. 3-13.
Appendix
SUBJECTS
Table A-1 (next page): Subject information for the majority of sessions included as part of this
thesis. All sessions listed used a single imaging slice. The slice plane for sessions 1-38 intersected
the inferior colliculi and the posterior aspect of Heschl’s gyri. The slice plane for sessions 39-44
intersected the inferior colliculi and the medial geniculate bodies. Except for sessions 17 and 18, all
sessions utilized cardiac gated imaging. Sessions 1-5 and 39-44 are the sessions of “Exp. I” and “II”,
respectively, of Chap. 2 (footnote 1). Sessions 1-27 and 29-38 are the 37 “single-slice” imaging sessions of
Chap. 4 (footnote 2). Sessions 39-44 and 28-38 involved a task (see Chaps. 2 and 4, respectively). Seven
subjects were imaged in multiple sessions, grouped as follows: (1) 2, 16; (2) 3, 11, 17, 28, 30, 35, 41;
(3) 5, 10, 12, 14, 38, 42; (4) 6, 29, 37; (5) 15, 39; (6) 31, 36; (7) 33, 34. The scanner abbreviations
are as follows (see also Chap. 4): A: 1.5T General Electric scanner, retrofitted by Advanced NMR
systems; B: 3.0T General Electric scanner, retrofitted by Advanced NMR systems; C: 1.5T Siemens
Sonata scanner; D: 3.0T Siemens Allegra scanner; E: 1.5T General Electric Signa Horizon scanner.
Footnote 1: The relationship between the subject #’s referred to in Chap. 2 and the session #’s of this Appendix is as
follows. The data for subject #’s 1 and 4 were obtained in the like-numbered sessions. The data for subject #2
were obtained in sessions 3 (“Exp. I”) and 41 (“Exp. II”). The data for subject #3 were obtained in session 2.
The data for subject #5 were obtained in sessions 5 (“Exp. I”) and 42 (“Exp. II”). The data for subject #’s 6, 7,
8, and 9 of Chap. 2 were obtained in sessions 39, 40, 43, and 44, respectively.
Footnote 2: Session 28 presented continuous noise over a 20 dB range of levels to a subject who was subsequently re-imaged using a larger 40 dB range. Data from this session was excluded from Chap. 4, but is included as part
of this Appendix.
Table A-1
Session  Gender  Handedness  Age  Scanner
1        M       R           25   A
2        M       R           27   A
3        F       R           35   A
4        M       R           22   A
5        M       R           25   A
6        F       R           25   A
7        M       R           21   A
8        M       R           26   A
9        F       L           30   A
10       M       R           26   A
11       F       R           36   B
12       M       R           26   B
13       M       R           26   B
14       M       R           26   B
15       M       L           21   B
16       M       R           29   B
17       F       R           38   C
18       F       R           26   C
19       M       R           26   C
20       M       R           24   C
21       M       L           24   C
22       M       R           22   D
23       M       R           22   D
24       F       R           28   D
25       F       R           21   D
26       M       R           21   D
27       M       L           26   D
28       F       R           36   E
29       F       R           30   E
30       F       R           37   E
31       F       R           30   E
32       F       R           24   E
33       M       R           32   E
34       M       R           32   E
35       F       R           38   E
36       F       R           30   E
37       F       R           27   E
38       M       R           27   E
39       M       L           19   A
40       M       R           23   A
41       F       R           35   A
42       M       R           25   A
43       M       R           22   A
44       M       R           32   A
HG DATA
Table A-2: Response quantification for Heschl’s gyrus based on the “OSORU” basis functions. The
amplitudes of a given basis function (Onset, Sustained, Ramp, Offset, and Undershoot) were
averaged across the active voxels (p < 0.001 in the OSORU-based maps) of both hemispheres, and
then converted to a “percent change” scale (see Chap. 4). Positive amplitudes represent a positive
deviation from baseline (e.g., Figure 3-1). The waveshape index (WI) for the active voxels was
computed as in Chap. 4. The “sound on” period was always 30 s in duration. NBs: broadband noise
bursts. TBs: tone bursts. BP NBs: “band-pass” (third octave) noise bursts. All individual bursts
were 25 ms in duration, unless a sound-time fraction (STF) for the train is explicitly specified. All
stimuli were presented at approximately 55 dB above threshold (SL), unless a value for “dB SL” is
specified. See Chap. 4 for additional stimulus details. The session numbers correspond to those in
Table A-1. “# Active Vox”: total number of active voxels (across both hemispheres). “# HG Vox”:
total number of voxels in the HG region.
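For readers who want to see the generic computation behind such amplitudes, the sketch below performs an ordinary least-squares fit of a voxel time course to a design matrix whose columns are a constant plus a set of basis functions, and expresses the fitted amplitudes as percent change relative to the constant (baseline) term. The basis waveforms, voxel selection, and baseline convention shown here are simplified stand-ins; the actual OSORU basis functions and percent-change definition are given in Chaps. 3 and 4.

import numpy as np

def fit_basis_amplitudes(timecourse, basis_functions):
    """Least-squares amplitudes for a set of basis functions (general linear model).

    timecourse      : (n_timepoints,) signal from one voxel (arbitrary units)
    basis_functions : list of (n_timepoints,) model waveforms (e.g., onset,
                      sustained, ramp, offset, and undershoot shapes)
    Returns amplitudes on a percent-change-from-baseline scale, where the
    baseline is taken to be the fitted constant term.
    """
    X = np.column_stack([np.ones_like(timecourse), *basis_functions])
    beta, *_ = np.linalg.lstsq(X, timecourse, rcond=None)
    baseline = beta[0]
    return 100.0 * beta[1:] / baseline

# Toy usage with made-up waveforms (not the actual OSORU shapes):
n = 60                                    # e.g., 60 images spanning baseline + 30 s stimulus
sustained = np.r_[np.zeros(15), np.ones(30), np.zeros(15)]
onset = np.r_[np.zeros(15), np.ones(4), np.zeros(41)]
voxel = 500.0 + 5.0 * sustained + 8.0 * onset + np.random.default_rng(1).normal(0, 0.5, n)
amps = fit_basis_amplitudes(voxel, [onset, sustained])
# amps is approximately [1.6, 1.0] percent change for the onset and sustained components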
Stimulus
session 1
session 2
session 3
session 4
session 5
session 6
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
WI
Onset
Sustained
Ramp
Offset
0.16
0.28
0.88
0.16
0.19
0.40
0.78
0.15
0.34
0.68
0.43
0.17
0.21
0.45
0.10
0.07
0.28
0.53
0.17
0.21
-0.11
0.60
1.73
0.52
0.79
1.94
2.57
0.49
1.90
2.20
0.73
0.15
0.34
1.14
0.37
0.32
1.02
1.58
0.39
0.66
1.38
1.32
0.06
1.19
1.28
0.50
-0.99
1.21
0.21
-0.71
0.90
1.17
1.35
0.61
1.45
1.83
1.29
0.37
0.88
0.59
0.79
0.38
0.28
-0.07
-0.04
1.31
1.43
-0.16
1.38
1.40
-0.02
0.04
0.20
0.72
0.14
-0.04
1.01
0.98
-0.23
0.69
0.84
0.60
1.64
-1.20
-1.04
0.56
1.42
-0.01
-0.82
0.81
0.67
0.39
0.47
0.76
-0.32
-0.36
0.56
1.00
-0.61
-0.20
0.76
0.44
1.18
0.75
0.01
0.09
0.55
1.01
1.05
0.44
Undershoot # Active
Vox
0.62
11
0.22
16
-0.08
5
0.45
4
0.00
10
0.17
11
0.44
13
-0.27
3
0.20
5
-0.79
7
0.46
1
0.53
11
0.33
16
1.27
6
0.25
20
0.08
23
0.45
32
0.10
23
0.02
6
0.21
10
-0.07
0.14
4
4
# HG
Vox
45
45
45
48
48
48
48
46
46
46
47
47
47
47
56
56
56
56
78
78
78
78
session 7
session 8
session 9
session 10
session 11
session 12
session 13
Undershoot # Active
Vox
0.34
9
0.58
13
# HG
Vox
59
59
Stimulus
WI
Onset
Sustained
Ramp
Offset
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
1/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
music
cont. noise
35/s NBs
35/s NBs,
25% STF
35/s clicks
music
0.17
0.23
0.63
1.17
1.82
1.30
-1.24
0.17
-0.09
-0.36
0.65
0.45
2.75
2.20
-0.45
0.59
1.64
0.75
1.28
0.65
-0.44
0.17
12
13
59
59
0.11
0.36
0.44
0.44
0.83
2.81
1.59
1.74
0.17
-0.12
-0.15
1.40
-0.32
1.33
0.40
-0.56
0.05
0.49
14
8
8
59
61
61
0.70
0.55
3.35
3.08
-1.98
-1.63
1.35
1.36
1.32
0.29
0.99
0.40
8
9
61
61
0.08
0.20
0.24
0.51
0.66
1.01
2.75
0.84
0.89
0.13
0.23
0.43
-0.75
-0.07
-0.07
-0.40
-0.04
0.11
8
16
16
61
58
58
0.70
0.56
1.69
1.11
-0.01
0.59
0.36
0.26
0.90
0.94
0.29
0.07
7
12
58
58
0.00
0.10
0.16
-0.05
0.46
0.74
1.65
1.64
1.68
-0.08
0.35
0.49
-0.38
-0.20
0.11
0.02
0.26
0.30
9
23
29
58
59
59
0.46
0.43
1.73
1.44
0.63
0.80
1.04
0.64
0.94
0.75
0.69
0.43
18
25
59
59
0.41
0.30
0.41
0.65
0.17
0.16
0.29
0.41
0.05
0.35
0.62
0.31
1.62
1.24
2.44
3.16
0.72
0.74
0.95
1.53
0.30
1.15
1.50
0.89
-0.02
0.53
-0.16
-0.74
1.06
1.40
1.67
0.71
2.68
0.47
-0.02
0.62
0.80
0.65
1.49
1.32
0.66
0.34
0.82
1.06
0.45
0.07
0.84
0.65
-0.61
-1.14
0.02
0.93
-0.50
-0.41
0.81
0.75
-0.79
-0.11
0.86
0.23
0.27
0.22
-1.41
-0.27
0.25
-0.09
0.00
0.38
-0.61
-0.21
0.21
0.08
8
12
22
25
26
25
28
27
21
13
7
16
48
48
48
48
49
49
49
49
49
44
44
44
0.32
0.14
0.82
0.33
0.81
1.02
0.39
-0.38
0.34
-0.18
-0.07
-0.56
17
11
44
44
Undershoot # Active
Vox
0.08
30
0.11
25
0.18
23
# HG
Vox
55
55
55
Stimulus
WI
Onset
Sustained
Ramp
Offset
session 14 cont. noise
35/s NBs
35/s NBs,
50% STF
35/s NBs,
25% STF
35/s clicks
music
session 15 2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
session 16 2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
session 17 2/s TBs,
500 Hz
2/s NBs
35/s TBs,
500 Hz
35/s NBs
music
session 18 2/s NBs
35/s NBs
music
session 19 2/s TBs,
4 kHz
2/s NBs
35/s TBs,
4 kHz
35/s NBs
music
session 20 speech
cont. noise
35/s NBs
music
0.17
0.50
0.36
0.54
1.04
0.84
0.98
0.40
0.54
0.09
0.31
0.68
-0.09
0.56
0.39
0.40
0.71
0.71
0.59
0.65
0.05
30
55
0.33
0.03
0.22
0.28
0.79
0.14
0.62
0.90
0.60
1.79
0.76
0.52
0.73
0.60
0.00
0.38
0.35
-0.02
-0.32
-0.11
-0.09
0.21
-0.15
-0.13
27
29
16
22
55
55
56
56
0.60
1.35
-0.74
1.26
0.28
0.34
19
56
0.67
0.72
1.85
1.49
-0.56
-0.40
0.91
0.27
0.63
0.66
0.08
0.01
9
19
56
56
0.14
0.16
0.19
0.78
0.45
0.74
1.91
1.23
1.10
0.33
-0.55
0.16
-0.44
-0.35
-0.45
-0.37
-0.18
0.12
11
18
16
56
47
47
0.67
1.62
-0.05
0.42
0.75
0.07
13
47
0.69
0.81
2.28
1.65
-0.24
-0.26
0.27
0.19
0.88
1.01
-0.20
-0.27
15
24
47
47
0.14
0.29
0.75
1.08
1.99
0.78
-0.11
-0.01
-0.63
-0.60
-0.11
0.22
11
3
47
55
0.28
0.42
0.90
0.95
0.65
0.41
0.11
0.84
-0.15
0.53
-0.03
-0.24
11
5
55
55
0.53
0.10
0.25
0.61
0.17
–
1.47
0.53
0.29
0.88
0.64
–
0.17
1.77
0.73
0.26
1.13
–
0.61
0.68
0.02
0.43
0.19
–
0.60
-1.12
0.23
0.79
-0.10
–
-0.21
-0.06
0.07
-0.11
0.21
–
8
8
16
14
8
0
55
55
48
48
48
70
0.44
–
1.15
–
-0.04
–
0.62
–
0.09
–
0.14
–
5
0
70
70
0.41
0.16
0.15
0.40
0.41
0.15
1.08
0.39
0.38
0.91
0.90
0.37
0.46
1.32
0.88
0.33
0.15
0.84
-0.14
-0.91
0.02
0.37
0.49
0.02
0.12
-0.31
-0.65
0.24
0.16
-0.05
-0.41
-0.74
-0.29
0.00
0.20
-0.03
8
10
18
7
12
6
70
70
56
56
56
56
Stimulus
session 21 2/s TBs,
4 kHz
35/s TBs,
4 kHz
35/s TBs,
500 Hz
35/s NBs
music
session 22 2/s NBs
2/s BP NBs,
4 kHz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
session 23 2/s NBs
2/s BP NBs,
500 Hz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
session 24 2/s NBs
cont. noise
35/s NBs
music
session 25 2/s NBs
cont. noise
35/s NBs
music
session 26 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
session 27 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
Undershoot # Active
Vox
-0.15
15
# HG
Vox
57
WI
Onset
Sustained
Ramp
Offset
0.27
0.51
0.62
-0.11
0.08
0.49
0.67
0.44
0.13
0.49
-0.07
10
57
0.51
1.08
0.51
0.23
0.66
0.22
9
57
0.49
0.00
0.28
0.18
1.00
-0.82
0.70
0.39
0.69
2.32
0.44
0.69
-0.04
-0.64
0.23
0.02
0.63
-0.20
-0.29
-0.36
0.11
-0.56
-0.14
-0.15
14
12
23
17
57
57
60
60
0.53
0.84
0.01
0.65
0.41
0.16
18
60
0.60
1.28
-0.04
0.53
0.53
0.04
22
60
0.55
0.10
0.28
0.30
1.02
0.41
0.52
1.21
0.25
1.81
1.00
0.58
0.04
-0.18
-0.45
0.47
0.41
-0.19
0.21
-0.36
-0.10
-0.05
0.59
0.03
32
16
15
16
60
60
52
52
0.68
1.60
-0.45
0.81
0.58
0.30
16
52
0.66
2.02
-0.32
0.70
0.69
-0.05
19
52
0.64
0.17
0.18
0.43
0.59
0.02
0.31
0.36
0.79
0.11
0.22
0.14
0.42
0.68
0.55
0.00
0.26
0.24
0.56
0.80
0.77
0.03
1.77
0.97
0.40
1.08
1.13
0.04
0.71
0.94
0.98
0.37
0.79
0.54
1.50
2.09
1.55
-0.32
1.03
1.01
1.64
1.12
1.69
0.05
0.00
2.43
0.66
0.12
-0.01
0.80
0.26
0.18
-0.20
1.18
0.69
1.21
0.42
-0.26
0.26
2.49
0.63
1.06
0.14
-0.22
-0.32
2.59
0.37
-0.95
0.07
0.29
0.27
0.20
0.39
0.36
0.33
0.30
0.66
0.36
1.00
1.01
0.39
0.11
0.60
0.10
1.00
0.90
1.01
-0.53
0.74
-0.21
-0.21
0.08
0.33
-0.41
-0.43
-0.09
0.57
-0.67
-0.38
-0.68
0.51
1.07
0.67
-0.15
-0.29
-0.78
0.90
1.23
1.18
0.11
-0.19
-1.17
0.02
-0.10
-0.04
-0.01
0.43
-0.18
0.29
-0.41
-0.32
-0.06
0.29
0.00
-0.26
0.46
0.04
-0.51
-0.03
0.06
0.03
-0.85
20
9
28
25
25
18
20
21
26
14
14
17
17
10
10
3
27
38
36
33
33
20
52
52
62
62
62
62
49
49
49
49
52
52
52
52
52
52
62
62
62
62
62
62
Undershoot # Active
Vox
-0.67
14
# HG
Vox
33
Stimulus
WI
Onset
Sustained
Ramp
Offset
session 28 cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 29 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 30 cont. noise,
35 dB SL
cont. noise
cont. noise,
75 dB SL
music
session 31 cont. noise,
35 dB SL
cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 32 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 33 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 34 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
0.38
0.99
0.07
0.51
-0.14
0.31
0.38
0.77
0.92
0.18
0.48
0.57
0.38
-0.46
0.29
-0.39
-0.23
18
21
33
33
0.05
0.71
0.15
1.16
1.40
-0.18
0.09
0.51
-0.17
0.59
-0.16
0.06
13
5
33
79
0.69
0.74
0.95
1.02
-0.17
-0.63
0.46
0.76
0.45
0.49
-0.45
-0.28
11
20
79
79
0.62
0.35
-0.13
0.48
0.87
-0.08
19
79
0.03
0.28
-0.06
0.75
1.62
0.48
-0.42
0.20
0.10
-0.23
-0.18
-0.38
6
18
79
36
0.26
0.32
0.64
1.14
0.65
0.46
-0.11
0.39
-0.14
-0.08
-0.25
-0.07
18
23
36
36
0.09
0.50
0.37
2.36
1.62
-1.30
0.11
2.20
-0.75
-0.36
0.07
-0.76
15
15
36
54
0.62
2.16
-1.33
2.08
0.51
-0.84
22
54
0.64
0.63
1.79
1.50
-1.21
-0.82
1.87
1.87
0.52
0.54
-1.27
-0.52
14
25
54
54
0.24
0.64
1.30
1.13
0.88
-0.43
1.00
0.69
-0.17
0.32
-0.79
-0.28
19
3
54
47
0.79
0.84
1.19
0.99
-0.31
-0.22
0.63
0.62
0.69
0.82
-0.26
0.01
8
9
47
47
0.70
1.18
-0.56
0.79
0.47
-0.12
8
47
0.11
0.64
-0.29
1.96
1.69
-0.80
-0.15
1.03
0.46
0.56
-0.22
-0.16
5
8
47
49
0.77
0.74
1.26
1.58
-0.01
-0.31
0.50
1.16
1.06
1.16
0.43
0.43
9
9
49
49
0.67
1.62
-0.52
1.28
0.73
0.00
11
49
0.04
0.32
-0.07
1.41
2.45
0.76
0.38
0.13
0.20
-0.16
-0.06
-0.57
10
8
49
45
0.25
0.96
0.74
0.37
-0.30
-0.37
11
45
0.28
0.97
0.61
0.31
-0.01
-0.88
14
45
Stimulus
session 35 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 36 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 37 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 38 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
Undershoot # Active
Vox
-0.15
11
# HG
Vox
38
WI
Onset
Sustained
Ramp
Offset
0.22
0.85
0.95
0.24
-0.76
0.26
1.00
0.65
0.53
-0.82
-0.24
24
38
0.26
0.97
0.68
0.43
-0.57
-0.22
22
38
0.30
1.37
0.64
0.60
-0.95
-0.64
15
50
0.31
1.50
0.39
1.11
-0.77
-0.22
17
50
0.24
0.94
0.61
0.89
-1.15
-0.29
13
50
0.12
0.41
1.11
0.37
-0.47
-0.07
15
80
0.31
1.16
0.44
0.53
-0.41
-0.62
17
80
0.25
0.72
0.64
0.49
0.08
-0.65
22
80
0.12
0.35
1.26
-0.31
-0.12
0.04
8
52
0.17
0.56
1.06
0.11
-0.12
0.10
16
52
0.03
0.10
1.62
-0.13
-0.16
-0.07
14
52
STG DATA
Table A-3: Same as Table A-2, but for the superior temporal gyrus.
Stimulus
session 1
session 2
session 3
session 4
session 5
session 6
session 7
session 8
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
Undershoot # Active
Vox
0.64
8
0.67
6
0.53
7
-0.20
7
-0.10
11
-0.20
6
-0.08
11
–
0
-0.36
5
-0.64
18
1.07
4
0.70
17
0.29
14
0.76
6
0.31
18
0.21
23
0.62
20
0.05
17
-0.36
9
0.43
10
# STG
Vox
63
63
63
76
76
76
76
69
69
69
75
75
75
75
71
71
71
71
89
89
WI
Onset
Sustained
Ramp
Offset
0.17
0.45
0.81
0.27
0.17
0.57
0.92
–
0.48
0.75
0.22
0.13
0.30
0.52
0.17
0.16
0.31
0.60
0.36
0.25
-0.13
1.77
1.63
1.59
0.84
2.19
2.10
–
3.04
3.16
-0.35
0.44
0.83
0.64
1.10
1.14
1.74
2.29
0.87
1.25
1.28
0.63
0.09
0.82
1.73
-0.15
-0.46
–
-1.49
-1.35
1.35
1.00
1.08
0.72
1.99
2.12
1.76
0.57
1.22
0.96
0.69
0.90
0.48
1.15
-0.04
1.39
0.64
–
3.27
2.13
-0.30
0.49
0.37
-0.28
0.23
0.42
1.62
1.12
-0.77
0.53
0.87
0.79
1.76
-0.58
-0.01
0.91
1.77
–
-1.28
1.58
0.91
-0.18
0.44
1.59
-0.25
-0.25
0.91
1.82
0.35
-0.01
0.68
0.73
1.26
1.15
-0.01
-0.26
0.61
1.08
0.86
0.94
-0.07
0.27
6
2
89
89
0.23
0.23
0.59
1.03
1.35
1.17
-0.42
0.00
0.20
-0.12
0.29
0.79
12
18
80
80
0.68
0.67
2.24
2.41
-0.53
0.03
1.12
0.34
0.85
1.12
-0.63
0.32
15
7
80
80
0.13
0.27
0.37
0.23
0.66
1.68
1.58
1.64
1.12
-0.07
0.18
0.66
0.24
0.62
0.63
-0.45
-0.26
-0.15
41
15
13
80
117
117
0.66
0.50
2.94
2.74
-1.88
-3.08
1.14
2.63
0.93
-0.41
0.39
0.98
11
9
117
117
0.02
0.00
3.09
-0.35
0.09
-0.27
14
117
session 9
session 10
session 11
session 12
session 13
session 14
session 15
Undershoot # Active
Vox
0.15
30
0.34
31
# STG
Vox
80
80
Stimulus
WI
Onset
Sustained
Ramp
Offset
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
1/s NBs
2/s NBs
10/s NBs
35/s NBs
1/s NBs
2/s NBs
10/s NBs
35/s NBs
music
cont. noise
35/s NBs
35/s NBs,
25% STF
35/s clicks
music
cont. noise
35/s NBs
35/s NBs,
50% STF
35/s NBs,
25% STF
35/s clicks
music
2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
0.21
0.43
0.34
0.97
1.59
1.06
0.05
0.37
0.62
0.91
0.80
0.55
2.19
1.82
-0.14
0.31
0.05
0.94
1.33
1.04
-0.19
0.36
20
24
80
80
0.12
0.07
0.20
-0.53
0.45
0.84
2.35
2.36
2.81
-0.44
0.82
0.58
0.68
0.02
0.71
0.07
0.70
0.57
29
34
35
80
61
61
0.43
0.47
1.72
1.74
1.63
1.65
0.77
0.58
1.51
1.78
1.01
0.59
32
26
61
61
0.42
0.40
0.63
0.65
0.13
0.15
0.31
0.43
0.01
0.55
0.89
0.56
1.73
1.46
3.22
4.11
0.98
1.25
1.44
2.57
-0.04
1.68
1.33
1.36
-0.18
0.15
-0.91
-1.25
2.18
2.67
2.70
1.52
4.74
-0.05
-0.30
0.19
1.03
0.45
1.43
1.28
0.98
0.67
1.34
1.31
0.34
0.32
0.56
0.78
-0.49
-0.88
0.84
1.24
-0.24
-0.07
1.66
1.52
0.10
0.28
1.05
0.81
0.31
0.58
-1.72
-0.53
0.39
0.11
0.37
0.55
-0.37
-0.29
0.23
0.24
14
16
30
38
40
39
42
40
38
33
32
28
67
67
67
67
77
77
77
77
77
65
65
65
0.63
0.14
0.25
0.64
0.51
1.49
0.54
0.87
1.71
1.31
0.20
1.50
1.82
0.48
0.83
0.49
-0.30
-0.22
0.28
0.74
0.95
-0.07
0.40
1.26
1.27
-0.16
-0.55
0.13
0.15
0.55
32
28
29
24
22
65
65
66
66
66
0.39
0.93
1.30
0.47
1.12
0.36
28
66
0.43
0.04
0.23
0.27
1.03
0.04
0.70
1.11
1.08
3.23
0.87
0.70
0.67
1.45
-0.09
0.51
1.24
0.33
-0.05
0.02
0.17
0.24
-0.23
-0.08
27
35
20
22
66
66
63
63
0.69
1.72
-0.49
1.06
0.71
0.28
23
63
0.72
0.76
1.93
1.90
-0.35
-0.65
0.56
0.24
0.84
0.97
-0.07
0.01
16
28
63
63
0.14
0.40
2.43
-0.05
0.41
-0.23
18
63
Stimulus
session 16 2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
session 17 2/s TBs,
500 Hz
2/s NBs
35/s TBs,
500 Hz
35/s NBs
music
session 18 2/s NBs
35/s NBs
music
session 19 2/s TBs,
4 kHz
2/s NBs
35/s TBs,
4 kHz
35/s NBs
music
session 20 speech
cont. noise
35/s NBs
music
session 21 2/s TBs,
4 kHz
35/s TBs,
4 kHz
35/s TBs,
500 Hz
35/s NBs
music
session 22 2/s NBs
2/s BP NBs,
4 kHz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
Undershoot # Active
Vox
-0.17
26
0.02
19
# STG
Vox
77
77
WI
Onset
Sustained
Ramp
Offset
0.24
0.18
0.90
0.75
1.06
1.15
-0.20
0.26
-0.05
-0.37
0.71
1.63
0.10
0.23
0.99
0.08
20
77
0.77
0.89
1.79
1.80
-0.30
-0.33
0.01
-0.01
0.96
1.40
-0.39
-0.14
25
24
77
77
0.16
0.25
0.84
1.08
1.86
1.12
-0.16
-0.03
-0.39
-0.03
-0.20
0.22
17
31
77
69
0.22
0.66
1.01
1.79
1.33
0.05
0.22
0.91
0.05
1.24
0.23
-0.06
31
29
69
69
0.75
0.12
0.19
0.61
0.10
0.50
2.13
0.23
0.37
0.82
0.36
-0.43
-0.19
2.82
0.92
0.29
1.23
-1.72
0.76
-0.04
0.28
0.43
0.42
2.22
1.34
0.59
0.16
0.98
0.01
2.98
0.09
0.09
0.14
-0.03
0.30
-0.66
29
22
33
30
26
1
69
69
65
65
65
80
0.40
0.39
2.03
1.31
-0.17
-0.46
1.33
1.62
-0.04
-0.03
0.02
0.36
11
2
80
80
0.54
0.26
0.09
0.51
0.57
0.09
0.29
1.27
1.21
0.38
1.42
1.09
0.20
1.02
0.67
1.38
1.75
0.60
0.34
2.13
0.96
-0.27
-0.56
-0.15
0.27
0.49
-0.32
0.11
0.69
-0.39
-0.80
0.77
0.81
0.21
0.16
-0.30
-0.88
-0.52
-0.05
0.41
-0.32
-0.20
19
17
51
21
32
36
12
80
80
89
89
89
89
66
0.41
1.08
0.38
0.61
0.37
-0.37
10
66
0.51
0.84
0.80
0.02
0.98
0.52
19
66
0.56
0.00
0.32
0.31
1.18
-0.66
0.70
0.61
0.88
2.40
0.22
0.25
-0.24
-0.53
0.39
0.24
0.98
-0.04
-0.33
-0.32
0.06
-0.37
-0.08
-0.15
20
28
29
25
66
66
72
72
0.64
0.94
-0.27
0.79
0.43
0.05
18
72
0.68
1.22
-0.26
0.66
0.51
0.22
27
72
0.56
0.18
1.46
0.59
-0.56
2.09
0.64
-0.36
0.18
0.32
-0.12
0.33
33
29
72
72
Stimulus
session 23 2/s NBs
2/s BP NBs,
500 Hz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
session 24 2/s NBs
cont. noise
35/s NBs
music
session 25 2/s NBs
cont. noise
35/s NBs
music
session 26 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
session 27 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
session 28 cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 29 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
Undershoot # Active
Vox
0.42
18
0.01
26
# STG
Vox
83
83
WI
Onset
Sustained
Ramp
Offset
0.42
0.28
0.65
0.88
1.02
0.66
-0.37
0.20
0.60
0.03
0.74
1.45
-0.26
0.42
0.71
0.17
16
83
0.64
1.81
0.01
0.39
0.78
-0.25
21
83
0.71
0.20
0.24
0.54
0.65
0.13
0.35
0.61
0.87
0.11
0.32
0.14
0.51
0.67
0.64
0.09
0.25
0.38
0.65
0.94
0.78
0.27
0.59
1.71
0.28
0.74
1.41
1.49
0.39
0.74
1.10
0.54
0.43
1.05
0.57
1.72
1.87
0.81
-2.09
0.86
1.41
1.94
1.56
1.78
0.52
1.28
-0.06
1.69
0.69
-0.04
-0.10
2.44
-0.04
-0.13
-0.62
1.48
0.28
1.36
-0.18
-0.94
0.31
6.62
0.99
0.69
0.15
-0.27
-0.07
2.30
-0.11
0.31
-1.24
0.21
0.21
0.25
-0.85
0.69
0.38
0.36
0.22
0.95
0.20
0.93
1.18
0.24
-2.09
0.61
0.04
0.43
0.64
0.75
-0.71
0.61
0.84
0.27
-0.18
0.19
0.47
0.25
-0.35
0.32
0.40
-0.48
0.09
-0.30
0.32
0.65
0.89
1.24
0.23
0.18
1.04
1.48
1.48
1.13
0.48
0.01
-0.49
0.11
-0.20
0.08
-0.26
0.89
-0.02
0.71
-0.19
-0.07
-0.51
0.32
0.05
-0.20
1.44
0.19
-0.48
0.04
0.21
0.19
-0.90
-0.25
24
21
46
46
43
36
28
10
28
15
10
27
10
10
5
1
36
48
33
23
33
11
32
83
83
63
63
63
63
57
57
57
57
62
62
62
62
62
62
81
81
81
81
81
81
80
0.61
0.78
1.11
1.17
0.05
0.29
0.57
0.01
0.64
1.27
-0.14
0.14
29
30
80
80
0.28
0.93
0.24
1.87
1.82
-0.48
-0.66
0.98
1.36
1.60
-0.02
0.27
20
7
80
69
0.87
0.80
2.67
2.31
-0.38
-1.42
0.48
1.03
1.98
1.36
-0.59
-0.61
7
10
69
69
0.72
0.72
-0.40
0.72
1.66
-0.73
12
69
0.58
2.63
1.02
0.27
1.78
-0.91
5
69
Undershoot # Active
Vox
0.09
37
# STG
Vox
82
Stimulus
WI
Onset
Sustained
Ramp
Offset
session 30 cont. noise,
35 dB SL
cont. noise
cont. noise,
75 dB SL
music
session 31 cont. noise,
35 dB SL
cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 32 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 33 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 34 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 35 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 36 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
0.63
0.79
0.70
-0.48
0.81
0.66
0.87
0.84
1.44
0.67
-0.12
-0.51
0.41
0.90
1.22
0.25
0.08
36
42
82
82
0.32
0.50
1.37
1.53
2.00
-1.13
-0.13
1.71
0.72
-0.55
0.39
-0.52
29
11
82
49
0.55
1.22
-1.02
1.57
0.13
-0.59
13
49
0.69
0.68
1.33
1.08
-0.93
-0.78
1.26
1.57
0.51
0.39
-0.88
-0.35
7
13
49
49
0.19
0.68
0.62
1.80
0.82
-0.96
0.46
1.14
-0.57
0.64
-0.57
-0.55
24
15
49
51
0.88
0.86
1.92
1.95
-0.60
-0.91
0.70
1.02
1.46
1.41
-0.34
-0.04
16
26
51
51
0.81
1.68
-0.97
1.06
1.03
-0.02
26
51
0.21
0.75
0.56
2.55
2.65
-0.73
0.09
0.80
0.95
1.26
-0.84
-0.32
18
20
51
76
0.73
0.93
1.56
1.72
0.41
-0.32
0.18
0.89
1.83
1.71
0.52
0.34
13
27
76
76
0.81
2.24
-0.57
1.16
1.39
-0.26
19
76
0.17
0.30
-0.90
1.42
3.88
0.86
-1.14
0.38
1.68
0.08
0.24
-0.23
26
41
76
92
0.23
0.74
1.32
-0.09
0.17
-0.33
55
92
0.24
0.75
1.34
0.11
0.29
-0.57
53
92
0.28
1.67
1.20
0.25
-0.20
0.20
34
68
0.24
1.49
1.36
0.40
-0.27
-0.27
34
68
0.39
1.73
0.89
0.58
0.53
-0.26
35
68
0.32
1.22
0.47
0.47
-0.90
-0.43
27
47
0.30
1.18
0.33
0.93
-0.97
-0.13
24
47
0.26
0.87
0.48
0.63
-0.99
-0.01
15
47
Stimulus
session 37 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 38 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
Undershoot # Active
Vox
0.00
13
# STG
Vox
73
WI
Onset
Sustained
Ramp
Offset
0.28
0.73
1.41
0.00
0.48
0.39
2.29
0.05
1.15
-0.43
-0.49
11
73
0.28
1.65
0.92
1.01
0.08
-0.31
11
73
0.02
0.09
2.57
-0.53
-0.41
0.11
30
67
0.05
0.33
2.85
-0.33
-0.37
0.06
37
67
0.01
-0.01
3.39
-0.11
0.04
0.33
35
67
IC DATA
Table A-4: Response quantification for the inferior colliculus. For each session and stimulus, the
most significant (i.e., lowest p-value) voxel in the left and right IC was identified, based on an F-statistic computed from the “sustained-only” basis “set” (see Chap. 3). For this voxel, an empirical
response time course was computed by averaging across repeated presentations of a given stimulus
in a given imaging session, and then converting to a percent change in signal relative to the
preceding baseline (see Chaps. 2, 3, or 4 for details). Response magnitude was quantified using two
measures computed from these percent change time courses: “Time-Average” percent change – the
mean percent change from t = 4 to 30 s, and “Onset” percent change – the maximum percent change
from t = 4 to 10 s (see Chap. 2; footnote 3). See Table A-2 caption for stimulus abbreviations. Sound pressure
level (dB SPL) is indicated for the sessions in which this calibration was possible. (SPL was
computed based on the root-mean-square of the entire 30 s stimulus, after first filtering by the
frequency response of the sound delivery system). Only sessions that utilized cardiac gated imaging
are included as part of this table. (Consequently, sessions 17 and 18 of Table A-1 are excluded).
NA: not available.
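The two response measures in this table can be computed directly from a percent-change time course. The sketch below assumes a response time course that has already been averaged across stimulus presentations and sampled at known times relative to sound onset; the sampling grid and signal values are hypothetical.

import numpy as np

def percent_change(timecourse, baseline_idx):
    """Convert a raw signal time course to percent change relative to the
    mean of the pre-stimulus baseline samples."""
    baseline = timecourse[baseline_idx].mean()
    return 100.0 * (timecourse - baseline) / baseline

def ic_response_measures(pc, t):
    """'Time-Average' and 'Onset' percent change as defined in Table A-4:
    the mean percent change from t = 4 to 30 s, and the maximum percent
    change from t = 4 to 10 s, for a 30-s 'sound on' period."""
    time_avg = pc[(t >= 4) & (t <= 30)].mean()
    onset = pc[(t >= 4) & (t <= 10)].max()
    return time_avg, onset

# Hypothetical sampling: one image every 2 s, 20 s of baseline then 30 s of sound.
t = np.arange(-20, 40, 2.0)                  # time relative to sound onset (s)
raw = 400.0 + 4.0 * ((t >= 2) & (t < 32))    # toy signal with a sustained response
pc = percent_change(raw, baseline_idx=t < 0)
time_avg, onset = ic_response_measures(pc, t)  # both equal 1.0 for this toy signal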
LEFT Hemisphere
Stimulus
session 1 2/s NBs
10/s NBs
35/s NBs
session 2 1/s NBs
2/s NBs
10/s NBs
35/s NBs
session 3 2/s NBs
10/s NBs
35/s NBs
session 4 1/s NBs
2/s NBs
10/s NBs
35/s NBs
3
SPL
(dB)
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
TimeAverage
0.29
0.51
1.67
0.15
-0.05
0.82
1.15
0.66
0.39
0.29
0.30
1.28
0.19
1.58
RIGHT Hemisphere
Onset
p-value
0.47
0.46
1.93
0.53
0.34
1.48
1.52
1.16
0.82
0.51
-0.24
1.19
0.03
1.79
3.08E-01
1.65E-04
9.64E-14
1.74E-02
4.76E-02
2.66E-04
4.17E-10
3.34E-01
1.60E-01
5.70E-02
1.23E-04
1.54E-02
3.26E-03
7.47E-05
SPL
(dB)
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
TimeAverage
0.84
0.35
1.56
0.41
0.24
0.25
1.04
0.38
0.33
0.78
-0.33
0.43
0.56
1.66
Onset
p-value
0.84
0.80
1.41
0.64
0.55
0.48
1.09
0.48
0.44
0.82
-0.17
0.54
0.86
2.63
7.83E-04
5.69E-03
1.57E-12
1.87E-01
1.06E-01
4.01E-01
1.42E-09
4.54E-02
1.42E-02
9.63E-06
2.37E-01
3.25E-02
2.44E-03
5.42E-11
Footnote 3: Note that “Onset” percent change of this table (and Chap. 2) has a different definition and interpretation
compared to the onset basis function used to quantify results in Chaps. 3 and 4. Note also that the analyses
underlying this table differ slightly from those in Chap. 2 in two respects: (1) the voxel was selected based on
the “sustained-only” basis set, rather than a t-test, and (2) the lowest p-value voxel (within the anatomically
defined IC region) was selected for each stimulus, whereas in Chap. 2 the same voxel (selected from the t-map
for the noise burst train with a repetition rate of 35/s) was used to quantify all stimuli within a given session.
LEFT Hemisphere
Stimulus
session 5 1/s NBs
2/s NBs
10/s NBs
35/s NBs
session 6 2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
session 7 2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
session 8 2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
session 9 2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
music
session 10 2/s NBs
2/s NBs,
50% STF
35/s NBs
35/s NBs,
50% STF
session 11 1/s NBs
2/s NBs
10/s NBs
35/s NBs
session 12 1/s NBs
2/s NBs
10/s NBs
35/s NBs
music
RIGHT Hemisphere
SPL
(dB)
NA
NA
NA
NA
77
87
TimeAverage
0.15
0.54
1.09
0.77
-0.30
0.48
Onset
p-value
TimeAverage
0.06
0.33
1.58
1.43
0.56
0.79
Onset
p-value
2.01E-01
1.15E-02
4.03E-05
8.10E-09
2.13E-01
7.30E-06
SPL
(dB)
NA
NA
NA
NA
80
90
0.33
0.99
1.06
1.23
-0.06
0.78
0.59
0.54
1.73
1.74
0.74
1.00
1.88E-03
2.62E-03
1.57E-10
1.00E-15
1.01E-03
8.63E-05
89
86
0.48
1.24
0.64
1.03
8.47E-08
1.45E-15
92
89
0.17
1.00
0.34
0.96
6.13E-07
4.95E-10
69
79
-0.14
0.76
0.05
1.36
8.57E-02
3.60E-02
83
93
0.14
0.22
0.37
0.47
2.13E-01
6.37E-02
81
78
0.89
0.83
1.80
1.26
1.02E-11
7.57E-10
95
92
0.34
0.67
0.55
0.75
1.28E-07
6.15E-06
78
66
76
0.96
0.72
1.25
1.24
1.09
1.20
2.93E-13
3.92E-03
9.12E-02
92
71
81
0.66
0.56
0.35
0.84
1.09
0.57
3.22E-10
8.75E-04
1.17E-03
78
75
0.42
0.25
0.57
0.51
2.91E-03
4.90E-02
83
80
0.71
0.20
1.25
0.98
3.12E-05
1.41E-01
75
76
86
0.24
0.61
0.27
0.17
0.99
0.69
1.66E-01
3.46E-02
1.37E-03
80
73
83
0.60
0.57
0.48
0.90
0.85
0.73
4.67E-02
1.78E-03
1.21E-05
88
85
0.97
0.78
1.32
1.22
2.46E-11
2.78E-08
85
82
1.13
1.18
1.01
1.45
3.56E-15
1.00E-15
85
71
81
1.34
0.43
1.01
1.38
0.34
1.72
6.90E-07
1.26E-02
6.17E-07
82
64
74
1.73
0.72
2.20
1.77
0.90
6.08
1.00E-15
6.48E-04
1.41E-06
83
80
1.18
1.15
1.34
1.39
4.87E-11
1.92E-08
76
73
1.22
1.03
1.30
1.25
1.06E-10
4.16E-08
64
69
76
81
59
64
71
76
73
-0.45
-0.03
2.77
0.27
0.69
0.37
0.55
1.90
1.51
3.38
3.13
7.70
6.20
1.10
0.55
0.83
1.86
1.36
1.04E-01
2.45E-01
9.17E-03
5.36E-04
2.40E-02
2.36E-05
1.76E-07
1.07E-15
2.37E-06
66
70
77
82
59
63
70
75
72
-0.48
-0.10
1.76
1.35
-0.22
-0.13
0.95
1.55
1.00
3.57
3.39
8.14
0.60
0.51
0.15
1.52
1.90
1.79
1.37E-01
1.18E-01
1.13E-02
1.24E-04
1.91E-01
7.24E-04
2.60E-07
8.34E-12
3.71E-08
LEFT Hemisphere
Stimulus
session 13 cont. noise
35/s NBs
35/s NBs,
25% STF
35/s clicks
music
session 14 cont. noise
35/s NBs
35/s NBs,
50% STF
35/s NBs,
25% STF
35/s clicks
music
session 15 2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
session 16 2/s NBs
2/s NBs,
70 dB SL
35/s NBs,
40 dB SL
35/s NBs
35/s NBs,
70 dB SL
music
session 19 2/s TBs,
4 kHz
2/s NBs
35/s TBs,
4 kHz
35/s NBs
music
session 20 speech
cont. noise
35/s NBs
music
RIGHT Hemisphere
SPL
(dB)
75
73
71
TimeAverage
0.60
0.55
0.35
Onset
p-value
TimeAverage
0.78
0.48
0.38
Onset
p-value
2.04E-11
6.56E-13
6.91E-10
SPL
(dB)
74
76
71
0.84
0.98
0.41
0.80
0.58
0.54
9.25E-11
9.16E-12
7.23E-12
75
71
78
79
77
0.70
0.19
0.72
0.77
0.95
0.78
0.75
0.96
1.10
0.66
3.06E-13
1.30E-03
2.98E-08
3.49E-12
2.76E-10
75
70
78
79
73
0.54
0.58
0.60
0.71
0.64
0.79
0.85
0.84
0.76
0.68
1.00E-15
1.04E-04
1.26E-10
5.73E-15
1.00E-09
75
0.43
0.54
7.12E-09
68
0.31
0.64
2.08E-09
78
74
62
77
0.79
1.27
0.23
-0.02
0.88
1.59
0.37
0.29
1.47E-08
4.49E-07
2.15E-02
1.62E-02
74
74
66
81
0.56
0.74
0.15
0.03
0.63
1.06
0.46
0.32
3.31E-10
1.65E-06
3.39E-02
1.17E-01
59
0.43
0.44
3.06E-06
63
0.52
0.56
2.19E-07
74
89
0.70
0.81
0.78
0.99
1.00E-15
1.00E-15
78
93
0.44
0.84
0.65
0.91
1.15E-13
1.00E-15
71
70
85
1.28
0.57
0.44
1.48
0.57
0.39
2.54E-14
8.49E-09
1.89E-06
75
73
88
0.68
0.16
0.43
1.02
0.37
0.53
1.04E-07
3.90E-04
1.05E-05
67
1.04
1.02
1.00E-15
70
0.86
0.98
1.00E-15
82
97
1.23
1.42
1.55
1.49
1.00E-15
1.00E-15
85
100
0.65
1.41
0.90
1.53
1.00E-15
1.00E-15
79
55
1.38
-0.18
1.52
0.61
5.35E-14
1.46E-02
82
54
1.12
0.59
1.65
1.38
1.22E-08
6.50E-03
63
67
-0.35
-0.35
-0.01
-0.24
1.20E-02
3.09E-02
62
66
0.60
0.16
0.99
0.52
1.08E-02
2.56E-01
75
72
83
70
72
69
0.33
0.63
0.70
0.83
0.86
0.68
0.72
0.52
0.71
1.00
1.06
0.91
1.89E-05
5.90E-03
9.19E-06
3.27E-11
5.56E-15
3.26E-03
74
71
85
73
75
72
0.64
0.74
0.59
0.99
0.83
0.77
1.12
0.49
0.80
1.05
0.85
0.88
2.32E-06
5.27E-03
2.00E-05
1.63E-13
1.00E-15
1.08E-05
LEFT Hemisphere
Stimulus
session 21 2/s TBs,
4 kHz
35/s TBs,
4 kHz
35/s TBs,
500 Hz
35/s NBs
music
session 22 2/s NBs
2/s BP NBs,
4 kHz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
session 23 2/s NBs
2/s BP NBs,
500 Hz
35/s BP NBs,
4 kHz
35/s BP NBs,
500 Hz
35/s NBs
music
session 24 2/s NBs
cont. noise
35/s NBs
music
session 25 2/s NBs
cont. noise
35/s NBs
music
session 26 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
session 27 2/s NBs
speech
35/s clicks
100/s clicks
35/s NBs
music
RIGHT Hemisphere
SPL
(dB)
59
TimeAverage
0.17
Onset
p-value
TimeAverage
0.62
Onset
p-value
7.77E-03
SPL
(dB)
61
1.02
1.12
2.19E-05
71
0.35
0.45
2.12E-06
73
0.52
0.92
2.42E-03
85
0.46
0.46
1.41E-06
80
0.97
0.99
1.00E-15
76
73
80
71
0.55
1.33
0.49
0.54
0.51
1.25
0.81
0.80
6.07E-07
5.31E-05
4.30E-02
4.01E-04
74
71
67
61
0.86
0.87
0.19
0.18
1.20
0.67
0.39
0.51
1.55E-14
2.48E-05
7.34E-02
1.32E-03
84
0.25
0.43
6.19E-05
74
0.62
0.69
8.68E-09
98
1.08
1.06
1.21E-12
95
0.82
0.94
1.00E-15
92
89
64
76
1.13
1.03
0.28
0.37
1.44
1.66
0.48
0.56
8.69E-13
2.11E-05
1.10E-02
7.13E-02
79
76
63
77
1.17
0.90
0.33
0.36
1.69
1.35
0.51
0.55
1.00E-15
1.55E-06
1.08E-03
1.42E-03
68
0.55
0.80
1.48E-03
71
0.44
0.73
1.64E-03
92
1.05
1.33
3.94E-11
92
1.27
1.36
1.00E-15
76
73
68
81
80
77
68
81
80
77
63
88
75
77
75
72
70
85
83
81
82
79
1.01
0.79
0.32
0.32
0.35
1.07
0.56
0.98
1.27
0.43
-0.02
0.76
0.87
1.28
0.99
1.03
0.32
0.50
1.16
0.97
0.84
1.16
1.32
1.25
0.51
0.56
0.67
0.92
0.72
1.07
1.17
0.68
1.21
1.00
0.49
1.07
0.82
0.83
0.50
1.04
1.47
1.21
1.16
1.48
8.50E-11
1.89E-03
7.73E-02
9.36E-05
3.04E-04
2.37E-04
3.23E-02
2.73E-10
3.88E-15
2.60E-04
7.42E-02
3.97E-12
2.41E-11
1.00E-15
1.04E-09
1.38E-04
1.44E-02
1.78E-09
1.00E-15
1.00E-15
1.00E-15
1.08E-08
75
72
65
78
77
74
66
79
78
75
63
88
74
77
75
72
75
84
88
80
87
84
1.00
0.77
-0.29
0.55
0.49
0.99
0.08
0.97
0.56
1.22
0.23
0.57
1.04
1.40
1.55
1.36
0.98
0.41
0.85
0.69
0.78
1.07
1.13
1.16
-0.20
0.69
0.94
1.33
0.39
1.51
1.03
1.08
0.66
0.99
0.90
1.51
1.69
1.52
1.19
0.75
1.19
0.64
1.03
1.23
1.00E-15
1.81E-06
2.95E-01
7.05E-09
8.22E-09
3.95E-08
4.28E-02
3.72E-13
4.80E-12
1.79E-05
7.09E-04
4.57E-14
1.00E-15
1.00E-15
1.00E-15
2.76E-09
2.91E-05
2.55E-12
1.00E-15
1.00E-15
1.00E-15
1.06E-11
LEFT Hemisphere
Stimulus
session 28 cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 29 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 30 cont. noise,
35 dB SL
cont. noise
cont. noise,
75 dB SL
music
session 31 cont. noise,
35 dB SL
cont. noise,
45 dB SL
cont. noise
cont. noise,
65 dB SL
music
session 32 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 33 cont. noise,
35 dB SL
cont. noise
cont. noise,
65 dB SL
cont. noise,
75 dB SL
music
session 34 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
RIGHT Hemisphere
SPL
(dB)
79
TimeAverage
0.62
Onset
p-value
TimeAverage
0.76
Onset
p-value
7.22E-08
SPL
(dB)
77
0.96
1.09
3.71E-07
89
99
0.55
0.90
0.53
1.15
5.46E-08
1.00E-15
87
97
0.62
0.80
0.78
1.11
1.20E-08
2.48E-13
89
64
0.50
0.30
0.46
0.70
3.30E-03
3.14E-04
87
62
0.84
0.20
1.33
0.41
3.75E-06
2.74E-02
84
94
1.06
0.96
1.33
1.00
1.52E-11
1.16E-12
82
92
0.66
0.81
0.65
1.10
3.00E-07
7.90E-12
104
1.07
1.32
1.10E-13
102
0.77
0.88
5.17E-11
84
62
0.19
0.25
0.34
0.64
2.83E-01
3.64E-04
82
60
0.62
0.73
0.62
1.00
1.91E-02
3.94E-10
82
102
0.86
0.88
0.95
1.32
5.61E-15
1.00E-15
80
100
0.90
0.96
1.23
1.15
1.00E-15
1.00E-15
82
63
0.57
0.49
1.01
1.08
1.88E-05
6.37E-03
80
62
0.75
0.64
1.01
1.73
7.21E-06
4.04E-02
73
0.83
1.17
6.84E-10
72
0.72
1.34
6.92E-05
83
93
0.77
0.85
1.17
1.22
1.03E-08
7.30E-13
82
92
0.62
1.15
1.22
1.77
1.01E-08
2.94E-13
83
60
0.64
0.25
1.11
0.85
1.02E-03
8.23E-03
82
55
0.30
0.49
0.61
0.87
1.98E-02
5.61E-03
80
90
0.92
0.96
0.95
1.21
6.57E-09
3.85E-10
75
85
0.89
1.06
1.03
1.13
2.94E-10
1.89E-11
100
1.12
1.02
1.57E-14
95
0.78
0.95
8.75E-15
80
63
1.24
0.53
1.23
0.58
6.29E-05
2.52E-03
75
61
0.85
0.36
0.91
0.43
2.39E-05
4.44E-02
83
93
0.43
1.02
0.45
0.62
5.46E-03
1.81E-08
81
91
0.50
0.98
0.66
1.38
5.16E-04
9.75E-07
103
1.22
1.11
1.91E-09
101
0.86
1.24
1.26E-07
83
64
-0.57
0.31
-0.16
0.54
6.66E-02
4.68E-02
81
61
-0.60
0.27
-0.46
0.52
1.05E-01
7.28E-02
84
0.59
0.96
1.12E-03
81
0.30
0.34
1.35E-01
94
0.88
0.87
6.46E-06
91
0.73
1.14
2.70E-04
LEFT Hemisphere
Stimulus
session 35 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 36 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 37 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 38 music,
30 dB SL
music,
50 dB SL
music,
60 dB SL
session 39 2/s NBs
10/s NBs
20/s NBs
35/s NBs
session 40 2/s NBs
10/s NBs
20/s NBs
35/s NBs
session 41 2/s NBs
10/s NBs
20/s NBs
35/s NBs
session 42 2/s NBs
10/s NBs
20/s NBs
35/s NBs
session 43 2/s NBs
10/s NBs
20/s NBs
35/s NBs
session 44 2/s NBs
10/s NBs
20/s NBs
35/s NBs
RIGHT Hemisphere
SPL
(dB)
66
TimeAverage
0.30
Onset
p-value
TimeAverage
0.35
Onset
p-value
1.35E-01
SPL
(dB)
64
0.54
0.49
2.25E-02
86
0.33
0.55
1.77E-01
84
0.65
0.96
2.13E-04
96
0.25
0.76
1.79E-05
94
0.65
0.96
7.58E-04
60
0.92
1.08
4.83E-02
58
0.28
0.61
2.99E-02
80
0.32
0.62
2.28E-02
78
0.42
1.11
1.41E-02
90
0.61
1.02
1.31E-03
88
0.59
0.86
2.18E-02
64
0.87
1.01
1.32E-03
61
0.51
0.66
1.07E-01
84
0.52
0.88
1.10E-04
81
0.43
0.95
2.54E-02
94
0.51
0.91
5.80E-04
91
0.57
1.23
2.09E-04
63
0.68
1.48
2.67E-02
55
0.04
0.20
1.98E-01
83
0.37
0.57
4.30E-02
75
0.24
0.11
5.51E-02
93
0.73
0.61
2.50E-02
85
0.68
0.90
7.48E-03
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
0.28
0.74
1.21
0.97
0.48
1.11
1.00
0.66
2.13
1.35
0.70
0.87
0.15
0.84
0.57
1.00
1.36
1.13
1.20
1.38
0.35
0.57
0.86
0.30
0.24
0.83
1.31
0.93
0.15
1.42
0.86
0.87
2.40
1.77
1.61
0.91
0.44
1.02
0.22
1.13
1.15
1.46
1.49
1.77
0.45
0.82
0.62
0.39
3.33E-03
7.94E-03
8.31E-10
7.09E-06
4.72E-02
5.56E-07
5.22E-14
3.32E-06
1.79E-11
3.55E-14
1.00E-15
9.17E-11
3.59E-03
1.72E-05
5.66E-06
1.94E-08
4.37E-11
4.65E-14
1.00E-15
2.49E-15
1.26E-01
5.00E-03
1.97E-07
1.69E-08
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
NA
0.29
0.34
1.01
1.21
0.83
0.97
0.40
0.58
-0.01
0.47
1.08
0.84
0.64
0.71
1.31
1.07
0.57
0.54
1.13
1.06
0.62
1.32
0.66
0.54
0.41
0.84
1.31
1.22
1.24
1.62
0.86
1.27
2.67
0.75
0.99
0.68
0.70
0.91
0.99
1.58
0.67
0.55
1.30
1.08
0.74
1.30
0.95
0.41
7.23E-02
2.22E-02
5.05E-08
3.39E-09
3.75E-03
8.37E-11
2.88E-07
1.10E-08
1.59E-03
1.95E-07
1.39E-10
1.11E-08
7.85E-03
4.35E-11
1.00E-15
1.35E-14
3.87E-02
3.53E-04
5.48E-14
4.28E-08
3.69E-04
1.63E-04
1.04E-10
2.44E-06
Biography
Michael Patrick Harms
Place of birth: Minneapolis, Minnesota
EDUCATION
Massachusetts Institute of Technology, Cambridge, MA
Ph.D. Speech and Hearing Biosciences and Technology, Harvard-MIT Division of Health
Sciences and Technology, June 2002.
Thesis: Sound temporal envelope and time-patterns of activity in the human auditory
pathway: an fMRI study.
Rice University, Houston, TX
B.S. Electrical Engineering, Summa Cum Laude, May 1994
Senior Honors Project: Modeling the LSO neuron: Inhibition of the excitatory response.
HONORS AND AWARDS
• Martinos Fellowship (1999, 2000). For research in imaging at MIT.
• NIH Training Fellowship (1994-1999)
• Rice Engineering Alumni Outstanding Senior Award (1994). Awarded to a single senior
across all of Rice's engineering departments.
• Phi Beta Kappa, Tau Beta Pi National Honor Societies
• Jones College (Rice University) Sophomore and Junior Scholar (1992, 1993)
TEACHING EXPERIENCE
Department of Electrical Engineering, MIT, Cambridge, MA
Graduate Teaching Assistant, Probabilistic Systems Analysis, Fall 1996
• Prepared and led weekly recitation and small group tutorial sessions.
PUBLICATIONS
Melcher, J.R., Talavage, T.M., and Harms, M.P. (1999). Functional MRI of the auditory
system. In: Functional MRI. Moonen, C.T.W. and Bandettini, P.A. (eds), Berlin: Springer-Verlag, p. 393-406.
ABSTRACTS
Harms, M.P., Sigalovsky, I.S., and Melcher, J.R. (2002). Temporal dynamics of fMRI
responses in human auditory cortex: primary vs. non-primary areas. Assoc. for Research in
Otolaryngology, Midwinter Meeting. Abstract #929.
Harms, M.P., Sigalovsky, I.S., Guinan, J.J., and Melcher, J.R. (2001). Temporal dynamics of
fMRI responses in human auditory cortex: dependence on stimulus type. Assoc. for
Research in Otolaryngology, Midwinter Meeting. Abstract #653.
Sigalovsky, I.S., Hawley, M.L., Harms, M.P., Levine, R.A., and Melcher, J.R. (2001). Sound
level representations in the human auditory pathway investigated using fMRI. Assoc. for
Research in Otolaryngology, Midwinter Meeting. Abstract #654.
Harms, M.P., and Melcher, J.R. (1999). Understanding novel fMRI time courses to rapidly
presented noise bursts. 5th International Conference on Functional Mapping of the Human
Brain, NeuroImage 9: S847.
Harms, M.P., Melcher, J.R., and Weisskoff, R.M. (1998). Time courses of fMRI signals in
the inferior colliculus, medial geniculate body, and auditory cortex show different
dependencies on noise burst rate. 4th International Conference on Functional Mapping of
the Human Brain, NeuroImage 7, S365.
Harms, M.P., Melcher, J.R., and Weisskoff, R.M. (1998). Dependence of fMRI activation in
the inferior colliculus, medial geniculate body, and auditory cortex on noise burst rate.
Assoc. for Research in Otolaryngology, Midwinter Meeting. Abstract #827.
Harms, M.P., Weisskoff, R.M., and Melcher, J.R. (1997). Dependence of fMRI activation in
the human inferior colliculus and Heschl's gyrus on noise burst rate. Society for
Neuroscience Abstracts, 23: 1033.