Sound temporal envelope and time-patterns of activity in the human auditory pathway: an fMRI study

by

Michael Patrick Harms

B.S., Electrical Engineering
Rice University, 1994

Submitted to the Harvard-M.I.T. Division of Health Sciences and Technology in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2002

© 2002 Michael P. Harms. All rights reserved.

The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author:
Harvard-M.I.T. Division of Health Sciences and Technology
March 4, 2002

Certified By:
Jennifer R. Melcher, Ph.D.
Assistant Professor of Otology and Laryngology, Harvard Medical School
Thesis Supervisor

Accepted By:
Martha L. Gray, Ph.D.
Edward Hood Taplin Professor of Medical Engineering and Electrical Engineering
Co-director, Harvard-M.I.T. Division of Health Sciences and Technology

Sound temporal envelope and time-patterns of activity in the human auditory pathway: an fMRI study

by

Michael Patrick Harms

Submitted to the Harvard-M.I.T. Division of Health Sciences and Technology on March 4, 2002 in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

The temporal envelope of sound strongly influences the intelligibility of speech, pattern analysis, and the grouping of sequential stimuli. This thesis examined the coding of sound temporal envelope in the time-patterns of population neural activity of the human auditory pathway. Traditional microelectrode recordings capture the fine time-pattern of neural spiking in individual neurons, but do not necessarily provide a good assay of temporal coding in neural populations. In contrast, functional magnetic resonance imaging (fMRI), the technique chosen for the present study, provides an indicator of population activity over a time-scale of seconds, with the added advantage that it can be used routinely in human listeners.

In a first study, it was established that the time-pattern of cortical activity is heavily influenced by sound repetition rate, whereas the time-pattern of subcortical activity is not. In the inferior colliculus, activity to prolonged noise burst trains (30 s) increased with increasing rate (2/s – 35/s), but was always sustained throughout the train. In contrast, the most striking sound rate dependence of auditory cortex was seen in the time-pattern of activity. Low rates elicited sustained activity, whereas high rates elicited “phasic” activity, characterized by strong adaptation early in the train and a robust response to train offset. These results for auditory cortex suggested that certain sound temporal envelope characteristics are encoded over multiple seconds in the time-patterns of cortical population activity. A second study tested this idea more fully by using a wider variety of sounds (e.g., speech, music, clicks, tones) and by systematically varying different sound features.
Important for this test was the development of a new set of basis functions for use in a general linear model that enabled the detection and quantification of the full range of cortical activity patterns. This study established that the time-pattern of cortical activity is strongly dependent on sound temporal envelope, but not sound level or bandwidth. Namely, as either rate or sound-time fraction increases, the time-pattern shifts from sustained to phasic. Thus, shifts in the time-pattern of cortical activity from sustained to phasic signal subsecond differences in sound temporal envelope. These shifts may be fundamental to the perception of successive acoustic transients as either distinct or grouped acoustic events.

Thesis Supervisor: Jennifer R. Melcher, Ph.D.
Assistant Professor of Otology and Laryngology, Harvard Medical School

ACKNOWLEDGEMENTS

I will forever be grateful to the many people who contributed either directly to this thesis or to my personal development over the last seven-plus years at MIT. I have benefited tremendously from your advice, and most importantly your friendship.

This thesis never would have been possible without the guidance and nurturing of my advisor, Jennifer Melcher. She contributed immensely to my scientific development. She always challenged me to produce my best, but at the same time remained a good friend (quite an accomplishment for two Type A personalities). I very much enjoyed our frequent interaction, and her ever “open door”, which made it easy to get comments and feedback. Jennifer willingly shouldered the burden of assuring that we were funded, and that human subjects approval was obtained, thus freeing me to focus blissfully on my research. I fully realize that not all graduate students are so fortunate. Thanks Jennifer!

My other thesis committee members, John Guinan, Mark Tramo, and Anders Dale, also provided many helpful comments and insights. The fact that they did not ask to be excused from my committee following an initial 4 hour thesis proposal meeting-marathon speaks volumes about their patience! As thesis chairman, John was always available to critique a line of thought, a poster, or a draft of an abstract or thesis chapter, which I more than took advantage of given his easy accessibility right down the hall.

I probably would never have entered the Speech and Hearing Sciences Program were it not for Nelson Kiang, whose vision and enthusiasm convinced me that joining this new program was a risk worth taking. Conversations with Nelson are always insightful and provocative, and I thank him for many fruitful career-related discussions.

I was fortunate to be able to interact extensively with outstanding scientists from both the NMR Imaging Center of Massachusetts General Hospital and the Eaton Peabody Laboratory of the Massachusetts Eye and Ear Infirmary. At the NMR Center, I was welcomed by Bruce Rosen, Ken Kwong, Bruce Jenkins, Robert Weisskoff, Hans Breiter, Randy Gollub, Joe Mandeville, Rick Hoge, and Sean Marrett. I was equally welcomed at EPL by Charlie Liberman, Chris Brown, Bertrand Delgutte, Bill Peake, John Rosowski, Peter Cariani, Barbara Fullerton, Mike Ravicz, and Chris Shera. Thanks to both groups of scientists for their comments and advice, and for fostering an enjoyable work environment.

I was very fortunate to be able to interact on a daily basis with a very high-caliber group of other students and post-docs.
Pankaj Oberoi, Susan Voss, Mark Oster, Diane Ronan, Janet Slifka, Joyce Rosenthal, Chandran Seshagiri, Alan Groff, Annette Taberner, Ona Wu, Whitney Edmister, and Greg Zaharchuk have been good friends and colleagues. Additional good friends, who were also frequently critics of my posters, practice talks, and manuscripts, include Irina Sigalovsky, Monica Hawley, Tom Talavage, Martin McKinney, John Iversen, Sridhar Kalluri, and Courtney Lane. Ben Hammond’s untimely death cost me a dear friend, and deprived the auditory community of a great thinker. Doug Greve of the NMR Center graciously let me use his Matlab implementation of the general linear model, and helped explain its inner workings so that I could modify it to my purposes. Outside of lab, Adelle Smith, Mike Lohse, Phil Bradley, and Julie Bradley all helped to assure that my time in Boston was a great experience.

I freely admit that I was spoiled as a graduate student in terms of the administrative support that I received. Barbara Norris was absolutely instrumental in poster and figure preparation. Dianna Sands in the EPL front office kept things lively and handled many things that she was fully entitled to tell me to do myself. The EPL engineering staff, Dave Steffens, Ish Stefanov-Wagner, and Frank Cardarelli, ensured that I had the computer support necessary to do my work. Terry Campbell and Mary Foley of the NMR Center were always available to explain how to run the magnet, and how to get it to do just what we wanted.

I thank my many subjects for agreeing to lie still in the magnet for 2 hours, while being instructed continually to “listen attentively” to rather boring acoustic stimuli. Many of my friends were subjects at some point, for which I am tremendously grateful.

Somehow my parents got the impression (probably from a lack of clarity on my part) that this “PhD thing” was just a four-year process. I’m sure that they are ecstatic that they can now give a satisfactory answer to the questions from friends and relatives about when I would be done with “school”. I thank them for their unending love and support through it all. My brother Brian and sister Erin were also key components of my support structure. It was a real treat having Brian here in Boston the past three and a half years.

Finally, I thank my wonderful wife, Nicole, for her unconditional love and support. The incredible depth of her love gives me a glimpse of God’s abounding love for humankind, by which He sent His Son, Jesus Christ, to be our Savior. Meeting and marrying her will always be the pinnacle of my Boston experience. I know that this whole thesis was very emotional and personal for her. My pains were her pains, and my joys were her joys. I can’t wait to set out on our next adventure together. Nicole is my earthly angel, sent by God, to be my companion and soul-mate. Neither of us could have done this without God’s serenity, peace, courage, and wisdom. We thank Him from our hearts for His many blessings.

Michael Harms
March 4, 2002

The work in this thesis was supported by NIH/NIDCD PO1DC00119, RO3DC03122, T32DC00038, and a Martinos Scholarship.

They that wait upon the Lord shall renew their strength. They will soar on wings like eagles.
Isaiah 40:31

Table of Contents

Chapter 1  Introduction
    Overview
    Thesis structure and chapter overview
    References

Chapter 2  Sound repetition rate in the human auditory pathway: Representations in the waveshape and amplitude of fMRI activation
    Abstract
    Introduction
    Methods
    Experiments I and II: Noise burst trains with different burst repetition rates
    Subjects
    Acoustic stimulation
    Task
    Imaging
    Analysis
    Experiment III: Small numbers of noise bursts
    Experiment IV: Noise burst trains with different durations
    Results
    Response to noise burst trains: effect of burst repetition rate
    Inferior Colliculus
    Medial Geniculate Body
    Heschl's gyrus and superior temporal gyrus
    Response to small numbers of noise bursts
    Response to high rate (35/s) noise burst trains: effect of train duration
    Discussion
    Role of rate per se in determining fMRI responses
    fMRI responses and underlying neural activity
    fMRI response onset and neural adaptation
    Phasic response “off-peak” and neural off responses
    Phasic response recovery
    Comparison to previous fMRI and PET studies – auditory and non-auditory
    Relationship between fMRI response waveshape and sound perception
    References

Chapter 3  Detection and quantification of a wide range of fMRI temporal responses using a physiologically-motivated basis set
    Abstract
    Introduction
    Methods
    fMRI data
    Basis functions
    Response and noise estimation under the general linear model
    Examination of residuals
    Practical implementation
    Activation map formation
    Waveshape index
    Results
    Activation detection: OSORU vs. sustained-only and sinusoidal basis functions
    Relative importance of the OSORU basis functions
    Assessment of correspondence between OSORU components and actual waveforms
    Using the OSORU basis functions to probe response physiology
    Discussion
    Successful response detection with the OSORU basis set
    A challenging database provided a strong test of the OSORU basis set
    Detecting and mapping response dynamics
    Previous implementations of the general linear model within a physiological framework
    Physiologically-based implementations of the GLM: broad applicability to any brain system
    References

Chapter 4  The temporal envelope of sound determines the time-pattern of fMRI responses in human auditory cortex
    Introduction
    Methods
    Stimuli
    Stimulus level
    Task
    Sound delivery
    Acoustic stimulation paradigm
    Handling scanner acoustic noise
    Imaging
    Image pre-processing
    Response detection
    Waveshape quantification
    Calculating response waveforms
    Defining regions of interest
    Results
    Waveshape dependence on stimulus type in posterior auditory cortex
    Waveshape dependence on modulation rate in posterior auditory cortex
    Waveshape dependence on sound-time fraction in posterior auditory cortex
    Insensitivity of waveshape to sound level in posterior auditory cortex
    Insensitivity of waveshape to sound bandwidth in posterior auditory cortex
    Response waveshapes throughout auditory cortex for music and 35/s noise bursts
    Differences in response waveshape between cortical areas
    Left-right differences in response waveshape
    Discussion
    Response waveshape: hemodynamic vs. neural factors
    Response waveshape: neural adaptation and off-responses
    Response waveshape and sound temporal envelope characteristics: rate and sound-time fraction
    Relationship between sound perception, fMRI time-pattern, and neural activity
    References

Appendix
    Subjects
    HG Data
    STG Data
    IC Data

Biography

Chapter 1
Introduction

OVERVIEW

A primary goal of auditory neuroscience is to understand how human speech and environmental sounds are represented in neural activity, and how this information is processed and transformed at the various stages of the auditory pathway. Over the past 50 years, microelectrode recordings in animals have yielded detailed information regarding the spatial and temporal patterns of neural activity evoked by acoustic stimuli. This animal work has provided considerable insight into the coding of various sound features (e.g., frequency, intensity, amplitude modulation) in the activity of individual neurons. However, because sampling from many neurons across a region of tissue can be difficult and time-consuming, microelectrode recordings are generally insufficient for revealing how sound features are represented in population neural activity. Ultimately, knowing how systems of neurons encode sound features in their population activity may be as relevant and important for understanding aspects of speech processing and auditory perception as a detailed knowledge of how sound is represented in individual neurons.

Commonly employed techniques for studying population activity include evoked potentials, electroencephalography (EEG), magnetoencephalography (MEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI).
One advantage of these techniques is that they can be applied routinely to humans. This is important, since the degree to which animal findings extend to humans remains uncertain, due to interspecies differences, possible effects of anesthesia, and a paucity of data in humans that can serve as a link to the animal work. Additionally, some of the neural processes relevant to human speech processing and auditory perception may be altogether unique to humans. Ultimately, direct neurophysiological data in human listeners are important if we are to understand how sound features are coded in the activity patterns of the human brain.

This thesis studies population neural activity of the human auditory system using fMRI. Since its emergence in the early 1990s (Kwong et al. 1992; Ogawa et al. 1992), fMRI has been widely adopted as a technique for studying human brain activity. A particular strength of fMRI is its ability to map brain activity directly to anatomy with a high spatial resolution (~1 mm) compared to other neuroimaging techniques. While the vast majority of fMRI studies to date have focused on cortical activity, fMRI can successfully examine activity in structures throughout the auditory pathway (Guimaraes et al. 1998; Melcher et al. 1999). Since multiple levels of the auditory pathway can be studied simultaneously with fMRI, the transformation of neural activity across different levels of the pathway can be examined directly within individual subjects. While most fMRI studies have focused on the spatial patterns of brain activity, fMRI also has the temporal resolution necessary to uncover changes in the temporal patterns of population neural activity that occur over a span of seconds, as this thesis amply illustrates.

The fMRI response arises from localized hemodynamic changes that ultimately reflect changes in “neural activity” (broadly defined as neural spiking, and excitatory and inhibitory synaptic activity; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995; Heeger et al. 2000; Rees et al. 2000; Logothetis et al. 2001). Because the hemodynamic system responds in a “sluggish” manner to changes in neural activity, the fMRI response can be thought of as reflecting the time-envelope of population neural activity in a local region of the brain. This thesis focuses particularly on how this time-envelope of activity relates to sound features and how it changes across different levels of the auditory pathway.
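To make this notion concrete, the expected fMRI signal is often approximated as the convolution of an underlying neural-activity envelope with a slow hemodynamic impulse response. The sketch below (Python/NumPy) illustrates the idea for a sustained 30 s stimulus; the gamma-shaped impulse response and its parameters are illustrative assumptions for this sketch, not the model used in this thesis.

```python
import numpy as np

def gamma_hrf(t, tau=1.25, delay=2.5):
    # Assumed gamma-variate hemodynamic impulse response (illustrative shape only)
    h = np.where(t > delay, ((t - delay) / tau) ** 2 * np.exp(-(t - delay) / tau), 0.0)
    return h / h.sum()

dt = 0.1                                          # time step (s)
t = np.arange(0.0, 70.0, dt)                      # 70 s analysis window
neural = ((t >= 0.0) & (t < 30.0)).astype(float)  # sustained neural envelope during a 30 s stimulus
hrf = gamma_hrf(np.arange(0.0, 20.0, dt))         # ~20 s impulse response
fmri = np.convolve(neural, hrf)[:t.size]          # predicted fMRI time course (arbitrary units)
```

Because the impulse response extends over several seconds, millisecond-scale structure in the neural activity is smoothed away, and only the seconds-scale envelope of population activity survives in the predicted signal.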
The focus on the time-pattern (i.e., dynamics) of fMRI responses arose from a discovery early in this thesis of a novel fMRI response in auditory cortex. Early concern in the fMRI literature regarding response dynamics focused on whether or not the response to very prolonged stimuli (e.g., minutes long) decreased over time solely due to changes in the coupling between hemodynamic and metabolic factors (Frahm et al. 1996; Bandettini et al. 1997; Chen et al. 1998; Howseman et al. 1998). Other fMRI studies subsequently observed transient response features that occurred over shorter time spans and which are likely related to adaptation of neural activity (Hoge et al. 1999; Jäncke et al. 1999; Giraud et al. 2000; Sobel et al. 2000). Nonetheless, the transient aspects of cortical fMRI responses reported in this thesis are particularly dramatic, including a rapid decline in signal to near baseline following an initial response to stimulus onset, and a prominent response following the termination of the stimulus. This phasic response, in contrast to the sustained responses typically observed with fMRI, indicates that the time-pattern of auditory fMRI responses contains information about robust sound-dependent variations in the population neural activity of the auditory pathway.

THESIS STRUCTURE AND CHAPTER OVERVIEW

The thesis is composed of three main chapters, each written in the style of a self-contained paper. An overview of each of the three chapters follows.

The phasic fMRI response was first discovered in a study investigating how repetition rate is represented in the activity patterns of multiple auditory structures in the human brain (Chapter 2). At the commencement of this thesis, there were very few studies exploring the relationship between the fMRI response and fundamental stimulus parameters such as repetition rate, stimulus level, or bandwidth for the types of simple acoustic stimuli used routinely in auditory electro- and neurophysiology, such as noise bursts, tone bursts, clicks, and continuous noise. Responses to noise bursts presented at repetition rates ranging from 1/s to 35/s were collected from the inferior colliculus, medial geniculate body, and both primary and non-primary auditory cortex. This study revealed that the time-pattern of the fMRI response was highly dependent on repetition rate, in a manner that itself was dependent on auditory structure. In particular, responses in the inferior colliculus were sustained at all rates, although they increased in amplitude with increasing rate. In contrast, sustained responses were only elicited at low rates in auditory cortex (e.g., 2/s), whereas the highest rate (35/s) elicited a response with a highly phasic time-pattern. The DISCUSSION of Chapter 2 links these transient response features to neural adaptation and the generation of neural off-responses, and includes a more detailed discussion of the relationship between the fMRI signal and neural activity.

The discovery of a novel temporal fMRI response in auditory cortex necessitated a reevaluation of the statistical model employed for detecting regions (i.e., voxels) of the brain responsive to a given stimulus. In Chapter 3, I develop a method capable of detecting responses with a wide variety of temporal dynamics, while simultaneously extracting information about individual temporal features of the response. Specifically, I implemented the general linear model using a novel set of “physiologically-motivated” basis functions chosen to reflect temporal features of auditory cortical fMRI responses. The performance of this basis set in detecting responses is compared against two other basis sets that have been commonly employed in fMRI analyses. Additionally, I establish that this physiologically-motivated basis set proves effective in exploring brain physiology.

Equipped with a new approach for detecting and quantifying a wide variety of responses, Chapter 4 proceeds to rigorously establish which particular sound features are primarily coded in the time-pattern of auditory cortical fMRI responses. I establish that the time-pattern of auditory fMRI responses is primarily determined by sound temporal envelope, but not sound level or bandwidth. In particular, as either repetition rate or sound-time fraction increases, the time-pattern shifts from sustained to phasic. I further establish that sound temporal envelope characteristics are strongly represented in the time-pattern of fMRI responses throughout auditory cortex.
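As a concrete, simplified picture of the basis-function approach described above for Chapter 3: a general linear model expresses each voxel's time course as a weighted sum of temporal basis functions plus noise, and the weights are estimated by least squares. The sketch below (Python/NumPy) uses two generic placeholder regressors, a "sustained" and an "onset" component; the names, regressor shapes, and synthetic data are illustrative assumptions, and the actual OSORU basis functions defined in Chapter 3 differ in form.

```python
import numpy as np

def fit_glm(y, basis):
    """Least-squares fit of one voxel's time course to a set of temporal basis functions.

    y     : (n_images,) image signal for one voxel
    basis : (n_images, n_basis) design matrix, one column per temporal component
    Returns the component weights and the residual time course.
    """
    X = np.column_stack([basis, np.ones(len(y))])   # add a constant (baseline) column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[:-1], y - X @ beta

# Hypothetical regressors for a 30 s on / 30 s off paradigm sampled every ~2 s
n = 120
t = np.arange(n) * 2.0
sustained = ((t % 60.0) < 30.0).astype(float)       # "sustained" component
onset = np.zeros(n)
onset[(t % 60.0) < 6.0] = 1.0                       # brief component near each train onset
y = 0.8 * sustained + 0.3 * onset + 0.1 * np.random.randn(n)   # synthetic voxel time course
weights, residual = fit_glm(y, np.column_stack([sustained, onset]))
```

The appeal of this formulation is that detection (is the voxel responsive?) and characterization (which temporal components dominate?) come out of the same fit.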
Overall, the relationship between sound temporal envelope and the time-pattern of neural activity is particularly interesting in light of the perceptual changes that occur as sound envelope is varied. Successive stimuli in a low-rate train can be discerned individually, whereas those of higher rate trains begin to fuse into a continuous percept, and may be grouped into a single auditory “event”. The changes of response time-pattern in auditory cortex are correlated with these perceptual changes – low-rate trains evoke sustained responses, consistent with successive neural responses to each burst in a train, whereas high rate trains evoke phasic responses, consistent with neural activity primarily concentrated at the onset and offset of the overall train.

Across levels of the auditory pathway, the results of this thesis indicate that lower levels of the auditory pathway can respond to successive acoustic transients up to higher rates than can higher levels. At lower levels of the pathway, population neural activity codes the occurrence of each successive acoustic transient in an ongoing sound. In contrast, population neural activity in cortex may reflect whether or not successive acoustic transients are perceptually grouped into a single auditory event.

REFERENCES

Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516, 1983.
Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW, Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997.
Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation of fMRI BOLD response in primary visual cortex of human brain during sustained visual stimulation. Magn Reson Med 39: 520-527, 1998.
Frahm J, Kruger G, Merboldt KD and Kleinschmidt A. Dynamic uncoupling and recoupling of perfusion and oxidative metabolism during focal brain activation in man. Magn Reson Med 35: 143-148, 1996.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S, Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain Mapp 6: 33-41, 1998.
Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999.
Howseman AM, Porter DA, Hutton C, Josephs O and Turner R. Blood oxygenation level dependent signal time courses during prolonged visual stimulation. Magn Reson Imaging 16: 1-11, 1998.
Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res Cogn Brain Res 8: 117-124, 1999.
Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci U S A 89: 5675-5679, 1992.
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001.
Melcher JR, Talavage TM and Harms MP. Functional MRI of the auditory system. In: Functional MRI, edited by Moonen CTW and Bandettini PA. Berlin: Springer, 1999, p. 393-406.
Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic activity in the central auditory system. J Comp Neurol 245: 553-565, 1986.
Ogawa S, Tank DW, Menon R, Ellermann JM, Kim SG, Merkle H and Ugurbil K. Intrinsic signal changes accompanying sensory stimulation: Functional brain mapping with magnetic resonance imaging. Proc Natl Acad Sci U S A 89: 5951-5955, 1992.
Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties of human and macaque V5. Nat Neurosci 3: 716-723, 2000.
Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE. Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol 83: 537-551, 2000.

Chapter 2
Sound repetition rate in the human auditory pathway: Representations in the waveshape and amplitude of fMRI activation

ABSTRACT

Sound repetition rate plays an important role in stream segregation, temporal pattern recognition, and the perception of successive sounds as either distinct or fused. The present study was aimed at elucidating the neural coding of repetition rate and its perceptual correlates. We investigated the representations of rate in the auditory pathway of human listeners using functional magnetic resonance imaging (fMRI), an indicator of population neural activity. Stimuli were trains of noise bursts presented at rates ranging from low (1-2/s; each burst is perceptually distinct) to high (35/s; individual bursts are not distinguishable). There was a systematic change in the form of fMRI response rate-dependencies from midbrain, to thalamus, to cortex. In the inferior colliculus, response amplitude increased with increasing rate while response waveshape remained unchanged and sustained. In the medial geniculate body, increasing rate produced an increase in amplitude and some change in waveshape at higher rates (from sustained to one showing a moderate peak just after train onset). In auditory cortex (Heschl's gyrus and the superior temporal gyrus), amplitude changed some with rate, but a far more striking change occurred in response waveshape – low rates elicited a sustained response, whereas high rates elicited an unusual phasic response that included prominent peaks just after train onset and offset. The shift in cortical response waveshape from sustained to phasic with increasing rate corresponds to a perceptual shift from individually resolved bursts to fused bursts forming a continuous (but modulated) percept. Thus, at high rates, a train forms a single perceptual “event”, the onset and offset of which are delimited by the on and off peaks of phasic cortical responses.
While auditory cortex showed a clear, qualitative correlation between perception and response waveshape, the medial geniculate body showed less correlation (since there was less change in waveshape with rate), and the inferior colliculus showed no correlation at all. Overall, our results suggest a population neural representation of the beginning and the end of distinct perceptual events that is weak or absent in the inferior colliculus, begins to emerge in the medial geniculate body, and is robust in auditory cortex.

INTRODUCTION

It is well-known from human psychophysical experiments that the perception of a succession of sounds depends strongly on the rate of sound presentation. For instance, when bursts of noise are presented repeatedly at a low rate (e.g., < 10/s), each burst can be separately resolved (Miller and Taylor 1948; Symmes et al. 1955). In contrast, bursts presented at a higher rate fuse to form a single, modulated percept. In experiments where multiple series of sounds are presented simultaneously (e.g., a series of high and a series of low frequency tone bursts), the rate of sound presentation influences whether the series are perceived as single or separate streams, as well as the perceived temporal pattern within each stream (Royer and Robin 1986; Bregman 1990). The dependencies on rate observed in controlled psychophysical experiments such as these suggest that rate plays an important role in the perception of the more complex acoustic conditions encountered in everyday life.

Since repetition rate plays so basic a role in determining how sounds are heard, it is not surprising that there have been numerous neurophysiological studies of rate in animals. Broad trends concerning the coding of rate in the auditory pathway have emerged from this work. For instance, the highest repetition rates at which neurons respond faithfully to each successive sound in a train (or each successive cycle of amplitude modulated stimuli) tend to decrease from brainstem to thalamus to cortex (e.g., Creutzfeldt et al. 1980; Schreiner and Langner 1988; Langner 1992). In cortex, the neural coding of low and high rates may be accomplished by different populations of neurons, one coding low rate stimuli through stimulus-synchronized activity and the other coding high rates in the overall amount of discharge activity (Lu and Wang 2000; Lu et al. 2001). While the animal work has shed light on the neural representations of repetition rate, the degree to which the animal findings extend to humans remains uncertain because of interspecies differences, anesthesia differences, and a paucity of data in humans that can serve as a link to the animal work. In the end, direct neurophysiological data in human listeners are important if we are to understand how repetition rate is represented in the activity patterns of the human brain.

Most previous neurophysiological studies of repetition rate in humans have used noninvasive techniques for probing brain function, such as evoked potential and evoked magnetic field measurements. The evoked response work has examined averaged responses at short, middle, and long latencies to various types of brief stimuli (e.g., clicks, tone and noise bursts) presented at different rates (Picton et al. 1974; Thornton and Coleman 1975; Näätänen and Picton 1987).
A particular strength of evoked potential and magnetic field measurements is that they can be used to examine responses to individual stimuli within a train up to much higher rates than with other noninvasive brain imaging techniques (see below). A limitation, however, is that the sites of response generation cannot always be reliably localized. Evoked magnetic field examinations of repetition rate are further limited in that they provide information mainly concerning cortical areas because of inherent limitations in probing subcortical function using this technique (Erne and Hoke 1990).

Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), two techniques for spatially mapping brain activity, have also been used to examine the dependence of human brain activation on repetition rate. Compared to evoked potential and magnetic field measurement, fMRI lacks the temporal resolution needed to separately resolve the responses produced by individual stimuli in a train (except at extremely low rates, e.g., ~0.1/s), and the temporal resolution of PET is even less. An important advantage, however, is that both PET and fMRI enable activation to be directly localized to brainstem, thalamic, and cortical structures of the auditory pathway (Guimaraes et al. 1998; Lockwood et al. 1999; Melcher et al. 1999; Griffiths et al. 2001). The localization provided by fMRI is particularly precise because of the technique's high spatial resolution and direct mapping to anatomy. Despite the fact that fMRI and PET can show activation at different stages of the auditory pathway, most rate studies using these approaches have focused exclusively on cortical areas. All but one (Giraud et al. 2000) have also focused on low repetition rates (< 2.5/s; Price et al. 1992; Binder et al. 1994; Frith and Friston 1996; Dhankhar et al. 1997; Rees et al. 1997). Overall, there is limited PET or fMRI data concerning the representations of rate within the human auditory pathway. Specifically, there is little information concerning the transformation of rate representations from structure to structure within the pathway for a wide range of psychophysically relevant rates.

The present fMRI study compared the representation of repetition rate across cortical and subcortical structures of the human auditory pathway using a wide range of rates. Stimuli were trains of repeated noise bursts with repetition rates ranging from low (where each burst could be resolved individually) to high (where individual bursts were not distinguishable and the train was perceived as a continuous, but modulated, sound). Noise bursts were chosen as the elemental stimulus based on the assumption that broadband sound would elicit robust responses by activating neurons across a wide range of characteristic frequencies. fMRI was selected for its high spatial resolution, its localizing capabilities, and its higher temporal resolution (~2 s) compared to PET (>10 s). The latter feature proved important because one of the most striking differences in rate representation across structures occurred in the temporal dynamics of the fMRI response.

METHODS

Four series of experiments were conducted. The first two examined the effect of repetition rate on the response to a noise burst train in the inferior colliculus (IC), Heschl's gyrus (HG), and the superior temporal gyrus (STG; Experiment I) or the IC and medial geniculate body (MGB; Experiment II).
The remaining Experiments (III, IV) were aimed at understanding one of the findings from Exps. I and II, namely an unusual form of temporal response in the cortex to trains with a high repetition rate. This study was approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital. All subjects gave their written informed consent.

Experiments I and II: Noise burst trains with different burst repetition rates

Subjects

Nine subjects participated in a total of 11 imaging sessions for Experiments I and II (Exp. I: 5 sessions, subject #'s 1-5; Exp. II: 6 sessions, subject #'s 2,5,6-9). Two subjects participated once in each Experiment. Subjects ranged in age from 19 to 35 years (mean = 25.6). Eight of the nine subjects were male. Eight of the nine were right-handed. Subjects had no known audiological or neurological disorders.

Acoustic stimulation

The stimuli were bursts of uniformly distributed white noise. The bursts were presented at repetition rates of 1, 2, 10, 35/s (Exp. I) or 2, 10, 20, 35/s (Exp. II). The 1/s rate was used in only 3 of the 5 sessions of Exp. I. Individual noise bursts in all four Experiments were always 25 ms in duration (full width half maximum), with a rise/fall time of 2.5 ms. The spectrum of the noise stimulus at the subject's ears was low-pass (6 kHz cutoff), reflecting the frequency response of the acoustic system.

Noise bursts were presented in 30 s long trains alternated with 30 s “off” periods, during which no auditory stimulus was presented (Figure 2-1, top). Four alternations between “train on” and “off” periods constituted a single scanning “run” (total duration 240 s). For all but two sessions (in Exp. I), each of the four rates was presented once during each run, and their order was varied across runs. Within a train, the repeated noise bursts were identical (i.e., “frozen”), but the noise bursts differed across trains and runs. For the other two sessions, the same rate was presented throughout a run, and this rate was varied across runs. For these two sessions, the noise burst was frozen throughout the entire run, but differed across runs. In each session, the total number of train presentations at each rate was between 8-13 (mean: 11.2).
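For a concrete picture of these stimuli, the sketch below (Python/NumPy) generates one train of frozen noise bursts at a given repetition rate. It is an illustration consistent with the parameters above, not the LabView code actually used; the sampling rate, the linear ramps, and the treatment of the 25 ms duration as total burst length (the text specifies full width at half maximum) are assumptions of this sketch.

```python
import numpy as np

def noise_burst_train(rate_hz, train_dur=30.0, burst_dur=0.025, ramp_dur=0.0025, fs=24000):
    """One 30 s train of identical ("frozen") uniform white-noise bursts at rate_hz bursts/s."""
    rng = np.random.default_rng(seed=0)
    n_burst = int(round(burst_dur * fs))
    n_ramp = int(round(ramp_dur * fs))
    burst = rng.uniform(-1.0, 1.0, n_burst)           # uniformly distributed white noise
    env = np.ones(n_burst)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)      # 2.5 ms rise
    env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)     # 2.5 ms fall
    burst *= env
    train = np.zeros(int(round(train_dur * fs)))
    period = int(round(fs / rate_hz))
    for start in range(0, train.size - n_burst, period):
        train[start:start + n_burst] = burst          # the same burst repeated ("frozen" noise)
    return train

train_35 = noise_burst_train(35.0)   # e.g., a 30 s train at 35 bursts/s
```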
Separately for each ear, the subject's threshold of hearing to 10/s noise bursts was determined in the scanner room immediately prior to the imaging session. Noise bursts were presented binaurally at 55 dB above this threshold. During both threshold determination and functional imaging, there was an on-going low-frequency background noise produced primarily by the pump for the liquid helium (used to supercool the magnet coils). This sound reaches levels of ∼80 dB SPL in the frequency range of 50-300 Hz (Ravicz et al. 2000). Additionally during functional imaging, each image acquisition generated a “beep” of approximately 115 dB SPL at 1.0 kHz (∼130 dB SPL at 1.4 kHz for Exps. III and IV). Noise bursts were delivered through a headphone assembly that provided approximately 30 dB of attenuation at the primary frequency of the scanner-generated sounds (1.0 or 1.4 kHz; Ravicz and Melcher 2001). Specifically, the noise bursts were produced by a D/A board (running under LabView), amplified, and fed to a pair of audio transducers housed in a shielded box adjacent to the scanner. The output of the transducers reached the subject's ears via air-filled tubes that were incorporated into sound attenuating earmuffs.

[Figure 2-1 near here]

Figure 2-1: Schematic of the stimulus paradigm for Exps. I-III. In Exps. I and II trains of noise bursts at a given repetition rate were presented for 30 s, followed by a 30 s “off” period. This alternation was repeated four times for each imaging “run”, typically using a different repetition rate for each “train on” period. Tick marks represent an image acquisition (approximately every 2 s). The expanded view uses a smaller time scale to illustrate the stimulus – in this case a portion of a prolonged 10/s noise burst train. In Exp. III, “trials” of noise bursts, consisting of either 1, 2 or 5 noise bursts, were presented once every 18 s (15-16 trials per run). The interstimulus interval for the two noise bursts was either 500 ms or 28.6 ms (i.e., 2 NBs@2/s and 2 NBs@35/s; in this case the expanded view shows the complete stimulus for a given trial). The trials with five noise bursts used an interstimulus interval of 28.6 ms. In one imaging session, all four trial types were presented in randomized order. In the other two sessions, the same trial type was used throughout a run. In all experiments, individual noise bursts were always 25 ms in duration.

Task

Subjects were instructed to listen to the noise burst trains. For Exp. II, subjects performed an additional, simple task to further ensure that they remained attentive. They indicated whenever they detected an occasional 6 dB increment or decrement in intensity by raising or lowering their index finger. Intensity changes persisted for all the noise bursts that occurred in a 1 s interval. Subject responses were monitored by the experimenter, who could see the subject's finger from the imager control room. Each subject identified more than 90% of the intensity changes.

At the end of each scanning run (for all Exps.), subjects reported their alertness on a qualitative scale ranging from 1 (fell asleep during run) to 5 (highly alert). Alertness ratings were almost always in the 3-5 range, and were never 1. No data were discarded because of inadequate subject alertness.

Imaging

Subjects were imaged using a 1.5 Tesla whole-body scanner (General Electric) and a head coil (transmit/receive; General Electric). The scanner was retrofitted for high-speed imaging (i.e., single-shot echo-planar imaging; Advanced NMR Systems, Inc.). Subjects rested supine in the scanner. To avoid head motion, they were fitted with a bite bar custom-molded to their teeth and mounted to the head coil (see Footnote 1).

Each imaging session lasted ~2 hours and included the following procedures:

1. Contiguous sagittal images of the whole head were acquired.

2. An automated, echo-planar based shimming procedure was performed to increase magnetic field homogeneity within the brain regions to be functionally imaged (Reese et al. 1995).

3. The brain slice to be functionally imaged was selected using the sagittal images as a reference. For Exp. I, the selected slice intersected the IC and the posterior aspect of HG and the STG (Figure 2-2, left and middle).
When there appeared to be multiple transverse temporal gyri, we selected the anterior one as HG (Penhune et al. 1996; Leonard et al. 1998). For Exp. II, the slice intersected the IC and MGB (located just ventral and lateral to the cerebral aqueduct; Figure 2-2, right). A single slice, rather than multiple slices, was imaged to reduce the impact of scanner-generated acoustic noise on auditory activation.

Footnote 1: In Exp. IV, and two (of three) sessions in Exp. III, we used a simpler set-up in which a pillow and foam were packed snugly around the head to reduce head motion, rather than using a bite bar.

[Figure 2-2 near here]

Figure 2-2: Functional imaging planes superimposed on sagittal, anatomical images. In Exps. I, III, and IV the plane (thick white line) passed through the inferior colliculi (top left panel) and Heschl's gyri (top right panel). In Exp. II, the plane passed through the inferior colliculi (located just lateral to the brachium of the inferior colliculi) and the medial geniculate bodies of the thalamus (located just ventral and lateral to the cerebral aqueduct; bottom panel).

4. A T1-weighted, high-resolution anatomical image was acquired of the selected brain slice for subsequent overlay of the functional data (TR = 10 s, TI = 1200 ms, TE = 40 ms, in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm). A second high-resolution anatomical image was acquired at the end of the session after functional imaging. A comparison of the initial and final T1 images allowed for a gross check of subject movement over the session.

5. Functional images of the selected slice were acquired using a blood oxygenation level dependent (BOLD) sequence (asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°, thickness = 7 mm, in-plane resolution = 3.1 x 3.1 mm). The beginning of each scanning “run” included four discarded images to ensure that image signal level had approached a steady state. During the remainder of the run, functional images of the selected slice were acquired repeatedly while the noise burst trains were alternately turned on for 30 seconds and off for 30 seconds (Figure 2-1, top). Functional imaging was performed using a cardiac gating method that increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, and the interimage interval (TR) was recorded. The average TR across all sessions was 2082 ms (the average within a session varied from 1521 to 2650 ms). Fluctuations in heart rate lead to variations in TR that result in image-to-image variations in image signal strength (i.e., T1 effects). Using the measured TR values, image signal was corrected to account for these variations (Guimaraes et al. 1998).
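The form of that correction is not spelled out here (it follows Guimaraes et al. 1998), but the sketch below (Python/NumPy) conveys the idea under a simple assumed saturation-recovery model of the image signal, with an illustrative tissue T1 and reference TR; the actual correction used in the thesis may differ.

```python
import numpy as np

def t1_rescale(signal, tr_measured, tr_ref=2.0, t1=1.4):
    """Rescale image signal acquired at a variable (cardiac-gated) TR to a reference TR.

    Assumes steady-state saturation recovery, S proportional to 1 - exp(-TR/T1).
    tr_ref (s) and t1 (s) are illustrative values, not those used in the thesis.
    """
    return signal * (1.0 - np.exp(-tr_ref / t1)) / (1.0 - np.exp(-tr_measured / t1))

# An image acquired after a longer-than-average inter-image interval has recovered more
# longitudinal magnetization, so it is scaled down toward the reference-TR signal level.
corrected = t1_rescale(signal=1030.0, tr_measured=2.6)
```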
Because only one functional slice was acquired, these corrections for motion were necessarily limited to adjustments Methods 29 within the imaging plane. In most cases, the motion correction algorithm was well-behaved and resulted in an improvement in image alignment. However, for one session, the algorithm introduced some clearly artifactual movement, so the pre-motion corrected data was utilized. Additionally, we did not include the MGB of one subject in the analysis, because the image translations calculated by the motion-correction algorithm were smaller than the movement evident at the location of the MGB in the T1 anatomical images acquired pre- and post- functional imaging. A similar discrepancy did not occur for the IC of this subject, so the IC data were included. The images for each run were further processed in two ways to enhance the likelihood of detecting activation. (1) Image signal vs. time for each voxel was corrected for linear or quadratic drifts in signal strength over each run (i.e., drift-corrected). (2) Image signal vs. time for each voxel was normalized such that the time-average signal had the same (arbitrary) value for all voxels and runs. (Specifically, the signal vs. time data were ratio normalized to the intercept of a least square quadratic fit to the data). This normalization was done to eliminate artificial discontinuities in the signal level between runs in the subsequently concatenated data. All subsequent analyses were performed on the drift-corrected, normalized images. Generating activation maps Maps of activation were derived as follows. First, each image in the file was assigned to either a “train on” or “off” period. Stimulus-evoked changes in image signal typically have a delay of 4-6 s (Kwong et al. 1992; Bandettini et al. 1993; Buckner et al. 1996). To account for this (hemodynamic) delay, the first three images taken after the onset of a noise burst train were assigned to the preceding “off” period, and the first three images after the train offset were assigned to the preceding “train on” period. For each rate, the images assigned to each “train on” period and its following “off” period were concatenated into a single file. For each voxel in the functional images, image signal strength during train on vs. off periods was compared using an unpaired t-test (Press et al. 1992). The p-value result of this statistical test, plotted as a function of position, constituted an activation map. P-values were not corrected to account for the correlated nature of fMRI time-series (Purdon and Weisskoff 1998 ), nor were they adjusted for the repeated application (voxel-by-voxel) of a statistical test (Friston et al. 1994). 30 Chapter 2: Repetition Rate Defining regions of interest Responses were analyzed quantitatively within four anatomically-defined regions of interest (ROIs): the IC, MGB, HG, and STG. Independent of the activation maps, the borders of these structures were identified directly in the high-resolution anatomical images of the functional imaging plane. These border-delimited “high resolution” regions of interest were then down-sampled to the same resolution as the functional images for the subsequent analysis. The borders in the highresolution anatomical images were defined as follows: IC: In Exp. I, the IC were readily identified as distinct anatomical circular areas (e.g., Figure 2-3). For Exp. 
II, only the caudal edges of the IC were distinguishable (e.g., Figure 2-6), so the area of each IC ROI was defined as a circle sized to fit this visible edge. The circle was displaced caudally (by approximately 1.5 mm) relative to the IC to ensure that the IC activation was fully encompassed by the ROI even after downsampling. The shift was necessary because activation in the imaging plane for Exp. II frequently abutted, or even overlapped, the caudal IC edge.

MGB: Standard anatomical atlases were used to delimit a ROI enclosing the MGB, since the MGB were not directly identifiable in the anatomical images. The caudal border of each MGB ROI was defined as the edge between the brain and the ambient cistern. The distance from the region's caudal edge to its rostral edge was determined from measurements of the caudorostral extent of the MGB in the atlases. The same approach was used for the distance between the midline and the medial edge of the MGB ROI. Distances were computed by first normalizing the atlas measurements to maximum brain width, and then multiplying the normalized atlas measurements by the maximum width of the individual imaged brain slice. The lateral edge of the MGB ROI was a line extended rostrally from the lateral edge of the ambient cistern. The resulting MGB ROI probably included a portion of the lateral geniculate in some subjects. However, activation generally did not occur at this lateral-most edge.

HG: When HG was visible as a “mushroom” protruding from the surface of the superior temporal plane, the lateral edge of this mushroom defined the lateral edge of the HG ROI. The medial edge of the ROI was the medial-most aspect of the Sylvian fissure. When a distinct mushroom was not present, the HG ROI covered approximately the medial third of the superior temporal plane (extending from the medial-most aspect of the Sylvian fissure). In the superior-inferior dimension, the HG ROI extended superiorly to the edge of the overlying parietal lobe, and inferiorly so as to entirely encompass any activation centered on HG.

STG: The STG ROI was defined as the superior temporal cortex lateral to the HG ROI. The definition of the inferior and superior borders was the same as for the HG ROI.

Calculating response time courses

Specific voxels were chosen for computing the time course of response within each anatomically defined region of interest. The voxels were chosen based on the activation maps for a particular “reference rate”: 35/s for IC, 20/s for MGB, 10/s for HG, and 2/s for STG. The reference rates were those that typically produced the strongest activation in the maps. For each IC and MGB, we used the single voxel with the lowest p-value in the activation map at the reference rate. For each HG and STG, we averaged the responses of the four voxels with the lowest p-values at the reference rate. Note that for a given structure, session, and hemisphere, the same voxels were used in computing the response time course at each rate.

Response time courses were computed as follows. Because cardiac gating results in an irregular temporal sampling, the time series for each imaging “run” and voxel was linearly interpolated to a consistent 2 s interval between images, using recorded interimage intervals to reconstruct when each image occurred. These data were then temporally smoothed using a three point, zero-phase filter (with coefficients 0.25, 0.5, 0.25).
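For concreteness, the interpolation and smoothing just described can be sketched as follows (a minimal sketch in Python, assuming NumPy; the handling of the first and last samples is an assumption, since the thesis does not specify it):

```python
import numpy as np

def resample_and_smooth(signal, acq_times_s, dt_s=2.0):
    """Linearly interpolate a cardiac-gated (irregularly sampled) voxel time
    series onto a regular 2 s grid, then apply the three-point zero-phase
    kernel [0.25, 0.5, 0.25].  Edge replication at the ends of the run is an
    illustrative assumption."""
    signal = np.asarray(signal, dtype=float)
    acq_times_s = np.asarray(acq_times_s, dtype=float)
    grid = np.arange(acq_times_s[0], acq_times_s[-1] + dt_s / 2.0, dt_s)
    resampled = np.interp(grid, acq_times_s, signal)
    padded = np.concatenate(([resampled[0]], resampled, [resampled[-1]]))
    smoothed = 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]
    return grid, smoothed
```

The symmetric kernel is what makes the filter zero-phase: it smooths amplitude without shifting response peaks in time.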
A response “block” was defined as a 70 s window (35 images) that included 10 s prior to a noise burst train, the 30 s coinciding with the train, and the 30 s off period following the train. These response blocks were averaged according to rate to give an average signal vs. time waveform for each rate, session, and hemisphere. The signal at each time point was then converted to a percent change in signal relative to a baseline. The baseline was defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the noise burst train. In Exps. I and II, there was some uncertainty in the timing of the stimulus relative to the images (up to a maximum of about 2 s in a given run). For the analyses performed in this paper, this level of uncertainty is negligible.

In a supplementary analysis, we determined that response waveshape, averaged across sessions and hemispheres (i.e., Figure 2-4), was a fair representation of the trends in the individual responses. In particular, the waveshape of the responses was unaffected when the individual responses were first normalized (by dividing by their maximum value) prior to averaging. This result indicates that average response waveshape was not unduly influenced by just a small subset of the individual responses.

In a second supplementary analysis, we determined that response waveshape was not sensitive to our voxel selection criteria. For comparison, time courses were computed as above, except using all voxels with a p-value less than 0.01 in the activation map at the reference rate (instead of just the voxels showing the strongest activation). As expected, the resulting percent change time courses (averaged across sessions and hemispheres) were reduced in magnitude. However, the waveshape of the responses was unaffected.

A third supplementary analysis examined whether response waveshape at a given rate might have changed during the experimental sessions. This analysis focused on HG and STG since the most dramatic variations in waveshape occurred in these structures (e.g., see Figure 2-4). Specifically, we computed response time courses for each session based on the three initial and three final presentations of the 2/s and 35/s trains. The initial and final time courses for each rate were then averaged across sessions. For each rate and structure, the average initial and final time courses were qualitatively similar. They were also quantitatively similar in that there was a high degree of correlation between the “initial” and “final” waveforms. [When the “initial” and “final” waveforms for each rate and structure were cross-correlated with one another, the correlation coefficients were: 0.92 (HG, 2/s), 0.86 (HG, 35/s), 0.93 (STG, 2/s), 0.90 (STG, 35/s)]. In contrast, there was considerably less correlation between the responses at the two different rates. [When the “initial” waveforms for the two rates were cross-correlated, the correlation coefficients were: 0.54 (HG) and 0.55 (STG). Similarly, for the “final” waveforms, the coefficients were 0.25 (HG) and 0.35 (STG)]. This analysis indicates that, on average, there was no dramatic change in cortical response waveshape during experimental sessions, and any change was substantially less than the change in response waveshape with rate.

Quantifying response magnitude

Response magnitude in each auditory structure was quantified using two measures computed from the percent change time courses.
“Time-average” percent change, a measure of the overall response strength, was computed as the mean percent change from t = 4 to 30 s. “Onset” percent change, a measure of the response amplitude near the beginning of the noise burst train, was computed as the maximum percent change from t = 4 to 10 s. Since “time-average” and “onset” percent change were calculated from the percent change time courses, they indicate image signal deviations relative to a 6 s baseline immediately preceding the stimulus (i.e., the baseline period used in calculating the time courses).

Experiment III: Small numbers of noise bursts

To investigate a strong signal decrease that occurred in cortex following the onset of high (but not low) rate trains (e.g., see Figure 2-4), we examined the responses to a single noise burst and short clusters of noise bursts. Responses were collected in three imaging sessions with three subjects (Exp. III; subject #'s 2, 5, 10). Two of these individuals also participated in Exps. I and II. Either one noise burst or a cluster of noise bursts (2 or 5) was presented once every 18 s, constituting a single “trial” (Figure 2-1, bottom). For the clusters of five noise bursts, the interstimulus interval (ISI, onset-to-onset) between noise bursts was 28.6 ms, equivalent to the ISI for a rate of 35/s. For clusters of two noise bursts, two different ISIs were used: 500 ms (2/s rate) and 28.6 ms (35/s rate). For two sessions, there was no task, and the same stimulus was used in all of the trials for a given run (12 runs; 270 s per run; 45 total repetitions per trial type). The subjects for these sessions reported difficulties in maintaining a high level of alertness due to the sparseness and uniformity of the stimulus trials. Therefore, to help maintain alertness, in the third session the subject (#10) was asked to count the number of trials per run, and the stimulus was randomized across trials (7 runs; 288 s per run; 28 repetitions per trial type). Stimuli were presented binaurally at 55 dB above the threshold to a 10/s noise burst train (as in Exps. I and II).

The imaging methods were identical to those for Exp. I with the following exceptions: A 3T, instead of a 1.5T, scanner was used to improve the ability to detect small amplitude responses (General Electric, outfitted for echo-planar imaging by ANMR Inc.). The parameters used in acquiring the high-resolution anatomical image of the “plane of interest” were: TR = 10 s, TI = 1200 ms, TE = 57 ms, in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm. The functional imaging parameters were: gradient echo, TE = 40 ms, flip = 90°, in-plane resolution = 3.1 x 3.1 mm, thickness = 7 mm. The first session used a fixed interimage interval (TR) of 2 s. The second and third sessions used cardiac gating (parameters as in Exps. I and II) in an attempt to detect single trial responses in the IC. Convincing responses were generally not seen in the IC. Therefore, only cortical data are reported for Exp. III.

Images were analyzed and time courses for each stimulus were computed as in Exps. I and II, with the following exceptions: (1) For the one session with a fixed TR, no linear interpolation was necessary; (2) No temporal smoothing was applied (in order to avoid disproportionately altering the responses, which were expected to be brief in duration); (3) The activation map for determining the reference voxels was based on a single run of music (4 repetitions of the first 30 s of the fourth movement in Beethoven Symphony No. 7).
Because music typically evokes larger magnitude responses than trains of either 2/s or 10/s noise bursts, we were able to obtain robust activation maps with a single run, thereby allowing more time for collecting responses to the primary stimuli of interest for the experiment.[2] As in Exps. I and II, the four reference voxels selected from HG and STG were those with the lowest p-values in the t-test activation map; (4) The baseline signal level for converting time courses to percent change was based on the average of just two time points, t = -2 to 0 s (since the “off” period between stimuli (18 s) was less for this experiment than for Exps. I and II (30 s) and we wanted to avoid including time points where the response may not yet have returned to baseline from the preceding stimulus).

[Footnote 2: In several sessions (not included in this paper), in which we presented both music and trains of 2/s and 35/s noise bursts, we obtained similar responses for the noise burst trains irrespective of whether the reference voxels were chosen using activation maps based on music or 2/s noise bursts. Typically, at least two of the four reference voxels were in common between the two activation maps. Importantly, the dynamics of the responses to music are similar to the dynamics of the responses to 2/s noise burst trains.]

Experiment IV: Noise burst trains with different durations

The effect of train duration was examined in two imaging sessions with two subjects (Exp. IV; subject #'s 11, 12). Trains of four different durations (15, 30, 45, and 60 s) were presented with an “off” period of 40 s following each train. Noise burst repetition rate within each train was always 35/s. Each train duration was presented once per run (310 s per run; 8-9 runs) with the order of durations randomized across runs. Imaging parameters were the same as Exp. III, except the gradient echo functional images used a TE of 30 ms and a 60° flip angle.[3] Both sessions used cardiac gating so that the effect of train duration in cortex could be compared to the effect in the IC. Time courses were computed as in Exp. I, except the activation map for determining the reference voxels was based on a single run of music (as in Exp. III).

[Footnote 3: We lowered the TE to reduce the potential for susceptibility-induced signal losses. The flip angle was reduced because of a tendency of the magnet to “overflip” past the nominal value.]

Supplementary information concerning the effects of train duration was obtained in two additional experiments that used a single, long train duration (60 s) and 35/s noise bursts. One of these experiments was conducted at 1.5 T, and the other at 3 T, using the imaging parameters from Exps. I and III, respectively.

RESULTS

Response to noise burst trains: effect of burst repetition rate

Inferior Colliculus

Activation maps for the IC showed an increase in activation with increasing burst repetition rate. Figure 2-3 demonstrates this increase for two sessions from Exp. I. The maps show activation that is least at 2/s, greater at 10/s, and greatest at 35/s. The volume of the inferior colliculus (2-4 voxels) is only slightly greater than the spatial resolution of the activation maps, so the main difference across rate is in the strength of activation (greater activation is reflected in the maps as a lower p-value from the statistical comparison of image signal level during train “on” and “off” periods).
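For reference, the voxel-by-voxel “train on” vs. “off” comparison that underlies these p-values (see “Generating activation maps” in Methods) can be sketched as follows; SciPy's unpaired t-test stands in for the Press et al. (1992) routine, and the index bookkeeping is an illustrative assumption:

```python
import numpy as np
from scipy import stats

def on_off_pvalue(voxel_signal, onset_images, offset_images, shift=3):
    """Assign each image to a 'train on' or 'off' period, shifting both
    boundaries by `shift` images to allow for hemodynamic delay, and compare
    the two groups with an unpaired t-test (smaller p-value = stronger
    activation in the map)."""
    voxel_signal = np.asarray(voxel_signal, dtype=float)
    on_mask = np.zeros(voxel_signal.size, dtype=bool)
    for onset, offset in zip(onset_images, offset_images):
        on_mask[onset + shift : offset + shift] = True
    on = voxel_signal[on_mask]
    off = voxel_signal[~on_mask]
    return stats.ttest_ind(on, off).pvalue
```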
Greater IC activation at higher rates is also demonstrated by the maps in Figure 2-6, which correspond to two sessions from Exp. II.

Figure 2-4 (left column) shows the time course of the responses in the IC averaged across all sessions. At all rates, the response was “sustained” in that image signal increased when the noise burst train was turned on, remained elevated while the train was on, and decreased once the train was turned off. The amplitude of the sustained response during the “train on” period increased with increasing rate. The increase in response amplitude was quantified using two measures: peak percent signal change near the beginning of the “train on” period (“onset” percent change), and percent signal change time-averaged over the on period (“time-average” percent change; defined in Methods). On average, both measures increased with increasing rate (Figure 2-5, top left). Onset and time-average percent change showed a significant increase from 2/s to 10/s (p = 0.01, onset; p = 0.05, average; paired t-test), and from 10/s to 35/s (p = 0.02, onset; p = 0.006, average). Plots of percent change vs. rate for individual IC also showed an overall trend of increasing percent change with increasing rate (Figure 2-5, top middle and right). For 19 of 22 IC, the response at 35/s was greater than the response at 2/s (for both measures). For the rates that overlapped between Exps. I and II (2, 10, 35/s) there was no significant difference between the percent signal change values (p > 0.1, t-test), suggesting that the two main differences between these experiments (imaging plane and intensity detection task) did not have a strong effect on inferior colliculus responses. There was no significant difference between the values obtained from the left and right IC (p > 0.3, paired t-test, collapsing the data across all rates).

Summary: The IC showed a sustained response to noise burst trains. The amplitude of this response increased with increasing burst repetition rate.

[Figure 2-3 panels: activation maps for Subjects 3 and 5 (Exp. I), one row per rate (35/s, 10/s, 2/s); 5 mm scale bar.]

Figure 2-3: Activation maps for the IC (two subjects, Exp. I). Stimuli were noise burst trains with repetition rates of 2, 10, or 35/s. Each panel shows a T1-weighted anatomic image (grayscale) and superimposed activation map (color) for a particular subject. Rectangle superimposed on the diagrammatic image (bottom, right) indicates the area shown in each panel. For the activation maps, regions are colored according to the result of a t-test comparison of image signal strength during “train on” and “off” periods. In this and all subsequent figures, blue and yellow correspond to the lowest (p = 0.01) and highest (p = 2 × 10⁻⁹) significance levels, respectively. (Areas with p > 0.01 are not colored). Activation maps (based on functional images with an in-plane resolution of 3.1 x 3.1 mm) have been interpolated to the resolution of the anatomic images (1.6 x 1.6 mm). Images are displayed in radiological convention, so the subject's right is displayed on the left. R, right; L, left.

[Figure 2-4 panels: percent signal change vs. time (0-60 s) in the Inferior Colliculus, Medial Geniculate Body, Heschl's Gyrus, and Superior Temporal Gyrus, one row per repetition rate (1/s to 35/s), with the train-on period and the on- and off-peaks marked; traces show the mean ± standard error.]

Figure 2-4: Response time courses averaged across sessions and hemispheres (solid lines; IC: n = 22 for 2, 10, 35/s, n = 12 for 20/s; MGB: n = 10 for all rates; HG and STG: n = 10 for 2, 10, 35/s, n = 6 for 1/s). Dashed lines give the mean ± one standard error at each time point. Note that the vertical scale for the IC and MGB responses differs slightly from the scale for HG and STG.

Medial Geniculate Body

In contrast with the IC, activation maps for the MGB usually showed a nonmonotonic change in activation with rate. The trend for the MGB is illustrated by the maps for two sessions in Figure 2-6.[4] The maps show an increase in MGB activation with increasing rate in the 2/s – 20/s range, but a decrease from 20/s to 35/s.

[Footnote 4: In general, activation in the MGB was not as strong as in the IC. Consistently within subjects, the standard deviations used in calculating the t-statistic for the activation maps were greater for the MGB than the IC.]

The trend in the activation maps parallels the rate-dependence of time-average percent signal change in the MGB, but not onset percent change. The close correspondence between time-average percent change and activation maps is to be expected since the maps are based on a comparison of time-average signal levels during “train on” and “off” periods. In the average across sessions, time-average percent change increased significantly from 2/s to 20/s (p = 0.005, paired t-test), and decreased from 20/s to 35/s (p = 0.03; Figure 2-5, left). On average, onset percent signal change showed the same trend from 2/s to 20/s (i.e., a significant increase from 2/s to 20/s; p < 0.001), but not at high rates in that there was no difference between 20/s and 35/s (p = 0.9). The different rate-dependence for onset vs. time-average percent change is also apparent overall in the plots for individual MGBs (Figure 2-5, middle, right), despite the intersession variability in the precise trends between the rates. Neither onset nor time-average percent change differed significantly between the left and right MGBs (p > 0.15, paired t-test, collapsing the data across all rates).

The different rate-dependencies for onset and time-average percent change indicate that the time course of the MGB response varies with rate. This variation is illustrated in Figure 2-4 (second column). On average, responses to a 35/s train peaked just after train onset, then declined by approximately 50% during the remainder of the train. This moderate decrease in the response differs from the largely sustained responses at the lower rates of 2/s, 10/s, and 20/s. A quantitative comparison of onset percent change to the percent change at the end of the train (i.e., at 30 s in the time courses of Figure 2-4) confirmed the response difference at the highest rate.

Figure 2-5: Response magnitude vs. repetition rate in the IC, MGB, HG, and STG. Left: Time-average and onset percent change averaged across sessions and hemispheres. Bars indicate the standard error.[5] (See caption of Figure 2-4 for the number of sessions and hemispheres represented by each data point). Middle and right: Time-average and onset percent change for each session and hemisphere vs. rate.
To facilitate comparison of the trends across rate, each curve has been displaced vertically by adding a constant (specific to each curve), such that the resulting mean of the values for 2, 10, and 35/s is always the same [and equal to the population mean for these rates (left column)]. In all plots, the repetition rate axis uses a categorical scale. Note that there are no data at 20/s for all of the HG and STG curves, and for 10 of the IC curves.

[Footnote 5: The relatively larger standard errors for the MGB and STG, both in this figure and the response time courses, arise from one instance in each structure (left hemisphere for MGB, right hemisphere for STG) in which the response at all rates was noticeably larger than the responses from other subjects. Exclusion of this MGB “outlier” reduced the mean time-average percent change, onset percent change, and time course values (during the “train on” period) by 20-30%, and the standard errors by 30-50%. Exclusion of the STG outlier reduced the mean of these same measures by 25-35%, and their standard errors by 40-60%. Precisely because of the potential for large intersubject variations in response magnitude, we have used paired statistical tests throughout the text whenever appropriate. The trends across rate for both the MGB and STG outliers were consistent with the results reported in this paper, and analyses conducted by excluding these outliers did not change any of the primary conclusions in this paper.]

[Figure 2-5 panels: time-average and onset percent signal change vs. repetition rate of noise bursts in the train (1, 2, 10, 20, 35/s) for the IC, MGB, HG, and STG; left, averages across subjects; middle and right, individual subjects (normalized percent signal change).]

[Figure 2-6 panels: activation maps for Subjects 5 and 6 (Exp. II), one row per rate (35/s, 20/s, 10/s, 2/s); 5 mm scale bar.]

Figure 2-6: Activation maps for the MGB and IC (two subjects, Exp. II). Stimuli were noise burst trains with repetition rates of 2, 10, 20, or 35/s. See Figure 2-3 caption.

For the 35/s train, response amplitude was significantly less at the end of the train (p = 0.04, paired t-test), consistent with a response decrease. In contrast, there was no significant difference at the lower rates (p > 0.1), consistent with a sustained response.[6] Thus, MGB responses varied over the course of high, but not lower rate trains.

[Footnote 6: The average time course of the MGB response at 2/s (Figure 2-4) shows an increase from the onset to the end of the train. However, this increase was only present in 5 of 10 MGB, and was not significant in a paired t-test.]

The rate dependencies seen in time-average percent change and the activation maps can be explained in terms of the time course and onset amplitude of MGB responses. Between 2/s and 20/s, the increase in time-average percent change (and activation in the maps) is largely attributable to the increase in sustained response amplitude, which is simultaneously reflected as an increase in onset percent change. Given that 20/s and 35/s evoke equal onset responses, the decrease in time-average percent change (and in the maps) between these two rates can be primarily attributed to a change in the response to the latter portion of the train (i.e., the change from a sustained response to one with a moderate decrease following onset).

Summary: Between 2/s and 20/s, onset percent change increased with increasing rate in MGB, while response time courses remained primarily sustained.
Between 20/s and 35/s, there was no change in onset amplitude, but the response dynamics changed from sustained to moderately decreasing following the train onset.

Heschl's gyrus and superior temporal gyrus

A nonmonotonic relationship between rate and activation was apparent in the activation maps for HG and STG (Figure 2-7). The maps showed an activation increase from 1/s to 2/s, and a decrease from 10/s to 35/s.

[Figure 2-7 panels: activation maps for Subjects 3 and 5 (Exp. I), one row per rate (35/s, 10/s, 2/s, 1/s); 1 cm scale bar.]

Figure 2-7: Activation maps for HG and STG (two subjects, Exp. I). Stimuli were noise burst trains with repetition rates of 1, 2, 10, or 35/s. See Figure 2-3 caption.

As expected, the trends in the activation maps paralleled the rate-dependence of time-average percent signal change. In HG, time-average percent change increased from 2/s to 10/s (p = 0.05, paired t-test), but decreased markedly from 10/s to 35/s (p < 0.001; Figure 2-5, left). These trends were observed consistently in individual HG (Figure 2-5, middle). In STG, the rate of greatest time-average percent change (2/s) was less than in HG (10/s; Figure 2-5, left). For the 6 STG with 1/s data, time-average percent change at 2/s was greater than at 1/s (p = 0.003, paired t-test). Time-average percent change tended to decrease from 2/s to 10/s and from 10/s to 35/s, so that the overall decrease from 2/s to 35/s was significant (p = 0.002; Figure 2-5).

Onset percent change again showed differences compared to time-average percent change. In HG, the difference was primarily one of degree. Onset percent change at 10/s was significantly greater than both 2/s and 35/s (p = 0.01, paired t-test; Figure 2-5, left and right), but the decrease from 10/s to 35/s averaged only 20% for onset percent change compared to 50% for time-average percent change. In STG, onset and time-average percent change had overall different trends. Whereas time-average percent change decreased from 2/s to 35/s (p = 0.002), onset percent change was unchanged over this range (p = 0.4; Figure 2-5, left and right).

A dramatic rate-dependent change in response waveshape accounts for the differences between onset and time-average percent change in HG and STG. At low rates, responses were sustained, whereas at high rates they were not (Figure 2-4, third and fourth columns). At the highest rate of 35/s, image signal increased to a peak occurring 6 s following train onset (“on-peak”), declined substantially over the next 8 s, increased slightly over the remainder of the on period, and peaked again 6 s following train offset (“off-peak”). The most prominent features of this “phasic” response are the peaks just after train onset and offset. In HG, the reduction in time-average percent change between 10/s and 35/s was partly because of 1) the decrease in onset percent change, and 2) the more dramatic signal decline during the on period for the 35/s train. In STG, onset percent change did not vary significantly with rate, so the decline in time-average percent change at high rates was primarily due to the change in response waveshape.

While the rate-dependencies of the HG and STG responses paralleled each other, there were also clear differences between the two structures.
In both HG and STG, the signal decline during the on period became increasingly pronounced with increasing repetition rate, so responses had an increasingly phasic appearance. However, at any given rate, the magnitude of the signal decline was greater in STG. In STG, the magnitude of the decline (measured as a percentage decrease from the onset peak to the value at 14 s in the average response time courses; Figure 2-4) was 22, 25, 58, and 93% for 1/s, 2/s, 10/s, and 35/s, respectively. In HG, the corresponding values were all less: 15, 13, 32, and 78%.

In light of the phasic response for high rate trains, it is not surprising that the activation maps frequently showed little evidence of activity at the 35/s rate (e.g., Figure 2-7, top left). The activation maps were based on the difference in time-average signal between “train on” and “off” periods. It is clear that for the response to the 35/s train (Figure 2-4), the difference between the time-average of these two periods will be close to zero, even though cortex is responding robustly (albeit transiently).[7] Thus, for cortex, the activation maps, calculated using a standard method, provide only a partial picture of cortical rate dependencies.

[Footnote 7: Although we accounted for hemodynamic delay in generating the activation maps (see Methods), both the “train on” and “off” periods were shifted by three images. Consequently, since both the “on” and “off-peak” occur approximately 6 sec following train onset and offset, the two peaks will essentially nullify each other in the t-statistic calculation.]

In HG and STG there were right-left differences in response magnitude. In HG, both time-average and onset percent change were greater on the right for 16 of 18 possible comparisons (p < 0.001, paired t-test, collapsing the data across all rates). In STG, the same trend was apparent, but was weaker (right greater in 12 of 18 cases; p < 0.02). These right-left differences may reflect a functional difference between right and left auditory cortex. Alternatively, they may reflect a functional difference in auditory cortex in the anterior/posterior dimension. HG tends to be shifted more anteriorly on the right as compared to the left (Penhune et al. 1996; Leonard et al. 1998), raising the possibility that the imaged slice sampled different cortical areas on the two sides. In support of there being a true right-left functional difference, a post-hoc reexamination of our imaging plane (relative to the sagittal reference images) confirmed that the slice always intersected the postero-medial aspect of HG and, in cases with two gyri, sampled the more anterior one. Thus, in all cases, the slices likely sampled primary auditory cortex (Rademacher et al. 1993), as well as immediately lateral non-primary areas. In both HG and STG, response waveshape showed the same changes with rate in the left and right hemispheres, although the signal decline following the onset peak tended to be greater on the left compared to the right hemisphere.[8] Overall, the right-left difference was primarily one of response magnitude.

[Footnote 8: Specifically, in HG the magnitude of the signal decline for the 35/s train (measured as a percentage decrease from the onset peak to the value at 14 s in average response time courses computed separately for each hemisphere) was 84% and 73% in the left and right hemisphere, respectively. In STG, the corresponding values were 125% (i.e., a decline below baseline) and 71% for the 35/s train, and 76% and 44% for the 10/s train.]

Summary: Responses in HG were sustained at low rates, but became phasic at high rates. The most prominent features of this phasic response are signal peaks just after train onset and offset. The amplitude of the on-peak (onset percent change) was greatest at 10/s. Responses in STG also showed a progression from sustained to phasic with increasing rate. However, the amplitude of the on-peak did not vary significantly with rate.
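The three quantities used throughout this section (onset percent change, time-average percent change, and the percentage decline from the onset peak to the value at 14 s) can be computed from a percent-change time course as follows (a minimal sketch, assuming the 2 s sampling grid described in Methods):

```python
import numpy as np

def response_measures(pct_change, t_s):
    """Onset percent change (maximum over t = 4-10 s), time-average percent
    change (mean over t = 4-30 s), and the percentage decline from the onset
    peak to the value at t = 14 s, all relative to the pre-train baseline
    already removed when converting to percent change."""
    pct_change = np.asarray(pct_change, dtype=float)
    t_s = np.asarray(t_s, dtype=float)
    onset = pct_change[(t_s >= 4) & (t_s <= 10)].max()
    time_average = pct_change[(t_s >= 4) & (t_s <= 30)].mean()
    value_at_14s = pct_change[np.argmin(np.abs(t_s - 14))]
    decline_pct = 100.0 * (onset - value_at_14s) / onset
    return onset, time_average, decline_pct
```

Note that a decline greater than 100% simply means the signal at 14 s fell below the pre-train baseline, as reported for STG at 35/s.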
Response to small numbers of noise bursts

To investigate the rapid decline in signal shortly after train onset for cortical responses, we compared the responses to a single noise burst, and clusters of two or five noise bursts with a burst-to-burst interstimulus interval (ISI) of 28.6 ms (“35/s” rate) or 500 ms (“2/s” rate). Both single and clustered noise bursts elicited measurable responses in HG and STG. The responses, averaged across subjects and hemispheres, peaked 4-6 s after the stimulus and then returned to baseline by 8-10 s (Figure 2-8, top). After 8-10 s, the average response dipped below baseline. However, this response feature, unlike the others, was dominated by the data for only one of the three subjects (subj. #2).

[Figure 2-8 panels: top, percent signal change vs. time in HG and STG for 1 NB, 2 NBs @ 35/s, 5 NBs @ 35/s, and 2 NBs @ 2/s; bottom, normalized peak response vs. number of noise bursts for each subject and hemisphere, with the linear-growth prediction indicated.]

Figure 2-8: Top: Average response time courses in HG and STG to either a single noise burst, or a cluster of two or five noise bursts. Each trace is an average across both hemispheres of three subjects. Bottom: Normalized peak response for each subject and hemisphere. Dashed line indicates the prediction from a model in which each successive noise burst evokes a response identical to the 1 NB response, and the responses to each burst add.

Figure 2-8 (bottom) shows normalized peak response vs. number of noise bursts for each subject and hemisphere. These normalized responses were quantified as the peak percent signal change in the response time course (which always occurred at t = 4 or 6 s), divided by the peak percent change for a single noise burst. The normalized peak response generally increased with increasing number of noise bursts (Figure 2-8, bottom). However, the response increase was always less than would be predicted by a model in which each successive noise burst evokes a response equivalent to the 1 NB response and the responses to each burst add (i.e., linear growth). Similarly, for every subject and hemisphere, the peak response to 5 NBs@35/s was less than 2.5 times the response to 2 NBs@35/s. These results are consistent with a model in which the responses to noise bursts at the beginning of a train are greater than those occurring later. The fact that the peak response for 2 NBs@2/s was always greater than for 2 NBs@35/s indicates that the decline in response from the first burst to the second was greater at high, as compared to low, rates.

We compared the mean peak percent change for single and multiple noise bursts to the mean onset percent change for 35/s trains from Exp.
I[9] to gain an appreciation for the proportion of the on-peak accounted for by the earliest noise bursts in the train. In STG, we estimated that the mean peak percent change for 1 NB and 5 NBs@35/s was approximately 40% and 65%, respectively, of the mean onset percent change. In HG, the corresponding estimates were approximately 25% and 40%. These values indicate that the earliest noise bursts of a high rate train account for a substantial portion of the on-peak, especially in STG.

[Footnote 9: In making these comparisons, we took into account the two main differences between Exps. I and III. First, because the responses in Exp. III were computed without temporal smoothing, we recomputed the onset percent change values from Exp. I without temporal smoothing. The resulting mean onset percent change values for a 35/s train increased by 20% for both HG and STG. Second, we took into account the difference in imaging parameters (Exp. I: 1.5T scanner, ASE sequence; Exp. III: 3T scanner, GE sequence). The effect of this difference was estimated empirically using responses to music obtained under Exp. I (11 sessions) and Exp. III conditions (11 sessions). Time-average percent change was, on average, about 25% greater for the Exp. III conditions, so the onset amplitude from Exp. I (calculated without temporal smoothing) was revised up by this percentage before comparison with the single and multiple noise burst responses.]

[Figure 2-9 panels: percent signal change vs. time in HG (top) and IC (bottom) for 35/s trains with durations of 15, 30, 45, and 60 s; separate traces for Subjects 11 and 12, with the train-on period and off-peaks marked.]

Figure 2-9: Response time courses in HG (top) and IC (bottom) to 35/s noise burst trains, with durations of 15, 30, 45, and 60 s. Each trace is an average across hemispheres for a given subject. The off-peak in each HG response is indicated by an arrow.

Response to high rate (35/s) noise burst trains: effect of train duration

By considering noise burst trains with different durations, we tested whether the off-peak in cortical phasic responses is specifically linked to train termination. Two subjects were studied using 35/s noise burst trains with durations of 15, 30, 45, and 60 s. For both subjects and all durations, HG responses showed a distinct off-peak after train offset (Figure 2-9, top). Regardless of train duration, the off-peak occurred approximately 6 s after train offset, indicating a strong coupling between off-peak and train termination. A similarly strong coupling between off-peak and train termination was also found in STG for both subjects (not shown). One subject (#11) was unusual in that the response in STG did not show a clear off-peak for voxels selected by our standard criteria. Nevertheless, there was a clear off-peak for other, nearby voxels, and this off-peak always occurred approximately 6 s after train offset, regardless of train duration. In contrast to cortex, IC responses were largely sustained for all durations and showed no sign of an off-peak (Figure 2-9, bottom).

Data in two additional subjects further support the strong coupling between cortical off-peak and train termination. These subjects, tested with a single train duration of 60 s, showed off-peaks in both HG and STG that occurred approximately 6 s after train offset. All of the train duration data taken together indicate that the cortical off-peak is specifically evoked by the termination of high-rate noise burst trains.
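The linear-growth benchmark shown in Figure 2-8 (bottom) is simply the superposition of identical single-burst responses; a minimal sketch, with the rounding of burst times to the sampling grid as an illustrative assumption:

```python
import numpy as np

def linear_growth_prediction(single_burst_response, n_bursts, isi_s, dt_s=2.0):
    """Predict the cluster response by superposing time-shifted copies of the
    single-burst response, each burst assumed to evoke an identical, additive
    response.  For the 28.6 ms and 500 ms ISIs of Exp. III the shifts round
    to zero samples, so the prediction is essentially n_bursts times the
    single-burst response."""
    r = np.asarray(single_burst_response, dtype=float)
    prediction = np.zeros_like(r)
    for k in range(n_bursts):
        shift = int(round(k * isi_s / dt_s))
        if shift < r.size:
            prediction[shift:] += r[:r.size - shift]
    return prediction

# The observed peaks grew more slowly than prediction.max(), the signature of
# a reduced response to bursts occurring later in a cluster.
```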
DISCUSSION fMRI responses to trains of noise bursts changed substantially with burst repetition rate in every studied structure, although the nature of the changes was highly structure-dependent. In the IC, response amplitude near train onset increased with increasing rate while response waveshape remained unchanged (i.e., sustained). In the MGB, increasing rate produced an increase in onset amplitude up to a point where a further increase in rate instead produced a change in waveshape (from a largely sustained response to one showing a distinct peak just after train onset). In HG, the site of primary auditory cortex, onset amplitude changed some with rate, but the most striking change occurred in response waveshape. At low rates the waveshape was sustained, while at high rates it was strongly phasic in that there were prominent response peaks just after train onset and offset. In STG, which includes secondary auditory areas, onset amplitude showed no systematic dependence on rate, whereas response waveshape showed a strong and dramatic rate-dependence paralleling that in HG. Overall, from midbrain, to thalamus, to cortex, there was a systematic shift in the form of response rate-dependencies from one of amplitude to one of waveshape. The sustained response waveshapes seen in subcortical structures and for low rates in auditory cortex are typical of the fMRI literature. In contrast, the phasic responses seen for higher rates in auditory cortex are not, nor are their individual signature features. One signature feature – the prominent peak following stimulus onset – has been reported for a few prolonged acoustic, odorant, and visual stimuli (Bandettini et al. 1997; Jäncke et al. 1999; Giraud et al. 2000; Sobel et al. 2000), but is nevertheless a fairly uncommon feature for responses in the fMRI literature. A second signature feature of phasic responses – the peak following stimulus offset – is highly unusual. To 52 Chapter 2: Repetition Rate our knowledge, the only other reported “off-response” occurred in a subregion of primary visual cortex following the transition from steady white light to darkness (Bandettini et al. 1997). The paucity of previous reports of phasic fMRI responses may be partly an issue of detection since phasic responses are poorly detected by some of the most commonly-used analysis approaches (e.g., a t-test comparison of stimulus “on” and “off” periods or, equivalently, correlation or analyses using the SPM software package that assume a sustained response; Bandettini et al., 1993; Sobel et al., 2000). It is also possible that phasic responses have not been seen because they reflect neurophysiological mechanisms that are only invoked in particular, largely unexplored stimulus regimes. It is widely assumed that different sound features (e.g., frequency, bandwidth, repetition rate) are represented in the amplitude of fMRI activation or amplitude variations with position (e.g., Giraud et al. 2000; Talavage et al. 2000; Yang et al. 2000; Wessinger et al. 2001). In contrast, the possibility of representations in the temporal dimension is not generally entertained, and this makes the wide variations in cortical response waveshape of the present study especially intriguing. A few other studies have also reported covariations between sound characteristics and temporal fMRI activation patterns in the auditory system. For instance, Gaschler-Markefski et al. 
(1997) examined the degree of temporal stationarity of auditory cortical fMRI responses and reported regional variations depending on stimulus and task. In studying fMRI responses to amplitude modulated noise, Giraud et al. (2000) found an increasingly prominent peak at stimulus onset with increasing modulation rate, a result that strongly parallels the findings of the present study (see “Comparison to previous fMRI and PET studies” below). The present study and these previous reports suggest that fMRI temporal patterns – or more specifically the temporal variations in neural activity underlying these patterns – may be an important way in which sound is represented in the auditory system.

Role of rate per se in determining fMRI responses

In the present study, noise burst duration was held constant while rate was varied, so overall stimulus energy and sound-time fraction (STF) covaried with rate (resulting in an ~12 dB differential in sound pressure level for 2/s vs. 35/s noise burst trains). While this raises the possibility that the wide range of response waveshapes in auditory cortex was due primarily to changes in parameters other than rate, we do not believe this to be the case for two reasons. First, in a separate study, we have found that varying the intensity of 2/s or 35/s noise bursts over a 20-30 dB range has no effect on response waveshape (see Chap. 4). Second, we have found that changing rate from 2/s to 35/s while holding STF constant (and therefore varying burst duration) still produces a change in waveshape from sustained to phasic (although STF does have some influence on response waveshape; Chap. 4). In the case of response amplitude, the precise rate dependencies might be somewhat different if stimulus energy and STF were held constant instead of burst duration, because varying energy alone can produce changes in response amplitude (Hall et al. 2001; Sigalovsky et al. 2001), as may also be the case for changes in STF.

fMRI responses and underlying neural activity

To understand the significance of the different fMRI response waveshapes, it is necessary to first consider the extent to which waveshape is governed by neural, metabolic, and hemodynamic factors. While the relationship between neural activity and fMRI responses is not fully understood, it is generally accepted that neural activity and image signal are ultimately linked through a chain of metabolic and hemodynamic events. For the form of fMRI in the present study (“blood-oxygen level dependent”, or BOLD fMRI), this linkage is as follows. When there is an increase in neural activity in the form of synaptic events or neural discharges,[10] there is a corresponding increase in local brain metabolism and oxygen consumption (Sokoloff 1989). The increase in oxygen consumption is accompanied by an increase in blood flow and blood volume in the active brain region. However, the increase in flow dominates, such that the local concentration of deoxygenated hemoglobin actually decreases, which is important because deoxygenated hemoglobin is paramagnetic (Pauling and Coryell 1936) and thus influences local image signal levels. The net effect of a decrease in deoxygenated hemoglobin is an increase in image signal. When the entire chain of events is considered together, increases and decreases in neural activity result in concordant changes in image signal strength (Kwong et al. 1992; Ogawa et al. 1993; Springer et al. 1999).

[Footnote 10: We consider both synaptic events and neural discharges as “neural activity”, because there is evidence in favor of each as a contributor. 2DG studies suggest that synaptic events may dominate the metabolic response (and hence the fMRI response; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995). A recent study by Logothetis et al. (2001), which simultaneously recorded intracortical activity and fMRI responses, also suggests that synaptic activity may dominate. However, there are also reports of a strong coupling between discharges and fMRI responses (Heeger et al. 2000; Rees et al. 2000), possibly because discharges and synaptic activity were themselves strongly correlated in those studies. It seems likely that the relative contribution of synaptic events versus discharges to the fMRI response will depend on the specifics of the local neural circuitry (Bandettini and Ungerleider 2001), and thence may in fact vary across different regions of the brain (Auker et al. 1983; Mathiesen et al. 1998), or even for different types of stimuli. For the purposes of the present discussion, we leave open the possibility that either or both synaptic activity and discharges “contribute” to the fMRI responses we measured.]

Since hemodynamic changes occur over the course of seconds, fMRI effectively provides a temporally low-pass filtered view of neural activity. More specifically, since fMRI is sampling activity over small volumes of brain (i.e., voxels), the responses can be thought of as showing the time-envelope of population neural activity on a voxel-by-voxel basis.

Previous work has shown that the relative timing and magnitude of stimulus-evoked changes in blood flow, blood volume, and oxygen consumption can influence the waveshape of the fMRI response (Buxton et al. 1998; Mandeville et al. 1998). While this raises the possibility that changes in waveshape from sustained to phasic reflect changes in hemodynamics rather than underlying neural activity, we believe this to be unlikely for both of the main components of the phasic response, namely the off-peak and the on-peak. It is particularly unlikely that a hemodynamic explanation accounts for the off-peak. Previous hemodynamic modeling and experimentation has not predicted an off-peak following stimulus termination, and we know of no plausible model that could generate such a component. Therefore, the emergence of an off-peak with increasing repetition rate in auditory cortex is almost certainly attributable to a rate-dependent increase in neural activity at stimulus offset. The other major feature that distinguishes phasic from sustained responses, namely the sharp decline in signal that forms the prominent onset peak, requires more detailed consideration because it is known that declines in signal can theoretically occur over the course of a prolonged stimulus for completely hemodynamic reasons (Buxton et al. 1998). However, measurements of BOLD signal, blood flow, and blood volume responses have failed to illustrate a case for which purely hemodynamic features could generate a signal decline as dramatic as those seen here (e.g., Hoge et al. 1999a; Mandeville et al. 1999). Additionally, separate evidence works against the idea that the signal decline is driven primarily by hemodynamic rather than neural influences. The reasoning follows from the fact that the same voxels were capable of showing either a phasic response (and the associated dramatic signal decline) or a sustained response depending on the stimulus.
Since the time course of the phasic and sustained responses is very similar over the first 6-8 seconds, the “operational history” of the hemodynamic system is presumably similar as well. In light of this common initial response, a hemodynamic system that could subsequently generate grossly different response waveshapes seems unlikely, unless the differences reflect differences in underlying neural activity. While response waveshape varied with rate within a structure, it also varied across structures for a given rate. It is known that there can be spatial heterogeneity in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998), so the possibility that regional variations in hemodynamics play some role in the waveshape differences across structures cannot be discounted. Still, the heterogeneities in hemodynamics that have been documented are not sufficient to account for the dramatic waveshape changes that occur across the pathway as a whole (from the inferior colliculus to cortex). fMRI response onset and neural adaptation Given that fMRI responses reflect the time-envelope of population neural activity, the prominent declines in fMRI signal that occur at high rates in MGB, HG, and STG provide clear evidence for an overall decline in neural activity during the first seconds of a train (< 10 s). This decline likely includes decreases in synaptic, as well as discharge activity since both forms of activity are reflected in fMRI signals. While an overall decrease in neural activity early in high-rate trains is clear, the subsecond temporal details of activity during this decrease remain unresolved because fMRI provides a lowpass filtered view of neural activity. Previous electrophysiological data suggest various possible forms for the temporal details underlying the overall decline in neural activity. For instance, it may be that the fMRI signal decline reflects a burst-to-burst adaptation in neural activity in which each successive burst early in a train elicits progressively less activity (Ritter et al. 1968; Roth and Kopell 56 Chapter 2: Repetition Rate 1969). Alternatively, a variant of this may occur in which activity does not always decrease in a strictly progressive fashion across consecutive bursts, but sometimes shows an increase from one burst to the next (e.g., facilitation or enhancement; Loveless et al. 1989; Budd and Michie 1994; Brosch et al. 1999). (As long as decreases from burst to burst occur more often than increases, the time-envelope of neural activity would still decrease.) Another possibility is that population neural activity is not synchronized to individual bursts (Lu and Wang 2000; Lu et al. 2001), but instead occurs in response to the train as a whole with an initial peak in activity followed by a lower-level of activity. All of these possibilities would result in a decline in the time-envelope of population neural activity, and are therefore consistent with the prominent declines seen in fMRI signal. While we cannot conclusively determine the temporal details of activity during the declines in fMRI signal, it is worth recognizing that several aspects of our data are consistent with the idea that there is a burst-to-burst adaptation in neural activity. In addition to the decline in fMRI response early during high-rate trains, the fMRI responses to small numbers of noise bursts are also suggestive of an adaptation process in that fMRI response amplitude did not increase in proportion to the number of bursts, but rather showed a slower than linear growth. 
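This reasoning can be illustrated with a toy forward model in which a burst-evoked neural event train is convolved with a hemodynamic response. Both the assumption of linear summation and the generic gamma-variate response shape are illustrative only (the next paragraph notes that the linearity of the coupling is an open question):

```python
import numpy as np

def bold_peak(burst_weights, isi_s=0.0286, dt_s=0.1, duration_s=60.0):
    """Toy forward model: each burst contributes a brief neural event scaled
    by its weight; the event train is convolved with a generic gamma-variate
    hemodynamic response and the peak of the resulting time course is
    returned."""
    t = np.arange(0.0, 20.0, dt_s)
    hrf = (t / 5.0) ** 3 * np.exp(-t / 1.5)   # peaks near 4-5 s post-event
    hrf /= hrf.max()
    neural = np.zeros(int(duration_s / dt_s))
    for k, w in enumerate(burst_weights):
        neural[int(round(k * isi_s / dt_s))] += w
    return np.convolve(neural, hrf)[:neural.size].max()

# With identical, additive burst responses the predicted peak grows roughly
# in proportion to the number of bursts; with burst-to-burst adaptation
# (weights decaying geometrically) the growth is markedly sub-linear.
no_adaptation = [bold_peak(np.ones(n)) for n in (1, 2, 5)]
with_adaptation = [bold_peak(0.5 ** np.arange(n)) for n in (1, 2, 5)]
```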
Whether neural activity and fMRI signal are coupled in an approximately linear manner (and under what circumstances) is an open question under active investigation (Boynton et al. 1996; Dale and Buckner 1997; Vazquez and Noll 1998; Hoge et al. 1999b; Gratton et al. 2001; Logothetis et al. 2001). However, if they are, the slower than linear growth in the fMRI responses to small numbers of noise bursts would indicate diminishing increases in neural activity as more and more bursts are added to a train (i.e., adaptation). Another aspect of the data consistent with neural adaptation is the growth in onset amplitude with rate. If neural activity and fMRI response amplitude vary in direct proportion, onset amplitude may be viewed roughly as an indicator of the time-average neural activity during the first seconds of a train. If there were no adaptation and each successive burst in a train produced an identical increase in neural activity, the time-average neural activity during the first seconds of a train would increase in direct proportion to rate, and onset amplitude would be expected to do the Discussion 57 same. Instead, a proportional increase in onset amplitude did not occur in any structure. This is most obvious at high rates in MGB, HG, and STG where onset amplitude did not change, or even declined with increasing rate.11 However, it can also be seen at lower rates and in the IC. For instance, an increase in rate from 2/s to 10/s increased onset amplitude by less than two-fold in every structure, well short of the five-fold increase expected if growth were proportional to rate. This result is consistent with neural adaptation occurring in all of the studied structures, even the IC where fMRI response waveshapes are largely sustained and do not immediately suggest an underlying adaptation. Looking across structures, the data indicate that any neural adaptation increased with increasing position in the pathway. For instance, at any given rate, the percentage decline in signal following the on-peak increased progressively from IC to MGB to auditory cortex (HG and STG), suggesting an increasing degree of adaptation in the underlying population neural activity. An increase in adaptation across structures is also suggested by the fact that the growth in onset amplitude with rate falls increasingly short of predictions assuming no adaptation. For instance, the increase in (average) onset amplitude from 2/s to 10/s falls increasingly short of the five-fold increase predicted in the absence of adaptation as one moves from IC (1.69), to MGB (1.42), to auditory cortex (HG: 1.26; STG: 0.92). Similarly, the increase in onset amplitude from 10/s to 35/s falls increasingly short of the 3.5-fold prediction [IC: 1.42; MGB: 1.36; auditory cortex: 0.80 (HG), 0.99 (STG)]. Thus, if there is burst-to-burst adaptation in population neural activity early in a train, it increases from IC to MGB to auditory cortex. 11 One neural interpretation of the trends in onset amplitude follows from a simple model in which there is a neural response to each successive burst in a train, and two competing effects of increasing rate: (1) an increased number of bursts per unit time (working to increase total neural activity during the initial seconds of the train, and hence onset amplitude), and (2) a decreased response to individual bursts because of increased adaptation (working to decrease total neural activity). 
If this latter effect due to adaptation were relatively unimportant at low rates, but dominant at high rates, the result would be an increase in onset percent change with increasing rate at low rates and a plateau or decrease at high rates – the trends we observed in MGB and HG. 58 Chapter 2: Repetition Rate Relationship to electrophysiological data in animals – A trend of increasing adaptation with increasing position in the auditory pathway has emerged from several animal neurophysiological studies explicitly designed to compare responses across structures. For instance, microelectrode recordings of the response to paired stimuli with different interstimulus intervals have shown an increase in recovery time with increasing level in the auditory pathway. In unanesthetized animals (cats and rabbits), the average interval required for 50% recovery of the response to the second of two clicks is 2 ms in the auditory nerve, cochlear nucleus, and superior olivary complex, but 7 ms in the inferior colliculus, and 20 ms in auditory cortex (Fitzpatrick and Kuwada 1999). In unanesthetized guinea pig, Creutzfeldt et al. (1980) recorded responses to amplitude modulated tones simultaneously from thalamic and cortical neurons (specifically nine thalamo-cortical unit pairs for which the correlation of spontaneous activity suggested a direct synaptic connection). Activity in the cortical neurons declined more rapidly over successive cycles of the AM tone than did the activity in the thalamic neurons, indicating greater adaptation in the cortical neurons. Finally, recording nearfield potentials from the IC and auditory cortex in response to brief noise bursts in unanesthetized chinchilla, Burkard et al. (1999) found that the mean response amplitude (averaged across noise burst presentations) decreased more in cortex than IC as repetition rate was increased. Their results are again consistent with greater adaptation in cortex than lower structures in the auditory pathway. The extensive animal literature regarding modulation transfer functions (MTFs) also suggests a change in temporal response properties from the IC to auditory cortex. Here we focus on studies that quantify their results in terms of rate MTFs (rMTF; average firing rate vs. modulation frequency), rather than temporal MTFs, since changes in the “synchronization” or phase locking of neural activity (in the absence of average rate changes) are unlikely to be reflected in fMRI activity. Furthermore, since most animal studies use short duration stimulus trains (~1 s), the most appropriate measure of the present study for comparison to the animal results is onset amplitude. In the IC, the best modulation frequency (BMF; the frequency at which the rMTF has its largest value) for individual neurons is generally greater than ~30 Hz (Langner and Schreiner 1988; Muller-Preuss et al. 1994; Krishna and Semple 2000). In contrast, BMFs in auditory cortex tend to be less than ~20 Hz (Schreiner and Urbas 1988; Eggermont 1991; Bieser and Muller-Preuss 1996; Schreiner and Raggio 1996). These values are consistent with the present study in that onset amplitude steadily Discussion 59 increased in the IC for noise burst rates up to 35/s (the highest rate employed), but peaked in HG at a lower rate (10/s). 
While the variation in onset amplitude with rate in HG was rather small (and in STG, onset amplitude did not vary at all), a similarly weak "tuning" also holds in population neural activity, in that the rMTF averaged across cortical neurons is primarily low-pass, or only weakly band-pass (Schreiner and Urbas 1986; Eggermont 1994; Eggermont and Smith 1995; Eggermont 1998). The relatively flat nature of the average rMTF in cortex probably reflects a weak tuning in the rMTFs of many individual neurons (Schreiner and Raggio 1996; Eggermont 1998), but could also be due in part to the summation of activity across sharply tuned units having a wide range of BMFs. Overall, in both IC and HG, changes in onset amplitude as a function of repetition rate were consistent with what might be predicted based on microelectrode recordings in animals of neural spiking in response to amplitude modulated trains.

Relationship to electric recordings in humans – A trend of generally greater adaptation at cortical vs. brainstem levels of the auditory pathway fits with data concerning two of the most studied components of human auditory evoked potentials: wave V of the brainstem evoked potential and the long latency potential N1. Wave V is likely generated by neurons projecting to the IC (Melcher and Kiang 1996; Moller 1998), while the primary generators of N1 have been localized to auditory cortex (e.g., Näätänen and Picton 1987; Reite et al. 1994). Wave V and N1 show different sensitivities to stimulus repetition rate. For example, in responses averaged over many click presentations, a high click rate (> 50/s) is required to generate a 30-50% amplitude decrement in wave V of the brainstem evoked potential (Thornton and Coleman 1975; Suzuki et al. 1986; Jiang et al. 1991). In contrast, similar decrements in N1 occur at much lower rates (0.5 – 2/s), as seen in averaged responses and by comparing the individual responses to successive stimuli in a train (Davis et al. 1966; Ritter et al. 1968; Roth and Kopell 1969; Fruhstorfer et al. 1970; Fruhstorfer 1971; Davis et al. 1972; Picton et al. 1977). (Interestingly, for stimulus pairs separated by 0.5 – 2 s, corresponding to rates of 2 – 0.5/s, the adaptation in N1 is in close agreement with the data from Exp. III, in which the average increase in peak magnitude from 1 NB to 2 NBs at 2/s was about 60% in both HG and STG.) The substantially different rate sensitivities for wave V and N1 indicate greater stimulus-to-stimulus adaptation in the cortical neurons generating N1 than in the brainstem neurons generating wave V.

Phasic response "off-peak" and neural off responses
The off-peak in the phasic response may be related to electrophysiological "off-responses" observed intracortically following the termination of stimulus trains. For instance, following 175 ms click trains, Steinschneider et al. (1998) found transient increases in cortical multiunit activity in awake monkeys. Using depth electrodes implanted in the vicinity of human auditory cortex, Chatrian et al. (1960) showed gross potential responses following 3 s click trains. In both of these electrophysiological studies, the rate-dependence of the off-response resembled our fMRI cortical data in that an off-response was present for high, but not low, repetition rates. Given this similarity, it seems quite possible that the fMRI and electrophysiological off-responses to stimulus trains arise from similar underlying neural mechanisms.
The fMRI off-peak to stimulus trains may also be physiologically related to off-responses that occur in evoked potential and magnetic field recordings following the cessation of prolonged individual stimuli. Support for this idea comes from the fact that a cortical fMRI off-peak can be elicited by a prolonged noise burst (30 s duration; Sigalovsky et al. 2001), as well as by high rate stimulus trains. In humans, extracranial evoked potential (Spychala et al. 1969; Pfefferbaum et al. 1971; Hillyard and Picton 1978; Picton et al. 1978) and magnetic field recordings (Hari et al. 1987; Pantev et al. 1996; Lammertmann and Lutkenhoner 2001) have also shown responses to the offset of prolonged noise or tone bursts (with durations ranging from 0.5 to 10 s). The generation site of electrophysiological off-responses has been localized to auditory cortex (Hari et al. 1987; Pantev et al. 1996; Noda et al. 1998). Therefore, both the electrophysiological and fMRI off-responses imply increased activity in auditory cortical neurons following sound offset, a form of response that has ample precedent at the single neuron level in auditory cortex (Goldstein et al. 1968; Abeles and Goldstein 1972; Howard et al. 1996; He et al. 1997; Recanzone 2000).

The fMRI and electrophysiological off-responses resemble one another functionally in that both show a decrease in magnitude with decreasing sound duration. For instance, evoked potential off-responses following tone bursts have been shown to decrease in magnitude as burst duration is reduced from 9 s to 1 s (Hillyard and Picton 1978; Picton et al. 1978), or from 2.5 s to 0.5 s (Pfefferbaum et al. 1971). The fMRI off-response to high-rate trains is lower in magnitude for a 15 s train, as compared to longer duration trains (≥ 30 s; Figure 2-9). The duration dependence of fMRI and electrophysiological off-responses may indicate that a diminishing percentage of cortical neurons respond to sound offset as sound duration decreases, or that the neurons responding to the offset of sound do so less robustly as duration decreases. The fact that the fMRI and electrophysiological data were obtained for very different regimes of duration (15 – 60 s and 0.5 – 9 s) suggests that changes in sound duration may be reflected in cortical population neural activity in the same basic way over a broad range of sound durations.

Phasic response recovery
In addition to the prominent on and off-peaks of fMRI phasic responses, there is a third component, namely the steady signal increase that begins approximately 10 s after train onset and continues to the end of the train. A similar "signal recovery" has been observed in auditory cortex in response to tone bursts (Robson et al. 1998) and in the supplementary motor area in response to a finger tapping task (Nakai et al. 2000). This signal recovery, which in the present study can be seen in the responses to intermediate (10/s) and high-rate (35/s) trains of various durations (e.g., 30 – 60 s; Figures 2-4 and 2-9), has no obvious analog in electric or magnetic evoked responses to prolonged auditory stimuli, since these responses do not generally increase over the course of the stimulus, but rather remain fairly constant, or even decrease (Picton et al. 1978; Lammertmann and Lutkenhoner 2001). However, most of the stimulus durations used in this literature are far less than those used in the present study, so it is possible that an electric or magnetic analog would be identified if longer stimulus durations were tried. (It is unlikely that the signal recovery is the manifestation of low-frequency "oscillations" in the metabolic or vascular systems, since the "frequency" required to explain a signal increase lasting the longest duration train, < 0.01 Hz, would be below the lowest frequencies generally considered in the literature on this topic; Hudetz et al. 1998; Obrig et al. 2000.)
It is also possible that the phasic response signal recovery is related to subjects' anticipation of the termination of high-rate trains (and thus may be loosely analogous to electrophysiological correlates of expectation such as the contingent negative variation; Donchin et al. 1978). To explore this idea, we performed two pilot experiments in which, for some runs, the subject's attention was diverted to the visual domain with a highly demanding, ongoing visual task, while presenting a 35/s noise burst train in our standard "30 s on, 30 s off" protocol. Comparison of task vs. no task showed the expected decrease in auditory response amplitude in the visual task condition (Woodruff et al. 1996), but no obvious change in the form of the signal recovery. If the signal recovery does indeed reflect anticipation of train termination, it would appear to be a "low-level" expectation that is not readily eliminated by overt diversion of attention.

Comparison to previous fMRI and PET studies – auditory and non-auditory
Several imaging studies have examined rate effects in auditory cortex using stimuli limited to low rates. fMRI studies presenting blocks of syllables or words at rates up to 2.5/s show primarily sustained responses in auditory cortex and an increase in response amplitude with increasing rate (Binder et al. 1994; Dhankhar et al. 1997; Rees et al. 1997). Both the time courses and amplitude variations are consistent with the results of the present study at low rates. PET studies have also investigated the effect of changes in the repetition rate of words or long duration (500 ms) tone bursts presented at rates up to 1.5/s (Price et al. 1992; Frith and Friston 1996; Rees et al. 1997). Because PET requires the collection of photon counts over an extended duration (e.g., tens of seconds), response time courses cannot be obtained, and results are reported only in terms of the overall percent signal change between stimulus and control periods. Computed in this "time-averaged" manner, these PET rate studies report a monotonic relationship between rate and percent change. The findings of the present study are in agreement with this trend at low repetition rates.

The fMRI study of Giraud et al. (2000) examined a wider range of rates, and for high rates identified a cortical response component that is likely analogous to the "on-peak" in the phasic responses of the present study. The stimulation paradigm presented amplitude modulated (AM) noise and unmodulated noise in alternating 30 s blocks. Because unmodulated noise was used as the comparison condition, the signal changes that occur following the onset of the AM noise may reflect a combination of responses: (1) to the AM noise itself, and (2) to the offset of the preceding, unmodulated noise (since the offset of unmodulated noise can elicit a change in cortical signal; Sigalovsky et al. 2001). Nevertheless, a relationship between AM rate and changes in cortical response waveshape was found.
For auditory cortex and high AM rates (> ~32 Hz), image signal changes during AM presentations were modeled best by a transient response at AM noise onset. In contrast, for low rates, the signal changes were better modeled by a sustained response. Thus, the Giraud et al. study, like the present one, indicates the emergence of a prominent on-peak with increasing rate in auditory cortex. The activation reported by Giraud et al. for subcortical centers was relatively weak, perhaps because the data were taken without the benefit of cardiac gating. Nevertheless, the data suggest that higher rates generally yield greater responses than low ones in subcortical areas, a trend consistent with the results of the present study.

Robson et al. (1998) reported fMRI responses to an ~30 s train of 5/s tone bursts (100 ms burst duration) that are remarkably similar to what we measured to 10/s noise bursts in cortex, in that they include (1) an initial on-peak followed by a 30-50% signal decline, (2) a subsequently slowly increasing signal, and (3) a possible small signal increase or prolongation following train termination. Robson et al. (1998) found that the cortical fMRI response to the tone burst train could be fairly well-described by a model in which there is a response to each burst in a train, but the magnitude of the response decreases progressively with each successive burst until a steady state response level is reached (a numerical sketch of this kind of model is given at the end of this section).

PET and fMRI studies of rate effects in the visual, somatosensory, and motor systems are generally consistent with the rate-dependence of "time-averaged" percent change in auditory cortex. Over the low rates (< 5 Hz) used in studies of the motor system, activation in primary motor cortex increases with increasing rate of finger tapping or finger flexion-extension movements (Blinkenberg et al. 1996; Rao et al. 1996; Schlaug et al. 1996; Jäncke et al. 1998). For primary visual (Fox and Raichle 1984, 1985; Kwong et al. 1992; Mentis et al. 1997; Thomas and Menon 1998; Zhu et al. 1998) and somatosensory (Ibanez et al. 1995; Takanashi et al. 2001) cortex, stimuli have been presented over a wider rate range, comparable to the range used in the present study. A consistent result in these studies is that "time-averaged" activation increases at low rates, peaks around 8 Hz, and then declines, a trend similar to the one seen here for primary auditory cortex. The visual and somatosensory studies do not provide information concerning the time course of activation, so it is not clear whether the decline in time-average activation at high rates reflects a change in response waveshape, as is the case for primary auditory cortex. Nevertheless, the general trend in time-average activation raises the possibility that changes in response waveshape from sustained to phasic with increasing rate are not unique to auditory cortex, but instead are a general property of sensory cortical areas.
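The following is a minimal sketch of the per-burst adaptation model described above in connection with Robson et al. (1998): a train of burst responses whose amplitudes decay to a steady state is convolved with a hemodynamic impulse response. The gamma-shaped impulse response and all numerical parameters are illustrative assumptions, not fitted values from this thesis or from Robson et al.

```python
import numpy as np

dt = 0.5                                        # time step (s)
t = np.arange(0.0, 90.0, dt)                    # 30 s train followed by 60 s of silence
rate = 10.0                                     # bursts per second
burst_times = np.arange(0.0, 30.0, 1.0 / rate)
burst_idx = (burst_times / dt).astype(int)

# Per-burst response amplitudes: decay from 1.0 toward an assumed steady state of 0.3
n = np.arange(burst_times.size)
amps = 0.3 + 0.7 * np.exp(-n / 20.0)

neural = np.zeros_like(t)
np.add.at(neural, burst_idx, amps)              # adaptation-weighted impulse train

# Assumed gamma-shaped hemodynamic impulse response peaking near 5 s
h_t = np.arange(0.0, 20.0, dt)
hrf = (h_t / 5.0) ** 2 * np.exp(-h_t / 2.5)
hrf /= hrf.sum()

bold = np.convolve(neural, hrf)[: t.size]       # predicted fMRI time course
early = bold[(t > 4) & (t < 10)].mean()         # signal near the on-peak
late = bold[(t > 20) & (t < 30)].mean()         # signal late in the train
print(round(early, 2), round(late, 2))          # early level exceeds the late, adapted level
```

The resulting waveform shows an elevated signal just after train onset that declines toward a lower steady level, the qualitative signature of the on-peak and subsequent signal decline discussed above.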
Relationship between fMRI response waveshape and sound perception
The fact that the various noise burst trains used in the present study differ considerably from a perceptual standpoint raises the possibility that some of the observed trends in the fMRI responses are correlates of perception, rather than simply rate. The most striking of the observed trends, the change in cortical response waveshape, showed a qualitative correlation with the perceptual attributes of the stimuli. For the low rate that elicited sustained responses (i.e., 2/s), the individual bursts in the train were distinguishable. For the high rate that elicited the most phasic responses (i.e., 35/s), the noise bursts fused so that the overall train sounded continuous (although modulated). Thus, distinctly different cortical response waveshapes occurred for two distinctly different perceptual regimes.

The possible correlation between fMRI response waveshape and perception fits with the idea that population neural activity in auditory cortex encodes the beginning and end of perceptually distinct acoustic "events". At a 2/s rate, individual noise bursts of a train are the distinguishable events, so there would be successive neural responses to each burst, resulting in a sustained fMRI response. In contrast, 35/s noise burst trains are perceptually continuous and would be treated as a single event, thus producing neural responses primarily at the onset and offset of the train, and hence the phasic fMRI response. Coordinated fMRI and psychophysical experiments will be necessary to more fully explore the potential coupling between perception and neural activity as seen through fMRI response waveshape.

The trend in fMRI response waveshape across structures suggests that the coding of perceptually distinct acoustic events in population neural activity occurs to different degrees depending on structure. While auditory cortex showed a clear, qualitative correlation between perception and response waveshape, the MGB showed less correlation in that there was less difference in waveshape for 2/s vs. 35/s bursts. In the IC, there was no apparent correlation since waveshape was the same regardless of whether the individual bursts of a train were distinct (as at 2/s) or fused (35/s). Thus, any correlation between perception and response waveshape diminished from higher to lower stages of the pathway. Overall, our results suggest a population neural representation of the beginning and the end of distinct perceptual events that, while weak or absent in the midbrain, begins to emerge in the thalamus and is robust in auditory cortex.

Acknowledgements
The authors gratefully thank John Guinan, Mark Tramo, Irina Sigalovsky, Courtney Lane, Peter Cariani, and Monica Hawley for numerous helpful comments and suggestions, and Barbara Norris for considerable assistance in figure preparation. Support was provided by NIH/NIDCD PO1DC00119, RO3DC03122, T32DC00038, and a Martinos Scholarship.

REFERENCES
Abeles M and Goldstein MH, Jr. Responses of single units in the primary auditory cortex of the cat to tones and to tone pairs. Brain Res 42: 337-352, 1972. Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516, 1983. Bandettini PA, Jesmanowicz A, Wong EC and Hyde JS. Processing strategies for time-course data sets in functional MRI of the human brain. Magn Reson Med 30: 161-173, 1993. Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW, Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997. Bandettini PA and Ungerleider LG. From neuron to BOLD: New connections. Nat Neurosci 4: 864-866, 2001. Bieser A and Muller-Preuss P. Auditory responsive cortex in the squirrel monkey: Neural responses to amplitude-modulated sounds.
Exp Brain Res 108: 273-284, 1996. Binder JR, Rao SM, Hammeke TA, Frost JA, Bandettini PA and Hyde JS. Effects of stimulus rate on signal responses during functional magnetic resonance imaging of auditory cortex. Brain Res Cogn Brain Res 2: 31-38, 1994. Blinkenberg M, Bonde C, Holm S, Svarer C, Andersen J, Paulson OB and Law I. Rate dependence of regional cerebral activation during performance of a repetitive motor task: A PET study. J. Cerebral Blood Flow and Metabolism 16: 794-803, 1996. Boynton GM, Engel SA, Glover GH and Heeger DJ. Linear systems analysis of functional magnetic resonance imaging in human V1. J Neurosci 16: 4207-4221, 1996. Bregman A. Auditory scene analysis. The perceptual organization of sound. Cambridge: MIT Press, 1990. Brosch M, Schulz A and Scheich H. Processing of sound sequences in macaque auditory cortex: Response enhancement. J Neurophysiol 82: 1542-1559, 1999. Buckner RL, Bandettini PA, O'Craven KM, Savoy RL, Peterson SE, Raichle ME and Rosen BR. Detection of cortical activation during averaged single trials of a cognitive task using functional magnetic resonance imaging. Proc Natl Acad Sci 93: 14878-14883, 1996. Budd TW and Michie PT. Facilitation of the N1 peak of the auditory ERP at short stimulus intervals. Neuroreport 5: 2513-2516, 1994. Burkard RF, Secor CA and Salvi RJ. Near-field responses from the round window, inferior colliculus, and auditory cortex of the unanesthetized chinchilla: Manipulations of noise burst level and rate. J Acoust Soc Am 106: 304-312, 1999. Buxton RB, Wong EC and Frank LR. Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magn Reson Med 39: 855-864, 1998. References 67 Chatrian GE, Petersen MC and Lazarte JA. Responses to clicks from the human brain: Some depth electrographic observations. Electroencephalogr Clin Neurophysiol 12: 479-489, 1960. Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation of fMRI BOLD response in primary visual cortex of human brain during sustained visual simulation. Magn Reson Med 39: 520-527, 1998. Creutzfeldt O, Hellweg FC and Schreiner C. Thalamocortical transformation of responses to complex auditory stimuli. Exp Brain Res 39: 87-104, 1980. Dale AM and Buckner RL. Selective averaging of rapidly presented individual trials using fMRI. Hum Brain Mapp 5: 329-340, 1997. Davis H, Mast T, Yoshie N and Zerlin S. The slow response of the human cortex to auditory stimuli: Recovery process. Electroencephalogr Clin Neurophysiol 21: 105-113, 1966. Davis H, Osterhammel A, Wier CC and Gjerdingen DB. Slow vertex potentials: Interactions among auditory, tactile, electric and visual stimuli. Electroencephalogr Clin Neurophysiol 33: 537545, 1972. Davis TL, Kwong KK, Weisskoff RM and Rosen BR. Calibrated functional MRI: Mapping the dynamics of oxidative metabolism. Proc Natl Acad Sci 95: 1834-1839, 1998. Dhankhar A, Wexler BE, Fulbright RK, Halwes T, Blamire AM and Shulman RG. Functional magnetic resonance imaging assessment of the human brain auditory cortex response to increasing word presentation rates. J Neurophysiol 77: 476-483, 1997. Donchin E, Ritter W and McCallum WC. Cognitive psychophysiology: The endogenous components of the ERP. In: Event-related brain potentials in man, edited by Callaway E, Tueting P and Koslow SH. New York: Academic Press, 1978, p. 349-411. Eggermont JJ. Rate and synchronization measures of periodicity coding in cat primary auditory cortex. Hear Res 56: 153-167, 1991. Eggermont JJ. 
Temporal modulation transfer functions for AM and FM stimuli in cat auditory cortex. Effects of carrier type, modulating waveform and intensity. Hear Res 74: 51-66, 1994. Eggermont JJ and Smith GM. Synchrony between single-unit activity and local field potentials in relation to periodicity coding in primary auditory cortex. J Neurophysiol 73: 227-245, 1995. Eggermont JJ. Representation of spectral and temporal sound features in three cortical fields of the cat. Similarities outweigh differences. J Neurophysiol 80: 2743-2764, 1998. Erne SN and Hoke M. Short-latency evoked magnetic fields from the human auditory brainstem. Adv Neurol 54: 167-176, 1990. Fitzpatrick DC and Kuwada S. Responses of neurons to click-pairs as simulated echoes: Auditory nerve to auditory cortex. J Acoust Soc Am 106: 3460-3472, 1999. Fox PT and Raichle ME. Stimulus rate dependence of regional cerebral blood flow in human striate cortex, demonstrated by positron emission tomography. J Neurophysiol 51: 1109-1120, 1984. Fox PT and Raichle ME. Stimulus rate determines regional brain blood flow in striate cortex. Ann Neurol 17: 303-305, 1985. Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC and Evans AC. Assessing the significance of focal activations using their spatial extent. Hum Brain Mapp 1: 210-220, 1994. Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995. 68 Chapter 2: Repetition Rate Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in fMRI time-series. Magn Reson Med 35: 346-355, 1996. Frith CD and Friston KJ. The role of the thalamus in "top down" modulation of attention to sound. Neuroimage 4: 210-215, 1996. Fruhstorfer H, Soveri P and Jarvilehto T. Short-term habituation of the auditory evoked response in man. Electroencephalogr Clin Neurophysiol 28: 153-161, 1970. Fruhstorfer H. Habituation and dishabituation of the human vertex response. Electroencephalogr Clin Neurophysiol 30: 306-312, 1971. Gaschler-Markefski B, Baumgart F, Tempelmann C, Schindler F, Stiller D, Heinze HJ and Scheich H. Statistical methods in functional magnetic resonance imaging with respect to nonstationary time-series: Auditory cortex activity. Magn Reson Med 38: 811-820, 1997. Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 15881598, 2000. Goldstein MH, Jr., Hall JL, II and Butterfield BO. Single-unit activity in the primary auditory cortex of unanesthetized cats. J Acoust Soc Am 43: 444-455, 1968. Gratton G, Goodman-Wood MR and Fabiani M. Comparison of neuronal and hemodynamic measures of the brain response to visual stimulation: An optical imaging study. Hum Brain Mapp 13: 13-25, 2001. Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O and Patterson RD. Encoding of the temporal regularity of sound in the human brainstem. Nat Neurosci 4: 633-637, 2001. Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S, Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain Mapp 6: 33-41, 1998. Hall DA, Haggard MP, Summerfield AQ, Akeroyd MA, Palmer AR and Bowtell RW. Functional magnetic resonance imaging measurements of sound-level encoding in the absence of background scanner noise. J Acoust Soc Am 109: 1559-1570, 2001. Hari R, Pelizzone M, Makela JP, Hallstrom J, Leinonen L and Lounasmaa OV. 
Neuromagnetic responses of the human auditory cortex to on- and offsets of noise bursts. Audiology 26: 31-43, 1987. He J, Hashikawa T, Ojima H and Kinouchi Y. Temporal integration and duration tuning in the dorsal zone of cat auditory cortex. J Neurosci 17: 2615-2625, 1997. Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000. Hillyard SA and Picton TW. On and off components in the auditory evoked potential. Percept Psychophys 24: 391-398, 1978. Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999a. Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Linear coupling between cerebral blood flow and oxygen consumption in activated human cortex. Proc Natl Acad Sci 96: 9403-9408, 1999b. Howard MA, III, Volkov IO, Abbas PJ, Damasio H, Ollendieck MC and Granner MA. A chronic microelectrode investigation of the tonotopic organization of human auditory cortex. Brain Res 724: 260-264, 1996. References 69 Hudetz AG, Biswal BB, Shen H, Lauer KK and Kampine JP. Spontaneous fluctuations in cerebral oxygen supply. An introduction. Adv Exp Med Biol 454: 551-559, 1998. Ibanez V, Deiber MP, Sadato N, Toro C, Grissom J, Woods RP, Mazziotta JC and Hallett M. Effects of stimulus rate on regional cerebral blood flow after median nerve stimulation. Brain 118: 1339-1351, 1995. Jäncke L, Specht K, Mirzazade S, Loose R, Himmelbach M, Lutz K and Shah NJ. A parametric analysis of the 'rate effect' in the sensorimotor cortex: A functional magnetic resonance imaging analysis in human subjects. Neurosci Lett 252: 37-40, 1998. Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res Cogn Brain Res 8: 117-124, 1999. Jiang ZD, Wu YY and Zhang L. Amplitude change with click rate in human brainstem auditoryevoked responses. Audiology 30: 173-182, 1991. Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995. Krishna BS and Semple MN. Auditory temporal processing: Responses to sinusoidally amplitudemodulated tones in the inferior colliculus. J Neurophysiol 84: 255-273, 2000. Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci 89: 5675-5679, 1992. Lammertmann C and Lutkenhoner B. Near-DC magnetic fields following a periodic presentation of long-duration tonebursts. Clin Neurophysiol 112: 499-513, 2001. Langner G and Schreiner CE. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. J Neurophysiol 60: 1799-1822, 1988. Langner G. Periodicity coding in the auditory system. Hear Res 60: 115-142, 1992. Leonard CM, Puranik C, Kuldau JM and Lombardino LJ. Normal variation in the frequency and location of human auditory cortex landmarks. Heschl's gyrus: Where is it? Cereb Cortex 8: 397406, 1998. Lockwood AH, Salvi RJ, Coad ML, Arnold SA, Wack DS, Murphy BW and Burkard RF. The functional anatomy of the normal human auditory system: Responses to 0.5 and 4.0 kHz tones at varied intensities. Cereb Cortex 9: 65-76, 1999. 
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001. Loveless N, Hari R, Hamalainen M and Tiihonen J. Evoked responses of human auditory cortex may be enhanced by preceding stimuli. Electroencephalogr Clin Neurophysiol 74: 217-227, 1989. Lu T and Wang X. Temporal discharge patterns evoked by rapid sequences of wide- and narrowband clicks in the primary auditory cortex of cat. J Neurophysiol 84: 236-246, 2000. Lu T, Liang L and Wang X. Temporal and rate representations of time-varying signals in the auditory cortex of awake primates. Nat Neurosci 4: 1131-1138, 2001. Mandeville JB, Marota JJA, Kosofsky BE, Keltner JR, Weissleder R, Rosen BR and Weisskoff RM. Dynamic functional imaging of relative cerebral blood volume during rat forepaw stimulation. Magn Reson Med 39: 615-624, 1998. 70 Chapter 2: Repetition Rate Mandeville JB, Marota JJA, Ayata C, Moskowitz MA, Weisskoff RM and Rosen BR. MRI measurement of the temporal evolution of relative CMRO2 during rat forepaw stimulation. Magn Reson Med 42: 944-951, 1999. Mathiesen C, Caesar K, Akgoren N and Lauritzen M. Modification of activity-dependent increases of cerebral blood flow by excitatory synaptic activity and spikes in rat cerebellar cortex. J Physiol (Lond) 512: 555-566, 1998. Melcher JR and Kiang NY. Generators of the brainstem auditory evoked potential in cat. III: Identified cell populations. Hear Res 93: 52-71, 1996. Melcher JR, Talavage TM and Harms MP. Functional MRI of the auditory system. In: Functional MRI, edited by Moonen CTW and Bandettini PA. Berlin: Springer, 1999, p. 393-406. Mentis MJ, Alexander GE, Grady CL, Horwitz B, Krasuski J, Pietrini P, Strassburger T, Hampel H, Schapiro MB and Rapoport SI. Frequency variation of a pattern-flash visual stimulus during PET differentially activates brain from striate through frontal cortex. Neuroimage 5: 116-128, 1997. Miller GA and Taylor WG. The perception of repeated bursts of noise. J Acous Soc Am 20: 171182, 1948. Moller AR. Neural generators of the brainstem auditory evoked potentials. Seminars in Hearing 19: 11-27, 1998. Muller-Preuss P, Flachskamm C and Bieser A. Neural encoding of amplitude modulation within the auditory midbrain of squirrel monkeys. Hear Res 80: 197-208, 1994. Näätänen R and Picton T. The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology 24: 375-425, 1987. Nakai T, Matsuo K, Kato C, Takehara Y, Isoda H, Moriya T, Okada T and Sakahara H. Poststimulus response in hemodynamics observed by functional magnetic resonance imaging--difference between the primary and sensorimotor area and the supplementary motor area. Magn Reson Imaging 18: 1215-1219, 2000. Noda K, Tonoike M, Doi K, Koizuka I, Yamaguchi M, Seo R, Matsumoto N, Noiri T, Takeda N and Kubo T. Auditory evoked off-response: Its source distribution is different from that of onresponse. Neuroreport 9: 2621-2625, 1998. Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic activity in the central auditory system. J Comp Neurol 245: 553-565, 1986. Obrig H, Neufang M, Wenzel R, Kohl M, Steinbrink J, Einhaupl K and Villringer A. Spontaneous low frequency oscillations of cerebral hemodynamics and metabolism in human adults. Neuroimage 12: 623-639, 2000. Ogawa S, Menon RS, Tank DW, Kim SG, Merkle H, Ellerman JM and Ugurbil K. 
Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging: A comparison of signal characteristics with a biophysical model. Biophys J 64: 803-812, 1993. Pantev C, Eulitz C, Hampson S, Ross B and Roberts LE. The auditory evoked "off" response: Sources and comparison with the "on" and the "sustained" responses. Ear Hear 17: 255-265, 1996. Pauling L and Coryell CD. The magnetic properties and structure of hemoglobin, oxyhemoglobin and carbonmonoxyhemoglobin. Proc Natl Acad Sci 22, 1936. References 71 Penhune VB, Zatorre RJ, MacDonald JD and Evans AC. Interhemispheric anatomical differences in human primary auditory cortex: Probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6: 661-672, 1996. Pfefferbaum A, Buchsbaum M and Gips J. Enhancement of the average evoked response to tone onset and cessation. Psychophysiology 8: 332-339, 1971. Picton TW, Hillyard SA, Krausz HI and Galambos R. Human auditory evoked potentials. I: Evaluation of components. Electroencephalogr Clin Neurophysiol 36: 179-190, 1974. Picton TW, Woods DL, Baribeau-Braun J and Healey TMG. Evoked potential audiometry. J Otolaryngol 6: 90-119, 1977. Picton TW, Woods DL and Proulx GB. Human auditory sustained potentials. II. Stimulus relationships. Electroencephalogr Clin Neurophysiol 45: 198-210, 1978. Press WH, Teukolsky SA, Vetterling WT and Flannery BP. Numerical recipes in c: The art of scientific computing: Cambridge University Press, 1992. Price C, Wise R, Ramsay S, Friston K, Howard D, Patterson K and Frackowiak R. Regional response differences within the human auditory cortex when listening to words. Neuroscience Letters 146: 179-182, 1992. Purdon PL and Weisskoff RM. Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI. Hum Brain Mapp 6: 239-249, 1998. Rademacher J, Caviness J, V.S., Steinmetz H and Galaburda AM. Topographical variation of the human primary cortices: Implications for neuroimaging, brain mapping, and neurobiology. Cereb Cortex 3: 313-329, 1993. Rao SM, Bandettini PA, Binder JR, Bobholz JA, Hammeke TA, Stein EA and Hyde JS. Relationship between finger movement rate and functional magnetic resonance signal change in human primary motor cortex. J Cerebral Blood Flow and Metabolism 16: 1250-1254, 1996. Ravicz ME, Melcher JR and Kiang NY-S. Acoustic noise during functional magnetic resonance imaging. J Acoust Soc Am 108: 1683-1696, 2000. Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and body. J Acoust Soc Am 109: 216-231, 2001. Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150: 104-118, 2000. Rees G, Howseman A, Josephs O, Frith CD, Friston KJ, Frackowiak RSJ and Turner R. Characterizing the relationship between BOLD contrast and regional cerebral blood flow measurements by varying the stimulus presentation rate. Neuroimage 6: 270-278, 1997. Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties of human and macaque V5. Nat Neurosci 3: 716-723, 2000. Reese TG, Davis TL and Weisskoff RM. Automated shimming at 1.5 T using echo-planar image frequency maps. J Magn Reson Imaging 5: 739-745, 1995. Reite M, Adams M, Simon J, Teale P, Sheeder J, Richardson D and Grabbe R. 
Auditory m100 component 1: Relationship to heschl's gyri. Brain Res Cogn Brain Res 2: 13-20, 1994. Ritter W, Vaughan J, H.G. and Costa LD. Orienting and habituation to auditory stimuli: A study of short term changes in average evoked responses. Electroencephalogr Clin Neurophysiol 25: 550556, 1968. 72 Chapter 2: Repetition Rate Robson MD, Dorosz JL and Gore JC. Measurements of the temporal fMRI response of the human auditory cortex to trains of tones [published erratum appears in Neuroimage 8: 228, 1998]. Neuroimage 7: 185-198, 1998. Roth WT and Kopell BS. The auditory evoked response to repeated stimuli during a vigilance task. Psychophysiology 6: 301-309, 1969. Royer FL and Robin DA. On the perceived unitization of repetitive auditory patterns. Percept Psychophys 39: 9-18, 1986. Schlaug G, Sanes JN, Thangaraj V, Darby DG, Jancke L, Edelman RR and Warach S. Cerebral activation covaries with movement rate. Neuroreport 7: 879-883, 1996. Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of cat. I. The anterior auditory field (AAF). Hear Res 21: 227-241, 1986. Schreiner CE and Langner G. Coding of temporal patterns in the central auditory nervous system. In: Auditory function: Neurobiological bases of hearing, edited by Edelman GM, Gall WE and Cowan WM. New York: John Wiley and Sons, 1988, p. 337-361. Schreiner CE and Urbas JV. Representation of amplitude modulation in the auditory cortex of the cat. II. Comparison between cortical fields. Hear Res 32: 49-64, 1988. Schreiner CE and Raggio MW. Neuronal responses in cat primary auditory cortex to electrical cochlear stimulation. II. Repetition rate coding. J Neurophysiol 75: 1283-1300, 1996. Sigalovsky I, Hawley ML, Harms MP and Melcher JR. Sound level representations in the human auditory pathway investigated using fMRI. Neuroimage 13: S939, 2001. Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE. Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol 83: 537-551, 2000. Sokoloff L. In: Basic neurochemistry, edited by Siegel G, Agranoff B, Albers RW and Molinoff P. New York: Raven, 1989, p. 565-590. Springer CS, Jr., Patlak CS, Palyka I and Huang W. Principles of susceptibility contrast-based functional MRI: The sign of the functional MRI response. In: Functional MRI, edited by Moonen CTW and Bandettini PA. Berlin: Springer, 1999, p. 91-102. Spychala P, Rose DE and Grier JB. Comparison of the "on" and "off" characteristics of the acoustically evoked response. International Audiology 8: 416-423, 1969. Steinschneider M, Reser DH, Fishman YI, Schroeder CE and Arezzo JC. Click train encoding in primary auditory cortex of the awake monkey: Evidence for two mechanisms subserving pitch perception. J Acoust Soc Am 104: 2935-2955, 1998. Suzuki T, Kobayashi K and Takagi N. Effects of stimulus repetition rate on slow and fast components of auditory brain-stem responses. Electroencephalogr Clin Neurophysiol 65: 150-156, 1986. Symmes D, Chapman LF and Halstead WC. The fusion of intermittent white noise. J Acoust Soc Am 27: 470-473, 1955. Takanashi M, Abe K, Yanagihara T, Oshiro Y, Watanabe Y, Tanaka H, Hirabuki N, Nakamura H and Fujita N. Effects of stimulus presentation rate on the activity of primary somatosensory cortex: A functional magnetic resonance imaging study in humans. Brain Res Bull 54: 125-129, 2001. References 73 Talavage TM, Ledden PJ, Benson RR, Rosen BR and Melcher JR. 
Frequency-dependent responses exhibited by multiple regions in human auditory cortex. Hear Res 150: 225-244, 2000. Thomas CG and Menon RS. Amplitude response and stimulus presentation frequency response of human primary visual cortex using BOLD EPI at 4 T. Magn Reson Med 40: 203-209, 1998. Thornton ARD and Coleman MJ. The adaptation of cochlear and brainstem auditory evoked potentials in humans. Electroencephalogr Clin Neurophysiol 39: 399-406, 1975. Vazquez AL and Noll DC. Nonlinear aspects of the BOLD response in functional MRI. Neuroimage 7: 108-118, 1998. Wessinger CM, VanMeter J, Tian B, Van Lare J, Pekar J and Rauschecker JP. Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J Cogn Neurosci 13: 1-7, 2001. Woodruff PWR, Benson RR, Bandettini PA, Kwong KK, Howard RJ, Talavage T, Belliveau J and Rosen BR. Modulation of auditory and visual cortex by selective attention is modality-dependent. Neuroreport 7: 1909-1913, 1996. Yang Y, Engelien A, Engelien W, Xu S, Stern E and Silbersweig DA. A silent event-related functional MRI technique for brain activation studies without interference of scanner acoustic noise. Magn Reson Med 43: 185-190, 2000. Zhu XH, Kim SG, Andersen P, Ogawa S, Ugurbil K and Chen W. Simultaneous oxygenation and perfusion imaging study of functional activity in primary visual cortex at different visual stimulation frequency: Quantitative correlation between BOLD and CBF changes. Magn Reson Med 40: 703-711, 1998.

Chapter 3
Detection and quantification of a wide range of fMRI temporal responses using a physiologically-motivated basis set

ABSTRACT
The temporal dynamics of fMRI responses can span a broad range, indicating a rich underlying physiology, but also posing a significant challenge for detection. For instance, in human auditory cortex, prolonged sound stimuli (~30 s) can evoke responses ranging from sustained to highly phasic (i.e., characterized by prominent peaks just after sound onset and offset). In the present study, we developed a method capable of detecting a wide variety of responses, while simultaneously extracting information about individual response components, which may have different physiological underpinnings. Specifically, we implemented the general linear model using a novel set of basis functions chosen to reflect temporal features of cortical fMRI responses. This physiologically-motivated basis set (the "OSORU" basis set) was tested against (1) the commonly employed "sustained-only" basis "set" (i.e., a single smoothed "boxcar" function), and (2) a sinusoidal basis set, which is capable of detecting a broad range of responses, but lacks a direct relationship to individual response components. On data that included many different temporal responses, the OSORU basis set performed far better overall than the sustained-only set, and as well or better than the sinusoidal basis set. The OSORU basis set also proved effective in exploring brain physiology. As an example, we demonstrate that the OSORU basis functions can be used to spatially map the relative amount of transient vs. sustained activity within auditory cortex. The OSORU basis set provides a powerful means for response detection and quantification that should be broadly applicable to any brain system and to both human and non-human species.
INTRODUCTION
It is well recognized that sites of brain activation could escape detection with fMRI if the temporal dynamics of activation and the a priori temporal assumptions of the detection technique are poorly matched. Numerous approaches have been proposed attempting to minimize the assumptions concerning activation dynamics in order to avoid the possibility of missed activation (e.g., Brammer 1998; Golay et al. 1998; Andersen et al. 1999). However, for the most part, these techniques have remained of more theoretical interest than practical significance because there have not been striking examples of their necessity.

The wide range of temporal fMRI responses recently found in human auditory cortex to prolonged (e.g., 30 s) stimuli provides a clear illustration of the need for methods capable of detecting an extensive range of response waveshapes. Depending on the sound stimulus, the temporal dynamics of auditory cortical activation can vary from the sustained waveshapes seen typically in fMRI to atypical "phasic" waveshapes that include prominent peaks just after sound onset and offset (e.g., see Figure 3-3). The demonstrated capacity of auditory cortex to show these variations in fMRI waveshape raises the possibility of similarly dramatic, but as yet unidentified, variations in other cortical areas. It also raises the possibility of additional, as yet undetected, modes of response both within and outside the auditory system.

That some forms of activation can be easily missed is clearly illustrated using one of the most statistically powerful, but also the most constrained, of detection methods – cross-correlation with an assumed response waveshape. For a commonly used cross-correlating function, a smoothed boxcar, the cross-correlation approach performs well when activation is sustained, but poorly when it is not. This point is well illustrated by the poor detection of phasic responses in auditory cortex, as compared to the good detection of sustained responses (e.g., Figure 3-3, bottom activation maps). In cases like auditory cortex, where the extremes of response waveshape are quite different, a detection technique with highly constrained assumptions concerning waveshape is bound to fail at one end of the waveshape spectrum or the other. A method that allows for a wide range of temporal dynamics is therefore essential if activation is to be detected reliably.

The overall objective of the present study was to develop such a method. We specifically sought an approach that would provide information about the underlying waveshape of activation as an automatic by-product of detection. Of the various approaches with the potential to detect a range of response waveshapes, we identified the general linear model (GLM) as having, in principle, the characteristics needed to meet our goals, and rejected several other alternatives because they did not satisfy our requirements. In theory, a signal decomposition based on wavelet analysis may support the detection of responses with dynamics that vary over different temporal scales (Brammer 1998; von Tscharner and Thulborn 2001). However, a wavelet transformation does not necessarily facilitate the extraction of directly pertinent, interpretable information concerning the temporal dynamics of activation. Other possible alternatives, such as autocorrelation analysis (Paradis et al. 1996), fuzzy clustering (Baumgartner et al. 1998; Golay et al. 1998; Chuang et al. 1999; Fadili et al.
2000), and principal component analysis (Sychra et al. 1994; Andersen et al. 1999) lack a well-defined statistic that supports inference on a univariate, voxel-by-voxel basis. The GLM, on the other hand, does not suffer from this limitation and can provide direct information concerning the temporal properties of activation for individual voxels.

The basic idea behind the GLM is that the response can be modeled as a weighted sum of "basis functions". The amplitudes of the basis functions are estimated so as to give the best overall fit to the measured response. As a by-product of detection under the GLM, the basis functions and their corresponding amplitudes can provide direct information about the underlying temporal responses, provided the basis functions are chosen to relate directly to specific features of the waveshape of activation. Additional strengths of the GLM include: 1) the capability to handle the correlated nature of fMRI time-series, 2) the existence of a well-defined, easily-computed statistic (F-statistic) for estimating significance relative to the null hypothesis, and 3) good statistical power characteristics (Ardekani and Kanno 1998).

An important element of the present study was devising a flexible, yet concise, set of basis functions for modeling a range of cortical responses within the GLM framework. The vast majority of previous implementations of the GLM have used a single, "sustained" basis function (equivalent to cross-correlation with a smoothed boxcar or, similarly, a t-test). Some studies have used a two-element basis set, supplementing the standard sustained function with its temporal derivative or a component with an exponential decay (Friston et al. 2000; Giraud et al. 2000). However, this implementation remains limited in terms of the range of response waveshapes that can be handled. The opposite extreme is to use basis functions that lead to a direct estimate of all time-points of the response (i.e., finite impulse response models), thereby allowing for complete flexibility in possible response dynamics (and truly unbiased response estimation; Burock and Dale 2000; Miezin et al. 2000). However, such an approach will typically be ill-advised for epoch-related studies with prolonged stimulus presentation, since the direct estimation of many time-points will result in a statistical test with drastically reduced power relative to a test based on a small, well-chosen basis set.

An example of a small, well-chosen basis set is a series of sinusoids (i.e., a truncated Fourier series; Friston et al. 1995b; Bullmore et al. 1996; Ardekani et al. 1999). The GLM using this basis set is generally acknowledged to be powerful in terms of detection, and flexible in that it can handle a wide range of temporal responses. In theory, this approach is capable of detecting any response with frequency components in the range of the basis set. A downside, however, is that sinusoidal basis functions do not necessarily have physiological meaning. Thus, while providing a powerful means for detecting a variety of responses, a sinusoidal basis set does not meet our objective of providing direct information about different response components.
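To make the detection issue concrete, the toy simulation below (all values assumed, not taken from this thesis) fits a synthetic phasic response with (a) a single smoothed-boxcar-like sustained regressor and (b) a small basis set that also includes onset and offset transients, and compares the variance explained in each case.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, period = 2.0, 60.0                        # 2 s sampling, 30 s on / 30 s off
t = np.arange(0, period, dt)

def bump(center, width=4.0):
    """Gaussian bump used as a stand-in for a transient response component."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

onset, offset = bump(6.0), bump(36.0)         # transients after stimulus onset and offset
boxcar = ((t >= 4) & (t < 34)).astype(float)  # crudely delayed "sustained" regressor

# Synthetic phasic response: large onset/offset peaks, little sustained signal, plus noise
phasic = 1.0 * onset + 0.8 * offset + 0.1 * boxcar + 0.2 * rng.standard_normal(t.size)

def r_squared(y, X):
    """Fraction of variance explained by an ordinary least squares fit of X to y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

X_sustained = np.column_stack([np.ones_like(t), boxcar])
X_flexible = np.column_stack([np.ones_like(t), boxcar, onset, offset])
print(round(r_squared(phasic, X_sustained), 2), round(r_squared(phasic, X_flexible), 2))
```

In this toy case the sustained-only model accounts for far less of the phasic response's variance than the three-function basis; the OSORU basis set developed below follows the same logic, but with physiologically motivated components.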
The present study took a different approach from previous GLM work in that the choice of basis functions was neurophysiologically motivated. Specifically, the form of each function was chosen to mirror the general shape of a particular component in actual fMRI responses, the idea being that certain components may indicate particular aspects of underlying neural activity (e.g., the prominent peaks after stimulus onset and offset in the phasic responses of auditory cortex likely reflect neural activity in response to stimulus onset and offset). Thus, the basis functions, together with their amplitudes, should provide direct, readily interpretable information about the temporal dynamics of responses, and hence the neural activity underlying them. A key difference between the present approach and some previous physiologically-driven detection methods (Purdon et al. 2001) is that our choice of basis functions was oriented toward understanding the neural activity behind fMRI responses, rather than modeling the hemodynamics. Here, five basis functions were chosen. These will be referred to as the "OSORU" basis set, a name that derives from descriptions of the individual components (Onset, Sustained, Offset, Ramp, Undershoot; see Figure 3-1).

The present study examined the utility of the OSORU basis set in three ways. First, the detection capability of the OSORU basis set was assessed by testing whether the extent of detected activation was (1) greater than or equal to the extent obtained using one of the most common detection methods, i.e., comparison with a (smoothed) boxcar reference waveform, and (2) comparable to that obtained using a sinusoidal basis set – an alternative basis set generally acknowledged to be powerful, yet flexible, in terms of response detection. Second, the ability of the OSORU basis set to match different waveshape components was assessed by comparing the amplitude of different OSORU basis functions (or combinations thereof) with direct measurements (e.g., baseline-to-peak amplitudes) from the waveforms themselves. Both the detection and waveform matching tests were conducted using an extensive and challenging database composed of auditory cortical responses to a variety of sounds. Finally, we examined the utility of the OSORU basis set for extracting physiological information concerning responding brain areas. As an example, we derived one particular measure from the OSORU basis functions that summarizes the degree to which auditory cortex responds to sound in a transient vs. sustained manner. We show that the relative amounts of transient and sustained activity can be captured in this measure, and can be spatially mapped across cortical areas.

Overall, the GLM method using the OSORU basis set provided reliable detection and straightforward characterization of the temporal dynamics of auditory cortical activation. Although developed and tested on the human auditory system, this approach should also be applicable, both in concept and detail, to other brain systems and species.

METHODS
fMRI data
The data used in this paper are pooled across experiments that examined the representation of sound in the fMRI responses of the human auditory system. The overall database comes from 25 subjects and 39 total imaging sessions. All studies were approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital, and all subjects gave their written informed consent.
Stimuli were trains of broadband noise bursts with various rates (1, 2, 10, and 35/s) and duty-cycles (5, 25, 50, and 88%), trains of narrowband noise bursts (third-octave bandwidth, center frequency 500 Hz or 4 kHz; rate = 2 or 35/s), trains of tone bursts (500 Hz or 4 kHz, rate = 2 or 35/s), continuous broadband noise, trains of clicks (35 or 100/s), orchestral music, and running speech. This repertoire of stimuli produced a wide range of response waveshapes and thus provided a challenging database for developing and testing the detection and quantification approach of the present study. Three to six stimuli were included in each imaging session, giving a database with a total of 177 cases for which we constructed activation maps and response estimates. Stimuli were always presented for 30 s "on" epochs alternated with 30 s "off" epochs, during which no auditory stimulus was presented. The on/off "period" corresponding to a given stimulus was repeated 4 – 13 times during an imaging session. The actual functional imaging was organized into individual "runs" consisting of 4 – 5 on/off periods. Typically, the different stimuli in a session were presented once each run, and their order was varied across runs. However, in 13 sessions the same stimulus was used repeatedly throughout a given run, as was also the case for all cases of the music stimulus. Stimuli were delivered binaurally through a headphone assembly that provided approximately 30 dB of attenuation at the primary frequency of the scanner noise (Ravicz and Melcher 2001). In 26 of the sessions, stimulus levels were set to approximately 55 dB above threshold (SL); in the remaining 13 sessions stimulus level was varied between 35 and 75 dB SL.

Imaging was performed on five different systems: 1.5 and 3.0 T General Electric magnets retrofitted for high-speed, echo-planar imaging (by Advanced NMR Systems, Inc.), a 1.5 T General Electric Signa Horizon magnet, and 1.5 and 3.0 T Siemens Sonata and Allegra magnets. For functional imaging, the selected slice intersected the inferior colliculus and the posterior aspect of Heschl's gyrus and the superior temporal gyrus (which include auditory cortical areas). A single slice, rather than multiple slices, was imaged to reduce the impact of scanner-generated acoustic noise on auditory activation without sacrificing temporal resolution. Slice thickness was always 7 mm with an in-plane resolution of 3.1 x 3.1 mm. Functional images of the selected slice were acquired using a blood oxygenation level dependent (BOLD) sequence. For the 1.5 T experiments, the sequence parameters were: asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°. For the 3.0 T experiments, the parameters were: gradient echo, TE = 30 ms (except one session used 40 ms and another used 50 ms), flip = 60° or 90°. A T1-weighted anatomical image (in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm) of the functionally imaged slice was also obtained and used to localize auditory cortex.

The present study focuses on responses from auditory cortex, although the data come from studies that also examined the inferior colliculus. Therefore, functional images were generally collected using a cardiac gating method that increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, resulting in an average interimage interval (TR) of approximately 2 s.
Image signal was corrected to account for the image-to-image variations in signal strength (i.e., T1 effects) that result from fluctuations in heart rate (Guimaraes et al. 1998). In the 3 sessions that did not use cardiac gating, the TR was 2 s.

Prior to response estimation using the general linear model, two additional image preprocessing steps were performed. First, the images for each scanning run were corrected for any in-plane movements of the head that may have occurred over the course of the imaging session using standard software (SPM95 software package; without spin history correction; Friston et al. 1995a; Friston et al. 1996). Then, because cardiac gating results in irregular temporal sampling, the time-series for each imaging "run" and voxel was linearly interpolated to a consistent 2 s interval between images, using recorded interimage intervals to reconstruct where each image occurred in time. While it would be relatively straightforward to create basis functions sampled at the actual times at which images were acquired, there is no straightforward way to rigorously model and estimate the noise covariance given irregular temporal sampling. Therefore, we interpolated the time-series for each voxel and treated the data as if it were originally sampled with a regular (2 s) interval.

Independent of the GLM implementation, we computed empirical response time courses by averaging across repeated presentations of a given stimulus in a given imaging session. Specifically, the time-series for each imaging run and voxel was corrected for linear or quadratic drifts in signal, normalized to an (arbitrary) mean intensity, and temporally smoothed using a three-point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). (Technically, the drift correction and normalization were performed prior to interpolating the time-series to a consistent 2 s inter-image interval.) A response "block" was defined as a 70 s window (35 images) that included 10 s prior to stimulus onset, the 30 s coinciding with the stimulus "on" period, and the 30 s "off" period following the stimulus. These response blocks were averaged according to stimulus to give an average signal vs. time waveform for a given stimulus in a session. For each stimulus and session, we further averaged signal vs. time across "active" voxels (i.e., all voxels in auditory cortex with p < 0.001 according to the OSORU GLM). The resulting "grand-average" waveforms were then converted to percent change in signal relative to baseline. The baseline was defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the stimulus.
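As a rough illustration of these preprocessing steps for a single voxel and run, the sketch below resamples an irregularly sampled time series onto a regular 2 s grid, removes a quadratic drift, applies the three-point smoother, and averages 70 s response blocks into a percent-signal-change waveform. It is a simplified stand-in for the actual analysis (which, for example, performed drift correction before interpolation and used SPM95 for motion correction); the function and variable names are illustrative.

```python
import numpy as np

def preprocess_run(acq_times, signal, tr=2.0):
    """Interpolate to a regular grid, remove a quadratic drift, and smooth."""
    t_reg = np.arange(acq_times[0], acq_times[-1], tr)
    y = np.interp(t_reg, acq_times, signal)                  # regularize the sampling
    drift = np.polyval(np.polyfit(t_reg, y, 2), t_reg)       # quadratic drift estimate
    y = y - drift + y.mean()                                  # remove drift, keep mean level
    y = np.convolve(y, np.array([0.25, 0.5, 0.25]), mode="same")  # 3-point zero-phase smoothing
    return t_reg, y

def average_blocks(t_reg, y, onset_times, tr=2.0, pre=10.0, post=60.0):
    """Average 70 s response blocks (10 s pre, 30 s on, 30 s off) and convert to
    percent change relative to the mean signal from 6 s before to stimulus onset."""
    n = int((pre + post) / tr)                                # 35 images per block
    blocks = []
    for t0 in onset_times:
        i0 = int(round((t0 - pre - t_reg[0]) / tr))
        if 0 <= i0 and i0 + n <= y.size:
            blocks.append(y[i0:i0 + n])
    mean_block = np.mean(blocks, axis=0)
    t_block = np.arange(-pre, post, tr)
    baseline = mean_block[(t_block >= -6.0) & (t_block < 0.0)].mean()
    return t_block, 100.0 * (mean_block - baseline) / baseline
```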
This component represents the convolution (normalized to unit amplitude) of the onset component with a boxcar waveform.² The beginning of the sustained component was designed to overlap the onset component, so that the onset component would primarily reflect a transient response that was above and beyond any sustained response; 3) A slowly changing (approximately linear) response during the stimulus, included because responses in auditory cortex sometimes exhibit a "signal recovery" following the signal decline that characterizes the onset component. This "ramp" component represents the convolution of the onset component with a ramp that increases linearly from t = 8 to 30 s; 4) A transient "offset" response to stimulus termination, which is a time-shifted version of the onset component. Offset responses occur frequently in our auditory database. Yet, the possibility of offset responses has generally not been factored into fMRI response detection; 5) An additional transient component (with a simple Gaussian waveshape) following the offset component was included primarily to help model responses with a post-stimulus undershoot. Like the other basis functions, this "undershoot" function was defined as a positive deviation from baseline (Figure 3-1). It was therefore expected to have a negative amplitude for responses with a clear undershoot (i.e., a negative deviation from baseline). Altogether, these five components define the OSORU basis set (Onset, Sustained, Offset, Ramp, Undershoot). The five components of the OSORU set form a concise, yet flexible, basis set that is capable of naturally modeling the prominent features of a variety of response waveshapes.

² Technically, the amplitude of the boxcar waveform for the first sample (t = 0 s) was twice that of the remainder of the 30 s "on" period, in order to impart a more rapid signal increase to the sustained component.

[Figure 3-1 here: the five OSORU basis functions (Onset, Sustained, Offset, Ramp, Undershoot) plotted as amplitude (0 – 1.0) vs. time (0 – 60 s), with the stimulus "on" period shaded.]

Figure 3-1: The five physiologically-motivated functions of the OSORU basis set. The shaded area indicates the period of sound stimulation.
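For readers who want to experiment with a basis set of this general form, the following is a minimal sketch in Python/NumPy. The gamma-like onset shape, the 2 s sampling grid, and the undershoot timing and width are illustrative assumptions chosen to approximate the descriptions above; they are not the exact functions used in this study.

```python
import numpy as np

TR = 2.0                               # image spacing (s) after interpolation to a regular grid
t = np.arange(0, 70 + TR, TR)          # 70 s response-block window
n = len(t)

# Illustrative gamma-like onset shape: peaks near 6 s, back near baseline by ~14-16 s
onset = t ** 3 * np.exp(-t / 2.0)
onset /= onset.max()

# Sustained: onset convolved with a boxcar spanning the 30 s "on" period
# (first sample doubled, cf. footnote 2), normalized to unit amplitude
boxcar = (t <= 30).astype(float)
boxcar[0] = 2.0
sustained = np.convolve(onset, boxcar)[:n]
sustained /= sustained.max()

# Ramp: onset convolved with a ramp rising linearly from t = 8 s to 30 s
ramp_in = np.clip((t - 8.0) / 22.0, 0.0, None) * (t <= 30)
ramp = np.convolve(onset, ramp_in)[:n]
ramp /= ramp.max()

# Offset: a copy of the onset shape delayed by the 30 s stimulus duration
shift = int(round(30.0 / TR))
offset = np.zeros(n)
offset[shift:] = onset[: n - shift]

# Undershoot: a simple Gaussian following the offset component (timing/width assumed)
undershoot = np.exp(-0.5 * ((t - 44.0) / 4.0) ** 2)

# Columns in OSORU order: Onset, Sustained, Offset, Ramp, Undershoot
X_osoru = np.column_stack([onset, sustained, offset, ramp, undershoot])
```

Convolving the same onset shape with the boxcar and the ramp reproduces, in a simple way, the intended overlap between the sustained (and ramp) components and the onset component.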
Sustained-only basis "set" – For one comparison to the OSORU basis set, we used a basis "set" that consisted of just a single component, namely, the sustained component of the OSORU basis set. We included this comparison since methods assuming a sustained response (whether a raw "boxcar" waveform, e.g., a t-test, or a "hemodynamically smoothed" version thereof) have historically played a prominent role in the analysis of fMRI data acquired in an "epoch-related" paradigm. Note that computing activation maps from the sustained-only basis set is not the same as computing maps from just the magnitude of the sustained component in the OSORU basis set. This is because the basis functions in the OSORU set are correlated (correlation coefficients: –0.5 to 0.6). Consequently, the estimated amplitudes for a given component are dependent on precisely which other components are included in the complete model (Draper and Smith 1981; Andrade et al. 1999).

Sinusoidal basis set – For the second comparison to the OSORU basis set, we examined a sinusoidal basis set that consisted of the 1st through 4th harmonics of a truncated Fourier series (including both sine and cosine terms). The 1st harmonics had a fundamental frequency defined by the "on/off" stimulation period (i.e., 1/60 Hz). Previous studies have used just the 1st-3rd harmonics, under the assumption that these harmonics were sufficient for modeling the response space (Bullmore et al. 1996; Ardekani et al. 1999). Here, we added a higher-order harmonic because preliminary work indicated it was needed to ensure optimal detection of phasic responses. In particular, we simulated an artificial data set consisting of a phasic response in Gaussian white noise, and examined the performance of sinusoidal basis sets in which the maximum harmonic varied from one to eight (including both the sine and cosine terms for all harmonics up to and including the maximum). Receiver-operator-characteristic curves (Constable et al. 1995; Skudlarski et al. 1999) showed that the basis set with the 1st-4th harmonics was most powerful in detecting the phasic response. The important contribution of the 4th harmonics is not surprising given that these harmonics together account for about 25% of the variance in a signal with a phasic waveshape, second only to the 2nd harmonics, which together account for 45% of the signal variance. (Because sinusoids are orthogonal, the "explained signal variance" can be uniquely partitioned among the individual components.)

"Trend" functions – In implementing the GLM, for each of the three basis sets (i.e., OSORU, sustained-only, sinusoidal), we included three additional functions in the design matrix for each imaging run to handle low-frequency "trends" in the signal vs. time. These included the standard column vector of ones (for estimating the mean over the run), and both a linear and quadratic vector for estimating any signal drift during the course of the run. Although these three vectors had non-zero amplitudes, their estimated amplitudes were ignored in the creation of the activation maps. Note that we assumed response waveshape and magnitude were constant across repeated presentations of a given stimulus, so none of the implementations of the GLM incorporated a progressive "habituation" in the amplitude of the basis functions across stimulus repetitions.

Response and noise estimation under the general linear model

The GLM was implemented separately for each of the three basis sets: OSORU, sustained-only, and sinusoidal.³ It is well documented that the noise in typical fMRI time-series is autocorrelated (i.e., is not white; Bullmore et al. 1996; Locascio et al. 1997; Zarahn et al. 1997; Purdon and Weisskoff 1998), typically resulting in an inflation of the false-positive rate above its theoretically predicted value. Therefore, in implementing the GLM, an estimate of the noise autocorrelation was used to pre-whiten the data.⁴

Under the general linear model (e.g., Friston et al. 1995c; Burock and Dale 2000), the time-series y for a single voxel is modeled as a weighted linear sum of basis functions plus a noise term: y = Xβ + η, where the columns of the design matrix X contain the basis functions used to model the response space, β is a vector of basis function amplitudes to be estimated, and η is a noise sequence with arbitrary covariance matrix Ω. If the covariance Ω is known, the generalized least squares (GLS) estimator (which is also the maximum likelihood estimator if the noise is multivariate Gaussian) of β is β̂_GLS = (Xᵀ Ω⁻¹ X)⁻¹ Xᵀ Ω⁻¹ y. This estimator is the optimally efficient estimator, meaning that it has the smallest possible variance, among the class of linear, unbiased estimators of β.

³ For the present study, the GLM was implemented by programs originally developed by Doug Greve and colleagues (at the Martinos Imaging Center), but which we modified for our purposes. It should be possible to incorporate the OSORU and sinusoidal basis sets within any package that implements the GLM (the "sustained-only" set or an equivalent is a standard part of GLM packages). Note that some packages (e.g., Statistical Parametric Mapping) automatically orthonormalize their basis sets, which would have to be disabled in order to preserve the physiological meaning of the OSORU functions.

⁴ An alternative would be to "pre-color" the data by replacing the unknown endogenous autocorrelation with an exogenous autocorrelation of known form by applying a smoothing filter. This approach is theoretically less efficient than pre-whitening, but also less prone to invalid statistical inference if the noise is incorrectly estimated (Friston et al. 2000; Bullmore et al. 2001).
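To make the ordinary and generalized least squares estimators concrete, here is a minimal, self-contained sketch in Python/NumPy; the design matrix, the AR(1) noise covariance, and the amplitude values are toy assumptions used only for illustration.

```python
import numpy as np

def ols_fit(X, y):
    # beta_OLS = (X' X)^-1 X' y
    return np.linalg.solve(X.T @ X, X.T @ y)

def gls_fit(X, y, Omega):
    # beta_GLS = (X' Omega^-1 X)^-1 X' Omega^-1 y
    Oi = np.linalg.inv(Omega)
    return np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)

# Toy example: one 70 s response block (35 images at TR = 2 s), a single "response"
# column plus mean/linear/quadratic trend columns, and AR(1)-like noise.
rng = np.random.default_rng(0)
n = 35
tt = np.arange(n)
response = ((tt >= 5) & (tt < 20)).astype(float)          # stand-in for a basis function
X = np.column_stack([response, np.ones(n), tt, tt ** 2])

rho = 0.3                                                 # assumed lag-1 correlation
Omega = rho ** np.abs(np.subtract.outer(tt, tt))          # AR(1) covariance (unit variance)
y = X @ np.array([1.5, 100.0, 0.05, 0.0]) + np.linalg.cholesky(Omega) @ rng.standard_normal(n)

print(ols_fit(X, y))          # unbiased, but not minimum variance under correlated noise
print(gls_fit(X, y, Omega))   # optimally efficient when Omega is known
```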
In our situation, the noise covariance Ω is unknown, but can be estimated (described below). Given such an estimate Ω̂, the feasible generalized least squares (FGLS) estimator is obtained by simply replacing Ω with Ω̂, i.e., β̂_FGLS = (Xᵀ Ω̂⁻¹ X)⁻¹ Xᵀ Ω̂⁻¹ y. Under fairly general conditions, β̂_FGLS will have desirable asymptotic properties. Generally, if Ω̂ is a consistent estimator of the noise, then β̂_FGLS will be an asymptotically efficient estimator of β (Fomby et al. 1984). Depending on the mismatch between the estimated and actual autocorrelations, β̂_FGLS will have nearly minimum variance.

To estimate the noise covariance we adopt an iterative approach (Bullmore et al. 1996; Burock and Dale 2000) in which we first compute the ordinary least squares (OLS) estimate of β, β̂_OLS = (Xᵀ X)⁻¹ Xᵀ y. The underlying noise structure is then estimated from the residual error given by e = y − X β̂_OLS. In its most general form, Ω̂ has more free parameters than the number of observations available in the residual error vector e. Therefore, to make the estimation of Ω tractable, it is necessary to make some assumptions about the covariance structure of the noise. In particular, we assume: 1) The noise is temporally stationary within an imaging run, meaning that a given diagonal in Ω has a constant value. 2) The noise structure does not vary over a given imaging session, so a given Ω applies to all imaging runs in a session. 3) Diagonals beyond d_max in Ω are zero, meaning that any noise correlation effectively decays to zero for lags greater than a certain value. To ensure that potential long-term correlations are captured, we use a generous maximum lag of 30 s, yielding d_max = 15. 4) The noise is assumed to be locally constant across auditory cortex, within a scale factor – i.e., we assume Ω̂ = σ̂² Λ̂, where Λ̂ is the "normalized" covariance matrix (with ones on the diagonal) that is constant across voxels, and σ̂² is the scalar variance that is estimated separately for each voxel. Since the noise covariance can vary across the brain (Bullmore et al. 1996; Friston et al. 2000), we limited the noise estimation to voxels from auditory cortex, in order to avoid using voxels that were altogether unrelated to the structure of interest. Auditory cortex was defined (from anatomical images) as all cortex lateral to the medial-most aspect of the Sylvian fissure, between the superior temporal sulcus and the inferior edge of the parietal lobe.
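The following sketch (Python/SciPy) shows one way assumptions 1)–3) could be expressed as a banded, Toeplitz normalized covariance matrix; acf_hat (the empirical lag-1 through lag-15 autocorrelation), n_images, and the per-voxel variance are placeholders for quantities that are estimated in the paragraphs that follow.

```python
import numpy as np
from scipy.linalg import toeplitz

def banded_normalized_cov(acf_hat, n_images, d_max=15):
    """Lambda_hat: stationary (Toeplitz), ones on the diagonal, zero beyond lag d_max
    (assumptions 1-3).  acf_hat[k-1] is the estimated correlation at lag k."""
    r = np.zeros(n_images)
    r[0] = 1.0
    r[1 : d_max + 1] = acf_hat[:d_max]
    return toeplitz(r)

# Assumption 4: Omega_hat = sigma2 * Lambda_hat, with Lambda_hat shared across
# auditory-cortex voxels and sigma2 estimated separately for each voxel, e.g.:
# Lambda_hat = banded_normalized_cov(acf_hat, n_images=35)
# Omega_hat  = sigma2_voxel * Lambda_hat
```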
Our approach for estimating the noise differs from some previous papers in that we use the estimated autocorrelation out to a generous lag (30 s; i.e., 15 samples) to directly form the estimate of the normalized covariance matrix Λ̂, rather than assuming a potentially limiting parametric form for the autocorrelation, such as an AR(1) or AR(3) model (Bullmore et al. 1996; Bullmore et al. 2001), AR(1) plus white noise (Burock and Dale 2000; Purdon et al. 2001), or a 1/f model (Zarahn et al. 1997). Theoretically, an AR(p) model can exactly match any given autocorrelation function out to lag p (e.g., when the AR(p) parameters are estimated using the "Yule-Walker" equations; Percival and Walden, 1993). Consequently, given a sufficiently rapid decay in the autocorrelation (which is supported by Figure 3-2), our approach is essentially equivalent to the AR(16) approach that Friston et al. (2000) used as their "gold standard" for estimating the noise autocorrelation.

The actual empirical noise autocorrelation functions (ACFs), estimated under the GLM using the OSORU basis set, are plotted in Figure 3-2. There was a clear trend for greater autocorrelation in the sessions conducted with a 3 T magnet, compared to those at 1.5 T. Furthermore, for extended lags there was a small, but consistent, bias toward negative autocorrelations, which was again more pronounced for the sessions conducted at 3 T. This negative autocorrelation could reflect genuine long-term correlations in the noise process, although it seems more likely that it reflects a small bias in our procedure for estimating the noise. The ACFs estimated under the sinusoidal and sustained-only basis sets were very similar to the ACFs under the OSORU set – the largest absolute difference in the ACF functions (across all sessions and lag values) was 0.04 between the OSORU and sinusoidal basis sets, and 0.07 between the OSORU and sustained-only sets.

[Figure 3-2 here: estimated noise autocorrelation vs. lag (samples) for individual sessions at 1.5 T and 3.0 T, with an inset showing the average ACF across sessions at each field strength.]

Figure 3-2: Estimated autocorrelation functions (ACFs) of the noise in auditory cortex. The main panel plots the ACFs from lags 1 – 16 for each session (x's for sessions using a 1.5 T magnet, n = 27; o's for 3.0 T, n = 12). The autocorrelation for lags 16 and greater was assumed to be zero. The inset plots the ACFs averaged across sessions, according to field strength.

Examination of residuals

To investigate the validity of our assumptions about the structure of the noise covariance, we examined the residuals from the GLM. In particular, we computed the Ljung-Box diagnostic (Ljung and Box 1978), which tests whether the residuals are consistent with white noise. The Ljung-Box diagnostic, Q_K, is given by

Q_K = N(N + 2) Σ_{k=1}^{K} [ rₖ² / (N − k) ],

where rₖ is the autocorrelation function computed from the "whitened" residuals e* = Λ̂^(−1/2) (y − X β̂_FGLS), and N is the length of the residual vector. Under the null hypothesis that Λ̂ is estimated correctly, the whitened residuals are serially independent, and Q_K is distributed as χ² with K degrees of freedom.⁵ We chose K = 10, and computed Q₁₀ on a voxel-by-voxel, and run-by-run basis.

⁵ In effect, less than 1 degree of freedom was used in estimating the noise at a given voxel, since the 15 parameters required for Λ̂ were estimated using all the voxels in auditory cortex. Therefore, we used K as the degrees of freedom for the Ljung-Box test.
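A minimal sketch of this diagnostic (Python/SciPy) is given below; the whitening step is shown only as a comment because Λ̂^(−1/2) and the FGLS fit are assumed to have been computed elsewhere.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, K=10):
    """Q_K = N(N + 2) * sum_{k=1..K} r_k^2 / (N - k), with a chi^2_K p-value."""
    e = resid - resid.mean()
    N = len(e)
    denom = np.dot(e, e)
    r = np.array([np.dot(e[: N - k], e[k:]) / denom for k in range(1, K + 1)])
    Q = N * (N + 2) * np.sum(r ** 2 / (N - np.arange(1, K + 1)))
    return Q, chi2.sf(Q, df=K)

# e_star = Lambda_inv_sqrt @ (y_run - X_run @ beta_fgls)   # whitened residuals, assumed
# Q10, p = ljung_box(e_star, K=10)                         # computed elsewhere
# A voxel/run counts toward PercentLB when p < 0.01.
```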
To summarize the results, we computed, for each run, the percentage of voxels in auditory cortex that failed the Ljung-Box test at a p-value of 0.01, and then averaged these percentages across runs to form a single summary measure for each experimental session (PercentLB). Larger percentages indicate that the residuals of voxels in auditory cortex were more frequently inconsistent with the null hypothesis of white noise, indicating that the nominal p-values computed from β̂_FGLS are less likely to reflect the true false-positive rate. For comparison, we also computed PercentLB based on the ordinary least squares residuals used originally to estimate the noise covariance.

Estimating and accounting for the noise covariance resulted in a dramatic decrease in the number of voxels whose residuals failed to exhibit serial independence. Prior to noise estimation (i.e., using the OLS residuals), PercentLB, computed under the OSORU basis set, averaged 40% for the sessions conducted with a 1.5 T magnet (n = 27) and 76% for the sessions with a 3 T magnet (n = 12). After noise estimation, PercentLB was reduced to 6% and 19%, respectively. Under the sustained-only and sinusoidal basis sets, PercentLB after noise estimation was within a percentage point of the values obtained for the OSORU basis set. The fact that PercentLB remained somewhat elevated for the 3 T sessions indicates that our assumptions regarding the structure of the noise were violated more frequently at the higher field strength (e.g., Bullmore et al. 2001). At 1.5 T, the autocorrelations were generally lower, so a "floor effect" may mask some of the noise variations across space (i.e., voxels) and time (e.g., across imaging runs), whereas the higher autocorrelations seen at 3 T allow for greater spatial and temporal heterogeneity. While PercentLB indicates that statistical inference will not be fully accurate in all voxels, we proceed to use all the voxels and their associated p-values (under the FGLS approach), given the dramatic improvement in accuracy in a large percentage of voxels.

Practical implementation

In actually computing β̂_FGLS and Λ̂ we utilized the fact that the data were acquired in separate imaging runs, in order to avoid working with unnecessarily large data matrices. Specifically, we computed

β̂_OLS = ( Σ_{r=1}^{Nr} Xᵣᵀ Xᵣ )⁻¹ ( Σ_{r=1}^{Nr} Xᵣᵀ yᵣ ),

where Xᵣ is the design matrix for run r, yᵣ is the fMRI time-series from run r (pre-processed as specified above), and Nr is the number of runs. Using the residual errors for each run, eᵣ = yᵣ − Xᵣ β̂_OLS, we estimated the noise autocorrelation (up to lag k = 15) as ρₖ = δₖ / δ₀, where

δₖ = ( Σ_runs Σ_AC voxels Σ_{t=1}^{T−k} e_{r,t} e_{r,t+k} ) / N_total.

N_total gives the total number of terms in the above triple summation, T is the number of images in a run, and e_{r,t} is the tth OLS residual from run r. Given ρₖ, we constructed Λ̂ = Toeplitz(ρₖ), where the Toeplitz operator returns a matrix that is symmetrical about the leading diagonal. Finally, we estimated

β̂_FGLS = ( Σ_{r=1}^{Nr} Xᵣᵀ Λ̂⁻¹ Xᵣ )⁻¹ ( Σ_{r=1}^{Nr} Xᵣᵀ Λ̂⁻¹ yᵣ ).
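A minimal sketch of this run-wise accumulation (Python/NumPy), for a single voxel; the actual analysis additionally pools the residual lag products across all auditory-cortex voxels, and the function names and the assumption that all runs have the same number of images are illustrative.

```python
import numpy as np

def fit_runwise(X_runs, Y_runs, Lambda_inv=None):
    """Accumulate the normal equations run by run.  With Lambda_inv=None this gives
    beta_OLS; with Lambda_inv = inverse of Lambda_hat it gives beta_FGLS.  Assumes
    every run has the same number of images, so one Lambda_hat applies to all runs."""
    p = X_runs[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, y in zip(X_runs, Y_runs):
        W = np.eye(X.shape[0]) if Lambda_inv is None else Lambda_inv
        A += X.T @ W @ X
        b += X.T @ W @ y
    return np.linalg.solve(A, b)

def pooled_acf(X_runs, Y_runs, beta_ols, max_lag=15):
    """rho_k = delta_k / delta_0, pooling residual lag-k products across runs (the
    actual analysis also pools across auditory-cortex voxels)."""
    delta = np.zeros(max_lag + 1)
    n_terms = np.zeros(max_lag + 1)
    for X, y in zip(X_runs, Y_runs):
        e = y - X @ beta_ols
        T = len(e)
        for k in range(max_lag + 1):
            delta[k] += np.dot(e[: T - k], e[k:])
            n_terms[k] += T - k
    delta /= n_terms                   # average lag-k product, delta_k
    return delta / delta[0]            # normalized so rho_0 = 1

# beta_ols = fit_runwise(X_runs, Y_runs)
# rho      = pooled_acf(X_runs, Y_runs, beta_ols, max_lag=15)
# Lambda_hat could then be built as in the earlier Toeplitz sketch, and
# beta_fgls = fit_runwise(X_runs, Y_runs, np.linalg.inv(Lambda_hat))
```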
The scalar variance at each voxel was estimated from the whitened residual error e*ᵣ = Λ̂^(−1/2) (yᵣ − Xᵣ β̂_FGLS) as

σ̂² = ( Σ_{r=1}^{Nr} e*ᵣᵀ e*ᵣ ) / N_dof,

where N_dof is the total number of time points (for a given voxel) summed across all runs minus the total number of estimated parameters (i.e., the number of columns in the design matrix X).

Activation map formation

For each stimulus in a given imaging session, we created an "omnibus" map that tested against the null hypothesis that none of the estimated basis function amplitudes (for a given basis set) were significantly different from zero. To generate these activation maps, statistical inference was drawn from the estimated amplitudes β̂_FGLS using the generalized hypothesis test. Specifically, we test against linear hypotheses of the form H₀: Rβ = q, where R is a restriction (or contrast) matrix, and q is a deterministic vector equal to an appropriately sized column vector of zeros for our analyses. Since β̂_FGLS is asymptotically distributed as a multivariate Gaussian if the noise autocorrelation is estimated consistently, an F-statistic can be used for generalized inference (Fomby et al. 1984). Specifically,

F[J, N_dof] = (R β̂_FGLS − q)ᵀ [ σ̂² R (Xᵀ Λ̂⁻¹ X)⁻¹ Rᵀ ]⁻¹ (R β̂_FGLS − q) / J,

where J is the number of rows in R and N_dof is the total number of time points (for a given voxel) summed across all runs minus the total number of estimated parameters. For the jth stimulus out of N_stim total stimuli in a session, R = [a₁ a₂ … a_Nstim], where a_{i=j} = I (the identity matrix) and a_{i≠j} = 0.

We compare the relative performance of the OSORU, sustained-only, and sinusoidal basis sets by comparing the "active" voxels at two p-value thresholds (p = 0.001, 0.05 for individual voxels; not corrected for multiple comparisons). For such comparisons to be informative, it is important that the statistical inference be equally valid for the GLM based on each of the three basis sets. We know that the nominal p-values do not strictly match their theoretical values for any of the basis sets. For instance, the sustained-only set will result in biased estimation in cases with phasic responses. Also, the Ljung-Box diagnostic indicated that our assumed structure for the noise was not accurate at all voxels (albeit a small minority). Importantly however, the Ljung-Box diagnostic did have a similar failure rate under all three basis sets (i.e., similar PercentLB), implying that the p-values produced by the three sets are valid to roughly equal degrees. Note that our comparison of activation maps quantifies the ability of the three basis sets to detect a response in the practical situation in which the actual response and the noise are unknown, and consequently a ROC-type analysis (receiver-operator-characteristic) of the true statistical power cannot be conducted. (Although, a family of possible ROC curves could be derived given assumptions about the type II error rate for missed activation; e.g., Bullmore et al. 1996.)
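The omnibus F-statistic could be computed as in the following sketch (Python/SciPy); the argument names are illustrative, and XtLiX_inv stands for the inverse of the accumulated Xᵀ Λ̂⁻¹ X from the FGLS fit.

```python
import numpy as np
from scipy.stats import f as f_dist

def omnibus_F(beta, R, sigma2, XtLiX_inv, n_dof):
    """F[J, Ndof] = (R b)' [ sigma2 * R (X' Lambda^-1 X)^-1 R' ]^-1 (R b) / J,
    testing H0: R beta = q with q = 0."""
    Rb = R @ beta
    cov_Rb = sigma2 * (R @ XtLiX_inv @ R.T)
    J = R.shape[0]
    F = float(Rb @ np.linalg.solve(cov_Rb, Rb)) / J
    return F, f_dist.sf(F, J, n_dof)

# For the j-th of Nstim stimuli, R is a block row of zeros with an identity block in
# position j, so J equals the number of basis functions for that stimulus (five for
# OSORU, one for sustained-only, eight for the sinusoidal set).
```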
Waveshape index

Using the OSORU basis functions, we devised a "waveshape index" capable of broadly distinguishing between sustained and phasic responses. The basic idea was to compare the total amount of "transient activity" (defined as the sum of the onset and offset amplitudes) to the activity at the midpoint of the response. Secondarily, the waveshape index was designed to distinguish responses with transient activity that was primarily limited to either the onset or offset component from responses with onset and offset components that approached each other in magnitude. The exact formulation of the waveshape index was chosen so as to yield a robust measure that stayed within a finite range. Reasonable behavior for the waveshape index was confirmed by examining how well the index qualitatively sorted the waveforms of the present study (see Results). Specifically, the waveshape index was defined as:

waveshape index = (1/2) · (On + Off) / (Mid + max(On, Off)) ∈ [0, 1],    (1)

where On and Off represent the magnitudes of the onset and offset components of the OSORU basis set (as estimated under the GLM), and Mid is defined as the sum of the magnitude of the sustained component plus one-half of the ramp component.⁶ Using this definition, maximum values for the waveshape index (i.e., near 1) can only result if the two transient components are similar in magnitude and are large relative to the midpoint response. Values near one-half can reflect a response consisting of solely an onset or offset response, or alternatively a combination of onset and offset activity in a response also having some midpoint activity. Values near zero reflect a response dominated by the midpoint response (i.e., by the sustained and/or ramp components of the OSORU basis set).

Eq. (1) was utilized in two different ways in the present study. At the level of individual voxels, Eq. (1) was used to create spatial "maps" of the waveshape index for a given stimulus in a session (Figure 3-6). We also calculated a summary waveshape index for each stimulus of each session using the magnitudes of the basis functions averaged across all the active voxels (p < 0.001) in auditory cortex. Despite the non-linearity of the waveshape index calculation, the resulting summary waveshape indices were quite similar to a second possible summary measure, defined as the mean of the waveshape indices computed separately for each of the active voxels (R² = 0.97 in a linear regression comparing the two alternatives). Thus, the summary waveshape index was insensitive to the precise method of summarizing across voxels.

⁶ Technically, On, Off, and Mid were all rectified prior to their use in Eq. (1) (i.e., negative values were converted to zero). Consequently, if On, Off, and Mid were all negative (or zero) prior to rectification, the denominator of Eq. (1) would be zero, and hence the waveshape index would be undefined. In practice, this situation never occurred when the magnitudes of the basis functions were first averaged across active voxels in auditory cortex (i.e., the "summary waveshape index"). Nor did this situation occur for any of the individual voxels active in left auditory cortex for the spatial maps of waveshape index shown in Figure 3-6.
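To make Eq. (1) concrete, a minimal Python sketch of the index as described above is given below, including the rectification noted in footnote 6; the example amplitudes are hypothetical.

```python
def waveshape_index(on, off, sustained, ramp):
    """Eq. (1): 0.5 * (On + Off) / (Mid + max(On, Off)), with Mid = Sustained + Ramp/2.
    On, Off, and Mid are rectified first (footnote 6); returns NaN in the degenerate
    case where all rectified terms are zero."""
    on, off = max(on, 0.0), max(off, 0.0)
    mid = max(sustained + 0.5 * ramp, 0.0)
    denom = mid + max(on, off)
    if denom == 0.0:
        return float("nan")
    return 0.5 * (on + off) / denom

# Behavior matches the text: a predominantly sustained response gives values near 0,
# a response with only an onset (or only an offset) transient gives 0.5, and comparable
# onset and offset transients with little midpoint activity give values near 1.
print(waveshape_index(on=0.1, off=0.1, sustained=1.0, ramp=0.0))   # ~0.09 (sustained)
print(waveshape_index(on=1.0, off=0.0, sustained=0.0, ramp=0.0))   # 0.5  (onset only)
print(waveshape_index(on=1.0, off=1.0, sustained=0.05, ramp=0.0))  # ~0.95 (phasic)
```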
RESULTS

Activation detection: OSORU vs. sustained-only and sinusoidal basis functions

OSORU vs. sustained-only – From the standpoint of activation detection, the OSORU basis set was as effective as the sustained-only set, and was far more effective in many cases. The difference in performance is illustrated qualitatively by the maps in Figure 3-3, which show activation for two stimuli (2/s and 35/s noise burst trains) studied in the same imaging session. The OSORU and sustained-only basis sets detected approximately equal extents of activation for 2/s noise bursts, which produce sustained response waveshapes (Figure 3-3, left). In contrast, for 35/s noise bursts, the OSORU set detected extensive activation while the sustained-only set gave the impression of little activity in auditory cortex (Figure 3-3, right). That this impression from the sustained-only set was erroneous could be seen by inspecting the responses of individual voxels. In response to 35/s noise bursts, voxels throughout auditory cortex showed signal peaks that were clearly time-locked to sound onset and offset. These "phasic" responses to 35/s noise bursts were missed by the sustained-only set, whereas they were detected by the OSORU set, hence the greater extent of detected activation for the OSORU basis set.

[Figure 3-3 here ("Response detection with three different basis sets"): activation maps for 2/s and 35/s noise bursts under the OSORU, sinusoidal, and sustained-only basis sets (p-value color scale from 0.001 to 2×10⁻⁹), with panels covering right (R) and left (L) auditory cortex (Heschl's gyrus, superior temporal gyrus), and a bottom row of "sustained" vs. "phasic" response waveforms plotted as % signal change vs. time (sec).]

Figure 3-3: Top three rows: Activation maps obtained using the OSORU, sinusoidal, and sustained-only basis sets, for two different stimuli that elicit sustained (left) or phasic (right) responses. The OSORU and sinusoidal basis sets perform well, regardless of underlying response waveshape. In contrast, the sustained-only basis set only performs well when responses are sustained. For each stimulus, the three sets of maps were created using the same underlying data. The data for the two stimuli were obtained in the same imaging session. Stimuli were noise burst trains with repetition rates of 2/s (left) or 35/s (right) (burst duration = 25 ms). Each panel is an enlargement of right (R) or left (L) auditory cortex in a near coronal plane. Color activation maps (based on functional images with an in-plane resolution of 3.1 x 3.1 mm) have been interpolated to the resolution of the grayscale anatomic images (1.6 x 1.6 mm). Bottom row: The responses to each stimulus, averaged over the "active" voxels (p < 0.001 in the OSORU maps) in auditory cortex of both hemispheres. Auditory cortex included both Heschl's gyrus and the superior temporal gyrus.

The trends seen qualitatively in Figure 3-3 were also borne out in quantitative comparisons of maps based on the OSORU and sustained-only basis sets. For most imaging sessions and stimuli (i.e., 117 of 177 cases), more voxels were defined as "active" (p < 0.001) using the OSORU basis functions. To quantify the nature of the overlap of activated voxels obtained with the two basis sets, we computed the number of voxels that were designated as active in the OSORU-based map, but not the sustained-only map (N_OSORU-only; i.e., an "exclusive or" operation), or vice-versa (N_sust-only). This number was then divided by the number of voxels designated as active in either map (N_both, an "or" operation, in which voxels active in both maps were counted only once). Across the 117 cases that had more "active" voxels in the OSORU-based map, N_OSORU-only/N_both averaged 0.44, while N_sust-only/N_both averaged only 0.04. Thus, the voxels detected by the sustained-only set were essentially a subset of the voxels detected by the OSORU set. Moreover, they were an appreciably smaller subset.
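A minimal sketch of these overlap fractions (Python/NumPy); the boolean activation maps and variable names are hypothetical placeholders.

```python
import numpy as np

def overlap_fractions(active_a, active_b):
    """active_a, active_b: boolean maps of "active" voxels (e.g., p < 0.001) from two
    basis sets.  Returns (N_a_only/N_both, N_b_only/N_both), where N_both counts voxels
    active in either map only once (the "or" operation described in the text)."""
    a_only = np.logical_and(active_a, ~active_b).sum()
    b_only = np.logical_and(active_b, ~active_a).sum()
    n_both = np.logical_or(active_a, active_b).sum()
    return a_only / n_both, b_only / n_both

# e.g., for one stimulus and session (hypothetical boolean maps):
# f_osoru_only, f_sust_only = overlap_fractions(active_osoru, active_sustained_only)
```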
Further analysis of voxel overlap showed that even for the cases in which the sustained-only basis set detected an equal or greater number of voxels (60 cases, presumably with primarily sustained underlying responses7), the number of such voxels was only marginally greater than those detected by the OSORU set. Specifically, for these cases, Nsust-only/Nboth averaged only 0.16. Thus, any loss of power in detecting sustained responses under the OSORU set (due to the increased number of basis functions) was, in practice, quite small. OSORU vs. sinusoidal – The OSORU basis set was as effective as, or slightly better than, the sinusoidal set in terms of activation detection. Their similarity in performance is illustrated by the activation maps in Figure 3-3. For both stimuli (2/s and 35/s noise burst trains), the extent of activation with the OSORU basis set (Figure 3-3, top row) was comparable to that with the sinusoidal set (second row). Notably, both basis sets revealed widespread activation in auditory cortex for both stimuli, despite the substantial difference in underlying response waveform – 7 The average waveshape index of the 60 cases for which the OSORU basis set detected fewer voxels than the sustained-only set was 0.20. Results 97 sustained in one case (2/s), and phasic in the other (35/s; Figure 3-3, bottom row). Thus, both the OSORU and sinusoidal basis sets have the flexibility to detect the extremes of response waveshape seen in auditory cortex. A further quantitative analysis revealed that more voxels were generally detected with the OSORU basis set than with the sinusoidal set. On average, across all imaging sessions and stimuli (i.e., all 177 cases), 31% of voxels within auditory cortex had p-values less than 0.001 for the OSORU set, compared to 25% using the sinusoidal set. In 164 of the 177 cases, more voxels were defined as active using the OSORU basis set. The fraction of voxels detected only with the OSORU set (NOSORU-only/Nboth) was, on average, 0.26, whereas the fraction detected only with the sinusoidal set (Nsins-only/Nboth) averaged 0.03 (the nomenclature and quantification parallels the comparison of the OSORU and sustained-only basis sets above). In other words, the OSORU basis set detected more “active” voxels in the vast majority of cases, and rarely missed voxels detected with the sinusoidal basis set. For the small minority of cases in which the number of voxels active in the OSORU map was less than or equal to the number in the sinusoidal map (13 of 177), NOSORU-only/Nboth and Nsinsonly/Nboth were similar: 0.13 and 0.19. This result indicates that in a small fraction of cases the two basis sets were complementary in that each detected a comparable fraction of voxels missed by the other. Insensitivity of comparisons to p-value – As a check that our comparison of the OSORU, sustained-only, and sinusoidal basis sets was not unduly sensitive to the specific p-value criterion chosen, we repeated our quantification of activation map overlap using a liberal p-value threshold of 0.05. While more voxels were subsequently designated “active”, the relative performance of the three basis sets, as defined by the percentage overlap of the active voxels, was essentially unaltered. Relative importance of the OSORU basis functions To gain a sense of the importance of the individual basis functions in the OSORU basis set relative to each other and the overall response fit, we conducted the following analysis. 
For each of the 177 cases, F-maps were computed for each of the five basis functions in the OSORU set, based 98 Chapter 3: Physiologically-motivated GLM on the estimated amplitude of each function and its associated variance. Note that this is equivalent to computing t-maps and proceeding to work with both the positive and negative tails of the distribution, since F(1,ν) = t2(ν). Then, for the voxels that were “active” in auditory cortex according to the full OSORU basis set (i.e., p < 0.001), we counted the number of voxels that had p < 0.1 in the map computed for a given, single basis function. (A higher p-value cutoff was used for the individual functions to avoid an overly restrictive portrayal of their contribution to the overall response fit). In the literature of linear regression, this procedure is sometimes called a partial Ftest, and is used in “backward elimination” procedures for determining the “best” set of variables for a regression (Draper and Smith 1981). Conceptually, one can think of the partial F-test as treating a given basis function as if it were the last to enter the regression equation, and then testing the null hypothesis that the full set of basis functions does not (statistically) explain more of the variance than a reduced basis set without the given function.8 Rejection of the null hypothesis means that the given basis function significantly improves the response fit, relative to a reduced basis set without this function. All of the components of the OSORU basis set were important to the response estimation, although with varying degrees of frequency. Of the total of 6731 “active” voxels as determined by the full OSORU set (across all 177 cases), the individual basis functions were significant in their respective F-maps (at p < 0.1) for the following percentages of voxels: 68, 68, 51, 37, and 25%, for the onset, sustained, offset, ramp, and undershoot functions, respectively. If we had implemented a true backward elimination procedure on the 6731 active voxels, in which we eliminated the single basis function with the highest (i.e., least significant) p-value according to the partial F-test (provided this p-value was greater than the 0.1 threshold), the onset function would have been eliminated in 10% of the voxels, the sustained in 12%, the offset in 17%, the ramp in 24%, and the undershoot in 35% of the voxels. (In 3% of the active voxels, all the individual basis functions had p < 0.1, so none would have been eliminated). These results indicate that some basis functions were more 8 This “analysis of variance” interpretation of the partial F-test is strictly true only in a framework with independent, uncorrelated errors (i.e., ordinary least squares). Results 99 useful than others. More importantly however, the analyses indicate that all five basis functions of the OSORU set were important for fitting the responses in a substantial number of voxels. Assessment of correspondence between OSORU components and actual waveforms Figure 3-4 compares the magnitude of three response measures estimated using two methods, one based on the amplitudes estimated for the different OSORU basis functions, and an alternative based on measurements from response waveforms. The left panel plots the amplitude of the onset basis function versus an alternative waveform-based measure of “onset” amplitude, defined as the maximum response from t = 4-10 s minus the mean response between t = 12-14 s. 
A linear regression line relating the data was highly significant (y = 1.12x + 0.23; p < 0.001, R² = 0.87). Similarly, an OSORU-based measure of the response near its midpoint (the amplitude of the sustained basis function plus one-half the amplitude of the ramp basis function) was a good description of the waveform amplitude just after the midpoint of the stimulus "On" epoch (defined as the mean response between t = 18-24 s; y = 0.97x + 0.04; p < 0.001, R² = 0.92; Figure 3-4, middle). Finally, there was good correlation between the amplitude of the offset basis function and a waveform-based measure of "offset" amplitude, defined as the maximum response in the 4 – 8 s following stimulus termination (34 ≤ t ≤ 38 s) minus the response amplitude at stimulus termination (t = 30 s; y = 0.93x + 0.26; p < 0.001, R² = 0.74; Figure 3-4, right). Overall, there was a good correspondence between the two methods of amplitude quantification, which makes sense in view of the qualitatively good match between the fitted and actual responses (Figure 3-5). Our results indicate that the amplitudes of the OSORU basis functions can provide as accurate an assessment of waveshape features as direct measurements from the waveforms themselves.

[Figure 3-4 here ("OSORU-Based vs. Waveform-Based Measures of Amplitude"): three scatter panels (Onset, Midpoint, Offset) plotting OSORU-based vs. waveform-based amplitude (% change, 0 – 4), each with a regression line and a unity line.]

Figure 3-4: Comparison of OSORU-based vs. waveform-based measures of three different response amplitudes: onset, midpoint, and offset. Each '+' represents a value based on all the "active" voxels in auditory cortex (defined as voxels with p < 0.001 in activation maps generated using the OSORU basis set) for each stimulus of each session. For the OSORU-based measures, the amplitudes of the basis functions were averaged across the "active" voxels, and then converted to percent change by dividing by the estimated signal mean (i.e., the average amplitude of the "trend" basis function comprised of a vector of 1's) and multiplying by 100. Values for the waveform-based measures were taken from waveforms also expressed in terms of percent change. The solid line is the linear regression line relating the two measures. The dashed line represents a one-to-one correspondence between the two measures.
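The waveform-based measures could be computed as in the following sketch (Python/NumPy), using the time windows quoted above; the function name and the assumption of a regular 2 s time grid are illustrative.

```python
import numpy as np

def waveform_measures(resp, t):
    """Waveform-based onset, midpoint, and offset amplitudes (% signal change), using
    the time windows given in the text; t = 0 s is stimulus onset."""
    def win(lo, hi):
        return (t >= lo) & (t <= hi)
    onset = resp[win(4, 10)].max() - resp[win(12, 14)].mean()
    midpoint = resp[win(18, 24)].mean()
    offset = resp[win(34, 38)].max() - resp[np.argmin(np.abs(t - 30))]
    return onset, midpoint, offset

# Example with the 70 s response-block grid (10 s pre-stimulus, 30 s on, 30 s off):
# t = np.arange(-10, 60, 2.0)
# onset_amp, mid_amp, offset_amp = waveform_measures(grand_average, t)
```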
Using the OSORU basis functions to probe response physiology

The utility of the OSORU basis set for extracting information concerning the physiology of responding brain areas is illustrated by the following example. Here, we examine the relative amounts of transient and sustained activity in auditory cortex using the waveshape index (defined in Methods), a measure derived directly from the OSORU basis functions. To qualitatively evaluate the effectiveness of this measure in capturing the relative amounts of transient vs. sustained activity, we examined actual response waveforms sorted according to their waveshape index. Figure 3-5 illustrates the sorting, and simultaneously provides a sense of the overall variety of responses in the database. Across the complete data set, the waveshape index did a good job of sorting the responses, as determined by visual inspection. Responses spanned a wide range, extending from highly sustained (indicating activity throughout the stimulus presentation) to highly phasic (indicating transient activity at sound onset and offset). As a rough approximation, responses with waveshape indices less than ~0.25 were primarily sustained, while responses with indices greater than ~0.6 were predominantly phasic. Intermediate waveshape indices (between ~0.25 and ~0.6) indicated a blend of sustained and transient activity.

In a small minority of cases, the measured response differed noticeably from the sum of the OSORU basis functions (dashed lines in Figure 3-5), resulting in an apparent mismatch between response waveform and waveshape index (6-7 waveforms out of 177). In these cases, the response showed an appreciable signal decline, but the waveshape index was less than 0.25, indicating a sustained response (e.g., Figure 3-5, fourth column, first row). The mismatches occurred because the early part of the response was not effectively captured by the onset basis function, either because the signal decline was not as rapid as that of the onset basis function, or because the response peaked slightly later than the onset function. (The latter effect accounts for the mismatch between waveshape index and waveform in the fourth column of the first row in Figure 3-5.) Despite occasional mismatches, the waveshape index was successful in broadly indicating the relative amounts of transient and sustained cortical activity in the overwhelming majority of cases.

Figure 3-5: Response waveforms (solid lines) sorted according to their summary waveshape index. Every sixth waveform of the complete, sorted database (177 waveforms) is displayed. The shaded region indicates the 30 s stimulus "on" period. Each response is an average across "active" voxels in auditory cortex (p < 0.001 in maps constructed from the OSORU basis set) and across all presentations of a given stimulus in a given session. Responses were converted to percent change, and then normalized to a maximum of one. The number of "active" voxels varied from 3 – 78 (mean: 39), and the maximum response (prior to normalization) varied from 0.7 – 3.6% (mean: 1.7%). For an indication of the fit of the OSORU basis set to the responses, the sum of the OSORU basis functions is also plotted (dashed lines). Specifically, this "fitted" response was computed by 1) averaging the estimated amplitudes of a given OSORU basis function across "active" voxels, 2) converting to percent change, 3) summing the OSORU basis functions as weighted by these percent change values, and 4) normalizing the fitted response to a maximum of one.

[Figure 3-5 itself ("Auditory Cortex Responses Sorted by Waveshape Index") occupies a full page: a grid of panels plotting normalized signal vs. time (sec), with summary waveshape indices ranging from 0.03 to 0.85; solid lines show the measured waveforms and dashed lines the sum of the OSORU basis functions.]

[Figure 3-6 here ("Spatial Maps of Waveshape Index (Left Auditory Cortex)"): waveshape-index color maps (scale 0 to 1) for two cases (35/s clicks; continuous noise), with Heschl's gyrus and the superior temporal gyrus labeled.]

Figure 3-6: Two cases that showed clear changes in waveshape with position. The cases correspond to two different subjects and stimuli (top: 35/s clicks; bottom: continuous noise). Each panel shows a color map of waveshape index superimposed on a grayscale anatomical image of the left superior temporal lobe. A waveshape index is indicated for all voxels with p < 0.001 using the OSORU basis set.
Figure 3-6 illustrates how positional variations in the relative amounts of transient and sustained activity can be explored by spatially mapping the waveshape index on a voxel-by-voxel basis within auditory cortex. The two cases in the figure were selected because they show clear changes in waveshape with position. Specifically, there is an increase in waveshape index from Heschl's gyrus medially to superior temporal gyrus laterally. This trend indicates that primary auditory cortex (located on Heschl's gyrus) responded to the sound stimuli in a fairly sustained fashion, whereas more lateral, non-primary areas responded more transiently. This example illustrates one of several ways (see Discussion) that spatial variations in the temporal patterns of activation can be explored using the OSORU basis set. Discussion 105 DISCUSSION Successful response detection with the OSORU basis set The physiologically-motivated, OSORU basis set successfully detected a variety of response waveforms in auditory cortex, as measured against two alternatives – a sustained-only basis “set”, and a sinusoidal basis set that has been previously used for flexible response estimation under the general linear model (Ardekani et al. 1999). The OSORU basis set in many instances dramatically outperformed the sustained-only “set”, since the OSORU set is able to fit a variety of underlying response waveforms. Frequently, the sustained-only basis set failed to identify voxels that had a consistent and repeatable, but “non-sustained”, response to a given stimulus, thus providing a misleading picture of brain activation. But importantly, the ability of the OSORU basis set to detect a wide variety of response waveforms did not come at the cost of poor detection of sustained responses – even for sustained responses the OSORU set detected as many or nearly as many voxels as the sustained-only set. The fact that the OSORU basis set consistently designated more voxels as “active” than the sinusoidal set suggests that the OSORU set had greater statistical power for detecting the responses of the test database. The slight detection superiority of the OSORU basis set is particularly impressive given that several factors in the present study acted to boost the likelihood of response detection under the sinusoidal approach. For instance, we purposely extended the sinusoidal set used previously in other studies to include the 4th harmonics, so as to increase the likelihood of detecting phasic responses. In addition, the actual responses were such that the signal power tended to be concentrated at either the 1st, or the 2nd and 4th harmonics. This concentration of signal power at certain frequencies occurred because (1) responses tended to be either sustained, or biphasic (with onset and offset transients), and (2) the “on” and “off” stimulus epochs were of equal length, thus yielding responses with a certain “symmetry” that increases the power at the harmonics. In contrast, for responses consisting of mainly a single transient (e.g., Figure 3-5, 4th row, 1st column), or in paradigms with unequal duration “on” and “off” epochs, more of the signal power will be dispersed to higher frequencies, and consequently, the OSORU basis set would enjoy an even greater 106 Chapter 3: Physiologically-motivated GLM advantage over the sinusoidal set.9 Overall, in terms of activation detection, the OSORU basis set performed slightly better than the sinusoidal alternative, which itself was optimized to do well in the current context. 
Altogether, the comparison of the OSORU basis set with the sinusoidal and sustained-only sets indicates that the OSORU set had the flexibility to model a variety of response waveforms, and yet was concise enough to maintain good statistical power. A challenging database provided a strong test of the OSORU basis set The auditory cortical responses used to test the OSORU basis set were an important element of the present study. The varied temporal dynamics of these data posed a significant challenge for detection. Despite this challenge, the OSORU basis set performed well. Whether other approaches with the potential for identifying responses with a range of dynamics (e.g., wavelet analysis, fuzzy clustering, or principle component analysis) would also perform well on auditory cortical data is an open question. For the most part, these techniques have been tested on less challenging data, typically sustained responses, either simulated or measured (Sychra et al. 1994; Baumgartner et al. 1998; Brammer 1998; Golay et al. 1998; Chuang et al. 1999; Fadili et al. 2000; von Tscharner and Thulborn 2001). Applying these techniques to a broader range of waveforms, such as those found in auditory cortex, would provide a far stronger test of their detection capabilities. Detecting and mapping response dynamics One advantage of the OSORU basis set is that it can facilitate the detection of as yet unidentified responses with forms that might be logically hypothesized based on already identified 9 Note that these comments apply equally well to Fourier analyses that are applied in the frequency domain. Even though a stimulus paradigm may be completely periodic, this does not imply that a Fourier F-test (e.g., Purdon and Weisskoff 1998; Marchini and Ripley 2000) will detect all periodic responses with equal statistical power. The important criterion is that the frequencies represented in the response be concentrated at the frequencies used for the F-test. Indeed, a Fourier F-test at the (single) frequency representing the paradigm periodicity is mathematically equivalent to the application of the general linear model using just a single sine and cosine term – an approach which has relatively poor power for detecting phasic or transient (yet periodic) responses. Discussion 107 waveforms. For instance, the sharp decline in signal that forms the phasic response onset component may reflect underlying neural adaptation, and the off component likely represents neural responses to the offset of sound (Chap. 2). It is possible that neuronal populations exhibiting these responses are not entirely co-localized, or that onset and offset responses do not occur together for every sound (e.g., Figure 3-5, 4th row, 1st column). A GLM implemented with the OSORU basis set provides a straightforward way to test these ideas, both qualitatively and statistically. Since the GLM provides estimates of the response components complete with variances, statistical inference can be performed, such as examining whether a given component is significantly different from zero on a voxel-by-voxel basis. This statistical rigor is one advantage of using the OSORU basis set for both detection and response quantification, rather than first detecting responses (e.g., using a “nonphysiological model” such as the sinusoidal basis set) and then extracting different measures from response waveforms. 
There is also a certain parsimony, as well as computational efficiency, in obtaining response measures from the same model used to identify responding brain areas. Using the OSORU basis set, it is a straightforward matter to construct a variety of maps that may reveal spatial variations in response physiology. Mapping the relative amounts of transient and sustained activity using a parameter such as the “waveshape index” (as in Figure 3-6) is one such example. Activation maps could also be constructed based on individual OSORU basis functions or linear combinations thereof, e.g., only the sustained component, or the sum (or difference) of the onset and offset components. The correlation between basis functions does not invalidate such an approach, but simply requires that maps constructed from isolated components be interpreted appropriately within the context of the full basis set in which the parameters were originally estimated (Andrade et al. 1999). In the case of non-linear combinations of components, such as the waveshape index, statistical inference is not automatically supported. However, using the estimated covariance of the basis functions and Monte-Carlo simulation techniques it would be possible to estimate, for instance, whether two waveshape indices are statistically different from one another. An alternative to mapping different response components would be to visually examine the responses for individual voxels and then attempt a mental synthesis across voxels to extract any spatial trends. However, in cases with a large number of “active” voxels, this approach becomes unwieldy and the shear volume of data to be synthesized makes it difficult to identify any trends with 108 Chapter 3: Physiologically-motivated GLM position. In contrast, the approach of extracting and mapping particular OSORU components (or combinations of components) provides a consolidated view of brain responses that can readily reveal positional variations in underlying physiology (e.g., Figure 3-6). In searching for responses with unknown temporal dynamics, there is no single, optimal basis set, and using two or more complementary detection approaches will likely be beneficial. GLMs implemented with the OSORU and sinusoidal basis sets form one such complementary pair. Since the OSORU basis set performed better than the sinusoidal set on the database of the present study, it may also perform better in cases where similar responses are produced. However, the slight superiority of the OSORU set derives from tailoring the basis functions to known response features. Therefore, the relative effectiveness of the OSORU and sinusoidal sets might well reverse if the underlying responses differ sufficiently from those represented in the current database. The OSORU basis set, unlike the sinusoidal set, has the advantage that it is easily extended to paradigms that are not strictly periodic (e.g., paradigms in which the durations of the epochs may vary somewhat across stimulus presentations, or may vary between the different stimuli). However, unlike the OSORU basis set, the sinusoidal set offers flexibility for detecting responses with variable temporal delays. Similar “temporal flexibility” could be incorporated into an OSORU-like physiological basis set by allowing variations in the latency of the different functions. 
However, this would require non-linear, iterative estimation techniques, which are computationally intensive, and one would no longer be able to take advantage of the well-developed theoretical framework for linear models that allows estimation of appropriate statistics given correlated noise. Thus, the OSORU and sinusoidal approaches are complimentary in several respects. So, using both, rather than either alone, will increase the likelihood that all viable response waveforms are detected. Previous implementations of the general linear model within a physiological framework Within the context of the general linear model, there appears to be little previous discussion regarding the selection of a set of basis functions that might be advantageous from a physiological perspective. Sobel et al. (2000) confirmed the presence of response habituation over the course of a Discussion 109 prolonged odorous stimulus in olfactory cortex using a reference waveform that modeled habituation. However, they did not attempt to define a broader set of more general basis functions that could simultaneously model multiple forms of response. Giraud et al. (2000) observed transient responses in auditory cortex at the transition between amplitude-modulated and continuous noise. These authors handled the situation with an analysis that included three entirely distinct response models (a sustained response model, a transient response model, and mixed model), rather than synthesizing all the possible responses into a single basis set. In short, there have been only a few previous attempts to implement the GLM within any kind of physiological framework, and none, prior to the present study, has attempted a comprehensive basis set capable of accommodating an extensive range of response forms. Physiologically-based implementations of the GLM: broad applicability to any brain system While the present study focused specifically on the auditory system, it is important to recognize that the GLM implemented with the OSORU basis set is equally applicable to any brain system. There are certainly documented cases of non-sustained responses outside the auditory system (Bandettini et al. 1997; Hoge et al. 1999; Nakai et al. 2000; Sobel et al. 2000). There are also hints that responses ranging from sustained to phasic may occur for other sensory systems. For instance, measurements of time-average signal as a function of increasing repetition rate show a downturn at high rates in the auditory (Chap. 2), somatosensory (Ibanez et al. 1995; Takanashi et al. 2001), and visual systems (Fox and Raichle 1984, 1985; Kwong et al. 1992; Mentis et al. 1997; Thomas and Menon 1998; Zhu et al. 1998). In the auditory system, this decrease occurs because the response changes from sustained to phasic. A similar change in response waveform may explain the decrease in time-average signal in other sensory systems (rather than just a sustained response that decreases in amplitude with increasing rate). Whether this fundamental change in response “mode” indeed occurs in other sensory systems could be tested by detecting and quantifying responses using the OSORU basis set. At a broader level, one can imagine devising alternative physiological basis sets that are tailored to the particular temporal dynamics of a given brain system, or “tuning” the 110 Chapter 3: Physiologically-motivated GLM OSORU set to the particular properties of different brain systems (e.g., by adjusting the timing of the onset and offset basis functions). 
Additional basis functions may also be added, reflecting physiological processes not conceptualized in the current framework. Overall, physiologicallymotivated basis sets should prove useful in detecting and quantifying responses throughout the brain. Acknowledgements The authors gratefully thank John Guinan, Anders Dale, Patrick Purdon, Irina Sigalovsky, Monica Hawley, and Rick Hoge for numerous helpful comments and suggestions; Doug Greve for providing the Matlab code that formed the basis of our GLM implementation; and Barbara Norris, for considerable assistance in figure preparation. Support for this study was provided by NIH/NIDCD P01DC00119, T32DC00038, and a Martinos Scholarship. References 111 REFERENCES Andersen AH, Gash DM and Avison MJ. Principal component analysis of the dynamic response measured by fMRI: A generalized linear systems framework. Magn Reson Imaging 17: 795-815, 1999. Andrade A, Paradis AL, Rouquette S and Poline JB. Ambiguous results in functional neuroimaging data analysis due to covariate correlation. Neuroimage 10: 483-486, 1999. Ardekani BA and Kanno I. Statistical methods for detecting activated regions in functional MRI of the brain. Magn Reson Imaging 16: 1217-1225, 1998. Ardekani BA, Kershaw J, Kashikura K and Kanno I. Activation detection in functional MRI using subspace modeling and maximum likelihood estimation. IEEE Trans Med Imaging 18: 101114, 1999. Bandettini PA, Kwong KK, Davis TL, Tootell RBH, Wong EC, Fox PT, Belliveau JW, Weisskoff RM and Rosen BR. Characterization of cerebral blood oxygenation and flow changes during prolonged brain activation. Hum Brain Mapp 5: 93-109, 1997. Baumgartner R, Windischberger C and Moser E. Quantification in functional magnetic resonance imaging: Fuzzy clustering vs. Correlation analysis. Magn Reson Imaging 16: 115-125, 1998. Brammer MJ. Multidimensional wavelet analysis of functional magnetic resonance images. Hum Brain Mapp 6: 378-382, 1998. Bullmore E, Brammer M, Williams SCR, Rabe-Hesketh S, Janot N, David A, Mellers J, Howard R and Sham P. Statistical methods of estimation and inference for functional MR image analysis. Magn Reson Med 35: 261-277, 1996. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter TA and Brammer M. Colored noise and computional inference in neurophysiological (fMRI) time series analysis: Resampling methods in time and wavelet domains. Hum Brain Mapp 12: 61-78, 2001. Burock MA and Dale AM. Estimation and detection of event-related fMRI signals with temporally correlated noise: A statistically efficient and unbiased approach. Hum Brain Mapp 11: 249-260, 2000. Chuang KH, Chiu MJ, Lin CC and Chen JH. Model-free functional MRI analysis using kohonen clustering neural network and fuzzy c-means. IEEE Trans Med Imaging 18: 1117-1128, 1999. Constable RT, Skudlarski P and Gore JC. An ROC approach for evaluating functional brain MR imaging and postprocessing protocols. Magn Reson Med 34: 57-64, 1995. Draper NR and Smith H. Applied regression analysis. New York: John Wiley & Sons, 1981. Fadili MJ, Ruan S, Bloyet D and Mazoyer B. A multistep unsupervised fuzzy clustering analysis of fMRI time series. Hum Brain Mapp 10: 160-178, 2000. Fomby TB, Hill RC and Johnson SR. Advanced econometric methods. New York: SpringerVerlag, 1984. 112 Chapter 3: Physiologically-motivated GLM Fox PT and Raichle ME. Stimulus rate dependence of regional cerebral blood flow in human striate cortex, demonstrated by positron emission tomography. J Neurophysiol 51: 1109-1120, 1984. Fox PT and Raichle ME. 
Fox PT and Raichle ME. Stimulus rate determines regional brain blood flow in striate cortex. Ann Neurol 17: 303-305, 1985.
Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995a.
Friston KJ, Frith CD, Frackowiak RSJ and Turner R. Characterizing dynamic brain responses with fMRI: A multivariate approach. Neuroimage 2: 166-172, 1995b.
Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Frith CD and Frackowiak RSJ. Statistical parametric maps in functional imaging: A general linear approach. Hum Brain Mapp 2: 189-210, 1995c.
Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in fMRI time-series. Magn Reson Med 35: 346-355, 1996.
Friston KJ, Zarahn E, Holmes AP, Rouquette S and Poline J-B. To smooth or not to smooth? Bias and efficiency in fMRI time-series analysis. Neuroimage 12: 196-208, 2000.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Golay X, Kollias S, Stoll G, Meier D, Valavanis A and Boesiger P. A new correlation-based fuzzy logic clustering algorithm for fMRI. Magn Reson Med 40: 249-260, 1998.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S, Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain Mapp 6: 33-41, 1998.
Hoge RD, Atkinson J, Gill B, Crelier GR, Marrett S and Pike GB. Stimulus-dependent BOLD and perfusion dynamics in human V1. Neuroimage 9: 573-585, 1999.
Ibanez V, Deiber MP, Sadato N, Toro C, Grissom J, Woods RP, Mazziotta JC and Hallett M. Effects of stimulus rate on regional cerebral blood flow after median nerve stimulation. Brain 118: 1339-1351, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci 89: 5675-5679, 1992.
Ljung GM and Box GEP. On a measure of lack of fit in time series models. Biometrika 65: 297-303, 1978.
Locascio JJ, Jennings PJ, Moore CI and Corkin S. Time series analysis in the time domain and resampling methods for studies of functional magnetic resonance brain imaging. Hum Brain Mapp 5: 168-193, 1997.
Marchini JL and Ripley BD. A new statistical approach to detecting significant activation in functional MRI. Neuroimage 12: 366-380, 2000.
Mentis MJ, Alexander GE, Grady CL, Horwitz B, Krasuski J, Pietrini P, Strassburger T, Hampel H, Schapiro MB and Rapoport SI. Frequency variation of a pattern-flash visual stimulus during PET differentially activates brain from striate through frontal cortex. Neuroimage 5: 116-128, 1997.
Miezin FM, Maccotta L, Ollinger JM, Petersen SE and Buckner RL. Characterizing the hemodynamic response: Effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. Neuroimage 11: 735-759, 2000.
Nakai T, Matsuo K, Kato C, Takehara Y, Isoda H, Moriya T, Okada T and Sakahara H. Poststimulus response in hemodynamics observed by functional magnetic resonance imaging--difference between the primary and sensorimotor area and the supplementary motor area. Magn Reson Imaging 18: 1215-1219, 2000.
Paradis AL, Mangin JF, Bloch I, Cornilleau-Peres V, Moulines E, Frouin V and Le Bihan D.
Detection of periodic signals in brain echo-planar functional images. 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Amsterdam, 1996, p. 696697. Percival DB and Walden AT. Spectral analysis for physical applications: Multitaper and conventional univariate techniques. Cambridge: Cambridge University Press, 1993. Purdon PL and Weisskoff RM. Effect of temporal autocorrelation due to physiological noise and stimulus paradigm on voxel-level false-positive rates in fMRI. Hum Brain Mapp 6: 239-249, 1998. Purdon PL, Solo V, Weisskoff RM and Brown EN. Locally regularized spatiotemporal modeling and model comparison for functional MRI. Neuroimage 14: 912-923, 2001. Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and body. J Acoust Soc Am 109: 216-231, 2001. Skudlarski P, Constable RT and Gore JC. ROC analysis of statistical methods used in functional MRI: Individual subjects. Neuroimage 9: 311-329, 1999. Sobel N, Prabhakaran V, Zhao Z, Desmond JE, Glover GH, Sullivan EV and Gabrieli JDE. Time course of odorant-induced activation in the human primary olfactory cortex. J Neurophysiol 83: 537-551, 2000. Sychra JJ, Bandettini PA, Bhattacharya N and Lin Q. Synthetic images by subspace transforms. I. Principal components images and related filters. Med Phys 21: 193-201, 1994. Takanashi M, Abe K, Yanagihara T, Oshiro Y, Watanabe Y, Tanaka H, Hirabuki N, Nakamura H and Fujita N. Effects of stimulus presentation rate on the activity of primary somatosensory cortex: A functional magnetic resonance imaging study in humans. Brain Res Bull 54: 125-129, 2001. Thomas CG and Menon RS. Amplitude response and stimulus presentation frequency response of human primary visual cortex using BOLD EPI at 4 T. Magn Reson Med 40: 203-209, 1998. von Tscharner V and Thulborn KR. Specified-resolution wavelet analysis of activation patterns from BOLD contrast fMRI. IEEE Trans Med Imaging 20: 704-714, 2001. Zarahn E, Aguirre GK and D'Esposito M. Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions. Neuroimage 5: 179-197, 1997. Zhu XH, Kim SG, Andersen P, Ogawa S, Ugurbil K and Chen W. Simultaneous oxygenation and perfusion imaging study of functional activity in primary visual cortex at different visual stimulation frequency: Quantitative correlation between BOLD and CBF changes. Magn Reson Med 40: 703-711, 1998. 114 Chapter 4 The temporal envelope of sound determines the time-pattern of fMRI responses in human auditory cortex INTRODUCTION Spatial and temporal codes are two general schemes for representing various features of sensory stimuli in the brain. Prominent examples of spatial coding include the orderly tonotopic, retinotopic, and somatotopic mappings in cortical and subcortical structures of the auditory, visual, and somatosensory systems. For example, in the auditory system, neurons in the auditory nerve are preferentially tuned to a particular sound frequency due to the filtering properties of the cochlea, and this tuning is maintained in orderly spatial maps of “best frequency” in structures ranging from the cochlear nucleus to auditory cortical areas. In addition to any existing spatial organization, the time course of neural activity encodes information about the temporal aspects of a stimulus, such as its duration, or the timing between successive stimuli. 
An example of temporal coding is the manner in which neural firing synchronizes to the amplitude modulation in an acoustic stimulus, up to certain limiting modulation rates (e.g., Schreiner and Urbas 1988; Phillips et al. 1989; Langner 1992). Much of this knowledge regarding the spatio-temporal encoding of stimulus features comes from 115 116 Chapter 4: Sound envelope determines fMRI time-pattern microelectrode recordings in animals, which may not directly apply to humans in all respects, due to anesthesia and interspecies differences. Thus, it is important to study stimulus coding directly in humans. Functional magnetic resonance imaging (fMRI) is one non-invasive technique for studying the activity patterns of the human brain. The fMRI response reflects hemodynamic changes that arise from changes in neural activity (i.e., neural spiking, and excitatory and inhibitory synaptic activity; Auker et al. 1983; Nudo and Masterton 1986; Jueptner and Weiller 1995; Heeger et al. 2000; Rees et al. 2000; Logothetis et al. 2001). These hemodynamic changes occur on a time-scale of seconds. Thus, the fMRI response essentially reflects the time-envelope of population neural activity in a local region (i.e., voxel) of the brain. Because of the high spatial resolution (~1 mm) of fMRI compared to other neuroimaging techniques, neurophysiological investigations using fMRI have primarily focused on the representation of information in spatial patterns of brain activity. Indeed, fMRI studies have provided direct evidence in humans regarding tonotopic (Talavage et al. 2000), retinotopic (Sereno et al. 1995), and somatotopic mappings (Rao et al. 1995). In contrast, the potential for fMRI to uncover representations in the temporal patterns of activity has generally been ignored. That fMRI can provide information concerning temporal coding is strongly suggested by two recent studies showing dramatic, sound-dependent changes in the waveshape of fMRI responses from human auditory cortex (Chap. 2; Giraud et al. 2000). For instance, a low rate (2/s) noise burst train generates cortical fMRI responses that are primarily “sustained”, with a signal level that stays elevated throughout the train duration. In contrast, noise burst trains at higher rates (35/s) elicit a “phasic” response, characterized by signal peaks after train onset and offset (Chap. 2). While sustained responses are well-known to occur for various sounds (e.g., speech, music), phasic responses are a relatively new discovery, and the types of sounds that elicit them are largely unknown. The wide range of fMRI response waveshapes seen in the previous studies indicates substantial differences in the time-pattern of population neural activity for low vs. high-rate noise burst trains, and hence differences in the neural coding of these sounds. The sustained responses for Introduction 117 low rate trains suggest ongoing neural activity throughout the train. In contrast, the signal decline that forms the initial peak of phasic responses is highly suggestive of strong neural adaptation during the first seconds of a high-rate train, while the peak after train offset suggests strong neural offresponses. The resulting concentration of neural activity at the onset and offset of high, but not low, rate trains is especially interesting in light of the perceptual differences between these sounds. The noise bursts of low rate trains can be discerned individually, whereas those of high-rate trains fuse to form a continuous (but modulated) percept. 
Thus, the peaks in neural activity at sound onset and offset in the high-rate case delineate the endpoints of a sound sequence whose elements are grouped across time so as to form a single auditory object. This qualitative connection between perception and brain activity raises the possibility that auditory objects are generally delineated in the temporal envelope of population neural activity, with neural adaptation and off-responses subserving this delineation. The present study begins to investigate these ideas in human listeners using fMRI response waveshape as an assay of population neural activity in auditory cortex. We specifically explore the relationship between fMRI waveshape and physical sound features, since these features ultimately determine a sound’s perceptual characteristics. Since so little is known about phasic fMRI responses, we began by establishing that this novel response form is actually produced by a variety of sounds. We then proceeded to determine exactly which aspects of sound most strongly influence fMRI response waveshape, a fundamental issue left unresolved by the previous work. For instance, in our previous study, noise burst duration was held constant while rate was varied, so stimulus sound-time fraction (STF) co-varied with rate, raising the possibility that rate was not the primary sound feature coded in the changing fMRI timepatterns. Both our previous study and that of Giraud et al. (2000) considered only broadband stimuli at one stimulus level, leaving open the additional possibility that sound bandwidth or level is also coded via dramatic temporal changes in population neural activity. The present study systematically examined the relationship between specific sound features and the time-envelope of population neural activity seen with fMRI. Two complementary sets of experiments were performed for the present study. The main set explored responses produced by sounds ranging from simple stimuli like tone or noise burst trains to 118 Chapter 4: Sound envelope determines fMRI time-pattern more complex stimuli like speech and music. For these experiments, the acoustic noise during fMRI was handled by essentially trading spatial coverage of widespread auditory cortical areas for the ability to investigate many stimuli per experiment. The approach involved imaging a single slice that targeted posterior auditory cortex, including primary auditory cortex on Heschl's gyrus and the immediately lateral non-primary areas of the superior temporal gyrus. Through these experiments we established that phasic, as well as sustained responses occur for a variety of different stimuli. We further determined that sound temporal envelope characteristics (rate and sound-time fraction), but not level or bandwidth, are strongly coded in the time-pattern of fMRI activation in posterior auditory cortex. The second set of experiments studied two sounds with very different temporal characteristics to test whether the coding of stimulus temporal characteristics in response waveshape applied for other cortical areas besides posterior auditory cortex. In these experiments the fMRI acoustic noise was handled using an approach that favored the coverage of widespread areas over the use of a large number of stimuli. Specifically, the approach involved clustered volume acquisition with a long (8 s) intercluster interval and a temporal sampling scheme that enabled the reliable reconstruction of fMRI response waveshape (e.g., Robson et al. 1998; Belin et al. 1999). 
These experiments showed that phasic, as well as sustained responses occur in widespread cortical areas. Moreover, they indicate that sound temporal envelope characteristics are strongly represented in the time-pattern of fMRI activation throughout auditory cortex. METHODS Twenty-six subjects participated in forty total imaging sessions. Subjects ranged in age from 21 to 38 years (mean ~26 years). Seventeen of the subjects were male and twenty-two were righthanded. Subjects had no known audiological or neurological disorders. The majority of imaging sessions (37) examined response waveshape in posterior auditory cortex (“single-slice experiments”), using a variety of stimuli. The three remaining sessions examined waveshape throughout auditory cortex (“multislice experiments”) using two of these Methods stimuli. 119 Most of the imaging sessions (25) were conducted expressly for the present study. However, some were part of our previous investigation of repetition rate (5 sessions; Chap. 2) or a separate study that examined the dependencies of activation on sound level (10 sessions; Sigalovsky et al. 2001). The data from these sound level experiments were acquired while subjects performed a task that was not included for the other experiments. Therefore, these data are only compared with each other (i.e., in RESULTS: “Insensitivity of waveshape to sound level in posterior auditory cortex”), except to note a possible effect of task on waveshape. All studies were approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear and Infirmary, and Massachusetts General Hospital, and all subjects gave their written informed consent. Stimuli All stimuli were presented binaurally and had a total duration of 30 s. They consisted of trains of broadband noise bursts with various rates and sound-time fractions, trains of narrowband noise bursts, trains of tone bursts, trains of clicks, continuous broadband noise, orchestral music, and speech. Trains of broadband noise bursts – Bursts of uniformly distributed white noise were presented in a 30 s train at repetition rates of 2/s, 10/s, and 35/s. The bursts had a rise and fall time of 2.5 ms. They were usually 25 ms in duration (full width half maximum), resulting in sound-time fractions (STFs) of 5%, 25%, and 88% for the 2/s, 10/s, and 35/s trains, respectively. However, in some sessions, 2/s and/or 35/s trains with other STFs were studied – specifically: 2/s and 35/s trains with an STF of 50% (burst duration for the 2/s train: 250 ms; for the 35/s train: 14.3 ms) and 35/s trains with an STF of 25% (burst duration: 7.1 ms). The repeated bursts within a train were identical (i.e., “frozen”), but they differed across trains (except for two sessions from our previous investigation of repetition rate, in which the noise burst was frozen throughout an entire imaging “run”, but differed across runs). Trains of narrowband bursts – Two types of narrowband stimuli were examined: tone bursts and narrowband (third octave) noise bursts. The bursts were presented in a train at either 2/s or 35/s. 120 Chapter 4: Sound envelope determines fMRI time-pattern Burst center frequency was either 500 Hz or 4 kHz (rise and fall time: 2.5 ms; duration 25 ms). The repeated bursts were identical within a train, but differed across trains (and runs). Continuous noise – The continuous noise was uniformly distributed and white (i.e., uncorrelated) across its entire 30 s duration. 
Thus, there was no repetition in the temporal fine structure of the continuous noise, in contrast with the “frozen” noise burst trains. Trains of clicks – Clicks were presented in a train at rates of either 35/s or 100/s. The duration of the individual clicks was ~100 usec. Running speech – The speech stimulus was created by concatenating “conversational” sentences taken from the Harvard IEEE Corpus (IEEE 1969).1 The same male professional speaker spoke all sentences. The amplitude envelope of the speech was low-pass, with a power spectral density 10 dB down at 5 Hz relative to its peak at 1.3 Hz. Orchestral music – The music stimulus was the first 30 s of the fourth movement in Beethoven Symphony No. 7. The maximum power in the music amplitude envelope occurred at 0.69 Hz, with harmonics at 1.2, 2.5, and 4.9 Hz that were all within 10 dB of the power at 0.69 Hz. Stimulus level Except in sessions that examined the effects of sound level, levels were approximately 55 dB above threshold (SL), determined separately for each ear and stimulus (to within 5 dB) in the scanner room immediately prior to the imaging session. The resulting sound pressure levels ranged from ~60 to 90 dB SPL, with the majority of cases (> 70%) falling in the range of 70 – 85 dB SPL. (SPL was computed based on the root-mean-square of the entire 30 s stimulus, after first filtering by the frequency response of the sound delivery system). During both threshold determination and functional imaging, there was an on-going low-frequency background noise produced primarily by the pump for the liquid helium (see “Sound delivery” below). 1 The recordings were obtained from the Research Laboratory of Electronics, Massachusetts Institute of Technology. Methods 121 In sessions that examined the effects of stimulus level (a total of 12), level was varied over a 30 to 40 dB range. The stimuli that were varied in level were (1) 35/s (88% STF) noise burst trains (studied at 40, 55, 70 dB SL; 2 sessions), (2) orchestral music (30, 50, 60 dB SL; 5 sessions), and (3) continuous noise (35, 45, 55, 65, 75 dB SL; 5 sessions; 3 – 4 levels were studied in any given session2). The latter data (for music and continuous noise) were collected in our separate study of sound level. The absolute sound pressure levels for the level sessions ranged from ~60 to 100 dB SPL. Task Subjects were instructed to listen “attentively” to the stimuli. Subjects were monitored to ensure that they remained alert throughout an experiment (typically via a non-verbal signal from the subject at the end of each imaging run in response to a question from the experimenter). In sessions from our separate study of sound level (using continuous noise and music), subjects performed an additional task. Specifically, at the beginning and end of each 30 s stimulus “on” period, subjects controlled a knob to turn on or off an array of lights (Melcher et al. 2000). Sound delivery Stimuli were delivered binaurally through a headphone assembly that also attenuated scanner-generated sounds. Specifically, stimuli were produced by a D/A board (running under LabView), amplified, and fed to a pair of piezoelectric transducers. For our previous study of sound level using continuous noise and music, the transducers were incorporated directly into sound attenuating earmuffs placed over the subject’s ears (sound delivery system I; custom built by GEC Marconi, Inc.). For all remaining sessions, the transducers were housed in a shielded box adjacent to the scanner (system II). 
In this latter setup, the output of the transducers reached earmuffs placed 2 Specifically, three of these five sessions used levels of 35, 55, 65, and 75 dB SL, one session used 35, 45, 55, and 65 dB SL, and one session used 35, 55, and 75 dB SL. 122 Chapter 4: Sound envelope determines fMRI time-pattern over the subject's ears via air-filled tubes. The frequency response of both systems, measured at the subject’s ears, was low pass with a cutoff frequency of 10 kHz (system I) or 6 kHz (system II). Acoustic stimulation paradigm In each imaging session, responses were measured for between 2 to 6 different stimuli. Stimuli were presented during 30 second “on” periods, alternated with 30 s “off” periods during which no auditory stimulus was presented. Stimulus presentation was organized into individual “runs” composed of 4 – 5 such on/off “blocks”. The different stimuli in an imaging session were typically presented once each run, and their order was varied across runs. However, in the following cases the same stimulus was repeated within an imaging run (but consecutive runs used different stimuli): 1) all repetitions of the orchestral music, 2) our previous sound level experiments using continuous noise or music (i.e., same level and stimulus type was presented throughout each run), 3) two of the sessions from our previous study of repetition rate, and 4) the multislice experiments. For the single-slice experiments, the various stimuli within a session were repeated an equal number of times (7 to 13). An exception was the music stimulus, which was typically repeated just 4 times (in a single imaging run collected at either the beginning or end of the functional imaging).3 For the multislice experiments, there were 32 – 40 repetitions of the high rate (35/s) noise burst train, and 8 repetitions of the music stimulus. The reason for the fewer music repetitions in both the single and multislice experiments is that music generally evokes robust responses, so fewer stimulus repetitions were necessary. Handling scanner acoustic noise The earmuffs of the sound delivery systems attenuated the two main types of scanner acoustic noise. The two types of scanner noise are: (1) an ongoing low-frequency background noise produced primarily by the pump for the liquid helium (used to supercool the magnet coils), and (2) 3 However, for the instances of music from our previous study of sound level, there were always eight repetitions of the music (i.e., eight on/off blocks) at a given level. Methods 123 gradient noise generated by flexing of the gradient coils. Accounting for the attenuation provided by the earmuffs, the pump noise reaches levels of ~60 dB SPL in the frequency range of 50-300 Hz (Ravicz et al. 2000; Ravicz and Melcher 2001). The short duration “beep” of the gradient noise reached peak levels of approximately 85 dB SPL at ~1.0 kHz on the 1.5T scanners and 95 dB SPL at ~1.4 kHz on the 3.0T scanners (both values are again estimates for the SPL at subjects’ ears, after the attenuation provided by earmuffs). The gradient noise was further handled in two complementary ways. For the majority of sessions (“single-slice experiments”), a single slice was imaged in order to reduce the impact of the scanner-generated acoustic noise on auditory activation while still allowing two other goals to be met simultaneously. These goals were to 1) investigate many stimuli per session, and 2) maintain a temporal sampling sufficient to capture the time-pattern of the fMRI response. 
For the three experiments imaging multiple slices (“multislice experiments”), all the slices of the functional volume were acquired within a brief interval (< 1 s) once every 8 s (TR). This “clustered acquisition” of the slices within a volume, in conjunction with a long interval between clusters, reduces the impact of the scanner noise on the responses to sound stimuli (Belin et al. 1999; Edmister et al. 1999; Elliott et al. 1999; Hall et al. 1999). To maintain “temporal sampling” in spite of the long interval between volume acquisitions, the onset of the stimulus relative to the first volume acquisition was staggered by 2 s increments from run to run (e.g., Robson et al. 1998; Belin et al. 1999). Consequently, across the multiple runs for a given stimulus, the functional data in toto included samples acquired every 2 s relative to the stimulus. However, restoring temporal sampling in this manner required an increased number of presentations per stimulus, thus limiting the number of stimuli that could be studied per session to just two (music and 35/s, 88% STF noise burst trains). Several pieces of evidence indicate that scanner gradient noise had little or no effect on response waveshape. For the multislice experiments, the image acquisitions mainly reflect the response to the sound stimulus for two reasons: (1) the long interval between clusters allows the response evoked by the acoustic noise of a given cluster to decay appreciably before the next cluster (Edmister et al. 1999; Hall et al. 2000), and (2) the time-delay of the fMRI response means that the response to a given cluster does not occur until after the cluster has already ended (Talavage et al. 124 Chapter 4: Sound envelope determines fMRI time-pattern 1999; Hall et al. 2000). The fact that the multislice and single-slice experiments of the present study showed no discernable difference in waveshape for either music or 35/s noise bursts (c.f., Figures 4-9 and 4-10 to Figure 4-2) indicates that the gradient noise during the single-slice experiments also had little or no effect on waveshape. Finally, in a previous examination of the effect of TR duration on waveshape (in two subjects), we found that 35/s noise burst trains evoked phasic responses regardless of whether the response was constructed from runs (all single-slice) with a TR of either 2 s or 8 s (Harms and Melcher 1999). Imaging Subjects were imaged using whole-body scanners and standard head coils (either transmit/ receive or receive-only) while resting supine. Head motion was limited using either 1) a bite bar custom-molded to the subject’s teeth and mounted to the head coil, or 2) pillow and foam padding packed snugly around the subject’s head. Each imaging session lasted ~ 2 hours. Single-slice experiments – Imaging was performed on several different scanners, due to both equipment changes beyond the authors’ control and a desire to take advantage of higher field strength magnets when possible. Specifically, imaging was performed on five different systems: a 1.5T or 3.0T General Electric scanner retrofitted for high-speed, echo-planar imaging (by Advanced NMR Systems, Inc.; 1.5T: 10 sessions; 3.0T: 6 sessions), a 1.5T General Electric Signa Horizon scanner (10 sessions), a 1.5T Siemens Sonata scanner (5 sessions) and a 3.0T Siemens Allegra scanner (6 sessions). 
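To make the staggered sampling scheme used in the multislice experiments (described above) concrete, the short sketch below is an illustration only, not the code used in the study; it assumes NumPy, and the run length is an arbitrary placeholder. It shows how pooling runs whose stimulus onsets are offset in 2 s steps yields samples every 2 s relative to the stimulus, even though each run is sampled only once every 8 s.

```python
import numpy as np

TR = 8.0          # s between clustered volume acquisitions
STAGGER = 2.0     # s increment of stimulus onset relative to the first volume, per run
RUN_LEN = 240.0   # s of data per run (illustrative value, not from the text)

n_vols = int(RUN_LEN / TR)
offsets = np.arange(0.0, TR, STAGGER)      # 0, 2, 4, 6 s across four run types

# Acquisition times of each run, expressed relative to stimulus onset.
# Pooling all runs gives samples on a 2 s grid, even though each run samples every 8 s.
times_re_stimulus = np.sort(np.concatenate(
    [np.arange(n_vols) * TR - off for off in offsets]))

print(np.diff(times_re_stimulus)[:8])      # -> uniform 2 s spacing after interleaving
```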
To check whether response waveshape was influenced by the imaging system employed, we computed average responses for each imaging system for the three stimuli that were each studied in at least three sessions on at least four systems: 2/s noise bursts (5% STF), 35/s noise bursts (88% STF), and orchestral music. To focus on response waveshape, we then normalized these average responses to a peak of one, so as to remove possible amplitude differences related to imaging system (Ogawa et al. 1993; Gati et al. 1997; Fujita 2001). There were no obvious differences in waveshape between systems. To the extent that there was any hint of inter-system differences, they were far smaller than the differences in waveshape that result from changes in stimulus temporal characteristics (e.g., from 2/s to 35/s noise bursts). Furthermore, there did not appear to be any systematic relationship between waveshape and either field strength or magnet manufacturer. Importantly, our analysis of the dependence of response waveshape on particular stimulus variables is based on intra-session comparisons across stimuli, for which the imaging system was obviously constant.

The brain slice to be functionally imaged was selected using contiguous sagittal images of the whole head that were acquired at the beginning of each imaging session. The selected slice intersected the inferior colliculus and the posterior aspect of Heschl's gyrus and the superior temporal gyrus (a predominantly coronal slice plane). When there appeared to be multiple Heschl's gyri, we used the anterior one (which includes primary auditory cortex) to position the slice (Penhune et al. 1996; Leonard et al. 1998). Functional images of the selected slice were acquired using a blood oxygenation level dependent (BOLD) sequence. For the 1.5T experiments, the sequence parameters were: asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°. For the 3.0T experiments the parameters were: gradient echo, TE = 30 ms (except one session used 40 ms and another used 50 ms), flip = 60° or 90°. The beginning of each functional run included four discarded images to ensure that image signal level had approached a steady state. Slice thickness was always 7 mm with an in-plane resolution of 3.1 x 3.1 mm. A T1-weighted anatomical image (in-plane resolution = 1.6 x 1.6 mm, thickness = 7 mm) of the functionally imaged slice was also obtained in all sessions and used to localize auditory cortex. While the present paper focuses on responses from auditory cortex, our experiments imaging a single slice were designed to also examine the inferior colliculus. Therefore, functional images were generally collected using a cardiac gating method that increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998). Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, resulting in an average interimage interval (TR) of approximately 2 s. Image signal was corrected to account for the image-to-image variations in signal strength (i.e., T1 effects) that result from fluctuations in heart rate (Guimaraes et al. 1998). In the 2 sessions that did not use cardiac gating, the TR was 2 s.

Multislice experiments – The multislice experiments were conducted on the 3.0T General Electric scanner.
The imaged volume consisted of 10 contiguous slices (in-plane resolution: 3.1 x 3.1 mm; slice thickness: 7 mm), one of which passed through the same “inferior colliculus, Heschl’s gyrus” slice plane used in the single-slice experiments. The beginning of each functional run included one discarded image volume. Functional images were acquired with a gradient echo sequence (TE = 30 ms, flip = 600). Image pre-processing Prior to response detection, the following pre-processing steps were performed. First, the images for each scanning run were corrected for any in-plane movements of the head that may have occurred over the course of the imaging session. Specifically, each functional image of a session was translated and rotated to fit the first image of the first functional run using standard software (SPM95 software package; without spin history correction; Friston et al. 1995; Friston et al. 1996). For the single slice experiments using cardiac gating, there was an additional pre-processing step. Because cardiac gating results in irregular temporal sampling, the time series for each imaging “run” and voxel was linearly interpolated to a consistent 2 s interval between images, using recorded interimage intervals to reconstruct where each image occurred in time. Response detection Responses were detected using a general linear model (GLM) and a set of basis functions designed to detect the wide range of response waveshapes known to occur in auditory cortex. This approach has been described and tested in detail previously (Chap. 3). The basic idea behind the GLM is to model the signal vs. time within each voxel as a weighted sum of basis functions, and then to identify “active” (i.e., “sound-sensitive”) voxels based on the goodness of fit of this model. Briefly, the basis set consists of five components, designed to provide direct information about response waveshape: Onset, Sustained, Ramp, Offset, and Undershoot (Figure 4-1). The onset component reflects the magnitude of an initial transient response that is above and beyond the level of any sustained response (reflected in the sustained component). The ramp component provides Methods 127 flexibility for modeling response changes that occur over the latter two-thirds of the “sound on” period. The offset component models transient signal increases following stimulus termination. The undershoot component is so-named because this component is included primarily to help model responses in which the signal decreases below baseline following a sustained response.4 These basis functions, appropriately weighted, are able to capture response waveshapes ranging from sustained to phasic, as illustrated in Figure 4-1. For the “sustained” waveform in Figure 4-1, the sustained component has the largest amplitude. For the “phasic” waveform, the onset and offset components have the largest amplitudes, and there is an appreciable ramp component (consistent with the signal increase seen over the latter half of the “sound on” period). The GLM was implemented separately for each imaging session, with each stimulus within a session being represented by its own complete basis set. In constructing the design matrix, the basis functions for each component were sampled at the appropriate times, as determined by the temporal relationship between the stimulus and imaging for a given functional run (i.e., every 2 s for the single-slice experiments; every 8 s for the multislice experiments). 
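The following sketch shows, in schematic form, how a design matrix built from per-stimulus basis sets can be fit to a voxel time series by ordinary least squares, and how an omnibus F-statistic can test whether any basis-function amplitude differs from zero. It is not the authors' implementation (which was Matlab-based and is detailed in Chap. 3): the crude boxcar-derived basis shapes, the trend terms, and the ordinary (non-pre-whitened) F-test below are placeholders for the actual OSORU basis set and noise model.

```python
import numpy as np
from scipy import stats

TR, N_SCANS = 2.0, 150                       # illustrative single-slice timing (2 s sampling)
t = np.arange(N_SCANS) * TR
onsets, dur = [30.0, 90.0, 150.0, 210.0], 30.0   # illustrative 30 s on / 30 s off blocks

def crude_basis(t, onset, dur):
    """Very rough stand-ins for the Onset/Sustained/Ramp/Offset components
    (the real shapes, including the Undershoot, are defined in Chap. 3)."""
    rel = t - onset
    sustained = ((rel >= 0) & (rel < dur)).astype(float)
    onset_fn = np.exp(-rel / 4.0) * (rel >= 0) * (rel < dur)
    ramp = np.clip(rel / dur, 0, 1) * sustained
    offset_fn = np.exp(-(rel - dur) / 4.0) * (rel >= dur) * (rel < dur + 15)
    return np.column_stack([onset_fn, sustained, ramp, offset_fn])

# Design matrix: one basis set per stimulus (here a single stimulus, repeated in blocks),
# plus mean, linear, and quadratic trend regressors.
X_stim = sum(crude_basis(t, o, dur) for o in onsets)
trends = np.column_stack([np.ones_like(t), t - t.mean(), (t - t.mean()) ** 2])
X = np.column_stack([X_stim, trends])

# Synthetic voxel time series with known amplitudes, for demonstration only.
rng = np.random.default_rng(0)
y = X @ np.array([1.0, 0.5, 0.3, 0.8, 100.0, 0.0, 0.0]) + rng.normal(0, 0.5, N_SCANS)

beta, rss_full, *_ = np.linalg.lstsq(X, y, rcond=None)        # full model fit
_, rss_red, *_ = np.linalg.lstsq(trends, y, rcond=None)       # trends-only (reduced) fit

p_extra = X_stim.shape[1]                    # number of stimulus basis functions tested
dof = N_SCANS - X.shape[1]
F = ((rss_red[0] - rss_full[0]) / p_extra) / (rss_full[0] / dof)
p_value = stats.f.sf(F, p_extra, dof)        # a voxel would be flagged "active" if p < 0.001
print(beta[:p_extra], F, p_value)
```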
For all sessions, we assumed that response waveshape and magnitude were constant across repeated presentations of a given stimulus (within a session), so the GLM did not incorporate the possibility of a progressive "habituation" in the amplitude of the basis functions across stimulus repetitions. (This assumption is justified by an analysis performed in Chap. 2). In implementing the GLM, we included three additional functions in the design matrix for each imaging run to handle low-frequency "trends" in the signal vs. time. These included the standard column vector of ones (for estimating the mean), and both a linear and quadratic vector for estimating any signal drift during the course of the run. The estimated amplitudes of these three functions were ignored in the creation of the activation maps. Lastly, as part of the GLM, an estimate of the noise autocorrelation was used to pre-whiten the data from the single-slice experiments, so as to bring the false-positive rate closer in line with its theoretically predicted value (see Chap. 3 for details). No pre-whitening was applied to the data from the multislice experiments, since the residuals without pre-whitening were already consistent with a hypothesis of "white" (uncorrelated) noise, presumably due to the long interval (8 s) between volume acquisitions in a given imaging run.

4 The preceding descriptions of the five components represent their "physiological" rationale. From a strictly mathematical perspective, the basis functions are simply weighted (positive or negative) to give the best (i.e., least square error) fit to the fMRI signal vs. time.

[Figure 4-1 graphic: "Fitting of OSORU Basis Functions to Two Different Response Waveshapes" – two columns ("Sustained Response", "Phasic Response") showing the onset, sustained, ramp, offset, and undershoot basis functions and, below them, the measured waveform together with the sum of the basis functions; axes are percent signal change vs. time (0 – 60 s).]

Figure 4-1: An example illustrating the "best fit" amplitudes of the five basis functions to prototypical "sustained" (left) and "phasic" (right) responses. These prototypical responses (dashed lines in bottom row) represent the average response of Heschl's gyrus to 2/s and 35/s noise burst trains, respectively (Figure 4-2). The amplitudes of the basis functions were determined using a general linear model (i.e., linear regression). Summation of the basis functions for each response type yields the solid line in the bottom row. For this particular example, a vector of ones was not included in the linear model. (If included, the basis function amplitudes would be slightly different, and there would have been less of a difference between the actual and fitted responses during the final 15 s).

For each stimulus in a given imaging session, we created an "omnibus" activation map (using an F-statistic) that tested against the null hypothesis that none of the estimated amplitudes of the basis functions were significantly different from zero (Chap. 3). "Active" voxels were defined as those with p-values less than 0.001 (not corrected for multiple comparisons).

Waveshape quantification

Quantification of the response to a given stimulus was generally performed by combining the active voxels within a given anatomically-defined region of interest (Heschl's gyrus – HG, superior temporal gyrus – STG, or antero-medial region – AM; see "Defining regions of interest").
For these analyses, the amplitudes of a given basis function were averaged across all the active voxels in a given region (across both hemispheres for the single-slice experiments; across a given hemisphere and slice for the multi-slice experiments). These average amplitudes were then converted to a "percent change" scale by dividing by the estimated signal mean (i.e., the average amplitude over the same active voxels of the "trend" basis function comprised of a vector of 1's) and multiplying by 100. We denote the resulting amplitudes of the onset, sustained, ramp, and offset components as On, Sust, Ramp, and Off, respectively. Mid, a measure of the response amplitude near the middle of the "sound on" period, was defined as the sum of Sust plus one-half of Ramp. (The undershoot component was included in the basis set to help fit responses with that particular feature, but was not used to quantify response waveshape). For a given stimulus and region, we required a minimum of three active voxels (in total across the left and right hemispheres for the single-slice sessions) in order to include that stimulus/region combination in the subsequent analyses. Our overall database collected over all the single-slice sessions included 161 responses for both HG and STG. Of these, 5 responses for STG and one for HG were excluded because the three-voxel criterion was not met.5

The overall waveshape of responses was quantified using a summary measure, the "waveshape index", capable of broadly distinguishing between sustained and phasic responses. The basic idea behind the "waveshape index" (WI) was to compare the total amount of "transient activity" (defined as the sum of the onset and offset amplitudes) to the activity at the midpoint of the response. Secondarily, the WI was designed to yield smaller (more "sustained") values when the transient activity was primarily limited to either the onset or offset component, and larger (more "transient") values as the onset and offset components approached each other in magnitude. The exact formulation of the WI was chosen so as to yield a robust measure that stayed within a finite range. While a single number obviously cannot encapsulate all the dynamics of an fMRI response, the WI is a convenient measure for summarizing the overall dynamics of a response waveform. Reasonable behavior of the WI was previously confirmed by examining how well the WI qualitatively sorted various waveforms (derived from the same underlying database as the present study; Chap. 3). Specifically, the WI was defined as:

$$\mathrm{WI} = \frac{1}{2}\,\frac{\mathrm{On} + \mathrm{Off}}{\mathrm{Mid} + \max(\mathrm{On},\,\mathrm{Off})} \;\in\; [0, 1] \qquad (1)$$

Note that the WI depends only on the response waveshape, and is unchanged if a response is scaled throughout by a constant factor. In some instances (i.e., the figures displaying spatial maps of the WI), the WI was calculated on a voxel-by-voxel, rather than regional basis, using measures analogous to On, Mid, and Off for individual voxels.

5 Specifically, the following cases were eliminated from analysis: one instance in STG of the 2/s (5% STF) noise burst train, one instance in STG of the 35/s noise burst train with a 50% STF, one instance in STG of music, and both the 2/s and 35/s tone burst trains (4 kHz) in HG and STG of one session (thereby removing this session entirely from the analysis of the effect of sound bandwidth on response waveshape).
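As a concrete illustration of Eq. 1, the fragment below (a minimal sketch, not the authors' code) computes Mid and the WI from a set of component amplitudes expressed in percent signal change; negative values are clipped at zero before entering the ratio, as noted in the footnote that follows. The numerical examples are hypothetical, not values from the thesis.

```python
def waveshape_index(on, sust, ramp, off):
    """Waveshape index (Eq. 1) from component amplitudes in percent signal change.

    Assumes at least one of the rectified On, Off, or Mid terms is nonzero.
    """
    mid = sust + 0.5 * ramp                   # response amplitude near the middle of "sound on"
    on_r, off_r, mid_r = (max(x, 0.0) for x in (on, off, mid))   # rectify negative values
    return 0.5 * (on_r + off_r) / (mid_r + max(on_r, off_r))

# Illustrative amplitudes (hypothetical numbers):
print(waveshape_index(on=0.2, sust=1.0, ramp=0.2, off=0.1))   # sustained-like response, low WI (~0.12)
print(waveshape_index(on=1.2, sust=0.1, ramp=0.2, off=1.0))   # phasic-like response, high WI (~0.79)
```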
6 Technically, prior to their use in this equation, On, Off, and Mid were all rectified (i.e., negative values were converted to zero). However, in all other instances in this paper referring to these measures (i.e., outside of the WI calculation), their amplitudes were not rectified (e.g., the calculations of ∆On, ∆Mid, and ∆Off in Figures 4-6 and 4-7). Methods 131 Using this definition, maximum values for the WI (i.e., near 1) can only result if the two transient components are similar in magnitude and are large relative to the midpoint response. Values near one-half can reflect a response consisting of solely an onset or offset response, or alternatively a combination of onset and offset activity in a response also having some midpoint activity. Values near zero reflect a response dominated by the midpoint response (i.e., by the sustained and/or ramp components). Additionally, we examined the effect of stimulus changes on the three individual elements that together constituted the WI – namely, On, Mid, and Off. Because these measures are not normalized, they differ from the WI in that they also incorporate information regarding the actual amplitude of these response features. Calculating response waveforms Single-slice experiments – We computed empirical response waveforms by averaging across repeated presentations of a given stimulus in a given imaging session. Specifically, following motion correction, image signal vs. time for each voxel was corrected for linear or quadratic drifts in signal strength over each run, and then normalized so that each voxel had the same (arbitrary) mean intensity. The resulting time series for each imaging run and voxel was linearly interpolated to a consistent 2 s interval between images (for the cardiac gated experiments), and then temporally smoothed using a three point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). A response “block” was defined as a 70 s window (35 images) that included 10 s prior to stimulus onset, the 30 s coinciding with the stimulus “on” period, and the 30 s “off” period following the stimulus. These response blocks were averaged according to stimulus to give an average signal vs. time waveform for a given stimulus in a session. For each stimulus and session, we further averaged signal vs. time across the “active” voxels in either HG or STG. The resulting “grand-average” waveform was then converted to percent change in signal relative to baseline. The baseline was defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the stimulus. Response waveforms are included to illustrate the signal vs. time of the actual data. However, all of the actual response quantification was based on the amplitudes of the basis functions as estimated under the general linear model (i.e., WI, On, Mid, and Off). 132 Chapter 4: Sound envelope determines fMRI time-pattern Multi-slice experiments – Response waveforms for a given stimulus were again computed by averaging data across runs to form a single average response over a 70 s window. This averaging accounted for the staggered timing between stimulus and volume acquisition from run to run, such that after appropriately interleaving the data, the response was sampled every 2 s relative to the stimulus. The average response was temporally smoothed and converted to percent change in signal as described above. 
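A schematic version of the single-slice waveform computation just described might look like the following. This is a sketch only, assuming NumPy/SciPy: the interpolation to a 2 s grid, the three-point zero-phase smoothing with coefficients 0.25/0.5/0.25, the 70 s response block, and the baseline window from t = -6 to 0 s follow the description above, while drift removal, normalization, and voxel selection are assumed to have been done beforehand. The synthetic data at the end are purely illustrative.

```python
import numpy as np
from scipy.interpolate import interp1d

def average_response(sig, t_acq, stim_onsets, dt=2.0):
    """Average percent-change response over repeated 30 s-on / 30 s-off presentations.

    sig         : signal vs. time for one voxel (already drift-corrected and mean-normalized)
    t_acq       : acquisition times in s (irregular if cardiac gated)
    stim_onsets : stimulus onset times in s (assumed to lie well within the run)
    """
    # Resample to a regular 2 s grid (cardiac gating makes the raw sampling irregular).
    t_reg = np.arange(t_acq[0], t_acq[-1], dt)
    sig_reg = interp1d(t_acq, sig)(t_reg)

    # Three-point, zero-phase temporal smoothing (coefficients 0.25, 0.5, 0.25).
    sig_smooth = np.convolve(sig_reg, [0.25, 0.5, 0.25], mode="same")

    # 70 s response block: 10 s pre-stimulus, 30 s "on", 30 s "off" (35 samples at 2 s).
    block = np.arange(-10.0, 60.0, dt)
    trials = [interp1d(t_reg - onset, sig_smooth)(block) for onset in stim_onsets]
    avg = np.mean(trials, axis=0)

    # Convert to percent change relative to the t = -6 to 0 s baseline.
    baseline = avg[(block >= -6.0) & (block <= 0.0)].mean()
    return block, 100.0 * (avg - baseline) / baseline

# Minimal synthetic usage (jittered ~2 s sampling, arbitrary onsets):
rng = np.random.default_rng(1)
t_acq = np.arange(300) * 2.0 + rng.uniform(-0.3, 0.3, 300)
sig = 100 + rng.normal(0, 1, 300)
t_block, resp = average_response(sig, t_acq, stim_onsets=[120.0, 180.0, 240.0])
```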
Defining regions of interest For the single-slice experiments examining posterior auditory cortex, response waveshape was quantified for two anatomically-defined regions of interest: Heschl’s gyrus (HG) and the superior temporal gyrus (STG). The borders of these regions were defined in both hemispheres from the T1-weighted anatomical images (and subsequently down-sampled to the same spatial resolution as the functional images). When Heschl's gyrus was visible as a “mushroom” protruding from the surface of the superior temporal plane, the lateral edge of this mushroom defined the lateral edge of the HG region. The medial edge was the medial-most aspect of the Sylvian fissure. When a distinct mushroom was not present, the HG region covered approximately the medial third of the superior temporal plane (extending from the medial-most aspect of the Sylvian fissure). In the superiorinferior dimension, the HG region extended superiorly to the edge of the overlying parietal lobe, and inferiorly to the superior edge of the superior temporal sulcus (or medial extension thereof). In general, activation was confined to the superior two-thirds of this region (i.e., clearly centered on the “mushroom-like” protrusion). The STG region was defined as the superior temporal cortex lateral to the HG region. The definition of the inferior and superior borders was the same as for the HG region. Again, a clear preponderance of the active voxels generally occurred within the superior two-thirds of the STG region. For the multislice experiments, auditory cortex was divided into three anatomically-defined regions of interest: Heschl’s gyrus (HG), superior temporal gyrus (STG), and an anterior-medial region (AM). Conceptually, given the typical antero-lateral to postero-medial course of Heschl’s gyrus along the superior temporal plane, STG was defined as the cortex lateral and posterior to Results 133 Heschl’s gyrus, while AM was defined as the cortex medial and anterior to Heschl’s gyrus. Therefore, before defining STG and AM, we first identified the medial-lateral and anterior-posterior extent of Heschl’s gyrus. Unlike the single-slice experiments, the medial edge of the HG region was not necessarily the medial-most aspect of the Sylvian fissure. Rather, when there was a clear circular sulcus, the medial aspect of the HG region ended approximately one-half the distance from the crown of Heschl’s gyrus to the depth of the circular sulcus. The tissue in the depth of this sulcus to the medial-most aspect of the Sylvian fissure was defined as the AM region. In general, there is little or no AM in the posterior auditory cortex studied in the single-slice experiments, hence the exclusion of this region from the single-slice analyses. For the more anterior slices in which no HG was present, the lateral limit of the AM region was determined by assigning to the AM region a distance along the cortical surface that was equivalent to that distance in the slice with the most anterior HG region. The STG region was defined as the cortex lateral to the HG region, or if HG was not present, as the cortex lateral to AM. (For the most posterior slices in which neither HG nor AM were present, STG was defined as the entire medial-lateral extent of the superior temporal plane). 
The formal definition of the HG and STG regions also differed slightly from the single-slice experiments in one other respect, which was that the inferior limit of these regions did not include the tissue that was immediately superior to the superior temporal sulcus. The HG, STG, and AM regions were defined separately for each hemisphere.

RESULTS

Waveshape dependence on stimulus type in posterior auditory cortex

The waveshape of fMRI responses from posterior auditory cortex depended strongly on the type of stimulus, as illustrated in Figure 4-2. The eight stimuli represented in this figure have quite different temporal characteristics, but were similar in spectrum (all were broadband), and sensation level (all were approximately 55 dB SL). Altogether, the responses in Figure 4-2 range from phasic (left) to sustained (right).

Figure 4-2: Average responses to eight different stimuli, along with corresponding waveshape indices (WIs) from individual imaging sessions. Results are shown separately for HG and STG. The average responses for each stimulus are computed from the same sessions represented by the WI values. Dashed lines give the standard error of the responses across sessions. WIs near one indicate a response with approximately equal onset and offset components, and comparatively low signal near the "midpoint" of the "sound on" period (i.e., "phasic" responses). Conversely, WIs near zero indicate a response with small onset and offset components, and comparatively large signal near the midpoint (i.e., "sustained" responses). The data in this figure is restricted to the responses collected at a stimulus level of ~55 dB SL and from sessions that did not involve a task. Individual noise bursts were always broadband, and were 25 ms in duration.

[Figure 4-2 graphic: for both HG and STG, waveshape indices (0 – 1) from individual sessions and average responses (% signal change vs. time, 30 s "sound on" period) for each of the eight stimuli: 100/s clicks (N = 2), 35/s noise bursts (N = 27), continuous noise (N = 5), 35/s clicks (N = 4), 10/s noise bursts (N = 7), 2/s noise bursts (N = 22/23), speech (N = 3), and music (N = 18/19).]

Figure 4-2 shows that phasic and sustained responses can be elicited by a variety of stimuli. For instance, both 100/s clicks and 35/s noise bursts produced highly phasic responses in HG and STG. These phasic responses were characterized by a prominent signal decline (80-120%) following an initial signal peak (at ~ 6 s), and a clear peak after sound offset (at ~ 36 s). These stimuli consistently evoked phasic responses in individual sessions, as evidenced by the preponderance of WIs greater than 0.6 – values indicative of responses with distinctly phasic waveforms (Chap. 3). At the other end of the waveshape spectrum, 2/s noise bursts, speech, and music elicited primarily sustained responses. For these stimuli, the average waveforms show only a small signal decline following the initial peak (25-30% declines, except for speech in STG, which showed a 40% decline), so the response near the midpoint of the "sound on" period remains elevated. Additionally, the waveforms lack a distinct peak after sound offset.
In some individual sessions, the responses to these stimuli were more phasic than indicated in the average waveforms, due primarily to a larger signal decline following the initial peak, although there was also a small “off-peak” in some instances. Overall however, the individual responses to 2/s noise bursts, speech, and music were primarily sustained, as indicated by the typically low values for their WIs. The responses evoked by 35/s clicks, 10/s noise bursts, and continuous noise were “intermediate” in waveshape in that they displayed a blend of sustained and transient activity. The average waveforms for these stimuli exhibited an “intermediate” degree of signal decline (50-75%), and displayed either a small “off-peak” (e.g., 35/s clicks and continuous noise in STG) or evidence for a possible “hidden” off-response in the form of a slightly prolonged elevation in signal following stimulus termination (e.g., 35/s clicks and continuous noise in HG, and 10/s noise bursts). Consistent with the average waveforms, the WIs for the individual sessions also fell within an intermediate range. While Figure 4-2 demonstrates that response waveshape depends strongly on stimulus, a comparison with the data taken in our separate level experiments (not included in Figure 4-2) suggests that certain tasks may also affect response waveshape. For music, there was a tendency for responses from our previous level experiments to be slightly more phasic [mean WIs of 0.17 (HG) and 0.27 (STG) for responses at 50 dB SL, as compared to mean WIs of 0.09 (HG) and 0.12 (STG) Results 137 in Figure 4-2]. For continuous noise, responses from the level experiments were clearly more phasic [mean WIs of 0.58 (HG) and 0.74 (STG) for responses at 55 dB SL, compared to 0.34 (HG) and 0.49 (STG) in Figure 4-2). The finding of more phasic responses in these level experiments may be highly specific to the particular task required, which was performed only at stimulus onset and offset.7 Thus, we cannot conclude that all tasks will affect response waveshape, or will do so in the same way. While these comparisons suggest that factors in addition to stimulus characteristics may influence response waveshape, the wide variations in waveshape that occur across stimuli when task is held constant (e.g., Figure 4-2) indicate that stimulus characteristics are a major determinant of response waveshape. Waveshape dependence on modulation rate in posterior auditory cortex Since the stimuli in Figure 4-2 primarily differ in their temporal characteristics, the data strongly suggest that stimulus temporal characteristics play a prominent role in determining the dynamics of auditory fMRI responses. For instance, stimulus modulation rate had a clear effect on response waveshape in that, on average, higher rate stimuli (i.e., 100/s clicks and 35/s noise bursts) typically elicited phasic responses, whereas the stimuli dominated by low modulation rates (2/s noise bursts, speech, and music) elicited more sustained responses. The dependence of waveshape on rate held within individual sessions, as well as on average. In our experiments, rate was varied within session in three ways: (1) between 2/s and 35/s for noise bursts (STF = 5% for 2/s, 88% for 35/s), (2) from 2/s to 10/s to 35/s for noise bursts (STFs of 5%, 25%, and 88%, respectively), and (3) between 35/s and 100/s for clicks. In the 23 sessions that used both 2/s and 35/s noise burst trains, the WI was greater for 35/s in all but one instance in HG, and in every instance in STG (Table 4-1). 
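The session-by-session comparison behind Table 4-1 is a paired one. As an aside, the following is a minimal sketch (hypothetical WI values, not data from the study) of the corresponding Wilcoxon signed-rank comparison of the two trains within sessions:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-session WI values, paired by session.
wi_2s = np.array([0.15, 0.22, 0.10, 0.30, 0.18, 0.25, 0.12, 0.20])
wi_35s = np.array([0.55, 0.70, 0.48, 0.72, 0.60, 0.66, 0.58, 0.61])

delta = wi_35s - wi_2s                        # 35/s minus 2/s, as in Table 4-1
stat, p = wilcoxon(wi_35s, wi_2s)             # paired signed-rank test
ste = delta.std(ddof=1) / np.sqrt(len(delta)) # standard error of the mean difference
print(f"mean dWI = {delta.mean():.2f} +/- {ste:.2f}, p = {p:.3g}")
```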
This consistent difference in WI was also reflected in the individual elements that together defined the WI. Specifically, the transient components of the physiological basis set – onset (On) and offset (Off) – were almost always greater at 35/s than 2/s, whereas the midpoint signal level (Mid) was almost always less at 35/s than 2/s (Table 4-1). For all seven sessions using 2/s, 10/s, and 35/s noise burst trains8 (a subset of the preceding 23 sessions), the WI in both HG and STG for the 10/s train was greater than the value for the 2/s train, and less than the value for the 35/s train. For the two sessions that used both 35/s and 100/s clicks, the WI for 100/s clicks was greater than that for 35/s clicks in both HG and STG. Thus, the intrasession data strongly indicate a change toward more phasic responses with increasing rate.

7 We believe that the comparisons reflect a task effect, rather than motion artifact because (1) there was no sign of motion correlated with the onset and offset of the stimuli, and (2) the waveshape difference between the task and no task conditions was greater for one stimulus (continuous noise) than for the other (music), despite the fact that the task was always the same.

8 The seven sessions represent five subjects, since two subjects were studied twice.

35/s vs. 2/s noise burst trains
                     ∆WI            ∆On            ∆Mid            ∆Off
HG   N+/Ntotal       22/23          22/23          1/23            22/23
     mean ± ste      0.41 ± 0.03    1.14 ± 0.14    -0.78 ± 0.13    1.00 ± 0.12
STG  N+/Ntotal       22/22          19/22          1/22            22/22
     mean ± ste      0.42 ± 0.03    0.99 ± 0.17    -1.01 ± 0.13    1.04 ± 0.12

Table 4-1: Differences in WI, On, Mid, and Off between responses to a 35/s and 2/s noise burst train that were obtained in the same imaging session. Ntotal is the total number of sessions for which such a comparison was available. N+ is the number of sessions for which the difference (35/s minus 2/s) was positive. In both HG and STG, the difference between the two trains was significant for all measures at p < 0.001 (signed rank test). ste = standard error of the mean.

In the comparisons just described, burst or click duration remained constant across rates, so rate co-varied with STF. Since this raises the possibility that waveshape is controlled by STF, and not rate, we examined the effects of rate while holding STF constant. Figure 4-3 shows that even when STF was matched at 50%, the average response waveform for a 35/s noise burst train was still more phasic than that for a 2/s train. In every session (5/5 for HG, 4/4 for STG), the WI for 35/s was greater than for 2/s, indicating more phasic responses at 35/s. However, the response difference between these two noise burst trains with equal STF was less pronounced (average ∆WI ~ 0.24) than the difference between trains of the same two rates but for which STF also co-varied markedly with rate (from 88% for 35/s bursts to 5% for 2/s bursts; average ∆WI between these latter two trains was ~ 0.44 for the same sessions).

[Figure 4-3 graphic: "Waveshape for Two Rates with Equal STF (50%)" – average responses (percent signal change vs. time, 30 s "sound on" period) to 2/s and 35/s noise burst trains in Heschl's gyrus and the superior temporal gyrus.]

Figure 4-3: Average responses to 2/s and 35/s noise burst trains, each having a sound-time fraction of 50%. The averages are based on responses collected in the same imaging sessions (N = 5 for HG, 4 for STG).
Thus, these results provide further support for the view that response waveshape depends strongly on stimulus rate, but suggest that STF may also influence response waveshape, as we examine in more detail next.

Waveshape dependence on sound-time fraction in posterior auditory cortex

STF is a second stimulus temporal characteristic that influences auditory response dynamics. This was determined from 7 sessions (6 subjects) that varied STF while holding rate constant.

Figure 4-4: Examination of the effect of sound-time fraction on response dynamics for 35/s noise burst trains with STFs of 25%, 50%, or 88%. Results are shown separately for HG and STG. The left panels plot the WIs for individual sessions. The values for the 88% STF are limited to the sessions with data at either of the other two STFs. The right panels plot the response for each STF, averaged across the same sessions represented by the WI values (N = 2, 6, and 7 for 25%, 50%, and 88% in HG; N = 2, 5, 6 in STG). One subject contributes twice (i.e., two sessions) to the results for the STFs of 50% and 88% in both HG and STG.

Figure 4-4 shows WIs and average response waveforms for 35/s noise burst trains with STFs of 25%, 50%, or 88%. For the 6 sessions with data for STFs of 50% and 88%, there was clearly an effect of STF in HG, where the WI was greater for the 88% STF in all sessions (p = 0.03, signed rank test; ∆WI = 0.16 ± 0.04). Changes in On, Mid, and Off also occurred in a consistent direction (On and Off: larger for the 88% STF in either 5 or 6 of 6 cases; Mid: smaller for the 88% STF in 5 of 6 cases; p ≤ 0.06 for each comparison). However, the changes in On, Mid, and Off were generally small, consistent with the fact that there were only small differences between the average response waveforms for the 50% and 88% STFs (∆On = 0.39 ± 0.07; ∆Mid = -0.35 ± 0.09; ∆Off = 0.43 ± 0.16). In STG, the WI was greater for the 88% STF in 4 of 5 cases9 (p = 0.2), but none of the changes in On, Mid, or Off approached statistical significance (p > 0.3). However, in the two sessions with data for a larger STF differential of 25% and 88%, responses were more phasic (i.e., higher WI) at the higher STF in both HG and STG. This suggests that a larger STF differential might have resulted in larger, more robust changes in response dynamics. Given the consistent (albeit small) difference between the 50% and 88% STFs in HG, and in light of the results for the 25% STF in both HG and STG, the data overall indicate a tendency for responses to a 35/s noise burst train to become less phasic with decreasing STF.

There was also evidence of an effect of STF on the response dynamics of a low rate (2/s) noise burst train. In particular, in five sessions with responses to a 2/s noise burst train with STFs of 5% and 50%, On was greater at the 50% STF in all cases, in both HG and STG (p = 0.06, signed rank test; HG: ∆On = 0.68 ± 0.33; STG: ∆On = 0.57 ± 0.12). However, a consistent change in either Mid or Off was not observed (p > 0.6). Overall, the net effect on the average response waveform was a slightly more pronounced “on-peak” at the 50% STF [cf. the 2/s (50% STF) average waveforms in Figure 4-3 to the 2/s (5% STF) average waveforms in Figure 4-2].
Again, more pronounced changes in response waveshape might have been observed using a larger STF differential (e.g., 80% vs. 5%). Altogether, the experiments varying modulation rate and STF indicate that rate and STF are two temporal characteristics of a stimulus that influence the waveshape of responses from auditory cortex.

9 There was one less case available for comparison of the 50% and 88% STFs in STG than HG, due to an insufficient number of “active” voxels for the 50% STF stimulus in one session.

Figure 4-5: Left: Responses to a 35/s noise burst train at three different stimulus levels (40, 55, and 70 dB SL). Right: Comparison of the responses to 2/s and 35/s trains at a common level of ~70 dB SL. Individual noise bursts were broadband, and were 25 ms in duration. All responses are taken from a single imaging session.

Insensitivity of waveshape to sound level in posterior auditory cortex

Unlike changes in stimulus temporal characteristics, variations of stimulus level over a 30 to 40 dB range did not result in strong, systematic changes in response waveshape. This is illustrated in Figure 4-5, which shows responses from Heschl’s gyrus for one session. The left panel shows that phasic responses were elicited by 35/s noise burst trains (88% STF) regardless of level (40, 55, and 70 dB SL), whereas the right panel illustrates the markedly different responses produced by 35/s and 2/s noise burst trains of comparable level (70 dB SL). Any change in waveshape with level was far less than the change in waveshape with rate.

Altogether, three sets of experiments examined the influence of stimulus level on response waveshape, and their results support the impression from Figure 4-5. Two sets of experiments examined the effect of sound level on waveshape and compared it with the effect of a change in stimulus temporal characteristics. One of these sets used 35/s noise bursts (88% STF) of various levels (40, 55, 70 dB SL) and 2/s noise bursts (5% STF) for comparison. The second set used continuous noise of various levels (35 – 75 dB SL) and music (50 dB SL) as the comparison stimulus. A third set of experiments examined the effect of stimulus level on waveshape using music as the stimulus, but did not include a comparison stimulus with different temporal characteristics.

The effect of level on WI, onset component, midpoint activity, and offset component was quantified as follows. For each session and measure, we subtracted the minimum value for a given measure across all levels from the maximum value across all levels, thus obtaining the largest absolute difference across all levels. This difference was assigned either a positive or negative sign depending on whether the maximum value occurred for the higher, or lower, level, respectively. The resulting values are plotted in the “Level Change” columns in Figure 4-6 (unfilled symbols). For the sessions that also included a “comparison” stimulus, the difference in WI, onset, midpoint, and offset between the standard and comparison stimuli (at comparable levels) was calculated.10 A sign was assigned based on the following convention: continuous noise minus music, and 35/s minus 2/s noise bursts. The resulting values are plotted in the “Temporal Change” columns in Figure 4-6 (filled symbols).
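The “Level Change” sign convention just described can be written compactly; the short sketch below is illustrative only (it is not the original analysis code, and the example values are hypothetical), but it implements the same rule: take the largest difference in a measure across levels and sign it by whether the maximum occurred at the higher or lower level.

```python
# Sketch of the signed "Level Change" measure described above (illustrative only).
import numpy as np

def signed_level_change(levels_db, values):
    """levels_db and values are matched arrays: one value of WI, On, Mid, or Off per level."""
    levels_db, values = np.asarray(levels_db), np.asarray(values)
    i_max, i_min = np.argmax(values), np.argmin(values)
    change = values[i_max] - values[i_min]            # largest absolute difference across levels
    # positive if the maximum value occurred at the higher level, negative otherwise
    return change if levels_db[i_max] > levels_db[i_min] else -change

# Hypothetical example: WI for a 35/s noise burst train presented at 40, 55, and 70 dB SL
print(signed_level_change([40, 55, 70], [0.62, 0.55, 0.66]))
```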
Overall waveshape, as summarized by the WI, was not affected in a systematic manner by changes in stimulus level. The changes in waveshape index (∆WI) due to the changes in level were distributed roughly equally between positive and negative values in both HG and STG (p > 0.15, signed rank test; Figure 4-6, top row). In contrast, for the same sessions, the ∆WI due to a stimulus temporal change were always the same sign (p = 0.004). Furthermore, in all but one instance (in HG), the ∆WI due to the temporal change were greater than the ∆WI due to the level change (irrespective of sign), indicating a clear effect on the magnitude of ∆WI as well.

10 Since stimuli were compared for a given sensation level, their energy differed slightly. For continuous noise and music, this difference amounted to less than 5 dB; for 35/s and 2/s noise bursts the difference was ~12 dB.

Figure 4-6: Changes in four different measures, due to either a change in stimulus level (unfilled symbols) or a change in stimulus temporal characteristics (filled symbols). All values represent a difference based on a within-session comparison. The values in the “Level Change” columns represent the largest change in a given measure between 3 – 4 levels presented over a 30-40 dB range. (Diamonds: 35/s noise burst train; Up triangles: continuous noise; Down triangles: music). Four of the five subjects in the sessions using music were also subjects in the sessions using continuous noise. The values in the “Temporal Change” columns represent the change in a given measure due to a large change in stimulus temporal characteristics. [Circles: Difference between values for 35/s vs. 2/s noise burst trains at either 55 dB SL (solid circles) or 70 dB SL (circles with ‘+’ symbol); Squares: Difference between values for continuous noise vs. music (50-55 dB SL)]. The “Temporal Change” values were obtained in a subset of the sessions that yielded the “Level Change” values (specifically, the sessions that varied the level of 35/s noise burst trains and continuous noise).

Examination of the individual waveshape components revealed a similar lack of a systematic level effect for the onset and midpoint. The changes in both measures (∆On and ∆Mid) due to the changes in level were equally distributed around zero for both HG and STG (p > 0.3; Figure 4-6, second and third rows). In contrast, the ∆On and ∆Mid due to a stimulus temporal change were almost always the same sign (p ≤ 0.01). In terms of the magnitude of the changes, there was appreciable overlap of the ∆On between the “Level Change” vs. “Temporal Change” comparisons. However, there was little such overlap for ∆Mid, suggesting that the similar lack of overlap of ∆WI between the two comparisons was largely driven by the changes in midpoint activity.

Unlike the onset and midpoint, there was evidence for a consistent effect of level on the offset. The changes in offset (∆Off) due to level were typically positive (p = 0.01 in HG, 0.15 in STG). As with ∆On, there was considerable overlap of the ∆Off between the “Level Change” and “Temporal Change” comparisons.
Nevertheless, the ∆Off due to a stimulus temporal change were always positive, and, on average, larger than the changes due to level. Thus, while level appears to have an effect on response offset, the influence of stimulus temporal properties was both larger (on average) and more consistent in the direction of the effect. Altogether, these analyses of ∆WI, ∆On, ∆Mid, and ∆Off indicate that response waveshape is affected in a more systematic manner by changes in stimulus temporal characteristics than by changes in stimulus level.

Insensitivity of waveshape to sound bandwidth in posterior auditory cortex

In another set of experiments we demonstrated that response waveshape was not systematically affected by changes in stimulus bandwidth. Altogether, these experiments were designed to allow for comparisons both within and across the factors of rate and bandwidth, for two different center frequencies of band-limited stimuli. In particular, responses were collected to broadband noise burst trains and narrowband trains composed of either tone bursts (3 sessions) or filtered (1/3 octave) noise bursts (2 sessions). The repetition rates were either 2/s or 35/s, and the center frequencies of the narrowband bursts were either 500 Hz or 4 kHz. In one session the narrowband stimuli (4 kHz tone bursts at 2/s and 35/s) did not satisfy our criterion of at least three active voxels (in either HG or STG), so this session was excluded from the sensitivity analysis.

Changes in stimulus bandwidth from broadband to narrowband did not have a consistent effect on the response dynamics. The changes in waveshape index (∆WI) between broadband and narrowband trains of the same rate were clustered around zero in both HG and STG (p > 0.4, signed rank test; Figure 4-7, top row, “Bandwidth Change” columns). In contrast, for the same sessions, the ∆WI due to a change of rate for trains of the same bandwidth were always positive (p = 0.02 in HG and STG; “Rate Change” columns in Figure 4-7), and in all but one instance (in STG) were greater than the ∆WI due to the bandwidth change. The lack of a consistent effect of bandwidth on ∆WI also held for the individual waveshape components (Figure 4-7, bottom three rows). The ∆On, ∆Mid, and ∆Off due to a change in bandwidth were approximately evenly distributed around zero (p = 0.06 for ∆On in STG; p = 0.05 for ∆Off in HG; otherwise, p > 0.35). In contrast, the ∆On, ∆Mid, and ∆Off due to a change in rate were consistently the same sign (p ≤ 0.03). These results indicate that bandwidth did not have a consistent effect on the sustained vs. phasic nature of a response waveform, in contrast to the highly consistent and robust effects of rate.

Response waveshapes throughout auditory cortex for music and 35/s noise bursts

Altogether, the results of the single-slice experiments indicate that response waveshape depends strongly on stimulus temporal characteristics, but not level or bandwidth. However, the data supporting these conclusions were obtained for a limited region of auditory cortex – namely, the most posterior aspect of HG and immediately lateral STG. Three multislice experiments using a low-rate stimulus (music) and a high-rate stimulus (35/s noise bursts, 88% STF) were designed to test whether or not stimulus temporal characteristics have a profound influence on response waveshape throughout auditory cortex.
By imaging multiple slices, the imaged volume was guaranteed to include the full array of cytoarchitectonically or histologically-defined auditory areas in humans, including both primary and non-primary auditory cortex (Galaburda and Sanides 1980; Rivier and Clarke 1997; Morosan et al. 2001). Figure 4-8 displays a spatial map of the WI for one of the sessions using multiple slices. The results are representative of what was observed in the other two subjects. For both music and 35/s noise bursts, active voxels occurred across a wide expanse of the superior temporal plane. However, the WIs for music and 35/s noise bursts occupied distinct ranges in all slices, indicating that the response waveshape to these two stimuli was dramatically different throughout cortex.

Figure 4-7: Changes in four different measures, due to either a change in stimulus bandwidth (unfilled symbols) or a change in rate (filled symbols). All values represent a difference based on a within-session comparison. The values in the “Bandwidth Change” columns are the difference in each measure between a broadband noise burst train and a narrowband train of the same rate. The narrowband stimuli were either tone bursts or filtered noise bursts, with center frequencies (Fc) of either 500 Hz or 4 kHz. (Empty diamonds: Rate of the broadband and narrowband trains was 35/s and Fc of the narrowband train was 500 Hz; Diamonds with ‘+’: Rate = 35/s, Fc = 4 kHz; Empty squares: Rate = 2/s, Fc = 500 Hz; Squares with ‘+’: Rate = 2/s, Fc = 4 kHz). The values in the “Rate Change” columns are the difference in each measure between a 35/s and 2/s train. (Circles: Trains at both rates were broadband; Up Triangles: Trains were narrowband with Fc = 500 Hz; Down Triangles: Trains were narrowband with Fc = 4 kHz). The “Rate Change” values were obtained in the same sessions that yielded the “Bandwidth Change” values.

Figure 4-8: Maps of WI for music and 35/s noise bursts (88% STF) for a single subject across a broad expanse of auditory cortex. Each panel shows a color WI map (with in-plane resolution of 3.1 x 3.1 mm) superimposed on a T1-weighted anatomic image (acquired at a resolution of 1.6 x 1.6 mm). The WI is displayed for each “active” voxel (p < 0.001 in the activation maps). Slice position is referenced relative to the most posterior slice with a distinct Heschl’s gyrus. This slice (denoted “0 mm”) most likely encompassed a sizable portion of primary auditory cortex in the anterior-posterior dimension (Liegeois-Chauvel et al. 1991; Rademacher et al. 1993; Rademacher et al. 2001), and was the slice-plane employed in the single-slice experiments. In this subject, the posteromedial aspect of Heschl’s gyrus occurred in the same imaging slice for both hemispheres. Images are displayed in radiological convention, so the subject's right is displayed on the left.
In particular, the majority of active voxels for music had a WI less than 0.2, indicating primarily sustained responses. In contrast, for the 35/s noise burst train, the majority of active voxels had a WI greater than 0.4, indicating “intermediate” or phasic responses. The voxels activated by the two stimuli were largely overlapping, indicating that the same regions of cortex could show either phasic or sustained responses depending on stimulus.11

11 The activation maps for the 35/s noise burst trains were based on approximately four times more data than for the music maps. If equal amounts of data had been obtained for each stimulus, it is likely that there would have been more active voxels in the music map relative to the noise bursts (for a constant p-value threshold).

Figure 4-9 illustrates the difference in response waveshape between these two stimuli for HG and STG from the left hemisphere of slice “0 mm” in Figure 4-8 (i.e., the slice employed in the “single-slice” experiments). Similar differences in waveshape were observed in the other slices and subjects (consistent with the spatial maps of WI). Overall, the dramatic difference in waveshape for music vs. 35/s noise bursts in widespread cortical areas indicates that stimulus temporal characteristics strongly influence waveshape throughout auditory cortex.

To further quantify response waveshape across cortex, we computed the average WI for the three multislice sessions as a function of slice position, for each of three anatomically defined regions: HG, STG, and AM. The results, shown in Figure 4-10, confirm that the WIs for music and 35/s noise bursts occupied distinct ranges in all slices, with music eliciting sustained responses throughout cortex, and the noise bursts eliciting phasic responses throughout cortex. At a more refined level of analysis, Figure 4-10 suggests the possibility of small variations in WI for a given stimulus, either 1) across slices for a given region, or 2) across regions for a given slice (e.g., the differences between HG, STG, and AM for 35/s noise bursts). While no conclusions could be drawn in this regard, due to the small number of total hemispheres and variability across hemispheres, the data do suggest a waveshape difference between HG and STG that proved to be significant in the more extensive single-slice database.

Figure 4-9: Responses to music and 35/s noise bursts (88% STF) for HG and STG, averaged across the “active” voxels in the left hemisphere of slice “0 mm” in Figure 4-8.

Figure 4-10: Average WI for the three multislice sessions as a function of slice position, for each of three anatomically defined regions: HG, STG, and AM. These regions were defined on a slice-by-slice basis, separately for each hemisphere. The WI for each region/slice/hemisphere combination was computed for the “active” voxels (p < 0.001) for each stimulus (provided there were at least three such voxels).
The resulting WIs were then averaged across hemispheres, after first aligning the hemispheres in the anterior-posterior dimension according to the most posterior slice with a distinct Heschl’s gyrus (denoted “0 mm”; see Figure 4-8 caption). In some hemispheres, for the slice posterior to “0 mm”, there appeared to be a remnant of Heschl’s gyrus emerging from insular cortex (rather than the superior temporal plane). This cortex was classified as HG (thence the data point for HG at “-7 mm”). For all regions and slices, only data points that represent an average of WIs across at least three hemispheres (out of six total) are included. (This criterion, in conjunction with the “three active voxels” requirement, resulted in no values for the AM region for the music stimulus).

Differences in response waveshape between cortical areas

In the single slice experiments examining posterior auditory cortex, there was clear evidence that responses in STG tended to be slightly more phasic than those in HG. This difference between HG and STG is already suggested by several aspects of the average response waveforms in Figure 4-2. For example, the percentage decline following the on-peak was noticeably greater in STG than HG for continuous noise, 35/s clicks, and 10/s noise bursts. Additionally, the off-peak was noticeably larger or more distinct in STG for 35/s noise bursts, continuous noise, and 35/s clicks.

A statistically significant trend for more phasic responses in STG relative to HG was confirmed for the stimuli that typically evoked “intermediate” or phasic responses, but not for the stimuli that typically evoked sustained responses. Figure 4-11 (top left) plots the WI in HG vs. STG. The stimuli were divided into five broad “groups”: (1) a “high rate or STF” group that included the 100/s clicks and the 35/s bursts (both broadband and narrowband) with a STF of 88% (filled diamonds), (2) continuous noise (filled squares), (3) an “intermediate rate or STF” group that included the 10/s noise bursts, 35/s clicks, and 35/s noise bursts with a STF of 25% or 50% (filled stars), (4) a music and speech group (unfilled triangles), and (5) the 2/s bursts (including tone and noise bursts, with STFs of 5% and 50%; unfilled circles). For the “high rate or STF” group, continuous noise, and the “intermediate rate or STF” group, the WI was greater in STG than HG in most cases (Table 4-2). But for the music and speech group, and the 2/s bursts, the difference in WI between STG and HG was not significant. Thus, the evidence for more phasic responses in STG was strongest for the stimuli that tended to evoke phasic or “intermediate” responses.

Figure 4-11: Scatterplots of WI, On, Mid, and Off in HG vs. STG. A separate point is plotted for each stimulus of each imaging session (provided there were at least three “active” voxels in both HG and STG for a given stimulus). The data set consists of all the responses obtained in the single-slice experiments that did not involve a task.
Filled diamonds: 100/s clicks or the 35/s bursts (broadband and narrowband) with a STF of 88%; filled squares: continuous noise; filled stars: 10/s noise bursts, 35/s clicks, or 35/s noise bursts with a STF of 25 or 50%; unfilled circles: all tone and noise bursts with a rate of 2/s; unfilled triangles: music or speech. The following pairs of data points (HG,STG) did not fall within the chosen axis limits (but were included as part of the statistical analysis in Table 4-2) – On: (-0.82,-0.66), (3.16,4.11); Mid: (2.9,4.9), (-1.30,-1.32), (-0.95,-1.77).

STG vs. HG

Stimulus group               ∆WI                ∆On                ∆Mid               ∆Off
High rate or STF             31/40 (<0.001)     23/40 (0.15)       18/40 (0.14)       32/40 (<0.001)
                             0.062 ± 0.013      0.125 ± 0.071      -0.050 ± 0.05      0.259 ± 0.055
Continuous noise             5/5 (0.06)         5/5 (0.06)         2/5 (1.0)          5/5 (0.06)
                             0.148 ± 0.033      0.371 ± 0.067      0.000 ± 0.200      0.386 ± 0.074
Intermediate rate or STF     15/18 (<0.001)     17/18 (<0.001)     7/18 (0.47)        14/18 (0.01)
                             0.113 ± 0.024      0.473 ± 0.085      -0.106 ± 0.154     0.353 ± 0.114
Music or Speech              13/21 (0.13)       10/21 (0.59)       17/21 (0.01)       19/21 (<0.001)
                             0.030 ± 0.017      -0.032 ± 0.081     0.398 ± 0.153      0.530 ± 0.099
2/s bursts                   19/33 (0.14)       22/33 (0.02)       24/33 (0.006)      28/33 (<0.001)
                             0.025 ± 0.012      0.122 ± 0.064      0.229 ± 0.074      0.248 ± 0.065

Table 4-2: Differences in various measures between responses in STG and HG that were obtained in the same imaging session. See text for stimuli included in each stimulus group. First row in each cell is N+/Ntotal (see Table 4-1 caption) and p-value (in parenthesis) resulting from a signed rank test. Second row is the mean difference (STG minus HG) ± standard error.

Examination of the individual response components (On, Mid, and Off) provided further insight into the nature of the response differences between STG and HG. In general, both On and Off tended to be larger in STG than HG, for all five broad “groups” of stimuli (a clear exception being On for the music and speech group; Table 4-2 and Figure 4-11).12 However, Mid only showed a difference between STG and HG for certain groups (music and speech, 2/s noise bursts; Figure 4-11, lower left). Overall, the following picture emerges (even if the details do not hold exactly, in a statistical sense, for all three components of all five stimulus groups; Table 4-2). For the stimuli that typically evoked sustained responses, the amplitude of On and/or Off tended to be larger in STG. However, Mid also increased in STG for these stimuli, so that the end result was simply larger sustained responses in STG, and no consistent difference in the WI between the two regions. In contrast, for the stimuli that typically evoked “intermediate” or phasic responses, On and Off tended to be larger in STG, but Mid showed no difference, hence the more phasic responses in STG.

12 Note that Off was frequently negative in both HG and STG for the 2/s bursts, and the music and speech group (although less often in STG; Figure 4-11, lower right). These negative values must be interpreted in the framework of the general linear model from which the component amplitudes were derived. Specifically, a negative value for Off acts to model a response with a faster signal decline following stimulus offset than the signal decline modeled in the sustained and ramp components of the basis set. In this sense, one interpretation of the larger values for Off in STG is that sustained responses in STG tend to have slightly slower signal declines than those in HG.
We computed two additional, normalized measures for quantifying the differences in signal decline between STG and HG, as well as the strength of the off-response in the two regions. The first was the ratio of Mid to the amplitude of the on-peak (OnPeak; defined as the amplitude of the onset (On) plus sustained (Sust) components of the basis set). Conceptually, this ratio reflects the amount of ongoing activity approximately midway through the “sound on” period relative to the activity evoked by the onset of the stimulus. For the “high-rate or STF” group, continuous noise, and the “intermediate rate or STF” group, this ratio was significantly lower in STG than HG (p ≤ 0.06 for all three groups, signed rank test; Figure 4-12, left). However, no significant difference was observed for 2/s bursts or the music and speech group (p > 0.6). These results are consistent with what might be expected based on the population data for On and Mid. The second normalized measure was the ratio of Off to OnPeak. (It is difficult to assess from Figure 4-11 whether this ratio might differ between STG and HG). This ratio was consistently higher in STG than HG for all stimulus groups (p ≤ 0.06; Figure 4-12, right). Together, these ratios provide greater detail regarding the two transient peaks of the phasic response. Namely, for the intermediate and phasic responses, the signal decline from the on-peak was typically larger in STG than HG, and the relative amount of activity evoked by stimulus termination versus stimulus onset was also typically larger in STG.

13 Recall that the onset component of the basis set models the transient aspect of the response following stimulus onset. The overall amplitude of the response following stimulus onset is best modeled by the sum of the onset and sustained components of the basis set (i.e., On plus Sust).

Figure 4-12: Scatterplots of Mid/OnPeak and Off/OnPeak in HG vs. STG. OnPeak was defined as the sum of onset and sustained components of the basis set (i.e., On plus Sust). The symbols are the same as Figure 4-11, as is the data included, except two data points were excluded that had a negative OnPeak in STG. One data point did not fall within the chosen axes for Mid/OnPeak (-0.95,-1.25).

Left-right differences in response waveshape

Responses from the left hemisphere tended to be more phasic than those from the right hemisphere for the “high rate or STF” group of stimuli, but not for the other groups of stimuli. Figure 4-13 plots the WI in the left vs. right hemisphere for the single slice experiments examining posterior auditory cortex. For the “high rate or STF” group, the WI was consistently greater in the left hemisphere, for both HG and STG (HG: L > R in 29 of 34 cases; p < 0.001, signed rank test; ∆WI = 0.120 ± 0.020; STG: L > R in 27 of 38 cases; p = 0.002; ∆WI = 0.074 ± 0.021). However, for all the other groups of stimuli, there was no significant difference in the WI between left and right hemisphere (p > 0.2). The trend for more phasic responses in left hemisphere for the “high rate or STF” group was due to a greater signal decline in the left hemisphere following the on-peak [i.e., Mid was consistently smaller in the left hemisphere for the “high rate or STF” group (p < 0.01 for both HG and STG), but On and Off were not different between the two hemispheres (p > 0.1)].
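As a brief aside, the two normalized measures defined above (Mid/OnPeak and Off/OnPeak, Figure 4-12) are simple functions of the component amplitudes from the general linear model. The toy example below uses invented component values, not values from the study, purely to make the ratios concrete.

```python
# Toy example of the two normalized waveshape measures (hypothetical component amplitudes).
# OnPeak is the sum of the onset and sustained component amplitudes (On + Sust).
On, Sust, Mid, Off = 1.2, 0.8, 0.6, 0.9    # hypothetical GLM component amplitudes for one ROI

OnPeak = On + Sust                          # amplitude of the on-peak
mid_ratio = Mid / OnPeak                    # ongoing activity mid-stimulus relative to the onset response
off_ratio = Off / OnPeak                    # off-response relative to the onset response

print(f"Mid/OnPeak = {mid_ratio:.2f}, Off/OnPeak = {off_ratio:.2f}")
```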
Figure 4-13: Scatterplots of WI in the left vs. right hemisphere for HG and STG. The symbols are the same as Figure 4-11, as is the data included (provided there were at least three “active” voxels in each hemisphere for a given region).

DISCUSSION

The present study considerably extends our knowledge regarding the sound features that are coded in the time-pattern of fMRI responses in human auditory cortex. In particular, we found that response waveshape in posterior auditory cortex is primarily determined by sound temporal envelope characteristics such as rate and sound-time fraction, rather than sound level or bandwidth. Several multislice experiments confirmed that the influence of sound temporal characteristics on response waveshape applied throughout widespread regions of auditory cortex, in that a low rate stimulus (i.e., music) evoked sustained responses throughout cortex, while a high rate stimulus (35/s noise burst train) evoked phasic responses throughout cortex.

Several aspects of the study should be kept in mind regarding the extent to which our results allow predictions of the time-pattern of fMRI responses for other stimuli or cortical areas. First, our conclusion that sound level did not have a systematic effect on response waveshape was based on a 30-40 dB variation in level, ranging from moderate to loud intensities. It is possible that changes in waveshape might have been observed if a larger range of levels had been examined (i.e., soft vs. loud intensities). Changes in the time-pattern of population neural activity as a function of level are not easily predictable, since there are competing influences – higher levels can lead to increased neural entrainment to successive stimuli in a train (Phillips et al. 1989), which might be expected to yield more sustained fMRI responses, but higher levels can also increase the duration of forward inhibition (Brosch and Schreiner 1997), potentially resulting in more pronounced signal adaptation and hence more phasic responses. Our results empirically demonstrate that waveshape is not strongly affected by stimulus level for the levels commonly employed in fMRI experiments. Secondly, it is conceivable that changes in level or bandwidth could affect response waveshape in areas outside posterior auditory cortex, since the examination of these two sound characteristics was restricted to experiments studying posterior cortex. Nonetheless, it seems quite likely that sound temporal envelope will have a dramatic influence on the time-pattern of fMRI responses throughout auditory cortex for a whole range of stimulus levels and bandwidths.

Response waveshape: hemodynamic vs. neural factors

While the relationship between neural activity and fMRI responses is not fully understood, it is generally accepted that neural activity and image signal are ultimately linked through a chain of metabolic and hemodynamic events (e.g., Jueptner and Weiller 1995; Villringer 1999; Heeger et al. 2000; Rees et al. 2000; Logothetis et al. 2001). These events include changes in blood flow, blood volume, and oxygen consumption (Fox and Raichle 1986; Kwong et al. 1992; Ogawa et al. 1993; Malonek et al. 1997).
Since hemodynamic changes occur over the course of seconds, fMRI effectively provides a temporally low-pass filtered view of neural activity. More specifically, since fMRI is sampling activity over small volumes of brain (i.e., voxels), the responses can be thought of as showing the time-envelope of population neural activity on a voxel-by-voxel basis.

Even though hemodynamic factors can influence the waveshape of fMRI responses (Buxton et al. 1998; Mandeville et al. 1998; Mandeville et al. 1999), we do not believe that hemodynamics account for the differences seen between stimuli, nor are hemodynamics a particularly plausible explanation for the differences between STG and HG. The reasoning for this first conclusion follows from the fact that the same voxels are capable of showing either phasic or sustained responses depending on the stimulus, which is unlikely unless these waveshape differences are mediated by differences in underlying neural activity. This argument does not work for ascribing the difference between STG and HG to neural factors, since these are two separate regions, and it is known that there can be spatial heterogeneity in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998). However, a statistically significant difference in WI between the two regions was only observed for the stimuli that generally evoked “intermediate” or phasic responses (i.e., 10/s noise bursts, 35/s bursts and clicks, 100/s clicks, and continuous noise). The lack of a similar difference for the stimuli that generally evoked sustained responses (i.e., 2/s bursts, music, and speech) argues against a simple hemodynamic explanation, since that would require that the hemodynamics themselves be a function of the response waveshape.

More specifically, it seems unlikely that all aspects of the response differences between STG and HG could be hemodynamic in origin. For example, assume for the sake of argument that the tendency for the amplitude of On and Off to be greater in STG than HG is due to a higher hemodynamic “gain” in STG (or any other “non-neural” factor that can be characterized by a single scalar coefficient). This hemodynamic difference could also then explain the tendency for Mid to be greater in STG for the stimuli that evoke sustained responses. However, by necessity, the decrease in Mid in STG relative to HG for the stimuli that evoke phasic responses must then be due to some other factor than just a simple difference in hemodynamic gain. One logical and straightforward explanation is that there is a true difference between STG and HG in population neural activity during the midpoint of the stimulus. Alternatively, to maintain that all the response differences between the two regions are solely hemodynamic requires invoking more complicated hemodynamic factors, such as an interplay between the dynamics of blood flow and volume changes that is itself dependent on the previous response history (i.e., highly nonlinear). All-in-all, while we cannot conclusively refute a possible hemodynamic contribution to the observed differences in transient onset and offset activity between STG and HG (i.e., one can still construct, with a little imagination, scenarios for which hemodynamics might suffice to explain all the differences), the data suggest that the differences between these two regions likely reflect underlying differences in the time-patterns of population neural activity between the two regions.
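One way to picture the “temporally low-pass filtered view” described above is to convolve a hypothetical population neural envelope with an assumed hemodynamic response. The sketch below uses a generic gamma-variate response shape purely for illustration; it is not the model or basis set used in this thesis, and all waveform values are invented.

```python
# Minimal sketch: a hypothetical "phasic" neural envelope smoothed by an assumed
# gamma-variate hemodynamic response (illustrative only; not the thesis' model).
import numpy as np
from scipy.stats import gamma

dt = 0.5                                    # seconds per sample
t = np.arange(0, 30, dt)
hrf = gamma.pdf(t, a=6, scale=1.0)          # assumed canonical-style HRF (peak near 5 s)
hrf /= hrf.sum()

# hypothetical neural envelope: strong onset, adapted ongoing activity, off-response
neural = np.zeros_like(t)
neural[(t >= 0) & (t < 2)] = 1.0            # onset response
neural[(t >= 2) & (t < 20)] = 0.2           # adapted activity during the sound
neural[(t >= 20) & (t < 22)] = 0.8          # off-response at sound termination

fmri_like = np.convolve(neural, hrf)[: t.size]   # smoothed, delayed fMRI-like time course
print(np.round(fmri_like[::8], 3))               # samples every 4 s
```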
Compared to HG, it appears that neural activity in STG is concentrated more at sound onset and offset, relative to the activity during the middle of the sound.

Response waveshape: neural adaptation and off-responses

Given this framework, we can interpret the waveshape of phasic responses in light of known neurophysiological mechanisms (Chap. 2). The pronounced signal decline that creates a distinct on-peak (i.e., high On value relative to Mid) indicates that a general process of neural adaptation is prominently shaping fMRI responses to stimuli within certain temporal regimes. Similarly, the signal increase following stimulus termination that creates an off-peak (i.e., high Off value relative to Mid) indicates the presence of neural off-responses. Together, the degree of adaptation and the presence or absence of an off-response constitute the two main neural mechanisms that determine whether the response to a prolonged stimulus is sustained or phasic. Both of these neural mechanisms appear to be in operation for a variety of stimuli and across multiple cortical areas.

The results further indicate greater adaptation and more robust off-responses in population neural activity for STG compared to HG. The ratio of Mid to OnPeak (the response amplitude following stimulus onset) was consistently smaller in STG for the stimuli that typically evoked “intermediate” or phasic responses, indicating greater response adaptation in STG following stimulus onset. (No similar difference between STG and HG was observed for 2/s bursts, music, and speech, which, as above, argues that this regional difference in signal adaptation was not simply hemodynamic). Increases in the strength of adaptation from primary (HG) to non-primary (STG) areas in humans can be similarly inferred from evoked potentials recorded by Howard et al. (2000) using implanted electrodes. The ratio of Off to OnPeak indicated that, relative to the activity evoked by the stimulus onset, the activity evoked by the stimulus offset was consistently greater in STG. In alert macaque monkey, a greater percentage of neurons in the non-primary caudomedial field (CM) have an excitatory response after the offset of tone and noise bursts than in primary auditory cortex (AI; Recanzone 2000). In summary, there appear to be two differences between the responses of STG and HG to high-rate or continuous stimuli: 1) a tendency for enhanced transient activity at sound onset and offset in STG relative to HG, and 2) a particular enhancement of the response following sound offset (relative to onset) in STG.

Response waveshape and sound temporal envelope characteristics: rate and sound-time fraction

Variations in the temporal attributes of a stimulus had a more systematic effect on response waveshape than changes in sound level or bandwidth in posterior auditory cortex. Two particular stimulus temporal characteristics that we varied were the repetition rate and sound-time fraction (STF) of a noise burst train. For the combinations of rate and STF employed in this study, the results indicate that for a fixed STF, higher repetition rates result in fMRI responses with more prominent transient components, as is also the case for higher STFs at a fixed rate. The direction of these changes suggests that the silent interval between individual bursts may be particularly important in determining response waveshape.
This follows from the fact that for both cases (i.e., increasing rate for a fixed STF, and increasing STF for a fixed rate), the silent interval changes in a consistent direction (specifically, it decreases), whereas the duration of the individual noise bursts changes in opposing directions (increasing in the first case, but decreasing in the second). The likely importance of the silent interval as a primary factor influencing the amount of ongoing neural activity to a repetitive stimulus is consistent with microelectrode recordings in animals (de Ribaupierre et al. 1972; Hocherman and Gilant 1981; Phillips 1985; Brosch and Schreiner 1997). Psychophysically, the silent interval (more so than rate or STF per se) is related to the fusion of a noise burst train into continuous noise (Symmes et al. 1955) (although the Symmes et al. study also shows that the silent interval alone is not sufficient to describe fusion threshold for all rate-STF combinations). An important role for the silent interval is not surprising given that this interval obviously controls the available “recovery” time between bursts, and thence the degree of adaptation to successive bursts. One way to more explicitly examine the effect of silent duration on fMRI responses would be to use noise burst trains with a fixed silent duration between bursts (e.g., 25 ms). The resulting stimuli would have a negative correlation between rate and STF, and would therefore complement the results obtained using noise burst trains with a fixed burst duration (i.e., positive correlation between rate and STF).

While the present study was not designed to parametrically explore the entire rate-STF space (or alternatively, burst duration and silent interval) in a systematic manner, we can nevertheless make predictions about the response waveforms that will be evoked by a variety of noise or tone burst trains. The results clearly indicate that phasic responses are likely for trains having both high rates and STFs, whereas sustained responses are likely for trains in the opposite corner of the rate-STF space (i.e., low rate and low STF). For trains in the other two corners of the rate-STF space, the effects of rate and STF on response waveform may be in opposition, resulting in responses that are “intermediate” to the two extremes of phasic and sustained. This was indeed the case for the sessions with responses to a high rate (35/s) noise burst train with a relatively low STF (25%). Additionally, intermediate responses were evoked by 35/s clicks (high rate, low “STF”). No responses were ever obtained for trains with a low rate, but high STF, so we have no information regarding fMRI responses to stimuli in this corner of the temporal space. At a sufficiently high STF (i.e., short silent duration between bursts), the response to a low rate train should approach that of continuous noise.14 However, it is possible that this transition does not occur until very high STFs (i.e., silent durations so brief that the “train” becomes perceptually indistinguishable from continuous noise; see below).

14 Note that the noise burst trains employed in the present study would not have “merged” into continuous noise even for very high STFs, since 1) the individual noise bursts within a train were “frozen” repetitions of each other, and 2) the rise/fall time of the noise bursts was not instantaneous. Indeed, in five sessions with responses to both continuous noise and a 35/s noise burst train (88% STF), the WI was always larger for the noise burst train in both HG and STG (mean difference ~0.24 in both structures). The most striking difference between the response waveshapes for these two stimuli was a larger off-response for the noise burst train. This raises the intriguing possibility that the periodicity inherent in the fine-time structure of the “frozen” noise burst train, or perhaps the modulated percept of this train relative to continuous noise, may have contributed to the generation of larger off-responses.
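To make the bookkeeping between rate, sound-time fraction, and silent interval concrete, the short sketch below computes STF and the silent interval for trains of 25 ms bursts (consistent with the STFs of 5%, 25%, and 88% quoted above for 2/s, 10/s, and 35/s trains). The code simply restates the arithmetic relationships; it is not taken from the thesis.

```python
# Rate, sound-time fraction (STF), and silent interval for trains of fixed-duration bursts.
burst_dur = 0.025                                    # 25 ms bursts, duration held constant

for rate in (2, 10, 35):                             # bursts per second
    period = 1.0 / rate                              # time from one burst onset to the next
    stf = burst_dur / period                         # sound-time fraction
    silent = period - burst_dur                      # silent interval between bursts
    print(f"{rate:>3}/s: STF = {stf:.0%}, silent interval = {silent * 1000:.0f} ms")
```

As the printed values show, increasing rate at a fixed burst duration raises STF and shrinks the silent interval together, which is why the fixed-silent-interval trains proposed above would be a useful complement.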
A study by Jäncke et al. (1999) suggests that fMRI responses with prominent transient activity can occur for low-rate trains with intermediate STFs. In that study, low-rate (1/s) tone burst trains with STFs ranging from 20% to 80% evoked fMRI responses with a prominent onset component. They report a signal decline in both primary and secondary auditory cortex of about 60% by 12 seconds following train onset. In the present study, the equivalent declines in the average responses in HG and STG following the onset of 2/s noise burst trains with STFs of 50% were smaller (40-50% declines). While we found no systematic effect of bandwidth on response waveshape for trains with an 88% STF, the larger percentage signal declines in the Jäncke et al. study could reflect an interaction between the effects of bandwidth and STF on response waveshape that the present study was not designed to uncover (since we did not explore the effect of bandwidth on waveshape for multiple STFs). Additionally, Jäncke et al. report no differences in response between STFs of 20, 40, 60, and 80%, which appears inconsistent with our observation of an increase in the On component for a 2/s noise burst train with a 50% STF relative to one with a 5% STF. One possibility is that the analysis employed by Jäncke et al. was not sensitive to the magnitude of the changes in transient onset activity that might have actually occurred. Alternatively, it is possible that there is a “nonlinear” relationship between changes in response waveshape and changes in STF, such that changes in onset activity occur when STF is increased from 5% to 50%, but not when STF is increased from 20% to 80%. Finally, an interaction between bandwidth and STF could again account for the inconsistent effect of changes in STF on response waveshape between the present study and Jäncke et al. Altogether, further experiments appear necessary to resolve the changes in response dynamics that occur as STF is varied for low-rate trains.

At a more general level, extrapolating beyond just STF and rate, the amplitude envelope of many stimuli composed of repetitive tokens can be broadly described by three temporal factors: the dominant modulation rate, the depth of the amplitude modulation, and the rapidity of the envelope changes. Conceptually, these latter two factors broadly constitute the “continuity” of the stimulus. Stimuli with low modulation depths and/or slow envelope changes have a high degree of stimulus “continuity”. In the present study, the modulation depth of all the noise and tone burst trains was 100%, and the individual bursts had a rapid rise/fall time. Thus, sound-time fraction was a convenient single parameter for summarizing stimulus continuity.
However, by replacing STF with the more general notion of stimulus continuity, we can hypothesize about the response waveshapes that would be evoked by a broader range of stimuli than just noise bursts. For example, we expect that sinusoidally amplitude modulated (SAM) noise with a low modulation rate (e.g., 2 Hz), but only a shallow modulation depth (e.g., 20%), would evoke more phasic responses than SAM noise having the same low-rate, but full (100%) modulation, since the amplitude envelope of the former stimulus displays more continuity (i.e., it approaches continuous noise). Consistent with the importance of modulation rate in the present study, Giraud et al. (2000) showed that the fMRI response in auditory cortex to fully modulated (100%) SAM noise changed from a sustained response at low modulation rates (≤ 8 Hz) to a transient response at high rates (in particular, a response with a transient onset component). Altogether, across a wide variety of stimuli, we propose that modulation rate and stimulus continuity are two attributes of the temporal envelope of a stimulus that together determine the time-pattern of cortical fMRI responses.

Relationship between sound perception, fMRI time-pattern, and neural activity

The changes in fMRI response dynamics as a function of stimulus temporal characteristics may correlate with corresponding changes in the perception of the stimulus. The perceptual factor most closely related to response waveshape was whether or not the stimulus was composed of distinct and countable elements. Previously, Miller and Taylor (1948) described four perceptual phases as rate is increased for repeated bursts of noise (at a constant STF): (1) successive, distinct bursts, (2) a train of bursts having a pitch character, (3) a noise differing slightly in quality from continuous noise, and (4) a continuous noise. For the purpose of the present discussion, we suggest an additional phase (“1b”) beginning for rates around 10/s, where the stimuli in a train begin to fuse perceptually. Stimuli at these rates do not evoke a perception of pitch, but rather a percept of “rhythm”, “unitization”, or “roughness” (Fastl 1977, 1983; Royer and Robin 1986; Robin et al. 1990; Fishman et al. 2000) depending on the precise combination of rate and sound-time fraction. It was for stimuli in this rate regime that response waveshape began changing from sustained to phasic. There was no indication that waveshape was related to pitch perception, in that trains of 35/s noise bursts and 100/s clicks both evoked phasic responses, but only the 100/s clicks evoked a pitch percept. In summary, response waveshape was most closely correlated to the ability to perceive distinct elements in an ongoing train.

More generally, the generation of phasic and sustained responses may reflect the operation of some of the same processes involved in the segmentation or grouping of stimuli. For stimuli with low repetition or modulation rates, such as 2/s noise burst trains, music, and speech, the fundamental “events” of the stimulus may consist of individual noise bursts, musical notes or beats, or words or syllables. Successive neural responses to each individual event (either at onset, offset, or both) would then result in a sustained elevation of time-averaged population neural activity for these low-rate stimuli, and thence a sustained fMRI response.
In contrast, for stimulus trains with high repetition rates, for which the individual bursts are no longer distinct, the train itself may be the perceptual “event”. The phasic response to high rate trains is consistent with such an interpretation, in that cortex appears to be responding just at the beginning and end of the overall train. Previously, based on differences in the time-patterns of fMRI activity as a function of repetition rate in subcortical and cortical structures (Chap. 2), we suggested that a population neural representation of the beginning and end of distinct perceptual events is weak or absent in the midbrain, begins to emerge in the thalamus, and is robust in auditory cortex. The fact that the activity evoked by high-rate and continuous stimuli in the present study was concentrated more at sound onset and offset in STG than HG suggests a further progression within cortex, in which the neural coding of the beginning and end of distinct perceptual events is more accentuated in non-primary (STG) as compared to primary auditory cortex (HG).

REFERENCES

Auker CR, Meszler RM and Carpenter DO. Apparent discrepancy between single-unit activity and [14C]deoxyglucose labeling in optic tectum of the rattlesnake. J Neurophysiol 49: 1504-1516, 1983.
Belin P, Zatorre RJ, Hoge R, Evans AC and Pike B. Event-related fMRI of the auditory cortex. Neuroimage 10: 417-429, 1999.
Brosch M and Schreiner CE. Time course of forward masking tuning curves in cat primary auditory cortex. J Neurophysiol 77: 923-943, 1997.
Buxton RB, Wong EC and Frank LR. Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magn Reson Med 39: 855-864, 1998.
Chen W, Zhu XH, Toshinori K, Andersen P and Ugurbil K. Spatial and temporal differentiation of fMRI BOLD response in primary visual cortex of human brain during sustained visual simulation. Magn Reson Med 39: 520-527, 1998.
Davis TL, Kwong KK, Weisskoff RM and Rosen BR. Calibrated functional MRI: Mapping the dynamics of oxidative metabolism. Proc Natl Acad Sci 95: 1834-1839, 1998.
de Ribaupierre F, Goldstein MH, Jr. and Yeni-Komshian G. Cortical coding of repetitive acoustic pulses. Brain Res 48: 205-225, 1972.
Edmister WB, Talavage TM, Ledden PJ and Weisskoff RM. Improved auditory cortex imaging using clustered volume acquisitions. Hum Brain Mapp 7: 89-97, 1999.
Elliott MR, Bowtell RW and Morris PG. The effect of scanner sound in visual, motor, and auditory functional MRI. Magn Reson Med 41: 1230-1235, 1999.
Fastl H. Roughness and temporal masking patterns of sinusoidally amplitude modulated broadband noise. In: Psychophysics and physiology of hearing, edited by Evans EF and Wilson JP. London: Academic Press, 1977, p. 403-417.
Fastl H. Fluctuation strength of modulated tones and broadband noise. In: Hearing-physiological bases and psychophysics, edited by Klinke R and Hartmann R. Berlin: Springer-Verlag, 1983, p. 282-288.
Fishman YI, Reser DH, Arezzo JC and Steinschneider M. Complex tone processing in primary auditory cortex of the awake monkey. I. Neural ensemble correlates of roughness. J Acoust Soc Am 108: 235-246, 2000.
Fox PT and Raichle ME. Focal physiological uncoupling of cerebral blood flow and oxidative metabolism during somatosensory stimulation in human subjects. Proc Natl Acad Sci 83: 1140-1144, 1986.
Friston KJ, Ashburner J, Frith CD, Poline J-B, Heather JD and Frackowiak RSJ. Spatial registration and normalization of images. Hum Brain Mapp 2: 165-189, 1995.
Friston KJ, Williams S, Howard R, Frackowiak RSJ and Turner R. Movement-related effects in fMRI time-series. Magn Reson Med 35: 346-355, 1996.
Fujita N. Extravascular contribution of blood oxygenation level-dependent signal changes: A numerical analysis based on a vascular network model. Magn Reson Med 46: 723-734, 2001.
Galaburda A and Sanides F. Cytoarchitectonic organization of the human auditory cortex. J Comp Neurol 190: 597-610, 1980.
Gati JS, Menon RS, Ugurbil K and Rutt BK. Experimental determination of the BOLD field strength dependence in vessels and tissue. Magn Reson Med 38: 296-302, 1997.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R and Kleinschmidt A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588-1598, 2000.
Guimaraes AR, Melcher JR, Talavage TM, Baker JR, Ledden P, Rosen BR, Kiang NY-S, Fullerton BC and Weisskoff RM. Imaging subcortical auditory activity in humans. Hum Brain Mapp 6: 33-41, 1998.
Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney EM and Bowtell RW. "Sparse" temporal sampling in auditory fMRI. Hum Brain Mapp 7: 213-223, 1999.
Hall DA, Summerfield AQ, Goncalves MS, Foster JR, Palmer AR and Bowtell RW. Time-course of the auditory BOLD response to scanner noise. Magn Reson Med 43: 601-606, 2000.
Harms MP and Melcher JR. Understanding novel fMRI time courses to rapidly presented noise bursts. Neuroimage 9: S847, 1999.
Heeger DJ, Huk AC, Geisler WS and Albrecht DG. Spikes versus BOLD: What does neuroimaging tell us about neuronal activity? Nat Neurosci 3: 631-633, 2000.
Hocherman S and Gilant E. Dependence of auditory cortex evoked unit activity on interstimulus interval in the cat. J Neurophysiol 45: 987-997, 1981.
Howard MA, Volkov IO, Mirsky R, Garell PC, Noh MD, Granner M, Damasio H, Steinschneider M, Reale RA, Hind JE and Brugge JF. Auditory cortex on the human posterior superior temporal gyrus. J Comp Neurol 416: 79-92, 2000.
IEEE. IEEE recommended practice for speech quality measurements. IEEE Trans on Audio and Electroacoustics 17: 225-246, 1969.
Jäncke L, Buchanan T, Lutz K, Specht K, Mirzazade S and Shah NJS. The time course of the BOLD response in the human auditory cortex to acoustic stimuli of different duration. Brain Res Cogn Brain Res 8: 117-124, 1999.
Jueptner M and Weiller C. Review: Does measurement of regional cerebral blood flow reflect synaptic activity?--implications for PET and fMRI. Neuroimage 2: 148-156, 1995.
Kwong KK, Belliveau JW, Chesler DA, Goldberg IE, Weisskoff RM, Poncelet BP, Kennedy DN, Hoppel BE, Cohen MS, Turner R, Cheng H-M, Brady TJ and Rosen BR. Dynamic magnetic resonance imaging of human brain activity during primary sensory stimulation. Proc Natl Acad Sci 89: 5675-5679, 1992.
Langner G. Periodicity coding in the auditory system. Hear Res 60: 115-142, 1992.
Leonard CM, Puranik C, Kuldau JM and Lombardino LJ. Normal variation in the frequency and location of human auditory cortex landmarks. Heschl's gyrus: Where is it? Cereb Cortex 8: 397-406, 1998.
Liegeois-Chauvel C, Musolino A and Chauvel P. Localization of the primary auditory area in man. Brain 114: 139-153, 1991.
Logothetis NK, Pauls J, Augath M, Trinath T and Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150-157, 2001.
Malonek D, Dirnagl U, Lindauer U, Yamada K, Kanno I and Grinvald A. Vascular imprints of neuronal activity: Relationships between the dynamics of cortical blood flow, oxygenation, and volume changes following sensory stimulation. Proc Natl Acad Sci 94: 14826-14831, 1997.
Vascular imprints of neuronal activity: Relationships between the dynamics of cortical blood flow, oxygenation, and volume changes following sensory stimulation. Proc Natl Acad Sci 94: 14826-14831, 1997. Mandeville JB, Marota JJA, Kosofsky BE, Keltner JR, Weissleder R, Rosen BR and Weisskoff RM. Dynamic functional imaging of relative cerebral blood volume during rat forepaw stimulation. Magn Reson Med 39: 615-624, 1998. Mandeville JB, Marota JJA, Ayata C, Zaharchuk G, Moskowitz MA, Rosen BR and Weisskoff RM. Evidence of a cerebrovascular postarteriole windkessel with delayed compliance. J Cereb Blood Flow Metab 19: 679-689, 1999. Melcher JR, Sigalovsky IS, Guinan JJ, Jr. and Levine RA. Lateralized tinnitus studied with functional magnetic resonance imaging: Abnormal inferior colliculus activation. J Neurophysiol 83: 1058-1072, 2000. Miller GA and Taylor WG. The perception of repeated bursts of noise. J Acous Soc Am 20: 171182, 1948. Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T and Zilles K. Human primary auditory cortex: Cytoarchitectonic subdivisions and mapping into a spatial reference system. Neuroimage 13: 684-701, 2001. Nudo RJ and Masterton RB. Stimulation-induced [14C]2-deoxyglucose labeling of synaptic activity in the central auditory system. J Comp Neurol 245: 553-565, 1986. Ogawa S, Menon RS, Tank DW, Kim SG, Merkle H, Ellerman JM and Ugurbil K. Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging: A comparison of signal characteristics with a biophysical model. Biophys J 64: 803-812, 1993. Penhune VB, Zatorre RJ, MacDonald JD and Evans AC. Interhemispheric anatomical differences in human primary auditory cortex: Probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6: 661-672, 1996. Phillips DP. Temporal response features of cat auditory cortex neurons contributing to sensitivity to tones delivered in the presence of continuous noise. Hear Res 19: 253-268, 1985. Phillips DP, Hall SE and Hollett JL. Repetition rate and signal level effects on neuronal responses to brief tone pulses in cat auditory cortex. J Acoust Soc Am 85: 2537-2549, 1989. Rademacher J, Caviness J, V.S., Steinmetz H and Galaburda AM. Topographical variation of the human primary cortices: Implications for neuroimaging, brain mapping, and neurobiology. Cereb Cortex 3: 313-329, 1993. Rademacher J, Morosan P, Schormann T, Schleicher A, Werner C, Freund HJ and Zilles K. Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13: 669-683, 2001. Rao SM, Binder JR, Hammeke TA, Bandettini PA, Bobholz JA, Frost JA, Myklebust BM, Jacobson RD and Hyde JS. Somatotopic mapping of the human primary motor cortex with functional magnetic resonance imaging. Neurology 45: 919-924, 1995. References 173 Ravicz ME, Melcher JR and Kiang NY-S. Acoustic noise during functional magnetic resonance imaging. J Acoust Soc Am 108: 1683-1696, 2000. Ravicz ME and Melcher JR. Isolating the auditory system from acoustic noise during functional magnetic resonance imaging: Examination of noise conduction through the ear canal, head, and body. J Acoust Soc Am 109: 216-231, 2001. Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150: 104-118, 2000. Rees G, Friston K and Koch C. A direct quantitative relationship between the functional properties of human and macaque V5. Nat Neurosci 3: 716-723, 2000. Rivier F and Clarke S. 
Appendix

SUBJECTS

Table A-1 (below): Subject information for the majority of sessions included as part of this thesis. All sessions listed used a single imaging slice. The slice plane for sessions 1-38 intersected the inferior colliculi and the posterior aspect of Heschl's gyri. The slice plane for sessions 39-44 intersected the inferior colliculi and the medial geniculate bodies. Except for sessions 17 and 18, all sessions utilized cardiac gated imaging. Sessions 1-5 and 39-44 are the sessions of "Exp. I" and "Exp. II", respectively, of Chap. 2.[1] Sessions 1-27 and 29-38 are the 37 "single-slice" imaging sessions of Chap. 4.[2] Sessions 39-44 and 28-38 involved a task (see Chaps. 2 and 4, respectively). Seven subjects were imaged in multiple sessions, grouped as follows: (1) 2, 16; (2) 3, 11, 17, 28, 30, 35, 41; (3) 5, 10, 12, 14, 38, 42; (4) 6, 29, 37; (5) 15, 39; (6) 31, 36; (7) 33, 34. The scanner abbreviations are as follows (see also Chap. 4): A: 1.5T General Electric scanner, retrofitted by Advanced NMR systems; B: 3.0T General Electric scanner, retrofitted by Advanced NMR systems; C: 1.5T Siemens Sonata scanner; D: 3.0T Siemens Allegra scanner; E: 1.5T General Electric Signa Horizon scanner.

[1] The relationship between the subject numbers referred to in Chap. 2 and the session numbers of this Appendix is as follows. The data for subjects 1 and 4 were obtained in the like-numbered sessions. The data for subject 2 were obtained in sessions 3 ("Exp. I") and 41 ("Exp. II"). The data for subject 3 were obtained in session 2. The data for subject 5 were obtained in sessions 5 ("Exp. I") and 42 ("Exp. II"). The data for subjects 6, 7, 8, and 9 of Chap. 2 were obtained in sessions 39, 40, 43, and 44, respectively.
[2] Session 28 presented continuous noise over a 20 dB range of levels to a subject that was subsequently reimaged using a larger 40 dB range. Data from this session was excluded from Chap. 4, but is included as part of this Appendix.

Table A-1
Session  Gender  Handedness  Age  Scanner
1        M       R           25   A
2        M       R           27   A
3        F       R           35   A
4        M       R           22   A
5        M       R           25   A
6        F       R           25   A
7        M       R           21   A
8        M       R           26   A
9        F       L           30   A
10       M       R           26   A
11       F       R           36   B
12       M       R           26   B
13       M       R           26   B
14       M       R           26   B
15       M       L           21   B
16       M       R           29   B
17       F       R           38   C
18       F       R           26   C
19       M       R           26   C
20       M       R           24   C
21       M       L           24   C
22       M       R           22   D
23       M       R           22   D
24       F       R           28   D
25       F       R           21   D
26       M       R           21   D
27       M       L           26   D
28       F       R           36   E
29       F       R           30   E
30       F       R           37   E
31       F       R           30   E
32       F       R           24   E
33       M       R           32   E
34       M       R           32   E
35       F       R           38   E
36       F       R           30   E
37       F       R           27   E
38       M       R           27   E
39       M       L           19   A
40       M       R           23   A
41       F       R           35   A
42       M       R           25   A
43       M       R           22   A
44       M       R           32   A

HG DATA

Table A-2: Response quantification for Heschl's gyrus based on the "OSORU" basis functions. The amplitudes of a given basis function (Onset, Sustained, Ramp, Offset, and Undershoot) were averaged across the active voxels (p < 0.001 in the OSORU-based maps) of both hemispheres, and then converted to a "percent change" scale (see Chap. 4). Positive amplitudes represent a positive deviation from baseline (e.g., Figure 3-1). The waveshape index (WI) for the active voxels was computed as in Chap. 4. The "sound on" period was always 30 s in duration. NBs: broadband noise bursts. TBs: tone bursts. BP NBs: "band-pass" (third octave) noise bursts. All individual bursts were 25 ms in duration, unless a sound-time fraction (STF) for the train is explicitly specified. All stimuli were presented at approximately 55 dB above threshold (SL), unless a value for "dB SL" is specified. See Chap. 4 for additional stimulus details. The session numbers correspond to those in Table A-1. "# Active Vox": total number of active voxels (across both hemispheres). "# HG Vox": total number of voxels in the HG region.
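To make the quantification described in the caption concrete, the following Python sketch shows one way the tabulated amplitudes could be computed from per-voxel general linear model estimates. It is a minimal illustration only: the function name, array names, input shapes, and the baseline-normalization step are assumptions, and the waveshape index (WI) is omitted because its formula is defined in Chap. 4 rather than reproduced here.

```python
import numpy as np

P_THRESHOLD = 0.001  # voxel significance criterion used for Tables A-2 and A-3
BASIS_NAMES = ["Onset", "Sustained", "Ramp", "Offset", "Undershoot"]

def roi_basis_amplitudes(amplitudes, p_values, baseline_signal):
    """Summarize OSORU basis-function amplitudes over the active voxels of an ROI.

    amplitudes      : (n_voxels, 5) per-voxel GLM amplitude estimates (hypothetical
                      output of an earlier fit, in raw signal units), one column per
                      basis function in BASIS_NAMES order.
    p_values        : (n_voxels,) per-voxel significance from the OSORU-based map.
    baseline_signal : (n_voxels,) mean signal during the pre-stimulus baseline.

    Returns a dict with the averaged amplitudes on a percent-change scale and the
    active-voxel count, mirroring the columns of Table A-2.
    """
    active = p_values < P_THRESHOLD                  # "active" voxels, hemispheres pooled
    if not np.any(active):
        return {"n_active": 0}
    mean_amp = amplitudes[active].mean(axis=0)       # average each amplitude across active voxels...
    mean_baseline = baseline_signal[active].mean()   # ...then express relative to the mean baseline
    pct_change = 100.0 * mean_amp / mean_baseline
    summary = {name: float(pct_change[i]) for i, name in enumerate(BASIS_NAMES)}
    summary["n_active"] = int(np.count_nonzero(active))
    return summary
```

In this reading, averaging across active voxels precedes the conversion to percent change, following the order stated in the caption; one such call per session and stimulus would populate the rows of the table.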
Stimulus session 1 session 2 session 3 session 4 session 5 session 6 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF WI Onset Sustained Ramp Offset 0.16 0.28 0.88 0.16 0.19 0.40 0.78 0.15 0.34 0.68 0.43 0.17 0.21 0.45 0.10 0.07 0.28 0.53 0.17 0.21 -0.11 0.60 1.73 0.52 0.79 1.94 2.57 0.49 1.90 2.20 0.73 0.15 0.34 1.14 0.37 0.32 1.02 1.58 0.39 0.66 1.38 1.32 0.06 1.19 1.28 0.50 -0.99 1.21 0.21 -0.71 0.90 1.17 1.35 0.61 1.45 1.83 1.29 0.37 0.88 0.59 0.79 0.38 0.28 -0.07 -0.04 1.31 1.43 -0.16 1.38 1.40 -0.02 0.04 0.20 0.72 0.14 -0.04 1.01 0.98 -0.23 0.69 0.84 0.60 1.64 -1.20 -1.04 0.56 1.42 -0.01 -0.82 0.81 0.67 0.39 0.47 0.76 -0.32 -0.36 0.56 1.00 -0.61 -0.20 0.76 0.44 1.18 0.75 0.01 0.09 0.55 1.01 1.05 0.44 Undershoot # Active Vox 0.62 11 0.22 16 -0.08 5 0.45 4 0.00 10 0.17 11 0.44 13 -0.27 3 0.20 5 -0.79 7 0.46 1 0.53 11 0.33 16 1.27 6 0.25 20 0.08 23 0.45 32 0.10 23 0.02 6 0.21 10 -0.07 0.14 4 4 # HG Vox 45 45 45 48 48 48 48 46 46 46 47 47 47 47 56 56 56 56 78 78 78 78 178 session 7 session 8 session 9 session 10 session 11 session 12 session 13 Appendix Undershoot # Active Vox 0.34 9 0.58 13 # HG Vox 59 59 Stimulus WI Onset Sustained Ramp Offset 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF 1/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs music cont. noise 35/s NBs 35/s NBs, 25% STF 35/s clicks music 0.17 0.23 0.63 1.17 1.82 1.30 -1.24 0.17 -0.09 -0.36 0.65 0.45 2.75 2.20 -0.45 0.59 1.64 0.75 1.28 0.65 -0.44 0.17 12 13 59 59 0.11 0.36 0.44 0.44 0.83 2.81 1.59 1.74 0.17 -0.12 -0.15 1.40 -0.32 1.33 0.40 -0.56 0.05 0.49 14 8 8 59 61 61 0.70 0.55 3.35 3.08 -1.98 -1.63 1.35 1.36 1.32 0.29 0.99 0.40 8 9 61 61 0.08 0.20 0.24 0.51 0.66 1.01 2.75 0.84 0.89 0.13 0.23 0.43 -0.75 -0.07 -0.07 -0.40 -0.04 0.11 8 16 16 61 58 58 0.70 0.56 1.69 1.11 -0.01 0.59 0.36 0.26 0.90 0.94 0.29 0.07 7 12 58 58 0.00 0.10 0.16 -0.05 0.46 0.74 1.65 1.64 1.68 -0.08 0.35 0.49 -0.38 -0.20 0.11 0.02 0.26 0.30 9 23 29 58 59 59 0.46 0.43 1.73 1.44 0.63 0.80 1.04 0.64 0.94 0.75 0.69 0.43 18 25 59 59 0.41 0.30 0.41 0.65 0.17 0.16 0.29 0.41 0.05 0.35 0.62 0.31 1.62 1.24 2.44 3.16 0.72 0.74 0.95 1.53 0.30 1.15 1.50 0.89 -0.02 0.53 -0.16 -0.74 1.06 1.40 1.67 0.71 2.68 0.47 -0.02 0.62 0.80 0.65 1.49 1.32 0.66 0.34 0.82 1.06 0.45 0.07 0.84 0.65 -0.61 -1.14 0.02 0.93 -0.50 -0.41 0.81 0.75 -0.79 -0.11 0.86 0.23 0.27 0.22 -1.41 -0.27 0.25 -0.09 0.00 0.38 -0.61 -0.21 0.21 0.08 8 12 22 25 26 25 28 27 21 13 7 16 48 48 48 48 49 49 49 49 49 44 44 44 0.32 0.14 0.82 0.33 0.81 1.02 0.39 -0.38 0.34 -0.18 -0.07 -0.56 17 11 44 44 HG Data 179 Undershoot # Active Vox 0.08 30 0.11 25 0.18 23 # HG Vox 55 55 55 Stimulus WI Onset Sustained Ramp Offset session 14 cont. noise 35/s NBs 35/s NBs, 50% STF 35/s NBs, 25% STF 35/s clicks music session 15 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music session 16 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music session 17 2/s TBs, 500 Hz 2/s NBs 35/s TBs, 500 Hz 35/s NBs music session 18 2/s NBs 35/s NBs music session 19 2/s TBs, 4 kHz 2/s NBs 35/s TBs, 4 kHz 35/s NBs music session 20 speech cont. 
noise 35/s NBs music 0.17 0.50 0.36 0.54 1.04 0.84 0.98 0.40 0.54 0.09 0.31 0.68 -0.09 0.56 0.39 0.40 0.71 0.71 0.59 0.65 0.05 30 55 0.33 0.03 0.22 0.28 0.79 0.14 0.62 0.90 0.60 1.79 0.76 0.52 0.73 0.60 0.00 0.38 0.35 -0.02 -0.32 -0.11 -0.09 0.21 -0.15 -0.13 27 29 16 22 55 55 56 56 0.60 1.35 -0.74 1.26 0.28 0.34 19 56 0.67 0.72 1.85 1.49 -0.56 -0.40 0.91 0.27 0.63 0.66 0.08 0.01 9 19 56 56 0.14 0.16 0.19 0.78 0.45 0.74 1.91 1.23 1.10 0.33 -0.55 0.16 -0.44 -0.35 -0.45 -0.37 -0.18 0.12 11 18 16 56 47 47 0.67 1.62 -0.05 0.42 0.75 0.07 13 47 0.69 0.81 2.28 1.65 -0.24 -0.26 0.27 0.19 0.88 1.01 -0.20 -0.27 15 24 47 47 0.14 0.29 0.75 1.08 1.99 0.78 -0.11 -0.01 -0.63 -0.60 -0.11 0.22 11 3 47 55 0.28 0.42 0.90 0.95 0.65 0.41 0.11 0.84 -0.15 0.53 -0.03 -0.24 11 5 55 55 0.53 0.10 0.25 0.61 0.17 – 1.47 0.53 0.29 0.88 0.64 – 0.17 1.77 0.73 0.26 1.13 – 0.61 0.68 0.02 0.43 0.19 – 0.60 -1.12 0.23 0.79 -0.10 – -0.21 -0.06 0.07 -0.11 0.21 – 8 8 16 14 8 0 55 55 48 48 48 70 0.44 – 1.15 – -0.04 – 0.62 – 0.09 – 0.14 – 5 0 70 70 0.41 0.16 0.15 0.40 0.41 0.15 1.08 0.39 0.38 0.91 0.90 0.37 0.46 1.32 0.88 0.33 0.15 0.84 -0.14 -0.91 0.02 0.37 0.49 0.02 0.12 -0.31 -0.65 0.24 0.16 -0.05 -0.41 -0.74 -0.29 0.00 0.20 -0.03 8 10 18 7 12 6 70 70 56 56 56 56 180 Appendix Stimulus session 21 2/s TBs, 4 kHz 35/s TBs, 4 kHz 35/s TBs, 500 Hz 35/s NBs music session 22 2/s NBs 2/s BP NBs, 4 kHz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music session 23 2/s NBs 2/s BP NBs, 500 Hz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music session 24 2/s NBs cont. noise 35/s NBs music session 25 2/s NBs cont. noise 35/s NBs music session 26 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music session 27 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music Undershoot # Active Vox -0.15 15 # HG Vox 57 WI Onset Sustained Ramp Offset 0.27 0.51 0.62 -0.11 0.08 0.49 0.67 0.44 0.13 0.49 -0.07 10 57 0.51 1.08 0.51 0.23 0.66 0.22 9 57 0.49 0.00 0.28 0.18 1.00 -0.82 0.70 0.39 0.69 2.32 0.44 0.69 -0.04 -0.64 0.23 0.02 0.63 -0.20 -0.29 -0.36 0.11 -0.56 -0.14 -0.15 14 12 23 17 57 57 60 60 0.53 0.84 0.01 0.65 0.41 0.16 18 60 0.60 1.28 -0.04 0.53 0.53 0.04 22 60 0.55 0.10 0.28 0.30 1.02 0.41 0.52 1.21 0.25 1.81 1.00 0.58 0.04 -0.18 -0.45 0.47 0.41 -0.19 0.21 -0.36 -0.10 -0.05 0.59 0.03 32 16 15 16 60 60 52 52 0.68 1.60 -0.45 0.81 0.58 0.30 16 52 0.66 2.02 -0.32 0.70 0.69 -0.05 19 52 0.64 0.17 0.18 0.43 0.59 0.02 0.31 0.36 0.79 0.11 0.22 0.14 0.42 0.68 0.55 0.00 0.26 0.24 0.56 0.80 0.77 0.03 1.77 0.97 0.40 1.08 1.13 0.04 0.71 0.94 0.98 0.37 0.79 0.54 1.50 2.09 1.55 -0.32 1.03 1.01 1.64 1.12 1.69 0.05 0.00 2.43 0.66 0.12 -0.01 0.80 0.26 0.18 -0.20 1.18 0.69 1.21 0.42 -0.26 0.26 2.49 0.63 1.06 0.14 -0.22 -0.32 2.59 0.37 -0.95 0.07 0.29 0.27 0.20 0.39 0.36 0.33 0.30 0.66 0.36 1.00 1.01 0.39 0.11 0.60 0.10 1.00 0.90 1.01 -0.53 0.74 -0.21 -0.21 0.08 0.33 -0.41 -0.43 -0.09 0.57 -0.67 -0.38 -0.68 0.51 1.07 0.67 -0.15 -0.29 -0.78 0.90 1.23 1.18 0.11 -0.19 -1.17 0.02 -0.10 -0.04 -0.01 0.43 -0.18 0.29 -0.41 -0.32 -0.06 0.29 0.00 -0.26 0.46 0.04 -0.51 -0.03 0.06 0.03 -0.85 20 9 28 25 25 18 20 21 26 14 14 17 17 10 10 3 27 38 36 33 33 20 52 52 62 62 62 62 49 49 49 49 52 52 52 52 52 52 62 62 62 62 62 62 HG Data 181 Undershoot # Active Vox -0.67 14 # HG Vox 33 Stimulus WI Onset Sustained Ramp Offset session 28 cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 29 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 30 cont. noise, 35 dB SL cont. noise cont. noise, 75 dB SL music session 31 cont. 
noise, 35 dB SL cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 32 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 33 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 34 music, 30 dB SL music, 50 dB SL music, 60 dB SL 0.38 0.99 0.07 0.51 -0.14 0.31 0.38 0.77 0.92 0.18 0.48 0.57 0.38 -0.46 0.29 -0.39 -0.23 18 21 33 33 0.05 0.71 0.15 1.16 1.40 -0.18 0.09 0.51 -0.17 0.59 -0.16 0.06 13 5 33 79 0.69 0.74 0.95 1.02 -0.17 -0.63 0.46 0.76 0.45 0.49 -0.45 -0.28 11 20 79 79 0.62 0.35 -0.13 0.48 0.87 -0.08 19 79 0.03 0.28 -0.06 0.75 1.62 0.48 -0.42 0.20 0.10 -0.23 -0.18 -0.38 6 18 79 36 0.26 0.32 0.64 1.14 0.65 0.46 -0.11 0.39 -0.14 -0.08 -0.25 -0.07 18 23 36 36 0.09 0.50 0.37 2.36 1.62 -1.30 0.11 2.20 -0.75 -0.36 0.07 -0.76 15 15 36 54 0.62 2.16 -1.33 2.08 0.51 -0.84 22 54 0.64 0.63 1.79 1.50 -1.21 -0.82 1.87 1.87 0.52 0.54 -1.27 -0.52 14 25 54 54 0.24 0.64 1.30 1.13 0.88 -0.43 1.00 0.69 -0.17 0.32 -0.79 -0.28 19 3 54 47 0.79 0.84 1.19 0.99 -0.31 -0.22 0.63 0.62 0.69 0.82 -0.26 0.01 8 9 47 47 0.70 1.18 -0.56 0.79 0.47 -0.12 8 47 0.11 0.64 -0.29 1.96 1.69 -0.80 -0.15 1.03 0.46 0.56 -0.22 -0.16 5 8 47 49 0.77 0.74 1.26 1.58 -0.01 -0.31 0.50 1.16 1.06 1.16 0.43 0.43 9 9 49 49 0.67 1.62 -0.52 1.28 0.73 0.00 11 49 0.04 0.32 -0.07 1.41 2.45 0.76 0.38 0.13 0.20 -0.16 -0.06 -0.57 10 8 49 45 0.25 0.96 0.74 0.37 -0.30 -0.37 11 45 0.28 0.97 0.61 0.31 -0.01 -0.88 14 45 182 Appendix Stimulus session 35 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 36 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 37 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 38 music, 30 dB SL music, 50 dB SL music, 60 dB SL Undershoot # Active Vox -0.15 11 # HG Vox 38 WI Onset Sustained Ramp Offset 0.22 0.85 0.95 0.24 -0.76 0.26 1.00 0.65 0.53 -0.82 -0.24 24 38 0.26 0.97 0.68 0.43 -0.57 -0.22 22 38 0.30 1.37 0.64 0.60 -0.95 -0.64 15 50 0.31 1.50 0.39 1.11 -0.77 -0.22 17 50 0.24 0.94 0.61 0.89 -1.15 -0.29 13 50 0.12 0.41 1.11 0.37 -0.47 -0.07 15 80 0.31 1.16 0.44 0.53 -0.41 -0.62 17 80 0.25 0.72 0.64 0.49 0.08 -0.65 22 80 0.12 0.35 1.26 -0.31 -0.12 0.04 8 52 0.17 0.56 1.06 0.11 -0.12 0.10 16 52 0.03 0.10 1.62 -0.13 -0.16 -0.07 14 52 STG Data 183 STG DATA Table A-3: Same as Table A-2, but for the superior temporal gyrus. 
Stimulus session 1 session 2 session 3 session 4 session 5 session 6 session 7 session 8 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music Undershoot # Active Vox 0.64 8 0.67 6 0.53 7 -0.20 7 -0.10 11 -0.20 6 -0.08 11 – 0 -0.36 5 -0.64 18 1.07 4 0.70 17 0.29 14 0.76 6 0.31 18 0.21 23 0.62 20 0.05 17 -0.36 9 0.43 10 # STG Vox 63 63 63 76 76 76 76 69 69 69 75 75 75 75 71 71 71 71 89 89 WI Onset Sustained Ramp Offset 0.17 0.45 0.81 0.27 0.17 0.57 0.92 – 0.48 0.75 0.22 0.13 0.30 0.52 0.17 0.16 0.31 0.60 0.36 0.25 -0.13 1.77 1.63 1.59 0.84 2.19 2.10 – 3.04 3.16 -0.35 0.44 0.83 0.64 1.10 1.14 1.74 2.29 0.87 1.25 1.28 0.63 0.09 0.82 1.73 -0.15 -0.46 – -1.49 -1.35 1.35 1.00 1.08 0.72 1.99 2.12 1.76 0.57 1.22 0.96 0.69 0.90 0.48 1.15 -0.04 1.39 0.64 – 3.27 2.13 -0.30 0.49 0.37 -0.28 0.23 0.42 1.62 1.12 -0.77 0.53 0.87 0.79 1.76 -0.58 -0.01 0.91 1.77 – -1.28 1.58 0.91 -0.18 0.44 1.59 -0.25 -0.25 0.91 1.82 0.35 -0.01 0.68 0.73 1.26 1.15 -0.01 -0.26 0.61 1.08 0.86 0.94 -0.07 0.27 6 2 89 89 0.23 0.23 0.59 1.03 1.35 1.17 -0.42 0.00 0.20 -0.12 0.29 0.79 12 18 80 80 0.68 0.67 2.24 2.41 -0.53 0.03 1.12 0.34 0.85 1.12 -0.63 0.32 15 7 80 80 0.13 0.27 0.37 0.23 0.66 1.68 1.58 1.64 1.12 -0.07 0.18 0.66 0.24 0.62 0.63 -0.45 -0.26 -0.15 41 15 13 80 117 117 0.66 0.50 2.94 2.74 -1.88 -3.08 1.14 2.63 0.93 -0.41 0.39 0.98 11 9 117 117 0.02 0.00 3.09 -0.35 0.09 -0.27 14 117 184 session 9 session 10 session 11 session 12 session 13 session 14 session 15 Appendix Undershoot # Active Vox 0.15 30 0.34 31 # STG Vox 80 80 Stimulus WI Onset Sustained Ramp Offset 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF 1/s NBs 2/s NBs 10/s NBs 35/s NBs 1/s NBs 2/s NBs 10/s NBs 35/s NBs music cont. noise 35/s NBs 35/s NBs, 25% STF 35/s clicks music cont. 
noise 35/s NBs 35/s NBs, 50% STF 35/s NBs, 25% STF 35/s clicks music 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music 0.21 0.43 0.34 0.97 1.59 1.06 0.05 0.37 0.62 0.91 0.80 0.55 2.19 1.82 -0.14 0.31 0.05 0.94 1.33 1.04 -0.19 0.36 20 24 80 80 0.12 0.07 0.20 -0.53 0.45 0.84 2.35 2.36 2.81 -0.44 0.82 0.58 0.68 0.02 0.71 0.07 0.70 0.57 29 34 35 80 61 61 0.43 0.47 1.72 1.74 1.63 1.65 0.77 0.58 1.51 1.78 1.01 0.59 32 26 61 61 0.42 0.40 0.63 0.65 0.13 0.15 0.31 0.43 0.01 0.55 0.89 0.56 1.73 1.46 3.22 4.11 0.98 1.25 1.44 2.57 -0.04 1.68 1.33 1.36 -0.18 0.15 -0.91 -1.25 2.18 2.67 2.70 1.52 4.74 -0.05 -0.30 0.19 1.03 0.45 1.43 1.28 0.98 0.67 1.34 1.31 0.34 0.32 0.56 0.78 -0.49 -0.88 0.84 1.24 -0.24 -0.07 1.66 1.52 0.10 0.28 1.05 0.81 0.31 0.58 -1.72 -0.53 0.39 0.11 0.37 0.55 -0.37 -0.29 0.23 0.24 14 16 30 38 40 39 42 40 38 33 32 28 67 67 67 67 77 77 77 77 77 65 65 65 0.63 0.14 0.25 0.64 0.51 1.49 0.54 0.87 1.71 1.31 0.20 1.50 1.82 0.48 0.83 0.49 -0.30 -0.22 0.28 0.74 0.95 -0.07 0.40 1.26 1.27 -0.16 -0.55 0.13 0.15 0.55 32 28 29 24 22 65 65 66 66 66 0.39 0.93 1.30 0.47 1.12 0.36 28 66 0.43 0.04 0.23 0.27 1.03 0.04 0.70 1.11 1.08 3.23 0.87 0.70 0.67 1.45 -0.09 0.51 1.24 0.33 -0.05 0.02 0.17 0.24 -0.23 -0.08 27 35 20 22 66 66 63 63 0.69 1.72 -0.49 1.06 0.71 0.28 23 63 0.72 0.76 1.93 1.90 -0.35 -0.65 0.56 0.24 0.84 0.97 -0.07 0.01 16 28 63 63 0.14 0.40 2.43 -0.05 0.41 -0.23 18 63 STG Data 185 Stimulus session 16 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music session 17 2/s TBs, 500 Hz 2/s NBs 35/s TBs, 500 Hz 35/s NBs music session 18 2/s NBs 35/s NBs music session 19 2/s TBs, 4 kHz 2/s NBs 35/s TBs, 4 kHz 35/s NBs music session 20 speech cont. noise 35/s NBs music session 21 2/s TBs, 4 kHz 35/s TBs, 4 kHz 35/s TBs, 500 Hz 35/s NBs music session 22 2/s NBs 2/s BP NBs, 4 kHz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music Undershoot # Active Vox -0.17 26 0.02 19 # STG Vox 77 77 WI Onset Sustained Ramp Offset 0.24 0.18 0.90 0.75 1.06 1.15 -0.20 0.26 -0.05 -0.37 0.71 1.63 0.10 0.23 0.99 0.08 20 77 0.77 0.89 1.79 1.80 -0.30 -0.33 0.01 -0.01 0.96 1.40 -0.39 -0.14 25 24 77 77 0.16 0.25 0.84 1.08 1.86 1.12 -0.16 -0.03 -0.39 -0.03 -0.20 0.22 17 31 77 69 0.22 0.66 1.01 1.79 1.33 0.05 0.22 0.91 0.05 1.24 0.23 -0.06 31 29 69 69 0.75 0.12 0.19 0.61 0.10 0.50 2.13 0.23 0.37 0.82 0.36 -0.43 -0.19 2.82 0.92 0.29 1.23 -1.72 0.76 -0.04 0.28 0.43 0.42 2.22 1.34 0.59 0.16 0.98 0.01 2.98 0.09 0.09 0.14 -0.03 0.30 -0.66 29 22 33 30 26 1 69 69 65 65 65 80 0.40 0.39 2.03 1.31 -0.17 -0.46 1.33 1.62 -0.04 -0.03 0.02 0.36 11 2 80 80 0.54 0.26 0.09 0.51 0.57 0.09 0.29 1.27 1.21 0.38 1.42 1.09 0.20 1.02 0.67 1.38 1.75 0.60 0.34 2.13 0.96 -0.27 -0.56 -0.15 0.27 0.49 -0.32 0.11 0.69 -0.39 -0.80 0.77 0.81 0.21 0.16 -0.30 -0.88 -0.52 -0.05 0.41 -0.32 -0.20 19 17 51 21 32 36 12 80 80 89 89 89 89 66 0.41 1.08 0.38 0.61 0.37 -0.37 10 66 0.51 0.84 0.80 0.02 0.98 0.52 19 66 0.56 0.00 0.32 0.31 1.18 -0.66 0.70 0.61 0.88 2.40 0.22 0.25 -0.24 -0.53 0.39 0.24 0.98 -0.04 -0.33 -0.32 0.06 -0.37 -0.08 -0.15 20 28 29 25 66 66 72 72 0.64 0.94 -0.27 0.79 0.43 0.05 18 72 0.68 1.22 -0.26 0.66 0.51 0.22 27 72 0.56 0.18 1.46 0.59 -0.56 2.09 0.64 -0.36 0.18 0.32 -0.12 0.33 33 29 72 72 186 Appendix Stimulus session 23 2/s NBs 2/s BP NBs, 500 Hz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music session 24 2/s NBs cont. noise 35/s NBs music session 25 2/s NBs cont. 
noise 35/s NBs music session 26 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music session 27 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music session 28 cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 29 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music Undershoot # Active Vox 0.42 18 0.01 26 # STG Vox 83 83 WI Onset Sustained Ramp Offset 0.42 0.28 0.65 0.88 1.02 0.66 -0.37 0.20 0.60 0.03 0.74 1.45 -0.26 0.42 0.71 0.17 16 83 0.64 1.81 0.01 0.39 0.78 -0.25 21 83 0.71 0.20 0.24 0.54 0.65 0.13 0.35 0.61 0.87 0.11 0.32 0.14 0.51 0.67 0.64 0.09 0.25 0.38 0.65 0.94 0.78 0.27 0.59 1.71 0.28 0.74 1.41 1.49 0.39 0.74 1.10 0.54 0.43 1.05 0.57 1.72 1.87 0.81 -2.09 0.86 1.41 1.94 1.56 1.78 0.52 1.28 -0.06 1.69 0.69 -0.04 -0.10 2.44 -0.04 -0.13 -0.62 1.48 0.28 1.36 -0.18 -0.94 0.31 6.62 0.99 0.69 0.15 -0.27 -0.07 2.30 -0.11 0.31 -1.24 0.21 0.21 0.25 -0.85 0.69 0.38 0.36 0.22 0.95 0.20 0.93 1.18 0.24 -2.09 0.61 0.04 0.43 0.64 0.75 -0.71 0.61 0.84 0.27 -0.18 0.19 0.47 0.25 -0.35 0.32 0.40 -0.48 0.09 -0.30 0.32 0.65 0.89 1.24 0.23 0.18 1.04 1.48 1.48 1.13 0.48 0.01 -0.49 0.11 -0.20 0.08 -0.26 0.89 -0.02 0.71 -0.19 -0.07 -0.51 0.32 0.05 -0.20 1.44 0.19 -0.48 0.04 0.21 0.19 -0.90 -0.25 24 21 46 46 43 36 28 10 28 15 10 27 10 10 5 1 36 48 33 23 33 11 32 83 83 63 63 63 63 57 57 57 57 62 62 62 62 62 62 81 81 81 81 81 81 80 0.61 0.78 1.11 1.17 0.05 0.29 0.57 0.01 0.64 1.27 -0.14 0.14 29 30 80 80 0.28 0.93 0.24 1.87 1.82 -0.48 -0.66 0.98 1.36 1.60 -0.02 0.27 20 7 80 69 0.87 0.80 2.67 2.31 -0.38 -1.42 0.48 1.03 1.98 1.36 -0.59 -0.61 7 10 69 69 0.72 0.72 -0.40 0.72 1.66 -0.73 12 69 0.58 2.63 1.02 0.27 1.78 -0.91 5 69 STG Data 187 Undershoot # Active Vox 0.09 37 # STG Vox 82 Stimulus WI Onset Sustained Ramp Offset session 30 cont. noise, 35 dB SL cont. noise cont. noise, 75 dB SL music session 31 cont. noise, 35 dB SL cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 32 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 33 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. 
noise, 75 dB SL music session 34 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 35 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 36 music, 30 dB SL music, 50 dB SL music, 60 dB SL 0.63 0.79 0.70 -0.48 0.81 0.66 0.87 0.84 1.44 0.67 -0.12 -0.51 0.41 0.90 1.22 0.25 0.08 36 42 82 82 0.32 0.50 1.37 1.53 2.00 -1.13 -0.13 1.71 0.72 -0.55 0.39 -0.52 29 11 82 49 0.55 1.22 -1.02 1.57 0.13 -0.59 13 49 0.69 0.68 1.33 1.08 -0.93 -0.78 1.26 1.57 0.51 0.39 -0.88 -0.35 7 13 49 49 0.19 0.68 0.62 1.80 0.82 -0.96 0.46 1.14 -0.57 0.64 -0.57 -0.55 24 15 49 51 0.88 0.86 1.92 1.95 -0.60 -0.91 0.70 1.02 1.46 1.41 -0.34 -0.04 16 26 51 51 0.81 1.68 -0.97 1.06 1.03 -0.02 26 51 0.21 0.75 0.56 2.55 2.65 -0.73 0.09 0.80 0.95 1.26 -0.84 -0.32 18 20 51 76 0.73 0.93 1.56 1.72 0.41 -0.32 0.18 0.89 1.83 1.71 0.52 0.34 13 27 76 76 0.81 2.24 -0.57 1.16 1.39 -0.26 19 76 0.17 0.30 -0.90 1.42 3.88 0.86 -1.14 0.38 1.68 0.08 0.24 -0.23 26 41 76 92 0.23 0.74 1.32 -0.09 0.17 -0.33 55 92 0.24 0.75 1.34 0.11 0.29 -0.57 53 92 0.28 1.67 1.20 0.25 -0.20 0.20 34 68 0.24 1.49 1.36 0.40 -0.27 -0.27 34 68 0.39 1.73 0.89 0.58 0.53 -0.26 35 68 0.32 1.22 0.47 0.47 -0.90 -0.43 27 47 0.30 1.18 0.33 0.93 -0.97 -0.13 24 47 0.26 0.87 0.48 0.63 -0.99 -0.01 15 47 188 Appendix Stimulus session 37 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 38 music, 30 dB SL music, 50 dB SL music, 60 dB SL Undershoot # Active Vox 0.00 13 # STG Vox 73 WI Onset Sustained Ramp Offset 0.28 0.73 1.41 0.00 0.48 0.39 2.29 0.05 1.15 -0.43 -0.49 11 73 0.28 1.65 0.92 1.01 0.08 -0.31 11 73 0.02 0.09 2.57 -0.53 -0.41 0.11 30 67 0.05 0.33 2.85 -0.33 -0.37 0.06 37 67 0.01 -0.01 3.39 -0.11 0.04 0.33 35 67 IC Data 189 IC DATA Table A-4: Response quantification for the inferior colliculus. For each session and stimulus, the most significant (i.e., lowest p-value) voxel in the left and right IC was identified, based on an Fstatistic computed from the “sustained-only” basis “set” (see Chap. 3). For this voxel, an empirical response time course was computed by averaging across repeated presentations of a given stimulus in a given imaging session, and then converting to a percent change in signal relative to the preceding baseline (see Chaps. 2, 3, or 4 for details). Response magnitude was quantified using two measures computed from these percent change time courses: “Time-Average” percent change – the mean percent change from t = 4 to 30 s, and “Onset” percent change – the maximum percent change from t = 4 to 10 s (see Chap. 2).3 See Table A-2 caption for stimulus abbreviations. Sound pressure level (dB SPL) is indicated for the sessions in which this calibration was possible. (SPL was computed based on the root-mean-square of the entire 30 s stimulus, after first filtering by the frequency response of the sound delivery system). Only sessions that utilized cardiac gated imaging are included as part of this table. (Consequently, sessions 17 and 18 of Table A-1 are excluded). NA: not available. 
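As a companion to the caption above, here is a hedged Python sketch of how the two Table A-4 response measures could be computed for a single voxel. The inputs (a trials-by-timepoints array, a time axis with t = 0 at sound onset, and a baseline mask) are assumed for illustration; voxel selection via the "sustained-only" F-statistic and the SPL calibration are not shown.

```python
import numpy as np

def ic_response_measures(trials, t, baseline_mask):
    """Compute the Table A-4 measures for one inferior colliculus voxel.

    trials        : (n_repeats, n_timepoints) signal for repeated presentations
                    of one stimulus in one session (hypothetical input).
    t             : (n_timepoints,) time in seconds, with t = 0 at sound onset.
    baseline_mask : boolean mask selecting timepoints of the preceding baseline.

    Returns (time_average, onset), both as percent signal change.
    """
    mean_tc = trials.mean(axis=0)                    # average across repeated presentations
    baseline = mean_tc[baseline_mask].mean()         # level of the preceding baseline
    pct = 100.0 * (mean_tc - baseline) / baseline    # percent-change time course

    time_average = pct[(t >= 4) & (t <= 30)].mean()  # "Time-Average": mean from t = 4 to 30 s
    onset = pct[(t >= 4) & (t <= 10)].max()          # "Onset": maximum from t = 4 to 10 s
    return float(time_average), float(onset)
```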
LEFT Hemisphere Stimulus session 1 2/s NBs 10/s NBs 35/s NBs session 2 1/s NBs 2/s NBs 10/s NBs 35/s NBs session 3 2/s NBs 10/s NBs 35/s NBs session 4 1/s NBs 2/s NBs 10/s NBs 35/s NBs 3 SPL (dB) NA NA NA NA NA NA NA NA NA NA NA NA NA NA TimeAverage 0.29 0.51 1.67 0.15 -0.05 0.82 1.15 0.66 0.39 0.29 0.30 1.28 0.19 1.58 RIGHT Hemisphere Onset p-value 0.47 0.46 1.93 0.53 0.34 1.48 1.52 1.16 0.82 0.51 -0.24 1.19 0.03 1.79 3.08E-01 1.65E-04 9.64E-14 1.74E-02 4.76E-02 2.66E-04 4.17E-10 3.34E-01 1.60E-01 5.70E-02 1.23E-04 1.54E-02 3.26E-03 7.47E-05 SPL (dB) NA NA NA NA NA NA NA NA NA NA NA NA NA NA TimeAverage 0.84 0.35 1.56 0.41 0.24 0.25 1.04 0.38 0.33 0.78 -0.33 0.43 0.56 1.66 Onset p-value 0.84 0.80 1.41 0.64 0.55 0.48 1.09 0.48 0.44 0.82 -0.17 0.54 0.86 2.63 7.83E-04 5.69E-03 1.57E-12 1.87E-01 1.06E-01 4.01E-01 1.42E-09 4.54E-02 1.42E-02 9.63E-06 2.37E-01 3.25E-02 2.44E-03 5.42E-11 Note that “Onset” percent change of this table (and Chap. 2) has a different definition and interpretation compared to the onset basis function used to quantify results in Chaps. 3 and 4. Note also that the analyses underlying this table differ slightly from those in Chap. 2 in two respects: (1) the voxel was selected based on the “sustained-only” basis set, rather than a t-test, and (2) the lowest p-value voxel (within the anatomically defined IC region) was selected for each stimulus, whereas in Chap. 2 the same voxel (selected from the t-map for the noise burst train with a repetition rate of 35/s) was used to quantify all stimuli within a given session. 190 Appendix LEFT Hemisphere Stimulus session 5 1/s NBs 2/s NBs 10/s NBs 35/s NBs session 6 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF session 7 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music session 8 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music session 9 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF music session 10 2/s NBs 2/s NBs, 50% STF 35/s NBs 35/s NBs, 50% STF session 11 1/s NBs 2/s NBs 10/s NBs 35/s NBs session 12 1/s NBs 2/s NBs 10/s NBs 35/s NBs music RIGHT Hemisphere SPL (dB) NA NA NA NA 77 87 TimeAverage 0.15 0.54 1.09 0.77 -0.30 0.48 Onset p-value TimeAverage 0.06 0.33 1.58 1.43 0.56 0.79 Onset p-value 2.01E-01 1.15E-02 4.03E-05 8.10E-09 2.13E-01 7.30E-06 SPL (dB) NA NA NA NA 80 90 0.33 0.99 1.06 1.23 -0.06 0.78 0.59 0.54 1.73 1.74 0.74 1.00 1.88E-03 2.62E-03 1.57E-10 1.00E-15 1.01E-03 8.63E-05 89 86 0.48 1.24 0.64 1.03 8.47E-08 1.45E-15 92 89 0.17 1.00 0.34 0.96 6.13E-07 4.95E-10 69 79 -0.14 0.76 0.05 1.36 8.57E-02 3.60E-02 83 93 0.14 0.22 0.37 0.47 2.13E-01 6.37E-02 81 78 0.89 0.83 1.80 1.26 1.02E-11 7.57E-10 95 92 0.34 0.67 0.55 0.75 1.28E-07 6.15E-06 78 66 76 0.96 0.72 1.25 1.24 1.09 1.20 2.93E-13 3.92E-03 9.12E-02 92 71 81 0.66 0.56 0.35 0.84 1.09 0.57 3.22E-10 8.75E-04 1.17E-03 78 75 0.42 0.25 0.57 0.51 2.91E-03 4.90E-02 83 80 0.71 0.20 1.25 0.98 3.12E-05 1.41E-01 75 76 86 0.24 0.61 0.27 0.17 0.99 0.69 1.66E-01 3.46E-02 1.37E-03 80 73 83 0.60 0.57 0.48 0.90 0.85 0.73 4.67E-02 1.78E-03 1.21E-05 88 85 0.97 0.78 1.32 1.22 2.46E-11 2.78E-08 85 82 1.13 1.18 1.01 1.45 3.56E-15 1.00E-15 85 71 81 1.34 0.43 1.01 1.38 0.34 1.72 6.90E-07 1.26E-02 6.17E-07 82 64 74 1.73 0.72 2.20 1.77 0.90 6.08 1.00E-15 6.48E-04 1.41E-06 83 80 1.18 1.15 1.34 1.39 4.87E-11 1.92E-08 76 73 1.22 1.03 1.30 1.25 1.06E-10 4.16E-08 64 69 76 81 59 64 71 76 73 -0.45 -0.03 2.77 0.27 0.69 0.37 0.55 1.90 1.51 3.38 3.13 7.70 6.20 1.10 0.55 0.83 1.86 1.36 1.04E-01 2.45E-01 9.17E-03 5.36E-04 2.40E-02 2.36E-05 1.76E-07 1.07E-15 2.37E-06 66 70 77 
82 59 63 70 75 72 -0.48 -0.10 1.76 1.35 -0.22 -0.13 0.95 1.55 1.00 3.57 3.39 8.14 0.60 0.51 0.15 1.52 1.90 1.79 1.37E-01 1.18E-01 1.13E-02 1.24E-04 1.91E-01 7.24E-04 2.60E-07 8.34E-12 3.71E-08 IC Data 191 LEFT Hemisphere Stimulus session 13 cont. noise 35/s NBs 35/s NBs, 25% STF 35/s clicks music session 14 cont. noise 35/s NBs 35/s NBs, 50% STF 35/s NBs, 25% STF 35/s clicks music session 15 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music session 16 2/s NBs 2/s NBs, 70 dB SL 35/s NBs, 40 dB SL 35/s NBs 35/s NBs, 70 dB SL music session 19 2/s TBs, 4 kHz 2/s NBs 35/s TBs, 4 kHz 35/s NBs music session 20 speech cont. noise 35/s NBs music RIGHT Hemisphere SPL (dB) 75 73 71 TimeAverage 0.60 0.55 0.35 Onset p-value TimeAverage 0.78 0.48 0.38 Onset p-value 2.04E-11 6.56E-13 6.91E-10 SPL (dB) 74 76 71 0.84 0.98 0.41 0.80 0.58 0.54 9.25E-11 9.16E-12 7.23E-12 75 71 78 79 77 0.70 0.19 0.72 0.77 0.95 0.78 0.75 0.96 1.10 0.66 3.06E-13 1.30E-03 2.98E-08 3.49E-12 2.76E-10 75 70 78 79 73 0.54 0.58 0.60 0.71 0.64 0.79 0.85 0.84 0.76 0.68 1.00E-15 1.04E-04 1.26E-10 5.73E-15 1.00E-09 75 0.43 0.54 7.12E-09 68 0.31 0.64 2.08E-09 78 74 62 77 0.79 1.27 0.23 -0.02 0.88 1.59 0.37 0.29 1.47E-08 4.49E-07 2.15E-02 1.62E-02 74 74 66 81 0.56 0.74 0.15 0.03 0.63 1.06 0.46 0.32 3.31E-10 1.65E-06 3.39E-02 1.17E-01 59 0.43 0.44 3.06E-06 63 0.52 0.56 2.19E-07 74 89 0.70 0.81 0.78 0.99 1.00E-15 1.00E-15 78 93 0.44 0.84 0.65 0.91 1.15E-13 1.00E-15 71 70 85 1.28 0.57 0.44 1.48 0.57 0.39 2.54E-14 8.49E-09 1.89E-06 75 73 88 0.68 0.16 0.43 1.02 0.37 0.53 1.04E-07 3.90E-04 1.05E-05 67 1.04 1.02 1.00E-15 70 0.86 0.98 1.00E-15 82 97 1.23 1.42 1.55 1.49 1.00E-15 1.00E-15 85 100 0.65 1.41 0.90 1.53 1.00E-15 1.00E-15 79 55 1.38 -0.18 1.52 0.61 5.35E-14 1.46E-02 82 54 1.12 0.59 1.65 1.38 1.22E-08 6.50E-03 63 67 -0.35 -0.35 -0.01 -0.24 1.20E-02 3.09E-02 62 66 0.60 0.16 0.99 0.52 1.08E-02 2.56E-01 75 72 83 70 72 69 0.33 0.63 0.70 0.83 0.86 0.68 0.72 0.52 0.71 1.00 1.06 0.91 1.89E-05 5.90E-03 9.19E-06 3.27E-11 5.56E-15 3.26E-03 74 71 85 73 75 72 0.64 0.74 0.59 0.99 0.83 0.77 1.12 0.49 0.80 1.05 0.85 0.88 2.32E-06 5.27E-03 2.00E-05 1.63E-13 1.00E-15 1.08E-05 192 Appendix LEFT Hemisphere Stimulus session 21 2/s TBs, 4 kHz 35/s TBs, 4 kHz 35/s TBs, 500 Hz 35/s NBs music session 22 2/s NBs 2/s BP NBs, 4 kHz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music session 23 2/s NBs 2/s BP NBs, 500 Hz 35/s BP NBs, 4 kHz 35/s BP NBs, 500 Hz 35/s NBs music session 24 2/s NBs cont. noise 35/s NBs music session 25 2/s NBs cont. 
noise 35/s NBs music session 26 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music session 27 2/s NBs speech 35/s clicks 100/s clicks 35/s NBs music RIGHT Hemisphere SPL (dB) 59 TimeAverage 0.17 Onset p-value TimeAverage 0.62 Onset p-value 7.77E-03 SPL (dB) 61 1.02 1.12 2.19E-05 71 0.35 0.45 2.12E-06 73 0.52 0.92 2.42E-03 85 0.46 0.46 1.41E-06 80 0.97 0.99 1.00E-15 76 73 80 71 0.55 1.33 0.49 0.54 0.51 1.25 0.81 0.80 6.07E-07 5.31E-05 4.30E-02 4.01E-04 74 71 67 61 0.86 0.87 0.19 0.18 1.20 0.67 0.39 0.51 1.55E-14 2.48E-05 7.34E-02 1.32E-03 84 0.25 0.43 6.19E-05 74 0.62 0.69 8.68E-09 98 1.08 1.06 1.21E-12 95 0.82 0.94 1.00E-15 92 89 64 76 1.13 1.03 0.28 0.37 1.44 1.66 0.48 0.56 8.69E-13 2.11E-05 1.10E-02 7.13E-02 79 76 63 77 1.17 0.90 0.33 0.36 1.69 1.35 0.51 0.55 1.00E-15 1.55E-06 1.08E-03 1.42E-03 68 0.55 0.80 1.48E-03 71 0.44 0.73 1.64E-03 92 1.05 1.33 3.94E-11 92 1.27 1.36 1.00E-15 76 73 68 81 80 77 68 81 80 77 63 88 75 77 75 72 70 85 83 81 82 79 1.01 0.79 0.32 0.32 0.35 1.07 0.56 0.98 1.27 0.43 -0.02 0.76 0.87 1.28 0.99 1.03 0.32 0.50 1.16 0.97 0.84 1.16 1.32 1.25 0.51 0.56 0.67 0.92 0.72 1.07 1.17 0.68 1.21 1.00 0.49 1.07 0.82 0.83 0.50 1.04 1.47 1.21 1.16 1.48 8.50E-11 1.89E-03 7.73E-02 9.36E-05 3.04E-04 2.37E-04 3.23E-02 2.73E-10 3.88E-15 2.60E-04 7.42E-02 3.97E-12 2.41E-11 1.00E-15 1.04E-09 1.38E-04 1.44E-02 1.78E-09 1.00E-15 1.00E-15 1.00E-15 1.08E-08 75 72 65 78 77 74 66 79 78 75 63 88 74 77 75 72 75 84 88 80 87 84 1.00 0.77 -0.29 0.55 0.49 0.99 0.08 0.97 0.56 1.22 0.23 0.57 1.04 1.40 1.55 1.36 0.98 0.41 0.85 0.69 0.78 1.07 1.13 1.16 -0.20 0.69 0.94 1.33 0.39 1.51 1.03 1.08 0.66 0.99 0.90 1.51 1.69 1.52 1.19 0.75 1.19 0.64 1.03 1.23 1.00E-15 1.81E-06 2.95E-01 7.05E-09 8.22E-09 3.95E-08 4.28E-02 3.72E-13 4.80E-12 1.79E-05 7.09E-04 4.57E-14 1.00E-15 1.00E-15 1.00E-15 2.76E-09 2.91E-05 2.55E-12 1.00E-15 1.00E-15 1.00E-15 1.06E-11 IC Data 193 LEFT Hemisphere Stimulus session 28 cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 29 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 30 cont. noise, 35 dB SL cont. noise cont. noise, 75 dB SL music session 31 cont. noise, 35 dB SL cont. noise, 45 dB SL cont. noise cont. noise, 65 dB SL music session 32 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. noise, 75 dB SL music session 33 cont. noise, 35 dB SL cont. noise cont. noise, 65 dB SL cont. 
noise, 75 dB SL music session 34 music, 30 dB SL music, 50 dB SL music, 60 dB SL RIGHT Hemisphere SPL (dB) 79 TimeAverage 0.62 Onset p-value TimeAverage 0.76 Onset p-value 7.22E-08 SPL (dB) 77 0.96 1.09 3.71E-07 89 99 0.55 0.90 0.53 1.15 5.46E-08 1.00E-15 87 97 0.62 0.80 0.78 1.11 1.20E-08 2.48E-13 89 64 0.50 0.30 0.46 0.70 3.30E-03 3.14E-04 87 62 0.84 0.20 1.33 0.41 3.75E-06 2.74E-02 84 94 1.06 0.96 1.33 1.00 1.52E-11 1.16E-12 82 92 0.66 0.81 0.65 1.10 3.00E-07 7.90E-12 104 1.07 1.32 1.10E-13 102 0.77 0.88 5.17E-11 84 62 0.19 0.25 0.34 0.64 2.83E-01 3.64E-04 82 60 0.62 0.73 0.62 1.00 1.91E-02 3.94E-10 82 102 0.86 0.88 0.95 1.32 5.61E-15 1.00E-15 80 100 0.90 0.96 1.23 1.15 1.00E-15 1.00E-15 82 63 0.57 0.49 1.01 1.08 1.88E-05 6.37E-03 80 62 0.75 0.64 1.01 1.73 7.21E-06 4.04E-02 73 0.83 1.17 6.84E-10 72 0.72 1.34 6.92E-05 83 93 0.77 0.85 1.17 1.22 1.03E-08 7.30E-13 82 92 0.62 1.15 1.22 1.77 1.01E-08 2.94E-13 83 60 0.64 0.25 1.11 0.85 1.02E-03 8.23E-03 82 55 0.30 0.49 0.61 0.87 1.98E-02 5.61E-03 80 90 0.92 0.96 0.95 1.21 6.57E-09 3.85E-10 75 85 0.89 1.06 1.03 1.13 2.94E-10 1.89E-11 100 1.12 1.02 1.57E-14 95 0.78 0.95 8.75E-15 80 63 1.24 0.53 1.23 0.58 6.29E-05 2.52E-03 75 61 0.85 0.36 0.91 0.43 2.39E-05 4.44E-02 83 93 0.43 1.02 0.45 0.62 5.46E-03 1.81E-08 81 91 0.50 0.98 0.66 1.38 5.16E-04 9.75E-07 103 1.22 1.11 1.91E-09 101 0.86 1.24 1.26E-07 83 64 -0.57 0.31 -0.16 0.54 6.66E-02 4.68E-02 81 61 -0.60 0.27 -0.46 0.52 1.05E-01 7.28E-02 84 0.59 0.96 1.12E-03 81 0.30 0.34 1.35E-01 94 0.88 0.87 6.46E-06 91 0.73 1.14 2.70E-04 194 Appendix LEFT Hemisphere Stimulus session 35 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 36 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 37 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 38 music, 30 dB SL music, 50 dB SL music, 60 dB SL session 39 2/s NBs 10/s NBs 20/s NBs 35/s NBs session 40 2/s NBs 10/s NBs 20/s NBs 35/s NBs session 41 2/s NBs 10/s NBs 20/s NBs 35/s NBs session 42 2/s NBs 10/s NBs 20/s NBs 35/s NBs session 43 2/s NBs 10/s NBs 20/s NBs 35/s NBs session 44 2/s NBs 10/s NBs 20/s NBs 35/s NBs RIGHT Hemisphere SPL (dB) 66 TimeAverage 0.30 Onset p-value TimeAverage 0.35 Onset p-value 1.35E-01 SPL (dB) 64 0.54 0.49 2.25E-02 86 0.33 0.55 1.77E-01 84 0.65 0.96 2.13E-04 96 0.25 0.76 1.79E-05 94 0.65 0.96 7.58E-04 60 0.92 1.08 4.83E-02 58 0.28 0.61 2.99E-02 80 0.32 0.62 2.28E-02 78 0.42 1.11 1.41E-02 90 0.61 1.02 1.31E-03 88 0.59 0.86 2.18E-02 64 0.87 1.01 1.32E-03 61 0.51 0.66 1.07E-01 84 0.52 0.88 1.10E-04 81 0.43 0.95 2.54E-02 94 0.51 0.91 5.80E-04 91 0.57 1.23 2.09E-04 63 0.68 1.48 2.67E-02 55 0.04 0.20 1.98E-01 83 0.37 0.57 4.30E-02 75 0.24 0.11 5.51E-02 93 0.73 0.61 2.50E-02 85 0.68 0.90 7.48E-03 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0.28 0.74 1.21 0.97 0.48 1.11 1.00 0.66 2.13 1.35 0.70 0.87 0.15 0.84 0.57 1.00 1.36 1.13 1.20 1.38 0.35 0.57 0.86 0.30 0.24 0.83 1.31 0.93 0.15 1.42 0.86 0.87 2.40 1.77 1.61 0.91 0.44 1.02 0.22 1.13 1.15 1.46 1.49 1.77 0.45 0.82 0.62 0.39 3.33E-03 7.94E-03 8.31E-10 7.09E-06 4.72E-02 5.56E-07 5.22E-14 3.32E-06 1.79E-11 3.55E-14 1.00E-15 9.17E-11 3.59E-03 1.72E-05 5.66E-06 1.94E-08 4.37E-11 4.65E-14 1.00E-15 2.49E-15 1.26E-01 5.00E-03 1.97E-07 1.69E-08 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0.29 0.34 1.01 1.21 0.83 0.97 0.40 0.58 -0.01 0.47 1.08 0.84 0.64 0.71 1.31 1.07 0.57 0.54 1.13 1.06 0.62 1.32 0.66 0.54 0.41 0.84 1.31 1.22 1.24 1.62 0.86 1.27 2.67 0.75 0.99 0.68 0.70 0.91 0.99 1.58 0.67 0.55 1.30 1.08 0.74 1.30 0.95 0.41 
7.23E-02 2.22E-02 5.05E-08 3.39E-09 3.75E-03 8.37E-11 2.88E-07 1.10E-08 1.59E-03 1.95E-07 1.39E-10 1.11E-08 7.85E-03 4.35E-11 1.00E-15 1.35E-14 3.87E-02 3.53E-04 5.48E-14 4.28E-08 3.69E-04 1.63E-04 1.04E-10 2.44E-06

Biography

Michael Patrick Harms
Place of birth: Minneapolis, Minnesota

EDUCATION
Massachusetts Institute of Technology, Cambridge, MA
Ph.D., Speech and Hearing Biosciences and Technology, Harvard-MIT Division of Health Sciences and Technology, June 2002. Thesis: Sound temporal envelope and time-patterns of activity in the human auditory pathway: an fMRI study.
Rice University, Houston, TX
B.S., Electrical Engineering, Summa Cum Laude, May 1994. Senior Honors Project: Modeling the LSO neuron: Inhibition of the excitatory response.

HONORS AND AWARDS
• Martinos Fellowship (1999, 2000). For research in imaging at MIT.
• NIH Training Fellowship (1994-1999)
• Rice Engineering Alumni Outstanding Senior Award (1994). Awarded to a single senior across all of Rice's engineering departments.
• Phi Beta Kappa, Tau Beta Pi National Honor Societies
• Jones College (Rice University) Sophomore and Junior Scholar (1992, 1993)

TEACHING EXPERIENCE
Department of Electrical Engineering, MIT, Cambridge, MA
Graduate Teaching Assistant, Probabilistic Systems Analysis, Fall 1996
• Prepared and led weekly recitation and small-group tutorial sessions.

PUBLICATIONS
Melcher, J.R., Talavage, T.M., and Harms, M.P. (1999). Functional MRI of the auditory system. In: Functional MRI. Moonen, C.T.W. and Bandettini, P.A. (eds), Berlin: Springer-Verlag, p. 393-406.

ABSTRACTS
Harms, M.P., Sigalovsky, I.S., and Melcher, J.R. (2002). Temporal dynamics of fMRI responses in human auditory cortex: primary vs. non-primary areas. Assoc. for Research in Otolaryngology, Midwinter Meeting. Abstract #929.
Harms, M.P., Sigalovsky, I.S., Guinan, J.J., and Melcher, J.R. (2001). Temporal dynamics of fMRI responses in human auditory cortex: dependence on stimulus type. Assoc. for Research in Otolaryngology, Midwinter Meeting. Abstract #653.
Sigalovsky, I.S., Hawley, M.L., Harms, M.P., Levine, R.A., and Melcher, J.R. (2001). Sound level representations in the human auditory pathway investigated using fMRI. Assoc. for Research in Otolaryngology, Midwinter Meeting. Abstract #654.
Harms, M.P., and Melcher, J.R. (1999). Understanding novel fMRI time courses to rapidly presented noise bursts. 5th International Conference on Functional Mapping of the Human Brain, NeuroImage 9: S847.
Harms, M.P., Melcher, J.R., and Weisskoff, R.M. (1998). Time courses of fMRI signals in the inferior colliculus, medial geniculate body, and auditory cortex show different dependencies on noise burst rate. 4th International Conference on Functional Mapping of the Human Brain, NeuroImage 7: S365.
Harms, M.P., Melcher, J.R., and Weisskoff, R.M. (1998). Dependence of fMRI activation in the inferior colliculus, medial geniculate body, and auditory cortex on noise burst rate. Assoc. for Research in Otolaryngology, Midwinter Meeting. Abstract #827.
Harms, M.P., Weisskoff, R.M., and Melcher, J.R. (1997). Dependence of fMRI activation in the human inferior colliculus and Heschl's gyrus on noise burst rate. Society for Neuroscience Abstracts, 23: 1033.