Coding in the Fusiform Face Area of the Human Brain

by

Lakshminarayan "Ram" Srinivasan

Submitted to the Department of Electrical Engineering and Computer Science on August 8, 2003, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, September 2003.

© Massachusetts Institute of Technology 2003. All rights reserved.

Certified by: Nancy G. Kanwisher, Professor, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

The fusiform face area of the human brain is engaged in face perception and has been hypothesized to play a role in identifying individual faces [29]. Functional magnetic resonance imaging (fMRI) of the brain reveals that this region of the temporal lobe produces a higher blood oxygenation dependent signal when subjects view face stimuli versus nonface stimuli [29]. This thesis studied the way in which fMRI signals from voxels that comprise this region encode (i) the presence of faces, (ii) distinctions between different face types, and (iii) information about nonface object categories. Results suggest that the FFA encodes the presence of a face in one dominant eigenmode, allowing for encoding of other stimulus information in other modes. This encoding scheme was also confirmed for scene detection in the parahippocampal place region. PCA dimensionality suggests that the FFA may have larger capacity to represent information about face stimuli than nonface stimuli. Experiments reveal that the FFA response is biased by the gender of the face stimulus and by whether a stimulus is a photograph or line drawing. The eigenmode encoding of the presence of a face suggests cooperative and efficient representation between parts of the FFA. Based on the capacity for nonface versus face stimulus information, the FFA may hold information about both the presence and the individual identity of the face. Bias in the FFA may form part of the perceptual representation of face gender.

Thesis Supervisor: Nancy G. Kanwisher
Title: Professor

Acknowledgments

I would like to thank my advisor, Professor Nancy Kanwisher, for her keen insight, compassionate guidance, and generosity with time and resources. My father, mother and brother lent hours of discussion and evaluation to the ideas developed in this thesis. I am grateful to all members of the Kanwisher laboratory, Douglas Lanman at Lincoln Laboratory, and Benjie Limketkai in the Research Laboratory of Electronics for their friendship and interest in this work. Thanks to Professor Gregory Wornell and Dr. Soosan Beheshti for discussions on signal processing. Design and collection of data for the first experiment (chapter 3) was performed by Dr. Mona Spiridon, at the time a post-doctoral fellow with Professor Nancy Kanwisher.
The fMRI pulse sequence and head coil for the second and third experiments (chapters 4 and 5) were designed by Professors Kenneth Kwong and Lawrence Wald respectively, at Massachusetts General Hospital and Harvard Medical School. Some stimuli in the third experiment (chapter 5) were assiduously prepared with the assistance of Dr. Michael Mangini and Dr. Chris Baker, current postdoctoral scholars with the laboratory. Some preprocessing analysis Matlab code used in this experiment was written in conjunction with Tom Griffiths, a graduate student with Professor Joshua Tenenbaum. Portions of the research in this paper use the FERET database of facial images collected under the FERET program. Work at the Martinos Center for Biomedical Imaging was supported in part by the National Center for Research Resources (P41RR14075) and the Mental Illness and Neuroscience Discovery (MIND) Institute. The author was funded by the MIT Presidential Fellowship during the period of this work.

Contents

1 Problem Statement
2 Background
2.1 Fusiform Face Area (FFA): A face selective region in human visual cortex
2.2 Physics, acquisition, and physiology of the magnetic resonance image
2.2.1 Physical principles
2.2.2 Image acquisition
2.2.3 Physiology of fMRI
2.3 Design of fMRI experiments
2.3.1 Brute-force design
2.3.2 Counterbalanced block design
2.3.3 Rapid-presentation event-related design
2.4 Analysis of fMRI-estimated brain activity
2.4.1 Univariate analysis of variance (ANOVA)
2.4.2 Principal components analysis (PCA)
2.4.3 Support vector machines (SVM)
3 Experiment One: FFA response to nonface stimuli
4 Experiment Two: FFA response to face stimuli
5 Experiment Three: FFA response to stimulus gender
6 Conclusions on FFA coding of visual stimuli
6.1 How do voxels within the FFA collectively encode the presence or absence of a face?
6.2 How many independent stimulus features can the FFA distinguish?
6.3 Does the FFA contain information about stimulus image format or face gender?
6.4 Nonstationarity in fMRI experiments
6.5 Defining a brain system in fMRI analysis
Bibliography

List of Figures

3-1 Typical results from one subject of PCA variance and projection analysis in (a,d,g) FFA, (b,e,h) PPA, and (c,f) all visually active voxels. (a,b,c) ROI response vectors projected onto the first two principal components. Labels 1 and 2 indicate face stimuli; all other labels are non-face stimuli. Label 5 indicates house stimuli; all other labels are non-place stimuli. Photograph stimuli are denoted in boldface italics, and line drawings are in normal type. (d,e,f) Variance of dataset along principal components including nonpreferred and preferred category response vectors. (g,h) Variance of dataset along principal components excluding preferred category response vectors.

3-2 Sum of variance from PC1 and PC2 for data (a) including and (b) excluding own-category stimulus responses.
The labels S1 through S4 denote the subject number, consistent between panels (a) and (b).

3-3 Typical plots from one subject of nonface principal component coordinates against face detection vector coordinates for stimulus responses in (a,b) FFA and (c,d) PPA. The detection vector coordinate connects the average face response vector to the average nonface response vector. (a,c) ROI nonface response vector coordinates projected onto PC1 of the nonface data points and the detection vector. (b,d) Similarly for nonface PC2. Alignment of data along a line indicates strong correlation between the two axis variables. Geometrically, this indicates either that the nonface data already falls along a line, that the vectors corresponding to the axis variables are aligned, or some mixture of these two scenarios. PC nonface variance analysis (Figure 3-1 g,h) suggests that the axis alignment concept is appropriate. The majority of variance is not explained exclusively by the first PC, so the data does not already fall along a line in the original response vector space. Although forty percent of the variance falls along a line, this is not enough to allow any coordinate choice to trivially project the data onto a line, as evidenced by the coordinates chosen in (a) and (d). Axis alignment is quantified directly in Figure 3-4.

3-4 Summary of the subtending angle between the detection vector and the nonpreferred category principal component in (a) FFA and (b) PPA for four subjects. If the subtending angle is greater than 90 degrees, the angle is subtracted from 180 degrees to reflect the alignment between vectors rather than the direction in which the vectors are pointed. Two vectors are completely aligned at 0 degrees and completely unaligned, or orthogonal, at a 90 degree subtending angle.

4-1 Stimulus set used in the study of FFA response to faces. The set includes pictures of sixteen faces and one car. Four contrasts are equally represented: adiposity (columns 1 and 2 versus 3 and 4), age (rows 1 and 2 versus 3 and 4), gender, and race. Faces were obtained from [37], with the exception of the African Americans in the second row and third column, who were photographed by the author, and the adipose young Caucasian male, obtained from the laboratory. Two face images (elderly, African American, female) have been withheld as per the request of volunteers.

4-2 Variance of PCA components (a) across FFA response to faces, with preferred stimulus data from one subject, and (b) generated from a Gaussian random vector with independent, identically distributed elements N(0,1). Dimensions and dataset size match between (a) and (b).

4-3 Multivariate ANOVA across (a) race, (b) gender, (c) age, and (d) adiposity for FFA response to faces in one subject. The data is projected onto the optimal category-separating axis, canonical variable 1, abbreviated "c1". This value is plotted along the y axis in the above graphs. The corresponding p-value is printed on each panel, indicating the probability that the data belongs to a single Gaussian random variable. The accompanying d value indicates that the d-1 Gaussians null hypothesis can be rejected at p < 0.05. Here, we compare only two groups in each panel (e.g. male/female), so the maximum d value is 1.

4-4 ANOVA on a one-dimensional FFA response measurement to male and female stimuli.
For each voxel, the hemodynamic impulse response was fit with a polynomial and the maximum amplitude extracted. This value was averaged across all voxels in the FFA to produce the response feature analyzed with this ANOVA. Panel inscriptions denote the ANOVA F-statistic value and the p-value (Prob>F), the probability that a single univariate Gaussian would produce separation between male and female stimulus responses equal to or greater than observed here. A p-value of 0.0614 is near the conventional significance threshold of p = 0.05.

5-1 Examples from the stimulus set used in the study of FFA response to gender. Rows from top to bottom correspond to males, females, reception chairs, and office chairs respectively. Fifteen exemplars of each category were presented in subjects a, b, and c. Blocks of fifteen additional exemplars of each category were included in some trials of subject d. All stimuli were presented at identical pixel height dimensions.

5-2 Time course of V1 response, averaged over all voxels in V1 for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-3 Time course of FFA response, averaged over all voxels in FFA for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-4 V1 response amplitudes across runs for four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-5 FFA response amplitudes across runs for four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-6 Non-stationarity correction of FFA response amplitudes in subject (a) across runs for four conditions. (a) The original data, with response to faces drifting downward over the course of the experiment. (b) Compensated for non-stationarity in face stimulus responses. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

List of Tables

5.1 SVM performance d' values on gender, chair, and face discrimination tasks in four subjects. The binary classification tasks were male/female, office/reception chair, and face/nonface respectively. Rows correspond to subjects (a) through (d). ROI-Mean and ROI-Map denote the hemodynamic response feature being classified. ROI-Mean is a scalar, the average of the response over voxels and time. ROI-Map is the vector of time-averaged responses of ROI voxels.

Chapter 1

Problem Statement

When greeting a friend who passes by on the street, we determine both the presence and identity of the human face that is in our field of vision. This process of face detection and identification is studied with functional magnetic resonance imaging (fMRI), a noninvasive technique used to measure brain activity.
The fusiform face area (FFA) of the brain responds "maximally" to faces, meaning that pictures of human faces elicit more signal from the FFA of an attending subject than pictures of houses, amorphous blobs, or other nonface stimuli [29]. Most past work on the FFA has averaged activity over the voxels in the FFA, ignoring any differences that may exist in the response of different voxels in this region. Here we used high resolution fMRI scanning and new mathematical analyses to address two general questions:

1. What information is contained in the pattern of response across voxels in the FFA?
2. How is that information represented?

In particular, this thesis investigated three specific questions:

1. How do voxels within the FFA collectively encode the presence or absence of a face?
2. How many independent stimulus features can the FFA distinguish?
3. Does the FFA contain information about stimulus image format or face gender?

In this manuscript, the phrase "detection signal" denotes the part of the FFA response that tells whether a stimulus is a face or nonface. The term "bias" refers to the remaining parts of the FFA response that might encode other information about a stimulus. "Stimulus image format" refers to the difference between photograph and line-drawing versions of a picture. The three experiments performed in this thesis relate to different aspects of the above main questions. The first experiment (chapter 3) studied whether the variation in responses to non-face stimuli was related to a modulation of the face detection signal. The experiment also used principal components analysis (PCA) to graphically explore which other features of the stimuli affected the variation and to determine an upper bound on the number of independent nonface features that could be represented in the FFA. The second experiment (chapter 4) studied FFA response to face stimuli. As with the first experiment, PCA was used to determine an upper bound on the number of independent face features that could be represented with the FFA. Multivariate analysis of variance (MANOVA) was used to quantify FFA sensitivity to race, gender, age, and adiposity. The third experiment (chapter 5) performed more extensive tests to quantify FFA sensitivity to gender.

Several experimental and mathematical methods were employed in this thesis. Chapter 2 discusses these methods in the context of fMRI and the FFA. Section 2.1 is a review of previous studies on the FFA. Section 2.2 discusses fMRI and the principles behind the noninvasive imaging of brain activity. Section 2.3 describes the presentation of stimuli and the complementary signal processing that is required to produce estimates of the stimulus response. Section 2.4 discusses the mathematical methods used in this thesis to analyze stimulus coding in the FFA once estimates of stimulus response were obtained. The conclusion (chapter 6) provides a summary and evaluation of the findings.

This thesis represents part of a larger effort to understand the nature of face processing in general and FFA activity in particular. Because detection and identification of faces is important to our daily social interactions, understanding the neural substrate for this ability represents a core problem in cognitive neuroscience.

Chapter 2

Background

2.1 Fusiform Face Area (FFA): A face selective region in human visual cortex

Many of our daily interactions call upon face detection, the brain's ability to identify the presence of a face in a scene.
Efforts to develop computer face detection algorithms have highlighted some of the challenges of this problem. Within a test scene, faces may be rotated at different angles, scaled, or occluded. A complex background like a city scene might make the face more difficult to detect than a plain white background. Skin color can be altered by lighting. Non-face sections of a scene can coincidentally resemble a face in color or shape. The associated engineering challenge is to maximize the probability of detection while minimizing the probability of false alarm. Optimum detection of faces requires both flexibility and inflexibility. The detector must be flexible to changes in scale, position, lighting, occlusion, etc., while at the same time inflexibly rejecting non-face samples under various image changes. Algorithms have generally solved scale and position invariance by multiresolution sampling (e.g. Gaussian pyramids) and variable width sliding windows. Unique facial features such as skin color or the geometry of eye and mouth position have then been chosen in an attempt to make the detector more face selective. A learning machine then observes these features in a labeled set of faces, and develops a joint probability distribution function (pdf) or chooses a classifier function based on some criterion, such as empirical risk minimization in a support vector machine (SVM).

The brain has the capacity to solve these basic computational challenges and perform very high quality face detection. Brain research on face detection in the last fifty years has focused on determining the actual neural substrate. One basic question was whether the signal indicating detection of a face was anatomically localized or present in multiple regions of the brain. While the latter hypothesis has yet to be explored, the search for an anatomically localized face detection signal was prompted by the discovery of focal brain lesions that induced prosopagnosia, an inability to recognize faces [3]. These lesions were typically located in the temporal lobe. Electrophysiological recording in macaque temporal cortex subsequently revealed face selective cells [23]. Human electrophysiology in temporal and hippocampal sites demonstrated face, facial expression, and gender selective neurons [16, 27, 34]. Event-related potential (ERP) and magnetoencephalography (MEG) studies made a noninvasive and anatomically broader survey of healthy brains, implicating the fusiform and inferotemporal gyri in face selective response. In the last decade, functional magnetic resonance imaging (fMRI) of brain activity has allowed for anatomically more precise visualization of this region. Initial studies located regions that responded to face stimuli more than blank baseline stimuli. Two studies [32, 29] in 1997 made explicit signal amplitude comparisons between face and non-face stimuli to locate brain regions that responded maximally to faces. These studies were careful to choose face and non-face stimuli that differed little in low-level features such as luminosity, as verified by checking whether low level visual areas like V1 were responding differentially to faces and non-faces. The anatomically localized area of face detection signals found most consistently across subjects is the fusiform face area (FFA), located in the fusiform gyrus of the posterior inferior temporal lobe. Face detection signals are also focused near the occipital temporal junction, in the occipital face area (OFA).
Subsequent work has suggested that the FFA responds more to nonface objects for which the subject has expertise than for other nonface objects. However, the paper initiating this claim trained subjects on novel objects called greebles that resembled faces, making the effect of greeble response difficult to distinguish from the normal face detector response [46]. Other work demonstrated elevated FFA response to cars in car experts and birds in bird experts, but with FFA response to faces still maximal [17]. Insufficient additional stimuli were tested to determine whether the FFA could serve reliably as a detector of these alternate expertise-specific objects in daily experience.

Four questions regarding the function of the FFA have not yet been empirically resolved. (i) Is the FFA involved in pre-processing stages leading to the detection of faces? (ii) Does the FFA participate in further classification of faces, by expression, race, or other identifying characteristics? Only recently has it been shown in Caucasians and African Americans that the average FFA signal is elevated in the presence of faces from the subject's own race [18]. However, this result does not support the encoding of more than two races. (iii) Does the FFA encode information about nonface stimuli? Maximal response to face stimuli is not interpreted as non-face detection, although this function is equivalent to face detection. A recent correlation based study suggested the FFA does not reliably encode non-face objects [44], but alternate decoding methods might prove otherwise. (iv) Are emotional expressions and semantic information about faces (e.g. the name of the individual) represented in the FFA?

The hemodynamic response through which fMRI observes neural activity is known to lose tremendous detail in the temporal and spatial response of neurons. All spike patterning and the individual identity of millions of neurons are lost in each spatial voxel (three dimensional pixel) of hemodynamic activity. The absence of information in the fMRI signal is not proof of the absence of this information in the respective brain area.

The general assumption behind conventional interpretations of FFA activity is that all amplitudes are not equal. Higher amplitude responses are thought to code information while low amplitude responses are not. This "higher-amplitude-better" tenet bears some resemblance to the cortical concept of firing rate tuning curves in individual neurons. A tuning curve is a function describing the response of a neuron: it takes the input stimulus and maps it to the firing rate, i.e. the number of spikes (action potentials) emitted by the neuron over some unit time. It has been traditionally assumed that the stimulus for which the neuron responds maximally is the stimulus the neuron represents. Neural modeling literature from the past decade [39, 8], the "new perspective", has understood that the firing rate, regardless of its amplitude, can be used in an estimator of the stimulus based on the tuning curve function. The tuning curve is the expected value of the firing rate given the stimulus, and thus a function of the conditional probability density relating stimulus to firing rate. The new perspective is more general than the traditional higher-amplitude-better paradigm.
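As a minimal illustration of this "new perspective", consider decoding a stimulus value from a single observed spike count using the tuning curve as the expected rate. The Python sketch below assumes a Gaussian-shaped tuning curve and Poisson spiking; the functional form, parameter values, and helper names (tuning_curve, ml_decode) are assumptions made for illustration, not quantities taken from this thesis.

```python
import numpy as np

# Sketch of tuning-curve-based decoding: the tuning curve gives E[rate | stimulus],
# so with an assumed Poisson spiking model the stimulus can be estimated by
# maximum likelihood from one observed spike count. All shapes and numbers here
# are illustrative assumptions.

def tuning_curve(s, preferred=0.0, width=20.0, peak=50.0, baseline=2.0):
    """Expected firing rate (spikes/s) as a function of stimulus value s."""
    return baseline + peak * np.exp(-0.5 * ((s - preferred) / width) ** 2)

def ml_decode(spike_count, window=1.0, candidates=np.linspace(0.0, 90.0, 361)):
    """Maximum-likelihood stimulus estimate from one spike count.

    Candidates are restricted to nonnegative values to sidestep the mirror
    ambiguity of a symmetric tuning curve.
    """
    expected_counts = tuning_curve(candidates) * window
    log_lik = spike_count * np.log(expected_counts) - expected_counts  # Poisson, up to a constant
    return candidates[np.argmax(log_lik)]

rng = np.random.default_rng(0)
true_stimulus = 30.0
count = rng.poisson(tuning_curve(true_stimulus) * 1.0)  # one second of simulated spiking
print(ml_decode(count))  # typically recovers a value near 30, not the preferred stimulus 0
```

The point of the sketch is that a low, non-maximal firing rate is still informative: it is inverted through the tuning curve rather than discarded.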
2.2 Physics, acquisition, and physiology of the magnetic resonance image

The following discussion of the physics of fMRI (section 2.2.1) and image acquisition (section 2.2.2) draws from an introductory text on fMRI [4] and an online tutorial [28]. The textbook and tutorial provide many helpful illustrations of the basic concepts. The reader is directed to the discussion below for a brief overview, and to the referenced sources for a more complete introduction to the fMRI technique.

2.2.1 Physical principles

A hydrogen proton has an associated quantum state called spin, which can be up or down. Spin produces an effective magnetic dipole, and any collection of protons, such as a sample of water molecules, will have a net magnetic dipole direction and magnitude. The emergence of an apparent continuum of net dipole directions in a sample of protons from two spin states in each proton is a subject of quantum mechanics.

Based on its surrounding magnetic field, a proton has a characteristic resonant frequency $\omega_0 = \gamma B_0$, where $\gamma$ is the gyromagnetic ratio and $B_0$ is the magnetic field strength. Electrons near a proton can partially shield the proton from the external magnetic field, shifting the resonant frequency. This so-called chemical shift is sufficiently significant that molecules of different structure have hydrogens of different resonant frequencies. Conventional fMRI works with NMR of the water molecule and its associated proton resonance frequency.

The net dipole of a sample aligns with the direction of an applied magnetic field, denoted the longitudinal direction. The dipole can be displaced, or "tipped", from the direction of alignment by a radiowave pulse at the resonant frequency. The tipped dipole precesses at the resonant frequency in the transverse plane, orthogonal to the longitudinal direction. The component dipoles of the sample initially rotate in phase, but over time the component phases disperse uniformly in all directions. During this process, called relaxation, the net dipole returns to the equilibrium direction with an exponential timecourse. The time constants for reaching equilibrium in the transverse and longitudinal directions are called the T2 (transverse, spin-spin) and T1 (longitudinal) relaxation times respectively.

After tipping with a 90° RF pulse and during the relaxation phase, the precessing magnetic dipole, an oscillating magnetic field, induces electromagnetic radiation, a radiowave, that can be detected with a nearby wire coil. This emission is called the free induction decay (FID). A second emission with identical waveform but larger magnitude can be generated by following with a 180° RF pulse. This second emission arrives at a time interval after the 180° pulse equal to the time between the 90° and 180° pulses, and is called a spin echo. In echo planar imaging (EPI), additional magnetic field gradients (phase and frequency encoding, explained below) are cycled during the spin echo, resulting in a "gradient echo" that allows for the fast slice imaging required for fMRI.

Three factors determine the amplitude of the emitted radiowave: the local density of protons, the T1 relaxation time, and the T2 relaxation time. Because these factors are endogenous characteristics of the sample, contrasts between various brain tissues or between deoxygenated and oxygenated vasculature can be imaged in the intensity of the MR signal. fMRI exploits the variation that deoxyhemoglobin versus oxyhemoglobin induces in the T2 relaxation time. Because the free induction (and thus the spin echo) emission decays at the T2 relaxation time, the total energy of the emission is sensitive to blood oxygenation. This allows for sufficient image contrast in sections of higher blood oxygenation to create images of brain activity.
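To make this concrete with a standard textbook relation from MR physics (not spelled out in the thesis itself): under a simple mono-exponential decay model, the transverse signal sampled at an echo time TE after excitation falls off approximately as

\[
S(\mathrm{TE}) \approx S_0\, e^{-\mathrm{TE}/T_2},
\]

so a change in blood oxygenation that lengthens the effective T2 raises the intensity recorded at a fixed TE. This is the sense in which the emitted energy, and hence the image contrast, tracks oxygenation.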
By employing alternate radiowave and magnetic gradient pulse sequences, the recorded MR signal can reflect different aspects of spin relaxation (proton density, T1, etc.) and hence contrast different endogenous properties of the tissue sample. The T2 dependent MR signal is called the blood-oxygen-level-dependent (BOLD) signal in the fMRI literature. An explanation of T2*, also relevant to fMRI, has been omitted here.

2.2.2 Image acquisition

In order to produce images, the MR signal must be extracted from voxels in a cross-section of brain. The image is formed by a grayscale mosaic of MR signal intensities from in-plane voxels. With current technology, a three dimensional volume is imaged by a series of two dimensional slices. The localization of the MR signal to specific voxels is made possible by magnetic field gradients that control the angle, phase, and frequency of dipole precession on the voxel level.

The process begins with slice selection. By applying a linear gradient magnetic field in the direction orthogonal to the desired slice plane, the slice plane is selectively activated by applying the RF pulse frequency that corresponds to the resonant frequency of water hydrogen protons for the magnetic field intensity unique to the desired plane. MRI machines that are able to support steeper slice selection gradients allow for imaging of thinner slices. The slice selection step guarantees that any MR signal received by the RF coil originates from the desired slice.

Because the entire plane of proton spins is excited simultaneously, the entire plane will relax simultaneously, and hence the MR signal must be recorded at once from all voxels in the plane. Resolution of separate in-plane voxel signals is achieved by inducing a unique phase and frequency of dipole precession in each voxel, in a process called phase and frequency encoding. The corresponding RF from each voxel will then retain its voxel-specific phase and frequency assignment. Consider the plane spanned by orthogonal vectors x and y. To achieve frequency encoding, a linearly decreasing gradient magnetic field is applied along the x vector. As a result, precession frequency decreases linearly along the x vector. To achieve phase encoding, a frequency encoding gradient is applied along the y vector, and then removed. In the end state, precession along the y vector is constant in frequency, but with linearly decreasing phase. The resulting relaxation RF measured by the external coil is composed of energy from the selected slice only, with the various frequency and phase components encoding voxel identities.

In practice, the frequency and phase assignments to the in-plane voxels are chosen such that the resulting linear summation RF signal sampled in time represents the matrix values along one row of the two dimensional Fourier transform (2D-FT) of the RF in-plane voxel magnitudes. The in-plane frequencies and phases are rapidly reassigned to produce the remaining rows of the 2D-FT. This Fourier domain is termed "k-space" in the MRI imaging literature. Performing the inverse 2D-FT on the acquired matrix recovers the image, a grayscale mosaic of MR signal intensities from the in-plane voxels.
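The 2D-FT relationship between k-space and the image can be illustrated with a short numerical sketch. The Python fragment below uses a synthetic phantom and NumPy only; it is not scanner or analysis code from this thesis. It simulates a fully sampled k-space matrix and recovers the image with an inverse 2D FFT:

```python
import numpy as np

# Illustration of the k-space / image relationship described above: a fully
# sampled k-space matrix is (up to scaling and coordinate conventions) the 2D
# Fourier transform of the in-plane voxel intensities, so the image is
# recovered by an inverse 2D FFT. The rectangular "phantom" is synthetic.

image = np.zeros((64, 64))
image[20:44, 24:40] = 1.0                       # simple rectangular phantom

k_space = np.fft.fftshift(np.fft.fft2(image))   # stand-in for the acquired k-space samples

reconstruction = np.abs(np.fft.ifft2(np.fft.ifftshift(k_space)))  # magnitude image

print(np.allclose(reconstruction, image, atol=1e-10))  # True: the image is recovered
```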
2.2.3 Physiology of fMRI

While it is generally accepted that the T2 relaxation time is sensitive to blood oxygenation levels via de/oxyhemoglobin as discussed above, research is still being performed to understand the relationship between neuronal activity and the BOLD signal. Recent work suggests that the BOLD signal more closely corresponds to local electric field potentials than to action potentials [31]. The local field potential (LFP) is acquired by low-pass filtering the signal obtained from a measurement electrode placed in the region of interest. LFPs are generally believed (but not confirmed) to result from synchronous transmembrane potentials across many dendrites, and hence to reflect the computational inputs to a region. The nature of the LFP is also apparently dependent on the choice of reference electrode location. A subsequent theoretical paper [26] proposed a linear model to express the relationship between local field potentials and the fMRI signal. The combined use of electrophysiology and fMRI is currently an expanding area of research. The combined approach promises to elucidate the relationship between fMRI signals and neural activity, to simultaneously study neural behavior on multiple scales and spatial/temporal resolutions, and to enable visualization of local electrical stimulation propagating on a global scale. Some of these questions might preliminarily be answered by coupling electrophysiology with electroencephalography (EEG), circumventing the technical issues associated with performing electrical recordings in the fMRI electromagnetic environment.

Some of the first fMRI experiments [12] examined the response of the early visual areas of the brain, namely V1, to flashes of light directed at the retina. These experiments revealed a characteristic gamma-distribution-like shape to the rise and fall of the MR signal in activated V1 voxels. The MR signal typically rises within two seconds of neural activation in the voxel, peaks by four seconds, and decays slowly by twelve or sixteen seconds post-activation. This characteristic hemodynamic impulse response is parameterized with the functional form used in the gamma probability density function, quoted below from [20]:

\[
h(t) =
\begin{cases}
0, & t < \Delta \\[4pt]
\left(\dfrac{t - \Delta}{\tau}\right)^{2} e^{-(t - \Delta)/\tau}, & t \ge \Delta
\end{cases}
\tag{2.1}
\]

where $\Delta$ sets the onset delay of the hemodynamic response, and $\tau$ determines the rate of the quadratic dominated ascent and exponential dominated descent. In the conventional use of this parameterization of the hemodynamic response, $\tau$ and $\Delta$ are believed to be constant within anatomically focal regions but to vary across brain regions. Consequently, $\Delta$ and $\tau$ values are typically set, and the data used to estimate a constant amplitude coefficient to fit the response to a given stimulus. With single stimulus or event-related design (discussed below), the choice of $\Delta$ and $\tau$ can be made based on the best fit to the raw hemodynamic timecourse.

The gamma function model of the hemodynamic response has been verified in V1, but not in the face detection areas of the extrastriate occipital lobe and temporal lobe studied in this thesis. This is not an essential deficiency because analysis of information content in a brain area can be performed with incorrect parametrizations, as well as non-parametrically.
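For reference, equation (2.1) is easy to evaluate numerically. The Python sketch below uses illustrative parameter values ($\Delta$ = 2.25 s, $\tau$ = 1.25 s); these are assumptions for the purpose of the example, not values fixed by this thesis.

```python
import numpy as np

# Sketch of the hemodynamic impulse response model of equation (2.1).
# delta (onset delay, s) and tau (time constant, s) are illustrative values.

def hrf(t, delta=2.25, tau=1.25):
    """Zero before the onset delay, then a quadratic rise damped by an exponential decay."""
    t = np.asarray(t, dtype=float)
    s = (t - delta) / tau
    return np.where(t < delta, 0.0, s ** 2 * np.exp(-s))

t = np.arange(0.0, 20.0, 0.5)   # seconds
h = hrf(t)
print(t[np.argmax(h)])          # the response peaks a few seconds after stimulus onset
```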
2.3 Design of fMRI experiments

Because it is noninvasive and anatomically localizable, fMRI is a commonly used technique to measure the human brain's characteristic response to one or more stimulus conditions. Conventionally, the responses to multiple repetitions (trials) of the same stimulus are averaged to produce a low variance estimate of the response to that stimulus. The variance observed in the response to the same stimulus across multiple trials is conventionally attributed to physiological and instrument "noise," although the physiological component of the noise variance may include brain processes that are simply not well characterized. These noise processes are not necessarily stationary or white, and moreover the response of the brain to trials of the identical stimulus may change, as is often observed in electrophysiology. Consequently, differential analysis of the trial averaged response estimates of two stimuli generally may not be able to detect more subtle differences between stimuli that elicit small changes in neural activity. This issue is illustrated in the results of experiment three of this thesis, gender bias of face detection (chapter 5). While research is currently underway to develop more sophisticated models for brain response estimation [38], methods have been developed based on the simple model of identical stimulus response across multiple trials in the presence of colored or white Gaussian noise. These methods can be applied over the entire duration of the experiment, as with experiments one and two of this thesis (chapters 3 and 4), or applied over consecutive sub-sections of the experiment for which the assumptions might be more plausible, as with the last experiment of this thesis (chapter 5). Three methods to estimate the characteristic responses of a brain voxel to a set of stimuli are described below: brute-force (block and single stimulus), counterbalanced (block), and event-related (single stimulus) design.

2.3.1 Brute-force design

In brute-force experiment design, a stimulus condition is presented briefly and the next stimulus is presented after eight to sixteen seconds to allow the previous hemodynamic response to decay. Physiological and instrument noise demands that multiple trials of the same condition be averaged to produce an estimate of a hemodynamic response. The extended interstimulus period severely limits the scheduling of multiple trials within one experiment, and makes inclusion of more than four conditions impractical. Studies that characterize physiological and instrument noise together were not found in a literature search, although such work might be possible by adding a signal source to measure instrument noise and using signal processing theory to separate and characterize the physiological noise in various brain regions.

In a variation of the above basic experiment design, the stimulus condition can be repeated multiple times in close succession, typically every one to two seconds in high-level vision studies. This time window, called a block, is typically sixteen to twenty seconds during which multiple exemplars of one stimulus are presented. Assuming a linear summation of identical hemodynamic impulse responses during the block, the MR signal for activated voxels rises to a nearly constant level within the first four seconds of the block. The block presentation is an improvement over single stimulus presentation in that more measurement points from the same stimulus can be obtained for every interstimulus interval. Because each signal sample has higher amplitude in the block design, the values also have a higher signal to noise ratio, assuming an invariant noise baseline. An important omission of this discussion is the temporal correlation between multiple samples of the same block, which is a vital statistic in evaluating the extent to which averaging a block of stimuli improves the signal estimate.
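The linear-summation assumption behind the block response can be visualized with a short simulation: a train of stimuli every two seconds is convolved with a gamma-shaped impulse response (reusing the illustrative $\Delta$ and $\tau$ values from the sketch above), and the summed response rises to a rough plateau over the block. All timing values are assumptions for illustration only.

```python
import numpy as np

# Block design under the linear-summation assumption: the block response is the
# sum of one gamma-shaped impulse response per stimulus presentation.
# All timing values are illustrative assumptions.

dt = 0.1                                        # simulation step, seconds
t = np.arange(0.0, 40.0, dt)
delta, tau = 2.25, 1.25                         # assumed HRF parameters
h = np.where(t < delta, 0.0, ((t - delta) / tau) ** 2 * np.exp(-(t - delta) / tau))

stimulus = np.zeros_like(t)
onsets = np.arange(0.0, 16.0, 2.0)              # one presentation every 2 s over a 16 s block
stimulus[(onsets / dt).astype(int)] = 1.0

block_response = np.convolve(stimulus, h)[: len(t)]
print(block_response[40], block_response[80], block_response[120])  # rises, then roughly plateaus
```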
An important and unjustified assumption brought in defense of block design is that the hemodynamic response to the stimulus is a fixed-width gamma function. Although it has been suggested to some extent by the V1 fMRI studies [12] and the simultaneous electrode/fMRI paper [31] that the hemodynamic response to a neural spike train of some length is gamma-shaped, it is entirely conceivable that the pattern of neural firing evoked by a stimulus might be two separated volleys of spike trains, or some other nonlinear coding scheme. Under such a circumstance, the temporal profile of the hemodynamic response, rather than the single average amplitude parameter estimated by block design, could contribute information to discriminate stimulus conditions. Not only would the linearity assumption be incorrect, but the fixed-width gamma model of the response would not apply. The nonlinearity of the underlying neural response as a problem in assuming linearity of the hemodynamic response is mentioned briefly in [40].

Another criticism of block design is that presentation of long intervals of the same condition can promote attention-induced modulation of the fMRI signal. The defense against this criticism is that by checking that V1 is not modulated differently by various conditions, the "attentional confound" of the block design can be discounted. However, it is not clear that attentional confounds would necessarily exhibit themselves in V1 for any or all experiments.

With all of these methods for estimation of the hemodynamic response, the uncertainty regarding whether the complete dynamics (i.e. the sufficient statistics) of the response have been captured can be circumvented by conceding that

Principle 1. Any transformation on the original data that allows double-blinded discrimination of the stimulus conditions provides a lower bound on the information content about the stimulus in the data.

2.3.2 Counterbalanced block design

The presentation of stimuli can be further expedited by omitting the interstimulus period between blocks. Now the tail of the hemodynamic response from a previous condition will overlap into the current condition. This tail decays in an exponential manner, as described in equation (2.1). In counterbalancing, each condition is presented once before every other condition. Across runs, the absolute position of any particular condition is varied over the block. In perfect counterbalancing, the trials of different conditions would have identical scheduling statistics and indistinguishable joint statistics. A typical scheduling of four conditions across four runs in approximate counterbalancing is as follows:

run 1: [0 1 2 3 4 0 2 4 1 3 0 3 1 4 2 0 4 3 2 1 0]
run 2: [0 3 1 4 2 0 4 3 2 1 0 1 2 3 4 0 2 4 1 3 0]
run 3: [0 2 4 1 3 0 1 2 3 4 0 4 3 2 1 0 3 1 4 2 0]
run 4: [0 4 3 2 1 0 3 1 4 2 0 2 4 1 3 0 1 2 3 4 0]

This is the schedule used in the third experiment of this thesis, although each run was repeated several times. The numbers 1 through 4 correspond to blocks from the respective conditions, and 0 corresponds to an interstimulus interval during which the subject fixates on a cross displayed on the screen. Each block lasts sixteen seconds, and each run is 5 minutes 36 seconds. The subject rests between runs.
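The first-order counterbalancing property of these runs, each condition presented once immediately before every other condition, can be checked mechanically. The Python sketch below counts ordered transitions between consecutive stimulus blocks in run 1, ignoring the fixation blocks labeled 0; the same check can be applied to runs 2 through 4.

```python
from collections import Counter

# Count how often condition j immediately follows condition i within run 1,
# skipping transitions into or out of the fixation condition (0). First-order
# counterbalancing means every ordered pair of distinct conditions occurs once.

run1 = [0, 1, 2, 3, 4, 0, 2, 4, 1, 3, 0, 3, 1, 4, 2, 0, 4, 3, 2, 1, 0]

transitions = Counter(
    (a, b) for a, b in zip(run1, run1[1:]) if a != 0 and b != 0
)

assert len(transitions) == 12 and set(transitions.values()) == {1}
print(sorted(transitions))   # all 12 ordered pairs (i, j), i != j, appear exactly once
```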
With approximate counterbalancing, the bias introduced by tails can be made equal across all conditions in each average across trials. This bias, however, is not a constant offset across the time window of each block. Because the tail of a previous block decays exponentially, the bias, a sum of tails across all conditions, will decay exponentially. The first stimulus block in a series is known to produce anomalously high signal, but this bias is experienced by all conditions in a perfectly counterbalanced design. Biases based on non-stationarity of noise, such as baseline drift, are also distributed across conditions.

The exponential tail to the bias is generally not acknowledged in descriptions of the counterbalance method. Instead, the bias is described as "washing out" in the average across trials, and samples within a final averaged block are taken to be estimates of roughly the same signal amplitude. Furthermore, most experiments are not perfectly counterbalanced, so that the bias is not identical in shape for all conditions. Consequently, even perfectly counterbalanced experiments that perform averages across all samples of a condition within its trial-averaged block can only be sensitive to relatively large changes (roughly a difference of four percent increase from baseline) in univariate (voxel-by-voxel) ANOVA of stimulus response.

Additional sensitivity in contrast can be achieved by multivariate (voxel vector) analysis even when neglecting to average across trials in a counterbalanced design, and when equal-weight averaging all samples of a single block time window [6]. Despite these simplifying and incorrect assumptions of equal-weight averaging within a single block, the detection of significant differences in response between conditions is possible although suboptimal, as suggested by Principle 1. Although neglecting to average across trials in a counterbalanced design allows different biases between blocks of various conditions, consistent differences detected between trials of different conditions may still be valid, depending on the nature of the statistical test. Because a valid conventional counterbalanced analysis requires averaging over all counterbalancing blocks of a given stimulus in order to achieve uniform bias over all stimulus conditions, the method is extremely inefficient in producing multiple identical-bias averaged trials of the same stimulus. In fact, univariate ANOVA is often performed across multiple non-identical-bias trials, which is fundamentally incorrect, and in practice weakens the power to discriminate two stimulus conditions.

2.3.3 Rapid-presentation event-related design

An ideal experimental technique would be able to produce multiple independent, unbiased, low variance estimates of the hemodynamic response to many conditions. Under assumptions of linear time invariant hemodynamic responses to conditions, a linear finite impulse response (FIR) model and the linear-least-squares linear algebra technique provide an efficient solution. Using this mathematical method, known as deconvolution or the general linear model, stimuli from various conditions can be scheduled in single events separated by intervals commonly as short as 2 seconds. Deconvolution is more than fifty years old, linear-least-squares approximation is older, and application to neural signal processing is at least twenty years old [24]. Deconvolution in [24] was presented in the Fourier domain, without reference to random processes. In the treatment below, a time domain approach is taken that includes additive Gaussian noise in the model. A more computationally efficient algorithm might combine these two approaches and perform the noise whitening and deconvolution in the Fourier domain.
Also, this method is univariate, with deconvolution performed on a per-voxel basis. Since correlations between adjacent voxels can be substantial, these correlations might be estimated and voxel information combined to produce more powerful estimates of impulse responses.

Event related design schedules brief presentations of stimulus conditions with short interstimulus intervals. Typical presentation times are 250 to 500 milliseconds for the stimulus and 1.5 to 2 seconds for the interstimulus interval. For the mathematical simplicity of avoiding presentations at half-samples of the MR machine, each stimulus is presented when scanning of the first slice of a series of slices is started. The time taken to scan the fMRI signal of all selected anatomical slices, and thus the sampling period of a voxel, is referred to as a TR in this section and in the neuroscience literature. This is in contrast to the previous section and the imaging literature, where TR refers to the relaxation time allowed a single slice in free induction decay based imaging. Also for simplicity in applying the deconvolution technique, the voxel of interest is assumed to be sampled every TR at some instant, whereas the sampling process actually requires time over which to perform the MR imaging procedure. It is not clear whether the fs-fast software implementation of deconvolution [21] accounts for the offset in sample times between slices. Many equations and notations in the discussion of deconvolution below have been directly quoted from [19].

The objective of deconvolution is to find the maximum likelihood estimate of the finite impulse responses of a given voxel to the various stimulus conditions. To estimate these responses, we are given the raw sampled MR signal timecourse and the time samples at which each of the conditions was presented. We begin with a simple generative model of the discrete-time signal we observe to quantify our assumptions about how it is related to the impulse responses of a single condition:

\[
y[n] = x_1[n] * h_1[n] + g[n]
\tag{2.2}
\]

where $y[n]$ is the observed MR signal, $x_1[n]$ is 1 if the stimulus condition was presented at time $n$ and 0 otherwise, $h_1[n]$ is the causal impulse response of the condition, and $g[n]$ is a colored zero-mean Gaussian random process. The issue of colored versus white noise is discussed later. Qualitatively, equation (2.2) suggests that the signal we observe is a linear superposition of noise and the stereotypical responses of the stimulus condition, shifted to start when the stimulus is presented. This is a basic concept of linear systems [36].

Over an entire experiment, the raw sampled signal from a voxel $y[n]$ has some finite number of time-points $N_{tp}$. The causal hemodynamic impulse response $h_1[n]$ of the voxel to the single condition will have some finite number of samples $N_h$. It is computationally convenient to re-express equation (2.2) in matrix form:

\[
y = X_1 h_1 + g
\tag{2.3}
\]

Here, $y$ is the $N_{tp} \times 1$ column vector $[y[1], y[2], \ldots, y[N_{tp}]]^T$ and $g$ is the vector $[g[1], g[2], \ldots, g[N_{tp}]]^T$. The term $X_1 h_1$ uses matrix multiplication to achieve an $N_{tp} \times 1$ vector with column components equal to the samples of $x_1[n] * h_1[n]$. $X_1$ is accordingly called a stimulus convolution matrix, and is an $N_{tp} \times N_h$ matrix formed as follows:

\[
X_1 =
\begin{bmatrix}
x_1[1] & 0 & 0 & \cdots & 0 \\
x_1[2] & x_1[1] & 0 & \cdots & 0 \\
x_1[3] & x_1[2] & x_1[1] & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_1[N_{tp}] & x_1[N_{tp}-1] & x_1[N_{tp}-2] & \cdots & x_1[N_{tp}-N_h+1]
\end{bmatrix}
\tag{2.4}
\]

Now defining $h_1 = [h_1[0], h_1[1], h_1[2], \ldots, h_1[N_h-1]]^T$, the product $X_1 h_1$ results in the desired $N_{tp} \times 1$ vector of the convolved signal $x_1[n] * h_1[n]$.
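A small numerical sketch (Python) shows the structure of equation (2.4): each column of $X_1$ is the onset sequence $x_1$ delayed by one more sample, so that the matrix product $X_1 h_1$ reproduces the truncated convolution $x_1[n] * h_1[n]$. The sizes, onset times, and toy response values are illustrative assumptions.

```python
import numpy as np

# Build the stimulus convolution matrix of equation (2.4) for a toy run and
# check that X1 @ h1 equals the convolution x1[n] * h1[n] truncated to N_tp.

N_tp, N_h = 12, 4
x1 = np.zeros(N_tp)
x1[[1, 5, 8]] = 1.0                     # condition presented at samples 1, 5, and 8

X1 = np.column_stack([np.roll(x1, j) for j in range(N_h)])
for j in range(N_h):                    # remove the wrap-around introduced by np.roll
    X1[:j, j] = 0.0

h1 = np.array([0.0, 1.0, 0.7, 0.3])     # toy FIR hemodynamic response
assert np.allclose(X1 @ h1, np.convolve(x1, h1)[:N_tp])
```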
The final simplifying step in converting to matrix notation is to allow inclusion of multiple conditions. For the $i$th condition, denote the $N_{tp} \times N_h$ stimulus convolution matrix $X_i$, and the $N_h \times 1$ hemodynamic response vector $h_i$. Now each of the $N_c$ conditions contributes to the generative model:

\[
y = X_1 h_1 + X_2 h_2 + \cdots + X_{N_c} h_{N_c} + g
\tag{2.5}
\]

Assigning $X = [X_1\, X_2\, \cdots\, X_{N_c}]$ and $h = [h_1^T\, h_2^T\, \cdots\, h_{N_c}^T]^T$, equation (2.5) reduces to:

\[
y = Xh + g
\tag{2.6}
\]

This simple linear equation simultaneously models the relationship between the observed single voxel MR signal timecourse $y$ and the unknown hemodynamic impulse responses of all $N_c$ conditions, contained in $h$. Here $y$ is $N_{tp} \times 1$, $X$ is $N_{tp} \times N_{ch}$, $h$ is $N_{ch} \times 1$, and $g$ is $N_{tp} \times 1$, where $N_{ch} = N_c \times N_h$.

Estimating $h$ in equation (2.6) is a problem with $N_{ch}$ unknown parameters. In general, for a fixed data set, variance in the estimate decreases as the number of unknown parameters decreases. This can be accomplished here by decreasing $N_h$, the length of the finite hemodynamic impulse response in the model. Alternatively, each hemodynamic response can be further modeled as a fixed-width gamma function as in equation (2.1). Then there are only $N_c$ parameters to estimate, the gamma function amplitude for each condition. This is accomplished as follows:

\[
y = XAp + g
\tag{2.7}
\]

Here, $Ap$ has been substituted for $h$ in equation (2.6). $A$ is an $N_{ch} \times N_c$ matrix that contains the desired stereotypical model impulse response function for each condition, such as a fixed-width gamma function. $p$ is the $N_c \times 1$ vector that contains the unknown amplitude-scaling constants for each condition. We desire a format for $A$ and $p$ such that, given assignments to $p$, the resulting $N_{ch} \times 1$ vector is a vertically concatenated series of impulse response waveforms for each condition, just as with $h$ in equation (2.6). This is accomplished with the following constructions:

\[
A =
\begin{bmatrix}
a_1 & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{N_c}
\end{bmatrix}
\quad \text{and} \quad
p =
\begin{bmatrix}
p_1 \\ p_2 \\ \vdots \\ p_{N_c}
\end{bmatrix}
\tag{2.8}
\]

where each $a_i = [a_i[1], a_i[2], \ldots, a_i[N_h]]^T$ is the $N_h \times 1$ model impulse response waveform for condition $i$, and each 0 denotes an $N_h \times 1$ vector of zeros.

As mentioned in section (2.2.3), the optimal choice of gamma function width can be made using an initial estimate of the raw impulse response with equation (2.6). Regardless of the exact form of the linear model, the problem of estimating hemodynamic impulse responses reduces to a stochastic model of the form $y = Xh + g$.

This is a classical problem of nonrandom parameter estimation [54]. We seek an unbiased minimum variance estimator $\hat{h}$ of the nonrandom (constant) parameter $h$. Unbiased means that on average, the estimator equals the value to estimate. To solve this problem, it is not possible to apply the Bayesian notion of minimizing the expected least squares difference between $h$ and $\hat{h}$, because the optimum choice is $\hat{h} = h$, which is an invalid estimator (it requires knowing $h$). Instead, the common approach is maximum likelihood, where $\hat{h}$ is chosen to maximize the probability of the observed data $y$. The steps below are quoted with minor alteration from the derivation in ([53], p. 160). Because $g$ is a jointly Gaussian random vector $\sim N(0, \Lambda_g)$ and $Xh$ is a constant, the distribution on $y$ is also jointly Gaussian. We choose $\hat{h}$ to maximize the probability density function on $y$ given $h$:

\[
p_y(y; h) = N(y; Xh, \Lambda_g)
\tag{2.9}
\]
\[
\propto \exp\left[-\tfrac{1}{2}(y - Xh)^T \Lambda_g^{-1} (y - Xh)\right]
\tag{2.10}
\]

To maximize (2.10), we must minimize the negative of the argument of the exponential term, equivalently

\[
J(h) = (y - Xh)^T \Lambda_g^{-1} (y - Xh)
\tag{2.11}
\]

We can now set the Jacobian of (2.11) to zero and solve for $h$. Alternatively, we can solve for $h$ as a deterministic linear least-squares problem.
To convert to the notation in [19], we write $J(h)$ as the squared norm of one factor of its Cholesky factorization. Note that the Cholesky factorization of an $n \times n$ symmetric positive definite matrix $A$ is $L^T L$, where $L$ is an $n \times n$ upper triangular Cholesky factor matrix. Such a factorization of $\Lambda_g^{-1}$ exists because $\Lambda_g$ is a full rank covariance matrix and hence positive definite, so $\Lambda_g^{-1}$ is positive definite as well. Writing $B^{-1}$ for the Cholesky factor of $\Lambda_g^{-1}$, we have $J(h) = \|B^{-1}(Xh - y)\|^2$. Rearranging variables accordingly, we have a deterministic linear least-squares problem:

\[
\hat{h} = \arg\min_h \|B^{-1}(Xh - y)\|^2
\tag{2.12}
\]
\[
= \arg\min_h \|B^{-1}Xh - B^{-1}y\|^2
\tag{2.13}
\]
\[
= \arg\min_h \|Fh - z\|^2
\tag{2.14}
\]

where $F = B^{-1}X$ and $z = B^{-1}y$. Observe that $B^{-1}$ is $N_{tp} \times N_{tp}$, $X$ is $N_{tp} \times N_{ch}$, $h$ is $N_{ch} \times 1$, and $y$ is $N_{tp} \times 1$. Consequently, both $Fh$ and $z$ are $N_{tp} \times 1$. In geometric terms, equation (2.14) states that we desire a vector $Fh$ in the subspace of $\mathbb{R}^{N_{tp}}$ spanned by the column vectors of $F$ that is the closest in Euclidean distance to the vector $z$ in $\mathbb{R}^{N_{tp}}$. The linear projection theorem [1] states that the vector in the $F$ column subspace closest to $z$ is the projection of $z$ onto the $F$ column subspace. By the orthogonality principle [53, 45], the error vector $r = F\hat{h} - z$ is orthogonal to the $F$ column subspace. In other words, $r$ is in the left nullspace of $F$, or the nullspace of $F^T$ [45]. We solve the orthogonality condition for the least-squares choice of $h$:

\[
F^T (F\hat{h} - z) = 0
\tag{2.15}
\]
\[
F^T F \hat{h} = F^T z
\tag{2.16}
\]
\[
\hat{h} = (F^T F)^{-1} F^T z
\tag{2.17}
\]

Substituting from equation (2.13) for $F$ and $z$,

\[
\hat{h} = \left((B^{-1}X)^T (B^{-1}X)\right)^{-1} (B^{-1}X)^T (B^{-1}y)
\tag{2.18}
\]
\[
= \left(X^T (B^T B)^{-1} X\right)^{-1} X^T (B^T B)^{-1} y
\tag{2.19}
\]

where in equation (2.19), we have used the properties that $(QRS)^T = S^T R^T Q^T$ and $(R^T)^{-1} = (R^{-1})^T$ for arbitrary equal dimension square matrices $Q$, $R$, $S$, with $R$ and $R^T$ invertible. We can rewrite equation (2.19) in terms of $C$, the normalized noise covariance matrix of $g$, with $C = B^T B$. We use $\hat{h}$ to denote this choice of $h$ as the maximum likelihood estimate:

\[
\hat{h} = (X^T C^{-1} X)^{-1} X^T C^{-1} y
\tag{2.20}
\]

To calculate the error covariance, write down the definition of covariance, substitute (2.20) for $\hat{h}$, and simplify (first and last steps shown here):

\[
\Lambda_e = E[(e - \bar{e})(e - \bar{e})^T]
\tag{2.21}
\]
\[
= E[(\hat{h} - h)(\hat{h} - h)^T]
\tag{2.22}
\]
\[
= (X^T C^{-1} X)^{-1}
\tag{2.23}
\]

In the general case, a maximum likelihood (ML) estimator need not be efficient in the sense of achieving the Cramer-Rao bound. Quoting from [53], if an efficient estimator exists, it must be the ML estimator. Efficient estimators always exist for nonrandom parameters in a linear model with additive Gaussian noise as in (2.7). This statement is not proven here, but requires calculation and comparison of the Cramer-Rao bound to the ML estimator error covariance.

Because the normalized noise covariance matrix $C$ is not available a priori for the given voxel, the current deconvolution software [21] first estimates $h$ with equation (2.20) assuming $g$ is white noise, i.e. $C = I$, the identity matrix. The residual error $r = y - X\hat{h}$ is considered to be a timecourse example of the noise $g$ (recall the original model, $y = Xh + g$). Time course examples are culled from all within-skull voxels. The autocorrelation of $g$ is estimated from the time course examples in some unspecified way, probably the brute-force no-window method. See [35] for better methods of estimating autocorrelation from samples. The following parameterized autocorrelation function [19] is fit to the estimated autocorrelation, normalized so that $R_r[0] = 1$:

\[
R_r[k] =
\begin{cases}
(1 - \alpha)\rho^{|k|}, & 0 < |k| \le k_{max} \\
0, & |k| > k_{max}
\end{cases}
\tag{2.24}
\]

No justification for this particular form is provided, although it does bear resemblance to all-pole modeling [2].
The following parameterized autocorrelation function [19] is fit to the estimated autocorrelation, normalized so that R[0] = 1: R [k] = ( - a )pIkI 0 < k < kmax 0 Iki > (2.24) kmax No justification for this particular form is provided, although it does bare resemblance 40 to all-pole modeling [2]. Once o and p are chosen, the entries for C are drawn from R, [k]: C= Rr[0] R[1] R[1] Rr[0] ... Rr[Ntp] Rr[Ntp - 1] Rr[Ntp] Rr[Ntp - 1] (2.25) R[0] This matrix is Toeplitz, meaning simply that any diagonal or subdiagonal has identical elements. A more complete discussion of the conditions for invertibility of XTC-1X and C1 is not provided here. The estimator h is unbiased, as can be seen by substituting y = Xh + g into equation (2.20) and taking the expected value E[h]: E[h] = E[(XTC-lX)-lXTC-1y] = E[(XTC-lX)-lXTC-1(Xh + g)] = E [(XTC-lX)-1 (XTC-X)h] because g is zero-mean Gaussian = E[h] =h (2.26) The hemodynamic response estimation method based on a fixed-width gamma function impulse response is in general not an unbiased estimator, but as discussed earlier, the variance of the estimator may be reduced due to fewer parameters. A separate concern is the presence of deterministic additive drift signals, due to instrumentation, breathing, or other physiological signal source. Detrending the data with low order polynomial functions of time can be accomplished by adding a oto + ait 1 + ... terms to the generative model, and fitting the trends and hemodynamic responses together. This is discussed in [19]. Noise waveforms like breathing, measured concurrently with the experiment, can be similarly "regressed out" of the fMRI signal. 41 The quality of the unbiased estimator h can be measured by the variance in estimation error e = h - h. Recall from (2.23) that E[eeT] is a Nh x Nh covariance matrix Ae = (XTC1X)1, but a simpler summary statistic is provided by E[eTe] = trace(Ae). The associated literature [7] calls the reciprocal of this quanitity the "efficiency", which is accordingly: E= 1 1(2.27) trace((XTClX)l) A strange phenomenon was reported [7] by which employing variable interstimulus intervals, and thereby affecting the stimulus convolution matrix X, this efficiency increases dramatically as the mean interstimulus interval decreases, whereas the opposite trend holds for fixed width interstimulus intervals. No explanation was provided in the article, but the effect presented is dramatic. Current work [38] on characterizing the statistical nature of physiological and instrument noise, hemodynamic response profiles, and signal summation, will yield more reliable and flexible methods for measuring stimulus response in neuroscience studies. 2.4 Analysis of fMRI-estimated brain activity With one or multiple independent estimates of hemodynamic response isolated for voxels of interest, questions of physiological relevance can be answered. The most prevalent question in the literature asks to what extent single voxels or average responses from anatomically focal sets of voxels differentiate between stimuli. The broader problems are (i) to what extent responses from sets of voxels can be used to discriminate the stimulus conditions, (ii) how these voxels encode information about the stimulus, and (iii) where in the brain these voxels are present. These problems can be generally described as codebook problems [52], because they seek to describe a method by which brain activity can be matched with an 42 associated event. 
A complete understanding of the mechanisms of the brain could generate a codebook, but the converse is not true because the codebook does not specify a physical mechanism. Moreover, every neuron or network of neurons could not feasibly be subjected to every possible condition, so that a generalizeable theory is necessary to complete this codebook. Most methods, like ANOVA or split-half decoding analyses, impose a specific decoding function or a set of functions from which to choose. The most prevalent version of ANOVA in the associated literature imposes a univariate Gaussian model. Split-half based SVM imposes a restricted class of indicator functions over which it optimizes. Analyses that estimate non-parametric probability density functions directly quantify the codebook questions. Information theory and Bayesian networks operate in an assumption free manner at the cost of increased datapoints. Some methods require estimation of a parametric or non-parametric probability density function (PDF). This includes most classical statistics, like ANOVA and information theory. Methods that do not require estimation of PDFs generally include learning machines, especially SVMs, that select from a set of decoding functions based on empirical risk minimization. The classical split between statistical methods is descriptive versus exploratory. Descriptive methods seek to accept or reject a hypothesis regarding a specific relationship between the hemodynamic responses and the stimuli. Exploratory methods, often graph-based, extract some property of the responses without knowledge of the stimuli, and then seek to relate the property to the stimuli. An important division between techniques is univariate versus multivariate. Most fMRI neuroscience work focuses on voxel-by-voxel analysis of the codebook questions, but the fMRI literature since last year (2002) is becoming increasingly aware of the need to explore vectors of voxel responses [25, 44, 6]. Electrophysiology has already transitioned to multivariate analysis with regards to the codebook problem [11], multiunit recording, and prosthetics design. EEG and MEG have found multivariate techniques essential to analysis [15]. Additionally, there is increasing awareness of the importance of multiscale and multiresolution searches [43]. 43 Finally, most methods define to varying degrees, a generalizable theory that extends the codebook from a few stimuli to the entire set of stimuli. The most explicit of these methods are called models, and predict the behavior of the system of interest under a wide range of conditions. Some biologically relevant models incorporate aspects of the underlying physiology. The following sections review ANOVA, principle components analysis (PCA), and support vector machines (SVM), three methods that are employed in this study. For each of these methods, the reader is referred to standard texts [48, 41] for detailed development. These methods represent the various divisions in analysis methods. However, many potentially useful techniques have been omitted from discussion, including information theory and mutual information, independent components analysis (ICA), dimensionality reduction algorithms [33, 42, 47], other data compression algorithms, other learning machines like neural networks, and classical detection/estimation methods like linear predictive (Wiener) filters, and adaptive (Kalman) filters. 
2.4.1 Univariate analysis of variance (ANOVA)

The face detection areas of the brain are identified in the literature and in the experiments in this thesis by sets of anatomically localized voxels that produce significantly higher amplitude responses to faces versus nonfaces. Localized patches of face-selective voxels in the temporal lobe and extrastriate occipital lobe have been defined in the literature as the fusiform face area (FFA) and occipital face area (OFA) respectively. Because of its simplicity, univariate ANOVA has been the most popular choice for making this statistical comparison. In the procedure, an individual voxel is selected and the gamma-fit amplitudes of impulse responses to the engaged and disengaged conditions are compared. The voxel is called "significant" if the probability that the amplitudes belong to one distribution is less than some threshold, typically 0.05. In general, the region of interest includes individual voxels whose response amplitudes are significantly different between two conditions. The literature typically only includes discovery of regions of interest that are anatomically localized sets of ten or more 27 mm^3 voxels. Scattered and small sets of significant voxels are disregarded as sampling anomalies because of the large number of total voxels (about 40,000 in this study) examined. The Bonferroni correction [49] relates to this issue.

Univariate ANOVA calculates the ratio of sample variance across groups to sample variance within groups, to determine which of two or more groups are indistinguishable. The F-statistic, developed on chi-square random variables to measure the significance of this ratio, is based on the premise that trials from each condition are drawn from a Gaussian distribution of amplitudes. When only two groups are compared, the technique is equivalent to a t-test. Details of univariate ANOVA and variations on the technique are provided in [41]. It should be noted that nonparametric statistical techniques, including the Kolmogorov-Smirnov test [50], mutual information [5], and the SVM classifier gradient [14], have also been used in the literature with results similar to a t-test in the isolation of the FFA.

2.4.2 Principal components analysis (PCA)

Suppose we are given a p-dimensional response feature of each voxel in a region of interest (ROI) for multiple trials of several conditions. This feature might be one dimensional, as with the amplitude of a gamma fit to the response, or h-dimensional, with one dimension for each time-sample of the hemodynamic impulse response. The response feature of a given voxel i across a set of stimulus conditions describes a random vector [x_i, ..., x_{i+p-1}]^T. To describe the responses of a set of m ROI voxels, we have a vector of random variables x = [x_1, x_2, ..., x_n]^T, where n = m x p. The objective of PCA is to choose the linear transformation A that rotates x into a vector of uncorrelated random variables y by y = A^T x. Rank ordered by their own variance, the first s of these random variables, [y_1, ..., y_s]^T, form the s-dimensional subspace with the smallest average squared error between an n-dimensional ROI response and an s-dimensional approximation. The following discussion of PCA draws heavily from class lectures ([54] on 9/11/02 and 9/13/02); many equations are quoted directly from those lectures. The PCA axis rotation matrix A can be derived from the expected least-squares approximation criterion using the method of Lagrangians.
For simplicity of discussion, we instead present the matrix A, and demonstrate that the resulting random vector y = A^T x indeed consists of n uncorrelated random variables [y_1, ..., y_n]^T. Define the n x n covariance matrix \Lambda_x of the n x 1 ROI voxel response vector x:

\Lambda_x = E[(x - \bar{x})(x - \bar{x})^T]    (2.28)

Observe that \Lambda_x is symmetric because \Lambda_x = \Lambda_x^T, and positive semidefinite because it can be written as \Lambda_x = E[F^T F] with F = (x - \bar{x})^T ([52], p. 50). Quoting from ([52], p. 49), a symmetric square matrix A is positive semidefinite iff x^T A x \ge 0 for all x, and positive definite iff x^T A x > 0 for all x \ne 0. Symmetry implies that the matrix has n linearly independent, orthonormal eigenvectors v_j, meaning that v_j^T v_k = 1 if j = k and 0 otherwise. Positive semidefiniteness implies that all eigenvalues of \Lambda_x are real and non-negative, \lambda_i \ge 0. Observe that this does not guarantee \Lambda_x is full rank, which would mean invertible with \lambda_i > 0. A positive semidefinite matrix is positive definite if and only if it is invertible [52]. Consequent to these conditions, we are guaranteed that the following construction of the n x n matrix A exists. Define the columns of A to be the n orthonormal eigenvectors of \Lambda_x:

A = [ v_1   v_2   ...   v_n ]    (2.29)

Because the eigenvectors are orthonormal, A^T A = I, the identity matrix, meaning that A is orthogonal, A^T = A^{-1}. Define the diagonal eigenvalue matrix

\Lambda = diag(\lambda_1, \lambda_2, ..., \lambda_n)    (2.30)

Then \Lambda_x A = A \Lambda. Since A is orthogonal, it is invertible, so \Lambda_x = A \Lambda A^{-1}, and using A^{-1} = A^T for A orthogonal, we have \Lambda_x = A \Lambda A^T. Consider the data random variables x rotated into the new set of random variables y = A^T x. Quoting directly from [54], the covariance matrix \Lambda_y is calculated as follows:

\Lambda_y = E[(y - \bar{y})(y - \bar{y})^T]    (2.31)
          = E[(A^T(x - \bar{x}))(A^T(x - \bar{x}))^T]    (2.32)
          = E[A^T(x - \bar{x})(x - \bar{x})^T A]    (2.33)
          = A^T E[(x - \bar{x})(x - \bar{x})^T] A    (2.34)
          = A^T \Lambda_x A    (2.35)
          = A^T A \Lambda A^T A    (2.36)
          = A^{-1} A \Lambda A^{-1} A    (2.37)
          = \Lambda    (2.38)

with \Lambda as given in equation (2.30). Because the covariance matrix \Lambda_y is diagonal, we have shown that the change of coordinates y = A^T x rotates x into a set of uncorrelated basis random variables y. In PCA terminology, the eigenvectors listed in the columns of A are known as the principal components of the observed data x.

The final relevant property of PCA is the minimization of expected least-squares error in subspaces. Consider the best s-dimensional approximation to the random vector x, given by

\hat{x} = \sum_{i=1}^{s} (\phi_i^T x)\phi_i    (2.39)

and its residual error vector

e = x - \hat{x}    (2.40)

The vectors \phi_i define a basis for our approximation subspace. The optimal choice of subspace that minimizes the expected square residual error E[e^T e] is found through the Lagrangian technique briefly mentioned at the outset of this discussion on PCA. The result is to choose (\phi_1, \phi_2, ..., \phi_s) to be the first s eigenvectors of \Lambda_x, ranked from greatest to least eigenvalue. The calculation reveals that the least average sum of squared residual errors, based on PCA, is

E[e^T e] = \lambda_{s+1} + \lambda_{s+2} + ... + \lambda_n    (2.41)

where the approximation subspace is s-dimensional and the original space is n-dimensional. The utility of the result in equation (2.41) is that we can now examine the ROI response vector across a multitude of stimulus conditions and determine the number of uncorrelated random variables that would be needed to explain the dynamics of the output. Comparing this against some notion of the dimensionality of the inputs, we might postulate upper bounds regarding the degrees of freedom or information capacity of the ROI voxel set and the brain system itself.
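A minimal Matlab sketch of the PCA computation in equations (2.28) through (2.41), assuming a data matrix Xdata of size n x T (n = voxels times feature dimensions, T = stimulus conditions). The variable names are illustrative and are not taken from the thesis analysis code.

    Xc      = Xdata - mean(Xdata, 2) * ones(1, size(Xdata, 2));   % remove the mean
    LambdaX = (Xc * Xc') / (size(Xdata, 2) - 1);                  % sample covariance, eq. (2.28)

    [A, L]       = eig(LambdaX);            % columns of A are eigenvectors, eq. (2.29)
    [lam, order] = sort(diag(L));           % rank eigenvalues from greatest to least
    lam   = flipud(lam);
    order = flipud(order);
    A     = A(:, order);

    Y       = A' * Xc;                      % uncorrelated coordinates y = A'x
    varExpl = cumsum(lam) / sum(lam);       % variance explained by the first s PCs
    proj2d  = Y(1:2, :);                    % projection onto PC1 and PC2 for visualization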
The caveat of developing an explanatory model with a linear combination of uncorrelated random variables is important and at times intolerable. Simple deterministic nonlinear systems can transform an input of only one independent random variable into an output that requires many PCA uncorrelated random variables. Even if the system is linear, it may be difficult to determine the true dimensionality of the data, rank(\Lambda_x), as noise typically ensures that experimental data sets are full rank with some decaying profile of eigenvalues.

Projecting the high dimensional voxel response data onto the first two most dominant eigenvectors allows a least-squares visualization of the data. The graphs generated from the projection of this data onto the eigenvector subspace allow for an exploratory mode of data analysis. The factors most discriminated by the dominant eigenvector modes of the data can be visualized. This method is most effective when ROI voxels are highly correlated, so that the data lies mostly in a two dimensional subspace despite being embedded in a high dimensional space.

2.4.3 Support vector machines (SVM)

While PCA in the form described above is a useful tool for the exploration and characterization of variance in the data, it does not provide a direct quantification of how well an ROI distinguishes between stimulus conditions. The support vector machine does perform such a direct quantification, typically between two stimulus conditions (binary classification). M-ary classifications are accomplished with log_2 M binary classifier SVMs. As with any learning or prediction algorithm, the evaluation process includes training and testing. The training data set, also called a labeled data set, includes examples of the multidimensional ROI response paired or "labeled" with the stimulus that produced the response. The SVM optimizes its parameters based on the labeled data set so as to minimize the probability of misclassification on the testing data set, which is assumed to be drawn from the same distribution as the labeled set. The success of the binary SVM or system of SVMs on the test data provides a lower bound estimate of the amount of information available in the ROI to discriminate the stimulus conditions.

Binary SVM classification is based on finding the hyperplane in the desired data space that separates the two conditions so as to minimize testing error. Because the dividing boundary is a hyperplane (w \cdot x + b = 0), any test point can be classified by essentially evaluating the sign of the dot product between the normal vector of the hyperplane and the test point vector, f(x) = sign((w \cdot x) + b). This notation is taken directly from [13]. More sophisticated decision boundaries are achieved by first performing a nonlinear mapping of the data space. The optimal hyperplane in the new data space has some more complex shape in the original data space based on the mapping. When using a nonlinear map to classify the test data, it can be cumbersome to first perform the map and then evaluate the classifying dot product. In some cases, it is possible to express the classifying dot product as a function of the dot product in the original data space. This expression is called a kernel, and can save multiplications, especially when the new data space is higher dimensional than the original data space. Since kernels are computationally efficient and determine the SVM decision boundary, much research has focused on crafting kernels that are customized for specific applications.
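A small Matlab sketch of the decision rules just described, assuming a trained linear SVM with weight vector w and offset b and, for the kernelized case, support vectors SV (one per row), their labels svLabels, and multipliers alphas obtained from training; all of these names are hypothetical stand-ins, and the Gaussian (RBF) kernel is only one common choice, not one used in the thesis.

    % Linear decision rule: f(x) = sign((w . x) + b)
    f_linear = sign(w' * x + b);

    % Kernelized decision rule: the dot product is replaced by a kernel evaluation
    sigma = 1;
    K     = exp(-sum((SV - ones(size(SV,1),1) * x').^2, 2) / (2 * sigma^2));
    f_rbf = sign(sum(alphas .* svLabels .* K) + b);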
A brief introduction to SVM is provided in [13], and a comprehensive textbook has been written by Vapnik [48]. The ability of SVMs to use fewer training data points than probability density function (pdf) based maximum likelihood (ML) estimators is not magical. This ability is shared by pdf methods that use a parameterized model of the pdf rather than trying to nonparametrically estimate the pdf. For example, if a multivariate Gaussian distribution with independent voxels was the appropriate model for the response of an ROI under one condition, then the multivariate mean and variance of the Gaussian could be estimated with relatively few training data points. If no assumptions were made about the shape of the pdf, then any windowning technique on those few training points (e.g. Parzen windows) would be a comparatively rough estimate of the true Gaussian distribution. Suppose however that despite the parametrized Gaussian assumption, the underlying distribution was not Gaussian at all. As the training set increased, the parameterized pdf approach would persist in its incorrect estimate of the true pdf, whereas the nonparametric method would converge to the true pdf. The disadvantage of the parametrized pdf approach is that its performance might not be good with any size training set if the underlying assumption is incorrect. The SVM performs well with few training points for the same reason as a param50 eterized pdf: it makes strong assumptions about the true underlying scatter of the data under each condition and uses the training data to tune those strong assumptions. And just as with parametrized pdfs, only when those assumptions are close to correct can the SVM perform well with few or many training points. With parameterized pdf models, these assumptions come in the functional form of the pdf that is chosen. With SVM, the assumptions are expressed by the chosen data space. ROI voxel responses under two conditions can be described by points in a multidimensional coordinate system where each point represents a trial, and each coordinate contains some information about the ROI's response during that trial. One example coordinate system discussed in 2.4.2 assigns each voxel's gammafit amplitude to each coordinate. The chosen data space could be this coordinate system or any other, perhaps a nonlinear mapping of the current data space to a new space of different dimensions. Because the SVM procedure finds a dividing hyperplane that minimizes training error in the chosen data space, it is essential that the underlying distribution of responses to the two conditions generally fall on separate sides of some dividing hyperplane. If this is not the case in the chosen data space, then the SVM will not have the power to "generalize" to the testing data set. Choosing the appropriate data space is as essential to the success of SVM as choosing the appropriate functional form in a parametric pdf approach. One application of SVM to stimulus classification in fMRI [6] reported modest success of a linear SVM on the data space with the average response of each voxel in the ROI recorded in a different coordinate. The third experiment of this thesis (chapter 5) employed a linear SVM on a dataspace using various features of voxel response for classification of the gender of the face stimulus. Results were not promising, and systemic non-stationarity that similarly affected both stimulus conditions was implicated. 
After estimating and removing the non-stationarity, successful SVM classification was demonstrated. Details of this experiment are discussed in chapter 5. 51 52 Chapter 3 Experiment One: FFA response to nonface stimuli The fusiform face area (FFA) in the posterior temporal lobe produces maximal fMRI signal when subjects view pictures with human faces [29]. Similarly, the parahippocampal place area (PPA) responds maximally to indoor or outdoor scenes [9]. Maximal response to a stimulus category implies the response of a region distinguishes members from nonmembers of the category, but does not rule out the representation of other stimulus information in the response. Here we present evidence that both regions' voxels represent nonpreferred-category objects in terms of their similarity to the preferred category, and both regions are sensitive to the difference between photographs and line drawings. Principal components analysis of the FFA and PPA responses to nonfaces and nonplaces respectively suggest one to three dominant eigenmodes of variation in the data. The first two nonpreferred stimulus response modes are biased towards the direction of variation between preferred and nonpreferred stimuli or the discrimination of photographs and line drawings. We propose that FFA and PPA voxel responses to nonpreferred stimuli are optimized for detection, i.e. discrimination between preferred and nonpreferred stimuli. In the experiment, eight subjects were scanned with fMRI by M. Spiridon for a separate study [44]. While they viewed 16 second epochs (blocks) of pictures, subjects performed a "one-back task" to demonstrate their attention, where they would 53 indicate consecutive presentations of the same picture by pressing a button. Consecutive presentation events would occur randomly within a block and with a frequency of two per block. Data from two of those subjects were corrupted by excessive head movement. Four of the remaining six subjects were analyzed for this study. Stimuli collected included seven different categories (faces, cats, houses, chairs, scissors, shoes, bottles) with grayscale photograph or line drawing versions of each picture. See [44] for pictures of the stimuli. Presentation was based on brute-force block design (section 2.3.1) as discussed in [44]. Standard motion correction was performed based on the fs-fast software program (reference fs-fast) and detrending was based on average amplitude of pre- and post-stimulus fixation intervals. FFA and PPA regions of interest (ROI) were localized with standard stimulus contrasts [44] and univariate ANOVA (section 2.4.1) implemented in fs-fast [21]. Trial-averaged block percent signal change from baseline hemodynamic response timecourses were computed separately for each voxel of each ROI. Subsequent to this processing, fourteen timecourses were accordingly available for each voxel of interest, including responses to photo and line drawing versions of each of the seven stimulus categories. These voxel responses across each ROI were then represented as points in multidimensional space, one for each stimulus category. First, a voxel response feature, a function of each voxel's block hemodynamic timecourse was chosen. The feature selected here was a vector of the full timecourse of a voxel's response, although the average amplitude or gamma-fit amplitude (gamma convolved with a square pulse for the block hemodynamic response shape) were also possible features. 
Note that an amplitude average of this timecourse would have provided a lower variance (and lower dimensional) feature of the block hemodynamic response. For each stimulus, a different ROI response vector was formed by concatenating the feature vector from every voxel in the ROI. The ROI response vectors were data points graphed in an n-dimensional data space, where n is the dimensionality of the voxel response feature times the number of voxels in the ROI. With 40 voxels and 10 timepoints per hemodynamic response, the space would be 400 dimensional. The variance in feature response across stimuli for each voxel was not normalized, although this would provide alternate insight by preventing higher signal range voxels from dominating the analysis.

First we performed PCA and verified that the set of fourteen data points included one dominant eigenmode both in the FFA and PPA (Figure 3-1 d and e respectively). This result was expected because the precondition on an ROI voxel is that it respond maximally to preferred category stimuli. One dominant eigenmode was not a trivial consequence of the precondition, because variance in the signal in other directions could have resembled or even dwarfed the face-nonface variation in magnitude. Nevertheless, this situation was less likely because the FFA average response (geometrically, the data projected onto the mean vector) also shows separation of face and nonface stimuli as the dominant source of variance. Since both preferred and nonpreferred category stimuli were present in the fourteen data points, the large magnitude separation between preferred and nonpreferred data points compared to any other source of variance resulted in one dominant eigenmode.

We visualized the high-dimensional data set by projecting the fourteen data points onto the first two principal components calculated by principal components analysis (PCA). This produces a least-squares estimate of the data configuration, as explained in section 2.4.2 on PCA. The resulting graphs (Figure 3-1 a,b,c) showed the large separation between nonpreferred and preferred stimuli required of ROI voxels. Both PPA and FFA graphs also demonstrated a systematic separation of data points across all subjects by classification as photographs and line drawings.

By removing the preferred stimulus data points we characterized the PCA dimensionality of the FFA and PPA voxel responses to nonpreferred stimuli. The preferred stimuli for FFA and PPA included faces and houses respectively; cats were removed from the analysis because they were not clearly categorized as faces or nonfaces. Unlike the PCA dimensionality of the entire data set, the dimensionality of nonpreferred points was independent of the preferred-nonpreferred ANOVA contrast selection of ROI voxels (Figure 3-1 g,h). The analysis demonstrated that both FFA and PPA responses to nonpreferred stimuli were embedded in low dimensional spaces. A summary of PCA variance with and without preferred stimuli across all four subjects in FFA, PPA, and all visually active voxels is given in Figure 3-2.

Figure 3-1: Typical results from one subject of PCA variance and projection analysis in (a,d,g) FFA, (b,e,h) PPA, and (c,f) all visually active voxels.
(a,b,c) ROI response vectors projected onto the first two principal components. Labels 1 and 2 indicate face stimuli; all other labels are non-face stimuli. Label 5 indicates house stimuli; all other labels are non-place stimuli. Photograph stimuli are denoted in boldface italics, and line drawings are in normal type. (d,e,f) Variance of the dataset along principal components including nonpreferred and preferred category response vectors. (g,h) Variance of the dataset along principal components excluding preferred category response vectors.

Figure 3-2: Sum of variance from PC1 and PC2 for data (a) including and (b) excluding own-category stimulus responses. The labels S1 through S4 denote the subject number, consistent between panels (a) and (b).

Despite removing the preferred stimulus data points, dominant eigenmodes persisted in the nonpreferred data points, as evidenced by the large percentage (typically greater than fifty percent) of variance explained by the first two PCs. What were these "nonpreferred eigenvectors", i.e. the eigenmodes of these nonpreferred stimuli? Did the different responses to nonpreferred stimuli mainly or exclusively result from their varying similarity to the preferred stimuli? We wanted to know whether the diversity of responses produced by an ROI to nonpreferred stimuli related at all to the function of distinguishing preferred and nonpreferred stimuli. This question was equated with the geometric question of whether either of the two most dominant nonpreferred eigenvectors was aligned with the "detection vector" that connected preferred and nonpreferred objects. The alignment was visualized by plotting the nonpreferred stimulus data on a plane defined by the eigenvector and detection vector coordinates (Figure 3-3). As with correlation, if the vectors were aligned, the data would fall neatly along a line. For every subject in both FFA and PPA, this effect was observed for either the first or second nonpreferred eigenvector. In the other eigenvector, misalignment was a result of the eigenvector coding for photograph versus line drawing rather than detection.

Figure 3-3: Typical plots from one subject of nonface principal component coordinates against face detection vector coordinates for stimulus responses in (a,b) FFA and (c,d) PPA. The detection vector coordinate connects the average face response vector to the average nonface response vector. (a,c) ROI nonface response vector coordinates projected onto PC1 of the nonface data points and the detection vector. (b,d) Similarly for nonface PC2. Alignment of data along a line indicates strong correlation between the two axis variables. Geometrically, this indicates either that the nonface data already falls along a line, that the vectors corresponding to the axis variables are aligned, or some mixture of these two scenarios. PC nonface variance analysis (Figure 3-1 g,h) suggests that the axis alignment interpretation is appropriate. The majority of variance is not explained exclusively by the first PC, so the data does not already fall along a line in the original response vector space.
Although forty percent of the variance falls along a line, this is not enough to allow any coordinate choice to trivially project the data onto a line, as evidenced by the coordinates chosen in (a) and (d). Axis alignment is quantified directly in Figure 3-4. The panel titles report the subtending angle (s.a.) for each plot (93, 156, 96, and 138 degrees for this subject).

Figure 3-4: Summary of the subtending angle between the detection vector and the nonpreferred category principal component in (a) FFA and (b) PPA for four subjects. If the subtending angle is greater than 90 degrees, the angle is subtracted from 180 degrees to reflect the alignment between vectors rather than the direction in which the vectors are pointed. Two vectors are completely aligned at 0 degrees and completely unaligned, or orthogonal, at a 90 degree subtending angle.

The alignment of the nonpreferred data eigenvector and the detection vector could be quantified by the angle subtending an eigenvector and the detection vector. With both vectors normalized, this angle was mathematically equivalent to the (zero-delay) correlation between the two vectors because A \cdot B = |A||B| \cos\theta. The smaller of the angles between the detection vector and the two dominant nonpreferred data eigenvectors was recorded across subjects for FFA and PPA (Figure 3-4). In both ROIs, and especially the PPA, the nonpreferred eigenvector is biased towards alignment (zero subtending angle) with the detection vector, as opposed to exhibiting uniform misalignment or consistently low alignment. Some of the misalignment can be explained by error in the calculation of the true detection vector and the true nonpreferred eigenvectors due to a limited data size and noise in the data coordinates.

One question that arose in the analysis was how much alignment should be considered strong alignment. Because angle is equivalent to correlation in this case, we can equivalently ask how much correlation is strong correlation. The standard comparison made in most research is against a null hypothesis of zero correlation, which would correspond to a uniform distribution of angles across subjects. This is clearly not the trend observed in the summary angle graphs. Because the FFA subtending angle is clustered around 38 degrees rather than zero degrees, the true FFA subtending angle is most likely not zero degrees in particular. The PPA subtending angle is lower than in FFA, but also exhibits significant variation across subjects. Nevertheless, because the nonpreferred principal component is typically aligned within 45 degrees of the detection vector in both cases, over 50 percent of the energy of the principal component modulation is in the detection vector direction. This is derived from the fact that the leg of a 45-45-90 degree triangle is 1/\sqrt{2} of the hypotenuse, and the signal energy here is the square of the magnitude of the response vector. Consequently, (1/\sqrt{2})^2 x 100 = 50%. Intuitively, a signal vector at 45 degrees to the x axis in the x-y plane has half of its power in each of the basis directions x and y.

In summary, this experiment showed with four subjects that (1) both FFA and PPA voxels are sensitive to photograph and line drawing stimulus differences, (2) voxels in both ROIs have low dimensional responses to nonpreferred stimuli, and (3) the dominant response variation to nonpreferred stimuli encodes information about detection and the photo/line drawing contrast.
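A short Matlab sketch of the subtending-angle measure described above, assuming a nonpreferred-data eigenvector pc and a detection vector detVec, both column vectors of the same length; the variable names are illustrative.

    cosTheta = (detVec' * pc) / (norm(detVec) * norm(pc));   % A . B = |A||B| cos(theta)
    thetaDeg = acos(cosTheta) * 180 / pi;

    % Fold angles greater than 90 degrees so that 0 = aligned, 90 = orthogonal
    if thetaDeg > 90
        thetaDeg = 180 - thetaDeg;
    end

    % Fraction of the eigenvector's energy along the detection direction
    energyFrac = cosTheta^2;       % cos(45 deg)^2 = 0.5, i.e. 50 percent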
Seminal work defined the FFA [29] and PPA [9] as regions responding (in ROI voxel averaged gamma-fit amplitude) maximally to faces and places respectively, establishing those regions as category detectors. This experiment demonstrated how this detection information is represented in FFA and PPA, by amplitude modulating one specific pattern of response across the ROI. An approximately orthogonal pattern of response is modulated and superimposed to signal contrast between photo and line drawing stimuli. A recent review on object recognition [22] asserted FFA invariance to photographs and line drawings, perhaps previously unobserved because this axis of variation is orthogonal to the face detection vector. Dimensionality of the response is related to the amount of information the region can support. We measured significant nonpreferred stimulus data variance in one to three dominant eigenmodes. This suggests that the nonpreferred response has the degree of freedom of at most three uncorrelated random variables. Because independent implies uncorrelated but not vice versa, the PCA dimensionality provides an 60 upper bound on the number of necessary independent explanatory random variables. Our PCA result suggests that PPA and FFA voxel response can only encode at most three independent feature characteristics of nonpreferred stimuli. Two of those characteristics are similarity to the preferred category and the contrast that distinguishes photographs and line drawings. The standard objection to a performance bound such as this is that performance might be improved by a noise-free experiment or elimination of the hemodynamic spatiotemporal smoothing of the neural response. However, these objections do not obviate an empirical performance bound. The bound can be qualified by an accompanying report of the noise characteristics. Furthermore, the fMRI spatiotemporal smoothing of neural response and analysis of tens of voxels that contain millions of neurons is not a drawback to the performance bound. Rather, the performance bound characterizes the neural system at the coarse scale. Because systems can have different properties at various scales of observation, the fMRI performance bound gives insight complementary to electrophysiology. 61 62 Chapter 4 Experiment Two: FFA response to face stimuli The fusiform face area (FFA) of the posterior temporal lobe was identified as an anatomically focal set of voxels that responded maximally when subjects viewed human faces [29]. While this evidence supports face detection coding, it is unknown whether the FFA further codes finer-grained information about faces, like individual identity, familiarity, emotional expression, or age. Here we present evidence that the FFA response to a diverse set of face stimuli has a dominant eigenvector, but sufficient remaining uncorrelated variance to potentially code further information about faces. Multivariate analysis suggests that race and gender are two contrasts coded by the FFA. A subsequent univariate test suggests a simple modulation scheme exists in the FFA for coding gender. This motivates the separate study of FFA and gender presented in the next chapter (chapter 5). We propose that unlike the response profile across nonpreferred stimuli (chapter 3), the FFA response to face stimuli is sufficiently rich to potentially support coding of additional face features. In this experiment, two individuals were scanned with fMRI while viewing stimuli drawn from a set of sixteen faces and one car. 
The stimulus set was constructed from the orthogonal crossing of four contrasts: adiposity, age, race, and gender (Figure 4-1). The response of the FFA to each of the seventeen stimuli was measured using an event-related design (section 2.3.3) with between 58 and 68 presentations of each stimulus. FFA voxels were isolated by a two-run blocked design face-nonface contrast localizer. Data from one individual were eliminated from the study because their FFA, as isolated by this independent localizer, appeared to respond more to cars than faces. Motion correction, detrending, and event-related deconvolution were performed using the fs-fast software application [21]. The processed data set included event-related hemodynamic impulse response timecourses under the seventeen stimulus conditions for every voxel of the FFA region of interest (ROI).

As with the nonface analysis (chapter 3), the ROI response to each stimulus was then represented by a point (vector) in a high dimensional data space. The feature extracted from each voxel impulse response was its maximal amplitude, without gamma fit. The ROI response vector for a given stimulus condition was formed by concatenating the feature vector of every voxel in the FFA. The processed data set now constituted seventeen points in this stimulus response data space.

By removing the car stimulus data point we characterized the PCA dimensionality of the FFA voxel responses to preferred stimuli. As with the PCA dimensionality of the nonpreferred stimuli (chapter 3), the dimensionality of preferred points was unconstrained by the preferred-nonpreferred ANOVA contrast selection of ROI voxels, and moreover was based on an independent data set. The analysis demonstrated that FFA responses to preferred stimuli were not embedded in low dimensional spaces (Figure 4-2 a). Instead, the eigenvector variances included one prominent eigenvector, with the remaining eigenvector variances dropping off smoothly. In comparison to an independent Gaussian random vector of the same dimensionality (Figure 4-2 b), the remaining variance after the first eigenvector dropped off in a similarly gradual fashion. The relevance of this result is that while thirty percent of the diversity in response to face stimuli followed one set modulated pattern (an eigenvector possibly related to detection coding, as with nonface stimuli in chapter 3), the remaining seventy percent of variance across stimuli included largely uncorrelated voxel responses. This result implies that, although zero correlation does not prove independence, the responses to face stimuli do not contradict the possibility of several explanatory independent random variables.

Figure 4-1: Stimulus set used in the study of FFA response to faces. The set includes pictures of sixteen faces and one car. Four contrasts are equally represented, including adiposity (columns 1 and 2 versus 3 and 4), age (rows 1 and 2 versus 3 and 4), gender, and race. Faces were obtained from [37], with the exception of the African Americans in the second row and third column, which were photographed by the author, and the adipose young Caucasian male, obtained from the laboratory. Two face images (elderly, African American, female) have been withheld as per the request of volunteers.
Figure 4-2: Variance of PCA components (a) across the FFA response to faces, with preferred stimulus data from one subject, and (b) generated from a Gaussian random vector with independent, identically distributed N(0,1) elements. Dimensions and dataset size match between (a) and (b).

In contrast to nonface stimuli, the FFA response to face stimuli shows the rich variation that would be needed to encode multiple stimulus features. Because our stimulus set was balanced across adiposity, age, race, and gender, the data afforded the opportunity to directly test FFA sensitivity to these four contrasts. In order to exploit the coding flexibility of our multidimensional data space, we employed a multivariate ANOVA to compare the separation in the mean for each contrast. Using the built-in Matlab 6.0 function manova1, data were first labeled by the subcategory of a given contrast. With the age contrast, for example, each data point was labeled "old" or "young" according to the age of the face stimulus. The manova1 function received the labeled data set and calculated the optimal linear projection that would best separate the two subcategories. As described in the Matlab function reference, this reduces to the minimization of a quadratic form very similar to PCA. With the data projected onto this optimal subspace, the standard ANOVA statistic was calculated on the one-dimensional subspace that afforded the greatest ANOVA score. The resulting projections and p-values are presented in Figure 4-3.

The multivariate ANOVA analysis showed that the FFA significantly differentiated race and gender, but not adiposity or age. The race effect was previously reported [18] in a study that employed blocked design to demonstrate elevated average FFA signal to face stimuli from a subject's own race. A blocked design might be particularly susceptible to attentional bias towards own-race stimuli.

Figure 4-3: Multivariate ANOVA across (a) race, (b) gender, (c) age, and (d) adiposity for FFA response to faces in one subject. The data is projected onto the optimal category-separating axis, canonical variable 1, abbreviated "c1"; this value is plotted along the y axis in each panel. The corresponding p-value is printed on each panel, indicating the probability that the data belongs to a single Gaussian random variable (race: d=1, p=0.0055; gender: d=1, p=7.187e-07; age: d=0, p=0.2332). The accompanying d value indicates that the d-1 Gaussians null hypothesis can be rejected at p < 0.05. Here, we compare only two groups in each panel (e.g. male/female), so the maximum d value is 1.

Figure 4-4: ANOVA on a one-dimensional FFA response measurement to male and female stimuli (F=4.14, Prob>F = 0.0614). For each voxel, the hemodynamic impulse response was fit with a polynomial and the maximum amplitude extracted. This value was averaged across all voxels in the FFA to produce the response feature analyzed with this ANOVA. Panel inscriptions denote the ANOVA F-statistic value and the p-value (Prob>F), the probability that a single univariate Gaussian would produce separation between male and female stimulus responses equal to or greater than that observed here. A p-value of 0.0614 is near the conventional significance threshold of p=0.05.
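A hedged sketch of the multivariate ANOVA step described above, using the Statistics Toolbox function manova1. It assumes a matrix roiVectors with one row per face stimulus and one column per data-space coordinate, already reduced to fewer dimensions than data points (manova1 requires a nonsingular within-group covariance), plus a label list for one contrast; the names and label ordering are illustrative only.

    labels = {'male','male','male','male','male','male','male','male', ...
              'female','female','female','female','female','female','female','female'};

    [d, p, stats] = manova1(roiVectors, labels);   % d = 1 rejects equal group means at 0.05

    % stats.canon holds the data projected onto the canonical variables; the first
    % column (c1) is the optimal category-separating axis plotted in Figure 4-3.
    c1 = stats.canon(:, 1);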
This present experiment differed from the published result in that it employed event-related design, and the fast, random category presentation style might arguably mitigate attention-based modulation. Moreover, rather than comparing average FFA response between two blocks of exemplars, we compared multivariate responses to independent exemplars of each race, gender, adiposity, and age. One concern in interpreting the multivariate ANOVA results was that perhaps the high significance scores were an artifact of having many fewer data points than the dimensionality of the coordinate space. Could any 16 points, embedded in a 200 dimensional space and partitioned arbitrarily into two categories, produce a significant multivariate ANOVA score? While the choice of linear projection to maximize category separation is greater with more coordinate dimensions, many of those projections are equivalent because a 16 point data set is spanned by a 16 dimensional space. And while any N different points in an N-dimensional space can be perfectly separated with a linear projection (because of the VC dimension of the linear SVM, and the linear projection onto the normal to the separating hyperplane), this does 68 not guarantee that the resulting projection will be sufficiently separated to produce a significant ANOVA score. Empirically, not all sets of 16 data points result in a significant p-value. This is illustrated by the insignificant multivariate ANOVA p-value for age and adiposity (Figure 4-4). Nevertheless, a null hypothesis that depends on both the number of data points n and the n degrees of freedom in choosing an optimal projection would be more appropriate and conservative than the one degree of freedom assumed by the ANOVA null hypothesis used here. In order to further investigate the coding of gender in FFA, we asked whether the polynomial-fit impulse response amplitude averaged across all FFA voxels might differentiate gender (Figure 4-4). The corresponding univariate ANOVA, performed using Matlab's built-in function, reported separation in the means at a p-value of 0.0614, just missing the standard 0.05 p-value significance threshold. Accordingly, the third experiment of this thesis (chapter 5) was an effort to increase statistical power by gathering additional data points per subject by focusing exclusively on the gender contrast. In summary, this experiment showed that (1) the FFA voxel set has one prominent mode but an otherwise high dimensional response to preferred stimuli, and (2) the FFA is sensitive to race and gender of face stimuli. Unlike the situation with nonface stimuli, the FFA response to face stimuli is sufficiently diverse to potentially support coding of many stimulus features. The original characterization of the FFA [29] was as a face detector, with maximal average ROI voxel response to faces. This experiment extends the concept of the FFA response as a face detection signal, and suggests that it may be capable of coding finer grained information. At the very least, the gain on the face detection eigenmode may be biased by race and gender. In the context of recent literature which reports average FFA signal modulation by race [18] and familiarity of the face stimulus to the subject [30], this result broadens the classical concept of the FFA as face detector. 
One interesting further question regarding this last point is whether, given the documented race effect, the multidimensional detection vector that separates face from nonface points lies in the same direction as the detection vector between African American and Caucasian faces. This would differentiate between the case in which race simply enhances the face detection signal (allowing for better face detection), and the case in which race is coded in a different, perhaps orthogonal way from face detection. With repeated measures of the subtending angle between these two vectors, a simple statistical t-test could be used to determine whether the subtending angle was significantly different from zero. Subsequently, the degree to which the vectors were orthogonal would quantify the degree to which race contributed energy to the face detection eigenmode (vector). A similar question could be investigated regarding the gender contrast.

Chapter 5

Experiment Three: FFA response to stimulus gender

Because the fusiform face area (FFA) is defined by its face detection signal [29], results that indicate other information in the FFA broaden this classical characterization. Two recent papers have demonstrated other information in the FFA, namely race [18] and the presence of nonface objects for which people are experts [17]. Here we present evidence that the FFA signal is biased by the gender of the stimulus. Employing an fMRI counterbalanced block design, we measured an increased average response amplitude in the FFA for one stimulus gender versus the other within runs. The preferred stimulus was male for one male subject, and female for the remaining three subjects, one of whom was female. Consequently, the effect cannot be explained by low level stimulus confounds like brightness, or by high-level phenomena like differential attention to same or opposite sex faces. We propose that the FFA is biased by stimulus gender.

In this experiment, four subjects were scanned with fMRI while viewing 16 second epochs of pictures and performing a one-back task. A four-block counterbalanced fMRI design (section 2.3.2) was employed with epochs for both genders and two nonface objects. Faces of both genders were white, of average adiposity, middle-aged, homely, and plain in expression. This homogeneity was intended to focus the dominant contrast between stimuli on gender. The nonface objects were office chairs and reception chairs, selected in an attempt to capture a low-level visual similarity comparable to the similarity between faces of different genders (see Figure 5-1). Motion correction was performed with the fs-fast software program [21]. Data were subsequently band-pass filtered, partitioned, and averaged with Matlab to remove physiological drift and instrument noise. The resulting data set included block hemodynamic timecourses for the four stimulus conditions, available for each voxel and each run. Each run was approximately counterbalanced and contained four blocks per condition. Roughly ten runs were performed per subject. For the third subject, runs included stimuli of either 300x300 pixel size or about two-fold larger. For the fourth subject, in addition to runs of larger size, other runs included all-new stimulus exemplars at the original size.

The average timecourses of V1 voxels suggested that the low-level properties of faces of different genders were more similar than those of office and reception chairs (Figure 5-2). The difference in the V1 average timecourse between chair types was more pronounced than between genders.
Because of the different aspect ratios of the face and chair stimuli, it is possible that the office and reception chair images simply occupied different amounts of visual dark space. For the subjects in Figure 5-2 (b), (c), and (d), the similarity of the V1 average timecourses for the two genders, compared to the FFA timecourses (Figure 5-3), suggested that the gender bias observed in the FFA cannot be entirely attributed to global low-level stimulus differences like luminance. For Figure 5-2 (a), the female response dominates in V1, but from Figure 5-3 (a) we see that the male response dominates in the FFA, suggesting that the low-level and FFA-level gender effects might be independent.

Timecourses averaged over all FFA voxels and all runs (Figure 5-3) revealed a preference for one gender in each subject. The preferred gender was male in the first subject, himself a male African-American. The preferred gender was female in the remaining three Caucasian subjects, including one female and two males. These FFA response amplitudes for each gender were not constant across runs for each subject. While the same gender was preferred in most or all runs of a given individual, the absolute amplitudes of the two genders drifted across runs (Figure 5-5). Consequently, depending on this nonstationary drifting offset, it is not necessarily true that one gender dominates the other in FFA response irrespective of when the two measurements are made.

Figure 5-1: Examples from the stimulus set used in the study of FFA response to gender. Rows from top to bottom correspond to males, females, reception chairs, and office chairs respectively. Fifteen exemplars of each category were presented to subjects a, b, and c. Blocks of fifteen additional exemplars of each category were included in some trials of subject d. All stimuli were presented at identical pixel height dimensions.

Figure 5-2: Time course of V1 response, averaged over all voxels in V1, for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

Figure 5-3: Time course of FFA response, averaged over all voxels in the FFA, for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

Figure 5-4: V1 response amplitudes across runs for the four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.
If instead the measurements of FFA response are made sufficiently close in time to avoid the nonstationary drifting offset, one gender will always dominate the other. The implication of this nonstationarity is two-fold. First, it introduces large variance into the measured response amplitudes to male and female stimuli, so that a conventional ANOVA performed on the distribution of male and female stimulus responses may fail. Indeed, the prerequisite conditions of the ANOVA test are not met: because of the nonstationarity, the response to a male stimulus in one run is not drawn from the same random variable as in another run. Consequently, the ANOVA might fail despite the fact that the FFA has a preferred gender.

An appropriate statistical test of FFA gender preference must be independent of this nonstationary drifting offset. We can ask whether one gender is consistently preferred over the other gender within-run, for all runs. Because the gender amplitude comparison is made within-run, the statistical test is independent of the nonstationary drift that occurs between runs. A hypothesis test allows us to assess with statistical confidence whether V1 and the FFA exhibit gender preference. Define the null hypothesis as an equal-gender-preference (unbiased) region of interest (ROI). The probability that female elicits a higher average ROI amplitude than male within a run is 0.5 under the null hypothesis. Using the binomial cumulative distribution function (cdf)

\Phi(f) = P(F \le f; n) = \sum_{j=0}^{f} \binom{n}{j} (0.5)^j (0.5)^{n-j}    (5.1)

the probability that an unbiased ROI produces at least g runs preferring either gender out of n total runs is

P(G \ge g; n) = 1 - \Phi(g - 1) + \Phi(n - g)   for g > n/2,   and 1 otherwise.    (5.2)

The binomial cdf \Phi(f) is evaluated using (5.1), or equivalently the Matlab function binocdf.m. Define an ROI to significantly prefer one gender over another if an unbiased ROI (the null hypothesis) produces the observed data with less than 0.05 probability, i.e. P(G \ge g; n) < .05. Denote P(G \ge g; n) the p-value. Denote the gender preference ratio as the number of runs in which the average response was greater for the preferred gender over the total number of runs, g/n. The gender preference ratios are summarized below for all subjects:

Subject    (a)      (b)     (c)     (d)
FFA        10/11    8/9     7/9     7/10
V1         6/11     5/9     5/9     6/10

The respective p-values of the individual subjects are as follows:

Subject    (a)      (b)      (c)      (d)
FFA        .0117    .0391    .1797    .3437
V1         1        1        1        .7539

In order to pool results over all subjects, we must calculate the null hypothesis pdf for the pooled data. Because pooling is a summation of the individual random variables of each subject, we can convolve the individual subject null hypothesis pdfs to produce the pooled data null hypothesis pdf and p-value. Following this pooling procedure, we have a total gender preference ratio and p-value for FFA and V1:

All Subjects    Gender Preference Ratio    P-value
FFA             32/39                      .0023
V1              22/39                      1

Although subjects (c) and (d) show greater significance in FFA than in V1, their FFA p-values are not less than the customary .05 standard. Pooling over these two subjects alone does not achieve significance (FFA: 0.3117, V1: 1). However, if two additional subjects were run with preference ratios and total runs comparable to (c) and (d), the p-value excluding (a) and (b) would be significant (FFA: .02, V1: .8491). Our limited number of subjects suggests varying degrees of gender preference across the population.
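A short Matlab sketch of the within-run sign test in equations (5.1) and (5.2), using the Statistics Toolbox functions binocdf and binopdf. The run counts are illustrative values taken from one entry above (FFA of subject (a), 10 of 11 runs), and the final line only sketches the pooling idea for two subjects rather than all four.

    femaleWins = 10;  nRuns = 11;
    g = max(femaleWins, nRuns - femaleWins);      % runs won by the preferred gender

    % Two-tailed p-value for g > n/2: P(G >= g; n) = 1 - Phi(g-1) + Phi(n-g)
    p = 1 - binocdf(g - 1, nRuns, 0.5) + binocdf(nRuns - g, nRuns, 0.5);

    % Pooling across subjects: convolve per-subject null pdfs to obtain the
    % null distribution of the summed preferred-run count.
    nullPooled = conv(binopdf(0:11, 11, 0.5), binopdf(0:9, 9, 0.5));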
Although more scans would need to be performed to determine the extent of this phenomenon in the general population, we can say with statistical significance that there are people in the population with an FFA gender preference. With more scans, we could develop an estimate of the population percentage of such people with a relatively narrow confidence interval.

The statistical test chosen above is akin to a "two-tailed" test, more stringent than the alternative "one-tailed" version, which would have given p-values that are half of the p-values reported in the table above. The two-tailed test reports the probability that an unbiased region of interest would show the observed or greater preference for any one gender, either male or female. The one-tailed test gives the probability that an unbiased region of interest would show the observed or greater preference for one chosen gender in particular, like female. The two-tailed test is more appropriate here, because we are claiming that the FFA exhibits bias towards some gender (either male or female), rather than always only towards one particular gender like female.

We can perform the analogous hypothesis test to determine whether V1 and the FFA exhibit preference for one of the two types of chairs. The chair preference ratios are summarized below for all subjects:

Subject    (a)      (b)     (c)     (d)
FFA        6/11     5/9     5/9     6/10
V1         6/11     9/9     9/9     9/10

The respective p-values of the individual subjects are as follows:

Subject    (a)    (b)      (c)      (d)
FFA        1      1        1        .7539
V1         1      .0039    .0039    .0215

Pooling results over all subjects, we have a total chair preference ratio and p-value for FFA and V1:

All Subjects    Chair Preference Ratio    P-value
FFA             22/39                     1
V1              33/39                     5.56e-04

The results suggest that V1 does preferentially respond to one type of chair, whereas the FFA does not. This behavior is exactly opposite to the gender preference, where the FFA had a gender preference and V1 did not. It is striking that differences between the office and reception chair images were so great as to elicit a preference from V1, but nevertheless failed to elicit a preference in the FFA. This suggests that the FFA voxel-averaged activity might be more sensitive to differences between face objects than between nonface objects, and that large response differences in V1 (i.e. large low-level feature differences) do not necessarily modulate the mean response in the FFA, as with these two nonface objects. This result does not rule out the possibility that sensitivity to nonface objects might emerge in some feature of the entire multidimensional response other than the voxel-averaged maximum amplitude.

Figure 5-5: FFA response amplitudes across runs for the four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

Before the nonstationary drift was recognized, a split-half multidimensional SVM analysis (section 2.4.3) was employed in an effort to quantify gender information in the FFA. The d' values of the operating characteristics for gender discrimination in FFA and V1 are reported in the accompanying table (Table 5.1). The d' function is nondecreasing with distance from the chance line on the operating characteristic plane defined by hits (y axis) and false alarms (x axis).
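For concreteness, one common signal detection definition of d' computes it from the operating point as the difference of inverse-normal-transformed hit and false alarm rates; the thesis does not state its exact formula, so the following Matlab lines (using the Statistics Toolbox function norminv and an example operating point) are an assumption, not the thesis implementation.

    hitRate = 0.8;  faRate = 0.2;                  % example operating point
    dprime  = norminv(hitRate) - norminv(faRate);  % distance from the chance line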
Some d' values are negative, indicating that the classifier labeled females as males and males as females at a rate higher than chance. This would mean that whatever trend the SVM identified as distinguishing female from male in the training dataset had reversed in the test dataset. Because the SVM does not adjust to non-stationarity, the operating characteristics showed mixed and generally poor performance across subjects. The nonstationary drift across runs also makes conventional statistical tests across runs (such as ANOVA) inapplicable.

Table 5.1: SVM performance d' values on gender, chair, and face discrimination tasks in four subjects. The binary classification tasks were male/female, office/reception chair, and face/nonface respectively. Rows correspond to subjects (a) through (d). ROIMean and ROIMap denote the hemodynamic response feature being classified. ROIMean is a scalar, the average of the response over voxels and time. ROIMap is the vector of time-averaged responses of the ROI voxels.

REGION: FFA
Subject   Gender            Chair             Face
          ROIMean  ROIMap   ROIMean  ROIMap   ROIMean  ROIMap
(a)       -.15     0        .52      .64      2.60     4.64
(b)       -.39     .34      0        .26      2.79     4.64
(c)       .54      -1.18    .93      .68      4.64     4.64
(d)       .64      .3       .9       0        3.36     4.64

REGION: V1
Subject   Gender            Chair             Face
          ROIMean  ROIMap   ROIMean  ROIMap   ROIMean  ROIMap
(a)       .11      .24      -.32     .6       .11      1.52
(b)       0        .48      -.1      1.36     .13      2.32
(c)       0        -.33     1.9      2.32     1.54     1.0
(d)       -.68     0        3.96     2.30     2.12     3.0

In order to compensate for the non-stationarity in the voxel-averaged FFA response, we estimated the non-stationarity within each run as the average of the male and female stimulus response amplitudes. For each run, this DC offset was calculated and subtracted from the male and female amplitudes, so that the resulting male and female responses were centered around zero amplitude (see Figure 5-6).

Figure 5-6: Non-stationarity correction of FFA response amplitudes in subject (a) across runs for four conditions. (a) The original data, with response to faces drifting downward over the course of the experiment. (b) Compensated for non-stationarity in face stimulus responses. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

With the broad drifts in amplitude corrected, both SVM and ANOVA produced results that were consistent with the preference ratio statistics. All four subjects showed significant (p < .05) separation between male and female response amplitudes with the ANOVA in FFA, and insignificant separation in V1, with the exception of subject (a), who showed a significant but opposite gender preference in V1 relative to FFA. Similarly, gender classification with the SVM was perfect or above chance for all subjects in FFA, but at or below chance in V1, with the exception of subject (c), whose V1 and FFA had equal but only slightly better than chance gender classification (TP = 0.6, FP = 0.4).

This experiment showed that the FFA exhibits gender preference. The gender preference appears on average across trials, and in a statistically significant fashion in a within-run comparison. The FFA gender preference is biased in a manner that does not systematically relate to the gender of the subject, for this sample size of four subjects.

One potential objection to the experimental method is that block design is subject to attentional effects.
The argument is that the consistent FFA gender preference observed across trials simply reflects the subject's consistently increased attention to one particular gender over the other. This objection is difficult to counter even with an event-related design, and the block-design experiment on race bias in the FFA [18] may have faced similar criticism. One hint that attention may not account for the gender effect observed here is that V1 voxels did not exhibit the effect; attention would arguably also modulate V1 voxels if it modulated FFA voxels. Activation in the FFA is modulated by attention [51]. However, attention is not yet sufficiently understood to eliminate the possibility that attention might modulate the FFA but not V1. This might be a topic for further research.

One study has previously reported that MEG face detection signals are insensitive to face stimulus gender [10]. Our result with fMRI does not contradict the MEG result, because these measurement techniques are based on different (although related) underlying neural signal sources and are capable of different spatial and temporal resolution. The contrasting results from MEG and fMRI do emphasize the point that, in general, the absence of information in a brain area as measured by one technique is not proof of the absence of information in that brain area. Negative results are in general not conclusive.

The experiment describing the race effect in the FFA [18] also showed a correlation with hippocampal activity and with performance on a face memory recall task. The combined data set argued for a neural mechanism by which people better remember faces of their own race. A similar experiment is possible with gender, the FFA, the hippocampus, and a memory recall task. As with the race study [18], a gender-bias result linked to memory would have implications for gender discrimination and social interaction.

Chapter 6
Conclusions on FFA coding of visual stimuli

The fusiform face area (FFA) is defined as a focal set of voxels in the posterior temporal lobe that respond "maximally" to faces, meaning that pictures of human faces elicit a higher signal from the FFA of an attending subject than pictures of houses, amorphous blobs, or other nonface stimuli [29]. Most past work on the FFA has averaged activity over the voxels in the FFA, ignoring any differences that may exist in the responses of different voxels in this region. Here we used high-resolution fMRI scanning and new mathematical analyses to address two general questions:

1. What information is contained in the pattern of response across voxels in the FFA?
2. How is that information represented?

In particular, this thesis investigated three specific questions:

1. How do voxels within the FFA collectively encode the presence or absence of a face?
2. How many independent stimulus features can the FFA distinguish?
3. Does the FFA contain information about stimulus image format or face gender?

The following discussion summarizes what was learned about each of these questions through the experiments described in this manuscript (chapters 3, 4, and 5).

6.1 How do voxels within the FFA collectively encode the presence or absence of a face?

Previous work identified voxels in the FFA by their individual elevated response to face stimuli over nonface stimuli [29].
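As a schematic of that per-voxel identification (not the thesis's exact localizer procedure, which is described in the earlier chapters), one might threshold a per-voxel face-versus-nonface contrast; the Matlab sketch below assumes hypothetical trial-by-voxel amplitude matrices faceResp and nonfaceResp and an arbitrary threshold.

% Illustrative per-voxel localizer contrast: a voxel joins the FFA ROI if its
% mean response to faces exceeds its mean response to nonfaces by a simple
% two-sample t-statistic threshold.
% faceResp, nonfaceResp: (trials x voxels) response amplitude matrices.
nF = size(faceResp, 1);   nN = size(nonfaceResp, 1);
mF = mean(faceResp, 1);   mN = mean(nonfaceResp, 1);
s2 = ((nF - 1)*var(faceResp, 0, 1) + (nN - 1)*var(nonfaceResp, 0, 1)) ...
     / (nF + nN - 2);                          % pooled per-voxel variance
t  = (mF - mN) ./ sqrt(s2 * (1/nF + 1/nN));    % per-voxel t statistic
tThreshold = 3;                                % arbitrary cutoff for this sketch
ffaVoxels  = find(t > tThreshold);             % indices of face-selective voxels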
In the first result of this thesis, we demonstrated with principal components analysis (PCA) that the dominant source of variation over the measured responses to faces and nonfaces was the FFA discrimination between face and nonface, as opposed to the discrimination between photographs and line drawings, or any other variation in the stimulus responses. By projecting the data onto the first principal component, we confirmed that this dominant direction of variation discriminated faces from nonfaces.

Also with PCA, we now understand that FFA voxels approximately "co-vary" in their response to stimuli. Specifically, both nonface stimuli and face stimuli exhibit a dominant first principal component when analyzed separately. To understand what co-vary means, consider a "response vector," which contains the response amplitudes of each FFA voxel to a stimulus. Because FFA voxels co-vary over nonface stimuli, the response vectors to nonface stimuli A, B, C, D, and so on all approximately fall along a line in multidimensional space. Geometrically, the difference vector between any two nonface stimuli (e.g., B - A) is approximately a scaled version of the same "principal component" vector, which points in the direction of the line.

By comparing the principal component of nonface stimuli to the "detection vector" that connects face responses to nonface responses, we were able to demonstrate that not only do these voxels co-vary with each other (in the sense defined above) over nonface stimulus responses, but they do so in a fashion that relates to the face detection signal. Geometrically, we found that the first or second principal component of nonface modulation was usually around 42 degrees off from the detection vector, out of 90 possible degrees. This means that over 50 percent of the signal energy ((cos 42°)^2 x 100 ≈ 55%) in the dominant direction of variation in the nonface responses was related to the face detection signal. Knowing the face detection vector, the categorization of a stimulus as face or nonface can be chosen optimally by subtracting off any baseline signal, projecting the adjusted response vector onto the detection vector, and comparing the magnitude of the projection to a decision threshold.

This phenomenon was even more marked in the parahippocampal place area (PPA), which responds more to stimuli that depict indoor or outdoor scenes than to non-place stimuli. In the PPA, the first or second principal component of the non-place stimuli was on average 26 degrees off from the detection vector, i.e., over 80 percent of the signal energy in the non-place principal component was related to the scene detection signal.

Interestingly, the papers that defined the FFA and PPA [29, 9] averaged voxel amplitudes equally to demonstrate detection, which we now know to be suboptimal because it corresponds to projecting the response vector onto the average vector rather than onto the detection vector. Consequently, in answer to a recent debate on the FFA [25, 44], it is true that the response vector carries more information in discriminating stimuli than the average FFA response does, specifically by calculating the weighted average that corresponds to projection onto the detection vector.

In a simple model, co-variance (as defined above) might be caused when a blood vessel modulates its flow in response to the stimulus and distributes its signal in fixed proportion to its distance to each FFA voxel.
This fixed proportion would impose a particular voxel response vector, and any increase in blood vessel signal would translate into a multiplicative gain of the response vector. In an alternate model, equally consistent with co-variance, the blood vessel would induce a baseline response vector, but neuronal activity would add on a characteristic vector in a different direction, scaled to indicate face detection. We could specifically reject the simple model if we found that the direction of the principal component for nonfaces was different from the direction of the individual nonface response vectors. This distinction may help elucidate whether the co-variance (i.e., the dominant principal component) is a result of vessel physiology or of neuronal activity.

6.2 How many independent stimulus features can the FFA distinguish?

Recent literature indicates that the FFA is sensitive to the race of face stimuli [18] and to the level of expertise for nonface stimuli [17]. It is possible that the FFA might encode other features of stimuli as well. In this thesis, PCA was used to determine an upper bound on the number of independent stimulus features the FFA voxels could encode. The PCA analysis suggested that, for our given stimulus set, the fMRI response of the FFA is sensitive to the equivalent of one to three nonface features, and potentially many (more than three) face features. Although information theory provides a direct method for estimating this bound, the method could not be used here because it requires a probability density function (pdf) and is consequently data-intensive.

A feature is independent of another feature if the presence of one feature in the stimulus does not make the other feature any more or less likely to also be present. If two features are not entirely independent, then there is less variation in the stimuli that needs to be encoded. If faces with blond hair always (or never) had blue eyes, the FFA could use just one pattern to indicate both the presence of blond hair and blue eyes. If the two features were independent, then they would need to be coded in two separate patterns. Even one voxel alone could encode an arbitrarily large number of features if it produced a unique output for each combination of features. Based qualitatively on the dynamic range and noise properties of FFA voxels, this scheme might be difficult even for three features, because the number of unique outputs grows as O(2^n), where n is the number of features.

There are in fact a multitude of possible coding schemes, that is, ways in which the voxels of the FFA could assign portions of the response vector space to combinations of features. By proposing a PCA upper bound on the number of codable independent stimulus features, we implicitly assumed one particular coding scheme. In this scheme, the degree of presence of each feature is encoded by the magnitude of a response vector specific to that feature, and the total FFA response vector is formed by the sum of the individually scaled feature vectors. Under this coding scheme, the magnitude of a feature is decoded by projecting the response vector onto the desired feature vector. Completely independent features would need to have orthogonal feature vectors, so that a set of faces varying in only one of those features would have response vectors with the same projection along the other, independent feature directions. The geometric analysis of the relative locations of various stimulus response vectors will be useful in describing the exact FFA coding scheme.
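To make this response-vector geometry concrete, the following Matlab sketch generates synthetic response vectors under the assumed linear coding scheme (a shared baseline, a detection component, and a nonface feature direction placed roughly 42 degrees from it, plus noise), recovers the detection vector as the difference of category means, compares it with the first principal component of the nonface responses, and classifies stimuli by projection onto the detection vector. All names and numbers are synthetic illustrations, not the thesis's data or code, and the array arithmetic relies on implicit expansion (Matlab R2016b or later).

% Synthetic illustration of the response-vector geometry of sections 6.1-6.2.
nVox = 20;  nStim = 40;                       % voxels, stimuli per category
rng(0);                                       % reproducible example

baseline  = randn(nVox, 1);                   % shared baseline response vector
detectVec = randn(nVox, 1);                   % face-minus-nonface direction
dUnit = detectVec / norm(detectVec);
orthDir = randn(nVox, 1);
orthDir = orthDir - dUnit*(dUnit'*orthDir);   % remove the detection component
orthDir = orthDir / norm(orthDir);
featVec = cosd(42)*dUnit + sind(42)*orthDir;  % nonface modulation direction,
                                              % about 42 degrees from detection
noise = 0.1;
faceResp    = baseline + detectVec + featVec*(0.3*randn(1, nStim)) ...
              + noise*randn(nVox, nStim);     % face response vectors (columns)
nonfaceResp = baseline             + featVec*(0.3*randn(1, nStim)) ...
              + noise*randn(nVox, nStim);     % nonface response vectors

% Detection vector estimated as the difference of category means.
dHat = mean(faceResp, 2) - mean(nonfaceResp, 2);

% First principal component of the nonface responses (PCA via SVD).
X = nonfaceResp - mean(nonfaceResp, 2);       % center the nonface data
[U, ~, ~] = svd(X, 'econ');
pc1 = U(:, 1);

% Angle between the dominant nonface component and the detection vector, and
% the fraction of that component's energy along the detection direction.
c = abs(pc1'*dHat) / norm(dHat);              % pc1 already has unit length
angleDeg   = acosd(c);                        % close to 42 degrees here
energyFrac = c^2;                             % cf. (cos 42 deg)^2 ~ 0.55 above

% Categorization by projection onto the detection vector and thresholding.
allResp = [faceResp, nonfaceResp];
proj    = (allResp - mean(nonfaceResp, 2))' * (dHat / norm(dHat));
isFace  = proj > 0.5*norm(dHat);              % simple midpoint threshold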
PCA estimates the equivalent number of independent features coded in the dataset by counting the number of orthogonal feature vectors (principal components) needed to approximately span the set of responses. The number of principal components to use in this approximate span is difficult to gauge. Here, we chose the PCA dimensionality qualitatively, by examining the size of the first principal components and comparing the PCA variance profile against a Gaussian white noise profile. The Bartlett test provides a statistic with which to choose the PCA dimensionality quantitatively, but this method assumes a white noise background. Further compounding the problem, both noise and signal could have the same PCA variance profile. The signal and noise contributions to the PCA dimensionality could possibly be separated by measurements of the magnitude and cross-voxel correlation of the noise.

The PCA dimensionality estimate is referred to as an upper bound because conceivably even a single independent feature could "fill" the response vector space, requiring arbitrarily many orthogonal feature vectors to span its responses. A high PCA dimensionality for face stimuli that differ only in gender would exemplify this sort of behavior. Moreover, drift in the signal baseline or noise would also increase the PCA dimensionality of FFA response vectors driven by a single independent feature to some estimate greater than one. Drift and noise are therefore sources of dimensionality overestimation in this experiment.

6.3 Does the FFA contain information about stimulus image format or face gender?

Prior studies suggest that the FFA does in fact encode stimulus features other than the face/nonface contrast, including race [18] and the presence of nonface objects for which the subject bears expertise [17]. Newly documented stimulus features to which the FFA is sensitive contribute to an understanding of the brain mechanism for face recognition and of the factors that might influence perception and memory of the faces that we see. An accumulating list of FFA-sensitive features produces an assumption-free lower bound on the number of features the FFA encodes, in contrast to the PCA upper bound discussed in the previous section, which rests on an assumed coding scheme.

This thesis reported that the FFA is sensitive to face gender and to stimulus image format, i.e., differences between photographs and line drawings. The FFA response had previously been reported insensitive to the photograph/line-drawing contrast with fMRI [22], and magnetoencephalography [10] of an FFA-related scalp location reported no gender preference. Results in this thesis showed a run-by-run gender preference in voxel-averaged FFA activity, and the contrast between photographs and line drawings appeared in scatter plots of the first two principal components. Moreover, the photograph/line-drawing contrast was roughly orthogonal to the face detection vector in the response vector space. This means that although the FFA was sensitive to this contrast, the FFA encoded this information in a way that interfered only minimally with the face detection signal.

Additional experiments are necessary to understand the cognitive relevance of the FFA sensitivity to gender. One immediate question is whether the FFA gender preference reflects the subject's propensity to attend to and/or remember faces of one gender or the other. Analysis of additional subjects will also help establish that the measured bias in the FFA was truly independent of trivial low-level features like luminance.
Although the inversion of the preferred signal between V1 and FFA in one subject, and the increased separation in signal in FFA for the other subjects, are suggestive that the FFA gender bias is not due to low-level differences in the male and female stimuli, replications of this result in additional subjects would be reassuring. One useful experiment would be to purposefully but only slightly increase the contrast or luminance of one gender for an individual known to have an FFA biased toward the other gender, and demonstrate that the same bias in the FFA persists despite an inversion in bias toward the higher-luminance gender in V1.

6.4 Nonstationarity in fMRI experiments

Searching for trends on a run-by-run basis proved critical to revealing FFA gender bias in chapter 5, where nonstationarities (i.e., drifting responses to the same condition across runs due to brain activity or instrumentation) that modulated the signal by as much as 7 percent from baseline would have swamped this more subtle 0.5 to 2 percent effect. Analyzing data on a run-by-run basis may serve as an important complement to conventional whole-scan ANOVA analysis, which assumes stationarity by collectively analyzing all runs, especially as we begin to investigate the various physiological phenomena that make the fMRI signal nonstationary.

Both pdf and SVM methods rely on the stability of the distributions from which the data are drawn. Nonstationarity in the gender bias experiment made the SVM results inconsistent (Figure 5-5). It is possible for nonstationarities to be sufficiently small as to maintain the separation between the responses to the conditions being classified; when the conditions being classified are dramatically different, the classification of their responses is less impaired by nonstationarity. This was the case in a recent publication [6] that examined nonface object categories from the same subject on different days. Aligning the subject's head across sessions with a thermoplastic mold, the group was able to perform classification despite nonstationarities. This was perhaps because the thermoplastic mold decreased noise by reducing head movement, and more likely because the nonface categories tested were much more different from each other than male and female faces are. For example, the V1 response in our gender study (Figure 5-4) shows large amplitude separation between office and reception chairs despite nonstationarity.

A theoretical calculation would be useful to estimate the dataset size required to perform pdf or SVM methods on a more subtly varying stimulus set like faces. The calculation would need to first specify an experimental paradigm and then gather realistic noise statistics from repeated measures using that technique. For such a calculation to be practical, it would also need to contend with the large-scale nonstationarities observed in Figure 5-5. This theoretical model could then assess the feasibility of any proposed experiment based on the noise statistics of pilot data.

6.5 Defining a brain system in fMRI analysis

Communication theory provides a fresh perspective on the interpretation of data from an anatomically focal set of voxels. When presenting stimuli and analyzing data from a region of interest (ROI), it is accepted practice in the current neuroscience literature (and indeed in the above manuscript) to attribute the characteristics of the data to a property of the ROI. The concepts of input, output, and system in communication theory suggest that this approach is flawed.
By relating a set of visual stimuli presented to an individual to the output from an ROI, we are studying the system that includes all participating neural circuitry. Consider face detection in the FFA. We present face and nonface stimuli, and observe that voxels in the FFA respond maximally to face stimuli, such that this output serves as a face detection signal. The accepted conclusion is that the neurons of the FFA perform face detection. The proper conclusion is that the entire set of neurons implicated in the production of this signal performs face detection. As we move forward in understanding the computational architecture of the brain, both modular and distributed notions will need to make this distinction to develop a more accurate description of brain function.

What then does it mean that only a focal set of voxels in the FFA generates the face detection signal? It means only that the voxels of the FFA encode whether a stimulus is a face or a nonface. This could imply that the FFA might be a convenient read-out for other brain centers interested in knowing when a face is detected. It does not mean, however, that the FFA is a face detection system. We would be inclined to call the FFA a face detection system (or at least part of one) if the input to the FFA were not already a face detection signal. A lesion study of the FFA would not distinguish between the FFA as a read-out and the FFA as a face detection system, because impairing either faculty could incapacitate face detection.

This discussion is particularly relevant to PCA dimensionality analysis of ROI signal output. The PCA dimension of the measured signal is only descriptive of the system in the context of the PCA dimension of the input. This concept is native to information theory, where channel capacity is defined in terms of the mutual information between the input and the output. Furthermore, the system dimensionality refers to the entire system of participating neural circuitry. Understanding this system dimensionality does, however, specifically aid in describing the local response properties of the FFA.

The concept of system, input, and output suggests that future fMRI studies will be concerned not only with the relationship between a stimulus and a target ROI, but also with the relationship and interaction between different ROIs within the brain, whether remotely located or in proximal anatomical slices. In combination with new approaches to stimulating brain activity (in addition to sensory stimuli), such as transcranial magnetic stimulation and electrode-based current injection, fMRI is beginning to tease apart the structure of brain function on the large (millimeter) scale.

Bibliography

[1] T.M. Apostol. Calculus, volume 2. John Wiley & Sons, 2nd edition, 1967.
[2] S. Beheshti. MIT course 6.341: Discrete-time signal processing, 2003.
[3] J. Bodamer. Die Prosopagnosie. Archiv für Psychiatrie und Nervenkrankheiten, 179:6-53, 1947.
[4] R.R. Buxton. Introduction to Functional Magnetic Resonance Imaging: Principles and Techniques. Cambridge University Press, 2001.
[5] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[6] D.D. Cox and R.L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2):912, 2003.
[7] A.M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109, 1999.
[8] P. Dayan and L. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.
[9] R. Epstein and N. Kanwisher. A cortical representation of the local visual environment. Nature, 392:598, 1998.
[10] E. Halgren et al. Cognitive response profile of the human fusiform face area as determined by MEG. Cerebral Cortex, 10(1):69, 2000.
[11] E.N. Brown et al. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience, 18(18):7411, 1998.
[12] K. Kwong et al. Functional magnetic resonance imaging of primary visual cortex. Investigative Ophthalmology & Visual Science, 33(4):1132, 1992.
[13] M.A. Hearst et al. Trends and controversies: Support vector machines. IEEE Intelligent Systems, page 18, 1998.
[14] P. Golland et al. Discriminative analysis for image-based studies. Proc. MICCAI, page 508, 2002.
[15] T-P. Jung et al. Imaging brain dynamics using independent component analysis. IEEE Proceedings, 89(7):1107, 2001.
[16] I. Fried, K.A. MacDonald, and C.L. Wilson. Single neuron activity in human hippocampus and amygdala during recognition of faces and objects. Neuron, 18:753-765, 1997.
[17] I. Gauthier, P. Skudlarski, J.C. Gore, and A.W. Anderson. Expertise for cars and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3(2):191-197, 2000.
[18] A.J. Golby, J.D. Gabrielli, J.Y. Chiao, and J.L. Eberhardt. Differential responses in the fusiform region to same-race and other-race faces. Nature Neuroscience, 4(8):845-850, 2001.
[19] D. Greve. fmri-analysis-theory.tex, 1999. fs-fast software documentation, freesurfer-alpha/fsfast-20020814/docs.
[20] D. Greve. selxavg.tex, 2000. fs-fast software documentation, freesurfer-alpha/fsfast-20020814/docs.
[21] D. Greve. fs-fast fMRI analysis software (freesurfer), 2001. Available for download at http://surfer.nmr.mgh.harvard.edu/.
[22] K. Grill-Spector. The neural basis of object perception. Current Opinion in Neurobiology, 13(2):159, 2003.
[23] C.G. Gross, G.E. Roche-Miranda, and D.B. Bender. Visual properties of neurons in the inferotemporal cortex of the macaque. Journal of Neurophysiology, 35:96-111, 1972.
[24] J.C. Hansen. Separation of overlapping waveforms having known temporal distributions. Journal of Neuroscience Methods, 9:127, 1983.
[25] J.V. Haxby, M.I. Gobbini, M.L. Furey, A. Ishai, J.L. Schouten, and P. Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425, 2001.
[26] D.J. Heeger and D. Ress. What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience, 3:142, 2002.
[27] G. Heit, M.E. Smith, and E. Halgren. Neural encoding of individual words and faces by the human hippocampus and amygdala. Nature, 333:773-775, 1988.
[28] J.P. Hornak. The Basics of MRI. 2003. http://www.cis.rit.edu/htbooks/mri/.
[29] N. Kanwisher, J. McDermott, and M.M. Chun. The fusiform face area: A module in human extrastriate cortex specialised for face perception. Journal of Neuroscience, 17(11):4302-4311, 1997.
[30] G. Kreiman, C. Koch, and I. Fried. Category-specific visual responses of single neurons in the human medial temporal lobe. Nature Neuroscience, 3(9):946, 2000.
[31] N.K. Logothetis, J. Pauls, M. Augath, T. Trinath, and A. Oeltermann. A neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150, 2001.
[32] G. McCarthy, A. Puce, J.C. Gore, and T. Allison. Face-specific processing in the human fusiform gyrus. Journal of Cognitive Neuroscience, 9(5):604-609, 1997.
[33] P. Niyogi and M. Belkin. Laplacian eigenmaps for dimensionality reduction and data representation. University of Chicago Technical Reports, TR-2002-01, 2002. Available for download at http://www.cs.uchicago.edu/research/publications/techreports/TR-2002-01.
[34] J.G. Ojemann, G.A. Ojemann, and E. Lettich. Neuronal activity related to faces and matching in human right nondominant temporal cortex. Brain, 115:1-13, 1992.
[35] A.V. Oppenheim, R.W. Schafer, and J.R. Buck. Discrete-Time Signal Processing. Prentice Hall, 2nd edition, 1989.
[36] A.V. Oppenheim, A.S. Willsky, and S.H. Nawab. Signals and Systems. Prentice Hall, 2nd edition, 1996.
[37] P.J. Phillips, H. Wechsler, J. Huang, and P. Rauss. The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing, 16(5):295-306, 1998.
[38] P.L. Purdon, V. Solo, R.M. Weisskoff, and E.N. Brown. Locally regularized spatiotemporal modeling and model comparison for functional MRI. NeuroImage, 14:912, 2001.
[39] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. MIT Press, 1999.
[40] B.R. Rosen, R.L. Buckner, and A.M. Dale. Event-related functional MRI: Past, present, and future. PNAS, 95:773, 1998.
[41] S.M. Ross. Statistical Learning Theory. Harcourt/Academic Press, 2nd edition, 2000.
[42] S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323, 2000.
[43] M.J. Schnitzer and M. Meister. Multineuronal firing patterns in the signal from eye to brain. Neuron, 37:499, 2003.
[44] M. Spiridon and N. Kanwisher. How distributed is visual category information in human occipito-temporal cortex? An fMRI study. Neuron, 35:1157, 2002.
[45] G. Strang. Linear Algebra and Its Applications. International Thomson Publishing, 3rd edition, 1988.
[46] M.J. Tarr and I. Gauthier. FFA: A flexible fusiform area for subordinate-level visual processing automatized by expertise. Nature Neuroscience, 3(8):764-769, 2000.
[47] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319, 2000.
[48] V.N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
[49] E. Weisstein. Bonferroni correction, 2003. Available for download at http://mathworld.wolfram.com/BonferroniCorrection.html.
[50] E. Weisstein. Kolmogorov-Smirnov test, 2003. Available for download at http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html.
[51] E. Wojciulik, N. Kanwisher, and J. Driver. Covert visual attention modulates face-specific activity in the human fusiform gyrus: fMRI study. Journal of Neurophysiology, 79:1574, 1998.
[52] G. Wornell. Personal communication, 2002.
[53] G. Wornell and A.S. Willsky. 6.432 MIT Course Notes: Stochastic Processes, Detection and Estimation. 2003.
[54] J. Wyatt. MIT course 6.432: Stochastic processes, detection and estimation, 2002.