Coding in the Fusiform Face Area of the Human
Brain
by
Lakshminarayan "Ram" Srinivasan
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2003
© Massachusetts Institute of Technology 2003. All rights reserved.
Author: Department of Electrical Engineering and Computer Science
August 8, 2003

Certified by: Nancy G. Kanwisher
Professor, Thesis Supervisor

Accepted by: Arthur C. Smith
Chairman, Department Committee on Graduate Students
Coding in the Fusiform Face Area of the Human Brain
by
Lakshminarayan "Ram" Srinivasan
Submitted to the Department of Electrical Engineering and Computer Science
on August 8, 2003, in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
Abstract
The fusiform face area of the human brain is engaged in face perception and has been
hypothesized to play a role in identifying individual faces [29]. Functional magnetic
resonance imaging (fMRI) of the brain reveals that this region of the temporal lobe
produces a higher blood oxygenation level dependent signal when subjects view face stimuli
versus nonface stimuli [29]. This thesis studied the way in which fMRI signals from
voxels that comprise this region encode (i) the presence of faces, (ii) distinctions
between different face types, and (iii) information about nonface object categories.
Results suggest that the FFA encodes the presence of a face in one dominant
eigenmode, allowing for encoding of other stimulus information in other modes. This
encoding scheme was also confirmed for scene detection in the parahippocampal place
region. PCA dimensionality suggests that the FFA may have larger capacity to
represent information about face stimuli than nonface stimuli. Experiments reveal
that the FFA response is biased by the gender of the face stimulus and by whether a
stimulus is a photograph or line drawing.
The eigenmode encoding of the presence of a face suggests cooperative and efficient
representation between parts of the FFA. Based on the capacity for nonface versus
face stimulus information, the FFA may hold information about both the presence and
the individual identity of the face. Bias in the FFA may form part of the perceptual
representation of face gender.
Thesis Supervisor: Nancy G. Kanwisher
Title: Professor
Acknowledgments
I would like to thank my advisor, Professor Nancy Kanwisher, for her keen insight,
compassionate guidance, and generosity with time and resources. My father, mother
and brother lent hours of discussion and evaluation to the ideas developed in this
thesis. I am grateful to all members of the Kanwisher laboratory, Douglas Lanman at
Lincoln Laboratory, and Benjie Limketkai in the Research Laboratory of Electronics
for their friendship and interest in this work. Thanks to Professor Gregory Wornell
and Dr. Soosan Beheshti for discussions on signal processing.
Design and collection of data for the first experiment (chapter 3) was performed
by Dr. Mona Spiridon, at the time a post-doctoral fellow with Professor Nancy Kanwisher. The fMRI pulse sequence and head coil for the second and third experiments
(chapters 4 and 5) were designed by Professors Kenneth Kwong and Lawrence Wald
respectively, at Massachusetts General Hospital and Harvard Medical School. Some
stimuli in the third experiment (chapter 5) were assiduously prepared with the assistance of Dr. Michael Mangini and Dr. Chris Baker, current postdoctoral scholars with
the laboratory. Some preprocessing analysis Matlab code used in this experiment was
written in conjunction with Tom Griffiths, a graduate student with Professor Joshua
Tenenbaum.
Portions of the research in this paper use the FERET database of facial images
collected under the FERET program. Work at the Martinos Center for Biomedical
Imaging was supported in part by the National Center for Research Resources
(P41RR14075) and the Mental Illness and Neuroscience Discovery (MIND) Institute.
The author was funded by the MIT Presidential Fellowship during the period of this
work.
Contents

1 Problem Statement

2 Background
2.1 Fusiform Face Area (FFA): A face selective region in human visual cortex
2.2 Physics, acquisition, and physiology of the magnetic resonance image
2.2.1 Physical principles
2.2.2 Image acquisition
2.2.3 Physiology of fMRI
2.3 Design of fMRI experiments
2.3.1 Brute-force design
2.3.2 Counterbalanced block design
2.3.3 Rapid-presentation event related design
2.4 Analysis of fMRI-estimated brain activity
2.4.1 Univariate analysis of variance (ANOVA)
2.4.2 Principal components analysis (PCA)
2.4.3 Support vector machines (SVM)

3 Experiment One: FFA response to nonface stimuli

4 Experiment Two: FFA response to face stimuli

5 Experiment Three: FFA response to stimulus gender

6 Conclusions on FFA coding of visual stimuli
6.1 How do voxels within the FFA collectively encode the presence or absence of a face?
6.2 How many independent stimulus features can the FFA distinguish?
6.3 Does the FFA contain information about stimulus image format or face gender?
6.4 Nonstationarity in fMRI experiments
6.5 Defining a brain system in fMRI analysis

Bibliography
List of Figures

3-1 Typical results from one subject of PCA variance and projection analysis in (a,d,g) FFA, (b,e,h) PPA, and (c,f) all visually active voxels. (a,b,c) ROI response vectors projected onto the first two principal components. Labels 1 and 2 indicate face stimuli; all other labels are non-face stimuli. Label 5 indicates house stimuli; all other labels are non-place stimuli. Photograph stimuli are denoted in boldface italics, and line drawings are in normal type. (d,e,f) Variance of dataset along principal components including nonpreferred and preferred category response vectors. (g,h) Variance of dataset along principal components excluding preferred category response vectors.

3-2 Sum of variance from PC1 and PC2 for data (a) including and (b) excluding own-category stimulus responses. The labels S1 through S4 denote the subject number, consistent between panels (a) and (b).

3-3 Typical plots from one subject of nonface principal component coordinates against face detection vector coordinates for stimulus responses in (a,b) FFA and (c,d) PPA. The detection vector coordinate connects the average face response vector to the average nonface response vector. (a,c) ROI nonface response vector coordinates projected onto PC1 of nonface data points and the detection vector. (b,d) Similarly for nonface PC2. Alignment of data along a line indicates strong correlation between the two axis variables. Geometrically, this indicates either that the nonface data already falls along a line, that the vectors corresponding to the axis variables are aligned, or some mixture of these two scenarios. PC nonface variance analysis (Figure 3-1 g,h) suggests that the axis alignment concept is appropriate. The majority of variance is not explained exclusively by the first PC, so the data does not already fall along a line in the original response vector space. Although forty percent of the variance falls along a line, this is not enough to allow any coordinate choice to trivially project data onto a line, as evidenced by the coordinates chosen in (a) and (d). Axis alignment is quantified directly in Figure 3-4.

3-4 Summary of subtending angle between detection vector and nonpreferred category principal component in (a) FFA and (b) PPA for four subjects. If the subtending angle is greater than 90 degrees, the angle is subtracted from 180 degrees to reflect the alignment between vectors rather than the direction in which the vectors are pointed. Two vectors are completely aligned at 0 degrees and completely unaligned or orthogonal at a 90 degree subtending angle.

4-1 Stimulus set used in study of FFA response to faces. The set includes pictures of sixteen faces and one car. Four contrasts are equally represented, including adiposity (columns 1 and 2 versus 3 and 4), age (rows 1 and 2 versus 3 and 4), gender, and race. Faces were obtained from [37], with the exception of the African Americans in the second row and third column, who were photographed by the author, and the adipose young Caucasian male, obtained from the laboratory. Two face images (elderly, African American, female) have been withheld as per request of volunteers.

4-2 Variance of PCA components (a) across FFA response to faces, with preferred stimulus data from one subject, and (b) generated from a Gaussian normal vector with independent identical elements N(0,1). Dimensions and dataset size match between (a) and (b).

4-3 Multivariate ANOVA across (a) race, (b) gender, (c) age, and (d) adiposity for FFA response to faces in one subject. The data is projected onto the optimal category-separating axis, canonical variable 1, abbreviated "c1". This value is plotted along the y axis in the above graphs. The corresponding p-value is printed on each panel, indicating the probability that the data belongs to a single Gaussian random variable. The accompanying d value indicates that the d-1 Gaussians null hypothesis can be rejected at p < 0.05. Here, we compare only two groups in each panel (e.g. male/female), so the maximum d value is 1.

4-4 ANOVA on a one-dimensional FFA response measurement to male and female stimuli. For each voxel, the hemodynamic impulse response was fit with a polynomial and the maximum amplitude extracted. This value was averaged across all voxels in the FFA to produce the response feature analyzed with this ANOVA. Panel inscriptions denote the ANOVA F-statistic value and the p-value (Prob>F), the probability that a single univariate Gaussian would produce separation between male and female stimulus responses equal to or greater than observed here. A p-value of 0.0614 is near the conventional significance threshold of p = 0.05.

5-1 Examples from stimulus set used in study of FFA response to gender. Rows from top to bottom correspond to males, females, reception chairs, and office chairs respectively. Fifteen exemplars of each category were presented in subjects a, b, and c. Blocks of fifteen additional exemplars of each category were included in some trials of subject d. All stimuli were presented at identical pixel height dimensions.

5-2 Time course of V1 response, averaged over all voxels in V1 for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-3 Time course of FFA response, averaged over all voxels in FFA for male, female, and chair stimuli. The y axis denotes percent amplitude change from a baseline signal. Each panel corresponds to a different subject, (a) through (d). The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-4 V1 response amplitudes across runs for four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-5 FFA response amplitudes across runs for four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.

5-6 Non-stationarity correction of FFA response amplitudes in subject (a) across runs for four conditions. (a) The original data, with response to faces drifting downward over the course of the experiment. (b) Compensated for non-stationarity in face stimulus responses. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted - reception chairs.
List of Tables

5.1 SVM performance d' values on gender, chair, and face discrimination tasks in four subjects. The binary classification tasks were male/female, office/reception chair, and face/nonface respectively. Rows correspond to subjects (a) through (d). ROI-Mean and ROI-Map denote the hemodynamic response feature being classified. ROI-Mean is a scalar, the average of the response over voxels and time. ROI-Map is the vector of time-averaged responses of ROI voxels.
Chapter 1
Problem Statement
When greeting a friend that passes by on the street, we determine both the presence
and identity of the human face that is in our field of vision. This process of face
detection and identification is studied with functional magnetic resonance imaging
(fMRI), a noninvasive technique used to measure brain activity. The fusiform face
area (FFA) of the brain responds "maximally" to faces, meaning that pictures of
human faces elicit more signal from the FFA of an attending subject than pictures of
houses, amorphous blobs, or other nonface stimuli [29].
Most past work on the FFA has averaged activity over the voxels in the FFA,
ignoring any differences that may exist in the response of different voxels in this
region. Here we used high resolution fMRI scanning and new mathematical analyses
to address two general questions:
1. What information is contained in the pattern of response across voxels in the
FFA?
2. How is that information represented?
In particular, this thesis investigated three specific questions:
1. How do voxels within the FFA collectively encode the presence or absence of a
face?
2. How many independent stimulus features can the FFA distinguish?
3. Does the FFA contain information about stimulus image format or face gender?
In this manuscript, the phrase "detection signal" denotes the part of the FFA response
that tells whether a stimulus is a face or nonface.
The term "bias" refers to the
remaining parts of the FFA response that might encode other information about a
stimulus. "Stimulus image format" refers to the difference between photograph and
line-drawing versions of a picture.
The three experiments performed in this thesis relate to different aspects of the
above main questions. The first experiment (chapter 3) studied whether the variation
in responses to non-face stimuli was related to a modulation of the face detection signal. The experiment also used principal components analysis (PCA) to graphically
explore which other features of the stimuli affected the variation and to determine
an upper bound on the number of independent nonface features that could be represented in the FFA. The second experiment (chapter 4) studied FFA response to face
stimuli. As with the first experiment, PCA was used to determine an upper bound
on the number of independent face features that could be represented with the FFA.
Multivariate analysis of variance (MANOVA) was used to quantify FFA sensitivity to
race, gender, age, and adiposity. The third experiment (chapter 5) performed more
extensive tests to quantify FFA sensitivity to gender.
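The PCA upper-bound idea used in the first two experiments can be sketched numerically. This is an illustrative sketch, not the thesis analysis: the matrix below is synthetic noise standing in for real stimulus response vectors (rows are stimuli, columns are voxels), and the 90% variance threshold is an arbitrary choice for the example.

```python
import numpy as np

# Hypothetical dataset: 20 stimulus response vectors over 50 voxels.
rng = np.random.default_rng(0)
responses = rng.normal(size=(20, 50))

# Center the data and obtain principal component variances via SVD.
centered = responses - responses.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
variances = singular_values ** 2 / (len(responses) - 1)

# Fraction of total variance along each component. The number of
# components needed to capture most of the variance upper-bounds the
# number of independent stimulus features the region could represent.
explained = variances / variances.sum()
n_components = int(np.searchsorted(np.cumsum(explained), 0.9)) + 1
```

With real data, a sharp drop in `explained` after a few components would suggest a low-dimensional code; the flat spectrum of pure noise here gives a trivially high bound.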
Several experimental and mathematical methods were employed in this thesis.
Chapter 2 discusses these methods in the context of fMRI and the FFA. Section 2.1
is a review of previous studies on the FFA. Section 2.2 discusses fMRI and the principles behind the noninvasive imaging of brain activity. Section 2.3 describes the presentation of stimuli and complementary signal processing that is required to produce
estimates of the stimulus response. Section 2.4 discusses the mathematical methods
used in this thesis to analyze stimulus coding in the FFA once estimates of stimulus
response were obtained.
The conclusion (chapter 6) provides a summary and evaluation of the findings.
This thesis represents a part of a larger effort to understand the nature of face processing in general and FFA activity in particular. Because detection and identification of
faces is important to our daily social interactions, understanding the neural substrate
for this ability represents a core problem in cognitive neuroscience.
Chapter 2
Background
2.1 Fusiform Face Area (FFA): A face selective region in human visual cortex
Many of our daily interactions call upon face detection, the brain's ability to identify
the presence of a face in a scene. Efforts to develop computer face detection algorithms
have highlighted some of the challenges of this problem. Within a test scene, faces
may be rotated at different angles, scaled or occluded. A complex background like a
city scene might make the face more difficult to detect than a plain white background.
Skin color can be altered by lighting. Non-face sections of a scene can coincidentally
resemble a face in color or shape.
The associated engineering challenge is to maximize probability of detection while
minimizing probability of false alarm.
Optimum detection of faces requires both
flexibility and inflexibility. The detector must be flexible to changes in scale, position,
lighting, occlusion, etc., while at the same time inflexibly rejecting non-face samples
under various image changes. Algorithms have generally solved scale and position
invariance by multiresolution sampling (e.g. Gaussian pyramids) and variable width
sliding windows. Unique facial features such as skin color or geometry of eye and
mouth position have then been chosen in an attempt to make the detector more face
selective. A learning machine then observes these features in a labeled set of faces,
and develops a joint probability distribution function (pdf) or chooses a classifier
function based on some criterion, such as empirical risk minimization in a support
vector machine (SVM).
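As a concrete illustration of the empirical risk minimization step described above, the sketch below trains a linear classifier by subgradient descent on the regularized hinge loss, which is the linear SVM objective. This is not the thesis code: the two synthetic 2-D feature clusters are hypothetical stand-ins for face and nonface feature vectors.

```python
import numpy as np

# Synthetic labeled set: "face" features near (+1,+1), "nonface" near (-1,-1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(50, 2)),
               rng.normal(-1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.1                 # regularization weight, learning rate
for _ in range(500):
    viol = y * (X @ w + b) < 1      # samples violating the margin
    grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / len(X)
    grad_b = -y[viol].sum() / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean(np.sign(X @ w + b) == y)
```

Only margin-violating samples contribute to the gradient, which is why the trained hyperplane is determined by points near the class boundary (the support vectors).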
The brain has the capacity to solve these basic computational challenges and perform very high quality face detection. Brain research on face detection in the last
fifty years has focused on determining the actual neural substrate. One basic question
was whether the signal indicating detection of a face was anatomically localized or
present in multiple regions of the brain. While the latter hypothesis has yet to be explored, the search for an anatomically localized face detection signal was prompted by
discovery of focal brain lesions that induced prosopagnosia, an inability to recognize
faces [3]. These lesions were typically located in temporal lobe. Electrophysiological recording in macaque temporal cortex subsequently revealed face selective cells
[23]. Human electrophysiology in temporal and hippocampal sites demonstrated face,
facial expression, and gender selective neurons [16, 27, 34]. Event-related potential
(ERP) and magnetoencephalography (MEG) studies made a noninvasive and anatomically broader survey of healthy brains, implicating fusiform and inferotemporal gyri
in face selective response.
In the last decade, functional magnetic resonance imaging (fMRI) of brain activity
has allowed for anatomically more precise visualization of this region. Initial studies
located regions that responded to face stimuli more than blank baseline stimuli. Two
studies [32, 29] in 1997 made explicit signal amplitude comparisons between face and
non-face stimuli to locate brain regions that responded maximally to faces. These
studies were careful to choose face and non-face stimuli that differed little in low-level
features such as luminosity, as verified by checking if low level visual areas like V1
were responding differentially to faces and non-faces. The anatomically localized area
of face detection signals found most consistently across subjects is the fusiform face
area (FFA), located in the fusiform gyrus of the posterior inferior temporal lobe. Face
detection signals are also focused near the occipital temporal junction, in the occipital
face area (OFA).
Subsequent work has suggested that the FFA responds more to nonface objects for
which the subject bears expertise than for other nonface objects. However, the paper
initiating this claim trained subjects on novel objects called greebles that resembled
faces, making the effect of greeble response difficult to distinguish from the normal face
detector response [46]. Other work demonstrated elevated FFA response to cars in car
experts and birds in bird experts, but with FFA response to faces still maximal [17].
Insufficient additional stimuli were tested to determine whether the FFA could serve
reliably as a detector of these alternate expertise-specific objects in daily experience.
Four questions regarding the function of the FFA have not yet been empirically
resolved. (i) Is the FFA involved in pre-processing stages leading to the detection
of faces? (ii) Does the FFA participate in further classification of faces, by expression, race, or other identifying characteristics? Only recently has it been shown in
Caucasians and African Americans that the average FFA signal is elevated in the
presence of faces from the subject's own race [18].
However, this result does not
support the encoding of more than two races. (iii) Does the FFA encode information
about nonface stimuli? Maximal response to face stimuli is not interpreted as non-face
detection although this function is equivalent to face detection. A recent correlation
based study suggested the FFA does not reliably encode non-face objects [44], but
alternate decoding methods might prove otherwise. (iv) Are emotional expressions
and semantic information about faces (e.g. the name of the individual) represented in
the FFA? The hemodynamic response through which fMRI observes neural activity
is known to lose tremendous detail in the temporal and spatial response of neurons.
All spike patterning and the individual identity of millions of neurons are lost in each
spatial voxel (three dimensional pixel) of hemodynamic activity. The absence of information in fMRI signal is not proof of absence of this information in the respective
brain area.
The general assumption behind conventional interpretations of FFA activity is
that all amplitudes are not equal. Higher amplitude responses are thought to code
information while low amplitude responses are not. This "higher-amplitude-better"
tenet bears some resemblance to the cortical concept of firing rate tuning curves in
individual neurons. A tuning curve is a function describing the response of a neuron
that takes the input stimulus and maps it to the firing rate, i.e.
the number of
spikes (action potentials) emitted by the neuron over some unit time. It has been
traditionally assumed that the stimulus for which the neuron responds maximally is
the stimulus the neuron represents. Neural modeling literature from the past decade
[39, 8], the "new perspective", holds that the firing rate, regardless of its amplitude, can be used in an estimator of the stimulus based on the tuning curve function. The tuning curve is the expected value of the firing rate given the stimulus, and thus a function of the conditional probability density relating stimulus to firing rate. The new perspective is more general than the traditional higher-amplitude-better paradigm.
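A minimal sketch of this tuning-curve-based estimation: given hypothetical Gaussian tuning curves and Poisson spiking (both assumptions for illustration, not claims about FFA neurons), maximum-likelihood decoding uses every neuron's count, high or low, to estimate the stimulus.

```python
import numpy as np

# Nine hypothetical neurons with Gaussian tuning curves tiling [0, 10].
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 10.0, 9)     # each neuron's preferred stimulus

def tuning(s):
    """Expected (Poisson) spike counts of all neurons for stimulus s."""
    return 2.0 + 50.0 * np.exp(-0.5 * (s - centers) ** 2)

true_s = 6.3
counts = rng.poisson(tuning(true_s))    # observed spike counts

# Poisson log-likelihood of each candidate stimulus, up to a constant.
candidates = np.linspace(0.0, 10.0, 201)
log_like = np.array([np.sum(counts * np.log(tuning(s)) - tuning(s))
                     for s in candidates])
s_hat = candidates[np.argmax(log_like)]  # maximum-likelihood estimate
```

Note that neurons firing well below their peak rate still constrain the estimate, which is the sense in which the new perspective generalizes the higher-amplitude-better view.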
2.2 Physics, acquisition, and physiology of the magnetic resonance image
The following discussion on physics of fMRI (section 2.2.1) and image acquisition
(section 2.2.2) draw from an introductory text on fMRI [4] and an online tutorial [28].
The textbook and tutorial provide many helpful illustrations of the basic concepts.
The reader is directed to the discussion below for a brief overview, and the referenced
sources for a more complete introduction to the fMRI technique.
2.2.1 Physical principles
A hydrogen proton has an associated quantum state called spin, which can be up
or down. Spin produces an effective magnetic dipole, and any collection of protons,
such as a sample of water molecules, will have a net magnetic dipole direction and
magnitude. The emergence of an apparent continuum of net dipole directions in
a sample of protons from two spin states in each proton is a subject of quantum
mechanics.
Based on its surrounding magnetic field, a proton has a characteristic resonant
frequency ω₀ = γB₀, where γ is the proton's gyromagnetic ratio and B₀ is the magnitude of the surrounding field. Electrons near a proton can partially shield a proton from
the external magnetic field, shifting the resonant frequency. This so-called chemical
shift is sufficiently significant that molecules of different structure have hydrogens
of different resonant frequencies. Conventional fMRI works with NMR of the water
molecule and its associated proton resonance frequency.
The net dipole of a sample aligns with the direction of an applied magnetic field,
denoted the longitudinal direction. The dipole can be displaced, or "tipped" from the
direction of alignment by a radiowave pulse at the resonant frequency. The tipped
dipole precesses at the resonant frequency in the transverse plane, orthogonal to the
longitudinal direction. The component dipoles of the sample initially rotate in phase,
but over time the component phases disperse uniformly in all directions. During this
process, called relaxation, the net dipole returns to the equilibrium direction with
an exponential timecourse. The time constants to reaching equilibrium in transverse
and longitudinal directions are called T2 (transverse, spin-spin) and T1 (longitudinal)
relaxation time respectively.
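A short numerical illustration of these quantities. The gyromagnetic ratio for the hydrogen proton (γ/2π ≈ 42.58 MHz/T) is a standard physical constant; the relaxation constants below are illustrative orders of magnitude, not measured values.

```python
import numpy as np

# Resonant (Larmor) frequency f0 = (gamma / 2 pi) * B0 for 1H at the
# 3 T field strength typical of fMRI scanners.
gamma_over_2pi = 42.58e6        # Hz per tesla, 1H gyromagnetic ratio / 2 pi
B0 = 3.0                        # main field strength in tesla
f0 = gamma_over_2pi * B0        # ~127.7 MHz, in the radiowave band

# After a tip into the transverse plane, transverse magnetization decays
# with time constant T2 while longitudinal magnetization recovers with T1.
T1, T2, M0 = 1.3, 0.1, 1.0      # seconds, seconds, equilibrium magnitude
t = np.linspace(0.0, 3.0, 301)  # seconds after the tip
M_transverse = M0 * np.exp(-t / T2)
M_longitudinal = M0 * (1.0 - np.exp(-t / T1))
```

The large gap between T2 and T1 in this illustration mirrors the general situation in tissue: transverse dephasing is much faster than longitudinal recovery.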
After tipping with a 90° RF pulse and during the relaxation phase, the precessing magnetic dipole, an oscillating magnetic field, induces electromagnetic radiation, a radiowave, that can be detected with a nearby wire coil. This emission is called the free induction decay (FID). A second emission with identical waveform but larger magnitude can be generated by following with a 180° RF pulse. This second emission arrives at a time interval after the 180° pulse equal to the time between the 90° and 180° pulses, and is called a spin echo. In echo planar imaging (EPI), additional magnetic field gradients (phase and frequency encoding, explained below) are cycled during the spin echo, resulting in a "gradient echo" that allows for the fast slice imaging required for fMRI.
Three factors determine the amplitude of the emitted radiowave: the local density of protons, T1 relaxation time, and T2 relaxation time. Because these factors
are endogenous characteristics of the sample, contrasts between various brain tissue
or deoxygenated and oxygenated vasculature can be imaged in the intensity of the
MR signal. fMRI exploits the variation that deoxyhemoglobin versus oxyhemoglobin
induces in the T2 relaxation time. Because the free induction (and thus the spin
echo) emission decays at the T2 relaxation time, the total energy of the emission
is sensitive to blood oxygenation. This allows for sufficient image contrast in sections of higher blood oxygenation to create images of brain activity. By employing
alternate radiowave and magnetic gradient pulse sequences, the recorded MR signal
can reflect different aspects of spin relaxation (proton density, T1, etc.) and hence
contrast different endogenous properties of the tissue sample. The T2 dependent MR
signal is called blood-oxygen-level-dependent (BOLD) signal in fMRI literature. An
explanation of T2*, also relevant to fMRI, has been omitted here.
2.2.2 Image acquisition
In order to produce images, the MR signal must be extracted from voxels in a cross-section of brain. The image is formed by a grayscale mosaic of MR signal intensities
from in-plane voxels. With current technology, a three dimensional volume is imaged
by a series of two dimensional slices. The localization of the MR signal to specific
voxels is made possible by magnetic field gradients that control the angle, phase,
and frequency of dipole precession on the voxel level. The process begins with slice
selection. By applying a linear gradient magnetic field in the direction orthogonal
to the desired slice plane, the slice plane is selectively activated by applying the
RF pulse frequency that corresponds to the resonant frequency of water hydrogen
protons for the magnetic field intensity unique to the desired plane. MRI machines
that are able to support steeper slice selection gradients allow for imaging of thinner
slices. The slice selection step guarantees that any MR signal received by the RF coil
originates from the desired slice. Because the entire plane of proton spins is excited
simultaneously, the entire plane will relax simultaneously, and hence the MR signal
must be recorded at once from all voxels in plane.
Resolution of separate in-plane voxel signals is achieved by inducing unique phase and frequency of dipole precession in each voxel, in a process called phase and frequency encoding. The corresponding RF from each voxel will then retain its voxel-specific phase and frequency assignment. Consider the plane spanned by orthogonal vectors x and y. To achieve frequency encoding, a linearly decreasing gradient magnetic field is applied along the x vector. As a result, precession frequency decreases
linearly along the x vector. To achieve phase encoding, a frequency encoding gradient
is applied along the y vector, and then removed. In the end state, precession along
the y vector is constant frequency, but with linearly decreasing phase. The resulting
relaxation RF measured by the external coil is composed of energy from the selected
slice only, with the various frequency and phase components encoding voxel identities.
In practice, the frequency and phase assignments to the in-plane voxels are chosen
such that the resulting linear summation RF signal sampled in time represents the
matrix values along one row of the two dimensional Fourier transform (2D-FT) of
the RF in-plane voxel magnitudes. The in-plane frequencies and phases are rapidly
reassigned to produce the remaining rows of the 2D-FT. This Fourier domain in MRI
imaging literature is termed "k-space".
Performing the inverse 2D-FT on the acquired matrix recovers the image, a grayscale mosaic of MR signal intensities from the in-plane voxels.
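The reconstruction step can be sketched with a synthetic slice: acquisition fills a matrix of spatial-frequency samples (one row per phase-encoding step), and the inverse 2D Fourier transform recovers the mosaic of in-plane voxel intensities. The "slice" below is a made-up stand-in for real voxel magnitudes.

```python
import numpy as np

# Synthetic slice: a bright rectangular "tissue" region on a dark field.
slice_image = np.zeros((64, 64))
slice_image[20:44, 24:40] = 1.0

# Forward model of acquisition: the scanner samples k-space, the 2D
# Fourier transform of the in-plane voxel magnitudes.
k_space = np.fft.fft2(slice_image)

# Reconstruction: the inverse 2D-FT recovers the image exactly.
recovered = np.fft.ifft2(k_space).real

print(np.allclose(recovered, slice_image))  # prints True
```

In practice the recovery is not this clean: k-space is sampled finitely and noisily, so the reconstructed image has limited resolution, but the transform-pair relationship is the same.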
2.2.3 Physiology of fMRI
While it is generally accepted that the T2 relaxation time is sensitive to blood oxygenation levels via de/oxyhemoglobin as discussed above, research is still being performed to understand the relationship between neuronal activity and the BOLD signal. Recent work suggests that the BOLD signal more closely corresponds to local
electric field potentials than action potentials [31]. The local field potential (LFP)
is acquired by low-pass filtering the signal obtained from a measurement electrode
placed in the region of interest. LFPs are generally believed (but not confirmed) to
result from synchronous transmembrane potentials across many dendrites, and hence reflect the computational inputs to a region. The nature of the LFP is also apparently dependent on choice of reference electrode location. A subsequent theoretical paper [26]
proposed a linear model to express the relationship between local field potentials and
fMRI signal. The combined use of electrophysiology and fMRI is currently an expanding area of research. This combined approach promises to elucidate the relationship
between fMRI signals and neural activity, simultaneously study neural behavior on
multiple scales and spatial/temporal resolutions, and enable visualization of local
electrical stimulation propagating on a global scale. Some of these questions might
preliminarily be answered by coupling electrophysiology with electroencephalography (EEG), circumventing the technical issues associated with performing electrical
recordings in the fMRI electromagnetic environment.
Some of the first fMRI experiments [12] examined the response of the early visual areas of the brain, namely V1, to flashes of light directed at the retina. These experiments revealed a characteristic gamma-distribution-like shape to the rise and fall of MR signal in activated V1 voxels. The MR signal typically rises within two seconds of neural activation in the voxel, peaks by four seconds, and decays slowly over twelve to sixteen seconds post-activation. This characteristic hemodynamic impulse response is parameterized with the functional form used in the gamma probability density function, quoted below from [20]:
    h(t) = 0                                   t < δ
    h(t) = ((t − δ)/τ)^2 e^(−(t−δ)/τ)          t ≥ δ        (2.1)

where δ sets the onset delay of the hemodynamic response, and τ determines the rate of the quadratic-dominated ascent and exponential-dominated descent. In the conventional use of this parameterization of the hemodynamic response, τ and δ are believed to be constant within anatomically focal regions but to vary across brain regions. Consequently, δ and τ values are typically set, and the data used to estimate a constant amplitude coefficient to fit the response to a given stimulus. With single stimulus or event-related design (discussed below), the choice of δ and τ can be made based on the best fit to the raw hemodynamic timecourse.
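As a concrete illustration of equation (2.1), the following Python sketch evaluates the gamma-shaped impulse response. The parameter values delta = 2.25 s and tau = 1.25 s are illustrative choices, not values from this thesis; they place the peak near the four-second mark described above.

```python
import numpy as np

def hrf_gamma(t, delta=2.25, tau=1.25):
    """Gamma-shaped hemodynamic impulse response of eq. (2.1): zero before
    the onset delay delta, then a quadratic-dominated rise damped by an
    exponential with time constant tau. Parameter values are illustrative."""
    s = np.maximum(np.asarray(t, dtype=float) - delta, 0.0) / tau
    return s ** 2 * np.exp(-s)

t = np.arange(0.0, 16.0, 0.1)   # seconds, at 0.1 s resolution
h = hrf_gamma(t)                # peaks at t = delta + 2*tau = 4.75 s
```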
The gamma function model of the hemodynamic response has been verified in V1,
but not in the face detection areas of the extrastriate occipital lobe and temporal
lobe studied in this thesis. This is not an essential deficiency because analysis of
information content in a brain area can be performed with incorrect parametrizations,
as well as non-parametrically.
2.3  Design of fMRI experiments
Because it is noninvasive and anatomically localizable, fMRI is a commonly used
technique to measure the human brain's characteristic response to one or more stimulus conditions. Conventionally, the responses to multiple repetitions (trials) of the same stimulus are averaged to produce a low variance estimate of the response to that
stimulus. The variance observed in the response to the same stimulus across multiple
trials is conventionally attributed to physiological and instrument "noise," although
the physiological component of the noise variance may include brain processes that
are simply not well characterized. These noise processes are not necessarily stationary
or white, and moreover the response of the brain to trials of the identical stimulus
may change as is often observed in electrophysiology. Consequently, differential analysis of the trial averaged response estimates of two stimuli generally may not be able
to detect more subtle differences between stimuli that elicit small changes in neural
activity. This issue is illustrated in the results of experiment three of this thesis,
gender bias of face detection (chapter 5).
While research is currently underway to develop more sophisticated models for
brain response estimation [38], methods have been developed based on the simple
model of identical stimulus response across multiple trials in the presence of colored
or white Gaussian noise. These methods can be applied over the entire duration of
the experiment, as with experiments one and two of this thesis (chapters 3 and 4),
or applied over consecutive sub-sections of the experiment for which the assumptions
might be more plausible, as with the last experiment of this thesis (chapter 5).
Three methods to estimate the characteristic responses of a brain voxel to a set of
stimuli are described below: brute-force (block and single stimulus), counterbalanced
(block), and event-related (single stimulus) design.
2.3.1  Brute-force design
In brute-force experiment design, a stimulus condition is presented briefly and the
next stimulus is presented after eight to sixteen seconds to allow the previous hemodynamic response to decay. Physiological and instrument noise demands that multiple
trials of the same condition be averaged to produce an estimate of a hemodynamic
response. The extended interstimulus period severely limits scheduling of multiple
trials within one experiment, and makes inclusion of more than four conditions impractical. Studies that characterize physiological and instrument noise together were
not found in a literature search, although such work might be possible by adding
a signal source to measure instrument noise and using signal processing theory to
separate and characterize the physiological noise in various brain regions.
In a variation of the above basic experiment design, the stimulus condition can
be repeated multiple times in close succession, typically every one to two seconds
in high-level vision studies. This time window, called a block, is typically sixteen
to twenty seconds during which multiple exemplars of one stimulus are presented.
Assuming a linear summation of identical hemodynamic impulse responses during
the block, the MR signal for activated voxels rises to a nearly constant level within
the first four seconds of the block. The block presentation is an improvement over
single stimulus presentation in that more measurement points from the same stimulus
can be obtained for every interstimulus interval.
Because each signal sample has
higher amplitude in the block design, the values also have higher signal to noise ratio,
assuming an invariant noise baseline. An important omission of this discussion is the
temporal correlation between multiple samples of the same block, which is a vital
statistic in evaluating the extent to which averaging a block of stimuli improves the
signal estimate.
An important and unjustified assumption brought in defense of block design is
that the hemodynamic response to the stimulus is a fixed-width gamma function.
Although it has been suggested to some extent by the V1 fMRI studies [12] and the simultaneous electrode/fMRI paper [31] that the hemodynamic response to a neural spike train of some length is gamma-shaped, it is entirely conceivable that the pattern of neural firing evoked by a stimulus might be two separated volleys of spike trains, or
of neural firing evoked by a stimulus might be two separated volleys of spike trains, or
some other nonlinear coding scheme. Under such a circumstance, the temporal profile
of the hemodynamic response, rather than the single average amplitude parameter
estimated by block design, could contribute information to discriminate stimulus
conditions. Not only would the linearity assumption be incorrect, but the fixed-width
gamma model of the response would not apply. The nonlinearity of the underlying
neural response as a problem in assuming linearity of the hemodynamic response is
mentioned briefly in [40].
Another criticism of block design is that presentation of long intervals of the same
condition can promote attention induced modulation of fMRI signal. The defense
against this criticism is that by checking that V1 is not modulated differently by
various conditions, the "attentional confound" of the block design can be discounted.
However, it is not clear that attentional confounds would necessarily exhibit themselves in V1 for any or all experiments.
With all of these methods for estimation of hemodynamic response, the uncertainty regarding whether the complete dynamics (i.e., the sufficient statistics) of the response have been captured can be circumvented by conceding that

Principle 1. Any transformation on the original data that allows double-blinded discrimination of the stimulus conditions provides a lower bound on the information content about the stimulus in the data.
2.3.2  Counterbalanced block design
The presentation of stimuli can be further expedited by omitting the interstimulus
period between blocks. Now the tail of the hemodynamic response from a previous
condition will overlap into the current condition. This tail decays in an exponential manner, as described in equation (2.1). In counterbalancing, each condition is
presented once before every other condition. Across runs, the absolute position of
any particular condition is varied over the block. In perfect counterbalancing, the
trials of different conditions would have identical scheduling statistics and indistinguishable joint statistics. A typical scheduling of four conditions across four runs in
approximate counterbalancing is as follows:
run 1: [0 1 2 3 4 0 2 4 1 3 0 3 1 4 2 0 4 3 2 1 0]
run 2: [0 3 1 4 2 0 4 3 2 1 0 1 2 3 4 0 2 4 1 3 0]
run 3: [0 2 4 1 3 0 1 2 3 4 0 4 3 2 1 0 3 1 4 2 0]
run 4: [0 4 3 2 1 0 3 1 4 2 0 2 4 1 3 0 1 2 3 4 0]
This is the schedule used in the third experiment of this thesis, although each run was
repeated several times. The numbers 1 through 4 correspond to blocks from respective
conditions, and 0 corresponds to an interstimulus interval during which the subject
fixates on a cross displayed on the screen. Each block lasts sixteen seconds, and each
run is 5 minutes 36 seconds. The subject rests between runs.
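The approximate counterbalancing of a run can be checked mechanically. The Python sketch below splits run 1 at the fixation intervals and counts ordered pairs of immediately successive condition blocks; in this schedule every ordered pair of distinct conditions occurs exactly once per run.

```python
from collections import Counter
from itertools import permutations

run1 = [0, 1, 2, 3, 4, 0, 2, 4, 1, 3, 0, 3, 1, 4, 2, 0, 4, 3, 2, 1, 0]

# Split the run at fixation intervals (0) into sub-sequences of condition blocks.
subseqs, cur = [], []
for c in run1:
    if c == 0:
        if cur:
            subseqs.append(cur)
        cur = []
    else:
        cur.append(c)

# Count ordered pairs of immediately successive condition blocks.
pairs = Counter()
for s in subseqs:
    pairs.update(zip(s, s[1:]))
```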
With approximate counterbalancing, the bias introduced by tails can be made
equal across all conditions in each average across trials. This bias however is not a
constant offset across the time window of each block. Because the tail of a previous
block decays exponentially, the bias, a sum of tails across all conditions, will decay
exponentially. The first stimulus block in a series is known to produce anomalously
high signal, but this bias is experienced by all conditions in a perfectly counterbalanced design. Biases based on non-stationarity of noise, such as baseline drift, are
also distributed across conditions.
The exponential tail to the bias is generally not acknowledged in descriptions
of the counterbalance method. Instead, the bias is described as "washing out" in
the average across trials, and samples within a final averaged block are taken to
be estimates of roughly the same signal amplitude. Furthermore, most experiments
are not perfectly counterbalanced, so that the bias is not identical in shape for all
conditions. Consequently, even perfectly counterbalanced experiments that perform
averages across all samples of a condition within its trial-averaged block can only be
sensitive to relatively large changes (roughly a difference of four percent increase from
baseline) in univariate (voxel-by-voxel) ANOVA of stimulus response.
Additional sensitivity in contrast can be achieved by multivariate (voxel vector) analysis even when neglecting to average across trials in a counterbalanced design, and when equal-weight averaging all samples of a single block time window [6]. Despite these simplifying and incorrect assumptions of equal-weight averaging within a single block, the detection of significant differences in response between conditions is possible although suboptimal, as suggested by principle (1). Although neglecting to average across trials in a counterbalanced design allows different biases between blocks of various conditions, consistent differences detected between trials of different conditions may still be valid, depending on the nature of the statistical test.
Because a valid conventional counterbalanced analysis requires averaging over all counterbalancing blocks of a given stimulus in order to achieve uniform bias over
all stimulus conditions, the method is extremely inefficient in producing multiple
identical-bias averaged trials of the same stimulus. In fact, univariate ANOVA is often
performed across multiple non-identical-bias trials, which is fundamentally incorrect,
and in practice weakens the power to discriminate two stimulus conditions.
2.3.3  Rapid-presentation event-related design
An ideal experimental technique would be able to produce multiple independent, unbiased, low variance estimates of hemodynamic response to many conditions. Under
assumptions of linear time invariant hemodynamic responses to conditions, a linear finite impulse response (FIR) model and the linear-least-squares linear algebra
technique provide an efficient solution. Using this mathematical method, known as
deconvolution or the generalized linear model, stimuli from various conditions can be
scheduled in single events separated by intervals commonly as short as 2 seconds. Deconvolution is more than fifty years old, linear-least-squares approximation is older,
and application to neural signal processing is at least twenty years old [24]. Deconvolution in [24] was presented in the Fourier domain, without reference to random
processes. In the treatment below, a time domain approach is taken that includes additive Gaussian noise in the model. A more computationally efficient algorithm might
combine these two approaches and perform the noise whitening and deconvolution in
Fourier domain. Also, this method is univariate, with deconvolution performed on a
per-voxel basis. Since correlations between adjacent voxels can be substantial, these
correlations might be estimated and voxel information combined to produce more
powerful estimates of impulse responses.
Event related design schedules brief presentations of stimulus conditions with short
interstimulus intervals. Typical presentation times are 250 to 500 milliseconds for the
stimulus and 1.5 to 2 seconds for the interstimulus interval. For the mathematical
simplicity of avoiding presentations at half-samples of the MR machine, each stimulus
is presented when scanning of the first slice of a series of slices is started. The time
taken to scan the fMRI signal of all selected anatomical slices, and thus the sampling
period of a voxel, is referred to as a TR in this section and in the neuroscience literature. This is in contrast to the previous section and the imaging literature, where TR refers to the relaxation time allowed a single slice in free induction decay based imaging. Also for simplicity in applying the deconvolution technique, the voxel of interest is assumed to be sampled instantaneously once every TR, whereas the sampling process actually requires time over which to perform the MR imaging procedure. It is not clear whether the fs-fast software implementation of deconvolution [21] accounts for the offset in sample times between slices. Many equations and notations in the discussion of deconvolution below are quoted directly from [19].
The objective of deconvolution is to find the maximum likelihood estimate of
the finite impulse responses of a given voxel to the various stimulus conditions. To
estimate these responses, we are given the raw sampled MR signal timecourse and the
time samples at which each of the conditions were presented. We begin with a simple
generative model of the discrete-time signal we observe to quantify our assumptions
about how it is related to the impulse responses of a single condition:
    y[n] = x1[n] * h[n] + g[n]        (2.2)
where y[n] is the observed MR signal, x1[n] is 1 if the stimulus condition was presented
at time n and 0 otherwise, h[n] is the causal impulse response of the condition, and
g[n] is a colored zero-mean Gaussian random process. The issue of colored versus
white noise is discussed later. Qualitatively, equation (2.2) suggests that the signal
we observe is a linear superposition of noise and the stereotypical responses of the
stimulus condition, shifted to start when the stimulus is presented. This is a basic
concept of linear systems [36].
Over an entire experiment, the raw sampled signal from a voxel y[n] has some
finite number of time-points Ntp. The causal hemodynamic impulse response h[n] of
the voxel to the single condition will have some finite number of samples Nh. It is
computationally convenient to re-express equation (2.2) in matrix form:
    y = X1 h1 + g        (2.3)

Here, y is the Ntp x 1 column vector [y[1], y[2], ..., y[Ntp]]^T and g is the vector [g[1], g[2], ..., g[Ntp]]^T. The term X1 h1 uses matrix multiplication to achieve an Ntp x 1 vector with components equal to the samples of x1[n] * h[n]. X1 is accordingly called a stimulus convolution matrix, and is an Ntp x Nh matrix formed as follows:

    X1 = [ x1[1]      0            0            ...  0
           x1[2]      x1[1]        0            ...  0
           x1[3]      x1[2]        x1[1]        ...  0
           ...                                  ...
           x1[Ntp]    x1[Ntp − 1]  x1[Ntp − 2]  ...  x1[Ntp − Nh + 1] ]        (2.4)

Now defining h1 = [h1[0], h1[1], h1[2], ..., h1[Nh − 1]]^T, the product X1 h1 results in the desired Ntp x 1 vector of the convolved signal x1[n] * h1[n].
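A minimal Python sketch of the stimulus convolution matrix of equation (2.4); the function name and the toy stimulus train are illustrative. The defining property, checked below, is that multiplication by the matrix performs the convolution truncated to Ntp samples.

```python
import numpy as np

def stim_conv_matrix(x, Nh):
    """Ntp x Nh stimulus convolution matrix of eq. (2.4): column j holds
    the stimulus train x delayed by j samples, so X @ h performs the
    convolution x[n] * h[n] truncated to Ntp samples."""
    Ntp = len(x)
    X = np.zeros((Ntp, Nh))
    for j in range(Nh):
        X[j:, j] = x[:Ntp - j]
    return X

x = np.zeros(20)
x[[2, 9, 15]] = 1                         # three stimulus presentations
h = np.array([0.0, 0.5, 1.0, 0.7, 0.3])  # toy 5-sample impulse response
X = stim_conv_matrix(x, len(h))
```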
The final simplifying step in converting to matrix notation is to allow inclusion of
multiple conditions. For the ith condition, denote the Ntp x Nh stimulus convolution
matrix Xi, and the Nh x 1 hemodynamic response vector hi. Now each of the Nc conditions contributes to the generative model:

    y = X1 h1 + X2 h2 + ... + XNc hNc + g        (2.5)

Assigning X = [X1 X2 ... XNc] and h = [h1; h2; ...; hNc], equation (2.5) reduces to:

    y = Xh + g        (2.6)

This simple linear equation simultaneously models the relationship between the observed single voxel MR signal timecourse y and the unknown hemodynamic impulse responses of all Nc conditions, contained in h. y is Ntp x 1, X is Ntp x Nch, h is Nch x 1, and g is Ntp x 1, where Nch = Nc x Nh.
Estimating h in equation (2.6) is a problem with Nch unknown parameters. In general, for a fixed data set, variance in the estimate decreases as the number of unknown parameters decreases. This can be accomplished here by decreasing Nh, the length of the
finite hemodynamic impulse response in the model. Alternatively, each hemodynamic
response can be further modeled as a fixed-width gamma function as in equation (2.1).
Then there are only Nc parameters to estimate, the gamma function amplitude for
each condition. This is accomplished as follows:
    y = XAp + g        (2.7)

Here, Ap has been substituted for h in equation (2.6). A is an Nch x Nc matrix that contains the desired stereotypical model impulse response function for each condition, such as a fixed-width gamma function. p is the Nc x 1 vector that contains the unknown amplitude-scaling constants for each condition. We desire a format for A and p such that, given assignments to p, the resulting Nch x 1 vector is a vertically concatenated series of impulse response waveforms for each condition, just as with h in equation (2.6). This is accomplished with the following block-diagonal construction:

    A = [ a1   0    ...  0
          0    a2   ...  0
          ...       ...
          0    0    ...  aNc ]    and    p = [p1, p2, ..., pNc]^T        (2.8)

where ai = [ai[1], ai[2], ..., ai[Nh]]^T is the Nh x 1 model impulse response waveform for condition i, so that Ap stacks the scaled waveforms p1 a1, ..., pNc aNc vertically.
As mentioned in section (2.2.3), the optimal choice of gamma function width can be
made using an initial estimate of the raw impulse response with equation (2.6).
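The construction of equation (2.8) can be sketched in Python with toy waveforms (all values illustrative); the product Ap indeed stacks the scaled model waveforms vertically.

```python
import numpy as np

# Toy model waveforms a_i and block-diagonal A of eq. (2.8) (values illustrative).
Nc, Nh = 3, 5
waveforms = [np.arange(1.0, Nh + 1) * (i + 1) for i in range(Nc)]

A = np.zeros((Nc * Nh, Nc))
for i, a in enumerate(waveforms):
    A[i * Nh:(i + 1) * Nh, i] = a      # column i carries waveform a_i

p = np.array([2.0, -1.0, 0.5])          # amplitude-scaling constants
h = A @ p                               # vertically stacked scaled waveforms
```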
Regardless of the exact form of the linear model, the problem of estimating hemodynamic impulse responses reduces to a stochastic model of the form y = Xh + g.
This is a classical problem of nonrandom parameter estimation [54]. We seek an unbiased minimum variance estimator ĥ of the nonrandom (constant) parameter h. Unbiased means that on average, the estimator equals the value to estimate. To solve this problem, it is not possible to apply the Bayesian notion of minimizing expected least-squares difference between ĥ and h, because the optimum choice is ĥ = h, which is an invalid estimator. Instead, the common approach is maximum likelihood, where ĥ is chosen to maximize the probability of the observed data y. The steps below are quoted with minor alteration from the derivation in ([53], p. 160). Because g is a jointly Gaussian random vector ~ N(0, Λg) and Xh is a constant, the distribution on y is also jointly Gaussian.
We choose ĥ to maximize the probability density function on y given h:

    p_y(y; h) = N(y; Xh, Λg)        (2.9)
              ∝ exp[−(1/2)(y − Xh)^T Λg^-1 (y − Xh)]        (2.10)

To maximize (2.10), we must minimize the negative of the argument of the exponential term, equivalently

    J(h) = (y − Xh)^T Λg^-1 (y − Xh)        (2.11)
We can now set the gradient of (2.11) to zero and solve for ĥ. Alternatively, we can solve for ĥ as a deterministic linear least-squares problem. To convert to the notation in [19], we write J(h) as the norm squared of one term of its Cholesky factorization. Note that the Cholesky factorization of an n x n symmetric positive definite matrix A is L^T L, where L is the n x n upper triangular Cholesky factor matrix. The Cholesky factor of the quadratic form J(h) is simply B^-1 (Xh − y), where B^-1 is the Cholesky factor of Λg^-1. We know B exists because Λg is positive definite and hence Λg^-1 is positive definite. Λg is positive definite because it is a full rank covariance matrix.

Rearranging variables accordingly, we have a deterministic linear least-squares problem:

    ĥ = argmin over h of ||B^-1 (Xh − y)||^2        (2.12)
      = argmin over h of ||B^-1 X h − B^-1 y||^2        (2.13)
      = argmin over h of ||Fh − g̃||^2        (2.14)

where F = B^-1 X and g̃ = B^-1 y.
Observe that B^-1 is Ntp x Ntp, X is Ntp x Nch, h is Nch x 1, and y is Ntp x 1. Consequently, both Fh and g̃ = B^-1 y are Ntp x 1.
In geometric terms, equation (2.14) states that we desire a vector Fĥ in the subspace of R^Ntp spanned by the column vectors of F that is closest in Euclidean distance to the vector g̃ = B^-1 y in R^Ntp. The linear projection theorem [1] states that the vector in the F column subspace closest to g̃ is the projection of g̃ onto the F column subspace. By the orthogonality principle [53, 45], the error vector r = Fĥ − g̃ is orthogonal to the F column subspace. In other words, r is in the left nullspace of F, or the nullspace of F^T [45]. We solve the orthogonality condition for the least-squares choice of ĥ:

    F^T (Fĥ − g̃) = 0        (2.15)
    F^T F ĥ = F^T g̃        (2.16)
    ĥ = (F^T F)^-1 F^T g̃        (2.17)
Substituting from equation (2.13) for F and g̃,

    ĥ = ((B^-1 X)^T (B^-1 X))^-1 (B^-1 X)^T (B^-1 y)        (2.18)
    ĥ = (X^T (B^T B)^-1 X)^-1 X^T (B^T B)^-1 y        (2.19)

where in equation (2.19) we have used the properties that (QRS)^T = S^T R^T Q^T and (R^T)^-1 = (R^-1)^T for arbitrary equal-dimension square matrices Q, R, S, with R invertible. We can rewrite equation (2.19) in terms of C, the normalized noise covariance matrix of g, with C = B^T B. We use ĥ to denote this maximum likelihood choice of h:

    ĥ = (X^T C^-1 X)^-1 X^T C^-1 y        (2.20)
To calculate the error covariance, write down the definition of covariance, substitute (2.20) for ĥ, and simplify (first and last steps shown here):

    Λe = E[(e − ē)(e − ē)^T]        (2.21)
       = E[(ĥ − h)(ĥ − h)^T]        (2.22)
       = (X^T C^-1 X)^-1        (2.23)

where e = ĥ − h is the estimation error.
In the general case, a maximum likelihood (ML) estimator need not be efficient
in the sense of achieving the Cramer-Rao bound. Quoting from [53], if an efficient
estimator exists, it must be the ML estimator. Efficient estimators always exist for
nonrandom parameters in a linear model with additive Gaussian noise as in (2.7).
This statement is not proven here, but requires calculation and comparison of the
Cramer-Rao bound to the ML estimator error covariance.
Because the normalized noise covariance matrix C is not available a priori for the given voxel, the current deconvolution software [21] first estimates ĥ with equation (2.20) assuming g is white noise, i.e., C = I, the identity matrix. The residual error r = y − Xĥ is considered to be a timecourse example of the noise g (recall the original model, y = Xh + g). Timecourse examples are culled from all within-skull voxels. The autocorrelation of g is estimated from the timecourse examples in some unspecified way, probably the brute-force no-window method. See [35] for better methods of estimating autocorrelation from samples. The following parameterized autocorrelation function [19] is fit to the estimated autocorrelation, normalized so that Rr[0] = 1:
    Rr[k] = (1 − α) ρ^|k|        0 < |k| ≤ kmax
    Rr[k] = 0                    |k| > kmax        (2.24)
No justification for this particular form is provided, although it does bear resemblance to all-pole modeling [2]. Once α and ρ are chosen, the entries for C are drawn from Rr[k]:
    C = [ Rr[0]          Rr[1]          ...  Rr[Ntp − 1]
          Rr[1]          Rr[0]          ...  Rr[Ntp − 2]
          ...                           ...
          Rr[Ntp − 1]    Rr[Ntp − 2]    ...  Rr[0] ]        (2.25)
This matrix is Toeplitz, meaning simply that any diagonal or subdiagonal has identical elements. A more complete discussion of the conditions for invertibility of X^T C^-1 X and C^-1 is not provided here. The estimator ĥ is unbiased, as can be seen by substituting y = Xh + g into equation (2.20) and taking the expected value E[ĥ]:

    E[ĥ] = E[(X^T C^-1 X)^-1 X^T C^-1 y]
         = E[(X^T C^-1 X)^-1 X^T C^-1 (Xh + g)]
         = E[(X^T C^-1 X)^-1 (X^T C^-1 X) h]    because g is zero-mean Gaussian
         = E[h]
         = h        (2.26)
The hemodynamic response estimation method based on a fixed-width gamma function impulse response is in general not an unbiased estimator, but as discussed earlier,
the variance of the estimator may be reduced due to fewer parameters.
A separate concern is the presence of deterministic additive drift signals, due to instrumentation, breathing, or other physiological signal sources. Detrending the data with low order polynomial functions of time can be accomplished by adding a0 t^0 + a1 t^1 + ... terms to the generative model, and fitting the trends and hemodynamic responses together. This is discussed in [19]. Noise waveforms like breathing, measured concurrently with the experiment, can be similarly "regressed out" of the fMRI signal.
The quality of the unbiased estimator ĥ can be measured by the variance in the estimation error e = ĥ − h. Recall from (2.23) that E[ee^T] is the Nch x Nch covariance matrix Λe = (X^T C^-1 X)^-1, but a simpler summary statistic is provided by E[e^T e] = trace(Λe). The associated literature [7] calls the reciprocal of this quantity the "efficiency", which is accordingly:

    E = 1 / trace((X^T C^-1 X)^-1)        (2.27)
A strange phenomenon was reported [7]: when variable interstimulus intervals are employed, thereby changing the stimulus convolution matrix X, this efficiency increases dramatically as the mean interstimulus interval decreases, whereas the opposite trend holds for fixed-width interstimulus intervals. No explanation was provided in the article, but the effect presented is dramatic.
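The efficiency of equation (2.27) can be computed for toy schedules as follows. This Python sketch assumes white noise (C = I) and a single condition; the onset lists and sizes are illustrative rather than taken from [7].

```python
import numpy as np

def efficiency(onsets, Ntp, Nh):
    """Design efficiency of eq. (2.27), evaluated under a white-noise
    assumption (C = I) for a single stimulus condition."""
    x = np.zeros(Ntp)
    x[list(onsets)] = 1
    X = np.zeros((Ntp, Nh))
    for j in range(Nh):
        X[j:, j] = x[:Ntp - j]
    return 1.0 / np.trace(np.linalg.inv(X.T @ X))

# Fixed versus jittered interstimulus intervals over the same scan length.
fixed = efficiency(range(0, 200, 12), Ntp=200, Nh=8)
jittered = efficiency([0, 7, 19, 24, 38, 45, 59, 66, 78, 91,
                       98, 112, 119, 133, 140, 154, 161, 180],
                      Ntp=200, Nh=8)
```

For the fixed schedule the columns of X are non-overlapping shifts, so X^T X = 17 I and the efficiency is exactly 17/8.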
Current work [38] on characterizing the statistical nature of physiological and
instrument noise, hemodynamic response profiles, and signal summation, will yield
more reliable and flexible methods for measuring stimulus response in neuroscience
studies.
2.4  Analysis of fMRI-estimated brain activity
With one or multiple independent estimates of hemodynamic response isolated for
voxels of interest, questions of physiological relevance can be answered. The most
prevalent question in the literature asks to what extent single voxels or average responses from anatomically focal sets of voxels differentiate between stimuli. The
broader problems are (i) to what extent responses from sets of voxels can be used to
discriminate the stimulus conditions, (ii) how these voxels encode information about
the stimulus, and (iii) where in the brain these voxels are present.
These problems can be generally described as codebook problems [52], because
they seek to describe a method by which brain activity can be matched with an
associated event. A complete understanding of the mechanisms of the brain could
generate a codebook, but the converse is not true because the codebook does not
specify a physical mechanism. Moreover, every neuron or network of neurons could
not feasibly be subjected to every possible condition, so that a generalizable theory
is necessary to complete this codebook.
Most methods, like ANOVA or split-half decoding analyses, impose a specific
decoding function or a set of functions from which to choose. The most prevalent
version of ANOVA in the associated literature imposes a univariate Gaussian model.
Split-half based SVM imposes a restricted class of indicator functions over which
it optimizes. Analyses that estimate non-parametric probability density functions
directly quantify the codebook questions. Information theory and Bayesian networks
operate in an assumption-free manner at the cost of requiring more data points.
Some methods require estimation of a parametric or non-parametric probability
density function (PDF). This includes most classical statistics, like ANOVA and information theory. Methods that do not require estimation of PDFs generally include
learning machines, especially SVMs, that select from a set of decoding functions based
on empirical risk minimization.
The classical split between statistical methods is descriptive versus exploratory.
Descriptive methods seek to accept or reject a hypothesis regarding a specific relationship between the hemodynamic responses and the stimuli. Exploratory methods,
often graph-based, extract some property of the responses without knowledge of the
stimuli, and then seek to relate the property to the stimuli.
An important division between techniques is univariate versus multivariate. Most
fMRI neuroscience work focuses on voxel-by-voxel analysis of the codebook questions,
but the fMRI literature since last year (2002) is becoming increasingly aware of the
need to explore vectors of voxel responses [25, 44, 6]. Electrophysiology has already
transitioned to multivariate analysis with regards to the codebook problem [11], multiunit recording, and prosthetics design. EEG and MEG have found multivariate
techniques essential to analysis [15]. Additionally, there is increasing awareness of
the importance of multiscale and multiresolution searches [43].
Finally, most methods define, to varying degrees, a generalizable theory that extends the codebook from a few stimuli to the entire set of stimuli. The most explicit
of these methods are called models, and predict the behavior of the system of interest under a wide range of conditions. Some biologically relevant models incorporate
aspects of the underlying physiology.
The following sections review ANOVA, principal components analysis (PCA), and
support vector machines (SVM), three methods that are employed in this study. For
each of these methods, the reader is referred to standard texts [48, 41] for detailed development. These methods represent the various divisions in analysis methods. However, many potentially useful techniques have been omitted from discussion, including
information theory and mutual information, independent components analysis (ICA),
dimensionality reduction algorithms [33, 42, 47], other data compression algorithms,
other learning machines like neural networks, and classical detection/estimation methods like linear predictive (Wiener) filters, and adaptive (Kalman) filters.
2.4.1  Univariate analysis of variance (ANOVA)
The face detection areas of the brain are identified in the literature and in the experiments of this thesis by sets of anatomically localized voxels that produce significantly higher amplitude responses to faces versus nonfaces. Localized patches of face-selective voxels in the temporal lobe and extrastriate occipital lobe have been defined in the literature as the fusiform face area (FFA) and occipital face area (OFA), respectively.
Because of its simplicity, univariate ANOVA has been the most popular choice in
making this statistical comparison. In the procedure, an individual voxel is selected
and the gamma-fit amplitude of impulse responses to the engaged and disengaged
conditions are compared. The voxel is called "significant" if the probability that the
amplitudes belong to one distribution is less than some threshold, typically 0.05.
In general, the region of interest includes individual voxels whose response amplitudes are significantly different between two conditions. The literature typically only
includes discovery of regions of interest that are anatomically localized sets of ten or
more 27 mm^3 voxels. Scattered and small sets of significant voxels are disregarded as sampling anomalies because of the large number of total voxels (about 40,000 in this study) examined. The Bonferroni correction [49] relates to this issue.
Univariate ANOVA calculates the ratio of sample variance within groups to sample variance across groups, to determine which of two or more groups are indistinguishable. The F-statistic, a ratio of chi-square random variables used to measure the significance of this ratio, is based on the premise that trials from each condition are drawn from a Gaussian distribution of amplitudes. When only two groups are compared, the technique is equivalent to a t-test. Details of univariate ANOVA and variations on the technique are provided in [41].
It should be noted that nonparametric statistical techniques, including the Kolmogorov-Smirnov test [50], mutual information [5], and the SVM classifier gradient [14], have also been used in the literature, with results similar to a t-test in the isolation of the FFA.
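The equivalence between two-group ANOVA and the t-test noted above can be checked numerically. The following is a minimal sketch using simulated (hypothetical) gamma-fit response amplitudes for a single voxel; the group means and spreads are illustrative values, not experimental data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated gamma-fit response amplitudes for one voxel (hypothetical values):
# 20 trials of the engaged (face) condition, 20 of the disengaged (nonface) condition.
face = rng.normal(loc=1.2, scale=0.4, size=20)
nonface = rng.normal(loc=0.3, scale=0.4, size=20)

# One-way ANOVA on the two groups.
f_stat, p_anova = stats.f_oneway(face, nonface)

# With exactly two groups, ANOVA reduces to the two-sample t-test: F = t^2,
# and the p-values coincide.
t_stat, p_ttest = stats.ttest_ind(face, nonface)
assert np.isclose(f_stat, t_stat ** 2)
assert np.isclose(p_anova, p_ttest)

# The voxel is called "significant" if p falls below a threshold, typically 0.05.
print(f"F = {f_stat:.2f}, p = {p_anova:.2e}, significant: {p_anova < 0.05}")
```

A thresholded map of such p-values across all voxels, followed by the localization criteria above, is one way the ROI selection could be implemented.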
2.4.2 Principal components analysis (PCA)
Suppose we are given a p-dimensional response feature for each voxel in a region of interest (ROI) over multiple trials of several conditions. This feature might be one-dimensional, as with the amplitude of a gamma fit to the response, or h-dimensional, with one dimension for each time sample of the hemodynamic impulse response. The response feature of a given voxel i across a set of stimulus conditions describes a random vector [x_i, ..., x_{i+p-1}]. To describe the responses of a set of m ROI voxels, we have a vector of random variables x = [x_1, x_2, ..., x_n]^T, where n = m × p.
The objective of PCA is to choose the linear transformation A that rotates x into a vector of uncorrelated random variables y by y = A^T x. Rank-ordered by their own variance, the first s of these random variables [y_1, ..., y_s]^T form the s-dimensional subspace with the smallest average squared error between an n-dimensional ROI response and an s-dimensional approximation. The following discussion of PCA draws heavily from class lectures ([54] on 9/11/02, 9/13/02); many equations are quoted directly from those lectures.
The PCA axis rotation matrix A can be derived from the expected least-squares
approximation criterion using the method of Lagrangians. For simplicity of discussion,
we instead present the matrix A and demonstrate that the resulting random vector y = A^T x indeed consists of n uncorrelated random variables [y_1, ..., y_n]^T.
Define the n × n covariance matrix Λ_x of the n × 1 dimensional ROI voxel vector x:

Λ_x = E[(x − x̄)(x − x̄)^T]    (2.28)

Observe Λ_x is symmetric because Λ_x = Λ_x^T, and positive semidefinite because it can be written as Λ_x = F^T F with F = (x − x̄)^T ([52], p. 50). Quoting from ([52], p. 49): a symmetric square matrix A is positive semidefinite iff x^T A x ≥ 0 ∀x, and positive definite iff x^T A x > 0 ∀x ≠ 0.
Symmetry implies that the matrix has n independent, orthonormal eigenvectors v_j, meaning that v_j^T v_k = 1 if j = k and 0 otherwise. Positive semidefiniteness implies that all eigenvalues of Λ_x are real and non-negative, λ_i ≥ 0. Observe that this does not guarantee Λ_x is full rank, which would mean invertible and λ_i > 0. A positive semidefinite matrix is positive definite if and only if it is invertible [52].
Consequent to these conditions, we are guaranteed that the following construction of the n × n matrix A exists. Define the columns of A to be the n orthonormal eigenvectors of Λ_x:

A = [v_1 | v_2 | ... | v_n]    (2.29)
Because the eigenvectors are orthonormal, A^T A = I, the identity matrix, meaning that A is orthogonal, A^T = A^(-1). Define the diagonal eigenvalue matrix

Λ = diag(λ_1, λ_2, ..., λ_n)    (2.30)

Then Λ_x A = A Λ. Since A is orthogonal, it is invertible, so Λ_x = A Λ A^(-1), and using A^(-1) = A^T for A orthogonal, we have Λ_x = A Λ A^T.
Consider the data random variables x rotated into the new set of random variables y = A^T x. Quoting directly from [54], the covariance matrix Λ_y is calculated as follows:

Λ_y = E[(y − ȳ)(y − ȳ)^T]    (2.31)
    = E[(A^T(x − x̄))(A^T(x − x̄))^T]    (2.32)
    = E[A^T(x − x̄)(x − x̄)^T A]    (2.33)
    = A^T E[(x − x̄)(x − x̄)^T] A    (2.34)
    = A^T Λ_x A    (2.35)
    = A^T A Λ    (2.36)
    = A^(-1) A Λ    (2.37)
    = Λ    (2.38)
with Λ as given in equation (2.30). Because the covariance matrix Λ_y is diagonal, we have shown that the change of coordinates y = A^T x rotates x into a set of uncorrelated basis random variables y. In PCA terminology, the eigenvectors listed in the columns of A are known as the principal components of the observed data x.
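The decorrelation property derived above can be sketched numerically. The data below are synthetic stand-ins for ROI response vectors, with correlations deliberately mixed in; the sample covariance plays the role of Λ_x:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ROI data: 200 trials of an n = 6 dimensional voxel response vector,
# constructed with correlations between coordinates via a random mixing matrix.
mixing = rng.normal(size=(6, 6))
x = rng.normal(size=(200, 6)) @ mixing.T        # rows are trials

# Covariance matrix Lambda_x (Eq. 2.28, estimated from samples).
lam_x = np.cov(x, rowvar=False)

# Columns of A are the orthonormal eigenvectors of Lambda_x (Eq. 2.29);
# eigh returns ascending eigenvalue order, so flip to rank by variance.
eigvals, A = np.linalg.eigh(lam_x)
eigvals, A = eigvals[::-1], A[:, ::-1]

# Rotate: y = A^T x. The covariance of y should be the diagonal matrix Lambda (Eq. 2.38).
y = x @ A
lam_y = np.cov(y, rowvar=False)

off_diag = lam_y - np.diag(np.diag(lam_y))
assert np.allclose(off_diag, 0, atol=1e-8)      # y is uncorrelated
assert np.allclose(np.diag(lam_y), eigvals)     # variances are the eigenvalues
```

The sample covariance transforms exactly as the population covariance does under the rotation, so the diagonalization holds to floating-point precision.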
The final relevant property of PCA is the minimization of expected least-squares error in subspaces. Consider the best s-dimensional approximation to the random vector x, given by

x̂ = Σ_{j=1}^{s} (φ_j^T x) φ_j    (2.39)

and its residual error vector

e = x − x̂    (2.40)

The vectors φ_j define bases for our approximation subspace. The optimal choice of subspace that minimizes the expected squared residual error E[e^T e] is found through the Lagrangian technique briefly mentioned at the outset of this discussion of PCA. The result is to choose (φ_1, φ_2, ..., φ_s) to be the first s eigenvectors of Λ_x, ranked from greatest to least eigenvalue. The calculation reveals that the least average sum of squared residual errors, based on PCA, is

E[e^T e] = λ_{s+1} + λ_{s+2} + ... + λ_n    (2.41)

where the approximation subspace is s-dimensional and the original space is n-dimensional.
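Equation (2.41) can be verified empirically on sample data, up to the sample-covariance normalization. A minimal sketch with synthetic correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated data: 500 samples of an n = 5 dimensional vector.
mixing = rng.normal(size=(5, 5))
x = rng.normal(size=(500, 5)) @ mixing.T
x_c = x - x.mean(axis=0)                        # center, as in Eq. 2.28

lam_x = np.cov(x_c, rowvar=False)
eigvals, vecs = np.linalg.eigh(lam_x)
eigvals, vecs = eigvals[::-1], vecs[:, ::-1]    # rank greatest to least

s = 2                                           # keep an s-dimensional subspace
phi = vecs[:, :s]                               # first s eigenvectors (Eq. 2.39 basis)
x_hat = x_c @ phi @ phi.T                       # best s-dim approximation
e = x_c - x_hat                                 # residual (Eq. 2.40)

# Average squared residual matches the sum of the discarded eigenvalues (Eq. 2.41),
# using the same N-1 normalization as the sample covariance.
mean_sq_err = (e ** 2).sum() / (len(x_c) - 1)
assert np.isclose(mean_sq_err, eigvals[s:].sum())
```
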
The utility of the result in equation (2.41) is that we can now examine the response of an ROI response vector to a multitude of stimulus conditions and determine the number of uncorrelated random variables needed to explain the dynamics of the output. Comparing this against some notion of the dimensionality of the inputs, we might postulate upper bounds on the degrees of freedom or information capacity of the ROI voxel set and the brain system itself.
The caveat of developing an explanatory model with a linear combination of uncorrelated random variables is important and at times intolerable. Simple deterministic nonlinear systems can transform an input of only one independent random variable into an output that requires many PCA uncorrelated random variables. Even if the system is linear, it may be difficult to determine the true dimensionality of the data, rank(Λ_x), as noise typically ensures that experimental data sets are full rank with some decaying profile of eigenvalues.
Projecting the high-dimensional voxel response data onto the two most dominant eigenvectors allows a least-squares visualization of the data. The graphs generated from the projection of this data onto the eigenvector subspace allow for an
exploratory mode of data analysis. The factors most discriminated by the dominant
eigenvector modes of the data can be visualized. This method is most effective when
ROI voxels are highly correlated, so that the data lies mostly in a two dimensional
subspace despite being embedded in a high dimensional space.
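This two-dimensional projection can be sketched as follows. The data here are synthetic: fourteen hypothetical ROI response vectors in a 400-dimensional space, constructed so that most variance lies along two directions, mimicking the highly correlated case described above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two orthonormal directions in a 400-dim data space (hypothetical).
u1, u2 = np.linalg.qr(rng.normal(size=(400, 2)))[0].T

# 14 response vectors: strong variation along u1 and u2, small isotropic noise.
points = (rng.normal(scale=5.0, size=(14, 1)) * u1
          + rng.normal(scale=3.0, size=(14, 1)) * u2
          + rng.normal(scale=0.05, size=(14, 400)))

centered = points - points.mean(axis=0)
# SVD of the centered data gives the principal components directly; its squared
# singular values coincide with the nonzero covariance eigenvalues (up to N-1).
_, svals, vt = np.linalg.svd(centered, full_matrices=False)
coords_2d = centered @ vt[:2].T                 # projection onto first two PCs
var_frac = (svals[:2] ** 2).sum() / (svals ** 2).sum()

print(f"variance captured by first two PCs: {100 * var_frac:.1f}%")
assert coords_2d.shape == (14, 2)
assert var_frac > 0.7                           # data is nearly two-dimensional
```

Plotting `coords_2d` with stimulus labels reproduces the style of exploratory graph used in chapter 3.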
2.4.3 Support vector machines (SVM)
While PCA in the form described above is a useful tool for the exploration and
characterization of variance in the data, it does not provide a direct quantification
of how well an ROI distinguishes between stimulus conditions. The support vector
machine does perform such a direct quantification, typically between two stimulus
conditions (binary classification). M-ary classifications are accomplished with log 2 M
binary classifier SVMs. As with any learning or prediction algorithm, the evaluation
process includes training and testing. The training data set, also called a labeled data
set, includes examples of the multidimensional ROI response paired or "labeled" with
the stimulus that produced the response. The SVM optimizes its parameters based
on the labeled data set so as to minimize the probability of misclassification in the
testing data set, which is assumed to be drawn from the same distribution as the
labeled set. The success of the binary SVM or system of SVMs on the test data
provides a lower bound estimate on the amount of information available in the ROI
to discriminate the stimulus conditions.
Binary SVM classification is based on finding the hyperplane in the desired data space that separates the two conditions so as to minimize testing error. Because the dividing boundary is a hyperplane (w · x + b = 0), any test point can be classified by essentially evaluating the sign of the dot product between the normal vector of the hyperplane and the test point vector, f(x) = sign(w · x + b). This notation is taken directly from [13]. More sophisticated decision boundaries are achieved by
first performing a nonlinear mapping of the data space. The optimal hyperplane in
the new data space has some more complex shape in the original data space based on
the mapping. When using a nonlinear map to classify the test data, it can be cumbersome to first perform the map and then evaluate the classifying dot product.
In some cases, it can be possible to express the classifying dot product as a function of
the dot product in the original data space. This expression is called a kernel, and can
save multiplications especially when the new data space is higher dimensional than
the original data space. Since kernels are computationally efficient and determine
the SVM decision boundary, much research has focused on crafting kernels that are
customized for specific applications. A brief introduction to SVM is provided in [13],
and a comprehensive textbook has been written by Vapnik [48].
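As an illustration of the decision rule f(x) = sign(w · x + b), the following sketch trains a linear soft-margin SVM by subgradient descent on the regularized hinge loss. This is not the solver used in this thesis; it is a minimal stand-in, and the "voxel" data are synthetic, separated along an arbitrary direction:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Soft-margin linear SVM via subgradient descent on the regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def classify(X, w, b):
    # f(x) = sign(w . x + b): which side of the hyperplane the point falls on.
    return np.sign(X @ w + b)

rng = np.random.default_rng(4)
# Hypothetical labeled data: 40-voxel ROI amplitudes under two conditions (+1 / -1).
direction = rng.normal(size=40)
X_train = rng.normal(size=(60, 40)) + np.outer(np.repeat([1.0, -1.0], 30), direction)
y_train = np.repeat([1.0, -1.0], 30)
X_test = rng.normal(size=(20, 40)) + np.outer(np.repeat([1.0, -1.0], 10), direction)
y_test = np.repeat([1.0, -1.0], 10)

w, b = train_linear_svm(X_train, y_train)
accuracy = (classify(X_test, w, b) == y_test).mean()
assert accuracy > 0.9
```

Test accuracy on held-out data provides the lower-bound estimate of discriminable information described above.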
The ability of SVMs to use fewer training data points than probability density
function (pdf) based maximum likelihood (ML) estimators is not magical. This ability
is shared by pdf methods that use a parameterized model of the pdf rather than
trying to nonparametrically estimate the pdf. For example, if a multivariate Gaussian
distribution with independent voxels was the appropriate model for the response of
an ROI under one condition, then the multivariate mean and variance of the Gaussian
could be estimated with relatively few training data points. If no assumptions were
made about the shape of the pdf, then any windowing technique on those few training
points (e.g. Parzen windows) would be a comparatively rough estimate of the true
Gaussian distribution.
Suppose however that despite the parametrized Gaussian assumption, the underlying distribution was not Gaussian at all. As the training set increased, the
parameterized pdf approach would persist in its incorrect estimate of the true pdf,
whereas the nonparametric method would converge to the true pdf. The disadvantage
of the parametrized pdf approach is that its performance might not be good with any
size training set if the underlying assumption is incorrect.
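The parametric-versus-nonparametric tradeoff can be illustrated with a toy one-dimensional amplitude distribution. The sketch below, with hypothetical values, fits both a two-parameter Gaussian (ML) and a Parzen-window estimate from the same handful of samples:

```python
import numpy as np

rng = np.random.default_rng(5)

# True distribution of a voxel's response amplitude: standard normal (hypothetical).
train = rng.normal(size=8)                      # only a few labeled trials
grid = np.linspace(-4, 4, 801)
true_pdf = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)

# Parametric approach: assume a Gaussian form and fit its two parameters (ML).
mu, sigma = train.mean(), train.std()
param_pdf = np.exp(-((grid - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Nonparametric approach: Parzen windows (Gaussian kernel of fixed width h).
h = 0.5
kernels = np.exp(-((grid[:, None] - train[None, :]) ** 2) / (2 * h ** 2))
parzen_pdf = kernels.sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))

# Both estimates integrate to ~1; with so few points the parametric fit is
# typically much closer to the truth -- provided its Gaussian assumption holds.
dx = grid[1] - grid[0]
assert abs(parzen_pdf.sum() * dx - 1) < 0.01
err_param = np.abs(param_pdf - true_pdf).sum() * dx
err_parzen = np.abs(parzen_pdf - true_pdf).sum() * dx
print(f"L1 error, parametric: {err_param:.3f}; Parzen: {err_parzen:.3f}")
```

If the true distribution were not Gaussian, the parametric error would plateau while the Parzen estimate would improve with more data, as argued above.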
The SVM performs well with few training points for the same reason as a parameterized pdf: it makes strong assumptions about the true underlying scatter of the data under each condition and uses the training data to tune those strong assumptions. And just as with parametrized pdfs, only when those assumptions are close to
correct can the SVM perform well with few or many training points.
With parameterized pdf models, these assumptions come in the functional form
of the pdf that is chosen. With SVM, the assumptions are expressed by the chosen
data space. ROI voxel responses under two conditions can be described by points
in a multidimensional coordinate system where each point represents a trial, and
each coordinate contains some information about the ROI's response during that
trial. One example coordinate system discussed in section 2.4.2 assigns each voxel's gamma-fit amplitude to a coordinate. The chosen data space could be this coordinate
system or any other, perhaps a nonlinear mapping of the current data space to a new
space of different dimensions.
Because the SVM procedure finds a dividing hyperplane that minimizes training
error in the chosen data space, it is essential that the underlying distribution of
responses to the two conditions generally fall on separate sides of some dividing
hyperplane. If this is not the case in the chosen data space, then the SVM will not
have the power to "generalize" to the testing data set. Choosing the appropriate data
space is as essential to the success of SVM as choosing the appropriate functional form
in a parametric pdf approach.
One application of SVM to stimulus classification in fMRI [6] reported modest
success of a linear SVM on the data space with the average response of each voxel in
the ROI recorded in a different coordinate. The third experiment of this thesis (chapter 5) employed a linear SVM on a data space using various features of voxel response for classification of the gender of the face stimulus. Results were initially not promising, and systemic non-stationarity that similarly affected both stimulus conditions was implicated. After estimating and removing the non-stationarity, successful SVM classification was demonstrated. Details of this experiment are discussed in chapter 5.
Chapter 3
Experiment One: FFA response to
nonface stimuli
The fusiform face area (FFA) in the posterior temporal lobe produces maximal fMRI
signal when subjects view pictures with human faces [29]. Similarly, the parahippocampal place area (PPA) responds maximally to indoor or outdoor scenes [9].
Maximal response to a stimulus category implies the response of a region distinguishes
members from nonmembers of the category, but does not rule out the representation
of other stimulus information in the response. Here we present evidence that both
regions' voxels represent nonpreferred-category objects in terms of their similarity
to the preferred category, and both regions are sensitive to the difference between
photographs and line drawings. Principal components analysis of the FFA and PPA
responses to nonfaces and nonplaces respectively suggest one to three dominant eigenmodes of variation in the data. The first two nonpreferred stimulus response modes
are biased towards the direction of variation between preferred and nonpreferred stimuli or the discrimination of photographs and line drawings. We propose that FFA
and PPA voxel responses to nonpreferred stimuli are optimized for detection, i.e.
discrimination between preferred and nonpreferred stimuli.
In the experiment, eight subjects were scanned with fMRI by M. Spiridon for a
separate study [44]. While they viewed 16 second epochs (blocks) of pictures, subjects performed a "one-back task" to demonstrate their attention, where they would
indicate consecutive presentations of the same picture by pressing a button. Consecutive presentation events would occur randomly within a block and with a frequency
of two per block. Data from two of those subjects were corrupted by excessive head
movement. Four of the remaining six subjects were analyzed for this study. Stimuli collected included seven different categories (faces, cats, houses, chairs, scissors,
shoes, bottles) with grayscale photograph or line drawing versions of each picture.
See [44] for pictures of the stimuli. Presentation was based on brute-force block design (section 2.3.1) as discussed in [44]. Standard motion correction was performed
based on the fs-fast software program [21], and detrending was based on the
average amplitude of pre- and post-stimulus fixation intervals. FFA and PPA regions
of interest (ROI) were localized with standard stimulus contrasts [44] and univariate ANOVA (section 2.4.1) implemented in fs-fast [21]. Trial-averaged block percent
signal change from baseline hemodynamic response timecourses were computed separately for each voxel of each ROI. Subsequent to this processing, fourteen timecourses
were accordingly available for each voxel of interest, including responses to photo and
line drawing versions of each of the seven stimulus categories.
These voxel responses across each ROI were then represented as points in multidimensional space, one for each stimulus category. First, a voxel response feature,
a function of each voxel's block hemodynamic timecourse, was chosen. The feature
selected here was a vector of the full timecourse of a voxel's response, although the
average amplitude or gamma-fit amplitude (gamma convolved with a square pulse
for the block hemodynamic response shape) were also possible features. Note that
an amplitude average of this timecourse would have provided a lower variance (and
hence lower dimensional) feature of the block hemodynamic response. For each stimulus, a different ROI response vector was formed by concatenating the feature vector
from every voxel in the ROI. The ROI response vectors were data points graphed
in a n-dimensional data space, where n is the dimensionality of the voxel response
feature times the number of voxels in the ROI. With 40 voxels and 10 timepoints per
hemodynamic response, the space would be 400 dimensional. The variance in feature
response across stimuli for each voxel was not normalized, although this would pro54
vide alternate insight by preventing higher signal range voxels from dominating the
analysis.
First we performed PCA and verified that the set of fourteen data points included
one dominant eigenmode both in the FFA and PPA (Figure 3-1 d and e respectively).
This result was expected because the precondition on an ROI voxel is that it respond
maximally to preferred category stimuli. One dominant eigenmode was not a trivial
consequence of the precondition, because variance in the signal in other directions
could have resembled or even dwarfed the face-nonface variation in magnitude. Nevertheless, this situation was less likely because the FFA average response (geometrically, data projected onto the mean vector) also shows separation of face and nonface
stimuli as the dominant source of variance. Since both preferred and nonpreferred
category stimuli were present in the fourteen data points, the large magnitude separation between preferred and nonpreferred datapoints compared to any other source
of variance resulted in one dominant eigenmode.
We visualized the high-dimensional data set by projecting the fourteen data points
onto the first two principal components calculated by principal components analysis
(PCA). This produces a least-squares estimate of the data configuration as explained
in section 2.4.2 on PCA. The resulting graphs (Figure 3-1 a,b,c) showed the large
separation between nonpreferred and preferred stimuli required of ROI voxels. Both
PPA and FFA graphs also demonstrated a systematic separation of data points across
all subjects by classification as photographs and line drawings.
By removing the preferred stimulus data points we characterized the PCA dimensionality of the FFA and PPA voxel responses to nonpreferred stimuli. The preferred
stimuli for FFA and PPA included faces and houses respectively; cats were removed
from the analysis because they were not clearly categorized as faces or nonfaces. Unlike the PCA dimensionality of the entire data set, the dimensionality of nonpreferred
points was independent of the preferred-nonpreferred ANOVA contrast selection of
ROI voxels (Figure 3-1 g,h). The analysis demonstrated that both FFA and PPA
responses to nonpreferred stimuli were embedded in low dimensional spaces. A summary of PCA variance with and without preferred stimuli across all four subjects in
Figure 3-1: Typical results from one subject of PCA variance and projection analysis
in (a,d,g) FFA, (b,e,h) PPA, and (c,f) all visually active voxels. (a,b,c) ROI response
vectors projected onto first two principal components. Labels 1 and 2 indicate face
stimuli; all other labels are non-face stimuli. Label 5 indicates house stimuli; all
other labels are non-place stimuli. Photograph stimuli are denoted in boldface italics,
and line drawings are in normal type. (d,e,f) Variance of dataset along principal
components including nonpreferred and preferred category response vectors. (g,h)
Variance of dataset along principal components excluding preferred category response
vectors.
Figure 3-2: Sum of variance from PC1 and PC2 for data (a) including and (b) excluding own-category stimulus responses. The labels S1 through S4 denote the subject
number, consistent between panel (a) and (b).
FFA, PPA, and all visually active voxels is given in Figure 3-2. Despite removing the
preferred stimulus data points, dominant eigenmodes persisted in the nonpreferred
data points, as evidenced by the large percentage (typically greater than fifty percent)
of variance explained by the first two PCs.
What were these "nonpreferred eigenvectors", i.e. the eigenmodes of these nonpreferred stimuli? Did the different responses to nonpreferred stimuli mainly or exclusively result from their varying similarity to the preferred stimuli? We wanted to
know whether the diversity of responses produced by an ROI to nonpreferred stimuli
related at all to the function of distinguishing preferred and non-preferred stimuli.
This question was equated with the geometric question of whether either of the two
most dominant nonpreferred eigenvectors were aligned with the "detection vector"
that connected preferred and nonpreferred objects. The alignment was visualized by
plotting the nonpreferred stimulus data on a plane defined by the eigenvector and
detection vector coordinates (Figure 3-3). As with correlation, if the vectors were
aligned, the data would fall neatly along a line. For every subject in both FFA and
PPA, this effect was observed for either the first or second nonpreferred eigenvector. In the other eigenvector, misalignment was a result of the eigenvector coding for
photograph versus line drawing rather than detection.
Figure 3-3: Typical plots from one subject of nonface principal component coordinates
against face detection vector coordinates for stimulus responses in (a,b) FFA and (c,d)
PPA. The detection vector coordinate connects the average face response vector to
the average nonface response vector. (a,c) ROI nonface response vector coordinates
projected onto PC1 of nonface data points and the detection vector. (b,d) Similarly
for nonface PC2. Alignment of data along a line indicates strong correlation between
the two axis variables. Geometrically, this indicates either the nonface data already
falls along a line, the vectors corresponding to the axis variables are aligned, or
some mixture of these two scenarios. PC nonface variance analysis (Figure 3-1 g,h)
suggests that the axis alignment concept is appropriate. The majority of variance is
not explained exclusively by the first PC, so the data does not already fall along a line
in the original response vector space. Although forty percent of the variance falls
along a line, this is not enough to allow any coordinate choice to trivially project data
onto a line, as evidenced by the coordinates chosen in (a) and (d). Axis alignment is
quantified directly in Figure 3-4.
Figure 3-4: Summary of subtending angle between detection vector and nonpreferred
category principal component in (a) FFA and (b) PPA for four subjects. If the subtending angle is greater than 90 degrees, the angle is subtracted from 180 degrees
to reflect the alignment between vectors rather than the direction in which the vectors are pointed. Two vectors are completely aligned at 0 degrees and completely
unaligned or orthogonal at 90 degrees subtending angle.
The alignment of a nonpreferred data eigenvector and the detection vector could be quantified by the angle subtending the eigenvector and the detection vector. With both vectors normalized, this angle was mathematically equivalent to the (zero-delay) correlation between the two vectors, because A · B = |A||B| cos θ. The smaller of the angles between the detection vector and the two dominant nonpreferred data eigenvectors was recorded across subjects for FFA and PPA (Figure 3-4). In both ROIs, and especially the PPA, the nonpreferred eigenvector is biased towards alignment (zero subtending angle) with the detection vector, as opposed to exhibiting uniform misalignment or consistently low alignment. Some of the misalignment can be explained by error in the calculation of the true detection vector and the true nonpreferred eigenvectors due to limited data size and noise in the data coordinates.
One question that arose in the analysis was how much alignment should be considered strong alignment. Because the angle is equivalent to correlation in this case, we can equivalently ask how much correlation is strong correlation. The standard comparison made in most research is against a null hypothesis of zero correlation. This would correspond to a uniform distribution of angles across subjects, which is clearly not the trend observed in the accompanying summary angle graphs. Because
the FFA subtending angle is clustered around 38 degrees rather than zero degrees,
the true FFA subtending angle is most likely not zero degrees in particular. The PPA
subtending angle is lower than in FFA, but also exhibits significant variation across
subjects. Nevertheless, because the nonpreferred principal component is typically
aligned within 45 degrees of the detection vector in both cases, over 50 percent of the
energy of the principal component modulation is in the detection vector direction.
This is derived from the fact that the leg of a 45-45-90 degree triangle is 1/√2 of the hypotenuse, and the signal energy here is the square of the magnitude of the response vector. Consequently, (1/√2)² × 100 = 50%. Intuitively, a signal vector at 45 degrees to the x axis in the x-y plane has half of its power in each of the basis directions x and y.
In summary, this experiment showed with four subjects that (1) both FFA and PPA voxels are sensitive to photograph versus line drawing stimulus differences, (2) voxels in both ROIs have low-dimensional responses to nonpreferred stimuli, and (3)
the dominant response variation in nonpreferred stimuli encodes information about
detection and the photo/line drawing contrast.
Seminal work defined the FFA [29] and PPA [9] as regions responding (in ROI
voxel averaged gamma-fit amplitude) maximally to faces and places respectively, establishing those regions as category detectors. This experiment demonstrated how
this detection information is represented in FFA and PPA, by amplitude modulating
one specific pattern of response across the ROI. An approximately orthogonal pattern
of response is modulated and superimposed to signal contrast between photo and line
drawing stimuli. A recent review on object recognition [22] asserted FFA invariance
to photographs and line drawings, perhaps previously unobserved because this axis
of variation is orthogonal to the face detection vector.
Dimensionality of the response is related to the amount of information the region
can support. We measured significant nonpreferred stimulus data variance in one to
three dominant eigenmodes. This suggests that the nonpreferred response has the
degrees of freedom of at most three uncorrelated random variables. Because independence implies uncorrelatedness but not vice versa, the PCA dimensionality provides an
upper bound on the number of necessary independent explanatory random variables.
Our PCA result suggests that PPA and FFA voxel response can only encode at most
three independent feature characteristics of nonpreferred stimuli. Two of those characteristics are similarity to the preferred category and the contrast that distinguishes
photographs and line drawings.
The standard objection to a performance bound such as this is that performance
might be improved by a noise-free experiment or elimination of the hemodynamic
spatiotemporal smoothing of the neural response. However, these objections do not
obviate an empirical performance bound. The bound can be qualified by an accompanying report of the noise characteristics. Furthermore, the fMRI spatiotemporal
smoothing of neural response and analysis of tens of voxels that contain millions of
neurons is not a drawback to the performance bound. Rather, the performance bound
characterizes the neural system at the coarse scale. Because systems can have different properties at various scales of observation, the fMRI performance bound gives
insight complementary to electrophysiology.
Chapter 4
Experiment Two: FFA response to
face stimuli
The fusiform face area (FFA) of the posterior temporal lobe was identified as an
anatomically focal set of voxels that responded maximally when subjects viewed human faces [29]. While this evidence supports face detection coding, it is unknown
whether the FFA further codes finer-grained information about faces, like individual
identity, familiarity, emotional expression, or age. Here we present evidence that the
FFA response to a diverse set of face stimuli has a dominant eigenvector, but sufficient remaining uncorrelated variance to potentially code further information about
faces. Multivariate analysis suggests that race and gender are two contrasts coded by
the FFA. A subsequent univariate test suggests a simple modulation scheme exists
in the FFA for coding gender. This motivates the separate study of FFA and gender presented in the next chapter (chapter 5). We propose that unlike the response
profile across nonpreferred stimuli (chapter 3), the FFA response to face stimuli is
sufficiently rich to potentially support coding of additional face features.
In this experiment, two individuals were scanned with fMRI while viewing stimuli
drawn from a set of sixteen faces and one car. The stimulus set was constructed
from the orthogonal crossing of four contrasts: adiposity, age, race, and gender (Figure 4-1). The response of the FFA for each of the seventeen stimuli was measured
using an event-related design (section 2.3.3) with between 58 and 68 presentations
of each stimulus. FFA voxels were isolated by a two-run blocked design face-nonface
contrast localizer. Data from one individual was eliminated from the study because
their FFA, as isolated by this independent localizer, appeared to have responded more
to cars than faces. Motion correction, detrending, and event-related deconvolution
were performed using the fs-fast software application [21]. The processed data set included event-related hemodynamic impulse response timecourses under the seventeen
stimulus conditions for every voxel of the FFA region of interest (ROI).
As with the nonface analysis (chapter 3), the ROI response to each stimulus was then
represented by a point (vector) in high dimensional data space. The feature extracted
from each voxel impulse response was its maximal amplitude, without gamma fit. The
ROI response vector for a given stimulus condition was formed by concatenating the
feature vector of every voxel in the FFA. The processed data set now constituted
seventeen points in this stimulus response data space.
By removing the car stimulus data point we characterized the PCA dimensionality
of the FFA voxel responses to preferred stimuli. As with the PCA dimensionality
of the nonpreferred stimuli (chapter 3), the dimensionality of preferred points was
unconstrained by the preferred-nonpreferred ANOVA contrast selection of ROI voxels,
and moreover based on an independent data set. The analysis demonstrated that
FFA responses to preferred stimuli were not embedded in low dimensional spaces
(Figure 4-2 a). Instead, the eigenvector variances included one prominent eigenvector,
with the remaining eigenvector variances dropping off smoothly. In comparison to an
independent Gaussian random vector of the same dimensionality (Figure 4-2 b), the
remaining variance after the first eigenvector dropped off in a similar gradual fashion.
The relevance of this result is that while thirty percent of the diversity in response
to face stimuli followed one set modulated pattern (an eigenvector possibly related to
detection coding, as with nonface stimuli in chapter 3), the remaining seventy percent
of variance across stimuli included largely uncorrelated voxel responses. This result
implies that, although zero correlation does not prove independence, the responses
to face stimuli do not contradict the possibility of several explanatory independent
random variables. In contrast to nonface stimuli, the FFA response to face stimuli
Figure 4-1: Stimulus set used in study of FFA response to faces. The set includes
pictures of sixteen faces and one car. Four contrasts are equally represented, including
adiposity (columns 1 and 2 versus 3 and 4), age (rows 1 and 2 versus 3 and 4), gender,
and race. Faces were obtained from [37], with the exception of the African Americans
in the second row and third column, which were photographed by the author, and
the adipose young Caucasian male, obtained from the laboratory. Two face images
(elderly, African American, female) have been withheld at the request of volunteers.
[Figure 4-2 plot panels (a) and (b): eigenvalue variance versus principal component index.]
Figure 4-2: Variance of PCA components (a) across FFA response to faces, with
preferred stimulus data from one subject, and (b) generated from a Gaussian random vector with independent, identically distributed elements N(0,1). Dimensions and dataset size
match between (a) and (b).
shows the rich variation that would be needed to encode multiple stimulus features.
Because our stimulus set was balanced across adiposity, age, race, and gender, the
data afforded the opportunity to directly test FFA sensitivity to these four contrasts.
In order to exploit the coding flexibility of our multidimensional data space, we employed a multivariate ANOVA to compare the separation in the mean between each
contrast. Using the built-in Matlab 6.0 function manova1, data were first labeled by
the subcategory of a given contrast. With the age contrast, for example, each data
point was labeled "old" or "young" according to the age of the face stimulus. The
manova1 function received the labeled data set and calculated the optimal linear
projection that would best separate the two subcategories. As described in the Matlab function reference, this reduces to minimization of a quadratic form very similar
to PCA. With the data projected onto this optimal subspace, the standard ANOVA
statistic was calculated on the one-dimensional subspace that afforded the greatest
ANOVA score. The resulting projections and p-values are presented in Figure 4-3.
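Matlab's manova1 internals are not reproduced here, but the underlying two-group computation — find the canonical direction, project the data onto it, and score the projection with an ANOVA F statistic — can be sketched in numpy. The function name and the pinv-based handling of rank deficiency are my assumptions:

```python
import numpy as np

def canonical_projection_anova(X, labels):
    """Two-class canonical variate sketch: find the linear projection
    best separating the two labeled groups, then compute the ANOVA F
    statistic on the projected one-dimensional data.  X is
    (n_points, n_dims); labels is an array of 0s and 1s."""
    g0, g1 = X[labels == 0], X[labels == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    # Pooled within-class scatter; pinv guards against rank deficiency,
    # as with 16 points embedded in a high-dimensional voxel space.
    Sw = np.cov(g0.T, ddof=1) * (len(g0) - 1) + np.cov(g1.T, ddof=1) * (len(g1) - 1)
    w = np.linalg.pinv(Sw) @ (m1 - m0)        # canonical direction
    y = X @ w                                  # canonical variable c1
    y0, y1 = y[labels == 0], y[labels == 1]
    n0, n1 = len(y0), len(y1)
    between = n0 * (y0.mean() - y.mean()) ** 2 + n1 * (y1.mean() - y.mean()) ** 2
    within = ((y0 - y0.mean()) ** 2).sum() + ((y1 - y1.mean()) ** 2).sum()
    F = between / (within / (n0 + n1 - 2))     # 1 and n0+n1-2 dof
    return w, F
```

For two groups the canonical direction reduces to the within-scatter-whitened mean difference, which is the quadratic-form minimization alluded to above.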
The multivariate ANOVA analysis showed that the FFA significantly differentiated
race and gender, but not adiposity or age. The race effect was previously reported [18]
in a study that employed blocked design to demonstrate elevated average FFA signal
to face stimuli from a subject's own race. A blocked design might be particularly
[Figure 4-3 scatter panels: (a) Race, d=1, p=0.0055; (b) Gender, d=1, p=7.187e-07; (c) Age, d=0, p=0.2332; (d) Adiposity, d=0, p=0.3741.]
Figure 4-3: Multivariate ANOVA across (a) race, (b) gender, (c) age, and (d) adiposity
for FFA response to faces in one subject. The data is projected onto the optimal
category-separating axis, canonical variable 1, abbreviated "c1". This value is plotted
along the y axis in the above graphs. The corresponding p-value is printed on each
panel, indicating the probability that the data belongs to a single Gaussian random
variable. The accompanying d value indicates that the d−1 Gaussians null hypothesis
can be rejected at p < 0.05. Here, we compare only two groups in each panel (e.g.
male/female), so the maximum d value is 1.
[Figure 4-4 bar plot: mean FFA response amplitude for Female and Male stimuli; F=4.14, Prob>F = 0.0614.]
Figure 4-4: ANOVA on a one-dimensional FFA response measurement to male and
female stimuli. For each voxel, the hemodynamic impulse response was fit with a
polynomial and the maximum amplitude extracted. This value was averaged across
all voxels in the FFA to produce the response feature analyzed with this ANOVA.
Panel inscriptions denote the ANOVA F-statistic value and the p-value (Prob>F),
the probability that a single univariate Gaussian would produce separation between
male and female stimulus responses equal to or greater than observed here. A p-value
of 0.0614 is near the conventional significance threshold of p=0.05.
susceptible to attentional bias towards own-race stimuli. The present experiment
differed from the published result in that it employed an event-related design, and the
fast, random category presentation style might arguably mitigate attention-based
modulation. Moreover, rather than comparing average FFA response between two
blocks of exemplars, we compared multivariate responses to independent exemplars
of each race, gender, adiposity, and age.
One concern in interpreting the multivariate ANOVA results was that perhaps
the high significance scores were an artifact of having many fewer data points than
the dimensionality of the coordinate space. Could any 16 points, embedded in a 200
dimensional space and partitioned arbitrarily into two categories, produce a significant multivariate ANOVA score? While the freedom to choose a linear projection that maximizes
category separation grows with the number of coordinate dimensions, many of those projections are equivalent because a 16-point data set is spanned by a 16-dimensional
space. And while any N different points in an N-dimensional space can be perfectly
separated with a linear projection (because of the VC dimension of the linear SVM,
and the linear projection onto the normal to the separating hyperplane), this does
not guarantee that the resulting projection will be sufficiently separated to produce
a significant ANOVA score. Empirically, not all sets of 16 data points result in a significant p-value. This is illustrated by the insignificant multivariate ANOVA p-value
for age and adiposity (Figure 4-4). Nevertheless, a null hypothesis that depends on
both the number of data points n and the n degrees of freedom in choosing an optimal projection would be more appropriate and conservative than the one degree of
freedom assumed by the ANOVA null hypothesis used here.
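One conservative alternative suggested by this reasoning is an empirical null built by permuting the category labels over the same 16 points, which automatically accounts for the freedom of choosing an optimal projection. A hypothetical Python sketch, where a simplified separation statistic stands in for the full canonical-variate score:

```python
import numpy as np

def separation_stat(X, labels):
    """Between/within ratio of the data projected onto the
    mean-difference direction (a simple stand-in for the canonical
    variate statistic)."""
    d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    y = X @ d
    y0, y1 = y[labels == 0], y[labels == 1]
    within = y0.var(ddof=1) + y1.var(ddof=1)
    return (y0.mean() - y1.mean()) ** 2 / within

def permutation_pvalue(X, labels, n_perm=2000, seed=0):
    """Empirical p-value: how often do randomly relabeled partitions of
    the same points separate as well as the true labeling?  This null
    reflects the n-point geometry rather than one degree of freedom."""
    rng = np.random.default_rng(seed)
    observed = separation_stat(X, labels)
    count = 0
    for _ in range(n_perm):
        if separation_stat(X, rng.permutation(labels)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Under this scheme, an arbitrary partition of 16 random points in a 200-dimensional space is compared against equally arbitrary partitions, so high dimensionality alone cannot manufacture significance.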
In order to further investigate the coding of gender in FFA, we asked whether
the polynomial-fit impulse response amplitude averaged across all FFA voxels might
differentiate gender (Figure 4-4). The corresponding univariate ANOVA, performed
using Matlab's built-in function, reported separation in the means at a p-value of
0.0614, just missing the standard 0.05 p-value significance threshold. Accordingly,
the third experiment of this thesis (chapter 5) was an effort to increase statistical
power by gathering additional data points per subject by focusing exclusively on the
gender contrast.
In summary, this experiment showed that (1) the FFA voxel set has one prominent
mode but an otherwise high dimensional response to preferred stimuli, and (2) the
FFA is sensitive to race and gender of face stimuli. Unlike the situation with nonface
stimuli, the FFA response to face stimuli is sufficiently diverse to potentially support
coding of many stimulus features.
The original characterization of the FFA [29] was as a face detector, with maximal
average ROI voxel response to faces. This experiment extends the concept of the FFA
response as a face detection signal, and suggests that it may be capable of coding
finer grained information. At the very least, the gain on the face detection eigenmode
may be biased by race and gender. In the context of recent literature which reports
average FFA signal modulation by race [18] and familiarity of the face stimulus to the
subject [30], this result broadens the classical concept of the FFA as face detector.
One interesting further question regarding this last point is whether, given the
documented race effect, the multidimensional detection vector that separates face
from nonface points lies in the same direction as the detection vector between African
American and Caucasian. This would differentiate between the case in which race
simply enhances the face detection signal (allowing for better face detection), and
the case in which race is coded in a different, perhaps orthogonal way from face
detection. With repeated measures of the subtending angle between these two vectors,
a simple statistical t-test could be used to determine whether the subtending angle
was significantly different from zero. Subsequently, the degree to which the vectors
were orthogonal would quantify the degree to which race contributed energy to the
face detection eigenmap (vector). A similar question could be investigated regarding
the gender contrast.
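The proposed measurement is straightforward to express; a minimal Python sketch (function names are illustrative, not from the thesis):

```python
import numpy as np
from math import degrees, acos, sqrt

def subtending_angle(u, v):
    """Angle in degrees between two detection vectors in voxel space."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return degrees(acos(np.clip(cos, -1.0, 1.0)))

def one_sample_t(angles, null_value=0.0):
    """t statistic for whether repeated angle measurements differ from
    the null value (0 degrees would mean the race contrast simply
    rescales the face detection eigenmode)."""
    a = np.asarray(angles, dtype=float)
    return (a.mean() - null_value) / (a.std(ddof=1) / sqrt(len(a)))
```

A large t over repeated measures would argue that race is coded along a direction distinct from the detection eigenmode, rather than as a gain on it.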
Chapter 5
Experiment Three: FFA response to stimulus gender
Because the fusiform face area (FFA) is defined by its face detection signal [29], results
that indicate other information in the FFA broaden this classical characterization.
Two recent papers have demonstrated other information in the FFA, namely race [18]
and the presence of nonface objects to which people are expert [17]. Here we present
evidence that the FFA signal is biased by the gender of the stimulus. Employing fMRI
counterbalanced block design, we measured an increased average response amplitude
in FFA for one stimulus gender versus the other within runs. The preferred stimulus
was male for one male subject, and female for the remaining three subjects, one of
whom was female. Consequently, the effect cannot be explained by low-level stimulus
confounds like brightness or by high-level phenomena like differential attention to
same or opposite sex faces. We propose that the FFA is biased by stimulus gender.
In this experiment, four subjects were scanned with fMRI while viewing 16 second epochs of pictures and performing a one-back task. A four-block counterbalanced
fMRI design (section 2.3.2) was employed with epochs for both genders and two nonface objects. Faces from both genders were white, average adiposity, middle-aged,
homely, and plain expressioned. This homogeneity was intended to focus the dominant contrast between stimuli on gender. The nonface objects were office chairs
and reception chairs, selected in an attempt to capture a low-level visual similarity comparable to the similarity between faces of different genders (see Figure 5-1).
Motion correction was performed with the fs-fast software program [21]. Data was
subsequently band-pass filtered, partitioned and averaged with Matlab to remove
physiological drift and instrument noise. The resulting data set included block hemodynamic timecourses for the four stimulus conditions, available for each voxel and
each run. Each run was approximately counterbalanced and contained four blocks
per condition. Roughly ten runs were performed per subject. For the third subject,
runs included stimuli of either 300x300 pixel size or about two-fold larger. For the
fourth subject, in addition to runs of larger size, other runs included all-new stimulus
exemplars at the original size.
The average timecourses of V1 voxels suggested that low-level properties of faces
of different genders were more similar than those of office and reception chairs (Figure 5-2).
The difference in the V1 average timecourse between chair types was more pronounced
than between genders. Because of the different aspect ratios of the face and chair
stimuli, it is possible that office and reception chair images simply occupied different
amounts of visual dark space.
For subjects in Figure 5-2 (b), (c), and (d), the
similarity in V1 average timecourses for the two genders compared to FFA timecourses
(Figure 5-3) suggested that gender bias observed in FFA cannot be entirely attributed
to global low-level stimulus differences like luminance. For Figure 5-2 (a), the female
response dominates in V1, but from Figure 5-3 (a), we see that the male response
dominates in FFA, suggesting the low-level and FFA-level gender effects might be
independent.
Timecourses averaged over all FFA voxels and all runs (Figure 5-3) revealed preference for one gender in each subject. The preferred gender was male in the first
subject, himself a male African-American. The preferred gender was female in the
remaining three Caucasian subjects, including one female and two males.
These FFA response amplitudes for each gender were not constant across runs for
each subject. While the same gender was preferred in most or all runs of a given
individual, the absolute amplitude of the two genders drifted across runs (Figure 5-5). Consequently, depending on this nonstationary drifting offset, it is not necessarily
Figure 5-1: Examples from stimulus set used in study of FFA response to gender.
Rows from top to bottom correspond to males, females, reception chairs, and office
chairs respectively. Fifteen exemplars of each category were presented in subjects a, b,
and c. Blocks of fifteen additional exemplars of each category were included in some
trials of subject d. All stimuli were presented at identical pixel height dimensions.
Figure 5-2: Time course of V1 response, averaged over all voxels in V1 for male,
female, and chair stimuli. The y axis denotes percent amplitude change from a
baseline signal. Each panel corresponds to a different subject, (a) through (d). The
line assignments are as follows: dashed - male, solid - female, dotted - office chairs,
dash-dotted - reception chairs.
Figure 5-3: Time course of FFA response, averaged over all voxels in FFA for male,
female, and chair stimuli. The y axis denotes percent amplitude change from a
baseline signal. Each panel corresponds to a different subject, (a) through (d). The
line assignments are as follows: dashed - male, solid - female, dotted - office chairs,
dash-dotted - reception chairs.
Figure 5-4: V1 response amplitudes across runs for four conditions. The line assignments are as follows: dashed - male, solid - female, dotted - office chairs, dash-dotted -
reception chairs.
true that one gender dominates the other in FFA response irrespective of when the
two measurements are made. If instead the measurements of FFA response are made
sufficiently close in time to avoid the nonstationary drifting offset, one gender will
always dominate the other.
The implication of this nonstationarity is two-fold. First, it introduces large variance in the measured response amplitude to male and female stimuli, so that a conventional ANOVA performed on the distribution of male and female stimulus responses
may fail. Indeed, the prerequisite conditions of the ANOVA test are not met. Because
of the nonstationarity, the response to a male stimulus in one run is not drawn from
the same random variable as in another run. Consequently, the ANOVA might fail
despite the fact that the FFA has a preferred gender.
An appropriate statistical test of FFA gender preference must be independent of
this non-stationary drifting offset. We can ask whether one gender is consistently preferred over the other within-run for all runs. Because the gender amplitude comparison is made within-run, the statistical test is independent of the non-stationary
drift that occurs between runs.
A hypothesis test allows us to assess with statistical confidence whether V1 and
the FFA exhibit gender preference. Define the null hypothesis as an equal-gender-preference (unbiased) region of interest (ROI). The probability that female elicits
higher average ROI amplitude than male within a run is 0.5 under the null hypothesis.
Using the binomial cumulative distribution function (cdf)

    Φ(f) = P(F ≤ f; n) = Σ_{i=0}^{f} C(n, i) (.5)^i (.5)^(n−i),                (5.1)

the probability that an unbiased ROI produces at least g trials of either gender out
of n total trials is

    P(G ≥ g; n) = 1 − [Φ(g−1) − Φ(n−g)]   if g > n/2,   and 1 otherwise.       (5.2)

The binomial cdf Φ(f) is evaluated using (5.1), or equivalently the Matlab function
binocdf.m.
Define an ROI to significantly prefer one gender over another if an unbiased ROI
(the null hypothesis) produces the observed data with less than 0.05 probability, i.e.
P(G ≥ g; n) < .05. Denote P(G ≥ g; n) the p-value, and denote the gender
preference ratio as the number of runs in which the average response was greater
for the preferred gender over the total number of runs, g/n.
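Equations (5.1) and (5.2) can be checked directly; note that (.5)^i (.5)^(n−i) simplifies to (.5)^n. A stdlib Python sketch (function names are mine):

```python
from math import comb

def binom_cdf(f, n):
    """Phi(f) of equation (5.1): P(F <= f) for F ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) * 0.5 ** n for i in range(f + 1))

def gender_preference_pvalue(g, n):
    """Equation (5.2): probability that an unbiased ROI prefers either
    gender in at least g of n runs (two-tailed)."""
    if g <= n / 2:
        return 1.0
    return 1.0 - (binom_cdf(g - 1, n) - binom_cdf(n - g, n))
```

For example, gender_preference_pvalue(10, 11), gender_preference_pvalue(7, 9), and gender_preference_pvalue(7, 10) reproduce the tabulated FFA p-values .0117, .1797, and .3437 for subjects (a), (c), and (d).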
The gender preference ratios are summarized below for all subjects:

Subject    FFA      V1
(a)        10/11    6/11
(b)        8/9      5/9
(c)        7/9      5/9
(d)        7/10     6/10
The respective p-values of individual subjects are as follows:

Subject    FFA      V1
(a)        .0117    1
(b)        .0391    1
(c)        .1797    1
(d)        .3437    .7539
In order to pool results over all subjects, we must calculate the null hypothesis
pdf for the pooled data. Because pooling is a summation of the individual random
variables of each subject, we can convolve the individual subject null hypothesis pdfs
to produce the pooled data null hypothesis pdf and p-value. Following this pooling
procedure, we have a total gender preference ratio and p-value for FFA and V1:
All Subjects    Gender Preference Ratio    P-value
FFA             32/39                      .0023
V1              22/39                      1
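The convolution pooling can be sketched as follows. The function names are mine, and because the result depends on boundary conventions (for example, how a tie at n/2 is folded for an even number of runs, and where the ≥ cutoff falls), the numbers it produces should be treated as illustrative rather than as a re-derivation of the reported pooled p-values:

```python
from math import comb

def folded_null_pmf(n):
    """pmf of G = max(F, n - F) for F ~ Binomial(n, 0.5): the number of
    runs, out of n, won by whichever gender came out ahead."""
    pmf = {}
    for f in range(n + 1):
        g = max(f, n - f)
        pmf[g] = pmf.get(g, 0.0) + comb(n, f) * 0.5 ** n
    return pmf

def convolve(p, q):
    """pmf of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def pooled_tail(run_counts, g_total):
    """P(pooled preferred-gender run count >= g_total) under the null,
    obtained by convolving the per-subject null pmfs."""
    pooled = {0: 1.0}
    for n in run_counts:
        pooled = convolve(pooled, folded_null_pmf(n))
    return sum(p for g, p in pooled.items() if g >= g_total)
```

For run counts [11, 9, 9, 10], the pooled support runs from 21 (every subject split as evenly as possible) to 39 (every run preferring one gender).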
Although subjects (c) and (d) show greater significance in FFA than V1, their FFA
p-values are not less than the customary .05 standard. Pooling over these two subjects
alone does not achieve significance (FFA 0.3117, V1 1). However, if two additional
subjects were run with preference ratios and run counts comparable to (c) and (d), the
p-value excluding (a) and (b) would be significant (FFA .02, V1 .8491). Our limited
sample size suggests varying degrees of gender preference across the population.
Although more scans would need to be performed to determine the extent of this
phenomenon in the general population, we can say with statistical significance that
there are people in the population with FFA gender preference. With more scans, we
could estimate with a relatively narrow confidence interval the percentage of the
population showing this preference.
The statistical test chosen above is akin to a "two-tailed" test, more stringent than
the alternative "one-tailed" version which would have given p-values that are half of
the p-values reported in the table above. The two-tailed test reports the probability
that an unbiased region of interest would show the observed or greater preference to
any one gender, either male or female. The one-tailed test gives the probability that
an unbiased region of interest would show the observed or greater preference to one
chosen gender in particular, like female. The two-tailed test is more appropriate here,
because we are claiming that the FFA exhibits bias towards some gender (either male
or female), rather than always only to one particular gender like female.
We can perform the analogous hypothesis test to determine whether V1 and FFA
exhibit preference for one of the two types of chairs. The chair preference ratios are
summarized below for all subjects:
Subject    FFA      V1
(a)        6/11     6/11
(b)        5/9      9/9
(c)        5/9      9/9
(d)        6/10     9/10
The respective p-values of individual subjects are as follows:

Subject    FFA      V1
(a)        1        1
(b)        1        .0039
(c)        1        .0039
(d)        .7539    .0215
Pooling results over all subjects, we have a total chair preference ratio and p-value
for FFA and V1:

All Subjects    Chair Preference Ratio    P-value
FFA             22/39                     1
V1              33/39                     5.56e-04
The results suggest that V1 does preferentially respond to one type of chair,
whereas FFA does not. This behavior is exactly opposite from gender preference,
where FFA had gender preference, and V1 did not. It is striking that differences
in office and reception chair images were so great as to elicit preference from V1,
but nevertheless failed to elicit preference in FFA. This suggests that the FFA
voxel-averaged activity might be more sensitive to differences among face objects
than among nonface objects, and that large response differences in V1 (i.e. large
low-level feature differences) do not necessarily modulate the mean response in FFA,
as with these two nonface objects. This result does not rule out the possibility that
sensitivity to nonface objects might emerge in some feature of the entire
multidimensional response other than the voxel-averaged maximum amplitude.

Figure 5-5: FFA response amplitudes across runs for four conditions. The line
assignments are as follows: dashed - male, solid - female, dotted - office chairs,
dash-dotted - reception chairs.
Before the non-stationary drift was realized, a split-half multidimensional SVM
analysis (section 2.4.3) was employed in an effort to quantify gender information in
the FFA. The d' values of the operating characteristics for gender discrimination in
FFA and V1 are reported in the accompanying table (Table 5.1). The d' function is
nondecreasing with distance from the chance line on the operating characteristic plane
defined by hits (y axis) and false alarms (x axis). Some d' values are negative,
indicating that the classifier labeled females as males and males as females at a rate
higher than chance. This would mean that whatever trend the SVM identified as
distinguishing female and male in the training dataset had reversed in the test
dataset. Because the SVM does not adjust to non-stationarity, the operating
characteristic resulted in mixed and generally poor performance across subjects. The
nonstationary drift across runs also makes conventional statistical tests across runs
(as with ANOVA) inapplicable.

REGION: FFA
         Gender Classify      Chair Classify       Face Classify
         ROIMean  ROIMap      ROIMean  ROIMap      ROIMean  ROIMap
(a)      -.15     -.39        .52      0           2.60     2.79
(b)      .54      .64         .93      .9          4.64     3.36
(c)      0        .34         .64      .26         4.64     4.64
(d)      -1.18    .3          .68      0           4.64     4.64

REGION: V1
         Gender Classify      Chair Classify       Face Classify
         ROIMean  ROIMap      ROIMean  ROIMap      ROIMean  ROIMap
(a)      .11      0           -.32     -.1         .11      .13
(b)      0        -.68        1.9      3.96        1.54     2.12
(c)      .24      .48         .6       1.36        1.52     2.32
(d)      -.33     0           2.32     2.30        1.0      3.0

Table 5.1: SVM performance d' values on gender, chair, and face discrimination
tasks in four subjects. The binary classification tasks were male/female,
office/reception chair, and face/nonface respectively. Rows correspond to subjects
(a) through (d). ROIMean and ROIMap denote the hemodynamic response feature
being classified. ROIMean is a scalar, the average of the response over voxels and
time. ROIMap is the vector of time-averaged responses of ROI voxels.
In order to compensate for non-stationarity in the voxel-averaged FFA response,
we estimated the non-stationarity as the average of the male and female stimulus
response amplitudes. The signal was then corrected to remove the nonstationarity.
For each run, this DC offset was calculated and subtracted from the male and female
amplitudes.
The resulting male and female responses were centered around zero
amplitude (see Figure 5-6).
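The correction described above amounts to subtracting a per-run DC offset; a minimal sketch (function and array names are illustrative):

```python
import numpy as np

def remove_face_drift(male, female):
    """Per-run DC correction: estimate the non-stationary offset in each
    run as the average of the male and female response amplitudes, then
    subtract it, centering the two face conditions around zero.
    male, female: arrays of per-run FFA response amplitudes."""
    male = np.asarray(male, dtype=float)
    female = np.asarray(female, dtype=float)
    offset = (male + female) / 2.0
    return male - offset, female - offset
```

By construction the within-run difference male − female is untouched by the correction, which is why the preference-ratio statistic is unaffected while across-run tests become usable.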
With the broad drifts in amplitude corrected, both SVM and ANOVA produced
Figure 5-6: Non-stationarity correction of FFA response amplitudes in subject (a)
across runs for four conditions. (a) The original data, with response to faces drifting
downward over the course of the experiment. (b) Compensated for non-stationarity
in face stimulus responses. The line assignments are as follows: dashed - male, solid
- female, dotted - office chairs, dash-dotted - reception chairs.
results that were consistent with the preference ratio statistics.
All four subjects
showed significant (p < .05) separation between male and female response amplitudes
with the ANOVA in FFA, and insignificant separation in V1 with the exception of
subject (a), who showed significant but opposite gender preference in V1 from FFA.
Similarly, gender classification with SVM was perfect or above chance for all subjects
in FFA, but at or below chance in V1, with the exception of subject (c), whose V1 and
FFA had equal but only slightly better than chance gender classification (TP=0.6,
FP=0.4).
This experiment showed that the FFA exhibits gender preference. Gender preference occurs on average across trials, and in a statistically significant fashion in a
within-run comparison. The FFA gender preference is biased in a manner that does
not systematically relate to the gender of the subject for this sample size of four
subjects.
One potential objection to the experimental method is that block design is subject
to attentional effects. The argument is that the consistent FFA gender preference
observed across trials simply reflects the subject's consistently increased attention to
one particular gender over the other. This objection is difficult to counter even with
event-related design, and the block design experiment on race bias in the FFA [18]
may have faced similar criticism. One hint that attention may not account for the
gender effect observed is that V1 voxels did not exhibit the effect. Attention would
arguably also modulate V1 voxels if it modulated FFA voxels. Activation in the FFA
is modulated by attention [51]. However, attention is not yet sufficiently understood
to eliminate the possibility that attention might modulate the FFA but not V1. This
might be the topic of further research.
One study has previously reported that MEG face detection signals are insensitive
to face stimulus gender [10].
Our result with fMRI does not contradict the MEG
result because these measurement techniques are based on different (although related)
underlying neural signal sources and are capable of different spatial and temporal
resolution. The contrasting results from MEG and fMRI do emphasize the point that
in general, the absence of information in a brain area as measured by one technique
is not proof of the absence of information in the brain area. Negative results are in
general not conclusive.
The experiment describing the race effect in FFA [18] also showed a correlation
with hippocampal activity and performance on a face memory recall task. The combined data set argued for a neural mechanism by which people better remember faces
of their own race. A similar experiment is possible with gender, the FFA, hippocampus, and a memory recall task. As with the race study [18], a gender-bias result linked
to memory would have implications for gender discrimination and social interaction.
Chapter 6
Conclusions on FFA coding of visual stimuli
The fusiform face area (FFA) is defined as a focal set of voxels in the posterior
temporal lobe that respond "maximally" to faces, meaning that pictures of human
faces elicit a higher signal from the FFA of an attending subject than pictures of
houses, amorphous blobs, or other nonface stimuli [29]. Most past work on the FFA
has averaged activity over the voxels in the FFA, ignoring any differences that may
exist in the response of different voxels in this region. Here we used high resolution
fMRI scanning and new mathematical analyses to address two general questions:
1. What information is contained in the pattern of response across voxels in the
FFA?
2. How is that information represented?
In particular, this thesis investigated three specific questions:
1. How do voxels within the FFA collectively encode the presence or absence of a
face?
2. How many independent stimulus features can the FFA distinguish?
3. Does the FFA contain information about stimulus image format or face gender?
The following discussion summarizes what was learned about each of these questions through the experiments described in this manuscript (chapters 3, 4, and 5).
6.1 How do voxels within the FFA collectively encode the presence or absence of a face?
Previous work identified voxels in the FFA by their individual elevated response to
face stimuli over nonface stimuli [29]. In the first result of this thesis, we demonstrated
with principal components analysis (PCA) that the dominant source of variation over
measured responses to faces and nonfaces was the FFA discrimination between face
and nonface, as opposed to discrimination between photographs and line-drawings, or
any other variation in the stimulus responses. By projecting data onto the first principal component, we confirmed that this dominant direction of variation discriminated
faces from nonfaces.
Also with PCA, we now understand that FFA voxels approximately "co-vary" in
their response to stimuli. Specifically, both nonface stimuli and face stimuli exhibit a
dominant first principal component when analyzed separately. To understand what
co-vary means, consider a "response vector", which contains the response amplitudes
of each FFA voxel to a stimulus. Because FFA voxels co-vary over nonface stimuli,
the response vectors to nonface stimulus A, B, C, D, etc. all approximately fall along
a line in multidimensional space. Geometrically, the difference vector between any
two nonface stimuli (e.g. B-A), is approximately some scaled version of the same
"principal component" vector which points in the direction of the line.
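This geometric picture can be illustrated numerically with synthetic data. The gain-times-pattern model below is hypothetical, not the measured responses: when every response vector is approximately a scalar gain times one fixed voxel pattern, PCA recovers a single dominant component aligned with that pattern.

```python
import numpy as np

# Sketch: 12 synthetic "stimulus" response vectors over 40 "voxels",
# each a scalar gain times one fixed voxel pattern, plus small noise.
rng = np.random.default_rng(0)
pattern = rng.normal(size=40)            # fixed voxel response pattern
gains = np.linspace(0.5, 2.0, 12)        # per-stimulus gain
responses = np.outer(gains, pattern) + 0.01 * rng.normal(size=(12, 40))

# PCA via SVD of the mean-centered data.
centered = responses - responses.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
variances = s ** 2 / (len(responses) - 1)
share = variances[0] / variances.sum()   # variance in the first PC
```

Here share comes out near 1, and the first principal component vt[0] is nearly parallel to the underlying pattern, mirroring the co-variance described above.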
By comparing the principal component of non-face stimuli to the "detection vector" that connects face responses to nonface responses, we were able to demonstrate
that not only do these voxels co-vary with each other (in the sense defined above)
over non-face stimulus responses, but they do so in a fashion that relates to the
face detection signal. Geometrically, we found the first or second principal component of nonface modulation was usually around 42 degrees off from the detection
vector, out of 90 possible degrees. This means that over 50 percent of the signal
energy (cos² 42° × 100 ≈ 55%) in the dominant direction of variation in the nonface responses was related to the face detection signal. Knowing the face detection
vector, the categorization of a stimulus as face or non-face can be chosen optimally
by subtracting off any baseline signal, projecting the adjusted response vector onto
the detection vector, and comparing the magnitude of the projection to a decision
threshold.
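The optimal detection rule just described can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def detect_face(response, baseline, detection_vector, threshold):
    """Categorize a stimulus as face/nonface from its ROI response
    vector: subtract the baseline, project the adjusted vector onto the
    detection vector, and compare the projection magnitude to a
    decision threshold."""
    adjusted = np.asarray(response) - np.asarray(baseline)
    d = np.asarray(detection_vector)
    projection = np.dot(adjusted, d) / np.linalg.norm(d)
    return projection > threshold
```

By contrast, plain voxel averaging corresponds to projecting onto the all-ones vector, which is why equal-weight averaging is suboptimal whenever the detection vector is not uniform.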
This phenomenon was even more marked in the parahippocampal place area
(PPA), which responds more to stimuli that depict indoor or outdoor scenes than
nonplace stimuli. In the PPA, the first or second principal component of the nonplace stimuli was on average 26 degrees off from the detection vector, i.e. over 80
percent of signal energy in the nonplace principal component related to the place detection signal. Interestingly, the papers that defined the FFA and PPA [29, 9] averaged
voxel amplitudes equally to demonstrate detection, which we now know to be suboptimal because it corresponds to projecting the response vector onto the average
vector rather than the detection vector. Consequently, in answer to a recent debate
on the FFA [25, 44], it is true that the response vector carries more information in
discriminating stimuli than the average FFA response, specifically by calculating the
weighted average that corresponds to projection onto the detection vector.
In a simple model, co-variance (as defined above) might be caused when a blood
vessel modulates its flow in response to the stimulus and distributes its signal in fixed
proportion to its distance to each FFA voxel. This fixed proportion would impose a
particular voxel response vector, and any increase in blood vessel signal would translate into a multiplicative gain of the response vector. In an alternate model equally
consistent with co-variance, the blood vessel would induce a baseline response vector,
but neuronal activity would add on a characteristic vector in a different direction,
scaled to indicate face detection.
We could specifically reject the simple model if we found that the direction of the principal component for nonfaces differed from the direction of the individual nonface response vectors. This distinction may help elucidate whether the co-variance (i.e. dominant principal component) is a result of vessel physiology or neuronal activity.
6.2 How many independent stimulus features can the FFA distinguish?
Recent literature indicates that the FFA is sensitive to the race of face stimuli [18] and to the subject's level of expertise with nonface stimuli [17]. It is possible that the FFA might
encode other features of stimuli. In this thesis, PCA was used to determine an upper
bound on the number of independent stimulus features the FFA voxels could encode.
The PCA analysis suggested that for our given stimulus set, the fMRI response of the
FFA is sensitive to the equivalent of one to three nonface features, and potentially
many (more than three) face features. Although information theory provides a direct
method for estimating this bound, the method could not be used here because it
requires a probability density function (pdf) and is consequently data-intensive.
A feature is independent of another feature if the presence of one feature in the
stimulus does not make the other feature any more or less likely to also be present.
If two features are not entirely independent, then there is less variation in the stimuli
that needs to be encoded. If faces with blond hair always (or never) had blue eyes,
the FFA could use just one pattern to indicate both the presence of blond hair and
blue eyes. If the two features were independent, then they would need to be coded in
two separate patterns.
Even one voxel alone could encode an arbitrarily large number of features if it
produced a unique output for each combination of features. Based qualitatively on
the dynamic range and noise properties of FFA voxels, this scheme might be difficult
for even three features, because the number of unique outputs grows as O(2^n), where n is the number of features.
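To make this combinatorial point concrete, one can compare the 2^n unique outputs a lone voxel would need against a crude cap on its resolvable output levels. The dynamic-range and noise numbers below are invented for illustration, not measured FFA voxel properties.

```python
# A single voxel encoding n binary features with unique outputs needs
# 2**n distinguishable levels, while its resolvable levels are roughly
# capped by (dynamic range / noise). The numbers here are invented.
dynamic_range = 3.0   # assumed peak signal modulation, in percent change
noise_sd = 0.5        # assumed per-measurement noise, in percent change

usable_levels = int(dynamic_range / noise_sd)  # crude resolvable-level count

for n_features in range(1, 6):
    needed_levels = 2 ** n_features
    print(n_features, needed_levels, needed_levels <= usable_levels)
```

With these illustrative values a lone voxel resolves about six levels, so two binary features fit but three (eight levels) already do not, mirroring the qualitative claim above.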
There are in fact a multitude of possible coding schemes, that is, ways in which the
voxels of the FFA could assign portions of the response vector space to combinations
of features. By proposing a PCA upper bound on the number of codable independent
stimulus features, we implicitly assumed one particular coding scheme. In this scheme,
the degree of presence of each feature is encoded by the magnitude of a response vector
specific to that feature. The total FFA response vector is formed by the sum of the
individually scaled feature vectors.
Under this coding scheme, the magnitude of a feature is decoded by projecting
the response vector onto the desired feature vector. Completely independent features would need to have orthogonal feature vectors, so that a set of faces varying in only one feature would have response vectors whose projections onto the other feature directions remain unchanged. The geometric analysis of the relative locations of
various stimulus response vectors will be useful in describing the exact FFA coding
scheme.
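The assumed coding scheme can be sketched as follows. The feature vectors, magnitudes, and the `encode`/`decode` helpers are hypothetical constructions for illustration; the only properties carried over from the text are that the total response is a sum of individually scaled feature vectors and that orthonormal feature vectors make projection decoding exact.

```python
import numpy as np

rng = np.random.default_rng(1)

n_voxels, n_features = 10, 3
# Orthonormal feature vectors via QR decomposition of a random matrix.
feature_vectors, _ = np.linalg.qr(rng.normal(size=(n_voxels, n_features)))

def encode(magnitudes):
    """Total response = sum of feature vectors scaled by feature magnitudes."""
    return feature_vectors @ magnitudes

def decode(response):
    """Recover each magnitude by projecting onto its feature vector."""
    return feature_vectors.T @ response

magnitudes = np.array([0.7, -0.2, 1.3])
recovered = decode(encode(magnitudes))
print(np.allclose(recovered, magnitudes))
```

If the feature vectors were correlated rather than orthogonal, projection would mix magnitudes across features, which is why independence of features maps onto orthogonality in this scheme.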
PCA estimates the equivalent number of independent features coded in the dataset by counting the number of orthogonal feature vectors (principal components) needed to approximately span the set of responses. The number of principal components to use in
this approximate span is difficult to gauge. Here, we chose the PCA dimensionality
qualitatively, by examining the sizes of the first principal components and comparing the PCA variance profile against a Gaussian white noise profile. The Bartlett test provides a statistic with which to choose the PCA dimensionality quantitatively, but this
method assumes a white noise background. Further compounding the problem, both
noise and signal could have the same PCA variance profile. The signal and noise contributions to the PCA dimensionality could possibly be separated by measurements
of the magnitude and cross-voxel correlation of the noise.
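The qualitative comparison of a measured PCA variance profile against a white-noise profile might look like the sketch below on synthetic data. The trial counts, dimensions, and noise level are invented, and `variance_profile` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)

n_trials, n_voxels, true_dim = 40, 12, 2
# Synthetic responses: a low-dimensional signal plus white noise.
signal = rng.normal(size=(n_trials, true_dim)) @ rng.normal(size=(true_dim, n_voxels))
responses = signal + 0.3 * rng.normal(size=(n_trials, n_voxels))

def variance_profile(data):
    """Fraction of total variance explained by each principal component."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s**2 / np.sum(s**2)

profile = variance_profile(responses)
noise_profile = variance_profile(rng.normal(size=(n_trials, n_voxels)))

# A nearly flat profile suggests noise; a few dominant components suggest signal.
print(np.round(profile[:4], 2))
print(np.round(noise_profile[:4], 2))
```

On this synthetic example the first two components carry most of the variance while the white-noise profile stays comparatively flat, which is the visual contrast the qualitative method relies on.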
The PCA dimensionality estimate is referred to as an upper bound because, conceivably, even a single independent feature could "fill" the response vector space, requiring arbitrarily many orthogonal feature vectors. A high PCA dimensionality for face stimuli that differ only in gender would exemplify this sort of behavior. Moreover, drift in the signal baseline or noise would also inflate the PCA dimensionality of FFA response vectors driven by a single independent feature to an estimate greater than one. Drift and noise are thus sources of dimensionality overestimation in this experiment.
6.3 Does the FFA contain information about stimulus image format or face gender?
Prior studies suggest that the FFA does in fact encode stimulus features other than
the face/nonface contrast, including race [18] and the presence of nonface objects
for which the subject bears expertise [17]. Newly documented stimulus features to
which the FFA is sensitive contribute to an understanding of the brain mechanism for
face recognition and the factors that might influence perception and memory of the
faces that we see. An accumulating list of various FFA-sensitive features produces an
assumption-free lower bound on the number of features the FFA encodes, in contrast
to the coding-scheme-assumption based PCA upper bound discussed in the previous
section.
This thesis reported that the FFA is sensitive to face gender and to stimulus image format, i.e. differences between photographs and line-drawings. The FFA response had previously been reported insensitive to the photograph/line-drawing contrast with fMRI [22]. Magnetoencephalography [10] of an FFA-related scalp location reported no gender preference. Results in this thesis showed a run-by-run
gender preference in voxel-averaged FFA activity. The contrast between photographs
and line-drawings appeared in scatter plots of the first two principal components.
Moreover, the photograph/line-drawing contrast was roughly orthogonal to the face
detection vector in the response vector space. This means that although the FFA was
sensitive to this contrast, the FFA encoded this information in a way that interfered
only minimally with the face detection signal.
Additional experiments are necessary to understand the cognitive relevance of
the FFA sensitivity to gender. One immediate question is whether the FFA gender
preference reflects the subject's propensity to attend to and/or remember faces of
one gender or another. Analysis of additional subjects will also help establish that
the measured bias in the FFA was truly independent of trivial low-level features
like luminance. Although the inversion of the preferred signal between V1 and the FFA in one subject, and the increased signal separation in the FFA for the other subjects, suggest that the FFA gender bias is not due to low-level differences in the male and female stimuli, replications of this result in additional subjects would be reassuring.
One useful experiment would be to purposefully but only slightly increase the contrast
or luminance of one gender in an individual known to have an FFA biased for the
other gender, and demonstrate that the same bias in the FFA persists despite an
inversion in bias toward the more luminous gender in V1.
6.4 Nonstationarity in fMRI experiments
Searching for trends on a run-by-run basis proved critical to revealing the FFA gender bias in Chapter 5, where nonstationarities (i.e. drifting responses to the same condition across runs due to brain activity or instrumentation) that modulated the signal by as much as 7 percent from baseline would otherwise have swamped this more subtle 0.5 to
2 percent effect. Analyzing data on a run-by-run basis may serve as an important
complement to conventional whole-scan ANOVA analysis that assumes stationarity
by collectively analyzing all runs, especially as we begin to investigate the various
physiological phenomena that make the fMRI signal nonstationary.
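The value of run-by-run analysis under drift can be illustrated with synthetic numbers: a per-run baseline drift of up to roughly 7 percent and a 1 percent condition effect, echoing the magnitudes quoted above. Everything else (run count, noise level, condition labels) is invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n_runs = 8

def noise():
    return 0.2 * rng.normal(size=n_runs)   # per-run measurement noise

drift = 7.0 * rng.random(n_runs)           # slow per-run baseline drift, up to ~7%
effect = 1.0                               # small condition difference, ~1%

cond_a = 100.0 + drift + effect + noise()  # e.g. runs of one condition
cond_b = 100.0 + drift + noise()           # e.g. runs of the other condition

# Pooled across all runs, drift inflates the spread well past the 1% effect.
pooled_spread = np.std(np.concatenate([cond_a, cond_b]))

# Compared within each run, the paired difference cancels the shared drift.
paired_diff = cond_a - cond_b

print(round(float(pooled_spread), 2))
print(round(float(paired_diff.mean()), 2))
```

The pooled spread is dominated by the drift, while the run-wise paired difference recovers an effect near 1 percent, which is why the subtle gender bias only emerged in the run-by-run analysis.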
Both pdf and SVM methods rely on the stability of the distributions from which
the data is drawn. Nonstationarity in the gender bias experiment made SVM results
inconsistent (Figure 5-5).
It is possible for nonstationarities to be sufficiently small as to maintain the separation between the responses to the conditions being classified. When the conditions being classified are dramatically different, the classification of their responses is less impaired by nonstationarity. This was the case in a recent publication [6] that examined nonface object categories from the same subject on different days. By aligning the subject's head across sessions with a thermoplastic mold, the group was able to perform classification despite nonstationarities. This was perhaps because the thermoplastic mold decreased noise by reducing head movement, and more likely because the nonface categories tested were far more different from each other than faces of the two genders are. For example, the V1 response in our gender study (Figure 5-4) shows large
amplitude separation between office and reception chairs despite nonstationarity.
A theoretical calculation would be useful to estimate the required dataset size to
perform pdf or SVM methods for a more subtly varying stimulus set like faces. The
calculation would need to first specify an experimental paradigm, and then gather realistic noise statistics from repeated measures using that technique. For a theoretical
calculation to be practical, it would help to contend with the large-scale nonstationarities observed in Figure 5-5. This theoretical model could assess the feasibility of
any experiment based on the noise statistics of pilot data.
6.5 Defining a brain system in fMRI analysis
Communication theory provides a fresh perspective on interpretation of data from an
anatomically focal set of voxels. When presenting stimuli and analyzing data from
a region of interest (ROI), it is accepted practice in current neuroscience literature
(and indeed in the above manuscript) to attribute the characteristics of the data to
a property of the ROI. The concepts of input, output, and system in communication theory suggest that this approach is flawed.
By relating a set of visual stimuli presented to an individual to the output from an ROI, we are studying the system that includes all participating neural circuitry.
Consider face detection in the FFA. We present face and nonface stimuli and observe that voxels in the FFA respond maximally to face stimuli, such that this output
serves as a face detection signal. The accepted conclusion is that the neurons of the
FFA perform face detection. The proper conclusion is that the entire set of neurons
implicated in the production of this signal perform face detection. As we move forward in understanding the computational architecture of the brain, both modular
and distributed notions will need to make this distinction to develop a more accurate
description of brain function.
What then does it mean that only a focal set of voxels in the FFA generate the face
detection signal? It means only that the voxels of the FFA encode whether a stimulus
is a face or nonface. This could potentially imply that the FFA might be a convenient read-out for other brain centers interested in knowing when a face is detected. It does
not mean however that the FFA is a face detection system. We would be inclined
to call the FFA a face detection system (or at least part of one) if the input to the FFA were not already a face detection signal. A lesion study of the FFA would not distinguish between the FFA as a read-out and as a face detection system, because impairing both faculties would incapacitate face detection.
This discussion is particularly relevant to PCA dimensionality analysis of ROI
signal output. The PCA dimension of the measured signal is only descriptive of the
system in the context of the PCA dimension of the input. This concept is native to
information theory, where channel capacity is defined in terms of the mutual information between the input and output. Furthermore, the system dimensionality refers
to the entire system of participating neural circuitry. Understanding this system dimensionality does, however, specifically aid in describing the local response properties of the FFA.
The concepts of system, input, and output suggest that future fMRI studies will be concerned not only with the relationship between a stimulus and a target ROI, but also with the relationship and interaction between different ROIs within the brain, whether remotely located or in proximal anatomical slices. In combination with new approaches to stimulating brain activity (in addition to sensory stimuli), such as transcranial magnetic stimulation and electrode-based current injection, fMRI is beginning to tease apart the structure of brain function on the large (millimeter) scale.
Bibliography
[1] T.M. Apostol. Calculus, volume 2. John Wiley & Sons, 2 edition, 1967.
[2] S. Beheshti. MIT course 6.341: Discrete-time signal processing, 2003.
[3] J. Bodamer. Die prosopagnosie. Archiv fur Psychiatrie und Nervenkrankheiten,
179:6-53, 1947.
[4] R.B. Buxton. Introduction to Functional Magnetic Resonance Imaging: Principles and Techniques. Cambridge University Press, 2001.
[5] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[6] D.D. Cox and R.L. Savoy. Functional magnetic resonance imaging (fMRI) "brain
reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2):912, 2003.
[7] A.M. Dale. Optimal experimental design for event-related fMRI. Human Brain
Mapping, 8:109, 1999.
[8] P. Dayan and L. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.
[9] R. Epstein and N. Kanwisher. A cortical representation of the local visual environment. Nature, 392:598, 1998.
[10] E. Halgren et al. Cognitive response profile of the human fusiform face area as
determined by MEG. Cerebral Cortex, 10(1):69, 2000.
[11] E.N. Brown et al. Distributed and overlapping representations of faces and objects in ventral temporal cortex. J. Neuroscience, 18(18):7411, 1998.
[12] K. Kwong et al. Functional magnetic resonance imaging of primary visual cortex. Investigative Ophthalmology & Visual Science, 33(4):1132, 1992.
[13] M.A. Hearst et al. Trends and controversies: Support vector machines. IEEE
Intelligent Systems, page 18, 1998.
[14] P. Golland et al. Discriminative analysis for image-based studies. Proc. MICCAI,
page 508, 2002.
[15] T-P. Jung et al. Imaging brain dynamics using independent component analysis.
IEEE Proceedings, 89(7):1107, 2001.
[16] I. Fried, K.A. MacDonald, and C.L. Wilson. Single neuron activity in human
hippocampus and amygdala during recognition of faces and objects. Neuron,
18:753-765, 1997.
[17] I. Gauthier, P. Skudlarski, J.C. Gore, and A.W. Anderson. Expertise for cars
and birds recruits brain areas involved in face recognition. Nature Neuroscience,
3(2):191-197, 2000.
[18] A.J. Golby, J.D. Gabrielli, J.Y. Chiao, and J.L.Eberhardt. Differential responses
in the fusiform region to same-race and other-race faces. Nature Neuroscience,
4(8):845-850, 2001.
[19] D. Greve. fmri-analysis-theory.tex, 1999. fs-fast software documentation, freesurfer-alpha/fsfast-20020814/docs.
[20] D. Greve. selxavg.tex, 2000. fs-fast software documentation, freesurfer-alpha/fsfast-20020814/docs.
[21] D. Greve. fs-fast fMRI analysis software (freesurfer), 2001. Available for download at http://surfer.nmr.mgh.harvard.edu/.
[22] K. Grill-Spector. The neural basis of object perception. Current Opinion in Neurobiology, 13(2):159, 2003.
[23] C.G. Gross, C.E. Rocha-Miranda, and D.B. Bender. Visual properties of neurons in the inferotemporal cortex of the macaque. Journal of Neurophysiology, 35:96-111, 1972.
[24] J.C. Hansen. Separation of overlapping waveforms having known temporal distributions. Journal of Neuroscience Methods, 9:127, 1983.
[25] J.V. Haxby, M.I. Gobbini, M.L. Furey, A. Ishai, J.L. Schouten, and P. Pietrini.
Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425, 2001.
[26] D.J. Heeger and D. Ress. What does fMRI tell us about neuronal activity?
Nature Reviews Neuroscience, 3:142, 2002.
[27] G. Heit, M.E. Smith, and E. Halgren. Neural encoding of individual words and
faces by the human hippocampus and amygdala. Nature, 333:773-775, 1988.
[28] J.P. Hornak. The Basics of MRI. 2003. http://www.cis.rit.edu/htbooks/mri/.
[29] N. Kanwisher, J. McDermott, and M.M. Chun. The fusiform face area: A module
in human extrastriate cortex specialised for face perception. Journal of Neuroscience, 17(11):4302-4311, 1997.
[30] G. Kreiman, C. Koch, and I. Fried. Category-specific visual responses of single
neurons in the human medial temporal lobe. Nature Neuroscience, 3(9):946,
2000.
[31] N.K. Logothetis, J. Pauls, M. Augath, T. Trinath, and A. Oeltermann. A neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150,
2001.
[32] G. McCarthy, A. Puce, J.C. Gore, and T. Allison. Face-specific processing in the
human fusiform gyrus. Journal of Cognitive Neuroscience, 9(5):604-609, 1997.
[33] P. Niyogi and M. Belkin. Laplacian eigenmaps for dimensionality reduction and data representation. University of Chicago Technical Reports, TR-2002-01, 2002. Available for download at http://www.cs.uchicago.edu/research/publications/techreports/TR-2002-01.
[34] J.G. Ojemann, G.A. Ojemann, and E. Lettich. Neuronal activity related to faces
and matching in human right nondominant temporal cortex. Brain, 115:1-13,
1992.
[35] A.V. Oppenheim, R.W. Schafer, and J.R. Buck. Discrete-Time Signal Processing.
Prentice Hall, 2 edition, 1989.
[36] A.V. Oppenheim, A.S. Willsky, and S.H. Nawab. Signals and Systems. Prentice
Hall, 2 edition, 1996.
[37] P.J. Phillips, H. Wechsler, J. Huang, and P. Rauss. The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing,
16(5):295-306, 1998.
[38] P.L. Purdon, V. Solo, R.M. Weisskoff, and E.N. Brown. Locally regularized spatiotemporal modeling and model comparison for functional MRI. NeuroImage, 14:912, 2001.
[39] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. MIT Press, 1999.
[40] B.R. Rosen, R.L. Buckner, and A.M. Dale. Event-related functional MRI: Past,
present, and future. PNAS, 95:773, 1998.
[41] S.M. Ross. Statistical Learning Theory. Harcourt/Academic Press, 2 edition, 2000.
[42] S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear
embedding. Science, 290:2323, 2000.
[43] M.J. Schnitzer and M. Meister. Multineuronal firing patterns in the signal from
eye to brain. Neuron, 37:499, 2003.
[44] M. Spiridon and N. Kanwisher. How distributed is visual category information
in human occipito-temporal cortex? An fMRI study. Neuron, 35:1157, 2002.
[45] G. Strang. Linear Algebra and Its Applications. International Thomson Publishing, 3 edition, 1988.
[46] M.J. Tarr and I. Gauthier. FFA: A flexible fusiform area for subordinate-level
visual processing automatized by expertise. Nature Neuroscience, 3(8):764-769,
2000.
[47] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework
for nonlinear dimensionality reduction. Science, 290:2319, 2000.
[48] V.N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
[49] E. Weisstein. Bonferroni correction, 2003. Available for download at http://mathworld.wolfram.com/BonferroniCorrection.html.
[50] E. Weisstein. Kolmogorov-Smirnov test, 2003. Available for download at http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html.
[51] E. Wojciulik, N. Kanwisher, and J. Driver. Covert visual attention modulates
face-specific activity in the human fusiform gyrus: fMRI study. Journal of Neurophysiology, 79:1574, 1998.
[52] G. Wornell. Personal communication, 2002.
[53] G. Wornell and A.S. Willsky. 6.432 MIT Course Notes: Stochastic Processes,
Detection and Estimation. 2003.
[54] J. Wyatt. MIT course 6.432: Stochastic processes, detection and estimation,
2002.