Part I

Modeling fMRI data with uncertain
hemodynamic response or
stimulus functions
Martin Lindquist
Department of Statistics
Columbia University
Functional MRI
• Functional MRI (fMRI) performed using BOLD
contrast can detect changes in blood
oxygenation and flow that occur in response to
neural activity.
• A primary goal of fMRI research is to use
information provided by the BOLD signal to
make conclusions about the underlying neuronal
activity.
Overview
[Diagram: Stimulus, Neuronal Activity, Hemodynamics]
Part I: Given data and stimulus function, estimate the
hemodynamic response function (HRF).
Part II: Given data only, estimate activity.
Modeling the Hemodynamics
• A number of methods exist for modeling the relationship
between stimulus and BOLD response.
– Linear time invariant (LTI) system
− BOLD response to events add linearly
− Relatively simple to use
– Non-linear models (e.g. Balloon model)
− Consists of a set of ODEs
− More complicated/time-consuming than linear models
• Both types provide a means for estimating the HRF.
Estimating the HRF
• The ability to accurately model the evoked
hemodynamic response to a neural event plays
an important role in the analysis of fMRI data.
• When analyzing the shape of the estimated
HRF, summary measures (e.g., amplitude, delay,
and duration) can be extracted.
• They can be used to infer information regarding
the intensity, onset latency, and duration of the
underlying activity.
Summary Measures
Estimate amplitude (H),
time-to-peak (T), and
full-width at half-max (W).
Ideally, these parameters should be directly
interpretable in terms of changes in neuronal activity,
and estimated so that statistical power is maximized.
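As a rough illustration of the three summary measures, the sketch below reads H, T, and W off a sampled response curve. The function name `summary_measures`, the linear interpolation at the half-maximum crossings, and the triangular test curve are assumptions made for this example, not part of the original analysis.

```python
def summary_measures(times, hrf):
    """Return (H, T, W): peak amplitude, time-to-peak, full width at half max.

    Assumes a single positive peak in the sampled curve.
    """
    # H and T: value and time of the largest sample.
    peak_idx = max(range(len(hrf)), key=lambda i: hrf[i])
    height = hrf[peak_idx]
    time_to_peak = times[peak_idx]
    half = height / 2.0

    def crossing(i, j):
        # Linearly interpolate the half-max crossing between samples i and j.
        frac = (half - hrf[i]) / (hrf[j] - hrf[i])
        return times[i] + frac * (times[j] - times[i])

    # Walk outward from the peak until the curve drops to half max or below.
    left = peak_idx
    while left > 0 and hrf[left] > half:
        left -= 1
    right = peak_idx
    while right < len(hrf) - 1 and hrf[right] > half:
        right += 1

    t_left = crossing(left, left + 1) if hrf[left] <= half else times[left]
    t_right = crossing(right - 1, right) if hrf[right] <= half else times[right]
    return height, time_to_peak, t_right - t_left


# Hypothetical triangular response sampled once per second.
times = list(range(11))
curve = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
H, T, W = summary_measures(times, curve)
```

If the curve never falls to half its maximum on one side, the width is measured to the edge of the sampled window and so underestimates W.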
Interpretability
Q1. Do changes in parameters related to neural activity directly translate
into changes in corresponding parameters of the HRF?
Q2. Does the HRF model recover the true parameters of the response?
Solid – expected relationships.
Dashed – relationships that complicate interpretation.
• BOLD physiology limits the interpretability of parameters
in terms of neuronal and metabolic function.
• We treat the evoked BOLD response as the signal of
interest, without making a direct quantitative link to
neuronal activity.
• Here we focus on the ability of different models to
recover differences in the height, time-to-peak, and width
of the true BOLD response.
• Which model is most efficient while giving rise to the
least amount of bias and misspecification?
LTI System
• The dominant analysis strategy is to assume
that BOLD responses to events add linearly
(Boynton et al., 1996) and use a set of smooth
functions to model the underlying HRF.
• We model the relationship between stimuli and
BOLD response using a linear time invariant
(LTI) system.
• The stimulus acts as the input and the HRF acts
as the impulse response function.
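A minimal sketch of the LTI idea: the predicted BOLD response is the stimulus function convolved with an HRF. The gamma-density HRF shape, its parameters, the one-sample-per-second grid, and the two example designs are illustrative assumptions, not the canonical HRF used in any particular package.

```python
import math


def gamma_hrf(t, shape=6.0, scale=1.0):
    """Gamma-density response shape (illustrative; peaks at (shape - 1) * scale)."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def convolve(stimulus, hrf):
    """Discrete convolution, truncated to the length of the stimulus."""
    n = len(stimulus)
    out = [0.0] * n
    for t in range(n):
        for k in range(min(t + 1, len(hrf))):
            out[t] += stimulus[t - k] * hrf[k]
    return out


# HRF sampled once per second for 30 s.
hrf = [gamma_hrf(float(k)) for k in range(30)]

# Event-related design: brief events at 5 s and 40 s.
event_stimulus = [1.0 if t in (5, 40) else 0.0 for t in range(80)]
# Block design: stimulus on from 10 s to 30 s.
block_stimulus = [1.0 if 10 <= t < 30 else 0.0 for t in range(80)]

predicted_event = convolve(event_stimulus, hrf)
predicted_block = convolve(block_stimulus, hrf)
```

Because the system is linear and time invariant, the response to the second event is a shifted copy of the response to the first, and the block response is the sum of shifted copies of the single-event response.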
Convolution
[Figure: Convolution examples for a block design and an
event-related design. Experimental stimulus function *
hemodynamic response function = predicted response.]
General Linear Model
• The General linear model (GLM) approach
treats the data as a linear combination of model
functions (predictors) plus noise (error).
• The model functions are assumed to have
known shapes, but their amplitudes are
unknown and need to be estimated.
• The GLM framework encompasses many of the
commonly used techniques in fMRI data
analysis (and data analysis more generally).
Matrix Formulation
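We can write the GLM as
$$Y = X\beta + \varepsilon$$
where
$$
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{11} & \cdots & X_{1p} \\
1 & X_{21} & \cdots & X_{2p} \\
\vdots & \vdots & & \vdots \\
1 & X_{n1} & \cdots & X_{np}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
Here $Y$ is the fMRI data, $X$ is the design matrix, $\beta$ holds the
model parameters, and $\varepsilon$ is the noise.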
GLM - Solution
Assume the model:
$$Y = X\beta + \varepsilon$$
where
$$\mathrm{Var}(\varepsilon) = V\sigma^2$$
If $V$ is known, the optimal solution for $\beta$ is:
$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$
Inference is performed using linear combinations of $\hat{\beta}$.
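The generalized least squares estimate $(X^T V^{-1} X)^{-1} X^T V^{-1} Y$ can be sketched in a few lines. To keep the example dependency-free it assumes a diagonal $V$ (given as `v_diag`); fMRI noise is usually autocorrelated, so this is a simplification for illustration only. The helper names `solve` and `gls` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes a is square and non-singular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gls(X, y, v_diag):
    """GLS estimate of beta for Var(eps) = sigma^2 * diag(v_diag)."""
    n, p = len(X), len(X[0])
    # X^T V^{-1} X and X^T V^{-1} y, using that V is diagonal.
    xtvx = [[sum(X[i][j] * X[i][k] / v_diag[i] for i in range(n))
             for k in range(p)] for j in range(p)]
    xtvy = [sum(X[i][j] * y[i] / v_diag[i] for i in range(n)) for j in range(p)]
    return solve(xtvx, xtvy)


# Toy data: intercept plus one predictor, noise-free so beta is recovered exactly.
X = [[1.0, float(x)] for x in range(6)]
y = [2.0 + 3.0 * x for x in range(6)]
beta = gls(X, y, v_diag=[1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
```

With `v_diag` set to all ones this reduces to ordinary least squares.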
Basis Functions
• A linear combination of several basis functions
can be used to account for possible delays and
dispersions in the HRF.
• The stimulus function is convolved with each of
the basis functions to give a set of regressors.
• The parameter estimates give the coefficients
that determine the combination of basis
functions that best model the HRF for the trial
type and voxel in question.
Basis Functions
Examples:
• Canonical HRF + derivatives
• Finite impulse response functions
• Many more…
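For the FIR basis, convolving the stimulus function with each basis function amounts to placing one indicator regressor at each post-stimulus lag. The sketch below builds such a design matrix; the function name `fir_design`, the use of scan indices for onsets, and the example values are assumptions for illustration.

```python
def fir_design(onsets, n_scans, n_lags):
    """FIR design matrix: X[t][k] counts events that occurred k scans before scan t."""
    X = [[0.0] * n_lags for _ in range(n_scans)]
    for onset in onsets:
        for k in range(n_lags):
            if onset + k < n_scans:
                X[onset + k][k] += 1.0
    return X


# Hypothetical run: 15 scans, events at scans 2 and 10, 4 post-stimulus lags.
X_fir = fir_design(onsets=[2, 10], n_scans=15, n_lags=4)
```

Each column's coefficient then estimates the response height at that lag, which is why the FIR model makes no assumption about the shape of the HRF.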
Basis Functions
[Figure: Columns: model, image of predictors, data & fitted.
Rows: single HRF, HRF + derivatives, finite impulse response (FIR).
X-axis: time (s).]
Smooth FIR
• The FIR solution tends to be very noisy.
• To constrain the fit to be smoother (but otherwise
of arbitrary shape), a Gaussian prior can be
placed on the filter parameters β.
• The maximum a posteriori estimate of β gives a
smoothed version of the filter.
Red – FIR
Blue – Smooth FIR
Non-linear Models
• Alternatively, one can use non-linear models
with free parameters for magnitude and
onset/peak delay.
• Common criticisms of such approaches are their
computational costs and potential convergence
problems.
• However, increases in computational power
make nonlinear estimation over the whole brain
feasible.
Inverse Logit Model
• Superposition of three inverse logit (sigmoid) functions, where
$$L(x) = (1 + e^{-x})^{-1}$$
• Each function has three variable parameters dictating the
amplitude, position and slope:
$$h(t \mid \theta) = \alpha_1 L\!\left(\frac{t - T_1}{D_1}\right) + \alpha_2 L\!\left(\frac{t - T_2}{D_2}\right) + \alpha_3 L\!\left(\frac{t - T_3}{D_3}\right)$$
Lindquist & Wager (2007)
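A small sketch of the inverse logit model as a superposition of three sigmoids, each with an amplitude, a position, and a slope parameter. The parameter values below are made up for illustration (the amplitudes are chosen to sum to zero so the curve returns to baseline); they are not fitted values from the paper.

```python
import math


def inv_logit(x):
    """L(x) = 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def il_hrf(t, amplitudes, positions, slopes):
    """Sum of three scaled, shifted inverse logit functions evaluated at time t."""
    return sum(a * inv_logit((t - T) / D)
               for a, T, D in zip(amplitudes, positions, slopes))


# Hypothetical parameters: rise near 3 s, fall near 9 s, recovery near 16 s.
amplitudes = (1.0, -1.3, 0.3)
positions = (3.0, 9.0, 16.0)
slopes = (1.0, 1.5, 2.0)
response = [il_hrf(float(t), amplitudes, positions, slopes) for t in range(31)]
```

Moving only the second position changes how long the response stays elevated, while moving all three positions together shifts the onset.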
Flexibility
By shifting the position of
the second IL function one
can model differences in
duration.
By shifting the position of
all three IL functions one
can model differences in
onset.
Model fitting
• Model fitting is performed using either simulated
annealing or gradient descent.
• We typically constrain the solution so that the
fitted response begins and ends at 0, which
leads to a model with 7 variable parameters.
• Alternatively, we use a 4 parameter model where
only the position of each function and the total
amplitude is allowed to vary.
Simulation Study
• We compare the different models' ability to
handle shifts in onset and duration. The models
we studied were:
GLM based:
– The canonical HRF
– The canonical HRF + 1st derivative
– The canonical HRF + 1st & 2nd derivative
– The FIR model
– The Smooth FIR model
Non-linear:
– Non-linear Gamma
– Inverse Logit Model
Lindquist & Wager (2007)
Lindquist, Loh, Atlas & Wager (2008)
Estimation
• After fitting each model we estimate H, T and W
using closed form solutions (when available) or
the fitted HRF.
• For models that include the canonical HRF and
its derivatives it is common to use only the
non-derivative term as an estimate of amplitude.
• However, this estimate will be biased, so we instead use
the “derivative boost” (Calhoun et al., 2004).
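As we understand it, the derivative boost combines the canonical and derivative coefficients into one amplitude estimate, the signed root sum of squares. The sketch below assumes that form and assumes the two regressors have been scaled to comparable norm; the function name is hypothetical.

```python
import math


def derivative_boost(beta_hrf, beta_deriv):
    """Amplitude estimate: sign(beta_hrf) * sqrt(beta_hrf^2 + beta_deriv^2).

    Assumes the canonical and derivative regressors are comparably scaled.
    """
    return math.copysign(math.hypot(beta_hrf, beta_deriv), beta_hrf)


# Hypothetical coefficients from a canonical HRF + 1st derivative fit.
boosted = derivative_boost(3.0, 4.0)
```

Using only `beta_hrf` here would report 3.0, while the boosted estimate also credits the part of the response captured by the derivative term.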
Simulation
[Figure: (A) Stimulus functions for each combination of duration
(1, 3, 5, 7, 9) and onset shift (1 to 5), with the one assumed in
the analysis of the simulation data marked. (B) Assumed response
(black) with delayed “true” response (gray). (C) “True” response
of extended duration.]
Simulation
• Datasets generated for 15 “subjects”, consisting of the
“true” BOLD time series plus white noise, with plausible
effect size (noise std equal to 1, Cohen’s d = 0.5).
• Estimates of amplitude (H), time-to-peak (T) and width
(W) were obtained for each model. The average values
across the 15 subjects were compared with the true
values to assess model dependent bias in the estimates.
• In addition, for each subject and voxel the residuals were
checked for model misspecification.
Detecting Mis-modeling
• Let r(i) be the whitened residuals and K(i) a Gaussian
kernel.
• When no mis-modeling is present
$$Y_w(t) = \sum_{i=t}^{t+w-1} r(i)\, K(t - i)$$
is normal with mean 0 for all w, t.
• Calculate the tail probability of $\max_t Y_w(t)$ using
random field theory.
Loh, Lindquist & Wager (2008)
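The windowed residual statistic can be sketched directly from its definition, a kernel-weighted sum of w consecutive whitened residuals starting at time t. The unnormalized Gaussian kernel, the bandwidth, and the toy residual series are assumptions for this example, and the random field theory p-value is not computed here.

```python
import math


def gaussian_kernel(u, bandwidth):
    """Unnormalized Gaussian kernel (normalization is an arbitrary choice here)."""
    return math.exp(-0.5 * (u / bandwidth) ** 2)


def windowed_residual_stat(residuals, w, bandwidth):
    """Y_w(t) = sum over i = t, ..., t + w - 1 of r(i) * K(t - i), for each valid t."""
    n = len(residuals)
    return [sum(residuals[i] * gaussian_kernel(t - i, bandwidth)
                for i in range(t, t + w))
            for t in range(n - w + 1)]


# Toy residuals: zero except for a run of systematic positive misfit.
residuals = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
stats = windowed_residual_stat(residuals, w=3, bandwidth=2.0)
```

A run of same-signed residuals, as produced by a mis-specified HRF, pushes the statistic away from zero at the times where the model fits poorly.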
[Figure: For each model (GAM, TD, DD, FIR, sFIR, NL, IL), maps over
duration (1, 3, 5, 7, 9) and onset shift (1 to 5) of the bias in
H, T, and W (negative bias, no significant bias, or positive bias)
and of mis-modeling.]
Inference
• We perform population inference using the
estimated amplitude for each subject and one of
the following methods:
- Summary Statistics Approach
  - Assumes normality
- Bootstrap
  - Non-parametric
  - Use the bias-corrected accelerated (BCa) version
- Sign-permutation test
  - Non-parametric
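Of the three options, the sign-permutation test is the simplest to sketch. Under the null hypothesis that subject amplitudes are symmetric about zero, every pattern of sign flips is equally likely, so the p-value is the share of flips whose mean is at least as extreme as the observed one. The two-sided form, full enumeration, and the amplitude values are assumptions for this example.

```python
from itertools import product


def sign_permutation_pvalue(values):
    """Exact two-sided sign-permutation p-value for the mean of `values`.

    Enumerates all 2^n sign patterns, so it is only practical for small n.
    """
    n = len(values)
    observed = abs(sum(values)) / n
    count = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * v for s, v in zip(signs, values))) / n
        # Small tolerance so ties with the observed statistic are counted.
        if stat >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


# Hypothetical amplitude estimates for five subjects.
amplitudes = [0.8, 1.1, 0.3, 0.9, 1.4]
p_value = sign_permutation_pvalue(amplitudes)
```

With 15 subjects there are 32,768 sign patterns, which is still cheap to enumerate; for larger groups a random sample of sign patterns is the usual substitute.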
Results
Conclusions
• The canonical HRF based models (GAM, TD &
DD) are highly susceptible to model
misspecification.
• The FIR models (FIR & sFIR) and the IL model
provide the most flexibility to handle differences
in onset and duration.
• The IL model performs best in terms of bias and
model misspecification, but is computationally
more demanding than the FIR models.
Overview
[Diagram: Stimulus, Neuronal Activity, Hemodynamics, now with a “?”
for the unknown component.]
Part I: Given data and stimulus function, estimate the
hemodynamic response function (HRF).
Part II: Given data only, estimate activity.
Unknown Stimulus Functions
• Most statistical analyses of fMRI data assume
that the timing and duration of the psychological
processes are known.
• However, in many studies, it is hard to specify
this information a priori (e.g., threat/emotional
experience and drug uptake).
• In these situations using a standard GLM-based
analysis is not appropriate and alternatives need
to be explored.
Change Point Analysis
• Our approach uses change point analysis to
detect changes in brain activity without prior
knowledge of the exact onset or duration.
• We can make population inferences on whether,
when, and for how long an fMRI time series
deviates from a baseline level.
• We can then characterize brain responses in
terms of their relationship to physiological
changes (e.g. reported stress).
Change Point Analysis
• We propose a three step procedure:
1. Use HEWMA (hierarchical exponentially weighted moving average) to determine
voxels with time courses that deviate from baseline in
the population (Lindquist, Waugh, & Wager 2007).
2. Estimate voxel-specific distributions of onset times
and durations from the fMRI response.
3. Perform spatial clustering of voxels according to
onset and duration characteristics, and anatomical
location using a hidden Markov random field model.
HEWMA
Two states: Active/Inactive
Calculate a smoothed EWMA statistic across time (t) for each subject:
$$z_t = \lambda x_t + (1 - \lambda) z_{t-1}, \quad t = 1, \ldots, n$$
Use the weighted average of each subject's EWMA statistic to get
group results:
$$Z_{pop} = \left( \sum_{i=1}^{m} W_i \right)^{-1} \sum_{i=1}^{m} W_i Z_i, \qquad W_i = \left( \Sigma_i^* + \Sigma_b^* \right)^{-1}$$
Search across time for deviations from baseline (inactive) state.
Monte Carlo simulation provides correction for multiple comparisons.
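The subject-level EWMA statistic is a one-line recursion, z_t = λ x_t + (1 − λ) z_{t−1}. The sketch below assumes the series has already been centered on its baseline so that the starting value z_0 is 0; the function name and the smoothing value are illustrative.

```python
def ewma(x, lam, z0=0.0):
    """Exponentially weighted moving average of the series x.

    lam in (0, 1] weights the newest observation; z0 is the assumed baseline level.
    """
    z = []
    prev = z0
    for xt in x:
        prev = lam * xt + (1.0 - lam) * prev
        z.append(prev)
    return z


# Hypothetical centered time series that steps from baseline (0) to 1.
smoothed = ewma([0.0, 0.0, 1.0, 1.0, 1.0], lam=0.5)
```

Smaller values of `lam` smooth more heavily, which helps detect small sustained shifts at the cost of a slower reaction to abrupt ones.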
Estimating Change Points
• Each subject’s time series is a sequence of
normally distributed observations that may at
some unknown time $\tau_i$ undergo a shift in mean.
• This in turn may be followed by a return to
baseline at $\tau_i + \omega_i$, where $\omega_i$ is also unknown.
• Both $\tau_i$ and $\omega_i$ are random variables drawn from
unknown population distributions: $g_\tau(t) = P(\tau_i = t)$
and $g_\omega(t) = P(\omega_i = t)$, respectively.
• The distributions are estimated assuming no
functional form, and allowing for the possibility of
no response.
• We assume contiguous observations come from
the same component except at $\tau_i$ and $\tau_i + \omega_i$.
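[Diagram: time series that is inactive, $N(\mu_1, \sigma_1^2)$, until onset
$\tau_i \sim g_\tau$, active, $N(\mu_2, \sigma_2^2)$, for duration $\omega_i \sim g_\omega$,
and inactive again after $\tau_i + \omega_i$.]
Estimate: $\theta_1 = (\mu_1, \sigma_1)$, $\theta_2 = (\mu_2, \sigma_2)$, $g_\tau$, $g_\omega$.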
Conditional likelihood ($\tau_i$ and $\omega_i$ known):
[Equation: factors for the baseline state, the active state, and
the baseline state.]
Joint likelihood ($\tau_i$ and $\omega_i$ unknown):
[Equation: terms labeled onset distribution and duration distribution.]
By treating $\tau_i$ and $\omega_i$ as missing data, we can employ the
EM algorithm to calculate the MLE of $g_\tau(t)$ and $g_\omega(t)$.
Using the estimated distributions we can calculate the
probability of activation as a function of time:
$$P(\text{activation at time } t) = \sum_{j=1}^{t} P(\tau = j)\, P(\omega \ge t - j)$$
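A sketch of the activation-probability calculation, reading the sum as: the probability that the onset fell at some time j up to t, times the probability that the duration is at least t − j. The "at least" convention, the list layout (`g_onset[j - 1]` for onset time j, `g_duration[d]` for duration d starting at 0), and the assumption that the duration distribution sums to 1 are choices made for this example.

```python
def activation_probability(g_onset, g_duration, n_times):
    """P(active at t) for t = 1..n_times from onset and duration distributions.

    g_onset[j - 1] = P(onset = j); it may sum to less than 1 (no response).
    g_duration[d] = P(duration = d) for d = 0, 1, ...; assumed to sum to 1.
    """
    # survival[d] = P(duration >= d)
    survival = []
    tail = sum(g_duration)
    for p in g_duration:
        survival.append(tail)
        tail -= p

    probs = []
    for t in range(1, n_times + 1):
        total = 0.0
        for j in range(1, min(t, len(g_onset)) + 1):
            d = t - j
            if d < len(survival):
                total += g_onset[j - 1] * survival[d]
        probs.append(total)
    return probs


# Degenerate check: onset always at t = 3, duration always 2.
activation = activation_probability([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], n_times=6)
```

In the degenerate case the result is an indicator of the active interval, which makes it easy to verify the indexing before plugging in estimated distributions.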
Simulated data
[Figure: panels A to D.]
Spatial Clustering
• A Hidden Markov Random Field model is used to cluster
voxels based on onset and duration characteristics.
• While the data Y from each voxel is observed, the
cluster labels X are unobserved.
• Conditional on a neighborhood of voxels, a voxel’s
cluster membership is independent of all non-neighbors.
• We use the iterated conditional modes (ICM) algorithm to
approximate the maximum a posteriori estimate of X.
Experiment
• 24 participants were scanned in a 3T GE magnet.
• Participants were informed that they were to be given 2
min to prepare a 7 min speech, the topic of which would
be revealed to them during scanning. After the scan, the
speech would be delivered to a panel of expert judges.
• During a run, 215 images were acquired (TR = 2 s).
• Stress and increased heart rate were reported
throughout the entire preparation interval.
Results
[Figure: Sustained and transient activation, with time windows
45-90 s and 90-160 s and the onset of the speech task marked.
Lindquist, Waugh, & Wager 2007; Lindquist & Wager, in press.]
[Figure: Heart rate (HR) and regions 1 to 5, over the visual cue
and speech preparation periods.]
1. Visual cortex
2. Superior temporal sulci
3. Ventral striatum
4. Superior temporal sulci
5. Ventromedial PFC
The MPFC is the only area with sustained activation throughout
speech preparation.
Summary
• In many experiments the exact form of the
stimulus and/or the HRF is not known a priori.
• There exist a number of linear and non-linear
techniques for estimating the HRF, but there are
substantial differences in terms of power, bias
and parameter confusability.
• Using change point methods we can make
inference about activation with unknown onset
and duration.
Comments
• Collaborators:
– Lauren Atlas
– Ji-Meng Loh
– Lucy Robinson
– Tor Wager
• Matlab implementation of HEWMA freely
available at:
– http://www.columbia.edu/cu/psychology/tor/