Modeling fMRI data with uncertain hemodynamic response or stimulus functions

Martin Lindquist
Department of Statistics
Columbia University

Functional MRI
• Functional MRI (fMRI) performed using BOLD contrast can detect changes in blood oxygenation and flow that occur in response to neural activity.
• A primary goal of fMRI research is to use information provided by the BOLD signal to make conclusions about the underlying neuronal activity.

Overview
[Diagram: Stimulus, Neuronal Activity, Hemodynamics]
Part I: Given data and stimulus function, estimate the hemodynamic response function (HRF).
Part II: Given data only, estimate activity.

Modeling the Hemodynamics
• A number of methods exist for modeling the relationship between stimulus and BOLD response.
  – Linear time invariant (LTI) system
    − BOLD responses to events add linearly
    − Relatively simple to use
  – Non-linear models (e.g. Balloon model)
    − Consist of a set of ODEs
    − More complicated/time-consuming than linear models
• Both types provide a means for estimating the HRF.

Estimating the HRF
• The ability to accurately model the evoked hemodynamic response to a neural event plays an important role in the analysis of fMRI data.
• When analyzing the shape of the estimated HRF, summary measures (e.g., amplitude, delay, and duration) can be extracted.
• They can be used to infer information regarding the intensity, onset latency, and duration of the underlying activity.

Summary Measures
Estimate amplitude (H), time-to-peak (T), and full-width at half-max (W).
Ideally, these parameters should be directly interpretable in terms of changes in neuronal activity, and estimated so that statistical power is maximized.

Interpretability
Q1. Do changes in parameters related to neural activity directly translate into changes in corresponding parameters of the HRF?
Q2. Does the HRF model recover the true parameters of the response?
[Figure legend: solid lines show expected relationships; dashed lines show relationships that complicate interpretation.]
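
As a rough illustration of the summary measures H, T, and W, the Python sketch below extracts them from a sampled response curve. The function name `hrf_summary`, the linear interpolation of the half-max crossings, and the Gaussian-shaped test curve are assumptions made for illustration only; the slides also mention closed-form solutions for some models, which are not shown here.

```python
import numpy as np

def hrf_summary(t, h):
    """Return (H, T, W) for a sampled response with a single positive peak."""
    t = np.asarray(t, dtype=float)
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(h))
    H, T = float(h[peak]), float(t[peak])
    half = H / 2.0

    # Step outward from the peak to the first samples below half-max.
    left = peak
    while left > 0 and h[left] >= half:
        left -= 1
    right = peak
    while right < len(h) - 1 and h[right] >= half:
        right += 1

    def crossing(outside, inside):
        # Linearly interpolate the half-max crossing between two samples.
        if h[outside] >= half:  # curve never fell below half-max on this side
            return t[outside]
        frac = (half - h[outside]) / (h[inside] - h[outside])
        return t[outside] + frac * (t[inside] - t[outside])

    W = crossing(right, right - 1) - crossing(left, left + 1)
    return H, T, float(W)


# Hypothetical test curve (not an HRF model from the slides):
# a Gaussian bump peaking at 6 s with standard deviation 1.5 s.
t = np.arange(0, 30, 0.1)
h = 2.0 * np.exp(-((t - 6.0) ** 2) / (2 * 1.5**2))
H, T, W = hrf_summary(t, h)
print(f"H={H:.2f}, T={T:.1f} s, W={W:.2f} s")
```

Reading the measures off the fitted curve in this way works for any model that yields a sampled HRF estimate, at the cost of a discretization error on the order of the sampling step.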
• BOLD physiology limits the interpretability of parameters in terms of neuronal and metabolic function.
• We treat the evoked BOLD response as the signal of interest, without making a direct quantitative link to neuronal activity.
• Here we focus on the ability of different models to recover differences in the height, time-to-peak, and width of the true BOLD response.
• Which model is most efficient while giving rise to the least amount of bias and misspecification?

LTI System
• The dominant analysis strategy is to assume that BOLD responses to events add linearly (Boynton et al. 1996) and use a set of smooth functions to model the underlying HRF.
• We model the relationship between stimuli and BOLD response using a linear time invariant (LTI) system.
• The stimulus acts as the input and the HRF acts as the impulse response function.

Convolution Examples
[Figure: for a block design and an event-related design, Experimental Stimulus Function * Hemodynamic Response Function = Predicted Response]

General Linear Model
• The general linear model (GLM) approach treats the data as a linear combination of model functions (predictors) plus noise (error).
• The model functions are assumed to have known shapes, but their amplitudes are unknown and need to be estimated.
• The GLM framework encompasses many of the commonly used techniques in fMRI data analysis (and data analysis more generally).

Matrix Formulation
We can write the GLM as
\[ Y = X\beta + \varepsilon \]
where
\[
\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{\text{fMRI data}}
=
\underbrace{\begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{bmatrix}}_{\text{Design matrix}}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}}_{\text{Model parameters}}
+
\underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_{\text{Noise}}
\]

GLM - Solution
Assume the model:
\[ Y = X\beta + \varepsilon, \qquad \mathrm{Var}(\varepsilon) = V\sigma^2 \]
If $V$ is known, the optimal solution for $\beta$ is:
\[ \hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y \]
Inference is performed using linear combinations of $\hat{\beta}$.

Basis Functions
• A linear combination of several basis functions can be used to account for possible delays and dispersions in the HRF.
• The stimulus function is convolved with each of the basis functions to give a set of regressors.
• The parameter estimates give the coefficients that determine the combination of basis functions that best model the HRF for the trial type and voxel in question.

Basis Functions
Examples:
• Canonical HRF + derivatives
• Finite impulse response functions
• Many more

Basis Functions
[Figure: models (Single HRF, HRF + derivatives, Finite Impulse Response (FIR)) plotted against Time (s), with image of predictors and data & fitted response]

Smooth FIR
• The FIR solution tends to be very noisy.
• To constrain the fit to be smoother (but otherwise of arbitrary shape), a Gaussian prior can be placed on the filter parameters $\beta$.
• The maximum a posteriori estimate of $\beta$ gives a smoothed version of the filter.
[Figure legend: red, FIR; blue, Smooth FIR]

Non-linear Models
• Alternatively, one can use non-linear models with free parameters for magnitude and onset/peak delay.
• Common criticisms of such approaches are their computational costs and potential convergence problems.
• However, increases in computational power make non-linear estimation over the whole brain feasible.

Inverse Logit Model
• Superposition of three inverse logit (sigmoid) functions:
\[ L(x) = (1 + e^{-x})^{-1} \]
• Each function has three variable parameters dictating the amplitude, position and slope:
\[ h(t \mid \theta) = \alpha_1 L\!\left(\frac{t - T_1}{D_1}\right) + \alpha_2 L\!\left(\frac{t - T_2}{D_2}\right) + \alpha_3 L\!\left(\frac{t - T_3}{D_3}\right) \]
Lindquist & Wager (2007)

Flexibility
By shifting the position of the second IL function one can model differences in duration.
By shifting the position of all three IL functions one can model differences in onset.

Model fitting
• Model fitting is performed using either simulated annealing or gradient descent.
• We typically constrain the solution so that the fitted response begins and ends at 0, which leads to a model with 7 variable parameters.
• Alternatively, we use a 4-parameter model where only the position of each function and the total amplitude are allowed to vary.

Simulation Study
• We compare the different models' ability to handle shifts in onset and duration.
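
The inverse logit (IL) model described on the preceding slides can be sketched in a few lines of Python. This is only an illustration of the functional form: the function names and the parameter values are hypothetical, the amplitudes are simply chosen to sum to zero so the curve returns to baseline, and neither the exact begin-at-zero constraint nor the fitting step (simulated annealing or gradient descent) is implemented.

```python
import numpy as np

def inv_logit(x):
    """Inverse logit (sigmoid) function L(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def il_hrf(t, alpha, T, D):
    """Sum of three inverse logit functions with amplitudes alpha,
    positions T and slope parameters D (each of length 3)."""
    t = np.asarray(t, dtype=float)[:, None]          # shape (n, 1)
    alpha, T, D = (np.asarray(v, dtype=float) for v in (alpha, T, D))
    return (alpha * inv_logit((t - T) / D)).sum(axis=1)

# Hypothetical parameter values chosen only to give an HRF-like shape.
t = np.arange(0, 60, 0.5)
alpha = (1.0, -1.3, 0.3)      # sums to zero, so h(t) -> 0 for large t
D = (0.8, 1.2, 1.5)
h_short = il_hrf(t, alpha, (2.5, 7.0, 12.0), D)

# Shifting only the second IL function later extends the response duration.
h_long = il_hrf(t, alpha, (2.5, 10.0, 12.0), D)
```

Shifting all three positions by the same amount would instead move the onset while leaving the shape unchanged.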
The models we studied were:
GLM based:
  – The canonical HRF (GAM)
  – The canonical HRF + 1st derivative (TD)
  – The canonical HRF + 1st & 2nd derivative (DD)
  – The FIR model (FIR)
  – The Smooth FIR model (sFIR)
Non-linear:
  – Non-linear Gamma (NL)
  – Inverse Logit Model (IL)
Lindquist & Wager (2007); Lindquist, Loh, Atlas & Wager (2008)

Estimation
• After fitting each model we estimate H, T and W using closed-form solutions (when available) or the fitted HRF.
• For models that include the canonical HRF and its derivatives, it is common to use only the non-derivative term as an estimate of amplitude.
• However, this estimate will be biased, so we instead use the “derivative boost” (Calhoun et al., 2004).

Simulation
[Figure, panels A to C: stimulus function assumed in analysis of simulation data; assumed response (black) with delayed “true” response (gray); “true” response of extended duration. Grid axes: Duration (1, 3, 5, 7, 9) and Onset shift (1 to 5).]

Simulation
• Datasets were generated for 15 “subjects”, consisting of the “true” BOLD time series plus white noise, with plausible effect size (noise std equal to 1, Cohen’s d = 0.5).
• Estimates of amplitude (H), time-to-peak (T) and width (W) were obtained for each model. The average values across the 15 subjects were compared with the true values to assess model-dependent bias in the estimates.
• In addition, for each subject and voxel the residuals were checked for model misspecification.

Detecting Mis-modeling
• Let $r(i)$ be the whitened residuals and $K(i)$ a Gaussian kernel.
• When no mis-modeling is present,
\[ Y_w(t) = \sum_{i=t}^{t+w-1} r(i)\,K(t-i) \]
is normal with mean 0 for all $w$, $t$.
• Calculate the probability associated with $\max_t Y_w(t)$ using random field theory.
Loh, Lindquist & Wager (2008)

[Figure: bias in H, T and W, and mis-modeling, for each model (GAM, TD, DD, FIR, sFIR, NL, IL) over Duration (1, 3, 5, 7, 9) and Onset shift (1 to 5). Color scale: Negative Bias, No Significant Bias, Positive Bias.]

Inference
• We perform population inference using the estimated amplitude for each subject and one of the following methods:
  – Summary statistics approach
    − Assumes normality
  – Bootstrap
    − Non-parametric
    − Use the bias-corrected accelerated (BCa) version
  – Sign-permutation test
    − Non-parametric

Results
[Figure]

Conclusions
• The canonical HRF based models (GAM, TD & DD) are highly susceptible to model misspecification.
• The FIR models (FIR & sFIR) and the IL model provide the most flexibility to handle differences in onset and duration.
• The IL model performs best in terms of bias and model misspecification, but is computationally more demanding than the FIR models.

Overview
[Diagram: Stimulus, Neuronal Activity, Hemodynamics, ?]
Part I: Given data and stimulus function, estimate the hemodynamic response function (HRF).
Part II: Given data only, estimate activity.

Unknown Stimulus Functions
• Most statistical analyses of fMRI data assume that the timing and duration of the psychological processes are known.
• However, in many studies, it is hard to specify this information a priori (e.g., threat/emotional experience and drug uptake).
• In these situations a standard GLM-based analysis is not appropriate and alternatives need to be explored.

Change Point Analysis
• Our approach uses change point analysis to detect changes in brain activity without prior knowledge of the exact onset or duration.
• We can make population inferences on whether, when, and for how long an fMRI time series deviates from a baseline level.
• We can then characterize brain responses in terms of their relationship to physiological changes (e.g.
reported stress).

Change Point Analysis
• We propose a three-step procedure:
  1. Use HEWMA (Hierarchical EWMA) to determine voxels with time courses that deviate from baseline in the population (Lindquist, Waugh, & Wager 2007).
  2. Estimate voxel-specific distributions of onset times and durations from the fMRI response.
  3. Perform spatial clustering of voxels according to onset and duration characteristics, and anatomical location, using a hidden Markov random field model.

HEWMA
Two states: Active/Inactive.
Use the weighted average of each subject's EWMA statistic to get group results:
\[ Z_{pop} = \Big(\sum_{i=1}^{m} W_i^*\Big)^{-1} \sum_{i=1}^{m} W_i^* Z_i, \qquad W_i^* = (\Sigma_i^* + \Sigma_b)^{-1} \]
Search across time for deviations from the baseline (inactive) state.
Calculate the smoothed EWMA statistic across time ($t$) for each subject:
\[ z_t = \lambda x_t + (1 - \lambda) z_{t-1}, \qquad t = 1, \ldots, n \]
Monte Carlo simulation provides correction for multiple comparisons.

Estimating Change Points
• Each subject’s time series is a sequence of normally distributed observations that may at some unknown time $\tau_i$ undergo a shift in mean.
• This in turn may be followed by a return to baseline at $\tau_i + \omega_i$, where $\omega_i$ is also unknown.
• Both $\tau_i$ and $\omega_i$ are random variables drawn from unknown population distributions: $g_\tau(t) = P(\tau_i = t)$ and $g_\omega(t) = P(\omega_i = t)$, respectively.
• The distributions are estimated assuming no functional form, and allowing for the possibility of no response.
• We assume contiguous observations come from the same component except at $\tau_i$ and $\tau_i + \omega_i$.

[Figure: time series with onset $\tau_i \sim g_\tau$ and return to baseline at $\tau_i + \omega_i$, with $\omega_i \sim g_\omega$]
Inactive: $N(\mu_1, \sigma_1^2)$
Active: $N(\mu_2, \sigma_2^2)$
Estimate: $\theta_1 = (\mu_1, \sigma_1)$, $\theta_2 = (\mu_2, \sigma_2)$, $g_\tau$, $g_\omega$.

Conditional likelihood ($\tau_i$ and $\omega_i$ known):
[Equation with terms labeled: Baseline state, Active state, Baseline state]
Joint likelihood ($\tau_i$ and $\omega_i$ unknown):
[Equation with terms labeled: Onset Distribution, Duration Distribution]

By treating $\tau_i$ and $\omega_i$ as missing data, we can employ the EM algorithm to calculate the MLE of $g_\tau(t)$ and $g_\omega(t)$.
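
To make the change point model concrete, the following Python sketch evaluates the conditional log-likelihood of one time series when the onset and duration are treated as known. It is a minimal illustration, not the authors' implementation: the function names, the 0-based indexing convention (active for onset <= t < onset + duration), and the use of fixed, known state parameters are assumptions, and the EM estimation of the onset and duration distributions is not shown.

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def conditional_loglik(y, onset, duration, theta1, theta2):
    """Log-likelihood of y given onset and duration.

    theta1 = (mu1, sigma1) describes the baseline (inactive) state,
    theta2 = (mu2, sigma2) the active state. Observations with
    onset <= t < onset + duration are treated as active (assumed convention).
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    active = (t >= onset) & (t < onset + duration)
    mu = np.where(active, theta2[0], theta1[0])
    sigma = np.where(active, theta2[1], theta1[1])
    return float(normal_logpdf(y, mu, sigma).sum())

# Hypothetical example: baseline near 0, shift to 3 for 20 time points.
n = 100
y = np.zeros(n)
y[30:50] = 3.0
y += 0.1 * np.sin(np.arange(n))   # small deterministic perturbation

# Exhaustive search over (onset, duration) pairs that fit inside the series.
candidates = [(o, d) for o in range(n) for d in range(1, n - o + 1)]
best = max(candidates,
           key=lambda od: conditional_loglik(y, od[0], od[1], (0.0, 1.0), (3.0, 1.0)))
print(best)
```

In the approach described on the slides the onset and duration are not maximized over in this way; they are treated as missing data, so a term of this kind would be weighted by the onset and duration distributions inside the joint likelihood.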
Using the estimated distributions we can calculate the probability of activation as a function of time:
\[ P(\text{activation at time } t) = \sum_{j=1}^{t} P(\tau = j)\, P(\omega > t - j) \]
where $\tau$ is the onset time and $\omega$ is the duration.

Simulated data
[Figure: panels A, B, C, D]

Spatial Clustering
• A hidden Markov random field model is used to cluster voxels based on onset and duration characteristics.
• While the data Y from each voxel are observed, the cluster labels X are unobserved.
• Conditional on a neighborhood of voxels, a voxel’s cluster membership is independent of all non-neighbors: [Equation]
• We use the ICM algorithm to approximate the maximum a posteriori estimate of X: [Equation]

Experiment
• 24 participants were scanned in a 3T GE magnet.
• Participants were informed that they were to be given 2 min to prepare a 7 min speech, the topic of which would be revealed to them during scanning. After the scan, the speech would be delivered to a panel of expert judges.
• During a run, 215 images were acquired (TR = 2 s). Stress and increased heart rate were reported throughout the entire preparation interval.

Results
[Figure: activation maps and time courses with numbered regions, HR, and periods Visual cue | Speech preparation. Labels: Sustained, Transient, 45-90 s, 90-160 s, Onset of speech task.]
1. Visual cortex
2. Superior temporal sulci
3. Ventral striatum
4. Superior temporal sulci
5. Ventromedial PFC
MPFC was the only area with sustained activation throughout speech preparation.
Lindquist, Waugh, & Wager 2007; Lindquist & Wager, in press

Summary
• In many experiments the exact form of the stimulus and/or HRF is not known a priori.
• There exist a number of linear and non-linear techniques for estimating the HRF, but there are substantial differences in terms of power, bias and parameter confusability.
• Using change point methods we can make inference about activation with unknown onset and duration.

Comments
• Collaborators:
  – Lauren Atlas
  – Ji-Meng Loh
  – Lucy Robinson
  – Tor Wager
• Matlab implementation of HEWMA freely available at:
  – http://www.columbia.edu/cu/psychology/tor/