Group analyses of fMRI data Klaas Enno Stephan Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering, University of Zurich & ETH Zurich Laboratory for Social & Neural Systems Research (SNS), University of Zurich Wellcome Trust Centre for Neuroimaging, University College London With many thanks for slides & images to: FIL Methods group, particularly Will Penny & Tom Nichols Methods & models for fMRI data analysis November 2012 Overview of SPM Image time-series Realignment Kernel Design matrix Smoothing General linear model Statistical parametric map (SPM) Statistical inference Normalisation Gaussian field theory p <0.05 Template Parameter estimates Reminder: voxel-wise time series analysis! model specification Time parameter estimation hypothesis statistic BOLD signal single voxel time series SPM The model: voxel-wise GLM p 1 1 1 p y N = N X y X e e ~ N (0, I ) 2 + e N Model is specified by 1. Design matrix X 2. Assumptions about e N: number of scans p: number of regressors The design matrix embodies all available knowledge about experimentally controlled factors and potential confounds. GLM assumes Gaussian “spherical” (i.i.d.) errors sphericity = iid: error covariance is scalar multiple of identity matrix: Cov(e) = 2I Examples for non-sphericity: 4 0 Cov(e) 0 1 non-identity 1 0 Cov(e) 0 1 2 1 Cov(e) 1 2 non-independence Multiple covariance components at 1st level V Cov( e) e ~ N (0, V ) 2 enhanced noise model V = 1 V iQi error covariance components Q and hyperparameters Q1 + 2 Q2 Estimation of hyperparameters with ReML (restricted maximum likelihood). t-statistic based on ML estimates Wy WX We ̂ (WX ) Wy c=10000000000 c ˆ t T ˆ ˆ st d ( c ) T W V stˆd (cT ˆ ) ˆ c (WX ) (WX ) c 1 / 2 2V Cov (e) 2 T ˆ 2 T Wy WXˆ 2 tr ( R) R I WX (WX ) X V Q i i For brevity: ReMLestimates (WX ) ( X TWX ) 1 X T Fixed vs. random effects analysis • Fixed Effects – Intra-subject variation suggests most subjects different from zero • Random Effects – Inter-subject variation suggests population is not very different from zero Distribution of each subject’s estimated effect 2FFX Subj. 1 Subj. 2 Subj. 3 Subj. 4 Subj. 5 Subj. 6 0 2RFX Distribution of population effect 8 Fixed Effects • Assumption: variation (over subjects) is only due to measurement error • parameters are fixed properties of the population (i.e., they are the same in each subject) Random/Mixed Effects • two sources of variation (over subjects) – measurement error – Response magnitude: parameters are probabilistically distributed in the population • effect (response magnitude) in each subject is randomly distributed Random/Mixed Effects • two sources of variation (over subjects) – measurement error – Response magnitude: parameters are probabilistically distributed in the population • effect (response magnitude) in each subject is randomly distributed • variation around population mean Group level inference: fixed effects (FFX) • assumes that parameters are “fixed properties of the population” • all variability is only intra-subject variability, e.g. due to measurement errors • Laird & Ware (1982): the probability distribution of the data has the same form for each individual and the same parameters • In SPM: simply concatenate the data and the design matrices lots of power (proportional to number of scans), but results are only valid for the group studied and cannot be generalized to the population Group level inference: random effects (RFX) • assumes that model parameters are probabilistically distributed in the population • variance is due to inter-subject variability • Laird & Ware (1982): the probability distribution of the data has the same form for each individual, but the parameters vary across individuals • hierarchical model much less power (proportional to number of subjects), but results can be generalized to the population FFX vs. RFX • FFX is not "wrong", it makes different assumptions and addresses a different question than RFX • For some questions, FFX may be appropriate (e.g., low-level physiological processes). • For other questions, RFX is much more plausible (e.g., cognitive tasks, disease processes in heterogeneous populations). Hierachical models fMRI, single subject fMRI, multi-subject EEG/MEG, single subject ERP/ERF, multi-subject Hierarchical models for all imaging data! Linear hierarchical model Hierarchical model Multiple variance components at each level y X (1) (1) (1) (1) X ( 2) ( 2) ( 2) C Q (i) (i) k k ( n 1) X ( n ) ( n ) ( n ) At each level, distribution of parameters is given by level above. What we don’t know: distribution of parameters and variance parameters (hyperparameters). (i) k Example: Two-level model 1 1 yX 1 1 X 2 2 2 1 X 1(1) y = 2 + 1 X 2(1) 1 = X 2 + 2 X 3(1) Second level First level Two-level model y X (1) (1) (1) (1) X (2) (2) (2) y X (1) X (2) (2) (2) (1) X (1) X (2) (2) X (1) (2) (1) fixed effects Friston et al. 2002, NeuroImage random effects Mixed effects analysis Non-hierarchical model y X (1) X (2) (2) X (1) (2) (1) ˆ(1) X (1) y X (2) (2) (2) X (1) (1) Estimating 2nd level effects X (2) (2) (2) Variance components at 2nd level Cov (2) C (2) X (1) (1) C X (1)T within-level between-level non-sphericity non-sphericity Within-level non-sphericity at both levels: multiple covariance components C (i ) k Qk(i ) (i ) k Friston et al. 2005, NeuroImage Algorithmic equivalence y X (1) (1) (1) Hierarchical model (1) X ( 2) ( 2) ( 2) Parametric Empirical Bayes (PEB) ( n 1) X ( n ) ( n ) ( n ) EM = PEB = ReML Single-level model y (1) X (1) ( 2 ) ... X (1) X ( n 1) ( n ) X (1) X ( n ) ( n ) Restricted Maximum Likelihood (ReML) Estimation by Expectation Maximisation (EM) y X N 1 N p p1 N 1 EM-algorithm C | y 1 XT C 1X C 1 η | y C | y X C y C η C k Qk T 1 E-step k maximise L ln p( y | λ) • E-step: finds the (conditional) expectation of the parameters, holding the hyperparameters fixed • M-step: updates the maximum likelihood estimate of the hyperparameters, keeping the parameters fixed dL g d d 2L J 2 d J 1 g M-step Gauss-Newton gradient ascent Friston et al. 2002, NeuroImage Practical problems • Full MFX inference using REML or EM for a wholebrain 2-level model has enormous computational costs • for many subjects and scans, covariance matrices become extremely large • nonlinear optimisation problem for each voxel • Moreover, sometimes we are only interested in one specific effect and do not want to model all the data. • Is there a fast approximation? Summary statistics approach: Holmes & Friston 1998 First level Data Design Matrix ̂1 ̂ 12 Second level Contrast Images t cT ˆ Vaˆr (cT ˆ ) SPM(t) ̂ 2 ˆ 22 ̂11 ̂ 112 ̂12 ̂ 122 One-sample t-test @ 2nd level Validity of the summary statistics approach The summary stats approach is exact if for each session/subject: Within-session covariance the same First-level design the same One contrast per session But: Summary stats approach is fairly robust against violations of these conditions. Mixed effects analysis: spm_mfx y data X [ X (0) V I Summary statistics non-hierarchical model X [ X ( 0) X (1) ] X (1) X ( 2) ] Q {Q1(1) ,, X (1) Q1( 2) X (1)T ,} Step 1 ˆ (1) ( X TV 1 X ) 1 X TV 1 y REML{ yyT n , X , Q} Y ˆ (1) X X ( 2) V (i1) X (1) Qi(1) X (1) T (j2 )Q (j 2 ) i EM approach Friston et al. 2005, NeuroImage j 1st level non-sphericity 2nd level non-sphericity Step 2 ˆ ( 2) ( X TV 1 X ) 1 X TV 1 y ˆ(2) pooling over voxels 2nd level non-sphericity modeling in SPM8 • 1 effect per subject → use summary statistics approach • >1 effect per subject → model sphericity at 2nd level using variance basis functions Reminder: sphericity C Cov( ) E ( ) T y X „sphericity“ means: Scans Cov( ) I 2 i.e. Var ( i ) 1 0 Cov( ) 0 1 Scans 2 2nd level: non-sphericity Error covariance Errors are independent but not identical: e.g. different groups (patients, controls) Errors are not independent and not identical: e.g. repeated measures for each subject (multiple basis functions, multiple conditions etc.) Example of 2nd level non-sphericity Error Covariance Qk: ... ... Example of 2nd level non-sphericity y=X +e N1 Np p1 Cor(ε) =Σk λkQk N1 error covariance • 12 subjects, 4 conditions N • Measurements between subjects uncorrelated • Measurements within subjects correlated • Errors can have different variances across subjects N 30 2nd level non-sphericity modeling in SPM8: assumptions and limitations • Cor() assumed to be globally homogeneous • k’s only estimated from voxels with large F • intrasubject variance assumed homogeneous Practical conclusions • Linear hierarchical models are used for group analyses of multisubject imaging data. • The main challenge is to model non-sphericity (i.e. non-identity and non-independence of errors) within and between levels of the hierarchy. • This is done by estimating hyperparameters using EM or ReML (which are equivalent for linear models). • The summary statistics approach is robust approximation to a full mixed-effects analysis. – Use mixed-effects model only, if seriously in doubt about validity of summary statistics approach. Recommended reading Linear hierarchical models Mixed effect models Thank you