The statistical analysis of fMRI data

Keith Worsley (1,2), Chuanhong Liao (1), John Aston (1,2,3), Jean-Baptiste Poline (4), Gary Duncan (5), Vali Petre (2), Frank Morales (6), Alan Evans (2), Tom Nichols (7), Satoru Hayasaka (8), Jonathan Taylor (9)

1 Department of Mathematics and Statistics, McGill University; 2 Brain Imaging Centre, Montreal Neurological Institute; 3 Academia Sinica, Taipei; 4 Neurospin, CEA, Orsay; 5 Centre de Recherche en Sciences Neurologiques, Université de Montréal; 6 Cuban Neuroscience Centre; 7 GlaxoSmithKline and FMRIB, Oxford; 8 Wake Forest University; 9 Université de Montréal and Stanford

Before you start: PCA of time × space
• Temporal components (sd, % variance explained): 1: 0.68, 46.9%; 2: 0.29, 8.6%; 3: 0.17, 2.9%; 4: 0.15, 2.4%.
• [Figure: temporal components by frame (0–140) and spatial components by slice (0–12) for components 1–4.]
• Interpretation: 1: exclude the first frames; 2: drift; 3: long-range correlation or anatomical effect, removed by converting to % of brain; 4: signal?

Bad design: 2 mins rest, 2 mins Mozart, 2 mins Eminem, 2 mins James Brown
• Temporal components (sd, % variance explained): 1: 0.41, 17%; 2: 0.31, 9.5%; 3: 0.24, 5.6%.
• Component periods: 5.2, 16.1, 15.6, 11.6 seconds.
• [Figure: temporal components by frame (0–200) and spatial components by slice (0–18), with the rest / Mozart / Eminem / J. Brown epochs marked.]

Effect of stimulus on brain response
• Stimulus: alternating hot and warm stimuli separated by rest (9 seconds each).
• Hemodynamic response function (HRF): a difference of two gamma densities; the stimulus is delayed and dispersed by ~6 s.
• The response is modeled by convolving the stimulus with the HRF: responses = stimuli * HRF, sampled every 3 seconds.

fMRI data, pain experiment, one slice
• [Figure: first scan of the fMRI data; one voxel with a highly significant effect (T = 6.59), one with no significant effect (T = -0.74) and visible drift; map of the T statistic for the hot - warm effect.]
• T = (hot - warm effect) / sd ~ t with 110 df if there is no effect.

How fMRI differs from other repeated measures data
• Many repetitions (~200 time points), few subjects (~15).
• Df within subjects is high, so it is not worth pooling the sd across subjects.
• Df between subjects is low, so use spatial smoothing to boost the df.
• Data sets are huge (~4 GB), so it is not easy to use statistics packages such as R.

FMRISTAT (Matlab) / BRAINSTAT (Python) statistical analysis strategy
• Analyse each voxel separately; borrow strength from neighbours when needed.
• Break the analysis into stages: 1st level, analyse each time series separately; 2nd level, combine 1st level results over runs; 3rd level, combine 2nd level results over subjects.
• Cut corners: do a reasonable analysis in a reasonable time (or else no one will use it!).

1st level: Linear model with AR(p) errors
• Data: Y_t = fMRI data at time t; x_t = (responses, 1, t, t², t³, ...)' to allow for drift.
• Model: Y_t = x_t'β + ε_t, with ε_t = a_1 ε_{t-1} + ... + a_p ε_{t-p} + σ_F η_t, η_t ~ N(0,1) i.i.d.
• Fit in 2 passes: 1st pass, fit by least squares, take residuals and estimate the AR parameters a_1, ..., a_p; 2nd pass, whiten the data and re-fit by least squares.
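To make the two-pass fit concrete, here is a minimal single-voxel sketch in Python/numpy (BRAINSTAT's language). The function name, the AR(1)-only error model and the toy design matrix are assumptions for illustration, not the FMRISTAT/BRAINSTAT code itself, and the bias correction and spatial smoothing of â_1 described below are omitted.

```python
import numpy as np

def fit_ar1_glm(Y, X):
    """Two-pass fit of Y = X b + e with AR(1) errors (single-voxel sketch)."""
    # Pass 1: ordinary least squares, then estimate the lag-1 autocorrelation
    b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b_ols
    a1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]   # bias correction omitted here

    # Pass 2: whiten the data and the design with the AR(1) model, then re-fit
    Yw, Xw = Y.astype(float).copy(), X.astype(float).copy()
    Yw[1:] = Y[1:] - a1 * Y[:-1]
    Xw[1:] = X[1:] - a1 * X[:-1]
    Yw[0] *= np.sqrt(1 - a1 ** 2)
    Xw[0] *= np.sqrt(1 - a1 ** 2)
    b, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)

    # Error variance and covariance of b from the whitened fit
    df = len(Y) - X.shape[1]
    sigma2 = np.sum((Yw - Xw @ b) ** 2) / df
    return b, sigma2 * np.linalg.inv(Xw.T @ Xw), df

# Hypothetical example: two box-car responses plus cubic drift
n, t = 120, np.arange(120.0)
X = np.column_stack([np.sin(t / 6) > 0, np.cos(t / 6) > 0,
                     np.ones(n), t, t ** 2, t ** 3]).astype(float)
Y = X @ np.array([1.0, 0.5, 100.0, 0, 0, 0]) + np.random.randn(n)
b, cov_b, df = fit_ar1_glm(Y, X)
c = np.array([1.0, -1, 0, 0, 0, 0])          # hot - warm contrast
T = c @ b / np.sqrt(c @ cov_b @ c)           # refer to a t distribution with df degrees of freedom
```

In the real analysis â_1 is bias-corrected and smoothed spatially before whitening, as described on the following slides.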
Higher levels: Mixed effects model
• Data: E_i = effect (a contrast in β) from the previous level; S_i = sd of that effect from the previous level; z_i = (1, treatment, group, gender, ...)'.
• Model: E_i = z_i'γ + S_i ε_i^F + σ_R ε_i^R, where ε_i^F ~ N(0,1) i.i.d. is the fixed effects error (S_i has high df, so it is assumed fixed) and ε_i^R ~ N(0,1) i.i.d. is the random effects error.
• Fit by ReML, using EM for stability (10 iterations).

Where we use spatial information
• 1st level: smooth the AR parameters to lower variability and increase the "df".
• Higher levels: smooth the random / fixed effects sd ratio to lower variability and increase the "df".
• Final level: use random field theory to correct for multiple comparisons.

1st level: Autocorrelation
• AR(1) model: ε_t = a_1 ε_{t-1} + σ_F η_t.
• Fit the linear model by least squares, take residuals ε̂_t = Y_t - Ŷ_t, and estimate â_1 = Correlation(ε̂_t, ε̂_{t-1}).
• Estimating the residuals changes their correlation structure slightly, so â_1 is slightly biased (≈ -0.05); after bias correction, â_1 ≈ 0 where there is no autocorrelation.
• [Figure: raw autocorrelation, smoothed 12.4 mm, and bias-corrected â_1 maps, colour scale -0.1 to 0.3.]

How much smoothing of â?
• Variability in â lowers the df, and the df depends on the contrast; smoothing â brings the df back up:
  df_â = df_residual (2 (FWHM_â / FWHM_data)² + 1)^(3/2)
  1 / df_eff = 1 / df_residual + 2 acor(contrast of data)² / df_â
• Hot stimulus: FWHM_data = 8.79 mm, residual df = 110, acor(contrast of data) = 0.61; the target of 100 df is reached at FWHM_â = 10.3 mm.
• Hot - warm stimulus: residual df = 110, acor(contrast of data) = 0.79; the target of 100 df is reached at FWHM_â = 12.4 mm.

Higher order AR model?
• Try AR(3): maps of â_1, â_2, â_3 suggest that AR(1) is adequate.
• The AR order has little effect on the T statistics: AR(1) (df = 100), AR(2) (df = 99) and AR(3) (df = 98) give nearly identical maps.
• Ignoring the correlation altogether biases T up by ~12% → more false positives.

2nd level: 4 runs, 3 df for the random effects sd
• [Figure: effect E_i, sd S_i and T statistic E_i / S_i for runs 1–4 and the 2nd level combination.]
• The sd is very noisy, and T > 15.96 is needed for P < 0.05 (corrected), so no response is detected.

Solution: spatial smoothing of the sd ratio
• Basic idea: increase the df by spatial smoothing (local pooling) of the sd.
• We can't smooth the random effects sd directly: too much anatomical structure.
• Instead, sd = smooth(random effects sd / fixed effects sd) × fixed effects sd, which removes the anatomical structure before smoothing.
• [Figure: random effects sd (3 df), fixed effects sd (440 df), their ratio, the smoothed ratio (≈1.3 where there is a random effect), and the resulting mixed effects sd (~100 df).]

How much smoothing of the sd ratio?
• df_ratio = df_random (2 (FWHM_ratio / FWHM_data)² + 1)^(3/2)
  1 / df_eff = 1 / df_ratio + 1 / df_fixed
• With df_random = 3, df_fixed = 4 × 110 = 440 and FWHM_data = 8 mm: FWHM_ratio = 0 gives a random effects analysis (df_eff = 3), FWHM_ratio = ∞ gives a fixed effects analysis (df_eff = 440), and the target of 100 df is reached at FWHM_ratio = 19 mm.

Final result: 19 mm smoothing, 100 df
• [Figure: effect E_i, sd S_i and T statistic for runs 1–4 and the 2nd level.]
• The sd is much less noisy, only T > 4.93 is needed for P < 0.05 (corrected), and now we can detect a response!
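A small worked sketch of the df calculation just above, reproducing the slide's numbers (df_random = 3, df_fixed = 440, FWHM_data = 8 mm, target 100 df). The function name is made up, and the use of scipy.optimize.brentq to invert the formula is just a convenience, not part of the method.

```python
import numpy as np
from scipy.optimize import brentq

def df_eff(fwhm_ratio, df_random=3, df_fixed=440, fwhm_data=8.0):
    """Effective df after smoothing the random/fixed sd ratio by fwhm_ratio (mm)."""
    df_ratio = df_random * (2 * (fwhm_ratio / fwhm_data) ** 2 + 1) ** 1.5
    return 1.0 / (1.0 / df_ratio + 1.0 / df_fixed)

print(df_eff(0.0))       # ~3: no smoothing, pure random effects analysis
print(df_eff(np.inf))    # 440: infinite smoothing, pure fixed effects analysis
# Smoothing FWHM needed to reach the target of 100 df (about 19 mm, as on the slide)
print(brentq(lambda f: df_eff(f) - 100.0, 1.0, 50.0))
```

The same trade-off is what the curves on the slide show: more smoothing gives more df, but pushes the analysis toward a purely fixed effects one.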
Final level: Multiple comparisons correction
• Compare the P = 0.05 thresholds from Bonferroni, random field theory (RFT) and discrete local maxima (DLM) against the true threshold, for Gaussian, t (20 df) and t (10 df) statistics, as a function of the FWHM of the smoothing kernel (0–10 voxels).
• Low FWHM: use Bonferroni. High FWHM: use random field theory. In between: use discrete local maxima (DLM), which can halve the P-value when FWHM ≈ 3 voxels.
• [Figure: Gaussianized threshold (3.7–4.7) and P-value (0–0.12) versus FWHM of the smoothing kernel in voxels, for each method.]
• Example, single run, hot - warm: some peaks are detected by Bonferroni and DLM but not by RFT, and some by DLM but not by Bonferroni or RFT.

FWHM: the local smoothness of the noise
• FWHM = voxel size × (2 log 2)^(1/2) / (1 - correlation)^(1/2).
• If the noise were white noise smoothed with a Gaussian kernel, this would be the kernel's FWHM.

P-values depend on resels
• Resels of the search volume = volume / FWHM³; cluster resels = cluster volume / FWHM³.
• [Figure: P-value of a local maximum (T = 4.5) versus resels of the search volume (0–1000), and P-value of a cluster above t = 3.0 versus resels of the cluster (0–2), for a search volume of 500 resels.]

Non-isotropic data (spatially varying FWHM)
• fMRI data is smoother in grey matter than in white matter.
• Varying FWHM has little effect on peak P-values: use the "average" FWHM inside the search region.
• It has a big effect on cluster P-values: smooth regions give big clusters and rough regions give small clusters, so replace cluster volume by cluster resels = volume / FWHM³.
• [Figure: FWHM (mm) of the scans (110 df) and of the effects (3 df); e.g. a cluster of 1.90 resels has P = 0.007, a cluster of 0.57 resels has P = 0.387.]

Estimating the delay of the response
• The delay, or latency to the peak of the HRF, is approximated by a linear combination of two optimally chosen basis functions: HRF(t + shift) ≈ basis1(t) w1(shift) + basis2(t) w2(shift).
• Convolve the bases with the stimulus and add them to the linear model; fit the model and estimate w1 and w2.
• Equate w2 / w1 to its estimate, then solve for the shift (Henson et al., 2002).
• To reduce bias when the magnitude is small, use shift / (1 + 1/T²), where T = w1 / Sd(w1) is the T statistic for the magnitude; this shrinks the shift to 0 where there is little evidence for a response.
• [Figure: the two basis functions and the HRF (t from -5 to 25 s); w1, w2 and w2/w1 as functions of the shift (-5 to 5 s).]
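A hedged Python sketch of the shift estimator. The HRF parameters, the shift grid, and the construction of the two bases as leading singular vectors of a family of shifted HRFs are illustrative assumptions, not the exact FMRISTAT choices; only the last two steps (inverting w2/w1 and shrinking by 1/(1 + 1/T²)) follow the slide directly, and the helper names are made up.

```python
import numpy as np
from scipy.stats import gamma
from scipy.interpolate import interp1d

def hrf(t):
    """Difference of two gamma densities (illustrative parameters only)."""
    g1 = gamma.pdf(t, a=6.0, scale=0.9)      # positive response, mode ~4.5 s
    g2 = gamma.pdf(t, a=12.0, scale=0.9)     # undershoot, mode ~10 s
    h = g1 - 0.35 * g2
    return h / np.abs(h).max()

t = np.arange(0.0, 25.0, 0.1)
shifts = np.arange(-4.0, 4.01, 0.5)
H = np.array([hrf(t - s) for s in shifts])   # the HRF shifted by each candidate delay

# Two "optimal" bases: leading right singular vectors of the shifted HRFs
_, _, Vt = np.linalg.svd(H, full_matrices=False)
basis1, basis2 = Vt[0], Vt[1]

# Weights w1(shift), w2(shift): coefficients of each shifted HRF on the two bases
W = H @ np.vstack([basis1, basis2]).T
ratio = W[:, 1] / W[:, 0]                    # w2/w1 as a function of the shift
order = np.argsort(ratio)
shift_of_ratio = interp1d(ratio[order], shifts[order])

def estimate_shift(w1, w2, T):
    """Invert w2/w1 to a shift, then shrink it toward 0 when the magnitude is weak."""
    r = np.clip(w2 / w1, ratio.min(), ratio.max())
    return float(shift_of_ratio(r)) / (1.0 + 1.0 / T ** 2)
```

In the actual analysis the two bases are convolved with the stimulus and entered into the first-level design matrix, so that w1 and w2 are fitted coefficients with their own sds.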
Shift of the hot stimulus
• [Figure: T statistic for the magnitude, T statistic for the shift, the shift (secs) and its sd (secs).]
• Where the magnitude is strong (T > 4), the shift is about 1 s (T ≈ 2 for the shift) with an sd of about ±0.5 s.

Combining shifts of the hot stimulus
• [Figure: effect E_i, sd S_i and T statistic for the shift, for runs 1–4 and MULTISTAT; contours mark where the T statistic for the magnitude > 4.]
• Final map: shift (secs) of the hot stimulus where the T statistic for the magnitude > 4.93.

Contrasts in the data used for effects
• 9 s blocks with 9 s gaps: hot sd = 0.16, warm sd = 0.16, hot - warm sd = 0.19.
• 90 s blocks with 90 s gaps: hot sd = 0.28, warm sd = 0.43, hot - warm sd = 0.55. With long blocks the contrast only uses data near the block transitions and ignores data in the middle of blocks.

Optimum block design
• [Figure: sd of the hot and hot - warm effects, for magnitude and delay, as a function of block length and gap length (0–20 s each); the best design is marked with an X, and very long blocks and gaps leave not enough signal.]

Optimum event design
• [Figure: sd of the effect (secs for delays) versus average time between events (5–20 s), for uniform, random and concentrated event spacing.]
• About 12 s between events is best for magnitudes; about 7 s is best for delays.

How many subjects?
• The largest portion of the variance comes from the last stage, combining over subjects:
  Var ≈ sd_run² / (n_run n_sess n_subj) + sd_sess² / (n_sess n_subj) + sd_subj² / n_subj.
• If you want to optimize total scanner time, take more subjects; what you do at the early stages doesn't matter very much!
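A tiny Python sketch of the variance formula above, just to illustrate the point numerically; the sd values and sample sizes are made up, and the two calls use the same total number of runs (roughly the same total scanner time).

```python
def effect_variance(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """Variance of the group effect under the three-stage decomposition above."""
    return (sd_run ** 2 / (n_run * n_sess * n_subj)
            + sd_sess ** 2 / (n_sess * n_subj)
            + sd_subj ** 2 / n_subj)

# Same total scanner time (80 runs), spent two different ways:
print(effect_variance(1.0, 0.5, 0.5, n_run=8, n_sess=1, n_subj=10))  # 0.0625
print(effect_variance(1.0, 0.5, 0.5, n_run=2, n_sess=1, n_subj=40))  # 0.025: more subjects wins
```

With the total fixed, spending it on more subjects shrinks the between-subject term, which dominates the variance.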
Features special to FMRISTAT / BRAINSTAT
• Bias correction for the AR coefficients.
• Df boosting due to smoothing of the AR coefficients and of the random/fixed effects variance ratio.
• P-value adjustment for peaks due to small FWHM (DLM) and for clusters due to spatially varying FWHM.
• Delays analysed the same way as magnitudes.
• Sd of effects available before collecting data.

Our entry in the Functional Imaging Analysis Contest (FIAC)
Jonathan Taylor, Stanford; Keith Worsley, McGill

Why a Functional Imaging Analysis Contest (FIAC)?
• Competing packages produce slightly different results; which is "correct"?
• Simulated data? Or real data, with the analyses compared?
• "Contest" session at the 2005 Human Brain Mapping conference; 9 entrants.
• Results in a special issue of Human Brain Mapping in May 2006.

The main participants
• SPM (Statistical Parametric Mapping, 1993), University College London, the "SAS" of the field (MATLAB).
• AFNI (1995), NIH, more display and manipulation, not much stats (C).
• FSL (2000), Oxford, the "upstart" (C).
• FMRISTAT (2001), McGill, stats only (MATLAB).
• BRAINSTAT (2005), Stanford/McGill, Python version of FMRISTAT.

FIAC paradigm
• 16 subjects, 4 runs per subject: 2 runs with an event design and 2 runs with a block design.
• 4 conditions per run: same sentence / same speaker, same sentence / different speaker, different sentence / same speaker, different sentence / different speaker.
• 3T, 191 frames, TR = 2.5 s.
• [Figure: modelled responses to events and to blocks, plus the beginning of block/run, 0–500 s.]

Design matrix for the block experiment
• B1 and B2 are the basis functions for magnitude and delay; the columns are: 1st sentence in block; same sentence/same speaker (B1, B2); same sentence/different speaker (B1, B2); different sentence/same speaker (B1, B2); different sentence/different speaker (B1, B2); constant, linear, quadratic and cubic drift; spline; whole brain average.

1st level analysis
• Motion and slice-timing correction (using FSL).
• 5 conditions (the beginning of block/run plus the 4 sentence/speaker conditions) and 3 contrasts:

  Contrast      beg. of block   S sent/S spk   S sent/D spk   D sent/S spk   D sent/D spk
  Sentence      0               -0.5           -0.5            0.5            0.5
  Speaker       0               -0.5            0.5           -0.5            0.5
  Interaction   0                1              -1             -1             1

• Temporal autocorrelation smoothed to control the effective df.

Efficiency
• Sd of the contrasts (lower is better) for a single run, assuming additivity of responses.
• For the magnitudes, the event and block designs have similar efficiency; for the delays, the event design is much better.
• [Figure: magnitude sd (relative to error) and delay sd (seconds) for the different sentence, different speaker and interaction contrasts, event versus block.]
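These per-run sd values, like the block- and event-design curves earlier, come from the usual pre-experiment calculation sd(c'b) = sigma * sqrt(c'(X'X)^(-1)c) applied to the convolved design matrix. Below is a hedged Python sketch with a hypothetical two-condition (hot/warm) block design and a crude HRF stand-in, just to show the mechanics; it is not the FIAC design, and the helper names are made up.

```python
import numpy as np

def contrast_sd(X, c, sigma=1.0):
    """Sd of the contrast c'b in the model Y = X b + white noise of sd sigma."""
    return sigma * np.sqrt(c @ np.linalg.pinv(X.T @ X) @ c)

# Hypothetical hot/warm block design, TR = 3 s, ~6 minutes, with cubic drift
TR, n = 3.0, 120
t = np.arange(n) * TR
h = (t / 5.0) ** 5 * np.exp(-(t - 5.0))      # crude HRF stand-in, peak ~5 s
h /= h.sum()

def design(block_len):
    hot  = ((t // block_len) % 4 == 0).astype(float)   # hot, rest, warm, rest, ...
    warm = ((t // block_len) % 4 == 2).astype(float)
    hot, warm = np.convolve(hot, h)[:n], np.convolve(warm, h)[:n]
    drift = np.vander(np.linspace(-1, 1, n), 4)        # constant ... cubic drift
    return np.column_stack([hot, warm, drift])

c = np.array([1.0, -1.0, 0, 0, 0, 0])                  # hot - warm contrast
for block in (9.0, 90.0):                              # cf. the earlier 9 s vs 90 s block comparison
    print(block, contrast_sd(design(block), c))
```

The event-versus-block comparison above is the same kind of calculation, applied to the FIAC design matrices and the sentence/speaker/interaction contrasts.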
2nd level analysis
• Analyse the event and block runs separately.
• Register the contrasts to Talairach space (using FSL); bad registration on 2 subjects, which were dropped.
• Combine the 2 runs of each design using fixed effects.

3rd level analysis
• Combine the remaining 14 subjects using random effects.
• 3 contrasts × event/block × magnitude/delay = 12 analyses.
• Threshold using the best of Bonferroni, random field theory and discrete local maxima (new!).

Results (part of the slice z = -2 mm; slice range -74 < x < 70 mm, -46 < y < 4 mm)
• [Figure: magnitude (%BOLD), different - same sentence, event experiment: effect, sd and T for each of the 14 subjects and the mixed effects analysis; random/fixed effects sd ratio smoothed 7.0 mm; per-subject df 132–278; P = 0.05 threshold for local maxima ±5.68.]
• [Figure: magnitude (%BOLD), different - same sentence, block experiment: as above; sd ratio smoothed 7.1 mm; per-subject df 200–206; P = 0.05 threshold ±5.67.]
• [Figure: delay shift (secs), different - same sentence, event experiment, within the region where the T statistic for the stimulus-average magnitude exceeds 5; sd ratio smoothed 10.7 mm; P = 0.05 threshold ±4.31.]
• [Figure: delay shift (secs), different - same sentence, block experiment, within the same region; sd ratio smoothed 8.9 mm; P = 0.05 threshold ±4.30.]
• [Figure: the four maps above (magnitude/delay × event/block) shown side by side.]

Delays in different - same sentence
• Events: 0.14 ± 0.04 s; blocks: 1.19 ± 0.23 s. Both significant at P < 0.05 (corrected) (!?!)
• Answer: take a look at the blocks. A different sentence (sustained interest) gives both a greater magnitude and a greater delay of the best-fitting block than the same sentence (lose interest).

SPM versus FMRISTAT / BRAINSTAT
• Comparison of what the two packages detect: magnitude increases for Sentence (event, block and combined) and for Speaker (combined) at (-54, -14, -2); magnitude decreases for Sentence (block and combined) at (-54, -54, 40); a delay increase for Sentence (event) at (58, -18, 2), inside the region where all conditions are activated.

Conclusions
• Greater %BOLD response for different - same sentences (1.08 ± 0.16%) and for different - same speaker (0.47 ± 0.08%).
• Greater latency for different - same sentences (0.148 ± 0.035 s).
The main effects of sentence repetition (in red) and of speaker repetition (in blue)
• [Figure: slices z = -12, 2 and 5 mm comparing FIAC entrants: 1 Meriaux et al. (Madic); 2 Goebel et al. (Brain Voyager); 3 Beckmann et al. (FSL); 4 Dehaene-Lambertz et al. (SPM2); FMRISTAT/BRAINSTAT combined block and event analysis thresholded at T > 5.67, P < 0.05.]

Functional connectivity
• Measured by the correlation between the residuals at every pair of voxels (6D data!).
• [Figure: voxel 1 versus voxel 2 scatterplots contrasting "activation only" with "correlation only".]
• Local maxima of the correlation are larger than all 12 neighbours; their P-values can be calculated using random field theory.
• Good at detecting focal connectivity, but PCA of residuals × voxels is better at detecting large regions of co-correlated voxels.
• [Figure: |correlations| > 0.7, P < 10^-10 (corrected); first principal component > threshold.]
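A final hedged sketch: computing all pairwise voxel correlations (the 6D data above) is expensive, so the example below only correlates the residual time course of one hypothetical seed voxel with every other voxel, a common simplification rather than what the slide describes; the random residuals and the function name are placeholders.

```python
import numpy as np

# Placeholder residuals: n_frames x n_voxels (e.g. whitened residuals from the 1st level)
rng = np.random.default_rng(0)
resid = rng.standard_normal((200, 5000))

def seed_correlation(resid, seed_index):
    """Correlation of the seed voxel's residual time course with every voxel."""
    R = resid - resid.mean(axis=0)               # remove voxel means
    R = R / np.sqrt((R ** 2).sum(axis=0))        # normalize each voxel to unit length
    return R.T @ R[:, seed_index]                # one correlation per voxel

corr = seed_correlation(resid, seed_index=1234)
print(corr.shape, corr[1234])                    # (5000,), and ~1 at the seed itself
```

The slide's approach instead keeps all voxel pairs, treats the 6D correlation image as a random field, and thresholds its local maxima (larger than all 12 neighbours) using random field theory.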