image data Inference on SPMs: Random Field Theory & Alternatives parameter estimates design matrix kernel realignment & motion correction Thomas Nichols, Ph.D. Department of Statistics & Warwick Manufacturing Group University of Warwick General Linear Model model fitting statistic image smoothing Thresholding & Random Field Theory normalisation Warwick fMRI Group March 18, 2010 2 Corrected thresholds & p-values Outline • • • • • • Statistical Parametric Map anatomical reference 1 Outline Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate • • • • • • Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate 3 Assessing Statistic Images Where’s the signal? High Threshold t > 5.5 Med. Threshold t > 3.5 4 Blue-sky inference: What we’d like • Don’t threshold, model the signal! Low Threshold t > 0.5 – Signal location? • Estimates and CI’s on (x,y,z) location – Signal magnitude? • CI’s on % change Good Specificity Poor Power (risk of false negatives) Poor Specificity (risk of false positives) Good Power ...but why threshold?! – Spatial extent? space • Estimates and CI’s on activation volume • Robust to choice of cluster definition • ...but this requires an explicit spatial model 6 1 Blue-sky inference: What we need Real-life inference: What we get • Need an explicit spatial model • No routine spatial modeling methods exist • Signal location – Local maximum – no inference – Center-of-mass – no inference – High-dimensional mixture modeling problem – Activations don’t look like Gaussian blobs • Sensitive to blob-defining-threshold • Signal magnitude • Need realistic shapes, sparse – Local maximum intensity – P-values (& CI’s) – Some initial work • Spatial extent • Hartvig et al., Penny et al., Xu et al. – Not part of mass-univariate framework – Cluster volume – P-value, no CI’s • Sensitive to blob-defining-threshold 7 Voxel-level Inference 8 Cluster-level Inference • Retain voxels above α-level threshold uα • Gives best spatial specificity • Two step-process – Define clusters by arbitrary threshold uclus – Retain clusters larger than α-level threshold kα – The null hyp. at a single voxel can be rejected uclus uα space space Significant Voxels No significant Voxels 9 Cluster-level Inference kα Cluster significant kα 10 Set-level Inference • Typically better sensitivity • Worse spatial specificity • Count number of blobs c – Minimum blob size k • Worst spatial specificity – The null hyp. of entire cluster is rejected – Only means that one or more of voxels in cluster active – Only can reject global null hypothesis uclus uclus space Cluster not significant Cluster not significant kα kα space Cluster significant 11 k k Here c = 1; only 1 cluster larger than k 12 2 Hypothesis Testing Outline • • • • • • • Null Hypothesis H0 • Test statistic T Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate uα – t observed realization of T • α level α – Acceptable false positive rate Null Distribution of T – Level α = P( T>uα | H0 ) – Threshold uα controls false positive rate at level α • P-value t – Assessment of t assuming H0 – P( T > t | H0 ) • Prob. of obtaining stat. as large or larger in a new experiment 13 – P(Data|Null) not P(Null|Data) P-val Null Distribution of T 14 MTP Solutions: Measuring False Positives Multiple Testing Problem • Which of 100,000 voxels are sig.? • Familywise Error Rate (FWE) – α=0.05 ⇒ 5,000 false positive voxels – Familywise Error • Existence of one or more false positives • Which of (random number, say) 100 clusters significant? – α=0.05 ⇒ 5 false positives clusters t > 0.5 t > 1.5 t > 2.5 t > 3.5 t > 4.5 t > 5.5 – FWE is probability of familywise error • False Discovery Rate (FDR) t > 6.5 – FDR = E(V/R) – R voxels declared active, V falsely so • Realized false discovery rate: V/R 15 16 MTP Solutions: Measuring False Positives FWE MTP Solutions: Bonferroni • For a statistic image T... • Familywise Error Rate (FWE) – Ti – Familywise Error ith voxel of statistic image T ...use α = α0/V • Existence of one or more false positives – FWE is probability of familywise error – α0 FWE level (e.g. 0.05) – V number of voxels – uα α-level statistic threshold, P(Ti ≥ uα) = α • False Discovery Rate (FDR) – FDR = E(V/R) – R voxels declared active, V falsely so Conservative under correlation • Realized false discovery rate: V/R 17 Independent: Some dep.: Total dep.: V tests ? tests 1 test 18 3 SPM approach: Random fields… Outline • • • • • • • Consider statistic image as lattice representation of a continuous random field • Use results from continuous random field theory Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate ≈ lattice represtntation 19 20 FWE MTP Solutions: Controlling FWE w/ Max FWE MTP Solutions: Random Field Theory • FWE & distribution of maximum FWE • Euler Characteristic χu = P(FWE) = P( ∪i {Ti ≥ u} | Ho) = P( maxi Ti ≥ u | Ho) – Topological Measure • #blobs - #holes – At high thresholds, just counts blobs • 100(1-α)%ile of max distn controls FWE – where FWE = P( maxi Ti ≥ uα | Ho) = α – FWE = P(Max voxel ≥ u | Ho) = P(One or more blobs | Ho) ≈ P(χu ≥ 1 | Ho) Never more than 1 blob ≈ E(χu | Ho) uα = F-1max (1-α) No holes α uα . 21 RFT Details: Expected Euler Characteristic E(χu) ≈ λ(Ω) |Λ|1/2 (u 2 -1) exp(-u 2/2) / (2π)2 – Ω → Search region Ω ⊂ R3 – λ(Ω) → volume – |Λ|1/2 → roughness * Stationary – Results valid w/out stationary – More accurate when stat. holds 22 Sets Suprathreshold Random Field Theory Smoothness Parameterization • E(χu) depends on |Λ|1/2 – Λ roughness matrix: • Assumptions – Multivariate Normal – Stationary* – ACF twice differentiable at 0 Threshold Random Field Only very upper tail approximates 1-Fmax(u) 23 • Smoothness parameterized as Full Width at Half Maximum – FWHM of Gaussian kernel needed to smooth a white noise random field to roughness Λ FWHM Autocorrelation Function 24 4 Random Field Theory Smoothness Parameterization • Smoothness est’d from standardized residuals • RESELS – Resolution Elements – 1 RESEL = FWHMx × FWHMy × FWHMz – RESEL Count R • R = λ(Ω) √ |Λ| = (4log2)3/2 λ(Ω) / ( FWHMx × FWHMy × FWHMz ) • Volume of search region in units of smoothness – Eg: 10 voxels, 2.5 FWHM 4 RESELS 1 2 1 3 4 5 6 2 7 3 8 9 Random Field Theory Smoothness Estimation – Variance of gradients – Yields resels per voxel (RPV) • RPV image 10 – Local roughness est. – Can transform in to local smoothness est. 4 • Beware RESEL misinterpretation • FWHM Img = (RPV Img)-1/D • Dimension D, e.g. D=2 or 3 – RESEL are not “number of independent ‘things’ in the image” • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res. . 25 RFT Details: Unified Formula Random Field Intuition • General form for expected Euler characteristic • χ2, F, & t fields • restricted search regions • D dimensions • • Corrected P-value for voxel value t Pc = P(max T > t) ≈ E(χt) ≈ λ(Ω) |Λ|1/2 t2 exp(-t2/2) E[χu(Ω)] = Σd Rd (Ω) ρd (u) Rd (Ω): d-dimensional Minkowski functional of Ω • Statistic value t increases – function of dimension, space Ω and smoothness: – Pc decreases (but only for large t) R0(Ω) R1(Ω) R2(Ω) R3(Ω) • Search volume increases – Pc increases (more severe MTP) • Smoothness increases (roughness |Λ| 1/2 – E(S) = E(N)/E(L) – S cluster size – N suprathreshold volume λ({T > uclus}) – L number of clusters • E(N) = λ(Ω) P( T > uclus ) • E(L) ≈ E(χu) – Assuming no holes χ(Ω) Euler characteristic of Ω resel diameter resel surface area resel volume ρd (Ω): d-dimensional EC density of Z(x) – function of dimension and threshold, specific for RF type: E.g. Gaussian RF: ρ0(u) = 1- Φ(u) ρ1(u) = (4 ln2)1/2 exp(-u2/2) / (2π) ρ2(u) = (4 ln2) exp(-u2/2) / (2π)3/2 ρ3(u) = (4 ln2)3/2 (u2 -1) exp(-u2/2) / (2π)2 ρ4(u) = (4 ln2)2 (u3 -3u) exp(-u2/2) / (2π)5/2 Ω 27 Random Field Theory Cluster Size Tests • Expected Cluster Size = = = = decreases) – Pc decreases (less severe MTP) 26 28 RFT Cluster Inference & Stationarity 5mm FWHM 10mm FWHM • Problem w/ VBM – Standard RFT result assumes stationarity, constant smoothness – Assuming stationarity, false positive clusters will be found in extra-smooth regions – VBM noise very non-stationary VBM: Image of FWHM Noise Smoothness Nonstationary noise… …warped to stationarity • Nonstationary cluster inference 15mm FWHM – Must un-warp nonstationarity – Reported but not implemented • Hayasaka et al, NeuroImage 22:676– 687, 2004 – Now available in SPM toolboxes 29 • http://fmri.wfubmc.edu/cms/software#NS • Gaser’s VBM toolbox (w/ recent update!) 30 5 Random Field Theory Cluster Size Distribution Random Field Theory Cluster Size Corrected P-Values • Previous results give uncorrected P-value • Corrected P-value • Gaussian Random Fields (Nosko, 1969) – Bonferroni – D: Dimension of RF • Correct for expected number of clusters • Corrected Pc = E(L) Puncorr • t Random Fields (Cao, 1999) – Poisson Clumping Heuristic (Adler, 1980) – B: Beta distn – U’s: χ2’s – c chosen s.t. E(S) = E(N) / E(L) • Corrected Pc = 1 - exp( -E(L) Puncorr ) 31 32 Random Field Theory Limitations Real Data Lattice Image Data • Sufficient smoothness • fMRI Study of Working Memory – FWHM smoothness 3-4× voxel size (Z) – More like ~10× for low-df T images – 12 subjects, block design – Item Recognition Random – Estimate is biased when images not sufficiently Continuous Field smooth • Multivariate normality – Difference image, A-B constructed for each subject – One sample t test 33 Real Data: RFT Result yes Baseline ... ... N XXXXX no 34 Outline • Threshold • • • • • • -log10 p-value – S = 110,776 – 2 × 2 × 2 voxels 5.1 × 5.8 × 6.9 mm FWHM – u = 9.870 – 5 voxels above the threshold – 0.0063 minimum FWE-corrected p-value ... D UBKDA • Second Level RFX – Virtually impossible to check • Several layers of approximations • Stationary required for cluster size results • Result ... Marshuetz et al (2000) • Active:View five letters, 2s pause, view probe letter, respond • Baseline: View XXXXX, 2s pause, view Y or N, respond ≈ • Smoothness estimation Active 35 Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate 36 6 Permutation Test Toy Example Nonparametric Inference • Parametric methods – Assume distribution of statistic under null hypothesis – Needed to find P-values, uα • Data from V1 voxel in visual stim. experiment 5% Parametric Null Distribution • Nonparametric methods – Use data to find distribution of statistic under null hypothesis – Any statistic! A: Active, flashing checkerboard B: Baseline, fixation 6 blocks, ABABAB Just consider block averages... A B A B A B 103.00 90.48 99.93 87.83 99.76 96.06 • Null hypothesis Ho 5% Nonparametric Null Distribution 37 – No experimental effect, A & B labels arbitrary • Statistic – Mean difference Permutation Test Toy Example 38 Permutation Test Toy Example • Under Ho • Under Ho – Consider all equivalent relabelings – Consider all equivalent relabelings – Compute all possible statistic values AAABBB ABABAB BAAABB BABBAA AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AABABB ABABBA BAABAB BBAAAB AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABBAB ABBAAB BAABBA BBAABA AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBBA ABBABA BABAAB BBABAA AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 ABAABB ABBBAA BABABA BBBAAA ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82 39 Permutation Test Toy Example 40 Permutation Test Toy Example • Under Ho • Under Ho – Consider all equivalent relabelings – Compute all possible statistic values – Find 95%ile of permutation distribution – Consider all equivalent relabelings – Compute all possible statistic values – Find 95%ile of permutation distribution AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86 AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15 AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67 AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25 ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82 ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82 41 42 7 Permutation Test Toy Example Controlling FWE: Permutation Test • Under Ho • Parametric methods – Consider all equivalent relabelings – Compute all possible statistic values – Find 95%ile of permutation distribution – Assume distribution of max statistic under null hypothesis • Nonparametric methods -8 -4 0 4 8 43 Permutation Test Other Statistics 5% Parametric Null Max Distribution – Use data to find distribution of max statistic 5% under null hypothesis – Again, any max statistic! Nonparametric Null Max Distribution 44 Permutation Test Smoothed Variance t • Collect max distribution • Collect max distribution – To find threshold that controls FWE – To find threshold that controls FWE • Consider smoothed variance t statistic • Consider smoothed variance t statistic – To regularize low-df variance estimate mean difference variance 45 Permutation Test Smoothed Variance t • Requires only assumption of exchangeability – To find threshold that controls FWE – Under Ho, distribution unperturbed by permutation – Allows us to build permutation distribution • Consider smoothed variance t statistic smoothed variance 46 Permutation Test Strengths • Collect max distribution mean difference t-statistic Smoothed Variance t-statistic • Subjects are exchangeable – Under Ho, each subject’s A/B labels can be flipped • fMRI scans not exchangeable under Ho – Due to temporal autocorrelation 47 48 8 Permutation Test Limitations Permutation Test Example • Computational Intensity • fMRI Study of Working Memory – Analysis repeated for each relabeling – Not so bad on modern hardware – 12 subjects, block design – Item Recognition • No analysis discussed below took more than 3 hours • Implementation Generality – Each experimental design type needs unique code to generate permutations ... • Active:View five letters, 2s pause, view probe letter, respond • Baseline: View XXXXX, 2s pause, view Y or N, respond 49 Permutation Test Example – Difference image, A-B constructed for each subject – One sample, smoothed variance t test ... D UBKDA yes Baseline ... • Second Level RFX • Not so bad for population inference with t-tests ... N XXXXX no 50 Permutation Test Example • Permute! • Compare with Bonferroni – 212 = 4,096 ways to flip 12 A/B labels – For each, note maximum of t image . Permutation Distribution Maximum t Active Marshuetz et al (2000) – α = 0.05/110,776 • Compare with parametric RFT – 110,776 2×2×2mm voxels – 5.1×5.8×6.9mm FWHM smoothness – 462.9 RESELs Maximum Intensity Projection 51 Thresholded t 52 Does this Generalize? RFT vs Bonf. vs Perm. uPerm = 7.67 58 sig. vox. uRF = 9.87 uBonf = 9.80 5 sig. vox. t11 Statistic, Nonparametric Threshold t11 Statistic, RF & Bonf. Threshold Test Level vs. t11 Threshold Smoothed Variance t Statistic, Nonparametric Threshold 53 378 sig. vox. 9 RFT vs Bonf. vs Perm. Reliability with Small Groups • Consider n=50 group study – Event-related Odd-Ball paradigm, Kiehl, et al. • Analyze all 50 – Analyze with SPM and SnPM, find FWE thresh. • Randomly partition into 5 groups 10 – Analyze each with SPM & SnPM, find FWE thresh • Compare reliability of small groups with full – With and without variance smoothing . Skip SPM t11: 5 groups of 10 vs all 50 5% FWE Threshold T>10.93 10 subj 2 8 11 15 18 35 41 43 44 50 T>10.69 10 subj 4 5 10 22 31 33 36 39 42 47 T>11.04 10 subj 1 3 20 23 24 27 28 32 34 40 SnPM t: 5 groups of 10 vs. all 50 5% FWE Threshold T>11.01 T>7.06 10 subj 10 subj 9 13 14 16 19 21 25 29 30 45 T>10.10 T>4.66 10 subj all 50 6 7 12 17 26 37 38 46 48 49 2 8 11 15 18 35 41 43 44 50 T>6.49 10 subj 57 4 5 10 22 31 33 36 39 42 47 SnPM SmVar t: 5 groups of 10 vs. all 50 5% FWE Threshold T>4.69 10 subj 2 8 11 15 18 35 41 43 44 50 T>4.84 10 subj 4 5 10 22 31 33 36 39 42 47 T>5.04 10 subj 1 3 20 23 24 27 28 32 34 40 T>4.64 10 subj 6 7 12 17 26 37 38 46 48 49 56 T>4.57 10 subj 9 13 14 16 19 21 25 29 30 45 T>8.28 10 subj 1 3 20 23 24 27 28 32 34 40 T>6.19 10 subj 6 7 12 17 26 37 38 46 48 49 T>6.3 10 subj 9 13 14 16 19 21 25 29 30 45 T>9.00 T>4.09 all 50 Arbitrary thresh 58 of 9.0 Outline • • • • • • Orientation Assessing Statistic images The Multiple Testing Problem Random Field Theory & FWE Permutation & FWE False Discovery Rate T>9.00 all 50 Arbitrary thresh 59 of 9.0 60 10 MTP Solutions: Measuring False Positives False Discovery Rate • For any threshold, all voxels can be cross-classified: • Familywise Error Rate (FWE) – Familywise Error • Existence of one or more false positives – FWE is probability of familywise error Accept Null Reject Null Null True V0A V0R m0 Null False V1A V1R m1 NA NR V • Realized FDR rFDR = V0R/(V1R+V0R) = V0R/NR • False Discovery Rate (FDR) – FDR = E(V/R) – R voxels declared active, V falsely so – If NR = 0, rFDR = 0 • But only can observe NR, don’t know V1R & V0R • Realized false discovery rate: V/R – We control the expected rFDR FDR = E(rFDR) 61 False Discovery Rate Illustration: 62 Control of Per Comparison Rate at 10% 11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% Percentage of Null Pixels that are False Positives Noise 9.5% Control of Familywise Error Rate at 10% Signal Occurrence of Familywise Error FWE Control of False Discovery Rate at 10% Signal+Noise 6.7% 63 0 i/V × q 1 i/V 65 P-value threshold when no signal: α/V Ordered p-values p(i) 1 p(i) p-value • Reject all hypotheses corresponding to p(1), ... , p(r). JRSS-B (1995) 57:289-300 0 p(i) ≤ i/V × q 8.7% 64 Adaptiveness of Benjamini & Hochberg FDR Benjamini & Hochberg Procedure • Select desired limit q on FDR • Order p-values, p(1) ≤ p(2) ≤ ... ≤ p(V) • Let r be largest i such that 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% Percentage of Activated Pixels that are False Positives Fractional index i/V P-value threshold when all signal: α66 11 Benjamini & Hochberg Procedure Details Benjamini & Hochberg: Key Properties • Standard Result • FDR is controlled E(rFDR) ≤ q m0/V – Positive Regression Dependency on Subsets P(X1≥c1, X2≥c2, ..., Xk≥ck | Xi=xi) is non-decreasing in xi – Conservative, if large fraction of nulls false • Only required of null xi’s – Positive correlation between null voxels – Positive correlation between null and signal voxels • Adaptive • Special cases include – Threshold depends on amount of signal – Independence – Multivariate Normal with all positive correlations • More signal, More small p-values, More p(i) less than i/V × q/c(V) • Arbitrary covariance structure – Replace q by q/c(V), c(V) = Σi=1,...,V 1/i ≈ log(V)+0.5772 – Much more stringent Benjamini & Yekutieli (2001). Ann. Stat. 67 68 29:1165-1188 Controlling FDR: Varying Signal Extent p= Signal Intensity 3.0 Controlling FDR: Varying Signal Extent z= Signal Extent 1.0 p= 69 Noise Smoothness 3.0 1 Controlling FDR: Varying Signal Extent p= Signal Intensity 3.0 Signal Extent 2.0 70 Noise Smoothness 3.0 2 Controlling FDR: Varying Signal Extent z= Signal Extent 3.0 Signal Intensity 3.0 z= p = 0.000252 71 Noise Smoothness 3.0 3 Signal Intensity 3.0 z = 3.48 Signal Extent 5.0 72 Noise Smoothness 3.0 4 12 Controlling FDR: Varying Signal Extent p = 0.001628 Signal Intensity 3.0 Controlling FDR: Varying Signal Extent z = 2.94 Signal Extent 9.5 p = 0.007157 73 Noise Smoothness 3.0 5 Controlling FDR: Varying Signal Extent p = 0.019274 Signal Intensity 3.0 z = 2.45 74 Signal Extent 16.5 Noise Smoothness 3.0 6 Controlling FDR: Benjamini & Hochberg • Illustrating BH under dependence z = 2.07 – Extreme example of positive dependence 1 8 voxel image 32 voxel image 0 p-value p(i) (interpolated from 8 voxel image) Signal Intensity 3.0 Signal Extent 25.0 Noise Smoothness 3.0 i/V × q/c(V) 1 i/V 76 7 Real Data: FDR Example Conclusions • Must account for multiplicity • Threshold – Otherwise have a fishing expedition – Indep/PosDep u = 3.83 – Arb Cov u = 13.15 • FWE – Very specific, not so sensitive – Random Field Theory • Great for single-subject fMRI, EEG • Result – 3,073 voxels above Indep/PosDep u – <0.0001 minimum FDR-corrected p-value 0 75 – Nonparametric / SnPM • Much better power voxel-wise than RFT for small DF • FDR FDR Threshold = 3.83 3,073 voxels FWE Perm. Thresh. = 9.87 77 7 voxels – Less specific, more sensitive – Interpret with care! • FP risk is over whole set of surviving voxels 78 13 References • Most of this talk covered in these papers TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003. TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001. CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002. 79 14