nd 2 Level GLM Emily Falk, Ph.D. 1 (De-noise) Slice Timing Correct Predictors Realign Smooth Acquire Functional s Y Determine Scanning Parameters CoRegister X y = Xβ + ε 1st level (Subject ) GLM β Acquire Structurals (T1) βhow - βwhy Normalize Contrast Template 2nd level (Group) GLM Threshold 2 Groups of Subjects • So far: Analyzing each individual voxel from one person • How do subjects combine data from groups of subjects? – Often referred to as 2nd-level random effects analysis • Basic approach: – Normalize SPMs from each subject into a standard space – Test whether statistic from a given voxel is significantly different from 0 across subjects – Correct for multiple comparisons 3 An Example - 4 Region A: β1-β2 (repeat for all regions) Subject 1: 32 Subject 2: 18 Subject 3: -4 Subject 4: 45 Subject 5: 23 Mean : 22.8 * 5 (Aron et al., 2005) Fixed and Random Effects • Fixed effect – Always the same, from experiment to experiment, levels are not draws from a random variable – Sex (M/F) – Drug type (Prozac) • Random effect – Levels are not randomly sampled from a population – Subject – Day, in a longitudinal design • If effect is treated as fixed, error terms in model do not include variability across levels – Cannot generalize to unobserved levels – e.g., if subject is fixed, cannot generalize to new subjects 6 Courtesy of Tor Wager Fixed vs. Random Effects: Bottom Line • If I treat subject as a fixed effect, the error term reflects only scan-to-scan variability, and the degrees of freedom are determined by the number of observations (scans). • If I treat subject as a random effect, the error term reflects the variability across subjects, which includes two parts: – Error due to scan-to-scan variability – Error due to subject-to-subject variability and degrees of freedom are determined by the number of subjects. 7 Courtesy of Tor Wager Random Effects Analysis • Subjects treated as “random” effect – Randomly sampled from population of interest • Sample is used to make estimates of population effects • Results lead to inferences on the population 8 Random vs. Fixed Effects • Whereas some early studies used fixed effects models, virtually all current studies use random effects models • Use random effects • All analysis that follow treat subject as a random effect 9 More specifically… 10 Voxel-Wise 2nd-Level Analysis Model specification Subject Parameter estimation single voxel subj series Hypothesis Statistic Statistic at that voxel SPM 11 Model Specification: Building the Design Matrix Y = Xb + e éY1 ù é1ù é e1 ù ê ú êú ê ú Y 1 ê 2 ú = ê ú ´ [ b ] + êe 2 ú 0 ê ú êú ê ú ê ú êú ê ú Y 1 ë nû ëû ëe n û Subjects Stat Value Design matrix Residuals Model parameters = X intercept [ b0 ] + 12 Parameter Estimation/Model Fitting Find values that produce best fit to observed data y = 0 + ERROR 13 The SPM Way of Plotting the Variables y X = e X [ b0 ] + 14 Group Analysis Using Summary Statistics: A simple kind of ‘random effects’ model The “Holmes and Friston” approach (HF) First level Data Design Matrix Second level Contrast Images SPM(t) One-sample t-test @ 2nd level 15 Courtesy of Tor Wager Summary Statistic Approach: 2 Sample t-test from Mumford & Nichols, 2006 16 Summary Statistic Approach: Inference • In a 1-sample t-test, the contrast C = 1 derives the group mean – If images taken to a second level represent the contrast A – B, then • C = 1 is the mean difference (A > B) • C = -1 is the mean difference (B > A) • Dividing by the standard error of the mean yields a tstatistic – Degrees of freedom is N – 1, where N is the number of subjects • Comparison of the t-statistic with the t-distribution yields a p-value – P(DataNull) 17 Tech Note: Sufficiency of Summary Statistic Approach • With simple t-tests under the summary statistic approach, withinsubject variance is assumed to be homogenous (within a group) – SPM’s approach, but other packages can act differently • If all subjects (within a group) have equal within-subject variance (homoscedastic), this is ok • If within-subject variance differs among subjects (heteroscedastic), this may lead to a loss of precision – May want to weight individuals as a function of within-subject variability • Practically speaking, the simple approach is good enough (Mumford & Nichols, 2009, NeuroImage) – – – – Inferences are valid under heteroscedasticity Slightly conservative under heteroscedasticity Near optimal sensitivity under heteroscedasticity Computationally efficient 18 • For extended example of ways that you could do this wrong, check out Derek Nee’s second level GLM lecture from last year 19 The GLM Family DV One continuous Repeated measures Predictors Analysis Continuous One predictor Continuous Two+ preds Categorical 1 pred., 2 levels Categorical 1 p., 3+ levels Categorical 2+ predictors Two measures, one factor Regression More than two measures Multiple Regression 2-sample t-test One-way ANOVA Factorial ANOVA General Linear Model Paired t-test Repeated measures ANOVA 20 Correlations • To perform mass bi-variate correlations, use SPM’s “Multiple Regression” option with a single co-variate – Can also specify multiple co-variates and perform true multiple regression • Be cautious of multi-collinearity! • Correlations are done voxel-wise • % of explained variance necessary to reach significance with appropriate correction for multiple comparisons may be very high – Interpret location, not effect size (more later) • May be more realistic to perform correlations on a small set of regions-of-interest (more later) 21 Examples • First level: Why > How – Regression with… • • • • trait empathy trait narcissism scan on weekday or weekend friends on facebook • First level: Loved one > Other – Regression with… • relationship closeness • relationship satisfaction • age 22 Example • Costly exclusion predicts susceptibility to peer influence 23 Falk et al., 2013, JAH Correlations and Outliers Null-hypothesis data, N = 50 Same data, with one outlier 24 Courtesy of Tor Wager Robust Regression • Outliers can be problematic, especially for correlations • Robust regression reduces the impact of outliers – 1) Weight data by inverse of leverage – 2) Fit weighted least squares model – 3) Scale and weight residuals – 4) Re-fit model – 5) Iterate steps 2-4 until convergence – 6) Adjust variances or degrees of freedom for p-values • Can be applied to simple group results or correlations – Whole brain: http://wagerlab.colorado.edu/ – ROI: whatever software you prefer (more later) 25 Null-hypothesis data, N = 50 Same data, with one outlier Robust IRLS solution 26 Courtesy of Tor Wager Case Study: Visual Activation Visual responses 27 Courtesy of Tor Wager (De-noise) Slice Timing Correct Predictors Realign Smooth Acquire Functional s Y Determine Scanning Parameters CoRegister X y = Xβ + ε 1st level (Subject ) GLM β Acquire Structurals (T1) βhow - βwhy Normalize Contrast Template 2nd level (Group) GLM Threshold 28 (De-noise) Slice Timing Correct Predictors Realign Smooth Acquire Functional s Y Determine Scanning Parameters CoRegister X y = Xβ + ε 1st level (Subject ) GLM β Acquire Structurals (T1) βhow - βwhy Normalize Contrast Template 2nd level (Group) GLM Threshold 29 Up Next… • Hypothesis Testing • Levels of Inference • Multiple Comparisons – Family-wise Error Correction – False-Discovery Rate Correction – Non-parametric Correction 30 Hypothesis Testing • Null Hypothesis H0 – No effect • T-test: No difference from zero • F-test: No variance explained • α level – Set to an acceptable false positive rate – Level α = P( T > μα | H0) – Threshold μα controls false positive rate at level α • P-value – Test statistics are compared with appropriate distributions • Changes as a function of degrees of freedom • T-distribution: bell-shaped • F-distribution: skewed – Assessment of probability of test statistic assuming H0 – P(Data | Null) • But not P(Null | Data)! 31 Information for Making Inferences on Activation • Where? Signal location – Local maximum – no inference in SPM • Could extract peak coordinates and test (e.g., Woods lab, Ploghaus, 1999) • How strong? Signal magnitude – Local contrast intensity – Main thing tested in SPM • How large? Spatial extent – Cluster volume – Can get p-values from SPM • Sensitive to blob-defining-threshold • When? Signal timing – No inference in SPM; but see Aguirre 1998; Bellgowan 2003; Miezin et al. 2000, Lindquist & Wager, 2007 32 Unit of Analysis • Fundamental unit of analysis is voxel – GLM is run voxel-by-voxel – Statistical parametric maps (SPM’s) are calculated voxel-by-voxel • Unit of interest may instead by a “region” – Functional unit – Pool data across voxels • May also be broadly interested in the brain as a whole – Considering the brain as a whole, do these 2 conditions differ? 33 Levels of Inference • Inferences can be made at any “level” depending upon your unit of interest • Voxel-level – This/these particular voxels are significant – Most spatially specific, least sensitive • Cluster-level – These contiguous voxels together are significant – Less spatially specific, more sensitive • Set-level – The brain shows an effect – No spatial specificity, but can be most sensitive SPM’s results table shows pvalues for voxel-level, clusterlevel, and set-level tests. 34 Voxel-Level Inference • Retain voxels above α-level threshold uα • Gives best spatial specificity – The null hyp. at a single voxel can be rejected uα space Significant Voxels No significant Voxels 35 Courtesy of Tor Wager Cluster-Level Inference • Two step-process – Define clusters by arbitrary threshold uclus – Retain clusters larger than α-level threshold kα uclus space Cluster not significant kα kα Cluster significant 36 Courtesy of Tor Wager Cluster-Level Inference • Typically better sensitivity • Worse spatial specificity – The null hyp. of entire cluster is rejected – Only means that one or more voxels in cluster active uclus space Cluster not significant kα kα Cluster significant 37 Courtesy of Tor Wager Multiple Comparisons Problem • Often over 100,000 voxels in the brain – Voxel-level tests are repeated over 100,000 times – If α = 0.05 (i.e. p < 0.05), over 5,000 false positive voxels! • Need to control false positive rate at α across all tests – Otherwise, difficult to know if result is believable 38 Multiple Comparisons • Perform statistical tests at every voxel tens and tends of thousands • Quite likely that some would pass threshold by chance even if there was absolutely no effect • Need to correct for multiple comparisons. 39 Some Approaches • Bonferroni correction: Insist on p<.05/#voxels – Severely reduces sensitivity, but works with small ROIs • Gaussian random field theory: Suppose there is no effect but data is spatially smooth. What’s the chance of seeing a blob of X contiguous voxels all of which are above a threshold V? – Default approach to controlling familywise error (FWE) in SPM • False Discovery Rate (FDR): Set threshold so that less than 5% of the voxels above threshold would be false positives under null hypothesis 40 Family-Wise Error (FEW) • FWE-rate is the probability of finding one or more false positives among all hypothesis tests – If FWEα = 0.05, probability of finding one or more false positives is 5% • Based on maximum distribution – If no true positives are present, most significant voxel will exceed the threshold 5% of the time • Several approaches to control FWE 41 Bonferroni Correction • Simplest method for controlling FWE • αcorrected = α/V – α is the desired alpha level – αcorrected is the alpha level corrected for FWE – V is the number of voxels/tests When examining results in SPM, you can find the # of voxels in the statistics table (bottom of the table under Volume). Divide α by the # of voxels to determine a Bonferroni corrected threshold. • 0.05/100,000 = 0.0000005 – E.g. t(20) = 6.93 42 Bonferroni Correction: Limitations • Correction assumes that each test is independent • Data are actually spatially smooth, so not independent! • Correction tends to be overly conservative – False positives appropriately controlled – But threshold is too high to detect many true positives 43 Gaussian Random Fields • SPM’s default method of FWE correction takes into account smoothness of data • Intuition – Smooth data lower the resolution of the search space fewer comparisons less stringent correction • Assumes that an image of residuals can be descripbed by Gaussian noise convolved with a 3D kernel – Forms a Gaussian Random Field – FWHM of the kernel describes the smoothness of the data 44 Tech Note – Estimating Smoothness: RESELS • RESELS = RESolution Elements – 1 RESEL = FWHMx x FWHMy x FWHMz 1 2 1 3 4 2 5 6 7 8 3 9 10 4 voxels RESELS Note, when examining results in SPM, you can find the # of resels and FWHM in the statistics table (bottom of the table under Volume) 45 • Threshold needed to correct – Increases with greater search volume • Need more stringent correction – Decreases with greater smoothness, RESEL • Greater smoothness leads to less stringent correction 46 Gaussian Random Fields: Clusters 1) Threshold at voxel-level 5mm FWHM 2) Estimate chance of clusters of size ≥ k, taking into account Mean expected cluster size search volume smoothness Threshold -> puncorrected of cluster of size ≥k 3) Apply previously described correction pcorrected * z2 10mm FWHM 15mm FWHM 47 Courtesy of Tor Wager Gaussian Random Fields: Limitations • Requires sufficient smoothness of data – FWHM 3-4x voxel size • Performs poorly with low df – Better with df > 20 and sufficient smoothness • Tends to be conservative, especially with rough data (FWHM < 6) • Based on approximations – Approximations can be thrown off by “roughness spikes” – Approximations will vary on a contrast by contrast basis • • Different contrasts in same data will have different thresholds Typically regarded as better at individual level where df are high Select “FWE” in SPM results to threshold using Gaussian Random Field Theory. Expect a conservative threshold. 48 False-Discovery Rate • Correction of FWE ensures that false positives will be controlled per family of tests – αFWE-corrected = 0.05, 5% of contrasts (across all voxels) will have a single false positive • False-Discovery Rate (FDR) controls the number of false positives within a family of tests – αFDR-corrected = 0.05, 5% of reportedly active voxels in a contrast will be false positives • Upside: will find more true signal • Downside: will have a few false positives 49 3. – V is the # of voxels • In other words – Smallest p-value must pass Bonferroni, second smallest Bonferroni*2, third smallest Bonferroni*3, etc • i.e. i = 1: Bonferroni*1, i = 2: Bonferroni*2, i = 3: Bonferroni*3, etc. p(i) p-value 2. Establish a rate, q, of acceptable proportion of false-positives (e.g. 0.05) Sort observed p-values from smallest to largest Find max(i) such that Pi < i*(q/V) (i/V)q 0 1. 1 False-Discovery Rate: Method 0 i/V 1 – Highest such i gives threshold – If no p-value passes, threshold cannot be determined (SPM will say the threshold is t = infinity) 50 False-Discovery Rate: Limitations • Limits inference – Cannot say which activated voxels are true positives or false positives • Adaptive – Good in some cases – Maps with lots of activations (many voxels with low p) will have low thresholds – Maps with little activation (few voxels with low p) with have high thresholds or no determinable threshold • Hard to find signal in small areas Select “FDR” in SPM results to threshold using False-Discovery Rate. Threshold may be “infinity” if effects are weak or it may be very low if results are strong. 51 Simulations Noise Signal Signal+Noise 52 Courtesy of Tor Wager Control of Per Comparison Rate at 10% 11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% Percentage of Null Pixels that are False Positives 9.5% Control of Familywise Error Rate at 10% FWE Occurrence of Familywise Error Control of False Discovery Rate at 10% 6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% Percentage of Activated Pixels that are False Positives 8.7% 53 Nonparametric Inference • Parametric methods – Assume distribution of statistic under null hypothesis – Needed to find P-values, u 5% Parametric Null Distribution • Nonparametric methods – Use data to find distribution of statistic under null hypothesis – Any statistic! 5% Nonparametric Null Distribution 54 Courtesy of Tor Wager Permutation Test: Toy Example • Under H0 – Consider all equivalent relabelings – Compute all possible statistic values – Find 95%ile of permutation distribution -8 -4 0 4 8 55 Courtesy of Tor Wager Permutation Test: Details • Requires only assumption of exchangeability: – Under H0, distribution unperturbed by permutation • Subjects are exchangeable (good for group analysis) – Under H0, each subject’s A/B labels can be flipped • fMRI scans not exchangeable under H0 (bad for time series analysis/single subject) – Due to temporal autocorrelation On the SPM website, click on Extensions and search under Toolboxes to download SnPM 56 Courtesy of Tom Nichols Other Approaches • AlphaSim (AFNI): create random fields based on smoothness and observe rate of false positive clusters – Provides cluster-level correction – Can input appropriate voxel-level threshold and clusterextent to use with SPM • Once estimated, threshold can be used for all contrasts • Threshold-free Cluster Enhancement (FSL): combines signal strength and voxel extent into a single measure – Provides cluster-level correction without need to first specify voxel-level threshold – Avoids ambiguities that can arise from getting different results at different voxel-level thresholds with other cluster methods 57 Alpha Sim • FDR correction based on simulation… – “All whole-brain analyses were thresholded using an uncorrected p-value of .001 combined with an extent threshold of 21 contiguous voxels, corresponding to a false-positive discovery rate of 5% across the whole brain as estimated by Monte Carlo simluation implemented using AlphaSim in the software package AFNI (http://afni.nimh.gov/afni/doc/manual/AlphaSim)”. – “Whole-brain analyses were conducted using a statistical criterion of at least 21 contiguous voxels exceeding a voxel-wise threshold of p <.001. A Monte Carlo simulation (http://afni.nimh.gov/afni/doc/manual/AlphaSim) of our brain volume demonstrated that this cluster extent cutoff provided an experiment-wise threshold of p <.05, corrected for multiple comparisons.” – Cox, R. W. (1996). ANFI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29, 162-173. 58 AlphaSim: Which volume? Analysis mask Cube 59 AlphaSim: Usage > AlphaSim –nxyz [nx] [ny] [nz] –dxyz [dx] [dy] [dz] … –fwhm 8 –pthr .001 –iter 10000 -quiet Where: nx, ny, nz = resolution in each dimension (e.g. 64 64 34) dx, dy, dz = size of voxels in mms (e.g. 3 3 3) fwhm = Size of smoothing kernel (mm) pthr = voxel-level threshold Iter = number of simulations to run (usually 10k) quiet = suppress output 60 AlphaSim: Usage > AlphaSim –mask [name.hdr] –fwhm 8 –pthr .001 … –iter 10000 -quiet Where: mask = name of normalized anatomical mask (2nd level) fwhm = Size of smoothing kernel (mm) pthr = voxel-level threshold Iter = number of simulations to run (usually 10k) quiet = suppress output 61 Thresholding Summary • Most thresholding is done at either voxel or cluster level – Depends on level of inference that is of interest – Spatial precision: voxel – Regional inference • Methods differ in control over false positives and sesntivity – Bonferroni: strong control over false positives, Somewhat conservative – False-Discovery Rate: admits false positives, more sensitive – Non-parametric: most adaptive, reputedly most accurate • Next up: Whole brain search isn’t your only option • Avoid the problem! – Can use SCV’s and ROI’s to reduce/eliminate multiple comparisons issues to improve power in areas with a prior hypothesis – But conclusions and inferences are different 62 Some notes on inference in brain mapping studies 63 What Brain Mapping is Good For • Making inferences on the presence of activity, to either a) test a theory, or b) characterize the pattern of brain responses to a task • Limiting the false positive rate to a specified level • Leverage hypothesis testing to provide evidence on a variety of theories: Is Area r involved in Task x? 64 What Brain Mapping is Not Good For • Estimating effect sizes (effect strength, or predictive power) • Testing the assumptions involved in the analysis – ‘Neural’ timing and temporal profile of neural response – Link between neural activity and observed signal: Hemodynamic response profile – Appropriateness of additive linear model – Normality and homogeneity of variance (needed for valid p-values) • Building a cumulative knowledge base 65 Kinds of Inferences Hypothesis tests Is there an effect? How well can I predict…? Where are the effects? Effect size estimation: Cross-validation Spatial statistics Mixture models Key: There are tradeoffs among these goals With current analysis options, they cannot be maximized at once 66 Effect Size Estimation is Important for Development of Applications • Medicine – Predicting treatment response, diagnosis, ‘personalized medicine’, neuro-rehabilitation and prosthetics – Sensitivity and specificity of tests: relate to effect size • Law – Lie detection, guilty knowledge, tort cases; evidence on pain and cognitive deficits • Psychology and neuroscience – Testing for meaningfully large effects • Marketing, military, homeland security 67 Spatial Pattern Estimation is Important for Theory Building • Need to know both which areas are ‘active’ and which are not • Balance of false positives and false negatives important for building a cumulative science • Often ignored because of bias towards hypothesis testing and ‘strong inference’ 68 True signal Signal Noise Noise Negative Observed signal Positive Results Hypothesis Test Threshold 69 Courtesy of Tor Wager The Problem with Estimating Effect Sizes Conditioning on Significant Test Results Observed signal Results Hypothesis test Multiple comparisons Signal Noise • Conditioning on significance selects for high noise values (red/purple) • Equally true for all effect size measures: Pearson’s r, t-values, Z-scores, pvalues, etc. 70 Courtesy of Tor Wager The ‘File Drawer Problem’ Sample of ‘significant’ voxels False positive Threshold Obs. effect size (d) True positive Voxels True effect size Sample of published studies Threshold Average Observed Effect size Conditioning on publication causes bias in effect size estimate Study 1 2 3 4 True effect size (blue) 71 Courtesy of Tor Wager The ‘File Drawer Problem’: Meta-analysis of 5 Antidepressants 72 Melander et al., 2003 BMJ Regression to the Mean • Average height of a Chinese male is 5’7” • Yao Ming is 7’6” tall • If Yao had a son, is he likely to be shorter or taller than Yao? 73 Tradeoff #1: Is there… vs. How big… Hypothesis tests (inference) Is there an effect? How well can I predict…? Where are the effects? Effect size estimation Spatial statistics Mixture models Stringent multiple comparisons correction is good for inference, but bad for size estimation 74 The Problem with Estimating Effect Sizes Conditioning on Significant Test Results • More stringent multiple comparisons: less accurate estimation of effect size • Increased power: reduced selection bias – Larger effects – Larger sample sizes – Less noise Yarkoni, 2009 75 Courtesy of Tor Wager Problems with Estimating Effects Sizes Observed signal Results N = 20, p < .001 If the true correlation looks like this… r = .5 A typical ‘significant’ voxel looks like this… Hypothesis test Multiple comparisons Signal Noise Negative r = .78 Positive Why? In looking at ‘significant’ tests, we are conditioning on having a high observed effect size: r must be at least 0.67 in order to consider it! 76 So why would you ever show this? r = .78 Descriptive reporting and plotting of results: Checking statistical assumptions Brain [Task – Control] Results Pathological OK OR 0 Event-related Pathological Brain [Task – Control] OK AND Behavior Behavior 77 Non-Solutions • Omitting display of scatterplots • Using cross-validation for everything, even if theory calls for a hypothesis test • Using regions of interest only and ignoring the information in much of the brain 78 Solutions: Effect Size Estimation 1. When performing a hypothesis test, interpret the results literally: “Given the model assumptions, this brain area shows a non-zero effect.” (…Not as an estimate of effect size) 2. Increase power! 2. To estimate effect size, use ‘hold out’ data, i.e., cross-validation Unbiased estimates of true effect size 2. Select a small number of a prioi ROIs Voxels 79 Avoiding the Problem (Preview of What is Up Next) • Small Volume Correction (SVC) – If you know the region(s) you are interested in a priori, you can limit examination to just those voxels – Reduces number of voxels and thus reduces multiple comparisons correction • Region-of-interest (ROI) Analysis – If you have a strong prior and reason to believe that areas-of- interest are homogenous, can simply average signal across an ROI – Single comparison (per ROI) • WFUPickAtlas toolbox for SPM provides a good means to create anatomical masks for SVC’s or ROI’s • Downside: can only confirm prior hypotheses – May miss new discoveries! 80 Region of Interest and Other Analysis Methods 81 Signal Negative OR 0 Hypothesis test Multiple comparisons Positive Contrast: Task comparison Brain [Task – Control] Results Noise Results Observed signal Brain-behavior correlation OR Information-based mapping Predictive accuracy Noise Brain [Task – Control] True signal Chance Behavior Individual subject Effect size: d, Z, p Effect size: r, p Effect size: accuracy, p82 Whole Brain vs. ROI Precision of prior spatial information Some None Test each voxel in whole brain Test each voxel In set of regions Test each voxel Single region Lots Test average in single region Multiple comparisons correction required Stringent need very strong evidence Some need strong evidence None need less evidence 83 What is the Question? Observed data Does this brain area respond to my task? Does this one? Does this one? Does this one? 84 What is the Answer? Observed data Does this one? “Given the model assumptions, this brain area shows a non-zero effect.” (Not necessarily informative about how big the effect is.) 85 Up Next… • Masking – Limiting multiple comparisons – Conjunction Analysis – Disjunction • ROI Analysis – ROI selection • Reverse Inference • Degrees of Freedom • Brain Mapping Considerations 86 Masking 87 Whole Brain vs. ROI Precision of prior spatial information Some None Test each voxel in whole brain Test each voxel In set of regions Test each voxel Single region Lots Test average in single region Multiple comparisons correction required Stringent need very strong evidence Some need strong evidence None need less evidence 88 Masking • In 1st level analysis, all voxels with a nonzero value for every subject are estimated – gray matter, white matter, csf, skull, eyeballs • Estimating a 1st level model yields a mask representing all voxels included in the model 89 Masking • Masking 1st level can eliminate unnecessary tests from your analysis, enabling better correlation for multiple comparisons 90 Masking • Masking can also be used to test more targeted hypotheses – E.g. are the areas activated by in one contrast also activated in another (independent) contrast? • Create a mask from the first spmT*.img file using ImCalc, then use the mask analysis of second 91 Process Comparison • One use of fMRI is to compare brain processes • Two main interests: – Conjunction: tasks recruit common brain areas and thus common processes – Disjunction: tasks recruit distinct brain areas and thus distinct processes 92 Conjunctions • Often interested in processes shared across tasks – Task 1 recruits regions A,B,C, and D – Task 2 recruits regions A,D,E, and F • Task 1 and Task 2 share common process(es) instantiated by regions A and D • Conjunction analysis aims to demonstrate neural overlap Select 2 or more contrasts in SPM’s results menu (ctrl+click to select multiple) to perform conjunctions Different null hypotheses can be tested. Conjunction null is the accepted standard. 93 Conjunction Analysis • SPM8 gives 2 flavors which differ in the null hypothesis that they test • Global null: assesses whether the contrasts are likely to be sampled from the null distribution – Essentially a meta-analysis – Contrasts may not be individually significant • Consider three contrasts with t-scores of 0.5, 1.1, 1.3 – – • None of significant individually But together, they reject the global null Conjunction null: logical AND – all contrasts are individually significant – Typically what researchers are interested in – Joint test can be conservative • • Each contrast must be individually significant at corrected threshold Conjunction null is generally accepted approach – SVC or ROI’s may be appropriate to overcome conservativeness at whole-brain levels Different null hypotheses can be tested. Nichols et al., 2005, NeuroImage Conjunction null is the accepted standard. 94 Conjunction: Practical Issues • Contrast A activates a wide network of regions • Contrast B activates a smaller network, which differs from A, but some voxels overlap • Is this overlap meaningful? – How much overlap would be expected by chance? 95 Conjunction: Practical Issues • Behavioral studies report low correlations between Task A and Task B • Subjects can concurrently perform Task A and Task B with little interference and additive factors analysis reveals no interactions • fMRI of the tasks reveal that they both activate the dorsal ACC • Researcher concludes that the tasks do share processes and touts the superiority of fMRI • Conclusion justified? Yarkoni et al., 2011, Nature Methods 20% of published studies find activation in the dorsal ACC Should this base-rate be taken into account? 96 Conjunction: Summary • Conjunction analysis is one means of process comparison and should likely be done against the conjunction null (rather than the global null) • Conjunctions may occur simply because one contrast is very encompassing • Conjunctions may occur in areas that sub-serve domain general processing across many tasks • Important to be mindful of these matters when interpreting conjunctions 97 Disjunctions • How to characterize areas involved in contrast A, but not contrast B? • Requirements – Active in contrast A – A>B – Not active in contrast B • Bad practice – Significant in A, not significant in B – Could be active in B, just failed to detect it! – One sample demonstrated that nearly half of neuroscience papers made this error (Nieuwenhuis et al., 2011, Nature Neuroscience) When looking at results of contrast A, use an exclusive mask of contrast B to look for voxels active in A, but not B. Note, this is not sufficient inand-of itself to determine an interaction! Be sure to directly contrast A > B, as well. 98 Disjunctions: Practical Concerns • Want: area X involved in contrast A, but not contrast B – Need: 1) 2) 3) Contrast A area X A > B area X B area X • Issue #2: how to demonstrate not active in contrast B? – Difficult, but in the least should show that area X is not significant in contrast B at a very liberal criterion • E.g. p < 0.05, uncorrected • Issue #1: how to define area X? – Voxels active in contrast A? • No! This is biased to show 1 and 2 (more on this in a moment) – Contrast orthogonal to A and B • Independent data (e.g. functional localizer) • Anatomically defined Different null hypotheses can be tested. Conjunction null is the accepted standard. 99 Outline • Masking – Limiting multiple comparisons – Conjunction Analysis – Disjunction • ROI Analysis – ROI selection • Reverse Inference • Degrees of Freedom • Brain Mapping Considerations 100 ROI Analysis • Often want to make inferences on a particular region-of-interest (ROI) • Must be careful to define ROI in such a way that does not make inferences dubious • Inappropriate ROI definition has led to a great deal of controversy in neuroscience – Double-dipping: Kriegeskorte et al., 2009, Nat. Neurosci. 101 ROIs Revsited • ROIs based upon particular contrast are biased to show a greater effect size than is truly present. – Cannot estimate effect size of A region defined from contrast A – Cannot examine A > B from a region defined from contrast A – Some researchers object to even visually depicting effects from A • E.g. fitted response will look too good • NOTE: I DISAGREE WITH THIS STRONGLY 102 ROIs Revsited • Inferences should be performed on unbiased ROIs – Orthogonal to contrast of interest – Defined from independent data • Separate functional localizer • Cross-validation (separate data into sample test sets) • Separate study – Expect regression to the mean! – Defined anatomically 103 Methods for Selecting Unbiased ROIs • Anatomical • Functional – Functional localizer – Meta analysis • Curated • Neurosynth 104 Anatomical WFUPickAtlas toolbox in SPM can define ROIs using popular atlases such as AAL and Talairach Daemon 105 Functional • Localizer 106 Functional • Meta analysis Image from Bartra et al., 2012 107 Example: Neurosynth (ToM) 108 Outline • Masking – Limiting multiple comparisons – Conjunction Analysis – Disjunction • ROI Analysis – ROI selection • Reverse Inference • Degrees of Freedom • Brain Mapping Considerations 109 Reverse inference: When the heat is on, the house gets hot The house is hot… what can I conclude? 110 Reverse Inference 1. 2. 3. In the present study, when task comparison A was presented, brain area Z was active In other studies, when cognitive process X was putatively engaged, then brain area Z was active. Thus, the activity of area Z in the present study demonstrates engagement of cognitive process X by task comparison A • Example: Poldrack, 2006, TICS – Stroop task activates the dorsal ACC – In several studies, pain activated the dorsal ACC – -> The Stroop task hurts • Reverse Inference: – Reason backwards from activation in a region engagement of a cognitive function to 111 Yarkoni et al., 2011, Nature Methods Reverse Inference: Problems • Logical fallacy: affirming the consequent – If one takes the fMRI course, they will know fMRI. – Neo knows fMRI. – Neo took the fMRI course. • Brain regions are engaged by diverse demands – Even the presumably selective, fusiform “face” area is activated in response to diverse stimuli 112 Reverse Inference: What to Do? • On some level, reverse inference is necessary – Trying to build a collective knowledge of the brain – Need to link results with prior data • Keep selectively in mind when making inferences – Brain area X is engaged in your context of interest. What about other contexts? – Check out neurosynth.org • For a given region, can output words associated with that region in the published literature 113 Analysis Choices & Degrees of Freedom 114 The Degrees of Freedom Problem • Many aspects of fMRI analysis have multiple solutions and options – – – – – – Order of pre-processing steps Size of smoothing kernel Spatial normalization template High-pass filter length Basis set Motion regression • Choices can strongly affect results Mean activation and variation as a result of analysis choices Carp, 2012, Frontiers in Neuroscience 115 Degrees of Freedom and Bias • Problematic scenario – “This doesn’t look how I expected it to look. I wonder if I did something wrong?” • Re-analyze until it looks right • fMRI analysis is complex and it is likely that some degree of optimization is necessary post-data collection – E.g. Planned basis function does not appropriately fit data – Must be very careful not to bias results • Ninja Derek’s recommendations – Embed contrasts with known solutions that are orthogonal to contrasts of interest • E.g. Right vs Left Motor response, Error vs Correct response – Use these contrasts as criterion for optimization – Design experiments to adjudicate between multiple equally interesting hypotheses • Theory A predicts X, Theory B predicts Y, Theory C predicts Z – Reduces bias towards any one result 116 Conclusions • Most studies have been based on null-hypothesis tests • These are useful for exploratory purposes, and for constraining theories about the physical basis of mind • In the brain imaging setting, there has been little attention paid to estimating effect sizes, and the standard framework produced biased post-hoc estimates • Effect sizes may be increasingly important in the future, as applications are developed • There are tradeoffs in analysis choices, and the best option depends on your goal and what kinds of effects (local vs. distributed) you expect 117 Questions? …for you and for me! 118