Multivariate models for fMRI data
Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen)
Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich

Why multivariate?
Univariate approaches are excellent for localizing activations in individual voxels.
[Figure: responses of two voxels v1 and v2 to reward vs. no reward; one voxel shows a significant difference (*), the other does not (n.s.).]

Why multivariate?
Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels.
[Figure: responses of voxels v1 and v2 to orange juice vs. apple juice; neither voxel discriminates the conditions on its own (n.s.), but the conditions separate in the joint space of v1 and v2.]

Why multivariate?
Multivariate approaches can utilize 'hidden' quantities such as coupling strengths.
[Figure: observed BOLD signals x1(t), x2(t), x3(t) reflect hidden underlying neuronal activity z1(t), z2(t), z3(t) and coupling strengths, with a driving input u1(t) and a modulatory input u2(t).]
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage

Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding

Encoding vs. decoding
The context x_t ∈ ℝ^d is a cause or consequence of neural activity (e.g. a condition, stimulus, or response); the BOLD signal is y_t ∈ ℝ^v. An encoding model g: x_t → y_t predicts the signal from the context; a decoding model h: y_t → x_t predicts the context from the signal. Both are estimated by minimizing a prediction error.

Regression vs. classification
A regression model predicts a continuous dependent variable from independent variables (regressors); a classification model predicts a categorical dependent variable (label) from independent variables (features).

Univariate vs. multivariate models
A univariate model considers a single voxel at a time: context x_t ∈ ℝ^d, BOLD signal y_t ∈ ℝ. Spatial dependencies between voxels are only introduced afterwards, through random field theory.
A multivariate model considers many voxels at once: context x_t ∈ ℝ^d, BOLD signal y_t ∈ ℝ^v with v ≫ 1. Multivariate models enable inferences on distributed responses without requiring focal activations.

Prediction vs. inference
The goal of prediction is to find a generalizable encoding or decoding function, e.g. predicting a cognitive state using a brain-machine interface, or predicting a subject-specific diagnostic status. The relevant quantity is the predictive density:
p(y_new | x_new, X, Y) = ∫ p(y_new | x_new, θ) p(θ | X, Y) dθ
The goal of inference is to decide between competing hypotheses, e.g. comparing a model that links distributed neuronal activity to a cognitive state with a model that does not, or weighing the evidence for sparse vs. distributed coding. The relevant quantity is the marginal likelihood (model evidence):
p(Y | X) = ∫ p(Y | X, θ) p(θ) dθ

Goodness of fit vs. complexity
Goodness of fit is the degree to which a model explains observed data. Complexity is the flexibility of a model (including, but not limited to, its number of parameters). We wish to find the model that optimally trades off goodness of fit and complexity.
[Figure: polynomial models fitted to data generated from a true curve; a 1-parameter model underfits, a 4-parameter model is close to optimal, and a 9-parameter model overfits. Bishop (2007) PRML]

Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding

Constructing a classifier
A principled way of designing a classifier would be to adopt a probabilistic approach: assign to a new observation y_t that class c which maximizes the posterior probability p(x_t = c | y_t, X, Y). In practice, classifiers differ in terms of how strictly they implement this principle.
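To illustrate this probabilistic recipe, the sketch below fits a Gaussian naïve Bayes classifier to simulated two-voxel data and assigns a test pattern to the class with the highest posterior probability. This is an illustrative Python example rather than part of the original slides; the class means, noise levels, and trial counts are made up.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated training data: 100 trials x 2 voxels per class (hypothetical values)
    y_reward    = rng.normal(loc=[1.0, 0.2], scale=0.5, size=(100, 2))
    y_no_reward = rng.normal(loc=[0.2, 1.0], scale=0.5, size=(100, 2))
    Y_train = np.vstack([y_reward, y_no_reward])
    x_train = np.array([0] * 100 + [1] * 100)             # class labels

    # Gaussian naive Bayes: per-class, per-voxel means and variances, plus class priors
    classes = np.unique(x_train)
    means = np.array([Y_train[x_train == c].mean(axis=0) for c in classes])
    varis = np.array([Y_train[x_train == c].var(axis=0) for c in classes])
    priors = np.array([np.mean(x_train == c) for c in classes])

    def log_posterior(y):
        """Unnormalized log p(x_t = c | y_t) = log p(y_t | x_t = c) + log p(x_t = c)."""
        loglik = -0.5 * np.sum(np.log(2 * np.pi * varis) + (y - means) ** 2 / varis, axis=1)
        return loglik + np.log(priors)

    y_test = np.array([0.9, 0.1])                          # a new, unlabelled trial
    print("predicted class:", classes[np.argmax(log_posterior(y_test))])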
Three broad families of classifiers can be distinguished:
• Generative classifiers estimate p(y_t | x_t) and p(x_t) and use Bayes' rule to obtain p(x_t | y_t) ∝ p(y_t | x_t) p(x_t). Examples: Gaussian naïve Bayes, linear discriminant analysis.
• Discriminative classifiers estimate p(x_t | y_t) directly, without Bayes' theorem. Examples: logistic regression, relevance vector machine, Gaussian process classifier.
• Discriminant classifiers estimate the class boundary (discriminant function) directly. Examples: Fisher's linear discriminant, support vector machine.

Common types of fMRI classification studies
• Searchlight approach: a sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores. (Nandy & Cordes 2003 MRM; Kriegeskorte et al. 2006 PNAS)
• Whole-brain approach: a constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores. (Mourao-Miranda et al. 2005 NeuroImage)

Support vector machine (SVM)
[Figure: decision boundaries of a linear and a nonlinear SVM in the space of two voxels v1 and v2.]
Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press

Stages in a classification analysis
Feature extraction → classification using cross-validation → performance evaluation, e.g. Bayesian mixed-effects inference, p = 1 − p(π > π₀ | k, n).

Feature extraction for trial-by-trial classification
We can obtain trial-wise estimates of neural activity by filtering the data with a GLM: the data Y are modelled as Y = Xβ + ε, where the design matrix X contains one boxcar regressor per trial. The estimate of the corresponding coefficient (e.g. β₂) reflects activity on that trial (e.g. trial 2).

Cross-validation
The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation: the examples (e.g. trials 1–100) are split into two halves; in each of the two folds, one half serves as training set and the other as test set, and performance is evaluated across folds.

Cross-validation
A more commonly used variant is leave-one-out cross-validation: in each of the n folds, a single example is set aside as the test example and the classifier is trained on the remaining examples.

Performance evaluation: single-subject study with n trials
The most common approach is to assess how likely the obtained number of correctly classified trials could have occurred by chance, using a binomial test:
p = P(K ≥ k | H₀) = 1 − B(k − 1 | n, π₀)
where k is the number of correctly classified trials, n the total number of trials, π₀ the chance level (typically 0.5), and B the binomial cumulative distribution function.
In MATLAB: p = 1 - binocdf(k-1,n,pi_0)

Performance evaluation: group study with m subjects, n trials each
[Figure: trial-by-trial outcomes (correct/incorrect) for subjects 1 … m drawn from a population.]
In a group setting, we must account for both within-subjects (fixed-effects) and between-subjects (random-effects) variance components. Common options:
• Binomial test on concatenated data (fixed effects): p = 1 − B(Σk − 1 | Σn, π₀), where the sums run over subjects
• Binomial test on averaged data (fixed effects): p = 1 − B((1/m) Σk | (1/m) Σn, π₀)
• t-test on summary statistics (random effects): t = √m (π̄ − π₀) / σ̂, p = 1 − T_{m−1}(t), where π̄ is the sample mean of the sample accuracies, σ̂ their sample standard deviation, π₀ the chance level (typically 0.5), and T_{m−1} the cumulative Student's t-distribution with m − 1 degrees of freedom
• Bayesian mixed-effects inference (mixed effects): p = 1 − p(π > π₀ | k, n); an implementation is available for MATLAB and R
Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR; Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage
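A brief sketch of the frequentist options above in Python with SciPy (the Bayesian mixed-effects model of Brodersen et al. 2012 is not reimplemented here); the trial counts are invented for illustration.

    import numpy as np
    from scipy import stats

    pi_0 = 0.5                                    # chance level
    k = np.array([68, 72, 60, 75, 64])            # correctly classified trials per subject (hypothetical)
    n = np.array([100, 100, 100, 100, 100])       # total trials per subject

    # Fixed effects: binomial test on the concatenated data, P(K >= sum(k) | H0)
    p_ffx = stats.binom.sf(k.sum() - 1, n.sum(), pi_0)

    # Random effects: one-sided, one-sample t-test on subject-wise sample accuracies
    acc = k / n
    p_rfx = stats.ttest_1samp(acc, pi_0, alternative='greater').pvalue

    print(f"fixed-effects p = {p_ffx:.3g}, random-effects p = {p_rfx:.3g}")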
Research questions for classification
• Overall classification accuracy: how well can the classes be predicted relative to chance (e.g. left or right button? truth or lie? healthy or ill?).
• Spatial deployment of discriminative regions: where in the brain is the discriminative information located?
• Temporal evolution of discriminability: at which point in within-trial time (e.g. before the participant indicates a decision) does accuracy rise above chance?
• Model-based classification: can groups ({group 1, group 2}) be separated on the basis of model parameters?
Pereira et al. (2009) NeuroImage; Brodersen et al. (2009) The New Collection

Potential problems
• Multivariate classification studies conduct group tests on single-subject summary statistics that (i) discard the sign or direction of the underlying effects and (ii) do not necessarily take into account confounding effects (e.g. correlation of task conditions with difficulty).
• Therefore, in some analyses, confounds rather than distributed representations may have produced positive results.
Todd et al. (2013) NeuroImage

Potential problems
• Simulation: the experimental condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, experimental condition and difficulty are confounded at the individual-subject level, in random directions across 500 subjects. Because the accuracy statistic discards the sign of this confounding, group-level decoding of the condition can appear above chance even though the condition itself carries no information.
Todd et al. (2013) NeuroImage

Potential problems
• Empirical example: a searchlight analysis of an fMRI study on rule representations (flexible stimulus-response mappings).
• Standard MVPA: rule representations in prefrontal regions. GLM: no significant results.
• Controlling for a variable that is confounded with rule at the individual-subject level but not at the group level (reaction-time differences across rules) eliminates the MVPA results.
Todd et al. (2013) NeuroImage
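The simulation described above can be reproduced in miniature. The sketch below is an illustrative Python version (not the authors' code): the rule labels never influence the simulated voxels, yet each subject's decoding accuracy exceeds chance because the classifier exploits difficulty, whose correlation with the rule has a random sign per subject, while the signed effect averages out across subjects. Subject, trial, and voxel numbers and the effect sizes are arbitrary.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_subjects, n_trials, n_voxels = 30, 200, 10

    accuracies, signed_effects = [], []
    for _ in range(n_subjects):
        rule = rng.integers(0, 2, n_trials)           # condition labels: no effect on voxels
        sign = rng.choice([-1, 1])                    # subject-specific confound direction
        difficulty = sign * rule + 0.5 * rng.standard_normal(n_trials)
        voxels = difficulty[:, None] + rng.standard_normal((n_trials, n_voxels))

        # split-half cross-validation with a simple nearest-class-mean classifier
        half = n_trials // 2
        tr_X, te_X = voxels[:half], voxels[half:]
        tr_y, te_y = rule[:half], rule[half:]
        mu1, mu0 = tr_X[tr_y == 1].mean(0), tr_X[tr_y == 0].mean(0)
        w, b = mu1 - mu0, (mu1 + mu0) @ (mu1 - mu0) / 2
        pred = (te_X @ w > b).astype(int)

        accuracies.append((pred == te_y).mean())
        signed_effects.append((mu1 - mu0).mean())     # what a signed group-level contrast would see

    acc_test = stats.ttest_1samp(accuracies, 0.5, alternative='greater')
    sgn_test = stats.ttest_1samp(signed_effects, 0.0)

    # Decoding accuracy is well above chance although 'rule' never drove the voxels;
    # the signed effect, in contrast, averages out across subjects.
    print(f"mean accuracy = {np.mean(accuracies):.2f}, one-sided p = {acc_test.pvalue:.2g}")
    print(f"mean signed effect = {np.mean(signed_effects):.3f}, p = {sgn_test.pvalue:.2g}")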
Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding

Multivariate Bayes
SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion. (Mike West)

Multivariate Bayes
Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to model comparison: for instance, comparing a decoding model in which some cause or consequence is represented by sparse coding in orbitofrontal cortex against one in which it is represented by distributed coding in prefrontal cortex.

From encoding to decoding
Encoding model (GLM): A = Xβ and Y = TA + Gγ + ε, i.e. Y = TXβ + Gγ + ε.
Decoding model (MVB): X = Aβ and Y = TA + Gγ + ε, which rearranges to TX = Yβ − Gγβ − εβ.

Multivariate Bayes
[Figure: schematic of the MVB model. Friston et al. (2008) NeuroImage]

MVB: details
• Encoding model:
  – neuronal activity is a linear mixture of causes: A = Xβ
  – the BOLD data matrix is a temporal convolution of the underlying neuronal activity, plus confounds and noise: Y = TA + Gγ + ε
• Decoding model:
  – the mental state is a linear mixture of voxel-wise activity: X = Aβ, i.e. (informally) A = Xβ⁻¹
  – substitution into the encoding model gives Y = TXβ⁻¹ + Gγ + ε ⇒ Yβ = TX + Gγβ + εβ ⇒ TX = Yβ − Gγβ − εβ
• Importantly, we can remove the influence of the confounds by premultiplying with the appropriate residual-forming matrix R (for which RG = 0): RTX = RYβ + ε̃

MVB: details
• To make the inversion tractable, MVB imposes priors on β (i.e., assumptions about how mental states are represented by voxel-wise activity).
• Scientific inquiry then proceeds by model selection: comparing different priors to identify which neuronal representation of mental states is most plausible.

Specifying the prior for MVB
To make the ill-posed regression problem tractable, MVB uses a prior on the voxel weights, specified via a pattern matrix (v voxels × u patterns). Different priors reflect different anatomical and/or coding hypotheses; for example, one pattern allows voxel 2 to play a role, whereas another allows voxel 3 to play a role only if its neighbours play similar roles.
Friston et al. (2008) NeuroImage

Example: decoding motion from visual cortex
MVB can be illustrated using SPM's attention-to-motion example dataset, which is based on a simple block design with three experimental factors:
• photic – the display shows random dots
• motion – the dots are moving
• attention – subjects are asked to pay attention
[Figure: design matrix with regressors for photic, motion, attention and a constant, across scans.] During the motion scans, for example, subjects were passively viewing moving dots.
Büchel & Friston (1999) Cerebral Cortex; Friston et al. (2008) NeuroImage

Multivariate Bayes in SPM
Step 1: After having specified and estimated a model, use the Results button.
Step 2: Select the contrast to be decoded.
Step 3: Pick a region of interest.
Step 4: Invoke Multivariate Bayes from within the Multivariate section.
Step 5: Specify the anatomical hypothesis (here, the region of interest is a sphere around the cursor) and the coding hypothesis (here, the spatial prior implements a sparse coding hypothesis).
Step 6: Display the results using the BMS button.

Model evidence and voxel weights
[Figure: posterior voxel weights and model comparison; log BF = 3.]

Summary: research questions for MVB
• Where does the brain represent things? Evaluating competing anatomical hypotheses.
• How does the brain represent things? Evaluating competing coding hypotheses.

Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding

Model-based classification
Step 1 – modelling: fit a subject-specific generative model (e.g. a DCM with regions A, B, C and connections A→B, A→C, B→B, B→C) to the measurements from each subject.
Step 2 – embedding: represent each subject by its estimated model parameters in a model-based feature space.
Step 3 – classification: train a classification model on these features.
Step 4 – evaluation: assess the discriminability of the groups (accuracy between 0 and 1).
Step 5 – interpretation: which connection strengths are jointly discriminative?
Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage; Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol

Model-based classification: model specification
[Figure: anatomical regions of interest: planum temporale, Heschl's gyrus (A1), and medial geniculate body in the left and right hemisphere (slice at y = –26 mm), together with the stimulus input.]
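Steps 2–4 of this pipeline can be sketched in a few lines of Python. The example below assumes that subject-wise DCM parameter estimates have already been collected into a matrix (here filled with random placeholder values) and uses a linear SVM with leave-one-out cross-validation; variable names and data are illustrative, not those of the original study.

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import balanced_accuracy_score

    rng = np.random.default_rng(0)

    # Step 2 - embedding: one row per subject, one column per DCM connection strength
    # (placeholder values; in practice these come from the fitted generative models)
    theta = rng.standard_normal((40, 6))
    labels = np.array([0] * 20 + [1] * 20)        # 0 = control, 1 = patient

    # Step 3 - classification with leave-one-out cross-validation
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    pred = cross_val_predict(clf, theta, labels, cv=LeaveOneOut())

    # Step 4 - evaluation: the balanced accuracy guards against class imbalance
    # (with these random placeholder features it will hover around chance)
    print("balanced accuracy:", balanced_accuracy_score(labels, pred))

Step 5 (interpretation) would then refit the classifier on all subjects and inspect which parameters carry the largest weights, e.g. via the coefficients of the linear SVM.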
Model-based classification
[Figure: balanced accuracy for classifying patients vs. controls using activation-based, correlation-based, and model-based features (chance level 50%); significance relative to chance is indicated by * and n.s.]

Generative embedding
[Figure: subjects plotted in a voxel-based activity space (voxels 1–3) and, after generative embedding, in a model-based parameter space (parameters 1–3); patients vs. controls.]
Classification accuracy: 75% in the voxel-based activity space vs. 98% in the model-based parameter space.

Model-based classification: interpretation
[Figure: DCM with left and right planum temporale (PT), Heschl's gyrus (A1), and medial geniculate body (MGB), plus the stimulus input; connections are marked as highly discriminative, somewhat discriminative, or not discriminative.]

Model-based clustering
41 healthy controls and 42 patients diagnosed with schizophrenia; fMRI data acquired during a working-memory task and modelled using a three-region DCM (visual cortex VC, parietal cortex PC, dorsolateral prefrontal cortex dLPFC) driven by the stimulus.
Deserno, Sterzer, Wüstenberg, Heinz & Schlagenhauf (2012) J Neurosci

Model-based clustering
[Figure: model selection, interpretation, and validation steps.]
Brodersen et al. (2014) NeuroImage: Clinical
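In the same spirit as generative embedding for classification, the clustering variant can be sketched as follows: fit Gaussian mixture models with different numbers of clusters K to (placeholder) connectivity parameters, select K by the Bayesian information criterion, and compare the resulting clusters with the known diagnoses via a simple purity score. This is an illustrative Python sketch on invented data, not the pipeline of the cited studies (which, among other things, uses a balanced purity measure).

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(0)

    # Placeholder effective-connectivity parameters for 83 subjects (42 patients, 41 controls)
    theta = np.vstack([rng.normal(0.3, 0.5, size=(42, 6)),
                       rng.normal(-0.3, 0.5, size=(41, 6))])
    diagnosis = np.array([1] * 42 + [0] * 41)

    # Model selection: choose the number of clusters K by the Bayesian information criterion
    models = {k: GaussianMixture(n_components=k, random_state=0).fit(theta) for k in range(1, 9)}
    best_k = min(models, key=lambda k: models[k].bic(theta))
    assignments = models[best_k].predict(theta)

    # Validation: compare the clusters against the diagnostic labels (not used for clustering)
    cm = confusion_matrix(diagnosis, assignments)
    purity = cm.max(axis=0).sum() / cm.sum()
    print(f"selected K = {best_k}, purity with respect to diagnosis = {purity:.2f}")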
Summary
• Classification: to assess whether a cognitive state is linked to patterns of activity, and to visualize the spatial deployment of discriminative activity.
• Multivariate Bayes: to evaluate competing anatomical hypotheses and competing coding hypotheses.
• Generative embedding: to assess whether groups differ in terms of model parameter estimates (connectivity), and to generate mechanistic subgroup hypotheses.

Thank you

Further reading
Classification
• Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45(1, Suppl. 1), S199–S209.
• O'Toole, A. J., Jiang, F., Abdi, H., Penard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of Cognitive Neuroscience, 19(11), 1735–1752.
• Haynes, J., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
• Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
• Brodersen, K. H., Haiss, F., Ong, C., Jung, F., Tittgemeyer, M., Buhmann, J., Weber, B., et al. (2010). Model-based feature construction for multivariate decoding. NeuroImage.
• Brodersen, K. H., Ong, C., Stephan, K. E., & Buhmann, J. (2010). The balanced accuracy and its posterior distribution. ICPR, 3121–3124.
Multivariate Bayes
• Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., et al. (2008). Bayesian decoding of brain images. NeuroImage, 39(1), 181–205.

Why multivariate?
Multivariate approaches can exploit a sampling bias in voxelized images to reveal interesting activity on a subvoxel scale.
Boynton (2005) Nature Neuroscience

Why multivariate?
The last 10 years have seen a notable increase in decoding analyses in neuroimaging.
[Figure: early examples include PET-based prediction (Lautrup et al. 1994, Supercomputing in Brain Research) and fMRI decoding (Haxby et al. 2001, Science).]

Spatial deployment of informative regions
Which brain regions are jointly informative of a cognitive state of interest?
• Searchlight approach: a sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores. (Nandy & Cordes 2003 MRM; Kriegeskorte et al. 2006 PNAS)
• Whole-brain approach: a constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores. (Mourao-Miranda et al. 2005 NeuroImage)

Issues to be aware of (as researcher or reviewer)
• Classification induces constraints on the experimental design.
  – When estimating trial-wise beta values, we need longer ITIs (typically 8–15 s).
  – At the same time, we need many trials (typically 100+).
  – Classes should be balanced. If they are imbalanced, we can resample the training set, constrain the classifier, or report the balanced accuracy.
• Construction of examples: estimation of beta images is the preferred approach; covariates should be included in the trial-by-trial design matrix.
• Temporal autocorrelation: in trial-by-trial classification, exclude trials around the test trial from the training set.
• Avoiding double-dipping: any feature selection and tuning of classifier settings should be carried out on the training set only.
• Performance evaluation: do random-effects or mixed-effects inference.

Performance evaluation
• Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples have been identified with their correct class labels.
• The procedure of averaging across accuracies obtained on individual cross-validation folds is flawed in two ways. First, it does not allow for the derivation of a meaningful confidence interval. Second, it leads to an optimistic estimate when a biased classifier is tested on an imbalanced dataset.
• Both problems can be overcome by replacing the conventional point estimate of accuracy with an estimate of the posterior distribution of the balanced accuracy (see the sketch below).
Brodersen, Ong, Buhmann, Stephan (2010) ICPR
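A minimal Python sketch of this idea: place independent Beta posteriors on the two class-wise accuracies (sensitivity and specificity), draw samples, and summarize the induced posterior of the balanced accuracy. The sketch follows the spirit of Brodersen et al. (2010) with flat Beta(1,1) priors and Monte Carlo sampling; the trial counts are invented.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical cross-validated outcomes, pooled over folds
    correct_pos, n_pos = 45, 50    # correctly classified / total trials of class 1
    correct_neg, n_neg = 20, 50    # correctly classified / total trials of class 2

    # Beta(1,1) priors yield Beta posteriors over the two class-wise accuracies
    samples_pos = rng.beta(1 + correct_pos, 1 + n_pos - correct_pos, size=100_000)
    samples_neg = rng.beta(1 + correct_neg, 1 + n_neg - correct_neg, size=100_000)

    # Posterior of the balanced accuracy = mean of the two class-wise accuracies
    bacc = (samples_pos + samples_neg) / 2

    print(f"posterior mean balanced accuracy: {bacc.mean():.3f}")
    print(f"95% credible interval: {np.percentile(bacc, [2.5, 97.5]).round(3)}")
    print(f"posterior probability of exceeding chance: {(bacc > 0.5).mean():.3f}")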
Pattern characterization
Example: decoding the identity of the person speaking to the subject in the scanner.
[Figure: voxel-wise 'fingerprint' plots, one plot per class.]
Formisano et al. (2008) Science

Lessons from the Neyman–Pearson lemma
Is there a link between X and Y? To test for a statistical dependency between a contextual variable X and the BOLD signal Y, we compare
• H₀: there is no dependency
• Hₐ: there is some dependency
Which statistical test? First, define a test size α (the probability of falsely rejecting H₀, i.e., 1 − specificity); second, choose the test with the highest power 1 − β (the probability of correctly rejecting H₀, i.e., sensitivity).
The Neyman–Pearson lemma: the most powerful test of size α rejects H₀ when the likelihood ratio Λ exceeds a critical value u,
Λ(Y) = p(Y | X) / p(Y) = p(X | Y) / p(X) ≥ u,
with u chosen such that P(Λ(Y) ≥ u | H₀) = α. The null distribution of the likelihood ratio, p(Λ(Y) | H₀), can be determined nonparametrically or under parametric assumptions. This lemma underlies both classical statistics and Bayesian statistics (where Λ(Y) is known as a Bayes factor).
Neyman & Pearson (1933) Phil Trans Roy Soc London

Lessons from the Neyman–Pearson lemma: in summary
1. Inference about how the brain represents things reduces to model comparison.
2. To establish that a link exists between some context X and activity Y, the direction of the mapping is not important.
3. Testing the accuracy of a classifier is not based on Λ and is therefore suboptimal.
Neyman & Pearson (1933) Phil Trans Roy Soc London; Kass & Raftery (1995) J Am Stat Assoc; Friston et al. (2009) NeuroImage

Temporal evolution of discriminability
Example: decoding which button the subject pressed.
[Figure: classification accuracy over within-trial time in motor cortex and frontopolar cortex, relative to the decision and the response.]
Soon et al. (2008) Nature Neuroscience

Identification / inferring a representational space
Approach: 1. estimation of an encoding model; 2. nearest-neighbour classification or voting.
Mitchell et al. (2008) Science

Reconstruction / optimal decoding
Approach: 1. estimation of an encoding model; 2. model inversion.
Paninski et al. (2007) Progr Brain Res; Pillow et al. (2008) Nature; Miyawaki et al. (2009) Neuron

Recent MVB studies
[Figure: examples of recent studies using MVB.]

Specifying the prior for MVB
1st level – spatial coding hypothesis: the voxel weights α are expressed in terms of a pattern matrix U (v voxels × u patterns), α = Uη. For example, one pattern allows voxel 2 to play a role; another allows voxel 3 to play a role, but only if its neighbours play weaker roles.
2nd level – pattern covariance structure: the pattern weights have a Gaussian prior, p(η) = N(η; 0, Σ), where Σ = λ₁Σ₁ + λ₂Σ₂ + … is a mixture of covariance components weighted by hyperparameters λ. Thus p(α | λ) = N(α; 0, UΣUᵀ), and the decoded variable is modelled as Gaussian with precision Π, p(X | α) = N(X; Yα, Π⁻¹).

Inverting the model
Model inversion involves finding the posterior distribution over the voxel weights α. In MVB, this includes a greedy search for the optimal covariance structure governing the prior over α: the patterns are partitioned into subsets, and the prior covariance is built as a weighted sum of subset-specific components, Σ = λ₁Σ_subset1 + λ₂Σ_subset2 + …
[Figure: three example partitions; e.g. pattern 3 makes an important positive contribution (is activated), whereas pattern 8 is allowed to make some contribution independently of the contributions of other patterns.]

Observations vs. predictions
[Figure: observed vs. predicted values of the decoded target variable (motion).]

Using MVB for point classification
MVB may outperform conventional point classifiers (e.g. a support vector machine) when using a more appropriate coding hypothesis.

Model-based analyses by data representation
• Structure-based analyses: which anatomical structures allow us to separate patients and healthy controls?
• Activation-based analyses: which functional differences allow us to separate groups?
• Model-based analyses: how do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?

Model-based clustering
[Figure: supervised learning (SVM classification): balanced accuracy obtained from regional activity, functional connectivity, and effective connectivity. Unsupervised learning (GMM clustering using effective connectivity): balanced purity and log model evidence as a function of the number of clusters K, for model selection and validation.]
Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation)

Generative embedding and DCM
Question 1 – What do the data tell us about hidden processes in the brain?
→ compute the posterior p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m)
Question 2 – Which model is best w.r.t. the observed fMRI data?
→ compute the model evidence p(m | y) ∝ p(y | m) p(m), with p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ
Question 3 – Which model is best w.r.t. an external criterion?
→ compute the classification accuracy p(h(y) = x), x ∈ {patient, control}, averaged over test data y and training data (y_train, x_train):
∫ p(h(y) = x | y, y_train, x_train) p(y) p(y_train) p(x_train) dy dy_train dx_train
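For Question 2, comparing two models reduces to their (log) evidences. The snippet below is an illustrative Python example with invented log-evidence values: the log Bayes factor is the difference of log evidences, it sums over subjects for a fixed-effects group comparison, and posterior model probabilities follow from normalizing the exponentiated log evidences under a flat prior over models. A log Bayes factor of about 3 corresponds to a posterior model probability of roughly 0.95 under equal priors, which is why "log BF = 3" (as on the MVB results slide above) is conventionally read as strong evidence.

    import numpy as np

    # Hypothetical log model evidences for two models in five subjects
    log_ev_m1 = np.array([-1210.4, -998.2, -1105.7, -1310.9, -1020.3])
    log_ev_m2 = np.array([-1213.1, -1001.0, -1104.9, -1316.2, -1024.8])

    log_bf = log_ev_m1 - log_ev_m2        # subject-wise log Bayes factors (m1 vs. m2)
    group_log_bf = log_bf.sum()           # fixed-effects group Bayes factor

    # Posterior model probabilities for the first subject, assuming equal model priors
    log_ev = np.array([log_ev_m1[0], log_ev_m2[0]])
    post = np.exp(log_ev - log_ev.max())
    post /= post.sum()

    print(f"group log BF (m1 vs. m2) = {group_log_bf:.1f}")
    print(f"subject 1 posterior model probabilities: {post.round(3)}")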
Model-based classification using DCM
[Figure: schematic contrasting model-based, activation-based, and structure-based classification for separating two groups ({group 1, group 2}), and model selection vs. inference on model parameters.]