
Multivariate models for fMRI data
Klaas Enno Stephan
(with 90% of slides kindly contributed by Kay H. Brodersen)
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering
University of Zurich & ETH Zurich
Why multivariate?
Univariate approaches are excellent for localizing activations in individual voxels.
[Figure: bar plots of responses in voxels v1 and v2 under 'reward' and 'no reward'; the effect is significant (*) in one voxel and n.s. in the other.]
Why multivariate?
Multivariate approaches can be used to examine responses that are jointly encoded
in multiple voxels.
[Figure: responses in voxels v1 and v2 are individually n.s. for 'orange juice' vs. 'apple juice', but the two conditions separate in the joint (v1, v2) space.]
Why multivariate?
Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths.
[Schematic: a three-region dynamic causal model. Hidden underlying neural activities z1(t), z2(t), z3(t) and coupling strengths generate the observed BOLD signals x1(t), x2(t), x3(t); the system receives a driving input u1(t) and a modulatory input u2(t).]
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage
Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding
Encoding vs. decoding
A context variable X_t ∈ ℝ^d (a cause or consequence: condition, stimulus, response, prediction error) and the BOLD signal Y_t ∈ ℝ^v can be related in either direction:
• encoding model g: X_t → Y_t
• decoding model h: Y_t → X_t
Regression vs. classification
Regression model: a function f maps independent variables (regressors) onto a continuous dependent variable.
Classification model: a function f maps independent variables (features) onto a categorical dependent variable (label).
Univariate vs. multivariate models
A univariate model considers a single voxel at a time: context X_t ∈ ℝ^d, BOLD signal Y_t ∈ ℝ. Spatial dependencies between voxels are only introduced afterwards, through random field theory.
A multivariate model considers many voxels at once: context X_t ∈ ℝ^d, BOLD signal Y_t ∈ ℝ^v with v ≫ 1. Multivariate models enable inferences on distributed responses without requiring focal activations.
Prediction vs. inference
The goal of prediction is to find
a generalisable encoding or
decoding function.
predicting a cognitive
state using a
brain-machine
interface
predicting a
subject-specific
diagnostic status
The goal of inference is to decide
between competing hypotheses.
comparing a model that
links distributed neuronal
activity to a cognitive
state with a model that
does not
weighing the
evidence for
sparse vs.
distributed coding
predictive density: p(X_new | Y_new, X, Y) = ∫ p(X_new | Y_new, θ) p(θ | X, Y) dθ
marginal likelihood (model evidence): p(X | Y) = ∫ p(X | Y, θ) p(θ) dθ
Goodness of fit vs. complexity
Goodness of fit is the degree to which a model explains observed data.
Complexity is the flexibility of a model (including, but not limited to, its number of parameters).
[Figure: data Y vs. X fitted with a 1-parameter model (underfitting), a 4-parameter model (optimal), and a 9-parameter model (overfitting), each shown against the underlying truth.]
We wish to find the model that optimally trades off goodness of fit and complexity.
Bishop (2007) PRML
Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding
13
Constructing a classifier
A principled way of designing a classifier is to adopt a probabilistic approach: choose f(Y_t) = the k that maximizes p(X_t = k | Y_t, X, Y).
In practice, classifiers differ in how strictly they implement this principle.
• Generative classifiers use Bayes' rule to estimate p(X_t | Y_t) ∝ p(Y_t | X_t) p(X_t): Gaussian naïve Bayes, linear discriminant analysis.
• Discriminative classifiers estimate p(X_t | Y_t) directly, without Bayes' theorem: logistic regression, relevance vector machine, Gaussian process classifier.
• Discriminant classifiers estimate f(Y_t) directly: Fisher's linear discriminant, support vector machine.
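To make the generative route concrete, here is a minimal Gaussian naïve Bayes classifier in plain Python. This is an illustrative sketch only (the data and function names are made up); a real fMRI analysis would use an established toolbox.

```python
import math

def fit_gnb(X, y):
    """Estimate per-class feature means and variances plus class priors."""
    params = {}
    for k in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == k]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-9   # small floor for stability
                 for col, m in zip(zip(*rows), means)]
        params[k] = (n / len(y), means, varis)
    return params

def predict_gnb(params, x):
    """Pick the class k maximising log p(x | k) + log p(k) (Bayes' rule, uniform evidence)."""
    best, best_score = None, -math.inf
    for k, (prior, means, varis) in params.items():
        score = math.log(prior)
        for v, m, s2 in zip(x, means, varis):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best, best_score = k, score
    return best
```

The "naïve" assumption is that features (voxels) are conditionally independent given the class, which keeps the per-class density a product of one-dimensional Gaussians.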
Common types of fMRI classification studies
Searchlight approach: a sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores.
Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS
Whole-brain approach: a constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores.
Mourao-Miranda et al. (2005) NeuroImage
Support vector machine (SVM)
Linear SVM vs. nonlinear SVM
[Figure: decision boundaries in a two-voxel feature space (v1, v2); a straight boundary for the linear SVM, a curved boundary for the nonlinear SVM.]
Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
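The core of a linear SVM can be sketched as stochastic subgradient descent on the regularised hinge loss. This toy implementation is an assumption-laden sketch (hyperparameters and data are invented for illustration), not the algorithm used by any particular toolbox:

```python
def train_linear_svm(X, y, lam=0.001, lr=0.1, epochs=500):
    """Minimise lam/2 * ||w||^2 + (1/n) * sum(max(0, 1 - y_i * (w.x_i + b)))
    by stochastic subgradient descent. Labels y_i must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # the regulariser's subgradient always applies; the hinge term
            # only contributes when the example violates the margin
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

A nonlinear SVM replaces the inner product w·x with a kernel evaluation, which is what bends the decision boundary in the right-hand panel of the figure.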
Stages in a classification analysis
Feature extraction → classification using cross-validation → performance evaluation, e.g., Bayesian mixed-effects inference: p = 1 − P(π > π0 | k, n).
Feature extraction for trial-by-trial classification
We can obtain trial-wise estimates of neural activity by filtering the data with a GLM.
data Y = design matrix X × coefficients [β1, β2, …, βN]ᵀ + e
Each trial has its own boxcar regressor; the estimate of its coefficient (e.g., β2 for the boxcar regressor of trial 2) reflects activity on that trial.
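The trial-wise beta estimates come from ordinary least squares on this design matrix. A minimal sketch (the two-trial design and the data values below are made up for illustration):

```python
def ols_betas(X, y):
    """Solve the GLM normal equations (X'X) beta = X'y by Gaussian elimination."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    A = [XtX[r][:] + [Xty[r]] for r in range(p)]        # augmented system
    for c in range(p):                                   # forward elimination, partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):                       # back substitution
        beta[r] = (A[r][p] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta

# two boxcar regressors, one per trial; y was generated with betas (2, 5)
X = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [2.0, 2.0, 0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
betas = ols_betas(X, y)
```

In practice the regressors would be convolved with a haemodynamic response function and the design would include confound covariates; the algebra is unchanged.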
Cross-validation
The generalization ability of a classifier can be estimated using a resampling procedure
known as cross-validation. One example is 2-fold cross-validation:
[Schematic: 100 examples are split into two folds; in fold 1, one half serves as training examples and the other half as test examples, and the roles are swapped in fold 2. Performance is then evaluated across folds.]
Cross-validation
A more commonly used variant is leave-one-out cross-validation.
[Schematic: with 100 examples, there are 100 folds; each fold holds out a single example as the test example and trains on the remaining 99. Performance is then evaluated across all folds.]
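Leave-one-out cross-validation is easy to sketch. Here a simple nearest-centroid classifier stands in for whatever classifier is actually used; the data and names are synthetic illustrations, not from the slides:

```python
def nearest_centroid_fit(X, y):
    """Per-class mean of the training examples."""
    cents = {}
    for k in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == k]
        cents[k] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def nearest_centroid_predict(cents, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(cents, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, cents[k])))

def loo_accuracy(X, y):
    """Leave-one-out CV: each example is, in turn, the test set of one fold."""
    correct = 0
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += (nearest_centroid_predict(model, X[i]) == y[i])
    return correct / len(X)
```

Note that the model is refitted from scratch inside every fold; any feature selection or hyperparameter tuning would likewise have to happen inside the fold to avoid double-dipping.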
Performance evaluation
Single-subject study with n trials
The most common approach is to assess how likely the observed number of correctly classified trials would have been under chance.
Binomial test: p = P(X ≥ k | H0) = 1 − B(k − 1 | n, π0)
In MATLAB: p = 1 - binocdf(k-1, n, pi_0)
k: number of correctly classified trials; n: total number of trials; π0: chance level (typically 0.5); B: binomial cumulative distribution function.
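The same one-sided test in Python (standard library only); summing the upper tail directly makes explicit that P(X ≥ k) corresponds to 1 − B(k − 1 | n, π0):

```python
from math import comb

def binomial_p(k, n, pi0=0.5):
    """One-sided binomial test: p = P(X >= k | H0), with X ~ Binomial(n, pi0)."""
    return sum(comb(n, i) * pi0**i * (1 - pi0)**(n - i) for i in range(k, n + 1))

# e.g. 8 of 10 trials classified correctly at a chance level of 0.5
p = binomial_p(8, 10)   # 56/1024 = 0.0546875
```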
Performance evaluation
[Schematic: a population of m subjects; each subject contributes n trial-wise outcomes (correct/incorrect), from which a subject-specific accuracy is computed.]
Performance evaluation
Group study with m subjects, n trials each
In a group setting, we must account for both within-subject (fixed-effects) and between-subject (random-effects) variance components.
• Binomial test on concatenated data (fixed effects): p = 1 − B(Σk | Σn, π0)
• Binomial test on averaged data (fixed effects): p = 1 − B((1/m)Σk | (1/m)Σn, π0)
• t-test on summary statistics (random effects): t = √m (π̄ − π0) / σ_{m−1}, p = 1 − t_{m−1}(t)
• Bayesian mixed-effects inference (mixed effects): p = 1 − P(π > π0 | k, n); implementations are available for MATLAB and R.
π̄: sample mean of sample accuracies; σ_{m−1}: sample standard deviation; π0: chance level (typically 0.5); t_{m−1}: cumulative Student's t-distribution.
Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR
Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage
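The random-effects option is a one-sample t-test of subject-wise accuracies against chance. A sketch of the statistic (the accuracies below are invented; the p-value additionally needs a Student-t CDF, e.g. scipy.stats.t.sf, which is omitted here to stay stdlib-only):

```python
from math import sqrt
from statistics import mean, stdev

def accuracy_t(accuracies, pi0=0.5):
    """Random-effects t statistic: t = sqrt(m) * (mean accuracy - pi0) / sample SD,
    with m - 1 degrees of freedom across m subjects."""
    m = len(accuracies)
    t = sqrt(m) * (mean(accuracies) - pi0) / stdev(accuracies)
    return t, m - 1

# e.g. three subjects with accuracies 60%, 70%, 80%
t, df = accuracy_t([0.6, 0.7, 0.8])
```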
Research questions for classification
• Overall classification accuracy: e.g., left or right button? truth or lie? healthy or ill? [Figure: bar plot of accuracy per classification task; chance level 50%.]
• Spatial deployment of discriminative regions.
• Temporal evolution of discriminability: accuracy rises above chance within-trial, e.g., before the participant indicates a decision. [Figure: accuracy vs. within-trial time.]
• Model-based classification: {group 1, group 2}.
Pereira et al. (2009) NeuroImage; Brodersen et al. (2009) The New Collection
Potential problems
Multivariate classification studies conduct group tests on single-subject summary statistics that
• discard the sign or direction of underlying effects, and
• do not necessarily take into account confounding effects (e.g., correlation of task conditions with difficulty).
Therefore, in some analyses confounds rather than distributed representations may have produced positive results.
Todd et al. 2013, NeuroImage
Potential problems
Simulation: the experimental condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, condition and difficulty are confounded at the individual-subject level, in random directions across 500 subjects.
Todd et al. 2013, NeuroImage
Potential problems
Empirical example (searchlight analysis): an fMRI study on rule representations (flexible stimulus–response mappings).
• Standard MVPA: rule representations in prefrontal regions.
• GLM: no significant results.
• Controlling for a variable that is confounded with rule at the individual-subject level but not at the group level (reaction-time differences across rules) eliminates the MVPA results.
Todd et al. 2013, NeuroImage
Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding
28
Multivariate Bayes
SPM brings multivariate analyses into the conventional inference framework of
hierarchical Bayesian models and their inversion.
Mike West
29
Multivariate Bayes
Multivariate analyses in SPM rest on the central notion that inferences about how
the brain represents things can be reduced to model comparison.
[Schematic: a decoding model maps brain activity to some cause or consequence; competing hypotheses, e.g., sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex, are compared as models.]
From encoding to decoding
Encoding model (GLM):
A = Xβ
Y = TA + Gγ + ε ⇒ Y = TXβ + Gγ + ε
Decoding model (MVB):
X = Aβ
Y = TA + Gγ + ε ⇒ TX = Yβ − Gγβ − εβ
Multivariate Bayes
Friston et al. 2008 NeuroImage
32
MVB: details
Encoding model:
• neuronal activity is a linear mixture of causes: A = Xβ
• the BOLD data matrix is a temporal convolution of the underlying neuronal activity: Y = TA + Gγ + ε
Decoding model:
• the mental state is a linear mixture of voxel-wise activity: X = Aβ, thus A = Xβ⁻¹
• substitution into Y gives: Y = TXβ⁻¹ + Gγ + ε ⇒ Yβ = TX + Gγβ + εβ ⇒ TX = Yβ − Gγβ − εβ
Importantly, we can remove the influence of confounds by premultiplying with the appropriate residual-forming matrix R:
RTX = RYβ + ς
MVB: details
To make inversion tractable, MVB imposes priors on β (i.e., assumptions about how mental states are represented by voxel-wise activity).
Scientific inquiry then proceeds by model selection: comparing different priors to identify which neuronal representation of mental states is most plausible.
Specifying the prior for MVB
To make the ill-posed regression problem tractable, MVB uses a prior on voxel
weights. Different priors reflect different anatomical and/or coding hypotheses.
For example:
[Schematic: an n-voxels × u-patterns matrix of spatial patterns. One pattern allows voxel 2 to play a role; another allows voxel 3 to play a role, but only if its neighbours play similar roles.]
Friston et al. 2008 NeuroImage
Example: decoding motion from visual cortex
MVB can be illustrated using SPM's attention-to-motion example dataset. This dataset is based on a simple block design with three experimental factors:
• photic – display shows random dots
• motion – dots are moving
• attention – subjects asked to pay attention
[Figure: design matrix with regressors photic, motion, attention, and a constant over scans; during the highlighted scans, for example, subjects were passively viewing moving dots.]
Büchel & Friston 1999 Cerebral Cortex
Friston et al. 2008 NeuroImage
Multivariate Bayes in SPM
Step 1
After having specified and estimated
a model, use the Results button.
Step 2
Select the contrast to be decoded.
37
Multivariate Bayes in SPM
Step 3
Pick a region of interest.
38
Multivariate Bayes in SPM
anatomical hypothesis
coding hypothesis
Step 5
Here, the region of
interest is
specified as a
sphere around the
cursor. The spatial
prior implements
a sparse coding
hypothesis.
Step 4
Multivariate Bayes can be
invoked from within the
Multivariate section.
39
Multivariate Bayes in SPM
Step 6
Results can be displayed using the BMS
button.
40
Model evidence and voxel weights
log BF = 3
41
Summary: research questions for MVB
Where does the brain represent things?
How does the brain represent things?
Evaluating competing anatomical hypotheses
Evaluating competing coding hypotheses
42
Overview
1 Modelling principles
2 Classification
3 Multivariate Bayes
4 Generative embedding
43
Model-based classification
step 1 — modelling: measurements from an individual subject are fitted with a subject-specific generative model (regions A, B, C; connections A→B, A→C, B→B, B→C)
step 2 — embedding: the subject is represented in a model-based feature space
step 3 — classification: a classification model is trained in this space
step 4 — evaluation: discriminability of groups? (accuracy between 0 and 1)
step 5 — interpretation: jointly discriminative connection strengths?
Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage
Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
44
Model-based classification: model specification
Anatomical regions of interest (bilateral): medial geniculate body → Heschl's gyrus (A1) → planum temporale, with stimulus input to the medial geniculate body (y = –26 mm).
Model-based classification
[Figure: balanced accuracy (50–100%) for classifying patients vs. controls using activation-based, correlation-based, and model-based features; only the model-based approach reaches significance (*), the others are n.s.]
Generative embedding
[Figure: the same subjects plotted in voxel-based activity space (voxels 1–3) and, after generative embedding, in a model-based parameter space (parameters 1–3); patients and controls separate far better in the latter. Classification accuracy: 75% (voxel-based) vs. 98% (model-based).]
Model-based classification: interpretation
[Schematic: bilateral network MGB → HG (A1) → PT with stimulus input to MGB.]
Model-based classification: interpretation
[Schematic: the same network, with each connection labelled as highly discriminative, somewhat discriminative, or not discriminative.]
Model-based clustering
42 patients diagnosed with schizophrenia and 41 healthy controls; fMRI data acquired during a working-memory (WM) task and modelled using a three-region DCM (VC → PC → dLPFC, with stimulus input to VC).
Deserno, Sterzer, Wüstenberg, Heinz, & Schlagenhauf (2012) J Neurosci
Model-based clustering
model selection
interpretation
validation
Brodersen et al. 2014, NeuroImage: Clinical
51
Summary
Classification
• to assess whether a cognitive state is linked
to patterns of activity
• to visualize the spatial deployment of
discriminative activity
Multivariate Bayes
• to evaluate competing anatomical
hypotheses
• to evaluate competing coding hypotheses
Generative embedding
• to assess whether groups differ in terms of
model parameter estimates (connectivity)
• to generate mechanistic subgroup
hypotheses
52
Thank you
Further reading
Classification
• Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: A tutorial overview. NeuroImage, 45(1, Suppl. 1), S199–S209.
• O'Toole, A. J., Jiang, F., Abdi, H., Penard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of Cognitive Neuroscience, 19(11), 1735–1752.
• Haynes, J., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
• Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
• Brodersen, K. H., Haiss, F., Ong, C., Jung, F., Tittgemeyer, M., Buhmann, J., Weber, B., et al. (2010). Model-based feature construction for multivariate decoding. NeuroImage.
• Brodersen, K. H., Ong, C., Stephan, K. E., & Buhmann, J. (2010). The balanced accuracy and its posterior distribution. ICPR, 3121–3124.
Multivariate Bayes
• Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., et al. (2008). Bayesian decoding of brain images. NeuroImage, 39(1), 181–205.
Why multivariate?
Multivariate approaches can exploit a sampling bias in voxelized images to reveal interesting activity on a subvoxel scale.
Boynton (2005) Nature Neuroscience
Why multivariate?
The last 10 years have seen a notable increase in decoding analyses in neuroimaging, from early PET prediction studies onwards.
Lautrup et al. (1994) Supercomputing in Brain Research
Haxby et al. (2001) Science
Spatial deployment of informative regions
Which brain regions are jointly informative of a cognitive state of interest?
Searchlight approach: a sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere → map of t-scores.
Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS
Whole-brain approach: a constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test → map of t-scores.
Mourao-Miranda et al. (2005) NeuroImage
Issues to be aware of (as researcher or reviewer)
• Classification induces constraints on the experimental design.
  – When estimating trial-wise beta values, we need longer ITIs (typically 8–15 s).
  – At the same time, we need many trials (typically 100+).
  – Classes should be balanced. If they are imbalanced, we can resample the training set, constrain the classifier, or report the balanced accuracy.
• Construction of examples: estimation of beta images is the preferred approach; covariates should be included in the trial-by-trial design matrix.
• Temporal autocorrelation: in trial-by-trial classification, exclude trials around the test trial from the training set.
• Avoiding double-dipping: any feature selection and tuning of classifier settings should be carried out on the training set only.
• Performance evaluation: do random-effects or mixed-effects inference.
Performance evaluation
Evaluating the performance of a classification algorithm requires a measure of the degree to which unseen examples have been identified with their correct class labels.
The common procedure of averaging the accuracies obtained on individual cross-validation folds is flawed in two ways. First, it does not allow the derivation of a meaningful confidence interval. Second, it yields an optimistic estimate when a biased classifier is tested on an imbalanced dataset.
Both problems can be overcome by replacing the conventional point estimate of accuracy with an estimate of the posterior distribution of the balanced accuracy.
Brodersen, Ong, Buhmann, Stephan (2010) ICPR
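The balanced accuracy itself is simple to compute. This sketch (binary labels assumed; the toy data are invented) also shows how a biased classifier on an imbalanced set fools the plain accuracy but not the balanced one:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    npos = sum(t == 1 for t in y_true)
    nneg = len(y_true) - npos
    return 0.5 * (tp / npos + tn / nneg)

# 9 positives, 1 negative; a degenerate classifier that always says "positive":
y_true = [1] * 9 + [0]
y_pred = [1] * 10
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9, looks good
bacc = balanced_accuracy(y_true, y_pred)                           # 0.5, i.e. chance
```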
Pattern characterization
Example – decoding the identity of the person speaking to the subject in the scanner. [Figure: a 'fingerprint plot' of the voxel response pattern, one plot per class.]
Formisano et al. (2008) Science
Lessons from the Neyman-Pearson lemma
Is there a link between X and Y? To test for a statistical dependency between a contextual variable X and the BOLD signal Y, we compare
• H0: there is no dependency
• Ha: there is some dependency
Which statistical test? 1. Define a test size α (the probability of falsely rejecting H0, i.e., 1 − specificity). 2. Choose the test with the highest power 1 − β (the probability of correctly rejecting H0, i.e., sensitivity).
The Neyman–Pearson lemma: the most powerful test of size α is to reject H0 when the likelihood ratio Λ exceeds a critical value u,
Λ(Y) = p(Y|X) / p(Y) = p(X|Y) / p(X) ≥ u,
with u chosen such that P(Λ(Y) ≥ u | H0) = α.
The null distribution of the likelihood ratio, p(Λ(Y) | H0), can be determined non-parametrically or under parametric assumptions.
This lemma underlies both classical statistics and Bayesian statistics (where Λ(Y) is known as a Bayes factor).
Neyman & Pearson (1933) Phil Trans Roy Soc London
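For two simple Gaussian hypotheses the likelihood ratio is directly computable; a small illustration (the means and variance below are arbitrary choices for the example, not from the slides):

```python
from statistics import NormalDist

def likelihood_ratio(y, mu0=0.0, mu1=1.0, sigma=1.0):
    """Lambda(Y) = p(Y | Ha) / p(Y | H0) for i.i.d. Gaussian samples,
    where H0 has mean mu0 and Ha has mean mu1 (shared sigma)."""
    h0, h1 = NormalDist(mu0, sigma), NormalDist(mu1, sigma)
    lam = 1.0
    for v in y:
        lam *= h1.pdf(v) / h0.pdf(v)
    return lam
```

The Neyman–Pearson test then rejects H0 whenever Λ(Y) exceeds the critical value u chosen so that P(Λ ≥ u | H0) = α; in practice u can be found by simulating Λ under H0.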
Lessons from the Neyman-Pearson lemma
In summary:
1. Inference about how the brain represents things reduces to model comparison.
2. To establish that a link exists between some context X and activity Y, the direction of the mapping is not important.
3. Testing the accuracy of a classifier is not based on Λ and is therefore suboptimal.
Neyman & Pearson (1933) Phil Trans Roy Soc London; Kass & Raftery (1995) J Am Stat Assoc; Friston et al. (2009) NeuroImage
Temporal evolution of discriminability
Example – decoding which button the subject pressed. [Figure: classification accuracy over within-trial time in motor cortex, locked to decision and response, and in frontopolar cortex, where accuracy rises before the decision.]
Soon et al. (2008) Nature Neuroscience
Identification / inferring a representational space
Approach
1. estimation of an encoding model
2. nearest-neighbour classification or voting
Mitchell et al. (2008) Science
65
Reconstruction / optimal decoding
Approach
1. estimation of an encoding model
2. model inversion
Paninski et al. (2007) Progr Brain Res
Pillow et al. (2008) Nature
Miyawaki et al. (2009) Neuron
66
Recent MVB studies
67
Specifying the prior for MVB
1st level – spatial coding hypothesis U: an n-voxels × u-patterns matrix, with voxel weights α = Uη. [Schematic: one pattern allows voxel 2 to play a role; another allows voxel 3 to play a role, but only if its neighbours play weaker roles.]
2nd level – pattern covariance structure Σ:
p(η) = N(η; 0, Σ), Σ = Σ_i λ_i s_i
Thus p(α|λ) = N_n(α; 0, UΣUᵀ) and p(λ) = N(λ; π, Π⁻¹).
Inverting the model
Model inversion involves finding the posterior distribution over the voxel weights α. In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over α.
[Schematic: successive partitions of the patterns into subsets s1, s2, s3, with Σ = λ1 × s1 + λ2 × s2 + λ3 × s3 in the optimal partition. A pattern may make an important positive contribution (be activated), or be allowed to make some contribution independently of the contributions of other patterns.]
Observations vs. predictions (motion contrast): [Figure: observed vs. predicted responses, RXc.]
Using MVB for point classification
MVB may outperform conventional point classifiers (such as a support vector machine) when it uses a more appropriate coding hypothesis.
Model-based analyses by data representation
Model-based analyses: how do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?
Structure-based analyses: which anatomical structures allow us to separate patients and healthy controls?
Activation-based analyses: which functional differences allow us to separate groups?
Model-based clustering
[Figure: supervised learning (SVM classification) vs. unsupervised learning (GMM clustering using effective connectivity). Balanced accuracies range from 55% to 78% across feature sets (regional activity, functional connectivity, effective connectivity); GMM clustering reaches a balanced purity of 71%, with log model evidence used to select the number of clusters K and for model validation.]
Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation)
Generative embedding and DCM
Question 1 – What do the data tell us about hidden processes in the brain?
→ compute the posterior: p(θ|y,m) = p(y|θ,m) p(θ|m) / p(y|m)
Question 2 – Which model is best w.r.t. the observed fMRI data?
→ compute the model evidence: p(m|y) ∝ p(y|m) p(m), with p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ
Question 3 – Which model is best w.r.t. an external criterion, e.g., classifying {patient, control}?
→ compute the classification accuracy: p(h(y) = x) = ∫ p(h(y) = x | y, y_train, x_train) p(y) p(y_train) p(x_train) dy dy_train dx_train
Model-based classification using DCM
[Schematic: model-based classification contrasted with activation-based and structure-based classification of {group 1, group 2}; within the model-based route, model selection vs. inference on model parameters.]