Bayesian inference
J. Daunizeau
Brain and Spine Institute, Paris, France
Wellcome Trust Centre for Neuroimaging, London, UK
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Bayesian paradigm
probability theory: basics
Degree of plausibility desiderata:
- should be represented using real numbers
- should conform with intuition
- should be consistent
• normalization: $\sum_a p(a) = 1$ (D1)
• marginalization: $p(b) = \sum_a p(a, b)$ (D2)
• conditioning: $p(a, b) = p(a \mid b)\, p(b) = p(b \mid a)\, p(a)$ (Bayes rule) (D3)
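These three rules are easy to check numerically. Below is a minimal sketch using a hypothetical discrete joint distribution $p(a, b)$ (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical joint distribution p(a, b) over two binary variables
p_ab = np.array([[0.3, 0.1],   # a = 0
                 [0.2, 0.4]])  # a = 1

assert np.isclose(p_ab.sum(), 1.0)      # normalization (D1)

p_a = p_ab.sum(axis=1)                  # marginalization (D2): p(a) = sum_b p(a, b)
p_b = p_ab.sum(axis=0)

p_a_given_b = p_ab / p_b                # conditioning (D3): p(a | b) = p(a, b) / p(b)
p_b_given_a = p_ab / p_a[:, None]

# Bayes rule: p(a | b) = p(b | a) p(a) / p(b)
assert np.allclose(p_b_given_a * p_a[:, None] / p_b, p_a_given_b)
```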
Bayesian paradigm
deriving the likelihood function
- Model of data with unknown parameters:
$$y = f(\theta)$$
e.g., GLM: $f(\theta) = X\theta$
- But data are noisy:
$$y = f(\theta) + \varepsilon$$
- Assume noise/residuals are 'small':
$$p(\varepsilon) \propto \exp\!\left(-\frac{1}{2\sigma^2}\,\varepsilon^2\right), \qquad \text{e.g. } P(|\varepsilon| > 4) \approx 0.05$$
→ Distribution of data, given fixed parameters:
$$p(y \mid \theta) \propto \exp\!\left(-\frac{1}{2\sigma^2}\,\big(y - f(\theta)\big)^2\right)$$
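A sketch of this Gaussian likelihood for the GLM case, with simulated $X$, $y$ and $\sigma$ (all hypothetical, not SPM code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated GLM: y = X @ theta + noise, with known noise std sigma
n, p = 50, 2
sigma = 2.0
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + rng.normal(scale=sigma, size=n)

def log_likelihood(theta):
    """ln p(y | theta) = -n/2 ln(2 pi sigma^2) - ||y - X theta||^2 / (2 sigma^2)."""
    resid = y - X @ theta
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - resid @ resid / (2 * sigma**2)

# The true parameters should explain the data better than a null model
print(log_likelihood(theta_true), log_likelihood(np.zeros(p)))
```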
Bayesian paradigm
likelihood, priors and the model evidence
Likelihood: $p(y \mid \theta, m)$
Prior: $p(\theta \mid m)$
Bayes rule: $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$
generative model m
Bayesian paradigm
forward and inverse problems
forward problem: likelihood $p(y \mid \theta, m)$
inverse problem: posterior distribution $p(\theta \mid y, m)$
Bayesian paradigm
model comparison
Principle of parsimony: "plurality should not be assumed without necessity"
Model evidence:
$$p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$
"Occam's razor": a complex model spreads its predictions $y = f(x)$ thinly over the space of all data sets, so the evidence $p(y \mid m)$ automatically penalizes unnecessary complexity.
[figure: model evidence $p(y \mid m)$ plotted over the space of all data sets, for models of increasing complexity]
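The evidence integral can be approximated on a grid for a toy comparison (the simulated data and prior width are assumptions): a point-null model $m_0$ against a diffuse alternative $m_1$, with the data actually generated under $m_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=1.0, size=20)   # data generated under theta = 0

sigma = 1.0
theta = np.linspace(-5, 5, 2001)              # parameter grid
dth = theta[1] - theta[0]

# ln p(y | theta) for i.i.d. Gaussian data with mean theta
loglik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                - (y[:, None] - theta)**2 / (2 * sigma**2), axis=0)

# m0: point-mass prior at theta = 0 -> evidence is the likelihood at theta = 0
i0 = np.argmin(np.abs(theta))
log_ev_m0 = loglik[i0]

# m1: diffuse prior N(0, 3^2) -> evidence averages the likelihood over the prior
prior = np.exp(-theta**2 / (2 * 3.0**2)) / np.sqrt(2 * np.pi * 3.0**2)
log_ev_m1 = np.log(np.sum(np.exp(loglik) * prior) * dth)

print("ln BF01 =", log_ev_m0 - log_ev_m1)     # typically > 0: Occam favours m0
```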
Hierarchical models
principle
[figure: a hierarchical generative model with several levels; causality runs top-down from hidden causes to data, inference runs bottom-up from data to causes]
Hierarchical models
directed acyclic graphs (DAGs)
Frequentist versus Bayesian inference
a (quick) note on hypothesis testing
• define the null, e.g.: $H_0: \theta = 0$
• define two alternative models, e.g.:
$$m_0: p(\theta \mid m_0) = \begin{cases} 1 & \text{if } \theta = 0 \\ 0 & \text{otherwise} \end{cases} \qquad\qquad m_1: p(\theta \mid m_1) = N(0, \Sigma)$$
• estimate parameters (obtain the test statistic $t = t(Y)$), or the marginal likelihoods $p(Y \mid m_0)$ and $p(Y \mid m_1)$
[figure: null distribution $p(t \mid H_0)$ with tail probability $P(t > t^* \mid H_0)$; marginal likelihoods $p(Y \mid m_0)$ and $p(Y \mid m_1)$ over the space of all datasets]
• apply decision rule, i.e.:
classical SPM: if $P(t > t^* \mid H_0) \le \alpha$, then reject $H_0$
Bayesian Model Comparison: if $\dfrac{P(m_0 \mid y)}{P(m_1 \mid y)} \ge u$, then accept $m_0$
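To make the two decision rules concrete, here is a sketch on simulated data (the effect size, prior width $\Sigma = 3^2$ and grid are assumptions): a one-sample t-test next to a Savage-Dickey estimate of the Bayes factor for the same point null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(loc=0.3, scale=1.0, size=30)   # small true effect (hypothetical)

# classical rule: reject H0 if P(t > t* | H0) <= alpha
t_stat, p_val = stats.ttest_1samp(y, popmean=0.0)
print(f"t = {t_stat:.2f}, p = {p_val:.3f} -> reject H0: {p_val <= 0.05}")

# Bayesian rule: posterior odds of m0 (theta = 0) vs m1 (theta ~ N(0, 3^2)),
# via the Savage-Dickey ratio BF01 = p(theta = 0 | y, m1) / p(theta = 0 | m1)
sigma = 1.0
theta = np.linspace(-5, 5, 2001)
dth, i0 = theta[1] - theta[0], np.argmin(np.abs(theta))
loglik = np.sum(stats.norm.logpdf(y[:, None], theta, sigma), axis=0)
prior = stats.norm.pdf(theta, 0.0, 3.0)
post = np.exp(loglik - loglik.max()) * prior
post /= post.sum() * dth                      # normalized posterior on the grid
bf01 = post[i0] / prior[i0]                   # equals P(m0|y)/P(m1|y) for equal model priors
print(f"BF01 = {bf01:.2f} -> accept m0: {bf01 >= 1}")
```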
Family-level inference
trading model resolution against statistical power
model selection error risk:
$$P(e = 1 \mid y) = 1 - \max_m P(m \mid y)$$
e.g., for four network models (nodes A and B, input u): $P(m_1 \mid y) = 0.04$, $P(m_2 \mid y) = 0.25$, $P(m_3 \mid y) = 0.01$, $P(m_4 \mid y) = 0.7$ → error risk $\approx 0.3$
family inference (pool statistical evidence):
$$P(f \mid y) = \sum_{m \in f} P(m \mid y), \qquad P(e = 1 \mid y) = 1 - \max_f P(f \mid y)$$
e.g., $P(f_1 \mid y) = 0.05$, $P(f_2 \mid y) = 0.95$ → error risk $\approx 0.05$
[figure: the four network models grouped into two families]
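The pooling step in code, using the slide's posteriors (the family assignment $f_1 = \{m_1, m_3\}$, $f_2 = \{m_2, m_4\}$ is an assumption, chosen to be consistent with the quoted family posteriors):

```python
import numpy as np

p_m = np.array([0.04, 0.25, 0.01, 0.70])   # P(m1..m4 | y) from the slide

model_risk = 1 - p_m.max()                 # model selection error risk ~ 0.3
print(f"model-level error risk: {model_risk:.2f}")

# pool evidence within families (assumed grouping: f1 = {m1, m3}, f2 = {m2, m4})
families = {"f1": [0, 2], "f2": [1, 3]}
p_f = {f: p_m[idx].sum() for f, idx in families.items()}

family_risk = 1 - max(p_f.values())        # family-level error risk ~ 0.05
print(p_f, f"-> family-level error risk: {family_risk:.2f}")
```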
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Sampling methods
MCMC example: Gibbs sampling
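A minimal Gibbs-sampling sketch (the bivariate-Gaussian target is a hypothetical stand-in for the slide's example): each iteration samples one variable from its full conditional given the other.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples = 0.8, 5000          # target: standard bivariate Gaussian, correlation rho
x1 = x2 = 0.0
samples = np.empty((n_samples, 2))

for i in range(n_samples):
    # full conditionals: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[i] = x1, x2

print("empirical correlation:", np.corrcoef(samples.T)[0, 1])   # ~ 0.8
```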
Variational methods
VB / EM / ReML
→ VB: maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint
[figure: the true joint posterior $p(\theta_1, \theta_2 \mid y, m)$, its exact marginals $p(\theta_1 \text{ or } \theta_2 \mid y, m)$, and the approximate marginals $q(\theta_1 \text{ or } \theta_2)$]
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
[figure: the SPM analysis pipeline - realignment, segmentation and normalisation (to a template), smoothing, general linear model, statistical inference via Gaussian field theory (p < 0.05), and its Bayesian components: posterior probability maps (PPMs), multivariate decoding, dynamic causal modelling]
aMRI segmentation
mixture of Gaussians (MoG) model
[figure: graphical model of the MoG - the $i$th voxel value $y_i$ is generated from its tissue-class label $c_i$ (grey matter, white matter, CSF), with class frequencies $\lambda_k$, class means $\mu_k$ and class variances $\sigma_k^2$]
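A minimal EM sketch for this MoG model on 1-D intensities (the class parameters and data are hypothetical; SPM's segmentation additionally models the bias field and uses spatial tissue priors):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(30, 5, 300),    # e.g. CSF intensities
                    rng.normal(60, 5, 400),    # grey matter
                    rng.normal(90, 5, 300)])   # white matter

k = 3
lam = np.full(k, 1 / k)                        # class frequencies
mu = np.array([20.0, 50.0, 100.0])             # class means (rough initial guesses)
var = np.full(k, 100.0)                        # class variances

for _ in range(50):
    # E-step: responsibilities p(c_i = k | y_i) under current parameters
    r = lam * stats.norm.pdf(y[:, None], mu, np.sqrt(var))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate frequencies, means and variances
    nk = r.sum(axis=0)
    lam = nk / len(y)
    mu = (r * y[:, None]).sum(axis=0) / nk
    var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print(np.round(mu, 1), np.round(np.sqrt(var), 1), np.round(lam, 2))
```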
Decoding of brain images
recognizing brain states from fMRI
[figure: paradigm - fixation cross, then a paced motor response]
log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment
fMRI time series analysis
spatial priors and model comparison
[figure: hierarchical model for fMRI time series - GLM coefficients with a spatial prior variance, a prior variance on the data noise, and AR coefficients (correlated noise)]
[figure: PPMs - regions best explained by the short-term memory model vs the long-term memory model, each with its own design matrix (X)]
Dynamic Causal Modelling
network structure identification
[figure: four candidate models $m_1$-$m_4$ of the stim → V1 → V5 → PPC network, differing in which connection attention modulates]
[figure: the models' marginal likelihoods $\ln p(y \mid m)$, favouring $m_4$; estimated effective synaptic strengths for the best model ($m_4$) are shown on its network diagram]
DCMs and DAGs
a note on causality
$$\dot{x} = f(x, u, \theta)$$
[figure: a DCM unrolled over time ($t = 0 \to t \to t + \Delta t$) - states $x_1, x_2, x_3$ evolve under couplings such as $\theta_{21}, \theta_{32}, \theta_{13}$, with the input $u$ modulating some of them (e.g. $\theta_{13u}$)]
Dynamic Causal Modelling
model comparison for group studies
differences in log-model evidences:
$$\ln p(y \mid m_1) - \ln p(y \mid m_2)$$
[figure: log-evidence differences between $m_1$ and $m_2$ across subjects]
fixed effect: assume all subjects correspond to the same model
random effect: assume different subjects might correspond to different models
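Under the fixed-effect assumption, group evidence simply accumulates: subject log-evidences add, giving a group Bayes factor. A sketch with hypothetical per-subject log-evidences (a random-effects analysis would instead estimate how models are distributed across subjects):

```python
import numpy as np

# Hypothetical log-evidences: rows = subjects, columns = models m1, m2
log_ev = np.array([[-120.3, -118.9],
                   [ -98.7,  -99.5],
                   [-110.2, -105.1]])

# fixed effect: all subjects share one model, so log-evidences sum over subjects
group_log_bf = np.sum(log_ev[:, 0] - log_ev[:, 1])   # ln group Bayes factor, m1 vs m2
print(f"group log Bayes factor (m1 vs m2): {group_log_bf:.1f}")
```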
Thank you for your attention.
A note on statistical significance
lessons from the Neyman-Pearson lemma
• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
$$\Lambda = \frac{p(y \mid H_1)}{p(y \mid H_0)} \ge u$$
is the most powerful test of size $\alpha = P(\Lambda \ge u \mid H_0)$ for testing the null.
• what is the threshold $u$ above which the Bayes factor test yields a type I error rate of 5%?
[figure: ROC analysis (power = 1 − type II error rate vs type I error rate) - MVB (Bayes factor): u = 1.09, power = 56%; CCA (F-statistic): F = 2.20, power = 20%]
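The ROC logic can be emulated by simulation (the Gaussian toy model and effect size below are assumptions, not the slide's MVB/CCA analysis): calibrate the threshold $u$ under $H_0$, then read off power under $H_1$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, n, n_sim = 0.5, 20, 5000                 # hypothetical effect size and sample sizes

def log_lr(y):
    """ln [p(y | H1) / p(y | H0)] for i.i.d. N(theta, 1) data, H1: theta = mu."""
    return np.sum(stats.norm.logpdf(y, mu, 1) - stats.norm.logpdf(y, 0, 1), axis=-1)

lr_h0 = log_lr(rng.normal(0.0, 1.0, size=(n_sim, n)))   # statistic under the null
lr_h1 = log_lr(rng.normal(mu, 1.0, size=(n_sim, n)))    # statistic under the alternative

u = np.quantile(lr_h0, 0.95)                 # threshold giving a 5% type I error rate
power = np.mean(lr_h1 > u)                   # 1 - type II error rate at that threshold
print(f"u = {u:.2f}, power = {power:.2f}")
```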