Bayesian Inference
Lee Harrison
York Neuroimaging Centre
14/05/2010

[Figure: SPM analysis pipeline — realignment → smoothing → general linear model → statistical inference (p < 0.05, Gaussian field theory), with normalisation to a template. Bayesian components highlighted: Bayesian segmentation and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), dynamic causal modelling.]

Attention to Motion Paradigm — Results
• Conditions: fixation only; observe static dots (+ photic); observe moving dots (+ motion); task on moving dots (+ attention)
• Photic stimulation activates V1; motion activates V5; attention activates V5 + parietal cortex (SPC, V3A, V5+ in the attention – no attention contrast)
• Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain

Attention to Motion Paradigm — Dynamic Causal Models
• Regions V1, V5, SPC; inputs: photic, motion, attention
• Model 1 (forward): attentional modulation of the V1 → V5 connection
• Model 2 (backward): attentional modulation of the SPC → V5 connection
• Bayesian model selection: which model is optimal?

Responses to Uncertainty
• Long-term memory vs. short-term memory

Responses to Uncertainty — Paradigm
• Stimuli: a sequence of randomly sampled discrete events (trials 1 … 40)
• Model: a simple computational model of an observer's response to uncertainty, based on the number of past events (extent of memory)
• Question: which regions are best explained by the short-term vs. the long-term memory model?

Overview
• Introductory remarks
• Some probability densities/distributions
• Probabilistic (generative) models
• Bayesian inference
• A simple example – Bayesian linear regression
• SPM applications
  – Segmentation
  – Dynamic causal modelling
  – Spatial models of fMRI time series

Probability distributions and densities (k = 2)
[Figure sequence: examples of discrete and continuous densities.] The recoverable forms are a discrete (multinomial) density of the form p(s | μ) = ∏_k μ_k^{s_k} and a univariate Gaussian parameterised by its precision λ, p(x | μ, λ) ∝ exp(−λ (x − μ)² / 2).

Generative models
• Generation: data y are produced from parameters θ = {θ_1, θ_2, …} through a hierarchical linear model in which each level generates the level below with Gaussian disturbances, e.g. y = X_1 θ_1 + e_1, θ_1 = X_2 θ_2 + e_2, with e_3 ~ N(0, λ_3^{-1}) at the top level
• Estimation inverts the model to recover an approximate posterior q(θ) over the parameters
[Figure: generation (parameters → data) and estimation (data → q(θ)) shown over space and time.]

Bayesian statistics
posterior ∝ likelihood × prior:  p(θ | y) ∝ p(y | θ) p(θ)
(new data enter through the likelihood p(y | θ); prior knowledge through p(θ))
Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. The "posterior" probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.

Bayes' rule
Given data y and parameters θ, their joint probability can be written in two ways:
p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y)
Eliminating p(y, θ) gives Bayes' rule:
p(θ | y) = p(y | θ) p(θ) / p(y)   (posterior = likelihood × prior / evidence)

Principles of Bayesian inference
• Formulation of a generative model: likelihood p(y | θ) and prior distribution p(θ)
• Observation of data y
• Update of beliefs based upon observations, given a prior state of knowledge: p(θ | y) ∝ p(y | θ) p(θ)

Univariate Gaussian
Normal densities:  y = θ + e
prior p(θ) = N(θ; μ_p, λ_p^{-1}), likelihood p(y | θ) = N(y; θ, λ_e^{-1})
posterior p(θ | y) = N(θ; μ, λ^{-1}) with
λ = λ_e + λ_p
μ = (λ_e y + λ_p μ_p) / λ
Posterior mean = precision-weighted combination of prior mean and data mean.

Bayesian GLM: univariate case
Normal densities:  y = xθ + e
prior p(θ) = N(θ; μ_p, λ_p^{-1}), likelihood p(y | θ) = N(y; xθ, λ_e^{-1})
posterior p(θ | y) = N(θ; μ, λ^{-1}) with
λ = λ_e x² + λ_p
μ = (λ_e x y + λ_p μ_p) / λ

Bayesian GLM: multivariate case
Normal densities:  y = Xθ + e
prior p(θ) = N(θ; μ_p, C_p), likelihood p(y | θ) = N(y; Xθ, C_e)
posterior p(θ | y) = N(θ; μ, C) with
C^{-1} = X^T C_e^{-1} X + C_p^{-1}
μ = C (X^T C_e^{-1} y + C_p^{-1} μ_p)
One step if C_e and C_p are known; otherwise iterative estimation.
[Figure: prior, likelihood and posterior contours over (β_1, β_2).]
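A minimal numerical sketch of the one-step multivariate Bayesian GLM update above, assuming C_e and C_p are known; the function name, variable names and synthetic data are illustrative only:

```python
import numpy as np

def bayesian_glm(X, y, mu_p, C_p, C_e):
    """One-step posterior for y = X @ theta + e with known covariances.

    Prior:      theta ~ N(mu_p, C_p)
    Likelihood: y | theta ~ N(X @ theta, C_e)
    Posterior:  theta | y ~ N(mu, C) with
                C^{-1} = X' Ce^{-1} X + Cp^{-1}
                mu     = C (X' Ce^{-1} y + Cp^{-1} mu_p)
    """
    Ce_inv = np.linalg.inv(C_e)
    Cp_inv = np.linalg.inv(C_p)
    C = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)   # posterior covariance
    mu = C @ (X.T @ Ce_inv @ y + Cp_inv @ mu_p)    # precision-weighted mean
    return mu, C

# Illustrative data: 2 regressors, 50 scans
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + 0.3 * rng.standard_normal(50)

mu, C = bayesian_glm(X, y,
                     mu_p=np.zeros(2),         # prior mean
                     C_p=np.eye(2),            # prior covariance
                     C_e=0.09 * np.eye(50))    # error covariance (assumed known)
print(mu)  # shrinks the OLS estimate towards the prior mean
```

With a single regressor this reduces to the univariate update λ = λ_e x² + λ_p, μ = (λ_e x y + λ_p μ_p) / λ shown on the previous slide.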
Approximate inference: optimisation
• The true posterior p(θ | y, m) is approximated by a simpler density q(θ), often with a mean-field factorisation q(θ) = ∏_i q(θ_i)
• The approximate posterior is improved iteratively by maximising the free energy
  F = ∫ q(θ) log [ p(y, θ | m) / q(θ) ] dθ
• The log model evidence decomposes as
  log p(y | m) = F + ∫ q(θ) log [ q(θ) / p(θ | y, m) ] dθ
  so maximising F with respect to q both tightens the bound on log p(y | m) and drives q(θ) towards the true posterior
[Figure: objective function vs. value of parameter, showing the approximation approaching the true posterior.]

Simple example – linear regression
• Data y and bases (explanatory variables) X
• Ordinary least squares: y = Xβ + e, with the sum of squared errors
  E_D = (y − Xβ)^T (y − Xβ)
  ∂E_D/∂β = 0  ⟹  β̂_ols = (X^T X)^{-1} X^T y
[Figure sequence: data and model fits of increasing complexity as more basis functions are added.]

Simple example – linear regression: over-fitting
• Over-fitting: the model fits the noise
• Inadequate cost function: blind to overly complex models
• Solution: include uncertainty in the model parameters

Bayesian linear regression: priors and likelihood
Model: y = Xβ + e
Prior: p(β | α_2) = N_k(β; 0, α_2^{-1} I_k) ∝ exp(−α_2 β^T β / 2)
Likelihood: p(y | β, α_1) = ∏_{i=1}^{N} p(y_i | β, α_1), with p(y_i | β, α_1) = N(y_i; X_i β, α_1^{-1}) ∝ exp(−α_1 (y_i − X_i β)² / 2)
[Figures: mean curve and sample curves drawn from the prior (before observing any data), then conditioned on increasing amounts of data.]

Bayesian linear regression: posterior
Bayes' rule: p(β | y, α) ∝ p(y | β, α_1) p(β | α_2)
Posterior: p(β | y, α) = N(β; μ, C) with
C^{-1} = α_1 X^T X + α_2 I_k
μ = α_1 C X^T y
[Figures: posterior mean curve and sample curves as data points accumulate.]

Posterior Probability Maps (PPMs)
• Posterior distribution: probability of the effect given the data, p(β | y)
  – mean: size of effect
  – precision: variability
• Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold s_th, given the data y:
  p(β > s_th | y) ≥ p_th
• Two thresholds:
  – activation threshold s_th: percentage of whole-brain mean signal (physiologically relevant size of effect)
  – probability threshold p_th that voxels must exceed to be displayed (e.g. 95%)
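A small sketch tying the posterior above to the PPM thresholding rule, treating α_1 and α_2 as known; the function names, threshold values and data are illustrative, not SPM's implementation:

```python
import numpy as np
from scipy.stats import norm

def blr_posterior(X, y, alpha1, alpha2):
    """Posterior N(mu, C) for y = X @ beta + e under the prior
    beta ~ N(0, alpha2^{-1} I) with noise precision alpha1:
        C^{-1} = alpha1 X'X + alpha2 I,   mu = alpha1 C X'y
    """
    k = X.shape[1]
    C = np.linalg.inv(alpha1 * X.T @ X + alpha2 * np.eye(k))
    mu = alpha1 * C @ X.T @ y
    return mu, C

def exceedance_prob(mu, C, s_th):
    """Per-coefficient posterior probability p(beta_j > s_th | y)."""
    sd = np.sqrt(np.diag(C))
    return 1.0 - norm.cdf(s_th, loc=mu, scale=sd)

# Illustrative use: report coefficients whose probability of exceeding the
# activation threshold s_th is at least p_th (e.g. 95%)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([0.8, 0.0, -0.4]) + 0.5 * rng.standard_normal(100)

mu, C = blr_posterior(X, y, alpha1=4.0, alpha2=1.0)
p = exceedance_prob(mu, C, s_th=0.2)
print(p > 0.95)  # analogous to displaying only voxels with p(beta > s_th | y) >= p_th
```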
Bayesian linear regression: model selection
Bayes' rule with an explicit model index m:
p(β | y, α, m) = p(y | β, α, m) p(β | α, m) / p(y | α, m)
The normalising constant is the model evidence:
p(y | α, m) = ∫ p(y | β, α, m) p(β | α, m) dβ
log p(y | α, m) = accuracy(m) − complexity(m)
The accuracy term measures fit (it falls with the residual sum of squares ‖y − Xμ‖²), while the complexity term grows with the number of regressors k, so the evidence automatically penalises overly complex models.

aMRI segmentation
[Figure: generative mixture model for anatomical MRI segmentation — the i-th voxel value y_i is generated from a hidden voxel label with class prior frequencies, class means μ_1, μ_2, μ_3 and class variances σ_1², σ_2², σ_3².]
Inference yields, for each voxel, the PPM of belonging to grey matter, white matter or CSF.

Dynamic Causal Modelling: generative model for fMRI and ERPs
• Neural state equation: ẋ = F(x, u, θ), driven by inputs u
• Hemodynamic forward model: neural activity → BOLD (fMRI)
• Electric/magnetic forward model: neural activity → EEG / MEG / LFP
• fMRI neural model: 1 state variable per region, bilinear state equation, no propagation delays
• ERP neural model: 8 state variables per region, nonlinear state equation, propagation delays

Bayesian Model Selection for fMRI
• Four DCMs (m1–m4) of the attention-to-motion data, differing in which connections among stim → V1 → V5 → PPC are modulated by attention
• The marginal likelihood ln p(y | m) favours m4
[Figure: bar plot of ln p(y | m) for m1–m4, and network diagram showing the estimated effective synaptic strengths for the best model (m4).]
[Stephan et al., Neuroimage, 2008]

fMRI time series analysis with spatial priors
• GLM for the voxel-wise time series: Y = Xβ + error, with an autoregressive (AR) model for the correlated noise
• Spatial prior on the GLM coefficients, p(β) = N(0, α^{-1} L^{-1}), where L is a spatial precision matrix and α controls the degree of smoothness
• Hyperpriors on the prior precisions of the GLM coefficients, the AR coefficients and the data noise
• Contrast with the classical approach of smoothing Y and using random field theory (RFT): the ML estimate of β is replaced by a VB estimate under the spatial prior (illustrated on an aMRI background)
Penny et al., 2005

fMRI time series analysis with spatial priors: posterior probability maps
• Posterior density q(β_n) = N(μ_n, Σ_n): probability of getting an effect, given the data
  – mean: size of effect (Cbeta_*.img)
  – covariance: uncertainty (std. dev. in SDbeta_*.img)
• Display only voxels whose probability mass above the activation threshold s_th exceeds e.g. 95%:
  p_n = p(β_n > s_th) ≥ p_th (written to the PPM, spmP_*.img)

fMRI time series analysis with spatial priors: Bayesian model selection
• The free energy approximates the log evidence: log p(y | m) ≈ F(q)
• Compute a log-evidence map for each model (1 … K) and each subject (1 … N)
• Combine them into BMS maps: posterior and exceedance probability maps (PPM, EPM) giving, per voxel, the probability that model k generated the data, e.g. q(r_k > 0.5) = 0.941
Joao et al., 2009

Reminder… (responses to uncertainty: long-term vs. short-term memory)
Compare two models:
• short-term memory model: information-theoretic (IT) indices H, h, I, i entered as regressors at the event onsets
• long-term memory model: the same IT indices, which are smoother because they depend on a longer history of past events
[Figure: example regressors, with missed trials marked.]
H = entropy; h = surprise; I = mutual information; i = mutual surprise

Group data: Bayesian Model Selection maps
• Regions best explained by the short-term memory model: primary visual cortex
• Regions best explained by the long-term memory model: frontal cortex (executive control)

Thank you