The General Linear Model and Statistical Parametric Mapping

Statistical Analysis
Rik Henson
With thanks to:
Karl Friston, Andrew Holmes, Stefan Kiebel, Will Penny
Overview
[Figure: SPM analysis pipeline: fMRI time-series → Motion correction → Spatial normalisation (Standard template) → Smoothing (kernel) → General Linear Model (Design matrix) → Parameter Estimates → Statistical Parametric Map]
Some Terminology
• SPM (“Statistical Parametric Mapping”) is a massively
univariate approach - meaning that a statistic (e.g., T-value) is
calculated for every voxel - using the “General Linear Model”
• Experimental manipulations are specified in a model (“design
matrix”) which is fit to each voxel to estimate the size of the
experimental effects (“parameter estimates”) in that voxel…
• … on which one or more hypotheses (“contrasts”) are tested to
make statistical inferences (“p-values”), correcting for multiple
comparisons across voxels (using “Gaussian Field Theory”)
• The parametric statistics assume continuous-valued data and
additive noise that conforms to a “normal” distribution
(“nonparametric” versions of SPM eschew such assumptions)
Some Terminology
• SPM usually focuses on “functional specialisation”, i.e. localising different functions to different regions in the brain
• One might also be interested in “functional integration”: how different regions (voxels) interact
• Multivariate approaches work on whole images and can identify spatial/temporal patterns over voxels, without necessarily specifying a design matrix (PCA, ICA)...
• … or with an experimental design matrix (PLS, CVA), or with an explicit anatomical model of connectivity between regions (“effective connectivity”), e.g. using Dynamic Causal Modelling
Overview
1. General Linear Model
Design Matrix
Global normalisation
2. fMRI timeseries
Highpass filtering
HRF convolution
Temporal autocorrelation
3. Statistical Inference
Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity
General Linear Model…
• Parametric statistics:
  – one sample t-test
  – two sample t-test
  – paired t-test
  – ANOVA
  – AnCova
  – correlation
  – linear regression
  – multiple regression
  – F-tests
  – etc…
  are all cases of the General Linear Model
General Linear Model
• Equation for single (and all) voxels:
  $y_j = x_{j1}\beta_1 + \dots + x_{jP}\beta_P + \varepsilon_j$
  $y_j$: data for scan j = 1…N
  $x_{jp}$: explanatory variables / covariates / regressors, p = 1…P
  $\beta_p$: parameters / regression slopes / fixed effects
  $\varepsilon_j$: residual errors, independent & identically (normally) distributed, $\varepsilon_j \sim N(0, \sigma^2)$
• Equivalent matrix form:
  $y = X\beta + \varepsilon$
  X: “design matrix” / model
Matrix Formulation
• The equation for scan j, written out for scans 1…N, gives a set of simultaneous equations (rows of X = scans, columns of X = regressors)…
• …that can be solved for the parameters $\beta_1 \dots \beta_P$
General Linear Model (Estimation)
• Estimate parameters from a least squares fit to the data, y:
  $\hat\beta = (X^T X)^{-1} X^T y = X^+ y$ (OLS estimates)
• Fitted response is:
  $\hat{Y} = X\hat\beta$
• Residual errors and estimated error variance are:
  $r = y - \hat{Y}$
  $\hat\sigma^2 = r^T r / df$
  where df are the degrees of freedom (assuming iid):
  $df = N - \mathrm{rank}(X)$ (= N − P if X is full rank)
  (equivalently, with $R = I - XX^+$: $r = Ry$ and $df = \mathrm{trace}(R)$)
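A minimal NumPy sketch of the estimation steps above, assuming a small hypothetical design (an alternating on/off regressor plus a constant) and simulated data. `np.linalg.pinv` plays the role of X⁺, which also copes with rank-deficient designs:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
# Hypothetical design: alternating on/off regressor plus a constant term
X = np.column_stack([np.tile([1.0, 0.0], N // 2), np.ones(N)])
y = X @ np.array([2.0, 10.0]) + rng.standard_normal(N)   # simulated data

X_pinv = np.linalg.pinv(X)        # X^+ (equals (X'X)^-1 X' when X has full column rank)
beta_hat = X_pinv @ y             # OLS parameter estimates
Y_hat = X @ beta_hat              # fitted response
r = y - Y_hat                     # residuals

R = np.eye(N) - X @ X_pinv        # residual-forming matrix, so that r = R y
df = np.trace(R)                  # = N - rank(X)
sigma2_hat = r @ r / df           # estimated error variance
print(beta_hat, df, sigma2_hat)
```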
General Linear Model (Inference)
• Specify a contrast (hypothesis), c: a linear combination of parameter estimates, $c^T\hat\beta$
  e.g. T-contrast c = [1 -1 0 0]
• Calculate a T-statistic for that contrast (c is a vector):
  $T_{N-p} = c^T\hat\beta / \sqrt{\mathrm{var}(c^T\hat\beta)} = c^T\hat\beta / \sqrt{\hat\sigma^2 c^T (X^T X)^{-1} c}$
  or an F-statistic (c is a matrix):
  $F_{p-p_0,\,N-p} = \dfrac{(r_0^T r_0 - r^T r)/(p - p_0)}{r^T r/(N - p)}$
  where p = rank(X), and $r_0$ and $p_0$ are the residuals and rank of the reduced model specified by c
  e.g. F-contrast c = [2 -1 -1 0; -1 2 -1 0; -1 -1 2 0]
• p-value: the probability of falsely rejecting the null hypothesis, H0: $c^T\beta = 0$
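The F-statistic can be sketched by fitting the full and reduced models directly. For illustration this assumes the 12-scan PET data of the example that follows, and takes the reduced model for the main-effect F-contrast to be the grand mean alone (the helper name `rss_and_rank` is ours):

```python
import numpy as np

def rss_and_rank(X, y):
    """Residual sum of squares and rank of X for an OLS fit (pinv copes with rank deficiency)."""
    r = y - X @ (np.linalg.pinv(X) @ y)
    return r @ r, np.linalg.matrix_rank(X)

y = np.array([11, 9, 12, 8, 21, 19, 22, 18, 31, 29, 32, 28], dtype=float)
conditions = np.repeat(np.eye(3), 4, axis=0)     # indicator columns for A, B, C
X_full = np.column_stack([conditions, np.ones(12)])
X_reduced = np.ones((12, 1))                     # assumed reduced model: grand mean only

rss, p = rss_and_rank(X_full, y)
rss0, p0 = rss_and_rank(X_reduced, y)
N = len(y)
F = ((rss0 - rss) / (p - p0)) / (rss / (N - p))
print(p - p0, N - p, F)
```

Here the full model leaves r^T r = 30 on 9 df and the grand-mean model leaves 830, so the main effect of condition is very large.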
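Example PET experiment
• 12 scans, 3 conditions (1-way ANOVA)
  $y_j = x_{1j}\beta_1 + x_{2j}\beta_2 + x_{3j}\beta_3 + x_{4j}\beta_4 + \varepsilon_j$
  where (dummy) variables:
  $x_{1j} \in \{0,1\}$: condition A (first 4 scans)
  $x_{2j} \in \{0,1\}$: condition B (second 4 scans)
  $x_{3j} \in \{0,1\}$: condition C (third 4 scans)
  $x_{4j} = 1$: grand mean
• T-contrast:
  [1 -1 0 0] tests whether A > B
  [-1 1 0 0] tests whether B > A
• F-contrast:
  [2 -1 -1 0; -1 2 -1 0; -1 -1 2 0] tests the main effect of A, B, C
• Data and fit, $y = X\beta + \varepsilon$, with rank(X) = 3:
  $y = [11, 9, 12, 8, 21, 19, 22, 18, 31, 29, 32, 28]^T$
  rows of X (columns A, B, C, mean): [1 0 0 1] for scans 1-4, [0 1 0 1] for scans 5-8, [0 0 1 1] for scans 9-12
  $\beta = [-10, 0, 10, 20]^T$, $\varepsilon = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1, 2, -2]^T$
• c = [-1 1 0 0]: $\hat\sigma^2 = r^Tr/df = 30/9 \approx 3.33$ and $c^T(X^TX)^- c = 1/4 + 1/4 = 0.5$, so T = 10/sqrt(3.33 × 0.5) ≈ 7.75
  df = 12 − 3 = 9, T(9) ≈ 7.75, p < .05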
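A NumPy sketch that recomputes this example from the data and design shown. Because X is rank deficient, `np.linalg.pinv` returns the minimum-norm solution β̂ = [-5, 5, 15, 15] rather than the [-10, 0, 10, 20] listed above, but since c = [-1 1 0 0] is estimable, c^T β̂ = 10 either way:

```python
import numpy as np

# Data from the 12-scan, 3-condition PET example (4 scans per condition)
y = np.array([11, 9, 12, 8, 21, 19, 22, 18, 31, 29, 32, 28], dtype=float)

# Design matrix: condition indicators A, B, C plus a grand-mean column
A = np.repeat([1, 0, 0], 4)
B = np.repeat([0, 1, 0], 4)
C = np.repeat([0, 0, 1], 4)
X = np.column_stack([A, B, C, np.ones(12)]).astype(float)

beta = np.linalg.pinv(X) @ y                      # minimum-norm OLS solution
r = y - X @ beta                                  # residuals
df = X.shape[0] - np.linalg.matrix_rank(X)        # 12 - 3
sigma2 = r @ r / df                               # estimated error variance

c = np.array([-1, 1, 0, 0], dtype=float)          # tests B > A
T = (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
print(df, round(sigma2, 2), round(T, 2))
```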
Global Effects
• There may be variation in PET tracer dose from scan to scan
• Such “global” changes in image intensity (gCBF) confound the local/regional (rCBF) changes of the experiment
• Adjust for global effects by:
  - AnCova (Additive Model): PET?
  - Proportional Scaling: fMRI?
• Can improve statistics when orthogonal to effects of interest (as here)…
• …but can also worsen them when effects of interest are correlated with global (as next)
[Figure: regional signal plotted against global signal, adjusted by AnCova and by proportional scaling]
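Global Effects (AnCova)
• 12 scans, 3 conditions, 1 confounding covariate
  $y_j = x_{1j}\beta_1 + x_{2j}\beta_2 + x_{3j}\beta_3 + x_{4j}\beta_4 + x_{5j}\beta_5 + \varepsilon_j$
  where (dummy) variables:
  $x_{1j} \in \{0,1\}$: condition A (first 4 scans)
  $x_{2j} \in \{0,1\}$: condition B (second 4 scans)
  $x_{3j} \in \{0,1\}$: condition C (third 4 scans)
  $x_{4j} = 1$: grand mean
  $x_{5j}$: global signal (mean over all voxels), further mean-corrected over all scans
• Global correlated here with conditions (and time)
• Global estimate can be scaled to, e.g., 50 ml/min/dl
• Data and fit, $y = X\hat\beta + \varepsilon$:
  $y = [11, 9, 12, 8, 21, 19, 22, 18, 31, 29, 32, 28]^T$
  rows of X (columns A, B, C, mean, global): [1 0 0 1 -1] for scans 1-4, [0 1 0 1 0] for scans 5-8, [0 0 1 1 1] for scans 9-12
  $\hat\beta = [1.7, 5.0, 8.3, 15, 6.7]^T$, $\varepsilon = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1, 2, -2]^T$
• Because the global covariate here equals $x_3 - x_1$, X is rank deficient (rank(X) = 3, so df = 12 − 3 = 9): $\hat\beta$ above is only the pseudo-inverse (minimum-norm) solution, and c = [-1 1 0 0 0] is no longer estimable, so once this global is modelled the B vs A difference cannot be tested at all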
Global Effects (fMRI)
• Two types of scaling: Grand Mean scaling and Global scaling
• Grand Mean scaling is automatic, global scaling is optional
• Grand Mean scaling scales by 100/mean over all voxels and ALL scans (i.e., a single number per session)
• Global scaling scales by 100/mean over all voxels for EACH scan (i.e., a different scaling factor for every scan)
• The problem with global scaling is that the TRUE global is not (normally) known…
• …we only estimate it by the mean over voxels
• So if there is a large signal change over many voxels, the global estimate will be confounded by local changes
• This can produce artifactual deactivations in other regions after global scaling
• Since most sources of global variability in fMRI are low frequency (drift), high-pass filtering may be sufficient, and many people do not use global scaling
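The difference between the two scalings, and the artifactual deactivation, can be sketched on a hypothetical session (20 scans × 10 voxels, baseline 1000, with a +100 activation in half the voxels during the last 10 scans). The helper names are ours, not SPM functions:

```python
import numpy as np

def grand_mean_scale(Y, target=100.0):
    """One factor per session: 100 / mean over all voxels and ALL scans. Y is scans x voxels."""
    return Y * (target / Y.mean())

def global_scale(Y, target=100.0):
    """One factor per scan: 100 / mean over all voxels of that scan."""
    return Y * (target / Y.mean(axis=1, keepdims=True))

# Hypothetical session: large activation in voxels 0-4 during scans 10-19
Y = np.full((20, 10), 1000.0)
Y[10:, :5] += 100.0

gm = grand_mean_scale(Y)
gs = global_scale(Y)
# Unactivated voxel 5: constant after grand mean scaling,
# but it appears to "deactivate" in the active scans after global scaling
print(gm[:10, 5].mean(), gm[10:, 5].mean())
print(gs[:10, 5].mean(), gs[10:, 5].mean())
```

After global scaling the unactivated voxel drops from 100 to about 95.2 in the active scans, purely because the estimated global rose from 1000 to 1050.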
A word on correlation/estimability
• If any column of X is a linear combination of any others (X is rank deficient), some parameters cannot be estimated uniquely (inestimable)
• … which means some contrasts cannot be tested (e.g., only if they sum to zero)
• This has implications for whether the “baseline” (constant term) is explicitly or implicitly modelled
[Figure: design X = [A B A+B], rank(X) = 2, with contrasts cm = [1 0 0] (inestimable) and cd = [1 -1 0] (estimable). “Implicit” baseline, X = [A B]: β1 = 1.6, β2 = 0.7, and cd = [1 -1] gives cd·β = 0.9. “Explicit” baseline, X = [A A+B]: β1 = 0.9, β2 = 0.7, and cd = [1 0] gives cd·β = 0.9]
• (Rank deficiency might be thought of as perfect correlation…)
• The implicit and explicit models are reparameterisations of each other: with $X^{(1)} = [A\ B]$, $X^{(2)} = [A\ \ A{+}B]$ and $T = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$,
  $X^{(1)} T = X^{(2)}, \quad c^{(1)} T = c^{(2)}$, e.g. $[1\ -1]\,T = [1\ 0]$
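A small sketch of both points, on a hypothetical blocked design with conditions A then B: a contrast is estimable exactly when it lies in the row space of X (c X⁺X = c), and the implicit/explicit parameterisations give the same value for corresponding contrasts. The helper `is_estimable` is ours:

```python
import numpy as np

def is_estimable(c, X):
    """A contrast c is estimable iff it lies in the row space of X, i.e. c X^+ X = c."""
    c = np.asarray(c, dtype=float)
    return np.allclose(c @ np.linalg.pinv(X) @ X, c)

A = np.repeat([1.0, 0.0], 5)                # hypothetical blocked design: A then B
B = 1.0 - A
X = np.column_stack([A, B, A + B])          # rank 2: third column is A + B
print(np.linalg.matrix_rank(X), is_estimable([1, 0, 0], X), is_estimable([1, -1, 0], X))

# Implicit vs explicit baseline: X2 = X1 T and c2 = c1 T give the same contrast value
rng = np.random.default_rng(0)
y = 1.6 * A + 0.7 * B + 0.1 * rng.standard_normal(10)
X1 = np.column_stack([A, B])
T = np.array([[1.0, 1.0], [0.0, 1.0]])
X2 = X1 @ T                                 # columns A and A + B
c1 = np.array([1.0, -1.0])
c2 = c1 @ T                                 # [1, 0]
b1 = np.linalg.pinv(X1) @ y
b2 = np.linalg.pinv(X2) @ y
print(c2, c1 @ b1, c2 @ b2)
```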
A word on correlation/estimability
• When there is high (but not perfect) correlation between regressors, parameters can be estimated…
• …but the estimates will be inefficient (i.e., highly variable)…
• …meaning some contrasts will not lead to very powerful tests
• SPM shows pairwise correlations between regressors, but these will NOT tell you that, e.g., X1+X2 is highly correlated with X3…
• … so some contrasts can still be inefficient, even though pairwise correlations are low
[Figure: regressors A, B and A+B convolved with the HRF, with contrasts cm = [1 0 0] and cd = [1 -1 0]]
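A sketch of the last point, with hypothetical simulated regressors: x5 is nearly the sum of four independent regressors, so no pairwise correlation exceeds about 0.5, yet the variance of the contrast on x5, c^T(X^TX)^{-1}c (in units of the error variance), is over a hundred times larger than for an uncorrelated regressor of the same scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X4 = rng.standard_normal((n, 4))                        # four independent regressors
x5 = X4.sum(axis=1) + 0.05 * rng.standard_normal(n)     # nearly a linear combination of them
X = np.column_stack([X4, x5])

corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.max(np.abs(corr - np.eye(5)))         # largest off-diagonal correlation

def contrast_variance(X, c):
    """Variance of c^T beta_hat in units of the error variance: c^T (X^T X)^-1 c."""
    return c @ np.linalg.inv(X.T @ X) @ c

c = np.array([0, 0, 0, 0, 1.0])
X_indep = np.column_stack([X4, rng.standard_normal(n) * x5.std()])  # same scale, no collinearity
ratio = contrast_variance(X, c) / contrast_variance(X_indep, c)
print(round(max_pairwise, 2), round(ratio))
```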
A word on orthogonalisation
• To remove correlation between two regressors, you can explicitly orthogonalise one (X1) with respect to the other (X2):
  $X_1^\perp = X_1 - (X_2 X_2^+) X_1$ (Gram-Schmidt)
• Paradoxically, this will NOT change the parameter estimate for X1, but will for X2
• In other words, the parameter estimate for the orthogonalised regressor is unchanged!
• This reflects the fact that parameter estimates automatically reflect the orthogonal component of each regressor…
• …so there is no need to orthogonalise, UNLESS you have an a priori reason for assigning common variance to the other regressor
[Figure: data Y projected onto X1, X2 and the orthogonalised X1⊥, showing β1 unchanged and β2 changed]
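A word on orthogonalisation
• Original model [X1 X2]: β1 = 0.9, β2 = 0.7
• Orthogonalise X2 with respect to X1 (Model M1, [X1 X2⊥]):
  β1(M1) = 1.6, β2(M1) = 0.7
• Orthogonalise X1 with respect to X2 (Model M2, [X1⊥ X2]):
  β1(M2) = 0.9 = β1(M1) − β2(M1)
  β2(M2) = 1.15 = (β1(M1) + β2(M1))/2
• The two models are related by $X^{(M2)} = X^{(M1)} T$, so that $\beta^{(M1)} = T\,\beta^{(M2)}$, with $T = \begin{bmatrix} 0.5 & 1 \\ -0.5 & 1 \end{bmatrix}$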
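A NumPy sketch of this “paradox” on simulated, correlated regressors (all values hypothetical): orthogonalising x1 with respect to x2 leaves the estimate for x1 unchanged, while x2 absorbs the shared variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.standard_normal(n)
x1 = 0.6 * x2 + 0.8 * rng.standard_normal(n)          # x1 correlated with x2
y = 0.9 * x1 + 0.7 * x2 + rng.standard_normal(n)

def ols(*cols):
    """OLS estimates of y on the given regressor columns."""
    return np.linalg.pinv(np.column_stack(cols)) @ y

# Gram-Schmidt: x1_perp = x1 - x2 x2^+ x1
x2_col = x2[:, None]
x1_perp = x1 - x2_col @ (np.linalg.pinv(x2_col) @ x1)

b_orig = ols(x1, x2)
b_orth = ols(x1_perp, x2)
print(b_orig, b_orth)     # first estimate identical; second estimate changes
```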
fMRI Analysis
• Scans are treated as a timeseries…
… and can be filtered to remove low-frequency (1/f) noise
• Effects of interest are convolved with haemodynamic (BOLD)
response function (HRF), to capture sluggish nature of response
• Scans can no longer be treated as independent observations…
… they are typically temporally autocorrelated (for TRs<8s)
(Epoch) fMRI example…
[Figure: voxel timeseries = β1 × box-car function (unconvolved) + β2 × baseline (mean) + e(t); in matrix form, y = Xβ + ε]
Low frequency noise
• Low frequency noise:
  – Physical (scanner drifts)
  – Physiological (aliased):
    • cardiac (~1 Hz)
    • respiratory (~0.25 Hz)
[Figure: power spectra of noise and signal (e.g. an infinite 30 s on-off design), showing aliasing and the highpass filter]
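A highpass filter can be sketched as regressing out a set of slowly varying basis functions. The sketch below assumes a discrete cosine set (the kind of basis SPM uses), a 128 s cutoff, a 2 s TR and simulated signals; the helper name `dct_highpass` is ours:

```python
import numpy as np

def dct_highpass(y, tr, cutoff=128.0):
    """Regress out discrete cosine regressors with periods longer than `cutoff` seconds."""
    n = len(y)
    t = np.arange(n)
    k_max = int(np.floor(2 * n * tr / cutoff))   # cosine k has a period of 2*n*tr/k seconds
    if k_max < 1:
        return y.copy()
    D = np.column_stack([np.cos(np.pi * k * (2 * t + 1) / (2 * n))
                         for k in range(1, k_max + 1)])
    return y - D @ (np.linalg.pinv(D) @ y)

tr, n = 2.0, 200
t_sec = np.arange(n) * tr
boxcar = ((t_sec // 30) % 2).astype(float)   # 30 s off / 30 s on blocks
drift = 0.02 * np.arange(n)                  # slow linear scanner drift

filtered_drift = dct_highpass(drift, tr)
filtered_box = dct_highpass(boxcar, tr)
print(np.var(filtered_drift) / np.var(drift), np.corrcoef(filtered_box, boxcar)[0, 1])
```

Almost all of the drift variance is removed, while the 60 s-period box-car, well above the cutoff frequency, is left nearly intact.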
(Epoch) fMRI example…
...with highpass filter
[Figure: y = Xβ + ε, with the design matrix extended by highpass-filter regressors, giving parameters β1…β9]
(Epoch) fMRI example…
…fitted and adjusted data
[Figure panels: raw fMRI timeseries; fitted box-car; highpass filtered (and scaled) data; fitted high-pass filter; adjusted data; residuals]
Convolution with HRF
[Figure: boxcar function convolved with the hæmodynamic response function gives the convolved regressor; the unconvolved fit leaves structured residuals, whereas the convolved fit leaves residuals with less structure]
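A sketch of the convolution step. The difference-of-gammas HRF below is an assumption in the spirit of SPM's canonical HRF (gamma shapes 6 and 16, undershoot ratio 1/6, 1 s scale), not SPM's own `spm_hrf` code; the block design is hypothetical:

```python
import math
import numpy as np

def double_gamma_hrf(tr, length=32.0):
    """Difference-of-gammas HRF (assumed shapes 6 and 16, undershoot ratio 1/6), unit sum."""
    t = np.arange(0.0, length, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)           # gamma pdf, shape 6, scale 1 s
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)   # gamma pdf, shape 16, scale 1 s
    h = peak - undershoot / 6.0
    return h / h.sum()

tr = 1.0
hrf = double_gamma_hrf(tr)
boxcar = np.tile(np.r_[np.zeros(20), np.ones(20)], 5)    # hypothetical 20 s off / 20 s on blocks
regressor = np.convolve(boxcar, hrf)[: len(boxcar)]      # causal convolution, trimmed to scan count
print(int(np.argmax(hrf)) * tr, regressor[18:30].round(2))
```

The convolved regressor stays at zero at block onset and rises over several seconds, capturing the sluggish BOLD response that the raw box-car misses.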
Temporal autocorrelation…
• Because the data are typically correlated from one scan to the next, one cannot assume the degrees of freedom (dfs) are simply the number of scans minus the dfs used in the model: we need “effective degrees of freedom”
• In other words, the residual errors are not independent:
  $Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 V), \quad V \neq I, \quad V = AA^T$
  where A is the intrinsic autocorrelation
• Generalised least squares:
  $KY = KX\beta + K\varepsilon, \quad K\varepsilon \sim N(0, \sigma^2 V), \quad V = KAA^TK^T$
  (autocorrelation is a special case of “nonsphericity”…)
Temporal autocorrelation (History)
  $KY = KX\beta + K\varepsilon, \quad K\varepsilon \sim N(0, \sigma^2 V), \quad V = KAA^TK^T$
• One method is to estimate A using, for example, an AR(p) model, then:
  $K = A^{-1} \Rightarrow V = I$ (allows OLS)
  This “pre-whitening” is sensitive, but can be biased if K is mis-estimated
• Another method (SPM99) is to smooth the data with a known autocorrelation that swamps any intrinsic autocorrelation:
  $K = S \Rightarrow V = SAA^TS^T \approx SS^T$ (use GLS)
  Effective degrees of freedom are calculated with the Satterthwaite approximation:
  $df = \mathrm{trace}(RV)^2 / \mathrm{trace}(RVRV)$
  This is more robust (provided the temporal smoothing is sufficient, e.g. a 4 s FWHM Gaussian), but less sensitive
• The most recent method (SPM2) is to restrict K to the highpass filter, and estimate the residual autocorrelation A using voxel-wide, one-step ReML…
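The Satterthwaite formula can be sketched directly. This assumes a hypothetical box-car design and an AR(1) correlation matrix as the error model; with V = I it reduces to N − rank(X), and autocorrelation lowers the effective df:

```python
import numpy as np

def effective_df(X, V):
    """Satterthwaite effective degrees of freedom, trace(RV)^2 / trace(RVRV)."""
    R = np.eye(X.shape[0]) - X @ np.linalg.pinv(X)   # residual-forming matrix
    RV = R @ V
    return np.trace(RV) ** 2 / np.trace(RV @ RV)

def ar1_correlation(n, rho):
    """Correlation matrix of a stationary AR(1) process (an assumed noise model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n = 100
X = np.column_stack([np.tile(np.repeat([1.0, 0.0], 10), 5), np.ones(n)])  # hypothetical box-car + mean
df_iid = effective_df(X, np.eye(n))                  # = N - rank(X) when V = I
df_ar1 = effective_df(X, ar1_correlation(n, 0.4))    # fewer effective dfs with autocorrelation
print(df_iid, df_ar1)
```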
New in SPM2
Nonsphericity and ReML (SPM2)
• Nonsphericity means (kind of) that:
  $C_\varepsilon = \mathrm{cov}(\varepsilon) \neq \sigma^2 I$
  [Figure: scans × scans error covariance cov(ε), compared with the spherical case]
• Nonsphericity can be modelled by a set of variance components:
  $C_\varepsilon = \lambda_1 Q_1 + \lambda_2 Q_2 + \lambda_3 Q_3 + \dots$ ($\lambda_i$ are hyperparameters)
  - Non-identical (inhomogeneous), e.g. two groups of subjects: [Figure: components Q1, Q2]
  - Non-independent (autocorrelated), e.g. white noise + AR(1): [Figure: components Q1, Q2]
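A sketch of the inhomogeneous-variance case: one diagonal component per group, combined with hyperparameters. The λ values here are fixed by assumption; in SPM2 they would be estimated by ReML:

```python
import numpy as np

# Hypothetical 2nd-level design: two groups of 4 subjects with unequal error variances
g1 = np.r_[np.ones(4), np.zeros(4)]
Q1 = np.diag(g1)            # variance component for group 1
Q2 = np.diag(1.0 - g1)      # variance component for group 2

lam = np.array([1.0, 3.0])           # assumed hyperparameters (ReML would estimate these)
C_e = lam[0] * Q1 + lam[1] * Q2      # C_e = lambda_1 Q1 + lambda_2 Q2
print(np.diag(C_e))
```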
New in SPM2
Nonsphericity and ReML (SPM2)
• Joint estimation of parameters and hyperparameters requires ReML
• ReML gives (Restricted) Maximum Likelihood (ML) estimates of (hyper)parameters, rather than Ordinary Least Squares (OLS) estimates
• ML estimates are more efficient and entail exact dfs (no Satterthwaite approximation)…
• …but are computationally expensive: ReML is iterative (unless there is only one hyperparameter)
  $C_\varepsilon = \mathrm{ReML}(yy^T, X, Q)$
  $\hat\beta_{OLS} = (X^TX)^{-1}X^Ty \; (= X^+y)$
  $\hat\beta_{ML} = (X^TC_\varepsilon^{-1}X)^{-1}X^TC_\varepsilon^{-1}y$
• To speed up:
  – Correlation of errors (V) is estimated by pooling over voxels
  – Covariance of errors ($\sigma^2 V$) is estimated by a single, voxel-specific scaling hyperparameter
  $V = \mathrm{ReML}\left(\textstyle\sum_{voxels} y_j y_j^T, X, Q\right) = \hat\lambda_1 Q_1 + \hat\lambda_2 Q_2$
New in SPM2
Nonsphericity and ReML (SPM2)
1. Voxels to be pooled are collected by a first pass through the data (OLS) (biased if the correlation structure is not stationary across voxels?)
2. The correlation structure V is estimated iteratively using ReML once, pooling over all voxels
3. The remaining hyperparameter is estimated using V and ReML non-iteratively, for each voxel
• Estimation of nonsphericity is used to prewhiten the data and design matrix, $W = V^{-1/2}$ (or by KW, if a highpass filter K is present)
• (which is why design matrices in SPM2 can differ from those in SPM99 after estimation)
[Figure: design matrix X and prewhitened design matrix WX]
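A sketch showing that the ML (GLS) estimate above is the same as OLS after prewhitening with W = C_ε^{-1/2}. The design, the AR(1)-type covariance (ρ = 0.5) and the data are simulated assumptions; here C_ε is treated as known rather than estimated by ReML:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.standard_normal(n), np.ones(n)])
idx = np.arange(n)
C_e = 0.5 ** np.abs(idx[:, None] - idx[None, :])     # assumed AR(1)-type error covariance
y = X @ np.array([2.0, 1.0]) + np.linalg.cholesky(C_e) @ rng.standard_normal(n)

# ML (GLS) estimate: (X' Ce^-1 X)^-1 X' Ce^-1 y
Ci = np.linalg.inv(C_e)
b_ml = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)

# The same estimate via prewhitening with W = Ce^(-1/2), then OLS on (WX, Wy)
w, U = np.linalg.eigh(C_e)
W = U @ np.diag(w ** -0.5) @ U.T
b_whitened = np.linalg.pinv(W @ X) @ (W @ y)
print(b_ml, b_whitened)
```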
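New in SPM2
The Full-Monty T-test (SPM2)
  $y = X\beta + \varepsilon, \quad \sigma^2 V = \mathrm{cov}(\varepsilon), \quad W = V^{-1/2}$ (V from ReML estimation)
  $\hat\beta = (WX)^+ Wy$
  $t = \dfrac{c^T\hat\beta}{\widehat{\mathrm{Std}}(c^T\hat\beta)}, \quad \widehat{\mathrm{Std}}(c^T\hat\beta) = \sqrt{\hat\sigma^2\, c^T (WX)^+ (WX)^{+T} c}$
  $\hat\sigma^2 = \dfrac{(Wy - WX\hat\beta)^T (Wy - WX\hat\beta)}{\mathrm{trace}(R)}, \quad R = I - WX(WX)^+$
[Figure: design matrix X with contrast c = [+1 0 0 … 0]]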
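A sketch of these formulas as a single function (the name `full_monty_t` is ours). V is passed in as known, whereas SPM2 would estimate it with ReML; as a check, with V = I it reproduces the ordinary T-test on the PET example data:

```python
import numpy as np

def full_monty_t(y, X, c, V):
    """T-statistic after prewhitening with W = V^(-1/2); V (error correlation) assumed known."""
    w, U = np.linalg.eigh(V)
    W = U @ np.diag(w ** -0.5) @ U.T
    WX, Wy = W @ X, W @ y
    WX_pinv = np.linalg.pinv(WX)
    beta = WX_pinv @ Wy
    R = np.eye(len(y)) - WX @ WX_pinv            # residual-forming matrix
    res = Wy - WX @ beta
    sigma2 = res @ res / np.trace(R)
    se = np.sqrt(sigma2 * (c @ WX_pinv @ WX_pinv.T @ c))
    return (c @ beta) / se, np.trace(R)

# Check: with V = I this reduces to the ordinary T-test (PET example data)
y = np.array([11, 9, 12, 8, 21, 19, 22, 18, 31, 29, 32, 28], dtype=float)
X = np.column_stack([np.repeat(np.eye(3), 4, axis=0), np.ones(12)])
c = np.array([-1.0, 1.0, 0.0, 0.0])
t, df = full_monty_t(y, X, c, np.eye(12))
print(t, df)
```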
Multiple comparisons…
• If n = 100,000 voxels are tested, each with probability pu = 0.05 of falsely rejecting H0…
  …then approx n × pu (e.g. 5,000) will do so by chance (false positives, or “type I” errors)
• Therefore we need to “correct” p-values for the number of comparisons
• A severe correction would be a Bonferroni, where pc = pu/n…
  …but this is only appropriate when the n tests are independent…
  … SPMs are smooth, meaning that nearby voxels are correlated
  => Gaussian Field Theory...
[Figure: SPM{t} of random noise, smoothed with a 10 mm FWHM Gaussian (2 mm pixels), thresholded at pu = 0.05]
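A quick simulation of the numbers above, assuming 100,000 independent null voxels (no spatial smoothness) and one-sided Z thresholds from Python's `statistics.NormalDist`:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n, p_u = 100_000, 0.05
z = rng.standard_normal(n)                   # independent null "voxels"

u_uncorr = NormalDist().inv_cdf(1 - p_u)     # uncorrected threshold (~1.64)
u_bonf = NormalDist().inv_cdf(1 - p_u / n)   # Bonferroni threshold (~4.89)

fp_uncorr = int((z > u_uncorr).sum())        # roughly n * p_u false positives
fp_bonf = int((z > u_bonf).sum())            # usually none
print(fp_uncorr, fp_bonf, round(u_bonf, 2))
```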
Gaussian Field Theory
• Consider the SPM as a lattice representation of a continuous random field
• “Euler characteristic”: a topological measure of the “excursion set” (e.g., # components − # “holes”)
• Smoothness is estimated from the covariance of partial derivatives of the residuals (expressed as “resels” or FWHM)
• Assumes:
  1) residuals are multivariate normal
  2) smoothness » voxel size (in practice, FWHM ≥ 3 × VoxDim)
• Not necessarily stationary: smoothness is estimated locally as resels-per-voxel
Generalised Form
• General form for the expected Euler characteristic of the excursion set $A_u$ (voxels above threshold u in search region Ω) for D dimensions:
  $E[\chi(A_u)] = \sum_{d=0}^{D} R_d(\Omega)\,\rho_d(u)$
  – $R_d(\Omega)$: d-dimensional Minkowski functional (resel count), a function of dimension d, space Ω and smoothness:
    $R_0(\Omega) = \chi(\Omega)$, the Euler characteristic of Ω
    $R_1(\Omega)$ = resel diameter
    $R_2(\Omega)$ = resel surface area
    $R_3(\Omega)$ = resel volume
  – $\rho_d(u)$: d-dimensional EC density of Z(x), a function of dimension d, threshold u and statistic, e.g. for a Z-statistic:
    $\rho_0(u) = 1 - \Phi(u)$
    $\rho_1(u) = (4\ln 2)^{1/2}\, e^{-u^2/2} / (2\pi)$
    $\rho_2(u) = (4\ln 2)\, u\, e^{-u^2/2} / (2\pi)^{3/2}$
    $\rho_3(u) = (4\ln 2)^{3/2} (u^2 - 1)\, e^{-u^2/2} / (2\pi)^2$
    $\rho_4(u) = (4\ln 2)^2 (u^3 - 3u)\, e^{-u^2/2} / (2\pi)^{5/2}$
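A sketch of how these pieces give a corrected threshold: E[χ(A_u)] approximates the corrected p-value at high thresholds, so we search for the u at which it equals 0.05. The resel counts are hypothetical (1000 resels, boundary terms R1 and R2 ignored), and the bisection search is our own choice:

```python
import math

FOUR_LN2 = 4 * math.log(2)

def ec_densities(u):
    """EC densities rho_0..rho_3 of a Gaussian (Z) field at threshold u (resel units)."""
    e = math.exp(-u * u / 2)
    return [
        0.5 * math.erfc(u / math.sqrt(2)),                        # 1 - Phi(u)
        FOUR_LN2 ** 0.5 * e / (2 * math.pi),
        FOUR_LN2 * u * e / (2 * math.pi) ** 1.5,
        FOUR_LN2 ** 1.5 * (u * u - 1) * e / (2 * math.pi) ** 2,
    ]

def expected_ec(u, resels):
    """E[EC] = sum_d R_d * rho_d(u), for resels = [R0, R1, R2, R3]."""
    return sum(R * rho for R, rho in zip(resels, ec_densities(u)))

def corrected_threshold(resels, alpha=0.05, lo=2.0, hi=10.0):
    """Bisection for u with E[EC] = alpha (E[EC] decreases with u over this range)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_ec(mid, resels) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

resels = [1, 0, 0, 1000]      # hypothetical search volume
u = corrected_threshold(resels)
print(round(u, 2), expected_ec(u, resels))
```

The resulting threshold (about 4.6) is lower than a Bonferroni threshold over the many more voxels such a volume would contain, because smoothness reduces the effective number of independent tests.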
Levels of Inference
• Three levels of inference:
  – extreme voxel values → voxel-level inference
  – big suprathreshold clusters → cluster-level inference
  – many suprathreshold clusters → set-level inference
[Figure: example SPM with parameters “Height” threshold u: t > 3.09; “Extent” threshold k: 12 voxels; Dimension D: 3; Volume S: 32³ voxels; Smoothness FWHM: 4.7 voxels; suprathreshold clusters of n = 82 and n = 32 voxels]
  Omnibus: P(c ≥ 7, t ≥ u) = 0.031
  voxel-level: P(t ≥ 4.37) = .048
  cluster-level: P(n ≥ 82, t ≥ u) = 0.029
  set-level: P(c ≥ 3, n ≥ k, t ≥ u) = 0.019
(Spatial) Specificity vs. Sensitivity
Small-volume correction
• If you have an a priori region of interest, there is no need to correct for the whole brain!
• But you can use GFT to correct for a Small Volume (SVC)
• Volume can be based on:
  – An anatomically-defined region
  – A geometric approximation to the above (e.g. rhomboid/sphere)
  – A functionally-defined mask (based on an ORTHOGONAL contrast!)
• Extent of correction can be APPROXIMATED by a Bonferroni correction for the number of resels…
• …but correction also depends on shape (surface area) as well as size (volume) of region (may want to smooth volume if rough)
Example SPM window
Fixed vs. Random Effects
• Subjects can be Fixed or Random variables
• If subjects are a Fixed variable in a single design
matrix (SPM “sessions”), the error term conflates
within- and between-subject variance
– In PET, this is not such a problem because the
within-subject (between-scan) variance can be as
great as the between-subject variance; but in fMRI
the between-scan variance is normally much
smaller than the between-subject variance
• If one wishes to make an inference from a subject
sample to the population, one needs to treat
subjects as a Random variable, and needs a proper
mixture of within- and between-subject variance
Multi-subject Fixed Effect model
[Figure: single design matrix with one session per subject (Subjects 1 to 6); error df ~ 300]
• In SPM, random effects are achieved by a two-stage procedure:
  1) (Contrasts of) parameters are estimated from a (Fixed Effect) model for each subject
  2) Images of these contrasts become the data for a second design matrix (usually a simple t-test or ANOVA)
Two-stage “Summary Statistic” approach
[Figure: 1st-level (within-subject) models for N = 6 subjects give contrast images of $c^T\hat\beta_i$, each with within-subject error variance $\hat\sigma^2_{w,i}$; the contrast images enter a 2nd-level (between-subject) one-sample t-test (error df = 5), giving $\hat\beta_{pop}$ and an SPM{t} thresholded at p < 0.001 (uncorrected)]
• WHEN there is the special case of n independent observations per subject:
  $\mathrm{var}(\hat\beta_{pop}) = \sigma_b^2/N + \sigma_w^2/(Nn)$
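A Monte Carlo sketch of the two-stage procedure and of this variance formula, assuming the simplest 1st-level model (a constant per subject, so each subject's estimate is the mean of its n observations) and hypothetical values for σ_b, σ_w and the population effect:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 6, 20                  # subjects, independent observations per subject
sigma_b, sigma_w = 1.0, 2.0   # assumed between- and within-subject SDs
mu = 0.5                      # assumed population effect

def two_stage(rng):
    """1st level: each subject's mean (OLS on a constant); 2nd level: one-sample t-test."""
    true_effects = mu + sigma_b * rng.standard_normal(N)
    data = true_effects[:, None] + sigma_w * rng.standard_normal((N, n))
    b_subj = data.mean(axis=1)                                  # 1st-level estimates
    b_pop = b_subj.mean()                                       # 2nd-level estimate
    t = b_pop / (b_subj.std(ddof=1) / np.sqrt(N))               # T with N - 1 = 5 df
    return b_pop, t

b_pops = np.array([two_stage(rng)[0] for _ in range(20000)])
predicted = sigma_b**2 / N + sigma_w**2 / (N * n)               # = 0.2
print(b_pops.var(), predicted)
```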
New in SPM2
Limitations of 2-stage approach
• Summary statistic approach is a special case, valid
only when each subject’s design matrix is identical
(“balanced designs”)
• In practice, the approach is reasonably robust to
unbalanced designs (Penny, 2004)
• More generally, exact solutions to any hierarchical
GLM can be obtained using ReML
• This is computationally expensive to perform at
every voxel (so not implemented in SPM2)
• Plus modelling of nonsphericity at 2nd-level can
minimise potential bias of unbalanced designs…
New in SPM2
Nonsphericity again!
• When tests at the 2nd level are more complicated than 1/2-sample t-tests, errors can be non-i.i.d.
• For example, two groups (e.g., patients and controls) may have different variances (non-identically distributed; inhomogeneity of variance)
• Or, when taking more than one parameter per subject (repeated measures, e.g., multiple basis functions in event-related fMRI), errors may be non-independent
  (If nonsphericity correction is selected, inhomogeneity is assumed, with a further option for repeated measures)
• The same method of variance component estimation with ReML (that used for autocorrelation) is used
  (the Greenhouse-Geisser correction for repeated-measures ANOVAs is a special-case approximation)
[Figure: covariance components Q for inhomogeneous variance (3 groups of 4 subjects) and for repeated measures (3 groups of 4 subjects)]
New in SPM2
Hierarchical Models
• The two-stage approach is a special case of the Hierarchical GLM:
  $y = X^{(1)}\beta^{(1)} + \varepsilon^{(1)}$
  $\beta^{(1)} = X^{(2)}\beta^{(2)} + \varepsilon^{(2)}$
  …
  $\beta^{(n-1)} = X^{(n)}\beta^{(n)} + \varepsilon^{(n)}$
  $C_\varepsilon^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
• In a Bayesian framework, parameters of one level can be made priors on the distribution of parameters at the lower level: “Parametric Empirical Bayes” (Friston et al, 2002)
• The parameters and hyperparameters at each level can be estimated using the EM algorithm (a generalisation of ReML)
• Note that parameters and hyperparameters at the final level do not differ from the classical framework
• The second level could be subjects; it could also be voxels…
New in SPM2
Parametric Empirical Bayes & PPMs
• Bayes rule:
  $p(\beta|y) \propto p(y|\beta)\,p(\beta)$
  Posterior (PPM) ∝ Likelihood (SPM) × Prior
• What are the priors?
  – In “classical” SPM, no (flat) priors
  – In “full” Bayes, priors might be from theoretical arguments, or from independent data
  – In “empirical” Bayes, priors derive from the same data, assuming a hierarchical model for generation of that data
New in SPM2
Parametric Empirical Bayes & PPMs
• Bayes rule: $p(\beta|y) \propto p(y|\beta)\,p(\beta)$, i.e. Posterior (PPM) ∝ Likelihood (SPM) × Prior
  – Classical T-test: a statistic $t = f(y)$, thresholded at u using its null distribution $p(t \mid \beta = 0)$
  – Bayesian test: the posterior $p(\beta \mid y)$, thresholded at a given effect size γ
• For PPMs in SPM2, priors come from the distribution over voxels
• If the mean over voxels is removed, the prior mean can be set to zero (a “shrinkage” prior)
• One can threshold posteriors for a given probability of a parameter estimate greater than some value…
• …to give a posterior probability map (PPM)
New in SPM2
Parametric Empirical Bayes & PPMs
[Figure: PET example comparing a PPM (contrast “rest”, effect-size threshold 2.06, height threshold P = 0.95, extent threshold k = 0 voxels) with the corresponding SPM{T39.0} (height threshold T = 5.50, extent threshold k = 0 voxels), each shown as a maximum intensity projection alongside the design matrix]
PPM vs. SPM:
• PPM: activations greater than a certain amount | SPM: voxels with non-zero activations
• PPM: can infer no responses | SPM: cannot “prove the null hypothesis”
• PPM: no fallacy of inference | SPM: fallacy of inference (large df)
• PPM: inference independent of search volume | SPM: correct for search volume
• PPM: computationally expensive | SPM: computationally faster
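A one-voxel sketch of the posterior behind a PPM, assuming a Gaussian likelihood for the estimate and a zero-mean Gaussian (shrinkage) prior with a fixed prior SD. In SPM2 the prior variance would instead come from the distribution over voxels; the helper names are ours:

```python
from statistics import NormalDist

def posterior(beta_hat, se, prior_sd, prior_mean=0.0):
    """Gaussian posterior for one voxel's effect under a Gaussian (shrinkage) prior."""
    prec = 1 / prior_sd**2 + 1 / se**2                 # posterior precision
    mean = (prior_mean / prior_sd**2 + beta_hat / se**2) / prec
    return mean, prec ** -0.5

def p_exceeds(beta_hat, se, prior_sd, gamma=0.0):
    """Posterior probability that the effect exceeds gamma, p(beta > gamma | y)."""
    m, sd = posterior(beta_hat, se, prior_sd)
    return 1 - NormalDist(m, sd).cdf(gamma)

m, sd = posterior(2.0, 1.0, 1.0)        # estimate of 2 is shrunk to a posterior mean of 1
print(m, round(p_exceeds(2.0, 1.0, 1.0), 4))
```

A PPM keeps voxels whose p(β > γ | y) exceeds a chosen probability, such as the P = 0.95 in the example above.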
A taxonomy of design
• Categorical designs
  Subtraction - Additive factors and pure insertion
  Conjunction - Testing multiple hypotheses
• Parametric designs
  Linear - Cognitive components and dimensions
  Nonlinear - Polynomial expansions
• Factorial designs
  Categorical - Interactions and pure insertion
              - Adaptation, modulation and dual-task inference
  Parametric - Linear and nonlinear interactions
             - Psychophysiological Interactions
A categorical analysis
Experimental design
Word generation G
Word repetition R
RGRGRGRGRGRG
G - R = Intrinsic word generation
…under assumption of pure insertion,
ie, that G and R do not differ in other ways
Cognitive Conjunctions
• One way to minimise the problem of pure insertion is to isolate the same process in several different ways (i.e., multiple subtractions of different conditions)
[Figure: 2×2 design (Price et al, 1997) crossing Stimuli (A/B: Objects, Colours) with Task (1/2: Viewing, Naming), with component processes V = Visual Processing, R = Object Recognition, P = Phonological Retrieval; the conjunction isolates the common object recognition response (R)]
  Object viewing: R,V    Colour viewing: V
  Object naming: P,R,V    Colour naming: P,V
  (Object − Colour viewing) [1 -1 0 0] & (Object − Colour naming) [0 0 1 -1]
  [R,V − V] & [P,R,V − P,V] = R & R = R
  (assuming R×P = 0; see later)
Cognitive Conjunctions
New in SPM2
• The original (SPM97) definition of conjunctions entailed the sum of two simple effects (A1-A2 + B1-B2) plus exclusive masking with the interaction (A1-A2) - (B1-B2)
• I.e., “effects significant and of similar size”
• (The difference between conjunctions and masking is that conjunction p-values reflect the conjoint probabilities of the contrasts)
• The SPM2 definition of conjunctions uses advances in Gaussian Field Theory (e.g., T2 fields), allowing corrected p-values
• However, the logic has changed slightly, in that voxels can survive a conjunction even though they show an interaction
[Figure: axes A1-A2 and B1-B2; SPM97 criterion p(A1=A2 + B1=B2) < P1 and p((A1-A2) = (B1-B2)) > P2; SPM2 criterion p(A1=A2) < p and p(B1=B2) < p]
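A sketch of the SPM2-style logic, assuming both simple effects are thresholded at the same height u (here the T-value 3.09 used elsewhere in these slides) and using made-up T-values at three voxels. Voxel 2 survives despite a large difference between its two effects, which is the change in logic noted above:

```python
import numpy as np

def conjunction_mask(t1, t2, u):
    """Voxels where both simple effects exceed u, i.e. min(t1, t2) > u (minimum-statistic logic)."""
    return np.minimum(t1, t2) > u

t1 = np.array([4.0, 4.0, 1.0])    # hypothetical T-values for A1-A2 at three voxels
t2 = np.array([3.5, 8.0, 5.0])    # hypothetical T-values for B1-B2
mask = conjunction_mask(t1, t2, u=3.09)
print(mask)
```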
Nonlinear parametric responses
• Inverted ‘U’ response to increasing word presentation rate in the DLPFC
• Polynomial expansion:
  $f(x) \approx \beta_1 x + \beta_2 x^2 + \dots$
  …up to (N−1)th order for N levels
• E.g., F-contrast [0 1 0] on the quadratic parameter ⇒ SPM{F}
[Figure: linear and quadratic components of the response]
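A sketch of a second-order polynomial expansion, assuming six hypothetical presentation rates and an exact inverted-U response. With columns ordered [linear, quadratic, constant], the contrast [0 1 0] picks out the (negative) quadratic term:

```python
import numpy as np

rate = np.arange(1.0, 7.0)                                  # hypothetical rates (6 levels)
response = 10 - (rate - 3.5) ** 2                           # an exact inverted-U response
X = np.column_stack([rate, rate ** 2, np.ones_like(rate)])  # linear, quadratic, constant
beta = np.linalg.pinv(X) @ response
c = np.array([0.0, 1.0, 0.0])                               # quadratic parameter
print(beta.round(3), c @ beta)
```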
Interactions and pure insertion
• The presence of an interaction can show a failure of pure insertion (using the earlier example)…
[Figure: the same 2×2 design (Stimuli A/B: Objects, Colours × Task 1/2: Viewing, Naming), now with an R×P interaction term in object naming; the Object − Colour effect differs between viewing and naming, revealing naming-specific object recognition]
  Object viewing: R,V    Colour viewing: V
  Object naming: P,R,V,R×P    Colour naming: P,V
  (Object − Colour) × (Viewing − Naming)
  [1 -1 0 0] − [0 0 1 -1] = [1 -1] ⊗ [1 -1] = [1 -1 -1 1]
  [R,V − V] − [P,R,V,R×P − P,V] = R − (R + R×P) = −R×P, i.e. the interaction isolates R×P
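A sketch of building the interaction contrast as a Kronecker product and checking which cognitive components it isolates. The condition order (object viewing, colour viewing, object naming, colour naming) and the 0/1 component loadings follow the table above:

```python
import numpy as np

stim = np.array([1, -1])      # Object - Colour
task = np.array([1, -1])      # Viewing - Naming
interaction = np.kron(task, stim)          # [1, -1, -1, 1]

# Rows: object viewing, colour viewing, object naming, colour naming
# Columns: components V, R, P, RxP present in each condition
components = np.array([[1, 1, 0, 0],
                       [1, 0, 0, 0],
                       [1, 1, 1, 1],
                       [1, 0, 1, 0]])
print(interaction, interaction @ components)   # only RxP survives (with a minus sign)
```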
Psycho-physiological Interaction (PPI)
• Parametric, factorial design, in which one factor is psychological (e.g., attention)…
• ...and the other is physiological (viz. activity extracted from a brain region of interest)
[Figure: attentional modulation of the V1 to V5 contribution: time courses of V1 activity, attention and V5 activity; V5 activity plotted against V1 activity for attention and no attention; SPM{Z} of the interaction]
New in SPM2
Psycho-physiological Interaction (PPI)
• PPIs are tested by a GLM of the form:
  $y = (V1 \times A)\beta_1 + V1\,\beta_2 + A\,\beta_3 + \varepsilon$, with c = [1 0 0]
• However, the interaction term of interest, V1×A, is the product of V1 activity and the Attention block AFTER convolution with the HRF
• We are really interested in the interaction at the neural level, but:
  $(HRF \ast V1) \times (HRF \ast A) \neq HRF \ast (V1 \times A)$
  (unless A is low frequency, e.g. blocked; so this is a problem for event-related PPIs)
• SPM2 can effect a deconvolution of physiological regressors (V1) before calculating the interaction term and reconvolving with the HRF
• Deconvolution is ill-constrained, so it is regularised using smoothness priors (using ReML)
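A sketch of why this matters, using simulated "neural" V1 activity and an assumed difference-of-gammas HRF (not SPM's deconvolution). It compares the naive BOLD-level product with the neural-level interaction for a blocked versus an event-related psychological factor; the agreement is much better for the blocked case:

```python
import math
import numpy as np

def double_gamma_hrf(tr, length=32.0):
    """Difference-of-gammas HRF (assumed shapes 6 and 16, undershoot ratio 1/6), unit sum."""
    t = np.arange(0.0, length, tr)
    h = t ** 5 * np.exp(-t) / math.gamma(6) - t ** 15 * np.exp(-t) / math.gamma(16) / 6.0
    return h / h.sum()

def conv(x, h):
    """Causal convolution trimmed to the length of x."""
    return np.convolve(x, h)[: len(x)]

rng = np.random.default_rng(0)
n, hrf = 800, double_gamma_hrf(1.0)
v1_neural = rng.standard_normal(n)                     # hypothetical neural activity in V1
blocked = np.tile(np.r_[np.ones(40), np.zeros(40)], n // 80)
events = (np.arange(n) % 10 == 0).astype(float)        # brief events every 10 s

def ppi_agreement(a):
    """Correlation between the naive BOLD-level product and the neural-level interaction."""
    naive = conv(v1_neural, hrf) * conv(a, hrf)        # (HRF * V1) x (HRF * A)
    neural = conv(v1_neural * a, hrf)                  # HRF * (V1 x A)
    return np.corrcoef(naive, neural)[0, 1]

r_block, r_event = ppi_agreement(blocked), ppi_agreement(events)
print(round(r_block, 2), round(r_event, 2))
```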
Effective vs. functional connectivity
Correlations:
• There is no connection between B and C, yet B and C are correlated because of common input from A, e.g.:
  A = V1 fMRI time-series
  B = 0.5 × A + e1
  C = 0.3 × A + e2
[Figure: functional connectivity (correlations): A-B 0.49, A-C 0.30, B-C 0.12; effective connectivity (path model): A→B 0.49, A→C 0.31, B→C −0.02 (n.s.), χ² = 0.5]
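A simulation of this example. The noise SDs (0.89 and 0.95) are our assumption, chosen so the correlations roughly match those in the figure. B and C are correlated, but once A is in the model the B→C path is about zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
A = rng.standard_normal(n)                     # stand-in for the V1 time-series
B = 0.5 * A + 0.89 * rng.standard_normal(n)    # r(A,B) ~ 0.49
C = 0.3 * A + 0.95 * rng.standard_normal(n)    # r(A,C) ~ 0.30

r = np.corrcoef([A, B, C])
print(r.round(2))                              # B and C correlated despite no B -> C path

# "Effective" paths into C: regress C on A and B together
X = np.column_stack([A, B, np.ones(n)])
beta = np.linalg.lstsq(X, C, rcond=None)[0]
print(beta[:2].round(2))                       # A -> C ~ 0.3, B -> C ~ 0
```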
New in SPM2
Dynamic Causal Modelling
• PPIs allow a simple (restricted) test of effective connectivity
• Structural Equation Modelling is more powerful (Büchel & Friston, 1997)
• However, in SPM2 Dynamic Causal Modelling (DCM) is preferred
• DCMs are dynamic models specified at the neural level
• The neural dynamics are transformed into predicted BOLD signals using a realistic biological haemodynamic forward model (HDM)
• The neural dynamics comprise a deterministic state-space model and a bilinear approximation to model interactions between variables
New in SPM2
Dynamic Causal Modelling
• The variables consist of:
  – connections between regions
  – self-connections
  – direct inputs (e.g., visual stimulations)
  – contextual inputs (e.g., attention)
• Connections can be bidirectional
• Variables are estimated using the EM algorithm
• Priors are:
  – empirical (for the haemodynamic model)
  – principled (dynamics to be convergent)
  – shrinkage (zero-mean, for connections)
• Inference using posterior probabilities
• Methods for Bayesian model comparison
  $\dot{z} = f(z, u, \theta) \approx Az + \sum_j u_j B^{(j)} z + Cu$
  $y = h(z, \theta) + \varepsilon$
  z = state vector; u = inputs; θ = parameters (connection/haemodynamic)
[Figure: three-region model (z1 = V1, z2 = V5, z3 = SPC) with outputs y1 to y3, direct inputs u1 (e.g. visual stimuli) and contextual inputs u2 (e.g. attention)]
New in SPM2
Dynamic Causal Modelling
[Figure: schematic two-region DCM with driving input u1 (stimuli) and contextual input u2 (context) acting on states z1 and z2 through excitatory (+) and inhibitory (−) connections]
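A toy simulation of the bilinear neural state equation ż ≈ Az + Σ u_j B^(j) z + Cu for two hypothetical regions, integrated with a simple Euler scheme. All parameter values are invented for illustration, there is no haemodynamic model, and this is not SPM's DCM estimation. Switching on the contextual input u2 strengthens the z1 → z2 connection:

```python
import numpy as np

# Hypothetical 2-region model: u1 drives z1; u2 modulates the z1 -> z2 connection
A = np.array([[-1.0, 0.0],
              [0.4, -1.0]])        # intrinsic connections (negative self-connections)
B2 = np.array([[0.0, 0.0],
               [0.6, 0.0]])        # modulatory effect of u2 on z1 -> z2
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])         # u1 has a direct effect on z1; u2 has none

def simulate(u, dt=0.1):
    """Euler-integrate dz/dt = A z + u2 * B2 z + C u for inputs u of shape (T, 2)."""
    z = np.zeros(2)
    out = []
    for ut in u:
        dz = A @ z + ut[1] * (B2 @ z) + C @ ut
        z = z + dt * dz
        out.append(z.copy())
    return np.array(out)

T = 300
u1 = np.ones(T)                                        # sustained driving input
no_ctx = simulate(np.column_stack([u1, np.zeros(T)]))
ctx = simulate(np.column_stack([u1, np.ones(T)]))
print(no_ctx[-1, 1], ctx[-1, 1])   # z2 settles near 0.4 without context, near 1.0 with it
```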
New in SPM2
Dynamic Causal Modelling
[Figure: DCM of the attention-to-motion data of Büchel & Friston (1997), from Friston et al. (2003): regions V1, V5, SPC and IFG; inputs Photic (dots vs fixation), Motion (moving vs static) and Attention (detect changes); estimated connection strengths with posterior probabilities .52 (98%), .37 (90%), .42 (100%), .56 (99%), .69 (100%), .47 (100%), .82 (100%), .65 (100%)]
• Attention modulates the backward connections IFG→SPC and SPC→V5
• The intrinsic connection V1→V5 is insignificant in the absence of motion
PCA/SVD and Eigenimages
[Figure: a time-series of 1D images (128 scans of 32 “voxels”); expression of the first 3 “eigenimages”; eigenvalues and spatial “modes”; the time-series ‘reconstituted’]
PCA/SVD and Eigenimages
• The data Y (time × voxels) is decomposed into a sum of successive approximations:
  $Y = USV^T = s_1 U_1 V_1^T + s_2 U_2 V_2^T + \dots$
  where $V_i$ is the ith eigenimage (spatial mode over voxels), $U_i$ its expression over time and $s_i$ its singular value
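A NumPy sketch of this decomposition on hypothetical data of the same size (128 scans × 32 voxels, two spatio-temporal modes plus noise). Adding successive terms s_i U_i V_iᵀ gives better approximations, and all 32 terms reconstitute Y exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
scans, voxels = 128, 32
t = np.arange(scans)[:, None]
x = np.arange(voxels)[None, :]
# Hypothetical data: two spatio-temporal modes plus noise
Y = (np.sin(2 * np.pi * t / 32) * np.exp(-((x - 10) ** 2) / 20)
     + 0.5 * np.cos(2 * np.pi * t / 64) * np.exp(-((x - 22) ** 2) / 20)
     + 0.1 * rng.standard_normal((scans, voxels)))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # Y = U diag(s) V^T

def reconstitute(k):
    """Sum of the first k terms s_i U_i V_i^T."""
    return (U[:, :k] * s[:k]) @ Vt[:k]

errors = [np.linalg.norm(Y - reconstitute(k)) for k in (1, 2, 3, voxels)]
print(s[:4].round(2), [round(e, 2) for e in errors])
```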
Time x Condition interaction
Time x condition interactions (i.e. adaptation)
assessed with the SPM{T}
Structural Equation Modelling (SEM)
• Minimise the difference between the observed (S) and implied (Σ) covariances by adjusting the path coefficients (B)
• The implied covariance structure:
  $x = xB + z \;\Rightarrow\; x = z(I - B)^{-1}$
  x: matrix of time-series of Regions 1-3
  B: matrix of unidirectional path coefficients
• Variance-covariance structure:
  $x^T x = \Sigma = (I - B)^{-T}\, C\, (I - B)^{-1}$, where $C = z^T z$
  $x^T x$ is the implied variance-covariance structure Σ
  C contains the residual variances (u, v, w) and covariances
• The free parameters are estimated by minimising a [maximum likelihood] function of S and Σ
[Figure: three-region path diagram with paths B12, B13, B23 and residual inputs z]
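A sketch of the implied covariance only, assuming hypothetical path values and simulated residual time-series; estimating B would additionally require minimising an ML discrepancy between S and Σ, which is not shown:

```python
import numpy as np

# Hypothetical unidirectional paths among 3 regions (row = source, column = target)
B = np.array([[0.0, 0.5, 0.3],
              [0.0, 0.0, 0.2],
              [0.0, 0.0, 0.0]])
I = np.eye(3)
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 3))       # residual time-series
IB_inv = np.linalg.inv(I - B)
x = z @ IB_inv                          # x = xB + z  =>  x = z (I - B)^-1

C = z.T @ z                             # residual variances and covariances
Sigma = IB_inv.T @ C @ IB_inv           # implied covariance (I-B)^-T C (I-B)^-1
print(np.allclose(x.T @ x, Sigma), np.allclose(x, x @ B + z))
```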
[Figure: changes in “effective connectivity” between Attention and No attention (Attention - No attention); path coefficients 0.43, 0.75, 0.47 and 0.76]
Second-order Interactions
[Figure: modulatory influence of parietal cortex (PP) on the V1 to V5 connection, modelled with the interaction term V1×PP; path coefficient 0.14; χ² = 11, p < 0.01]