worsley ismrm

Model-driven statistical analysis of fMRI data
Keith Worsley
Department of Mathematics and Statistics,
Brain Imaging Centre, Montreal Neurological
Institute, McGill University
www.math.mcgill.ca/keith
References
• Worsley et al. (2002). A general statistical
analysis for fMRI data. NeuroImage, 15:1-15.
• Liao et al. (2002). Estimating the delay of the
response in fMRI data. NeuroImage, 16:593-606.
• FMRISTAT: MATLAB package from
www.math.mcgill.ca/keith/fmristat
fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, hot, rest, …
[Figure: first scan of fMRI data, with time series (signal against time in seconds, 0 to 300; hot, rest and warm periods marked) for three cases:
 – Highly significant effect, T=6.59
 – No significant effect, T=-0.74
 – Drift]
T statistic for hot - warm effect:
T = (hot – warm effect) / S.d. ~ $t_{110}$ if no effect
Exploring the data: PCA of time × space
Temporal components (sd, % variance explained), plotted against frame (0 to 120):
  Component   Sd      % variance explained
  1           105.7   77.8%
  2           26.1    4.8%
  3           15.8    1.7%
  4           14.8    1.5%
Spatial components, shown by slice (0 to 10):
  1: exclude first frames
  2: drift
  3: long-range correlation or anatomical effect: remove by converting to % of brain
  4: signal?
Modeling the data: Choices …
• Time domain / frequency domain?
• AR / ARMA / state space models?
• Linear / non-linear time series model?
• Fixed HRF / estimated HRF?
• Voxel / local / global parameters?
• Fixed effects / random effects?
• Frequentist / Bayesian?
Compromise:
Simple, general, valid, robust, fast statistical analysis
Covariates example: pain perception
Figure panels (horizontal axis: Time, seconds, 0 to 350):
• Alternating hot and warm stimuli separated by rest (9 seco…
• Hemodynamic response function: difference of two gamma den…
• Responses = stimuli * HRF, sampled every 3 seconds
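The convolution step above can be sketched numerically. This is a minimal illustration, not the FMRISTAT implementation: the gamma shape parameters and the undershoot weight in `hrf` are assumed values (the slides do not give them), and the 9-second blocks follow from 3 scans per condition at 3 seconds per scan. All function names are hypothetical.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma density at t; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def hrf(t, shape1=6.0, shape2=16.0, undershoot=1.0 / 6.0):
    """Difference of two gamma densities (all parameter values are assumptions)."""
    return gamma_pdf(t, shape1) - undershoot * gamma_pdf(t, shape2)

def block_stimulus(t, onset, block=9.0, cycle=36.0):
    """1.0 while the condition that starts `onset` seconds into each cycle is on."""
    if t < 0.0:
        return 0.0  # nothing is presented before the run starts
    return 1.0 if onset <= (t % cycle) < onset + block else 0.0

def response(t, onset, dt=0.1, support=40.0):
    """Riemann-sum approximation of (stimulus * HRF)(t)."""
    steps = int(round(support / dt))
    return sum(block_stimulus(t - k * dt, onset) * hrf(k * dt) * dt for k in range(steps))

TR = 3.0        # seconds between scans, from the slide
N_SCANS = 120   # number of scans, from the slide
# Cycle of hot, rest, warm, rest: hot starts at 0 s, warm at 18 s of each 36 s cycle.
hot = [response(i * TR, onset=0.0) for i in range(N_SCANS)]
warm = [response(i * TR, onset=18.0) for i in range(N_SCANS)]
```

The two lists `hot` and `warm` play the role of the response covariates sampled at the scan times; they would enter the linear model as separate columns.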
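Linear model for fMRI time series with AR(p) correlated errors
• Linear model:
  $Y_t = (\mathrm{stimulus}_t * \mathrm{HRF})\, b + \mathrm{drift}_t\, c + \mathrm{error}_t$
  (unknown parameters: $b$, $c$)
• AR(p) errors:
  $\mathrm{error}_t = a_1\, \mathrm{error}_{t-1} + \dots + a_p\, \mathrm{error}_{t-p} + s\, \mathrm{WN}_t$
  (unknown parameters: $a_1, \dots, a_p$, $s$; $\mathrm{WN}_t$ is ‘White Noise’)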
First step: estimate the autocorrelation
AR(1) model: $\mathrm{error}_t = a_1\, \mathrm{error}_{t-1} + s\, \mathrm{WN}_t$
• Fit the linear model using least squares
• $\mathrm{error}_t = Y_t - \text{fitted } Y_t$
• $\hat a_1 = \mathrm{Correlation}(\mathrm{error}_t, \mathrm{error}_{t-1})$
• Estimating the errors changes their correlation structure slightly, so $\hat a_1$ is slightly biased.
[Figure: maps of $\hat a_1$ (colour scale -0.1 to 0.3). Panels: Raw autocorrelation; Smoothed 15mm; Bias corrected $\hat a_1$. Annotations: ~ -0.05 and ~0.]
Second step: pre-whiten, refit the linear model
Pre-whiten: $Y_t^* = Y_t - \hat a_1 Y_{t-1}$, then refit using least squares.
[Figure, three maps:
 – Hot - warm effect, % (colour scale -1 to 1)
 – Sd of effect, % (colour scale 0 to 0.25)
 – T = effect / sd, 110 df (colour scale -6 to 6), thresholded at T > 4.93 (P < 0.05, corrected)]
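The two steps above (estimate the lag-1 autocorrelation from least-squares residuals, then pre-whiten and refit) can be sketched on simulated data. This is a simplified illustration under stated assumptions: a single unconvolved block regressor plus an intercept, simulated AR(1) errors with hypothetical true values, and no bias correction of the autocorrelation. The regressor is pre-whitened in the same way as the data, which is needed for the refit to be consistent.

```python
import random

def ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of residuals e (no bias correction)."""
    return sum(a * b for a, b in zip(e[1:], e[:-1])) / sum(a * a for a in e)

def prewhiten(v, a1):
    """v*_t = v_t - a1 * v_{t-1}, dropping the first observation."""
    return [v[t] - a1 * v[t - 1] for t in range(1, len(v))]

# Simulated series: all numbers below are hypothetical.
random.seed(0)
n, true_a1, true_slope = 120, 0.3, 1.0
x = [1.0 if (t // 3) % 4 == 0 else 0.0 for t in range(n)]  # 3 scans on, 9 off
err, errors = 0.0, []
for _ in range(n):
    err = true_a1 * err + random.gauss(0.0, 1.0)  # AR(1) errors
    errors.append(err)
y = [800.0 + true_slope * xi + ei for xi, ei in zip(x, errors)]

# First step: least squares, residuals, lag-1 autocorrelation.
b0, b1 = ols(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
a1_hat = lag1_autocorrelation(resid)

# Second step: pre-whiten data and regressor, then refit.
b0_star, b1_star = ols(prewhiten(x, a1_hat), prewhiten(y, a1_hat))
```

After pre-whitening, the intercept column stays constant (scaled by 1 - â1), so a fit with an intercept remains valid.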
Higher order AR model? Try AR(3):
[Figure: maps of $a_1$, $a_2$, $a_3$ (colour scale -0.1 to 0.3)]
AR(1) seems to be adequate
… has little effect on the T statistics:
[Figure: T maps (colour scale -5 to 5) for No correlation, AR(1), AR(2), AR(3)]
No correlation: biases T up ~12%, more false positives
Results from 4 runs on the same subject
[Figure: maps for Run 1, Run 2, Run 3, Run 4. Rows: Effect, $E_i$ (-1 to 1); Sd, $S_i$ (0 to 0.2); T stat, $E_i / S_i$ (-5 to 5).]
Mixed effects linear model for combining effects from different runs/sessions/subjects:
• $E_i$ = effect for run/session/subject $i$ (from Lin. Mod.)
• $S_i$ = standard error of effect (from Lin. Mod.)
• Mixed effects model:
  $E_i = \mathrm{covariates}_i\, c + S_i\, \mathrm{WN}_i^F + \sigma\, \mathrm{WN}_i^R$
  – $c$, $\sigma$: unknown parameters
  – covariates: usually 1, but could add group, treatment, age, sex, ...
  – $S_i\, \mathrm{WN}_i^F$: ‘fixed effects’ error, due to variability within the same run
  – $\sigma\, \mathrm{WN}_i^R$: random effect, due to variability from run to run
REML estimation using the EM algorithm
• Slow to converge (10 iterations by default).
• Stable (maintains estimate $\hat\sigma^2 > 0$), but
• $\hat\sigma^2$ biased if $\sigma^2$ (random effect) is small, so:
• Re-parameterize the variance model:
  $\mathrm{Var}(E_i) = S_i^2 + \sigma^2 = (S_i^2 - \min_j S_j^2) + (\sigma^2 + \min_j S_j^2) = S_i^{*2} + \sigma^{*2}$
• $\hat\sigma^2 = \hat\sigma^{*2} - \min_j S_j^2$ (less biased estimate)
Problem: 4 runs, 3 df for random effects sd
[Figure: maps for Run 1, Run 2, Run 3, Run 4 and MULTISTAT. Rows: Effect, $E_i$ (-1 to 1); Sd, $S_i$ (0 to 0.2); T stat, $E_i / S_i$ (-5 to 5).]
… very noisy sd:
… and T>15.96 for P<0.05 (corrected):
… so no response is detected …
Solution: Spatial regularization of the sd
• Basic idea: increase df by spatial smoothing
(local pooling) of the sd.
• Can’t smooth the random effects sd directly,
- too much anatomical structure.
• Instead,
  sd = smooth( random effects sd / fixed effects sd ) × fixed effects sd
  which removes the anatomical structure before smoothing.
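The ratio-then-smooth idea above can be sketched in one dimension. This is only an illustration under loud assumptions: a moving average stands in for the Gaussian spatial smoothing, the data are a 1-D row of voxels rather than a volume, and the function names are hypothetical.

```python
def smooth(values, half_width=2):
    """Moving average over neighbouring voxels; a stand-in for Gaussian smoothing."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out

def regularized_sd(random_sd, fixed_sd, half_width=2):
    """sd = smooth(random effects sd / fixed effects sd) * fixed effects sd."""
    ratio = [r / f for r, f in zip(random_sd, fixed_sd)]
    return [s * f for s, f in zip(smooth(ratio, half_width), fixed_sd)]

# Hypothetical fixed effects sd with 'anatomical' structure, and a random
# effects sd that is 1.3 times larger everywhere.
fixed = [0.10, 0.20, 0.10, 0.30, 0.20]
rand = [1.3 * f for f in fixed]
result = regularized_sd(rand, fixed)
```

Because only the ratio is smoothed, the structure carried by the fixed effects sd passes through unchanged: here the result is again 1.3 times the fixed effects sd at every voxel.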
[Figure: maps of the sd (colour scale 0 to 0.2) and of the sd ratio (colour scale 0.5 to 1.5). Labels: Average $S_i$; Random effects sd, 3 df; Fixed effects sd, 440 df; Mixed effects sd, ~100 df; divide: Random sd / fixed sd; multiply: Smoothed sd ratio. Annotation: random effect, sd ratio ~1.3.]
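Effective df depends on smoothing
$\mathrm{df}_{ratio} = \mathrm{df}_{random} \left( 2 \left( \frac{\mathrm{FWHM}_{ratio}}{\mathrm{FWHM}_{data}} \right)^2 + 1 \right)^{3/2}$
$\frac{1}{\mathrm{df}_{eff}} = \frac{1}{\mathrm{df}_{ratio}} + \frac{1}{\mathrm{df}_{fixed}}$
e.g. $\mathrm{df}_{random} = 3$, $\mathrm{df}_{fixed} = 4 \times 110 = 440$, $\mathrm{FWHM}_{data} = 8$mm:
[Figure: $\mathrm{df}_{eff}$ (0 to 400) against $\mathrm{FWHM}_{ratio}$ (0 to Infinity)]
• $\mathrm{FWHM}_{ratio} = 0$: random effects analysis, $\mathrm{df}_{eff} = 3$
• $\mathrm{FWHM}_{ratio}$ = Infinity: fixed effects analysis, $\mathrm{df}_{eff} = 440$
• Target = 100 df, reached at FWHM = 19mm
Why 100? If out by 50%, the distribution of T is not much affected.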
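The effective df formula above is easy to check numerically. The sketch below evaluates it for the slide's example and uses a simple bisection (an illustrative choice, not taken from the slides) to find the smoothing that reaches a target df; the function names are hypothetical.

```python
def effective_df(fwhm_ratio, fwhm_data, df_random, df_fixed):
    """Effective df after smoothing the sd ratio with a kernel of width fwhm_ratio."""
    df_ratio = df_random * (2.0 * (fwhm_ratio / fwhm_data) ** 2 + 1.0) ** 1.5
    return 1.0 / (1.0 / df_ratio + 1.0 / df_fixed)

def fwhm_for_target(target_df, fwhm_data, df_random, df_fixed, upper=1000.0):
    """Bisection for the fwhm_ratio whose effective df equals target_df."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if effective_df(mid, fwhm_data, df_random, df_fixed) < target_df:
            lo = mid  # too few df: smooth more
        else:
            hi = mid  # enough df: try less smoothing
    return 0.5 * (lo + hi)

# Slide example: df_random = 3, df_fixed = 440, FWHM_data = 8 mm.
df_at_19mm = effective_df(19.0, 8.0, 3, 440)
fwhm_needed = fwhm_for_target(100.0, 8.0, 3, 440)
```

With these inputs, 19 mm of smoothing gives an effective df very close to 100, which agrees with the target quoted on the slide; with no smoothing the effective df is close to 3.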
Final result: 19mm smoothing, 100 effective df …
[Figure: maps for Run 1, Run 2, Run 3, Run 4 and MULTISTAT. Rows: Effect, $E_i$ (-1 to 1); Sd, $S_i$ (0 to 0.2); T stat, $E_i / S_i$ (-5 to 5).]
… less noisy sd:
… and T>4.93 for P<0.05 (corrected):
… and now we can detect a response!
P-values assessed for:
• Peaks or local maxima
• Spatial extent of clusters of neighbouring voxels
above a pre-chosen threshold (~3)
• Correct for searching over a pre-specified region
(usually the whole brain), which depends on:
– number of voxels in the search region (Bonferroni) or
– number of resels = volume / $\mathrm{FWHM}^3$ in the search
region (random field theory)
– in practice, take the minimum of the two!
FWHM is spatially varying
(non-isotropic)
• fMRI data is smoother in GM than WM
• VBM data is highly non-isotropic
• Has little effect on P-values for local maxima (use
‘average’ FWHM inside search region), but
• Has a big effect on P-values for spatial extents:
smooth regions → big clusters,
rough regions → small clusters, so
• Replace cluster volume by cluster resels
= volume / $\mathrm{FWHM}^3$
FWHM – the local smoothness of the noise
$\mathrm{FWHM} = \mathrm{voxel\ size} \times \frac{(2 \log 2)^{1/2}}{(1 - \mathrm{correlation})^{1/2}}$
(If the noise is modeled as white noise smoothed
with a Gaussian kernel, this would be its FWHM)
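The FWHM formula above, together with resels = volume / FWHM³, can be written as two small functions. This is a sketch with hypothetical function names; `correlation` is taken to be the correlation of the noise between neighbouring voxels, which is an assumption about what the slide intends.

```python
import math

def fwhm_from_correlation(voxel_size, correlation):
    """FWHM = voxel size * (2 log 2)^(1/2) / (1 - correlation)^(1/2)."""
    return voxel_size * math.sqrt(2.0 * math.log(2.0)) / math.sqrt(1.0 - correlation)

def resels(volume, fwhm):
    """Resolution elements: volume / FWHM^3 (same length units for both)."""
    return volume / fwhm ** 3

# Consistency check against Gaussian-smoothed white noise: with kernel FWHM F
# and voxel spacing d, neighbouring voxels have correlation exp(-2 log 2 d^2 / F^2).
d, F = 1.0, 10.0
rho = math.exp(-2.0 * math.log(2.0) * d ** 2 / F ** 2)
recovered_fwhm = fwhm_from_correlation(d, rho)
```

For voxels that are small relative to the kernel, `recovered_fwhm` is close to the kernel FWHM, which is the sense in which the formula "would be its FWHM".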
P-values depend on resels:
resels = Volume / $\mathrm{FWHM}^3$
[Figure, two panels (vertical axes 0 to 0.1):
 – P value of local max against Resels of search volume (0 to 1000), for Local maximum T = 4.5
 – P value of cluster against Resels of cluster (0 to 2), for Clusters above t = 3.0, search volume resels = 500]
[Figure, four maps:
 – FWHM (mm) of scans (110 df) (colour scale 0 to 20)
 – FWHM (mm) of effects (3 df) (colour scale 0 to 20)
 – FWHM of effects (smoothed) (colour scale 0 to 20)
 – effects / scans FWHM (smoothed) (colour scale 0.5 to 1.5)
Annotations: Resels=1.90, P=0.007; Resels=0.57, P=0.387]
Statistical summary: clusters
  clus   vol     resel   p-val   (one)
  1      33992   54.22   0       (0)
  2      14150   25.03   0       (0)
  3      12382   20.29   0       (0)
  4      2538    3.12    0.011   (0.001)
  5      2538    2.77    0.016   (0.001)
  6      1577    2.15    0.035   (0.002)
  7      1000    1.43    0.098   (0.006)
  8      500     1.31    0.119   (0.007)
  9      1000    1.07    0.179   (0.011)
  10     385     0.99    0.208   (0.013)
Statistical summary: peaks
  clus   peak    p-val   (one)     q-val   (i   j   k)    (x      y      z)
  1      12.72   0       (0)       0       (59  74  1)    (10.5   -28.7  24.1)
  1      12.58   0       (0)       0       (60  75  1)    (8.2    -31    23.7)
  1      11.45   0       (0)       0       (61  73  2)    (5.9    -25.3  17.5)
  1      11.08   0       (0)       0       (62  66  4)    (3.5    -6.9   6.3)
  1      10.95   0       (0)       0       (61  70  4)    (5.9    -16.2  4.8)
  1      10.6    0       (0)       0       (62  69  3)    (3.5    -15    12.1)
  …
  2      5.07    0.029   (0.004)   0       (48  69  10)   (36.3   -7.3   -36.3)
  3      5.06    0.029   (0.004)   0       (73  72  9)    (-22.3  -15.3  -30.5)
  3      5.03    0.033   (0.004)   0       (81  63  10)   (-41    6.6    -34.1)
  13     5.02    0.035   (0.005)   0       (88  72  8)    (-57.4  -16.4  -23.6)
  6      4.91    0.054   (0.007)   0       (42  69  3)    (50.4   -15    12.1)
  11     4.91    0.055   (0.007)   0       (69  70  7)    (-12.9  -12.9  -15.9)
  9      4.91    0.055   (0.007)   0       (48  46  5)    (36.3   40.5   6.7)
  1      4.85    0.069   (0.008)   0       (52  93  2)    (27     -71.6  10.2)
  3      4.82    0.08    (0.009)   0       (79  66  8)    (-36.3  -2.5   -21.4)
  3      4.81    0.082   (0.009)   0       (78  65  8)    (-34    -0.2   -21)
  1      4.8     0.086   (0.01)    0       (62  59  5)    (3.5    10.4   1.9)
  3      4.77    0.097   (0.011)   0       (82  61  10)   (-43.4  11.2   -33.4)
  1      4.75    0.106   (0.012)   0       (55  71  2)    (19.9   -20.7  18.3)
  5      4.73    0.114   (0.012)   0       (67  84  2)    (-8.2   -50.8  13.5)
[Figure: thresholded T maps. Annotations: T>4.86; T > 4.93 (P < 0.05, corrected)]
Efficiency: optimum block design
[Figure: contour maps against Stimulus Duration (secs) and InterStimulus Interval (secs), both from 0 to 20. Panels: Sd of hot stimulus and Sd of hot-warm, each for Magnitude and for Delay (secs). The optimum design is marked X in each panel; some regions are labelled "(Not enough signal)".]
Efficiency: optimum event design
[Figure: Sd of effect (secs for delays), from 0 to 0.5, against Average time between events (secs), from 5 to 20. Solid lines: magnitudes; dotted lines: delays. Event spacings compared: uniform, random, concentrated. One region is labelled "(Not enough signal)".]
How many subjects?
• Largest portion of variance comes from the
last stage i.e. combining over subjects:
  $\frac{\mathrm{sd}_{run}^2}{n_{run}\, n_{sess}\, n_{subj}} + \frac{\mathrm{sd}_{sess}^2}{n_{sess}\, n_{subj}} + \frac{\mathrm{sd}_{subj}^2}{n_{subj}}$
• If you want to optimize total scanner time,
take more subjects.
• What you do at early stages doesn’t matter
very much!
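The variance formula above can be evaluated directly to see why adding subjects helps most. The numbers below are purely hypothetical (all three sd components set to 1), chosen only to show the arithmetic; the function name is illustrative.

```python
def variance_of_effect(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """Variance of the final effect: run, session and subject components."""
    return (sd_run ** 2 / (n_run * n_sess * n_subj)
            + sd_sess ** 2 / (n_sess * n_subj)
            + sd_subj ** 2 / n_subj)

# Hypothetical design: 4 runs per session, 2 sessions per subject, 10 subjects.
baseline = variance_of_effect(1.0, 1.0, 1.0, n_run=4, n_sess=2, n_subj=10)
more_runs = variance_of_effect(1.0, 1.0, 1.0, n_run=8, n_sess=2, n_subj=10)
more_subjects = variance_of_effect(1.0, 1.0, 1.0, n_run=4, n_sess=2, n_subj=20)
```

Doubling the number of runs only shrinks the first term, while doubling the number of subjects halves every term, so `more_subjects` is half of `baseline` and `more_runs` is only slightly below it.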
Estimating the delay of the response
• Delay or latency to the peak of the HRF is approximated by
a linear combination of two optimally chosen basis functions:
[Figure: HRF, basis1 and basis2 plotted against t (seconds), from -5 to 25, with the delay and the shift marked.]
$\mathrm{HRF}(t + \mathrm{shift}) \approx \mathrm{basis}_1(t)\, w_1(\mathrm{shift}) + \mathrm{basis}_2(t)\, w_2(\mathrm{shift})$
• Convolve bases with the stimulus, then add to the linear model
• Fit linear model, estimate $w_1$ and $w_2$
• Equate $w_2 / w_1$ to estimates, then solve for shift (Henson et al., 2002)
• To reduce bias when the magnitude is small, use
  $\mathrm{shift} / (1 + 1/T^2)$
  where $T = w_1 / \mathrm{Sd}(w_1)$ is the T statistic for the magnitude
• Shrinks shift to 0 where there is little evidence for a response.
[Figure: $w_1$, $w_2$ and $w_2 / w_1$ plotted against shift (seconds), from -5 to 5.]
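The bias-reducing shrinkage of the shift, shift / (1 + 1/T²), can be sketched as a one-line function. The handling of T = 0 (returning 0, the limit of the formula) is an assumption added here to avoid dividing by zero; the function name is hypothetical.

```python
def shrunken_shift(shift, t_magnitude):
    """shift / (1 + 1/T^2), where T is the T statistic for the magnitude."""
    if t_magnitude == 0:
        return 0.0  # limit of the formula as T -> 0 (assumed convention)
    return shift / (1.0 + 1.0 / t_magnitude ** 2)

strong = shrunken_shift(1.0, 10.0)  # strong response: shift barely changed
weak = shrunken_shift(1.0, 1.0)     # weak response: shift halved
none = shrunken_shift(1.0, 0.0)     # no evidence of a response: shift set to 0
```

The factor depends only on T², so negative T statistics shrink the shift in the same way as positive ones.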
Shift of the hot stimulus
[Figure, four maps:
 – T stat for magnitude (colour scale -6 to 6); annotation: T > 4
 – T stat for shift (colour scale -6 to 6); annotation: T ~ 2
 – Shift (secs) (colour scale -4 to 4); annotation: ~1 sec
 – Sd of shift (secs) (colour scale 0 to 2); annotation: +/- 0.5 sec]
Combining shifts of the hot stimulus
(Contours are T stat for magnitude > 4)
[Figure: maps for Run 1, Run 2, Run 3, Run 4 and MULTISTAT. Rows: Effect, $E_i$ (-4 to 4); Sd, $S_i$ (0 to 2); T stat, $E_i / S_i$ (-5 to 5).]
Shift of the hot stimulus
[Figure: Shift (secs), shown where T stat for magnitude > 4.93.]
False Discovery Rate (FDR)
Benjamini and Hochberg (1995), Journal of the Royal Statistical Society
Benjamini and Yekutieli (2001), Annals of Statistics
Genovese et al. (2001), NeuroImage
• FDR controls the expected proportion of false
positives amongst the discoveries, whereas
• Bonferroni / random field theory controls the
probability of any false positives
• No correction controls the proportion of false
positives in the volume
[Figure: Signal + Gaussian white noise, with panels for the Signal and the Noise (colour scale -4 to 4) and three thresholded maps marking True + and False +:]
• P < 0.05 (uncorrected), T > 1.64: 5% of volume is false +
• FDR < 0.05, T > 2.82: 5% of discoveries is false +
• P < 0.05 (corrected), T > 4.22: 5% probability of any false +
Comparison of thresholds
• FDR depends on the ordered P-values: $P_1 < P_2 < \dots < P_n$. To control the FDR at $\alpha = 0.05$, find $K = \max\{i : P_i < (i/n)\,\alpha\}$, threshold the P-values at $P_K$:
    Proportion of true +   1      0.1    0.01   0.001  0.0001
    Threshold T            1.64   2.56   3.28   3.88   4.41
• Bonferroni thresholds the P-values at $\alpha/n$:
    Number of voxels       1      10     100    1000   10000
    Threshold T            1.64   2.58   3.29   3.89   4.42
• Random field theory: resels = volume / $\mathrm{FWHM}^3$:
    Number of resels       0      1      10     100    1000
    Threshold T            1.64   2.82   3.46   4.09   4.65
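The FDR and Bonferroni rules above can be sketched in a few lines. The FDR function follows the rule as stated on the slide (strict inequality); the Bonferroni function assumes a one-sided Gaussian statistic, an assumption that reproduces the tabulated thresholds. Function names and the example P-values are hypothetical.

```python
from statistics import NormalDist

def fdr_threshold(p_values, alpha=0.05):
    """P-value threshold P_K with K = max{i : P_(i) < (i/n) alpha}.

    Returns None when no ordered P-value satisfies the inequality.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    threshold = None
    for i, p in enumerate(ordered, start=1):
        if p < (i / n) * alpha:
            threshold = p  # keep the largest qualifying P-value
    return threshold

def bonferroni_t_threshold(n_voxels, alpha=0.05):
    """One-sided Gaussian threshold for P = alpha / n_voxels (assumed Gaussian)."""
    return NormalDist().inv_cdf(1.0 - alpha / n_voxels)

# Hypothetical P-values from 10 voxels.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
p_k = fdr_threshold(p)

# Bonferroni thresholds for the voxel counts in the table above.
thresholds = [bonferroni_t_threshold(n) for n in (1, 10, 100, 1000, 10000)]
```

In this example only the two smallest P-values fall below their (i/n) α bounds, so the FDR threshold is the second smallest P-value. The Bonferroni thresholds agree with the tabulated 1.64, 2.58, 3.29, 3.89 and 4.42 to within rounding.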
P < 0.05 (uncorrected), T > 1.64
5% of volume is false +
FDR < 0.05, T > 2.67
5% of discoveries is false +
P < 0.05 (corrected), T > 4.93
5% probability of any false +
Conjunction: Minimum $T_i$ > threshold
[Figure: T maps (colour scale -6 to 6) and thresholded maps (scale 0 to 1) for two statistics:]
• ‘Minimum of $T_i$’: for P=0.05, threshold = 1.82; Efficiency = 82%
• ‘Average of $T_i$’: for P=0.05, threshold = 4.93
Functional connectivity
• Measured by the correlation between residuals at
every pair of voxels (6D data!)
[Figure: scatter plots of Voxel 2 against Voxel 1 for two cases: Activation only; Correlation only]
• Local maxima are larger than all 12 neighbours
• P-value can be calculated using random field theory
• Good at detecting focal connectivity, but
• PCA of residuals x voxels is better at detecting large regions of co-correlated voxels
[Figure, two maps: |Correlations| > 0.7, P < $10^{-10}$ (corrected); First Principal Component > threshold]