chicago psych

Statistical analysis of fMRI data, ‘bubbles’ data, and the connectivity between the two
Keith Worsley, McGill (and Chicago)
Nicholas Chamandy, McGill and Google
Jonathan Taylor, Université de Montréal and Stanford
Robert Adler, Technion
Philippe Schyns, Fraser Smith, Glasgow
Frédéric Gosselin, Université de Montréal
Arnaud Charil, Alan Evans, Montreal Neurological Institute
Before you start: PCA of time × space
[Figure: temporal components against frame (0 to 140) and spatial components against slice (0 based, 0 to 12), one row per component.]
Temporal components (sd, % variance explained):
1: 0.68, 46.9%
2: 0.29, 8.6%
3: 0.17, 2.9%
4: 0.15, 2.4%
Interpretation of the components:
1: exclude first frames
2: drift
3: long-range correlation or anatomical effect: remove by converting to % of brain
4: signal?
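A PCA of this kind can be sketched with a singular value decomposition of the frames × voxels data matrix. This is a minimal illustration on simulated data, not the FMRISTAT or BRAINSTAT implementation; the function name `pca_time_space`, the toy drift and all sizes are assumptions made for the example.

```python
import numpy as np

def pca_time_space(data, n_components=4):
    """PCA of a (frames x voxels) array via the SVD.

    Returns the temporal components, the % variance explained by each,
    and the matching spatial components.
    """
    centred = data - data.mean(axis=0)            # remove each voxel's mean over time
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    pct_var = 100 * s**2 / np.sum(s**2)           # % variance explained per component
    return u[:, :n_components], pct_var[:n_components], vt[:n_components]

# Hypothetical toy data: 140 frames x 500 voxels, with a linear drift whose
# size varies from voxel to voxel, plus white noise.
rng = np.random.default_rng(0)
frames = np.arange(140)
drift = np.outer(frames - frames.mean(), rng.normal(size=500))
data = 0.05 * drift + rng.normal(size=(140, 500))

temporal, pct_var, spatial = pca_time_space(data)
```

In this toy example the first temporal component recovers the drift; on real data the leading components are inspected by eye, as in the interpretation listed on the slide.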
Bad design:
2 mins rest
2 mins Mozart
2 mins Eminem
2 mins James Brown
[Figure: temporal components against frame (0 to 200), with the Rest, Mozart, Eminem and J. Brown blocks marked, and spatial components against slice (0 based, 0 to 18).]
Temporal components (sd, % variance explained):
1: 0.41, 17%
2: 0.31, 9.5%
3: 0.24, 5.6%
Period: 5.2, 16.1, 15.6, 11.6 seconds
Effect of stimulus on brain response
Alternating hot and warm stimuli separated by rest (9 seconds each).
[Figure: stimuli against time, 0 to 350 seconds.]
Hemodynamic response function: difference of two gamma densities
[Figure: HRF against time, 0 to 50 seconds.]
Stimulus is delayed and dispersed by ~6s. Modeled by convolving the stimulus with the “hemodynamic response function”.
Responses = stimuli * HRF, sampled every 3 seconds
[Figure: responses against time, 0 to 350 seconds.]
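The convolution step can be sketched as follows. The gamma parameters (`peak1`, `fwhm1`, `peak2`, `fwhm2`, `dip`), the 1 second working grid and the order hot, rest, warm, rest are assumptions chosen for illustration; the slides only state that the HRF is a difference of two gamma densities and that responses are sampled every 3 seconds.

```python
import numpy as np

def hrf(t, peak1=5.4, fwhm1=5.2, peak2=10.8, fwhm2=7.35, dip=0.35):
    """Difference of two gamma-shaped densities, normalised to sum to 1.

    All parameter values are illustrative assumptions. t must be >= 0.
    """
    t = np.asarray(t, dtype=float)

    def gamma_shape(peak, fwhm):
        alpha = peak**2 / fwhm**2 * 8 * np.log(2)
        beta = fwhm**2 / peak / (8 * np.log(2))
        return (t / peak) ** alpha * np.exp(-(t - peak) / beta)

    h = gamma_shape(peak1, fwhm1) - dip * gamma_shape(peak2, fwhm2)
    return h / h.sum()

dt = 1.0                                   # working time grid in seconds (assumption)
t = np.arange(0, 360, dt)
phase = t % 36                             # hot, rest, warm, rest: 9 s each
stimuli = np.column_stack(
    [phase < 9, (phase >= 18) & (phase < 27)]
).astype(float)                            # columns: hot, warm

h = hrf(np.arange(0, 30, dt))
# Responses = stimuli * HRF (convolution), truncated to the scan length.
responses = np.column_stack(
    [np.convolve(stimuli[:, j], h)[: len(t)] for j in range(2)]
)
sampled = responses[:: int(3 / dt)]        # one value per 3 second frame
```

The two columns of `sampled` are the kind of regressors that enter the linear model as "responses".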
fMRI data, pain experiment, one slice
[Figure: first scan of fMRI data, with three voxel time courses against time (0 to 300 seconds) over the hot, rest and warm blocks, and the image of the T statistic for the hot - warm effect (-5 to 5).]
Time course panels:
Highly significant effect, T=6.59
No significant effect, T=-0.74
Drift
T statistic for hot - warm effect:
T = (hot – warm effect) / S.d. ~ $t_{110}$ if no effect
How fMRI differs from other repeated measures data
• Many reps (~200 time points)
• Few subjects (~15)
• Df within subjects is high, so not worth pooling sd across subjects
• Df between subjects low, so use spatial smoothing to boost df
• Data sets are huge ~4GB, not easy to use statistics packages such as R
FMRISTAT (Matlab) / BRAINSTAT (Python) statistical analysis strategy
• Analyse each voxel separately
  - Borrow strength from neighbours when needed
• Break up analysis into stages
  - 1st level: analyse each time series separately
  - 2nd level: combine 1st level results over runs
  - 3rd level: combine 2nd level results over subjects
• Cut corners: do a reasonable analysis in a reasonable time (or else no one will use it!)
1st level: Linear model with AR(p) errors
• Data
  - $Y_t$ = fMRI data at time $t$
  - $x_t = (\text{responses}, 1, t, t^2, t^3, \dots)'$ to allow for drift
• Model
  - $Y_t = x_t'\beta + \varepsilon_t$
  - $\varepsilon_t = a_1\varepsilon_{t-1} + \dots + a_p\varepsilon_{t-p} + \sigma_F\eta_t$, $\eta_t \sim N(0,1)$ i.i.d.
• Fit in 2 stages:
  - 1st pass: fit by least squares, find residuals, estimate AR parameters $a_1 \dots a_p$
  - 2nd pass: whiten data, re-fit by least squares
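The two-stage fit can be sketched for the simplest case p = 1. This is an illustration only: it omits the bias correction and spatial smoothing of the AR parameter, and the function name `fit_ar1_two_pass`, the simulated design and the true coefficient values are all assumptions.

```python
import numpy as np

def fit_ar1_two_pass(y, X):
    """Two-pass fit of y = X beta + AR(1) errors (a sketch for p = 1)."""
    # 1st pass: ordinary least squares, then residuals.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    # Lag-1 autocorrelation of the residuals estimates a1 (no bias correction).
    a1 = np.sum(resid[1:] * resid[:-1]) / np.sum(resid**2)
    # 2nd pass: whiten data and design with the same filter, then re-fit.
    y_w = y[1:] - a1 * y[:-1]
    X_w = X[1:] - a1 * X[:-1]
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    df = len(y_w) - X.shape[1]
    sigma2 = np.sum((y_w - X_w @ beta) ** 2) / df
    sd = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_w.T @ X_w)))
    return beta, sd, a1

# Hypothetical simulated time series: block response plus cubic drift terms.
rng = np.random.default_rng(1)
n = 200
t = np.linspace(-1, 1, n)
response = (np.arange(n) % 24 < 12).astype(float)
X = np.column_stack([response, np.ones(n), t, t**2, t**3])
eta = rng.normal(size=n)
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.3 * eps[i - 1] + eta[i]          # AR(1) errors with a1 = 0.3
y = X @ np.array([2.0, 100.0, 1.0, 0.0, 0.0]) + eps

beta, sd, a1 = fit_ar1_two_pass(y, X)
t_stat = beta[0] / sd[0]                        # T statistic for the response
```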
Higher levels: Mixed effects model
• Data
  - $E_i$ = effect (contrast in $\beta$) from previous level
  - $S_i$ = sd of effect from previous level
  - $z_i = (1, \text{treatment}, \text{group}, \text{gender}, \dots)'$
• Model
  - $E_i = z_i'\gamma + S_i\varepsilon_i^F + \sigma_R\varepsilon_i^R$ ($S_i$ high df, so assumed fixed)
  - $\varepsilon_i^F \sim N(0,1)$ i.i.d. fixed effects error
  - $\varepsilon_i^R \sim N(0,1)$ i.i.d. random effects error
• Fit by ReML
  - Use EM for stability, 10 iterations
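One way to sketch a ReML fit by EM for this model is the textbook update for a single variance component, $\sigma_R^2 \leftarrow \sigma_R^2 + \sigma_R^4\,(E'P^2E - \mathrm{tr}\,P)/n$, where $P$ projects out the fixed effects. This is an assumption about the form of the iteration, not a transcription of the FMRISTAT code; the name `reml_em` and the starting value are hypothetical.

```python
import numpy as np

def reml_em(E, S, Z, n_iter=10, sigma2_init=1.0):
    """EM iterations for ReML in E_i = z_i'gamma + S_i eps_F + sigma_R eps_R.

    E: effects, S: their (fixed) sds, Z: design matrix with rows z_i'.
    Returns gamma, its sd, and the random effects variance sigma_R^2.
    """
    n = len(E)
    sigma2 = float(sigma2_init)                 # must start above zero

    def gls_pieces(s2):
        w = 1.0 / (S**2 + s2)                   # inverse total variance per effect
        ZtW = Z.T * w
        cov_gamma = np.linalg.inv(ZtW @ Z)
        P = np.diag(w) - ZtW.T @ cov_gamma @ ZtW
        return ZtW, cov_gamma, P

    for _ in range(n_iter):
        _, _, P = gls_pieces(sigma2)
        PE = P @ E
        # EM update: expected squared random effect given the data.
        sigma2 = sigma2 + sigma2**2 * (PE @ PE - np.trace(P)) / n

    ZtW, cov_gamma, _ = gls_pieces(sigma2)
    gamma = cov_gamma @ ZtW @ E
    return gamma, np.sqrt(np.diag(cov_gamma)), sigma2

# Hypothetical check: with S = 0 and an intercept only, ReML reduces to the
# ordinary sample variance with n - 1 in the denominator.
E = np.array([1.0, 2.0, 4.0, 7.0])
gamma, sd_gamma, sigma2 = reml_em(E, np.zeros(4), np.ones((4, 1)))
```

Each EM step keeps the variance estimate non-negative, which is one reason to prefer it over Newton-type updates when there are only a few effects to combine.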
Where we use spatial information
• 1st level: smooth AR parameters to lower their variability and increase “df”
• “df” defined by Satterthwaite approximation
  - surrogate for variance of the variance parameters
• Higher levels: smooth Random / Fixed effects sd ratio to lower variability and increase “df”
• Final level: use random field theory to correct for multiple comparisons
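1st level: Autocorrelation
AR(1) model: $\varepsilon_t = a_1\varepsilon_{t-1} + \sigma_F\eta_t$
• Fit the linear model using least squares
• $\hat\varepsilon_t = Y_t - \hat Y_t$
• $\hat a_1 = \mathrm{Correlation}(\hat\varepsilon_t, \hat\varepsilon_{t-1})$
• Estimating the $\varepsilon_t$ changes their correlation structure slightly, so $\hat a_1$ is slightly biased:
[Figure: images of the raw autocorrelation, the autocorrelation smoothed 12.4mm, and the bias corrected $\hat a_1$ (colour scale -0.1 to 0.3), annotated ~ -0.05 before bias correction and ~0 after.]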
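How much smoothing?
• Variability in $\hat a$ lowers df
• Df depends on contrast
• Smoothing $\hat a$ brings df back up:
$$\mathrm{df}_{\hat a} = \mathrm{df}_{\mathrm{residual}} \left( 2\,\frac{\mathrm{FWHM}_{\hat a}^2}{\mathrm{FWHM}_{\mathrm{data}}^2} + 1 \right)^{3/2}$$
$$\frac{1}{\mathrm{df}_{\mathrm{eff}}} = \frac{1}{\mathrm{df}_{\mathrm{residual}}} + \frac{2\,\mathrm{acor}(\text{contrast of data})^2}{\mathrm{df}_{\hat a}}$$
FWHMdata = 8.79mm
[Figure: $\mathrm{df}_{\mathrm{eff}}$ (0 to 100) against $\mathrm{FWHM}_{\hat a}$ (0 to 30), one panel per contrast.]
Hot stimulus: residual df = 110, target = 100 df, contrast of data acor = 0.61, FWHM = 10.3mm
Hot-warm stimulus: residual df = 110, target = 100 df, contrast of data acor = 0.79, FWHM = 12.4mm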
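The two formulas can be evaluated directly, and inverted to find the smoothing needed for a target df. The sketch below uses the rounded values quoted for the hot stimulus; the function names are hypothetical, and because the inputs are rounded the result lands near, not exactly on, the 10.3mm quoted.

```python
def df_eff_ar(fwhm_a, fwhm_data, df_residual, acor):
    """Effective df after smoothing the AR parameter by fwhm_a."""
    df_a = df_residual * (2 * fwhm_a**2 / fwhm_data**2 + 1) ** 1.5
    return 1.0 / (1.0 / df_residual + 2 * acor**2 / df_a)

def fwhm_for_target_df(target_df, fwhm_data, df_residual, acor):
    """Invert df_eff_ar: smoothing of the AR parameter that reaches target_df.

    Assumes target_df lies between the unsmoothed df_eff and df_residual.
    """
    df_a = 2 * acor**2 / (1.0 / target_df - 1.0 / df_residual)
    return fwhm_data * (((df_a / df_residual) ** (2 / 3) - 1) / 2) ** 0.5

# Hot stimulus: FWHM_data = 8.79, residual df = 110, acor = 0.61, target 100 df.
fwhm_hot = fwhm_for_target_df(100, 8.79, 110, 0.61)
df_unsmoothed = df_eff_ar(0.0, 8.79, 110, 0.61)
```

With no smoothing the effective df for this contrast is well below the residual df, which is the loss that smoothing $\hat a$ recovers.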
2nd level: 4 runs, 3 df for random effects sd
[Figure: images of the effect Ei (-1 to 1), the sd Si (0 to 0.2) and the T stat Ei / Si (-5 to 5) for Run 1, Run 2, Run 3, Run 4 and the 2nd level.]
… very noisy sd:
… and T>15.96 for P<0.05 (corrected):
… so no response is detected …
Solution: Spatial smoothing of the sd ratio
• Basic idea: increase “df” by spatial smoothing (local pooling) of the sd.
• Can’t smooth the random effects sd directly,
  - too much anatomical structure.
• Instead,
$$\widehat{\mathrm{sd}} = \mathrm{smooth}\left( \frac{\text{random effects sd}}{\text{fixed effects sd}} \right) \times \text{fixed effects sd}$$
which removes the anatomical structure before smoothing.
[Figure: images (colour scale 0.05 to 0.2) labelled Average Si; Random effects sd, 3 df; Fixed effects sd, 440 df; Mixed effects sd, ~100 df. The random sd is divided by the fixed sd to give Random sd / fixed sd, which is smoothed to give the Smoothed sd ratio (colour scale 0.5 to 1.5) and then multiplied by the fixed sd. Annotation: random effect, sd ratio ~1.3.]
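How much smoothing?
$$\mathrm{df}_{\mathrm{ratio}} = \mathrm{df}_{\mathrm{random}} \left( 2\,\frac{\mathrm{FWHM}_{\mathrm{ratio}}^2}{\mathrm{FWHM}_{\mathrm{data}}^2} + 1 \right)^{3/2}$$
$$\frac{1}{\mathrm{df}_{\mathrm{eff}}} = \frac{1}{\mathrm{df}_{\mathrm{ratio}}} + \frac{1}{\mathrm{df}_{\mathrm{fixed}}}$$
dfrandom = 3, dffixed = 4 × 110 = 440, FWHMdata = 8mm:
[Figure: $\mathrm{df}_{\mathrm{eff}}$ (0 to 400) against $\mathrm{FWHM}_{\mathrm{ratio}}$ (0 to infinity). Random effects analysis ($\mathrm{FWHM}_{\mathrm{ratio}}$ = 0): dfeff = 3. Fixed effects analysis ($\mathrm{FWHM}_{\mathrm{ratio}}$ = infinity): dfeff = 440. Target = 100 df is reached at FWHM = 19mm.]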
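The curve in this figure can be traced by evaluating the two formulas over a range of smoothing widths. A minimal sketch with the slide's values as defaults; the function name `df_eff_ratio` is hypothetical.

```python
import math

def df_eff_ratio(fwhm_ratio, fwhm_data=8.0, df_random=3, df_fixed=440):
    """Effective df after smoothing the random/fixed sd ratio by fwhm_ratio."""
    if math.isinf(fwhm_ratio):
        return float(df_fixed)               # infinite smoothing: fixed effects analysis
    df_ratio = df_random * (2 * fwhm_ratio**2 / fwhm_data**2 + 1) ** 1.5
    return 1 / (1 / df_ratio + 1 / df_fixed)

# Effective df at a few smoothing widths (mm), from none to infinite.
curve = {w: round(df_eff_ratio(w), 1) for w in (0, 10, 19, 40, math.inf)}
```

At 19mm the formula gives close to the 100 df target, and at 0mm it gives just under the 3 df of the pure random effects analysis.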
Final result: 19mm smoothing, 100 df
[Figure: images of the effect Ei (-1 to 1), the sd Si (0 to 0.2) and the T stat Ei / Si (-5 to 5) for Run 1, Run 2, Run 3, Run 4 and the 2nd level.]
… less noisy sd:
… and T>4.93 for P<0.05 (corrected):
… and now we can detect a response!
Final level: Multiple comparisons correction
Threshold chosen so that $P(\max_{s \in S} Z(s) \ge t) = 0.05$
[Figure: P value (0 to 0.1) against FWHM (Full Width at Half Maximum) of smoothing filter (0 to 10) for Bonferroni, random field theory and discrete local maxima, with an inset of Z(s).]
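Random field theory
Z(s) = white noise * filter (of width FWHM)
If $Z(s)$ is white noise smoothed with an isotropic Gaussian filter of Full Width at Half Maximum FWHM, then
$$
\begin{aligned}
P\left(\max_{s \in S} Z(s) \ge t\right) \approx{} & \mathrm{EC}(S) \int_t^\infty \frac{1}{(2\pi)^{1/2}}\, e^{-z^2/2}\, dz \\
& + 2\,\frac{\mathrm{Diameter}(S)}{\mathrm{FWHM}} \cdot \frac{(4\log 2)^{1/2}}{2\pi}\, e^{-t^2/2} \\
& + \frac{1}{2}\,\frac{\mathrm{Area}(S)}{\mathrm{FWHM}^2} \cdot \frac{4\log 2}{(2\pi)^{3/2}}\, t\, e^{-t^2/2} \\
& + \frac{\mathrm{Volume}(S)}{\mathrm{FWHM}^3} \cdot \frac{(4\log 2)^{3/2}}{(2\pi)^2}\, (t^2 - 1)\, e^{-t^2/2}.
\end{aligned}
$$
In each term the first factor is the number of resels (resolution elements), Resels0(S) to Resels3(S), and the second factor is the corresponding EC density.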
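The formula is a sum of four products, resels times EC density, so it is short to code. A minimal sketch using only the standard library; the function names `resels` and `rft_p_value` are hypothetical.

```python
import math

def resels(ec, diameter, area, volume, fwhm):
    """Resels_0 .. Resels_3 of a search region S for a given FWHM."""
    return (ec, 2 * diameter / fwhm, 0.5 * area / fwhm**2, volume / fwhm**3)

def rft_p_value(t, resels_0_to_3):
    """Approximate P(max Z >= t) for a smooth Gaussian random field."""
    c = 4 * math.log(2)
    g = math.exp(-t * t / 2)
    ec_densities = (
        0.5 * math.erfc(t / math.sqrt(2)),               # Gaussian upper tail
        c**0.5 / (2 * math.pi) * g,
        c / (2 * math.pi) ** 1.5 * t * g,
        c**1.5 / (2 * math.pi) ** 2 * (t * t - 1) * g,
    )
    return sum(r * e for r, e in zip(resels_0_to_3, ec_densities))

# With a single point as the search region, only the Gaussian tail remains.
p_point = rft_p_value(1.96, resels(1, 0, 0, 0, fwhm=8.0))
```

A corrected threshold is then the value of t at which `rft_p_value` equals 0.05, which can be found by bisection since the approximation decreases in t at high thresholds.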
Discrete local maxima
• Bonferroni applied to N events:
  {Z(s) ≥ t and Z(s) is a discrete local maximum} i.e.
  {Z(s) ≥ t and neighbour Z’s ≤ Z(s)}
• Conservative
• If Z(s) is stationary, with Cor(Z(s1),Z(s2)) = ρ(s1-s2), then the DLM P-value is
  P{maxS Z(s) ≥ t} ≤ N × P{Z(s) ≥ t and neighbour Z’s ≤ Z(s)}
• We only need to evaluate a (2D+1)-variate integral …
[Diagram: central voxel Z(s) with Z(s-1) ≤ Z(s) ≥ Z(s1) along one axis and Z(s-2) ≤ Z(s) ≥ Z(s2) along the other.]
Discrete local maxima: “Markovian” trick
• If ρ is “separable”: s=(x,y), ρ((x,y)) = ρ((x,0)) × ρ((0,y))
  - e.g. Gaussian spatial correlation function: ρ((x,y)) = exp(-½(x²+y²)/w²)
• Then Z(s) has a “Markovian” property: conditional on central Z(s), Z’s on different axes are independent:
  Z(s±1) ⊥ Z(s±2) | Z(s)
• So condition on Z(s)=z, find
  P{neighbour Z’s ≤ z | Z(s)=z} = ∏d P{Z(s±d) ≤ z | Z(s)=z}
  then take expectations over Z(s)=z
• Cuts the (2D+1)-variate integral down to a bivariate integral
The result only involves the correlation $\rho_d$ between adjacent voxels along each lattice axis $d$, $d = 1, \dots, D$. First let the Gaussian density and uncorrected P values be
$$\phi(z) = \exp(-z^2/2)/\sqrt{2\pi}, \qquad \Phi(z) = \int_z^\infty \phi(u)\,du,$$
respectively. Then define
$$Q(\rho, z) = 1 - 2\Phi(hz) + \frac{1}{\pi} \int_0^\alpha \exp\left(-\tfrac{1}{2} h^2 z^2 / \sin^2\theta\right) d\theta,$$
where
$$\alpha = \sin^{-1}\left(\sqrt{(1 - \rho^2)/2}\right), \qquad h = \sqrt{\frac{1 - \rho}{1 + \rho}}.$$
Then the P-value of the maximum is bounded by
$$P\left(\max_{s \in S} Z(s) \ge t\right) \le |S| \int_t^\infty \prod_{d=1}^{D} Q(\rho_d, z)\, \phi(z)\, dz,$$
where $|S|$ is the number of voxels $s$ in the search region $S$. For a voxel on the boundary of the search region with just one neighbour in axis direction $d$, replace $Q(\rho, z)$ by $1 - \Phi(hz)$, and by 1 if it has no neighbours.
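The bound can be evaluated with two nested one-dimensional numerical integrals. The sketch below uses a plain midpoint rule and treats every voxel as interior (two neighbours on each axis); the step counts, the truncation of the outer integral 8 units above t, and the function names are assumptions made for illustration.

```python
import math

def upper_tail(z):
    """Phi(z) in the slide's notation: the Gaussian upper tail probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Q(rho, z, n=200):
    """P(both neighbours on one axis <= z | centre = z), for -1 < rho < 1."""
    h = math.sqrt((1 - rho) / (1 + rho))
    alpha = math.asin(math.sqrt((1 - rho * rho) / 2))
    step = alpha / n
    integral = step * sum(
        math.exp(-0.5 * (h * z) ** 2 / math.sin((k + 0.5) * step) ** 2)
        for k in range(n)
    )
    return 1 - 2 * upper_tail(h * z) + integral / math.pi

def dlm_p_value(t, rhos, n_voxels, n=400, span=8.0):
    """DLM bound on P(max Z >= t); rhos holds one correlation per lattice axis."""
    step = span / n
    total = 0.0
    for k in range(n):
        z = t + (k + 0.5) * step
        prod = 1.0
        for rho in rhos:
            prod *= Q(rho, z)
        total += prod * phi(z) * step
    return n_voxels * total

# Hypothetical example: 3D lattice, neighbour correlation 0.8, 1000 voxels.
p_dlm = dlm_p_value(4.0, [0.8, 0.8, 0.8], 1000)
p_bonferroni = 1000 * upper_tail(4.0)
```

Because each Q is at most 1, the DLM bound never exceeds the Bonferroni bound, and it tightens as the neighbour correlations increase.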
Example: single run, hot-warm
Detected by BON and
DLM but not by RFT
Detected by DLM,
but not by BON or RFT
Estimating the delay of the response
• Delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions:
[Figure: HRF, basis1 and basis2 against t (seconds, -5 to 25), with the delay and the shift marked.]
  HRF(t + shift) ~ basis1(t) w1(shift) + basis2(t) w2(shift)
• Convolve bases with the stimulus, then add to the linear model
• Fit linear model, estimate w1 and w2
• Equate w2 / w1 to estimates, then solve for shift (Henson et al., 2002)
[Figure: w1, w2 and w2 / w1 (-3 to 3) against shift (seconds, -5 to 5).]
• To reduce bias when the magnitude is small, use
  shift / (1 + 1/T²)
  where T = w1 / Sd(w1) is the T statistic for the magnitude
• Shrinks shift to 0 where there is little evidence for a response.
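The shrinkage step is a one-line formula. The sketch below covers only that step, taking the raw shift as given; solving w2 / w1 for the shift needs the basis functions, which are not specified here. The function name `shrunk_shift` is hypothetical.

```python
def shrunk_shift(shift, w1, sd_w1):
    """Shrink a raw shift estimate towards 0 when the magnitude T is small."""
    T = w1 / sd_w1                      # T statistic for the magnitude
    if T == 0:
        return 0.0                      # limit of shift / (1 + 1/T^2) as T -> 0
    return shift / (1 + 1 / T**2)

weak = shrunk_shift(2.0, w1=1.0, sd_w1=1.0)     # T = 1: shift is halved
strong = shrunk_shift(2.0, w1=10.0, sd_w1=1.0)  # T = 10: shift barely changes
```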
Shift of the hot stimulus
[Figure: four images: T stat for magnitude (-6 to 6), T stat for shift (-6 to 6), Shift (secs, -4 to 4), Sd of shift (secs, 0 to 2).]
Shift of the hot stimulus
[Figure: the four images again with annotations. T stat for magnitude: T>4. T stat for shift: T~2. Shift (secs): ~1 sec. Sd of shift (secs): +/- 0.5 sec.]
Combining shifts of the hot stimulus
(Contours are T stat for magnitude > 4)
[Figure: images of the effect Ei (-4 to 4), the sd Si (0 to 2) and the T stat Ei / Si (-5 to 5) for Run 1, Run 2, Run 3, Run 4 and MULTISTAT. Annotations on the combined result: effect ~1 sec, sd +/- 0.25 sec, T~4.]
Shift of the hot stimulus
[Figure: Shift (secs), shown where T stat for magnitude > 4.93.]
Functional Imaging Analysis Contest HBM2005
• 15 subjects / 4 runs per subject (2 with events, 2 with blocks)
• 4 conditions per run
  - Same sentence, same speaker
  - Same sentence, different speaker
  - Different sentence, same speaker
  - Different sentence, different speaker
• 3T, 191 frames, TR=2.5s
• Greater %BOLD response for
  - different – same sentences (1.08±0.16%)
  - different – same speaker (0.47±0.08%)
• Greater latency for
  - different – same sentences (0.148±0.035 secs)
Contrasts in the data used for effects
[Figure: contrasts in the data (-1 to 2) against time (secs, 0 to 350) for two block designs.]
9 sec blocks, 9 sec gaps: Hot, Sd = 0.16; Warm, Sd = 0.16; Hot-warm, Sd = 0.19
90 sec blocks, 90 sec gaps: Hot, Sd = 0.28; Warm, Sd = 0.43; Hot-warm, Sd = 0.55
With 90 sec blocks: only using data near block transitions, ignoring data in the middle of blocks
Optimum block design
[Figure: contour plots of the Sd of the hot stimulus and the Sd of hot-warm, for magnitude and for delay (secs), against Block (secs, 0 to 20) and Gap (secs, 0 to 20). The best design is marked X in each panel, and part of each delay panel is marked “(Not enough signal)”.]
Optimum event design
[Figure: Sd of effect (secs for delays, 0 to 0.5) against average time between events (secs, 5 to 20). Solid lines are magnitudes, dotted lines are delays, for uniform, random and concentrated event spacing. Part of the plot is marked “(Not enough signal)”.]
12 secs best for magnitudes
7 secs best for delays
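How many subjects?
• Largest portion of variance comes from the last stage i.e. combining over subjects:
$$\frac{\mathrm{sd}_{\mathrm{run}}^2}{n_{\mathrm{run}}\, n_{\mathrm{sess}}\, n_{\mathrm{subj}}} + \frac{\mathrm{sd}_{\mathrm{sess}}^2}{n_{\mathrm{sess}}\, n_{\mathrm{subj}}} + \frac{\mathrm{sd}_{\mathrm{subj}}^2}{n_{\mathrm{subj}}}$$
• If you want to optimize total scanner time, take more subjects.
• What you do at early stages doesn’t matter very much!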
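A quick calculation shows why more subjects help more than more runs. The sd values and counts below are hypothetical, chosen only to make the comparison concrete; the function name is also an assumption.

```python
def var_of_combined_effect(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """Variance of the effect averaged over runs, sessions and subjects."""
    return (
        sd_run**2 / (n_run * n_sess * n_subj)
        + sd_sess**2 / (n_sess * n_subj)
        + sd_subj**2 / n_subj
    )

# Hypothetical design: all sds equal to 1, 4 runs, 1 session, 15 subjects.
base = var_of_combined_effect(1, 1, 1, n_run=4, n_sess=1, n_subj=15)
more_runs = var_of_combined_effect(1, 1, 1, n_run=8, n_sess=1, n_subj=15)
more_subjects = var_of_combined_effect(1, 1, 1, n_run=4, n_sess=1, n_subj=30)
```

Doubling the runs only shrinks the first term, while doubling the subjects halves all three.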
Features special to FMRISTAT / BRAINSTAT
• Bias correction for AR coefficients
• Df boosting due to smoothing:
  - AR coefficients
  - random/fixed effects variance
• P-value adjustment for:
  - peaks due to small FWHM (DLM)
  - clusters due to spatially varying FWHM
• Delays analysed the same way as magnitudes
• Sd of effects before collecting data
What is ‘bubbles’?
Nature (2005)
• Subject is shown one of 40 faces chosen at random …
  Happy, Sad, Fearful, Neutral
• … but face is only revealed through random ‘bubbles’
• First trial: “Sad” expression
[Figure: the Sad face, 75 random bubble centres, the centres smoothed by a Gaussian ‘bubble’ (mask values 0 to 1), and what the subject sees.]
• Subject is asked the expression:
  Response: “Neutral” (Incorrect)
Your turn …
• Trial 2: subject response “Fearful”, CORRECT
• Trial 3: subject response “Happy”, INCORRECT (Fearful)
• Trial 4: subject response “Happy”, CORRECT
• Trial 5: subject response “Fearful”, CORRECT
• Trial 6: subject response “Sad”, CORRECT
• Trial 7: subject response “Happy”, CORRECT
• Trial 8: subject response “Neutral”, CORRECT
• Trial 9: subject response “Happy”, CORRECT
• …
• Trial 3000: subject response “Happy”, INCORRECT (Fearful)
Bubbles analysis
• E.g. Fearful (3000/4=750 trials):
  Trial 1 + 2 + 3 + 4 + 5 + 6 + 7 + … + 750 = Sum
[Figure: bubble masks for trials 1 to 750 and their sum, for all trials and for correct trials.]
• Proportion of correct bubbles = (sum correct bubbles) / (sum all bubbles)
[Figure: proportion of correct bubbles, colour scale 0.65 to 0.75.]
• Thresholded at proportion of correct trials=0.68, scaled to [0,1]
• Use this as a bubble mask
Results
• Mask average face
[Figure: masked average face for Happy, Sad, Fearful and Neutral.]
• But are these features real or just noise?
  - Need statistics …
Statistical analysis
• Correlate bubbles with response (correct = 1, incorrect = 0), separately for each expression
• Equivalent to 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. Fearful:
[Figure: bubble masks for trials 1, 2, 3, 4, 5, 6, 7 … 750 with their responses (1 or 0), and the resulting Z~N(0,1) statistic image (-2 to 4).]
• Very similar to the proportion of correct bubbles:
[Figure: proportion of correct bubbles, colour scale 0.65 to 0.75.]
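Both summaries can be computed pixel by pixel from a trials × pixels array of bubble masks. In this sketch the Z statistic is taken as √n times the correlation between mask value and response, which is approximately N(0,1) when there is no association; that scaling, the simulated masks and the function names are assumptions made for illustration.

```python
import numpy as np

def bubbles_z(masks, correct):
    """Z statistic per pixel: sqrt(n) * correlation(mask value, response).

    masks: (trials x pixels) array; correct: array of 0/1 responses.
    Pixels whose mask never varies would give a division by zero.
    """
    n = len(correct)
    x = masks - masks.mean(axis=0)
    y = correct - correct.mean()
    r = (x * y[:, None]).sum(axis=0) / np.sqrt((x**2).sum(axis=0) * (y**2).sum())
    return np.sqrt(n) * r

def proportion_correct_bubbles(masks, correct):
    """(sum correct bubbles) / (sum all bubbles), per pixel."""
    return masks[correct == 1].sum(axis=0) / masks.sum(axis=0)

# Hypothetical simulation: 750 trials, 20 pixels, only pixel 3 is diagnostic.
rng = np.random.default_rng(2)
masks = rng.uniform(size=(750, 20))
correct = (rng.uniform(size=750) < 0.4 + 0.5 * masks[:, 3]).astype(int)

z = bubbles_z(masks, correct)
prop = proportion_correct_bubbles(masks, correct)
```

In the simulation both images peak at the diagnostic pixel, which is the sense in which the two summaries are very similar.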
Results
• Thresholded at Z=1.64 (P=0.05)
[Figure: Z~N(0,1) statistic (colour scale 1.64 to 4.58) on the average face for Happy, Sad, Fearful and Neutral.]
• Multiple comparisons correction?
  - Need random field theory …
Results, corrected for search
• Random field theory threshold: Z=3.92 (P=0.05)
[Figure: Z~N(0,1) statistic (colour scale 3.92 to 4.58) on the average face for Happy, Sad, Fearful and Neutral.]
• Saddle-point approx (Chamandy, 2007): Z = 3.82, 3.80, 3.81, 3.80 (P=0.05)
• Bonferroni: Z=4.87 (P=0.05) – nothing
Scale
Separate analysis of the bubbles at each scale
Scale space: smooth Z(s) with range of filter widths w
= continuous wavelet transform
adds an extra dimension to the random field: Z(s,w)
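A one-dimensional scale space can be sketched by convolving with Gaussian kernels of several widths, each normalised so that smoothed white noise keeps unit variance. The grid, the list of widths and the noise-free 15mm test signal are assumptions for illustration, as are the function names.

```python
import numpy as np

FWHM_TO_SD = 1 / np.sqrt(8 * np.log(2))

def gaussian_kernel(s, fwhm):
    """Gaussian filter on grid s with unit sum of squares."""
    k = np.exp(-0.5 * (s / (fwhm * FWHM_TO_SD)) ** 2)
    return k / np.sqrt(np.sum(k**2))

def scale_space(z, s, widths):
    """Z(s, w): rows are filter widths w, columns are locations s."""
    return np.array(
        [np.convolve(z, gaussian_kernel(s, w), mode="same") for w in widths]
    )

s = np.arange(-60, 61)                       # locations on a 1 mm grid
widths = [6.8, 10.2, 15.2, 22.7, 34.0]       # FWHM in mm, on a log scale
signal = np.exp(-0.5 * (s / (15 * FWHM_TO_SD)) ** 2)   # noise-free 15mm signal

zsw = scale_space(signal, s, widths)
best_width = widths[int(np.argmax(zsw[:, 60]))]        # response at s = 0
```

With the unit-variance normalisation, the response at the signal's location is largest for the filter closest to the signal's own width, in line with the matched filter theorem.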
Scale space, no signal
[Figure: Z(s,w) (colour scale -2 to 8) against s (mm, -60 to 60) and w = FWHM (mm, on log scale, 6.8 to 34).]
One 15mm signal
[Figure: Z(s,w) on the same axes.]
15mm signal is best detected with a 15mm smoothing filter
Matched Filter Theorem (= Gauss-Markov Theorem): “to best detect signal + white noise, filter should match signal”
10mm and 23mm signals
[Figure: Z(s,w) (colour scale -2 to 8) against s (mm, -60 to 60) and w = FWHM (mm, on log scale, 6.8 to 34).]
Two 10mm signals 20mm apart
[Figure: Z(s,w) on the same axes.]
But if the signals are too close together they are detected as a single signal half way between them
Scale space can even separate two signals at the same location!
8mm and 150mm signals at the same location
[Figure: the signal (0 to 10) against s (mm, -60 to 60), and Z(s,w) (colour scale 5 to 20) against s and w = FWHM (mm, on log scale, 6.8 to 170).]
Bubbles task in fMRI scanner
• Correlate bubbles with BOLD at every voxel:
[Figure: bubble masks for trials 1, 2, 3, 4, 5, 6, 7 … 3000 and the fMRI value (0 to 10000) for each trial.]
• Calculate Z for each pair (bubble pixel, fMRI voxel)
  - a 5D “image” of Z statistics …
Thresholding?
• Thresholding in advance is vital, since we cannot store all the ~1 billion 5D Z values
  - Resels = (image resels = 146.2) × (fMRI resels = 1057.2)
  - for P=0.05, threshold is Z = 6.22 (approx)
• Only keep 5D local maxima
  - Z(pixel, voxel) > Z(pixel, 6 neighbours of voxel)
    > Z(4 neighbours of pixel, voxel)
Generalised linear models?
• The random response is Y=1 (correct) or 0 (incorrect), or Y=fMRI
• The regressors are Xj=bubble mask at pixel j, j=1 … 240x380=91200 (!)
• Logistic regression or ordinary regression:
  - logit(E(Y)) or E(Y) = b0+X1b1+…+X91200b91200
• But there are only n=3000 observations (trials) …
• Instead, since regressors are independent, fit them one at a time:
  - logit(E(Y)) or E(Y) = b0+Xjbj
• However the regressors (bubbles) are random with a simple known distribution, so turn the problem around and condition on Y:
  - E(Xj) = c0+Ycj
  - Equivalent to conditional logistic regression (Cox, 1962) which gives exact inference for b1 conditional on sufficient statistics for b0
  - Cox also suggested using saddle-point approximations to improve accuracy of inference …
• Interactions? logit(E(Y)) or E(Y)=b0+X1b1+…+X91200b91200+X1X2b1,2+ …