Nonnegative least squares for imaging data

Keith Worsley, McGill
Jonathan Taylor, Stanford and Université de Montréal
John Aston, Academia Sinica, Taipei
The bubbles experiment (Nature, 2005)
The subject is shown one of 40 faces chosen at random, with one of four expressions: Happy, Sad, Fearful, or Neutral. But the face is only revealed through random "bubbles".

First trial: "Sad" expression
75 random bubble centres are smoothed by a Gaussian "bubble" to produce what the subject sees.
[Figure: the bubble centres, the Gaussian-smoothed bubble mask (colour scale 0 to 1), and the masked face.]


The subject is asked the expression. Response: "Neutral". Incorrect.
Your turn …

Trial 2. Subject response: "Fearful". Correct. Your turn …
Trial 3. Subject response: "Happy". Incorrect (true expression: Fearful). Your turn …
Trial 4. Subject response: "Happy". Correct. Your turn …
Trial 5. Subject response: "Fearful". Correct. Your turn …
Trial 6. Subject response: "Sad". Correct. Your turn …
Trial 7. Subject response: "Happy". Correct. Your turn …
Trial 8. Subject response: "Neutral". Correct. Your turn …
Trial 9. Subject response: "Happy". Correct. Your turn …
…
Trial 3000. Subject response: "Happy". Incorrect (true expression: Fearful).
Bubbles analysis
E.g. Fearful (3000/4 = 750 trials): sum the bubble masks over trials 1 + 2 + 3 + 4 + 5 + 6 + 7 + … + 750, separately for the correct trials and for all trials.
Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles).
Threshold at the proportion of correct trials (0.68), rescale to [0, 1], and use this as a bubble mask.
[Figure: the summed bubble masks for correct trials and for all trials, the proportion image (roughly 0.65 to 0.75), and the thresholded mask (0 to 1).]
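A minimal sketch of this computation in Python, assuming the smoothed bubble masks are stacked in a NumPy array `bubbles` of shape (trials, height, width) and `correct` is the 0/1 response vector (both names are hypothetical):

```python
import numpy as np

def bubble_mask(bubbles, correct, threshold=0.68):
    """Proportion-of-correct-bubbles image, thresholded and rescaled to [0, 1].

    bubbles : (n_trials, h, w) array of Gaussian-smoothed bubble masks
    correct : (n_trials,) array of 0/1 responses
    """
    sum_correct = (bubbles * correct[:, None, None]).sum(axis=0)
    sum_all = bubbles.sum(axis=0)
    prop = sum_correct / sum_all                             # proportion of correct bubbles
    clipped = np.clip(prop, threshold, None) - threshold     # zero below the threshold
    return clipped / clipped.max()                           # rescale to [0, 1]
```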
Results
The mask is applied to the average face for each expression: Happy, Sad, Fearful, Neutral.
But are these features real or just noise? Need statistics …
Statistical analysis
Correlate the bubbles with the response (correct = 1, incorrect = 0), separately for each expression. This is equivalent to a 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. Fearful:
[Figure: bubble masks for trials 1, 2, 3, 4, 5, 6, 7, …, 750 with the corresponding 0/1 responses, and the resulting Z ~ N(0,1) statistic image (roughly -2 to 4).]
The Z image is very similar to the proportion of correct bubbles.
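A sketch of the per-pixel statistic, using the same hypothetical array names as above; here the correlation is converted to an approximate Z via the Fisher transform, which is one reasonable choice rather than necessarily the authors' exact computation:

```python
import numpy as np

def bubbles_z(bubbles, correct):
    """Correlate each bubble pixel with the 0/1 response across trials,
    returning an approximately N(0,1) statistic image under H0."""
    n = len(correct)
    b = bubbles.reshape(n, -1)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    c = (correct - correct.mean()) / correct.std()
    r = b.T @ c / n                            # correlation at each pixel
    z = np.arctanh(r) * np.sqrt(n - 3)         # Fisher transform
    return z.reshape(bubbles.shape[1:])
```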
Results
Thresholded at Z = 1.64 (P = 0.05, uncorrected), on the average face for Happy, Sad, Fearful, Neutral.
[Figure: Z ~ N(0,1) statistic maps, colour scale 1.64 to 4.58.]
But what about a multiple comparisons correction for 91,200 pixels? Need random field theory …
Euler Characteristic = #blobs - #holes
Excursion set {Z ≥ threshold} for the neutral face. As the threshold varies, the EC takes values such as 0, -7, -11, 13, 14, 9, 1, 0.
Heuristic: at high thresholds t the holes disappear, so EC ~ 1 or 0, and
E(EC) ~ P(max Z ≥ t).
[Figure: observed and expected Euler characteristic against threshold, from -4 to 4.]
Random field theory gives an exact expression for E(EC) at all thresholds, and the approximation E(EC) ~ P(max Z ≥ t) is extremely accurate.
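The EC of a binary excursion set is easy to compute directly, which makes the heuristic simple to check on simulated smoothed noise. A rough 2D sketch with scipy.ndimage, counting holes as background components that do not touch the image border (the labeling conventions here are a simplification):

```python
import numpy as np
from scipy import ndimage

def euler_characteristic(excursion):
    """EC = #blobs - #holes of a 2D binary excursion set {Z >= t}."""
    _, n_blobs = ndimage.label(excursion)
    bg, _ = ndimage.label(~excursion)
    border_labels = np.unique(np.concatenate(
        [bg[0, :], bg[-1, :], bg[:, 0], bg[:, -1]]))
    n_holes = bg.max() - len(set(border_labels) - {0})
    return n_blobs - n_holes

# E(EC) ~ P(max Z >= t) at high t: compare the observed EC of {Z >= t}
# with the expected EC from the random field theory formula below.
```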
Random field theory
If $Z(s) \sim N(0,1)$ is an isotropic Gaussian random field, $s \in \mathbb{R}^2$, with $\lambda^2 I_{2\times 2} = \mathrm{Var}\left(\frac{\partial Z}{\partial s}\right)$, then

$$P\left(\max_{s \in S} Z(s) \ge t\right) \approx E(EC(S \cap \{s : Z(s) \ge t\}))$$
$$= \underbrace{EC(S)}_{\mathcal{L}_0(S)} \int_t^\infty \frac{1}{(2\pi)^{1/2}}\, e^{-z^2/2}\,dz \;+\; \underbrace{\tfrac{1}{2}\,\mathrm{Perimeter}(S)}_{\mathcal{L}_1(S)}\, \frac{\lambda}{2\pi}\, e^{-t^2/2} \;+\; \underbrace{\mathrm{Area}(S)}_{\mathcal{L}_2(S)}\, \frac{\lambda^2}{(2\pi)^{3/2}}\, t\, e^{-t^2/2}.$$

The $\mathcal{L}_d(S)$ are the Lipschitz-Killing curvatures of $S$ ($= \mathrm{Resels}(S) \times c$), and the three factors multiplying them are the EC densities $\rho_0(Z \ge t)$, $\rho_1(Z \ge t)$, $\rho_2(Z \ge t)$ of $Z$ above $t$.

If $Z(s)$ is white noise convolved with an isotropic Gaussian filter of Full Width at Half Maximum FWHM (Z = white noise * filter), then
$$\lambda = \frac{\sqrt{4 \log 2}}{\mathrm{FWHM}}.$$
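This formula translates directly into a threshold solver. A sketch; the search-region dimensions and FWHM below are hypothetical placeholders chosen only to land near the threshold quoted on the next slide, not values taken from the paper:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def expected_ec(t, area, perimeter, ec, fwhm):
    """E(EC) of {Z >= t} for a 2D isotropic Gaussian field (formula above)."""
    lam = np.sqrt(4 * np.log(2)) / fwhm     # lambda for Gaussian-smoothed white noise
    rho0 = norm.sf(t)
    rho1 = lam * np.exp(-t**2 / 2) / (2 * np.pi)
    rho2 = lam**2 * t * np.exp(-t**2 / 2) / (2 * np.pi)**1.5
    return ec * rho0 + (perimeter / 2) * rho1 + area * rho2

# P = 0.05 threshold for a hypothetical 240 x 380 pixel search region, FWHM = 25:
t05 = brentq(lambda t: expected_ec(t, 240 * 380, 2 * (240 + 380), 1, 25.0) - 0.05,
             2.0, 6.0)
```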
Results, corrected for search
Random field theory threshold: Z = 3.92 (P = 0.05).
[Figure: thresholded Z maps on the average face for Happy, Sad, Fearful, Neutral; colour scale 3.92 to 4.58.]
Saddle-point approximation thresholds (Rabinowitz, 1997; Chamandy, 2007): 3.82, 3.80, 3.81, 3.80.
Bonferroni: Z = 4.87 (P = 0.05) detects nothing.
fMRI data: 120 scans, 3 scans each of hot, rest, warm, rest, …
[Figure: the first scan of the fMRI data; a voxel time series with a highly significant effect (T = 6.59); a voxel with no significant effect (T = -0.74); a voxel showing drift; and the T statistic image for the hot - warm effect (scale -5 to 5), over 0 to 300 seconds.]
T = (hot - warm effect) / S.d. ~ t_110 if no effect.
Linear model regressors
Alternating hot and warm stimuli separated by rest (9 seconds each).
Hemodynamic response function (HRF): difference of two gamma densities.
Regressors = stimuli * HRF, sampled every 3 seconds.
[Figure: the stimulus time courses, the HRF, and the resulting regressors over 0 to 350 seconds.]
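A sketch of building these regressors; the gamma shape parameters and stimulus timing below are common defaults and assumptions, not necessarily the study's exact values:

```python
import numpy as np
from scipy.stats import gamma

def hrf(t, a1=6.0, a2=16.0, c=1/6):
    """Difference of two gamma densities (canonical-style HRF)."""
    return gamma.pdf(t, a1) - c * gamma.pdf(t, a2)

dt = 0.1                                          # fine time grid, seconds
t = np.arange(0, 350, dt)
stim_hot = ((t % 36) < 9).astype(float)           # hot, rest, warm, rest: 9 s each
stim_warm = ((((t - 18) % 36) < 9) & (t >= 18)).astype(float)

h = hrf(np.arange(0, 30, dt))
x_hot = np.convolve(stim_hot, h)[: len(t)] * dt   # regressor = stimulus * HRF
x_warm = np.convolve(stim_warm, h)[: len(t)] * dt
every_3s = slice(0, len(t), int(round(3 / dt)))
X = np.column_stack([x_hot[every_3s], x_warm[every_3s]])  # one row per scan
```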
Linear model for fMRI time series with AR(p) errors
$$Y(t) = (s \star h)(t)\,\beta + z(t)\,\gamma + \epsilon(t)$$
$$\epsilon(t) = a_1\,\epsilon(t-1) + \cdots + a_p\,\epsilon(t-p) + \sigma\,WN(t)$$
where t = time, Y(t) = fMRI data, s(t) = stimulus, h(t) = hemodynamic response function (HRF), z(t) = drift etc., ε(t) = error, WN(t) ~ N(0, 1) independently, and ⋆ = convolution. The unknown parameters are β, γ, a_1, …, a_p, σ.
Unknown latency δ of the HRF
$$h(t; \delta) = h_0(t - \delta)$$
where $h_0(t)$ is a known canonical HRF and the latency shift δ is unknown. This is a hard non-linear regression problem, so Friston et al. (1998) linearized it by expanding $h(t; \delta)$ in a Taylor series:
$$h(t; \delta) \approx h_0(t) - \delta\,\dot h_0(t).$$
The model now becomes another linear model with an extra regressor:
$$Y(t) = (s \star h_0)(t)\,\beta - (s \star \dot h_0)(t)\,\delta\beta + z(t)\,\gamma + \epsilon(t) = x_1(t)\,\beta_1 + x_2(t)\,\beta_2 + z(t)\,\gamma + \epsilon(t),$$
where $x_1 = s \star h_0$, $x_2 = -s \star \dot h_0$, and the unknown parameters are $\beta_1 = \beta$ and $\beta_2 = \delta\beta$.
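Continuing the regressor sketch above, the two linearized regressors can be built numerically, taking the derivative of the HRF by finite differences:

```python
# Continuing the sketch above: the two regressors of the linearized model.
th = np.arange(0, 30, dt)
h0 = hrf(th)
h0_dot = np.gradient(h0, dt)                 # numerical derivative of the HRF

x1 = np.convolve(stim_hot, h0)[: len(t)] * dt       # x1 = s * h0
x2 = -np.convolve(stim_hot, h0_dot)[: len(t)] * dt  # x2 = -s * h0_dot
# For latency shift delta, the response is (x1 + delta * x2) * beta,
# i.e. beta1 = beta and beta2 = delta * beta in the linear model.
```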
Example
$$Y(t) = (s \star (h_0 - \delta\,\dot h_0))(t)\,\beta + z(t)\,\gamma + \epsilon(t) = (x_1(t) + x_2(t)\,\delta)\,\beta + z(t)\,\gamma + \epsilon(t)$$
[Figure: the linearized HRF $h_0(t) - \delta\,\dot h_0(t)$ for δ = -2, 0, 2 seconds, which, convolved with the stimulus, gives the linearized regressors $x_1(t) + \delta\,x_2(t)$ for δ = -2, 0, 2 seconds, over 0 to 70 seconds.]
Two interesting problems:
• Estimate the latency shift δ and its standard error;
• Test for the magnitude β of the stimulus allowing for unknown latency.
Test for the magnitude β of the stimulus allowing for unknown latency
$$Y(t) = (s \star h_0)(t)\,\beta - (s \star \dot h_0)(t)\,\delta\beta + z(t)\,\gamma + \epsilon(t) = x_1(t)\,\beta + x_2(t)\,\delta\beta + z(t)\,\gamma + \epsilon(t) = x_1(t)\,\beta_1 + x_2(t)\,\beta_2 + z(t)\,\gamma + \epsilon(t)$$
We could do either:
• a T-test on β1 > 0, allowing for β2; this loses sensitivity if δ is far from 0.
• an F-test on (β1, β2) ≠ 0; this wastes sensitivity on unrealistic HRFs.
Cone alternative
We know that
• the magnitude of the response is positive;
• the latency shift must lie in some restricted interval, say [-2, 2] seconds.
This implies that
• β ≥ 0;
• -Δ ≤ δ ≤ Δ, where Δ = 2 seconds.
This specifies a cone alternative for (β1 = β, β2 = δβ) (Friman et al., 2003):
[Figure: the cone in the (β1, β2) plane between the rays δ = -2 and δ = 2, through δ = 0, with cone angle θ = 2 atan(Δ‖x2‖/‖x1‖); the null is 0.]
Non-negative least squares
Express the model in terms of the two extremes δ = -Δ and δ = +Δ:
$$x_1^*(t) = x_1(t) - \Delta\,x_2(t), \qquad x_2^*(t) = x_1(t) + \Delta\,x_2(t),$$
$$Y(t) = x_1^*(t)\,\beta_1^* + x_2^*(t)\,\beta_2^* + z(t)\,\gamma + \epsilon(t).$$
Then the coefficients are non-negative:
$$\beta_1^* \ge 0, \qquad \beta_2^* \ge 0.$$
[Figure: $x_1^*(t)$, $x_2^*(t)$ and the stimulus over 10 to 70 seconds.]
Non-negative least squares (continued)
[Figure: the same model shown as a cone in the (β1, β2) plane. The cone angle θ is the angle between $x_1^*$ and $x_2^*$; the alternative is $\beta_1^* \ge 0$, $\beta_2^* \ge 0$, and the null is 0.]
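A sketch of fitting this constrained model with scipy.optimize.nnls: the unconstrained regressors z(t) are projected out first, which reduces the problem to a pure non-negative least squares fit. The drift matrix `Z` here is a hypothetical stand-in:

```python
import numpy as np
from scipy.optimize import nnls

def fit_nnls(Y, X, Z):
    """Fit Y = X beta + Z gamma + error with beta >= 0, gamma unconstrained.
    Projecting Z out of Y and X leaves a pure NNLS problem in beta."""
    P = np.eye(len(Y)) - Z @ np.linalg.pinv(Z)     # residual-forming projector
    beta, _ = nnls(P @ X, P @ Y)
    gamma = np.linalg.pinv(Z) @ (Y - X @ beta)
    return beta, gamma

# The two extremes of the linearized HRF (Delta = 2 s), using x1, x2 from above:
Delta = 2.0
X_star = np.column_stack([x1 - Delta * x2, x1 + Delta * x2])
```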
Example of three extremes
$$Y(t) = x_1^*(t)\,\beta_1^* + x_2^*(t)\,\beta_2^* + x_3^*(t)\,\beta_3^* + z(t)\,\gamma + \epsilon(t),$$
$$\beta_1^* \ge 0, \quad \beta_2^* \ge 0, \quad \beta_3^* \ge 0.$$
[Figure: a 3D cone in (β1, β2, β3) spanned by $x_1^*(t)$ (standard HRF), $x_2^*(t)$ (delayed 4 seconds), and $x_3^*(t)$ (spread 4 seconds).]
General non-negative least squares problem
In general we may have p constrained regressors and q unconstrained regressors. The constrained regressors could be the extremes of the HRF, e.g. min/max latency shift, min/max spread, etc. The model then covers all HRFs in between. In vector form:
$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + Z_{n\times q}\,\gamma_{q\times 1} + \epsilon_{n\times 1},$$
$$\beta_{p\times 1} \ge 0 \ (\text{component-wise}), \qquad \gamma_{q\times 1} \ \text{unconstrained}, \qquad \epsilon_{n\times 1} \sim N(0_{n\times 1},\, I_{n\times n}\,\sigma^2) \ (\text{without loss of generality}).$$
We might also have far more regressors than actual observations(!), i.e. p >> n.
Footnote: Woolrich et al. (2004) replace "hard" constraints by "soft" constraints through a prior distribution on β, taking a Bayesian approach.
To build X, pick a range of, say, p = 150 plausible values of the non-linear parameter ν: ν1, …, νp.
Fitting the NNLS model
Simple:
• Do "all subsets" regression.
• Throw out any model that does not satisfy the non-negativity constraints.
• Among those left, pick the model with the smallest error sum of squares.
For larger models there are more efficient methods, e.g. Lawson & Hanson (1974); see the sketch below.
The non-negativity constraints tend to enforce sparsity, even if the regressors are highly correlated (e.g. PET). Why? Highly correlated regressors have huge positive and negative unconstrained coefficients; non-negativity suppresses the negative ones.
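scipy.optimize.nnls implements the Lawson & Hanson (1974) active-set algorithm. A quick synthetic illustration of the sparsity effect with highly correlated regressors (toy data, not the PET example on the next slide):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 20, 150
grid = np.linspace(0, 1, n)
centres = np.linspace(0, 1, p)
X = np.exp(-(grid[:, None] - centres[None, :])**2 / 0.02)  # highly correlated bumps
y = 2.0 * X[:, 40] + 1.0 * X[:, 90] + 0.05 * rng.standard_normal(n)

beta, _ = nnls(X, y)
print(np.flatnonzero(beta))  # typically a handful of coefficients, in sparse
                             # pairs of adjacent columns near 40 and 90
```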
Example: n = 20, p = 150, but surprisingly it does not overfit.
[Figure: tracer concentration Y against time (0 to 6000), with the NNLS fit Ŷ and its two components. The only non-zero coefficients are β̂40 = 46.9 and β̂41 = 107.4 for the first component, and β̂86 = 71.7 and β̂87 = 5.4 for the second; the rest are β̂j = 0.]
We tend to get sparse pairs of adjacent regressors, suggesting that the best regressor is somewhere in between.
P-values?
$$H_0: \beta_{p\times 1} = 0_{p\times 1} \ \rightarrow\ \text{error sum of squares } SSE_0$$
$$H_1: \beta_{p\times 1} \ge 0_{p\times 1} \ (\text{component-wise}) \ \rightarrow\ \text{error sum of squares } SSE_1$$
The likelihood ratio test statistic is the Beta-bar statistic, equal to the coefficient of determination or multiple correlation $R^2$:
$$\bar B = \frac{SSE_0 - SSE_1}{SSE_0}.$$
Null distribution if there are no constraints, i.e. $H_1: \beta_{p\times 1} \ne 0_{p\times 1}$:
$$P(\bar B \ge t) = P\left(\mathrm{Beta}\left(\tfrac{p}{2}, \tfrac{\nu - p}{2}\right) \ge t\right), \qquad \nu = n - q.$$
The null distribution with constraints is a weighted average of Beta distributions, hence the name Beta-bar (Lin & Lindsay; Takemura & Kuriki, 1997):
$$P(\bar B \ge t) = \sum_{j=1}^{\nu} w_j\, P\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right), \qquad w_j = P(\#\{\text{unconstrained } \hat\beta\text{'s} > 0\} = j),$$
where the Beta term is taken to be 1 when j = ν.
P-values for PET data at a single voxel
The observed $\bar B$ is t = 0.9524, with ν = 20.
For p = 2, the cone weights are known: w1 = 1/2, w2 = θ/2π.
The easiest way to find the $\bar B$ weights $w_j$ in general is by simulation (10,000 iterations):
[Figure: simulated weights $w_j$ for j = 0, …, 6, and the tail probabilities $P(\mathrm{Beta}(j/2, (\nu - j)/2) \ge t)$, which are of order $10^{-8}$.]
$$P(\bar B \ge t) = \sum_{j=1}^{\nu} w_j\, P\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right) = 2.35 \times 10^{-12}.$$
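A sketch of that simulation, assuming the constrained design matrix `X` (n × p, after projecting out the unconstrained regressors) is available: the weights are estimated by refitting NNLS to pure noise and counting the strictly positive coefficients.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import beta as beta_dist

def betabar_pvalue(X, t, nu, n_sim=10_000, seed=0):
    """P(Bbar >= t) = sum_j w_j P(Beta(j/2, (nu-j)/2) >= t), with the cone
    weights w_j estimated by counting positive NNLS coefficients under H0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p + 1)
    for _ in range(n_sim):
        b, _ = nnls(X, rng.standard_normal(n))    # fit to null (pure noise) data
        counts[int((b > 0).sum())] += 1
    w = counts / n_sim                            # estimated weights w_j
    pval = sum(w[j] * beta_dist.sf(t, j / 2, (nu - j) / 2)
               for j in range(1, p + 1) if w[j] > 0 and j < nu)
    if p >= nu and w[nu] > 0:                     # Beta term is 1 when j = nu
        pval += w[nu] * (t <= 1)
    return pval
```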
The Beta-bar random field
Recall that at a single point (voxel):
$$P(\bar B \ge t) = \sum_{j=1}^{\nu} w_j\, P\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right).$$
Recall that if $F(s)$, $s \in S \subset \mathbb{R}^D$, is an isotropic random field:
$$P\left(\max_{s \in S} F(s) \ge t\right) \approx E(EC(S \cap \{s : F(s) \ge t\})) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(F \ge t),$$
with $\rho_0 \equiv P$. Taylor & Worsley (2007):
$$\rho_d(\bar B \ge t) = \sum_{j=1}^{\nu} w_j\, \rho_d\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right) \qquad (\text{the term is } 1 \text{ when } j = \nu).$$
The weights $w_j$ come from simulations at a single voxel, the EC densities of the Beta field are well known, and it is the same linear combination:
$$P\left(\max_{s \in S} \bar B(s) \ge t\right) \approx \sum_{j=1}^{\nu} w_j\, P\left(\max_{s \in S} \mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right)(s) \ge t\right).$$
Proof
$$\bar\chi = \max_{0 \le \theta \le \pi/2} (Z_1 \cos\theta + Z_2 \sin\theta), \qquad Z_1 \sim N(0,1), \ Z_2 \sim N(0,1).$$
[Figure: left, the excursion sets $X_t = \{s : \bar\chi(s) \ge t\}$ inside the search region S in the $(s_1, s_2)$ plane at threshold t; right, the rejection regions $R_t = \{Z : \bar\chi \ge t\}$ in the $(Z_1, Z_2)$ plane, showing the cone alternative and the null at 0.]
Euler characteristic heuristic again
EC = #blobs - #holes of the excursion set $X_t$ within the search region S; as t increases, the EC runs through values such as 1, 7, 6, 5, 2, 1, 1.
Heuristic:
$$P\left(\max_{s \in S} \bar\chi(s) \ge t\right) \approx E(EC) = 0.05 \ \Rightarrow\ t = 3.75.$$
[Figure: observed and expected Euler characteristic against threshold t, from 0 to 4.]
$$E(EC(S \cap X_t)) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t) \quad \text{EXACT!}$$
$$E(EC(S \cap X_t)) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t)$$
Proof:
Theorem (Hadwiger, 1930s): Suppose $\phi(S)$, $S \subset \mathbb{R}^D$, is a set functional that is invariant under translations and rotations of S, and satisfies the additivity property
$$\phi(A \cup B) = \phi(A) + \phi(B) - \phi(A \cap B).$$
Then $\phi(S)$ must be a linear combination of the intrinsic volumes $\mathcal{L}_d(S)$:
$$\phi(S) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,c_d.$$
Proof: The choice
$$\phi(S) = E(EC(S \cap X_t))$$
is invariant under translations and rotations because the random field is isotropic, and is additive because the EC is additive:
$$EC(A \cup B) = EC(A) + EC(B) - EC(A \cap B).$$
$$E(EC(S \cap X_t)) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t), \qquad \lambda = \mathrm{Sd}\left(\frac{\partial Z}{\partial s}\right).$$

Lipschitz-Killing curvature $\mathcal{L}_d(S)$: Steiner-Weyl Tube Formula (1930)
• Put a tube of radius r about the search region λS.
• Find its volume, expand as a power series in r, and pull off the coefficients:
$$|\mathrm{Tube}(\lambda S, r)| = \sum_{d=0}^{D} \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}\,\mathcal{L}_{D-d}(S)\,r^d.$$
[Figure: the tube of radius r about λS.]

EC density $\rho_d(R_t)$: Morse theory approach (1995)
$$\rho_d(R_t) = \frac{1}{\lambda^d}\,E\left(1_{\{\bar\chi \ge t\}}\,\det\left(-\frac{\partial^2 \bar\chi}{\partial s\,\partial s'}\right) \,\middle|\, \frac{\partial \bar\chi}{\partial s} = 0\right) P\left(\frac{\partial \bar\chi}{\partial s} = 0\right).$$
For a Gaussian random field,
$$\rho_d(Z \ge t) = \left(-\frac{1}{\sqrt{2\pi}}\,\frac{\partial}{\partial t}\right)^d P(Z \ge t).$$
For a chi-bar random field???
Lipschitz-Killing curvature $\mathcal{L}_d(S)$ of a triangle, with $\lambda = \mathrm{Sd}\left(\frac{\partial Z}{\partial s}\right)$
Steiner-Weyl Volume of Tubes Formula (1930):
$$\mathrm{Area}(\mathrm{Tube}(\lambda S, r)) = \sum_{d=0}^{D} \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}\,\mathcal{L}_{D-d}(S)\,r^d = \mathcal{L}_2(S) + 2\mathcal{L}_1(S)\,r + \pi \mathcal{L}_0(S)\,r^2$$
$$= \mathrm{Area}(\lambda S) + \mathrm{Perimeter}(\lambda S)\,r + EC(\lambda S)\,\pi r^2,$$
so
$$\mathcal{L}_0(S) = EC(\lambda S), \qquad \mathcal{L}_1(S) = \tfrac{1}{2}\,\mathrm{Perimeter}(\lambda S), \qquad \mathcal{L}_2(S) = \mathrm{Area}(\lambda S).$$
Lipschitz-Killing curvatures are just "intrinsic volumes" or "Minkowski functionals" in the (Riemannian) metric of the variance of the derivative of the process.
Lipschitz-Killing curvature $\mathcal{L}_d(S)$ of any set S
Triangulate S and scale each edge length by $\lambda = \mathrm{Sd}\left(\frac{\partial Z}{\partial s}\right)$.
[Figure: the set S, its triangulation, and the mesh with edge length × λ.]
Lipschitz-Killing curvature of the pieces: for a point, $\mathcal{L}_0(\bullet) = 1$; for an edge, $\mathcal{L}_0(-) = 1$ and $\mathcal{L}_1(-)$ = edge length; for a triangle, $\mathcal{L}_0(\triangle) = 1$, $\mathcal{L}_1(\triangle) = \tfrac{1}{2}$ perimeter, and $\mathcal{L}_2(\triangle)$ = area.
Lipschitz-Killing curvature of the union of the triangles, by inclusion-exclusion:
$$\mathcal{L}_0(S) = \sum_{\bullet} \mathcal{L}_0(\bullet) - \sum_{-} \mathcal{L}_0(-) + \sum_{\triangle} \mathcal{L}_0(\triangle)$$
$$\mathcal{L}_1(S) = \sum_{-} \mathcal{L}_1(-) - \sum_{\triangle} \mathcal{L}_1(\triangle)$$
$$\mathcal{L}_2(S) = \sum_{\triangle} \mathcal{L}_2(\triangle)$$
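These inclusion-exclusion formulas are only a few lines of code. A sketch for a 2D triangulated region, with vertices, edges, and triangles given as index lists (a hypothetical input format):

```python
import numpy as np

def lkc_triangulation(points, edges, triangles, lam=1.0):
    """L0, L1, L2 of a union of triangles by inclusion-exclusion.
    points: (m, 2) vertex coordinates; edges: index pairs; triangles:
    index triples; lam scales edge lengths (lambda = Sd of dZ/ds)."""
    length = lambda i, j: lam * np.linalg.norm(points[i] - points[j])
    # L0 = #points - #edges + #triangles (the Euler characteristic)
    L0 = len(points) - len(edges) + len(triangles)
    # L1 = sum of edge lengths - sum of triangle half-perimeters
    L1 = sum(length(i, j) for i, j in edges)
    L1 -= sum((length(a, b) + length(b, c) + length(a, c)) / 2
              for a, b, c in triangles)
    # L2 = sum of triangle areas (edge scaling contributes lam^2)
    def area(a, b, c):
        u, v = points[b] - points[a], points[c] - points[a]
        return lam**2 * abs(u[0] * v[1] - u[1] * v[0]) / 2
    L2 = sum(area(a, b, c) for a, b, c in triangles)
    return L0, L1, L2
```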
Non-isotropic data? Use the Riemannian metric of Var(∇Z)
For non-isotropic data, replace the global λ by the local $\mathrm{Sd}\left(\frac{\partial Z}{\partial s}\right)$: measure each edge of the triangulation in the Riemannian metric of the variance of the derivative, in which the field is isotropic.
[Figure: a non-isotropic Z ~ N(0,1) field on the $(s_1, s_2)$ plane, its triangulation, and the mesh with edge length × λ varying locally from about 0.06 to 0.14.]
The same formulas then apply: $\mathcal{L}_0(\bullet) = \mathcal{L}_0(-) = \mathcal{L}_0(\triangle) = 1$; $\mathcal{L}_1(-)$ = edge length, $\mathcal{L}_1(\triangle) = \tfrac{1}{2}$ perimeter, $\mathcal{L}_2(\triangle)$ = area; and $\mathcal{L}_d(S)$ is the same inclusion-exclusion sum over points, edges, and triangles as above.
Estimating the Lipschitz-Killing curvature $\mathcal{L}_d(S)$
We need independent and identically distributed random fields, e.g. residuals from a linear model: $Z_1, Z_2, \ldots, Z_n$.
Replace the coordinates of the triangles in $S \subset \mathbb{R}^2$ by the normalised residuals
$$\frac{Z}{\|Z\|}, \qquad Z = (Z_1, \ldots, Z_n) \in \mathbb{R}^n.$$
(Taylor & Worsley, JASA, 2007.)
The triangle and union formulas above, applied to these new coordinates, then estimate $\mathcal{L}_d(S)$.
Beautiful symmetry:
$$E(EC(S \cap X_t)) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t), \qquad \lambda = \mathrm{Sd}\left(\frac{\partial Z}{\partial s}\right).$$
Lipschitz-Killing curvature $\mathcal{L}_d(S)$: Steiner-Weyl Tube Formula (1930). EC density $\rho_d(R_t)$: Taylor Gaussian Tube Formula (2003).
• Put a tube of radius r about the search region λS and about the rejection region $R_t$.
[Figure: left, the tube of radius r about λS in the $(s_1, s_2)$ plane; right, the tube of radius r about $R_t$ in the $(Z_1, Z_2)$ plane, which moves the threshold from t to t - r.]
• Find the volume or probability, expand as a power series in r, and pull off the coefficients:
$$|\mathrm{Tube}(\lambda S, r)| = \sum_{d=0}^{D} \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}\,\mathcal{L}_{D-d}(S)\,r^d, \qquad P(\mathrm{Tube}(R_t, r)) = \sum_{d=0}^{\infty} \frac{(2\pi)^{d/2}}{d!}\,\rho_d(R_t)\,r^d.$$
EC density $\rho_d(\bar\chi \ge t)$ of the $\bar\chi$ statistic
[Figure: the rejection region $R_t$ in the $(Z_1, Z_2)$ plane and its tube of radius r, reaching down to t - r.]
Taylor's Gaussian Tube Formula (2003):
$$P(Z_1, Z_2 \in \mathrm{Tube}(R_t, r)) = \sum_{d=0}^{\infty} \frac{(2\pi)^{d/2}}{d!}\,\rho_d(\bar\chi \ge t)\,r^d = \rho_0(\bar\chi \ge t) + (2\pi)^{1/2}\rho_1(\bar\chi \ge t)\,r + (2\pi)\rho_2(\bar\chi \ge t)\,r^2/2 + \cdots$$
$$= \int_{t-r}^{\infty} (2\pi)^{-1/2}\,e^{-z^2/2}\,dz + e^{-(t-r)^2/2}/4.$$
Matching powers of r:
$$\rho_0(\bar\chi \ge t) = \int_t^{\infty} (2\pi)^{-1/2}\,e^{-z^2/2}\,dz + e^{-t^2/2}/4$$
$$\rho_1(\bar\chi \ge t) = (2\pi)^{-1}\,e^{-t^2/2} + (2\pi)^{-1/2}\,e^{-t^2/2}\,t/4$$
$$\rho_2(\bar\chi \ge t) = (2\pi)^{-3/2}\,e^{-t^2/2}\,t + (2\pi)^{-1}\,e^{-t^2/2}\,(t^2 - 1)/8$$
…
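A sketch of these chi-bar EC densities in code, which can be plugged into the E(EC) threshold solver from the Gaussian example earlier:

```python
import numpy as np
from scipy.stats import norm

def rho_chibar(d, t):
    """EC densities rho_d(chibar >= t) for d = 0, 1, 2 (formulas above)."""
    e = np.exp(-t**2 / 2)
    if d == 0:
        return norm.sf(t) + e / 4
    if d == 1:
        return e / (2 * np.pi) + t * e / (4 * np.sqrt(2 * np.pi))
    if d == 2:
        return t * e / (2 * np.pi)**1.5 + (t**2 - 1) * e / (16 * np.pi)
    raise ValueError("only d = 0, 1, 2 implemented here")

# Corrected threshold: solve sum_d L_d(S) * rho_chibar(d, t) = 0.05 for t,
# e.g. with scipy.optimize.brentq, exactly as in the Gaussian case above.
```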
EC density $\rho_d(\bar B \ge t)$ of the $\bar B$ statistic
Recall that at a single point (voxel):
$$P(\bar B \ge t) = \sum_{j=1}^{n} w_j\, P\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right).$$
Recall that if $F(s)$, $s \in S \subset \mathbb{R}^D$, is an isotropic random field:
$$P\left(\max_{s \in S} F(s) \ge t\right) \approx E(EC(S \cap \{s : F(s) \ge t\})) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(F \ge t),$$
with $\rho_0 \equiv P$. Taylor & Worsley (2007):
$$\rho_d(\bar B \ge t) = \sum_{j=1}^{n} w_j\, \rho_d\left(\mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right) \ge t\right).$$
The weights come from simulations at a single voxel, the EC densities of the Beta field are well known, and it is the same linear combination:
$$P\left(\max_{s \in S} \bar B(s) \ge t\right) \approx \sum_{j=1}^{n} w_j\, P\left(\max_{s \in S} \mathrm{Beta}\left(\tfrac{j}{2}, \tfrac{\nu - j}{2}\right)(s) \ge t\right).$$
Proof for n = 3: [figure]
Power?
S = 1000 cc brain, FWHM = 10 mm, P = 0.05.
Event design: cone angle θ = 78.4°. Block design (20 seconds): cone angle θ = 38.1°. Cone weights: w1 = 1/2, w2 = θ/2π.
[Figure: power of the Beta-bar test, the T-test on β1, and the F-test on (β1, β2), as a function of the shift δ of the HRF (0 to 3 seconds), for the event and block designs, together with the responses $x_1^*(t)$ and $x_2^*(t)$ over 0 to 40 seconds.]
Bubbles task in fMRI scanner
Correlate the bubbles with BOLD at every voxel:
[Figure: the bubble masks for trials 1, 2, 3, 4, 5, 6, 7, …, 3000 alongside the fMRI time series (about 10,000 voxels).]
Calculate Z for each pair (bubble pixel, fMRI voxel): a 5D "image" of Z statistics …
Thresholding? Cross correlation random field
Correlation between 2 fields at 2 different locations, searched over all pairs of locations, one in S, one in T:
$$P\left(\max_{s \in S,\, t \in T} C(s, t) \ge c\right) \approx E(EC\{s \in S,\, t \in T : C(s, t) \ge c\}) = \sum_{i=0}^{\dim(S)} \sum_{j=0}^{\dim(T)} \mathcal{L}_i(S)\,\mathcal{L}_j(T)\,\rho_{ij}(C \ge c),$$
where, with h = i + j,
$$\rho_{ij}(C \ge c) = \frac{2^{n-2-h}\,(i-1)!\,j!}{\pi^{h/2+1}} \sum_{k=0}^{\lfloor (h-1)/2 \rfloor} (-1)^k\, c^{h-1-2k}\,(1 - c^2)^{(n-1-h)/2+k}$$
$$\times \sum_{l=0}^{k} \sum_{m=0}^{k} \frac{\Gamma\left(\tfrac{n-i}{2} + l\right) \Gamma\left(\tfrac{n-j}{2} + m\right)}{l!\,m!\,(k-l-m)!\,(n-1-h+l+m+k)!\,(i-1-k-l+m)!\,(j-k-m+l)!}.$$
Cao & Worsley, Annals of Applied Probability (1999).
Bubbles data: P = 0.05, n = 3000, c = 0.113, T = 6.22.
NNLS for bubbles?
At the moment, we are correlating Y(t) = fMRI data at each voxel with each of the 240 × 380 = 91,200 face pixels as regressors x1(t), …, x91200(t) separately:
Y(t) = xj(t)βj + z(t)γ + ε(t).
We should be doing this simultaneously:
Y(t) = x1(t)β1 + … + x91200(t)β91200 + z(t)γ + ε(t).
Obviously impossible: #observations (3000) << #regressors (91,200).
Maybe we can use NNLS: β1 ≥ 0, …, β91200 ≥ 0.
It should enforce sparsity over β = activation at face pixels, provided #observations (3000) >> #dimensions of cone ~ #resels of face (146.2).
We can threshold Beta-bar over brain voxels to P < 0.05 using the above.
The result will be a face image of isolated "local maxima" for each voxel.
It will tell you which brain voxels are activated, but not which face pixels.
Might be a huge computational task!
Interactions? Y ~ x1 + … + x91200 + x1x2 + z.
MS lesions and cortical thickness
Idea: MS lesions interrupt neuronal signals, causing thinning in downstream cortex.
Data: n = 425 mild MS patients.
[Figure: average cortical thickness (mm, about 1.5 to 5.5) against total lesion volume (0 to 80 cc); correlation = -0.568, T = -14.20 (423 df).]
Charil et al., NeuroImage (2007).
MS lesions and cortical thickness at all pairs of points
The relationship is dominated by total lesion volume and average cortical thickness, so remove these effects as follows:
CT = cortical thickness, smoothed 20 mm
ACT = average cortical thickness
LD = lesion density, smoothed 10 mm
TLV = total lesion volume
Find the partial correlation(LD, CT - ACT) removing TLV via the linear model
CT - ACT ~ 1 + TLV + LD
and test for LD; a sketch of this single-pair computation follows below.
Repeat for all voxels in 3D and all nodes in 2D: ~1 billion correlations, so thresholding is essential!
Look for high negative correlations …
Threshold: P = 0.05, c = 0.300, T = 6.48.
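A sketch of the per-pair computation, assuming across-subject vectors `ct_minus_act`, `tlv`, and `ld` of length n = 425 (hypothetical names):

```python
import numpy as np

def partial_corr_t(ct_minus_act, tlv, ld):
    """T statistic for LD in the model CT - ACT ~ 1 + TLV + LD at one
    (voxel, node) pair; equivalent to the partial correlation removing TLV."""
    n = len(ct_minus_act)
    X = np.column_stack([np.ones(n), tlv, ld])
    beta, res, *_ = np.linalg.lstsq(X, ct_minus_act, rcond=None)
    sigma2 = res[0] / (n - 3)                       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of beta-hat
    return beta[2] / np.sqrt(cov[2, 2])             # T for the LD coefficient
```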



Cluster extent rather than peak height (Friston, 1994)
• Choose a lower threshold, e.g. t = 3.11 (P = 0.001).
• Find the clusters, i.e. the connected components of the excursion set.
• Measure cluster extent by resels, $\mathcal{L}_D(\text{cluster})$.
[Figure: a 1D (D = 1) statistic Z against s, showing the threshold t, a cluster's extent, and its peak height.]
• Distribution of cluster extent: approximately $\mathcal{L}_D(\text{cluster}) \sim c\,\chi_k^{\alpha}$.
• Distribution of maximum cluster extent: Bonferroni on N = #clusters ~ E(EC).
• Peak height distribution: fit a quadratic to the peak (Cao and Worsley, Advances in Applied Probability, 1999).