WEB SUPPLEMENTARY MATERIAL FOR “A BAYESIAN
HIERARCHICAL SPATIAL POINT PROCESS MODEL FOR
MULTI-TYPE NEUROIMAGING META-ANALYSIS”
JIAN KANG, TIMOTHY D. JOHNSON, THOMAS E. NICHOLS, AND TOR D. WAGER
1. HPGRF Model
In this Appendix, we provide proofs of all the theorems presented in the paper for the HPGRF model, except Theorem 2; a proof of Theorem 2 is provided in Wolpert and Ickstadt (1998). We first introduce the following lemmas.
Lemma 1. If $Y \sim \mathrm{PP}(\mathcal{B}, \Lambda)$, then for any $A, B \subseteq \mathcal{B}$, we have
\[
\mathrm{Cov}\{N_Y(A), N_Y(B)\} = \mathrm{Var}\{N_Y(A \cap B)\} = \Lambda(A \cap B).
\]
Proof. Note that $N_Y(A) = N_Y(A \cap B) + N_Y(A \setminus B)$ and $N_Y(B) = N_Y(A \cap B) + N_Y(B \setminus A)$. By the independence property of the Poisson process over disjoint sets, we have that
\begin{align*}
\mathrm{Cov}\{N_Y(A), N_Y(B)\} &= \mathrm{Cov}\{N_Y(A \cap B) + N_Y(A \setminus B),\; N_Y(A \cap B) + N_Y(B \setminus A)\}\\
&= \mathrm{Cov}\{N_Y(A \cap B), N_Y(A \cap B)\} = \mathrm{Var}\{N_Y(A \cap B)\}.
\end{align*}
Lemma 2. Let $\Gamma(dx) \sim \mathrm{GRF}\{\alpha(dx), \beta\}$, and let $f_1(x)$ and $f_2(x)$ be measurable functions on $\mathcal{B}$. Then
\[
\mathrm{Cov}\left\{\int_{\mathcal{B}} f_1(x)\Gamma(dx),\; \int_{\mathcal{B}} f_2(x)\Gamma(dx)\right\} = \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x) f_2(x)\,\alpha(dx).
\]
Proof. We start with the case where $f_1(x)$ and $f_2(x)$ are simple functions on $\mathcal{B}$; that is, for $m = 1, \ldots, M$ and $n = 1, \ldots, N$ there exist numbers $a_m, b_n \in \mathbb{R}$ and disjoint sets $A_m$ and $B_n$ with $\mathcal{B} = \bigcup_{m=1}^M A_m = \bigcup_{n=1}^N B_n$ such that $f_1(x) = \sum_{m=1}^M a_m \delta_{A_m}(x)$ and $f_2(x) = \sum_{n=1}^N b_n \delta_{B_n}(x)$. Then $\int_{\mathcal{B}} f_1(x)\Gamma(dx) = \sum_{m=1}^M a_m \Gamma(A_m)$ and $\int_{\mathcal{B}} f_2(x)\Gamma(dx) = \sum_{n=1}^N b_n \Gamma(B_n)$.
Thus, using the gamma random field moments $E\{\Gamma(A)\} = \alpha(A)/\beta$ and $E\{\Gamma^2(A)\} = \alpha(A)/\beta^2 + \alpha(A)^2/\beta^2$, together with the independence of $\Gamma$ over disjoint sets,
\begin{align*}
E\left\{\int_{\mathcal{B}} f_1(x)\Gamma(dx) \times \int_{\mathcal{B}} f_2(x)\Gamma(dx)\right\}
&= E\left\{\sum_{m=1}^M a_m \Gamma(A_m) \times \sum_{n=1}^N b_n \Gamma(B_n)\right\}\\
&= \sum_{m=1}^M\sum_{n=1}^N a_m b_n E\{\Gamma^2(A_m \cap B_n)\}
+ \sum_{(m,n)\neq(m',n')} a_m b_{n'} E\{\Gamma(A_m \cap B_n)\}\,E\{\Gamma(A_{m'} \cap B_{n'})\}\\
&= \frac{1}{\beta^2}\sum_{m=1}^M\sum_{n=1}^N a_m b_n\, \alpha(A_m \cap B_n)
+ \frac{1}{\beta^2}\sum_{m=1}^M\sum_{n=1}^N\sum_{m'=1}^M\sum_{n'=1}^N a_m b_{n'}\, \alpha(A_m \cap B_n)\,\alpha(A_{m'} \cap B_{n'})\\
&= \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x) f_2(x)\,\alpha(dx)
+ \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)\,\alpha(dx)\int_{\mathcal{B}} f_2(x)\,\alpha(dx).
\end{align*}
Furthermore,
\begin{align*}
\mathrm{Cov}\left\{\int_{\mathcal{B}} f_1(x)\Gamma(dx),\; \int_{\mathcal{B}} f_2(x)\Gamma(dx)\right\}
&= E\left\{\int_{\mathcal{B}} f_1(x)\Gamma(dx) \times \int_{\mathcal{B}} f_2(x)\Gamma(dx)\right\}
- \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)\,\alpha(dx)\int_{\mathcal{B}} f_2(x)\,\alpha(dx)\\
&= \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x) f_2(x)\,\alpha(dx).
\end{align*}
For general measurable functions $f_1(x)$ and $f_2(x)$, a routine passage to the limit completes the proof.
Theorem 1. Within emotion type $j$ and for all $A, B \subseteq \mathcal{B}$,
\[
E\{N_{Y_j}(A) \mid \sigma_j^2, \tau, \alpha, \beta\} = \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,\alpha(dx),
\]
(1)
\[
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid \sigma_j^2, \tau, \alpha, \beta\}
= \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\,\alpha(dx)
+ \frac{1+\beta}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, K_{\sigma_j^2}(B, x)\,\alpha(dx).
\]
Between emotion types $j$ and $k$ ($j \neq k$),
(2)
\[
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}
= \frac{1}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, K_{\sigma_k^2}(B, x)\,\alpha(dx).
\]
Proof. First, we note that given $G_0$, $\tau$ and $\sigma_j^2$, the conditional expectation of $N_{Y_j}(A)$ is
\begin{align*}
E\{N_{Y_j}(A) \mid G_0, \tau, \sigma_j^2\} &= E_{G_j}\{E\{N_{Y_j}(A) \mid G_j\} \mid G_0, \tau, \sigma_j^2\}\\
&= \int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, E\{G_j(dx) \mid G_0, \tau\} = \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, G_0(dx).
\end{align*}
Also, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_j}(B)$ is
\begin{align*}
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\}
&= E_{G_j}\{\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_j\} \mid G_0, \tau, \sigma_j^2\}\\
&\quad + \mathrm{Cov}_{G_j}\{E\{N_{Y_j}(A) \mid G_j\},\, E\{N_{Y_j}(B) \mid G_j\} \mid G_0, \tau, \sigma_j^2\}\\
&= \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\, G_0(dx) + \frac{1}{\tau^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, K_{\sigma_j^2}(B, x)\, G_0(dx).
\end{align*}
Now, given $\sigma_j^2$, $\alpha$, $\beta$, within type $j$, for any $A \subseteq \mathcal{B}$,
\begin{align*}
E\{N_{Y_j}(A) \mid \sigma_j^2, \tau, \alpha, \beta\} &= E_{G_0}\{E\{N_{Y_j}(A) \mid G_0, \sigma_j^2, \tau\} \mid \sigma_j^2, \tau, \alpha, \beta\}\\
&= \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, E\{G_0(dx) \mid \alpha, \beta\} = \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,\alpha(dx).
\end{align*}
Within type $j$, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_j}(B)$ is
\begin{align*}
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid \sigma_j^2, \tau, \alpha, \beta\}
&= E_{G_0}\{\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\} \mid \sigma_j^2, \tau, \alpha, \beta\}\\
&\quad + \mathrm{Cov}_{G_0}\{E\{N_{Y_j}(A) \mid G_0, \tau, \sigma_j^2\},\, E\{N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\} \mid \tau, \sigma_j^2, \alpha, \beta\}\\
&= \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\,\alpha(dx) + \frac{1+\beta}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, K_{\sigma_j^2}(B, x)\,\alpha(dx).
\end{align*}
For $j \neq k$, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_k}(B)$ is
\begin{align*}
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}
&= E_{G_0}\{\mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid G_0, \sigma_j^2, \sigma_k^2, \tau\} \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}\\
&\quad + \mathrm{Cov}_{G_0}\{E\{N_{Y_j}(A) \mid G_0, \sigma_j^2, \tau\},\, E\{N_{Y_k}(B) \mid G_0, \sigma_k^2, \tau\} \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}\\
&= \frac{1}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\, K_{\sigma_k^2}(B, x)\,\alpha(dx).
\end{align*}
The following theorem states that our model can be approximated by the truncated
model to any desired level of accuracy.
Theorem 3. For $j = 1, \ldots, J$, for any $\epsilon > 0$ and for any measurable $A \subseteq \mathcal{B}$, there exists a natural number $M$ such that
\[
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \beta, \tau\} < \epsilon,
\]
where $\Lambda_j^M(A) = \sum_{m=1}^M \mu_{jm}\, K_{\sigma_j^2}(A, \theta_m)$ is the conditional expectation of $N_{Y_j}(A)$ in the truncated model (3).
Proof. For any $M > 0$, the conditional expected truncation error given $\sigma_j^2$, $\tau$, $\beta$, and $\{\nu_m, \theta_m\}_{m=1}^M$ is
\begin{align*}
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \sigma_j^2, \tau, \{\nu_m, \theta_m\}_{m=1}^M\}
&= \frac{1}{\tau}\sum_{m=M+1}^{\infty} E\{\nu_m \mid \{\nu_m\}_{m=1}^M\}\, K_{\sigma_j^2}(A, \theta_m)\\
&\le \frac{1}{\tau}\sum_{m=M+1}^{\infty} E\{\nu_m \mid \{\nu_m\}_{m=1}^M, \beta\}
= \frac{1}{\tau}\sum_{m=1}^{\infty} E\{\nu_{m+M} \mid \{\nu_m\}_{m=1}^M, \beta\}\\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty} E\{E_1^{-1}(\zeta_{m+M}/\alpha(\mathcal{B})) \mid \{\nu_m\}_{m=1}^M\}
\quad (\text{let } \zeta'_m = \zeta_{m+M} - \zeta_M)\\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty} E\{E_1^{-1}((\zeta'_m + \zeta_M)/\alpha(\mathcal{B})) \mid \{\nu_m\}_{m=1}^M\}
\quad (\text{note that } \zeta'_m \sim \mathrm{Gamma}(m, 1))\\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty}\int_0^{\infty} E_1^{-1}((s + \zeta_M)/\alpha(\mathcal{B}))\,\frac{s^{m-1}}{\Gamma(m)}\exp(-s)\,ds\\
&= \frac{1 - \exp\{-E_1^{-1}(\zeta_M/\alpha(\mathcal{B}))\}}{\tau\beta}
\le \frac{1}{(\exp(\zeta_M/\alpha(\mathcal{B})) - 1)\,\tau\beta}.
\end{align*}
Taking the expectation with respect to $\zeta_M$ on both sides of the above inequality results in
\begin{align*}
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \tau, \beta\}
&\le \frac{1}{\tau\beta\,\Gamma(M)}\int_0^{\infty} \frac{s^{M-1}\exp(-s)}{\exp(s/\alpha(\mathcal{B})) - 1}\,ds\\
&\le \frac{\alpha(\mathcal{B})}{\tau\beta\,\Gamma(M)}\int_0^{\infty} s^{M-2}\exp(-s)\,ds = \frac{\alpha(\mathcal{B})}{\tau\beta\,(M-1)}.
\end{align*}
When $\alpha(\mathcal{B}) = 1$, we have the more accurate upper bound $E[\Lambda_j(A) - \Lambda_j^M(A) \mid \tau, \beta] \le (\zeta(M) - 1)/(\tau\beta)$, where $\zeta(M) = \sum_{k=1}^{\infty} k^{-M}$. Therefore, for any $\epsilon > 0$, we may take $M = \lfloor \alpha(\mathcal{B})/(\epsilon\tau\beta)\rfloor + 1$, where $\lfloor x\rfloor$ is the largest integer less than or equal to $x$.
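For concreteness, the bound can be turned into a simple rule for choosing the truncation level in practice. The sketch below is our own illustration (the helper name and example values are not from the paper); it finds the smallest integer $M$ with $\alpha(\mathcal{B})/(\tau\beta(M-1)) \le \epsilon$.

```python
import math

def truncation_level(eps, tau, beta, alpha_B):
    """Smallest M for which the Theorem 3 bound alpha(B)/(tau*beta*(M-1))
    is at most eps (illustrative helper, not from the paper)."""
    # alpha(B) / (tau * beta * (M - 1)) <= eps  <=>  M >= alpha(B)/(eps*tau*beta) + 1
    return math.ceil(alpha_B / (eps * tau * beta)) + 1

# Example with made-up values: eps = 1e-3, tau = beta = 1, alpha(B) = 1
print(truncation_level(1e-3, 1.0, 1.0, 1.0))  # 1001
```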
2. Posterior Computation
In this section, we provide the details of the posterior computation algorithm for the truncated HPGRF model
\[
(Y_j, X_j) \mid \{(\mu_{j,m}, \theta_m)\}_{m=1}^M, \sigma_j^2 \;\sim\; \mathrm{PP}\left\{\mathcal{B} \times \mathcal{B},\; K_{\sigma_j^2}(dy, x)\sum_{m=1}^M \mu_{j,m}\,\delta_{\theta_m}(dx)\right\},
\]
(3)
\[
[\mu_{j,m} \mid \nu_m, \tau] \overset{\mathrm{iid}}{\sim} \mathrm{Gamma}(\nu_m, \tau), \qquad
\{(\theta_m, \nu_m)\}_{m=1}^M \sim \mathrm{InvLévy}\{\alpha(dx), \beta, M\}.
\]
2.1. Posterior Computation. The truncated model (3) only involves a fixed number of parameters, allowing computation of the posterior.
The target distribution. Let $(y_{i,j}, x_{i,j})$, for $i = 1, 2, \ldots, n_j$, be multiple independent realizations (e.g., emotion studies) of the Cox process $(Y_j, X_j)$ in model (3), where $n_j$ is the number of realizations of $(Y_j, X_j)$. For each $i$ and $j$, write $(y_{i,j}, x_{i,j}) = \{(y_{i,j,l}, x_{i,j,l})\}_{l=1}^{m_{i,j}}$, where $(y_{i,j,l}, x_{i,j,l})$ is an observed point in $(y_{i,j}, x_{i,j})$ indexed by $l$, for $l = 1, \ldots, m_{i,j}$, and $m_{i,j}$ is the observed number of points in $(y_{i,j}, x_{i,j})$. The joint density of $\{\{x_{i,j}\}_{i=1}^{n_j}\}_{j=1}^J$, $\{\mu_j\}_{j=1}^J$, $\{\sigma_j^2\}_{j=1}^J$, $\nu$, $\theta$, $\tau$ and $\beta$ given $\{\{y_{i,j}\}_{i=1}^{n_j}\}_{j=1}^J$ is
(4)
\begin{align*}
&\prod_{j=1}^J\left[\prod_{i=1}^{n_j}\pi(y_{i,j}, x_{i,j} \mid \mu_j, \theta, \sigma_j^2)\,\pi(\sigma_j^2)\,\pi(\mu_j \mid \nu, \tau)\right]\pi(\tau)\,\pi(\nu, \theta \mid \beta)\,\pi(\beta)\\
&\quad\propto \prod_{j=1}^J\left[\exp\left\{-\sum_{m=1}^M n_j K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\right\}
\prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}} k_{\sigma_j^2}(y_{i,j,l}, x_{i,j,l})\sum_{m=1}^M \mu_{j,m}\, I_{\theta_m}(x_{i,j,l})\right]\\
&\qquad\times \prod_{j=1}^J \pi(\sigma_j^2)\prod_{m=1}^M\left\{\frac{\tau^{\nu_m}\,\mu_{j,m}^{\nu_m - 1}}{\Gamma(\nu_m)}\exp\{-\tau\mu_{j,m}\}\right\}
\times \pi(\tau)\,\exp\{-E_1(\beta\nu_M)/\ell(\mathcal{B})\}\,\pi(\beta)\prod_{m=1}^M\left[\nu_m^{-1}\exp\{-\nu_m\beta\}\right],
\end{align*}
where $\pi(y_{i,j}, x_{i,j} \mid \mu_j, \theta, \sigma_j^2)$ is the density of $(Y_j, X_j)$ with respect to a unit rate Poisson process (Møller and Waagepetersen 2004). The densities of the other parameters are all with respect to a product of Lebesgue measures. In this article, we assume $\alpha(dx) = \ell(dx)$ (i.e., Lebesgue measure), which is non-atomic; thus the $\theta_m$, for $m = 1, 2, \ldots, M$, are distinct with probability one. Also, $I_x(y)$ is the indicator function with $I_x(y) = 1$ if $x = y$ and $I_x(y) = 0$ otherwise. We now summarize the key steps in the posterior simulation.
Sampling $x_j$: From (4), it is straightforward to obtain the conditional distribution of $x_{i,j,l}$ given all other parameters:
\[
\Pr(x_{i,j,l} = \theta_m \mid \cdot) \propto \mu_{j,m}\, k_{\sigma_j^2}(y_{i,j,l}, \theta_m).
\]
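As an illustration only (not code from the paper), this allocation step can be sketched as follows; it assumes the isotropic Gaussian kernel introduced in Section 2.2, and the array names and helper functions are our own.

```python
import numpy as np

def gaussian_kernel(y, theta, sigma2):
    """Isotropic Gaussian density k_{sigma^2}(y, theta_m) in d dimensions, for all atoms."""
    d = theta.shape[-1]
    sq = np.sum((y - theta) ** 2, axis=-1)
    return (2 * np.pi * sigma2) ** (-d / 2) * np.exp(-sq / (2 * sigma2))

def sample_x_allocation(y_ijl, mu_j, theta, sigma2_j, rng):
    """Draw x_{i,j,l} = theta_m with probability proportional to mu_{j,m} k_{sigma_j^2}(y_{i,j,l}, theta_m)."""
    w = mu_j * gaussian_kernel(y_ijl, theta, sigma2_j)   # unnormalized weights, length M
    w /= w.sum()
    return rng.choice(len(mu_j), p=w)                    # index of the selected atom theta_m

# Example with made-up sizes: M = 50 atoms in 3-D
rng = np.random.default_rng(0)
theta = rng.uniform(-1, 1, size=(50, 3))
mu_j = rng.gamma(1.0, 1.0, size=50)
m = sample_x_allocation(rng.normal(size=3), mu_j, theta, 1.0, rng)
```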
Sampling $\theta$: The full conditional density of $\theta$ is given by
(5)
\[
\pi(\theta \mid \cdot) \propto \exp\left\{-\sum_{m=1}^M\sum_{j=1}^J K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\, n_j\right\}
\prod_{j=1}^J\prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\left[\sum_{m=1}^M \mu_{j,m}\, I_{x_{i,j,l}}(\theta_m)\right].
\]
This implies that $\sum_{m=1}^M I_{x_{i,j,l}}(\theta_m) > 0$ for all $i$, $j$ and $l$. Let $\widetilde\theta_1, \ldots, \widetilde\theta_{\widetilde M}$ be the $\widetilde M$ distinct points in $\{\{\{x_{i,j,l}\}_{l=1}^{m_{i,j}}\}_{i=1}^{n_j}\}_{j=1}^J$. Due to the symmetry of $\{\mu_{1,m}, \ldots, \mu_{J,m}, \theta_m\}_{m=1}^M$ in (5), and noting that $\theta_1, \ldots, \theta_M$ are distinct points, each $\widetilde\theta_{m'}$ is equal to one and only one $\theta_m$. Thus, to sample $\theta$, we first draw a random permutation of $\{1, \ldots, M\}$, denoted by $\{p_1, \ldots, p_M\}$. Then for $1 \le m \le \widetilde M$, let $\theta_{p_m} = \widetilde\theta_m$. For $m > \widetilde M$,
draw $\theta_{p_m}$ according to the following density,
\[
\pi(\theta_{p_m} \mid \cdot) \propto \exp\left\{-\sum_{j=1}^J K_{\sigma_j^2}(\mathcal{B}, \theta_{p_m})\,\mu_{j,p_m}\, n_j\right\}.
\]
Note that this is a discrete distribution with normalizing constant equal to the sum of the right-hand side over $m = 1, \ldots, M$.
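A sketch of the draw for $m > \widetilde M$ over a finite candidate grid is given below; the grid, the precomputed kernel masses `K_B`, and the function name are our illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def sample_new_atom(K_B, mu_pm, n_j, rng):
    """Draw one theta_{p_m} (m > M-tilde) from the discrete density proportional to
    exp{- sum_j n_j K_{sigma_j^2}(B, theta) mu_{j,p_m}} over a grid of candidate points.

    K_B   : (J, G) array, K_{sigma_j^2}(B, theta_g) for each type j and grid point g
    mu_pm : (J,) array, mu_{j, p_m} for each type j
    n_j   : (J,) array, number of studies per type
    """
    log_w = -np.einsum('j,jg->g', n_j * mu_pm, K_B)   # log unnormalized weights over the grid
    w = np.exp(log_w - log_w.max())                   # stabilize before normalizing
    w /= w.sum()
    return rng.choice(K_B.shape[1], p=w)              # index of the selected grid point
```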
Sampling $\mu_j$:
Proposition 1. The full conditional distribution of $\mu_{j,m}$, for $j = 1, \ldots, J$ and $m = 1, \ldots, M$, is given by
\[
[\mu_{j,m} \mid \cdot] \sim \mathrm{Gamma}\left(\nu_m + \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}} I_{\theta_m}(x_{i,j,l}),\; n_j K_{\sigma_j^2}(\mathcal{B}, \theta_m) + \tau\right).
\]
Proof. Write $a_{i,j,l} = \sum_{m=1}^M m\, I_{\theta_m}(x_{i,j,l})$; then we have $I_m(a_{i,j,l}) = I_{\theta_m}(x_{i,j,l})$. Since the $\theta_m$ are distinct, $\sum_{m=1}^M I_{\theta_m}(x_{i,j,l}) = 1 = \sum_{m=1}^M I_m(a_{i,j,l})$. Thus,
(6)
\[
\sum_{m=1}^M \mu_{j,m}\, I_{\theta_m}(x_{i,j,l}) = \sum_{m=1}^M \mu_{j,m}\, I_m(a_{i,j,l}) = \mu_{j, a_{i,j,l}}.
\]
Furthermore,
(7)
\[
\prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}} \mu_{j, a_{i,j,l}}
= \prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}} \mu_{j, a_{i,j,l}}^{\sum_{m=1}^M I_m(a_{i,j,l})}
= \prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\prod_{m=1}^M \mu_{j, a_{i,j,l}}^{I_m(a_{i,j,l})}
= \prod_{m=1}^M \mu_{j,m}^{b_{j,m}},
\]
where $b_{j,m} = \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}} I_{\theta_m}(x_{i,j,l})$. From the joint posterior distribution of all the parameters, (6) and (7), we have $\pi(\mu_{j,m} \mid \cdot) \propto \mu_{j,m}^{\nu_m + b_{j,m} - 1}\exp\{-(n_j K_{\sigma_j^2}(\mathcal{B}, \theta_m) + \tau)\mu_{j,m}\}$.
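A minimal sketch of the corresponding Gibbs draw is given below (our own illustration; note that numpy's gamma sampler is parameterized by shape and scale, so the rate in Proposition 1 is inverted).

```python
import numpy as np

def sample_mu_j(nu, b_jm, K_B_theta, n_j, tau, rng):
    """Gibbs update of mu_{j,.}: Gamma(nu_m + b_{j,m}, n_j K_{sigma_j^2}(B, theta_m) + tau).

    nu        : (M,) shape parameters nu_m
    b_jm      : (M,) counts of foci in type j currently allocated to atom m
    K_B_theta : (M,) kernel masses K_{sigma_j^2}(B, theta_m)
    """
    shape = nu + b_jm
    rate = n_j * K_B_theta + tau
    return rng.gamma(shape, 1.0 / rate)   # numpy uses scale = 1 / rate

# Example with made-up inputs: M = 10 atoms, n_j = 20 studies
rng = np.random.default_rng(1)
mu_j = sample_mu_j(np.ones(10), rng.poisson(2, 10), np.full(10, 0.9), 20, 1.0, rng)
```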
Sampling $\nu$: The full conditional distribution of $\nu_m$ is
\[
\pi(\nu_m \mid \cdot) \propto
\begin{cases}
\dfrac{c_m^{\nu_m}}{\Gamma(\nu_m)}\,\nu_m^{-1}, & m = 1, \ldots, M-1,\\[1.5ex]
\exp\{-E_1(\beta\nu_M)/\ell(\mathcal{B})\}\,\dfrac{c_M^{\nu_M}}{\Gamma(\nu_M)}\,\nu_M^{-1}, & m = M,
\end{cases}
\]
where $c_m = \tau^J \prod_{j=1}^J \mu_{j,m}\, e^{-\beta}$. We use a symmetric random walk to update $\nu_m$: sample $\nu^* \sim N(\nu_m, \sigma_\nu^2)$ and accept $\nu^*$ with probability $\min\{1, \pi(\nu^* \mid \cdot)/\pi(\nu_m \mid \cdot)\}$.
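A sketch of this random-walk Metropolis step for a single $\nu_m$, written on the log scale for numerical stability, is given below; the helper names are ours, and `scipy.special` supplies `gammaln` and the exponential integral `exp1`.

```python
import numpy as np
from scipy.special import gammaln, exp1

def log_target_nu(nu_m, c_m, is_last, beta, ell_B):
    """log pi(nu_m | .) up to a constant; the E_1 term only appears for m = M."""
    if nu_m <= 0:
        return -np.inf
    lp = nu_m * np.log(c_m) - gammaln(nu_m) - np.log(nu_m)
    if is_last:
        lp -= exp1(beta * nu_m) / ell_B
    return lp

def rw_update_nu(nu_m, c_m, is_last, beta, ell_B, step, rng):
    """One symmetric random-walk Metropolis step for nu_m."""
    prop = rng.normal(nu_m, step)
    log_ratio = log_target_nu(prop, c_m, is_last, beta, ell_B) - \
                log_target_nu(nu_m, c_m, is_last, beta, ell_B)
    return prop if np.log(rng.uniform()) < log_ratio else nu_m
```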
2.2. Sampling Hyperparameters. We update $\sigma_j^2$ (for $j = 1, \ldots, J$), $\tau$ and $\beta$ using Metropolis-within-Gibbs sampling (Smith and Roberts 1993). In this article, we choose $k_{\sigma_j^2}(y, x) = (2\pi\sigma_j^2)^{-d/2}\exp\{-\|y - x\|^2/(2\sigma_j^2)\}$ (an isotropic Gaussian density) and assume, a priori, $\sigma_j^{-2} \sim \mathrm{Uniform}[a_\sigma, b_\sigma]$, $\tau \sim \mathrm{Gamma}(a_\tau, b_\tau)$ and $\beta \sim \mathrm{Gamma}(a_\beta, b_\beta)$ (hyperprior parameters will be specified in the next section).
The full conditional of $\sigma_j^2$ is
\[
\pi(\sigma_j^2 \mid \cdot) \propto \exp\left\{-\sum_{m=1}^M K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\, n_j - \frac{S_{\cdot,j}^2}{2\sigma_j^2} - \frac{m_{\cdot,j}\, d}{2}\log(\sigma_j^2)\right\} I_{[a_\sigma, b_\sigma]}(\sigma_j^{-2}),
\]
where $S_{\cdot,j}^2 = \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}}\|y_{i,j,l} - x_{i,j,l}\|^2$ and $m_{\cdot,j} = \sum_{i=1}^{n_j} m_{i,j}$. Thus, to update $\sigma_j^2$, we use a random walk: first draw $\sigma_j^{2*} \sim N(\sigma_j^2, \theta_\sigma^2)$; if $\sigma_j^{-2*} \in [a_\sigma, b_\sigma]$, then set $\sigma_j^2 = \sigma_j^{2*}$ with probability $\min\{r(\sigma), 1\}$, where
\[
r(\sigma) = \exp\left\{\sum_{m=1}^M\left[K_{\sigma_j^2}(\mathcal{B}, \theta_m) - K_{\sigma_j^{2*}}(\mathcal{B}, \theta_m)\right]\mu_{j,m}\, n_j + \frac{S_{\cdot,j}^2}{2}\left(\frac{1}{\sigma_j^2} - \frac{1}{\sigma_j^{2*}}\right)\right\}
\left(\frac{\sigma_j^2}{\sigma_j^{2*}}\right)^{\frac{1}{2} m_{\cdot,j} d}.
\]
The full conditional of $\tau$ is
\[
\pi(\tau \mid \cdot) \propto \tau^{J\sum_{m=1}^M \nu_m + a_\tau - 1}\exp\left\{-\left(b_\tau + \sum_{j=1}^J\sum_{m=1}^M \mu_{j,m}\right)\tau\right\},
\]
which implies that we can update $\tau$ by drawing
\[
[\tau \mid \cdot] \sim \mathrm{Gamma}\left(J\sum_{m=1}^M \nu_m + a_\tau,\; \sum_{j=1}^J\sum_{m=1}^M \mu_{j,m} + b_\tau\right).
\]
The full conditional of $\beta$ is
\[
\pi(\beta \mid \cdot) \propto \beta^{a_\beta - 1}\exp\left\{-\left(b_\beta + \sum_{m=1}^M \nu_m\right)\beta - E_1(\beta\nu_M)/\ell(\mathcal{B})\right\}.
\]
We update $\beta$ using a symmetric random walk by sampling $\beta^* \sim N(\beta, \sigma_\beta^2)$ and accepting it with probability $\min\{1, \pi(\beta^* \mid \cdot)/\pi(\beta \mid \cdot)\}$.
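The two remaining hyperparameter updates can be sketched as follows (our own illustration, based on the full conditionals displayed above; variable names are ours): a conjugate Gamma draw for $\tau$ and a symmetric random-walk Metropolis step for $\beta$.

```python
import numpy as np
from scipy.special import exp1

def update_tau(nu, mu, a_tau, b_tau, rng):
    """Conjugate draw: tau | . ~ Gamma(J * sum_m nu_m + a_tau, sum_{j,m} mu_{j,m} + b_tau)."""
    J = mu.shape[0]                        # mu has shape (J, M)
    shape = J * nu.sum() + a_tau
    rate = mu.sum() + b_tau
    return rng.gamma(shape, 1.0 / rate)    # numpy gamma uses scale = 1 / rate

def log_target_beta(beta, nu, a_beta, b_beta, ell_B):
    """log pi(beta | .) up to a constant."""
    if beta <= 0:
        return -np.inf
    return (a_beta - 1) * np.log(beta) - (b_beta + nu.sum()) * beta - exp1(beta * nu[-1]) / ell_B

def update_beta(beta, nu, a_beta, b_beta, ell_B, step, rng):
    """One symmetric random-walk Metropolis step for beta."""
    prop = rng.normal(beta, step)
    log_ratio = log_target_beta(prop, nu, a_beta, b_beta, ell_B) - \
                log_target_beta(beta, nu, a_beta, b_beta, ell_B)
    return prop if np.log(rng.uniform()) < log_ratio else beta
```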
3. Sensitivity Analysis
In this section, we conduct simulation studies to examine how the posterior inference on the intensity function varies with different prior specifications of $\sigma_j^2$, $\beta$ and $\tau$, and with the truncation level $M$. Specifically, we consider the nine scenarios in Table 1. We simulate the posterior distribution with 20,000 iterations after a burn-in of 2,000 iterations. Table 2 shows summary statistics of the posterior mean intensity estimates over the whole brain for the nine scenarios. These summary statistics are quite similar across scenarios, indicating that the posterior results are not very sensitive to the prior specification.
Table 1. Prior specifications and truncation levels of the nine scenarios for the sensitivity analysis

Scenario   σ_j^{-2}       β              τ              M
1          U(0, 10)       G(0.1, 0.1)    G(0.1, 0.1)    10,000
2          U(0, 10)       G(0.2, 0.2)    G(0.2, 0.2)    10,000
3          U(0, 10)       G(0.1, 0.1)    G(0.1, 0.1)     5,000
4          U(0, 10)       G(0.2, 0.2)    G(0.2, 0.2)     5,000
5          U(0, 10)       G(2.0, 2.0)    G(2.0, 2.0)    10,000
6          U(0, 10)       G(2.0, 2.0)    G(2.0, 2.0)    12,500
7          G(2.0, 2.0)    G(2.0, 2.0)    G(2.0, 2.0)    10,000
8          G(1.0, 1.0)    G(2.0, 2.0)    G(2.0, 2.0)    10,000
9          G(3.0, 3.0)    G(2.0, 2.0)    G(2.0, 2.0)    10,000
Table 2. Summary statistics of the posterior mean intensity estimates over the whole brain for the nine scenarios

Emotion     Stat   Scen. 1  Scen. 2  Scen. 3  Scen. 4  Scen. 5  Scen. 6  Scen. 7  Scen. 8  Scen. 9
Sad         Min.   9.1e-18  2.8e-17  3.6e-20  1.2e-17  3.8e-20  4.0e-16  1.5e-16  7.4e-17  3.0e-17
            Med.   1.5e-07  1.4e-07  1.2e-07  1.6e-07  1.3e-07  1.4e-07  1.5e-07  1.7e-07  1.8e-07
            Max.   1.7e-03  1.6e-03  1.6e-03  1.6e-03  1.7e-03  1.7e-03  1.6e-03  1.6e-03  1.6e-03
Happy       Min.   1.7e-17  1.6e-17  3.9e-23  5.8e-19  6.5e-24  6.7e-17  3.8e-18  1.4e-17  4.7e-18
            Med.   3.1e-08  3.2e-08  1.2e-08  1.7e-08  8.0e-09  3.4e-08  2.2e-08  2.0e-08  2.0e-08
            Max.   1.5e-03  1.4e-03  1.6e-03  1.6e-03  1.5e-03  1.6e-03  1.6e-03  1.5e-03  1.5e-03
Anger       Min.   6.2e-19  1.7e-17  3.9e-22  1.6e-18  6.5e-24  3.9e-17  1.3e-18  7.3e-19  6.4e-19
            Med.   2.8e-08  3.2e-08  2.5e-08  2.7e-08  2.4e-08  3.3e-08  2.0e-08  3.4e-08  2.1e-08
            Max.   1.8e-03  1.9e-03  1.6e-03  1.8e-03  1.7e-03  1.9e-03  1.8e-03  1.7e-03  1.9e-03
Fear        Min.   7.8e-21  1.6e-17  2.2e-21  3.1e-17  1.7e-22  2.3e-17  5.2e-18  1.2e-17  3.5e-18
            Med.   6.4e-08  8.0e-08  6.1e-08  6.2e-08  6.3e-08  7.8e-08  6.9e-08  9.0e-08  6.2e-08
            Max.   1.5e-03  1.5e-03  1.5e-03  1.5e-03  1.5e-03  1.4e-03  1.4e-03  1.5e-03  1.4e-03
Disgust     Min.   3.7e-19  2.9e-19  4.1e-22  1.2e-18  2.1e-22  3.0e-17  2.6e-18  2.9e-18  3.4e-19
            Med.   5.6e-08  5.3e-08  5.8e-08  4.8e-08  4.2e-08  4.2e-08  5.2e-08  5.6e-08  5.9e-08
            Max.   2.0e-03  2.0e-03  2.0e-03  2.0e-03  2.1e-03  2.1e-03  2.1e-03  2.0e-03  2.1e-03
Pop. Mean   Min.   8.3e-07  6.8e-07  7.7e-08  5.9e-07  4.8e-08  9.0e-07  6.8e-07  6.0e-07  5.8e-07
            Med.   1.6e-05  1.5e-05  1.1e-05  1.4e-05  1.1e-05  1.4e-05  1.4e-05  1.2e-05  1.3e-05
            Max.   3.2e-04  2.8e-04  4.6e-04  3.0e-04  6.7e-04  2.2e-04  2.8e-04  2.6e-04  2.5e-04
4. Bayesian Spatial Point Process Classifier
In this section, we present the importance sampling approach to sampling $T_{n+1}$. We have the following proposition:
Proposition 2. The posterior predictive distribution of $T_{n+1}$ is given by
(8)
\[
\Pr[T_{n+1} = j \mid x_{n+1}, \mathcal{D}_n] =
\frac{p_j\int \pi(x_{n+1} \mid T_{n+1} = j, \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}
{\sum_{j'=1}^J p_{j'}\int \pi(x_{n+1} \mid T_{n+1} = j', \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta},
\]
where
\[
\pi(x_{n+1} \mid T_{n+1} = j, \Theta) = \exp\left\{|\mathcal{B}| - \int_{\mathcal{B}} \lambda_j(x \mid \Theta)\,dx\right\}\prod_{x \in x_{n+1}}\lambda_j(x \mid \Theta).
\]
Proof. First, we notice that
\begin{align*}
\pi(\Theta \mid x_{n+1}, \mathcal{D}_n)
&= \frac{\pi(\Theta \mid x_{n+1}, \mathcal{D}_n)}{\pi(\Theta \mid \mathcal{D}_n)}\,\pi(\Theta \mid \mathcal{D}_n)
= \frac{\pi(x_{n+1}, \mathcal{D}_n \mid \Theta)\,\pi(\Theta)}{\pi(x_{n+1}, \mathcal{D}_n)}\cdot\frac{\pi(\mathcal{D}_n)}{\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}\cdot\pi(\Theta \mid \mathcal{D}_n)\\
&= \frac{\pi(x_{n+1} \mid \Theta)\,\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}{\pi(x_{n+1}, \mathcal{D}_n)}\cdot\frac{\pi(\mathcal{D}_n)}{\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}\cdot\pi(\Theta \mid \mathcal{D}_n)
= \frac{\pi(x_{n+1} \mid \Theta)}{\pi(x_{n+1} \mid \mathcal{D}_n)}\,\pi(\Theta \mid \mathcal{D}_n),
\end{align*}
so that
(9)
\[
\pi(\Theta \mid x_{n+1}, \mathcal{D}_n) = \frac{\pi(x_{n+1} \mid \Theta)\,\pi(\Theta \mid \mathcal{D}_n)}{\int \pi(x_{n+1} \mid \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}.
\]
Thus, the probability becomes
(10)
\[
\Pr[T_{n+1} = j \mid x_{n+1}, \mathcal{D}_n] =
\frac{\int p_j\,\pi(x_{n+1} \mid T_{n+1} = j, \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}
{\int \sum_{j'=1}^J p_{j'}\,\pi(x_{n+1} \mid T_{n+1} = j', \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}.
\]
This proposition leads to the following algorithm for estimating the posterior predictive probability used in reverse inference.
Reverse inference algorithm.
• Input: The observed data $\mathcal{D}_n$, a foci pattern $x_{n+1}$ reported from a new study, and the total number of simulations $K$.
• Step 1: Run a Bayesian spatial point process model to obtain the posterior draws $\Theta^{(k)} \sim \pi(\Theta \mid \mathcal{D}_n)$, for $k = 1, \ldots, K$.
• Step 2: Compute
\[
\pi_j^{(k)} = \exp\left\{-\int_{\mathcal{B}} \lambda_j(x \mid \Theta^{(k)})\,dx\right\}\prod_{x \in x_{n+1}}\lambda_j(x \mid \Theta^{(k)}).
\]
• Output: The posterior predictive probability is given by
(11)
\[
\widehat{\Pr}[T_{n+1} = j \mid x_{n+1}, \mathcal{D}_n] = \frac{p_j\sum_{k=1}^K \pi_j^{(k)}}{\sum_{j'=1}^J p_{j'}\sum_{k=1}^K \pi_{j'}^{(k)}}.
\]
The prediction of $T_{n+1}$ is given by
(12)
\[
\widehat{T}_{n+1} = \arg\max_j\left(p_j\sum_{k=1}^K \pi_j^{(k)}\right).
\]
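A schematic Monte Carlo implementation of this algorithm is sketched below (our own illustration, not the authors' code). It works on the log scale to avoid underflow and assumes the per-draw summaries `log_lam` (the summed log intensities over the foci in $x_{n+1}$) and `Lam_int` (the integrals $\int_{\mathcal{B}}\lambda_j(x \mid \Theta^{(k)})dx$, e.g. computed on a voxel grid) have already been obtained from the posterior output; the common factor $\exp\{|\mathcal{B}|\}$ cancels in the ratio and is omitted.

```python
import numpy as np
from scipy.special import logsumexp

def reverse_inference(log_lam, Lam_int, prior_p):
    """Estimate Pr[T_{n+1} = j | x_{n+1}, D_n] from K posterior draws.

    log_lam : (K, J) array, sum_{x in x_{n+1}} log lambda_j(x | Theta^(k))
    Lam_int : (K, J) array, integral of lambda_j(. | Theta^(k)) over the brain
    prior_p : (J,) prior type probabilities p_j
    """
    log_pi = log_lam - Lam_int                                  # log pi_j^(k), up to exp{|B|}
    log_score = np.log(prior_p) + logsumexp(log_pi, axis=0)     # log{ p_j * sum_k pi_j^(k) }
    post = np.exp(log_score - logsumexp(log_score))             # normalize over types j
    return post, int(np.argmax(post))                           # probabilities and predicted type

# Example with made-up posterior summaries: K = 1000 draws, J = 5 emotion types
rng = np.random.default_rng(2)
post, t_hat = reverse_inference(rng.normal(-50, 5, (1000, 5)),
                                rng.normal(30, 1, (1000, 5)),
                                np.full(5, 0.2))
```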
To evaluate the performance of our method, we are interested in computing the leave-one-out cross-validation (LOOCV) classification rates. More specifically, we use the data $\mathcal{D}_{-i} = \{(x_l, t_l)\}_{l \neq i}$ to make a prediction of $T_i$, denoted by $\widehat T_i$, and focus on the $J \times J$ LOOCV confusion matrix $C = \{c_{jj'}\}$, defined by
(13)
\[
c_{jj'} = \frac{\sum_{i=1}^n I_j(t_i)\, I_{j'}(\widehat T_i)}{\sum_{i=1}^n I_j(t_i)},
\]
where $I_a(b)$ is an indicator function: $I_a(b) = 1$ if $a = b$ and $I_a(b) = 0$ otherwise. Then the overall and the average correct classification rates are respectively given by
(14)
\[
c_o = \frac{1}{n}\sum_{i=1}^n I_{t_i}(\widehat T_i), \qquad\text{and}\qquad c_a = \frac{1}{J}\sum_{j=1}^J c_{jj}.
\]
In order to obtain $\widehat T_i$, we compute the LOOCV predictive probabilities. For $i = 1, \ldots, n$,
(15)
\[
\Pr[T_i = j \mid x_i, \mathcal{D}_{-i}] = \int \Pr[T_i = j \mid x_i, \mathcal{D}_{-i}, \Theta]\,\pi(\Theta \mid x_i, \mathcal{D}_{-i})\,d\Theta,
\]
which can be estimated via Monte Carlo simulation. However, it is not straightforward, and would be very inefficient, to draw $\Theta$ from $\pi(\Theta \mid x_i, \mathcal{D}_{-i})$ separately for each $i$. Thus, to avoid running multiple posterior simulations, we consider another representation of (15) in the following proposition.
Proposition 3. The LOOCV predictive probabilities of $T_i$, for $i = 1, \ldots, n$, are given by
(16)
\[
\Pr[T_i = j \mid x_i, \mathcal{D}_{-i}] = \frac{p_j\, Q_{j t_i}}{\sum_{j'=1}^J p_{j'}\, Q_{j' t_i}},
\]
where
\[
Q_{jj'} = \int \frac{\pi(x_i \mid T_i = j, \Theta)}{\pi(x_i \mid T_i = j', \Theta)}\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta.
\]
Proof. The LOOCV posterior predictive probability is
\begin{align*}
\Pr[T_i = j \mid x_i, \mathcal{D}_{-i}] &= \int \Pr[T_i = j \mid x_i, \Theta]\,\pi(\Theta \mid x_i, \mathcal{D}_{-i})\,d\Theta\\
&= \int \Pr[T_i = j \mid x_i, \Theta]\,\frac{\pi(\Theta \mid x_i, \mathcal{D}_{-i})}{\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i})}\,\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i})\,d\Theta.
\end{align*}
Note that
\[
\frac{\pi(\Theta \mid x_i, \mathcal{D}_{-i})}{\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i})}
= \frac{\pi(\Theta, x_i, \mathcal{D}_{-i})\,\pi(x_i, T_i = t_i, \mathcal{D}_{-i})}{\pi(x_i, \mathcal{D}_{-i})\,\pi(\Theta, x_i, T_i = t_i, \mathcal{D}_{-i})}
= \frac{\Pr[T_i = t_i \mid x_i, \mathcal{D}_{-i}]}{\Pr[T_i = t_i \mid x_i, \Theta]}.
\]
This implies that
\[
\frac{\Pr[T_i = j \mid x_i, \mathcal{D}_{-i}]}{\Pr[T_i = t_i \mid x_i, \mathcal{D}_{-i}]}
= \int \frac{\Pr[T_i = j \mid x_i, \Theta]}{\Pr[T_i = t_i \mid x_i, \Theta]}\,\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i})\,d\Theta
= \int \frac{\pi(x_i \mid T_i = j, \Theta)\,p_j}{\pi(x_i \mid T_i = t_i, \Theta)\,p_{t_i}}\,\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i})\,d\Theta
= \frac{p_j}{p_{t_i}}\, Q_{j t_i},
\]
since $\pi(\Theta \mid x_i, T_i = t_i, \mathcal{D}_{-i}) = \pi(\Theta \mid \mathcal{D}_n)$. Summing both sides over $j$ and using $\sum_{j=1}^J \Pr[T_i = j \mid x_i, \mathcal{D}_{-i}] = 1$, we obtain
\[
\Pr[T_i = t_i \mid x_i, \mathcal{D}_{-i}] = \frac{p_{t_i}}{\sum_{j'=1}^J p_{j'}\, Q_{j' t_i}},
\qquad
\Pr[T_i = j \mid x_i, \mathcal{D}_{-i}] = \frac{p_j\, Q_{j t_i}}{\sum_{j'=1}^J p_{j'}\, Q_{j' t_i}}.
\]
This proposition leads to the following algorithm.
LOOCV algorithm.
• Input: The observed data $\mathcal{D}_n$ and the total number of simulations $K$.
• Step 1: Run a Bayesian spatial point process model to obtain the posterior draws $\Theta^{(k)} \sim \pi(\Theta \mid \mathcal{D}_n)$, for $k = 1, \ldots, K$.
• Step 2: For $i = 1, \ldots, n$ and $j = 1, \ldots, J$, compute
(17)
\[
\widehat Q_{j t_i} = \frac{1}{K}\sum_{k=1}^K \frac{\pi(x_i \mid T_i = j, \Theta^{(k)})}{\pi(x_i \mid T_i = t_i, \Theta^{(k)})}.
\]
• Output: The estimates of the predictive probabilities of $T_i$, for $i = 1, \ldots, n$, are given by
(18)
\[
\widehat{\Pr}[T_i = j \mid x_i, \mathcal{D}_{-i}] = \frac{p_j\, \widehat Q_{j t_i}}{\sum_{j'=1}^J p_{j'}\, \widehat Q_{j' t_i}},
\]
and the estimate of $T_i$ is
(19)
\[
\widehat T_i = \arg\max_j\left(p_j\, \widehat Q_{j t_i}\right).
\]
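Following the same conventions as the previous sketch, the LOOCV estimator could be implemented as below (again our own illustration); the array `log_pi`, holding $\log\pi(x_i \mid T_i = j, \Theta^{(k)})$ for every study $i$, type $j$ and draw $k$, is assumed to be precomputed, and the function also returns the confusion matrix (13) and the overall rate (14).

```python
import numpy as np
from scipy.special import logsumexp

def loocv_predict(log_pi, t_obs, prior_p):
    """LOOCV predictive probabilities, predictions, confusion matrix and overall rate.

    log_pi  : (n, J, K) array, log pi(x_i | T_i = j, Theta^(k))
    t_obs   : (n,) integer array of observed type labels t_i in {0, ..., J-1}
    prior_p : (J,) prior type probabilities p_j
    """
    n, J, K = log_pi.shape
    idx = np.arange(n)
    # log Q_hat_{j, t_i} = log { (1/K) sum_k pi(x_i | j, Theta^(k)) / pi(x_i | t_i, Theta^(k)) }
    log_ratio = log_pi - log_pi[idx, t_obs, :][:, None, :]           # (n, J, K)
    log_Q = logsumexp(log_ratio, axis=2) - np.log(K)                 # (n, J)
    log_score = np.log(prior_p)[None, :] + log_Q                     # log{ p_j Q_hat_{j t_i} }
    probs = np.exp(log_score - logsumexp(log_score, axis=1, keepdims=True))
    t_hat = probs.argmax(axis=1)
    # Confusion matrix c_{j j'} of (13) and overall correct classification rate of (14)
    C = np.zeros((J, J))
    for j in range(J):
        mask = t_obs == j
        if mask.any():
            C[j] = np.bincount(t_hat[mask], minlength=J) / mask.sum()
    overall = (t_hat == t_obs).mean()
    return probs, t_hat, C, overall
```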
References
Møller, J. and Waagepetersen, R. P. (2004), Statistical Inference and Simulation for Spatial
Point Processes, Chapman and Hall/CRC.
Smith, A. and Roberts, G. (1993), “Bayesian computation via the Gibbs sampler and
related Markov chain Monte Carlo methods,” Journal of the Royal Statistical Society.
Series B (Methodological), 3–23.
Wolpert, R. L. and Ickstadt, K. (1998), “Poisson/gamma random field models for spatial
statistics,” Biometrika, 85, 251–267.