WEB SUPPLEMENTARY MATERIAL FOR "A BAYESIAN HIERARCHICAL SPATIAL POINT PROCESS MODEL FOR MULTI-TYPE NEUROIMAGING META-ANALYSIS"

JIAN KANG, TIMOTHY D. JOHNSON, THOMAS E. NICHOLS, AND TOR D. WAGER

1. HPGRF Model

In this Appendix, we provide proofs of all the theorems presented in the paper for the HPGRF model, except for Theorem 2; a proof of Theorem 2 is provided in Wolpert and Ickstadt (1998). We first introduce the following lemmas.

Lemma 1. If $Y \sim \mathrm{PP}(\mathcal{B}, \Lambda)$, then for any $A, B \subset \mathcal{B}$, we have
\[
\mathrm{Cov}\{N_Y(A), N_Y(B)\} = \mathrm{Var}\{N_Y(A \cap B)\} = \Lambda(A \cap B).
\]

Proof. Note that $N_Y(A) = N_Y(A \cap B) + N_Y(A \setminus B)$ and $N_Y(B) = N_Y(A \cap B) + N_Y(B \setminus A)$. By the independence property of the Poisson process, we have that
\[
\mathrm{Cov}\{N_Y(A), N_Y(B)\} = \mathrm{Cov}\{N_Y(A \cap B) + N_Y(A \setminus B),\; N_Y(A \cap B) + N_Y(B \setminus A)\}
= \mathrm{Cov}\{N_Y(A \cap B), N_Y(A \cap B)\} = \mathrm{Var}\{N_Y(A \cap B)\},
\]
and $\mathrm{Var}\{N_Y(A \cap B)\} = \Lambda(A \cap B)$ since $N_Y(A \cap B) \sim \mathrm{Poisson}\{\Lambda(A \cap B)\}$.

Lemma 2. Let $\Gamma(dx) \sim \mathrm{GRF}\{\alpha(dx), \beta\}$, and let $f_1(x)$ and $f_2(x)$ be measurable functions on $\mathcal{B}$. Then
\[
\mathrm{Cov}\left\{\int_{\mathcal{B}} f_1(x)\,\Gamma(dx),\; \int_{\mathcal{B}} f_2(x)\,\Gamma(dx)\right\} = \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x) f_2(x)\,\alpha(dx).
\]

Proof. We start with the case where $f_1(x)$ and $f_2(x)$ are simple functions on $\mathcal{B}$, i.e. for $m = 1, \ldots, M$ and $n = 1, \ldots, N$, there exist numbers $a_m, b_n \in \mathbb{R}$ and disjoint sets $A_m$ and $B_n$ with $\mathcal{B} = \bigcup_{m=1}^M A_m = \bigcup_{n=1}^N B_n$ such that $f_1(x) = \sum_{m=1}^M a_m \delta_{A_m}(x)$ and $f_2(x) = \sum_{n=1}^N b_n \delta_{B_n}(x)$. Then $\int_{\mathcal{B}} f_1(x)\,\Gamma(dx) = \sum_{m=1}^M a_m \Gamma(A_m)$ and $\int_{\mathcal{B}} f_2(x)\,\Gamma(dx) = \sum_{n=1}^N b_n \Gamma(B_n)$. Writing $\Gamma(A_m) = \sum_{n'=1}^N \Gamma(A_m \cap B_{n'})$ and $\Gamma(B_n) = \sum_{m'=1}^M \Gamma(A_{m'} \cap B_n)$, and using the independence of $\Gamma$ on disjoint sets,
\[
E\{\Gamma(A_m)\Gamma(B_n)\} = E\{\Gamma^2(A_m \cap B_n)\} + \sum_{(m',n') \neq (m,n)} E\{\Gamma(A_m \cap B_{n'})\}\,E\{\Gamma(A_{m'} \cap B_n)\}
= \frac{1}{\beta^2}\alpha(A_m \cap B_n) + \frac{1}{\beta^2}\alpha(A_m)\alpha(B_n).
\]
Thus,
\[
E\left\{\int_{\mathcal{B}} f_1(x)\,\Gamma(dx) \times \int_{\mathcal{B}} f_2(x)\,\Gamma(dx)\right\}
= \sum_{m=1}^M \sum_{n=1}^N a_m b_n E\{\Gamma(A_m)\Gamma(B_n)\}
= \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)f_2(x)\,\alpha(dx) + \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)\,\alpha(dx)\int_{\mathcal{B}} f_2(x)\,\alpha(dx).
\]
Furthermore, since $E\{\int_{\mathcal{B}} f_i(x)\,\Gamma(dx)\} = \beta^{-1}\int_{\mathcal{B}} f_i(x)\,\alpha(dx)$,
\[
\mathrm{Cov}\left\{\int_{\mathcal{B}} f_1(x)\,\Gamma(dx),\; \int_{\mathcal{B}} f_2(x)\,\Gamma(dx)\right\}
= E\left\{\int_{\mathcal{B}} f_1(x)\,\Gamma(dx) \times \int_{\mathcal{B}} f_2(x)\,\Gamma(dx)\right\} - \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)\,\alpha(dx)\int_{\mathcal{B}} f_2(x)\,\alpha(dx)
= \frac{1}{\beta^2}\int_{\mathcal{B}} f_1(x)f_2(x)\,\alpha(dx).
\]
For general measurable functions $f_1(x)$ and $f_2(x)$, a routine passage to the limit completes the proof.

Theorem 1. Within emotion type $j$ and for all $A, B \subseteq \mathcal{B}$,
\[
E\{N_{Y_j}(A) \mid \sigma_j^2, \tau, \alpha, \beta\} = \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,\alpha(dx),
\]
\[
(1)\qquad \mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid \sigma_j^2, \tau, \alpha, \beta\}
= \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\,\alpha(dx) + \frac{1+\beta}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,K_{\sigma_j^2}(B, x)\,\alpha(dx).
\]
Between emotion types $j$ and $k$ ($j \neq k$),
\[
(2)\qquad \mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}
= \frac{1}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,K_{\sigma_k^2}(B, x)\,\alpha(dx).
\]

Proof. First, we note that given $G_0$, $\tau$ and $\sigma_j^2$, the conditional expectation of $N_{Y_j}(A)$ is
\[
E\{N_{Y_j}(A) \mid G_0, \tau, \sigma_j^2\} = E_{G_j}\!\left[E\{N_{Y_j}(A) \mid G_j\} \mid G_0, \tau, \sigma_j^2\right]
= \int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,E\{G_j(dx) \mid G_0, \tau\}
= \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,G_0(dx).
\]
Also, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_j}(B)$ is
\[
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\}
= E_{G_j}\!\left[\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_j\} \mid G_0, \tau, \sigma_j^2\right]
+ \mathrm{Cov}_{G_j}\!\left[E\{N_{Y_j}(A) \mid G_j\},\; E\{N_{Y_j}(B) \mid G_j\} \mid G_0, \tau, \sigma_j^2\right]
= \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\,G_0(dx) + \frac{1}{\tau^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,K_{\sigma_j^2}(B, x)\,G_0(dx).
\]
Now, given $\sigma_j^2$, $\alpha$, $\beta$, within type $j$, for any $A \subseteq \mathcal{B}$,
\[
E\{N_{Y_j}(A) \mid \sigma_j^2, \tau, \alpha, \beta\} = E_{G_0}\!\left[E\{N_{Y_j}(A) \mid G_0, \sigma_j^2, \tau\} \mid \sigma_j^2, \tau, \alpha, \beta\right]
= \frac{1}{\tau}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,E\{G_0(dx) \mid \alpha, \beta\}
= \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,\alpha(dx).
\]
Within type $j$, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_j}(B)$ is
\[
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid \sigma_j^2, \tau, \alpha, \beta\}
= E_{G_0}\!\left[\mathrm{Cov}\{N_{Y_j}(A), N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\} \mid \sigma_j^2, \tau, \alpha, \beta\right]
+ \mathrm{Cov}_{G_0}\!\left[E\{N_{Y_j}(A) \mid G_0, \tau, \sigma_j^2\},\; E\{N_{Y_j}(B) \mid G_0, \tau, \sigma_j^2\} \mid \tau, \sigma_j^2, \alpha, \beta\right]
= \frac{1}{\tau\beta}\int_{\mathcal{B}} K_{\sigma_j^2}(A \cap B, x)\,\alpha(dx) + \frac{1+\beta}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,K_{\sigma_j^2}(B, x)\,\alpha(dx).
\]
For $j \neq k$, the conditional covariance between $N_{Y_j}(A)$ and $N_{Y_k}(B)$ is
\[
\mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\}
= E_{G_0}\!\left[\mathrm{Cov}\{N_{Y_j}(A), N_{Y_k}(B) \mid G_0, \sigma_j^2, \sigma_k^2, \tau\} \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\right]
+ \mathrm{Cov}_{G_0}\!\left[E\{N_{Y_j}(A) \mid G_0, \sigma_j^2, \tau\},\; E\{N_{Y_k}(B) \mid G_0, \sigma_k^2, \tau\} \mid \sigma_j^2, \sigma_k^2, \tau, \alpha, \beta\right]
= \frac{1}{\tau^2\beta^2}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,K_{\sigma_k^2}(B, x)\,\alpha(dx).
\]
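As a numerical illustration of the first-moment formula in Theorem 1, the following sketch simulates the hierarchical gamma construction on a discretized one-dimensional domain $\mathcal{B} = [0, 1]$ and compares the Monte Carlo mean of $N_{Y_j}(A)$ with $(\tau\beta)^{-1}\int_{\mathcal{B}} K_{\sigma_j^2}(A, x)\,\alpha(dx)$. The grid discretization and the Gamma(shape, rate) parameterization of the increments, $G_0(dx) \sim \mathrm{Gamma}\{\alpha(dx), \beta\}$ and $G_j(dx) \mid G_0 \sim \mathrm{Gamma}\{G_0(dx), \tau\}$, are our assumptions, chosen to match the conditional moments used in the proof; the code is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Discretize B = [0, 1]; alpha(dx_i) is the Lebesgue measure of cell i.
n_cells = 100
edges = np.linspace(0.0, 1.0, n_cells + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
alpha = np.diff(edges)

beta, tau, sigma2 = 2.0, 1.5, 0.01
A = (0.3, 0.6)                                  # the set A whose count we study
sigma = np.sqrt(sigma2)

# K_{sigma^2}(A, x) = integral over A of the Gaussian kernel centred at x
K_A = norm.cdf(A[1], loc=centers, scale=sigma) - norm.cdf(A[0], loc=centers, scale=sigma)

n_rep = 20000
counts = np.empty(n_rep)
for r in range(n_rep):
    G0 = rng.gamma(shape=alpha, scale=1.0 / beta)   # G0(dx_i) ~ Gamma(alpha_i, rate beta)
    Gj = rng.gamma(shape=G0, scale=1.0 / tau)       # Gj(dx_i) | G0 ~ Gamma(G0_i, rate tau)
    Lambda_A = np.sum(K_A * Gj)                     # Lambda_j(A) = int K(A, x) Gj(dx)
    counts[r] = rng.poisson(Lambda_A)               # N_{Yj}(A) | Gj ~ Poisson(Lambda_j(A))

theory = np.sum(K_A * alpha) / (tau * beta)         # (tau beta)^{-1} int K(A, x) alpha(dx)
print("empirical mean:", counts.mean(), "  theoretical mean:", round(theory, 5))
```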
The following theorem states that our model can be approximated by the truncated model to any desired level of accuracy.

Theorem 3. For $j = 1, \ldots, J$, for any $\epsilon > 0$ and for any measurable $A \subseteq \mathcal{B}$, there exists a natural number $M$ such that
\[
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \beta, \tau\} < \epsilon,
\]
where $\Lambda_j^M(A) = \sum_{m=1}^M \mu_{jm} K_{\sigma_j^2}(A, \theta_m)$ is the conditional expectation of $N_{Y_j}(A)$ in the truncated model (3).

Proof. For any $M > 0$, the conditional expected truncation error given $\sigma_j^2$, $\tau$, $\beta$ and $\{\nu_m, \theta_m\}_{m=1}^M$ is
\begin{align*}
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \sigma_j^2, \tau, \beta, \{\nu_m, \theta_m\}_{m=1}^M\}
&= \frac{1}{\tau}\sum_{m=M+1}^{\infty} E\{\nu_m \mid \{\nu_m\}_{m=1}^M, \beta\}\, K_{\sigma_j^2}(A, \theta_m) \\
&\leq \frac{1}{\tau}\sum_{m=M+1}^{\infty} E\{\nu_m \mid \{\nu_m\}_{m=1}^M, \beta\}
\qquad\text{(since $K_{\sigma_j^2}(A, \theta_m) \leq 1$)} \\
&= \frac{1}{\tau}\sum_{m=1}^{\infty} E\{\nu_{m+M} \mid \{\nu_m\}_{m=1}^M, \beta\} \\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty} E\left\{E_1^{-1}(\zeta_{m+M}/\alpha(\mathcal{B})) \mid \{\nu_m\}_{m=1}^M\right\} \\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty} E\left\{E_1^{-1}((\zeta_m' + \zeta_M)/\alpha(\mathcal{B})) \mid \{\nu_m\}_{m=1}^M\right\}
\qquad\text{(let $\zeta_m' = \zeta_{m+M} - \zeta_M$; note that $\zeta_m' \sim \mathrm{Gamma}(m, 1)$)} \\
&= \frac{1}{\tau\beta}\sum_{m=1}^{\infty}\int_0^{\infty} E_1^{-1}((s + \zeta_M)/\alpha(\mathcal{B}))\,\frac{s^{m-1}\exp(-s)}{\Gamma(m)}\,ds \\
&= \frac{1 - \exp\{-E_1^{-1}(\zeta_M/\alpha(\mathcal{B}))\}}{\tau\beta}
\;\leq\; \frac{E_1^{-1}(\zeta_M/\alpha(\mathcal{B}))}{\tau\beta}
\;\leq\; \frac{1}{(\exp(\zeta_M/\alpha(\mathcal{B})) - 1)\,\tau\beta}.
\end{align*}
Taking the expectation with respect to $\zeta_M$ on both sides of the above inequality results in
\[
E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \tau, \beta\}
\leq \frac{1}{\tau\beta\,\Gamma(M)}\int_0^{\infty}\frac{s^{M-1}\exp(-s)}{\exp(s/\alpha(\mathcal{B})) - 1}\,ds
\leq \frac{\alpha(\mathcal{B})}{\tau\beta\,\Gamma(M)}\int_0^{\infty} s^{M-2}\exp(-s)\,ds
= \frac{\alpha(\mathcal{B})}{\tau\beta(M - 1)}.
\]
When $\alpha(\mathcal{B}) = 1$, we have a more accurate upper bound, $E\{\Lambda_j(A) - \Lambda_j^M(A) \mid \tau, \beta\} \leq (\zeta(M) - 1)/(\tau\beta)$, where $\zeta(M) = \sum_{k=1}^{\infty} k^{-M}$. Therefore, for any $\epsilon > 0$, take $M = \lfloor \alpha(\mathcal{B})/(\tau\beta\epsilon) \rfloor + 1$, where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$.

2. Posterior Computation

In this section, we provide the details of the posterior computation algorithm for the truncated HPGRF model
\[
(3)\qquad (\mathbf{Y}_j, \mathbf{X}_j) \mid \{(\mu_{j,m}, \theta_m)\}_{m=1}^M, \sigma_j^2 \sim \mathrm{PP}\left\{\mathcal{B} \times \mathcal{B},\; K_{\sigma_j^2}(dy, x)\sum_{m=1}^M \mu_{j,m}\,\delta_{\theta_m}(dx)\right\},
\]
\[
[\mu_{j,m} \mid \nu_m, \tau] \stackrel{\mathrm{iid}}{\sim} \mathrm{Gamma}(\nu_m, \tau), \qquad \{(\theta_m, \nu_m)\}_{m=1}^M \sim \mathrm{InvLevy}\{\alpha(dx), \beta, M\}.
\]

2.1. Posterior Computation. The truncated model (3) involves only a fixed number of parameters, which allows computation of the posterior.

The target distribution. Let $(\mathbf{y}_{i,j}, \mathbf{x}_{i,j})$, for $i = 1, 2, \ldots, n_j$, be multiple independent realizations (e.g. emotion studies) of the Cox process $(\mathbf{Y}_j, \mathbf{X}_j)$ in model (3), where $n_j$ is the number of realizations of $(\mathbf{Y}_j, \mathbf{X}_j)$. For each $i$ and $j$, write $(\mathbf{y}_{i,j}, \mathbf{x}_{i,j}) = \{(y_{i,j,l}, x_{i,j,l})\}_{l=1}^{m_{i,j}}$, where $(y_{i,j,l}, x_{i,j,l})$ is an observed point in $(\mathbf{y}_{i,j}, \mathbf{x}_{i,j})$ indexed by $l$, for $l = 1, \ldots, m_{i,j}$, and $m_{i,j}$ is the observed number of points in $(\mathbf{y}_{i,j}, \mathbf{x}_{i,j})$. The joint density of $\{\{\mathbf{x}_{i,j}\}_{i=1}^{n_j}\}_{j=1}^J$, $\{\mu_j\}_{j=1}^J$, $\{\sigma_j^2\}_{j=1}^J$, $\nu$, $\theta$, $\tau$ and $\beta$ given $\{\{\mathbf{y}_{i,j}\}_{i=1}^{n_j}\}_{j=1}^J$ is
\begin{align*}
(4)\qquad &\prod_{j=1}^J\left[\prod_{i=1}^{n_j}\pi(\mathbf{y}_{i,j}, \mathbf{x}_{i,j} \mid \mu_j, \theta, \sigma_j^2)\;\pi(\sigma_j^2)\,\pi(\mu_j \mid \nu, \tau)\right]\pi(\tau)\,\pi(\nu, \theta \mid \beta)\,\pi(\beta) \\
&\propto \prod_{j=1}^J\left[\exp\left\{-n_j\sum_{m=1}^M K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\right\}\prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\left\{k_{\sigma_j^2}(y_{i,j,l}, x_{i,j,l})\sum_{m=1}^M \mu_{j,m}\, I_{\theta_m}(x_{i,j,l})\right\}\right] \\
&\quad\times \prod_{j=1}^J\pi(\sigma_j^2)\prod_{m=1}^M\left\{\frac{\tau^{\nu_m}\mu_{j,m}^{\nu_m - 1}}{\Gamma(\nu_m)}\exp\{-\tau\mu_{j,m}\}\right\}
\times \pi(\tau)\,\exp\{-E_1(\beta\nu_M)/\ell(\mathcal{B})\}\prod_{m=1}^M\left[\nu_m^{-1}\exp\{-\nu_m\beta\}\right]\pi(\beta),
\end{align*}
where $\pi(\mathbf{y}_{i,j}, \mathbf{x}_{i,j} \mid \mu_j, \theta, \sigma_j^2)$ is the density of $(\mathbf{Y}_j, \mathbf{X}_j)$ with respect to a unit rate Poisson process (Møller and Waagepetersen 2004). The densities of the other parameters are all with respect to a product of Lebesgue measures. In this article, we assume $\alpha(dx) = \ell(dx)$ (i.e. Lebesgue measure), which is non-atomic; thus the $\theta_m$, for $m = 1, 2, \ldots, M$, are distinct with probability one. Also, $I_x(y)$ is the indicator function with $I_x(y) = 1$ if $x = y$ and $I_x(y) = 0$ otherwise.
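Both the truncated prior $\mathrm{InvLevy}\{\alpha(dx), \beta, M\}$ in (3) and the bound in the proof of Theorem 3 rest on the inverse Lévy measure representation of the jumps of $G_0$, namely $\nu_m = \beta^{-1}E_1^{-1}(\zeta_m/\alpha(\mathcal{B}))$, where $\zeta_1 < \zeta_2 < \cdots$ are the arrival times of a unit-rate Poisson process and $E_1$ is the exponential integral. The sketch below draws the truncated jumps and support points under that assumed parameterization, on $\mathcal{B} = [0, 1]$ with $\alpha(dx)$ the Lebesgue measure; it is an illustration consistent with the substitution used in the proof, not the authors' implementation.

```python
import numpy as np
from scipy.special import exp1
from scipy.optimize import brentq

def exp1_inv(z):
    """Numerically invert the exponential integral E_1, which is decreasing on (0, inf)."""
    lo = max(np.exp(-z - 1.0), 1e-300)   # lower bracket: E_1(lo) > z for the magnitudes of z arising here
    return brentq(lambda u: exp1(u) - z, lo, 60.0)

def sample_truncated_jumps(M, beta, alpha_B, rng):
    """Draw the M largest jumps nu_1 > ... > nu_M and their locations theta_m,
    assuming nu_m = E_1^{-1}(zeta_m / alpha(B)) / beta with zeta_m ~ Gamma(m, 1)."""
    zeta = np.cumsum(rng.exponential(1.0, size=M))       # arrival times of a unit-rate Poisson process
    nu = np.array([exp1_inv(z / alpha_B) / beta for z in zeta])
    theta = rng.uniform(0.0, 1.0, size=M)                # theta_m ~ alpha(dx)/alpha(B) on B = [0, 1]
    return nu, theta

rng = np.random.default_rng(1)
nu, theta = sample_truncated_jumps(M=25, beta=2.0, alpha_B=1.0, rng=rng)
print("largest jump:", nu[0], "  smallest retained jump nu_M:", nu[-1])
```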
Now we summarize the key steps in the posterior simulation.

Sampling $\mathbf{x}_j$: From (4), it is straightforward to obtain the conditional distribution of $x_{i,j,l}$ given all other parameters:
\[
\Pr(x_{i,j,l} = \theta_m \mid \cdot) \propto \mu_{j,m}\,k_{\sigma_j^2}(y_{i,j,l}, \theta_m).
\]

Sampling $\theta$: The full conditional density of $\theta$ is given by
\[
(5)\qquad \pi(\theta \mid \cdot) \propto \prod_{j=1}^J\exp\left\{-n_j\sum_{m=1}^M K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\right\}\prod_{j=1}^J\prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\sum_{m=1}^M \mu_{j,m}\, I_{x_{i,j,l}}(\theta_m).
\]
This implies that $\sum_{m=1}^M I_{x_{i,j,l}}(\theta_m) > 0$ for all $i$, $j$ and $l$. Let $\tilde{\theta}_1, \ldots, \tilde{\theta}_{\tilde{M}}$ be the $\tilde{M}$ distinct points in $\{\{\{x_{i,j,l}\}_{l=1}^{m_{i,j}}\}_{i=1}^{n_j}\}_{j=1}^J$. Due to the symmetry of $\{\mu_{1,m}, \ldots, \mu_{J,m}, \theta_m\}_{m=1}^M$ in (5), and noting that $\theta_1, \ldots, \theta_M$ are distinct points, for each $\tilde{\theta}_r$ there exists one and only one $\theta_m$ that is equal to $\tilde{\theta}_r$. Thus, to sample $\theta$, we first draw a random permutation of $\{1, \ldots, M\}$, denoted by $\{p_1, \ldots, p_M\}$. Then for $1 \leq m \leq \tilde{M}$, let $\theta_{p_m} = \tilde{\theta}_m$. For $m > \tilde{M}$, draw $\theta_{p_m}$ according to the following density:
\[
\pi(\theta_{p_m} \mid \cdot) \propto \exp\left\{-\sum_{j=1}^J K_{\sigma_j^2}(\mathcal{B}, \theta_{p_m})\,\mu_{j,p_m}\, n_j\right\}.
\]
Note that this is a discrete distribution with normalizing constant equal to the sum of the right-hand side over $m = 1, \ldots, M$.

Sampling $\mu_j$:

Proposition 1. The full conditional distribution of $\mu_{j,m}$, for $j = 1, \ldots, J$ and $m = 1, \ldots, M$, is given by
\[
[\mu_{j,m} \mid \cdot] \sim \mathrm{Gamma}\left(\nu_m + \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}} I_{\theta_m}(x_{i,j,l}),\;\; n_j K_{\sigma_j^2}(\mathcal{B}, \theta_m) + \tau\right).
\]

Proof. Write $a_{i,j,l} = \sum_{m=1}^M m\, I_{\theta_m}(x_{i,j,l})$; then $I_m(a_{i,j,l}) = I_{\theta_m}(x_{i,j,l})$. Note that, since the $\theta_m$ are distinct, $\sum_{m=1}^M I_{\theta_m}(x_{i,j,l}) = 1 = \sum_{m=1}^M I_m(a_{i,j,l})$. Thus,
\[
(6)\qquad \sum_{m=1}^M \mu_{j,m}\, I_{\theta_m}(x_{i,j,l}) = \sum_{m=1}^M \mu_{j,m}\, I_m(a_{i,j,l}) = \mu_{j,a_{i,j,l}}.
\]
Furthermore,
\[
(7)\qquad \prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\mu_{j,a_{i,j,l}}
= \prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\mu_{j,a_{i,j,l}}^{\sum_{m=1}^M I_m(a_{i,j,l})}
= \prod_{i=1}^{n_j}\prod_{l=1}^{m_{i,j}}\prod_{m=1}^M \mu_{j,a_{i,j,l}}^{I_m(a_{i,j,l})}
= \prod_{m=1}^M \mu_{j,m}^{b_{j,m}},
\]
where $b_{j,m} = \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}} I_{\theta_m}(x_{i,j,l})$. From the joint posterior distribution of all the parameters, (6) and (7), we have $\pi(\mu_{j,m} \mid \cdot) \propto \mu_{j,m}^{\nu_m + b_{j,m} - 1}\exp\{-(n_j K_{\sigma_j^2}(\mathcal{B}, \theta_m) + \tau)\,\mu_{j,m}\}$.

Sampling $\nu$: The full conditional distribution of $\nu_m$ is
\[
\pi(\nu_m \mid \cdot) \propto
\begin{cases}
\dfrac{c_m^{\nu_m}}{\Gamma(\nu_m)^J}\,\nu_m^{-1}, & m = 1, \ldots, M - 1, \\[2ex]
\dfrac{c_M^{\nu_M}}{\Gamma(\nu_M)^J}\,\nu_M^{-1}\exp\{-E_1(\beta\nu_M)/\ell(\mathcal{B})\}, & m = M,
\end{cases}
\]
where $c_m = \tau^J\prod_{j=1}^J \mu_{j,m}\, e^{-\beta}$. We use a symmetric random walk to update $\nu_m$: sample $\nu^* \sim N(\nu_m, \sigma_{\nu}^2)$ and accept $\nu^*$ with probability $\min\{1, \pi(\nu^* \mid \cdot)/\pi(\nu_m \mid \cdot)\}$.
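The label update for $x_{i,j,l}$ and the conjugate update for $\mu_{j,m}$ in Proposition 1 are both available in closed form. Below is a minimal sketch of these two Gibbs steps for a single type $j$, assuming $\mathcal{B} = [0, 1]^d$ so that $K_{\sigma_j^2}(\mathcal{B}, \theta_m)$ can be computed from Gaussian CDFs; the array layout and variable names are ours, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def gibbs_step_x_mu(y, mu_j, theta, sigma2_j, nu, tau, n_j, rng):
    """One Gibbs sweep for a single type j.

    y     : (L, d) array stacking all foci y_{i,j,l} of type j (L = sum_i m_{i,j})
    mu_j  : (M,) current weights;  theta : (M, d) current support points
    Returns the sampled labels a_{i,j,l} (indices into theta) and the new mu_{j,m}.
    """
    sigma = np.sqrt(sigma2_j)

    # Sampling x: Pr(x_{i,j,l} = theta_m | .) proportional to mu_{j,m} k_{sigma^2}(y_{i,j,l}, theta_m)
    logk = norm.logpdf(y[:, None, :], loc=theta[None, :, :], scale=sigma).sum(axis=2)  # (L, M)
    logw = np.log(mu_j)[None, :] + logk
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    labels = np.array([rng.choice(len(mu_j), p=wi) for wi in w])

    # Sampling mu_j (Proposition 1): Gamma(nu_m + b_{j,m}, n_j K_{sigma^2}(B, theta_m) + tau)
    b = np.bincount(labels, minlength=len(mu_j))            # b_{j,m} = number of foci assigned to theta_m
    K_B = (norm.cdf(1.0, loc=theta, scale=sigma)            # kernel mass of B = [0, 1]^d around theta_m
           - norm.cdf(0.0, loc=theta, scale=sigma)).prod(axis=1)
    mu_new = rng.gamma(shape=nu + b, scale=1.0 / (n_j * K_B + tau))
    return labels, mu_new
```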
2.2. Sampling Hyperparameters. We update $\sigma_j^2$, for $j = 1, \ldots, J$, $\tau$ and $\beta$ using Metropolis-within-Gibbs sampling (Smith and Roberts 1993). In this article, we choose $k_{\sigma_j^2}(y, x) = (2\pi\sigma_j^2)^{-d/2}\exp\{-\|y - x\|^2/(2\sigma_j^2)\}$ (an isotropic Gaussian density) and assume, a priori, $\sigma_j^{-2} \sim \mathrm{Uniform}[a_{\sigma}, b_{\sigma}]$, $\tau \sim \mathrm{Gamma}(a_{\tau}, b_{\tau})$ and $\beta \sim \mathrm{Gamma}(a_{\beta}, b_{\beta})$ (the hyperprior parameters will be specified in the next section).

The full conditional of $\sigma_j^2$ is
\[
\pi(\sigma_j^2 \mid \cdot) \propto \exp\left\{-\sum_{m=1}^M K_{\sigma_j^2}(\mathcal{B}, \theta_m)\,\mu_{j,m}\, n_j - \frac{S_{\cdot,j}^2}{2\sigma_j^2} - m_{\cdot,j}\, d\,\log(\sigma_j)\right\} I_{[a_{\sigma}, b_{\sigma}]}(\sigma_j^{-2}),
\]
where $S_{\cdot,j}^2 = \sum_{i=1}^{n_j}\sum_{l=1}^{m_{i,j}}\|y_{i,j,l} - x_{i,j,l}\|^2$ and $m_{\cdot,j} = \sum_{i=1}^{n_j} m_{i,j}$. Thus, to update $\sigma_j^2$, we use a random walk: first draw $\sigma_j^{2*} \sim N(\sigma_j^2, \theta_{\sigma}^2)$; if $\sigma_j^{2*} \in (a_{\sigma}, b_{\sigma})$, then set $\sigma_j^2 = \sigma_j^{2*}$ with probability $\min\{r(\sigma), 1\}$, where
\[
r(\sigma) = \exp\left\{\sum_{m=1}^M\left[K_{\sigma_j^2}(\mathcal{B}, \theta_m) - K_{\sigma_j^{2*}}(\mathcal{B}, \theta_m)\right]\mu_{j,m}\, n_j + \frac{S_{\cdot,j}^2}{2}\left(\frac{1}{\sigma_j^2} - \frac{1}{\sigma_j^{2*}}\right)\right\}\left(\frac{\sigma_j^2}{\sigma_j^{2*}}\right)^{m_{\cdot,j} d/2}.
\]

The full conditional of $\tau$ is
\[
\pi(\tau \mid \cdot) \propto \tau^{J\sum_{m=1}^M \nu_m + a_{\tau} - 1}\exp\left\{-\left(b_{\tau} + \sum_{j=1}^J\sum_{m=1}^M \mu_{j,m}\right)\tau\right\},
\]
which implies that we can update $\tau$ by drawing
\[
[\tau \mid \cdot] \sim \mathrm{Gamma}\left(J\sum_{m=1}^M \nu_m + a_{\tau},\;\; \sum_{j=1}^J\sum_{m=1}^M \mu_{j,m} + b_{\tau}\right).
\]

The full conditional of $\beta$ is
\[
\pi(\beta \mid \cdot) \propto \beta^{a_{\beta} - 1}\exp\left\{-\left(b_{\beta} + \sum_{m=1}^M \nu_m\right)\beta - E_1(\beta\nu_M)/\ell(\mathcal{B})\right\}.
\]
We update $\beta$ using a symmetric random walk by sampling $\beta^* \sim N(\beta, \sigma_{\beta}^2)$ and accepting it with probability $\min\{1, \pi(\beta^* \mid \cdot)/\pi(\beta \mid \cdot)\}$.
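As an illustration of the hyperparameter updates above, the sketch below implements the exact Gamma draw for $\tau$ and a random-walk Metropolis step for $\beta$, following the stated full conditionals; the Gamma(shape, rate) convention, the variable names, and the proposal scale `sigma_beta` are ours, and `ell_B` denotes $\ell(\mathcal{B})$.

```python
import numpy as np
from scipy.special import exp1

def update_tau(nu, mu, a_tau, b_tau, rng):
    """Gibbs draw: tau | . ~ Gamma(J * sum_m nu_m + a_tau, sum_{j,m} mu_{j,m} + b_tau).
    mu is a (J, M) array of weights and nu an (M,) array of jumps."""
    J = mu.shape[0]
    return rng.gamma(shape=J * nu.sum() + a_tau, scale=1.0 / (mu.sum() + b_tau))

def log_cond_beta(beta, nu, a_beta, b_beta, ell_B):
    """Unnormalized log full conditional of beta (Gamma(a_beta, b_beta) prior assumed)."""
    if beta <= 0.0:
        return -np.inf
    return ((a_beta - 1.0) * np.log(beta)
            - (b_beta + nu.sum()) * beta
            - exp1(beta * nu[-1]) / ell_B)      # nu[-1] = nu_M, the smallest retained jump

def update_beta(beta, nu, a_beta, b_beta, ell_B, sigma_beta, rng):
    """Symmetric random-walk Metropolis step for beta."""
    prop = rng.normal(beta, sigma_beta)
    log_r = (log_cond_beta(prop, nu, a_beta, b_beta, ell_B)
             - log_cond_beta(beta, nu, a_beta, b_beta, ell_B))
    return prop if np.log(rng.uniform()) < log_r else beta
```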
3. Sensitivity Analysis

In this section, we conduct simulation studies to examine how the posterior inference on the intensity function varies with different prior specifications for $\sigma_j^2$, $\beta$ and $\tau$ and with the truncation level $M$. Specifically, we consider the nine scenarios in Table 1. We simulate the posterior distribution with 20,000 iterations after a burn-in of 2,000 iterations. Table 2 shows summary statistics of the posterior mean intensity estimates over the whole brain for the nine scenarios. These summary statistics are quite similar, indicating that the posterior results are not very sensitive to the prior specification.

Table 1. Prior specifications and truncation levels of the nine scenarios for the sensitivity analysis

Scenario   $\sigma_j^{-2}$   $\beta$        $\tau$         $M$
1          U(0, 10)          G(0.1, 0.1)    G(0.1, 0.1)    10,000
2          U(0, 10)          G(0.2, 0.2)    G(0.2, 0.2)    10,000
3          U(0, 10)          G(0.1, 0.1)    G(0.1, 0.1)     5,000
4          U(0, 10)          G(0.2, 0.2)    G(0.2, 0.2)     5,000
5          U(0, 10)          G(2.0, 2.0)    G(2.0, 2.0)    10,000
6          U(0, 10)          G(2.0, 2.0)    G(2.0, 2.0)    12,500
7          G(2.0, 2.0)       G(2.0, 2.0)    G(2.0, 2.0)    10,000
8          G(1.0, 1.0)       G(2.0, 2.0)    G(2.0, 2.0)    10,000
9          G(3.0, 3.0)       G(2.0, 2.0)    G(2.0, 2.0)    10,000

Table 2. Summary statistics of the posterior mean intensity estimates over the whole brain for the nine scenarios

Emotion     Stat.   1        2        3        4        5        6        7        8        9
Sad         Min.    9.1e-18  2.8e-17  3.6e-20  1.2e-17  3.8e-20  4.0e-16  1.5e-16  7.4e-17  3.0e-17
            Med.    1.5e-07  1.4e-07  1.2e-07  1.6e-07  1.3e-07  1.4e-07  1.5e-07  1.7e-07  1.8e-07
            Max.    1.7e-03  1.6e-03  1.6e-03  1.6e-03  1.7e-03  1.7e-03  1.6e-03  1.6e-03  1.6e-03
Happy       Min.    1.7e-17  1.6e-17  3.9e-23  5.8e-19  6.5e-24  6.7e-17  3.8e-18  1.4e-17  4.7e-18
            Med.    3.1e-08  3.2e-08  1.2e-08  1.7e-08  8.0e-09  3.4e-08  2.2e-08  2.0e-08  2.0e-08
            Max.    1.5e-03  1.4e-03  1.6e-03  1.6e-03  1.5e-03  1.6e-03  1.6e-03  1.5e-03  1.5e-03
Anger       Min.    6.2e-19  1.7e-17  3.9e-22  1.6e-18  6.5e-24  3.9e-17  1.3e-18  7.3e-19  6.4e-19
            Med.    2.8e-08  3.2e-08  2.5e-08  2.7e-08  2.4e-08  3.3e-08  2.0e-08  3.4e-08  2.1e-08
            Max.    1.8e-03  1.9e-03  1.6e-03  1.8e-03  1.7e-03  1.9e-03  1.8e-03  1.7e-03  1.9e-03
Fear        Min.    7.8e-21  1.6e-17  2.2e-21  3.1e-17  1.7e-22  2.3e-17  5.2e-18  1.2e-17  3.5e-18
            Med.    6.4e-08  8.0e-08  6.1e-08  6.2e-08  6.3e-08  7.8e-08  6.9e-08  9.0e-08  6.2e-08
            Max.    1.5e-03  1.5e-03  1.5e-03  1.5e-03  1.5e-03  1.4e-03  1.4e-03  1.5e-03  1.4e-03
Disgust     Min.    3.7e-19  2.9e-19  4.1e-22  1.2e-18  2.1e-22  3.0e-17  2.6e-18  2.9e-18  3.4e-19
            Med.    5.6e-08  5.3e-08  5.8e-08  4.8e-08  4.2e-08  4.2e-08  5.2e-08  5.6e-08  5.9e-08
            Max.    2.0e-03  2.0e-03  2.0e-03  2.0e-03  2.1e-03  2.1e-03  2.1e-03  2.0e-03  2.1e-03
Pop. Mean   Min.    8.3e-07  6.8e-07  7.7e-08  5.9e-07  4.8e-08  9.0e-07  6.8e-07  6.0e-07  5.8e-07
            Med.    1.6e-05  1.5e-05  1.1e-05  1.4e-05  1.1e-05  1.4e-05  1.4e-05  1.2e-05  1.3e-05
            Max.    3.2e-04  2.8e-04  4.6e-04  3.0e-04  6.7e-04  2.2e-04  2.8e-04  2.6e-04  2.5e-04

4. Bayesian Spatial Point Process Classifier

In this section, we present the importance sampling approach to sampling $T_{n+1}$. We have the following proposition.

Proposition 2. The posterior predictive distribution of $T_{n+1}$ is given by
\[
(8)\qquad \Pr[T_{n+1} = j \mid \mathbf{x}_{n+1}, \mathcal{D}_n]
= \frac{p_j\int\pi(\mathbf{x}_{n+1} \mid T_{n+1} = j, \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}
{\sum_{j'=1}^J p_{j'}\int\pi(\mathbf{x}_{n+1} \mid T_{n+1} = j', \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta},
\]
where
\[
\pi(\mathbf{x}_{n+1} \mid T_{n+1} = j, \Theta)
= \exp\left\{|\mathcal{B}| - \int_{\mathcal{B}}\lambda_j(x \mid \Theta)\,dx\right\}\prod_{x \in \mathbf{x}_{n+1}}\lambda_j(x \mid \Theta).
\]

Proof. First, we notice that
\begin{align*}
(9)\qquad \pi(\Theta \mid \mathbf{x}_{n+1}, \mathcal{D}_n)
&= \frac{\pi(\Theta \mid \mathbf{x}_{n+1}, \mathcal{D}_n)}{\pi(\Theta \mid \mathcal{D}_n)}\,\pi(\Theta \mid \mathcal{D}_n)
= \frac{\pi(\mathbf{x}_{n+1}, \mathcal{D}_n \mid \Theta)\,\pi(\Theta)}{\pi(\mathbf{x}_{n+1}, \mathcal{D}_n)}\cdot\frac{\pi(\mathcal{D}_n)}{\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}\cdot\pi(\Theta \mid \mathcal{D}_n) \\
&= \frac{\pi(\mathbf{x}_{n+1} \mid \Theta)\,\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}{\pi(\mathbf{x}_{n+1}, \mathcal{D}_n)}\cdot\frac{\pi(\mathcal{D}_n)}{\pi(\mathcal{D}_n \mid \Theta)\,\pi(\Theta)}\cdot\pi(\Theta \mid \mathcal{D}_n)
= \frac{\pi(\mathbf{x}_{n+1} \mid \Theta)}{\pi(\mathbf{x}_{n+1} \mid \mathcal{D}_n)}\,\pi(\Theta \mid \mathcal{D}_n)
= \frac{\pi(\mathbf{x}_{n+1} \mid \Theta)\,\pi(\Theta \mid \mathcal{D}_n)}{\int\pi(\mathbf{x}_{n+1} \mid \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}.
\end{align*}
Thus, the probability becomes
\[
(10)\qquad \Pr[T_{n+1} = j \mid \mathbf{x}_{n+1}, \mathcal{D}_n]
= \frac{p_j\int\pi(\mathbf{x}_{n+1} \mid T_{n+1} = j, \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}
{\sum_{j'=1}^J p_{j'}\int\pi(\mathbf{x}_{n+1} \mid T_{n+1} = j', \Theta)\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta}.
\]

This proposition leads to the following algorithm, used to estimate the posterior predictive probability for reverse inference.

Reverse inference algorithm.
• Input: the observed data $\mathcal{D}_n$, a foci pattern $\mathbf{x}_{n+1}$ reported from a new study, and the total number of simulations $K$.
• Step 1: Run a Bayesian spatial point process model to obtain the posterior draws $\Theta^{(k)} \sim \pi(\Theta \mid \mathcal{D}_n)$, for $k = 1, \ldots, K$.
• Step 2: Compute
\[
\pi_j^{(k)} = \exp\left\{-\int_{\mathcal{B}}\lambda_j(x \mid \Theta^{(k)})\,dx\right\}\prod_{x \in \mathbf{x}_{n+1}}\lambda_j(x \mid \Theta^{(k)}).
\]
• Output: the posterior predictive probability is given by
\[
(11)\qquad \widehat{\Pr}[T_{n+1} = j \mid \mathbf{x}_{n+1}, \mathcal{D}_n]
= \frac{p_j\sum_{k=1}^K \pi_j^{(k)}}{\sum_{j'=1}^J p_{j'}\sum_{k=1}^K \pi_{j'}^{(k)}}.
\]

The prediction of $T_{n+1}$ is given by
\[
(12)\qquad \hat{T}_{n+1} = \arg\max_j\left(p_j\sum_{k=1}^K \pi_j^{(k)}\right).
\]
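The following is a minimal sketch of the reverse inference algorithm. It assumes the fitted model exposes two user-supplied functions, `intensity(j, x, theta_k)` returning $\lambda_j(x \mid \Theta^{(k)})$ at the rows of `x` and `intensity_integral(j, theta_k)` returning $\int_{\mathcal{B}}\lambda_j(x \mid \Theta^{(k)})\,dx$; both names are placeholders. The constant factor $\exp(|\mathcal{B}|)$ in (8) is omitted because it cancels across types, and the averaging in (11) is done on the log scale for numerical stability.

```python
import numpy as np

def reverse_inference(x_new, post_draws, p, intensity, intensity_integral):
    """Estimate Pr[T_{n+1} = j | x_{n+1}, D_n] as in (11) and the prediction (12).

    x_new      : (m, d) array of foci reported by the new study
    post_draws : list of posterior draws Theta^(k), k = 1, ..., K
    p          : (J,) prior type probabilities p_j
    """
    J = len(p)
    log_pi = np.empty((len(post_draws), J))                 # log pi_j^(k)
    for k, theta_k in enumerate(post_draws):
        for j in range(J):
            log_pi[k, j] = (np.sum(np.log(intensity(j, x_new, theta_k)))
                            - intensity_integral(j, theta_k))
    c = log_pi.max()                                        # common constant, cancels in the ratio
    weights = p * np.exp(log_pi - c).mean(axis=0)           # proportional to p_j sum_k pi_j^(k)
    prob = weights / weights.sum()
    return prob, int(np.argmax(prob))                       # predictive probabilities and T_hat_{n+1}
```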
To evaluate the performance of our method, we are interested in computing the leave-one-out cross-validation (LOOCV) classification rates. More specifically, we use the data $\mathcal{D}_{-i} = \{(\mathbf{x}_l, t_l)\}_{l \neq i}$ to make a prediction of $T_i$, denoted by $\hat{T}_i$, and focus on the $J \times J$ LOOCV confusion matrix $C = \{c_{jj'}\}$, defined by
\[
(13)\qquad c_{jj'} = \frac{\sum_{i=1}^n I_j(t_i)\, I_{j'}(\hat{T}_i)}{\sum_{i=1}^n I_j(t_i)},
\]
where $I_a(b)$ is an indicator function: $I_a(b) = 1$ if $a = b$, and $I_a(b) = 0$ otherwise. Then the overall and the average correct classification rates are respectively given by
\[
(14)\qquad c_o = \frac{1}{n}\sum_{i=1}^n I_{t_i}(\hat{T}_i), \qquad\text{and}\qquad c_a = \frac{1}{J}\sum_{j=1}^J c_{jj}.
\]
In order to obtain $\hat{T}_i$, we compute the LOOCV predictive probabilities. For $i = 1, \ldots, n$,
\[
(15)\qquad \Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \int\Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}, \Theta]\,\pi(\Theta \mid \mathbf{x}_i, \mathcal{D}_{-i})\,d\Theta,
\]
which can be estimated via Monte Carlo simulation. However, it is not straightforward, and very inefficient, to draw $\Theta$ from $\pi(\Theta \mid \mathbf{x}_i, \mathcal{D}_{-i})$ for each $i$. Thus, to avoid having to run multiple posterior simulations, we consider another representation of (15) in the following proposition.

Proposition 3. The LOOCV predictive probabilities of $T_i$, for $i = 1, \ldots, n$, are given by
\[
(16)\qquad \Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \frac{p_j Q_{j,t_i}}{\sum_{j'=1}^J p_{j'} Q_{j',t_i}},
\]
where
\[
Q_{jj'} = \int\frac{\pi(\mathbf{x}_i \mid T_i = j, \Theta)}{\pi(\mathbf{x}_i \mid T_i = j', \Theta)}\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta.
\]

Proof. The LOOCV posterior predictive probability is
\[
\Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \int\Pr[T_i = j \mid \mathbf{x}_i, \Theta]\,\pi(\Theta \mid \mathbf{x}_i, \mathcal{D}_{-i})\,d\Theta
= \int\Pr[T_i = j \mid \mathbf{x}_i, \Theta]\,\frac{\pi(\Theta \mid \mathbf{x}_i, \mathcal{D}_{-i})}{\pi(\Theta \mid \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i})}\,\pi(\Theta \mid \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i})\,d\Theta.
\]
Note that
\[
\frac{\pi(\Theta \mid \mathbf{x}_i, \mathcal{D}_{-i})}{\pi(\Theta \mid \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i})}
= \frac{\pi(\Theta, \mathbf{x}_i, \mathcal{D}_{-i})\,\pi(\mathbf{x}_i, T_i = t_i, \mathbf{x}_{-i}, \mathbf{t}_{-i})}{\pi(\mathbf{x}_i, \mathcal{D}_{-i})\,\pi(\Theta, \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i})}
= \frac{\Pr[T_i = t_i \mid \mathbf{x}_i, \mathcal{D}_{-i}]}{\Pr[T_i = t_i \mid \mathbf{x}_i, \Theta]}.
\]
This implies that
\[
\frac{\Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}]}{\Pr[T_i = t_i \mid \mathbf{x}_i, \mathcal{D}_{-i}]}
= \int\frac{\Pr[T_i = j \mid \mathbf{x}_i, \Theta]}{\Pr[T_i = t_i \mid \mathbf{x}_i, \Theta]}\,\pi(\Theta \mid \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i})\,d\Theta
= \int\frac{\pi(\mathbf{x}_i \mid T_i = j, \Theta)\,p_j}{\pi(\mathbf{x}_i \mid T_i = t_i, \Theta)\,p_{t_i}}\,\pi(\Theta \mid \mathcal{D}_n)\,d\Theta
= \frac{p_j}{p_{t_i}}\,Q_{j,t_i},
\]
where we use the fact that $\pi(\Theta \mid \mathbf{x}_i, T_i = t_i, \mathcal{D}_{-i}) = \pi(\Theta \mid \mathcal{D}_n)$. By $\sum_{j=1}^J \Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = 1$, we have that
\[
\frac{1 - \Pr[T_i = t_i \mid \mathbf{x}_i, \mathcal{D}_{-i}]}{\Pr[T_i = t_i \mid \mathbf{x}_i, \mathcal{D}_{-i}]} = \sum_{j \neq t_i}\frac{p_j}{p_{t_i}}\,Q_{j,t_i}.
\]
This implies, noting that $Q_{t_i,t_i} = 1$,
\[
\Pr[T_i = t_i \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \frac{p_{t_i}}{\sum_{j'=1}^J p_{j'} Q_{j',t_i}},
\qquad
\Pr[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \frac{p_j Q_{j,t_i}}{\sum_{j'=1}^J p_{j'} Q_{j',t_i}}.
\]

This proposition leads to the following algorithm.

LOOCV algorithm.
• Input: the observed data $\mathcal{D}_n$ and the total number of simulations $K$.
• Step 1: Run a Bayesian spatial point process model to obtain the posterior draws $\Theta^{(k)} \sim \pi(\Theta \mid \mathcal{D}_n)$, for $k = 1, \ldots, K$.
• Step 2: For $i = 1, \ldots, n$ and $j = 1, \ldots, J$, compute
\[
(17)\qquad \hat{Q}_{j,t_i} = \frac{1}{K}\sum_{k=1}^K\frac{\pi(\mathbf{x}_i \mid T_i = j, \Theta^{(k)})}{\pi(\mathbf{x}_i \mid T_i = t_i, \Theta^{(k)})}.
\]
• Output: the estimates of the LOOCV predictive probabilities of $T_i$, for $i = 1, \ldots, n$, are given by
\[
(18)\qquad \widehat{\Pr}[T_i = j \mid \mathbf{x}_i, \mathcal{D}_{-i}] = \frac{p_j\hat{Q}_{j,t_i}}{\sum_{j'=1}^J p_{j'}\hat{Q}_{j',t_i}},
\]
and the estimate of $T_i$ is
\[
(19)\qquad \hat{T}_i = \arg\max_j\left(p_j\hat{Q}_{j,t_i}\right).
\]

References

Møller, J. and Waagepetersen, R. P. (2004), Statistical Inference and Simulation for Spatial Point Processes, Chapman and Hall/CRC.

Smith, A. and Roberts, G. (1993), "Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods," Journal of the Royal Statistical Society, Series B (Methodological), 3–23.

Wolpert, R. L. and Ickstadt, K. (1998), "Poisson/gamma random field models for spatial statistics," Biometrika, 85, 251–267.