Prevalence and Sibling Recurrence Risk Based on the Multi-Risk Model

We let $X_i$ be the dummy variable denoting the affectedness status of a sibling, where $X_i = 1$ if sibling $i$ is affected and $X_i = 0$ if sibling $i$ is not affected, for $i = 1, 2, \ldots, n$. Similarly, we let $Y_i$ be the dummy variable denoting the sex of a sibling, where $Y_i = 1$ if sibling $i$ is male and $Y_i = 0$ if sibling $i$ is female, for $i = 1, 2, \ldots, n$.

It follows, using the notation of this article, that:

$$P(X_i = 1 \mid Y_i = 1) = \sum_{i=1}^{K} a_i x_i \quad (= R_m)$$

$$P(X_i = 1 \mid Y_i = 0) = p \sum_{i=1}^{K} a_i x_i \quad (= pR_m)$$

When the male proportion, $P(Y_i = 1)$, and the female proportion, $P(Y_i = 0)$, are denoted by $q_m$ and $q_f$, respectively, the prevalence, $R = P(X_i = 1)$, equals:

$$\begin{aligned}
R &= P(Y_i = 1)\,P(X_i = 1 \mid Y_i = 1) + P(Y_i = 0)\,P(X_i = 1 \mid Y_i = 0) \\
  &= (q_m + p q_f) \sum_{i=1}^{K} a_i x_i \quad (= (q_m + p q_f) R_m) \qquad (3)
\end{aligned}$$

Moreover, it follows, using the notation of this article, that:

$$P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1) = \sum_{i=1}^{K} a_i x_i^2 \quad (= S_m)$$

$$P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 0) = P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1) = p \sum_{i=1}^{K} a_i x_i^2 \quad (= pS_m)$$

$$P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0) = p^2 \sum_{i=1}^{K} a_i x_i^2 \quad (= p^2 S_m)$$

We call the risk that a sibling of the proband is affected, given that the proband is male, the sibling recurrence risk given by the male proband. It equals:

$$\begin{aligned}
P(X_j = 1 \mid X_i = 1, Y_i = 1)
 &= \{P(X_i = 1, X_j = 1, Y_i = 1, Y_j = 1) + P(X_i = 1, X_j = 1, Y_i = 1, Y_j = 0)\} / P(X_i = 1, Y_i = 1) \\
 &= \{P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1)\, q_m^2 + P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 0)\, q_m q_f\} / (q_m R_m) \\
 &= (q_m + p q_f) S_m / R_m
\end{aligned}$$

Similarly, the sibling recurrence risk given by the female proband is:

$$\begin{aligned}
P(X_j = 1 \mid X_i = 1, Y_i = 0)
 &= \{P(X_i = 1, X_j = 1, Y_i = 0, Y_j = 1) + P(X_i = 1, X_j = 1, Y_i = 0, Y_j = 0)\} / P(X_i = 1, Y_i = 0) \\
 &= \{P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1)\, q_m q_f + P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0)\, q_f^2\} / (p q_f R_m) \\
 &= (q_m + p q_f) S_m / R_m
\end{aligned}$$

Moreover, the joint probability that both siblings $i$ and $j$ are affected equals:

$$P(X_j = 1, X_i = 1) = P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1)\, q_m^2 + 2 P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1)\, q_m q_f + P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0)\, q_f^2$$
$$= (q_m^2 + 2 p q_m q_f + p^2 q_f^2)\, S_m = (q_m + p q_f)^2 S_m$$

Then, the sibling recurrence risk, $S$, equals:

$$\begin{aligned}
S = P(X_j = 1 \mid X_i = 1) &= P(X_j = 1, X_i = 1) / P(X_i = 1) \\
 &= (q_m + p q_f)^2 S_m / \{(q_m + p q_f) R_m\} \\
 &= (q_m + p q_f) S_m / R_m \\
 &= (q_m + p q_f) \sum_{i=1}^{K} a_i x_i^2 \Big/ \sum_{i=1}^{K} a_i x_i \qquad (4)
\end{aligned}$$

In the continuous model, $\int_0^1 x^2 f(x)\,dx$ takes the place of $\sum_{i=1}^{K} a_i x_i^2$. Therefore, the sibling recurrence risk, the sibling recurrence risk given by the male proband and the sibling recurrence risk given by the female proband are all equal under the model used.

Log-likelihood Including the Prevalence Information, LLsample+supp

Suppose that the number of families in the catchment area is $T$ and that the families in the catchment area are partitioned into sampled families, $A$, and the other families, $A^c$. Then consider the decomposition:

$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
 &= \prod_{i=1}^{T} \{P(\mathbf{n}_i \mid m_i \ge 1)\, P(m_i \ge 1) + P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0)\} \\
 &= \prod_{i \in A} P(\mathbf{n}_i \mid m_i \ge 1)\, P(m_i \ge 1) \cdot \prod_{i \in A^c} P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0),
\end{aligned}$$

since $P(m_i = 0) = 0$ for sampled families, and $P(m_i \ge 1) = 0$ for non-sampled families.

As a first approximation for the non-sampled families, we neglect the heterogeneity of the risk among families and assume that the number of affected children, $m_i$, follows a binomial distribution: $m_i \sim \mathrm{Bin}(n_i, R)$. We suppose that $P(m_i \ge 1) \approx P(m_i = 1)$, because the probability of two or more siblings in a family becoming affected is small (0.0018, compared with the prevalence of 0.022 in the catchment area). In this case, the number of families in the sample ($N$) is approximately equal to the number of affected children ($M_1$), that is, $N \approx M_1$, because each family in the sample has one affected child. Then, $P(m_i = 1) = n_i R (1 - R)^{n_i - 1}$ and $P(m_i = 0) = (1 - R)^{n_i}$. Therefore, we obtain

$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
 &\approx \mathrm{const} \prod_{i \in A} P(\mathbf{n}_i \mid m_i \ge 1)\; R^{N} (1 - R)^{N_1 - N} \\
 &\approx \mathrm{const} \prod_{i=1,\, m_i \ge 1}^{N} P(\mathbf{n}_i \mid m_i \ge 1)\; R^{M_1} (1 - R)^{N_1 - M_1},
\end{aligned}$$

which yields LLsample+supp(θ). Note that $N_1$ refers to the number of all children in the catchment area and that $P(\mathbf{n}_i \mid m_i = 0) = 1$ under the binomial assumption.
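The closed forms above are easy to check numerically. The following is a minimal Python sketch, not the authors' code: the function names and the example values of `a`, `x`, `p` and `q_m` are hypothetical and chosen only for illustration. It evaluates $R$ and $S$ from equations (3) and (4), recomputes the recurrence risk separately for male and female probands from the conditional joint probabilities, and evaluates the logarithm of the prevalence factor $R^{M_1}(1 - R)^{N_1 - M_1}$.

```python
import math


def discrete_model_summaries(a, x, p, q_m):
    """Return (R_m, S_m, R, S) for weights a, risks x and female/male risk ratio p."""
    q_f = 1.0 - q_m                                   # male and female proportions sum to 1
    R_m = sum(ai * xi for ai, xi in zip(a, x))        # P(X_i = 1 | male)
    S_m = sum(ai * xi ** 2 for ai, xi in zip(a, x))   # P(both affected | both male)
    mix = q_m + p * q_f
    R = mix * R_m                                     # prevalence, equation (3)
    S = mix * S_m / R_m                               # sibling recurrence risk, equation (4)
    return R_m, S_m, R, S


def recurrence_by_proband_sex(a, x, p, q_m):
    """Recurrence risk given a male proband and given a female proband."""
    q_f = 1.0 - q_m
    R_m, S_m, _, _ = discrete_model_summaries(a, x, p, q_m)
    male = (S_m * q_m ** 2 + p * S_m * q_m * q_f) / (q_m * R_m)
    female = (p * S_m * q_m * q_f + p ** 2 * S_m * q_f ** 2) / (p * q_f * R_m)
    return male, female


def prevalence_log_term(R, M1, N1):
    """log of R^M1 * (1 - R)^(N1 - M1), the prevalence factor of the likelihood."""
    return M1 * math.log(R) + (N1 - M1) * math.log(1.0 - R)


# Hypothetical two-class example: 90% of families at low risk, 10% at high risk.
a, x, p, q_m = [0.9, 0.1], [0.01, 0.5], 0.25, 0.5
R_m, S_m, R, S = discrete_model_summaries(a, x, p, q_m)
male, female = recurrence_by_proband_sex(a, x, p, q_m)
print(R, S, male, female)
```

For any admissible inputs the three recurrence risks printed at the end should coincide, which is the equality stated after equation (4).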
For the AGRE sample, suppose that the number of families in the catchment area is $T$ and that the families in the catchment area are partitioned into the families with two or more affected children, $A_2$, the families with one affected child, $A_1$, and the families with no affected child, $A_0$. Then consider the decomposition as in our sample:

$$\prod_{i=1}^{T} P(\mathbf{n}_i) = \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2)\, P(m_i \ge 2) \prod_{i \in A_1} P(\mathbf{n}_i \mid m_i = 1)\, P(m_i = 1) \prod_{i \in A_0} P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0)$$

We suppose that $P(m_i \ge 2) \approx P(m_i = 2)$, because the probability of three or more siblings in a family becoming affected is small. If we assume that the number of affected children, $m_i$, follows a binomial distribution, $m_i \sim \mathrm{Bin}(n_i, R)$, then $P(m_i = 2) = n_i (n_i - 1) R^2 (1 - R)^{n_i - 2} / 2$, $P(m_i = 1) = n_i R (1 - R)^{n_i - 1}$, and $P(m_i = 0) = (1 - R)^{n_i}$. Note that $P(\mathbf{n}_i \mid m_i = 1) = P(\mathbf{n}_i \mid m_i = 0) = 1$ under the binomial assumption. Therefore,

$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
 &\approx \mathrm{const} \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2) \prod_{i \in A_2} P(m_i = 2) \prod_{i \in A_1} P(m_i = 1) \prod_{i \in A_0} P(m_i = 0) \\
 &\approx \mathrm{const} \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2)\; R^{\,2|A_2| + |A_1|}\, (1 - R)^{\sum_{i \in A_2} (n_i - 2) + \sum_{i \in A_1} (n_i - 1) + \sum_{i \in A_0} n_i} \\
 &\approx \mathrm{const} \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2)\; R^{M_1} (1 - R)^{N_1 - M_1},
\end{aligned}$$

which yields LLsample+supp(θ) for the AGRE sample.

MCMC Sampling For a Discrete Model

Given observation $X$, we sample from the joint distribution of the model parameters, $\theta = (\mathbf{x}, \mathbf{a}, w)$, where $\mathbf{x} = \{x_i\}$ and $\mathbf{a} = \{a_i\}$ ($1 \le i \le K$). In brief, the approach is as follows:

1. Update $w$, given $\mathbf{x}$ and $\mathbf{a}$.
2. Update $\mathbf{a}$, given $w$ and $\mathbf{x}$.
3. Update $\mathbf{x}$, given $\mathbf{a}$ and $w$.

These steps are iterated many times to obtain samples from a Markov chain whose stationary distribution is the joint posterior distribution of all model parameters.

1. To update $w$, we first propose a new value $w'$ from $\mathrm{Beta}(\alpha_0, \beta_0)$, where $\mathrm{Beta}(\alpha, \beta)$ denotes the beta distribution with parameters $\alpha$ and $\beta$, and $\alpha_0 = K_w w$ and $\beta_0 = K_w (1 - w)$. We set $\alpha_1 = K_w w'$ and $\beta_1 = K_w (1 - w')$. Then, we use a Metropolis-Hastings step to accept or reject this new value.
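The new configuration is accepted with probability:

$$\min\left\{1,\; \frac{\mathrm{Beta}(w \mid \alpha_1, \beta_1)\, P(X \mid \mathbf{x}, \mathbf{a}, w')}{\mathrm{Beta}(w' \mid \alpha_0, \beta_0)\, P(X \mid \mathbf{x}, \mathbf{a}, w)}\right\}$$

2. To describe this update, we use the Dirichlet distribution, $\mathrm{Dir}(\boldsymbol{\gamma})$, where $\boldsymbol{\gamma} = \{\gamma_i\}$, $i = 1, \ldots, K$, while the prior $\boldsymbol{\gamma}_0$ is specified as $\boldsymbol{\gamma}_0 = K_\gamma \mathbf{a}$. We propose a new value $\mathbf{a}'$ for $\mathbf{a}$ from $\mathrm{Dir}(\boldsymbol{\gamma}_0)$, and we set $\boldsymbol{\gamma}_1 = K_\gamma \mathbf{a}'$. The new proposed values are then accepted with probability:

$$\min\left\{1,\; \frac{\mathrm{Dir}(\mathbf{a} \mid \boldsymbol{\gamma}_1)\, P(X \mid \mathbf{x}, \mathbf{a}', w)}{\mathrm{Dir}(\mathbf{a}' \mid \boldsymbol{\gamma}_0)\, P(X \mid \mathbf{x}, \mathbf{a}, w)}\right\}$$

3. To describe this update, we introduce additional notation. Let $U_{i+1}$ denote the interval between $x_i$ and $x_{i+1}$ ($1 \le i \le K - 1$), with $U_1 = x_1$ and $U_{K+1} = 1 - x_K$, write $\mathbf{u} = (U_1, \ldots, U_{K+1})$, and set $\boldsymbol{\mu}_0 = K_\mu \mathbf{u}$. We propose a new value $\mathbf{u}'$ from $\mathrm{Dir}(\boldsymbol{\mu}_0)$, and we set $\boldsymbol{\mu}_1 = K_\mu \mathbf{u}'$. From $\mathbf{u}'$, we obtain $\mathbf{x}'$. Then, the new proposed values are accepted with probability:

$$\min\left\{1,\; \frac{\mathrm{Dir}(\mathbf{u} \mid \boldsymbol{\mu}_1)\, P(X \mid \mathbf{x}', \mathbf{a}, w)}{\mathrm{Dir}(\mathbf{u}' \mid \boldsymbol{\mu}_0)\, P(X \mid \mathbf{x}, \mathbf{a}, w)}\right\}$$

In practice, we adjusted the values of $K_w$, $K_\gamma$ and $K_\mu$ to 10 to 100, 500 to 10000 and 500 to 10000, respectively, to achieve good mixing of the chain.

MCMC Sampling For a Continuous Model

Given observation $X$, we sample from the joint distribution of the model parameters, $\theta = (\alpha, \beta, w)$. Briefly, the approach is as follows:

1. Update $w$, given $\alpha$ and $\beta$.
2. Update $\alpha$, given $w$ and $\beta$.
3. Update $\beta$, given $\alpha$ and $w$.

1. To update $w$, we first propose a new value $w'$ from $\mathrm{Beta}(\theta_0, \kappa_0)$, where $\mathrm{Beta}(\theta, \kappa)$ denotes the beta distribution with parameters $\theta$ and $\kappa$, and $\theta_0 = K_w w$ and $\kappa_0 = K_w (1 - w)$. We set $\theta_1 = K_w w'$ and $\kappa_1 = K_w (1 - w')$. Then, we use a Metropolis-Hastings step to accept or reject the new value. The new configuration is accepted with probability:

$$\min\left\{1,\; \frac{\mathrm{Beta}(w \mid \theta_1, \kappa_1)\, P(X \mid \alpha, \beta, w')}{\mathrm{Beta}(w' \mid \theta_0, \kappa_0)\, P(X \mid \alpha, \beta, w)}\right\}$$

2. To update $\alpha$, we first propose a new value $\alpha'$ from $U((1 - K_\alpha)\alpha, (1 + K_\alpha)\alpha)$, where $U(\delta, \zeta)$ denotes the uniform distribution over $[\delta, \zeta]$. Then, we use a Metropolis-Hastings step to accept or reject the new value.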
The new configuration is accepted with probability:

$$\min\left\{1,\; \frac{\alpha\, P(X \mid \alpha', \beta, w)}{\alpha'\, P(X \mid \alpha, \beta, w)}\right\}$$

3. To update $\beta$, we used the same procedure as that used for $\alpha$, replacing $\alpha$ with $\beta$ and replacing $K_\alpha$ with $K_\beta$.

In practice, we adjusted the values of $K_w$, $K_\alpha$ and $K_\beta$ to 10 to 100, 0.10 to 0.35 and 0.10 to 0.35, respectively, to achieve good mixing of the chain.
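The $\alpha$ update described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used for the article: `update_alpha` and `toy_log_lik` are hypothetical names, and the toy log-likelihood is a stand-in for $\log P(X \mid \alpha, \beta, w)$, which is not reproduced here. The factor $\alpha / \alpha'$ in the acceptance probability arises because the width of the uniform proposal window scales with the current value.

```python
import math
import random


def toy_log_lik(alpha, beta, w):
    """Hypothetical stand-in for log P(X | alpha, beta, w); peaks at alpha = 2."""
    return -0.5 * (math.log(alpha) - math.log(2.0)) ** 2


def update_alpha(alpha, beta, w, log_lik, k_alpha, rng=random):
    """One Metropolis-Hastings update of alpha with a multiplicative uniform proposal."""
    # Propose alpha' ~ U((1 - K_alpha) * alpha, (1 + K_alpha) * alpha).
    proposal = rng.uniform((1.0 - k_alpha) * alpha, (1.0 + k_alpha) * alpha)
    # Log of alpha * P(X | alpha', beta, w) / (alpha' * P(X | alpha, beta, w)).
    log_ratio = (math.log(alpha) - math.log(proposal)
                 + log_lik(proposal, beta, w) - log_lik(alpha, beta, w))
    # Accept with probability min(1, ratio); otherwise keep the current value.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return alpha


# Short illustrative chain with a fixed seed; beta and w are held constant here.
rng = random.Random(0)
alpha = 1.0
samples = []
for _ in range(1000):
    alpha = update_alpha(alpha, beta=1.0, w=0.5, log_lik=toy_log_lik,
                         k_alpha=0.2, rng=rng)
    samples.append(alpha)
print(sum(samples) / len(samples))
```

Working on the log scale avoids underflow when the likelihood is a product over many families. The same function can serve for $\beta$ by swapping the roles of the two arguments and passing $K_\beta$.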