Supplementary information (doc 150K)

Prevalence and Sibling Recurrence Risk Based on the Multi-Risk Model
We let $X_i$ represent the dummy variable denoting the affected status of a sibling, where $X_i = 1$ if sibling $i$ is affected and $X_i = 0$ if sibling $i$ is not affected, for $i = 1, 2, \ldots, n$. Similarly, we let $Y_i$ represent the dummy variable denoting the sex of a sibling, where $Y_i = 1$ if sibling $i$ is male and $Y_i = 0$ if sibling $i$ is female, for $i = 1, 2, \ldots, n$.

It follows, using the notation of this article, that:

$$P(X_i = 1 \mid Y_i = 1) = \sum_{i=1}^{K} a_i x_i \quad (= R_m)$$

$$P(X_i = 1 \mid Y_i = 0) = p \sum_{i=1}^{K} a_i x_i \quad (= pR_m)$$

When the male proportion, $P(Y_i = 1)$, and the female proportion, $P(Y_i = 0)$, are denoted by $q_m$ and $q_f$, respectively, then the prevalence, $R = P(X_i = 1)$, equals:

$$\begin{aligned}
R &= P(Y_i = 1)\, P(X_i = 1 \mid Y_i = 1) + P(Y_i = 0)\, P(X_i = 1 \mid Y_i = 0) \\
&= (q_m + p q_f) \sum_{i=1}^{K} a_i x_i \quad (= (q_m + p q_f) R_m) \qquad (3)
\end{aligned}$$

Moreover, it follows, using the notation of this article, that:

$$P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1) = \sum_{i=1}^{K} a_i x_i^2 \quad (= S_m)$$

$$P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 0) = P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1) = p \sum_{i=1}^{K} a_i x_i^2 \quad (= pS_m)$$

$$P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0) = p^2 \sum_{i=1}^{K} a_i x_i^2 \quad (= p^2 S_m)$$

When the risk that the sibling of the proband will be affected, given that the proband is male, is termed the sibling recurrence risk given by the male proband, this equals:

$$\begin{aligned}
P(X_j = 1 \mid X_i = 1, Y_i = 1)
&= \frac{P(X_i = 1, X_j = 1, Y_i = 1, Y_j = 1) + P(X_i = 1, X_j = 1, Y_i = 1, Y_j = 0)}{P(X_i = 1, Y_i = 1)} \\
&= \frac{P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1)\, q_m^2 + P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 0)\, q_m q_f}{q_m R_m} \\
&= (q_m + p q_f) S_m / R_m
\end{aligned}$$

Similarly, the sibling recurrence risk given by the female proband is:

$$\begin{aligned}
P(X_j = 1 \mid X_i = 1, Y_i = 0)
&= \frac{P(X_i = 1, X_j = 1, Y_i = 0, Y_j = 1) + P(X_i = 1, X_j = 1, Y_i = 0, Y_j = 0)}{P(X_i = 1, Y_i = 0)} \\
&= \frac{P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1)\, q_m q_f + P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0)\, q_f^2}{p q_f R_m} \\
&= (q_m + p q_f) S_m / R_m
\end{aligned}$$

Moreover, the joint probability that both siblings $i$ and $j$ are affected equals:

$$\begin{aligned}
P(X_j = 1, X_i = 1)
&= P(X_i = 1, X_j = 1 \mid Y_i = 1, Y_j = 1)\, q_m^2 + 2 P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 1)\, q_m q_f \\
&\quad + P(X_i = 1, X_j = 1 \mid Y_i = 0, Y_j = 0)\, q_f^2 \\
&= (q_m^2 + 2 p q_m q_f + p^2 q_f^2)\, S_m \\
&= (q_m + p q_f)^2 S_m
\end{aligned}$$

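Then, the sibling recurrence risk, $S$, equals:

$$\begin{aligned}
S = P(X_j = 1 \mid X_i = 1) &= P(X_j = 1, X_i = 1) / P(X_i = 1) \\
&= (q_m + p q_f)^2 S_m / \{(q_m + p q_f) R_m\} \\
&= (q_m + p q_f) S_m / R_m \\
&= (q_m + p q_f) \sum_{i=1}^{K} a_i x_i^2 \Big/ \sum_{i=1}^{K} a_i x_i \qquad (4)
\end{aligned}$$

For a continuous risk distribution with density $f(x)$ on $[0, 1]$, the sums $\sum_{i=1}^{K} a_i x_i^2$ and $\sum_{i=1}^{K} a_i x_i$ are replaced by the integrals $\int_0^1 x^2 f(x)\,dx$ and $\int_0^1 x f(x)\,dx$.
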
Therefore, the sibling recurrence risk, the sibling recurrence risk given by the male proband and the sibling recurrence risk given by the female proband are all equal under the model used.

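The equality of the three recurrence risks can be checked numerically. The Python sketch below evaluates each conditional probability directly from its definition and compares it with the closed form $(q_m + p q_f) S_m / R_m$. The class weights `a`, class risks `x`, female-to-male risk ratio `p` and male proportion `q_m` are hypothetical values chosen for illustration, not estimates from the article. As in the derivation, the sexes of the two siblings are taken to be independent.

```python
import math


def recurrence_risks(a, x, p, q_m):
    """Return (male-proband, female-proband, overall, closed-form) recurrence risks."""
    q_f = 1.0 - q_m
    r_m = sum(ai * xi for ai, xi in zip(a, x))        # R_m = sum of a_i * x_i
    s_m = sum(ai * xi ** 2 for ai, xi in zip(a, x))   # S_m = sum of a_i * x_i^2

    sex_prob = {1: q_m, 0: q_f}      # P(Y = y); 1 = male, 0 = female
    sex_factor = {1: 1.0, 0: p}      # risk multiplier relative to males

    def joint(yi, yj):
        # P(X_i = 1, X_j = 1, Y_i = yi, Y_j = yj)
        return sex_factor[yi] * sex_factor[yj] * s_m * sex_prob[yi] * sex_prob[yj]

    def affected(yi):
        # P(X_i = 1, Y_i = yi)
        return sex_factor[yi] * r_m * sex_prob[yi]

    male = (joint(1, 1) + joint(1, 0)) / affected(1)
    female = (joint(0, 1) + joint(0, 0)) / affected(0)
    overall = sum(joint(yi, yj) for yi in (0, 1) for yj in (0, 1)) / (
        affected(1) + affected(0)
    )
    closed_form = (q_m + p * q_f) * s_m / r_m
    return male, female, overall, closed_form


# Hypothetical inputs, for illustration only.
risks = recurrence_risks(a=[0.90, 0.09, 0.01], x=[0.005, 0.10, 0.50], p=0.25, q_m=0.51)
print(all(math.isclose(r, risks[3]) for r in risks))
```

The three definitional values agree with the closed form for any choice of inputs, because the factor contributed by the proband's sex cancels between numerator and denominator.
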
Log-likelihood Including the Prevalence Information, LLsample+supp
Suppose that the number of families in the catchment area is $T$ and that the families in the catchment area are partitioned into sampled families, $A$, and the other families, $A^c$. Then consider the decomposition:

$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
&= \prod_{i=1}^{T} \left\{ P(\mathbf{n}_i \mid m_i \ge 1)\, P(m_i \ge 1) + P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0) \right\} \\
&= \prod_{i \in A} P(\mathbf{n}_i \mid m_i \ge 1)\, P(m_i \ge 1) \times \prod_{i \in A^c} P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0),
\end{aligned}$$

since $P(m_i = 0) = 0$ for sampled families, and $P(m_i \ge 1) = 0$ for non-sampled families.

As a first approximation for the non-sampled families, we neglect the heterogeneity of the risk among families and assume that the number of affected children, $m_i$, is distributed according to a binomial distribution: $m_i \sim \mathrm{Bin}(n_i, R)$. We suppose that $P(m_i \ge 1) \approx P(m_i = 1)$, because the probability of two or more siblings in a family becoming affected is small (0.0018, compared with the prevalence of 0.022 in the catchment area). In this case, the number of families in the sample ($N$) is approximately equal to the number of affected children ($M_1$), that is, $N \approx M_1$, because each family in the sample has one affected child. Then, $P(m_i = 1) = n_i R (1 - R)^{n_i - 1}$ and $P(m_i = 0) = (1 - R)^{n_i}$. Therefore, we obtain

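$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
&\approx \text{const} \times \prod_{i \in A} P(\mathbf{n}_i \mid m_i \ge 1) \times R^{N} (1 - R)^{N_1 - N} \\
&\approx \text{const} \times \prod_{i=1,\, m_i \ge 1}^{N} P(\mathbf{n}_i \mid m_i \ge 1) \times R^{M_1} (1 - R)^{N_1 - M_1},
\end{aligned}$$

which yields $LL_{\text{sample+supp}}(\theta)$. Note that $N_1$ refers to the number of all children in the catchment area and $P(\mathbf{n}_i \mid m_i = 0) = 1$ under the binomial assumption.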
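
The collapse of the binomial terms into $R^{N}(1 - R)^{N_1 - N}$ can be verified with a few lines of Python. This is a minimal sketch: the sibship sizes are made up for illustration, `supp_loglik` is a hypothetical helper name, and only the prevalence value 0.022 comes from the text.

```python
import math


def log_binom_pmf(m, n, r):
    """Log of the Bin(n, r) probability of m affected children."""
    return math.log(math.comb(n, m)) + m * math.log(r) + (n - m) * math.log(1.0 - r)


def supp_loglik(r, m1, n1):
    """Prevalence term M1 * log(R) + (N1 - M1) * log(1 - R)."""
    return m1 * math.log(r) + (n1 - m1) * math.log(1.0 - r)


# Hypothetical sibship sizes n_i, for illustration only.
sampled_sizes = [2, 3, 1, 4]         # families in A, one affected child each
unsampled_sizes = [1, 2, 2, 3, 5]    # families in A^c, no affected child
R = 0.022                            # prevalence quoted in the text

exact = sum(log_binom_pmf(1, n, R) for n in sampled_sizes) + sum(
    log_binom_pmf(0, n, R) for n in unsampled_sizes
)
N = len(sampled_sizes)                             # equals M1 in this setting
N1 = sum(sampled_sizes) + sum(unsampled_sizes)     # all children
const = sum(math.log(n) for n in sampled_sizes)    # does not depend on R

print(math.isclose(exact, const + supp_loglik(R, N, N1)))
```

The constant collects the factors $n_i$ of the sampled families, which do not involve $R$ and therefore do not affect maximization or sampling over the model parameters.
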

For the AGRE sample, suppose that the number of families in the catchment area is $T$ and that the families in the catchment area are partitioned into the families with two or more affected children, $A_2$, the families with one affected child, $A_1$, and the families with no affected child, $A_0$. Then consider the decomposition as in our sample:

$$\prod_{i=1}^{T} P(\mathbf{n}_i) = \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2)\, P(m_i \ge 2) \times \prod_{i \in A_1} P(\mathbf{n}_i \mid m_i = 1)\, P(m_i = 1) \times \prod_{i \in A_0} P(\mathbf{n}_i \mid m_i = 0)\, P(m_i = 0)$$

We suppose that $P(m_i \ge 2) \approx P(m_i = 2)$, because the probability of three or more siblings in a family becoming affected is small. If we assume that the number of affected children, $m_i$, is distributed according to a binomial distribution, $m_i \sim \mathrm{Bin}(n_i, R)$, then $P(m_i = 2) = n_i (n_i - 1) R^2 (1 - R)^{n_i - 2} / 2$, $P(m_i = 1) = n_i R (1 - R)^{n_i - 1}$, and $P(m_i = 0) = (1 - R)^{n_i}$. Here note that $P(\mathbf{n}_i \mid m_i = 1) = P(\mathbf{n}_i \mid m_i = 0) = 1$ under the binomial assumption. Therefore,

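$$\begin{aligned}
\prod_{i=1}^{T} P(\mathbf{n}_i)
&\approx \text{const} \times \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2) \times \prod_{i \in A_2} P(m_i = 2) \times \prod_{i \in A_1} P(m_i = 1) \times \prod_{i \in A_0} P(m_i = 0) \\
&\approx \text{const} \times \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2) \times R^{\,2|A_2| + |A_1|} \times (1 - R)^{\sum_{i \in A_2} (n_i - 2) + \sum_{i \in A_1} (n_i - 1) + \sum_{i \in A_0} n_i} \\
&= \text{const} \times \prod_{i \in A_2} P(\mathbf{n}_i \mid m_i \ge 2) \times R^{M_1} (1 - R)^{N_1 - M_1}
\end{aligned}$$

which yields $LL_{\text{sample+supp}}(\theta)$ for the AGRE sample.
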
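The last step above relies on two counting identities: $2|A_2| + |A_1| = M_1$ and $\sum_{i \in A_2}(n_i - 2) + \sum_{i \in A_1}(n_i - 1) + \sum_{i \in A_0} n_i = N_1 - M_1$. A small Python sketch, using a made-up list of families in which no family has more than two affected children, confirms the bookkeeping:

```python
# Hypothetical (n_i, m_i) pairs: sibship size and number of affected children.
families = [(3, 2), (2, 2), (4, 1), (2, 1), (1, 1), (3, 0), (2, 0)]

a2 = [n for n, m in families if m == 2]   # sizes of families in A_2
a1 = [n for n, m in families if m == 1]   # sizes of families in A_1
a0 = [n for n, m in families if m == 0]   # sizes of families in A_0

M1 = sum(m for _, m in families)          # affected children
N1 = sum(n for n, _ in families)          # all children

r_exponent = 2 * len(a2) + len(a1)
one_minus_r_exponent = sum(n - 2 for n in a2) + sum(n - 1 for n in a1) + sum(a0)

print(r_exponent == M1, one_minus_r_exponent == N1 - M1)
```
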
MCMC Sampling For a Discrete Model
Given observation $X$, we sample from the joint distribution of the model parameters, $\theta = (x, a, w)$, where $x = \{x_i\}$ and $a = \{a_i\}$ ($1 \le i \le K$). In brief, the approach is as follows:
1. Update w, given x and a.
2. Update a, given w and x.
3. Update x, given a and w.

These steps are iterated many times to obtain samples from a Markov chain whose stationary distribution is the joint posterior distribution of all model parameters.

1. To update $w$, we first propose a new value $w'$ from $\mathrm{Beta}(\alpha_0, \beta_0)$, where $\mathrm{Beta}(\alpha, \beta)$ represents the beta distribution with parameters $\alpha$ and $\beta$, and $\alpha_0 = K_w \times w$ and $\beta_0 = K_w \times (1 - w)$. We set $\alpha_1 = K_w \times w'$ and $\beta_1 = K_w \times (1 - w')$. Then, we use a Metropolis-Hastings step to accept or reject this new value. The new configuration is accepted with probability:

$$\min\left\{1,\ \frac{\mathrm{Beta}(w \mid \alpha_1, \beta_1)\, P(X \mid x, a, w')}{\mathrm{Beta}(w' \mid \alpha_0, \beta_0)\, P(X \mid x, a, w)}\right\}$$

2. To describe this update, we use the Dirichlet distribution, $\mathrm{Dir}(\gamma)$, where $\gamma = \{\gamma_i\}$, $i = 1, \ldots, K$, while the prior $\gamma_0$ is specified as $\gamma_0 = K_\gamma \times a$. We propose a new value $a'$ for $a$ from $\mathrm{Dir}(\gamma_0)$, and we set $\gamma_1 = K_\gamma \times a'$. The new proposed values are then accepted with probability:

$$\min\left\{1,\ \frac{\mathrm{Dir}(a \mid \gamma_1)\, P(X \mid x, a', w)}{\mathrm{Dir}(a' \mid \gamma_0)\, P(X \mid x, a, w)}\right\}$$

3. To describe this update, we introduce additional notation. Let $U_{i+1}$ denote the interval between $x_i$ and $x_{i+1}$ ($1 \le i \le K$, $U_1 = x_1$ and $U_{K+1} = 1 - x_K$), with $u$ denoting the vector of these intervals, and set $\mu_0 = K_\mu \times u$. We propose a new value $u'$ from $\mathrm{Dir}(\mu_0)$, and we set $\mu_1 = K_\mu \times u'$. From $u'$, we can obtain $x'$. Then, the new proposed values are accepted with probability:

$$\min\left\{1,\ \frac{\mathrm{Dir}(u \mid \mu_1)\, P(X \mid x', a, w)}{\mathrm{Dir}(u' \mid \mu_0)\, P(X \mid x, a, w)}\right\}$$

In practice, we adjusted the values of $K_w$, $K_\gamma$ and $K_\mu$ to 10-100, 500-10000 and 500-10000, respectively, to achieve good mixing of the chain.
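
As an illustration of the Dirichlet-proposal step for $a$, the following Python sketch implements one Metropolis-Hastings update with the acceptance ratio given above, working on the log scale. It uses only the standard library. The function names, the multinomial toy likelihood standing in for $P(X \mid x, a, w)$, and the counts are assumptions made for this example and do not come from the article.

```python
import math
import random


def log_dirichlet_pdf(v, gamma):
    """Log density of Dir(gamma) at the probability vector v."""
    log_norm = math.lgamma(sum(gamma)) - sum(math.lgamma(g) for g in gamma)
    return log_norm + sum((g - 1.0) * math.log(vi) for g, vi in zip(gamma, v))


def sample_dirichlet(gamma, rng):
    """Draw from Dir(gamma) by normalising independent gamma variates."""
    draws = [rng.gammavariate(g, 1.0) for g in gamma]
    total = sum(draws)
    return [d / total for d in draws]


def update_a(a, log_lik, k_gamma, rng):
    """One Metropolis-Hastings update of a with a Dir(K_gamma * a) proposal."""
    gamma0 = [k_gamma * ai for ai in a]
    a_new = sample_dirichlet(gamma0, rng)
    if min(a_new) <= 0.0:             # guard against numerical underflow
        return a
    gamma1 = [k_gamma * ai for ai in a_new]
    log_ratio = (log_dirichlet_pdf(a, gamma1) + log_lik(a_new)
                 - log_dirichlet_pdf(a_new, gamma0) - log_lik(a))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return a_new
    return a


# Toy stand-in for log P(X | x, a, w): a multinomial kernel with made-up counts.
counts = [50, 30, 20]


def toy_log_lik(a):
    return sum(c * math.log(ai) for c, ai in zip(counts, a))


rng = random.Random(0)
a = [1.0 / 3.0] * 3
samples = []
for _ in range(20000):
    a = update_a(a, toy_log_lik, k_gamma=500.0, rng=rng)
    samples.append(a)
kept = samples[2000:]                 # discard burn-in
posterior_mean = [sum(s[i] for s in kept) / len(kept) for i in range(3)]
print(posterior_mean)
```

A larger `k_gamma` concentrates the proposal around the current value, which raises the acceptance rate but shrinks the step size. This is the trade-off behind tuning $K_\gamma$ for good mixing. The same pattern applies to the update of $x$ through the interval vector $u$.
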
MCMC Sampling For a Continuous Model
Given observation $X$, we sample from the joint distribution of the model parameters, $\theta = (\alpha, \beta, w)$. Briefly, the approach is as follows:
1. Update w, given α and β.
2. Update α, given w and β.
3. Update β, given α and w.

1. To update $w$, we first propose a new value $w'$ from $\mathrm{Beta}(\theta_0, \kappa_0)$, where $\mathrm{Beta}(\theta, \kappa)$ represents the beta distribution with parameters $\theta$ and $\kappa$, and $\theta_0 = K_w \times w$ and $\kappa_0 = K_w \times (1 - w)$. We set $\theta_1 = K_w \times w'$ and $\kappa_1 = K_w \times (1 - w')$. Then, we use a Metropolis-Hastings step to accept or reject the new value. The new configuration is accepted with probability:

$$\min\left\{1,\ \frac{\mathrm{Beta}(w \mid \theta_1, \kappa_1)\, P(X \mid \alpha, \beta, w')}{\mathrm{Beta}(w' \mid \theta_0, \kappa_0)\, P(X \mid \alpha, \beta, w)}\right\}$$

2. To update $\alpha$, we first propose a new value $\alpha'$ from $U((1 - K_\alpha)\alpha, (1 + K_\alpha)\alpha)$, where $U(\delta, \zeta)$ represents the uniform distribution over $[\delta, \zeta]$. Then, we use a Metropolis-Hastings step to accept or reject the new value. The new configuration is accepted with probability:

$$\min\left\{1,\ \frac{\alpha\, P(X \mid \alpha', \beta, w)}{\alpha'\, P(X \mid \alpha, \beta, w)}\right\}$$

3. To update $\beta$, we used the same procedure as that used for $\alpha$, replacing $\alpha$ with $\beta$ and replacing $K_\alpha$ with $K_\beta$.

In practice, we adjusted the values of $K_w$, $K_\alpha$ and $K_\beta$ to 10-100, 0.10-0.35 and 0.10-0.35, respectively, to achieve good mixing of the chain.
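
The three updates of the continuous model can be sketched in Python as follows, using only the standard library. The function names and the toy log-likelihood standing in for $\log P(X \mid \alpha, \beta, w)$ are assumptions made for this example. One step is added that the text does not state: a proposal from which the current value could not be proposed back is rejected, since only then is $\alpha / \alpha'$ the complete Hastings correction for the uniform proposal.

```python
import math
import random


def log_beta_pdf(v, shape1, shape2):
    """Log density of Beta(shape1, shape2) at v in (0, 1)."""
    log_norm = (math.lgamma(shape1 + shape2)
                - math.lgamma(shape1) - math.lgamma(shape2))
    return log_norm + (shape1 - 1.0) * math.log(v) + (shape2 - 1.0) * math.log(1.0 - v)


def update_w(w, log_lik, k_w, rng):
    """Beta(K_w * w, K_w * (1 - w)) proposal with the acceptance ratio from the text."""
    t0, k0 = k_w * w, k_w * (1.0 - w)
    w_new = rng.betavariate(t0, k0)
    if not 0.0 < w_new < 1.0:
        return w
    t1, k1 = k_w * w_new, k_w * (1.0 - w_new)
    log_ratio = (log_beta_pdf(w, t1, k1) + log_lik(w_new)
                 - log_beta_pdf(w_new, t0, k0) - log_lik(w))
    return w_new if rng.random() < math.exp(min(0.0, log_ratio)) else w


def update_scale(value, log_lik, k, rng):
    """U((1 - K) * value, (1 + K) * value) proposal for alpha or beta."""
    new = rng.uniform((1.0 - k) * value, (1.0 + k) * value)
    # Assumption added here, not stated in the text: reject moves that could
    # not be proposed in reverse.
    if new <= 0.0 or not (1.0 - k) * new <= value <= (1.0 + k) * new:
        return value
    log_ratio = math.log(value) - math.log(new) + log_lik(new) - log_lik(value)
    return new if rng.random() < math.exp(min(0.0, log_ratio)) else value


# Toy stand-in for log P(X | alpha, beta, w); it factorises only to keep the demo short.
def toy_log_lik(alpha, beta, w):
    return (20.0 * math.log(alpha) - 10.0 * alpha
            + 10.0 * math.log(beta) - 2.0 * beta
            + 30.0 * math.log(w) + 70.0 * math.log(1.0 - w))


rng = random.Random(1)
alpha, beta, w = 1.0, 1.0, 0.5
draws = []
for _ in range(20000):
    w = update_w(w, lambda v: toy_log_lik(alpha, beta, v), 50.0, rng)
    alpha = update_scale(alpha, lambda v: toy_log_lik(v, beta, w), 0.3, rng)
    beta = update_scale(beta, lambda v: toy_log_lik(alpha, v, w), 0.3, rng)
    draws.append((alpha, beta, w))
kept = draws[2000:]                   # discard burn-in
means = [sum(d[i] for d in kept) / len(kept) for i in range(3)]
print(means)
```

The values 50.0 and 0.3 fall inside the tuning ranges quoted above for $K_w$ and for $K_\alpha$, $K_\beta$. In a real analysis they would be adjusted by monitoring the acceptance rate and the autocorrelation of the chain.
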