Supplementary Information
Validation of differential gene expression algorithms: Application comparing fold-change
estimation to hypothesis testing
Corey M. Yanofsky and David R. Bickel
A.1. Heuristic overview of the derivation of the Bayes factor
This document contains an explicit derivation of the Bayes factor used in the main paper for
both paired and unpaired data. In each case, there are two models for the data: the null model in
which the gene is equivalently expressed in the two conditions, and the alternative model in which
the gene is differentially expressed.
The derivation of the Bayes factor requires two components per model. The first component
is the probability distribution of the data conditional on some statistical parameters; this is termed
the likelihood function. The differential expression model always has one extra parameter to account for the difference in the gene's expression level between conditions.
The data are always modeled as:
observed datum = average data level + measurement error.
Here, the average data level is an unknown parameter. Throughout, the measurement errors are
assumed to be independent and identically distributed as Gaussian random variables. That is, for
all j,
$$p(\varepsilon_j \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}\,\varepsilon_j^2\right],$$
where $\varepsilon_j$ is the measurement error of the $j$th observation and $\sigma^2$ is the data variance.
The second component is the prior distribution of the model parameters, namely, the
baseline expression level of the gene and the experimental variability of the data. The prior
distribution summarizes everything that is known about the model parameters prior to observing
the data. Since the baseline expression level of the gene and the variability of the data are unknown, we use standard default priors for them.
The extra parameter in the alternative model measures the amount of differential expression. Here, we use what has been called a unit-information prior distribution, that is, a prior distribution that contains exactly as much information as one extra data point. The unit-information prior is weakly informative, so it will not unduly influence the results in favor of either model.
To calculate the Bayes factor, we marginalize the model parameters; that is, we integrate
the likelihood function with respect to the prior distribution, resulting in a prior predictive
distribution. The marginalization removes the nuisance parameters from the expression. The Bayes
factor is the ratio of the prior predictive distributions under the null and alternative models.
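As an illustration of this recipe, the following minimal Python sketch (assuming NumPy and SciPy; the data values are hypothetical) evaluates both prior predictive distributions for the paired model of section A.2 by direct numerical quadrature. The closed forms derived below replace these quadratures in practice.

```python
# A minimal numerical sketch of the marginalization recipe, assuming NumPy
# and SciPy; the data values are hypothetical.
import numpy as np
from scipy import integrate
from scipy.stats import norm

y = np.array([0.8, 1.1, 0.5, 1.4])  # paired differences for one gene

def predictive_m0(y):
    # p(y | M0): integrate the likelihood against the improper prior
    # p(sigma^2) proportional to 1/sigma^2.
    def integrand(s2):
        return (1.0 / s2) * np.prod(norm.pdf(y, loc=0.0, scale=np.sqrt(s2)))
    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return value

def predictive_m1(y, mu_alpha=0.0):
    # p(y | M1): also integrate over the extra parameter alpha, whose
    # unit-information prior is N(mu_alpha, sigma^2).
    def integrand(alpha, s2):  # inner variable first, per dblquad's convention
        prior = norm.pdf(alpha, loc=mu_alpha, scale=np.sqrt(s2))
        likelihood = np.prod(norm.pdf(y, loc=alpha, scale=np.sqrt(s2)))
        return (1.0 / s2) * prior * likelihood
    value, _ = integrate.dblquad(integrand, 0.0, np.inf, -np.inf, np.inf)
    return value

print(predictive_m0(y) / predictive_m1(y))  # the Bayes factor
```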
A.2. Derivations
Bayes factor for paired data
Suppose $n = n'$ and $x'_{i,j}$ is paired with $x_{i,j}$. Let
$$y_j = x'_{i,j} - x_{i,j}. \tag{S1}$$
The hypothesis of equivalent expression is
$$M_0\!: \; y_j = \varepsilon_j,$$
and the hypothesis of differential expression is
$$M_1\!: \; y_j = \alpha + \varepsilon_j.$$
Prior distributions
For both models, we set
$$p(\sigma^2) \propto \frac{1}{\sigma^2}.$$
For M1, we use the unit-information prior
$$p(\alpha \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \mu_\alpha)^2\right].$$
In the main text of the paper, the prior mean $\mu_\alpha$ is set to zero.
Null model prior predictive distribution
The prior predictive distribution of the data under M0 is
$$p(y \mid M_0) = \int_0^\infty p(\sigma^2) \prod_j p(y_j \mid \mu = 0, \sigma^2)\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{n\overline{y^2}}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) \left(\frac{n\overline{y^2}}{2}\right)^{-n/2},$$
where $\overline{y^2} = \frac{1}{n}\sum_j y_j^2$.
Alternative model prior predictive distribution
We define in advance:
$$\hat{\alpha} = \frac{n\bar{y} + \mu_\alpha}{n+1},$$
$$SSR_1 = (\mu_\alpha - \hat{\alpha})^2 + \sum_j (y_j - \hat{\alpha})^2.$$
After some algebra, we can derive
$$SSR_1 = n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}(\bar{y} - \mu_\alpha)^2.$$
Here, $SSR_1$ is the effective sum of squares of the residuals under M1. It is the sum of the SSR using the maximum likelihood estimator $\alpha_{MLE} = \bar{y}$ and a term that penalizes disagreement between the MLE and the prior mean.
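A quick numerical check of this identity, with hypothetical values, can guard against transcription errors:

```python
# Verify that the direct and closed-form expressions for SSR_1 agree;
# the data values are hypothetical.
import numpy as np

y = np.array([0.8, 1.1, 0.5, 1.4])
n, mu_alpha = len(y), 0.0
alpha_hat = (n * y.mean() + mu_alpha) / (n + 1)

ssr1_direct = (mu_alpha - alpha_hat) ** 2 + np.sum((y - alpha_hat) ** 2)
ssr1_closed = (n * (np.mean(y ** 2) - y.mean() ** 2)
               + n / (n + 1) * (y.mean() - mu_alpha) ** 2)
assert np.isclose(ssr1_direct, ssr1_closed)
```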
The prior predictive distribution of the data under M1 is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^\infty p(\sigma^2)\, p(\alpha \mid \sigma^2) \prod_j p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \left(\int_{-\infty}^\infty \exp\left\{-\frac{1}{2\sigma^2}\left[(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2\right]\right\} d\alpha\right) d\sigma^2.$$
Isolating the part of the integrand that is a quadratic expression in $\alpha$, we complete the square:
$$\begin{aligned}
(\alpha - \mu_\alpha)^2 &+ \sum_j (y_j - \alpha)^2 \\
&= \alpha^2 - 2\alpha\mu_\alpha + \mu_\alpha^2 + n\alpha^2 - 2\alpha n\bar{y} + n\overline{y^2} \\
&= (n+1)\alpha^2 - 2\alpha(n\bar{y} + \mu_\alpha) + \left(n\overline{y^2} + \mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \mu_\alpha^2 + n\bar{y}^2 - \frac{n^2\bar{y}^2 + 2n\bar{y}\mu_\alpha + \mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{1}{n+1}\left(n\bar{y}^2 - 2n\bar{y}\mu_\alpha + n\mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}(\bar{y} - \mu_\alpha)^2 \\
&= (n+1)(\alpha - \hat{\alpha})^2 + SSR_1.
\end{aligned}$$
Substituting back into the integral, we have
$$p(y \mid M_1) = (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+1}{2\sigma^2}(\alpha - \hat{\alpha})^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2} (n+1)^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) (n+1)^{-1/2} \left(\frac{SSR_1}{2}\right)^{-n/2}.$$
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{n+1}\, \left(\frac{SSR_1}{n\overline{y^2}}\right)^{n/2}. \tag{S2}$$
Equations (S1) and (S2) together are equivalent to equations (9), (11), and (12) of the main paper.
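For concreteness, the following Python sketch implements equations (S1) and (S2); the function and array names, and the example data, are illustrative only.

```python
# A sketch of equations (S1)-(S2) as code, assuming two equal-length arrays
# of paired log-expression values for one gene.
import numpy as np

def paired_bayes_factor(x_prime, x, mu_alpha=0.0):
    """Bayes factor of M0 (equivalent expression) over M1 (differential)."""
    y = np.asarray(x_prime) - np.asarray(x)          # (S1)
    n = len(y)
    y_bar, y2_bar = y.mean(), np.mean(y ** 2)
    ssr1 = (n * (y2_bar - y_bar ** 2)
            + n / (n + 1) * (y_bar - mu_alpha) ** 2)  # effective SSR under M1
    return np.sqrt(n + 1) * (ssr1 / (n * y2_bar)) ** (n / 2)  # (S2)

# Example: strongly shifted differences favor M1, so BF << 1.
rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=8)
print(paired_bayes_factor(x + 2.0, x))
```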
Bayes factor for two-sample data
Suppose that $x'_{i,j}$ and $x_{i,j}$ are independent. Define
$$y_j = x_{i,j}, \quad j = 1, \dots, n, \tag{S3}$$
$$y_{n+j} = x'_{i,j}, \quad j = 1, \dots, n'. \tag{S4}$$
The hypothesis of equivalent expression is
$$M_0\!: \; y_j = \beta + \varepsilon_j, \quad j = 1, \dots, n+n',$$
and the hypothesis of differential expression is
$$M_1\!: \; y_j = \beta + \varepsilon_j, \; j = 1, \dots, n; \qquad y_j = \alpha + \varepsilon_j, \; j = n+1, \dots, n+n'.$$
Preliminaries
To fix notation, let
$$\overline{y^2} = \frac{1}{n+n'}\sum_{j=1}^{n+n'} y_j^2, \qquad \bar{y} = \frac{1}{n+n'}\sum_{j=1}^{n+n'} y_j,$$
$$\bar{y}_a = \frac{1}{n'}\sum_{j=n+1}^{n+n'} y_j, \qquad \bar{y}_b = \frac{1}{n}\sum_{j=1}^{n} y_j.$$
Before beginning the derivation of the Bayes factor, we note that the maximum likelihood estimates under M1 are
$$\alpha_{MLE} = \bar{y}_a, \qquad \beta_{MLE} = \bar{y}_b,$$
and the sum of squares of the residuals using the MLEs is
$$SSR_{MLE} = (n+n')\overline{y^2} - n'\bar{y}_a^2 - n\bar{y}_b^2.$$
Prior distributions
For both models, we set the prior for $(\beta, \sigma^2)$ to be
$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$
For the extra parameter in M1, we use the unit-information prior centered at $\alpha = \beta$,
$$p(\alpha \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \beta)^2\right].$$
Null model prior predictive distribution
The prior predictive probability of the data under M0 is
$$p(y \mid M_0) = \int_0^\infty p(\beta, \sigma^2) \int_{-\infty}^\infty \prod_{j=1}^{n+n'} p(y_j \mid \beta, \sigma^2)\, d\beta\, d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left(-\frac{(n+n')(\beta - \bar{y})^2}{2\sigma^2}\right) d\beta\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_0) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left[\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2}\right]^{-\frac{n+n'-1}{2}}.$$
Alternative model prior predictive distribution
We define in advance:
$$E(\alpha \mid \beta, y) = \frac{n'\bar{y}_a + \beta}{n'+1},$$
$$\hat{\beta} = \frac{(n+nn')\bar{y}_b + n'\bar{y}_a}{n+n'+nn'},$$
$$SSR_1 = SSR_{MLE} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_a - \bar{y}_b\right)^2.$$
As before, the effective sum of squares of the residuals under M1 is the sum of the SSR using the maximum likelihood estimators and a penalty term for disagreement between the MLEs and the prior distribution.
Before dealing with the marginal probability of the data under M1, we re-arrange the quadratic expression in $\alpha$ and $\beta$ to ease the integrations.
$$\begin{aligned}
(\alpha - \beta)^2 &+ \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2 \\
&= \alpha^2 + \beta^2 - 2\alpha\beta + n\beta^2 - 2n\bar{y}_b\beta + n'\alpha^2 - 2n'\bar{y}_a\alpha + (n+n')\overline{y^2} \\
&= (n+1)\beta^2 - 2n\bar{y}_b\beta + (n'+1)\alpha^2 - 2(n'\bar{y}_a + \beta)\alpha + (n+n')\overline{y^2} \\
&= (n+1)\beta^2 - 2n\bar{y}_b\beta + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{(n'\bar{y}_a + \beta)^2}{n'+1} \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\beta^2 - 2\left(n\bar{y}_b + \frac{n'}{n'+1}\bar{y}_a\right)\beta + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\bar{y}_a^2 \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left[\beta^2 - 2\left(\frac{(n+nn')\bar{y}_b + n'\bar{y}_a}{n+n'+nn'}\right)\beta\right] + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\bar{y}_a^2 \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n'^2}{n'+1}\bar{y}_a^2 - \frac{\left[(n+nn')\bar{y}_b + n'\bar{y}_a\right]^2}{(n+n'+nn')(n'+1)} \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - \frac{n^2(n'+1)\bar{y}_b^2 + n'^2(n+1)\bar{y}_a^2 + 2nn'\bar{y}_a\bar{y}_b}{n+n'+nn'} \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + (n+n')\overline{y^2} - n'\bar{y}_a^2 - n\bar{y}_b^2 + \frac{nn'\bar{y}_a^2 + nn'\bar{y}_b^2 - 2nn'\bar{y}_a\bar{y}_b}{n+n'+nn'} \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + SSR_{MLE} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_a - \bar{y}_b\right)^2 \\
&= \left(\frac{n+n'+nn'}{n'+1}\right)\left(\beta - \hat{\beta}\right)^2 + (n'+1)\left(\alpha - E(\alpha \mid \beta, y)\right)^2 + SSR_1.
\end{aligned}$$
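Because this chain of algebra is long, a numerical spot-check at arbitrary values of $\alpha$ and $\beta$ is a useful safeguard; all inputs below are hypothetical.

```python
# Numerically verify the quadratic-form decomposition derived above.
import numpy as np

rng = np.random.default_rng(1)
n, n_prime = 5, 7
yb = rng.normal(0.0, 1.0, n)        # first block: mean beta under M1
ya = rng.normal(1.0, 1.0, n_prime)  # second block: mean alpha under M1
alpha, beta = 0.4, -0.3

lhs = (alpha - beta) ** 2 + np.sum((yb - beta) ** 2) + np.sum((ya - alpha) ** 2)

ya_bar, yb_bar = ya.mean(), yb.mean()
y = np.concatenate([yb, ya])
ssr_mle = (n + n_prime) * np.mean(y ** 2) - n_prime * ya_bar ** 2 - n * yb_bar ** 2
ssr1 = ssr_mle + n * n_prime / (n + n_prime + n * n_prime) * (ya_bar - yb_bar) ** 2
beta_hat = ((n + n * n_prime) * yb_bar + n_prime * ya_bar) / (n + n_prime + n * n_prime)
e_alpha = (n_prime * ya_bar + beta) / (n_prime + 1)

rhs = ((n + n_prime + n * n_prime) / (n_prime + 1) * (beta - beta_hat) ** 2
       + (n_prime + 1) * (alpha - e_alpha) ** 2 + ssr1)
assert np.isclose(lhs, rhs)
```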
The marginal probability of the data under M1 is
$$p(y \mid M_1) = \int_0^\infty \int_{-\infty}^\infty \int_{-\infty}^\infty p(\beta, \sigma^2)\, p(\alpha \mid \beta, \sigma^2) \prod_{j=1}^{n} p(y_j \mid \mu = \beta, \sigma^2) \prod_{j=n+1}^{n+n'} p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \int_{-\infty}^\infty \int_{-\infty}^\infty \exp\left[-\frac{1}{2\sigma^2}\left((\alpha - \beta)^2 + \sum_{j=1}^{n}(y_j - \beta)^2 + \sum_{j=n+1}^{n+n'}(y_j - \alpha)^2\right)\right] d\alpha\, d\beta\, d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'+1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+n'+nn'}{2\sigma^2(n'+1)}\left(\beta - \hat{\beta}\right)^2\right] d\beta\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n'+1}{2\sigma^2}\left(\alpha - E(\alpha \mid \beta, y)\right)^2\right] d\alpha\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2,$$
$$p(y \mid M_1) = (2\pi)^{-\frac{n+n'-1}{2}} (n+n'+nn')^{-\frac{1}{2}}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left(\frac{SSR_1}{2}\right)^{-\frac{n+n'-1}{2}},$$
where the factors of $(n'+1)$ contributed by the two Gaussian integrals cancel.
The Bayes factor is
$$BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{\frac{n+n'+nn'}{n+n'}}\, \left(\frac{SSR_1}{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}\right)^{\frac{n+n'-1}{2}}. \tag{S5}$$
Equations (S3), (S4) and (S5) together are equivalent to equations (10), (11), and (12) of the main
paper.
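As with the paired case, equations (S3)-(S5) translate directly into code; the following Python sketch uses illustrative names and hypothetical data.

```python
# A sketch of equations (S3)-(S5) as code.
import numpy as np

def two_sample_bayes_factor(x_prime, x):
    """Bayes factor of M0 (equivalent expression) over M1 (differential)."""
    x, x_prime = np.asarray(x), np.asarray(x_prime)
    n, n_prime = len(x), len(x_prime)
    y = np.concatenate([x, x_prime])                  # (S3), (S4)
    total = n + n_prime
    y_bar, y2_bar = y.mean(), np.mean(y ** 2)
    ya_bar, yb_bar = x_prime.mean(), x.mean()
    ssr_mle = total * y2_bar - n_prime * ya_bar ** 2 - n * yb_bar ** 2
    ssr1 = ssr_mle + n * n_prime / (total + n * n_prime) * (ya_bar - yb_bar) ** 2
    return (np.sqrt((total + n * n_prime) / total)
            * (ssr1 / (total * (y2_bar - y_bar ** 2))) ** ((total - 1) / 2))  # (S5)

rng = np.random.default_rng(2)
treated, control = rng.normal(7.0, 1.0, 6), rng.normal(5.0, 1.0, 6)
print(two_sample_bayes_factor(treated, control))  # shifted means, so BF << 1
```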
B.1. Derivation of posterior distribution for sampling variances
Under the null hypothesis with non-paired data, the data have the same mean, but the null hypothesis says nothing about the sampling variances of the treatment and control data. In section A, the variances were treated as identical. Here we treat them as unrelated. Suppose that
$$y_j = \beta + \sigma\varepsilon_j, \quad j = 1, \dots, n,$$
$$y_j = \beta + \sigma'\varepsilon_j, \quad j = n+1, \dots, n+n',$$
where the $\varepsilon_j$ are independent and identically distributed Gaussian random variables with mean zero and variance 1, $\sigma^2$ is the sampling variance of the control data, and $\sigma'^2$ is the sampling variance of the treatment data.
To calculate the posterior predictive variance of a new treatment data point minus a new control data point, we need (up to proportionality) the posterior distribution of $\sigma^2$ and $\sigma'^2$, which we derive here.
Prior distribution
We set
$$p(\beta, \sigma^2, \sigma'^2) \propto \frac{1}{\sigma^2 \sigma'^2}.$$
Posterior distribution
Define
$$\bar{y}_a = \frac{1}{n'}\sum_{j=n+1}^{n+n'} y_j, \qquad \bar{y}_b = \frac{1}{n}\sum_{j=1}^{n} y_j,$$
$$SSR' = \sum_{j=n+1}^{n+n'}\left(y_j - \bar{y}_a\right)^2, \qquad SSR = \sum_{j=1}^{n}\left(y_j - \bar{y}_b\right)^2.$$
Before dealing with the posterior distribution of $\sigma^2$ and $\sigma'^2$, we re-arrange the quadratic expression in $\beta$ to ease the integration that will follow.
$$\begin{aligned}
\frac{n(\beta - \bar{y}_b)^2}{\sigma^2} &+ \frac{n'(\beta - \bar{y}_a)^2}{\sigma'^2} \\
&= \frac{n}{\sigma^2}\left(\beta^2 - 2\beta\bar{y}_b + \bar{y}_b^2\right) + \frac{n'}{\sigma'^2}\left(\beta^2 - 2\beta\bar{y}_a + \bar{y}_a^2\right) \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\beta^2 - 2\beta\left(\frac{n\bar{y}_b}{\sigma^2} + \frac{n'\bar{y}_a}{\sigma'^2}\right) + \frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta^2 - 2\beta\left(\frac{n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a}{n\sigma'^2 + n'\sigma^2}\right)\right] + \frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a}{n\sigma'^2 + n'\sigma^2}\right)\right]^2 + \frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)}.
\end{aligned}$$
For the full set of parameters, the posterior distribution is
$$\begin{aligned}
p(\beta, \sigma^2, \sigma'^2 \mid y) &\propto \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \left(\sigma'^2\right)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{\sum_{j=1}^{n}(\beta - y_j)^2}{2\sigma^2} - \frac{\sum_{j=n+1}^{n+n'}(\beta - y_j)^2}{2\sigma'^2}\right] \\
&= \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \left(\sigma'^2\right)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n(\beta - \bar{y}_b)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_a)^2}{\sigma'^2} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right)\right] \\
&= \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \left(\sigma'^2\right)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a}{n\sigma'^2 + n'\sigma^2}\right)\right]^2 + \frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right].
\end{aligned}$$
Next we marginalize $\beta$ to eliminate it from the posterior distribution:
$$p(\sigma^2, \sigma'^2 \mid y) = \int_{-\infty}^\infty p(\beta, \sigma^2, \sigma'^2 \mid y)\, d\beta$$
$$\begin{aligned}
&\propto \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \left(\sigma'^2\right)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right] \\
&\qquad \times \int_{-\infty}^\infty \exp\left\{-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a}{n\sigma'^2 + n'\sigma^2}\right)\right]^2\right\} d\beta \\
&\propto \left(\sigma^2\right)^{-\left(\frac{n}{2}+1\right)} \left(\sigma'^2\right)^{-\left(\frac{n'}{2}+1\right)} \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\left(\frac{n\bar{y}_b^2}{\sigma^2} + \frac{n'\bar{y}_a^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_b + n'\sigma^2\bar{y}_a\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right)\right].
\end{aligned}$$
This is the final expression. It is not a standard distribution, but it is easy to generate Markov chain
Monte Carlo samples from it.
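As one possibility, a random-walk Metropolis sampler on the log scale of both variances is straightforward; the following Python sketch is illustrative only, and the step size and number of draws are tuning choices, not prescriptions.

```python
# A minimal random-walk Metropolis sketch for drawing (sigma^2, sigma'^2)
# from the unnormalized posterior derived above; all tuning values are
# illustrative.
import numpy as np

def log_posterior(s2, sp2, n, n_prime, ya_bar, yb_bar, ssr, ssr_prime):
    """Log of p(sigma^2, sigma'^2 | y), up to an additive constant."""
    if s2 <= 0 or sp2 <= 0:
        return -np.inf
    quad = (n * yb_bar ** 2 / s2 + n_prime * ya_bar ** 2 / sp2
            - (n * sp2 * yb_bar + n_prime * s2 * ya_bar) ** 2
            / (s2 * sp2 * (n * sp2 + n_prime * s2))
            + ssr / s2 + ssr_prime / sp2)
    return (-(n / 2 + 1) * np.log(s2) - (n_prime / 2 + 1) * np.log(sp2)
            - 0.5 * np.log(n / s2 + n_prime / sp2) - 0.5 * quad)

def sample(y_control, y_treat, n_draws=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, n_prime = len(y_control), len(y_treat)
    args = (n, n_prime, y_treat.mean(), y_control.mean(),
            np.sum((y_control - y_control.mean()) ** 2),
            np.sum((y_treat - y_treat.mean()) ** 2))
    cur = np.array([y_control.var(), y_treat.var()])
    cur_lp = log_posterior(*cur, *args)
    draws = []
    for _ in range(n_draws):
        prop = cur * np.exp(step * rng.normal(size=2))  # log-scale random walk
        lp = log_posterior(*prop, *args)
        # The Jacobian of the log-scale proposal enters the acceptance ratio.
        if np.log(rng.uniform()) < lp - cur_lp + np.log(prop.prod() / cur.prod()):
            cur, cur_lp = prop, lp
        draws.append(cur.copy())
    return np.array(draws)

rng = np.random.default_rng(3)
draws = sample(rng.normal(5, 1, 8), rng.normal(5, 2, 8))
print(draws[1000:].mean(axis=0))  # posterior means of (sigma^2, sigma'^2)
```

Each row of the returned array is one $(\sigma^2, \sigma'^2)$ draw; proposing on the log scale keeps both variances positive, at the cost of the Jacobian term in the acceptance ratio.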