Supplementary Information

Validation of differential gene expression algorithms: Application comparing fold-change estimation to hypothesis testing

Corey M. Yanofsky and David R. Bickel

A.1. Heuristic overview of the derivation of the Bayes factor

This document contains an explicit derivation of the Bayes factor used in the main paper for both paired and unpaired data. In each case, there are two models for the data: the null model, in which the gene is equivalently expressed in the two conditions, and the alternative model, in which the gene is differentially expressed.

The derivation of the Bayes factor requires two components per model. The first component is the probability distribution of the data conditional on some statistical parameters; this is termed the likelihood function. The differential expression model always has one extra parameter to account for the fact that the gene's expression level differs across conditions. The data are always modeled as

observed datum = average data level + measurement error.

Here, the average data level is an unknown parameter. Throughout, the measurement errors are assumed to be independent and identically distributed Gaussian random variables; that is, for all $j$,
\[
p(\epsilon_j \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\epsilon_j^2}{2\sigma^2}\right),
\]
where $\epsilon_j$ is the measurement error of the $j$th observation and $\sigma^2$ is the data variance.

The second component is the prior distribution of the model parameters, namely the baseline expression level of the gene and the experimental variability of the data. The prior distribution summarizes everything that is known about the model parameters prior to observing the data. Since the baseline expression level of the gene and the variability of the data are unknown, we use standard default priors for them. The extra parameter in the alternative model measures the amount of differential expression. Here, we use what has been called a unit-information prior distribution, that is, a prior distribution that contains exactly as much information as one extra data point. The unit-information prior is weakly informative, so it will not unduly influence the results in favor of either model.

To calculate the Bayes factor, we marginalize the model parameters; that is, we integrate the likelihood function with respect to the prior distribution, resulting in a prior predictive distribution. The marginalization removes the nuisance parameters from the expression. The Bayes factor is the ratio of the prior predictive distributions under the null and alternative models.

A.2. Derivations

Bayes factor for paired data

Suppose $n = n'$ and $x_{i,j}$ is paired with $x'_{i,j}$. Let
\[
y_j = x_{i,j} - x'_{i,j}. \tag{S1}
\]
The hypothesis of equivalent expression is $M_0\colon y_j = \epsilon_j$, and the hypothesis of differential expression is $M_1\colon y_j = \alpha + \epsilon_j$.

Prior distributions

For both models, we set
\[
p(\sigma^2) \propto \frac{1}{\sigma^2}.
\]
For $M_1$, we use the unit-information prior
\[
p(\alpha \mid \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \mu_\alpha)^2\right].
\]
In the main text of the paper, the prior mean $\mu_\alpha$ is set to zero.

Null model prior predictive distribution

The prior predictive distribution of the data under $M_0$ is
\begin{align*}
p(y \mid M_0) &= \int_0^\infty p(\sigma^2) \prod_j p(y_j \mid \mu = 0, \sigma^2)\, d\sigma^2 \\
&= (2\pi)^{-n/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{n\overline{y^2}}{2\sigma^2}\right) d\sigma^2 \\
&= (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) \left(\frac{n\overline{y^2}}{2}\right)^{-n/2},
\end{align*}
where $\overline{y^2} = n^{-1}\sum_j y_j^2$ and $\bar{y} = n^{-1}\sum_j y_j$.

Alternative model prior predictive distribution

We define in advance:
\begin{align*}
\hat{\alpha} &= \frac{n\bar{y} + \mu_\alpha}{n+1}, \\
SSR_1 &= (\mu_\alpha - \hat{\alpha})^2 + \sum_j (y_j - \hat{\alpha})^2.
\end{align*}
After some algebra, we can derive
\[
SSR_1 = n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2.
\]
Here, $SSR_1$ is the effective sum of squares of the residuals under $M_1$: it is the sum of the SSR evaluated at the maximum likelihood estimator $\alpha_{MLE} = \bar{y}$ and a term that penalizes disagreement between the MLE and the prior mean.

The prior predictive distribution of the data under $M_1$ is
\begin{align*}
p(y \mid M_1) &= \int_0^\infty \int_{-\infty}^\infty p(\sigma^2)\, p(\alpha \mid \sigma^2) \prod_j p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\sigma^2 \\
&= (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \left(\int_{-\infty}^\infty \exp\left\{-\frac{1}{2\sigma^2}\left[(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2\right]\right\} d\alpha\right) d\sigma^2.
\end{align*}
Isolating the part of the integrand that is a quadratic expression in $\alpha$, we complete the square:
\begin{align*}
(\alpha - \mu_\alpha)^2 + \sum_j (y_j - \alpha)^2 &= \alpha^2 - 2\alpha\mu_\alpha + \mu_\alpha^2 + n\alpha^2 - 2\alpha n\bar{y} + n\overline{y^2} \\
&= (n+1)\alpha^2 - 2\alpha\left(n\bar{y} + \mu_\alpha\right) + \left(n\overline{y^2} + \mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \mu_\alpha^2 + n\bar{y}^2 - \frac{n^2\bar{y}^2 + 2n\bar{y}\mu_\alpha + \mu_\alpha^2}{n+1} \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{1}{n+1}\left(n\bar{y}^2 - 2n\bar{y}\mu_\alpha + n\mu_\alpha^2\right) \\
&= (n+1)(\alpha - \hat{\alpha})^2 + n\left(\overline{y^2} - \bar{y}^2\right) + \frac{n}{n+1}\left(\bar{y} - \mu_\alpha\right)^2 \\
&= (n+1)(\alpha - \hat{\alpha})^2 + SSR_1.
\end{align*}
Substituting back into the integral, we have
\begin{align*}
p(y \mid M_1) &= (2\pi)^{-(n+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+1}{2\sigma^2}(\alpha - \hat{\alpha})^2\right] d\alpha\right) d\sigma^2 \\
&= (2\pi)^{-n/2} (n+1)^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2 \\
&= (2\pi)^{-n/2}\, \Gamma\!\left(\frac{n}{2}\right) (n+1)^{-1/2} \left(\frac{SSR_1}{2}\right)^{-n/2}.
\end{align*}
The Bayes factor is
\[
BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{n+1}\left(\frac{SSR_1}{n\overline{y^2}}\right)^{n/2}. \tag{S2}
\]
Equations (S1) and (S2) together are equivalent to equations (9), (11), and (12) of the main paper.
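Equation (S2) is simple enough to evaluate directly. The following minimal sketch, in Python with NumPy, computes the paired-data Bayes factor of equation (S2) from a vector of paired differences; the function name paired_bayes_factor and its interface are our own illustration, not code from the main paper.

import numpy as np

def paired_bayes_factor(y, mu_alpha=0.0):
    """Bayes factor of equation (S2) for paired differences y.

    BF > 1 favors equivalent expression (M0); BF < 1 favors
    differential expression (M1).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    y_bar = y.mean()              # sample mean of the differences
    y2_bar = np.mean(y ** 2)      # sample mean of the squared differences
    # Effective sum of squared residuals under M1 (closed form above):
    # SSR1 = n*(mean(y^2) - mean(y)^2) + n/(n+1)*(mean(y) - mu_alpha)^2
    ssr1 = n * (y2_bar - y_bar ** 2) + n / (n + 1) * (y_bar - mu_alpha) ** 2
    return np.sqrt(n + 1) * (ssr1 / (n * y2_bar)) ** (n / 2)

# Example: strongly shifted differences should give a small Bayes factor.
rng = np.random.default_rng(0)
print(paired_bayes_factor(rng.normal(loc=2.0, scale=1.0, size=8)))

For large $n$, the power $(SSR_1/(n\overline{y^2}))^{n/2}$ can overflow or underflow in floating point, so a production implementation would accumulate the logarithm of the Bayes factor instead.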
Bayes factor for two-sample data

Suppose that $x_{i,j}$ and $x'_{i,j}$ are independent. Define
\begin{align}
y_j &= x_{i,j}, & j &= 1, \dots, n, \tag{S3} \\
y_{j+n} &= x'_{i,j}, & j &= 1, \dots, n'. \tag{S4}
\end{align}
The hypothesis of equivalent expression is $M_0\colon y_j = \beta + \epsilon_j,\ j = 1, \dots, n+n'$, and the hypothesis of differential expression is
\[
M_1\colon \quad y_j = \beta + \epsilon_j,\ j = 1, \dots, n; \qquad y_j = \alpha + \epsilon_j,\ j = n+1, \dots, n+n'.
\]

Preliminaries

To fix notation, let
\begin{align*}
\overline{y^2} &= \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j^2, &
\bar{y} &= \frac{1}{n+n'} \sum_{j=1}^{n+n'} y_j, \\
\bar{y}_T &= \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j, &
\bar{y}_C &= \frac{1}{n} \sum_{j=1}^{n} y_j.
\end{align*}
Before beginning the derivation of the Bayes factor, we note that the maximum likelihood estimates under $M_1$ are
\[
\alpha_{MLE} = \bar{y}_T, \qquad \beta_{MLE} = \bar{y}_C,
\]
and the sum of squares of the residuals using the MLEs is
\[
SSR_{MLE} = (n+n')\overline{y^2} - n\bar{y}_C^2 - n'\bar{y}_T^2.
\]

Prior distributions

For both models, we set the prior for $(\beta, \sigma^2)$ to be
\[
p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.
\]
For the extra parameter in $M_1$, we use the unit-information prior centered at $\alpha = \beta$,
\[
p(\alpha \mid \beta, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(\alpha - \beta)^2\right].
\]

Null model prior predictive distribution

The prior predictive probability of the data under $M_0$ is
\begin{align*}
p(y \mid M_0) &= \int_0^\infty p(\beta, \sigma^2) \int_{-\infty}^\infty \prod_{j=1}^{n+n'} p(y_j \mid \beta, \sigma^2)\, d\beta\, d\sigma^2 \\
&= (2\pi)^{-(n+n')/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left(-\frac{(n+n')(\beta - \bar{y})^2}{2\sigma^2}\right) d\beta\right) d\sigma^2 \\
&= (2\pi)^{-(n+n'-1)/2} (n+n')^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2\sigma^2}\right) d\sigma^2 \\
&= (2\pi)^{-(n+n'-1)/2} (n+n')^{-1/2}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left[\frac{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}{2}\right]^{-(n+n'-1)/2}.
\end{align*}

Alternative model prior predictive distribution

We define in advance:
\begin{align*}
E(\alpha \mid \beta, y) &= \frac{n'\bar{y}_T + \beta}{n'+1}, \\
\hat{\beta} &= \frac{(n+nn')\bar{y}_C + n'\bar{y}_T}{n+n'+nn'}, \\
SSR_1 &= SSR_{MLE} + \frac{nn'}{n+n'+nn'} \left(\bar{y}_C - \bar{y}_T\right)^2.
\end{align*}
As before, the effective sum of squares of the residuals under $M_1$ is the sum of the SSR using the maximum likelihood estimators and a penalty term for disagreement between the MLEs and the prior distribution.

Before dealing with the marginal probability of the data under $M_1$, we rearrange the quadratic expression in $\alpha$ and $\beta$ to ease the integrations:
\begin{align*}
&(\alpha - \beta)^2 + \sum_{j=1}^{n} (y_j - \beta)^2 + \sum_{j=n+1}^{n+n'} (y_j - \alpha)^2 \\
&\quad= \alpha^2 + \beta^2 - 2\alpha\beta + n\beta^2 - 2n\bar{y}_C\beta + n'\alpha^2 - 2n'\bar{y}_T\alpha + (n+n')\overline{y^2} \\
&\quad= (n+1)\beta^2 - 2n\bar{y}_C\beta + (n'+1)\alpha^2 - 2\left(n'\bar{y}_T + \beta\right)\alpha + (n+n')\overline{y^2} \\
&\quad= (n+1)\beta^2 - 2n\bar{y}_C\beta + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{\left(n'\bar{y}_T + \beta\right)^2}{n'+1} \\
&\quad= \frac{n+n'+nn'}{n'+1}\left(\beta^2 - 2\hat{\beta}\beta\right) + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2\bar{y}_T^2}{n'+1} \\
&\quad= \frac{n+n'+nn'}{n'+1}\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n'^2\bar{y}_T^2}{n'+1} - \frac{\left[(n+nn')\bar{y}_C + n'\bar{y}_T\right]^2}{(n+n'+nn')(n'+1)} \\
&\quad= \frac{n+n'+nn'}{n'+1}\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - \frac{n^2(n'+1)\bar{y}_C^2 + n'^2(n+1)\bar{y}_T^2 + 2nn'\bar{y}_C\bar{y}_T}{n+n'+nn'} \\
&\quad= \frac{n+n'+nn'}{n'+1}\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + (n+n')\overline{y^2} - n\bar{y}_C^2 - n'\bar{y}_T^2 + \frac{nn'\left(\bar{y}_C - \bar{y}_T\right)^2}{n+n'+nn'} \\
&\quad= \frac{n+n'+nn'}{n'+1}\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + SSR_{MLE} + \frac{nn'}{n+n'+nn'}\left(\bar{y}_C - \bar{y}_T\right)^2 \\
&\quad= \frac{n+n'+nn'}{n'+1}\bigl(\beta - \hat{\beta}\bigr)^2 + (n'+1)\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2 + SSR_1.
\end{align*}
The marginal probability of the data under $M_1$ is
\begin{align*}
p(y \mid M_1) &= \int_0^\infty \int_{-\infty}^\infty \int_{-\infty}^\infty p(\beta, \sigma^2)\, p(\alpha \mid \beta, \sigma^2) \prod_{j=1}^{n} p(y_j \mid \mu = \beta, \sigma^2) \prod_{j=n+1}^{n+n'} p(y_j \mid \mu = \alpha, \sigma^2)\, d\alpha\, d\beta\, d\sigma^2 \\
&= (2\pi)^{-(n+n'+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \int_{-\infty}^\infty \int_{-\infty}^\infty \exp\left[-\frac{1}{2\sigma^2}\left((\alpha-\beta)^2 + \sum_{j=1}^{n}(y_j-\beta)^2 + \sum_{j=n+1}^{n+n'}(y_j-\alpha)^2\right)\right] d\alpha\, d\beta\, d\sigma^2 \\
&= (2\pi)^{-(n+n'+1)/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'+1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n+n'+nn'}{2\sigma^2(n'+1)}\bigl(\beta-\hat{\beta}\bigr)^2\right] d\beta\right) \left(\int_{-\infty}^\infty \exp\left[-\frac{n'+1}{2\sigma^2}\bigl(\alpha - E(\alpha \mid \beta, y)\bigr)^2\right] d\alpha\right) d\sigma^2 \\
&= (2\pi)^{-(n+n'-1)/2} (n+n'+nn')^{-1/2} \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{\frac{n+n'-1}{2}+1} \exp\left(-\frac{SSR_1}{2\sigma^2}\right) d\sigma^2 \\
&= (2\pi)^{-(n+n'-1)/2} (n+n'+nn')^{-1/2}\, \Gamma\!\left(\frac{n+n'-1}{2}\right) \left(\frac{SSR_1}{2}\right)^{-(n+n'-1)/2}.
\end{align*}
The Bayes factor is
\[
BF = \frac{p(y \mid M_0)}{p(y \mid M_1)} = \sqrt{\frac{n+n'+nn'}{n+n'}} \left(\frac{SSR_1}{(n+n')\left(\overline{y^2} - \bar{y}^2\right)}\right)^{(n+n'-1)/2}. \tag{S5}
\]
Equations (S3), (S4), and (S5) together are equivalent to equations (10), (11), and (12) of the main paper.
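As in the paired case, equation (S5) can be evaluated directly. Below is a sketch in Python with NumPy; the function name two_sample_bayes_factor and its interface are our own illustration, under the notation of equations (S3)-(S5).

import numpy as np

def two_sample_bayes_factor(y_control, y_treatment):
    """Bayes factor of equation (S5) for two independent samples."""
    yc = np.asarray(y_control, dtype=float)
    yt = np.asarray(y_treatment, dtype=float)
    n, np_ = yc.size, yt.size            # n and n' in the text
    y = np.concatenate([yc, yt])
    N = n + np_
    y2_bar = np.mean(y ** 2)             # mean of squares over all data
    y_bar = y.mean()                     # grand mean
    # SSR at the MLEs, then the unit-information penalty term:
    ssr_mle = N * y2_bar - n * yc.mean() ** 2 - np_ * yt.mean() ** 2
    ssr1 = ssr_mle + (n * np_ / (N + n * np_)) * (yc.mean() - yt.mean()) ** 2
    factor = np.sqrt((N + n * np_) / N)   # sqrt((n+n'+nn')/(n+n'))
    return factor * (ssr1 / (N * (y2_bar - y_bar ** 2))) ** ((N - 1) / 2)

# Example: samples with equal means will typically give a Bayes factor
# above 1, favoring the equivalent-expression model M0.
rng = np.random.default_rng(1)
print(two_sample_bayes_factor(rng.normal(0, 1, 6), rng.normal(0, 1, 6)))

The same caveat about overflow applies: for large $n + n'$, work with the logarithm of the Bayes factor.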
B.1. Derivation of posterior distribution for sampling variances

Under the null hypothesis with non-paired data, the data have the same mean, but the null hypothesis says nothing about the sampling variances of the treatment and control data. In Section A, the variances were treated as identical; here we treat them as unrelated. Suppose that
\begin{align*}
y_j &= \beta + \sigma\epsilon_j, & j &= 1, \dots, n, \\
y_j &= \beta + \sigma'\epsilon_j, & j &= n+1, \dots, n+n',
\end{align*}
where the $\epsilon_j$ are independent and identically distributed Gaussian with mean zero and variance 1, $\sigma^2$ is the sampling variance of the control data, and $\sigma'^2$ is the sampling variance of the treatment data. To calculate the posterior predictive variance of a new treatment data point minus a new control data point, we need (up to proportionality) the posterior distribution of $\sigma^2$ and $\sigma'^2$, which we derive here.

Prior distribution

We set
\[
p(\beta, \sigma^2, \sigma'^2) \propto \frac{1}{\sigma^2 \sigma'^2}.
\]

Posterior distribution

Define
\begin{align*}
\bar{y}_T &= \frac{1}{n'} \sum_{j=n+1}^{n+n'} y_j, &
\bar{y}_C &= \frac{1}{n} \sum_{j=1}^{n} y_j, \\
SSR' &= \sum_{j=n+1}^{n+n'} \left(y_j - \bar{y}_T\right)^2, &
SSR &= \sum_{j=1}^{n} \left(y_j - \bar{y}_C\right)^2.
\end{align*}
Before dealing with the posterior distribution of $\sigma^2$ and $\sigma'^2$, we rearrange the quadratic expression in $\beta$ to ease the integration that will follow:
\begin{align*}
\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2}
&= \frac{n}{\sigma^2}\left(\beta^2 - 2\beta\bar{y}_C + \bar{y}_C^2\right) + \frac{n'}{\sigma'^2}\left(\beta^2 - 2\beta\bar{y}_T + \bar{y}_T^2\right) \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\beta^2 - 2\beta\left(\frac{n\bar{y}_C}{\sigma^2} + \frac{n'\bar{y}_T}{\sigma'^2}\right) + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta^2 - 2\beta\left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right] + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} \\
&= \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right]^2 + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)}.
\end{align*}
For the full set of parameters, the posterior distribution is
\begin{align*}
p(\beta, \sigma^2, \sigma'^2 \mid y) &\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{\sum_{j=1}^{n}(y_j - \beta)^2}{2\sigma^2} - \frac{\sum_{j=n+1}^{n+n'}(y_j - \beta)^2}{2\sigma'^2}\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left(\frac{n(\beta - \bar{y}_C)^2}{\sigma^2} + \frac{n'(\beta - \bar{y}_T)^2}{\sigma'^2} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right)\right] \\
&= (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right]^2 \right.\right. \\
&\qquad\qquad \left.\left. + \frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right].
\end{align*}
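Before marginalizing, one remark (ours, not part of the original derivation): every full conditional of this joint posterior is a standard distribution, which is what makes the Markov chain Monte Carlo sampling mentioned at the end of this section straightforward. From the completed square above and the $1/(\sigma^2\sigma'^2)$ prior,
\begin{align*}
\beta \mid \sigma^2, \sigma'^2, y &\sim N\!\left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2},\ \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-1}\right), \\
\sigma^2 \mid \beta, y &\sim \text{Inv-Gamma}\!\left(\frac{n}{2},\ \frac{1}{2}\sum_{j=1}^{n}(y_j - \beta)^2\right), \\
\sigma'^2 \mid \beta, y &\sim \text{Inv-Gamma}\!\left(\frac{n'}{2},\ \frac{1}{2}\sum_{j=n+1}^{n+n'}(y_j - \beta)^2\right).
\end{align*}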
Next we marginalize $\beta$ to eliminate it from the posterior distribution:
\begin{align*}
p(\sigma^2, \sigma'^2 \mid y) &= \int_{-\infty}^\infty p(\beta, \sigma^2, \sigma'^2 \mid y)\, d\beta \\
&\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \exp\left[-\frac{1}{2}\left\{\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right] \\
&\qquad \times \int_{-\infty}^\infty \exp\left\{-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)\left[\beta - \left(\frac{n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T}{n\sigma'^2 + n'\sigma^2}\right)\right]^2\right\} d\beta \\
&\propto (\sigma^2)^{-\left(\frac{n}{2}+1\right)} (\sigma'^2)^{-\left(\frac{n'}{2}+1\right)} \left(\frac{n}{\sigma^2} + \frac{n'}{\sigma'^2}\right)^{-1/2} \exp\left[-\frac{1}{2}\left\{\frac{n\bar{y}_C^2}{\sigma^2} + \frac{n'\bar{y}_T^2}{\sigma'^2} - \frac{\left(n\sigma'^2\bar{y}_C + n'\sigma^2\bar{y}_T\right)^2}{\sigma^2\sigma'^2\left(n\sigma'^2 + n'\sigma^2\right)} + \frac{SSR}{\sigma^2} + \frac{SSR'}{\sigma'^2}\right\}\right].
\end{align*}
This is the final expression. It is not a standard distribution, but it is easy to generate Markov chain Monte Carlo samples from it.
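One simple way to obtain such samples (our sketch, not code from the paper) is a Gibbs sampler on the joint posterior $p(\beta, \sigma^2, \sigma'^2 \mid y)$ using the full conditionals noted above; discarding the $\beta$ draws leaves samples from $p(\sigma^2, \sigma'^2 \mid y)$. The function name sample_variances and the run-length defaults are our own choices.

import numpy as np

def sample_variances(y_control, y_treatment, n_draws=5000, burn_in=500, seed=0):
    """Gibbs sampler for p(sigma^2, sigma'^2 | y) under the prior
    p(beta, sigma^2, sigma'^2) proportional to 1/(sigma^2 * sigma'^2)."""
    rng = np.random.default_rng(seed)
    yc = np.asarray(y_control, dtype=float)
    yt = np.asarray(y_treatment, dtype=float)
    n, np_ = yc.size, yt.size
    beta = np.concatenate([yc, yt]).mean()   # initialize at the grand mean
    draws = np.empty((n_draws, 2))
    for i in range(burn_in + n_draws):
        # sigma^2  | beta, y ~ Inv-Gamma(n/2,  sum((yc - beta)^2)/2)
        s2 = 1.0 / rng.gamma(n / 2, 2.0 / np.sum((yc - beta) ** 2))
        # sigma'^2 | beta, y ~ Inv-Gamma(n'/2, sum((yt - beta)^2)/2)
        s2p = 1.0 / rng.gamma(np_ / 2, 2.0 / np.sum((yt - beta) ** 2))
        # beta | sigma^2, sigma'^2, y: Gaussian with precision-weighted mean
        precision = n / s2 + np_ / s2p
        mean = (n * yc.mean() / s2 + np_ * yt.mean() / s2p) / precision
        beta = rng.normal(mean, np.sqrt(1.0 / precision))
        if i >= burn_in:
            draws[i - burn_in] = (s2, s2p)
    return draws  # columns: sigma^2, sigma'^2

# Example: since a new treatment-minus-control difference has variance
# sigma^2 + sigma'^2 (the common mean beta cancels), the posterior
# predictive variance sought above is the posterior mean of that sum.
rng = np.random.default_rng(42)
d = sample_variances(rng.normal(0, 1, 8), rng.normal(0, 2, 8))
print(d.mean(axis=0), (d[:, 0] + d[:, 1]).mean())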