Bayes Factors Comparing Two Multi-Normal Covariance Matrices and Their Application to Microarray Data Analysis Ainong Zhou A Dissertation submitted to The Faculty of Columbian School of Arts and Sciences of The George Washington University in partial satisfaction of the requirements for the degree of Doctor of Philosophy June 20, 2006 1 Outline Microarray techniques and functional gene sets Class comparison of gene set expression in mean level and similarity measures Bayes factors comparing two covariance matrices Simulation study of proposed Bayes factors Case study of a lung cancer microarray analysis Conclusions Further developments 2 DNA microarrays • • • • • Microarrays measure the gene expression of thousands of genes to an entire genome in a sample by measuring RNA abundance on one slide/chip. Typically in an experiment, RNA is extracted from a part (cells in culture, tumors, other.) of an organism. The RNA is then amplified, labeled so that a scanner can see it and then hybridized to the microarray slide. The brightness of each spot on the microarray represents the amount of RNA for each gene, and hence the gene expression. Image analysis and data normalization are needed. 3 cDNA Arrays cDNA clones (probes) excitation laser 2 scanning laser 1 emission PCR product amplification purification printing mRNA target overlay image and normalize 0.1nl/spot microarray Hybridize target to microarray analysis 4 Compliments of R. Irizarry High Density Oligonucleotide Array GeneChip Probe Array Hybridized Probe Cell Single stranded, labeled RNA target * * * * * Oligonucleotide probe 24µm 1.28cm Millions of copies of a specific oligonucleotide probe >200,000 different complementary probes Image of Hybridized Probe Array 5 Compliments of R. Irizarry Standard Statistical Tests for Differential Expression of Individual Genes Two Groups: Fold change Student’s t-test: gene-specific, global, or regularized. More than two groups: ANOVA : gene-specific, global, or regularized F-test. Significance and multiple testing: Nominal p-values Family-wise error rate: the probability of accumulating one or more false-positive errors over a number of statistical tests. False discovery rate: the proportion of false positives among all of the genes initially identified as being differentially expressed. 6 Analysis at the level of single gene Why is it insufficient? • • Microarrays have a well-defined internal data structure dictated by the genetic network of the living cell. Gene-specific analytical tools often ignore this structure. Identifying differentially expressed genes becomes a challenge when the magnitude of differential expression is small. 7 Analysis at the level of functional groups defined a priori : why important? • • By incorporating biological knowledge, we can detect modest but coordinated expression changes of sets of functionally related genes. The interaction of genes in the gene set may be more important than the changes in individual genes. 8 Gene Sets There are usually many sets of genes that might be of interest in a given microarray experiment. Examples include: genes in biological pathways, e.g. biochemical, metabolic, and signaling. genes associated with a particular location in the cell. genes having a particular function or being involved in a particular process. 9 Class comparison of gene set activity: currently used methods Enrichment of members of a gene set among the top-ranked genes: Fisher’s exact test (Draghici et al 2003) Z-test (Doniger et al 2003) Kolmogorov-Smirnov (K-S) test (Mootha et al 2003) Permutation test of similarity measures (Rahnenfuhrer et al 2004) Comment: All of these methods assume all the genes are independent, and focus on the comparison of gene expression at mean level. 10 Biological Motivation for Comparing Two Covariance Matrices • • • Increased/decreased variations of gene expression are found under an abnormal condition compared to a normal condition. Some genes may be tightly correlated or coupled in the normal or less severe condition but become decoupled due to disease progression, as, for example, in cancer patients. Any significant difference between two covariance matrices can be attributed to difference in either variances or correlations and may illustrate some important geneset-wise regulation/deregulation associated with a specific phenotype. 11 Motivation from Observed Data -0.05 -0.15 Observed 0.00 0.05 0.05 Variances -0.10 Observed Means -0.05 0.00 0.05 -0.15 -0.05 0.05 Permuted Permuted Covariances Correlations -0.4 0.0 Observed 0.02 -0.02 Observed 0.4 -0.10 -0.02 0.00 0.02 Permuted 0.04 -0.4 -0.2 0.0 0.2 0.4 Permuted Gene expression of the Myosin signaling pathway, Data from Bhattacharjee et al. PNAS, 2001 12 Class Comparison of Variance/Covariance Matrices Likelihood ratio test (Anderson 1984) |S| Ng g N 2 T g N log |S | p log 1 g , df 12 p p 1 It is asymptotically distributed as Chi-squared with (1/2)p(p+1) degrees of freedom. A likelihood-based hierarchical test of principal components (Flury 1984,1987) Resampling tests (Zhu et al 2002) Bayes factors using Jeffery’s improper priors (Kim et al 1999, 2001) 13 Why Bayes Factor? • • • The use of prior probability distributions represents a powerful mechanism for incorporating information from previous studies. In microarray data, this becomes more needed as the sample sizes are usually very limited due to the cost of microarrays and availability of samples. Bayes factor uses information on whole likelihood function, not just maxima. Bayes factor can be used as easily interpretable alternatives to p values. 14 The Bayes Factor 1. If data D under each of the two hypotheses H₁ and H₂ is distributed according to probability densities p(D|H₁) or p(D|H₂) and the prior probabilities p(H₁) and p(H₂)=1-p(H₁), the posterior probabilities can be computed as: 2. 3. p H1 |Dand p H2 |D1 p H1 |D By the Bayes's therorem, p D|H1 p H1 , p D|H1 p H1 p D|H2 p H2 p D|H2 p H2 p H2 |D . p D|H1 p H1 p D|H2 p H2 p H1 |D 4. 5. 6. So, p H 2 |D p H1 p D|H p H p H 2 2 2 B 21 p H |D p D|H p H 1 1 1 where B21 is the Bayes factor. Thus, the Bayes factor is a summary of the evidence provided by the data in favor of one scientific theory, represented by a statistical model, as compared to another. 15 Microarray Data Let Xg={xgij}, i=1,...,Ng; j=1,...,p be the observed data (a Ng×p matrix) in the g-th group, and define: k N N g , n g N g 1 g1 Ng 1 x g Ng x gi , i1 k x 1 N Ng k i1 g1 Ng x gi g1 i1 S g x gi x x gi x g g S S g . k Ng S 0 x gi x x gi x g1 i1 where xgi., i=1,...,Ng, is an independent observation (vector) of the p genes from the gth group. In a microarray study, the observed data is either the log-transformed absolute gene expression with 0 so far out in the tail or the log-ratios. In both cases, the normal approximation is often met. 16 Assumptions on Data Assume Xg={xgij}, the observed data matrix in the gth group, has a p-dimensional normal distribution, conditional on the mean parameter μg and the covariance matrix Σg: p Xg |g , g 1 1 pNg 1 Ng 2 2 |g | 2 Ng 1 exp 12 i x g x gi g gi g 1 where xgi, i=1,...,Ng, is an independent observation (vector) on the p genes from the g-th group. 1 N g |g | 2 1 pN g 2 2 1 1 exp 12 N g x x tr S g g g g g g g 17 Prior Parameters In p-dimensional multivariate normal data with complete observations, a conjugate prior can be specified for the gth population by: g |g N p g , 1 g , g IWp , . where νg is the mean p-vector of the prior distribution of the mean, and κis a scalar for the scale of this prior distribution. IWp(Λ,η) denotes the inverted Wishart distribution with (η-p-1) degrees of freedom and scale matrix Λ, a p×p positive definite matrix, whose density is: p g where 1 p 1 || 2 |g | 2 p1exp 12 tr g 1 1 22 p 2 p t pp1/4 1 p t 12 j 1 , the multivariate gamma function j 1 ηis a scalar for the scale parameter of the prior covariance distribution, and Λis a p×p positive definite matrix for the shape parameter of this prior distribution. We assume that the two populations are independent a priori, i.e.: p 1 , 2 , 1 , 2 p 1 , 1 p 2 , 2 18 Hierarchical Models for Comparing Two Covariance Matrices Models Conjugate Priors Two covariances are: H0: Equal: 1 2 H1: Proportionally equal 1 , 2 c 2 H2: correlationally equal 1 , 2 CC H3: Freely different: 1 2 Np 1 |1 , 1 , Np 2 |2 , 1 , and IWp |, 1 |N p 1 , 1 , 2 |, c N p 2 , 1 c 2 IWp , , c p c , c 1 |N p 1 , 1 , 2 |, C N p 2 , 1 CC IWp , , C p C , C 1 |1 N p 1 , 1 1 , 2 |2 N p 2 , 1 2 1 IWp , , 2 IWp , 19 Bayes Factor for Testing Whether the Two Covariance Matrices Are Equal or Not Equal The following Bayes factor is to compare the Models H3 and H0: B 30 || 1 2 p 2 g1 2 p p N g 2 g1 Ng 2 g 1 N g Ng Ng 2 N 2 x x S g g g g x x S g g g g g 1 2 N 1 2 Ng . # 20 Bayes Factor for Testing Whether the Two Covariance Matrices Are Equal or Proportional The following Bayes factor is to compare the proportional model and equivalent model: 2 B 10 g1 c pN N g x x S g g g g N g N 1 N1 2 c 2 N 2 N2 x x 1 1 1 1 x x 2 2 2 2 1 N 2 12 N p c dc. # S1 c 2 S 2 where p(c) is any chosen positively-ranged univariate probability distribution. 21 Bayes Factor for Testing Whether the Two Covariance Matrices Are Proportionally or Correlationally Equal The following Bayes factor is to compare the correlational model and proportional model: N 1 N1 |C| N 2 N 2 N2 12 N x x 1 1 1 1 1 C 1 x x 2 2 2 2 C p C dC S 1 C 1 S 2 C 1 B 21 N 1 N1 c pN 2 c 2 N 2 N2 x x 1 1 1 1 x x 2 2 2 2 12 N p c dc. S1 c 2 S 2 where p(c) is any chosen positively-ranged univariate distribution, and p(C) is any chosen positively-ranged multivariate distribution. 22 Bayes Factor for Testing Whether the Two Covariance Matrices Are Correlationally Equal or Different The following Bayes factor is to compare the different model and correlational model: B 32 || 1 2 2 g 1 p p 2 g1 p 2 N g Ng N 2 N2 2 N 2 # x x g g g g S g N 1 N1 |C|N2 Ng x x 1 1 1 1 1 C1 x x 2 2 2 2 C 12 Ng 12 N . p C dC S 1 C1 S 2 C1 where p(C) is any chosen positively-ranged multivariate distribution. 23 Simulation Study 1 The first set of simulations is conducted to determine the power of the Bayes factor B30 in comparison with that of likelihood ratio test for several sets of prior parameters. Assumption of Prior Parameters: • The location parameters (ν1 and ν2) for the two prior population means are assumed to be a common p-vector of 0’s. The scale parameter κ for the prior population means is set to be 1, 15, 60, or 120 in the range of sample size to be simulated. • The parameters (η and Λ) for the prior population covariance matrix are chosen according to the first moment of the inverted Wishart distribution. The shape parameter η is set to be p+2, p+4, p+7, or p+12. The scale matrix Λ is assumed to be the product of a positive scalar (1, 2, 4, or 8) to the identity matrix. 24 Procedure for Power Estimation 1. 2. 3. Draw one sample (Σ) from the common inverted Wishart distribution IW(Λ,η). Draw one p-variate normal sample separately for each of the two population means (μ1,μ2) with common prior mean (ν) and covariance matrix (1/κΣ). Generate N1 p-variate normal samples for the first group with mean μ1 and covariance matrix Σ, and N2 p-variate normal samples for the second group with mean μ2+ ε (ε =0 or 2, the difference between two sample means) and covariance matrix which is equal to either: 4. 5. 6. 7. δΣ: Proportional to the first group, δ= 1, 1.2, 1.5, or 2.0 - Scenario 1, or ΔΣΔ: Correlationally equal to the first group, where Δ is a diagonal matrix {σq,σq+1,...,σq+(p-1)}, σ=1.0, 1.1, 1.2, or 1.5, q=floor(-p/2) - Scenario 2. Compute the Bayes factors (B30) and LRT statistics. Repeat steps 1-4 for 1000 times Set the significance limit as the 95th %tile under the null condition: δ=1 or σ=1. Compute the power of the Bayes factor and LRT as the proportion of B30’s and LRT’s that are greater than the significance limit. 25 Comparison of Power between LRT and Bayes Factor (B30) Effect of sample size (N1=N2=N*p), dimension (p), and difference of sample means (ε) Scenario 1 p Test ε 0 10 15 20 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=2*p N=3*p N=5*p N=10*p 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 2 5 1.2 1.5 2 0 1 1.2 1.5 2 0 1 1.2 1.5 2 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.2 1.5 2 0 1 1.2 1.5 2 1 1.2 1.5 2 1 1.2 1.5 2 0 1 1.2 1.5 2 X-axis: δ (effect size in Scenario 1), Y-axis: Power, Priors: ν1= ν2=0, κ=1, η=p+2, Λ=Ip 26 Comparison of Power between LRT and Bayes Factor (B30) Effect of sample size (N1=N2=N*p), dimension (p), and difference of sample means (ε) Scenario 2 p Test 15 20 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=3*p N=5*p N=10*p 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 2 10 N=2*p ε 0 5 1.1 1.2 1.5 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 1 1.1 1.2 1.5 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 X-axis: σ (effect size in Scenario 2), Y-axis: Power, Priors: ν1= ν2= 0, κ=1, η=p+2, Λ=Ip 27 Effect of Sample Size and Dimension on the Power of LRT and Bayes Factor (B30) • • In Scenario 1, B30 has greater power than LRT when the sample sizes are 2 times of dimension and there is no difference between the two sample means. As the sample size increases proportionally to the dimension or the two sample means become different, however, LRT becomes more powerful than B30 in most cases. In Scenario 2, LRT shows greater power than B30 under all conditions. 28 Comparison of Power between LRT and Bayes Factor (B30) Effect of κ with different sample sizes (N1=N2=N*p, p=10) Scenario 1 κ Test 1 15 60 120 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.2 1.5 2 0 1 1.2 1.5 2 0 1 1.2 1.5 2 1 1.2 1.5 2 X-axis: δ (effect size in Scenario 1), Y-axis: Power, Priors: ν1 = ν2 = 0, η=p+2, Λ=Ip 29 Comparison of Power between LRT and Bayes Factor (B30) Effect of κ with different sample sizes (N1=N2=N*p, p=10) Scenario 2 κ Test 1 15 60 120 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 1 1.1 1.2 1.5 X-axis: σ (effect size in Scenario 2), Y-axis: Power, Priors: ν1 = ν2 = 0, η=p+2, Λ=Ip 30 Comparison of Power between LRT and Bayes Factor (B30) Effect of η with different sample sizes (N1=N2=N*p, p=10) Scenario 1 η Test p+2 p+4 p+7 p+12 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.2 1.5 2 0 1 1.2 1.5 2 0 1 1.2 1.5 2 1 1.2 1.5 2 X-axis: δ (effect size in Scenario 1), Y-axis: Power, Priors: ν1 = ν2 = 0, κ = 1, Λ=Ip 31 Comparison of Power between LRT and Bayes Factor (B30) Effect of η with different sample sizes (N1=N2=N*p, p=10) Scenario 2 η Test p+2 p+4 p+7 p+12 LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 1 1.1 1.2 1.5 X-axis: σ (effect size in Scenario 2), Y-axis: Power, Priors: ν1 = ν2 = 0, κ = 1, Λ=Ip 32 Comparison of Power between LRT and Bayes Factor (B30) Effect of Λ with different sample sizes (N1=N2=N*p, p=10) Scenario 1 Λ (Diagonal) Test Ip 2*Ip 4*Ip 8*Ip LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.2 1.5 2 0 1 1.2 1.5 2 0 1 1.2 1.5 2 1 1.2 1.5 2 X-axis: δ (effect size in Scenario 1), Y-axis: Power, Priors: ν1 = ν2 = 0, κ = 1,η=p+2 33 Comparison of Power between LRT and Bayes Factor (B30) Effect of Λ with different sample sizes (N1=N2=N*p, p=10) Scenario 2 Λ (Diagonal) Test Ip 2*Ip 4*Ip 8*Ip LRT ─■─ ─■─ ─■─ ─■─ B30 ─▲─ ─▲─ ─▲─ ─▲─ N=15 N=20 N=30 N=50 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 0 1 1.1 1.2 1.5 1 1.1 1.2 1.5 X-axis: σ (effect size in Scenario 2), Y-axis: Power, Priors: ν1 = ν2 = 0, κ = 1,η=p+2 34 Effect of Prior Parameters on the Power of LRT and Bayes Factor (B30) • • • No effect of κ is found on the power of either B30 or LRT. The power of both LRT and B30 decreases as η increases. No effect of Λ is observed on the power of either B30 or LRT. 35 Simulation Study 2 The power of the three hierarchical Bayes Factors comparing proportional vs equal models (B10), correlationally vs proportionally equal models (B21), and Freely different vs correlationally equal models (B32) is compared in this set of simulations. The power is estimated as follows: 1. Draw one sample (Σ) from the inverted Wishart distribution IW(Λ,η). 2. Draw one sample (c) from the log-normal distribution (0, 1). 3. Draw one sample (C) from the multivariate log-normal distribution (0, Ip). 4. Generate one p-variate normal sample for each of the two population means (μ1,μ2) with common prior mean (ν) and covariance matrix (1/kΣ, c2/kΣ, or 1/κCΣC depending on the underlying model). 5. Generate N1 p-variate normal samples for the first group with means μ1 and covariances Σ, and N2 p-variate normal samples for the second group with means μ2 and variances δΣ (Scenario 1) or ΔΣΔ (Scenario 2) as before. 6. Compute the integrants of c and C. 7. Repeat steps 2-6 for 1000 times to estimate the integrals relative to c and C using simple Monte Carlo simulations. 8. Compute the Bayes factors. 9. Repeat steps 1-8 for 1000 times. 10. Set the significance limit as the 95th %tile when δ=1 or σ=1. 11. Compute the power of the Bayes factors. 36 Comparison of Power between Bayes Factors B10, B21 and B32 B21 B32 ─▲─ ─▲─ ─▲─ Scenario 1 p 2 B10 Scenario 2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 1 5 2 4 8 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 1 1.2 1.5 2 1 1.2 1.5 2 0 1 2 4 8 X-axis: δ(Scenario 1) or σ (Scenario 2), Y-axis: Power Sample Sizes: N1=N2=10, Priors: ν1 = ν2 = 0, κ = 1, η = p+2, Λ = Ip 37 Case Study Bhattacharjee et al. PNAS, 2001 12,600 gene expression measurements obtained using Affymetrix oligonucleotide arrays. 203 cancer patients and normal subjects, 5 predefined cancer subtypes, plus staging and survival information. 38 Data Preprocessing • • • Multiple transcripts of the same genes defined by Locuslink are averaged, which resulted in a total of 8895 distinct genes. Multiple specimens from the same subject are averaged. The data are centered separately for each gene within each group 39 Determination of Priors • • • ν1 and ν2 are set to 0 considering that the data are centered in each group. k is set to be 0 to minimize effect of differential means between prior and sample means, if any. η and Λ are set to be the maximum likelihood and unbiased estimates, respectively from the data excluding genes that are involved in the pathways being analyzed. 4) The genes that are contained in a specific pathway (say, p genes) are excluded from the genes on the chip (say, G genes). A total of g (=(G-p)/p) exclusive subsets of genes are randomly generated from (G-p) genes. For each subset of genes, the sample covariance matrix (Si) was computed for the two groups combined. Λ is set to be the unbiased estimate expressed as: 5) 1 n 1 p 1 , where 1n i i . maximum likelihood estimator of η. 1 the A recursive approach is used to compute 1) 2) 3) 40 Comparisons of the Myosin Signaling Pathways between Above and Below the Median Survival -0.05 0.05 0.00 0.05 -0.15 -0.05 0.05 Permuted Covariances Correlations Observed LR_P 0 BF_P 0.018 LR_P 0 BF_P 0.018 -0.4 0.0 0.4 Permuted -0.02 0.02 -0.10 Observed LR_P 0 BF_P 0.018 -0.15 Observed 0.43 -0.05 0.00 0.05 LR_P Variances -0.10 Observed Means -0.02 0.00 0.02 Permuted 0.04 -0.4 -0.2 0.0 0.2 0.4 Permuted 41 Conclusions • • • Four conjugate Bayes factors are derived to compare two covariance matrices in four hierarchical models. The power of proposed Bayes factors is evaluated with different prior parameters in comparison to that of the likelihood ratio test. One of the proposed Bayes factor is applied to a microarray analysis and the gene expression of one pathway is found to significantly differ in variances but not in means between two survival groups. 42 Future Studies Sensitivity of Bayes factors to prior distributions Bayes factors comparing principal components of covariance matrices 43 Acknowledgments Research Committee: Director: Prof. Giovanni Parmigiani, Johns Hopkins University Co-director: Prof. John Lachin, George Washington University Readers: Prof. Reza Modarres and Naji Younes Examination Committee: Chairman: Prof. Efstathia Bura GWU Members: Prof. Yinglei Lai,Reza Modarres,Naji Younes Extramural Member: Prof. Bruce Trock, Johns Hopkins University The Biostatistics Center, GWU Friends and Family 44