Analysis of dose-response microarray data using Bayesian Variable Selection (BVS) methods: Modeling and multiplicity adjustments
Ziv Shkedy, Martin Otava, Adetayo Kasim and Dan Lin
Center for Statistics (CenStat), Hasselt University, Belgium, and Durham University, UK
7th meeting of the Eastern Mediterranean Region of the International Biometric Society (EMR-IBS), Tel Aviv, 22/04–25/04/2013

Research team
Hasselt University, Belgium
• Dan Lin
• Ziv Shkedy
• Martin Otava
Durham University, UK
• Adetayo Kasim
Imperial College, UK
• Bernet Kato
Johnson & Johnson Pharmaceutical
• Luc Bijnens
• Willem Talloen
• Hinrich Gohlmann
• Dhammika Amaratunga

Overview
• Introduction to dose-response modeling in microarray experiments.
• Primary interest: selection of a subset of genes with a significant monotone dose-response relationship.
• Focus:
  1. Estimation and inference under order restrictions.
  2. Multiplicity adjustment.
• Methodology: Bayesian Variable Selection models.

Dose-response microarray experiment
[Figure: example of four genes with different dose-response relationships.]
• 4 dose levels, 16988 genes.
• Primary interest: detection of genes with a monotone dose-response relationship.

Estimation and inference under order restrictions
• Primary interest: discovery of genes with a monotone relationship with respect to dose.
• Order restricted inference.
• Simple order (= monotone) alternatives:
  $H_0: \mu_0 = \mu_1 = \dots = \mu_K$
  $H_1: \mu_0 \le \mu_1 \le \dots \le \mu_K$, with at least one strict inequality.
• One test per gene: $H_{01}, H_{02}, \dots, H_{0g}, \dots, H_{0m}$, i.e. 16988 null hypotheses to test.

Model formulation (1)
• Gene-specific model: a one-way ANOVA with order-restricted parameters.
• Simple order (monotone profiles):
  $Y_{ij} \sim N(\mu_i, \sigma^2)$, $\mu_0 \le \mu_1 \le \dots \le \mu_K$,
  where $Y_{ij}$ is the expression of the gene on the $j$-th array at dose level $i = 0, 1, \dots, K$.
• The order constraints are built into the specification of the prior distributions (Gelfand, Smith and Lee, 1992).

Model formulation (1), continued
• Likelihood: $Y_{ij} \sim N(\mu_i, \sigma^2)$.
• Prior: $\mu_i \sim N(\mu, \tau^2)\, I(\mu_{i-1}, \mu_{i+1})$, where $N(\mu, \tau^2)$ is the unconstrained prior; that is,
  $P(\mu_i \mid \mu_{i-1}, \mu_{i+1}) \propto N(\mu_i; \mu, \tau^2)$ if $\mu_{i-1} \le \mu_i \le \mu_{i+1}$, and $0$ otherwise.
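Building the constraints into the prior makes Gibbs sampling convenient (Gelfand, Smith and Lee, 1992): given its neighbours, the full conditional of $\mu_i$ is a normal distribution truncated to $[\mu_{i-1}, \mu_{i+1}]$. The Python sketch below illustrates such a sampler for a single gene. It assumes that $\sigma^2$, $\mu$ and $\tau^2$ are known and fixed (a full analysis would also update $\sigma^2$), draws the truncated normal by inverse-CDF sampling with the standard library's `statistics.NormalDist`, and runs on hypothetical expression values that are not taken from the study. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw from N(mean, sd^2) truncated to [lower, upper] by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    p = rng.uniform(dist.cdf(lower), dist.cdf(upper))
    # Keep p strictly inside (0, 1), where inv_cdf is defined.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    # Clip to guard against round-off at the truncation points.
    return min(max(dist.inv_cdf(p), lower), upper)


def gibbs_sweep(mu, groups, sigma2, prior_mean, prior_var, rng):
    """Update mu_0, ..., mu_K in turn under mu_0 <= mu_1 <= ... <= mu_K.

    sigma2, prior_mean and prior_var are treated as known (assumption of this sketch).
    """
    last = len(mu) - 1
    for i, y in enumerate(groups):
        # Conjugate normal update for dose level i ...
        precision = len(y) / sigma2 + 1.0 / prior_var
        post_mean = (sum(y) / sigma2 + prior_mean / prior_var) / precision
        # ... truncated to the interval allowed by the neighbouring means.
        lower = mu[i - 1] if i > 0 else -math.inf
        upper = mu[i + 1] if i < last else math.inf
        mu[i] = sample_truncated_normal(
            post_mean, math.sqrt(1.0 / precision), lower, upper, rng
        )
    return mu


# Hypothetical expression values: control + 3 doses, 3 arrays per dose level.
groups = [[8.2, 8.3, 8.1], [8.3, 8.2, 8.4], [8.6, 8.7, 8.5], [8.9, 8.8, 9.0]]
rng = random.Random(1)
overall = fmean(x for g in groups for x in g)
mu = [overall] * len(groups)  # a constant start satisfies the order constraint
draws = []
for _ in range(2000):
    draws.append(list(gibbs_sweep(mu, groups, 0.01, overall, 10.0, rng)))
post_means = [fmean(d[i] for d in draws[500:]) for i in range(len(groups))]
print([round(m, 2) for m in post_means])
```

Because each update stays inside the interval set by the current neighbours, every stored draw satisfies $\mu_0 \le \mu_1 \le \dots \le \mu_K$ without any rejection step.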
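Model formulation (2)
• Reformulation of the mean structure:
  $Y_{ij} \sim N(\mu_i, \sigma^2)$, $\mu_i = \mu_0 + \sum_{\ell=1}^{i} \delta_\ell$,
  $\mu_0 \sim N(\mu, \tau^2)$, $\delta_\ell \sim N(0, \tau^2)\, I(0, \infty)$.
• For a dose-response experiment with 4 dose levels (control + 3 doses):
  dose | c | d1 | d2 | d3
  mean | $\mu_0$ | $\mu_0 + \delta_1$ | $\mu_0 + \delta_1 + \delta_2$ | $\mu_0 + \delta_1 + \delta_2 + \delta_3$
• $\delta_i \ge 0$ for $i = 1, \dots, K$ $\Leftrightarrow$ $\mu_0 \le \mu_1 \le \dots \le \mu_K$.

Example of one gene (13386)
• We fitted two monotone models: $g_7: \mu_0 < \mu_1 < \mu_2 < \mu_3$ and $g_5: \mu_0 = \mu_1 < \mu_2 < \mu_3$.
[Figure: gene 13386, gene expression versus dose, with the fitted $g_7$ and $g_5$ profiles.]
• Equality constraints are replaced with a single parameter.

Inference
• Simple order alternative: $H_0: \mu_0 = \mu_1 = \dots = \mu_K$ versus $H_1: \mu_0 \le \mu_1 \le \dots \le \mu_K$, where $\delta_i = \mu_i - \mu_{i-1} \ge 0$.
• Model $g_5: \mu_0 = \mu_1 < \mu_2 < \mu_3$ sets $\delta_1 = 0$:
  dose | c | d1 | d2 | d3
  mean | $\mu_0$ | $\mu_0$ | $\mu_0 + \delta_2$ | $\mu_0 + \delta_2 + \delta_3$
[Figure: fitted $g_5$ profile, gene expression versus dose.]

All possible monotone dose-response models
• Simple order alternative with 4 dose levels:
  $g_0: \mu_0 = \mu_1 = \mu_2 = \mu_3$ (the null model);
  $g_1, \dots, g_6$: the six patterns with exactly one or two strict inequalities, e.g. $g_5: \mu_0 = \mu_1 < \mu_2 < \mu_3$;
  $g_7: \mu_0 < \mu_1 < \mu_2 < \mu_3$.
• We decompose the simple order alternative $H_1: \mu_0 \le \mu_1 \le \mu_2 \le \mu_3$ into all its sub-alternatives $g_1, \dots, g_7$.
• In terms of the increments: $g_5$: $\delta_1 = 0$, $\delta_2 > 0$, $\delta_3 > 0$; $g_0$: $\delta_1 = \delta_2 = \delta_3 = 0$.

Bayesian variable selection: model formulation for order-restricted models
• The mean structure: $\mu_i = \mu_0 + \sum_{\ell=1}^{i} \delta_\ell$.
• Bayesian Variable Selection: a procedure for deciding which of the model parameters $\delta_1, \dots, \delta_K$ are equal to zero.
• Define an indicator variable: $z_i = 1$ if $\delta_i$ is included in the model, and $z_i = 0$ if it is not.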
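With $K = 3$ increments there are $2^3 = 8$ indicator vectors $z = (z_1, z_2, z_3)$, one for each monotone model $g_0, \dots, g_7$. The Python sketch below enumerates them, prints the order pattern each vector implies, and evaluates the mean profile $\mu_i = \mu_0 + \sum_{\ell \le i} z_\ell \delta_\ell$. The helper names and the increments are hypothetical, and the code deliberately attaches no $g$-labels: only $g_0$ ($z = (0, 0, 0)$), $g_5$ ($z = (0, 1, 1)$) and $g_7$ ($z = (1, 1, 1)$) are identified explicitly in the slides.

```python
from itertools import product


def order_pattern(z):
    """Order pattern of mu_0, ..., mu_K implied by the inclusion indicators z.

    z_l = 1 (delta_l > 0) gives mu_{l-1} < mu_l; z_l = 0 gives mu_{l-1} = mu_l.
    """
    parts = ["mu0"]
    for ell, z_ell in enumerate(z, start=1):
        parts += ["<" if z_ell else "=", f"mu{ell}"]
    return " ".join(parts)


def mean_profile(mu0, delta, z):
    """mu_i = mu_0 + sum_{l <= i} z_l * delta_l, for i = 0, ..., K."""
    profile, current = [mu0], mu0
    for d, z_ell in zip(delta, z):
        current += z_ell * d
        profile.append(current)
    return profile


K = 3
models = list(product((0, 1), repeat=K))  # all 2**K indicator vectors
for z in models:
    print(z, order_pattern(z))

# Hypothetical increments; z = (0, 1, 1) switches off delta_1 (model g5).
print(mean_profile(0.0, [1.0, 2.0, 3.0], (0, 1, 1)))  # [0.0, 0.0, 2.0, 5.0]
```

Setting an indicator to zero merges two adjacent dose levels into one mean, which is how the equality constraints reduce to a single parameter.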
Bayesian variable selection: model formulation for order-restricted models (continued)
• The mean structure for a candidate model $g_r \in S$, where $S$ is the set of $2^K$ candidate models built from the parameters $\mu_0, \delta_1, \dots, \delta_K$:
  $\mu_i = \mu_0 + \sum_{\ell=1}^{i} z_\ell \delta_\ell$, where $z_\ell = 1$ if $\delta_\ell$ is included in the model and $z_\ell = 0$ otherwise.
• Order restrictions: $\mu_0 \sim N(\mu, \tau^2)$, $\delta_\ell \sim N(0, \tau^2)\, I(0, \infty)$.
• Variable selection: $z_\ell \sim \text{Bernoulli}(\pi_\ell)$, $\pi_\ell \sim U(0, 1)$.
• The same model provides estimation, inference and model selection.

The posterior probability of the null model
• The posterior probability that the triplet $z = (z_1, z_2, z_3)$ equals zero: $p(z = (0, 0, 0) \mid \text{data}, R)$, where $R$ denotes the order restrictions.
• If $z = (0, 0, 0)$, then $\mu_i = \mu_0$ for every $i$, i.e. $\mu_0 = \mu_1 = \mu_2 = \mu_3$. Hence $p(z = (0, 0, 0) \mid \text{data}, R) = p(g_0 \mid \text{data}, R)$.

Example: gene 3413
• The highest posterior probability is obtained for the null model: $p(g_0 \mid \text{data}, R) = 0.514$.
• Shrinkage toward the overall mean.
[Figure: gene 3413, gene expression versus dose with the $g_7$, BVS and null-model fits; bar plot of the posterior model probabilities for $g_0, g_3, g_2, g_6, g_1, g_4, g_5, g_7$.]

Example: gene 13386
• $p(g_0 \mid \text{data}, R) = 0.001$ for $\mu_0 = \mu_1 = \mu_2 = \mu_3$; $p(g_1 \mid \text{data}, R) = 0.4059$; $p(g_5 \mid \text{data}, R) = 0.4186$ for $\mu_0 = \mu_1 < \mu_2 < \mu_3$.
• The highest posterior probability is obtained for model $g_5$.
• The data do not support the null model.
[Figure: gene 13386, gene expression versus dose with the $g_7$, $g_5$ and BVS fits; bar plot of the posterior model probabilities for $g_0, g_3, g_2, g_6, g_1, g_4, g_5, g_7$.]

Multiplicity adjustment
• Primary interest: discovery of a subset of genes with a monotone relationship with respect to dose.
• Gene $g$ enters the discovery list when its posterior probability of the null model is below a threshold $\tau$:
  $I_g = 1$ if $p_g(g_0 \mid \text{data}, R) < \tau$ (gene $g$ is included in the discovery list), and $I_g = 0$ otherwise.
• $N(\tau) = \sum_{g=1}^{m} I_g$: the number of genes in the discovery list.
• $cFD(\tau) = \sum_{g=1}^{m} p_g(g_0 \mid \text{data}, R)\, I_g$ and $cFDR(\tau) = cFD(\tau) / N(\tau)$.

Multiplicity adjustment: example
• For $\tau = 0.102$: $cFDR(0.102) = 5\%$, with $N(0.102) = 3295$ genes in the discovery list.
• $cFDR(0.102)$ is the expected error rate for the list that includes all genes whose posterior probability of the null model is below 0.102.

Discussion & To Do list
• BVS methods: estimation and inference.
• Multiplicity adjustment is based on the posterior probability of the null model.
• Connection between BVS and multiple contrast tests (MCT).
• Connection between BVS and Bayesian model averaging.
• BVS for order-restricted but non-monotone alternatives (umbrella alternatives / partial order alternatives).
• Posterior probabilities for the number of levels and the level probabilities for isotonic regression.

Thank you!
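Supplementary sketch for the multiplicity-adjustment slides. The Python code below assumes that MCMC draws of the indicator vector $z$ are available for each gene, estimates $p_g(g_0 \mid \text{data}, R)$ as the fraction of draws with $z = (0, 0, 0)$, and then evaluates $N(\tau)$ and $cFDR(\tau)$ as defined on the slide. It also finds the largest discovery list whose cFDR stays at or below a target level, which mirrors choosing $\tau$ so that cFDR equals 5%. The function names, the simulated draws and the five posterior probabilities are hypothetical; they are not results from the 16988-gene data set.

```python
import random


def posterior_null_prob(z_draws):
    """Fraction of MCMC draws in which every inclusion indicator is 0 (model g0)."""
    return sum(1 for z in z_draws if not any(z)) / len(z_draws)


def cfdr(p_null, tau):
    """Return (N(tau), cFDR(tau)) for the list {g : p_g(g0 | data, R) < tau}."""
    selected = [p for p in p_null if p < tau]
    n = len(selected)
    # cFD(tau) is the sum of the selected null probabilities; cFDR = cFD / N.
    return n, (sum(selected) / n if n else 0.0)


def largest_list(p_null, alpha):
    """Size and cFDR of the largest discovery list with cFDR <= alpha.

    Genes are added in increasing order of p_g. The running mean of sorted
    values never decreases, so the scan can stop at the first violation.
    """
    total, k = 0.0, 0
    for p in sorted(p_null):
        if (total + p) / (k + 1) > alpha:
            break
        total += p
        k += 1
    return k, (total / k if k else 0.0)


rng = random.Random(0)
# Hypothetical indicator draws (z1, z2, z3) for one gene, each z_l ~ Bernoulli(0.5).
z_draws = [tuple(int(rng.random() < 0.5) for _ in range(3)) for _ in range(4000)]
print(posterior_null_prob(z_draws))  # close to 0.5**3 = 0.125 in expectation

# Hypothetical posterior null probabilities for five genes.
p_null = [0.001, 0.02, 0.05, 0.3, 0.9]
print(cfdr(p_null, 0.1))
print(largest_list(p_null, 0.05))
```

Ties in $p_g$ are not treated specially here; with a strict "< τ" rule, genes sharing a probability at the threshold enter or leave the list together.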