A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer
May 16, 2013
Levi Waldron
Supervisors: Curtis Huttenhower and Giovanni Parmigiani
Harvard School of Public Health, Department of Biostatistics
Dana Farber Cancer Institute, Biostatistics and Computational Biology

Predictive modeling for translational genomics
• Measure Xij (gene expression, mutations, …)
• Predict Yj (survival, treatment response, …)

Training + Validation
Cross-validation to estimate prediction accuracy (training / test)
Methods: Lasso, Ridge, Elastic Net, Random Forests, Support Vector Machine, K Nearest Neighbors, Supervised PCA, Linear Discriminant Analysis, Boosting / Bagging, Insert Favorite Method Here
Independent validation (Dataset 2, Dataset 3)
• Need a new cohort of patients
• Can use public data

Prognostic gene signatures of ovarian cancer
Objectives:
1. Assess the reproducibility of published prognostic gene expression models
2. Evaluate published models using publicly available data
3. Improve on models using all publicly available data
4. Validate promising models in FFPE specimens from GOG-218 bevacizumab phase-III clinical trial
With Michael Birrer, MD (MGH)

23 ovarian cancer microarray studies
[Flowchart: curation pipeline for curatedOvarianData]
Curation of clinical annotation (machine syntax check ✔, human double check ✔):

Before curation (study-specific formats):
ID      Debulk  Status
D2640   S       0

ID      ch1.3   Status
GSM123  opt     DOD

After curation (standard template):
sampleid  debulking   vital_status
D2640     suboptimal  living
GSM123    optimal     deceased

Expression data:
• Download expression data
• Affymetrix platform? (Y/N) Raw data? (Y/N) If yes: (f)RMA re-normalization
• Collapse probesets to genes

Probeset     Gene   GSM123  GSM124
204531_s_at  BRCA1  4.0     4.1
211851_x_at  BRCA1  5.0     6.0

Gene   GSM123  GSM124
BRCA1  5.0     6.0

• Automatically build documented curatedOvarianData R package

Available in Bioconductor (v2.12):
> source("http://bioconductor.org/biocLite.R")
> biocLite("curatedOvarianData")

B.F. Ganzfried, M. Riester, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, X. V. Wang, M. Ahmadifar, M. Birrer, G. Parmigiani, C. Huttenhower, L. Waldron. curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome (DATABASE 2013).

Meta-analysis overview
Literature review of prognostic models:
• 101 papers from Pubmed search
• Five review papers
Inclusion criteria:
• Training sample size > 40
• Focus on late-stage serous
• Multivariate model
• Continuous risk score
• Claims to predict survival
• Possible to reproduce model
Result: 14 prediction models implemented, 100 pages documentation, survHD Bioconductor package

curatedOvarianData:
• Standardized clinical annotation and gene ID
• 23 studies, 2,908 samples
Inclusion criteria:
• Sample size > 40
• Primary tumors
• Overall survival available
• Events (deaths) > 15
• Late stage, high grade tumors
• Serous subtype
Result: 10 datasets, 1,386 samples

Assessment of prognostic signatures
Validation Statistics: 14 Models in 10 Datasets
[Figure: 14 prognostic signatures by 10 microarray datasets]
C-Index = Pr(g(Z1) > g(Z2) | T2 > T1)
• T1, T2 = times to death of two patients
• g(Z1), g(Z2) = predicted risk scores
• C = 0.5: expectation for random prediction
• C = 1 if the exact order of all deaths is predicted
[Figure panels: forest plot of C-Index by study; Kaplan-Meier estimate of survival over time]
L. Waldron et al. A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. Submitted.

Assessment of prognostic models
Validation Statistics: 14 Models in 10 Datasets
[Figure: 14 prognostic signatures by 10 microarray datasets; figure labels: 193, 10, 263]
Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011 474(7353):609-15.
Bonome et al. A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res. 2008 68(13):5478-86.

A little gene overlap corresponds to substantial risk score similarity
[Figure panels: risk scores; correlations; gene overlap]

Assessment of prognostic models
Dressman et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol. 2007 25(5):517-25.
Baggerly et al. Run batch effects potentially compromise the usefulness of genomic signatures for ovarian cancer. J Clin Oncol. 2008 26(7):1186-7.
Dressman et al. Retraction. J Clin Oncol. 2012 30(6):678.

Assessment of prognostic models
Conclusions:
• Validation datasets can be biased
• Most models make better predictions than random
• Large, consortium studies performed best
• None of these models are ready for the clinic

Assessment of gene signatures (not models)
• Start with a signature defined as a list of genes
• Fit a simple prediction algorithm (β = ±1)
• Compute “leave-one-in” matrix of C-statistics
• Repeat with random gene sets

                Test sets
Training sets   1    2    3    4    5
1               CV   Z12  Z13  Z14  Z15
2               Z21  CV   Z23  Z24  Z25
3               Z31  Z32  CV   Z34  Z35
4               Z41  Z42  Z43  CV   Z45
5               Z51  Z52  Z53  Z54  CV

Assessment of gene signatures
About half of gene signatures provide prognostic “value added” over 97.5% of random gene signatures.
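The C-Index definition and the β = ±1 scoring rule used in the signature assessment can be illustrated with a small sketch. It is written in plain Python rather than with the survHD package named in the deck, and it is not the authors' implementation: the function names (`c_index`, `sign_average_score`), the gene names, and the toy data are all hypothetical. Pairs with tied survival times are simply skipped here, an assumption a real analysis would need to revisit.

```python
from itertools import combinations


def c_index(risk, time, event):
    """Concordance: share of usable patient pairs in which the patient
    who died earlier has the higher predicted risk score."""
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(risk)), 2):
        if time[i] == time[j]:
            continue  # tied times are skipped in this sketch
        first, second = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue  # earlier time is censored: order of deaths unknown
        usable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5  # tied risk scores count as half
    return concordant / usable if usable else float("nan")


def sign_average_score(expression, signs):
    """Risk score with coefficients fixed at +1 or -1 (beta = +/-1).

    expression: dict gene -> expression value for one patient
    signs:      dict gene -> +1 or -1 (hypothetical signature)
    """
    genes = [g for g in signs if g in expression]
    return sum(signs[g] * expression[g] for g in genes) / len(genes)


# Hypothetical two-gene signature and four toy patients.
signature = {"GENE_A": +1, "GENE_B": -1}
patients = [
    {"GENE_A": 2.0, "GENE_B": 0.5},
    {"GENE_A": 1.0, "GENE_B": 1.0},
    {"GENE_A": 0.5, "GENE_B": 2.0},
    {"GENE_A": 1.5, "GENE_B": 1.0},
]
time = [10, 30, 40, 20]   # follow-up time per patient
event = [1, 1, 0, 1]      # 1 = death observed, 0 = censored

risk = [sign_average_score(p, signature) for p in patients]
toy_c = c_index(risk, time, event)
print("risk scores:", risk)
print("C-Index:", toy_c)
```

Repeating the same calculation with randomly drawn gene lists of the same size would give the reference distribution against which a published signature is compared.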
Prediction of surgical debulkability
• Standard treatment includes surgical debulking, but it is suboptimal for ~50% of cases
• What if we could predict suboptimal debulking from the biopsy?
M. Riester, W. Wei, L. Waldron, A. C. Culhane, L. Trippa, F. Michor, C. Huttenhower, G. Parmigiani, M. Birrer. Risk prediction for late-stage ovarian cancer by meta-analysis of 1,622 patient samples: Biologic and Clinical Correlations.

Validation of a meta-analysis discovery: prediction of suboptimal debulking
• Stage 1: public data, 200-gene signature
• qRT-PCR, 8-gene signature: 78 new specimens from Bonome et al. study (compare to AUC ~ 0.6 in microarray validation)
• Immunohistochemistry, 3-protein signature: 179 new specimens from tissue microarray (compare to AUC ~ 0.6 in microarray validation)
[Figure: number of cases by POSTN immunohistochemistry score (-, +, ++, +++)]

Outlook: Meta-analysis and Validation
• Meta-analysis for prediction modeling works
  – Provides sample size
  – Identifies and mitigates dataset-specific bias
• qRT-PCR and protein assays can dramatically improve prediction accuracy
• Model testing in meta-analysis by:
  – “leave-one-dataset-in” cross-validation
  – “leave-one-dataset-out” cross-validation

Reproducible analysis

Thank you
Giovanni Parmigiani lab: Markus Riester, Dave Zhao, Cristian Tomasetti, Emmanuele Mazzola, Jie Ding, Svitlana Tyekucheva, Victoria Wang, Ina Jazic, Ben Ganzfried, Romi Magori-Cohen
Curtis Huttenhower lab: Nicola Segata, Tim Tickle, Xochitl Morgan, Daniela Boernigen, Eric Franzosa, Brian Palmer, Joseph Moon, Emma Schwager, Jim Kaminski, Craig Bielski, Vagheesh Narasimhan
MGH – Boston: Michael Birrer
Dana-Farber Cancer Institute: Lorenzo Trippa
University of Montreal: Benjamin Haibe-Kains

HR increases with training sample size for most test sets
[Figure]

RNA-seq vs. microarray validation
TCGA validation dataset
[Figure]

Manuscripts and publications
1. B.F. Ganzfried* and M. Riester*, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, X. V. Wang, M. Ahmadifar, M. Birrer, G. Parmigiani, C. Huttenhower, L. Waldron. curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome (DATABASE 2013).
2. L. Waldron, B. Haibe-Kains, A. C. Culhane, M. Riester, J. Ding, V. Wang, S. Tyekucheva, C. Bernau, T. Risch, B. Ganzfried, C. Huttenhower, M. Birrer, G. Parmigiani. A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer (submitted).
3. M. Riester, W. Wei, L. Waldron, A. C. Culhane, L. Trippa, F. Michor, C. Huttenhower, G. Parmigiani, M. Birrer. Risk prediction for late-stage ovarian cancer by meta-analysis of 1,622 patient samples: Biologic and Clinical Correlations (submitted).
4. D. Zhao, C. Huttenhower, G. Parmigiani, L. Waldron. Mas-o-menos: a simple sign average method for discrimination in genomic data analysis (submitted, preprint at http://biostats.bepress.com/harvardbiostat/paper158/).
5. L. Trippa, L. Waldron, C. Huttenhower, G. Parmigiani. Cross-study validation of prediction methods (submitted).
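The “leave-one-dataset-in” design from the Outlook slide (train on one study, test on every other study, and collect the resulting C-statistics in a matrix) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the method of the manuscripts listed above: `train_signs` is a hypothetical stand-in trainer that ignores censoring, the study names and gene names are invented, and the diagonal is left empty where a within-study cross-validation estimate would go.

```python
from itertools import combinations


def c_index(risk, time, event):
    """Share of usable pairs where the earlier death has the higher risk."""
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(risk)), 2):
        if time[i] == time[j]:
            continue  # tied times are skipped in this sketch
        first, second = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue  # earlier time censored: pair is not usable
        usable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


def train_signs(dataset):
    """Hypothetical trainer: +1 if higher expression goes with shorter
    survival time in the training study, else -1 (censoring ignored)."""
    times = dataset["time"]
    mean_t = sum(times) / len(times)
    signs = {}
    for gene, values in dataset["expr"].items():
        mean_x = sum(values) / len(values)
        cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(values, times))
        signs[gene] = -1 if cov > 0 else 1
    return signs


def predict(signs, dataset):
    """Sign-average risk score for every patient in a study."""
    n = len(dataset["time"])
    return [
        sum(s * dataset["expr"][g][k] for g, s in signs.items()) / len(signs)
        for k in range(n)
    ]


def cross_study_matrix(datasets):
    """Rows are training studies, columns are test studies."""
    matrix = {}
    for train in datasets:
        signs = train_signs(datasets[train])
        for test in datasets:
            if train == test:
                matrix[train, test] = None  # slot for within-study CV
            else:
                d = datasets[test]
                matrix[train, test] = c_index(
                    predict(signs, d), d["time"], d["event"]
                )
    return matrix


# Three invented toy studies; per-gene values are listed per patient.
studies = {
    "A": {"expr": {"G1": [3.0, 2.0, 1.0], "G2": [1.0, 2.0, 3.0]},
          "time": [5, 10, 15], "event": [1, 1, 1]},
    "B": {"expr": {"G1": [2.5, 1.5, 0.5, 2.0], "G2": [0.5, 1.0, 2.0, 0.8]},
          "time": [4, 12, 20, 8], "event": [1, 1, 0, 1]},
    "C": {"expr": {"G1": [1.0, 2.0, 3.0], "G2": [3.0, 2.5, 1.0]},
          "time": [30, 18, 6], "event": [0, 1, 1]},
}

for (train, test), value in cross_study_matrix(studies).items():
    print(f"train={train} test={test} C={value}")
```

A “leave-one-dataset-out” variant would instead pool all but one study for training and test on the study held out.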