Introduction to Challenge 2
The NIEHS - NCATS - UNC DREAM Toxicogenetics Challenge
THE DATA
Fred A. Wright, Ph.D.
Professor and Director of the Bioinformatics Research Center
Departments of Statistics and Biological Sciences
North Carolina State University
amateurbrainsurgery.com

In vitro cytotoxicity screening of human cell lines to characterize variability and map susceptibility loci
• Many caveats are obvious, but bear repeating:
  • limitations of the in vitro environment
  • cell type
  • sources of technical variation
• On the other hand, we are working with the correct species, and there is much that can be done:
  • heritability analysis
  • identification of potential mechanisms underlying variability, mostly via genetic mapping
  • characterization of average response and variation across agents/chemicals, to prioritize
  • in vitro data used for predictive toxicity models

[Figure: timeline marked 1983, 1996, 2007, 2009, 2010. Image courtesy of M. Andersen and D. Krewski]

• Much of the previous work has been in pharmacogenomics, especially cytotoxicity screening of anticancer agents
• However, most of the principles apply to any agent/chemical
[Figure: cytotoxicity heritability estimates from 125 lymphoblastoid cell lines (LCLs), 29 chemotherapeutic agents]

CYTOTOXICITY PROFILING – BOILING DOWN TO A NUMBER(?)
Challenge: estimation of cytotoxic response or other relevant phenotype per cell line in the presence of variation
Solution: likelihood-based fitting of EC10 values, with outlier detection and batch correction
Experiments done in batches
[Figure: two dose-response panels; x-axis: log10(concentration); y-axis: cytotoxicity (normalized % cell survival)]

The concept of population toxicity involves means and true variability, obscured by technical variation
[Figure, Chemical 1: measurement variation, true variation across population, observed data]
Measure of susceptibility/resistance (e.g. EC10) for one cell line has error
A vulnerable subpopulation
[Figure: Chemical 1, Chemical 2, Chemical 3, Chemical 4]
Prioritizing chemicals for vulnerable subpopulations depends on both means and variances
Observed variability has the potential to provide finer-grained uncertainty factors in risk assessment
In the high-throughput screening toxicology literature, there are relatively few data to support these concepts across multiple populations

The Challenge Data
1000+ cell lines
179 compounds (6 duplicate chemicals)
8 concentrations (0.1 nM-100 mM)
1-3 plate replicates
1 assay (ATP)
= ~2,400,000 data points
and 1.2 × 10^6 SNPs

[Figure: heatmap of the EC10 values (axes to scale)]

The data in context – previous cell line vs. chemical/drug studies

Ranking chemicals by average cytotoxicity is of obvious interest – even with this large sample size, there is some uncertainty in the ranking
[Figure: EC10 for each cell line]

The 5th and 95th percentiles/quantiles are of interest from a risk assessment perspective. We call q95-q05 the “fold-range”

156 chemicals that are “predictable”: 106 training, 50 test
Subchallenge 2 – predict average and fold-range from chemical descriptors

884 lines that are “unrelated” (i.e. no first-degree relatives), shown as Training, Test, and Validation sets
Subchallenge 1 – predict EC10 from SNPs and RNASeq data

The NIEHS - NCATS - UNC DREAM Toxicogenetics Challenge
OVERALL RESULTS
Federica Eduati, Ph.D.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Cambridge, United Kingdom

Subchallenge 1: Data
Predict interindividual variability in cytotoxicity based on genomic profiles
[Diagram: cytotoxicity data (EC10) for Comp 1 to Comp 106, arranged by cell line: Train cell lines 1 to 487, Leaderboard cell lines 1 to 133, Final Test cell lines 1 to 264]
Training:
- EC10 data for 106 compounds and 487 cell lines
- Genotype data for 487 cell lines
- RNAseq data for 192 cell lines
Leaderboard (released Aug 31st):
- Genotype data for 133 cell lines
- RNAseq data for 48 cell lines
- Predict: EC10 data for 106 compounds and 133 cell lines
Final test:
- Genotype data for 264 cell lines
- RNAseq data for 97 cell lines
- Predict: EC10 data for 106 compounds and 264 cell lines

Experimental error
[Figure, Comp 1, exact measures: EC10 values of 2.1, 1.0, 1.9, 0.1 for cell lines 1 to 4 give the ranking 4, 2, 3, 1. Noisy measures: each cell line plotted on an EC10 axis from 0.0 to 3.0]
Exact order is variable if there is noise
For each compound:
Probabilistic C-index accounts for the probabilistic nature of the gold standard
To each pair of cell lines, it assigns a score given by the probability that the predicted ranking is supported by the noisy gold standard

Scoring metrics
• Correlation between predicted and observed values
  – Pearson correlation
• Ranking of cytotoxicity for different cell lines
  – Probabilistic C-index
  – Spearman correlation

Predictions vs null hypothesis
[Diagram: Submission 1 to Submission M, each providing cytotoxicity data (EC10) for Comp 1 to Comp 106 and Test cell lines 1 to N]

Scoring
[Diagram: Submission 1 to Submission M scored on Comp 1 to Comp 106, summarized as a mean ranking]
1. For each submission, compute the following metrics compound by compound:
   a. Pearson correlation
   b. Probabilistic C-index
2. For each metric:
   a. Rank submissions for each compound
   b. Compute the mean ranking over all compounds
   c. Rank submissions according to the mean ranking
3. The final ranking is obtained by averaging the rankings obtained with the 2 different metrics

Robustness (sampling) analysis
• Verify if the rank is robust with respect to the compounds
• For 10000 times:
  - randomly mask data for 10% of the compounds
  - re-compute the score
[Figure: ranking and mean ranking of the top submissions (teams Yang_Lab, UT_CCB, CQB, O6d0A, amss2012, CASSIS), with differences marked as significantly* different or not significantly* different]
* one-sided Wilcoxon signed-rank test, FDR < 10^-10

Wisdom of crowds
[Figure: y-axis: average z-score (Pearson correlation), 0.0 to 1.0; x-axis: predictions, 0 to 100; legend: single prediction]

Subchallenge 2: Data
[Diagram: Train Comp 1 to Train Comp 106, Test Comp 1 to Test Comp 50]
Predict population-level parameters of cytotoxicity of chemicals based on structural attributes of compounds.
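The population-level parameters named here are the per-compound summaries introduced earlier in the deck, where q95-q05 is called the “fold-range”. A minimal Python sketch of how one compound's EC10 values across cell lines might be reduced to those two numbers is given here. The function name `population_parameters` is hypothetical. The slides do not say which quantile estimator was used or on what scale EC10 is expressed, so the inclusive interpolation from the Python standard library is an assumption made only for illustration.

```python
from statistics import median, quantiles


def population_parameters(ec10_values):
    """Summarize one compound's EC10 values across cell lines.

    Returns (median EC10, q95 - q05). The quantile estimator is an
    assumption; the slides do not specify one.
    """
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%:
    # the first is the 5th percentile, the last is the 95th.
    cuts = quantiles(ec10_values, n=20, method="inclusive")
    q05, q95 = cuts[0], cuts[-1]
    return median(ec10_values), q95 - q05


# Illustrative input only (not challenge data): 101 evenly spaced values.
example = [float(v) for v in range(101)]
med, fold_range = population_parameters(example)
print(med, fold_range)
```

Applying the function to each compound's column of the EC10 matrix would give one (median, q95-q05) pair per compound, which is the form of the prediction targets for this subchallenge.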
[Diagram: cytotoxicity data (EC10) for Cell line 1 to Cell line 620, summarized per compound as (a) Median EC10 and (b) Interquantile distance (q95-q05)]
DATA
Training:
- EC10 data for 106 compounds and 620 cell lines
- Chemical attributes for 106 chemicals
PREDICTIONS
Final test:
- Chemical attributes for 50 chemicals
- Predict: population-level parameters for 50 compounds
  - Median EC10 values
  - Interquantile distance (q95-q05)

Predictions vs null hypothesis
[Diagram: Submission 1 to Submission M, each providing Median EC10 and Q95-Q05 for Test Comp 1 to Test Comp 50]

Scoring
[Diagram: Submission 1 to Submission M scored on Median EC10 and Q95-Q05, summarized as a mean ranking]
1. For each submission, compute the following metrics for each predicted population parameter (median, q95-q05):
   a. Pearson correlation
   b. Spearman correlation
2. For each metric:
   a. Rank submissions for each population parameter
   b. Compute the mean ranking over the 2 population parameters
   c. Rank submissions according to the mean ranking
3. The final ranking is obtained by averaging the rankings obtained with the 2 different metrics

Robustness (sampling) analysis
• Verify if the rank is robust with respect to the compounds
• For 10000 times:
  - randomly mask data of 10% of the compounds
  - re-compute the score
[Figure: ranking and mean ranking of the top submissions (teams QBRC, mlcb, newDream, Battelle Team, Austria, austria), with differences marked as significantly* different or not significantly* different]
* one-sided Wilcoxon signed-rank test, FDR < 10^-10

Wisdom of crowds
[Figure: two pairs of panels, one for the median and one for the interquantile distance (Q95−Q05). y-axis: Pearson correlation, −0.2 to 0.8; x-axis: predictions. Legends: single predictions vs. aggregated predictions; single prediction vs. randomly aggregated prediction]

Conclusions
• Predictive models of toxicity were developed by participants, great response from the community:
  – Subchallenge 1: 99 submissions from 34 teams
  – Subchallenge 2: 85 submissions from 24 teams
• predictions were scored against a hidden test set
• top performing models provide significant predictions that could be useful to assess health risk
• best performers are robustly ranked first, but there are other models which provide good predictions
  – wisdom of crowds: the aggregation of predictions can increase overall performances

Rebecca Boyles, Allen Dearry, Raymond Tice, Christopher Austin, Ruili Huang, Anton Simeonov, Menghang Xia, Nour Abdo, Paul Gallins, Oksana Kosyk, Ivan Rusyn, Jessica Wignall, Fred Wright, Kai Xia, Yi-Hui Zhou, Chris Bare, Stephen Friend, Mike Kellen, Lara Mangravite, Thea Norman, Federica Eduati, Michael Menden, Kely Norel, Julio Saez-Rodriguez, Gustavo Stolovitzky
213 participants
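
Supplementary sketch of the scoring procedure. The steps listed on the two Scoring slides (rank submissions per item for each metric, average those ranks, rank again, then average across metrics) might be expressed in Python roughly as follows. The names `rank_descending` and `final_ranking` are hypothetical, the metric values are assumed to be precomputed with higher meaning better, and the handling of ties by average rank is an assumption, since the slides do not describe it. An "item" is a compound in Subchallenge 1 and a population parameter in Subchallenge 2.

```python
def rank_descending(scores):
    """Rank keys by score, 1 = highest; tied scores share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks


def final_ranking(metric_tables):
    """metric_tables: {metric: {submission: [score per item]}}.

    Returns {submission: final rank value}; lower is better.
    """
    per_metric_ranks = []
    for table in metric_tables.values():
        submissions = list(table)
        n_items = len(table[submissions[0]])
        # Step 2a: rank submissions separately for each item.
        item_ranks = [
            rank_descending({s: table[s][k] for s in submissions})
            for k in range(n_items)
        ]
        # Step 2b: mean ranking over all items.
        mean_rank = {s: sum(r[s] for r in item_ranks) / n_items for s in submissions}
        # Step 2c: rank by mean ranking (a lower mean is better, hence the negation).
        per_metric_ranks.append(rank_descending({s: -m for s, m in mean_rank.items()}))
    # Step 3: average the rankings obtained with the different metrics.
    n_metrics = len(per_metric_ranks)
    return {s: sum(r[s] for r in per_metric_ranks) / n_metrics
            for s in per_metric_ranks[0]}


# Made-up scores for three submissions, two metrics, three items.
example_scores = {
    "metric_1": {"A": [0.9, 0.5, 0.7], "B": [0.4, 0.6, 0.3], "C": [0.1, 0.2, 0.1]},
    "metric_2": {"A": [0.6, 0.6, 0.6], "B": [0.7, 0.7, 0.5], "C": [0.5, 0.5, 0.55]},
}
print(final_ranking(example_scores))
```

The robustness (sampling) analysis could reuse `final_ranking` by dropping a random 10% of the items from every score list and repeating the call many times, although the slides do not specify how the masking was implemented.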