The Broad-DREAM Gene Essentiality Prediction Challenge: Motivation, Data, Scoring, and Results

Mehmet Gönen
Department of Biomedical Engineering
Oregon Health & Science University

Genome-wide RNAi screens in a large number of cell lines: dependencies
Extensive molecular characterization of cell lines: gene expression, copy number, mutational analysis, pathway activation
• Project Achilles is a large collaboration between the Broad Institute and the Dana-Farber Cancer Institute to identify and catalog genetic vulnerabilities across hundreds of molecularly characterized cancer cell lines. The Broad-DREAM Gene Essentiality Prediction Challenge is designed to help address this goal, specifically by looking for models that can predict these vulnerabilities from biomarkers.

• Current public datasets, Achilles v2.0 and v2.4.3, contain 102 and 216 cell lines, respectively
http://www.broadinstitute.org/achilles
– Querying genes for essentiality values across cell lines
– Downloading data at the shRNA or gene level
– Forwarding data to GENE-E or the GenePattern module PARIS

shRNA Loss of Function Screens Overview
• Infect a cell line with the shRNA library
• Cancer cell line: each cell in the population gets a single shRNA
• Measure essentiality based on the proportion of shRNAs present at the end of the experiment compared to the start
• Readout is next-generation sequencing (previously array-based hybridization) of shRNA barcode sequences to get counts per shRNA, per cell line
• Red and Green in this example are depleted during the experiment, so knocking down their targets has a negative effect on viability (the targets are 'essential' to viability). Orange is neutral to viability. Blue and Yellow are enriched during the experiment and therefore have a positive effect on viability.
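The readout described above (the proportion of each shRNA at the end of the experiment compared to the start) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the Achilles pipeline: the function name `log2_fold_change`, the pseudocount, and the read counts for the five colored shRNAs are all hypothetical.

```python
import math

def log2_fold_change(initial_counts, final_counts, pseudocount=1.0):
    """Log2 ratio of each shRNA's final read fraction to its initial read fraction.

    Assumption: counts are normalized to the sample total, and a pseudocount
    is added to every shRNA to avoid taking the log of zero.
    """
    names = sorted(initial_counts)
    init_total = sum(initial_counts[n] + pseudocount for n in names)
    final_total = sum(final_counts.get(n, 0) + pseudocount for n in names)
    scores = {}
    for n in names:
        init_frac = (initial_counts[n] + pseudocount) / init_total
        final_frac = (final_counts.get(n, 0) + pseudocount) / final_total
        scores[n] = math.log2(final_frac / init_frac)
    return scores

# Hypothetical read counts for the five shRNAs in the slide's example.
initial = {"red": 100, "green": 100, "orange": 100, "blue": 100, "yellow": 100}
final = {"red": 25, "green": 50, "orange": 100, "blue": 150, "yellow": 175}

for name, score in log2_fold_change(initial, final).items():
    # Negative: depleted; near zero: neutral; positive: enriched.
    print(f"{name}: {score:+.2f}")
```

With these made-up counts, red and green come out negative (depleted), orange stays at zero (neutral), and blue and yellow come out positive (enriched), matching the color coding in the example.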

Stages: Assay development, Screening, Raw data, QC, Final data, Deconvolution

Quality control GenePattern modules: FilterLowshRNAs, FPmatching, shRNAremoveOverlap, ReplicatesQC, removeSamples
Remove undesirable shRNAs and cell line replicates:
• shRNAs that are low in the initial DNA pool or that overlap in sequence
• Replicate samples that fail QC

Data processing GenePattern modules: shRNAfoldChange, NormLines, shRNAcollapseReps, shRNAmapGenes
• Calculates fold change values between initial DNA pool values and final cell line values
• Normalizes cell lines to the same scale (quantile, ZMAD, PMAD)
• Collapses replicate cell line samples to a single value per shRNA
• Maps shRNAs to gene symbols based on a mapping file

.gct file of shRNA scores per cell line → ATARiS or Demeter → .gct file of gene scores per cell line
http://genepattern.broadinstitute.org/gp/

RNAi Data is Challenging
• shRNAs targeting the same gene produce different effects
• Sources of variability
– Varying suppression levels
– Off-target effects
– Noise
• No experimental way to fully assess shRNA performance
– Especially off-target suppression
• Naïve averaging of the values will hurt the signal
[Figure: viability data for one gene in one sample, showing shRNA-1 through shRNA-5 on an axis from -3 to 2, labeled "Cell death"]

DEMETER infers gene dependencies by explicitly modeling both gene- and seed-effects
[Diagram: each shRNA readout (shRNA #1 to shRNA #10, …) is linked with weight α to a gene effect (Gene A, Gene B, …; the DEMETER solutions) and with weight β to a seed effect (Seed seq #1 to Seed seq #10, …)]
shRNA_i = μ_i + α_i × Gene(i) + β_i × Seed(i) + ε, with 0 ≤ α_i, β_i ≤ 1

Challenge Data Release
• Phase 1 (89 cell lines): from June 2 to August 11
– 45 in training set (fully available)
– 22 in leaderboard set (hidden gene essentiality)
– 22 in test set (fully hidden)
• Phase 2 (149 cell lines): from August 11 to September 1
– 66 in training set (fully available)
– 33 in leaderboard set (hidden gene essentiality)
– 50 in test set (fully hidden)
• Phase 3 (149 cell lines): from September 1 to September 28
– 105 in training set (fully available)
– 44 in test set (hidden gene essentiality)

Sub-challenge 1 Question
• Build a model that best predicts the gene essentiality scores of thousands of genes, using the molecular characteristics/features of the cancer cell lines
Inputs: 23288 copy number features, 18960 gene expression features, 1667 mutation features
Outputs: 14738 gene essentiality scores

Sub-challenge 1 Scoring
• We use Spearman's rank correlation coefficient to evaluate performance: the measured and predicted gene essentiality scores are compared gene by gene, giving one correlation value per gene: ρ1, …, ρi, …,
ρ14738
• Overall score is the mean of the 14738 correlation values

Sub-challenge 2 Question
• Identify the most predictive features for each gene essentiality score of a prioritized list of 2647 genes
Inputs: K1 of 23288 copy number features, K2 of 18960 gene expression features, K3 of 1667 mutation features, with K1 + K2 + K3 ≤ 10
Output: a single gene essentiality score

Sub-challenge 3 Question
• Identify the most predictive features for all gene essentialities of a prioritized list of 2647 genes
Inputs: K1 of 23288 copy number features, K2 of 18960 gene expression features, K3 of 1667 mutation features, with K1 + K2 + K3 ≤ 100
Outputs: 2647 gene essentiality scores

Sub-challenges 2 & 3 Scoring
• We use Spearman's rank correlation coefficient to evaluate performance: the measured and predicted gene essentiality scores are compared gene by gene, giving ρ1, …, ρi, …, ρ2647
• Overall score is the mean of the 2647 correlation values

Challenge Overview
• 300 registered participants
• 2896 leaderboard submissions
– 1621 in sub-challenge 1
– 585 in sub-challenge 2
– 690 in sub-challenge 3
• 48 teams with final submissions
– 21 in sub-challenge 1
– 13 in sub-challenge 2
– 14 in sub-challenge 3
• 22 unique teams with final submissions
• 52 topics in the discussion forum
• 233 posts in the discussion forum

Overall Scores in Sub-challenge 1
[Figure: overall score (axis 0.00 to 0.25) for teams 1 to 21; the top teams are annotated "Separated from the rest (next slide)"]

Overall Comparison in Sub-challenge 1
Team    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
   1  0.5 0.43 0.47 0.17    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
   2 0.43  0.5 0.46 0.22    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
   3 0.47 0.46  0.5 0.19    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
   4 0.17 0.22 0.19  0.5    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
   5    0    0    0    0  0.5  0.5 0.49 0.41 0.06 0.01    0    0    0    0    0    0    0    0    0    0    0
   6    0    0    0    0  0.5  0.5 0.49 0.41 0.06 0.01    0    0    0    0    0    0    0    0    0    0    0
   7    0    0    0    0 0.49 0.49  0.5 0.43 0.06 0.01    0    0    0    0    0    0    0    0    0    0    0
   8    0    0    0    0 0.41 0.41 0.43  0.5 0.09 0.01    0    0    0    0    0    0    0    0    0    0    0
   9    0    0    0    0 0.06 0.06 0.06 0.09  0.5 0.21 0.02    0    0    0    0    0    0    0    0    0    0
  10    0    0    0    0 0.01 0.01 0.01 0.01 0.21  0.5  0.1    0    0    0    0    0    0    0    0    0    0
  11    0    0    0    0    0    0    0    0 0.02  0.1  0.5 0.01    0    0    0    0    0    0    0    0    0
  12    0    0    0    0    0    0    0    0    0    0 0.01  0.5  0.1    0    0    0    0    0    0    0    0
  13    0    0    0    0    0    0    0    0    0    0    0  0.1  0.5 0.02    0    0    0    0    0    0    0
  14    0    0    0    0    0    0    0    0    0    0    0    0 0.02  0.5    0    0    0    0    0    0    0
  15    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5 0.22    0    0    0    0    0
  16    0    0    0    0    0    0    0    0    0    0    0    0    0    0 0.22  0.5    0    0    0    0    0
  17    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5    0    0    0    0
  18    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5    0    0    0
  19    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5    0    0
  20    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5    0
  21    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5
• Wilcoxon signed rank test over 14738 correlation scores for each team pair
• p-values are displayed
• Blue: significantly different
• White: no significant difference

Nonparametric Friedman's Test
• Used to detect differences in treatments across multiple blocks
Score matrix (s_ij)
       M1    M2    M3    M4
G1    0.5   0.3   0.2   0.6
G2    0.2   0.5   0.5   0.4
G3    0.1   0.3   0.2   0.4
G4    0.3   0.2   0.1   0.4
G5    0.4   0.1   0.4   0.3
G6    0.2   0.4   0.3   0.1
Rank matrix (r_ij)
       M1    M2    M3    M4
G1      2     3     4     1
G2      4   1.5   1.5     3
G3      4     2     3     1
G4      2     3     4     1
G5    1.5     4   1.5     3
G6      3     1     2     4
Mean 2.75  2.42  2.67  2.17   (overall mean rank 2.50)
• If the p-value is significant, appropriate post-hoc multiple comparisons tests would be performed

Detailed Comparison of Top 4 Teams
• Nonparametric Friedman's test on rankings over 14738 correlation scores of the top four teams
• Methods are different with a p-value = 3e-4
• Tukey's honestly significant difference criterion as the post-hoc test
• Top three teams could not be separated from each other
[Figure: average rank (axis 2.46 to 2.56) for teams 1 to 4]

Wisdom of Crowds in Sub-challenge 1
[Figure: overall score (axis 0.00 to 0.25) versus teams 1 to 21]

Overall Scores in Sub-challenge 2
[Figure: overall score (axis 0.00 to 0.25) for teams 1 to 13; the top teams are annotated "Separated from the rest (next slide)"]

Overall Comparison in Sub-challenge 2
Team    1    2    3    4    5    6    7    8    9   10   11   12   13
   1  0.5 0.04    0    0    0    0    0    0    0    0    0    0    0
   2 0.04  0.5 0.15 0.06 0.06    0    0    0    0    0    0    0    0
   3    0 0.15  0.5 0.32 0.31 0.04 0.01 0.01    0    0    0    0    0
   4    0 0.06 0.32  0.5 0.48 0.09 0.03 0.04    0    0    0    0    0
   5    0 0.06 0.31 0.48  0.5  0.1 0.04 0.04    0    0    0    0    0
   6    0    0 0.04 0.09  0.1  0.5  0.3 0.34 0.05 0.06    0    0    0
   7    0    0 0.01 0.03 0.04  0.3  0.5 0.45 0.13 0.16    0    0    0
   8    0    0 0.01 0.04 0.04 0.34 0.45  0.5  0.1 0.13    0    0    0
   9    0    0    0    0    0 0.05 0.13  0.1  0.5 0.45 0.02    0    0
  10    0    0    0    0    0 0.06 0.16 0.13 0.45  0.5 0.01    0    0
  11    0    0    0    0    0    0    0    0 0.02 0.01  0.5    0    0
  12    0    0    0    0    0    0    0    0    0    0    0  0.5    0
  13    0    0    0    0    0    0    0    0    0    0    0    0  0.5
• Wilcoxon signed rank test over 2647 correlation scores for each team pair
• p-values are displayed
• Blue: significantly different
• White: no significant difference

Wisdom of Crowds in Sub-challenge 2
[Figure: overall score (axis 0.00 to 0.25) versus teams 1 to 13]

Overall Scores in Sub-challenge 3
[Figure: overall score (axis 0.00 to 0.25) for teams 1 to 14; the top teams are annotated "Separated from the rest (next slide)"]

Overall Comparison in Sub-challenge 3
Team    1    2    3    4    5    6    7    8    9   10   11   12   13   14
   1  0.5 0.04    0    0    0    0    0    0    0    0    0    0    0    0
   2 0.04  0.5 0.09 0.03    0    0    0    0    0    0    0    0    0    0
   3    0 0.09  0.5  0.3    0    0    0    0    0    0    0    0    0    0
   4    0 0.03  0.3  0.5    0    0    0    0    0    0    0    0    0    0
   5    0    0    0    0  0.5 0.26    0    0    0    0    0    0    0    0
   6    0    0    0    0 0.26  0.5 0.02    0    0    0    0    0    0    0
   7    0    0    0    0    0 0.02  0.5  0.3 0.19 0.11    0    0    0    0
   8    0    0    0    0    0    0  0.3  0.5 0.35 0.24    0    0    0    0
   9    0    0    0    0    0    0 0.19 0.35  0.5 0.36    0    0    0    0
  10    0    0    0    0    0    0 0.11 0.24 0.36  0.5    0    0    0    0
  11    0    0    0    0    0    0    0    0    0    0  0.5 0.19    0    0
  12    0    0    0    0    0    0    0    0    0    0 0.19  0.5    0    0
  13    0    0    0    0    0    0    0    0    0    0    0    0  0.5    0
  14    0    0    0    0    0    0    0    0    0    0    0    0    0  0.5
• Wilcoxon signed rank test over 2647 correlation scores for each team pair
• p-values are displayed
• Blue: significantly different
• White: no significant difference

Wisdom of Crowds in Sub-challenge 3
[Figure: overall score (axis 0.00 to 0.25) versus teams 1 to 14]

Further Analysis for Feature Selection in Sub-challenge 2
[Figure: number of features (axis 0 to 10) selected by teams 1 to 13, broken down into copy number, gene expression, and mutation]

Further Analysis for Feature Selection in Sub-challenge 3
[Figure: number of features (axis 0 to 100) selected by teams 1 to 14, broken down into copy number, gene expression, and mutation]

Further Analysis for Predictability of Genes
[Figure: three histograms (Sub-challenge 1, Sub-challenge 2, Sub-challenge 3) of frequency versus scores, with scores ranging from -0.5 to 1.0]

Acknowledgments
• Organizers
– Mehmet Gönen, Adam Margolin (OHSU)
– Barbara Weir, Aviad Tsherniak, Sara Howell (Broad Institute)
– Daniel Marbach (EPFL)
– Bruce Hoff, Thea Norman (Sage Bionetworks)
– Gustavo Stolovitzky (IBM Computational Biology Center)
• Funding
– Broad Institute (Projects Achilles and CCLE) and Dana-Farber Cancer Institute
– NCI Cancer Target Discovery and Development Network (CTDD)
– NCI Integrative Cancer Biology Program (ICBP)
• Journal Partner
– Nature Biotechnology (Craig Mak and Andy Marshall)

Best Performing Teams
• Sub-challenge 1
– BERL (Masayuki Karasuyama and Hiroshi Mamitsuka)
– UPS (Vladislav Uzunangelov, Sahil Chopra, Kiley Graim, Daniel Carlin, Yulia Newton, Alden Deran, Adrian Bivol, Sam Ng, Kyle Ellrott, Evan Paull, Artem Sokolov, and Joshua M. Stuart)
– wtwt5237 (Tao Wang, Xiaowei Zhan, Hao Tang, Yang Xie, and Guanghua Xiao)
• Sub-challenge 2
– Guanlab_UMich (Fan Zhu and Yuanfang Guan)
• Sub-challenge 3
– Team FAT (Peddinti Gopalacharyulu, Alok Jaiswal, Kerstin Bunte, Suleiman Khan, Jing Tang, Antti Airola, Krister Wennerberg, Tapio Pahikkala, Samuel Kaski, and Tero Aittokallio)
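
For reference, the overall score used in all three sub-challenges, the mean of per-gene Spearman rank correlations between measured and predicted gene essentiality scores, can be sketched in plain Python. This is a minimal illustration and not the organizers' scoring code: the function names, gene labels, and toy values are hypothetical, and it assumes that each gene's correlation is computed across the held-out cell lines and that tied values receive average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values in ascending order (1..n); ties get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))

def overall_score(measured, predicted):
    """Mean per-gene Spearman correlation; inputs map gene -> scores per cell line."""
    return mean(spearman(measured[g], predicted[g]) for g in measured)

# Toy data: two hypothetical genes measured in four hypothetical cell lines.
measured = {"GENE_A": [-2.0, -1.0, 0.0, 1.0], "GENE_B": [0.5, -0.5, 1.5, -1.5]}
predicted = {"GENE_A": [-1.5, -0.2, 0.1, 0.9], "GENE_B": [-0.5, 0.5, -1.5, 1.5]}

print(overall_score(measured, predicted))
```

In the toy data, GENE_A is predicted in exactly the right order and GENE_B in exactly the reverse order, so the two per-gene correlations cancel in the mean. Averaging per-gene rank correlations rewards getting the ordering of cell lines right for each gene rather than matching the absolute essentiality values.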