SC1A: Network inference (experimental data)

Data split:
• Training data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
• All data: N treatments
• Test data (N-4 held-out treatments): Test1, Test2, …, Test(N-4)

• Participants infer 32 networks (one per cell line/stimulus regime) using the training data; the inferred networks are then assessed using the test data.
• There are no definitive "gold standard" causal networks, so we use a novel held-out validation approach that emphasizes the causal aspect of the challenge.
• Assessment: how well do the inferred causal networks agree with the effects observed under inhibition in the test data?

Step 1: Identify a "gold standard" of observed effects. For each phosphoprotein and each cell line/stimulus regime, a paired t-test compares the DMSO and test-inhibitor time courses; significant changes are recorded as 1s in a binary gold-standard vector over phosphoproteins.
[Figure: example time courses (a.u.) for UACC812/Serum under Test1 vs. DMSO; Phospho2 changes significantly (p = 3.2 x 10^-5), Phospho1 does not (p = 0.45), contributing to a binary gold-standard vector such as (0, 1, 1, 0, 1, 0, 0, 1, 0, 0).]

Step 2: Score submissions. Each submission provides a matrix of predicted edge scores for each cell line/stimulus regime. For a given threshold τ, edges scoring above τ are retained and the protein descendants downstream of the test inhibitor's target are obtained. Comparing this predicted set of affected proteins against the gold-standard list of effects observed in the held-out data yields true-positive and false-positive counts #TP(τ) and #FP(τ). Varying τ traces out a ROC curve (#TP vs. #FP), summarized by its AUROC score. (A code sketch of this procedure appears after the SC1 summary below.)

• 74 final submissions; each submission receives 32 AUROC scores (one per cell line/stimulus regime).
[Figure: AUROC scores per submission, distinguishing significant from non-significant AUROCs and marking the best performer; annotated p-values include 3.58 x 10^-6, 4.18 x 10^-6, 8.98 x 10^-6 and 9.19 x 10^-4.]

Scoring procedure:
1. For each submission and each cell line/stimulus pair, compute the AUROC score.
2. Collect the scores into a submissions x 32 cell line/stimulus pairs matrix of AUROCs.
3. Rank the submissions for each cell line/stimulus pair.
4. Calculate the mean rank across cell line/stimulus pairs for each submission, and rank submissions according to mean rank to obtain the final ranking.
[Figure: worked example showing an AUROC matrix, per-regime ranks, mean ranks (e.g. 1.33, 3.66) and the resulting final ranking.]

Robustness analysis (verify that the final ranking is robust):
1. Mask 50% of phosphoproteins in each AUROC calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.
[Figure: rank distributions across the 100 repeats for the top 10 teams; best performer annotated with p = 5.40 x 10^-10.]

SC1B: Network inference (in silico data)

• A gold standard is available: the data-generating causal network, whose nodes are the phosphoproteins ER-alpha_pS118, HER2_pY1248, EGFR_pY1173, Src_pY416, PKC-alpha_pS657, Src_pY527, S6_pS235_S236, p38_pT180_Y182, Rb_pS807_S811, C-Raf_pS338, p27_pT198, p90RSK_pT359_S363, MEK1_pS217_S221, JNK_pT183_pT185, GSK3-alpha-beta_pS21_S9, p70S6K_pT389, S6_pS240_S244, MAPK_pT202_Y204, AMPK_pT172, Bad_pS112, Akt_pS473, mTOR_pS2448, STAT3_pY705, PRAS40_pT246, 4E-BP1_pS65, PDK1_pS241, ACC_pS79 and YAP_pS127.
• Participants submitted a single set of edge scores.
• Edge scores are compared against the gold standard to give an AUROC score, and participants are ranked by AUROC.
[Figure: AUROC scores per submission; 14 significant and 51 non-significant AUROCs; best performer marked; annotated p-values include 3.11 x 10^-11 and 3.90 x 10^-14.]

Robustness analysis:
1. Mask 50% of edges in the AUROC calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.
[Figure: rank distributions across the 100 repeats for the top 10 teams.]

SC1 combined ranking:
• 59 teams participated in both SC1A and SC1B.
• To reward consistently good performance across both parts of SC1, the final score is the average of the SC1A and SC1B ranks.
• The top team ranked robustly first.
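The two-step SC1A procedure maps naturally onto a short script. The sketch below is a minimal illustration, not the challenge's official implementation: the significance level of 0.05, the encoding of networks as square edge-score matrices (entry [i, j] scoring the edge i -> j), and all function names are assumptions introduced here.

```python
import numpy as np
from scipy.stats import rankdata, ttest_rel
from sklearn.metrics import roc_auc_score

ALPHA = 0.05  # assumed significance level; not specified in the text above

def gold_standard(dmso, inhib):
    """Step 1: paired t-test per phosphoprotein (DMSO vs. test inhibitor)
    for one cell line/stimulus regime. Inputs have shape
    (n_paired_samples, n_proteins); returns a binary vector with 1 where
    the inhibitor significantly changed the phosphoprotein."""
    _, pvals = ttest_rel(dmso, inhib, axis=0)
    return (pvals < ALPHA).astype(int)

def descendants(edge_scores, target, tau):
    """Proteins reachable downstream of the test inhibitor's target after
    thresholding the predicted edge-score matrix at tau (the target node
    itself is excluded here for simplicity)."""
    adj = edge_scores >= tau          # adj[i, j]: edge i -> j retained
    reached, frontier = set(), {target}
    while frontier:
        node = frontier.pop()
        reached.add(node)
        frontier |= set(np.flatnonzero(adj[node])) - reached
    reached.discard(target)
    return reached

def auroc(edge_scores, target, gold):
    """Step 2: vary tau. Lowering tau only adds edges, so the descendant
    set grows monotonically; scoring each protein by the largest tau at
    which it is a descendant therefore reproduces the ROC curve traced by
    the (#FP(tau), #TP(tau)) points."""
    protein_score = np.full(len(gold), edge_scores.min() - 1.0)
    for tau in np.unique(edge_scores)[::-1]:  # descending thresholds
        for p in descendants(edge_scores, target, tau):
            protein_score[p] = max(protein_score[p], tau)
    return roc_auc_score(gold, protein_score)

def final_ranking(auroc_matrix):
    """Steps 3-4: rank submissions within each regime (rank 1 = highest
    AUROC), take the mean rank across regimes, and rank the mean ranks.
    auroc_matrix has shape (n_submissions, n_regimes)."""
    per_regime = rankdata(-auroc_matrix, axis=0)
    return rankdata(per_regime.mean(axis=1))
```

With a 74 x 32 matrix of AUROC scores, final_ranking yields the final SC1A ranking; the same mean-rank aggregation is reused for SC2 below.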
SC2A: Time-course prediction (experimental data)

Data split (as in SC1A):
• Training data (4 treatments): FGFR1/3i, AKTi, AKTi+MEKi, DMSO
• All data: N treatments
• Test data (N-4 held-out treatments): Test1, Test2, …, Test(N-4)

• Participants build dynamical models using the training data and predict phosphoprotein trajectories under inhibitions not present in the training data; the predictions are assessed using the test data.
• Participants made predictions for all phosphoproteins, for each cell line/stimulus pair, under inhibition with each of 5 test inhibitors.
• Assessment: how well do the predicted trajectories agree with the corresponding trajectories in the test data?
• Scoring metric: root-mean-squared error (RMSE), calculated for each cell line/phosphoprotein/test inhibitor combination (e.g. UACC812, Phospho1, Test1); a code sketch of the metric and the robustness check appears at the end of this section:

RMSE_{p,c,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(\hat{x}_{p,c,i,s,t} - x_{p,c,i,s,t}\right)^{2}}

where p indexes phosphoproteins, c cell lines, i test inhibitors, s the S stimuli and t the T time points, and \hat{x} and x denote predicted and measured values respectively.

• 14 final submissions.
• Final ranking: analogously to SC1A, submissions are ranked for each regime and the mean rank is calculated.
[Figure: scores per submission with significance annotations and the best performer marked; annotated p-values include 1.35 x 10^-4, 3.70 x 10^-8, 1.49 x 10^-5 and 1.21 x 10^-6.]

Robustness analysis (verify that the final ranking is robust):
1. Mask 50% of data points in each RMSE calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.
[Figure: rank distributions across the 100 repeats for the top 10 teams; 2 best performers (annotated p-values 3.04 x 10^-18 and 6.97 x 10^-5); one incomplete submission (0.99).]

SC2B: Time-course prediction (in silico data)

• Participants made predictions for all phosphoproteins, for each stimulus regime, under inhibition of each phosphoprotein in turn.
• The scoring metric is again RMSE, and the procedure follows that of SC2A:

RMSE_{p,i} = \sqrt{\frac{1}{TS}\sum_{t=1}^{T}\sum_{s=1}^{S}\left(\hat{x}_{p,i,s,t} - x_{p,i,s,t}\right)^{2}}

[Figure: scores per submission with significance annotations and the best performer marked; annotated p-values include 0.015, 1.68 x 10^-14, 2.89 x 10^-7 and 7.71 x 10^-19.]

Robustness analysis:
1. Mask 50% of data points in each RMSE calculation.
2. Re-calculate the final ranking.
3. Repeat (1) and (2) 100 times.
[Figure: rank distributions for the top 10 teams; one incomplete submission (0.99).]

SC2 combined ranking:
• 10 teams participated in both SC2A and SC2B.
• To reward consistently good performance across both parts of SC2, the final score is the average of the SC2A and SC2B ranks.
• The top team ranked robustly first.

Community vote:
• 14 submissions.
• 36 HPN-DREAM participants voted, each assigning ranks 1 to 3.
• Final score = mean rank, with unranked submissions assigned rank 4.

Conclusions:
• Submissions were rigorously assessed using held-out test data.
• SC1A: a novel procedure was used to assess network inference performance in a setting with no true "gold standard".
• Many statistically significant predictions were submitted.

For further investigation:
• Explore why some regimes (e.g. cell line/stimulus pairs) are easier to predict than others.
• Determine why different teams performed well in the experimental and in silico challenges.
• Identify the methods/approaches that yield the best predictions.
• Wisdom of crowds: does aggregating submissions improve performance and lead to discovery of biological insights?
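As referenced in the SC2A section, the following is a minimal Python sketch of the SC2 scoring metric and the masking-based robustness analysis. The array shapes, function names and random masking scheme are illustrative assumptions; only the RMSE formula and the mask-50%/re-rank/repeat-100-times loop come from the description above.

```python
import numpy as np
from scipy.stats import rankdata

def rmse(pred, truth):
    """RMSE over whatever data points are supplied (e.g. the S x T grid
    of stimuli and time points for one scoring combination)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return np.sqrt(np.mean((pred - truth) ** 2))

def robust_rankings(preds, truths, n_repeats=100, frac=0.5, seed=0):
    """Mask `frac` of the data points entering each RMSE calculation,
    re-compute the mean-rank final ranking, and repeat `n_repeats` times.

    preds:  (n_submissions, n_regimes, S, T) predicted trajectories
    truths: (n_regimes, S, T) held-out test trajectories
    Returns an (n_repeats, n_submissions) array of final ranks."""
    rng = np.random.default_rng(seed)
    n_sub, n_reg = preds.shape[:2]
    all_ranks = []
    for _ in range(n_repeats):
        keep = rng.random(truths.shape) >= frac      # keep ~50% of points
        scores = np.empty((n_sub, n_reg))
        for i in range(n_sub):
            for r in range(n_reg):
                scores[i, r] = rmse(preds[i, r][keep[r]], truths[r][keep[r]])
        per_regime = rankdata(scores, axis=0)        # lower RMSE = rank 1
        all_ranks.append(rankdata(per_regime.mean(axis=1)))
    return np.array(all_ranks)
```

The spread of each submission's rank across the 100 repeats is what the rank-distribution figures for the top 10 teams summarize: a submission whose rank barely moves under masking is robustly ranked.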