Compound Set Enrichment
A novel approach to analysis of primary HTS data

Thibault Varin
Ansgar Schuffenhauer
Gubler, H., Parker, C., Zhang, JH., Raman, P., Ertl, P.

Compound Set Enrichment
INTRODUCTION

2 | Compound Set Enrichment | Thibault Varin | 10/07/14

Introduction
Active series identification:
- Can relevant SAR be extracted from primary HTS data?
- Are activity data binary or continuous?

Introduction
Active series identification
Hypothesis 1: Within primary HTS screening data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.

Introduction
Are the activity data binary or continuous?
[Figure: compounds of Scaffold 1 and Scaffold 2 placed on an activity axis, each marked as active (binary) or inactive (binary)]
Binary activity:
- 1 active / 5 inactives
- Scaffold 1 = Scaffold 2
Continuous activity:
- Scaffold 1 > Scaffold 2

Introduction
Are the activity data binary or continuous?
[Figure: activity axis with two cut-offs, Threshold 1 and Threshold 2; compounds marked as active (binary) or inactive (binary)]
The binary activity of a scaffold differs depending on the threshold.
Hypothesis 2: Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active series of compounds.

Compound Set Enrichment
METHODS

Methods
The Scaffold Tree classification
The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical Scaffold Classification
A. Schuffenhauer, P. Ertl et al. J. Chem. Inf.
Model., 47, 47, 2007

Methods
Datasets
- 7 PubChem bioassays
- Ranging from 9389 to 263679 compounds
- Ranging from 0.03 to 26.29% of active compounds
Hypothesis 1
Simulation of the primary screening data
PubChem Annotation from CRC

Methods
Single hypothesis test: summary procedure
1. State the null and the alternative hypotheses
   - H0: "the scaffold is inactive"
   - H1: "the scaffold is active"
2. Specify a significance level: α = 0.01
3. Compute the test statistic and the p-value
   → p-value = probability of obtaining data at least as extreme as those observed if the scaffold is inactive (H0 true)
4. Decision step:
   - p-value > α: H0 is not rejected
   - p-value < α: H0 is rejected and H1 is accepted: "the scaffold is active"

Methods
The KS and the Binomial hypothesis tests
- Continuous data, KS test. H0: there is no difference in the activity distribution defined by the compounds having the scaffold S3-2 and the background distribution.
- Binary data, Binomial test. H0: there is no difference in the proportion of active compounds for compounds having the scaffold S3-2 and the proportion of active compounds for the full dataset.
[Figure labels: Bioassay, Scaffold, Actives, Inactives]

Methods
Multiple hypothesis tests: Bonferroni correction
Problem of false positives
• α = probability of identifying an inactive scaffold as active (for each test done...)
• 100 inactive scaffolds: the probability of identifying at least one "active" by chance is 63% (1 − 0.99^100)
The Bonferroni correction suggests testing each scaffold at a critical significance level of α = 0.01 / number of scaffolds
It makes the assumption that the individual tests are independent
Each level in the Scaffold Tree has been treated separately

Methods
Determining the activity of classes
[Flow chart: Hypo 1 / Hypo 2 → Scaffold activity evaluation → Multiple hypothesis test correction (Bonferroni) → Comparison of results]

Compound Set Enrichment
RESULTS

Results
Comparison of KSP and BTP predictions

                                Total                BPCA significantly actives    BPCA non significantly actives
Bioassay                        KSP    BTP    Δ      BPCA   KSP    BTP    Δ        KSP    BTP    Δ
Hydroxysteroid dehydrogenase    330    231    +99    199    183    168    +15      147    63     +84
Caspase-1                       331    114    +217   5      2      2      0        329    112    +217
PK                              12     4      +8     12     3      3      0        9      1      +8
Luciferase                      67     12     +55    15     13     11     +2       54     1      +53
Luciferase                      178    48     +130   41     32     35     -3       146    13     +133
CYP450 2C9                      58     33     +25    34     34     31     +3       24     2      +22
CYP450 3A4                      121    64     +57    60     60     53     +7       61     11     +50

With:
- KSP: KS Prediction
- BTP: Binomial Threshold Prediction
- Δ: KSP − BTP
- BPCA: Binomial PubChem Annotation

Observations:
- Number of active classes: KSP > BTP
- Both KSP and BTP retrieve the BPCA significantly active classes
- Most of the new KSP active classes are not BPCA significantly actives

Results
KSP significantly active scaffolds that are in PubChem inactives
[Figure: scaffold structures, each annotated "WA" or "Inconclusive?"]
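The Methods slides (single hypothesis test, KS and Binomial tests, Bonferroni correction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names and data values below are hypothetical, the KS test is assumed to be two-sided (the slides do not say), and its p-value uses the standard asymptotic Kolmogorov series, which is only approximate for small scaffold classes.

```python
import math
from bisect import bisect_right


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); fine for small classes (n < ~1000)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


def ks_statistic(sample, background) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(background)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov series (approximate)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series converges slowly here and the p-value is ~1
        return 1.0
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


alpha = 0.01
print(round(familywise_error(alpha, 100), 2))  # 0.63, the 63% quoted on the slide

# Hypothetical activity values: one scaffold class versus the full dataset.
scaffold = [80.0, 75.0, 62.0, 58.0, 41.0, 33.0]
background = [float(v % 40) for v in range(200)]

# Continuous data: KS test against the background distribution.
d = ks_statistic(scaffold, background)
p_ks = ks_pvalue(d, len(scaffold), len(background))

# Binary data: 4 of 6 members above a hypothetical cut-off, 5% overall hit rate.
p_binom = binomial_upper_tail(4, 6, 0.05)

# Bonferroni: compare each p-value with alpha / number of scaffolds at this level.
n_scaffolds = 100  # hypothetical number of scaffolds at one Scaffold Tree level
threshold = bonferroni_alpha(alpha, n_scaffolds)
print(d, p_ks, p_ks < threshold)
print(p_binom, p_binom < threshold)
```

The binomial result changes with the cut-off that defines "4 of 6", while the KS comparison uses every measured value. This is the contrast Hypothesis 2 is about.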

Results
Prioritize nodes instead of individual scaffolds
[Figure legend]
- Compound activity (PubChem Annotation): Active, Inconclusive, Inactive
- Scaffold activity (KS Prediction / Bonferroni): Non significantly active, Significantly active

Results
Visualization tool (Peter Ertl)

Compound Set Enrichment
CONCLUSION

Conclusion
Compound Set Enrichment
Validation of initial hypotheses
A method to mine HTS data and identify active series of compounds
• Chemical classification: Scaffold Tree
• Statistical analysis: Kolmogorov-Smirnov hypothesis test
• Multiple hypothesis test correction: Bonferroni correction
Use all primary data
No activity cut-off
Identification of new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen

Acknowledgments
With many thanks to
Primary mentor:
- Ansgar Schuffenhauer
Help: MLI group
Scientific advisers:
- Christian Parker
- Hanspeter Gubler
- Ji-Hu Zhang
- Peter Ertl
- Edgar Jacoby
Fellowship: Education office
Discussions:
- Martin Beibel
- Sebastian Bergling
- Meir Glick
- Alain Dietrich
- Marie-Cecile Didiot

Questions?