In silico small molecule discovery

[Figure: drug discovery pipeline. Target gene identified with a viable assay → Discover hit (high throughput screen or in silico) → Hit to lead → Optimise lead → Clinical → Sales]

Case 1: receptor structure known

[Figure: Database of molecules (100,000+) → Computer: dock molecules into receptor → Novel in silico hits (~100) → Secondary assay → ? hits with IC50 < 10 µM]

How successful is this method?

• From Shoichet's group, on the target protein tyrosine phosphatase 1B:

Method                             Compounds tested    Hits with IC50 < 10 µM    Hit rate
High throughput screening (HTS)    400,000             6                         0.001%
In silico docking                  365 from docking    18                        5%

• None of the in silico hits were found by HTS
• But the method is unpredictable: other systems yield < 1%

How does one get the receptor structure?

• X-ray structure already available at the RCSB databank
• Set up a structure determination
• Predict the structure

X-ray crystallography pipeline

[Figure: Cloning → Expression → Recombinant protein → Protein purification (mg quantities) → Crystallization → Protein crystals → X-ray diffraction pattern → Electron density map → Protein structure]

Prediction of protein structure by homology

[Figure: Query sequence → Match sequence against library of known folds → Matched fold]

• Phyre: www.sbg.bio.ic.ac.uk
• Phyre and its predecessor 3DPSSM: > 1,000 citations

Case 2: Ligand activity data available

[Figure: Observed activity → Structure-activity rules → Screen database → Novel in silico hits]

INDDEx™: a logic-based method

• Muggleton & Sternberg developed a logic-based strategy
• The method is now incorporated into INDDEx within an Imperial spin-out, Equinox Pharma
• INDDEx is designed to exploit the availability of active and inactive data on at least c. 5, but ideally more, ligands

Logic rules lead to new chemotypes

[Figure: fragments A, B, C and D, with A 7Å from B]

• Fragment C is bonded to fragment D
• Fragment B is bonded to fragment C
• Fragment A is 7Å from fragment B

INDDEx can learn a complex rule from simpler facts

[Figure: fragments A, B, C and D, with A 7Å from B]

• Fragment A is 7Å from fragment B, which is bonded to fragment C, which is bonded to fragment D

Rules can be understood by chemists

• Standard programs: Activity = 0.45 LogP + 0.56667 Lumo + 1.65 V
• ILP rule: in an active molecule, fragment A is 7Å from fragment B, which is bonded to fragment C, which is bonded to fragment D

Blind trial of hit discovery on GPCR-1: data from literature

[Figure: Observed activity (from literature) → INDDEx (in silico at Equinox) → 250 novel in silico hits → Chemistry: order 157 compounds → Test at Cerep → 30 verified in vitro hits, new chemotypes]

• Equinox outsourced wet chemistry and biology

GPCR-1: training set

[Figure: distribution of 686 training molecules collected from the public domain, split into actives and inactives]

GPCR Target 1: hits for optimisation

• 4.7M molecules in the ZINC database
• 400,000 drug-like molecules
• 500 in silico hits
• 250 hits & new chemotypes
• 157 tested for inhibition
• 76 actives
• 39 tested for IC50
• 30 confirmed
• 30 chemotypes

GPCR-1: results of primary screening

[Figure: "CB1 results - primary screening". Number of hits against percent of specific binding]

Percent of specific binding    Number of hits
>70%                           19
60%-70%                        10
50%-60%                        9
40%-50%                        22
30%-40%                        16
<30%                           81

• Bins at 30% and above are true hits; the <30% bin is false hits
• Number of in silico hits: 157 (10 µM concentration)
• Number of actives: 76
• Number of inactives: 81
• Primary screen success rate = 48%

GPCR-1: new chemotypes

[Figure: "CB1 results - new chemotype". Distribution of hits by diversity: number of hits against Tanimoto coefficient]

Tanimoto coefficient    Number of hits
<0.60                   14
0.60-0.70               8
0.70-0.75               8

Equinox hit discovery on GPCR-2: data from BioPrint (Cerep)

[Figure: Observed activity (from BioPrint) → INDDEx (in silico at Equinox) → 250 novel in silico hits → Chemistry: order 94 compounds → Test at Cerep → 28 verified in vitro hits]

• Equinox outsources wet chemistry and biology

Confirmed hit rate of in silico predictions on secondary screen: c. 35%

                                                        Target 1         Target 2
In silico hits                                          157              94
Primary screen hits (>30% binding at 10 µM)             76               42
No. of compounds tested for IC50                        39               28
IC50 results (<12 µM)                                   30               28
Estimated secondary hits if all primary hits tested     40               42
Estimated hit rate                                      38/157 = 24%     42/94 = 45%

Estimated hit rate = estimated secondary hits / in silico hits

Comparative hit rates

Company / approach    Target                          Hit rate          Technology
INDDEx                GPCR 1 & 2 + unknown target     35%               Ligand-based
Structure-based       Multiple targets                Average < 2%      Docking into 3D structure
High throughput       Multiple targets                Average 0.001%    Experimental screening

Concluding remarks

• If a protein structure is available, one can initiate an in silico screening approach to find hits
  – Success rate generally < 2%
  – X-ray structure determination requires mgs of material
  – Prediction of structure if sequence identity > 50%
• If structure-activity data are available, in silico methods can yield far better hit rates, c. 35%
• In silico methods complement high throughput screening and can find different hits

In silico small molecule discovery

• Michael Sternberg, Ata Amini, Paul Freemont & Michael Sternberg
• Imperial College London
  – www.sbg.bio.ic.ac.uk & www.doc.ic.ac.uk/~shm
  – www.equinoxpharma.com
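
The hit-rate percentages quoted in the slides are simple ratios of hits to compounds tested, and can be checked with a few lines of Python. This is only a sketch: the function names and the dictionary layout are illustrative assumptions, and the counts are copied from the GPCR-1, GPCR-2 and protein tyrosine phosphatase 1B slides.

```python
def hit_rate_percent(hits: int, tested: int) -> float:
    """Return hits / tested expressed as a percentage."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * hits / tested


# (hits, compounds tested) pairs copied from the slides.
# The dictionary layout is a hypothetical convenience, not an INDDEx format.
SCREENS = {
    "GPCR-1 primary screen": (76, 157),
    "GPCR-1 estimated secondary screen": (38, 157),
    "GPCR-2 estimated secondary screen": (42, 94),
    "PTP1B in silico docking": (18, 365),
}


def summarise(screens: dict) -> dict:
    """Round each hit rate to the nearest whole percent, as the slides do."""
    return {
        name: round(hit_rate_percent(hits, tested))
        for name, (hits, tested) in screens.items()
    }


if __name__ == "__main__":
    for name, percent in summarise(SCREENS).items():
        print(f"{name}: {percent}%")
```

Averaging the two estimated secondary rates (24% and 45%) gives the c. 35% figure quoted for INDDEx.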
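
The new-chemotype slide bins hits by Tanimoto coefficient. As a reminder of what that number measures, the sketch below computes it for fingerprints held as sets of "on" bit positions. The fingerprints are made-up toy values, and comparing each hit with its most similar known active is an assumption about how the slide's coefficients were obtained, since the slides do not say.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def max_similarity(candidate: set, references: list) -> float:
    """Similarity of a candidate to its most similar reference molecule."""
    return max(tanimoto(candidate, ref) for ref in references)


# Toy fingerprints (hypothetical bit positions, not real molecules).
known_actives = [{1, 4, 7, 9, 12}, {2, 4, 8, 9}]
candidate = {1, 4, 9, 15}

if __name__ == "__main__":
    print(max_similarity(candidate, known_actives))
```

A low coefficient against every known active suggests the hit belongs to a different chemotype from the training molecules.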