Drug Discovery: Proteomics, Genomics
Philip E. Bourne
Professor of Pharmacology, UCSD
pbourne@ucsd.edu
858-534-8301
SPPS273

Agenda
• Where my perspective comes from
• The interplay between omics, IT and drug discovery
• The omics revolution
• Changes in IT and open science and software licensing
• Applying the new biology to drug discovery
  – Example 1 – Drug repositioning
  – Example 2 – Determining side-effects
• Words of caution

The Source of My Perspective

Some Background
• We work in the area of structural bioinformatics
• We distribute the equivalent of ¼ of the Library of Congress to approx. 250,000 scientists each month
• We are interested in improving the drug discovery process through computationally driven hypotheses on the complete biological system
• Personally:
  – Open science advocate
  – Started 4 companies
  – Spent whole life in the ivory tower

Observations
• Glass ½ Empty: drug discovery in the traditional sense is in a woeful state
• Glass ½ Full:
  – We have an explosion of data and hence a new emerging understanding of complex biological systems
  – Information technology is advancing rapidly

The Take Home Message
• Let optimism rule – let traditional computational chemistry and cheminformatics meet bioinformatics, systems biology and information science to discover drugs in new ways

The Omics Revolution

The Drivers of Change – Data & IT
[Figure: The Drivers of Change – Data & IT. Axes: Complexity (Sequence, Structure, Sub-cellular Assembly, Cellular, Organ, Higher-life) and Technology (Data, Computing Power, Virtual Communities) against Year (90 to 05); milestones include ESTs, Gene Chips, the E. coli, Yeast, C. elegans and Human genomes, and GWAS.]

It's Not Just About Numbers, It's About Complexity
[Figure: number of released entries by year. Courtesy of the RCSB Protein Data Bank]

Metagenomics - 2007
• New type of genomics
• New data (and lots of it) and new types of data
  – 17M new (predicted!) proteins: 4-5x growth in just a few months, and much more coming
  – New challenges and exacerbation of old challenges

Metagenomics: Early Results
• More than 99.5% of DNA in every environment studied represents unknown organisms
  – Culturable organisms are the exception, not the rule
• Most genes represent distant homologs of known genes, but there are thousands of new families
• Everything we touch turns out to be a gold mine
• Environments studied:
  – Water (ocean, lakes)
  – Soil
  – Human body (gut, oral cavity, human microbiome)

Metagenomics New Discoveries
[Figure: environmental (red) vs. currently known PTPases (blue)]

The Good News and the Bad News
• Good news
  – Data pointing towards function are growing at near exponential rates
  – IT can handle it on a per dollar basis
• Bad news
  – Data are growing at near exponential rates
  – Quality is highly variable
  – Accurate functional annotation is sparse

Example of the Interplay Between Bioinformatics & Proteomics - The Structural Genomics Pipeline
Structural biology moves from being functionally driven to genomically driven
Basic steps:
• Target Selection
• Crystallomics
  – Isolation
  – Expression
  – Purification
  – Crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• Publish
Labels on the diagram: Fill in protein fold space; Robotics; -ve data; Software engineering; Functional prediction; Not necessarily

Towards Open Science
• Open access publishing
• Open source software
• Generation of scientists weaned on social networks
• Blogs, wikis, social bookmarking etc.
are becoming a valid form of scientific discourse
http://www.osdd.net/

University Tech Transfer Offices are Slow to Embrace this Change
• Overvalue disclosures
• Inability to market disclosures appropriately
• Protracted negotiations in a fast moving market
• Disable rather than enable startups

So Why is All of This So Important to Drug Discovery?
We are beginning to piece together a complex living system, and we need to understand that system to do better

Why Don't We Do Better? A Couple of Observations
• Gene knockouts only affect phenotype in 10-20% of cases. Why?
  – redundant functions
  – alternative network routes
  – robustness of interaction networks
  A.L. Hopkins Nat. Chem. Biol. 2008 4:682-690
• 35% of biologically active compounds bind to more than one target
  Paolini et al. Nat. Biotechnol. 2006 24:805–815

Why Don't We Do Better? A Couple of Observations
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689-700

Implications
• Ehrlich's philosophy of magic bullets targeting individual chemoreceptors has not been realized
• Stated another way – The notion of one drug, one target, one disease is a little naïve in a complex system

So How Can We Exploit All The New Data We are Collecting on This Complex System?
Let's Work Through a Couple of Examples

Exploiting the Structural Proteome

What if…
• We can characterize a protein-ligand binding site from a 3D structure (primary site) and search for that site on a proteome-wide scale?
• We could perhaps find alternative binding sites (off-targets) for existing pharmaceuticals and NCEs (new chemical entities)?

What Do These Off-targets Tell Us?
• Potentially many things:
1. Nothing
2. How to optimize an NCE
3. A possible explanation for a side-effect of a drug already on the market
4.
A possible repositioning of a drug to treat a completely different condition
5. The reason a drug failed
6. A multi-target strategy to attack a pathogen

Need to Start with a 3D Drug-Receptor Complex - The PDB Contains Many Examples
Name          Other Name          Treatment                             PDB ID
Lipitor       Atorvastatin        High cholesterol                      1HWK, 1HW8…
Testosterone  Testosterone        Osteoporosis                          1AFS, 1I9J ..
Taxol         Paclitaxel          Cancer                                1JFF, 2HXF, 2HXH
Viagra        Sildenafil citrate  ED, pulmonary arterial hypertension   1TBF, 1UDT, 1XOS..
Digoxin       Lanoxin             Congestive heart failure              1IGJ

A Reverse Engineering Approach to Drug Discovery Across Gene Families
• Characterize ligand binding site of primary target (Geometric Potential)
• Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)
• Extract known drugs or inhibitors of the primary and/or off-targets
• Search for similar small molecules
…
• Dock molecules to both primary and off-targets
• Statistical analysis of docking score correlations
Xie and Bourne 2009 Bioinformatics 25(12) 305-312

Example 1 – Repositioning: The TB Story

The Problem with Tuberculosis
• One third of global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective, and inexpensive drugs is an urgent priority

Found..
• Evolutionary linkage between:
  – NAD-binding Rossmann fold
  – S-adenosylmethionine (SAM)-binding domain of SAM-dependent methyltransferases
• Catechol-O-methyl transferase (COMT) is a SAM-dependent methyltransferase
• Entacapone and tolcapone are used as COMT inhibitors in Parkinson's disease treatment
• Hypothesis:
  – Further investigation of NAD-binding proteins may uncover a potential new drug target for entacapone and tolcapone
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423

Functional Site Similarity between COMT and InhA
• Entacapone and tolcapone docked onto 215 NAD-binding proteins from different species
• M. tuberculosis enoyl-acyl carrier protein reductase ENR (InhA) discovered as a potential new drug target
• InhA is the primary target of many existing anti-TB drugs, but all are very toxic
• InhA catalyses the final, rate-determining step in the fatty acid elongation cycle
• Alignment of the COMT and InhA binding sites revealed similarities ...

Binding Site Similarity between COMT and InhA
[Figure: aligned binding sites. COMT: SAM (cofactor), BIE (inhibitor). InhA: NAD (cofactor), 641 (inhibitor).]

Summary of the TB Story
• Entacapone and tolcapone shown to have potential for repositioning
• Direct mechanism of action avoids M. tuberculosis resistance mechanisms
• Possess excellent safety profiles with few side effects – already on the market
• In vivo support
• Assay of direct binding of entacapone and tolcapone to InhA reveals a possible lead with no chemical relationship to existing drugs
Kinnings et al.
2009 PLoS Comp Biol 5(7) e1000423

Summary from the TB Alliance – Medicinal Chemistry
• The minimal inhibitory concentration (MIC) of 260 µM is higher than is usually considered
• MIC is 65x the estimated plasma concentration
• Have other InhA inhibitors in the pipeline

The TB Druggome
[Figure: Predicted protein-ligand interaction network of M. tuberculosis. Proteins that are predicted to have similar binding sites are connected. Squares represent the top 18 most connected proteins.]
[Figure: drugs vs. TB proteins, SMAP p-value < 1e-5]
[Figure: panels at p < 1e-7, p < 1e-6 and p < 1e-5]
Bioinformatics 2009 25(12) 305-312

New Ways of Thinking
• Polypharmacology – One or multiple drugs binding to multiple targets for a collective effect, aka "dirty drugs"
• Network Pharmacology – Measuring that effect on the whole biological network

Example 2 - The Torcetrapib Story
PLoS Comp Biol 2009 5(5) e1000387

Cholesteryl Ester Transfer Protein (CETP)
[Figure labels: CETP inhibitor (X), CETP, LDL (bad cholesterol), HDL (good cholesterol)]
• Collects triglycerides from very low density or low density lipoproteins (VLDL or LDL) and exchanges them for cholesteryl esters from high density lipoproteins (and vice versa)
• A long tunnel with two major binding sites. Docking studies suggest that it is possible that torcetrapib binds to both of them.
• The torcetrapib binding site is unknown. Docking studies show that both sites can bind torcetrapib with a docking score of around -8.0.
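To make the idea of screening for off-targets by docking score concrete, here is a minimal Python sketch. It is not the authors' pipeline. The eHiTS scores for torcetrapib are copied from the docking table in this deck; the function name `candidate_off_targets` and the -8.0 cutoff are assumptions chosen only for illustration (the cutoff echoes the approximate score quoted above for the two CETP sites). The sketch also assumes the usual convention that a more negative score means more favourable predicted binding.

```python
# eHiTS docking scores for torcetrapib, copied from the docking table in this deck.
# Assumption: more negative = more favourable predicted binding.
# None marks entries reported as ">0.0" (no favourable pose).
TORCETRAPIB_EHITS = {
    "CETP": -11.675,                 # primary target
    "Retinoid X receptor": -11.420,
    "PPAR delta": -10.203,
    "PPAR alpha": -11.036,
    "PPAR gamma": -9.515,
    "Vitamin D receptor": None,
}

# Hypothetical cutoff, for illustration only.
CUTOFF = -8.0


def candidate_off_targets(scores, primary, cutoff):
    """Return (protein, score) pairs other than the primary target whose
    score is at or below the cutoff, most favourable first."""
    hits = [
        (name, score)
        for name, score in scores.items()
        if name != primary and score is not None and score <= cutoff
    ]
    return sorted(hits, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, score in candidate_off_targets(TORCETRAPIB_EHITS, "CETP", CUTOFF):
        print(f"{name}: {score}")
```

A single cutoff is a crude filter. The reverse engineering approach described earlier in the deck relies on binding site similarity and on the statistics of docking score correlations across compounds, and any hit remains a hypothesis until it is validated experimentally.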
Docking Scores (eHiTS / AutoDock)
Off-target               PDB ID       Torcetrapib        Anacetrapib        JTT705            Complex ligand
CETP                     2OBD         -11.675 / -5.72    -11.375 / -8.15    -7.563 / -6.65    -8.324 (PCW)
Retinoid X receptor      1YOW         -11.420 / -6.600   -8.696 / -7.68     -6.276 / -7.28    -9.113 (POE)
                         1ZDT         -6.74              -7.35              -6.95
PPAR delta               1Y0S         -10.203 / -8.22    -10.595 / -7.91    -7.581 / -8.36    -10.691 (331)
PPAR alpha               2P54         -11.036 / -6.67    -0.835 / -7.27     -9.599 / -7.78    -11.404 (735)
PPAR gamma               1ZEO         -9.515 / -7.31     > 0.0 / -8.25      -7.204 / -8.11    -8.075 (C01)
Vitamin D receptor       1IE8         >0.0 / -4.73       >0.0 / -6.25       -6.628 / -9.70    -8.354 (KH1)
T-Cell CD1B              1GZP         -8.815 / -7.02     -13.515 / -7.15    -7.590 / -8.02    -6.519 (GM2)
Cardiac troponin C       1DTL         / -5.83            / -6.71            / -5.79
Cytochrome bc1 complex   1PP9 (PEG)   / -6.97            / -9.07            / -6.64
                         1PP9 (HEM)   / -7.21            / 8.79             / -8.94
Human cytoglobin         1V5H         / -4.89            / -7.00            / -4.94

Rows with unrecoverable column alignment (values in slide order):
Glucocorticoid receptor (1NHZ, 1P93) and Fatty acid binding protein (2F73, 2PY1, 2NNQ): >0.0/ -4.33  >0.0/-6.13  /-6.40  >0.0/ -7.81  >0.0/ -6.98  /-7.64  -7.191 / -8.49  /-6.33  /6.35  ???
IL-10 receptor (1LQS): / -4.59  / -6.77
GM-2 activator (2AG9): -9.345 / -6.26  -9.674 / -6.98  (3CA2+)
Unassigned values: -7.35; /-4.43  /-5.63  /-7.08  /-0.58  /-7.09  /-9.42  / -5.95  -8.617 / -6.17  ???  ???  (MYR)  -4.16

[Figure: off-target pathway diagram for JTT705, torcetrapib and anacetrapib. Labels: VDR, RAS, high blood pressure, RXR, PPARα, PPARδ, PPARγ, anti-inflammatory function, FA, FABP, JNK/IKK pathway, JNK/NF-KB pathway, immune response to infection]

The Future?
Chang et al.
2009 Mol Sys Biol Submitted

Modifications to Early Stage Drug Discovery
[Figure: drug discovery pipeline (http://www.celgene.com/images/celgene_drug_arrow.gif) with added labels "Off-targets" and "Systems Biology"]

Some Known Limitations
• Structural coverage of the given proteome
• False hits / poor docking scores
• Literature searching
• It's a hypothesis – need experimental validation
• Money

Perceived Limitations
• Mistrust of computational approaches
• Bioinformatics was previously oversold
• Omics was previously oversold
• Still too cutting edge
• No interest in drug resistance

pbourne@ucsd.edu
Questions?