Nothing in Biology {Including Drug Discovery} Makes Sense Except in the Light of Evolution
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu

Theodosius Dobzhansky (1900-1975)

Big Questions in the Lab
1. Can we improve how science is disseminated and comprehended?
2. What is the ancestry of the protein structure universe?
3. What really happens when we take a drug?

1. Scholarly Communication
• Support for open access / OpenID
• Support for semantic enrichment – (http://biolit.ucsd.edu)
• Support for rich media – (http://www.scivee.tv)
2. What is the ancestry of the protein structure universe?
3. What really happens when we take a drug?

Valas et al. 2009 Current Opinion in Structural Biology, June issue
[Figure labels: PKA; ChaK (“Channel Kinase”); Phosphoinositide-3 Kinase (D); Actin-Fragmin Kinase (E)]

Can We Propose an Evolutionary History for the Protein Kinase-Like Superfamily?
• Bayesian inference of phylogeny (MrBayes)
• Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
• But the sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
• Adding a matrix of structural characteristics (similar to morphological characteristics) produces a well-supported combined model
• Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.
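The structural characteristics are binary characters scored per structure. The sketch below shows one way such a matrix might be held and compared in Python, using four rows copied from the example columns of this talk. The function names are hypothetical, and the pairwise difference count is only an illustration of the encoding; it is not the Bayesian analysis that was run with MrBayes.

```python
# Binary structural characters for four structures, copied from the talk's
# example columns. Character order:
#   1) ion pair analogous to K72-E91 in PKA
#   2) alpha-helix B present
#   3) state of alpha-helix C (0: kinked, 1: straight)
#   4) state of strand 4 (0: kinked, 1: straight)
#   5) alpha-helix D present
CHARACTERS = {
    "1BO1": (0, 0, 0, 0, 1),  # Atypical
    "1IA9": (1, 1, 1, 1, 0),  # Atypical
    "1CDK": (1, 1, 1, 0, 1),  # AGC
    "1TKI": (1, 0, 1, 0, 1),  # CAMK
}


def character_differences(a, b):
    """Count the structural characters on which structures a and b differ."""
    return sum(x != y for x, y in zip(CHARACTERS[a], CHARACTERS[b]))


def difference_table(ids):
    """Return {(id_a, id_b): number of differing characters} for every pair."""
    return {
        (a, b): character_differences(a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


if __name__ == "__main__":
    for pair, n in difference_table(sorted(CHARACTERS)).items():
        print(pair, n)
```

With only five characters the counts are coarse, which is consistent with the point above that the structural matrix has to be combined with the sequence alignment rather than used alone.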
Example columns:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0: kinked, 1: straight)
4) State of Strand 4 (0: kinked, 1: straight)
5) α-Helix D present

PDB ID  Group     1  2  3  4  5
1BO1    Atypical  0  0  0  0  1
1IA9    Atypical  1  1  1  1  0
1E8X    Atypical  1  0  1  1  1
1CJA    Atypical  1  0  1  1  1
1NW1    Atypical  1  0  1  0  0
1J7U    Atypical  1  0  1  0  1
1CDK    AGC       1  1  1  0  1
1O6L    AGC       1  1  1  0  1
1OMW    AGC       1  1  1  0  1
1H1W    AGC       1  1  1  0  1
1MUO    Other     1  1  1  0  1
1TKI    CAMK      1  0  1  0  1
1JKL    CAMK      1  0  1  0  1
1A06    CAMK      1  0  1  0  1
1PHK    CAMK      1  0  1  0  1
1KWP    CAMK      1  0  1  0  1
1IA8    CAMK      1  0  1  0  0
1GNG    CMGC      1  0  1  0  1
1HCK    CMGC      1  0  1  0  1
1JNK    CMGC      1  0  1  0  1
1HOW    CMGC      1  0  1  0  1
1LP4    Other     1  0  1  0  1
1F3M    STE       1  0  1  0  1
1O6Y    Other     1  0  1  0  1
1CSN    CK1       1  0  1  0  1
1B6C    TKL       1  0  1  0  1
2SRC    TK        1  0  1  0  1
1LUF    TK        1  0  1  0  1
1IR3    TK        1  0  1  0  1
1M14    TK        1  0  1  0  1
1GJO    TK        1  0  1  0  1

Proposed Evolutionary History for the Protein Kinase-Like Superfamily
• Suggests a distinctive history for atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
• TPK portion of the tree shows a high degree of agreement with the Manning tree
• Branching is supported by species representation of kinase families
[Figure: proposed tree. Leaf labels: APH, AGC, CK, CAMK, AFK, CMGC, TKL, PI3K, CK1, TK, PIPKIIβ, ChaK. Branch labels shown: 0.64, 0.97, 1.0, 0.85, 0.78]
• Atypical kinase families: Blue
• Typical protein kinase groups (subfamilies): Red
• Branch labels: posterior probability of branch

What if only the binding pocket was conserved and the global structure of the protein has changed?
Hold that thought; we will come back to it.

Motivation
• The truth is we know very little about how the major drugs we take work – receptors are unknown
• We know even less about what side effects they might have – receptors are unknown
• Drug discovery seems to be approached in a very consistent and conventional way
• The cost of bringing a drug to market is huge, ~$800M – drug reuse is a big business
• The cost of failure is even higher, e.g. Vioxx $4.85Bn – fail early and cheaply

What if…
• We can characterize a protein-ligand binding site from a 3D structure (primary site) and search for that site on a proteome-wide scale?
• We could perhaps find alternative binding sites (off-targets) for existing pharmaceuticals and NCEs?
• We could use it for lead optimization and possible ADME/Tox prediction
• We might be able to construct a site similarity network to define multiple targets for dirty drugs

What Do Off-targets Tell Us?
• One of four things:
1. Nothing
2. A possible explanation for a side-effect of a drug
3. A possible repositioning of a drug to treat a completely different condition
4. A multi-target strategy to attack a pathogen
Today I will give you examples of 2, 3 and 4 while illustrating the complexity of the problem

Agenda
• Computational Methodology
• Side Effects - The Tamoxifen Story
• Repositioning an Existing Drug - The TB Story
• Salvaging $800M – The Torcetrapib Story
• The Future? - The TB Drugome

Need to Start with a 3D Drug-Receptor Complex - The PDB Contains Many Examples
Generic Name   Other Name           Treatment                             PDBid
Lipitor        Atorvastatin         High cholesterol                      1HWK, 1HW8…
Testosterone   Testosterone         Osteoporosis                          1AFS, 1I9J ..
Taxol          Paclitaxel           Cancer                                1JFF, 2HXF, 2HXH
Viagra         Sildenafil citrate   ED, pulmonary arterial hypertension   1TBF, 1UDT, 1XOS..
Digoxin        Lanoxin              Congestive heart failure              1IGJ

A Reverse Engineering Approach to Drug Discovery Across Gene Families
• Characterize the ligand binding site of the primary target (Geometric Potential)
• Identify off-targets by ligand binding site similarity (sequence order independent profile-profile alignment)
• Extract known drugs or inhibitors of the primary and/or off-targets
• Search for similar small molecules …
• Dock molecules to both primary and off-targets
• Statistical analysis of docking score correlations
Computational Methodology

The Human Target List
• 43,738 Human Proteins
• Map human proteins to drug targets with BLAST e-value < 0.001: 13,865 Human Proteins (2,002 Drug Targets)
• Map human proteins to PDB structures with >95% sequence identity: 3,158 Human Proteins (10,730 PDB Structures)
• Map drug targets to PDB structures: 1,585 PDB Structures (929 Drug Targets), covering 929/2,002 = 46.4% of drug targets structurally
• Remove redundant structures with 30% sequence identity: 2,586 PDB Structures
• Remove redundant structures with 30% sequence identity: 825 PDB Structures (druggable)

Characterization of the Ligand Binding Site - The Geometric Potential
Conceptually similar to hydrophobicity or electrostatic potential; dependent on both global and local environments
• Initially assign each Cα atom a value P that is its distance to the environmental boundary
• Update the value with those of the surrounding Cα atoms, depending on distances and orientation – atoms within a 10 Å radius define the neighbors i

GP = P + \sum_{\text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Discrimination Power of the Geometric Potential
• Geometric potential can distinguish binding and non-binding sites
[Figure: distributions for binding site and non-binding site residues along the Geometric Potential Scale]
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9

Local Sequence-order Independent Alignment with
Maximum-Weight Sub-Graph Algorithm
[Figure: Structure A and Structure B, residue labels LER and VKDL]
• Build an associated graph from the graph representations of the two structures being compared. Each node is assigned a weight from the similarity matrix
• The maximum-weight clique corresponds to the optimum alignment of the two structures
Xie and Bourne 2008 PNAS, 105(14) 5441

Similarity Matrix of Alignment
Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
• Amino acid chemical similarity matrix
Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles:

d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i

f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively
S_a, S_b are the PSSMs of profiles a and b, respectively
Xie and Bourne 2008 PNAS, 105(14) 5441

Lead Discovery from Fragment Assembly
• Privileged molecular moieties in medicinal chemistry
• Structural genomics and high-throughput screening generate a large number of protein-fragment complexes
• Similar sub-site detection enhances the application of fragment assembly strategies in drug discovery
1HQC: Holliday junction migration motor protein from Thermus thermophilus
1ZEF: Rio1 atypical serine protein kinase from A. fulgidus

Lead Optimization from Conformational Constraints
• The same ligand can bind to different proteins, but with different conformations
• By recognizing the conformational changes in the binding site, it is possible to improve the binding specificity with conformational constraints placed on the ligand
1ECJ: amido-phosphoribosyltransferase from E. coli
1H3D: ATP-phosphoribosyltransferase from E. coli

Scoring
[Figure: score distributions for a) BLOSUM45 and b) McLachlan substitution matrices]
• Scores for binding site matching by SOIPPA follow an extreme value distribution (EVD).
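To make the EVD statement concrete, here is a minimal sketch of how a p-value could be read off a fitted extreme value distribution. It assumes the type I (Gumbel) form, and the location `mu` and scale `sigma` are hypothetical parameters that would have to be fitted to background alignment scores; none of these values or names come from the talk.

```python
import math


def evd_p_value(score, mu, sigma):
    """Return P(S >= score) for a Gumbel (type I extreme value) distribution.

    mu (location) and sigma (scale) are assumed to have been fitted
    elsewhere to a background distribution of binding site match scores.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (score - mu) / sigma
    if z < -700.0:
        # exp(-z) would overflow; the probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for large z.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for s in (0.0, 5.0, 10.0):
        print(s, evd_p_value(s, mu=0.0, sigma=1.0))
```

A parametric form like this needs only two fitted numbers per background, which is one plausible reason a parametric model can be much faster to evaluate than a non-parametric estimate.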
Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method used in the previous SOIPPA version
Xie, Xie and Bourne 2009 Bioinformatics ISMB

Agenda
• Computational Methodology
• Side Effects - The Tamoxifen Story
• Repositioning an Existing Drug - The TB Story
• Salvaging $800M – The Torcetrapib Story
• The Future? - The TB Drugome

Found..
• Evolutionary linkage between:
– NAD-binding Rossmann fold
– S-adenosylmethionine (SAM)-binding domain of SAM-dependent methyltransferases
• Catechol-O-methyl transferase (COMT) is a SAM-dependent methyltransferase
• Entacapone and tolcapone are used as COMT inhibitors in Parkinson’s disease treatment
• Hypothesis:
– Further investigation of NAD-binding proteins may uncover a potential new drug target for entacapone and tolcapone
Repositioning an Existing Drug - The TB Story

Functional Site Similarity between COMT and ENR
• Entacapone and tolcapone docked onto 215 NAD-binding proteins from different species
• M. tuberculosis enoyl-acyl carrier protein reductase ENR (InhA) discovered as a potential new drug target
• ENR is the primary target of many existing anti-TB drugs, but all are very toxic
• ENR catalyses the final, rate-determining step in the fatty acid elongation cycle
• Alignment of the COMT and ENR binding sites revealed similarities ...
Kinnings et al. 2009 PLoS Comp Biol under review

Binding Site Similarity between COMT and ENR
[Figure labels: COMT with SAM (cofactor) and BIE (inhibitor); ENR with NAD (cofactor) and 641 (inhibitor)]

Summary of the TB Story
• Entacapone and tolcapone shown to have potential for repositioning
• Direct mechanism of action avoids M. tuberculosis resistance mechanisms
• Possess excellent safety profiles with few side effects – already on the market
• In vivo support
• Assay of direct binding of entacapone and tolcapone to ENR reveals promising leads with no chemical relationship to existing drugs

Agenda
• Computational Methodology
• Side Effects - The Tamoxifen Story
• Repositioning an Existing Drug - The TB Story
• Salvaging $800M – The Torcetrapib Story
• The Future? - The TB Drugome

Selective Estrogen Receptor Modulators (SERM)
• One of the largest classes of drugs
• Breast cancer, osteoporosis, birth control etc.
• Amine and benzene moiety
Side Effects - The Tamoxifen Story PLoS Comp. Biol., 2007 3(11) e217

Adverse Effects of SERMs
• cardiac abnormalities
• thromboembolic disorders
• loss of calcium homeostasis
• ?????
• ocular toxicities

Structure and Function of SERCA
Sarcoplasmic Reticulum (SR) Ca2+ ion channel ATPase
• Regulates cytosolic calcium levels in cardiac and skeletal muscle
• Cytosolic and transmembrane domains
• The predicted SERM binding site is located in the transmembrane (TM) domain, inhibiting Ca2+ uptake

Binding Poses of SERMs in SERCA from Docking Studies
• Salt bridge interaction between amine group and GLU
• Aromatic interactions for both N- and C-moieties
[Figure: 6 SERMs A-F (red)]

The Challenge
• Design modified SERMs that bind as strongly to estrogen receptors but do not bind strongly to SERCA, yet maintain other characteristics of the activity profile

Agenda
• Computational Methodology
• Side Effects - The Tamoxifen Story
• Repositioning an Existing Drug - The TB Story
• Salvaging $800M – The Torcetrapib Story
• The Future?
- The TB Drugome

The Torcetrapib Story PLoS Comp Biol 2009 Accepted

Cholesteryl Ester Transfer Protein (CETP)
[Figure labels: CETP inhibitor, X, CETP, LDL (Bad Cholesterol), HDL (Good Cholesterol)]
• CETP collects triglycerides from very low density or low density lipoproteins (VLDL or LDL) and exchanges them for cholesteryl esters from high density lipoproteins (and vice versa)
• A long tunnel with two major binding sites. Docking studies suggest that it is possible that torcetrapib binds to both of them.
• The torcetrapib binding site is unknown. Docking studies show that both sites can bind torcetrapib with a docking score around -8.0.

Docking Scores eHiTS/AutoDock
Off-target  PDB Ids  Torcetrapib  Anacetrapib  JTT705  Complex ligand
CETP  2OBD  -11.675 / -5.72  -11.375 / -8.15  -7.563 / -6.65  -8.324 (PCW)
Retinoid X receptor  1YOW 1ZDT  -11.420 / -6.600 -6.74  -8.696 / -7.68 -7.35  -6.276 / -7.28 -6.95  -9.113 (POE)
PPAR delta  1Y0S  -10.203 / -8.22  -10.595 / -7.91  -7.581 / -8.36  -10.691 (331)
PPAR alpha  2P54  -11.036 / -6.67  -0.835 / -7.27  -9.599 / -7.78  -11.404 (735)
PPAR gamma  1ZEO  -9.515 / -7.31  >0.0 / -8.25  -7.204 / -8.11  -8.075 (C01)
Vitamin D receptor  1IE8  >0.0 / -4.73  >0.0 / -6.25  -6.628 / -9.70  -8.354 (KH1) -7.35
Glucocorticoid Receptor  1NHZ 1P93
Fatty acid binding protein  2F73 2PY1 2NNQ  >0.0/ -4.33 >0.0/-6.13 /-6.40 >0.0/ -7.81 >0.0/ -6.98 /-7.64 -7.191 / -8.49 /-6.33 /6.35 ???
T-Cell CD1B  1GZP  -8.815 / -7.02  -13.515 / -7.15  -7.590 / -8.02  -6.519 (GM2)
IL-10 receptor  1LQS  / -4.59 / -6.77
GM-2 activator  2AG9  -9.345 / -6.26  -9.674 / -6.98 (3CA2+)
Cardiac troponin C  1DTL  /-5.83  /-6.71  /-5.79
Cytochrome bc1 complex  1PP9 (PEG)  /-6.97  /-9.07  /-6.64
                        1PP9 (HEM)  /-7.21  /8.79  /-8.94
1V5H  /-4.89  /-7.00  /-4.94  human cytoglobin
/-4.43 /-5.63 /-7.08 /-0.58 /-7.09 /-9.42 / -5.95 -8.617 / -6.17 ??? ??? (MYR) -4.16

[Figure: off-target network for Torcetrapib, Anacetrapib and JTT705. Labels: VDR, RAS, RXR, PPARα, PPARδ, PPARγ, FA ?, FABP ?, JNK/IKK pathway, JNK/NF-κB pathway; outcomes: high blood pressure, anti-inflammatory function, immune response to infection]

Agenda
• Computational Methodology
• Side Effects - The Tamoxifen Story
• Repositioning an Existing Drug - The TB Story
• Salvaging $800M – The Torcetrapib Story
• The Future? - The TB Drugome

The TB Drugome ISMB 2009
[Figure: drugome workflow]
Steps:
1. Structural Determination & Modeling
2. Binding site Similarity
3. Protein-ligand Docking
4.1 Network Reconstruction
4.2 Network Integration
Components: Genome, Structural Proteome, Existing Drugs, Protein-drug Interactome, Metabolome, Drugome
Applications: Target identification, Drug repurposing, Side effect prediction, New therapeutics, Drug resistance mechanism
For TB: TB Genome, TB Structural Proteome, TB Protein-drug Interactome, TB Metabolome, Drugome/TB; new therapeutics for MDR and XDR-TB

Predicted protein-ligand interaction network of M. tuberculosis. Proteins that are predicted to have similar binding sites are connected. Squares represent the 18 most connected proteins.

Limitations
• Structural coverage of the given proteome
• False hits / poor docking scores
• Literature searching
• It’s a hypothesis – need experimental validation
• Money

Summary
• We have established a protocol to look for off-targets for existing therapeutics and NCEs
• Understanding these in the context of pathways would seem to be the next step towards a new understanding – cheminformatics meets systems biology
• Lots of other opportunities to examine existing drugs – DrugX and the Recovery Act

Bioinformatics Final Examples..
• Donepezil, used to treat Alzheimer’s, shows positive effects against other neurological disorders
• Orlistat, used to treat obesity, has proven effective against certain cancer types
• Ritonavir, used to treat AIDS, is effective against TB
• Nelfinavir, used to treat AIDS, is effective against different types of cancers

Acknowledgements
Lei Xie
Li Xie
Jian Wang
Sarah Kinnings