Chemoinformatics in Drug Design
Irene Kouskoumvekaki, Associate Professor, Computational Chemical Biology, CBS, DTU-Systems Biology
Biological Sequence Analysis, May 6, 2011

Computational Chemical Biology group
• Tudor Oprea, Guest Professor
• Olivier Taboureau, Associate Professor
• Irene Kouskoumvekaki, Associate Professor
• Sonny Kim Nielsen, PhD student
• Kasper Jensen, PhD student
• Ulrik Plesner, Master's student
CBS, Department of Systems Biology

Definition: Chemoinformatics
Gathering and systematic use of chemical information, and application of this information to predict the behavior of unknown compounds in silico.
(Figure labels: data; prediction)

Definition: A drug candidate…
... is a (ligand) compound that binds to a biological target (protein, enzyme, receptor, ...) and in this way either initiates a process (agonist) or inhibits it (antagonist).
• The structure/conformation of the ligand is complementary to the space defined by the protein’s active site.
• The binding is caused by favorable interactions between the ligand and the side chains of the amino acids in the active site (electrostatic interactions, hydrogen bonds, hydrophobic contacts, ...).

Drug Discovery
(Figure labels: Disease; Biological Target; Drug candidate; In vitro / In silico studies; Animal studies; Clinical studies)

The Drug Discovery Process
(Figure, pipeline stages: Genome, Gene, Protein, HTS, Hit, Lead, Candidate Drug)
(Figure, disciplines along the pipeline: Genomics, Bioinformatics, Structural Bioinformatics, Chemoinformatics, Structure-based Drug Design, ADMET Modelling)

The Drug Discovery Process
• We know the structure of the biological target.
• We identify/predict the binding pocket.
MKTAALAPLFFLPSALATTVYLA GDSTMAKNGGGSGTNGWGEYL ASYLSATVVNDAVAGRSAR…(etc)
Challenge: To design an organic molecule that binds strongly enough to the biological target and modulates its activity.
(Figure label: New drug candidate)

Example: Alzheimer’s disease
What is it?
Alzheimer's is a disease that causes failure of brain functions and dementia. It starts with poor memory and an inability to carry out common everyday activities.
How do you get it?
Alzheimer's disease is the result of malfunctioning neurons in different parts of the brain. This, in turn, is due to an imbalance in the concentration of neurotransmitters.

Example: Alzheimer’s disease
How can we treat it?
(Figure labels: Acetylcholine neurotransmitter; Drug against Alzheimer’s)

Old School Drug Discovery Process
• HTS: Screening collection (10^6 cmp.) → Actives (10^3 actives)
• Follow-up: Actives → Hits (1-10 hits)
• Hit-to-lead: Hits → Lead series (0-3 lead series)
• Lead-to-drug: Lead series → Drug candidate (0-1)
• Clinical trials
High rate of false positives!

Failures

Drug Discovery in the 21st Century
• in vitro: Diverse set of molecules tested in the lab
• in silico + in vitro: Computational methods to select subsets (to be tested in the lab) based on prediction of drug-likeness, solubility, binding, pharmacokinetics, toxicity, side effects, ...

The Lipinski ‘rule of five’ for drug-likeness prediction
• Octanol-water partition coefficient (logP) ≤ 5
• Molecular weight ≤ 500 Da
• # hydrogen bond acceptors (HBA) ≤ 10
• # hydrogen bond donors (HBD) ≤ 5
If two or more of these rules are violated, the compound might have problems with oral bioavailability.
(Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.)

Major Aspects of Chemoinformatics
(Figure labels: Experimental data; Model generation; Prediction for unknown compounds)

Major Aspects of Chemoinformatics
• Information Acquisition and Management: Methods for collecting data (mainly experimental).
  Development of databases for storage and retrieval of information.
• Information Use: Data analysis, correlation and model building.
• Information Application: Prediction of molecular properties relevant to chemical and biochemical sciences.

Information Acquisition and Management

Small molecule databases

Growth in PubChem Substances & Compounds
Recent count: Substance: 72,156,631; Compound: 28,807,320; Rule of 5: 20,692,980
(Chart: Compound and Substance counts, y-axis 0 to 20,000,000, x-axis May-05 to Sep-07)

Searching in PubChem

Structural representation of molecules

Beyond the Lipinski Rule of 5...
• Chemometrics: The application of mathematical or statistical methods to chemical data (simple, linear methods), e.g. Principal Component Analysis
• Machine Learning: The design and development of algorithms and techniques that allow computers to learn (complex, non-linear algorithms), e.g. Artificial Neural Networks, K-means clustering

Prediction of Solubility, ADME & Toxicity
(Figure labels: Solid drug; Dissolution; Drug in solution; Membrane transfer; Absorbed drug; Liver extraction; Systemic circulation. Processes: Solubility; Absorption; Metabolism)

Prediction of biological activity/selectivity

Prediction models at CBS

Virtual screening
Computational techniques for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.
Exploit knowledge of the active ligand molecule or the protein target.

Virtual Screening Flavors
(Figure labels: 1D filters, e.g. Lipinski's Rule of Five; LIGAND-BASED; TARGET-BASED)

Molecular similarity on the Chemical Space
• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity. (Not always true!)
• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
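The 1D filters named under Virtual Screening Flavors can be made concrete with the rule of five stated earlier. The following is a minimal sketch, not a reference implementation: it assumes the four descriptors have already been computed elsewhere, and the `Descriptors` class, the function names and the example values are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (supplied by the caller)."""
    logp: float        # octanol-water partition coefficient
    mol_weight: float  # molecular weight in Da
    hba: int           # number of hydrogen bond acceptors
    hbd: int           # number of hydrogen bond donors


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria the compound violates."""
    checks = {
        "logP <= 5": d.logp <= 5,
        "MW <= 500": d.mol_weight <= 500,
        "HBA <= 10": d.hba <= 10,
        "HBD <= 5": d.hbd <= 5,
    }
    return [name for name, ok in checks.items() if not ok]


def may_have_bioavailability_problems(d: Descriptors) -> bool:
    """Flag the compound when two or more criteria are violated."""
    return len(rule_of_five_violations(d)) >= 2


# Hypothetical descriptor values, chosen only to exercise the filter.
small = Descriptors(logp=1.2, mol_weight=180.2, hba=4, hbd=1)
bulky = Descriptors(logp=6.3, mol_weight=720.0, hba=12, hbd=3)

print(rule_of_five_violations(small))
print(rule_of_five_violations(bulky))
print(may_have_bioavailability_problems(bulky))
```

Returning the list of violated criteria, rather than a single yes/no, keeps the filter usable both as a hard cut-off and as a soft ranking criterion when selecting subsets for testing.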
Ligand-based VS: Fingerprints
– widely used similarity search tool
– consists of descriptors encoded as bit strings
– bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
MACCS fingerprints: 166 structural keys that answer questions of the type:
• Is there a ring of size 4?
• Is at least one F, Br, Cl, or I present?
where the answer is either TRUE (1) or FALSE (0)

Tanimoto Similarity
$T_c = \frac{c}{a + b - c} = \frac{9}{10 + 9 - 9} = 0.9$, or 90% similarity
(a, b: number of bits set in each of the two fingerprints; c: number of bits set in both)

Ligand-based VS: Pharmacophore

Structure-based Virtual Screening: Docking
(Figure labels: Binding pocket of target; Library of small compounds)
Given a protein and a database of ligands, docking scores determine which ligands are most likely to bind.

Energy of binding
(Figure labels: Binding pocket of target; Library of small compounds; example energies: -1 kcal/mol, -10 kcal/mol, +10 kcal/mol, +1 kcal/mol)
ΔG = ΔH - TΔS
Contributions: vdW, Hbond, Desolvation E, Electrostatic E, Torsional free E

“Docking” and “Scoring”
• Docking involves the prediction of the binding mode of individual molecules
  – Goal: new ligand orientation closest in geometry to the observed X-ray structure
    (Conformations of ligands in complexes often have very similar geometries to minimum-energy conformations of the isolated ligand)
• Scoring ranks the ligands using some function related to the free energy of association of the two partners, looking at attractive and repulsive regions and taking into account steric and hydrogen bonding interactions
  – Goal: new ligand score closest in value to the docking score of the X-ray structure

Docking algorithms
• Most exhaustive algorithms:
  – Accurate prediction of a binding pose
• Most efficient algorithms:
  – Docking of small ligand databases in reasonable time
• Rapid algorithms:
  – Virtual high-throughput screening of millions of compounds

Scoring functions
• Molecular mechanics force field-based: Score is estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the ligand-target complex
  – CHARMM, AMBER
• Empirical-based: Based on summing various types of interactions between the two binding partners (hydrogen bonds, hydrophobic, …)
  – ChemScore, GlideScore, AutoDock
• Knowledge-based: Based on statistical observations of intermolecular close contacts from large 3D databases, which are used to derive potentials of mean force
  – PMF, DrugScore

Combination of pharmacophore, docking and molecular dynamics (MD) screens
Ligand-based VS
• good enrichment of candidate molecules from the screening of large databases with less computational effort
× too coarse to pick up subtle differences induced by small structural variations in the ligands
Structure-based VS
• better fit for analyzing smaller sets of compounds, especially in retrospective analysis
• includes all possible interactions, thus allowing the detection of unexpected binding modes
• many options for model refinement
× changing parameters for docking algorithms and scores is demanding
Mutants (hybrid methods) are being developed:
• pharmacophore methods with information about the target’s binding site
• docking programs that incorporate pharmacophore constraints

Public Web Chemoinformatics Tools
• http://www.vcclab.org/lab/edragon/
• http://pasilla.health.unm.edu/
• ChemSpider: www.chemspider.com
• Open Babel: http://openbabel.org/wiki/Main_page

D. Vidal et al., Ligand-based Approaches to In Silico Pharmacology, in Chemoinformatics and Computational Chemical Biology, Ed. J. Bajorath, Springer, 2011

Questions?
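Supplementary sketch for the fingerprint slides: the Tanimoto coefficient used to compare bit strings can be computed directly from the positions of the bits that are set. This is a minimal illustration under stated assumptions, not the method of any particular toolkit: fingerprints are represented as Python sets of bit positions, the function name `tanimoto` is hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two fingerprints.

    Each fingerprint is given as the set of bit positions that are TRUE (1).
    """
    a = len(fp_a)         # bits set in the first fingerprint
    b = len(fp_b)         # bits set in the second fingerprint
    c = len(fp_a & fp_b)  # bits set in both
    union = a + b - c
    if union == 0:
        return 0.0        # assumed convention for two empty fingerprints
    return c / union


# Toy fingerprints matching the slide's numbers: a = 10, b = 9, c = 9.
query = set(range(10))
candidate = set(range(9))

print(tanimoto(query, candidate))
```

With these toy inputs the union contains 10 bits and the intersection 9, which reproduces the 9 / (10 + 9 - 9) calculation from the Tanimoto Similarity slide.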