Advanced Bioinformatics
Lecture 7: Computer-aided lead identification

ZHU FENG
zhufeng@cqu.edu.cn
http://idrb.cqu.edu.cn/
Innovative Drug Research Centre in CQU
Innovative Drug Research and Bioinformatics Laboratory

Table of Contents
1. Schematic of DOCKing
2. Pharmacophore-based docking
3. INVDOCK Strategy
4. Ligand-based drug design
5. Classification of drugs by SVM

What is docking?
Given two molecules, find their correct association.
Docking computationally predicts the structures of protein-ligand complexes from the conformations and orientations of the two molecules. The orientation that maximizes the interaction is taken as the most accurate structure of the complex.

General protein–ligand binding
Ligand
− Molecule that binds with a protein
Protein active site(s)
− Allosteric binding
− Competitive binding
Function of binding interaction
− Natural and artificial

Docking strategy
PDB file → Surface Representation → Patch Detection → Matching Patches → Scoring & Filtering → Candidate complexes

Schematic of docking methodology
(A) The target binding site is filled with site points.
(B) Distances between atoms in a molecule are matched to those between site points.
(C) A transformation matrix is calculated for an orientation.
(D) The molecule is docked into the binding site, and the fit of that conformer is scored.

Design of HIV-1 protease inhibitor
Step 1: create spheres to fit a cavity
Step 2: place a ligand to match the positions of the spheres
Step 3: check chemical complementarity

Scoring in ligand-protein docking
Potential energy description

Some techniques
Surface representation that efficiently represents the docking surface and identifies the regions of interest
− Connolly surface
− Lenhoff technique
− etc.
Figure panels: dense MS surface (Connolly); sparse surface (Shuo Lin et al.)
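Step (B) of the docking schematic, matching interatomic distances in the ligand to distances between site points, can be illustrated with a small brute-force search. This is only a minimal sketch under stated assumptions: the function name `match_site_points`, the tolerance of 0.5 Å and all coordinates are hypothetical, and real docking programs use far more efficient matching than enumerating every assignment.

```python
import math
from itertools import combinations, permutations


def match_site_points(atoms, site_points, tol=0.5):
    """Find assignments of ligand atoms to site points that preserve distances.

    Returns a list of tuples; entry k of a tuple is the index of the site
    point assigned to atom k. Brute force, so only suitable for a handful
    of atoms and site points.
    """
    pairs = list(combinations(range(len(atoms)), 2))
    matches = []
    for assignment in permutations(range(len(site_points)), len(atoms)):
        # Keep the assignment only if every atom-atom distance agrees with
        # the distance between the corresponding site points within tol.
        if all(
            abs(
                math.dist(atoms[i], atoms[j])
                - math.dist(site_points[assignment[i]], site_points[assignment[j]])
            )
            <= tol
            for i, j in pairs
        ):
            matches.append(assignment)
    return matches


# Hypothetical coordinates in Angstroms (not taken from any real structure).
ligand_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site_points = [
    (10.0, 10.0, 10.0),
    (10.0, 13.0, 10.0),
    (14.0, 10.0, 10.0),
    (20.0, 20.0, 20.0),
]
print(match_site_points(ligand_atoms, site_points))
```

Each surviving assignment would then be the input to steps (C) and (D): computing the transformation that superimposes the atoms on their site points and scoring the resulting orientation. Note that a distance-only match cannot distinguish an arrangement from its mirror image.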
Connolly surface
Each atomic sphere is given the van der Waals radius of the atom.
Rolling a probe sphere over the van der Waals surface yields the solvent reentrant surface, or Connolly surface.

Lenhoff technique
Computes a "complementary" surface for the receptor instead of the Connolly surface, i.e. computes possible positions for the atom centers of the ligand.
Figure labels: atom centers of the ligand; van der Waals surface

Pharmacophore-based docking
Basic idea
An appropriate spatial disposition of a small number of functional groups in a molecule is sufficient for achieving a desired biological effect.
The ensemble formation will be guided by these functional groups.

3-D representation of a protein binding site
Figure labels (distances in Å): 6.7, 4.2-4.7, 5.2, 5.1-7.1, 4.8
The distances between binding groups (in Angstroms) and the types of interaction are searchable.

Pharmacophore Fingerprint

Schematic of PhDOCK methodology
Figure panels: DOCK; PhDOCK

Advantages and disadvantages of PhDOCK
Advantages:
(1) Speed increase from rapid elimination of ligands containing functional groups that would interfere with binding.
(2) Speed increase over docking of individual molecules.
(3) More information pertaining to the entire molecule is retained (no rigid portions).
(4) Chemical matching and critical clusters are encouraged.
Disadvantages:
(1) Complex queries are extremely slow.
(2) The majority of the information contained in the target structure is not considered during the search.
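The idea of a searchable pharmacophore, a few functional groups with allowed distance ranges between them, can be sketched as a simple filter over conformers. This is an illustration only, not the PhDOCK algorithm: the feature names, the pairing of distance ranges with feature types and the coordinates are all hypothetical, since the slides do not say which groups the listed distances connect.

```python
import math

# Hypothetical query: allowed distance range (in Angstroms) for each pair of
# feature types. The pairing of ranges with features is an assumption.
QUERY = {
    ("donor", "acceptor"): (4.2, 4.7),
    ("donor", "hydrophobe"): (5.1, 7.1),
    ("acceptor", "hydrophobe"): (4.6, 5.0),
}


def satisfies_query(features, query=QUERY):
    """Return True if a conformer places its features within every range.

    `features` maps a feature type to the 3-D coordinates of that group.
    """
    for (first, second), (low, high) in query.items():
        if first not in features or second not in features:
            return False  # a required functional group is missing
        distance = math.dist(features[first], features[second])
        if not low <= distance <= high:
            return False
    return True


# Two hypothetical conformers of the same molecule.
compact = {
    "donor": (0.0, 0.0, 0.0),
    "acceptor": (4.5, 0.0, 0.0),
    "hydrophobe": (3.0, 4.6, 0.0),
}
stretched = dict(compact, acceptor=(6.0, 0.0, 0.0))

print(satisfies_query(compact), satisfies_query(stretched))
```

A filter of this kind is cheap because it looks only at a few ligand distances, which matches the speed advantage listed above; it also shows the listed disadvantage, since nothing else about the target structure enters the test.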
INVDOCK Strategy
Existing methods:
− Given a protein, find putative binding ligands from a chemical database
− Given Lock, find Key
− Forward lead identification
− Science 1992; 257:1078
INVDOCK method:
− Given a ligand, find putative protein targets from a protein database
− Given Key, find Lock
− Backward MOA prediction
− Proteins 1999; 36:1

INVDOCK Test on Drug Target Prediction
Anticancer drug: Tamoxifen

PDB Id   Protein                      Experimental finding
1a25     Protein Kinase C             Secondary Target
1a52     Estrogen Receptor            Drug Target
1bhs     17 beta HSD dehydrogenase    Inhibitor
1bld     bFGF Factor                  Inhibitor
1cpt     Cytochrome P450-TERP         Metabolism
1dmo     Calmodulin                   Secondary Target

Proteins 1999; 36:1
Tamoxifen is a well-known anticancer drug for the treatment of breast cancer. It was approved by the FDA in 1998 as the first cancer-preventive drug. 30 million people are expected to use it.

INVDOCK Test on Drug Target Prediction
Drug Toxicity Targets (J. Mol. Graph. Mod. 2001, 20, 199)
Columns:
(a) Number of experimentally confirmed or implicated toxicity targets
(b) Number of toxicity targets predicted by INVDOCK
(c) Number of toxicity targets missed by INVDOCK
(d) Number of toxicity targets without structure or involving covalent bond
(e) Number of INVDOCK-predicted toxicity targets without experimental finding

Compound       (a)   (b)   (c)   (d)   (e)
Aspirin         15     9     2     4     2
Gentamicin      17     5     2    10     2
Ibuprofen        5     3     0     2     2
Indinavir        6     4     0     2     2
Neomycin        14     7     1     6     6
Penicillin G     7     6     0     1     8
Tamoxifen        2     2     0     0     4
Vitamin C        2     2     0     0     3
Total           68    38     5    25    29

Results of docking studies
The docked (blue) and crystal (yellow) structures of ligands in some PDB ligand-protein complexes. The PDB Id of each structure is shown.

Dataset and Testing Results
Protein-protein cases from the protein-protein docking benchmark:
− Enzyme-inhibitor: 22 cases
− Antibody-antigen: 16 cases
Protein-DNA docking: 2 unbound-bound cases
Protein-drug docking: tens of bound cases (estrogen receptor, HIV protease, COX)
Performance: several minutes for large protein molecules and seconds for small drug molecules on a standard PC.
Figure: Endonuclease I-PpoI (1EVX) with DNA (1A73), showing the DNA, the endonuclease and the docking solution. RMSD 0.87 Å, rank 2.
Figure: Estrogen receptor with estradiol (1A52), showing the estradiol molecule from the complex and the docking solution. RMSD 0.9 Å, rank 1, running time: 11 seconds.

Classification of Drugs by SVM
A drug is classified as either belonging (+) or not belonging (−) to a class.
Drug class: inhibitor of a protein, BBB penetrating, genotoxic, etc.
Protein class: enzyme EC3.4 family, DNA-binding, etc.
By screening against all classes, the property of a drug or the function of a protein can be identified.
Diagram: Drug → Class-1 SVM (−), Class-2 SVM (+), ……, Class-n SVM (−) → Drug belongs to class-2

Classification of drugs by SVM
What is SVM?
• Support vector machines: a machine learning method, rooted in artificial intelligence and statistical learning, that learns from examples to classify objects into one of two classes.
Advantages of SVM:
• Class members can be structurally diverse.
• Structure-derived physico-chemical features are the basis for drug classification (the algorithm does not require structural similarity).

Artificial Intelligence (AI)

Machine learning method
Inductive learning (example-based learning)

Machine learning method
Feature vectors
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

Machine learning method
Feature vectors in input space
Figure: the feature vectors A, B, E and F plotted as points on the X, Y and Z axes of the input space.

SVM Method
Figure: drug family members and nonmembers separated by a border; projecting to a higher-dimensional space gives a new border.
Figure: protein family members and nonmembers separated by the new border, with the support vectors marked.

Best Linear Separator?
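One way to make the question "Best linear separator?" concrete is to compare candidate planes by the smallest distance from the plane to any training point. The sketch below does this for the feature vectors A, B, E and F listed earlier. The class labels (A and B as members, E and F as nonmembers) and the three candidate planes are assumptions made for illustration; they are not given in the slides, and this is a comparison of fixed candidates, not an SVM training procedure.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the plane w.x + b = 0.

    A positive value means the plane separates the two classes; a larger
    value means a wider gap between the plane and the closest point.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


points = [(1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 1)]  # A, B, E, F
labels = [+1, +1, -1, -1]  # hypothetical labels: A, B members; E, F nonmembers

# Hypothetical candidate planes, each given as (normal vector w, offset b).
candidates = {
    "plane 1": ((0, 1, 0), -0.5),
    "plane 2": ((0, 1, 1), -1.5),
    "plane 3": ((1, 1, 1), -2.5),
}
margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, best)
```

Under these assumed labels, planes 1 and 2 both separate the classes but plane 1 leaves the wider gap, while plane 3 misclassifies B and therefore has a negative margin.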
Find closest points in convex hulls
Figure labels: c, d

Plane bisects closest points
Figure labels: c, d

Best Linear Separator
Supporting plane method
Maximize the distance between two parallel supporting planes
Distance between the planes = "Margin"

SVM Method
Border line is nonlinear

SVM Method
Non-linear transformation: use of kernel function

SVM for classification of drugs
How to represent a drug?
• Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
− Simple molecular properties (molecular weight, no. of rotatable bonds, etc.; 18 in total)
− Molecular connectivity and shape (28 in total)
− Electro-topological state polarity (84 in total)
− Quantum chemical properties (electric charge, polarizability, etc.; 13 in total)
− Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total)
J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
Toxicol. Sci. 79, 170 (2004)

SVM-based drug design and property prediction software
Flowchart: your drug structure (chemical structure) is submitted to find which class the drug belongs to, by one of two options: input the structure through the internet and send it to the classifier, or input the structure on a local machine (a computer loaded with SVMProt). An SVM classifier for every drug class returns the identified classes, giving the drug designed or the property predicted.

SVM drug prediction results
Protein inhibitor/activator/substrate prediction
• 86% of 129 estrogen receptor activators and 84% of 101 nonactivators correctly predicted
• 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted
Drug toxicity prediction
• 97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted
• 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted
Pharmacokinetics prediction
• 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted
• 90% of 131 human intestine absorption and 80% of 65 non-absorption agents correctly predicted

Projects
1. Biological pathway simulation
2. Computer-aided anti-cancer drug design
3. Disease-causing mutation on drug target

Q&A
Any questions?
Thank you!
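As a supplement to the SVM drug prediction results, the per-class percentages can be combined into a single overall accuracy for each task by weighting them with the class sizes. The sketch below is plain arithmetic on the figures reported in the slides; the function name `overall_accuracy` is hypothetical, and because the reported percentages are rounded, the results are approximate.

```python
def overall_accuracy(pos_rate, n_pos, neg_rate, n_neg):
    """Fraction of all agents predicted correctly, from per-class rates.

    pos_rate and neg_rate are the fractions of positive and negative agents
    that were correctly predicted; n_pos and n_neg are the class sizes.
    """
    correct = pos_rate * n_pos + neg_rate * n_neg
    return correct / (n_pos + n_neg)


# (positive rate, positives, negative rate, negatives) as reported above.
reported = {
    "Estrogen receptor activators": (0.86, 129, 0.84, 101),
    "P-glycoprotein substrates": (0.81, 116, 0.79, 85),
    "TdP": (0.97, 102, 0.84, 243),
    "Genotoxicity": (0.73, 229, 0.93, 631),
    "BBB penetration": (0.95, 276, 0.82, 139),
    "Human intestine absorption": (0.90, 131, 0.80, 65),
}

for task, figures in reported.items():
    print(f"{task}: {overall_accuracy(*figures):.1%}")
```

Reading the two rates separately remains more informative than the combined figure when the classes are unbalanced, as in the genotoxicity set, where the larger non-genotoxic class dominates the overall value.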