Adventures in Computational Enzymology
John Mitchell, University of St Andrews

The MACiE Database
Mechanism, Annotation and Classification in Enzymes.
http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
Gemma Holliday, Daniel Almonacid, Noel O’Boyle, Janet Thornton, Peter Murray-Rust, Gail Bartlett, James Torrance, John Mitchell
G.L. Holliday et al., Nucl. Acids Res., 35, D515-D520 (2007)

Enzyme Nomenclature and Classification
EC Classification: Class, Subclass, Sub-subclass, Serial number

The EC Classification
• Deals with overall reaction, not mechanism
• Reaction direction arbitrary
• Cofactors and active site residues ignored
• Doesn’t deal with structural and sequence information
• However, it was never intended to do so

A New Representation of Enzyme Reactions?
• Should be complementary to, but distinct from, the EC system
• Should take into account: reaction mechanism, structure, sequence, active site residues, cofactors
• Need a database of enzyme mechanisms

Global Usage of MACiE

MACiE Entries

MACiE Mechanisms are Sourced from the Literature

Coverage of MACiE
Representative: based on a non-homologous dataset, and chosen to represent each available EC sub-subclass.

EC is not Everything
• Different mechanisms can occur with exactly the same EC number.
• MACiE has six beta-lactamases, all with different mechanisms but the same overall reaction.

EC Coverage of MACiE
EC level                      Structures exist for   MACiE covers
Class (EC 1.-.-.-)            6                      6
Subclass (EC 1.2.-.-)         61                     57
Sub-subclass (EC 1.2.3.-)     204                    183
Serial number (EC 1.2.3.4)    1776                   321

Repertoire of Enzyme Catalysis
G.L. Holliday et al., J. Molec. Biol., 372, 1261-1277 (2007)
G.L. Holliday et al., J. Molec.
Biol., 390, 560-577 (2009)

[Bar chart: number of steps in MACiE by reaction type (heterolytic elimination, homolytic elimination, electrophilic addition, nucleophilic addition, homolytic addition, electrophilic substitution, nucleophilic substitution, homolytic substitution); legend: intramolecular, bimolecular, unimolecular]
Enzyme chemistry is largely nucleophilic.

[Bar chart: number of steps in MACiE by reaction type (proton transfer, AdN2, E1, SN2, E2, radical reaction, tautomerisation, others)]
We do see a few steps corresponding to well-known organic reactions, but these are the exception.

Residue Catalytic Propensities

Residue Catalytic Functions

Phospholipidosis
Lowe et al., Molec. Pharmaceutics, 7, 1708 (2010)
• An adverse effect caused by drugs
• Excess accumulation of phospholipids
• Often caused by cationic amphiphilic drugs
• Affects many cell types
• Causes delay in the drug development process
• May or may not be related to human pathologies such as Niemann-Pick disease

[Figure: electron micrographs of alveolar macrophages (A and B) and peritoneal macrophages (C and D) obtained from 3-month-old Lpla2+/+ and Lpla2-/- mice. Hiraoka, M. et al. 2006. Mol. Cell. Biol. 26(16):6139-6148]

Tomizawa et al.,

Literature Mined Dataset
• Produced our own dataset of 185 compounds (from literature survey)
• 102 PPL+ and 83 PPL-
• Each compound is an experimentally confirmed positive or negative
R. Lowe, R.C. Glen, J.B.O. Mitchell, Mol. Pharm. 2010, Vol. 7, No.
5, 1708–1714

Some PPL+ molecules, from Reasor et al., Exp Biol Med, 226, 825 (2001)

Represent molecules using descriptors (we used E-Dragon & Circular Fingerprints)
10001101010011001101
10110101000011101101
10111101010001001100
10000001110011100111
10100101011101001110
10011111110001001010

Experimental Design
• Split data into N folds, then train on (N-2) of them, keeping one for parameter optimisation and one for unseen testing.
• Average results over all runs (each molecule is predicted once per N-fold validation).
• We also repeat the whole process several times, with different random assignments of molecules to folds.

Models are built using machine learning techniques such as Random Forest or Support Vector Machine.

Results
Average MCC (Matthews correlation coefficient) values:
RF    0.619
SVM   0.650

So we have built a good predictive model that can learn the features that predispose a molecule to being PPL+, and can make predictions from chemical structure. This is useful: one could add it to a virtual screening protocol.

But can we understand anything new about how phospholipidosis occurs?

Read up on gene expression studies related to phospholipidosis …
Sawada et al. listed genes which they found to be up- or down-regulated in phospholipidosis. As with all gene expression experiments, some of these will be highly relevant, others will be noise. Can we help interpret these data? Mechanism?
H. Sawada, K. Takami, S. Asahi, Toxicological Sciences 2005 282-292

What expertise do we have available amongst our team, colleagues & collaborators?
• Multiple target prediction: Florian Nigsch
• Maths: Hamse Mussa
• Programming: Rob Lowe

Multiple target prediction
Predicting off-target interactions of drugs: not with the primary pharmaceutical target, but with other targets relevant to side effects.
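The fold scheme from the Experimental Design slide (N folds, N-2 for training, one for parameter optimisation, one for unseen testing) and the MCC used to report results can be sketched roughly as follows. This is only an illustration under stated assumptions: N = 5, the choice of the tuning fold as the fold after the test fold, and the function names are all hypothetical and are not taken from the slides.

```python
import math
import random


def assign_folds(n_items, n_folds, seed):
    """Randomly assign each item index to one of n_folds near-equal folds."""
    labels = [i % n_folds for i in range(n_items)]
    random.Random(seed).shuffle(labels)
    return labels


def nested_splits(labels, n_folds):
    """Yield (train, tune, test) index lists, one triple per test fold.

    Assumption: the tuning fold is the fold after the test fold
    (cyclically); the slides do not say how it was chosen.
    """
    for test_fold in range(n_folds):
        tune_fold = (test_fold + 1) % n_folds
        train = [i for i, f in enumerate(labels)
                 if f not in (test_fold, tune_fold)]
        tune = [i for i, f in enumerate(labels) if f == tune_fold]
        test = [i for i, f in enumerate(labels) if f == test_fold]
        yield train, tune, test


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# 185 compounds as in the literature-mined dataset; N = 5 is an assumption.
labels = assign_folds(n_items=185, n_folds=5, seed=0)
splits = list(nested_splits(labels, n_folds=5))
```

Repeating the procedure with a different `seed` gives a different random assignment of molecules to folds, which corresponds to the repeated runs mentioned on the slide.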
Workflow
• ChEMBL: data mining and filtering
• Filtered ChEMBL: 241145 compounds & 1923 targets
• Random 99:1 split of the whole dataset, 10 repeats
• 10 models
• Phospholipidosis dataset: 100 PPL+, 82 PPL- compounds
• Predicted target associations
• Target PS scores

ChEMBL Mining
• Mined the ChEMBL (03) database for compounds and the targets they interact with
• Target description included the word "enzyme", "cytosolic", "receptor", "agonist" or "ion channel"
• A high cut-off (weak binding) was used on Ki/Kd/IC50 values (< 500 μM) to define activity

Method
• Number of compounds: 241145
• Number of targets: 1923
• Split the data into 10 different partitions of training and validation
• Used circular fingerprints with SYBYL atom types to define similarities between molecules

Multi-class Classification
Algorithms:
• Parzen-Rosenblatt window
• Naive Bayes

Parzen-Rosenblatt window
• Rank likely targets using estimates of class-conditional probabilities

\[ p(x_i \mid \omega_\alpha) = \frac{1}{N_{\omega_\alpha}} \sum_{x_j \in \omega_\alpha} K(x_i, x_j) \]

using a Gaussian kernel

\[ K(x_i, x_j) = \frac{1}{(h\sqrt{2\pi})^{d}} \exp\left( -\frac{(x_i - x_j)^{T}(x_i - x_j)}{2h^{2}} \right) \]

where \omega_\alpha denotes a target class, N_{\omega_\alpha} the number of training compounds in it, h the kernel width and d the number of features. (x_i - x_j)^T (x_i - x_j) corresponds to the number of features in which x_i and x_j disagree.

Partition No.   PRW Rank   NB Rank
1               17.049     74.104
2               16.343     76.251
3               18.424     79.078
4               16.212     73.539
5               17.339     73.535
6               18.630     77.244
7               20.694     78.560
8               18.870     74.464
9               16.584     76.235
10              18.200     78.077
Average         17.835     76.109

When we test the two methods, PRW ranks known targets better than Naïve Bayes does. Hence we use PRW for our study.

Assemble List of Targets Relevant to Sawada’s Suggested Mechanisms
Mechanisms:
1. Inhibition of lysosomal phospholipase activity;
2. Inhibition of lysosomal enzyme transport;
3. Enhanced phospholipid biosynthesis;
4. Enhanced cholesterol biosynthesis.
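As a rough illustration of the Parzen-Rosenblatt window ranking described earlier, the sketch below scores a query fingerprint against each target class with a Gaussian kernel on the number of disagreeing bits and then ranks the targets. It is a toy sketch, not the implementation used in the study: the target names, the 4-bit fingerprints, the kernel width h = 1 and the function names are hypothetical, and the constant normalisation factor of the kernel is omitted because it is the same for every class and so does not change the ranking.

```python
import math


def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def prw_score(query, members, h):
    """Unnormalised Parzen-Rosenblatt estimate of p(query | class).

    Averages a Gaussian kernel over the class members; for binary
    fingerprints the squared distance equals the Hamming distance.
    """
    total = sum(math.exp(-hamming(query, m) / (2.0 * h * h)) for m in members)
    return total / len(members)


def rank_targets(query, training, h=1.0):
    """Return target names sorted from most to least likely."""
    scores = {t: prw_score(query, fps, h) for t, fps in training.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy training set: target name -> fingerprints of known actives.
training = {
    "target_A": ["1100", "1110"],
    "target_B": ["0011", "0001"],
}
ranking = rank_targets("1101", training)
```

In a realistic setting the fingerprints would be much longer and the kernel width would need to be tuned on held-out data.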
Assemble List of Targets Relevant to Sawada’s Suggested Mechanisms
• Inhibition of lysosomal phospholipase activity
• Enhanced phospholipid biosynthesis
• Enhanced cholesterol biosynthesis

Assigning Scores to Targets
• Use these 10 models of target interactions
• Predict targets for phospholipidosis dataset
• Score targets according to the likelihood of involvement in phospholipidosis
• Use the top 100 predicted targets per compound as we seek off-target interactions

[Equation: PS score of a target, defined as a sum over the compounds x_i, i = 1, ..., N]

• Score measures tendency of target to interact with PPL+ rather than PPL- compounds.

M1 & M5 are involved in phospholipase C regulation & may be relevant; but not in Sawada’s list.

We consider a PS score significant if the target is predicted to interact with at least 50 more PPL+ compounds than PPL- compounds.

Our Scores for 8 of Sawada’s PPL-Relevant Targets
Mechanism 1, Inhibition of lysosomal phospholipase activity:
• Sphingomyelin phosphodiesterase (SMPD) (h)
• Lysosomal Phospholipase A1 (LYPLA1) (r)
• Phospholipase A2 (PLA2) (h)
Mechanism 3, Enhanced phospholipid biosynthesis:
• Elongation of very long chain fatty acids protein 6 (ELOVL6) (h)
• Acyl-CoA desaturase (SCD) (m)
Mechanism 4, Enhanced cholesterol biosynthesis:
• 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) (h)
• Squalene monooxygenase (SQLE) (h)
• Lanosterol synthase (LSS) (h)
PS and Rank columns (row alignment with targets not recoverable from the slide text):
PS     225   90     97     -10     0      10     14     134
Rank   55    163=   152=   1203=   610=   456=   437=   114=

Other Mechanisms
• The mechanisms and targets suggested here are insufficient to explain all the PPL+ compounds in our data set.
• We expect that other targets and possibly mechanisms are important.
• Our method can’t test direct compound-phospholipid binding.

ACKNOWLEDGEMENTS
Dr Gemma Holliday, Dr Rob Lowe, Dr Daniel Almonacid, Prof. Janet Thornton, Dr Florian Nigsch, Dr Hamse Mussa, Prof. Bobby Glen, Dr Andreas Bender, Alexios Koutsoukas
Cambridge Overseas Trust