PowerPoint Presentation - Scientific Advisory Board

Adventures in Computational Enzymology
John Mitchell
University of St Andrews
The MACiE Database
Mechanism, Annotation and Classification in Enzymes.
http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
Gemma Holliday, Daniel Almonacid, Noel O’Boyle,
Janet Thornton, Peter Murray-Rust, Gail Bartlett,
James Torrance, John Mitchell
G.L. Holliday et al., Nucl. Acids Res., 35, D515-D520 (2007)
Enzyme Nomenclature and Classification
EC Classification
Class
Subclass
Sub-subclass
Serial number
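The four levels listed above can be made concrete with a small parser. This is only an illustrative sketch: the names `ECNumber`, `parse_ec` and `same_sub_subclass` are hypothetical and not part of MACiE or any EC tooling. The example code EC 3.5.2.6 is the EC number for beta-lactamase.

```python
from typing import NamedTuple, Optional


class ECNumber(NamedTuple):
    """The four levels of an EC number; None marks an unspecified level ('-')."""
    ec_class: Optional[int]
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(text: str) -> ECNumber:
    """Parse strings such as 'EC 3.5.2.6' or the partial code '1.2.-.-'."""
    body = text.strip()
    if body.upper().startswith("EC"):
        body = body[2:].strip()
    parts = body.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    return ECNumber(*(None if p == "-" else int(p) for p in parts))


def same_sub_subclass(a: ECNumber, b: ECNumber) -> bool:
    """True when two codes share fully specified class, subclass and sub-subclass."""
    return None not in a[:3] and a[:3] == b[:3]


beta_lactamase = parse_ec("EC 3.5.2.6")
print(beta_lactamase)
```

Two enzymes can agree at every level of this code and still differ in mechanism, because the code describes only the overall reaction.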
The EC Classification
• Deals with overall reaction, not mechanism
• Reaction direction arbitrary
• Cofactors and active site residues ignored
• Doesn’t deal with structural and sequence information
• However, it was never intended to do so
A New Representation of Enzyme Reactions?
• Should be complementary to, but distinct from, the EC system
• Should take into account:
  • Reaction mechanism
  • Structure
  • Sequence
  • Active site residues
  • Cofactors
• Need a database of enzyme mechanisms
MACiE Database
Global Usage of MACiE
MACiE Entries
MACiE Mechanisms are Sourced from the Literature
Coverage of MACiE
Representative – based on a non-homologous dataset,
and chosen to represent each available EC sub-subclass.
EC is not Everything
• Different mechanisms can occur with
exactly the same EC number.
• MACiE has six beta-lactamases, all with
different mechanisms but the same
overall reaction.
EC Coverage of MACiE
EC level                      Structures exist for   MACiE covers
Class (EC 1.-.-.-)            6                      6
Subclass (EC 1.2.-.-)         61                     57
Sub-subclass (EC 1.2.3.-)     204                    183
Serial number (EC 1.2.3.4)    1776                   321
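As a quick worked example, the coverage counts above can be turned into percentages. The pairing of each count with "structures exist for" versus "MACiE covers" follows the order on the slide; the dictionary keys and the function name `percent_covered` are hypothetical.

```python
# (EC codes with a structure, EC codes covered by MACiE) at each level
coverage = {
    "class": (6, 6),
    "subclass": (61, 57),
    "sub-subclass": (204, 183),
    "serial number": (1776, 321),
}


def percent_covered(with_structure: int, in_macie: int) -> float:
    """Share of structurally characterised EC codes that MACiE covers."""
    return 100.0 * in_macie / with_structure


for level, (with_structure, in_macie) in coverage.items():
    print(f"{level:14s} {percent_covered(with_structure, in_macie):5.1f}%")
```

Coverage stays near or above 90% down to the sub-subclass level and drops to roughly 18% at the serial-number level, which is what a dataset chosen to represent each sub-subclass would be expected to show.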
Repertoire of Enzyme Catalysis
G.L. Holliday et al., J. Molec. Biol., 372, 1261-1277 (2007)
G.L. Holliday et al., J. Molec. Biol., 390, 560-577 (2009)
[Figure: bar chart of the number of steps in MACiE by reaction type. Reaction types: heterolytic elimination, homolytic elimination, electrophilic addition, nucleophilic addition, homolytic addition, electrophilic substitution, nucleophilic substitution, homolytic substitution. Series: intramolecular, bimolecular, unimolecular.]
Enzyme chemistry is largely nucleophilic
[Figure: bar chart of the number of steps in MACiE by reaction type. Reaction types: proton transfer, AdN2, E1, SN2, E2, radical reaction, tautomerisation, others.]
We do see a few steps corresponding to well-known organic reactions, but these are the exception.
Residue Catalytic Propensities
Residue Catalytic Functions
Lowe et al., Molec. Pharmaceutics, 7, 1708 (2010)
Phospholipidosis
• An adverse effect caused by drugs
• Excess accumulation of phospholipids
• Often by cationic amphiphilic drugs
• Affects many cell types
• Causes delay in the drug development process
• May or may not be related to human
pathologies such as Niemann-Pick
disease
Electron micrographs of alveolar macrophages (A and B) and peritoneal macrophages (C and D)
obtained from 3-month-old Lpla2+/+ and Lpla2-/- mice
Hiraoka, M. et al. 2006. Mol. Cell. Biol. 26(16):6139-6148
Tomizawa et al.,
Literature Mined Dataset
• Produced our own dataset of 185 compounds (from literature survey)
• 102 PPL+ and 83 PPL-
• Each compound is an experimentally confirmed positive or negative
R. Lowe, R.C. Glen, J.B.O. Mitchell Mol. Pharm. 2010 VOL. 7, NO. 5, 1708–1714
Some PPL+ molecules, from Reasor et al., Exp Biol Med, 226, 825 (2001)
10001101010011001101
10110101000011101101
10111101010001001100
10000001110011100111
10100101011101001110
10011111110001001010
Represent molecules using descriptors (we used E-Dragon & Circular Fingerprints)
Experimental Design
Split data into N folds, then train on
(N-2) of them, keeping one for
parameter optimisation and one for
unseen testing. Average results over
all runs (each molecule is predicted
once per N-fold validation).
We also repeat the whole process
several times with randomly
different assignments of which
molecules are in which folds.
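The fold scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names are hypothetical, and using the fold after the test fold for parameter optimisation is an assumption, since the slide does not say how the validation fold is chosen.

```python
import random


def make_folds(items, n_folds, seed):
    """Randomly assign items to n_folds folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]


def nested_splits(folds):
    """Yield (train, validation, test): N-2 folds train, one tunes, one is unseen."""
    n = len(folds)
    for test_idx in range(n):
        val_idx = (test_idx + 1) % n  # assumed choice of validation fold
        train = [m for k, fold in enumerate(folds)
                 if k not in (test_idx, val_idx) for m in fold]
        yield train, folds[val_idx], folds[test_idx]


molecules = range(185)  # stand-ins for the 185 compounds
for repeat in range(3):  # repeat with different random fold assignments
    folds = make_folds(molecules, n_folds=5, seed=repeat)
    for train, validation, test in nested_splits(folds):
        pass  # fit on train, tune on validation, predict test
```

Because each fold serves as the test fold exactly once, every molecule receives exactly one unseen prediction per repeat.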
Models are built using machine learning techniques such as Random Forest …
… or Support Vector Machine
Results
Average MCC Values:
RF   0.619
SVM  0.650
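For reference, the Matthews correlation coefficient (MCC) is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the counts in the example are made up for illustration and are not results from this study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 80 true positives, 70 true negatives,
# 15 false positives, 20 false negatives.
print(round(mcc(tp=80, tn=70, fp=15, fn=20), 3))
```

MCC runs from -1 (every prediction wrong) through 0 (no better than chance) to +1 (perfect), and it remains informative when the two classes differ in size.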
So we have built a good predictive model that can learn the
features that predispose a molecule to being PPL+, and can
make predictions from chemical structure.
This is useful – one could add it to a virtual screening protocol.
But can we understand anything new about how
phospholipidosis occurs?
Read up on gene expression studies related to phospholipidosis …
Sawada et al. listed genes which they found to be up- or down- regulated in phospholipidosis
As with all gene expression experiments, some of these will be highly relevant, others will
be noise. Can we help interpret these data?
Mechanism?
H. Sawada, K. Takami, S. Asahi Toxicological Sciences 2005 282-292
What expertise do we have available amongst our team, colleagues & collaborators?
• Multiple target prediction: Florian Nigsch
• Maths: Hamse Mussa
• Programming: Rob Lowe
• Multiple target prediction
Predicting off-target interactions of drugs. Not with the primary
pharmaceutical target, but with other targets relevant to side effects.
ChEMBL
→ Data mining and filtering
→ Filtered ChEMBL: 241145 compounds & 1923 targets
→ Random 99:1 split of the whole dataset, 10 repeats
→ 10 models
→ Applied to the phospholipidosis dataset: 100 PPL+, 82 PPL- compounds
→ Predicted target associations
→ Target PS scores
ChEMBL Mining
• Mined the ChEMBL (03) database for
compounds and targets they interact with
• Target description included the word
"enzyme", "cytosolic", "receptor",
"agonist" or "ion channel"
• A high cut-off (weak binding) was used on
Ki/Kd/IC50 values (< 500μM) to define
activity
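The two filters above might look like this in code. This is a sketch under stated assumptions: the record layout (`target_description`, `value_um`) is hypothetical and does not reflect the ChEMBL schema, and plain substring matching is assumed because the slide does not say how the keywords were matched.

```python
KEYWORDS = ("enzyme", "cytosolic", "receptor", "agonist", "ion channel")
CUTOFF_UM = 500.0  # Ki/Kd/IC50 below this value (in micromolar) counts as active


def is_relevant_target(description: str) -> bool:
    """Keep targets whose description mentions any keyword.

    Substring matching is an assumption; note that it also matches
    'antagonist' through the keyword 'agonist'.
    """
    text = description.lower()
    return any(keyword in text for keyword in KEYWORDS)


def is_active(value_um: float) -> bool:
    """Apply the deliberately permissive activity cut-off."""
    return value_um < CUTOFF_UM


def filter_records(records):
    """Return records that pass both the target filter and the activity filter."""
    return [r for r in records
            if is_relevant_target(r["target_description"])
            and is_active(r["value_um"])]


# Made-up records for illustration only.
records = [
    {"compound": "A", "target_description": "Muscarinic receptor", "value_um": 0.2},
    {"compound": "B", "target_description": "Ion channel subunit", "value_um": 900.0},
    {"compound": "C", "target_description": "Transcription factor", "value_um": 1.0},
]
print([r["compound"] for r in filter_records(records)])
```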
Method
• Number of Compounds : 241145
• Number of Targets : 1923
• Split the data into 10 different partitions
of training and validation
• Used circular fingerprints with SYBYL atom
types to define similarities between
molecules
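The ten training and validation partitions can be sketched as repeated random splits. The 99:1 ratio comes from the workflow slide; the function name `split_99_to_1` and the use of a different seed per repeat are assumptions made for illustration.

```python
import random


def split_99_to_1(items, seed):
    """Randomly hold out about 1% of items for validation; train on the rest."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_validation = max(1, len(shuffled) // 100)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the compound list; the filtered set has 241145 compounds.
compound_ids = range(1000)
partitions = [split_99_to_1(compound_ids, seed=k) for k in range(10)]
for training, validation in partitions:
    print(len(training), len(validation))
```

Each partition then yields one model, giving the ten models used later for scoring.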
Multi-class Classification
Algorithms:
• Parzen-Rosenblatt window
• Naive Bayes
Parzen-Rosenblatt window
• Rank likely targets using estimates of class-conditional probabilities

\[ p(x_i \mid \omega_\alpha) = \frac{1}{N_{\omega_\alpha}} \sum_{x_j \in \omega_\alpha} K(x_i, x_j) \]

using a Gaussian kernel

\[ K(x_i, x_j) = \frac{1}{(h\sqrt{2\pi})^{d}} \exp\left( -\frac{(x_i - x_j)^T (x_i - x_j)}{2h^2} \right) \]

For binary fingerprints, $(x_i - x_j)^T (x_i - x_j)$ corresponds to the number of features in which $x_i$ and $x_j$ disagree
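A minimal sketch of Parzen-Rosenblatt window ranking on binary fingerprints, assuming the Gaussian kernel above with the squared distance replaced by the count of disagreeing bits. The function names, the toy fingerprints and the bandwidth `h = 1.0` are hypothetical; the study's fingerprints are far longer, where the normalising constant would be better handled in log space.

```python
import math


def hamming(a, b):
    """Number of features in which two binary fingerprints disagree."""
    return sum(x != y for x, y in zip(a, b))


def gaussian_kernel(a, b, h):
    """Gaussian kernel; for binary vectors the squared distance is the Hamming distance."""
    d = len(a)
    normaliser = (h * math.sqrt(2.0 * math.pi)) ** (-d)
    return normaliser * math.exp(-hamming(a, b) / (2.0 * h * h))


def class_conditional(x, members, h):
    """Parzen-Rosenblatt estimate of p(x | target) from that target's known ligands."""
    return sum(gaussian_kernel(x, m, h) for m in members) / len(members)


def rank_targets(x, training, h):
    """Order targets from most to least likely for query fingerprint x."""
    scores = {t: class_conditional(x, ms, h) for t, ms in training.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy training set: target name -> fingerprints of its known ligands.
training = {
    "target_A": [(1, 1, 0, 0), (1, 1, 1, 0)],
    "target_B": [(0, 0, 1, 1)],
}
print(rank_targets((1, 1, 0, 0), training, h=1.0))
```

The bandwidth `h` controls how quickly a training compound's influence decays as bits disagree, so in practice it would be tuned on held-out data.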
Partition No.   PRW Rank   NB Rank
1               17.049     74.104
2               16.343     76.251
3               18.424     79.078
4               16.212     73.539
5               17.339     73.535
6               18.630     77.244
7               20.694     78.560
8               18.870     74.464
9               16.584     76.235
10              18.200     78.077
Average         17.835     76.109
When we test the two methods, PRW ranks known targets better
than Naïve Bayes does. Hence we use PRW for our study.
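The averages in the table can be checked directly from the per-partition values. This short sketch only reproduces that arithmetic; the variable names are hypothetical.

```python
prw_ranks = [17.049, 16.343, 18.424, 16.212, 17.339,
             18.630, 20.694, 18.870, 16.584, 18.200]
nb_ranks = [74.104, 76.251, 79.078, 73.539, 73.535,
            77.244, 78.560, 74.464, 76.235, 78.077]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


print(f"PRW mean rank: {mean(prw_ranks):.2f}")
print(f"NB mean rank:  {mean(nb_ranks):.2f}")
# PRW gives the lower (better) rank in every one of the ten partitions.
print(all(p < n for p, n in zip(prw_ranks, nb_ranks)))
```

A lower number means the known target sits nearer the top of the predicted list, so the gap of roughly 58 rank positions favours PRW.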
Assemble List of Targets Relevant to
Sawada’s Suggested Mechanisms
Mechanisms:
1. Inhibition of lysosomal phospholipase activity;
2. Inhibition of lysosomal enzyme transport;
3. Enhanced phospholipid biosynthesis;
4. Enhanced cholesterol biosynthesis.
Assigning Scores to Targets
• Use these 10 models of target interactions
• Predict targets for phospholipidosis dataset
• Score targets according to the likelihood of
involvement in phospholipidosis
• Use the top 100 predicted targets per
compound as we seek off-target interactions
\[ PS = \sum_{i=1}^{N} C_{p(x_i)}(\omega_\alpha) \]
• Score measures tendency of target to interact with
PPL+ rather than PPL- compounds.
M1 & M5 are involved in phospholipase C regulation & may be relevant; but not in Sawada’s list.
We consider a PS score significant if the target is
predicted to interact with at least 50 more PPL+
compounds than PPL- compounds.
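One way to read the PS score is as a signed count over compounds. The sketch below assumes each term contributes +1 when a target appears among the top predictions for a PPL+ compound, -1 when it appears for a PPL- compound, and 0 otherwise. That reading is inferred from the significance rule above rather than stated on the slides, and the function names are hypothetical.

```python
def ps_scores(top_targets, is_ppl_positive):
    """Signed count per target: +1 for each PPL+ compound, -1 for each PPL-.

    top_targets maps compound -> its top predicted targets (top 100 in the study).
    is_ppl_positive maps compound -> True for PPL+, False for PPL-.
    """
    scores = {}
    for compound, targets in top_targets.items():
        step = 1 if is_ppl_positive[compound] else -1
        for target in set(targets):
            scores[target] = scores.get(target, 0) + step
    return scores


def significant_targets(scores, threshold=50):
    """Targets predicted for at least `threshold` more PPL+ than PPL- compounds."""
    return {t for t, s in scores.items() if s >= threshold}


# Toy example with made-up compounds and targets.
top_targets = {"c1": ["T1", "T2"], "c2": ["T1"], "c3": ["T1", "T3"]}
is_ppl_positive = {"c1": True, "c2": True, "c3": False}
print(ps_scores(top_targets, is_ppl_positive))
```

Under this reading, a target predicted equally often for PPL+ and PPL- compounds scores close to zero however promiscuous it is.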
Our Scores for 8 of Sawada’s PPL-Relevant Targets
Mechanism                                           Target                                                             PS    Rank
1. Inhibition of lysosomal phospholipase activity   Sphingomyelin phosphodiesterase (SMPD) (h)                         225   55
                                                    Lysosomal Phospholipase A1 (LYPLA1) (r)                            90    163=
                                                    Phospholipase A2 (PLA2) (h)                                        97    152=
3. Enhanced phospholipid biosynthesis               Elongation of very long chain fatty acids protein 6 (ELOVL6) (h)   -10   1203=
                                                    Acyl-CoA desaturase (SCD) (m)                                      0     610=
4. Enhanced cholesterol biosynthesis                3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) (h)        10    456=
                                                    Squalene monooxygenase (SQLE) (h)                                  14    437=
                                                    Lanosterol synthase (LSS) (h)                                      134   114=
Other Mechanisms
• The mechanisms and targets suggested here
are insufficient to explain all the PPL+
compounds in our data set.
• We expect that other targets and possibly
mechanisms are important.
• Our method can’t test direct compound –
phospholipid binding.
ACKNOWLEDGEMENTS
Dr Gemma Holliday
Dr Rob Lowe
Dr Daniel Almonacid
Prof. Janet Thornton
Dr Florian Nigsch
Dr Hamse Mussa
Prof. Bobby Glen
Dr Andreas Bender
Alexios Koutsoukas
Cambridge Overseas Trust