Toxicological Relationships
Between Proteins Obtained From
a Molecular Spam Filter
Florian Nigsch & John Mitchell
F. Nigsch, et al., J. Chem. Inf. Model., 48, 306-318 (2008)
F. Nigsch, et al., Toxicology and Applied Pharmacology, 231, 225-234 (2008)
F. Nigsch, et al., J. Chem. Inf. Model., 48, 2313-2325 (2008)
Florian Nigsch: now at Novartis Institutes, Boston
John Mitchell: soon moving to University of St Andrews
Spam
• Unsolicited (commercial) email
• Approx. 90% of all email traffic is spam
• Where are the legitimate messages?
• Filtering
Analogy to Drug Discovery
• Huge number of possible candidates
• Virtual screening to help in selection process
Properties of Drugs
• High affinity to protein target
• Soluble
• Permeable
• Absorbable
• High bioavailability
• Specific rate of metabolism
• Renal/hepatic clearance?
• Volume of distribution?
• Low toxicity
• Plasma protein binding?
• Blood-Brain-Barrier penetration?
• Dosage (once/twice daily?)
• Synthetic accessibility
• Formulation (important in development)
Multiobjective Optimisation
[Figure: a drug at the centre of six objectives: bioactivity, toxicity, synthetic accessibility, solubility, metabolism, permeability]
Huge number of candidates … most of which are useless!
Winnow Algorithm
• Invented in the late 1980s by Nick Littlestone to learn Boolean functions
• Name from the verb “to winnow”
– High-dimensional input data
• Natural Language Processing (NLP), text classification, bioinformatics
• Different varieties (regularised, Sparse Network Of Winnow - SNOW, …)
• Error-driven, linear threshold, online algorithm
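The three properties in the last bullet can be illustrated with a minimal sketch of the classic Winnow update for Boolean features. The function names, the promotion factor `alpha = 2`, the threshold equal to the number of features, and the toy examples are assumptions for illustration; they are not the settings of the cited papers.

```python
def winnow_train(examples, n_features, alpha=2.0, epochs=10):
    """Classic Winnow: weights change multiplicatively, and only on mistakes."""
    weights = [1.0] * n_features
    threshold = float(n_features)  # assumed threshold choice
    for _ in range(epochs):
        mistakes = 0
        # Online: examples are processed one at a time.
        for active, label in examples:  # active = indices of features that are 1
            score = sum(weights[i] for i in active)  # linear threshold unit
            if (score >= threshold) == label:
                continue  # error-driven: no update when the prediction is right
            mistakes += 1
            factor = alpha if label else 1.0 / alpha  # promote or demote
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break
    return weights, threshold


def winnow_predict(weights, threshold, active):
    return sum(weights[i] for i in active) >= threshold


# Toy data (hypothetical): the label is True when feature 0 or feature 1 is present.
examples = [
    ({0, 2}, True),
    ({1, 3}, True),
    ({2, 3, 4}, False),
    ({0, 5}, True),
    ({4, 5}, False),
    ({1, 4}, True),
]
weights, threshold = winnow_train(examples, n_features=6)
```

Only the weights of features that are active in a misclassified example are touched, which is why the method copes with high-dimensional, sparse inputs.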
Feature Space - Chemical Space
• A molecule is a feature vector m = (f1, f2, …, fn)
• [Figure: CDK1, CDK2, COX2 and DHFR placed in feature spaces with axes f1, f2, f3]
• Feature spaces of high dimensionality
Combinations of Features
Combinations of molecular features to account for synergies.
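One possible reading of "combinations of features" is to add a conjunction feature for every pair of features present in a molecule. The sketch below assumes pairs only, and the feature names are hypothetical; the slide does not say how combinations were formed.

```python
from itertools import combinations


def add_pair_features(features):
    """Return the original features plus one conjunction feature per unordered pair."""
    base = sorted(set(features))
    pairs = [f"{a}&{b}" for a, b in combinations(base, 2)]
    return base + pairs


molecule_features = ["aromatic_ring", "carboxylic_acid", "amide"]  # hypothetical names
expanded = add_pair_features(molecule_features)
```

A linear model over the expanded set can then weight a pair differently from the sum of its parts, which is one way to capture a synergy.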
Features of Molecules
Based on circular fingerprints
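The idea behind circular fingerprints can be sketched in a few lines: each atom starts with a label and repeatedly absorbs the labels of its neighbours, so that every label describes a growing circular environment. This is a simplified illustration that ignores bond orders, hashing and folding; it is not the fingerprint used in the cited work, and the function name and input format are assumptions.

```python
def circular_features(atoms, bonds, radius=2):
    """Collect one string feature per atom for every radius from 0 to `radius`."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    labels = list(atoms)  # radius 0: the element symbol itself
    features = set(labels)
    for _ in range(radius):
        # New label = own label plus the sorted labels of all direct neighbours.
        labels = [
            f"{labels[i]}({','.join(sorted(labels[j] for j in neighbours[i]))})"
            for i in range(len(atoms))
        ]
        features.update(labels)
    return features


# Heavy atoms of ethanol, C-C-O, as a toy input.
ethanol = circular_features(["C", "C", "O"], [(0, 1), (1, 2)], radius=1)
```

Each distinct string would then serve as one Boolean feature of the molecule.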
Training Example
Workflow
For predicting protein targets
Protein Target Prediction
• Which protein does a given molecule bind to?
• Virtual Screening
• Multiple endpoint drugs - polypharmacology
• New targets for existing drugs
• Prediction of adverse drug reactions (ADR)
– Computational toxicology
Predicted Protein Targets
• Selection of 233 classes from the MDL Drug Data Report
• ~90,000 molecules
• 15 independent 50%/50% splits into training/test set
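The evaluation protocol in the last bullet could be set up roughly as follows. The sketch assumes plain random, unstratified splits with a fixed seed; the slide does not say how the splits were drawn, and the function name is hypothetical.

```python
import random


def repeated_half_splits(n_items, n_repeats=15, seed=0):
    """Yield (train_indices, test_indices) for independent 50%/50% splits."""
    rng = random.Random(seed)  # fixed seed, assumed here for reproducibility
    indices = list(range(n_items))
    half = n_items // 2
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:half], shuffled[half:]


splits = list(repeated_half_splits(n_items=10))
```

Training and testing once per split gives 15 performance values, from which a mean and a spread can be reported.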
Predicted Protein Targets
Cumulative probability of correct prediction within
the three top-ranking predictions: 82.1% (±0.5%)
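The quantity reported here is a top-3 accuracy: a prediction counts as correct if the true class is among the three highest-ranked classes. A minimal sketch with made-up rankings (the data and the function name are assumptions):

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of molecules whose true class is among the k top-ranked classes."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy rankings, best guess first (hypothetical values).
ranked = [
    ["CDK2", "CDK1", "COX2"],
    ["DHFR", "COX2", "CDK1"],
    ["COX2", "CDK1", "CDK2", "DHFR"],
    ["CDK1", "COX2", "CDK2", "DHFR"],
]
truth = ["CDK1", "DHFR", "DHFR", "CDK2"]
accuracy = top_k_accuracy(ranked, truth, k=3)  # 3 of 4 molecules hit: 0.75
```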
Computational Toxicology
• Model for target prediction
• Annotated library of toxic molecules
– MDL Toxicity database
– ~150,000 molecules
– Standardisation
– MySQL database
• For each molecule we predict the likely target
• Correlations between predicted protein targets and known toxicity codes
– Canonical (23)
– Full (490)
Toxicological Relationships Outline (1)
• Protein target prediction allows us to link (predictively) 150,000 toxic organic molecules to 233 specific protein targets
• Each target is treated as a single protein, although it may be a set of related proteins
• Toxicological databases link (experimentally) these 150,000 molecules to 23 toxicity classes
• Combining these two sources of data matches the 233 proteins with the 23 toxicity classes
Toxicological Relationships Outline (2)
• For each protein target, we have a profile of
association with the 23 toxicity classes
• Proteins with similar profiles are clustered
together
• We demonstrate that these clusters of
proteins can be physiologically meaningful.
Predictions Obtained
• Target prediction: the highest-ranking class IS the predicted protein target (protein code j)
• Each molecule carries toxicity codes i, for example:
– L70 - Changes in liver weight < Liver
– Y07 - Hepatic microsomal oxidase < Enzyme inhibition
– M30 - Other changes < Kidney, Ureter, and Bladder
– L30 - Other changes < Liver
• Result matrix R = (r_ij), with toxicity codes as rows i and protein targets as columns j
• r_ij is incremented for each prediction
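Filling the result matrix can be sketched as a simple counting loop: for every molecule, predict its top-ranked protein j, then increment r_ij for each toxicity code i annotated on that molecule. The record layout, the function name and the toy predictor are assumptions for illustration.

```python
from collections import defaultdict


def build_result_matrix(molecules, predict_target):
    """r[tox_code][protein] = number of molecules carrying that toxicity code
    whose top-ranked predicted target is that protein."""
    r = defaultdict(lambda: defaultdict(int))
    for mol in molecules:
        protein = predict_target(mol["structure"])  # highest-ranking class
        for code in mol["tox_codes"]:
            r[code][protein] += 1
    return r


# Toy records and a lookup table standing in for the trained model (hypothetical).
molecules = [
    {"structure": "mol_a", "tox_codes": ["L70", "Y07"]},
    {"structure": "mol_b", "tox_codes": ["L70"]},
    {"structure": "mol_c", "tox_codes": ["M30", "L30"]},
]
toy_predictions = {
    "mol_a": "DHFR Inhibitor",
    "mol_b": "DHFR Inhibitor",
    "mol_c": "Aromatase Inhibitor",
}
R = build_result_matrix(molecules, toy_predictions.get)
```

Reading R along one protein gives that protein's profile over the toxicity codes; reading it along one toxicity code gives the proteins associated with that code.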
Toxicity Annotations
FULL TOXICITY CODES (490)
Y41 : Glycolytic < Metabolism (intermediary) < Biochemical
CANONICAL TOXICITY CODES (23)
Proteins by Toxicity
• Cardiac - G
1. Kainic acid receptor
2. Adrenergic alpha2
3. Phosphodiesterase III
4. cAMP Phosphodiesterase
5. O6-Alkylguanine-DNA alkyltransferase
• Vascular - H
1. Angiotensin II AT2
2. Dopamine (D2)
3. Bombesin
4. Adrenergic alpha2
5. 5-HT antagonist
Top 5 Proteins by Toxicity
• 68 distinct proteins for 23 toxicity classes, i.e., 3.0 proteins per canonical toxicity code
• Proteins and their counts:
– Lanosterol 14alpha-Methyl Demethylase: 5
– Glucose-6-phosphate Translocase: 4
– IL-6: 4
– Benzodiazepine Antagonist: 3
– Kainic Acid Receptor: 3
Proteins and their connectivities
Clustering of Toxicity Classes
Clustering of toxicity classes: based on predicted protein
associations from the result matrix
Correlation Between Toxicity Classes
Correlations between toxicity classes: 23 by 23 correlation matrix
Correlation Between Proteins
Correlations between proteins: 233 by 233 correlation matrix
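A protein-by-protein correlation matrix of this kind can be computed from the toxicity profiles, one vector of counts per protein. The sketch below assumes the Pearson correlation, which the slide does not specify, and uses made-up profiles with hypothetical names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Nested dict corr[a][b] for every pair of named profiles."""
    names = sorted(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Toy profiles over four toxicity classes (hypothetical counts).
profiles = {
    "protein_A": [10, 0, 5, 1],
    "protein_B": [20, 0, 10, 2],  # same shape as protein_A, twice the counts
    "protein_C": [0, 8, 1, 9],
}
corr = correlation_matrix(profiles)
```

Because correlation ignores overall scale, protein_A and protein_B come out as perfectly correlated even though one has twice as many associated molecules.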
We will look at two specific clusters, which are called
Cluster 1 and Cluster 4.
Cluster 1
• Cluster 1 (proteins 6-11)
• Within-cluster correlation (without auto-correlation): r = 0.95
• Carbonic Anhydrase Inhibitor
• Estrogen Receptor Modulator
• LHRH Agonist
• Aromatase Inhibitor
• Cysteine Protease Inhibitor
• DHFR Inhibitor
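A within-cluster correlation "without auto-correlation" can be read as the mean of the off-diagonal entries of the cluster's correlation sub-matrix, so that each protein's trivial correlation of 1 with itself does not inflate the value. That reading, the function name and the toy numbers below are assumptions.

```python
def within_cluster_correlation(corr, members):
    """Mean of corr[a][b] over all ordered pairs a != b within the cluster."""
    values = [corr[a][b] for a in members for b in members if a != b]
    return sum(values) / len(values)


# Toy 3-protein cluster (hypothetical correlations).
toy_corr = {
    "p6": {"p6": 1.0, "p7": 0.9, "p8": 0.8},
    "p7": {"p6": 0.9, "p7": 1.0, "p8": 0.7},
    "p8": {"p6": 0.8, "p7": 0.7, "p8": 1.0},
}
r_cluster = within_cluster_correlation(toy_corr, ["p6", "p7", "p8"])
```

Including the diagonal in this toy case would raise the mean from about 0.8 to about 0.87, which is why it is left out.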
Proteins involved in breast cancer
Literature-based links between these proteins
Computational Toxicology
[Figure: Breast Cancer Proteins. Network with nodes ER, CA, Aromatase, LHRH, DHFR and Cysteine Prot., plus the annotation "Antimalarials?"]
Literature shown on the links:
• Tissue-specific transcripts of human steroid sulfatase are under control of estrogen signaling pathways in breast carcinoma, Zaichuk 2007
• “aim of this study was to characterize carbonic anhydrase II (CA2), as novel estrogen responsive gene” Caldarelli 2005
• The Transactivation Domain AF-2 but Not the DNA-Binding Domain of the Estrogen Receptor Is Required to Inhibit Differentiation of Avian Erythroid Progenitors, Marieke von Lindern 1998
• Controversies of adjuvant endocrine treatment for breast cancer and recommendations of the 2007 St Gallen conference, Rabaglio 2007
• Cathepsin L Gene Expression and Promoter Activation in Rodent Granulosa Cells, Sriraman 2004
• Merchenthaler 2005
• Summary of aromatase inhibitor trials: The past and future, Goss 2007
• Regulation of collagenolytic cysteine protease synthesis by estrogen in osteoclasts, Furuyama 2000
• Induction by estrogens of methotrexate resistance in MCF-7 breast cancer cells, Thibodeau 1998
Excerpts shown next to the links:
• This led to premature expression of CAII, a possible explanation for the toxic effects of overexpressed ER.
• showed that cathepsin L expression in granulosa cells of small, growing follicles increased in periovulatory follicles after human chorionic gonadotropin stimulation.
and now Cluster 4 …
Cluster 4
This cluster links treatment of stomach ulcers to loss of
bone mass!
Proton Pump Inhibitors etc.
• Correlation above 0.99
• Correlation above 0.98
Proton Pump Inhibitors etc.
PTH = Parathyroid hormone (84 aa mini-protein)
• Proton pump inhibitors are used to limit production of gastric acid
• PTH is important in the development/regulation of osteoclasts (cells for bone resorption)
• PTH controls levels of Ca2+ in the blood; increased PTH levels are associated with age-related decrease of bone mass
Recent clinical studies showed increased risk of hip fractures resulting from long-term use of proton pump inhibitors. Hence the link between PTH and proton pump inhibitors.
Conclusions
• Successful adaptation of algorithm formerly not used
in this area
• Benchmark confirms usability, speed & memory
requirements
• Can find correct protein targets for molecules
• Hence link proteins together via ligand-binding
properties and associations of ligands with toxicities
• Identify toxicological relationships between proteins
Acknowledgements
» Cambridge
• Andreas Bender
• Hamse Mussa
• Jeremy Jenkins
» Unilever
• Jos Tissen
• Bernd van Buuren
• Silvia Miret
Funding - Unilever