Slides 1(ppt)

Master Chemoinfo
• Virtual screening
Alexandre Varnek
Faculté de Chimie, ULP, Strasbourg, FRANCE
computational
Hit
Target Protein
Filtering,
QSAR,
Docking
Large libraries
of molecules
Small Library of selected hits
experimental
Virtual Screening
High Throughput Screening
Chemical universe:
• $10^{200}$ molecules
• $10^{60}$ drug-like molecules
Virtual screening
must be fast and
reliable
Molecules are considered as vectors in a
multidimensional chemical space defined
by the descriptors
High-throughput screening
Genomics → Target → HTS (high-throughput screening) → Hits → Data analysis → Lead → Optimization → Development candidate
Drug Discovery and ADME/Tox studies
should be performed in parallel
idea → target → combichem/HTS → hit → lead → candidate → drug
ADME/Tox studies (in parallel)
Methodologies of a virtual screening
from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003
Platform for Ligand Based Virtual Screening
~$10^{6}$ – $10^{9}$
molecules
• Filters
• Similarity search
~$10^{3}$ – $10^{4}$
molecules
• QSAR models
Candidates for docking
or experimental tests
High-throughput screening (HTS)
Keywords:
- Combinatorial chemistry
- High-throughput screening (HTS)
- Virtual screening
- Drug-likeness
- Training sets of up to 1,000,000 compounds
Virtual Screening
Molecules available for screening
(1) Real molecules
1–2 million in the in-house archives of large pharma and
agrochemical companies
3–4 million samples available commercially
(2) Hypothetical molecules
Virtual combinatorial libraries (up to $10^{60}$ molecules)
Methods of virtual High-Throughput
Screening
• Filters
• Similarity search
• Classification and regression structure–property models
• Docking
Filters to estimate “drug-likeness”
Lipinski rules for intestinal absorption
(« Rules of 5 »)
• H-bond donors < 5 (the sum of OH and NH groups);
• MWT < 500;
• LogP < 5;
• H-bond acceptors < 10 (the sum of N and O atoms without H attached).
Lipinski rules for drug-like molecules (« Rules of 5 »)
Example of different filters:
Rules for Absorbable compounds

            Lipinski   Veber    AB/HIA
Mol. W.     < 500      < 770    < 1,000
Log P       < 5        < 9      < 10
H-Don.      < 5        ---      < 6
H-Acc.      < 10       ---      < 19
H-D + H-A   ---        < 12     < 22
Rot-Bonds   ---        < 10     < 19
tPSA        ---        < 140    < 291

Remove compounds containing too many rings
Remove compounds with toxic groups
Remove compounds with reactive groups
Remove False-Positive Hits
Remove poorly soluble compounds
Filter on inorganic and heteroatom compounds
Remove compounds with multiple chiral centers
Paclitaxel (Taxol): violation of 2 rules
MW = 837
logP=4.49
HD = 3
HA = 15
[Figure: chemical structure of paclitaxel]
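The Rule-of-5 filter can be sketched in a few lines of Python. This is only an illustration: the descriptor values are assumed to be precomputed elsewhere, the `Descriptors` class and `lipinski_violations` function are hypothetical names, and the thresholds are the strict "<" limits as written on the slide (many references state them as "≤").

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float
    log_p: float
    h_donors: int
    h_acceptors: int


# Upper limits as written on the slide (strict "<").
LIPINSKI_LIMITS = {
    "mol_weight": 500,
    "log_p": 5,
    "h_donors": 5,
    "h_acceptors": 10,
}


def lipinski_violations(d):
    """Return the names of the descriptors that break a Rule-of-5 limit."""
    return [name for name, limit in LIPINSKI_LIMITS.items()
            if not getattr(d, name) < limit]


# Paclitaxel values quoted on the slide.
paclitaxel = Descriptors(mol_weight=837, log_p=4.49, h_donors=3, h_acceptors=15)
print(lipinski_violations(paclitaxel))  # ['mol_weight', 'h_acceptors']
```

The two reported violations (molecular weight and H-bond acceptors) agree with the "violation of 2 rules" stated for paclitaxel.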
logD vs logP
95% of all drugs are ionizable:
75% are bases and 20% are acids.
Using pH-dependent log D instead of log P as the lipophilicity
descriptor significantly increases the number of compounds
correctly identified as drug-like by the drug-likeness filter:
$\log D_{5.5} < 5$
The Rule of Five Revisited: Applying Log D in Place of Log P in Drug-Likeness Filters
S. K. Bhal, K. Kassam, I. G. Peirson, and G. M. Pearl, Molecular Pharmaceutics, 4, 556-560 (2007)
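To make the difference between log P and log D concrete, here is a minimal sketch. It assumes a monoprotic acid or base and that only the neutral species partitions into the organic phase, which is the standard textbook approximation and not necessarily the method used in the cited paper. The function names and the example compound are hypothetical.

```python
import math


def log_d(log_p, pka, ph, kind):
    """Approximate log D of a monoprotic compound at a given pH.

    Assumes only the neutral form partitions into the organic phase.
    """
    if kind == "base":
        ionized_ratio = 10 ** (pka - ph)   # [BH+] / [B]
    elif kind == "acid":
        ionized_ratio = 10 ** (ph - pka)   # [A-] / [HA]
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1 + ionized_ratio)


def passes_logd_filter(log_p, pka, ph=5.5, kind="base", limit=5.0):
    """Drug-likeness lipophilicity filter using log D at pH 5.5 < 5."""
    return log_d(log_p, pka, ph, kind) < limit


# Hypothetical base: log P = 6.0 (fails log P < 5), pKa = 9.5
print(round(log_d(6.0, 9.5, 5.5, "base"), 2))   # 2.0
print(passes_logd_filter(6.0, 9.5))             # True
```

A strongly basic compound that fails the log P criterion can pass the log D criterion because it is almost fully ionized at pH 5.5.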
Synthetic Accessibility
Assumption: a fragment's synthetic accessibility is proportional to its occurrence in the PubChem database
Ertl and Schuffenhauer Journal of Cheminformatics 2009 1:8
Synthetic Accessibility
Frequency distribution of fragments
Altogether 605,864 different fragment types have been obtained by fragmenting the
PubChem structures. Most of them (51%), however, are singletons (present only once in
the whole set). Only a relatively small number of fragments, namely 3759 (0.62%), are
frequent (i.e. present more than 1000 times in the database).
Ertl and Schuffenhauer Journal of Cheminformatics 2009 1:8
Synthetic Accessibility
The most common fragments present in the million PubChem molecules. The "A" represents any nonhydrogen atom, "dashed" double bond indicates an aromatic bond and the yellow circle marks the central atom of
the fragment.
Ertl and Schuffenhauer Journal of Cheminformatics 2009 1:8
Synthetic Accessibility
Distribution of (-SAscore) for natural products,
bioactive molecules and molecules from catalogues.
Correlation of calculated (-SAscore) and average
chemist estimation for 40 molecules ($r^2 = 0.890$)
Ertl and Schuffenhauer Journal of Cheminformatics 2009 1:8
Similarity Search:
unsupervised and supervised approaches
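• 2D (unsupervised) similarity search
[Figure: two molecules compared through their molecular fingerprints]
Tanimoto coefficient:
$T = \frac{N_{A\&B}}{N_A + N_B - N_{A\&B}} = 0.80$
($N_A$ and $N_B$ are the numbers of bits set in fingerprints A and B; $N_{A\&B}$ is the number of bits set in both.)
Molecular fingerprints:
1010001001110110101
0010001001110110101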
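The Tanimoto coefficient is easy to compute on bit strings. The sketch below uses the two illustrative fingerprints printed on the slide; the function name is hypothetical. Returning 0.0 when both fingerprints are empty is a convention chosen here, not something the slide specifies.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length bit strings of '0'/'1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = fp_a.count("1")                       # bits set in A
    n_b = fp_b.count("1")                       # bits set in B
    n_ab = sum(1 for a, b in zip(fp_a, fp_b)    # bits set in both
               if a == b == "1")
    denominator = n_a + n_b - n_ab
    return n_ab / denominator if denominator else 0.0


fp1 = "1010001001110110101"
fp2 = "0010001001110110101"
print(tanimoto(fp1, fp2))  # 0.9
```

These two short strings give 0.9 (10 and 9 bits set, 9 in common). The 0.80 quoted on the slide presumably refers to the full fingerprints of the two drawn molecules, not to these illustrative strings.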
Continuous and Discontinuous
SAR
Structural Spectrum of Thrombin Inhibitors
structural similarity “fading away”
[Figure: reference compounds and a series of analogs annotated with similarity values ranging from 0.84 down to 0.39]
continuous SARs
gradual changes in structure result in moderate
changes in activity
“rolling hills” (G. Maggiora)
discontinuous SARs
small changes in structure have
dramatic effects on activity
“cliffs” in activity landscapes
Structure-Activity Landscape Index:
$\mathrm{SALI}_{ij} = \Delta A_{ij} / \Delta S_{ij}$
$\Delta A_{ij}$ ($\Delta S_{ij}$) is the difference between the activities (similarities) of molecules i and j
R. Guha et al., J. Chem. Inf. Model., 2008, 48, 646
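A minimal sketch of the index follows. It assumes, as in the cited paper by Guha and Van Drie, that the denominator is the dissimilarity 1 − sim(i, j). The function names are hypothetical, and converting the nM values of the VEGFR-2 pair on the next slide to −log10(M) assumes they are potencies of the IC50 type.

```python
import math


def sali(activity_i, activity_j, similarity):
    """Structure-Activity Landscape Index for one pair of molecules.

    activity_*: activities on a log scale (e.g. pIC50)
    similarity: structural similarity in [0, 1] (e.g. Tanimoto)
    """
    delta_a = abs(activity_i - activity_j)
    delta_s = 1.0 - similarity
    if delta_s == 0.0:
        # Identical fingerprints: an infinitely steep "cliff" if activities
        # differ (returning 0.0 for equal activities is a choice made here).
        return math.inf if delta_a > 0 else 0.0
    return delta_a / delta_s


def p_activity(nanomolar):
    """Convert a concentration in nM to -log10(molar)."""
    return -math.log10(nanomolar * 1e-9)


# VEGFR-2 pair: 6 nM vs 2390 nM with MACCS Tc = 1.00
print(sali(p_activity(6), p_activity(2390), 1.00))  # inf
```

The infinite value shows why such pairs are bad news for similarity analysis: the fingerprint cannot distinguish the two molecules although their activities differ by more than two log units.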
VEGFR-2 tyrosine kinase inhibitors
discontinuous SARs
[Figure: compound with 6 nM activity and its analog with 2390 nM activity; MACCS Tc: 1.00]
bad news for molecular similarity analysis...
small changes in structure have
dramatic effects on activity
“cliffs” in activity landscapes
lead optimization, QSAR
Example of a “Classical” Discontinuous SAR
Any similarity method
must recognize these
compounds as being
“similar“ ...
(MACCS Tanimoto similarity)
Adenosine deaminase inhibitors
Supervised
Molecular Similarity Analysis
Dynamic Mapping of Consensus Positions
• Prototypic “mapping algorithm” for simplified binary-transformed* descriptor spaces
• Uses known active compounds to create activity-dependent consensus positions in chemical space
• Operates in descriptor spaces of step-wise increasing dimensionality (“dimension extension”)
• Selects preferred descriptors from large pools
* median-based, i.e. assign “1” to a descriptor if its value is greater than or equal to its screening database median; assign “0” if it is smaller
Godden et al. & Bajorath. J Chem Inf Comput Sci 44, 21 (2004)
DMC Algorithm
1. Calculate and binary-transform descriptors to obtain descriptor bit strings for the reference molecules.
2. Compare the descriptor bit strings of the reference molecules and determine consensus bits.
3. Calculate the consensus bit string (bit frequency = 1.0 or = 0.0: no variability).
4. Select DB compounds matching the consensus bits.
5. Re-generate bit strings permitting bit variability:
   - 1st dimension extension: bit frequency ≥ 0.9 or ≤ 0.1 (10% variability)
   - 2nd dimension extension: bit frequency ≥ 0.8 or ≤ 0.2 (20% variability)
6. Select DB compounds matching the extended bit strings.
7. Repeat until a small selection set is obtained.
[Figure: bit strings at extension levels 0, 1 and 2, e.g. 0%, 10%, 20% permitted bit variability (white “0”, black “1”, gray variably set bits); longer bit strings, fewer matching DB compounds]
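The DMC procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: descriptors are plain lists of floats, binarization uses the database median as described in the footnote above, and the function names, the `target_size` stopping criterion and the toy data are all hypothetical.

```python
from statistics import median

EPS = 1e-9  # tolerance for floating-point comparison of bit frequencies


def binarize(rows, medians):
    """Set a bit to 1 if the descriptor value is >= the database median."""
    return [[1 if x >= m else 0 for x, m in zip(row, medians)] for row in rows]


def consensus_bits(ref_bits, variability):
    """Map descriptor index -> consensus bit for the reference molecules.

    A descriptor is kept if the bit is set in at least (1 - variability)
    or in at most `variability` of the reference molecules.
    """
    n = len(ref_bits)
    consensus = {}
    for j in range(len(ref_bits[0])):
        freq = sum(row[j] for row in ref_bits) / n
        if freq >= 1.0 - variability - EPS:
            consensus[j] = 1
        elif freq <= variability + EPS:
            consensus[j] = 0
    return consensus


def dmc_select(db, refs, steps=(0.0, 0.1, 0.2), target_size=1):
    """Return indices of database compounds matching the consensus bits."""
    medians = [median(col) for col in zip(*db)]
    db_bits = binarize(db, medians)
    ref_bits = binarize(refs, medians)
    selected = list(range(len(db)))
    for variability in steps:          # each step is one dimension extension
        cons = consensus_bits(ref_bits, variability)
        selected = [i for i in selected
                    if all(db_bits[i][j] == bit for j, bit in cons.items())]
        if len(selected) <= target_size:
            break
    return selected


# Toy data: 6 database compounds and 3 known actives, 3 descriptors each
db = [[1.0, 60.0, 0.1], [2.0, 50.0, 0.6], [3.0, 10.0, 0.5],
      [4.0, 20.0, 0.2], [5.0, 40.0, 0.4], [6.0, 30.0, 0.3]]
refs = [[5.5, 45.0, 0.45], [4.5, 55.0, 0.30], [5.0, 38.0, 0.50]]
print(dmc_select(db, refs))  # [4]
```

Allowing more variability admits more descriptors into the consensus bit string, so the string gets longer and fewer database compounds match it.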
QSAR/QSPR models
Screening and hits selection
[Figure: Database → Virtual Screening (QSPR model) → Hits → Experimental Tests; compounds rejected by the model are labelled “Useless compounds”]
Libraries profiling:
indexing a database by simultaneous
assessment of various activities
Example:
PASS
software
(Prediction of Activity Spectra for Substances)
For each fragment i:
$w_i = \frac{act_i}{act_i + inact_i}$
PASS
Naïve Bayes estimator
Calculations of « P(act) » and « P(inact) »
Molecule is considered as active if
P(act) > P(inact) and/or P(act) > 0.7
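The fragment weights and the decision rule can be illustrated with a short sketch. It is not the PASS implementation. It assumes that the weight of fragment i is the fraction of active compounds among the training compounds containing that fragment, and it takes P(act) and P(inact) as given. The function names and the toy training set are hypothetical.

```python
from collections import Counter


def fragment_weights(training_set):
    """Compute w_i = act_i / (act_i + inact_i) for every fragment.

    training_set: iterable of (fragments, is_active) pairs.
    """
    act, inact = Counter(), Counter()
    for fragments, is_active in training_set:
        for frag in set(fragments):        # count each fragment once per molecule
            (act if is_active else inact)[frag] += 1
    return {f: act[f] / (act[f] + inact[f]) for f in set(act) | set(inact)}


def is_predicted_active(p_act, p_inact, threshold=0.7, require_both=False):
    """Decision rule from the slide: P(act) > P(inact) and/or P(act) > 0.7."""
    beats_inactive = p_act > p_inact
    above_threshold = p_act > threshold
    if require_both:
        return beats_inactive and above_threshold
    return beats_inactive or above_threshold


training = [({"COOH", "Cl"}, True), ({"COOH"}, True),
            ({"COOH", "Br"}, False), ({"Br"}, False)]
weights = fragment_weights(training)
print(weights["Cl"], weights["Br"])          # 1.0 0.0
print(is_predicted_active(0.6, 0.2))         # True
```

The `require_both` flag switches between the "or" and the "and" reading of the rule, since the slide allows both.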
Quantitative Structure-Property Relationships
(QSPR)
Y = f (Structure) = f (descriptors)
QSPR models give reliable predictions only for compounds that
are similar to those used to build the models.
Similarity / pharmacophore search approaches therefore remain
indispensable as complementary tools
Combinatorial Library Design
Virtual Screening
... when target structure is unknown
Screening library
Virtual library
Diverse
Subset
Hits
HTS
Design of
focussed library
Screening
Parallel synthesis
or
synthesis of single
compounds
Generation of Virtual Combinatorial Libraries
Fragment Marking approach
[Figure: Markush structure with a P=O core bearing R1, R2 and R3; if R1, R2 and R3 are each chosen from the two substituents drawn on the slide, then eight structures are enumerated]
The types of variation in Markush structures:
1. Substituent variation (R1): R1 = Me, Et, Pr
2. Position variation (R2): R2 = NH2
3. Frequency variation: (CH2)n, n = 1–3
4. Homology variation (R3): R3 = alkyl or heterocycle (only for patent search)
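Substituent variation on a Markush structure amounts to a Cartesian product over the R-group lists. The sketch below uses plain string substitution into a SMILES template. The scaffold, the two substituents (methyl and phenyl) and the function names are hypothetical stand-ins for the groups drawn on the slide, and no cheminformatics toolkit is used to validate or canonicalize the results.

```python
from itertools import combinations_with_replacement, product

# Hypothetical SMILES template for a P=O core with three attachment points.
SCAFFOLD = "O=P({r1})({r2}){r3}"
# Hypothetical substituents: methyl and phenyl.
R_GROUPS = ["C", "c1ccccc1"]


def enumerate_library(scaffold, r_groups):
    """Enumerate every assignment of substituents to R1, R2 and R3."""
    return [scaffold.format(r1=a, r2=b, r3=c)
            for a, b, c in product(r_groups, repeat=3)]


def enumerate_unique(scaffold, r_groups):
    """Enumerate assignments, treating the three positions as equivalent.

    This holds for this symmetric scaffold and ignores stereochemistry.
    """
    return [scaffold.format(r1=a, r2=b, r3=c)
            for a, b, c in combinations_with_replacement(r_groups, 3)]


library = enumerate_library(SCAFFOLD, R_GROUPS)
print(len(library))                               # 8
print(library[0])                                 # O=P(C)(C)C
print(len(enumerate_unique(SCAFFOLD, R_GROUPS)))  # 4
```

Two substituents at three positions give 2³ = 8 assignments. Only four are distinct compounds if the three positions are treated as equivalent, which shows why redundancy has to be considered when designing virtual libraries.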
Generation of Virtual Combinatorial Libraries
Reaction transform approach
from A.R. Leach, V.J. Gillet “An Introduction to Chemoinformatics”, Kluwer Academic Publisher, 2003
Issues and Concepts in Combinatorial
Library Design
• Size of the library
• Coverage of properties (“chemical space”)
• Diversity, Similarity, Redundancy
• Descriptor validation
• Subset selection from virtual libraries
Hot topics in chemoinformatics
Predictions vs interpretation
New approaches in structure-property modeling
- descriptors,
- applicability domain
- machine-learning methods (inductive learning transfer,
semi-supervised learning, ....)
New techniques to mine chemical reactions
QSAR of complex systems
- multi-component synergistic mixtures, new materials,
metabolic pathways, ...
Public availability of chemoinformatics tools
Predictions vs interpretation
Nathan BROWN “Chemoinformatics—An Introduction for Computer Scientists”
ACM Computing Surveys, Vol. 41, No. 2, Article 8, February 2009
Predictions vs interpretation
Problems :
• Ensemble modeling
• Non-linear machine-learning methods (SVM, NN, …)
• Descriptors correlations
What do end users expect from QSAR models ?
• Reliable estimation (prediction) of the given property.
Public accessibility of models:
WEB based platform for virtual
screening
Some Screen Shots: Welcome Page…
ISIDA property prediction WEB server
infochim.u-strasbg.fr/webserv/VSEngine.html
ISIDA ScreenDB tools
http://infochim.u-strasbg.fr/webserv/VSEngine.html
- only an Internet browser is required
- different descriptors (ISIDA fragments, FPT, ChemAxon)
- similarity search with different metrics (Tanimoto, Dice, …)
- ensemble modeling approach (simultaneous application of several models)
- model applicability domain (automatic detection of useless models)
The most fundamental and lasting
objective of synthesis is not
production of new compounds but
production of properties
George S. Hammond
Norris Award Lecture, 1968