Screen - ChemAxon

Screen
Ligand based virtual screening
presented by …
maintained by Miklós Vargyas
Last update: 13 April 2010
Screen
Virtual screening by topological descriptors
Screen
Description of the product
Screen performs high throughput virtual screening of compound
libraries using similarity comparisons by various molecular
descriptors.
Availability
• JChemBase
• JChem Oracle cartridge
• Instant JChem
• Server version
• standalone command line application programs
• KNIME
• PipelinePilot
Key features
Various 2D descriptors
• ChemAxon chemical fingerprint (CCFP)
• PipelinePilot ECFP/FCFP
• ChemAxon pharmacophore fingerprint (CPFP)
• BCUT
• Scalars (logP, logD, Szeged index …)
• custom descriptors, in-house fingerprints
Optimized similarity measures
• Improves similarity prediction
• depends on set of known actives
• high enrichment ratios in virtual screening
Multiple queries
• 3 types of hypotheses
• combined hit lists
Benefits
Versatile
• Use various descriptors in your well established model
• Access your trusted in-house fingerprint in IJC, JCB,
JCART
• Easy integration in corporate discovery pipelines
• Search chemical files directly; no need to import structures
into a database
• New descriptors are pluggable in deployed systems
Optimal
• Consistent similarity scores
• Smaller hit set
• More focused library
Benefits
More consistent similarity scores
[Figure: dissimilarity values for the example structures. Regular Tanimoto: 0.57, 0.47, 0.55. Optimized Tanimoto: 0.20, 0.28, 0.06.]
Benefits
High enrichment ratio
• Fewer false hits
• Known actives are true positive hits (ACE inhibitors)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18). Series: Tanimoto, Euclidean, Optimized, Ideal.]
Results
NPY-5 (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of spikes retrieved (1 to 49). Series: Tanimoto, Euclidean, Optimized, Ideal.]
Results
β2-adrenoceptor (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18). Series: Tanimoto, Euclidean, Optimized, Ideal.]
Case study at Axovan
• GPCR activity prediction
• distinguishing between GPCR subclasses
GPCR-Tailored Pharmacophore Pattern Recognition of
Small Molecular Ligands
Modest von Korff and Matthias Steger, JCICS 2004, 44
Screen roadmap
• New molecular descriptors
– ECFP/FCFP (in 5.4)
– Shape descriptors (in 5.4)
• Hidden use of the optimiser
– No-pain black-box approach
– Simultaneous multi-descriptor search
• Enhanced IJC integration
– Easy descriptor configuration and generation
– Similarity search type instead of descriptors, metrics
and other unfriendly concepts
Screen roadmap
• GUI
– New web interface (HTML/AJAX)
– Desktop application for descriptor generation
• 3D shape similarity
– fast pre-filtering by 3D fingerprint
– Alignment based volumetric Tanimoto calculation
– scaffold hopping by maximizing topological
dissimilarity and spatial similarity
Supplementary slides
A typical approach
[Figure: the query is encoded as a query fingerprint, for example
0101010100010100010100100000000000010010000010010100100100010000
and compared with the target fingerprints of the targets using a metric such as Tanimoto,
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
The matching targets are the hits.]
ChemAxon’s approach
[Figure: the fingerprints of several queries, for example
0100010100011101010000110000101000010011000010100000000100100000
0001101110011101111110100000100010000110110110000000100110100000
are combined into a single hypothesis fingerprint, for example
0000000100000000010000100000000000001010100000000100000100100000
An optimization step produces the optimized metric,
$$\frac{\sum_i s_i \min(x_i, y_i)}{\alpha\left(\sum_i x_i - \sum_i s_i \min(x_i, y_i)\right) + (1 - \alpha)\left(\sum_i y_i - \sum_i s_i \min(x_i, y_i)\right) + \sum_i s_i \min(x_i, y_i)}$$
which compares the hypothesis fingerprint with the target fingerprints of the targets to give the hits.]
Performance
Chemical fingerprint generation: 500/s
Pharmacophore fingerprint generation
• calculated: 80/s
• rule-based: 200/s
Screening: 12000/s
Optimization: 10s/metric
Hardware/software environment:
• P4 3GHz, 1GB RAM
• Red Hat Linux 9
• Java 1.4.2
Implementations
Use of various fingerprints and metrics in JSP
http://www.chemaxon.com/jchem/examples/jsp1_x/index.jsp
UGM presentation by Aureus Pharma
Improved Virtual Screening Strategies and Enrichment of
Focused Libraries in Active Compounds Using Target-Oriented Databases
http://www.chemaxon.com/forum/viewpost2307.html
Molecular similarity
Two compounds are similar when their chemical, pharmacological or biological
properties match.
The more features two molecules have in common, the higher their similarity.
[Figure panels: Chemical, Pharmacophore]
Similarity measures
Quantitative assessment of similarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys
Sequences/vectors of bits, or numeric values that can be compared by
distance functions, similarity metrics.
$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
Here B(x) is the number of bits set in x.
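The two measures above can be sketched in a few lines of Python. This is an illustration only, not ChemAxon's implementation: fingerprints are assumed to be equal-length strings of '0' and '1', and the function names are made up for the example.

```python
import math

def bit_count(bits: str) -> int:
    """B(x): number of bits set in a '0'/'1' string."""
    return bits.count("1")

def tanimoto(x: str, y: str) -> float:
    """T(x, y) = B(x & y) / (B(x) + B(y) - B(x & y))."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for a, b in zip(x, y) if a == "1" and b == "1")
    union = bit_count(x) + bit_count(y) - common
    # Two empty fingerprints are treated as identical (an assumption).
    return 1.0 if union == 0 else common / union

def euclidean(x, y) -> float:
    """E(x, y) = sqrt(sum_i (x_i - y_i)^2) for numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(tanimoto("1100", "1010"))          # one common bit out of three set in total
print(euclidean([1, 2, 3], [1, 4, 3]))   # only the second component differs
```

Tanimoto grows with the number of shared bits, while the Euclidean value is a distance, so smaller means more similar.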
Standard metrics
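$$D_{\mathrm{Tanimoto}}(x, y) = 1 - \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)} = 1 - \frac{\sum_i \min(x_i, y_i)}{\sum_i x_i + \sum_i y_i - \sum_i \min(x_i, y_i)}$$
$$D_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$$
Example values for the pair of structures pictured on the slide: D_Tanimoto = 0.68, D_Euclidean = 21.93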
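For count-valued descriptors the same two dissimilarities can be written with min and max. The sketch below is an assumption-laden illustration (plain Python lists of non-negative counts, invented function names); it also checks that the sum of maxima equals sum(x) + sum(y) minus the sum of minima, which is why the Tanimoto denominator can be written either way.

```python
import math

def tanimoto_dissimilarity(x, y):
    """1 - sum(min) / sum(max) for non-negative count vectors."""
    s_min = sum(min(a, b) for a, b in zip(x, y))
    s_max = sum(max(a, b) for a, b in zip(x, y))
    return 0.0 if s_max == 0 else 1.0 - s_min / s_max

def euclidean_distance(x, y):
    """Square root of the summed squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [0, 2, 7, 1]
y = [1, 6, 0, 4]

# min(a, b) + max(a, b) == a + b, so both denominators agree.
s_min = sum(min(a, b) for a, b in zip(x, y))
s_max = sum(max(a, b) for a, b in zip(x, y))
assert s_max == sum(x) + sum(y) - s_min

print(tanimoto_dissimilarity(x, y), euclidean_distance(x, y))
```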
Topological chemical fingerprint
hashed binary fingerprint
• encodes topological properties of the chemical graph:
connectivity, edge label (bond type), node label (atom type)
• allows the comparison of two molecules with respect to their
chemical structure
Construction
1. find all 0, 1, …, n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with logical OR operation
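The three construction steps can be sketched as follows. This is a simplified illustration, not the CCFP algorithm: it enumerates simple paths (no atom visited twice) rather than general walks, hashes each path label with SHA-256 to pick bit positions, and uses invented names and default sizes throughout.

```python
import hashlib

def enumerate_paths(atoms, bonds, max_steps):
    """Labels of all simple paths with 0..max_steps bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, bond in bonds:
        neighbours[a].append((b, bond))
        neighbours[b].append((a, bond))
    found = set()

    def extend(path, tokens):
        # A path and its reverse are the same path; keep one canonical form.
        found.add(min(tokens, tokens[::-1]))
        if len(path) - 1 == max_steps:
            return
        for nxt, bond in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], tokens + (bond, atoms[nxt]))

    for start in range(len(atoms)):
        extend([start], (atoms[start],))
    return {"".join(t) for t in found}

def path_bits(label, size, bits_per_path):
    """Hash a path label to bit positions (positions may coincide)."""
    digest = hashlib.sha256(label.encode()).digest()
    return {int.from_bytes(digest[2 * k:2 * k + 2], "big") % size
            for k in range(bits_per_path)}

def fingerprint(atoms, bonds, max_steps=3, size=64, bits_per_path=2):
    bits = 0
    for label in enumerate_paths(atoms, bonds, max_steps):
        for pos in path_bits(label, size, bits_per_path):
            bits |= 1 << pos  # merge with logical OR
    return bits

# Heavy-atom graph C-C-O (hydrogens omitted for brevity).
atoms = ["C", "C", "O"]
bonds = [(0, 1, "-"), (1, 2, "-")]
print(sorted(enumerate_paths(atoms, bonds, 3)))
print(format(fingerprint(atoms, bonds), "064b"))
```

Because different paths can hash to the same position, the fingerprint is lossy: equal structures always give equal fingerprints, but the converse does not hold.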
Construction of chemical fingerprint
[Figure: structure of the example molecule, built from C, O and H atoms]
length  walk      bit array
0       C         1010000000
1       C–H       0001010000
1       C–C       0001000100
2       C–C–H     0001000010
2       C–C–O     0100010000
3       C–C–O–H   0000011000
ALL               1111011110
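The ALL row is the bitwise OR of the six walk bit arrays. A short check in Python, using the values from the slide (the dictionary keys use ASCII hyphens purely for convenience):

```python
walk_bits = {
    "C": "1010000000",
    "C-H": "0001010000",
    "C-C": "0001000100",
    "C-C-H": "0001000010",
    "C-C-O": "0100010000",
    "C-C-O-H": "0000011000",
}

merged = 0
for bits in walk_bits.values():
    merged |= int(bits, 2)  # logical OR of the per-walk bit arrays

print(format(merged, "010b"))  # 1111011110
```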
Chemical similarity
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
Topological pharmacophore fingerprint
• encodes pharmacophore properties of molecules as frequency
counts of pharmacophore point pairs at given topological distance
• allows the comparison of two molecules with respect to their
pharmacophore
Construction
1. perceive pharmacophoric features
2. map pharmacophore point types to atoms
3. calculate the length of the shortest path between each pair of atoms
4. assign a histogram to every pharmacophore point pair and
count the frequency of the pair with respect to its distance
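Steps 2 to 4 can be sketched as below, assuming the pharmacophore types have already been perceived. The one-letter codes (A, D, H), the distance cut-off of 6 and all function names are assumptions made for the example; this is not the CPFP implementation.

```python
from collections import Counter, deque

def shortest_path_lengths(n_atoms, bonds, source):
    """Breadth-first search: bond counts from `source` to reachable atoms."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

def pharmacophore_histogram(types, bonds, max_distance=6):
    """Count pharmacophore point pairs per topological distance.

    types[i] is the set of type codes of atom i (empty set = none).
    Keys of the result look like ("A", "D", 3).
    """
    counts = Counter()
    n = len(types)
    for i in range(n):
        dist = shortest_path_lengths(n, bonds, i)
        for j in range(i + 1, n):
            d = dist.get(j)
            if d is None or d > max_distance:
                continue
            for ti in types[i]:
                for tj in types[j]:
                    pair = tuple(sorted((ti, tj)))
                    counts[pair + (d,)] += 1
    return counts

# Four atoms in a chain: acceptor, untyped, donor, hydrophobic.
types = [{"A"}, set(), {"D"}, {"H"}]
bonds = [(0, 1), (1, 2), (2, 3)]
print(pharmacophore_histogram(types, bonds))
```

Using a set per atom lets an atom that is both acceptor and donor contribute to several histograms.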
Pharmacophore perception
Rule based approach
Rule 1: The pharmacophore type of an atom is an acceptor, if
• it is a nitrogen, oxygen or sulfur, and
• it is not an amide nitrogen or sulfur, and
• it is not an aniline nitrogen, and
• it is not a sulfonyl sulfur, and
• it is not a nitro group nitrogen.
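Rule 1 is a plain conjunction of conditions, which the following sketch makes explicit. The `Atom` class and its boolean flags are hypothetical: in a real system they would come from substructure perception, and "amide nitrogen or sulfur" is read here as one exclusion flag.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    element: str                 # "N", "O", "S", "C", ...
    amide_n_or_s: bool = False   # hypothetical precomputed flags
    aniline_n: bool = False
    sulfonyl_s: bool = False
    nitro_n: bool = False

def is_acceptor(atom: Atom) -> bool:
    """Rule 1 as a conjunction of its bullet points."""
    return (
        atom.element in {"N", "O", "S"}
        and not atom.amide_n_or_s
        and not atom.aniline_n
        and not atom.sulfonyl_s
        and not atom.nitro_n
    )

print(is_acceptor(Atom("O")))                   # True
print(is_acceptor(Atom("N", aniline_n=True)))   # False
print(is_acceptor(Atom("C")))                   # False
```

Every new exception adds another flag and another clause, which is the maintenance problem the following slides point out.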
Exceptions to simple rules
sp2 atom
n-cyano-methyl piperidine
exception → extra rules → large number of rules
→ maintenance, performance
Effect of pH
pH = 7
pH = 1
pH → pH specific rules → large number of rules
→ maintenance, performance
Pharmacophore perception
Calculation based approach
Step 1: estimation of pKa
allows the determination of the protonation state
for ionizable groups at the given pH
Step 2: partial charge calculation
Step 3: hydrogen bond donor/acceptor recognition
Step 4: aromatic perception
Step 5: pharmacophore property assignment
[Figure: example structure with atoms labelled acceptor, negatively charged acceptor, acceptor and donor, hydrophobic, none]
Pharmacophore fingerprint
[Figure: two pharmacophore fingerprint histograms. Vertical axis: frequency count, 0 to 12. Horizontal axis: pharmacophore point pairs A–A, D–A, D–D, H–A, H–D and H–H, each at topological distances 1 to 6.]
Pharmacophore type coloring: acceptor, donor, hydrophobic, none.
Fuzzy smoothing
[Figure: histograms over the bins AA1 to AA6 compared with and without fuzzy smoothing; the Euclidean distances shown are DE = 1.41 and DE = 0.45.]
Virtual screening using fingerprints
[Figure: the query is encoded as a query fingerprint, for example
0101010100010100010100100000000000010010000010010100100100010000
and compared by a metric with the target fingerprints of the targets; the matching targets are the hits.]
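The screening workflow in the figure can be sketched as a loop over target fingerprints. Assumptions in this example: fingerprints are stored as Python integers, the metric is the Tanimoto dissimilarity, and a hit is any target whose dissimilarity does not exceed a user-chosen `threshold` (the slide does not say how hits are selected).

```python
def tanimoto_dissimilarity(x: int, y: int) -> float:
    """1 - B(x & y) / B(x | y) for fingerprints stored as integers."""
    union = bin(x | y).count("1")
    return 0.0 if union == 0 else 1.0 - bin(x & y).count("1") / union

def screen(query: int, targets: dict, threshold: float) -> list:
    """Return (name, dissimilarity) of targets within `threshold`, best first."""
    hits = []
    for name, fp in targets.items():
        d = tanimoto_dissimilarity(query, fp)
        if d <= threshold:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])

query = int("0101010100010100", 2)
targets = {
    "t1": int("0101010100010100", 2),  # identical to the query
    "t2": int("0101010100010000", 2),  # one query bit missing
    "t3": int("1010101011101011", 2),  # no bits in common with the query
}
print(screen(query, targets, threshold=0.5))
```

Each target is scored independently, which is what makes a throughput figure in structures per second meaningful for this step.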
Multiple query structures
[Figure: the fingerprints of several queries are combined into one hypothesis fingerprint, for example
0101110100110101010111111000010000011111100010000100001000101000
which is compared by a metric with the target fingerprints of the targets; the matching targets are the hits.]
Hypothesis fingerprints
Advantages
• allows faster operation
• captures the features common to the individual actives
• reduces noise
Hypothesis types
Active 1   0     2     7     1     0     1     6     4     0     0     9     0
Active 2   1     6     0     4     3     3     1     2     2     0     5     1
Active 3   2     4     4     1     0     2     5     3     4     3     4     5
Minimum    0     2     0     1     0     1     1     2     0     0     4     0
Average    1     4     3.67  2     1     2     4     3     2     1.33  6     2
Median     1.5   4     5.5   1     0     2     5     3     3     0     5     3
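The three hypothesis types can be recomputed from the Active 1 to 3 rows. In this sketch, minimum and average are the ordinary column-wise operations. The median rule is inferred from the Median row and is an assumption, not documented behavior: the median is taken over the non-zero counts, and the result is 0 when the feature is absent from more than half of the actives. With the counts as transcribed, the tenth average evaluates to 1 rather than the 1.33 shown in the table.

```python
from statistics import median

actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],
]

def minimum_hypothesis(rows):
    return [min(col) for col in zip(*rows)]

def average_hypothesis(rows):
    return [round(sum(col) / len(col), 2) for col in zip(*rows)]

def median_hypothesis(rows):
    """Median of the non-zero counts; 0 if most actives lack the feature.

    The rule is inferred from the table, not taken from documentation.
    """
    result = []
    for col in zip(*rows):
        present = [v for v in col if v != 0]
        if len(present) * 2 < len(col):
            result.append(0)
        else:
            result.append(median(present))
    return result

print(minimum_hypothesis(actives))
print(average_hypothesis(actives))
print(median_hypothesis(actives))
```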
Hypothesis fingerprints
Minimum
Advantages
• strict conditions for hits if actives are fairly similar
Disadvantages
• false results with asymmetric metrics
• misses common features of highly diverse sets
• very sensitive to one missing feature
Average
Advantages
• captures common features of more diverse active sets
Disadvantages
• less selective if actives are very similar
Median
Advantages
• captures common features of more diverse active sets
• specific treatment of the absence of a feature
• less sensitive to outliers
Disadvantages
• less selective if actives are very similar
The need for optimization
Too many hits
The need for optimization
Inconsistent dissimilarity values
[Figure: example structures with dissimilarity values 0.57, 0.47 and 0.55]
Parametrized metrics
scaled, asymmetric Tanimoto
$$D_{\mathrm{Tanimoto}}(x, y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha\left(\sum_i x_i - \sum_i s_i \min(x_i, y_i)\right) + (1 - \alpha)\left(\sum_i y_i - \sum_i s_i \min(x_i, y_i)\right) + \sum_i s_i \min(x_i, y_i)}$$
$\alpha \in [0, 1]$: asymmetry factor
$s_i \in \mathbb{N}$: scaling factor
weighted, asymmetric Euclidean
$$D_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{x_i > y_i} \alpha\, w_i (x_i - y_i)^2 + \sum_{x_i \le y_i} (1 - \alpha)\, w_i (x_i - y_i)^2}$$
$\alpha \in [0, 1]$: asymmetry factor
$w_i \in [0, 1]$: weights
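A sketch of the two parametrized metrics, written for plain Python lists. It follows one reading of the slide's formulas: in the Tanimoto form the scaling factors s_i multiply the min terms and the asymmetry factor alpha weights the part of x not shared with y; in the Euclidean form alpha is applied to components with x_i > y_i and 1 - alpha to the rest. The direction of that comparison is an assumption, and the function names are illustrative.

```python
import math

def scaled_asymmetric_tanimoto(x, y, s, alpha):
    """1 - M / (alpha*(sum(x) - M) + (1 - alpha)*(sum(y) - M) + M),
    where M = sum_i s_i * min(x_i, y_i)."""
    m = sum(si * min(xi, yi) for si, xi, yi in zip(s, x, y))
    denom = alpha * (sum(x) - m) + (1 - alpha) * (sum(y) - m) + m
    return 0.0 if denom == 0 else 1.0 - m / denom

def weighted_asymmetric_euclidean(x, y, w, alpha):
    """Weighted squared differences, with alpha for x_i > y_i and
    1 - alpha otherwise (assumed direction)."""
    total = 0.0
    for wi, xi, yi in zip(w, x, y):
        factor = alpha if xi > yi else (1 - alpha)
        total += factor * wi * (xi - yi) ** 2
    return math.sqrt(total)

x = [0, 2, 7, 1]
y = [1, 6, 0, 4]
ones = [1, 1, 1, 1]

# With alpha = 1 only features of x missing from y are penalized,
# so swapping the arguments changes the result.
print(scaled_asymmetric_tanimoto(x, y, ones, alpha=1.0))
print(scaled_asymmetric_tanimoto(y, x, ones, alpha=1.0))
print(weighted_asymmetric_euclidean(x, y, ones, alpha=1.0))
```

The parameters alpha, s_i and w_i are the quantities an optimizer would tune against a set of known actives.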
Optimization of metrics
Step 1 optimize parameters for maximum enrichment
Step 2 validate metrics over an independent test set
[Figure: the selected targets and the known actives are partitioned into training, query and test sets]
Optimization of metrics
Step 1 optimize parameters for maximum enrichment
[Figure: the query set gives a query fingerprint (for example 1111100010000100001000101000), which the parametrized metric compares with the training set to give target hits and active hits]
Optimization of metrics
[Figure: variables v1, v2, v3, …, vi, …, vn. Legend: potential variable value, temporarily fixed value, final value, running variable value.]
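The legend (running, temporarily fixed and final variable values) suggests a search that varies one variable at a time. The sketch below shows one possible reading as a coordinate-wise grid search; it is an interpretation, not a description of ChemAxon's optimizer. The `toy` objective stands in for the enrichment that would be computed on the training set, and all names are hypothetical.

```python
def coordinate_search(objective, candidates, start, sweeps=2):
    """Maximize `objective` by varying one variable at a time.

    candidates[i] lists the potential values of variable i.
    """
    current = list(start)
    best = objective(current)
    for _ in range(sweeps):
        for i, values in enumerate(candidates):
            for v in values:  # running variable value
                trial = current[:i] + [v] + current[i + 1:]
                score = objective(trial)
                if score > best:
                    # keep the better value fixed while other variables run
                    best, current = score, trial
    return current, best

def toy(v):
    """Stand-in objective with its maximum at (0.3, 2, 0.7)."""
    return -((v[0] - 0.3) ** 2 + (v[1] - 2) ** 2 + (v[2] - 0.7) ** 2)

grid = [[0.1, 0.3, 0.5], [1, 2, 3], [0.5, 0.7, 0.9]]
print(coordinate_search(toy, grid, start=[0.1, 1, 0.5]))
```

A search of this kind evaluates the objective once per candidate value and sweep, so its cost grows linearly with the number of variables.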
Optimization of metrics
Step 2 validate metrics over an independent test set
[Figure: the query set gives a query fingerprint (for example 1111100010000100001000101000), which the optimized metric compares with the test set to give target hits and active hits]
Results of Optimization
1. Similar structures get closer
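[Figure: dissimilarity values of the example structures. Before optimization: 0.57, 0.47, 0.55. After optimization: 0.20, 0.28, 0.06.]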
Results of Optimization
2. Hit set size reduced
Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures
Metric                                       Enrichment   Test Hits   Target Hits
Tanimoto
  Basic                                      70.47        5.43        172.00
  Scaled                                     7.63         6.00        1101.71
  Asymmetric                                 99.36        5.29        106.00
  Scaled Asymmetric                          11.94        5.86        731.14
Euclidean
  Basic                                      5.59         5.43        1465.57
  Normalized                                 11.33        5.14        791.29
  Asymmetric Normalized                      18.58        4.71        368.71
  Weighted Normalized                        296.30       4.14        27.57
  Weighted Asymmetric Normalized             281.30       3.43        17.00
Results of Optimization
3. Higher enrichment
Active set    size   Euclidean   Optimized   Improvement
5-HT3         12     12.55       239.24      49.26
ACE           89     1.42        6.50        4.64
Angiotensin   10     27.81       85.45       11.15
Beta2         50     1.52        24.70       17.42
D2            13     27.64       123.25      11.19
Delta         20     11.66       243.57      69.11
FTP           35     46.88       71.54       5.35
mGluR1        18     5.59        296.30      70.93
NPY-5         139    3.05        12.75       3.25
Thrombin      8      2.56        7.68        2.62
Results of Optimization
4. Top ranked structures are spikes
• offers a more intuitive way to evaluate the efficiency of screening
• based on sorting random set hits and known actives on dissimilarity
values and counting the number of random set hits preceding each active
in the sorted list
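The counting described above can be sketched directly: merge the dissimilarity values of the known actives (spikes) and of the random set, sort them, and record for each active how many random structures rank ahead of it. The scores below are toy numbers, and ties are resolved pessimistically (a random structure with the same value counts as preceding the active), which is an assumption.

```python
def random_hits_before_each_active(active_scores, random_scores):
    """For each active, in rank order, count the random-set structures
    that precede it in the list sorted by dissimilarity."""
    ranked = sorted(
        [(d, True) for d in active_scores]
        + [(d, False) for d in random_scores]
    )  # (d, False) sorts before (d, True), so ties favour the random set
    counts, seen_random = [], 0
    for _, is_active in ranked:
        if is_active:
            counts.append(seen_random)
        else:
            seen_random += 1
    return counts

actives = [0.10, 0.25, 0.60]
randoms = [0.15, 0.20, 0.30, 0.90]
print(random_hits_before_each_active(actives, randoms))  # [0, 2, 3]
```

In the ideal case every active ranks before every random structure and all counts are zero.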
[Figure: number of virtual hits against number of spikes retrieved, with the values 0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041 and 0.043 marked]
Results
ACE (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of spikes retrieved (1 to 16). Series: Euclidean, Optimized Euclidean.]
3D flexible search
Expected top performance 200 structures/s