Knowledge-based_approach_for_reaction_generation_CXNEUGM09

A knowledge-based approach for reaction generation
Development, validation and applications
Dimitar Hristozov, 04.06.2009
Motivation
[Diagram: sources of reaction data]
- public and commercial reaction databases (public data): >1,500,000 reactions covering general organic chemistry
- proprietary reaction databases (medicinal chemists, lab notebooks (eLN)): large number of reactions per year, strong medicinal chemistry bias
=> wealth of reaction data

- extract some of the knowledge hidden in these data
- use this knowledge to assist the medicinal chemist
- suggest new, synthetically feasible molecules with desired bio profile
Reaction vectors
From reaction database to knowledge base
[Reaction scheme: R1 + R2 -> P]

No.  Bond   R = (R1 + R2)   P   D = P - R
1    C-C    4               4    0
2    C=O    1               1    0
3    C-OH   2               0   -2
4    C-OR   0               2    2

R: reactant vector; P: product vector; D: reaction vector
Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
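The subtraction on this slide can be sketched in a few lines of Python. The bond counts are the ones given on the slide; the dictionary layout and the function name `reaction_vector` are illustrative assumptions, not the authors' implementation.

```python
# Bond-type counts taken from the slide.
BOND_TYPES = ["C-C", "C=O", "C-OH", "C-OR"]

reactants = {"C-C": 4, "C=O": 1, "C-OH": 2, "C-OR": 0}  # R = R1 + R2
product = {"C-C": 4, "C=O": 1, "C-OH": 0, "C-OR": 2}    # P


def reaction_vector(product_vec, reactant_vec):
    """Return D = P - R, component by component (hypothetical helper)."""
    keys = sorted(set(product_vec) | set(reactant_vec))
    return {k: product_vec.get(k, 0) - reactant_vec.get(k, 0) for k in keys}


D = reaction_vector(product, reactants)
print([D[bond] for bond in BOND_TYPES])  # [0, 0, -2, 2]
```

Only the bond types that change carry a non-zero entry, which is what lets one vector stand for the same transformation in many different molecules.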
From reaction vector to products (I)
- The reaction vector, D, equals the difference between the product vector, P, and the reactant vector, R: D = P - R
- Given a reaction vector, D, and a reactant vector, R, the product vector, P, can be obtained: P = D + R
- Given a product vector, P, can we reconstruct the product molecule(s)?
[Figure: product vector P and candidate product structures]

No.  Bond   #
1    C-C    4
2    C=O    1
3    C-OH   0
4    C-OR   2

=> a better descriptor is required
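One way to see why plain bond counts are not enough: two different esters can share the same product vector. The sketch below counts the four bond types for ethyl butanoate and propyl propanoate. The choice of these two molecules, the graph encoding and the helper `bond_vector` are my own assumptions for illustration; the slide does not name the structures it shows.

```python
from collections import Counter


def bond_vector(symbols, bonds):
    """Count the four bond types used on the slides (hypothetical helper).

    symbols: heavy-atom symbols, indexed by position
    bonds:   (atom_a, atom_b, bond_order) tuples between heavy atoms
    """
    degree = Counter()
    for a, b, _ in bonds:
        degree[a] += 1
        degree[b] += 1

    vec = Counter({"C-C": 0, "C=O": 0, "C-OH": 0, "C-OR": 0})
    for a, b, order in bonds:
        pair = {symbols[a], symbols[b]}
        if pair == {"C"} and order == 1:
            vec["C-C"] += 1
        elif pair == {"C", "O"} and order == 2:
            vec["C=O"] += 1
        elif pair == {"C", "O"}:
            oxygen = a if symbols[a] == "O" else b
            # Singly bonded O with one heavy neighbour is OH, with two it is OR.
            vec["C-OH" if degree[oxygen] == 1 else "C-OR"] += 1
    return dict(vec)


# Ethyl butanoate, CCCC(=O)OCC
ethyl_butanoate = bond_vector(
    ["C", "C", "C", "C", "O", "O", "C", "C"],
    [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2), (3, 5, 1), (5, 6, 1), (6, 7, 1)],
)
# Propyl propanoate, CCC(=O)OCCC
propyl_propanoate = bond_vector(
    ["C", "C", "C", "O", "O", "C", "C", "C"],
    [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1)],
)
print(ethyl_butanoate == propyl_propanoate)  # True
```

Both molecules give C-C 4, C=O 1, C-OH 0, C-OR 2, so the product vector alone cannot tell them apart.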
Extended atom pairs
[Figure: example molecule with heavy atoms numbered 1-8]

atom types
No.  Symbol  n  p  r  Type
4    C       3  1  0  C(3,1,0)
5    O       2  0  0  O(2,0,0)
7    C       2  0  0  C(2,0,0)

atom pairs
Atom Pair                 Atoms
C(3,1,0)-2(1)-O(2,0,0)    4-5
C(2,0,0)-2(1)-O(2,0,0)    7-5
C(2,0,0)-3-C(3,1,0)       2-4; 7-4

AP2: atoms 1 bond away
AP3: atoms 2 bonds away
n: number of bonds to heavy atoms
p: number of π bonds
r: number of ring memberships
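The atom typing and pair enumeration described on this slide can be sketched as follows. The example molecule is assumed to be ethyl butanoate, numbered so that atoms 2, 4, 5 and 7 match the table; the slide does not name the molecule. The graph encoding, the helper names and the sorted order of the two atom types inside a label are assumptions of this sketch, so a label such as O(1,1,0)-3-O(2,0,0) may appear with its two halves swapped relative to the slides.

```python
from collections import Counter
from itertools import combinations

# Heavy atoms and bond orders of ethyl butanoate, CCCC(=O)OCC (assumed example).
ATOMS = {1: "C", 2: "C", 3: "C", 4: "C", 5: "O", 6: "O", 7: "C", 8: "C"}
BONDS = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (4, 6): 2, (5, 7): 1, (7, 8): 1}


def neighbours(atom):
    """Map each heavy-atom neighbour of `atom` to the connecting bond order."""
    found = {}
    for (a, b), order in BONDS.items():
        if a == atom:
            found[b] = order
        elif b == atom:
            found[a] = order
    return found


def atom_type(atom):
    """Return Symbol(n,p,r); r is fixed to 0 because the example is acyclic."""
    nbrs = neighbours(atom)
    n = len(nbrs)                                   # bonds to heavy atoms
    p = sum(order - 1 for order in nbrs.values())   # pi bonds
    return f"{ATOMS[atom]}({n},{p},0)"


def label(a, b, middle):
    """Join two atom types in sorted order (the ordering is an assumption)."""
    first, second = sorted((atom_type(a), atom_type(b)))
    return f"{first}-{middle}-{second}"


def atom_pairs():
    """Count AP2 (bonded atoms) and AP3 (atoms two bonds apart)."""
    counts = Counter()
    for (a, b), order in BONDS.items():
        counts[label(a, b, f"2({order})")] += 1
    seen = set()
    for centre in ATOMS:
        for a, b in combinations(sorted(neighbours(centre)), 2):
            bonded = (a, b) in BONDS or (b, a) in BONDS
            if not bonded and (a, b) not in seen:
                seen.add((a, b))
                counts[label(a, b, "3")] += 1
    return counts


for pair, count in sorted(atom_pairs().items()):
    print(count, pair)
```

For this molecule the sketch yields 7 AP2 and 7 AP3 entries in total. A real implementation would also need ring perception to fill in r.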
From reaction vector to products (II)
product vector (P = D + R)

Atom Pair                 Count
C(1,0,0)-2(1)-C(2,0,0)    2
C(2,0,0)-2(1)-C(2,0,0)    1
C(2,0,0)-2(1)-C(3,1,0)    1
C(3,1,0)-2(1)-O(2,0,0)    1
C(3,1,0)-2(2)-O(1,1,0)    1
C(2,0,0)-2(1)-O(2,0,0)    1
C(1,0,0)-3-C(2,0,0)       1
C(2,0,0)-3-C(3,1,0)       2
C(2,0,0)-3-O(2,0,0)       1
C(2,0,0)-3-O(1,1,0)       1
O(2,0,0)-3-O(1,1,0)       1
C(1,0,0)-3-O(2,0,0)       1

[Figure: reaction scheme and candidate structures annotated with “wrong” or “missing” atom pairs: C(2,1,0)-2(2)-O(1,1,0), C(3,1,0)-2(1)-O(2,0,0), C(3,0,0)-2(1)-O(2,0,0), C(3,1,0)-2(1)-O(2,0,0)]
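The idea of “wrong” and “missing” atom pairs can be illustrated by comparing a candidate structure's atom-pair counts with the target product vector. This is only a sketch: the slides do not say how the comparison is carried out, `compare_to_target` is a hypothetical helper, and the candidate below is made up for the example.

```python
from collections import Counter


def compare_to_target(candidate, target):
    """Split the mismatch between two atom-pair vectors into two parts."""
    missing = Counter(target) - Counter(candidate)  # required but absent
    wrong = Counter(candidate) - Counter(target)    # present but not required
    return dict(missing), dict(wrong)


# Ester part of the product vector from the slide.
target = {
    "C(3,1,0)-2(1)-O(2,0,0)": 1,
    "C(3,1,0)-2(2)-O(1,1,0)": 1,
    "C(2,0,0)-2(1)-O(2,0,0)": 1,
}
# Hypothetical candidate whose carbonyl carbon has too few heavy-atom neighbours.
candidate = {
    "C(2,1,0)-2(2)-O(1,1,0)": 1,
    "C(2,0,0)-2(1)-O(2,0,0)": 1,
}

missing, wrong = compare_to_target(candidate, target)
print("missing:", missing)
print("wrong:", wrong)
```

A candidate matches the product vector only when both dictionaries come back empty.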
Reaction vectors in action
[Figure: example reaction with numbered atoms]

Reaction Vector
APs “Lost”                       APs “Gained”
C(2,0,0)-2(1)-O(1,0,0)   -1     C(2,1,0)-2(1)-C(2,0,0)   +1
C(2,0,0)-2(1)-C(2,0,0)   -2     C(2,1,0)-2(2)-C(1,1,0)   +1

[Figure sequence: Starting Molecule -> atoms/bonds selected for removal using APs lost -> new atoms/bonds added using APs gained -> Product]
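At the level of atom-pair counts, applying the reaction vector amounts to subtracting the APs lost and adding the APs gained. The sketch below uses the AP2 changes listed on this slide and applies them to the AP2 counts of butan-1-ol, which is my own choice of starting molecule. The applicability test (every lost atom pair must be present often enough) is an assumption, and the sketch stops at the product vector: it does not rebuild the product structure, which is the harder step shown in the figure.

```python
from collections import Counter

# AP2 changes of the reaction vector shown on the slide.
REACTION_VECTOR = {
    "C(2,0,0)-2(1)-O(1,0,0)": -1,
    "C(2,0,0)-2(1)-C(2,0,0)": -2,
    "C(2,1,0)-2(1)-C(2,0,0)": +1,
    "C(2,1,0)-2(2)-C(1,1,0)": +1,
}


def can_apply(molecule, vector):
    """Assumed criterion: no atom-pair count may drop below zero."""
    return all(molecule.get(pair, 0) + change >= 0 for pair, change in vector.items())


def apply_vector(molecule, vector):
    """Return the product's atom-pair counts, P = D + R."""
    if not can_apply(molecule, vector):
        raise ValueError("reaction vector cannot be applied to this molecule")
    product = Counter(molecule)
    for pair, change in vector.items():
        product[pair] += change
    return {pair: n for pair, n in product.items() if n > 0}


# AP2 counts of butan-1-ol, CCCCO (assumed starting molecule).
butan_1_ol = {
    "C(1,0,0)-2(1)-C(2,0,0)": 1,
    "C(2,0,0)-2(1)-C(2,0,0)": 2,
    "C(2,0,0)-2(1)-O(1,0,0)": 1,
}
# AP2 counts of ethanol, CCO: too few C-C pairs for this vector.
ethanol = {"C(1,0,0)-2(1)-C(2,0,0)": 1, "C(2,0,0)-2(1)-O(1,0,0)": 1}

print(apply_vector(butan_1_ol, REACTION_VECTOR))
print(can_apply(ethanol, REACTION_VECTOR))  # False
```

The resulting counts are those of but-1-ene, the dehydration product of butan-1-ol.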
Advantages
- Does not require manual atom-atom mapping of the reaction centre
- Makes use of the synthetic chemistry data collected through the years
- Accounts for the synthetic accessibility of the proposed molecules: all transformations are derived from successful reactions
- Is fast to apply: no substructure searching is required

Good approach… so how is it implemented?
Optimisation made easy
- built as an Eclipse plug-in => 100% Java
KNIME meets ChemAxon
[Figure: KNIME workflow nodes: Sketcher, File reader, Reaction generator, Convertor, Multi-objective ranking, File writer, Marvin Views]

Looks great… but does it work?
Reproducing reactions
1. create knowledge base: 5,695 diverse reactions -> 2,902 reaction vectors
2. for each reaction
3. retrieve its reaction vector
4. apply the reaction vector to the starting materials
5. is the product obtained in less than 30 seconds?

[Figure: example reaction (-H2O) with its reaction vector]
APs Lost                         APs Gained
C(2,0,0)-2(1)-O(1,0,0)   -1     C(2,1,0)-2(1)-C(2,0,0)   +1
C(2,0,0)-2(1)-C(2,0,0)   -2     C(2,1,0)-2(2)-C(1,1,0)   +1
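The validation protocol on this slide can be sketched as a small test harness. Everything here is a stand-in: `generate_products` is a hypothetical callable that plays the role of the reaction generator, the string "molecules" in the demo are toys, and the 30-second limit is checked after the call returns instead of interrupting it.

```python
import time


def reproduce(reactions, generate_products, time_limit=30.0):
    """Return the fraction of reactions whose recorded product is regenerated.

    reactions:         iterable of (starting_materials, reaction_vector, expected_product)
    generate_products: hypothetical callable(starting_materials, reaction_vector)
                       returning an iterable of products
    """
    total = hits = 0
    for starting_materials, vector, expected in reactions:
        total += 1
        start = time.perf_counter()
        products = set(generate_products(starting_materials, vector))
        elapsed = time.perf_counter() - start
        # Late answers count as failures; the call itself is not interrupted.
        if expected in products and elapsed < time_limit:
            hits += 1
    return hits / total if total else 0.0


def toy_generate(starting_materials, vector):
    """Toy generator: 'applies' a vector by string concatenation."""
    return {starting_materials + vector}


toy_reactions = [("A", "B", "AB"), ("C", "D", "XX")]
success_rate = reproduce(toy_reactions, toy_generate)
print(success_rate)  # 0.5
```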
How well did it work?
Products generated for ~90% of the 5,695 reactions
[Bar chart "Reproducibility": per cent of reactions with product(s) generated vs. no product generated]
How fast did it work?
Median run time: 0.015 seconds per reaction
[Histogram "Execution Times": per cent of reactions against time / s, bins from 0.05 to > 30]
Epoxide reduction
[Reaction scheme: epoxide reduction]
- reproduced in a large variety of environments (350 reactions)
- only one reaction was not reproduced
Works like a charm…
More than 95% reproduced successfully
[Figure: example reaction scheme for each reaction type]
- epoxide reduction
- ester to amide
- epoxide formation
- alcohol dehydration
- acid to aldehyde
- nitrile to aldehyde
- nitrile hydrolysis
- alcohol amination
- Friedel-Crafts acylation
- nitro reduction
- aldol condensation
- alkene oxidation
Still works like a charm…
More than 90% reproduced successfully
[Figure: example reaction scheme for each reaction type]
- olefin metathesis
- ether halogenation
- amide reduction
- ozonolysis
- Beckmann rearrangement
- Claisen rearrangement
- alkene halogenation
- Dieckmann condensation
- Wittig-Horner olefination
- Robinson annulation

Claisen condensation
[Reaction scheme: Claisen condensation]
- a variety of environments were tested
- 79 out of 100 reactions were successfully reproduced
- 21% of the reactions were not reproduced
  - mainly condensations (intra- and intermolecular) which result in ring closures
Still works
More than 50% reproduced successfully
[Figure: example reaction scheme for each reaction type]
- Cope rearrangement (67% success)
- hetero Diels-Alder (73% success)
- Diels-Alder cycloaddition (49% success)
- Fischer indole synthesis (57% success)
- Claisen condensation (79% success)

- A large variety of reactions successfully reproduced
- Small difficulties with complex cycle formations
  - improvements are on their way

Wow! Cool! It works! But what is its use?
Generating new molecules
[Flowchart]
1. Start from a starting molecule.
2. Select a reaction transform from the knowledge base.
3. Is a second reagent required? If yes, select a suitable reagent from the reagents database.
4. Can the transform be applied? If no, discard the reaction vector.
5. If yes, apply the reaction transform to obtain a new molecule.
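The flowchart can be sketched as a loop over the knowledge base. This is a toy under several assumptions: molecules are represented only by bond-type counts (as on the first reaction-vector slide), each vector carries a `second_reagent` flag that I introduce for illustration, "suitable reagent" is taken to mean any reagent for which the transform is applicable, and all function names are hypothetical.

```python
from collections import Counter


def combined(molecules):
    """Sum the count vectors of one or two molecules."""
    pool = Counter()
    for molecule in molecules:
        pool.update(molecule)
    return pool


def can_apply(molecules, vector):
    """Assumed criterion: no count may drop below zero."""
    pool = combined(molecules)
    return all(pool[key] + change >= 0 for key, change in vector["changes"].items())


def apply_transform(molecules, vector):
    """Return the count vector of the new molecule."""
    pool = combined(molecules)
    for key, change in vector["changes"].items():
        pool[key] += change
    return {key: n for key, n in pool.items() if n > 0}


def generate(starting_molecule, knowledge_base, reagents):
    """Walk the flowchart once for every reaction vector in the knowledge base."""
    new_molecules = []
    for vector in knowledge_base:
        if vector["second_reagent"]:
            inputs = [(starting_molecule, r) for r in reagents
                      if can_apply((starting_molecule, r), vector)]
        elif can_apply((starting_molecule,), vector):
            inputs = [(starting_molecule,)]
        else:
            inputs = []  # discard the reaction vector
        new_molecules.extend(apply_transform(m, vector) for m in inputs)
    return new_molecules


knowledge_base = [
    # D from the first reaction-vector slide; the flag is an assumption.
    {"changes": {"C-OH": -2, "C-OR": +2}, "second_reagent": True},
    # Made-up vector that cannot be applied to the starting molecule.
    {"changes": {"C-N": -1, "C-OH": +1}, "second_reagent": False},
]
butanoic_acid = {"C-C": 3, "C=O": 1, "C-OH": 1}
reagents = [{"C-C": 1, "C-OH": 1}, {"C-C": 1}]  # ethanol, ethane

result = generate(butanoic_acid, knowledge_base, reagents)
print(result)  # [{'C-C': 4, 'C=O': 1, 'C-OR': 2}]
```

Only the combination with ethanol survives, and its count vector equals the product vector P from the first reaction-vector slide.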
Multi-objective de novo design
[Figure: proposed new molecules]
- rank the proposed new molecules
- direct the generation towards desired new molecules
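The slides do not say how the multi-objective ranking is computed. One common choice, shown here purely as an assumption, is Pareto ranking: molecules that no other molecule beats on every objective get rank 1, the next layer gets rank 2, and so on. The molecule names and the two scores per molecule are hypothetical, and both objectives are treated as "higher is better".

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_rank(scores):
    """Assign rank 1 to non-dominated items, rank 2 to the next front, and so on."""
    remaining = dict(scores)
    ranks = {}
    rank = 1
    while remaining:
        front = [
            name for name, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != name)
        ]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks


# Hypothetical scores for four proposed molecules on two objectives.
scores = {
    "mol_a": (0.9, 0.2),
    "mol_b": (0.5, 0.8),
    "mol_c": (0.4, 0.1),
    "mol_d": (0.3, 0.05),
}
ranks = pareto_rank(scores)
print(ranks)  # {'mol_a': 1, 'mol_b': 1, 'mol_c': 2, 'mol_d': 3}
```

Keeping only the low-rank molecules as starting points for the next round is one way such a ranking could direct the generation.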
Use case one: Lead optimisation
Here is my starting material. What kind of (feasible) one-step transformations may I make?
- starting molecule: Penicillin G
[Structure: Penicillin G]
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Lead optimisation (cntd.)
[Figure: Penicillin G and the molecules generated from it]
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Use case two: Synthetic route
I have this (active) fragment. Is there a route from it to the molecule I have in mind?
- reproducing a known synthetic route: Plavix
[Reaction scheme: four-step synthetic route, steps 1-4]

Step  No. applicable reaction vectors  Total no. products generated
1     17                               158
2     11                               123
3     12                               124
4     41                               386

Synthetic route from Wang, L. et al., Synthetic Improvements in the Preparation of Clopidogrel, Org. Process Res. Dev., 2007, 11 (3), 487-489
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Use case three: Library design
With which of these reagents will my starting material undergo reaction X?
- enumerate a library using a single reaction and a number of different reagents
- reaction X (X = Suzuki coupling)
- 628 boronic acids as reagents
[Reaction scheme: starting material + boronic acid reagents]
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
Library design (cntd.)
292 products generated
[Figure: examples of the generated products]
Summary
- Reaction vectors offer a good way to explore the knowledge hidden inside reaction databases
- A variety of chemical reactions can be reproduced with this approach
- The method works fast
- The method is applicable in different medicinal chemistry related scenarios
- The use of the method is made easy by a variety of KNIME nodes which have been implemented
Acknowledgements
- Michael Bodkin
  - for his continuous support both in and outside my daily work
- Hina Patel
  - for creating the first prototype which brought the reaction vectors to life (http://pubs.acs.org/doi/abs/10.1021/ci800413m)
- Dave Evans, Fred Ludlow, Swanand Gore, Dave Thorner, Maria Whatton, Juliette Pradon
  - for many stimulating discussions and for their continuous support
Thank You!
Do you have any questions, comments, recommendations?