A knowledge-based approach for reaction generation: development, validation and applications
Dimitar Hristozov, 04.06.2009

Motivation
- Sources of reaction data: electronic lab notebooks (eLN) of medicinal chemists, public reaction databases, commercial reaction databases and proprietary reaction databases
- Public and commercial databases: >1,500,000 reactions covering general organic chemistry
- Proprietary databases: a large number of reactions per year, with a strong medicinal chemistry bias
- A wealth of reaction data: extract some of the knowledge hidden in these data and use it to assist the medicinal chemist by suggesting new, synthetically feasible molecules with the desired biological profile

Reaction vectors: from reaction database to knowledge base
[Figure: esterification of a carboxylic acid (R1) with an alcohol (R2) giving an ester (P)]

Bond   Reactant vector R = R1 + R2   Product vector P   Reaction vector D = P - R
C-C    4                             4                  0
C=O    1                             1                  0
C-OH   2                             0                  -2
C-OR   0                             2                  2

Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article

From reaction vector to products (I)
- The reaction vector D equals the difference between the product vector P and the reactant vector R: D = P - R
- Given a reaction vector D and a reactant vector R, the product vector P can be obtained: P = D + R
- Given a product vector P, can we reconstruct the product molecule(s)?
[Figure: several different structures consistent with the product vector C-C 4, C=O 1, C-OH 0, C-OR 2]
- Bond counts do not determine the structure uniquely: a better descriptor is required

Extended atom pairs
[Figure: example ester with heavy atoms numbered 1-8]

Atom types
No.  Symbol  n  p  r  Type
4    C       3  1  0  C(3,1,0)
5    O       2  0  0  O(2,0,0)
7    C       2  0  0  C(2,0,0)

Atom pairs
Atom pair                 Atoms
C(3,1,0)-2(1)-O(2,0,0)    4-5
C(2,0,0)-2(1)-O(2,0,0)    7-5
C(2,0,0)-3-C(3,1,0)       2-4; 7-4

- AP2: atoms 1 bond away. The number in parentheses is the bond order, e.g. 2(1) for a single bond and 2(2) for a double bond
- AP3: atoms 2 bonds away
- n: number of bonds to heavy atoms; p: number of π bonds; r: number of ring memberships

From reaction vector to products (II): "wrong" or "missing" atom pairs
Product vector (P = D + R):

Atom pair                 Count
C(1,0,0)-2(1)-C(2,0,0)    2
C(2,0,0)-2(1)-C(2,0,0)    1
C(2,0,0)-2(1)-C(3,1,0)    1
C(3,1,0)-2(1)-O(2,0,0)    1
C(3,1,0)-2(2)-O(1,1,0)    1
C(2,0,0)-2(1)-O(2,0,0)    1
C(1,0,0)-3-C(2,0,0)       1
C(2,0,0)-3-C(3,1,0)       2
C(2,0,0)-3-O(2,0,0)       1
C(2,0,0)-3-O(1,1,0)       1
O(2,0,0)-3-O(1,1,0)       1
C(1,0,0)-3-O(2,0,0)       1

[Figure: alternative candidate structures for the esterification product, each annotated with atom pairs that are wrong or missing with respect to the product vector, e.g. C(2,1,0)-2(2)-O(1,1,0), C(3,0,0)-2(1)-O(2,0,0) and C(3,1,0)-2(1)-O(2,0,0)]

Reaction vectors in action
[Figure: dehydration (-H2O) with atoms numbered 1-5: starting molecule, atoms/bonds selected for removal using the APs lost, new atoms/bonds added using the APs gained, product]

APs lost:
  C(2,0,0)-2(1)-O(1,0,0)   -1
  C(2,0,0)-2(1)-C(2,0,0)   -2
APs gained:
  C(2,1,0)-2(1)-C(2,0,0)   +1
  C(2,1,0)-2(2)-C(1,1,0)   +1

Advantages
- Does not require manual atom-atom mapping of the reaction centre
- Makes use of the synthetic chemistry data collected over the years
- Accounts for the synthetic accessibility of the proposed molecules: all transformations are derived from successful reactions
- Is fast to apply: no substructure searching is required

Good approach… so how is it implemented?

Optimisation made easy
- Built as an Eclipse plug-in: 100% Java
- KNIME meets ChemAxon: a workflow of Sketcher / File reader, Reaction generator, Converter, Multi-objective ranking and File writer nodes, with Marvin views for display

Looks great… but does it work?
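The bond-count arithmetic on the reaction-vector slides can be sketched in a few lines of Python. This is a minimal illustration that assumes a molecule is summarised only by the four bond-type counts in the esterification table. The helpers `add` and `subtract` are hypothetical and are not taken from the described Java/KNIME implementation.

```python
from typing import Dict

# Hypothetical representation: fragment label -> count (bond types here).
Vector = Dict[str, int]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum a + b over the union of fragment labels."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in sorted(set(a) | set(b))}


def subtract(a: Vector, b: Vector) -> Vector:
    """Element-wise difference a - b over the union of fragment labels."""
    return {k: a.get(k, 0) - b.get(k, 0) for k in sorted(set(a) | set(b))}


# Bond counts from the esterification table (R = R1 + R2).
R = {"C-C": 4, "C=O": 1, "C-OH": 2, "C-OR": 0}
P = {"C-C": 4, "C=O": 1, "C-OH": 0, "C-OR": 2}

D = subtract(P, R)      # reaction vector D = P - R
print(D)                # {'C-C': 0, 'C-OH': -2, 'C-OR': 2, 'C=O': 0}
print(add(D, R) == P)   # True: P = D + R recovers the product vector
```

These counts ignore connectivity, so different molecules can share one vector. Methyl acetate and ethyl formate, for example, both contain one C-C, one C=O and two C-O single bonds. That is the gap the extended atom pairs are meant to close.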
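The extended atom-pair typing can also be sketched in Python. All of the following are hypothetical assumptions, not the authors' code:

- A molecule is given as a list of heavy-atom symbols plus `(i, j, bond order)` tuples.
- Ring perception is omitted, so `r` comes from an optional caller-supplied map and defaults to 0.
- The two atom types in each pair are written in sorted order. This is a canonicalisation chosen only for illustration; the slides do not always order pairs this way.

The `compare` helper reports a candidate structure's "missing" and "wrong" atom pairs relative to a target vector.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Bond = Tuple[int, int, int]  # (atom index i, atom index j, bond order)


def atom_types(symbols: List[str], bonds: List[Bond],
               rings: Optional[Dict[int, int]] = None) -> List[str]:
    """Type each heavy atom as Symbol(n,p,r).

    n: bonds to heavy atoms, p: pi bonds (bond order - 1 per bond),
    r: ring memberships, read from `rings` (ring perception not implemented).
    """
    rings = rings or {}
    n = [0] * len(symbols)
    p = [0] * len(symbols)
    for i, j, order in bonds:
        for a in (i, j):
            n[a] += 1
            p[a] += order - 1
    return [f"{s}({n[a]},{p[a]},{rings.get(a, 0)})" for a, s in enumerate(symbols)]


def atom_pairs(symbols: List[str], bonds: List[Bond],
               rings: Optional[Dict[int, int]] = None) -> Counter:
    """Count AP2 pairs (1 bond apart, with bond order) and AP3 pairs (2 bonds apart)."""
    types = atom_types(symbols, bonds, rings)
    order = {frozenset((i, j)): o for i, j, o in bonds}
    nbrs: Dict[int, set] = {a: set() for a in range(len(symbols))}
    for i, j, _ in bonds:
        nbrs[i].add(j)
        nbrs[j].add(i)
    fp: Counter = Counter()
    for a, b in combinations(range(len(symbols)), 2):
        ta, tb = sorted((types[a], types[b]))  # illustrative canonical order
        bond = order.get(frozenset((a, b)))
        if bond is not None:                     # AP2: directly bonded
            fp[f"{ta}-2({bond})-{tb}"] += 1
        elif nbrs[a] & nbrs[b]:                  # AP3: not bonded, shared neighbour
            fp[f"{ta}-3-{tb}"] += 1
    return fp


def compare(candidate: Counter, target: Counter) -> Tuple[Counter, Counter]:
    """Return (missing, wrong) atom pairs of a candidate w.r.t. a target vector."""
    missing = target - candidate  # required by the target, absent in the candidate
    wrong = candidate - target    # present in the candidate, not in the target
    return missing, wrong


# Target: methyl acetate CH3-C(=O)-O-CH3; candidate: ethyl formate HC(=O)-O-CH2-CH3.
target = atom_pairs(["C", "C", "O", "O", "C"], [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)])
candidate = atom_pairs(["C", "O", "O", "C", "C"], [(0, 1, 2), (0, 2, 1), (2, 3, 1), (3, 4, 1)])
missing, wrong = compare(candidate, target)
print(sorted(missing))
print(sorted(wrong))
```

For these two esters, which share the same simple bond counts, the sketch finds six missing and five wrong atom pairs. The candidate's C(2,1,0)-2(2)-O(1,1,0) pair, one of the pairs named on the slide, is among the wrong ones.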
Reproducing reactions
1. Create the knowledge base: 5,695 diverse reactions, 2,902 reaction vectors
2. For each reaction:
3. retrieve its reaction vector (e.g. the APs lost and gained in the -H2O example)
4. apply the reaction vector to the starting materials
5. check whether the product is obtained in less than 30 seconds

How well did it work?
- Products were generated for ~90% of the 5,695 reactions
[Figure: reproducibility (per cent of reactions): product(s) generated vs no product generated]

How fast did it work?
- Median run time: 0.015 seconds per reaction
[Figure: histogram of execution times (per cent of reactions), in bins from 0.05 s to >30 s]

Epoxide reduction
- Reproduced in a large variety of environments (350 reactions); only one reaction was not reproduced
[Figure: epoxide reduction examples]

Works like a charm… more than 95% reproduced successfully
- Epoxide reduction, ester to amide, epoxide formation, alcohol dehydration, acid to aldehyde, nitrile to aldehyde, nitrile hydrolysis, alcohol amination, Friedel-Crafts acylation, nitro reduction, aldol condensation, alkene oxidation
[Figure: example reactions]

Still works like a charm… more than 90% reproduced successfully
- Olefin metathesis, ether halogenation, amide reduction, ozonolysis, Beckmann rearrangement, Claisen rearrangement, alkene halogenation, Dieckmann condensation, Wittig-Horner olefination, Robinson annulation
[Figure: example reactions]

Claisen condensation
- A variety of environments were tested: 79 out of 100 reactions were reproduced successfully
- The 21% not reproduced were mainly condensations (intra- and intermolecular) that result in ring closures

Still works… around 50% or more reproduced successfully
- Cope rearrangement (67% success)
- Hetero Diels-Alder (73% success)
- Diels-Alder cycloaddition (49% success)
- Fischer indole synthesis (57% success)
- Claisen condensation (79% success)
[Figure: example reactions]

- A large variety of reactions were reproduced successfully
- There are small difficulties with complex ring formations; improvements are on their way

Wow! Cool! It works! But what is its use?

Generating new molecules
1. Take a starting molecule and select a reaction transform from the knowledge base
2. Is a second reagent required? If yes, select a suitable reagent from the reagents database
3. Can the transform be applied? If no, discard the reaction vector
4. If yes, apply the reaction transform to obtain a new molecule

Multi-objective de novo design
[Figure: a starting molecule and the proposed new molecules]
- Rank the proposed new molecules
- Direct the generation towards the desired new molecules

Use case one: lead optimisation
"Here is my starting material. What kind of (feasible) one-step transformations can I make?"
- Starting molecule: penicillin G
[Figure: structure of penicillin G]
An example from Patel, H., Bodkin, M.J., Chen, B., Gillet, V.J. A Knowledge-Based Approach to De Novo Design Using Reaction Vectors, J. Chem. Inf. Model., 2009, ASAP article
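Returning to the validation experiment, the five steps of "Reproducing reactions" can be sketched as a small test harness. Everything here is hypothetical. `retrieve_vector` and `apply_vector` stand in for the knowledge-base lookup and the reaction generator. The toy data only exercises the bookkeeping (percentage reproduced and median run time), not any chemistry.

```python
import statistics
import time
from typing import Callable, Hashable, List, Sequence, Set, Tuple

# One validation case (hypothetical layout): (starting materials, recorded product, vector id).
Case = Tuple[Hashable, Hashable, str]


def reproduce(cases: Sequence[Case],
              retrieve_vector: Callable[[str], object],
              apply_vector: Callable[[object, Hashable], Set[Hashable]],
              time_limit: float = 30.0) -> Tuple[float, float]:
    """Return (% of cases reproduced, median run time in seconds).

    A case counts as reproduced when the recorded product is among the
    generated products and generation took less than `time_limit` seconds.
    The limit is only checked after the call returns; a real harness would
    also need to interrupt runs that hang.
    """
    reproduced = 0
    times: List[float] = []
    for starting_materials, product, vector_id in cases:
        vector = retrieve_vector(vector_id)                  # step 3
        start = time.perf_counter()
        products = apply_vector(vector, starting_materials)  # step 4
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        if elapsed < time_limit and product in products:     # step 5
            reproduced += 1
    return 100.0 * reproduced / len(cases), statistics.median(times)


# Toy stand-ins (hypothetical): "molecules" are strings and a vector appends text.
kb = {"rv1": "b"}
cases = [("a", "ab", "rv1"), ("a", "ax", "rv1")]
percent, median_s = reproduce(cases, kb.__getitem__, lambda v, m: {m + v})
print(percent)  # 50.0
```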
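The "Generating new molecules" flowchart can be sketched in atom-pair space. This is a hedged illustration, not the described implementation, and it rests on several assumptions:

- Molecules are reduced to AP count vectors (`collections.Counter`).
- A transform counts as applicable when no atom-pair count would become negative.
- Every reagent is tried, rather than a pre-selected "suitable" subset.
- Rebuilding a molecular graph from the product vector is omitted.

`ReactionVector`, `can_apply`, `apply` and `generate` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class ReactionVector:
    name: str
    delta: Counter        # atom pairs gained (+) and lost (-)
    needs_reagent: bool   # "Is a second reagent required?"


def can_apply(reactants: Counter, delta: Counter) -> bool:
    """Assumed test: every atom pair the transform removes must be present."""
    return all(reactants[ap] + change >= 0 for ap, change in delta.items())


def apply(reactants: Counter, delta: Counter) -> Counter:
    """Product vector P = R + D, dropping pairs whose count reaches zero."""
    product = Counter(reactants)
    product.update(delta)   # update() adds counts, including negative ones
    return +product         # unary plus keeps only positive counts


def generate(start: Counter, knowledge_base: List[ReactionVector],
             reagents: List[Counter]) -> Iterator[Tuple[str, Optional[int], Counter]]:
    """Yield (vector name, reagent index or None, product AP vector)."""
    for rv in knowledge_base:                       # select reaction transform
        partners = list(enumerate(reagents)) if rv.needs_reagent else [(None, Counter())]
        for idx, reagent in partners:               # try each reagent if one is needed
            reactants = start + reagent             # R = start (+ reagent)
            if not can_apply(reactants, rv.delta):
                continue                            # discard this combination
            yield rv.name, idx, apply(reactants, rv.delta)


# The four AP2 changes listed for the -H2O reaction vector on the slides.
dehydration = ReactionVector("dehydration", Counter({
    "C(2,0,0)-2(1)-O(1,0,0)": -1, "C(2,0,0)-2(1)-C(2,0,0)": -2,
    "C(2,1,0)-2(1)-C(2,0,0)": 1, "C(2,1,0)-2(2)-C(1,1,0)": 1}), False)
# Hypothetical transform that consumes a placeholder "X" pair supplied by a reagent.
toy_coupling = ReactionVector("toy_coupling", Counter({"X": -1}), True)

# Illustrative AP counts only; not derived from a real molecule.
start = Counter({"C(2,0,0)-2(1)-O(1,0,0)": 1, "C(2,0,0)-2(1)-C(2,0,0)": 2,
                 "C(1,0,0)-2(1)-C(2,0,0)": 1})
for name, idx, product in generate(start, [dehydration, toy_coupling],
                                   [Counter({"X": 1}), Counter({"Y": 1})]):
    print(name, idx, dict(product))
```

The dehydration vector applies to the toy starting vector on its own. The reagent-requiring toy transform applies only with the reagent that supplies the X pair, so the Y reagent is discarded.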
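The slides do not say which scheme the multi-objective ranking node uses. One common choice is Pareto (dominance) ranking, sketched here purely as an assumption. Each proposed molecule is scored on several objectives to be minimised, and its rank is the number of other molecules that dominate it. The objective values below are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank of each candidate = number of candidates that dominate it.
    A candidate never dominates itself; rank 0 marks the non-dominated set."""
    return [sum(dominates(other, s) for other in scores) for s in scores]


# Hypothetical objectives per proposed molecule, both minimised:
# (1 - predicted activity, molecular weight / 100).
scores = [(0.2, 3.0), (0.1, 3.5), (0.3, 4.0), (0.2, 2.8)]
print(pareto_ranks(scores))  # [1, 0, 3, 0]
```

Rank-0 molecules are the trade-off front: none can be improved in one objective without getting worse in another.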
[Figure: lead optimisation (cntd.): a selection of the new molecules generated from penicillin G by one-step transformations]
An example from Patel et al. (2009)

Use case two: synthetic route
"I have this (active) fragment. Is there a route from it to the molecule I have in mind?"
- Reproducing a known synthetic route: Plavix (clopidogrel)
[Figure: four-step route to clopidogrel, intermediates numbered 1-4]

Step   No. of applicable reaction vectors   Total no. of products generated
1      17                                   158
2      11                                   123
3      12                                   124
4      41                                   386

Synthetic route from Wang, L. et al., Synthetic Improvements in the Preparation of Clopidogrel, Org. Process Res. Dev., 2007, 11 (3), 487-489
An example from Patel et al. (2009)

Use case three: library design
"With which of these reagents will my starting material undergo reaction X?"
- Enumerate a library using a single reaction and a number of different reagents
- Starting material: [Figure: brominated starting material]
- Reaction X: Suzuki coupling
- Reagents: 628 boronic acids
An example from Patel et al. (2009)
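Use case three asks which reagents a starting material will react with under a single reaction X. Assume that molecules are reduced to atom-pair count vectors and that a transform applies only when no count would go negative. Under those assumptions, the question reduces to filtering the reagent list with one reaction vector. The function name and the pair labels below are hypothetical placeholders, and rebuilding the product structure is omitted.

```python
from collections import Counter
from typing import Dict


def enumerate_library(start: Counter, delta: Counter,
                      reagents: Dict[str, Counter]) -> Dict[str, Counter]:
    """Apply one reaction vector to the starting material with each reagent.

    Returns {reagent name: product AP vector} for the reagents whose
    combination with the starting material contains every atom pair the
    vector removes (no count may go negative).
    """
    library: Dict[str, Counter] = {}
    for name, reagent in reagents.items():
        reactants = start + reagent
        if any(reactants[ap] + change < 0 for ap, change in delta.items()):
            continue  # reaction X cannot be applied with this reagent
        product = Counter(reactants)
        product.update(delta)      # P = R + D
        library[name] = +product   # drop pairs whose count fell to zero
    return library


# Toy placeholder vector in the spirit of a cross-coupling: two placeholder
# pairs are consumed and one new placeholder pair is formed.
delta = Counter({"Ar-Br": -1, "Ar-B": -1, "Ar-Ar": 1})
start = Counter({"Ar-Br": 1, "Ar-N": 2})
reagents = {"reagent_1": Counter({"Ar-B": 1}), "reagent_2": Counter({"C-B": 1})}
library = enumerate_library(start, delta, reagents)
print(sorted(library))  # ['reagent_1']
```

The result maps each compatible reagent to its product vector, so the number of keys is the library size for that reaction.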
292 products generated
[Figure: example products of the Suzuki coupling library]

Summary
- Reaction vectors offer a good way to explore the knowledge hidden inside reaction databases
- A variety of chemical reactions can be reproduced with this approach
- The method is fast
- The method is applicable in different medicinal chemistry scenarios
- The method is easy to use thanks to the variety of KNIME nodes that have been implemented

Acknowledgements
- Michael Bodkin, for his continuous support both in and outside my daily work
- Hina Patel, for creating the first prototype, which brought reaction vectors to life (http://pubs.acs.org/doi/abs/10.1021/ci800413m)
- Dave Evans, Fred Ludlow, Swanand Gore, Dave Thorner, Maria Whatton, Juliette Pradon, for many stimulating discussions and for their continuous support

Thank you! Do you have any questions, comments or recommendations?