Use of Chemical Information in Organic Synthesis

Reaction Information for the Practicing Synthetic Chemist: The Search for Relevant Answers

AGENDA:
• Available information
• Introduction to reaction data searching
• Concepts and problems
• Basis of reaction classification
• DiscoveryGate
• Retrieving relevant information for the synthesis of new compounds
• Questions & Answers

Guenter Grethe
May, 2006
April 2006

Information Needs of Synthetic Organic Chemists in Basic Research and Development
• new preparation of intermediates and starting materials
• well established, high yield preparations (experimental procedures)
• new synthetic methodologies (new reagents, catalysts etc.)
• information on starting materials (availability, price, physical data etc.)
• physical properties of reagents, solvents and catalysts
• access to the primary, secondary, and tertiary literature
• spectral information of related compounds
General: searching for information on molecules precedes retrieval of synthetic methodology data

Differences in Molecule vs. Reaction Searching

Molecules: [structure drawing; substituent labels: CN, Cl, NO2]
Query: Is this particular molecule, or a similar one, known? Specific data?
Answer: Yes or No from existing databases, including patents

Reactions: [reaction scheme; labels: CN, Cl, NO2 on the starting material; NH2, Cl, NO2 on the product; Reaction Conditions?]
Query: How to selectively reduce the nitrile group (transformation?)
Answer: Pointers to relevant examples in the literature
Criteria:
• Efficient transformation
• Functional group compatibility
• Reaction conditions

Available Reaction Databases

online:
• CASREACT (CAS) (ca. 10.5 million, including Spresi database, 1985 - present)
• Spresi (InfoChem) (ca. 4.5 million, 1974 – 2004)
• CrossFireplusReactions (Elsevier MDL, STN) (ca. 10 million, 1779 - present)
• ChemInform RX on STN (FIZ Chemie) (ca. 0.8 million)
• CCR (Thomson Scientific) (ca.
0.6 million)

inhouse:
• ChemInform Reaction Library (Elsevier MDL)
• Spresi (InfoChem)
• CrossFire Beilstein (Elsevier MDL)
• Specialty Databases (several vendors)
• Proprietary Databases

For a good review see: Zass, E. "Reaction Databases", In: Encyclopedia of Computational Chemistry, Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer, H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 4, 2402-2420. QD39.3.E46 E53 1998

Use of Available Information in Synthesis

Preparation of a distinct compound requires access to
• information about new synthetic methodologies in journals and databases
• experimental details for the preparation of known intermediates and starting materials from databases, journals and other sources
• tools to plan syntheses and select optimal reaction conditions

Preparation of a library of diverse compounds requires
• all of the above
• knowledge about the characteristics of functional groups
• information about available building blocks

Process development requirements are defined by
• access to information about various reaction conditions of a reaction
• knowledge about the characteristics of molecules or their fragments under the required reaction conditions
• tools to calculate the behavior of reagents, solvents, and catalysts

Barriers Impeding the Use of Available Information by Endusers
• multiple access systems
• different user interfaces
• different modi operandi
• difficult query formulation
• substructure concept
• keyword inconsistencies
• limited post-search management of large hitlists
• some integrated access to other information sources
Most importantly: failure of available systems to recognize and to facilitate the integration of the vast knowledge of synthetic chemists

Search Modes

Structure-Based Searches
• Full structure: only for reactions with known molecules (not very
useful)
• Reaction substructure (RSS): most frequently used mode (difficult for end-users to formulate an effective query)
• Reaction similarity: various methodologies using different parameters (results often vary greatly; good for browsing and idea generation)
• Reaction classification: several methodologies, mostly based on structural information about reaction centers and their immediate environment (good indexing tool, improvement over reaction similarity)
• Reagents, Solvents: full structure and substructure searches for molecules (not available in all databases; used mostly in conjunction with other structural searches)

Data-Based Searches
• Keywords: intellectually derived terms for name reactions, reaction types etc. (incomplete, not very useful)
• Journal, author, title, yields, etc.: text or numeric data searches (mostly used in conjunction with structural searches)

Problems with Reaction Searching
Synthetic Problem: [reaction scheme; structure labels: CH3O, N, O]
Full Structure Search: No hits*
Reaction Substructure Search (colored fragment): 119 hits*
Class Code Search (broad, reaction center only): 672 hits*
Keyword Search “Michael Addition”: 2972 hits*
*Results were obtained from Elsevier MDL’s combined reaction databases (ca. 1 million reactions); 2006

Problems with Substructure Searching
Oversimplified Query (nitrile to primary amine): [reaction scheme; labels: N, Cl, NH2]
DATABASE SIZE: ca.
1 million reactions
737 Hits (oversimplified query)
Narrowly Defined Query: [reaction scheme; labels: O2N, N, Cl, NO2, NH2]
0 Hits

Problems:
- how to avoid excessively large hitlists
- how to formulate “reasonable” search queries
Solutions:
- combination of several queries (expert approach)
- indexing of reactions (focusing on relevant reactions)
- facilitating query building (non-expert approach, intuitive)

Goal for an Efficient Reaction Data Management System
Create an environment that combines the intelligence and creativity of synthetic chemists with the processing and simulating power of computers and the wealth of information in databases, to meet the challenges of developing efficient syntheses in the laboratory.

Requirements to Facilitate Enduser Searching
• User interfaces based on users’ tasks and capabilities (e.g. CrossFire Web, DiscoveryGate, Reaction Browser, SciFinder) (see “A Framework for the Evaluation of Chemical Structure Databases”, Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140)
• Hierarchical thesauri for keywords and reaction types
• Effective indexing of databases (e.g.
classification)
• Simplification of the querying process (natural, not rule dependent)
• Efficient post-search management tools (e.g. clustering)
• Seamless integration of various information sources (web environment, point-and-click)
Most importantly: available tools must simulate the chemist’s problem solving process

Databases in DiscoveryGate

Reaction Classification as Indexing Tool
‘Do We Still Need a Classification of Organic Reactions?’
Reasons
• alternate method for indexing databases - complement to structure-based retrieval systems
• access to “generic” types of information in retrieval systems
• post-search management of large hitlists
• simplification of query generation
• linking of reaction information from different sources
• source for deriving knowledge bases for reaction prediction and synthesis design
• automatic procedures for analyses and correlations, e.g. quality control and overlap studies

Reaction Classification as Indexing Tool
Examples of some recent work
• Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions. Rose, J.R., Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74
• COGNOS: A Beilstein-Type System for Organizing Organic Reactions. Hendrickson, J.B., Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251
• Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network. Chen, L., Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033
• Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites. Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J., Funatsu, K. J. Chem. Inf. Comput. Sci.
1998, 38, 210
• Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information. Kraut, H., Löw, P., Matuszczyk, H., Saller, H., Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct., Noordwijkerhout, The Netherlands 1999, 26
• Analysis of Reaction Information. Grethe, G. In “Handbook of Chemoinformatics”, Gasteiger, J. (Ed.), Wiley-VCH, Volume 4, 1407 – 1427, Weinheim, 2003

Reaction Indexing through Classification
[reaction scheme; structure labels: CH3O, N, O]
Based on:
• Keywords: Michael addition, Michael reaction, ring closure ...
• Molecule Type: N-heterocycle, isoquinoline, quinolizidine ...
• Reaction Type: reaction centers [reaction scheme; structure labels: CH3O, N, O]

Reaction Classification - Background
Classify v.2.5, developed by InfoChem, Munich
Based on InfoChem’s reaction center perception algorithm
Rules and Definitions
• A bond is defined as a reaction center if it is made or broken
• An atom is defined as a reaction center if it changes its
  - number of implicit hydrogens
  - number of valencies
  - number of π-electrons
  - atomic charge
  or if the connecting bond is a reaction center

Reaction Classification - Background
Rules and Definitions
• Hashcodes are calculated for all reaction centers, taking into account atom properties:
  - atom type
  - valence state
  - total number of bonded hydrogens (implicit plus explicitly drawn)
  - number of π-electrons
  - aromaticity
  - formal charges
  - reaction center information
• The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: ‘ClassCode’

Reaction Classification - Background
Rules and Definitions
Inclusion of atoms in the immediate environment (spheres)
• reaction centers only (0-sphere = BROAD)
• reaction centers + α-atoms (1-sphere = MEDIUM)
• reaction
centers + β-atoms (2-sphere = NARROW)
• inclusion of one sp3-atom during sphere expansion
Atom equivalency
• atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent
Multiple occurrences of identical transformations are handled as one

Reaction Classification - Background
Rules and Definitions
[reaction scheme shown at three sphere levels; atom labels: C, N, H]
• 0-Sphere (Broad) ...655778: reaction centers only, similar to a broadly based substructure search; large-sized cluster or hitlist
• 1-Sphere (Medium) ...151297: reaction centers plus alpha atoms, excluding hydrogens; medium-sized cluster or hitlist
• 2-Sphere (Narrow) ...077692: reaction centers plus beta atoms, excluding consecutive sp3-atoms; small-sized cluster or hitlist
Number of hits from CIRX97 (70060 rxns) for an identical transformation at different classification levels
[chart: number of hits (values shown: 700, 300, 50) against topological specificity (broad, medium, narrow)]

Reaction Classification – Clustering of Search Results
Classification codes are data stored in the database and are usable for sorting (clustering)
RSS-Search Query (in red): [reaction scheme; structure labels: CH3O, N, O]
Result: 156 hits
Clustered by Classification Code (“MEDIUM”): 72 clusters
1. Cluster (20 rxns), 2. Cluster (15 rxns), 3. Cluster (13 rxns), 4. Cluster (8 rxns) [example reaction schemes]

Classification by Reaction Names
• Chemists are familiar with Name Reactions (Diels-Alder, Michael etc.)
• Papers in one issue of JOC (22, 2004) mentioned 20 name reactions, known and lesser known, some multiple times, e.g., Mitsunobu reaction, Nazarov reaction, Wolff rearrangement etc.
• Several books deal exclusively with Name Reactions* (ca. 700 reactions)
• Use of Name Reactions facilitates reaction retrieval
  - Complementary to other searches
  - Used in combination with other data
  - Easier alternative to formulating complex RSS queries
• Excellent browsing tool
  - Overview of scope and limitations of a given reaction, e.g. Aldol reaction
  - Combining different reaction types leading to the same compound class
    Hantzsch pyridine synthesis from dihydropyridines or β-keto esters
    Fischer Indole synthesis from hydrazines or hydrazones
    Darzens reaction of epoxides from esters, amides, sulfones, or nitriles

*References
• Named Organic Reactions, Laue, T. and Plagens, A., Eds., John Wiley & Sons, 1st Edition 1999, 2nd Edition 2005
• Organic Syntheses Based on Name Reactions, Hassner, A. and Stumer, C., Eds., Elsevier Science, 1st Edition 1994; 2nd Edition 2002
• Name Reactions, Li, J. J., Ed., Springer, 2002
• Strategic Applications of Named Reactions, Kürti, L. and Czakó, B., Eds., Elsevier, 2005
• Name Reactions and Reagents in Organic Synthesis, Mundy, B.P.; Ellerd, M.G. and Favaloro, F.G., Jr., Wiley Interscience 2005
Note: The work on classification by reaction names is being developed at InfoChem (Munich) in consultation with G. Grethe

Classification by Reaction Names - Requirements
Established electronically, not intellectually
NOW – Intellectually derived
• Inclusion of intellectually derived keywords varies greatly from database to database and depends on abstractors; the keywords are either too inclusive or not comprehensive
• Example: “Michael addition” 184 hits (keywords) vs.
89 hits (RSS search)
52 hits (reaction name keywords)
FUTURE – Electronically derived
• Assignments based on single or multiple RSS searches
  - Boolean logic is applied to combine and/or subtract search results (queries)
  - Assignments are pre-processed and added as data to database(s)
• Name reactions are aligned in hierarchical order
  - Based on main reaction categories (addition, substitution, rearrangements, eliminations, oxidations, reductions)
  - Reactions can be listed in multiple categories, e.g.: Baeyer-Villiger oxidation in Oxidation and Rearrangement
  - Hierarchy must be able to accommodate non-name reactions (future project)
• Reactions containing n reactions (e.g., tandem reactions) are listed in n categories
  - Individual name reactions have to be recognizable
  - Otherwise, stored under “Miscellaneous”
• Queries and corresponding names are stored in a spreadsheet

Classification by Reaction Names - Hierarchy
Main categories > First Level > Second Level > Third Level
Addition
  1,2-Addition > Darzens condensation > Sulfones
  1,4-Addition > Michael reaction > Intermolecular
  Cycloaddition > 4+2 Cycloadditions > Diels-Alder reaction
Substitution
  Aromatic electrophilic > Friedel-Crafts acylation > Intramolecular
  Aliphatic Nucleophilic > Schotten-Baumann reaction
  Free radical > Gomberg-Bachmann reaction > Intermolecular
Rearrangements
  Nucleophilic > Hofmann rearrangement > Alkyl
  Sigmatropic > [3,3] Sigmatropic rearrangement > Claisen rearrangement
  Radical
Elimination
  Cope reaction
Reductions
  Cannizzaro reaction > Intermolecular
Oxidations
  Baeyer-Villiger oxidation > Lactones
Heterocyclic Synthesis
  Hantzsch pyridine synthesis > Modified
Miscellaneous
  Alper reaction, Chugaev reaction, Cyclocarbonylation

Classification by Reaction Names – Keyword Generation
Example: Intermolecular Mannich reaction with CH-acidic compounds
Procedure:
- generate query for general search
- check
hitlist for non-relevant hits
- formulate queries to eliminate negatives
- combine queries using Boolean operators
Mannich reaction: [example reaction scheme and generic reaction query, Query Q1]
Elimination of negative hits: [generic reaction queries for the Aza Diels-Alder reaction and the Biginelli reaction, Query Q2 and Query Q3]
Query set for intermolecular Mannich reaction with CH-acidic compounds: Q1 – (Q2+Q3)

Classification by Reaction Names
Example of query menu (partial view) from InfoChem’s SpresiWeb

“The design of organic syntheses by chemists without the help of computers proceeds in anything but a systematic stepwise manner from the target molecule to available starting materials. A systematic stepwise approach is more the exception than the rule”.
“The human mind solves problems by lateral thinking, jumping from one idea to the next, from one question to a different one, from retrosynthetic thinking to considering the course and outcome of a reaction, etc.”
Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.

The paradigm in an ideal electronic world
[diagram: Journals; Major Reference Works; Books; Databases; E-Labjournal; Databases + Knowledge, Intuition, and Experience of Synthetic Chemist]

Integrated Major Reference Works (iMRW)
(Reaction Databases, DiscoveryGate) (Elsevier MDL, Third Party, Proprietary etc.)
[diagram labels: present status; ClassCodes; LinkFinderPlus (citations); Tertiary Sources; Major Reference Works (MRWs); Primary Journals; iMRW links; Future links]

Integrated Major Reference Works - Concept
• Simulating chemists’ approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism
• Assisting chemists with the synthesis of new compounds by providing complementary information
  - With examples for synthetic methodologies from reaction databases
  - From summaries, critically evaluated by experts, describing
    reaction mechanisms
    principles of stereo-controlled reactions
    applications, preparations, and properties of reagents
    and other information generally not found in reaction databases
  - Through one-click linking to the primary literature when combined with LinkFinderPlus

Integrated Major Reference Works - Summary
iMRW….
• is a unique collaboration between Elsevier MDL, InfoChem and leading scientific publishers (Elsevier Science, Georg Thieme Verlag, and Springer-Verlag)
• provides one-click, bi-directional linking based on reaction type between synthetic methodology databases and electronic versions of major reference works (MRWs) or between individual MRWs, i.e. a true integration of information
• allows text and (sub)structure searching over multiple major reference works from a single user interface

Major Reference Works in iMRW
• Detailed information about methodologies based on reaction type
• Information about scope and limitations of reactions
• Evaluated experimental procedures
• Information about reaction mechanism, stereo-control, effect of substituents and ligands, and other factors influencing a reaction
• Information about reagents and catalysts, their preparation and properties
Updates for each of them are planned or under consideration by the publishers and will be added when available

Comprehensive Asymmetric Catalysis (CAC) - Summary
Editors: Eric N. Jacobsen, Andreas Pfaltz, Hisashi Yamamoto (1999)
CAC is an innovative reference work that reviews in three volumes catalytic methods for asymmetric organic synthesis, a major challenge in synthetic chemistry today. Illustrated by over 6,000 reactions critically evaluated by 60 leading experts in the field, the basic principles, mechanisms, basis for stereoinduction, and scope and limitations of asymmetric reactions are covered in depth.

Comprehensive Organic Functional Group Transformations (COFGT) – Summary
Editors-in-Chief: Alan R. Katritzky, Otto Meth-Cohn, Charles W. Rees (1995)
COFGT covers in 40,000 reactions and seven volumes the vast subject of organic synthesis in terms of the introduction and interconversion of functional groups.
The editors have adopted a rather rigorous, logical and formal treatment on the basis of structure, which enables a detailed analysis of all known, and indeed of some as yet unknown, functional groups. Therefore, the treatise deals rationally and comprehensively with the method of their construction.

Science of Synthesis - Summary
Houben-Weyl Methods of Molecular Transformations
Editorial Board: D. Bellus, S. V. Ley, R. Noyori, M. Regitz, P. J. Reider, E. Schaumann, I. Shinkai, E. J. Thomas, B. M. Trost (2001)
Science of Synthesis is the authoritative and comprehensive reference work for the entire field of organic and organometallic synthesis. The series of 48 volumes will be published over a period of 8 years. It will present 15,000 selected synthetic methods for all classes of compounds, illustrated by 150,000 reactions, and it includes
- Methods critically evaluated by leading scientists
- Background information and detailed experimental procedures
- Schemes and tables which illustrate the reaction scope

Collecting Information for the Synthesis of a new Compound
Target molecule: [structure drawing; labels: NH2, Me, N, EtO2C]
Muray, E.; Rifé, J.; Branchadell, V.; Ortuño, R.M. J. Org. Chem. 2002, 67, 4520 – 4525
(The paper describes the syntheses of cyclopropyl nucleosides as potential antiviral and antitumor agents)

Synthesis Plan
[retrosynthetic scheme: target molecule disconnected into fragments A and B; labels: NH2, Me, EtO2C, X, N]
Retrosynthetic Analysis: N1-alkylation of adenine
1. Step: general information about the alkylation reaction
2. Step: information about the preparation of A, including stereochemistry
3. Step: information about scope and limitations, effect of substituents, applicable reagents etc.
Reaction Substructure + Data Search in DiscoveryGate
[reaction scheme; atom labels: Cl, N, I]

Search for Similar Reactions in iMRW

COFGT chapter
Literature Linking

Text Search in iMRW

Information about Enantioselective Cyclopropanation from CAC

Text Search Results from COFGT and Linking to Literature

Integration of iMRW with Reaction Database

Conclusion
• DiscoveryGate provides chemists, in a single interactive system, with relevant information from the different sources required for solving synthetic problems
• Access is provided from an intuitive user interface by a simple point-and-click mechanism
• The system very closely simulates the lateral information gathering process of synthetic chemists
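
Supplementary sketch
Two ideas from the slides on reaction names and clustering lend themselves to a small illustration: combining hitlists with Boolean logic, as in the query set Q1 – (Q2+Q3), and sorting a hitlist into clusters by a stored classification code. The Python sketch below is only an assumption about how such post-search steps might look. The function names, reaction IDs, and code labels ("code-A" etc.) are hypothetical placeholders and do not come from InfoChem's Classify, DiscoveryGate, or any real database.

```python
from collections import defaultdict


def combine_hitlists(q1, q2, q3):
    """Return the hits of Q1 that appear in neither Q2 nor Q3: Q1 - (Q2 + Q3)."""
    return set(q1) - (set(q2) | set(q3))


def cluster_by_class_code(hits, class_codes):
    """Group reaction IDs by classification code, largest cluster first."""
    clusters = defaultdict(list)
    for rxn_id in sorted(hits):
        clusters[class_codes[rxn_id]].append(rxn_id)
    # Sort by descending cluster size, then by code for a stable order.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical hitlists: reaction IDs returned by three RSS queries.
q1_hits = {1, 2, 3, 4, 5, 6}  # general query
q2_hits = {2}                 # first query describing unwanted hits
q3_hits = {5, 9}              # second query describing unwanted hits

# Hypothetical classification codes stored with each reaction record.
class_codes = {
    1: "code-A", 2: "code-C", 3: "code-A",
    4: "code-B", 5: "code-C", 6: "code-A",
}

relevant_hits = combine_hitlists(q1_hits, q2_hits, q3_hits)
clusters = cluster_by_class_code(relevant_hits, class_codes)

for code, rxn_ids in clusters:
    print(code, rxn_ids)
```

With these placeholder data, reactions 2 and 5 are subtracted from the general hitlist, and the remaining four reactions fall into two clusters. Plain set operations are used because the slides describe the combination step purely in Boolean terms; how a real system stores and compares hitlists is not stated in the source.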