Integrated Major Reference Works (iMRW)

Use of Chemical Information in Organic Synthesis
Reaction Information for the Practicing Synthetic Chemist:
The Search for Relevant Answers
AGENDA:
Available information
Introduction to reaction data searching
Concepts and problems
Basis of reaction classification
DiscoveryGate
Retrieving relevant information for the synthesis of new compounds
Questions & Answers
Guenter Grethe
May, 2006
April 2006
Information Needs of Synthetic Organic Chemists
in Basic Research and Development
• new preparation of intermediates and starting materials
• well established, high yield preparations (experimental procedures)
• new synthetic methodologies (new reagents, catalysts etc.)
• information on starting materials (availability, price, physical data etc.)
• physical properties of reagents, solvents and catalysts
• access to the primary, secondary, and tertiary literature
• spectral information of related compounds
General: searching for information on molecules precedes retrieval of
synthetic methodology data
Differences in Molecule vs. Reaction Searching
Molecules: [structure of a compound bearing CN, Cl, and NO2 groups; drawing not reproduced]
• Query: Is this particular molecule, or a similar one, known? Are specific data available?
• Answer: Yes or no, from existing databases, including patents
Reactions: [scheme converting the compound bearing CN, Cl, and NO2 groups into a product bearing NH2, Cl, and NO2 groups; reaction conditions?]
• Query: How can the nitrile group be reduced selectively (transformation)?
• Answer: Pointers to relevant examples in the literature
• Criteria:
  - Efficient transformation
  - Functional group compatibility
  - Reaction conditions
Available Reaction Databases
• Online:
  - CASREACT (CAS) (ca. 10.5 million reactions, including the Spresi database, 1985 - present)
  - Spresi (InfoChem) (ca. 4.5 million reactions, 1974 - 2004)
  - CrossFireplusReactions (Elsevier MDL, STN) (ca. 10 million reactions, 1779 - present)
  - ChemInform RX on STN (FIZ Chemie) (ca. 0.8 million reactions)
  - CCR (Thomson Scientific) (ca. 0.6 million reactions)
• In-house:
  - ChemInform Reaction Library (Elsevier MDL)
  - Spresi (InfoChem)
  - CrossFire Beilstein (Elsevier MDL)
  - Specialty Databases (several vendors)
  - Proprietary Databases
For a good review see: Zass, E. "Reaction Databases", In: Encyclopedia of Computational
Chemistry, Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer,
H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 1998, Vol. 4, 2402-2420.
Use of Available Information in Synthesis

• Preparation of a distinct compound requires
  - access to information about new synthetic methodologies in journals and databases
  - experimental details for the preparation of known intermediates and starting materials from databases, journals and other sources
  - tools to plan syntheses and select optimal reaction conditions
• Preparation of a library of diverse compounds requires
  - all of the above
  - knowledge about the characteristics of functional groups
  - information about available building blocks
• Process development requirements are defined by
  - access to information about various reaction conditions of a reaction
  - knowledge about the characteristics of molecules or their fragments under the required reaction conditions
  - tools to calculate the behavior of reagents, solvents, and catalysts
Barriers Impeding the Use of Available Information by Endusers




• multiple access systems
• different user interfaces
• different modi operandi
• difficult query formulation
• substructure concept
• keyword inconsistencies
• limited post-search management of large hitlists
• some integrated access to other information sources
Most importantly: failure of available systems to recognize
and to facilitate the integration of the vast knowledge of
synthetic chemists
Search Modes
• Structure-Based Searches
  - Full structure: only for reactions with known molecules (not very useful)
  - Reaction substructure (RSS): most frequently used mode (difficult for end-users to formulate an effective query)
  - Reaction similarity: various methodologies using different parameters (results often vary greatly, good for browsing and idea generation)
  - Reaction classification: several methodologies, mostly based on structural information about reaction centers and immediate environment (good indexing tool, improvement over reaction similarity)
  - Reagents, Solvents: full structure and substructure searches for molecules (not available in all databases, used mostly in conjunction with other structural searches)
• Data-Based Searches
  - Keywords: intellectually derived terms for name reactions, reaction types etc. (incomplete, not very useful)
  - Journal, author, title, yields, etc.: text or numeric data searches (mostly used in conjunction with structural searches)
Problems with Reaction Searching
Synthetic Problem: [reaction scheme; structure drawing not reproduced]
Full Structure Search: No hits*
Reaction Substructure Search (colored fragment): 119 hits*
Class Code Search: 672 hits* (broad, reaction center only)
Keyword Search “Michael Addition”: 2972 hits*
*Results were obtained from Elsevier MDL’s combined reaction databases (ca. 1 Mio reactions); 2006
Problems with Substructure Searching
DATABASE SIZE: ca. 1 million reactions
Oversimplified Query (nitrile to primary amine): 737 Hits [query and example hits; structure drawings not reproduced]
Narrowly Defined Query: 0 Hits [query including the Cl and NO2 substituents; structure drawing not reproduced]
Problems:
- how to avoid excessively large hitlist
- how to formulate “reasonable” search queries
Solutions:
- combination of several queries (expert approach)
- indexing of reactions (focusing on relevant reactions)
- facilitating query building (non-expert approach, intuitive)
Goal for an Efficient Reaction Data Management System
Create an environment that allows for combining the
intelligence and creativity of synthetic chemists with
the processing and simulating power of computers and
the wealth of information in databases to meet the
challenges in the laboratory for developing efficient
syntheses.
Requirements to Facilitate Enduser Searching
• User interfaces based on users’ tasks and capabilities (e.g. CrossFire Web, DiscoveryGate, Reaction Browser, SciFinder)
  (see “A Framework for the Evaluation of Chemical Structure Databases”, Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140)
• Hierarchical thesauri for keywords and reaction types
• Effective indexing of databases (e.g. classification)
• Simplification of the querying process (natural, not rule dependent)
• Efficient post-search management tools (e.g. clustering)
• Seamless integration of various information sources (web environment, point-and-click)
Most importantly: available tools must simulate the chemist’s problem
solving process
Databases in DiscoveryGate
Reaction Classification as Indexing Tool
‘Do We Still Need a Classification of Organic Reactions?’
• Reasons
  - alternate method for indexing databases, a complement to structure-based retrieval systems
  - access to “generic” types of information in retrieval systems
  - post-search management of large hitlists
  - simplification of query generation
  - linking of reaction information from different sources
  - source for deriving knowledge bases for reaction prediction and synthesis design
  - automatic procedures for analyses and correlations, e.g. quality control and overlap studies
Reaction Classification as Indexing Tool
Examples of some recent work
• Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions.
  Rose, J.R., Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74
• COGNOS: A Beilstein-Type System for Organizing Organic Reactions.
  Hendrickson, J.B., Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251
• Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network.
  Chen, L., Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033
• Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites.
  Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J., Funatsu, K. J. Chem. Inf. Comput. Sci. 1998, 38, 210
• Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information.
  Kraut, H., Löw, P., Matuszczyk, H., Saller, H., Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct., Noordwijkerhout, The Netherlands 1999, 26
• Analysis of Reaction Information.
  Grethe, G. In “Handbook of Chemoinformatics” Gasteiger, J. (Ed.) Wiley-VCH, Weinheim, 2003, Volume 4, 1407 - 1427
Reaction Indexing through Classification
[Reaction scheme; structure drawing not reproduced]
Based on:
Keywords: Michael addition, Michael reaction, ring closure ...
Molecule Type: N-heterocycle, isoquinoline, quinolizidine ...
Reaction Type: reaction centers [the same scheme with the reaction centers marked; drawing not reproduced]
Reaction Classification - Background
Classify v. 2.5, developed by InfoChem, Munich
Based on InfoChem’s reaction center perception algorithm
Rules and Definitions
• A bond is defined as a reaction center if it is made or broken
• An atom is defined as a reaction center if it changes its
  - number of implicit hydrogens
  - number of valencies
  - number of π-electrons
  - atomic charge
  or if its connecting bond is a reaction center
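The two rules above can be illustrated with a minimal Python sketch. It assumes that an atom-mapped reaction is already available as plain dictionaries; the data layout, the property names, and the function `reaction_centers` are illustrative assumptions, not InfoChem's actual algorithm.

```python
from typing import Dict, FrozenSet, Set, Tuple

Atom = Dict[str, int]            # hypothetical atom record
Bond = FrozenSet[int]            # unordered pair of atom-map numbers
Bonds = Dict[Bond, int]          # bond -> bond order

# Atom properties whose change marks an atom as a reaction center.
PROPS = ("implicit_h", "valence", "pi_electrons", "charge")

def reaction_centers(r_atoms: Dict[int, Atom], r_bonds: Bonds,
                     p_atoms: Dict[int, Atom], p_bonds: Bonds
                     ) -> Tuple[Set[Bond], Set[int]]:
    """Return (center_bonds, center_atoms) for an atom-mapped reaction."""
    # A bond is a reaction center if it is made or broken.
    center_bonds = set(r_bonds) ^ set(p_bonds)
    # An atom is a reaction center if one of the listed properties changes.
    center_atoms = {
        i for i in r_atoms.keys() & p_atoms.keys()
        if any(r_atoms[i][k] != p_atoms[i][k] for k in PROPS)
    }
    # An atom is also a reaction center if a connecting bond is one.
    for bond in center_bonds:
        center_atoms |= set(bond)
    return center_bonds, center_atoms

# Toy example: nitrile reduction, R-C#N -> R-CH2-NH2 (hydrogens implicit).
# Atom 1 = R carbon, 2 = nitrile carbon, 3 = nitrogen.
unchanged = {"implicit_h": 0, "valence": 4, "pi_electrons": 0, "charge": 0}
reactant_atoms = {
    1: dict(unchanged),
    2: {"implicit_h": 0, "valence": 4, "pi_electrons": 2, "charge": 0},
    3: {"implicit_h": 0, "valence": 3, "pi_electrons": 2, "charge": 0},
}
product_atoms = {
    1: dict(unchanged),
    2: {"implicit_h": 2, "valence": 4, "pi_electrons": 0, "charge": 0},
    3: {"implicit_h": 2, "valence": 3, "pi_electrons": 0, "charge": 0},
}
reactant_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 3}
product_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}

bonds, atoms = reaction_centers(reactant_atoms, reactant_bonds,
                                product_atoms, product_bonds)
print(sorted(atoms))  # [2, 3]
```

In this toy case no bond between heavy atoms is made or broken, so the carbon and nitrogen of the nitrile qualify only through their changed hydrogen and π-electron counts.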
Reaction Classification - Background
Rules and Definitions
• Hashcodes are calculated for all reaction centers, taking into account atom properties
  - atom type
  - valence state
  - total number of bonded hydrogens (implicit plus explicitly drawn)
  - number of π-electrons
  - aromaticity
  - formal charges
  - reaction center information
• The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: ‘ClassCode’
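The idea of summing per-atom hashcodes can be sketched in a few lines of Python. This is only an illustration: the field names, the use of SHA-256, and the 32-bit truncation are assumptions made for the example, and the actual hash function used by Classify is not described on the slide.

```python
import hashlib
from typing import Dict, Iterable

# Atom properties listed on the slide; the field names are illustrative.
FIELDS = ("atom_type", "valence_state", "total_h", "pi_electrons",
          "aromatic", "formal_charge", "center_info")

def atom_hashcode(atom: Dict[str, object]) -> int:
    """Deterministic 32-bit hash over the listed atom properties."""
    key = "|".join(f"{name}={atom[name]}" for name in FIELDS)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def class_code(center_atoms: Iterable[Dict[str, object]]) -> int:
    """Sum the hashcodes of all reaction-center atoms.

    A sum does not depend on atom order, so the same transformation
    drawn with a different atom numbering yields the same code.
    """
    return sum(atom_hashcode(atom) for atom in center_atoms)

# Toy reaction centers for a nitrile reduction (reactant side, product side).
nitrile_c = {"atom_type": "C", "valence_state": 4, "total_h": 0,
             "pi_electrons": 2, "aromatic": False, "formal_charge": 0,
             "center_info": "reactant"}
nitrile_n = {"atom_type": "N", "valence_state": 3, "total_h": 0,
             "pi_electrons": 2, "aromatic": False, "formal_charge": 0,
             "center_info": "reactant"}
amine_c = dict(nitrile_c, total_h=2, pi_electrons=0, center_info="product")
amine_n = dict(nitrile_n, total_h=2, pi_electrons=0, center_info="product")

code = class_code([nitrile_c, nitrile_n, amine_c, amine_n])
same = class_code([amine_n, nitrile_c, amine_c, nitrile_n])
print(code == same)  # True
```

Storing such a code with every reaction turns the transformation type into ordinary indexed data, which is what makes sorting and clustering of hitlists inexpensive.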
Reaction Classification - Background
Rules and Definitions
• Inclusion of atoms in the immediate environment (spheres)
  - reaction centers only (0-sphere = BROAD)
  - reaction centers + α-atoms (1-sphere = MEDIUM)
  - reaction centers + β-atoms (2-sphere = NARROW)
  - inclusion of one sp3-atom during sphere expansion
• Atom equivalency
  - atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent
• Multiple occurrences of identical transformations are handled as one
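A rough Python sketch of sphere expansion and atom equivalency follows. It assumes a hydrogen-suppressed molecular graph given as an adjacency dictionary and reads "2-sphere" as all atoms within two bonds of a reaction center; the sp3 rule and the merging of repeated transformations are not modelled. The names `sphere_atoms`, `canonical_element`, and the small equivalence table are hypothetical.

```python
from typing import Dict, Set

# Illustrative equivalence table: heavier elements of a group map to one
# representative, while row-2 elements (C, N, O, F) stay distinct.
EQUIVALENT = {"Br": "Cl", "I": "Cl", "Se": "S", "Te": "S",
              "As": "P", "Ge": "Si"}

def canonical_element(symbol: str) -> str:
    """Replace an element by the representative of its group, if any."""
    return EQUIVALENT.get(symbol, symbol)

def sphere_atoms(adjacency: Dict[int, Set[int]], centers: Set[int],
                 sphere: int) -> Set[int]:
    """Atoms within `sphere` bonds of any reaction center (0 = centers only)."""
    selected = set(centers)
    frontier = set(centers)
    for _ in range(sphere):
        # Step one bond outwards, skipping atoms that are already selected.
        frontier = {n for a in frontier for n in adjacency[a]} - selected
        selected |= frontier
    return selected

# Toy chain 1-2-3-4-5 with atom 3 as the only reaction center.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(sphere_atoms(chain, {3}, 0)))  # [3]
print(sorted(sphere_atoms(chain, {3}, 1)))  # [2, 3, 4]
print(sorted(sphere_atoms(chain, {3}, 2)))  # [1, 2, 3, 4, 5]
```

Hashing the atoms returned for sphere 0, 1, or 2 would give the broad, medium, and narrow codes respectively, with elements passed through `canonical_element` first.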
Reaction Classification - Background
Rules and Definitions
[Example reaction drawn at three levels of specificity; structure drawings not reproduced]
0-Sphere (Broad), ...655778: reaction centers only, similar to a broadly based substructure search; large-sized cluster or hitlist
1-Sphere (Medium), ...151297: reaction centers plus alpha atoms, excluding hydrogens; medium-sized cluster or hitlist
2-Sphere (Narrow), ...077692: reaction centers plus beta atoms, excluding consecutive sp3-atoms; small-sized cluster or hitlist
Number of hits from CIRX97 (70060 rxns) for an identical transformation at different classification levels
[Chart: number of hits versus topological specificity (broad, medium, narrow); values marked on the chart: 700, 300, 50]
Reaction Classification – Clustering of Search Results
• Classification codes are data
  - stored in the database
  - usable for sorting (clustering)
RSS-Search Query (in red): [reaction scheme; structure drawing not reproduced]
Result: 156 hits
Clustered by Classification Code “MEDIUM”: 72 clusters
1. Cluster (20 rxns)
2. Cluster (15 rxns)
3. Cluster (13 rxns)
4. Cluster (8 rxns)
[Representative reaction for each cluster, some marked “Chiral”; structure drawings not reproduced]
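Because the classification code is stored as data with each reaction, clustering a hitlist reduces to a group-by operation. A minimal Python sketch, assuming each hit is a hypothetical `(reaction_id, class_code)` pair:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def cluster_by_class_code(
        hits: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (reaction_id, class_code) pairs, largest clusters first."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for reaction_id, code in hits:
        clusters[code].append(reaction_id)
    # Sort by cluster size (descending), then by code for a stable order.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))

# Made-up hitlist; the codes stand in for stored "MEDIUM" class codes.
hitlist = [("r1", "A"), ("r2", "B"), ("r3", "A"),
           ("r4", "C"), ("r5", "A"), ("r6", "B")]
for code, members in cluster_by_class_code(hitlist):
    print(code, len(members), members)
```

Presenting the largest clusters first mirrors the slide, where the 156 hits are shown as 72 clusters ordered by the number of reactions in each.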
Classification by Reaction Names
• Chemists are familiar with Name Reactions (Diels-Alder, Michael etc.)
  - Papers in one issue of JOC (22, 2004) mentioned 20 name reactions, known and lesser known, some multiple times, e.g., Mitsunobu reaction, Nazarov reaction, Wolff rearrangement etc.
  - Several books deal exclusively with Name Reactions* (ca. 700 reactions)
• Use of Name Reactions facilitates reaction retrieval
  - Complementary to other searches
  - Used in combination with other data
  - Easier alternative to formulating complex RSS queries
• Excellent browsing tool
  - Overview of scope and limitations of a given reaction, e.g. Aldol reaction
  - Combining different reaction types leading to same compound class
    - Hantzsch pyridine synthesis from dihydropyridines or β-keto esters
    - Fischer indole synthesis from hydrazines or hydrazones
    - Darzens reaction of epoxides from esters, amides, sulfones, or nitriles
*References
Named Organic Reactions, Laue, T. and Plagens, A., Eds., John Wiley & Sons, 1st Edition 1999, 2nd Edition 2005
Organic Syntheses Based on Name Reactions, Hassner, A. and Stumer, C., Eds., Elsevier Science, 1st Edition 1994; 2nd Edition 2002
Name Reactions, Li, J. J., Ed., Springer, 2002
Strategic Applications of Named Reactions, Kürti, L. and Czakó, B., Eds., Elsevier, 2005
Name Reactions and Reagents in Organic Synthesis, Mundy, B.P.; Ellerd, M.G. and Favaloro, F.G., Jr. Wiley Interscience 2005
Note: The work on classification by reaction names is being developed at InfoChem (Munich) in consultation with G. Grethe
Classification by Reaction Names - Requirements
• Established electronically, not intellectually
  - NOW – Intellectually derived
    - Inclusion of intellectually derived keywords varies greatly from database to database and depends on the abstractors; keywords are either too inclusive or not comprehensive
    - Example: “Michael addition” 184 hits (keywords) vs. 89 hits (RSS search) vs. 52 hits (reaction name keywords)
  - FUTURE – Electronically derived
    - Assignments based on single or multiple RSS searches
    - Boolean logic is applied to combine and/or subtract search results (queries)
    - Assignments are pre-processed and added as data to database(s)
• Name reactions are aligned in hierarchical order
  - Based on main reaction categories (addition, substitution, rearrangements, eliminations, oxidations, reductions)
  - Reactions can be listed in multiple categories, e.g. Baeyer-Villiger oxidation in Oxidation and Rearrangement
  - Hierarchy must be able to accommodate non-name reactions (future project)
  - Reactions containing n reactions (e.g., tandem reactions) are listed in n categories
    - Individual name reactions have to be recognizable
    - Otherwise, stored under “Miscellaneous”
• Queries and corresponding names are stored in a spreadsheet
Classification by Reaction Names - Hierarchy
Levels: Main category > First Level > Second Level > Third Level
Addition
- 1,2-Addition > Darzens condensation > Sulfones
- 1,4-Addition > Michael reaction > Intermolecular
- Cycloaddition > 4+2 Cycloadditions > Diels-Alder reaction
Substitution
- Aromatic electrophilic > Friedel-Crafts acylation > Intramolecular
- Aliphatic Nucleophilic > Schotten-Baumann reaction
- Free radical > Gomberg-Bachmann reaction > Intermolecular
Rearrangements
- Nucleophilic > Hofmann rearrangement > Alkyl
- Sigmatropic > [3,3] Sigmatropic rearrangement > Claisen rearrangement
- Radical
Elimination
- Cope reaction
Reductions
- Cannizzaro reaction > Intermolecular
Oxidations
- Baeyer-Villiger oxidation > Lactones
Heterocyclic Synthesis
- Hantzsch pyridine synthesis > Modified
Miscellaneous
- Alper reaction
- Chugaev reaction
- Cyclocarbonylation
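One way to hold such a hierarchy in code is a lookup from name reaction to one or more category paths. The Python sketch below is an assumption about a possible data layout, not the InfoChem implementation; the table `HIERARCHY` contains only a few entries taken from the slides, and `categories_for` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]

# Hypothetical lookup table: name reaction -> one or more hierarchy paths.
HIERARCHY: Dict[str, List[Path]] = {
    "Michael reaction": [("Addition", "1,4-Addition")],
    "Diels-Alder reaction": [("Addition", "Cycloaddition",
                              "4+2 Cycloadditions")],
    # Listed in two main categories, as stated in the requirements.
    "Baeyer-Villiger oxidation": [("Oxidations",), ("Rearrangements",)],
    "Claisen rearrangement": [("Rearrangements", "Sigmatropic",
                               "[3,3] Sigmatropic rearrangement")],
}

MISCELLANEOUS: Path = ("Miscellaneous",)

def categories_for(name_reactions: List[str]) -> List[Path]:
    """Paths under which a (possibly tandem) reaction is listed.

    Each recognized name reaction contributes its own paths; anything
    not recognized falls back to "Miscellaneous".
    """
    paths: List[Path] = []
    for name in name_reactions:
        for path in HIERARCHY.get(name, [MISCELLANEOUS]):
            if path not in paths:
                paths.append(path)
    return paths

print(categories_for(["Baeyer-Villiger oxidation"]))
print(categories_for(["Michael reaction", "Claisen rearrangement"]))
print(categories_for(["unrecognized transformation"]))
```

A tandem reaction is passed as a list of its component name reactions, so a reaction containing n recognizable name reactions ends up in n categories.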
Classification by Reaction Names – Keyword Generation
Example: Intermolecular Mannich reaction with CH-acidic compounds
Procedure:
- generate query for general search
- check hitlist for non-relevant hits
- formulate queries to eliminate negatives
- combine queries using Boolean operators
Query Q1 (Mannich reaction): [example reaction and generic query; structure drawings not reproduced]
Elimination of negative hits:
Queries Q2 and Q3: [generic queries for the non-relevant reaction types shown on the slide, Aza Diels-Alder reaction and Biginelli reaction; structure drawings not reproduced]
Query set for intermolecular Mannich reaction with CH-acidic compounds: Q1 – (Q2+Q3)
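The query set Q1 – (Q2+Q3) maps directly onto set operations once each query has returned a hitlist. A minimal Python sketch, assuming hitlists are sets of reaction identifiers (the identifiers and the function name `mannich_hits` are made up for illustration):

```python
from typing import Set

def mannich_hits(q1: Set[str], q2: Set[str], q3: Set[str]) -> Set[str]:
    """Q1 - (Q2 + Q3): general hits minus both sets of negative hits."""
    return q1 - (q2 | q3)

# Made-up hitlists: Q1 is the general query, Q2 and Q3 catch false positives.
q1_hits = {"rx1", "rx2", "rx3", "rx4", "rx5"}
q2_hits = {"rx2", "rx9"}
q3_hits = {"rx5"}

print(sorted(mannich_hits(q1_hits, q2_hits, q3_hits)))  # ['rx1', 'rx3', 'rx4']
```

Hits that appear only in Q2 or Q3, such as `rx9` here, have no effect on the result because the subtraction starts from Q1.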
Classification by Reaction Names
Example of query menu (partial view) from InfoChem’s SpresiWeb
“The design of organic syntheses by chemists without the help of computers proceeds in
anything but a systematic stepwise manner from the target molecule to available starting
materials. A systematic stepwise approach is more the exception than the rule.”
“The human mind solves problems by lateral thinking, jumping from one idea to the next,
from one question to a different one, from retrosynthetic thinking to considering the
course and outcome of a reaction, etc.”
Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.
The paradigm in an ideal electronic world
[Diagram linking Journals, Major Reference Works, Books, Databases, and E-Labjournal]
+ Knowledge, Intuition, and Experience of Synthetic Chemist
Integrated Major Reference Works (iMRW)
[Diagram, present status: Reaction Databases in DiscoveryGate (Elsevier MDL, Third Party, Proprietary etc.); Tertiary Sources: Major Reference Works (MRWs); Primary Journals. Link types shown: ClassCodes, LinkFinderPlus (citations), iMRW links, future links.]
Integrated Major Reference Works - Concept


• Simulating chemists’ approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism
• Assisting chemists with the synthesis of new compounds by providing complementary information
  - With examples for synthetic methodologies from reaction databases
  - From summaries, critically evaluated by experts, describing
    - reaction mechanisms
    - principles of stereo-controlled reactions
    - applications, preparations, and properties of reagents
    - and other information generally not found in reaction databases
  - Through one-click linking to the primary literature when combined with LinkFinderPlus
Integrated Major Reference Works - Summary
iMRW ...
• is a unique collaboration between Elsevier MDL, InfoChem and leading scientific publishers (Elsevier Science, Georg Thieme Verlag, and Springer-Verlag)
• provides one-click, bi-directional linking based on reaction type between synthetic methodology databases and electronic versions of major reference works (MRWs) or between individual MRWs, i.e. a true integration of information
• allows text and (sub)structure searching over multiple major reference works from a single user interface
Major Reference Works in iMRW
• Detailed information about methodologies based on reaction type
• Information about scope and limitations of reactions
• Evaluated experimental procedures
• Information about reaction mechanism, stereo-control, effect of substituents and ligands, and other factors influencing a reaction
• Information about reagents and catalysts, their preparation and properties
• Updates for each of them are planned or under consideration by the publishers and will be added when available
Comprehensive Asymmetric Catalysis (CAC) - Summary
Editors: Eric N. Jacobsen, Andreas Pfaltz, Hisashi Yamamoto
(1999)
CAC is an innovative reference work that reviews in three volumes catalytic methods for asymmetric organic synthesis, a major challenge in synthetic chemistry today. Illustrated by over 6,000 reactions critically evaluated by 60 leading experts in the field, the basic principles, mechanisms, basis for stereoinduction, and scope and limitations of asymmetric reactions are covered in depth.
Comprehensive Organic Functional Group Transformations (COFGT) – Summary
Editors-in-Chief: Alan R. Katritzky, Otto Meth-Cohn, Charles W. Rees
(1995)
COFGT covers, in seven volumes and 40,000 reactions, the vast subject of organic synthesis in terms of the introduction and interconversion of functional groups. The editors have adopted a rather rigorous, logical and formal treatment on the basis of structure, which enables a detailed analysis of all known, and indeed of some as yet unknown, functional groups. Therefore, the treatise deals rationally and comprehensively with the method of their construction.
Science of Synthesis - Summary
Houben-Weyl Methods of Molecular Transformations
Editorial Board: D. Bellus, S. V. Ley, R. Noyori, M. Regitz, P. J. Reider, E. Schaumann, I. Shinkai, E. J. Thomas, B. M. Trost
2001
Science of Synthesis is the authoritative and comprehensive reference work for the entire field of organic and organometallic synthesis. The series of 48 volumes will be published over a period of 8 years. It will present 15,000 selected synthetic methods for all classes of compounds, illustrated by 150,000 reactions, and it includes
- Methods critically evaluated by leading scientists
- Background information and detailed experimental procedures
- Schemes and tables which illustrate the reaction scope
Collecting Information for the Synthesis of a New Compound
Target molecule: [structure of an adenine derivative bearing Me and EtO2C substituents; drawing not reproduced]
Muray, E.; Rifé, J.; Branchadell, V.; Ortuño, R.M. J. Org. Chem. 2002, 67, 4520 – 4525
(The paper describes the syntheses of cyclopropyl nucleosides as potential antiviral and antitumor agents)
Synthesis Plan
[Retrosynthetic scheme: the target molecule is disconnected into fragments A and B, one carrying the Me and EtO2C substituents and a leaving group X, the other being adenine; structure drawings not reproduced]
Retrosynthetic Analysis: N1-alkylation of adenine
Step 1: general information about the alkylation reaction
Step 2: information about the preparation of A, including stereochemistry
Step 3: information about scope and limitations, effect of substituents, applicable reagents etc.
Reaction Substructure + Data Search in DiscoveryGate
[Reaction scheme from the search results; structure drawings not reproduced]
Search for Similar Reactions in iMRW
COFGT chapter
Literature Linking
Text Search in iMRW
Information about Enantioselective Cyclopropanation from CAC
Text Search Results from COFGT and Linking to Literature
Integration of iMRW with Reaction Database
Conclusion
• DiscoveryGate provides chemists with relevant information from different sources required for solving synthetic problems in a single system that the user can work with interactively
• Access is provided from an intuitive user interface by a simple point-and-click mechanism
• The system very closely simulates the lateral information gathering process of synthetic chemists