• Reactions in organic chemistry, review
• Problems in reaction chemistry
• Chemoinformatics methods
• Applications of reaction chemoinformatics to reaction chemistry problems
• Review questions
2
• Reactions in organic chemistry, review
• Chemoinformatics methods
• Applications of reaction chemoinformatics to reaction chemistry problems
• Review questions
3
• Chemical Space
– Chemicals are points in the space
– Reactions are “vectors” describing how to reach new points from existing ones
• Reactant Chemicals → Product Chemicals
[Figure: alkene + HBr → alkyl bromide]
• Transformation that forms and breaks bonds
– Rearrangement of electron configuration
4
• Simplest reaction specification is a chemical equation indicating starting reactants and resultant products
[Figure: alkyl bromide → alkene + HBr, with C2H5O⁻ Na⁺ in C2H5OH at 70 °C]
• For practical use and reproducibility, additional information is required:
– Catalyst or other reagents
– Reaction conditions (temperature, solvent, etc.)
– Yield %, etc.
5
• Reactions are fundamentally rearrangements of electron configurations
• Mechanisms describe the specific flow of electrons, the transient intermediates, and the final products
6
• Curved arrow diagrams
– Depict flow of electrons, NOT atoms
– Source must be electrons (bond, lone pair, radical)
– Targets should be atoms / nuclei
[Figure: curved-arrow diagram examples]
7
• Broadly speaking, reactions are the transfer of electrons from
– Electron-dense groups (nucleophiles) to
– Electron-deficient ones (electrophiles)
8
• Molecular orbitals
– Distinct spaces around atoms that electrons reside in (high electron probability density)
– Up to 2 electrons per orbital
– Relative order of reactivity:
• radicals (1e) >
• n-orbital: Lone pairs >
• π-orbital: Double / triple bonds >
• σ-orbital: Single bonds
[Figure: Lewis structure with σ, π and n orbitals labeled]
9
• Thermodynamics
– Eventually reactions will proceed to thermodynamic equilibrium, maintaining a steady-state ratio of products : reactants
– K_eq: Equilibrium constant defining the stable ratio of products : reactants for a reaction under standard conditions (1 atmosphere, room temperature)
– A larger value of K_eq for a reaction thus indicates greater favorability
– Given competing products, K_eq can indicate the major ones
10
• Gibbs Free Energy
– K_eq is a function of ΔG° (and temperature)
– ΔG: Difference between product and reactant (Gibbs) free energy
• Negative ΔG is thus favorable
• State function, measuring thermodynamic stability
• ΔG°: ΔG under standard conditions
$K_{eq} = e^{-\Delta G^\circ / RT}$
$\Delta G^\circ = -RT \ln K_{eq}$
R = Universal gas constant
T = Absolute temperature
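The relation between K_eq and ΔG° can be checked numerically. The sketch below is illustrative only: the function names, the kcal/mol units and the default temperature of 298.15 K are assumptions made for this example, not part of the slides.

```python
import math

R_KCAL = 1.987204e-3  # universal gas constant in kcal / (mol K)


def equilibrium_constant(delta_g_standard, temperature=298.15):
    """K_eq = exp(-dG°/RT); dG° in kcal/mol, temperature in kelvin (assumed units)."""
    return math.exp(-delta_g_standard / (R_KCAL * temperature))


def standard_free_energy(k_eq, temperature=298.15):
    """Inverse relation: dG° = -RT ln K_eq."""
    return -R_KCAL * temperature * math.log(k_eq)


# A dG° of -1.4 kcal/mol gives a K_eq of roughly 10 near room temperature.
k = equilibrium_constant(-1.4)
print(round(k, 1), round(standard_free_energy(k), 2))
```

Each further -1.4 kcal/mol multiplies K_eq by roughly another factor of 10 under these assumptions.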
11
• Enthalpy and Entropy contributors
– G = H – TS
– H: Enthalpy, primarily determined by strength of bonds broken and formed in a reaction
– S: Entropy, measuring “randomness” of a system, with greater randomness being favorable
• For most reactions, ΔS is small (esp. when Δn = 0), thus
• Unless at very high temperatures, ΔH dominates TΔS, thus
• Calculating ΔH provides a good estimate for ΔG
12
• Thermodynamics
– ΔG = ΔH – TΔS (Enthalpy & Entropy contribute)
– Hess’ Law simplification
– $\Delta H_{reaction} = \sum(\text{BDE broken}) - \sum(\text{BDE formed})$
– BDE: Bond Dissociation Energy
• Standard lookup values (kcal/mol)
C-C : 83    C=O : 178
C=C : 146   C≡N : 213
C=N : 147   etc.
[Figure: example molecule with summed bond energies 83 + 178 + 83 + 213 = 557]
• Kinetics
– Much less data available
http://www.cem.msu.edu/~reusch/OrgPage/bndenrgy.htm
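The bond-energy bookkeeping can be sketched in a few lines. The helper name and the dictionary layout are assumptions for illustration. The O-H, C-O and C-H values are taken from the BDE table in the review questions of this deck, and the example bond change (an O-H and a C=C broken; a C-O, a C-H and a C-C formed) is one of the cases worked there.

```python
# Bond dissociation energies in kcal/mol, as listed in the slides.
BDE = {"O-H": 111, "C=C": 146, "C-O": 85, "C-H": 99, "C-C": 83}


def reaction_enthalpy(broken, formed, table=BDE):
    """Estimate dH as sum(BDE of bonds broken) - sum(BDE of bonds formed)."""
    return sum(table[b] for b in broken) - sum(table[b] for b in formed)


# Hypothetical usage: (111 + 146) - (85 + 99 + 83)
dh = reaction_enthalpy(broken=["O-H", "C=C"], formed=["C-O", "C-H", "C-C"])
print(dh)
```

A negative estimate suggests the bonds formed are stronger in total than the bonds broken.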
13
• Reaction Kinetics
– Thermodynamics: How “far” a reaction will proceed
– Kinetics: How “fast” a reaction will proceed
2 H2 + O2 → 2 H2O
• Highly favorable ΔG, but without a catalyst or flame, reaction proceeds so slowly as to essentially not occur
– Measured by rate constants, but much less data exists
– Based on relative stability of transition states…
14
• Given infinite time, all reactions will reach thermodynamic equilibrium, but
• Intervening, unstable intermediates in the pathway impose an activation energy (E_a) barrier
• Given limited time and input energy, a reaction may only achieve kinetic equilibrium, settling into an energy local minimum between large E_a barriers
[Figure: reaction energy diagram, ΔG vs. Reaction Coordinate, for HO⁻ reacting with a chloride, with the E_a barrier marked]
Each 1.4 kcal / mol of ΔG° ~ 10x change in K_eq
E_a < 22 kcal / mol ~ Room temperature reaction
15
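To get a feel for why a barrier near 22 kcal / mol is a reasonable room-temperature cutoff, one can estimate a rate constant. The sketch below uses the Eyring equation, which the slides do not mention. It is swapped in here as an assumption, with the barrier treated as a free energy of activation and a transmission coefficient of 1.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal / (mol K)


def eyring_rate(barrier_kcal, temperature=298.15):
    """First-order rate constant (1/s): k = (kB*T/h) * exp(-barrier / RT)."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


def half_life_seconds(barrier_kcal, temperature=298.15):
    """Half-life of a first-order process with the given barrier."""
    return math.log(2) / eyring_rate(barrier_kcal, temperature)


# Under these assumptions a 22 kcal/mol barrier gives a half-life on the
# order of tens of minutes; higher barriers quickly become impractically slow.
print(half_life_seconds(22.0), half_life_seconds(30.0))
```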
• Reactions in organic chemistry, review
– Reaction Classification
• Specific chemicals
• Compatible functional groups
• Reactant counts
• Bond rearrangement patterns
• Functional classification
• Mechanism based
More Informative
16
• Specific chemicals
– acetic acid + methanamine → N-methylacetamide + water
[Figure: CH3C(=O)OH + H2N-CH3 → CH3C(=O)NH-CH3 + H2O]
• Compatible functional groups
– carboxylic acid + primary amine → amide + water
[Figure: R1-C(=O)OH + H2N-R2 → R1-C(=O)NH-R2 + H2O]
17
• Reactant counts
– Substitution: Δn = 0
– Addition: Δn < 0
– Elimination: Δn > 0
[Figure: one example reaction per class]
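Classification by reactant count can be sketched with reaction SMILES, a notation this deck introduces later under chemoinformatics methods. The function names below are hypothetical. Counting molecules alone cannot tell a substitution from, say, a rearrangement, so this is only a first-pass label.

```python
def count_molecules(side):
    """Number of '.'-separated molecules on one side of a reaction SMILES."""
    return len([mol for mol in side.split(".") if mol])


def delta_n(reaction_smiles):
    """Change in molecule count: products minus reactants (agents ignored)."""
    reactants, _agents, products = reaction_smiles.split(">")
    return count_molecules(products) - count_molecules(reactants)


def classify_by_count(reaction_smiles):
    dn = delta_n(reaction_smiles)
    if dn == 0:
        return "substitution"
    return "addition" if dn < 0 else "elimination"


print(classify_by_count("CC(=O)O.CN>>CC(=O)NC.O"))     # acid + amine
print(classify_by_count("CC=C(C)C.Br>>CCC(Br)(C)C"))   # alkene + HBr
print(classify_by_count("CCC(Br)(C)C>>CC=C(C)C.Br"))   # loss of HBr
```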
18
• Bond rearrangement patterns
– 4 atom bond swap covers ~50% of organic reactions
[Figure: generic four-atom bond swap among A, B, C, D, with an example involving C-Cl and N-H bonds]
19
• Bond rearrangement patterns
– 6 atom cyclic rearrangement covers ~25%
[Figure: generic six-atom cyclic rearrangement among A, B, C, D, E, F, with an example]
20
• Functional classification
– Acid-catalyzed, Electrophilic
– Base-catalyzed, Nucleophilic
– Oxidation-Reduction
– Free-radical
– Etc.
[Figure: example reactions, including reagents HNO3 with H2SO4 and heat, and Na2Cr2O7 with H2SO4]
21
• Mechanism-based
– Sn1
– Sn2
– E1
– E2
– etc.
[Figure: curved-arrow mechanisms for HO⁻ reacting with an alkyl chloride, releasing Cl⁻]
– Most informative classification patterns, but
• Reaction mechanisms often unknown
• Mechanisms cannot be directly observed, can only be proposed and supported with experimental evidence
22
• Reactions in organic chemistry, review
• Chemoinformatics methods
• Applications of reaction chemoinformatics to reaction chemistry problems
• Review questions
• Synthesis design (retrosynthesis)
23
• DB : Record and classify all reactions, including:
– Reactants and products
– Reaction conditions, catalysts, solvents, etc.
– Literature references, lab notes, etc.
• Search : Ability to query for information on all reactions that
• Use an epoxide reactant
• Produce an aromatic ring
• Follow the Sn2 reaction mechanism
• Use copper as a catalyst
• Can be run at room temperature in aqueous solution
24
• Combinatorial Chemistry
– Given a collection of “building block” chemicals, combine them with reactions to produce a diverse set of new products
• Virtual Chemical Space
– Systems like ChemDB catalog all chemicals available for purchase from different vendors
– “RChemDB” would store or allow on-the-fly searching of all chemicals indirectly (but easily) available by applying reactions to directly available chemicals
25
• Given a mixture of reactants and reaction conditions, predict the major products
[Figure: reactants heated (Δ) with NaOMe → ?]
26
• Knowledge-based
– If a reaction database were available, predicting the course of a reaction could just be a matter of finding it (or an analog) in the database
• Knowledge-based limitations
– Requires construction of a database of many different known reaction profiles to achieve any degree of generalization
– A DB-driven approach would be unlikely to discern competing cases. For example,
• carboxylic acid + amine → amide
• carboxylic acid + alcohol → ester
• carboxylic acid + amino-alcohol → ?
27
• Principle-based
– Predict or derive reactions based on general principles of reactivity
– Much more flexible and powerful
– Entails the ability to discover new reaction profiles that may not be known in any DB
• Principle-based limitations
– Complex reactivity can be very difficult to predict
– Confounding factors of solvent effects, catalysts, etc.
28
• Series of reactions from starting reactants to form a pathway to the final product
[Figure: multi-step pathway: alkyne + H2 (Pd / CaCO3, quinoline) → alkene; alkene + HBr → alkyl bromide; alkyl bromide + Na⁺ ⁻C≡N → nitrile]
29
• Derive synthesis pathway given
– Starting reactant
– Target product
– Available reagents / reactions
[Figure: starting reactant → ? → ? → ? → target product]
30
• Derive synthesis pathway given
– Starting reactant pool
– Target product
– Available reagents / reactions
[Figure: Chemical Vendor Catalog → ? → ? → ? → target product]
31
• Reactions in organic chemistry, review
• Problems in reaction chemistry
• Chemoinformatics methods
• Review questions
• SMIRKS
– Quantum Mechanics
32
• Reaction SMILES
– Reaction equation denoted with delimiters
• “.” separates distinct molecules
• “>>” separates reactants from products
[Figure: alkyl bromide → alkene + HBr]
CCC(Br)(C)C>>CC=C(C)C.Br
33
• Reaction SMILES
– Catalyst, solvent or other chemicals may be added between the two “>” delimiters
– No natural space to specify non-molecular info such as temperature, yield %, etc.
[Figure: alkyl bromide → alkene + HBr, with C2H5O⁻ Na⁺ in C2H5OH at 70 °C]
CCC(Br)(C)C>CC[O-].[Na+].CCO>CC=C(C)C.Br
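The delimiter rules are simple enough to parse with plain string operations. The sketch below is a minimal illustration with a hypothetical function name. It only splits the string and does not validate that each piece is a legal SMILES.

```python
def parse_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' delimiters")

    def molecules(side):
        # '.' separates distinct molecules; an empty side yields an empty list.
        return [mol for mol in side.split(".") if mol]

    reactants, agents, products = (molecules(p) for p in parts)
    return {"reactants": reactants, "agents": agents, "products": products}


parsed = parse_reaction_smiles("CCC(Br)(C)C>CC[O-].[Na+].CCO>CC=C(C)C.Br")
print(parsed["reactants"], parsed["agents"], parsed["products"])
```

The same function handles the plain ">>" form, where the agents list comes back empty.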
34
• SMARTS
– “Regular expressions” for molecules
– SMILES are SMARTS strings, but
– SMARTS strings can describe more general matching criteria, such as
• Atom types
• Bond types
• Logical operators (and, or, not) http://www.daylight.com/dayhtml_tutorials/languages/smarts/
35
SMARTS      Description
*           Wildcard atom. Matches any atom
[C]         Aliphatic (non-aromatic) carbons
[c]         Aromatic carbons
[#6]        Any carbons (aliphatic or aromatic)
[CH3]       Terminal carbons (having exactly 3 hydrogens)
[+1]        Any atom with a formal charge of +1
[OX2]       Oxygen with exactly 2 total connections (implicit hydrogens count)
[!#1]       Any atom that is NOT hydrogen
[N,O,C-1]   Nitrogen OR oxygen OR (carbon with –1 charge)
[N,O;+1]    (Nitrogen OR oxygen) AND +1 charge
See http://www.daylight.com/dayhtml_tutorials/languages/smarts/ for complete rule list
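The operator precedence in the last two rows (implicit AND binds tighter than ",", which binds tighter than ";") can be illustrated with a toy evaluator. This is not a SMARTS implementation. It handles only a small subset of atom primitives, and the dictionary-based atom description is an assumption made for the example.

```python
import re

ATOMIC_NUMBER = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
                 "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53}
TOKEN = re.compile(r"!?(?:\*|#\d+|Cl|Br|[BCNOFPSI]|[cnos]|[+-]\d*|X\d+|H\d*)")


def _primitive(tok, atom):
    """Evaluate one un-negated primitive against an atom description."""
    if tok == "*":
        return True
    if tok[0] == "#":
        return ATOMIC_NUMBER[atom["symbol"]] == int(tok[1:])
    if tok[0] in "+-":
        size = int(tok[1:] or 1)
        return atom["charge"] == (size if tok[0] == "+" else -size)
    if tok[0] == "X":
        return atom["connections"] == int(tok[1:])
    if tok[0] == "H":
        return atom["hcount"] == int(tok[1:] or 1)
    if tok.islower():  # lowercase symbol: aromatic atom
        return atom["aromatic"] and atom["symbol"].lower() == tok
    return (not atom["aromatic"]) and atom["symbol"] == tok


def _and_group(part, atom):
    """Implicit (high-precedence) AND of adjacent primitives."""
    part = part.replace("&", "")
    tokens = TOKEN.findall(part)
    if "".join(tokens) != part:
        raise ValueError("unsupported primitive in " + repr(part))
    return all(_primitive(t.lstrip("!"), atom) != t.startswith("!") for t in tokens)


def atom_matches(expr, atom):
    """';' = low-precedence AND, ',' = OR, adjacency = high-precedence AND."""
    body = expr.strip("[]")
    return all(any(_and_group(alt, atom) for alt in group.split(","))
               for group in body.split(";"))


ammonium_n = {"symbol": "N", "aromatic": False, "charge": 1, "connections": 4, "hcount": 4}
water_o = {"symbol": "O", "aromatic": False, "charge": 0, "connections": 2, "hcount": 2}
print(atom_matches("[N,O;+1]", ammonium_n), atom_matches("[N,O;+1]", water_o))
print(atom_matches("[N,O,C-1]", water_o))
```

A neutral oxygen passes [N,O,C-1] because the charge test applies only to the carbon alternative, but fails [N,O;+1], where the charge test applies to the whole OR group.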
36
SMARTS             Description
[CH3]C(=O)[OH]     Acetic acid
*C(=O)[OH]         Any carboxylic acid
C(=O)O             Any carboxylic acid or ester
C(=O)[F,Cl,Br,I]   Any acid halide
[C+,B;X3]          Carbocation or neutral boron
37
• SMIRKS
– Reaction profile describing reactants and how to transform them into respective products
– Combination of
• Reaction SMILES
• SMARTS
• Atom Mapping
– Generally must be manually specified. Limited work done to automatically derive reaction profile from specific examples
http://www.daylight.com/dayhtml_tutorials/languages/smirks/
38
• Atom Mapping
– Necessary to map reactant to product atoms
– Proper transform requires balanced stoichiometry
• Hydrogens generally must be explicitly specified
[Figure: atom-numbered scheme, Carboxylic acid + Primary amine → Amide + Water]
[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
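The balanced-stoichiometry requirement can be checked mechanically: every mapped atom on the reactant side should reappear on the product side with the same element. The sketch below is a minimal check with hypothetical function names. It assumes simple bracket atoms of the form [symbol:number], as in the transform above, and ignores unmapped atoms.

```python
import re

MAPPED_ATOM = re.compile(r"\[([A-Za-z*]+):(\d+)\]")


def atom_maps(side):
    """Return {map number: element symbol} for one side of a SMIRKS string."""
    return {int(num): symbol for symbol, num in MAPPED_ATOM.findall(side)}


def is_balanced(smirks):
    """True when both sides carry the same mapped atoms with the same symbols."""
    reactants, products = smirks.split(">>")
    return atom_maps(reactants) == atom_maps(products)


AMIDE_FORMATION = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)
print(is_balanced(AMIDE_FORMATION))
```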
39
• Atom mapping implies mechanism
– Two feasible mechanisms for reaction below
– Ambiguity without at least atom mapping
[Figure: HO⁻ + alkyl bromide → alcohol + Br⁻, drawn via two alternative pathways]
• Atom mapping still lacks a complete mechanistic description analogous to “curved arrow” diagram
40
• Capable of accurate predictions for
– Chemical reactivity
– Chemical stability
– Reaction favorability
• Requires significant computational power, infeasible for large scale processing
41
• Reactions in organic chemistry, review
• Problems in reaction chemistry
• Chemoinformatics methods to organic chemistry problems
• Review questions
– Reaction prediction / discovery
– Synthesis design (retrosynthesis)
42
• Storage
– Specific reactions can be recorded with reaction SMILES
– More general mechanistic reaction profiles can be stored with SMIRKS
• Retrieval
– Search by reactant or product is same as usual chemical structure search
– Search by bonds that change focuses on reaction centers to find similar classes
43
• Most repositories hold thousands of records, some may have millions
– CASREACT
– Beilstein
– ChemInform RX
– ChemReact
• Generally poor consistency and completeness of
– Balanced reaction stoichiometry
– Atom mapping / mechanistic description
– Reaction conditions, etc.
• Not publicly available or difficult to access
44
• Algorithm features needed
– Hypothesis generating scheme
– Thermodynamic scoring system
– Kinetic scoring system
– Known reactions database
[Figure: reactants heated (Δ) with NaOMe → ?]
45
• Find electron donors (nucleophile) and electron acceptors (electrophile) using rules and rank them
• Compute all possible intermediates
• Rank by Enthalpy (+ Entropy)
• Recurse
• Stopping rule (drop in delta G)
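The propose, rank and recurse loop can be sketched as a small depth-limited search. Everything here is a toy stand-in. The state names, the step table and the depth limit (used in place of the ΔG-based stopping rule) are assumptions, and the relative energies 0, +7 and -17.5 are borrowed from the acyl chloride example figure in this deck.

```python
def lowest_energy_state(state, propose, energy, depth=3, seen=None):
    """Return (energy, state) for the most stable state reachable within `depth` steps."""
    seen = set() if seen is None else seen
    seen.add(state)
    best = (energy(state), state)
    if depth == 0:
        return best
    # Rank candidate intermediates by energy, most stable first, then recurse.
    for nxt in sorted(propose(state), key=energy):
        if nxt not in seen:
            best = min(best, lowest_energy_state(nxt, propose, energy, depth - 1, seen))
    return best


# Toy reaction network standing in for rule-generated intermediates.
STEPS = {
    "acyl chloride + HO-": ["tetrahedral intermediate"],
    "tetrahedral intermediate": ["carboxylic acid + Cl-"],
}
ENERGY = {
    "acyl chloride + HO-": 0.0,
    "tetrahedral intermediate": 7.0,
    "carboxylic acid + Cl-": -17.5,
}

result = lowest_energy_state("acyl chloride + HO-",
                             lambda s: STEPS.get(s, []), ENERGY.get)
print(result)
```

The search has to pass through the higher-energy intermediate to reach the product, which is why a purely greedy downhill rule would stop too early here.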
46
[Figure: H3C-C(=O)-Cl + HO⁻ (0) → tetrahedral intermediate (+7) → H3C-C(=O)-OH + Cl⁻ (−17.5)]
Blue: HOMOs / Nucleophiles
Red: LUMOs / Electrophiles
47
[Figure: candidate intermediates for an alkene with HBr and H2O, labeled with relative energies 0, +300, +315, +415, −25, +300, −30]
Blue: HOMOs / Nucleophiles
Red: LUMOs / Electrophiles
48
• Apply retro reactions towards available starting reactants
[Figure: retrosynthesis tree; branches end at nodes labeled Starting Material or Dead End]
49
• Retrosynthetic
– Interactive: LHASA, SECS
– Non-Interactive: SYNCHEM
• Forward: SST, CHIRON
• Formal: IGOR, WODCA, SYNGEN
• Reaction Prediction: CAMEO, EROS
Todd, M. H. (2004). "Computer-Aided Organic Synthesis." Chemical Society Reviews 34: 247-266.
50
Target Structure: Nothing directly similar in DB
1. Apply retro reaction (Retro Diels-Alder) to find possible components
2. Search DB for items similar to components
51
3. Reapply forward reaction (Forward Diels-Alder) to components to generate theoretical products that should be similar to the original target
4. 160 unique products resulted with similarity scores ranging in [0.247, 0.860], 14 with similarity score > 0.80
[Figure: target structure, its retro Diels-Alder components, similar DB hits, and the resulting products]
52
• Synergy between:
1. Chemical DB
2. Reaction DB
3. Reaction mechanism
4. Search algorithms (chemical and reactions)
– Address combinatorial challenges
53
54
• Discover reaction profiles by general principles
• Generic 4 atom reaction profile covers about 50% of all known organic reactions
[Figure: generic four-atom bond swap among A, B, C, D, with an example involving C-Cl and N-H bonds]
55
• Still, a screening or ranking method is needed to filter many unrealistic reactions proposed
[Figure: unrealistic products proposed by the generic profile]
• More sophisticated profiles are not covered without more knowledge-based profiles
– Diels-Alder
– Azide + Alkyne aromatic cyclization
[Figure: Diels-Alder and azide + alkyne cyclization examples]
56
• Thermodynamics
– ΔG = ΔH – TΔS (Enthalpy & Entropy contribute)
– Hess’ Law simplification
– $\Delta H_{reaction} = \sum(\text{BDE broken}) - \sum(\text{BDE formed})$
– BDE: Bond Dissociation Energy
• Standard lookup values (kcal/mol)
C-C : 83    C=O : 178
C=C : 146   C≡N : 213
C=N : 147   etc.
[Figure: example molecule with summed bond energies 83 + 178 + 83 + 213 = 557]
• Kinetics
– Much less data available
http://www.cem.msu.edu/~reusch/OrgPage/bndenrgy.htm
57
• More generalized, pseudo-mechanistic reaction modeling with the introduction of “intermediates”
• Model breaking a bond by separating charge, representing bond electrons moving to one atom
[Figure: bonds A-B and C-D each split into + and - charged intermediates, then rejoined as new bonds]
• Closing the intermediates is then just a matter of matching + and - charges
58
• Applying general electron-shifting rules on the intermediates provides significant power and chemically intuitive results
[Figure: electron-shifting example involving O-H, O⁻ and H⁺]
59
[Figure: azide + alkyne cyclization (substituents R1, R2, R3) via charged intermediates, −38.9 kcal / mol]
60
[Figure: all-carbon cyclization via charged intermediates, −40 kcal / mol]
61
• Rather than trying all possible bond rearrangement combinations, can use reactivity principles to predict
• For example, frontier molecular orbital theory can find the
– Highest Occupied Molecular Orbital (HOMO)
– Lowest Unoccupied Molecular Orbital (LUMO)
62
Components
Starting Reactants
Reagents w/ Reaction
Profiles
Synthesis Problem
63
• Synthesis problem generator
– Tutorial for students
– Test base for retro-synthesis algorithm
• Algorithm features needed
– Knowledge base of reactions
– Retro-reaction application
– Heuristic to guide search
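The three algorithm features can be sketched as a backward search over a retro-reaction table. This is a simplified illustration. The compound labels are placeholders echoing the alkyne → alkene → alkyl bromide → nitrile pathway shown earlier, each retro step is assumed to have a single precursor, and plain breadth-first order stands in for a real heuristic.

```python
from collections import deque


def retro_search(target, retro_rules, available, max_steps=4):
    """Search backwards from target; return a forward route (list) or None."""
    queue = deque([[target]])
    seen = {target}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in available:
            return path[::-1]  # reverse so the route reads start -> target
        if len(path) > max_steps:
            continue
        # Apply every known retro-reaction to the current compound.
        for precursor in retro_rules.get(current, []):
            if precursor not in seen:
                seen.add(precursor)
                queue.append(path + [precursor])
    return None


# Hypothetical knowledge base: product -> possible precursors.
RETRO_RULES = {
    "nitrile": ["alkyl bromide"],
    "alkyl bromide": ["alkene", "alcohol"],
    "alkene": ["alkyne"],
    "alcohol": [],  # dead end in this toy table
}

route = retro_search("nitrile", RETRO_RULES, available={"alkyne"})
print(route)
```

Replacing the deque with a priority queue keyed on a scoring function would be one way to add the guiding heuristic.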
64
• Reactions in organic chemistry, review
• Problems in reaction chemistry
• Chemoinformatics methods
• Applications of reaction chemoinformatics
• Review questions
65
• For each molecule, what is the most reactive (lone or bond) pair of electrons?
[Figure: molecules to evaluate; the last is a protonated amine (⁺NH3)]
Recall the relative order of molecular orbital reactivity
• n-orbitals (lone pairs) >
• π-orbitals (double / triple bonds) >
• σ-orbitals (single bonds)
Lone pairs win in general, though no lone pair is available in the last molecule (the nitrogen has already been protonated). In that case, the π-orbital (double bond) supersedes the σ-orbitals of all the single bonds
66
• For the reaction energy diagram, suppose A = B = 2.8 kcal / mol
• Would you expect the reaction to proceed at room temperature?
• At thermodynamic equilibrium, what ratio of products : reactants would you expect?
• Which of the following would shift the equilibrium closer to 50:50 ratio?
a. Adding a catalyst
b. Heating the reaction mixture
c. Raising the universal gas constant
d. None of the above
[Figure: energy diagram along the Reaction Coordinate, Reactant → Intermediate → Product, with energies A and B marked]
67
• For the reaction energy diagram, suppose A = B = 2.8 kcal / mol
• Yes, expect the reaction to proceed at room temperature because E_a = A < 22 kcal / mol
• At equilibrium, expect products : reactants ratio = K_eq ~ 100:1
– ΔG = B, and 10x K_eq ~ 1.4 kcal / mol
• Shifting the equilibrium ratio…
a. Adding a catalyst: No, this lowers E_a, but ΔG is unchanged. Free energy is a state function. Catalyst only accelerates reaction
b. Heating the reaction mixture: Yes, K_eq depends on ΔG and temperature. Higher temperature provides more energy to maintain less stable state
$K_{eq} = e^{-\Delta G^\circ / RT}$
[Figure: energy diagram along the Reaction Coordinate, Reactant → Intermediate → Product, with energies A and B marked]
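The answers can be checked numerically. The sketch below assumes ΔG° = -2.8 kcal / mol (B favoring products) and treats it as temperature-independent, which is a simplification. The function name and the two sample temperatures are illustrative choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal / (mol K)


def k_eq(delta_g_standard, temperature):
    """K_eq = exp(-dG°/RT), with dG° in kcal/mol and temperature in kelvin."""
    return math.exp(-delta_g_standard / (R_KCAL * temperature))


room = k_eq(-2.8, 298.15)  # on the order of 100:1
hot = k_eq(-2.8, 398.15)   # heating pulls the ratio toward 1:1
print(round(room), round(hot))
```

A catalyst does not appear anywhere in the expression, which matches answer (a): it changes the rate, not the ratio.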
68
• Using the provided bond dissociation energies
(BDE), which of the products do you predict is most likely for a reaction between the reactants?
Bond   BDE
H-H    104
C-C    83
C=C    146
C-O    85
C-H    99
O-H    111
O-O    35
[Figure: reactants and three candidate product sets]
69
• Using the provided bond dissociation energies
(BDE), which of the products do you predict is most likely for a reaction between the reactants?
Bond   BDE
H-H    104
C-C    83
C=C    146
C-O    85
C-H    99
O-H    111
O-O    35
[Figure: reactants and three candidate product sets]
(O-H + O-H) – (O-O + H-H) = (111 + 111) – (35 + 104) = +83
(O-H + C-C) – (C-O + C-H) = (111 + 83) – (85 + 99) = +10
(O-H + C=C) – (C-O + C-H + C-C) = (111 + 146) – (85 + 99 + 83) = -10
70
• Which reactions can NOT be classified into the 4 atom bond rearrangement pattern?
A-B + C-D → A-C + B-D
[Figure: five candidate reactions: acid + amine → amide + H2O; alkyl bromide → alkene + HBr (C2H5O⁻ Na⁺, C2H5OH, 70 °C); alkene + HBr → alkyl bromide; nitration with HO-NO2 (H2SO4, heat) releasing H2O; alcohol oxidation with Na2Cr2O7 / H2SO4]
71
• What features in the reaction below can NOT be specified with reaction SMILES?
[Figure: 2 acetaldehyde → 3-hydroxybutanal, with 10% NaOH, H2O, 5 °C (50% yield)]
CC=O.CC=O>[Na]O.O>CC(O)CC=O
Could not specify “10%,” reaction temperature or yield
73
• For each SMARTS pattern, indicate which molecules it will find at least one match in.
#   SMARTS
1   C#C
2   C(=O)O
3   *C(=O)[OH]
4   C(=O)[F,Cl,Br,I]
5   [#8X1]
6   [X3]=[!O]
7   [c]
[Figure: candidate molecules]
74
• For each SMARTS pattern, indicate which molecules it will find at least one match in.
#   SMARTS
1   C#C
2   C(=O)O
3   *C(=O)[OH]
4   C(=O)[F,Cl,Br,I]
5   [#8X1]
6   [X3]=[!O]
7   [c]
[Figure: candidate molecules, each annotated with the numbers of the patterns that match it]
75
• Apply each SMIRKS string to the respective starting reactants below to generate a product
[C:1]=[C:2].[H:3][Br:4]>>[H:3][C:1][C:2][Br:4]
Hydrobromination, Alkene
[C:1]#[C:2].[H:3][H:4]>>[H:3][C:1]=[C:2][H:4]
Hydrogenation, Alkyne
[C:1]=[C:2].[H:3][H:4]>>[H:3][C:1][C:2][H:4]
Hydrogenation, Alkene
[H:3][C:1][C:2][O:4][H:5]>>[C:1]=[C:2].[H:3][O:4][H:5]
Dehydration
[Figure: starting reactants and resulting products for each transform]
No reaction! Reactant does not match the SMIRKS reactant pattern. No [H:3] attached to [C:1]
76
• Using the SMIRKS defined reactions and starting materials in this and the previous slide, come up with a synthesis pathway for the boxed target molecule
[Figure: boxed target molecule and available starting materials]
77
[Figure: solution pathway from the Available Starting Material to the target, with steps labeled Hydrogenation, Alkyne (+ H2), Dehydration (+ H2O) and Halogenation (+ 2 HBr)]
78
• Enthalpy determination
– ΔH_f: “Heat of formation.” State function indicating the heat / energy produced accompanying formation of a substance from its constituent elements in standard states (room temperature, 1 atmosphere)
Formation equation for carbon dioxide:
C(solid, graphite) + O2(gas) → CO2(gas)
– Only relative values have meaning, “constituent elements in standard state” is an arbitrary zero point
79