Molecular Similarity and Molecular Structure

N. Sukumar
ISPC, San Francisco, Aug. 2007
Why should molecules have Structure?
“The idea that molecules are microscopic,
material bodies with more or less well-defined
shapes has been fundamental to the
development of our understanding of the
physicochemical properties of matter, and it is
now so familiar and deeply ingrained in our
thinking that it is usually taken for granted - it is
the central dogma of chemistry.”
— R. G. Woolley (Woolley 1980)
What do we mean by
Molecular Structure?
Ø The notion that a molecule has structure is fundamental to much of
chemistry as practiced today. But what do we really mean by this
term?
Ø There are several ways to envision molecular structure, some more
general and others fairly concrete, but rather restrictive.
Ø As an example of the latter, we can think of molecular structure in
terms of familiar ball-and-stick molecular models. Such models are
simple to visualize and are intuitively appealing. But by confining our
conception to such models, we risk imposing a classical,
mechanical, vision upon an intrinsically microscopic quantum world.
Ø From a philosophical perspective, we can define structure as that
property of a molecule by virtue of which it occupies space in the
real world.
Ø From a statistical perspective, we can define structure as that which
distinguishes an object from a heap of its parts, in this case, a
molecule from a collection of its constituent atoms.
Molecular Structure
Ø This statistical definition generalizes the concept of
molecular structure to situations where the relative
spatial locations of the constituent atoms may not be
known and makes the link to the fundamental statistics
of the constituent particles.
Ø Most modern molecular structure determinations are
indirect, utilizing a transformation from momentum space
or frequency domain.
Ø Mathematically, structure is measured by the interparticle distribution function. Thus an ideal gas of atoms
has minimal structure, a hydrogen-bonded liquid is more
structured and a crystal or molecular solid even more so.
Ø The familiar ball-and-stick molecular models are thus the
rigid limit of a hierarchy of structures.
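The contrast between an unstructured gas and a crystal can be illustrated with a pair-distance histogram, a crude stand-in for the interparticle distribution function. This is a minimal sketch under stated assumptions: the one-dimensional periodic box, the particle count, the bin width and the helper names are all illustrative choices, not taken from the talk.

```python
import random

def pair_distance_histogram(positions, box_length, bin_width):
    """Histogram of pair separations on a 1-D periodic line (minimum image)."""
    n_bins = int((box_length / 2.0) / bin_width)
    counts = [0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = abs(positions[i] - positions[j])
            d = min(d, box_length - d)              # minimum-image separation
            k = min(int(d / bin_width), n_bins - 1)  # clamp the largest distance
            counts[k] += 1
    return counts

def occupied_fraction(histogram):
    """Fraction of distance bins that contain at least one pair."""
    return sum(1 for c in histogram if c > 0) / len(histogram)

random.seed(0)
N_ATOMS, BOX, BIN_WIDTH = 100, 100.0, 0.25   # assumed toy parameters

gas = [random.uniform(0.0, BOX) for _ in range(N_ATOMS)]  # uncorrelated positions
crystal = [float(i) for i in range(N_ATOMS)]              # perfect lattice, spacing 1

gas_hist = pair_distance_histogram(gas, BOX, BIN_WIDTH)
crystal_hist = pair_distance_histogram(crystal, BOX, BIN_WIDTH)

print("gas:    ", occupied_fraction(gas_hist))
print("crystal:", occupied_fraction(crystal_hist))
```

In the gas, pair separations spread over essentially every bin; in the lattice they pile up at multiples of the lattice spacing, so most bins stay empty. The sharper the peaks, the more structure the distribution carries.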
Hierarchy of Molecular Structure Representations
Molecular Structure and
Shannon information entropy
Ø The Shannon information entropy is a maximum for a
uniform distribution.
Ø Deviations from this uniformity may be attributed to
structure.
Ø Electron-nuclear forces add structure to an electron
distribution, thereby lowering the entropy.
Ø Electron repulsion forces broaden the distribution and
hence raise the entropy.
Ø A decrease of Shannon information entropy is due to
the dominant role of the attractive forces exerted by
the nuclei in imparting structure to the electron
distribution in a molecular system.
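The entropy argument can be checked numerically on a discrete grid. The sketch below is illustrative only: the 64-point grid and the exponential "cusp" profiles are assumed toy distributions standing in for an electron density that is pulled toward a nucleus (narrow) or spread out by repulsion (broad).

```python
import math

def shannon_entropy(weights):
    """Shannon entropy -sum(p ln p) of a normalized, non-negative distribution."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)

N_POINTS = 64  # assumed grid size

uniform = [1.0] * N_POINTS
# Narrow cusp: strong attraction toward a centre at the middle of the grid.
peaked = [math.exp(-abs(i - N_POINTS / 2) / 4.0) for i in range(N_POINTS)]
# Broader cusp: the same shape, spread out over twice the width.
broadened = [math.exp(-abs(i - N_POINTS / 2) / 8.0) for i in range(N_POINTS)]

s_uniform = shannon_entropy(uniform)
s_peaked = shannon_entropy(peaked)
s_broadened = shannon_entropy(broadened)

print(s_peaked, s_broadened, s_uniform)
```

The uniform distribution reaches the maximum value ln(64); the narrow cusp has the lowest entropy and the broadened one lies in between, which mirrors the ordering described on the slide.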
Molecular structure in the
Born-Oppenheimer approximation
• The BO separation of electronic and nuclear motions in molecules
shows that there must exist molecular states which can be
approximately represented as products of electronic and nuclear
functions.
• The electronic structure problem then involves solving for the
eigenfunctions of an electronic Hamiltonian, while the nuclear
function satisfies an equation of motion, with the eigenvalues of the
electronic Hamiltonian forming an effective potential energy surface
upon which the nuclei may be envisioned to move.
• The distinct concepts of electronic structure and molecular structure
are thus intimately related.
• This is, of course, not accidental: as Hohenberg and Kohn showed
in 1964, there exists a unique mapping between the potential v(r)
due to the nuclei and the distribution of electron density ρ(r).
• Since ρ(r) determines the number of electrons
N = ∫ρ(r) dr,
ρ(r) also uniquely determines the ground state wave function ψ, the
ground state electronic energy and the molecular structure.
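As a small worked check of N = ∫ρ(r) dr, the ground-state density of the hydrogen atom, ρ(r) = exp(-2r)/π in atomic units, should integrate to one electron. The hydrogen atom is chosen here only as a convenient example with a known closed form; the integration cutoff and step count are assumptions of this sketch.

```python
import math

def hydrogen_1s_density(r):
    """Ground-state electron density of the hydrogen atom (atomic units)."""
    return math.exp(-2.0 * r) / math.pi

def electron_count(density, r_max=20.0, n_steps=20000):
    """Trapezoidal estimate of N = integral of 4*pi*r^2 * rho(r) dr on [0, r_max]."""
    h = r_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        r = k * h
        weight = 0.5 if k in (0, n_steps) else 1.0   # trapezoid end-point weights
        total += weight * 4.0 * math.pi * r * r * density(r)
    return total * h

n_electrons = electron_count(hydrogen_1s_density)
print(n_electrons)
```

The radial form of the integral is used because the density is spherically symmetric; for a molecule the same count would require a full three-dimensional quadrature.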
Electron density envelopes for
Ethylene
ρ = 0.002 e/Bohr³
ρ = 0.20 e/Bohr³
ρ = 0.36 e/Bohr³
Electron density profiles of
ethylene
Molecular structure and bond paths
“Will you reflect for a moment on some of the things that I
have been saying? I described a bond, a normal simple
chemical bond; and I gave many details of its character
(and could have given many more). Sometimes it seems
to me that a bond between two atoms has become so
real, so tangible, so friendly that I can almost see it. And
then I awake with a little shock; for a chemical bond is
not a real thing: it does not exist: no one has ever seen
it, no one ever can. It is a figment of our own
imagination.”
— C. A. Coulson (Coulson 1951; Coulson 1955)
Molecular structure in the Quantum
Theory of Atoms in Molecules
• The virial partitioning of molecular systems into roughly neutral
subsystems forms the basis of the Quantum Theory of Atoms in
Molecules, providing a rigorous and unambiguous recipe for
partitioning a molecule into atomic subsystems.
• In this formulation, the nuclei function as attractors of the electron
density field ρ(r), the atom being defined as the union of an attractor
and its basin of attraction.
• Each atom thus contains one and only one nucleus, with the
gradient paths of the electron density (∇ρ) being employed to define
the bonds between atoms as well as the interatomic boundaries: the
bond path between any two atoms is defined as the unique gradient
path of ∇ρ connecting the respective nuclei, while the interatomic
surface is defined through the zero-flux criterion:
∇ρ · n = 0
where n is the unit vector normal to the surface.
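The idea of attractors and basins can be sketched in one dimension. The example below is a toy model, not a QTAIM calculation: the density is an assumed sum of two exponential cusps on a line, the grid is arbitrary, and in one dimension the zero-flux "surface" reduces to the single point where the density gradient vanishes between the two nuclei.

```python
import math

NUCLEI = (-1.0, 1.0)   # assumed nuclear positions on the x axis (arbitrary units)

def rho(x):
    """Toy density: one exponential cusp per nucleus (an assumption)."""
    return sum(math.exp(-2.0 * abs(x - xa)) for xa in NUCLEI)

# Symmetric grid from -3 to 3 with spacing 0.01.
GRID = [(i - 300) / 100.0 for i in range(601)]
DENSITY = [rho(x) for x in GRID]

def attractor(i):
    """Follow the discrete gradient uphill from grid index i.

    Returns the index of the local maximum reached, or None if the start
    point sits exactly where the uphill direction is undefined (the
    one-dimensional analogue of the interatomic surface).
    """
    while True:
        left = DENSITY[i - 1] if i > 0 else -math.inf
        right = DENSITY[i + 1] if i < len(DENSITY) - 1 else -math.inf
        if DENSITY[i] >= left and DENSITY[i] >= right:
            return i                      # local maximum: a nucleus
        if left == right:
            return None                   # gradient vanishes: boundary point
        i = i - 1 if left > right else i + 1

basins = {}
for i in range(len(GRID)):
    basins.setdefault(attractor(i), []).append(GRID[i])

for key, points in basins.items():
    label = "boundary" if key is None else f"nucleus at x = {GRID[key]}"
    print(label, "->", len(points), "grid points")
```

Every grid point to the left of the midpoint climbs to the left nucleus and every point to the right climbs to the right one, so each "atom" is the union of one attractor and its basin, with the midpoint as the boundary.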
Chemical topology & Molecular graphs
• This partitioning scheme has a sound theoretical underpinning: the
zero-flux criterion ensures that each atomic subsystem satisfies the
virial theorem and thereby ensures the spatial additivity of the action
W = ∫L(t) dt
(where L is the Lagrangian), and of its variation, in accordance with
Schwinger’s principle of stationary action.
• It is through this principle that we are able to extend the formulation
of quantum mechanics to an open quantum subsystem, such as an
atom in a molecule.
• Through bond paths, we also recover the concept of chemical
bonds: the topology of the bond paths completely specifies the
molecular graph.
• This molecular graph is commonly referred to as the 2-D structure of
the molecule.
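A molecular graph of this kind is simply a set of atoms (vertices) joined by bond paths (edges). The sketch below writes out the graph for ethylene, the molecule used in the figures of this talk; the atom labels (C1, H1, ...) and the helper names are hypothetical, and the C=C double bond appears as a single edge because the graph records connectivity only.

```python
from collections import deque

# Bond paths of ethylene as (atom, atom) pairs; labels are illustrative.
BOND_PATHS = [
    ("C1", "C2"),
    ("C1", "H1"), ("C1", "H2"),
    ("C2", "H3"), ("C2", "H4"),
]

def molecular_graph(bond_paths):
    """Build an adjacency map {atom: set of bonded atoms} from bond paths."""
    graph = {}
    for a, b in bond_paths:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def is_connected(graph):
    """Breadth-first search: True if every atom is reachable from the first."""
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(graph)

ethylene = molecular_graph(BOND_PATHS)
for atom in sorted(ethylene):
    print(atom, "->", sorted(ethylene[atom]))
print("connected:", is_connected(ethylene))
```

Topological descriptors are computed from exactly this kind of adjacency information, with no reference to three-dimensional coordinates.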
Electron density contours,
gradient paths and bond paths
of ethylene
http://www.chemistry.mcmaster.ca/faculty/bader/aim/aim_1.html
Bond paths and
non-nuclear attractors
[Figure: bond paths in the Li2, Li4 and Li6 clusters, with a legend marking the non-nuclear attractors. Caption: No direct Li-Li bonds.]
Quantum Topology of Molecular
Structure and Change
Water
Umbilic
catastrophe
Structure and Conformation
• Conformational flexibility is a critical link
between structure, stability and function.
• Enzymes must be flexible enough to
mediate a reaction pathway, yet rigid
enough to achieve molecular recognition.
Transition-state theory
involves a rate-limiting step,
shown as an obligatory
thermodynamic barrier
Protein folding landscape
Theory and simulations show that energy landscapes
for protein folding are funnel-shaped and have no
apparent microscopic energetic or entropic barriers.
Schonbrun, Jack and Dill, Ken A. (2003) Proc. Natl. Acad. Sci. USA 100, 12678-12682
Encoding Structure : Descriptors
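[Figure: example inputs to be encoded: a small-molecule structure (atom labels O, N, N, Cl) and a DNA sequence, AAACCTCATAGGAAGCATACCAGGAATTACATCA…]
Structural Descriptors
Physicochemical Descriptors
Topological Descriptors
Geometrical Descriptors
Molecular Structures → Descriptors → Model → Property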
Molecular Representations
[Figure: alternative representations of a single molecule; recoverable atom labels: O, N, H3C, CH3.]
Chemistry space and Molecular Similarity
The figure depicts a cartoon representation of the relationship between the continuum of chemical space
(light blue) and the discrete areas of chemical space that are occupied by compounds with specific affinity for
biological molecules. Examples of such molecules are those from major gene families (shown in brown, with
specific gene families colour-coded as proteases (purple), lipophilic GPCRs (blue) and kinases (red)). The
independent intersection of compounds with drug-like properties, that is those in a region of chemical space
defined by the possession of absorption, distribution, metabolism and excretion properties consistent with
orally administered drugs — ADME space — is shown in green.
Christopher Lipinski & Andrew Hopkins, Nature 432, 855-861 (16 December 2004)
Molecular Similarity
Assessment: Motivation…
The Drug Discovery Pipeline
Distribution of drug potencies
The Interface of NIH and Drug Development
[Two timeline figures, labeled "Current" and "Proposed", showing the extent of public sector science along the drug development pipeline, with cumulative cost and probability of success overlaid. Both panels mark the points "Dedicated MedChem begins" and "Compound accepted into Development" and list the same stages and durations:]
• Target identification: indefinite
• Assay development: 1 yr
• Screening (HTS or otherwise): 1 yr
• Hit-to-Probe: 1 yr
• Lead Optimization, Toxicology: ~3 yrs
• Ph I (Safety): 1 yr
• Ph II (Dose finding, initial efficacy in patient pop.): 2 yrs
• Ph III (Efficacy and safety in large populations): ~3 yrs
• Regulatory review: 1.5 yrs
• Ph IV-V (Additional indications, Safety monitoring): indefinite
Model Applicability Domain Analysis
Poor Model Applicability
Good Model Applicability
Macrocycles – musky odor or not?
(C. Davidson and B. Lavine)
musk
non-musk
Nitroaromatics – musk or non-musk?
(C. Davidson and B. Lavine)
musk
non-musk
Descriptor Selection
• What features of a molecule are
related to the property of interest?
• What descriptors can capture that
information?
Molecular Structures → Descriptors → Model → Property
GA/PCA Results with TAE descriptors
(C. Davidson and B. Lavine) 7 selected features
•1—Nonmusk
•2—Musk
Results with Wavelet and PEST Descriptors
(C. Davidson and B. Lavine)
[Figure: 3D PC plot, Dim(9), with axes PC1 and PC2. Points are labeled 1 = Nonmusk and 2 = Musk.]
Nitroaromatics and macrocycles (B. Lavine)
[Figure: 3D PC plot, Dim(30), with axes PC1 and PC2. Legend: 1 = Macro Non-Musk, 2 = Macro Musk, 1 = Nitro Non-Musk, 2 = Nitro Musk.]
Assessment of Molecular Similarity
Assessment of Similarity
It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind
The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”
The Second, feeling of the tusk,
Cried, “Ho! what have we here
So very round and smooth and sharp?
To me ’tis mighty clear
This wonder of an Elephant
Is very like a spear!”
The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he, “the Elephant
Is very like a snake!”
The Fourth reached out an eager hand,
And felt about the knee.
“What most this wondrous beast is like
Is mighty plain,” quoth he;
“’Tis clear enough the Elephant
Is very like a tree!”
The Fifth, who chanced to touch the ear,
Said: “E’en the blindest man
Can tell what this resembles most;
Deny the fact who can
This marvel of an Elephant
Is very like a fan!”
The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”
And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
- John Godfrey Saxe (1816-1887)
Why there is No Salt in the Sea
— Joseph E. Earley
Foundations of Chemistry
(Springer Netherlands)
Volume 7, Number 1, Pages 85-102, January 2005
What, precisely, is 'salt'? It is a certain white, solid, crystalline, material, also
called sodium chloride. Does any of that solid white stuff exist in the sea? –
Clearly not. One can make salt from sea water easily enough, but that fact
does not establish that salt, as such, is present in brine. (Paper and ink can be
made into a novel – but no novel actually exists in a stack of blank paper
with a vial of ink close by.) When salt dissolves in water, what is present is
no longer 'salt' but rather a collection of hydrated sodium cations and chloride
anions, neither of which is precisely salt, nor is the collection. The aqueous
material in brine is also significantly different from pure water. Salt may be
considered to be present in seawater, but only in a more or less vague
'potential' way. Actually, there is no salt in the sea.
What about water in proteins?
• Our bodies are an aqueous environment: liquid water constitutes
one of the essential components of biological systems and it is difficult
to overstate the role of water in biological structure and function.
• Proteins crystallize with several units of H2O weakly bound to the rest
of the protein.
• H2O provides the thermodynamic driving force for proteins to fold and
self-assemble.
• It mediates not only tertiary and quaternary interactions, but also
interactions between different biomolecules, and between
biomolecules and ligands or surfaces.
• H2O molecules are also known to take part in specific enzymatic
reactions.
• Protein conformational dynamics appear to be linked (or slaved) to the
dynamics of vicinal H2O, thereby affecting protein function.
• H2O in the vicinity of proteins and other biomolecules critically
influences protein structure, dynamics, function and other
thermodynamic and kinetic properties.
pH-Sensitive Protein Surface
Electrostatic Potential Maps
1POC EP pH 3.0
1POC EP pH 4.0
1POC EP pH 5.0
1POC EP pH 6.0
1POC EP pH 7.0
1POC EP pH 8.0
DNA Binding Complex with 1CGP
Representations of DNA Structure
Can we improve on ATCG?
• Most bioinformatic methods represent DNA by a sequence of letters
• DNA bases are assumed to act independently
• This representation of DNA has little to do with the energetics of
binding of protein to DNA
Dixel approach
• Characterization of DNA through features of electron densities on
the surfaces of the major and minor grooves of the DNA
• The central base pair resides in the specific electronic environment
generated by the flanking base pairs
DNA Nucleotide Triplets as DIXELS
A “basis set” of all
possible nucleotide
base pairs with all
possible neighbors
results in a set of
base pair “triplets”.
Ab Initio properties
of base pair and
two flanking base
pairs (end capped)
are computed.
Central base pair is
encoded and stored
as a “DIXEL” object.
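The triplet "basis set" can be enumerated directly. The sketch below only lists the triplets and slides a three-base window along a sequence; it does not compute or store any ab initio properties, and the function name and the single-strand 5'→3' reading are assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"

# Every central base with every possible pair of neighbors: 4**3 triplets,
# read along one strand (the partner strand is implied by base pairing).
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

def triplet_windows(sequence):
    """Return (central base, triplet) for each interior position of a sequence."""
    return [
        (sequence[i], sequence[i - 1:i + 2])
        for i in range(1, len(sequence) - 1)
    ]

print(len(TRIPLETS))
for central, triplet in triplet_windows("AAACCTCATAGG"):
    print(central, triplet)
```

Each window could then serve as the key for looking up precomputed properties of its central base pair, which is how a sequence of letters would be mapped onto a sequence of DIXEL-like objects.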
Base pair properties perturbed by
flanking base pairs
Challenges in Molecular
Similarity Assessment
“First there are the known knowns”
—These are the things that we know we know
“Then there are the known unknowns”
—These are the things that we now know we do not know
“Finally there are also the unknown unknowns”
—These are the things that we do not yet know we do not know
“And each day brings us a few more unknown unknowns”
—Donald Rumsfeld, 2003