Molecular graph

Milano Chemometrics and QSAR Research Group
Roberto Todeschini
Viviana Consonni
Manuela Pavan
Andrea Mauri
Davide Ballabio
Alberto Manganaro
chemometrics
molecular descriptors
QSAR
multicriteria decision making
environmetrics
experimental design
artificial neural networks
statistical process control
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/
Roberto Todeschini
Milano Chemometrics and QSAR Research Group
Molecular descriptors
Constitutional descriptors and graph invariants
Iran - February 2009
Content
• Counting descriptors
• Empirical descriptors
• Fragment descriptors
• Molecular graphs
• Topological descriptors
Counting descriptors
Each descriptor represents the number of elements of
some defined chemical quantity.
For example:
- the number of atoms or bonds
- the number of carbon or chlorine atoms
- the number of OH or C=O functional groups
- the number of benzene rings
- the number of defined molecular fragments
Counting descriptors
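The sum of an atomic or bond property over the molecule, as well as its average, is also considered a count descriptor:
$MW = \sum_{i=1}^{A} m_i$
$AMW = MW / A$
$P = \sum_{i=1}^{A} w_i$
where A is the number of atoms, m_i the atomic mass and w_i a generic atomic property of the i-th atom.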
For example:
- molecular weight and average molecular weight
- sum of the atomic electronegativities
- sum of the atomic polarizabilities
- sum of the bond orders
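The sums above can be sketched in a few lines of Python. This is only an illustration: the rounded atomic masses and the helper names `molecular_weight` and `average_molecular_weight` are assumptions made for the example, not part of the slides.

```python
# Rounded atomic masses, assumed here for illustration only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}


def molecular_weight(atoms):
    """MW: sum of the atomic masses m_i over all A atoms."""
    return sum(ATOMIC_MASS[symbol] for symbol in atoms)


def average_molecular_weight(atoms):
    """AMW = MW / A, with A the total number of atoms."""
    return molecular_weight(atoms) / len(atoms)


# Ethanol, C2H6O, written as a flat list of atom symbols.
ethanol = ["C"] * 2 + ["H"] * 6 + ["O"]

n_carbon = ethanol.count("C")            # a plain counting descriptor
mw = molecular_weight(ethanol)           # sum of an atomic property
amw = average_molecular_weight(ethanol)  # its average
print(n_carbon, round(mw, 3), round(amw, 3))
```

Any other atomic property (electronegativity, polarizability) could replace the mass table without changing the structure of the calculation.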
Counting descriptors
A counting descriptor n is a non-negative variable,
i.e. n ≥ 0
Its statistical distribution is usually a Poisson
distribution.
Main characteristics
• simple
• the most used
• local information
• high degeneracy
• discriminant modelling power
Empirical descriptors
Descriptors based on specific structural aspects
present in sets of congeneric compounds and
usually not applicable (or giving a single default
value) to compounds of different classes.
Empirical descriptors
Index of Taillander
Taillander et al., 1983
It is a descriptor dedicated to the modelling of the
benzene rings and is defined as the sum of the six
lengths joining the adjacent substituent groups.
[Figure: benzene ring bearing the substituents H, H, Cl, CH3, H, H]
Empirical descriptors
Hydrophilicity index (Hy)
Todeschini et al., 1999
It is a descriptor dedicated to the modelling of
hydrophilicity and is based on a function of the counts of
hydrophilic groups (OH-, SH-, NH-, ...) and carbon atoms.
$$Hy = \frac{(1 + n_{Hy}) \cdot \log(1 + n_{Hy}) + n_C \cdot \left( \frac{1}{n} \cdot \log\frac{1}{n} \right) + \sqrt{\frac{n_{Hy}}{n^2}}}{\log(1 + n)}$$
nHy  number of hydrophilic groups
nC   number of carbon atoms
n    total number of non-hydrogen atoms
-1 ≤ Hy ≤ 3.64
Empirical descriptors
Compound                  nHy    nC      n      Hy
hydrogen peroxide           2     0      2    3.64
carbonic acid               2     1      3    3.48
water                       2     0      1    3.44
butanetetraol               4     4      8    3.30
propanetriol                3     3      6    2.54
ethanediol                  2     2      4    1.84
methanol                    1     1      2    1.40
ethanol                     1     2      3    0.71
decanediol                  2    10     12    0.52
propanol                    1     3      4    0.37
butanol                     1     4      5    0.17
pentanol                    1     5      6    0.03
methane                     0     1      1    0.00
nHy = 0 and nC = 0          0     0      N    0.00
decanol                     1    10     11   -0.28
ethane                      0     2      2   -0.63
pentane                     0     5      5   -0.90
decane                      0    10     10   -0.96
alkane with nC = 1000       0  1000   1000   -1.00
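A minimal sketch of the Hy calculation is given here for checking individual rows. It assumes the natural logarithm, since that choice reproduces tabulated values such as those of methanol, ethanediol and ethane; the function name `hydrophilicity_index` is hypothetical.

```python
import math


def hydrophilicity_index(n_hy, n_c, n):
    """Hy from the counts of hydrophilic groups (n_hy), carbon atoms (n_c)
    and non-hydrogen atoms (n). Natural logarithm is an assumption."""
    numerator = (
        (1 + n_hy) * math.log(1 + n_hy)
        + n_c * (1 / n) * math.log(1 / n)
        + math.sqrt(n_hy / n**2)
    )
    return numerator / math.log(1 + n)


# (n_hy, n_c, n) taken from the table rows of the same name.
rows = {
    "methanol": (1, 1, 2),
    "ethanediol": (2, 2, 4),
    "butanetetraol": (4, 4, 8),
    "ethane": (0, 2, 2),
    "decanol": (1, 10, 11),
}
for name, counts in rows.items():
    print(f"{name:15s} {hydrophilicity_index(*counts):6.2f}")
```

For pure hydrocarbons only the carbon term survives, so Hy tends to -1 as the chain grows, in line with the lower bound stated for the index.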
Fragment approach
• Parametric approach (Hammett – Hansch, 1964)
• Substituent approach (Free-Wilson, Fujita-Ban, 1976)
• DARC-PELCO approach (Dubois, 1966)
• Sterimol approach (Verloop, 1976)
Fragment approach
The biological activity of a molecule is
the sum of its fragment properties
Congenericity principle
QSAR strategies can be applied ONLY to classes of
similar compounds
common reference skeleton
molecule properties gradually modified by substituents
Hansch approach
Corwin Hansch, 1964
Biological response = f1(L) + f2(E) + f3(S) + f4(M)
1. Lipophilic properties (L)
2. Electronic properties (E)
3. Steric properties (S)
4. Other molecular properties (M)
Hansch approach
1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D and conformational information
Free-Wilson approach
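[Figure: five congeneric molecules sharing a methyl-substituted (Me) skeleton with two substitution sites]
Site    mol.1   mol.2   mol.3   mol.4   mol.5
1       H       I       H       F       Br
2       H       H       I       F       F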
Free-Wilson approach
[Figure: the five molecules; substituents at position 1: H, I, H, F, Br; at position 2: H, H, I, F, F]
          Pos. 1           Pos. 2
          F    Br   I      F    Br   I
mol.1     0    0    0      0    0    0
mol.2     0    0    1      0    0    0
mol.3     0    0    0      0    0    1
mol.4     1    0    0      1    0    0
mol.5     0    1    0      1    0    0
Free-Wilson approach
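Free-Wilson, 1964
$$y_i = b_0 + \sum_{s=1}^{S} \sum_{k=1}^{N_s} b_{ks} \cdot I_{i,ks}$$
I_{i,ks}  absence/presence (0/1) of the k-th substituent in the s-th site of the i-th molecule
S is the number of substitution sites and N_s the number of substituents considered in the s-th site.
With two sites (Pos. 1, Pos. 2) and three substituents per site (k = 1: F, k = 2: Br, k = 3: I):
$$y_i = b_0 + b_{11} I_{i,11} + b_{21} I_{i,21} + b_{31} I_{i,31} + b_{12} I_{i,12} + b_{22} I_{i,22} + b_{32} I_{i,32}$$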
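The indicator coding used by the Free-Wilson model can be sketched as follows. The five molecules are those of the example, with H taken as the reference (all-zero) substituent; the helper names `indicator_row` and `free_wilson_response`, and any coefficient values passed to them, are hypothetical.

```python
SUBSTITUENTS = ["F", "Br", "I"]  # H is the reference level (all zeros)

# Substituents at (position 1, position 2) for the five example molecules.
MOLECULES = {
    "mol.1": ("H", "H"),
    "mol.2": ("I", "H"),
    "mol.3": ("H", "I"),
    "mol.4": ("F", "F"),
    "mol.5": ("Br", "F"),
}


def indicator_row(sites):
    """Indicator variables I_ks ordered as Pos. 1 (F, Br, I), Pos. 2 (F, Br, I)."""
    row = []
    for substituent in sites:
        row.extend(1 if substituent == s else 0 for s in SUBSTITUENTS)
    return row


def free_wilson_response(row, b0, coefficients):
    """y = b0 + sum of b_ks * I_ks (linear additive scheme)."""
    return b0 + sum(b * i for b, i in zip(coefficients, row))


for name, sites in MOLECULES.items():
    print(name, indicator_row(sites))
```

In practice the coefficients b_ks would be estimated by regression on measured responses; here they are left as inputs because the slides give no activity data.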
Fragment approach
Fingerprints
binary vector
1000101000000010000000
presence of a fragment
absence of a fragment
similarity searching
Molecular graph
[Figure: molecular graph with vertices numbered 1 to 7]
Molecular graph
Mathematical object defined as
G = (V, E)
set V: vertices (atoms)
set E: edges (bonds)
[Figure: molecular graph with vertices numbered 1 to 7]
Molecular graph
Usually hydrogen atoms are not considered
in the molecular graph:
H-depleted molecular graph
Molecular graph
A walk in G is a sequence of vertices
w = (v1, v2, v3, ..., vk) such that {vj, vj+1} ∈ E.
The length of a walk is the number of edges traversed by the walk.
A path in G is a walk without any repeated vertices.
The length of a path (v1, v2, v3, ..., vk+1) is k.
[Figure: example graph with vertices numbered 1 to 6]
v1 v2 v3 v2 v5: walk of length 4
v1 v2 v3 v4 v5: path of length 4
Molecular graph
The topological distance dij is the length of the shortest
path between the vertices vi and vj.
The detour distance Δij is the length of the longest path
between the vertices vi and vj.
[Figure: example graph with vertices numbered 1 to 6]
topological distance: d15 = 2
detour distance: Δ15 = 4
Molecular graph
A self-returning walk is a walk closed in itself, i.e. a
walk starting and ending on the same vertex.
A cycle is a walk with no repeated vertices other
than its first and last ones (v1 = vk).
[Figure: example graph with vertices numbered 1 to 6]
v1 v2 v3 v2 v1: self-returning walk of length 4
v2 v3 v4 v5 v2: cycle of length 4
Molecular graph
The molecular walk (path) count MWCk (MPCk) of order
k is the total number of walks (paths) of length k in the
molecular graph.
MWC0 = nSK (no. of atoms)
MWC1 = nBO (no. of bonds)
DRAGON: MWC1, MWC2, …, MWC10
• Molecular size
• Branching
• Graph complexity
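Walk counts can be obtained by repeatedly propagating counts along the edges, which is equivalent to summing the entries of powers of the adjacency matrix. The sketch below makes two assumptions that are not stated in the slides: for k ≥ 1 each walk is counted once regardless of direction (half the sum of the entries), which is consistent with MWC1 = nBO, and the example edge list is the H-depleted graph of 2,2,3-trimethylbutane.

```python
# Assumed example graph: 7 vertices, 6 edges (2,2,3-trimethylbutane skeleton).
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]


def neighbours(edges):
    """Adjacency list built from an undirected edge list."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    return adj


def molecular_walk_count(edges, k):
    """Walks of length k; for k >= 1 the two directions count as one walk
    (assumption chosen so that the order-1 count equals the bond count)."""
    adj = neighbours(edges)
    counts = {v: 1 for v in adj}  # walks of length 0 ending at each vertex
    for _ in range(k):
        counts = {v: sum(counts[u] for u in adj[v]) for v in adj}
    total = sum(counts.values())
    return total if k == 0 else total // 2


print([molecular_walk_count(EDGES, k) for k in range(4)])
```

The order-2 value equals half the sum of the squared vertex degrees, which shows how walk counts pick up branching as well as size.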
Molecular graph
The self-returning walk count SRWk of order k is the
total number of self-returning walks of length k in the
graph.
SRW1 = nSK
SRW2 = nBO
DRAGON: SRW1, SRW2, …, SRW10
The self-returning walk counts correspond to the spectral
moments of the adjacency matrix, i.e. linear combinations
of counts of certain fragments contained in the molecular
graph (embedding frequencies).
Molecular graph
Local vertex invariants (LOVIs) are quantities
associated to each vertex of a molecular graph.
Graph invariants are molecular descriptors
representing graph properties that are preserved by
isomorphism.
• characteristic polynomial
• derived from local vertex invariants
Molecular graph and more
Molecular graph → Topological matrix → Algebraic operator → Local Vertex Invariants → Graph invariants → Molecular descriptors
[Figure: classification of graph invariants]
From the molecular graph:
• topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ...
• topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ...
• topological information indices: total information content on ..., mean information content on ...
From the molecular geometry (x, y, z coordinates):
• topographic descriptors: 3D-Wiener index, 3D-Balaban index, D/D index, ...
Molecule graph invariants
Numerical chemical information extracted from
molecular graphs.
The mathematical representation of a molecular graph
is made by the topological matrices:
• adjacency matrix
• atom connectivity matrix
• distance matrix
• edge distance matrix
• incidence matrix
... more than 60 matrix representations of the
molecular structure
Local vertex invariants
Local vertex invariants (LOVIs) are quantities
associated to each vertex of a molecular graph.
Examples:
• atom vertex degree
• valence vertex degree
• sum of the vertex distance degree
• maximum vertex distance degree
Topological matrices
Adjacency matrix
Derived from a molecular graph, it represents the
whole set of connections between adjacent pairs of
atoms.
aij = 1 if atoms i and j are bonded, 0 otherwise
Topological matrices
Bond number B
It is the simplest graph invariant obtained from the
adjacency matrix.
It is the number of bonds in the molecular graph
calculated as:
$$B = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} a_{ij}$$
where aij is an entry of the adjacency matrix.
Local vertex invariants
atom vertex degree δi
It is the row sum of the vertex adjacency matrix.
[Figure: molecular graph with vertices numbered 1 to 7]
       1   2   3   4   5   6   7     δi
1      0   1   0   0   0   0   0      1
2      1   0   1   0   1   0   1      4
3      0   1   0   1   0   1   0      3
4      0   0   1   0   0   0   0      1
5      0   1   0   0   0   0   0      1
6      0   0   1   0   0   0   0      1
7      0   1   0   0   0   0   0      1
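The adjacency matrix, the vertex degrees and the bond number of the seven-vertex example can be reproduced with a short sketch. The edge list is read off the matrix above; the function names are hypothetical.

```python
# Edges read from the adjacency matrix of the seven-vertex example.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]
N_ATOMS = 7


def adjacency_matrix(edges, n_atoms):
    """Symmetric 0/1 matrix: a_ij = 1 if atoms i and j are bonded."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def vertex_degrees(a):
    """Vertex degree of each atom = row sum of the adjacency matrix."""
    return [sum(row) for row in a]


def bond_number(a):
    """B = half the sum of all adjacency matrix entries."""
    return sum(sum(row) for row in a) // 2


A = adjacency_matrix(EDGES, N_ATOMS)
print(vertex_degrees(A), bond_number(A))
```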
Local vertex invariants
valence vertex degree
for atoms of the 2nd principal quantum number
(C, N, O, F)
$\delta_i^v = Z_i^v - h_i$
Ziv  number of valence electrons of the i-th atom
hi   number of hydrogens bonded to the i-th atom
Local vertex invariants
valence vertex degree
the vertex degree of the i-th atom is the count
of edges incident with the i-th atom, i.e. the
count of σ bonds or σ electrons.
Local vertex invariants
valence vertex degree
for atoms with principal quantum number > 2
$$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$$
Zi  total number of electrons of the i-th atom (atomic number)
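Both definitions of the valence vertex degree can be covered by one small function, because for second-row atoms the denominator Z - Zv - 1 equals 1. The sketch below is illustrative; the function name and the example atoms are assumptions.

```python
def valence_vertex_degree(z_valence, n_hydrogens, z_total):
    """(Zv - h) / (Z - Zv - 1); for C, N, O, F the denominator is 1,
    so the expression reduces to Zv - h."""
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


examples = {
    "C in -CH3": (4, 3, 6),
    "O in -OH": (6, 1, 8),
    "Cl in -Cl": (7, 0, 17),
    "S in -SH": (6, 1, 16),
}
for label, args in examples.items():
    print(f"{label:10s} {valence_vertex_degree(*args):.3f}")
```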
Topological descriptors
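Zagreb indices (Gutman, 1975)
$M_1 = \sum_{a=1}^{A} \delta_a^2$
$M_2 = \sum_{b} (\delta_i \cdot \delta_j)_b$
δi vertex degree of the i-th atom; the sum in M2 runs over all bonds b, with i and j the two atoms forming the bond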
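As a worked illustration, the two Zagreb indices can be computed directly from an edge list. The graph used is the seven-vertex example of the adjacency matrix slide (an assumption about which molecule to illustrate); the function names are hypothetical.

```python
# Seven-vertex example graph (edges as in the adjacency matrix slide).
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]


def vertex_degrees(edges):
    """Map each vertex to the number of edges incident with it."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree


def zagreb_indices(edges):
    """M1: sum of squared vertex degrees; M2: sum over bonds of degree products."""
    degree = vertex_degrees(edges)
    m1 = sum(d * d for d in degree.values())
    m2 = sum(degree[i] * degree[j] for i, j in edges)
    return m1, m2


print(zagreb_indices(EDGES))
```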
Topological descriptors
Kier-Hall connectivity indices (1986)
They are based on molecular graph decomposition into
fragments (subgraphs) of different size and complexity and use
atom vertex degrees as subgraph weight.
Randic branching index (1975)
$\chi_R = {}^1\chi = \sum_{b} (\delta_i \cdot \delta_j)_b^{-1/2}$
$(\delta_i \cdot \delta_j)^{-1/2}$ is called edge connectivity
Topological descriptors
mean Randic branching index
$\bar{\chi}_R = \chi_R / B$
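The Randic index and its mean can be sketched as a sum of edge connectivities over the bonds. The example graph is again the seven-vertex graph of the adjacency matrix slide, and the function names are hypothetical.

```python
import math

# Seven-vertex example graph (edges as in the adjacency matrix slide).
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]


def randic_index(edges):
    """Sum over bonds of (degree_i * degree_j) ** -1/2."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


def mean_randic_index(edges):
    """Randic index divided by the number of bonds B."""
    return randic_index(edges) / len(edges)


print(round(randic_index(EDGES), 4), round(mean_randic_index(EDGES), 4))
```

Bonds between highly branched atoms contribute small terms, which is why the index decreases with branching among isomers.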
Topological descriptors
atom connectivity indices of m-th order
$^0\chi = \sum_{a} \delta_a^{-1/2}$
$^1\chi = \sum_{b} (\delta_i \cdot \delta_j)_b^{-1/2}$
$^2\chi = \sum_{k=1}^{^2P} (\delta_i \cdot \delta_l \cdot \delta_j)_k^{-1/2}$
$^m\chi_q = \sum_{k=1}^{^mP} \left( \prod_{a=1}^{n} \delta_a \right)_k^{-1/2}$
mP  number of m-th order paths
q   subgraph type (Path, Cluster, Path/Cluster, Chain)
n   n = m for Chain (Ring) subgraph type, n = m + 1 otherwise
The immediate bonding environment of each
atom is encoded by the subgraph weight.
The number of terms in the sum depends on
the molecular structure.
The connectivity indices show a good
capability of isomer discrimination and reflect
some features of molecular branching.
Topological descriptors
valence connectivity indices of m-th order
$^0\chi^v = \sum_{a} (\delta_a^v)^{-1/2}$
$^1\chi^v = \sum_{b} (\delta_i^v \cdot \delta_j^v)_b^{-1/2}$
$^2\chi^v = \sum_{k=1}^{^2P} (\delta_i^v \cdot \delta_l^v \cdot \delta_j^v)_k^{-1/2}$
$^m\chi_q^v = \sum_{k=1}^{^mP} \left( \prod_{a=1}^{n} \delta_a^v \right)_k^{-1/2}$
They encode atom identities
as well as the connectivities
in the molecular graph.
Topological descriptors
Kier-Hall electronegativity
$X_{KH} = \delta_i^v - \delta_i$
Kier-Hall relative electronegativity
$X_{KH} = \frac{\delta_i^v - \delta_i}{N^2}$
N  principal quantum number
electronegativity of carbon sp3 taken as zero
correlation with the Mulliken-Jaffe electronegativity:
$X_{MJ} = 1.99 \cdot (\delta_i^v - \delta_i) + 6.99$
$X_{MJ} = 7.99 \cdot \frac{\delta_i^v - \delta_i}{N^2} + 7.07$
Distance matrix
vertex distance degree si
It is the row sum of the vertex distance matrix.
The distance dij between two vertices is the smallest number
of edges between them.
[Figure: molecular graph with vertices numbered 1 to 7]
       1   2   3   4   5   6   7     si   ηi (eccentricity)
1      0   1   2   3   2   3   2     13    3
2      1   0   1   2   1   2   1      8    2
3      2   1   0   1   2   1   2      9    2
4      3   2   1   0   3   2   3     14    3
5      2   1   2   3   0   3   2     13    3
6      3   2   1   2   3   0   3     14    3
7      2   1   2   3   2   3   0     13    3
si is high for terminal vertices and low for central vertices
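The distance matrix, its row sums and the atom eccentricities can be reproduced with a breadth-first search from every vertex. This is a sketch that assumes a connected graph; the edge list corresponds to the seven-vertex example and the function name is hypothetical.

```python
from collections import deque

# Seven-vertex example graph (same numbering as the distance matrix).
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]
N_ATOMS = 7


def distance_matrix(edges, n_atoms):
    """Topological distances by breadth-first search (connected graph assumed)."""
    adj = {v: [] for v in range(1, n_atoms + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    matrix = []
    for start in range(1, n_atoms + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        matrix.append([dist[v] for v in range(1, n_atoms + 1)])
    return matrix


D = distance_matrix(EDGES, N_ATOMS)
row_sums = [sum(row) for row in D]       # vertex distance degrees
eccentricities = [max(row) for row in D]  # largest distance from each atom
print(row_sums, eccentricities)
```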
Local vertex invariants
The eccentricity ηi of the i-th atom is the maximum
distance dij between the atom i and any other
atom j
Topological descriptors
Petitjean shape index (1992)
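A simple shape descriptor
$I_{PJ} = \frac{D - R}{R}$
where the radius R and the diameter D are the minimum and the maximum atom eccentricity, respectively.
IPJ = 0 for strictly cyclic structures
IPJ = 1 for strictly acyclic structures with an even diameter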
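A minimal sketch of the Petitjean shape index, taking the atom eccentricities as input. The first list is the eccentricity column of the seven-vertex distance matrix example; the second (a six-membered ring, where every atom has eccentricity 3) and the third (a three-atom chain) are added as assumed illustrations of the two limiting cases.

```python
def petitjean_shape_index(eccentricities):
    """(D - R) / R with R the smallest and D the largest atom eccentricity."""
    radius = min(eccentricities)
    diameter = max(eccentricities)
    return (diameter - radius) / radius


branched_example = [3, 2, 2, 3, 3, 3, 3]  # seven-vertex example graph
six_ring = [3, 3, 3, 3, 3, 3]             # strictly cyclic structure
three_chain = [2, 1, 2]                   # acyclic, even diameter (D = 2)

print(petitjean_shape_index(branched_example))
print(petitjean_shape_index(six_ring))
print(petitjean_shape_index(three_chain))
```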
Topological descriptors
Wiener index (1947)
$W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} d_{ij}$
dij topological distances
$\bar{W} = \frac{2W}{A (A - 1)}$
high values for big molecules and for linear molecules
low values for small molecules and for branched or cyclic molecules
The average Wiener index is independent of the molecular size.
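The Wiener index and its average follow directly from a distance matrix. The sketch below uses the distance matrix of the seven-vertex example as literal input; the function names are hypothetical.

```python
# Distance matrix of the seven-vertex example graph.
DISTANCES = [
    [0, 1, 2, 3, 2, 3, 2],
    [1, 0, 1, 2, 1, 2, 1],
    [2, 1, 0, 1, 2, 1, 2],
    [3, 2, 1, 0, 3, 2, 3],
    [2, 1, 2, 3, 0, 3, 2],
    [3, 2, 1, 2, 3, 0, 3],
    [2, 1, 2, 3, 2, 3, 0],
]


def wiener_index(distances):
    """Half the sum of all entries of the (symmetric) distance matrix."""
    return sum(sum(row) for row in distances) // 2


def average_wiener_index(distances):
    """2W / (A (A - 1)): the mean distance over all atom pairs."""
    n_atoms = len(distances)
    return 2 * wiener_index(distances) / (n_atoms * (n_atoms - 1))


print(wiener_index(DISTANCES), average_wiener_index(DISTANCES))
```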
Topological descriptors
Balaban distance connectivity index (1982)
$J = \frac{B}{C + 1} \sum_{b} (s_i \cdot s_j)_b^{-0.5}$
$\bar{J} = \frac{B}{C + 1} \sum_{b} (\bar{s}_i \cdot \bar{s}_j)_b^{-0.5}$
$C = B - A + 1$
$\bar{s}_i = s_i / B$
A   number of atoms
B   number of bonds
C   number of cycles
si  sum of the i-th row distances
s̄i  average sum of the i-th row distances
one of the most discriminant indices
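The J index can be sketched from an edge list and the distance matrix row sums. The inputs below are those of the seven-vertex example (edges from its adjacency matrix, row sums from its distance matrix); the function name is hypothetical.

```python
import math

# Seven-vertex example: edges and distance-matrix row sums s_i per vertex.
EDGES = [(1, 2), (2, 3), (2, 5), (2, 7), (3, 4), (3, 6)]
ROW_SUMS = {1: 13, 2: 8, 3: 9, 4: 14, 5: 13, 6: 14, 7: 13}


def balaban_j(edges, row_sums):
    """J = B / (C + 1) * sum over bonds of (s_i * s_j) ** -0.5."""
    n_bonds = len(edges)
    n_atoms = len(row_sums)
    n_cycles = n_bonds - n_atoms + 1
    total = sum(1.0 / math.sqrt(row_sums[i] * row_sums[j]) for i, j in edges)
    return n_bonds / (n_cycles + 1) * total


print(round(balaban_j(EDGES, ROW_SUMS), 4))
```

Because each term depends on the distance sums of both bonded atoms, two isomers rarely share the same value, which is the source of the high discriminating power noted above.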
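Edge descriptors
[Figure: molecular graph with atoms numbered 1 to 7 and bonds labelled a to f]
Edge distance matrix, with row sums (Esi) and edge eccentricities (Eηi):
       a   b   c   d   e   f    Esi  Eηi
a      0   1   2   1   2   1     7    2
b      1   0   1   1   1   1     5    1
c      2   1   0   2   1   2     8    2
d      1   1   2   0   2   1     7    2
e      2   1   1   2   0   2     8    2
f      1   1   2   1   2   0     7    2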
Topographic descriptors
Some geometrical descriptors are derived from the
corresponding topological descriptors substituting
the topological distances dst by the geometrical
distances rst.
They are called topographic descriptors.
For example, the 3D-Wiener index:
$^{3D}W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} r_{ij}$
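Replacing topological distances with Euclidean ones only changes how the matrix is filled. The sketch below builds the matrix of geometrical distances rij from x, y, z coordinates and sums it; the three coordinate triples are made-up points chosen so the distances are easy to check, not a real molecular geometry.

```python
import math

# Hypothetical coordinates (arbitrary units), chosen for easy checking.
COORDINATES = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]


def geometry_matrix(coordinates):
    """Square symmetric matrix of Euclidean interatomic distances r_ij."""
    return [[math.dist(p, q) for q in coordinates] for p in coordinates]


def wiener_3d(coordinates):
    """Half the sum of all entries of the geometry matrix."""
    return sum(sum(row) for row in geometry_matrix(coordinates)) / 2


G = geometry_matrix(COORDINATES)
print(G[0][1], G[1][2], G[0][2], wiener_3d(COORDINATES))
```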
Molecular geometry
The geometry matrix G (or geometric distance matrix) is
a square symmetric matrix whose entry rst is the
geometric distance calculated as the Euclidean distance
between the atoms s and t:
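$$G = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1A} \\ r_{21} & 0 & \cdots & r_{2A} \\ \vdots & \vdots & \ddots & \vdots \\ r_{A1} & r_{A2} & \cdots & 0 \end{pmatrix}$$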
Website: michem.disat.unimib.it/chm/
THANK YOU
coffee break
Hansch approach
Hansch molecular descriptors
lipophilic properties            electronic properties     steric properties
partition coefficients           Hammett constants         molecular weight
  (logP, logKow)
chromatog. param. (Rf, RT)       molar refraction          VDW volume
Solubility                       dipole moment             molar volume
….                               HOMO, LUMO                surface area
                                 Ionization potential      ….
                                 ….