Exploration of Chemical Space by Molecular Morphing
David Hoksza (1), Daniel Svozil (2)
(1) SIRET Research Group, Department of Software Engineering, FMP, Charles University in Prague, Czech Republic
(2) Laboratory of Informatics and Chemistry, Institute of Chemical Technology, Prague, Czech Republic
Outline
• Overview and Motivation
• Chemical Space Exploration
o morphing operators
o molecule representation
o distance definition
o space exploration
• Experimental Evaluation
BIBE 2011
October 26, 2011
Chemical Space
• All possible organic compounds comprise a “chemical
space”
• Can be viewed as being analogous to the cosmological
universe in its vastness, with chemical compounds
populating space instead of stars
• Size
o Estimated size of the chemical space: 10^100-10^200 (SciFinder ~ 6 × 10^7)
o Around one sextillion (10^21) stars in the observable universe
o For example, there are more than 10^29 possible derivatives of n-hexane
o Chemical space is infinite for our purposes
• Not all theoretically postulated compounds fall within
the limits of what is synthetically feasible
Chemical Space Exploration - Motivation
• Motivation
o 2 ligands
General Algorithm
1. Generate n morphs from MS
2. Accept each morph with a probability given by its distance to MT
3. Accepted morphs form generation M1
4. For each morph Mi from M1, repeat from step 1 using MS = Mi
5. Finish when one of the morphs is identical to MT
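The five steps above can be sketched in Python as follows. This is only an illustrative toy, not the authors' implementation: molecules are stood in for by bit tuples, the morphing operator `toy_morph` simply flips one bit, the acceptance probability is assumed to be 1 minus the Tanimoto distance to MT (the slides do not give the exact formula), and the pruning to `max_kept` morphs per generation is an added assumption to keep the search bounded. All function and parameter names are hypothetical.

```python
import random


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two equal-length bit tuples."""
    common = sum(x & y for x, y in zip(a, b))
    union = sum(a) + sum(b) - common
    return 0.0 if union == 0 else 1.0 - common / union


def toy_morph(mol, rng):
    """Stand-in morphing operator (assumption): flip one random bit."""
    i = rng.randrange(len(mol))
    return mol[:i] + (1 - mol[i],) + mol[i + 1:]


def explore(start, target, n_morphs=20, max_generations=200, max_kept=10, seed=0):
    """Return the generation in which the target was reached, or None."""
    rng = random.Random(seed)
    generation = [start]
    for gen in range(1, max_generations + 1):
        accepted = []
        for parent in generation:              # step 4: every morph becomes a new MS
            for _ in range(n_morphs):          # step 1: generate n morphs
                morph = toy_morph(parent, rng)
                if morph == target:            # step 5: stop on an exact match
                    return gen
                # step 2 (assumed form): closer morphs are accepted more often
                if rng.random() < 1.0 - tanimoto_distance(morph, target):
                    accepted.append(morph)
        # step 3, plus an assumed pruning step that keeps the closest morphs
        accepted = sorted(set(accepted), key=lambda m: tanimoto_distance(m, target))
        generation = accepted[:max_kept] or generation
    return None


start = (1, 0, 0, 1, 0, 1, 0, 0)
target = (1, 1, 0, 0, 0, 1, 1, 0)
print(explore(start, target))
```

In a real setting the bit-flip would be replaced by chemically meaningful morphing operators and the distance would be computed on molecular fingerprints.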
Molecular Structure Representation
• Fragment-based representation
o The fragments present in a structure can be represented as a sequence of
0s and 1s
00010100010101000101010011110100
• 0 means fragment is not present in structure
• 1 means fragment is present in structure (perhaps multiple times)
o structural keys – fixed dictionary of fragments (1:1 relationship between bit and fragment; problem: a structure may contain no fragments from the dictionary)
o hashed fingerprints – the fragment description (e.g. C-C-N-C-O) is hashed to a position in, e.g., the range 1-1024, and that bit is set (problems: collisions; no direct way to work back from a bit position to the fragment)
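The two representations above can be contrasted with a minimal Python sketch. It assumes the fragments of a structure have already been enumerated as strings such as "C-C-N-C-O"; the function names, the use of SHA-1 as the hash, and the 0-based bit positions (0-1023 rather than 1-1024) are illustrative choices, not taken from the slides.

```python
import hashlib


def structural_key(fragments, dictionary):
    """One bit per dictionary fragment: 1 if the fragment occurs, else 0."""
    present = set(fragments)
    return [1 if frag in present else 0 for frag in dictionary]


def hashed_fingerprint(fragments, n_bits=1024):
    """Hash each fragment string to a bit position and set that bit."""
    bits = [0] * n_bits
    for frag in fragments:
        digest = hashlib.sha1(frag.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1  # repeated or colliding fragments set the same bit
    return bits


dictionary = ["C-C", "C-N", "C-O"]
print(structural_key(["C-C", "C-O", "C-C"], dictionary))
print(structural_key(["S-S"], dictionary))  # no dictionary fragment present
print(sum(hashed_fingerprint(["C-C-N-C-O", "C-C", "C-C"])))
```

The second call illustrates the structural-key problem (an all-zero key), while the hashed variant accepts any fragment but gives no way to tell from a set bit which fragment, or how many fragments, produced it.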
Molecular Structure Similarity
• Count the “on” bits in each molecule (A and B)
• Count the “on” bits common to both molecules (C)
struct A: 00010100010101000101010011110100 13 bits on (A)
struct B: 00000000100101001001000011100000 8 bits on (B)
A AND B:  00000000000101000001000011100000 6 bits on (C)
• Tanimoto similarity coefficient
$$\text{similarity} = \frac{C}{A + B - C} = \frac{6}{13 + 8 - 6} = 0.4$$
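The worked example above can be reproduced with a few lines of Python. This is a minimal sketch that takes fingerprints as strings of "0" and "1"; the function name `tanimoto` and the convention of returning 0.0 for two all-zero fingerprints are assumptions made for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity C / (A + B - C) for two equal-length bit strings."""
    a = bits_a.count("1")                       # "on" bits in the first molecule
    b = bits_b.count("1")                       # "on" bits in the second molecule
    c = sum(1 for x, y in zip(bits_a, bits_b)   # "on" bits common to both
            if x == "1" and y == "1")
    if a + b - c == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return c / (a + b - c)


struct_a = "00010100010101000101010011110100"
struct_b = "00000000100101001001000011100000"
print(tanimoto(struct_a, struct_b))  # 6 / (13 + 8 - 6) = 0.4
```

A distance for the exploration can then be taken as 1 minus this similarity.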
Morphing Operators - Path Example
[Figure: example morphing path from the start molecule MS to the target molecule MT]
Exploration Parameters
• cnt_max_iterations
• cnt_morphs
• cnt_morphs_det
• dist_det
• cnt_accept
• cnt_accept_max
• cnt_it_prune
• cnt_morphs_max
Evaluation - Datasets
• 3 datasets of start/target pairs from PubChem
• 20 pairs in each set
• 3 difficulty levels based on pair similarity
o representation of start and target structures by their PubChem
substructure fingerprints
o similarity quantified as the Tanimoto score
• D1 … 0.7 – 0.8 similarity
• D2 … 0.5 – 0.6 similarity
• D3 … 0.3 – 0.4 similarity
• time constraint – 8h
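As a small illustration of how a start/target pair would fall into one of the three datasets, the sketch below maps a Tanimoto similarity to a difficulty label. The function name is hypothetical, and treating the interval bounds as inclusive is an assumption; the slides only give the ranges.

```python
def difficulty_level(similarity):
    """Map a start/target Tanimoto similarity to D1, D2, D3, or None."""
    ranges = {
        "D1": (0.7, 0.8),  # most similar pairs, easiest
        "D2": (0.5, 0.6),
        "D3": (0.3, 0.4),  # least similar pairs, hardest
    }
    for label, (low, high) in ranges.items():
        if low <= similarity <= high:
            return label
    return None  # similarity falls in a gap between the datasets


print(difficulty_level(0.75))
print(difficulty_level(0.45))
```

Pairs whose similarity falls between the ranges (for example 0.45) belong to none of the datasets.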
Evaluation - Results
[Figure: results chart; values shown on the slide: 75%, 60%, 35%]
Molpher Student Project
• To start at the end of 2011
• Algorithm optimization
• Parallel processing
• Visualization
• Extensive Logging
Questions?