Exploration of Chemical Space by Molecular Morphing

David Hoksza¹, Daniel Svozil²
¹ SIRET Research Group, Department of Software Engineering, FMP, Charles University in Prague, Czech Republic
² Laboratory of Informatics and Chemistry, Institute of Chemical Technology, Prague, Czech Republic

Outline
• Overview and Motivation
• Chemical Space Exploration
  o morphing operators
  o molecule representation
  o distance definition
  o space exploration
• Experimental Evaluation

BIBE 2011 October 26, 2011

Chemical Space
• All possible organic compounds comprise a “chemical space”
• Can be viewed as analogous to the cosmological universe in its vastness, with chemical compounds populating the space instead of stars
• Size
  o Estimated size of the chemical space: 10^100 to 10^200 (SciFinder ~ 6·10^7)
  o Around one sextillion (10^21) stars in the observable universe
  o For example, there are more than 10^29 possible derivatives of n-hexane
  o Chemical space is infinite for our purposes
• Not all theoretically postulated compounds fall within the limits of what is synthetically feasible

Chemical Space Exploration - Motivation
• Motivation
  o 2 ligands

General Algorithm
1. Generate n morphs from MS
2. Accept each morph with a probability given by its distance to MT
3. Accepted morphs form generation M1
4. For each morph Mi from M1, repeat from step 1 using MS = Mi
5. Finish when one of the morphs is identical to MT

Molecular Structure Representation
• Fragment-based representation
  o The fragments present in a structure can be represented as a sequence of 0s and 1s:
    00010100010101000101010011110100
    • 0 means the fragment is not present in the structure
    • 1 means the fragment is present in the structure (perhaps multiple times)
  o structural keys: a fixed dictionary of fragments (1:1 relationship bit:fragment; problem: a structure may contain no fragments from the dictionary)
  o hashed fingerprints: the fragment description (C-C-N-C-O) is hashed to a bit position, e.g. in the range 1-1024, and that bit is set (problems: collisions; how to work back from a position to a fragment?)

Molecular Structure Similarity
• Count the “on” bits in both molecules
• Count the “on” bits in each molecule

  struct A: 00010100010101000101010011110100   13 bits on (A)
  struct B: 00000000100101001001000011100000    8 bits on (B)
  A AND B:  00000000000101000001000011100000    6 bits on (C)

• Tanimoto similarity coefficient

  \[ \text{similarity} = \frac{C}{A + B - C} = \frac{6}{13 + 8 - 6} = 0.4 \]

Morphing Operators
[Figure: morphing operators and an example path from MS to MT]

Exploration Parameters
• cnt_max_iterations
• cnt_morphs
• cnt_morphs_det
• dist_det
• cnt_accept
• cnt_accept_max
• cnt_it_prune
• cnt_morphs_max

Evaluation - Datasets
• 3 datasets of start/target pairs from PubChem
• 20 pairs in each set
• 3 difficulty levels based on pair similarity
  o start and target structures represented by their PubChem substructure fingerprints
  o similarity quantified as the Tanimoto score
• D1 … 0.7 – 0.8 similarity
• D2 … 0.5 – 0.6 similarity
• D3 … 0.3 – 0.4 similarity
• time constraint – 8h

Evaluation - Results
[Figure: results chart; values shown: 75%, 60%, 35%]

Molpher Student Project
• To start at the end of 2011
• Algorithm optimization
• Parallel processing
• Visualization
• Extensive logging

Questions?
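
The worked example on the Molecular Structure Similarity slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `on_bits` and `tanimoto` are illustrative and not taken from the talk, the fingerprints are handled as plain strings of 0s and 1s, and returning 1.0 for two all-zero fingerprints is an assumption made here to avoid dividing by zero.

```python
def on_bits(fingerprint: str) -> int:
    """Number of bits set to 1 in a fingerprint written as a 0/1 string."""
    return fingerprint.count("1")


def tanimoto(a: str, b: str) -> float:
    """Tanimoto similarity C / (A + B - C) of two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # C: positions where both fingerprints have the bit switched on.
    common = sum(1 for x, y in zip(a, b) if x == y == "1")
    denominator = on_bits(a) + on_bits(b) - common
    if denominator == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return common / denominator


# The two fingerprints from the slide.
STRUCT_A = "00010100010101000101010011110100"
STRUCT_B = "00000000100101001001000011100000"

if __name__ == "__main__":
    print(on_bits(STRUCT_A), on_bits(STRUCT_B), tanimoto(STRUCT_A, STRUCT_B))
```

With the slide's fingerprints the counts are A = 13, B = 8 and C = 6, so the function reproduces the similarity of 0.4 shown there.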
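
The five steps of the General Algorithm slide can also be sketched as a toy program. The version below rests on loud assumptions and is not the authors' implementation: molecules are stood in for by integer bit sets, the only "morphing operator" is a single random bit flip, the acceptance probability is taken to be the Tanimoto similarity to the target (the slide only says it is given by the distance), and the names `explore`, `n_morphs`, `max_iterations` and `max_kept` are hypothetical rather than the deck's exploration parameters.

```python
import random


def distance(a: int, b: int) -> float:
    """Tanimoto distance (1 - similarity) between two integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty bit sets are identical
    return 1.0 - bin(a & b).count("1") / union


def explore(start, target, n_bits, n_morphs=16, max_iterations=500,
            max_kept=32, seed=0):
    """Return a list of bit sets leading from start to target, or None."""
    rng = random.Random(seed)
    parent = {start: None}  # every accepted morph remembers its source
    generation = [start]
    for _ in range(max_iterations):
        if target in parent:  # step 5: a morph identical to the target
            break
        accepted = []
        for source in generation:  # step 4: each morph becomes a new source
            for _ in range(n_morphs):  # step 1: generate n morphs
                # Toy morphing operator: flip one randomly chosen bit.
                morph = source ^ (1 << rng.randrange(n_bits))
                if morph in parent:
                    continue
                # Step 2 (assumed rule): accept with probability equal to
                # the similarity to the target, i.e. 1 - distance.
                if rng.random() < 1.0 - distance(morph, target):
                    parent[morph] = source
                    accepted.append(morph)
        if accepted:  # step 3: accepted morphs form the next generation
            accepted.sort(key=lambda m: distance(m, target))
            generation = accepted[:max_kept]  # keep only the closest morphs
    if target not in parent:
        return None
    path, node = [], target
    while node is not None:  # walk back from the target to the start
        path.append(node)
        node = parent[node]
    return path[::-1]


if __name__ == "__main__":
    found = explore(0b00010100, 0b00011110, n_bits=8, seed=1)
    print(None if found is None else [format(m, "08b") for m in found])
```

Because acceptance is random, the search may return `None` when the iteration limit is reached; when it does return a path, consecutive entries differ by exactly one bit, which is the toy analogue of applying one morphing operator per step.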