Screening a Virtual Compound Space

Szabolcs Csepregi, Ferenc Csizmadia, Szilárd Dóránt, Nóra Máté, György Pirok, Zsuzsanna Szabó, Jenő Varga, Miklós Vargyas

ChemAxon Ltd.
Máramaros köz 3/a, 1037 Budapest, Hungary
www.chemaxon.com

Drug research
Finding or making a needle in the haystack?

virtual screening (JChem Screen)
advantages
• fast
• hits are readily available for in vitro screening
disadvantages
• limited number of available compounds

de novo design (JChem AnalogMaker)
advantages
• practically unlimited virtual compound space
• structural novelty
disadvantages
• synthetic accessibility of virtual hits is a problem

Virtual Screening
Find something similar to a fistful of needles
[Figure labels: corporate database, known actives, structures found]

Molecular similarity
How to tackle it?
Quantitative assessment of similarity/dissimilarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys
These are sequences (vectors) of bits or numeric values that can be compared with distance functions or similarity metrics.
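As a minimal sketch of how such bit-vector fingerprints can be compared, the Python below computes a Tanimoto coefficient and a Euclidean distance for two fingerprints written as strings of 0s and 1s. The function names `to_bits`, `tanimoto` and `euclidean`, and the convention of returning 1.0 for two all-zero fingerprints, are assumptions made for illustration; they are not part of JChem.

```python
from math import sqrt


def to_bits(fp: str) -> list[int]:
    """Turn a fingerprint string such as '0101' into a list of 0/1 ints."""
    return [int(ch) for ch in fp]


def tanimoto(x: str, y: str) -> float:
    """Bits set in both fingerprints divided by bits set in either one."""
    a, b = to_bits(x), to_bits(y)
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    common = sum(p & q for p, q in zip(a, b))
    either = sum(a) + sum(b) - common
    # Assumed convention: two all-zero fingerprints count as identical.
    return common / either if either else 1.0


def euclidean(x: str, y: str) -> float:
    """Square root of the summed squared differences, bit by bit."""
    a, b = to_bits(x), to_bits(y)
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    print(tanimoto("1100", "1010"))
    print(euclidean("1100", "1010"))
```

Tanimoto is a similarity (larger means more alike), while Euclidean is a distance (smaller means more alike), so a screening threshold has to be read in the matching direction.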
$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$

E is the Euclidean distance, T is the Tanimoto coefficient, and B(·) is the number of bits set.

Virtual screening using fingerprints
Multiple query structures
[Figure labels: queries, hypothesis fingerprint, metric, targets]

queries
0100010100011101010000110000101000010011000010100000000100100000
0001101110011101111110100000100010000110110110000000100110100000
0100010100110100010000000010000000010010000000100100001000101000
0101110100110101010111111000010000011111100010000100001000101000
0001000100010100010100100000000000001010000010000100000100000000
0100010100010100000000000000101000010010000000000100000000000000
0101010101111100111110100000000000011010100011100100001100101000
0100010100011000010000011000000000010001000000110000000001100000
0000000100000000010000100000000000001010100000000100000100100000

hypothesis fingerprint
0101110100110101010111111000010000011111100010000100001000101000

targets
0000000100001101000000101010000000000110000010000100001000001000
0100010110010010010110011010011100111101000000110000000110001000
0100010100011101010000110000101000010011000010100000000100100000
0001101110011101111110100000100010000110110110000000100110100000
0100010100110100010000000010000000010010000000100100001000101000
0100011100011101000100001011101100110110010010001101001100001000
0101110100110101010111111000010000011111100010000100001000101000
0100010100111101010000100010000000010010000010100100001000101000
0001000100010100010100100000000000001010000010000100000100000000
0100010100010011000000000000000000010100000010000000000000000000
0100010100010100000000000000101000010010000000000100000000000000
0101010101111100111110100000000000011010100011100100001100101000
0100010100011000010000011000000000010001000000110000000001100000
0000000100000000010000100000000000001010100000000100000100100000
0100010100010100000000100000000000010000000000000100001000011000
0001000100001100010010100000010100101011100010000100001000101000
0100011100010100010000100001001110010010000010001100000000101000
0101010100010100010100100000000000010010000010010100100100010000
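The screening scheme with multiple queries can be sketched as follows. The slides do not say how the hypothesis fingerprint is derived from the queries, so this sketch assumes a simple per-bit majority vote. The names `consensus_hypothesis` and `screen`, the `min_fraction` parameter and the similarity threshold are likewise hypothetical, and plain Tanimoto similarity stands in for the metric.

```python
def tanimoto(x: str, y: str) -> float:
    """Tanimoto similarity of two equal-length '0'/'1' strings."""
    common = sum(1 for p, q in zip(x, y) if p == "1" and q == "1")
    either = x.count("1") + y.count("1") - common
    return common / either if either else 1.0


def consensus_hypothesis(queries: list[str], min_fraction: float = 0.5) -> str:
    """Assumed rule: a bit is set if at least min_fraction of the queries set it."""
    if not queries:
        raise ValueError("at least one query fingerprint is required")
    length = len(queries[0])
    if any(len(q) != length for q in queries):
        raise ValueError("all query fingerprints must have the same length")
    needed = min_fraction * len(queries)
    return "".join(
        "1" if sum(q[i] == "1" for q in queries) >= needed else "0"
        for i in range(length)
    )


def screen(hypothesis: str, targets: list[str], threshold: float = 0.6):
    """Return (target, similarity) pairs at or above threshold, best first."""
    scored = [(t, tanimoto(hypothesis, t)) for t in targets]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hypothesis = consensus_hypothesis(["1100", "1110", "1000"])
    for target, score in screen(hypothesis, ["1101", "0011", "1100"]):
        print(target, round(score, 3))
```

Raising `threshold` trades recall for a shorter hit list; a different consensus rule (for example intersection or union of the query bits) would only change `consensus_hypothesis`.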
Target fingerprints that the metric places close to the hypothesis fingerprint are the hits.

Optimized virtual screening
Parameterized metrics

$$D_{\mathrm{Tanimoto}}^{\mathrm{scaled,\,asymmetric}}(x, y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \left( \sum_i x_i - \sum_i s_i \min(x_i, y_i) \right) + (1 - \alpha) \left( \sum_i y_i - \sum_i s_i \min(x_i, y_i) \right) + \sum_i s_i \min(x_i, y_i)}$$

• $\alpha \in [0, 1]$: asymmetry factor
• $s_i \in \mathbb{N}$: scaling factor

$$D_{\mathrm{Euclidean}}^{\mathrm{weighted,\,asymmetric}}(x, y) = \sqrt{\alpha \sum_{x_i > y_i} w_i (x_i - y_i)^2 + (1 - \alpha) \sum_{x_i \le y_i} w_i (x_i - y_i)^2}$$

• $\alpha \in [0, 1]$: asymmetry factor
• $w_i \in [0, 1]$: weights

How good is optimized virtual screening?
[Figure: β2-adrenoceptor antagonists. Number of Hits (logarithmic axis, 1 to 10000) against Number of Active Hits (1 to 18) for four series: Tanimoto, Euclidean, Optimized, Ideal.]

JChem AnalogMaker
Workflow
[Figure labels: Lead Candidates]

Fragmentation
Examples
[Figure: fragmentation rules (Amide, Ester) applied to an original molecule; generated fragments Fragment 1, Fragment 2 and Fragment 3, with labels amide 1, amide 2, ester 1, ester 2.]

Fragmentation
RECAP rules
1 = amide
2 = ester
3 = amine
4 = urea
5 = ether
6 = olefin
7 = quaternary nitrogen
8 = aromatic N – carbon
9 = lactam N – carbon
10 = aromatic carbon – aromatic carbon
11 = sulphonamide

Xiao Qing Lewell, Duncan B. Judd, Stephen P. Watson, Michael M. Hann; RECAP – retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci.
1998, 38, 511–522

JChem AnalogMaker
General algorithm
• start
• create building block library
• generate pharmacophore hypothesis of active compounds
• create several starting compounds by random combination of some building blocks
• select parent structure
• generate variants of parent
• convergence or end of optimization
• stop

Variant generation
Example: TOPAS modifier
G. Schneider et al., J. Comput.-Aided Mol. Design, 14 (2000): 487–494
G. Schneider et al., Angew. Chem. Int. Ed., 39 (2000): 4130–4133

Drug research
Finding or making a needle in the haystack?
virtual screening (JChem Screen)  ?  de novo design (JChem AnalogMaker)

Drug research
Screening a virtual compound space

virtual screening (JChem Screen)
advantages
• fast
• hits are readily available for in vitro screening
disadvantages
• limited number of available compounds

random virtual synthesis (JChem Synthesizer)
advantages
• fast
• virtual molecules are likely to be synthetically available
• practically infinite virtual compound space
• structural novelty
disadvantages
• (none listed)

de novo design (JChem AnalogMaker)
advantages
• practically unlimited virtual compound space
• structural novelty
disadvantages
• synthetic accessibility of virtual hits is a problem

Screening a virtual compound space

Smart reactions
Generic (simple)
• the equation describes the transformation only
• few hundred
generic reactions can form the basic armory of a preparative chemist
Specific (complex)
• chemoselective: recognizes reactive and inactive functional groups
• regioselective: "knows" directing rules
• stereoselective: inversion/retention
Customizable
• to improve reaction model quality

Smart reactions
Chemoselectivity
    REACTIVITY: !match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1)

Smart reactions
Regioselectivity
    SELECTIVITY: -charge(ratom(1))
    TOLERANCE: 0.0045

Smart reaction library
Example: Baeyer-Villiger ketone oxidation
    SELECTIVITY: charge(ratom(2), "sigma")

JChem Synthesizer
Workflow
[Figure labels: Virtual compound space, Available chemicals, Smart reaction library, Synthesizer, Active set 1 ... Active set n, Screen, Hits]

JChem Synthesizer example
Dopamine D2 actives
Active sets were kindly provided by Aureus Pharma within a research collaboration between Aureus and ChemAxon.

Virtual hits
similarity: 2D pharmacophore fingerprint, weighted Euclidean metric optimized for 20 random D2 actives

Best virtual hits
[Figure: four structures labelled 9.88, 9.53, 9.82, 9.73]

Synthesis path
• step 1: Knoevenagel-Doebner condensation
• step 2: Baylis-Hillman vinyl alkylation
• step 3: Lawesson thiacarbonylation
• step 4: Dess-Martin alcohol oxidation

Software and performance data
• virtual reactions: 500-1000 reactions/s
• random synthesis: 10-20 structures/s
• pharmacophore fingerprint generation: 100 structures/s (includes pharmacophore point perception)
• metric optimization: 57 s (13 parameterized metrics, 20 structures in training set, 50 spikes)
• virtual screening: 7500 structures/s
• pure Java client: P4 1.6 GHz, RH Linux, Java 1.4.2
• database server: P4 2.4 GHz, Windows XP, MySQL

Acknowledgements
Jean-Michael Drancourt
François Petitet
(Axovan is now part of Actelion.)
Modest von Korff, Matthias Steger
Alex Allardyce
ChemAxon

Contact
Miklós Vargyas
mvargyas@chemaxon.hu
office: +36 1 453 2661
mobile: +36 70 381 3205