Java Solutions for Cheminformatics
March 2005

About Us

• About Us
• Molecule Drawing and Visualization
• Structure Searching
• Cartridge
• Structure Standardization
• Molecular Predictions
• Chemical Expressions
• Screening
• Clustering
• Fragment Analysis
• Virtual Synthesis
• Current Developments

History
Formed: 1998, Budapest, Hungary
Skills base:
• Chemistry
• Software development
• Predictive tools
Aim: Platform-independent software for chemistry
Highlights
• 1998: Custom projects
• 1999: Java tools for sketching/viewing structures
• 2000: Structure database support
• 2001: Clustering and diversity analysis
• 2003: Pharmacophore screening, property predictions, reaction processing, fragmenting
• 2004: Cartridge technology, virtual synthesis, improved SMARTS support

People
Developers: 17 (7 PhD, 10 MSc)
Business Support: 3 (1 MSc, 2 BSc)
Technical expertise
• Cheminformatics
• Synthetic and physicochemistry
• Virtual drug design
• Java
• Web technology
Commercial expertise
• Negotiation & contracting
• Relationship management
• Collaboration steering and development
• Strategic marketing
• Mutually beneficial (win-win) business relationships

Selected Application Areas
• Global licenses
• Custom development projects
• Value added constructions
• Websites/portal front and back end
• Educational

Product development (timeline, 1998 to 2004)
Marvin (chemical drawing)
• 1998 to 2002: Applets, Molfiles, templates, Windows, Unix; RDF, SDF, XYZ animations, CML, stereo support, compressed formats, Swing, 3D rendering; SMILES, SMARTS, PDB, Rgroups, isotopes, shortcuts, Marvin Beans; ball and stick, JPG, PNG, SVG, cut & paste with Isis/ChemDraw, 2D cleaning, (de)aromatization, reactions; Mac support, signed applets, Java Web Start, atom mapping
• 2003: Partial charge, pKa, logP, logD, 3D generation, radicals, Sgroups
• 2004: Marvin file format, enhanced stereo, enhanced SMARTS support, shapes, text boxes, multiple groups, TPSA, Donor/Acceptor...
JChem (structure database and cheminformatics toolkit)
• Up to 2002: Oracle, MySQL, SQL Server, Access, hashed fingerprints, substructure and similarity searching; clustering, diversity; DB2, PostgreSQL, Rgroup searching
• 2003: reaction searching, reaction processing, pharmacophore analysis, screening, standardization, fragmentation
• 2004: cartridge, enhanced stereo searching, recursive SMARTS, chemical expressions, virtual synthesis…

Current Products Overview

Multiple Deployment Formats
• Applications
• Java Applets
• Signed Java Applets
• Java Web Start
• Java Beans
• Plugins
• JSP

Why ChemAxon?
• Sophisticated virtual chemistry technology
• Platform independence and Web (Java)
• High performance tools (speed, capacity)
• Client oriented development
• Comprehensive API for developers
• Detailed documentation
• Competitive prices
• Fast and reliable support

Product Support
"Developers supporting developers"
• Fast response to support questions: 24-hour maximum response time (and fast solutions, too)
• Final and beta releases available online
• Detailed documents available online and extensive help bundled with the software
• Skilled, relevant human support (direct developer-to-developer contact)
• Product development based on support requests

Molecule Drawing and Visualization

Operating Systems
• 100% pure Java
• Windows – 95, 98, Me, NT, 2000, XP
• Macintosh – OS 9, OS X
• Unix – Linux, Solaris, Irix, etc.
Web Browsers
• Internet Explorer
• Netscape
• Mozilla
• Safari
• Opera

Marvin
• Various file formats
• Isotopes, charges, radicals
• SMARTS properties (atoms, bonds, recursive SMARTS)
• Alias, pseudo atoms
• Chemical error checking
• Templates
• Generic atoms and bonds
• Abbreviated groups
• Atom lists and not lists
• Reactions
• 2D cleaning
• Atom maps
• 3D cleaning
• R-groups
• Various 3D models
• Stereo bonds, stereo configurations (R/S, E/Z)
• Shapes, text boxes
• Enhanced stereo (ABS/AND/OR)
• Plugins

Illustrated examples: Various File Formats; Isotopes, Charges, Radicals; Templates; Abbreviated Groups; R-groups; Reactions; Rendered 3D displays with MarvinSpace

Structure Cleaning
CC(C)NCC(O)COC1=C2C=C(C)NC2=CC=C1 shown as topology, 2D, and 3D

Structure Searching

JChem Base Features
• Rapid fingerprint-based database scanning
• Sophisticated graph-based searching
• Integration with databases
  – Oracle
  – MS SQL Server
  – DB2
  – MySQL
  – PostgreSQL
  – InterBase
  – Access
• Custom standardization
• JChem Cartridge for searching in Oracle
• JSP integration

Import with JChem Base Manager

Query Features
• Exact structure
• Substructure
• Atom lists and not lists
• Explicit hydrogens
• Generic atoms
• Generic bonds
• SMARTS atom properties
  – Aliphatic, aromatic
  – Hydrogen count
  – Connection count
  – Valence
  – Ring count
  – Smallest ring size
  – Recursive SMARTS
• Stereo atoms
• Stereo bonds
• R-group queries
  – R-groups
  – Occurrence
  – if / then conditions
  – RestH
• Reaction search
  – Transformation recognition
  – Component identification
  – Stereospecific reactions (inversion, retention)
• Diastereomers
  – Enhanced stereo groups (Abs, And, Or)

JChem Base JSP Integration
Thin client support: only a web browser and Java are required

Cartridge Technology

JChem Cartridge for Oracle
The JChem Cartridge for Oracle extends Oracle to support chemical database operations.
Examples:
Substructure search displaying ID, SMILES code, and molecular weight:
    SELECT cd_id, cd_smiles, cd_molweight
    FROM my_structures
    WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Finding benzene derivatives that conform to Lipinski's rule of five:
    SELECT count(*)
    FROM my_structures
    WHERE jc_compare(structure, 'c1ccccc1',
        'sep=!t:s!ctFilter: (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1;

JChem Cartridge for Oracle Example
Oracle search returning similar structures with logP > 1 that were acquired after April 14th, 2002, with the results displayed in MarvinView.

Structure Standardization

Standardization
• Explicit hydrogens
• Aromatic bonds
• Mesomers
• Tautomers
• Counterions

Standardization Example (before / after)

Molecular Predictions

Calculator Plugins
Available Calculations
• Elemental analysis
• Charge distribution
• Polarizability
• pKa
• logP
• logD
• Polar surface area
• Huckel analysis
• H-bond donor-acceptor
• Major microspecies
• Refractivity
Calculation Interface
• Marvin GUI
• Command line
• Chemical Terms
• API

Illustrated examples: Elemental Analysis; Polar Surface Area; Partial Charge Distribution

Partial Charge Distribution Calculation
Partial Equalization of Orbital Electronegativities (PEOE)
Orbital electronegativity as defined by Mulliken
Orbital electronegativity of atom i:
    $\chi_i = a_t + b_t q_i + c_t q_i^2$
    $q_i$: partial charge
The partial charge of atom i is calculated iteratively, based on Gasteiger's method:
    $\chi_i^{(0)} = a_t$, $q_i^{(0)} = 0$
    $q_i^{(n+1)} = q_i^{(n)} + \sum_k (0.5)^n \, \frac{\chi_i - \chi_k}{\max(\chi_i, \chi_k)}$
    k: index of a neighbor of atom i

Polarizability

logP
logP Example
    $\log P = \sum_i f_i$
    $f_i$: atomic logP increment
Validation of the logP prediction

logD
logD Example (figure: ionization scheme of a molecule with three ionizable sites, linking the neutral species 123 (0) to the mono-ionized species 1+ (1), 2+ (2), 3- (3), the di-ionized species 1+2+ (4), 1+3- (5), 2+3- (6), and the tri-ionized species 1+2+3- (7), through the micro ionization constants k1 to k7 and with micro partition coefficients p0 to p7)
logD is computed from the micro ionization constants ($k_i$), the micro partition coefficients ($p_i$), and pH.

pKa
pKa Plugin - Microconstants
Micro ionization constants (log k) are calculated from regression equations that have three types of calculated parameters:
• Intramolecular interactions
• Partial charges
• Polarizabilities

pKa Plugin - Macroconstants
Macro ionization constants (pKa) are calculated from the microconstants (log k).
Ionization scheme (figure)

Hydrogen Bonds in pKa Calculation
    $\Delta \log k = a (q_i - q_k) + b$
    a, b: regression parameters
Intramolecular hydrogen bonds are also taken into account.

Validation of the pKa prediction

Chemical Expressions

Chemical Terms
Elements of the language
• structure matching functions (describing functional groups, reaction sites, similarity…)
• property calculations (partial charge distribution, pKa, logP, electrophilicity…)
• arithmetic and logical operators

Chemical Terms examples
searching
    match("olefine.mol") && !match("c1ccncc1") && (atomCount(16) == 0) || (mass() < 300);
goal functions
    inhibitor = inhibitor.mol;
    (similarity(inhibitor, pharmacophore_tanimoto) > 0.8) && (similarity(inhibitor, chemical_tanimoto) < 0.5);
filtering
    (mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10);

Applications of Chemical Terms
• virtual synthesis: reaction and synthesis rules
• pharmacophore analysis: pharmacophore definitions
• drug design: goal functions
• structure searching: advanced query expressions

Screening

Pharmacophore Mapping
atom type colors
• hydrophobic (h)
• acceptor (a)
• donor / cationic (d/c)
pharmacophore type colors
• aromatic (r)
• acceptor / donor (a/d)
• donor / aromatic (d/r)

Topological Pharmacophore Fingerprint (figure: structure with atoms labeled by pharmacophore type)

Hypothesis Fingerprints
• Minimum
  – Advantage: strict selection of common features
  – Disadvantage: very sensitive to one missing feature
• Average
  – Advantage: not that sensitive to outliers
  – Disadvantage: less selective if actives are similar

Dissimilarity Metrics
• Euclidean: standard, normalized, weighted, asymmetric
• Tanimoto: standard, scaled, asymmetric

Screening Optimization
• 10,000 test compounds (from NCI): 300 for optimization (training), 9,700 for validation
• 50 active compounds (β-adrenoreceptor antagonists): 1/3 training set, 1/3 query set, 1/3 spikes (validation)

Screening Validation
β2-adrenoreceptor antagonists, minimum hypothesis
All compounds: 9,700
Known active compounds: 18

                     before optimization   after optimization
all hits                           2,476                   18
known active hits                     15                   18
enrichment                          3.27               539.89

Active Hit Distribution
β2-adrenoreceptor antagonists: 18 active compounds were mixed with 9,700 random NCI molecules and sorted by pharmacophore similarity.
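The enrichment figures in the β2-adrenoreceptor validation can be checked by hand. The slides do not state the formula, so the sketch below assumes the common definition: the fraction of actives among the hits divided by the fraction of actives in the whole screened pool, with the pool taken as 9,700 NCI compounds plus 18 spiked actives. The class and method names are hypothetical and are not part of any ChemAxon API.

```java
import java.util.Locale;

/** Hypothetical helper for checking enrichment values; not a ChemAxon class. */
final class EnrichmentSketch {

    /**
     * Assumed definition of enrichment:
     * (activeHits / allHits) / (totalActives / totalCompounds).
     */
    static double enrichment(int activeHits, int allHits,
                             int totalActives, int totalCompounds) {
        if (allHits <= 0 || totalActives <= 0 || totalCompounds <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        double hitRate = (double) activeHits / allHits;          // actives among the hits
        double baseRate = (double) totalActives / totalCompounds; // actives in the pool
        return hitRate / baseRate;
    }

    public static void main(String[] args) {
        // Assumption: the screened pool is 9,700 NCI compounds plus 18 spiked actives.
        int pool = 9_700 + 18;
        // Before optimization: 2,476 hits, 15 of them known actives.
        System.out.printf(Locale.ROOT, "before: %.2f%n", enrichment(15, 2_476, 18, pool));
        // After optimization: 18 hits, all 18 known actives.
        System.out.printf(Locale.ROOT, "after:  %.2f%n", enrichment(18, 18, 18, pool));
    }
}
```

Under these assumptions the two calls give about 3.27 and 539.89, which agrees with the reported values; a perfect screen (every hit active, every active found) reaches the maximum possible enrichment, pool size divided by the number of actives.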
Screening Validation
10,000 NCI compounds

                          before optimization               after optimization
family        actives   all hits  active hits  enrichment   all hits  active hits  enrichment
ACE                 7      6,537            6        1.27        171            6       47.01
Angiotensin2        4        177            3       40.40         66            3      105.50
D2                  5        417            5       22.90         31            5      269.08
delta               7         60            5      106.70          9            5      495.25
FTP                13      1,020           11        7.97         13           10      422.30
mGluR1              7      1,744            3        2.38         10            7      571.10
NPY Y5             49      6,370           38        1.18        145           45       47.12
thrombin            3        328            2        19.6         57            2      109.64

Optimized Screening JSP Example

Optimized Screening JSP Example: Hits

Clustering

JKlustor
• Jarvis-Patrick
• Ward

Ward Clustering Features
• Ward's minimum variance method
• Murtagh's reciprocal nearest neighbor (RNN) algorithm
• O(n²) time complexity
• O(n) memory complexity

Ward Pharmacophore Clustering Example
• 8 active compound sets
  – 5-HT3 antagonists
  – ACE inhibitors
  – angiotensin 2 antagonists
  – D2 antagonists
  – delta antagonists
  – FTP antagonists
  – mGluR1 antagonists
  – thrombin inhibitors

Ward Centroids

A Ward Cluster: D2 antagonists

Maximum Common Substructure Clustering

Drug Design

RECAP fragmentation example (fragment labels: amide:1, amide:2, amine:1, amine:2, ether:1, ether:2)

Virtual Synthesis

The Ideal Virtual Reaction
• Generic (simple)
  – the equation describes the transformation only
  – a few hundred generic reactions can form the basic armory of a preparative chemist
• Specific (complex)
  – chemo-: recognizes reactive and inactive functional groups
  – regio-: "knows" directing rules
  – stereo-: inversion/retention
• Customizable
  – to improve reaction model quality

Reaction Modeling
• Processing selective "smart" reactions
• Batch mode (sequential or combinatorial combinations)
• Reverse direction
• High performance (speed and capacity)
Customizable Reaction Engine!

Chemoselective Reaction Definition
    REACTIVITY: !match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1) && !match(ratom(3), "[N,O,S:1][C,P,S]=[N,O,S]", 1)
Reactants
• 369 isocyanates and isothiocyanates
• 2920 amines, alcohols and thiols

Chemoselective Reaction Products
1,264,391 single site products

Regioselectivity (Markovnikov, Zaitsev)
r1: an addition reaction definition with the Markovnikov rule.
    SELECTIVITY: hcount(ratom(2))
r2: an elimination reaction definition with Zaitsev's rule.
    SELECTIVITY: -hcount(ratom(2))

Regioselective Reaction Example
Chlorine migration in four steps by consecutive elimination (r2) and addition (r1) reactions.

Regioselectivity (SeAr)
Reaction definition for the electrophilic aromatic bromination of the benzene ring. The expression defines a regioselectivity rule for the major product.
    SELECTIVITY: -charge(ratom(1))
    TOLERANCE: 0.0045

Regioselectivity (SeAr) Products
Virtual bromination of toluene with the reaction definition above yields the ortho and para isomers as main products, whereas in nitrobenzene the bromine is directed to the meta position.

Regioselectivity (SeAr) Example Products
1,198 monobrominated main products (tolerance is set to zero)

Virtual Synthesis
• Multiple steps
• Flexible compound dispatching
• Synthesis rules
• Synthesis tree building
• Memory, file and database mode
• Graphical synthesis browser
• Building block coloring
Customizable Synthesis Engine!

Synthesis Example
Steps: alkyne coupling, lactone aminolysis, esterification
Derek S. Tan, Michael A. Foley, Matthew D. Shair, Stuart L. Schreiber*, J. Am. Chem. Soc., 1998, 120, 8565-8566

Synthesis Definition
Synthesis route definition
• Step1: A + B, reaction R1, product C
• Step2: C + D, reaction R2, product E
• Step3: E + F, reaction R3, product G
"Smart" reaction library
• R1: alkyl iodide + alkyne >> alkyl-alkyne
• R2: lactone + amine >> amide
• R3: alcohol + carboxylic acid >> ester
Component set definition
• Set1 to Set7, with starting components A; B1, B2, B3; D1, D2; F1, F2

Synthesis Browser

Current Developments

Recent Developments
• Automatic searching of low-energy conformers
• Improved Oracle cartridge
• Structure searching combined with chemical calculations
• Exhaustive Synthesis for metabolism applications
• R-group decomposition
• Maximum common substructure search in molecule pairs and in libraries

Current Developments
• MarvinSpace, an OpenGL based 3D molecule and surface visualisation engine for small and macromolecules
• Instant JChem Base, a desktop and enterprise chemical database client with form builder
• IUPAC naming plugin
• Isoelectric point plugin
• Random Synthesis for building up a diverse virtual space of synthetically feasible compounds
• Extension of the reaction library
• Further descriptors in the Topology Analysis plugin

Future Plans
• Metabolic transformation library
• Diverse database of synthetically accessible compounds
• Search in Markush compounds
• Peptide builder
• Fragment-based activity analysis of compound libraries
• AnalogMaker (fragment based random evolutionary analog design)
• Retrosynthesis

Visit us
• Home page – www.chemaxon.com
• Forum – www.chemaxon.com/forum
• Animated demos and tutorials – www.chemaxon.com/demos
• Presentations and posters – www.chemaxon.com/conf

Thank you for your attention

Máramaros köz 3/a
Budapest, 1037
Hungary
info@chemaxon.com
www.chemaxon.com