Overview of 3D Searching: A Powerful Tool for Computer-Assisted Molecular Design

R. S. Pearlman (1) and O. F. Güner (2)
(1) College of Pharmacy, University of Texas, Austin, TX
(2) Turquoise Consulting, San Diego, CA

Copyright © 2006 Turquoise Consulting. All Rights Reserved

Introductory Remarks
  Drug action is a 2-phase process; drug discovery must address both phases:
  Transport of drug from site of administration (e.g., GI tract if oral, blood or muscle if injected) to the “biophase” where the receptor is located
    Does the compound have appropriate physicochemical properties?
    Is it drug-like?
    Is it bioavailable?
  Interaction of drug with receptor
    Does the compound have the right size, shape, and substructural features (e.g., pharmacophore) to enable favorable interaction with the receptor?
    Does it show sufficient “affinity” for the receptor and/or display intrinsic “activity”?

Outline
  What is 3D searching?
  Why perform 3D searching?
  Essential components required
  Brief example

What is 3D Searching?
  Searching within large databases of 3D chemical structures for those compounds which satisfy both the chemical and geometric requirements specified in the 3D search query
  The search typically reflects the chemical and geometric requirements for a ligand to interact favorably with a particular bio-receptor
  That is, the search query usually reflects “the pharmacophore”
  3D searching review articles
    Van Drie, J. H. “3D Database Searching in Drug Discovery,” http://www.netsci.org/Science/Cheminform/feature06.html
    Güner, O. F. and Henry, D. R. “Three-dimensional Structure Searching,” in The Encyclopedia of Computational Chemistry; Schleyer, P. v. R.; Allinger, N. L.; Clark, T.; Gasteiger, J.; Kollman, P. A.; Schaefer III, H. F.; Schreiner, P. R. (Eds.); John Wiley & Sons: Chichester, 1998, vol 5, pp 2988-3003.
    Kurogi, Y. and Güner, O. F. “Pharmacophore Modeling and Three-dimensional Database Searching for Drug Design Using Catalyst,” Curr. Med. Chem. 2001, 8, 1035-1055.

Outline
  What is 3D searching?
  Why perform 3D searching?
  Essential components required
  Brief example

Why Perform 3D Searching?
  Ideas for extending existing leads
  Ideas for new leads
  Retrieving “active” compounds from commercial databases
  “We use 3D searching to break other people’s patents.” (anonymous, AAPS, 1989)
  “Cover” planned patents
    Examples of pharmacophores in patents:
      WO 98/04913 – Biogen patent on VLA-4 inhibitors
      WO 98/46630 – Peptide therapeutics on Hepatitis C NS3 inhibitors
      US 2002/0013372 – Pfizer on CYP 2D6 inhibitors
  Used to validate pharmacophore models
  Used in “reverse” to predict other activities

Three “Levels” of Computer-Assisted Drug Design
  No knowledge of receptor structure
    Classical QSAR
    Modern QSAR
    Recursive partitioning
  Limited knowledge of receptor structure
    3D searching/screening based upon structural complementarity (e.g., Catalyst, ISIS/3D, Unity, Phase)
    Ligand-based pharmacophores (e.g., DISCOtech, GASP, HipHop)
    Pharmacophore-based QSAR (e.g., CoMFA, HypoGen)
  Complete knowledge of receptor structure
    “Structure-based drug design”
    Docking (e.g., Glide, C2.LigandFit, DOCK, FlexX, GOLD)
    De novo design (e.g., LeapFrog, Ludi)
    Receptor-based pharmacophores (e.g., C2.SBF, Catalyst, Unity)
    Screening based on computational estimates of drug-receptor “interaction energy”

Searching Software – 2D vs 3D
  2D substructure (and similarity) searching
    User specifies atoms and how they are connected
    User cannot discard undesired “hits” based on the 3D geometry of those hits (no 3D information allowed)
    Therefore, the user must discard undesired hits based on connectivity and thereby predetermines a large fraction of the substructural information about hits
  3D searching
    User specifies atom types and their relative positions in 3D space
    User does not specify how atoms are connected
    User does not predetermine the “chemistry” of hits

2D vs 3D Searching
  2D search: uses 2D connectivity to constrain positions of chemical features
  3D search: uses 3D geometry to constrain positions of chemical features
  Unconstrained search for all compounds containing >C=O, –OH, and –Cl would return many spurious hits (“false positives”)

Outline
  What is 3D searching?
  Why perform 3D searching?
  Essential components required
  Brief example

Essential Components Required for 3D Searching
  Large number of interesting 3D structures
  Database management software for storage and retrieval
  Software to perform search based upon 3D criteria
  Rational 3D search criteria

Methods for Acquiring 3D Structures
  Experimental determination (e.g., X-ray, NMR)
    CSD (CCDC), ca. 100,000 organic compounds
    Most are not pharmacologically relevant
  MM or MO geometry optimization
    Requires 3D (not 2D) initial structures
  License commercial database of 3D structures
  Convert corporate 2D database to 3D structures and hunt for “buried treasure”
    Corporate structures, virtual libraries, combinatorial libraries

Comments on 3D Databases
  The value of searching software is limited by the value of the databases being searched
    Size matters (not too big, not too small)
    Diversity (sometimes low in corporate databases)
    “Richness”
    “Relevance” of structures
    Non-structural information
    Availability of compounds
  Stereochemistry
    Needs to be dealt with either within the database, search query, or search algorithm

Software for Generating 3D Databases
  CONCORD
    Pearlman, R. S. “Rapid Generation of High Quality Approximate 3D Molecular Structures,” Chem. Des. Aut. News, 1987, 2, 1-7.
  CORINA
    Hiller, C.; Gasteiger, J. “Ein Automatisierter Molekülbaukasten,” in Software-Entwicklung in der Chemie, vol. 1; Gasteiger, J. Ed.; Springer, 1987, Berlin, pp 53-66.
  WIZARD
    Dolata, P. D.; Leach, A. R.; Prout, K., “Wizard: AI in conformational analysis,” J. Comput.-Aided Mol. Des., 1987, 1, 73-85.
  AIMB
    Wipke, W. T.; Hahn, M. A., “AIMB: Analogy and intelligence in model building. System description and performance characteristics,” Tetrahedron Comput. Meth., 1988, 1, 141.

CONCORD – General Capabilities
  Converts CONnection table to 3D CoORDinates
    2D: CT contains information about connectivity per se
    2.5D: contains additional information about stereochemistry
  Handles almost all “drug-sized” compounds
  Handles input/output in a wide variety of ways
  Very fast
  Good to excellent structures
  Limitations
    No inorganics or metallo-organics
    Single low-energy conformation
    Not intended for macrocycles, polymers, or other highly flexible structures for which 3D structure is determined by extrinsic rather than intrinsic forces

CONCORD Algorithm
  “Expert-system” approach with MM cleanup
  Uses rule-based “chemical intuition” when applicable (most acyclic substructures)
  Uses pseudo-molecular-mechanics approach when “intuition” is not applicable (most cyclic substructures)
    A novel strain function is minimized
    Strain is a function defined such that minimization is performed over a single, composite variable
  Initial structure is checked for close contacts; dihedrals causing close contacts (or all acyclic dihedrals) are then relaxed by ultrafast MM optimization
    Carried out in torsion space, using analytical gradients, and with substantial topological speed-ups

Commercially Available 3D Databases
  ACD, CMC, MDDR – MDL Information Systems, Inc.
  Pomona-90C – Daylight Information Systems, Inc.
  CAST-3D, CAS Registry File – Chemical Abstracts Service
  TRIAD – UC Berkeley
  CHDC, NCI – Tripos Inc.
  CAP, Maybridge, NCI, WDI – Accelrys Inc.
  NCI – Nat’l Cancer Institute
  Plus corporate databases and chemical supply houses

Essential Components Required for 3D Searching
  Large number of interesting 3D structures
  Database management software for storage and retrieval
  Software to perform search based upon 3D criteria
  Rational 3D search criteria

Database Management Software (DBMS)
  General database management software
    There are many examples… all chemistry-ignorant (data “name” oriented)
  Chemical database management software
    Accelrys, CAS, Daylight, MDL, Tripos, Oracle*
    Unique, structure-related storage key
    Searchable by structure, as well as name, etc.
    Searchable by 2D substructure keys
    Searchable by 3D substructure keys
    Integrated queries (including biological, chemical data, etc.)
    Some include 3D shape-based searches
    Some interface with other modeling and analysis software tools

DBMS -- Keys
  “Short-cuts”
  Bit strings (a.k.a. “fingerprints”) – is a particular feature present? Yes or no?
  2D substructural keys
  3D object/distance keys
    Object: atom, lone pair, ring centroid, projected point, etc.
    Distance (rigid): single distance bins
    Distance (flexible): ranges of bins
  3D shape-based keys
  3- and 4-point pharmacophore keys

2D Substructural Keys
  Which pre-defined 2D substructures are present in this compound?

3D Object/distance Keys
  Which inter-object distances are present within this conformation?

3D Object/distance Keys [Flexible Search with UNITY and ISIS/3D]
  Which inter-object distances could be achieved by this compound?
  Max/min distance ranges

Flexible Search with Catalyst
  [Flowchart; box labels: crefs from database; crefs from rigid hits; key-based screening with loose-tolerance query; fitting with loose-tolerance query; crefs for screen; crefs for fit; crefs for flexible fit; fit optimization; energy minimization; check thresholds? (bad: loop back; good: break out); parallelized; flexible hit; flexible + rigid hits; hit list]

Essential Components Required for 3D Searching
  Large number of interesting 3D structures
  Database management software for storage and retrieval
  Software to perform search based upon 3D criteria
  Rational 3D search criteria

History and Evolution of 3D DBMS Software
  1974 MOLPAT – Gund, Wipke, Langridge – Princeton
  1982 DOCK – Kuntz et al. – UC San Francisco
  1987-88 Fast 3D builders – CONCORD, CORINA, WIZARD, AIMB
  1988 Caveat – Bartlett – UC Berkeley
  1988 3D Search – Sheridan et al. – Lederle Labs
  1989 Aladdin – Van Drie, Martin – Abbott Labs
  1989 MACCS-3D – Henry et al. – MDL
  1990 ChemDBS-3D – Davies et al. – Accelrys
  1991 UNITY – Hurst et al. – Tripos
  1992 ISIS/3D – Henry et al. – MDL
  1992/3 Catalyst – Van Drie, Kahn – Accelrys

MACCS-3D
  Screen from MACCS-3D displaying a hit retrieved from MDDR-3D based on a CNS-active drugs pharmacophore from: Lloyd, E. J. and Andrews, P. R. J. Med. Chem. 1986, 29, 453.

ISIS/3D
  Screen from ISIS/3D displaying a dopamine antagonist pharmacophore proposed by Martin, Y. C. (Tetrahedron Comput. Meth. 1990, 3, 15-25) and a hit retrieved from MDDR-3D.

Unity
  A UNITY query based on a set of muscarinic M3 receptor antagonists (Marriott, D. P., Dougall, I. G., Meghani, P., Liu, Y., and Flower, D. R. J. Med. Chem. 1999, 42, 3210), developed using DISCOtech and refined via Tripos’ Pharmacophore Model Analysis tools.

Catalyst
  Screen from Catalyst displaying a hit retrieved from the Maybridge database based on an angiotensin II blockers pharmacophore developed by Peter Sprague. The conformation of the hit with the highest score is shown overlaid with the original query.

Phase

Query Development
  2-fold objective
    Find leads to active compounds
    Don’t find leads to inactive compounds
  Iterative process
    Initial query
    Validate
    Modify or improve
  Build small development database (training set)
    Include actives and “relevant” inactives
    Avoids ambiguities caused by hits of unknown activity
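
The validation step of query development can be illustrated with a minimal sketch. It assumes a small training set in which every compound is labelled active or inactive, and it scores a query's hit list against those labels. The names `TrainingCompound` and `score_query` and the compound identifiers are hypothetical, introduced only for illustration; they do not come from any of the packages named in this lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingCompound:
    name: str      # hypothetical compound identifier
    active: bool   # known activity label from the training set


def score_query(hit_names, training_set):
    """Count how many known actives and inactives a query retrieved."""
    hits = set(hit_names)
    actives = [c for c in training_set if c.active]
    inactives = [c for c in training_set if not c.active]
    actives_found = sum(c.name in hits for c in actives)
    inactives_found = sum(c.name in hits for c in inactives)
    return {
        "actives_found": actives_found,
        "inactives_found": inactives_found,
        # Fraction of known actives retrieved (objective 1: find actives).
        "recall": actives_found / len(actives) if actives else 0.0,
        # Fraction of known inactives retrieved (objective 2: avoid inactives).
        "false_positive_rate": (
            inactives_found / len(inactives) if inactives else 0.0
        ),
    }


# Illustrative training set: three actives and two "relevant" inactives.
training_set = [
    TrainingCompound("A1", True),
    TrainingCompound("A2", True),
    TrainingCompound("A3", True),
    TrainingCompound("I1", False),
    TrainingCompound("I2", False),
]

# Hit list returned by some trial query; "X9" has no activity label
# and is therefore ignored by the scoring.
report = score_query(["A1", "A2", "I1", "X9"], training_set)
print(report)
```

Because every compound in the development database has a known label, each change to the query can be judged by whether recall rises or the false-positive rate falls, which is the point of avoiding hits of unknown activity.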

Searching Software -- Queries
  Non-structural criteria
  Chemical criteria: atom types or 2D substructures (fragments)
  Geometric criteria: 3D constraints between objects
    Representation of pharmacophore
  Shape criteria
    Inclusion volumes, exclusion volumes

Searching Software -- Queries
  Atom types:
    Element
    Charge, hybridization, connectivity, ring, etc.
  Objects
    Atoms, pharmacophore features (e.g., hydrophobe)
    Points, vectors, planes, spheres, etc.
  Constraints
    Distances
    Angles, dihedrals
    RMS deviations
    Inclusion, exclusion volumes

Importance of “Forbidden Regions”
  Example: rigid search, same atomic constraints
    No forbidden regions: 2,016 hits
    2 “sides of box”: 1,377 hits
    4 “sides of box”: 383 hits
    5 “sides of box”: 46 hits
  Forbidden regions are even more important for flexible searching
  Explore and report inactivity

Searching Software – Search Phases
  Key-based screening (rapid screen-out phase)
    The objective is to quickly eliminate all those compounds that cannot possibly satisfy the query
    Use keyed structural information
    Consider one constraint at a time
  Atom-by-atom mapping (slower geometric search)
    The objective is to actually verify that the compound satisfies the query
    Consider all constraints simultaneously
    Conformational flexibility issue needs to be addressed
    Stereochemistry

3D Searching Process
  [Flow diagram; box labels: query input; databases and spreadsheets; database subset; key-based screening (2D, 3D, 4D, 1D); hits; atom-by-atom mapping; hit list output]
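
The two search phases can be sketched in a few lines of Python. This is an illustrative toy, not the algorithm of any package named in this lecture. It assumes that each stored conformation is a list of (feature type, coordinates) pairs, that keys are (type pair, distance bin) entries with an assumed 1 Å bin width, and that a query is a list of feature types plus distance ranges between them. All function names and all coordinates are hypothetical.

```python
from itertools import combinations, permutations
from math import dist

BIN_WIDTH = 1.0  # assumed distance-bin width in angstroms


def distance_keys(features):
    """Precomputed keys: which (type pair, distance bin) entries are present?"""
    keys = set()
    for (type_a, pos_a), (type_b, pos_b) in combinations(features, 2):
        pair = tuple(sorted((type_a, type_b)))
        keys.add((pair, int(dist(pos_a, pos_b) // BIN_WIDTH)))
    return keys


def passes_screen(keys, query_types, constraints):
    """Phase 1: test one constraint at a time against the keys only."""
    for i, j, lo, hi in constraints:
        pair = tuple(sorted((query_types[i], query_types[j])))
        bins = range(int(lo // BIN_WIDTH), int(hi // BIN_WIDTH) + 1)
        if not any((pair, b) in keys for b in bins):
            return False  # no pair of objects can satisfy this constraint
    return True


def maps_onto(features, query_types, constraints):
    """Phase 2: find one assignment satisfying all constraints at once."""
    for chosen in permutations(range(len(features)), len(query_types)):
        if any(features[k][0] != t for k, t in zip(chosen, query_types)):
            continue  # feature types do not match the query objects
        if all(
            lo <= dist(features[chosen[i]][1], features[chosen[j]][1]) <= hi
            for i, j, lo, hi in constraints
        ):
            return True
    return False


def search(database, query_types, constraints):
    """Return (compounds surviving the screen, verified hits)."""
    screened, hits = [], []
    for name, features in database.items():
        if not passes_screen(distance_keys(features), query_types, constraints):
            continue
        screened.append(name)
        if maps_onto(features, query_types, constraints):
            hits.append(name)
    return screened, hits


# Hypothetical three-point query: distance ranges in angstroms.
query_types = ["acceptor", "donor", "hydrophobe"]
constraints = [(0, 1, 2.5, 3.5), (1, 2, 4.5, 5.5), (0, 2, 4.5, 5.5)]

database = {
    "hit": [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)),
            ("hydrophobe", (1.5, 4.8, 0))],
    # Each constraint is met by some pair, but never by one consistent mapping.
    "decoy": [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)),
              ("hydrophobe", (8, 0, 0)), ("acceptor", (8, 5, 0))],
    "miss": [("acceptor", (0, 0, 0)), ("donor", (10, 0, 0))],
}

screened, hits = search(database, query_types, constraints)
print(screened, hits)
```

The "decoy" entry shows why both phases are needed: it survives the key screen because every constraint is satisfied by some pair of objects, yet the atom-by-atom mapping rejects it because no single assignment satisfies all constraints simultaneously.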

Conformational Flexibility
  The issue:
    If the query reflects a putative bound conformation that differs from the low-energy conformation, then a search over a database of low-energy conformations might miss some interesting hits.
  Approaches to the issue:
    Handle the conformational flexibility within the database, by storing multiple conformations of each compound
    Handle the conformational flexibility within the search query, via flexible queries
    Handle the conformational flexibility within the search process, via on-the-fly conformational exploration

Addressing Conformational Flexibility – in the Database
  Store multiple conformations of each compound
    Too many to store (3^3 = 27, 12^4 = 20,736)
    Too many to search
    Still no guarantee that the bound conformation is amongst those stored
  Example reference:
    Murrall, N. W.; Davies, E. K. “Conformational Freedom in 3-D Databases,” J. Chem. Inf. Comput. Sci., 1990, 30, 312-316.

Addressing Conformational Flexibility – in Search Query
  Build conformational flexibility into query
    Specify ranges for geometric constraints
      Increasing the range increases false positives
      Decreasing the range increases false negatives
    Specify “hinge” points
      Differentiate parts of the query dealing with flexible regions
  Generality of query may be compromised
  Example references:
    Güner, O. F.; Henry, D. R.; Pearlman, R. S. “Use of Flexible Queries for Searching Conformationally Flexible Molecules in Databases of Three-Dimensional Structures,” J. Chem. Inf. Comput. Sci. 1992, 32, 101-109.
    Güner, O. F.; Henry, D. R.; Moock, T. E.; Pearlman, R. S. “Flexible Queries in 3D Searching. 2. Techniques in 3D Query Formulation,” Tetrahedron Comp. Meth. 1990, 3(6C), 557-563.
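
The trade-off in choosing a constraint range can be made concrete with a small sketch. It assumes a single distance constraint centred on 5.0 Å and a handful of compounds whose stored conformations have a known feature-to-feature distance and a known activity label. Every number below is invented for illustration, and `count_errors` is a hypothetical helper, not part of any searching package.

```python
def count_errors(center, tolerance, compounds):
    """Return (false positives, false negatives) for one distance range.

    `compounds` is a list of (distance, is_active) pairs measured on the
    single stored conformation of each compound.
    """
    lo, hi = center - tolerance, center + tolerance
    false_pos = sum(1 for d, active in compounds if lo <= d <= hi and not active)
    false_neg = sum(1 for d, active in compounds if not lo <= d <= hi and active)
    return false_pos, false_neg


# Invented data: (distance in angstroms, known activity).
compounds = [
    (4.8, True), (5.0, True), (5.6, True),
    (3.9, False), (5.3, False), (6.4, False),
]

# Widen the range step by step around the same central distance.
sweep = {tol: count_errors(5.0, tol, compounds) for tol in (0.25, 0.75, 1.5)}
for tol, (fp, fn) in sweep.items():
    print(f"tolerance ±{tol} Å: {fp} false positives, {fn} false negatives")
```

In this invented data set the narrowest range misses one active whose stored conformation lies at 5.6 Å, while the widest range retrieves all three inactives, so the range has to be tuned against known actives and inactives rather than chosen in isolation.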

Addressing Conformational Flexibility – during Search
  Explore conformational flexibility at search time
    Rigid: does this conformation match query?
    Flexible: could this compound match query?
  2-phase process:
    First, a rapid screen based upon max/min distance keys
    Then, a slower conformational search
  Ensures that no hits are missed (however, it is important to note the local-minima problem)
  Example references:
    Moock, T. E.; Henry, D. R.; Ozkabak, A. G.; Alamgir, M. “Conformational Searching in ISIS/3D Databases,” J. Chem. Inf. Comput. Sci., 1994, 34, 184-189.
    Hurst, T. “Flexible 3D Searching: the Directed Tweak Technique,” J. Chem. Inf. Comput. Sci., 1994, 34, 190-196.

Conformational Coverage in 3D Databases
  A. Smellie, S. L. Teig, and P. Towbin, "Poling: Promoting Conformational Coverage", J. Comp. Chem., 1995, 16, 171-187.
  A. Smellie, S. D. Kahn, and S. Teig, "An Analysis of Conformational Coverage 1. Validation and Estimation of Coverage", J. Chem. Inf. Comput. Sci., 1995, 35, 285-294.
  A. Smellie, S. D. Kahn, and S. Teig, "An Analysis of Conformational Coverage 2. Applications of Conformational Models", J. Chem. Inf. Comput. Sci., 1995, 35, 295-304.
  The most commonly used approach is to store multiple diverse conformations in the database and perform a flexible search
    ISIS/3D – performs a random kick before giving up on a conformation
    UNITY – performs a user-defined number of kicks
    Catalyst – uses the “poling” algorithm to store multiple conformations (avg. >30 conformations)

Essential Components Required for 3D Searching
  Large number of interesting 3D structures
  Database management software for storage and retrieval
  Software to perform search based upon 3D criteria
  Rational 3D search criteria
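
The two-phase flexible search described for search-time flexibility (a rapid screen on max/min distance keys, then a slower conformational search) can be sketched as follows. This is a simplified illustration under stated assumptions: the max/min ranges are estimated from a small set of sampled conformers rather than from the bounds a production system would compute, the "conformational search" is a plain loop over those conformers rather than a torsion-space optimization, and all names and coordinates are hypothetical.

```python
from math import dist


def distance_ranges(conformers):
    """Max/min key: (min, max) distance per object pair over all conformers."""
    labels = sorted(conformers[0])
    ranges = {}
    for index, a in enumerate(labels):
        for b in labels[index + 1:]:
            distances = [dist(conf[a], conf[b]) for conf in conformers]
            ranges[(a, b)] = (min(distances), max(distances))
    return ranges


def could_match(ranges, constraints):
    """Phase 1: does every query range overlap the achievable range?"""
    for a, b, lo, hi in constraints:
        d_min, d_max = ranges[tuple(sorted((a, b)))]
        if d_min > hi or d_max < lo:
            return False  # this compound can never satisfy the constraint
    return True


def matching_conformer(conformers, constraints):
    """Phase 2: index of a conformer meeting all constraints, else None."""
    for index, conf in enumerate(conformers):
        if all(lo <= dist(conf[a], conf[b]) <= hi for a, b, lo, hi in constraints):
            return index
    return None


# Hypothetical query: object labels and distance ranges in angstroms.
constraints = [("N", "O", 2.5, 3.5), ("N", "ring", 3.5, 4.5)]

flexible_hit = [
    {"N": (0, 0, 0), "O": (6, 0, 0), "ring": (0, 7, 0)},
    {"N": (0, 0, 0), "O": (3, 0, 0), "ring": (0, 4, 0)},
]
# Each distance is achievable, but never both in the same conformer.
decoy = [
    {"N": (0, 0, 0), "O": (3, 0, 0), "ring": (0, 7, 0)},
    {"N": (0, 0, 0), "O": (6, 0, 0), "ring": (0, 4, 0)},
]
rigid_miss = [{"N": (0, 0, 0), "O": (9, 0, 0), "ring": (0, 4, 0)}]

results = {}
for name, confs in [("flexible_hit", flexible_hit), ("decoy", decoy),
                    ("rigid_miss", rigid_miss)]:
    screened_in = could_match(distance_ranges(confs), constraints)
    results[name] = (
        screened_in,
        matching_conformer(confs, constraints) if screened_in else None,
    )
print(results)
```

The screen answers "could this compound match?" using ranges alone, so it can only rule compounds out. The "decoy" passes it and is rejected only when individual conformers are examined, which is why the slower second phase is still required.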

Rational 3D Search Criteria: Pharmacophore Perception
  Relatively easy if receptor structure is known
    Otherwise, based on analysis of actives and inactives
  Relatively easy if compounds are rigid
    Otherwise difficult and expensive
    E.g., active analog approach: constrained systematic search, considering the intersection of the accessible conformation space of all compounds in the training set
  The concept of pharmacophores and detailed examples of pharmacophore development will be covered in the next lecture

Outline
  What is 3D searching?
  Why perform 3D searching?
  Essential components required
  Brief example

ACE Inhibitors

ACE Inhibitors Pharmacophore Model
  Mayer, D.; Naylor, C. B.; Motoc, I.; Marshall, G. R. “A unique geometry of the active site of angiotensin-converting enzyme consistent with structure-activity studies,” J. Comput.-Aided Mol. Des. 1987, 1(1), 3-16.
  Object-1: Zn-ligand (sulfhydryl or carboxylate oxygen)
  Object-2: H-bond acceptor (N, O, or F)
  Object-3: anion (–CS⁻, –COO⁻, –SO4²⁻, or –PO4³⁻)
  Object-4: indicates direction of lone pair on object-2
  Object-5: “central” atom in anion labeled object-3

Note Regarding Geometric Search Criteria
  Conformation space of potential ligands is, generally, a multi-dimensional space of high volume
  Combination of apparently broad criteria on each of the several axes results in a greatly reduced volume of conformation space to be explored
    Rubik’s cube example

Search for ACE Inhibitors
  Search performed at Lederle Laboratories by Sheridan, R. P.; Nilakantan, R.; Rusinko, A. III; Bauman, N.; Haraki, K. S.; Venkataraghavan, R., “3DSEARCH: A system for three-dimensional substructure searching,” J. Chem. Inf. Comput. Sci., 1989, 29, 255-260.
  Found 96 “hits” in their corporate database of 223,988 structures
  Required ca. 7 VAX-8650 CPU minutes [would require ca. 2 SGI R10k seconds]

Summary
  3D searching works but requires a team effort:
    Laboratory synthesis and testing (and/or HTS)
    Molecular modeling for query refinement
    Tight interface between modeling and searching software
    Hit list analysis, prioritization, post-processing
  3D searching can spark chemists’ imagination
  The more information provided by chemists, the more information returned by 3D search