3D Molecular Structures
C371, Fall 2004

Morgan Algorithm (Leach & Gillet, p. 8)

Bioisosteres (Leach & Gillet, p. 31)

Milestones In Chemical Information: IV (PW)
• Structure diagrams are planar but molecules are not, so existing 2D screening and graph-search methods needed to be extended to allow 3D substructure searching (Pfizer and Lederle, 1986-87)
• Sources of 3D structural data
  – Experimental data (Cambridge Structural Database)
  – Computational chemistry (quantum mechanics, molecular mechanics, molecular dynamics)
  – Structure-generation methods for databases of molecules
    • CONCORD (Texas, 1987)
    • CORINA (Munich/Erlangen, 1990)
• Further extensions to allow flexible searching (ICI, MDL and Tripos, 1991-94)

Milestones In Molecular Modelling: IV (PW)
• Use of 3D information in QSAR to facilitate structure-based approaches to drug discovery
• Comparative Molecular Field Analysis (CoMFA; Tripos, 1988) and related approaches
  – Calculate energies at points on a 3D grid surrounding each molecule
  – Statistical correlation with activity to identify important positions in space
  – Requires the molecules to be aligned in 3D

Pharmacophore (Leach & Gillet, p. 32)

3D Substructure Searching (PW)
[Figure: 3D query defined by three inter-feature distances, a = 8.62 ± 0.58 Å, b = 7.08 ± 0.56 Å and c = 3.35 ± 0.65 Å, shown with the chemical structure diagrams of several molecules]

Current Activities: Virtual Screening (PW)
• Need to prioritise the many molecules that could be tested
• Increasingly sophisticated levels of filtering to maximise the number of potential leads
  – “Drugability” considerations
  – Similarity searching (both 2D and 3D) using initial weak leads
  – 3D substructure searching once possible pharmacophoric patterns have been identified
  – Docking once the 3D structure of the biological target is available

Cambridge Structural Database
• X-ray crystal structures of more than 250,000 compounds (organic and organometallic)
• Established in 1965
• Textual queries
• Structural queries
• Specific 3D constraints (conformation or distance variables)

Protein Data Bank
• More than 25,000 X-ray and NMR structures of proteins and protein-ligand complexes
• Some nucleic acid and carbohydrate structures
• Founded in 1971 at Brookhaven National Laboratory; now run by a consortium
• Retrieval by textual queries or, in some interfaces, by amino acid sequence

Uses of the CSD and PDB
• Data mining for conformational properties (CSD & PDB)
• Data mining for information about intermolecular interactions (CSD & PDB)
• Further understanding of the nature of protein structure and its relationship to amino acid sequence (PDB)
• Homology modeling (comparative modeling) (PDB)

3D Pharmacophores
• Definition: a set of features, together with their relative spatial orientation, that are thought to be capable of interacting with a particular biological target
  – Hydrogen bond donors and acceptors
  – Positively and negatively charged groups
  – Hydrophobic regions and aromatic rings
• Depends on atomic properties rather than element types
• Does not depend on specific chemical connectivity
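A 3D pharmacophore query like the one in the 3D substructure searching figure (three inter-feature distances, each with a tolerance) can be tested against a single conformation by trying every assignment of the molecule's features to the query points. The sketch below is a brute-force illustration, not the algorithm of any particular system: it assumes each molecule has already been reduced to a list of (feature type, coordinates) pairs, and the feature types, the pairing of distances a, b and c with particular query points, the function name `matches_query` and the example coordinates are all hypothetical.

```python
import itertools
import math

# Hypothetical query: feature type for each query point and, for each pair of
# points, (target distance, tolerance) in Angstroms. Which point pair carries
# distance a, b or c is an assumption made for illustration.
QUERY_TYPES = ["acceptor", "donor", "acceptor"]
QUERY_DISTANCES = {
    (0, 1): (8.62, 0.58),  # distance a
    (1, 2): (7.08, 0.56),  # distance b
    (0, 2): (3.35, 0.65),  # distance c
}


def matches_query(points, types, distances):
    """Return the first tuple of point indices that satisfies the query, else None.

    points: list of (feature_type, (x, y, z)) for one conformation.
    """
    for combo in itertools.permutations(range(len(points)), len(types)):
        # Each chosen molecule point must have the type of its query point.
        if any(points[idx][0] != types[q] for q, idx in enumerate(combo)):
            continue
        # Every query distance must be reproduced within its tolerance.
        if all(
            abs(math.dist(points[combo[i]][1], points[combo[j]][1]) - target) <= tol
            for (i, j), (target, tol) in distances.items()
        ):
            return combo
    return None


# Hypothetical conformation with two acceptors and one donor.
molecule = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("donor", (5.28, 6.81, 0.0)),
    ("acceptor", (3.35, 0.0, 0.0)),
]
print(matches_query(molecule, QUERY_TYPES, QUERY_DISTANCES))  # (0, 1, 2)
```

Because every ordered selection of features is enumerated, the cost grows quickly with the number of features in the molecule, which is one reason a rapid screen is normally applied before any detailed geometric match.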
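The rapid screen that precedes graph matching in 3D database searching can exploit interatomic distances: each pair of features in a molecule is recorded as a key made of the two feature types plus a binned distance. This is a minimal sketch, assuming a hypothetical 1 Å bin width and a hypothetical three-point query built from the distances a, b and c in the figure; `distance_bin`, `molecule_screens` and `passes_screen` are illustrative names, not part of any real package, and real systems typically precompute such screens as bitstrings rather than Python sets.

```python
import itertools
import math

BIN_WIDTH = 1.0  # hypothetical distance-bin width in Angstroms

# Hypothetical three-point query: (target, tolerance) in Angstroms per point pair.
QUERY_TYPES = ["acceptor", "donor", "acceptor"]
QUERY_DISTANCES = {(0, 1): (8.62, 0.58), (1, 2): (7.08, 0.56), (0, 2): (3.35, 0.65)}


def distance_bin(d):
    """Map a distance to an integer bin index."""
    return int(d // BIN_WIDTH)


def molecule_screens(points):
    """Set of (type, type, bin) keys, one per pair of features in the molecule."""
    keys = set()
    for (ta, pa), (tb, pb) in itertools.combinations(points, 2):
        keys.add(tuple(sorted((ta, tb))) + (distance_bin(math.dist(pa, pb)),))
    return keys


def passes_screen(points, types, distances):
    """Necessary (not sufficient) test for a match to the query."""
    keys = molecule_screens(points)
    for (i, j), (target, tol) in distances.items():
        pair = tuple(sorted((types[i], types[j])))
        low, high = distance_bin(target - tol), distance_bin(target + tol)
        # At least one feature pair of these types must have a distance in a
        # bin that overlaps the tolerated range.
        if not any(pair + (b,) in keys for b in range(low, high + 1)):
            return False
    return True


candidate = [
    ("acceptor", (0.0, 0.0, 0.0)),
    ("donor", (5.28, 6.81, 0.0)),
    ("acceptor", (3.35, 0.0, 0.0)),
]
print(passes_screen(candidate, QUERY_TYPES, QUERY_DISTANCES))  # True
```

The screen never rejects a molecule that truly matches, because a distance within the tolerance always falls in one of the checked bins. It can, however, pass molecules that fail the full match, since the bins ignore which specific points share each distance; those false positives are removed by the graph-matching stage.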

Lipinski Rule of Five
• Poor absorption or permeation is more likely when a molecule has:
  – More than five hydrogen bond donors
  – More than ten hydrogen bond acceptors
  – LogP greater than five
  – Molecular weight greater than 500

3D Database Searching
• As with 2D searching, usually involves a two-stage process
  – Rapid screen to eliminate molecules that cannot match the query
  – Graph matching to identify the molecules that do match
• Interatomic distances between pairs of atoms are important

Structure Generation Programs
• CONCORD (Coordinates found in the CAS Registry File)
• CORINA (COoRdINAtes)
  – About CORINA
  – Generating 3D structures with CORINA

Conformational Search and Analysis; Systematic Conformational Search
• Goal of conformational analysis: identify all accessible minimum-energy structures of a molecule
• Global minimum-energy conformation: the minimum with the lowest energy
• Systematic searches assign values, in regular increments, to the torsion angles of the rotatable bonds in the molecule

Random Conformational Search
• Simulated annealing: the temperature is gradually reduced from a high value to a low one

Other Conformational Searches
• Distance geometry
• Molecular dynamics

Deriving 3D Pharmacophores
• Pharmacophore mapping: the process of deriving a 3D pharmacophore, which must take account of:
  – Conformational flexibility
  – Different combinations of pharmacophoric groups in the molecule
• Genetic algorithms: a class of optimization methods based on computational models of Darwinian evolution

Applications: Structural Genomics
• Definitions (goals)
  – Characterization of all protein structures in a given genome
  – Provide sufficient coverage of fold space to facilitate accurate homology modeling of the majority of proteins of biological interest
  – PDB Target Database (http://targetdb.rcsb.org/)

Searching 3D Protein Structures (PW)
• Searching protein sequences is well established: how can the 3D structures in the Protein Data Bank (PDB) be searched?
• Extensive collaboration between Information Studies and Molecular Biology and Biotechnology to develop graph representations of proteins that can be searched with isomorphism algorithms analogous to those used for chemical structures
• Focus here on folding motifs (secondary structure elements) in proteins, but other structures can also be searched:
  – Protein amino acid sidechains
  – Carbohydrates
  – Nucleic acids

Representation Of Protein Folding Motifs: I (PW)
• The helix and strand secondary structure elements (SSEs) are both approximately linear, repeating structures, and can hence be represented by vectors drawn along their major axes
• The nodes of the graph are these vectors and the edges comprise:
  – The angle between a pair of vectors
  – The distance of closest approach of the two vectors
  – The distance between the vectors’ mid-points
• PROTEP compares such representations using a maximal common subgraph isomorphism algorithm to identify common folds

Representation Of Protein Folding Motifs: II (PW)

Structural Relationship Between Leucine Aminopeptidase And Carboxypeptidase A (PW)
• Use of 1LAP as the target for a PROTEP search requiring structures with at least 7 SSEs in common with the target
• The four carboxypeptidase structures in the PDB at that time share with 1LAP a common fold containing five helices and eight strands in a sheet
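Each edge in the SSE graph described above carries three geometric values: the angle between two axis vectors, their distance of closest approach, and the distance between their mid-points. The following sketch computes them for one pair of SSEs, assuming each axis has already been fitted and is given as a start and end point in Å (fitting the axes from atomic coordinates is not shown). `sse_edge` is a hypothetical helper, not PROTEP's code, and treating "closest approach" as the standard clamped segment-to-segment distance is an assumption.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def clamp(x):
    return max(0.0, min(1.0, x))


def sse_edge(seg1, seg2):
    """Return (angle_deg, closest_approach, midpoint_distance) for two SSE axes.

    Each segment is ((x, y, z), (x, y, z)) and must have non-zero length.
    """
    (p1, q1), (p2, q2) = seg1, seg2
    d1, d2, r = sub(q1, p1), sub(q2, p2), sub(p1, p2)
    a, e, b = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    c, f = dot(d1, r), dot(d2, r)
    # Angle between the axis vectors (0-180 degrees, so direction matters).
    angle = math.degrees(math.acos(max(-1.0, min(1.0, b / math.sqrt(a * e)))))
    # Parameters s, t of the closest points on the two segments, clamped to [0, 1].
    denom = a * e - b * b
    s = clamp((b * f - c * e) / denom) if denom > 1e-12 else 0.0  # parallel: pick s = 0
    t = (b * s + f) / e
    if t < 0.0:
        t, s = 0.0, clamp(-c / a)
    elif t > 1.0:
        t, s = 1.0, clamp((b - c) / a)
    c1 = tuple(p + s * d for p, d in zip(p1, d1))
    c2 = tuple(p + t * d for p, d in zip(p2, d2))
    mid1 = tuple((x + y) / 2 for x, y in zip(p1, q1))
    mid2 = tuple((x + y) / 2 for x, y in zip(p2, q2))
    return angle, math.dist(c1, c2), math.dist(mid1, mid2)


# Hypothetical axes: one along x, one along y crossing 3 A above its middle.
helix = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand = ((5.0, -5.0, 3.0), (5.0, 5.0, 3.0))
print(tuple(round(v, 2) for v in sse_edge(helix, strand)))  # (90.0, 3.0, 3.0)
```

Computing these three values for every pair of SSEs gives the edge labels of a protein graph; comparing two such graphs to find the largest shared set of SSEs is the maximal common subgraph step, which is not shown here.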
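The Lipinski Rule of Five criteria listed earlier are simple threshold tests and translate directly into code. This is a sketch, assuming the four properties have already been calculated elsewhere (how LogP is estimated is outside its scope); the `MoleculeProperties` class and `rule_of_five_violations` function are hypothetical names. It only reports which criteria are exceeded, leaving the caller to decide how many violations should flag a molecule.

```python
from dataclasses import dataclass


@dataclass
class MoleculeProperties:
    """Hypothetical container for precomputed descriptor values."""
    hbond_donors: int
    hbond_acceptors: int
    log_p: float
    molecular_weight: float


def rule_of_five_violations(props):
    """Return the Rule of Five criteria that the molecule exceeds."""
    checks = [
        (props.hbond_donors > 5, "more than five hydrogen bond donors"),
        (props.hbond_acceptors > 10, "more than ten hydrogen bond acceptors"),
        (props.log_p > 5, "LogP greater than five"),
        (props.molecular_weight > 500, "molecular weight greater than 500"),
    ]
    return [label for failed, label in checks if failed]


# Hypothetical descriptor values, not a real compound.
print(rule_of_five_violations(MoleculeProperties(3, 12, 5.4, 480.0)))
# ['more than ten hydrogen bond acceptors', 'LogP greater than five']
```

Note that the thresholds are strict: a molecule with exactly five donors, ten acceptors, a LogP of 5 and a molecular weight of 500 reports no violations.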
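A systematic conformational search, as described on the Conformational Search and Analysis slide, steps every rotatable-bond torsion through a fixed set of values and evaluates each combination. With N rotatable bonds and a torsion increment of θ degrees, the number of conformations generated is (360/θ)^N; for example, six rotatable bonds at 30° increments give 12^6 = 2,985,984 conformations, so the cost grows exponentially with flexibility. The sketch below uses a hypothetical toy energy function (a threefold cosine term per bond) in place of a real force field, and `systematic_search` is an illustrative name.

```python
import itertools
import math


def torsion_energy(angles_deg):
    """Hypothetical toy energy (arbitrary units): 1 + cos(3*phi) per rotatable bond."""
    return sum(1.0 + math.cos(math.radians(3 * a)) for a in angles_deg)


def systematic_search(n_rotatable, step_deg=60):
    """Evaluate every torsion combination on a regular grid; return count and best."""
    grid = range(0, 360, step_deg)
    conformers = list(itertools.product(grid, repeat=n_rotatable))
    best = min(conformers, key=torsion_energy)
    return len(conformers), best, torsion_energy(best)


count, best, energy = systematic_search(2)
print(count, best, energy)  # 36 (60, 60) 0.0
```

In this toy function the minima lie at 60°, 180° and 300° for every bond, so several combinations share the lowest energy and `min` simply returns the first one it meets; a real search would keep all distinct low-energy minima.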
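Simulated annealing, listed under random conformational search, perturbs the torsion angles at random, always accepts a move that lowers the energy, and accepts a move that raises it by ΔE with probability exp(−ΔE/T), while the temperature T is gradually lowered. The sketch reuses a hypothetical toy energy function; the move size, the geometric cooling schedule and all parameter defaults in `simulated_annealing` are arbitrary illustrative choices, not recommended values.

```python
import math
import random


def torsion_energy(angles_deg):
    """Hypothetical toy energy (arbitrary units): 1 + cos(3*phi) per rotatable bond."""
    return sum(1.0 + math.cos(math.radians(3 * a)) for a in angles_deg)


def simulated_annealing(n_rotatable, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis random search with a geometrically decreasing temperature."""
    rng = random.Random(seed)
    current = [rng.uniform(0.0, 360.0) for _ in range(n_rotatable)]
    e_current = torsion_energy(current)
    best, e_best = list(current), e_current
    for k in range(steps):
        # Temperature falls from t_start to t_end over the run.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Random move: perturb one torsion angle.
        trial = list(current)
        i = rng.randrange(n_rotatable)
        trial[i] = (trial[i] + rng.gauss(0.0, 30.0)) % 360.0
        e_trial = torsion_energy(trial)
        # Metropolis criterion: always accept downhill moves, and uphill
        # moves with probability exp(-dE / T).
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


angles, energy = simulated_annealing(3)
print([round(a, 1) for a in angles], round(energy, 4))
```

At high temperature almost every move is accepted, so the search can cross energy barriers; as T falls, uphill moves become rare and the search settles into a low-energy region. Unlike the systematic search, there is no guarantee that the global minimum is found.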