The University of Texas at Dallas

The Database of Molecular Motions

Anastasia Kurdia
2006

Introduction
MolMov database
Morph Server
Homogenization of input
Hinge prediction
Summary
References

Introduction

Studying molecular motion leads to an understanding of the functions of molecules, since motion is closely related to the way a macromolecule of a given structure fulfills a particular function. Obtaining coherent input data, generating a feasible trajectory within acceptable time bounds, and classifying, storing, retrieving, comparing and analyzing the results are only some of the varied challenges that arise in the exploration of molecular motion.
This paper describes how the MolMov project of Mark Gerstein's lab at Yale University addresses these problems with a software family consisting of the Database of Molecular Motions (MolMov database) and several supporting applications (Morph Server, HingeMaster and others). It complements the class presentation of the web interface to MolMov and attempts to address the questions raised during class discussion as well as those from the project description. The MolMov database has existed for a decade and appears to have undergone major modifications only recently. The current papers merely outline the changes and report case studies; little is written on the details of simulation or visualization. Therefore, less attention is paid here to detailed analysis of algorithms; the emphasis is instead on describing the principles, relationships, and strong and weak features of the procedures used.

MolMov database

The Database of Molecular Motions is a collection of movies characterizing the motion of biomolecules. It was the first application to introduce and employ a standardized notation for describing motions. Since a single molecule can exhibit various motions, and one type of motion can be common to a large family of molecules, a motion should be characterized independently of any specific molecule.

One feature that serves as a basis for motion classification is the size of the moving unit: fragment, domain or subunit [6]. Domain motion, a large-scale motion, is the most common type. Motion of fragments smaller than domains is referred to as fragment motion and describes the motion of surface loops or secondary structure elements. For proteins, domain and fragment motions usually involve portions of the protein closing around a binding site. Subunit motions are relative movements of whole subunits within a multi-subunit complex.

Fragment and domain motions of proteins are further distinguished on the basis of the packing of atoms inside the protein [6]. Tertiary structure usually greatly restricts the range of motion.
Shear motion is a sliding motion that is distributed over a great number of bonds but does not induce repacking. Hinge motion involves the movement of two or more domains, underconstrained by packing, around the links, or hinges, of the backbone that connect them, and it usually features just a few large dihedral angle changes.

The entries in the MolMov database are morphs, or movies, illustrating motions. Entries also contain a set of attributes that help to characterize and classify the motions: the maximum displacement of all atoms or of backbone atoms only, the degree of rotation around the hinge, and the number of residues with a large dihedral angle change. To guarantee direct comparability of these attributes, motions are placed in a unified coordinate system.

The database consists of two major parts: Protein Motions, which is smaller but populated manually, and Movies, which is filled with morphs produced by the Morph Server from user-submitted input. Entries in the former contain more accurate information, often referencing a published description of the specific protein motion.

Gene Ontology annotation (GOA) terms defining the molecular function, cellular component and biological process of a protein have been added to the database. This not only improves the search capabilities of the database but also helps in understanding the connection between the type of motion and the role a protein plays [2].

Morph Server

The supporting Morph Server is an application that facilitates the generation of database entries. It computes a discrete pathway between start and end configurations, defined in .pdb files, and renders the resulting frames into a movie. The pathway is generated using adiabatic mapping. In this technique, selected atoms are moved along a given path that corresponds to the desired conformational change, while the other atoms are allowed to move freely, subject to potential energy minimization at each step [1]. The major advantage of this technique is its low computational cost.
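
As a rough illustration of the path-generation step only, the sketch below produces evenly spaced intermediate frames between two conformations by linear interpolation of Cartesian coordinates. It is a minimal sketch, not the Morph Server's code: the function name `interpolate_frames`, the coordinate layout (a list of (x, y, z) tuples with atoms in matching order) and the choice of frame count are assumptions made for this example, and the energy minimization that adiabatic mapping performs at each step is deliberately left out.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def interpolate_frames(start: List[Coord], end: List[Coord],
                       n_frames: int) -> List[List[Coord]]:
    """Return n_frames conformations going linearly from start to end.

    Hypothetical helper: both inputs must list the same atoms in the
    same order. No energy minimization is applied to the frames.
    """
    if len(start) != len(end):
        raise ValueError("start and end must contain the same atoms in the same order")
    if n_frames < 2:
        raise ValueError("at least two frames (start and end) are required")

    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frame = [
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames
```

In a full adiabatic mapping run, each interpolated frame would fix only the selected atoms and relax the remaining atoms by minimizing the potential energy before the frame is rendered.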
However, dependence on an a priori chosen path is its major drawback: if the molecule in fact moves along an alternative path (that is, any path deviating from linear interpolation between the start and end configurations [2]), the results of adiabatic mapping are far from physical. Moreover, the energy minimization step is fast for local motions but tends to slow the computation down for large domain motions. Lastly, the dependence of the thermodynamic potential and entropy of a molecule on temperature is not accounted for during energy minimization [15], which also lowers the credibility of the result.

An alternative interpolation method, FRODA lite, based on the newly introduced FRODA technique, was recently added to the Morph Server. The original FRODA algorithm first finds rigid bodies within a protein by counting the internal degrees of freedom of the molecule and identifying constrained regions. Each rigid body (the smallest rigid body being a single atom) is assigned a so-called ghost template, so that each atom belongs to at least one ghost template. Ghost templates can intersect only at a vertex of a rotatable dihedral angle. Then, by randomly displacing ghost templates and iteratively fitting the remaining atoms into new locations so that the constraints imposed by bond lengths, dihedral angle values and van der Waals radii are satisfied, FRODA finds a new feasible configuration. In each iteration, a least-squares fit of every ghost template to the new positions of its atoms is computed, and the displaced atoms are then fitted to their positions in the refitted ghost template. Carbon atoms that belong to two ghost templates are placed equidistant from the corresponding points of both templates. When all atoms lie within a tolerance distance (0.125 Å is the value used by the Morph Server) of their respective ghost template points, a new configuration has been found.
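
The acceptance test that ends the fitting loop can be sketched as follows. This is an illustration under stated assumptions, not FRODA's implementation: the function name `within_tolerance` and the flat pairing of each atom with a single ghost template point are hypothetical (an atom shared by two templates would need one check per template); only the 0.125 Å tolerance is taken from the text.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

TOLERANCE = 0.125  # angstroms; the value the text reports for the Morph Server


def within_tolerance(atoms: Sequence[Coord],
                     template_points: Sequence[Coord],
                     tolerance: float = TOLERANCE) -> bool:
    """Return True when every atom lies within `tolerance` of its template point.

    Hypothetical helper: atoms[i] is assumed to be paired with
    template_points[i].
    """
    if len(atoms) != len(template_points):
        raise ValueError("each atom needs exactly one template point")
    return all(
        math.dist(atom, point) <= tolerance
        for atom, point in zip(atoms, template_points)
    )
```

A fitting loop would repeat the template fit and atom fit until this check passes or an iteration limit is reached.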
In the directed version of FRODA, the displacement of ghost templates is biased towards the final configuration. A random component is nevertheless retained in the displacement, which helps to ensure that a simulation reaches the destination configuration even if at some step not all constraints can be satisfied (in that case, the simulation backs off to the previous configuration and continues morphing [2]).

Each new conformation produced by a step of the FRODA simulation is guaranteed to be sterically possible, and thus the resulting pathway is also theoretically possible. However, it is not yet clear how closely the computed path corresponds to the real one. Moreover, the Morph Server uses the FRODA lite version, which does not take hydrogen atoms into account and therefore ignores the constraints due to hydrogen bonds. Although the atomic radius of hydrogen, and the spherical space associated with it, is smaller than that of the other atoms of a protein backbone, it is not negligible, and ignoring hydrogen atoms may allow steric clashes. On the other hand, the original .pdb files may not contain hydrogen positions and, as discussed below, the Morph Server needs the corresponding atoms of the start and end configurations in precisely the same order, so considering hydrogens would increase the dependence on algorithms [9],[11] that add the appropriate hydrogen atoms to the input file.

Homogenization of input

The initial and final configurations, represented by .pdb coordinate files, do not necessarily have a one-to-one correspondence between their residues. Moreover, not just a functional motion but an evolutionary path between two conformations may be of interest, and therefore the start and end configurations may differ significantly in their sequences of atoms. The first problem faced by any software that produces a pathway between two conformations is to find an association between the atoms of the initial and final configurations.
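
In the simplest case, where the two files already share chain identifiers and residue numbering, this association reduces to a key lookup. The sketch below is not the Morph Server's preprocessing; it only shows why consistent numbering matters. The function names `parse_atoms` and `match_atoms` are hypothetical, only `ATOM` records are read, and insertion codes and alternate locations are ignored for brevity. The column positions follow the fixed-width PDB `ATOM` record layout.

```python
from typing import Dict, Iterable, List, Tuple

AtomKey = Tuple[str, int, str]  # (chain ID, residue number, atom name)
Coord = Tuple[float, float, float]


def parse_atoms(lines: Iterable[str]) -> Dict[AtomKey, Coord]:
    """Collect coordinates of ATOM records, keyed by chain, residue and atom name."""
    atoms: Dict[AtomKey, Coord] = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        key = (line[21], int(line[22:26]), line[12:16].strip())
        atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def match_atoms(start: Dict[AtomKey, Coord], end: Dict[AtomKey, Coord]
                ) -> Tuple[List[AtomKey], List[AtomKey], List[AtomKey]]:
    """Return (atoms in both, atoms only in start, atoms only in end)."""
    common = sorted(start.keys() & end.keys())
    only_start = sorted(start.keys() - end.keys())
    only_end = sorted(end.keys() - start.keys())
    return common, only_start, only_end
```

Atoms reported as present in only one file are the ones whose positions would have to be rebuilt, and if the numbering differs between the files the keys do not line up at all, which is the case an alignment step has to resolve.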
In the Morph Server, both configurations are first parsed with X-PLOR [14] to find missing non-hydrogen atoms in known amino acids. If atoms missing from one conformation are present in the other, their locations are estimated by superimposing and rotating that conformation; otherwise, no specific input is passed to the next step. Known atoms are then fixed, and the positions of the missing atoms are found after 1000 steps of energy minimization.

Then a sequence alignment is performed. Although at most two input files can currently be submitted to the Morph Server to obtain a morph, the server is meant to handle up to 10 input files, so multiple sequence alignment algorithms are built into it. If the two submitted sequences exhibit a high degree of similarity, the AMPS [7] algorithm is used to perform the alignment. If the two sequences represent very distant homologues, a structural alignment that uses the 3D coordinates of the atoms is performed instead. The user can define the similarity cutoff at which sequence alignment is replaced by structural alignment.

The developers of the Morph Server freely distribute the morphing script; however, the script by itself cannot produce a feasible morph. The complexity of the .pdb format creates the need for input homogenization: for successful morphing, corresponding residues must be numbered identically in both input files. Although input preprocessing is claimed to be a truly novel functionality of the Morph Server [5], the papers describing the server give no description more detailed than the one outlined above.

Hinge prediction

A key element in studying the structural mobility of proteins is identifying regions of flexibility in the backbone. It has been observed that a single rotation about a bond may cause global motion of the protein. FlexOracle, a component of HingeMaster, is another tool in the family of applications that accompany the Database of Molecular Motions.
Taking the configuration file of a single molecule as input, FlexOracle predicts the location of hinges: it splits the molecule into two chains after some residue i and computes the intramolecular potential energy of each chain using CHARMm [12]. The energies of the two chains are summed and stored, and the split is repeated on the original molecule for each value of i. The bonds at values of i that yield lower energy are predicted to lie in hinges. The potential energy computation implicitly assumes that the protein is soluble, in order to take protein-solvent interaction into account. The nature of the algorithm suggests that it works only for a single molecule, not a complex, of a soluble protein [2]. Experimental results showed the algorithm's success in predicting a hinge in such an uncommon place as within an alpha helix of a small protein [8].

After hinges have been identified, FlexOracle applies forces to one domain of the protein while keeping the rest of the molecule locked in place. Computing the forces needed to move the domain in each direction allows prediction of the path that matches the natural path of the moving protein, the 'path of least resistance'. The molecule's motion around the hinges is morphed along this path.

A more thorough description of the hinge prediction algorithm is expected to appear in a paper by Samuel Flores et al. Without looking at the details and the results of experiments on proteins larger than a hundred residues, it is hard to estimate the performance of the hinge prediction algorithm.

Hinges, sometimes with short, relatively rigid regions between them, constitute loops. Finding hinges and analyzing the amino acids that constitute them may provide insight into the problem of efficient loop identification outlined in the project description.

Summary

From the very beginning, the Database of Macromolecular Motions and its satellite applications were developed with a focus on speed, not chemical realism [3].
How plausible the resulting morphs are depends heavily on how distant the start and end configurations are, how large the chosen iteration step is, and how close the real pathway is to a linear interpolation between the start and end points. Modern molecular dynamics simulation techniques are too costly for web-based software. The tight coupling of the Morph Server to the very large MolMov database restricts the development and distribution of the former as a stand-alone, offline application. Therefore, alternative algorithms that are both fast and produce reasonably good morphs should become part of the Morph Server. The addition of the FRODA lite option enhances the credibility of the created morphs.

The Morph Server has a unique capability of morphing the evolutionary motion of distant homologues, while FRODA produces sterically possible conformations. Coupled together, these two approaches could produce an evolutionary pathway between start and end configurations that also has meaningful intermediate steps; in addition, the Morph Server's input preprocessing mechanism could be used in FRODA for cases in which good-quality input data are not available.

Standardized classification of motions is another significant feature of the database. However, not all molecular motions can be computed and placed into the existing categories. The Morph Server can now process proteins and DNA/RNA sequences. Developing separate computational engines for proteins and nucleic acids that take into account the specific features of these classes of molecules would probably enhance the quality of the resulting morphs. Also, further refinement of the classification techniques could increase the number of classifiable motions and establish new relationships between different types of motion.

References

[1] Stewart A. Adcock and J. Andrew McCammon. Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins. Chem. Rev., 2006, 106(5), pp. 1589-1615.
[2] Samuel Flores, Nathaniel Echols, Duncan Milburn, Brandon Hespenheide, Kevin Keating, Jason Lu, Stephen Wells, Eric Z. Yu, Michael Thorpe and Mark Gerstein. The Database of Macromolecular Motions: new features added at the decade mark. Nucleic Acids Research, 2006, Vol. 34, Database issue, D296-D301.
[3] N. Echols, D. Milburn and M. Gerstein. MolMovDB: analysis and visualization of conformational change and structural flexibility. Nucleic Acids Res., 2003, 31, pp. 478-482.
[4] M.F. Thorpe and P.M. Duxbury (editors). Rigidity Theory and Applications. New York: Kluwer Academic, 2002.
[5] MolMov database and web interface to supporting applications: molmovdb.org
[6] W. G. Krebs and M. Gerstein. The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework. Nucleic Acids Res., 2000, 28, pp. 1665-1675.
[7] G.J. Barton and M.J.E. Sternberg. A strategy for the rapid multiple alignment of protein sequences: confidence levels from tertiary structure comparisons. J. Mol. Biol., 1987, 198, pp. 327-337.
[8] E. Landhuis. From Sight To Insight: Visualization tools yield biomedical success stories. Biomedical Computation Review, Winter 2005-2006: 23.
[9] J.M. Word, S.C. Lovell, J.S. Richardson and D.C. Richardson. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol., 1999, 285, pp. 1735-1747.
[10] Reduce software: http://kinemage.biochem.duke.edu/
[11] Whatif server: http://swift.cmbi.kun.nl/
[12] CHARMm source server: http://www.charmm.org/
[13] CHARMm tutorial: http://www.ch.embnet.org/MD_tutorial/
[14] X-PLOR: http://xplor.csb.yale.edu/xplor/
[15] J.A. McCammon and S. Harvey. Dynamics of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, 1987.