Anastasia's project - The University of Texas at Dallas

The University of Texas at Dallas
The Database of Molecular Motions
Anastasia Kurdia
2006
Introduction
MolMov database
Morph Server
Homogenization of input
Hinge prediction
Summary
References
Introduction
Studying molecular motion helps us understand the functions of molecules, since motion is
closely related to the way a macromolecule of a given structure fulfills a particular function.
Obtaining coherent input data, generating a feasible trajectory within acceptable time bounds, and
classifying, storing, retrieving, comparing and analyzing results are only some of the diverse
challenges that arise when exploring molecular motion. This paper describes how the MolMov
project of Mark Gerstein’s lab at Yale University addresses these problems in a family of software
consisting of the Database of Molecular Motions (MolMov database) and several supporting
applications (Morph Server, HingeMaster and others). It complements the class presentation of the
MolMov web interface and attempts to address the questions raised during class discussion as well
as those from the project description. The MolMov database has existed for a decade and appears
to have undergone dramatic modifications only recently. The current papers only outline the
changes and report case studies; little is written on the details of simulation or visualization.
Therefore, less attention is paid here to detailed analysis of algorithms; instead, the emphasis is on
describing the principles, relationships, and strong and weak features of the procedures used.
MolMov database
The Database of Molecular Motions is a collection of movies characterizing the motion of
biomolecules. It was the first application to introduce and employ a standardized notation for
describing motions. Since a single molecule can demonstrate various motions and, at the same time,
one type of motion can be common to a large family of molecules, a motion characteristic should be
designed independently of any specific molecule.
One feature that serves as a basis for motion classification is motion size: subunit, domain or
fragment [6]. Large-scale motion, or domain motion, is the most common type of motion. Motion of
fragments smaller than domains is referred to as fragment motion and describes the motion of
surface loops or secondary structures. For proteins, domain and fragment motions usually involve
portions of the protein closing around a binding site. Subunit motions are small-scale motions.
Fragment and domain protein motions also differ on the basis of the packing of atoms inside the
proteins [6]. Tertiary structure usually greatly restricts the range of motion. A shear motion is a
sliding motion spread over a great number of bonds that does not induce repacking. A hinge motion
involves two or more domains, not constrained by packing, moving around the links (hinges) that
connect them, and usually features just a few dramatic dihedral angle changes.
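Because a hinge motion concentrates on a few large dihedral changes while a shear motion spreads small changes over many bonds, comparing backbone dihedral angles between two conformations is one simple way to see the difference. The following sketch computes a dihedral angle from four atom positions with the standard atan2 formula and lists residues whose angle changes by more than a threshold. The function names and the 60° threshold are illustrative assumptions, not parameters taken from MolMov.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n1 for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def angle_change(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    return abs((b - a + 180.0) % 360.0 - 180.0)

def large_changes(start_angles, end_angles, threshold=60.0):
    """Indices of residues whose dihedral changes by more than `threshold` degrees
    (hypothetical cutoff). Few such residues suggest a hinge rather than a shear motion."""
    return [i for i, (a, b) in enumerate(zip(start_angles, end_angles))
            if angle_change(a, b) > threshold]
```

The wrap-around in `angle_change` matters: a change from 170° to -170° is a 20° rotation, not 340°.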
The entries in the MolMov database are morphs, or movies, illustrating motions. Entries also contain
a set of attributes that help to characterize and classify the motions: the maximum displacement of
all atoms or of backbone atoms only, the degree of rotation around the hinge, and the number of
residues with a large dihedral angle change. To guarantee direct comparability of these attributes,
motions are placed in a unified coordinate system. The database consists of two major parts: Protein
motions, which is smaller but populated manually, and Movies, which is filled with morphs produced
by the Morph Server from user-submitted input. Entries in the former contain more accurate
information, often referencing a published description of the specific protein motion. Gene Ontology
annotation (GOA) terms defining the molecular function, cellular component and biological process
of a protein have been added to the database. This not only increases the search capabilities of the
database, but also helps to understand the connection between the type of motion and the role a
protein plays [2].
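The displacement and rotation attributes listed above can be illustrated with a short sketch. It assumes both conformations contain the same atoms in the same order, and, as a simplification, it only removes the overall translation (a real superposition into a common frame would also fit the rotation). The hinge angle is approximated as the angle between the pivot-to-domain-centroid vectors before and after the motion. The `Atom` record, the backbone atom set and both functions are hypothetical names for illustration, not the MolMov schema.

```python
import math
from dataclasses import dataclass

BACKBONE = {"N", "CA", "C", "O"}  # PDB backbone atom names

@dataclass
class Atom:
    name: str      # PDB atom name, e.g. "CA"
    start: tuple   # (x, y, z) in the start conformation
    end: tuple     # (x, y, z) in the end conformation

def _centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def max_displacements(atoms):
    """Maximum displacement over all atoms and over backbone atoms only,
    after moving both conformations to a common centroid (rotation not fitted)."""
    cs = _centroid([a.start for a in atoms])
    ce = _centroid([a.end for a in atoms])

    def disp(a):
        s = tuple(a.start[k] - cs[k] for k in range(3))
        e = tuple(a.end[k] - ce[k] for k in range(3))
        return math.dist(s, e)

    all_max = max(disp(a) for a in atoms)
    backbone = [disp(a) for a in atoms if a.name in BACKBONE]
    return all_max, (max(backbone) if backbone else 0.0)

def hinge_rotation_deg(pivot, domain_start, domain_end):
    """Angle between the pivot->domain-centroid vectors before and after the motion."""
    u = [c - p for c, p in zip(_centroid(domain_start), pivot)]
    v = [c - p for c, p in zip(_centroid(domain_end), pivot)]
    cos_angle = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

A pure translation of the whole molecule yields zero displacement here, which is the point of working in a common coordinate frame.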
Morph Server
The supporting Morph Server is an application that facilitates the generation of database entries.
The Morph Server computes a discrete pathway between the start and end configurations, given as
.pdb files, and renders the resulting frames into a movie. The pathway is generated using adiabatic
mapping. In this technique, selected atoms are moved along a given path that corresponds to the
desired conformational change, while the other atoms are allowed to move freely, subject to
potential energy minimization at each step [1]. The major advantage of this technique is its low
computational cost. However, its dependence on an a priori chosen path is its major drawback: if the
molecule in fact moves along an alternative path (in practice, any path deviating from linear
interpolation of the trajectory between the start and end configurations [2]), the results of adiabatic
mapping are far from physical. Moreover, the energy minimization step is fast for local motions, but
tends to slow the computation down for large domain motions. Lastly, the temperature dependence of
the thermodynamic potential and entropy of the molecule is not accounted for during energy
minimization [15], which further lowers the credibility of the result.
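A minimal 2D toy version of adiabatic mapping is sketched below. It assumes harmonic bonds as the only energy term and straight-line interpolation as the imposed path: the driven atoms are moved along that path, and after every frame the free atoms are relaxed by plain steepest descent. The function names, step sizes and energy function are assumptions for illustration; the papers do not describe the Morph Server's actual force field or minimizer.

```python
import math

def bond_gradient(pos, bonds, k=1.0):
    """Gradient of the toy energy E = sum k * (|r_i - r_j| - r0)**2 over bonds (i, j, r0)."""
    grad = [[0.0, 0.0] for _ in pos]
    for i, j, r0 in bonds:
        dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
        r = math.hypot(dx, dy) or 1e-12  # guard against coincident atoms
        f = 2.0 * k * (r - r0) / r
        grad[i][0] += f * dx
        grad[i][1] += f * dy
        grad[j][0] -= f * dx
        grad[j][1] -= f * dy
    return grad

def adiabatic_map(start, end, driven, bonds, n_frames=10, n_min=200, lr=0.1):
    """Drive the atoms in `driven` along straight lines from `start` to `end`;
    after each move, relax all other atoms by steepest descent on the bond energy.
    Returns n_frames + 1 conformations."""
    pos = [list(p) for p in start]
    frames = [[tuple(p) for p in pos]]
    for f in range(1, n_frames + 1):
        t = f / n_frames
        for i in driven:  # imposed path for the selected atoms
            pos[i] = [s + t * (e - s) for s, e in zip(start[i], end[i])]
        for _ in range(n_min):  # free atoms follow by energy minimization
            grad = bond_gradient(pos, bonds)
            for i, (gx, gy) in enumerate(grad):
                if i not in driven:
                    pos[i][0] -= lr * gx
                    pos[i][1] -= lr * gy
        frames.append([tuple(p) for p in pos])
    return frames

# Toy example: atom 0 is held fixed, atom 2 is dragged around it, atom 1 is free.
h = math.sqrt(0.19)
start = [(0.0, 0.0), (0.9, -h), (1.8, 0.0)]
end = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.8)]  # end[1] is ignored: atom 1 is free
frames = adiabatic_map(start, end, driven={0, 2}, bonds=[(0, 1, 1.0), (1, 2, 1.0)])
```

The free atom simply settles into the nearest energy minimum at each frame, so the whole trajectory is dictated by the path imposed on the driven atoms, which is exactly the weakness discussed above.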
An alternative interpolation method, FRODA lite, based on the newly introduced FRODA technique,
was recently added to the Morph Server. The original FRODA algorithm first finds rigid bodies
within a protein by counting the internal degrees of freedom of the molecule and identifying
constrained regions. Each rigid body (the smallest rigid body being a single atom) is assigned a
so-called ghost template, so that each atom belongs to at least one ghost template. Ghost templates
can intersect only at the vertex of a rotatable dihedral angle. Then, by randomly displacing ghost
templates and iteratively fitting the remaining atoms into new locations so that the constraints
imposed by bond lengths, dihedral angle values and van der Waals radii are satisfied, FRODA finds a
new feasible configuration: each ghost template is fit by least squares to the new positions of the
atoms, and the displaced atoms are then moved into their positions in the ghost template. Carbon
atoms that belong to two ghost templates are placed equidistant from the corresponding ghost
template points. If all atoms are located within a tolerance distance (0.125 Å is the value used by
the Morph Server) of the respective points of their ghost templates, a new configuration is found. In
the directed version of FRODA, the displacement of ghost templates is directed towards the final
configuration; however, a random component is also present in the displacement, which helps
ensure that the simulation reaches the destination configuration even if at some step not all
constraints can be satisfied (in that case, the simulation backs off to the previous configuration and
continues morphing [2]).
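The ghost-template cycle can be illustrated with a heavily simplified 2D toy: two rigid templates share one pivot atom (standing in for the vertex of a rotatable dihedral angle); each template is displaced partly towards the target plus a small random translation; every atom is placed at the mean of its ghost points; and templates are refit by least squares until every atom lies within the 0.125 Å tolerance, otherwise the step backs off. This is a sketch under those assumptions, not FRODA's implementation: real FRODA works in 3D, derives templates from a rigidity analysis and also checks bond, angle and van der Waals constraints. All names and step parameters are hypothetical.

```python
import math
import random

def rigid_fit_2d(ref, cur):
    """Least-squares rotation + translation of template geometry `ref` onto the
    points `cur`; returns the fitted (ghost) positions of the template points."""
    n = len(ref)
    rcx, rcy = sum(p[0] for p in ref) / n, sum(p[1] for p in ref) / n
    ccx, ccy = sum(p[0] for p in cur) / n, sum(p[1] for p in cur) / n
    dot = cross = 0.0
    for (rx, ry), (cx, cy) in zip(ref, cur):
        ax, ay, bx, by = rx - rcx, ry - rcy, cx - ccx, cy - ccy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # optimal 2D rotation angle
    c, s = math.cos(theta), math.sin(theta)
    return [(ccx + c * (rx - rcx) - s * (ry - rcy),
             ccy + s * (rx - rcx) + c * (ry - rcy)) for rx, ry in ref]

def froda_like_step(pos, templates, target, rng, step=0.3, noise=0.01,
                    tol=0.125, max_iter=500):
    """One directed step. `templates` is a list of (atom_indices, reference_coords);
    every atom must belong to at least one template. Returns the new conformation,
    or `pos` unchanged if the tolerance cannot be met (back-off)."""
    ghosts = []
    for idx, ref in templates:
        fitted = rigid_fit_2d(ref, [pos[i] for i in idx])
        # Directed part: a fraction `step` towards the target;
        # random part: a small Gaussian translation of the whole template.
        nx, ny = rng.gauss(0.0, noise), rng.gauss(0.0, noise)
        ghosts.append([(x + step * (target[i][0] - x) + nx,
                        y + step * (target[i][1] - y) + ny)
                       for (x, y), i in zip(fitted, idx)])
    for _ in range(max_iter):
        # Each atom goes to the mean of its ghost points, so an atom shared by
        # two templates ends up equidistant from both.
        acc = [[0.0, 0.0, 0] for _ in pos]
        for (idx, _ref), pts in zip(templates, ghosts):
            for i, (x, y) in zip(idx, pts):
                acc[i][0] += x
                acc[i][1] += y
                acc[i][2] += 1
        new_pos = [(sx / k, sy / k) for sx, sy, k in acc]
        # Refit every rigid template to the new atom positions.
        ghosts = [rigid_fit_2d(ref, [new_pos[i] for i in idx]) for idx, ref in templates]
        worst = max(math.dist(new_pos[i], g)
                    for (idx, _ref), pts in zip(templates, ghosts)
                    for i, g in zip(idx, pts))
        if worst < tol:
            return new_pos
    return pos  # constraints could not be satisfied: back off

def morph(start, templates, target, n_steps=30, seed=0):
    rng = random.Random(seed)
    frames = [list(start)]
    for _ in range(n_steps):
        frames.append(froda_like_step(frames[-1], templates, target, rng))
    return frames
```

For example, a straight five-atom chain with templates `[0, 1, 2]` and `[2, 3, 4]` can be morphed into an L shape by giving a target in which only the second template is rotated about atom 2.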
A new conformation produced at each step of a FRODA simulation is guaranteed to be sterically
possible, and thus the resulting pathway is also theoretically possible. However, it is not yet clear
how closely the computed path corresponds to the real one. Moreover, the Morph Server uses the
FRODA lite version, which does not take hydrogen atoms into account and therefore ignores the
constraints due to hydrogen bonds. Although the atomic radius of hydrogen, and the spherical space
associated with it, is smaller than that of other atoms on a protein backbone, it is not negligibly
small, and ignoring hydrogen atoms may allow steric clashes. On the other hand, since original .pdb
files may not contain hydrogen positions and, as discussed below, the Morph Server needs the
corresponding atoms of the start and end configurations in precise order, considering hydrogens
would increase the dependence on algorithms [9], [11] that fill input files with appropriate
hydrogen atoms.
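The effect of dropping hydrogens on clash detection can be sketched with a simple pairwise check. The sketch below uses Bondi van der Waals radii and flags non-bonded pairs closer than a fraction of their summed radii; the 0.8 scale factor, the data layout and the function name are illustrative assumptions, not FRODA lite settings.

```python
import math

# Bondi van der Waals radii in angstroms.
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}

def clashes(atoms, bonded, scale=0.8, with_h=True):
    """Return non-bonded atom pairs (i, j) closer than scale * (r_i + r_j).
    atoms: list of (element, (x, y, z)); bonded: set of frozenset({i, j})."""
    found = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (ei, pi), (ej, pj) = atoms[i], atoms[j]
            if not with_h and "H" in (ei, ej):
                continue  # a hydrogen-free model never sees these contacts
            if frozenset((i, j)) in bonded:
                continue  # covalently bonded atoms are allowed to be close
            if math.dist(pi, pj) < scale * (VDW[ei] + VDW[ej]):
                found.append((i, j))
    return found

# Two C-H groups whose hydrogens point at each other.
atoms = [("C", (0.0, 0.0, 0.0)), ("H", (1.09, 0.0, 0.0)),
         ("C", (4.0, 0.0, 0.0)), ("H", (2.9, 0.0, 0.0))]
bonded = {frozenset((0, 1)), frozenset((2, 3))}
print(clashes(atoms, bonded))                # [(1, 3)]
print(clashes(atoms, bonded, with_h=False))  # []
```

The heavy atoms alone are far enough apart, so the H-H overlap is invisible once hydrogens are dropped.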
Homogenization of input
Initial and final configurations, represented by .pdb coordinate files, do not necessarily have a
one-to-one correspondence of their residues. Moreover, not just a functional motion but an
evolutionary path between two conformations may be of interest, so the start and end
configurations may differ significantly in the sequence of their atoms. The first problem faced by
any software producing a pathway between two conformations is to find an association between the
atoms of the initial and final configurations. In the Morph Server, both configurations are first
parsed with X-PLOR [14] to find missing non-hydrogen atoms in known amino acids. If atoms missing
from one conformation are present in the other conformation, their locations are estimated by
superimposing and rotating that conformation; otherwise, no specific input is given to the next step.
Known atoms are fixed, and the positions of missing atoms are found after 1000 steps of energy
function minimization. Then, a sequence alignment is performed. Although at most two input files
can currently be submitted to the Morph Server to obtain a morph, the server is meant to handle up
to 10 input files, so multiple sequence alignment algorithms are built into it. If the two submitted
sequences exhibit a high degree of similarity, the AMPS algorithm [7] is used to perform the
alignment. If the two sequences represent very distant homologues, a structural alignment that takes
the 3D coordinates of the atoms into account is performed instead. The user can define the
similarity cutoff at which sequence alignment is replaced by structural alignment. The developers of
the Morph Server distribute the morphing script freely; however, the script by itself cannot produce
a feasible morph.
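The choice between sequence and structural alignment can be sketched as follows. AMPS itself is not reproduced here; as a stand-in, the sketch uses a plain Needleman-Wunsch global alignment with simple match/mismatch/gap scores, measures identity over the shorter sequence, and compares it with a user-defined cutoff. The scores, the identity definition and the 0.3 default cutoff are assumptions.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns aligned residue index pairs (i, j)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right corner, collecting aligned (non-gap) positions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def identity(a, b, pairs):
    """Fraction of identical aligned residues, relative to the shorter sequence."""
    return sum(1 for i, j in pairs if a[i] == b[j]) / min(len(a), len(b))

def choose_method(a, b, cutoff=0.3):
    """Keep the sequence alignment above the cutoff, else fall back to structural."""
    pairs = align(a, b)
    method = "sequence" if identity(a, b, pairs) >= cutoff else "structural"
    return method, pairs
```

The returned index pairs are exactly the residue correspondence that later steps need; a structural aligner would produce the same kind of mapping from 3D coordinates instead.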
The complexity of the .pdb format makes input homogenization necessary: for successful morphing,
corresponding residues must be numbered exactly the same in both input files. Although input
preprocessing is claimed to be a truly novel functionality of the Morph Server [5], the papers
describing the server give no description more detailed than the outline above.
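Once a residue correspondence is known, giving corresponding residues identical numbers is mechanical. The sketch below rewrites the residue sequence number field (columns 23-26) of ATOM/HETATM records according to a mapping and drops residues that have no counterpart. The mapping format and the function name are hypothetical, and insertion codes (column 27) are ignored for simplicity.

```python
def renumber_atoms(pdb_lines, mapping):
    """Rewrite resSeq (columns 23-26) of ATOM/HETATM records.
    mapping: {(chain_id, old_resseq): new_resseq}; unmapped residues are dropped,
    all other records are passed through unchanged."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], int(line[22:26]))  # chain ID is column 22
            if key not in mapping:
                continue  # residue has no counterpart in the other structure
            line = line[:22] + f"{mapping[key]:4d}" + line[26:]
        out.append(line)
    return out
```

Applying the same kind of mapping to both files, with new numbers 1, 2, 3, … assigned along the alignment, yields two inputs whose corresponding residues carry identical numbers.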
Hinge prediction
A key element in studying the structural mobility of proteins is identifying regions of flexibility on
the backbone. It has been observed that a single rotation around a bond may cause a global motion of
the protein. FlexOracle, a component of HingeMaster, is another member of the family of
applications that accompany the Database of Molecular Motions. Taking the configuration file of a
single molecule as input, it predicts the location of hinges: it splits the molecule into two chains
after some residue i and computes the intramolecular potential energy of each chain using CHARMm
[12]. The energies of the two chains are summed and stored; the split is repeated on the original
molecule for every value of i. The bonds at the values of i that yield the lowest energy are predicted
to lie in hinges. The potential energy computation implicitly assumes that the protein is soluble, in
order to take protein-solvent interaction into account. The nature of the algorithm suggests that it
works only for a single molecule of a soluble protein, not for a complex [2]. Experiments showed that
the algorithm can predict a hinge even in such an unusual place as within an alpha helix of a small
protein [8]. After hinges have been identified, FlexOracle applies forces to one domain of the protein
while keeping the rest of the molecule locked in place. Computing the forces needed to move the
domain in each direction allows prediction of the path that matches the natural path of the moving
protein, the ‘path of least resistance’. The motion of the molecule around its hinges is then morphed
along this path. A more thorough description of the hinge prediction algorithm is expected to appear
in a paper by Samuel Flores et al. Without details and experimental results for proteins larger than a
hundred residues, it is hard to estimate the performance of the hinge prediction algorithm. Hinges,
sometimes with short, relatively rigid regions between them, constitute loops. Finding hinges and
analyzing the amino acids that constitute them may provide insight into the problem of efficient loop
identification outlined in the project description.
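The split scan itself is easy to express. In the sketch below, CHARMm is replaced by a toy contact energy (-1 per residue pair closer than a cutoff). Since the sum of the two fragment energies equals the whole-molecule energy minus the contacts crossing the split, a split scores lowest where the fewest contacts cross it. The energy function, the 2.0 cutoff and the function names are assumptions, and the FlexOracle force-probing stage is not modeled.

```python
import math

def contact_energy(coords, cutoff=2.0):
    """Toy stand-in for a force-field energy: -1 for every residue pair closer than `cutoff`."""
    return -sum(1 for i in range(len(coords)) for j in range(i + 1, len(coords))
                if math.dist(coords[i], coords[j]) < cutoff)

def scan_splits(coords, energy=contact_energy):
    """Summed energy of the two fragments for each split after residue i."""
    return [energy(coords[:i + 1]) + energy(coords[i + 1:])
            for i in range(len(coords) - 1)]

def predict_hinge(coords, energy=contact_energy):
    """Residue index after which the lowest-energy split occurs (first one on ties)."""
    scores = scan_splits(coords, energy)
    return min(range(len(scores)), key=scores.__getitem__)

# Two compact five-residue "domains" placed far apart along x.
domain = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
coords = domain + [(x + 10.0, y) for x, y in domain]
print(predict_hinge(coords))  # 4: the split between the two domains
```

Any other energy function with the same signature can be passed as `energy`, which is where a real force-field evaluation would plug in.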
Summary
From the very beginning, the Database of Macromolecular Motions and its satellite applications were
developed with a focus on speed, not chemical realism [3]. How plausible the resulting morphs are
depends heavily on how distant the start and end configurations are, how big the chosen iteration
step is, and how close the real pathway is to a linear interpolation of the trajectory between the
start and end points. Modern molecular dynamics simulation techniques are clearly too costly for
web-based software. The tight coupling of the Morph Server with the very large MolMov database
restricts the development and distribution of the former as a stand-alone, offline application.
Therefore, alternative algorithms that are both fast and produce reasonably good morphs should
become part of the Morph Server. The addition of the FRODA lite option enhances the credibility of
the created morphs. The Morph Server has the unique capability of morphing the evolutionary
motion of distant homologues, while FRODA produces sterically possible conformations. Coupled
together, these two algorithms could produce an evolutionary pathway between the start and end
configurations that would also have meaningful intermediate steps; in addition, the Morph Server’s
input preprocessing mechanism could be used in FRODA when good-quality input data is not
available.
Standardized classification of motions is another significant feature of the database. However, not
all molecular motions can be computed and put into the existing categories. The Morph Server can
now process both proteins and DNA/RNA. Developing separate computational engines for proteins
and nucleic acids that take into account the specific features of these classes of molecules would
probably enhance the quality of the resulting morphs. Also, further refinement of the classification
techniques could increase the number of classifiable motions and establish new relationships
between different types of motion.
References
[1] S. A. Adcock and J. A. McCammon. Molecular dynamics: survey of methods for simulating the
activity of proteins. Chem. Rev., 106(5), 1589-1615, 2006.
[2] S. Flores, N. Echols, D. Milburn, B. Hespenheide, K. Keating, J. Lu, S. Wells, E. Z. Yu,
M. Thorpe and M. Gerstein. The Database of Macromolecular Motions: new features added at the
decade mark. Nucleic Acids Res., 34 (Database issue), D296-D301, 2006.
[3] N. Echols, D. Milburn and M. Gerstein. MolMovDB: analysis and visualization of conformational
change and structural flexibility. Nucleic Acids Res., 31, 478-482, 2003.
[4] M. F. Thorpe and P. M. Duxbury (eds.). Rigidity Theory and Applications. Kluwer Academic,
New York, 2002.
[5] MolMov database and web interface to supporting applications. molmovdb.org
[6] W. G. Krebs and M. Gerstein. The morph server: a standardized system for analyzing and
visualizing macromolecular motions in a database framework. Nucleic Acids Res., 28, 1665-1675,
2000.
[7] G. J. Barton and M. J. E. Sternberg. A strategy for the rapid multiple alignment of protein
sequences: confidence levels from tertiary structure comparisons. J. Mol. Biol., 198, 327-337, 1987.
[8] E. Landhuis. From sight to insight: visualization tools yield biomedical success stories.
Biomedical Computation Review, Winter 2005-2006, 23.
[9] J. M. Word, S. C. Lovell, J. S. Richardson and D. C. Richardson. Asparagine and glutamine: using
hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol., 285, 1735-1747,
1999.
[10] Reduce software. http://kinemage.biochem.duke.edu/
[11] Whatif server. http://swift.cmbi.kun.nl/
[12] CHARMm source server. http://www.charmm.org/
[13] CHARMm tutorial. http://www.ch.embnet.org/MD_tutorial/
[14] X-PLOR. http://xplor.csb.yale.edu/xplor/
[15] J. A. McCammon and S. Harvey. Dynamics of Proteins and Nucleic Acids. Cambridge University
Press, Cambridge, 1987.