Honig (P0110) - 113 predictions: 113 3D
Comparative Modeling Using HMAP, NEST, Troll and
Physical-Chemical Principles
Zhexin Xiang1,2, Donald Petrey1,2, Cinque Soto2, Chris Tang3 and Barry Honig1,2
1Howard Hughes Medical Institute, 2Department of Biochemistry and Molecular Biophysics, 3Integrated Program in Cellular, Molecular and Biophysical Studies, Columbia University
bh6@columbia.edu
Overview - We participated in the fold recognition and homology modeling sections of the experiment, using primarily in-house software, much of which is novel and has not yet been published. This software includes HMAP (a hybrid sequence- and structure-based method for aligning query and template profiles), NEST (a new homology modeling program based on an artificial evolution method), SCAP [1] and LOOPY [2] (side-chain and loop prediction programs, respectively, both based on the colony energy approach), Troll [3]/GRASP2 (an interactive program that contains all of the features of GRASP plus multiple structure alignment and an easy-to-use graphical user interface that displays both sequence and structure alignments), DIFALN/BINGO (a graphical program for displaying and manually tuning the sequence alignments produced by HMAP and the CAFASP servers) and physical-chemical energy functions for evaluating alternative conformations.
Our strategies for fold recognition and homology modeling were very similar. For fold recognition, we generally attempted targets for which HMAP detected templates with e-values below a reasonable threshold, or for which we felt that HMAP improved on the alignments from the CAFASP servers. Occasionally, the CAFASP servers detected significant hits that HMAP did not. In every such case, this happened because the template found by the servers was not in our database, so we built an HMAP profile for the new template and used it to generate our own alignments. If we felt we had nothing to add beyond what the servers listed, we did not submit a prediction for that target.
For each target we performed the following steps: 1) built 3D models from the sequence alignments produced by HMAP and selected CAFASP servers; 2) evaluated each model with our own energy functions and with Verify3D [4]; and 3) identified regions of the sequence where multiple structure alignments of family members revealed either similarities or differences. Where differences were identified, we generally used energetic criteria to choose between models, but occasionally relied on intuition derived from visual inspection of the alignments. The alignments were then adjusted according to the energy criteria, and steps 1-3 were repeated until a satisfactory structure was generated. Visual inspection was particularly useful for deleting insertions: in many cases we could easily delete loops, and even some secondary structure elements, while only minimally perturbing the structure.
Our strategy for homology modeling was closely related to that used for fold recognition, but with a few additional steps. Because NEST runs quickly, we were able to take regions from different templates where we believed they provided better local templates, and then fuse the ends of these regions into the original template with a loop closure procedure [2]. In general, we did not try to keep the target as close as possible to the template. We realized that this was risky, but we felt it was important to test our ability to relax the structure, for example with the refinement module of NEST. This was sometimes done with manual input. For example, we always tested for buried charges, and unless we could visually identify a potential ion-pairing partner, we either changed the alignment or tried to change the structure, which involved both backbone and side-chain movement.
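The buried-charge test described above lends itself to an automated first pass before visual inspection. The sketch below is only an illustration under stated assumptions and is not part of NEST: the `Residue` record, the burial cutoff of 0.2 relative accessibility and the 4.0 Å ion-pairing cutoff are hypothetical choices, and histidine is treated as neutral.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

# Formal side-chain charges near neutral pH (histidine treated as neutral).
CHARGE = {"ASP": -1, "GLU": -1, "LYS": +1, "ARG": +1}


@dataclass
class Residue:
    name: str                # three-letter code, e.g. "ASP"
    number: int              # residue number in the model
    rel_access: float        # relative solvent accessibility: 0.0 buried .. 1.0 exposed
    charged_atoms: List[Tuple[float, float, float]]  # side-chain charged-atom coordinates


def unpaired_buried_charges(residues, burial_cutoff=0.2, pair_cutoff=4.0):
    """Return the numbers of buried charged residues that have no
    opposite-charge atom within pair_cutoff angstroms (hypothetical cutoffs)."""
    charged = [r for r in residues if r.name in CHARGE]
    flagged = []
    for r in charged:
        if r.rel_access >= burial_cutoff:
            continue  # exposed charges are solvated, so they are not flagged
        has_partner = any(
            CHARGE[p.name] == -CHARGE[r.name]
            and any(dist(a, b) <= pair_cutoff
                    for a in r.charged_atoms for b in p.charged_atoms)
            for p in charged
        )
        if not has_partner:
            flagged.append(r.number)
    return flagged


# Toy usage: GLU 77 is buried and its nearest positive partner is 5 A away.
residues = [
    Residue("ASP", 10, 0.05, [(0.0, 0.0, 0.0)]),
    Residue("LYS", 42, 0.10, [(3.0, 0.0, 0.0)]),   # ion pair with ASP 10
    Residue("GLU", 77, 0.02, [(20.0, 0.0, 0.0)]),
    Residue("ARG", 90, 0.60, [(25.0, 0.0, 0.0)]),  # exposed
]
print(unpaired_buried_charges(residues))  # [77]
```

Residues flagged this way would be the candidates for the alignment or structure changes described in the text; the final decision still rests on inspecting the model.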
Methods - HMAP is a fold recognition and alignment program based on profile-to-profile dynamic programming. Template profiles were derived from SCOP-defined protein domains (version 1.57 at the time we built our database) and combined several types of information derived from the sequence and structure of a protein. In CASP5, our templates primarily used information derived from secondary structure, fixed-length sequence motifs, automated multiple structure alignments and sequence-based profiles. Position-specific gap penalties were derived from secondary structure profiles generated from multiple alignments of structurally related proteins. The results were stored as a database of structural templates. Profiles were calibrated so that the statistical significance of a hit could be estimated. When CASP released a new target, we built a query profile from its sequence-based profile and its predicted secondary structure (a consensus of PSI-PRED [5], PHD [6] and JNET [7]). The alignments produced by HMAP were assessed manually and then passed to the homology modeling program NEST.
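To make the profile-to-profile dynamic programming step concrete, the following is a minimal global-alignment sketch with position-specific gap costs. It does not reproduce HMAP's scoring scheme, which is not given here: the dot-product column score, the constant `offset` and the linear (rather than affine) per-column gap costs are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Column = Sequence[float]  # one profile column, e.g. residue frequencies


def column_score(a: Column, b: Column) -> float:
    """Similarity of two profile columns: dot product of their vectors."""
    return sum(x * y for x, y in zip(a, b))


def align_profiles(query: List[Column], template: List[Column], gap: List[float],
                   offset: float = 0.2) -> List[Tuple[Optional[int], Optional[int]]]:
    """Global profile-profile alignment with position-specific linear gap costs.

    gap[j] is the cost of a gap at template column j (e.g. low in loops, high
    in helices and strands); offset is subtracted from every column score so
    that unrelated columns score below zero. Both are hypothetical choices.
    Returns (query_index, template_index) pairs, with None marking a gap.
    """
    n, m = len(query), len(template)

    def ins(j: int) -> float:
        # Cost of a query residue inserted after template column j - 1.
        return gap[max(j - 1, 0)]

    def match(i: int, j: int) -> float:
        return F[i - 1][j - 1] + column_score(query[i - 1], template[j - 1]) - offset

    # F[i][j] = best score for aligning query[:i] with template[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - ins(0)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(match(i, j),
                          F[i][j - 1] - gap[j - 1],  # template column j-1 left unaligned
                          F[i - 1][j] - ins(j))      # query residue i-1 left unaligned

    # Trace back from the bottom-right cell, re-deriving each choice.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == match(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif j > 0 and F[i][j] == F[i][j - 1] - gap[j - 1]:
            pairs.append((None, j - 1))
            j -= 1
        else:
            pairs.append((i - 1, None))
            i -= 1
    return pairs[::-1]


A, B, C = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]  # one-hot toy columns
print(align_profiles([A, B, C], [A, C], gap=[1.0, 1.0]))
# [(0, 0), (1, None), (2, 1)]
```

Raising `gap[j]` for columns in template helices and strands pushes insertions and deletions into loops, which is the effect position-specific penalties derived from secondary structure profiles are meant to have.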
NEST is a homology modeling program based on an artificial evolution method (http://trantor.bioc.columbia.edu/~xiang/jackal). The program can build and refine homology models based on single, composite or multiple templates. An alignment between a query sequence and a template can be viewed as a list of operations such as residue mutations, insertions and deletions, and building a structure for the query from the template amounts to performing these operations. Each operation perturbs the template structure and carries an energy cost, which may be positive or negative. Model building starts with the operation of lowest energy cost, then proceeds to the operation with the next-lowest cost, and so on. Each operation is followed by a brief energy minimization to remove atomic clashes, and the final structure is then subjected to a more thorough energy minimization in torsion angle space. The energy function consists of the following terms: van der Waals energy, hydrophobic energy, electrostatics, torsion angle energy, the hydrogen-bond network energy of the template, and a statistical energy for each residue's solvent accessibility. The structure refinement module in NEST can refine models at four levels: energy minimization of clashing atoms, refinement of insertion and deletion regions, refinement of all loop regions, and refinement of all α/β regions. Loop regions are refined with LOOPY and side-chain conformations with SCAP; both programs use the colony energy approach to account for the flexibility of side chains and loops on the protein surface. Helix and sheet regions are refined with a procedure similar to LOOPY, but with hydrogen-bond constraints applied in the regular secondary structure regions so that refinement does not disrupt the original hydrogen-bond network.
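The operation-based model building in NEST can be illustrated with a small sketch that turns a gapped alignment into template edit operations and then applies them cheapest-first. This is not NEST's code: the `Operation` record and the `cost` and `apply` callables are hypothetical stand-ins for NEST's energy evaluation and structure manipulation, and re-scoring the remaining operations after every step is an assumption about how the "and so on" ordering proceeds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Operation:
    kind: str          # "mutation", "insertion" or "deletion"
    template_pos: int  # template residue acted on or followed (-1 = before the first)
    query_res: str     # query residue introduced ("" for a deletion)
    template_res: str  # template residue affected ("" for an insertion)


def alignment_to_operations(query_aln: str, template_aln: str) -> List[Operation]:
    """Translate a gapped query/template alignment into template edit operations."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    ops: List[Operation] = []
    t_pos = -1  # index of the most recent template residue
    for q, t in zip(query_aln, template_aln):
        if t != "-":
            t_pos += 1
        if q != "-" and t != "-" and q != t:
            ops.append(Operation("mutation", t_pos, q, t))
        elif q != "-" and t == "-":
            ops.append(Operation("insertion", t_pos, q, ""))
        elif q == "-" and t != "-":
            ops.append(Operation("deletion", t_pos, "", t))
    return ops


def greedy_build(ops: List[Operation],
                 cost: Callable[[Operation], float],
                 apply: Callable[[Operation], None]) -> List[Operation]:
    """Apply operations cheapest-first, re-scoring the rest after each step.

    cost(op) should return the energy change op would cause on the current
    model, and apply(op) should perform it (in NEST, followed by a short
    clash-removing minimization). Both are caller-supplied placeholders.
    """
    remaining, order = list(ops), []
    while remaining:
        best = min(remaining, key=cost)
        apply(best)
        remaining.remove(best)
        order.append(best)
    return order


ops = alignment_to_operations("AKLV-E", "ARL-GE")
toy_cost = {"deletion": 0.5, "mutation": 1.0, "insertion": 2.0}  # hypothetical fixed costs
order = greedy_build(ops, cost=lambda op: toy_cost[op.kind], apply=lambda op: None)
print([op.kind for op in order])  # ['deletion', 'mutation', 'insertion']
```

With fixed toy costs the greedy loop reduces to a sort; the re-scoring only matters when `cost` inspects a structure that `apply` has modified.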
Models were evaluated by comparing their energies using a protocol that combines extensive molecular mechanics minimization with evaluation of the total electrostatic energy by the finite-difference Poisson-Boltzmann method. Powell minimization was performed using an all-hydrogen model, CHARMM22 parameters and a dielectric constant of 10. Low-energy structures were considered for submission. This procedure was combined with visual evaluation of the models using the program GRASP2, written with the Troll software library of molecular analysis and visualization tools.
In addition to the molecular graphics, surface display and electrostatic features of the original version of GRASP, GRASP2 integrates structure alignment and sequence display/alignment tools into the graphical user interface. These tools allow a user to conveniently search a database of domains for proteins that are structurally homologous to a given template, and to simultaneously display and compare different alignments to one template, or alignments to different templates. This is accomplished by carrying out a multiple structure alignment of a set of templates and then adding the alignment of the query to each template to the multiple structure alignment. Structure alignments are generated as follows. First, equivalent secondary structure elements are identified using a double dynamic programming algorithm. Structurally equivalent residues are then identified by superposing the end-points of the equivalent secondary structure elements and carrying out an iterative process of sequence alignment. Residue similarity at this stage is a simple function of the distance between alpha-carbons given the current rigid-body superposition. A sequence alignment is determined using this similarity score, and the rigid-body superposition is carried out again. This process is repeated until the root-mean-square deviation (RMSD) of the aligned alpha-carbon atoms changes by no more than a given threshold.
The simultaneous display and comparison of alignments and structures allows convenient identification of structural features that may be responsible for differences in the more objective evaluation criteria, such as molecular mechanics energies or Verify3D profiles [4], and contributed significantly to the decision as to which model/alignment to submit.
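The iterative residue-assignment loop of the GRASP2 structure alignment can be sketched as follows. This is an illustrative reconstruction, not GRASP2 code: it assumes the two alpha-carbon traces already start in a common frame (for example, from the superposition of secondary structure end-points), uses the hypothetical score `cutoff - d` as the distance-based similarity, and leaves the rigid-body fit to a caller-supplied `superpose` function (the Kabsch algorithm is a standard choice).

```python
from math import dist, sqrt
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_similarity(d: float, cutoff: float = 4.0) -> float:
    """Hypothetical distance-based score: positive only for pairs closer than cutoff (A)."""
    return cutoff - d


def align_by_distance(a: Sequence[Coord], b: Sequence[Coord]) -> List[Tuple[int, int]]:
    """Order-preserving alignment of two superposed C-alpha traces.
    Skipping a residue is free, so pairs with negative similarity are never aligned."""
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j], F[i][j - 1],
                          F[i - 1][j - 1] + residue_similarity(dist(a[i - 1], b[j - 1])))
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j]:
            i -= 1
        elif F[i][j] == F[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    return pairs[::-1]


def rmsd(a, b, pairs) -> float:
    return sqrt(sum(dist(a[i], b[j]) ** 2 for i, j in pairs) / len(pairs))


def iterative_alignment(a, b, superpose: Callable, tol: float = 0.01, max_iter: int = 50):
    """Alternate sequence alignment and rigid-body superposition until the RMSD
    of aligned C-alpha atoms changes by no more than tol. superpose(a, b, pairs)
    must return the coordinates of b fitted onto a using the aligned pairs."""
    pairs: List[Tuple[int, int]] = []
    prev = None
    for _ in range(max_iter):
        pairs = align_by_distance(a, b)
        if not pairs:
            break
        b = superpose(a, b, pairs)
        current = rmsd(a, b, pairs)
        if prev is not None and abs(prev - current) <= tol:
            break
        prev = current
    return pairs, b


# Toy usage: b carries an extra residue (index 2); the traces are already superposed,
# so an identity "superposition" is passed in.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (50.0, 0.0, 0.0), (7.6, 0.5, 0.0), (11.4, 0.5, 0.0)]
pairs, moved = iterative_alignment(a, b, superpose=lambda a, b, p: b)
print(pairs)  # [(0, 0), (1, 1), (2, 3), (3, 4)]
```

Keeping the superposition as a parameter separates the two alternating steps the text describes: residue assignment under a fixed frame, and refitting the frame to the new assignment.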
1. Xiang Z. and Honig B. (2001) Extending the Accuracy Limits of Prediction for Side Chain Conformations. J. Mol. Biol. 311:421-430.
2. Xiang Z., Soto C. and Honig B. (2002) Evaluating Conformational Free Energies: The Colony Energy and its Application to the Problem of Loop Prediction. Proc. Natl. Acad. Sci. USA 99:7432-7437.
3. Petrey D. and Honig B. (2000) Free Energy Determinants of Tertiary Structure and the Evaluation of Protein Models. Protein Science 9:2181-2191.
4. Luthy R., Bowie J.U. and Eisenberg D. (1992) Assessment of Protein Models with Three-Dimensional Profiles. Nature 356:83-85.
5. Jones D. (1999) Protein Secondary Structure Prediction Based on Position-Specific Scoring Matrices. J. Mol. Biol. 292:195-202.
6. Rost B. (1996) PHD: Predicting One-Dimensional Protein Structure by Profile-Based Neural Networks. Methods Enzymol. 266:525-539.
7. Cuff J.A. and Barton G.J. (2000) Application of Multiple Sequence Alignment Profiles to Improve Protein Secondary Structure Prediction. Proteins 40:502-511.