Honig (P0110) - 113 predictions: 113 3D

Comparative Modeling Using HMAP, NEST, Troll and Physical-Chemical Principles

Zhexin Xiang1,2, Donald Petrey1,2, Cinque Soto2, Chris Tang3 and Barry Honig1,2
1 Howard Hughes Medical Institute, 2 Department of Biochemistry and Molecular Biophysics, 3 Integrated Program in Cellular, Molecular and Biophysical Studies, Columbia University
bh6@columbia.edu

Overview - We participated in the fold recognition and homology sections of the experiment using primarily in-house software. Much of this software is novel and has not yet been published. The in-house software we used includes HMAP (a hybrid sequence- and structure-based alignment between query and template profiles), NEST (a new homology modeling program based on an artificial evolution method), SCAP [1] and LOOPY [2] (side-chain and loop prediction programs based on the colony energy approach), Troll [3]/GRASP2 (an interactive program that contains all of the features of GRASP plus multiple structure alignments and an easy-to-use graphical user interface that displays both sequence and structure alignments), DIFALN/BINGO (a graphical program to display and manually tune sequence alignments between HMAP and CAFASP servers) and physical-chemical energy functions to evaluate alternate conformations. Our strategies for fold recognition and homology modeling were very similar. For fold recognition we generally attempted targets where HMAP detected templates with a reasonable e-value, or where we felt that HMAP improved the alignments that came from the CAFASP servers. On occasion, CAFASP servers detected significant hits where HMAP did not. In every such case, the template detected by the servers was missing from our database, so we built an HMAP profile for the new template and used it to generate our own alignments. If we felt we had nothing to add beyond what the servers listed, we did not submit a prediction for that target.
For each target we performed the following steps: 1) build 3D models for sequence alignments from HMAP and selected CAFASP servers; 2) evaluate each model with our own energy functions and with Verify3D [4]; and 3) identify regions of the sequence where multiple structure alignments of family members revealed either similarities or differences. Where differences were identified, we generally used energetic criteria to decide between models, but on occasion relied on intuition derived from visually inspecting the alignments. The alignments were adjusted based on the energy criteria and steps 1-3 were carried out again. This process was repeated until a satisfactory structure was generated. Visual inspection was particularly useful for deleting insertions: in many cases we could easily delete loops, and even some secondary structure elements, while minimally perturbing the structure.

Our strategy for homology modeling was closely related to that used in fold recognition, with a few additional steps. Because NEST runs quickly, we were able to take regions from different templates where we believed they provided better local templates, and then fuse the ends of these regions into our original template with a loop closure procedure [2]. In general, we did not try to keep the target model as close as possible to the template. We realized that this was risky, but we felt it important to test our ability to relax the structure, for example with the refinement module of NEST. This was sometimes done with manual input. For example, we always tested for buried charges, and unless we could visually identify a potential ion-pairing partner we would either change the alignment or try to change the structure. This involved both backbone and side-chain movement.

Methods - HMAP is a fold-recognition and alignment program that relies on profile-to-profile dynamic programming.
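
As an illustration of what profile-to-profile dynamic programming involves, the following is a minimal sketch, not HMAP itself. It assumes each profile column is a dictionary of residue frequencies, scores two columns by their dot product minus a fixed baseline, and charges a position-specific linear gap penalty indexed by template position. The names `column_score`, `align_profiles`, `baseline` and `gap` are hypothetical, and the scoring scheme is an assumption chosen for brevity.

```python
from typing import Dict, List, Tuple

Profile = List[Dict[str, float]]  # one residue-frequency dict per position


def column_score(a: Dict[str, float], b: Dict[str, float],
                 baseline: float = 0.2) -> float:
    """Dot product of two frequency columns, shifted by an assumed baseline
    so that unrelated columns score below zero."""
    return sum(p * b.get(res, 0.0) for res, p in a.items()) - baseline


def align_profiles(query: Profile, template: Profile,
                   gap: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch style) alignment of two non-empty profiles.

    gap[j] is the assumed penalty for a gap placed at template position j.
    Returns the total score and the list of aligned (query, template) indices.
    """
    n, m = len(query), len(template)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # Boundary rows and columns: leading gaps only.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap[0]
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap[j - 1]
        move[0][j] = "left"

    # Fill the table, remembering which move produced each cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1]
                 + column_score(query[i - 1], template[j - 1]), "diag"),
                (score[i - 1][j] - gap[j - 1], "up"),    # query residue unaligned
                (score[i][j - 1] - gap[j - 1], "left"),  # template position skipped
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    template = [{"A": 1.0}, {"C": 1.0}, {"D": 1.0}]
    query = [{"A": 1.0}, {"D": 1.0}]
    # A low penalty at template position 1 stands in for a loop position.
    print(align_profiles(query, template, gap=[1.0, 0.1, 1.0]))
```

In this toy case the second query column pairs with the third template column, and the skipped template position is the one given the low gap penalty, which is the effect a position-specific penalty is meant to have.
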
Template profiles were derived from SCOP-defined protein domains (version 1.57 at the time we built our database) and consisted of several types of information derived from the sequence and structure of a protein. In CASP5, our templates primarily used information derived from secondary structure, fixed-length sequence motifs, automated multiple structure alignments and sequence-based profiles. Position-specific gap penalties were derived from the secondary structure profiles generated from multiple alignments of structurally related proteins. The results were stored as a database of structural templates. Profiles were calibrated so that the statistical significance of a hit could be estimated. When a new target was released by CASP, we built a query profile from its sequence-based profile and its secondary structure prediction (a consensus of PSI-PRED [5], PHD [6] and JNET [7]). The alignments given by HMAP were manually assessed and then fed to the homology-modeling program NEST.

NEST is a homology modeling program based on an artificial evolution method (http://trantor.bioc.columbia.edu/~xiang/jackal). The program can build and refine homology models based on single, composite or multiple templates. An alignment between a query sequence and a template can be regarded as a list of operations such as residue mutation, insertion or deletion, and building a structure for the query sequence from the template amounts to performing these operations. Each operation disturbs the template structure and carries an energy cost, which may be positive or negative. Model building starts with the operation that has the lowest energy cost and proceeds through the remaining operations in order of increasing cost. Each operation is followed by a brief energy minimization to remove atomic clashes. The final structure is then subjected to a more thorough energy minimization. The minimization is done in torsion angle space.
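
The idea of treating an alignment as a list of operations that are applied in order of energy cost can be sketched as follows. This is an illustrative toy, not the NEST implementation: the `Operation` type, the function names, and in particular `toy_cost` are hypothetical placeholders, and re-evaluating the pending costs after every step is an assumption. A real builder would modify the structure and run a short minimization wherever the comment indicates.

```python
from typing import Callable, List, NamedTuple


class Operation(NamedTuple):
    kind: str     # "mutate", "insert" or "delete"
    column: int   # alignment column the operation comes from
    residue: str  # query residue for mutate/insert, template residue for delete


def operations_from_alignment(query_aln: str, template_aln: str) -> List[Operation]:
    """Turn two gapped, equal-length alignment strings into edit operations."""
    ops: List[Operation] = []
    for col, (q, t) in enumerate(zip(query_aln, template_aln)):
        if q == "-" and t == "-":
            continue
        if q == "-":
            ops.append(Operation("delete", col, t))   # template residue removed
        elif t == "-":
            ops.append(Operation("insert", col, q))   # query residue added
        elif q != t:
            ops.append(Operation("mutate", col, q))   # side chain replaced
    return ops


def build_order(ops: List[Operation],
                cost: Callable[[Operation, List[Operation]], float]) -> List[Operation]:
    """Greedy ordering: repeatedly apply the cheapest pending operation.

    Costs are re-evaluated after every step (an assumption of this sketch),
    because each applied operation changes the structure.
    """
    applied: List[Operation] = []
    pending = list(ops)
    while pending:
        best = min(pending, key=lambda op: cost(op, applied))
        pending.remove(best)
        # A real builder would edit the model and briefly minimize it here.
        applied.append(best)
    return applied


def toy_cost(op: Operation, applied: List[Operation]) -> float:
    """Placeholder energy: a fixed cost per kind, reduced by 0.5 for each
    already-applied operation in an adjacent column. Purely illustrative."""
    base = {"mutate": 1.0, "delete": 2.0, "insert": 3.0}[op.kind]
    neighbours = sum(1 for done in applied if abs(done.column - op.column) == 1)
    return base - 0.5 * neighbours


if __name__ == "__main__":
    ops = operations_from_alignment("E-CA", "-HGA")
    for op in build_order(ops, toy_cost):
        print(op)
```

With the placeholder cost, the mutation is applied first even though it appears last in the alignment, which is the point of ordering by cost rather than by sequence position.
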
The energy function consists of the following terms: van der Waals energy, hydrophobic, electrostatics, torsion angle energy, hydrogen-bond network energy of the template, and statistical energy of a residue’s solvent accessibility. The structure refinement module in NEST can refine the models in four levels: energy minimization of clashing atoms, refinement of insertion and deletion regions, refinement in all loop regions and refinement in all α/β regions. Refinement of loop regions is done using LOOPY and refinement of side-chain conformations is performed using SCAP, where both SCAP and LOOPY use the colony energy approach to account for the flexibility of side chains and loops on the protein surface. Refinement of helix or sheet regions is done by a procedure similar to LOOPY, but the hydrogen constraints in the regular secondary structure regions are applied so that the refinement does not disrupt the original hydrogen bond network. tools. In addition to the molecular graphics, surface display, and electrostatic features of the original version of GRASP, GRASP2 now integrates structure alignment and sequence display/alignment tools into the graphical user interface. These tools allow a user to conveniently search a database of domains for proteins that are structurally homologous to a given template, and to simultaneously display/compare different alignments to a template or alignments to different templates. This is accomplished by carrying out a multiple structure alignment of a set of templates and then adding alignments of a query to each template to the multiple structure alignment. Structure alignments were generated as follows. First, equivalent secondary structure elements are identified using a double-dynamic programming algorithm. 
Once structurally equivalent secondary structure elements are identified, structurally equivalent residues are identified by superposing the end-points of the equivalent secondary structure elements and then carrying out an iterative process of sequence alignment. Residue similarity at this stage is a simple function of the distance between alpha-carbons given the current rigid body superposition. A sequence alignment is determined using this similarity score and rigid body superposition is carried out again. This process is repeated until the change in root-mean square deviation of aligned carbon-alpha atoms does not change by more than a given threshold. The simultaneous display/comparison of alignments and structures allows convenient identification of structural features that may be responsible for differences in the more objective evaluation criteria such as calculation of molecular mechanics energies or Verify3D profiles [4] and contributed significantly to the decision as to which model/alignment to submit. 1. 2. Models were evaluated by comparing energies of the models using a protocol that combines an extensive molecular mechanics minimization with an evaluation of the total electrostatic energy using the finite-difference PoissonBoltzman method. Powell minimization using an all-hydrogen model and CHARMM22 parameters and a dielectric constant of 10 was performed. Low energy structures were considered for submission. This procedure was combined with visual evaluation of the models using the program GRASP2 written with the Troll software library of molecular analysis and visualization 3. 4. 5. A-2 Xiang Z. and Honig B. (2001) Extending the Accuracy Limits of Prediction for Side Chain Conformations. J. Mol. Biol. 311:421-430. Xiang Z., Soto C and Honig B. (2002) Evaluating Conformational Free Energies: The Colony Energy and its Application to the Problem of Loop Prediction. Proc. Natl. Acad. Sci. USA 99:7432-7437. Petrey D. and Honig B. 
(2000) Free Energy Determinants of Tertiary Structure and the Evaluation of Protein Models. Protein Science 9:21812191. Luthy R., Bowie J.U. and Eisenberg D. (1992) Assesment of Protein Models with Three- Dimensional Profiles. Nature 356:83-85. Jones D. (1999) Protein secondary structure prediction based on positionspecific scoring matrices. J Mol Biol. 292(2):195-202. 6. Rost B. (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymology. 266:525-39. 7. Cuff J.A. and Barton G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 240(3): 502-11. A-3