BioGeometry NEWS
March 2004
biogeometry.cs.duke.edu

Research

Computing Protein Structures from Electron Density Maps: The Missing Loop Problem
by Itay Lotan, Henry van den Bedem*, Ashley M. Deacon*, and Jean-Claude Latombe

Motivation: Rapid protein structure determination relies greatly on the availability of software that can automatically generate a protein model from an experimental electron density map. In favorable cases, available software can build over 90% of the final model. However, in less favorable circumstances, particularly at medium to low resolution, only about 2/3 completeness is attained. The electron density in the gapped areas is often of poorer quality, especially in the flexible loop regions of a protein, making the manual completion of missing fragments particularly difficult and time-consuming. Automatic computation of these fragments could speed up the structure resolution process considerably.

Problem Description: The input to our algorithm is the electron density map (EDM), the parts of the structure that were resolved by the automatic model builder, and the two anchor residues that need to be bridged. In the majority of partially resolved structures, the amino acid sequence is correctly assigned. Thus, we assume that the gap length and residue sequence of the missing fragment are known. Our goal is to propose candidate structures for the missing fragment that fall within the radius of convergence of existing refinement tools (1 to 1.5Å RMSD).

General Approach: Our protein model assumes all bond lengths and bond angles are fixed to their ideal values. The ω angle is fixed at 180 degrees. The degrees of freedom (DoFs) in our model are the backbone φ and ψ torsional angles. To simplify our problem we will not have any side-chain DoFs and therefore no side-chain atoms except for the Cβ. Side-chains can be built onto the model once the backbone is fully in place.
The Cβ and the backbone oxygen are included because they are essential for correctly orienting the backbone. Thanks to the kinematic chain structure of the protein backbone, loop completion can be approached as an inverse kinematics problem. We try to compensate for the deficient density information by taking advantage of the closure constraint to guide the loop to its correct positioning in space.

Method: Our algorithm proceeds in two stages: candidate generation and refinement. In the first stage, candidate loops are built using the Cyclic Coordinate Descent algorithm. Our implementation puts additional constraints on the DoFs to take EDM fit and collision avoidance into account. Next, initial conformations are ranked according to density fit and conformational likelihood and top-ranking ones are passed on to the refinement procedure. Refinement is achieved by minimizing a target function that quantifies the goodness of the fit of the conformation and the EDM. An optimization protocol based on simulated annealing and Monte Carlo Minimization (MCM) searches for the global minimum of the target function while maintaining loop closure. Each candidate is optimized numerous times and the best scoring loops are returned.

[Figure: Highest scoring loop for residues 51 to 64 from TM0423 in cyan, together with the PDB structure in magenta. The RMSD is 0.25Å.]

[Figure: Highest scoring loop for residues 83 to 96 from TM0813 in magenta, together with its starting conformation in yellow (output of stage 1). The RMSD between the starting conformation and the PDB structure is 2.1Å. The refinement procedure reduces it to 0.6Å.]

The heart of the refinement stage is a method for minimization with closure constraints. We want to minimize the target function on the manifold of joint motions that do not change the position and orientation of the endpoints of the loop.
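
As a toy illustration of a minimization step that preserves closure, the sketch below uses a planar chain of unit-length links whose endpoint position must stay fixed. It is not the authors' implementation: the real problem is three-dimensional, constrains the endpoint orientation as well as its position, and uses the backbone φ and ψ angles as DoFs. All function names, the configuration `q`, and the gradient `g` are hypothetical. A step preserves closure to first order when the endpoint Jacobian J maps it to zero, which can be arranged by subtracting from the gradient its component in the row space of J.

```python
import math

def abs_angles(q):
    """Absolute link directions for relative joint angles q."""
    out, theta = [], 0.0
    for a in q:
        theta += a
        out.append(theta)
    return out

def endpoint(q):
    """Endpoint (x, y) of a planar chain of unit-length links (toy model)."""
    t = abs_angles(q)
    return sum(math.cos(a) for a in t), sum(math.sin(a) for a in t)

def jacobian(q):
    """Rows (jx, jy) of the 2 x n Jacobian d(endpoint)/dq."""
    t = abs_angles(q)
    n = len(q)
    # Joint i rotates every link from i to the end of the chain.
    jx = [-sum(math.sin(t[k]) for k in range(i, n)) for i in range(n)]
    jy = [sum(math.cos(t[k]) for k in range(i, n)) for i in range(n)]
    return jx, jy

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_null_space(g, jx, jy):
    """Return g minus its component in the row space of J, so J @ result = 0."""
    a, b, d = dot(jx, jx), dot(jx, jy), dot(jy, jy)
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is rank-deficient at this configuration")
    rx, ry = dot(jx, g), dot(jy, g)
    # Solve the 2 x 2 system (J J^T) lam = J g in closed form.
    lx = (d * rx - b * ry) / det
    ly = (a * ry - b * rx) / det
    return [gi - lx * xi - ly * yi for gi, xi, yi in zip(g, jx, jy)]

def displacement(q, step, eps):
    """Distance the endpoint moves when q is changed by -eps * step."""
    x0, y0 = endpoint(q)
    x1, y1 = endpoint([qi - eps * si for qi, si in zip(q, step)])
    return math.hypot(x1 - x0, y1 - y0)

# Hypothetical 5-joint configuration and target-function gradient (assumptions).
q = [0.3, -0.5, 0.8, 0.4, -0.6]
g = [1.0, -0.5, 0.25, 0.75, -1.0]
jx, jy = jacobian(q)
step = project_to_null_space(g, jx, jy)
print("raw gradient step moves endpoint by", displacement(q, g, 1e-3))
print("projected step moves endpoint by   ", displacement(q, step, 1e-3))
```

For a small step size, the projected step moves the endpoint only by a second-order amount, whereas the raw gradient step moves it in proportion to the step size. Because the null-space is only a linear approximation, longer runs of such steps would accumulate some drift that has to be corrected separately.
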
We exploit the kinematic redundancy of the loop and use the null-space of the Jacobian matrix of the endpoint as a linear approximation to this manifold. The gradient of the target function is projected onto this null-space to generate minimization steps that do not break closure. Large random moves needed for MCM are proposed using an exact IK solver and by choosing random directions in the null-space.

Results: We tested our method on structures resolved at the JCSG. On a missing 12-residue loop in TM0423 (376 residues, resolved at 2Å, 1KQ3 in PDB, 88% correctly built automatically) we achieved all-atom RMSD as low as 0.25Å. On a missing 12-residue loop in TM0813 (342 residues, resolved at 2.8Å, 1J5X in PDB, 61% correctly built automatically) we achieved all-atom RMSD as low as 0.56Å. Since we use idealized geometry, a discrepancy of 0.2 to 0.3Å is expected even for an exactly correct loop.

* Joint Center for Structural Genomics, Stanford Synchrotron Radiation Laboratory (SSRL)

Student Profile: Jeff Phillips

Jeff Phillips, a first-year PhD student in the Department of Computer Science at Duke University, is working with Pankaj Agarwal. Recently he was awarded a prestigious NSF Graduate Fellowship. Jeff earned a BS in Computer Science and a BA in Mathematics at Rice University, where he worked with Prof. Lydia Kavraki on robotic motion planning, physical simulation, and computational biology. At Duke, Jeff is working on several geometric projects in computational biology, using his background in robotics to gain new insight into the problems. According to Jeff, the array of problems presented under the BioGeometry project matches his background and interests perfectly, and it was one of the main reasons he joined Duke.

At Rice and during an internship at Draper Laboratories in Houston, Jeff developed a guided technique for exploring a high-dimensional configuration space with constraints on the continuity of the space.
In robotics these constraints take the form of non-holonomic motion constraints, but they apply more generally to other problems, such as exploring protein configurations. This technique, called Guided Expansive Spaces Trees, allows efficient exploration of high-dimensional configuration spaces that conventional approaches are unable to handle.

Building upon his work at Rice, Jeff is investigating the sampling process in the stochastic roadmap simulations developed by Serkan Apaydin, Jean-Claude Latombe, and others in the BioGeometry project. By adapting the guided configuration search technique, the low-energy configurations can be sampled more densely. Specifically, the technique allows configurations to be added to the roadmap by sampling according to incrementally acquired knowledge of the Boltzmann distribution while retaining the properties necessary for Markov chain analysis.

Jeff is also working on shape matching problems. The iterative corresponding point (ICP) algorithm can quickly align two 3-dimensional shapes by translating and rotating one of them. He is exploring various generalizations of the ICP algorithm: extensions to higher dimensions and to polygonal surfaces, matching complementary shapes under collision constraints, etc. He is also analyzing the efficiency of the ICP algorithm.

In collaboration with Dr. Alper Üngör, Jeff is developing mesh-generation techniques by searching the configuration space of point placement and edge topology. A two-dimensional implementation generates meshes optimized for specific mesh characteristics after fixing certain point placements and certain edge topologies. A three-dimensional implementation is underway. It has two phases: (1) a weighted random walk of edge-flips on the edge topology and (2) an energy minimization/refinement hill-climbing search.
In addition to generating helpful insights into several theoretical problems related to mesh generation, this has potential to shed new light on protein configurations. Using the protein atoms as points and certain atomic interactions as edges, a three-dimensional mesh with certain properties can represent a protein configuration. An understanding of the energy landscape of generic meshes may give insight into that of proteins.

PUBLICATIONS
1. J. M. Phillips, N. Bedrossian, and L. E. Kavraki. Guided expansive spaces trees: a search technique for motion- and cost-constrained state spaces. IEEE Internat. Conf. Robotics Autom. (to appear).
2. J. M. Phillips, L. E. Kavraki, and N. Bedrossian. Spacecraft rendezvous and docking with real-time, randomized optimization. AIAA Guidance, Navigation, Control, 2003.
3. J. M. Phillips, L. E. Kavraki, and N. Bedrossian. Probabilistic optimization applied to spacecraft rendezvous and docking. AAS/AIAA Space Flight Mechanics Meeting, 2003.
4. J. M. Phillips, A. Ladd, and L. E. Kavraki. Simulated knot tying. IEEE Internat. Conf. Robotics Autom., 2002.

- Profile by Pankaj K. Agarwal

The BioGeometry project is funded by the National Science Foundation under grant CCR-00-86013. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

BioGeometry News is the project's monthly newsletter. For more information, please visit http://biogeometry.cs.duke.edu/newsletter