March - BioGeometry Project

BioGeometry NEWS, March 2004
biogeometry.cs.duke.edu
Research
Computing Protein Structures from Electron Density Maps: The Missing Loop Problem
by Itay Lotan, Henry van den Bedem*, Ashley M. Deacon*, and Jean-Claude Latombe
Motivation: Rapid protein structure determination relies
greatly on the availability of software that can automatically generate a protein model from an experimental electron
density map. In favorable cases, available software can
build over 90% of the final model. However, in less favorable circumstances, particularly at medium to low resolution, only about 2/3 completeness is attained. The electron
density in the gapped areas is often of poorer quality, especially in the flexible loop regions of a protein, making the
manual completion of missing fragments particularly difficult and time-consuming. Automatic computation of these
fragments could speed up the structure resolution process
considerably.
Problem Description: The input to our algorithm is the
electron density map (EDM), the parts of the structure that
were resolved by the automatic model builder, and the two
anchor residues that need to be bridged. In the majority of
partially resolved structures, the amino acid sequence is
correctly assigned. Thus, we assume that the gap length
and residue sequence of the missing fragment are known.
Our goal is to propose candidate structures for the missing
fragment that fall within the radius of convergence of existing refinement tools (1 to 1.5Å RMSD).
General Approach: Our protein model assumes all
bond lengths and bond angles are fixed to their ideal values. The ω angle is fixed at 180 degrees. The degrees of
freedom (DoFs) in our model are the backbone φ and ψ
torsional angles. To simplify the problem, the model has no
side-chain DoFs and therefore no side-chain atoms
except for the Cβ. Side-chains can be built onto the model
once the backbone is fully in place. The Cβ and the backbone oxygen are included because they are essential for
correctly orienting the backbone. Thanks to the kinematic
chain structure of the protein backbone, loop completion
can be approached as an inverse kinematics problem. We
try to compensate for the deficient density information by
taking advantage of the closure constraint to guide the
loop to its correct positioning in space.
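To make the torsional DoFs concrete, here is a minimal sketch of how a φ or ψ value can be measured from four consecutive backbone atom positions. The function name `torsion_deg` and the tuple-based coordinates are illustrative assumptions, not part of the authors' software; the formula is the standard atan2 expression for a dihedral angle.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    Hypothetical helper: for phi the four atoms would be C(i-1), N, CA, C;
    for psi they would be N, CA, C, N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)          # normal of the plane (p1, p2, p3)
    len_b2 = math.sqrt(_dot(b2, b2))
    y = len_b2 * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With all bond lengths and bond angles fixed, a list of such angles is enough to describe a backbone conformation, which is what lets the loop be treated as a kinematic chain.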
Method: Our algorithm proceeds in two stages: candidate generation and refinement. In the first stage, candidate loops are built using the Cyclic Coordinate Descent
algorithm. Our implementation puts additional constraints
on the DoFs to take EDM fit and collision avoidance into
account. Next, initial conformations are ranked according
to density fit and conformational likelihood and top-ranking ones are passed on to the refinement procedure. Refinement is achieved by minimizing a target function that
quantifies the goodness of the fit of the conformation and
the EDM. An optimization protocol based on simulated annealing and Monte Carlo Minimization (MCM) searches for
the global minimum of the target function while maintaining loop closure. Each candidate is optimized numerous
times and the best scoring loops are returned.
Figure: Highest scoring loop for residues 51 to 64 from TM0423 in cyan,
together with the PDB structure in magenta. The RMSD is 0.25Å.
Figure: Highest scoring loop for residues 83 to 96 from TM0813 in magenta,
together with its starting conformation in yellow (output of stage 1).
The RMSD between the starting conformation and the PDB structure is 2.1Å. The refinement procedure reduces it to 0.6Å.
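The idea behind Cyclic Coordinate Descent in the candidate-generation stage can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a planar chain with relative joint angles, closes on a target point only, and omits the EDM-fit and collision constraints the article mentions. The names `forward` and `ccd` are hypothetical.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain rooted at the origin.

    angles are relative joint angles (radians); lengths are link lengths.
    """
    x = y = theta = 0.0
    pts = [(x, y)]
    for a, length in zip(angles, lengths):
        theta += a
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        pts.append((x, y))
    return pts

def ccd(angles, lengths, target, tol=1e-6, max_sweeps=200):
    """Adjust one joint at a time so the chain end moves toward target."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]
            ex, ey = pts[-1]
            # Rotate joint i so the joint->end direction lines up with
            # the joint->target direction; this is the per-joint optimum.
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += to_target - to_end
        ex, ey = forward(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles
```

Each single-joint update has a closed-form solution, which is what makes the method cheap per iteration. In the protein setting the update would act on φ and ψ angles in three dimensions and would have to match the anchor residue rather than a single point.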
The heart of the refinement stage is a method for minimization with closure constraints. We want to minimize the
target function on the manifold of joint motions that do not
change the position and orientation of the endpoints of the
loop. We exploit the kinematic redundancy of the loop and
use the null-space of the Jacobian matrix of the endpoint
as a linear approximation to this manifold. The gradient of
the target function is projected onto this null-space to generate
minimization steps that do not
break closure. Large random moves
needed for MCM are proposed using
an exact IK solver and by choosing
random directions in the null-space.
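The null-space projection can be sketched on a small example. The code below is a simplified illustration under stated assumptions, not the authors' method: it uses a planar chain with three joints and constrains only the endpoint position (not orientation), so the Jacobian is 2x3 and its null-space is one-dimensional, spanned by the cross product of the two Jacobian rows. All function names are hypothetical.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain with relative joint angles."""
    x = y = theta = 0.0
    pts = [(x, y)]
    for a, length in zip(angles, lengths):
        theta += a
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        pts.append((x, y))
    return pts

def endpoint_jacobian(angles, lengths):
    """Rows d(end_x)/d(theta_i) and d(end_y)/d(theta_i)."""
    pts = forward(angles, lengths)
    ex, ey = pts[-1]
    row_x = [-(ey - py) for _, py in pts[:-1]]
    row_y = [ex - px for px, _ in pts[:-1]]
    return row_x, row_y

def null_direction(row_x, row_y):
    """Unit vector orthogonal to both rows (valid for exactly 3 joints)."""
    a, b = row_x, row_y
    n = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        raise ValueError("singular configuration: Jacobian rows are dependent")
    return [c / norm for c in n]

def projected_gradient_step(angles, lengths, grad, step):
    """Descend along the component of grad that lies in the null-space."""
    row_x, row_y = endpoint_jacobian(angles, lengths)
    n = null_direction(row_x, row_y)
    g_along = sum(g * c for g, c in zip(grad, n))
    return [t - step * g_along * c for t, c in zip(angles, n)]
```

Because the null-space is only a linear approximation of the closure manifold, the endpoint still drifts by an amount that is second order in the step size, so a real implementation would need small steps or an occasional re-closure of the loop.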
Results: We tested our method on
structures resolved at the JCSG. On
a missing 12-residue loop in TM0423
(376 residues, resolved at 2Å, 1KQ3
in PDB, 88% correctly built automatically) we achieved all-atom RMSD as
low as 0.25Å. On a missing 12-residue loop in TM0813 (342 residues, resolved at 2.8Å, 1J5X in PDB, 61% correctly built automatically) we achieved
all-atom RMSD as low as 0.56Å. Since
we use idealized geometry, a discrepancy of 0.2 to 0.3Å is expected even
for an exactly correct loop.
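For readers unfamiliar with the metric, the following is a minimal sketch of a root-mean-square deviation between two sets of corresponding atom positions. It assumes both loops are already expressed in the same coordinate frame (plausible here, since both are anchored in the same structure) and performs no superposition; whether the reported figures were computed this way is not stated in the article. The function name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) positions.

    Atom i in coords_a is compared with atom i in coords_b; no alignment
    is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```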
* Joint Center for Structural Genomics, Stanford
Synchrotron Radiation Laboratory (SSRL)
Student Profile: Jeff Phillips
Jeff Phillips, a first-year PhD student in the Department of Computer Science at Duke University,
is working with Pankaj Agarwal. Recently he was awarded a prestigious
NSF Graduate Fellowship. Jeff earned
a BS in Computer Science and a BA in
Mathematics at Rice University, where
he worked with Prof. Lydia Kavraki on
robotic motion planning, physical simulation, and computational biology. At
Duke, Jeff is working on several geometric projects in computational biology using his background in robotics to
gain new insight into the problems. According to Jeff, the array of problems
presented under the BioGeometry
project matches his background and interests perfectly, and this was one
of the main reasons he joined Duke.
At Rice and during an internship at
Draper Laboratories in Houston, Jeff
developed a guided technique for exploring a high-dimensional configuration space with constraints on the
continuity of the space. In robotics
these constraints are represented by
non-holonomic motion constraints,
but these constraints can be thought
of more generally for other problems
such as those required for exploring
protein configurations. This technique,
called Guided Expansive Spaces
Trees, allows efficient exploration of
high-dimensional configuration spaces that
conventional approaches were unable
to handle.
Building upon his work at Rice, Jeff
is investigating the sampling process
in the stochastic roadmap simulations developed by Serkan Apaydin,
Jean-Claude Latombe, and others in
the BioGeometry project. Adapting the guided configuration search
technique allows the low-energy configurations to be sampled more densely.
Specifically, the technique allows configurations to be added to the roadmap by sampling according to incrementally acquired
knowledge of the Boltzmann distribution while retaining the properties necessary for Markov chain analysis.
Jeff is also working on shape matching
problems. The iterative corresponding
point (ICP) algorithm can quickly align
two 3-dimensional shapes by translating and rotating one of them. He is exploring various generalizations of the
ICP algorithm: extensions to higher
dimensions and to polygonal surfaces, matching complementary shapes
under collision constraints, etc. He
is also analyzing the efficiency of the
ICP algorithm.
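The basic ICP loop alternates between pairing points and solving for the best rigid motion. The sketch below is a simplified two-dimensional illustration (the profile discusses three-dimensional shapes), written with the standard library only; it uses brute-force nearest-neighbor matching and a closed-form planar rotation. The function names are hypothetical and this is not Jeff's implementation.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_dot = s_cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def apply_rigid(theta, tx, ty, pts):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

def icp_2d(src, dst, iters=50):
    """Repeatedly match each moving point to its closest target point and re-align."""
    cur = list(src)
    for _ in range(iters):
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matched)
        cur = apply_rigid(theta, tx, ty, cur)
    return cur
```

The result depends on the starting pose: the method converges to a local optimum, so a poor initial alignment can lock in wrong correspondences.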
In collaboration with Dr. Alper Üngör,
Jeff is developing mesh-generation
techniques by searching the configuration
space of point placement and
edge topology. A two-dimensional
implementation generates meshes
optimized for specific mesh characteristics after fixing certain point placements and certain edge topologies.
A three-dimensional implementation
is underway. It has two phases: (1) a
weighted random walk of edge-flips on
the edge topology and (2) an energy
minimization/refinement hill-climbing
search. In addition to generating helpful insights into several theoretical
problems related to mesh generation,
this work has the potential to shed new light
on protein configurations. Using the
protein atoms as points and certain
atomic interactions as edges, a three-dimensional mesh with certain properties can represent a protein configuration. An understanding of the energy
landscape of generic meshes may
give insight into that of proteins.
PUBLICATIONS
1. J.M. Phillips, N. Bedrossian, and L.E. Kavraki. Guided expansive spaces trees: a search technique for motion- and cost-constrained state spaces. IEEE Internat. Conf. Robotics Autom. (to appear).
2. J.M. Phillips, L.E. Kavraki, and N. Bedrossian. Spacecraft rendezvous and docking with real-time, randomized optimization. AIAA Guidance, Navigation, Control, 2003.
3. J.M. Phillips, L.E. Kavraki, and N. Bedrossian. Probabilistic optimization applied to spacecraft rendezvous and docking. AAS/AIAA Space Flight Mechanics Meeting, 2003.
4. J.M. Phillips, A. Ladd, and L.E. Kavraki. Simulated knot tying. IEEE Internat. Conf. Robotics Autom., 2002.
- Profile by Pankaj K. Agarwal
The BioGeometry project is funded by the National Science Foundation under grant CCR-00-86013. Any opinions, findings, and conclusions or
recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
BioGeometry News is the project's monthly newsletter. For more information, please visit http://biogeometry.cs.duke.edu/newsletter