Rotamer Packing Problem

Rotamer Packing Problem: The algorithms
Hugo Willy
26 May 2010
Outline
• Preliminaries
• Problem Formulation
• Dead-End Elimination
• SCWRL
• TreePack
• Relevance to my current work
Rotamers



• A protein side chain can adopt many different conformations.
• They are mostly defined by the dihedral angles (bond lengths and bond angles are relatively fixed).
• The figure shows the dihedral angles of a glutamic acid side chain.
Rotamers (2)
• The dihedrals range continuously over 0 to 360°. However, certain angles are preferred for energetic reasons. They are called gauche+ (+60°), gauche- (-60°) and trans (180°).
• The numbers above are approximate values. Different amino acids have different average angles for gauche+ (g+), gauche- (g-) and trans (t).
• These averaged values form a finite set of possible dihedral angles. They are called rotamers.
• Hence rotamers are, in a sense, a discretization of the dihedral angle space of amino acid residues.
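To make the discretization concrete, here is a minimal sketch that snaps a measured dihedral to the nearest idealized state. The names `IDEAL_STATES`, `nearest_state` and `discretize` are hypothetical, and the idealized values +60°, 180° and -60° stand in for the per-residue averages a real rotamer library would use.

```python
# Idealized state centers in degrees (assumption: a real library stores
# residue-specific averages rather than these round numbers).
IDEAL_STATES = {"g+": 60.0, "t": 180.0, "g-": -60.0}

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_state(chi):
    """Label of the idealized state closest to the dihedral angle chi."""
    return min(IDEAL_STATES, key=lambda name: angular_distance(chi, IDEAL_STATES[name]))

def discretize(chis):
    """Map a tuple of side-chain dihedrals to a tuple of state labels."""
    return tuple(nearest_state(c) for c in chis)
```

For example, `discretize([62, 175, -70])` labels a three-dihedral side chain by one state per dihedral.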
Rotamers (3)
• Rotamer libraries are built from unrelated, high-resolution PDB structures.
• With more data available, rotamer libraries can be built conditional on the backbone conformation (phi and psi angles binned at 10° intervals). These are called backbone-dependent rotamer libraries.
Side chain interaction
• Each side chain conformation entails a set of interactions between the residue in question and its surrounding neighborhood.
• They can be favorable or unfavorable, depending on distance and charge.
• These interactions are used to score the chosen conformation.
The energy function
• Eglobal is the total energy of the system
• Etemplate is the energy of the template backbone
• E(ir) is a function that defines the interaction
energy of residue i with the fixed backbone if it
takes the rotamer r
• E(ir,js) defines the interaction energy between residues i and j if they adopt the rotamers r and s, respectively.
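Consistent with the four definitions above, the total energy is usually written in the pairwise-decomposable form below. This is the standard form used with dead-end elimination; the summation bounds are an assumption, with N the number of residues.

```latex
E_{\text{global}} = E_{\text{template}}
  + \sum_{i=1}^{N} E(i_r)
  + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E(i_r, j_s)
```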
Rotamer Packing Problem
• Given a fixed backbone conformation of a protein sequence S[1..N] and an interaction energy scoring function E,
• find the rotamer set {r1, r2, ..., rN} for S[1..N] that minimizes the sum of all self and pairwise residue interaction energies w.r.t. E.
Rotamer Packing Problem (2)
• Brute-force rotamer search is exponential in the number of residues N: O(nrot^N), where nrot is the number of rotamers per residue.
• Assuming three conformations per dihedral, a residue with one dihedral (χ1) has 3 rotamers, one with two has 9, one with three has 27 and one with four has 81 possible rotamers.
• The residues with 4 dihedrals are arginine and lysine, the only two amino acids with a full positive charge (histidine has a weaker positive charge that depends on the pH of its environment).
• This means they are fairly common, especially in transcription factors (TFs) and DNA-interacting proteins (DNA carries a net negative charge).
• We need optimization.
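The exponential cost is easy to see in a direct enumeration. The sketch below is a toy brute-force search, not code from any of the packages discussed here. It assumes a hypothetical data layout: `self_e[i][r]` holds E(ir), and `pair_e[(i, j)][(r, s)]` with i < j holds E(ir,js), with missing entries treated as 0.

```python
from itertools import product

def total_energy(assignment, self_e, pair_e):
    """Sum of self energies plus pairwise energies for one rotamer assignment."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            table = pair_e.get((i, j), {})
            energy += table.get((assignment[i], assignment[j]), 0.0)
    return energy

def brute_force(self_e, pair_e):
    """Try every combination of rotamers: nrot^N evaluations in the worst case."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))
```

With 81 rotamers per residue, even ten residues already give 81^10 (about 1.2E+19) assignments, which is why the pruning methods on the following slides are needed.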
Dead End Elimination a.k.a. DEE (Nature 1992)
• Suppose that, for some rotamer r of residue i, the sum of its interaction energies with the best rotamers of the other residues (those minimizing the interaction with r) is still larger than the corresponding sum for another rotamer t of i with the worst possible rotamers of the other residues (those maximizing the interaction with t).
• Then r is certainly not in the best rotamer configuration (r is called dead-ending).
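A minimal sketch of one pass of this single-rotamer criterion follows. It assumes the same hypothetical layout as a plain nested table (`self_e[i][r]` for E(ir), `pair_e[(i, j)][(r, s)]` with i < j for E(ir,js)) and a list `alive` of sets of surviving rotamers; none of these names come from the DEE paper.

```python
def pair_energy(pair_e, i, r, j, s):
    """Symmetric lookup of E(i_r, j_s); missing entries count as 0."""
    if i < j:
        return pair_e.get((i, j), {}).get((r, s), 0.0)
    return pair_e.get((j, i), {}).get((s, r), 0.0)

def dee_singles_pass(self_e, pair_e, alive):
    """Remove every rotamer whose best case is worse than a rival's worst case."""
    n = len(self_e)
    eliminated = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best_case, worst_case = {}, {}
        for r in alive[i]:
            # For each other residue j, the energies of r against j's survivors.
            terms = [[pair_energy(pair_e, i, r, j, s) for s in alive[j]] for j in others]
            best_case[r] = self_e[i][r] + sum(min(t) for t in terms)
            worst_case[r] = self_e[i][r] + sum(max(t) for t in terms)
        for r in sorted(alive[i]):
            if any(best_case[r] > worst_case[t] for t in alive[i] if t != r):
                eliminated.append((i, r))
    # Apply removals after the scan so all bounds use the same rotamer sets.
    for i, r in eliminated:
        alive[i].discard(r)
    return eliminated
```

Calling the function repeatedly until it returns an empty list gives the iterative use of the single-rotamer step.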
Dead End Elimination (2)
• Extending the criterion to rotamer pairs, let
• If we have
• Then the rotamer pair r and s is dead-ending
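For reference, the pair criterion of Desmet et al. (1992) can be sketched in the notation of the energy function slide. The indexing here is an assumption: the pair is rotamer r of residue i with rotamer s of residue j, and u, v denote an alternative rotamer pair at the same two residues. The first line is the definition ("let"), the second is the condition ("if we have").

```latex
\varepsilon(i_r, j_s) = E(i_r) + E(j_s) + E(i_r, j_s)

\varepsilon(i_r, j_s) + \sum_{k \neq i, j} \min_{t} \bigl[ E(i_r, k_t) + E(j_s, k_t) \bigr]
  > \varepsilon(i_u, j_v) + \sum_{k \neq i, j} \max_{t} \bigl[ E(i_u, k_t) + E(j_v, k_t) \bigr]
```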
Dead End Elimination (3)
• DEE is applied iteratively:
1. DEE is applied to single rotamers.
2. DEE is applied to rotamer pairs, and dead-ending pairs are marked. Marked pairs are excluded from the pairs considered in the single-rotamer step of the next iteration.
• A rotamer r of residue i whose pairings with all rotamers of some residue j are marked is also dead-ending and hence removed.
• In a case study using an insulin structure of 76 residues, the initial number of rotamer configurations is 2.7E+76.
• After 9 iterations, only 7200 are left.
SCWRL (Protein Sci. 2003)
• DEE runs into trouble when many residues are still left with more than one possible rotamer.
• SCWRL models the remaining residues as a graph in which the residues form the nodes and an edge connects two residues whenever at least one pair of their rotamers has a non-zero interaction energy.
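The graph construction can be sketched in a few lines. This is an illustration under an assumed data layout (`pair_e[(i, j)][(r, s)]` holds the interaction energy of rotamer r of residue i with rotamer s of residue j), not SCWRL's own code.

```python
def build_residue_graph(n_residues, pair_e):
    """Adjacency sets: i and j are linked if any rotamer pair interacts."""
    graph = {i: set() for i in range(n_residues)}
    for (i, j), table in pair_e.items():
        if any(energy != 0.0 for energy in table.values()):
            graph[i].add(j)
            graph[j].add(i)
    return graph
```

Residues that end up with no neighbours can be assigned their best self-energy rotamer independently of everything else.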
SCWRL (2)
• Earlier versions of SCWRL tried to find a “keystone” node whose removal divides the connectivity graph into two.
• The energy of the two parts can then be computed separately.
• Complexity is reduced from nrot^11 to nrot^7 + nrot^5.
SCWRL (3)
• In the most recent improvement, SCWRL splits the graph into biconnected components.
• A biconnected component is a subgraph that cannot be disconnected by removing any single node.
• They are cycles or nested cycles. They can be found with a standard DFS-based algorithm (Tarjan 1972).
• This way, SCWRL manages to bound the complexity by the size of the largest cycle in the residue connectivity graph.
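The DFS-based decomposition can be sketched as follows. This is a compact recursive version of the classic edge-stack algorithm, written for illustration; the function name and the adjacency-set input are assumptions, and recursion depth limits make it suitable only for small graphs.

```python
def biconnected_components(graph):
    """Return the vertex sets of the biconnected components of an undirected graph.

    graph maps each vertex to the set of its neighbours.
    """
    disc, low = {}, {}          # discovery time and low-link value per vertex
    edge_stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in sorted(graph[u]):
            if v not in disc:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree: pop one whole component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif v != parent and disc[v] < disc[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for start in sorted(graph):
        if start not in disc:
            dfs(start, None)
    return components
```

Two triangles that share one residue come out as two components of three residues each, so each can be searched with 3 residues' worth of combinations instead of 5. Isolated vertices produce no component in this sketch.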
SCWRL (4)
SCWRL (5)
• Within the biconnected components, they use a branch-and-bound algorithm.
• First, since their energy function has only positive terms, one can do a DFS and bound the search by the energy of the best complete root-to-leaf path found so far.
• One can also bound the energy contribution of a residue by the sum of its minimum self energy and its minimum pairwise energies with its descendants.
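A minimal branch-and-bound sketch along these lines is shown below. It is a simplification, not SCWRL's implementation: residues are assigned in index order, all energies are assumed non-negative, and the lower bound uses only the minimum self energies of the unassigned residues (a weaker bound than the one with pairwise terms described above). The data layout `self_e[i][r]` and `pair_e[(i, j)][(r, s)]` with i < j is hypothetical.

```python
def branch_and_bound(self_e, pair_e):
    """Depth-first search over rotamer assignments with pruning.

    Assumes every energy term is >= 0, so a partial sum never decreases.
    """
    n = len(self_e)
    best = {"energy": float("inf"), "assignment": None}

    def pair(i, r, j, s):  # requires i < j
        return pair_e.get((i, j), {}).get((r, s), 0.0)

    # suffix[i]: lower bound on what residues i..n-1 must still contribute.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + min(self_e[i])

    def dfs(i, partial, energy):
        if energy + suffix[i] >= best["energy"]:
            return  # cannot beat the best complete assignment found so far
        if i == n:
            best["energy"], best["assignment"] = energy, tuple(partial)
            return
        for r in range(len(self_e[i])):
            added = self_e[i][r] + sum(pair(j, partial[j], i, r) for j in range(i))
            partial.append(r)
            dfs(i + 1, partial, energy + added)
            partial.pop()

    dfs(0, [], 0.0)
    return best["assignment"], best["energy"]
```

The pruning test is only valid because no term is negative; with attractive (negative) terms a partial sum could still drop later and the bound would have to be changed.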
SCWRL (6)
• The energy function used in SCWRL is a linear combination of a rotamer probability term and a linear repulsive energy term (van der Waals repulsion).
p(ri = 1) is the probability of the best rotamer for a given phi and psi. K is a fitting parameter set to 3. r is the interatomic distance between atoms i and j of two residues, and Rij is the sum of the van der Waals radii of i and j.
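Using the symbols defined above, the two terms have roughly the following shape. This is a sketch of the general form in Canutescu et al. (2003): the slope a, the cutoff fraction c and the cap E_max are placeholders whose values are not given in this talk.

```latex
E_{\text{self}}(r_i) = -K \log \frac{p(r_i)}{p(r_i = 1)}

E_{\text{pair}}(r) =
\begin{cases}
0 & r > R_{ij} \\
a \left( 1 - \dfrac{r}{R_{ij}} \right) & c\,R_{ij} \le r \le R_{ij} \\
E_{\max} & r < c\,R_{ij}
\end{cases}
```

Both terms are non-negative: the best rotamer has self energy 0, and the pair term is 0 unless two atoms overlap.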
Tree Pack (J. ACM 2006)
• This technique is based on the tree decomposition of Robertson and Seymour (1986).
• Given a graph G(V, E), a tree decomposition (T, X) of G consists of a tree T(I, F) and a vertex mapping X which maps each node in I to a subset of V. For each node i ∈ I, the subset is denoted by Xi.
• Every vertex in V must be contained in some Xi.
• Every edge in E must be contained in some Xi.
• For all i, j and k in I, if j is on the path from i to k then (Xi ∩ Xk) ⊆ Xj.
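The conditions above can be checked mechanically. The sketch below is an illustrative validator, not part of TreePack; `bags` (node to vertex subset) and `tree_edges` are hypothetical names, and the function assumes `tree_edges` really forms a tree without verifying it. The path condition is tested in its equivalent form: the tree nodes whose subsets contain a given vertex must be connected in T.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check vertex coverage, edge coverage and the path condition."""
    # Every vertex must appear in some bag.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Every edge must lie inside some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # Path condition: bags containing v form a connected part of the tree.
    adj = {i: set() for i in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        holders = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for nxt in adj[cur]:
                if nxt in holders and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != holders:
            return False
    return True
```

For a 4-cycle 0-1-2-3-0, the two subsets {0, 1, 2} and {0, 2, 3} joined by one tree edge pass the check, while four two-vertex subsets arranged on a path fail it because vertex 0 appears only at the two ends.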
Tree Pack (2)
• The width of a tree decomposition is the maximum of |Xi| - 1 over all nodes i.
• The treewidth of G is the minimum width over all possible tree decompositions of G.
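As a small illustration of the width definition, assuming the subsets Xi are stored in a dictionary from tree node to vertex set (a hypothetical representation, not TreePack's data structure):

```python
def decomposition_width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

A path graph 0-1-2 with subsets {0, 1} and {1, 2} has width 1, while a decomposition that puts all of V into a single subset has width |V| - 1. The treewidth is the smallest value this function can return over all valid decompositions.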
Tree Pack (3)
• The energy is computed over the tree decomposition.
Basically, the computation is done in two steps. The first computes the best energies bottom-up. Then the optimal rotamer configuration is computed top-down.
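The two steps can be illustrated on the simplest case, where the residue interaction graph is itself a tree (treewidth 1), so each tree node holds a single residue. This is a sketch of the idea only, not TreePack's algorithm for general decompositions; `children`, `make_pair` and `pack_on_tree` are hypothetical names, and the energy tables use the assumed layout `self_e[i][r]` and `pair_e[(i, j)][(r, s)]` with i < j.

```python
def make_pair(pair_e):
    """Build a symmetric lookup E(i_r, j_s) from a table keyed by i < j."""
    def pair(i, r, j, s):
        if i < j:
            return pair_e.get((i, j), {}).get((r, s), 0.0)
        return pair_e.get((j, i), {}).get((s, r), 0.0)
    return pair

def pack_on_tree(children, root, self_e, pair):
    """Optimal rotamers when interactions only run along tree edges."""
    best = {}    # best[i][r]: minimum energy of i's subtree if i takes r
    choice = {}  # choice[(c, r)]: best rotamer of child c if its parent takes r

    def up(i):  # step 1: bottom-up energies
        for c in children.get(i, []):
            up(c)
        best[i] = []
        for r in range(len(self_e[i])):
            energy = self_e[i][r]
            for c in children.get(i, []):
                s_best = min(range(len(self_e[c])),
                             key=lambda s: best[c][s] + pair(i, r, c, s))
                choice[(c, r)] = s_best
                energy += best[c][s_best] + pair(i, r, c, s_best)
            best[i].append(energy)

    assignment = {}

    def down(i, r):  # step 2: top-down reconstruction
        assignment[i] = r
        for c in children.get(i, []):
            down(c, choice[(c, r)])

    up(root)
    root_r = min(range(len(self_e[root])), key=lambda r: best[root][r])
    down(root, root_r)
    return assignment, best[root][root_r]
```

In a general tree decomposition the same two passes are used, but each table is indexed by the joint rotamer assignment of all residues in a subset Xi rather than by a single rotamer, which is where the exponent tw + 1 in the running time comes from.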
Tree Pack (4)
• So the complexity is O(N · nrot^(tw+1)), where tw is the treewidth.
• Any two residues are at least a minimum distance Dl apart, and two residues can interact only if they are within a maximum distance Du. Residues are defined in a 3D geometric graph.
• Definition: a k-ply neighborhood system in R^3 is a set of closed balls in R^3 such that no point is strictly inside more than k balls.
• Sphere separator theorem (Miller, 1997): For every k-ply neighborhood system of N balls, there is a sphere separator S s.t.
– |NE| <= 4/5 N (NE are the balls outside S),
– |NI| <= 4/5 N (NI are the balls within S) and,
– |No| = O(k^(1/3) N^(2/3)), where No contains the balls that intersect S.
– S can be computed by a linear-time randomized algorithm.
Tree Pack (5)
• Given Du and Dl, no point lies inside more than (1 + Du/Dl)^3 balls.
• Then, given No, we obtain an intersection (separator) of size at most O(|V|^(2/3)).
References
• Tarjan, R. 1972. Depth first search and linear graph algorithms. SIAM J. Comput. 1:146-160.
• Desmet, J. et al. 1992. The dead end elimination theorem and its use in protein side chain positioning. Nature 356:539-542.
• Canutescu, A. et al. 2003. A graph theory algorithm for rapid protein side chain prediction. Protein Sci. 12:2001-2014.
• Xu, J. and Berger, B. 2006. Fast and accurate algorithms for protein side chain packing. J. of the ACM 53:533-557.