The Protein Side-Chain Positioning Problem

The Side-Chain Positioning Problem
Carl Kingsford
Princeton University
Joint work with Bernard Chazelle and Mona Singh
Proteins
Many functions: Structural, messaging, catalytic, …
Sequence of amino acids strung together on a backbone
Each amino acid has a flexible side-chain
Proteins fold. Function depends highly on 3D shape
[Figure: amino acids labeled V, R, R, C]
Protein Structure
Backbone
Side-chains
Side-chain Positioning Problem
Given:
• fixed backbone
• amino acid sequence
Find the 3D positions for the
side-chains that minimize the
energy of the structure
Assume lowest energy is best
IILVPACW…
Side-chain Positioning Applications
Homology-modeling: Use known backbone
of similar protein to predict new structure
Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII
Rotamers
Each amino acid has some number of statistically
preferred side-chain positions
These are called rotamers
Continuum of positions is well approximated by
rotamers
3 rotamers of Arginine
An Equivalent Graph Problem
For protein with p side-chains:
p-partite graph:
• part Vi for each side-chain i
• node u for each rotamer
• edge {u,v} if u interacts with v
Weights:
• E(u) = self-energy
• E(u,v) = interaction energy
[Figure: p-partite graph; each part is a position, each node is a rotamer, n nodes in total]
Feasible Solution
Feasible solution: one node
from each part
cost(feasible) = cost of induced
subgraph
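The definition of a feasible solution and its cost can be sketched as a brute-force search on a toy instance. The node names, energies, and the helper names `cost` and `brute_force` are hypothetical, chosen only for illustration; enumeration is exponential in the number of positions and is not the method of the talk.

```python
from itertools import product

def cost(choice, self_energy, pair_energy):
    """Cost of the subgraph induced by one chosen node per part."""
    total = sum(self_energy[u] for u in choice)
    for i, u in enumerate(choice):
        for v in choice[i + 1:]:
            # Missing edges mean the two rotamers do not interact.
            total += pair_energy.get(frozenset((u, v)), 0.0)
    return total

def brute_force(positions, self_energy, pair_energy):
    """Try every feasible solution (one node from each part) and keep the cheapest."""
    return min(product(*positions),
               key=lambda c: cost(c, self_energy, pair_energy))

# Toy instance (made-up numbers): two positions with two rotamers each.
positions = [["a1", "a2"], ["b1", "b2"]]
self_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.3}
pair_energy = {frozenset(("a1", "b1")): -2.0,
               frozenset(("a2", "b2")): 1.0}

best = brute_force(positions, self_energy, pair_energy)
best_cost = cost(best, self_energy, pair_energy)
```

In this toy instance the favorable interaction between a1 and b1 outweighs the higher self-energy of a1, which is why choosing each rotamer greedily by self-energy alone is not enough.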
Hard to approximate within a factor of c·n,
where c is a constant and n is the # of nodes
Determining the Energy
• Energy of a protein conformation is the sum of several energy terms:
  electrostatics, van der Waals, bond lengths, bond angles, hydrogen bonds, dihedral angles
• No triangle inequality
Plan of Attack
1. Formulate as a quadratic integer program
2. Relax into a semidefinite program
3. Solve the SDP in polynomial time
4. Round solution vectors to choice of rotamers
Quadratic Integer Program
min
subject to
for each posn j
for each posn j, node v
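One way to write the quadratic integer program, consistent with the position and flow constraints described under Constraints & Dummy Position, is sketched below. The exact objective and indexing are assumptions: $x_u \in \{0,1\}$ indicates whether rotamer $u$ is chosen, and $V_j$ is the set of nodes at position $j$.

```latex
\begin{align*}
\min\ & \sum_{u} E(u)\, x_u \;+\; \sum_{\{u,v\}} E(u,v)\, x_u x_v \\
\text{subject to}\ & \sum_{u \in V_j} x_u = 1
    && \text{for each posn } j \\
& \sum_{u \in V_j} x_u x_v = x_v
    && \text{for each posn } j,\ \text{node } v \\
& x_u \in \{0,1\}
\end{align*}
```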
Relax Into Vector Program
Use $x_u = x_u^2$ for $x_u \in \{0,1\}$
to write as pure quadratic
program
Variables become n-dimensional vectors
minimize
subject to
for each posn j
for each node v, posn j
Rewrite As Semidefinite Program
$X = (x_{uv})$ is PSD $\iff$ $x_{uv} = x_u^T x_v$
minimize
subject to
for each posn j
for each node v, posn j
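A plausible form of the semidefinite program, assuming the objective and constraints follow the position and flow constraints described under Constraints & Dummy Position and the $x_{uv} \ge 0$ constraints mentioned under Future Work, is sketched below. Here $x_{uu}$ plays the role of the node variable and $x_{uv}$ that of the edge variable.

```latex
\begin{align*}
\text{minimize}\ & \sum_{u} E(u)\, x_{uu} \;+\; \sum_{\{u,v\}} E(u,v)\, x_{uv} \\
\text{subject to}\ & \sum_{u \in V_j} x_{uu} = 1
    && \text{for each posn } j \\
& \sum_{u \in V_j} x_{uv} = x_{vv}
    && \text{for each node } v,\ \text{posn } j \\
& x_{uv} \ge 0, \qquad X = (x_{uv}) \succeq 0
\end{align*}
```

Replacing each $x_{uv}$ by the inner product $x_u^T x_v$ gives the corresponding vector program.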
Constraints & Dummy Position
Insert a new position with a single node.
No edges, no node cost.
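[Figure: dummy position V0 holding a single node u0, and positions Vi, Vj, with node variables xvv and edge variables xuv]
Position constraints: the sum of the node variables in each position is 1.
Flow constraints: for each node v and each position j, the sum of the edge variables xuv over u in Vj equals the node variable xvv.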
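The two families of constraints can be checked mechanically on a candidate matrix. The sketch below assumes nodes are numbered 0..n-1, that `positions` lists the nodes of each part, and that the flow constraint is applied for every node and every position; the dummy position is left out for brevity. The function names are hypothetical.

```python
def integral_matrix(choice, n):
    """Rank-one 0/1 matrix built from the set of chosen nodes."""
    chi = [1 if u in choice else 0 for u in range(n)]
    return [[a * b for b in chi] for a in chi]

def satisfies_constraints(X, positions, tol=1e-9):
    """Check the position and flow constraints on a symmetric matrix X."""
    # Position constraints: node variables x_uu in each position sum to 1.
    for nodes in positions:
        if abs(sum(X[u][u] for u in nodes) - 1.0) > tol:
            return False
    # Flow constraints: for each node v and position j,
    # the edge variables x_uv over u in V_j sum to x_vv.
    for v in range(len(X)):
        for nodes in positions:
            if abs(sum(X[u][v] for u in nodes) - X[v][v]) > tol:
                return False
    return True

positions = [[0, 1], [2, 3]]
X_ok = integral_matrix({0, 3}, 4)      # one node per position
X_bad = integral_matrix({0, 1, 3}, 4)  # two nodes picked in position 0
```

Any feasible choice of one node per position yields a matrix that passes both checks, which is why the integral solutions are contained in the relaxation.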
Geometry of the Solution Vectors
Geometry of Solution Vectors
Lemma. For each position $i$, $\sum_{u \in V_i} x_u = x_{u_0}$.
Proof.
Let $y = \sum_{u \in V_i} x_u$. Simple algebra shows that:
• Length of y is 1
• Length of xu0 is 1
• Length of projection of y onto xu0 is 1
Hence y = xu0.
Solution Vectors Lie on a Sphere
Each solution vector lies on a sphere of
radius ½ centered at xu0/2:
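$a^2 = \lVert x_u - x_{u_0}/2 \rVert^2 = \lVert x_u \rVert^2 - x_u^T x_{u_0} + \tfrac{1}{4} = \tfrac{1}{4}$
because $x_u^T x_{u_0} = x_u^T x_u$.
[Figure: vector xu at distance a from the center xu0/2 of the sphere through the origin O]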
Note. Length of projection of xu onto xu0 is
the length of vector xu squared.
How do we round the solution
of the SDP relaxation?
Convert fractional solutions into feasible 0/1 solutions
• Projection rounding
• Perron-Frobenius rounding
Projection Rounding
Since $\sum_{u \in V_i} x_{uu} = 1$, the xuu give a probability distribution
at each position.
Pick node u with probability xuu
xuu = length of the projection
onto xu0.
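Projection rounding as described above can be sketched in a few lines. The sketch assumes the diagonal entries xuu are given as a list `diag`, that `positions` lists the nodes of each part, and that positions are sampled independently of one another; the function name is hypothetical.

```python
import random

def projection_round(diag, positions, rng=None):
    """Pick one node per position, node u with probability x_uu."""
    rng = rng or random.Random()
    chosen = []
    for nodes in positions:
        # Within a position the x_uu sum to 1, so they act as sampling weights.
        weights = [diag[u] for u in nodes]
        chosen.append(rng.choices(nodes, weights=weights, k=1)[0])
    return chosen

positions = [[0, 1], [2, 3]]
fractional_diag = [0.5, 0.5, 0.25, 0.75]
sample = projection_round(fractional_diag, positions, random.Random(1))
```

Each call returns a feasible solution, since exactly one node is drawn from every position.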
[Figure: solution vectors xu and xv and their projections onto xu0, with origin O]
Drift for Projection Rounding
Drift = expected difference between
fractional & rounded solutions.
Drift on edge {u,v} = E(u,v)(xuv – Pr[uv]), where Pr[uv] is the probability that both u and v are picked
Comes entirely from pairwise
interactions.
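The drift can be computed directly once Pr[uv] is known. The sketch below assumes that positions are rounded independently, so that Pr[uv] = xuu · xvv for u and v in different positions; this independence, the dictionary layout of `E`, and the function name are assumptions made for illustration.

```python
def projection_drift(E, X, positions):
    """Sum over edges of E(u,v) * (x_uv - Pr[uv]), with Pr[uv] = x_uu * x_vv."""
    part = {u: i for i, nodes in enumerate(positions) for u in nodes}
    total = 0.0
    for (u, v), energy in E.items():
        if part[u] == part[v]:
            continue  # edges only join nodes in different positions
        total += energy * (X[u][v] - X[u][u] * X[v][v])
    return total

positions = [[0, 1], [2, 3]]
# Fractional solution: an equal mix of the choices {0, 2} and {1, 3}.
X = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]
E = {(0, 2): 1.0, (0, 3): 2.0}
drift = projection_drift(E, X, positions)
```

Node energies contribute nothing here because the expected node cost of the rounded solution matches the fractional one.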
In fact,
Because xu are on a sphere,
By Cauchy-Schwarz,
[Figure: vectors xu, xv and yu, yv]
Perron-Frobenius Rounding
=
q=
0
1
0
=1
0
1
0
=1
0
0
1
=1
0
0
1
0
0
=1
  0/1 characteristic n-vector of optimal solution
Optimal integral X*   T  rank(X*) = 1
Idea: Approximate fractional X by a rank 1 matrix qqT
Want to sample from , but settle for q
q needs to contain probability distributions for each
position. How do we choose q?
Possible Choices for q
Lemma. Any nonnegative vector q with L1-norm p in the image
space of X contains the required set of probability distributions.
Proof.
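$X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$.
Let $1_i$ = characteristic vector for position i
Suppose q = Xy for some y.
Then, $1_i^T q = 1_i^T W^T W y = (W 1_i)^T W y = x_{u_0}^T W y$, because $W 1_i = \sum_{u \in V_i} x_u = x_{u_0}$.
The final value is independent of i ⇒ every position has the same sum; since the L1-norm is p, each position sums to 1.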
A Choice for q
By spectral decomposition
where
Take
z1 is in the image space of X.
By the Perron-Frobenius theorem for nonnegative matrices, q ≥ 0.
By Lemma, q contains the needed probability distributions.
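A minimal sketch of this choice of q, assuming q is the principal eigenvector z1 of X rescaled to L1-norm p (the number of positions). Power iteration stands in for a full spectral decomposition, and the small matrix is a made-up fractional solution; the function names are hypothetical.

```python
def matvec(X, v):
    """Multiply the square matrix X by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def principal_eigenvector(X, iters=200):
    """Power iteration from the all-ones vector, normalized in L1."""
    v = [1.0] * len(X)
    for _ in range(iters):
        w = matvec(X, v)
        scale = sum(abs(t) for t in w)
        v = [t / scale for t in w]
    return v

def perron_frobenius_q(X, positions):
    """Rescale the principal eigenvector so its L1-norm equals p."""
    z1 = principal_eigenvector(X)
    p = len(positions)
    total = sum(z1)
    return [p * t / total for t in z1]

positions = [[0, 1], [2, 3]]
# Equal mix of the integral solutions {0, 2} and {0, 3}.
X = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.5, 0.0, 0.0, 0.5]]
q = perron_frobenius_q(X, positions)
position_sums = [sum(q[u] for u in nodes) for nodes in positions]
```

Because X is nonnegative and the start vector is positive, every iterate stays nonnegative, and the entries of q within each position can be used as sampling probabilities in the same way as the xuu in projection rounding.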
Computational Results
30 random graphs
• 60 nodes, 15 positions
• edge probability ½
• weights uniformly from [0,1]
Compare solutions from
• Simple LP
• SDP Fractional
• Projection rounded
• Perron-Frobenius rounded
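A random test instance with the stated parameters could be generated along the following lines. Several details are assumptions not given in the slides: the 60 nodes are split evenly (4 per position), edges are only placed between different positions, and node weights are drawn from the same uniform distribution as edge weights. The function name is hypothetical.

```python
import random

def random_instance(num_positions=15, nodes_per_position=4,
                    edge_prob=0.5, seed=0):
    """Random p-partite instance with uniform [0,1] weights."""
    rng = random.Random(seed)
    positions = [list(range(i * nodes_per_position,
                            (i + 1) * nodes_per_position))
                 for i in range(num_positions)]
    n = num_positions * nodes_per_position
    self_energy = [rng.uniform(0.0, 1.0) for _ in range(n)]
    pair_energy = {}
    for i in range(num_positions):
        for j in range(i + 1, num_positions):
            for u in positions[i]:
                for v in positions[j]:
                    # Each cross-position pair becomes an edge with prob. 1/2.
                    if rng.random() < edge_prob:
                        pair_energy[(u, v)] = rng.uniform(0.0, 1.0)
    return positions, self_energy, pair_energy

positions, self_energy, pair_energy = random_instance()
```

Fixing the seed makes the 30 instances reproducible, for example by calling `random_instance(seed=k)` for k from 0 to 29.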
Future Work
Can the rounding schemes be applied to other problems?
Can the semidefinite program be sped up?
─ Can only routinely solve graphs with ≤ 120 nodes
(reasonable protein problems contain 1000 to 5000 nodes)
─ xuv ≥ 0 constraints are the bottleneck
Can the requirement of a fixed backbone be relaxed?
We’ve worked quite a bit with real proteins using an LP approach
Seems an SDP formulation might be useful
More Information
The Side-Chain Positioning Problem: A Semidefinite Programming
Formulation with New Rounding Schemes, B. Chazelle, C.
Kingsford, M. Singh, Proc. ACM FCRC'2003, Principles of
Computing and Knowledge: Paris Kanellakis Memorial Workshop
(2003).
http://www.cs.princeton.edu/~carlk/papers.html