roadmap-motion

advertisement
Molecular Motion Pathways:
Computation of Ensemble Properties
with Probabilistic Roadmaps
1)
2)
3)
4)
5)
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to
Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Systems for
Molecular Biology (ISMB), AAAI Press, Menlo Park, CA, pp. 252-261,
1999.
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein
Folding Landscapes and Analyze Folding Kinetics of Known Native
Structures. J. Comp. Biology, 10(2):239-255, 2003.
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C.
Varma. Stochastic Roadmap Simulation: An Efficient Representation and
Algorithm for Analyzing Molecular Motion. J. Comp. Biology, 10(3-4):257281, 2003.
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build
Better Markovian State Models: Predicting the Folding Rate and
Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics,
121(1):415-425, 2004.
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric
Algorithms for the Conformational Analysis of Long Protein Loops. J.
Comp. Chemistry, 25:956-967, 2004.
Molecular motion is an essential
process of life
Mad cow disease is
caused by misfolding
Drug molecules act by
binding to proteins
So, studying molecular motion is of
critical importance in molecular biology
However, few tools are available
Computer simulation:
- Monte Carlo simulation
- Molecular Dynamics
Stanford BioX cluster
NMR spectrometer
Two Major Drawbacks of
MD and MC Simulation
1) Each simulation run
yields a single pathway,
while molecules tend to
move along many
different pathways
 Interest in ensemble
properties
Unfolded (denatured) state
Intermediate
states
Many pathways
Folded (native) state
Example of Ensemble Property:
Probability of Folding pfold
Measure kinetic distance to folded state
Du, Pande, Grosberg, Tanaka,
and Shakhnovich. On the
Transition Coordinate for
Protein Folding. Journal of
Chemical Physics (1998).
1- pfold
Unfolded state
pfold
Folded state
Other Examples of
Ensemble Properties
 Folding:
• Order of formation of SSE’s
• Folding rate / Mean first passage time
• Key intermediates
 Binding:
• Average time to escape from active site
• Average energy barrier
Two Major Drawbacks of
MD and MC Simulation
1) Each simulation run
yields a single pathway,
while molecules tend to
move along many
different pathways
2) Each simulation run
tends to waste much
time in local minima
 Roadmap-Based Representation
 Compact representation of many motion pathways
 Coarse resolution relative to MC and MD simulation
 Efficient algorithms for analyzing multiple pathways
Roadmaps for Robot Motion Planning
free space
[Kavraki, Svetska, Latombe,Overmars, 96]
Initial Work
A.P. Singh, J.C. Latombe, and D.L. Brutlag.
A Motion Planning Approach to Flexible Ligand Binding.
Proc. 7th ISMB, pp. 252-261, 1999
 Study of ligand-protein binding
 The ligand is a small flexible molecule, but the protein
is assumed rigid
 A fixed coordinate system P is
attached to the protein and a
moving coordinate system L is
defined using three bonded atoms
in the ligand
 A conformation of the ligand is
defined by the position and
orientation of L relative to P
and the torsional angles of the ligand
Roadmap Construction
(Node Generation)
 The nodes of the roadmap are generated by sampling
conformations of the ligand uniformly at random in the
parameter space (around the protein)
 The energy E at each sampled conformation is computed:
Waals
E
=
Einteraction =
Einternal =
Einteraction + Einternal
electrostatic + van der Waals potential
Snon-bonded pairs of atoms electrostatic + van der
Roadmap Construction
(Node Generation)
 The nodes of the roadmap are generated by sampling
conformations of the ligand uniformly at random in the
parameter space (around the protein)
 The energy E at each sampled conformation is computed:
Waals
E
=
Einteraction =
Einternal =
Einteraction + Einternal
electrostatic + van der Waals potential
Snon-bonded pairs of atoms electrostatic + van der
 A sampled conformation is retained as a node of the
roadmap with probability:
P=
0
Emax-E
Emax-Emin
1
if E > Emax
if Emin  E  Emax
if E < Emin
 Denser distribution of nodes in low-energy regions of
conformational space
Roadmap Construction
(Edge Generation)
q
qi
qi+1
q’
 Each node is connected to its closest neighbors by
straight edges
 Each edge is discretized so that between qi and qi+1 no
atom moves by more than some ε (= 1Å)
E
Emax
 If any E(qi) > Emax , then the edge is rejected
Roadmap Construction
(Edge Generation)
q
qi
q’
qi+1
 Any two nodes closer apart than some threshold
distance are connected by a straight edge
 Each edge is discretized so that between qi and qi+1 no
atom moves by more than some ε (= 1Å)
 If all E(qi)  Emax , then the edge is retained and is
assigned two weights w(qq’) and w(q’q)
where:
w(q  q') =
 -ln(P[q  q
i
i
i+1
])
Heuristic measure
of energetic difficulty
or moving from q to q’
e-(Ei+1 -Ei )/kT
P[qi  qi+1 ] = -(Ei+1 -Ei )/kT
e
 e-(Ei-1 -Ei )/kT
(probability that the ligand moves from qi to qi+1 when it
is constrained to move along the edge)
Querying the Roadmap
 For a given goal node qg (e.g., binding conformation),
the Dijkstra’s single-source algorithm computes the
lowest-weight paths from qg to each node (in either
direction) in O(N logN) time, where N = number of
nodes
 Various quantities can then
be easily computed in O(N)
time, e.g., average weights
of all paths entering qg and
of all paths leaving qg
(~ binding and dissociation
rates Kon and Koff)
Protein: Lactate dehydrogenase
Ligand: Oxamate (7 degrees of freedom)
Experiments on 3 Complexes
1) PDB ID: 1ldm
Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 dofs)
2) PDB ID: 4ts1
Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423
atoms, 319 residues)
Ligand: L- leucyl-hydroxylamine (13 atoms, 9 dofs)
3) PDB ID: 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 dofs)
Computation of Potential Binding
Conformations
1) Sample many (several 1000’s) ligand’s
conformations at random around protein
active site
2) Repeat several times:
 Select lowest-energy
conformations that are
close to protein surface
 Resample around them
3) Retain k (~10)
lowest-energy
conformations whose
centers of mass are at
least 5Å apart
lactate dehydrogenase
Results for 1ldm


Some potential binding sites have slightly lower energy than the active site
 Energy is not a discriminating factor
Average path weights (energetic difficulty) to enter and leave binding site
are significantly greater for the active site
 Indicates that the active site is surrounded by an energy barrier that
“traps” the ligand
Energy
Potential
binding
site
Active
site
Potential
binding
site
Conformation
Application of Roadmaps
to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map
Protein Folding Landscapes and Analyze Folding Kinetics of Known
Native Structures. J. Comp. Biology, 10(2):239-255, 2003
 Known native state
 Degrees of freedom: φ-ψ angles
 Energy: van der Waals, hydrogen bonds,
hydrophobic effect
 New idea: Sampling strategy
 Application: Finding order of SSE formation
Sampling Strategy
(Node Generation)
 High dimensionality
 non-uniform sampling
 Conformations are sampled
using Gaussian distribution
around native state
 Conformations are sorted
into bins by number of
native contacts (pairs of
C atoms that are close
apart in native structure)
 Sampling ends when all bins
have minimum number of
conformations
 “good” coverage of
conformational space
Application: Order of Formation
of Secondary Structures
 The lowest-weight path is extracted
from each denatured conformation to
the folded one
 The order of formation of SSE’s is
computed along each path
 The formation order that appears the
most often over all paths is considered
the SSE formation order of the protein
Method
1) The contact matrix showing the time
step when each native contact appears
is built
Protein CI2
(1 + 4 b)
60
5
Protein CI2
(1 + 4 b)
The native contact between residues 5 and 60 appears at step 216
Method
1) The contact matrix showing the time
step when each native contact appears
is built
2) The time step at which a structure
appears is approximated as the average
of the appearance time steps of its
contacts
 forms at time step 122 (II)
b3 and b4 come together at 187 (V)
b2 and b3 come together at 210 (IV)
b1 and b4 come together at 214 (I)
 and b4 come together at 214 (III)
Protein CI2
(1 + 4 b)
Method
1) The contact matrix showing the time
step when each native contact appears
is built
2) The time step at which a structure
appears is approximated as the average
of the appearance time steps of its
contacts
Comparison with
Experimental Data
CI2
SSE’s
roadmap size
1+4b
5126, 70k
3
1+4b
1+5b
5471, 104k
7975, 104k
8357, 119k
Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma.
Stochastic Roadmap Simulation: An Efficient Representation and Algorithm
for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003
New Idea: Capture the stochastic nature of molecular
motion by assigning probabilities to edges
vi
Pij
vj
Edge probabilities
 exp(-ΔEij/kT)
, if ΔEij >0;

Ni

Follow Metropolis criteria: Pij = 
 1 , otherwise.
 Ni

vi
Self-transition probability:
Pii =1- Pij
Pii
Pij
ji
[Roadmap nodes are sampled uniformly at random and energy profile
along edges is not considered]
vj
Stochastic Roadmap Simulation
Pij
V
Stochastic roadmap simulation and Monte Carlo simulation
converge to the Boltzmann distribution, i.e., the number of
-E/kT
e
dV
times SRS is at a node in V converges toward V
when the number of nodes grows (and they are uniformly
distributed)
Roadmap as Markov Chain
i
Pij
j
 Transition probability Pij depends only on i and j
Example #1:
Probability of Folding pfold
1- pfold
Unfolded state
pfold
Folded state
First-Step Analysis
U: Unfolded state





F: Folded state
One linear equation per node
l
Solution gives pfold for all nodes
No explicit simulation runk
j
Pik
Pil
All pathways are taken into account
Pij
m
Sparse linear system
i
Pim
Pii
Let fi = pfold(i)
After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
=1
=1
Number of Self-Avoiding Walks
on a 2D Grid
1, 2, 12, 184, 8512, 1262816,
575780564, 789360053252,
3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 1028
http://mathworld.wolfram.com/Self-AvoidingWalk.html
In contrast …
Computing pfold with MC simulation requires:
For every conformation q of interest
 Perform many MC simulation runs from q
 Count number of times F is attained first
Computational Tests
• 1ROP (repressor of
primer)
• 2  helices
• 6 DOF
• 1HDD (Engrailed
homeodomain)
• 3  helices
• 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
Correlation with MC Approach
1ROP
pfold for ß hairpin
Immunoglobin binding protein
(Protein G)
Last 16 amino acids
Cα based representation
Go model energy function
42 DOFs
[Zhou and Karplus, `99]
Computation Times (ß hairpin)
Monte Carlo (30 simulations):
~10 hours of
computer time
Over 107 energy
computations
2000 conformations 23 seconds of
computer time
~50,000 energy
computations
1 conformation
Roadmap:
~6 orders of magnitude speedup!
Example #2:
Ligand-Protein Interaction
Computation of escape time
from funnels of attraction
around potential binding sites
Funnel of attraction = ball of
10Å rmsd around bound state
[Camacho and Vajda, 01]
Computation Through Simulation
[Sept, Elcock and McCammon `99]
10K to 30K independent simulations
Computing Escape Time with Roadmap
l
k
j
Pil
Pik
Pij
i
Pii
m
Pim
Funnel of Attraction
ti = 1 + Pii ti + Pij tj+ Pik tk + Pil tl + Pim tm
=0
(escape time is measured as number of steps
of stochastic simulation)
Distinguishing Active Site
Given several potential binding sites,
which one is the active one?
Energy: electrostatic + van der Waals + solvation free energy terms
Complexes Studied
ligand
protein # random
nodes
#
DOFs
oxamate
1ldm
8000
7
Streptavidin
1stp
8000
11
Hydroxylamine
4ts1
8000
9
COT
1cjw
8000
21
THK
1aid
8000
14
IPM
1ao5
8000
10
PTI
3tpi
8000
13
Distinction Using Escape Time
Protein
1stp
4ts1
3tpi
1ldm
1cjw
1aid
1ao5
Bound
state
3.4E+9
3.8E+10
1.3E+11
8.1E+5
5.4E+8
9.7E+5
6.6E+7
Best potential
binding site
1.1E+7
1.8E+6
5.9E+5
3.4E+6
4.2E+6
1.6E+8
5.7E+6
Able to
distinguish
catalytic site
Not able
(# steps)
Using Path Sampling to Construct
Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build
Better Markovian State Models: Predicting the Folding Rate and
Mechanism of a Tryptophan Zipper Beta Hairpin,
J. Chemical Physics, 121(1):415-425, 2004
New idea:
Paths computed with Molecular Dynamics
simulation techniques are used to create the
nodes of the roadmap
 More pertinent/better distributed nodes
 Edges are labeled with the time needed to
traverse them
Sampling Nodes from Computed
Paths (Path Shooting)
~dt
F
U
Sampling Nodes from Computed
Paths (Path Shooting)
tij
i
j
pij
F
U
Example: Langevin dynamics equation of motion
dx
is Fext -mγ
+R=0 where R is a Gaussian random force
dt
Node Merging
If two nodes are closer apart than some e, they
are merged into one and merging rules are
applied to update edge probabilities and times
1
P12, t12
P14, t14
3
2
3
1
P12’, t12’
2’
4
5
P12’ = P12 + P14
t12’ = P12xt12 + P14xt14
5
Node Merging
If
two
nodes are closer
apart than
some e, they
Approximately
uniform
distribution
are merged into one and merging rules are
of nodes over the reachable subset of
applied to update edge probabilities and times
conformational space
1
P12, t12
P14, t14
3
2
3
1
P12’, t12’
2’
4
5
P12’ = P12 + P14
t12’ = P12xt12 + P14xt14
5
Application: Computation of MFPT
 Mean First Passage Time: the average time
when a protein first reaches its folded state
 First-Step Analysis yields:
 MPFT(i) = Sj Pij x (tij + MPFT(j))
 MPFT(i) = 0 if i  F
 Assuming first-order kinetics, the probability
that a protein folds at time t is:
Pf (t) = 1 - e-rt
where r is the folding rate

 MFPT =
  P (t)  tdt
0
f
=1/r
Computational Test
 12-residue tryptophan zipper beta hairpin (TZ2)
 Folding@Home used to generate trajectories
(fully atomistic simulation) ranging from 10 to
450 ns
 1750 trajectories (14 reaching folded state)
  22,400-node roadmap
 MFPT ~ 2-9 ms, which is similar to experimental
measurements (from fluorescence and IR)
Conformational Analysis of
Protein Loops
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric
Algorithms for the Conformational Analysis of Long Protein Loops.
J. Comp. Chemistry, 25:956-967, 2004
New idea:
Explore the clash-free subset of the
conformational space of a loop, by building a
tree-shaped roadmap
Kinematic model: f-y angles on the backbone +
ci torsional angles in side-chains
Amylosucrase (AS)
- Only enzyme in its family that
acts on sucrose substrate
-The 17-residue loop (named loop 7)
between Gly433 and Gly449 is
believed to play a pivotal role
Roadmap Construction
 A tree-shaped roadmap is created
from a start conformation qstart
 At each step of the roadmap
construction, a conformation qrand
of the loop is picked at random, and
a new roadmap node is created by
iteratively pulling toward it the
existing node that is closest to qrand
Roadmap Construction
C
Cclosed
Cfree
qrand
qstart
Roadmap Construction
C
Cclosed
Cfree
qrand
qstart
Roadmap Construction
C
Cclosed
Cfree
qrand
qstart
Roadmap Construction
C
Cclosed
Cfree
qrand
qstart
Stops when one can’t get closer to qrand
or a clash is detected
Computational Results
 Surprisingly, loop 7 can’t move much
 Main bottleneck is residue Asp231
Positions of the
C atom of middle
residue (Ser441)
Computational Results
 Surprisingly, loop 7 can’t move much
 Main bottleneck is residue Asp231
Computational Results
 If residue Asp231 is “removed”, then loop 7’s
mobility increases dramatically. The C atom of
Ser441 can be displaced by more than 9Å from
its crystallographic position
Conclusion
 Probabilistic roadmaps are a recent, but promising tool
for exploring conformational space and computing
ensemble properties of molecular pathways
 Current/future research:
• Better sampling strategies able to handle more
complex molecular models (protein-protein binding)
• More work to include time information in roadmaps
• More thorough experimental validation to compare
computed and measured quantitative properties
Download