The Probabilistic Roadmap Approach to Study Molecular Motion Jean-Claude Latombe

Kwan Im Thong Hood Cho Temple Visiting Professor, NUS
Kumagai Professor, Computer Science, Stanford
Molecular motion is an essential
process of life
CspA
Understanding molecular motion
could help cure many diseases
Mad cow disease is
caused by misfolding
Drug molecules act by
binding to proteins
As few experimental tools are
available, computational tools are
critical
Computer simulation:
- Monte Carlo simulation
- Molecular Dynamics
NMR spectrometer
Stanford BioX cluster
But MD and MC simulation have
two major drawbacks
1) Each simulation run
yields a single pathway,
while molecules tend to
move along many
different pathways
Intermediate
states
→ Interest in ensemble properties
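Example of Ensemble Property:
Probability of Folding pfold
Measure kinetic distance to folded state
[Figure: from a given conformation, the protein reaches the folded state first with probability pfold and the unfolded state first with probability 1 - pfold]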
Other Examples of Ensemble
Properties
 Order of formation of secondary
structure elements
 Average time for a ligand to escape a
binding site
 Folding rate of a protein
 Key intermediates along folding
pathways
 Etc ...
But MD and MC simulation have two major drawbacks
1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways
→ Interest in ensemble properties
2) Each simulation run tends to waste much time in local minima
Roadmap-Based Representation
 Network of conformations connected by local motion
pathways
 Compact representation of huge number of motion
pathways
 Coarse resolution relative to MC and MD simulation
 Efficient algorithms for analyzing multiple pathways
Roadmaps for Robot Motion Planning
free space
[Kavraki, Švestka, Latombe, Overmars, 95]
Initial Work: Application of
Roadmaps to Ligand Binding
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning
Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on
Intelligent Syst. for Molecular Biology (ISMB), pp. 252-261, 1999
 The ligand is modeled as a
flexible molecule, but the
protein is assumed rigid
 A conformation of the ligand is
defined by the position and
orientation of a group of
3 atoms relative to the protein
and by the torsional angles of
the ligand
Roadmap Construction
(Node Generation)
 Conformations of the ligand are sampled at random
around the protein
- The energy E at each sampled conformation is computed:
$E = E_{\text{interaction}} + E_{\text{internal}}$
$E_{\text{interaction}}$ = electrostatic + van der Waals potential
$E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)
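- A sampled conformation is retained as a node with probability:
$$P = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}$$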
 Denser distribution of nodes in low-energy regions of
conformational space
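The retention rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sample_conformation` and `energy_of` are hypothetical callables standing in for the ligand sampler and the energy function, which the slides do not specify in code form.

```python
import random


def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation as a roadmap node."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)


def sample_nodes(sample_conformation, energy_of, e_min, e_max, n_samples, rng=None):
    """Draw n_samples conformations and keep each with the probability above.

    sample_conformation(rng) and energy_of(q) are hypothetical placeholders
    for the real sampler and energy model.
    """
    rng = rng or random.Random()
    nodes = []
    for _ in range(n_samples):
        q = sample_conformation(rng)
        if rng.random() < retention_probability(energy_of(q), e_min, e_max):
            nodes.append(q)
    return nodes
```

Because low-energy samples are kept with higher probability, the retained nodes end up denser in low-energy regions, which is the effect the slide describes.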
Roadmap Construction
(Edge Generation)
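[Figure: straight edge from q to q’, discretized into intermediate conformations qi, qi+1 spaced ε apart; the energy E along the edge is compared with Emax]
- Each node is connected to each of its closest neighbors by a straight edge
- Each edge is discretized at some resolution ε (= 1Å)
- If any E(qi) > Emax, then the edge is rejected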
- If all E(qi) ≤ Emax, then the edge is retained and is assigned two weights w(q→q’) and w(q’→q), where:
$$w(q \to q') = \sum_i -\ln P[q_i \to q_{i+1}]$$
(heuristic measure of energetic difficulty of moving from q to q’)
$$P[q_i \to q_{i+1}] = \frac{e^{-(E_{i+1}-E_i)/kT}}{e^{-(E_{i+1}-E_i)/kT} + e^{-(E_{i-1}-E_i)/kT}}$$
(probability that the ligand moves from qi to qi+1 when it is constrained to move along the edge)
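A minimal Python sketch of the edge test and the two directed weights follows. It takes the energies at the discretized points of one edge as a list. One detail is an assumption of this sketch, not something the slides state: the sum runs only over interior points, because the step probability needs both a forward and a backward neighbor along the edge.

```python
import math


def step_probability(e_prev, e_cur, e_next, kT):
    """P[q_i -> q_{i+1}] for a ligand constrained to move along the edge."""
    forward = math.exp(-(e_next - e_cur) / kT)
    backward = math.exp(-(e_prev - e_cur) / kT)
    return forward / (forward + backward)


def edge_weights(energies, e_max, kT):
    """Return (w(q->q'), w(q'->q)), or None if the edge is rejected.

    energies lists E(q_i) at the discretized points from q to q'.
    Assumption: only interior points contribute to the sum.
    """
    if any(e > e_max for e in energies):
        return None  # some intermediate conformation exceeds Emax

    def weight(es):
        return sum(
            -math.log(step_probability(es[i - 1], es[i], es[i + 1], kT))
            for i in range(1, len(es) - 1)
        )

    return weight(energies), weight(energies[::-1])
```

On an edge that climbs in energy from q to q’, the forward weight comes out larger than the backward weight, matching the reading of the weight as energetic difficulty. kT must be given in the same units as the energies.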
Querying the Roadmap
- For a given goal node qg (e.g., binding conformation), Dijkstra’s single-source algorithm computes the lowest-weight paths from qg to each node (in either direction) in O(N log N) time, where N = number of nodes
- Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering qg and of all paths leaving qg (~ binding and dissociation rates Kon and Koff)
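The query step can be sketched with the standard-library `heapq` module. The graph layout (a dict mapping each node to a list of `(neighbor, weight)` pairs) and the function names are assumptions made for this illustration. Edge weights are sums of -ln P terms, so they are non-negative, which Dijkstra's algorithm requires.

```python
import heapq


def dijkstra(adjacency, source):
    """Lowest path weight from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse_graph(adjacency):
    """Flip every directed edge so paths *into* a node can be searched from it."""
    rev = {}
    for u, edges in adjacency.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def average_path_weights(adjacency, goal):
    """Return (mean weight of paths entering goal, mean weight of paths leaving goal)."""
    leaving = dijkstra(adjacency, goal)
    entering = dijkstra(reverse_graph(adjacency), goal)

    def mean(dist):
        vals = [w for node, w in dist.items() if node != goal]
        return sum(vals) / len(vals) if vals else float("nan")

    return mean(entering), mean(leaving)
```

Running the search once on the graph and once on its reverse covers both directions, since the two weights on an edge generally differ.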
Protein: Lactate dehydrogenase
Ligand: Oxamate (7 degrees of freedom)
Experiments on 3 Complexes
1) PDB ID: 1ldm
Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 dofs)
2) PDB ID: 4ts1
Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423
atoms, 319 residues)
Ligand: L- leucyl-hydroxylamine (13 atoms, 9 dofs)
3) PDB ID: 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 dofs)
Computation of Potential Binding
Conformations
1) Sample many (several thousand) ligand conformations at random around the protein active site
2) Repeat several times:
 Select lowest-energy
conformations that are
close to protein surface
 Resample around them
3) Retain k (~10)
lowest-energy
conformations whose
centers of mass are at
least 5Å apart
lactate dehydrogenase
Results for 1ldm


Some potential binding sites have slightly lower energy than the active site
→ Energy is not a discriminating factor for recognizing the active site
Average path weights (energetic difficulty) to enter and leave the binding site are significantly greater for the active site
→ Indicates that the active site is surrounded by an energy barrier that “traps” the ligand
Application of Roadmaps
to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map
Protein Folding Landscapes and Analyze Folding Kinetics of Known
Native Structures. J. Comp. Biology, 10(2):239-255, 2003
 Known native state
 Degrees of freedom: φ-ψ angles
 Energy: van der Waals, hydrogen bonds,
hydrophobic effect
 New idea: Sampling strategy
Sampling Strategy
(Node Generation)
- High dimensionality → non-uniform sampling
- Conformations are sampled using a Gaussian distribution around the native state
- Conformations are sorted into bins by number of native contacts (pairs of Cα atoms that are close together in the native structure)
- Sampling ends when all bins have a minimum number of conformations
→ “good” coverage of conformational space
Application: Order of Formation of
Secondary Structure Elements
 The lowest-weight path is extracted
from each denatured conformation to
the folded one
 The order of formation of SSE’s is
computed along each path
 The formation order that appears the
most often over all paths is considered
the SSE formation order of the protein
Order of Formation of Secondary
Structures along a Path
1) The contact matrix showing the time
step when each native contact appears
is built
[Figure: contact matrix for protein CI2 (1 α + 4 β), indexed by residue number]
The native contact between residues 5 and 60 appears at step 216
2) The time step at which a structure
appears is approximated as the average
of the appearance time steps of its
contacts
α forms at time step 122 (II)
β3 and β4 come together at 187 (V)
β2 and β3 come together at 210 (IV)
β1 and β4 come together at 214 (III)
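The two steps above, followed by the vote over all paths, can be sketched as below. The data layout is an assumption of this sketch: `contact_steps` maps a residue pair to the step at which that native contact appears along one path, and `structures` maps a structure name to the contacts that define it.

```python
from collections import Counter
from statistics import mean


def formation_order(contact_steps, structures):
    """Order structures by the average appearance step of their contacts."""
    times = {
        name: mean(contact_steps[c] for c in contacts)
        for name, contacts in structures.items()
    }
    return tuple(sorted(times, key=times.get))


def most_frequent_order(orders):
    """Pick the formation order that occurs most often over all paths."""
    return Counter(orders).most_common(1)[0][0]


# Toy data, hypothetical except for the (5, 60) contact at step 216.
contact_steps = {(1, 2): 100, (3, 4): 144, (5, 60): 216}
structures = {"alpha": [(1, 2), (3, 4)], "beta": [(5, 60)]}
order = formation_order(contact_steps, structures)  # alpha first, then beta
```

Each lowest-weight path yields one such tuple, and `most_frequent_order` applied to the list of tuples gives the order reported for the protein.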
Comparison with Experimental Data
SSE’s: 1α+4β, 3α, 1α+4β, 1α+5β
roadmap size: 5126, 70k; 5471, 104k; 7975, 104k; 8357, 119k
Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma.
Stochastic Roadmap Simulation: An Efficient Representation and Algorithm
for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003
New Idea: Capture the stochastic nature of molecular
motion by assigning probabilities to edges
[Figure: roadmap edge from node vi to node vj, labeled with transition probability Pij]
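Edge Probabilities
Follow Metropolis criteria:
$$P_{ij} = \begin{cases} \dfrac{1}{N_i}\exp(-\Delta E_{ij}/kT) & \text{if } \Delta E_{ij} > 0 \\ \dfrac{1}{N_i} & \text{otherwise} \end{cases}$$
where $\Delta E_{ij} = E_j - E_i$ and $N_i$ is the number of neighbors of node $v_i$.
Self-transition probability:
$$P_{ii} = 1 - \sum_{j \ne i} P_{ij}$$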
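As an illustration, the Metropolis rule and the self-transition probability can be computed per node as follows. The dict-based layout (`energies` and `neighbors`) is an assumption of this sketch rather than the representation used by the authors.

```python
import math


def transition_probabilities(energies, neighbors, kT):
    """Build the transition probabilities P[i][j] of a stochastic roadmap.

    energies:  {node: energy}
    neighbors: {node: list of adjacent nodes}
    """
    P = {}
    for i, adjacent in neighbors.items():
        row = {}
        for j in adjacent:
            delta_e = energies[j] - energies[i]
            accept = math.exp(-delta_e / kT) if delta_e > 0 else 1.0
            row[j] = accept / len(adjacent)
        # Whatever probability is not spent on moves stays at node i.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```

Every row sums to 1, so the result can be used directly as the transition matrix of a Markov chain.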
Stochastic Roadmap Simulation
Stochastic roadmap simulation (SRS) and Monte Carlo simulation converge to the Boltzmann distribution, i.e., the fraction of steps that SRS spends at nodes in a region V converges toward
$$\frac{1}{Z}\int_V e^{-E/kT}\,dV$$
(where Z is the partition function) when the number of nodes grows (and they are uniformly distributed)
Roadmap as Markov Chain
 Transition probability Pij depends only on i and j
First-Step Analysis
U: Unfolded state
F: Folded state
[Figure: node i with self-transition probability Pii and edges to neighbors j, k, l, m with probabilities Pij, Pik, Pil, Pim]
Let fi = pfold(i)
After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
In the first-step equation for fi, every neighbor in F contributes f = 1 (two such terms are marked “= 1” in the figure) and every neighbor in U contributes f = 0.
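A small sketch of first-step analysis in Python is given below. It writes one equation per node outside U and F and solves the system directly. For brevity it uses dense Gaussian elimination, which is a stand-in chosen for this example; a roadmap with many nodes would call for a sparse solver. The layout of `P` as a dict of dicts with one row per node is also an assumption.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[k][n] / M[k][k] for k in range(n)]


def pfold(P, folded, unfolded):
    """Return {node: pfold} given transition probabilities P[i][j]."""
    free = [i for i in P if i not in folded and i not in unfolded]
    index = {node: k for k, node in enumerate(free)}
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for i in free:
        r = index[i]
        A[r][r] += 1.0  # f_i on the left-hand side
        for j, p in P[i].items():
            if j in folded:
                b[r] += p  # f_j = 1
            elif j in index:
                A[r][index[j]] -= p  # unknown f_j (includes the P_ii term)
            # nodes in the unfolded set contribute p * 0
    x = solve_linear(A, b)
    result = {node: 1.0 for node in folded}
    result.update({node: 0.0 for node in unfolded})
    result.update({node: x[index[node]] for node in free})
    return result
```

One solve returns pfold for every node at once, which is the contrast with simulation drawn on the slides.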
Number of Self-Avoiding Walks
on a 2D Grid
1, 2, 12, 184, 8512, 1262816,
575780564, 789360053252,
3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236
> 10^28
http://mathworld.wolfram.com/Self-AvoidingWalk.html
In contrast …
Computing pfold with MC simulation requires:
For every conformation q of interest
 Perform many MC simulation runs from q
 Count number of times F is attained first
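For comparison, the simulation-based estimate described above can be sketched as a random walk over transition probabilities. In this sketch the walk runs over a discrete set of states `P[i][j]`, which is a simplification assumed for illustration. An actual MC simulation would propose moves in the continuous conformation space.

```python
import random


def mc_pfold(P, start, folded, unfolded, n_runs, rng=None):
    """Estimate pfold(start) as the fraction of runs that reach F before U."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_runs):
        node = start
        while node not in folded and node not in unfolded:
            targets = list(P[node])
            weights = [P[node][t] for t in targets]
            node = rng.choices(targets, weights=weights)[0]
        if node in folded:
            hits += 1
    return hits / n_runs
```

The estimate covers a single starting conformation and its error shrinks only as the square root of `n_runs`, so the whole procedure must be repeated for every conformation of interest.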
Computational Tests
• 1ROP (repressor of primer)
• 2 α helices
• 6 DOF
• 1HDD (Engrailed homeodomain)
• 3 α helices
• 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
pfold for β hairpin
Immunoglobulin binding protein
(Protein G)
Last 16 amino acids
Cα based representation
Go model energy function
42 DOFs
[Zhou and Karplus, `99]
Correlation with MC Approach
1ROP
Computation Times (β hairpin)
Method                         Conformations   Computer time   Energy computations
Monte Carlo (30 simulations)   1               ~10 hours       over 10^7
Roadmap                        2000            23 seconds      ~50,000
~6 orders of magnitude speedup!
Using Path Sampling to Construct
Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build
Better Markovian State Models: Predicting the Folding Rate and
Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics,
121(1):415-425, 2004
New idea:
Paths computed with Molecular Dynamics
simulation techniques are used to create the
nodes of the roadmap
 More pertinent/better distributed nodes
 Edges are labeled with the time needed to
traverse them
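Sampling Nodes from Computed Paths (Path Shooting)
[Figure: paths computed between the unfolded set U and the folded set F; nodes are sampled along each path at intervals of about dt, and the edge from node i to node j is labeled with a probability pij and a traversal time tij]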
Node Merging
- If two nodes are closer than some ε, they are merged into one → roadmap
- Rules are applied to update edge probabilities and times
[Figure: nodes 2 and 4, both reached from node 1, are merged into a single node 2’]
P12’ = P12 + P14
t12’ = P12 × t12 + P14 × t14
Application: Computation of MFPT
- Mean First Passage Time (MFPT): the average time when a protein first reaches its folded state
- First-Step Analysis yields:
$$\text{MFPT}(i) = \sum_j P_{ij}\,\bigl(t_{ij} + \text{MFPT}(j)\bigr), \qquad \text{MFPT}(i) = 0 \text{ if } i \in F$$
- Assuming first-order kinetics, the probability that a protein has folded by time t is:
$$P_f(t) = 1 - e^{-rt}$$
where r is the folding rate
$$\text{MFPT} = \int_0^\infty \frac{dP_f}{dt}\, t\, dt = \int_0^\infty r e^{-rt}\, t\, dt = \frac{1}{r}$$
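The first-step equations for MFPT form a linear system, and one simple way to solve it is by repeated substitution, sketched below. The iterative solver, the dict layout of `P` and `T`, and the tolerance values are choices made for this example and are not taken from the cited paper.

```python
def mean_first_passage_times(P, T, folded, tol=1e-12, max_iters=100000):
    """Solve MFPT(i) = sum_j P[i][j] * (T[i][j] + MFPT(j)), with MFPT = 0 in F.

    P[i][j] is the transition probability and T[i][j] the traversal time of
    edge (i, j). Converges when F is reachable from every node.
    """
    mfpt = {i: 0.0 for i in P}
    for _ in range(max_iters):
        change = 0.0
        for i in P:
            if i in folded:
                continue
            new = sum(p * (T[i][j] + mfpt.get(j, 0.0)) for j, p in P[i].items())
            change = max(change, abs(new - mfpt[i]))
            mfpt[i] = new
        if change < tol:
            return mfpt
    raise RuntimeError("did not converge; is F reachable from every node?")


def folding_rate(mfpt_value):
    """Under first-order kinetics, MFPT = 1/r, so r = 1/MFPT."""
    return 1.0 / mfpt_value
```

For a node that stays in place with probability 0.5 (taking 1 time unit) and reaches F with probability 0.5 (taking 2 time units), the equation m = 0.5(1 + m) + 0.5(2) gives m = 3.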
Computational Test
 12-residue tryptophan zipper beta hairpin (TZ2)
 Folding@Home used to generate trajectories
(fully atomistic simulation) ranging from 10 to
450 ns
 1750 trajectories (14 reaching folded state)
  22,400-node roadmap
- MFPT ~ 2-9 μs, which is similar to experimental measurements (from fluorescence and IR)
Conformational Analysis of
Protein Loops
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric
Algorithms for the Conformational Analysis of Long Protein Loops.
J. Comp. Chemistry, 25:956-967, 2004
New idea:
Explore the clash-free subset of the
conformational space of a loop, by building a
tree-shaped roadmap
Kinematic model: φ-ψ angles on the backbone + χi torsional angles in side-chains
Amylosucrase (AS)
- Only enzyme in its family that
acts on sucrose substrate
-The 17-residue loop (named loop 7)
between Gly433 and Gly449 is
believed to play a pivotal role
Roadmap Construction
 A tree-shaped roadmap is created
from a start conformation qstart
 At each step of the roadmap
construction, a conformation qrand
of the loop is picked at random, and
a new roadmap node is created by
iteratively pulling toward it the
existing node that is closest to qrand
[Figure: conformational space C with its subsets Cclosed and Cfree; the tree grows from qstart toward qrand]
Stops when one can’t get closer to qrand
or a clash is detected
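The tree growth described above resembles a rapidly-exploring random tree, and a toy version is sketched below. It treats a conformation as a plain tuple of coordinates and folds all constraints into one hypothetical predicate `is_valid`. Loop closure is a significant part of the real problem and is not modeled here. The fixed step size is likewise an assumption.

```python
import math
import random


def grow_tree(q_start, sample, is_valid, step, n_iters, rng=None):
    """Grow a tree-shaped roadmap from q_start.

    sample(rng) returns a random conformation q_rand; is_valid(q) is a
    hypothetical stand-in for the clash (and closure) test.
    """
    rng = rng or random.Random()
    nodes = [tuple(q_start)]
    parent = {0: None}
    for _ in range(n_iters):
        q_rand = tuple(sample(rng))
        near = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q_rand))
        q = nodes[near]
        advanced = False
        while True:
            d = math.dist(q, q_rand)
            if d <= step:
                q_next = q_rand  # last step lands exactly on q_rand
            else:
                q_next = tuple(a + (step / d) * (b - a) for a, b in zip(q, q_rand))
            if q_next == q or not is_valid(q_next):
                break  # cannot get closer, or a clash is detected
            q, advanced = q_next, True
            if d <= step:
                break
        if advanced:
            parent[len(nodes)] = near
            nodes.append(q)
    return nodes, parent
```

Only the final conformation of each pull is stored as a new node, with the nearest existing node recorded as its parent, so the roadmap stays a tree.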
Computational Results
 Surprisingly, loop 7 can’t move much
 Main bottleneck is residue Asp231
Positions of the Cα atom of the middle residue (Ser441)
Computational Results
- If residue Asp231 is “removed”, then loop 7’s mobility increases dramatically. The Cα atom of Ser441 can be displaced by more than 9Å from its crystallographic position
Conclusion
 Probabilistic roadmaps are a recent, but promising tool
for exploring conformational spaces and computing
ensemble properties of molecular pathways
 Current/future research:
• Better sampling strategies able to handle more
complex molecular models (protein-protein binding)
• More work to include time information in roadmaps
• More thorough experimental validation to compare
computed and measured quantitative properties