Chris Sweet, University of Notre Dame

 
Collaborative work with:
◦  Jesus A. Izaguirre (Notre Dame)
◦  Paula Petrone (Stanford)
◦  Vijay S. Pande (Stanford)
◦  Eric Darve (Stanford)
 
Support from Simbios for GPU implementations
 
Support from NSF (JAI) and NIH (VSP)
 
WW folding analysis by Faruck Morcos, Santanu Chatterjee (ND),
and James Sweet (Staffs)
 
Reach long timescales (microsecond to millisecond)
 
Model large conformational changes:
◦  Protein folding
◦  Allosteric regulation
◦  Etc.
 
Use a modest computational budget:
◦  With GPU implementation.
 
Several independent sources of evidence show that
conformational change of biomolecules can be
described within a low frequency motion space
◦  Low frequency normal modes
◦  Elastic network models
◦  Principal component analysis
 
However, linear bases such as these cannot properly
describe large conformational change – these analyses
are appropriate only locally
 
Determine collective or slow variables on-the-fly
◦  Based on coarse-grained normal modes (CNMA)
◦  Cheap update
  Currently 2-level method is O(N^{9/5}) time
  Proposed multi-level method is O(N log N) time
 
Use implicit solvent model
◦  Currently, screened Coulomb potential+
◦  Generalized-Born and Poisson-Boltzmann planned
 
 
 
Our analysis shows that slow variables can be determined dynamically
◦  No need to decide slow variables a priori
Cost of CNMA can be made comparable to a force
computation with a modern force field
◦  Scalable to very large systems (more than 50,000
atoms can be handled on a single core)
Numerical discretizations of the equations of motion allow
the time step to be increased stably
◦  We have used time steps of up to 500 fs to date.
The amortized cost of a step of NML is about 2x to
3x that of a step of MD.
 
The protein or biomolecule can usually be described by
a separable Hamiltonian of the form
\[ H(x, p) = \frac{1}{2} p^T M^{-1} p + U(x). \]
 
If we expand the potential energy about a local minimum, the
Hessian H is a factor in the first non-constant, non-zero
term of the expansion.
 
A harmonic approximation to the original system can
be found by truncating the expansion at this point.
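Written out, the expansion described above takes the following form. This is a sketch that assumes x_0 denotes the local minimum (so the gradient term vanishes there) and that H is the Hessian of U evaluated at x_0:

```latex
\begin{aligned}
U(x) &= U(x_0) + \nabla U(x_0)^T (x - x_0)
        + \frac{1}{2} (x - x_0)^T H (x - x_0) + \cdots \\
     &\approx U(x_0) + \frac{1}{2} (x - x_0)^T H (x - x_0),
\qquad \text{since } \nabla U(x_0) = 0 .
\end{aligned}
```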
We can rewrite the harmonic approximation as a set of
oscillators by diagonalizing the mass weighted Hessian
 
\[ M^{-1/2} H M^{-1/2} Q = Q \Lambda. \]
 
We can project between the mode space and Cartesian
space using the eigenvectors and mass matrix
\[ c = Q^T M^{1/2} (x - x_0), \qquad x = M^{-1/2} Q c + x_0. \]
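The diagonalization and the two projections can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a diagonal mass matrix stored as one mass per degree of freedom and a small random positive semi-definite matrix standing in for a real Hessian; the function names are hypothetical and do not come from ProtoMol.

```python
import numpy as np

def mass_weighted_modes(hessian, masses):
    """Diagonalize M^{-1/2} H M^{-1/2} for a diagonal mass matrix.

    `masses` holds one mass per degree of freedom (the diagonal of M).
    Returns eigenvalues in ascending order and the eigenvector matrix Q.
    """
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    # For diagonal M the mass weighting is an elementwise scaling of H.
    weighted = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigenvalues, q = np.linalg.eigh(weighted)
    return eigenvalues, q

def to_mode_space(x, x0, q, masses):
    """c = Q^T M^{1/2} (x - x0)."""
    return q.T @ (np.sqrt(masses) * (x - x0))

def to_cartesian(c, x0, q, masses):
    """x = M^{-1/2} Q c + x0."""
    return (q @ c) / np.sqrt(masses) + x0

# Toy system (assumption): random symmetric PSD "Hessian", 6 degrees of freedom.
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
hessian = a @ a.T
masses = rng.uniform(1.0, 16.0, size=6)
x0 = rng.normal(size=6)
x = x0 + 0.1 * rng.normal(size=6)

eigenvalues, q = mass_weighted_modes(hessian, masses)
c = to_mode_space(x, x0, q, masses)
x_roundtrip = to_cartesian(c, x0, q, masses)
```

With the full set of eigenvectors the round trip recovers x exactly (up to rounding); keeping only the first m columns of Q gives the low-frequency part of the displacement.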
The system is partitioned at the eigenvector with ordered eigenvalue
\[ \sqrt{|\lambda_m|} < f_c. \]
This gives column eigenvector matrix
\[ Q = [e_1, e_2, \cdots, e_m]. \]
We can now split the forces
\[ \hat{f} = P_f f = M^{1/2} Q Q^T M^{-1/2} f, \]
\[ \bar{f} = P_f^{\perp} f = M^{1/2} \left( I - Q Q^T \right) M^{-1/2} f. \]
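The force splitting can be illustrated as follows. This is a sketch under the assumptions that M is diagonal (one mass per degree of freedom) and that the columns of `q_slow` are orthonormal; the function name and the random test data are hypothetical.

```python
import numpy as np

def split_forces(f, q_slow, masses):
    """Split f into slow and fast parts using the projectors P_f and P_f^perp.

    q_slow: (n, m) matrix whose orthonormal columns are the m low-frequency
    eigenvectors of the mass-weighted Hessian.
    """
    sqrt_m = np.sqrt(masses)
    weighted = f / sqrt_m                                # M^{-1/2} f
    f_slow = sqrt_m * (q_slow @ (q_slow.T @ weighted))   # M^{1/2} Q Q^T M^{-1/2} f
    f_fast = f - f_slow                                  # M^{1/2} (I - Q Q^T) M^{-1/2} f
    return f_slow, f_fast

# Toy data (assumption): random orthonormal "modes", masses, and force.
rng = np.random.default_rng(1)
n, m = 9, 3
q_slow, _ = np.linalg.qr(rng.normal(size=(n, m)))
masses = rng.uniform(1.0, 16.0, size=n)
f = rng.normal(size=n)

f_slow, f_fast = split_forces(f, q_slow, masses)
```

Because P_f is a projector, splitting `f_slow` a second time returns it unchanged, and the slow part of `f_fast` is zero.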
 
 
 
Propagation of the slow d.o.f. is easy using the Langevin
equation
\[ d\hat{x} = \hat{v}\,dt, \]
\[ M\,d\hat{v} = \hat{f}\,dt - \hat{\Gamma} M \hat{v}\,dt + \sqrt{2 k_B T}\,\hat{\Gamma}^{1/2} P_f M^{1/2}\,dW_1(t). \]
How do we propagate the fast d.o.f.?
MTS integrator: Since the ratio of step sizes is very large, we
must ‘over-damp’ to prevent resonances
\[ d\bar{x} = \bar{\Gamma}^{-1} M^{-1} \bar{f}\,dt + \sqrt{2 k_B T}\,\bar{\Gamma}^{-1/2} M^{-1} P_f^{\perp} M^{1/2}\,dW_2(t). \]
 
BUT Euler-Maruyama = Poor sampling AND no dynamics.
 
Minimizer:
\[ d\bar{x} = \eta M^{-1} \bar{f}\,dt. \]
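A minimizer of this form amounts to steepest descent restricted to the fast subspace. The sketch below is an illustration on a toy harmonic system, not the minimizer used in the actual implementation: the step size `eta`, the stopping tolerance, and the test Hessian with prescribed mass-weighted eigenvalues are all assumptions.

```python
import numpy as np

def minimize_fast(x, force, q_slow, masses, eta, tol=1e-10, max_iter=10_000):
    """Repeat x <- x + eta * M^{-1} f_bar until the fast force vanishes."""
    sqrt_m = np.sqrt(masses)
    for _ in range(max_iter):
        f = force(x)
        # Fast part of the force: f minus its projection on the slow modes.
        f_fast = f - sqrt_m * (q_slow @ (q_slow.T @ (f / sqrt_m)))
        if np.linalg.norm(f_fast) < tol:
            break
        x = x + eta * f_fast / masses
    return x

# Toy harmonic system (assumption) with known mass-weighted eigenvalues.
rng = np.random.default_rng(2)
n = 6
q_all, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigenvalues = np.array([0.1, 0.2, 5.0, 10.0, 20.0, 40.0])
masses = rng.uniform(1.0, 16.0, size=n)
sqrt_m = np.sqrt(masses)
weighted = (q_all * eigenvalues) @ q_all.T        # M^{-1/2} H M^{-1/2}
hessian = weighted * np.outer(sqrt_m, sqrt_m)     # H
x0 = rng.normal(size=n)

def force(x):
    return -hessian @ (x - x0)

q_slow = q_all[:, :2]                             # two slowest modes
x_start = x0 + rng.normal(size=n)
# eta below 2 / (largest eigenvalue) keeps steepest descent stable here.
x_min = minimize_fast(x_start, force, q_slow, masses, eta=1.0 / eigenvalues[-1])

c_start = q_all.T @ (sqrt_m * (x_start - x0))     # mode coordinates before
c_min = q_all.T @ (sqrt_m * (x_min - x0))         # mode coordinates after
```

In this toy case the minimizer drives the four fast mode coordinates to zero while leaving the two slow mode coordinates untouched.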
Propagation of first 20 modes only for Alanine Dipeptide, starting at C7
equatorial. 100ns simulation.
Figure: maximum step size (fs) versus mode number, comparing the ratio of
square roots of eigenvalues with the maximum step size obtained.
Results from BPTI model, 57 residues.
Maximum speedups for different size proteins:
AD (22 atoms); WW (33 res.); BPTI (57 res.); CaM
(148 res.); Src Kinase (449 res.)
http://normalmodes.cse.nd.edu
Rate of NML(m,f) where m is the number of modes
and f is rediagonalization frequency. LI MD gives 4.7/ns.
Using 12 modes, NML can compute the isomerization rate with
16 fs time steps. LI over-damps as the step size is increased, and
BBK is unsuitable because γ∆t > 1. LL: Langevin Leapfrog.
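\[ v^{n+1/2} = e^{-\gamma \Delta t/2} v^n + \frac{1 - e^{-\gamma \Delta t/2}}{\gamma} M^{-1} f(x^n) + \sqrt{2 k_B T \gamma}\, M^{-1/2} \sqrt{\frac{1 - e^{-\gamma \Delta t}}{2\gamma}}\, Z^n, \]
\[ x^{n+1} = x^n + \Delta t\, v^{n+1/2}, \]
\[ v^{n+1} = e^{-\gamma \Delta t/2} v^{n+1/2} + \frac{1 - e^{-\gamma \Delta t/2}}{\gamma} M^{-1} f(x^{n+1}) + \sqrt{2 k_B T \gamma}\, M^{-1/2} \sqrt{\frac{1 - e^{-\gamma \Delta t}}{2\gamma}}\, Z^{n+1}. \]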
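One step of a Langevin Leapfrog scheme of this kind can be sketched as below. The sketch assumes a scalar friction γ, a diagonal mass matrix given as one mass per degree of freedom, and independent standard normal vectors for the two noise terms; the function name and its signature are hypothetical.

```python
import numpy as np

def langevin_leapfrog_step(x, v, force, masses, dt, gamma, kT, rng):
    """Advance positions x and velocities v by one step of size dt."""
    half_decay = np.exp(-gamma * dt / 2.0)
    drift = (1.0 - half_decay) / gamma
    # sqrt(2 kT gamma) * M^{-1/2} * sqrt((1 - e^{-gamma dt}) / (2 gamma))
    #   = sqrt(kT (1 - e^{-gamma dt}) / m) for each degree of freedom.
    noise_scale = np.sqrt(kT * (1.0 - np.exp(-gamma * dt)) / masses)

    v_half = (half_decay * v + drift * force(x) / masses
              + noise_scale * rng.standard_normal(x.shape))
    x_new = x + dt * v_half
    v_new = (half_decay * v_half + drift * force(x_new) / masses
             + noise_scale * rng.standard_normal(x.shape))
    return x_new, v_new

# Deterministic check (assumption: zero force and kT = 0), with gamma * dt > 1.
rng = np.random.default_rng(4)
masses = np.array([1.0, 12.0, 16.0])
x = np.zeros(3)
v = np.array([1.0, -2.0, 0.5])
dt, gamma = 0.5, 3.0

def zero_force(pos):
    return np.zeros_like(pos)

x1, v1 = langevin_leapfrog_step(x, v, zero_force, masses, dt, gamma, kT=0.0, rng=rng)
```

In this force-free, zero-temperature limit the velocity simply decays by a factor of exp(-γ∆t) per step, which stays bounded even when γ∆t > 1.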
Illustration of the dimensionality reduction strategy for
the diagonalization. If the vectors in E span the low
frequency space of interest in H, then the diagonalization
of S can produce a low frequency basis set.
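The reduction can be illustrated with a Rayleigh-Ritz projection. The slide text does not define S, so the sketch assumes S = EᵀHE with the columns of E orthonormalized first; the function name and the toy matrix with prescribed eigenvalues are likewise assumptions.

```python
import numpy as np

def coarse_grained_modes(hessian, e):
    """Approximate low-frequency modes of `hessian` within span(E)."""
    e_orth, _ = np.linalg.qr(e)          # orthonormalize the columns of E
    s = e_orth.T @ hessian @ e_orth      # small k x k matrix S (assumed form)
    sigma, w = np.linalg.eigh(s)         # cheap diagonalization of S
    return sigma, e_orth @ w             # values and vectors in the full space

# Toy matrix (assumption) with known eigenvalues and eigenvectors.
rng = np.random.default_rng(5)
n, k = 12, 4
v_true, _ = np.linalg.qr(rng.normal(size=(n, n)))
lam_true = np.linspace(0.5, 50.0, n)
hessian = (v_true * lam_true) @ v_true.T

# E spans the k lowest eigenvectors, mixed by a well-conditioned matrix.
mixing = np.eye(k) + 0.2 * rng.uniform(-1.0, 1.0, size=(k, k))
e = v_true[:, :k] @ mixing

sigma, u = coarse_grained_modes(hessian, e)
```

When E spans the low-frequency space exactly, as in this toy case, the small problem reproduces the k lowest eigenvalues of the full matrix; when it spans that space only approximately, the result is an approximation.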
 
 
We form blocks of contiguous residues and do a full
diagonalization of their Hessian:
\[ \tilde{H}_{ii} Q_i = Q_i D_i \]
Low frequency interactions among blocks are in the first
10 to 15 eigenvectors of Q_i.
Sources of the vectors in E:
1.  Nonbonded interactions between blocks
manifest themselves in block rotations and
translations (and map to null space, first 6 vectors)
2.  Backbone dihedral motions map to first 4 non-zero
eigenvectors
3.  Side-chain dihedral motions map to next non-zero
eigenvectors
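Building E from per-block diagonalizations can be sketched as follows. This is a simplified illustration: it diagonalizes the diagonal blocks of H directly (the modified block Hessian written with a tilde on the slides may differ), takes blocks as contiguous index ranges, and keeps only 3 vectors per block in the toy run rather than 10 to 15. The function name is hypothetical.

```python
import numpy as np

def block_basis(hessian, block_sizes, k):
    """Stack the first k eigenvectors of each diagonal block into E."""
    n = hessian.shape[0]
    columns = []
    start = 0
    for size in block_sizes:
        stop = start + size
        # Full diagonalization of one block; eigenvalues come out ascending.
        _, q_i = np.linalg.eigh(hessian[start:stop, start:stop])
        for j in range(min(k, size)):
            column = np.zeros(n)
            column[start:stop] = q_i[:, j]   # zero outside the block
            columns.append(column)
        start = stop
    return np.column_stack(columns)

# Toy data (assumption): three blocks of 6, 9, and 6 degrees of freedom.
rng = np.random.default_rng(6)
block_sizes = [6, 9, 6]
n = sum(block_sizes)
a = rng.normal(size=(n, n))
hessian = a @ a.T

e = block_basis(hessian, block_sizes, k=3)
s = e.T @ hessian @ e        # reduced matrix, here 9 x 9
```

Because the blocks do not overlap, the columns of E are orthonormal by construction, so the reduced matrix can be diagonalized directly.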
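\[ u_i = \alpha_1^i v_1 + \alpha_2^i v_2 + \cdots + \alpha_{3N}^i v_{3N}, \qquad \sigma_i = u_i^T H u_i. \]
\[ u_i^T H u_j = \alpha_1^i \alpha_1^j v_1^T H v_1 + \alpha_2^i \alpha_2^j v_2^T H v_2 + \cdots + \alpha_{3N}^i \alpha_{3N}^j v_{3N}^T H v_{3N} \]
\[ \phantom{u_i^T H u_j} = \alpha_1^i \alpha_1^j \lambda_1 + \cdots + \alpha_{3N}^i \alpha_{3N}^j \lambda_{3N} = 0, \]
\[ u_i^T u_j = \alpha_1^i \alpha_1^j + \alpha_2^i \alpha_2^j + \cdots + \alpha_{3N}^i \alpha_{3N}^j = 0. \]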
If there are no degenerate eigenvectors, we get an
upper bound:
\[ \alpha_k^i \alpha_k^j = 0 \;\; \forall k \;\Rightarrow\; v_i \in C,\; v_i \notin C^{\perp},\; \exists\, u_l = v_i, \]
then \( f_{\max} \le \sqrt{\sigma_m} \), guaranteeing stability.
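The orthogonality relations and the bounding behaviour of the Rayleigh quotients can be checked numerically on a toy matrix. The sketch assumes the u_i come from diagonalizing EᵀHE for an arbitrary orthonormal E, and it relies on the standard Rayleigh-Ritz property that the i-th coarse-grained value is never smaller than the i-th true eigenvalue; the random matrix stands in for a real Hessian.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 5
a = rng.normal(size=(n, n))
hessian = a @ a.T                       # toy symmetric "Hessian" (assumption)
lam = np.linalg.eigvalsh(hessian)       # true eigenvalues, ascending

e, _ = np.linalg.qr(rng.normal(size=(n, k)))   # arbitrary orthonormal basis E
_, w = np.linalg.eigh(e.T @ hessian @ e)
u = e @ w                               # coarse-grained vectors u_i (columns)

gram_h = u.T @ hessian @ u              # entries u_i^T H u_j
gram = u.T @ u                          # entries u_i^T u_j
sigma = np.diag(gram_h)                 # Rayleigh quotients sigma_i

off_diag_h = gram_h - np.diag(sigma)    # should vanish for i != j
bound_holds = bool(np.all(sigma >= lam[:k] - 1e-9))
```

Here `off_diag_h` is zero to rounding error, `gram` is the identity, and `bound_holds` is true: each σ_i overestimates the corresponding true eigenvalue, which is the safe direction for choosing a stable time step.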
Coarse-grained Rayleigh quotients and true eigenvalues for
BPTI.
  2-level CNMA using b blocks:
◦  Cost of diagonalizing b blocks: O(N^3 / b^2)
◦  Cost of diagonalizing S: O(b^3)
◦  This is minimized when b ∝ N^{3/5}
◦  Total scaling is O(N^{9/5})
  Space use is between O(N) and O(N^{3/2})
 
 
 
 
 
The 2-level coarse-grained diagonalization of the
Hessian has cost O(N^{9/5}).
We can reuse this result to reduce the cost of
diagonalizing the blocks.
Recursive application of this idea log N times
gives a multi-level algorithm with cost of
O(N log N) time and O(N) memory.
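The O(N^{9/5}) figure for the 2-level method follows from balancing the two diagonalization costs, N^3/b^2 for the blocks and b^3 for S. The sketch below checks this numerically, assuming unit constants in front of both terms (the real constants depend on the implementation); the function names are hypothetical.

```python
import math

def two_level_cost(n, b):
    """Model cost: diagonalize b blocks of size n/b, then the b-sized S."""
    return n**3 / b**2 + b**3

def optimal_block_count(n):
    """Solve d/db (n^3 / b^2 + b^3) = 0, i.e. b^5 = (2/3) n^3."""
    return (2.0 * n**3 / 3.0) ** 0.2

n_small, n_large = 1.0e4, 1.0e5
b_small = optimal_block_count(n_small)
b_large = optimal_block_count(n_large)

# Slope of the optimal cost on a log-log plot between the two sizes.
scaling_exponent = math.log10(
    two_level_cost(n_large, b_large) / two_level_cost(n_small, b_small)
)
# Growth of the optimal block count over the same decade in N.
block_exponent = math.log10(b_large / b_small)
```

Under this cost model `block_exponent` comes out as 3/5 and `scaling_exponent` as 9/5, matching b ∝ N^{3/5} and the O(N^{9/5}) total.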
Compute (coarse-grained) low frequency modes
◦  Form block Hessian
◦  Multi-level diagonalization
Propagate dynamics on space of low frequency motions
◦  Compute all-atom forces
◦  Project forces onto subspace
Minimize forces in the fast frequency space
◦  Compute fast forces
◦  Iterate until convergence
Ensign and Pande (2009) simulated the FIP35 WW domain
mutant (33 residues) using GB implicit solvent, Langevin
dynamics, and Folding@Home
◦  10,000 simulations total were run
◦  1,000 sims were at T=300 K and γ=91 ps^{-1}
◦  2 of these simulations folded (out of 33 total at different
viscosity and temperature); folding was defined as
formation of the 3 β-sheets and a Cα RMSD of less
than 3 Å
◦  Sampling for the 1000 sims was around 197
microseconds
We tested NML with 4 real modes, timesteps of 50 and 100 fs,
and rediagonalization every 1 or 2 ps at
temperature 300 K and viscosity γ=91 ps^{-1}
◦  We use CHARMM with screened Coulomb potential.
After analyzing RMSD of plain MD simulations for
folded FIP35 we found it to be higher than for Ensign
and Pande’s AMBER with Generalized Born
simulations.
◦  We redefined the “folded” state to allow for an RMSD
of less than 5 Å, as opposed to 3 Å in their work.
 
NML Simulation Results:
◦  3 NML simulations folded out of
600 (RMSD < 5 Å).
◦  5 NML simulations folded out of
600 (RMSD < 6 Å)
◦  Sampling for the 600 simulations
was around 23 microseconds
 
F@H Simulation Results:
◦  5 F@H simulations folded out
of 1000 (RMSD < 5 Å)
◦  9 F@H simulations folded out
of 1000 (RMSD < 6 Å)
◦  Sampling for the 1000 sims
was around 197 microseconds
Rates are off only by a factor of 5 in this case:
F@H, 19 µs^{-1}
NML, 4 µs^{-1}
Experimental, 13 µs^{-1}
 
For WW domain:
◦  MD 1 fs: 6 ns/day
◦  NML 50 fs: 130 ns/day
◦  NML 100 fs: 209 ns/day
For Calmodulin (open):
◦  Speedups of 80 for timestep of 200 fs
For F1-ATPase:
◦  Speedups of 170 for timestep of 500 fs
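For the WW domain, the speedups implied by the throughput figures follow by simple division. The snippet below only restates that arithmetic, using the ns/day values quoted for MD and NML; the variable names are illustrative.

```python
# Throughput figures quoted for the WW domain (ns of simulation per day).
md_ns_per_day = 6.0
nml_ns_per_day = {"50 fs": 130.0, "100 fs": 209.0}

# Speedup of NML over 1 fs MD at each NML time step.
speedups = {step: rate / md_ns_per_day for step, rate in nml_ns_per_day.items()}

for step, factor in speedups.items():
    print(f"NML {step}: about {factor:.1f}x faster than MD at 1 fs")
```

This gives roughly 22x at 50 fs and 35x at 100 fs for the WW domain, smaller than the speedups quoted for the larger Calmodulin and F1-ATPase systems.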
 
 
 
Coarse-grained normal modes can effectively be used as
collective slow variables
By re-computing these normal modes periodically one
can recover kinetics (to within logarithmic accuracy)
We presented algorithms for:
◦  Computing normal modes in O(N^{9/5}) and O(N log N)
time
◦  Computing dynamics of these normal modes to
model conformational change (folding)
 
Currently in ProtoMol 3.0:
◦  http://protomol.sourceforge.net
Information web page:
◦  http://normalmodes.info
Submission server:
◦  http://lcls.cse.nd.edu/submission
Will be in future OpenMM releases
 
Papers available from
◦  http://www.nd.edu/~izaguirr
 
Improved model
◦  Other solvation models (GB, PBE, ions, semi-explicit)
◦  Include hydrodynamics tensor for implicit solvent, and
friction tensor for fast modes (with Eric Darve,
Stanford)
Better algorithms
◦  Fully adaptive NML, to realize full potential of the
method
◦  Better integrators and minimizer

Hybrid model/algorithms
◦  Use coarse-grained normal modes to find flexible/rigid
regions of the biomolecule on-the-fly
◦  Use these flexibility finders in conjunction with
multibody dynamics or noncolliding trajectories?
Implementation
◦  OpenMM GPU implementations of NML and CNMA
 
Use NML for improved sampling methods
◦  String method based on coarse-grained normal
modes (manuscript submitted)
◦  2-D Parallel and serial tempering based on NML (as
Hamiltonian replica exchange on the number of
modes, besides temperature)
 
Applications
◦  Allosteric changes in src Kinases
◦  Peptide recognition dynamics in T-cell receptors
◦  GPCR regulators