Chris Sweet, University of Notre Dame

Collaborative work with:
◦ Jesus A. Izaguirre (Notre Dame)
◦ Paula Petrone (Stanford)
◦ Vijay S. Pande (Stanford)
◦ Eric Darve (Stanford)
Support from Simbios for GPU implementations
Support from NSF (JAI) and NIH (VSP)
WW folding analysis by Faruck Morcos and Santanu Chatterjee (ND) and James Sweet (Staffs)

Reach long timescales (microsecond to millisecond)
Model large conformational changes:
◦ Protein folding
◦ Allosteric regulation
◦ Etc.
Use a modest computational budget:
◦ With GPU implementation.

Several independent sources of evidence show that conformational change of biomolecules can be described within a low frequency motion space
◦ Low frequency normal modes
◦ Elastic network models
◦ Principal component analysis
However, linear bases such as these cannot properly describe large conformational change; these analyses are appropriate only locally

Determine collective or slow variables on-the-fly
◦ Based on coarse-grained normal modes (CNMA)
◦ Cheap update
Current 2-level method takes $O(N^{9/5})$ time
Proposed multi-level method takes $O(N \log N)$ time
Use implicit solvent model
◦ Currently, screened Coulomb potential+
◦ Generalized-Born and Poisson-Boltzmann planned

Our analysis shows what we can analyze dynamically
◦ No need to decide slow variables a priori
Cost of CNMA can be made comparable to a force computation with a modern force field
◦ Scalable to very large systems (more than 50,000 atoms can be handled on a single core)
Numerical discretizations of the equations of motion can stably increase the time step
◦ We have used time steps of up to 500 fs to date. The amortized cost of a step of NML is about 2x to 3x that of a step of MD.

The protein or biomolecule can usually be described by a separable Hamiltonian of the form
$$H(x, p) = \tfrac{1}{2} p^T M^{-1} p + U(x).$$
If we expand the potential energy about a local minimum, the Hessian $\mathbf{H}$ is a factor in the first non-constant, non-zero term of the expansion.
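As a worked form of the expansion just described, here is a short sketch. It assumes $x_0$ denotes the local minimum, so the gradient term vanishes there, and writes the Hessian of $U$ at $x_0$ as $\mathbf{H}$ to keep it distinct from the Hamiltonian $H(x, p)$:

```latex
U(x) = U(x_0) + \nabla U(x_0)^T (x - x_0)
     + \tfrac{1}{2} (x - x_0)^T \mathbf{H} (x - x_0)
     + O\!\left(\|x - x_0\|^3\right),
\qquad \nabla U(x_0) = 0, \quad \mathbf{H} = \nabla^2 U(x_0).
```

The constant $U(x_0)$ only shifts the energy, and the linear term is zero at a minimum, so the quadratic term is the first one that contributes to the forces.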
A harmonic approximation to the original system can be found by truncating the expansion at this point.
We can rewrite the harmonic approximation as a set of oscillators by diagonalizing the mass-weighted Hessian
$$M^{-1/2} \mathbf{H} M^{-1/2} Q = Q \Lambda.$$
We can project between the mode space and Cartesian space using the eigenvectors and mass matrix
$$c = Q^T M^{1/2} (x - x_0), \qquad x = M^{-1/2} Q c + x_0.$$

The system is partitioned at the eigenvector with ordered eigenvalue
$$\sqrt{|\lambda_m|} < f_c.$$
This gives the column eigenvector matrix
$$Q = [e_1, e_2, \cdots, e_m].$$
We can now split the forces
$$\hat{f} = P_f f = M^{1/2} Q Q^T M^{-1/2} f,$$
$$\bar{f} = P_f^{\perp} f = M^{1/2} \left( I - Q Q^T \right) M^{-1/2} f.$$

Propagation of the slow d.o.f. is easy using the Langevin equation
$$d\hat{x} = \hat{v}\, dt,$$
$$M\, d\hat{v} = \hat{f}\, dt - \hat{\Gamma} M \hat{v}\, dt + \sqrt{2 k_B T}\, \hat{\Gamma}^{1/2} P_f M^{1/2}\, dW_1(t).$$

How to propagate the fast d.o.f.?
MTS integrator: since the ratio of step sizes is very large we must 'over-damp' to prevent resonances
$$d\bar{x} = \bar{\Gamma}^{-1} M^{-1} \bar{f}\, dt + \sqrt{2 k_B T}\, \bar{\Gamma}^{-1/2} M^{-1} P_f^{\perp} M^{1/2}\, dW_2(t).$$
BUT Euler-Maruyama = poor sampling AND no dynamics.
Minimizer:
$$d\bar{x} = \eta M^{-1} \bar{f}\, dt.$$

[Figure: Propagation of the first 20 modes only for alanine dipeptide, starting at C7 equatorial. 100 ns simulation.]

[Figure: Maximum step size (fs) versus mode number, comparing the ratio of square roots of eigenvalues with the maximum step size obtained. Results from BPTI model, 57 residues.]

Maximum speedups for different size proteins: AD (22 atoms); WW (33 res.); BPTI (57 res.); CaM (148 res.); Src Kinase (449 res.)
http://normalmodes.cse.nd.edu

[Figure: Rate of NML(m, f), where m is the number of modes and f is the rediagonalization frequency.]
LI MD gives 4.7/ns.
Using 12 modes, NML can compute the isomerization rate with 16 fs time steps.

LI over-damps as the step size is increased; BBK is unsuitable since $\gamma \Delta t > 1$.
LL: Langevin Leapfrog.
$$v^{n+1/2} = e^{-\gamma \Delta t / 2}\, v^{n} + \frac{1 - e^{-\gamma \Delta t / 2}}{\gamma} M^{-1} f(x^{n}) + \sqrt{2 k_B T \gamma}\; M^{-1/2} \sqrt{\frac{1 - e^{-\gamma \Delta t}}{2 \gamma}}\; Z^{n},$$
$$x^{n+1} = x^{n} + \Delta t\, v^{n+1/2},$$
$$v^{n+1} = e^{-\gamma \Delta t / 2}\, v^{n+1/2} + \frac{1 - e^{-\gamma \Delta t / 2}}{\gamma} M^{-1} f(x^{n+1}) + \sqrt{2 k_B T \gamma}\; M^{-1/2} \sqrt{\frac{1 - e^{-\gamma \Delta t}}{2 \gamma}}\; Z^{n+1}.$$

[Figure: Illustration of the dimensionality reduction strategy for the diagonalization.]
If the vectors in E span the low frequency space of interest in $\mathbf{H}$, then the diagonalization of S can produce a low frequency basis set.

We form blocks of contiguous residues and do a full diagonalization of their Hessian:
$$\tilde{\mathbf{H}}_{ii} Q_i = Q_i D_i$$
Low frequency interactions among blocks are in the first 10 to 15 eigenvectors of $Q_i$.
Sources of vectors in E:
1. Nonbonded interactions between blocks manifest themselves in block rotations and translations (and map to null space, first 6 vectors)
2. Backbone dihedral motions map to first 4 non-zero eigenvectors
3. Side-chain dihedral motions map to next non-zero eigenvectors

$$u_i = \alpha_1^i v_1 + \alpha_2^i v_2 + \cdots + \alpha_{3N}^i v_{3N}, \qquad \sigma_i = u_i^T \mathbf{H} u_i.$$
$$u_i^T \mathbf{H} u_j = \alpha_1^i \alpha_1^j\, v_1^T \mathbf{H} v_1 + \alpha_2^i \alpha_2^j\, v_2^T \mathbf{H} v_2 + \cdots + \alpha_{3N}^i \alpha_{3N}^j\, v_{3N}^T \mathbf{H} v_{3N} = \alpha_1^i \alpha_1^j \lambda_1 + \cdots + \alpha_{3N}^i \alpha_{3N}^j \lambda_{3N} = 0,$$
$$u_i^T u_j = \alpha_1^i \alpha_1^j + \alpha_2^i \alpha_2^j + \cdots + \alpha_{3N}^i \alpha_{3N}^j = 0.$$
If there are no degenerate eigenvectors we get an upper bound:
$$\alpha_k^i \alpha_k^j = 0 \;\; \forall k \;\Rightarrow\; v_i \in C,\; v_i \notin C^{\perp},\; \exists\, u_l = v_i,$$
then
$$f_{\max} \le \sqrt{\sigma_m},$$
guaranteeing stability.

[Figure: Coarse-grained Rayleigh quotients and true eigenvalues for BPTI.]

2-level CNMA using b blocks:
◦ Cost of diagonalizing b blocks: $O\!\left(N^3 / b^2\right)$
◦ Cost of diagonalizing S: $O\!\left(b^3\right)$
◦ This is minimized when $b \propto N^{3/5}$
◦ Total scaling is $O\!\left(N^{9/5}\right)$
Space use is between $O(N)$ and $O\!\left(N^{3/2}\right)$

The 2-level coarse-grained diagonalization of the Hessian has cost $O\!\left(N^{9/5}\right)$
We can reuse this result to reduce the cost of diagonalizing the blocks.
Recursive application of this idea log N times gives a multi-level algorithm with the following time and memory costs:
◦ Time: $O(N \log N)$
◦ Memory: $O(N)$

Compute (coarse-grained) low frequency modes
◦ Form block Hessian
◦ Multi-level diagonalization
Propagate dynamics on space of low frequency motions
◦ Compute all-atom forces
◦ Project forces onto subspace
Minimize forces in the fast frequency space
◦ Compute fast forces
◦ Iterate until convergence

Ensign and Pande (2009) simulated the FIP35 WW domain mutant (33 residues) using GB implicit solvent, Langevin dynamics, and Folding@Home
◦ 10,000 simulations total were run
◦ 1,000 sims were at T = 300 K and γ = 91 ps⁻¹
◦ 2 of these simulations folded (out of 33 total at different viscosity and temperature); folding was defined as formation of the 3 β-sheets and a Cα RMSD of less than 3 Å
◦ Sampling for the 1,000 sims was around 197 microseconds

We tested NML with 4 real modes; timesteps of 50 and 100 fs; and rediagonalization every 1 or 2 ps at temperature 300 K and viscosity γ = 91 ps⁻¹
◦ We use CHARMM with screened Coulomb potential. After analyzing the RMSD of plain MD simulations for folded FIP35, we found it to be higher than for Ensign and Pande's AMBER with Generalized Born simulations.
◦ We redefined the "folded" state to allow for an RMSD of less than 5 Å, as opposed to 3 Å in their work.

NML Simulation Results:
◦ 3 NML simulations folded out of 600 (RMSD < 5 Å).
◦ 5 NML simulations folded out of 600 (RMSD < 6 Å)
◦ Sampling for the 600 simulations was around 23 microseconds
F@H Simulation Results:
◦ 5 F@H simulations folded out of 1,000 (RMSD < 5 Å)
◦ 9 F@H simulations folded out of 1,000 (RMSD < 6 Å)
◦ Sampling for the 1,000 sims was around 197 microseconds
Rates are off only by a factor of 5 in this case:
◦ F@H: 19 µs⁻¹
◦ NML: 4 µs⁻¹
◦ Experimental: 13 µs⁻¹

For WW domain:
◦ MD 1 fs: 6 ns/day
◦ NML 50 fs: 130 ns/day
◦ NML 100 fs: 209 ns/day
For Calmodulin (open):
◦ Speedups of 80 for a timestep of 200 fs
For F1-ATPase:
◦ Speedups of 170 for a timestep of 500 fs

Coarse-grained normal modes can effectively be used as collective slow variables
By re-computing these normal modes periodically one can recover kinetics (to within logarithmic accuracy)
We presented algorithms for:
◦ Computing normal modes in $O(N^{9/5})$ and $O(N \log N)$ time
◦ Computing dynamics of these normal modes to model conformational change (folding)

Currently in ProtoMol 3.0:
◦ http://protomol.sourceforge.net
Information web page:
◦ http://normalmodes.info
Submission server:
◦ http://lcls.cse.nd.edu/submission
Will be in future OpenMM releases
Papers available from
◦ http://www.nd.edu/~izaguirr

Improved model
◦ Other solvation models (GB, PBE, Ions, Semi-explicit)
◦ Include hydrodynamics tensor for implicit solvent, and friction tensor for fast modes (with Eric Darve, Stanford)
Better algorithms
◦ Fully adaptive NML, to realize full potential of the method
◦ Better integrators and minimizer
Hybrid model/algorithms
◦ Use coarse-grained normal modes to find flexible/rigid regions of the biomolecule on-the-fly
◦ Use these flexibility finders in conjunction with multibody dynamics or noncolliding trajectories?
Implementation
◦ OpenMM GPU implementations of NML and CNMA
Use NML for improved sampling methods
◦ String method based on coarse-grained normal modes (manuscript submitted)
◦ 2-D parallel and serial tempering based on NML (as Hamiltonian replica exchange on the number of modes, besides temperature)
Applications
◦ Allosteric changes in Src kinases
◦ Peptide recognition dynamics in T-cell receptors
◦ GPCR regulators
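
The mode projection and force split from the earlier slides (diagonalize the mass-weighted Hessian, keep the lowest modes, split the force into slow and fast parts) can be illustrated with a minimal NumPy sketch. This is not the ProtoMol or OpenMM implementation: the function names are hypothetical, the Hessian is a random symmetric matrix standing in for one from a real force field, and masses are assumed to be given per degree of freedom.

```python
import numpy as np


def low_frequency_modes(hessian, masses, num_modes):
    """Lowest eigenpairs of the mass-weighted Hessian M^{-1/2} H M^{-1/2}."""
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    mw_hessian = inv_sqrt_m[:, None] * hessian * inv_sqrt_m[None, :]
    eigvals, eigvecs = np.linalg.eigh(mw_hessian)  # eigenvalues in ascending order
    return eigvals[:num_modes], eigvecs[:, :num_modes]


def split_forces(force, masses, Q):
    """Split a force into slow (mode-space) and fast (complement) parts."""
    sqrt_m = np.sqrt(masses)
    mass_weighted = force / sqrt_m               # M^{-1/2} f
    slow = sqrt_m * (Q @ (Q.T @ mass_weighted))  # M^{1/2} Q Q^T M^{-1/2} f
    fast = force - slow                          # M^{1/2} (I - Q Q^T) M^{-1/2} f
    return slow, fast


def to_mode_space(x, x0, masses, Q):
    """c = Q^T M^{1/2} (x - x0)."""
    return Q.T @ (np.sqrt(masses) * (x - x0))


def to_cartesian(c, x0, masses, Q):
    """x = M^{-1/2} Q c + x0."""
    return (Q @ c) / np.sqrt(masses) + x0


# Toy system (assumption): 6 degrees of freedom, two "atoms" of mass 1 and 12.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
hessian = A @ A.T                                # symmetric positive semidefinite stand-in
masses = np.array([1.0, 1.0, 1.0, 12.0, 12.0, 12.0])

eigvals, Q = low_frequency_modes(hessian, masses, num_modes=2)

force = rng.standard_normal(n)
slow, fast = split_forces(force, masses, Q)

x0 = np.zeros(n)
x = rng.standard_normal(n)
c = to_mode_space(x, x0, masses, Q)              # coordinates in the 2 slow modes
x_slow = to_cartesian(c, x0, masses, Q)          # back in Cartesian space
```

The slow and fast parts sum to the original force, and projecting the slow part again leaves it unchanged, which is the property the propagator and the minimizer rely on when they act on the two subspaces separately.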