String Method with Collective Variables from Normal Modes: Application to Alanine Dipeptide

Santanu Chatterjee, Christopher R. Sweet, and Jesús A. Izaguirre
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana

Abstract

The String Method is commonly used to find minimum free energy paths (MFEPs) between metastable states of a molecule. It requires choosing a set of reaction coordinates, or collective variables, along which the transition takes place, and this choice is non-trivial. In this paper we propose the use of low frequency normal modes as the reaction coordinates. This choice is based on the observation that normal mode analysis can provide the direction of the low frequency motion of interest. We have successfully applied this method to study conformational transitions of alanine dipeptide, both in vacuum and with implicit solvent. We also show that choosing normal modes simplifies the String Method itself, because the projections are linear.

Keywords: String Method, Normal Modes, Alanine Dipeptide

Contents

I. Introduction
II. Methods
   A. Collective variables from Normal Mode Analysis
   B. Dihedral Common Space
   C. Dragging Algorithm
   D. Some results
   E. String Method
      1. On-the-fly String Method
   F. Derivation of mean force
      1. Reparameterization
      2. Estimation of mean force
   G. Reparameterization step
   H. Results
References

I. INTRODUCTION

The minimum free energy path (MFEP) is important for studying conformational transitions in biomolecules. The on-the-fly String Method developed by Maragliano et al. [5] can be applied to find the MFEP. However, the applicability of the String Method depends strongly on a correct definition of the collective variables describing the conformational transition, which is generally not known in advance. Defining collective variables that capture the conformational transition therefore remains the main challenge.
In this paper, we present a method for semi-automatically determining a set of collective variables that can be used in the String Method. The collective variables are generated with Normal Mode Analysis (NMA), which involves a quadratic approximation of the potential energy surface. It has been shown earlier that large conformational transitions in biomolecules can be described by a set of low frequency normal modes [1, 2, 3]. Sweet et al. [4] have developed a coarse-grained propagator that integrates the equations of motion along a chosen set of relevant low frequency modes while minimizing the potential energy along the remaining degrees of freedom. These observations raise an interesting question: can one describe the collective variables for a conformational transition using normal modes? In this paper, we use a set of low frequency normal modes as collective variables in the String Method and apply the method to study conformational transitions in alanine dipeptide.

II. METHODS

First, we describe Normal Mode Analysis (NMA) and its use in defining the collective variables. We then present the theoretical basis for the String Method based on NMA. The numerical approximations required to build a practical method are discussed next, followed by our String Method algorithm.

A. Collective variables from Normal Mode Analysis

MD systems of interest for modeling biomolecules can be described by the separable Hamiltonian

$$H(x, p) = \frac{1}{2} p^T M^{-1} p + U(x), \tag{1}$$

where $p$ are the momenta, $x$ are the positions (such that $p, x \in \mathbb{R}^{3N}$, with $N$ the total number of atoms), $M$ is the mass matrix, and $U(x)$ is the potential energy. We can expand the potential energy around a conformation $x_0$ as

$$U(x_0 + \delta x) = U(x_0) + \nabla_x U^T \delta x + \frac{1}{2!}\, \delta x^T \mathbf{H}\, \delta x + \ldots, \tag{2}$$

where $\mathbf{H}$ is the system Hessian at $x_0$ and $\delta x = x - x_0$. If Eq. (2) is truncated after the Hessian term, we obtain a harmonic approximation of the original system.
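As a numerical illustration of Eq. (2), the following sketch compares a toy potential with its harmonic approximation around a minimum. The potential `toy_potential`, the finite-difference helpers, and the step sizes are assumptions chosen purely for illustration; they are not part of the method described in this paper.

```python
import numpy as np

def toy_potential(x):
    """Hypothetical 2-D potential with a minimum at (1, 0); illustration only."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.5 * x[1] ** 2

def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian built from numerical gradients."""
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = h
        g_plus = numerical_gradient(f, x + step)
        g_minus = numerical_gradient(f, x - step)
        hess[:, i] = (g_plus - g_minus) / (2.0 * h)
    return 0.5 * (hess + hess.T)  # symmetrize away finite-difference noise

def harmonic_energy(f, x0, dx):
    """Right-hand side of Eq. (2), truncated after the Hessian term."""
    g = numerical_gradient(f, x0)
    hess = numerical_hessian(f, x0)
    return f(x0) + g @ dx + 0.5 * dx @ hess @ dx

x0 = np.array([1.0, 0.0])      # expansion point (a minimum of the toy potential)
dx = np.array([0.01, -0.02])   # small displacement
print("exact   :", toy_potential(x0 + dx))
print("harmonic:", harmonic_energy(toy_potential, x0, dx))
```

For small displacements the two energies agree up to third-order terms in $\delta x$, which is the sense in which the truncated expansion is "harmonic".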
To obtain eigenvectors and eigenvalues ordered by frequency (the square root of the eigenvalue), we mass-weight the Hessian so that the harmonic system can be rewritten as a set of decoupled oscillators. Diagonalizing the mass-weighted Hessian gives

$$M^{-1/2}\, \mathbf{H}\, M^{-1/2}\, Q = Q \Lambda, \tag{3}$$

where $\Lambda$ is a diagonal matrix of eigenvalues $\lambda_1, \ldots, \lambda_{3N}$ with $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_{3N}$ for the corresponding $3N$ eigenvectors $q_k$, $k = 1, \ldots, 3N$, $M$ is the diagonal mass matrix of the system, and $Q$ is the matrix of eigenvectors. Each eigenvector gives the direction along which the corresponding degree of freedom can move.

To define the collective variables from the eigenvectors of the lowest frequency modes, we divide the degrees of freedom of the system into a subspace $\Omega$ spanned by the first $m$ eigenvectors,

$$Q = [q_1\; q_2\; \ldots\; q_m], \tag{4}$$

where $Q$ is now the matrix with these eigenvectors as columns (such that $Q \in \mathbb{R}^{3N \times m}$), and a complement subspace $\bar\Omega$ spanned by the remaining $3N - m$ eigenvectors. From here on, $Q$ denotes this truncated matrix. If we perform the diagonalization at an energy minimum $x_0$, we can project from Cartesian space to the mode space defined by the $m$ low frequency eigenvectors as

$$c = Q^T M^{1/2} (x - x_0). \tag{5}$$

Note the dependence on $x_0$ in Eq. (5): the normal mode eigenvectors give directions away from the point at which the diagonalization was performed.

We assume that we are given conformations of the system at $A$ and $B$ and that we know the timescale associated with the conformational transition between $A$ and $B$. Based on this timescale, we can perform frequency partitioning at $A$ and $B$ to pick a set of $m$ low frequency eigenvectors. Next, we need to determine a common collective variable space that captures the conformational transition. Note that the sets of eigenvectors $Q_A, Q_B \in \mathbb{R}^{3N \times m}$ are different. One approach would be to combine $Q_A$ and $Q_B$ and find an orthogonal basis for the combined space. However, combining the two sets loses the information about the origins of $Q_A$ and $Q_B$.
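To make Eqs. (3) to (5) concrete, the sketch below diagonalizes a mass-weighted Hessian with NumPy and projects a Cartesian displacement onto the $m$ lowest modes. The function names, the random test Hessian, and the per-coordinate mass vector are hypothetical choices made for illustration; a real application would use the Hessian of the molecular force field at a minimum.

```python
import numpy as np

def lowest_normal_modes(hessian, masses, m):
    """Solve Eq. (3) and return the m lowest eigenvalues and eigenvectors.

    `masses` holds the diagonal of M, i.e. one mass per Cartesian coordinate.
    """
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    weighted = inv_sqrt_m[:, None] * hessian * inv_sqrt_m[None, :]
    eigvals, eigvecs = np.linalg.eigh(weighted)  # eigenvalues in ascending order
    return eigvals[:m], eigvecs[:, :m]

def project_to_modes(x, x0, Q, masses):
    """Eq. (5): c = Q^T M^{1/2} (x - x0)."""
    return Q.T @ (np.sqrt(masses) * (x - x0))

# Hypothetical 3-atom system (9 coordinates) with a random symmetric Hessian.
rng = np.random.default_rng(0)
A = rng.standard_normal((9, 9))
hessian = A @ A.T
masses = np.repeat([12.0, 1.0, 16.0], 3)   # each atomic mass repeated for x, y, z
x0 = rng.standard_normal(9)

lam, Q = lowest_normal_modes(hessian, masses, m=2)

# Displace the system along the first mode only and project back.
x = x0 + 0.1 * Q[:, 0] / np.sqrt(masses)
print(project_to_modes(x, x0, Q, masses))
```

The projection is only meaningful relative to the reference point `x0` passed to `project_to_modes`, which is the dependence on $x_0$ noted for Eq. (5).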
With the origins lost, the projection in Eq. (5) can no longer be performed in the combined space.

In essence, the set of low frequency eigenvectors is different at each intermediate point on the String. Figure (1) illustrates this problem. Let us denote these sets by $Q_i = [q_{i,j},\ j = 1, \ldots, m]$, $i = 1, \ldots, R$. Here we assume that a set of $m$ eigenvectors at each intermediate point spans the relevant low frequency space in which the conformational transition takes place. Our analysis suggests that this is true for many biomolecular systems. Figures (2(a)) and (2(b)) show this for the open and closed conformations of calmodulin.

FIG. 1: The sets of eigenvectors at the intermediate points are different.

FIG. 2: (a) Calmodulin structures. (b) Frequency distributions. The frequency distributions at two conformations of calmodulin are similar. This suggests that if we pick a set of $m$ low frequency eigenvectors at each conformation, the span of these eigenvectors will be nearly the same.

B. Dihedral Common Space

In the previous subsection, we described how the collective variables can be defined using normal modes. However, a global set of normal modes cannot be obtained by combining these local sets of modes. Therefore, we define the common collective variable space by a set of dihedrals. Let us denote the set of dihedrals by $\phi_i$, $i = 1, \ldots, p$, and the space of these dihedrals by $\Theta$. $\Theta$ can be chosen by mapping the eigenvectors at the endpoints onto the set of all dihedrals. The string is defined and updated in $\Theta$. However, the dynamics are performed in the space of low frequency normal modes at each intermediate point of the string. Therefore, the choice of $\Theta$ will not affect the overall performance of the algorithm. For example, one can choose
• all backbone dihedrals, or
• all backbone and sidechain dihedrals, or
• all dihedrals with heavy atoms.
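Each coordinate of $\Theta$ is a dihedral angle, which is a function of the positions of four atoms. The sketch below shows one common way to evaluate a signed dihedral and to assemble the vector of selected dihedrals from a Cartesian conformation. The function names and the `quads` index list are hypothetical, and the sign convention is one of several in use.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # part of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1          # part of b2 perpendicular to the bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def dihedral_vector(x, quads):
    """Map a flat Cartesian vector (length 3N) to the chosen dihedrals.

    `quads` is a list of 4-tuples of atom indices, one per dihedral.
    """
    xyz = np.asarray(x, dtype=float).reshape(-1, 3)
    return np.array([dihedral(*xyz[list(q)]) for q in quads])

# Hypothetical four-atom chain with a single dihedral.
coords = np.array([1.0, 0.0, 0.0,
                   0.0, 0.0, 0.0,
                   0.0, 0.0, 1.0,
                   0.0, 1.0, 1.0])
print(dihedral_vector(coords, [(0, 1, 2, 3)]))
```

Choosing all backbone dihedrals, for instance, amounts to listing the corresponding atom-index quadruples in `quads`.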
Once $\Theta$ has been chosen, we define the string in $\Theta$ with $R$ intermediate points $\theta_i$, $i = 1, \ldots, R$. Next, we need to build the initial string. We developed a dragging algorithm that allows us to move from $\theta_i$ to $\theta_{i+1}$.

C. Dragging Algorithm

Assume that we are at a position $x_i^0$ where the set of dihedrals defining the conformation of the system is $\Phi_i^0 = \Phi(x_i^0)$. Next we move to a point $\Phi_i^1$ such that

$$\Phi_i^1 = \Phi_i^0 + \delta\Phi = \Phi(x_i^0 + \delta x). \tag{6}$$

Here $\delta x$ is unknown, and we want to find it such that

$$x_i^1 = x_i^0 + \delta x. \tag{7}$$

We assume that $\delta x$ is small. Then we can expand $\Phi(x_i^0 + \delta x)$ in a Taylor series as

$$\Phi(x_i^0 + \delta x) = \Phi(x_i^0) + \left.\left(\frac{\partial \Phi}{\partial x}\right)^T\right|_{\Phi_i^0} \delta x + \ldots \tag{8}$$

Note that $\Phi$ is a vector-valued function. The mode-space coordinates of the points $x_i^0$ and $x_i^1$ are $c_i^0$ and $c_i^1$, so that $c_i^0 = c(\Phi_i^0) = c(\Phi(x_i^0))$ and $c_i^1 = c(\Phi_i^1) = c(\Phi(x_i^0 + \delta x))$. Once again, we can expand in a Taylor series,

$$c(\Phi_i^0 + \delta\Phi) = c(\Phi_i^0) + \left.\left(\frac{dc}{d\Phi}\right)^T\right|_{\Phi_i^0} \delta\Phi + \ldots \tag{9}$$

Our main goal is to find $x_i^1$. We have two options:
• Truncate Eq. (8) after the second term and solve for $\delta x$. However, $p < 3N$, so this system of equations is underdetermined and has infinitely many solutions.
• Truncate Eq. (9) and solve for $c(\Phi_i^0 + \delta\Phi)$, then find $\delta x$ from $c(\Phi_i^0 + \delta\Phi)$.

To solve Eq. (9), we need the derivative $dc/d\Phi$ at $x_i^0$. First, note that

$$\left.\frac{dc}{d\Phi}\right|_{\Phi(x_i^0)} = \left[\frac{dc_j}{d\Phi_i}\right], \quad j = 1, \ldots, m, \quad i = 1, \ldots, p. \tag{10}$$

We can estimate the derivative with the chain rule,

$$\left.\frac{dc}{d\Phi}\right|_{\Phi_i^0} = \left.\frac{dc}{dx}\,\frac{dx}{d\Phi}\right|_{\Phi_i^0}. \tag{11}$$

Note that unless the Jacobian matrix $d\Phi/dx$ is square, it does not have a true inverse. Combining Eqs. (10) and (11), we can write the elements of the matrix as

$$\frac{dc_j}{d\Phi_i} = \left(\frac{dc_j}{dx}\right)^T \frac{dx}{d\Phi_i}, \tag{12}$$

where $dc_j/dx \in \mathbb{R}^{3N \times 1}$ such that

$$\left(\frac{dc_j}{dx}\right)^T = q_j^T M^{1/2}, \tag{13}$$

with $q_j$ the eigenvector corresponding to the $j$th mode, and

$$\frac{dx}{d\Phi_i} = \begin{bmatrix} dx_1/d\Phi_i \\ dx_2/d\Phi_i \\ \vdots \\ dx_{3N}/d\Phi_i \end{bmatrix}. \tag{14}$$

We approximate the entries of Eq. (14) element by element as

$$\frac{dx_k}{d\Phi_i} = \frac{1}{d\Phi_i/dx_k}. \tag{15}$$

If the dihedral $\Phi_i$ is not a function of $x_k$, then $d\Phi_i/dx_k = 0$, and in that case we set $dx_k/d\Phi_i = 0$.

D. Some results

We tested the dragging algorithm on alanine dipeptide. In the $\Phi$-$\Psi$ plane, we picked a source and a target point and moved the system from source to target using the dragging algorithm. Figures (3) and (4) show how the dragging algorithm works. At each step, we estimate $\Delta\Phi$ between the current point and the target, and then determine the corresponding change $\Delta c$ in mode space for a move of $\Delta\Phi/2$.

FIG. 3: Plot showing how we move from source to target using "dragging". 50 steps of the dragging algorithm were applied.

FIG. 4: Plot showing how we move from source to target using "dragging".

E. String Method

1. On-the-fly String Method

For convenience, we first describe the on-the-fly String Method of Maragliano et al. [5]. Let $x$ be the system conformation with $x \in \mathbb{R}^{3N}$ and $c \in \Omega$. The probability of $c$ in the reduced variable space, in the canonical ensemble, is given by

$$P(c) = Z^{-1} \int_{\mathbb{R}^{3N}} \exp\left(-\beta U(x)\right) \prod_{j=1}^{m} \delta\big((c(x) - c)_j\big)\, dx, \tag{16}$$

where $(\cdot)_j$ is the $j$th element of the vector, $c(x)$ is the mapping from the $3N$ Cartesian coordinates to the collective variables, $Z$ is the partition function, and $\beta = 1/(k_B T)$ with $T$ the system temperature. The Dirac delta function $\delta(\cdot)$ in Eq. (16) restricts the integral to the points in Cartesian space that map to $c$. The free energy of the system at $c$ is then given by

$$F(c) = -\beta^{-1} \ln P(c). \tag{17}$$

The MFEP is the transition path between $A$ and $B$ with the highest probability. We define such a path as $S(\alpha)$, where $\alpha$ is a parameterization for paths in $\Omega$ such that

$$Y_1 \le \alpha \le Y_2, \quad S(Y_1) \in A, \quad S(Y_2) \in B. \tag{18}$$

The probability of the path is given by

$$P(S) = \int_{\alpha} P(S(\alpha))\, d\alpha. \tag{19}$$

By definition, if $S(\alpha)$ is the MFEP and $S'(\alpha)$ is any other path, then

$$P(S(\alpha)) \ge P(S'(\alpha)). \tag{20}$$

On the MFEP, the mean force $G(c) = -\nabla_c F(c)$ at a point $c_i = S(\alpha_i)$ is tangential to the string,

$$\left(\frac{d\hat{S}(\alpha_i)}{d\alpha}\right)^T \nabla_c F(c) - \|\nabla_c F(c)\| = 0. \tag{21}$$

In Eq. (21), $d\hat{S}(\alpha_i)/d\alpha$ is a unit tangent to the string at the point $S(\alpha_i)$. From Eq. (21), it follows that the component of the mean force perpendicular to the MFEP is the zero vector at every point. We denote this component by

$$G^{\perp}(c) = -\nabla_c F^{\perp}(c). \tag{22}$$

The objective is to find a string such that $G^{\perp}(S(\alpha_i)) = 0$ at all points. In practice, the String is assumed to have converged to the MFEP when

$$\|G^{\perp}(S(\alpha_i))\| < \epsilon, \tag{23}$$

where $\epsilon$ is an arbitrarily small constant.

F. Derivation of mean force

We derive the expression for the mean force with collective variables defined using NMA. To find the mean forces $G(S(\alpha_i))$ on the string, we use MD to sample the phase space of the biomolecule. To achieve this, we need to rewrite Eq. (16) in terms of $x$ only. A point $x_i$ in Cartesian space can be written as

$$x_i = \hat{x}_i + \bar{x}_i + x_0, \tag{24}$$

where $\hat{x}_i \in \Omega$, $\bar{x}_i \in \bar\Omega$, and $x_0$ is the point where the Hessian was diagonalized. Equation (5) defines the mapping from the $3N$-dimensional Cartesian space $x$ to the collective variables $c$. From Eqs. (3) to (5) we can also define the inverse mapping

$$\hat{x} = M^{-1/2} Q c = M^{-1/2} Q Q^T M^{1/2} (x - x_0), \tag{25}$$

showing that $\hat{x}$ has a one-to-one mapping to the collective variable $c$. The mapping of the remaining degrees of freedom from the complement space $\bar\Omega$ to Cartesian space is given by

$$\bar{x} = M^{-1/2} (I - Q Q^T) M^{1/2} (x - x_0). \tag{26}$$

A detailed derivation of Eqs. (24) to (26) can be found in Ref. [4]. From Eq. (17), the mean force at $c_i \in \Omega$ is given by

$$G(c_i) = \beta^{-1}\, \frac{\nabla_c P(c_i)}{P(c_i)}. \tag{27}$$

From the one-to-one mapping between a point $c_i$ and $\hat{x}_i$ in Eq. (25), we have

$$P(c_i) = P(\hat{x}_i). \tag{28}$$

Now we can write

$$P(\hat{x}_i) = Z^{-1} \int_{\mathbb{R}^{3N}} \exp\left(-\beta U(x)\right) \prod_{k=1}^{m} \delta\Big(\big(M^{-1/2} Q Q^T M^{1/2} (x - x_0) - \hat{x}_i\big)_k\Big)\, dx, \tag{29}$$

where the subscript $k$ indicates the $k$th element of the vector. The $\delta(\cdot)$ term in Eq. (29) can be removed if we integrate only over points in $\bar\Omega$, i.e. over all $x$ that satisfy $M^{-1/2} Q Q^T M^{1/2} (x - x_0) = \hat{x}_i$. We can then write

$$P(\hat{x}_i) = Z^{-1} \int_{\bar\Omega} \exp\left(-\beta U(\hat{x}_i + \bar{x} + x_0)\right) d\bar{x}. \tag{30}$$

The derivative of $P(\hat{x}_i)$ with respect to $\hat{x}_i$ is given by

$$\nabla_{\hat{x}} P(\hat{x}_i) = -\beta Z^{-1} \int_{\bar\Omega} \exp\left(-\beta U(\hat{x}_i + \bar{x} + x_0)\right) \nabla_{\hat{x}} U(\hat{x}_i + \bar{x} + x_0)\, d\bar{x}, \tag{31}$$

which, in turn, gives us

$$G(c_i) = \frac{\beta^{-1}}{P(c_i)} \left(\frac{d\hat{x}_i}{dc_i}\right)^T \nabla_{\hat{x}_i} P(\hat{x}_i). \tag{32}$$

Finally, using $d\hat{x}_i/dc_i = M^{-1/2} Q$ from Eq. (25),

$$G(c_i) = -\frac{Z^{-1}}{P(c_i)}\, Q^T M^{-1/2} \int_{\bar\Omega} \exp\left(-\beta U(\hat{x}_i + \bar{x} + x_0)\right) \nabla_{\hat{x}} U(\hat{x}_i + \bar{x} + x_0)\, d\bar{x}. \tag{33}$$

We define the String with a set of $R$ discrete points $S(\alpha) = \{S(\alpha_1), S(\alpha_2), \ldots, S(\alpha_R)\}$ such that

$$\|S(\alpha_{i+1}) - S(\alpha_i)\| \approx \|S(\alpha_i) - S(\alpha_{i-1})\|. \tag{34}$$

At each step of the String simulation, we estimate $G(S(\alpha_i))$ by running $n_B$ steps of Langevin dynamics in $\bar\Omega$. Since the Langevin equation samples from the correct ensemble, the integral in Eq. (33) can be estimated using the force in $\Omega$ averaged over $n_B$ steps of dynamics. For simplicity, we update a point $S(\alpha_i)$ on the string with $G(S(\alpha_i))$ rather than $G^{\perp}(S(\alpha_i))$. Because of this, we must ensure that the intermediate points of the string remain equally separated, to counteract the tangential movement along the string. We apply a reparameterization algorithm [5] to approximate the condition in Eq. (34).

1. Reparameterization

If we keep updating the intermediate points, they will move towards the lower free energy regions. Therefore, we need to impose a constraint between the intermediate points on the string. The constraint maintains equal distance between the intermediate points in the subspace. We apply the reparameterization algorithm described in Ref. [5] to enforce this constraint.

2. Estimation of mean force

We solve the following equations of motion numerically in $\bar\Omega$ to estimate the mean force:

$$d\bar{x} = \bar{v}\, dt \tag{35}$$

and

$$M\, d\bar{v} = \bar{f}\, dt - \bar\gamma M \bar{v}\, dt + \sqrt{2 k_B T \bar\gamma}\; P_f M^{1/2}\, dW(t). \tag{36}$$

In Eq. (36), $t$ is time, $W(t)$ is a collection of Wiener processes, and $P_f$ is the force projection matrix given by

$$P_f = M^{1/2} (I - Q Q^T) M^{-1/2}, \tag{37}$$

$\bar\gamma$ is the friction coefficient in $\bar\Omega$, and $\bar{v}$ is the velocity of the system in $\bar\Omega$. We use the Langevin Impulse method [6] in $\bar\Omega$ to solve these equations. However, one can use any suitable NVT propagator in $\bar\Omega$.
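The structure of Eqs. (35) to (37) can be sketched with a plain Euler-Maruyama discretization. This is an assumption made for brevity and is not the Langevin Impulse integrator used in this work; the function names, the harmonic test force, and all parameter values are likewise hypothetical. The point of the sketch is that projecting both the force and the noise with $P_f$ keeps the collective variables $c = Q^T M^{1/2}(x - x_0)$ fixed while the complement space is sampled.

```python
import numpy as np

def complement_force_projector(Q, masses):
    """Eq. (37): P_f = M^{1/2} (I - Q Q^T) M^{-1/2} for a diagonal mass matrix."""
    sqrt_m = np.sqrt(masses)
    inner = np.eye(Q.shape[0]) - Q @ Q.T
    return sqrt_m[:, None] * inner / sqrt_m[None, :]

def langevin_step_complement(x, v, force, P_f, masses, gamma, kT, dt, rng):
    """One Euler-Maruyama step of Eqs. (35)-(36) (a simple stand-in integrator)."""
    sqrt_m = np.sqrt(masses)
    f_bar = P_f @ force(x)                             # force restricted to the complement
    dW = np.sqrt(dt) * rng.standard_normal(x.size)     # Wiener increments
    noise = np.sqrt(2.0 * kT * gamma) * (P_f @ (sqrt_m * dW))
    v = v + (f_bar * dt - gamma * masses * v * dt + noise) / masses
    x = x + v * dt
    return x, v

# Hypothetical harmonic test system with two low-frequency modes kept in Q.
rng = np.random.default_rng(1)
n, m = 6, 2
A = rng.standard_normal((n, n))
K = A @ A.T + np.eye(n)                                # toy Hessian
masses = np.array([12.0] * 3 + [1.0] * 3)
Q = np.linalg.eigh(K / np.sqrt(np.outer(masses, masses)))[1][:, :m]
P_f = complement_force_projector(Q, masses)

x0 = np.zeros(n)
x, v = rng.standard_normal(n), np.zeros(n)
c_before = Q.T @ (np.sqrt(masses) * (x - x0))
for _ in range(200):
    x, v = langevin_step_complement(x, v, lambda y: -K @ (y - x0),
                                    P_f, masses, 0.5, 0.6, 1e-3, rng)
c_after = Q.T @ (np.sqrt(masses) * (x - x0))
print(np.max(np.abs(c_after - c_before)))              # drift of the collective variables
```

The initial velocity is set to zero so that it has no component in $\Omega$; any other starting velocity would first have to be projected into $\bar\Omega$.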
In our experience, Brownian dynamics in $\bar\Omega$ was not a good sampling method for the mean force. After $n_B$ steps of Langevin dynamics, we fix the position of the system to $x_i$. Correct estimation of the mean force depends on good sampling in $\bar\Omega$. Since we only integrate the fast degrees of freedom at this step, we can use a very small $\bar\gamma$ to accelerate the dynamics of the system. We are not interested in the timescale of the dynamics in $\bar\Omega$.

At step $j$, for point $i$, if the system force is $f_i^j$, the force in $\Omega$ is given by $\hat{f}_i^j = P_f^{\perp} f_i^j$, where

$$P_f^{\perp} = M^{1/2} Q Q^T M^{-1/2}. \tag{38}$$

The force in $\bar\Omega$ is the remainder,

$$\bar{f}_i^j = f_i^j - \hat{f}_i^j. \tag{39}$$

The mean force is estimated as the average of the force in $\Omega$ over the $n_B$ steps,

$$\hat{f}_{i,\mathrm{mf}} = \frac{1}{n_B} \sum_{j=1}^{n_B} \hat{f}_i^j. \tag{40}$$

G. Reparameterization step

After the string has been updated with the estimated mean force, we apply the reparameterization algorithm to ensure equal separation between the intermediate points on the string. The reparameterization algorithm moves point $i$ from $\Phi_i$ to $\Phi_i + \Delta\Phi_i$. This move can be accomplished by applying the dragging algorithm described above.

H. Results

Figures (5) and (6) show the result of the String Method with 10 intermediate points.

FIG. 5: Plot showing the string for alanine dipeptide with 10 intermediate points.

FIG. 6: Convergence of the string algorithm. The string converged to the MFEP after 6000 steps.

References

[1] M. Levitt, C. Sander, and P. S. Stern, "Protein normal-mode dynamics: trypsin inhibitor, crambin, ribonuclease and lysozyme," J. Mol. Biol., vol. 181, pp. 423–447, 1985.
[2] F. Tama and Y. H. Sanejouand, "Conformational changes of proteins arising from normal mode calculations," Protein Engng, vol. 14, no. 1, pp. 1–6, 2001.
[3] P. Petrone and V. Pande, "Can conformational change be described by only a few normal modes?" Biophys. J., vol. 90, pp. 1583–1593, 2006.
[4] C. R. Sweet, P. Petrone, V. S. Pande, and J. A. Izaguirre, "Normal mode partitioning of Langevin dynamics for biomolecules," J. Chem. Phys., vol. 128, pp. 1–14, 2008.
[5] L. Maragliano and E. Vanden-Eijnden, "On-the-fly string method for minimum free energy path calculations," Chem. Phys. Lett., vol. 446, pp. 182–190, 2007.
[6] R. D. Skeel and J. A. Izaguirre, "An impulse integrator for Langevin dynamics," Mol. Phys., vol. 100, no. 24, pp. 3885–3891, 2002.