String Method with Collective variables from Normal Modes

String Method with Collective Variables from Normal Modes: Application to Alanine Dipeptide
Santanu Chatterjee, Christopher R. Sweet, and Jesús A. Izaguirre
Department of Computer Science and Engineering,
University of Notre Dame, Notre Dame, Indiana
Abstract
The String Method is commonly used to find minimum free energy paths (MFEPs) between metastable states of a molecule. It requires choosing a set of reaction coordinates, or collective variables, along which the transition takes place, and this choice is non-trivial. In this paper we propose the use of low frequency normal modes as the reaction coordinates. This choice is based on the observation that normal mode analysis can provide the direction of the low frequency motion of interest. We have successfully applied this method to study conformational transitions of alanine dipeptide, both in vacuum and with implicit solvent. We also show that choosing normal modes simplifies the string method itself, because the projections are linear.
Keywords: String Method, Normal Modes, Alanine Dipeptide
Contents
I. Introduction
II. Methods
A. Collective variables from Normal Mode Analysis
B. Dihedral Common Space
C. Dragging Algorithm
D. Some results
E. String Method
1. On-the-fly String Method
F. Derivation of mean force
1. Reparameterization
2. Estimation of mean force
G. Reparameterization step
H. Results
References
I. INTRODUCTION
The Minimum Free Energy Path (MFEP) is important for studying conformational transitions in biomolecules. The on-the-fly String Method, developed by Maragliano et al. (Ref. 5), can be applied to find the MFEP. However, the applicability of the string method depends strongly on a correct definition of the collective variables describing the conformational transition, which is generally not known in advance. In this paper, we present a method for semi-automatically determining a collective variable set that can be used in the string method. The method of generating collective variables is based on Normal Mode Analysis (NMA). NMA involves a quadratic approximation of the potential energy surface. It has been shown earlier that large conformational transitions in biomolecules can be described by a set of low frequency normal modes (Refs. 1, 2, 3). Sweet et al. (Ref. 4) have developed a coarse-grained propagator which allows one to integrate the equations of motion along a chosen set of relevant low frequency modes while minimizing the potential energy along the rest of the degrees of freedom. Based on these general observations, an interesting question arises: can one describe collective variables for a conformational transition using normal modes? In this paper, we use a set of low frequency normal modes as collective variables in the string method and apply it to study conformational transitions in alanine dipeptide.
II. METHODS
First, we will describe the Normal Mode Analysis (NMA) and its use in defining the
collective variables. We will then present the theoretical basis for the String Method,
based on NMA. The numerical approximations required to build a practical method will
then be discussed, followed by our String Method algorithm.
A. Collective variables from Normal Mode Analysis
MD systems of interest for modeling biomolecules can be described by the separable Hamiltonian
$$H(x, p) = \frac{1}{2} p^T M^{-1} p + U(x), \tag{1}$$
where $p$ are the momenta, $x$ are the positions (s.t. $p, x \in \mathbb{R}^{3N}$, with $N$ the total number of atoms) and $U(x)$ is the potential energy. We can expand this system around a conformation $x_0$ as follows,
$$U(x_0 + \delta x) = U(x_0) + \nabla_x U^T \delta x + \frac{\delta x^T H \delta x}{2!} + \ldots, \tag{2}$$
for $H$ the system Hessian and $\delta x = x - x_0$. If Eqn. (2) is truncated at the Hessian term we obtain a harmonic approximation of the original system. To obtain eigenvectors and eigenvalues that are ordered according to system frequency (the square root of the eigenvalues), we mass weight the Hessian to rewrite the system as a set of decoupled oscillators. Diagonalizing the resulting weighted Hessian gives
$$M^{-\frac{1}{2}} H M^{-\frac{1}{2}} Q = Q \Lambda, \tag{3}$$
where $\Lambda$ is a diagonal matrix of eigenvalues $\lambda_1, \ldots, \lambda_{3N}$ with $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_{3N}$ for the corresponding $3N$ eigenvectors $q_k$, $k = 1, \ldots, 3N$, $M$ is the diagonal system mass matrix and $Q$ the corresponding eigenvector matrix. Each eigenvector gives the direction along which the corresponding degree of freedom can move.
To define the collective variables based on the eigenvectors corresponding to the lowest frequency modes, we divide the degrees of freedom of the system into a subspace $\Omega$ spanned by the first $m$ eigenvectors
$$Q = [q_1 \; q_2 \; \ldots \; q_m], \tag{4}$$
where $Q$ is a column matrix (s.t. $Q \in \mathbb{R}^{3N \times m}$), and a complement subspace $\bar{\Omega}$ spanned by the remaining $3N - m$ eigenvectors. If we perform the diagonalization at an energy minimum $x_0$, we can project from Cartesian space to the mode space defined by the $m$ low frequency eigenvectors as
$$c = Q^T M^{1/2} (x - x_0). \tag{5}$$
Note the dependency on $x_0$ in Equation (5). The normal mode eigenvectors give us the direction away from the point where we performed the diagonalization.
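The diagonalization in Eqn. (3) and the projection in Eqn. (5) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the Hessian is a random symmetric positive semi-definite toy matrix, `masses` holds one mass per Cartesian coordinate (so it is the diagonal of M), and the function names are hypothetical.

```python
import numpy as np

def low_frequency_modes(hessian, masses, m):
    """Return the m lowest eigenpairs of the mass-weighted Hessian (Eqn. 3)."""
    inv_sqrt_m = 1.0 / np.sqrt(masses)                      # diagonal of M^{-1/2}
    weighted = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)   # M^{-1/2} H M^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(weighted)             # ascending eigenvalues
    return eigvals[:m], eigvecs[:, :m]

def project_to_modes(x, x0, masses, Q):
    """Collective variables c = Q^T M^{1/2} (x - x0) (Eqn. 5)."""
    return Q.T @ (np.sqrt(masses) * (x - x0))

# Toy data (assumed, not from the paper): 6 coordinates, 2 retained modes.
rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
hessian = A @ A.T                      # symmetric positive semi-definite
masses = rng.uniform(1.0, 16.0, size=n)
x0 = rng.normal(size=n)

eigvals, Q = low_frequency_modes(hessian, masses, m=2)
c = project_to_modes(x0 + 0.01 * rng.normal(size=n), x0, masses, Q)
c_at_x0 = project_to_modes(x0, x0, masses, Q)   # zero by construction
```

Because `c` is measured relative to `x0`, the same Cartesian point gives different collective variables when the diagonalization point changes.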
We assume that we are given conformations of the system at A and B and that we know the timescale associated with the conformational transition between A and B. Based on this timescale, we can perform frequency partitioning at A and B to pick a set of $m$ low frequency eigenvectors. Next, we need to determine a common collective variable space capturing the conformational transition. Note that the sets of eigenvectors $Q_A, Q_B \in \mathbb{R}^{3N \times m}$ are different. One approach could be the following: combine $Q_A$ and $Q_B$ and find an orthogonal set for the combined space. However, if we combine the spaces, we will lose information about the origin of $Q_A$ and $Q_B$. Therefore, we will not be able to perform the projection in Equation (5).
FIG. 1: The set of eigenvectors at each intermediate point are different.
In essence, the set of low frequency eigenvectors will be different at the intermediate points on the String. Figure (1) describes this problem. Let us denote them by $Q_i = [q_{i,j},\ j = 1, \ldots, m]$, $i = 1, \ldots, R$.
Here we assume that a set of $m$ eigenvectors at each intermediate point will span the relevant low frequency space in which the conformational transition takes place. Our analysis suggests that this is true for many biomolecular systems. Figures (2(a)) and (2(b)) show this for the open and closed conformations of Calmodulin.
(a) Calmodulin Structures
(b) Frequency distributions
FIG. 2: Frequency distributions at two conformations of calmodulin are similar. This suggests that if we pick a set of $m$ low frequency eigenvectors at each conformation, the span of these eigenvectors will be nearly the same.
B. Dihedral Common Space
In the previous subsection, we described how one can define the collective variables using Normal Modes. However, it was evident that a global set of normal modes cannot be obtained by combining these local sets of modes. Therefore, we define the common collective variable space by a set of dihedrals. Let us denote the set of dihedrals by $\phi_i$, $i = 1, \ldots, p$, and the space of these dihedrals by $\Theta$.
$\Theta$ can be chosen by mapping the eigenvectors at the endpoints into the set of all dihedrals. The string will be defined in $\Theta$ and it will be updated in $\Theta$. However, the dynamics will be performed in the space of low frequency normal modes at each intermediate point of the string. Therefore, the choice of $\Theta$ will not affect the overall performance of the algorithm. For example, one can choose
• all backbone dihedrals or
• all backbone and sidechain dihedrals or
• all dihedrals with heavy atoms.
Once we find Θ, we can then define the string in Θ with R intermediate points defined
as θi , i = 1, . . ., R. Next, we need to build the initial string. We developed a dragging
algorithm which will allow us to move from θi to θi+1 .
C. Dragging Algorithm
Let us assume that we are at a position $x_i^0$ where the set of dihedrals defining the conformation of the system is given by $\Phi_i^0 = \Phi(x_i^0)$. Next we move to a point $\Phi_i^1$ s.t.
$$\Phi_i^1 = \Phi_i^0 + \delta\Phi = \Phi(x_i^0 + \delta x). \tag{6}$$
Note that $\delta x$ is unknown here and we want to find it s.t.
$$x_i^1 = x_i^0 + \delta x. \tag{7}$$
We will make the assumption that $\delta x$ is small. Then we can expand $\Phi(x_i^0 + \delta x)$ in a Taylor series as
$$\Phi(x_i^0 + \delta x) = \Phi(x_i^0) + \left.\left(\frac{\partial \Phi}{\partial x}\right)^T\right|_{\Phi_i^0} \delta x + \ldots \tag{8}$$
Note that $\Phi$ here is a vector valued function. The coordinates of the points $x_i^0$ and $x_i^1$ are given by $c_i^0$ and $c_i^1$, so that $c_i^0 = c(\Phi_i^0) = c(\Phi(x_i^0))$ and $c_i^1 = c(\Phi_i^1) = c(\Phi(x_i^0 + \delta x))$. Once again, we can expand in a Taylor series,
$$c(\Phi_i^0 + \delta\Phi) = c(\Phi_i^0) + \left.\left(\frac{dc}{d\Phi}\right)^T\right|_{\Phi_i^0} \delta\Phi + \ldots \tag{9}$$
Our main goal is to find $x_i^1$. We have two options:
• Truncate Equation (8) after the second term and solve for $\delta x$. However, $p < 3N$, so this system of equations is underdetermined and has infinitely many solutions.
• Alternatively, truncate Equation (9) and solve for $c(\Phi_i^0 + \delta\Phi)$. Next, we can find $\delta x$ from $c(\Phi_i^0 + \delta\Phi)$.
In order to solve Equation (9), we need to find the derivative $\left(\frac{dc}{d\Phi}\right)_{x_i^0}$. First, note that
$$\left.\frac{dc}{d\Phi}\right|_{\Phi(x_i^0)} = \left[\frac{dc_j}{d\Phi_i}\right], \quad j = 1, \ldots, m, \quad i = 1, \ldots, p. \tag{10}$$
We can estimate the derivative as follows:
$$\left.\frac{dc}{d\Phi}\right|_{\Phi_i^0} = \left.\frac{dc}{dx}\,\frac{dx}{d\Phi}\right|_{\Phi_i^0}. \tag{11}$$
Note that unless the Jacobian matrix $\frac{d\Phi}{dx}$ is square, we cannot have a true inverse matrix.
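Now, from Equation (10), we can write the elements of the matrix as
$$\frac{dc_j}{d\Phi_i} = \left(\frac{dc_j}{dx}\right)^T \frac{dx}{d\Phi_i}. \tag{12}$$
Note that in Equation (12), $\frac{dc_j}{dx} \in \mathbb{R}^{3N \times 1}$ s.t.
$$\left(\frac{dc_j}{dx}\right)^T = q_j^T M^{\frac{1}{2}}, \tag{13}$$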
where $q_j$ is the eigenvector corresponding to the $j$th mode and
$$\frac{dx}{d\Phi_i} = \begin{bmatrix} dx_1/d\Phi_i \\ dx_2/d\Phi_i \\ \vdots \\ dx_{3N}/d\Phi_i \end{bmatrix}. \tag{14}$$
We can use
$$dx_k/d\Phi_i = 1/\left(d\Phi_i/dx_k\right). \tag{15}$$
Note that if dihedral $\Phi_i$ is not a function of $x_k$ then $d\Phi_i/dx_k = 0$. In that case, we can assume $dx_k/d\Phi_i = 0$.
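As a rough sketch of Eqns. (12) to (15), the derivative matrix can be assembled from the dihedral Jacobian by elementwise reciprocals. The Jacobian values, the identity-like mode matrix and the function name below are assumptions chosen only to make the arithmetic easy to follow; the result is stored with shape $m \times p$, so a dihedral move maps to mode space as `D @ delta_phi`.

```python
import numpy as np

def mode_derivative(jac_phi_x, Q, masses):
    """Estimate dc/dPhi (shape m x p) from dPhi/dx (shape p x 3N)."""
    dx_dphi = np.zeros_like(jac_phi_x, dtype=float)
    nonzero = jac_phi_x != 0.0
    dx_dphi[nonzero] = 1.0 / jac_phi_x[nonzero]   # Eqn. (15); zero where Phi_i ignores x_k
    dc_dx = Q.T * np.sqrt(masses)                 # rows are q_j^T M^{1/2}, Eqn. (13)
    return dc_dx @ dx_dphi.T                      # entry (j, i) = dc_j/dPhi_i, Eqn. (12)

# Toy data (assumed): one dihedral, four coordinates, two modes.
jac = np.array([[2.0, 0.0, 0.5, 0.0]])            # dPhi_1/dx_k
Q = np.eye(4)[:, :2]                              # orthonormal columns q_1, q_2
masses = np.array([1.0, 4.0, 1.0, 4.0])

D = mode_derivative(jac, Q, masses)
delta_phi = np.array([0.2])
delta_c = D @ delta_phi                           # first-order move in mode space
delta_x = (Q @ delta_c) / np.sqrt(masses)         # back to Cartesian: M^{-1/2} Q delta_c
```

The last line uses the inverse of the projection in Eqn. (5) restricted to the retained modes, which is one way to recover a Cartesian displacement from the mode-space move.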
D. Some results
We tested the dragging algorithm with alanine dipeptide. In the Φ − Ψ plane, we picked a source and a target point. Next, we moved the system from source to target using the dragging algorithm. Figures (3) and (4) show how the dragging algorithm works. At each step, we estimate ∆Φ between the current point and the target. Next, we determine the corresponding change ∆c in mode space for a move of ∆Φ/2.
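The halving rule can be illustrated with a toy loop that works directly in the dihedral plane. This sketch skips the mapping to mode space, uses hypothetical source and target angles in degrees, and wraps angle differences to the periodic range, which is an assumption not stated in the text.

```python
import numpy as np

def wrapped_difference(target, current):
    """Signed angular difference target - current in degrees, in [-180, 180)."""
    return (target - current + 180.0) % 360.0 - 180.0

def drag(source, target, n_steps=50):
    """Move halfway toward the target at every step and record the path."""
    target = np.asarray(target, dtype=float)
    path = [np.asarray(source, dtype=float)]
    for _ in range(n_steps):
        delta_phi = wrapped_difference(target, path[-1])
        path.append(path[-1] + 0.5 * delta_phi)   # move by Delta Phi / 2
    return np.array(path)

# Hypothetical (Phi, Psi) source and target points.
path = drag(source=[-150.0, 150.0], target=[-80.0, 80.0])
```

Since the remaining distance halves at every step, 50 steps are far more than needed to reach the target to machine precision in this idealized setting; in the real algorithm each half-step is only a first-order estimate.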
FIG. 3: Plot showing how we move from source to target using "Dragging". Fifty steps of the dragging algorithm have been applied.
FIG. 4: Plot showing how we move from source to target using "Dragging".
E. String Method
1. On-the-fly String Method
For convenience, we first describe the on-the-fly String Method of Maragliano et al. (Ref. 5). Let $x$ be the system conformation with $x \in \mathbb{R}^{3N}$ and $c \in \Omega$. The probability of $c$ in the reduced variable space, in the canonical ensemble, is given by
$$P(c) = Z^{-1} \int_{\mathbb{R}^{3N}} \exp(-\beta U(x)) \prod_{j=1}^{m} \delta\big((c(x) - c)_j\big)\, dx, \tag{16}$$
where $(\cdot)_j$ is the $j$th element of the vector, $c(x)$ is the mapping from the $3N$ Cartesian coordinates to the collective variables, $Z$ is the partition function and $\beta = \frac{1}{k_B T}$, with $T$ the system temperature. The Dirac delta function $\delta(\cdot)$ in Eqn. (16) restricts the integral to the points of Cartesian space that map to $c$. The free energy of the system at $c$ is then given by
$$F(c) = -\beta \ln(P(c)). \tag{17}$$
The MFEP is the transition path between A and B with the highest probability. We define such a path as $S(\alpha)$, where $\alpha$ is a parameterization for paths in $\Omega$ s.t.
$$Y_1 \le \alpha \le Y_2, \quad S(Y_1) \in A, \quad S(Y_2) \in B. \tag{18}$$
The probability of the path is given by
$$P(S) = \int_{\alpha} P(S(\alpha))\, d\alpha. \tag{19}$$
By definition, if $S(\alpha)$ is the MFEP and $S'(\alpha)$ is any other path,
$$P(S(\alpha)) \ge P(S'(\alpha)). \tag{20}$$
On the MFEP the mean force $G(c) = -\nabla_c F(c)$ at a point $c_i = S(\alpha_i)$ is tangential to the string,
$$\left(\frac{d\hat{S}(\alpha_i)}{d\alpha}\right)^T \nabla_c F(c) - \|\nabla_c F(c)\| = 0. \tag{21}$$
In Eqn. (21), $\frac{d\hat{S}(\alpha_i)}{d\alpha}$ is a unit tangent to the string at the point $S(\alpha_i)$. From Eqn. (21), it is evident that the component of the mean force perpendicular to the MFEP is the zero vector at every point. We define this force as
$$G^{\perp}(c) = -\nabla_c F^{\perp}(c). \tag{22}$$
The objective is to find a string such that $G^{\perp}(S(\alpha_i)) = 0$ at all points. In practice it is assumed that the String has converged to the MFEP when
$$\|G^{\perp}(S(\alpha_i))\| < \epsilon, \tag{23}$$
where $\epsilon$ is an arbitrarily small constant.
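The convergence test in Eqn. (23) can be sketched numerically by removing the tangential component of the force at each string point. The finite-difference tangent, the tolerance value and the straight-line toy string below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def unit_tangents(string):
    """Finite-difference unit tangents for an (R, m) array of string points."""
    tangents = np.gradient(string, axis=0)
    return tangents / np.linalg.norm(tangents, axis=1, keepdims=True)

def perpendicular_force(string, forces):
    """Remove from each mean force its component along the local tangent."""
    t = unit_tangents(string)
    parallel = np.sum(forces * t, axis=1, keepdims=True) * t
    return forces - parallel

def has_converged(string, forces, eps=1e-3):
    """Check ||G_perp(S(alpha_i))|| < eps at every string point (Eqn. 23)."""
    g_perp = perpendicular_force(string, forces)
    return bool(np.all(np.linalg.norm(g_perp, axis=1) < eps))

# Toy string along the first collective variable (assumed data).
string = np.column_stack([np.linspace(0.0, 1.0, 5), np.zeros(5)])
forces_along = np.tile([2.0, 0.0], (5, 1))    # purely tangential forces
forces_across = np.tile([0.0, 0.5], (5, 1))   # purely perpendicular forces

converged_along = has_converged(string, forces_along)
converged_across = has_converged(string, forces_across)
```

A purely tangential force field satisfies the criterion regardless of its magnitude, which is why the string update must be paired with a constraint on the spacing of the points.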
F. Derivation of mean force
We derive the expression for the mean force with collective variables defined using NMA. To find the mean forces $G(S(\alpha_i))$ on the string we will use MD to sample the phase space of the biomolecule. To achieve this we need to rewrite Eqn. (16) in terms of $x$ only. A point $x_i$ in Cartesian space can be written as
$$x_i = \hat{x}_i + \bar{x}_i + x_0, \tag{24}$$
where $\hat{x}_i \in \Omega$, $\bar{x}_i \in \bar{\Omega}$ and $x_0$ is the point where the Hessian was diagonalized. Equation (5) defines the mapping from the $3N$-dimensional Cartesian space $x$ to the collective variables $c$. From Eqns. (3) to (5) we can also define the inverse mapping
$$\hat{x} = M^{-\frac{1}{2}} Q c = M^{-\frac{1}{2}} Q Q^T M^{\frac{1}{2}} (x - x_0), \tag{25}$$
showing that $\hat{x}$ has a 1-1 mapping to the collective variable $c$. The mapping of the remaining degrees of freedom from the complement space $\bar{\Omega}$ to Cartesian space is given by
$$\bar{x} = M^{-\frac{1}{2}} (I - Q Q^T) M^{\frac{1}{2}} (x - x_0). \tag{26}$$
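The decomposition in Eqns. (24) to (26) is easy to check numerically. The following sketch uses a random orthonormal $Q$ and random masses as stand-ins for real normal modes; the function name is hypothetical.

```python
import numpy as np

def split_displacement(x, x0, masses, Q):
    """Return c, x_hat and x_bar for a Cartesian point x (Eqns. 5, 25, 26)."""
    sqrt_m = np.sqrt(masses)
    w = sqrt_m * (x - x0)            # mass-weighted displacement M^{1/2}(x - x0)
    c = Q.T @ w                      # Eqn. (5)
    x_hat = (Q @ c) / sqrt_m         # Eqn. (25)
    x_bar = (w - Q @ c) / sqrt_m     # Eqn. (26)
    return c, x_hat, x_bar

# Toy data (assumed): 6 coordinates, 2 retained modes.
rng = np.random.default_rng(2)
masses = rng.uniform(1.0, 16.0, size=6)
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))   # orthonormal columns
x0 = rng.normal(size=6)
x = x0 + rng.normal(size=6)

c, x_hat, x_bar = split_displacement(x, x0, masses, Q)
```

The two parts sum back to the original displacement, and the complement part carries no component along the retained modes.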
Detailed derivation of Eqns. (24) to (26) can be found in Ref. 4. From Eqn. (17) the mean force at $c_i \in \Omega$ is given by
$$G(c_i) = \frac{\beta}{P(c_i)} \nabla_c P(c_i). \tag{27}$$
From the 1-1 mapping between a point $c_i$ and $\hat{x}_i$ in Eqn. (25), we have
$$P(c_i) = P(\hat{x}_i). \tag{28}$$
Now we can write
$$P(\hat{x}_i) = Z^{-1} \int_{\mathbb{R}^{3N}} \exp(-\beta U(x)) \prod_{k=1}^{m} \delta\big((M^{-\frac{1}{2}} Q Q^T M^{\frac{1}{2}} (x - x_0) - \hat{x}_i)_k\big)\, dx, \tag{29}$$
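where the subscript $k$ indicates the $k$th element of the vector. The $\delta(\cdot)$ term in Eqn. (29) can be removed if we integrate only over points in $\bar{\Omega}$, i.e. $M^{-\frac{1}{2}} Q Q^T M^{\frac{1}{2}} (x - x_0) = \hat{x}_i$, $\forall x$. We can then write
$$P(\hat{x}_i) = Z^{-1} \int_{\bar{\Omega}} \exp(-\beta U(\hat{x}_i + \bar{x} + x_0))\, d\bar{x}. \tag{30}$$
The derivative of $P(\hat{x}_i)$ w.r.t. $\hat{x}_i$ is given by
$$\nabla_{\hat{x}} P(\hat{x}_i) = -\beta Z^{-1} \int_{\bar{\Omega}} \exp(-\beta U(\hat{x}_i + \bar{x} + x_0))\, \nabla_{\hat{x}} U(\hat{x}_i + \bar{x} + x_0)\, d\bar{x}, \tag{31}$$
which, in turn, gives us
$$G(c_i) = \frac{\beta}{P(c_i)} \left(\frac{d\hat{x}_i}{dc_i}\right)^T \nabla_{\hat{x}_i} P(\hat{x}_i). \tag{32}$$
Finally, using $\frac{d\hat{x}_i}{dc_i} = M^{-\frac{1}{2}} Q$ from Eqn. (25),
$$G(c_i) = -\beta^2 \frac{Z^{-1}}{P(c_i)} Q^T M^{-\frac{1}{2}} \int_{\bar{\Omega}} \exp(-\beta U(\hat{x}_i + \bar{x} + x_0))\, \nabla_{\hat{x}} U(\hat{x}_i + \bar{x} + x_0)\, d\bar{x}. \tag{33}$$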
We define the String with a set of $r$ discrete points $S(\alpha) = \{S(\alpha_1), S(\alpha_2), \ldots, S(\alpha_r)\}$ s.t.
$$\|S(\alpha_{i+1}) - S(\alpha_i)\| \approx \|S(\alpha_i) - S(\alpha_{i-1})\|. \tag{34}$$
At each step of the String simulation, we estimate $G(S(\alpha_i))$ by running $n_B$ steps of Langevin dynamics in $\bar{\Omega}$. Since the Langevin equation samples from the correct ensemble, the integral in Eqn. (33) can be estimated using the force in $\Omega$ averaged over $n_B$ steps of dynamics. For simplicity, we update a point $S(\alpha_i)$ on the string with $G(S(\alpha_i))$ rather than $G^{\perp}(S(\alpha_i))$. Because of this we must ensure that there is equal separation between the intermediate points of the string to counteract the tangential movement along the string. We apply a reparameterization algorithm (Ref. 5) to approximate the condition in Eqn. (34).
1. Reparameterization
If we keep updating the intermediate points, they will move towards the lower free energy regions. Therefore, we need to impose a constraint between intermediate points on the string. The constraint maintains equal distance between the intermediate points in the subspace. We apply the reparameterization algorithm described in Ref. 5 to enforce this constraint.
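One common way to restore equal spacing is to redistribute the points at equal arc length along the piecewise-linear curve through the current string. The sketch below shows that generic interpolation approach; it is an assumption for illustration and not necessarily the scheme of Ref. 5.

```python
import numpy as np

def reparameterize(string):
    """Redistribute (R, d) string points at equal arc length along the polyline."""
    segment = np.linalg.norm(np.diff(string, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(segment)])      # cumulative arc length
    targets = np.linspace(0.0, arc[-1], len(string))       # equally spaced arc lengths
    return np.column_stack([np.interp(targets, arc, string[:, k])
                            for k in range(string.shape[1])])

# Toy string whose points have bunched up near the start (assumed data).
string = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0]])
new_string = reparameterize(string)
spacing = np.linalg.norm(np.diff(new_string, axis=0), axis=1)
```

The end points stay fixed and only the interior points move, so the states A and B are preserved.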
2. Estimation of mean force
We solve the following equations of motion numerically in $\bar{\Omega}$ to estimate the mean force:
$$d\bar{x} = \bar{v}\, dt \tag{35}$$
and
$$M\, d\bar{v} = \bar{f}\, dt - \bar{\gamma} M \bar{v}\, dt + \sqrt{2 k_B T \bar{\gamma}}\; P_f M^{\frac{1}{2}}\, dW(t). \tag{36}$$
In Eqn. (36) $t$ is time, $W(t)$ is a collection of Wiener processes and $P_f$ is the force projection matrix given by
$$P_f = M^{\frac{1}{2}} (I - Q Q^T) M^{-\frac{1}{2}}, \tag{37}$$
$\bar{\gamma}$ is the friction term in $\bar{\Omega}$ and $\bar{v}$ is the velocity of the system in $\bar{\Omega}$. We use the Langevin Impulse method (Ref. 6) in $\bar{\Omega}$ to solve these equations. However, one can use any suitable NVT propagator in $\bar{\Omega}$. In our experience, Brownian dynamics in $\bar{\Omega}$ was not a good sampling method for the mean force.
After $n_B$ steps of Langevin dynamics, we fix the position of the system to $x_i$. Correct estimation of the mean force depends upon good sampling in $\bar{\Omega}$. Since we are only integrating the fast degrees of freedom at this step, we can use a very small friction coefficient $\bar{\gamma}$ to accelerate the dynamics of the system. We are not interested in the timescale of the dynamics in $\bar{\Omega}$.
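To make Eqns. (35) to (37) concrete, here is a minimal Euler-Maruyama discretization. It is deliberately simpler than the Langevin Impulse method used in the paper, and the harmonic toy force, parameter values and function names are all assumptions.

```python
import numpy as np

def complement_projector(Q, masses):
    """P_f = M^{1/2} (I - Q Q^T) M^{-1/2} from Eqn. (37)."""
    sqrt_m = np.sqrt(masses)
    inner = np.eye(len(masses)) - Q @ Q.T
    return sqrt_m[:, None] * inner / sqrt_m[None, :]

def langevin_step(x, v, force_fn, masses, Pf, gamma, kT, dt, rng):
    """One Euler-Maruyama step of Eqns. (35)-(36); not the Langevin Impulse integrator."""
    f_bar = Pf @ force_fn(x)                                    # force restricted to the complement space
    noise = Pf @ (np.sqrt(masses) * rng.normal(size=x.shape))   # P_f M^{1/2} dW / sqrt(dt)
    dv = (f_bar * dt - gamma * masses * v * dt
          + np.sqrt(2.0 * kT * gamma * dt) * noise) / masses
    return x + v * dt, v + dv

# Toy setup (assumed): harmonic force, 6 coordinates, 2 frozen modes.
rng = np.random.default_rng(1)
masses = np.array([1.0, 1.0, 12.0, 12.0, 16.0, 16.0])
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))
Pf = complement_projector(Q, masses)

x = np.zeros(6)   # displacement from the string point
v = np.zeros(6)
for _ in range(200):
    x, v = langevin_step(x, v, lambda y: -y, masses, Pf,
                         gamma=0.5, kT=0.6, dt=0.01, rng=rng)

mode_position = Q.T @ (np.sqrt(masses) * x)   # stays zero: no motion in Omega
mode_velocity = Q.T @ (np.sqrt(masses) * v)
```

Because both the deterministic force and the noise pass through $P_f$, the trajectory never leaves $\bar{\Omega}$, so the collective variables of the string point remain fixed during sampling.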
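At step $j$, for point $i$, let the system force be $f_i^j$. The force in $\bar{\Omega}$ is given by $\bar{f}_i^j = P_f f_i^j$, with $P_f$ from Eqn. (37). The complementary projector onto $\Omega$ is
$$P_f^{\perp} = M^{\frac{1}{2}} Q Q^T M^{-\frac{1}{2}}, \tag{38}$$
so that $P_f + P_f^{\perp} = I$. The force in $\Omega$ is given by
$$\hat{f}_i^j = f_i^j - \bar{f}_i^j = P_f^{\perp} f_i^j. \tag{39}$$
The mean force is estimated as the average over $n_B$ steps,
$$\hat{f}_{i,mf} = \frac{1}{n_B} \sum_{j=1}^{n_B} \hat{f}_i^j. \tag{40}$$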
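The averaging in Eqn. (40) can be sketched as below, reading $P_f^{\perp} = M^{1/2} Q Q^T M^{-1/2}$ as the projector onto $\Omega$ so that the component in $\Omega$ of each sampled force is $P_f^{\perp} f$. That reading, the random stand-in forces and the function names are assumptions made for this illustration.

```python
import numpy as np

def mode_space_projector(Q, masses):
    """P_f_perp = M^{1/2} Q Q^T M^{-1/2} from Eqn. (38)."""
    sqrt_m = np.sqrt(masses)
    return sqrt_m[:, None] * (Q @ Q.T) / sqrt_m[None, :]

def mean_force(force_samples, Q, masses):
    """Average over n_B sampled forces of their component in Omega (Eqns. 39-40)."""
    P_perp = mode_space_projector(Q, masses)
    f_hat = force_samples @ P_perp.T      # project each row (one sample per row)
    return f_hat.mean(axis=0)

# Toy data (assumed): n_B = 100 force samples on 6 coordinates.
rng = np.random.default_rng(3)
masses = rng.uniform(1.0, 16.0, size=6)
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))
samples = rng.normal(size=(100, 6))

P_perp = mode_space_projector(Q, masses)
f_mf = mean_force(samples, Q, masses)
complement_only = samples - samples @ P_perp.T   # forces with no Omega component
f_mf_complement = mean_force(complement_only, Q, masses)
```

Since the projection is linear, projecting each sample and then averaging gives the same result as averaging first, which can save work when $n_B$ is large.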
G. Reparameterization step
After string update with estimated mean force, we need to apply the reparameterization algorithm to ensure equal separation between intermediate points on the string.
The reparameterization algorithm moves point i from Φi to Φi + ∆Φi . This can be accomplished by applying the dragging algorithm described above.
H. Results
FIG. 5: Plot showing the string for alanine dipeptide with 10 intermediate points.
Figures (5) and (6) show the result of the string method with 10 intermediate points.
FIG. 6: Convergence of the string algorithm. The string converged to the MFEP after 6000 steps.
References
1. M. Levitt, C. Sander, and P. S. Stern, “Protein normal-mode dynamics: trypsin inhibitor, crambin, ribonuclease and lysozyme,” J. Mol. Biol., vol. 181, pp. 423–447, 1985.
2. F. Tama and Y. H. Sanejouand, “Conformational changes of proteins arising from normal mode calculations,” Protein Engng, vol. 14, no. 1, pp. 1–6, 2001.
3. P. Petrone and V. Pande, “Can conformational change be described by only a few normal modes?” Biophys. J., vol. 90, pp. 1583–1593, 2006.
4. C. R. Sweet, P. Petrone, V. S. Pande, and J. A. Izaguirre, “Normal mode partitioning of Langevin dynamics for biomolecules,” J. Chem. Phys., vol. 128, pp. 1–14, 2008.
5. L. Maragliano and E. Vanden-Eijnden, “On-the-fly string method for minimum free energy path calculations,” Chem. Phys. Lett., vol. 446, pp. 182–190, 2007.
6. R. D. Skeel and J. A. Izaguirre, “An impulse integrator for Langevin dynamics,” Mol. Phys., vol. 100, no. 24, pp. 3885–3891, 2002.