NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL JUN KOO PARK Department of Mathematics Iowa State University, Ames, Iowa, 50010, USA Email:jkpark@iastate.edu Abstract. The functions of large biological molecules are related with various kinds of large-amplitude molecular motions. With some assumptions, those motions can be analyzed by Normal Mode Analysis and Gaussian Network Model. However, despite their contributions to wide applications, the relationship between NMA and GNM requires further discussion. In this review, the author compares the Normal Mode Analysis and Gaussian Network Model and addresses their common applications in structural biology. Key Words. Normal Mode Analysis, Gaussian Network Model, protein dynamics 1. Introduction 1.1. Historical development In 1950, Goldstein introduced the theory of normal modes in classical mechanics [4,64,79,86]. Since then, normal mode analysis has been widely used in several fields such as mechanical engineering, wave theory and optics. It was first applied in the early 1980’s to examine mechanical aspects of small biological systems such as bovine pancreatic trypsin inhibitor (BPTI) [34]. Afterwards, research was carried out on larger proteins [73,74], however, the number of residues was still less than 300. With the progress of X-ray crystallography, formations of larger biological systems have become usable and new methods for NMA were needed to study fluctuations of these larger systems. Mouawad and Perahia proposed the diagonalization in a mixed basis (DIMB) method in 1993 [37,84], which performs an iterative diagonalization based on mixing of subsets taken from two different basis sets: low-frequency NM and Cartesian coordinates. After they introduced the DIMB method, the studies on proteins of more than 300 residues were possible. In 2000, Tama et al. introduced the rotation translation block (RTB) method [23,53], which yields low-frequency normal modes that are sufficient to describe the global dynamics of macromolecules such as RNA polymerase [47], the ribosome [5] and the F1 -ATPase [51]. Furthermore, Coarsegrained elastic network model combined with diagonalization techniques such as DIMB and RTB has reduced the computational cost on large biological machines. Nowadays, because of advancement in computational techniques as well as processing power, normal 1 2 JUN KOO PARK mode analysis has become as a very powerful tool to study dynamics of large biological molecules that are critically important in our bodies. In Figure 1.1 [91], history of NMA applied to biological molecules is described. The figure illuestrates the progress that has been made during the past few years. Figure 1.1 Bahar et al. introduced Gaussian network model (GNM) in 1997 [29]. The model was based on the previous work by Tirion and Go et al. Go et al. showed that the dynamic structure of the globular protein can be described as a linear combination of harmonic high-frequency motions and coupled anharmonic low-frequency motions of collective variables in 1983 [60]. Tirion showed that a single-parameter potential is sufficient to reproduce the slow dynamics in good detail in 1996 [89]. In the GNM, the fluctuation mechanics of large biological molecules can be modeled as those of elastic networks composed of nodes and springs; the nodes are the residues, and the springs are the inter-residue forces. Residues are assumed to have a Gaussian distributed fluctuations around their mean positions. That is, the GNM method applies the theory of elastic network and Gaussian(Normal) distribution to understand macromolecular dynamics with the assumption that the structure is near its equilibrium state, and only considers the residue contacts such as Cα contacts. The model uses the harmonic approximation to model inter-residue interactions. In other words, the spatial interactions between nodes are modeled with a uniform harmonic spring. NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 3 One of the purpose of the GNM is to understand dynamical properties of long-scale mechanics. The model can be used to study not only small proteins such as enzymes but also large biological molecules such as a ribosome or a viral capsid. Especially, GNM method involves only one single parameter from the work of Tirion [89]. From a computational point of view, The GNM method only requires solving a singular value decomposition (SVD) problem of Hessian matrix and therefore it makes the calculations computationally inexpensive than molecular dynamic simulation. 1.2. Motivation To understand the biological functions of protein, it is essential to know the internal motions. Especially, large amplitude motions are important to apprehend the conformational change. To study those dynamical properties of biological molecules using computational techniques several approaches can be considered. One of the method is using the molecular dynamics simulations. Alder et al. proposed the general method of studying in molecular dynamics in 1959 [59]. In the molecular dynamics simulations, the system can be expressed as a function of time. The application for large-scale molecular assemblies is still restricted to relatively short time scales(picoseconds) because of the computational complexity of all-atom simulation methods, even though computational techniques and processing power have been improved significantly. In other words, the utilization of standard molecular dynamics simulation is limited, as the timescales of these conformational changes are on the order of microseconds. Moreover, due to the limitation of X-ray crystallography, it is usually the case that a high resolution structure is not available and only low-resolution structural information is available. In that situation, molecular dynamics simulations can not be applied. Furthermore, because of the complex nature of the energy surface and the existence of a lot of energy minima, this method was not well-adapted. One of the way to lengthen the time scale is to use more coarse-grained models, which enable microsecond time scale to be reached for small proteins [78]. Because the coarsegrained models are sufficient to reproduce the low frequency normal modes on structure, one can assume that there is no need for an atomic description of the molecule. Note that Cα refers to the first carbon after the carbon that attaches to the functional group. It is the backbone carbon next to the carbonyl carbon. When describing a protein (which is a chain of amino acids), one often approximates the location of each amino acid as the location of its Cα . Thus, in these simulations, typically the Cα atoms are considered, which considerably reduce the number of atoms necessary for simulation. However, these calculations are still computationally expensive for large biological assemblies. To simulate large and slow conformational rearrangements of large macromolecular assemblies, we need to employ alternative methods that do not depend on the framework of molecular dynamics simulations. One of these methods is the Normal Mode Analysis (NMA). In NMA, a complete analytical solution of the equations of motion can be obtained with the assumption that the potential energy of the system can be approximated as a quadratic function around the potential energy minimum, and molecular motions are decomposed into vibrational modes. Despite this assumption, low-frequency modes obtained by this method appeared to describe enough the conformational changes that are observed experimentally [37,62,28,76,90] 4 JUN KOO PARK 2. Normal Mode Analysis 2.1. Theory In normal mode analysis, the dynamics of a molecule can be approximately represented as a set of simple harmonic oscillators. In this sense, because the motion of a harmonic oscillator can be analytically described, this expression is very useful [65]. From the classical mechanical model, for a simple harmonic oscillator of a mass m connected to a spring with the spring constant k, the equation of motion is given by: d2 x dt2 where F represents the total force on the particle and x represents its coordinate. F = −kx = m Therefore, the solution of the above equation is able to describe the dynamics of the particle. The solutions of above system can be obtained by the form of x = αcos(ωt + φ) where α and φ are the amplitude and the phase at time t=0, respectively and ω = the angular frequency associated with the vibrational mode [81]. q k m Recall. (Euler-Lagrange equation) Let L be a continuously differentiable functional. Then the Euler-Lagrange equation is given by ∂L(x, x0 , t) d ∂L(x, x0 , t) − [ ]=0 ∂x dt ∂x0 For the general case, based on the theory of classical mechanics, we have following: Lemma 2.1.1 Suppose that a biological molecule has n atoms. Let r(t) be the collection of the position vectors given molecule at time t such that r(t)={ri (t) : ri (t) = (xi (t), yi (t), zi (t))T , i = 1, · · · , n} where ri is the position vector of atom i. Then the molecular motion can be described as a collection of movements of the atoms in the molecule, as given in the following equations of motions: M r00 = −∇Ep (r) where M is a diagonal matrix with mii , the mass of atom i and Ep is the potential energy [56]. Proof. Let 1 L = (r0 )T M (r0 ) − Ep (r) 2 Then ∂L 1 = (M + M T )r0 = M r0 ∂r0 2 NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 5 d ∂L ( 0 ) = M r00 dt ∂r By the Euler-Lagrange equation, M r00 = And d ∂L ∂L ( 0) = dt ∂r ∂r ∂L = −∇Ep (r) ∂r Therefore, M r00 = −∇Ep (r) Because a large biological molecules is composed of a lot of molecules, the potential energy of the biological molecules is not simple, and thus the equation of motion of the molecules cannot be solved analytically. However, if one consider the motion in the surrounding of equilibrium position, the potential energy can be approximated by a simple form. Suppose that we have a molecule with n atoms, and let (xi , yi , zi ) be the coordinate of atom i. Then the coordinates of atoms are given by r = (x1 , y1 , z1 , · · · , zn ). Let r0 be the equilibrium configuration of the molecule which means that at r0 the potential energy is minimized. Then a Taylor expansion of the potential energy function Ep (r) around a minimum r0 gives: X ∂ 2 Ep X ∂Ep 1 0 0 0 Ep (r) = Ep (r0 ) + 0 (ri − ri ) + 2! 0 (ri − ri )(rj − rj ) + · · · ∂r ∂r ∂r i i j r=r r=r i ij Since the r0 is a minimum of the energy function, ∂Ep 0 (r ) = 0 ∂ri In addition, without loss of generality, the potential energy can be defined in connection with this structure as Ep (r0 ) = 0 Finally, if one considers sufficiently small displacements, terms greater than or equal to the second order may be neglected. Therefore, the approximate potential energy function is given by the harmonic approximation such as following: 1 X ∂ 2 Ep Ep (r) ≈ (ri − ri0 )(rj − rj0 ) 2 ij ∂ri ∂rj r=r0 or 1 Ep (r) ≈ (∆r)T [∇2 Ep (r0 )](∆r) 2 where ∆r = r − r0 Then, 1 M ∆r00 = M r00 = −∇Ep (r) = −∇[ (∆r)T [∇2 Ep (r0 )](∆r)] = −∇2 Ep (r0 )∆r. 2 From this equation of motion, we know that the protein is fluctuating as if the protein is governed by a harmonic potential energy around the equilibrium position, and the force 6 JUN KOO PARK field becomes approximately linear. Therefore, the equation can be solved analytically through the singular value decomposition of the Hessian matrix, ∇2 Ep (r0 ). Suppose that we have a protein with n atoms. Then the dimension of the Hessian matrix, H is 3n and we are able to find 3n sets of eigenpair solutions. Among those solutions, 3n-6 normal modes are indeed meaningful. Because the first six smallest normal modes have an eigenvalue equal to 0 and which correspond to 3 translational and 3 rotational motions of the whole system [14]. Fact. Based on classical dynamics, each atom will have a time-averaged potential energy of 23 kB T , where kB is Boltzmann constant and T absolute temperature, i.e. 3 Epta (ri ) = kB T 2 [86] Fact. For some constant ω > 0 and β, Z 1 n 1 lim cos2 (ωt + β)dt = hcos(ωt + β), cos(ωt + β)i = n→∞ n 0 2 Lemma 2.1.2 Suppose the singular value decomposition of the Hessian matrix, H = ∇2 Ep (r0 ) is given by U ΛU T . Then, the solution of the system of equations can be obtained by ∆ri (t) = 3n X Uij αj cos(ωj t + βj ), ωj2 = Λjj j=1 where αj and βj can be determined by the given condition of the system and in particular, [52,56] αj2 = 6kB T [Λ−1 ]jj P 2 0 Proof. Let ∆ri (t) = 3n j=1 Uij αj cos(ωj t + βj ) and H = −∇ Ep (r ). Then, From the equation of motion, M ∆r00 = −∇2 Ep (r0 )∆r = H∆r 3n X j=1 Mij ∆rj00 =− 3n X Hij ∆rj j=1 Substituting for ∆r and ∆r00 in above equation gives: − 3n X j=1 Mij 3n X k=1 Ujk αk ωk2 cos(ωk t + βk ) = − 3n X j=1 Hij 3n X k=1 Uij αk cos(ωk t + βk ) NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 7 Simplifying 3n 3n X X k=1 ! Mij Ujk αk ωk2 cos(ωk t + βk ) = j=1 3n 3n X X k=1 ! Hij Ujk αk cos(ωk t + βk ) j=1 Because above equation must hold for all t: ! 3n 3n X X 2 Mij Ujk ωk = Hij Ujk j=1 j=1 for each k = 1, · · · , 3n If we rewrite the above equation in matrix notation, it gives the equations of motion as: M U Λ = HU Because the singular value decomposition of H is given by U ΛU T , M U Λ = U ΛU T U = U Λ Thus, M = I, which means that coordinate U are mass-weighted. Therefore, the solution is given by 3n X ∆ri (t) = Uij αj cos(ωj t + βj ), ωj2 = Λjj j=1 where αk is the amplitude, ωj the angular frequency and βj , the phase of the jth normalmode of motion. For the amplitude αk , let Qj = αj cos(ωj t + βj ) Then, ∆ri (t) = 3n X Uij Qj j=1 ∆r = U Q Also, 1 1 1 Ep (r) = (∆r)T [∇2 Ep (r0 )](∆r) = (∆r)T H(∆r) = (U Q)T U ΛU T (U Q) 2 2 2 Therefore, 3n 3n 1X 1X 2 2 1 T 2 Λjj Qj = ω Q Ep (r) = Q ΛQ = 2 2 j=1 2 j=1 j j Then, the time-averaged potential energy is: 1 2 1 1 3 ωj hQj , Qj i = ωj2 αj2 hcos(ωj t + βj ), cos(ωj t + βj )i = ωj2 αj2 = kB T 2 2 4 2 Therefore, 6kB T αj2 = = 6kB T [Λ−1 ]jj 2 ωj 8 JUN KOO PARK Here, ωj2 = Λjj is called the jth mode of the motion. Based on the above solution, the larger value the ωj has, the faster the corresponding mode is [73]. If the system is in the thermal equilibrium, from the proof of lemma, kB T hQj , Qj i = 2 ωj Fact. If ωi 6= ωj , then hcos(ωi t + βi ), cos(ωj t + βj )i = 0 Theorem 2.1.3 Suppose that r1 , · · · , rn is the equilibrium component positions of the residues in a protein, and the solution of the system of equations, for each atom i, is given P3n kB T 2 −1 by ∆ri (t) = j=1 Uij αj cos(ωj t + βj ), where αj = 6 γ [Λ ]jj and the singular value decomposition of the Hessian matrix, H = ∇2 Ep (r0 ) is given by U ΛU T . Then, the mean square fluctuation of atom i is h∆ri , ∆ri i = 3kB T 3n X Uij [Λ−1 ]jj Uij j=1 [52] Proof. h∆ri , ∆ri i = = = * n X Uij αj cos(ωj t + βj ), j=1 n X n X + Uij αj cos(ωj t + βj ) j=1 hUij αj cos(ωj t + βj ), Uij αj cos(ωj t + βj )i j=1 n X Uij αj2 Uij hcos(ωj t + βj ), cos(ωj t + βj )i j=1 n 1X = Uij αj2 Uij 2 j=1 n = 1X Uij 6kB T [Λ−1 ]jj Uij 2 j=1 = 3kB T n X Uij [Λ−1 ]jj Uij j=1 Therefore,P based on the theorem above, because the residue fluctuation of a protein, 3n ∆ri (t) = j=1 Uij αj cos(ωj t + βj ) can be expressed as a sum of simple oscillator, the dynamics of the system can be described as a linear combination of independent normal mode oscillators. The calculated fluctuations show the structural flexibilities by adding NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 9 over the significant modes. They give the beneficial dynamical properties of the given protein. Notice that h∆ri , ∆ri i = 3kB T 3n X j=1 −1 Uij [Λ ]jj Uij = kB T 3n X Uij2 j=1 ωj2 If the singular value, λjj is small, then the frequency ωj is also small, therefore the mode is slow. Thus, the smaller singular value is, the slower mode it corresponds to, and also the larger fluctuation it makes. The mean square fluctuation has usually connection with the experimentally detected average atomic fluctuation so called B-factor in X-ray crystallography and the order parameter in NMR. Especially, using the above relations, the B-factor, Bi (also called x-ray crystallographic temperature factor or Debye-Waller factor) of each atom is given by Bi = 3n X 8π 2 8π 2 8π 2 h∆ri , ∆ri i = kB T Uij [Λ−1 ]jj Uij (ri − ri0 )2 = 3 3 3 j=1 3n X Uij2 8π 2 kB T = 3 ωj2 j=1 From this equation, the smaller ωj has, the larger Bi has. Thus it is obvious that the largest contributions to the atomic displacement comes from the lowest frequency normal modes(small ω). In addition, the eigenvector corresponding to the lowest frequency represents the most globally distributed or collective motions. For these reasons, studies on the NMA generally focus on a few low frequency normal modes, and in real, only several terms corresponding to the smallest singular values are used for the evaluation of the residue fluctuation [90]. Figure 2.1 [65] illustrates the difference between the adenylate kinase and cytochrome c. For the lowest frequency normal mode of the adenylate kinase, almost all the atoms are moving together with a large amplitude. On the other hand, a high frequency normal mode of the cytochrome c shows only a few atoms are experiencing a motion with a very small amplitude. From the latter case, this normal mode is much more localized than the one for adenylate kinase. The direction and amplitude of arrows represents the those of motions of atoms, respectively [1]. 10 JUN KOO PARK Figure 2.1 Normal Mode Computations When the theory of NMA is applied to the biological molecules, energy minimization is required first to ensure that the structure is indeed at a minimum of the potential energy function. This minimization is followed by the diagonalization of the Hessian matrix, the second dervatives of the potential energy with respect to the Cartesian coordinates. The size of this matrix increases as the square of the number of atoms. Generally, the digonalization process has been computationally expensive. Since the 1980, a lot of studies performed to reduce the number of degrees of freedom. The first attempt was based on the use of dihedral angle space, ignoring the other degrees of freedom [52,60]. This approach reduces the degrees of freedom more than ten times. But the size of Hessian matrix still NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 11 increases as the square of the dihedral angle’s number. Elastic Network Model(ENM) methods neglect the coupling between the retained degrees of freedom and those that were ignored, although such a coupling may be essential to describe the conformational change. Thus, it is important to consider other methods that take into account all the degrees of freedom. One way is using iterative methods to diagonalize the Hessian matrix [70,84,10]. There are three main iterative methods. One is the method based on Rayleigh quotient, and another is perturbation method and the other is DIMB method. Mouawad and Perahia introduced an iterative method of diagonalization in a mixed basis(DIMB) in 1993 [84,61,1]. The method was implemented in the CHARMM [30]. Moreover, Tama et. al. proposed the rotation translation block(RTB) methods in which set of residues was regarded as a rigid block having six translation-rotation degrees of freedom [23]. This approach reduces drastically the size of the Hessian matrix and thus the computational time. In conclusion, computational techniques such as the diagonalization in a mixed basis(DIMB) or the rotation translation block(RTB) approach enable NMA calculations in a very short amount of time for large biological molecules. Potential energy functions used for NMA When NMA is applied to the biological molecules, the quality of the normal modes usually depends on reaching a true minimum on the potential energy. Thus, for large biological molecules, the minimizaiton procedure discussed above is not trivial. Generally the minimization process needs to be performed until it is sufficient to ensure that real frequencies for all modes are not connected with translation or rotation of the entire molecule. Tirion proposed a simplified model in 1996 which can be used to overcome the major issues regarding the minimization procedure [89]. In this simplified model, the biological molecule can be described as a three-dimensional elastic network on which the probability distribution of atomic fluctuation is normal in an experimentally determined structure. One of the model which satisfied those conditions is Gaussian Network Model [1]. In this model, because the energy function is at the minimum, it is possible to eliminate the need for minimization prior to NMA. As a result, NMA can be applied directly on crystal coordinates of NMR structures. Hence, it is now possible to determine whether two crystal forms of a protein, open form and closed form, are interconvertible using the slow modes as coordinates [89]. Those procedures reduces a lot of the computational expense and makes NMA an ideal candidate for study of conformational change pathways between two states [6,21,40,48,68,69,72,74]. Here, the direction of low frequency normal modes can be used to deform the structure from its original conformation to other known conformations. Due to the nonlinearity of the conformational change, one needs to deform the structure iteratively so that it fits optimally. This concept was introduced in the normal mode flexible fitting(NMFF) method [46,75]. Those iteration procedure may require hundreds of NMA. Such deformations would cause the structure to move from its minimun, and therefore a minimization process would be necessary to perform NMA on the deformed structure. However, with elastic network, one can directly apply NMA without additional minimization. More importantly, the elastic network model can be used for protein models without atomic detail. Using the coarse-grained representation of a biological molecule reduces the size of the Hessian matrix. Thus, the model improves the computational efficiency 12 JUN KOO PARK dramatically, which is essential for the study of large biological molecules [13,25,26]. Normal modes are properties of the shape of Biomolecules As advanced models and computational techniques developed based on lots of studies, the role of shape and form has became clear. From all-atom model to multi-resolution elastic network models, those advances has shown the connection between the shape and governing the dynamics of biological molecule. In fact, this elastic network model can be used to express the slow dynamical properties of proteins with high degree of accord [24]. Around high frequencies this accord tends to disagree ,however, there have been many studies showing that collective motions found in the low frequency normal modes effectively describe biologically relevant conformational changes. The good agreement between experimental data and the modes constructed from these methods suggests that low frequency normal modes are a predominant property of the shape of the molecular system [5,39,91]. In other words, the character of these low frequency normal modes is mainly due to the properties of the shape of the biological molecules [65]. Thus, it means that an atomic representation of the biological molecules is not needed to obtain its dynamical properties, rather only its shape would be necessary. These observations indicates that larger biological assemblies may have been designed in a way that they adopt a specific form from which specific dynamics can arise [49,82]. Furthermore, those suggests that these machines then use these motions that are the property of the shape. Hence, the shape corresponding to dynamical properties of the structure is the key to understand the function of biological system. 2.2 Applications Conformational changes Theory of normal mode can be used to investigate the dynamics of molecules of biological object. Such area includes NMR order parameters, the study of large-scale conformational changes of proteins [6,21,40,48,68,69,72,74], B-factor refinement in crystallography [71,77,78], and so on. Moreover, NMA has also been used to investigate DNA and RNA dynamics [42]. Nowadays, because of advanced computational techniques, many studies indicates the utilization of NMA to cope with problems in connection with dynamics of large biological systems. In fact, with the help of NMA, dynamical properties of the molecules can be obtained in a very short amount of time compared to molecular dynamics simulations [18,54]. From the discussion above, the low frequency normal modes often correlate well with experimentally observed conformational changes [8,24]. A few low-frequency normal modes are able to describe the most visible motions occurring in the system and those are expected to be related with function of given biological molecules. For these reasons, NMA has been used in many studies and the study focused only on a few large-amplitude/low-frequency normal modes. The fact that studies on the normal modes of a biological system are able to give insights to dynamics of large-scale conformational change of protein is quite interesting. Such studies have helped our understanding of the functioning of large molecular biological machines. NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 13 Furthermore, based on the theory of NMA, feasible pathways between two distinct structurally determined conformations of a biological molecule could be constructed by using a linear combinations of the low frequency normal modes. The explanation of the pathways for conformational changes in biomolecular systems may be helpful in the analysis of functional rearrangements. To find those pathways, models of intermediate structures between the two known conformational states of the molucule needs to be generated. This can be approached in various ways using NMA [93]. Figure 2.2 illustrates the conformational change pathway of the maltose binding protein [65]. The protein starts with the open form and a linear combination of the low frequency normal modes yield the intermediate conformation iteratively. Finally, the protein reaches the closed structure. Figure 2.2 The basic idea is to follow the lowest frequency nomral modes directions of the system between the two end point states. One way is using the linear interpolation between endpoints using normal mode directions. And the other way is using nonliner/iterative approach [19]. Figure 2.3 shows the nonlinear conformational change between the open and closed forms of the adenylate kinase [1]. The lowest frequency normal mode of the open structure illustrates the preferential direction of the conformational change. Although the conformational change pathway is nonlinear, the normal modes provide only linear motions. Thus, the mode itself is not enough to lead the structure to the closed form. From the figure 2.3, the direction of the conformational change varies as the structure is changing from the initial to the final structure. Furthermore, if the molecule is moving along the lowest normal mode direction too far, the local structure may be distorted. Because of the those problems, performing the NMA and conformational deformations in an iterative manner is essential to find the intermediate structure [46,75]. 14 JUN KOO PARK Figure 2.3 Interpretation of experimental data NMA has been used to refine crystallographic B-factors obtained from X-ray crystallography since 1990s. Nowadays, due to the development of computational techniques for NMA, new ways began to open to complement structural experimental data from experimental tools. For the most part, normal mode vectors can be used as search directions to mimic protein dynamics to achieve a better fit to the experimental data. [46,75] More recently, NMA was also introduced to improve the molecular replacement technique used in X-ray crystallography. Indeed, structures of unknown large biological molecules could be modeled using data from previously known structural fragments. However, it might be the case that the conformation would not exactly match the known templates. By exploring the dynamics of these templates using NMA, one can generate several conformations that can be examined as possible solutions for molecular replacements. Compared to a full-atomic normal mode approach, one of the advantage of elastic network model is that the techniques of NMA can be applied to low resolution structural data. Recently, new methodologies based on NMA have also appeared to use NMA in connection with lower resolution structural information to predict the structure or the conformational change of biological molecules. As a matter of fact, Chacon et al. showed functionally important rearrangements of several large biological molecules with the applications of NMA to low resolution data [55]. The low resolution information can be obtained from density maps measured by cryogenic electron microscopy (cryo-EM). In the case of cryo-electron microscopy experiments [31], it is often the case that a low resolution structure of the biological molecule is obtained but its conformation differs from a known X-ray structure. To interpret the low resolution data, one would like to deform the known X-ray structure so that it would fit into the low resolution data. As low frequency normal modes often represent functional motions, those modes can be used to deform the structure. In other words, atomic motions corresponding to low-frequency normal modes NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 15 are not localized but globalized. Moreover, a description of the dynamics can be applied directly to 3D image reconstructions from EM, as it is independent of the resolution of the underlying data. Therefore, the structural information at low resolution can be used to construct discrete multi-resolution models using the techniques of vector quantization [94]. This technique can be used to represent a 3D density map in real space with a finite number of so-called landmark points. Figure 2.4 Figure 2.4 illustrates how NMA can be applied to a low-resolution electron microscopy map of E. coli RNA polymerase [63,66]. The figure shows that how the lowest-frequency normal modes can be extracted from a low-resolution electron microscopy map in the absence of a known atomic structure. The lowest frequency normal modes can describe the essential motions. From the left figure, the EM map is represented by 1500 landmarks (green) distributed by vector quantization according to the map density. Those landmarks were computed with vector quantization algorithm implemented in the Situs package [66]. The elastic network is formed by close-neighbor interactions (red). From the right, the lowest-frequency mode is obtained by NMA. The arrow indicates the direction and relative amplitude of the displacements. Thus large conformational changes can be elucidated directly from low-resolution structural data through the application of elastic normal mode theory and vector quantization. Database for the NMA As the information of protein structure has been accumulated, the necessity to use given data has been raised. As a result, a lot of databases were made to investigate the dynamical properties of protein more deeply and accurately and computer simulations has played an important role as a tool for understanding functions of protein [38,41]. Although NMA is based on the harmonic approximation, the results from NMA are less time-consuming 16 JUN KOO PARK and reasonable than Monte Carlo simulation and molecular dynamics [32], and hence databases of NMA has been constructed such as ProMode, ElNemo, WEBnm, NOMADRef and so on [36,50,67]. Promode is based on full-atom model [45], and it can provide more realistic molecular motions in full-atom details [35]. Moreover, comparisons between homologous proteins can be possible [3] and search can be manipulated by a new knowledge information [83]. ElNemo provides not only a fast and simple tool to analyze low-frequency modes but also generate a large number of different starting templates for molecular replacement [33]. One of the strengh of ElNemo is that there is no upper limit to the size of the proteins that can be treated because rotation-translation block (RTB) approximation is implemented 3. Gaussian Network Model 3.1. Theory Let’s consider the representation of nodes(residues of protein) in elastic network of GNM. Assume that every node is connected to its spatial neighbors with uniform spring constant. For ith and j th nodes, Let Ri0 and Rj0 denote the distance from the origin to the ith and j th residue of protein, respectively. So Ri0 and Rj0 are equilibrium position vec0 denote the equilibrium distance vector between Ri0 and tors, respectively. Similarly, Rij Rj0 . Also, ∆Ri and ∆Rj means the instantaneous fluctuation vectors of each ith , j th node. Instantaneous position vectors of these nodes are defined by Ri and Rj , and Rij is instantaneous distance vector between ∆Ri and ∆Rj . Therefore, Rj − Rj0 = ∆Rj , Ri − Ri0 = ∆Ri 0 Rij − Rij = (Rj − Ri ) − (Rj0 − Ri0 ) = (∆Rj − ∆Ri ) = ∆Rij Potential energy of the Gaussian network In GNM method, a three dimensional protein structure is usually described as an elastic network connected by harmonic springs with a certain cutoff distance. Generally, in the reduced model, the residue of a protein is considered as a location of Cα atoms of a protein, and a sequence of such residues connected with strings forms an elastic network. In the mathematical field of graph theory, the Laplacian matrix, sometimes called admittance matrix or Kirchhoff matrix, is a matrix representation of a graph. This matrix can be used to express the connectivity of the residue of the protein. Contact matrix of Cα atoms of a protein is constructed by using the Kirchhoff matrix Γ, defined by if i 6= j and Rij < rc −1 0 if i 6= j and Rij > rc Γij = PN − j,j6=i Γij if i = j where rc is a cutoff distance for spatial interactions, usually 7Å for proteins [80]. The energy function can be used to describe the interaction among the residues. When a protein changes from an arbitrary position to an equilibrium state, the connected residues NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 17 can be considered as a set of masses contacting with springs. Thus around their equilibrium positions, using the harmonic potential approximation, potential energy of the network can be defined in terms of ∆Ri and ∆Rj , instantaneous fluctuation vectors of each ith , j th node respectively. From the integral of Hooke’s Law, Ep can be expressed in terms of ∆Xi , ∆Yi and ∆Zi components of ∆Ri as follows N γ X 2 2 2 Ep = Γij [(∆Xj − ∆Xi ) + (∆Yj − ∆Yi ) + (∆Zj − ∆Zi ) ] 2 i,j for some force constant γ. If we express the X, Y and Z components of the fluctuation vectors ∆Ri as ∆X = [∆X1 , ∆X2 , · · · , ∆XN ]T , ∆Y = [∆Y1 , ∆Y2 , · · · , ∆YN ]T and ∆Z = [∆Z1 , ∆Z2 , · · · , ∆ZN ]T , then above equation simplifies to Ep = γ [∆X T Γ∆X + ∆Y T Γ∆Y + ∆Z T Γ∆Z] 2 [1] Furthermore, in the field of statistical mechanics, each fluctuation vector ∆X, ∆Y and ∆Z has to follow the Maxwell-Boltzmann distribution, respectively p(∆X) = −Ep (∆X) 1 γ 1 exp( )= exp(− ∆X T Γ∆X) N kB T N 2kB T where kB is the Boltzmann constant, T is the absolute temperature and N is the normalization factor [58]. Similarly, p(∆Y ) and p(∆Z) can be expressed. Note that because p(∆X) is the probability density function(PDF), Z Z 1 γ p(∆X)dX = exp(− ∆X T Γ∆X) = 1 N 2k T B Rn Rn Thus, Z N= exp(− Rn γ ∆X T Γ∆X) 2kB T This model, which also referred to as the Gaussian Network Model [88], can be used to understand the movement of each residue in the protein mechanically. Especially, we are interested in calculating expectation values of a particular residue fluctuations socalled mean-square fluctuation or the correlations between the fluctuations of two different residues. Based on the statistical physics foundations of GNM, mean-square fluctuation for each residue h∆Ri , ∆Ri i, and cross-correlations, h∆Ri , ∆Rj i can be evaluated. Because the mean square fluctuations for particular residue i, is the sum of component fluctuations, h∆Ri , ∆Ri i = h∆Xi , ∆Xi i + h∆Yi , ∆Yi i + h∆Zi , ∆Zi i 18 JUN KOO PARK Because cross-correlations between the fluctuation of two different residues, i and j, is the sum of component cross-correlations, h∆Ri , ∆Rj i = h∆Xi , ∆Xj i + h∆Yi , ∆Yj i + h∆Zi , ∆Zj i Definition 3.1.1 Suppose that X1 , · · · , Xn has a normal distribution N(µ1 , σ12 ), · · · , N(µn , σn2 ), respectively. Then an n-dimensional random variable X with mean vector µ and covariance matrix Σ is said to have a nonsingular multivariate normal distribution, if (i)Σ is positive definite, and (ii) the probability density function (PDF) of X is of the form 1 1 T −1 fX (x1 , · · · , xn ) = (x − µ)) 1 exp(− (x − µ) Σ n 2 2 2 (2π) |Σ| where |Σ| is the determinant of the covariance matrix [87,9]. If we combine the nonsingular and singular cases together, we have following definition. Definition 3.1.2 An n-dimensional random variable with mean vector µ and covariance matrix Γ is said to have a multivariate normal distribution ( in symbols Nn (µ,Σ)) if either X ∼ Nn (µ, Σ), Σ > 0, or X ∼ Nn (µ, Σ), |Σ| = 0 Under certain conditions, an n-dimensional random variable with mean vector µ and covariance matrix Γ can have a singular multivariate normal distribution [9]. From the above discussion, p(∆X) = 1 γ 1 1 kB T −1 −1 exp(− ∆X T Γ∆X) = exp(− (∆X T ( Γ ) ∆X)) N 2kB T N 2 γ By the Definition 3.1, we know that ∆X has a n-dimensional multivariate normal distribution with mean zero and the covariance matrix kBγ T Γ−1 ∆X ∼ N (0, kB T −1 Γ ) γ Therefore, the probability density function (PDF) for ∆X around the equilibrium positions is given by p(∆X) = 1 (2π) N 2 1 1 ( kBγ T ) 2 |Γ−1 | 2 1 kB T −1 −1 exp(− (∆X T ( Γ ) ∆X)) 2 γ Because the smallest eigenvalue of contact matrix Γ is zero, the determinant of Γ is zero. Thus, Γ−1 does not exist. Instead, pseudo inverse can be computed by using singular-value decomposition of Γ. Γ−1 is constructed using the N-1 non-zero eigenvalues and associated eigenvectors. NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 19 Recall. From calculus, for some constant a>0, r Z π 2 exp(−ax )dx = a R r Z π x2 exp(−ax2 )dx = 4a3 R Theorem 3.1.3 Suppose that X1 , · · · , Xn is the equilibrium component positions of the residues in a protein, Γ is the corresponding contact matrix, and ∆X1 , · · · , ∆Xn is the corresponding component residue fluctuations. Suppose that the singular value decomposition of Γ is given by UΛUT where U is orthogonal matrix and Λ is diagonal matrix with singular values. Then, n kB T X h∆Xi , ∆Xi i = Uij [Λ−1 ]jj Uij γ j=1 [56] Proof. Z h∆Xi , ∆Xi i = (∆Xi )2 p(∆Xi )d∆Xi R 1 −Ep (∆Xi ) (∆Xi )2 exp( )d∆Xi Ni kB T R Z 1 −γΓii (∆Xi )2 (∆Xi )2 exp( )d∆Xi Ni R 2kB T R 2 ii (∆Xi ) (∆Xi )2 exp( −γΓ2k )d∆Xi R BT R 2 ii (∆Xi ) exp( −γΓ2k )d∆Xi R BT r r π c −γΓii ,c = 3 4c π 2kB T kB T −1 [Γ ]ii γ kB T [U Λ−1 U T ]ii γ n kB T X Uij [Λ−1 ]jj Uij γ j=1 Z = = = = = = = [29,57,88] The above theorem shows that the mean-square fluctuation of residue can be evaluated with only simple calculations referred to as singular-value decomposition. Because h∆Xi , ∆Xi i = h∆Yi , ∆Yi i = h∆Zi , ∆Zi i = 1 h∆Ri , ∆Ri i 3 20 JUN KOO PARK the mean square fluctuation of the residue i is given by h∆Ri , ∆Ri i = 3kB T −1 [Γ ]ii γ and the cross-correlations between residue i and j is given by h∆Ri , ∆Rj i = 3kB T −1 [Γ ]ij γ Mode decomposition Based on the above discussion, the cross-correlations between residue i and j is h∆Ri , ∆Rj i = 3kB T −1 3kB T 3kB T [Γ ]ij = [(U ΛU T )−1 ]ij = [U Λ−1 U T ]ij γ γ γ 3kB T X 1 [ uk uTk ]ij = γ λk k The frequency and shape of a mode is represented by its eigenvalue and eigenvector, respectively. Since the Kirchhoff matrix is positive semi-definite, the first eigenvalue, λ1 , is zero and the corresponding eigenvector have all its elements equal to √1N . This shows that the network model is translation invariant. Here, the summation is performed over all non-zero eigenvalues of Γ. It is clear that the above equations allow us to calculate the amplitudes of fluctuations for individual residues without providing information regarding their absolute orientations or directions. However, in reality the fluctuations are anisotropic in general. And it is important to access the directions of collective motions, as these can be directly relevant to biological function and mechanisms. It is not indeed possible to acquire an understanding of the mechanism of motion unless the fluctuation vectors, in addition to their magnitudes, are elucidated. An extension of the GNM, called the anisotropic network model (ANM) address this issue [1,2]. The main difference between the GNM and ANM is that all residue fluctuations are isotropic in the GNM, while ANM does have directional preferences [32]. In the GNM, for the X, Y and Z directions, fluctuations of X, Y and Z are indepdent and identically distributed each other, therefore the joint probability distribution function (PDF) of all fluctuation can be expressed by P (∆R) = P (∆X, ∆Y, ∆Z) = p(∆X)p(∆Y )p(∆Z) = [p(∆X)]3 where P denotes the joint PDF of X, Y and Z and p denotes the PDF. This formula requires inverse of the Kirchhoff matrix. In the GNM, the determinant of Kirchhoff matrix is zero, hence calculation of its inverse requires singular value decomposition. Therefore, the probability distribution of all fluctuations in GNM becomes P (∆R) = p(∆X)p(∆Y )p(∆Z) = 1 (2π) 3N 2 3 T kB T −1 −1 exp(− (∆X ( Γ ) ∆X)) 2 γ ( kBγ T ) |Γ−1 | 3 2 3 2 NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 21 The relationship between NMA and GNM Theorem 3.1.4 The Gaussian Network Model is equivalent to the Normal Mode Analysis for evaluating the mean square residue fluctuations of a protein around equilibrium state, with the energy function defined for the residues instead of the atoms by [52,56] γ Ep ≈ [∆X T Γ∆X] 2 Proof. In the proof of Lemma 2.1.2, 1 Ep (r) = QT ΛQ 2 Thus, suppose that Ep (r) is defined by γ Ep (r) = QT ΛQ 2 Then, 6kB T −1 αj2 = [Λ ]jj γ Therefore, by Theorem 2.1.3 3n 3kB T X h∆ri , ∆ri i = Uij [Λ−1 ]jj Uij γ j=1 and Theorem 3.1.3, n kB T X Uij [Λ−1 ]jj Uij h∆Xi , ∆Xi i = γ j=1 Therefore, h∆Ri , ∆Ri i = n 3kB T X Uij [Λ−1 ]jj Uij γ j=1 Therefore, for both NMA and GNM, the mean square fluctuation for residue i can be evaluated by using the singular value decomposition. Hence the Gaussian Network Model is equivalent to the Normal Mode Analysis for evaluating the mean square residue fluctuations of a protein around equilibrium state given potential energy. 3.2 Applications As we know, in order to know a variety of biological functions of proteins, the knowledge of three dimensional structures of proteins and their dynamic behaviors are indispensable. Based on the GNM, to understand the dynamic behaviors of proteins, fluctuations of residue can be calculated around the equilibrium state. On the other hand, equilibrium fluctuations of biological molecules can be measured experimentally, too. One of the way to measure the fluctuations is based on the X-ray crystallography. In X-ray crystallography, B-factor (or temperature factor) of each atom is a measure of mean-squared 22 JUN KOO PARK fluctuation of the native structure. The other way to obtain the fluctuations is established on the work of nuclear magnetic resonance (NMR). In NMR experiments, this measure can be obtained by calculating root-mean-squared differences between different models. In many applications and publications [17,22,27], including the original articles [29], it has been shown that expected residue fluctuations obtained from GNM is in good agreement with the experimentally measured native state fluctuations. The relation between B-factors, for example, and expected residue fluctuations obtained from GNM is as follows Bi = 8π 2 8π 2 kB T −1 h∆Ri , ∆Ri i = (Γ )ii 3 γ There is a single parameter in the theory, the force constant γ. This parameter is a measure of the strength of inter-molecular potential energy that stabilize the native folding. In the GNM theory, γ has been estimated for each protein by comparing above equation. Figure 3.1 NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 23 Figure 3.1 [1](a) compares the B-factors evaluated by experimental data(open circles) and those predicted from GNM for ribonuslease T1(RNase T1) in Protein Data Bank. (b) shows the agreement between the mean square fluctuations predicted by the GNM and those obtained across the 20 NMR models in PDB for reduced disulphide-bond formation facilitator (DsbA) [92]. The correlation coefficient between GNM results and experimental data for these two example protein are 0.769 and 0.823, respectively. Physical meanings of slow and fast modes Recall. The cross-correlations between residue i and j is h∆Ri , ∆Rj i = 3kB T X 1 [ uk uTk ]ij γ λk k Then, consider the correlation h∆Ri , ∆Rj ik , contributed by the kth mode h∆Ri , ∆Rj ik = 3kB T 1 [uk ]i [uk ]j γ λk where [uk ]i is the ith element of uk . Similarly, we can consider mean square fluctuation for residue i, contributed by the kth mode h∆Ri , ∆Ri ik = 3kB T 1 [uk ]2i γ λk Because U is unitary, uTk uk =1, uk is normalized. Thus, the plot of [uk ]2i for each residue i=1,· · · ,n generalizes the normalized distribution of mean square fluctuations in the kth mode. This plot is called the kth mode shape. Because the residue fluctuations are related with the B factors, [uk ]i represents the mobilities of residue i in the kth mode. Due to the factor λ1k , the slowest mode has the largest contribution. Moreover, in general, the slowest motions have the highest degree of collectivity. In other words, many residues are involved in the slow motion [1]. Shapes of the slowest modes reveal the mechanisms of global motions, and the most constrained residues (minima of residue fluctuation) in these modes, play a important role that govern the correlated movements of entire domains [17,85]. Although these motions are slow, they involve substantial conformational change over several residues. On the other hand, the fastest modes involve the most tightly packed and constrained residues in the molecule. Figure 3.2 [1] compares the degree of collectivity for the slowest and fastest modes for a protein, RNase T1. Figure (a) illustrates the broad and delocalized peaks over the residues in the slow mode. In contrast, Figure (b) shows that the fastest modes are highly localized with only a few peaks. 24 JUN KOO PARK Figure 3.2 Diagonalization of the Kirchhoff matrix decomposes the normal modes of collective motions of the Gaussian network model of a biomolecule. The expected values of fluctuations and cross-correlations are obtained from linear combinations of fluctuations along these normal modes. The contribution of each mode is scaled with the inverse of that modes frequency. Hence, slow (low frequency) modes contribute most to the expected fluctuations. Along the few slowest modes, motions are shown to be collective and global and potentially relevant to functionality of the biomolecules. Fast (high frequency) modes, on the other hand, describe uncorrelated motions not inducing notable changes in the structure. Database based on the GNM To control and understand the function of protein, not only the knowledge of protein structure but also the dynamics of protein is required because a function is a dynamic property. In this sense, characterizing the dynamics of proteins is essential. Yang et al. computed protein dynamics from protein data bank based on the GNM(PDB). The singular value decomposition of contact matrix Γ is the most expensive part in GNM calculations [7]. They also generated information at the residue level around equilibrium position and the results are saved on iGNM, a web-based system. Static and dynamic images of protein are accessible. Moreover, from oGNM, users are able to evaluate the shape and dispersion of normal modes. Computations with the new engine are faster than those using conventional normal mode analyses [11]. Based on the many studies, NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 25 usefulness of the GNM with EN methods are demonstrated. Thus, effective calculation tool, such as oGNM is expected to be a useful resource in their research field. 4. Concluding Remarks In this paper, the original and historical background of Normal Mode Analysis was reviewed. Since the establishment of normal mode theory, it has been used in a variety of fields such as B-factor refinement in crystallography, NMR order parameters, the study of large-scale conformational changes of proteins [48,68,69,72,74]. Until the development of diagonalization in mixed basis(DIMB) method, normal mode studies remained limited to proteins of less than 300 residues. By using DIMB method, studying on large molecular assembly comprising 2760 residues was successful [1] contrary to methods based on Rayleigh quotient [70] or on the perturbation approach [10]. Evolution in diagonalizaiton techniques such as DIMB or RTB with the coarse-grained elastic network model has reduced computational expense on large biological system. In particular, as techniques in X-ray crystallography have progressed, structures of larger biological molecules have become available and new techniques for NMA were needed to address dynamics of these larger molecules. A few lowest frequency normal modes are related to the motions of structural domains [20]. Thus, NMA is useful for studying the mechanical details of the motions if the motion has a collective character and amplitude is large enough [23,47]. Moreover, perturbing the structure along its lowset-frequency modes seems promising tool for molecular replacement problems [12], structure refinement [44] and fitting atomic structures into low resolution electron desity maps [19]. Regarding theory of normal mode, mean square fluctuation was reviewed mathematically. In other words, residue fluctuation can be expressed as a sum of simple oscillator provided that singular value decomposition of the Hessian matrix is given. For the larger biological molecules, classical molecular dynamics simulation has been limited due to their complexity. Based on the normal mode analysis approach, analysis for conformational rearrangement and low resolution experimental data of protein can be possible. Furthermore, a key to understand the dynamics of protein is the shape dependent on the structure. The other method to understand dynamics of protein is Gaussian Network Model based on the theory of elastic network and Gaussian distribution combined with the assumption that the structure is near its equilibrium state [1]. Based on the statistical physics foundations, mean-square fluctuation of residue can be evaluated with only simple calculations referred to as singular-value decomposition. More importantly, the Gaussian Network Model is equivalent to the Normal Mode Analysis for evaluating the mean square residue fluctuations of a protein around equilibrium state combined with same energy function. Therefore, the GNM can be used in several areas that the NMA applied such as B-factor refinement in crystallography, NMR order parameters and so on [17,22,27,29]. Furthermore, GNM can be used to understand the relationship between the degree of collectivity and the modes. Shapes of slow modes reveal the dynamics of global motions and fastest modes involve the most constrained residues [1]. 26 JUN KOO PARK References [1] Cui, Q., Bahar, I. (2006). Normal Mode Analysis: Theory and Application to biological and chemical systems. Chapman & Hall/CRC [2] Eran Eyal, Lee-Wei Yang and Ivet Bahar (2006). Anisotropic Network Model: Systematic Evaluation and a New Web Interface [3] Wako, H., Otsuka, M., Tomizawa, Y., Kato, M., Endo, S. (2005). Comparisons between Dynamic Properties of Homologous Protein Structures in ProMode. [4] (2003). Classical Normal Mode Analysis: Harmonic Approximation. (lecturenote) http://www.colby.edu/chemistry/PChem/notes/NormalModesText.pdf [5] Tama, F., Valle, M., Frank, J., Brooks C.L. (2003). Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy. PNAS, 100(16), 9319-9323 [6] Valadie, H., Lacapc?re, J.J., Sanejouand, Y.-H., Etchebest, C. (2003). Dynamical Properties of the MscL of Escherichia coli: A Normal Mode Analysis, Journal of Molecular Biology, 332, 657-674 [7] Yang,L.W., Liu,X., Jursa,C.J., Holliman,M., Rader,A.J., Karimi,H.A. and Bahar,I. (2005). iGNM: a database of protein functional motions based on Gaussian Network Model. Bioinformatics, 21, 2978-2987. [8] Van Wynsberghe, A. W., Cui, Q. (2006). Interpreting Correlated Motions Using Normal Mode Analysis. Structure, 14, 1647-1653 [9] Tong, Y. (1990). The Multivariate Normal Distribution. Srpinger-Verlag [10] Durand, P., Trinquier, G., Sanejouand, Y. (1994). A new approach for determining low frequency normal modes in macromolecules, Biopolymers, 34, 759 [11] Yang, L.W., Rader, A. J., Liu, X., Jursa, C.J., Chen, S.C., Karimi, H.A., Bahar, I. (2006). oGNM: online computation of structural dynamics using the Gaussian Network Model. Nucleic Acids Research, 34, W24-W31 [12] Suhre,K. and Sanejouand,Y.-H. (2004) On the potential of normal-mode analysis for solving difficult molecular-replacement problems. Acta Crystallogr., D60, 796-799. [13] Kondrashov, D.A., Cui, C., Phillips, G.N. (2006). Optimization and Evaluation of a Coarse-Grained Model of Protein Motion Using X-Ray Crystal Data. Biophysical Journal, 91, 2760-2767 [14] Schlick, T.(2003). Molecular Modeling and Simulation, An Interdisciplinary Guide, Springer [15] Eman, B. (2006). The Gaussian Network Model: Precise Prediction of Residue Fluctuations and Application to Binding Problems. Biophysical Journal, 91, 3589-3599 [16] Li, G., and Q. Cui. (2002). A coarse-grained normal mode approach for macromolecules: an efficient implementation and application to Ca21-ATPase. Biophys. J. 83:2457-2474. [17] Bahar, I., Atilgan, A., Demirel, C., Erman, B. (1998). Vibrational Dynamics of Folded Proteins: Significance of Slow and Fast Motions in Relation to Function and Stability. Physical Review Letters, 80(12), 2733-2736. [18] Micheletti, C., Carloni, P., Maritan, A. (2004). Accurate and efficient description of protein vibrational dynamics: comparing molecular dynamics and Gaussian models. Proteins, 55, 635-645 [19] Tama, F., Miyashita, O., Brooks III, C. (2004). Flexible multi-scale fitting of atomic structures into low-resolution electron density maps with elastic network normal mode analysis. J. Mol. Biol, 337, 985-999 [20] Hinsen, K. (1998). Analysis of Domain Motions by Approximate Normal Mode Calculations, Proteins, 33:417-429 [21] Hinsen, K., Thomas, A, Field, M.J. (1999). Analysis of Domain Motions in Large Proteins, Proteins, 34:369-382 [22] Atilgan, A. R. , Durell, S. R., Jernigan, R. L., Demirel, M. C., Keskin, O. and Bahar I. (2001). Anisotropy of Fluctuation Dynamics of Proteins with an Elastic Network Model, Biophysical Journal, 80:505-515 [23] Tama, F., F. X. Gadea, O. Marques, and Y. Sanejouand. (2000). Building block approach for determining low-frequency normal modes of macromolecules. Prot. Struct. Funct. Gen. 41:1-7. [24] Petrone, P., Pande, V.S. (2006). Can Conformational Change Be Described by Only a Few Normal Modes? Biophysical Journal, 90, 1583-1593 NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 27 [25] Bahar,I. and Rader,A.J. (2005) Coarse-grained normal mode analysis in structural biology. Curr. Op. Struct. Biol., 15, 586-592. [26] Eom, K., Na, S. (2008). Coarse-Grained Structural Model of Protein Molecules. Preprint to the chapter of the book ”Computational Biology: New Research” [27] Keskin, O., Jernigan, R., Bahar, I. (2000). Proteins with Similar Architecture Exhibit Similar LargeScale Dynamic Behavior. Biophysical Journal, 78, 20932106 [28] Frauenfelder, H., Parak, F., Young, R. (1988). Conformational substates in proteins. Ann. Rev. Biophys. Chem, 17, 451-479 [29] Bahar, I., Atilgan, A.R., Erman, B. (1997). Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential, Folding & Design, 2, 173-181 [30] Brooks, B., Bruccoleri, R. Olafson, B., States, D., Swaminathan, S., Karplus, M. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comp. Chem, 4, 187 [31] Tama, F., Valle, M., Frank, J., Brooks, C.L. (2003). Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy. PNAS, 100(16), 9319-9323 [32] Doruker, P., A. R. Atilgan, and I. Bahar. (2000). Dynamics of proteins predicted by molecular dynamics simulations and analytical approaches: application to a-Amylase inhibitor. Prot. Struct. Funct. Gen. 40:512-524. [33] Suhre,K. and Sanejouand,Y.H. (2004) ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res., 32, W610W614. [34] Brooks BR and Karplus M (1983) Harmonic dynamics of proteins: normal mode and fluctuations in bovine pancreatic trypsin inhibitor. Proceedings of the National Academy of Sciences of the USA 80: 6571-6575. [35] Wako, H., Kato, M., Endo, S. (2003). Improvements in ProMode (a Database of Normal Mode Analyses of Proteins). Genome Informatics, 14, 663-664 [36] Lindahl, E., Azuara, C., Koehl, P., Delarue, M. (2006). NOMAD-Ref: visualization, deformation and refinement of macromolecular structures based on all-atom normal mode analysis. Nucleic Acids Research, 34, W52-W56 [37] Perahia, D., Mouawad, L. (1995). Computation of low-frequency normal modes in macromolecules: improvements to the method of diagonalization in a mixed basis and application to hemoglobin. Comput. Chem, 19, 241 [38] Krebs,W.G., Alexandrov,V., Wilson,C.A., Echols,N., Yu,H. and Gerstein,M. (2002) Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic. Proteins: Struct. Funct. and Genet., 48, 682-695. [39] Lu, M., Ma, J. (2005). The role of shape in determining molecular motions. Biophys. J., 89, 2395-2401 [40] Tama F. (2003). Normal mode analysis with simplified models to investigate the global dynamics of biological systems. Protein and Peptide Letters, 10(2), 119-132 [41] Alexandrov,V., Lehnert,U., Echols,N., Milburn,D., Engelman,D. and Gerstein,M. (2005) Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool. Prot.Sci., 14, 633-643. [42] Van Wynsberghe, A., Li, G., Cui, Q. (2004). Normal-Mode Analysis Suggests Protein Flexibility Modulation throughout RNA Polymerase’s Functional Cycle. Biochemistry, 43, 13083-13096 [43] Harary, F. (1971) Graph theory. Addison-Wesley. [44] Delarue,M. and Dumas,P. (2004) On the use of low-frequency normal modes to enforce collective movements in refining macromolecular structural models. Proc. Natl. Acad. Sci., 101, 6957-6962. [45] Wako, H., Endo, S. (2002). ProMode: A Database of Normal Mode Analysis of Proteins. Genome Informatics, 13, 519-520 [46] Gorba, C., Miyashita, O., Tama, F. (2008). Normal mode flexible fitting of high resolution structure of biological molecules toward one dimensional low resolution data. Biophysical Journal, 94, 15891599 [47] Delarue,M. and Sanejouand,Y.-H. (2002) Simplified normal mode analysis of conformational transitions in DNA-dependent polymerases: the elastic network model. J. Mol. Biol., 320, 1011-1024. 28 JUN KOO PARK [48] Thomas, A., K. Hinsen, M. J. Field, and D. Perahia. (1999). Tertiary and quaternary conformational changes in aspartate transcarbamylase: a normal mode study. Prot. Struct. Funct. Gen. 34:96-112. [49] Ma JP (2005) Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure 13(3): 373-380. [50] Hollup,S.M., Salensminde,G. and Reuter,N. (2005) WEBnm@: a web application for normal mode analysis of proteins. BMC Bioinformatics, 6, 52. [51] Cui, Q., Li, G., Ma, J., Karplus, M. (2004). A Normal Mode Analysis of Structural Plasticity in the Biomolecular Motor F1-ATPase. Journal of Molecular Biology, 340, 345-372 [52] Levitt, M., Sander, C., Stern, P.S. (1985). Protein Normal-mode Dynamics: Trypsin Inhibitor, Crambin, Ribonuclease and Lysozyme. Journal of Molecular Biology, 181, 423-447 [53] Li, G., Cui, Q. (2004). Analysis of Functional Motions in Brownian Molecular Machines with an Efficient Block Normal Mode Approach: Myosin-II and Ca21-ATPase. Biophysical Journal, 86, 743763 [54] Yu, X., Park, J., Leitner, D. (2003). Thermodynamics of Protein Hydration Computed by Molecular Dynamics and Normal Modes. Journal of physical chemistry, 107, 12820-12828 [55] Chacon, P., Tama, F., Wriggers, W. (2003). Mega-Dalton biomolecular motion captured from electron microscopy reconstructions. J. Mol. Biol, 326, 485-492 [56] Wu, Z. (2006). Linear Algebra in Biomolecular Modeling. Handbook of Linear Algebra, Chapman/Hall CRC Press [57] Kloczkowaski, A., Mark, J., Erman, B. (1989). Chain Dimensions and Fluctuations in Random Elastomeric Networks. 1. Phantom Gaussian Networks in the Undeformed State. Macromolecules, 22, 1423-1432 [58] Huang, Kerson (1990). Statistical Mechanics. Wiley, John & Sons, Inc. ISBN 0-471-81518-7. [59] Alder, B. J.; Wainwright, T. E. (1959). Studies in Molecular Dynamics. I. General Method. Journal of Chem. Phys. 31(2), 459 [60] Go, N., T. Noguti, and T. Nishkawa. (1983). Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci. USA. 80:3696-3700. [61] Perahia, D., Mouawad, L. (1995). Computation of low-frequency normal in macromolecules: improvements to method of diagonalization in a mixed and application to hemoglobin. Computers Chem., 19(3), 241-246 [62] Marques, O., and Y. Sanejouand. (1995). Hinge-bending motion in citrate synthase arising from normal mode calculations. Prot. Struct. Funct. Gen. 23:557-560. [63] Darst, S., Opalka, N., Chacon, P., Polyakov, A., Richter, C., Zhang, G. (2002). Conformational flexibility of bacterial RNA polymerase. PNAS. 99(7), 4296-4301 [64] Hayward S. (2001). Normal mode analysis of biological molecules. In Computational biochemistry and biophysics. New-York, Marcel Dekker, Inc., 153-168. [65] Miyashita, O., Tama, F. (2007).Normal Mode Analysis Techniques in Structural Biology. Encyclopedia of life Sciences, John Wiley & Sons. [66] http://emotion.biomachina.org/about.html [67] Delarue M, Lindahl E. (2004). Normal mode calculation and visualization using Pymol. [http://lorentz.immstr.pasteur.fr/nma/]. [68] Reuter N, Hinsen K, Lacapere JJ. (2003). Transconformations of the SERCA1 Ca-ATPase: a normal mode study. Biophys J, 85:2186-2197. [69] Brink J, Ludtke SJ, Kong YF,Wakil SJ, Ma JP, Chiu W. (2004). Experimental verification of conformational variation of human fatty acid synthase as predicted by normal mode analysis. Structure 12:185-91 [70] Brooks, B., and Karplus, M. (1985). Normal Modes for Specific Motions of Macromolecules: Application to the Hinge-Bending Mode of Lysozyme. Proc. Natl. Acad. Sci. USA 82, 4995-4999. [71] Kidera, A., Inaka, K., Matsushima, M., Go, N. (1991). Normal mode refinement: Crystallographic refinement of protein dynamic structure applied to human lysozyme. Biopolymers, 32(4), 315-319. [72] Ma, J., and Karplus, M. (1997). Ligand-induced conformational changes in ras p21: a normal mode and energy minimization analysis. J. Mol. Biol. 274, 114-131. [73] Wu, Z. Lecture notes on computational structural biology. NORMAL MODE ANALYSIS AND GAUSSIAN NETWORK MODEL 29 [74] Seno, Y., and Go, N. (1990b). Deoxymyoglobin studied by the conformational normal mode analysis. I. Dynamics of globin and the heme-globin interaction. J. Mol. Biol. 216, 95-109. [75] Tama, F., Miyashita, O., Brooks, C.L. (2004). Normal mode based flexible fitting of high-resolution structure into low-resolution experimental data from cryo-EM. Journal of Structural Biology, 147, 315-326 [76] Thomas, A., Field, M.J., and Perahia, D. (1996b). Analysis of the Low-frequency Normal Modes of the R State of Aspartate Transcarbamylase and a Comparison with the T State Modes. J. Mol. Biol. 261, 490-506. [77] Tirion, M.M., ben-Avraham, D., Lorenz, M., and Holmes, K.C. (1995). Normal modes as refinement parameters for the F-actin model. BiAcad. ophys. J. 68, 5-12. [78] Tozinni, V., Mccammon JA. (2005). A coarse grained model for the dynamics of ap opening in HIV-1 protease. Chem. Phys. Lett. 413,123-128. [79] Moore, W. J.(1972). Physical Chemistry, 4th Ed., Prentice-Hall: Englewood Cliffs, NJ, , Chapter 17, Sec. 14., 775-776. [80] Godsil,C. (2001). Algebraic graph theory. New York : Springer ISBN:0387952209 [81] Serway, R., Jewett, J. (2003). Physics for Scientists and Engineers. Brooks/Cole. ISBN 0-534-408427. [82] Ma, J. (2004). New Advances in Normal Mode Analysis of Supermolecular Complexes and pplications to Structural Refinement. Current Protein and Peptide Science, 5(2), 119-123 [83] Wako, H., Otsuka, M., Tomizawa, Y., Kato, M., Endo, S. (2005). Reflection of Knowledge Information in ProMode. [84] Mouawad, L., Perahia, D. (1993). Diagonalization in a mixed basis: a method to compute lowfrequency normal modes for large macromolecules. Biopolymers, 33, 599-611. [85] Bahar, I., Jernigan, R. L. (1998). Vibrational dynamics of transfer RNAs: comparison of the free and synthetase bound forms. J. Mol. Biol. 281, 871-884 [86] Goldstein, H. (2002). Classical mechanics. Addison Wesley [87] Casella, G., Berger, R. (2002). Statistical inference. Duxbury. [88] Haliloglu, T., Bahar, I., Erman, B. (1997). Gaussian Dynamics of Folded Proteins. Physical Review Letters, 79(16), 3090-3093. [89] Tirion, M. (1996). Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis. Physical Review Letters, 77(9), 1905-1908. [90] Tama, F., Sanejouand, Y. (2001). Conformational change of proteins arising from normal mode calculations. Protein Engineering, 14(1), 1-6 [91] Tama, F., Brooks, C. (2006). Symmetry, Form, and Shape: Guiding Principles for Robustness in macromolecular Machines. Annu. Rev. Biophys. Biomo. Struct, 35: 115-133 [92] Wilson, K., Malcolm, B., Matthews, B. (1992). Structural and thermodynamic analysis of compensating mutations within the core of chicken egg white lysozyme. J. Biol. Chem, 267, 10842 [93] Xu, C., Tobi, D., Bahar, I. (2003). Allosteric Changes in Protein Structure Computed by a Simple Mechanical Model: Hemoglobin T ↔ R2 Transition, J. Mol. Biol, 333, 153-168 [94] Tama, F., Wriggers, W., Brooks III, C.(2002). Exploring Global Distortions of Biological Macromolecules and Assemblies from Low-resolution Structural Information and Elastic Network Theory. J. Mol. Biol, 321, 297305