EXPERIMENTAL AND THEORETICAL METHODS TO STUDY PROTEIN FOLDING

Experiments
• Thermal denaturation
• Chemical denaturation
• Mechanical unfolding
• Kinetic experiments
• Mutational studies

Techniques
• Differential scanning calorimetry (DSC)
• Spectroscopy
  – Circular dichroism (CD)
  – Fluorescence
  – Nuclear magnetic resonance (NMR)
  – Small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS)
• Atomic force microscopy (AFM)

[Figure: data for the wild type, the acid-denatured wild type, the L16A mutant, and the C-terminal peptide. Religa et al., J. Mol. Biol., 333, 977-991 (2003)]

Φ-values
• Φ = 0: the mutation affects the folded state but not the transition state.
• Φ = 1: the mutation affects both the folded state and the transition state.
Matouschek A, Kellis JT, Serrano L, Fersht AR. (1989). Mapping the transition state and pathway of protein folding by protein engineering. Nature 340:122.

[Figure: Millet et al., Biochemistry 41, 321-325 (2002)]

Structure of the closed and open forms of the DnaK (Hsp70) chaperone

Fluorescence studies of closing and opening of Hsp70
Mapa et al., Molecular Cell 38, 89, 2010.
Theoretical studies of protein structure and protein folding
• Need to express the energy of a system as a function of coordinates
• Need an algorithm to explore the conformational space

Energy expression in empirical force fields

\[
E = \underbrace{\sum_i \tfrac{1}{2} k_i^{d} \left(d_i - d_i^{0}\right)^2}_{E_s}
  + \underbrace{\sum_i \tfrac{1}{2} k_i^{\theta} \left(\theta_i - \theta_i^{0}\right)^2}_{E_b}
  + \underbrace{\sum_{i<j} \varepsilon_{ij} \left[ \left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{12} - 2 \left(\frac{r_{ij}^{0}}{r_{ij}}\right)^{6} \right]}_{E_{nb}}
  + \underbrace{332 \sum_{i<j} \frac{q_i q_j}{\varepsilon r_{ij}}}_{E_{el}}
  + \underbrace{\sum_i \left[ \frac{V_i^{(1)}}{2} \left(1 + \cos\varphi_i\right) + \frac{V_i^{(2)}}{2} \left(1 - \cos 2\varphi_i\right) + \frac{V_i^{(3)}}{2} \left(1 + \cos 3\varphi_i\right) \right]}_{E_{tor}}
\]

Partition of the energy of interactions with respect to topological distance
• Torsional interactions: E_tor
• 1,3-interactions: E_b only
• 1,5-interactions: E_el + E_VdW

Bond distortion energy

\[
E_s(d) = \tfrac{1}{2} k_d \left(d - d_0\right)^2
\]

Typical values of d0 and kd

Bond          d0 [Å]    kd [kcal/(mol Å^2)]
Csp3-Csp3     1.523     317
Csp3-Csp2     1.497     317
Csp2=Csp2     1.337     690
Csp2=O        1.208     777
Csp2-Nsp3     1.438     367
C-N (amide)   1.345     719

Comparison of the actual bond-energy curve with that of the harmonic approximation

Potentials that take into account the asymmetry of the bond-energy curve

Anharmonic potential, with k'_d denoting the cubic force constant:
\[
E_s(d) = \tfrac{1}{2} k_d \left(d - d_0\right)^2 + \tfrac{1}{6} k'_d \left(d - d_0\right)^3
\]

Morse potential:
\[
E_s(d) = D_e \left[ 1 - e^{-b \left(d - d_e\right)} \right]^2
\]

[Figure: E [kcal/mol] vs. d [Å] for the harmonic, anharmonic, and Morse potentials.]

Energy of bond-angle distortion

\[
E_b(\theta) = \tfrac{1}{2} k_\theta \left(\theta - \theta_0\right)^2
\]

Typical values of θ0 and kθ

Angle             θ0 [degrees]    kθ [kcal/(mol degree^2)]
Csp3-Csp3-Csp3    109.47          0.0099
Csp3-Csp3-H       109.47          0.0079
H-Csp3-H          109.47          0.0070
Csp3-Csp2-Csp3    117.2           0.0099
Csp3-Csp2=Csp2    121.4           0.0121
Csp3-Csp2=O       122.5           0.0101

Basic types of torsional potentials (E_tor in kcal/mol)
• Single bond between sp3 carbons or between an sp3 carbon and nitrogen.
  Example: C-C-C-C quadruplet, $E_{tor} = 1.6 \left(1 + \cos 3\varphi\right)$
• Double or partially double bonds.
  Example: C-C=C-C quadruplet, $E_{tor} = 30 \left(1 - \cos 2\varphi\right)$
• Single bond between electronegative atoms (oxygens, sulfurs, etc.).
  Example: C-S-S-C quadruplet, $E_{tor} = 3.5 \left(1 + \cos 2\varphi\right) + 0.6 \left(1 + \cos\varphi\right)$
[Figure: E_tor plotted against the dihedral angle [deg].]

Potentials imposed on improper torsional angles

\[
E_{tor} = V_2 \left(1 - \cos 2\varphi\right) + V_3 \left(1 + \cos 3\varphi\right)
\]

Nonbonded Lennard-Jones (6-12) potential

\[
E_{nb}(r) = \varepsilon \left[ \left(\frac{r^{0}}{r}\right)^{12} - 2 \left(\frac{r^{0}}{r}\right)^{6} \right]
          = 4 \varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right],
\qquad r^{0} = 2^{1/6} \sigma
\]

Combination rules:
\[
r_{ij}^{0} = r_i^{0} + r_j^{0}, \qquad \varepsilon_{ij} = \sqrt{\varepsilon_i \varepsilon_j}
\]

[Figure: E_nb [kcal/mol] vs. r [Å]; the minimum of depth -ε lies at r = r0.]

Sample values of ε_i and r0_i

Atom type           r0_i [Å]    ε_i [kcal/mol]
C (carbonyl)        1.85        0.12
C (sp3)             1.80        0.06
N (sp3)             1.85        0.12
O (carbonyl)        1.60        0.20
H (bonded with C)   1.00        0.02
S                   2.00        0.20

Other nonbonded potentials

Buckingham potential:
\[
E_{nb}(r) = A \exp\left(-\frac{r}{\rho}\right) - \frac{C}{r^{6}}
\]

10-12 potential, used in some force fields (e.g., ECEPP) for proton…proton-acceptor pairs:
\[
E_{hb}(r) = \frac{C}{r^{12}} - \frac{D}{r^{10}}
\]

Sources of parameters

Energy contribution               Source of parameters
Bond and bond-angle distortion    Crystal and neutron-diffraction data, IR spectroscopy
Torsional                         NMR and FTIR spectroscopy
Nonbonded interactions            Polarizabilities, crystal and neutron-diffraction data
Electrostatic energy              Molecular electrostatic potentials
All                               Energy surfaces of model systems calculated with molecular quantum mechanics

Solvent in simulations

Explicit water
• TIP3P
• TIP4P
• TIP5P
• SPC

Implicit water
• Solvent accessible surface area (SASA) models
• Molecular surface area models
• Poisson-Boltzmann approach
• Generalized Born surface area (GBSA) model
• Polarizable continuum model (PCM)

TIP3P and TIP4P models

Parameter           TIP3P       TIP4P
q(O) [e]            -0.834      0.00
q(H) [e]            0.417       0.520
q(M) [e]            (no M site) -1.040
O-M distance [Å]    (no M site) 0.15
H-O-H angle         104.52°     104.52°
σ_O [Å]             3.1507      3.1535
ε_O [kcal/mol]      0.1521      0.1550

Solvent accessible surface area (SASA) models

\[
F_{solv} = \sum_{i \in \text{atoms}} \sigma_i A_i
\]

σ_i: free energy of solvation of atom i per unit area
A_i: solvent accessible surface of atom i

Vila et al., Proteins: Structure, Function, and Genetics, 1991, 10, 199-218.
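The Lennard-Jones forms, the combination rules, and the factor 332 in the electrostatic term quoted in this part can be cross-checked numerically. The sketch below is only an illustration: the function names and the `dielectric` argument are hypothetical, and the carbonyl C and carbonyl O parameters are taken from the sample table, assuming r0 in Å and ε in kcal/mol.

```python
import math

# Factor converting e^2/Å to kcal/mol, as used in the electrostatic term.
COULOMB_CONSTANT = 332.0


def lj_r0_form(r, eps, r0):
    """E_nb = eps * [(r0/r)^12 - 2 (r0/r)^6]; the minimum -eps is at r = r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)


def lj_sigma_form(r, eps, sigma):
    """E_nb = 4 eps * [(sigma/r)^12 - (sigma/r)^6]; zero crossing at r = sigma."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


def combine(r0_i, eps_i, r0_j, eps_j):
    """Combination rules: r0_ij = r0_i + r0_j, eps_ij = sqrt(eps_i * eps_j)."""
    return r0_i + r0_j, math.sqrt(eps_i * eps_j)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Electrostatic energy in kcal/mol (charges in e, r in Å).

    The default dielectric of 1.0 is an assumption made for this example.
    """
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


# Carbonyl C ... carbonyl O pair built from the sample table.
r0_co, eps_co = combine(1.85, 0.12, 1.60, 0.20)
sigma_co = r0_co / 2.0 ** (1.0 / 6.0)  # from r0 = 2^(1/6) * sigma

if __name__ == "__main__":
    for r in (3.0, r0_co, 4.0, 6.0):
        print(r, lj_r0_form(r, eps_co, r0_co), lj_sigma_form(r, eps_co, sigma_co))
```

Both forms give the same curve once σ is derived from r0, which is why force-field tables can list either parameter.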
Comparison of the lowest-energy conformations of [Met5]enkephalin (H-Tyr-Gly-Gly-Phe-Met-OH) obtained with the ECEPP/3 force field in vacuo and with the SRFOPT model
[Figure panels: vacuum, SRFOPT]

Comparison of the molecular surfaces of the lowest-energy conformation of [Met5]enkephalin obtained without and with the SRFOPT model
[Figure panels: vacuum, SRFOPT]

Molecular surface area model

\[
F_{cav} = \gamma A
\]

γ: surface tension
A: molecular surface area

Generalized Born surface area (GBSA) model

\[
F_{solv} = F_{cav} + E_{pol}^{GB}
\]
\[
E_{pol}^{GB} = -\frac{1}{2} \left( \frac{1}{\varepsilon_{in}} - \frac{1}{\varepsilon_{out}} \right) \sum_{i,j} \frac{332\, q_i q_j}{f_{GB}(r_{ij})}
\]
\[
f_{GB}(r_{ij}) = \sqrt{ r_{ij}^{2} + R_i R_j \exp\left( -\frac{r_{ij}^{2}}{4 R_i R_j} \right) }
\]

All-atom representation of polypeptide chains

Coarse-grained representation of polypeptide chains

Coarse-grained force fields

Physics-based potentials (statistical-mechanical formulation)

\[
F(\mathbf{X}) \equiv U(\mathbf{X}) = -RT \ln \left\{ \frac{1}{V_Y} \int_{\Omega_Y} \exp\left[ -E(\mathbf{X}, \mathbf{Y}) / RT \right] dV_Y \right\},
\qquad V_Y = \int_{\Omega_Y} dV_Y
\]

X: primary variables, present in the model
Y: secondary variables, not present in the model (solvent, side-chain dihedral angles, etc.)
E(X,Y): all-atom energy function

Statistical potentials

\[
W(x; c; s) = -RT \ln \frac{N_{obs}(x; c; s)}{N_{ref}(x; c; s)}
\]

x – geometric variables
c – residue types
s – sequence context

[Figure: Leu-Leu pair. A – radial correlation function, B – reference distribution function, C –]

Searching the conformational space

Low (lowest)-energy conformations
• Monte Carlo with minimization (MCM)
• Basin hopping
• Diffusion equation method (DEM)
• Genetic algorithms

Canonical conformational ensembles
• Canonical MC
• Canonical MD
• Replica-exchange MC (REMC)
• Replica-exchange MD (REMD)

Underlying techniques
• Local energy minimization
• Simulated annealing
• Smoothing the energy surface
• Monte Carlo
• Molecular dynamics

Local vs. global minimization
[Figure: f(x) vs. x, showing the starting point, a local minimum, and the global minimum.]

General scheme of local minimization of multivariate functions:
1. Choose the initial approximation x(0).
2. In the pth iteration, compute the search direction d(p).
3. Locate x(p+1) as the minimum along the search direction (minimization of a function of one variable).
4. Terminate when convergence has been achieved or the maximum number of iterations has been exceeded.
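The four-step scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the minimizer used in any of the cited work: the search direction is assumed to be the negative gradient (steepest descent), the one-variable minimization is done by golden-section search on a fixed bracket, and the quadratic test function and all names are hypothetical.

```python
import math


def golden_section(g, a, b, tol=1e-10):
    """Minimize a unimodal function g of one variable on the interval [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if g(c) < g(d):
            b, d = d, c                  # minimum lies in [a, old d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                  # minimum lies in [old c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def minimize(f, grad, x0, max_iter=500, gtol=1e-6):
    """Local minimization following steps 1-4 of the general scheme."""
    x = list(x0)                                      # step 1: initial approximation
    for p in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < gtol:
            return x, p                               # step 4: converged
        d = [-gi for gi in g]                         # step 2: search direction
        alpha = golden_section(                       # step 3: 1D minimization
            lambda a: f([xi + a * di for xi, di in zip(x, d)]), 0.0, 1.0)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x, max_iter                                # step 4: iteration limit


def f(x):
    """Hypothetical test function with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad_f(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min, n_iter = minimize(f, grad_f, [0.0, 0.0])

if __name__ == "__main__":
    print(x_min, n_iter)
```

Such a procedure only reaches the minimum of the basin containing the starting point, which is why the global methods listed earlier wrap local minimization inside a broader search.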
[Figure: successive approximations x(0), x(1), x(2) approaching the minimum x* in the (x1, x2) plane along the search directions d(1), d(2); the one-dimensional profile f(x(p) + α d(p)) has its minimum at α*.]

Deformation methods

Lowest-energy structure of gramicidin S computed with the ECEPP force field (M. Dygert, N. Go, H.A. Scheraga, Macromolecules, 8, 750-761 (1975)). This structure turned out to be identical with the NMR structure determined later.

The C-terminal part of the HDEA protein found by global minimization of the UNRES coarse-grained effective energy function. The N-terminal part of HDEA. Liwo et al., PNAS, 96, 5482–5485 (1999)

Comparison of the experimental structure of bacteriocin AS-48 from E. faecalis with the structure obtained by global minimization of the UNRES force field (Pillardy et al., Proc. Natl. Acad. Sci. USA, 98, 2329-2333 (2001))

“Potential energy” or “free energy”?

Nature (and a canonical simulation) finds the basin with the lowest free energy at a given temperature. This basin might, but does not have to, contain the conformation with the lowest potential energy. The global-optimization methods are designed to find structures with the lowest potential energy, thus ignoring conformational entropy. Technically, this corresponds to canonical simulations at 0 K.

Comparison of minimum potential energies obtained in MD runs with the lowest values of the potential energy

PDB ID code (number of residues)   Emin (MD) [kcal/mol]   Eglob [kcal/mol]
1BDD (46)                          -409 (-414)            -597
1GAB (47)                          -461 (-501)            -669
1LQ7 (67)                          -658 (-652)            -937
1CLB (75)                          -740 (-709)            -1053
1E0G (48)                          -405 (-380)            -632

Results of Langevin dynamics simulations are in parentheses.

Basic scheme of the Metropolis (canonical) Monte Carlo algorithm
1. Start from conformation Xo with energy Eo.
2. Perturb Xo: X1 = Xo + ΔX.
3. Compute the new energy E1.
4. If E1 < Eo, accept the move: Xo = X1, Eo = E1.
5. Otherwise, sample Y from U(0,1) and compute W = exp[-(E1-Eo)/kT]. If W > Y, accept the move (Xo = X1, Eo = E1); if not, keep Xo.
6. Return to step 2.

[Figure: a downhill move (E1 < Eo) is accepted unconditionally; an uphill move (E1 > Eo) is accepted with probability exp[-(E1-Eo)/kBT].]

Calculation of averages

\[
\langle A \rangle = \frac{1}{N} \sum_{i=1}^{N} A_i
\]

The index i runs through all MC steps, including those in which new conformations have not been accepted.
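The Metropolis scheme and the averaging rule above can be illustrated with a one-dimensional toy system. The sketch assumes a harmonic energy E(x) = x²/2 in reduced units with kT = 1, for which the canonical average energy is kT/2; the function names, step size, and seed are arbitrary choices made for this example.

```python
import math
import random


def metropolis(energy, x0, n_steps, kT, step, rng):
    """Canonical Metropolis Monte Carlo for a single coordinate x."""
    x, e = x0, energy(x0)
    energies = []
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)      # perturb: X1 = Xo + dX
        e_new = energy(x_new)                     # compute the new energy E1
        # Accept downhill moves unconditionally and uphill moves when W > Y.
        if e_new < e or math.exp(-(e_new - e) / kT) > rng.random():
            x, e = x_new, e_new
            accepted += 1
        # The current state is counted at every step, also after a rejection.
        energies.append(e)
    return energies, accepted / n_steps


def harmonic(x):
    """Hypothetical test energy; its canonical average is kT/2."""
    return 0.5 * x * x


rng = random.Random(12345)
energies, acceptance = metropolis(harmonic, 0.0, 200_000, kT=1.0, step=2.0, rng=rng)
mean_energy = sum(energies) / len(energies)

if __name__ == "__main__":
    print("mean energy:", mean_energy, "acceptance ratio:", acceptance)
```

Appending the energy only after accepted moves would bias the average toward high-energy states, which is the reason the index i runs over all steps.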
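Conformational space representation in Monte Carlo methods
• Lattice representation; the centers of interactions are on lattice nodes.
• Continuous; the centers are located in 3D space.

An example of a lattice

Sample Monte Carlo trajectory of a good folder; Model 1a

A pathway of thermal unfolding of protein G simulated with the CABS model and lattice Monte Carlo dynamics
Kmiecik and Koliński, Biophys. J., 94, 726-736 (2008)

Molecular dynamics

\[
\frac{d^{2}\mathbf{r}_i}{dt^{2}} = \frac{d\mathbf{v}_i}{dt} = \mathbf{a}_i(t) = \frac{\mathbf{F}_i(\mathbf{r}(t))}{m_i} = -\frac{1}{m_i} \nabla_{\mathbf{r}_i} V(\mathbf{r}(t)), \qquad i = 1, 2, \ldots, n
\]
\[
m \frac{d^{2} x_i}{dt^{2}} = -\frac{\partial V}{\partial x_i}
\]
\[
\frac{d\mathbf{r}_i}{dt} = \mathbf{v}_i(t), \qquad i = 1, 2, \ldots, n
\]
\[
\mathbf{r}(t_0) = \mathbf{r}_0, \qquad \mathbf{v}(t_0) = \mathbf{v}_0
\]
\[
\mathbf{r}(t + \Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\Delta t + \tfrac{1}{2}\mathbf{a}(t)\Delta t^{2}
\]

The Verlet algorithm:

\[
\mathbf{r}(t + \Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\Delta t + \tfrac{1}{2}\mathbf{a}(t)\Delta t^{2}
\]
\[
\mathbf{r}(t - \Delta t) = \mathbf{r}(t) - \mathbf{v}(t)\Delta t + \tfrac{1}{2}\mathbf{a}(t)\Delta t^{2}
\]
Adding the two expansions (the odd-order terms cancel):
\[
\mathbf{r}(t + \Delta t) + \mathbf{r}(t - \Delta t) = 2\mathbf{r}(t) + \mathbf{a}(t)\Delta t^{2}
\]
\[
\mathbf{r}(t + \Delta t) = 2\mathbf{r}(t) - \mathbf{r}(t - \Delta t) + \mathbf{a}(t)\Delta t^{2}
\]
\[
\mathbf{v}(t) = \frac{1}{2\Delta t} \left[ \mathbf{r}(t + \Delta t) - \mathbf{r}(t - \Delta t) \right]
\]
\[
e(t) = O(\Delta t^{4})
\]

The velocity Verlet algorithm

Step 1:
\[
\mathbf{r}(t + \Delta t) = \mathbf{r}(t) + \mathbf{v}(t)\Delta t + \tfrac{1}{2}\mathbf{a}(t)\Delta t^{2}
\]
\[
\mathbf{v}\left(t + \tfrac{\Delta t}{2}\right) = \mathbf{v}(t) + \tfrac{1}{2}\mathbf{a}(t)\Delta t
\]

Step 2:
\[
\mathbf{a}_i(t + \Delta t) = -\frac{1}{m_i} \nabla_{\mathbf{r}_i} V(\mathbf{r}(t + \Delta t))
\]
\[
\mathbf{v}(t + \Delta t) = \mathbf{v}\left(t + \tfrac{\Delta t}{2}\right) + \tfrac{1}{2}\mathbf{a}(t + \Delta t)\Delta t
\]

The leapfrog algorithm:
\[
\mathbf{v}\left(t + \tfrac{\Delta t}{2}\right) = \mathbf{v}\left(t - \tfrac{\Delta t}{2}\right) + \mathbf{a}(t)\Delta t
\]
\[
\mathbf{r}(t + \Delta t) = \mathbf{r}(t) + \mathbf{v}\left(t + \tfrac{\Delta t}{2}\right)\Delta t
\]

All three algorithms are symplectic. As a consequence, they conserve a “shadow Hamiltonian” that is close but not equal to the true Hamiltonian, so the total energy oscillates about a constant value that is close but not equal to the initial energy. Many higher-order algorithms that are more accurate in a single step (e.g., the Gear algorithm) lack this property.

Symplectic algorithms have also been designed for isokinetic (constant temperature) and isobaric (constant pressure) simulations; an extended Hamiltonian is considered in these cases.

Dependence of the kinetic, potential, and total energy on time for coarse-grained Ac-Ala10-NHMe (Khalili et al., J. Phys. Chem.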
B, 2005, 109, 13785-13797)
[Figure: kinetic, potential, and total energy [kcal/mol] vs. time [ns], 0.0 to 5.0 ns.]

Berendsen’s thermostat (weak coupling with temperature bath)

\[
\mathbf{v} \leftarrow \mathbf{v} \sqrt{ 1 + \frac{\Delta t}{\tau} \left( \frac{f k T}{2 E_k} - 1 \right) }
\]
\[
E_k = \frac{1}{2} \sum_{i=1}^{n} m_i \left( v_{x_i}^{2} + v_{y_i}^{2} + v_{z_i}^{2} \right)
\]

f – number of degrees of freedom (3n)
τ – coupling parameter
Δt – time step
E_k – kinetic energy

Langevin dynamics (for implicit solvent)

\[
m_i \frac{d^{2} x_i}{dt^{2}} = -\frac{\partial V}{\partial x_i} - \gamma_i \frac{dx_i}{dt} + f_i^{rand}
\]
Stokes’ law:
\[
\gamma_i = 6 \pi \eta_w \left( r_i + r_w \right)
\]
Wiener process:
\[
f_i^{rand} = \sqrt{ \frac{2 \gamma_i R T}{\Delta t} }\; N(0, 1)
\]

Brownian dynamics

\[
\gamma_i \frac{dx_i}{dt} = -\frac{\partial V}{\partial x_i} + f_i^{rand}
\]

Time scales [s]: 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), 10^0 (seconds)
Processes marked on the time axis: bond vibration, all-atom MD step, side-chain rotation, loop closure, helix formation, folding of β-hairpins, protein folding

MD algorithm references:
1. Frenkel, D.; Smit, B. Understanding Molecular Simulation; Academic Press, 1996; Chapter 4.
2. Calvo, M. P.; Sanz-Serna, J. M. Numerical Hamiltonian Problems; Chapman & Hall: London, U.K., 1994.
3. Verlet, L. Phys. Rev. 1967, 159, 98.
4. Swope, W. C.; Andersen, H. C.; Berens, P. H.; Wilson, K. R. J. Chem. Phys. 1982, 76, 637.
5. Tuckerman, M.; Berne, B. J.; Martyna, G. J. J. Chem. Phys. 1992, 97, 1990.
6. Ciccotti, G.; Kalibaeva, G. Philos. Trans. R. Soc. London, Ser. A 2004, 362, 1583.

Regular and multiplexed replica exchange algorithm
• N replicas are simulated independently for a reasonably long time using standard canonical MC or MD.
• An exchange of two neighboring replicas is then attempted with the following probability:

\[
W(\mathbf{X}, \beta_m \mid \mathbf{Y}, \beta_n) =
\begin{cases}
1 & \text{for } \Delta \le 0 \\
\exp(-\Delta) & \text{for } \Delta > 0
\end{cases},
\qquad
\Delta = \left( \beta_m - \beta_n \right) \left[ E(\mathbf{Y}) - E(\mathbf{X}) \right]
\]

[Figure panels: regular, multiplexed]

Y. Rhee, V. Pande, Biophys. J. 84, 775, 2003
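The exchange criterion for the regular variant can be written down directly from the formula for W. The sketch below is a hedged illustration only: it takes β = 1/(RT) with R in kcal/(mol K), and the function names, the `offset` argument for alternating between even and odd neighbor pairs, and the example temperatures and energies are assumptions introduced here, not part of the cited method description.

```python
import math
import random

GAS_CONSTANT = 0.0019872  # kcal/(mol K)


def exchange_probability(beta_m, beta_n, e_x, e_y):
    """W(X, beta_m | Y, beta_n) with Delta = (beta_m - beta_n) * [E(Y) - E(X)]."""
    delta = (beta_m - beta_n) * (e_y - e_x)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchanges(temperatures, energies, rng, offset=0):
    """Attempt swaps between neighboring replicas (offset, offset + 1), ...

    energies[i] is the energy of the conformation currently simulated at
    temperatures[i]; accepted swaps exchange the entries in place.
    Returns the list of index pairs that were swapped.
    """
    swapped = []
    for m in range(offset, len(temperatures) - 1, 2):
        n = m + 1
        beta_m = 1.0 / (GAS_CONSTANT * temperatures[m])
        beta_n = 1.0 / (GAS_CONSTANT * temperatures[n])
        w = exchange_probability(beta_m, beta_n, energies[m], energies[n])
        if rng.random() < w:
            energies[m], energies[n] = energies[n], energies[m]
            swapped.append((m, n))
    return swapped


if __name__ == "__main__":
    rng = random.Random(2003)
    temps = [300.0, 320.0, 340.0, 360.0]       # hypothetical temperature ladder
    energies = [-100.0, -104.0, -95.0, -97.0]  # hypothetical energies, kcal/mol
    print(attempt_exchanges(temps, energies, rng, offset=0), energies)
```

A swap that hands the lower-energy conformation to the colder replica has Δ ≤ 0 and is always accepted; the opposite swap is accepted only with probability exp(-Δ).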