Supporting information

Essential function of the N-termini tails of the proteasome for the gating mechanism revealed by molecular dynamics simulations

Hisashi Ishida

Simulated Annealing in vacuum

Hereafter, all the MD simulations were carried out using an MD simulation program called SCUBA [1-3] with the AMBER parm99SB force field [4]. To optimize the conformation of the N-termini tails, simulated annealing (SA) was performed in vacuum, assuming a distance-dependent dielectric constant of 4.0r with r in Å. Non-bonded interactions were evaluated with a cut-off radius of 12 Å. A time step of 0.5 fs was used throughout the SA.

Because the surfaces of the X-ray structures were removed for simplicity, the ordered- and disordered-gate models were structurally fragile. To fix the structures, the majority of the atoms in the models were restrained: only the atoms of the first seven (ten) residues of the N-termini tails were free to move in the ordered (disordered)-gate models, and the other heavy atoms were restrained with a force constant of 10 kcal/mol/Å² in the SA simulation. For each system, two types of simulation were carried out: one in which the substrate was located in the antechamber at (r, z) = (0 Å, -20 Å), and another in which the substrate was located in the activator at (r, z) = (0 Å, 35 Å). The center of mass of the substrate was restrained with a force constant of 1.0 kcal/mol/Å². The system was heated from 0 to 1200 K during the first 100 ps and was then equilibrated for 400 ps. The equilibrated system was then gradually cooled from 1200 K to 300 K over 500 ps. The SA was repeated 10 times, and the resulting coordinate sets were stored as possible conformations of the missing N-termini tails in local minimum-energy regions.
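The annealing protocol described above can be summarized as a temperature schedule. The following is a minimal sketch only: the function name `sa_temperature` is hypothetical, and the linear shape of the heating and cooling ramps is an assumption, since the text gives only the end points and durations of each stage.

```python
def sa_temperature(t_ps: float) -> float:
    """Target temperature (K) at time t_ps (ps) within one SA cycle.

    Assumption: linear ramps. The source states only that the system was
    heated 0 -> 1200 K in 100 ps, held for 400 ps, and cooled
    1200 -> 300 K in 500 ps.
    """
    if not 0.0 <= t_ps <= 1000.0:
        raise ValueError("one SA cycle spans 0-1000 ps")
    if t_ps <= 100.0:          # heating stage
        return 1200.0 * t_ps / 100.0
    if t_ps <= 500.0:          # equilibration stage at 1200 K
        return 1200.0
    # cooling stage: 900 K drop spread over 500 ps
    return 1200.0 - 900.0 * (t_ps - 500.0) / 500.0


TIME_STEP_FS = 0.5
# 1000 ps per cycle, converted to fs, divided by the 0.5 fs time step
steps_per_cycle = int(1000.0 * 1000.0 / TIME_STEP_FS)
```

With the 0.5 fs time step quoted above, one 1000 ps cycle corresponds to 2,000,000 integration steps, and ten cycles were run per system.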
Each of the 10 conformations was minimized for 500 steps using the steepest descent algorithm, followed by 5,000 steps of the conjugate gradient algorithm, without constraining the missing N-termini tails. Then the conformation with the minimum energy was selected as the representative of the 10 conformations, where the energy was defined as the sum of the internal energy of the N-termini tails and the interaction energy between the N-termini tails and the other part of the complex.

Model of proteasome – activator – substrate complex in water

After the SA, the four models were placed in an aqueous medium. The PAN-like – CPO/D and PA26 – CPO/D complexes were placed in rectangular boxes of (96 Å × 92 Å × 122 Å), (97 Å × 93 Å × 124 Å), (89 Å × 90 Å × 114 Å) and (91 Å × 90 Å × 116 Å), respectively. In each box, all the atoms of the model were more than 6 Å from the edge of the box. The rather short distance of 6 Å was used because the dynamics of the substrate and the N-termini tails were the main focus of this study. The dynamics of the water molecules outside the fixed surface of the models was assumed not to have a significant influence on the dynamics of the substrate and the N-termini tails. To neutralize the charges of the models, sodium ions were placed at positions with a large negative electrostatic potential. Moreover, sodium and chloride ions were added to the box at a concentration of 100 mM NaCl. Then TIP3P water molecules [5] were added to surround the four models.

Consequently, the four systems, the PAN-like – CPO/D and PA26 – CPO/D complexes, comprised, respectively:
106,028 atoms (complex: 27,601 atoms; substrate: 145 atoms; 26,044 water molecules; 60 sodium and 90 chloride ions),
111,410 atoms (complex: 27,468 atoms; substrate: 145 atoms; 27,880 water molecules; 60 sodium and 97 chloride ions),
92,182 atoms (complex: 27,398 atoms; substrate: 145 atoms; 21,501 water molecules; 60 sodium and 76 chloride ions), and
95,191 atoms (complex: 27,265 atoms; substrate: 145 atoms; 22,546 water molecules; 60 sodium and 83 chloride ions).

Energy minimization of each system was carried out to alleviate unfavorable interactions between the complex and the water molecules, using the steepest descent algorithm for 500 steps followed by the conjugate gradient algorithm for 5,000 steps. Harmonic restraints with a force constant of 10.0 kcal/mol/Å² were applied to all the heavy atoms of the molecules. The dielectric constant used was 1.0, and the van der Waals interactions were evaluated with a cut-off radius of 8 Å. The particle-particle particle-mesh (PPPM) method [6] was used for the electrostatic interactions. For the PPPM calculations, charge grid sizes of (90 Å × 90 Å × 120 Å) were chosen for the PAN-like – CPO/D complex, and (96 Å × 90 Å × 120 Å) and (96 Å × 96 Å × 125 Å) were chosen for the PA26 – CPO/D complex, respectively, to set the charge grid spacing close to 1 Å. The charge grid was interpolated using a spline of order seven, while the force was evaluated using a differential operator of order six [6].

Free-energy calculation by ABMD implemented in SCUBA

To evaluate the free energy of the translocation of the peptide inside the complex, the adaptively biased molecular dynamics (ABMD) method [7] combined with the replica exchange method [8] was employed in SCUBA. The reaction coordinate z was set along the D7 symmetry axis of the CP (see Fig. 1).
The z-component of the center of mass of the substrate was employed as the value of the reaction coordinate. The equations of motion used in the ABMD method are expressed as [7]:

$$ m_a \frac{d^2 \mathbf{r}_a}{dt^2} = \mathbf{F}_a - \frac{\partial}{\partial \mathbf{r}_a} U\left(t \mid \sigma(\mathbf{R})\right), \qquad \frac{\partial U(t \mid z)}{\partial t} = \frac{k_B T}{\tau_F} K\left(z - \sigma(\mathbf{R})\right), \tag{S1} $$

where $\mathbf{R} = (\mathbf{r}_1, \ldots, \mathbf{r}_N)$ denotes the coordinates of the substrate atoms, and N is the number of atoms of the substrate (N = 145). $\sigma(\mathbf{R})$ is a function that gives the value of the reaction coordinate (the z-component of the center of mass of the substrate). $k_B$ is the Boltzmann constant, T is the constant temperature, $\tau_F$ is the flooding time scale, and K is the kernel, which is distributed around the reaction coordinate. The first equation is for atom a, with an additional force coming from the biasing potential U(t|z) on top of the ordinary atomic force $\mathbf{F}_a$. The second equation is the time-evolution equation of the biasing potential. For a large enough $\tau_F$ and a small enough kernel width, U(t|z) converges towards the free energy F(z) times -1 as t → ∞ [7].

In the ABMD, the range of the reaction coordinate from $z_{min}$ to $z_{max}$ is discretized into n grid points with an interval of $\Delta z$, such that $z_{min} = z_0, z_1, \ldots, z_{n-1} = z_{max}$. The biasing potential is expanded in terms of a third-order B-spline function [7] as follows:

$$ U(t \mid z) = \sum_{k=-1}^{n} U_k(t)\, B_k\!\left(\frac{z - z_{min}}{\Delta z}\right), \tag{S2} $$

$$ B_i(z) = \begin{cases} (z-i)^2\,(|z-i| - 2)/2 + 2/3, & 0 \le |z-i| \le 1, \\ (2 - |z-i|)^3/6, & 1 \le |z-i| \le 2, \\ 0, & \text{otherwise}. \end{cases} \tag{S3} $$

$U_k(t)$ is the coefficient of $B_k$ at the discrete reaction coordinates. The third-order B-spline $B_i(z)$ is nonzero in the range |z - i| < 2 and is centered at z = i. (The B-spline $B(\xi)$ in the paper of Babin et al. [7] is equivalent to $B_i(z)$ on setting $\xi = z - i$.) Therefore, the number of coefficients $U_k(t)$ required to determine U(t|z) is n + 2: in addition to the coefficients $U_k(t)$, k = 0, …, n-1, the two coefficients $U_{-1}(t)$ and $U_n(t)$ are required.
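The B-spline expansion of Eqs. S2 and S3 can be sketched in a few lines. This is an illustration only, not the SCUBA implementation: the function names and the list layout (`coeffs[k + 1]` holding $U_k$ for k = -1, …, n) are assumptions made for the example.

```python
def cubic_bspline(x: float) -> float:
    """Third-order B-spline of Eq. S3, written for x = z - i."""
    a = abs(x)
    if a <= 1.0:
        return a * a * (a - 2.0) / 2.0 + 2.0 / 3.0
    if a <= 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0


def bias_potential(z: float, coeffs: list, z_min: float, dz: float) -> float:
    """Evaluate U(t|z) of Eq. S2 from the n + 2 coefficients.

    Hypothetical storage convention: coeffs[k + 1] holds U_k
    for k = -1, 0, ..., n.
    """
    n = len(coeffs) - 2
    x = (z - z_min) / dz
    return sum(coeffs[k + 1] * cubic_bspline(x - k) for k in range(-1, n + 1))
```

At a grid point only three basis functions contribute, with weights 1/6, 2/3 and 1/6, and at any z inside the range the basis functions sum to one. This is why the two extra coefficients $U_{-1}$ and $U_n$ are needed to evaluate the potential at the ends of the range.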
In the implementation of the ABMD method in SCUBA, the force derived from the biasing potential was assumed to be zero at $z = z_{min}$ and $z = z_{max}$,

$$ \left.\frac{\partial U(t \mid z)}{\partial z}\right|_{z = z_{min}} = \left.\frac{\partial U(t \mid z)}{\partial z}\right|_{z = z_{max}} = 0, \tag{S4} $$

so that $U(t|z) = U(t|z_{min})$ at $z \le z_{min}$ and $U(t|z) = U(t|z_{max})$ at $z \ge z_{max}$. This boundary condition is satisfied when

$$ U_{-1}(t) = U_1(t), \qquad U_n(t) = U_{n-2}(t). \tag{S5} $$

Under this boundary condition, $U_k(t)$ can be determined from the values of U(t|z) at k = 0, …, n-1, and conversely, the values of U(t|z) can be interpolated from the values of $U_k(t)$ at k = 0, …, n-1. $U_k(t)$ is calculated by an Euler-like time-evolution equation [7]:

$$ U_k(t + \Delta t) = U_k(t) + \Delta t\, \frac{k_B T}{\tau_F}\, G\!\left(\frac{\sigma(\mathbf{R}) - z_{min}}{\Delta z} - k\right), \qquad k = 0, \ldots, n-1. \tag{S6} $$

G has the form of a bi-weight function:

$$ G(z) = \begin{cases} \dfrac{48}{41}\left(1 - \dfrac{z^2}{4}\right)^2, & |z| \le 2, \\ 0, & \text{otherwise}. \end{cases} \tag{S7} $$

It should be noted that the integral of G is not 1 but (48/41) × (16/15) = 768/615 ≈ 1.249. The coefficient of 48/41 is set so that the maximum of the kernel $K(z - \sigma(\mathbf{R}))$ in Eq. S1 is 1, as follows:

$$ \sum_k G(k - i)\, B_k(i) = \frac{48}{41}\left(\frac{9}{16}\cdot\frac{1}{6} + 1\cdot\frac{2}{3} + \frac{9}{16}\cdot\frac{1}{6}\right) = 1, \tag{S8} $$

where i is an integer. Consequently, the energy added to the biasing potential with a flooding time scale of $\tau_F$ ps during a time of t ps is $1.249\,k_B T \times (t/\tau_F)$ kcal/mol.

The force coming from U(t|z) was calculated as:

$$ \frac{\partial U(t \mid z)}{\partial z} = \sum_{k=-1}^{n} U_k(t)\, \frac{\partial}{\partial z} B_k\!\left(\frac{z - z_{min}}{\Delta z}\right) = \sum_{k=-1}^{n-1} U'_k(t)\, B'_k\!\left(\frac{z - z_{min}}{\Delta z}\right), \tag{S9} $$

where $B'_k$ is the second-order B-spline function:

$$ B'_i(z) = \begin{cases} (z - i + 1)^2/2, & -1 \le z - i \le 0, \\ -(z - i)(z - i - 1) + 1/2, & 0 \le z - i \le 1, \\ (z - i - 2)^2/2, & 1 \le z - i \le 2, \\ 0, & \text{otherwise}. \end{cases} \tag{S10} $$

It should be noted that $B'_k(z)$ is not the derivative of $B_i(z)$. $U'_k(t)$, which corresponds to the coefficient of the second-order B-spline expansion, is given as:

$$ U'_k(t) = \frac{U_{k+1}(t) - U_k(t)}{\Delta z}. \tag{S11} $$

The ABMD method can be combined with the replica exchange method [8].
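The kernel normalization of Eq. S8 and the update rule of Eqs. S5 and S6 can be checked numerically. The sketch below is illustrative and is not taken from SCUBA: the function names, the argument names and the list layout (`coeffs[k + 1]` holding $U_k$ for k = -1, …, n) are assumptions.

```python
def biweight_kernel(x: float) -> float:
    """Bi-weight function G of Eq. S7."""
    return (48.0 / 41.0) * (1.0 - x * x / 4.0) ** 2 if abs(x) <= 2.0 else 0.0


def cubic_bspline(x: float) -> float:
    """Third-order B-spline of Eq. S3, written for x = z - i."""
    a = abs(x)
    if a <= 1.0:
        return a * a * (a - 2.0) / 2.0 + 2.0 / 3.0
    if a <= 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0


def kernel_peak(i: int) -> float:
    """Left-hand side of Eq. S8: sum over k of G(k - i) * B_k(i)."""
    return sum(biweight_kernel(k - i) * cubic_bspline(i - k)
               for k in range(i - 2, i + 3))


def abmd_update(coeffs, z_now, z_min, dz, dt, tau_f, kbt=1.0):
    """One Euler-like step of Eq. S6, then the boundary condition of Eq. S5.

    Hypothetical layout: coeffs[k + 1] holds U_k for k = -1, ..., n.
    """
    n = len(coeffs) - 2
    x = (z_now - z_min) / dz
    new = list(coeffs)
    for k in range(n):                       # k = 0, ..., n - 1
        new[k + 1] += dt * kbt / tau_f * biweight_kernel(x - k)
    new[0] = new[2]                          # U_{-1} = U_1
    new[n + 1] = new[n - 1]                  # U_n = U_{n-2}
    return new
```

When the substrate sits exactly on an interior grid point, one update raises the biasing potential at that point by $\Delta t\, k_B T / \tau_F$, which is the sense in which the maximum of the kernel K is 1.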
The temperatures $T_\alpha$ and $T_\beta$ and the biasing potentials $U^\alpha(t|z)$ and $U^\beta(t|z)$ of the replicas α and β are exchanged according to an exchange probability given by [7]:

$$ w(\alpha, \beta) = \begin{cases} 1, & \Delta \le 0, \\ \exp(-\Delta), & \Delta > 0, \end{cases} \tag{S12} $$

$$ \Delta = \left(\frac{1}{k_B T_\alpha} - \frac{1}{k_B T_\beta}\right)\left(E^\beta - E^\alpha\right) + \frac{1}{k_B T_\alpha}\left[U^\alpha(z^\beta) - U^\alpha(z^\alpha)\right] - \frac{1}{k_B T_\beta}\left[U^\beta(z^\beta) - U^\beta(z^\alpha)\right], $$

where $E^\alpha$ and $E^\beta$ are the ordinary atomic potential energies at the replicas α and β, respectively. In the ABMD simulations with the replica exchange method, $k_B T$ in Eq. S6 was set at one to normalize the change of the biasing potential for all the replica temperatures.

In this study, 96 equilibrated replicas for each system with an exponential temperature distribution in the range 318 K ≤ T < 470 K were used. The temperature of 318 K was chosen to match the experimental conditions employed in the measurement of the rate of protein degradation [9]. The distribution of the temperatures was obtained using a web server (http://folding.bmc.uu.se/remd/) so that the exchange rate was ~0.135 at the desired temperature interval [10]:

318.00, 319.37, 320.74, 322.12, 323.50, 324.89, 326.28, 327.68, 329.08, 330.48, 331.89, 333.30, 334.72, 336.15, 337.57, 339.01, 340.44, 341.88, 343.33, 344.78, 346.24, 347.70, 349.16, 350.61, 352.09, 353.56, 355.05, 356.53, 358.03, 359.53, 361.03, 362.54, 364.05, 365.56, 367.08, 368.61, 370.12, 371.66, 373.20, 374.74, 376.29, 377.85, 379.41, 380.98, 382.55, 384.12, 385.70, 387.29, 388.89, 390.48, 392.08, 393.69, 395.30, 396.91, 398.53, 400.16, 401.79, 403.43, 405.07, 406.72, 408.38, 410.04, 411.70, 413.37, 415.05, 416.73, 418.41, 420.11, 421.80, 423.51, 425.21, 426.93, 428.65, 430.37, 432.11, 433.84, 435.58, 437.33, 439.08, 440.84, 442.61, 444.38, 446.15, 447.93, 449.72, 451.52, 453.32, 455.13, 456.94, 458.75, 460.58, 462.40, 464.24, 466.08, 467.93, 469.78

A Langevin dynamics algorithm was utilized to control the temperature and pressure of the system. The coupling times for the temperature and pressure control were both set at 2 ps⁻¹. The SHAKE algorithm [11, 12] was used to constrain all the bond lengths involving hydrogen atoms. The leap-frog algorithm with a time step of 2 fs was used throughout the simulation to integrate the equations of motion. Met1–Leu21 of the seven N-terminal tails, Tyr123–Arg130 of the α-annulus and His99–Asn104, corresponding to a part of the seven activation loops in the activator, were set to be free. The other heavy atoms in the models were subjected to a 10 kcal/mol/Å² harmonic restraint at their original positions to prevent the structure from collapsing at high temperature.

To prepare the 96 equilibrated replicas, 48 odd-numbered systems in which the substrate was located in the activator and 48 even-numbered systems in which the substrate was located in the antechamber of the CP were used. First, the replica system was heated from 0 K to 318 K within 1 ns, during which the flexible parts in the models and the ions were fixed with decreasing restraints and the water molecules were allowed to move. After these restraints were removed, the system was equilibrated for 10 ns at a constant pressure of one bar and a temperature of 318 K with no restraint (except to maintain the substrate within the range between z = -23 and 37 Å and to maintain the atoms in the models which should be fixed). Then the box size was fixed for the following REMD simulations. Using the box size at 318 K, the other 95 replica systems were independently heated from 0 K to their respective replica temperatures and equilibrated for 10 ns at a constant volume at those temperatures.

The ABMD simulations were carried out at a constant volume with replica exchanges attempted every 125 steps (every 250 fs) for 300 ns per replica (a total of 28.8 μs for each model). The range of the reaction coordinate for the ABMD simulations was set at -28 Å ≤ z ≤ 42 Å. To sample data efficiently within the range, two harmonic potentials with a force constant of 10.0 kcal/mol were applied at z = -28 Å and 42 Å.
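The replica-exchange acceptance rule used with ABMD (Eq. S12) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the SCUBA code: the function and argument names are hypothetical, the biasing potentials are passed as plain callables, and the physical value of $k_B$ in kcal/(mol·K) is used for the Boltzmann factors.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_delta(t_a, t_b, e_a, e_b, u_a, u_b, z_a, z_b):
    """Delta of Eq. S12 for replicas alpha (a) and beta (b).

    t_*: temperatures (K); e_*: potential energies (kcal/mol);
    u_*: callables returning the biasing potential (kcal/mol) at a given z;
    z_*: current reaction-coordinate values (Angstrom).
    """
    beta_a = 1.0 / (KB * t_a)
    beta_b = 1.0 / (KB * t_b)
    return ((beta_a - beta_b) * (e_b - e_a)
            + beta_a * (u_a(z_b) - u_a(z_a))
            - beta_b * (u_b(z_b) - u_b(z_a)))


def exchange_probability(delta: float) -> float:
    """Metropolis acceptance probability w of Eq. S12."""
    return 1.0 if delta <= 0.0 else math.exp(-delta)
```

An exchange attempt would draw a uniform random number in [0, 1) and accept the swap when it falls below `exchange_probability(delta)`. With attempts every 125 steps of 2 fs, this test is made every 250 fs, and 96 replicas of 300 ns each give the 28.8 μs total quoted above.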
As the free energies around z = -28 Å and 42 Å may not be very reliable, possibly due to the introduction of the harmonic potentials, the range of z analyzed was limited to -23 Å ≤ z ≤ 37 Å. The resolution of the reaction coordinate, $\Delta z$, was set at 0.25 Å. The relaxation time for the free-energy profile in Eq. S1, $\tau_F$, was set at 5 ps, 10 ps and 20 ps for successive periods of 100 ns, respectively. Consequently, the energies added to the biasing potential per the range of 70 Å (from -28 to 42 Å) for 1 ns using $\tau_F$ = 5, 10 and 20 ps were 2.29, 4.58 and 9.15 kcal/mol, respectively.

The free-energy landscape, which is the biasing potential times -1, fluctuates during the ABMD simulation. In this study in particular, sporadic local deformations of the free-energy landscape were frequently observed at almost any place during the simulations, and the free-energy landscape changed continuously because the substrate was often trapped locally inside the complex (data not shown). Therefore, the free-energy landscape was estimated as the average of the free-energy landscapes obtained from the last 100 ns of the simulation with $\tau_F$ = 20 ps. The conformations of the complex used for the analysis were stored every 1 ps from the last 200 ns of the simulation.

The free-energy component analysis

The free energy ΔG(z,T) was obtained from the ABMD simulation. However, in this study only the shape of ΔG(z,T) from z = -23 to 37 Å was estimated, although the value of ΔG(z,T) is generally set at zero at z = ±∞. Therefore, the difference between ΔG(z,T1) and ΔG(z,T2) at different temperatures is unknown from this limited information. To obtain the difference, it was assumed that the system in this study follows the thermodynamic formulae:

$$ \begin{aligned} \Delta G(z,T) &= \Delta H(z,T) - T\,\Delta S(z,T), \\ \Delta H(z,T) &= \Delta H_0(z,T_0) + \int_{T_0}^{T} \Delta C_V(z,T')\,dT', \\ \Delta S(z,T) &= \Delta S_0(z,T_0) + \int_{T_0}^{T} \frac{\Delta C_V(z,T')}{T'}\,dT', \\ \Delta C_V(z,T) &= \Delta C_V^0(z,T_0) + \left(\frac{d\,\Delta C_V}{dT}\right)_0 (T - T_0), \end{aligned} \tag{S13} $$

where ΔH, ΔS and $\Delta C_V$ are the enthalpy, the entropy and the heat capacity at a constant volume, respectively.
Then ΔG, ΔH and ΔS can be written as polynomial equations in terms of T such as:

$$ \begin{aligned} \Delta G_{fit}(z,T) &= \alpha_1(z) + \alpha_2(z)\,T + \alpha_3(z)\,T\log T + \alpha_4(z)\,T^2, \\ \Delta H_{fit}(z,T) &= \alpha_1(z) - \alpha_3(z)\,T - \alpha_4(z)\,T^2, \\ -T\,\Delta S_{fit}(z,T) &= \left(\alpha_2(z) + \alpha_3(z)\right) T + \alpha_3(z)\,T\log T + 2\,\alpha_4(z)\,T^2, \end{aligned} \tag{S14} $$

where $\alpha_j$ (j = 1, …, 4) are parameters to fit the curves of ΔGfit, ΔHfit and ΔSfit to ΔG, ΔH and ΔS, respectively. It should be noted that when ΔGfit(z,T) is obtained, ΔHfit(z,T) and -TΔSfit(z,T) are simultaneously obtained from Eq. S14. Here a quantity δg(z,T) is introduced to ΔG(z,T) as follows:

$$ \Delta G(z,T) = \Delta G_{fit}(z,T) + \delta g(z,T). \tag{S15} $$

As the system was assumed to follow the thermodynamic formulae in Eq. S13, δg(z,T) is regarded as an error which deviates from the polynomial equation of $\Delta G_{fit}(z,T)$. To minimize δg(z,T), Eq. S15 was calculated iteratively as follows. Eq. S15 is rewritten as $\Delta G^i(z,T) = \Delta G^i_{fit}(z,T) + \delta g^i(z,T)$ in the i-th iteration.

(A) ΔG⁰(z,T) at each temperature was set so that the average of ΔG⁰(z,T) in the range of z between -23 Å and 37 Å was zero. Then the parameters $\alpha_j$ (j = 1, …, 4) were calculated to fit the curve in Eq. S14 to ΔG⁰(z,T), in which δg⁰(z,T) was fixed at zero.

(B) $\delta g^i(z,T)$ and its average in the range of z between -23 Å and 37 Å, $\langle \delta g^i(T) \rangle$, were calculated as follows:

$$ \delta g^i(z,T) = \Delta G^i(z,T) - \Delta G^i_{fit}(z,T), \qquad \langle \delta g^i(T) \rangle = \int_{z=-23}^{z=37} \delta g^i(z,T)\,dz \Big/ \int_{z=-23}^{z=37} dz. \tag{S16} $$

(C) $\Delta G^{i+1}(z,T)$ was reset to be $\Delta G^i(z,T) - \langle \delta g^i(T) \rangle$ so that the deviation between $\Delta G^{i+1}(z,T)$ and $\Delta G^i_{fit}(z,T)$ along z, $\left| \int_{z=-23}^{z=37} \left[ \Delta G^i_{fit}(z,T) - \Delta G^{i+1}(z,T) \right] dz \right|$, is minimized.

(D) The iteration was repeated from (A), and the parameters $\alpha_j$ (j = 1, …, 4) were re-estimated to obtain new fitting curves in Eq. S14 for the updated $\Delta G^{i+1}(z,T)$.

This cycle was repeated four times so that

$$ \begin{aligned} \left| \Delta G^{i+1}_{fit}(z,T) - \Delta G^{i}_{fit}(z,T) \right| &< 5.0 \times 10^{-4}\ \text{kcal/mol}, \\ \left| \Delta H^{i+1}_{fit}(z,T) - \Delta H^{i}_{fit}(z,T) \right| &< 3.5 \times 10^{-3}\ \text{kcal/mol}, \\ \left| T\Delta S^{i+1}_{fit}(z,T) - T\Delta S^{i}_{fit}(z,T) \right| &< 3.5 \times 10^{-3}\ \text{kcal/mol} \end{aligned} \tag{S17} $$

were fulfilled at any z and T.
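The relation between the fitted free energy and its enthalpic and entropic components in Eq. S14 can be checked with a short script. This sketch assumes the standard relations $-T\Delta S = T\,\partial \Delta G/\partial T$ and $\Delta H = \Delta G + T\Delta S$ applied to the fitted form of ΔG; the function names and the α values in the usage note are hypothetical and carry no physical meaning.

```python
import math


def dg_fit(alpha, t):
    """Fitted free energy: a1 + a2*T + a3*T*log(T) + a4*T^2."""
    a1, a2, a3, a4 = alpha
    return a1 + a2 * t + a3 * t * math.log(t) + a4 * t * t


def minus_tds_fit(alpha, t):
    """Entropic term -T*dS, obtained as T * d(dG_fit)/dT."""
    _, a2, a3, a4 = alpha
    return (a2 + a3) * t + a3 * t * math.log(t) + 2.0 * a4 * t * t


def dh_fit(alpha, t):
    """Enthalpic term, obtained as dG_fit - (-T*dS_fit)."""
    a1, _, a3, a4 = alpha
    return a1 - a3 * t - a4 * t * t


def remove_mean_offset(delta_g):
    """Steps (B)-(C) on a uniform z grid: subtract the mean of delta_g."""
    mean = sum(delta_g) / len(delta_g)
    return [value - mean for value in delta_g]
```

For example, with the made-up parameters `alpha = (1.0, -0.02, 0.003, -1.0e-5)` at T = 318 K, `dh_fit(alpha, 318.0) + minus_tds_fit(alpha, 318.0)` reproduces `dg_fit(alpha, 318.0)` to rounding error, so once the four parameters are fitted at a given z, both components follow without further fitting.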
Thus, the unknown quantities ΔG(z,T1) - ΔG(z,T2) can be estimated to be ΔGfit(z,T1) - ΔGfit(z,T2). Finally, the curves of ΔH(z,T) and -TΔS(z,T) were smoothed using a Bézier curve.

Reference simulation for a free substrate

An additional simulation of the substrate in bulk water was performed as a reference, to compare the distribution of the peptide in free space with that in the confined space of the complex. The distribution of the temperatures was selected as:

318.00, 320.68, 323.38, 326.10, 328.84, 331.59, 334.36, 337.15, 339.95, 342.77, 345.62, 348.48, 351.36, 354.26, 357.17, 360.11, 363.06, 366.04, 369.03, 372.05, 375.09, 378.15, 381.22, 384.31, 387.42, 390.55, 393.70, 396.87, 400.06, 403.27, 406.50, 409.76, 413.03, 416.34, 419.65, 423.00, 426.36, 429.74, 433.16, 436.59, 440.04, 443.52, 447.02, 450.53, 454.08, 457.64, 461.23, 468.48

In this box, all the atoms of the substrate were more than 20 Å from the edge of the box of 53 Å × 55 Å × 60 Å. To neutralize the system, 2 chloride ions were added. In addition, 13 sodium ions and 13 chloride ions were added to the box at a concentration of 100 mM NaCl. The protocol for this REMD simulation was the same as that for the substrate in the complex. The REMD was carried out for 120 ns, and the last 100 ns of the trajectory was analyzed. The result showed that the radius of gyration of the substrate at 318 K was 7.62 ± 1.22 Å in bulk. The average numbers of residues forming α-helix, 3₁₀-helix, β-strand, turn and random coil in the 9-residue substrate were 1.38 × 10⁻², 1.79 × 10⁻¹, 8.00 × 10⁻⁵, 2.33 and 5.58, respectively. This indicates that the substrate was naturally unfolded.

FIGURE LEGENDS

Fig. S1. A snapshot of the front N-gate closed state observed in the PA26 – CPO complex is shown. Two cavities, represented with dots in blue and pink, were analyzed by GHECOM. The grid size of the dots is 2.0 Å. The blue and pink dots which are closest together are shown as bigger dots.
The depth of the N-gate, which is defined as the distance between these two bigger dots, is 2√5 Å in this figure. The position of the N-gate, which is defined as the middle point between the two bigger dots, is depicted as a cross. The circle at the position of the N-gate with a radius of half the depth of the N-gate and the circle at the center of mass of the substrate (z = 27.7 Å) with a radius equal to the radius of gyration (Rg = 6.35 Å) are shown as thin and thick circles, respectively. Residues 1 – 9 of the N-termini tails are depicted as traces of their Cα atoms. Tyr8 of the N-termini tails, Gly128 of the α-annulus, and the substrate are depicted as wire models in red, blue and black, respectively.

Fig. S2. The root mean square deviations (RMSDs) between the free-energy landscape observed during the simulation and an averaged free-energy landscape obtained from the last 100 ns with $\tau_F$ = 20 ps are plotted against the simulation time.

Fig. S3. The volumes of motion for the substrate are plotted against z.

Fig. S4. The average values and RMSDs of the radius of gyration of the substrate are plotted against z.

Fig. S5. The logarithm of the distributions of Tyr8 (Gly8) in the ordered (disordered) gates with regard to (r, z) is plotted. The minimum value is set at zero. The distribution of residue Tyr126 at the α-annulus is shown as a reference. For (d), the data of Gly8 at (7 Å ≤ r ≤ 8 Å, -3 Å ≤ z ≤ 1 Å) and (8 Å ≤ r ≤ 9 Å, -2 Å ≤ z ≤ 1 Å) were omitted for the clarity of the graph.

Fig. S6. A two-dimensional map of the number of contact pairs between the atoms of the interior of the complex and the atoms of the substrate is depicted. The positions of all the heavy atoms of the residues whose total number of contact pairs with the substrate contributed more than 10% of the number of pairs of contacting atoms at any z in Fig. 6 are plotted. The residues at z ≤ 14 Å are from the CP, while those at z > 14 Å are from the activator.
Tyr123–Arg130 of the CP form a part of the α-annulus, and His99–Asn104 (Glu102 in PA26 and Ala102 in the PAN-like) of the activator form a part of the activation loop. The residues encircled by a dotted line are the hydrophobic patch on the inner wall of the antechamber, while the residues encircled by a broken line are those which were observed only in the disordered-gate models. For the clarity of the graph, all values of more than 0.2 were set at 0.2. In contrast to the ordered-gate models, in the disordered-gate models the substrate made contact mainly with Tyr26 of the α-ring at z = ~7 – 17 Å. This is because Tyr26, which interacted with Tyr8 in the ordered-gate models, could not interact with Gly8 in the disordered-gate models and was exposed to the access of the substrate.

Fig. S7. The average values and RMSDs of the position of the N-gate are plotted against z.

REFERENCES

1. Ishida, H. (2011). Molecular Dynamics Simulation System for Structural Analysis of Biomolecules by High Performance Computing. Prog. Nucl. Sci. Technol. 2, 470-476.
2. Ishida, H. & Hayward, S. (2008). Path of Nascent Polypeptide in Exit Tunnel Revealed by Molecular Dynamics Simulation of Ribosome. Biophys. J. 95, 5962-5973.
3. Ishida, H. (2010). Branch Migration of Holliday Junction in RuvA Tetramer Complex Studied by Umbrella Sampling Simulation Using a Path-Search Algorithm. J. Comput. Chem. 31, 2317-2329.
4. Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A. & Simmerling, C. (2006). Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins 65, 712-725.
5. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926-935.
6. Deserno, M. & Holm, C. (1998). How to mesh up Ewald sums. I. A theoretical and numerical comparison of various particle mesh routines. J. Chem. Phys. 109, 7678-7693.
7. Babin, V., Roland, C. & Sagui, C. (2008). Adaptively biased molecular dynamics for free energy calculations. J. Chem. Phys. 128, 134101.
8. Sugita, Y. & Okamoto, Y. (1999). Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 314, 141-151.
9. Religa, T. L., Sprangers, R. & Kay, L. E. (2010). Dynamic Regulation of Archaeal Proteasome Gate Opening As Studied by TROSY NMR. Science 328, 98-102.
10. Patriksson, A. & van der Spoel, D. (2008). A temperature predictor for parallel tempering simulations. Phys. Chem. Chem. Phys. 10, 2073-2077.
11. Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. (1977). Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327-341.
12. van Gunsteren, W. F. & Berendsen, H. J. C. (1977). Algorithms for macromolecular dynamics and constraint dynamics. Mol. Phys. 34, 1311-1327.