Supplementary material for

A coarse-grained model with implicit salt for RNAs: predicting 3D structure, stability and salt effect

Ya-Zhou Shi†, Feng-Hua Wang†, Yuan-Yan Wu, and Zhi-Jie Tan*

Department of Physics and Key Laboratory of Artificial Micro- and Nano-structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China

† These authors contributed equally to the work.
* To whom correspondence should be addressed. Email: zjtan@whu.edu.cn

The energy function of the model

The total energy $U$ in the present model is composed of the following two components:

$$U = U_{\mathrm{bonded}} + U_{\mathrm{nonbonded}}. \tag{S1}$$

The bonded potential $U_{\mathrm{bonded}}$ mimics the connectivity and the local geometry of RNAs. It consists of the energies for the virtual bond length $U_b$, the bond angle $U_a$ and the dihedral $U_d$:

$$U_{\mathrm{bonded}} = U_b + U_a + U_d; \tag{S2}$$

$$U_b = \sum_{\mathrm{bonds}} K_b (r - r_0)^2; \tag{S3}$$

$$U_a = \sum_{\mathrm{angles}} K_\theta (\theta - \theta_0)^2; \tag{S4}$$

$$U_d = \sum_{\mathrm{dihedrals}} \left\{ K_\varphi \left[1 - \cos(\varphi - \varphi_0)\right] + \frac{1}{2} K_\varphi \left[1 - \cos 3(\varphi - \varphi_0)\right] \right\}, \tag{S5}$$

where $K_b$, $K_\theta$, and $K_\varphi$ are the energy strengths, and $r_0$, $\theta_0$, and $\varphi_0$ are the values of the virtual bond lengths, bond angles and dihedrals at the energy minimum, respectively. The initial parameters of these potentials were derived by Boltzmann inversion of the corresponding atomistic distribution functions $P(x)$, obtained by statistical analysis of the experimental structures in the PDB (listed in Table SI):

$$U(x) = -k_B T \ln[P(x)], \tag{S6}$$

where $k_B$ is the Boltzmann constant and $T$ is the absolute temperature in Kelvin. $P(x)$ is the normalized probability distribution of the corresponding bonded characteristic $x$ in the structures, such as a bond length or an angle; see Figs. S1-S3.

The nonbonded energy function $U_{\mathrm{nonbonded}}$ in Eq. S1 is composed of the following four components:

$$U_{\mathrm{nonbonded}} = U_{\mathrm{exc}} + U_{\mathrm{bp}} + U_{\mathrm{bs}} + U_{\mathrm{qq}}. \tag{S7}$$

$U_{\mathrm{exc}}$ represents the excluded volume interaction between the CG beads and is modeled by a purely repulsive Lennard-Jones potential:

$$U_{\mathrm{exc}} = \sum_{i<j} \begin{cases} 4\varepsilon \left[ \left( \dfrac{\sigma_0}{r_{ij}} \right)^{12} - \left( \dfrac{\sigma_0}{r_{ij}} \right)^{6} \right], & r_{ij} \le \sigma_0, \\ 0, & r_{ij} > \sigma_0, \end{cases} \tag{S8}$$

where $\varepsilon = 0.26$ kcal/mol is the interaction strength, $\sigma_0$ is the sum of the radii of beads $i$ and $j$, and $r_{ij}$ is the distance between beads $i$ and $j$.

$U_{\mathrm{qq}}$ in Eq. S7 is the electrostatic interaction. For simplicity, $U_{\mathrm{qq}}$ is given by the screened Coulombic interaction between phosphates with reduced charges:

$$U_{\mathrm{qq}} = \sum_{i<j}^{N_P} \frac{(Qe)^2}{4\pi \varepsilon_0 \varepsilon(T) r_{ij}} e^{-r_{ij}/l_D}, \tag{S9}$$

where $e$ is the elementary charge and $N_P$ is the number of phosphate beads in an RNA. Upon counterion condensation onto the RNA molecule, the charge of each phosphate group decreases from $-e$ to $-Qe$, where $Q = b/l_B$, $b$ is the charge spacing on the RNA backbone and $l_B$ is the Bjerrum length [75,81]. $l_D$ is the Debye length,

$$l_D = \left( \frac{\varepsilon_0 \varepsilon(T) k_B T}{2 N_A e^2 I} \right)^{1/2}, \tag{S10}$$

which defines the ionic screening. Here, $k_B$ is the Boltzmann constant, $N_A$ is the Avogadro constant, $T$ is the absolute temperature in Kelvin, $I$ is the ionic strength, which is proportional to the ion concentration of the salt solution, and $\varepsilon(T)$ is an effective dielectric constant of water that decreases with temperature [75,80]:

$$\varepsilon(T) = 87.74 - 0.4008\,T + 9.398 \times 10^{-4}\,T^2 - 1.41 \times 10^{-6}\,T^3, \tag{S11}$$

where $T$ in Eq. S11 is in °C. $l_D$ decreases slightly with increasing temperature and decreases rapidly with increasing ionic strength. It should be noted that the model can be further extended to involve explicit salt ions [78].

$U_{\mathrm{bp}}$ and $U_{\mathrm{bs}}$ are the base-pairing and base-stacking interactions, which play important roles in stabilizing the secondary structure of RNAs.
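The excluded-volume and screened-Coulomb terms of Eqs. S8-S11 can be sketched numerically as follows. This is a minimal illustration for a single bead pair, not the authors' implementation. The function names are hypothetical, the Bjerrum length is taken in its standard form $l_B = e^2 / (4\pi \varepsilon_0 \varepsilon(T) k_B T)$, and the charge spacing `b_spacing` is left as an input because its value is not given in this supplement.

```python
import math

# Physical constants in SI units.
E_CHARGE = 1.602176634e-19     # elementary charge, C
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
J_PER_KCAL_MOL = 4184.0 / N_A  # 1 kcal/mol expressed in J per pair

EPS_LJ = 0.26  # kcal/mol, interaction strength quoted with Eq. S8


def excluded_volume(r_ij, sigma0, eps=EPS_LJ):
    """Eq. S8 for one bead pair (kcal/mol); zero beyond sigma0."""
    if r_ij > sigma0:
        return 0.0
    x6 = (sigma0 / r_ij) ** 6
    return 4.0 * eps * (x6 * x6 - x6)


def dielectric_water(t_celsius):
    """Eq. S11; the polynomial takes T in degrees Celsius."""
    t = t_celsius
    return 87.74 - 0.4008 * t + 9.398e-4 * t**2 - 1.41e-6 * t**3


def debye_length(t_celsius, ionic_strength):
    """Eq. S10 in angstrom; ionic_strength is in mol/L."""
    t_kelvin = t_celsius + 273.15
    ionic_si = ionic_strength * 1000.0  # mol/L -> mol/m^3
    num = EPS0 * dielectric_water(t_celsius) * K_B * t_kelvin
    den = 2.0 * N_A * E_CHARGE**2 * ionic_si
    return math.sqrt(num / den) * 1e10


def bjerrum_length(t_celsius):
    """Standard Bjerrum length e^2 / (4 pi eps0 eps(T) kB T), in angstrom."""
    t_kelvin = t_celsius + 273.15
    thermal = 4.0 * math.pi * EPS0 * dielectric_water(t_celsius) * K_B * t_kelvin
    return E_CHARGE**2 / thermal * 1e10


def screened_coulomb(r_ij, t_celsius, ionic_strength, b_spacing):
    """Eq. S9 for one phosphate pair (kcal/mol); lengths in angstrom."""
    q = b_spacing / bjerrum_length(t_celsius)  # reduced charge fraction Q
    l_d = debye_length(t_celsius, ionic_strength)
    coulomb = (q * E_CHARGE) ** 2 / (
        4.0 * math.pi * EPS0 * dielectric_water(t_celsius) * r_ij * 1e-10)
    return coulomb / J_PER_KCAL_MOL * math.exp(-r_ij / l_d)


if __name__ == "__main__":
    # b_spacing = 5.5 angstrom is a placeholder, not a value from this document.
    print(debye_length(25.0, 0.1))
    print(screened_coulomb(10.0, 25.0, 0.1, b_spacing=5.5))
```

Keeping the constants in SI units and converting to angstrom and kcal/mol only at the end avoids a hand-derived Coulomb prefactor; the total $U_{\mathrm{exc}}$ and $U_{\mathrm{qq}}$ would follow by summing these pair terms over $i<j$.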
For the canonical Watson-Crick base pairs (G-C and A-U) and the wobble base pairs (G-U), if the distance $r_{N_iN_j}$ between two base beads $N_i$ and $N_j$ satisfies the pairing criterion $a_1 < r_{N_iN_j} < a_2$, the hydrogen bond is formed and the potential is given by

$$U_{\mathrm{bp}} = \sum_{i<j-3} \frac{\varepsilon_{\mathrm{bp}}}{1 + k_{NN} (r_{N_iN_j} - r_{NN})^2 + k_{CN} \sum_{i(j)} (r_{C_iN_j} - r_{CN})^2 + k_{PN} \sum_{i(j)} (r_{P_iN_j} - r_{PN})^2}, \tag{S12}$$

where $\varepsilon_{\mathrm{bp}}$ (<0) is the interaction strength, with $\varepsilon_{AU} = \varepsilon_{GU} = \gamma \varepsilon_{GC}$, and $\gamma$ is the ratio of pairing strength between different types of base pairs. $r_{NN}$, $r_{CN}$, and $r_{PN}$ are three reference distances (Fig. 1b) obtained from the known structures (Fig. S4a) that determine whether the base pair is well formed, and $k_{NN}$, $k_{CN}$ and $k_{PN}$ are the corresponding energy strengths. $\sum_{i(j)}$ in Eq. S12 stands for the summation over $i$ and $j$. In the model, one nucleotide cannot be paired with more than one nucleotide.

If two nearest-neighbour bases ($i$, $i+1$) are paired with two other neighbouring bases ($j$, $j-1$), respectively, a base stack is formed (Fig. 1c) and the base-stacking energy is given by

$$U_{\mathrm{bs}} = \frac{1}{2} \sum_{i,j} \left| G_{i,i+1,j-1,j} \right| \left\{ \left[ 5 \left( \frac{\sigma_{st}}{r_{i,i+1}} \right)^{12} - 6 \left( \frac{\sigma_{st}}{r_{i,i+1}} \right)^{10} \right] + \left[ 5 \left( \frac{\sigma_{st}}{r_{j,j-1}} \right)^{12} - 6 \left( \frac{\sigma_{st}}{r_{j,j-1}} \right)^{10} \right] \right\}, \tag{S13}$$

where $\sigma_{st}$ is the optimum distance between two neighbouring bases in the known helix structures (Fig. S4b). $G_{i,i+1,j-1,j}$ in Eq. S13 is the strength of the base-stacking energy and can be estimated by combining the experimental thermodynamic parameters with the Monte Carlo (MC) algorithm:

$$G_{i,i+1,j-1,j} = \Delta H - T(\Delta S - \Delta S_c), \tag{S14}$$

where $\Delta H$ and $\Delta S$ are the RNA thermodynamic parameters derived from experiments [83,84], $T$ is the absolute temperature in Kelvin, and $\Delta S_c$ is the conformational entropy change due to the formation of one base-pair stack, which is naturally included in the MC algorithm and is therefore subtracted from $\Delta S$. $\Delta S_c$ was calculated from MC simulations of A-form double-stranded RNA, as shown in Fig. S5a.
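Eqs. S12-S14 can be illustrated for a single base pair and a single stack with the sketch below. It is not the authors' code and rests on several assumptions: the function names are hypothetical; the value of $\varepsilon_{\mathrm{bp}}$ in Table SIII is taken to be $\varepsilon_{GC}$; $\sum_{i(j)}$ is read as a sum over the two cross distances of a pair (for example $C_iN_j$ and $C_jN_i$); and the $\Delta H$ and $\Delta S$ values in the usage lines are round placeholders, not experimental parameters. The sequence-separation condition $i<j-3$ is left to the caller.

```python
# Parameters copied from Table SIII of this supplement.
R_NN, R_CN, R_PN = 8.9, 12.2, 13.9  # reference distances, angstrom
K_NN, K_CN, K_PN = 3.6, 1.9, 0.7    # energy strengths as listed in Table SIII
EPS_GC = -3.5                       # kcal/mol (assumed to be the G-C value)
GAMMA = 0.5                         # eps_AU = eps_GU = GAMMA * eps_GC
A1, A2 = 8.6, 9.3                   # angstrom, pairing window for r_NN
SIGMA_ST = 4.8                      # angstrom, optimum stacking distance
DELTA_S_C = -9.7                    # eu, i.e. cal/(mol K)


def pairing_energy(r_nn, r_cn, r_pn, pair="GC"):
    """Eq. S12 for one candidate pair (kcal/mol).

    r_cn and r_pn are 2-tuples with the two cross distances of the pair,
    (C_i N_j, C_j N_i) and (P_i N_j, P_j N_i), in angstrom.
    """
    if not (A1 < r_nn < A2):
        return 0.0  # pairing criterion not met, no hydrogen bond
    eps = EPS_GC if pair == "GC" else GAMMA * EPS_GC
    penalty = K_NN * (r_nn - R_NN) ** 2
    penalty += K_CN * sum((r - R_CN) ** 2 for r in r_cn)
    penalty += K_PN * sum((r - R_PN) ** 2 for r in r_pn)
    return eps / (1.0 + penalty)


def stacking_strength(delta_h, delta_s, t_kelvin, delta_s_c=DELTA_S_C):
    """Eq. S14; delta_h in kcal/mol, entropies in eu, result in kcal/mol."""
    return delta_h - t_kelvin * (delta_s - delta_s_c) / 1000.0


def stacking_energy(g, r_i, r_j, sigma_st=SIGMA_ST):
    """Eq. S13 for one stack; r_i = r_{i,i+1}, r_j = r_{j,j-1} in angstrom."""
    def well(r):
        x = sigma_st / r
        return 5.0 * x**12 - 6.0 * x**10
    return 0.5 * abs(g) * (well(r_i) + well(r_j))


if __name__ == "__main__":
    # Placeholder thermodynamic inputs, chosen only to exercise the formulas.
    g = stacking_strength(delta_h=-10.0, delta_s=-25.0, t_kelvin=310.15)
    print(g, stacking_energy(g, 4.8, 4.8))
    print(pairing_energy(8.9, (12.2, 12.2), (13.9, 13.9), pair="AU"))
```

Each 12-10 bracket in Eq. S13 has its minimum of $-1$ at $r = \sigma_{st}$, so a perfectly formed stack contributes $-|G|$; similarly, a pair at the reference distances contributes the full $\varepsilon_{\mathrm{bp}}$.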
In the simulations, we fixed the entire molecule except for nucleotides ≤ i or ≥ j and counted the number $\Omega$ of conformations that satisfy the condition of stacking between base pairs ($i$, $j$) and ($i+1$, $j-1$) in the absence of the base-pairing and base-stacking constraints. Based on these, the conformational entropy change for the formation of base stacking between base pairs ($i$, $j$) and ($i+1$, $j-1$) is calculated by

$$\Delta S_c = k_B \ln(\Omega / \Omega_0), \tag{S15}$$

where $k_B$ is the Boltzmann constant and $\Omega_0$ is the total number of conformations searched in the simulation. As shown in Fig. S5b, $\Delta S_c$ varies only slightly with the base-pair location $i$. Consequently, for simplicity, the average value of -9.7 eu was used in the present model.

Figure S1. The normalized probability distributions P(r) of the virtual bond length for PC, CP and CN shown in Fig. 1, which are obtained by the statistical analysis over the structures of 157 RNA molecules (PDB codes are listed in Table SI). Solid lines: for helical parts in RNAs. Dotted lines: for nonhelical parts in RNAs.

Figure S2. The normalized probability distributions P(θ) of the bond angles for PCP (a), CPC (b), PCN (c) and NCP (d) shown in Fig. 1, which are obtained by the statistical analysis over the structures of 157 RNA molecules (PDB codes are listed in Table SI). Solid lines: for helical parts in RNAs. Dotted lines: for nonhelical parts in RNAs.

Figure S3. The normalized probability distributions P(φ) of bond dihedrals for the RNA backbone (a) and the side chains (b) shown in Fig. 1, which are obtained by the statistical analysis over the structures of 157 RNA molecules (PDB codes are listed in Table SI). The inset in (a) shows the zoomed portion of the figure near the baseline in the interval [-1.5, 1.5]. Solid lines: for helical parts in RNAs. Dotted lines: for nonhelical parts in RNAs.

Figure S4. The normalized probability distributions P(r) of distances for base pairing (a) and base stacking (b) shown in Fig. 1, which are obtained by the statistical analysis over the structures of 157 RNAs (PDB codes are listed in Table SI).

Figure S5. Illustration of the calculation of the conformational entropy change ΔSc (in Eq. 5 of the main text and Eq. S14) of base-stacking formation, which is naturally included in the MC simulations. (a) Schematic diagram of the formation of one base stack between base pairs (i, j) and (i+1, j-1). (b) The conformational entropy change ΔSc for the formation of a base-pair stack at different locations i (symbols), and the average value of -9.7 eu (line).

Figure S6. The fraction of denatured base pairs as a function of temperature at different [Na+] for the three RNAs, RH23 (a), RH24 (b) and RH30 (c), shown in Table I in the main text. Symbols: the predicted data. Bold lines: fits to the predicted data through Eq. 7 in the main text.

Table SI. The PDB codes of the RNAs used in our statistical analysis for the CG force field.

1a51 1a60 1aq0 1atv 1bn0 1bz3 1cq5 1cql 1e95 1esh 1f6z 1f7f 1f7h 1f84 1f85 1f9l 1fhk 1fqz 1fyo 1fyp 1hs1 1hs3 1hs4 1i3x 1i3y 1i4b 1ie1 1ikl 1j4y 1jox 1jp0 1jtj 1ju7 1jwc 1jzc 1k2g 1k4b 1k5i 1k6g 1kka 1kp7 1lc6 1m82 1mfj 1mfk 1mfy 1mt4 1na2 1nc0 1osw 1owk 1p5m 1p5n 1q75 1qc8 1qwa 1r4h 1r7z 1s34 1syz 1szy 1tbk 1tjz 1u3k 1wks 1xsh 1xsu 1swu 1ymo 1yn1 1yn2 1ync 1yne 1yng 1ysv 1z2j 1z30 1z31 28sp 28sr 2b7g 2b57 2cd3 2cd5 2es5 2euy 2f87 2f88 2frl 2gio 2gip 2gv3 2gv4 2gvo 2h2x 2hem 2hns 2hua 2ixy 2jr4 2jse 2jsg 2jtp 2juk 2jxu 2jym 2k5z 2k95 2kd8 2kez 2kf0 2khy 2koc 2kpc 2kpd 2ktz 2ku0 2kur 2kuu 2kuw 2kvn 2l5z 2l94 2lac 2lbj 2lbk 2lbl 2lc8 2ldl 2ldt 2lhp 2li4 2lk3 2lpa 2lps 2lpt 2lu8 2m12 2m21 2m22 2m58 2m8k 2o33 2qh2 2qh3 2qh4 2rlu 2ro2 2rpk 2rpt 387d 3la5 3php 3sd1 4a4r 4a4s 4a4t

Table SII. The parameters of bonded potentials in Eqs. S2-S5.
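
Bond $U_b$

| Bond | $K_b$ (kcal/mol), Para_helical (a) | $K_b$ (kcal/mol), Para_nonhelical (b) | $r_0$ (Å), Para_helical | $r_0$ (Å), Para_nonhelical |
|---|---|---|---|---|
| $P_iC_i$ | 133.4 | 98.2 | 3.95 | 3.95 |
| $C_iP_{i+1}$ | 75.0 | 42.5 | 3.93 | 3.93 |
| $C_iN_i$ | 85.6 | 24.8 | 3.35 | 3.45 |

Angle $U_a$

| Angle | $K_\theta$ (kcal/mol), Para_helical | $K_\theta$ (kcal/mol), Para_nonhelical | $\theta_0$ (rad), Para_helical | $\theta_0$ (rad), Para_nonhelical |
|---|---|---|---|---|
| $P_iC_iP_{i+1}$ | 18.3 | 9.3 | 1.74 | 1.75 |
| $C_{i-1}P_iC_i$ | 43.9 | 21.3 | 1.76 | 1.78 |
| $P_iC_iN_i$ | 35.5 | 9.7 | 1.63 | 1.64 |
| $N_iC_iP_{i+1}$ | 99.8 | 15.2 | 1.66 | 1.66 |

Dihedral $U_d$

| Dihedral | $K_\varphi$ (kcal/mol), Para_helical | $K_\varphi$ (kcal/mol), Para_nonhelical | $\varphi_0$ (rad), Para_helical | $\varphi_0$ (rad), Para_nonhelical |
|---|---|---|---|---|
| $P_iC_iP_{i+1}C_{i+1}$ | 2.8 | 1.1 | 2.56 | 2.51 |
| $C_{i-1}P_iC_iP_{i+1}$ | 10.5 | 4.3 | -2.94 | -2.92 |
| $C_{i-1}P_iC_iN_i$ | 3.8 | 0.8 | -1.16 | -1.18 |
| $N_{i-1}C_{i-1}P_iC_i$ | 4.2 | 0.7 | 0.88 | 0.78 |

(a) The Para_helical parameters are used only in the folded-structure refinement for the base-pairing regions (stems) of the initially folded structure.
(b) The Para_nonhelical parameters are used in the RNA folding process, where RNAs are described as free chains.

Table SIII. The parameters for the energy functions of base pairing and base stacking.

Base pairing $U_{\mathrm{bp}}$

| Distances | $r_{NN}$ | $r_{CN}$ | $r_{PN}$ |
|---|---|---|---|
| r (Å) | 8.9 | 12.2 | 13.9 |

| Energy strength | $k_{NN}$ | $k_{CN}$ | $k_{PN}$ |
|---|---|---|---|
| k (kcal/mol) | 3.6 | 1.9 | 0.7 |

$\varepsilon_{\mathrm{bp}}$ = -3.5 kcal/mol; $a_1$ = 8.6 Å; $a_2$ = 9.3 Å; $\gamma$ = 0.5

Base stacking $U_{\mathrm{bs}}$

$\Delta S_c$ = -9.7 eu; $\sigma_{st}$ = 4.8 Å

Table SIV. The 46 RNA molecules for 3D structure prediction used in this work.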

| RNAs (a) | PDB ID | Length (nt) | Type of structure | RMSD_MC-Sym (d) (Å) | RMSD_mean (b) (Å) | RMSD_minimum (c) (Å) |
|---|---|---|---|---|---|---|
| 1 | 1zih | 12 | Hairpin | 2.1 | 2.0 | 1.8 |
| 2 | 2f87 | 12 | Hairpin | 2.9 | 2.6 | 1.2 |
| 3 | 2y95 | 14 | Hairpin | 2.0 | 2.2 | 1.0 |
| 4 | 1oq0 | 15 | Hairpin | 2.7 | 3.4 | 2.0 |
| 5 | 1q75 | 15 | Hairpin | 2.1 | 2.5 | 1.4 |
| 6 | 2lp9 | 16 | Hairpin with bulge loop | 2.8 | 2.7 | 1.5 |
| 7 | 1atv | 17 | Hairpin | 2.5 | 2.6 | 1.5 |
| 8 | 1j4y | 17 | Hairpin | 4.6 | 4.0 | 2.8 |
| 9 | 1kka | 17 | Hairpin | 4.9 | 4.2 | 1.6 |
| 10 | 1z30 | 18 | Hairpin | 2.6 | 1.9 | 0.9 |
| 11 | 1esy | 19 | Hairpin with internal loop | 4.6 | 2.9 | 1.9 |
| 12 | 1i3x | 19 | Hairpin | 2.8 | 2.8 | 1.6 |
| 13 | 2b7g | 19 | Hairpin | 2.8 | 2.6 | 1.0 |
| 14 | 2meq | 19 | Hairpin | 6.1 | 2.9 | 1.3 |
| 15 | 1bn0 | 20 | Hairpin | 3.6 | 2.1 | 1.2 |
| 16 | 1u2a | 20 | Hairpin | 3.0 | 2.6 | 1.5 |
| 17 | 1qwa | 21 | Hairpin with bulge loop | 4.6 | 3.0 | 1.6 |
| 18 | 2grw | 22 | Hairpin with internal loop | 3.9 | 3.8 | 2.2 |
| 19 | 1k6g | 22 | Hairpin | 3.2 | 2.9 | 1.1 |
| 20 | 2kd8 | 22 | Hairpin | 2.9 | 2.4 | 1.1 |
| 21 | 1bgz | 23 | Hairpin with bulge loop | 4.8 | 4.6 | 2.1 |
| 22 | 2ro2 | 23 | Hairpin | 2.6 | 2.9 | 1.5 |
| 23 | 3php | 23 | Hairpin | 3.8 | 3.9 | 1.8 |
| 24 | 1lc6 | 24 | Hairpin with internal loop | 5.7 | 4.1 | 2.3 |
| 25 | 2hem | 24 | Hairpin with internal loop | 4.2 | 3.1 | 1.8 |
| 26 | 2lv0 | 24 | Hairpin with internal loop | 3.9 | 3.1 | 1.5 |
| 27 | 1m82 | 25 | Hairpin with bulge loop | 2.2 | 2.4 | 1.0 |
| 28 | 2a43 | 26 | Pseudoknot | 4.3 | 4.2 | 2.7 |
| 29 | 1fqz | 27 | Hairpin with internal loop | 3.4 | 5.2 | 2.3 |
| 30 | 1q9a | 27 | Hairpin with internal loop | 5.8 | 5.4 | 3.8 |
| 31 | 2aht | 27 | Hairpin with bulge loop | 3.8 | 3.4 | 1.5 |
| 32 | 1f6x | 27 | Hairpin with bulge loop | 2.7 | 2.9 | 1.7 |
| 33 | 1kpz | 28 | Pseudoknot | 5.7 | 4.2 | 2.7 |
| 34 | 1l2x | 28 | Pseudoknot | 5.9 | 5.2 | 3.5 |
| 35 | 28sp | 28 | Hairpin with internal loop | 3.7 | 3.9 | 2.1 |
| 36 | 28sr | 28 | Hairpin with internal loop | 3.6 | 4.0 | 2.4 |
| 37 | 437d | 28 | Pseudoknot | 5.9 | 5.4 | 3.4 |
| 38 | 1ebq | 29 | Hairpin with internal loop | 3.7 | 4.5 | 2.4 |
| 39 | 2ldz | 30 | Hairpin with internal loop | 4.7 | 5.0 | 2.6 |
| 40 | 1rfr | 30 | Hairpin with internal loop | 3.8 | 3.8 | 1.9 |
| 41 | 1yne | 31 | Hairpin with bulge loop | 3.9 | 2.7 | 1.8 |
| 42 | 2f88 | 34 | Hairpin with internal loop | 2.6 | 3.4 | 2.3 |
| 43 | 1txs | 38 | Hairpin with internal loop | 3.2 | 4.7 | 2.8 |
| 44 | 1zc5 | 41 | Hairpin with bulge loop | 5.9 | 3.4 | 1.8 |
| 45 | 2l2j | 42 | Hairpin with internal loop | 4.3 | 3.7 | 2.0 |
| 46 | 1z2j | 45 | Hairpin with bulge loop | 8.5 | 4.2 | 3.1 |

(a) For each RNA, the PDB ID, chain length, type of structure, and the mean and the minimum RMSDs are listed.
(b, c) The mean and the minimum RMSDs are calculated over the C beads of the structures predicted by the present model, relative to the corresponding C4' atoms of the native structure.
(d) The RMSDs are calculated over the C4' atoms of the top 1 structure for each RNA predicted by the MC-Fold/MC-Sym pipeline [45] (http://www.major.iric.ca/MC-Fold/), relative to the native structure.

References

45. Parisien, M. and Major, F. (2008) The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature, 452, 51-55.
75. Denesyuk, N. and Thirumalai, D. (2013) Coarse-grained model for predicting RNA folding thermodynamics. J. Phys. Chem. B, 117, 4901-4911.
78. Wang, F.H., Wu, Y.Y. and Tan, Z.J. (2013) Salt contribution to the flexibility of single-stranded nucleic acid of finite length. Biopolymers, 99, 370-381.
80. Tan, Z.J. and Chen, S.J. (2006) Nucleic acid helix stability: effects of salt concentration, cation valence and size, and chain length. Biophys. J., 90, 1175-1190.
81. Manning, G.S. (1978) The molecular theory of polyelectrolyte solutions with applications to the electrostatic properties of polynucleotides. Q. Rev. Biophys., 11, 179-246.
83. Xia, T., SantaLucia, J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C. and Turner, D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry, 37, 14719-14735.
84. Mathews, D.H., Sabina, J., Zuker, M. and Turner, D.H. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911-940.