Supplementary material for
A coarse-grained model with implicit salt for RNAs: predicting 3D structure,
stability and salt effect
Ya-Zhou Shi†, Feng-Hua Wang†, Yuan-Yan Wu, and Zhi-Jie Tan*
Department of Physics and Key Laboratory of Artificial Micro- and Nano-structures of Ministry of
Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
† The authors contributed equally to the work.
* To whom correspondence should be addressed. Email: zjtan@whu.edu.cn
The energy function of the model
The total energy U in the present model is composed of the following two components:
$$U = U_{\mathrm{bonded}} + U_{\mathrm{nonbonded}}. \tag{S1}$$
The bonded potential Ubonded is used to mimic the connectivity and the local geometry of RNAs.
Ubonded consists of the energies for virtual bond length Ub, bond angle Ua and dihedral Ud,
$$U_{\mathrm{bonded}} = U_b + U_a + U_d; \tag{S2}$$
$$U_b = \sum_{\mathrm{bonds}} K_b (r - r_0)^2; \tag{S3}$$
$$U_a = \sum_{\mathrm{angles}} K_\theta (\theta - \theta_0)^2; \tag{S4}$$
$$U_d = \sum_{\mathrm{dihedrals}} \left\{ K_\varphi [1 - \cos(\varphi - \varphi_0)] + \frac{1}{2} K_\varphi [1 - \cos 3(\varphi - \varphi_0)] \right\}, \tag{S5}$$
where Kb, Kθ, and Kφ are the energy strengths, and r0, θ0, and φ0 are the values of the virtual
bond lengths, bond angles and dihedrals at the energy minimum, respectively. The initial parameters
of these potentials were derived by Boltzmann inversion of the corresponding atomistic distribution
functions P(x), obtained by statistical analysis of the experimental structures in the PDB (listed in
Table SI):
$$U(x) = -k_B T \ln[P(x)], \tag{S6}$$
where kB is the Boltzmann constant and T is the absolute temperature in Kelvin. P(x) is the
normalized probability distribution of the corresponding bonded variable x in the structures, such
as a bond length or a bond angle; see Figs. S1-S3.
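As a rough illustration of the Boltzmann inversion in Eq. S6, the sketch below converts a normalized histogram P(x) into U(x) and reads a harmonic strength off the curvature at the minimum. The Gaussian test distribution, its width, the temperature of 298.15 K and the function names are assumptions made for this example; the supplement does not state how the initial parameters were fitted to the inverted potentials.

```python
import math

K_B = 0.0019872041  # Boltzmann constant per mole, kcal/(mol K)


def boltzmann_inversion(probabilities, temperature_k):
    """Eq. S6: U(x) = -kB*T*ln P(x) for each bin; empty bins map to +infinity."""
    kt = K_B * temperature_k
    return [-kt * math.log(p) if p > 0.0 else math.inf for p in probabilities]


def harmonic_fit(xs, us):
    """Estimate x0 and K of U = K*(x - x0)^2 from the curvature at the minimum.

    Uses a three-point finite difference, so xs must be evenly spaced.
    """
    i = min(range(1, len(us) - 1), key=lambda n: us[n])
    h = xs[i + 1] - xs[i]
    second_derivative = (us[i + 1] - 2.0 * us[i] + us[i - 1]) / (h * h)
    return xs[i], 0.5 * second_derivative


# Synthetic, made-up distribution: a Gaussian P(r) centred at 3.95 Å.
TEMPERATURE_K = 298.15  # assumed; the temperature used for the inversion is not stated
SIGMA = 0.07            # hypothetical width in Å
xs = [3.60 + 0.01 * n for n in range(71)]
weights = [math.exp(-((x - 3.95) ** 2) / (2.0 * SIGMA ** 2)) for x in xs]
total = sum(weights)
probabilities = [w / total for w in weights]

potential = boltzmann_inversion(probabilities, TEMPERATURE_K)
r0, k_b = harmonic_fit(xs, potential)
print(f"r0 = {r0:.2f} Å, Kb = {k_b:.1f} kcal/mol/Å^2")
```

For a Gaussian distribution the inverted potential is exactly quadratic, so the recovered strength equals kB·T/(2σ²); real PDB histograms are only approximately harmonic near their peak.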
The nonbonded energy function Unonbonded in Eq. S1 is composed of the following four
components:
$$U_{\mathrm{nonbonded}} = U_{exc} + U_{qq} + U_{bp} + U_{bs}. \tag{S7}$$
Uexc represents the excluded volume interaction between the CG beads and it is modeled by a purely
repulsive Lennard-Jones potential:
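$$U_{exc} = \sum_{i<j}^{N} \begin{cases} 4\varepsilon \left[ \left( \dfrac{\sigma_0}{r_{ij}} \right)^{12} - \left( \dfrac{\sigma_0}{r_{ij}} \right)^{6} \right] & \text{if } r_{ij} \le \sigma_0 \\ 0 & \text{if } r_{ij} > \sigma_0 \end{cases}, \tag{S8}$$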
where ε = 0.26 kcal/mol is the interaction strength, σ0 is the sum of the radii of beads i and j, and
rij is the distance between beads i and j.
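The excluded-volume term of Eq. S8 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bead radii and coordinates are hypothetical (the supplement does not list the radii), and the sum is taken over all pairs i < j exactly as Eq. S8 is written, without any exclusion of bonded neighbours.

```python
import math

EPSILON = 0.26  # kcal/mol, interaction strength given in the text


def u_exc_pair(r, sigma0, eps=EPSILON):
    """Eq. S8 for one bead pair: truncated Lennard-Jones, zero beyond sigma0."""
    if r > sigma0:
        return 0.0
    x6 = (sigma0 / r) ** 6
    return 4.0 * eps * (x6 * x6 - x6)


def u_exc(coords, radii):
    """Sum Eq. S8 over all pairs i < j; sigma0 is the sum of the two bead radii."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += u_exc_pair(r, radii[i] + radii[j])
    return total


# Hypothetical three-bead configuration (coordinates and radii in Å).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
radii = [2.0, 2.0, 2.0]
print(f"U_exc = {u_exc(coords, radii):.2f} kcal/mol")
```

Because the bracket in Eq. S8 vanishes at rij = σ0, the potential is continuous at the cutoff and non-negative everywhere, which is what makes it purely repulsive.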
Uqq in Eq. S7 is the electrostatic interaction. For simplicity, Uqq can be given by the screened
Coulombic interaction between phosphates with reduced charges
$$U_{qq} = \sum_{i<j}^{N_P} \frac{(Qe)^2}{4\pi\varepsilon_0 \varepsilon(T) r_{ij}} e^{-r_{ij}/l_D}, \tag{S9}$$
where e is the elementary charge and NP is the number of phosphate beads in an RNA. Upon
counterion condensation onto the RNA molecule, the charge of each phosphate group decreases from
–e to –Qe, where Q is given by Q = b/lB, b is the charge spacing on the RNA backbone and lB is the
Bjerrum length [75,81]. lD is the Debye length
$$l_D = \left( \frac{\varepsilon_0 \varepsilon(T) k_B T}{2 N_A e^2 I} \right)^{1/2}, \tag{S10}$$
which defines the ionic screening. Here, kB is the Boltzmann constant, NA is the Avogadro constant,
T is the absolute temperature in Kelvin, I is the ionic strength, which is proportional to the ion
concentration of the salt solution, and ε(T) is the effective dielectric constant of water, which
decreases with the temperature T [75,80]:
$$\varepsilon(T) = 87.74 - 0.4008\,T + 9.398 \times 10^{-4}\,T^2 - 1.41 \times 10^{-6}\,T^3, \tag{S11}$$
where T is in ℃. lD decreases slightly with increasing temperature and decreases rapidly with
increasing ionic strength. It should be noted that the model can be further extended to involve
explicit salt ions [78].
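The electrostatic ingredients of Eqs. S9-S11 can be evaluated numerically as in the sketch below. It is a hedged illustration: the SI constants are standard, but the conversion of the ionic strength from mol/L to mol/m³, the standard definition of the Bjerrum length lB = e²/(4πε0ε(T)kBT), and the charge spacing b passed by the caller are assumptions of this example (the supplement does not give a value for b).

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
J_TO_KCAL_PER_MOL = N_A / 4184.0


def dielectric(t_celsius):
    """Eq. S11: effective dielectric constant of water, T in degrees Celsius."""
    t = t_celsius
    return 87.74 - 0.4008 * t + 9.398e-4 * t ** 2 - 1.41e-6 * t ** 3


def debye_length(t_celsius, ionic_strength_molar):
    """Eq. S10 in Å; the ionic strength is converted from mol/L to mol/m^3."""
    t_kelvin = t_celsius + 273.15
    i_si = ionic_strength_molar * 1000.0
    numerator = EPS0 * dielectric(t_celsius) * K_B * t_kelvin
    return math.sqrt(numerator / (2.0 * N_A * E_CHARGE ** 2 * i_si)) * 1e10


def bjerrum_length(t_celsius):
    """Bjerrum length in Å (standard definition, assumed here)."""
    t_kelvin = t_celsius + 273.15
    denominator = 4.0 * math.pi * EPS0 * dielectric(t_celsius) * K_B * t_kelvin
    return E_CHARGE ** 2 / denominator * 1e10


def u_qq_pair(r, t_celsius, ionic_strength_molar, b):
    """Eq. S9 for one phosphate pair, in kcal/mol; r and b are in Å."""
    q = b / bjerrum_length(t_celsius)  # reduced charge Q = b / lB
    coulomb_joule = (q * E_CHARGE) ** 2 / (
        4.0 * math.pi * EPS0 * dielectric(t_celsius) * r * 1e-10)
    screening = math.exp(-r / debye_length(t_celsius, ionic_strength_molar))
    return coulomb_joule * J_TO_KCAL_PER_MOL * screening


B_SPACING = 5.0  # Å, hypothetical placeholder; not a value from the supplement
print(f"eps(25 C) = {dielectric(25.0):.2f}")
print(f"lD(25 C, 0.1 M) = {debye_length(25.0, 0.1):.2f} Å")
print(f"Uqq(r = 6 Å) = {u_qq_pair(6.0, 25.0, 0.1, B_SPACING):.3f} kcal/mol")
```

The two trends stated in the text can be checked directly: raising the ionic strength from 0.1 M to 1 M shortens lD by a factor of √10, whereas heating from 25 ℃ to 80 ℃ changes it only slightly.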
Ubp and Ubs are the base-pairing and base-stacking interactions, which play important roles in
stabilizing the secondary structure of RNAs. For the canonical Watson-Crick base pairs (G-C and
A-U) and the wobble base pairs (G-U), if the distance rNiNj between two base beads Ni and Nj
satisfies the pairing criterion a1 < rNiNj < a2, a hydrogen bond is formed and the potential is given by
$$U_{bp} = \sum_{i<j-3}^{N_{bp}} \frac{\varepsilon_{bp}}{1 + k_{NN} (r_{N_i N_j} - r_{NN})^2 + k_{CN} \sum_{i(j)} (r_{C_i N_j} - r_{CN})^2 + k_{PN} \sum_{i(j)} (r_{P_i N_j} - r_{PN})^2}, \tag{S12}$$
where εbp (<0) is the interaction strength, with εAU = εGU = γεGC, where γ describes the ratio of
pairing strength between different types of base pairs. rNN, rCN, and rPN are three distances
(Fig. 1b) obtained from the known structures (Fig. S4a) to determine whether the base pair is well
formed, and kNN, kCN and kPN are the corresponding energy strengths. i(j) in Eq. S12 stands for
the summation over i and j.
In the model, one nucleotide cannot be paired with more than one other nucleotide. If two
nearest-neighbour bases (i, i+1) are paired with two other neighbouring bases (j, j-1), respectively,
base stacking is formed (Fig. 1c) and the base-stacking energy is given by
$$U_{bs} = \frac{1}{2} \sum_{i,j}^{N_{st}} |G_{i,i+1,j-1,j}| \left\{ \left[ 5 \left( \frac{\sigma_{st}}{r_{i,i+1}} \right)^{12} - 6 \left( \frac{\sigma_{st}}{r_{i,i+1}} \right)^{10} \right] + \left[ 5 \left( \frac{\sigma_{st}}{r_{j,j-1}} \right)^{12} - 6 \left( \frac{\sigma_{st}}{r_{j,j-1}} \right)^{10} \right] \right\}, \tag{S13}$$
where σst is the optimum distance between two neighbouring bases in the known helix structures
(Fig. S4b). Gi,i+1,j-1,j in Eq. S13 is the strength of the base-stacking energy and can be estimated
by combining the experimental thermodynamic parameters with the Monte Carlo (MC) algorithm:
$$G_{i,i+1,j-1,j} = \Delta H - T(\Delta S - \Delta S_c), \tag{S14}$$
where ΔH and ΔS are the RNA thermodynamic parameters derived from experiments [83,84], T is the
absolute temperature in Kelvin, and ΔSc is the conformational entropy change due to the formation
of one base-pair stacking. Because ΔSc is naturally included in the MC algorithm, it is subtracted
from ΔS in Eq. S14 to avoid counting it twice. ΔSc was calculated from MC simulations for A-form
double-stranded RNA, as shown in Fig. S5a. In the simulations, we fixed the entire molecule except
for nucleotides ≤ i or ≥ j and counted the number Ω of conformations that satisfy the condition of
stacking between base pairs (i, j) and (i+1, j-1) in the absence of the base-pairing and
base-stacking constraints. Based on these, the conformational entropy change for the formation of
base stacking between base pairs (i, j) and (i+1, j-1) is calculated by
$$\Delta S_c = k_B \ln(\Omega / \Omega_0), \tag{S15}$$
where kB is the Boltzmann constant, and Ω0 is the total number of conformations searched in the
simulation. As shown in Fig. S5b, ΔSc varies only slightly with the base-pair location i.
Consequently, for simplicity, the average value of -9.7 eu was used in the present model.
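Eqs. S13-S15 can be combined in a few lines, as sketched below. The units are an assumption of this example: ΔH in kcal/mol, ΔS and ΔSc in eu (cal/(mol K)), with kB taken per mole. The ΔH and ΔS numbers in the usage lines are made-up placeholders, not values from Refs. 83 and 84; ΔSc = -9.7 eu is the average quoted above and σst = 4.8 Å is taken from Table SIII.

```python
import math

K_B_EU = 1.9872041   # Boltzmann constant per mole in cal/(mol K), i.e. in eu
DELTA_S_C = -9.7     # eu, average value quoted in the text
SIGMA_ST = 4.8       # Å, from Table SIII


def conformational_entropy(omega, omega0):
    """Eq. S15: entropy change (eu) from the fraction of stacked conformations."""
    return K_B_EU * math.log(omega / omega0)


def stacking_strength(delta_h, delta_s, temperature_k, delta_s_c=DELTA_S_C):
    """Eq. S14 in kcal/mol; delta_h in kcal/mol, entropies in eu."""
    return delta_h - temperature_k * (delta_s - delta_s_c) / 1000.0


def u_bs_term(g, r_i, r_j, sigma=SIGMA_ST):
    """One term of the sum in Eq. S13, for distances r_{i,i+1} and r_{j,j-1} in Å."""
    def well(r):
        x = sigma / r
        return 5.0 * x ** 12 - 6.0 * x ** 10
    return 0.5 * abs(g) * (well(r_i) + well(r_j))


# Placeholder thermodynamic parameters, for illustration only.
g = stacking_strength(delta_h=-10.0, delta_s=-25.0, temperature_k=298.15)
print(f"G = {g:.3f} kcal/mol")
print(f"U_bs at optimum = {u_bs_term(g, SIGMA_ST, SIGMA_ST):.3f} kcal/mol")
```

Each 12-10 bracket in Eq. S13 has its minimum value of -1 at r = σst, so with the prefactor 1/2 a perfectly stacked pair of base pairs contributes exactly -|G|.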
Figure S1. The normalized probability distributions P(r) of the virtual bond length for PC, CP and
CN shown in Fig. 1, which are obtained by the statistical analysis over the structures of 157 RNA
molecules (PDB codes are listed in Table SI). Solid lines: for helical parts in RNAs. Dotted lines: for
nonhelical parts in RNAs.
Figure S2. The normalized probability distributions P(θ) of the bond angles for PCP (a), CPC (b),
PCN (c) and NCP (d) shown in Fig. 1, which are obtained by the statistical analysis over the
structures of 157 RNA molecules (PDB codes are listed in Table SI). Solid lines: for helical parts in
RNAs. Dotted lines: for nonhelical parts in RNAs.
Figure S3. The normalized probability distributions P(φ) of bond dihedrals for the RNA backbone (a)
and the side-chains (b) shown in Fig. 1, which are obtained by the statistical analysis over the
structures of 157 RNA molecules (PDB codes are listed in Table SI). The inset in (a) shows the
zoomed portion of the figure near the baseline in the interval of [-1.5, 1.5]. Solid lines: for helical
parts in RNAs. Dotted lines: for nonhelical parts in RNAs.
Figure S4. The normalized probability distributions P(r) of distances for base-pairing (a) and
base-stacking (b) shown in Fig. 1, which are obtained by the statistical analysis over the structures of
157 RNAs (PDB codes are listed in Table SI).
Figure S5. The illustration for the calculation of the conformational entropy changes ΔSc (in Eq. 5
and Eq. S14) of base-stacking formation naturally included in the MC simulations. (a) The schematic
diagram for the formation of one base-stacking between base pairs (i, j) and (i+1, j-1). (b) The
conformational entropy changes ΔSc for the formation of base-pairs stacking at different location i
(symbols), and the average value of -9.7 eu (line).
Figure S6. The fraction of denatured base pairs as a function of temperature at different [Na+]’s for
the three RNAs, RH23 (a), RH24 (b) and RH30 (c), shown in Table I in the main text. Symbols: the
predicted data; Bold lines: fitted to the predicted data through Eq. 7 in the main text.
Table SI. The PDB codes of RNAs used in our statistical analysis for CG force field.
1a51
1a60
1aq0
1atv
1bn0
1bz3
1cq5
1cql
1e95
1esh
1f6z
1f7f
1f7h
1f84
1f85
1f9l
1fhk
1fqz
1fyo
1fyp
1hs1
1hs3
1hs4
1i3x
1i3y
1i4b
1ie1
1ikl
1j4y
1jox
1jp0
1jtj
1ju7
1jwc
1jzc
1k2g
1k4b
1k5i
1k6g
1kka
1kp7
1lc6
1m82
1mfj
1mfk
1mfy
1mt4
1na2
1nc0
1osw
1owk
1p5m
1p5n
1q75
1qc8
1qwa
1r4h
1r7z
1s34
1syz
1szy
1tbk
1tjz
1u3k
1wks
1xsh
1xsu
1swu
1ymo
1yn1
1yn2
1ync
1yne
1yng
1ysv
1z2j
1z30
1z31
28sp
28sr
2b7g
2b57
2cd3
2cd5
2es5
2euy
2f87
2f88
2frl
2gio
2gip
2gv3
2gv4
2gvo
2h2x
2hem
2hns
2hua
2ixy
2jr4
2jse
2jsg
2jtp
2juk
2jxu
2jym
2k5z
2k95
2kd8
2kez
2kf0
2khy
2koc
2kpc
2kpd
2ktz
2ku0
2kur
2kuu
2kuw
2kvn
2l5z
2l94
2lac
2lbj
2lbk
2lbl
2lc8
2ldl
2ldt
2lhp
2li4
2lk3
2lpa
2lps
2lpt
2lu8
2m12
2m21
2m22
2m58
2m8k
2o33
2qh2
2qh3
2qh4
2rlu
2ro2
2rpk
2rpt
387d
3la5
3php
3sd1
4a4r
4a4s
4a4t
Table SII. The parameters of bonded potentials in Eqs. S2-S5.

Bond Ub          Kb (kcal/mol)                       r0 (Å)
                 Parahelical^a    Paranonhelical^b   Parahelical    Paranonhelical
PiCi             133.4            98.2               3.95           3.95
CiPi+1           75.0             42.5               3.93           3.93
CiNi             85.6             24.8               3.35           3.45

Angle Ua         Kθ (kcal/mol)                       θ0 (rad)
                 Parahelical      Paranonhelical     Parahelical    Paranonhelical
PiCiPi+1         18.3             9.3                1.74           1.75
Ci-1PiCi         43.9             21.3               1.76           1.78
PiCiNi           35.5             9.7                1.63           1.64
NiCiPi+1         99.8             15.2               1.66           1.66

Dihedral Ud      Kφ (kcal/mol)                       φ0 (rad)
                 Parahelical      Paranonhelical     Parahelical    Paranonhelical
PiCiPi+1Ci+1     2.8              1.1                2.56           2.51
Ci-1PiCiPi+1     10.5             4.3                -2.94          -2.92
Ci-1PiCiNi       3.8              0.8                -1.16          -1.18
Ni-1Ci-1PiCi     4.2              0.7                0.88           0.78
a The Parahelical parameters are used only in the refinement of folded structures, for the
base-pairing regions (stems) of the initially folded structure.
b The Paranonhelical parameters are used in the RNA folding process, to describe RNAs, as far as
possible, as free chains.
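To show how the Table SII entries enter Eqs. S3-S5, the sketch below evaluates the three bonded terms for single internal coordinates. The (K, equilibrium value) pairs are the Paranonhelical entries as read from the table, with the strength listed before the equilibrium value; that reading, the placement of the factor 1/2 on the cos 3(φ − φ0) term, and the function names are assumptions of this example.

```python
import math

# (strength K, equilibrium value) as read from the Paranonhelical columns of Table SII.
BOND_PC = (98.2, 3.95)        # PiCi: Kb, r0 in Å
ANGLE_PCP = (9.3, 1.75)       # PiCiPi+1: Kθ, θ0 in rad
DIHEDRAL_PCPC = (1.1, 2.51)   # PiCiPi+1Ci+1: Kφ, φ0 in rad


def u_bond(r, k, r0):
    """One term of Eq. S3."""
    return k * (r - r0) ** 2


def u_angle(theta, k, theta0):
    """One term of Eq. S4."""
    return k * (theta - theta0) ** 2


def u_dihedral(phi, k, phi0):
    """One term of Eq. S5: one-fold plus half-weighted three-fold cosine."""
    d = phi - phi0
    return k * (1.0 - math.cos(d)) + 0.5 * k * (1.0 - math.cos(3.0 * d))


# Energies of slightly distorted coordinates (the distortions are arbitrary).
print(f"bond:     {u_bond(4.05, *BOND_PC):.3f} kcal/mol")
print(f"angle:    {u_angle(1.85, *ANGLE_PCP):.3f} kcal/mol")
print(f"dihedral: {u_dihedral(2.51 + math.pi, *DIHEDRAL_PCPC):.3f} kcal/mol")
```

With this form the dihedral term is zero at φ0 and reaches its maximum of 3Kφ when φ is rotated by π away from φ0.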
Table SIII. The parameters for the energy functions of base pairing and base stacking.

Base pairing Ubp
  Distances          rNN      rCN      rPN
  r (Å)              8.9      12.2     13.9
  Energy strength    kNN      kCN      kPN
  k (kcal/mol)       3.6      1.9      0.7
  εbp = -3.5 kcal/mol;  a1 = 8.6 Å;  a2 = 9.3 Å;  γ = 0.5

Base stacking Ubs
  ΔSc = -9.7 eu;  σst = 4.8 Å
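With the Table SIII values, one term of the base-pairing potential in Eq. S12 can be sketched as below. Several points are assumptions of this example rather than statements from the supplement: εbp = -3.5 kcal/mol is taken to be the G-C strength (so that A-U and G-U get γ times that value), the sums over i(j) are read as the two cross distances (CiNj, CjNi) and (PiNj, PjNi), and the k values are used as plain numerical coefficients of squared distances in Å.

```python
R_NN, R_CN, R_PN = 8.9, 12.2, 13.9   # Å, reference distances from Table SIII
K_NN, K_CN, K_PN = 3.6, 1.9, 0.7     # energy strengths from Table SIII
A1, A2 = 8.6, 9.3                    # Å, pairing window a1 < rNiNj < a2
EPS_BP = -3.5                        # kcal/mol, assumed to be the G-C value
GAMMA = 0.5


def pair_strength(base_i, base_j):
    """Return ε for G-C, γ·ε for A-U and G-U, and 0 for pairs that cannot form."""
    pair = frozenset((base_i, base_j))
    if pair == frozenset("GC"):
        return EPS_BP
    if pair in (frozenset("AU"), frozenset("GU")):
        return GAMMA * EPS_BP
    return 0.0


def u_bp_pair(eps, r_nn, r_cn, r_pn):
    """One term of Eq. S12.

    r_cn = (r_CiNj, r_CjNi) and r_pn = (r_PiNj, r_PjNi), all in Å.
    """
    if not (A1 < r_nn < A2):
        return 0.0  # pairing criterion not met, no hydrogen bond
    denominator = (1.0
                   + K_NN * (r_nn - R_NN) ** 2
                   + K_CN * sum((r - R_CN) ** 2 for r in r_cn)
                   + K_PN * sum((r - R_PN) ** 2 for r in r_pn))
    return eps / denominator


ideal = u_bp_pair(pair_strength("G", "C"), 8.9, (12.2, 12.2), (13.9, 13.9))
distorted = u_bp_pair(pair_strength("G", "C"), 9.1, (12.5, 12.2), (13.9, 13.9))
print(f"ideal G-C: {ideal:.2f} kcal/mol, distorted G-C: {distorted:.2f} kcal/mol")
```

The three distance terms only enlarge the denominator, so any deviation from the reference geometry weakens the pair without ever making the interaction repulsive.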
Table SIV. The 46 RNA molecules for 3D structure prediction used in this work.

RNAs^a  PDB ID  Length (nt)  Type of structure           RMSD^d MC-Sym (Å)  RMSD^b mean (Å)  RMSD^c minimum (Å)
1       1zih    12           Hairpin                     2.1                2.0              1.8
2       2f87    12           Hairpin                     2.9                2.6              1.2
3       2y95    14           Hairpin                     2.0                2.2              1.0
4       1oq0    15           Hairpin                     2.7                3.4              2.0
5       1q75    15           Hairpin                     2.1                2.5              1.4
6       2lp9    16           Hairpin with bulge loop     2.8                2.7              1.5
7       1atv    17           Hairpin                     2.5                2.6              1.5
8       1j4y    17           Hairpin                     4.6                4.0              2.8
9       1kka    17           Hairpin                     4.9                4.2              1.6
10      1z30    18           Hairpin                     2.6                1.9              0.9
11      1esy    19           Hairpin with internal loop  4.6                2.9              1.9
12      1i3x    19           Hairpin                     2.8                2.8              1.6
13      2b7g    19           Hairpin                     2.8                2.6              1.0
14      2meq    19           Hairpin                     6.1                2.9              1.3
15      1bn0    20           Hairpin                     3.6                2.1              1.2
16      1u2a    20           Hairpin                     3.0                2.6              1.5
17      1qwa    21           Hairpin with bulge loop     4.6                3.0              1.6
18      2grw    22           Hairpin with internal loop  3.9                3.8              2.2
19      1k6g    22           Hairpin                     3.2                2.9              1.1
20      2kd8    22           Hairpin                     2.9                2.4              1.1
21      1bgz    23           Hairpin with bulge loop     4.8                4.6              2.1
22      2ro2    23           Hairpin                     2.6                2.9              1.5
23      3php    23           Hairpin                     3.8                3.9              1.8
24      1lc6    24           Hairpin with internal loop  5.7                4.1              2.3
25      2hem    24           Hairpin with internal loop  4.2                3.1              1.8
26      2lv0    24           Hairpin with internal loop  3.9                3.1              1.5
27      1m82    25           Hairpin with bulge loop     2.2                2.4              1.0
28      2a43    26           Pseudoknot                  4.3                4.2              2.7
29      1fqz    27           Hairpin with internal loop  3.4                5.2              2.3
30      1q9a    27           Hairpin with internal loop  5.8                5.4              3.8
31      2aht    27           Hairpin with bulge loop     3.8                3.4              1.5
32      1f6x    27           Hairpin with bulge loop     2.7                2.9              1.7
33      1kpz    28           Pseudoknot                  5.7                4.2              2.7
34      1l2x    28           Pseudoknot                  5.9                5.2              3.5
35      28sp    28           Hairpin with internal loop  3.7                3.9              2.1
36      28sr    28           Hairpin with internal loop  3.6                4.0              2.4
37      437d    28           Pseudoknot                  5.9                5.4              3.4
38      1ebq    29           Hairpin with internal loop  3.7                4.5              2.4
39      2ldz    30           Hairpin with internal loop  4.7                5.0              2.6
40      1rfr    30           Hairpin with internal loop  3.8                3.8              1.9
41      1yne    31           Hairpin with bulge loop     3.9                2.7              1.8
42      2f88    34           Hairpin with internal loop  2.6                3.4              2.3
43      1txs    38           Hairpin with internal loop  3.2                4.7              2.8
44      1zc5    41           Hairpin with bulge loop     5.9                3.4              1.8
45      2l2j    42           Hairpin with internal loop  4.3                3.7              2.0
46      1z2j    45           Hairpin with bulge loop     8.5                4.2              3.1
a For each RNA, the PDB ID, chain length, type of structure, and the mean and the minimum RMSDs are listed.
b, c The mean and the minimum RMSDs are calculated over the C beads of the structures predicted by the present
model, relative to the corresponding C4’ atoms of the native structure.
d The RMSDs are calculated over the C4’ atoms of the top 1 structure predicted for each RNA by the
MC-Fold/MC-Sym pipeline [45] (http://www.major.iric.ca/MC-Fold/), relative to the native structure.
References
45. Parisien, M. and Major, F. (2008) The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data.
Nature, 452, 51-55.
75. Denesyuk, N. and Thirumalai, D. (2013) Coarse-grained model for predicting RNA folding thermodynamics.
J. Phys. Chem. B, 117, 4901-4911.
78. Wang, F.H., Wu, Y.Y. and Tan, Z.J. (2013) Salt contribution to the flexibility of single-stranded nucleic acid of
finite length. Biopolymers, 99, 370-381.
80. Tan, Z.J. and Chen, S.J. (2006) Nucleic acid helix stability: effects of salt concentration, cation valence and
size, and chain length. Biophys. J., 90, 1175-1190.
81. Manning, G.S. (1978) The molecular theory of polyelectrolyte solutions with applications to the electrostatic
properties of polynucleotides. Q. Rev. Biophys., 11, 179-246.
83. Xia, T., SantaLucia, J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C. and Turner, D.H. (1998)
Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with
Watson-Crick base pairs. Biochemistry, 37, 14719-14735.
84. Mathews, D.H., Sabina, J., Zuker, M. and Turner, D.H. (1999) Expanded sequence dependence of
thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911-940.