Supporting Information File 1 (SI-1)
Paudyal, S., Alfonso-Prieto, M. et al.
Computational Methods and Results
Homology modeling of Ec-RNase III. Fig. S1 shows the sequence alignment of the two template
sequences (Aa- and Tm-RNase III) and the query sequence (Ec-RNase III). The following protocol was
applied to build the Ec-RNase III homology models. For each template (PDB entries 1O0W, 2NUG or
2NUE), 1,500 models of the Ec-RNase III structure were generated using Modeller (version 9.10),27
then clustered using the g_cluster tool of the Gromacs package,29 applying the gromos method30 and an
RMSD cutoff of either 0.65 Å (1O0W-based model) or 1.5 Å (2NUE- and 2NUG-based models). A single
representative of the most highly populated cluster was selected, and Modeller was then applied to optimize
the conformation of the variable loops.28 The loops corresponded to residues 29-36 for the homology
model based on the Tm-RNase III template, and residues 102-107 and 194-201 for the two Aa-based
models. 1,500 loop conformations were generated and the most representative was selected with
g_cluster, using the gromos method and an RMSD cutoff of 1.25 Å (1O0W-based model), 1.35 Å
(2NUE) or 1.50 Å (2NUG). The final homology models of Ec-RNase III are shown in Fig. S2.
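The gromos clustering step used above can be illustrated with a minimal sketch. It assumes a precomputed pairwise RMSD matrix and follows the usual description of the method: the structure with the most neighbors within the cutoff becomes the cluster center, it and its neighbors are removed from the pool, and the procedure is repeated. The function name, the toy one-dimensional data and the tie-breaking rule are illustrative assumptions and do not reproduce the g_cluster implementation.

```python
def gromos_cluster(dist, cutoff):
    """Greedy neighbor-count clustering on a symmetric distance matrix.

    dist is a list of lists (e.g. pairwise RMSD values); returns a list of
    (center_index, member_indices) tuples, most populated cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        pool = sorted(remaining)
        best_center, best_members = None, []
        for i in pool:
            # Neighbors of i within the cutoff (i itself is included, d = 0).
            members = [j for j in pool if dist[i][j] <= cutoff]
            if len(members) > len(best_members):  # ties: lowest index wins
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


# Toy example: six "structures" on a line stand in for an RMSD matrix.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
rmsd = [[abs(a - b) for b in positions] for a in positions]
clusters = gromos_cluster(rmsd, cutoff=0.15)
representative = clusters[0][0]  # center of the most populated cluster
```

In this sketch the center of the first cluster plays the role of the "single representative of the most highly populated cluster" selected in the protocol.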
Figure S1. Alignment of the Escherichia coli (Ec), Aquifex aeolicus (Aa) and Thermotoga maritima
(Tm) RNase III polypeptide sequences used in the homology modeling of Ec-RNase III. The residue
numbering of Ec-RNase III and the predicted secondary structure (using PSIPRED)24 are shown above
the aligned sequences. Conserved residues are highlighted in black, and similar residues are boxed.
Figure S2. Homology models of Ec-RNase III. The Ec-RNase III structural models are shown in
brown cartoon form, whereas the crystal structures of the corresponding templates are shown in blue.
(A) Model of the apo state, built using the Tm-RNase III structure (PDB entry 1O0W) as template. The
inset shows the protein in the same orientation as in (C), in order to highlight the change in positions of
the two dsRBDs during the catalytic cycle. In contrast, the structures of the two RNase III domains
(RIIIDs) remain essentially unchanged. (B) Model of a presumptive pre-catalytic complex, built using
as template the Aa-RNase III structure with a dsRNA bound to one of the dsRBDs (PDB entry
2NUE).20 The dsRNA is shown in pink cartoon form. (C) Model of a presumptive post-catalytic
complex, built using as template the Aa-RNase III structure with dsRNA bound to the catalytic valley
(PDB entry 2NUG).20
Protein-protein docking. Haddock (version 2.0)31,32 and CNS (version 1.2)33,34 were used for the
docking with default settings, except for the number of models generated. For the blind docking (i.e.
without prior assumptions of the interacting surfaces), 100 structures were generated in the rigid body
docking step, of which the 20 best-ranked structures were refined first by semi-flexible simulated
annealing (SA), then by SA in the presence of explicit solvent. No restraints were imposed on either
protein. This protocol was repeated two hundred times, starting with different initial seeds, yielding a
total of 4,000 docking models. For the restrained docking, the protocol was the same as for the blind
docking, except for the use of the Ambiguous Interaction Restraints (AIRs). AIRs were defined
between the WHISCY-predicted41 interface residues of Ec-YmdB (Fig. S3), and residues 120-140 of
each monomer of the Ec-RNase III homodimer. In this case, only ten iterations with different initial
seeds were run, resulting in a total of ~400 structures for each Ec-RNase III homology model bound to
YmdB.
Figure S3. WHISCY41 prediction for E. coli YmdB. (A) Surface representation of Ec-YmdB (PDB
entry 1SPV), colored according to the WHISCY scores. The protein is shown as van der Waals
spheres, where red indicates active residues (i.e. residues predicted to be directly involved in protein-protein interactions) and pink indicates passive residues (i.e. solvent-accessible residues that are
neighbors of the predicted active residues). The remainder of the protein is shown in blue. (B)
Atomistic representation of the WHISCY-predicted active and passive residues. The active residues are
shown in thick licorice and are identified by name and position; the consecutive active residues G30,
G31, G32 and G33 are labeled as “G30-G33”. The passive residues (D11, K14, P26, S27, A41, P44,
A45, L47, L69, G71, D72, P74, A75, K76, V85, W86, R87, V125, G127, and Y159) are displayed as
lines, and are not labeled.
Blind docking analysis. A center of mass (COM) approach36 was used to determine the main
interacting area(s) explored in the blind docking. The 4,000 configurations of the protein-protein
complex generated in the blind docking were aligned, based only on the structure of the receptor
(RNase III). The position of the COM of the ligand (YmdB) was then determined, using the backbone
atoms of residues 3-174. The set of COM positions was used to compute the number density of ligand
poses on the surface of RNase III using the Volmap plugin38 in VMD.37 Each COM was treated as a
normalized isotropic Gaussian density of width 3 Å, and the density was summed over all the COM
using a three-dimensional grid with a 1 Å bin resolution. Then, a cluster analysis of the set of COM
positions was performed using g_cluster29 and the gromos method,30 with a cutoff of 2 nm. The
resulting clusters were ranked according to their local density, instead of the number of cluster
members, because regions of continuous density (or high occupancy) are likely to represent regions of
tighter binding. A similar strategy was used to identify small molecule ligand binding sites.62 In order
to avoid overestimation errors, we recalculated the number densities using a smaller Gaussian width of
1.5 Å and a finer grid resolution of 0.5 Å, and then integrated the number density using a 5σ threshold.
Fig. 1 shows the first five clusters along with the corresponding number densities. The five clusters
correspond to 35% of the total number of docking poses (309, 311, 291, 269 and 211 poses for clusters
1-5, respectively, out of a total of 4,000). Beyond cluster 5, the local density of the clusters is
significantly lower (~1.4-fold) than that of the first cluster. These lower-density clusters were therefore not
considered further.
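The number density calculation described above can be sketched as follows. This is a minimal illustration, not the Volmap implementation: it assumes that the Gaussian "width" corresponds to the standard deviation σ, uses a cubic grid, and takes two hypothetical COM positions as input. Because each Gaussian is normalized, integrating the density over the grid should recover the number of poses.

```python
import math
from itertools import product


def number_density(coms, sigma, lo, hi, spacing):
    """Sum one normalized isotropic Gaussian per COM on a cubic grid.

    Returns a dict mapping grid-point coordinates to density (poses per Å^3).
    """
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    steps = int(round((hi - lo) / spacing)) + 1
    axis = [lo + k * spacing for k in range(steps)]
    density = {}
    for gx, gy, gz in product(axis, repeat=3):
        value = 0.0
        for cx, cy, cz in coms:
            r2 = (gx - cx) ** 2 + (gy - cy) ** 2 + (gz - cz) ** 2
            value += norm * math.exp(-r2 / (2.0 * sigma ** 2))
        density[(gx, gy, gz)] = value
    return density


# Two hypothetical COM positions (Å); width 1.5 Å and 0.5 Å grid as in the text.
coms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
grid = number_density(coms, sigma=1.5, lo=-10.0, hi=10.0, spacing=0.5)
n_poses = sum(grid.values()) * 0.5 ** 3  # integral of the density over the grid
```

Restricting the sum to the grid points around one cluster would give the local number of poses used to rank that cluster.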
Next, the docking landscape was further characterized by performing a configurational analysis
based on receptor-ligand distances.39 First, we calculated the distance between the COM of YmdB and
the COM of either the RIIIDs or the dsRBDs of RNase III (black dashed lines in Fig. S4A). Here, the
COM was determined by taking into account only the backbone atoms of residues 3-174 for YmdB, 6-128 for the RNase III RIIID, and 155-225 for the RNase III dsRBD. Using the YmdB-RIIID and
YmdB-dsRBD distances as coordinates and 1 Å bins, we computed a two-dimensional (2D) histogram
(Fig. S4B) to quantify the population of the two binding modes identified above: 42% for YmdB bound
to the RIIIDs, and 34% for YmdB bound to the dsRBDs. Next, we calculated the distances between the
COM of YmdB and the COM of each of the RIIIDs separately (YmdB-RIIID1 and YmdB-RIIID2, see
blue dashed lines in Fig. S4A) and computed the corresponding 2D histogram (Fig. S4C) to
differentiate between YmdB bound to both RIIIDs or to a single RIIID. Similarly, we calculated the
distances between the COM of YmdB and the COM of each of the dsRBDs of RNase III (YmdB-dsRBD1 and YmdB-dsRBD2; see red dashed lines in Fig. S4A), and computed the corresponding 2D
histogram (Fig. S4D) to differentiate between YmdB bound to both dsRBDs or to a single dsRBD.
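The distance-based configurational analysis can be sketched as a simple binning of distance pairs. The snippet below is an illustration under stated assumptions: the COM is computed as an unweighted centroid of the selected backbone atoms (a mass-weighted average would be needed for a true center of mass), the coordinates and distance pairs are hypothetical, and 1 Å bins are used as in the text.

```python
import math
from collections import Counter


def centroid(coords):
    """Unweighted center of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def histogram2d(pairs, bin_width=1.0):
    """Count (d1, d2) distance pairs in square bins of the given width."""
    counts = Counter()
    for d1, d2 in pairs:
        counts[(math.floor(d1 / bin_width), math.floor(d2 / bin_width))] += 1
    return counts


# Hypothetical (YmdB-RIIID, YmdB-dsRBD) COM distances for three poses, in Å.
pairs = [(20.2, 35.7), (20.9, 35.1), (41.0, 18.3)]
counts = histogram2d(pairs)
populations = {b: 100.0 * n / len(pairs) for b, n in counts.items()}

# Distance between two centroids, as used for each histogram coordinate.
d = math.dist(centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]),
              centroid([(1.0, 4.0, 0.0)]))
```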
Finally, a statistical analysis was performed of the residue propensity for the protein-protein
interface. Specifically, the percentage frequency was calculated for finding a YmdB or RNase III
residue within 5 Å of the complementary partner in the 4,000 configurations of the protein-protein
complex, generated through blind docking (Figs. S5-S6). Here, the assumption is that amino acids with
calculated high frequencies are likely to be involved in protein-protein interactions.39,40
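The interfacial frequency defined above can be sketched as follows. The data layout (one dictionary of residue atoms plus a list of partner atoms per pose) and the residue labels are hypothetical; only the 5 Å criterion and the percentage over all poses come from the text.

```python
import math


def interface_frequency(poses, cutoff=5.0):
    """Percentage of poses in which each residue lies within cutoff of the partner.

    poses is a list of (residues, partner_atoms) tuples, where residues maps a
    residue label to a list of atom coordinates and partner_atoms is a list of
    atom coordinates of the other protein.
    """
    counts = {}
    for residues, partner_atoms in poses:
        for label, atoms in residues.items():
            in_contact = any(math.dist(a, b) <= cutoff
                             for a in atoms for b in partner_atoms)
            counts[label] = counts.get(label, 0) + int(in_contact)
    return {label: 100.0 * n / len(poses) for label, n in counts.items()}


# Two toy poses with single-atom residues (coordinates in Å).
residues = {"res_A": [(0.0, 0.0, 0.0)], "res_B": [(10.0, 0.0, 0.0)]}
poses = [(residues, [(3.0, 0.0, 0.0)]), (residues, [(4.0, 0.0, 0.0)])]
frequency = interface_frequency(poses)
```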
Figure S4. Configurational analysis of the blind docking poses. (A) Schematic representation of the
YmdB-RNase III complex, using the homology model of Ec-RNase III with dsRNA bound in the
catalytic valley. The domains of the RNase III homodimer are shown in brown and are indicated by
name. YmdB is displayed in gray, and the dsRNA in pink. The distances between the centers of mass
(COM) of YmdB and the RNase III domains are indicated by dashed lines (black for the distances
YmdB-RIIID and YmdB-dsRBD, blue for YmdB-RIIID1 and YmdB-RIIID2, and red for YmdB-dsRBD1 and YmdB-dsRBD2). (B) 2D histogram of the 4,000 configurations generated in the blind
docking using as coordinates the distances YmdB-RIIID and YmdB-dsRBD (shown as black dashed
lines in panel A). As a reference, the distance RIIID-dsRBD is 28.9±0.6 Å. The color scale indicates
the population of the different configurations of the YmdB-RNase III complex, from blue (low) to red
(high). (C) 2D histogram using the distances YmdB-RIIID1 and YmdB-RIIID2 as coordinates (shown
as blue dashed lines in panel A). As a reference, the RIIID1-RIIID2 distance is 31.6±1.3 Å. A-B-A’
represents the ternary protein-protein complex in which YmdB (B) is bound to both RNase III
monomers (A, A'), whereas A-B and A'-B denote binary complexes in which YmdB is bound to only
one RNase III monomer (either A or A'). (D) 2D histogram using the YmdB-dsRBD1 and YmdB-dsRBD2 distances as coordinates (shown as red dashed lines in panel A). As a reference, the dsRBD1-dsRBD2 distance is 39.8±0.9 Å.
Figure S5. Statistical analysis of YmdB residues at the protein-protein interface in the blind docking
poses. (A) YmdB interfacial frequency map. The protein is displayed as van der Waals spheres, using a
color scale (see inset bar) in which blue indicates low frequency, and red denotes high frequency. (B)
The top ten YmdB residues with the highest interfacial frequencies, mapped onto the Ec-YmdB crystal
structure (PDB entry 1SPV). The residues are shown in licorice, with C atoms in yellow, N atoms in
blue, and O atoms in red.
Figure S6. Statistical analysis of RNase III residues present at the protein-protein interface in the blind
docking poses. (A) RNase III interfacial frequency map. One of the monomers is displayed as van der
Waals spheres, using a color scale (see inset bar) where blue indicates low frequency, and red denotes
high frequency. The other RNase III monomer and the nucleic acid are shown in cartoon form (in
brown and pink, respectively). The structure shown corresponds to the homology model of Ec-RNase
III (based on PDB entry 2NUG) bound to a cleaved dsRNA, in a presumptive post-catalytic state. (B)
The top ten RIIID residues with the highest interfacial frequencies. Note that the view of the protein
has been rotated such that the bottom of the protein in panel A is shown. The residues are displayed in
licorice, with C atoms in cyan, N atoms in blue, and O atoms in red. (C) Positions of the top ten
dsRBD residues with the highest interfacial frequencies. Again, the view of the protein has been rotated
such that now the top of the protein is shown.
Restrained docking analysis. As in the blind docking analysis (see above), a COM approach was
applied first. The ~400 configurations of the Ec-RNase III‒YmdB complex generated in each
restrained docking experiment were aligned onto the corresponding homology model of Ec-RNase III,
using only RNase III for the fitting, and the position of the COM of YmdB was calculated. Then, the
set of COM positions was used to compute the number density of ligand poses on the surface of the
RNase III receptor (see panel A in Figs. S7-S9) using the Volmap plugin,38 with Gaussian width of 1.5
Å, and grid resolution of 1 Å. Then, the restrained docking poses were clustered. Some modifications
were introduced in the clustering protocol, compared to the blind docking analysis, since in the
restrained docking simulations the interacting surface of RNase III is limited to the RIIIDs (Figs. S7A-S9A), in a region corresponding approximately to clusters 2-4 of the blind docking (Fig. 1). As an
alternative to the RMSD of the COM of YmdB used in the blind docking, the backbone RMSD of the
YmdB ligand (L-RMSD) could be used in the clustering. However, the large fluctuations of the L-RMSD (spanning a ~40
Å range) indicate that the structural ensemble of YmdB bound to the RIIID is very heterogeneous.
Thus, clustering using L-RMSD alone is not sufficiently discriminating. Instead, a two-step clustering
protocol based on a rigid body representation of YmdB was applied. By computing all the Cα-Cα pair
distances of YmdB, and following the changes among the restrained docking poses, we can identify a
set of three alpha carbon atoms which exhibit long pair distances, but with small fluctuations, and use
the triad (here, the Cα atoms of K76, G88 and K139) to represent YmdB as a rigid body. In the first
clustering step, we used a single atom of the triad (Cα of K76) to distinguish between the
configurations in which YmdB is bound to either a single RIIID, or to both RIIIDs. The one-particle
clustering was carried out with g_cluster,29 using the gromos method30 and a 4 Å cutoff. Then, for each
of the one-particle clusters obtained, the different YmdB structures were fitted by removing the
translation of the Cα atom of K76. The second clustering step was performed by using all three atoms
of the triad (the Cα atoms of K76, G88 and K139), in order to differentiate between the different
rotational orientations that YmdB can adopt. The three-particle clustering was also carried out with
g_cluster,29 using the gromos method30 and a 2 Å cutoff. The clusters obtained upon the two-step
clustering protocol are shown in panel B of Figs. S7-S9, superimposed onto the density of ligand poses
in order to show that the most populated clusters also correspond to regions of higher density. Panel C
of Figs. S7-S9 shows the population of the different clusters and panel D the average Haddock score of
each of the clusters. Although some clusters are significantly more populated than others, their
Haddock scores are quite similar, and thus none of the clusters can be discarded with full confidence.
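The selection of a rigid-body triad can be illustrated with a minimal sketch. It computes the mean and standard deviation of every pair distance over a set of poses and keeps the triads whose three pair distances are all long and nearly constant. The thresholds, atom labels and toy coordinates are hypothetical assumptions; the text does not state the numerical criteria that were actually applied.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pair_distance_stats(poses):
    """poses: list of dicts mapping atom label -> (x, y, z).

    Returns {(a, b): (mean_distance, std_distance)} with a < b.
    """
    stats = {}
    for a, b in combinations(sorted(poses[0]), 2):
        d = [math.dist(p[a], p[b]) for p in poses]
        stats[(a, b)] = (mean(d), pstdev(d))
    return stats


def rigid_triads(stats, labels, min_dist, max_std):
    """Triads whose pair distances are all >= min_dist with std <= max_std."""
    def ok(a, b):
        avg, std = stats[(a, b)]
        return avg >= min_dist and std <= max_std

    return [t for t in combinations(sorted(labels), 3)
            if all(ok(a, b) for a, b in combinations(t, 2))]


# Toy poses: CA1-CA3 move as a rigid body, CA4 is flexible (coordinates in Å).
poses = [
    {"CA1": (0, 0, 0), "CA2": (20, 0, 0), "CA3": (0, 20, 0), "CA4": (2, 0, 0)},
    {"CA1": (1, 1, 1), "CA2": (21, 1, 1), "CA3": (1, 21, 1), "CA4": (9, 1, 1)},
]
stats = pair_distance_stats(poses)
triads = rigid_triads(stats, poses[0].keys(), min_dist=15.0, max_std=0.5)
```

Once a triad is chosen, the one-particle and three-particle clustering steps can reuse any distance-based clustering routine on the coordinates of the selected atoms.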
Next, the protein-protein interface was analyzed. First, the molecular details of the interface
were explored for the representatives of the two most populated clusters (blue and red clusters in Figs.
S7-S9), which represent 60-70% of the configurations obtained. Figs. 2 and S10-S11 show the surface
complementarity of the two proteins, as well as the residues within 4 Å of the other partner. Tables I
and SII list the main protein-protein interactions, identified using the Binana algorithm50 with default
parameters, followed by visual inspection.
Figure S7. Distribution of the YmdB restrained-docked poses on the surface of a structural model of
apo Ec-RNase III. (A) Position of the center of mass of YmdB (small colored spheres) in each of the
400 configurations of the protein-protein complex, generated by restrained docking. The number
density of the YmdB docking poses is displayed as a light blue isosurface, with a cutoff of 7×10⁻⁴ Å⁻³.
RNase III is shown in brown cartoon form. (B) Two-step cluster analysis of the YmdB docking poses.
The COM of YmdB is represented by small spheres, with a color code indicating the cluster to which
the YmdB docking poses belong. (C) Population of each of the clusters shown in (B). The white bar
represents the remaining structures, which do not form significantly populated or dense clusters. (D) Haddock
score of each of the clusters in (B). The average and standard error of the mean are shown.
Figure S8. Distribution of the YmdB restrained-docked poses on the surface of the structural model of
RNase III with dsRNA bound to one of the dsRBDs. (A) Position of the center of mass of YmdB (shown as
small colored spheres) in each of the 400 configurations of the protein-protein complex generated by
restrained docking. The number density of the YmdB docking poses is displayed as a light blue
isosurface, with a cutoff of 7×10⁻⁴ Å⁻³. RNase III is shown in brown cartoon form and the dsRNA in
pink. (B) Two-step cluster analysis of the YmdB docking poses. The COM of YmdB is represented by
small spheres, with a color code indicating the cluster to which the YmdB docking poses belong. (C)
Population of each of the clusters shown in (B). The white bar represents the remaining structures, which
do not form significantly populated or dense clusters. (D) Haddock score of each of the clusters in (B).
The average and standard error of the mean are shown.
Figure S9. Distribution of the YmdB restrained-docked poses on the surface of the structural model of
RNase III with a cleaved dsRNA bound in the catalytic valley. (A) Position of the center of mass of YmdB
(shown as small colored spheres) in each of the 400 configurations of the protein-protein complex
generated by restrained docking. The number density of the YmdB docking poses is displayed as a light
blue isosurface, with a cutoff of 7×10⁻⁴ Å⁻³. RNase III is shown in brown cartoon form and the dsRNA
in pink. (B) Two-step cluster analysis of the YmdB docking poses. The COM of YmdB is represented by
small spheres, with a color code indicating the cluster to which the YmdB docking poses belong. (C)
Population of each of the clusters shown in (B). The white bar represents the remaining structures, which
do not form significantly populated or dense clusters. (D) Haddock score of each of the clusters in (B).
The average and standard error of the mean are shown.
Table SI. Analysis of the clusters obtained in each of the three restrained dockings of Ec-RNase III
(apo, pre-catalytic and post-catalytic states, respectively) with Ec-YmdB. Besides the population of
each cluster, the Haddock score (see also panel D in Figs. S7-S9) and its components are shown. Eelec,
EvdW, Edesolv and EAIR are the electrostatic, van der Waals, desolvation and ambiguous
interaction restraint energies, respectively, and BSA is the buried surface area. The Haddock score is
calculated as: Haddock score = 1.0 EvdW + 0.2 Eelec + 1.0 Edesolv + 0.1 EAIR. Average values
(and standard errors of the mean) are indicated.
apo RNase III-YmdB docking
cluster #  population (%)  Haddock score (a.u.)  EvdW + 0.2Eelec (kcal/mol)  Edesolv (kcal/mol)  EAIR (kcal/mol)  BSA (Å²)
1          40.4            76.8±1.9              23.6±0.4                    13.5±1.8            397.4±7.0        3329.5±33.8
2          20.5            75.9±2.9              24.5±0.6                    12.0±2.6            393.1±10.1       3340.9±41.6
3          15.1            79.5±3.0              24.2±0.8                    17.5±2.9            379.1±12.5       3219.9±64.5
4          14.7            75.5±3.7              23.0±0.8                    15.3±3.2            371.8±11.1       3283.3±45.7
5          5.1             84.2±5.4              25.5±0.4                    19.3±4.7            394.0±16.3       3479.0±62.9
6          4.1             73.4±6.9              21.9±0.7                    11.7±6.2            398.9±29.6       3469.7±65.5

pre-catalytic RNase III-YmdB docking
cluster #  population (%)  Haddock score (a.u.)  EvdW + 0.2Eelec (kcal/mol)  Edesolv (kcal/mol)  EAIR (kcal/mol)  BSA (Å²)
1          41.1            31.9±1.6              19.2±0.4                    -23.4±1.3           361.3±6.7        2576.6±31.3
2          30.8            32.1±2.0              19.6±0.5                    -25.5±1.7           380.4±9.1        2572.7±35.3
3          16.7            30.1±2.1              20.4±0.5                    -27.0±1.9           368.2±11.8       2564.6±48.7
4          6.5             39.3±2.4              20.4±1.1                    -18.1±2.0           370.1±16.9       2479.3±85.2
5          4.9             37.8±3.7              19.1±0.6                    -22.6±2.6           413.6±18.2       2493.6±85.1

post-catalytic RNase III-YmdB docking
cluster #  population (%)  Haddock score (a.u.)  EvdW + 0.2Eelec (kcal/mol)  Edesolv (kcal/mol)  EAIR (kcal/mol)  BSA (Å²)
1          69.5            52.3±1.2              21.2±0.4                    -5.5±0.9            368.5±5.3        2792.5±21.9
2          12.3            49.1±2.7              21.1±0.7                    -8.2±2.2            361.0±14.8       2953.8±54.6
3          8.2             52.1±3.5              23.7±1.3                    -6.2±2.8            345.9±12.4       2900.5±54.1
4          4.1             56.9±4.8              19.9±1.1                    2.3±4.8             347.9±19.5       2721.5±84.3
5          3.3             61.2±4.3              23.8±1.5                    -2.7±4.0            402.3±19.0       2877.4±81.8
6          2.6             54.3±4.3              20.7±1.6                    -4.4±3.6            379.9±26.4       2877.4±97.2
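As a consistency check on Table SI, the Haddock score can be recomputed from the tabulated components using the weights given in the table caption. The sketch below is illustrative only; the function name is hypothetical, and small differences from the tabulated scores are expected because the table reports rounded averages.

```python
def haddock_score(e_vdw_plus_02_elec, e_desolv, e_air):
    """Haddock score = 1.0 EvdW + 0.2 Eelec + 1.0 Edesolv + 0.1 EAIR.

    The first argument is the combined term EvdW + 0.2 Eelec, as tabulated.
    """
    return e_vdw_plus_02_elec + 1.0 * e_desolv + 0.1 * e_air


# Cluster 1 of the apo docking and cluster 1 of the pre-catalytic docking.
apo_1 = haddock_score(23.6, 13.5, 397.4)   # tabulated score: 76.8
pre_1 = haddock_score(19.2, -23.4, 361.3)  # tabulated score: 31.9
```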
Figure S10. Protein-protein complex between Ec-YmdB and Ec-RNase III in the apo form, obtained
through restrained docking. The representative structure of the first most populated cluster (blue in Fig.
S7) is shown on top, and the second most populated cluster (red in Fig. S7) is shown at the bottom. (A)
and (D): Surface representation of the complex, with YmdB in gray and RNase III in brown. The
position of YmdB R40 anchored in the dimer interface of RNase III is indicated with a yellow ellipse.
(B) and (E): Cartoon representation of the complex, using the same color code as in (A) and (D). The
interface residues (within 4 Å of the other partner) are shown as licorice, with C atoms in yellow for
YmdB and in cyan for RNase III. The O atoms are in red and N atoms in blue for both proteins. (C)
and (F): Close-up of the protein-protein interface, showing the residues discussed in the text.
Figure S11. Protein-protein complex between Ec-YmdB and Ec-RNase III in a pre-catalytic state,
obtained through restrained docking. The representative structure of the first most populated cluster
(blue in Fig. S8) is shown on top and the second most populated cluster (red in Fig. S8) is shown at the
bottom. (A) and (D): Surface representation of the complex, with YmdB in gray and RNase III in
brown. The position of the anchored YmdB R40 is indicated with a yellow ellipse. (B) and (E):
Cartoon representation of the complex. The interface residues (within 4 Å of the other partner) are
shown as licorice, with C atoms in yellow for YmdB and in cyan for RNase III. (C) and (F): Close-up
of the protein-protein interface, showing the residues discussed in the text.
Table SII. Main protein-protein interactions at the interface of the RNase III-YmdB complexes
obtained through restrained docking. Apo denotes the complex of YmdB with the apo form of RNase
III and pre the complex with RNase III in an early step of substrate recognition. For each complex, the
representative structure of the second most populated cluster (i.e. red cluster in Figs. S7-S8) is
considered, since its population (21% for the apo complex and 31% for the pre complex) is
comparable to that of the most populated cluster (40% and 41%, respectively). For the post complex of
YmdB with RNase III in a post-catalytic state, the population of the first cluster is already 70%, and
thus the second cluster (12%) is not considered. The YmdB residues interacting with RNase III are
ordered according to the sequence numbering, followed by the RNase III counterpart residue(s) and the
type of interaction in parentheses (if present). The abbreviations used are: hb (hydrogen bond), sb (salt
bridge), pp (π-π interaction), cp (cation-π interaction) and hc (hydrophobic contact). ΔΔGbind is the
calculated change in binding free energy of the complex upon alanine mutation of the corresponding
YmdB residue; hence, interactions involving backbone atoms of YmdB are not included. FoldX values
are indicated first, followed by Robetta values.
YmdB  RNase III (apo)                     ΔΔGbind (apo) (kcal/mol)  RNase III (pre)          ΔΔGbind (pre) (kcal/mol)
K14   D126' (sb), E21' (sb)               1.7 / 1.1                 ‒                        ‒
N25   ‒                                   ‒                         E133' (hb)               1.8 / 2.0
S27   N18 (hb)                            1.1 / 0.0                 E133' (hb)               0.0 / 0.0
H39   Q130 (hb)                           2.1 / 2.5                 Y15 (pp)                 0.4 / 0.0
R40   Y50 (hb), E133 (sb), L125' (hc)     3.2 / 3.6                 L135 (hc)                0.8 / 0.5
L47   K134 (hc)                           1.5 / 1.6                 L125 (hc)                0.6 / 0.6
R54   T16 (hb)                            2.5 / 1.6                 ‒                        ‒
Y126  ‒                                   ‒                         Q130' (hb), K134' (cp)   3.0 / 2.6
Y159  Q130' (hb)                          0.6 / 1.2                 ‒                        ‒
D160  K134' (sb)                          0.8 / 1.3                 ‒                        ‒
Generalized Born Implicit Solvent (GBIS) simulations. The representative structures of the YmdB-RNase III complex obtained in the restrained docking simulations were optimized in a Generalized
Born Implicit Solvent (GBIS) model, using the implementation43 in the NAMD program.44 All Arg,
Lys, Asp, and Glu residues were considered in their ionized form, and the protonation states of the
histidine residues were determined by taking into account the hydrogen bond environment. The protein
and the dsRNA were described using the Cornell45 and the parm99bsc046 force fields, respectively, and
the Åqvist parameters were used for sodium and chloride ions.47 The Mg2+ ion Lennard-Jones
interactions were calculated using the metal ion parameters provided by Allner et al.48
All simulations were performed with the NAMD (version 2.9) program.44 The GBIS
minimizations were carried out assuming an implicit ion concentration of 150 mM, with protein and
solvent dielectric constants set to 1 and 80, respectively. Born radii were calculated using a cutoff of 14
Å, while the nonbonded forces were smoothed and cut off between 15 and 16 Å. The protein-protein
complex was minimized for 500,000 steps with the protein backbone fixed, followed by another
500,000 steps without restraints. The quality of the final models was assessed using MolProbity.49
GBIS minimization significantly improved the MolProbity score of the complex (≤ 1.5 Å), indicating
that the clashes, rotamer quality, and Ramachandran quality of the model are within the average values
for structures of 1.5 Å resolution.
In silico alanine mutagenesis. Computational alanine scanning was performed in order to identify
RNase III-YmdB interface hot spots. The FoldX program51 and the Robetta webserver52 were used to
calculate the change in binding free energy (ΔΔGbind) of the complex as a result of alanine (Ala)
mutation:
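\[
\Delta\Delta G_{\mathrm{bind}} = \left(\Delta G^{\mathrm{MUT}}_{\mathrm{complex}} - \Delta G^{\mathrm{MUT}}_{\mathrm{partner\,A}} - \Delta G^{\mathrm{MUT}}_{\mathrm{partner\,B}}\right) - \left(\Delta G^{\mathrm{WT}}_{\mathrm{complex}} - \Delta G^{\mathrm{WT}}_{\mathrm{partner\,A}} - \Delta G^{\mathrm{WT}}_{\mathrm{partner\,B}}\right) \tag{1}
\]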
Here, $\Delta G^{X}_{\mathrm{complex}}$ is the binding free energy of the complex, with X either the wild type (WT) or mutant
(MUT), and $\Delta G^{X}_{\mathrm{partner\,Y}}$ is the folding free energy of the interaction partner Y. Ala scanning was
performed for each of the representative structures of the clusters (Figs. S7-S9) obtained in the three
restrained dockings, using both FoldX and Robetta. The agreement between the results obtained with
the two programs was investigated using a Kendall tau rank correlation test; the FoldX and Robetta
rankings of ΔΔGbind were found to be consistent, with Kendall's tau coefficients of 0.35 / 0.43 (RNase III /
YmdB, respectively) and one-tailed p-values of 0.04 / 0.05. Therefore, both programs provide similar
orderings of the residue contributions to the binding free energy of the complex, despite the use of
different energy functions (in particular, the Robetta energy function does not take into account the
possible contribution of the nucleic acid). For each restrained docking (i.e. the YmdB complex with
apo RNase III, with RNase III bound to dsRNA in one dsRBD, or with RNase III bound to dsRNA in
the catalytic valley), interface residues were considered hot spot candidates if both the FoldX and
Robetta ΔΔGbind values were >1 kcal/mol in at least one of the representative structures of the
restrained docking. The 1 kcal/mol threshold has been shown to have a positive predictive value of
71% (Robetta) and 73% (FoldX).53
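The rank comparison and the hot spot criterion can be illustrated with a minimal sketch. It assumes the tau-b variant of Kendall's coefficient (the text does not state which variant was used) and takes as input the FoldX / Robetta values listed in Table SII for the YmdB residues of the apo complex (second cluster only). The function names are hypothetical, and the output of this single-structure example is not expected to reproduce the coefficients reported above, which were obtained from the full set of representative structures.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall rank correlation coefficient (tau-b, corrected for ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (n_pairs - ties_x) * (n_pairs - ties_y))


def hot_spot_candidates(structures, threshold=1.0):
    """Residues whose FoldX and Robetta values both exceed the threshold
    (kcal/mol) in at least one representative structure."""
    selected = []
    for ddg in structures:
        for residue, (foldx, robetta) in ddg.items():
            if foldx > threshold and robetta > threshold and residue not in selected:
                selected.append(residue)
    return selected


# (FoldX, Robetta) values from Table SII, apo complex, YmdB residues.
apo = {"K14": (1.7, 1.1), "S27": (1.1, 0.0), "H39": (2.1, 2.5),
       "R40": (3.2, 3.6), "L47": (1.5, 1.6), "R54": (2.5, 1.6),
       "Y159": (0.6, 1.2), "D160": (0.8, 1.3)}
tau = kendall_tau_b([v[0] for v in apo.values()], [v[1] for v in apo.values()])
candidates = hot_spot_candidates([apo])
```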
It should be noted that, for the computational Ala scanning, only one of the two equivalent
residues of the RNase III dimer is mutated to Ala (i.e. a mutant/wt heterodimer is used in the
calculation of ΔΔGbind), whereas the experimental Ala mutagenesis analysis necessarily creates a
mutant-mutant homodimer. This constraint would be particularly important for RNase III residues
located at the dimer interface, where the two symmetry-related residues may interact simultaneously
with YmdB. To address this problem, we have estimated the in silico change in binding free energy for
a given mutant-mutant homodimer as the sum of the ΔΔGbind of the two alternative mutant/wt
heterodimers:52,54,55
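\[
\Delta\Delta G_{\mathrm{bind}}(X\mathrm{A}/X'\mathrm{A}) = \Delta\Delta G_{\mathrm{bind}}(X\mathrm{A}) + \Delta\Delta G_{\mathrm{bind}}(X'\mathrm{A}) \tag{2}
\]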
Here we are assuming that the two single Ala mutations (XA and X'A) are functionally independent,
such that the coupling or interaction free energy between the two residues (X and X') is zero. However,
the additivity approximation likely does not hold for the D128/D128' and Q130/Q130' pairs, due to the
proximity of the residues (Cβ-Cβ distance ~ 9 Å) at the RNase III homodimer interface. For example,
there is an electrostatic repulsion between the negative charges of D128 and D128', which is alleviated
by hydrogen bond formation with the Q130' and Q130 side chains, respectively. Therefore, Ala
mutation of these residue pairs would not only affect the binding energy of the complex (i.e. the
ΔGMUTcomplex term in equation 1), but also the stability of RNase III (i.e. ΔGMUTpartner B). Thus, Ala
mutation of a single acidic residue (either D128A or D128'A) may confer stabilization to RNase III (i.e.
ΔGMUTpartner B < ΔGWTpartner B), because it eliminates the charge repulsion, or may be neutral (i.e.
ΔGMUTpartner B ~ ΔGWTpartner B) if the electrostatic stabilization is canceled by the loss of one hydrogen
bond. In contrast, a double mutation D128A/D128'A is expected to result in destabilization of RNase
III (i.e. ΔGMUTpartner B > ΔGWTpartner B), due to the loss of two hydrogen bonds. Hence, [ΔΔGbind (D128A)
+ ΔΔGbind (D128'A)] is a low estimate of ΔΔGbind (D128A/D128'A). On the other hand, both single and
double Ala mutations of the Q130/Q130' pair are probably destabilizing (i.e. ΔGMUTpartner B > ΔGWTpartner B),
due to the loss of screening of the D128 and D128' charges. Nevertheless, the destabilization upon
double mutation (Q130A/Q130'A) is expected to be more than twice the value of any of the two single
mutations (either Q130A or Q130'A), because the second Ala mutation is introduced into a single RNase
III mutant that already has a charge imbalance. In other words, the interaction energy term between
Q130 and Q130' is larger than zero, due to indirect coupling through the D128/D128' pair, and thus the
approximation in equation 2 may not be fully accurate for the Q130/Q130' pair.
The ΔΔGbind values of the hot spot candidates were calculated as a total ensemble average using
the following equation:
\[
\Delta\Delta G_{\mathrm{bind}} = \sum_{j=1}^{3} \sum_{i=1}^{N_j} w_{ij}\, \Delta\Delta G_{\mathrm{bind},ij} \tag{3}
\]
where the index j runs over the three restrained dockings and, for each restrained docking, the index i
runs over the Nj representative structures (six, five or six, respectively, see Figs. S7-S9). The prefactor
wij is the population of the representative structure i in the restrained docking j, normalized to the total
number of structures of all three restrained dockings (wij = nij / Σj=1,3 Σi=1,Nj nij), and ΔΔGbind, ij is the
change in binding free energy upon Ala mutation of the residue considered in structure i in the
restrained docking j. The resulting average values are shown in Fig. S12, Table SII, and Table I (see
Results section in the main text). For RNase III the ΔΔGbind values correspond to the mutant-mutant
homodimer, and were estimated using equation 2. Analysis of each restrained docking separately (i.e.
replacing the first summation by a constant j, either 1, or 2 or 3) did not alter the ranking of the hot spot
candidates, and yielded closely similar ΔΔGbind values, probably because the different forms of RNase
III considered here interact with YmdB in a similar fashion (see the main text). Moreover, an analysis
of the Ala scanning results that considered only the two most populated clusters (i.e. limiting Nj to 2)
did not significantly change the ΔΔGbind values.
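Equations 2 and 3 can be illustrated with a short sketch. The populations and ΔΔGbind values used here are hypothetical and serve only to show how the weights are normalized over all structures of all restrained dockings; the function names are likewise illustrative.

```python
def homodimer_ddg(ddg_x, ddg_x_prime):
    """Equation 2: additive estimate for the mutant-mutant homodimer."""
    return ddg_x + ddg_x_prime


def ensemble_average(dockings):
    """Equation 3: population-weighted average over all representatives.

    dockings holds, for each restrained docking, a list of
    (n_structures_in_cluster, ddg_of_representative) tuples.
    """
    total = sum(n for docking in dockings for n, _ in docking)
    return sum(n / total * ddg for docking in dockings for n, ddg in docking)


# Hypothetical cluster populations and ddG values (kcal/mol).
toy = [[(3, 2.0), (1, 0.0)], [(4, 1.0)]]
average = ensemble_average(toy)      # (3*2.0 + 1*0.0 + 4*1.0) / 8
dimer = homodimer_ddg(0.5, 0.75)     # sum of the two mutant/wt heterodimers
```

Restricting the input to a single docking, or to the first two clusters of each docking, corresponds to the two control analyses mentioned in the text.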
Figure S12. Calculated changes in binding energy upon Ala mutation (ΔΔG) of the YmdB-RNase III
complex. Shown are the total ensemble averages of ΔΔGbind calculated using equation 3. The ΔΔG
values for RNase III correspond to the mutant-mutant homodimer, estimated using equation 2. The
gray bars correspond to FoldX values, and the red bars correspond to Robetta values.
Solvent-accessible Surface Area (SASA). According to the anchor-latch model,74 the smaller partner
of the protein-protein complex provides an “anchor” residue that sequesters the largest solvent-accessible surface area (SASA) upon binding. The anchor residue therefore is predicted to be
functionally important for protein-protein recognition, usually acting as a “hot spot.” The change in
solvent accessible surface area (ΔSASA) was calculated for the YmdB residues using the ANCHOR
webserver.64
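The bookkeeping behind the anchor criterion can be illustrated with a short sketch. It does not reproduce the ANCHOR webserver's own procedure; it only assumes that per-residue SASA values (in Å²) are already available for the free and the bound state, and all residue labels and numbers below are hypothetical placeholders.

```python
def delta_sasa(free: dict[str, float], bound: dict[str, float]) -> dict[str, float]:
    """Per-residue SASA buried upon binding: SASA(free) - SASA(bound)."""
    if free.keys() != bound.keys():
        raise KeyError("free and bound states must list the same residues")
    return {res: free[res] - bound[res] for res in free}


def anchor_residue(free: dict[str, float], bound: dict[str, float]) -> str:
    """Return the residue that buries the largest SASA upon binding."""
    buried = delta_sasa(free, bound)
    return max(buried, key=buried.get)


# Hypothetical per-residue SASA values in square angstroms (NOT from this work).
sasa_free = {"res_a": 120.0, "res_b": 95.0, "res_c": 60.0}
sasa_bound = {"res_a": 100.0, "res_b": 10.0, "res_c": 55.0}

print(delta_sasa(sasa_free, sasa_bound))
print(anchor_residue(sasa_free, sasa_bound))
```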
Analysis of RNase III sequences. The Ec-RNase III sequence was used as query in a BLAST56
analysis, using the translated open reading frames of the genomes of ten phylogenetically distinct
bacterial species. The identified orthologous sequences were imported and aligned using
Clustal Omega.57 The resulting full-length multiple sequence alignment (MSA) is available upon
request; the region discussed in the main text is shown in Fig. 3A. This initial MSA was then used as a
seed to train a Hidden Markov Model (HMM) using HMMER (version 3.0).22 The HMM was scanned against the
reference proteome 75 (rp75) sequence database, restricting the search to eubacteria and to hits with
E-values < 0.01; the 1,117 RNase III sequences thus identified were collected and aligned. The resulting HMM-based
MSA was used to investigate the residue conservation, or frequency, in the polypeptide segment
120-140 (Ec-RNase III numbering), shown elsewhere to be important for YmdB binding.13 Sequence logos58
for this region were generated using WebLogo (version 3)59 and are shown in Fig. S13. The 1,117 RNase III
sequences were classified in two main groups, depending on the length of the loop between helices α6
and α7. 1,089 sequences (97.5%) have a “short” loop, whereas 28 (2.5%) have a “long” loop, due to an
insertion. In sixteen of the “long loop” sequences (1.4%) the insertion consists of a single amino acid
(before residue 128), while in the other twelve sequences (1.1%) two amino acids are introduced
between residues 127 and 128. The insertion in the α6-α7 loop observed in the sequence alignment is
consistent with the structural alignment of the crystal structures of Aquifex aeolicus (Aa), Thermotoga
maritima (Tm) and Mycobacterium tuberculosis (Mt) RNases III (PDB entries 2NUG,20 1O0W and
2A11,60 respectively), as well as homology models of Ec-RNase III (this work) and Streptomyces
coelicolor (Sc) RNase III (obtained from ModBase61).
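The per-position residue frequencies that a sequence logo summarizes can be computed from an MSA as sketched below. This is an illustrative stand-in, not the WebLogo implementation: the function name, the gap handling (gaps are ignored when normalizing) and the toy alignment are assumptions made for the example.

```python
from collections import Counter


def column_frequencies(alignment: list[str], gap_chars: str = "-.") -> list[dict[str, float]]:
    """Return, for each alignment column, the frequency of each residue.

    Gap characters are excluded before normalizing, so the frequencies in a
    column sum to 1 over the residues actually observed there.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    frequencies = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res not in gap_chars)
        total = sum(counts.values())
        frequencies.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return frequencies


# Hypothetical toy alignment (NOT RNase III sequences).
toy_msa = ["ACD", "ACE", "A-E", "GCE"]
for position, freqs in enumerate(column_frequencies(toy_msa), start=1):
    print(position, freqs)
```

Applied separately to the “short” and “long” loop groups, a table of this kind is what each panel of a probability-scaled logo displays for the selected segment.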
Figure S13. Sequence logo of the RNase III segment (“recognition pocket”) implicated in the
interaction with YmdB (residues 120-140; Ec-RNase III numbering). Amino acids A, V, I, L, M, W, F
and P are in black; T, S, Y, C and G in green; R, K and H in blue; D and E in red; and N and Q in
purple. The letter height is proportional to the probability of finding the corresponding amino acid at
that particular position. Top, sequence logo of 1,089 RNase III sequences with a “short” loop
connecting helices α6 and α7. Bottom, sequence logo of 28 RNase III sequences with a “long” loop
connecting helices α6 and α7.
Electrostatic surface potential of the RNase III recognition pocket. Continuum electrostatic
calculations were performed for the apo form of RNase III using the Adaptive Poisson-Boltzmann
Solver (APBS) program.73 The crystal structure of Tm-RNase III (1O0W) and the homology models of
Ec-RNase III (this work) and Sc-RNase III (obtained from ModBase)61 were used in the calculations.
The solvent radius was set to 1.4 Å, the dielectric constants of the protein and the solvent were set to
4.0 and 78.5, respectively, and the ionic strength was adjusted to 150 mM using NaCl. The obtained
electrostatic surface potentials are displayed in Fig. S14.
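To give a sense of scale for these settings, the sketch below evaluates two standard quantities for a 1:1 electrolyte: the Debye screening length implied by the ionic strength and solvent dielectric constant, and the value of kT/e in millivolts. The temperature of 298.15 K is an assumption (it is not stated above), and the function names are illustrative; this is not part of the APBS workflow itself.

```python
import math

# Physical constants (SI units).
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length_angstrom(ionic_strength_molar: float, eps_solvent: float, temp_k: float) -> float:
    """Debye length (in angstroms) for ionic strength given in mol/L."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_squared = (2.0 * N_A * E_CHARGE**2 * ionic_strength_si) / (
        EPS0 * eps_solvent * K_B * temp_k
    )
    return 1e10 / math.sqrt(kappa_squared)  # m -> angstrom


def kt_over_e_millivolt(temp_k: float) -> float:
    """Thermal voltage kT/e in millivolts."""
    return 1000.0 * K_B * temp_k / E_CHARGE


TEMPERATURE_K = 298.15  # assumed; not specified in the text

print(debye_length_angstrom(0.150, 78.5, TEMPERATURE_K))
print(4 * kt_over_e_millivolt(TEMPERATURE_K))
```

Under the assumed temperature, the Debye length comes out slightly below 8 Å, and a potential of 4 kT/e corresponds to roughly 0.1 V.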
Figure S14. Electrostatic surface potentials of apo RNase III structures. (A) Tm-RNase III, (B) Ec-RNase III, and (C) Sc-RNase III. The protein surface is colored by electrostatic potential, from red (-4
kT/e) to blue (+4 kT/e). The two recognition pockets for the R40 side chain of Ec-YmdB are circled in
yellow.
Analysis of YmdB ortholog sequences. The Ec-YmdB sequence was used as query in a BLAST56
analysis, using the translated open reading frames of the genomes of eleven other phylogenetically
distinct bacterial species. The sequences were imported and aligned using Clustal Omega.57 The
resulting full-length MSA is available upon request; the region discussed in the main text is shown in
Fig. 3B.