
Supporting Information File 1 (SI-1)

Paudyal, S., Alfonso-Prieto, M. et al.

Computational Methods and Results

Homology modeling of Ec-RNase III. Fig. S1 shows the sequence alignment of the two template sequences (Aa- and Tm-RNase III) and the query sequence (Ec-RNase III). The following protocol was applied to build the Ec-RNase III homology models. For each template (PDB entries 1O0W, 2NUG or 2NUE), 1,500 models of the Ec-RNase III structure were generated using Modeller (version 9.10) [27], then clustered using the g_cluster tool of the Gromacs package [29], applying the gromos method [30] and an RMSD cutoff of either 0.65 Å (1O0W-based model) or 1.5 Å (2NUE and 2NUG). A single representative of the most highly populated cluster was selected, and Modeller was then applied to optimize the conformation of the variable loops [28]. The loops corresponded to residues 29-36 for the homology model based on the Tm-RNase III template, and residues 102-107 and 194-201 for the two Aa-based models. 1,500 loop conformations were generated and the most representative was selected with g_cluster, using the gromos method and an RMSD cutoff of 1.25 Å (1O0W-based model), 1.35 Å (2NUE) or 1.50 Å (2NUG). The final homology models of Ec-RNase III are shown in Fig. S2.

Figure S1. Alignment of the Escherichia coli (Ec), Aquifex aeolicus (Aa) and Thermotoga maritima (Tm) RNase III polypeptide sequences used in the homology modeling of Ec-RNase III. The residue numbering of Ec-RNase III and the predicted secondary structure (using PSIPRED) [24] are shown above the aligned sequences. Conserved residues are highlighted in black, and similar residues are boxed.

Figure S2. Homology models of Ec-RNase III. The Ec-RNase III structural models are shown in brown cartoon form, whereas the crystal structures of the corresponding templates are shown in blue. (A) Model of the apo state, built using the Tm-RNase III structure (PDB entry 1O0W) as template.
The inset shows the protein in the same orientation as in (C), in order to highlight the change in positions of the two dsRBDs during the catalytic cycle. In contrast, the structures of the two RNase III domains (RIIIDs) remain essentially unchanged. (B) Model of a presumptive pre-catalytic complex, built using as template the Aa-RNase III structure with a dsRNA bound to one of the dsRBDs (PDB entry 2NUE) [20]. The dsRNA is shown in pink cartoon form. (C) Model of a presumptive post-catalytic complex, built using as template the Aa-RNase III structure with dsRNA bound to the catalytic valley (PDB entry 2NUG) [20].

Protein-protein docking. Haddock (version 2.0) [31,32] and CNS (version 1.2) [33,34] were used for the docking with default settings, except for the number of models generated. For the blind docking (i.e. without prior assumptions of the interacting surfaces), 100 structures were generated in the rigid body docking step, of which the 20 best-ranked structures were refined first by semi-flexible simulated annealing (SA), then by SA in the presence of explicit solvent. No restraints were imposed on either protein. This protocol was repeated two hundred times, starting with different initial seeds, yielding a total of 4,000 docking models. For the restrained docking, the protocol was the same as for the blind docking, except for the use of Ambiguous Interaction Restraints (AIRs). AIRs were defined between the WHISCY-predicted [41] interface residues of Ec-YmdB (Fig. S3) and residues 120-140 of each monomer of the Ec-RNase III homodimer. In this case, only ten iterations with different initial seeds were run, resulting in a total of ~400 structures for each Ec-RNase III homology model bound to YmdB.

Figure S3. WHISCY [41] prediction for E. coli YmdB. (A) Surface representation of Ec-YmdB (PDB entry 1SPV), colored according to the WHISCY scores. The protein is shown as van der Waals spheres, where red indicates active residues (i.e. residues predicted to be directly involved in protein-protein interactions) and pink indicates passive residues (i.e. solvent-accessible residues that are neighbors of the predicted active residues). The remainder of the protein is shown in blue. (B) Atomistic representation of the WHISCY-predicted active and passive residues. The active residues are shown in thick licorice and are identified by name and position; the consecutive active residues G30, G31, G32 and G33 are labeled as “G30-G33”. The passive residues (D11, K14, P26, S27, A41, P44, A45, L47, L69, G71, D72, P74, A75, K76, V85, W86, R87, V125, G127, and Y159) are displayed as lines, and are not labeled.

Blind docking analysis. A center of mass (COM) approach [36] was used to determine the main interacting area(s) explored in the blind docking. The 4,000 configurations of the protein-protein complex generated in the blind docking were aligned, based only on the structure of the receptor (RNase III). The position of the COM of the ligand (YmdB) was then determined, using the backbone atoms of residues 3-174. The set of COM positions was used to compute the number density of ligand poses on the surface of RNase III using the Volmap plugin [38] in VMD [37]. Each COM was treated as a normalized isotropic Gaussian density of width 3 Å, and the density was summed over all the COMs using a three-dimensional grid with a 1 Å bin resolution. Then, a cluster analysis of the set of COM positions was performed using g_cluster [29] and the gromos method [30], with a cutoff of 2 nm. The resulting clusters were ranked according to their local density, instead of the number of cluster members, because regions of continuous density (or high occupancy) are likely to represent regions of tighter binding. A similar strategy was used to identify small molecule ligand binding sites [62]. In order to avoid overestimation errors, we recalculated the number densities using a smaller Gaussian width of 1.5 Å and a finer grid resolution of 0.5 Å, and then integrated the number density using a 5σ threshold. Fig. 1 shows the first five clusters along with the corresponding number densities. The five clusters correspond to 35% of the total number of docking poses (309, 311, 291, 269 and 211 poses for clusters 1-5, respectively, out of a total of 4,000). Beyond cluster 5, the local density of the clusters is significantly lower (~1.4-fold) than that of the first cluster. These remaining clusters were therefore not considered further.

Next, the docking landscape was further characterized by performing a configurational analysis based on receptor-ligand distances [39]. First, we calculated the distance between the COM of YmdB and the COM of either the RIIIDs or the dsRBDs of RNase III (black dashed lines in Fig. S4A). Here, the COM was determined by taking into account only the backbone atoms of residues 3-174 for YmdB, 6-128 for the RNase III RIIID, and 155-225 for the RNase III dsRBD. Using the YmdB-RIIID and YmdB-dsRBD distances as coordinates and 1 Å bins, we computed a two-dimensional (2D) histogram (Fig. S4B) to quantify the population of the two binding modes identified above: 42% for YmdB bound to the RIIIDs, and 34% for YmdB bound to the dsRBDs. Next, we calculated the distances between the COM of YmdB and the COM of each of the RIIIDs separately (YmdB-RIIID1 and YmdB-RIIID2; see blue dashed lines in Fig. S4A) and computed the corresponding 2D histogram (Fig. S4C) to differentiate between YmdB bound to both RIIIDs or to a single RIIID. Similarly, we calculated the distances between the COM of YmdB and the COM of each of the dsRBDs of RNase III (YmdB-dsRBD1 and YmdB-dsRBD2; see red dashed lines in Fig. S4A), and computed the corresponding 2D histogram (Fig. S4D) to differentiate between YmdB bound to both dsRBDs or to a single dsRBD.
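The distance-based configurational analysis described above can be illustrated with a short, standard-library Python sketch. It is not the analysis code used in this work: the function names, the mass-weighted COM definition and the example distances are assumptions made for illustration only. Each pose is reduced to a pair of COM-COM distances, and the population of each 1 Å × 1 Å bin is reported as a percentage of all poses.

```python
from collections import Counter
from math import dist, floor


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms (coordinates in Å)."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for c, m in zip(coords, masses)) / total for k in range(3)
    )


def binned_populations(distance_pairs, bin_width=1.0):
    """Percentage of poses in each 2D bin of (distance 1, distance 2).

    Bins are indexed by the integer part of distance / bin_width.
    """
    counts = Counter(
        (floor(a / bin_width), floor(b / bin_width)) for a, b in distance_pairs
    )
    n_poses = len(distance_pairs)
    return {bin_index: 100.0 * c / n_poses for bin_index, c in counts.items()}


# Hypothetical COM positions for one pose (illustrative values only).
ymdb_com = center_of_mass([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0])
riiid_com = (1.5, 10.0, 0.0)
ymdb_riiid_distance = dist(ymdb_com, riiid_com)

# Hypothetical (YmdB-RIIID, YmdB-dsRBD) distances for four poses.
populations = binned_populations(
    [(10.2, 35.5), (10.7, 35.1), (30.1, 12.3), (30.4, 40.9)]
)
```

Summing the bin percentages over the region of short YmdB-RIIID distances (or short YmdB-dsRBD distances) then gives the population of the corresponding binding mode.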
Finally, a statistical analysis was performed of the residue propensity for the protein-protein interface. Specifically, the percentage frequency was calculated for finding a YmdB or RNase III residue within 5 Å of the complementary partner in the 4,000 configurations of the protein-protein complex generated through blind docking (Figs. S5-S6). Here, the assumption is that amino acids with calculated high frequencies are likely to be involved in protein-protein interactions [39,40].

Figure S4. Configurational analysis of the blind docking poses. (A) Schematic representation of the YmdB-RNase III complex, using the homology model of Ec-RNase III with dsRNA bound in the catalytic valley. The domains of the RNase III homodimer are shown in brown and are indicated by name. YmdB is displayed in gray, and the dsRNA in pink. The distances between the centers of mass (COM) of YmdB and the RNase III domains are indicated by dashed lines (black for the distances YmdB-RIIID and YmdB-dsRBD, blue for YmdB-RIIID1 and YmdB-RIIID2, and red for YmdB-dsRBD1 and YmdB-dsRBD2). (B) 2D histogram of the 4,000 configurations generated in the blind docking using as coordinates the distances YmdB-RIIID and YmdB-dsRBD (shown as black dashed lines in panel A). As a reference, the distance RIIID-dsRBD is 28.9±0.6 Å. The color scale indicates the population of the different configurations of the YmdB-RNase III complex, from blue (low) to red (high). (C) 2D histogram using the distances YmdB-RIIID1 and YmdB-RIIID2 as coordinates (shown as blue dashed lines in panel A). As a reference, the RIIID1-RIIID2 distance is 31.6±1.3 Å. A-B-A' represents the ternary protein-protein complex in which YmdB (B) is bound to both RNase III monomers (A, A'), whereas A-B and A'-B denote binary complexes in which YmdB is bound to only one RNase III monomer (either A or A'). (D) 2D histogram using the YmdB-dsRBD1 and YmdB-dsRBD2 distances as coordinates (shown as red dashed lines in panel A). As a reference, the dsRBD1-dsRBD2 distance is 39.8±0.9 Å.

Figure S5. Statistical analysis of YmdB residues at the protein-protein interface in the blind docking poses. (A) YmdB interfacial frequency map. The protein is displayed as van der Waals spheres, using a color scale (see inset bar) in which blue indicates low frequency, and red denotes high frequency. (B) The top ten YmdB residues with the highest interfacial frequencies, mapped onto the Ec-YmdB crystal structure (PDB entry 1SPV). The residues are shown in licorice, with C atoms in yellow, N atoms in blue, and O atoms in red.

Figure S6. Statistical analysis of RNase III residues present at the protein-protein interface in the blind docking poses. (A) RNase III interfacial frequency map. One of the monomers is displayed as van der Waals spheres, using a color scale (see inset bar) where blue indicates low frequency, and red denotes high frequency. The other RNase III monomer and the nucleic acid are shown in cartoon form (in brown and pink, respectively). The structure shown corresponds to the homology model of Ec-RNase III (based on PDB entry 2NUG) bound to a cleaved dsRNA, in a presumptive post-catalytic state. (B) The top ten RIIID residues with the highest interfacial frequencies. Note that the view of the protein has been rotated such that the bottom of the protein in panel A is shown. The residues are displayed in licorice, with C atoms in cyan, N atoms in blue, and O atoms in red. (C) Positions of the top ten dsRBD residues with the highest interfacial frequencies. Again, the view of the protein has been rotated such that now the top of the protein is shown.

Restrained docking analysis. As in the blind docking analysis (see above), a COM approach was applied first.
The ~400 configurations of the Ec-RNase III-YmdB complex generated in each restrained docking experiment were aligned onto the corresponding homology model of Ec-RNase III, using only RNase III for the fitting, and the position of the COM of YmdB was calculated. Then, the set of COM positions was used to compute the number density of ligand poses on the surface of the RNase III receptor (see panel A in Figs. S7-S9) using the Volmap plugin [38], with a Gaussian width of 1.5 Å and a grid resolution of 1 Å.

Then, the restrained docking poses were clustered. Some modifications were introduced in the clustering protocol, compared to the blind docking analysis, since in the restrained docking simulations the interacting surface of RNase III is limited to the RIIIDs (Figs. S7A-S9A), in a region corresponding approximately to clusters 2-4 of the blind docking (Fig. 1). As an alternative to the RMSD of the COM of YmdB used in the blind docking, the backbone RMSD of YmdB (the ligand RMSD, L-RMSD) could be used in the clustering. However, the large fluctuations of the L-RMSD (spanning a ~40 Å range) indicate that the structural ensemble of YmdB bound to the RIIID is very heterogeneous. Thus, clustering using L-RMSD alone is not sufficiently discriminating. Instead, a two-step clustering protocol based on a rigid body representation of YmdB was applied. By computing all the Cα-Cα pair distances of YmdB, and following their changes among the restrained docking poses, we can identify a set of three alpha carbon atoms that exhibit long pair distances but small fluctuations, and use this triad (here, the Cα atoms of K76, G88 and K139) to represent YmdB as a rigid body. In the first clustering step, we used a single atom of the triad (Cα of K76) to distinguish between the configurations in which YmdB is bound to either a single RIIID, or to both RIIIDs. The one-particle clustering was carried out with g_cluster [29], using the gromos method [30] and a 4 Å cutoff.
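The one-particle clustering step can be sketched in a few lines of standard-library Python. This is a minimal illustration of the gromos neighbor-counting scheme, not the g_cluster implementation. It assumes that the poses are already fitted on the receptor, so that for a single particle the RMSD between two poses reduces to the Euclidean distance between the two positions. The function name, the inclusive cutoff comparison, the tie-breaking rule and the coordinates are all assumptions.

```python
from math import dist


def gromos_clusters(points, cutoff):
    """Cluster positions with the gromos neighbor-counting scheme.

    The pose with the most neighbors within `cutoff` becomes a cluster
    center; it is removed together with its neighbors, and the procedure
    is repeated on the remaining poses. Ties go to the lowest index.
    Returns a list of (center_index, sorted_member_indices) tuples.
    """
    remaining = set(range(len(points)))
    # Each pose counts as its own neighbor (distance 0).
    neighbors = {
        i: {j for j in remaining if dist(points[i], points[j]) <= cutoff}
        for i in remaining
    }
    clusters = []
    while remaining:
        center = max(sorted(remaining), key=lambda i: len(neighbors[i] & remaining))
        members = neighbors[center] & remaining
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Hypothetical positions (Å) of one atom in six docking poses.
positions = [
    (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0),
    (20.0, 0.0, 0.0), (22.0, 0.0, 0.0), (50.0, 0.0, 0.0),
]
clusters = gromos_clusters(positions, cutoff=4.0)
```

With these illustrative positions, the first cluster is centered on the middle pose of the first group, because that pose has the largest number of neighbors within the cutoff.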
Then, for each of the one-particle clusters obtained, the different YmdB structures were fitted by removing the translation of the Cα atom of K76. The second clustering step was performed by using all three atoms of the triad (the Cα atoms of K76, G88 and K139), in order to differentiate between the different rotational orientations that YmdB can adopt. The three-particle clustering was also carried out with g_cluster [29], using the gromos method [30] and a 2 Å cutoff. The clusters obtained with the two-step clustering protocol are shown in panel B of Figs. S7-S9, superimposed onto the density of ligand poses in order to show that the most populated clusters also correspond to regions of higher density. Panel C of Figs. S7-S9 shows the population of the different clusters and panel D the average Haddock score of each of the clusters. Although some clusters are significantly more populated than others, their Haddock scores are quite similar, and thus we cannot discard any of the clusters with full confidence.

Next, the protein-protein interface was analyzed. First, the molecular details of the interface were explored for the representatives of the two most populated clusters (blue and red clusters in Figs. S7-S9), which represent 60-70% of the configurations obtained. Figs. 2 and S10-S11 show the surface complementarity of the two proteins, as well as the residues within 4 Å of the other partner. Tables I and SI list the main protein-protein interactions, identified using the Binana algorithm [50] with default parameters, followed by a visual double-check.

Figure S7. Distribution of the YmdB restrained-docked poses on the surface of a structural model of apo Ec-RNase III. (A) Position of the center of mass of YmdB (small colored spheres) in each of the 400 configurations of the protein-protein complex, generated by restrained docking. The number density of the YmdB docking poses is displayed as a light blue isosurface, with a cutoff of 7 × 10⁻⁴ Å⁻³. RNase III is shown in brown cartoon form. (B) Two-step cluster analysis of the YmdB docking poses. The COM of YmdB is represented by small spheres, with a color code indicating the cluster to which the YmdB docking poses belong. (C) Population of each of the clusters shown in (B). The white bar represents the remaining structures, which do not form significantly populated or dense clusters. (D) Haddock score of each of the clusters in (B). The average and standard error of the mean are shown.

Figure S8. Distribution of the YmdB restrained-docked poses on the surface of the structural model of RNase III with dsRNA bound to one of the dsRBDs. (A) Position of the center of mass of YmdB (shown as small colored spheres) in each of the 400 configurations of the protein-protein complex generated by restrained docking. The number density of the YmdB docking poses is displayed as a light blue isosurface, with a cutoff of 7 × 10⁻⁴ Å⁻³. RNase III is shown in brown cartoon form and the dsRNA in pink. (B) Two-step cluster analysis of the YmdB docking poses. The COM of YmdB is represented by small spheres, with a color code indicating the cluster to which the YmdB docking poses belong. (C) Population of each of the clusters shown in (B). The white bar represents the remaining structures, which do not form significantly populated or dense clusters. (D) Haddock score of each of the clusters in (B). The average and standard error of the mean are shown.

Figure S9. Distribution of the YmdB restrained-docked poses on the surface of the structural model of RNase III with a cleaved dsRNA bound in the catalytic valley. (A) Position of the center of mass of YmdB (shown as small colored spheres) in each of the 400 configurations of the protein-protein complex generated by restrained docking. The number density of the YmdB docking poses is displayed as a light blue isosurface, with a cutoff of 7 × 10⁻⁴ Å⁻³. RNase III is shown in brown cartoon form and the dsRNA in pink.
(B) Two-step cluster analysis of the YmdB docking poses. The COM of YmdB is represented by small spheres, with a color code indicating the cluster to which the YmdB docking poses belong. (C) Population of each of the clusters shown in (B). The white bar represents the remaining structures, which do not form significantly populated or dense clusters. (D) Haddock score of each of the clusters in (B). The average and standard error of the mean are shown.

Table SI. Analysis of the clusters obtained in each of the three restrained dockings of Ec-RNase III (apo, pre-catalytic and post-catalytic states, respectively) with Ec-YmdB. Besides the population of each cluster, the Haddock score (see also panel D in Figs. S7-S9) and its components are shown. E_elec, E_vdW, E_desolv and E_AIR are the electrostatic, van der Waals, desolvation and ambiguous interaction restraint energies, respectively, and BSA is the buried surface area. The Haddock score is calculated as:

\[
\text{Haddock score} = 1.0\,E_{\mathrm{vdW}} + 0.2\,E_{\mathrm{elec}} + 1.0\,E_{\mathrm{desolv}} + 0.1\,E_{\mathrm{AIR}}
\]

Average values (and standard error of the mean) are indicated.

apo RNase III-YmdB docking

| cluster # | population (%) | Haddock score (a.u.) | E_vdW + 0.2 E_elec (kcal/mol) | E_desolv (kcal/mol) | E_AIR (kcal/mol) | BSA (Å²) |
|---|---|---|---|---|---|---|
| 1 | 40.4 | 76.8±1.9 | 23.6±0.4 | 13.5±1.8 | 397.4±7.0 | 3329.5±33.8 |
| 2 | 20.5 | 75.9±2.9 | 24.5±0.6 | 12.0±2.6 | 393.1±10.1 | 3340.9±41.6 |
| 3 | 15.1 | 79.5±3.0 | 24.2±0.8 | 17.5±2.9 | 379.1±12.5 | 3219.9±64.5 |
| 4 | 14.7 | 75.5±3.7 | 23.0±0.8 | 15.3±3.2 | 371.8±11.1 | 3283.3±45.7 |
| 5 | 5.1 | 84.2±5.4 | 25.5±0.4 | 19.3±4.7 | 394.0±16.3 | 3479.0±62.9 |
| 6 | 4.1 | 73.4±6.9 | 21.9±0.7 | 11.7±6.2 | 398.9±29.6 | 3469.7±65.5 |

pre-catalytic RNase III-YmdB docking

| cluster # | population (%) | Haddock score (a.u.) | E_vdW + 0.2 E_elec (kcal/mol) | E_desolv (kcal/mol) | E_AIR (kcal/mol) | BSA (Å²) |
|---|---|---|---|---|---|---|
| 1 | 41.1 | 31.9±1.6 | 19.2±0.4 | -23.4±1.3 | 361.3±6.7 | 2576.6±31.3 |
| 2 | 30.8 | 32.1±2.0 | 19.6±0.5 | -25.5±1.7 | 380.4±9.1 | 2572.7±35.3 |
| 3 | 16.7 | 30.1±2.1 | 20.4±0.5 | -27.0±1.9 | 368.2±11.8 | 2564.6±48.7 |
| 4 | 6.5 | 39.3±2.4 | 20.4±1.1 | -18.1±2.0 | 370.1±16.9 | 2479.3±85.2 |
| 5 | 4.9 | 37.8±3.7 | 19.1±0.6 | -22.6±2.6 | 413.6±18.2 | 2493.6±85.1 |

post-catalytic RNase III-YmdB docking

| cluster # | population (%) | Haddock score (a.u.) | E_vdW + 0.2 E_elec (kcal/mol) | E_desolv (kcal/mol) | E_AIR (kcal/mol) | BSA (Å²) |
|---|---|---|---|---|---|---|
| 1 | 69.5 | 52.3±1.2 | 21.2±0.4 | -5.5±0.9 | 368.5±5.3 | 2792.5±21.9 |
| 2 | 12.3 | 49.1±2.7 | 21.1±0.7 | -8.2±2.2 | 361.0±14.8 | 2953.8±54.6 |
| 3 | 8.2 | 52.1±3.5 | 23.7±1.3 | -6.2±2.8 | 345.9±12.4 | 2900.5±54.1 |
| 4 | 4.1 | 56.9±4.8 | 19.9±1.1 | 2.3±4.8 | 347.9±19.5 | 2721.5±84.3 |
| 5 | 3.3 | 61.2±4.3 | 23.8±1.5 | -2.7±4.0 | 402.3±19.0 | 2877.4±81.8 |
| 6 | 2.6 | 54.3±4.3 | 20.7±1.6 | -4.4±3.6 | 379.9±26.4 | 2877.4±97.2 |

Figure S10. Protein-protein complex between Ec-YmdB and Ec-RNase III in the apo form, obtained through restrained docking. The representative structure of the most populated cluster (blue in Fig. S7) is shown on top, and that of the second most populated cluster (red in Fig. S7) is shown at the bottom. (A) and (D): Surface representation of the complex, with YmdB in gray and RNase III in brown. The position of YmdB R40 anchored in the dimer interface of RNase III is indicated with a yellow ellipse. (B) and (E): Cartoon representation of the complex, using the same color code as in (A) and (D). The interface residues (within 4 Å of the other partner) are shown as licorice, with C atoms in yellow for YmdB and in cyan for RNase III. The O atoms are in red and N atoms in blue for both proteins. (C) and (F): Close-up of the protein-protein interface, showing the residues discussed in the text.

Figure S11. Protein-protein complex between Ec-YmdB and Ec-RNase III in a pre-catalytic state, obtained through restrained docking. The representative structure of the most populated cluster (blue in Fig. S8) is shown on top, and that of the second most populated cluster (red in Fig. S8) is shown at the bottom. (A) and (D): Surface representation of the complex, with YmdB in gray and RNase III in brown. The position of the anchored YmdB R40 is indicated with a yellow ellipse. (B) and (E): Cartoon representation of the complex. The interface residues (within 4 Å of the other partner) are shown as licorice, with C atoms in yellow for YmdB and in cyan for RNase III. (C) and (F): Close-up of the protein-protein interface, showing the residues discussed in the text.

Table SII. Main protein-protein interactions at the interface of the RNase III-YmdB complexes obtained through restrained docking. Apo denotes the complex of YmdB with the apo form of RNase III and pre the complex with RNase III in an early step of substrate recognition. For each complex, the representative structure of the second most populated cluster (i.e. red cluster in Figs. S7-S8) is considered, since its population (21% for the apo complex and 31% for the pre complex) is comparable to that of the most populated cluster (40% and 41%, respectively). For the post complex of YmdB with RNase III in a post-catalytic state, the population of the first cluster is already 70%, and thus the second cluster (10%) is not considered. The YmdB residues interacting with RNase III are ordered according to the sequence numbering, followed by the RNase III counterpart residue(s) and the type of interaction in parentheses (if present). The abbreviations used are: hb (hydrogen bond), sb (salt bridge), pp (π-π interaction), cp (cation-π interaction) and hc (hydrophobic contact). ΔΔG_bind is the calculated change in binding free energy of the complex upon alanine mutation of the corresponding YmdB residue; hence, interactions involving backbone atoms of YmdB are not included. FoldX values are indicated first, followed by Robetta values.
| YmdB | RNase III (apo) | ΔΔG_bind (apo) (kcal/mol) | RNase III (pre) | ΔΔG_bind (pre) (kcal/mol) |
|---|---|---|---|---|
| K14 | D126' (sb), E21' (sb) | 1.7 / 1.1 | - | - |
| N25 | - | - | E133' (hb) | 1.8 / 2.0 |
| S27 | N18 (hb) | 1.1 / 0.0 | E133' (hb) | 0.0 / 0.0 |
| H39 | Q130 (hb) | 2.1 / 2.5 | Y15 (pp) | 0.4 / 0.0 |
| R40 | Y50 (hb), E133 (sb), L125' (hc) | 3.2 / 3.6 | L135 (hc) | 0.8 / 0.5 |
| L47 | K134 (hc) | 1.5 / 1.6 | L125 (hc) | 0.6 / 0.6 |
| R54 | T16 (hb) | 2.5 / 1.6 | - | - |
| Y126 | - | - | Q130' (hb), K134' (cp) | 3.0 / 2.6 |
| Y159 | Q130' (hb) | 0.6 / 1.2 | - | - |
| D160 | K134' (sb) | 0.8 / 1.3 | - | - |

Generalized Born Implicit Solvent (GBIS) simulations. The representative structures of the YmdB-RNase III complex obtained in the restrained docking simulations were optimized in a Generalized Born Implicit Solvent (GBIS) model, using the implementation [43] in the NAMD program [44]. All Arg, Lys, Asp, and Glu residues were considered in their ionized form, and the protonation states of the histidine residues were determined by taking into account the hydrogen bond environment. The protein and the dsRNA were described using the Cornell [45] and the parm99bsc0 [46] force fields, respectively, and the Åqvist parameters were used for sodium and chloride ions [47]. The Mg²⁺ ion Lennard-Jones interactions were calculated using the metal ion parameters provided by Allner et al. [48] All simulations were performed with the NAMD (version 2.9) program [44]. The GBIS minimizations were carried out assuming an implicit ion concentration of 150 mM, with protein and solvent dielectric constants set to 1 and 80, respectively. Born radii were calculated using a cutoff of 14 Å, while the nonbonded forces were smoothed and cut off between 15 and 16 Å. The protein-protein complex was minimized for 500,000 steps with the protein backbone fixed, followed by another 500,000 steps without restraints.
The quality of the final models was assessed using MolProbity [49]. GBIS minimization significantly improved the MolProbity score of the complex (≤ 1.5 Å), indicating that the clashes, rotamer quality, and Ramachandran quality of the model are within the average values for structures of 1.5 Å resolution.

In silico alanine mutagenesis. Computational alanine scanning was performed in order to identify RNase III-YmdB interface hot spots. The FoldX program [51] and the Robetta webserver [52] were used to calculate the change in binding free energy (ΔΔG_bind) of the complex as a result of alanine (Ala) mutation:

\[
\Delta\Delta G_{\mathrm{bind}} = \left(\Delta G^{\mathrm{MUT}}_{\mathrm{complex}} - \Delta G^{\mathrm{MUT}}_{\mathrm{partner\,A}} - \Delta G^{\mathrm{MUT}}_{\mathrm{partner\,B}}\right) - \left(\Delta G^{\mathrm{WT}}_{\mathrm{complex}} - \Delta G^{\mathrm{WT}}_{\mathrm{partner\,A}} - \Delta G^{\mathrm{WT}}_{\mathrm{partner\,B}}\right) \tag{1}
\]

Here, ΔG^X_complex is the binding free energy of the complex, with X either the wild type (WT) or mutant (MUT), and ΔG^X_partner Y is the folding free energy of the interaction partner Y. Ala scanning was performed for each of the representative structures of the clusters (Figs. S7-S9) obtained in the three restrained dockings, using both FoldX and Robetta. The agreement between the results obtained with the two programs was investigated using a Kendall tau rank correlation test; the FoldX and Robetta rankings of ΔΔG_bind were found to be consistent, with Kendall's tau coefficients of 0.35 / 0.43 (RNase III / YmdB, respectively) and (single-tailed) p-values of 0.04 / 0.05. Therefore, both programs provide similar orderings of the residue contributions to the binding free energy of the complex, despite the use of different energy functions (in particular, the Robetta energy function does not take into account the possible contribution of the nucleic acid). For each restrained docking (i.e. the YmdB complex with apo RNase III, with RNase III bound to dsRNA in one dsRBD, or with RNase III bound to dsRNA in the catalytic valley), interface residues were considered hot spot candidates if both the FoldX and Robetta ΔΔG_bind values were >1 kcal/mol in at least one of the representative structures of the restrained docking. The 1 kcal/mol threshold has been shown to have a positive predictive value of 71% (Robetta) and 73% (FoldX) [53].

It should be noted that, for the computational Ala scanning, only one of the two equivalent residues of the RNase III dimer is mutated to Ala (i.e. a mutant/wt heterodimer is used in the calculation of ΔΔG_bind), whereas the experimental Ala mutagenesis analysis necessarily creates a mutant-mutant homodimer. This constraint would be particularly important for RNase III residues located at the dimer interface, where the two symmetry-related residues may interact simultaneously with YmdB. To address this problem, we have estimated the in silico change in binding free energy for a given mutant-mutant homodimer as the sum of the ΔΔG_bind of the two alternative mutant/wt heterodimers [52,54,55]:

\[
\Delta\Delta G_{\mathrm{bind}}(\mathrm{XA/X'A}) = \Delta\Delta G_{\mathrm{bind}}(\mathrm{XA}) + \Delta\Delta G_{\mathrm{bind}}(\mathrm{X'A}) \tag{2}
\]

Here we are assuming that the two single Ala mutations (XA and X'A) are functionally independent, such that the coupling or interaction free energy between the two residues (X and X') is zero. However, the additivity approximation likely does not hold for the D128/D128' and Q130/Q130' pairs, due to the proximity of the residues (Cβ-Cβ distance ~9 Å) at the RNase III homodimer interface. For example, there is an electrostatic repulsion between the negative charges of D128 and D128', which is alleviated by hydrogen bond formation with the Q130' and Q130 side chains, respectively. Therefore, Ala mutation of these residue pairs would not only affect the binding energy of the complex (i.e. the ΔG^MUT_complex term in equation 1), but also the stability of RNase III (i.e. ΔG^MUT_partner B).
Thus, Ala mutation of a single acidic residue (either D128A or D128'A) may confer stabilization to RNase III (i.e. ΔG^MUT_partner B < ΔG^WT_partner B), because it eliminates the charge repulsion, or may be neutral (i.e. ΔG^MUT_partner B ~ ΔG^WT_partner B) if the electrostatic stabilization is canceled by the loss of one hydrogen bond. In contrast, a double mutation D128A/D128'A is expected to result in destabilization of RNase III (i.e. ΔG^MUT_partner B > ΔG^WT_partner B), due to the loss of two hydrogen bonds. Hence, [ΔΔG_bind (D128A) + ΔΔG_bind (D128'A)] is a low estimate of ΔΔG_bind (D128A/D128'A). On the other hand, both single and double Ala mutations of the Q130/Q130' pair are probably destabilizing (i.e. ΔG^MUT_partner B > ΔG^WT_partner B), due to the loss of screening of the D128 and D128' charges. Nevertheless, the destabilization upon double mutation (Q130A/Q130'A) is expected to be more than twice the value of either of the two single mutations (Q130A or Q130'A), because the second Ala mutation is introduced in a single RNase III mutant that already has a charge imbalance. In other words, the interaction energy term between Q130 and Q130' is larger than zero, due to indirect coupling through the D128/D128' pair, and thus the approximation in equation 2 may not be fully accurate for the Q130/Q130' pair.

The ΔΔG_bind values of the hot spot candidates were calculated as a total ensemble average using the following equation:

\[
\Delta\Delta G_{\mathrm{bind}} = \sum_{j=1}^{3} \sum_{i=1}^{N_j} w_{ij}\,\Delta\Delta G_{\mathrm{bind},ij} \tag{3}
\]

where the index j runs over the three restrained dockings and, for each restrained docking, the index i runs over the N_j representative structures (six, five or six, respectively, see Figs. S7-S9).
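Equation 3 is a population-weighted mean over the representative structures of all three restrained dockings, and equation 2 is a simple sum. The standard-library Python sketch below illustrates both. It assumes that each weight is the number of poses in a cluster divided by the total number of poses in all clusters of all three dockings. The function names and all numerical values are hypothetical and are not results from this work.

```python
def homodimer_ddg(ddg_x, ddg_x_prime):
    """Equation 2: additive estimate for the mutant-mutant homodimer."""
    return ddg_x + ddg_x_prime


def ensemble_average(ddg, counts):
    """Equation 3: population-weighted average of per-structure values.

    ddg[j][i]    -- value for representative structure i of docking j
    counts[j][i] -- number of poses in the cluster represented by (j, i)
    """
    total = sum(sum(row) for row in counts)
    return sum(
        n / total * g
        for ddg_row, count_row in zip(ddg, counts)
        for g, n in zip(ddg_row, count_row)
    )


# Hypothetical per-structure values (kcal/mol) and cluster sizes for
# three dockings with 2, 1 and 1 representative structures.
example_ddg = [[1.0, 2.0], [0.5], [3.0]]
example_counts = [[3, 1], [4], [2]]
average = ensemble_average(example_ddg, example_counts)
```

Restricting the outer loop to a single docking, or the inner loop to the two most populated clusters, amounts to passing the corresponding sub-lists to `ensemble_average`.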
The prefactor w_ij is the population of the representative structure i in the restrained docking j, normalized to the total number of structures of all three restrained dockings (w_ij = n_ij / Σ_{j=1,3} Σ_{i=1,N_j} n_ij), and ΔΔG_bind,ij is the change in binding free energy upon Ala mutation of the residue considered in structure i in the restrained docking j. The resulting average values are shown in Fig. S12, Table SII, and Table I (see Results section in the main text). For RNase III the ΔΔG_bind values correspond to the mutant-mutant homodimer, and were estimated using equation 2. Analysis of each restrained docking separately (i.e. replacing the first summation by a constant j, either 1, 2 or 3) did not alter the ranking of the hot spot candidates, and yielded closely similar ΔΔG_bind values, probably because the different forms of RNase III considered here interact with YmdB in a similar fashion (see the main text). Moreover, an analysis of the Ala scanning results that considered only the two most populated clusters (i.e. limiting N_j to 2) did not significantly change the ΔΔG_bind values.

Figure S12. Calculated changes in binding energy upon Ala mutation (ΔΔG) of the YmdB-RNase III complex. Shown are the total ensemble averages of ΔΔG_bind calculated using equation 3. The ΔΔG values for RNase III correspond to the mutant-mutant homodimer, estimated using equation 2. The gray bars correspond to FoldX values, and the red bars correspond to Robetta values.

Solvent-accessible Surface Area (SASA). According to the anchor-latch model [74], the smaller partner of the protein-protein complex provides an “anchor” residue that sequesters the largest solvent-accessible surface area (SASA) upon binding.
The anchor residue therefore is predicted to be functionally important for protein-protein recognition, usually acting as a “hot spot.” The change in solvent-accessible surface area (ΔSASA) was calculated for the YmdB residues using the ANCHOR webserver [64].

Analysis of RNase III sequences. The Ec-RNase III sequence was used as query in a BLAST [56] analysis, using the translated open reading frames of the genomes of ten phylogenetically distinct bacterial species. The identified orthologous sequences were imported and aligned using Clustal Omega [57]. The obtained full-length multiple sequence alignment (MSA) is available upon request; the region discussed in the main text is shown in Fig. 3A. This initial MSA was then used as a seed to train a Hidden Markov Model (HMM), using HMMER 3.0 [22]. By scanning the HMM against the reference proteome 75 (rp75) sequence database, and confining the search to eubacteria and hits having E-values <0.01, 1,117 RNase III sequences were collected and aligned. The resulting HMM-based MSA was used to investigate the residue conservation, or frequency, in the polypeptide segment 120-140 (Ec-RNase III numbering) shown elsewhere to be important for YmdB binding [13]. Sequence logos [58] for this region were generated using WebLogo 3 [59] and are shown in Fig. S13. The 1,117 RNase III sequences were classified in two main groups, depending on the length of the loop between helices α6 and α7. Of these, 1,089 sequences (97.5%) have a “short” loop, whereas 28 (2.5%) have a “long” loop, due to an insertion. In sixteen of the “long loop” sequences (1.4%) the insertion consists of a single amino acid (before residue 128), while in the other twelve sequences (1.1%) two amino acids are introduced between residues 127 and 128.
The insertion in the α6-α7 loop observed in the sequence alignment is consistent with the structural alignment of the crystal structures of Aquifex aeolicus (Aa), Thermotoga maritima (Tm) and Mycobacterium tuberculosis (Mt) RNases III (PDB entries 2NUG,20 1O0W and 2A11,60 respectively), as well as homology models of Ec-RNase III (this work) and Streptomyces coelicolor (Sc) RNase III (obtained from ModBase61).

Figure S13. Sequence logo of the RNase III segment (“recognition pocket”) implicated in the interaction with YmdB (residues 120-140; Ec-RNase III numbering). Amino acids A, V, I, L, M, W, F and P are in black; T, S, Y, C and G in green; R, K and H in blue; D and E in red; and N and Q in purple. The letter height is proportional to the probability of finding the corresponding amino acid at that particular position. Top, sequence logo of 1,089 RNase III sequences with a “short” loop connecting helices α6 and α7. Bottom, sequence logo of 28 RNase III sequences with a “long” loop connecting helices α6 and α7.

Electrostatic surface potential of the RNase III recognition pocket. Continuum electrostatic calculations were performed for the apo form of RNase III using the Adaptive Poisson-Boltzmann Solver (APBS) program.73 The crystal structure of Tm-RNase III (1O0W) and the homology models of Ec-RNase III (this work) and Sc-RNase III (obtained from ModBase)61 were used in the calculations. The solvent radius was set to 1.4 Å, the dielectric constants of the protein and the solvent were set to 4.0 and 78.5, respectively, and the ionic strength was adjusted to 150 mM using NaCl. The obtained electrostatic surface potentials are displayed in Fig. S14.

Figure S14. Electrostatic surface potentials of apo RNase III structures. (A) Tm-RNase III, (B) Ec-RNase III, and (C) Sc-RNase III. The protein surface is colored by electrostatic potential, from red (-4 kT/e) to blue (+4 kT/e). The two recognition pockets for the R40 side chain of Ec-YmdB are circled in yellow.
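To put the ±4 kT/e color scale and the 150 mM ionic strength in physical units, the following minimal Python sketch converts kT/e to millivolts and evaluates the standard Debye screening length for a 1:1 electrolyte. The temperature (298.15 K) is an assumption, since it is not stated above; the solvent dielectric constant (78.5) and the ionic strength (0.150 M) are those given in the text. The function names are hypothetical. Under these assumptions the color scale spans roughly ±100 mV and the screening length is close to 8 Å.

```python
import math

# Physical constants in SI units (CODATA 2018).
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12    # vacuum permittivity, F/m


def thermal_voltage_mV(temperature_K):
    """Return kT/e in millivolts at the given temperature."""
    return 1e3 * K_B * temperature_K / E_CHARGE


def debye_length_angstrom(ionic_strength_M, eps_solvent, temperature_K):
    """Debye screening length (in angstrom) at a given ionic strength.

    Uses kappa^2 = 2 N_A e^2 I / (eps_0 eps_r k_B T), with I in mol/m^3.
    """
    ionic_strength_SI = ionic_strength_M * 1e3  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE**2 * ionic_strength_SI
                / (EPS_0 * eps_solvent * K_B * temperature_K))
    return 1e10 / math.sqrt(kappa_sq)  # m -> angstrom


T = 298.15  # ASSUMPTION: temperature in K, not specified in the text

kT_over_e_mV = thermal_voltage_mV(T)
color_scale_mV = 4.0 * kT_over_e_mV            # magnitude of the +/-4 kT/e limits
debye_A = debye_length_angstrom(0.150, 78.5, T)  # parameters from the text
```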
Analysis of YmdB ortholog sequences. The Ec-YmdB sequence was used as query in a BLAST56 analysis, using the translated open reading frames of the genomes of eleven other phylogenetically distinct bacterial species. The sequences were imported and aligned using Clustal Omega.57 The obtained full-length MSA is available upon request; the region discussed in the main text is shown in Fig. 3B.
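As an illustration of the population-weighted ensemble average used for the Ala scanning results (equation 3), the following minimal Python sketch assumes that the average has the form Σj Σi wij ΔΔGbind,ij, with the weights wij defined as cluster populations normalized over all restrained dockings, as described in the text. The function name, the nested-list data layout and the numerical values are hypothetical; they are not the data of this work.

```python
def ensemble_average_ddg(populations, ddg_values):
    """Population-weighted average of per-cluster ddG values.

    populations[j][i] : number of structures n_ij in cluster i of docking j
    ddg_values[j][i]  : ddG_bind of the representative structure i of docking j
    """
    total = sum(sum(n_j) for n_j in populations)
    if total <= 0:
        raise ValueError("total population must be positive")

    average = 0.0
    for n_j, ddg_j in zip(populations, ddg_values):
        if len(n_j) != len(ddg_j):
            raise ValueError("each docking needs one ddG value per cluster")
        for n_ij, ddg_ij in zip(n_j, ddg_j):
            w_ij = n_ij / total  # weight normalized over all dockings
            average += w_ij * ddg_ij
    return average


# Toy input: three restrained dockings with 2, 3 and 1 clusters (made-up values).
toy_populations = [[50, 30], [40, 40, 20], [60]]
toy_ddg = [[2.0, 1.0], [1.5, 0.5, 0.0], [3.0]]

avg_all = ensemble_average_ddg(toy_populations, toy_ddg)

# Analysis of a single docking: pass only that docking's clusters.
avg_first = ensemble_average_ddg(toy_populations[:1], toy_ddg[:1])
```

Passing a single docking, as in the last line, corresponds to fixing j to one value; truncating each inner list to its first two entries would correspond to limiting Nj to 2.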
