pro2724-sup-0001-suppinfo01

Supporting Information Figure 1. Sequence-, structure-, and signature-based networks for the enolase superfamily. Networks for the enolase superfamily were created using three different edge metrics: the sequence-, structure-, and signature-based networks used pairwise BLAST, TM-Align, and ASP scores, respectively (see Methods). From left to right, the edge threshold increases, removing the weakest relationships and producing progressively smaller clusters of more closely related proteins. A color key indicating SFLD subgroup and family annotation is shown on the right. Edge thresholds were chosen to show how the clusters develop; the series ends at the threshold at which one or more subgroups had fragmented into mostly singlets/doublets. Stars above the networks indicate the edge threshold with the highest count of SFLD subgroups identified distinctly, with all members in the same cluster (the number of subgroups is given in parentheses next to the stars). Arrows and boxes in the networks correspond to annotations on figures in the main text.

Supporting Information Figure 2. Sequence-, structure-, and signature-based networks for the peroxiredoxin superfamily. Networks for the peroxiredoxin superfamily were created using three different edge metrics: the sequence-, structure-, and signature-based networks used pairwise BLAST, TM-Align, and ASP scores, respectively (see Methods). From left to right, the edge threshold increases, removing the weakest relationships and producing progressively smaller clusters of more closely related proteins. A color key indicating SFLD subgroup annotation is shown on the right. Edge thresholds were chosen to show how the clusters develop; the series ends at the threshold at which one or more subgroups had fragmented into mostly singlets/doublets.
Stars above the networks indicate the edge threshold with the highest count of SFLD subgroups identified distinctly, with all members in the same cluster (the number of subgroups is given in parentheses next to the stars). Circles in the networks correspond to annotations on figures in the main text.

Supporting Information Figure 3. Sequence-, structure-, and signature-based networks for the glutathione transferase superfamily. Networks for the glutathione transferase superfamily were created using three different edge metrics: the sequence-, structure-, and signature-based networks used pairwise BLAST, TM-Align, and ASP scores, respectively (see Methods). From left to right, the edge threshold increases, removing the weakest relationships and producing progressively smaller clusters of more closely related proteins. A color key indicating SFLD subgroup annotation is shown on the right. Edge thresholds were chosen to show how the clusters develop; the series ends at the threshold at which one or more subgroups had fragmented into mostly singlets/doublets. Stars above the networks indicate the edge threshold with the highest count of SFLD subgroups identified distinctly, with all members in the same cluster (the number of subgroups is given in parentheses next to the stars). Circles in the networks correspond to annotations on figures in the main text.

Supporting Information Figure 4. Sequence-, structure-, and signature-based networks for the crotonase superfamily. Networks for the crotonase superfamily were created using three different edge metrics: the sequence-, structure-, and signature-based networks used pairwise BLAST, TM-Align, and ASP scores, respectively (see Methods). From left to right, the edge threshold increases, removing the weakest relationships and producing progressively smaller clusters of more closely related proteins. A color key indicating SFLD subgroup and family annotation is shown on the right.
Edge thresholds were chosen to show how the clusters develop; the series ends at the threshold at which one or more families had fragmented into mostly singlets/doublets. Stars above the networks indicate the edge threshold with the highest count of SFLD families identified distinctly, with all members in the same cluster (the number of families is given in parentheses next to the stars). Circles in the networks correspond to annotations on figures in the main text.

Supporting Information Figure 5. Signature logos for the Prx and crotonase superfamilies. A. Signature logos were created for the entire peroxiredoxin (Prx) superfamily (top) and for the four largest clusters at the 0.35 filter threshold in the signature-based network. B. Signature logos were created for the three largest crotonase clusters at the 0.25 filter threshold in the signature-based network. The number of proteins in each cluster and the cluster's dominant subgroup are shown above each cluster.

Supporting Information Figure 6. Key residues are identified using structural overlays with a representative protein. A. 1UIY (blue) and 2Q2X (purple) are structurally aligned with a representative protein, 1DUB (grey), whose key residues have been experimentally identified. B. The key residues defined for the representative protein (black) are used as guides to identify the structurally analogous residues in the other proteins (dark blue and dark purple).

Supporting Information Figure 7. Pairwise score distributions for the ASP, BLAST, and TM-Align enolase networks demonstrate why multiple clusters are identified in the sequence-based network at the “no filter” threshold. The count of enolase pairwise scores is shown for each bin of width 0.05 between 0 and 1. Blue, green, and black bars represent scores from the ASP, TM-Align, and BLAST scoring metrics, respectively. BLAST scores are shown on a log scale in the inset, with a bin size of 1E-5.
Although the pairwise edge scores for all three metrics lie in similar ranges, their distributions differ among the three networks. The ASP scores are left-shifted and the TM-Align scores right-shifted, and the two are centered differently, with means of 0.22 and 0.78, respectively. BLAST scores, on the other hand, exhibit a bimodal distribution and fall mostly in the first bin (0–0.05) and the last bin (>1). The median BLAST score is 3E-7, while the mean is 42 because of skewing by the large values in the >1 bin (the maximum is 4765). As a result of this score distribution, the edges >1 are removed during MCL clustering, so the network at the no filter threshold contains multiple distinct groups. Conversely, the ASP and TM-Align networks each show one large group at no filter because no edge score differs drastically from the median. The BLAST scores that should define relevant protein clusters are those between 0 and 0.05; the distribution of these scores is skewed left (inset).
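The binning and the mean-versus-median contrast described for Supporting Information Figure 7 can be reproduced in a few lines. The sketch below is only an illustration: the function names are hypothetical, the E-values are invented toy numbers chosen to mimic a bimodal BLAST-like distribution (they are not data from this study), and scores above 1 are collected in a single overflow bin as in the figure.

```python
import statistics


def bin_scores(scores, width=0.05, upper=1.0):
    """Count scores in bins of `width` on [0, upper]; scores > upper go to one overflow bin.

    Hypothetical helper. Values exactly on a bin edge are subject to float rounding.
    """
    n_bins = round(upper / width)          # 20 regular bins with the defaults
    counts = [0] * (n_bins + 1)            # final slot is the ">upper" overflow bin
    for s in scores:
        if s > upper:
            counts[-1] += 1
        else:
            # clamp so that a score equal to `upper` lands in the last regular bin
            counts[min(int(s / width), n_bins - 1)] += 1
    return counts


def mean_and_median(scores):
    """Return (mean, median); a few huge values pull the mean far above the median."""
    return statistics.mean(scores), statistics.median(scores)


# Invented toy E-values: most near 0, a few far above 1 (bimodal, like the BLAST bars).
toy_evalues = [1e-30, 2e-12, 3e-7, 4e-5, 0.02, 5.0, 4765.0]
counts = bin_scores(toy_evalues)
mean, median = mean_and_median(toy_evalues)
print(counts[0], counts[-1])               # 5 2
print(f"mean={mean:.1f} median={median:g}")  # mean=681.4 median=4e-05
```

With only two overflow values the mean is already several orders of magnitude above the median, which is the same effect the caption reports for the enolase BLAST scores (mean 42 versus median 3E-7).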
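The star marking in Supporting Information Figures 1–4 amounts to scanning a series of edge thresholds and, at each one, counting the annotated groups whose members all fall in a single cluster. The sketch below is a simplified illustration under stated assumptions, not the authors' pipeline: it uses connected components (union-find) as a stand-in for MCL clustering, assumes similarity scores where higher means stronger (as for TM-Align and ASP, not BLAST E-values), assumes every node carries exactly one subgroup label, and interprets "identified distinctly" as the cluster containing no other subgroup. All function names and the toy network are hypothetical.

```python
from collections import defaultdict


def filter_edges(edges, threshold):
    """Keep (a, b) pairs whose similarity score is >= threshold (higher = stronger)."""
    return [(a, b) for a, b, score in edges if score >= threshold]


def connected_components(nodes, edges):
    """Union-find clustering; a simplified stand-in for MCL, which can split components."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    clusters = defaultdict(set)
    for n in nodes:
        clusters[find(n)].add(n)
    return list(clusters.values())


def count_distinct_subgroups(clusters, labels):
    """Count subgroups whose members all share one cluster that holds no other subgroup."""
    count = 0
    for subgroup in set(labels.values()):
        members = {n for n, s in labels.items() if s == subgroup}
        home = [c for c in clusters if c & members]
        if len(home) == 1 and home[0] == members:
            count += 1
    return count


def best_threshold(nodes, edges, labels, thresholds):
    """Return (threshold, count) with the highest distinct-subgroup count (lowest on ties)."""
    scores = {}
    for t in thresholds:
        clusters = connected_components(nodes, filter_edges(edges, t))
        scores[t] = count_distinct_subgroups(clusters, labels)
    best = max(thresholds, key=lambda t: scores[t])
    return best, scores[best]


# Hypothetical toy network: subgroups A, B, C joined by weaker cross-subgroup edges.
nodes = ["a1", "a2", "b1", "b2", "c1"]
edges = [("a1", "a2", 0.9), ("b1", "b2", 0.8), ("a2", "b1", 0.4), ("b2", "c1", 0.3)]
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
print(best_threshold(nodes, edges, labels, [0.0, 0.35, 0.5, 0.85]))  # (0.5, 3)
```

In the toy run, the lowest threshold merges everything into one cluster (count 0), while the highest splits subgroup B into singlets (count 2), so the intermediate threshold 0.5 earns the "star", mirroring how the count peaks before subgroups fragment.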
