pro2255-sup-0001-suppinfo

Supplementary Discussion
Our “ordered” data set includes protein sequences of various sizes and with various
fractions of residues exposed to solvent at the surface, and these structural features may
influence the local pairing propensities of residue pairs. To address this point, we
calculated the LCoSs of amino acid pairs in protein subsets of different sizes (numbers of
residues) or different surface-to-volume ratios (SVRs).
First, to examine the dependence of LCoS on protein size, we divided the
ordered data set into four groups according to the number of residues (100 or fewer,
101~200, 201~300, and 301 or more) and calculated the LCoSs (Supplementary Table 1).
We did not observe any significant change in the LCoSs of the HH, PP or HP
groups in the subsets of proteins larger than 100 residues. On the other hand, the LCoSs
of HP pairs calculated from the smallest size subset (100 residues or fewer) were
significantly lower than those calculated from the whole data set (p<0.001 by
Wilcoxon signed-rank test). We consider this difference to be an artifact of the
poor signal-to-noise ratio caused by the small number of amino acid pairs in this subset,
because the ratio between the observed and expected occurrences of local HP pairs
in the smallest size subset was unchanged from that in the whole data set (1.02,
column 9 of Supplementary Table 1). The “significant” decrease in LCoSs may result
from the effect of noise, as described below.
For example, when we suppose that we use separations of 1 and 2 for
calculating the LCoS of a pair $ab$, that the expected frequencies for these two separations
are 1 and 1, respectively, and that the observed frequencies for these two separations are
also 1 and 1, respectively, then the LCoS of $ab$ is $(\log(1/1) + \log(1/1))/2 = 0$. On
the other hand, if the observed frequencies are changed to $1+\varepsilon$ and $1-\varepsilon$,
respectively, though the sum of the observed frequencies is not changed, the LCoS of
$ab$ changes to $(\log((1+\varepsilon)/1) + \log((1-\varepsilon)/1))/2 = \log(1-\varepsilon^{2})/2 < 0$.
Thus, noise in the observed frequencies at different separations can shift LCoS values
toward negative values even if the total observed frequency in local proximity remains
constant.
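The arithmetic of this example can be checked numerically. The following is a minimal sketch, assuming that the LCoS over a set of separations is the mean of log(observed/expected), as in the example above, and that the logarithm is natural (the sign of the result does not depend on the base). The function names and the noise amplitude `eps` are illustrative choices and are not taken from the original analysis.

```python
import math


def lcos(observed, expected):
    """Mean of log(observed/expected) over the separations considered.

    Assumption: natural logarithm; the source only writes "log".
    """
    return sum(math.log(o / e) for o, e in zip(observed, expected)) / len(observed)


def obs_exp_ratio(observed, expected):
    """Ratio between total observed and total expected occurrences."""
    return sum(observed) / sum(expected)


expected = [1.0, 1.0]          # expected frequencies at separations 1 and 2
eps = 0.3                      # hypothetical noise amplitude, 0 < eps < 1

flat = [1.0, 1.0]              # observed frequencies without noise
noisy = [1.0 + eps, 1.0 - eps]  # same total, redistributed between separations

print(lcos(flat, expected), obs_exp_ratio(flat, expected))
print(lcos(noisy, expected), obs_exp_ratio(noisy, expected))
```

In this sketch the observed/expected ratio stays at 1 for both inputs, while the LCoS of the noisy input equals log(1 - eps^2)/2 and is therefore negative.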
In addition, to examine the effect of the fractional area exposed to solvent on local
residue pairings, we also divided the data set according to the surface-to-volume ratio
(SVR) of proteins, defined as the ratio between the accessible surface area
(ASA) of a protein and the volume enclosed by the accessible surface. The SVRs of most
proteins were distributed in the range of 0.2~0.4 Å⁻¹, and we divided the data set into four
groups with SVRs of ~0.25 Å⁻¹, 0.25~0.30 Å⁻¹, 0.30~0.35 Å⁻¹ and 0.35~ Å⁻¹, respectively. The
results are summarized by the mean and the standard deviation of the LCoSs for each of the
HH, PP and HP groups, together with the number of proteins and the mean and standard
deviation of the sizes of proteins in each subset (Supplementary Table 2).
In general, SVR decreases as protein size increases. Consistent with the results for the
size-classified subsets, we did not observe any significant change in the LCoSs of the HH,
PP or HP groups in the subsets with SVRs smaller than 0.35 Å⁻¹. On the other hand, the LCoSs
of HP pairs calculated from the largest SVR subset (0.35 Å⁻¹ or larger) were significantly
lower than those calculated from the whole data set (p<0.001 by Wilcoxon
signed-rank test). The ratio between the observed and expected occurrences of local HP
pairs did not differ between the largest SVR subset and the whole data set (column 9
of Supplementary Table 2). Thus, the decrease in the LCoS of HP pairs can also be
explained by the effect of large noise in this subset, as observed in the smallest size
subset.
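The SVR-based grouping described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the ASA and volume are assumed to be given in Å² and Å³, and the assignment of values lying exactly on a boundary to the lower group is an assumption, since the text does not state how boundary values were handled.

```python
from bisect import bisect_left

# Group boundaries in Å⁻¹, taken from the text.
SVR_EDGES = [0.25, 0.30, 0.35]
SVR_LABELS = ["~0.25", "0.25~0.30", "0.30~0.35", "0.35~"]


def surface_to_volume_ratio(asa, volume):
    """SVR = accessible surface area / volume enclosed by the accessible surface."""
    return asa / volume


def svr_group(svr):
    """Return the label of the SVR group that contains `svr`.

    Assumption: a value exactly equal to a boundary goes to the lower group.
    """
    return SVR_LABELS[bisect_left(SVR_EDGES, svr)]


# Hypothetical protein with ASA = 6000 Å² and enclosed volume = 20000 Å³.
svr = surface_to_volume_ratio(6000.0, 20000.0)
print(svr, svr_group(svr))
```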
In summary, the frequencies of local HH, PP and HP pairs relative to their
expected frequencies are almost constant in proteins with different sizes or SVRs.
Given the current data set of structured proteins, the number of proteins in the subset of
the smallest protein size or of the largest SVR is not sufficient to estimate LCoS values
accurately.
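The paired comparison used above (LCoSs of the same residue pairs in a subset versus the whole data set, tested with the Wilcoxon signed-rank test) can be sketched in plain Python. This is a simplified illustration and not the procedure actually used: it applies the normal approximation without tie or continuity corrections, and the LCoS values in the usage example are hypothetical numbers, not data from the tables. In practice an established statistics library would be preferable.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, two-sided p) for paired samples x and y.

    Simplifications (assumptions of this sketch): zero differences are
    dropped, tied absolute differences get their average rank, and the
    p-value uses the normal approximation without further corrections.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p


# Hypothetical LCoSs of five residue pairs in a subset and in the whole set.
subset_lcos = [0.010, -0.020, 0.005, 0.030, -0.015]
whole_lcos = [0.019, -0.010, 0.012, 0.041, -0.001]
w_plus, w_minus, p = wilcoxon_signed_rank(subset_lcos, whole_lcos)
print(w_plus, w_minus, p)
```

With only five pairs the normal approximation is crude; the test in the text is applied to 81, 121 or 198 pairs per group, where the approximation is more reasonable.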
Supplementary Tables
Supplementary Table 1. The effect of protein size on LCoS values. The first column shows the
number of residues in each subset. The second and third columns indicate the number of proteins
and the mean ± S.D. of the SVRs of the proteins in each of the data sets. The fourth through the
sixth columns indicate the mean ± S.D. of the LCoSs of 81 HH pairs, of 121 PP pairs and of 198 HP
pairs, respectively. The seventh through the ninth columns indicate the ratio between the observed
and expected occurrences of the local pairs for HH, PP and HP pairs, respectively.
Bold face indicates that the distribution of the LCoSs of the pairs in that column, calculated
from the subset in that row, differs significantly from the distribution of the LCoSs of the
same pairs calculated from the whole data set in the bottom row (p<0.001, by Wilcoxon
signed-rank test).
                               LCoS                                      Obs/Exp
Size      #protein  SVR(Å⁻¹)   HH            PP            HP            HH    PP    HP
~100      1495      0.34±0.04  -0.042±0.058  -0.019±0.030  0.010±0.038   0.97  0.99  1.02
101~200   3253      0.30±0.03  -0.033±0.037  -0.019±0.026  0.019±0.032   0.97  0.98  1.02
201~300   1578      0.27±0.02  -0.032±0.036  -0.019±0.034  0.019±0.033   0.97  0.98  1.02
301~      1042      0.25±0.02  -0.031±0.039  -0.018±0.035  0.019±0.032   0.97  0.98  1.02
Total     7368      0.29±0.04  -0.033±0.030  -0.018±0.028  0.019±0.029   0.97  0.98  1.02
Supplementary Table 2. The effect of surface-to-volume ratio (SVR) on LCoS values. The first
column shows SVR values in each subset. The third column indicates the mean ± S.D. of the
protein sizes in each of the data sets. The second column and the fourth through the ninth columns
indicate the same quantities as the corresponding columns in Supplementary Table 1.
                               LCoS                                      Obs/Exp
SVR(Å⁻¹)   #protein  Size      HH            PP            HP            HH    PP    HP
~0.25      1035      358±148   -0.033±0.039  -0.018±0.037  0.018±0.033   0.97  0.98  1.02
0.25~0.30  3295      209±87    -0.031±0.034  -0.019±0.029  0.019±0.031   0.97  0.98  1.02
0.30~0.35  2359      118±44    -0.036±0.032  -0.020±0.025  0.019±0.035   0.97  0.98  1.02
0.35~      679       78±36     -0.044±0.080  -0.020±0.042  0.003±0.053   0.98  0.98  1.02
Total      7368      188±119   -0.033±0.030  -0.018±0.028  0.019±0.029   0.98  0.98  1.02
Supplementary Figure
Supplementary Figure 1. The effect of maximum separation for defining local pairs.
The means
and standard deviations of the LCoSs of HH (blue), PP (red) and HP (black) pairs are shown as
functions of maximum separation.