Supplementary Discussion

Our “ordered” data set includes protein sequences of various sizes and with various fractions of residues exposed to solvent at the surface, and these structural features may influence the local pairing propensities of residue pairs. To address this point, we calculated the LCoSs of amino acid pairs in protein subsets of different sizes (numbers of residues) or different surface-to-volume ratios (SVRs).

First, to examine the dependence of LCoS on protein size, we divided the ordered data set into four groups according to the number of residues, i.e. ~100, 101~200, 201~300 and 301~, and calculated the LCoSs (Supplementary Table 1). We did not observe any significant change in the LCoSs of the HH, PP or HP groups when the protein size was larger than 100 residues. On the other hand, the LCoSs of HP pairs calculated from the smallest size subset of ~100 residues were significantly decreased compared with those calculated from the whole data set (p<0.001 by Wilcoxon signed-rank test). We consider this difference to be an artifact resulting from a poor signal-to-noise ratio due to the small number of amino acid pairs in this subset, because the ratio between the observed and the expected occurrences of local HP pairs in the smallest size subset was unchanged from that in the whole data set (1.02, column 9 of Supplementary Table 1).

The “significant” decrease of LCoSs may result from the effect of noise, as described below. For example, suppose that we use separations of 1 and 2 for calculating the LCoS of a pair ab, that the expected frequencies for these two separations are 1 and 1, respectively, and that the observed frequencies for these two separations are also 1 and 1, respectively. Then the LCoS of ab is $(\log(1/1) + \log(1/1))/2 = 0$.
On the other hand, if the observed frequencies are changed to $1+\delta$ and $1-\delta$ ($0 < \delta < 1$), respectively, so that the sum of the observed frequencies is not changed, the LCoS of ab changes to $(\log((1+\delta)/1) + \log((1-\delta)/1))/2 = \log(1-\delta^2)/2 < 0$. Thus, noise in the observed frequencies at different separations can shift the LCoS values to negative even if the total observed frequency in local proximity remains constant.

In addition, to examine the effect of the fractional area exposed to solvent on local residue pairings, we also divided the data set according to the surface-to-volume ratio (SVR) of proteins, which is defined as the ratio between the accessible surface area (ASA) of a protein and the volume enclosed by the accessible surface. The SVRs of most proteins were distributed in the range of 0.2~0.4 Å⁻¹, and we divided the data set into four groups with SVRs of ~0.25 Å⁻¹, 0.25~0.3 Å⁻¹, 0.3~0.35 Å⁻¹ and 0.35~ Å⁻¹, respectively. The results are summarized by the mean and the standard deviation of the LCoSs for each of the HH, PP and HP groups, together with the number of proteins and the mean and standard deviation of the sizes of proteins in each subset (Supplementary Table 2). In general, as protein size increases, SVR decreases. Consistent with the result for the size-classified subsets, we did not observe any significant change in the LCoSs of the HH, PP or HP groups when the protein SVR was smaller than 0.35 Å⁻¹. On the other hand, the LCoSs of HP pairs calculated from the largest SVR subset of 0.35~ Å⁻¹ were significantly decreased compared with those calculated from the whole data set (p<0.001 by Wilcoxon signed-rank test). The ratio between the observed and expected occurrences of local HP pairs did not change between the largest SVR subset and the whole data set (column 9 of Supplementary Table 2). Thus, the decrease in the LCoSs of HP pairs can also be explained by the effect of large noise in this subset, as observed in the smallest size subset.
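
The noise argument used in this discussion can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that the LCoS of a pair is the mean of log(observed/expected) over the separations considered, as in the two-separation example, and that the Obs/Exp ratio is the total observed count divided by the total expected count. The function names `lcos` and `obs_exp_ratio`, the use of the natural logarithm, and the perturbation sizes are illustrative assumptions.

```python
import math


def lcos(observed, expected):
    """Mean of log(observed/expected) over separations (assumed LCoS form)."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("observed and expected must be non-empty and equal in length")
    return sum(math.log(o / e) for o, e in zip(observed, expected)) / len(observed)


def obs_exp_ratio(observed, expected):
    """Total observed over total expected occurrences (assumed Obs/Exp form)."""
    return sum(observed) / sum(expected)


# Expected frequencies at separations 1 and 2, as in the worked example.
expected = [1.0, 1.0]

# Without noise, the observed frequencies equal the expected ones.
print("no noise: LCoS =", lcos([1.0, 1.0], expected))

# Hypothetical perturbation sizes: shift one separation up and the other down
# by the same amount, so the total observed frequency stays the same.
for delta in (0.1, 0.3, 0.5):
    noisy = [1.0 + delta, 1.0 - delta]
    print(
        "delta =", delta,
        "Obs/Exp =", obs_exp_ratio(noisy, expected),
        "LCoS =", lcos(noisy, expected),
    )
```

Under these assumptions, the Obs/Exp ratio stays at 1 for every perturbation while the LCoS becomes increasingly negative as the perturbation grows, which is the pattern reported for the smallest size subset and the largest SVR subset.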
In summary, the frequencies of local HH, PP and HP pairs relative to their expected frequencies are almost constant in proteins with different sizes or SVRs. Given the current data set of structured proteins, the number of proteins in the subset of the smallest protein size or of the largest SVR is not sufficient to estimate LCoS values accurately.

Supplementary Tables

Supplementary Table 1. The effect of protein size on LCoS values. The first column shows the number of residues in each subset. The second and third columns indicate the number of proteins and the mean ± S.D. of the SVRs of the proteins in each of the data sets. The fourth through the sixth columns indicate the mean ± S.D. of the LCoSs of 81 HH pairs, of 121 PP pairs and of 198 HP pairs, respectively. The seventh through the ninth columns indicate the ratio between the observed and expected occurrences of the local pairs for HH, PP and HP pairs, respectively. Bold face indicates that the distribution of the LCoSs of the pairs in the column calculated from the subset of the row differs significantly from the distribution of the LCoSs of the same pairs calculated from the whole data set in the bottom row (p<0.001, by Wilcoxon signed-rank test).

                                LCoS                                         Obs/Exp
Size      #protein  SVR(Å⁻¹)    HH             PP             HP             HH    PP    HP
~100      1495      0.34±0.04   -0.042±0.058   -0.019±0.030   0.010±0.038    0.97  0.99  1.02
101~200   3253      0.30±0.03   -0.033±0.037   -0.019±0.026   0.019±0.032    0.97  0.98  1.02
201~300   1578      0.27±0.02   -0.032±0.036   -0.019±0.034   0.019±0.033    0.97  0.98  1.02
301~      1042      0.25±0.02   -0.031±0.039   -0.018±0.035   0.019±0.032    0.97  0.98  1.02
Total     7368      0.29±0.04   -0.033±0.030   -0.018±0.028   0.019±0.029    0.97  0.98  1.02

Supplementary Table 2. The effect of surface-to-volume ratio (SVR) on LCoS values. The first column shows SVR values in each subset. The third column indicates the mean ± S.D. of the protein sizes in each of the data sets.
The second column and the fourth through the ninth columns indicate the same quantities as the corresponding columns in Supplementary Table 1.

                                LCoS                                         Obs/Exp
SVR(Å⁻¹)    #protein  Size      HH             PP             HP             HH    PP    HP
~0.25       1035      358±148   -0.033±0.039   -0.018±0.037   0.018±0.033    0.97  0.98  1.02
0.25~0.30   3295      209±87    -0.031±0.034   -0.019±0.029   0.019±0.031    0.97  0.98  1.02
0.30~0.35   2359      118±44    -0.036±0.032   -0.020±0.025   0.019±0.035    0.97  0.98  1.02
0.35~       679       78±36     -0.044±0.080   -0.020±0.042   0.003±0.053    0.98  0.98  1.02
Total       7368      188±119   -0.033±0.030   -0.018±0.028   0.019±0.029    0.98  0.98  1.02

Supplementary Figure

Supplementary Figure 1. The effect of maximum separation for defining local pairs. The means and standard deviations of the LCoSs of HH (blue), PP (red) and HP (black) pairs are shown as functions of maximum separation.