Patil, Kinoshita and Nakamura
“Domain distribution and intrinsic disorder in hubs in the human protein-protein interaction
network”
Supporting Information
1. Scale-free property of the human protein-protein interaction network
Protein-protein interaction networks are known to have a scale-free topology, i.e. they have a few
hubs with a large number of interactions and many non-hubs with a small number of interactions.
As a result, they are characterized by a power-law degree distribution, P(k) ~ k^{-γ}, typically
with 2 < γ < 3 [1]. In this study, we defined hubs as proteins with 5 or more interactions and
non-hubs as proteins with 1 interaction. We did not include proteins with 2-4 interactions in the
non-hub category in order to minimize the effect of potential hubs in this group whose interactions
are not yet known. This resulted in 4312 hubs and 1929 non-hubs. Taken at face value, this
classification suggests that the interaction network is not scale-free, since non-hubs are expected
to outnumber hubs in a scale-free network. To confirm that the network studied had a scale-free
topology, we plotted the fraction of nodes P(k) having k links (Figure S1). The log-log plot followed
a power-law distribution, P(k) ~ k^{-γ}, with γ = 1.95. These results indicate that the
protein-protein interaction network in this study is scale-free. Thus, the larger number of hubs in
the analysis is a result of the criteria used to define hubs and non-hubs, and not of a lack of the
scale-free property of the network.
Figure S1. Log-log plot of the fraction of nodes P(k) with k connections
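The kind of check described above can be sketched as follows. This is a minimal illustration, not
the procedure used in the study: the text does not say how the exponent was fitted, so an ordinary
least-squares line through the points of the log-log plot is assumed here, and the function names
degree_distribution and fit_power_law_exponent are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: fraction of nodes with exactly k links}."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law_exponent(pk):
    """Estimate gamma in P(k) ~ k^(-gamma).

    Assumption: a plain least-squares line is fitted to
    (log10 k, log10 P(k)); gamma is the negated slope.
    """
    points = [(math.log10(k), math.log10(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Synthetic degrees whose counts fall off exactly as 1/k^2 (illustrative data only).
toy_degrees = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
toy_gamma = fit_power_law_exponent(degree_distribution(toy_degrees))
```

With real interaction data the high-degree tail is sparse and noisy, so a straight-line fit of this
kind gives only a rough estimate of γ.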
2. Analysis of hubs and non-hubs based on an alternative definition
In order to determine the robustness of our results to the selection criteria, we performed the analyses
using alternate definitions of hubs and non-hubs and compared the results. Figure S2 and Table SVI
show the characteristics of hubs and non-hubs using two alternative criteria. Hubs10+ denotes
proteins with 10 or more interactions (2395), and Non-hubs 1-4 includes proteins with 1-4
interactions (4625). Hubs5+ (same as Hubs in the main article) indicates proteins with 5 or more
interactions (4312) and Non-hubs 1 (same as Non-hubs in the main article) identifies proteins with 1
interaction (1929). Numbers in brackets are the number of proteins in each group.
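The two sets of criteria differ only in their degree cut-offs, which can be sketched as below. The
function classify_by_degree and its parameter names are hypothetical, and the input is assumed to be
a mapping from protein identifier to its number of interactions.

```python
def classify_by_degree(degree_by_protein, hub_min, nonhub_max):
    """Split proteins into hubs (degree >= hub_min) and non-hubs (1 <= degree <= nonhub_max).

    Proteins whose degree falls between the two cut-offs are left unclassified.
    """
    hubs = {p for p, k in degree_by_protein.items() if k >= hub_min}
    non_hubs = {p for p, k in degree_by_protein.items() if 1 <= k <= nonhub_max}
    return hubs, non_hubs


# Toy degrees, for illustration only.
toy = {"P1": 1, "P2": 3, "P3": 5, "P4": 12, "P5": 4}

# Main-article definition: Hubs 5+ and Non-hubs 1.
hubs_5, non_hubs_1 = classify_by_degree(toy, hub_min=5, nonhub_max=1)

# Alternative definition: Hubs 10+ and Non-hubs 1-4.
hubs_10, non_hubs_1_4 = classify_by_degree(toy, hub_min=10, nonhub_max=4)
```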
In all cases, hubs were more likely to have multi-domain architectures, repeat domains and domains
with overlapping disordered regions than non-hubs. All differences between hubs and non-hubs were
statistically significant (p << 0.001).
Figure S2. Percent of proteins with the characteristics listed on the x-axis: 1) Total Multidomain - %
proteins with 2 or more Pfam domains, 2) Repeating domains - % proteins with at least one repeating
domain, 3) Disordered domains - % proteins with a predicted disordered region overlapping at least
one Pfam domain, 4) Distinct Multidomain - % proteins with 2 or more unique Pfam domains
(excluding domain duplicates), 5) Ordered Multidomain - % proteins with 2 or more ordered Pfam
domains (excluding those with overlapping disordered regions). All differences in the prevalence of
the characteristic between the two types of hubs and non-hubs were statistically significant (p <
0.001).
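The five characteristics in the caption can be expressed as per-protein flags, as in the sketch
below. Several details are assumptions made for illustration and are not stated in the text: a
"repeating domain" is taken to be a Pfam family that occurs more than once in the protein, any
residue overlap between a domain and a predicted disordered region counts as overlapping, and the
names Domain, overlaps and domain_characteristics are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    pfam_id: str  # Pfam family identifier
    start: int    # first residue of the domain (inclusive)
    end: int      # last residue of the domain (inclusive)


def overlaps(domain, region):
    """True if the domain shares at least one residue with a (start, end) region."""
    region_start, region_end = region
    return domain.start <= region_end and region_start <= domain.end


def domain_characteristics(domains, disordered_regions):
    """Return the five Figure S2 flags for one protein."""
    ids = [d.pfam_id for d in domains]
    ordered = [d for d in domains
               if not any(overlaps(d, r) for r in disordered_regions)]
    return {
        "total_multidomain": len(domains) >= 2,
        "repeating_domains": len(set(ids)) < len(ids),       # assumption: same family twice
        "disordered_domains": len(ordered) < len(domains),   # some domain overlaps disorder
        "distinct_multidomain": len(set(ids)) >= 2,
        "ordered_multidomain": len(ordered) >= 2,
    }


# Toy protein, for illustration only.
example = domain_characteristics(
    [Domain("PF00001", 10, 100), Domain("PF00001", 120, 200), Domain("PF00002", 250, 300)],
    disordered_regions=[(190, 220)],
)
```

The percentage plotted for each group is then the share of proteins in that group for which the
corresponding flag is true.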
Property                Hubs 10+    Hubs 5+    Non-hubs 1-4    Non-hubs 1
Total Multidomain         58.7        54.5         45.4           43.9
Repeating domains         27.1        25.8         22.8           22.6
Disordered domains        26.4        24.5         17.9           16.6
Distinct Multidomain      50.4        45.5         35.8           34.3
Ordered Multidomain       50.2        46.8         40.5           39.6
Table SVI. Percentage of proteins with each characteristic listed in the left hand column and shown
in Figure S2 above.
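The text does not state which statistical test produced the reported p-values. As one possible way
to compare two percentages of this kind, the sketch below uses a pooled two-proportion z-test; the
choice of test and the function name two_proportion_z_test are assumptions. The example counts are
approximations reconstructed from the rounded Total Multidomain percentages and the group sizes
given above, not values taken from the study.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Approximate counts: 54.5% of 4312 Hubs 5+ and 43.9% of 1929 Non-hubs 1.
hub_count = round(0.545 * 4312)
non_hub_count = round(0.439 * 1929)
z, p = two_proportion_z_test(hub_count, 4312, non_hub_count, 1929)
```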
We also compared the average fraction of disordered residues and the average length of hubs and
non-hubs in the four groups (Table SVII below). Hubs had higher levels of intrinsic disorder and
were longer than non-hubs under both definitions. All differences between hubs and non-hubs
were statistically significant (p << 0.001).
                 % Disorder        Length
Hubs 10+         20.52 ±0.86       692.01 ±25.45
Hubs 5+          19.74 ±0.64       660.75 ±18.26
Non-hubs 1-4     15.44 ±0.58       576.18 ±14.87
Non-hubs 1       13.82 ±0.87       571.79 ±23.85
Table SVII. Average percentage of disordered residues and average length in different types of hubs
and non-hubs
We also found an inverse correlation between the number of ordered domains in the hubs with 10 or
more interactions and their fraction of disordered residues (r = -0.1814, ρ = -0.2146,
p << 0.001). Hubs 10+ also showed a positive correlation between their number of distinct domains
and the number of interactions (r = 0.1059, ρ = 0.1095, p << 0.001). Here, r denotes Pearson’s
correlation coefficient, ρ denotes Spearman’s rank-order correlation coefficient and p gives the
statistical significance of both.
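For readers unfamiliar with the two coefficients, the sketch below computes both from paired
observations. It is a generic illustration with hypothetical function names: Spearman’s ρ is
obtained as Pearson’s r applied to ranks, with tied values given their average rank. The p-values
reported here are not reproduced by this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Because ρ depends only on rank order, it captures monotonic relationships that are not linear,
which is why it is a useful complement to r for count data such as domain and interaction numbers.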
Thus, we conclude that our results are robust and do not show any bias as a result of using any
particular criteria to identify hubs and non-hubs.
3. Analysis of hubs and non-hubs obtained by excluding derived interactions
To see whether the interactions derived from protein complexes bias our results, we repeated the
analysis using hubs and non-hubs identified from an interaction network built entirely from
28718 observed binary interactions. Figure S3 shows the characteristics of these hubs and
non-hubs. Hubs are proteins with 5 or more interactions (3049) and Non-hubs are
proteins with 1 interaction (2039). Numbers in brackets are the number of proteins in each group.
Hubs were more likely to have multi-domain architectures, repeat domains and domains
with overlapping disordered regions than non-hubs (p << 0.001).
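Counting interactions per protein from a list of binary interactions can be sketched as follows.
This is a minimal illustration: the function name is hypothetical, and ignoring self-interactions
and counting each partner only once are assumptions, since the text does not describe how such
cases were handled.

```python
from collections import defaultdict


def degrees_from_binary_interactions(pairs):
    """Return {protein: number of distinct interaction partners}."""
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are not counted
        partners[a].add(b)  # sets collapse duplicate or reversed pairs
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


# Toy interaction list, for illustration only.
toy_degrees = degrees_from_binary_interactions(
    [("A", "B"), ("B", "A"), ("A", "C"), ("D", "D")]
)
```

Proteins with 5 or more partners in the resulting mapping would then be labelled hubs, and those
with exactly 1 partner non-hubs.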
Figure S3. Percent of proteins with the characteristics listed on the x-axis: 1) Total Multidomain - %
proteins with 2 or more Pfam domains, 2) Repeating domains - % proteins with at least one repeating
domain, 3) Disordered domains - % proteins with a predicted disordered region overlapping at least
one Pfam domain, 4) Distinct Multidomain - % proteins with 2 or more unique Pfam domains
(excluding domain duplicates), 5) Ordered Multidomain - % proteins with 2 or more ordered Pfam
domains (excluding those with overlapping disordered regions). All differences in the prevalence of
the characteristic between hubs and non-hubs were statistically significant (p << 0.001).
We also compared the average fraction of disordered residues and the average length of hubs and
non-hubs (Table SVIII below). Hubs had higher levels of intrinsic disorder and were longer than
non-hubs. Both differences between hubs and non-hubs were statistically significant
(p << 0.001).
             % Disorder        Length
Hubs         20.90 ±0.77       677.44 ±22.10
Non-hubs     14.50 ±0.87       568.95 ±21.29
Table SVIII. Average percentage of disordered residues and average length in hubs and non-hubs
We also found an inverse correlation between the number of ordered domains in the
hubs and their fraction of disordered residues (r = -0.1888, ρ = -0.2399, p << 0.001). Hubs also
showed a positive correlation between their number of distinct domains and the number of
interactions (r = 0.1230, ρ = 0.1273, p << 0.001). Here, r denotes Pearson’s correlation
coefficient, ρ denotes Spearman’s rank-order correlation coefficient and p gives the statistical
significance of both.
Thus, we conclude that the inclusion of derived binary interactions from protein complexes does not
bias the results.
4. Correlation using Spearman’s rank correlation coefficient
To provide further support for the reported correlations, Spearman’s rank correlation coefficient
(SRCC) was calculated, in addition to Pearson’s correlation coefficient (PCC), for the following
two cases:
Datasets in hubs                       PCC (r)                    SRCC (ρ)
Number of distinct domains vs.         0.1239 (p = 2.22e-16)      0.1348 (p < 2.2e-16)
Number of interactions
Number of ordered domains vs.          -0.18304 (p < 2.2e-16)     -0.2341 (p < 2.2e-16)
Fraction of disordered residues
Significance values (p) are indicated in brackets.
References
1. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization.
Nat Rev Genet 5:101-113.