Patil, Kinoshita and Nakamura, “Domain distribution and intrinsic disorder in hubs in the human protein-protein interaction network”

Supporting Information

1. Scale-free property of the human protein-protein interaction network

Protein-protein interaction networks are known to have a scale-free topology, i.e. they have a few hubs with a large number of interactions and many non-hubs with a small number of interactions. As a result, they are characterized by a power-law degree distribution, P(k) ~ k^(-γ), typically with 2 < γ < 3 [1]. In this study, we defined hubs as proteins with 5 or more interactions and non-hubs as proteins with 1 interaction. We excluded proteins with 2-4 interactions from the non-hub category in order to minimize the effect of potential hubs in this group whose interactions are not yet all known. This resulted in 4312 hubs and 1929 non-hubs.

Taken at face value, these counts suggest that the interaction network lacks the scale-free property, since non-hubs are expected to outnumber hubs in a scale-free network. To confirm that the network studied had a scale-free topology, we plotted the fraction of nodes having k links (Figure S1). The log-log plot followed a power-law distribution, P(k) ~ k^(-γ), with γ = 1.95. These results indicate that the protein-protein interaction network in this study is scale-free. Thus, the larger number of hubs in the analysis is a result of the criteria used to define hubs and non-hubs, not of a lack of the scale-free property in the network.

Figure S1. Log-log plot of the fraction of nodes P(k) with k connections

2. Analysis of hubs and non-hubs based on an alternative definition

In order to determine the robustness of our results to the selection criteria, we repeated the analyses using alternative definitions of hubs and non-hubs and compared the results. Figure S2 and Table SVI show the characteristics of hubs and non-hubs using two alternative criteria.
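The degree-based definitions and the log-log fit described above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the input is assumed to be a list of undirected protein pairs, and γ is estimated by ordinary least squares on log P(k) against log k, which is an assumption because the text does not state how the fit was made.

```python
import math
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per protein in an undirected edge list."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions are not counted
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(p) for protein, p in partners.items()}


def degree_distribution(degrees):
    """Return {k: P(k)}, the fraction of nodes that have exactly k links."""
    counts = Counter(degrees.values())
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_gamma(p_k):
    """Estimate gamma in P(k) ~ k^(-gamma) as the negated least-squares slope
    of log10 P(k) against log10 k. Needs at least two distinct k values."""
    xs = [math.log10(k) for k in p_k]
    ys = [math.log10(p) for p in p_k.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


def classify(degrees, hub_min=5, nonhub_max=1):
    """Split proteins into hubs (degree >= hub_min) and non-hubs (degree <= nonhub_max).
    Proteins between the two thresholds are left out of both groups."""
    hubs = {p for p, k in degrees.items() if k >= hub_min}
    non_hubs = {p for p, k in degrees.items() if k <= nonhub_max}
    return hubs, non_hubs
```

With the defaults, `classify` reproduces the main definition (5 or more interactions versus exactly 1). Calling `classify(degrees, hub_min=10, nonhub_max=4)` gives the alternative grouping discussed in this section.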
Hubs 10+ denotes proteins with 10 or more interactions (2395), and Non-hubs 1-4 denotes proteins with 1-4 interactions (4625). Hubs 5+ (the same as Hubs in the main article) denotes proteins with 5 or more interactions (4312), and Non-hubs 1 (the same as Non-hubs in the main article) denotes proteins with 1 interaction (1929). Numbers in brackets are the number of proteins in each group. In all cases, hubs were more likely than non-hubs to have multi-domain architectures, repeat domains and domains with overlapping disordered regions. All differences between hubs and non-hubs were statistically significant (p << 0.001).

Figure S2. Percent of proteins with the characteristics listed on the x-axis: 1) Total Multidomain: % of proteins with 2 or more Pfam domains; 2) Repeating domains: % of proteins with at least one repeating domain; 3) Disordered domains: % of proteins with a predicted disordered region overlapping at least one Pfam domain; 4) Distinct Multidomain: % of proteins with 2 or more unique Pfam domains (excluding domain duplicates); 5) Ordered Multidomain: % of proteins with 2 or more ordered Pfam domains (excluding those with overlapping disordered regions). All differences in the prevalence of each characteristic between the two types of hubs and non-hubs were statistically significant (p < 0.001).

Property               Hubs 10+   Hubs 5+   Non-hubs 1-4   Non-hubs 1
Total Multidomain      58.7       54.5      45.4           43.9
Repeating domains      27.1       25.8      22.8           22.6
Disordered domains     26.4       24.5      17.9           16.6
Distinct Multidomain   50.4       45.5      35.8           34.3
Ordered Multidomain    50.2       46.8      40.5           39.6

Table SVI. Percentage of proteins with each characteristic listed in the left-hand column and shown in Figure S2 above.

We also compared the average fraction of disordered residues and the average length of hubs and non-hubs in the four groups (Table SVII below). Hubs had higher levels of intrinsic disorder and were longer than non-hubs under both definitions.
All differences between hubs and non-hubs were statistically significant (p << 0.001).

Group          % Disorder     Length
Hubs 10+       20.52 ±0.86    692.01 ±25.45
Hubs 5+        19.74 ±0.64    660.75 ±18.26
Non-hubs 1-4   15.44 ±0.58    576.18 ±14.87
Non-hubs 1     13.82 ±0.87    571.79 ±23.85

Table SVII. Average percentage of disordered residues and average length in different types of hubs and non-hubs

We also found an inverse correlation between the number of ordered domains in the hubs with 10 or more interactions and their fraction of disordered residues (r = -0.1814, ρ = -0.2146, p << 0.001). Hubs 10+ also showed a positive correlation between their number of distinct domains and their number of interactions (r = 0.1059, ρ = 0.1095, p << 0.001). Here, r denotes Pearson's correlation coefficient, ρ denotes Spearman's rank-order correlation coefficient, and p gives the statistical significance of both. Thus, we conclude that our results are robust and are not biased by the particular criteria used to identify hubs and non-hubs.

3. Analysis of hubs and non-hubs obtained by excluding derived interactions

In order to see whether the interactions derived from protein complexes bias our results, we repeated the analysis using hubs and non-hubs identified from an interaction network built entirely from 28718 observed binary interactions. Figure S3 shows the characteristics of hubs and non-hubs identified from this network. Hubs are proteins with 5 or more interactions (3049) and Non-hubs are proteins with 1 interaction (2039). Numbers in brackets are the number of proteins in each group. In all cases, hubs were more likely than non-hubs to have multi-domain architectures, repeat domains and domains with overlapping disordered regions (p << 0.001).

Figure S3. Percent of proteins with the characteristics listed on the x-axis: 1) Total Multidomain: % of proteins with 2 or more Pfam domains; 2) Repeating domains: % of proteins with at least one repeating domain; 3) Disordered domains: % of proteins with a predicted disordered region overlapping at least one Pfam domain; 4) Distinct Multidomain: % of proteins with 2 or more unique Pfam domains (excluding domain duplicates); 5) Ordered Multidomain: % of proteins with 2 or more ordered Pfam domains (excluding those with overlapping disordered regions). All differences in the prevalence of each characteristic between hubs and non-hubs were statistically significant (p << 0.001).

We also compared the average fraction of disordered residues and the average length of hubs and non-hubs (Table SVIII below). Hubs had higher levels of intrinsic disorder and were longer than non-hubs. All differences between hubs and non-hubs were statistically significant (p << 0.001).

Group      % Disorder     Length
Hubs       20.90 ±0.77    677.44 ±22.10
Non-hubs   14.50 ±0.87    568.95 ±21.29

Table SVIII. Average percentage of disordered residues and average length in hubs and non-hubs

We also found an inverse correlation between the number of ordered domains in the hubs and their fraction of disordered residues (r = -0.1888, ρ = -0.2399, p << 0.001). Hubs also showed a positive correlation between their number of distinct domains and their number of interactions (r = 0.1230, ρ = 0.1273, p << 0.001). Here, r denotes Pearson's correlation coefficient, ρ denotes Spearman's rank-order correlation coefficient, and p gives the statistical significance of both. Thus, we conclude that the inclusion of binary interactions derived from protein complexes does not bias the results.

4. Correlation using Spearman’s rank correlation coefficient

To provide further support for the correlations, Spearman’s rank correlation coefficient (SRCC) was calculated, in addition to Pearson’s correlation coefficient (PCC), for the following two cases:

Datasets in hubs                                                PCC (r)                  SRCC (ρ)
Number of distinct domains vs. number of interactions           0.1239 (p = 2.22e-16)    0.1348 (p < 2.2e-16)
Number of ordered domains vs. fraction of disordered residues   -0.18304 (p < 2.2e-16)   -0.2341 (p < 2.2e-16)

Significance values (p) are indicated in brackets.

References

1. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5:101-113.
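As an illustration of the two coefficients reported in Section 4, the sketch below computes Pearson's r and Spearman's ρ for paired per-protein values, such as the number of distinct domains and the number of interactions. It is a minimal stand-alone version, not the software used for the study, which the text does not name. The function names are hypothetical, ties are assumed to receive average ranks, and significance values (p) are not computed.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the average ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because ρ depends only on rank order, it is less sensitive than r to a few proteins with very large interaction counts, which is one reason to report both.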