Supplementary text: Relationship between Conservation of Interface Location (CIL) and Conservation of Interaction Pattern (CIP) scores and binding affinity for transient complexes

Although transient protein-protein complexes associate and dissociate to carry out their biological functions, their binding affinities vary: transient interactions can be strong or weak. Strong interactions typically require a trigger for association or dissociation, whereas weak interactions show a fast bound-unbound equilibrium [3].

For the main dataset of 93 transient complex pairs used in this study (Table 1, Table 2), binding affinity values were collected from two manually curated sources that document binding affinities for complexes of known structure: the structure-based benchmark for protein-protein binding affinity [1] and the PDBbind database v2014 [2]. Experimentally measured binding affinities are available for at least one complex of the pair in 56% of cases (n = 53) and for both complexes in 12 cases; for 50% of these, the binding affinities of the two homologous complexes are very similar. The pairs were divided into three sets: low affinity (Kd ≥ 10⁻⁷ M, n = 10), low-medium affinity (Kd ≥ 10⁻⁸ M, n = 31) and high affinity (Kd ≤ 10⁻⁹ M, n = 22).

The relationship between the binding affinity of transient complexes and their CIL and CIP scores was analysed (Supplementary Figure S4). The scatter plots and box plots indicate no correlation between the measures. However, this result should be viewed in light of two caveats:

a) Experimentally reported binding affinity values commonly vary by a factor of 2, even when measured under similar conditions of pH, temperature and ionic strength [1].
b) Only a few data points are available for low affinity complexes (in the micromolar range).

In summary, there appears to be no distinct trend in CIL and CIP scores between strong and weak transient binders.
However, more data will be needed in the future to confirm this result.

An approach to improve comparative modelling of protein-protein complexes using multiple templates: a case study with the GTPase-GAP complex

The mouse GTPase-GAP complex was modelled starting from the tertiary structures of its components, and the accuracy of the model was assessed against the crystal structure of the complex (PDB id: 3c7k) [4]. To build the model, residue pairs with a conserved interaction pattern were identified from six homologous templates of the mouse GTPase-GAP complex (PDB ids: 1agr, 1fqj, 2gtp, 2ik8, 2ihb, 2ode) (Supplementary Figure S6a). These residues were used to guide protein docking with the HADDOCK program [5] (see Materials and Methods). The model was assessed by comparing its interface residues and interacting pairs with those identified in the crystal structure (PDB id: 3c7k). The docked model correctly predicted 88% of the interactions in the complex (15 of the 17 interactions in the crystal structure). Of the interface residues, 11 of 12 were predicted for one chain and all 12 for the other chain. Seven residue pairs were wrongly predicted as interacting, although some of the residues involved are interface residues.

These encouraging results led us to repeat the docking using residues with a conserved interaction pattern from only one template of the GTPase-GAP complex (PDB id: 2ihb) (Supplementary Figure S6b), whose conservation of interaction pattern score is only 48.5%. Docking again gave comparably good results, recovering 76% of the correct interactions in the complex (13 of 17). Of the interface residues, 10 of 13 in one chain and 11 of 12 in the other chain were correctly predicted, and 6 residue pairs were wrongly predicted as interacting.
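The percentages quoted above follow from simple set comparisons between the interacting pairs in the docked model and those in the crystal structure. The following Python sketch illustrates that bookkeeping under stated assumptions: the residue labels are hypothetical, the function names `assess_pairs` and `interface_residues` are introduced here for illustration only, and interface residues are taken simply as the residues that appear in at least one interacting pair, which may differ from the definition used in the study.

```python
from typing import Dict, Iterable, Set, Tuple

# A residue pair is (residue on chain A, residue on chain B); labels are hypothetical.
Pair = Tuple[str, str]


def assess_pairs(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Dict[str, float]:
    """Compare predicted interacting pairs with those in the reference structure."""
    pred: Set[Pair] = set(predicted)
    ref: Set[Pair] = set(reference)
    correct = pred & ref  # pairs present in both the model and the reference
    wrong = pred - ref    # pairs predicted but absent from the reference
    recovered = 100.0 * len(correct) / len(ref) if ref else 0.0
    return {
        "correct": len(correct),
        "wrong": len(wrong),
        "reference": len(ref),
        "percent_recovered": recovered,
    }


def interface_residues(pairs: Iterable[Pair]) -> Tuple[Set[str], Set[str]]:
    """Per-chain interface residues, assumed here to be residues in any pair."""
    pair_list = list(pairs)
    return {a for a, _ in pair_list}, {b for _, b in pair_list}


# Toy data mirroring the counts in the text: 17 reference pairs, 15 of which
# are predicted, plus 7 predicted pairs that are not in the reference.
reference = [(f"A{i}", f"B{i}") for i in range(17)]
predicted = reference[:15] + [(f"A{i}", f"B{i + 100}") for i in range(7)]

result = assess_pairs(predicted, reference)
print(result)
```

With these toy sets, 15 of the 17 reference pairs are recovered and 7 predicted pairs are counted as wrong, matching the counts reported for the multiple-template model.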
The consensus of interactions across the 12 best docked models showed high confidence for the correctly predicted interactions and low confidence for the majority of the wrongly predicted interactions. Moreover, three of the wrongly predicted pairs have poor conservation scores, which is consistent with their being wrong predictions.

We then compared the accuracy of the models obtained by this approach with that of a model obtained by the simpler traditional method. The GTPase-GAP complex was therefore also modelled by superimposing the tertiary structures of the target chains onto the template complex using DALI and directly transferring the interface residues and interacting residue pairs from the template to the complex being modelled. The numbers of correctly predicted interface residues and interacting pairs are lower than with CIP-residue-guided protein docking. Only 59% of the interactions were correctly predicted by this method (10 of 17 interacting pairs), with 7 of 13 interface residues correctly predicted in one chain and 7 of 12 in the other; 7 interface residues were wrongly predicted. It should be noted that all the core-core interactions in these homologous pairs of protein complexes showed a conserved interaction pattern.

This test case shows that there is considerable scope for integrating this method with other features using machine learning algorithms [6] to improve the accuracy of homology-modelled protein-protein complexes.

References

1. Kastritis PL, Moal IH, Hwang H, Weng Z, Bates PA, Bonvin AM, Janin J. A structure-based benchmark for protein-protein binding affinity. Protein Sci 2011; 20: 482-91.
2. Wang R, Fang X, Lu Y, Wang S. The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J Med Chem 2004; 47: 2977-80.
3. Perkins JR, Diboun I, Dessailly BH, Lees JG, Orengo C.
Transient protein-protein interactions: structural, functional, and network properties. Structure 2010; 18: 1233-1243.
4. Slep KC, Kercher MA, Wieland T, Chen CK, Simon MI, et al. Molecular architecture of Galphao and the structural basis for RGS16-mediated deactivation. Proc Natl Acad Sci U S A 2008; 105: 6243-6248.
5. Dominguez C, Boelens R, Bonvin AM. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc 2003; 125: 1731-1737.
6. Koike A, Takagi T. Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 2004; 17: 165-173.

Supplemental information

Legends to Supplementary figures

Figure S1 (a) Correlation between the extent of conservation of interface location (CIL) for the two pairs of homologous subunits (AA’ and BB’) of pairs of transient protein-protein complexes (PPCs). The lowest and highest CIL scores are plotted on the X-axis and Y-axis, respectively. (b) Histogram of CIL scores for a dataset of the same protein complexed with homologous proteins. (c) Eglin C bound to the homologous proteins subtilisin Carlsberg (PDB code: 1cse) and thermitase (PDB code: 1tec), coloured dark and light green. Residues with CIL are shown as pink sticks. Both pairs of homologous subunits of the protein complex show high CIL scores. (d) Box plot of the distribution of CIL for identical pairs of transient PPCs and permanent homodimers solved under different crystal conditions. (e) Correlation between CIL and global sequence identity for homologous pairs of transient PPCs and permanent homodimers.

Figure S2 Correlation between conservation of interface location (CIL) and the extent of structurally equivalent conserved residue types at the interfaces of homologous pairs of transient PPCs and permanent dimers.

Figure S3 (a) Distribution of the extent of conservation of interaction pattern (CIP) for a dataset of the same protein complexed with homologous proteins.
(b) Eglin C bound to the homologous proteins subtilisin Carlsberg (1cse) and thermitase (1tec), coloured dark and light green. Residues with CIP are shown as pink sticks and the common interactions as dotted lines. The homologous pair of PPCs shows a high CIP score. (c) Correlation between CIL and CIP for homologous pairs of transient PPCs and permanent homodimers.

Figure S4 Variation of CIL (Conservation of Interface Location) and CIP (Conservation of Interaction Pattern) scores with the binding affinity between the interacting proteins in the complexes. Scatter plots of (a) average CIL score, calculated as (CIL1 + CIL2) / 2, against log(binding affinity) and (b) CIP score against log(binding affinity) indicate no correlation. Further, the complexes were subdivided into groups of low affinity (Kd ≥ 10⁻⁷ M, n = 10), low+medium affinity (Kd ≥ 10⁻⁸ M, n = 31) and high affinity (Kd ≤ 10⁻⁹ M, n = 22); comparing their (c) average CIL scores and (d) CIP scores using the Wilcoxon test shows no statistically significant difference between the groups. Similar analyses carried out for Min(CIL1, CIL2) and Max(CIL1, CIL2) gave similar results.

Figure S5 (a) Extent of conservation of interface location (CIL) for core and rim interface residues for the dataset of the same protein bound to homologous proteins. (b) Extent of CIL for core and rim interface residues with conserved residue type, shown for each of the twenty residue types. (c) Extent of conservation of interaction pattern (CIP) for the core-core, core-rim and rim-rim interaction categories.

Figure S6 (a) Residues showing a conserved interaction pattern (CIP) in multiple templates are provided to guide the docking of the GTPase-GAP complex (PDB id: 3c7k). The interacting chains are coloured dark and light grey. Correctly and wrongly predicted interface residues are coloured green and red, respectively.
(b) Similarly, residues showing a conserved interaction pattern (CIP) in a single template (PDB id: 2ihb) are provided to guide the docking of the GTPase-GAP complex (PDB id: 3c7k).
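
The quantities plotted in Figure S4 can be illustrated with a short Python sketch. It rests on several assumptions not taken from the study: low affinity is taken to mean a Kd of 10⁻⁷ M or larger, low+medium a Kd of 10⁻⁸ M or larger, and high affinity a Kd of 10⁻⁹ M or smaller (so a Kd strictly between 10⁻⁹ and 10⁻⁸ M falls in no group); Spearman rank correlation is used here only as one plausible way to quantify "no correlation"; all function names are hypothetical; and the numbers are made up for illustration.

```python
import math
from typing import List, Sequence


def average_cil(cil1: float, cil2: float) -> float:
    """Mean CIL score of the two homologous subunit pairs: (CIL1 + CIL2) / 2."""
    return (cil1 + cil2) / 2.0


def affinity_groups(kd_molar: float) -> List[str]:
    """Assign a Kd (in M) to affinity groups; the assumed cut-offs overlap."""
    groups = []
    if kd_molar >= 1e-7:
        groups.append("low")
    if kd_molar >= 1e-8:
        groups.append("low+medium")
    if kd_molar <= 1e-9:
        groups.append("high")
    return groups


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up values for illustration only; these are not data from the study.
kd_values = [2e-6, 5e-8, 3e-10, 1e-9, 4e-7]
cil_pairs = [(80.0, 60.0), (90.0, 70.0), (75.0, 65.0), (85.0, 95.0), (60.0, 50.0)]

log_kd = [math.log10(kd) for kd in kd_values]
avg_cil = [average_cil(a, b) for a, b in cil_pairs]
rho = spearman(avg_cil, log_kd)
print(f"Spearman rho between average CIL and log10(Kd): {rho:.2f}")
```

A coefficient close to zero on real data would be in line with the absence of correlation described in the legend; the group comparison by Wilcoxon test is not reproduced in this sketch.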