Supplementary Material Defining the limits of homology modelling in information-driven protein docking J.P.G.L.M. Rodrigues1, A.S.J. Melquiond1, E. Karaca1, M. Trellet1,2, M. van Dijk1, G.C.P. van Zundert1, C. Schmitz1, S.J. de Vries1,3, A. Bordogna4, L. Bonati4, P.L. Kastritis1 and Alexandre M.J.J. Bonvin1* 1. Bijvoet Center for Biomolecular Research, Faculty of Science/Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands. 2. Current address: Groupe VENISE, CNRS LIMSI, 91403 Orsay CEDEX, France 3. Current address: Physik-Department (T38), Technische Universität München, James-Franck-Str. 1, 85748 Garching, Germany 4. Università degli Studi di Milano-Bicocca, Piazza della Scienza, 1 - 20126 Milano, Italy * To whom correspondence should be addressed. Alexandre M.J.J. Bonvin Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands. Tel. +31 30 2533859 a.m.j.j.bonvin@uu.nl Content: Homology modelling of interacting partners ................................................................................................................. 3 HADDOCK protocol and energetics details .................................................................................................................. 3 Restraints used in the docking predictions of CAPRI rounds 22-27 ..................................................................... 4 Supplementary Figure S1. .................................................................................................................................................... 6 Supplementary Figure S2. .................................................................................................................................................... 7 Supplementary Table S1. ...................................................................................................................................................... 8 Supplementary Table S2. ...................................................................................................................................................... 9 Supplementary Table S3. ................................................................................................................................................... 11 References ............................................................................................................................................................................... 12 2 Homology modelling of interacting partners Each interacting partner of the protein-protein complexes used in this study was modelled using the following protocol: 1. A sequence similarity search was performed on a database of sequences of proteins with known structure (‘PDB’) using the PSI-BLAST algorithm1 with default parameters. 2. All hits with low statistical significance (Expect (E) value greater than 10-3) or with an alignment overlap shorter than 85% were discarded. 3. Sequences with identities to the target sequence in the range of 20-80% were binned in steps of 5%, and one representative sequence from each ‘bin’ was selected. 4. Each sequence was re-aligned to the target sequence using three different algorithms: DALIlite2, T-Coffee3, Praline4 (all with default parameters). 5. Based on the previously derived alignments, MODELLER 9v85 was used to build homology models. 6. The 10 best models were selected according to their DOPE score6. HADDOCK protocol and energetics details HADDOCK7, 8 incorporates biochemical and/or biophysical information as restraints to drive the modelling (EAIR). The docking protocol consists of three steps: i) rigidbody energy minimization stage, ii) semi-flexible refinement consisting of a simulated annealing optimization in torsion angle space with side-chain and backbone flexibility at the interface, and iii) final refinement in explicit solvent (e.g. TIP3P water). Nonbonded interactions were calculated with the OPLS force field9 using a 8.5Å cut-off. The electrostatic energy (Eelec) was calculated using a shifted Coulomb potential, while the van der Waals energies (Evdw) were calculated with a Lennard-Jones potential, with a switching function between 6.5Å and 8.5Å. In HADDOCK, experimental or prediction information is incorporated in the form of Ambiguous Interaction Restraints (AIRs), a notion similar to that of ambiguous distance restraints based on NOE data in structure calculation of NMR structures (see e.g. Linge et al.10). HADDOCK creates AIRs based on active and passive residues defined by the user. For every active residue, HADDOCK creates one single AIR restraint between that residue and all active and passive residues on the partner molecules. These restraints are incorporated in the energy function through a softsquare harmonic potential EAIR) that depends on an effective distance. This latter is calculated using the following formula: in which A and B are molecules, i iterates over all distance restraints, Natoms indicates all atoms of a given residue and Nres the sum of active and passive residues for a given protein. For two single atoms, the effective distance is equal to the Cartesian distance, 3 but as the number of atoms grows, the effective distance becomes shorter and shorter due to the larger number of distances taken into consideration. By default, HADDOCK enforces an upper limit to the effective distance of 2Å. In case this distance is exceeded, the AIR energy becomes positive and the active residue experiences an attractive force towards the active and passive residues it is restrained to in the partner molecule. Otherwise, the restraint is satisfied and the AIR energy, and consequently the attractive force, is zero. The exact relation between effective distance and energy is described in Nilges et al. 11. Given the many atom-atom distances contributing inversely to the effective distance, a typical AIR restraint is satisfied, depending on the level of ambiguity, if the residue comes within 3-4 Å of any active or passive residues of the partner molecule. As such, (putative) interface residues are driven into (a surface region on) the partner protein, but not to any specific partner residue. Restraints used in the docking predictions of CAPRI rounds 22-27 The specific restraint files and the PDB files matching them (given likely differences in residue numbering, chain naming, etc.) are provided upon request to the authors. Target 46 Unambiguous interaction restraints, derived from inter-domain contacts on a distant structural homologue (1P9112). The N6 Adenine Specific Dna Methylase partner in T46 (chain B) structurally aligns to the fragment 67-185 of 1P91, while the last betastrand of Trm112 Activator Protein (chain A in T46) – residues 115-118 aligns with a small beta-strand in 1P91 (residues 33-40). The restraints were implemented between Carbon alpha atoms with distances taken from 1P91 with a lower bound of 2Å. Additional unambiguous restraints were defined to keep geometry of the SAH heterogroup and the zinc coordination sites in Mtq2. Target 47 Unambiguous interaction restraints, derived from the interface residues of structure provided by the CAPRI committee. The restraints were implemented between Carbon alpha atoms with lower and upper bounds of 0.25Å. Targets 48 & 49 The interaction restraints are a combination of CPORT28 predictions (haddocking.org/services/CPORT) and homology-derived restraints using the PDB entry 2YVJ (NADH-dependent ferredoxin reductase and Rieske-type [2Fe-2S] ferredoxin)13. We also included unambiguous distances to maintain the geometry of ferrodoxin iron coordination site. Target 50 The interaction restraints were defined based on a statistical analysis of the most contacted residues on hemagglutinin (ab initio docking run with 10000 models for the rigid-body stage), together with the assumption that the designed protein HB36.3 was binding to a less mutation-prone surface. On the HB36.3 molecule, the interaction surface was defined based on biophysical properties (aromatic and hydrophobic patches). 4 Target 51 We only defined connectivity restraints between the different monomers of the complex, together with center of mass14 restraints to keep the solutions compact. Target 53 & 54 CPORT predictions (haddock.science.uu.nl/services/CPORT) were used to drive the docking. Target 57 We defined ambiguous interactions restraints between each sulfate group on the heparin molecule and all arginine and lysine residues (guanidinium and amino groups only) of Bt4661. 5 Supplementary Figure S1. Influence of the quality of the homology model and of the data on the scoring performance. The quality of the model is measured by sequence identity between target and template. The impact of the quality of the data is assessed by performing the docking using CAPRI restraints (CI) (right panel) and True interface restraints (TI) (left panel), respectively. 6 Supplementary Figure S2. Changes in backbone interface RMSD of the predicted complexes upon flexible refinement for all acceptable models (iRMSD < 4Å) for all targets (21564 models). The top panels (A & B) show the distribution of changes as measured by the difference in the backbone interface RMSD of the modelled complex between the rigid-body stage and after final refinement. A negative value means the docked model is closer to the native bound conformation. The bottom panels (C & D) show the relation of sequence identity between target/template and change in backbone interface RMSD after flexible refinement with respect to the bound conformation. The models colored according to their CAPRI target: red - T12, blue – T18, green – T26, yellow – T27, purple – T40, orange – T41. 7 Supplementary Table S1. Z-DOPE Cα RMSD (Model vs Template) Backbone i-RMSD (Model vs Template) Verify3D Molprobity Cα RMSD (Model vs Crystal) GDT-TS (Model vs Crystal) Backbone i-RMSD (Model vs Crystal) Sidechain i-RMSD (Model vs Crystal) 0.86 0.75 0.79 0.80 0.52 0.74 0.68 0.66 0.72 0.81 0.51 0.67 0.78 0.71 0.84 0.86 0.62 0.73 0.70 0.65 0.67 0.74 0.59 0.74 0.77 0.72 0.64 0.64 0.11 0.53 0.39 0.43 0.55 0.68 0.44 0.64 0.95 0.75 0.72 0.71 0.43 0.72 0.63 0.58 0.59 0.73 0.41 0.62 0.51 0.64 0.58 0.53 0.52 0.74 0.68 0.66 0.62 0.73 0.70 0.65 0.11 0.53 0.39 0.43 0.43 0.72 0.63 0.58 Predictive Indices 0.63 0.72 0.67 0.55 0.59 0.73 0.81 0.74 0.68 0.73 0.54 0.51 0.59 0.44 0.41 0.72 0.67 0.74 0.64 0.62 0.75 0.83 0.61 0.74 1.00 0.89 0.73 0.85 0.89 1.00 0.67 0.77 0.73 0.67 1.00 0.78 0.85 0.77 0.78 1.00 Calculated Indices TVSMod_RMSD 0.62 0.64 0.75 0.74 0.38 0.55 0.52 0.55 0.68 0.64 0.39 1.00 Qmean 0.37 0.33 0.52 0.55 0.54 0.67 0.61 0.46 0.30 0.40 1.00 0.39 Interface Sequence Identity 0.73 0.63 0.80 0.81 0.37 0.60 0.48 0.53 0.84 1.00 0.40 0.64 Global Sequence Similarity 0.56 0.48 0.73 0.73 0.37 0.44 0.41 0.46 1.00 0.84 0.30 0.68 Global Sequence Identity 0.61 0.59 0.63 0.59 0.37 0.83 0.84 1.00 0.46 0.53 0.46 0.55 i-RMSD from Native Structure (CI) 0.63 0.55 0.59 0.58 0.53 0.94 1.00 0.84 0.41 0.48 0.61 0.52 i-RMSD from Native Structure (TI) TVSMod_Over Cross correlation table of all indices. The values are colour coded from green (high correlation) to red (low correlation). i-RMSD from Native Structure (TI) i-RMSD from Native Structure (CI) Global Sequence Identity Global Sequence Similarity Qmean TVSMod_RMSD TVSMod_Over Z-DOPE Cα RMSD (Model vs Template) Backbone i-RMSD (Model vs Template) Verify3D Molprobity 1.00 0.84 0.70 0.69 0.39 0.73 0.63 0.61 0.56 0.73 0.37 0.62 0.84 1.00 0.59 0.57 0.34 0.65 0.55 0.59 0.48 0.63 0.33 0.64 0.70 0.59 1.00 0.99 0.44 0.66 0.59 0.63 0.73 0.80 0.52 0.75 0.69 0.57 0.99 1.00 0.46 0.65 0.58 0.59 0.73 0.81 0.55 0.74 0.71 0.57 0.92 0.93 0.51 0.64 0.58 0.53 0.63 0.73 0.54 0.72 0.39 0.34 0.44 0.46 1.00 0.53 0.53 0.37 0.37 0.37 0.54 0.38 0.73 0.65 0.66 0.65 0.53 1.00 0.94 0.83 0.44 0.60 0.67 0.55 Interface Sequence Identity Cα RMSD (Model vs Crystal) GDT-TS (Model vs Crystal) Backbone i-RMSD (Model vs Crystal) Sidechain i-RMSD (Model vs Crystal) 0.71 0.86 0.78 0.77 0.95 0.57 0.75 0.71 0.72 0.75 0.92 0.79 0.84 0.64 0.72 0.93 0.80 0.86 0.64 0.71 1.00 0.75 0.83 0.61 0.74 8 Supplementary Table S2. Details of the methods used for the predictive and measured indices. Index Name Sequence Identity Sequence Similarity Sequence Identity (interface only) CαRMSD (model – template) Backbone interface RMSD (model – template) Index Type Predictive Predictive http://www.ebi.ac.uk/Tools/psa/emboss_needle/ http://www.ebi.ac.uk/Tools/psa/emboss_needle/ N/A N/A Measured http://www.ebi.ac.uk/Tools/psa/emboss_needle/ N/A Predictive ProFit V3.1 (//www.bioinf.org.uk/software/profit/) Used alignment from homology modelling as ZONEs for fitting. Predictive ProFit V3.1 (//www.bioinf.org.uk/software/profit/) Used alignment from homology modelling as ZONEs for fitting. Calculation Method Notes The Molprobity score is calculated by a log-weighted combination of the clash score, the percentage of Ramachandran Plot not favoured, and the percentage of bad rotamers. It reflects the crystallographic resolution at which these values would be found. Therefore, the lower the score the better. Score calculated from the atomic coordinates of a model that assesses the compatibility of that model to its own amino acid sequence, as measured by a 3D profile. Atomic distance-dependent statistical potential calculated from a sample of native structures. SVM regression model parameterized on a dataset containing 5.790.889 models that predicts the Cα RMSD between the model and the native structure. SVM regression model parameterized on a dataset containing 5.790.889 models that predicts the fraction of Cα atoms within 3.5Å of their correct positions in the native structure. Linear combination of six structural descriptors: two distance-dependent interaction potentials of mean force based on Cβ atoms and on all atom types to assess long-range interactions; a torsion angle potential; a solvation potential; agreement of secondary structure prediction and calculation; agreement of predicted buried surface area and calculation. MolProbity15 Predictive http:// psvs-1_4-dev.nesg.org Verify3D16 Predictive http:// psvs-1_4-dev.nesg.org Z-DOPE6 Predictive http://modbase.compbio.ucsf.edu/evaluation/ TVSMod_RMSD17 Predictive http://modbase.compbio.ucsf.edu/evaluation/ TVSMod_Over17 Predictive http://modbase.compbio.ucsf.edu/evaluation/ Qmean18 Predictive http://swissmodel.expasy.org/qmean/ CαRMSD (model – reference) Backbone Interface RMSD (model – Measured Measured ProFit V3.1 (//www.bioinf.org.uk/software/profit/) N/A Measured ProFit V3.1 (//www.bioinf.org.uk/software/profit/) N/A 9 reference) Sidechain Interface RMSD (model – reference) GDT-TS19 (model – reference) Measured ProFit V3.1 (//www.bioinf.org.uk/software/profit/) Calculated on all atoms except for the backbone, excluding hydrogens. Measured http://as2ts.proteinmodel.org/AS2TS/LGA_list/lga_ pdblist.html Parameters for LGA calculation: -3 –o1 –d:4.0 -gdc 10 Supplementary Table S3. Performance of the HADDOCK in the latest CAPRI rounds. Detail of the quality of the best submitted model: interface RMSD, ligand RMSD, Fraction of Native Contacts (FNAT), and CAPRI Star Quality. Manual Submission Target Name Complex Name Target Type T46 T47 T48 T49 T50 T51 T53 T54 T57 T58 Methyl transferase Mtq2/Trm112 colicin E2 DNase-Im2 T4moC /T4moH mono-oxygenase T4moC /T4moH mono-oxygenase HB36.3 designed protein / flu hemagglutinin Xylanase Xyn10B Designed Rep4/Rep2 a-repeat Designed neocarzinostatin/Rep16 a-repeat BT4661/heparin complex PliG /SalG lysozyme HH UU UU UU HU UUHU(U) UH UH UU UU Best Model i-RMSD (Å) l-RMSD (Å) 3.76 10.1 0.98 1.52 3.42 9.08 3.55 14.0 1.63 6.69 2.08 6.03 1.73 4.86 --2.04 4.68 2.61 6.91 Fnat Quality 0.49 * 0.86 *** 0.23 * 0.26 * 0.67 ** 0.41 * 0.50 ** --0.88 ** 0.29 * Best Model i-RMSD (Å) l-RMSD (Å) 3.83 11.9 0.80 1.81 --3.07 10.7 --------2.11 4.88 --- Fnat Quality 0.57 * 0.86 *** --0.11 * --------0.77 ** --- Server Submission Target Name Complex Name T46 T47 T48 T49 T50 T51 T53 T54 T57 T58 Methyl transferase Mtq2/Trm112 colicin E2 DNase-Im2 T4moC /T4moH mono-oxygenase T4moC /T4moH mono-oxygenase HB36.3 designed protein / flu hemagglutinin Xylanase Xyn10B Designed Rep4/Rep2 a-repeat Designed neocarzinostatin/Rep16 a-repeat BT4661/heparin complex PliG /SalG lysozyme 11 HH UU UU UU HU UUHU(U) UH UH UU UU References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. Boratyn GM, Camacho C, Cooper PS, Coulouris G, Fong A, Ma N, Madden TL, Matten WT, McGinnis SD, Merezhuk Y, Raytselis Y, Sayers EW, Tao T, Ye J, Zaretskaya I. BLAST: a more efficient report with usability improvements. Nucleic Acids Res 2013. Holm L, Kääriäinen S, Rosenström P, Schenkel A. Searching protein structure databases with DaliLite v.3. Bioinformatics 2008;24(23):2780–2781. Taly J-F, Magis C, Bussotti G, Chang J-M, Di Tommaso P, Erb I, Espinosa-Carrasco J, Kemena C, Notredame C. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nat Protoc 2011;6(11):1669–1682. Simossis VA, Heringa J. PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res 2005;33(Web Server issue):W289–W294. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993;234(3):779–815. Shen M-Y, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2006;15(11):2507–2524. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc 2003;125(7):1731–1737. de Vries SJ, van Dijk M, Bonvin AMJJ. The HADDOCK web server for data-driven biomolecular docking. Nat Protoc 2010;5(5):883–897. Jorgensen W, Tirado-Rives J. The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. J Am Chem Soc 1988;110(6):1657–1666. Linge JP, Habeck M, Rieping W, Nilges M. ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics 2003;19(2):315–316. Nilges M, Gronenborn AM, Brünger AT, Clore GM. Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints. Application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng 1988;2(1):27–38. Das K, Acton T, Chiang Y, Shih L, Arnold E, Montelione GT. Crystal structure of RlmAI: implications for understanding the 23S rRNA G745/G748-methylation at the macrolide antibiotic-binding site. Proc Natl Acad Sci USA 2004;101(12):4041–4046. Senda M, Kishigami S, Kimura S, Fukuda M, Ishida T, Senda T. Molecular mechanism of the redox-dependent interaction between NADH-dependent ferredoxin reductase and Rieske-type [2Fe-2S] ferredoxin. J Mol Biol 2007;373(2):382–400. de Vries SJ, Bonvin AMJJ. CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS ONE 2011;6(3):e17695. Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 2010;66(Pt 1):12–21. Eisenberg D, Lüthy R, Bowie JU. VERIFY3D: assessment of protein models with threedimensional profiles. Meth Enzymol 1997;277:396–404. Eramian D, Eswar N, Shen M-Y, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci 2008;17(11):1881–1893. Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2011;27(3):343–350. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31(13):3370–3374. 12