Classification of Protein Complexes Based on Biophysics of Association
Sandor Vajda
Boston University

“Tell me with whom you go, and I'll tell you what you are.” (Italian proverb)

List of Interactions
“FYI” filtered yeast interactome (Vidal 2004):
• involves ~1500 proteins,
• making ~2500 physical interactions
[Figure: interaction network. H. Jeong et al., Nature 2001]

Structure: Nature of Interactions
PDB: ~25,000 solved crystal structures; ~10% complexes
Computational prediction of the structure and specificity of protein-protein complexes

“Tell me how you contact your partners, and I'll tell you who you are.”

Protein-protein docking
How do proteins interact with each other?
Docking problem: predict the docking configuration from the structures of the component proteins.

Bound vs. unbound docking
Conformational change, bound vs. unbound: at least the side chain conformations change.
[Figure: Trypsin/APPI, receptor and ligand, shown in fine and coarse detail]

Talk outline
1. What is the current state of docking?
2. What do docking calculations tell us about the nature of protein-protein complexes?
3. How to deal with side chain flexibility?

Proteins: Basics
Sequence: ADEFFGKLSTKK…….
Building blocks: monomers with backbone and side chains
Sequence to structure: de novo structure prediction (CASP)
Structure to complex: docking (CAPRI)
Rigid body degrees of freedom: 3 translation, 3 rotation

Benchmark set of protein complexes
Chen, R. et al. (2003) A protein-protein docking benchmark. Proteins, 52, 88-91.
• 22 enzyme-inhibitor
• 19 antigen-antibody
• 11 “other” types
• 7 “difficult” cases

Docking methods
Comeau, S. et al. (2003) ClusPro: An automated docking and discrimination method for the prediction of protein complexes. Bioinformatics, 20, 45-50.
Chen, R. et al. (2003) ZDOCK: An initial-stage protein-docking algorithm. Proteins, 52, 80-87.
Li, L. et al. (2003) RDOCK: Refinement of rigid-body protein docking predictions. Proteins, 53, 693-707.
Gray, J.J. et al. (2003) Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Molec. Biol. 331, 281-299.

How do current protein docking programs work?
1. Rigid body search (Filter 1: 20,000)
2. Select docked structures with low energy (Filter 2: 2,000)
3. Cluster retained conformations (Filter 3: 30)
4. Refine structures, flexible side chains (Filter 4: 1?)
Submit 10 models to CAPRI

Algorithms of the 3 docking methods
Method (Investigator); Step 1: Rigid body search; Step 2: Rescoring, ranking, filtering, and refinement

ClusPro (Camacho and Vajda)
  Step 1: Fast Fourier Transform (FFT) correlation approach using ZDOCK or DOT
  Step 2: Re-scoring with empirical potentials and clustering
Gray and Baker
  Step 1: Monte-Carlo search using simplified protein geometry and scoring function
  Step 2: Iterative repacking of side chains and rigid-body docking, repeated until convergence. Final selection by clustering.
ZDOCK (Weng)
  Step 1: FFT correlation with shape complementarity, electrostatics, and desolvation
  Step 2: Clustering of conformations to avoid redundancies
RDOCK (Weng)
  Step 1: FFT correlation with shape complementarity
  Step 2: Re-scoring with empirical potentials

[Figure: Effect of the interface area. Outcome labels: easy, uncertain, difficult, very difficult.]
[Figure: Effect of hydrophobicity. Outcome labels: easy, uncertain; axis tick at -4.]
[Figure: Size vs. hydrophobicity. Type I easy, Type II easy, Type III uncertain, Type IV difficult, Type V difficult.]
[Figure: Benchmark by type. Type I easy, Type II easy, Type III uncertain, Type IV difficult, Type V difficult.]
[Figure: Desolvation free energy (tick at -4) vs. interface area (ticks at 1400, 2000, 3400). Region labels: Type I Easy (enzymes); Type II Easy; Type III Uncertain (antibody/antigen); Type IV Difficult (small signalling complexes); Type V Hopeless (transitional complexes with substantial conformational change); large multienzyme complexes: Type II or Type V?]

Table I.
Major differences between enzyme-inhibitor and antibody-antigen complexes

Property: enzyme-inhibitor complexes / antibody-antigen complexes
Interface area ΔASA: 1400 Å² < ΔASA < 2000 Å² / possibly < 1400 Å²
Interface connectedness: single patch / frequently multiple patches
Interface shape: convex-concave / mostly planar
Binding free energy ΔG: -17.5 kcal/mol < ΔG < -13.0 kcal/mol / -13.0 kcal/mol < ΔG < -6.5 kcal/mol
% nonpolar residues in interface: 61% nonpolar (can reach 71%) / 51% nonpolar (can be as low as 44%)
Desolvation free energy: negative (favorable) / positive (unfavorable)
Conformational change: generally moderate / can be substantial; loop and/or hinge motion
Crystallographic water positions: around perimeter of interface / within the interface

Classification of complexes

Type I
  Conformational change: small (rigid interface)
  Interface ΔASA (a): standard (b)
  Hydrophobicity: strong; the convex-concave interface provides good shape complementarity
  Docking outcome: successful, unless key side chains are in wrong conformations
  Example: Trypsinogen and trypsin inhibitor (1cgi): KD = 0.2 pM, ΔASA = 1950 Å², and ΔGdes = -18.3 kcal/mol. Most complexes of enzymes with their protein inhibitors are in this category.
Type II
  Conformational change: small
  Interface ΔASA: ΔASA > 2000 Å²
  Hydrophobicity: unimportant
  Docking outcome: successful
  Example: Ribonuclease A and ribonuclease inhibitor (1dfj): ΔASA = 2580 Å², ΔGdes = 18.6 kcal/mol, ΔEelec = -63.9 kcal/mol, KD = 0.15 nM.
Type III
  Conformational change: moderate, but larger than for Type I
  Interface ΔASA: standard
  Hydrophobicity: variable, but generally weak. Charge-charge interactions can be strong.
  Docking outcome: unpredictable; can be very difficult, even with known hypervariable regions of the antibody
  Example: Hyhel-5 Fab with lysozyme (1mlc): KD = 126M, ΔASA = 1390 Å², and ΔGdes = 3.84 kcal/mol. Most antibody-antigen complexes are in this category.
Type IV
  Conformational change: restricted to side chains
  Interface ΔASA: ΔASA < 1400 Å²
  Hydrophobicity: weak; mostly polar and charge-charge interactions
  Docking outcome: hits are found, but are generally lost in scoring and ranking
  Example: Ras and Ras interacting domain (1lfd): KD = 2M, ΔASA = 1130 Å², and ΔGdes = 3.6 kcal/mol. A number of weak complexes are in this category.
Type V
  Conformational change: substantial backbone change, Cα RMSD (c) > 2 Å
  Interface ΔASA: ΔASA > 2000 Å²
  Hydrophobicity: generally moderate
  Docking outcome: rigid body methods seem to always fail for these complexes
  Example: Cyclin A and cyclin-dependent kinase 2 (1fin): KD = 47.6 nM, ΔASA = 3390 Å², and ΔGdes = 4.7 kcal/mol.

(a) ASA: accessible surface area. (b) Standard interface: 1400 Å² < ΔASA < 2000 Å². (c) Cα RMSD: α-carbon root mean square deviation.

Type I: Enzyme-Inhibitor Complexes
[Figure: alpha-chymotrypsinogen and trypsin inhibitor variant 3]
[Figure: Interface in the complex of alpha-chymotrypsinogen with trypsin inhibitor]

Type III: Antigen-Antibody Complexes
The table entry shown with this slide reads: Hyhel-5 Fab with lysozyme (1mlc): KD = 126M, ΔASA = 1390 Å², ΔGdes = -3.84 kcal/mol, ΔEelec = -21.4 kcal/mol.
[Figure: chicken lysozyme and monoclonal antibody Fab D44.1]
[Figure: Interface in the complex of chicken lysozyme with antibody Fab D44.1]

[Figure: ribonuclease A and ribonuclease inhibitor]
[Figure: Interface in the complex of ribonuclease A with ribonuclease inhibitor]

The table entry shown with the next slide reads: Ras and Ras interacting domain (1lfd): KD = 2M, ΔASA = 1250 Å², ΔGdes = 3.6 kcal/mol, and ΔEelec = -39.5 kcal/mol.
[Figure: Ras-interacting domain of RalGDS, Ras protein, and GNP (5'-guanosyl-imido-triphosphate)]
[Figure: Interface in the complex of the Ras-interacting domain with Ras]

Type V: Large interface and large conformational change
The table entry shown with this slide reads: Cyclin A and cyclin-dependent kinase 2 (1fin): KD = 47.6 nM, ΔASA = 3550 Å², ΔGdes = 3.9 kcal/mol, ΔEelec = -66.5 kcal/mol.
[Figure: Cyclin A and cyclin-dependent kinase; results labeled Li, L. et al. (2003) RDOCK and Gray, J.J. et al. (2003)]

2. How is the community doing?
Overall success rates of participants in CAPRI 1-5
[Figure: Classification of CAPRI 1-2 targets (T1 to T7). Desolvation free energy (kcal/mol, -10 to 20) vs. interface area ΔASA (1000 to 3500). Regions: Type I easy, Type II easy, Type III uncertain, Type IV difficult, Type V very difficult.]
[Figure: Classification of CAPRI 3-5 targets (T8, T9, T10, T12, T13, T14, T18, T19). Desolvation free energy (kcal/mol, -15 to 25) vs. interface area ΔASA (1000 to 6500). Regions: Type I easy, Type II easy, Type III uncertain, Type V very difficult.]

Expected Improvements
[Figure: the desolvation free energy vs. interface area map of Types I to V, with the annotation "Much improved".]

3.
How to deal with side chain flexibility?
[Figure: Trypsin/APPI, receptor and ligand, shown in fine and coarse detail]

Recognition mechanisms: lock-and-key vs. induced fit

Key-and-latch mechanism
Rajamani, D., Thiel, S., Vajda, S. and C.J. Camacho. Anchor residues in protein-protein interactions. Proc. Natl. Acad. Sci. USA, 101: 11287-11292, 2004.

Key-Latch model
KEYS stay close to the bound conformation in solution.
LATCHES do not show a preference to stay near the bound conformation.
[Figure labels: individually crystallized protein; simulated solvated protein; unbound; predisposition; bound]

RMSD of Arg39 of ribonuclease A with respect to the structure found in the complex (bound; PDB code 1DFJ) and in the individually crystallized ribonuclease A (unbound; PDB code 7RSA). The RMSD was computed for 2000 snapshots of a 4 ns MD simulation of 7RSA.
[Figure: RMSD (0 to 7) vs. time (0 to 4 ns), with traces relative to the bound and unbound structures]

Clustering of the conformations of Arg39 in ribonuclease A. The 16 largest clusters were derived from a pairwise RMSD analysis of the MD snapshots, with clustering at a radius of 2 Å. The RMSD of the cluster center from the bound conformation is shown on the top/bottom of each bar. The bound conformation is shown in blue, the unbound in red, and the dominant conformation from the MD simulations in green.
[Figure: bar chart of cluster size (0 to 300) per cluster; each bar is labeled with the RMSD of its center from the bound conformation]

Complex of trypsin with amyloid β-protein precursor inhibitor (APPI). Key residue Arg-15 is a major contributor to the total binding free energy.

HIV-1 NEF/FYN tyrosine kinase SH3 domain complex. Trp-119 is within 1 and 2 Å of the bound conformation for 36% and 96% of the MD, respectively. It is stabilized in this native-like conformation by Tyr-93, which is therefore also native-like in the free state. Thr-97 buries the second largest SASA (70 Å²).
[Figure labels: Thr97, Tyr93, Asp100, Trp119]

Hyhel-5 Fab/lysozyme complex. The main key residue, Arg-45, has a SASA value of 147 Å²; a second key residue, Lys-68, is found buried with a SASA of 93 Å². Both side chains show native-like properties: for 50% and 97% of the time, respectively, they sample conformations less than 2 Å RMSD from their corresponding bound rotamer.
[Figure labels: Lys68, Arg45]

The complex of acetylcholinesterase with fasciculin. The main key, Met-33, is in a native-like conformation during most of the simulation. The SASA encompassed by Met-33 is comparable with the next largest SASA of 78 Å², resulting from the burial of Arg-27; this anchor is in a native-like conformer during 95% of the MD.
[Figure labels: Thr8, Arg27, Met33]

Anchor residues
Columns: PDB ID; complex (receptor/ligand); anchor residue; ΔSASA (Å²); ΔGbind (kcal/mol); rank; residence time in MD (%)

Enzyme/Inhibitor
1BRC; Trypsin/APPI (1AAP); Arg 15; 251.24; -11.9; 1; 32†
2SIC; Subtilisin BPN/Inhibitor; Met 70; 196.33; -7.1; 1; 51†
2SNI; Subtilisin novo/CI2 (2CI2); Ile 56; 189.79; -7.6; 1; 37‡
1CHO; α-Chymotrypsin/OMTKY3; Leu 18; 180.33; -7.9; 1; 73‡*
1CSE; Subtilisin C/eglin C (1ACB); Leu 45; 165.07; -5.1; 1; 50‡
1BRS; Barnase/barstar (1A19); Asp 35; 125.06; -2.5; 3; 97‡
1UGH**; UDG/UGI; Leu 272; 180.38; -5.2; 1; 66‡
1DFJ; Ribonuclease inhibitor/ribonuclease A (7RSA); Asn 67; 101.18; -1.2; 8; 41‡
1FSS; AchE/FasII (1FSC); Thr 8; 96.29; -3.4; 4; 99‡
Antigen/Antibody
1BQL; Hyhel5 Fab/QBL (1DKJ); Arg 45; 165.3; -10.1; 1; 49†
1FBI; IgG1 Fab/lysozyme; Arg 73; 132.72; -1.9; 4; 46†
1DQJ; Hyhel63 Fab/HEL (3LZT); Arg 21; 131.4; 5.4
Remaining column entries (rotamer library and others): 7.4†, 96.6‡, 97.4‡, 28.5‡, 92†, 38.3†, 29.1†

Docking with anchor replacement: enzyme/inhibitor complexes
Columns: PDB ID; receptor/ligand; native-like predictions (Bound, UB, Resurf); then, per anchor residue: RMSD from bound (UB, MD); predictions (B, MD)

2SNI; Subtilisin Novo/Chymotrypsin inhibitor 2 (2CI2); 151 62 151 62 124; Met59: 2.9, 1.5; 99, 82; Ile56: 0.6, 0.9; 73, 74
1DFJ; Ribonuclease inhibitor/Ribonuclease A (7RSA); 63, 22, 43; Arg39: 3.6, 2.7; 22, 43
1BRC; Trypsin/APPI (1AAP); 98, 24, 45; Arg15: 3.8, 2.4; 70, 57
1CHO; α-Chymotrypsin/Ovomucoid 3rd domain; 182, 61, 91; Leu/Met18: 32, 68
1BRS; Barnase/Barstar (1A19); 54, 23, 27; Asp35: 0.8, 0.6; 16, 17
1CSE; Subtilisin Carlsberg/Eglin C (1ACB); 176, 105, 116; Leu45: 0.6, 1; 133, 119
1FSS; Snake venom acetylcholinesterase/Fasciculin II (1FSC); 21, 3, 4; Thr9: 0.5, 0.6; 4, 5
2BTF; β-actin/Profilin; 43, 18, 15; Arg74*: 2.4, 1.9; 25, 28
1WQ1; RAS activating domain/RAS; 29, 26, 9; Tyr32: 0.2, 1.5; 26, 21; Gln61*: 2.4, 1.5; 28, 40

Credits
Crystallographers: please submit to CAPRI
Dr. Carlos Camacho (University of Pittsburgh)
Graduate students at Boston University: Stephen Comeau, Deepa Rajamani, Dima Kozakov, Yang Shen, Ryan Brenke
National Institutes of Health
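
Appendix sketch: the Type I to V classification as a decision rule
The classification table in the talk separates complexes by interface area ΔASA (standard interface: 1400 Å² < ΔASA < 2000 Å²), by hydrophobicity, and by backbone change (Cα RMSD > 2 Å for Type V). The Python sketch below turns those thresholds into a rough decision rule. It is only an illustration under stated assumptions: the `Complex` class, the `classify` function, and the Cα RMSD inputs are hypothetical and not part of the talk, and splitting Type I from Type III on the sign of ΔGdes is an assumption based on the favorable/unfavorable desolvation row of Table I. Borderline cases need judgment; for example, the table lists 1mlc (ΔASA = 1390 Å²) under Type III, while a crisp 1400 Å² cutoff would place it in Type IV.

```python
from dataclasses import dataclass

STANDARD_MIN = 1400.0    # Å², lower edge of the "standard" interface
STANDARD_MAX = 2000.0    # Å², upper edge of the "standard" interface
BACKBONE_CUTOFF = 2.0    # Å, Cα RMSD above which backbone change is "substantial"


@dataclass
class Complex:
    name: str
    delta_asa: float   # interface area ΔASA, Å²
    dg_des: float      # desolvation free energy ΔGdes, kcal/mol
    ca_rmsd: float     # unbound-to-bound Cα RMSD, Å (hypothetical input)


def classify(c: Complex) -> str:
    """Return a docking type ("I" to "V") under the simplified rule."""
    if c.delta_asa > STANDARD_MAX:
        # Large interface: rigid (II) vs. substantial backbone change (V).
        return "V" if c.ca_rmsd > BACKBONE_CUTOFF else "II"
    if c.delta_asa < STANDARD_MIN:
        # Small interface, weak and mostly polar interactions.
        return "IV"
    # Standard interface: assumed split on the sign of ΔGdes.
    return "I" if c.dg_des < 0 else "III"


# ΔASA and ΔGdes come from the classification table; the Cα RMSD
# values are placeholders chosen only to exercise the rule.
examples = [
    Complex("1cgi", 1950.0, -18.3, 0.5),
    Complex("1dfj", 2580.0, 18.6, 0.5),
    Complex("1lfd", 1130.0, 3.6, 0.5),
    Complex("1fin", 3390.0, 4.7, 3.0),
]

for c in examples:
    print(c.name, "-> Type", classify(c))
```

With real data the Cα RMSD would come from superimposing the unbound and bound structures, which is usually unavailable before docking; this is one reason the Type II vs. Type V question for large interfaces remains open in the talk.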