8. Protein Docking 1 Prediction of protein-protein interactions 1. How do proteins interact? 2. Can we predict and manipulate those interactions? Prediction of Structure – Docking Prediction of Binding Design – creation of new interactions 2 Docking vs. ab initio modeling de novo Structure Prediction (ROSETTA) Sequence ADEFFGKLSTKK……. O N N O O Monomers + O N ... Docking (ROSETTADOCK) N ... O O N Building Blocks: backbone & side chains CASP Structure Rigid body degrees of freedom 3 translation 3 rotation CAPRI Complex 3 Protein-protein docking Aim: predict the structure of a protein complex from its partners + Rigid body degrees of freedom 3 translation 3 rotation Monomers Complex 4 Monomers change structure upon binding to partner + = Solution 1: Tolerate clashes + = Fast ↓ Weak discrimination of correct solution Solution 2: Model changes + = ↓ Slow Precise 5 Protein-protein docking Sampling strategies Initial approaches: Techniques for fast detection of shape complementarity 1. Fast Fourier Transform (FFT) 2. Geometric hashing Advanced high-resolution approaches: model changes explicitly 3. Rosettadock Data-driven docking 4. Haddock 6 Find shape complementarity: 1. Fast Fourier Transform (FFT) Ephraim Katzir + 7 Find shape complementarity - FFT Ephraim Katzir 8 Find shape complementarity: Fast Fourier Transform (FFT) Ephraim Katzir Correlation Test all possible positions of ligand and receptor: • For each rotation of ligand (R) • evaluate all translations (T) of ligand grid over Y Translation X Translation receptor grid z = correlation product: can be calculated by FFT 9 Find shape complementarity: Fast Fourier Transform (FFT) Ephraim Katzir R Discretize R R Fast Fourier Transform A=DFT(a) Computational cost: N3logN3 (instead of N6) L Rotate Surface 1 L Discretize Interior <0 for R >0 for L Correlation function C=A*B S=iDFT(C) L Fast Fourier Transform B=DFT(b) 10 From http://zlab.bu.edu/~rong/be703/ Find shape complementarity: Fast Fourier Transform (FFT) Correlation Increase the speed by 107 IFFT R L Y Translation Surface X Translation Interior Binding Site 11 From http://zlab.bu.edu/~rong/be703/ Some FFT-based docking protocols • • • • • • Zdock (Weng) Cluspro (Vajda, Camacho) PIPER (Vajda, Kozakov) Molfit (Eisenstein) DOT (TenEyck) HEX (Ritchie) – FFT in rotation space 12 Shape complementarity: 2. Geometric hashing (patchdock, Wolfson & Nussinov) Matching of puzzle pieces 1. Define geometric patches (concave, convex, flat) 2. Surface patch matching 3. Filtering and scoring 13 From http://bioinfo3d.cs.tau.ac.il/PatchDock/patchdock.html Hashing: alpha shapes • Formalizes the idea of “shape” • In 2D an “edge” between two points is “alpha-exposed” if there exists a circle of radius alpha such that the two points lie on the surface of the circle and the circle contains no other points from the point set 14 Hashing – sparse surface representation 15 Slide from Jens Meiler Docking with geometric hashing PATCHDOCK • Fast and versatile approach • Speed allows easy extension to multiple protein docking, flexible hinge docking, etc • A extension of this protocol, FIREDOCK, includes side chain optimization (RosettaDocklike) – very flexible, fast and accurate protocol 16 High-resolution docking with Rosetta: Rosettadock Random Start Random Start Position Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Clustering Predictions 105 17 Choosing starting orientations 1. Global search Random Translation Random Rotation (Euler Angles) 1. 2. 3. Tilt direction [0..360o] Tilt angle [0:90o] Spin angle [0..360o] • Euler angles are independent and guarantee non-biased search 18 Choosing starting orientations 2. Local Refinement Translation 3Å normal, 8Å parallel Rotation 80 1. 2. 3. Tilt direction [0±8o] Tilt angle Spin angle 19 Overview of docking algorithm Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Clustering Predictions 105 20 Low-resolution search 1. 2. 3. 4. Perturbation Monte Carlo search Rigid body translations and rotations Residue-scale interaction potentials Protein representation: backbone atoms + average centroids O N ... N O O Mimics physical O N N ... diffusion process O 21 O Residue-scale scoring Score Representation Physical Force rcentroid-centroid < 6 Å Attractive van der Waals Bumps (r – Rij)2 Repulsive van der Waals Residue environment -ln(Penv) Solvation -ln(Pij) Hydrogen bonding electrostatics, solvation -1 for interface residues in Antibody CDR (bioinformatic) varies (biochemical) Contacts Residue pair Alignment Constraints 22 Overview of docking algorithm Random Start Position Low-Resolution Monte Carlo Search Filters HighResolution Refinement Clustering Predictions 105 23 High resolution optimization: Monte Carlo with Minimization (MCM) Cycles of iterative optimization Random perturbation Side chain optimization Random perturbation Side chain optimization Rigid body minimization Rigid body minimization START Energy MC FINISH Rigid body orientations 24 Overview of docking algorithm Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Clustering Predictions 105 25 Filters Low resolution • Antibody profiles • Antigen binding residues at interface • Contact filters • Biological information • Interface residues • Interacting residue pair High resolution • Energy filters speed up creation of low energy models Random perturbation Monte-Carlo (MC) optimization Minimization of rigid body orientation 5 cycles of MC optimization 45 cycles of MC optimization Final scoring Filter1 Filter2 Filter3 26 Overview of docking algorithm Random Start Position Low-Resolution Monte Carlo Search Filters High-Resolution Refinement Clustering Predictions 105 27 Clustering • Compare all top-scoring decoys pairwise • Cluster decoys hierarchically • Decoys within e.g. 2.5Å form a cluster Represents ENTROPY 28 Assessment 1: Benchmark studies Benchmark set contains 54 targets for which bound and unbound structures are known http://zlab.bu.edu/zdock/benchmark.shtml • Bound-Bound – Start with bound complex structure, but remove the side chain configurations so they must be predicted trypsin + inhibitor barnase + barstar • Unbound-Unbound – Start with the individuallycrystallized component proteins in their unbound conformation • Bound-Unbound (Semibound) lysozyme + antibodies 29 Assessment of method on benchmark (54 proteins, Gray et al., 2003) funnel - 3/5 top-scoring models within 5A rmsd Overall performance Bound Docking Perturbation1 42/54 Unbound Docking Perturbation2 32/54 Unbound Docking Global3 1. 2. 3. …….. More than three of top five decoys (by score) that have rmsd less than 5 Å More than three of top five decoys (by score) that predict more than 25% native residue contacts The rank of the first cluster with >25% native residue contacts 28/32 30 Δ score (calculated) Score and performance are correlated with binding affinity -log Ka (experimental) targets with funnels targets without funnels Δ score for bound backbone docking 31 Limitation of “rotamer-based” modeling Near-native model with clash Trp 172 Non-native model without clash Trp 215 Orange and red: native complex; Blue: docking model. PDB code: 1CHO 32 Improved side chain modeling at interface Minimization Rtmin: rotamer trial with minimization • • • Rot I • Rot II Native Randomly pick one residue. Screen a list of rotamers. Minimize each of these rotamers. Accept the one that yields the lowest energy. Additional rotamers • Include free side chain conformation in rotamer library Wang, OSF & Baker,33 2005 RosettaDock simulation 1 model/simulation: energy vs RMSD Final model selected based on energy (and/or sample density) Energy (structural similarity to starting model) Rigid body orientations: RMSD to arbitrary starting structure (Å) 34 RosettaDock simulation 2. Refinement Energy 1. Initial Search (Å) RMSD to arbitrary starting structure RMSD to starting structure of refinement 35 Side chain flexibility is important CAPRI Target 12 Cohesin-Dockerin 0.27Å interface rmsd 87% native contacts 6% wrong contacts Overall rank 1 Dockerin Cohesin red,orange– xray blue – model; green – unbound Carvalho et. al (2003)36PNAS Details of T12 interface Dockerin R53 S45 D39 L22 Y74 N37 L83 E86 Cohesin red,orange– xray blue - model 37 Similar landscapes for different Rosetta predictions Docking Folding Energy function describes well principles energy landscape energy landscape underlying the correct structure of monomers and complexes Phil Bradley 38 Schueler-Furman et. al (2005) Science A Challenging Target RF1-HEMK (T20) Challenge: • Large complex • RF1 to be modeled from RF2 • Disordered Q-loop RF1 RF1 Q-loop Q-loop Q252 loop1 Q252 loop1 loop1 Q-loop loop2 loop2 Q-235 Q-235 Q-235 loop2 HemK HemK Hope: • Q235 methylated • A Gln analog in HemK crystal Strategy: • Trimming – Docking – Loop Modeling - Refining Keys to success: Location of interface with truncated protein Separate modeling of large conformational change in key loop 39 Prediction of large conformational change Q-loop Gln235 I_rmsd 2.34 Ǻ F_nat 34.2% GLN235 C atom shift:14.13Ǻ to 3.91 Ǻ Q-loop global C rmsd: 11.8 Ǻ to 4.8 Ǻ Red, orange – bound; Green,– unbound; Blue -- model 40 Docking with backbone minimization 1 C N 1’ C random perturbation 2SNI Interface energy Fold tree N Red: bound rigid Green: unbound rigid Blue: unbound flexible Interface RMSD # of “hits” in top 10 models repack 10 9 START 8 7 6 Rigid-body Backbone Sidechain minimization 5 4 FINISH Docking Monte Carlo Minimization (MCM) 3 2 1 0 1DFJ 1DQJ 1FSS 1GLA 1UGH 1WQ1 41 2SNI Docking with loop minimization Fold-tree N N 1 2 x 1’ 2’ C C Minimize rigid-body and loop simultaneously Flexible Docking All-atom energy Correctly predicted loop conformation Interface RMSD Red, orange – bound (1T6G, Sansen, S. et al, J.B.C.(2004)); Blue – model; Green – unbound (1UKR, Krengel U. et al, JMB (1996))42 Docking with loop rebuilding 1BTH All-atom energy Bound rigid Ligand RMSD unbound rigid unbound flexible loop 43 Flexible backbone protein–protein docking using ensembles • Incorporate backbone flexibility by using a set of different templates • Generation of set of ensembles: with Rosetta relax protocol, from NMR ensembles, etc 44 Chaudhury & Gray, (2008) Sampling among conformers during docking • Exchange between templates during protocol 45 Evaluation of 4 different protocols 1. key-lock (KL) model rigid-backbone docking 2. conformer selection (CS) model ensemble docking algorithm • Can teach us about the possible binding mechanism (e.g. induced fit vs key-lock) 3. induced fit (IF) model energy-gradient-based backbone minimization 4. combined conformer selection/induced fit (CS/IF) model Brown: high-quality decoys Orange: medium-quality decoys 46 RosettaDock - summary • First program to introduce general (side chain) flexibility during docking • Advanced the docking field towards unbiased high-resolution modeling • Many other protocols have since then incorporated RosettaDock as a high-resolution final step • Targeted introduction of backbone flexibility can improve modeling dramatically 47 4. Data-driven docking • Challenges: – Large conformational space to sample – Conformational changes of proteins upon binding • Approach: restrict search space by previous information – HADDOCK (High Ambiguity Driven protein-protein Docking) 48 Scheme of Haddock Bonvin, JACS 2003 • Information about complex can be retrieved from several sources 49 http://www.nmr.chem.uu.nl/haddock/ Haddock computational scheme 1. Derive Ambiguous Interaction Restraints (AIRs): – Active residues: involved in interaction, and solvent accessible – Passive residues: neighbors of active residues 2. Create CNS restraints file (Used in NMR structure determination) Rational: • Include AIRs in energy function • find protein complex structure with minimum energy Similar to – solving a structure by NMR – Homology modeling with constraints (e.g. Modeler) 50 Overview of Haddock Start Position Rigid body energy minimization: 1. rotational minimization 2. rotational & translational • Align molecules if anisotropic data is available • Satisfy maximum number of AIC • Retain top200 Predictions Semi-flexible simulated annealing (SA) •High temperature rigid body search •Rigid body SA •Semi-flexible SA with flexible side-chains at the interface •Semi-flexible SA with fully flexible interface (both backbone and side-chains) Flexible explicit solvent refinement •Improves energy ranking Clustering 51 Docking – Summary & Outlook • Efficient search using – fast sampling techniques (e.g. FFT, Geometric hashing), or/and – Restraints to relevant region (e.g. biological constraints, etc) • Challenge: conformational changes in the partners • Introduction of flexibility has improved modeling to high resolution – Full side chain flexibility (Rosetta) – Targeted introduction of backbone flexibility • Larger changes can be incorporated using techniques such as Normal Mode Analysis 52