• L12 ‐ Introduction to Protein Structure; Structure Comparison & Classification p • L13 ‐ Predicting protein structure • L14 ‐ L14 Predicting di i protein i interactions i i g y Networks • L15 ‐ Gene Regulatory • L16 ‐ Protein Interaction Networks • L17 ‐ Computable Network Models 1 Predictingg Protein Structure secondary structure IQVFLSARPPAPEVSKIY DNLILQYSPSKSLQMILR RALGDFENMLADGSFR AAPKSYPIPHTAFEKSIIV QTSRMFPVSLIEAARNH FDPLGLETARAFGHKLA TAALACFFAREKATNS domain structure novel 3D structure 2 Courtesy of RCSB Protein Data Bank. Used with permission. Statisticians vs. vs Physicists Courtesy of VADLO.com. Used with permission. © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 3 What were the key simplifications of the statistical approach? Courtesy of VADLO.com. Used with permission. © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 4 Threading (fold recognition) IQVFLSARPPAPEVSKIY DNLILQYSPSKSLQMILR RALGDFENMLADGSFR AAPKSYPIPHTAFEKSIIV QTSRMFPVSLIEAARNH FDPLGLETARAFGHKLA TAALACFFAREKATNS How could we use the potential energy function to recognize the correct fold? 5 5 Courtesy of RCSB Protein Data Bank. Used with permission. Methods for Refining Structures 1. Energy minimization 2. Molecular dynamics 3. Simulated Annealing 6 1 Energy Minimization 1. © American Chemical Society. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Baker, David, and David A. Agard. "Kinetics Versus Thermodynamics in Protein Folding." Biochemistry 33, no. 24 (1994): 7505-9. 7 Consider a small error in a structure • True T structure • Misplaced Mi l d side id chain h i Courtesy of Swiss Institute of Bioinformatics. Used with permission. ca o a e bo ds cannot make h‐bonds Examples from this good tutorial 8 Can Can we we re resto storre the the side side chain chain?? Courtesy of Swiss Institute of Bioinformatics. Used with permission. Energy 9 Angle Minimization • We have equations for U(x,y,z). • Find nearby values of x,y,z that minimize U. Courtesy of Swiss Institute of Bioinformatics. Used with permission. Energy 10 Angle G di Gra dientt Descent D t f x 0 http://mathworld.wolfram.com/MethodofSteepestDescent.html 11 Gradi dientt Descent D t xi xi1 f xi1 12 http://mathworld.wolfram.com/MethodofSteepestDescent.html Gradient di t Descent D t xi xi1 f xi1 Can require many Can many iterations http://mathworld.wolfram.com/MethodofSteepestDescent.html One dimension One N dimensions xi xi1 f xi1 x1 x0 U U 0 ( x0 ) Where gradient is defined as Since Force is F U U U U ..., ,,..., xn x1 x1 x0 F0 (x ( 0) each step step is is moving ing in in the the direction direction of of the the force Minimization • Convergence can be a problem on some surfaces More surfaces. More sophisticated sophisticated approaches approaches are are available • Our example used a continuous energy function,, but can be define for discrete optimization too. 15 Can Can we we restore the the side side chain? Starting conforma Starting conformation tion Unsuccessful minimization Unsuccessful minimization Courtesy of Swiss Institute of Bioinformatics. Used with permission. Energy Good tutorial Always limited to local search Angle 2. Molecular Dynamics 2 Molecular Dynamics • Seeks to simulate the motion of molecules • Can Can escape escape local minima minima x (ti) = x (ti‐1) + v(ti‐1) × (ti − ti‐1) F (ti 1 ) (ti ti 1 ) v(ti ) v(ti 1 ) m U (ti 1 ) (ti ti 1 ) v (ti ) v (ti 1 ) m Movie: Simulation of prot Movie: Simulation of protein folding 17 Notes • Short simulations take tremendous computing resources • Length of simulation and protocol determine radius radius of of convergence 18 3. Simula Simulatted Annealing • Phyysical annealingg ‐ high temperature is used to avoid metal defects (local minima). © Homesteading Self Sufficiency Survival. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 19 • Simulated annealingg is analogous, and can be app pplied to manyy optimization problems n. Courtesy of Henry Wang. Used with permission. Courtesy of Henry Wang. Used with permission. 3. Simulated Simulated Annealing • At higgh tempera p ture the kinetic energy exceeds the potential energy gy barrier EEnergy EEnergy • At low temp perature we cannot escape local minima Angle 20 Angle 3 Simulated Annealing 3. • Atoms find equilibrium q distribution at higher p temperatures • How can we find the equilibrium distribution p of a complicated potential function? © Homesteading Self Sufficiency Survival. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Courtesy of Henry Wang. Used with permission. http://homesteadingsurvival.myshopify.com/p 19 roducts/115‐blacksmithing‐forging‐welding‐ metallurgy‐sword‐books‐on‐dvd‐rom 21 http://www.stanford.edu/~hwang41/mcmc.pn http://www stanford edu/~hwang41/mcmc pn g 3. 3 Simulated Simulated Annealing • Start at higgh temp perature • Find most probable states • Reduce Reduce temperature temperature to to trap trap these these states © Homesteading Self Sufficiency Survival. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Courtesy of Henry Wang. Used with permission. http://homesteadingsurvival.myshopify.com/p 19 roducts/1 15‐blacksmithing‐forging‐welding‐ metallurgy‐sword‐books‐on‐dvd‐rom 22 ng41/mcmc.pn http://www stanford.edu/~hwa http://www. stanford edu/~hwang41/mcmc g Metropolis Algorithm • Goal: efficiently search a large conformation sp pace. • Can be understood in terms of physical processes, but processes but much much more more general • Note the difference from molecular dynamics: – Molecules move under physical forces but temp peratures are far outside of normal rangge – A sampling method not a simulation! 23 Acceptance Criteria Acceptance • Randomly choose neighboring state: – Alwayys accep pt moves that reduce potential – Go uphill (higher potential) based on odds ratio Etest / kT En / kT P ( Stest) e e ( Etest E n ) / kT / e P ( Sn) Z (T ) Z (T ) 1 24 3 Metropolis 3. Metropolis sampling Iterate for a fixed number of cyycles or until convergence: with energy energy En 1. Start Start with with a a system in in state state Sn with 2. Choose a neighboring state at random; we will call it the proposed state : Stest with energy Etest 3 3. If If Etest< < En : : Sn+1=Stest ( Etest E ) / kT 4. Else set Sn+1= Stest with probability P e n – otherwise Sn+1= Sn 25 1 Minimization vs. simulated annealing 26 P=e‐ΔG/kT 3 1 P=1 P=1 5 P=e‐ΔG/kT 2 27 4 Acceptance Criteria Acceptance • Always go down‐hill • Go Go uphill uphill based based on on odds odds ratio Etest / kT En / kT P ( Stest) e e (( Etest E n ) / kT k / e P ( Sn) Z (T ) Z (T ) How does T alter outcome? 28 Acceptance Acceptance Criteria • Always go down‐hill • Go Go uphill uphill based based on on odds odds ratio Etest / kT En / kT P ( Stest) e e (( Etest E n ) / kT k / e P ( Sn) Z (T ) Z (T ) Annealing schedule: g •Start at High T •Lower sllowly l 29 3. 3 Metropolis Metropolis sampling sampling – prob prob. version To identifyy minima given a probabilityy function: P(S) ( ) 1. Start with a system in state Sn 2. Choose a neighboring state at random : Stest P ( Stest ) 3. Compute Compute acceptance acceptance ratio a P( S N ) 4. If a> 1 : Sn+1=Stest 5. Else set Sn+1= Stest with probability a and Sn+1= Sn with p probability 1‐a Not specific to protein structure. Used to sample diverse probability diverse probability distributions 30 Review: Methods for Refining Structures 1. Energy minimization 2. Molecular dynamics 3. Simulated annealing 31 Methods for Predicting Structure IQVFLSARPPAPEVSKIY DNLILQYSPSKSLQMILR RALGDFENMLADGSFR AAPKSYPIPHTAFEKSIIV QTSRMFPVSLIEAARNH FDPLGLETARAFGHKLA TAALACFFAREKATNS novel 3D structure Courtesy of RCSB Protein Data Bank. Used with permission. 32 What actually works for structure prediction? 33 Rosetta Raman et al. Proteins 2009; 77(Suppl 9):89–99. http://onlinelibrary.wiley.com/doi/10.1002/prot.22540/full • Two types of models: –Homology –de novo 34 Homology • Allign query to sequences in PDB • Use several aliggnment methods • Three categories of queries: 1. High High sequence sequence similarity similarity template(s) template(s) (>50% (>50% sequence similarity). 2. Medium Medium sequence sequence similarity similarity template(s) (20 (20–50% –50% sequence similarity). 3. Low Low sequence sequence similarity similarity template(s) (<20% (<20% sequence similarity). 35 Homology • Align query to sequences in PDB • Use Use se several veral alignment alignment methods methods • Refine models 36 tools you have seen earlier in the course the Gener General al Refinement Refinement Procedure Procedure • Random changes to backbone torsion angles • Rot Rotamer amer optimization optimization of of side side chains • Energy minimization of torsion angles (bond llengths h and anglles kkept fi fixed) d) 37 Homology High sequence similarity template(s) (>50% seq quence similarity) y). – Minimal refinement, focused on regions where alignment alignment is is poor. 38 Homology Medium sequence similarity template(s) (20– 50% seq quence similarity) y). • Proceed with several alignments • Refi fine structures • Choose best model byy final energy gy 39 Homology Med dium sequence simillarity templlate((s)) ((20– 50% sequence similarity). • Refinement focuses on regions near gaps and insertions,, loop ps in the startingg model,, and sequence segments with low conservation • Replaces Replaces to torsion rsion angles with those from peptides of known structure • Minimize Mi i i llocall structure t t • Refine global structure 41 Native structure Best model Best temp plate © Wiley-Liss. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for CASP8 with All‐atom Refinement using Rosetta." Proteins: Structure, Function, and Bioinformatics 77, no. S9 (2009): 89-99. Accurate Accurate side side chains chains in in core © Wiley-Liss. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for CASP8 with All‐atom Refinement using Rosetta." Proteins: Structure, Function, and Bioinformatics 77, no. S9 (2009): 89-99. 42 Homology Low sequence similarity template(s) (<20% seq quence similarity) y). • Use many more starting models • More aggressiive refi finement strategy – Rebuild secondary structure elements in addition to regions refined in medium homology: • ggaps p and insertions • loops in the starting model • reggions with low conservation 43 Native Best Model Native Best Model © Wiley-Liss. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for CASP8 with All‐atom Refinement using Rosetta." 44 Proteins: Structure, Function, and Bioinformatics 77, no. S9 (2009): 89-99. de novo • When there is no suitable homology model: – Monte Carlo search for backbone anggles • Choose a short region (3‐9 amino acids) of backbone • Set Set torsion torsion angles angles to to those those of of aa similar similar peptide peptide in in PDB • Accept with metropolis criteria – 36 36,000 000 MC steps – Repeat entire process to get 2,000 final structures – Cluster structures – Refine clusters 45 How has modeling changed during the CASP challenges? 46 Improvement over the last decade: Percentage of residues successfully modeled CASP10 results compared to those of previous CASP experiments Andriy Kryshtafovych, Krzysztof Fidelis, John Moult DOI: 10.1002/prot.24448 % of residues modeled that were not in best b template Each point represents the best model for a target CASP10 and CASP9 are similar, much better than CASP5. © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74. 47 Target difficulty ‐based on structural and sequence similarity of a target to proteins of known structure Overall Prediction Accuracy Did Not Improve CASP10 results compared to those of previous CASP experiments DOI: 10.1002/prot.24448 Global distance test GDT TS GDT_TS =overall accuracy of a model average % of Cα atoms in the prediction close to corresponding atoms in the target structure Perfect model: Random model: 90‐100 90 100 20‐30 © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74. 48 Target difficulty is based on structural and sequence similarity of a target to proteins of known structure Overall Prediction Accuracy Did Not Improve CASP10 results compared to those of previous CASP experiments DOI: 10.1002/prot.24448 Why is the trend line the same? Are targets getting harder in other ways? Multi‐domain, multi‐chain multi ‐chain, etc etc. © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74. 49 Target difficulty is based on structural and sequence similarity of a target to proteins of known structure Free modeling results de de novo (no template) 50 Free Modelingg in Flux CASP10 results compared to those of previous CASP experiments Andriy Kryshtafovych, Krzysztof Fidelis, John Moult DOI: 10.1002/prot.24448 Global distance test GDT_TS =overall accuracy of a model average percentage of Cα atoms in the prediction close to corresponding atoms in the target structure Perfect model: 90‐100 Random model: 20‐30 © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74. 51 Free Modelingg in Flux CASP10 results compared to those of previous CASP experiments Andriy Kryshtafovych, Krzysztof Fidelis, John Moult DOI: 10.1002/prot.24448 GDT_TS Perfect model: 90‐100 Random model:20‐30 •CASP9 results for <120 AA were great 5/11 had GDT >60. great. >60 •CASP10 were mediocre (three models d l >60 60 b but ffour <40) 0) •CASP5 had only 1/5 >60 © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74. 52 Free Modelingg in Flux CASP10 results compared to those of previous CASP experiments Andriy Kryshtafovych, Krzysztof Fidelis, John Moult DOI: 10.1002/prot.24448 “Current FM [free modeling] methods perform best on single domain g pp progress g regular structures... The apparent lack of p in CASP10 and ROLL compared with CASP5 probably again reflects the more difficult nature of CASP10 targets. First, many targets which in CASP5 would have been in this category now have templates … CASP10 FM targets g exhibit more irregularity, g y, and more of a tendencyy to be domains of larger proteins that are hard to identify from sequence and that may be dependent on the rest of the structure for their conformation. 53 Statisticians Statisticians vs vs. Physicists Physicists Rosetta Courtesy of VADLO.com. Used with permission. 54 • Leverage everyythingg we know about existing p structures of proteins and peptides to build startingg models • Refine using a knowledge based knowledge‐based potential Statisticians vs. Physicists DE Shaw • DON’T CHEAT! • Only use physical forces forces. • Fold proteins by simulating the simulating the in in vitro vitro process © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 55 DE Shaw • Lindorff‐Larsen et al. (2011)) Science • Simulate protein folding. • Why Wh had h d no one else l succeeded d d at this? hi ? Courtesy of Nature Publishing Group. Used with permission. Source: Dill, Ken A. and Hue Sun Chan. "From Levinthal to Pathways to Funnels." Nature Structural Biology 4, no. 1 (1997): 10-9. 56 DE Shaw • Lindorff‐Larsen et al. ((2011)) Science • Simulate protein folding. • Built B il a specialized i li d supercomputer – Hundreds of application specific integrated circuits © The ACM. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Shaw, David E., Martin M. Deneroff, et al. "Anton, A Special-purpose Machine for Molecular Dynamics Simulation." Communications of the ACM 51, no. 7 (2008): 91-7. 57 58 © American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Lindorff-Larsen, Kresten, Stefano Piana, et al. "How Fast-folding Proteins Fold." Science 334, no. 6055 (2011): 517-20. FoldIT Game © The New York Times Company. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Markoff, John. "In a Video Game, Tackling the Complexities of Protein Folding." The 59 New York Times. August 4, 2010. Predictions So far: protein structure Next: protein interactions © American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Lindorff-Larsen, Kresten, Stefano Piana, et al. "How Fast-folding Proteins Fold." Science 334, no. 6055 (2011): 517-20. 60 © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Prediction Challenges • Predict effect of point mutations • Predict structure of complexes • Predict all interacting proteins 61 •DOI: 10.1002/prot.24356 “Simple” Simple challenge: Starting with known structure of a complex: predict how much a binding mutation i changes h bi di affinity. © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure, Function, and Bioinformatics 81, no. 11 (2013): 1980-7. 62 •DOI: 10.1002/prot.24356 •All possible single‐point mutations at each of 53 and 45 positions for two proteins. •Expressed on yeast •High‐throughput assay based on sequencing used to estimate changes in binding affinity © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure, Function, and Bioinformatics 81, no. 11 (2013): 1980-7. 63 •DOI: 10.1002/prot.24356 How could we make quantitative predictions of binding energy for mutants? © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure, Function, and Bioinformatics 81, no. 11 (2013): 1980-7. 64 Color based on predictions O Observ ved improved neutral reduced Note: B tt att predicting Better di ti deleterious mutations © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure, Function, and Bioinformatics 81, no. 11 (2013): 1980-7. interface! This is one of the top performers analyzing residues at the 65 •DOI: 10.1002/prot.24356 Top performer Average group All site es improved neutrall reduced Interfacce Color based on participant’s predictions 66 © Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure, Function, and Bioinformatics 81, no. 11 (2013): 1980-7. •DOI: 10.1002/prot.24356 What’s a good “baseline” for modeling? • Does structure/energy help? 67 What’s a good “baseline” for modeling? • Does structure/energy help? • Naïve model: – Give each mutant a score equal to the BLOSUM matrix value ((‐4 4 to 11) – As we vary the cutoff, how many mutations do we predict di correctly? l ? 68 Area under curve for predictions (varying cutoff in ranking) © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. •DOI: 10.1002/prot.24356 69 Predicted to be deleterious Predicted to be beneficial Comparing one of the best to BLOSUM © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. •DOI: 10.1002/prot.24356 70 Predicted to be deleterious Predicted to be beneficial Area under curve for predictions (varying cutoff in ranking) First Round Second Round. Round (Given data for nine random mutations at each position) BLOSUM © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. •DOI: 10.1002/prot.24356 71 Summary • Best groups are only three times better than expected from a random assignment. p g • Predicting the effect of mutations on polar starting positions appears to be a particular challenge. 72 Summary • Best approaches require explicit consideration of the effects of mutations on stabilityy A + B AB * A + B* AB 73 Summary • Best approaches require explicit consideration of the effects of mutations on stabilityy unfolded A + B AB * AB* A + B unfolded For more details see http://ocw.mit.edu/courses/biological‐engineering/20‐320‐analysis‐of‐biomolecular‐and‐cellular‐systems‐fall‐2012/modeling‐and‐manipulating‐biomolecular‐interactions/MIT20_320F12_Tpc_3_Mol_Des.pdf 74 Summary • Best approaches h require explicit l consideration d of the effects of mutations on stability • The best performing groups also modeled packing, p g, electrostatics and solvation. • The best methods used : – machine learning (G21, (G21 Fernandez‐Recio, Fernandez Recio and G05s, Bates) – atom‐level atom level energy functions (G15, (G15 Weng) – coarse‐grained models (G21s, Dehouck) 75 G21 • Database of 930 (ΔΔG,mutation) pairs • Predict structure with FoldX (empirical force field) • Describe ib each h mutant with i h 85 8 features f i using measures from FoldX, PyRosetta, FireDoc, PyDoc, SIPPER, CHARMM, NIP/NSC and others. 76 G21 • Train learners: l random d fforest, neurall networks, probabilistic classifiers, etc. • Evaluate with cross‐validation • Use combined results from five classifiers: – Random forest – Decision table – Bayesian net – Logistic regression – Alternating decision tree 77 Prediction Challenges • Predict effect of point mutations • Predict structure of complexes • Predict all interacting proteins 78 Predicting Structures of Complexes • Can we use structural data to predict complexes? p p • This might be easier than quantitative predictions for site mutants. • But it requires us to solve a docking problem © source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. 79 Docking Which surface(s) of protein A interactions with which surface of protein B? Courtesy of Nurcan Tuncbag. Used with permission. N. Tuncbag 80 Time is an issue • Imagine we wanted to predict which proteins interact with our favorite molecule. – For each potential partner • Evaluate all possible relative positions and orientations – allow for structural rearrangements » measure energy of interaction • This approach would be extremely slow! • It’s It’ also l prone tto ffalse l positives. iti – Why? 81 Reducing the search space • Use prior knowledge of interfaces to focus analysis residues y on particular p • Find ways to choose potential partners – What Wh t role l should h ld structural t t l homology h l play? l ? 82 Subtilisin and its inhibitors Although global folds of Subtilisin’s partners are very different, binding regions are structurally very conserved. N. Tuncbag Courtesy of Nurcan Tuncbag. Used with permission. 83 Hotspots Figure from Clackson & Wells (1995). 84 © American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Clackson, Tim, and James A. Wells. "A Hot Spot of Binding Energy in a Hormone-Receptor Interface." Science 267, no. 5196 (1995): 383-6. Hotspots © American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Clackson, Tim, and James A. Wells. "A Hot Spot of Binding Energy in a Hormone-Receptor Interface." Science 267, no. 5196 (1995): 383-6. 85 Figure from Clackson & Wells (1995). • Fewer than 10% of the residues at an interface contribute more than 2 kcal/mol to binding. • Hot spots – rich in Trp, Arg and Tyr – occur on pockets on the two proteins that have plementaryy shap pes and distributions of comp charged and hydrophobic residues. – can include buried charge residues far from solvent – O‐ring ring structure excludes solvent from interface http://onlinelibrary.wiley.com/doi/10.1002/prot.21396/full 86 Next Lecture Fast and accurate modeling of protein‐protein i t interactions ti by combining template‐ interface based docking interface‐based with flexible refinement. Tuncbag N, N Keskin O, O Nussinov R, Gursoy A. Structure‐based prediction of protein–protein i t interactions ti on a genome‐wide scale Zhang, et al. http://www.nature.com/nat ure/journal/v490/n7421/f http://www.ncbi.nlm.nih.go http://www ncbi nlm nih go ull/nature11503.html v/pubmed/22275112 87 MIT OpenCourseWare http://ocw.mit.edu 7.91J / 20.490J / 20.390J / 7.36J / 6.802J / 6.874J / HST.506J Foundations of Computational and Systems Biology Spring 2014 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.