to Protein Structure;       L12 ‐ Introduction

advertisement
• L12 ‐ Introduction to Protein Structure; Structure Comparison
& Classification
p
• L13 ‐ Predicting protein structure
• L14 ‐
L14 Predicting
di i protein
i interactions
i
i
g
y Networks
• L15 ‐ Gene Regulatory
• L16 ‐ Protein Interaction Networks
• L17 ‐ Computable Network Models
1
Predictingg Protein Structure
secondary structure
IQVFLSARPPAPEVSKIY
DNLILQYSPSKSLQMILR
RALGDFENMLADGSFR
AAPKSYPIPHTAFEKSIIV
QTSRMFPVSLIEAARNH
FDPLGLETARAFGHKLA
TAALACFFAREKATNS
domain structure
novel 3D structure
2
Courtesy of RCSB Protein Data Bank. Used with permission.
Statisticians vs.
vs Physicists
Courtesy of VADLO.com. Used with permission.
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
3
What were the key simplifications of the statistical approach?
Courtesy of VADLO.com. Used with permission.
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
4
Threading (fold recognition)
IQVFLSARPPAPEVSKIY
DNLILQYSPSKSLQMILR
RALGDFENMLADGSFR
AAPKSYPIPHTAFEKSIIV
QTSRMFPVSLIEAARNH
FDPLGLETARAFGHKLA
TAALACFFAREKATNS
How could we use the potential energy function to recognize the correct fold?
5
5
Courtesy of RCSB Protein Data Bank. Used with permission.
Methods for Refining Structures
1. Energy minimization
2. Molecular dynamics
3. Simulated Annealing
6
1 Energy Minimization
1.
© American Chemical Society. All rights reserved. This content is excluded from our Creative Commons license. For more information,
see http://ocw.mit.edu/help/faq-fair-use/.
Source: Baker, David, and David A. Agard. "Kinetics Versus Thermodynamics in Protein Folding." Biochemistry 33, no. 24 (1994): 7505-9.
7
Consider a small error in a structure
• True
T structure
• Misplaced
Mi l d side
id chain
h i
Courtesy of Swiss Institute of Bioinformatics. Used with permission.
ca o a e bo ds
cannot make h‐bonds
Examples from this good tutorial 8
Can
Can we
we re
resto
storre the
the side
side chain
chain??
Courtesy of Swiss Institute of Bioinformatics. Used with permission.
Energy
9
Angle
Minimization
• We have equations for U(x,y,z).
• Find nearby values of x,y,z that minimize U.
Courtesy of Swiss Institute of Bioinformatics. Used with permission.
Energy
10
Angle
G di
Gra
dientt Descent
D
t
f  x   0
http://mathworld.wolfram.com/MethodofSteepestDescent.html
11
Gradi
dientt Descent
D
t
xi  xi1   f   xi1 
12
http://mathworld.wolfram.com/MethodofSteepestDescent.html
Gradient di t Descent
D
t
xi  xi1   f   xi1 
Can require many Can
many iterations
http://mathworld.wolfram.com/MethodofSteepestDescent.html
One dimension
One
N dimensions
xi  xi1   f   xi1 
 

x1  x0  
U
U 0 ( x0 )
Where gradient is defined as
Since Force is F  U
 U
U 
U  
...,
,,...,

xn 
 x1
 

x1  x0   F0 (x
( 0)
each step
step is
is moving ing in
in the
the direction direction of of the
the force
Minimization
• Convergence can be a problem on some surfaces More surfaces.
More sophisticated
sophisticated approaches
approaches are are
available
• Our example used a continuous energy function,, but can be define for discrete optimization too. 15
Can Can we we restore the the side side chain?
Starting conforma
Starting
conformation
tion
Unsuccessful minimization
Unsuccessful minimization
Courtesy of Swiss Institute of Bioinformatics. Used with permission.
Energy
Good tutorial
Always limited to local search
Angle
2. Molecular Dynamics
2
Molecular Dynamics
• Seeks to simulate the motion of molecules
• Can
Can escape
escape local minima
minima
x (ti) = x (ti‐1) + v(ti‐1) × (ti − ti‐1)
F (ti 1 )
 (ti  ti 1 )
v(ti )  v(ti 1 ) 
m
U (ti 1 )
 (ti  ti 1 )
v (ti )  v (ti 1 ) 
m
Movie: Simulation of prot
Movie: Simulation of protein folding
17
Notes
• Short simulations take tremendous computing resources
• Length of simulation and protocol determine radius radius of of convergence
18
3. Simula
Simulatted Annealing
• Phyysical annealingg ‐
high temperature is used to avoid metal defects (local minima).
© Homesteading Self Sufficiency Survival. All rights reserved. This content
is excluded from our Creative Commons license. For more information,
see http://ocw.mit.edu/help/faq-fair-use/.
19
• Simulated annealingg is analogous, and can be app
pplied to manyy optimization problems
n.
Courtesy of Henry Wang. Used with permission.
Courtesy of Henry Wang. Used with permission.
3. Simulated
Simulated Annealing
• At higgh tempera
p ture the kinetic energy exceeds the potential energy
gy barrier
EEnergy
EEnergy
• At low temp
perature we cannot escape local minima
Angle
20
Angle
3 Simulated Annealing
3.
• Atoms find equilibrium
q
distribution at higher p
temperatures
• How can we find the equilibrium distribution p
of a complicated
potential function?
© Homesteading Self Sufficiency Survival. All rights reserved. This content
is excluded from our Creative Commons license. For more information,
see http://ocw.mit.edu/help/faq-fair-use/.
Courtesy of Henry Wang. Used with permission.
http://homesteadingsurvival.myshopify.com/p
19
roducts/115‐blacksmithing‐forging‐welding‐
metallurgy‐sword‐books‐on‐dvd‐rom
21
http://www.stanford.edu/~hwang41/mcmc.pn
http://www
stanford edu/~hwang41/mcmc pn
g
3.
3 Simulated
Simulated Annealing
• Start at higgh temp
perature
• Find most probable states
• Reduce Reduce temperature temperature to to trap
trap these
these states
© Homesteading Self Sufficiency Survival. All rights reserved. This content
is excluded from our Creative Commons license. For more information,
see http://ocw.mit.edu/help/faq-fair-use/.
Courtesy of Henry Wang. Used with permission.
http://homesteadingsurvival.myshopify.com/p
19
roducts/1
15‐blacksmithing‐forging‐welding‐
metallurgy‐sword‐books‐on‐dvd‐rom
22
ng41/mcmc.pn
http://www stanford.edu/~hwa
http://www.
stanford edu/~hwang41/mcmc
g
Metropolis Algorithm
• Goal: efficiently search a large conformation sp
pace.
• Can be understood in terms of physical processes, but
processes
but much
much more more general
• Note the difference from molecular dynamics:
– Molecules move under physical forces but temp
peratures are far outside of normal rangge
– A sampling method not a simulation!
23
Acceptance Criteria
Acceptance
• Randomly choose neighboring state:
– Alwayys accep
pt moves that reduce potential
– Go uphill (higher potential) based on odds ratio
 Etest / kT
 En / kT
P ( Stest) e
e
 ( Etest  E n ) / kT
/
e

P ( Sn)
Z (T )
Z (T )
1
24
3 Metropolis
3. Metropolis sampling
Iterate for a fixed number of cyycles or until convergence:
with energy
energy En
1. Start
Start with with a a system in in state state Sn with
2. Choose a neighboring state at random; we will call it the proposed state : Stest with energy Etest
3
3. If If Etest< < En : : Sn+1=Stest
( Etest  E ) / kT
4. Else set Sn+1= Stest with probability P  e
n
– otherwise Sn+1= Sn
25
1
Minimization vs. simulated annealing
26
P=e‐ΔG/kT
3
1
P=1
P=1
5
P=e‐ΔG/kT
2
27
4
Acceptance Criteria
Acceptance
• Always go down‐hill
• Go Go uphill uphill based based on on odds odds ratio
 Etest / kT
 En / kT
P ( Stest) e
e
(( Etest  E n ) / kT
k

/
e
P ( Sn)
Z (T )
Z (T )
How does T alter outcome?
28
Acceptance
Acceptance Criteria
• Always go down‐hill
• Go
Go uphill
uphill based
based on
on odds odds ratio
 Etest / kT
 En / kT
P ( Stest) e
e
(( Etest  E n ) / kT
k

/
e
P ( Sn)
Z (T )
Z (T )
Annealing schedule:
g
•Start at High T
•Lower sllowly
l
29
3.
3 Metropolis Metropolis sampling sampling – prob
prob. version
To identifyy minima given a probabilityy function: P(S)
( )
1. Start with a system in state Sn
2. Choose a neighboring state at random : Stest
P ( Stest )
3. Compute
Compute acceptance
acceptance ratio a 
P( S N )
4. If a> 1 : Sn+1=Stest
5. Else set Sn+1= Stest with probability a and Sn+1= Sn with p
probability 1‐a
Not specific to protein structure. Used to sample diverse probability
diverse
probability distributions
30
Review:
Methods for Refining Structures
1. Energy minimization
2. Molecular dynamics
3. Simulated annealing
31
Methods for Predicting Structure
IQVFLSARPPAPEVSKIY
DNLILQYSPSKSLQMILR
RALGDFENMLADGSFR
AAPKSYPIPHTAFEKSIIV
QTSRMFPVSLIEAARNH
FDPLGLETARAFGHKLA
TAALACFFAREKATNS
novel 3D structure
Courtesy of RCSB Protein Data Bank. Used with permission.
32
What actually works for structure prediction?
33
Rosetta
Raman et al. Proteins 2009; 77(Suppl 9):89–99.
http://onlinelibrary.wiley.com/doi/10.1002/prot.22540/full
• Two types of models:
–Homology
–de novo
34
Homology
• Allign query to sequences in PDB
• Use several aliggnment methods
• Three categories of queries:
1. High
High sequence
sequence similarity
similarity template(s) template(s) (>50% (>50%
sequence similarity).
2. Medium
Medium sequence
sequence similarity similarity template(s) (20
(20–50%
–50% sequence similarity).
3. Low Low sequence sequence similarity
similarity template(s) (<20% (<20%
sequence similarity).
35
Homology
• Align query to sequences in PDB
• Use Use se
several
veral alignment alignment methods
methods
• Refine models
36
tools you have seen earlier in the course
the
Gener
General
al Refinement
Refinement Procedure
Procedure
• Random changes to backbone torsion angles
• Rot
Rotamer
amer optimization
optimization of
of side
side chains
• Energy minimization of torsion angles (bond llengths h and anglles kkept fi
fixed)
d)
37
Homology
High sequence similarity template(s) (>50% seq
quence similarity)
y).
– Minimal refinement, focused on regions where alignment alignment is
is poor.
38
Homology
Medium sequence similarity template(s) (20–
50% seq
quence similarity)
y).
• Proceed with several alignments
• Refi
fine structures
• Choose best model byy final energy
gy
39
Homology
Med
dium sequence simillarity templlate((s)) ((20–
50% sequence similarity).
• Refinement focuses on regions near gaps and insertions,, loop
ps in the startingg model,, and sequence segments with low conservation
• Replaces
Replaces to
torsion
rsion angles with those from peptides of known structure
• Minimize
Mi i i llocall structure
t t
• Refine global structure
41
Native structure Best model Best temp
plate
© Wiley-Liss. All rights reserved. This content is excluded from our
Creative Commons license. For more information, see
http://ocw.mit.edu/help/faq-fair-use/.
Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for
CASP8 with All‐atom Refinement using Rosetta." Proteins: Structure,
Function, and Bioinformatics 77, no. S9 (2009): 89-99.
Accurate Accurate side side chains
chains in
in core
© Wiley-Liss. All rights reserved. This content is excluded from our Creative Commons license. For more information, see
http://ocw.mit.edu/help/faq-fair-use/.
Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for CASP8 with All‐atom Refinement using Rosetta."
Proteins: Structure, Function, and Bioinformatics 77, no. S9 (2009): 89-99.
42
Homology
Low sequence similarity template(s) (<20% seq
quence similarity)
y).
• Use many more starting models
• More aggressiive refi
finement strategy
– Rebuild secondary structure elements in addition to regions refined in medium homology:
• ggaps
p and insertions
• loops in the starting model
• reggions with low conservation
43
Native
Best
Model
Native
Best
Model
© Wiley-Liss. All rights reserved. This content is excluded from our Creative Commons license. For more information, see
http://ocw.mit.edu/help/faq-fair-use/.
Source: Raman, Srivatsan, Robert Vernon, et al. "Structure Prediction for CASP8 with All‐atom Refinement using Rosetta."
44
Proteins: Structure, Function, and Bioinformatics 77, no. S9 (2009): 89-99.
de novo
• When there is no suitable homology model:
– Monte Carlo search for backbone anggles
• Choose a short region (3‐9 amino acids) of backbone
• Set Set torsion
torsion angles
angles to to those those of
of aa similar
similar peptide peptide in
in PDB
• Accept with metropolis criteria
– 36
36,000
000 MC steps
– Repeat entire process to get 2,000 final structures
– Cluster structures
– Refine clusters
45
How has modeling changed during the CASP challenges?
46
Improvement over the last decade:
Percentage of residues successfully modeled CASP10 results compared to those of previous CASP experiments
Andriy Kryshtafovych, Krzysztof Fidelis, John Moult
DOI: 10.1002/prot.24448
% of residues modeled that were not in best
b template
Each point represents the best model for a target
CASP10 and CASP9 are similar, much better than CASP5.
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons
license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous
CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74.
47
Target difficulty ‐based on structural and sequence
similarity of a target to proteins of known structure
Overall Prediction Accuracy Did Not Improve
CASP10 results compared to those of previous CASP experiments
DOI: 10.1002/prot.24448
Global distance test
GDT TS GDT_TS
=overall accuracy of a model average % of Cα atoms in the prediction close to corresponding atoms in the target structure
Perfect model:
Random model:
90‐100
90
100
20‐30
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information,
see http://ocw.mit.edu/help/faq-fair-use/.
Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous CASP
Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74.
48
Target difficulty is based on structural and sequence similarity of a target to proteins of known structure
Overall Prediction Accuracy Did Not Improve
CASP10 results compared to those of previous CASP experiments
DOI: 10.1002/prot.24448
Why is the trend line the same?
Are targets getting harder in other ways? Multi‐domain, multi‐chain
multi
‐chain, etc
etc.
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons
license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous
CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74.
49
Target difficulty is based on structural and sequence similarity of a target to proteins of known structure
Free modeling results
de
de novo
(no template)
50
Free Modelingg in Flux
CASP10 results compared to those of previous CASP experiments
Andriy Kryshtafovych, Krzysztof Fidelis, John Moult
DOI: 10.1002/prot.24448
Global distance test
GDT_TS =overall accuracy of a model average percentage of Cα atoms in the prediction close to corresponding atoms in the target structure
Perfect model: 90‐100
Random model:
20‐30
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons
license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous
CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74.
51
Free Modelingg in Flux
CASP10 results compared to those of previous CASP experiments
Andriy Kryshtafovych, Krzysztof Fidelis, John Moult
DOI: 10.1002/prot.24448
GDT_TS Perfect model: 90‐100
Random model:20‐30
•CASP9 results for <120 AA were great 5/11 had GDT >60.
great.
>60
•CASP10 were mediocre (three models
d l >60
60 b
but ffour <40)
0)
•CASP5 had only 1/5 >60
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative Commons
license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Kryshtafovych, Andriy, Krzysztof Fidelis, et al. "CASP10 Results Compared to those of Previous
CASP Experiments." Proteins: Structure, Function, and Bioinformatics 82, no. S2 (2014): 164-74.
52
Free Modelingg in Flux
CASP10 results compared to those of previous CASP experiments
Andriy Kryshtafovych, Krzysztof Fidelis, John Moult
DOI: 10.1002/prot.24448
“Current FM [free modeling] methods perform best on single domain g
pp
progress
g
regular
structures... The apparent
lack of p
in CASP10 and ROLL compared with CASP5 probably again reflects the more difficult nature of CASP10 targets. First, many targets which in CASP5 would have been in this category now have templates … CASP10 FM targets
g exhibit more irregularity,
g
y, and more of a tendencyy to be domains of larger proteins that are hard to identify from sequence and that may be dependent on the rest of the structure for their conformation.
53
Statisticians Statisticians vs
vs. Physicists
Physicists
Rosetta
Courtesy of VADLO.com. Used with permission.
54
• Leverage everyythingg we know about existing p
structures of proteins
and peptides to build startingg models
• Refine using a knowledge based knowledge‐based
potential
Statisticians vs. Physicists
DE Shaw
• DON’T CHEAT!
• Only use physical forces
forces.
• Fold proteins by simulating the
simulating
the in
in vitro vitro
process
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
55
DE Shaw
• Lindorff‐Larsen et al. (2011)) Science
• Simulate protein folding.
• Why Wh had h d no one else l succeeded d d at this?
hi ?
Courtesy of Nature Publishing Group. Used with permission.
Source: Dill, Ken A. and Hue Sun Chan. "From Levinthal to Pathways
to Funnels." Nature Structural Biology 4, no. 1 (1997): 10-9.
56
DE Shaw
• Lindorff‐Larsen et al. ((2011)) Science
• Simulate protein folding.
• Built
B il a specialized
i li d supercomputer
– Hundreds of application specific integrated circuits
© The ACM. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Shaw, David E., Martin M. Deneroff, et al. "Anton, A Special-purpose Machine for Molecular Dynamics Simulation." Communications of the ACM 51,
no. 7 (2008): 91-7.
57
58
© American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Lindorff-Larsen, Kresten, Stefano Piana, et al. "How Fast-folding Proteins Fold." Science 334, no. 6055 (2011): 517-20.
FoldIT Game
© The New York Times Company. All rights reserved. This content is excluded from our
Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Markoff, John. "In a Video Game, Tackling the Complexities of Protein Folding." The
59
New York Times. August 4, 2010.
Predictions
So far: protein structure
Next: protein interactions
© American Association for the Advancement of Science. All rights
reserved. This content is excluded from our Creative Commons license.
For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Lindorff-Larsen, Kresten, Stefano Piana, et al. "How Fast-folding
Proteins Fold." Science 334, no. 6055 (2011): 517-20.
60
© source unknown. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see
http://ocw.mit.edu/help/faq-fair-use/.
Prediction Challenges
• Predict effect of point mutations
• Predict structure of complexes
• Predict all interacting proteins
61
•DOI: 10.1002/prot.24356
“Simple”
Simple challenge:
Starting with known structure of a complex:
predict how much a binding
mutation
i changes
h
bi di affinity.
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for
Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure,
Function, and Bioinformatics 81, no. 11 (2013): 1980-7.
62
•DOI: 10.1002/prot.24356
•All possible single‐point mutations at each of 53 and 45 positions for two proteins. •Expressed on yeast
•High‐throughput assay based on sequencing used to estimate changes in binding affinity
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for
Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure,
Function, and Bioinformatics 81, no. 11 (2013): 1980-7.
63
•DOI: 10.1002/prot.24356
How could we make quantitative predictions of binding energy for mutants?
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for
Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure,
Function, and Bioinformatics 81, no. 11 (2013): 1980-7.
64
Color based on predictions
O
Observ
ved
improved
neutral
reduced
Note:
B tt att predicting
Better
di ti deleterious mutations
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for
Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure,
Function, and Bioinformatics 81, no. 11 (2013): 1980-7.
interface!
This is one of the top performers analyzing
residues
at the
65
•DOI: 10.1002/prot.24356
Top performer
Average group
All site
es
improved
neutrall
reduced
Interfacce
Color based on participant’s predictions
66
© Wiley Periodicals, Inc. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Moretti, Rocco, Sarel J. Fleishman, et al. "Community‐wide Evaluation of Methods for
Predicting the Effect of Mutations on Protein–protein Interactions." Proteins: Structure,
Function, and Bioinformatics 81, no. 11 (2013): 1980-7.
•DOI: 10.1002/prot.24356
What’s a good “baseline” for modeling?
• Does structure/energy help?
67
What’s a good “baseline” for modeling?
• Does structure/energy help?
• Naïve model:
– Give each mutant a score equal to the BLOSUM matrix value ((‐4
4 to 11)
– As we vary the cutoff, how many mutations do we predict
di correctly?
l ?
68
Area under curve for predictions (varying cutoff in ranking)
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
•DOI: 10.1002/prot.24356
69
Predicted to be deleterious
Predicted to be beneficial
Comparing one of the best to BLOSUM
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
•DOI: 10.1002/prot.24356
70
Predicted to be deleterious
Predicted to be beneficial
Area under curve for predictions (varying cutoff in ranking)
First Round
Second Round.
Round (Given data for nine random mutations at each position)
BLOSUM
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
•DOI: 10.1002/prot.24356
71
Summary
• Best groups are only three times better than expected
from a random assignment.
p
g
• Predicting the effect of mutations on polar starting positions appears to be a particular challenge.
72
Summary
• Best approaches require explicit consideration of the effects of mutations on stabilityy
A + B 
 AB
*
A + B* 
AB

73
Summary
• Best approaches require explicit consideration of the effects of mutations on stabilityy
unfolded 
 A + B 
 AB
*  AB*
A
+
B
unfolded 


For more details see http://ocw.mit.edu/courses/biological‐engineering/20‐320‐analysis‐of‐biomolecular‐and‐cellular‐systems‐fall‐2012/modeling‐and‐manipulating‐biomolecular‐interactions/MIT20_320F12_Tpc_3_Mol_Des.pdf
74
Summary
• Best approaches
h require explicit
l consideration
d
of the effects of mutations on stability
• The best performing groups also modeled packing,
p
g, electrostatics and solvation. • The best methods used : – machine learning (G21,
(G21 Fernandez‐Recio,
Fernandez Recio and G05s, Bates)
– atom‐level
atom level energy functions (G15,
(G15 Weng)
– coarse‐grained models (G21s, Dehouck) 75
G21
• Database of 930 (ΔΔG,mutation) pairs
• Predict structure with FoldX (empirical force field)
• Describe
ib each
h mutant with
i h 85
8 features
f
i using
measures from FoldX, PyRosetta, FireDoc, PyDoc, SIPPER, CHARMM, NIP/NSC and others.
76
G21
• Train learners:
l
random
d fforest, neurall networks, probabilistic classifiers, etc.
• Evaluate with cross‐validation
• Use combined results from five classifiers:
– Random forest
– Decision table
– Bayesian net
– Logistic regression
– Alternating decision tree
77
Prediction Challenges
• Predict effect of point mutations
• Predict structure of complexes
• Predict all interacting proteins
78
Predicting Structures of Complexes
• Can we use structural data to predict complexes?
p
p
• This might be easier than quantitative predictions for site mutants.
• But it requires us to solve a docking problem
© source unknown. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see
http://ocw.mit.edu/help/faq-fair-use/.
79
Docking
Which surface(s) of protein A interactions with which surface of protein B?
Courtesy of Nurcan Tuncbag. Used with permission.
N. Tuncbag
80
Time is an issue
• Imagine we wanted to predict which proteins interact with our favorite molecule.
– For each potential partner
• Evaluate all possible relative positions and orientations
– allow for structural rearrangements
» measure energy of interaction
• This approach would be extremely slow!
• It’s
It’ also
l prone tto ffalse
l positives.
iti
– Why?
81
Reducing the search space
• Use prior knowledge of interfaces to focus analysis
residues
y on particular
p
• Find ways to choose potential partners
– What
Wh t role
l should
h ld structural
t t l homology
h
l play?
l ?
82
Subtilisin and its inhibitors
Although global folds of Subtilisin’s partners are very different, binding regions are structurally very conserved.
N. Tuncbag
Courtesy of Nurcan Tuncbag. Used with permission.
83
Hotspots
Figure from Clackson & Wells (1995).
84
© American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Clackson, Tim, and James A. Wells. "A Hot Spot of Binding Energy in a Hormone-Receptor Interface." Science
267, no. 5196 (1995): 383-6.
Hotspots
© American Association for the Advancement of Science. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Clackson, Tim, and James A. Wells. "A Hot Spot of Binding Energy in a Hormone-Receptor Interface."
Science 267, no. 5196 (1995): 383-6.
85
Figure from Clackson & Wells (1995).
• Fewer than 10% of the residues at an interface contribute more than 2 kcal/mol to binding. • Hot spots – rich in Trp, Arg and Tyr – occur on pockets on the two proteins that have plementaryy shap
pes and distributions of comp
charged and hydrophobic residues. – can include buried charge residues far from solvent
– O‐ring
ring structure excludes solvent from interface
http://onlinelibrary.wiley.com/doi/10.1002/prot.21396/full
86
Next Lecture
Fast and accurate modeling of protein‐protein i t
interactions
ti by combining template‐
interface based docking interface‐based
with flexible refinement.
Tuncbag N,
N Keskin O,
O Nussinov R, Gursoy A.
Structure‐based prediction of protein–protein i t
interactions
ti on a genome‐wide scale
Zhang, et al. http://www.nature.com/nat
ure/journal/v490/n7421/f
http://www.ncbi.nlm.nih.go
http://www
ncbi nlm nih go
ull/nature11503.html
v/pubmed/22275112
87
MIT OpenCourseWare
http://ocw.mit.edu
7.91J / 20.490J / 20.390J / 7.36J / 6.802J / 6.874J / HST.506J Foundations of Computational and Systems Biology
Spring 2014
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Download