Lecture 6

6. Homology Modeling
Prediction of structure from sequence
Flowchart
Comparison of query sequence to nr database
Similar to a sequence of known structure?
• Yes: Homology Modeling (Comparative Modeling)
• No: Fold Recognition (Threading). Fits a known fold?
– Yes: Homology Modeling (Comparative Modeling)
– No: Ab initio prediction
Homology modeling
4 steps:
1. Detect template
2. Align sequence onto template
3. Build model (loop modeling)
4. Refine model (relax)
Errors in comparative modeling
a. Wrong side chain conformations
b. Small backbone deviations
c. Wrong loop modeling
d. Wrong alignment
e. Wrong template
(Marti-Renom & Sali, 2000)
Homology modeling
4 steps:
1. Detect template
2. Align sequence onto template
3. Build model (loop modeling)
4. Refine model (relax)
Sequence identity threshold for structural similarity
depends on length of protein
No structurally dissimilar pairs above the
threshold line
(Sander & Schneider, 1991)
Template matching
• Given: sequence
• Wanted:
– structural template
– sequence-structure alignment
• Easiest approach:
– Blast / Psiblast against sequences with known structure
– select template based on sequence identity
• >70%: straightforward
• ~40-50%: usually clear
• lower seqid: alignment is the challenge
• State of the art protocols include
– more sophisticated searches
– additional information for improved template selection
– Profile-profile comparison (HHSEARCH)
– Seq-structure compatibility (Threading: RAPTOR)
Alignment step is critical
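The identity-based template choice can be illustrated with a short sketch. It computes percent identity over the aligned (non-gap) columns of a pairwise alignment and maps it onto the rough categories listed above. The function names and the exact cut-offs (70 and 40) are assumptions made for illustration; the slide only gives approximate ranges.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


def template_difficulty(seqid: float) -> str:
    """Rough difficulty category; the cut-offs are illustrative assumptions."""
    if seqid > 70.0:
        return "straightforward"
    if seqid >= 40.0:
        return "usually clear"
    return "alignment is the challenge"


# Toy alignment (hypothetical sequences): 5 identical residues in 6 aligned columns.
seqid = percent_identity("ACDE-FG", "ACDEHFA")
category = template_difficulty(seqid)
```

Identity is counted here over aligned columns only; other conventions (for example dividing by the length of the shorter sequence) give different numbers for the same alignment.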
Sequence-sequence alignment
• Information content:
1. Sequence
2. Profile (position-specific scoring matrix, PSSM)
aa preferences for each position
3. Hidden Markov Models (HMM)
Contains in addition position-specific in/del penalties
HMM states: M = match, D = deletion, I = insertion
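To make "aa preferences for each position" concrete, here is a minimal sketch of a log-odds profile built from an ungapped multiple alignment. It assumes a uniform background frequency (1/20) and a simple pseudocount; real tools such as PSI-BLAST use sequence weighting and more elaborate pseudocounts, so this only illustrates the idea. The names build_pssm and score_sequence are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of an ungapped sequence."""
    if len(seq) != len(pssm):
        raise ValueError("sequence and profile must have the same length")
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Toy alignment (hypothetical): position 2 is always C, so C scores highest there.
profile = build_pssm(["ACD", "ACE", "ACD", "GCD"])
```

An HMM extends this per-column table with position-specific transition probabilities between the match, insertion and deletion states.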
Sequence-sequence alignment
• Information content:
– Sequence-sequence comparison
• e.g. BLAST
– Profile-sequence comparison
• e.g. PSI-BLAST
– Profile-profile comparison
• e.g. LAMA, PROF_SIM, COMPASS
– HMM-HMM comparison
• e.g. HHSEARCH
More information – increased sensitivity
in detecting template
No new folds & superfamilies lately ->
template available for everyone
[Figure: SCOP folds per year (# of folds, # of new folds). ~1400 folds; no new folds in the last years.]
[Figure: SCOP superfamilies per year (# of superfamilies, # of new superfamilies). ~2300 superfamilies; no new superfamilies in the last years.]
Many sequences – few folds: How can I detect my fold?
Additional ways to include
structural information: Threading
Evaluate compatibility of sequence with fold, based on pairwise residue potentials
Essential components:
• structural template
• neighbor definition
• energy function
Energy of a threading: $E = \sum_{i,j} E_{a_i b_j}$, summed over neighboring positions i,j, where $E_{ab}$ is the pairwise potential between residue types a and b
[Figure: fold with 10 numbered positions, threaded with the sequence ACCECADAAC; table of pairwise energies $E_{ab}$]
Example: E = -3-1-4-4-1-4-3-3 = -23
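The threading score is a sum of pairwise potentials over neighboring positions, which the following sketch spells out. The neighbor list and the energy values are hypothetical and are not meant to reproduce the slide's example total; only the sequence ACCECADAAC is taken from the slide.

```python
from typing import Dict, Iterable, Tuple


def threading_energy(sequence: str,
                     neighbors: Iterable[Tuple[int, int]],
                     pair_energy: Dict[Tuple[str, str], float]) -> float:
    """Sum the pairwise potential over all neighboring position pairs (i, j)."""
    total = 0.0
    for i, j in neighbors:
        a, b = sequence[i], sequence[j]
        # The potential is treated as symmetric: E(a, b) == E(b, a).
        key = (a, b) if (a, b) in pair_energy else (b, a)
        total += pair_energy[key]
    return total


# Hypothetical inputs: 0-based neighbor pairs defined by the template fold,
# and a tiny pair-potential table covering only the residue pairs used here.
sequence = "ACCECADAAC"
neighbors = [(0, 5), (1, 2), (2, 4), (0, 9), (7, 8)]
pair_energy = {("A", "A"): -3.0, ("A", "C"): -1.0, ("C", "C"): -4.0}
energy = threading_energy(sequence, neighbors, pair_energy)
```

The three essential components map directly onto the arguments: the structural template supplies the neighbor pairs, the neighbor definition decides which pairs are listed, and the energy function is the lookup table.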
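Threading (fold recognition): Find
best template for given sequence
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto each fold 1) ... n) of a fold library and scored (example scores shown: -10, -123, 20.5); one fold is marked as the potential fold]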
RAPTOR
State-of-the-art threading method of choice
• Successful for “low-homology” proteins (few homologous sequences – low entropy in alignment)
• State-of-the-art threading protocol: uses linear programming to efficiently find best seq-str threading (linear combination of regression trees)
• Optimizes use of several templates
http://raptorx.uchicago.edu/
Jian Peng and Jinbo Xu. RaptorX: exploiting structure information for protein alignment by statistical inference.
PROTEINS, 2011; A multiple-template approach to protein threading. PROTEINS, 2011.
Combine sequence-structure and
sequence-sequence comparisons
• Example 1: GENTHREADER (ANN)
How likely are 2 aas to be neighbors?
How likely is an aa to be buried/exposed?
Combine sequence and structure
for template selection
Example 2: HHSEARCH*:
• Based on hidden Markov models (HMM)
• Sequence-HMM alignment
• Here: extended to HMM-HMM alignment
* Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951
HHSEARCH: HMM-HMM alignment
• Formalization:
• more sensitive (for hard cases with <20% seqid) than:
– Profile-profile comparison
– Profile-sequence comparison
– Sequence-sequence comparison
HHSEARCH includes structural
information about template
Include secondary structure preference in model:
• Score pairs of aligned secondary structure elements with a substitution matrix
• Query sequence: predicted secondary structure (PSIPRED: H/E/C) with confidence [0..9]
• Structural template: secondary structure (DSSP: H/E/B/G/I/T/S)
DSSP:
H = alpha helix
E = extended strand
B = residue in isolated beta-bridge
G = 3-helix (3/10 helix)
I = 5-helix (pi helix)
T = hydrogen bonded turn
S = bend
10 x 3 x 7 substitution values (10 confidence levels x 3 PSIPRED states x 7 DSSP states)
Homology modeling
4 steps:
1. Detect template
2. Align sequence onto template
3. Build model (loop modeling)
4. Refine model (relax)
Build model
1. Copy aligned regions from template
2. Rebuild missing pieces: Model loops
3. Refine model: add side chains (and
minimize; relax)
Build model: Loop modeling
Input:
• 2 anchors
• length of missing residues
2 approaches:
• Loop libraries: construct loops from fragments of known structures
• Loop closure algorithms
– model new conformations
– good for longer loops
Fold-trees for loop modeling tasks
[Figure: fold tree for a loop modeling task. Color: flexible bb; gray: fixed bb. Edges are flexible or rigid “peptide” edges; pairs of positions (1 and 1’, 2 and 2’) are connected by rigid or flexible “jumps”.]
N: N-terminal; C: C-terminal; X: chain break; O: root of the tree
Rosetta loop modeling
• Define regions that are flexible, and perturb these in a
fixed background
– Same moves as described in ab initio, but more restricted
– Use fold tree architecture: connect take off and landing
segment by a jump, cut loop (at defined place, or arbitrarily),
apply perturbation, reclose loop
– Loop closure: using cyclic coordinate descent (CCD) or
kinematic loop closure (KC)
• Fragments can be used to improve knowledge-based
modeling
Cyclic Coordinate Descent (CCD): closure by
moving each joint separately to maximally approach the end
Repeat to obtain several conformations
Refine, and select best!
Canutescu & Dunbrack. Protein Sci. 12, 963–972 (2003).
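The "move each joint separately" idea can be sketched on a simplified system. The example below applies CCD to a planar chain of rigid links: each joint in turn is rotated by the angle that brings the chain end as close as possible to the target. This is an analogy under stated assumptions, not the torsion-space algorithm of Canutescu & Dunbrack, which rotates backbone dihedrals and matches three anchor atoms. All names and parameter values are hypothetical.

```python
import math


def forward(angles, lengths):
    """Joint positions of a planar chain anchored at the origin (relative angles)."""
    points = [(0.0, 0.0)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x, y = points[-1]
        points.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return points


def ccd_close(angles, lengths, target, tol=1e-3, max_sweeps=500):
    """Cycle over the joints, adjusting one angle at a time, until the end reaches target."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for k in range(len(angles)):
            points = forward(angles, lengths)
            px, py = points[k]    # pivot: the joint being rotated
            ex, ey = points[-1]   # current chain end
            to_end = math.atan2(ey - py, ex - px)
            to_target = math.atan2(target[1] - py, target[0] - px)
            # Rotating joint k by this amount points the end at the target,
            # which is the best single-joint move in two dimensions.
            angles[k] += to_target - to_end
        if math.dist(forward(angles, lengths)[-1], target) < tol:
            break
    return angles, math.dist(forward(angles, lengths)[-1], target)


# Hypothetical four-link chain and a reachable target position.
closed_angles, gap = ccd_close([0.3, 0.3, 0.3, 0.3], [1.0, 1.0, 1.0, 1.0], (2.0, 1.5))
```

Starting from different initial angles gives different closed conformations, which corresponds to the "repeat to obtain several conformations" step.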
Loop closure and degrees of freedom
• Over-constrained for <6 DOFs
• Under-constrained for >6 DOF: infinite
number of solutions.
• A molecular loop closure problem with 6 DOF
has at most 16 solutions.
• Kinematic loop closure allows calculation of
analytical solution
Kinematic loop closure Coutsias (2004)
From robotics: Analytical solution of loop closure
for 6 degrees of freedom
Challenge:
• find an analytical formulation to extract all possible backbone structures of a chain segment that are geometrically consistent with the preceding and following parts of the given structure.
Setup:
Kinematic loop closure, cont.
[Figure: solutions, shown aligned to each other and aligned to the constant part]
Kinematic closure (KC)
• Analytical solution of loop closure for 6 degrees of
freedom
• Extension: analytical determination of all mechanically
accessible conformations for 6 torsions of a peptide
chain of any length (e.g. 25 residues)
(1) Randomly perturb non-pivot positions
(2) Apply KC to pivot positions
Kinematic closure (KC) in Rosetta
• Embedded into MCM protocol (low-res + high-res)
– 720 steps
– Repeat 1000 times
• Each step: perturbation + KC, then loop backbone minimization
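The overall shape of a Monte Carlo with minimization (MCM) loop can be sketched on a toy problem. Here a one-dimensional double-well function stands in for the loop energy, a random shift stands in for "perturbation + KC", and a few gradient-descent steps stand in for backbone minimization. The energy function, temperature and move size are assumptions for illustration only; the 720 steps are taken from the slide.

```python
import math
import random


def energy(x):
    """Toy double-well energy (hypothetical stand-in for a loop score)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x, step=0.01, iters=200):
    """Plain gradient descent into the nearest local minimum."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def mcm(x0, n_steps=720, kT=0.5, max_move=1.0, seed=0):
    """Perturb, minimize, then accept or reject with the Metropolis criterion."""
    rng = random.Random(seed)
    current = minimize(x0)
    best = current
    for _ in range(n_steps):
        trial = minimize(current + rng.uniform(-max_move, max_move))
        delta = energy(trial) - energy(current)
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            current = trial
            if energy(current) < energy(best):
                best = current
    return best


# One trajectory started in the higher-energy well; it should find the lower one.
lowest = mcm(1.0)
```

Repeating such a trajectory many times with different seeds, as in the "Repeat 1000 times" bullet, yields a set of independent candidates from which the lowest-energy one is selected.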
Kinematic closure (KC)
• Improves median modeling quality from 2.0 Å (CCD) to 0.8 Å RMSD (on a set of 25 loops)
Improve loop modeling by sampling along Principal
Components (PC) of natural variation
• Collect loops of a set of homologous templates
• Perform Principal Component Analysis (PCA): the collection of loops can be described by only a few (3) PCs
➜ Improves model quality: more similar to the final structure than to template.
• Depends on a set of known homologous structures
[Figure: 8 protein structures plotted along PCA1, PCA2, PCA3]
Qian 2004 PNAS
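A minimal sketch of the PCA step, assuming each homologous loop has already been superposed and flattened into a coordinate vector of equal length. It extracts only the first principal component by power iteration on the covariance matrix; the toy vectors are hypothetical, and a real application would use a linear algebra library and several components.

```python
import math


def first_principal_component(samples, iters=200):
    """Return (mean, unit direction, variance) of the dominant axis of variation."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    centered = [[s[d] - mean[d] for d in range(dim)] for s in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Power iteration; the start vector must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                   for a in range(dim))
    return mean, v, variance


def sample_along_pc(mean, direction, t):
    """Generate a new coordinate vector displaced by t along the component."""
    return [m + t * d for m, d in zip(mean, direction)]


# Hypothetical flattened loop coordinates that vary mostly along (1, 1, 0).
loops = [[-2.0, -2.0, 0.1], [-1.0, -1.0, -0.1], [0.0, 0.0, 0.0],
         [1.0, 1.0, 0.1], [2.0, 2.0, -0.1]]
mean, pc1, var1 = first_principal_component(loops)
candidate = sample_along_pc(mean, pc1, 0.5)
```

Sampling then means varying t (and the coefficients of further components) instead of moving every coordinate independently, which restricts the search to directions seen in natural variation.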
Free-energy optimization along PC
of natural variation: example
• Red: model (2.36 Å RMSD)
• Blue: native
• Green: refined (1.42 Å RMSD)
Qian 2004 PNAS
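The RMSD values quoted for the model and the refined structure measure the root-mean-square distance between equivalent atoms. A minimal sketch, assuming the two coordinate sets are already optimally superposed and listed in the same atom order (the superposition step itself is not shown):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: one atom displaced by 2 Å along z.
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```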
Homology modeling
4 steps:
1. Detect template
2. Align sequence onto template
3. Build model (loop modeling)
4. Refine model (relax)
Rosetta:
Refine model with relax protocol
Same as in last ab initio modeling step*:
• Introduce general flexibility
• Relax protocol finds nearby minima (within 4-5 Å RMSD)
• Small backbone moves and MCM
[Figure: relax cycle: side chain optimization -> backbone optimization -> side chain optimization + minimization -> backbone optimization; label: vdw repulsive]
* MCM protocol: small & shear moves (120 steps; see lecture 5)
Homology modeling with Rosetta
Summary - Basic protocol:
1. Detect template and align sequence: based
on HHSEARCH (alignment of two HMMs) or
RAPTOR (Threading)
2. Define aligned regions and loop regions; copy
aligned regions and complete protein
structure with loop modeling (with KIC kinematic loop closure, or CCD)
3. Refine structure with the “relax” protocol
CASP - Template-based modeling (TBM)
Improvement over single best target
• single impressive improvements
• many targets better than template
Reasons
• multiple templates
• free modeling
• refinement
[Plot axis labels: worse, better]
CASP7: Example for improved TBM
with Rosetta (T330)
• Blue: Native
• Green: Baker Model04
• Red: Template
[Plot: % of residues aligned vs. distance cutoff]
Rosetta in CASP7 & 8: use of several
templates improves prediction
Templates that produce
lower energy structures
produce better models
Homology modeling - summary
• Homology modeling to high resolution is
challenging (~ ab initio modeling)
• Today models are already better than the
template – GOOD NEWS!
• Good alignment and template selection are
critical
• Sophisticated new approaches have improved
homology modeling in recent years
– Include additional information during template
selection, alignment and refinement