BCB 444/544 Protein Tertiary Structure Prediction Lecture 23

BCB 444/544
Lecture 23
 Protein Tertiary Structure
Prediction
#23_Oct15
BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction
10/15/07
1
Required Reading
(before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8
(Terribilini)
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112
New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read:
Ginalski et al.(2005) Practical Lessons from Protein Structure
Prediction, Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
(PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of
protein structure prediction methods and evaluation of predicted
structures.
• Your assignment is to write a summary of this paper - for details
see HW#4 posted online & sent by email on Sat Oct 13
Seminars Last Week
Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope 2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Check out links on Schulten's website (videos, etc)
• http://www.ks.uiuc.edu/~kschulte/
• Great seminar - amazing simulations of dynamics in proteins and
large macromolecular assemblies
• Very computationally intensive - very impressive demonstration
of power of computation to produce insights not attainable using
only experimental approaches
Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
• Sachdev Sidhu (Genentech) Phage peptide and antibody
libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
• Lyric Bartholomay (Ent, ISU) TBA
Protein Sequence & Structure: Analysis
• Diamond STING Millennium - Many useful structure analysis
tools, including Protein Dossier
http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt)
Protein knowledgebase
http://us.expasy.org/sprot
• InterPro
Sequence analysis tools
http://www.ebi.ac.uk/interpro
Chp 14 - Secondary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• √Secondary Structure Prediction for Globular Proteins
• √Secondary Structure Prediction for Transmembrane Proteins
• √Coiled-Coil Prediction
Where Find "Actual" Secondary Structure?
In the PDB
How Does Predicted Secondary Structure
Compare with Actual? (An example)
Actual - Calculated from PDB coordinates by DSSP or author:
DSSP
Author
Query
GOR V
FDM
CDM
MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIEN
CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHH
CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHH
CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHH
Predicted - Using 3 methods (from CDM server, Jernigan Group, ISU)
Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
Structural Genomics - Status & Goal
~ 20,000 "traditional" genes in human genome
(recall, this is fewer than earlier estimate of 30,000)
~ 2,000 proteins in a typical cell
> 4.9 million sequences in UniProt (Oct 2007)
> 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags
far behind sequence determination!
Goal: Determine structures of "all" protein folds in
nature, using combination of experimental structure
determination methods (X-ray crystallography, NMR,
mass spectrometry) & structure prediction
Structural Genomics Project
TargetDB: Database of Structural Genomics Targets
http://targetdb.pdb.org
Database of Theoretical Structures?
PMDB: Protein Model Database
http://mi.caspur.it/PMDB/help.php
also, via NAR's Molecular Biology Database Collection
http://www.oxfordjournals.org/nar/database/summary/855
Theoretical structural models (predicted) are no longer accepted
by the PDB (since 10/15/06); but, it is possible to search for
models deposited earlier:
http://www.rcsb.org/pdb/search/searchModels.do
Protein Structure Prediction
or Protein Folding Problem
"Major unsolved problem in molecular biology"
In cells:
spontaneous
assisted by enzymes
assisted by chaperones
In vitro:
many proteins can fold to their "native"
states spontaneously & without assistance
but, many do not!
Deciphering the Protein Folding Code
• Protein Structure Prediction or
• Protein Folding Problem
Given the amino acid sequence of
a protein, predict its
3-dimensional structure (fold)
• Inverse Folding Problem
Given a protein fold, identify
every amino acid sequence that
can adopt its 3-dimensional
structure
Protein Structure Prediction
Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determine folding pathway
2- Predict tertiary structure from sequence
Both still largely unsolved problems
Steps in Protein Folding
1- "Collapse"- driving force is burial of
hydrophobic aa’s
(fast - msecs)
2- Molten globule - helices & sheets
form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of
2° structures
Native state?
- assumed to be lowest free energy
- may be an ensemble of structures
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins requires conformational
changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable"
(NOT evolved for maximum stability)
• Energy difference between native and denatured
state is very small (5-15 kcal/mol)
(this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy
Difficulty of Tertiary Structure Prediction
Folding or tertiary structure prediction problem can
be formulated as a search for minimum energy
conformation
• Search space is defined by psi/phi angles of
backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially
with number of residues
Computationally it is an exceedingly difficult problem!
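A back-of-the-envelope count makes the size of this search space concrete. The sketch below is illustrative only: the number of discrete backbone states per residue and the sampling rate are assumptions chosen for the example, not values from the lecture.

```python
# Rough size of the conformational search space.
# Assumptions (illustrative, not from the lecture): each residue has
# `states_per_residue` discrete backbone (phi/psi) states, and a search
# can evaluate `rate_per_second` conformations per second.

SECONDS_PER_YEAR = 365 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of backbone conformations for a chain of n_residues."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int,
                            states_per_residue: int = 3,
                            rate_per_second: float = 1e13) -> float:
    """Years needed to enumerate every conformation at the given rate."""
    total = conformation_count(n_residues, states_per_residue)
    return total / rate_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        years = exhaustive_search_years(n)
        print(n, conformation_count(n), f"{years:.3g} years")
```

Even with only a handful of states per residue, the count grows exponentially with chain length, which is why exhaustive enumeration is not an option.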
Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)
Comparative Modeling?
Comparative modeling - term is sometimes used
interchangeably with homology modeling, but also
sometimes used to mean both:
• homology modeling
• threading/fold recognition
Ab Initio Prediction
1. Develop energy function
• bond energy
• bond angle energy
• dihedral angle energy
• van der Waals energy
• electrostatic energy
2. Calculate structure by minimizing energy function
• usually Molecular Dynamics (MD) or Monte Carlo (MC)

Ab initio prediction - impractical for most real (long) proteins
• Computationally? very expensive
• Accuracy? Usually poor for all except short peptides
 (but much improvement recently!)
Provides both folding pathway & folded structure
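The "minimize an energy function with Monte Carlo" step can be sketched on a toy problem. Everything here is an assumption made for illustration: `toy_energy` is a made-up dihedral-style term with three minima per angle, not a real force field, and the step size, temperature, and step count are arbitrary.

```python
import math
import random


def toy_energy(angles):
    """Toy dihedral-style energy: each angle (radians) has three equal minima."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def metropolis_minimize(energy, angles, steps=20000, step_size=0.3,
                        temperature=0.1, seed=0):
    """Metropolis Monte Carlo search; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Propose a move: perturb one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


if __name__ == "__main__":
    start = [0.0] * 5  # every angle starts at an energy maximum
    _, e_min = metropolis_minimize(toy_energy, start)
    print(toy_energy(start), e_min)
```

Occasionally accepting uphill moves is what lets the search escape local minima; a real ab initio method would replace `toy_energy` with the bond, angle, dihedral, van der Waals, and electrostatic terms listed above.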
Comparative Modeling
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined
structures that are "homologous" or at least
structurally very similar to target
Provide folded structure only
Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose one with closest
sequence to target as template
(can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in
corresponding positions on homologous structure & refine by
"tweaking" modeled structure (energy minimization)

Homology modeling - works "well"
• Computationally? "relatively" inexpensive
• Accuracy? higher sequence identity → better model
• Requires ~30% sequence identity with sequence for
which structure is known
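The template-selection step and the ~30% rule of thumb can be sketched as follows. This is a minimal illustration, assuming the pairwise alignments are already available as gapped strings; the function names and the definition of identity (matches over aligned, non-gap columns) are choices made for this example, and other definitions of percent identity are in use.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def pick_template(alignments: dict, threshold: float = 30.0):
    """Return (template_id, identity) for the closest template, or None.

    `alignments` maps a template id to (aligned_target, aligned_template).
    Templates below `threshold` percent identity are rejected.
    """
    best = None
    for template_id, (target, template) in alignments.items():
        identity = percent_identity(target, template)
        if identity >= threshold and (best is None or identity > best[1]):
            best = (template_id, identity)
    return best


if __name__ == "__main__":
    # Hypothetical alignments, for illustration only.
    candidates = {
        "tmplA": ("ACDE-F", "ACDQGF"),
        "tmplB": ("ACDEF", "GHIKF"),
    }
    print(pick_template(candidates))
```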
Threading - Fold Recognition
Identify “best” fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template in library & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling

Threading - works "sometimes"
• Computationally? Can be expensive or cheap, depends on
energy function & whether "all atom" or "backbone only"
threading is used
• Accuracy? in theory, should not depend on sequence identity
(should depend on quality of template library & "luck")

Usually, higher sequence identity to protein of known
structure → better model
Threading: the Motivation
• Basic premise:
The number of unique structural folds in nature is
fairly small (probably 2000-3000)
• Statistics from Protein Data Bank (>46,000 structures)
Prior to Structural Genomics Project, 90% of "new"
structures submitted to PDB were similar to existing
folds in PDB - suggesting that almost all folds in
nature have been identified
• Thus, chances for a protein to have a native-like structural fold
in PDB are quite good
• Note: Proteins with similar structural folds could be either
homologs or analogs
Steps in Threading
Target
Sequence
ALKKGF…HFDTSE
Structure
Templates
1. Align target sequence with template structures
in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit"
between target sequence & template structure
3. Rank models based on energy scores
Threading Goal - & Issues
Find “correct” sequence-structure alignment of a target sequence with
its native-like fold in template library (usually derived from PDB)
• Structure database - must be "complete"
• Can't build a good model if there is no good template in library!
• Sequence-structure alignment algorithm:
• Bad alignment → Bad score!
• Energy function or Scoring Scheme:
• Must distinguish correct sequence-fold alignment from
incorrect sequence-fold alignments
• Must distinguish “correct” fold from close decoys
• Prediction reliability assessment - How determine whether
predicted structure is correct? (or even close?)
Threading: Template database
• Build a database of structural templates
e.g., ASTRAL domain library derived from the PDB
Sometimes, supplement with additional decoys
e.g., generated using ab initio approach such as Rosetta (Baker)
Threading: Energy function
• Two main methods (& combinations of these)
• Structural profile (environmental)
physicochemical properties of amino acids
• Contact potential (statistical)
based on contact statistics from PDB
famous one: Miyazawa & Jernigan (ISU)
Protein Threading: Typical energy function
What is "probability"
that two specific
residues are in
contact?
How well does a
specific residue fit
structural environment?
Alignment gap
penalty?
Total energy: Ep + Es + Eg
Goal: Find a sequence-structure alignment that
minimizes energy function
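A small sketch of how the three terms combine and how candidate templates are ranked. The reading of Ep as the pairwise contact term, Es as the residue-environment term, and Eg as the gap penalty follows the order of the three questions above and is an assumption; the class name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadingScore:
    """Energy terms for one target-template alignment (hypothetical values)."""
    template_id: str
    pair_energy: float         # Ep: are contacting residue pairs favorable?
    environment_energy: float  # Es: does each residue fit its environment?
    gap_penalty: float         # Eg: cost of gaps in the alignment

    @property
    def total(self) -> float:
        # Total energy: Ep + Es + Eg
        return self.pair_energy + self.environment_energy + self.gap_penalty


def rank_templates(scores):
    """Sort candidate alignments by total energy, lowest (best) first."""
    return sorted(scores, key=lambda s: s.total)


if __name__ == "__main__":
    candidates = [
        ThreadingScore("t1", -10.0, -2.0, 3.0),
        ThreadingScore("t2", -12.5, -1.5, 1.0),
        ThreadingScore("t3", -4.0, -6.0, 0.5),
    ]
    for s in rank_templates(candidates):
        print(s.template_id, s.total)
```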
Rapid Threading Approach for
Protein Structure Prediction
A Local Example:
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander
Cao H, Ihm Y, Wang, CZ, Morris, JR, Su, M, Dobbs, D, Ho, KM (2004)
Three-dimensional threading approach to protein structure recognition
Polymer 45:687-697
Motivations for & Assumptions of
Ho Threading Algorithm
Goal: Develop a threading algorithm that:
• Is simple & rapid enough to be used in high throughput
applications
• Is relatively "insensitive" to sequence similarity between
target protein sequence & sequence of template structure
(to enhance detection of remote homologs & structures that are
similar due to convergent evolution)
• Can be used to answer questions such as:
What are predicted structures of all "unassigned" ORFs in
Arabidopsis?
Does Arabidopsis have a protein with structure similar to
mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
• Native state of a protein is lowest free energy state
• Hydrophobic interactions drive protein folding
Simplify: Template structure representation
Template structure → $C$ ($N \times N$ contact matrix), residues indexed 1, ..., i, ..., j, ..., N
$C_{ij} = 1$, if $r_{ij} < 6.5$ Å (contact)
$C_{ij} = 0$, otherwise (non-contact)
(figure annotation: a neighbor in sequence)
Yungok Ihm
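The contact-matrix construction can be sketched in a few lines. Assumptions made for this example: each residue is represented by a single 3-D point (for instance a Cα atom), and residues adjacent in sequence are not counted as contacts, which is one reading of the "neighbor in sequence" annotation; the coordinates are made up.

```python
import math


def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """Build the N x N binary contact matrix C from per-residue coordinates.

    C[i][j] = 1 when residues i and j are closer than `cutoff` (in Å) and
    at least `min_separation` positions apart in sequence; otherwise 0.
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1  # the matrix is symmetric
    return c


if __name__ == "__main__":
    # Four hypothetical residue positions (Å).
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
    for row in contact_matrix(coords):
        print(row)
```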
Simplify: Target Sequence Representation
• Miyazawa-Jernigan (MJ) model: inter-residue contact
energy M(i,j) is a quasi-chemical approximation based on pairwise contact statistics extracted from known protein structures
in the PDB: 20 X 20 matrix = 210 values ("letters")
• Li-Tang-Wingreen (LTW): factorize the MJ interaction
matrix to reduce the number of parameters associated with
amino acids from 210 to 20 q values
• Hydrophobic-Polar (HP): represent amino acids as either H
(hydrophobic) or polar (P); Dill et al demonstrated the utility of
this simple binary alphabet representation: 2 values
Compare results with 210 vs 20 vs 2 letter representations
How low can we go?
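Two quick illustrations of the alphabet sizes above: why a symmetric 20 × 20 table has 210 independent values, and what a two-letter HP encoding looks like. The particular hydrophobic set used here is an assumption (one common textbook split), not necessarily the assignment used in the lecture or by Dill et al.

```python
# Assumed hydrophobic residues; other H/P splits are in use.
HYDROPHOBIC = set("AVLIMFWC")


def n_pair_parameters(alphabet_size: int) -> int:
    """Independent entries in a symmetric pair table (diagonal included)."""
    return alphabet_size * (alphabet_size + 1) // 2


def to_hp(sequence: str) -> str:
    """Reduce an amino acid sequence to the two-letter H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


if __name__ == "__main__":
    print(n_pair_parameters(20))  # symmetric 20 x 20 table
    print(to_hp("ALKKGF"))
```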
Simplify: Energy Function
• Interaction “counts” only if two hydrophobic amino acid
residues are in contact
• At residue level, pair-wise hydrophobic interaction is
dominant:
$E = \sum_{i,j} C_{ij} U_{ij}$
$C_{ij}$: contact matrix
$U_{ij} = U(\text{residue } i, \text{residue } j)$
MJ: $U = U_{ij}$
LTW: $U = Q_i Q_j$
HP: $U = \{1, 0\}$
Yungok Ihm
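The double sum over contacts and pair terms can be written directly. This is a sketch under stated assumptions: the helper names are hypothetical, the Q values are made up, and the sum runs over all ordered pairs (i, j), so each contact in a symmetric matrix is counted twice. The overall sign convention (whether favorable contacts lower or raise the score) is left as written in the formula above.

```python
def contact_energy(contacts, pair_energy):
    """E = sum over i, j of C_ij * U_ij for an N x N contact matrix."""
    n = len(contacts)
    return sum(contacts[i][j] * pair_energy(i, j)
               for i in range(n) for j in range(n))


def hp_pair_energy(hp_sequence):
    """HP model: U = 1 if both residues are hydrophobic (H), else 0."""
    def u(i, j):
        return 1 if hp_sequence[i] == "H" and hp_sequence[j] == "H" else 0
    return u


def product_pair_energy(q_values):
    """LTW-style model: U = Q_i * Q_j, one parameter per residue."""
    def u(i, j):
        return q_values[i] * q_values[j]
    return u


if __name__ == "__main__":
    contacts = [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
    print(contact_energy(contacts, hp_pair_energy("HPPH")))
    print(contact_energy(contacts, product_pair_energy([1.0, 0.5, 0.2, 2.0])))
```

Passing the pair term as a function keeps the contact sum identical across the MJ, LTW, and HP representations; only the lookup changes.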
Energy calculation: Contact energy
Miyazawa-Jernigan (MJ) matrix:
Statistical potential
210 parameters
[Figure: excerpt of the MJ matrix for residues C, M, F, I, L, V, W]
Li-Tang-Wingreen (LTW):
20 parameters
$\tilde{M}_{ij} = C_2 \{ (q_i + \alpha)(q_j + \alpha) + \beta \}$
with $q_i$ ~ solubility
$Q_i = q_i + \alpha$, $Q_i$ ~ hydrophobicity
$\alpha = 0.6797$, $\beta = 0.2604$
Contact Energy:
$E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$
$C$: contact matrix
Yungok Ihm
Summary of Ho Threading Procedure
Template Structure → Contact Matrix (residues 1, ..., i, ..., j, ..., N)
$C_{ij} = 1$, if $r_{ij} < 6.5$ Å
$C_{ij} = 0$, otherwise
(a neighbor in sequence)
Sequence → Sequence Vector
AVFMRIHNDIVYNDIANTTQ
$S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997, 0.9897, 1.1197, \ldots, 0.6497)$
Scoring Function → Contact Energy
$E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$
Yungok Ihm
Can complexity be further reduced?
Consider simplifying structure representation, too
ALKKGF…HFDTSE
Sequence – Structure
(1D – 3D problem)
Sequence – Contact Matrix
(1D – 2D problem)
Sequence – 1D Profile
(1D – 1D problem)
Haibo Cao
Examine eigenvectors of contact matrix
Hydrophobic Contacts:
$T^{\top} C T = T^{\top} \left( \sum_{i=1}^{N} \lambda_i V_i V_i^{\top} \right) T = \sum_{i=1}^{N} \lambda_i (V_i^{\top} T)^2 \approx \lambda_1 (V_1^{\top} T)^2$
$C$: contact matrix
$\lambda_i$: i-th eigenvalue of C
$V_i$: i-th eigenvector
$V_1$: eigenvector with largest eigenvalue
$T$: protein sequence of the template structure
$r_i$: fraction of hydrophobic contacts from i-th eigenvector
$r_i = \dfrac{\lambda_i (V_i^{\top} T)^2}{\sum_{j=1}^{N} \lambda_j (V_j^{\top} T)^2}$
Haibo Cao
Represent contact matrix by its dominant
eigenvector (1D profile)
• First eigenvector (with highest eigenvalue) dominates the overlap
between sequence and structure
• Higher ranking (rank > 4) eigenvectors are “sequence blind”
Haibo Cao
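Extracting the dominant eigenvector of a contact matrix can be sketched with plain power iteration. This is an illustrative stand-in, not the method used in the cited work: it assumes a symmetric matrix whose largest-magnitude eigenvalue is the largest positive one (in practice a library routine such as `numpy.linalg.eigh` would be the usual choice), and the example matrix is made up.

```python
import math


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration on a symmetric matrix.

    Returns (eigenvalue, unit eigenvector) for the dominant eigenpair.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero matrix: nothing to iterate on
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    cv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * cv[i] for i in range(n))
    return eigenvalue, v


def overlap(sequence_vector, profile):
    """Dot product between a sequence vector and a 1D structure profile."""
    return sum(s * p for s, p in zip(sequence_vector, profile))


if __name__ == "__main__":
    # Hypothetical 4-residue contact matrix.
    c = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
    lam, v1 = dominant_eigenvector(c)
    print(lam, v1)
    print(overlap([1.0, 0.0, 1.0, 0.0], v1))
```

Residues with many contacts get large components in the dominant eigenvector, so the profile acts as a 1D summary of how buried each template position is.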
Threading Alignment Step - now fast!
Align target sequence vector (1D) with
eigenvector profile of template structure (1D)
1D Profile: $P = V_1$
Maximize the overlap between the
Sequence (S) and the profile (P):
$S \cdot P$, allowing gaps
New profile: $P' = CP$
Calculate contact energy $E_c$
using the alignment
Cao et al
Polymer 45 (2004)
Parameters for alignment?
• Gap penalty:
Insertion/deletion in helices or
strands is strongly penalized; smaller
penalties for in/dels in loops
Gap penalties apply to alignment score
only, not to energy calculation
• Size penalty:
If a target residue and aligned
template residue differ in radius by
> 0.5 Å and if residue is involved in
> 2 contacts, alignment is penalized
Size penalties apply to alignment score
only, not to energy calculation
[Figure: target sequence ALKKGFG…HFDTSE aligned to template Helix and Loop regions]
Yungok Ihm
How incorporate secondary structure?
• Predict secondary structure of target sequence
(PSIPRED, PROF, JPRED, SAM, GOR V)
N+ = total number of matches between predicted
& actual secondary structure of template
N- = total number of mismatches
Ns = total number of residues selected in alignment
“Global fitness” :
f = 1 + (N+ - N-) / Ns
Emod = f * Ethreading
Yungok Ihm
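The global fitness factor can be computed from two aligned secondary structure strings. This sketch assumes every selected position is either a match or a mismatch (so N+ + N- = Ns) and uses made-up H/E/C strings; the function names are hypothetical.

```python
def global_fitness(predicted_ss: str, template_ss: str) -> float:
    """f = 1 + (N+ - N-) / Ns over the aligned positions."""
    if len(predicted_ss) != len(template_ss):
        raise ValueError("secondary structure strings must have equal length")
    ns = len(predicted_ss)
    if ns == 0:
        return 1.0  # nothing aligned: leave the energy unchanged
    n_plus = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_minus = ns - n_plus
    return 1.0 + (n_plus - n_minus) / ns


def modified_energy(e_threading: float, predicted_ss: str, template_ss: str) -> float:
    """Emod = f * Ethreading."""
    return global_fitness(predicted_ss, template_ss) * e_threading


if __name__ == "__main__":
    print(global_fitness("HHHHCCEE", "HHHCCCEE"))
    print(modified_energy(-10.0, "HHHHCCEE", "HHHCCCEE"))
```

With this definition f ranges from 0 (no positions agree) to 2 (all agree), so a favorable (negative) threading energy is strengthened when the predicted and actual secondary structures match.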
How much better is this “fit” than random?
Eshuffled : Shuffled Sequence vs Structure
(avg E score for same sequence shuffled (randomized) many times)
Emod : E score modified to reflect fit with predicted 2° structure
Erelative = Emod – Eshuffled
Yungok Ihm
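The shuffled-sequence baseline can be sketched as below. `energy_fn` is a hypothetical callable standing in for the full Emod calculation against a fixed template; the number of shuffles and the toy energy in the demo are assumptions for illustration.

```python
import random


def relative_energy(sequence, energy_fn, n_shuffles=100, seed=0):
    """Erelative = E(sequence) - average E over shuffled copies of the sequence."""
    rng = random.Random(seed)
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        total += energy_fn("".join(residues))
    e_shuffled = total / n_shuffles
    return energy_fn(sequence) - e_shuffled


if __name__ == "__main__":
    # Toy energy: minus the number of adjacent H-H pairs in an HP string.
    def toy_energy(seq):
        return -sum(1 for a, b in zip(seq, seq[1:]) if a == b == "H")

    print(relative_energy("HHHHPPPP", toy_energy))
```

Because shuffling preserves amino acid composition, a strongly negative Erelative indicates that the fit depends on the order of residues rather than on composition alone.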
Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure
(before experimental results published)
Typical Results:
(well, actually, our BEST Results):
HO = #1-Ranked CASP5 Prediction for this Target
Predicted Structure
• Target 174
• PDB ID = 1MG7
T174_1
Actual Structure
T174_2
Cao, Ihm, Wang, Dobbs, Ho
Overall Performance in CASP5 Contest
~8th out of 180 (M. Levitt, Stanford)
FR Fold Recognition
(targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00   7.00   7     7     Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00   6.00   3     6     Ho-Kai-Ming
9     10.45    3.00   5.50   3     6     Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
CASP - Check it out!
Critical Assessment of Protein Structure Prediction
http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
• http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein
structure prediction (LiveBench, CAFASP, EVA)
& URLs for them
• Related contests & resources:
• Protein Function Prediction (part of CASP)
• CAPRI = Critical Assessment of Predicted Interactions
• New: CASPM = CASP for M = Mutant proteins
• Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for
Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
Chp 13 - Protein Structure Visualization,
Comparison & Classification
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison &
Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures
(see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But, very active research area - many recent new methods
3 Popular Methods:
• DALI = Distance Matrix Alignment of Structures (Holm)
  • FSSP Database
• SSAP = Sequential Structure Alignment Program (Orengo)
  • CATH Database
• CE = Combinatorial Extension (Bourne)
  • VAST at NCBI
URLs:
http://en.wikipedia.org/wiki/Structural_alignment_software
Another local example: Combining Structure Prediction,
Machine Learning & "Real" (wet-lab) Experiments to
Investigate the Lentiviral Rev Protein:
A Step Toward New HIV Therapies
Susan Carpenter
(Washington State Univ)
Wendy Sparks
Yvonne Wannemuehler
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Kai-Ming Ho, Physics
Yungok Ihm
Haibo Cao
Cai-zhuang Wang
Gloria Culver, BBMB
Laura Dutca
Macromolecular interactions mediated by
Rev protein in lentiviruses (HIV & EIAV)
[Figure: diagram of interactions mediated by Rev. Labeled steps: RNA BINDING (protein-RNA), MULTIMERIZATION (protein-protein), NUCLEAR EXPORT (protein-protein), NUCLEAR IMPORT (protein-protein). Other labels: Provirus, pre-mRNA, Spliceosome, Tat, Rev, Early: Regulatory Proteins, Late: Structural Proteins, Progeny RNA]
Susan Carpenter
Rev is essential for lentiviral replication
• Rev is a small nucleocytoplasmic shuttling protein
(HIV Rev 116 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA:
Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced
viral RNAs from nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization,
RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it
an attractive target for antiviral (AIDS) therapy
Problem: no high resolution Rev structure!
not even for HIV Rev, despite intense effort ($$)
• Why??
• Rev aggregates at concentrations needed for NMR or X-ray crystallography
• What about insights from sequence comparisons?
• "undetectable" sequence similarity among Revs from
different lentiviruses (eg, EIAV vs HIV <10%)
• But:
• We know that lentiviral Rev proteins are functionally
"homologous" - even in highly diverse lentiviruses
Hypothesis: Rev proteins from diverse lentiviruses
share structural features critical for function
Approach:
• Computationally model structures of lentiviral Rev proteins
- using structural threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction
- using machine learning algorithms (with Honavar et al )
• Test model and predictions
- using genetic/biochemical approaches (with Carpenter & Culver)
- using biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE
Functional domains: EIAV vs HIV Rev
[Figure: domain maps]
EIAV Rev (residues 1-165; exon 1, exon 2; position 31 marked): NES, RBM, Folding?, NLS; motifs RRDRW, ERLE, KRRRK
HIV-1 Rev (residues 1-116): NLS/RBM (RQARRNRRRRWR), NES
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
Predicted EIAV Rev Structure
Yungok Ihm
Comparison of Predicted Rev Structures
EIAV
FIV
SIV Dimer
HIV
HIV Dimer
Yungok Ihm
Predicted vs Experimental Structure of
N-terminal region of HIV Rev
A
Predicted Structure
HIV Rev
N-terminus
B
NMR Structure
HIV Rev N-terminal
Peptide
(Battiste & Williamson)
C
Overlay
Alignment of Predicted
& NMR Structures
Yungok Ihm
Location of functional residues in EIAV Rev?
• NES Leu36, 45, 49: On surface, consistent with role in nuclear export
• Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?
• Putative RBM
Yungok Ihm
Mutate hydrophobic residues predicted to be
critical for helical packing in core
L65 vs L95 & L109
Single mutants: Leu to Ala
                Leu to Asp
Double mutants: Leu to Ala
• Single Ala Mutation (L → A): Negligible effect on Rev activity
• Single Asp Mutation (L → D): Insert charged aa in hydrophobic core; Dramatic change in Rev activity?
• Double Ala Mutation (LL → AA): Reduction in Rev activity?
[Figure: predicted structure with L65, L95, L109 labeled]
Yungok Ihm
Activity of Rev Structural Mutants
Functional Analysis of Rev Structural
Mutants in vivo (CAT assay)
[Figure: bar chart, y-axis ticks 50, 100, 150. Controls: RI, pcDNA3, Sham. Single Mutations: L65, L95, L109 (each A and D). Double Mutations: L65A/L95A, L65A/L109A, L95A/L109A]
Wendy Sparks
Functional domains: EIAV vs HIV Rev
Red - RNA interaction
Green - Protein interaction
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
[Figure: domain maps]
EIAV Rev: RBM, Folding?, NES, NLS; motifs RRDRW, ERLE, KRRRK
HIV-1 Rev (residues 1-116): NLS/RBM (RQARRNRRRRWR), NES
Putative RNA-binding Motifs & Predicted
RNA-binding Residues Mapped onto
Predicted EIAV Rev Structure
ERLE
KRRRK
Yungok Ihm
RRDRW
61
71
81
91
ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
++ +++++++ ++++++++++
121
+
131
+
141
151
161
HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
+ ++++
++ +++ +++++++++++++++
Michael Terribilini
Express & purify MBP-ERev deletion mutants
[Figure: map of EIAV Rev (positions 1, 31, 57, 125, 146, 165 marked; regions NES, RBM, Folding?, NLS) with MBP-ERev deletion constructs 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165, and a gel with lanes Marker, MBP, and each construct; marker bands labeled 60, 42, 30, 22]
Jae-Hyung Lee
MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking of 32P-RRE (sense and antisense) with No protein, BSA, MBP, MBP-ERev 1-165 and 31-165; competition with Cold RRE vs No cold RRE; Undigested lane]
Jae-Hyung Lee
EIAV Rev: Binding Predictions vs Experiments
PREDICTED:
Structure
Protein binding residues
RNA binding residues
KRRRK
+
+
RRDRW
61
71
81
91
41
51
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++
++++++++
++
++++++++++++++++
131
141
151
161
VALIDATED:
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++
++
+++
++++++
Protein binding residues
+
++++++++++++++++++++
31-165
31-145
57-165
145-165
WT
MBP
RNA binding residues
57
31
RBM
FOLD?
145
165
NLS/RBM
NES
RRDRW
Jae-Hyung Lee
125
Lee et al (2006)
J Virol 80:3844
ERLE
KRRRK
Terribilini et al (2006)
PSB 11:415
Roles of Putative RNA Binding Motifs?
1
31
57
124
146
RBD
165
RBD
NES
NLS
RRDRW
ERLE
KRRRK
AADAA
AALA
KAAAK
ERDE
Jae-Hyung Lee
Rev RNA Binding Motifs: Predicted vs Experiment
PREDICTED:
Structure
KRRRK
+
+
Protein binding residues
RNA binding residues
RRDRW
61
71
81
91
41
51
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++
++++++++
++
++++++++++++++++
131
141
151
161
VALIDATED:
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++
++
+++
++++++
Protein binding residues
+
++++++++++++++++++++
KAAAK
WT
RNA binding residues
57
31
RBM
NES
125
145
165
NLS/RBM
NLS
RRDRW → AADAA
ERLE → AALA, ERDE
KRRRK → KAAAK
FOLD?
Jae-Hyung Lee
Summary: Predictions vs Experiments
KRRRK
ERLE
RRDRW
57
31
NES
125
RBM
RRDRW
FOLD
ERLE
145
165
NLS/RBM
KRRRK
61
71
81
91
41
51
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++
++++++++
++
++++++++++++++++
131
141
151
161
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++
++
+++
++++++
Lee et al (2006) Terribilini et al (2006)
+
++++++++++++++++++++
J Virol 80:3844 PSB 11:415
Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
• EIAV Rev has a bipartite RNA binding domain
• Two Arg-rich RBMs are critical
• RRDRW in central region
(but not ERLE)
• KRRRK at C-terminus, overlapping the NLS
• Based on computational modeling, the RBMs are in close proximity
within the 3-D structure of protein
• Lentiviral Rev proteins & their cognate RRE binding sites may be
more similar in structure than has been appreciated
Future:
Computational: Use Rev-RRE model system to discover
"predictive rules" for protein-RNA recognition
Experimental: Experimentally determine the structure of
Rev-RRE complex !!!
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Building “Designer” Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey
D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res
Chp 16 - RNA Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• Introduction
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation