#23 - Protein Tertiary Structure 10/15/07 Prediction BCB 444/544

BCB 444/544
Lecture 24
• Protein Tertiary Structure Prediction
(Terribilini)
#24_Oct17

Required Reading (before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112
BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction
10/17/07
1
New Reading & Homework Assignment
ALL: HomeWork #4
(emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read:
Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
(PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Sat Oct 13

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdeve Sidhu (Genentech) Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA
Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)
BCB 444/544 Fall 07 Dobbs
A Local Example:
Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004)
Three-dimensional threading approach to protein structure recognition
Polymer 45:687-697

Steps in Threading
Target Sequence: ALKKGF…HFDTSE
Structure Templates
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores
Simplify: Template structure representation
Template structure → C (N × N contact matrix)
Cij = 1, if rij ≤ 6.5 Å (contact)
Cij = 0, otherwise (non-contact; a neighbor in sequence)
Yungok Ihm

Simplify: Energy Function
• Interaction “counts” only if two hydrophobic amino acid residues are in contact
• At residue level, pair-wise hydrophobic interaction is dominant:
E = Σi,j Cij Uij
Cij : contact matrix
Uij = U(residue I, residue J)
MJ: U = Uij
LTW: U = Qi*Qj
HP: U = {1,0}
Yungok Ihm

Energy calculation: Contact energy
Miyazawa-Jernigan (MJ) matrix:
Statistical potential
210 parameters
[Figure: excerpt of the MJ matrix M, rows and columns labeled C, M, F, I, L, V, W]
Li-Tang-Wingreen (LTW):
$\tilde{M}_{ij} = C_2\{(q_i + \alpha)(q_j + \alpha) + \beta\}$
20 parameters
with qi ~ solubility, Qi ~ hydrophobicity
Contact Energy:
$E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$
Yungok Ihm

Summary of Ho Threading Procedure
Template Structure → Contact Matrix
Cij = 1, if rij < 6.5 Å
Cij = 0, otherwise (a neighbor in sequence)
Sequence → Sequence Vector
AVFMRIHNDIVYNDIANTTQ
S = (QA, QV, QF, ....., QE)
= (0.7997, 0.9897, 1.1197, ....., 0.6497)
Qi = − qi − α
Scoring Function
α = −0.6797, β = −0.2604
Contact Energy
$E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$
Yungok Ihm
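The contact matrix and contact energy above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Ho group's code: the function names and the toy coordinates are hypothetical, the 6.5 Å cutoff and β come from the slides, and zeroing out sequence neighbors (|i − j| ≤ 1) is an assumption about what "(a neighbor in sequence)" means. The four Q values are the ones shown in the sequence vector.

```python
import numpy as np

BETA = -0.2604  # beta from the slide's scoring function


def contact_matrix(coords, cutoff=6.5):
    """Return C with C[i, j] = 1 if residues i and j are closer than `cutoff` angstroms.

    Assumption: self pairs and sequence neighbors (|i - j| <= 1) are set to 0.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    c = (dist < cutoff).astype(float)
    idx = np.arange(len(coords))
    c[np.abs(idx[:, None] - idx[None, :]) <= 1] = 0.0
    return c


def contact_energy(q, c, beta=BETA):
    """E_c = sum over i, j of (Q_i * C_ij * Q_j + beta * C_ij)."""
    q = np.asarray(q, dtype=float)
    return float(q @ c @ q + beta * c.sum())


# Hypothetical 4-residue template (one point per residue, in angstroms).
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]]
Q = [0.7997, 0.9897, 1.1197, 0.6497]  # the four Q values shown on the slide

C = contact_matrix(coords)
print(C)
print(contact_energy(Q, C))
```

Because the sum runs over all ordered pairs (i, j), each contact is counted twice, once as (i, j) and once as (j, i).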
Can complexity be further reduced?
Consider simplifying structure representation, too
ALKKGF…HFDTSE
Sequence – Structure (1D – 3D problem)
Sequence – Contact Matrix (1D – 2D problem)
Sequence – 1D Profile (1D – 1D problem)
Haibo Cao

Represent contact matrix by its dominant eigenvector (1D profile)
• First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
• Higher ranking (rank > 4) eigenvectors are “sequence blind”
Haibo Cao
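Reducing a contact matrix to a 1D profile can be sketched as an eigen-decomposition. The snippet below is an illustrative sketch only: the function name and the 4 × 4 toy contact matrix are hypothetical, and `numpy.linalg.eigh` is used because a contact matrix is symmetric.

```python
import numpy as np


def dominant_eigenvector(c):
    """Return (largest eigenvalue, matching unit eigenvector) of a symmetric matrix."""
    c = np.asarray(c, dtype=float)
    vals, vecs = np.linalg.eigh(c)  # eigenvalues come back in ascending order
    v = vecs[:, -1]
    if v.sum() < 0:  # the sign of an eigenvector is arbitrary; pick the positive one
        v = -v
    return float(vals[-1]), v


# Hypothetical contact matrix for a 4-residue template.
C = np.array([
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

lam, profile = dominant_eigenvector(C)
print(lam)      # largest eigenvalue
print(profile)  # 1D profile: one number per template residue
```

Residues with many contacts get large profile entries, so the profile behaves like a per-residue "buriedness" score that a sequence vector can be compared against.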
Threading Alignment Step - now fast!
Align target sequence vector (1D) with eigenvector profile of template structure (1D)
1D Profile P = V1
ALKKGFG…HFDTSE
Maximize the overlap between the Sequence (S) and the profile (P):
S • P allowing gaps
New profile P = CP
Calculate contact energy using the alignment: Ec
Cao et al
Polymer 45 (2004)

Parameters for alignment?
• Gap penalty:
Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops
Gap penalties apply to alignment score only, not to energy calculation
• Size penalty:
If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized
Size penalties apply to alignment score only, not to energy calculation
Yungok Ihm
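A gapped 1D-to-1D alignment of this kind can be sketched with a standard global dynamic-programming recurrence. This is a simplified stand-in, not the published method: the function name, the 'H'/'E'/'L' secondary-structure codes, and the penalty values are hypothetical, and the size penalty and the profile update P = CP are left out. Gap penalties affect only the alignment score, as the slide states.

```python
import numpy as np


def align_overlap(s, p, ss, gap_core=2.0, gap_loop=0.5):
    """Global alignment maximizing the sum of S_i * P_j over aligned pairs, minus gaps.

    s  : sequence vector of the target (one value per residue)
    p  : 1D profile of the template (one value per residue)
    ss : template secondary structure, 'H'/'E' = helix/strand, anything else = loop
    gap_core, gap_loop : hypothetical penalty values (larger inside helices/strands)
    """
    n, m = len(s), len(p)
    gap = [gap_core if ch in "HE" else gap_loop for ch in ss]
    score = np.zeros((n + 1, m + 1))
    for j in range(1, m + 1):
        score[0, j] = score[0, j - 1] - gap[j - 1]
    for i in range(1, n + 1):
        score[i, 0] = score[i - 1, 0] - gap[0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1, j - 1] + s[i - 1] * p[j - 1]
            skip_target = score[i - 1, j] - gap[j - 1]    # target residue unaligned
            skip_template = score[i, j - 1] - gap[j - 1]  # template position unaligned
            score[i, j] = max(match, skip_target, skip_template)
    # Trace back to recover the aligned (target index, template index) pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + s[i - 1] * p[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap[j - 1]):
            i -= 1
        else:
            j -= 1
    return float(score[n, m]), pairs[::-1]


# Toy example: the middle target residue is skipped, at loop-gap cost.
print(align_overlap([1.0, 0.0, 1.0], [1.0, 1.0], "LL"))
```

The aligned pairs returned by the traceback are what a contact-energy calculation would then use; the gap penalties themselves do not enter that energy.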
How to incorporate secondary structure?
• Predict secondary structure of target sequence
(PSIPRED, PROF, JPRED, SAM, GOR V)
N+ = total number of matches between predicted & actual secondary structure of template
N- = total number of mismatches
Ns = total number of residues selected in alignment
“Global fitness”:
f = 1 + (N+ - N-) / Ns
E score modified to reflect fit with predicted 2° structure:
Emod = f * Ethreading
Yungok Ihm

How much better is this “fit” than random?
Eshuffle : Shuffled Sequence vs Structure
Avg E score for same sequence shuffled (randomized) many times
Erelative = Emod – Eshuffled
Yungok Ihm
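The secondary-structure correction and the shuffled baseline are simple arithmetic, sketched below. All function names are hypothetical, and `energy_fn` stands for any function that scores a sequence against a fixed template; the toy scorer used in the example is a placeholder, not a real threading energy.

```python
import random


def global_fitness(n_match, n_mismatch, n_selected):
    """f = 1 + (N+ - N-) / Ns."""
    return 1.0 + (n_match - n_mismatch) / n_selected


def modified_energy(e_threading, n_match, n_mismatch, n_selected):
    """E_mod = f * E_threading."""
    return global_fitness(n_match, n_mismatch, n_selected) * e_threading


def shuffled_energy(seq, energy_fn, n_shuffles=100, seed=0):
    """Average energy of `n_shuffles` randomly shuffled copies of `seq`."""
    rng = random.Random(seed)
    residues = list(seq)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        total += energy_fn("".join(residues))
    return total / n_shuffles


def relative_energy(e_mod, e_shuffled):
    """E_relative = E_mod - E_shuffled."""
    return e_mod - e_shuffled


# Toy example: 8 matches and 2 mismatches over 10 aligned residues.
e_mod = modified_energy(-10.0, n_match=8, n_mismatch=2, n_selected=10)
baseline = shuffled_energy("ALKKGF", lambda s: -float(s.count("L")))
print(e_mod, baseline, relative_energy(e_mod, baseline))
```

With negative (favorable) threading energies, f > 1 makes Emod more negative when the predicted secondary structure agrees with the template, and f < 1 weakens it when it disagrees.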
Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure
(before experimental results published)

Typical Results:
(well, actually, our BEST Results):
HO = #1-Ranked CASP5 Prediction for this Target
• Target 174
• PDB ID = 1MG7
[Figure: Predicted Structure and Actual Structure; panels T174_1 and T174_2]
Overall Performance in CASP5 Contest
~8th out of 180 (M. Levitt, Stanford)
Cao, Ihm, Wang, Dobbs, Ho
FR Fold Recognition
(targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00    7.00  7      7    Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00    6.00  3      6    Ho-Kai-Ming
9     10.45    3.00    5.50  3      6    Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models

CASP - Check it out!
Critical Assessment of Protein Structure Prediction
http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
  • http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
  • Protein Function Prediction (part of CASP)
  • CAPRI = Critical Assessment of Predicted Interactions
  • New: CASPM = CASP for M = Mutant proteins
    • Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
#23 - Protein Tertiary Structure
Prediction
10/15/07
Another local example: Combining Structure Prediction,
Machine Learning & "Real" (wet-lab) Experiments to
Investigate the Lentiviral Rev Protein:
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures
A Step Toward New HIV Therapies
(see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But, very active research area - many recent new methods
Susan Carpenter
(Washington State Univ)
Wendy Sparks
Yvonne Wannemuehler
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Kai-Ming Ho, Physics
Yungok Ihm
Haibo Cao
Cai-zhuang Wang
Gloria Culver, BBMB
Laura Dutca
3 Popular Methods:
DALI = Distance Matrix Alignment of Structures (Holm)
• FSSP Database
SSAP = Sequential Structure Alignment Program (Orengo)
• CATH Database
CE = Combinatorial Extension (Bourne)
• VAST at NCBI
•
•
•
http://en.wikipedia.org/wiki/Structural_alignment_software
URLS:
BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction
10/17/07
25
Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV)
[Figure: Rev pathway between Nucleus and Cytoplasm. Labels: Provirus, pre-mRNA, Spliceosome, Tat, Rev, Early: Regulatory Proteins, Late: Structural Proteins, Progeny RNA. Interactions: RNA BINDING (protein-RNA), MULTIMERIZATION (protein-protein), NUCLEAR EXPORT (protein-protein), NUCLEAR IMPORT (protein-protein)]
Susan Carpenter

Rev is essential for lentiviral replication
• Rev is a small nucleoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA: Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced viral RNAs from nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

Problem: no high resolution Rev structure!
not even for HIV Rev, despite intense effort ($$)
• Why??
  • Rev aggregates at concentrations needed for NMR or X-ray crystallography
• What about insights from sequence comparisons?
  • "undetectable" sequence similarity among Revs from different lentiviruses (eg, EIAV vs HIV <10%)
• But:
  • We know that lentiviral Rev proteins are functionally "homologous" - even in highly diverse lentiviruses

Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
• Computationally model structures of lentiviral Rev proteins
  - using structural threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction
  - using machine learning algorithms (with Honavar et al)
• Test model and predictions
  - using genetic/biochemical approaches (with Carpenter & Culver)
  - using biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE
Functional domains: EIAV vs HIV Rev
[Figure: domain maps. EIAV Rev: exon 1, exon 2; positions 1, 31, 165; regions NES, RBM, Folding?, NLS; motifs RRDRW, ERLE, KRRRK. HIV-1 Rev: positions 1, 116; regions NLS/RBM, NES; motif RQARRNRRRRWR]
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif

Predicted EIAV Rev Structure
[Figure: predicted structure]
Yungok Ihm

Comparison of Predicted Rev Structures
[Figure panels: EIAV, FIV, SIV Dimer, HIV, HIV Dimer]
Yungok Ihm

Predicted vs Experimental Structure of N-terminal region of HIV Rev
[Figure panels A, B, C: Predicted Structure (HIV Rev N-terminus); NMR Structure (HIV Rev N-terminal Peptide, Battiste & Williamson); Overlay (Alignment of Predicted & NMR Structures)]
Yungok Ihm

Location of functional residues EIAV Rev?
Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?
Leu36,45,49: On surface, consistent with role in nuclear export
[Figure labels: NES, Putative RBM]
Yungok Ihm

Mutate hydrophobic residues predicted to be critical for helical packing in core
L65 vs L95 & L109
Single mutants: Leu to Ala, Leu to Asp
Double mutants: Leu to Ala
• Single Ala Mutation (L ⇒ A): Negligible effect on Rev activity
• Single Asp Mutation (L ⇒ D): Insert charged aa in hydrophobic core. Dramatic change in Rev activity?
• Double Ala Mutation (L⇔L ⇒ A⇔A): Reduction in Rev activity?
[Figure labels: L65, L95, L109]
Yungok Ihm
Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
[Figure: Activity of Rev Structural Mutants (axis ticks 50, 100, 150). Single Mutations: L65 (A, D), L95 (A, D), L109 (A, D). Double Mutations: L65A L95A, L65A L109A, L95A L109A. Controls: RI, Sham, pcDNA3]
Wendy Sparks

Functional domains: EIAV vs HIV Rev
Red - RNA interaction
Green - Protein interaction
[Figure: domain maps. EIAV Rev: regions RBM, Folding?, NES, NLS; motifs RRDRW, ERLE, KRRRK. HIV-1 Rev: positions 1, 116; regions NLS/RBM, NES; motif RQARRNRRRRWR]
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif

Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure
[Figure: predicted structure and sequence with predicted RNA-binding residues marked (+); motifs RRDRW, ERLE, KRRRK; position ticks 61, 71, 81, 91, 121, 131, 141, 151, 161]
ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
Michael Terribilini
Yungok Ihm

Express & purify MBP-ERev deletion mutants
[Figure: MBP-ERev constructs 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165, with regions NES, RBM, Folding?, NLS and positions 1, 31, 57, 125, 146, 165; gel with Marker sizes 60, 42, 30, 22]
Jae-Hyung Lee

MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking and Competition panels. Lane labels: No protein, MBP, BSA, 1-165, 31-165, sense, antisense, No cold RRE, Cold RRE, WT, 31-145, 57-165, 145-165, Undigested, 32P-RRE]
Jae-Hyung Lee

EIAV Rev: Binding Predictions vs Experiments
[Figure: domain map (positions 31, 57, 125, 145, 165; regions NES, RBM, FOLD?, NLS/RBM; motifs RRDRW, ERLE, KRRRK) above sequence tracks marked (+). PREDICTED: Structure, Protein binding residues, RNA binding residues. VALIDATED: Protein binding residues, RNA binding residues]
Position ticks 41, 51, 61, 71, 81, 91:
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
Position ticks 131, 141, 151, 161:
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
Jae-Hyung Lee
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Roles of Putative RNA Binding Motifs?
[Figure: Rev constructs (positions 1, 31, 57, 124, 146, 165; regions RBD, NES, NLS) carrying wild-type motifs RRDRW, ERLE, KRRRK (WT) or mutant motifs AADAA, AALA, ERDE, KAAAK; ∅ marks]
Jae-Hyung Lee

Rev RNA Binding Motifs: Predicted vs Experiment
[Figure: sequence tracks marked (+). PREDICTED: Structure, Protein binding residues, RNA binding residues. VALIDATED: Protein binding residues, RNA binding residues. Motifs RRDRW, ERLE, KRRRK]
Position ticks 41, 51, 61, 71, 81, 91:
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
Position ticks 131, 141, 151, 161:
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
Jae-Hyung Lee
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Summary: Predictions vs Experiments
Combination of computational & wet lab approaches revealed that:
• EIAV Rev has a bipartite RNA binding domain
• Two Arg-rich RBMs are critical
  • RRDRW in central region (but not ERLE)
  • KRRRK at C-terminus, overlapping the NLS
[Figure: domain maps (positions 31, 57, 125, 145, 165; regions NES, RBM, FOLD?, NLS/RBM) with wild-type motifs RRDRW, ERLE, KRRRK and mutant motifs AADAA, AALA, ERDE, KAAAK; sequence tracks marked (+) as on the previous slide]
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Conclusions & Future Directions
• Based on computational modeling, the RBMs are in close proximity within the 3-D structure of protein
• Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Future:
Computational: Use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
Experimental? Experimentally determine the structure of Rev-RRE complex !!!

Building “Designer” Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey
D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res
Chp 16 - RNA Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation

RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions - RNAi pathways especially
• Catalytic
RNA types & functions
Types of RNAs                          Primary Function(s)
mRNA - messenger                       translation (protein synthesis); regulatory
rRNA - ribosomal                       translation (protein synthesis) <catalytic>
t-RNA - transfer                       translation (protein synthesis)
hnRNA - heterogeneous nuclear          precursors & intermediates of mature mRNAs & other RNAs
scRNA - small cytoplasmic              signal recognition particle (SRP); tRNA processing <catalytic>
snRNA - small nuclear                  mRNA processing, poly A addition <catalytic>
snoRNA - small nucleolar               rRNA processing/maturation/methylation
regulatory RNAs (siRNA, miRNA, etc.)   regulation of transcription and translation, other??

RNA Structure
• RNA forms complex 3D structures
• Mainly single stranded
• The single RNA strand can self-hybridize to form base paired regions

Levels of RNA Structure
• Like proteins, RNA has primary, secondary, and tertiary structures
• Primary structure - base sequence
• Secondary structure - single stranded or base paired
• Tertiary structure - 3D structure
Rob Knight
Univ Colorado

RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure
• Given an RNA sequence, predict the secondary structure of the molecule
• Almost all methods ignore higher order secondary structures like pseudoknots
Base Pairing in RNA
G-C, A-U, G-U ("wobble") & variants
See: IMB Image Library of Biological Molecules
http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs

Common structural motifs in RNA
• Helices
• Loops
  • Hairpin
  • Interior
  • Bulge
  • Multibranch
• Pseudoknots
Fig 6.2
Baxevanis & Ouellette 2005
RNA Secondary Structure Prediction Methods
• Two main types of methods
  • Ab initio - based on calculating the most energetically favorable secondary structure
  • Comparative approach - based on evolutionary comparison of multiple related RNA sequences

Ab Initio Prediction
• Only requires a single RNA sequence
• Calculates minimum free energy structure
• Base pairing lowers free energy of the structure, so methods attempt to find secondary structure with maximal base pairing

Ab Initio Energy Calculation Method
• Free energy is calculated based on parameters determined in the wet lab
• Known energy associated with each type of base pair
• Base pair formation is not independent - multiple base pairs adjacent to each other are more favorable than individual base pairs - cooperative
• Bulges and loops adjacent to base pairs have a free energy penalty

Ab Initio Prediction
• Search for all possible base-pairing patterns
• Calculate the total energy of the structure based on all stabilizing and destabilizing forces
Fig 6.3
Baxevanis & Ouellette 2005
Dot Matrices
• Can be used to find all possible base pair patterns
• Compare the input sequence to itself and put a dot anywhere there is a complementary base
R Knight 2005

Dynamic Programming
• Finding the best possible secondary structure is difficult - lots of possibilities
• Compare RNA sequence with itself
• Apply scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces
• Find path that represents the most energetically favorable secondary structure
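Both ideas can be sketched together: a self-comparison dot matrix marks every position pair that could form a base pair, and a dynamic-programming pass then picks the best nested set. The sketch below uses the classic Nussinov recurrence, which simply maximizes the number of base pairs; that is a deliberate simplification of the energy-based scoring described above, not what Mfold or RNAfold compute. The function names and the minimum hairpin loop size of 3 are assumptions.

```python
# Allowed pairs: G-C, A-U, and the G-U "wobble" pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def dot_matrix(seq):
    """dots[i][j] is True wherever bases i and j of the same sequence can pair."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in `seq`.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    n = len(seq)
    if n == 0:
        return 0
    dots = dot_matrix(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if dots[k][j]:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

Replacing the "+ 1" per pair with stacking energies and loop penalties turns the same table-filling scheme into an energy minimization, which is the step the real programs add.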
Problem
• DP returns the SINGLE best structure
• There may be many structures with similar energies
• Also, your predicted secondary structure is only as good as the energy parameters used
• Solution - return multiple structures with near optimal energies

Popular Ab Initio Prediction Programs
• Mfold
  • Combines DP with thermodynamic calculations
  • Fairly accurate for short sequences, less accurate as sequence length increases
• RNAfold
  • Returns multiple structures near the optimal structure
  • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function

Comparative Approach
• Uses multiple sequence alignment
• Assumes related sequences fold into the same secondary structure

Covariation
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base paired residue must be compensated for by a mutation in the base that it pairs with
• Comparative methods search for covariation patterns in MSA
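One common way to search an alignment for covariation is to compute the mutual information between every pair of columns; columns that change together score high. The sketch below illustrates that idea under stated assumptions: the function names, the 1-bit threshold, and the toy alignment are hypothetical, the sequences must be pre-aligned and of equal length, and the score does not check that the covarying bases are actually complementary.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


def covarying_columns(msa, threshold=1.0):
    """Return (i, j, MI) for every column pair whose MI reaches `threshold` bits."""
    columns = list(zip(*msa))
    hits = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            mi = mutual_information(columns[i], columns[j])
            if mi >= threshold:
                hits.append((i, j, mi))
    return hits


# Toy alignment: the first and last columns change together (G-C, C-G, A-U, U-A).
msa = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(covarying_columns(msa))
```

Fully conserved columns score zero, so a conserved stem is invisible to this measure; covariation only appears where compensating mutations have actually occurred.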
Consensus Structures
• Predict secondary structure of each individual sequence
• Compare all structures and see if there is a most common structure

Popular Comparative Prediction Programs
• Two types
  • Require user to provide MSA
  • No MSA required

RNAalifold
• Requires user to provide the MSA
• Creates a scoring matrix combining minimum free energy and covariation information
• DP is used to select the minimum free energy structure

Foldalign
• User provides a pair of unaligned RNA sequences
• Foldalign constructs alignment then computes a commonly conserved structure
• Suitable only for short sequences

Dynalign
• User provides two input sequences
• Dynalign calculates possible secondary structures using algorithm similar to Mfold
• Dynalign compares multiple structures from both sequences to find a common structure

Performance Evaluation
• Ab initio methods achieve correlation coefficient of 20-60%
• Comparative approaches achieve correlation coefficient of 20-80%
• Programs that require user to supply MSA are more accurate
• Comparative programs are consistently more accurate than ab initio programs
• Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures!
- Gutell, Pace
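A correlation coefficient between a predicted and a reference structure can be computed by treating every possible position pair as a yes/no prediction. The sketch below assumes the coefficient meant is the Matthews correlation coefficient, which the slides do not state, and it assumes structures written in dot-bracket notation; the function names are hypothetical.

```python
import math


def pair_set(dot_bracket):
    """Convert a dot-bracket string such as '((...))' into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def base_pair_mcc(predicted, reference):
    """Matthews correlation coefficient over all possible position pairs."""
    n = len(reference)
    pred, ref = pair_set(predicted), pair_set(reference)
    total = n * (n - 1) // 2  # number of possible (i, j) pairs with i < j
    tp = len(pred & ref)
    fp = len(pred - ref)
    fn = len(ref - pred)
    tn = total - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(base_pair_mcc("((...))", "((...))"))  # identical structures
print(base_pair_mcc("(.....)", "((...))"))  # one of two reference pairs recovered
```

A value of 1 means every pair was predicted correctly with no extras, and values near 0 mean the prediction is no better than chance.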