BCB 444/544 Protein Tertiary Structure Prediction Lecture 24

BCB 444/544
Lecture 24
Protein Tertiary Structure Prediction
#24_Oct17
BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction
10/17/07
Required Reading
(before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8
(Terribilini)
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112
New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read:
Ginalski et al. (2005) Practical Lessons from Protein Structure
Prediction. Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
(PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of
protein structure prediction methods and evaluation of predicted
structures.
• Your assignment is to write a summary of this paper - for details
see HW#4 posted online & sent by email on Sat Oct 13
Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
• Sachdev Sidhu (Genentech): Phage peptide and antibody
libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
• Lyric Bartholomay (Ent, ISU) TBA
Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)
Steps in Threading
Target sequence ALKKGF…HFDTSE is threaded onto structure templates:
1. Align target sequence with template structures
in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit"
between target sequence & template structure
3. Rank models based on energy scores
Rapid Threading Approach for
Protein Structure Prediction
A Local Example:
Kai-Ming Ho, Physics
Haibo Cao
Yungok Ihm
Zhong Gao
James Morris
Cai-zhuang Wang
Drena Dobbs, GDCB
Jae-Hyung Lee
Michael Terribilini
Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004)
Three-dimensional threading approach to protein structure recognition
Polymer 45:687-697
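Simplify: Template structure representation
Template structure (residues 1 … i … j … N) is represented by C, an N × N contact matrix:
$C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact)
$C_{ij} = 0$ otherwise (non-contact)
[Figure: template chain showing a contact between residues i and j, and a neighbor in sequence]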
Yungok Ihm
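As a minimal sketch (not the Ho group's code), the contact matrix can be built from one representative 3D point per residue. Only the 6.5 Å cutoff comes from the slide. The choice of representative point (e.g. C-alpha) and the exclusion of residues fewer than 2 positions apart in sequence (`min_separation`) are assumptions for illustration.

```python
import math


def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """Build a symmetric N x N 0/1 contact matrix.

    coords: one (x, y, z) point per residue (assumed, e.g. C-alpha).
    cutoff: contact distance in angstroms (6.5 A, as on the slide).
    min_separation: hypothetical rule; pairs with |i - j| < min_separation
    (chain neighbors) are never counted as contacts."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1  # contact i-j is the same as j-i
    return c


# Toy 4-residue chain that folds back on itself (3.8 A steps).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
C = contact_matrix(coords)
for row in C:
    print(row)
# [0, 0, 1, 1]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
# [1, 1, 0, 0]
```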
Simplify: Energy Function
• Interaction "counts" only if two hydrophobic amino acid
residues are in contact
• At the residue level, pair-wise hydrophobic interaction is
dominant:
$E = \sum_{i,j} C_{ij} U_{ij}$
$C_{ij}$: contact matrix
$U_{ij} = U(\text{residue } i, \text{residue } j)$
MJ: $U = U_{ij}$
LTW: $U = Q_i Q_j$
HP: $U \in \{1, 0\}$
Yungok Ihm
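The energy sum can be sketched in a few lines, using the HP model as the pair potential $U$. The hydrophobic residue set `HYDROPHOBIC` and the function names are hypothetical choices for illustration. Replacing `hp_u` with a lookup into a 20 × 20 table would give an MJ-style energy instead.

```python
# Hypothetical hydrophobic set for the HP model (an assumption, not from the slide).
HYDROPHOBIC = set("AVILMFWC")


def hp_u(a, b):
    """HP pair potential: U = 1 if both residues are hydrophobic, else 0."""
    return 1 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0


def contact_energy(seq, c, u):
    """E = sum over all i, j of C_ij * U(seq[i], seq[j]).

    The sum runs over ordered pairs (i, j), as written on the slide,
    so each contact of a symmetric C is counted twice."""
    n = len(seq)
    return sum(c[i][j] * u(seq[i], seq[j]) for i in range(n) for j in range(n))


# Toy contact matrix with contacts 0-2, 0-3 and 1-3.
C = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 0]]
print(contact_energy("VKLA", C, hp_u))  # 4: V-L and V-A count (twice each), K-A does not
```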
Energy calculation: Contact energy
Miyazawa-Jernigan (MJ) matrix:
• Statistical potential
• 210 parameters
[Excerpt of the 20 × 20 MJ contact-energy matrix for residues C, M, F, I, L, V, W]
Li-Tang-Wingreen (LTW):
• 20 parameters
• $M_{ij} \approx C_2\{(q_i + \alpha)(q_j + \alpha) + \beta\}$, with $\alpha = 0.6797$, $\beta = 0.2604$
• $q_i$ ~ solubility; $Q_i = q_i + \alpha$ ~ hydrophobicity
Contact energy:
$E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$, with $C$ = contact matrix
Yungok Ihm
Summary of Ho Threading Procedure
Template structure → contact matrix (residues 1 … i … j … N):
$C_{ij} = 1$ if $r_{ij} < 6.5$ Å
$C_{ij} = 0$ otherwise (a neighbor in sequence)
Sequence AVFMRIHNDIVYNDIANTTQ → sequence vector:
$S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997, 0.9897, 1.1197, \ldots, 0.6497)$
Scoring function (contact energy):
$E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$
Yungok Ihm
Can complexity be further reduced?
Consider simplifying the structure representation, too (target ALKKGF…HFDTSE):
• Sequence – Structure (1D – 3D problem)
• Sequence – Contact Matrix (1D – 2D problem)
• Sequence – 1D Profile (1D – 1D problem)
Haibo Cao
Represent contact matrix by its dominant
eigenvector (1D profile)
• First eigenvector (with highest eigenvalue) dominates the overlap
between sequence and structure
• Higher ranking (rank > 4) eigenvectors are “sequence blind”
Haibo Cao
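The slide does not say how the dominant eigenvector is computed. As an assumption, this sketch uses power iteration on $C + I$. Adding $I$ leaves the eigenvectors unchanged but avoids the oscillation that plain power iteration can show when $C$ has eigenvalues $+\lambda$ and $-\lambda$ of equal magnitude.

```python
import math


def dominant_eigenvector(c, iters=1000, tol=1e-12):
    """Unit eigenvector for the largest eigenvalue of a symmetric,
    non-negative contact matrix C, via power iteration on (C + I)."""
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n  # positive start vector
    for _ in range(iters):
        # w = (C + I) v
        w = [v[i] + sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Contacts 0-1 and 1-2: the dominant eigenvector is (1, sqrt(2), 1) / 2.
C = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print([round(x, 4) for x in dominant_eigenvector(C)])  # [0.5, 0.7071, 0.5]
```

The central residue, which has the most contacts, gets the largest profile weight. That is the sense in which the eigenvector acts as a 1D "burial" profile of the template.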
Threading Alignment Step - now fast!
Align target sequence vector (1D) with
eigenvector profile of template structure (1D):
• 1D profile $P = V_1$
• Maximize the overlap between the sequence (S) and the profile (P): $S \cdot P$, allowing gaps
• New profile $P = C P$
• Calculate contact energy $E_c$ using the alignment
Cao et al., Polymer 45 (2004)
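A minimal sketch of the gapped alignment step. It assumes a global, Needleman-Wunsch-style dynamic program whose match score is the product $s_i p_j$, so the total is the overlap $S \cdot P$ over aligned positions. It also assumes a single linear gap penalty `gap` with a hypothetical value. The secondary-structure-dependent gap penalties described on the next slide are not modeled here.

```python
def align_overlap(s, p, gap=0.5):
    """Align sequence vector s to profile p, maximizing the sum of s[i] * p[j]
    over aligned pairs minus gap * (number of gaps). Returns (score, pairs)."""
    n, m = len(s), len(p)
    # f[i][j] = best score aligning s[:i] with p[:j]
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = -gap * i
    for j in range(1, m + 1):
        f[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + s[i - 1] * p[j - 1],
                          f[i - 1][j] - gap,   # target residue opposite a gap
                          f[i][j - 1] - gap)   # template position left empty
    # Traceback to recover aligned (target index, template index) pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if f[i][j] == f[i - 1][j - 1] + s[i - 1] * p[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif f[i][j] == f[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return f[n][m], pairs[::-1]


score, pairs = align_overlap([1.0, 2.0], [1.0, 0.0, 2.0])
print(score, pairs)  # 4.5 [(0, 0), (1, 2)]
```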
Parameters for alignment?
• Gap penalty: insertions/deletions in helices or strands are strongly penalized; in/dels in loops receive smaller penalties
Gap penalties apply to the alignment score only, not to the energy calculation
• Size penalty: if a target residue and its aligned template residue differ in radius by > 0.5 Å and the residue is involved in > 2 contacts, the alignment is penalized
Size penalties apply to the alignment score only, not to the energy calculation
[Figure: target ALKKGFG…HFDTSE aligned to a template, with loop and helix regions marked]
Yungok Ihm
How to incorporate secondary structure?
• Predict secondary structure of target sequence
(PSIPRED, PROF, JPRED, SAM, GOR V)
$N_+$ = total number of matches between predicted (target)
& actual (template) secondary structure
$N_-$ = total number of mismatches
$N_s$ = total number of residues selected in alignment
"Global fitness":
$f = 1 + (N_+ - N_-) / N_s$
$E_{mod} = f \cdot E_{threading}$
Yungok Ihm
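The global fitness is simple arithmetic, sketched here under one assumption: every aligned residue counts as either a match or a mismatch, so $N_+ + N_- = N_s$ and $f$ ranges from 0 to 2. With favorable (negative) threading energies, good secondary-structure agreement ($f > 1$) makes $E_{mod}$ more negative. The threading energy below is a made-up number.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns over aligned residues.

    predicted_ss[k] and template_ss[k] are secondary-structure labels
    (e.g. H, E, C) of the k-th aligned target/template residue pair."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("need equal-length, non-empty aligned labels")
    n_s = len(predicted_ss)
    n_plus = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_minus = n_s - n_plus
    return 1 + (n_plus - n_minus) / n_s


f = global_fitness("HHHEEC", "HHHECC")  # N+ = 5, N- = 1, Ns = 6
e_threading = -10.0                      # hypothetical threading energy
e_mod = f * e_threading
print(round(f, 4), round(e_mod, 4))     # 1.6667 -16.6667
```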
How much better is this "fit" than random?
$E_{relative} = E_{mod} - E_{shuffled}$
• $E_{mod}$: E score modified to reflect fit with predicted secondary (2°) structure
• $E_{shuffled}$: average E score for the same sequence shuffled (randomized) many times and threaded onto the same structure
Yungok Ihm
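A sketch of the shuffle baseline, assuming lower energy means a better fit. `energy_fn` is a hypothetical callable that threads a sequence onto the fixed template and returns its energy. `toy_energy` is a made-up stand-in used only to make the example runnable. Shuffling keeps the amino acid composition and destroys the order, so a negative $E_{relative}$ means the real order fits better than random.

```python
import random


def relative_energy(seq, energy_fn, e_mod, n_shuffles=100, seed=0):
    """E_relative = E_mod - E_shuffled, where E_shuffled is the average
    energy of `seq` shuffled n_shuffles times on the same template."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    letters = list(seq)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(letters)   # same composition, random order
        total += energy_fn("".join(letters))
    e_shuffled = total / n_shuffles
    return e_mod - e_shuffled


def toy_energy(s):
    """Hypothetical energy: -1 per adjacent pair of hydrophobic residues."""
    return -sum(1 for a, b in zip(s, s[1:]) if a in "VLI" and b in "VLI")


native = "VVLLKK"
print(toy_energy(native))  # -3
print(relative_energy(native, toy_energy, toy_energy(native)))  # negative: better than random
```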
Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure
(before experimental results published)
Typical Results (well, actually, our BEST results):
HO = #1-ranked CASP5 prediction for this target
• Target 174
• PDB ID = 1MG7
[Figure: predicted structure (T174_1) vs. actual structure (T174_2)]
Cao, Ihm, Wang, Dobbs, Ho
Overall Performance in CASP5 Contest
~8th out of 180 (M. Levitt, Stanford)
FR = Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00    7.00  7      7    Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00    6.00  3      6    Ho-Kai-Ming
9     10.45    3.00    5.50  3      6    Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
CASP - Check it out!
Critical Assessment of Protein Structure Prediction
http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
• http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein
structure prediction (LiveBench, CAFASP, EVA)
& URLs for them
• Related contests & resources:
• Protein Function Prediction (part of CASP)
• CAPRI = Critical Assessment of Predicted Interactions
• New: CASPM = CASP for M = Mutant proteins
• Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for
Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
Chp 13 - Protein Structure Visualization,
Comparison & Classification
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison &
Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures
(see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But this is a very active research area, with many recent new methods
3 Popular Methods:
• DALI = Distance Matrix Alignment of Structures (Holm)
  • FSSP Database
• SSAP = Sequential Structure Alignment Program (Orengo)
  • CATH Database
• CE = Combinatorial Extension (Bourne)
  • VAST at NCBI
URLS:
http://en.wikipedia.org/wiki/Structural_alignment_software
Chp 16 - RNA Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation
RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions - RNAi
pathways especially
• Catalytic
RNA types & functions
Type of RNA                            Primary function(s)
mRNA - messenger                       translation (protein synthesis); regulatory
rRNA - ribosomal                       translation (protein synthesis) <catalytic>
tRNA - transfer                        translation (protein synthesis)
hnRNA - heterogeneous nuclear          precursors & intermediates of mature mRNAs & other RNAs
scRNA - small cytoplasmic              signal recognition particle (SRP); tRNA processing <catalytic>
snRNA - small nuclear                  mRNA processing, poly A addition <catalytic>
snoRNA - small nucleolar               rRNA processing/maturation/methylation
regulatory RNAs (siRNA, miRNA, etc.)   regulation of transcription and translation, other??
RNA Structure
• RNA forms complex 3D structures
• Mainly single stranded
• The single RNA strand can self-hybridize to form
base paired regions
Levels of RNA Structure
Rob Knight
Univ Colorado
• Like proteins, RNA has primary, secondary, and tertiary
structures
• Primary structure - base sequence
• Secondary structure - single stranded or base paired
• Tertiary structure - 3D structure
RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure
• Given an RNA sequence, predict the secondary
structure of the molecule
• Almost all methods ignore higher-order secondary
structures such as pseudoknots
Base Pairing in RNA
G-C, A-U, G-U ("wobble") & variants
See: IMB Image Library of Biological Molecules
http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs
Common structural motifs in RNA
• Helices
• Loops
  • Hairpin
  • Interior
  • Bulge
  • Multibranch
• Pseudoknots
Fig 6.2 Baxevanis & Ouellette 2005
RNA Secondary Structure Prediction
Methods
• Two main types of methods
• Ab initio - based on calculating the most
energetically favorable secondary structure
• Comparative approach - based on evolutionary
comparison of multiple related RNA sequences
Ab Initio Prediction
• Only requires a single RNA sequence
• Calculates minimum free energy structure
• Base pairing lowers free energy of the structure,
so methods attempt to find secondary structure
with maximal base pairing
Ab Initio Prediction
• Free energy is calculated based on parameters
determined in the wet lab
• Known energy associated with each type of base
pair
• Base pair formation is not independent - multiple
base pairs adjacent to each other are more
favorable than individual base pairs - cooperative
• Bulges and loops adjacent to base pairs have a free
energy penalty
Ab Initio Energy Calculation Method
• Search for all possible
base-pairing patterns
• Calculate the total
energy of the
structure based on all
stabilizing and
destabilizing forces
Fig 6.3
Baxevanis &
Ouellette 2005
Dot Matrices
• Can be used to find all
possible base pair
patterns
• Compare the input
sequence to itself and
put a dot anywhere
there is a
complementary base
R Knight 2005
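A minimal dot-matrix sketch using the pairing rules from the base-pairing slide (G-C, A-U and G-U wobble). The function name is ours. In a self-comparison plot like this, a stem shows up as a run of dots perpendicular to the main diagonal.

```python
# Allowed pairs: Watson-Crick G-C and A-U, plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def dot_matrix(seq):
    """Row i, column j holds '*' if seq[i] can pair with seq[j], else '.'."""
    return ["".join("*" if (a, b) in PAIRS else "." for b in seq) for a in seq]


for row in dot_matrix("GCAU"):
    print(row)
# .*.*
# *...
# ...*
# *.*.
```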
Dynamic Programming
• Finding the best possible secondary structure is
difficult - lots of possibilities
• Compare RNA sequence with itself
• Apply scoring scheme based on energy parameters
for base pairs, cooperativity, and penalties for
destabilizing forces
• Find path that represents the most energetically
favorable secondary structure
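The simplest dynamic program of this kind is Nussinov-style base-pair maximization. It follows the "maximal base pairing" idea from the earlier slide but, unlike Mfold or RNAfold, uses no free-energy parameters (stacking, loop penalties). The sketch below is that simplified stand-in, not the Mfold algorithm. The minimum hairpin loop of 3 unpaired bases (`min_loop`) is an assumed convention.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize the number of nested base pairs. dp[i][j] = best count in
    seq[i..j]; a pair (i, j) needs >= min_loop unpaired bases between them.
    Returns (pair count, dot-bracket string)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # span = j - i
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback with an explicit stack to rebuild one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Because it returns a single optimum, this sketch also illustrates the limitation discussed on the next slide: near-optimal alternatives are discarded.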
Problem
• DP returns the SINGLE best structure
• There may be many structures with similar
energies
• Also, your predicted secondary structure is only as
good as the energy parameters used
• Solution - return multiple structures with near
optimal energies
Popular Ab Initio Prediction Programs
• Mfold
• Combines DP with thermodynamic calculations
• Fairly accurate for short sequences, less accurate as
sequence length increases
• RNAfold
• Returns multiple structures near the optimal structure
• Computes a larger number of potential secondary
structures than Mfold, so it uses a simplified energy
function
Comparative Approach
• Uses multiple sequence alignment
• Assumes related sequences fold into the same
secondary structure
Covariation
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a
mutation in a base paired residue must be
compensated for by a mutation in the base that it
pairs with
• Comparative methods search for covariation
patterns in MSA
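One widely used way to quantify covariation between two alignment columns is mutual information (MI). It is sketched here as an illustration. It is not claimed to be the score used by RNAalifold or the other programs below, and gap characters are not handled. Columns that change together while staying paired score high. Fully conserved columns score zero, even though they may be paired, because they carry no covariation signal.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns;
    col_i[k] and col_j[k] are the bases of sequence k at positions i and j."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        f_ab = count / n
        mi += f_ab * math.log2(f_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Four sequences: positions i and j mutate together but always stay paired.
print(mutual_information("GACU", "CUGA"))  # 2.0 (perfect covariation)
# Fully conserved columns give no covariation signal.
print(mutual_information("GGGG", "CCCC"))  # 0.0
```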
Consensus Structures
• Predict secondary structure of each individual
sequence
• Compare all structures and see if there is a most
common structure
Popular Comparative Prediction
Programs
• Two types
• Require user to provide MSA
• No MSA required
RNAalifold
• Requires user to provide the MSA
• Creates a scoring matrix combining minimum free
energy and covariation information
• DP is used to select the minimum free energy
structure
Foldalign
• User provides a pair of unaligned RNA sequences
• Foldalign constructs alignment then computes a
commonly conserved structure
• Suitable only for short sequences
Dynalign
• User provides two input sequences
• Dynalign calculates possible secondary structures
using algorithm similar to Mfold
• Dynalign compares multiple structures from both
sequences to find a common structure
Performance Evaluation
• Ab initio methods achieve a correlation coefficient of 20-60%
• Comparative approaches achieve a correlation coefficient of
20-80%
• Programs that require user to supply MSA are more accurate
• Comparative programs are consistently more accurate than ab
initio programs
• Base-pairs predicted by comparative sequence analysis for
large & small subunit rRNAs are 97% accurate when
compared with high resolution crystal structures!
- Gutell, Pace
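The slide does not say which correlation coefficient is reported. As an assumption, this sketch compares predicted and reference base pairs using sensitivity, positive predictive value (PPV), and the Matthews correlation coefficient approximated as $\sqrt{\text{sensitivity} \times \text{PPV}}$, a common convention in RNA secondary-structure benchmarking. The pair sets below are made up.

```python
import math


def pair_accuracy(predicted, reference):
    """Sensitivity, PPV and approximate MCC for sets of (i, j) base pairs."""
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    mcc = math.sqrt(sensitivity * ppv)  # geometric-mean approximation of MCC
    return sensitivity, ppv, mcc


pred = {(0, 8), (1, 7), (2, 5)}
ref = {(0, 8), (1, 7), (2, 6)}
print([round(x, 3) for x in pair_accuracy(pred, ref)])  # [0.667, 0.667, 0.667]
```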