BCB 444/544 Lecture 24: Protein Tertiary Structure Prediction (#24_Oct17)
BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction 10/17/07

Required Reading (before lecture)
Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction
• Chp 15 - pp 214-230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231-242
Fri Oct 19 - Lecture 25: Gene Prediction
• Chp 8 - pp 97-112

New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction. Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and of how predicted structures are evaluated.
• Your assignment is to write a summary of this paper; for details, see HW#4 (posted online & sent by email on Sat Oct 13).

Seminars this Week
BCB list of URLs for seminars related to bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar, 4:10 in 1414 MBB
• Sachdev Sidhu (Genentech): Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
• Lyric Bartholomay (Ent, ISU): TBA

Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Tertiary Structure Prediction Methods
2 (or 3) major methods:
1. Comparative modeling:
• Homology modeling (easiest!)
• Threading and fold recognition (harder)
2. Ab initio protein structure prediction (really hard)

Steps in Threading
(Figure: target sequence ALKKGF…HFDTSE threaded onto structure templates)
1. Align the target sequence with template structures in a fold library (usually from the PDB)
2. Calculate an energy score to evaluate the "goodness of fit" between the target sequence & each template structure
3.
Rank models based on energy scores

Rapid Threading Approach for Protein Structure Prediction
A local example:
Kai-Ming Ho (Physics), Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs (GDCB), Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697

Simplify: Template structure representation (Yungok Ihm)
For a template structure with residues 1 … N, build the N × N contact matrix C:
• $C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact)
• $C_{ij} = 0$ otherwise (non-contact), and also when i and j are neighbors in sequence

Simplify: Energy function (Yungok Ihm)
• An interaction "counts" only if two hydrophobic residues are in contact
• At the residue level, the pairwise hydrophobic interaction is dominant:
$E = \sum_{i,j} C_{ij} U_{ij}$, where $C_{ij}$ is the contact matrix and $U_{ij} = U(\text{residue } i, \text{residue } j)$
• MJ: $U_{ij} = M_{ij}$
• LTW: $U_{ij} = Q_i Q_j$
• HP: $U_{ij} \in \{1, 0\}$

Energy calculation: Contact energy (Yungok Ihm)
• Miyazawa-Jernigan (MJ) matrix: statistical potential with 210 parameters (one contact energy $M_{ij}$ per pair of residue types)
• Li-Tang-Wingreen (LTW): 20 parameters, one per residue type; $M_{ij}$ is approximated from per-residue values $q_i$ ($q_i$ ~ solubility, $Q_i$ ~ hydrophobicity; constants 0.6797 and 0.2604), giving the contact energy $E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$

Summary of Ho Threading Procedure (Yungok Ihm)
• Template structure → contact matrix: $C_{ij} = 1$ if $r_{ij} < 6.5$ Å; $C_{ij} = 0$ otherwise (or if i and j are neighbors in sequence)
• Sequence (e.g., AVFMRIHNDIVYNDIANTTQ) → sequence vector $S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997, 0.9897, 1.1197, \ldots, 0.6497)$
• Scoring function: contact energy $E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$

Can complexity be further reduced?
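Before taking up that question, the scoring side of the procedure summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Ho group's code: each residue is represented by a single (x, y, z) point, "neighbor in sequence" is taken to mean |i - j| ≤ 1, each contact is counted once (i < j, i.e., half of the symmetric double sum $\sum_{i,j}$), the sequence is threaded without gaps, and a larger $E_c$ is treated as a better fit. The names `contact_matrix`, `contact_energy` and `rank_templates` are hypothetical.

```python
import numpy as np

CUTOFF = 6.5  # Å, contact distance cutoff from the slides


def contact_matrix(coords, cutoff=CUTOFF):
    """N x N 0/1 contact matrix from one (x, y, z) point per residue (assumed representation)."""
    xyz = np.asarray(coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # pairwise distances r_ij
    C = (dist < cutoff).astype(int)                   # C_ij = 1 if r_ij < cutoff
    idx = np.arange(len(xyz))
    C[np.abs(idx[:, None] - idx[None, :]) <= 1] = 0   # drop self and sequence neighbours (assumed |i-j| <= 1)
    return C


def contact_energy(Q, C):
    """E_c = sum over i < j of Q_i * C_ij * Q_j (each contact counted once)."""
    Q = np.asarray(Q, dtype=float)
    return float(Q @ np.triu(C, k=1) @ Q)


def rank_templates(Q, templates):
    """Score an ungapped sequence vector Q against each template; largest E_c first (assumed convention)."""
    scores = {name: contact_energy(Q, contact_matrix(xyz)) for name, xyz in templates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    square = [[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0]]   # toy compact template
    line = [[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0], [11.4, 0, 0]]      # toy extended template
    print(rank_templates([1.0, 2.0, 3.0, 4.0], {"square": square, "line": line}))
```

With these toy 4-residue templates the compact "square" has three non-neighbor contacts and the extended "line" has none, so the square ranks first. With real data, `Q` would hold the target's per-residue Q values and each template would come from the fold library.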
Consider simplifying the structure representation, too (Haibo Cao)
• Sequence - Structure (1D - 3D problem), e.g., ALKKGF…HFDTSE
• Sequence - Contact Matrix (1D - 2D problem)
• Sequence - 1D Profile (1D - 1D problem)

Represent the contact matrix by its dominant eigenvector (1D profile) (Haibo Cao)
• The first eigenvector (the one with the highest eigenvalue) dominates the overlap between sequence and structure
• Higher-ranking (rank > 4) eigenvectors are "sequence blind"

Threading Alignment Step - now fast!
• Align the target sequence vector S (1D) with the eigenvector profile of the template structure, $P = V_1$ (1D)
• Maximize the overlap between the sequence S and the profile P, allowing gaps
• The alignment gives a new profile P'; calculate the contact energy $E_c$ using the alignment
(Cao et al., Polymer 45 (2004))

Parameters for alignment? (Yungok Ihm)
• Gap penalty: insertions/deletions in helices or strands are strongly penalized; in/dels in loops get smaller penalties
• Size penalty: if a target residue and the aligned template residue differ in radius by > 0.5 Å and the residue is involved in > 2 contacts, the alignment is penalized
• Gap and size penalties apply to the alignment score only, not to the energy calculation

How to incorporate secondary structure? (Yungok Ihm)
• Predict the secondary structure of the target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
• $N_+$ = total number of matches between the predicted secondary structure (target) and the actual secondary structure (template)
• $N_-$ = total number of mismatches
• $N_s$ = total number of residues selected in the alignment
• "Global fitness": $f = 1 + (N_+ - N_-)/N_s$
• $E_{mod} = f \cdot E_{threading}$

How much better is this "fit" than random?
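Before answering that question, the eigenvector shortcut and the global-fitness factor described above can be sketched. The shortcut works because a symmetric contact matrix decomposes as $C = \sum_k \lambda_k V_k V_k^T$, so $\sum_{i,j} Q_i C_{ij} Q_j = \sum_k \lambda_k (Q \cdot V_k)^2 \approx \lambda_1 (Q \cdot V_1)^2$ when the first term dominates: maximizing the overlap of S with $P = V_1$ approximately maximizes $E_c$. The Python below is a hedged illustration, not the Cao et al. implementation: it takes P from `numpy.linalg.eigh`, aligns S to P with a simple global dynamic program whose match score is $S_i P_j$, uses hypothetical per-position gap penalties `pen` (intended to be large in helices/strands and small in loops), omits the size penalty, recomputes $E_c$ over the aligned residues, and computes f. All function names are hypothetical.

```python
import numpy as np


def dominant_profile(C):
    """1D profile P: eigenvector of the symmetric contact matrix with the largest eigenvalue."""
    _, vecs = np.linalg.eigh(np.asarray(C, dtype=float))  # eigenvalues come back in ascending order
    v = vecs[:, -1]
    return v if v.sum() >= 0 else -v  # eigenvector sign is arbitrary; make it mostly positive


def align_overlap(S, P, pen):
    """Global DP maximizing sum(S_i * P_j) over aligned pairs, minus gap penalties.

    pen[j] is a hypothetical cost for any gap next to template position j.
    Returns (best score, list of aligned (sequence index, template index) pairs).
    """
    n, m = len(S), len(P)
    D = np.zeros((n + 1, m + 1))
    for j in range(1, m + 1):
        D[0, j] = D[0, j - 1] - pen[j - 1]
    for i in range(1, n + 1):
        D[i, 0] = D[i - 1, 0] - pen[0]
        for j in range(1, m + 1):
            D[i, j] = max(D[i - 1, j - 1] + S[i - 1] * P[j - 1],  # residue i-1 on position j-1
                          D[i, j - 1] - pen[j - 1],                # template position j-1 skipped
                          D[i - 1, j] - pen[j - 1])                # extra residue i-1 inserted
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back one optimal path
        if np.isclose(D[i, j], D[i - 1, j - 1] + S[i - 1] * P[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(D[i, j], D[i, j - 1] - pen[j - 1]):
            j -= 1
        else:
            i -= 1
    return float(D[n, m]), pairs[::-1]


def aligned_energy(Q, C, pairs):
    """E_c over aligned residues: sum of Q_i * C_ab * Q_k for aligned (i, a), (k, b) with a < b."""
    E = 0.0
    for x, (i, a) in enumerate(pairs):
        for k, b in pairs[x + 1:]:
            E += Q[i] * C[a][b] * Q[k]
    return E


def global_fitness(pred_ss, templ_ss, pairs):
    """f = 1 + (N+ - N-) / Ns, comparing predicted (target) and actual (template) secondary structure."""
    n_plus = sum(pred_ss[i] == templ_ss[j] for i, j in pairs)
    n_minus = len(pairs) - n_plus
    return 1 + (n_plus - n_minus) / len(pairs)
```

Under these assumptions, the modified score would be `global_fitness(pred_ss, templ_ss, pairs) * aligned_energy(Q, C, pairs)`, i.e., $E_{mod} = f \cdot E_{threading}$. Since f ranges from 0 (every aligned position mismatched) to 2 (every position matched), it can at most double or cancel the threading energy.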
E_shuffle: Shuffled Sequence vs Structure (Yungok Ihm)
$E_{relative} = E_{mod} - E_{shuffled}$
• $E_{mod}$: E score modified to reflect the fit with the predicted secondary structure
• $E_{shuffled}$: average E score for the same sequence shuffled (randomized) many times

Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is the most recent)
(Critical Assessment of Protein Structure Prediction)
• Given: amino acid sequence
• Goal: predict the 3-D structure (before experimental results are published)

Typical Results (well, actually, our BEST results):
HO = #1-ranked CASP5 prediction for this target
• Target 174
• PDB ID = 1MG7
(Figure: predicted structure and actual structure, panels T174_1 and T174_2; Cao, Ihm, Wang, Dobbs, Ho)

Overall Performance in CASP5 Contest: ~8th out of 180 (M. Levitt, Stanford)
FR = Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
  1    24.26    9.00  12.00    9    12   Ginalski
  2    21.64    7.00  12.00    7    12   Skolnick
  3    19.55    8.00  12.50    9    14   Kolinski
  4    16.88    6.00  10.00    6    10   Baker
  5    15.25    7.00   7.00    7     7   BIOINFO.PL
  6    14.56    6.50  11.50    7    13   Shortle
  7    13.49    4.00  11.00    4    11   BAKER-ROBETTA
  8    11.34    3.00   6.00    3     6   Brooks
  9    10.45    3.00   5.50    3     6   Ho-Kai-Ming
  -      -        -      -      -     -   Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models

CASP - Check it out!
Critical Assessment of Protein Structure Prediction: http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006: http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
• Protein Function Prediction (part of CASP)
• CAPRI = Critical Assessment of Predicted Interactions
• New: CASPM = CASP for M = Mutant proteins; predict the effects of small (point) mutations, e.g., SNPs

Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification

Protein Structure Comparison Methods
3 basic approaches for aligning structures (see Xiong textbook for details):
1. Intermolecular
2. Intramolecular
3.
Combined
But this is a very active research area, with many recent new methods.
3 popular methods:
DALI = Distance Matrix Alignment of Structures (Holm)
• FSSP Database
SSAP = Sequential Structure Alignment Program (Orengo)
• CATH Database
CE = Combinatorial Extension (Bourne)
• VAST at NCBI
URLs: http://en.wikipedia.org/wiki/Structural_alignment_software

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation

RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions, especially RNAi pathways
• Catalytic

RNA types & functions
Type of RNA                              Primary function(s)
mRNA - messenger                         translation (protein synthesis); regulatory
rRNA - ribosomal                         translation (protein synthesis)
tRNA - transfer                          translation (protein synthesis)
hnRNA - heterogeneous nuclear            precursors & intermediates of mature mRNAs & other RNAs
scRNA - small cytoplasmic                signal recognition particle (SRP); tRNA processing <catalytic>
snRNA - small nuclear                    mRNA processing, poly A addition <catalytic>
snoRNA - small nucleolar                 rRNA processing/maturation/methylation
regulatory RNAs (siRNA, miRNA, etc.)     regulation of transcription and translation, other??
RNA Structure
• RNA forms complex 3D structures
• RNA is mainly single stranded
• The single RNA strand can self-hybridize to form base-paired regions

Levels of RNA Structure (Rob Knight, Univ Colorado)
• Like proteins, RNA has primary, secondary, and tertiary structure
• Primary structure - base sequence
• Secondary structure - which bases are single stranded or base paired
• Tertiary structure - 3D structure

RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• The focus is therefore on predicting RNA secondary structure: given an RNA sequence, predict the secondary structure of the molecule
• Almost all methods ignore higher-order structures such as pseudoknots

Base Pairing in RNA
G-C, A-U, G-U ("wobble") & variants
See: IMB Image Library of Biological Molecules http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs

Common structural motifs in RNA (Fig 6.2, Baxevanis & Ouellette 2005)
• Helices
• Loops: hairpin, interior, bulge, multibranch
• Pseudoknots

RNA Secondary Structure Prediction Methods
Two main types of methods:
• Ab initio - based on calculating the most energetically favorable secondary structure
• Comparative approach - based on evolutionary comparison of multiple related RNA sequences

Ab Initio Prediction
• Only requires a single RNA sequence
• Calculates the minimum free energy structure
• Base pairing lowers the free energy of the structure, so methods
attempt to find the secondary structure with maximal base pairing

Ab Initio Prediction (continued)
• Free energy is calculated from parameters determined experimentally (in the wet lab)
• Each type of base pair has a known energy
• Base-pair formation is not independent: multiple adjacent base pairs are more favorable than isolated ones (cooperativity)
• Bulges and loops adjacent to base pairs carry a free-energy penalty

Ab Initio Energy Calculation Method (Fig 6.3, Baxevanis & Ouellette 2005)
• Search for all possible base-pairing patterns
• Calculate the total energy of each structure from all stabilizing and destabilizing forces

Dot Matrices (R Knight 2005)
• Can be used to find all possible base-pairing patterns
• Compare the input sequence to itself and put a dot wherever two bases are complementary

Dynamic Programming
• Finding the best possible secondary structure is difficult: there are many possibilities
• Compare the RNA sequence with itself
• Apply a scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces
• Find the path that represents the most energetically favorable secondary structure

Problem
• DP returns the SINGLE best structure
• There may be many structures with similar energies
• Also, a predicted secondary structure is only as good as the energy parameters used
• Solution: return multiple structures with near-optimal energies

Popular Ab Initio Prediction Programs
• Mfold
• Combines DP with thermodynamic
calculations
• Fairly accurate for short sequences; accuracy decreases as sequence length increases
• RNAfold
• Returns multiple structures near the optimal structure
• Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function

Comparative Approach
• Uses a multiple sequence alignment (MSA)
• Assumes related sequences fold into the same secondary structure

Covariation
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base-paired residue must be compensated for by a mutation in the base it pairs with
• Comparative methods search for such covariation patterns in the MSA

Consensus Structures
• Predict the secondary structure of each individual sequence
• Compare all the structures and check whether one structure is most common

Popular Comparative Prediction Programs
Two types:
• Require the user to provide an MSA
• No MSA required

RNAalifold
• Requires the user to provide the MSA
• Creates a scoring matrix combining minimum free energy and covariation information
• DP is used to select the minimum free energy structure

Foldalign
• User provides a pair of unaligned RNA sequences
• Foldalign constructs the alignment, then computes a commonly conserved structure
• Suitable only for short sequences

Dynalign
• User provides two input sequences
• Dynalign calculates possible secondary structures using an algorithm similar to Mfold
•
Dynalign compares multiple structures from both sequences to find a common structure

Performance Evaluation
• Ab initio methods achieve correlation coefficients of 20-60%
• Comparative approaches achieve correlation coefficients of 20-80%
• Programs that require the user to supply an MSA are more accurate
• Comparative programs are consistently more accurate than ab initio programs
• Base pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high-resolution crystal structures! (Gutell, Pace)
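To make the ab initio slides concrete, here is a minimal Python sketch of the classic Nussinov-style dynamic program, which maximizes the number of nested base pairs (the "maximal base pairing" idea) using a dot matrix of allowed G-C, A-U and G-U pairs. It is deliberately simplified and is not what Mfold or RNAfold do: those programs minimize a free energy built from experimentally derived stacking and loop parameters, whereas here every pair simply scores +1, the minimum hairpin loop of 3 unpaired bases is an assumption, and pseudoknots are excluded. The `score` helper compares predicted pairs with a reference structure; summarizing accuracy as the geometric mean of sensitivity and positive predictive value (a common approximation of the correlation coefficient for base pairs) is an assumption, since the slides do not say which coefficient was used. All names are hypothetical.

```python
import math

# Allowed pairs from the "Base Pairing in RNA" slide: G-C, A-U and the G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin loop


def dot_matrix(seq):
    """Compare the sequence with itself: 1 wherever bases i and j could pair."""
    return [[int((a, b) in PAIRS) for b in seq] for a in seq]


def nussinov(seq):
    """Return a set of nested (i, j) pairs that maximizes the pair count (no pseudoknots)."""
    n = len(seq)
    dots = dot_matrix(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max number of pairs in seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])         # i or j left unpaired
            if dots[i][j]:
                best = max(best, dp[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                      # two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    pairs = set()
    stack = [(0, n - 1)] if n else []
    while stack:                                           # trace back one optimal structure
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif dots[i][j] and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.add((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return pairs


def to_dot_bracket(pairs, n):
    """Dot-bracket string: '(' and ')' for paired bases, '.' for unpaired."""
    s = ["."] * n
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)


def score(predicted, reference):
    """Sensitivity, PPV, and their geometric mean (assumed correlation-coefficient proxy)."""
    tp = len(predicted & reference)
    sens = tp / len(reference) if reference else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sens, ppv, math.sqrt(sens * ppv)
```

For example, `to_dot_bracket(nussinov("GGGAAAUCC"), 9)` gives `(((...)))`: two G-C pairs and one G-U wobble closing an AAA hairpin loop. The cubic-time fill step is why real tools rely on DP rather than enumerating every base-pairing pattern, and it also shows the "Problem" slide's point: the traceback returns one optimal structure even when others tie.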
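The covariation idea from the comparative-approach slides can be sketched in the same spirit. One common way to quantify covariation between two alignment columns is mutual information (MI): it is high when the base in one column predicts the base in the other (as with compensatory G-C ↔ C-G or A-U ↔ U-A changes) and zero when either column is invariant. Using MI here, along with the `threshold` value and the function names, is an assumption for illustration; a fuller treatment would also require the covarying bases to form allowed pairs and would combine the signal with free energy, as the RNAalifold slide describes.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """MI in bits between two alignment columns; high MI suggests covariation."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in pab.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi


def covarying_pairs(msa, threshold=1.0):
    """Return (i, j, MI) for alignment column pairs whose MI meets a hypothetical threshold."""
    cols = list(zip(*msa))  # one tuple of bases per alignment column
    hits = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            mi = mutual_information(cols[i], cols[j])
            if mi >= threshold:
                hits.append((i, j, mi))
    return hits
```

For the toy alignment `["GAAAC", "CAAAG", "AAAAU", "UAAAA"]`, columns 0 and 4 change together (G-C, C-G, A-U, U-A), so `covarying_pairs` reports only the pair (0, 4) with MI = 2 bits, while the invariant middle columns contribute nothing.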