BCB 444/544 Lecture 25: More RNA Structure & BCB 544 Projects (#25_Oct19)
BCB 444/544 F07 ISU Dobbs #25 - More RNA Structure & BCB 544 Projects 10/19/07

Required Reading (before lecture)
Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction
• Chp 15 - pp 214-230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231-242
Fri Oct 19 - Lecture 25 (& Mon Oct 22): Gene Prediction
• Chp 8 - pp 97-112

Homework Assignment
ALL: Homework #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction. Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and of how predicted structures are evaluated.
• Your assignment is to write a summary of this paper. For details, see HW#4 (posted online & sent by email on Sat Oct 13).

BCB 544 Only: New Homework Assignment
544 Extra HW#2 (posted online Thurs?)
Due: Fri Nov 2 by 5 PM
Extra HW#2 is the next step in the Team Projects.
Lecture will end a few minutes early today to allow time to meet & discuss 544 Teams & Projects.

Seminars this Week
BCB list of URLs for seminars related to bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar, 4:10 in 1414 MBB
  • Sachdev Sidhu (Genentech): Phage peptide and antibody libraries in protein engineering and ligand selection
  • Was a great talk!
• Oct 19 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU): Computational Biology and vector-borne disease: from the field to the bench

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation

RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions
  • especially miRNA & siRNA pathways
• Catalytic

RNA Types & Functions
Type of RNA: primary function(s)
• mRNA (messenger): translation (protein synthesis); regulatory
• rRNA (ribosomal): translation (protein synthesis) <catalytic>
• tRNA (transfer): translation (protein synthesis)
• hnRNA (heterogeneous nuclear): precursors & intermediates of mature mRNAs & other RNAs
• scRNA (small cytoplasmic): signal recognition particle (SRP); tRNA processing <catalytic>
• snRNA (small nuclear): mRNA processing, poly(A) addition <catalytic>
• snoRNA (small nucleolar): rRNA processing/maturation/methylation
• regulatory RNAs (siRNA, miRNA, etc.): regulation of transcription and translation, other??

RNA Structures
• RNA forms complex 3D structures
• Mainly "single-stranded", but:
  • single RNA strands can self-hybridize to form base-paired regions

Levels of RNA Structure
Like proteins, RNA has primary, secondary, and tertiary structure (& quaternary structure, too)
1. Primary structure = ribonucleotide sequence
2.
Secondary structure = helix vs turn (base-paired vs single-stranded)
   Note: in RNA, helices often involve long-range interactions
3. Tertiary structure = 3D structure (also due to long-range interactions)
4. Quaternary structure = complex of 2 or more RNA strands
(Rob Knight, Univ Colorado)

Common Structural Motifs in RNA
• Helices
• Loops
  • Hairpin
  • Interior
  • Bulge
  • Multibranch
• Pseudoknots
• Tetraloops
(Fig 6.2, Baxevanis & Ouellette 2005)

Covalent & Non-covalent Bonds in RNA
• Primary structure: covalent bonds
• Secondary/tertiary structure: non-covalent bonds
  • H-bonds (base-pairing)
  • Base stacking
(Fig 6.2, Baxevanis & Ouellette 2005)

RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure:
  • given an RNA sequence, predict its secondary structure
• Almost all methods ignore higher-order structures such as pseudoknots & tetraloops
  • specialized software is available for predicting these

RNA Pseudoknots & Tetraloops
• Often have important regulatory or catalytic functions
Pseudoknot: http://www.lbl.gov/Science-Articles/ResearchReview/Annual-Reports/1995/images/rna.gif
Tetraloop: http://academic.brooklyn.cuny.edu/chem/zhuang/QD/mckay_hr.gif

Base Pairing in RNA
• G-C, A-U, G-U ("wobble") & many variants
• See: IMB Image Library of Biological Molecules, http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs

Experimental RNA structure
determination?
• X-ray crystallography
• NMR spectroscopy
• Enzymatic/chemical mapping

RNA Secondary Structure Prediction Methods
Two (recently, three) main types of methods:
1. Ab initio: based on calculating the most energetically favorable secondary structure(s)
   Energy minimization (thermodynamics)
2. Comparative approach: based on comparing multiple evolutionarily related RNA sequences
   Sequence comparison (co-variation)
3. Combined computational & experimental
   Use experimental constraints when available

RNA Secondary Structure Prediction - 1
1) Energy minimization (thermodynamics)
• Algorithms: dynamic programming to find high-probability pairs (also, some genetic algorithms)
• Software:
  • Mfold - Zuker
  • RNAfold (Vienna Package) - Hofacker
  • RNAstructure - Mathews
  • Sfold - Ding & Lawrence
(R Knight 2005)

RNA Secondary Structure Prediction - 2
2) Comparative sequence analysis (co-variation)
• Algorithms: mutual information; context-free grammars
• Software: RNAalifold, Foldalign, Dynalign

RNA Secondary Structure Prediction - 3
3) Combined experimental & computational
• Experiments: map single-stranded vs double-stranded regions in the folded RNA
• How?
  • Enzymes: S1 nuclease, T1 RNase
  • Chemicals: kethoxal, DMS, OH (hydroxyl radicals)
• Software: Mfold, Sfold, RNAstructure, RNAfold, RNAalifold
[Figure: gel showing kethoxal modification (mild, strong) and DMS modification (mild, strong) of the folded RNA]

1 - Ab Initio Prediction
• Requires only a single RNA sequence
• Calculates the minimum free energy structure
• Base-paired regions have lower free energy, so methods "attempt to find secondary structure with maximal base pairing" (Careful!)
• IMPORTANT: the largest contribution to the energy comes from nearest-neighbor (base-stacking) interactions, not from base-pairing!

Ab Initio Prediction: Clarifications
• Free energy is calculated from parameters determined in the wet lab
• Correction: use the known energy associated with each type of nearest-neighbor pair (base stacking), not with each base pair
• Base-pair formation is not independent: multiple adjacent base pairs are more favorable than isolated ones; formation is cooperative because of base-stacking interactions
• Bulges and loops adjacent to base pairs carry a free energy penalty

Ab Initio Prediction: What Are the Assumptions?
• The native tertiary structure or "fold" of an RNA molecule is (one of) its "lowest" free energy configuration(s)
  • Gibbs free energy = ΔG in kcal/mol at 37 °C = equilibrium stability of the structure
  • lower (more negative) values are more favorable
• Is this assumption valid? In vivo, it may not hold, but we don't really know

Energy Minimization: What Are the Rules?
• Stack A A / U U (base pairs A=U, A=U)
What gives here?
  ΔG = -1.2 kcal/mol
• Stack A U / U A (base pairs A=U, U=A)
  ΔG = -1.6 kcal/mol
(C Staben 2005)

Energy Minimization Calculations: Base Stacking Is Critical
  Stack (5'→3'/3'→5')              ΔG (kcal/mol)
  AA/UU                             -1.2
  AU/UA or UA/AU                    -1.6
  AG/UC, AC/UG, CA/GU, GA/CU        -2.1
  CC/GG                             -4.8
  CG/GC                             -3.0
  GC/CG                             -4.3
  GU/UG                             -0.3
  XG, GX/YU, UY                      0
(Tinoco et al.; C Staben 2005)

Ab Initio RNA Structure Prediction Uses Nearest-Neighbor Parameters
• Most methods for ab initio prediction (free energy minimization) use nearest-neighbor energy parameters (derived from experiment) to predict the stability of an RNA secondary structure (as ΔG at 37 °C), and most available software packages use the same set of parameters (Mathews, Sabina, Zuker)

Ab Initio Energy Calculation
• Search for all possible base-pairing patterns
• Calculate the total energy of each structure based on all stabilizing and destabilizing forces
Total free energy of a specific RNA conformation = sum of incremental energy terms for:
  • helical stacking (sequence dependent)
  • loop initiation
  • unpaired stacking
  (favorable "increments" are < 0)
(Fig 6.3, Baxevanis & Ouellette 2005)

Dot Matrices
• Can be used to find all possible base-pairing patterns
• Compare the input sequence to itself and put a dot wherever two bases are complementary
(R Knight 2005)

Dynamic Programming
• Finding the optimal secondary structure is difficult: there are lots of possibilities
• Compare the RNA sequence with itself
• Apply a scoring scheme based on energy parameters for base stacking, cooperativity, and penalties for destabilizing forces
• Find the path
  that represents the most energetically favorable secondary structure

Problem with the DP Approach
• DP returns a SINGLE lowest-energy structure
• There may be many structures with similar energies
• Also, a predicted secondary structure is only as good as the energy parameters used
• Solution: return multiple structures with near-optimal energies

Popular Ab Initio Prediction Programs
• Mfold
  • Combines DP with thermodynamic calculations
  • Fairly accurate for short sequences, less accurate as sequence length increases
• RNAfold
  • Returns multiple structures near the predicted optimal structure
  • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function

2 - Comparative Prediction Approaches
• Use a multiple sequence alignment (MSA)
• Assume related sequences fold into the same secondary structure

Co-variation Patterns in MSAs Are Critical
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base-paired residue must be compensated for by a mutation in the residue with which it pairs
• Comparative methods search for co-variation patterns in MSAs

Consensus Structures
• Predict the secondary structure of each individual sequence in an MSA
• Compare all structures and try to identify a consensus structure

Popular Comparative Prediction Programs
Two main types:
1. Require user to provide an MSA
   • RNAalifold
2.
   No MSA required
   • Foldalign
   • Dynalign

RNAalifold
• Requires user to provide an MSA
• Creates a scoring matrix combining minimum free energy and co-variation information
• Uses DP to identify the minimum free energy structure

Foldalign
• User provides a pair of unaligned RNA sequences
• Constructs an alignment & computes the conserved structure
• Suitable only for relatively short sequences

Dynalign
• User provides two unaligned input sequences
• Calculates possible secondary structures using an algorithm similar to Mfold's
• Compares multiple structures from both sequences to find a common structure

3 - Popular Programs that Use Combined Computational & Experimental Approaches
• Mfold
• Sfold
• RNAstructure
• RNAfold
• RNAalifold

Comparison of Predictions for a Single RNA Using Different Methods
[Figure: four predicted structures of the same RNA, with stem-loops SL X, SL Y and SL Z labeled]
• Sfold: -51.14 kcal/mol
• Mfold: -54.84 kcal/mol
• RNAstructure: -71.3 kcal/mol
• RNAfold: -80.16 kcal/mol
(JH Lee 2007)

Comparison of Mfold Predictions: -/+ Constraints
• Mfold: -126.05 kcal/mol
• Mfold plus constraints: -54.84 kcal/mol
(JH Lee 2007)

Performance Evaluation
• Ab initio methods? Correlation coefficient = 20-60%
• Comparative approaches?
  Correlation coefficient = 20-80%
  • Programs that require the user to supply an MSA are more accurate
  • Comparative programs are consistently more accurate than ab initio programs
• Base pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high-resolution crystal structures! (Gutell, Pace)
• BEST APPROACH? Methods that combine computational prediction (ab initio & comparative) with experimental constraints (from chemical/enzymatic modification studies)

BCB 544 "Team" Projects
• 544 Extra HW#2 is the next step in the Team Projects:
  • Write a ~1 page outline
  • Schedule a meeting with Michael & Drena to discuss your topic
  • Read a few papers
  • Write a more detailed plan
• You may work alone if you prefer
• The last week of classes will be devoted to Projects
• Written reports due: Mon Dec 3 (no class that day)
• Oral presentations (15-20 min) will be Wed-Fri, Dec 5, 6, 7
  • 1 or 2 teams will present during each class period
See Guidelines for Projects posted online
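Sketch for the "Energy Minimization Calculations: Base Stacking Is Critical" slide. It adds up nearest-neighbor stacking terms along a helix using only the simplified values from that table. It assumes the standard notation, in which "AA/UU" means 5'-AA-3' paired with 3'-UU-5'. The function names `stack_energy` and `helix_energy` are hypothetical, and loop, bulge and terminal terms are ignored. This is a toy, not the full Turner parameter set that Mfold or RNAfold use. It does answer the "What gives here?" question: both stacks contain two A=U pairs, but their neighbors differ.

```python
# Simplified stacking free energies (kcal/mol, 37 °C) transcribed from the
# "Base Stacking Is Critical" table (C Staben 2005, after Tinoco et al.).
# Assumed key convention: (top strand 5'->3', bottom strand 3'->5').
STACK_DG = {
    ("AA", "UU"): -1.2,
    ("AU", "UA"): -1.6,
    ("UA", "AU"): -1.6,
    ("AG", "UC"): -2.1,
    ("AC", "UG"): -2.1,
    ("CA", "GU"): -2.1,
    ("GA", "CU"): -2.1,
    ("CC", "GG"): -4.8,
    ("CG", "GC"): -3.0,
    ("GC", "CG"): -4.3,
    ("GU", "UG"): -0.3,
}
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stack_energy(top: str, bottom: str) -> float:
    """ΔG of one stack of two adjacent base pairs."""
    if any(p not in CANONICAL for p in zip(top, bottom)):
        raise ValueError(f"{top}/{bottom} contains a non-canonical pair")
    if (top, bottom) in STACK_DG:
        return STACK_DG[(top, bottom)]
    # The same stack read from the partner strand, e.g. UU/AA is AA/UU.
    flipped = (bottom[::-1], top[::-1])
    if flipped in STACK_DG:
        return STACK_DG[flipped]
    # All 16 Watson-Crick stacks are covered above, so what remains contains a
    # G-U pair: the "XG, GX / YU, UY" row gives 0; UG/GU is ASSUMED to be 0 too.
    return 0.0


def helix_energy(top: str, bottom: str) -> float:
    """Sum the stacking terms along a helix (loop and end terms are ignored)."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return sum(stack_energy(top[k:k + 2], bottom[k:k + 2]) for k in range(len(top) - 1))


# Two A=U pairs either way, but the neighbors differ, so ΔG differs.
print(helix_energy("AA", "UU"))  # -1.2
print(helix_energy("AU", "UA"))  # -1.6
```

A longer helix simply accumulates stacks. For example, `helix_energy("AAU", "UUA")` adds the AA/UU and AU/UA terms (-1.2 + -1.6).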
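Worked equation for the "What Are the Assumptions?" slide, which equates ΔG at 37 °C with equilibrium stability. The standard thermodynamic relation below makes this concrete for two alternative folds, S1 and S2, of the same RNA. It assumes the two folds interconvert and reach equilibrium, which, as the slide cautions, may not hold in vivo. It also suggests why the "Problem with the DP Approach" slide cares about near-optimal structures: with only a 1 kcal/mol gap, the second-best fold is still present at roughly one fifth the population of the best one.

```latex
\begin{aligned}
\Delta G_1 - \Delta G_2 &= -RT \ln K, \qquad K = \frac{[S_1]}{[S_2]}
\;\Rightarrow\; \frac{[S_1]}{[S_2]} = \exp\!\left(\frac{\Delta G_2 - \Delta G_1}{RT}\right) \\
RT &\approx \left(1.987\times 10^{-3}\ \tfrac{\text{kcal}}{\text{mol K}}\right)(310.15\ \text{K}) \approx 0.616\ \text{kcal/mol} \\
\Delta G_2 - \Delta G_1 = 1\ \text{kcal/mol}
&\;\Rightarrow\; \frac{[S_1]}{[S_2]} \approx e^{1/0.616} \approx 5.1
\end{aligned}
```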
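Sketch for the "Dot Matrices" slide. It compares a sequence with itself and marks every position pair that could base-pair. It assumes Watson-Crick plus G-U wobble pairs only, ignoring the slide's "many variants", and expects uppercase RNA letters. `dot_matrix` and `render` are hypothetical helper names.

```python
# Pairs allowed in this sketch: Watson-Crick plus G-U wobble
# (non-canonical variants are ignored).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_matrix(seq: str) -> list[list[bool]]:
    """Compare seq with itself: cell [i][j] is True if seq[i] can pair with seq[j]."""
    return [[(a, b) in PAIRS for b in seq] for a in seq]


def render(seq: str) -> str:
    """Text dot plot of the upper triangle (j > i), since pairing is symmetric."""
    m = dot_matrix(seq)
    lines = ["  " + " ".join(seq)]
    for i, base in enumerate(seq):
        row = ["*" if j > i and m[i][j] else "." for j in range(len(seq))]
        lines.append(base + " " + " ".join(row))
    return "\n".join(lines)


print(render("GGGAAAUCC"))
```

In the printout, a run of `*` along an anti-diagonal, (i, j), (i+1, j-1), ..., marks a stretch that could zip up into a helix. Here, positions 0-2 (GGG) can pair with positions 8, 7 and 6 (C, C, U).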
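Sketch for the "Dynamic Programming" slide. Finding the best path through the sequence-versus-itself matrix is easiest to see in the Nussinov algorithm, which is swapped in here as a simplification. Nussinov maximizes the number of nested base pairs instead of minimizing free energy. That is exactly the "maximal base pairing" view the "1 - Ab Initio Prediction" slide flags with "(Careful!)", and it is not what Mfold or RNAfold compute. The minimum hairpin loop of 3 nucleotides is an assumption. `can_pair` and `nussinov` are hypothetical names.

```python
def can_pair(a: str, b: str) -> bool:
    """Watson-Crick or G-U wobble pair (assumed pairing rules)."""
    return (a, b) in {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs; return (pairs, dot-bracket)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = max number of pairs in seq[i..j]; stays 0 when j - i <= min_loop.
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])          # i or j left unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, N[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                     # split into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback: follow the choices that produced each optimum.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i + 1][j]:
            todo.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            todo.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            todo.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    todo.extend([(i, k), (k + 1, j)])
                    break
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Energy-based programs keep the same fill-then-traceback shape. They replace "+1 per pair" with nearest-neighbor stacking and loop terms, which is what makes stacking, rather than pair count, decide the winner.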
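Sketch for the "Problem with the DP Approach" slide's solution: return multiple structures near the optimum, not just one. For very short sequences, every nested (pseudoknot-free) structure can simply be enumerated and the close runners-up kept. As a loud simplification, the score is the number of base pairs, not free energy, and `delta` counts pairs, not kcal/mol. The functions `all_structures` and `near_optimal` are hypothetical. Exhaustive enumeration grows exponentially, so this is only for toy inputs.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def all_structures(seq: str, min_loop: int = 3) -> list[str]:
    """Enumerate every nested structure of seq as a dot-bracket string."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def fold(i: int, j: int) -> tuple[tuple[tuple[int, int], ...], ...]:
        if j - i <= min_loop:
            return ((),)                                  # too short to hold a pair
        options = list(fold(i + 1, j))                    # i left unpaired
        for k in range(i + min_loop + 1, j + 1):          # i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                for inner in fold(i + 1, k - 1):
                    for rest in fold(k + 1, j):
                        options.append(((i, k),) + inner + rest)
        return tuple(options)

    result = []
    for pairs in fold(0, n - 1):
        s = ["."] * n
        for i, k in pairs:
            s[i], s[k] = "(", ")"
        result.append("".join(s))
    return result


def near_optimal(seq: str, delta: int = 1, min_loop: int = 3) -> list[str]:
    """Structures whose pair count is within `delta` of the best (toy score)."""
    structs = all_structures(seq, min_loop)
    best = max(s.count("(") for s in structs)
    keep = [s for s in structs if s.count("(") >= best - delta]
    return sorted(keep, key=lambda s: -s.count("("))


print(near_optimal("GGGAAAUCC", delta=1))
```

Programs that report suboptimal structures do the analogous thing with real energies and avoid full enumeration. The point here is only that several structures can sit close to the optimum.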
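Sketch for the "Co-variation Patterns in MSAs Are Critical" slide and the "mutual information" algorithm listed for comparative analysis. It scores two alignment columns with MI(i, j) = Σ f(x, y) log2[f(x, y) / (f(x) f(y))]. Columns whose bases change together, as with compensatory mutations, score high, and a fully conserved column scores 0. The toy alignment is invented for illustration, and `mutual_information` and `columns` are hypothetical names. MI alone does not check that the co-varying bases are complementary, and gaps are not handled.

```python
import math
from collections import Counter


def columns(msa: list[str]) -> list[str]:
    """Transpose an alignment (equal-length strings) into its columns."""
    return ["".join(seq[k] for seq in msa) for k in range(len(msa[0]))]


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two alignment columns."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    fi, fj = Counter(col_i), Counter(col_j)
    fij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), c in fij.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((fi[x] / n) * (fj[y] / n)))
    return mi


# Hypothetical alignment: columns 0 and 5 co-vary (G-C, C-G, A-U, U-A).
msa = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
cols = columns(msa)
print(mutual_information(cols[0], cols[5]))  # 2.0 (strong co-variation)
print(mutual_information(cols[0], cols[1]))  # 0.0 (column 1 is conserved)
```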
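Sketch for the "Consensus Structures" slide. It compares the structures predicted for each sequence and keeps the base pairs that a strict majority agree on. It assumes the structures have already been mapped onto the alignment columns, so they all have the same length and gaps are ignored. `pairs_from_dot_bracket` and `consensus` are hypothetical names. A strict majority guarantees the kept pairs never clash: two pairs that share a position or cross cannot both appear in more than half of the nested structures.

```python
from collections import Counter


def pairs_from_dot_bracket(db: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), k))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def consensus(structures: list[str]) -> str:
    """Keep base pairs found in more than half of the aligned structures."""
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("structures must be aligned to the same length")
    counts = Counter(p for s in structures for p in pairs_from_dot_bracket(s))
    out = ["."] * length
    for (i, j), c in counts.items():
        if c > len(structures) / 2:   # strict majority, so kept pairs cannot conflict
            out[i], out[j] = "(", ")"
    return "".join(out)


print(consensus(["((...))", "((...))", "(.....)"]))  # ((...))
```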
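Sketch for the "RNA Secondary Structure Prediction - 3" slide. Chemical or enzymatic mapping marks some bases as single-stranded, and candidate structures can be checked against those marks. As an assumption, positions modified by DMS or kethoxal are treated as unpaired, and optionally listed "protected" positions are treated as paired. The filter is applied after folding. Programs such as Mfold and RNAstructure typically apply constraints inside the folding algorithm instead. `consistent_with_probing` and `filter_structures` are hypothetical names, and the candidate structures are invented.

```python
from collections.abc import Iterable


def consistent_with_probing(structure: str, reactive: Iterable[int],
                            protected: Iterable[int] = ()) -> bool:
    """True if every reactive position is unpaired and every protected one is paired."""
    return (all(structure[k] == "." for k in reactive)
            and all(structure[k] in "()" for k in protected))


def filter_structures(candidates: list[str], reactive: Iterable[int],
                      protected: Iterable[int] = ()) -> list[str]:
    """Keep only candidates that agree with the (0-based) probing data."""
    reactive, protected = set(reactive), set(protected)  # reusable across candidates
    return [s for s in candidates if consistent_with_probing(s, reactive, protected)]


candidates = ["(((...)))", "((.....))", "........."]
# Base 2 was modified (single-stranded); base 0 was protected (paired).
print(filter_structures(candidates, reactive={2}, protected={0}))  # ['((.....))']
```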
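Sketch for the "Performance Evaluation" slide, which quotes a "correlation coefficient" without defining it. One common way to compare a predicted structure with a reference is to count correctly predicted base pairs and approximate the correlation as the geometric mean of sensitivity and positive predictive value (PPV). The sketch assumes that definition. It may not be exactly how the 20-60% and 20-80% figures were computed. `evaluate` and `pairs` are hypothetical names.

```python
import math


def pairs(db: str) -> set[tuple[int, int]]:
    """Parse a balanced dot-bracket string into (i, j) base pairs."""
    stack, out = [], set()
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            out.add((stack.pop(), k))
    return out


def evaluate(predicted: str, reference: str) -> dict[str, float]:
    """Base-pair sensitivity, PPV, and their geometric mean (approximate correlation)."""
    p, r = pairs(predicted), pairs(reference)
    tp = len(p & r)
    sens = tp / len(r) if r else 0.0     # fraction of true pairs that were predicted
    ppv = tp / len(p) if p else 0.0      # fraction of predicted pairs that are true
    return {"sensitivity": sens, "ppv": ppv, "correlation": math.sqrt(sens * ppv)}


print(evaluate("((.....))", "(((...)))"))
```

In the usage line, two of the three reference pairs are found and no false pairs are predicted. That gives a sensitivity of 2/3, a PPV of 1, and a correlation of about 0.82.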