Introduction to RNA Bioinformatics
Craig L. Zirbel
October 5, 2010
Based on a talk originally given by Anton Petrov.

Outline

Lecture 1
• Importance of RNA, with examples (miRNAs, riboswitches).
• RNA 2D and 3D structure.
• RNA structure prediction.

Lecture 2
• RNA basepairs and 3D motifs.
• Predicting secondary structure from sequence (mfold).

Lecture 3
• Statistical variability of protein and RNA sequences.

In humans, only about 1.5% of the approximately 3 billion nucleotides in the genome code for proteins, yet up to 93% are transcribed into RNA. What is this “non-coding” RNA doing?
ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.
Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.

[Figure: two views of genetic information flow. Classical view: DNA → (transcription) → RNA (tRNA, ribosomal RNA) → (translation) → protein. Expanded view: DNA → (transcription) → RNA, including microRNA, introns, tRNA, ribosomal RNA, and many other types of ncRNA; splicing and translation of exons yield protein; reverse transcription leads from RNA back to DNA. Source: Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.]

microRNA
[Figure: miRNAs in a transcript, waiting to be diced out. Source: Mattick, J.S. (2004) The hidden genetic program of complex organisms. Scientific American 291 (4): 60-67.]
Bioinformatic challenge: given a DNA sequence, predict microRNA genes and their respective targets.
Kim VN. MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol. 2005 May;6(5):376-85.

The acquisition of novel microRNAs (shown in white boxes in the figure) may be a driving force of recent evolution. Could it also be a factor in cancers?
Peterson, K.J., Dietrich, M.R. and McPeek, M.A. (2009) MicroRNAs and metazoan macroevolution: insights into canalization, complexity, and the Cambrian explosion. BioEssays 31:736–747.
There are 84 mammal-specific microRNAs, and 84 more that are found exclusively in apes.
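To make the microRNA target-prediction challenge concrete, here is a minimal sketch of the simplest heuristic: scanning a target RNA for an exact Watson-Crick match to the miRNA "seed". It assumes the seed is miRNA nucleotides 2-8 (1-based) and ignores G-U wobble, site context, and conservation, so it is an illustration rather than a real target predictor. The function names `reverse_complement` and `seed_sites` and the toy sequences are hypothetical.

```python
# Minimal seed-match sketch (illustrative heuristic, not a full predictor).
# Assumption: seed = miRNA nucleotides 2-8 (1-based, inclusive), and a
# candidate site is an exact Watson-Crick complement (no G-U wobble).

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, target: str, start: int = 2, end: int = 8) -> list[int]:
    """Return 0-based positions in `target` where the seed's complement occurs.

    `start` and `end` are 1-based, inclusive seed positions in the miRNA.
    """
    seed = mirna.upper()[start - 1:end]
    site = reverse_complement(seed)  # sequence the target must contain, 5'->3'
    target = target.upper()
    hits = []
    pos = target.find(site)
    while pos != -1:  # keep searching after each hit (overlaps allowed)
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, not real miRNA or UTR data.
    toy_mirna = "UGAGCUACGGAAUCUUAGCGAU"
    toy_utr = "AAAACGUAGCUCAAAAGGGCGUAGCUCUU"
    print(seed_sites(toy_mirna, toy_utr))
```

With these toy strings the seed is `GAGCUAC`, its complement in the target is `GUAGCUC`, and the script prints `[5, 20]`.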
RIBOSWITCHES
Riboswitches are RNAs that bind other molecules when those molecules are present; binding alters the shape and function of the RNA.
Bioinformatic challenges: find riboswitches in genomic sequences, and design novel riboswitches.
Montange, R. K., & Batey, R. T. (2008). Riboswitches: emerging themes in RNA structure and function. Annu Rev Biophys 37:117-133.

Types of RNA
Bioinformatic challenges: Is this list final? Could there be more types of non-coding RNA (ncRNA) that we don’t know about yet? How can we search for novel ncRNAs in genomes?
http://en.wikipedia.org/wiki/List_of_RNAs

Goals of RNA bioinformatics
• Find and classify RNA genes in genomic sequences (using both experimental and computational methods).
• Predict secondary and 3D structure from RNA sequence.
• Infer function from structure.
• Rationally design RNA molecules for biotechnology.
• Find diseases associated with RNAs (e.g., cancer and miRNAs).

Why RNA is unique
• Similar to DNA in chemical composition, primary and secondary structure, and information content, but with structure more complicated than simple helices.
• Similar to proteins in tertiary (3D) structure and function, but also very different: RNA structure is held together mostly by base-base interactions, with fewer backbone-backbone interactions.
• Binds substrates and catalyzes reactions, just as proteins do.
• Participates in all stages of gene expression and information transfer: transcription, splicing, translation. It is also a frequent target of antibiotics.

Similarities between protein and RNA 3D structures
• Compact folding
• Hierarchical organization
• Modular domains
• Specific tertiary interactions
• Molecular “mimicry”: proteins that “mimic” RNA

[Figure: The tertiary structures of tRNA-mimic translation factors and tRNA. (a) Thermus thermophilus EF-G:GDP (PDB accession code 1DAR). (b) Thermus aquaticus EF-Tu:GDPNP:Phe-tRNA^Phe (1TTT). (c) Thermus thermophilus RRF (1EH1). (d) Yeast Phe-tRNA^Phe.]
Liang, H., & Landweber, L. F. (2005). Molecular mimicry: Quantitative methods to study structural similarity between protein and RNA. R 1172.
RNAs are not linear: they fold back on themselves so that complementary stretches of the same strand can pair up.

RNA 2D structure elements
Basepairs are the basic units of secondary structure.
[Figure: RNA secondary structure elements. Source: Mount, D. W. Bioinformatics: Sequence and Genome Analysis.]
Bioinformatic challenges: predict the most stable 2D structures, resolve pseudoknotted regions, etc.

From 2D to 3D structure of RNA
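The idea of a strand folding back to pair with itself can be illustrated with dynamic programming. The tool named in this course, mfold, minimizes free energy using a nearest-neighbor energy model; the sketch below is the much simpler Nussinov algorithm, swapped in here only because it shows the same recursive structure while just maximizing the number of basepairs. Assumptions: allowed pairs are A-U, G-C and G-U wobble, and a hairpin loop must contain at least `min_loop = 3` unpaired bases. The function name `nussinov` and its dot-bracket output format are illustrative choices.

```python
# Minimal Nussinov sketch: maximize the number of nested basepairs.
# Not mfold: no stacking energies or loop penalties are modeled.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (max number of pairs, dot-bracket structure) for an RNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs within seq[i..j] (inclusive); 0 when j - i <= min_loop.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            # case 2: j pairs with k, splitting into [i, k-1] and [k+1, j-1]
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs as dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, '(((...)))')`: a three-basepair stem closing an AAA hairpin loop. Because every pair (k, j) splits the problem into independent inner and outer intervals, this recursion can only produce nested structures, which is one reason pseudoknotted regions are hard for this family of methods.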