http://cs273a.stanford.edu [Bejerano Fall10/11]

Lecture 11
HW1 Feedback (ours)
(Upcoming Project – discuss Wed)
Non-Coding RNAs
Halfway Feedback (yours)

“non coding” RNAs

Central Dogma of Biology:

RNA is an Active Player:
[Figure labels: reverse transcription; long ncRNA]

What is ncRNA?
• Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein.
• Known roles for ncRNAs:
  – RNA catalyzes excision/ligation in introns.
  – RNA catalyzes the maturation of tRNA.
  – RNA catalyzes peptide bond formation.
  – RNA is a required subunit in telomerase.
  – RNA plays roles in immunity and development (RNAi).
  – RNA plays a role in dosage compensation.
  – RNA plays a role in carbon storage.
  – RNA is a major subunit in the SRP, which is important in protein trafficking.
  – RNA guides RNA modification.
  – In the beginning it is thought there was an RNA World, where RNA was both the information carrier and active molecule.

RNA Folds into (Secondary and) 3D Structures
AAUUGCGGGAAAGGGGUCAA
CAGCCGUUCAGUACCAAGUC
UCAGGGGAAACUUUGAGAUG
GCCUUGCAAAGGGUAUGGUA
AUAAGCUGACGGACAUGGUC
CUAACCACGCAGCCAAGUCC
UAAGUCAACAGAUCUUCUGU
UGAUAUGGAUGCAGUUCA
We would like to predict them from sequence.
[Figure: secondary structure diagram of the sequence above, with helices labelled P4, P5, P5a, P5b, P5c, P6, P6a, P6b and nucleotide positions 120 to 260 marked]
Waring & Davies. (1984) Gene 28: 277.
Cate, et al. (Cech & Doudna). (1996) Science 273:1678.

RNA structure rules
• Canonical basepairs:
  – Watson-Crick basepairs:
    • G-C
    • A-U
  – Wobble basepair:
    • G-U
• Stacks: continuous nested basepairs. (energetically favorable)
• Non-basepaired loops:
  – Hairpin loop.
  – Bulge.
  – Internal loop.
  – Multiloop.
• Pseudo-knots

RNA structure: Basics
• Key: RNA is single-stranded. Think of a string over 4 letters: A, C, G, and U.
• The complementary bases form pairs.
• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.
Bafna

Ab initio structure prediction: lots of Dynamic Programming
• Maximizing the number of base pairs (Nussinov et al., 1978)
  simple model: every base pair (i, j) scores 1
Pseudoknots drastically increase computational complexity

Nearest Neighbor Model for RNA Secondary Structure
Free Energy at 37 °C:
[Figure: hairpin structure annotated with the energy terms -2.1, -0.9, -1.6, -2.0, -0.9, -1.8 and +5.0 kcal/mol]
Helix (sum of the nearest-neighbor stacking terms, one of them counted twice):
\[ \Delta G_{\text{helix}} = -2.0\ \text{kcal/mol} - 2.1\ \text{kcal/mol} + 2 \times (-0.9)\ \text{kcal/mol} - 1.8\ \text{kcal/mol} = -7.7\ \text{kcal/mol} \]
Hairpin loop (6 nucleotides):
\[ \Delta G_{\text{hairpin loop}} = \Delta G_{\text{initiation}} + \Delta G_{\text{mismatch}} = 5.0\ \text{kcal/mol} - 1.6\ \text{kcal/mol} = 3.4\ \text{kcal/mol} \]
Total:
\[ \Delta G_{\text{total}} = \Delta G_{\text{hairpin loop}} + \Delta G_{\text{helix}} = 3.4\ \text{kcal/mol} - 7.7\ \text{kcal/mol} = -4.3\ \text{kcal/mol} \]
Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.

Zuker’s algorithm MFOLD: computing loop dependent energies

Energy Landscape of Real & Inferred Structures

Unfortunately…
  – Random DNA (with high GC content) often folds into low-energy structures.
  – What other signals determine non-coding genes?

Evolution to the Rescue

Stochastic context-free grammar (SCFG)
S → aSu
S → uSa
S → gSc
S → cSg
S → L
L → aL
L → cL
L → a
L → c
[Figure: derivation tree with S and L nodes over the sequence c guu aga aac cucucccc]
• Each derivation tree corresponds to a structure.

Stochastic context-free grammar (cont’)
CFG:
S → aSu
S → cSg
S → gSc
S → uSa
S → a
S → c
S → g
S → u
S → SS
Derivation:
S ⇒ aSu
  ⇒ acSgu
  ⇒ accSggu
  ⇒ accuSaggu
  ⇒ accuSSaggu
  ⇒ accugScSaggu
  ⇒ accuggSccSaggu
  ⇒ accuggaccSaggu
  ⇒ accuggacccSgaggu
  ⇒ accuggacccuSagaggu
  ⇒ accuggacccuuagaggu
1. A CFG
2. A derivation of “accuggacccuuagaggu”
3. 
Corresponding structure

MicroRNA

Genomic context
known miRNAs in human
[Figure labels: intergenic; polycistronic; monocistronic; intronic]

tRNA

tRNA Activity

Human specific accelerated evolution
[Figure labels: rapid change; Human; Chimp; conserved]

Human Accelerated Regions
Human-specific substitutions in conserved sequences
[Figure labels: rapid change; Human; Chimp; conserved; Human Derived; Chimp Ancestral]
[Pollard, K. et al., Nature, 2006]
HAR1:
• Novel ncRNA
• Co-expressed in Cajal-Retzius cells with reelin.
• Similar expression in human, chimp, rhesus.
• 18 unique human substitutions leading to novel conformation.
• All weak (AT) to strong (GC).
[Beniaminov, A. et al., RNA, 2008]

Other Non Coding Transcripts

mRNA

EST

lincRNAs (long intergenic non coding RNAs)

X chromosome inactivation in mammals
[Figure: dosage compensation, comparing X X and X Y]
Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

Microarrays, Next Gen(eration) Sequencing etc.

End Results

Transcripts, transcripts everywhere
[Figure labels: Human Genome; Transcribed (Tx); Leaky tx?; Tx from both strands; Functional?]

Or are they?

Halfway Feedback
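
Supplementary sketch: base-pair maximization

The ab initio prediction slide mentions maximizing the number of base pairs with dynamic programming (Nussinov et al., 1978), where every base pair scores 1. The following is a minimal sketch of that idea under stated assumptions, not the lecture's own code. It allows the canonical pairs from the structure rules slide (G-C, A-U, and the G-U wobble), considers only non-crossing pairings (no pseudoknots), and ignores free energies entirely. The names `nussinov`, `can_pair`, `PAIRS`, and the `min_loop` parameter are hypothetical, introduced here for illustration.

```python
# Allowed pairs: Watson-Crick (G-C, A-U) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b form one of the allowed pairs."""
    return (a, b) in PAIRS


def nussinov(seq, min_loop=0):
    """Maximum number of non-crossing base pairs in seq.

    min_loop is an assumed extra knob: the minimum number of unpaired
    bases enclosed by a pair. The slide's simple model corresponds to 0.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base i is left unpaired.
            best = dp[i + 1][j]
            # Case 2: base i pairs with some k, splitting the interval
            # into the part enclosed by (i, k) and the part after k.
            for k in range(i + 1 + min_loop, j + 1):
                if can_pair(seq[i], seq[k]):
                    inside = dp[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    after = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + after)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # The string derived on the SCFG slide, used here only as sample input.
    print(nussinov("accuggacccuuagaggu"))
```

The table is filled by increasing interval length, so every sub-interval a cell depends on is already computed; the triple loop gives cubic time in the sequence length. Because the recurrence splits the interval around each pair, crossing pairs never arise, which is why pseudoknots are outside this model.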