CS173 Lecture 6: NON protein coding genes MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu http://cs173.stanford.edu [BejeranoWinter12/13] 1 Announcements • HW1 due in one week. http://cs173.stanford.edu [BejeranoWinter12/13] 2 ATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA TATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC TAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC TGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT CTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG AATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA GCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA CTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG TTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT TTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG CGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA GAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA ATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA TTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA ATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT ATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTT TGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGT TCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATAC ATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT GCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTA CGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGA ATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACA TCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAAC GGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAA CTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTG GCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTC TTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAAT TGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCT GCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT AATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCT TCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTT AATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGA TTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTA CTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTT TACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTT ACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAA http://cs173.stanford.edu [BejeranoWinter12/13] 3 AATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGT “non coding” RNAs (ncRNA) http://cs173.stanford.edu [BejeranoWinter12/13] 4 Central Dogma of Biology: 5 Active forms of “non coding” RNA long non-coding RNA reverse transcription microRNA rRNA, snRNA, snoRNA 6 What is ncRNA? • Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein. • Known roles for ncRNAs: – RNA catalyzes excision/ligation in introns. – RNA catalyzes the maturation of tRNA. – RNA catalyzes peptide bond formation. – RNA is a required subunit in telomerase. – RNA plays roles in immunity and development (RNAi). – RNA plays a role in dosage compensation. – RNA plays a role in carbon storage. – RNA is a major subunit in the SRP, which is important in protein trafficking. – RNA guides RNA modification. – RNA can do so many different functions, it is thought in the beginning there was an RNA World, where RNA was both the information carrier and active molecule. 7 “non coding” RNAs (ncRNA) Small structural RNAs (ssRNA) http://cs173.stanford.edu [BejeranoWinter12/13] 8 ssRNA Folds into Secondary and 3D Structures AAUUGCGGGAAAGGGGUCAA CAGCCGUUCAGUACCAAGUC UCAGGGGAAACUUUGAGAUG GCCUUGCAAAGGGUAUGGUA AUAAGCUGACGGACAUGGUC CUAACCACGCAGCCAAGUCC UAAGUCAACAGAUCUUCUGU UGAUAUGGAUGCAGUUCA We would like to predict them from sequence. A C A GA CA G C U A G C 200 G C G U 120 U P5a CA G U G C P5 UC G G U A C G C G G G U U AA A U A A A AU A A A C 180 G C G C P5c G U A G C G U A A 260 A C G A A GG G C C A C P4 C G U GUUCC G A U A 140 G U G A U U U 160 G C P6 G C G AA A U C A C P5b G A U A C U G A G U G 220 G U C G U A A G C G P6a AA C G U U A A A G U U A C G A U A U P6b C G A U G C 240 A U UCU Waring & Davies. (1984) Gene 28: 277. Cate, et al. (Cech & Doudna). (1996) Science 273:1678. 9 For example, tRNA tRNA Activity ssRNA structure rules • • • • Canonical basepairs: – Watson-Crick basepairs: • G–C • A–U – Wobble basepair: • G-U Stacks: continuous nested basepairs. (energetically favorable) Non-basepaired loops: – Hairpin loop – Bulge – Internal loop – Multiloop Pseudo-knots Ab initio RNA structure prediction: lots of Dynamic Programming • Objective: Maximizing the number of base pairs (Nussinov et al, 1978) simple model: (i, j) = 1 if allowed fancier model: GC > AU > GU Pseudoknots drastically increase computational complexity Objective: Minimize Secondary Structure Free Energy at 37 OC: Instead of (i, j), measure and sum energies: -2.1 -0.9 -1.6 C G G U U U U G Ghelix = G G C + G C A + 2G A A + G A C = C G U U U G G G -2.0 kcal/mol - 2.1 kcal/mol + 2x(-0.9) kcal/mol - 1.8 kcal/mol = -7.7 kcal/mol U U G C A A A CA C G G Ghairpin loop = Ginitiation (6 nucleotides) + Gmismatch C A = -2.0 -0.9 -1.8 +5.0 5.0 kcal/mol - 1.6 kcal/mol = 3.4 kcal/mol Gtotal = Ghairpin + Ghelix = 3.4 kcal/mol - 7.7 kcal/mol = -4.3 kcal/mol Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287. http://cs273a.stanford.edu [Bejerano Fall11/12] 15 Zuker’s algorithm MFOLD: computing loop dependent energies RNA structure • Base-pairing defines a secondary structure. The base-pairing is usually non-crossing. Bafna 1 Stochastic context-free grammar S aSu S aSu S cSg acSgu S gSc accSggu S uSa accuSaggu Sa accuSSaggu Sc accugScSaggu Sg accuggSccSaggu Su accuggaccSaggu S SS accuggacccSgaggu accuggacccuSagaggu accuggacccuuagaggu 1. A CFG 2. A derivation of “accuggacccuuagaggu” 3. Corresponding structure Cool algorithmics. Unfortunately… – – Random DNA (with high GC content) often folds into low-energy structures. We will mention powerful newer methods later on. ssRNA transcription • ssRNAs like tRNAs are usually encoded by short “non coding” genes, that transcribe independently. • Found in both the UCSC “known genes” track, and as a subtrack of the RepeatMasker track http://cs173.stanford.edu [BejeranoWinter12/13] 20 “non coding” RNAs (ncRNA) microRNAs (miRNA/miR) http://cs173.stanford.edu [BejeranoWinter12/13] 21 MicroRNA (miR) ~70nt ~22nt miR match to target mRNA is quite loose. mRNA a single miR can regulate the expression of hundreds of genes. http://cs173.stanford.edu [BejeranoWinter12/13] 22 MicroRNA Transcription mRNA http://cs173.stanford.edu [BejeranoWinter12/13] 23 MicroRNA Transcription mRNA http://cs173.stanford.edu [BejeranoWinter12/13] 24 MicroRNA (miR) Computational challenges: Predict miRs. Predict miR targets. ~70nt ~22nt miR match to target mRNA is quite loose. mRNA a single miR can regulate the expression of hundreds of genes. http://cs173.stanford.edu [BejeranoWinter12/13] 25 MicroRNA Therapeutics Idea: bolster/inhibit miR production to broadly modulate protein production Hope: “right” the good guys and/or “wrong” the bad guys Challenge: and not vice versa. ~70nt ~22nt miR match to target mRNA is quite loose. mRNA a single miR can regulate the expression of hundreds of genes. http://cs173.stanford.edu [BejeranoWinter12/13] 26 Other Non Coding Transcripts http://cs173.stanford.edu [BejeranoWinter12/13] 27 lncRNAs (long non coding RNAs) Don’t seem to fold into clear structures (or only a sub-region does). Diverse roles only now starting to be understood. Hard to detect or predict function computationally (currently) http://cs173.stanford.edu [BejeranoWinter12/13] 28 lncRNAs come in many flavors http://cs173.stanford.edu [BejeranoWinter12/13] 29 X chromosome inactivation in mammals X X X Dosage compensation X Y Xist – X inactive-specific transcript Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67 Transcripts, transcripts everywhere Human Genome Transcribed* (Tx) Tx from both strands* * True size of set unknown http://cs173.stanford.edu [BejeranoWinter12/13] 32 Or are they? http://cs173.stanford.edu [BejeranoWinter12/13] 33 The million dollar question Human Genome Transcribed* (Tx) Leaky tx? Tx from both strands* Functional? * True size of set unknown http://cs173.stanford.edu [BejeranoWinter12/13] 34 Coding and non-coding gene production To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. The cell is constantly making new proteins and ncRNAs. These perform their function for a while, And are then degraded. Newly made coding and non coding gene products take their place. The picture within a cell is constantly “refreshing”. http://cs173.stanford.edu [BejeranoWinter12/13] 35 Cell differentiation To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. That is exactly what happens when cells differentiate during development from stem cells to their different final fates. http://cs173.stanford.edu [BejeranoWinter12/13] 36 Human manipulation of cell fate To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. We have learned (in a dish) to: 1 control differentiation 2 reverse differentiation 3 hop between different states http://cs173.stanford.edu [BejeranoWinter12/13] 37 Cell replacement therapies We want to use this knowledge to provide a patient with healthy self cells of a needed type. We have learned (in a dish) to: 1 control differentiation 2 reverse differentiation 3 hop between different states (iPS = induced pluripotent stem cells) http://cs173.stanford.edu [BejeranoWinter12/13] 38 How does this happen? Different cells in our body hold copies of (essentially) the same genome. Yet they express very different repertoires of proteins and non-coding RNAs. How do cells do it? A: like they do everything else: using their proteins & ncRNAs… http://cs173.stanford.edu [BejeranoWinter12/13] 39 Gene Regulation Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off. Gene DNA Proteins To be continued… http://cs173.stanford.edu [BejeranoWinter12/13] 40 Review Lecture 6 • Central dogma recap – Genes, proteins and non coding RNAs • RNA world hypothesis • Small structural RNAs – Sequence, structure, function – Structure prediction – Transcription mode • MicroRNAs – Functions – Modes of transcription • lncRNAs – Xist • Genome wide (and context wide) transcription – How much? – To what goals? • Gene transcription and cell identity – Cell differentiation – Human manipulation of cell fates • Gene regulation control http://cs173.stanford.edu [BejeranoWinter12/13] 41 (On Mondays) ask students to stack the chairs without wheels at the back of the room at the end of class. http://cs173.stanford.edu [BejeranoWinter12/13] 42