Dicer deletion and short RNA expression analysis in mouse embryonic stem cells

by

Joseph Mauro Calabrese

B.S. Chemistry, Biochemistry, and Molecular Biology (2001)
University of Wisconsin-Madison

Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of Doctor in Philosophy in Biology at the Massachusetts Institute of Technology

February 2008

© Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Biology, December 18, 2007
Certified by: Phillip A. Sharp, Professor of Biology, Thesis Supervisor
Accepted by: Steven P. Bell, Professor of Biology, Chair, Biology Graduate Committee

Dicer deletion and short RNA expression analysis in mouse embryonic stem cells
By Joseph Mauro Calabrese
Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of Doctor in Philosophy in Biology.

ABSTRACT

RNA interference (RNAi) manages many aspects of eukaryotic gene expression through sequence-specific interactions with RNA. Short RNAs, 20-30 nucleotides in length, guide the various effector proteins of RNAi to silence fully or partially complementary targets. The sequencing of endogenously expressed short RNA species coupled with genetic studies in various experimental organisms has revealed a role for RNAi in the silencing of protein-coding genes and repetitive elements in genomes. In mammals, it is unknown to what extent RNAi is involved in silencing processes other than the modulation of protein-coding gene expression, which is achieved through a class of short RNAs termed microRNAs (miRNAs). The work in this thesis quantitatively describes the short RNAs expressed in mouse embryonic stem (ES) cells. ES cell lines are derived from the pre-implantation blastocyst and can be cultured in vitro for extended periods while still maintaining pluripotency.
It was demonstrated that approximately 130,000 5' phosphorylated short RNA molecules are present in a single ES cell. 10% of these short RNAs represent nonrandom fragments of larger, abundant non-coding RNA species, and have no known function. Low abundance short RNAs were discovered that cluster bidirectionally around the transcription start sites of protein-coding genes. These RNAs associate with features of active transcription, and may be evidence of widespread bidirectional initiation and pausing of RNA polymerase II in ES cells. There are on the order of 300 different miRNA species expressed in ES cells, comprising 85% of the total pool of 130,000 5' phosphorylated short RNAs. Based on experiments correlating miRNA abundance to target repression, only about 30 of these miRNAs are expected to carry significant ES cell regulatory capacity. ES cells lacking all miRNAs do not significantly change their morphology or gene expression patterns, but do show a significant drop in growth rate compared to controls, suggesting that a major function of ES cell miRNAs may be to govern cell division. A detailed comparison of short RNAs expressed in ES cells with and without the ribonuclease Dicer strongly suggests that miRNAs are the sole regulatory molecules that function through the RNAi pathway in ES cells. Considering previous work showing that repeating elements are frequently under Dicer-dependent repression, this observation raises the possibility that mammalian miRNAs may in certain contexts function to silence repeating genomic elements in addition to protein-coding genes.

ACKNOWLEDGEMENTS

To Phil, for training me to become a research scientist. I have learned so much in the lab it is impossible to recount the details in this space. Thanks for all of your help and advice, and for providing me with so many opportunities to discover. To my committee members, Dave Bartel and Rudolf Jaenisch for sage advice and expanding the way we thought about diverse problems.
To all of my co-workers in the Sharp lab over the years, so many of you have been teachers and role models, in addition to friends. Thanks for everything, especially the Muddy Charles trips. Sharp lab rules. To my friends who have made my life outside of lab exciting, thanks. Many highlights come to mind, including: the CCR retreats, Harvard Law parties, firing Ann and Keara, taquito bombs, the syrup race, grilled pecan sandies, flip cup, uneven pool tables, road trips, good beadings, physical challenges on Boston Commons, Department-funded recruitment events at Pleasant Place, four 4th of Julys, and other barbecue-centered events, too numerous to mention. To Nicole, for all of your support and for providing so many pleasant distractions during times of stress. To my parents, for all that you have done. Too few people in this world have had the opportunities and support that you have given me. Without your guidance, encouragement, and love things would be very different.

Discovery consists of seeing what everyone else has seen and thinking what no one else has thought. - Albert Szent-Györgyi

TABLE OF CONTENTS

Abstract
Chapter 1 RNA interference and the biology of mouse embryonic stem cells
    Animal miRNAs
    miRNA biogenesis
    miRNA-mediated silencing mechanisms
    miRNA function
    C. elegans antisense siRNAs
    RNAi-mediated transcriptional silencing in S. pombe
    RNAi-mediated viral and transcriptional silencing in plants
    RNAi-mediated silencing of transposons and transgenes in C. elegans
    RNAi-mediated transposon control in D. melanogaster
    Mammalian RNAi and repetitive elements
    Mouse embryonic stem cells
Chapter 2 Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse ES cells
Chapter 3 RNA sequence analysis defines Dicer's role in mouse ES cells
Chapter 3 Appendix
Chapter 4 Short RNAs in the sense and anti-sense orientation from transcription initiation sites in mouse ES cells
Chapter 5 Examining miRNA function in mouse ES cells
Conclusions and future directions

Chapter 1 RNA interference and the biology of mouse embryonic stem cells

Introduction

To manage their gene expression programs, organisms employ many distinct mechanisms. In one set of regulatory mechanisms, termed RNA interference (RNAi), short RNAs 20-30 nucleotides (nt) in length guide multi-protein complexes to suppress functions of complementary nucleic acid targets. Present in many single-celled, and likely all multi-celled eukaryotes, the processes of RNAi regulate protein-coding gene expression, initiate and maintain transcriptional silencing of specific genomic loci, and maintain genomic integrity and immunity via the silencing of transposable elements. In Chapter 1, the various mechanisms of RNAi-mediated silencing are described, focusing heavily on studies conducted in mammals, though also touching on aspects of RNAi in many experimental organisms.
Additionally, a basic introduction to the biology of mouse embryonic stem (ES) cells is included to provide appropriate background to the research described in this thesis.

RNAi in the control of protein-coding gene expression in animals

Animal microRNAs

RNAi is a master regulator of protein-coding gene expression, predominantly through a class of ~22 nt long non-coding RNA genes termed microRNAs (miRNAs). miRNAs are sequence-specific guide molecules for protein complexes that prevent productive translation and destabilize mRNAs. miRNAs appear to be ubiquitously expressed in all multi-cellular eukaryotes, and have recently been identified in the unicellular eukaryote, Chlamydomonas reinhardtii (Molnar et al. 2007; Zhao et al. 2007a). Their roles are diverse, and in many cases, essential. From a growing set of genetic, biochemical, and computational analyses, it appears that many miRNAs control cell-fate specification and have pleiotropic effects on cellular environments, similar to cell-type-specific transcription factors (Kloosterman and Plasterk 2006). miRNAs have critical regulatory roles in plants as well as animals; however, significant differences exist between plant and animal miRNA biosynthesis and function. In the text below, only animal miRNAs are discussed.

miRNA biogenesis

miRNAs are transcribed by RNA Pol II as long primary transcripts, termed pri-miRNAs, that are capped, poly-adenylated, and frequently poly-cistronic (Cai et al. 2004; Lee et al. 2004; Rodriguez et al. 2004). Many miRNAs are located in defined intergenic transcriptional units (Saini et al. 2007); others are located in introns and likely coexpressed with host genes from single promoters (Baskerville and Bartel 2005). Pri-miRNAs are processed in the nucleus by the Drosha-DGCR8 heterodimer to generate ~70 nt long pre-miRNA hairpins with characteristic 5' phosphates and 3' 2 nt overhangs (Lee et al. 2003; Zeng et al. 2005; Han et al. 2006).
Pre-miRNAs are then exported into the cytoplasm by Exportin-5 and Ran-GTP (Ying et al. 2003). After nuclear export, the pre-miRNA hairpin is processed by the cytoplasmic enzyme Dicer to generate a ~22 base pair RNA duplex consisting of the mature miRNA paired to its complement, termed the miRNA* (Grishok et al. 2001; Hutvagner et al. 2001; Ketting et al. 2001). This duplex is likely short-lived, as miRNA* levels are up to 100-fold lower than levels of corresponding miRNAs (Ruby et al. 2006). Overexpression of an RNA-duplex binding protein in mammalian cells fails to capture miRNA-miRNA* duplexes, consistent with their proposed short life span and suggesting that these duplexes are bound by protein components in the cytoplasm (described in Chapter 2). After Dicer processing, the mature single-stranded miRNA is then displaced from the miRNA* and incorporated into an active silencing complex. The Argonaute proteins bind miRNAs in the core of the multi-subunit RNA-induced silencing complex (RISC), the protein complex that mediates RNAi-based silencing (Liu et al. 2004). Mammals have eight Argonaute paralogues, divided equally between the Ago and Piwi subfamilies (Carmell et al. 2002). Ago subfamily members are thought to strictly associate with miRNAs, while at least two Piwi subfamily members associate with a separate class of RNAs, termed piRNAs (Liu et al. 2004; O'Donnell and Boeke 2007). Of the 4 Ago proteins, only Ago2 is capable of cleaving target transcripts that are perfectly complementary to bound miRNAs (Liu et al. 2004). Transcript cleavage by Ago2 is multi-turnover and occurs on the target RNA directly across from the 10th nucleotide measuring from the 5' end of the miRNA (Hutvagner and Zamore 2002; Martinez and Tuschl 2004). Ago2 does not explicitly depend on this cleavage activity to function, as expression of a cleavage-deficient Ago2 mutant protein is able to fully rescue the phenotypic defects of a mouse Ago2 hematopoietic knockout (Tang et al. 2007).
The other Ago proteins lack significant cleavage activity and likely function mainly to prevent translation of target mRNAs (Pillai et al. 2004). Many proteins associate with the Argonautes either as RISC loading or accessory factors. In HEK 293T cells, the double-stranded RNA (dsRNA) binding protein TRBP associates with Dicer and Ago2 and is likely required for proper loading of miRNAs into the RISC (Chendrimada et al. 2005). Biochemical studies mainly conducted in cells from Homo sapiens and Drosophila melanogaster have shown many other proteins associate with RISC as accessory factors, including: the fragile-X-mental-retardation protein (FMRP), tudor staphylococcal nuclease (TSN), the vasa intronic gene (VIG), Mov10, eIF6, and Gemin3 and Gemin4 (Meister et al. 2005; Sontheimer 2005; Chendrimada et al. 2007). The function of many of these proteins in miRNA-mediated silencing, and whether they consistently associate with Ago and the RISC in multiple cell types, remains unclear. Loading of single-strand mammalian miRNAs into the RISC is thought to depend on the difference in thermodynamic end stabilities between the two ends of the miRNA/miRNA* duplex. Analysis of functional short-interfering RNAs (siRNAs) and both vertebrate and invertebrate miRNAs has shown that the short RNA whose 5' terminus is located at the end of the duplex that is least thermodynamically stable is preferentially incorporated into the RISC (Khvorova et al. 2003; Schwarz et al. 2003). This difference in thermodynamic stability can in many cases accurately predict which strand of the pre-miRNA hairpin will be the miRNA and which will be preferentially degraded as the miRNA*; however, exceptions exist where differences in thermodynamic stability alone are insufficient to predict miRNA duplex strand choice (Khvorova et al. 2003; Schwarz et al. 2003).
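The end-stability rule for strand choice can be illustrated with a minimal sketch. It is not the method of the cited studies: those rely on measured thermodynamic parameters, whereas the per-pair scores, the four-pair window, and the names `PAIR_SCORE`, `end_score`, and `predict_guide` below are assumptions made purely for illustration. The duplex is given as two equal-length strings covering only the paired region, with the second string listing the bases opposite the first strand.

```python
# Crude per-base-pair stability scores. These values are an assumption for
# illustration only; real analyses use nearest-neighbor free energies.
PAIR_SCORE = {
    frozenset("GC"): 3,  # G-C pair, most stable
    frozenset("AU"): 2,  # A-U pair
    frozenset("GU"): 1,  # G-U wobble pair
}


def end_score(pairs):
    """Sum the crude stability scores of a list of (base, base) pairs."""
    return sum(PAIR_SCORE.get(frozenset(pair), 0) for pair in pairs)


def predict_guide(strand_a, paired_b, window=4):
    """Predict which strand of a short RNA duplex is loaded into the RISC.

    strand_a : paired region of strand A, written 5' to 3'.
    paired_b : the strand B bases opposite each strand A position, in the
               same order (that is, strand B written 3' to 5').
    window   : number of terminal base pairs scored at each end (assumed).

    Returns "A" or "B" for the strand whose 5' end sits at the less stable
    end of the duplex, or "tie" if the two ends score equally.
    """
    if len(strand_a) != len(paired_b):
        raise ValueError("strands must cover the same paired region")
    pairs = list(zip(strand_a, paired_b))
    a_end = end_score(pairs[:window])   # end holding strand A's 5' terminus
    b_end = end_score(pairs[-window:])  # end holding strand B's 5' terminus
    if a_end < b_end:
        return "A"
    if b_end < a_end:
        return "B"
    return "tie"


# Strand A begins in an A/U-rich stretch and ends in a G/C-rich stretch,
# so its 5' terminus lies at the less stable end of the duplex.
print(predict_guide("AUUAGCAGCGGC", "UAAUCGUCGCCG"))
```

As the text notes, end stability alone does not always predict strand choice, so a score of this kind is at best a first approximation.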
In addition to miRNA duplex end stabilities, base pairing between the body of the miRNA and miRNA* can affect how miRNAs are loaded into different RISCs. In D. melanogaster, miRNA duplexes with perfect complementarity across from the site of Ago2 cleavage are preferentially incorporated into Ago2-containing RISCs, while those with bulges in the would-be cleavage region (positions 9 through 11 measuring from the miRNA 5' end) are preferentially incorporated into Ago1-containing RISCs (Forstemann et al. 2007; Tomari et al. 2007). In mammals, this differential incorporation of miRNAs into RISCs does not appear to occur, as immunoprecipitation of different Argonautes followed by miRNA microarray analysis shows that Ago1, 2, and 3 bind all miRNAs equally well (Liu et al. 2004). Dynamic changes in miRNA levels have been observed along developmental axes and changes in physiological state (Cheng et al. 2007; Neilson et al. 2007; Xu et al. 2007), suggesting the existence of active processes for the regulation of mature miRNA levels. Nevertheless, some mature miRNAs appear to be very stable in non-dividing cells (Song et al. 2003). Also, levels of mature miRNAs are often uncoupled from levels of pre- and pri-miRNAs, suggesting that miRNA processing itself is a regulated process (Obernosterer et al. 2006; Thomson et al. 2006).

miRNA-mediated silencing mechanisms

The mechanisms of miRNA-mediated gene silencing processes appear to be diverse. The founding miRNA, Caenorhabditis elegans lin-4, was observed to prevent translation of its target mRNA, lin-14, through an interaction that required partially complementary sequences in lin-14's 3' untranslated region (UTR). This translational repression did not significantly change lin-14 mRNA levels or the location of lin-14 mRNA in a polysome sedimentation profile (Lee et al. 1993; Wightman et al. 1993; Olsen and Ambros 1999).
A large body of subsequent work shows that translational repression of mRNAs via partially complementary sequence interaction is the predominant mechanism of miRNA-mediated gene silencing in animals (Bartel 2004). However, at least one animal miRNA, miR-196, functions by cleaving perfectly complementary target transcripts (Yekta et al. 2004). Moreover, it has been observed that many miRNAs can destabilize target mRNAs, likely by causing relocation of mRNAs to cytoplasmic processing bodies (P-bodies) (Bagga et al. 2005; Lim et al. 2005). Sequence information in 3' UTRs predominantly dictates the type and extent of miRNA-mediated mRNA repression. Comparative genomics studies have shown that miRNA target sites on mRNAs are most conserved over bases 2-8 measuring from the 5' end of the miRNA, termed the "seed" region of the miRNA (Lewis et al. 2003; Lewis et al. 2005). Experiments testing the repressive capability of both artificial and natural miRNAs are in agreement with this, having demonstrated that perfect base pairing between the 5' end of the miRNA and the 3' UTR is a strong determinant of repression (Doench and Sharp 2004; Brennecke et al. 2005). Additionally, these experiments showed that extensive pairing to the 3' end of the miRNA can compensate for weak 5' pairing, and that miRNA sites in close proximity synergize with each other, demonstrating that a single UTR may be subject to regulation from many miRNAs (Doench et al. 2003; Doench and Sharp 2004; Brennecke et al. 2005). More recently, it has been shown that target site accessibility, local A/U content, target-site-proximal conservation, and location of the target site relative to the stop codon, are all additional determinants of miRNA-mediated repression (Grimson et al. 2007; Kertesz et al. 2007; Nielsen et al. 2007). 
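The seed rule described above can be made concrete with a short sketch that searches a 3' UTR for perfect matches to bases 2-8 of a miRNA. It illustrates seed matching only: it ignores 3' compensatory pairing, site accessibility, and the other determinants mentioned in the text. The function names and the toy UTR are hypothetical, and the let-7a sequence is included only as a familiar example input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_site(mirna):
    """Return the mRNA sequence that pairs perfectly with miRNA bases 2-8.

    The miRNA is written 5' to 3'; bases 2-8 are mirna[1:8] in Python's
    zero-based, end-exclusive slicing.
    """
    return reverse_complement(mirna[1:8])


def seed_sites(mirna, utr):
    """Return zero-based start positions of every perfect seed match in utr."""
    site = seed_match_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Example inputs: the let-7a sequence and a made-up UTR fragment.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAAGGGCUACCUCA"

print(seed_match_site(let7a))      # CUACCUC
print(seed_sites(let7a, toy_utr))  # [3, 16]
```

Two nearby matches in one UTR, as in the toy example, correspond to the situation in which closely spaced sites are reported to act synergistically.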
Though it is accepted that the majority of animal miRNAs function by preventing productive translation of their target mRNAs, the apparent mechanisms of translational inhibition by miRNAs vary depending on the experimental system used by researchers. Studies using various in vitro cell extracts or in vitro transcribed mRNAs have shown miRNAs inhibit translational initiation in a manner dependent on a 7-methyl-guanine (m7G) cap structure and a poly-A tail (Humphreys et al. 2005; Pillai et al. 2005; Wang et al. 2006a; Mathonnet et al. 2007; Wakiyama et al. 2007). Argonaute proteins have an m7G cap binding domain that is similar to that of the cap binding protein eIF4E, and this domain is required for Ago-mediated translational repression of mRNAs (Kiriakidou et al. 2007). Furthermore, addition of recombinant eIF4E to extracts interferes with miRNA-mediated translational inhibition (Mathonnet et al. 2007). Together, these studies support a model by which miRNA-guided RISCs bind m7G cap structures to prevent translational initiation. Apparently at odds with these findings are a number of studies analyzing cells that suggest miRNAs inhibit translation at a step post-initiation (Olsen and Ambros 1999; Seggerson et al. 2002; Maroney et al. 2006; Nottrott et al. 2006; Petersen et al. 2006). In these works, miRNAs and mRNAs actively repressed by miRNAs co-sediment with polyribosomes in sucrose gradients, and this co-sedimentation can be disrupted by puromycin, suggesting it depends on actively translating ribosomes (Maroney et al. 2006; Nottrott et al. 2006; Petersen et al. 2006). Further, miRNAs inhibited translation of a reporter gene driven by the cricket paralysis virus IRES, which allows loading of elongation-competent 80S ribosomes on mRNAs without the requirement for canonical initiation factors and initiator tRNAs, again suggesting that miRNAs repress translation post-initiation (Petersen et al. 2006).
One potential explanation for these discrepancies could be that miRNAs may inhibit translation both pre- and post-initiation, but certain experimental conditions, such as those that utilize in vitro transcribed mRNAs, are differentially sensitive to these two modes of inhibition. miRNAs can also induce mRNA destabilization by targeting mRNAs for deadenylation, and potentially decapping. Zebrafish miR-430 promotes clearance of maternal mRNAs at the onset of zygotic transcription via deadenylation, and the miRNA let-7 promotes translation-independent deadenylation of a reporter mRNA in vitro (Giraldez et al. 2006; Wakiyama et al. 2007). Separate experiments show that Argonaute proteins associate with P-bodies and decapping enzyme in tissue culture cells, indirectly linking miRNA-mediated repression to decapping (Jakymiw et al. 2005; Liu et al. 2005; Pillai et al. 2005; Sen and Blau 2005). Further, decay products have been detected of miRNA-targeted mRNAs that are consistent with the 5' to 3' exonucleolytic degradation mediated by Xrn1p, the nuclease that destroys uncapped mRNAs (Bagga et al. 2005).

miRNA function

The current miRNA database has annotations for 533 human and 442 mouse miRNAs (Griffiths-Jones 2004). Given the large number and apparent ubiquitous expression of miRNAs in animals, the potential for miRNA-mediated gene regulation is large. The founding miRNAs were identified in forward genetic screens via their phenotypic influence on genetic pathways (Chalfie et al. 1981; Reinhart et al. 2000); however, the small amount of sequence complementarity needed for miRNA-mediated repression suggests that most miRNAs affect many cellular pathways rather than one specifically. miRNAs tune translation from expressed mRNAs to define protein output of targeted genes. In one specific example, reduction of atrophin levels by Drosophila miR-8 is required for normal central nervous system function.
Importantly, further reduction or overexpression of atrophin in otherwise wild-type flies results in a mutant phenotype, indicating that miR-8 reduces atrophin expression to a level appropriate for normal function (Karres et al. 2007). miR-150 expression in differentiating B cells represents another example of miRNA-mediated tuning of protein expression (Xiao et al. 2007). miR-150 targets the transcription factor c-Myb during lymphocyte development. Similar to the situation described for atrophin and miR-8 in Drosophila, the relief of miR-150-mediated c-Myb repression or a reduction of c-Myb protein levels both result in the impairment of B cell development (Xiao et al. 2007). Many miRNAs are reciprocally expressed with target mRNAs, suggesting that in certain cases, miRNAs function as master regulators of cell-type specific transcriptional and translational output (Farh et al. 2005; Stark et al. 2005). Genes that are highly expressed in specific tissues have evolved to avoid targeting by abundant tissue-specific miRNAs, whereas genes that are conserved targets of tissue-specific miRNAs are frequently expressed at low levels, or in tissues adjacent to the tissue-specific miRNA, such that gene expression boundaries and cell identity appear to be maintained by miRNA expression (Farh et al. 2005; Stark et al. 2005). Additionally, genes that need to be ubiquitously expressed, such as ribosomal protein genes, tend to have short UTRs that avoid miRNA targeting completely (Stark et al. 2005). Consistent with a role for miRNAs in the restriction of tissue identity, miRNA levels are generally down-regulated in tumors, and impaired miRNA processing enhances tumorigenesis, a process in which diverse collections of rapidly evolving cells need to adopt multiple cellular identities (Lu et al. 2005; Kumar et al. 2007). Genetic knockouts of specific mouse miRNAs reveal a striking intolerance for loss of tissue-specific miRNA expression.
The most poignant example of this thus far is knockout of miR-1-2, which is expressed specifically in muscle cells. Approximately 50% of miR-1-2 knockout mice die from severe cardiac dysfunction at or before weaning, indicating a critical role for miR-1-2 in the heart (Zhao et al. 2007b). Also, deletions of the lymphoid specific miRNAs miR-150 and -155 result in severe defects in B-cell and T-cell differentiation, respectively (Rodriguez et al. 2007; Thai et al. 2007; Xiao et al. 2007). Deletion of Dicer from mouse tissues results in catastrophic abnormalities in all cases examined, again suggesting that miRNA function is critical in many tissues; however, because Dicer is required for the biogenesis of several other regulatory RNAs in non-mammalian organisms, the phenotypic consequences of Dicer loss may not be solely due to loss of miRNA expression and must be carefully interpreted. Dicer knockout mice die at the earliest stage examined, E7.5, and oocyte-specific Dicer deletion results in arrest at oocyte meiosis I, showing the necessity of Dicer activity at the earliest stages in mouse development (Bernstein et al. 2003; Murchison et al. 2007; Tang et al. 2007). Tissue-specific deletions of Dicer in the limb, lung, immune system, heart, and epidermis all show catastrophic mutant phenotypes, consistent with a requirement for Dicer function in these tissues (Harfe et al. 2005; Andl et al. 2006; Harris et al. 2006; Yi et al. 2006; Zhao et al. 2007b). To conclude, there is a role for miRNA-mediated gene regulation in a large number of biological processes. Not discussed here are documented roles for miRNAs in a range of biology, including apoptosis, metabolism, cell division, metastasis, local translation at synapses, and management of circadian rhythms (Kloosterman and Plasterk 2006; Cheng et al. 2007; Wu et al. 2007; Xu et al. 2007).
Additionally, it is possible that miRNAs have a generalized role in regulating gene expression during stress (Leung and Sharp 2007). Recent observations show that Ago2 and the RISC component FXR1 are curiously required for the up-regulation of TNFα protein in human cells after serum starvation (Vasudevan and Steitz 2007); whether or not this up-regulation is miRNA-dependent is currently unclear.

C. elegans antisense siRNAs

In the nematode C. elegans, the expression of short RNAs antisense to protein-coding mRNAs is thought to modulate protein-coding gene expression in a manner separate from miRNAs, likely by guiding direct mRNA cleavage (Ambros et al. 2003). Several proteins are implicated in the biogenesis of these endogenous siRNAs, including Dicer, an RNA-dependent RNA polymerase (RdRP), an RNA helicase, an RNAse D homologue, a nucleotidyltransferase, and the conserved RNA phosphatase Pir-1 (Duchaine et al. 2006; Lee et al. 2006a; Sijen et al. 2007). The mechanistic details of C. elegans siRNA biogenesis remain unclear. Potentially, target mRNAs serve as templates for an RdRP to generate double-stranded RNA species with 5' tri- or di-phosphates. Presumably, these phosphates need to be removed before Dicer processing, as pir-1 phosphatase mutants accumulate long RNAs anti-sense to target transcripts (Duchaine et al. 2006). Dicer processing then likely generates primary siRNAs that are low in abundance and serve as guides to initiate a second round of RdRP synthesis, this time resulting in abundant short 21-27 nt siRNAs that likely function to silence complementary mRNAs (Ruby et al. 2006; Sijen et al. 2007). C. elegans siRNAs have 5' tri- or di-phosphates, different from the 5' monophosphates of miRNAs. Unlike miRNAs, most C.
elegans siRNAs do not serve as substrates for T4 RNA ligase in vitro, which requires a 5' mono-phosphate; however, they do show a shift in mobility after treatment with alkaline phosphatase, indicating the presence of at least one terminal 5' phosphate (Pak and Fire 2007; Sijen et al. 2007). C. elegans siRNAs also serve as substrates for in vitro capping reactions that require 5' tri- or di-phosphates, and exhibit gel mobility patterns that mimic the mobility of synthetic 5' tri- and di-phosphorylated RNAs, strongly suggesting they are marked with 5' tri- or di-phosphates (Ruby et al. 2006; Pak and Fire 2007; Sijen et al. 2007). The 5' end modification of C. elegans siRNAs is noteworthy because it greatly reduces endogenous siRNA sequencing frequency in short cDNA libraries that have been prepared by selecting for the canonical 5' and 3' end modifications of animal miRNAs, 5' monophosphates and 3' hydroxyls. Ruby and colleagues, selecting for short RNAs with 5' monophosphates and 3' hydroxyls, found miRNAs to be 100-fold more abundant than anti-sense siRNAs in mixed-stage C. elegans (Ruby et al. 2006). In contrast, using a cDNA library preparation method that was independent of 5' modification, Ambros and colleagues found miRNAs and siRNAs to be approximately equal in abundance (Ambros et al. 2003). It is currently unclear whether other organisms express endogenous short RNAs with similarly modified 5' termini; however, these studies set the precedent for short RNA species eluding discovery because of end modifications incompatible with cDNA library preparation methods.

RNAi in the control of heterochromatin and transposable elements

RNAi has a conserved role in the silencing of transposable elements and the establishment of heterochromatin at repetitive loci in eukaryotic genomes. These RNAi-based silencing processes are diverse, and understood in varying detail, discussed in the text below.
There are many cases in which RNAi prevents the replication of exogenous RNA viruses on a post-transcriptional level. Also, at least in plants, RNAi transcriptionally silences exogenous viruses that have integrated into the genome. In a related set of silencing mechanisms, RNAi prevents the spread of endogenous transposable elements on both transcriptional and post-transcriptional levels. These types of endogenous transposable elements represent a large portion of many eukaryotic genomes, and usually express one or several proteins that function in concert with cellular machinery to replicate. In certain cases, formation of heterochromatin around these elements is a direct consequence of the protective role of RNAi. In many cases, repeats are silenced not as an act of genomic defense but instead as a means to coordinately regulate nuclear domains for maintenance of genome structure or in response to developmental cues, suggesting exaptation of this defense pathway. Notably, despite extensive conservation of RNAi components in eukaryotes, it is unclear to what extent RNAi mediates the silencing of repetitive elements and the formation of heterochromatin in mammals.

RNAi-mediated transcriptional silencing in Schizosaccharomyces pombe

RNAi-mediated transcriptional silencing is best understood in S. pombe, which has only one member from each of three major gene families involved in RNAi. Targeted deletion of the sole Dicer (dcr1), Argonaute (ago1), or RdRP (rdp1) de-silences centromeric repeats and results in defects in mitotic chromosome segregation and telomeric clustering, indicating a role for RNAi-mediated transcriptional silencing in genomic integrity and high-order nuclear structure (Volpe et al. 2002; Hall et al. 2003; Sugiyama et al. 2005).
RNAi is also needed for heterochromatin establishment at the repetitive mating-type locus, which is a 20 kb region harboring a copy of the centromeric repeat cenH flanked by inverted repeats that serve as boundary elements to heterochromatin formation (Hall et al. 2002; Jia et al. 2004). A detailed mechanistic model has emerged for RNAi-mediated establishment and maintenance of heterochromatin in S. pombe (Colmenares et al. 2007). Nascent transcripts generated by RNA Polymerase II (RNA Pol II) at heterochromatic loci are bound by complementary short RNAs carried in the RNA-induced transcriptional silencing, or RITS, complex (Buhler et al. 2006; Irvine et al. 2006). The trimeric RITS complex, composed of Ago1 bound to a heterochromatic siRNA, the chromodomain-containing protein Chp1, and a protein of unknown function, Tas3 (Verdel et al. 2004), then recruits an RdRP-containing complex (the RDRC) to the heterochromatic locus in a manner that requires both the catalytic cleavage activity of Ago1 and the histone 3 lysine 9 (H3K9) binding activity of Chp1 (Motamedi et al. 2004; Noma et al. 2004; Irvine et al. 2006). Heterochromatin formation and siRNA production both require the H3K9 methyltransferase Clr4, highlighting the importance of Chp1's interaction with chromatin (Noma et al. 2004). Once tethered to the nascent transcript, the RDRC creates a double-stranded RNA that is processed by Dicer to generate more heterochromatic siRNAs and start the cycle anew (Colmenares et al. 2007). Interestingly, a centromeric repeat exogenously introduced into euchromatin is sufficient to induce RNAi-mediated heterochromatin formation, suggesting either a sequence specificity to the RNAi-mediated induction of heterochromatin, or that siRNA-loaded RITS can act in trans to silence dispersed repeats (Hall et al. 2002).

RNAi-mediated viral and transcriptional silencing in plants

The RNAi pathway has a significant role in plant antiviral immunity.
Many plant viruses encode single- or double-stranded RNA genomes that are recognized by the RNAi machinery as deleterious and serve as substrates for the generation and amplification of targeting siRNAs. In this process, the viral genome is converted into double-stranded RNA via an RdRP and subsequently converted into siRNAs by a plant Dicer, after which the viral siRNAs are incorporated into a viral-targeting RISC that likely contains plant AGO1 at its core (Xie and Guo 2006; Zhang et al. 2006). An interesting aspect of this silencing mechanism is that viral siRNAs are not only present in the infected cells, but spread throughout the plant and protect distal portions of the plant from subsequent viral infections (Hamilton et al. 2002). Accordingly, plants with mutations in various Dicers (dcl2 and dcl4) and RdRPs (rdr1 and rdr6) are highly susceptible to local and systemic viral infection (Mourrain et al. 2000; Deleris et al. 2006). Emphasizing its protective importance, potentially all plant RNA viruses have evolved to express proteins that are potent inhibitors of the plant RNAi antiviral response (Voinnet 2005). The P19 protein from the Tombus family of viruses is the best-characterized viral inhibitor of RNAi (Scholthof 2006). P19 functions by binding and sequestering viral-targeting siRNAs that in their free form would be incorporated into a plant RISC. X-ray crystallographic and biochemical studies show that head-to-head dimers of P19 bind siRNA duplexes with low nanomolar affinity (Vargason et al. 2003; Ye et al. 2003; Lakatos et al. 2004). Many viral inhibitors of RNAi are proposed to function similarly (Voinnet 2005); still others inhibit different steps of the RNAi pathway, as demonstrated by the 2b protein from the Cucumber mosaic virus, which inhibits plant AGO1 (Zhang et al. 2006).

RNAi also plays a significant role in the establishment of heterochromatin surrounding repetitive elements in plant genomes.
One branch of this pathway mediates the formation of heterochromatin around invading viruses and exogenously introduced transgenes that have integrated into plant genomes, and has genetic requirements similar to those described above for post-transcriptional silencing of viruses, namely dcl2, rdr1, and rdr6 (Dalmay et al. 2000; Fagard et al. 2000; Mourrain et al. 2000). The other branch serves to silence endogenous repetitive elements, and requires separate RNAi paralogues to function. Certain classes of transposons and ribosomal DNA loci are de-silenced in mutant strains of Dicer (dcl3), Argonaute (ago4), and RdRP (rdr2). This de-silencing is accompanied by disappearance of short RNAs corresponding to the repeats, and a reduction in H3K9 and DNA methylation at the repetitive loci (Lippman et al. 2003; Zilberman et al. 2003; Chan et al. 2004; Xie et al. 2004). Synthesis and subsequent loading of heterochromatic siRNAs into silencing complexes also depends on RNA Polymerase IV, a DNA-dependent RNA polymerase that localizes with Ago4 in nuclear Cajal bodies (Onodera et al. 2005; Li et al. 2006; Zhang et al. 2007).

RNAi-mediated silencing of transposons and transgenes in C. elegans

RNAi is required for the suppression of transposon replication and the transcriptional silencing of transgene arrays in the C. elegans germline and soma. Genetic screens first uncovered a role for RNAi in these processes, showing that the RNaseD homologue mut-7 and the Argonaute-like protein ppw-2 are required for germline suppression of transposition in C. elegans (Ketting et al. 1999; Vastenhouw et al. 2003). Subsequent work showed that many genes required for the suppression of transposition are also required for co-suppression, the process by which high-copy transgenes can induce the silencing of related endogenous genes in trans (Ketting and Plasterk 2000).
Candidate-based RNAi screens uncovered additional genes required for germline co-suppression, including many chromatin modifiers, suggesting that co-suppression is at least partly a transcriptional silencing process (Robert et al. 2005). Transcriptional silencing of transgene arrays in the C. elegans soma requires a different set of RNAi paralogues, including Dicer (dcr-1), the double-stranded RNA binding protein rde-4, the Argonaute protein rde-1, and the RdRP rrf-1 (Grishok et al. 2005).

RNAi-mediated transcriptional silencing, transposon control, and viral defense in D. melanogaster

Transcriptional and post-transcriptional control of endogenous repeats by RNAi

D. melanogaster RNAi pathway components are required for the transcriptional and post-transcriptional control of repetitive elements. Post-transcriptional silencing of the tandemly repeated Stellate genes and other classes of retrotransposons requires at least one of two genes from the Piwi subfamily of Argonaute proteins, Aubergine and Piwi, and two DExH-box helicases also involved in RNAi, Spindle-E and Armitage (Aravin et al. 2001; Vagin et al. 2006). The silencing of transgene arrays at the transcriptional and post-transcriptional levels has similar genetic requirements (Pal-Bhadra et al. 2002; Pal-Bhadra et al. 2004). Moreover, piwi, aubergine, and spindle-E mutant flies show genome-wide defects in heterochromatin, including reduction in H3K9 methylation levels and delocalization of the heterochromatin binding protein, HP1 (Pal-Bhadra et al. 2004). Curiously, a repetitive locus associated with the telomere on the right arm of D. melanogaster chromosome 3, the 3R-TAS locus, is constitutively euchromatic; however, in piwi mutants 3R-TAS becomes heterochromatinized despite a genome-wide decrease in H3K9 marks, suggesting that Piwi may also inhibit the spread of heterochromatin in addition to nucleating its formation (Yin and Lin 2007).
RNAi components are also required for the nuclear clustering of Polycomb genes in D. melanogaster (Grimaud et al. 2006). Using transgenic flies carrying multiple copies of the Polycomb response element Fab-7 integrated at different genomic locations, Grimaud and colleagues showed that Piwi, Ago1, and Dicer-2 (Dcr-2) frequently colocalize with nuclear clusters of Fab-7 (Grimaud et al. 2006). Further, piwi, ago1, and dcr-2 mutant flies still maintained silencing but could no longer organize Fab-7 repeats into punctate nuclear foci, indicating a role for RNAi in the higher-order nuclear organization but not the silencing of Polycomb response elements (Grimaud et al. 2006).

Sequence analysis of short RNAs associated with Piwi subfamily proteins (or piRNAs, for piwi-interacting RNAs) reveals several interesting characteristics (Brennecke et al. 2007; Gunawardane et al. 2007). D. melanogaster piRNAs are on average longer than miRNAs, 23-27 nt compared to ~22 nt. Though the majority are complementary to highly repetitive elements, analysis of those with unique genomic locations reveals that piRNAs are generated from large, discretely located clusters that harbor diverse classes of transposons (Brennecke et al. 2007). Strikingly, piRNAs associated with different Piwi subfamily members show characteristic first and tenth nucleotide biases, such that the large majority of Piwi- and Aubergine-associated RNAs begin with 'U' residues, while Ago3-associated RNAs frequently contain 'A' residues at their tenth nucleotide (Brennecke et al. 2007; Gunawardane et al. 2007). Even more surprising is the staggered overlap of piRNAs associated with different Piwi subfamily members. It was observed that a large number of Ago3 piRNAs were complementary to the first 10 bases of piRNAs associated with Aubergine and Piwi (Brennecke et al. 2007; Gunawardane et al. 2007).
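The overlap relationship described here can be made concrete with a small sketch. It assumes RNA sequences written 5' to 3' as plain strings; the function names and the two example sequences are invented for illustration and do not come from the cited datasets. An Ago3-associated piRNA is treated as overlapping an Aubergine- or Piwi-associated piRNA when its first 10 bases are the reverse complement of the partner's first 10 bases, in which case a 5' 'U' on one strand pairs with an 'A' at position 10 of the other.

```python
# Illustrative sketch only: names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def has_ten_base_overlap(aub_pirna: str, ago3_pirna: str, overlap: int = 10) -> bool:
    """True if the first `overlap` bases of the two piRNAs pair with each other."""
    return reverse_complement(aub_pirna[:overlap]) == ago3_pirna[:overlap]


# Invented example pair: a 25 nt piRNA beginning with 'U' and a 24 nt partner.
aub_example = "UGACCGUAAGCUUAGGCAUCGAUCC"
ago3_example = "CUUACGGUCAGGAUUCGCAAUGCC"

overlaps = has_ten_base_overlap(aub_example, ago3_example)
first_base_aub = aub_example[0]     # first nucleotide of the Aubergine/Piwi-type piRNA
tenth_base_ago3 = ago3_example[9]   # tenth nucleotide of the Ago3-type piRNA
```

In this toy pair the 5' 'U' of the first sequence pairs with the 'A' at position 10 of the second, which is the complementarity that the first and tenth nucleotide biases reflect.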
Together with the observations that (1) the first and tenth nucleotides of Aubergine/Piwi- and Ago3-associated RNAs are complementary ('U' and 'A', respectively), (2) Piwi proteins are capable of cleaving complementary transcripts, and (3) piRNAs appear to exist in the absence of Dicer, the data suggest a model in which piRNAs are predominantly generated by other piRNA-containing protein complexes rather than by Dicer cleavage of long, double-stranded RNA (Vagin et al. 2006; Brennecke et al. 2007; Gunawardane et al. 2007). Considering this model, one major puzzle is how this putative piRNA-induced biosynthetic loop is initiated.

RNAi-mediated antiviral immunity in D. melanogaster

D. melanogaster antiviral immunity depends on the RNAi pathway, analogous to the antiviral role of RNAi in plants. Mutations in Dicer-2, the Dicer-2 binding protein R2D2, or Ago2 render flies hypersusceptible to infection and lethality by various exogenous viruses (Galiana-Arnoux et al. 2006; van Rij et al. 2006; Wang et al. 2006b). At least two Drosophila viruses encode proteins, necessary for successful infection, that function by potently suppressing Dicer-2/R2D2 processing of double-stranded RNA, further evidence that D. melanogaster RNAi participates in antiviral immunity (Li et al. 2002; Galiana-Arnoux et al. 2006; van Rij et al. 2006).

Mammalian RNAi and repetitive elements

As described in the preceding sections of Chapter 1, repetitive regions of many eukaryotic genomes are frequently maintained in a silent state via the RNAi pathway. Though mammalian repeating elements are frequently associated with heterochromatin (Thurman et al. 2007), the extent to which mammalian RNAi is involved in heterochromatization of repetitive elements is currently unclear. Mammals do encode Piwi subfamily proteins that share characteristics with their Drosophila homologues, potentially indicating the existence of a germline RNAi pathway to repress the expression of mammalian repetitive elements.
Also, studies examining aspects of early mouse development potentially implicate the RNAi pathway in the silencing of mammalian repeats outside of the germline.

Mammalian piRNAs

Like their Drosophila homologues, mammalian Piwi subfamily proteins associate with germ cell-specific RNAs termed piRNAs. These RNAs are 29-30 nt long and a large majority have 5' 'U' residues (Aravin et al. 2006; Girard et al. 2006; Lau et al. 2006). Unlike Drosophila piRNAs, mammalian piRNAs are generally not repetitive and frequently map uniquely to the genome (Aravin et al. 2006; Girard et al. 2006; Lau et al. 2006). piRNAs are produced from genomic clusters spanning approximately 20-100 kb, and exhibit a striking strand bias within these clusters, such that sequences associated with a particular Piwi protein almost never overlap in polarity (Aravin et al. 2006; Girard et al. 2006; Lau et al. 2006). This polarity bias is consistent with a mammalian piRNA biosynthetic pathway similar to that proposed in Drosophila, where different Piwi paralogues associate with partially complementary sequences and appear to synergistically synthesize piRNAs (Brennecke et al. 2007; Gunawardane et al. 2007). piRNA sequences are not conserved between mouse, rat, and human; however, syntenic genomic regions give rise to piRNA clusters in these three organisms, suggesting that genomic location rather than sequence may be important for piRNA function (Aravin et al. 2006; Girard et al. 2006; Lau et al. 2006).

Gene ablation in mice shows a role for Piwi proteins in spermatogenesis and implicates them in the germline silencing of repetitive elements. There are four Piwi proteins in mice: Miwi, Miwi2, Mili, and PiwiL3. PiwiL3 has not been studied in any context. Male mice lacking Miwi, Mili, or Miwi2 are infertile and show various defects in spermatogenesis (Deng and Lin 2002; Kuramochi-Miyagawa et al. 2004; Carmell et al. 2007).
Additionally, Mili and Miwi2 null mice show reduced DNA methylation at repetitive elements, implicating these genes in a germline silencing pathway that methylates repetitive elements (Aravin et al. 2007; Carmell et al. 2007). While consistent with Piwi function in Drosophila, the proposed role of Miwi2 and Mili in the silencing of repeats is at odds with sequence data showing that Mili associates predominantly with short RNAs that are not repetitive. It would seem possible, however, that the minority of Mili-associated piRNAs that are repetitive could guide silencing in trans. Alternatively, the observed DNA methylation defects in the germlines of Mili and Miwi2 null mice may be an indirect consequence of Piwi protein loss.

Proposed roles for Dicer in the silencing of endogenous repeats

Studies of cells deleted or hypomorphic for Dicer indicate a potential role for mammalian RNAi in endogenous repeat silencing outside of the germline. DNA derived from the long interspersed nuclear element, or LINE, makes up approximately 20% of the mouse and human genomes (Lander et al. 2001; Waterston et al. 2002). Full-length LINE repeats are ~6 kb in length and encode a reverse transcriptase and a chaperone protein that replicate the LINE genome after it has been transcribed by cellular RNA polymerase II. Human cells treated with siRNA to reduce Dicer levels show a mild increase in the frequency of LINE retrotransposition, suggesting that RNAi may repress LINE replication (Yang and Kazazian 2006). Knockdown of Dicer via RNAi in one-cell mouse embryos results in a ~50% increase in steady-state levels of different classes of endogenous long terminal repeat (LTR) retrotransposons, which have a different genome structure but replicate via mechanisms similar to those of LINE repeats (Svoboda et al. 2004). LTR elements are also abundant repeats, representing approximately 10% of the human and mouse genomes (Lander et al. 2001; Waterston et al. 2002).
Similar to observations from early mouse embryos, Dicer knockout mouse oocytes accumulate retrotransposon RNA, and display elevated levels of transcripts containing specific classes of repeats (Murchison et al. 2007). Finally, Dicer knockout mouse ES cell lines have been reported to show increases in steady-state levels of centromeric repeat transcripts (Kanellopoulou et al. 2005; Murchison et al. 2005). All of these studies implicate Dicer in the control of repetitive elements in mammalian genomes, but it is currently unclear whether or not these effects are direct. Importantly, it has not yet been shown that mammalian Dicer generates the putative repeat-derived siRNAs that have been proposed to mediate the above-mentioned repressive effects.

A brief introduction to mouse ES cells

RNAi is essential for normal function in all mammalian tissue types examined, yet in the majority of these cases, the essential regulatory functions performed by RNAi are unclear. The functions of specific animal miRNAs have been difficult to determine, not only because they require so little sequence complementarity to influence target gene expression, but also because they likely each affect several target genes, making it difficult to computationally determine functionally relevant targets within specific tissues. RNAi also functions through several different types of silencing molecules, affecting gene expression in diverse ways. In mammals, it is unclear how RNAi influences gene expression other than through miRNA-mediated silencing pathways. For example, it seems likely that mammalian piRNAs mediate silencing processes separate from miRNAs, but what these silencing processes are is unclear. The work described in this thesis focuses on defining the roles of RNAi in mouse ES cells.
ES cells have a number of interesting properties relevant to both medicine and the biology of the early embryo; thus an understanding of RNAi-mediated gene regulation in ES cells will likely have broad applications. ES cells are cultured derivatives of the pre-implantation inner cell mass (ICM) of the blastocyst. The ICM is composed of the progenitor cells that will eventually give rise to a fully developed embryo (Niwa 2007). At the developmental stage from which ES cells are derived, the ICM is in an undifferentiated, epigenetically plastic state; genome-wide DNA methylation levels are a fraction of what they will be in differentiated cells (Kafri et al. 1992; Rougier et al. 1998), and in female ICMs, paternal X chromosomes are re-activated to allow random inactivation in the epiblast (Mak et al. 2004). ES cells retain several characteristics of the ICM from which they are derived; most notably, they exhibit a large degree of epigenetic plasticity and are pluripotent. ES cells can survive with two active X chromosomes, and in the complete absence of DNA methylation, highlighting their epigenetic plasticity (Rastan and Robertson 1985; Lei et al. 1996; Okano et al. 1999). A direct consequence of this plasticity is ES cell pluripotency, defined as the ability of ES cells to give rise to all tissues in a fully developed embryo (Beddington and Robertson 1989). ES cell pluripotency can be maintained for extended periods in culture, and under appropriate conditions, ES cells can be differentiated into a number of cell types in vitro, raising the possibility that human ES cells may someday be used as tissue sources in regenerative therapies (Pera and Trounson 2004; Keller 2005). A number of factors have been implicated in the maintenance of ES cell pluripotency. The cytokine LIF (for leukemia inhibitory factor) activates a STAT3-dependent transcriptional program that is important for maintenance of pluripotency (Smith et al. 1988; Williams et al. 1988; Niwa et al. 1998).
Also, Smad-dependent induction of the Id (for inhibitor of differentiation) genes by BMP4 is critical for maintenance of ES cell pluripotency (Ying et al. 2003). Together, the LIF and BMP4 signaling molecules are sufficient for prolonged cell culture maintenance of ES cell pluripotency in the absence of serum (Ying et al. 2003). The transcription factors Oct4, Sox2, and Nanog are additional requirements for the maintenance of ES cell pluripotency (Nichols et al. 1998; Chambers et al. 2003; Mitsui et al. 2003; Masui et al. 2007). These three transcription factors frequently colocalize at the promoters of their target genes (Boyer et al. 2005). Oct4/Sox2/Nanog-bound genes can be broadly grouped into two classes: genes that are transcriptionally active and likely contribute to ES cell identity, and genes that are transcriptionally silent. The silent class of Oct4/Sox2/Nanog-bound genes is also bound by the Polycomb complex, and is highly enriched in developmental regulator genes whose ES cell expression would likely lead to differentiation (Boyer et al. 2006; Lee et al. 2006b). RNAi also has a role in the maintenance of ES cell identity. Unlike many differentiated cell types, ES cells can survive deletion of Dicer; however, despite appearing morphologically normal and expressing wild-type levels of the pluripotency markers Oct4 and Nanog, Dicer null ES cells are no longer pluripotent (Kanellopoulou et al. 2005). Given the demonstrated role of miRNAs as essential regulators of cell-fate specification, this loss of pluripotency is likely not due entirely to a change in ES cell state. Rather, it may partly be due to the inability of putative differentiated precursors to fully differentiate without the presence of additional non-ES cell miRNAs. Nevertheless, ES cells express a set of miRNAs specific to early developmental lineages that are likely important for ES cell identity (Houbaviy et al. 2003; Houbaviy et al. 2005; Tang et al. 2007).
Dicer null ES cells also display proliferation defects (Murchison et al. 2005) (see also Chapter 5 of this thesis), again consistent with a cell-autonomous role for RNAi in the maintenance of ES cell identity.

References

Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Andl, T., Murchison, E.P., Liu, F., Zhang, Y., Yunta-Gonzalez, M., Tobias, J.W., Andl, C.D., Seykora, J.T., Hannon, G.J., and Millar, S.E. 2006. The miRNA-processing enzyme dicer is essential for the morphogenesis and maintenance of hair follicles. Curr Biol 16(10): 1041-1049.
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T. et al. 2006. A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442(7099): 203-207.
Aravin, A.A., Naumova, N.M., Tulin, A.V., Vagin, V.V., Rozovsky, Y.M., and Gvozdev, V.A. 2001. Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in the D. melanogaster germline. Curr Biol 11(13): 1017-1027.
Aravin, A.A., Sachidanandam, R., Girard, A., Fejes-Toth, K., and Hannon, G.J. 2007. Developmentally regulated piRNA clusters implicate MILI in transposon control. Science 316(5825): 744-747.
Bagga, S., Bracht, J., Hunter, S., Massirer, K., Holtz, J., Eachus, R., and Pasquinelli, A.E. 2005. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122(4): 553-563.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Baskerville, S. and Bartel, D.P. 2005. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11(3): 241-247.
Beddington, R.S. and Robertson, E.J. 1989. An assessment of the developmental potential of embryonic stem cells in the midgestation mouse embryo. Development 105(4): 733-737.
Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills, A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. 2003. Dicer is essential for mouse development. Nat Genet 35(3): 215-217.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G. et al. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122(6): 947-956.
Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine, S.S., Wernig, M., Tajonar, A., Ray, M.K. et al. 2006. Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441(7091): 349-353.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and Hannon, G.J. 2007. Discrete Small RNA-Generating Loci as Master Regulators of Transposon Activity in Drosophila. Cell.
Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. 2005. Principles of microRNA-target recognition. PLoS Biol 3(3): e85.
Buhler, M., Verdel, A., and Moazed, D. 2006. Tethering RITS to a nascent transcript initiates RNAi- and heterochromatin-dependent gene silencing. Cell 125(5): 873-886.
Cai, X., Hagedorn, C.H., and Cullen, B.R. 2004. Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10(12): 1957-1966.
Carmell, M.A., Girard, A., van de Kant, H.J., Bourc'his, D., Bestor, T.H., de Rooij, D.G., and Hannon, G.J. 2007. MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germline. Dev Cell 12(4): 503-514.
Carmell, M.A., Xuan, Z., Zhang, M.Q., and Hannon, G.J. 2002. The Argonaute family: tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis. Genes Dev 16(21): 2733-2742.
Chalfie, M., Horvitz, H.R., and Sulston, J.E. 1981. Mutations that lead to reiterations in the cell lineages of C. elegans. Cell 24(1): 59-69.
Chambers, I., Colby, D., Robertson, M., Nichols, J., Lee, S., Tweedie, S., and Smith, A. 2003. Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 113(5): 643-655.
Chan, S.W., Zilberman, D., Xie, Z., Johansen, L.K., Carrington, J.C., and Jacobsen, S.E. 2004. RNA silencing genes control de novo DNA methylation. Science 303(5662): 1336.
Chendrimada, T.P., Finn, K.J., Ji, X., Baillat, D., Gregory, R.I., Liebhaber, S.A., Pasquinelli, A.E., and Shiekhattar, R. 2007. MicroRNA silencing through RISC recruitment of eIF6. Nature 447(7146): 823-828.
Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura, K., and Shiekhattar, R. 2005. TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436(7051): 740-744.
Cheng, H.Y., Papp, J.W., Varlamova, O., Dziema, H., Russell, B., Curfman, J.P., Nakazawa, T., Shimizu, K., Okamura, H., Impey, S. et al. 2007. microRNA modulation of circadian-clock period and entrainment. Neuron 54(5): 813-829.
Colmenares, S.U., Buker, S.M., Buhler, M., Dlakic, M., and Moazed, D. 2007. Coupling of double-stranded RNA synthesis and siRNA generation in fission yeast RNAi. Mol Cell 27(3): 449-461.
Dalmay, T., Hamilton, A., Rudd, S., Angell, S., and Baulcombe, D.C. 2000. An RNA-dependent RNA polymerase gene in Arabidopsis is required for posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell 101(5): 543-553.
Deleris, A., Gallego-Bartolome, J., Bao, J., Kasschau, K.D., Carrington, J.C., and Voinnet, O. 2006. Hierarchical action and inhibition of plant Dicer-like proteins in antiviral defense. Science 313(5783): 68-71.
Deng, W. and Lin, H. 2002. miwi, a murine homolog of piwi, encodes a cytoplasmic protein essential for spermatogenesis. Dev Cell 2(6): 819-830.
Doench, J.G., Petersen, C.P., and Sharp, P.A. 2003. siRNAs can function as miRNAs. Genes Dev 17(4): 438-442.
Doench, J.G. and Sharp, P.A. 2004.
Specificity of microRNA target selection in translational repression. Genes Dev 18(5): 504-511.
Duchaine, T.F., Wohlschlegel, J.A., Kennedy, S., Bei, Y., Conte, D., Jr., Pang, K., Brownell, D.R., Harding, S., Mitani, S., Ruvkun, G. et al. 2006. Functional proteomics reveals the biochemical niche of C. elegans DCR-1 in multiple small-RNA-mediated pathways. Cell 124(2): 343-354.
Fagard, M., Boutet, S., Morel, J.B., Bellini, C., and Vaucheret, H. 2000. AGO1, QDE-2, and RDE-1 are related proteins required for post-transcriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci USA 97(21): 11650-11654.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. 2005. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310(5755): 1817-1821.
Forstemann, K., Horwich, M.D., Wee, L., Tomari, Y., and Zamore, P.D. 2007. Drosophila microRNAs are sorted into functionally distinct argonaute complexes after production by dicer-1. Cell 130(2): 287-297.
Galiana-Arnoux, D., Dostert, C., Schneemann, A., Hoffmann, J.A., and Imler, J.L. 2006. Essential function in vivo for Dicer-2 in host defense against RNA viruses in drosophila. Nat Immunol 7(6): 590-597.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. 2006. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312(5770): 75-79.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442(7099): 199-202.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue): D109-111.
Grimaud, C., Bantignies, F., Pal-Bhadra, M., Ghana, P., Bhadra, U., and Cavalli, G. 2006. RNAi components are required for nuclear clustering of Polycomb group response elements. Cell 124(5): 957-971.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27(1): 91-105.
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. 2001. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106(1): 23-34.
Grishok, A., Sinskey, J.L., and Sharp, P.A. 2005. Transcriptional silencing of a transgene by RNAi in the soma of C. elegans. Genes Dev 19(6): 683-696.
Gunawardane, L.S., Saito, K., Nishida, K.M., Miyoshi, K., Kawamura, Y., Nagami, T., Siomi, H., and Siomi, M.C. 2007. A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315(5818): 1587-1590.
Hall, I.M., Noma, K., and Grewal, S.I.S. 2003. RNA interference machinery regulates chromosome dynamics during mitosis and meiosis in fission yeast. Proc Natl Acad Sci USA 100(1): 193-198.
Hall, I.M., Shankaranarayana, G.D., Noma, K., Ayoub, N., Cohen, A., and Grewal, S.I. 2002. Establishment and maintenance of a heterochromatin domain. Science 297(5590): 2232-2237.
Hamilton, A., Voinnet, O., Chappell, L., and Baulcombe, D. 2002. Two classes of short interfering RNA in RNA silencing. EMBO J 21(17): 4671-4679.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
Harris, K.S., Zhang, Z., McManus, M.T., Harfe, B.D., and Sun, X. 2006. Dicer function is essential for lung epithelium morphogenesis.
Proc Natl Acad Sci USA 103(7): 2208-2213.
Houbaviy, H.B., Dennis, L., Jaenisch, R., and Sharp, P.A. 2005. Characterization of a highly variable eutherian microRNA gene. RNA 11(8): 1245-1257.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Dev Cell 5(2): 351-358.
Humphreys, D.T., Westman, B.J., Martin, D.I., and Preiss, T. 2005. MicroRNAs control translation initiation by inhibiting eukaryotic initiation factor 4E/cap and poly(A) tail function. Proc Natl Acad Sci USA 102(47): 16961-16966.
Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. 2001. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293(5531): 834-838.
Hutvagner, G. and Zamore, P.D. 2002. A microRNA in a multiple-turnover RNAi enzyme complex. Science 297(5589): 2056-2060.
Irvine, D.V., Zaratiegui, M., Tolia, N.H., Goto, D.B., Chitwood, D.H., Vaughn, M.W., Joshua-Tor, L., and Martienssen, R.A. 2006. Argonaute slicing is required for heterochromatic silencing and spreading. Science 313(5790): 1134-1137.
Jakymiw, A., Lian, S., Eystathioy, T., Li, S., Satoh, M., Hamel, J.C., Fritzler, M.J., and Chan, E.K. 2005. Disruption of GW bodies impairs mammalian RNA interference. Nat Cell Biol 7(12): 1267-1274.
Jia, S., Noma, K., and Grewal, S.I. 2004. RNAi-independent heterochromatin nucleation by the stress-activated ATF/CREB family proteins. Science 304(5679): 1971-1976.
Kafri, T., Ariel, M., Brandeis, M., Shemer, R., Urven, L., McCarrey, J., Cedar, H., and Razin, A. 1992. Developmental pattern of gene-specific DNA methylation in the mouse embryo and germ line. Genes Dev 6(5): 705-714.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev 19(4): 489-501.
Karres, J.S., Hilgers, V., Carrera, I., Treisman, J., and Cohen, S.M. 2007. The conserved microRNA miR-8 tunes atrophin levels to prevent neurodegeneration in Drosophila. Cell 131(1): 136-145.
Keller, G. 2005. Embryonic stem cell differentiation: emergence of a new era in biology and medicine. Genes Dev 19(10): 1129-1155.
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. 2007. The role of site accessibility in microRNA target recognition. Nat Genet 39(10): 1278-1284.
Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. 2001. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15(20): 2654-2659.
Ketting, R.F., Haverkamp, T.H., van Luenen, H.G., and Plasterk, R.H. 1999. Mut-7 of C. elegans, required for transposon silencing and RNA interference, is a homolog of Werner syndrome helicase and RNaseD. Cell 99(2): 133-141.
Ketting, R.F. and Plasterk, R.H. 2000. A genetic link between co-suppression and RNA interference in C. elegans. Nature 404(6775): 296-298.
Khvorova, A., Reynolds, A., and Jayasena, S.D. 2003. Functional siRNAs and miRNAs exhibit strand bias. Cell 115(2): 209-216.
Kiriakidou, M., Tan, G.S., Lamprinaki, S., De Planell-Saguer, M., Nelson, P.T., and Mourelatos, Z. 2007. An mRNA m7G cap binding-like motif within human Ago2 represses translation. Cell 129(6): 1141-1151.
Kloosterman, W.P. and Plasterk, R.H. 2006. The diverse functions of microRNAs in animal development and disease. Dev Cell 11(4): 441-450.
Kumar, M.S., Lu, J., Mercer, K.L., Golub, T.R., and Jacks, T. 2007. Impaired microRNA processing enhances cellular transformation and tumorigenesis. Nat Genet 39(5): 673-677.
Kuramochi-Miyagawa, S., Kimura, T., Ijiri, T.W., Isobe, T., Asada, N., Fujita, Y., Ikawa, M., Iwai, N., Okabe, M., Deng, W. et al. 2004. Mili, a mammalian member of piwi family gene, is essential for spermatogenesis. Development 131(4): 839-849.
Lakatos, L., Szittya, G., Silhavy, D., and Burgyan, J. 2004. Molecular mechanism of RNA silencing suppression mediated by p19 protein of tombusviruses. Embo J 23(4): 876-884. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921. Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes. Science 313(5785): 363-367. Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5): 843-854. Lee, R.C., Hammell, C.M., and Ambros, V. 2006a. Interacting endogenous and exogenous RNAi pathways in Caenorhabditis elegans. Rna 12(4): 589-597. Lee, T.I., Jenner, R.G., Boyer, L.A., Guenther, M.G., Levine, S.S., Kumar, R.M., Chevalier, B., Johnstone, S.E., Cole, M.F., Isono, K. et al. 2006b. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125(2): 301-313. Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S. et al. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425(6956): 415-419. Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. 2004. MicroRNA genes are transcribed by RNA polymerase II. Embo J 23(20): 4051-4060. Lei, H., Oh, S.P., Okano, M., Juttermann, R., Goss, K.A., Jaenisch, R., and Li, E. 1996. De novo DNA cytosine methyltransferase activities in mouse embryonic stem cells. Development 122(10): 3195-3205. Leung, A.K. and Sharp, P.A. 2007. microRNAs: a safeguard against turmoil? Cell 130(4): 581-585. Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): 15-20. 
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. 2003. Prediction of mammalian microRNA targets. Cell 115(7): 787-798. Li, C.F., Pontes, O., El-Shami, M., Henderson, I.R., Bernatavichute, Y.V., Chan, S.W., Lagrange, T., Pikaard, C.S., and Jacobsen, S.E. 2006. An ARGONAUTE4-containing nuclear processing center colocalized with Cajal bodies in Arabidopsis thaliana. Cell 126(1): 93-106. Li, H., Li, W.X., and Ding, S.W. 2002. Induction and suppression of RNA silencing by an animal virus. Science 296(5571): 1319-1321. Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, D.P., Linsley, P.S., and Johnson, J.M. 2005. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433(7027): 769-773. Lippman, Z., May, B., Yordan, C., Singer, T., and Martienssen, R. 2003. Distinct Mechanisms Determine Transposon Inheritance and Methylation via Small Interfering RNA and Histone Modification. PLoS Biol 1(3): E67. Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. 2004. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305(5689): 1437-1441. Liu, J., Rivas, F.V., Wohlschlegel, J., Yates, J.R., 3rd, Parker, R., and Hannon, G.J. 2005. A role for the P-body component GW182 in microRNA function. Nat Cell Biol 7(12): 1261-1266. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A. et al. 2005. MicroRNA expression profiles classify human cancers. Nature 435(7043): 834-838. Mak, W., Nesterova, T.B., de Napoles, M., Appanah, R., Yamanaka, S., Otte, A.P., and Brockdorff, N. 2004. Reactivation of the paternal X chromosome in early mouse embryos. Science 303(5658): 666-669. Maroney, P.A., Yu, Y., Fisher, J., and Nilsen, T.W. 2006. Evidence that microRNAs are associated with translating messenger RNAs in human cells. 
Nat Struct Mol Biol 13(12): 1102-1107. Martinez, J. and Tuschl, T. 2004. RISC is a 5' phosphomonoester-producing RNA endonuclease. Genes Dev 18(9): 975-980. Masui, S., Nakatake, Y., Toyooka, Y., Shimosato, D., Yagi, R., Takahashi, K., Okochi, H., Okuda, A., Matoba, R., Sharov, A.A. et al. 2007. Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat Cell Biol 9(6): 625-635. Mathonnet, G., Fabian, M.R., Svitkin, Y.V., Parsyan, A., Huck, L., Murata, T., Biffo, S., Merrick, W.C., Darzynkiewicz, E., Pillai, R.S. et al. 2007. MicroRNA inhibition of translation initiation in vitro by targeting the cap-binding complex eIF4F. Science 317(5845): 1764-1767. Meister, G., Landthaler, M., Peters, L., Chen, P.Y., Urlaub, H., Luhrmann, R., and Tuschl, T. 2005. Identification of novel argonaute-associated proteins. Curr Biol 15(23): 2149-2155. Mitsui, K., Tokuzawa, Y., Itoh, H., Segawa, K., Murakami, M., Takahashi, K., Maruyama, M., Maeda, M., and Yamanaka, S. 2003. The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell 113(5): 631-642. Molnar, A., Schwach, F., Studholme, D.J., Thuenemann, E.C., and Baulcombe, D.C. 2007. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature 447(7148): 1126-1129. Motamedi, M.R., Verdel, A., Colmenares, S.U., Gerber, S.A., Gygi, S.P., and Moazed, D. 2004. Two RNAi complexes, RITS and RDRC, physically interact and localize to noncoding centromeric RNAs. Cell 119(6): 789-802. Mourrain, P., Beclin, C., Elmayan, T., Feuerbach, F., Godon, C., Morel, J.B., Jouette, D., Lacombe, A.M., Nikic, S., Picault, N. et al. 2000. Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 101(5): 533-542. Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005. Characterization of Dicer-deficient murine embryonic stem cells. 
Proc Natl Acad Sci USA 102(34): 12135-12140. Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon, G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693. Neilson, J.R., Zheng, G.X., Burge, C.B., and Sharp, P.A. 2007. Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev 21(5): 578-589. Nichols, J., Zevnik, B., Anastassiadis, K., Niwa, H., Klewe-Nebenius, D., Chambers, I., Scholer, H., and Smith, A. 1998. Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell 95(3): 379-391. Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B. 2007. Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. Rna 13(11): 1894-1910. Niwa, H. 2007. How is pluripotency determined and maintained? Development 134(4): 635-646. Niwa, H., Burdon, T., Chambers, I., and Smith, A. 1998. Self-renewal of pluripotent embryonic stem cells is mediated via activation of STAT3. Genes Dev 12(13): 2048-2060. Noma, K., Sugiyama, T., Cam, H., Verdel, A., Zofall, M., Jia, S., Moazed, D., and Grewal, S.I. 2004. RITS acts in cis to promote RNA interference-mediated transcriptional and post-transcriptional silencing. Nat Genet 36(11): 1174-1180. Nottrott, S., Simard, M.J., and Richter, J.D. 2006. Human let-7a miRNA blocks protein production on actively translating polyribosomes. Nat Struct Mol Biol 13(12): 1108-1114. O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against genome intruders. Cell 129(1): 37-44. Obernosterer, G., Leuschner, P.J., Alenius, M., and Martinez, J. 2006. Posttranscriptional regulation of microRNA expression. Rna 12(7): 1161-1167. Okano, M., Bell, D.W., Haber, D.A., and Li, E. 1999. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99(3): 247-257. Olsen, P.H. and Ambros, V. 1999. 
The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev Biol 216(2): 671-680. Onodera, Y., Haag, J.R., Ream, T., Nunes, P.C., Pontes, O., and Pikaard, C.S. 2005. Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120(5): 613-622. Pak, J. and Fire, A. 2007. Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science 315(5809): 241-244. Pal-Bhadra, M., Bhadra, U., and Birchler, J.A. 2002. RNAi related mechanisms affect both transcriptional and posttranscriptional transgene silencing in Drosophila. Mol Cell 9(2): 315-327. Pal-Bhadra, M., Leibovitch, B.A., Gandhi, S.G., Rao, M., Bhadra, U., Birchler, J.A., and Elgin, S.C. 2004. Heterochromatic silencing and HP1 localization in Drosophila are dependent on the RNAi machinery. Science 303(5658): 669-672. Pera, M.F. and Trounson, A.O. 2004. Human embryonic stem cells: prospects for development. Development 131(22): 5515-5525. Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress translation after initiation in mammalian cells. Mol Cell 21(4): 533-542. Pillai, R.S., Artus, C.G., and Filipowicz, W. 2004. Tethering of human Ago proteins to mRNA mimics the miRNA-mediated repression of protein synthesis. Rna 10(10): 1518-1525. Pillai, R.S., Bhattacharyya, S.N., Artus, C.G., Zoller, T., Cougot, N., Basyuk, E., Bertrand, E., and Filipowicz, W. 2005. Inhibition of translational initiation by Let-7 MicroRNA in human cells. Science 309(5740): 1573-1576. Rastan, S. and Robertson, E.J. 1985. X-chromosome deletions in embryo-derived (EK) cell lines associated with lack of X-chromosome inactivation. J Embryol Exp Morphol 90: 379-388. Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. 2000. 
The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403(6772): 901-906. Robert, V.J., Sijen, T., van Wolfswinkel, J., and Plasterk, R.H. 2005. Chromatin and RNAi factors protect the C. elegans germline against repetitive sequences. Genes Dev 19(7): 782-787. Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. 2004. Identification of mammalian microRNA host genes and transcription units. Genome Res 14(10A): 1902-1910. Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A. et al. 2007. Requirement of bic/microRNA-155 for normal immune function. Science 316(5824): 608-611. Rougier, N., Bourc'his, D., Gomes, D.M., Niveleau, A., Plachot, M., Paldi, A., and Viegas-Pequignot, E. 1998. Chromosome methylation patterns during mammalian preimplantation development. Genes Dev 12(14): 2108-2113. Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207. Saini, H.K., Griffiths-Jones, S., and Enright, A.J. 2007. Genomic analysis of human microRNA transcripts. Proc Natl Acad Sci USA. Scholthof, H.B. 2006. The Tombusvirus-encoded P19: from irrelevance to elegance. Nat Rev Microbiol. Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. 2003. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115(2): 199-208. Seggerson, K., Tang, L., and Moss, E.G. 2002. Two genetic circuits repress the Caenorhabditis elegans heterochronic gene lin-28 after translation initiation. Dev Biol 243(2): 215-225. Sen, G.L. and Blau, H.M. 2005. Argonaute 2/RISC resides in sites of mammalian mRNA decay known as cytoplasmic bodies. Nat Cell Biol 7(6): 633-636. Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. 2007. 
Secondary siRNAs result from unprimed RNA synthesis and form a distinct class. Science 315(5809): 244-247. Smith, A.G., Heath, J.K., Donaldson, D.D., Wong, G.G., Moreau, J., Stahl, M., and Rogers, D. 1988. Inhibition of pluripotential embryonic stem cell differentiation by purified polypeptides. Nature 336(6200): 688-690. Song, E., Lee, S.K., Dykxhoorn, D.M., Novina, C., Zhang, D., Crawford, K., Cerny, J., Sharp, P.A., Lieberman, J., Manjunath, N. et al. 2003. Sustained small interfering RNA-mediated human immunodeficiency virus type 1 inhibition in primary macrophages. J Virol 77(13): 7174-7181. Sontheimer, E.J. 2005. Assembly and function of RNA silencing complexes. Nat Rev Mol Cell Biol 6(2): 127-138. Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. 2005. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123(6): 1133-1146. Sugiyama, T., Cam, H., Verdel, A., Moazed, D., and Grewal, S.I. 2005. RNA-dependent RNA polymerase is an essential component of a self-enforcing loop coupling heterochromatin assembly to siRNA production. Proc Natl Acad Sci USA 102(1): 152-157. Svoboda, P., Stein, P., Anger, M., Bernstein, E., Hannon, G.J., and Schultz, R.M. 2004. RNAi and expression of retrotransposons MuERV-L and IAP in preimplantation mouse embryos. Dev Biol 269(1): 276-285. Tang, F., Kaneda, M., O'Carroll, D., Hajkova, P., Barton, S.C., Sun, Y.A., Lee, C., Tarakhovsky, A., Lao, K., and Surani, M.A. 2007. Maternal microRNAs are essential for mouse zygotic development. Genes Dev 21(6): 644-648. Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A., Frendewey, D., Valenzuela, D., Kutok, J.L. et al. 2007. Regulation of the germinal center response by microRNA-155. Science 316(5824): 604-608. Thomson, J.M., Newman, M., Parker, J.S., Morin-Kensicki, E.M., Wright, T., and Hammond, S.M. 2006. 
Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20(16): 2202-2207. Thurman, R.E., Day, N., Noble, W.S., and Stamatoyannopoulos, J.A. 2007. Identification of higher-order functional domains in the human ENCODE regions. Genome Res 17(6): 917-927. Tomari, Y., Du, T., and Zamore, P.D. 2007. Sorting of Drosophila small silencing RNAs. Cell 130(2): 299-308. Vagin, V.V., Sigova, A., Li, C., Seitz, H., Gvozdev, V., and Zamore, P.D. 2006. A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313(5785): 320-324. van Rij, R.P., Saleh, M.C., Berry, B., Foo, C., Houk, A., Antoniewski, C., and Andino, R. 2006. The RNA silencing endonuclease Argonaute 2 mediates specific antiviral immunity in Drosophila melanogaster. Genes Dev 20(21): 2985-2995. Vargason, J.M., Szittya, G., Burgyan, J., and Tanaka Hall, T.M. 2003. Size selective recognition of siRNA by an RNA silencing suppressor. Cell 115(7): 799-811. Vastenhouw, N.L., Fischer, S.E., Robert, V.J., Thijssen, K.L., Fraser, A.G., Kamath, R.S., Ahringer, J., and Plasterk, R.H. 2003. A genome-wide screen identifies 27 genes involved in transposon silencing in C. elegans. Curr Biol 13(15): 1311-1316. Vasudevan, S. and Steitz, J.A. 2007. AU-rich-element-mediated upregulation of translation by FXR1 and Argonaute 2. Cell 128(6): 1105-1118. Verdel, A., Jia, S., Gerber, S., Sugiyama, T., Gygi, S., Grewal, S.I., and Moazed, D. 2004. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303(5658): 672-676. Voinnet, O. 2005. Induction and suppression of RNA silencing: insights from viral infections. Nat Rev Genet 6(3): 206-220. Volpe, T.A., Kidner, C., Hall, I.M., Teng, G., Grewal, S.I., and Martienssen, R.A. 2002. Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297(5588): 1833-1837. Wakiyama, M., Takimoto, K., Ohara, O., and Yokoyama, S. 2007. 
Let-7 microRNA-mediated mRNA deadenylation and translational repression in a mammalian cell-free system. Genes Dev 21(15): 1857-1862. Wang, B., Love, T.M., Call, M.E., Doench, J.G., and Novina, C.D. 2006a. Recapitulation of short RNA-directed translational gene silencing in vitro. Mol Cell 22(4): 553-560. Wang, X.H., Aliyari, R., Li, W.X., Li, H.W., Kim, K., Carthew, R., Atkinson, P., and Ding, S.W. 2006b. RNA interference directs innate immunity against viruses in adult Drosophila. Science 312(5772): 452-454. Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915): 520-562. Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75(5): 855-862. Williams, R.L., Hilton, D.J., Pease, S., Willson, T.A., Stewart, C.L., Gearing, D.P., Wagner, E.F., Metcalf, D., Nicola, N.A., and Gough, N.M. 1988. Myeloid leukaemia inhibitory factor maintains the developmental potential of embryonic stem cells. Nature 336(6200): 684-687. Wu, H., Neilson, J.R., Kumar, P., Manocha, M., Shankar, P., Sharp, P.A., and Manjunath, N. 2007. miRNA Profiling of Naive, Effector and Memory CD8 T Cells. PLoS ONE 2(10): e1020. Xiao, C., Calado, D.P., Galler, G., Thai, T.H., Patterson, H.C., Wang, J., Rajewsky, N., Bender, T.P., and Rajewsky, K. 2007. MiR-150 controls B cell differentiation by targeting the transcription factor c-Myb. Cell 131(1): 146-159. Xie, Q. and Guo, H.S. 2006. Systemic antiviral silencing in plants. Virus Res 118(1-2): 1-6. Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D., Jacobsen, S.E., and Carrington, J.C. 2004. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2(5): E104. 
Xu, S., Witmer, P.D., Lumayag, S., Kovacs, B., and Valle, D. 2007. MicroRNA (miRNA) transcriptome of mouse retina and identification of a sensory organ-specific miRNA cluster. J Biol Chem 282(34): 25053-25066. Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol 13(9): 763-771. Ye, K., Malinina, L., and Patel, D.J. 2003. Recognition of small interfering RNA by a viral suppressor of RNA silencing. Nature 426(6968): 874-878. Yekta, S., Shih, I.H., and Bartel, D.P. 2004. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304(5670): 594-596. Yi, R., O'Carroll, D., Pasolli, H.A., Zhang, Z., Dietrich, F.S., Tarakhovsky, A., and Fuchs, E. 2006. Morphogenesis in skin is governed by discrete sets of differentially expressed microRNAs. Nat Genet 38(3): 356-362. Yin, H. and Lin, H. 2007. An epigenetic activation role of Piwi and a Piwi-associated piRNA in Drosophila melanogaster. Nature. Ying, Q.L., Nichols, J., Chambers, I., and Smith, A. 2003. BMP induction of Id proteins suppresses differentiation and sustains embryonic stem cell self-renewal in collaboration with STAT3. Cell 115(3): 281-292. Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. Embo J 24(1): 138-148. Zhang, X., Henderson, I.R., Lu, C., Green, P.J., and Jacobsen, S.E. 2007. Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci USA 104(11): 4536-4541. Zhang, X., Yuan, Y.R., Pei, Y., Lin, S.S., Tuschl, T., Patel, D.J., and Chua, N.H. 2006. Cucumber mosaic virus-encoded 2b suppressor inhibits Arabidopsis Argonaute 1 cleavage activity to counter plant defense. Genes Dev 20(23): 3255-3268. Zhao, T., Li, G., Mi, S., Li, S., Hannon, G.J., Wang, X.J., and Qi, Y. 2007a. A complex system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii. Genes Dev 21(10): 1190-1203. 
Zhao, Y., Ransom, J.F., Li, A., Vedantham, V., von Drehle, M., Muth, A.N., Tsuchihashi, T., McManus, M.T., Schwartz, R.J., and Srivastava, D. 2007b. Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2. Cell 129(2): 303-317. Zilberman, D., Cao, X., and Jacobsen, S.E. 2003. ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299(5607): 716-719.

Chapter 2

Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse embryonic stem cells

This chapter has been presented in the context of its contemporary science, and originally appeared in RNA 12:2092-2102.

Abstract

Studies of mammalian RNA interference (RNAi) have focused largely on the actions of microRNAs; however, in other organisms, endogenous short-interfering RNAs (siRNAs) are involved in silencing processes. To date, similar molecules have been difficult to characterize in mammalian cells. P19 is a plant suppressor of RNA silencing that binds with high affinity to siRNAs. Here, the short RNAs bound by P19 in mouse embryonic stem (ES) cells have been characterized. We show that P19 selectively immunoprecipitates endogenous short RNAs from ES cells. Cloning of immunoprecipitated RNA reveals a strong selection for short RNAs that are exact matches to ribosomal RNA (rRNA), with particular short rRNA species highly enriched in P19 immunoprecipitates. Complementary strands to the enriched rRNAs were not cloned, which was surprising because P19 was previously thought to bind only siRNAs. We show that P19 binds tightly to a non-canonical dsRNA substrate composed of a short RNA annealed to a much longer partner, such that the double-stranded region between the two is 19 base pairs long. Binding to similar endogenous species might explain the association of P19 with short rRNAs in ES cells. 
Finally, we show that the P19-enriched rRNAs are not involved in canonical RNAi, as they exist in the absence of Dicer and do not function as post-transcriptional gene silencers. Our results support the previous observation that endogenous siRNAs are not abundant molecules in mouse ES cells.

Introduction

Short RNAs play a central role in eukaryotic biology by regulating gene expression through a process called RNA interference (Novina and Sharp 2004). cDNA libraries sampling the short RNA population in mammalian cells reveal predominantly the products of a conserved class of non-coding RNA genes called microRNAs (miRNAs). Mature miRNAs are processed from longer RNA precursors through sequential cleavage by the RNase III enzymes Drosha and Dicer, generating ~22-nucleotide-long species with defined 5' and 3' ends. It is thought that mammalian miRNAs primarily exert their influence on gene expression post-transcriptionally by binding with imperfect complementarity to the 3' UTR of their target mRNAs. An accumulating body of evidence suggests miRNAs are key regulators of both developmental transitions and cell-type specification (Ambros 2004; Bartel 2004; Alvarez-Garcia and Miska 2005; Farh et al. 2005; Stark et al. 2005). miRNAs are also misregulated in human cancers, potentially indicating a causal role in tumorigenesis (He et al. 2005; Lu et al. 2005; Voorhoeve et al. 2006). Almost all of the known functions of endogenous RNAi in mammals are attributed to miRNAs; in other eukaryotes this is not the case. The cloning of short RNAs from S. pombe, Arabidopsis, C. elegans, and Drosophila has identified endogenous short-interfering RNAs (siRNAs) that are encoded by high copy sequences in the genome, such as transposons and retrotransposons. Except in Drosophila, these repeat-associated siRNAs (rasiRNAs) are thought to be processed by Dicer from long, repeat-derived double-stranded RNA (dsRNA). In S. pombe, Arabidopsis, and C. 
elegans, this dsRNA precursor is generated in part by the action of an RNA-dependent RNA polymerase, while in Drosophila rasiRNA biogenesis is less clear. rasiRNAs map predominantly to heterochromatic regions of the genome, and are proposed to be the guides for an siRNA-mediated transcriptional repression complex that nucleates heterochromatin (Ambros et al. 2003; Baulcombe 2004; Lippman and Martienssen 2004; Sontheimer and Carthew 2005; Lee et al. 2006; Vagin et al. 2006). Mouse embryonic stem (ES) cells are pluripotent cells derived from the inner cell mass of the blastocyst in the midst of the epigenetic reprogramming that occurs during early development (Jaenisch 1997; Li 2002). ES cells are capable of executing well-studied epigenetic silencing processes, such as X-inactivation and silencing of Moloney leukemia viruses (Stewart et al. 1982; Cherry et al. 2000; Plath et al. 2002). These processes are similar to examples of rasiRNA-mediated silencing in other organisms. However, previous cloning efforts from ES cells did not identify repeat-associated siRNAs, potentially because they are too low in abundance compared to miRNAs and other short RNAs (Houbaviy et al. 2003). Developing methods to facilitate the identification of low abundance RNAi-specific short RNAs will therefore be necessary to more completely understand the function of RNAi in mammals. In this work, we express epitope-tagged versions of the P19 suppressor of RNA silencing in ES cells in an attempt to identify endogenous siRNAs. The P19 protein is expressed by the Tombusvirus as a counter-defense against the plant RNAi pathway that degrades RNA viruses, and functions by specifically binding and sequestering siRNAs that would otherwise mediate viral RNA destruction (Scholthof 2006). Biochemical and structural studies show that P19 binds with high affinity and specificity to siRNA-like molecules that are 5' phosphorylated with double-stranded RNA segments 19 base pairs long. 
This affinity dramatically decreases if the double-stranded RNA segment is either shorter or longer than 19 base pairs. P19 also has no measurable affinity for single-stranded RNA or double-stranded DNA (Vargason et al. 2003; Ye et al. 2003; Lakatos et al. 2004). P19 transgenic plants display developmental defects associated with loss of miRNA function, and immunoprecipitations (IPs) of P19 from these plants enrich for miRNA duplexes, showing that P19 inhibits not only virally-induced gene silencing but also other endogenous RNAi processes (Silhavy et al. 2002; Chapman et al. 2004; Dunoyer et al. 2004). Together, these observations indicate P19 may be a useful tool to identify endogenous siRNAs present in mammalian cells. Indeed, P19 has previously been shown to inhibit RNAi in human cells, suggesting that its functions carry over to mammals (Dunoyer et al. 2004; Lecellier et al. 2005).

Results

Epitope-tagged P19 binds short RNAs from ES cells.

Putative rasiRNAs involved in heterochromatin formation might be localized specifically to the nucleus. For this reason, we constructed two vectors for mammalian P19 expression; both were tagged at the C-terminus with a V5-6xHis epitope, and one contained two tandem SV40 nuclear localization sequences, increasing its nuclear concentration (hereafter referred to as P19V5 and P19NLS, respectively; Figure 1A). Immunofluorescence of transiently transfected ES cells shows P19V5 is present in both the cytoplasm and nucleus, while P19NLS is localized almost exclusively to the nucleus (Figure 1B). To test if P19 bound endogenous short RNAs from ES cells, we extracted the nucleic acids that co-immunoprecipitated with our epitope-tagged constructs and radioactively labeled the 3' end of the bound RNA with 5'-32P cytidine 3',5'-bis(phosphate). To recover both cytoplasmic and nuclear siRNA molecules, P19V5 was immunoprecipitated from whole cell extracts (WCE). 
Western blots of the WCE detected both cytoplasmic and nuclear markers, indicating the presence of proteins from both sub-cellular compartments (Figure 1C). Comparing the P19V5-bound RNA to the RNA from the WCE supernatant shows a strong enrichment for short RNAs ~20 nucleotides long in the P19V5 immunoprecipitate (Figure 1D, lane 7 vs. 8). This enrichment is P19V5-dependent, as there is no detectable RNA immunoprecipitated from cells transfected with GFP (Figure 1D, lane 2). To better enrich for putative low-abundance nuclear siRNAs, we performed an immunoprecipitation of P19NLS from a nuclear extract made under non-denaturing conditions. Western blots comparing the intensities of GAPDH and Cyclin T1 between the cytoplasmic (CE) and nuclear (NE) fractions of the extract show a decrease in GAPDH and an increase in Cyclin T1 in the nuclear as compared to the cytoplasmic extract, indicating a ~3-fold nuclear enrichment (Figure 1C). Immunoprecipitation of P19NLS from the NE enriches for short RNAs identical in length to those immunoprecipitated from the CE of the P19NLS-transfected cells, as well as the WCE of P19V5-transfected cells (Figure 1D, lanes 4, 6, and 8). We cloned the short RNAs bound by the P19 constructs in ES cell extracts using a procedure that selects for 18-26 nucleotide long RNAs that have 5' phosphates and 2' or 3' hydroxyls (Lagos-Quintana et al. 2001; Lau et al. 2001; J.R. Neilson and P.A.S., manuscript in preparation). As controls, we also cloned short RNA from the supernatants of the P19-transfected cell extracts. In each case, roughly 300 independent clones were analyzed (Table 1). Sequences were initially annotated as known non-coding RNAs (ncRNAs) if they had an exact match to any annotated non-protein coding RNA in the Rfam and NONCODE RNA databases, including rRNAs, tRNAs, snRNAs, snoRNAs, and miRNAs, as well as ncRNAs involved in imprinting and other processes (Griffiths-Jones et al. 2003; Griffiths-Jones et al. 2005). 
Those clones that had no matches in either database were deemed novel, and analyzed against the mouse genome using the UCSC genome browser to identify overlapping genomic features (Kent et al. 2002; Karolchik et al. 2003). Sequences with no exact match to the genome were re-annotated as known ncRNAs or novel if they had at least 90% identity with a sequence in either set. Tables 1 and 2 are summaries of the cloning data and short RNA annotation. The majority of all sequences cloned were known ncRNAs, regardless of the RNA starting material (Tables 1 and 2). Short RNAs immunoprecipitated by P19 were on average 2 nucleotides shorter than those cloned from control supernatants (22 vs. 20 nucleotides for the WCE supernatant vs. IP, and 23 vs. 21 nucleotides for the NE supernatant vs. IP; p < 0.0001 for both; Table 1), consistent with previous observations that P19 has a decreased affinity for double-stranded RNA both shorter and longer than 19 base pairs (Vargason et al. 2003; Ye et al. 2003). Moreover, the GC content of the P19-bound RNA was significantly higher than that of control RNA (53% vs. 75% for the WCE supernatant vs. IP, and 49% vs. 67% for the NE supernatant vs. IP; p < 0.0001 for both; Table 1). This large average difference in GC content was observed for both known ncRNAs and novel RNAs, indicating an overall preference of P19 for GC-rich RNA (Table 1).

A proportion of the novel sequences, between 2% and 3% of all sequences cloned from each population, were exact matches to known repetitive elements catalogued by RepeatMasker (Table 2, Supplementary Data). On average, these short RNAs were slightly shorter than the average miRNA cloned in this study (20.9 vs. 22.3 nucleotides for repeat-associated RNAs vs. miRNAs; p < 0.0001). Similar repeat-derived short RNAs were not identified in previous ES cell cloning efforts, most likely due to limitations in the depth of short RNAs sequenced (Houbaviy et al. 2003). 
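The length and GC-content comparisons reported above reduce to two per-library summary statistics. The following is a minimal sketch of how such summaries could be computed from lists of cloned sequences. The function names and the example sequences are hypothetical placeholders chosen for illustration; they are not data or code from this study, and no statistical test is included.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def summarize(clones: list[str]) -> dict[str, float]:
    """Summarize a clone library by size, mean length, and mean GC fraction."""
    return {
        "n": len(clones),
        "mean_length": mean(len(s) for s in clones),
        "mean_gc": mean(gc_content(s) for s in clones),
    }


# Hypothetical placeholder libraries (made-up sequences, not cloned RNAs).
supernatant = ["AUGCAUGCAUUAUGCAUAUGCA", "UUAGCAUUAGCAUUAGCAUUAGC"]
immunoprecipitate = ["GGCCGCGGACCGGCUCGCCG", "CGCGGCCUCAGGCCGACGGU"]

for name, library in (("supernatant", supernatant), ("IP", immunoprecipitate)):
    print(name, summarize(library))
```

Comparing the two summaries side by side mirrors the supernatant vs. IP columns of Table 1; a significance test on the per-clone values would be a separate step.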
Strikingly, P19 immunoprecipitation selected against miRNAs and enriched for short RNAs matching mature ribosomal RNA (rRNA) species (Table 2, Supplementary Data). 48.9% and 57.9% of the short RNAs cloned from the WCE and NE supernatants mapped to annotated miRNA hairpins (miRNAs and miRNA*s), compared to only 7.5% and 17.3% of those cloned from the immunoprecipitates. Conversely, 69.4% and 51.0% of all sequences cloned from the WCE and NE immunoprecipitates were short rRNAs, compared to 29.4% and 22.1% of sequences cloned from the WCE and NE supernatants, respectively (Table 2). Interestingly, the immunoprecipitates also lacked the natural complementary sequence of the miRNA duplex, the miRNA* strand (Table 2). The ratio of miRNA to miRNA* strands in the supernatants was roughly 24:1, and about the same level in P19 immunoprecipitates, indicating there was no selection for miRNA duplexes by P19. This was surprising given that P19 binds tightly to double-stranded RNA with almost no affinity for single stranded RNA substrates, and associates with miRNA duplexes in plants (Silhavy et al. 2002; Vargason et al. 2003; Ye et al. 2003; Chapman et al. 2004; Dunoyer et al. 2004; Lakatos et al. 2004). The specific miRNAs cloned from the immunoprecipitates were similar to those cloned from the supernatants, perhaps consistent with the majority of P19-associated miRNAs being derived from a non-specific background. The same group of 20 miRNAs comprised 79% and 83% of all miRNAs cloned from WCE IP and supernatant, respectively, and comprised 78% and 77% of all miRNAs cloned from the NE IP and supernatant, respectively. There was no significant difference in the GC-content or length of the miRNAs cloned from the immunoprecipitates compared to those cloned from the supernatants (not shown). 
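The depletion of miRNAs in the immunoprecipitates can be expressed as a simple ratio of the fractions quoted above. The sketch below uses only the percentages stated in the text; the variable and function names are illustrative, and the ratio is one straightforward way to summarize the numbers rather than a calculation taken from the original analysis.

```python
# Percent of clones mapping to annotated miRNA hairpins, as quoted in the text.
mirna_percent = {
    "WCE": {"supernatant": 48.9, "ip": 7.5},
    "NE": {"supernatant": 57.9, "ip": 17.3},
}


def fold_depletion(supernatant_pct: float, ip_pct: float) -> float:
    """Ratio of the miRNA fraction in the supernatant to that in the IP."""
    return supernatant_pct / ip_pct


ratios = {
    extract: round(fold_depletion(v["supernatant"], v["ip"]), 1)
    for extract, v in mirna_percent.items()
}
print(ratios)  # roughly 6.5 for the WCE and 3.3 for the NE
```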
If indeed the miRNAs present in P19 immunoprecipitates were due to background contamination, then a comparison of the number cloned from each population would suggest that P19 immunoprecipitation gave a 7-fold enrichment of bound short RNAs versus miRNAs from the supernatant of the WCE, and a 3-fold enrichment of bound short RNAs versus miRNAs from the supernatant of the NE. These are minimal estimates as some miRNAs might be preferentially bound to P19. All but one of the short rRNAs cloned from the supernatants and the immunoprecipitations were in the sense orientation relative to the full-length transcribed 45S pre-rRNA and 5S RNA, and 97% of these sequences mapped to the mature 18S, 5.8S, or 28S rRNA. Figure 2A shows a representation of all the short rRNAs cloned, aligned to bases 3,900 to 13,000 of the 13,404 base 45S pre-ribosomal RNA. Only 13 out of 564 total cloned short rRNAs did not have an exact match in this region; 7 of these were matches to the 5S rRNA and the remaining 6 mapped to regions of the 45S precursor not included in Figure 2A. Particular classes of short rRNAs were highly enriched in the P19-bound RNA compared to unbound controls (marked with an * in Figure 2A). Mapping these RNAs to established rRNA secondary structure maps shows the enriched short rRNAs are not necessarily from regions of the ribosome that resemble canonical Drosha or Dicer substrates (not shown; Cannone et al. 2002). Because P19 binds almost exclusively to double-stranded RNA, it was surprising that no short RNAs with exact complementarity to P19-enriched short rRNAs were cloned. Speculating that these RNAs could possibly form partial duplexes with other short RNAs, we investigated whether the most abundantly cloned rRNAs in the immunoprecipitations (cloned more than 5x) could form bulged duplexes with other abundantly cloned species.
We limited our initial analysis to the most abundantly cloned short RNAs, reasoning that abundantly cloned short RNAs could have been likely complements in binding to P19. The short rRNAs numbered 1 through 12 in Figure 2A were folded against each other in all possible permutations using the Mfold server (Zuker 2003). Of the 140 predicted structures, we found 14 that when annealed had a double-stranded region of the right size to potentially bind P19 with high affinity, between 18 and 20 base pairs (Vargason et al. 2003). In all cases, there was at least one bulged region in the RNA duplex predicted by Mfold (Figure 2B for selected examples); however, it has been previously observed that P19 associates with bulged duplexes in plants (Chapman et al. 2004). This partial complementarity between particular abundantly cloned species provides a possible explanation for the enrichment of some short rRNAs in the P19 IP, but not all. Notably, although short rRNA #6 was the second most abundantly cloned short rRNA in the P19 immunoprecipitations, it was not predicted to form any duplexed structures with other abundantly cloned short rRNAs.

P19 binds to regions of short dsRNA.

It is possible that P19 also binds substrates with dsRNA regions that are siRNA-like in length, but contain one strand that is significantly longer than the standard siRNA length of 21 nucleotides. Biochemical analysis shows that RNA duplexes without 3' overhangs bind with slightly higher affinity to P19 than their 3' overhang-containing counterparts, indicating that the length of the RNA duplex and not the overhang is the key factor in determining P19 binding (Vargason et al. 2003).
This hypothesis could explain why some molecules immunoprecipitated by P19 form no canonical siRNA-like duplexes with other cloned RNAs; short RNAs complementary to longer RNAs could be bound by P19, and the iterative size selection in the short RNA cloning protocol would exclude the longer RNA binding partner from the final set of cloned sequences. The high GC content of the short RNAs bound by P19 is consistent with this hypothesis (Table 1), allowing a greater promiscuity for stable binding to longer RNA partners. Indeed, in addition to enriching for short RNAs, P19 immunoprecipitation reproducibly pulls down larger RNA species that are the same size as abundant tRNAs, snRNAs, and rRNAs (Figures 1D and 4A). While no immunoprecipitated RNAs have exact complements in the set of known tRNAs, snRNAs, and rRNAs, gapped alignments allowing G:U pairing show that P19-enriched RNAs could form duplexes with several ncRNA species, potentially explaining the enrichment for ncRNAs as well as short rRNAs in the immunoprecipitation (not shown). We designed three different RNA duplexes to test the affinity of P19V5 for short RNAs bound to longer RNAs (Figure 3A). One strand of each duplex was a frequently cloned short rRNA, while the other varied such that its complement was either: in the 5' region of a 32 nucleotide long RNA species (5' complementary), in the 3' region of a 35 nucleotide long RNA species (3' complementary), or a 21 nucleotide long RNA that formed a canonical 19 base pair siRNA duplex (siRNA). The length of the double-stranded region in the 5' complementary RNA was 19 base pairs, and 21 base pairs in the 3' complementary RNA. Since P19 preferentially binds 19 base pair duplexes, the latter may be expected to have a lower affinity than the former. In all cases, the dsRNA species had 5' phosphate and 3' hydroxyl groups, mimicking endogenous siRNA structure.
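The duplex lengths stated above can be checked directly from the oligo sequences listed in Material and methods. The Python sketch below is an illustration only: the function names are hypothetical, and it counts only contiguous Watson-Crick pairs (no G:U wobble, no bulges), which is a simplifying assumption and not the Mfold-based analysis used elsewhere in this chapter.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_perfect_duplex(short: str, partner: str) -> int:
    """Length of the longest contiguous, perfectly paired antiparallel stretch.

    This equals the longest common substring of reverse_complement(short)
    and partner, found here with a standard dynamic-programming scan.
    """
    target = reverse_complement(short)
    best = 0
    prev = [0] * (len(partner) + 1)
    for i in range(1, len(target) + 1):
        cur = [0] * (len(partner) + 1)
        for j in range(1, len(partner) + 1):
            if target[i - 1] == partner[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Sequences as listed in Material and methods (5' to 3').
RNA3 = "CGGCUCCGGGACGGCCGGGAA"
PARTNERS = {
    "siRNA": "CCCGGCCGUCCCGGAGCCGUU",
    "5' complementary": "CCCGGCCGUCCCGGAGCCGGCUUGGCUUCGU",
    "3' complementary": "CUUGGCUUCGUUUCCCGGCCGUCCCGGAGCCGUU",
}

lengths = {name: longest_perfect_duplex(RNA3, seq) for name, seq in PARTNERS.items()}
for name, n in lengths.items():
    print(f"{name}: {n} bp")
# siRNA: 19 bp
# 5' complementary: 19 bp
# 3' complementary: 21 bp
```

The computed values agree with the 19 base pair duplexes described for the siRNA and 5' complementary substrates and the 21 base pair duplex described for the 3' complementary substrate.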
Binding assays were performed by incubating radiolabelled RNA substrate in increasing concentration with P19V5-Protein G agarose bead complexes (Figure 3). Control experiments showed that all input RNA was double-stranded and of the appropriate dilution (not shown). Under the conditions assayed, there was negligible non-specific association of RNA with beads, and the P19V5-bead complexes had no affinity for single-stranded RNA (Figure 3B). The apparent dissociation constant, Kapp, of P19V5 for an siRNA duplex was determined to be 2.8 ± 0.5 nM, nearly identical to previously published binding studies using a P19 C-terminally tagged with GST (Figure 3C; Lakatos et al. 2004). P19V5 bound the 5' and 3' complementary RNAs with a Kapp of 7.4 ± 0.7 nM and 27 ± 4 nM, respectively, supporting our hypothesis that P19 binds regions of dsRNA ~19 base pairs long and not only siRNAs (Figure 3C).

Unknown function of P19 bound short rRNAs.

We next tried to determine the function of the short rRNAs enriched in the P19 immunoprecipitation. Short RNAs with exact matches to abundant tRNAs, rRNAs, and snRNAs have been frequently dismissed as non-functional degradation products of abundant ncRNAs, particularly in mammalian cloning efforts. While this may be true, it is worth noting that these non-coding RNAs are derived from highly repetitive elements in mammalian genomes, and in this respect are similar to other known targets of RNAi-mediated transcriptional silencing from which rasiRNAs have been cloned (Lander et al. 2001; Waterston et al. 2002). Studies from S. pombe and Arabidopsis have implicated short rRNAs in the formation of heterochromatin at rDNA repeats, setting a precedent for short rRNA functionality (Xie et al. 2004; Cam et al. 2005).
One common feature of these short rRNAs involved in chromatin silencing is that they are both sense and antisense to the full-length, transcribed rRNA, supporting the idea that they are generated from processing of a longer dsRNA precursor (Xie et al. 2004; Cam et al. 2005). In contrast, all but one of the short rRNAs cloned in this study were in the sense orientation relative to transcription of the mature rRNAs, suggesting that they arose either by breakdown or processing of mature rRNA sequence. However, this observation does not exclude a possible role for short rRNAs in the chromatin silencing of ES cell rDNA repeats. Recently, ncRNAs mapping directly upstream of the rDNA transcriptional start site have been shown to direct the nucleolar remodeling complex, NoRC, to transcriptionally silence rDNA repeats in mouse 3T3 cells (Mayer et al. 2006). No short RNAs cloned in this study map to this region of the rDNA repeat, consistent with the authors' observation that the NoRC associated RNAs are 150-300 nucleotides long. It is also possible that the short rRNAs immunoprecipitated by P19 are involved in post-transcriptional gene silencing (PTGS). If this were true, one might expect complementary sequences to be present in exons or 3' UTRs of known genes. BLAST analysis of the P19-enriched rRNAs against the mouse genome shows that exactly complementary sequences are not present in known mRNAs, suggesting that these enriched rRNAs are not involved in siRNA-like PTGS. In support of this, no effect was seen on expression of a luciferase reporter when two perfectly complementary binding sites to selected short rRNAs were inserted into its 3' UTR (Supplementary Figure 1A). Finally, we examined whether P19V5 associates with short RNAs in an ES cell line lacking Dicer RNAse III activity. These ES cells were derived from mice homozygous for a conditional allele of Dicer in which the key catalytic residues in the second RNAse III domain are floxed (Harfe et al. 2005).
Consistent with previously published results, excision of the floxed region via transient transfection of Cre recombinase results in viable ES cells that do not express miRNAs (Murchison et al. 2005; Figure 4B; J.M.C. and P.A.S. unpublished). P19V5 immunoprecipitates short RNAs from Dicer null ES cells as efficiently as from Dicer containing cells, shown by 3' end labeling of immunoprecipitated RNA (Figure 4A). Similarly, short RNA northern blots probing P19 immunoprecipitates for enriched short rRNA #3 show the same enrichment in the presence and absence of Dicer (Figure 4B). Reprobing of the same northern blot shows miR295 is absent from P19 immunoprecipitates but present in the supernatants from Dicer positive cells, confirming the cloning data that shows a selection against miRNAs in P19 immunoprecipitates (Figure 4B). Northern blots to total RNA preparations show that RNA #3 is present in ES cells at the same level in the presence and absence of P19V5, and is not induced non-specifically by transfection (Supplementary Figure 1B). A survey of RNA from various mouse tissues indicates that RNA #3 is detectable in several samples, but at levels much lower than those observed in ES cells (Supplementary Figure 1C). Together, these results indicate that enriched rRNAs are generated independently of Dicer and are not byproducts of P19 expression or transfection.

Discussion

We have shown that V5-tagged P19 associates with endogenous short RNAs in mouse ES cells. Cloning of these RNAs from whole-cell and nuclear extracts revealed that P19 associates predominantly with short, GC-rich RNAs that are exact matches to portions of the mature 28S and 18S rRNAs. The function of these short rRNAs is not clear. We show that they exist in the absence of Dicer and do not function in endogenous PTGS, suggesting that they are not in the canonical RNAi pathway. In other organisms, short rRNAs have been implicated in the chromatin silencing of rDNA repeats (Xie et al.
2004; Cam et al. 2005); whether the same is true of the short rRNAs identified in this work is not clear. Intriguingly, P19V5 did not immunoprecipitate an easily detectable pool of short RNAs from 293T cells as it did in ES cells, suggesting that similar short rRNAs are not produced to the same extent in this human embryonic kidney cell line, or that they are inaccessible to P19 (Supplementary Figure 2A). In plants, over-expression of P19 leads to a general accumulation of miRNA* strands, and P19 immunoprecipitates contain both miRNA and miRNA* strands, as detected by short RNA northern blots to total and P19 immunoprecipitated RNA, respectively (Chapman et al. 2004; Dunoyer et al. 2004). In contrast, P19 expression in mouse ES cells did not lead to accumulation of miRNA* strands in cell extracts, nor were miRNA* strands selectively detected in P19 immunoprecipitates, suggesting that P19 is unable to access miRNA duplexes as efficiently in ES cells as in plants. Because of our experimental design, we obtained cloning data from a nuclear-enriched extract. We observed few significant differences between the short RNA profiles of nuclear compared to cytoplasmic extracts, suggesting that novel short RNAs are not abundant in the nucleus of ES cells. A surprisingly high proportion of short RNAs in the nuclear extract were miRNAs given that mature miRNAs are predominantly cytoplasmic (Houbaviy et al. 2005). A simple explanation for these results is that the profile of short RNAs in the nucleus is similar to that in the cytoplasm. It is also possible that most of the short RNAs present in the nuclear extract were from cytoplasmic contamination. Controls testing the extent of fractionation in nuclear extracts suggest a three-fold enrichment for components of this compartment. This level of enrichment might not be sufficient for the identification of short, nuclear-localized RNAs if they are low in abundance compared to cytoplasmic short RNAs.
Alternatively, nuclear siRNAs might be chemically modified such that they are not identifiable by the cloning methods used here, which require a 5' phosphate and 2' or 3' hydroxyl groups. Notably, between 2 and 3% of the short RNAs cloned in this study overlapped with repetitive elements catalogued by Repeatmasker. It is unlikely that these RNAs exist predominantly as double-stranded siRNAs in cell extracts as they were not enriched in P19 immunoprecipitates. Their function is unclear, but they may be analogous to repeat-associated siRNAs identified in S. pombe, C. elegans, Drosophila, and Arabidopsis. Alternatively, because such a large proportion of the mouse genome is annotated as repetitive, overlap of specific short RNAs with repeats may be coincidental and not indicative of novel function (Lander et al. 2001; Waterston et al. 2002). Recently, repeat-associated short RNAs were identified from mouse oocytes, where they appear to be as abundant as miRNAs. Also, reporter constructs with complementary 3' UTR sequences were destabilized, suggesting that at this developmental stage, repeat-associated short RNAs are involved at least in the PTGS of complementary transcripts (Watanabe et al. 2006). The short RNAs overlapping repetitive elements identified in this study are probably at a far lower abundance, being about 25-fold less abundant than miRNAs. This difference suggests a difference in activity of these sequences in ES cells compared to oocytes. Previous biochemical studies have shown that P19 dimers bind tightly to siRNA duplexes (Vargason et al. 2003; Ye et al. 2003; Lakatos et al. 2004). Here, we show that compared to an siRNA, P19 binds with roughly 3-fold reduced affinity to a dsRNA species containing one strand that extends far beyond the edge of the RNA duplex.
We therefore conclude the likely reason for the association of P19 with specific short rRNAs in ES cells is that they bind with partial complementarity to larger, abundant non-coding RNAs to form RNA duplexes ~19 base pairs long. The observed 3-fold difference in affinity for a 19 base pair duplexed siRNA compared to a 19 base pair duplex with an extended 3' strand is small enough that abundant endogenous dsRNAs similar in structure to the latter could compete with endogenous siRNAs for P19 binding. It should be noted that P19V5, as well as a previously published HA epitope-tagged P19 construct, did not inhibit siRNA-mediated knockdown of a reporter gene in 293T cells (Supplementary Figure 2C). This conflicts with results from HeLa cells, where epitope-tagged P19 was an effective inhibitor of exogenously introduced siRNAs, and with results from 293T cells, where untagged P19 was able to interfere with endogenous miRNA activity (Dunoyer et al. 2004; Lecellier et al. 2005). The reason for these discrepancies is unclear, but it suggests that P19 may not inhibit RNAi in heterologous systems as robustly as previously anticipated. The original objective of these experiments was to use the affinity of P19 for siRNA duplexes to identify these types of short RNAs in mouse ES cells. Functional and biochemical studies of P19 in plants and animals were consistent with this approach. Analysis of the short RNAs bound by P19 in ES cells did not generate an apparent siRNA fraction in spite of evidence for a strong enrichment of certain RNA species as compared to the supernatants from associated ES cell extracts. RNAs with siRNA structure do bind to P19 with high affinity, and the failure to detect them might indicate their scarcity in these cells. Alternatively, P19 might be denied access to endogenous RNAi-related short RNAs if the majority of siRNAs in ES cells are bound by RNAi pathway components that have a higher affinity for these molecules than P19.
Material and methods

Plasmid construction. CIRV P19 (obtained from J. Burgyan) was amplified via PCR and cloned into pcDNA3.1 (Invitrogen), using the primers 5'-CACCATGGAACGAGCTATACAAGGA-3' and 5'-CTCGCTTTCTTTCTTGAAGGTTTCA-3'. To make p19NLS, two SV40 NLS sequences were added to p19V5 by annealing two DNA oligos with an 18 bp overlap (5'-CCGCTCGAGTGATCCAAAAAAGAAGAGAAAGGTAGATCCAAAA-3' and 5'-CGAACCGCGGTACCTTTCTCTTCTTTTTTGGATCTACCTTTCT-3'), filling-in with Taq DNA polymerase, and inserting into the XhoI/SacII sites of pcDNA3.1. pRL-CMV6xCXCR4 was described previously (Doench et al. 2003). The P19HA construct was obtained from O. Voinnet (Dunoyer et al. 2004). The V5-tagged β-galactosidase control plasmid was obtained from H. Houbaviy. The 3' UTR reporter constructs were made by inserting 5' phosphorylated, annealed dsDNA oligos into the XhoI/ApaI sites in the 3' UTR of pRL-CMV (Promega). The sequences of the DNA oligos used to make the UTR constructs are available upon request. All DNA oligos were purchased from IDT.

Cell culture, transfection, and luciferase assays. J1 ES and 293T cells were grown as in (Houbaviy et al. 2003; Petersen et al. 2006). Cells were transfected with Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions. 25 µg of plasmid per 10cm plate was used for ES cell and 293T immunoprecipitation experiments. Transfection efficiency as assessed by GFP expression was between 60-80% for ES cell immunoprecipitation experiments, and >95% for 293T experiments. Dual luciferase assays were performed essentially as previously described (Doench and Sharp 2004; Petersen et al. 2006). For ES cell assays, 1 ng of each 3' UTR reporter construct, 100ng of pGL3 control (Promega), and 700ng of pWhiteScript carrier DNA were transfected per well of a 24-well plate and lysed 24 hours post-transfection.
For 293T assays, 4ug of plasmid (P19V5/NLS/HA or βGal control) was transfected per well of a 6-well plate, and split 8 hours post-transfection into a 24-well plate, seeding cells at 2x10^5 cells per well. 24 hours later, an additional 0.7 µg of plasmid was cotransfected along with 30 ng of pRL and pGL3 plasmids (Promega) and the appropriate amount of siRNA to attain the concentration indicated in Supplementary Figure 1. Cells were lysed 24 hours post-transfection. The siRNA used perfectly targeted the coding sequence of Renilla Luciferase (target sequence: 5'-GCCAAGAAGUUUCCUAAUA-3').

Western blots and immunohistochemistry. For western analysis, cell extracts were fractionated on 4-20% SDS-polyacrylamide gradient gels (Biorad) and transferred to Hybond-C membranes (Amersham). Membranes were blocked with 5% milk in PBS then incubated with the indicated antibody. Antibodies (αGAPDH (Chemicon); αV5 (Invitrogen); αCyclin T1 (Abcam)) were detected using HRP-conjugated antisera (Amersham) and chemiluminescence. For immunofluorescent staining, ES cells were transfected on gelatinized cover slips, then fixed with 4% PFA in PBS and permeabilized with 1% Triton X-100. Coverslips were stained with primary and secondary antibodies in PBS for one hour and affixed to glass slides with Prolong Gold with DAPI (Molecular Probes). Actin was visualized using Alexa 488 Phalloidin (Molecular Probes).

Immunoprecipitations and RNA isolation and visualization. For each immunoprecipitation experiment three 10cm plates of ES cells were lysed two days post-transfection with WCE buffer (1% NP40, 30 mM HEPES KOH pH 7.5, 100 mM NaCl, 66 mM KCl, 1 mM MgCl2, 1 mM DTT, 1U/ul SUPERaseIn (Ambion), Complete Protease Inhibitor EDTA-free (Roche), and Phosphatase Inhibitors I and II (Sigma)) or NE buffer (0.2% NP-40, 100mM NaCl, 5mM MgCl2, 1 mM DTT, 1U/ul SUPERaseIn, Complete Protease Inhibitor EDTA-free, and Phosphatase Inhibitors I and II).
The WCE lysate was spun at 20000 rpm in a tabletop centrifuge for 15 minutes after lysis to pellet insoluble material. The NE cells were lysed for 5 minutes at 4°C and spun for 5 minutes at 1000g. The pellet (containing nuclei) was resuspended in WCE buffer and sonicated on ice 3 times for 5 seconds at power level 2 on a Branson Sonifier 450 sonicator, with one-minute rest on ice between sonications. These sonication conditions did not disrupt endogenous short RNA binding to P19 (not shown). The sonicated nuclear fraction was spun at 20000 rpm for 15 minutes in a tabletop centrifuge to remove particulate. P19 was immunoprecipitated from extracts by adding 20ul of Protein G Plus agarose beads (Pierce) preconjugated overnight with 2ug of V5 antibody per 10cm plate, and rotating at 4°C for 45 minutes. Beads were washed with 1 ml of WCE buffer 3 times, switching tubes for the final wash. Beads were then resuspended in 300ul 1x Proteinase K buffer (Nykanen et al. 2001) with 5ul of Proteinase K (Roche) and rotated at RT for 30 minutes, then extracted with phenol/chloroform and precipitated. To isolate supernatant RNA, 300ul of supernatant from the immunoprecipitation was extracted with phenol/chloroform and precipitated. 100ng of supernatant RNA and 1/10th the volume of immunoprecipitated RNA were 3' end labeled with T4 RNA ligase (NEB) and 5'-32P cytidine 3',5'-bis(phosphate) (NEN) overnight at 4°C in 1x RNA ligase buffer and 30% v/v DMSO. Unincorporated radioactivity was removed with G25 microspin columns (Amersham), and one-half of the labeling reaction was resolved on a 12% or 15% denaturing polyacrylamide gel (National Diagnostics). For size markers, 10 bp DNA ladder (Invitrogen) or a 21 bp siRNA was 5' end-labeled using γ-32P-ATP (NEN) and T4 PNK (NEB). Gels were wrapped in saran wrap and quantitated on a phosphoimager. Northern blots were performed as in (Houbaviy et al. 2003), except hybridization was carried out in Oligo-Hyb (Ambion) at 37°C.
Total mouse tissue RNA was purchased from Ambion. The same conditions for the ES cell WCE immunoprecipitations were used for 293T immunoprecipitations.

Binding assays. 293T lysates were used to make P19-containing cell extracts for binding assays because of the lack of association between P19 and endogenous RNAs in these cells (Supplementary Figure 1). Extracts were made 48 hours post-transfection with P19V5 using 400ul of WCE buffer per plate. Cells were lysed, and extracts spun at top speed in a tabletop centrifuge for 15 minutes. Protein G Plus agarose beads pre-conjugated overnight with V5 antibody were added to cleared lysate, and P19V5 was immunoprecipitated for 45 minutes at 4°C. Beads were washed 3x in WCE buffer, and resuspended in WCE buffer such that for each binding condition, 30ug of extract, 10 µl of 50% protein G slurry, and 0.5 µg of V5 antibody were used. Radiolabeled dsRNA was added to P19V5-bead mixtures at the indicated concentrations, and the binding reactions were rotated for 30 minutes at room temperature. Beads were washed 2x with 500ul of WCE buffer, then 2x with 1 ml of WCE buffer, switching tubes for the final wash. The bound RNA was eluted using the Proteinase K phenol/chloroform treatment described above. One-half of each reaction was resolved on a 15% polyacrylamide gel and visualized with a phosphoimager, or quantitated via scintillation counting. Data points were then fit to a fixed-endpoint curve, (m2)*m0/(m0+m1), using KaleidaGraph software, where m2 is the maximum amount bound, m1 is the apparent dissociation constant, Kapp, and m0 is the [RNA].
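To make the curve-fitting step concrete, the following Python sketch implements the same single-site saturation model, bound = m2*m0/(m0+m1), and estimates Kapp by least squares with the endpoint m2 held fixed. This is not the KaleidaGraph procedure itself: the function names, the ternary-search minimizer, the search bounds, and the synthetic noise-free data are all assumptions made for illustration. The concentrations are those listed for the representative assay in Figure 3B.

```python
def bound_rna(rna_nM: float, max_bound: float, kapp_nM: float) -> float:
    """Saturation binding model: m2 * m0 / (m0 + m1)."""
    return max_bound * rna_nM / (rna_nM + kapp_nM)


def fit_kapp(conc, signal, max_bound, lo=1e-3, hi=1e3, iterations=200):
    """Estimate Kapp by minimizing squared error with the endpoint fixed.

    Uses a ternary search, which assumes the error has a single minimum
    between lo and hi (true for clean data; not guaranteed for noisy data).
    """
    def sse(kapp):
        return sum((y - bound_rna(x, max_bound, kapp)) ** 2
                   for x, y in zip(conc, signal))

    for _ in range(iterations):
        left = lo + (hi - lo) / 3
        right = hi - (hi - lo) / 3
        if sse(left) < sse(right):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2


# Synthetic, noise-free example (illustrative values, not measured data).
concentrations = [1, 5, 10, 50, 100]   # nM, as in the Figure 3B series
true_kapp = 2.8                        # nM, used only to generate the data
max_bound = 3000.0                     # arbitrary signal units
signal = [bound_rna(c, max_bound, true_kapp) for c in concentrations]

kapp = fit_kapp(concentrations, signal, max_bound)
print(f"Estimated Kapp: {kapp:.2f} nM")
# Estimated Kapp: 2.80 nM
```

A useful property of this model is that the bound signal equals half of m2 when [RNA] equals Kapp, which gives a quick visual check of any fitted value against the plotted curve.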
To make radiolabeled, duplexed RNA, 5' phosphorylated RNA oligos ("RNA#3" 5'-CGGCUCCGGGACGGCCGGGAA-3'; "5' complementary" 5'-CCCGGCCGUCCCGGAGCCGGCUUGGCUUCGU-3'; "3' complementary" 5'-CUUGGCUUCGUUUCCCGGCCGUCCCGGAGCCGUU-3'; "siRNA complementary" 5'-CCCGGCCGUCCCGGAGCCGUU-3') were annealed to their complementary strands at 12 µM in a solution containing 10mM HEPES pH 7.5, 20mM NaCl and 1 mM EDTA by heating the RNAs to 95°C and cooling 1 degree/minute until the samples reached room temperature. Radioactive 5' end-labeled RNA#3 was spiked into annealing reactions before heating. Dilution series of each dsRNA species were made, and portions of each were run on a 20% native polyacrylamide gel to assess efficiency of annealing and accuracy of dilution. By this analysis, all dsRNA preparations used for binding were >99% dsRNA and had an R^2 value for the dilution series of at least 0.99 (not shown).

Short RNA cloning and sequence analysis. Short RNAs were cloned using a procedure modified from (Lagos-Quintana et al. 2001; Lau et al. 2001) (J.R. Neilson and P.A.S., manuscript in preparation). Before adaptor ligation, 18-26 nucleotide long RNA was gel-purified from 5ug of supernatant RNA to use as starting material; immunoprecipitated RNA from one 10cm plate equivalent was used as starting material and was not gel purified. Short RNA sequences were extracted from concatamers using scripts from (Houbaviy et al. 2003). Rfam (http://www.sanger.ac.uk/Software/Rfam/) and NONCODE (http://www.bioinfo.org.cn/NONCODE/) RNA databases were used to define known miRNAs, tRNAs, rRNAs, snRNAs, and snoRNAs. All genome analysis was performed using the August 2005 assembly of the mouse genome (mm7). BLAST was run with a word size of 7 and the gap-opening penalty set to 1. Repeat and EST overlap was determined using the UCSC genome browser (http://genome.ucsc.edu/) Repeatmasker (http://www.repeatmasker.org) and mouse EST tracks.
Sequences with multiple repeat overlaps were annotated as the class of repeat that overlapped most frequently with the short RNA in question. p-values comparing values between data sets were obtained using a two-sample test for the difference in proportions.

Supplementary Information. Supplementary information can be found at http://web.mit.edu/sharplab/calabrese_supp/.

Acknowledgements. The authors thank J. Burgyan and O. Voinnet for providing P19 plasmids, and are especially grateful to J. Neilson for protocols and help with short RNA cloning. We also thank A. Seila for help with bioinformatics, C. Whittaker for help with Figure 2, A. Leung for help with microscopy, and S. Erkeland, A. Garfinkel, J. Neilson, and A. Seila for critical reading of this manuscript. This work was supported by a Program Project Grant from the National Cancer Institute to P.A.S. and partially by the Cancer Center Support (core) grant from the National Cancer Institute.

Table and Figure legends.

Table 1. WCE = whole cell extract (1% NP40 lysis); NE = nuclear extract (resuspended pellet from 0.2% NP40 lysis); sup = supernatant; known ncRNAs = clones with at least 90% sequence identity to miRNAs, rRNAs, tRNAs, and snRNAs, as well as RNAs involved in imprinting and other processes; novel short RNAs = clones that are not known ncRNAs.

Figure 1. P19 binds endogenous short RNAs when expressed in ES cells. (A) P19 expression constructs used in this study. (B) Sub-cellular localization of P19V5 and P19NLS in ES cells (scale bar = 15 µm). (C) Western blots showing protein composition of P19-containing extracts. GAPDH (cytoplasmic) and Cyclin T1 (nuclear) serve as fractionation controls. WCE = whole cell extract; CE = cytoplasmic extract; NE = nuclear extract. (D) P19 constructs bind short RNAs when expressed in ES cells. Cells transfected with GFP, P19V5, or P19NLS were lysed with either WCE or NE buffer and immunoprecipitations were performed with αV5 Protein G agarose beads.
Bound RNAs were 3' end-labeled with 5'-32P cytidine 3',5'-bis(phosphate) and resolved on a 12% denaturing polyacrylamide gel. The size markers correspond to a 10 bp DNA ladder. The arrow denotes a ~20 nucleotide band observed in the P19 immunoprecipitations.

Figure 2. P19 enriches for particular short rRNA species when expressed in ES cells. (A) Specific short rRNAs are highly enriched in P19 immunoprecipitations compared to control supernatants. Shown is a scaled representation of all the short rRNAs cloned, aligned to bases 3,900 to 13,000 of the 13,404 base pair 45S pre-rRNA. Highlighted in bold along the X-axis are the locations of the mature 18S, 5.8S, and 28S rRNA species relative to the full-length 45S pre-rRNA. Each grey bar represents one cloned short rRNA positioned directly above or below its matching sequence in the 45S pre-rRNA. Grey bars above the X-axis were cloned from one or both of the immunoprecipitations, and those below the X-axis were cloned from one or both of the supernatants. (B) Certain P19-enriched short rRNAs form partial double-stranded RNA structures with themselves. Shown are selected Mfold-predicted dsRNA structures of enriched rRNAs folded against each other.

Figure 3. P19 binds with high affinity to RNAs containing 19 to 21 base pair double-stranded regions with extended 5' or 3' single-stranded segments. (A) Base composition and secondary structure of short RNAs tested for binding to P19V5. In all cases, the strand in grey is short rRNA #3 from Figure 2A. (B) Representative P19-binding assay. Shown is the RNA bound by P19V5-bead complexes after incubation with increasing concentrations of radiolabeled RNA (1, 5, 10, 50, and 100 nM). (C) Determination of the affinity of P19 for selected RNA species. Shown is the quantitation of a binding assay similar to that in (B). The Kapp was determined by fitting the data points to a fixed endpoint curve using KaleidaGraph data analysis software.

Figure 4.
Endogenous short rRNAs exist independently of Dicer. (A) P19 associates with short RNAs in the absence of Dicer. Dicer +/+ or -/- cells were transfected with either P19V5 or GFP as a negative control. Immunoprecipitations using V5 antibody were performed, and the associated RNA was 3' end labeled and visualized as in Figure 1D. (B) Similar RNA species associate with P19V5 in the presence or absence of Dicer. Shown is a short RNA northern blot probing immunoprecipitated and supernatant RNA from 4A with probes complementary to RNA#3, miR295, or U6 snRNA.

Supplementary Figure 1. (A) Endogenous short rRNAs do not repress expression of a luciferase reporter with 2 perfectly complementary binding sites inserted in its 3' UTR. Three short rRNA sequences were tested for repression: RNA #3 was cloned frequently in P19 immunoprecipitations but not in supernatants, RNA #5 was cloned frequently in both P19 immunoprecipitations and supernatants, and RNA #13 was cloned only in supernatants. Target sequences complementary to the endogenously expressed miR295 were included as a positive control. (B) RNA #3 is not upregulated in ES cells by transfection of P19V5, P19NLS, or non-specifically by a GFP control. Shown is a short RNA northern blot to total RNA prepared from ES cells after transfection of P19V5, P19NLS, GFP, or from untransfected cells (Neg), probed either with the sequence complementary to RNA #3 (top panel; the arrow denotes the ~20 bp species enriched in P19 immunoprecipitates) or a tRNA loading control (bottom panel). (C) RNA #3 is present at low levels in various mouse tissues, assessed here by short RNA northern blotting. The blot was probed as in (B). In addition to the signal seen in ES cells, there is a low but detectable signal migrating at ~20 bp in RNA from D10-12 embryo, ovary, testicle, thymus, and spleen.

Supplementary Figure 2.
(A) Immunoprecipitation of either P19V5 or P19HA does not generate a comparable band of short RNAs in 293T cells as in ES cells. 293T cells were transfected with either GFP, P19 tagged at the C-terminus with 3 HA epitopes (Dunoyer et al. 2004), or P19V5, and processed as in Figures 1D and 4A. P19V5-associated RNA from ES cells was used as a positive control for labeling. (B) Western blots to samples from (A) confirming successful P19V5/HA immunoprecipitation in 293T cells. (C) Expression of P19V5/NLS/HA does not inhibit siRNA-mediated knockdown of luciferase in 293T cells. 293T cells were transfected sequentially, first with P19V5/NLS/HA or β-GalactosidaseV5 (as a negative control), then again with luciferase plasmids, siRNA, and more P19V5/NLS/HA or βGal. Luciferase assays were performed 24 hours after the second transfection.

Table 1. Gross statistics of short RNAs cloned from indicated RNA starting material.

RNA source                             WCE sup   WCE P19V5 IP   NE sup   NE P19NLS IP
# of sequences cloned                  303       373            380      261
known ncRNAs cloned                    250       296            325      198
novel short RNAs                       53        77             55       63
avg length of clone (in nucleotides)   22±2      20±2           23±2     21±2
avg %GC of clone                       53±17     75±17          49±15    67±19
avg %GC of cloned ncRNA                53±17     76±16          50±15    68±18
avg %GC of novel clone                 52±13     71±21          47±13    63±20

Table 2. Percentage of cloned short RNAs from Table 1 mapping to selected genomic features.
RNA source: WCE sup, WCE P19V5 IP, NE sup, NE P19NLS IP
% of clones mapping to: known ncRNAs, miRs, miR*s, rRNAs, tRNAs, snRNAs, ESTs, known repeats, no match
82.5 46.9 2.0 29.4 3.6 79.4 85.5 54.5 3.4 22.1 4.5 75.9 16.9 0.7 2.6 2.3 11.9 7.5 0.0 69.4 0.5 1.9 2.1 2.1 13.7 0.4 51.0 5.7 1.1 1.9 3.7 2.6 7.4 5.7 3.1 14.6

[Figure 1. Panels A-D. Labels: P19V5 and P19NLS constructs (CMV, CIRV-p19); stains: Actin, DAPI, merged; fractions: WCE, CE, NE; GFP; GAPDH; Cyclin T1; 30 nt and 20 nt markers.]

[Figure 2. (A) Axes: # of hits (combined P19 IPs), # of hits (combined P19 sups); X-axis features: 18S, 5.8S, 28S. (B) Predicted duplexes 3-9 (dG = -19.9), 4-11 (dG = -16.9), and 9-9 (dG = -24.3).]

[Figure 3. (A) RNAs tested: siRNA, 5' complementary, 3' complementary, single-stranded #3. (B) Input [RNA] and bound RNA; bead-only IP; 30 nt and 20 nt markers. (C) X-axis: [RNA] nM; series: siRNA, 5' complementary, 3' complementary.]

[Figure 4. Panels A and B. Lanes: P19 and GFP in Dicer +/+, P19 in Dicer -/-; S and IP. Probes: RNA #3, miR295 (pre-miR295 indicated), U6 control; 100 bp and 20 bp markers.]

[Supplementary Figure 1. Panels A-C. Labels: Dicer +/+, Dicer -/-; Neg, GFP, P19V5, P19NLS; 3' UTR complementary sites (no site, RNA #3, RNA #5, RNA #13, miR295); 60 bp, 30 bp, and 20 bp markers; tRNA control.]

[Supplementary Figure 2. Panels A-C. Labels: GFP 293T, P19HA 293T, P19V5 293T, P19V5 ES; S and IP; 100 bp and 20 bp markers; WB: pre-IP sup, post-IP sup, αV5 IP, αHA IP; X-axis: [siRNA] pM.]

References

Alvarez-Garcia, I. and Miska, E.A. 2005. MicroRNA functions in animal development and human disease. Development 132(21): 4653-4662.
Ambros, V. 2004. The functions of animal microRNAs. Nature 431(7006): 350-355.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bartel, D.P. 2004.
MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Baulcombe, D. 2004. RNA silencing in plants. Nature 431(7006): 356-363.
Cam, H.P., Sugiyama, T., Chen, E.S., Chen, X., FitzGerald, P.C., and Grewal, S.I. 2005. Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat Genet 37(8): 809-819.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y., Feng, B., Lin, N., Madabusi, L.V., Muller, K.M. et al. 2002. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2.
Chapman, E.J., Prokhnevsky, A.I., Gopinath, K., Dolja, V.V., and Carrington, J.C. 2004. Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18(10): 1179-1186.
Cherry, S.R., Biniszkiewicz, D., van Parijs, L., Baltimore, D., and Jaenisch, R. 2000. Retroviral expression in embryonic stem cells and hematopoietic stem cells. Mol Cell Biol 20(20): 7419-7426.
Doench, J.G., Petersen, C.P., and Sharp, P.A. 2003. siRNAs can function as miRNAs. Genes Dev 17(4): 438-442.
Doench, J.G. and Sharp, P.A. 2004. Specificity of microRNA target selection in translational repression. Genes Dev 18(5): 504-511.
Dunoyer, P., Lecellier, C.H., Parizotto, E.A., Himber, C., and Voinnet, O. 2004. Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16(5): 1235-1250.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. 2005. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310(5755): 1817-1821.
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., and Eddy, S.R. 2003. Rfam: an RNA family database. Nucleic Acids Res 31(1): 439-441.
Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., and Bateman, A. 2005.
Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33(Database issue): D121-124.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S., Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J. et al. 2005. A microRNA polycistron as a potential human oncogene. Nature 435(7043): 828-833.
Houbaviy, H.B., Dennis, L., Jaenisch, R., and Sharp, P.A. 2005. Characterization of a highly variable eutherian microRNA gene. Rna 11(8): 1245-1257.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Jaenisch, R. 1997. DNA methylation and imprinting: why bother? Trends Genet 13(8): 323-329.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J. et al. 2003. The UCSC Genome Browser Database. Nucleic Acids Res 31(1): 51-54.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res 12(6): 996-1006.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lakatos, L., Szittya, G., Silhavy, D., and Burgyan, J. 2004. Molecular mechanism of RNA silencing suppression mediated by p19 protein of tombusviruses. Embo J 23(4): 876-884.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822): 860-921.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans.
Science 294(5543): 858-862.
Lecellier, C.H., Dunoyer, P., Arar, K., Lehmann-Che, J., Eyquem, S., Himber, C., Saib, A., and Voinnet, O. 2005. A cellular microRNA mediates antiviral defense in human cells. Science 308(5721): 557-560.
Lee, R.C., Hammell, C.M., and Ambros, V. 2006. Interacting endogenous and exogenous RNAi pathways in Caenorhabditis elegans. Rna 12(4): 589-597.
Li, E. 2002. Chromatin modification and epigenetic reprogramming in mammalian development. Nat Rev Genet 3(9): 662-673.
Lippman, Z. and Martienssen, R. 2004. The role of RNA interference in heterochromatic silencing. Nature 431(7006): 364-370.
Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A. et al. 2005. MicroRNA expression profiles classify human cancers. Nature 435(7043): 834-838.
Mayer, C., Schmitz, K.M., Li, J., Grummt, I., and Santoro, R. 2006. Intergenic Transcripts Regulate the Epigenetic State of rRNA Genes. Mol Cell 22(3): 351-361.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005. Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad Sci USA 102(34): 12135-12140.
Novina, C.D. and Sharp, P.A. 2004. The RNAi revolution. Nature 430(6996): 161-164.
Nykanen, A., Haley, B., and Zamore, P.D. 2001. ATP requirements and small interfering RNA structure in the RNA interference pathway. Cell 107(3): 309-321.
Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress translation after initiation in mammalian cells. Mol Cell 21(4): 533-542.
Plath, K., Mlynarczyk-Evans, S., Nusinow, D.A., and Panning, B. 2002. Xist RNA and the mechanism of X chromosome inactivation. Annu Rev Genet 36: 233-278.
Scholthof, H.B. 2006. The Tombusvirus-encoded P19: from irrelevance to elegance. Nat Rev Microbiol.
Silhavy, D., Molnar, A., Lucioli, A., Szittya, G., Hornyik, C., Tavazza, M., and Burgyan, J. 2002.
A viral protein suppresses RNA silencing and binds silencing-generated, 21- to 25-nucleotide double-stranded RNAs. Embo J 21(12): 3070-3080.
Sontheimer, E.J. and Carthew, R.W. 2005. Silence from within: endogenous siRNAs and miRNAs. Cell 122(1): 9-12.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. 2005. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123(6): 1133-1146.
Stewart, C.L., Stuhlmann, H., Jahner, D., and Jaenisch, R. 1982. De novo methylation, expression, and infectivity of retroviral genomes introduced into embryonal carcinoma cells. Proc Natl Acad Sci USA 79(13): 4098-4102.
Vagin, V.V., Sigova, A., Li, C., Seitz, H., Gvozdev, V., and Zamore, P.D. 2006. A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313(5785): 320-324.
Vargason, J.M., Szittya, G., Burgyan, J., and Tanaka Hall, T.M. 2003. Size selective recognition of siRNA by an RNA silencing suppressor. Cell 115(7): 799-811.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P., van Duijse, J., Drost, J., Griekspoor, A. et al. 2006. A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell 124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N., and Imai, H. 2006. Identification and characterization of two novel classes of small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915): 520-562.
Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D., Jacobsen, S.E., and Carrington, J.C. 2004.
Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2(5): E104.
Ye, K., Malinina, L., and Patel, D.J. 2003. Recognition of small interfering RNA by a viral suppressor of RNA silencing. Nature 426(6968): 874-878.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406-3415.

Chapter 3

RNA sequence analysis defines Dicer's role in mouse embryonic stem cells

This chapter appears in the context of its contemporary science, and is published in PNAS 104:18097-18102. The described experiments were an equal collaboration with Amy C. Seila, and also performed with Gene W. Yeo. All Supporting Figures referenced in the original manuscript are included in Chapter 3. The Supporting Text and associated Supporting Figures to the original manuscript are included as an Appendix to Chapter 3. All Supporting Tables and Alignment Files are published online on the Proceedings of the National Academy of Sciences' website.

Abstract

Short RNA expression was analyzed from Dicer-positive and Dicer-knockout mouse embryonic stem (ES) cells using high-throughput pyrosequencing. A correlation of miRNA quantification with sequencing frequency estimates that there are 110,000 miRNAs per ES cell, the majority of which can be accounted for by six distinct miRNA loci. Four of these miRNA loci or their human homologues have demonstrated roles in cell cycle regulation or oncogenesis, suggesting that a major function of the miRNA pathway in ES cells may be to shape their distinct cell cycle. 46 novel miRNAs were identified, most of which are expressed at low levels and are less conserved than the set of known miRNAs. Low abundance short RNAs matching all classes of repetitive elements were present in cells lacking Dicer, although the production of some SINE- and simple repeat-associated short RNAs appeared Dicer-dependent. These and other Dicer-dependent novel sequences resembled miRNAs.
At a depth of sequencing that approaches the total number of 5' phosphorylated short RNAs per cell, miRNAs appeared to be Dicer's only substrate. The results presented suggest a model in which repeat-associated miRNAs serve as host defenses against repetitive elements, a function canonically ascribed to other classes of short RNA.

Introduction

RNA interference (RNAi) is a conserved set of gene regulatory mechanisms in which short RNA molecules guide protein complexes to suppress expression of complementary nucleic acid targets. Different classes of short RNAs, complexed with specific Argonaute protein family members, either induce the degradation, prevent the translation, or prevent the transcription of their target RNA species (Tolia and Joshua-Tor 2007). In mammals, Argonaute proteins are thought to associate predominantly with a class of non-coding RNA genes termed microRNAs (miRNAs). miRNAs are essential regulators of diverse biological processes, including cell division, apoptosis, and metabolism (Kloosterman and Plasterk 2006). miRNA precursors are processed sequentially by the enzymes Drosha and Dicer to yield mature ~22 nucleotide (nt) long single-stranded miRNAs (Bartel 2004). miRNAs are thought to primarily influence gene expression by preventing productive translation of target mRNAs, though recent studies suggest that they may have other mechanisms of action (Hwang et al. 2007; Nilsen 2007).

Other classes of short RNAs mediate different types of RNAi-based silencing. In Arabidopsis and S. pombe, Argonaute-associated short-interfering RNAs (siRNAs) cleave repetitive transcripts and nucleate heterochromatin at genomic repeats (Lippman and Martienssen 2004). These siRNAs require Dicer and an RNA-dependent RNA polymerase (RdRP) for biogenesis. Potentially analogous siRNA species were identified in mouse oocytes, although it is not clear if these oocyte siRNAs nucleate heterochromatin (Watanabe et al. 2006).
In animal germ cells, the Argonaute subfamily of Piwi proteins associate with Dicer-independent short RNAs, termed piRNAs. Like Arabidopsis and S. pombe siRNAs, piRNAs are thought to silence repetitive sequences at the level of transcription (O'Donnell and Boeke 2007). Finally, in C. elegans, endogenous siRNAs exist that are thought to silence protein-coding genes at the post-transcriptional level (Ambros et al. 2003b). These siRNAs also require Dicer and RdRPs for biogenesis, and likely have 5' di- or tri- instead of 5' mono-phosphates (Lee et al. 2006; Ruby et al. 2006). Embryonic stem (ES) cells are derived from the inner cell mass of the blastocyst during the stage of development where epigenetic patterns of gene regulation are reestablished in preparation for implantation (Reik et al. 2001). ES cells can be propagated in vitro without the loss of pluripotency and induced to differentiate into specialized cell types when given appropriate cues, making them potential sources of tissue in regenerative therapies (Mayhall et al. 2004). Many cancers also have stem cell-like characteristics, underscoring the clinical relevance of ES cell biology (Jones and Baylin 2007). Despite recent fundamental advances in the understanding of global ES cell chromatin architecture, much remains to be learned about the mechanisms by which ES cells maintain the pluripotent state (Spivakov and Fisher 2007). Specifically, ES cells lacking Dicer are viable, but are incapable of differentiation and display severe growth defects, indicating that the RNAi pathway is required for pluripotency and aspects of ES cell division (Kanellopoulou et al. 2005; Murchison et al. 2005). Presumably, these defects are due to loss of miRNA biogenesis and not other types of short RNAs, as previous sequencing of short cDNA libraries revealed miRNAs to be the predominant class of short RNA in mouse ES cells (Houbaviy et al. 2003; Calabrese and Sharp 2006). 
However, Dicer is critical to the biogenesis of almost all classes of short RNAs described, with the potential exception of piRNAs; thus, it is possible that other previously unidentified RNAs contribute to the Dicer null ES cell mutant phenotype.

To further our understanding of Dicer function and the mechanisms by which short RNAs mediate gene regulation in ES cells, short RNA expression was profiled in four independently derived ES cell cDNA libraries, including a library made from Dicer null ES cells. From quantification of miRNA levels, we estimate that there are 130,000 5' phosphorylated short RNAs per ES cell. 15% of these RNAs are generated independently of Dicer, and consist of: short non-coding RNA (ncRNA) fragments, promoter proximal RNAs (described elsewhere, in preparation), presumed breakdown products of mRNAs, and low-abundance, highly repetitive sequences. The remaining 85% of 5' phosphorylated ES cell short RNAs consist of miRNAs or miRNA-like species that depend on Dicer for biogenesis. The majority of ES cell miRNAs appear to be generated by six distinct loci, four of which have been implicated in cell cycle control or oncogenesis. Notably, poorly conserved ES cell miRNA hairpins tend to overlap annotated repetitive elements, potentially connecting the miRNA pathway to host defense against accumulated repeats.

Results

Global statistics of short cDNA libraries. Four separate short cDNA libraries made from mouse ES cells were sequenced using high throughput pyrosequencing (Margulies et al. 2005). To determine whether classes of short RNAs other than miRNAs depend on Dicer for biogenesis, short cDNA libraries were made from a floxed Dicer ES cell line before and several months after deletion of the floxed region containing the key catalytic residues of Dicer's second RNAse III domain (referred to as libraries "Dicer+/+" and "Dicer-/-", respectively). This Dicer deletion cell line has been used in previous studies (Calabrese and Sharp 2006; Leung et al. 2006) and largely recapitulates the phenotypic defects observed from earlier studies of Dicer loss in mouse ES cells (Supporting Information (SI) Figure 4) (Kanellopoulou et al. 2005; Murchison et al. 2005). Additionally, to determine if changes in DNA methylation correlate with expression of novel classes of mammalian short RNAs, libraries were sequenced from J1 ES cells before and five days after treatment with the DNA methyltransferase inhibitor 5-aza-deoxycytidine (referred to as libraries "J1" and "J1aza", respectively; SI Figure 5). The rationale for this experiment was based on observations made in Arabidopsis, where production of short RNAs by the RNAi pathway stimulates DNA methylation at certain classes of repetitive elements (Lippman and Martienssen 2004). Subsequent sequencing and analysis indicated few significant differences between the J1 and J1aza cDNA libraries (data not shown), and for the purpose of this study they were treated primarily as expression replicates for Dicer-containing ES cell libraries. Because of strain and sex chromosome differences between J1 and Dicer+/+ ES cells, reads have only been compared between the Dicer+/+ and Dicer-/- libraries when considering the consequences of Dicer loss.

In total, the four libraries contained 418,093 reads representing 79,265 distinct sequences (Table 1). We focused our analysis on the 298,039 reads representing 29,016 distinct sequences that matched the mouse genome with 100% identity over their entire length. On average, 82% of all reads from the Dicer-positive libraries matched annotated miRNA hairpins, while 11% of reads matched other known ncRNAs (rRNAs, tRNAs, snRNAs, etc.), and 7% of reads were previously uncharacterized short RNAs (referred to as "novel" sequences; Table 1).
As expected, the Dicer-/- library was nearly devoid of miRNAs, and was instead composed of other known ncRNAs (69%) and novel sequences (31%; Table 1).

Expression and analysis of known miRNAs. To validate that the cDNA libraries accurately recapitulated short RNA expression in ES cells, the absolute numbers of seven known miRNAs were determined in J1 and Dicer+/+ ES cells using the Direct miRNA assay (Table 2) (Neely et al. 2006). The Pearson correlation coefficients between the miRNA quantification and sequencing frequencies in the J1 and Dicer+/+ libraries were 0.62 and 0.95, respectively. Correlating miRNA quantification to sequencing frequency, we conclude that a single ES cell contains approximately 110,000 miRNAs from a total pool of 130,000 5' phosphorylated short RNAs. The calculated number of miRNAs per femtogram of total ES cell RNA is 5.4±1. The number of reads obtained for each library approaches the total number of 5' phosphorylated short RNAs per ES cell; thus, each cDNA library can be considered an accurate sampling of the spectrum of 5' phosphorylated short RNAs in a single ES cell.

With this in mind, the Dicer+/+ and J1 libraries were used to determine the most abundantly expressed ES cell miRNAs. Averaging values from the Dicer+/+ and J1 libraries estimates that 27 ES cell miRNAs are expressed above 1,000 molecules per cell, with the most abundant present at about 5,000 molecules per cell (SI Table 4). When considering the 126 miRNAs that are expressed at 50 or more molecules per cell, the average and median miRNA expression per cell are 713 and 231 molecules, respectively (SI Table 4). The majority of miRNAs in both ES cell lines could be accounted for by six genomic loci, representing 76% and 69% of Dicer+/+ and J1 miRNAs, respectively (Table 3; SI Table 4, 5).
These include: the miR15b/miR16 cluster, the miR17-92 cluster, miR21, the miR290-295 cluster, a repetitive miRNA cluster on chromosome 2 (SI Table 7), and an imprinted miRNA cluster on chromosome 12 (SI Table 7) (Seitz et al. 2004). Certain of these miRNAs, specifically miR16 and several in the miR17-92 cluster, have multiple genomic locations that may contribute to expression. There were significant differences in expression of two of these miRNA clusters between J1 and Dicer+/+ ES cells, possibly due to differences in strain or sex. J1 ES cells appear to express the chromosome 12 cluster in higher abundance than Dicer+/+ ES cells, while Dicer+/+ ES cells appear to express the chromosome 2 cluster in higher abundance than J1 ES cells. The other four miRNA loci appeared quite similar in expression between the two cell types.

Validation of known miRNAs. Comparison of the Dicer+/+ and Dicer-/- libraries allowed for the genetic validation of miRNAs expressed in ES cells, as true miRNAs should be absent in the Dicer-/- library. Six annotated miRNA hairpins expressed in the Dicer-/- library had exact matches to ribosomal or small-nuclear ncRNAs and are thus probably incorrectly designated as miRNAs (denoted as "ncRNA" in SI Table 5). There were 2.5 times as many reads matching these six miRNA hairpins in the Dicer-/- library as in the Dicer+/+ library, consistent with their being generated from Dicer-independent processing of abundant ncRNA transcripts and not miRNA hairpins. Excluding these six hairpins, the overall ratio of Dicer+/+ to Dicer-/- reads was 213:1 for 240 miRNA hairpins present in the Dicer+/+ and Dicer-/- libraries. This clear Dicer-dependence of miRNA expression indicates that the previous annotation of mammalian miRNAs has been an accurate process.
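The read-ratio comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used for the libraries: the function name, the hairpin identifiers, and the read counts are all hypothetical placeholders, chosen only so that the arithmetic is easy to follow.

```python
def dicer_read_ratio(wt_counts, ko_counts, excluded=()):
    """Ratio of total Dicer+/+ reads to total Dicer-/- reads across hairpins.

    wt_counts and ko_counts map a hairpin identifier to its read count in the
    Dicer+/+ and Dicer-/- library. Hairpins listed in `excluded` (for example,
    annotations that exactly match rRNAs or snRNAs) are left out of both totals.
    """
    hairpins = (set(wt_counts) | set(ko_counts)) - set(excluded)
    wt_total = sum(wt_counts.get(h, 0) for h in hairpins)
    ko_total = sum(ko_counts.get(h, 0) for h in hairpins)
    if ko_total == 0:
        raise ValueError("no Dicer-/- reads among the selected hairpins")
    return wt_total / ko_total


# Hypothetical counts for illustration only; not data from the libraries.
wt = {"hairpin_a": 950, "hairpin_b": 250, "ncRNA_like": 10}
ko = {"hairpin_a": 4, "hairpin_b": 2, "ncRNA_like": 25}

ratio = dicer_read_ratio(wt, ko, excluded={"ncRNA_like"})
print(f"Dicer+/+ : Dicer-/- = {ratio:.0f}:1")
```

In this toy input, the excluded entry behaves like the ncRNA-matching hairpins in the text: it has more reads in the Dicer-/- library than in the Dicer+/+ library, so it is removed before the ratio is taken.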
Hypothesizing that a low level of Dicer-independent cleavage of pre-miRNA hairpins generated the few miRNA-matching reads in the Dicer-/- library, we further examined the sequence characteristics of the Dicer-/- miRNAs. Consistent with this hypothesis, the lengths of the Dicer-/- miRNA reads were more broadly distributed than the lengths of the Dicer+/+ miRNA reads (SI Figure 6A). 58% of the Dicer-/- miRNA reads were 21-23 nt long, compared to 91% of the Dicer+/+ miRNA reads (p=7e-14). This difference was striking considering the similarity of the size distributions for all other known ncRNAs between the Dicer+/+ and Dicer-/- libraries (SI Figure 6A).

Next, we examined the extent of miRNA processing variability in each library, defined here as the proportion of miRNA-matching reads that do not match the annotated 5' and 3' ends of mature miRNA sequences. Drosha defines the 5' ends of mature miRNAs from the 5' arm of pre-miRNA hairpins and the 3' ends of mature miRNAs from the 3' arm of pre-miRNA hairpins; Dicer defines the 5' ends of mature miRNAs from the 3' arm of pre-miRNA hairpins and the 3' ends of mature miRNAs from the 5' arm of pre-miRNA hairpins (SI Figure 6B). If Dicer-/- miRNA reads were excised from pre-miRNA hairpins by a Dicer-independent mechanism, more miRNA processing variability might be expected in the Dicer-/- as compared to the Dicer+/+ library. Also, the ends of Dicer-/- miRNAs that would normally be defined by Dicer might show greater processing variability compared to those defined by Drosha. Supporting these ideas, miRNA reads exhibited more processing variability in the Dicer-/- as compared to the Dicer+/+ library (SI Figure 6C, D), and, though Dicer-processed miRNA ends showed more variability compared to Drosha-processed miRNA ends in all four libraries, this difference was greatest in the Dicer-/- library (SI Figure 6C, D).
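As a rough illustration of the processing-variability measure defined above, the following Python sketch computes, for one mature miRNA, the fraction of reads whose 5' end and whose 3' end differ from the annotated ends. The coordinate convention, the function name, and the read counts are assumptions made for the example, not the procedure or data behind SI Figure 6.

```python
def end_variability(reads, annotated_start, annotated_end):
    """Fractions of reads whose 5' and 3' ends differ from the annotation.

    reads is a list of (start, end, count) tuples, where start and end are
    hairpin coordinates of the read's 5' and 3' ends and count is the number
    of times that read was sequenced.
    """
    total = sum(count for _, _, count in reads)
    if total == 0:
        raise ValueError("no reads supplied")
    five_prime = sum(c for start, _, c in reads if start != annotated_start)
    three_prime = sum(c for _, end, c in reads if end != annotated_end)
    return five_prime / total, three_prime / total


# Hypothetical reads for a miRNA on the 5' arm of its hairpin, where the
# 5' end is set by Drosha and the 3' end by Dicer.
reads = [(10, 31, 80), (10, 32, 15), (11, 31, 5)]
drosha_end_var, dicer_end_var = end_variability(reads, 10, 31)
print(drosha_end_var, dicer_end_var)
```

For a miRNA on the 3' arm the roles of the two ends are swapped, so the first returned value would describe the Dicer-defined end and the second the Drosha-defined end.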
While we cannot formally exclude the possibility that some miRNAs in the Dicer-/- library could be due to cross-contamination from Dicer-positive libraries, these clear differences in expression characteristics suggest that many of the miRNAs in the Dicer-/- library were generated by inefficient Dicer-independent processing of pre-miRNA hairpins.

Annotation of novel miRNAs. Using guidelines for miRNA annotation established by Ambros et al. (2003a), and incorporating rules for Drosha processing of primary miRNA transcripts (Zeng et al. 2005; Han et al. 2006), 46 novel miRNAs were identified in the Dicer-positive libraries (see SI; SI Table 6; SI Alignment File). These 46 novel miRNA hairpins generate miRNAs with 42 distinct seeds, defined as bases 2-7 from the 5' end of the miRNA (Lewis et al. 2005). 40 of these 42 seeds are novel. As a group, the novel miRNAs are expressed at low levels in ES cells and are less conserved than the set of known miRNAs (Figure 1A). Despite their low expression levels, most of the novel miRNAs were consistently present in each Dicer-containing ES cell library. 36 of the 46 novel miRNAs were sequenced in at least two of the three libraries made from ES cells with functional Dicer, with 21 of these being present in all three Dicer-containing libraries. 20 of the novel miRNAs mapped into large clusters of previously identified miRNAs on chromosomes 2, 12, and X (SI Table 7). Out of the remaining 26 novel miRNA hairpins, only 2 were located within 5 kb of a known miRNA.

Consistent with the novel miRNAs being less conserved than the set of known miRNAs, 24 of the 46 novel miRNA hairpins overlapped at least partially with annotated repetitive elements. By comparison, only 31 known miRNA hairpins overlap repeats in the set of 360 mouse miRNAs that map to the mm7 build of the mouse genome (Figure 1B). As expected, the proportion of miRNA hairpins overlapping repeats decreases as miRNA conservation increases (Figure 1A).
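The seed definition used above (bases 2-7 from the 5' end of the miRNA) translates directly into code. The Python sketch below extracts seeds and reports which seeds in a candidate set are absent from a reference set; the helper names are hypothetical and the sequences are invented toy strings, not miRNAs from this study.

```python
def seed(mirna):
    """Return bases 2-7, counted from the 5' end, of a mature miRNA."""
    if len(mirna) < 7:
        raise ValueError("sequence too short to contain a seed")
    return mirna[1:7].upper()


def distinct_seeds(mirnas):
    """Set of distinct seeds represented in a collection of miRNAs."""
    return {seed(m) for m in mirnas}


def novel_seeds(candidates, known):
    """Seeds present among the candidates but absent from the known set."""
    return distinct_seeds(candidates) - distinct_seeds(known)


# Invented toy sequences for illustration only.
candidates = [
    "UACGUACGUACGUACGUACGUA",
    "GACGUACUUUUUUUUUUUUUUU",  # shares its seed with the first candidate
    "UGGGCCCAAAUUUGGGCCCAAA",
]
known = ["AGGGCCCUUUUUUUUUUUUUUU"]

print(sorted(distinct_seeds(candidates)))
print(sorted(novel_seeds(candidates, known)))
```

Because two of the three toy candidates share a seed, the number of distinct seeds is smaller than the number of sequences, which is the same reason 46 hairpins can yield only 42 distinct seeds.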
Analysis of repeat-overlapping novel reads. A small number of short RNAs overlapping highly repetitive sequences existed in each of the four libraries, defined as those sequences with at least 20 exact matches to the genome (SI Table 8; see SI for further analysis). The 1,211 unique sequences in this group were represented by 1,991 reads, and had 3,935,923 total hits to the genome covering approximately 48 Mb of DNA. Based on correlations of miRNA quantification with sequencing frequency (Table 2), as a class these repetitive RNAs are present at approximately 225-750 copies per ES cell. There were no strong biases in the first nucleotide or length of these highly repetitive short RNAs, although there were slightly more sequences beginning with 'U' as compared to the set of novel sequences with fewer than 20 matches to the genome (Figure 2A). Examining the length distribution of 100 repetitive sequences, we observed a peak above background at 22 nt (Figure 2B). This peak is due solely to a Dicer-independent short RNA that is antisense to the primer-binding site of the early transposon (ETn) repeat, an endogenous retrovirus abundantly expressed in the early mouse embryo and ES cells (Maksakova and Mager 2005).

The proportions of repetitive sequences overlapping SINE and simple repeats were significantly lower in the Dicer-/- as compared to the Dicer+/+ library (Figure 2C). This suggests either that certain SINE- and simple repeat-associated RNAs are processed by Dicer from precursor dsRNA structures, or that a transcriptional difference between Dicer+/+ and Dicer-/- cells results in differential expression of these short RNAs. Northern blots showed no significant difference in full-length SINE B1 RNA levels between Dicer+/+ and Dicer-/- ES cells (SI Figure 7), arguing against the latter hypothesis. In contrast, short RNAs overlapping centromeric satellite repeats, LINEs, and LTR elements were clearly not dependent on Dicer for biogenesis (Figure 2C).
This was surprising, as previous studies have suggested that Dicer-dependent siRNAs processed from long dsRNA precursors are important for silencing of these elements (Kanellopoulou et al. 2005; Watanabe et al. 2006; Yang and Kazazian 2006). The Dicer-/- ES cells analyzed here maintain genomic DNA methylation at satellite repeats and LINEs (SI Figure 4E), demonstrating that RNAi is not required for maintenance of global repeat methylation and suggesting that loss of centromeric silencing in certain Dicer null ES cell lines may be an indirect effect of Dicer loss (Kanellopoulou et al. 2005).

Few non-miRNA Dicer-dependent sequences are expressed in ES cells. Because Dicer is involved in the production of short RNAs other than miRNAs in several organisms, we next sought to determine what non-miRNA short RNAs might be Dicer-dependent in ES cells. Sequences present at least 3 times in the Dicer+/+ library and absent in the Dicer-/- library were flagged as potentially dependent on Dicer for biogenesis (referred to as "Dicer-dependent" below) and subjected to further analysis. There were 50 distinct sequences, represented by 233 reads in the Dicer+/+ and 139 reads in the J1 and J1aza libraries, which matched these criteria and were not annotated above as novel miRNAs. Consistent with their being Dicer products, the length distribution of these sequences peaked more sharply at ~21 nt when compared to all other novel sequences in the Dicer+/+ library (Figure 3A; p=4.0e-5). The Dicer-dependent short RNAs are biased towards sequences that begin with 'A' as compared to the set of all novel reads, though this bias is not as strong as the 'U' bias seen for known miRNAs (Figure 3B). As expected from the analysis of highly repetitive reads, these sequences were enriched in SINE and simple repeat elements as compared to the set of novel sequences that did not meet the criteria for Dicer-dependence (SI Figure 8).
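The flagging rule stated above (at least 3 reads in the Dicer+/+ library, none in the Dicer-/- library, and not already annotated as a novel miRNA) can be written as a simple filter. The Python sketch below is an assumed implementation for illustration; the function name, its parameters, and the sequence labels and counts are hypothetical.

```python
def flag_dicer_dependent(wt_counts, ko_counts, annotated_mirnas=frozenset(),
                         min_wt_reads=3):
    """Return sequences flagged as potentially Dicer-dependent.

    wt_counts and ko_counts map each distinct sequence to its read count in
    the Dicer+/+ and Dicer-/- library. A sequence is flagged when it has at
    least min_wt_reads reads in Dicer+/+, no reads in Dicer-/-, and is not in
    annotated_mirnas.
    """
    flagged = []
    for seq, n_wt in wt_counts.items():
        if n_wt < min_wt_reads:
            continue  # too few reads to evaluate
        if ko_counts.get(seq, 0) > 0:
            continue  # still present without Dicer
        if seq in annotated_mirnas:
            continue  # already annotated as a miRNA
        flagged.append(seq)
    return sorted(flagged)


# Hypothetical read counts keyed by placeholder sequence labels.
wt_counts = {"seq_a": 5, "seq_b": 2, "seq_c": 7, "seq_d": 3}
ko_counts = {"seq_c": 1}
annotated = {"seq_d"}

candidates = flag_dicer_dependent(wt_counts, ko_counts, annotated)
print(candidates)
```

The threshold is exposed as a parameter because, as the text notes later, sequences below it are not evaluated at all rather than being classed as Dicer-independent.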
Two groups of Dicer-dependent sequences, composed of 48 and 87 reads, were related in sequence (Figure 3C). Both of these sequence groups appeared to be repeat-derived, with Group 1 composed entirely of SINE B1 overlapping reads, and Group 2 displaying more heterogeneity with respect to its repeat overlap (Figure 3C).

The possibility that Dicer-dependent sequences represent endogenous siRNAs processed by Dicer from long double-stranded RNA was examined. Endogenous siRNAs processed from a single precursor would be expected to cluster near other short RNA sequences. In contrast, Dicer-dependent novel sequences do not cluster with any greater frequency than novel sequences not defined as Dicer-dependent. In both sets, 22% of novel sequences fell within 500 bases of at least one other short RNA from the set of 25,040 non-repetitive sequences present in all four libraries (10 out of 45 Dicer-dependent sequences, and 2,493 out of 11,493 other novel sequences; non-repetitive sequences were defined as having <20 matches to the genome). Moreover, of the 10 Dicer-dependent sequences that did cluster near other short RNA loci, eight overlapped protein-coding genes in the sense orientation, again not consistent with these sequences being canonical siRNAs involved in gene silencing processes.

Instead of representing a class of endogenous siRNAs, it seems likely that many of these Dicer-dependent sequences are miRNA-like reads whose surrounding genomic sequences did not form prototypical miRNA hairpins. The two groups of related Dicer-dependent sequences support this hypothesis (Figure 3C). The five SINE B1 associated sequences from Group 1 aligned to hairpins which were miRNA-like, but did not meet the minimum requirements for miRNA hairpin base-pairing used in this study (SI Alignment File).
The Group 2 sequences are related to known miRNAs on Chromosome 2 (SI Table 7), and two sequences from this group also aligned to miRNA-like hairpins with poorly defined secondary structure (SI Alignment File). Again, these observations are consistent with Group 2 sequences being miRNA-like and not siRNA-like in origin.

Sequences present fewer than three times in the Dicer+/+ library were not evaluated for Dicer-dependence, because the transcriptional programs of Dicer+/+ and Dicer-/- ES cells are likely quite different and minor differences in short RNA expression between the two cell types would be expected. There remained 1,096 novel sequences, each represented by fewer than three reads in the Dicer+/+ library, which were absent in the Dicer-/- library and potentially dependent on Dicer for biogenesis. While some are expected to be Dicer products, as a class they clearly differed from the Dicer-dependent sequences described above; most notably, these sequences exhibited a broad length distribution uncharacteristic of Dicer products (SI Figure 9). Thus, if non-miRNA Dicer-dependent short RNAs are expressed in ES cells, they are beyond the limits of detection in the cDNA libraries analyzed here.

Discussion

Of the estimated 130,000 5' phosphorylated short RNAs in an ES cell, roughly 85% are Dicer-dependent miRNAs or miRNA-like species and 15% are Dicer-independent short RNAs. These Dicer-independent RNAs consist primarily of short ncRNA species, promoter proximal RNAs that are likely the products of paused RNA polymerase II (described elsewhere, in preparation), presumed breakdown products of mRNAs, and highly repetitive short RNA sequences.

At a depth of sequencing approaching the total number of 5' phosphorylated short RNAs per ES cell, the miRNA was the only class of short RNA found to be Dicer-dependent. Other classes of Dicer-dependent short RNAs found in many non-mammalian organisms do not appear to be expressed in ES cells.
Specifically not observed were the Dicer-dependent heterochromatic siRNAs, analogous to those seen in Arabidopsis and S. pombe, that have been proposed to guide the silencing of ES cell centromeric repeats (Kanellopoulou et al. 2005). While short RNAs corresponding to highly repetitive sequences were detected at low levels in the ES cells analyzed here, their biogenesis was Dicer-independent. Moreover, the potential mammalian counterparts to these siRNAs, piRNAs, were also not detected in the analyzed libraries, nor were C. elegans-like siRNAs that are anti-sense to mRNAs (see SI). Direct comparison of the Dicer+/+ and Dicer-/- libraries did detect a small number of sequences, representing 0.5% of all Dicer+/+ reads, which appeared Dicer-dependent and were not annotated as miRNAs; however, many of these sequences appeared miRNA-like. In summary, the presented data strongly favor the hypothesis that Dicer's sole catalytic role in ES cells is to produce miRNAs, and that the phenotypic consequences of ES cell Dicer deletion are due solely to miRNA loss (Kanellopoulou et al. 2005; Murchison et al. 2005).

In total, 323 distinct known and novel miRNA sequences were observed in the J1 and Dicer+/+ libraries. The most abundant of these have implied functions consistent with the severe growth defects of Dicer null ES cells; miR21, the miR17-92 cluster, the miR15b/16 cluster, and the miR290-295 cluster, or their human homologues, have demonstrated roles in cell-cycle regulation or oncogenesis (He et al. 2005; Si et al. 2006; Voorhoeve et al. 2006; Linsley et al. 2007). Almost half of the 110,000 ES cell miRNAs can be accounted for by these four loci, suggesting that a major function of the miRNA pathway in ES cells is to contribute to the control of cell division.

Close to two-thirds of the 323 ES cell miRNAs are expressed at less than 50 copies per cell.
A subset of these lowly expressed miRNAs may play important roles in defining the ES cell state; however, many may have more critical roles in cell types other than ES cells, especially those that are the most conserved. Considering the latter possibility, their apparent ES cell expression could be due to the existence of a small number of differentiated cells within a larger population of undifferentiated ES cells. Alternatively, the diverse set of lowly expressed miRNAs might reflect the heterogeneity of regulatory systems inherent within a pluripotent ES cell population.

Many of the least conserved ES cell miRNA hairpins overlap annotated repetitive elements, suggesting that particular miRNAs may partially function to silence complementary repeat-containing RNAs (Smalheiser and Torvik 2005; Piriyapongsa et al. 2007). This repression could occur through a canonical miRNA-based targeting mechanism, resulting in the translational inhibition and targeting to cellular processing bodies of repeat-containing RNAs with seed complements to repeat-derived miRNAs. Alternatively, the most repetitive miRNA sequences have the potential to direct cleavage of transcripts with perfect or near perfect complementarity. Finally, in certain cases, it is possible that recognition of the miRNA hairpin itself may be the initiating signal for a silencing event in cis.

In mouse oocytes, repetitive sequences appear to be under Dicer-dependent repression. Certain repeat-containing mRNAs were found to be expressed at higher levels in Dicer-/- compared to Dicer+/+ oocytes (Murchison et al. 2007). Further, expression of EGFP reporters with retrotransposon-derived 3'UTRs was repressed in mouse oocytes (Watanabe et al. 2006). These repressive effects were conjectured to be due to endogenous siRNA species arising from genomic repeats (Watanabe et al. 2006; Murchison et al. 2007).
Similarly, LINE retrotransposition has been proposed to be repressed by Dicer-dependent siRNA species in human cells (Yang and Kazazian 2006). The apparent absence of analogous siRNA species in mouse ES cells, coupled with the observed relationship between miRNAs and repetitive elements, suggests that in certain contexts the miRNA pathway may perform functions canonically thought of as siRNA-specific. This hypothesis argues for the re-evaluation of repressive effects associated with mammalian repetitive elements, and potentially has important implications during early mouse development, where repetitive element expression is dynamic (Peaston et al. 2004).

Methods

ES cell culture and manipulation. Generation of Dicer+/+ and Dicer-/- ES cells, and of J1aza RNA, is described in the SI. miRNA quantification was performed essentially as described (Neilson et al. 2007). Briefly, trypsinized ES cells were counted and lysed directly in Trizol. 1.5 or 3 picomoles of single-stranded siRNA was spiked into Trizol solutions and quantified to normalize for short RNA recovery. From 15 preparations, the average total RNA per ES cell was 20 pg and the average short RNA recovery was 76%. miRNA levels were quantified using the Direct assay (Neely et al. 2006). miRNA molarity per sample was determined by comparison to standard curves of synthetic miRNAs and normalized for short RNA recovery. miRNA per cell values were obtained by dividing miRNA copy number per sample by the number of ES cell equivalents of RNA measured per assay. The number of 5' phosphorylated short RNAs per ES cell, 130,000, was obtained by dividing the miRNA copy number per cell by the sequencing frequency of each quantified miRNA (SI Table 4) and taking the average for 7 miRNAs quantified in J1 and Dicer+/+ ES cells. Mature miRNAs sequenced per library included those truncated on their 3' end by one nucleotide, and those extending beyond the annotated 3' end.
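The arithmetic behind the per-cell estimates can be illustrated with a short Python sketch. It assumes that "sequencing frequency" means the fraction of library reads assigned to a given miRNA; the function names and all input values are hypothetical and chosen only to make the calculation easy to follow, so the output is not the 130,000 figure reported above.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copies_per_cell(moles_measured, recovery, cell_equivalents):
    """Convert a measured miRNA amount to copies per cell, correcting
    for the fraction of short RNA recovered (for example 0.76)."""
    molecules = moles_measured * AVOGADRO
    return molecules / recovery / cell_equivalents

def total_short_rnas_per_cell(copies, reads, library_size):
    """Average, over the quantified miRNAs, of copies per cell divided
    by sequencing frequency (reads / library_size)."""
    estimates = [copies[m] * library_size / reads[m] for m in copies]
    return sum(estimates) / len(estimates)

# Hypothetical inputs for two miRNAs in a library of 10,000 reads.
copies = {"miR-a": 1000, "miR-b": 500}   # copies per cell
reads = {"miR-a": 100, "miR-b": 40}      # reads in the library

estimate = total_short_rnas_per_cell(copies, reads, library_size=10_000)
print(round(estimate))
```

Each miRNA gives its own estimate of the total pool (100,000 and 125,000 in this toy case), and the reported value is their mean. Averaging over several miRNAs dampens the effect of any one species that is over- or under-represented by sequencing.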
Short cDNA library preparation and read processing. Short cDNA libraries were made as described (Neilson et al. 2007). Gel purifications of short RNA/DNA species extended from 16 to slightly past 30 nt. Downstream analysis was performed on sequences with perfect matches to either: the NCBI build 35 of the mouse genome (mm7), miRBase 8.2 (Griffiths-Jones 2004; Griffiths-Jones et al. 2006), tRNA sequences (Lowe and Eddy 1997), the NONCODE RNA database (Liu et al. 2005), ENSEMBL non-coding RNAs (Hubbard et al. 2005), or the complete rDNA repeating unit (Grozdanov et al. 2003). Conservation and repeat information was obtained using the UCSC table browser (Karolchik et al. 2004); see SI for details.

Novel miRNA annotation. Novel miRNAs were annotated according to pre-established guidelines, also incorporating rules for Drosha processing of primary miRNA transcripts (Ambros et al. 2003a; Zeng et al. 2005; Han et al. 2006); see SI for details. 16 of the 46 novel miRNAs were verified by other studies at the time of submission (Rfam version 10).

Sequence information. Analyzed sequences are provided in SI Tables 9-13.

Acknowledgements

We thank A. Young and J. Neilson for critical reading of this manuscript, G. Zheng, C. Whittaker, and G. Ruby for advice on bioinformatic analysis, M. Lindstrom for figure help, and the Broad Institute for pyrosequencing. Thanks to D. Livingston for Dicer antibodies. ACS was supported by NIH fellowship 5-F32-HD051190 and GWY was funded by the Crick-Jacobs Center for Theoretical and Computational Biology. This work was supported by United States Public Health Service grants R01-GM34277 from the NIH, P01-CA42063 from the NCI to PAS and partially by grant P30-CA14051 from the NCI.

Table and Figure legends.

Table 1. Composition of cDNA libraries analyzed, represented as percentages of the total number of reads matching the August 2005 build of the mouse genome ("match mm7"). Also shown is the total number of reads sequenced in each library ("all reads").

Table 2. Direct quantification of specific miRNAs per ES cell. The measured miRNA copy number is compared to the sequencing frequency per 130,000 reads in the J1 and Dicer+/+ libraries. Error is the SEM from 2 to 21 triplicate measurements.

Table 3. The major miRNAs expressed in ES cells. The genomic location and miRNAs contained in the chr2 and chr12 clusters are described in SI Table 7.

Figure 1. Conservation, expression, and repeat overlap of known and novel miRNA hairpins. (A) Conservation and ES cell expression of known and novel miRNA hairpins. The percentage of miRNA hairpins overlapping repeats is bracketed for three bins of conservation. (B) Repeatmasker overlap of known and novel miRNA hairpins. Numbers refer to the total number of miRNA hairpins in each category. "Multiple" refers to those hairpins overlapping more than one class of repeat.

Figure 2. Analysis of highly repetitive novel sequences. (A) First nucleotide distribution of highly repetitive novel sequences (≥ 20 hits to the genome) compared to non-repetitive novel sequences (< 20 hits to the genome), and known miRNAs. (B) Length distribution of highly repetitive novel sequences compared to all non-repetitive novel sequences. (C) Repeatmasker classification of highly repetitive novel sequences, represented as proportions of novel reads per library. The number of novel reads per library is in parentheses.

Figure 3. Description of Dicer-dependent novel sequences. (A) Length distribution of Dicer-dependent novel sequences compared to all other Dicer+/+ and Dicer-/- novel sequences. (B) First nucleotide distribution of Dicer-dependent novel sequences. (C) Two groups of Dicer-dependent sequences share sequence similarity. Shown are identified sequence motifs along with aligning sequences, total reads by library, number of genome matches, and overlapping repeats.

SI Figure 4. Characterization of Dicer+/+ and Dicer-/- ES cells used in this study. (A) Schematic of the conditional allele of Dicer from (Harfe et al. 2005) and (B) genotyping PCR confirming successful deletion of the floxed region of Dicer after clonal selection of Cre-treated Dicer+/+ ES cells. Deletion of Dicer Exon 24 from ES cells resulted in a decrease in proliferation rate followed by a growth recovery after several weeks in culture, as in (Murchison et al. 2005) (not shown). (C) PCR assay for sex-chromosome determination. Dicer+/+ and Dicer-/- ES cells appear female, shown by the absence of SRY. The microsatellite nds serves as a positive control. (D) Western blot confirms loss of detectable Dicer in Dicer-/- ES cells. (E) Genomic repeats maintain DNA methylation in the absence of Dicer. DNA from Dicer+/+, Dicer-/-, and DNMT1 null ES cells (labeled as "c/c") was digested with the methylation sensitive restriction enzyme Hpa II or its methylation insensitive isoschizomer, Msp I. Representative Southern blot probed with the minor satellite repeat or L1 LINE element shows that Dicer-/- ES cells retain global levels of DNA methylation, as in (Murchison et al. 2005). An unmethylated mitochondrial DNA fragment serves as a loading control. Dicer-/- ES cells consistently displayed elevated levels of DNA methylation compared to Dicer+/+ controls.

SI Figure 5. Analysis of 5-aza-deoxycytidine treated ES cells. (A) HPLC trace showing distinct 5-methyl-C peak. (B) Total DNA methylation assessed by HPLC following 5-aza-dC treatment. (C) Quantitative RT-PCR of MuERV element following 5-aza-dC treatment. The star in (B) and (C) denotes when RNA was collected for analysis.

SI Figure 6. Sequence characteristics of miRNAs from the Dicer+/+ and Dicer-/- libraries. (A) Length distributions of known ncRNAs between the Dicer+/+ and Dicer-/- libraries: (i) miRNAs; (ii) tRNAs; (iii) non-miRNA/non-tRNA/non-rRNA known ncRNAs; (iv) rRNAs.
Associated D- and p-values for the differences in length distributions are shown above length histograms. (B) Schematic of miRNA-end definition. (C, D) miRNA processing variability in the four cDNA libraries.

SI Figure 7. Levels of SINE B1 RNA in Dicer+/+ and Dicer-/- ES cells. The glutamine tRNA serves as a loading control.

SI Figure 8. Dicer-dependent novel sequences are enriched in SINE and Simple repeat overlap. Shown is the proportional representation of repeat overlap for all Dicer-dependent, Dicer+/+, and Dicer-/- novel reads. "No rep overlap" refers to those sequences not overlapping annotated repeats.

SI Figure 9. Comparison of Dicer-dependent novel sequences with Dicer+/+ novel sequences represented by 1 or 2 reads. (A) First nucleotide comparison shows no bias in the set of Dicer+/+ novel sequences represented by 1 or 2 reads compared to Dicer-dependent novels. (B) 1- and 2-read Dicer+/+ novels are not enriched in SINE or Simple repeat overlap. (C) 1- and 2-read Dicer+/+ novels have a broad length distribution uncharacteristic of Dicer products.

Table 1.
Feature         J1        J1aza     Dicer+/+   Dicer-/-
%miRNA          86.2      81.6      78.0       0.5
%rRNA           4.8       4.2       9.3        43.8
%ncRNA          2.4       4.1       1.0        7.9
%tRNA           1.6       2.1       3.0        16.9
%novel reads    5.0       8.0       8.7        30.9
match mm7       104,220   115,304   45,320     33,195
all reads       149,986   155,934   57,834     54,339

Table 2.
miRNA       J1 (quant)   J1 (reads)   Dicer+/+ (quant)   Dicer+/+ (reads)
miR15a      290±50       175          280±20             293
miR15b      950±20       2301         970±40             1621
miR16       1130±140     2037         1090±120           1199
miR17-5p    1510±110     795          1440±170           1509
miR19b      2140±490     14777        2340±550           3918
miR21       2750±410     6172         1340±450           2272
miR30c      250±20       2946         220±40             379

Table 3.
miRNA cluster   % of J1 miRNAs   % of Dicer+/+ miRNAs
290-295         23               29
17-92           17               11
chr2            6                27
chr12           14               4
21              6                2
15b/16          4                3

[Figure 1. Plot not recoverable from the extracted text. Axis: conservation score. Series: known miRs, novel miRs, chr2 novel miRs, chr12 novel miRs, chrX novel miRs. Repeat classes: LTR, LINE, SINE, Simple, DNA, Multiple, No Repeat. Categories: novel miRs (46), known miRs (360).]

[Figure 2. Plot not recoverable from the extracted text. Axis: Length (nt). Series: repetitive novels, non-repetitive novels.]

[Figure 3. Plot not recoverable from the extracted text. Axis: Length (nt). Series: Dicer-dependent, all other +/+ novels, -/- novels. Panel C lists Group 1 and Group 2 sequences with total reads by library, genome hits, and repeat overlap.]

[SI Figure 4. Image not recoverable from the extracted text. Labels: DEAD Helicase, PAZ, and RNase III domains; deleted region; endogenous and floxed alleles (Exon 24, FRT-Neo-FRT, loxP); 351 bp, 410 bp, and 530 bp bands; nds, SRY; Dicer, GAPDH; Hpa II, Msp I; LINE 1, minor satellite, mitochondrial.]

[SI Figure 5. Plot not recoverable from the extracted text. Axis: Days post 5-aza-dC treatment.]

[SI Figure 6. Plot not recoverable from the extracted text. Panels: (i) miRNAs, (ii) tRNAs, (iii) ncRNAs, (iv) rRNAs. Axis: Length (nt). 5p arm of pre-miRNA: Drosha defined 5' end, Dicer defined 3' end. 3p arm of pre-miRNA: Dicer defined 5' end, Drosha defined 3' end. Series: 5p miRs, 3p miRs, by library (J1, J1aza, +/+, -/-), for 5' ends and 3' ends of miRNA reads.]

[SI Figure 7. Image not recoverable from the extracted text. Labels: SINE B1, Q-tRNA; lanes +/+ and -/-.]

[SI Figure 8. Plot not recoverable from the extracted text. Series: Dicer-dependent novels, all +/+ novels, all -/- novels. Categories: No rep overlap, SINE, Simple repeat, LTR, LINE.]

[SI Figure 9. Plot not recoverable from the extracted text. Series: Dicer-dependent, 1- and 2-read novels. Axis: Length (nt).]

References

Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., et al. 2003a. A uniform system for microRNA annotation. RNA 9(3): 277-279.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12): 2092-2102.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue): D109-111.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34(Database issue): D140-144.
Grozdanov, P., Georgiev, O., and Karagyozov, L. 2003. Complete sequence of the 45-kb mouse ribosomal DNA repeat: analysis of the intergenic spacer. Genomics 82(6): 637-643.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S., Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J., and Hammond, S.M. 2005. A microRNA polycistron as a potential human oncogene. Nature 435(7043): 828-833.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X.M., Gilbert, J., Hammond, M., Herrero, J., Hotz, H., Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Keenan, S., Kokocinsci, F., London, D., Longden, I., McVicker, G., Melsopp, C., Meidl, P., Potter, S., Proctor, G., Rae, M., Rios, D., Schuster, M., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Storey, R., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., and Birney, E. 2005. Ensembl 2005. Nucleic Acids Res 33(Database issue): D447-453.
Hwang, H.W., Wentzel, E.A., and Mendell, J.T. 2007. A hexanucleotide element directs microRNA nuclear import. Science 315(5808): 97-100.
Jones, P.A. and Baylin, S.B. 2007. The epigenomics of cancer. Cell 128(4): 683-692.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev 19(4): 489-501.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32(Database issue): D493-496.
Kloosterman, W.P. and Plasterk, R.H. 2006. The diverse functions of microRNAs in animal development and disease. Dev Cell 11(4): 441-450.
Lee, R.C., Hammell, C.M., and Ambros, V. 2006. Interacting endogenous and exogenous RNAi pathways in Caenorhabditis elegans. RNA 12(4): 589-597.
Leung, A.K., Calabrese, J.M., and Sharp, P.A. 2006. Quantitative analysis of Argonaute protein reveals microRNA-dependent localization to stress granules. Proc Natl Acad Sci USA 103(48): 18125-18130.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): 15-20.
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R., Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H., et al. 2007. Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol Cell Biol 27(6): 2240-2252.
Lippman, Z. and Martienssen, R. 2004. The role of RNA interference in heterochromatic silencing. Nature 431(7006): 364-370.
Liu, C., Bai, B., Skogerbo, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y., and Chen, R. 2005. NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res 33(Database issue): D112-115.
Lowe, T.M. and Eddy, S.R. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25(5): 955-964.
Maksakova, I.A. and Mager, D.L. 2005. Transcriptional regulation of early transposon elements, an active family of mouse long terminal repeat retrotransposons. J Virol 79(22): 13865-13874.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.
Mayhall, E.A., Paffett-Lugassy, N., and Zon, L.I. 2004. The clinical potential of stem cells. Curr Opin Cell Biol 16(6): 713-720.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005. Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad Sci USA 102(34): 12135-12140.
Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon, G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693.
Neely, L.A., Patel, S., Garver, J., Gallo, M., Hackett, M., McLaughlin, S., Nadel, M., Harris, J., Gullans, S., and Rooke, J. 2006. A single-molecule method for the quantitation of microRNA gene expression. Nat Methods 3(1): 41-46.
Neilson, J.R., Zheng, G.X., Burge, C.B., and Sharp, P.A. 2007. Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev 21(5): 578-589.
Nilsen, T.W. 2007. Mechanisms of microRNA-mediated gene regulation in animal cells. Trends Genet.
O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against genome intruders. Cell 129(1): 37-44.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7(4): 597-606.
Piriyapongsa, J., Marino-Ramirez, L., and Jordan, I.K. 2007. Origin and evolution of human microRNAs from transposable elements. Genetics 176(2): 1323-1337.
Reik, W., Dean, W., and Walter, J. 2001. Epigenetic reprogramming in mammalian development. Science 293(5532): 1089-1093.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Seitz, H., Royo, H., Bortolin, M.L., Lin, S.P., Ferguson-Smith, A.C., and Cavaille, J. 2004. A large imprinted microRNA gene cluster at the mouse Dlk1-Gtl2 domain. Genome Res 14(9): 1741-1748.
Si, M.L., Zhu, S., Wu, H., Lu, Z., Wu, F., and Mo, Y.Y. 2006. miR-21-mediated tumor growth. Oncogene.
Smalheiser, N.R. and Torvik, V.I. 2005. Mammalian microRNAs derived from genomic repeats. Trends Genet 21(6): 322-326.
Spivakov, M. and Fisher, A.G. 2007. Epigenetic signatures of stem-cell identity. Nat Rev Genet 8(4): 263-271.
Tolia, N.H. and Joshua-Tor, L. 2007. Slicer and the argonautes. Nat Chem Biol 3(1): 36-43.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P., van Duijse, J., Drost, J., Griekspoor, A., et al. 2006. A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell 124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N., and Imai, H. 2006. Identification and characterization of two novel classes of small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol 13(9): 763-771.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. Embo J 24(1): 138-148.

Chapter 3, Appendix

Supporting information referenced in Chapter 3, "RNA sequence analysis defines Dicer's role in mouse embryonic stem cells". This chapter appears in the context of its contemporary science, and, with the exception of the included description of short rRNA fragments, appears as Supporting Information for PNAS 104:18097-18102 on the Proceedings of the National Academy of Sciences website.
Complete description of repeat-overlapping sequences. The genomic distribution of repetitive sequences correlated well with total repeat content along chromosomes, and specific high-density clusters of repeat overlapping reads were not observed (SI Figure 10A,B). Of note, the ratio of short RNA associated repeats to total chromosome repeat content in all 4 libraries is highest on chromosome X (SI Figure 10B,C). This increase is due to a large number of matches to LINE-associated short RNAs, and not an absolute increase in the number of distinct repetitive short RNA sequences on chromosome X (SI Figure 10D,E). Because of their large number of corresponding genomic locations, the highly repetitive sequences were not searched comprehensively for novel miRNAs (see SI Methods). Nevertheless, the total proportions of highly repetitive sequences compared to all novel reads were similar between the Dicer+/+ and Dicer-/- libraries, indicating that as a class they exist independently of Dicer activity (SI Table 8). Further, these sequences share several descriptive characteristics with the set of repeat-overlapping novel sequences with fewer than 20 hits to the genome, described below, again indicating they exist as a class of sequences separate from miRNAs.

The repeat overlap in the set of novel sequences with fewer than 20 hits to the genome was analyzed (referred to as "non-repetitive repeat-overlapping novel reads" below). Because these sequences had far fewer genomic hits than the highly repetitive sequences, a more in-depth analysis of their relationship to surrounding genomic sequence was feasible. Foremost, these novel reads were comprehensively evaluated as potential miRNAs, and their surrounding genomic sequences are devoid of clear hairpin structures.
As described above for the set of highly repetitive sequences, the proportions of SINE and Simple repeat overlap in this set are reduced in the Dicer-/- library as compared to the Dicer+/+ library, further supporting the idea that a subset of repeat-associated short RNAs depend on Dicer for biogenesis (SI Figure 11A). The non-repetitive, repeat-overlapping novel sequences showed no clear strand bias with respect to overlapping repetitive elements, and had the same distribution of length and first nucleotides as the novel sequences that did not overlap repeats (SI Figure 11A, B, not shown). As expected, the non-repetitive, repeat-overlapping novel sequences were more frequently complementary to intergenic regions, and significantly less conserved when compared to the novel reads that did not overlap repeats (SI Figure 11C,D). Consistent with the broad chromosomal distribution of the set of highly repetitive novel sequences, there is no significant clustering of the repeat overlapping reads with fewer than 20 hits to the genome (not shown).

No evidence for piRNAs in mouse ES cells. The recent description of piRNAs in the mouse, rat, and human testis prompted us to examine if ES cells expressed similar short RNAs (O'Donnell and Boeke 2007). There was no evidence for a distinct class of 29-31 nt piRNA-like species in any of the four cDNA libraries. There were, however, 51 distinct sequences, represented by 112 reads, which uniquely overlapped 14 known piRNA clusters (Lau et al. 2006). 30 of these sequences, represented by 82 reads, fell into one piRNA cluster and were generated by a group of known and novel miRNAs that was identified on chromosome X (Table 4). Accordingly, the length distribution and first nucleotide bias of the reads falling into piRNA clusters were miRNA-like, with a major peak of reads surrounding 22 nt, and 60% of sequences beginning with a 'U' (SI Figure 12A,B).

No evidence for C. elegans-like siRNAs in mouse ES cells.
The possibility that ES cells express endogenous siRNAs similar to those observed in C. elegans was examined (Ambros et al. 2003b; Ruby et al. 2006; Pak and Fire 2007; Sijen et al. 2007). Such siRNAs are anti-sense to protein-coding genes, and do not typically have the 5' mono-phosphates that were selected for in this study. Nevertheless, an analysis of 5' mono-phosphate-containing short RNAs from C. elegans did identify two distinct siRNA populations that had a strong 'G' first nucleotide bias and peaked at 22 and 26 nt, respectively (Ruby et al. 2006). There was no evidence that these species exist in ES cells. In the four libraries analyzed here, there were 190 distinct sequences, represented by 261 reads, that were uniquely anti-sense to known protein-coding exons and could be considered potential analogues of C. elegans siRNAs. However, unlike C. elegans siRNAs, these sequences had no distinct length or first nucleotide bias when compared to the set of 1319 distinct sequences (1500 reads) that were uniquely sense to known protein-coding genes (SI Figure 12C,D). This apparent absence of analogous siRNA species in ES cells is not entirely surprising considering that mammals do not have RdRPs homologous to those required for siRNA production in C. elegans. Anti-sense exon-overlapping short RNAs were present in all four libraries (SI Figure 12E). The low abundance of these species in our libraries suggests that they have limited physiological roles; however, similar, more abundant molecules may be 5' end modified such that they were excluded from the libraries analyzed here.

Description of ES cell short rRNA fragments.

Because the RNAi pathway has been implicated in the transcriptional silencing and nuclear organization of rDNA repeats in several model organisms, characteristics of the ES cell short RNAs matching the rDNA repeat were examined (Xie et al. 2004; Cam et al. 2005; Peng and Karpen 2007).
It was previously observed in mouse ES cells that specific short rRNAs exist in the absence of Dicer, and do not function as siRNAs (Calabrese and Sharp 2006). The large number of reads sequenced in the libraries described in Chapter 3 allowed for a more extensive analysis of the short rRNAs expressed in mouse ES cells. Consistent with previous observations, short rRNAs mapped predominantly to the 45S precursor region of the rDNA repeat, and were overwhelmingly in the sense orientation with respect to rDNA transcription. 91.9% of the 27,899 reads mapping to the rDNA repeat mapped to either the mature 18S, 5.8S, or 28S rRNA, and all of these reads were in the sense orientation relative to rDNA transcription. The distribution of short RNAs along the mature rRNA species was surprisingly non-random, and nearly identical between the Dicer+/+ and Dicer-/- libraries, demonstrating that these sequences, like other non-miRNA known ncRNAs, are generated independently of Dicer (SI Figure 13). Based on miRNA quantification, as a class, short rRNA species are represented by approximately 9,000 molecules per ES cell.

Supporting methods.

Generation and characterization of Dicer-/- ES cells.

All ES cells were cultured and transfected as described (Calabrese and Sharp 2006). Dicer ES cells were derived from mice homozygous for the floxed Dicer allele described in (Harfe et al. 2005), and floxed GFP described in (Ventura et al. 2004). To generate clonal Dicer-/- cell lines, Cre recombinase was transiently transfected into Dicer+/+ ES cells. 24 hours post-transfection, cells were plated at clonal density onto feeder layers, and individual GFP-negative colonies were selected and cultured until growth recovered, then expanded, removed from feeder layers, and used for subsequent analysis. Dicer genotyping oligos, from 5' to 3' as illustrated in SI Figure 4, are as follows: (1) 5'-CATGACTCTTCAACTCAAACT-3'; (2) 5'-CCTGACAGTGACGGTCCAAAG-3'; (3) 5'-AGCATGGGGGCACCCTGGTCCTGG-3'.
The sex of ES cells was determined as described (Conner 2000). Dicer antibody 1416 was from (Kanellopoulou et al. 2005). 5 µg of DNA was used for the Southern blot in SI Figure 1. The minor satellite probe was described in (Martens et al. 2005). The mitochondrial DNA probe and DNMT1 null ES cells were gifts from R. Jaenisch. LINE L1 and SINE B1 probes were PCR amplified using primers from (Martens et al. 2005). The SINE B1 Northern blot was performed as in (Calabrese and Sharp 2006).

J1aza treatment and analysis.

J1 ES cells were treated with 30 µM 5-aza-2'-deoxycytidine (5-aza-dC; Sigma) dissolved in DMSO, or DMSO only, for 24 hours. DNA and RNA samples were collected for each sample approximately every two days for a total of 2 weeks. To determine the percent cellular DNA methylation, HPLC analysis was used as described (Ramsahoye 2002). Samples were loaded onto a Vydac 218TP52 reverse phase C18 HPLC column, and a 60 minute isocratic run in 50 mM ammonium phosphate dibasic buffer, pH 4.1, was used for separation. As a control, dTMP, dAMP, dCMP, dGMP, and 5mCMP (Sigma and Reliable Biopharmaceutical Corporation) were mixed equally and eluted as above. MuERV-L primers are from (Peaston et al. 2004).

Calculation of mature miRNA processing variability.

To calculate miRNA processing variability, the number of miRNA reads matching each annotated 5'/3' miRNA end was divided by the total number of reads overlapping the arm of the hairpin on which the mature miRNA was located. If previously unannotated, 5p and 3p miRNAs were assigned from miRNA* strands if the miRNA* species represented >20% of all reads originating from the hairpin. For miRNA hairpins with multiple genomic locations, only the hairpin with the most aligning reads was evaluated.

Novel miRNA annotation.
Only sequences with <20 matches to the genome were evaluated as potential miRNAs, with the exception of the 8 sequences with >20 hits to the genome that were sequenced >3x in the Dicer+/+ library and absent in the Dicer-/- library. For each short RNA genomic location, two potential miRNA hairpins were defined that encompassed 20 and 80 nt of sequence around the 5' and 3' ends of the short RNA. Potential miRNAs were evaluated for their ability to form hairpins that had secondary structures consistent with Drosha and Dicer processing (Ambros et al. 2003a; Zeng et al. 2005; Han et al. 2006). Those hairpins whose RNAfold output (Hofacker et al. 1994) exhibited base pairing over at least 70% of the length of the potential miRNA, base pairing over or adjacent to the processed ends of the putative pre-miRNA, symmetrical bulges, and double-strandedness extending approximately one helical turn past the Drosha-processed end of the miRNA (between 7 and 17 nt after the end) were annotated as novel miRNAs. Additionally, in order for a hairpin to be annotated as a novel miRNA, we required that a single sequence comprise the majority of all reads aligning to the novel hairpin (a dominant miRNA), and that this dominant miRNA was sequenced more than once and was between 20 and 25 nt long. Requiring a dominant miRNA allowed for unambiguous assignment of the novel miRNA's seed. miRNA-like hairpins with aligned reads that did not produce a dominant miRNA of the specified length, that had only one aligned read, or that had a predicted secondary structure that did not completely satisfy our requirements, were deemed miRNA candidates and not included in the final set of novel miRNAs (SI Alignment File).
This set of candidate miRNA hairpins had secondary structures and expression characteristics similar enough to known miRNAs that we believe short RNAs mapping to these loci were likely generated by the miRNA-processing pathway; however, the putative hairpins either lacked sufficient expression for confident annotation, or had characteristics that differed slightly from known miRNAs.

From the set of 8 sequences with >20 hits to the genome that were sequenced >3x in the Dicer+/+ library and absent in the Dicer-/- library, 3 novel miRNAs were identified. The reads aligning to these three hairpins had 120, 306, 379, and 1416 hits to the genome, respectively.

Repeat analysis.

Repeat overlap was determined using the RepeatMasker track of the UCSC table browser (Karolchik et al. 2004). For sequences with >20 hits to the genome, repeat identity was assigned to the repeat and class that overlapped most frequently with each short RNA sequence. For those sequences with >500 hits to the genome, 250 hits were randomly selected 5 times, and a specific repeat and class was assigned if the majority of the 250 hits overlapped the same repeat and class all 5 times. Otherwise the specific repeat and class were designated "can't distinguish". For sequences with <20 hits to the genome, the specific repeat and class were annotated for each genomic hit and normalized to the number of genomic hits for each sequence.

Conservation and motif analysis.

Four-way mammalian alignments (mm7, hg17, canFam2 and rn3) were extracted from the UCSC genome browser's 17-way mammalian alignments for a given region of length $L$. The conservation score for a region was determined by $\left(\sum_{i=1}^{L} F_i\right)/L$, where $F_i$ is the number of bases in the other species (hg17, canFam2 or rn3) that are identical to that of mm7 at position $i$, divided by the number of aligned species (maximum of 3) at position $i$.
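The conservation score defined above can be illustrated with a short sketch. It is a minimal example and not the code used for the analysis: the function name, the gap symbols, and the choice to let positions with no aligned species contribute zero are assumptions made here for illustration.

```python
# Minimal sketch of the per-region conservation score described above.
# Assumptions (not stated in the text): unaligned bases are written as
# "-" or "N", and a position with no aligned species contributes 0.

GAP_SYMBOLS = {"-", "N"}


def conservation_score(mouse, others):
    """Return (sum over i of F_i) / L for one aligned region.

    mouse  -- reference (mm7) sequence of length L
    others -- aligned sequences of the other species, each of length L
    """
    length = len(mouse)
    if length == 0:
        raise ValueError("region must have non-zero length")
    if any(len(seq) != length for seq in others):
        raise ValueError("all aligned sequences must have length L")

    total = 0.0
    for i, ref_base in enumerate(mouse.upper()):
        # Bases from species that are actually aligned at position i.
        aligned = [s[i].upper() for s in others if s[i].upper() not in GAP_SYMBOLS]
        if not aligned:
            continue  # assumption: no aligned species -> F_i = 0
        identical = sum(1 for base in aligned if base == ref_base)
        total += identical / len(aligned)  # F_i
    return total / length


# Toy alignment: one mismatch in the second species, a gap in the third.
score = conservation_score("ACGT", ["ACGT", "ACGA", "A--T"])
```

In the toy alignment the first three positions are fully conserved among the aligned species and the last position matches in two of three, so the score is (3 + 2/3)/4.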
Information content, $I_M$, of motif $M$ of length $N$ was computed as $I_M = \sum_{i=1}^{N} \sum_{j} f_{ij} \log_2(f_{ij}/g_j)$, where $f_{ij}$ was the frequency of nucleotide $j$ at position $i$, and $g_j$ was the background frequency of nucleotide $j$. Nucleotide background frequencies $(g_A, g_T, g_G, g_C)$ were defined as (0.3, 0.3, 0.2, 0.2). For small sets of sequences, the algorithm MEME was utilized to identify motifs with a minimum width of 6, and an e-value cutoff of 0.001 (Bailey and Elkan 1994).

Figure legends.

SI Figure 10. Highly repetitive reads distribute uniformly across chromosomes, with the highest density of short RNA-producing loci on chromosome X. (A) Count of short RNA-producing loci and total repeat content in 0.5 Mb windows is shown for representative chromosomes (1 and X). (B) The ratio of repetitive short RNA-producing loci to total repeat content, per chromosome. (C) The ratios from (B) represented by cDNA library, normalized to the total number of short RNA-producing loci divided by total genome repeat content. (D) Proportion of repeat-associated short RNA hits per chromosome, by repeat class. (E) Count of distinct repetitive short RNAs matching each chromosome.

SI Figure 11. Analysis of repeat-overlapping novel reads with fewer than 20 hits to the genome. (A) Hit-normalized RepeatMasker classification of novel reads with fewer than 20 hits to the genome; "s" denotes sense overlap; "a" denotes anti-sense overlap. (B) Length distribution of repeat-overlapping compared to non-repeat-overlapping novel reads. (C) Location of repeat-overlapping novel reads with respect to UCSC known genes. (D) Conservation of repeat-overlapping compared to non-repeat-overlapping novel reads. (E) Ratio of repeat-overlapping short RNA loci to total repeat content, per chromosome.

SI Figure 12. Description of sequences that are within piRNA clusters, and anti-sense to exons. Length distribution, (A), and first nucleotide, (B), of sequences mapping to piRNA clusters compared to known miRNA sequences.
Length distribution, (C), and first nucleotide, (D), of sequences mapping anti-sense and sense to exons. (E) Proportion of sequences that are uniquely sense and anti-sense to exons, by cDNA library.

SI Figure 13. Comparison of Dicer+/+ and Dicer-/- rDNA reads. Distribution of short rRNA hit starts along the 45kb rDNA precursor, bases 3500-14000, with Dicer+/+ reads above, and Dicer-/- reads below the x-axis. The locations of the mature 18S, 5.8S, and 28S rRNA sequences are indicated below the graph.

[SI Figure 10: plots not recoverable. Surviving labels: repetitive short RNA hits and total repeat content versus chromosome 1 and chromosome X position (x 10^7 bases); libraries J1, J1aza, Dicer+/+, Dicer-/-; repeat classes LINE, SINE, Simple_repeat, LTR, Other.]
[SI Figure 11: plots not recoverable. Surviving labels: novels with repeat overlap, novels with no repeat overlap; Length (nt); Conservation score.]
[SI Figure 12: plots not recoverable. Surviving labels: miRs, reads from piRNA clusters; anti-sense to exons, sense to exons; Length (nt); libraries J1, J1aza, Dicer+/+, Dicer-/-.]
[SI Figure 13: plot not recoverable. Surviving labels: Dicer+/+, Dicer-/-; 18S, 5.8S, 28S.]

References.

Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M. et al. 2003a. A uniform system for microRNA annotation. RNA 9(3): 277-279.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bailey, T.L. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, August: 28-36.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12): 2092-2102.
Cam, H.P., Sugiyama, T., Chen, E.S., Chen, X., FitzGerald, P.C., and Grewal, S.I. 2005. Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat Genet 37(8): 809-819.
Conner, D.A. 2000. Mouse Embryonic Stem (ES) Cell Isolation. Current Protocols in Molecular Biology 23.4.1.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci U S A 102(31): 10898-10903.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, L.S., Tacker, M., and Schuster, P. 1994. Fast Folding and Comparison of RNA Secondary Structures. Monatsh Chem 125: 167-188.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev 19(4): 489-501.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32(Database issue): D493-496.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes. Science 313(5785): 363-367.
Martens, J.H., O'Sullivan, R.J., Braunschweig, U., Opravil, S., Radolf, M., Steinlein, P., and Jenuwein, T. 2005. The profile of repeat-associated histone lysine methylation states in the mouse epigenome. EMBO J 24(4): 800-812.
O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against genome intruders. Cell 129(1): 37-44.
Pak, J. and Fire, A. 2007. Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science 315(5809): 241-244.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7(4): 597-606.
Peng, J.C. and Karpen, G.H. 2007. H3K9 methylation and RNA interference regulate nucleolar organization and repeated DNA stability. Nat Cell Biol 9(1): 25-35.
Ramsahoye, B.H. 2002. Measurement of genome wide DNA methylation by reversed-phase high-performance liquid chromatography. Methods 27(2): 156-161.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006.
Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. 2007. Secondary siRNAs result from unprimed RNA synthesis and form a distinct class. Science 315(5809): 244-247.
Ventura, A., Meissner, A., Dillon, C.P., McManus, M., Sharp, P.A., Van Parijs, L., Jaenisch, R., and Jacks, T. 2004. Cre-lox-regulated conditional RNA interference from transgenes. Proc Natl Acad Sci U S A 101(28): 10380-10385.
Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D., Jacobsen, S.E., and Carrington, J.C. 2004. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2(5): E104.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24(1): 138-148.

Chapter 4

Short RNAs in the sense and anti-sense orientation from transcription initiation sites in mouse embryonic stem cells

The described experiments were an equal collaboration with Amy C. Seila, and also performed with Gene W. Yeo and Stuart S. Levine.

Abstract

Short RNAs 20-30 nucleotides (nt) in length mediate essential regulatory processes in eukaryotes by interfering with gene expression at pre- and post-transcriptional levels in processes termed RNA interference (RNAi) (Bartel 2004; Tolia and Joshua-Tor 2007). To further our understanding of the mechanisms by which short RNAs regulate gene expression in mammalian cells, we characterized mouse ES cell short RNAs to a depth of one RNA per cell. Here we describe a new class of short RNAs that lie in close proximity to the transcription start sites of many genes (TSS-3kb RNAs). TSS-3kb RNAs exist independently of Dicer activity, and, surprisingly, are found in both the sense and anti-sense orientation relative to transcription start sites of their associated genes.
TSS-3kb RNAs have no 5' nucleotide bias, in sharp contrast to the initial 'G' bias of known RNA polymerase II products, suggesting that TSS-3kb RNAs are generated by a processing event that occurs at the 3' ends of nascent transcripts. Genes associated with TSS-3kb RNAs show chromatin marks associated with active transcription in human ES cells, and are predominantly, although not exclusively, expressed in mouse ES cells. We hypothesize that TSS-3kb RNAs are evidence of widespread bidirectional initiation and pausing of Pol II in ES cells, and speculate that this frequent bidirectional initiation and pausing may help maintain chromatin structure in a state poised for rapid transcriptional activation.

Introduction

The initiation of productive transcription is a multi-step process that requires the concerted action of transcription factors, chromatin modifying enzymes, and nucleosome remodeling complexes. In part, regulation of this process is achieved by modulating the recruitment of RNA Polymerase II (Pol II) to promoters (Ptashne and Gann 1997). However, recent work suggests a large fraction of genes are subject to regulation at the stage of transcriptional elongation, presumably by a process that induces the pausing of Pol II near transcription start sites (Guenther et al. 2007; Muse et al. 2007; Zeitlinger et al. 2007).

Pol II pausing is a phenomenon by which Pol II initiates transcription but pauses approximately 20-50 nt downstream of initiation sites while remaining elongation competent (Saunders et al. 2006). The pausing of Pol II has been described for many genes, including several Drosophila heat shock genes, and the mammalian c-myc, c-fos, and junB (Rougvie and Lis 1990; Spencer and Groudine 1990; Aida et al. 2006).
In these specific examples, pausing is induced by DSIF and the negative elongation factor NELF, and is likely overcome by P-TEFb guided phosphorylation of Pol II's carboxy terminal domain, and by the transcript cleavage activity of TFIIS (Saunders et al. 2006).

Genome-scale location analyses of markers for transcription initiation suggest that Pol II pausing occurs at a large fraction of eukaryotic genes. The promoters of many protein-coding genes in several human cell lines were recently found to associate with marks of transcription initiation, including Pol II and H3 lysine 4 trimethylated (H3K4me3) nucleosomes (Guenther et al. 2007). Notably, a significant fraction of genes associated with these initiation marks were not expressed at detectable levels, suggesting the frequent occurrence of initiation without elongation in human cells (Guenther et al. 2007).

Genome-wide location analysis of Pol II in Drosophila melanogaster also supports the notion that a significant fraction of genes are subject to post-initiation transcriptional regulatory mechanisms. Examination of whole embryos or Drosophila S2 cells showed a promoter-proximal enrichment of Pol II at 10-20% of genes (Muse et al. 2007; Zeitlinger et al. 2007). Many of these genes were expressed at low or undetectable levels, consistent with the observations described above for human cells (Guenther et al. 2007; Muse et al. 2007; Zeitlinger et al. 2007). Further, ontological analysis showed significant enrichment for genes involved in developmental control or response to external stimuli, supporting the idea that promoter-proximal pausing of Pol II keeps genes poised to rapidly respond to developmental or environmental changes (Muse et al. 2007; Zeitlinger et al. 2007).
Knockdown of NELF in Drosophila S2 cells results in loss of promoter-proximal Pol II enrichment at most genes, mechanistically linking genome-wide Pol II promoter enrichment to the previously described Pol II pausing phenomenon (Muse et al. 2007).

Herein, we describe a class of short RNAs identified in mouse ES cells that we hypothesize are products of bidirectionally initiated and paused Pol II. These RNAs are low in abundance, exist independently of Dicer activity, and cluster near transcription start sites that associate with features of transcription initiation in ES cells. The existence of these short RNAs suggests that Pol II may initiate bidirectionally at many ES cell protein-coding genes, potentially as a mechanism that helps maintain chromatin in an active state.

Results

During the analysis of approximately 300,000 short RNA reads from four mouse ES cell cultures (described in Chapter 3), we identified a class of short RNAs (TSS-3kb RNAs) that map in close proximity to the transcription start sites of known protein-coding genes. Specifically, 4,150 reads map to within ±3kb of the transcription start sites of 2,637 distinct genes. These RNAs cluster bidirectionally around transcription start sites in a strikingly non-random distribution (Figure 1A). There is a major peak of sequences transcribed in the same orientation as their surrounding genes (defined as the "sense" orientation), located between 0 and 50 nt downstream of transcription start sites. A similar, but slightly broader, peak of sequences is observed at approximately 250 nt upstream of transcription start sites. Surprisingly, the sequences that map upstream of transcription start sites are transcribed in the opposite orientation (defined as the "anti-sense" orientation) to their associated genes. These metagene profiles do not significantly change when only the subpopulation of sequences that map uniquely to the genome, or only the sequences that map to a single gene annotation, are analyzed (Figure 1B).
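The metagene coordinates used above (position relative to the TSS, and sense or anti-sense orientation relative to the gene) can be sketched as follows. This is a hedged illustration rather than the analysis code: the function names, the use of 1-based inclusive coordinates, the choice of the read's 5' end as its position, and the 50 nt bin size are all assumptions introduced here.

```python
from collections import Counter

WINDOW = 3000  # +/- 3 kb around the TSS, as in the text


def tss_relative(read_start, read_end, read_strand, tss, gene_strand, window=WINDOW):
    """Return (offset, orientation) for a read near a TSS, or None if outside the window.

    offset is measured in the direction of gene transcription, so negative
    values are upstream of the TSS and positive values are downstream.
    Assumption: the read position is its 5' end, in 1-based inclusive coordinates.
    """
    five_prime = read_start if read_strand == "+" else read_end
    offset = five_prime - tss
    if gene_strand == "-":
        offset = -offset  # flip so that upstream is always negative
    if abs(offset) > window:
        return None
    orientation = "sense" if read_strand == gene_strand else "anti-sense"
    return offset, orientation


def metagene_profile(hits, bin_size=50):
    """Count reads per (orientation, bin start) across all genes."""
    profile = Counter()
    for hit in hits:
        result = tss_relative(*hit)
        if result is None:
            continue
        offset, orientation = result
        profile[(orientation, (offset // bin_size) * bin_size)] += 1
    return profile


# Hypothetical reads: (read_start, read_end, read_strand, tss, gene_strand)
hits = [
    (1030, 1051, "+", 1000, "+"),  # sense, 30 nt downstream
    (729, 750, "-", 1000, "+"),    # anti-sense, 250 nt upstream
    (4950, 4971, "-", 5000, "-"),  # sense, 29 nt downstream of a minus-strand gene
    (9000, 9021, "+", 1000, "+"),  # outside the +/- 3 kb window
]
profile = metagene_profile(hits)
```

Flipping the sign of the offset for minus-strand genes lets all genes be stacked on one axis, which is what allows a sense peak downstream and an anti-sense peak upstream to appear in a single profile.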
Further, similar TSS-3kb metagene profiles were present in 4 independently derived short cDNA libraries, including a library made from ES cells lacking Dicer, indicating that these RNAs are common to multiple ES cell lines and are not generated by Dicer processing (Figure 1C).

The length distribution of TSS-3kb RNAs is broad and extends from 16 to 31 nucleotides with a mean of 21.5 (Figure 2A). The distinct peak at 21.5 nucleotides suggests that these RNAs do not arise from a process expected to generate a uniform distribution, as the cDNA library preparation method selected for RNAs between 17 and 30 nt in length. Comparison to quantification of miRNA levels (Chapter 3) suggests that individual TSS-3kb RNAs are present at a maximum of five copies per ES cell. Despite this low abundance, there are common TSS-3kb sequences and associated genes between cDNA libraries, suggesting a non-random biogenesis. For example, there were 29 identical TSS-3kb sequences present between the two libraries derived from J1 ES cells, indicating that the site of short RNA biogenesis at each gene is not random. Moreover, TSS-3kb RNAs associate with common genes between cDNA libraries, supporting the idea that a specific subset of genes generates promoter-proximal short RNAs. 115 common genes were associated with short RNAs between the two J1 ES cell derived libraries, from a total of 582 and 819 TSS-3kb RNA associated genes, respectively. From this overlap, we estimate the total cellular pool of TSS-3kb associated genes to be approximately 4,100 (Methods). Further, the frequency of short RNAs per gene in this set is approximately 0.2, suggesting that the average initiation site in this set of genes has a detectable short RNA associated with it approximately 1/5 of the time.

Their Dicer-independent biogenesis, together with their non-random orientation around transcription start sites, suggests that the TSS-3kb RNAs are generated by the bidirectional initiation of Pol II.
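The overlap-based estimate of the gene pool quoted above can be reproduced approximately with a two-sample capture-recapture calculation. The sketch below assumes the Lincoln-Petersen estimator; the Methods referred to in the text are not reproduced here, so this illustrates one calculation that agrees with the reported value and is not a statement of the procedure actually used. The function name is hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_shared):
    """Two-sample capture-recapture estimate of total population size.

    n_first, n_second -- number of genes detected in each library
    n_shared          -- number of genes detected in both libraries
    Assumption: both libraries sample the same gene pool independently.
    """
    if n_shared <= 0:
        raise ValueError("estimate is undefined without shared genes")
    return n_first * n_second / n_shared


# Counts quoted in the text for the two J1-derived libraries.
pool = lincoln_petersen(582, 819, 115)   # 476658 / 115, about 4145 genes

# Fraction of the estimated pool detected in each library.
detected_first = 582 / pool              # about 0.14
detected_second = 819 / pool             # about 0.20
```

Under this assumption the estimate is about 4,145 genes, in line with the approximately 4,100 reported, and the larger library detects roughly one fifth of the estimated pool.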
Notably, the sense peak of TSS-3kb RNAs is located in a region relative to transcription initiation sites known to frequently harbor paused Pol II (Saunders et al. 2006; Muse et al. 2007; Zeitlinger et al. 2007). Moreover, the transition from transcription of a naked DNA template to one wrapped around nucleosomes induces the pausing of Pol II (Kireeva et al. 2005), and the sense and anti-sense peaks of TSS-3kb RNAs surround promoter regions known to typically have low nucleosome density (Lee et al. 2004; Giresi et al. 2007; Ozsolak et al. 2007). These observations raise the possibility that Pol II accesses regions of low nucleosome density bidirectionally, and pauses when transcription complexes encounter regions of higher nucleosomal density. We hypothesize TSS-3kb RNAs to be the products of this putative promoter-proximal pausing of Pol II in mouse ES cells.

TSS-3kb RNAs have no 5' nucleotide bias, in sharp contrast to the 'G' bias seen at the 5' ends of most Pol II initiation sites. 25% of TSS-3kb RNAs begin with 'G', compared with 46% of TSS-3kb associated genes, at which RNA synthesis initiates with 'G' (Figure 2B). Consistent with the bias observed for TSS-3kb associated genes, sequences associated with 5' capped mRNAs frequently begin with 'G'. Transcription start sites have been mapped throughout the mouse genome using a method known as Cap Analysis Gene Expression (CAGE). This method is based on the preparation and sequencing of DNA tags derived from the initial 20 nucleotides of 5' capped mRNAs (Shiraki et al. 2003; Carninci et al. 2005). Considering that CAGE tags cluster around transcription start sites, we analyzed the 5' nucleotide bias in this population. Similar to the bias observed for TSS-3kb associated genes, 51% of CAGE tags begin with 'G' (Figure 2B).
The differences between the 5' nucleotide composition of the TSS-3kb RNAs and the 5' nucleotide of Pol II initiation products strongly suggest that the 5' ends of the former are generated by processing of Pol II transcripts.

Interestingly, TSS-3kb associated promoters are associated with CpG islands more often than would be expected by chance. 80% of TSS-3kb associated genes have a CpG island within 1kb of their transcription start site. This represents a strong enrichment over the total gene population or randomly selected sets of genes (Methods), 55% and 56±1% of which, respectively, have promoters mapping within 1kb of a CpG island.

We next performed gene ontology analysis of TSS-3kb associated genes to determine if they are enriched in notable biological processes (Ashburner et al. 2000; Beissbarth and Speed 2004). In Drosophila, the pausing of Pol II frequently occurs at genes involved in responses to developmental or external stimuli (Muse et al. 2007; Zeitlinger et al. 2007). In contrast, gene ontology analysis of TSS-3kb associated genes shows no significant enrichment for genes involved in developmental responses or response to stimuli, although TSS-3kb associated genes are frequently associated with cellular metabolic processes (Figure 3B).

Supporting the hypothesis that TSS-3kb RNAs are a product of bidirectionally initiated Pol II, the genes around which they cluster associate with marks of active transcription. The majority of TSS-3kb associated genes produce full-length transcripts that are detected in ES cells. For this analysis, we utilized two previously published mouse ES cell microarrays (Ivanova et al. 2002; Hailesellasse Sene et al. 2007). 74% of TSS-3kb genes with unambiguous calls on the arrays were present at detectable levels in ES cells. The remaining 26% of TSS-3kb genes with unambiguous calls on the arrays were not expressed at detectable levels.
Sense and anti-sense short RNAs associated equally with expressed and non-expressed genes (Figure 4A). Together, these observations show that, although most TSS-3kb associated genes are expressed at detectable levels, the presence of a TSS-3kb RNA at a promoter is not directly related to transcript levels.

To determine if the chromatin structure surrounding TSS-3kb associated genes is consistent with the presence of a bidirectionally initiating Pol II, we compared TSS-3kb short RNA coordinates with previously published genome-wide location analyses of chromatin modifications and associated factors in human ES cells (Lee et al. 2006; Guenther et al. 2007). In order to perform this analysis, mouse TSS-3kb short RNA coordinates were converted to their homologous human location using the UCSC liftover utility LocusLink (http://genome.ucsc.edu/; Kent et al. 2002). After this conversion, 2,000 TSS-3kb short RNA coordinates (60% of the total) mapped uniquely to the human genome, 1,409 of which were within ±3kb of an annotated human gene TSS and present on the tiling arrays used for the genome-wide analysis. These coordinates showed the same distribution with respect to transcription start sites as the total population of TSS-3kb RNAs, indicating that they comprise a representative subset of the larger population (not shown).

Comparison to the genome-scale chromatin analyses shows that TSS-3kb RNAs associate with features of active transcription. 95% of converted TSS-3kb short RNA coordinates overlapped with sites of H3K4me3, and 79% overlapped with chromatin bound by the initiated form of Pol II, compared to 81±1% and 47±2% for the background set of coordinates, respectively (Figure 4B). Both of these chromatin marks are indicative of active transcription. Further, only 5% of converted TSS-3kb coordinates overlapped with the Polycomb complex component Suz12, as compared to 11±1% for the background set (Figure 4B).
Therefore, consistent with their strong association with markers of transcription activation, TSS-3kb short RNAs are not preferentially associated with genes that are repressed by the Polycomb complex.

Discussion

In summary, analysis of ES cell short RNA expression to single molecule depth revealed a class of low abundance, Dicer-independent short RNAs that cluster nonrandomly around transcription start sites of protein-coding genes. Based on previous work mapping Pol II pause sites and nucleosome density at promoters, we hypothesize that these RNAs are evidence of widespread bidirectional transcription initiation and nucleosome-induced pausing of ES cell Pol II. Consistent with this hypothesis, the genes around which short RNAs cluster associate with features of active transcription in ES cells, including the presence of H3K4me3 nucleosomes and Pol II at their promoters.

Although it is known that Pol II can pause in the sense direction with respect to its bound genes, we believe the data presented here to be the first evidence for the presence of a paused Pol II in the anti-sense direction at many initiation sites. The position of paused, anti-sense Pol II is inferred by the presence of short RNAs most frequently located ~250 nucleotides upstream of the TSS. Anti-sense short RNAs were observed at almost the same frequency as sense short RNAs, suggesting that significantly more promoters are capable of executing bidirectional initiation than previously expected (Li et al. 2006). Also, genes associated with short RNAs are enriched in CpG island promoters, suggesting that bidirectional transcription may frequently occur at this promoter structure.

TSS-3kb RNAs have no first nucleotide bias, suggesting they do not represent the initial ~22 nucleotides of RNA transcribed by Pol II, but rather have been processed from the 3' ends of nascent transcripts.
There is precedent for the generation of short RNAs from nascent transcripts during the TFIIS-enhanced release of promoter-proximal Pol II pause, although the production of anti-sense short RNAs in this process has not been previously described (Adelman et al. 2005; Kireeva et al. 2005; Galburt et al. 2007). TFIIS aids in the reversal of Pol II pause by stimulating the intrinsic nuclease activity of Pol II, resulting in cleavage near the nascent transcript's 3' end, escape from pause, and transition to productive elongation (Wind and Reines 2000). Initial in vitro studies showed that TFIIS cleavage typically releases di-nucleotide RNA fragments from nascent transcripts; however, pauses induced by specific DNA sequences or nucleosomal barriers generate longer TFIIS cleavage fragments, up to 30 nt (Izban and Luse 1993a; Izban and Luse 1993b; Adelman et al. 2005; Kireeva et al. 2005). Although plausible, it seems unlikely that ES cell TSS-3kb RNAs are produced by TFIIS cleavage, as productive elongation is largely unidirectional and TSS-3kb RNAs exist in both orientations surrounding transcription start sites. Nevertheless, there is clearly an established link between the pausing of Pol II near promoters and the generation of short RNA fragments, and it is possible that a process mechanistically related to TFIIS cleavage is responsible for TSS-3kb short RNA production.

Alternatively, TSS-3kb RNAs may be generated by exonucleolytic degradation of nascent transcripts associated with paused Pol II. The 5' to 3' exonuclease Xrn2 is known to facilitate a pause-dependent degradation of uncapped Pol II transcripts after poly(A) site cleavage (Gromak et al. 2006), and may be expected to generate ~22 nt TSS-3kb RNAs if paused Pol II remains associated with nascent transcripts during degradation.
This putative biosynthetic pathway would predict the association of Xrn2 with promoter regions, and would also require that paused transcripts are decapped, as Xrn2 can only degrade substrates lacking an m7G cap structure.

Physical detection of the TSS-3kb short RNAs will likely provide insights into their mechanisms of biogenesis and biochemical properties. The broad, non-random size distribution of TSS-3kb short RNAs suggests that they are produced over a large range of sizes, and peak at ~20 nt in length. Visualization of specific TSS-3kb short RNAs, perhaps via a hybridization-based protocol, will be necessary to either confirm or refute this observation. Initial detection attempts using a standard short RNA northern blotting procedure and large quantities of ES cell RNA have failed to detect TSS-3kb short RNAs (A.C. Seila, not shown), necessitating the development of more sensitive detection methods. In addition to size visualization, the establishment of a hybridization-based detection protocol will allow the determination of the TSS-3kb short RNA end modifications. The technology used to prepare the analyzed short cDNA libraries relied on the presence of a 5' phosphate and 3' hydroxyl, and thus the sequenced TSS-3kb RNA sequences likely have these end modifications; however, it is possible that related short RNA species were excluded from the prepared cDNA libraries due to incompatible end modifications, such as an m7G cap structure.

Moreover, the physical detection of a transcribing, anti-sense Pol II is needed to support the hypothesis that Pol II initiates and pauses bidirectionally at thousands of ES cell genes. The detection of an anti-sense transcription bubble in vivo via potassium permanganate cleavage assays would be a strong indication of the presence of a transcriptionally engaged anti-sense Pol II at promoters.
Alternatively, genome-scale location analysis of Pol II using DNA fragmented with micrococcal nuclease may provide high enough resolution to differentially detect sense and anti-sense Pol II at the same promoter. Lastly, high-density promoter tiling arrays or large-scale sequencing of ~200 nt RNAs may reveal the larger anti-sense transcripts proposed to be the precursors of TSS-3kb short RNAs. One such analysis performed in human cells did reveal the presence of <200 nt anti-sense RNAs near promoters (Kapranov et al. 2007); whether or not these human short RNAs are related to the process that generates ES cell TSS-3kb RNAs is unclear.

We hypothesize that the putative bidirectional initiation and pausing of Pol II may be a mechanism by which ES cell genes are maintained in a state poised for transcriptional activation. Future work examining the location and polarity of Pol II at ES cell promoters will be the first step towards proof of this hypothesis. It will also be of great interest to see if other cell types express short RNAs that similarly cluster around transcription start sites. Of particular interest are the short RNAs present in Drosophila S2 cells, a single cell type in which the genome-wide pausing of Pol II is prevalent and now well documented (Muse et al. 2007). It is possible that bidirectional initiation of Pol II is an unobserved aspect of previously described pausing phenomena, and that downstream factors, such as pTEFb, impose the unidirectionality of productive elongation upon release from pause.

Methods

Short RNA sequencing and identification of TSS-3kb RNAs.

Preparation of short cDNA libraries and 454 read processing are described in Chapter 3 of this thesis. To identify TSS-3kb RNAs, genomic coordinates of previously uncharacterized short RNAs were compared with all mouse genes annotated in the UCSC known gene and RefSeq databases (http://genome.ucsc.edu/; Pruitt et al. 2005; Hsu et al. 2006).
Those short RNAs within ±3kb of an annotated transcription start site (TSS) were defined as TSS-3kb RNAs. Based on this definition, a single TSS-3kb RNA can map to multiple gene annotations for the same gene. The distance to the TSS is defined as the distance from the TSS to the end of the short RNA closest to the TSS.

Estimation of total TSS-3kb associated gene number.

The Petersen estimator (Seber 1982) was used to estimate the number of genes associated with TSS-3kb short RNAs in ES cells, $N$:

$$N = \frac{n_1 n_2}{m_2}$$

where $n_1$ and $n_2$ are the number of TSS-3kb genes sampled in the first and second libraries, respectively, $m_2$ is the number of TSS-3kb genes found in both the $n_1$ and $n_2$ libraries, and $N$ is the estimated total of TSS-3kb genes. The J1 and Jlaza libraries contained 582 ($n_1$) and 819 ($n_2$) TSS-3kb genes, respectively. 115 of these genes overlap between the two independent samplings ($m_2$), suggesting that the total cellular pool of TSS-3kb associated genes is approximately 4,100.

CpG island overlap, first nucleotide analysis, and Gene Ontology analysis.

CpG island and CAGE coordinates were downloaded from the UCSC genome database (Karolchik et al. 2003; Karolchik et al. 2004). Genes with transcription start sites located within 1kb of CpG islands were counted as overlapping. To create background sets of genes, for each TSS-3kb short RNA a gene on the same chromosome was selected randomly from the list of UCSC known gene and RefSeq annotations. This randomization was run 100 times. The CpG island association for the background set was calculated by taking the average of the 100 random runs, and the error is defined as the standard deviation of the 100 random sets. All UCSC known gene and RefSeq annotations were used in the determination of the first nucleotide of protein-coding genes.
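As an arithmetic check on the gene-number estimate given in the Methods above, the following minimal Python sketch applies the two-sample capture-recapture formula to the library counts reported in the text. The function name `petersen_estimate` is hypothetical and used only for illustration; only the three counts (582, 819, and 115) come from the text.

```python
def petersen_estimate(n1: int, n2: int, m2: int) -> float:
    """Two-sample capture-recapture estimate, N = n1 * n2 / m2.

    n1, n2: number of genes sampled in the first and second libraries.
    m2:     number of genes found in both libraries.
    """
    if m2 <= 0:
        raise ValueError("m2 must be positive: the libraries must overlap")
    return n1 * n2 / m2


# Counts reported in the text for the two independent libraries.
estimate = petersen_estimate(582, 819, 115)
print(round(estimate))  # 4145
```

The result, about 4,145 genes, agrees with the rounded figure of approximately 4,100 quoted in the text.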
Gene ontology analysis was performed using GOstat, comparing to all genes in the mouse GO database and using the Benjamini method for multiple testing correction (Beissbarth and Speed 2004).

ES cell expression analysis.

Data from previously published mouse ES cell microarrays were used for expression analysis (Ivanova et al. 2002; Hailesellasse Sene et al. 2007). The data from Hailesellasse Sene and colleagues consisted of 3 replicates per chip with present, absent and marginal calls defined for each gene in each replicate (Hailesellasse Sene et al. 2007). For this analysis the three replicates were combined; each gene with three present calls was defined as present, each gene with three absent calls was defined as absent, and each gene with different calls or three marginal calls was excluded from the analysis. For the data collected by Ivanova and colleagues, we utilized the present and absent calls published in the manuscript (Ivanova et al. 2002). For both data sets, each probe id in the Affymetrix data set was associated with the appropriate gene symbol. If more than one probe id was associated with a given gene symbol, then the probe with the highest expression was used to define the gene symbol.

Comparison to human ChIP-chip.

The mm7 TSS-3kb coordinates were mapped to human assembly hg17 using liftover (http://genome.ucsc.edu/). Of the 3,372 mouse coordinates, 2,000 mapped to the human genome and 1,599 of these were present on the ChIP-chip array (Lee et al. 2006). All transcripts assigned an Entrez-gene ID were used to determine if the TSS-3kb RNA was within ±3kb of a TSS in both the human and mouse genomes (NCBI; Boyer et al. 2005). Homologene was used to determine if each mouse TSS-3kb associated gene mapped uniquely to a human gene (Wheeler et al. 2002; http://www.ncbi.nlm.nih.gov/HomoloGene). 85% of these genes mapped to human genes that were proximal to a remapped human TSS-3kb short RNA.
Overlap was counted if a TSS-3kb associated gene fell within 1kb of an enriched region for the factors described (Lee et al. 2006; Guenther et al. 2007). A background set of short RNAs was created to compare the enrichment of TSS-3kb RNA genes in H3K4me3, RNA Pol II, and Suz12 with what would be expected from random chance. For each TSS-3kb RNA, the distance from the short RNA to the nearest promoter was determined, a random gene was selected from the mouse genome, and the sequence of equal distance from that gene was selected. The coordinates of this random sequence were then mapped to the human genome assembly hg17 using liftover. Randomizing in mouse instead of in human was done to remove any bias arising from the liftover process. We then determined if the human gene proximal to this random sequence was enriched for H3K4me3, RNA Pol II and Suz12 as above. The enrichment for factor binding was calculated by determining the enrichment mean from 100 random data sets, and the error is the standard deviation from the 100 data sets.

Figure Legends

Figure 1. Distribution of short RNAs around transcription start sites of known genes. (A) Shown is a metagene profile of all TSS-3kb RNAs and their associated distances to UCSC known gene or RefSeq gene transcription start sites. Counts of TSS-3kb RNA start positions relative to gene transcription start sites are binned in 50 nucleotide windows. Red and blue bars represent bins of TSS-3kb RNAs in the sense and anti-sense orientation with respect to gene transcription, respectively. (B) Metagene profiles for the 67% of all TSS-3kb RNAs that map uniquely to the genome or for the 24% that map to a single gene annotation. (C) Metagene profiles of TSS-3kb short RNAs in individual cDNA libraries.

Figure 2. TSS-3kb RNA length and 1st nucleotide distributions. (A) The length distribution of the TSS-3kb short RNA sequences.
(B) 1st nucleotide distribution for the TSS-3kb short RNAs, TSS-3kb associated genes, and all CAGE tags in the mouse genome.

Figure 3. CpG island overlap (A) and significantly enriched GO terms (B) for TSS-3kb associated genes.

Figure 4. TSS-3kb genes tend to associate with features of active transcription. (A) Metagene profiles of TSS-3kb short RNA locations relative to genes with detectable (present) or undetectable (absent) expression in ES cells. (B) Fraction of TSS-3kb associated genes enriched in protein-bound chromatin fragments compared to a random background.

[Figure 1. Metagene histograms of sense and anti-sense TSS-3kb RNAs plotted against distance to the TSS (-3000 to 3000); panels include unique sites, the J1 and Jlaza libraries, and Dicer -/- and Dicer +/+ cells.]

[Figure 2. (A) Histogram of read length. (B) First nucleotide frequencies, tabulated below.]

Sequence set                   A     C     G     U
First nt TSS-3kb RNAs          0.27  0.24  0.25  0.24
First nt genes with TSS-3kb    0.25  0.21  0.46  0.09
First nt CAGE elements         0.16  0.51  0.13  0.20

[Figure 3. (A) Bar chart comparing TSS-3kb genes, all genes, and the random gene set. (B) Enriched GO terms (RNA metabolism, protein metabolism, nucleotide metabolism, cellular metabolic processes) plotted against significance of enrichment (p-value = 1e-X).]

[Figure 4. (A) Profiles plotted against distance to the TSS (-3000 to 3000). (B) Bar chart comparing TSS-3kb genes and the random set for H3K4me3, Pol II, and Suz12.]

Adelman, K., Marr, M.T., Werner, J., Saunders, A., Ni, Z., Andrulis, E.D., and Lis, J.T. 2005.
Efficient release from promoter-proximal stall sites requires transcript cleavage factor TFIIS. Mol Cell 17(1): 103-112.
Aida, M., Chen, Y., Nakajima, K., Yamaguchi, Y., Wada, T., and Handa, H. 2006. Transcriptional pausing caused by NELF plays a dual role in regulating immediate-early expression of the junB gene. Mol Cell Biol 26(16): 6094-6104.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T. et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1): 25-29.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Beissbarth, T. and Speed, T.P. 2004. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20(9): 1464-1465.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G. et al. 2005. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122(6): 947-956.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C. et al. 2005. The transcriptional landscape of the mammalian genome. Science 309(5740): 1559-1563.
Galburt, E.A., Grill, S.W., Wiedmann, A., Lubkowska, L., Choy, J., Nogales, E., Kashlev, M., and Bustamante, C. 2007. Backtracking determines the force sensitivity of RNAP II in a factor-dependent manner. Nature 446(7137): 820-823.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. 2007. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17(6): 877-885.
Gromak, N., West, S., and Proudfoot, N.J. 2006. Pause sites promote transcriptional termination of mammalian RNA polymerase II. Mol Cell Biol 26(10): 3986-3996.
Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A.
2007. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130(1): 77-88.
Hailesellasse Sene, K., Porter, C.J., Palidwor, G., Perez-Iratxeta, C., Muro, E.M., Campbell, P.A., Rudnicki, M.A., and Andrade-Navarro, M.A. 2007. Gene function in early mouse embryonic stem cell differentiation. BMC Genomics 8: 85.
Hsu, F., Kent, W.J., Clawson, H., Kuhn, R.M., Diekhans, M., and Haussler, D. 2006. The UCSC Known Genes. Bioinformatics 22(9): 1036-1046. http://genome.ucsc.edu/.
Ivanova, N.B., Dimos, J.T., Schaniel, C., Hackney, J.A., Moore, K.A., and Lemischka, I.R. 2002. A stem cell molecular signature. Science 298(5593): 601-604.
Izban, M.G. and Luse, D.S. 1993a. The increment of SII-facilitated transcript cleavage varies dramatically between elongation competent and incompetent RNA polymerase II ternary complexes. J Biol Chem 268(17): 12874-12885.
Izban, M.G. and Luse, D.S. 1993b. SII-facilitated transcript cleavage in RNA polymerase II complexes stalled early after initiation occurs in primarily dinucleotide increments. J Biol Chem 268(17): 12864-12873.
Kapranov, P., Cheng, J., Dike, S., Nix, D.A., Duttagupta, R., Willingham, A.T., Stadler, P.F., Hertel, J., Hackermuller, J., Hofacker, I.L. et al. 2007. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316(5830): 1484-1488.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J. et al. 2003. The UCSC Genome Browser Database. Nucleic Acids Res 31(1): 51-54.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32(Database issue): D493-496.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res 12(6): 996-1006.
Kireeva, M.L., Hancock, B., Cremona, G.H., Walter, W., Studitsky, V.M., and Kashlev, M. 2005. Nature of the nucleosomal barrier to RNA polymerase II. Mol Cell 18(1): 97-108.
Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. 2004. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36(8): 900-905.
Lee, T.I., Jenner, R.G., Boyer, L.A., Guenther, M.G., Levine, S.S., Kumar, R.M., Chevalier, B., Johnstone, S.E., Cole, M.F., Isono, K. et al. 2006. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125(2): 301-313.
Li, Y.Y., Yu, H., Guo, Z.M., Guo, T.Q., Tu, K., and Li, Y.X. 2006. Systematic analysis of head-to-head gene organization: evolutionary conservation and potential biological relevance. PLoS Comput Biol 2(7): e74.
Muse, G.W., Gilchrist, D.A., Nechaev, S., Shah, R., Parker, J.S., Grissom, S.F., Zeitlinger, J., and Adelman, K. 2007. RNA polymerase is poised for activation across the genome. Nat Genet.
NCBI. Homologene. In.
Ozsolak, F., Song, J.S., Liu, X.S., and Fisher, D.E. 2007. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol 25(2): 244-248.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33(Database issue): D501-504.
Ptashne, M. and Gann, A. 1997. Transcriptional activation by recruitment. Nature 386(6625): 569-577.
Rougvie, A.E. and Lis, J.T. 1990. Postinitiation transcriptional control in Drosophila melanogaster. Mol Cell Biol 10(11): 6041-6045.
Saunders, A., Core, L.J., and Lis, J.T. 2006. Breaking barriers to transcription elongation. Nat Rev Mol Cell Biol 7(8): 557-567.
Seber, G. 1982. The Estimation of Animal Abundance and Related Parameters. Arnold, London.
Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R., Watahiki, A., Nakamura, M., Arakawa, T. et al. 2003.
Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100(26): 15776-15781.
Spencer, C.A. and Groudine, M. 1990. Transcription elongation and eukaryotic gene regulation. Oncogene 5(6): 777-785.
Tolia, N.H. and Joshua-Tor, L. 2007. Slicer and the argonautes. Nat Chem Biol 3(1): 36-43.
Wheeler, D.L., Church, D.M., Lash, A.E., Leipe, D.D., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Tatusova, T.A., Wagner, L. et al. 2002. Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res 30(1): 13-16.
Wind, M. and Reines, D. 2000. Transcription elongation factor SII. Bioessays 22(4): 327-336.
Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M., and Young, R.A. 2007. RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet.

Chapter 5

Examining miRNA function in mouse embryonic stem cells

The experiments described in this chapter were performed with the help of Arvind Ravi, Grace X. Zheng, Stuart S. Levine, and Charlie Whittaker.

Introduction

The work described in the preceding chapters of this thesis suggests that miRNAs are the sole regulatory short RNAs that function through the RNAi pathway in mouse embryonic stem cells. Here, a set of preliminary experiments is described that lay the foundation for future studies of miRNA function in mouse ES cells. Examination of large-scale sequencing data shows that although most miRNA genes express a single mature miRNA species, a minority of miRNA genes are alternatively processed to generate multiple mature miRNAs expected to each carry distinct regulatory potential.
miRNAs were also shown to repress artificial target genes in ES cells in a manner that correlated linearly with their log-2 expression values, suggesting that the most highly expressed miRNAs dominate miRNA-mediated regulation in ES cells. Consistent with this hypothesis, preliminary experiments indicate that miRNA abundance is a useful predictor of endogenous miRNA target repression. Finally, the initial characterization of the growth defect induced by deletion of Dicer in ES cells is described. Steps towards the establishment of an experimental system in which this growth defect can be efficiently studied are also described. mRNA expression profiling of cells with and without functional Dicer shows, surprisingly, that loss of all miRNAs does not significantly alter mRNA expression profiles in ES cells.

Results and discussion

Alternate miRNA processing in ES cells.

The extent of base pairing between the 5' ends of miRNAs and their targets is a critical determinant of miRNA specificity and activity in target repression. 3' end complementarity appears generally less important for target repression (Lewis et al. 2003; Doench and Sharp 2004; Brennecke et al. 2005; Lewis et al. 2005). We therefore examined the variability of ES cell miRNA processing using the cDNA libraries described in Chapter 3 of this thesis. Because of the importance of the 5' end in target recognition, miRNA genes that express significant alternate processing products are expected to regulate additional target genes beyond those predicted by a single mature miRNA species.

The starting point for this analysis was the 153 miRNA genes encoded by a single locus that had at least 30 aligning reads from the 4 ES cell cDNA libraries described in Chapter 3 of this thesis.
Because many mature miRNAs are encoded by multiple genomic locations, and the pre-miRNA hairpins of these paralogous miRNA genes often have sequence differences between genomic locations, it is possible that miRNA paralogues are subject to differential processing. Therefore, mature miRNA end variability for paralogous miRNAs cannot be unambiguously assigned to a single miRNA hairpin, and these miRNA genes were excluded from the analysis described below. Also, hairpins with fewer than 30 aligning reads were not considered in the analysis so as to increase confidence in observed alternate processing products.

The major miRNA produced from 64 of the 153 genes examined differed from the annotated sequence in the current release of the miRNA database (miRBase 10.0) (Griffiths-Jones et al. 2006). A full list of the differences is located at http://luria.mit.edu/caw web/ccr-bcc/mauro/miRBase-diffs.xls. The majority of these differences were 1 or 2 nucleotide additions or deletions at the miRNA 3' end, and thus are not expected to significantly alter miRNA function. 13 miRNA genes showed significant 5' alternate processing compared to the annotated sequence in miRBase 10.0 (see Figure 1B for selected examples). The hairpins encoding miRNAs 292, 363, and 367 each produce two major products from their 3' arms, and the hairpin encoding miR-142 produces two major products from both its 5' and 3' arms. The alternate products produced by miR-142 have also been identified in primary T-cells, indicating these processing events are not ES cell specific (Wu et al. 2007). Because of the importance of 5' sequence in miRNA-based repression, these shifted miRNA pairs are expected to repress expression of different sets of genes, although this hypothesis has not yet been experimentally tested.

Additionally, there were 12 miRNA genes whose major mature miRNA came from the hairpin arm opposite from the annotated sequence in miRBase 10.0.
Interestingly, for the two examples shown in Figure 1C, the ES cell strand bias is also the opposite of what has been observed in a survey of miRNA expression from various mouse tissues (Landgraf et al. 2007). For example, miR-154 has a 9-fold bias towards expression from the 3' arm of its pre-miRNA hairpin in ES cells, whereas the sum of miR-154 expression in various mouse tissues indicates a ~3-fold bias towards 5' arm expression (Landgraf et al. 2007). Similarly, miR-411 has a 9-fold 3' arm bias in ES cells, whereas the tissue panel of mouse miRNA expression indicates a 12-fold 5' arm bias (Landgraf et al. 2007). If these sequencing results are verified by an independent method of miRNA quantification, they would suggest the existence of a novel pathway that directs the differential expression of mature miRNAs from pre-miRNA hairpin arms.

miRNA expression levels correlate with repressive capacity.

Mammalian miRNAs are thought to be equally incorporated into different Argonaute-containing RISCs (Liu et al. 2004), and are presumed to function by translationally inhibiting or cleaving target mRNAs. The large-scale sequencing data described in Chapter 3 show that approximately 300 miRNAs are expressed in ES cells at levels that span approximately 3 orders of magnitude (Figure 2A). It is not yet known if other mammalian cell types express similar numbers of miRNAs. Large-scale sequencing data from 6 different human colorectal cell lines potentially reveal fewer distinct miRNAs expressed per cell (Figure 2B); however, differences in the number of sequences per cDNA library preclude an accurate comparison to the ES cell cDNA libraries (Cummins et al. 2006).

It is currently unclear how the many ES cell miRNAs affect gene expression. Specifically, it is unknown at what level of expression a miRNA is expected to carry significant regulatory capacity.
As a first step towards addressing this question, ES cell miRNAs expressed at varying levels were tested for their ability to repress an artificial miRNA target. Renilla luciferase reporters containing two perfectly complementary miRNA binding sites in their 3' UTRs were co-transfected into Dicer+/+ and Dicer-/- ES cells with an un-targeted firefly luciferase to normalize for transfection efficiency. miRNA-mediated repression was estimated by comparing normalized targeted reporter expression to that of a reporter lacking miRNA-binding sites. The miRNAs tested, along with their estimated ES cell expression from Chapter 3, are shown in Table 1. In Dicer+/+ ES cells, reporter expression correlated inversely with the log-2 transformation of miRNA expression values, consistent with the hypothesis that miRNA expression levels correlate with repressive capacity (Figure 3A). This inverse correlation was absent in Dicer-/- ES cells, indicating it is dependent on miRNA expression (Figure 3B). Further, the correlation coefficient between reporter repression and miRNA expression increased when Dicer+/+ was normalized to Dicer-/- relative expression (Figure 3C). The observed increase in correlation suggests that comparison between the Dicer+/+ and Dicer-/- cell lines introduces an additional level of normalization, perhaps normalizing for variability in DNA plasmid quality. Together, these results show that the most highly expressed miRNAs have the largest regulatory capacity in ES cells.

Although the correlation between miRNA expression and reporter repression is high, there are apparent exceptions. miR-467a is expressed at levels similar to miR-293 and miR-295, but the miR-467a reporter is repressed 3-5 fold less than the miR-293 and 295 reporters (Table 1). Similarly, miR-669a is expressed at similar levels to miR-21, but represses reporter expression approximately 4-fold less effectively (Table 1).
The significance of these differences in repressive capacity is currently unclear. One potential hypothesis is that miR-467a and 669a have many more endogenous targets than miRNAs expressed at similar levels, and so fewer active RISCs containing these miRNAs are available for repression of the transfected reporter. Alternatively, because the reporter assay appears to be predominantly measuring mRNA cleavage and not translational repression (see explanation below), miR-467a and 669a may not be loaded as efficiently into Ago2-containing complexes, which are the only RISCs capable of catalyzing complementary mRNA cleavage.

Because most miRNAs likely function by binding with imperfect complementarity to their target mRNAs, selected miRNAs from Table 1 were also tested for their ability to repress reporters containing two imperfect binding sites in their 3' UTRs (Table 1, underlined rows; Figure 3, red dots). The apparent dynamic range of repression in this assay was low compared to the assay done with perfect reporters. Only the miR-293 and 295 bulged sites induced more than 2-fold repression, again suggesting that only the most abundant miRNAs have the potential to significantly alter target gene expression. Reporters with imperfectly complementary sites to miRNAs 16 and 21 were approximately 3-fold less repressed compared to reporters with perfectly complementary sites. Similarly, reporters with imperfectly complementary sites to miRNAs 293 and 295 were approximately 6-fold less repressed compared to reporters with perfectly complementary sites. These differences suggest that the repression of reporters with perfect miRNA binding sites is predominantly due to the perfectly matching miRNA and not related miRNAs with matching seeds.

miRNA abundance is likely a useful predictor of endogenous target repression.
Given the correlation between miRNA expression levels and repressive capacity, we examined whether the differential expression of luciferase reporters containing endogenous 3' UTRs between Dicer conditional and Dicer knockout ES cells would correlate with the abundance of the miRNAs targeting those UTRs. For this analysis, twelve 3' UTRs spanning a range of targeting abundance (defined below) were cloned into the 3' UTR of Renilla luciferase and assayed for their ability to confer repression as in Figure 3C. The assayed 3' UTRs are listed in Table 2. miRNA:mRNA pairs listed in the TargetScan database were used to compile a list of miRNA binding sites per 3' UTR (Grimson et al. 2007). The database contains all instances of miRNA "seed" matches to annotated RefSeq 3' UTRs, defined as perfect complementarity between the mRNA and bases 2-7 of the miRNA. To determine targeting abundance, the number of miRNA molecules per ES cell was summed for all annotated seed matches over each 3' UTR. To reduce noise from low abundance miRNAs, only miRNAs expressed at greater than 200 copies per ES cell were considered. When all binding sites to these miRNAs were considered, no strong correlation between targeting abundance and repression was observed for the UTRs tested (Figure 4A). Surprisingly, when only conserved miRNA binding sites were considered (i.e. those conserved in human, mouse, rat, and dog), the correlation coefficient between repression and targeting abundance, as assessed via linear regression analysis, increased about 10-fold (Figure 4B). These results are preliminary, and more UTRs need to be tested before they can be interpreted with confidence; however, a potentially similar difference in functionality between conserved and non-conserved sites was noted in previous studies analyzing miRNA-mediated mRNA down-regulation (Lim et al. 2005; Grimson et al. 2007; Nielsen et al. 2007).
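The targeting abundance calculation described above can be sketched in code. The following is a minimal illustration under stated assumptions, not the analysis code used for this work: the function names, the toy UTR, the toy miRNA names, and the copy numbers are hypothetical; seed matches are found by direct string search rather than taken from the TargetScan annotation; and the log2 option reflects the "log2 mpc" column headings of Table 2, whereas the plain sum follows the description in the text.

```python
import math

# Watson-Crick complements in the RNA alphabet.
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna):
    """UTR sequence (5'->3') perfectly complementary to miRNA bases 2-7."""
    return reverse_complement(mirna[1:7])


def count_sites(utr, site):
    """Count occurrences of a site in a UTR, allowing overlaps."""
    width = len(site)
    return sum(1 for i in range(len(utr) - width + 1) if utr[i:i + width] == site)


def targeting_abundance(utr, mirnas, min_copies=200, log_transform=True):
    """Sum miRNA abundance over all seed matches in one UTR.

    mirnas maps a name to (sequence, copies per cell). miRNAs at or below
    min_copies are ignored, mirroring the 200-copy cutoff in the text.
    """
    total = 0.0
    for sequence, copies in mirnas.values():
        if copies <= min_copies:
            continue
        weight = math.log2(copies) if log_transform else float(copies)
        total += count_sites(utr, seed_site(sequence)) * weight
    return total


# Hypothetical inputs for illustration only.
toy_utr = "AAAUACCUCAAAGCUGCUAAA"
toy_mirnas = {
    "toy-miR-A": ("UGAGGUAGUAGGUUGUAUAGUU", 1024),  # one seed match, above cutoff
    "toy-miR-B": ("UAGCAGCACGUAAAUAUUGGCG", 100),   # one seed match, below cutoff
}

abundance_log2 = targeting_abundance(toy_utr, toy_mirnas)
abundance_raw = targeting_abundance(toy_utr, toy_mirnas, log_transform=False)
print(abundance_log2, abundance_raw)
```

Restricting the calculation to conserved sites would amount to passing in only those sites annotated as conserved, which requires the database annotation and is not attempted in this sketch.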
In the assay described in Figure 4, the simplest interpretation of a difference in normalized reporter expression between Dicer conditional and Dicer knockout ES cells is that the inserted UTR represses luciferase expression in a miRNA-dependent manner. The high degree of correlation between miRNA expression and repression described in Figure 3 argues that this simple interpretation is at least partially valid; however, the UTRs assayed in Figure 4 ranged in length between 1,000 and 1,700 base pairs, and so potentially contain non-miRNA sequence elements that may be differentially affected between Dicer conditional and Dicer knockout ES cells. These putative non-miRNA effects may at least partially explain the imperfect correlation between conserved targeting abundance and WT/KO expression ratio.

Characterizing the phenotypic effects of ES cell Dicer deletion.

It is clear from previously published work and the results presented in this thesis that ES-like cells can survive in the absence of Dicer (Kanellopoulou et al. 2005; Murchison et al. 2005). However, although the cells that survive Dicer deletion appear ES-like in morphology and express normal levels of the pluripotency markers Oct4 and Nanog, they are incapable of differentiating in all assays tested and so are no longer functionally stem cells (Kanellopoulou et al. 2005; Murchison et al. 2005; Leung et al. 2006). This loss of pluripotency is likely due in part to a requirement for miRNAs that are expressed in differentiated cell types. However, miRNAs almost certainly have cell-autonomous functions in ES cells, as ES cells express a specific set of miRNAs (Houbaviy et al. 2003), and Dicer deletion results in an acute loss in ES cell growth rate (described in Figure 5 below). The growth rate of clonal ES cell lines was monitored for three weeks immediately following deletion of Dicer activity with Cre recombinase.
Three different Dicer conditional ES cell lines were used in the analysis: the Dicer conditional ES cell line described in Chapter 3, and two Dicer conditional ES lines derived from a cross between floxed Dicer mice (Harfe et al. 2005) and Dicer(fl/+)/Cre-ERT2 mice, referred to below as cell lines 2FA and 2FC. To monitor growth rate, Dicer conditional ES lines were transfected with Cre recombinase and plated at clonal density 24 hours post-transfection, and 6 single colonies were picked from each cell line for further analysis. Because of the clonal selection, the need to grow small numbers of ES cells on feeder cells, and the growth delay induced by Dicer loss, reliable growth counts were not obtained for all cell lines until 15 days post Cre-treatment. At final genotyping, 5 Dicer deletion lines were obtained, 3 from parental line 2FA and 2 from parental line 2FC. No deletion lines were obtained from the Chapter 3 Dicer conditional ES cell line; however, the growth recovery kinetics of the 2FA and 2FC Dicer deletion lines were similar to what was previously observed for deletion lines derived from that line (J.M.C. and P.A.S., unpublished). The 5 deletion lines were compared in growth rate and morphology to 9 ES cell colonies picked from the same transfections that remained Dicer conditional, as assessed via genotyping PCR (not shown). The averaged growth rate of the deletion lines was approximately 2- to 3-fold lower than that of the conditional lines, up to 18 days post Cre-treatment (Figure 5A). This difference in growth rate did not appear to be due to excessive cell death, as the Dicer deletion lines had normal ES cell morphology (Figure 5B for selected examples). After Day 18, the deletion lines showed a coordinated growth recovery, and at 3 weeks post Cre-treatment, their average doubling time was 21 hours, compared with 16 hours for the control Dicer conditional ES lines (Figure 5A).
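For reference, a doubling time of the kind reported above can be estimated from two cell counts under an assumption of exponential growth. The sketch below is illustrative only: the function name and the example counts are hypothetical, and subtracting counts from feeder-only wells is an assumed way of correcting for the MEF contribution to each count.

```python
import math


def doubling_time_hours(count_start, count_end, elapsed_hours,
                        mef_start=0.0, mef_end=0.0):
    """Estimate hours per doubling from two counts, assuming exponential growth.

    mef_start and mef_end are counts from feeder-only wells at the same time
    points; they are subtracted to approximate the ES cell number (assumption).
    """
    es_start = count_start - mef_start
    es_end = count_end - mef_end
    if es_start <= 0 or es_end <= es_start:
        raise ValueError("counts must be positive and increasing after MEF correction")
    # N(t) = N0 * 2^(t / T)  =>  T = t * ln(2) / ln(N(t) / N0)
    return elapsed_hours * math.log(2) / math.log(es_end / es_start)


# Hypothetical example: an 8-fold increase in ES cells over 48 hours
# corresponds to three doublings.
example = doubling_time_hours(2.0e4, 9.0e4, 48.0, mef_start=1.0e4, mef_end=1.0e4)
print(round(example, 1))
```

Under this model, the 21-hour and 16-hour doubling times quoted above correspond to roughly a 4.9-fold and an 8-fold increase in cell number over 48 hours, respectively.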
The near-simultaneous recovery of several clonal deletion lines argues against bypass of the growth defect by an acquired DNA mutation, as one would expect such mutations to arise at random times throughout the 3-week time course. Instead, it seems more likely that an accumulated change in signaling or epigenetic state results in the growth recovery of Dicer deletion ES cells.

Because a clear understanding of the growth defect induced by Dicer deletion is important for understanding ES cell miRNA function, attempts were made to establish a system in which Dicer deletion could be induced efficiently in a population of ES cells. Such a system would allow a characterization of the acute effects induced by ES cell miRNA loss. ES cell lines were derived that were homozygous for the floxed allele of Dicer used throughout this thesis (Harfe et al. 2005), and heterozygous for an allele conferring tamoxifen-inducible Cre expression. This tamoxifen-inducible allele expresses Cre fused to the ligand-binding domain of the estrogen receptor (ER), such that until the ER ligand tamoxifen is added to cells, Cre remains anchored nonfunctionally in the cytoplasm. This Cre-ER fusion has been shown to induce deletion of various floxed alleles upon tamoxifen addition at close to 100% efficiency in ES cells (Vallier et al. 2001). Dicer(fl/+) mice harboring a single integration of the Cre-ER fusion allele in the Rosa26 locus (constructed by M. E. McLaughlin in the laboratory of T. Jacks) were obtained from M.S. Kumar (laboratory of T. Jacks) and crossed to floxed Dicer mice to generate ES cell lines. Treatment of several different Cre-ER ES cell lines with varying levels of tamoxifen resulted in inefficient deletion of Dicer, such that no population-wide growth defect was detectable. At the highest level of tamoxifen treatment, 1.5 µM for 3 days, approximately 50% deletion of the floxed Dicer allele was detectable (A. Ravi, not shown).
This poor efficiency of Dicer deletion compared to previously published (Vallier et al. 2001) and unpublished (M.E. McLaughlin) studies using the same Cre-ER allele is likely due to the strong selection against Dicer loss in ES cells coupled with the low level of Cre-ER expression driven by the Rosa26 promoter. Higher deletion frequency may be achieved in future studies by expressing the Cre-ER fusion protein from a strong ES cell promoter, such as the synthetic CAGGS promoter (Niwa et al. 1991).

Co-transfection of Cre and GFP followed by sorting of transfected cells was a much more effective strategy for Dicer deletion, resulting in polyclonal Dicer deletion populations that were close to 90% pure. Dicer conditional ES cells were transfected and sorted for GFP-positive cells 24 hours after transfection. PCR genotyping, Dicer western blotting, and short RNA northern blotting for an abundant ES cell miRNA indicate that FACS-sorted ES cells have approximately a 90% reduction in miRNA levels 5 days post-Cre transfection (Figure 6A). Consistent with the described growth defect of Dicer deletion cells, there is selection against Dicer loss after day 5 post-Cre transfection. At 9 days post-Cre, miRNA and Dicer protein levels have almost returned to pre-deletion levels (Figure 6A). It is currently unclear whether many Dicer null ES cells die after day 5, or whether they are simply out-competed by faster growing Dicer conditional cells.

To gain insight into the potential change in ES cell identity induced by Dicer loss, microarray analysis was performed on mRNA from days 0, 5 and 9 post-Cre transfection (Figure 6A), and from 3 clonal Dicer null ES lines cultured for several months after Dicer deletion (Figure 6B). Replicate microarrays have not yet been performed, and thus differences between expression of individual genes cannot be interpreted from the arrays shown in Figure 6B. However, it is clear that loss of Dicer from ES cells does not result in a major change in ES cell state.
The six expression profiles are remarkably similar, with the two most divergent samples having a Pearson correlation of 0.95 (Figure 6C). These results are preliminary but intriguing, suggesting that miRNAs do not have an essential role in governing ES cell expression profiles. Rather, specific transcription factors, likely Oct4, Sox2, Nanog, and potentially others, appear to be the dominant regulators of ES cell identity. Considering these results, it seems likely that the function of the major miRNAs expressed in ES cells is to fine-tune the hard-wired output of these transcription factors, potentially as a means of optimizing the signaling pathways that govern rapid self-renewal.

Methods

Plasmid construction and transfection. 2x miRNA sites or single 3' UTRs were cloned into the XhoI-ApaI or XhoI-NotI sites in the 3' UTR of pRL-CMV-FLAG (Petersen et al. 2006). Transfections were performed as in (Calabrese and Sharp 2006). 2x bulged miRNA reporters were head-to-tail insertions of 2 copies of the sequence complementary to the annotated miRNA with an imperfect match from bases 9-13, usually composed of TTTTT. 2x perfect reporters were head-to-tail insertions of 2 copies of the sequence complementary to the annotated miRNA. For 3' UTR reporters, sequence was obtained from the TargetScan database, and primers were designed that amplified all miRNA sites in each UTR, extending into flanking genomic sequence when necessary. Exact sequences of DNA oligos used to construct reporter plasmids are available upon request.

ES cell culture and biological assays. ES cell culture and northern blots were performed as in (Calabrese and Sharp 2006). Western blots and genotyping were performed as in (Calabrese et al. 2007). RNA was prepared using Trizol (Invitrogen). For cell counts, ES cells were plated in 24-well plates on top of approximately 1 x 10^4 lethally irradiated MEFs, and counted using a hemacytometer.
To estimate MEF contribution, MEF-only wells were also counted at each trypsinization. For array analysis, 5 µg of RNA from each time point was hybridized to the Affymetrix 430 2.0 chip. Expression data were summarized using GCRMA (http://www.bioconductor.org), and hierarchical clustering was performed using Spotfire software.

Table and Figure legends.

Table 1. 2x perfect miRNA reporters used in Figure 3. Also shown are the molecules of each miRNA per Dicer conditional ES cell, along with the relative reporter expression in Dicer conditional compared to Dicer knockout ES cells (WT/KO) and the inverse of this ratio, or fold repression of each miRNA reporter (KO/WT). The miR-467a and miR-669a constructs were made by G.X. Zheng.

Table 2. 3' UTR reporters used in Figure 4, along with per-UTR targeting abundance, considering either all binding sites (All sites log2 mpc), or only those sites conserved in mouse, rat, human, and dog (Conserved log2 mpc). All summations only consider those miRNAs expressed at greater than 200 copies per ES cell. Also shown is the relative reporter expression in Dicer conditional compared to Dicer knockout ES cells (WT/KO) and the inverse of this ratio, or fold repression of each 3' UTR reporter (KO/WT).

Figure 1. Notable examples of alternate miRNA processing events from ES cell cDNA library analysis. The miRNA name is shown below each hairpin, with the total number of matching reads from the 4 ES cell cDNA libraries shown in parentheses. (A) miRNAs 15b and 106a are examples of miRNAs that have canonical processing patterns. For each hairpin, one major miRNA species is produced that starts and ends at a defined location. miRNAs originating from the opposite strand of each hairpin, termed miRNA* species, are detectable but represent a minority of accumulated sequences. Percentages indicate the fraction of total reads matching the highlighted sequence. (B) miRNA genes that produce multiple mature miRNAs from the same hairpin arm. Percentages indicate the fraction of total reads initiating at the indicated nucleotide. (C) miRNA genes whose major mature sequences expressed in ES cells are on the opposite hairpin arm of the major annotated species. Percentages indicate the fraction of total reads matching the highlighted sequence.

Figure 2. Distinct miRNAs expressed in ES cells. (A) The number of distinct miRNAs expressed in ES cells at the indicated copy number. (B) The number of distinct miRNAs detected in 6 human colorectal cancer cell lines (Cummins et al. 2006) compared to the ES cell libraries from Chapter 3 of this thesis. "Library size" refers to the number of sequence reads per cDNA library. In the case of the colorectal libraries, this number had to be approximated by dividing the total number of reported reads (266,430) by the number of libraries analyzed (6).

Figure 3. miRNA repressive capacity correlates with copy number. R^2 values refer to the linearity of relative 2x perfect reporter expression with the log2 of miRNA copy number in Dicer conditional ES cells. (A) Ratio of 2x miRNA reporter expression divided by no-site reporter expression in Dicer conditional ES cells. (B) Ratio of 2x miRNA reporter expression divided by no-site reporter expression in Dicer knockout ES cells. (C) Ratio of relative 2x miRNA reporter expression in Dicer conditional ES cells divided by relative 2x miRNA reporter expression in Dicer knockout ES cells.

Figure 4. Relative expression (WT/KO) for the 3' UTR luciferase constructs in Table 2, compared to the per-UTR summation of the log2 transformation of the molecules per ES cell for seed-matching miRNAs. "Cumulative UTR mpc" refers to the sum of the molecules per cell for miRNAs with seed matches to the individual UTRs tested. (A) All seed matches. (B) Only those seed matches conserved in mouse, rat, human, and dog.

Figure 5. Doubling time and morphology of Dicer conditional ES cells after transfection with Cre recombinase. (A) Hours per doubling of clonal ES cell lines picked after Cre transfection.
The blue graph shows doubling times of cells successfully deleted for Dicer and the red graph cells that remained conditional for Dicer after Cre transfection. Growth rates are binned in two-day windows counting after Cre transfection. (B) Morphology and doubling times for selected knockout or conditional ES cell lines at day 15 post-Cre transfection. Circles highlight ES cell colonies. Knockout cells grow slowly but have a characteristic ES cell morphology.

Figure 6. Expression analysis of Dicer conditional ES cells immediately following, and several months after Dicer deletion. (A) (i) Genotyping PCR indicating the relative quantities of floxed and deleted Dicer alleles in a sorted population of ES cells at the indicated days following Cre transfection. (ii) Dicer western blot analysis of the cell populations analyzed in panel (i). The bottom band is a non-specific hybridization that does not reproducibly appear using this antibody. (iii) Short RNA northern blot analysis of RNA from the cell populations analyzed in panel (i). Quantification of miR-292 relative to Day 0 cells is shown below the blots. The work shown in (A) was performed by Arvind Ravi during a summer rotation project. (B) Dendrogram of Affymetrix microarray expression analysis of RNA from Days 0, 5, and 9 post-Cre transfection, and from three clonal Dicer knockout lines cultured for approximately 2 months after Dicer deletion. KO #11 is the cell line that was analyzed for short RNA expression in Chapter 3. (C) Pearson correlation coefficients between expression profiles in (B). The microarrays were performed in collaboration with S. S. Levine (laboratory of R. Young) and panels (B) and (C) were made by C. Whittaker.

Table 1.

miRNA    Mol. per cell    Rel. exp. (WT/KO)    Fold rep. (KO/WT)
293      5069             0.05                 19.8
467a     4435             0.16                 6.4
295      4082             0.03                 33.8
19b      3918             0.06                 15.4
669a     3872             0.29                 3.5
21       2272             0.37                 2.7
16       1199             0.21                 4.8
18a      528              0.41                 2.4
669d     431              0.56                 1.8
34a      384              0.54                 1.9
466k     172              0.72                 1.4
466f     147              0.78                 1.3
485      123              0.71                 1.4

Table 2.

3' UTR     All sites log2 mpc    Conserved log2 mpc    Rel. exp. (WT/KO)    Fold rep. (KO/WT)
DEK        10.9                  0.7                   0.84                 1.2
NUDCD1     11.1                  0.7                   1.02                 1
ABCC5      15.2                  7.6                   0.73                 1.4
M6PR       17.5                  0.0                   0.62                 1.6
PTPN9      23.7                  8.3                   0.96                 1
TMEM168    24.4                  19.0                  0.93                 1.1
TOMM70A    41.1                  1.4                   1.2                  0.8
DAZAP2     45.1                  44.6                  0.28                 3.6
YPEL5      63.0                  13.0                  0.78                 1.3
SCML2      67.4                  23.1                  0.61                 1.7
LATS2      68.0                  46.1                  0.15                 6.7
ELL2       145.8                 22.2                  0.82                 1.2

[Figure 1: miRNA hairpin diagrams. Hairpins shown, with total matching reads: miR-15b (1158), miR-106a (4444), miR-142 (307), miR-292 (12207), miR-367 (725), miR-363 (876), miR-154 (2405), miR-411 (1648).]

[Figure 2: (A) Bar chart of the number of distinct miRNAs in each copy-number bin; bins are >1000, 250-999, 50-249, and 0-49 miRNA molecules per ES cell. (B) Table below.]

Library    # of miRNAs    Library size    Reads per distinct miRNA    Distinct miRNAs per read
J1         306            104220          341                         0.003
Dicer      254            45320           178                         0.006
380N       156            44405           285                         0.004
380T       122            44405           364                         0.003
309N       151            44405           294                         0.003
309T       126            44405           352                         0.003
CACO-2     135            44405           329                         0.003
SW480      125            44405           355                         0.003

[Figure 3: Three scatter plots of relative reporter expression against molecules per cell (log scale, 1 to 10,000), with 2x perfect and 2x bulged reporters plotted as separate series. Panels: Dicer conditional cells, R^2 = 0.76; Dicer knockout cells, R^2 = 0.13; Combined, R^2 = 0.89.]

[Figure 4: Two scatter plots of relative expression against cumulative UTR mpc. Panels: All sites, R^2 = 0.046; Conserved sites, R^2 = 0.652.]

[Figure 5: (A) Hours per doubling for knockout and conditional (Cond) lines, binned at days 15/16, 17/18, 19/20, and 21/22 post Cre. (B) Micrographs of selected lines, labeled 43, 49, and 22 hours per doubling.]

[Figure 6: (A) (i) Genotyping PCR with deleted and floxed bands; (ii) western blot for Dicer and GAPDH; (iii) northern blot for the U6 control, pre-miR-292, and miR-292. (B) Dendrogram of expression profiles. (C) Pearson correlation matrix, color scale 0.95 to 1.00.]

References

Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. 2005. Principles of microRNA-target recognition. PLoS Biol 3(3): e85.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci USA 104(46): 18097-18102.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12): 2092-2102.
Cummins, J.M., He, Y., Leary, R.J., Pagliarini, R., Diaz, L.A., Jr., Sjoblom, T., Barad, O., Bentwich, Z., Szafranska, A.E., Labourier, E. et al. 2006. The colorectal microRNAome. Proc Natl Acad Sci USA 103(10): 3687-3692.
Doench, J.G. and Sharp, P.A. 2004. Specificity of microRNA target selection in translational repression. Genes Dev 18(5): 504-511.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34(Database issue): D140-144.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27(1): 91-105.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev 19(4): 489-501.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M. et al. 2007. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129(7): 1401-1414.
Leung, A.K., Calabrese, J.M., and Sharp, P.A. 2006. Quantitative analysis of Argonaute protein reveals microRNA-dependent localization to stress granules. Proc Natl Acad Sci USA 103(48): 18125-18130.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): 15-20.
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. 2003. Prediction of mammalian microRNA targets. Cell 115(7): 787-798.
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, D.P., Linsley, P.S., and Johnson, J.M. 2005. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs.
Nature 433(7027): 769-773.
Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. 2004. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305(5689): 1437-1441.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005. Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad Sci USA 102(34): 12135-12140.
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B. 2007. Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 13(11): 1894-1910.
Niwa, H., Yamamura, K., and Miyazaki, J. 1991. Efficient selection for high-expression transfectants with a novel eukaryotic vector. Gene 108(2): 193-199.
Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress translation after initiation in mammalian cells. Mol Cell 21(4): 533-542.
Vallier, L., Mancip, J., Markossian, S., Lukaszewicz, A., Dehay, C., Metzger, D., Chambon, P., Samarut, J., and Savatier, P. 2001. An efficient system for conditional gene expression in embryonic stem cells and in their in vitro and in vivo differentiated derivatives. Proc Natl Acad Sci USA 98(5): 2467-2472.
Wu, H., Neilson, J.R., Kumar, P., Manocha, M., Shankar, P., Sharp, P.A., and Manjunath, N. 2007. miRNA Profiling of Naive, Effector and Memory CD8 T Cells. PLoS ONE 2(10): e1020.

Conclusions and future directions

This thesis presents a quantitative analysis of ES cell short RNA populations. Short RNA fragments of larger non-coding RNAs are present in ES cells at levels high enough to be detected via standard hybridization-based protocols, and, though they have no known function, are apparently generated by non-random mechanisms. Additionally, Pol II transcription start sites associate with a novel class of low abundance, bidirectionally oriented short RNAs.
Finally, contrary to initial hypotheses based on the functions of RNAi in various non-mammalian organisms, miRNAs appear to be the sole short RNAs expressed through a Dicer-dependent pathway in mouse ES cells. The current challenge is to fully understand how these various short RNAs contribute to ES cell biology.

The sequencing of short RNAs associated with the double-stranded RNA binding protein, P19, led to the observation that short fragments of rRNA were present in ES cells in a non-random distribution along the lengths of the mature 18S, 5.8S, and 28S species. High-throughput sequencing of ES cell short RNAs strengthened this initial observation, showing an almost identical profile of short rRNA fragments in ES cell lines with and without functional Dicer. These short rRNAs are expressed in ES cells at levels easily detectable by short RNA northern blotting, and as a class they are relatively abundant, present at about 9,000 copies per ES cell; however, reporter assays indicate that they do not have miRNA-like repressive capability. The function of these short rRNA species is entirely unknown. Although the reporter assays suggest otherwise, it would be useful to know whether short rRNAs associate with Argonaute proteins in ES cells. It would also be of interest to measure short rRNA half-life in ES cells. A long half-life would potentially suggest the existence of a mechanism for short rRNA stabilization, and may be consistent with an unknown function, whereas a short half-life would perhaps suggest that rRNA fragments represent transient, non-functional degradation products of mature rRNA species. An additional puzzling observation is that short rRNA species associate with P19 in ES cells but not in the human embryonic kidney cell line 293T. Determining whether P19 associates with short rRNA species in other mouse and human cell lines may provide additional insight into their functionality.
High-throughput sequencing also identified a class of low abundance short RNAs that cluster in close proximity to the transcription start sites of protein-coding genes. These RNAs exhibit a strikingly non-random distribution around transcription start sites, flanking a region of promoters that consistently exhibits low nucleosome density (Lee et al. 2004; Giresi et al. 2007; Ozsolak et al. 2007). The existence of these short RNAs raises the possibility that the bidirectional initiation of Pol II at nucleosome-depleted regions may be a previously unobserved, general feature of transcription. Further, because productive Pol II elongation is unidirectional from most promoters, the existence of bidirectionally oriented, promoter-associated short RNAs potentially indicates that additional signals, perhaps the concerted action of P-TEFb and TFIIS, impose the unidirectionality of productive elongation post-initiation.

A host of experiments are needed to verify these hypotheses. Perhaps the most illuminating will be those that prove the existence of bidirectional Pol II transcription at many ES cell genes. The position of an anti-sense Pol II is inferred from the presence of promoter-proximal anti-sense short RNAs; however, physical detection of an anti-sense promoter-proximal Pol II is needed before this conclusion can be definitively made. Additionally, it will be important to experimentally determine the frequency of sense vs. anti-sense initiation. Individual promoter-proximal short RNAs are expressed at an estimated 0.2 copies per cell, and approximately 50% of promoter-proximal short RNAs are in an anti-sense orientation. These observations raise a number of important questions. Is the number of bidirectional initiation events equal to the number of anti-sense short RNAs present in an ES cell? That is, is anti-sense initiation an event that occurs at a single gene in approximately one out of ten ES cells?
Or does anti-sense initiation occur at a higher frequency, with the associated short RNAs either not generated or escaping detection by our cDNA preparation protocol? Moreover, at a single gene, do bidirectionally oriented transcription complexes exist at the same time? These questions await the application of a technique that can detect the polarity of DNA-bound Pol II at single-nucleosome resolution, perhaps via potassium permanganate cleavage assays or chromatin immunoprecipitation and sequencing of Pol II-bound DNA after digestion with micrococcal nuclease.

It will be of great interest to see if promoter-proximal short RNAs can be detected in cell types other than ES cells. Of particular interest would be the short RNAs expressed in Drosophila S2 cells, where recent work has shown the frequent occurrence of Pol II pausing downstream of transcription start sites (Muse et al. 2007; Zeitlinger et al. 2007). It is possible that previously described Pol II pausing phenomena are associated with bidirectional initiation.

The RNAi pathway has a conserved role in the silencing of repeating and transposable elements, in many cases through the action of Dicer-dependent siRNA molecules. Dicer-dependent siRNA species have yet to be documented in mammals, and the work presented in Chapter 3 of this thesis shows no evidence for expression of siRNA-like molecules in ES cells. Highly repetitive short RNAs are expressed in ES cells, but at very low levels, and are thus not expected to be functional. Coupled with the Dicer-dependent repression of repeating elements in other mammalian cell types (Watanabe et al. 2006; Yang and Kazazian 2006; Murchison et al. 2007), these observations raise the possibility that repeat-derived miRNAs may in certain cases function similarly to putative repeat-derived siRNAs: by silencing fully or partially complementary repeating elements.
Preliminary experiments show no consistent deregulation of abundantly encoded LINE and SINE repeats in Dicer knockout ES cells compared to controls (Figure 1), suggesting that miRNAs do not play a major role in the destabilization of LINE- and SINE-derived RNA in this ES cell line. However, this negative result does not preclude a role for repeat-derived miRNAs in the post-transcriptional suppression of complementary repeats, nor does it preclude a role for miRNAs in the silencing of complementary repeats in the early embryo or other developmental stages.

It is unclear to what extent mammalian repeating elements are silenced through an RNAi-based mechanism. The expression of repeat-derived miRNAs provides a tangible mechanism by which the RNAi pathway could repress repetitive element propagation. Additionally, repeat-derived miRNAs could serve as a means to coordinately repress diverse repeat-containing mRNAs. Different classes of transposable elements, as well as fusion transcripts between transposable elements and mRNAs, show differential expression in the earliest stages of mouse development (Peaston et al. 2004). It is possible, then, that repeat-derived miRNAs coordinately regulate expression of target repeat-containing mRNAs in a stage-specific manner during early development. Further, it is possible that other currently undefined classes of short RNAs silence mammalian repeats in early development; alternatively, repetitive sequences may not be under RNAi-based control in mammals, although the existence of mammalian piRNAs suggests that repeating sequences may be repressed via RNAi in the germline. In-depth sequence analysis of short RNA species from early mouse development will be necessary to further evaluate the potential role for RNAi in the silencing of repeating elements.

The large amount of short RNA sequence data from ES cells provided interesting examples of non-canonical miRNA processing events.
Many miRNA genes were found to express overlapping but separate mature miRNAs from the same hairpin arm, potentially increasing the regulatory capacity of these miRNA genes. Further, it was found that certain miRNA genes express major ES cell products originating from the hairpin arm opposite to the one previously observed in other cell types. This observation suggests the existence of a pathway that directs the differential incorporation of mature miRNAs into the RISC. Direct quantification of these differences in ES cells and in differentiated cell types will be the first step towards testing this hypothesis.

Deletion of Dicer from mouse ES cells results in an acute drop in growth rate, followed by an apparent recovery approximately 3 weeks after Dicer deletion. Dicer knockout ES cells are also incapable of differentiation (Kanellopoulou et al. 2005). The observation that Dicer's sole catalytic role in ES cells is to generate miRNAs suggests that these phenotypic effects are entirely due to loss of miRNA expression. Surprisingly, the preliminary data described in Chapter 5, where mRNA expression of ES cells was profiled immediately and several months after Dicer deletion, show few major expression changes between cells with and without Dicer. Together, these observations suggest that the core transcription factors in ES cells dominantly control identity, and abundant ES cell miRNAs tune this transcriptional output to increase the rate of ES cell self-renewal. Additionally, miRNAs are required for the transition from pluripotency to a differentiated cell state. Consistent with the former hypothesis, many of the most abundant ES cell miRNAs have documented roles in cell-cycle control or oncogenesis (He et al. 2005; Si et al. 2006; Voorhoeve et al. 2006; Linsley et al. 2007).
In the most striking example, expression of early embryo-specific miRNAs in primary fibroblasts is sufficient for bypass of Ras-induced senescence, indicating that these miRNAs can manipulate existing expression profiles to induce cell division (Voorhoeve et al. 2006). It is possible that ES cells express a set of miRNAs that primarily function to promote self-renewing cell division, and it would therefore be of great interest to determine if other known stem cell populations express sets of miRNAs similar to those found in mouse ES cells.

miRNA expression levels were found to correlate well with repressive capacity. Surprisingly, only the most abundant ES cell miRNAs, those expressed at greater than 1,000 copies per ES cell, silenced reporter gene expression by 5-fold or more. This observation suggests that, although ES cells express approximately 300 different miRNA species, only the ~30 most highly expressed contribute significantly to miRNA-mediated repression. Consistent with this hypothesis, preliminary experiments indicate that the abundance of the miRNAs targeting specific 3' UTRs is a useful predictor of repression. Notably, this predictive measure only appeared true for miRNA target sites that were conserved between human, mouse, rat, and dog, suggesting that additional factors outside of the miRNA target site are important for repression. It will be of great interest to see if these trends hold true in other cell types. Recently, a quantitative model of miRNA-mediated repression was described that predicts the efficacy of different miRNA target sites, incorporating both target-site-dependent and -independent variables (Grimson et al. 2007). Incorporating endogenous miRNA abundance with this model of target site efficacy may be an accurate way to predict the extent of miRNA-mediated repression for individual genes.
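As a rough illustration of how endogenous miRNA abundance might be combined with a per-site efficacy score, the Python sketch below sums hypothetical efficacy scores over only those target sites whose cognate miRNA exceeds the 1,000 copies-per-cell level noted above. The miRNA names, copy numbers, efficacy values, and the scoring rule itself are assumptions made for illustration; none are taken from the thesis data or from the model of Grimson et al. (2007).

```python
from dataclasses import dataclass


@dataclass
class TargetSite:
    mirna: str       # name of the miRNA predicted to bind this site (placeholder names)
    efficacy: float  # hypothetical site-efficacy score; larger means a stronger site


def abundance_weighted_score(sites, copies_per_cell, min_copies=1000):
    """Sum site efficacy over sites whose miRNA exceeds an abundance threshold.

    min_copies defaults to 1,000 copies per cell, the level above which
    strong reporter repression was observed in the text. The additive
    scoring rule is an assumption of this sketch.
    """
    total = 0.0
    for site in sites:
        # miRNAs absent from the abundance table are treated as not expressed.
        if copies_per_cell.get(site.mirna, 0) > min_copies:
            total += site.efficacy
    return total


if __name__ == "__main__":
    # Invented example: one abundant and one rare miRNA targeting a single 3' UTR.
    copies = {"miR-A": 5000, "miR-B": 200}
    utr_sites = [
        TargetSite("miR-A", 0.5),
        TargetSite("miR-B", 0.25),
        TargetSite("miR-A", 0.25),
    ]
    print(abundance_weighted_score(utr_sites, copies))  # prints 0.75
```

In this toy case only the two sites for the abundant miRNA contribute to the score. A threshold is the simplest way to encode the abundance dependence; a continuous weighting by copy number would be an equally plausible choice.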
From a basic biological standpoint, it is important to place the discoveries and hypotheses generated from mouse ES cells in the context of the early mouse embryo. ES cells are derivatives of the inner cell mass (ICM) of the pre-implantation blastocyst, a cell compartment that exists for only a few hours during mouse development. During its brief existence, the ICM must divide rapidly while remaining pluripotent. Rapid cell division and pluripotency are hallmarks of ES cells, and so the proposed role for ES cell miRNAs in promoting in vitro self-renewal is consistent with a similar role in vivo. However, there are differences between the ICM and ES cells, and so it is possible that certain abundant ES cell miRNAs have no discernible role in ES cells but have critical functions in the ICM, or that certain abundant miRNAs expressed in the ICM are not expressed in ES cells.

Nevertheless, the understanding of in vitro ES cell biology has important clinical implications. The guided differentiation of ES cells into various tissue types raises the possibility that ES-like cells may be a future tissue source in regenerative therapies (Pera and Trounson 2004). Remarkably, four transcription factors, including Oct4 and Nanog, are sufficient to turn differentiated mouse or human cells pluripotent (Takahashi and Yamanaka 2006; Okita et al. 2007; Takahashi et al. 2007; Wernig et al. 2007; Yu et al. 2007). This finding is a large step forward for the eventual application of regenerative therapy. Thus the understanding of ES cell gene regulatory networks, including those governed by short RNAs, will likely have direct clinical applications.

Figure 1. Full-length RNA northern blot probed for GAPDH, LINE, and SINE. The first three lanes represent triplicate collections of Dicer+" ES cell RNA. The following four lanes show RNA from clonally derived deletion lines, followed by a lane loaded with J1 ES cell RNA, and a lane with RNA from NIH 3T3 fibroblasts.
The short RNA expression analysis from Chapter 3 was performed on the Dicer+lV and KO-11 ES lines. Because the SINE RNA probe had such extensive hybridization, the entire blot is shown, with the locations of the 28S and 18S rRNA, and the likely full-length SINE RNA, marked beside the blot. The northern blot was performed as in (Calabrese and Sharp 2006).

[Figure 1: northern blot image; panels labeled GAPDH, LINE, and SINE B1, with the 28S and 18S rRNA positions marked.]

References

Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12): 2092-2102.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. 2007. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17(6): 877-885.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27(1): 91-105.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S., Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J. et al. 2005. A microRNA polycistron as a potential human oncogene. Nature 435(7043): 828-833.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T., Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev 19(4): 489-501.
Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. 2004. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36(8): 900-905.
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R., Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H. et al. 2007. Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol Cell Biol 27(6): 2240-2252.
Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon, G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693.
Muse, G.W., Gilchrist, D.A., Nechaev, S., Shah, R., Parker, J.S., Grissom, S.F., Zeitlinger, J., and Adelman, K. 2007. RNA polymerase is poised for activation across the genome. Nat Genet.
Okita, K., Ichisaka, T., and Yamanaka, S. 2007. Generation of germline-competent induced pluripotent stem cells. Nature 448(7151): 313-317.
Ozsolak, F., Song, J.S., Liu, X.S., and Fisher, D.E. 2007. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol 25(2): 244-248.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7(4): 597-606.
Pera, M.F. and Trounson, A.O. 2004. Human embryonic stem cells: prospects for development. Development 131(22): 5515-5525.
Si, M.L., Zhu, S., Wu, H., Lu, Z., Wu, F., and Mo, Y.Y. 2006. miR-21-mediated tumor growth. Oncogene.
Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., and Yamanaka, S. 2007. Induction of Pluripotent Stem Cells from Adult Human Fibroblasts by Defined Factors. Cell.
Takahashi, K. and Yamanaka, S. 2006. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126(4): 663-676.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P., van Duijse, J., Drost, J., Griekspoor, A. et al. 2006. A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell 124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N., and Imai, H. 2006. Identification and characterization of two novel classes of small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Wernig, M., Meissner, A., Foreman, R., Brambrink, T., Ku, M., Hochedlinger, K., Bernstein, B.E., and Jaenisch, R. 2007. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448(7151): 318-324.
Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol 13(9): 763-771.
Yu, J., Vodyanik, M.A., Smuga-Otto, K., Antosiewicz-Bourget, J., Frane, J.L., Tian, S., Nie, J., Jonsdottir, G.A., Ruotti, V., Stewart, R. et al. 2007. Induced Pluripotent Stem Cell Lines Derived from Human Somatic Cells. Science.
Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M., and Young, R.A. 2007. RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat Genet.

J. Mauro Calabrese
Biographical note

Education
Ph.D. Biology, 2007, Massachusetts Institute of Technology, Cambridge, MA
B.S. Chemistry, Biochemistry, and Molecular Biology, 2001, University of Wisconsin-Madison, Madison, WI

Awards
Phi Beta Kappa
Elvehjem Scholarship for Excellence in Biochemistry
University of Wisconsin Alumni Foundation Scholarship
Order Sons of Italy Scholar

Research Experience
Graduate Student, MIT Department of Biology, 2002-2007
Advisor: Phillip A. Sharp, Ph.D.
Doctoral Thesis: Dicer deletion and short RNA expression analysis in mouse ES cells

Research Assistant, Department of Biochemistry, University of Wisconsin-Madison, 2000-2001
Advisor: Hector Deluca, Ph.D.
Project: Purifying a novel co-activator of the aryl-hydrocarbon receptor from pig lung

Research Assistant, Department of Ophthalmology, University of Wisconsin-Madison, 1999-2000
Advisor: Len Levin, M.D., Ph.D.
Project: Synthesizing a photoactivatable inducer of the Tet-On gene-expression system

Research Assistant, Department of Bioinorganic Chemistry, University of Delaware, 1998
Advisor: Charles Riordan, Ph.D.
Project: Chemical modeling of anaerobic bacterial methanogenesis

Publications
J. Mauro Calabrese*, Amy C. Seila*, Gene W. Yeo, and Phillip A. Sharp. (2007) RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. PNAS 104: 18097-18102.
J. Mauro Calabrese and Phillip A. Sharp. (2006) Characterization of the short RNAs bound by the p19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12: 2092-2102.
Anthony K. L. Leung, J. Mauro Calabrese, and Phillip A. Sharp. (2006) Quantitative analysis of argonaute protein reveals microRNA-dependent localization to stress granules. PNAS 103: 18125-18130.
* denotes equal contribution.

Teaching Experience
Teaching Assistant, 2006
Course Title: Principles of Human Disease
Instructors: David E. Housman and Jacqueline A. Lees

Teaching Assistant, 2004
Course Title: Undergraduate Cell Biology
Instructors: Angelika Amon and Harvey F. Lodish

Presentations
Oral presentation: "In depth analysis of embryonic stem cell short RNA expression: Observations and functional implications." Center for Cancer Research Fall Retreat, Waterville Valley, NH, 2007.
Oral presentation: "In depth analysis of embryonic stem cell short RNA expression." RNA Society conference, Madison, WI, 2007.
Poster presentation: "RNAi and the silencing of mammalian repetitive elements." Genomic Impact of Eukaryotic Transposable Elements, Asilomar, CA, 2006.
Poster presentation: "RNAi and the silencing of mammalian repetitive elements." Keystone Symposium on RNA Interference, Vancouver, Canada, 2005.