Examination of mammalian microRNAs by high-throughput sequencing

By HyoJin Rosaria Chiang

B.S., Molecular Biophysics and Biochemistry and Economics (2005)
Yale University

SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2011

© 2011 Massachusetts Institute of Technology
All rights reserved

Signature of Author: Department of Biology, May 17, 2011
Certified by: David P. Bartel, Professor of Biology, Thesis Supervisor
Accepted by: Alan D. Grossman, Professor of Biology, Chairman, Graduate Committee

Submitted to the Department of Biology on May 17, 2011 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ABSTRACT

Small non-coding RNAs play an important role in a wide range of cellular events. MicroRNAs (miRNAs) are an abundant class of small RNAs that post-transcriptionally repress expression of their target genes. Because miRNA targeting is determined by the miRNA sequence, accurate and comprehensive annotation of miRNA genes is fundamental to understanding miRNA gene regulation. Advances in high-throughput sequencing technology have led to the discovery of novel small-RNA genes and the identification of their properties. We describe a method for constructing small-RNA libraries for the Illumina sequencing platform that improves upon previous efforts. Sequencing data from small-RNA libraries constructed using this protocol can be used to profile small RNAs from a broad range of samples. In particular, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns.
The analysis of the data provides a substantially revised list of confidently identified murine miRNAs, thereby providing a more accurate picture of the general features of mammalian miRNAs and their abundance in the genome. In addition, our results revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan pre-miRNA, cases of consequential 5' heterogeneity, newly identified instances of miRNA editing, and widespread pre-miRNA uridylation reminiscent of Lin28-like miRNA regulation.

Thesis Advisor: David P. Bartel
Title: Professor of Biology

Acknowledgements

I would like to thank my collaborators who have studied murine miRNAs with me, especially Lori Schoenfeld, Wendy Johnston, Noah Spies, and Vincent Auyeung. I would especially like to thank Graham Ruby for introducing me to computational biology and Daehyun Baek for advice on both scientific and personal endeavors.

I would like to thank the members of the Bartel lab for their support and discussion, especially Calvin Jan, Andrew Grimson, Mike Axtell, Anna Drinnenberg, Sue-Jean Hong, Gina Lafkas, Huili Guo, and Vikram Agarwal.

I would like to thank my committee members for guiding me through my graduate career: Phil Sharp, Richard Hynes, Chris Burge, and Nelson Lau. I would especially like to thank my advisor Dave Bartel for his guidance and patience throughout the years even as I sometimes struggled to find my way.

I would like to thank my classmates in the biology program for the exciting first year in the Pit and friendships throughout the years. I would especially like to thank Leah Okumura, Jen Leslie, Robin Stevens, and Jadyn Damon.
I would like to thank the members of the Sidney-Pacific graduate community, in particular the past and present SPEC+ members for being my family away from the family: Swati Mohan, Wendy Iskendarian, Michelle Sanders, Jane Kim, Robert Wang, Matt Eddy, Ben Mares, Alex Lewis, Roger and Dottie Marks, Annette Kim, Roland Tang, and Joshua Tang.

I would like to thank my friends who have supported me throughout the years: George Burkhard, Jinhee Chung, Jane Huh, and Jennie Johnson.

I would like to thank my family for their support throughout the years, particularly for believing in me and supporting my decision to study abroad. I would especially like to thank my brother HyoSang Chiang for proofreading this thesis.

Finally, I would like to thank my fiancé Nan Gu for our times together, for more years to come, and for showing me that regardless of the path I choose in life, I do not have to walk it alone.

Table of Contents

Abstract
Acknowledgements
Table of contents
Chapter 1: Introduction
Chapter 2: Method for construction of small RNA libraries for Illumina high-throughput sequencing platform
Chapter 3: Mammalian microRNAs: experimental evaluation of novel and previously annotated genes
Chapter 4: Future directions
Appendices: Appendix A-D

Chapter tables of contents

Chapter 1: Introduction
  Discovery of microRNAs
  Canonical miRNA biogenesis
  Transcription of pri-miRNAs
  Nuclear processing of pri-miRNAs by Microprocessor
  Nuclear export of pre-miRNAs by Exportin-5
  Cytoplasmic processing of pre-miRNAs by Dicer
  RISC loading
  Noncanonical miRNA biogenesis
  MicroRNA function
  Global miRNA gene discovery
  Computational prediction of miRNA genes
  MicroRNA gene discovery by second-generation sequencing
  State of miRNA annotations
  Summary
  Figure legends
  References
  Figures

Chapter 2: Method for construction of small-RNA libraries for Illumina high-throughput sequencing platform
  Abstract
  Introduction
  Method
  Overview
  Protocol
  Concluding remarks
  Figure legend
  References
  Figure

Chapter 3: Mammalian microRNAs: experimental evaluation of novel and previously annotated genes
  Abstract
  Introduction
  Results
  MicroRNA gene discovery
  Experimental evaluation of unconfirmed microRNAs
  Experimental evaluation of novel microRNAs and new candidates
  MicroRNA expression profiles
  General features of mammalian microRNAs
  MicroRNAs processed from both arms, with occasional tissue-specific differences in the preferred arm
  Sequential Dicer cleavage of a mirtron hairpin
  5' Heterogeneity
  RNA editing
  Untemplated nucleotide addition
  Discussion
  The status of microRNA gene discovery in mammals
  Unknown features required for Drosha/Dicer processing
  Methods
  Figure legends
  Acknowledgements
  References
  Figures and tables
  Supplemental figures and tables

Chapter 4: Future directions
  MicroRNA gene annotations
  MicroRNA gene discovery by sequencing
  Computational prediction of miRNAs
  MicroRNAs mapping to multiple loci
  MicroRNA isoforms
  Dicer-independent and Ago2-dependent miRNAs
  Arm-switching miRNAs
  De novo prediction of piRNA clusters
  Acknowledgements
  References

Appendices: Appendix A-D
  Appendix A
  Appendix B
  Appendix C
  Appendix D

Chapter 1

Introduction

The word "gene" was coined by Wilhelm Johannsen to describe a unit of heredity (Johannsen 1911). Subsequently, the "one gene, one enzyme" hypothesis was proposed, suggesting that a single gene encodes one protein (Beadle and Tatum 1941). With the identification of DNA as the carrier of genetic material (Avery et al. 1944), the definition of a gene evolved to a DNA segment in the genome that encodes a protein. However, the discovery of functional non-coding RNAs, such as ribosomal RNA (rRNA) and transfer RNA (tRNA), further broadened this definition to include genomic regions that encode non-coding RNAs.
While non-coding RNAs smaller than tRNAs were generally thought to be degradation fragments, the discovery of RNA interference (RNAi) has shifted this perspective. Gene silencing by RNAi was originally identified as "cosuppression" in plants (Napoli et al. 1990; van der Krol et al. 1990) and subsequently identified in animals (Fire et al. 1998). When a double-stranded RNA (dsRNA) is introduced into a cell, the dsRNA is cleaved into small RNAs of ~22 nucleotides (nts), known as small interfering RNAs (siRNAs) (Zamore et al. 2000; Bernstein et al. 2001). The siRNA can guide the RNAi machinery to its target transcript (Elbashir et al. 2001). Although endogenous siRNAs (endo-siRNAs) are present in nematodes and insects (Ambros et al. 2003b; Czech et al. 2008; Ghildiyal et al. 2008), introduction of dsRNAs triggers an interferon response in the majority of mammalian cell types (Sen and Sarkar 2007). Germline cells, however, do not activate the interferon response in the presence of dsRNA (Svoboda et al. 2000), and sequencing small RNAs from mouse oocytes and ES cells revealed that they contain endo-siRNAs (Babiarz et al. 2008; Tam et al. 2008; Watanabe et al. 2008).

Another class of small RNAs, known as PIWI-interacting RNAs (piRNAs), is also present in gonadal cells (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006). One subclass of piRNAs maps to repetitive regions of the genome and has been associated with suppressing transposons (Aravin et al. 2007; Brennecke et al. 2007). The second subclass of piRNAs maps to non-repetitive regions and is abundant in the pachytene stage of meiosis, but its role has not yet been clarified (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006).

The most abundant and ubiquitous class of small RNAs is microRNAs (miRNAs). MicroRNA genes give rise to ~22 nt non-coding RNAs that can post-transcriptionally regulate gene expression (Bartel 2004).
These RNAs play a role in a wide range of biological events, such as stem cell self-renewal, differentiation, proliferation, immunity, and cancer (Huang et al. 2010). Since the first miRNA was discovered in 1993, computational and experimental methods have been used to annotate more than 15,000 miRNA genes in miRBase, a miRNA database (Griffiths-Jones 2004; Griffiths-Jones et al. 2006). This chapter reviews animal miRNA biogenesis, function, and discovery, focusing on those in mammals.

Discovery of microRNAs

The first miRNA gene, lin-4, was identified through a genetic screen for cell lineage ("heterochronic") aberrations in Caenorhabditis elegans (Horvitz and Sulston 1980; Chalfie et al. 1981). Animals with loss-of-function (LOF) mutations in lin-4 contained cells that repeated a larval developmental program, similar to animals with gain-of-function (GOF) mutations in lin-14 (Chalfie et al. 1981; Ambros and Horvitz 1984). Subsequently, it was discovered that lin-14 was required for manifestation of the lin-4 LOF phenotype and that lin-4 LOF animals had higher lin-14 activity (Ambros 1989). These results suggested that lin-4 was a negative regulator of lin-14 (Ambros 1989; Ruvkun and Giusto 1989). When lin-14 was cloned and its two GOF mutants were analyzed, it was revealed that the mutations mapped to the 3' untranslated region (UTR) (Ruvkun et al. 1989; Wightman et al. 1991). Together, these findings led to the hypothesis that the gene product of lin-4 may directly bind to, or activate a factor that binds to, a regulatory element in the 3' UTR of lin-14 to inhibit LIN14 protein production (Arasu et al. 1991; Wightman et al. 1991).

When lin-4 was cloned, however, its gene products turned out to be untranslated RNA molecules of 22 and 61 nts instead of a protein (Lee et al. 1993). A concurrent study found that the negative regulation of lin-14 by lin-4 was conserved in Caenorhabditis briggsae (Wightman et al. 1993). The analysis of the conserved sequences in the lin-14 3' UTRs of C.
elegans and C. briggsae revealed that there are multiple regions in the lin-14 3' UTR that are complementary to the lin-4 RNA sequence (Wightman et al. 1993). These results favored the model in which the lin-4 RNA binds to the 3' UTR of the lin-14 mRNA to negatively regulate LIN14 production through inhibition of post-transcriptional processing, transport, or translation (Lee et al. 1993; Wightman et al. 1993). Shortly thereafter, lin-28 was identified as another gene that was regulated through its 3' UTR by lin-4 (Moss et al. 1997).

While discovery of the lin-4 gene product and its inhibition of lin-14 and lin-28 via their 3' UTRs provided a novel paradigm for gene regulation by small RNAs, it was not until 2000 that a second non-coding small RNA gene with similar properties was discovered (Reinhart et al. 2000; Slack et al. 2000). Like lin-4, let-7 was identified through a genetic screen for heterochronic genes in C. elegans (Reinhart et al. 2000). When let-7 was mapped, no protein-coding genes could be predicted from the sequence. Instead, a 21 nt RNA transcript was detected by Northern blot (Reinhart et al. 2000). Given the precedent of lin-4 regulation of other heterochronic genes, the 3' UTRs of heterochronic genes were examined for complementarity to the let-7 RNA sequence. One of the predicted targets, lin-41, was experimentally shown to be regulated by let-7 (Reinhart et al. 2000; Slack et al. 2000).

Unlike lin-4, which is only present in nematodes, let-7 is widely conserved in other animals (Pasquinelli et al. 2000). This observation led to anticipation of more discoveries of stage-specific small endogenous RNAs that control development, and lin-4 and let-7 became the founding members of "small temporal RNAs" (stRNAs). With the goal of identifying other stRNAs, the Ambros, Bartel, and Tuschl labs led the efforts to clone small RNAs from C. elegans, Drosophila melanogaster, and HeLa cells (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001).
The results of these studies revealed that there were many more small RNA genes that resembled lin-4 and let-7. These genes mapped to regions in the genome that could fold into stable stem-loop structures, and the longer precursor of ~65 nts could be detected by Northern blot for some of the genes. Although both lin-4 and let-7 mapped to the 5' arm of their hairpin precursors, the newly cloned small RNAs could come from either the 5' or the 3' arm. Like let-7, many of these miRNAs were conserved in other species, suggesting that they could have conserved biological functions. Unlike lin-4 and let-7, however, many of the small RNAs did not exhibit specific temporal expression, and some were expressed only in specific cell types. Consequently, these small RNAs were renamed miRNAs.

Canonical miRNA biogenesis

MicroRNAs mature through three intermediates: primary miRNA transcript (pri-miRNA), precursor miRNA (pre-miRNA), and miRNA:miRNA* duplex (Figure 1A, top) (Lee et al. 2002). A pri-miRNA folds into a hairpin with a ~33 base-pair (bp) stem after transcription, and it is cleaved in the nucleus to produce a ~65 nt pre-miRNA. The resulting hairpin is then exported to the cytoplasm and is cleaved into a ~22 nt miRNA:miRNA* duplex. The mature miRNA is loaded into the RNA-induced silencing complex (RISC), whereas the passenger strand (miRNA*) dissociates from the complex and is degraded.

Transcription of pri-miRNAs

When the first miRNA gene products were identified, two RNA species of ~22 nts and ~65 nts were detected, but the question remained whether they were derived from a longer transcript. A number of miRNAs mapped to introns of protein-coding genes, and these miRNAs were thought to be processed from the pre-mRNAs of host genes (Rodriguez et al. 2004; Baskerville and Bartel 2005). The majority of miRNAs, however, did not overlap previously annotated genes.
When the sequences of the miRNAs were compared to the mammalian cDNA databases, the presence of expressed sequence tags (ESTs) overlapping miRNAs suggested that the ~65 nt precursor may be processed from an even longer primary transcript (Lagos-Quintana et al. 2001). This idea was further supported by the fact that some of the novel genes were clustered so closely together that they appeared to be transcribed as a single unit (Lagos-Quintana et al. 2001; Lau et al. 2001). Subsequent reverse transcription polymerase chain reaction (RT-PCR) experiments amplifying a larger region surrounding miRNAs revealed that pre-miRNAs were derived from a longer transcript now known as the pri-miRNA (Lee et al. 2002).

Although many non-coding RNAs, such as tRNAs and U6 small nuclear RNA, are transcribed by RNA polymerase III (pol III), RNA polymerase II (pol II) was hypothesized to transcribe miRNA genes. The pri-miRNAs can be over a kilobase, longer than most pol III-dependent transcripts, and they contain stretches of uridines that would terminate pol III transcription (Lee et al. 2002). Also, the expression of many miRNAs is temporally or spatially restricted, which suggested pol II transcription. RNase protection assay (RPA) and RT-PCR of pri-miRNAs from RNAs that bound to the cap-binding protein eIF-4E indicated that pri-miRNAs contained the 5' cap (Cai et al. 2004; Lee et al. 2004). Furthermore, similar experiments performed with polyadenylated RNAs and identification of putative polyadenylation signals suggested that pri-miRNAs also had poly(A) tails (Bracht et al. 2004; Cai et al. 2004; Lee et al. 2004). Coupled with the sensitivity of pri-miRNA transcription to α-amanitin and pol II chromatin immunoprecipitation (ChIP) results (Lee et al. 2004), these findings confirmed that most pri-miRNAs are transcribed by pol II.
Nuclear processing of pri-miRNAs by Microprocessor

To understand how pri-miRNAs are processed into pre-miRNAs, the cleavage sites of pri-miRNAs were determined by mapping the 5' and 3' ends of pre-miRNAs (Basyuk et al. 2003; Lee et al. 2003). When pre-miRNAs were characterized and folded, the hairpins had a 5' phosphate and a ~2 nt 3' overhang typical of RNase III cleavage. There are three classes of RNase III, represented by Escherichia coli RNase III, eukaryotic Drosha, and eukaryotic Dicer. Because pri-miRNAs are processed into pre-miRNAs in the nucleus (Lee et al. 2002), the nuclear RNase III enzyme Drosha became a primary candidate for the pri-miRNA-processing machinery (Lee et al. 2003). As predicted, immunoprecipitated Drosha complex generated pre-miRNAs from pri-miRNAs in vitro, and inhibition of Drosha significantly repressed mature miRNA production in vivo (Lee et al. 2003). These findings supported the notion that Drosha cleaves pri-miRNAs into pre-miRNAs.

Drosha has two RNase III domains (RIIIDs), a double-stranded RNA binding domain (dsRBD), and an extended N terminus that contains a proline-rich region and an arginine- and serine-rich region (Figure 1B). The tandem RIIIDs form an intramolecular dimer that cleaves a pri-miRNA to generate a pre-miRNA hairpin with a ~2 nt 3' overhang (Han et al. 2004). Although the structure of Drosha's dsRBD is similar to that of other RNA-binding dsRBDs (Mueller et al. 2010), it does not have significant RNA-binding activity (Han et al. 2006). Biochemical analysis of Drosha revealed that it existed in a complex with DiGeorge syndrome critical region gene 8 (DGCR8) (Denli et al. 2004; Gregory et al. 2004; Han et al. 2004). DGCR8 contains two dsRBDs that are arranged with pseudo two-fold symmetry in its core as well as a WW domain that can interact with proline-rich peptides (Figure 1B) (Sohn et al. 2007).
Alone, neither Drosha nor DGCR8 can process pri-miRNAs, but together, they can efficiently cleave pri-miRNAs to generate pre-miRNAs in vitro (Gregory et al. 2004; Han et al. 2004). The complex consisting of Drosha and DGCR8 is called the "Microprocessor."

A pri-miRNA consists of a stem, a terminal loop, and nonstructured flanking sequences. Although there appears to be no consensus sequence in the flanking regions, they have been shown to be important for efficient pri-miRNA processing both in vitro and in vivo (Lee et al. 2003; Chen et al. 2004; Zeng and Cullen 2005; Han et al. 2006). The stem is ~3 helical turns, and the cleavage site is ~1 helical turn (~11 bps) from the base of the hairpin (Han et al. 2006). Although the terminal loop has been reported to be important for pri-miRNA processing (Zeng et al. 2005), systematic mutagenesis experiments revealed that the site of Drosha cleavage is determined by the distance from the ssRNA-dsRNA junction (Han et al. 2006). Thus, the current model posits that DGCR8 binds to the base of the pri-miRNA hairpin, with its two dsRBDs contacting two discontinuous segments of the stem, and positions Drosha such that it cuts the stem ~11 bps away from the base of the hairpin (Han et al. 2006; Sohn et al. 2007).

Nuclear export of pre-miRNAs by Exportin-5

After Microprocessor cleavage, pre-miRNAs are exported from the nucleus to the cytoplasm. Because pre-miRNAs lack a consensus sequence, the export receptor was hypothesized to recognize a structural motif. Exportin-5 (Exp5) is a Ran-dependent nuclear transport receptor that recognizes an RNA stem and a 2 nt 3' overhang, both structural elements of pre-miRNAs (Okada et al. 2009). Exp5 forms a complex with its cargo in the presence of GTP-bound Ran and translocates to the cytoplasm. Upon export, GTP is hydrolyzed to GDP, and the cargo is released. To test whether Exp5 exports pre-miRNAs, Exp5 expression was repressed using RNA interference (RNAi).
Inhibition of Exp5 resulted in a reduction of pre-miRNAs and mature miRNAs in the cytoplasm as well as a decrease in miRNA function (Yi et al. 2003; Lund et al. 2004). Furthermore, pre-miRNA binding to and export by Exp5 were dependent on Ran-GTP, and injection of purified Exp5 into Xenopus oocyte nuclei resulted in cytoplasmic accumulation of pre-miRNAs but not other RNAs (Lund et al. 2004). These results provided evidence that pre-miRNAs are exported to the cytoplasm by Exp5.

Cytoplasmic processing of pre-miRNAs by Dicer

Once pre-miRNAs are exported to the cytoplasm, they need to be processed into ~22 nt mature miRNAs. The cytoplasmic RNase III enzyme Dicer had previously been implicated in the processing of dsRNAs into small interfering RNAs (siRNAs) (Bernstein et al. 2001). Due to the similarities between miRNAs and siRNAs, the role of Dicer in pre-miRNA processing was examined (Grishok et al. 2001; Hutvágner et al. 2001; Ketting et al. 2001). When the level of Dicer was reduced, pre-miRNAs accumulated while the level of mature miRNAs decreased (Grishok et al. 2001; Hutvágner et al. 2001). Dicer also cleaved pre-miRNAs efficiently in vitro (Hutvágner et al. 2001; Ketting et al. 2001).

Dicer consists of two RIIIDs, two dsRBDs, a DExD/H box RNA helicase domain, and a Piwi/Argonaute/Zwille (PAZ) domain (Figure 1B). Although Dicer was initially hypothesized to have two active dsRNA cleavage sites, mutagenesis experiments provided a model in which the two RIIIDs form an intramolecular dimer to form a single dsRNA processing center (Zhang et al. 2004). This model proposed that the PAZ domain recognizes the 3' overhang of a pre-miRNA left by Drosha cleavage and that the distance between the PAZ domain and the RIIIDs dictates the site of Dicer cleavage. The structure of Dicer confirmed this model (Macrae et al. 2006). Thus, Dicer serves as a molecular ruler that measures a fixed distance from the site of Drosha cleavage to process pre-miRNAs into a ~22 nt RNA duplex.
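The molecular-ruler idea described above can be illustrated with a minimal Python sketch. It is a toy model and not a description of any published tool: the function name `dicer_cut`, the fixed ruler length of 22 nt, and the assumption of a perfectly paired stem with a 2 nt 3' overhang at both ends of the duplex are all simplifications introduced here for illustration. Real hairpins contain bulges and mismatches, and actual cleavage positions vary.

```python
def dicer_cut(pre_mirna: str, ruler: int = 22):
    """Split a pre-miRNA (written 5'->3') into 5p arm, loop, and 3p arm.

    Toy assumptions (not from the thesis): the stem is perfectly paired,
    the Drosha-generated end carries a 2 nt 3' overhang, and Dicer cuts
    both strands a fixed `ruler` distance from that end, leaving a second
    2 nt 3' overhang. Under these assumptions the two duplex strands are
    simply the first and last `ruler` nucleotides of the hairpin.
    """
    if len(pre_mirna) < 2 * ruler:
        raise ValueError("hairpin too short for two arms of the given length")
    five_p = pre_mirna[:ruler]                        # strand from the 5' arm
    three_p = pre_mirna[-ruler:]                      # strand from the 3' arm
    loop = pre_mirna[ruler:len(pre_mirna) - ruler]    # discarded terminal loop
    return five_p, loop, three_p


if __name__ == "__main__":
    # Made-up 60 nt hairpin; the letters only mark the three regions.
    toy_hairpin = "A" * 22 + "C" * 16 + "G" * 22
    arm_5p, loop, arm_3p = dicer_cut(toy_hairpin)
    print(len(arm_5p), len(loop), len(arm_3p))
```

The point of the sketch is only that, once the Drosha-generated end is fixed, the ruler length alone determines both duplex strands; no sequence information is consulted.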
RISC loading

After Dicer cleavage, the ~22 nt RNA duplex is loaded onto the RISC by the RISC-loading complex (RLC). The first RLC to be identified was the fly Ago2-RLC, which consists of Ago2, Dicer-2, and its dsRNA-binding partner R2D2 (Liu et al. 2003; Pham et al. 2004; Tomari et al. 2004). Subsequently, the human Ago2-RLC components were identified as Ago2, Dicer, and TAR RNA-binding protein (TRBP) (Chendrimada et al. 2005; Maniataki and Mourelatos 2005; Macrae et al. 2008). The presence of Dicer in the RLC raised a debate on whether pre-miRNA processing and RISC loading are coupled. Although some studies support coupling (Gregory et al. 2005; Maniataki and Mourelatos 2005), the more widely accepted view is that the two processes are not coupled (Murchison et al. 2005; Preall et al. 2006; Yoda et al. 2010).

Usually, only one strand (miRNA) from the miRNA:miRNA* duplex is incorporated into the RISC while the passenger strand (miRNA*) is degraded. To determine how the miRNA strand is chosen, RISC-capture assays and thermodynamic profiling were performed on various RNA duplexes (Khvorova et al. 2003; Schwarz et al. 2003). In general, the strand with the less thermodynamically stable 5' end was incorporated into the RISC as the mature miRNA. The degree of functional asymmetry was attributed to the relative ease with which the 5' ends of the two strands can be unwound from the duplex.

Noncanonical miRNA biogenesis

At least three noncanonical miRNA biogenesis pathways have been identified. The first consists of a class of miRNAs called mirtrons that bypass Drosha cleavage (Figure 1A, middle upper) (Okamura et al. 2007; Ruby et al. 2007). Initially identified in D. melanogaster and C. elegans, these miRNAs are derived from short introns of protein-coding genes, which are spliced by the spliceosome. After the excised lariat is debranched, it folds into a pre-miRNA hairpin, which is then exported into the cytoplasm for Dicer cleavage.
Thus, mirtron pre-miRNAs are generated from pre-mRNA by the spliceosome rather than Drosha. Because the intron needs to fold into a hairpin suitable for Dicer processing, mirtrons generally arise from introns of ~60 nts, the average length of a canonical pre-miRNA. While the genomes of C. elegans and D. melanogaster have an abundance of introns with lengths similar to pre-miRNAs, many mammalian genomes, including mouse and human, contain few such introns and thus are less likely to evolve mirtrons (Ruby et al. 2007). As a result, although mirtrons have been observed in mammals, they comprise a smaller fraction of the pre-miRNAs (Berezikov et al. 2007; Babiarz et al. 2008). However, some longer introns have been observed to fold into a hairpin with a tail at either the 5' or the 3' end, and subsequent nucleolytic cleavage can yield a pre-miRNA-like hairpin (Figure 1A, middle lower) (Babiarz et al. 2008). This subclass of mirtrons is called tailed mirtrons.

The second class of noncanonical miRNAs is endogenous small hairpin RNAs (shRNAs). Like exogenous shRNAs (Paddison et al. 2004), an endogenous shRNA transcript can fold into a hairpin, but it lacks significant base-pairing beyond the pre-miRNA hairpin (Babiarz et al. 2008). The processing of endogenous shRNAs is not dependent on DGCR8 but is dependent on Dicer, which suggests that the pri-miRNA of an endogenous shRNA may be processed into a pre-miRNA in a Microprocessor-independent manner (Babiarz et al. 2008). One possibility is that the pri-miRNA is trimmed by nucleases into a pre-miRNA hairpin, which can then be processed by Dicer into a mature miRNA:miRNA* duplex (Figure 1A, bottom).

A more recent observation has shown that the processing of miR-451 is dependent on Ago2 instead of Dicer (Figure 1A, inset) (Cheloufi et al. 2010; Cifuentes et al. 2010). The pre-miR-451 is unusual in that it has only 17 bps in its stem, which is too short to be a Dicer substrate.
Furthermore, the mature miR-451 spans the loop rather than being confined to one arm of the hairpin. Dissection of the miR-451 maturation process has shown that after Drosha cleavage and nuclear export, Ago2, rather than Dicer, is responsible for the second cleavage. Ago2 cleavage generates a 30 nt product, which is likely trimmed by nucleases to the annotated length of 22 nts. Thus far, miR-451 is the only miRNA known to have Dicer-independent, Ago2-dependent biogenesis.

MicroRNA function

The predominant role of miRNAs is to repress gene expression (Fabian et al. 2010), although there have been reports of miRNAs that upregulate gene expression (Vasudevan et al. 2007; Ørom et al. 2008). When a miRNA has extensive complementarity to a target mRNA, such as miR-196 to its target mRNA HOXB8, the target mRNA is cleaved by AGO2 (Yekta et al. 2004). However, most mammalian miRNAs lack such extensive pairing to their targets. In the case of imperfect pairing, target recognition depends mainly on nucleotides 2-7 of the miRNA, also known as the "seed" (Lewis et al. 2003; Lewis et al. 2005; Grimson et al. 2007; Bartel 2009). Initially, the primary mode of such gene downregulation was thought to be translational inhibition with little or no change at the mRNA level (Wightman et al. 1993; O'Donnell et al. 2005; Zhao et al. 2005). However, advanced proteomic surveys coupled with microarray analysis of miRNAs and their target genes have shifted the paradigm of miRNA-mediated gene repression from translational inhibition to mRNA destabilization (Baek et al. 2008; Selbach et al. 2008). Polysome and ribosome profiling of comparable samples supported the idea that most miRNA-mediated repression occurs primarily through a decrease in target mRNA levels (Hendrickson et al. 2009; Guo et al. 2010).
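As a minimal illustration of seed-based targeting, the Python sketch below extracts nucleotides 2-7 of a miRNA and scans a 3' UTR for perfect Watson-Crick matches to that seed. The function names and both sequences are illustrative choices made here, and the sketch deliberately ignores everything beyond the six-nucleotide seed match (for example, flanking-nucleotide preferences, site context, and conservation), so it should not be read as a target-prediction method.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Return nucleotides 2-7 of the miRNA (1-based, counted from the 5' end)."""
    return mirna[1:7]


def seed_match_sites(mirna: str, utr: str):
    """Find perfect matches to the miRNA seed in a 3' UTR.

    Both sequences are written 5'->3'. The site searched for is the
    reverse complement of the seed. Returns the site sequence and the
    0-based start positions of every (possibly overlapping) occurrence.
    """
    site = seed(mirna).translate(COMPLEMENT)[::-1]
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return site, positions


if __name__ == "__main__":
    example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22 nt miRNA
    toy_utr = "AAACUACCUCAGGGUACCUCUU"        # made-up UTR fragment
    print(seed_match_sites(example_mirna, toy_utr))
```

Because only six nucleotides are required to match, a scan of this kind returns many candidate sites in real transcriptomes, which is one reason additional evidence is generally needed before calling a site functional.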
Global miRNA gene discovery

Computational prediction of miRNA genes

Although conventional cloning and sequencing of small RNAs led to the discovery of hundreds of mammalian miRNAs (Lagos-Quintana et al. 2001; Lagos-Quintana et al. 2002; Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov et al. 2006b; Landgraf et al. 2007), the number of miRNA genes identified was far from saturation due to low-throughput sequencing and constrained expression patterns. Low-throughput Sanger sequencing allowed only the more abundant miRNAs in the sample to be sequenced. Furthermore, only the miRNAs that were present in the sample, rather than all the miRNAs encoded in the genome, could be identified. Thus, some miRNAs that were expressed at a low level or only in specific cell types or conditions were not identified using this approach. Nonetheless, a number of properties characteristic of miRNAs were deduced from the expanded list of miRNAs, and these features were used to computationally predict miRNA genes from the genomic sequence.

One feature that best distinguishes miRNAs is the stem-loop structure. Because a miRNA matures through Drosha and Dicer cuts, it must map to a locus that can fold into a stable hairpin of ~33 bps. Another commonly used feature is conservation, because miRNAs frequently have biological functions that have been conserved through evolution. Other properties include sequence, additional structural information, thermodynamic stability, and genomic location.

The earliest miRNA gene predictions relied heavily on conservation. For example, the MiRscan algorithm scanned the C. elegans genome for hairpin structures that were conserved in C. briggsae (Lim et al. 2003b). MiRscan then evaluated the filtered hairpins for secondary structure, sequence biases, and additional conservation to determine whether the hairpins resembled known miRNAs. The study identified 35 novel miRNA genes in C.
elegans, and a subset of the predictions was tested by Northern blots and 5' rapid amplification of cDNA ends (RACE). Subsequently, this method was applied to vertebrate genomes to discover 38 novel human miRNAs and 14 homologs of previously known miRNAs (Lim et al. 2003a).

Using phylogenetic shadowing, another study observed that the stem region of the pre-miRNA was conserved while the flanking regions and the terminal loop were not (Berezikov et al. 2005). Sixteen human miRNAs and 976 candidates were identified by first scanning the genome for a pre-miRNA-like conservation profile and then filtering for thermodynamically favorable hairpins. Some of the candidates were supported by Northern blots and later by sequencing data and/or RNA-primed Array-based Klenow Extension (RAKE) (Berezikov et al. 2006b).

In an alternate method, conservation of potential target genes rather than that of hairpins was used to predict novel miRNA genes (Xie et al. 2005). First, conserved 8-mer motifs in the 3' UTRs of mRNAs were identified. Hypothesizing that the discovered motifs corresponded to locations where miRNA seed sequences bound, the sequences complementary to the motifs were mapped to the human genome to identify loci that could produce miRNAs with the corresponding seeds. If these sequences were conserved, the region flanking the sequence was folded to determine whether it could form a pre-miRNA-like hairpin. This method identified 129 novel candidates, some of which were supported by 5' RACE.

Although methods using conservation have predicted a plethora of novel miRNA genes, they cannot predict nonconserved genes. A number of machine learning-based approaches have been developed for ab initio miRNA prediction (Sewer et al. 2005; Xue et al. 2005; Helvik et al. 2007; Jiang et al. 2007; Wang et al. 2010).
These algorithms first learn the properties characteristic of miRNAs and then build a classifier based on positive and negative samples to determine whether a given sequence resembles a miRNA gene. The positive samples are known miRNAs in the miRBase database; the negative samples are usually hairpins selected from other non-coding RNAs, such as rRNA, or from mRNAs. The features used to describe the hairpins range from overall thermodynamic stability to the nucleotide composition of a particular region of the hairpin. Early work demonstrated that such machine learning methods could separate the positives from the negatives (Sewer et al. 2005; Xue et al. 2005), and many recent efforts have used a similar approach.

MicroRNA gene discovery by second-generation sequencing

Although computational approaches were able to identify novel miRNA genes and candidates, these efforts were limited by an incomplete understanding of miRNA processing and by low-quality positive and negative training sets. Furthermore, some predicted miRNAs may not even be transcribed and thus lack biological function. Cloning and sequencing small RNAs can bypass these problems. Although the experimental method has its own limitations, they can be ameliorated by deeper sequencing of a broad range of samples.

The small RNA cloning protocols for miRNA discovery were pioneered by three labs (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001), and the concepts behind them were similar: ligation of adaptors to the 3' and 5' ends of the size-fractionated small RNAs, followed by reverse transcription (RT) and polymerase chain reaction (PCR) amplification. The amplified constructs were concatemerized, cloned into a plasmid, and sequenced by the Sanger method. With some adaptations, similar approaches were used to construct libraries for high-throughput sequencing platforms (Lu et al. 2007; Hafner et al. 2008).
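Because each library construct places the small RNA between a 5' and a 3' adaptor, a read that is longer than the small RNA runs into the 3' adaptor. A minimal sketch of recovering the insert from such a read is given below. It is an illustration only: the adaptor sequence, the size window, and the function name extract_insert are hypothetical, and the sketch assumes the read begins at the first nucleotide of the insert and contains an exact copy of the adaptor.

```python
# Hypothetical 3' adaptor sequence, for illustration only (not from any real kit).
ADAPTOR_3P = "GATCCGTTAGC"


def extract_insert(read, adaptor=ADAPTOR_3P, min_len=16, max_len=30):
    """Return the small-RNA insert that precedes the 3' adaptor, or None.

    min_len and max_len stand in for the size fractionation step; the
    values are assumptions chosen for this example.
    """
    idx = read.find(adaptor)
    if idx == -1:
        return None  # adaptor not found, so the insert boundary is unknown
    insert = read[:idx]
    if min_len <= len(insert) <= max_len:
        return insert
    return None  # insert falls outside the assumed size window


# An example 22-nt insert followed by the adaptor and a few extra cycles.
example_insert = "TGAGGTAGTAGGTTGTATAGTT"
example_read = example_insert + ADAPTOR_3P + "AAAA"
print(extract_insert(example_read))
```

Exact matching keeps the sketch short; real reads contain sequencing errors and partial adaptors, which this version would simply discard.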
Massively parallel signature sequencing (MPSS) was an early high-throughput sequencing technology (Brenner et al. 2000). To sequence small RNAs, the 5' and 3' adaptors were first ligated onto the small RNAs, and the RNAs were reverse-transcribed into cDNAs and cloned into a vector with a unique identifier tag (Mineno et al. 2006). After amplification, the cDNA library was hybridized and ligated to microbeads with complementary tags (Brenner et al. 2000). Thus, each microbead carried 100,000 copies of an identical sequence. To determine the sequence, the construct was cleaved by a restriction enzyme to expose a 4 nt overhang, and encoded adaptors were hybridized to the overhang and ligated to the construct. The encoded adaptors contained a 4 nt overhang with all possible nt combinations, a corresponding fluorescent label, and a restriction enzyme recognition site. The microbeads were imaged to determine the sequence of the overhang. The encoded adaptor was then cleaved by the restriction enzyme, and the process was repeated. From more than 500,000 reads obtained from mouse embryos, 61 novel miRNA genes were identified after filtering for pre-miRNA-like hairpin structure (Mineno et al. 2006). Although MPSS opened the door for high-throughput sequencing of miRNAs, it appeared to have sequence biases.

The next major development in sequencing technology was 454 pyrosequencing (Margulies et al. 2005). Pyrosequencing technology takes advantage of the fact that a pyrophosphate is released upon nucleotide incorporation. Briefly, a DNA molecule is attached to a single bead by limiting dilution, and the sequence is amplified on the bead within a droplet of emulsion. A single nucleotide is washed over the beads, and a polymerase incorporates the nucleotide if it is complementary to the template. When a nucleotide is incorporated, a pyrophosphate is released and converted to ATP by ATP sulfurylase. In the presence of ATP, luciferase emits light, and the signal is detected by a camera.
Unincorporated nucleotides are washed away, and the cycle is repeated with the next nucleotide. Pyrosequencing can sequence ~1.5 million reads of 300-500 nt in length, and a number of studies have utilized this technology for miRNA discovery in mammals (Berezikov et al. 2006a; Berezikov et al. 2006b; Berezikov et al. 2007; Calabrese et al. 2007).

Currently, the most widely used sequencing method is Illumina's reversible terminator technology, owing to the number of reads it can generate per run (Seo et al. 2004). First, a cDNA library is constructed such that the small RNA sequence is flanked by two adaptor sequences. The cDNAs are hybridized to the primers attached to the chip, whose sequences are complementary to the adaptor sequences. The opposite strand is synthesized by a polymerase, and the new strand, which is covalently attached to the chip, can bend over to anneal to another primer complementary to its free end. This process, bridge amplification, is repeated to build clusters of DNAs. One of the strands is removed so that each cluster contains single-stranded DNA molecules with an identical sequence. To determine the sequence of each cluster, a sequencing primer, polymerase, and fluorescently labeled dNTPs are added to the chip. Each dNTP has a base-unique fluorescent label and is blocked on the 3' terminus so that only one nucleotide can be incorporated at each cycle. After imaging the chip, the terminator and fluorescent label are photo-cleaved. The process is repeated, and the sequence of each cluster can be determined by tracking the fluorescent label bound to the cluster at each step. Although this technology could initially sequence only up to 32 nt, it has been improved so that it can now generate 200 million reads of 100 nt per run. Many recent miRNA gene discovery efforts have utilized the Illumina sequencing platform (Morin et al. 2008; Ahn et al. 2010; Su et al. 2010).
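The flow-based readout of pyrosequencing described above can be sketched as a small simulation. This is a simplified illustration, not a model of any instrument: the flow order "TACG", the function names, and the noise-free integer signals are assumptions. Each flow offers one nucleotide, and the signal is taken to equal the number of consecutive template positions that nucleotide can pair with, so a homopolymer run is read in a single flow. By contrast, a reversible terminator cycle adds exactly one base per cycle.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template, flow_order="TACG", max_flows=400):
    """Simulate noise-free flows over a template strand.

    `template` is written in the order the polymerase traverses it.
    Returns a list of (nucleotide, signal) pairs, where signal is the
    number of bases incorporated during that flow.
    """
    pos = 0
    signals = []
    for i in range(max_flows):
        if pos >= len(template):
            break  # template fully copied
        nt = flow_order[i % len(flow_order)]
        run = 0
        # Incorporate while the flowed nucleotide pairs with the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            run += 1
            pos += 1
        signals.append((nt, run))
    return signals


def decode(signals):
    """Rebuild the synthesized strand from the flow signals."""
    return "".join(nt * run for nt, run in signals)


flows = flowgram("AAGTC")
print(flows)
print(decode(flows))
```

In this idealized form the two A residues at the start of the template give a single signal of 2 in the first T flow; in practice the difficulty of distinguishing such intensities is a known source of homopolymer-length errors.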
State of miRNA annotations

Since the establishment of miRBase, a database of miRNA annotations, the number of annotated miRNA genes has grown explosively (Figure 2) (Griffiths-Jones 2004; Griffiths-Jones et al. 2006). Although both computational and experimental methods have contributed to miRNA gene discovery, almost all the novel miRNA gene annotations since 2008 have been the result of sequencing studies (Kozomara and Griffiths-Jones 2011). Although the database is continuously updated to provide the most accurate information, even a single study can deposit a large number of false entries.

Two major factors contribute to inaccurate miRNA gene annotations. The first is non-stringent discovery methods. For example, one study identified fly miRNA genes based on a single read mapping to one arm of a hairpin structure (Lu et al. 2008). When more reads from a deeper sequencing study were mapped to these "genes," most of the entries appeared to be degradation fragments (Berezikov et al. 2010). Although the original guidelines for miRNA annotation required only the presence of a ~22 nt RNA and a hairpin structure (Ambros et al. 2003a), some studies have since adopted the following additional criteria: a minimum level of expression, absence of overlap with annotated transcripts, relatively precise Drosha and Dicer cleavage sites, and presence of the miRNA* species (Ruby et al. 2006; Grimson et al. 2008; Berezikov et al. 2010; Marco et al. 2010).

The other factor contributing to inaccurate annotation is the number of reads that can be sequenced by contemporary technology. With an abundance of reads mapping to a genomic region, it is relatively easy to determine whether it is a miRNA gene. With fewer reads mapping to a locus, researchers must make the decision based on a limited amount of information.
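Several of the additional criteria listed above can be expressed as a simple filter over the reads mapped to a candidate hairpin. The following is a minimal sketch under stated assumptions, not the procedure of any cited study: the Read record, the function name passes_criteria, and both thresholds are hypothetical, and the overlap-with-annotated-transcripts check is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    arm: str    # "5p" or "3p": hairpin arm the read maps to
    start: int  # position of the read's 5' end on the hairpin
    count: int  # number of times this sequence was observed


def passes_criteria(reads, min_reads=10, min_5prime_homogeneity=0.8):
    """Apply three illustrative checks to a candidate hairpin.

    1. Minimum expression: total reads >= min_reads.
    2. miRNA* evidence: reads map to both arms.
    3. Precise processing: on the dominant arm, the most common 5' end
       accounts for at least min_5prime_homogeneity of the reads.
    Both thresholds are hypothetical values chosen for this example.
    """
    arm_totals = Counter()
    for r in reads:
        arm_totals[r.arm] += r.count
    if sum(arm_totals.values()) < min_reads:
        return False
    if arm_totals["5p"] == 0 or arm_totals["3p"] == 0:
        return False
    dominant = max(arm_totals, key=arm_totals.get)
    starts = Counter()
    for r in reads:
        if r.arm == dominant:
            starts[r.start] += r.count
    return max(starts.values()) / arm_totals[dominant] >= min_5prime_homogeneity


candidate = [Read("5p", 10, 90), Read("5p", 11, 5), Read("3p", 48, 7)]
print(passes_criteria(candidate))
```

A locus supported by a single read fails the first check regardless of its structure, and the choice of min_reads directly sets how many weakly expressed loci are excluded.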
While setting a cutoff for the minimum number of reads matching the putative miRNA can alleviate this problem, such a requirement gains specificity (how well the method discriminates against false positives) at the cost of sensitivity (how well the method identifies all the true positives).

Summary

Since the discovery of lin-4, more than 15,000 miRNA genes have been identified. These genes play an important role in gene regulation by repressing the expression of their target genes. Because miRNA targeting is based on sequence, accurate and comprehensive annotation of miRNA genes is crucial for understanding their biological roles. The following chapters describe a method for constructing a small RNA library for the Illumina sequencing platform (Chapter 2) and the analysis of the data derived from mouse libraries (Chapter 3). In addition to substantially revising the list of confidently identified miRNA genes, we provided a medium-throughput method to test questionable annotations and described the general features of murine and mammalian miRNAs. Our analysis also revealed variations in miRNA processing with functional consequences.

Figure Legends

Figure 1. MicroRNA biogenesis. (A) Canonical miRNA biogenesis (top), mirtron biogenesis (middle upper), tailed-mirtron biogenesis (middle lower), endogenous shRNA biogenesis (bottom), and Ago2-dependent, Dicer-independent biogenesis (inset). The red strand corresponds to the mature miRNA, and the blue strand corresponds to the miRNA*. (B) Schematic representation of the domain structures of proteins in the canonical miRNA biogenesis pathway. Figure adapted and modified from Nowotny and Yang (2009). RIIID, RNase III domain; dsRBD, double-stranded RNA-binding domain; PAZ, Piwi/Argonaute/Zwille domain.

Figure 2. Growth of miRNA gene annotations in miRBase. The data track the number of mouse (green), human (red), and all miRNA gene entries (blue) from January 2004 to September 2010, corresponding to miRBase versions 3.0 to 16.0.
References

Ahn, H.W., Morin, R.D., Zhao, H., Harris, R.A., Coarfa, C., Chen, Z.-J., Milosavljevic, A., Marra, M.A., and Rajkovic, A. 2010. MicroRNA transcriptome in the newborn mouse ovaries determined by massive parallel sequencing. Mol Hum Reprod 16(7): 463-471.
Ambros, V. 1989. A hierarchy of regulatory genes controls a larva-to-adult developmental switch in C. elegans. Cell 57(1): 49-57.
Ambros, V., Bartel, B., Bartel, D., Burge, C., Carrington, J., Chen, X., Dreyfuss, G., Eddy, S., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., and Tuschl, T. 2003a. A uniform system for microRNA annotation. Rna 9(3): 277-279.
Ambros, V. and Horvitz, H.R. 1984. Heterochronic mutants of the nematode Caenorhabditis elegans. Science 226(4673): 409-416.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs and Other Tiny Endogenous RNAs in C. elegans. Current Biology 13(10): 807-818.
Arasu, P., Wightman, B., and Ruvkun, G. 1991. Temporal regulation of lin-14 by the antagonistic action of two other heterochronic genes, lin-4 and lin-28. Genes & Development 5(10): 1825-1833.
Avery, O.T., Macleod, C.M., and McCarty, M. 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J Exp Med 79(2): 137-158.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. 2008. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. 2008. The impact of microRNAs on protein output. Nature 455(7209): 64-71.
Bartel, D. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell 136(2): 215-233.
Baskerville, S. and Bartel, D. 2005.
Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. Rna 11(3): 241-247.
Basyuk, E., Suavet, F., Doglio, A., Bordonné, R., and Bertrand, E. 2003. Human let-7 stem-loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res 31(22): 6593-6597.
Beadle, G.W. and Tatum, E.L. 1941. Genetic Control of Biochemical Reactions in Neurospora. P Natl Acad Sci Usa 27(11): 499-506.
Berezikov, E., Chung, W.-J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian mirtron genes. Mol Cell 28(2): 328-336.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H.A., and Cuppen, E. 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120(1): 21-24.
Berezikov, E., Liu, N., Flynt, A.S., Hodges, E., Rooks, M., Hannon, G.J., and Lai, E.C. 2010. Evolutionary flux of canonical microRNAs and mirtrons in Drosophila. Nat Genet 42(1): 6-9; author reply 9-10.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee brain. Nat Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J., Mano, H., Plasterk, R., and Cuppen, E. 2006b. Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16(10): 1289-1298.
Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. 2001. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409(6818): 363-366.
Bracht, J., Hunter, S., Eachus, R., Weeks, P., and Pasquinelli, A.E. 2004. Trans-splicing and polyadenylation of let-7 microRNA primary transcripts. Rna 10(10): 1586-1594.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M., Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G., Vermaas, E., Williams, S.R., Moon, K., Burcham, T., Pallas, M., DuBridge, R.B., Kirchner, J., Fearon, K., Mao, J., and Corcoran, K. 2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18(6): 630-634.
Cai, X., Hagedorn, C.H., and Cullen, B.R. 2004. Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. Rna 10(12): 1957-1966.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. P Natl Acad Sci Usa 104(46): 18097-18102.
Chalfie, M., Horvitz, H.R., and Sulston, J.E. 1981. Mutations that lead to reiterations in the cell lineages of C. elegans. Cell 24(1): 59-69.
Cheloufi, S., Dos Santos, C.O., Chong, M.M.W., and Hannon, G.J. 2010. A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465(7298): 584-589.
Chen, C., Li, L., Lodish, H., and Bartel, D. 2004. MicroRNAs modulate hematopoietic lineage differentiation. Science 303(5654): 83-86.
Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura, K., and Shiekhattar, R. 2005. TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436(7051): 740-744.
Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., Wolfe, S.A., and Giraldez, A.J. 2010. A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328(5986): 1694-1698.
Czech, B., Malone, C.D., Zhou, R., Stark, A., Schlingeheyde, C., Dus, M., Perrimon, N., Kellis, M., Wohlschlegel, J.A., Sachidanandam, R., Hannon, G.J., and Brennecke, J. 2008. An endogenous small interfering RNA pathway in Drosophila.
Nature 453(7196): 798-802.
Denli, A.M., Tops, B.B.J., Plasterk, R.H.A., Ketting, R.F., and Hannon, G.J. 2004. Processing of primary microRNAs by the Microprocessor complex. Nature 432(7014): 231-235.
Elbashir, S.M., Lendeckel, W., and Tuschl, T. 2001. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes & Development 15(2): 188-200.
Fabian, M.R., Sonenberg, N., and Filipowicz, W. 2010. Regulation of mRNA translation and stability by microRNAs. Annu Rev Biochem 79: 351-379.
Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391(6669): 806-811.
Ghildiyal, M., Seitz, H., Horwich, M.D., Li, C., Du, T., Lee, S., Xu, J., Kittler, E.L.W., Zapp, M.L., Weng, Z., and Zamore, P.D. 2008. Endogenous siRNAs Derived from Transposons and mRNAs in Drosophila Somatic Cells. Science 320(5879): 1077-1081.
Gregory, R.I., Chendrimada, T.P., Cooch, N., and Shiekhattar, R. 2005. Human RISC couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123(4): 631-640.
Gregory, R.I., Yan, K.-P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and Shiekhattar, R. 2004. The Microprocessor complex mediates the genesis of microRNAs. Nature 432(7014): 235-240.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue): D109-111.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34(Database issue): D140-144.
Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27(1): 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008.
Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197.
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. 2001. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106(1): 23-34.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. 2010. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466(7308): 835-840.
Hafner, M., Landgraf, P., Ludwig, J., Rice, A., Ojo, T., Lin, C., Holoch, D., Lim, C., and Tuschl, T. 2008. Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods 44(1): 3-12.
Han, J., Lee, Y., Yeom, K.-H., Kim, Y.-K., Jin, H., and Kim, V.N. 2004. The Drosha-DGCR8 complex in primary microRNA processing. Genes & Development 18(24): 3016-3027.
Han, J., Lee, Y., Yeom, K.-H., Nam, J.-W., Heo, I., Rhee, J.-K., Sohn, S.Y., Cho, Y., Zhang, B.-T., and Kim, V.N. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Helvik, S.A., Snove, O., and Saetrom, P. 2007. Reliable prediction of Drosha processing sites improves microRNA gene prediction. Bioinformatics 23(2): 142-149.
Hendrickson, D.G., Hogan, D.J., McCullough, H.L., Myers, J.W., Herschlag, D., Ferrell, J.E., and Brown, P.O. 2009. Concordant Regulation of Translation and mRNA Abundance for Hundreds of Targets of a Human microRNA. PLoS Biol 7(11): e1000238.
Horvitz, H.R. and Sulston, J.E. 1980. Isolation and genetic characterization of cell-lineage mutants of the nematode Caenorhabditis elegans. Genetics 96(2): 435-454.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Huang, Y., Shen, X.J., Zou, Q., Wang, S.P., Tang, S.M., and Zhang, G.Z. 2010. Biological functions of microRNAs: a review.
J Physiol Biochem.
Hutvágner, G., McLachlan, J., Pasquinelli, A.E., Bálint, E., Tuschl, T., and Zamore, P.D. 2001. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293(5531): 834-838.
Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. 2007. MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res 35(Web Server issue): W339-344.
Johannsen, W. 1911. The genotype conception of heredity. Am Nat 45: 129-159.
Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. 2001. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes & Development 15(20): 2654-2659.
Khvorova, A., Reynolds, A., and Jayasena, S.D. 2003. Functional siRNAs and miRNAs exhibit strand bias. Cell 115(2): 209-216.
Kozomara, A. and Griffiths-Jones, S. 2011. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39(Database issue): D152-157.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New microRNAs from mouse and human. Rna 9(2): 175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. 2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12(9): 735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V., Chiaretti, S., Foi, R., Schliwka, J., Fuchs, U., Novosel, A., Muller, R.-U., Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M., Weir, D.B., Choksi, R., De Vita, G., Frezzetti, D., Trompeter, H.-I., Hornung, V., Teng, G., Hartmann, G., Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle, J.W., Ju, J., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein, M.J., Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl, T. 2007. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129(7): 1401-1414.
Lau, N., Lim, L., Weinstein, E., and Bartel, D. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294(5543): 858-862.
Lee, R. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294(5543): 862-864.
Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5): 843-854.
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Rådmark, O., Kim, S., and Kim, V.N. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425(6956): 415-419.
Lee, Y., Jeon, K., Lee, J.-T., Kim, S., and Kim, V.N. 2002. MicroRNA maturation: stepwise processing and subcellular localization. Embo J 21(17): 4663-4670.
Lee, Y., Kim, M., Han, J., Yeom, K.-H., Lee, S., Baek, S.H., and Kim, V.N. 2004. MicroRNA genes are transcribed by RNA polymerase II. Embo J 23(20): 4051-4060.
Lewis, B., Burge, C., and Bartel, D. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): 15-20.
Lewis, B., Shih, I., Jones-Rhoades, M., Bartel, D., and Burge, C. 2003.
Prediction of mammalian microRNA targets. Cell 115(7): 787-798.
Lim, L., Glasner, M., Yekta, S., Burge, C., and Bartel, D. 2003a. Vertebrate MicroRNA genes. Science 299(5612): 1540-1540.
Lim, L., Lau, N., Weinstein, E., Abdelhakim, A., Yekta, S., Rhoades, M., Burge, C., and Bartel, D. 2003b. The microRNAs of Caenorhabditis elegans. Genes & Development 17(8): 991-1008.
Liu, Q., Rand, T.A., Kalidas, S., Du, F., Kim, H.-E., Smith, D.P., and Wang, X. 2003. R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301(5641): 1921-1925.
Lu, C., Meyers, B.C., and Green, P.J. 2007. Construction of small RNA cDNA libraries for deep sequencing. Methods 43(2): 110-117.
Lu, J., Shen, Y., Wu, Q., Kumar, S., He, B., Shi, S., Carthew, R.W., Wang, S.M., and Wu, C.-I. 2008. The birth and death of microRNA genes in Drosophila. Nat Genet 40(3): 351-355.
Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. 2004. Nuclear export of microRNA precursors. Science 303(5654): 95-98.
Macrae, I.J., Ma, E., Zhou, M., Robinson, C.V., and Doudna, J.A. 2008. In vitro reconstitution of the human RISC-loading complex. P Natl Acad Sci Usa 105(2): 512-517.
Macrae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams, P.D., and Doudna, J.A. 2006. Structural basis for double-stranded RNA processing by Dicer. Science 311(5758): 195-198.
Maniataki, E. and Mourelatos, Z. 2005. A human, ATP-independent, RISC assembly machine fueled by pre-miRNA. Genes & Development 19(24): 2979-2990.
Marco, A., Hui, J.H.L., Ronshaugen, M., and Griffiths-Jones, S. 2010. Functional shifts in insect microRNA evolution. Genome Biol Evol 2: 686-696.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.-J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.-B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., and Rothberg, J.M. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.
Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada, K., Mirochnitchenko, O., Inouye, M., and Kato, I. 2006. The expression profile of microRNAs in mouse embryos. Nucleic Acids Res 34(6): 1765-1771.
Morin, R.D., O'Connor, M.D., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A.-L., Zhao, Y., McDonald, H., Zeng, T., Hirst, M., Eaves, C.J., and Marra, M.A. 2008. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res 18(4): 610-621.
Moss, E.G., Lee, R.C., and Ambros, V. 1997. The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88(5): 637-646.
Mueller, G.A., Miller, M.T., Derose, E.F., Ghosh, M., London, R.E., and Hall, T.M.T. 2010. Solution structure of the Drosha double-stranded RNA-binding domain. Silence 1(1): 2.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005. Characterization of Dicer-deficient murine embryonic stem cells. P Natl Acad Sci Usa 102(34): 12135-12140.
Napoli, C., Lemieux, C., and Jorgensen, R. 1990.
Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. The Plant Cell Online 2(4): 279-289.
Nowotny, M. and Yang, W. 2009. Structural and functional modules in RNA interference. Curr Opin Struct Biol 19(3): 286-293.
O'Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., and Mendell, J.T. 2005. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435(7043): 839-843.
Okada, C., Yamashita, E., Lee, S.J., Shibata, S., Katahira, J., Nakagawa, A., Yoneda, Y., and Tsukihara, T. 2009. A high-resolution structure of the pre-microRNA nuclear export machinery. Science 326(5957): 1275-1279.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1): 89-100.
Orom, U.A., Nielsen, F.C., and Lund, A.H. 2008. MicroRNA-10a binds the 5'UTR of ribosomal protein mRNAs and enhances their translation. Mol Cell 30(4): 460-471.
Paddison, P.J., Caudy, A.A., Sachidanandam, R., and Hannon, G.J. 2004. Short Hairpin Activated Gene Silencing in Mammalian Cells. RNA Interference, Editing, and Modification 265: 85-100.
Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., Spring, J., Srinivasan, A., Fishman, M., Finnerty, J., Corbo, J., Levine, M., Leahy, P., Davidson, E., and Ruvkun, G. 2000. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408(6808): 86-89.
Pham, J.W., Pellino, J.L., Lee, Y.S., Carthew, R.W., and Sontheimer, E.J. 2004. A Dicer-2-dependent 80s complex cleaves targeted mRNAs during RNAi in Drosophila. Cell 117(1): 83-94.
Preall, J.B., He, Z., Gorra, J.M., and Sontheimer, E.J. 2006. Short interfering RNA strand selection is independent of dsRNA processing polarity during RNAi in Drosophila. Curr Biol 16(5): 530-535.
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. 2000. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403(6772): 901-906.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. 2004. Identification of mammalian microRNA host genes and transcription units. Genome Res 14(10A): 1902-1910.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007. Intronic microRNA precursors that bypass Drosha processing. Nature 448(7149): 83-86.
Ruvkun, G., Ambros, V., Coulson, A., Waterston, R., Sulston, J., and Horvitz, H.R. 1989. Molecular genetics of the Caenorhabditis elegans heterochronic gene lin-14. Genetics 121(3): 501-516.
Ruvkun, G. and Giusto, J. 1989. The Caenorhabditis elegans heterochronic gene lin-14 encodes a nuclear protein that forms a temporal developmental switch. Nature 338(6213): 313-319.
Schwarz, D.S., Hutvágner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. 2003. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115(2): 199-208.
Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. 2008. Widespread changes in protein synthesis induced by microRNAs. Nature 455(7209): 58-63.
Sen, G.C. and Sarkar, S.N. 2007. The Interferon-Stimulated Genes: Targets of Direct Signaling by Interferons, Double-Stranded RNA, and Viruses. Interferon: The 50th Anniversary 316: 233-250.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. 2004. Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. P Natl Acad Sci Usa 101(15): 5488-5493.
Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M.J., Tuschl, T., van Nimwegen, E., and Zavolan, M. 2005. Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics 6: 267.
Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. 2000. The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol Cell 5(4): 659-669.
Sohn, S.Y., Bae, W.J., Kim, J.J., Yeom, K.-H., Kim, V.N., and Cho, Y. 2007. Crystal structure of human DGCR8 core. Nat Struct Mol Biol 14(9): 847-853.
Su, R.-W., Lei, W., Liu, J.-L., Zhang, Z.-R., Jia, B., Feng, X.-H., Ren, G., Hu, S.-J., and Yang, Z.-M. 2010. The integrative analysis of microRNA and mRNA expression in mouse uterus under delayed implantation and activation. PLoS ONE 5(11): e15513.
Svoboda, P., Stein, P., Hayashi, H., and Schultz, R.M. 2000. Selective reduction of dormant maternal mRNAs in mouse oocytes by RNA interference. Development 127(19): 4147-4156.
Tam, O.H., Aravin, A.A., Stein, P., Girard, A., Murchison, E.P., Cheloufi, S., Hodges, E., Anger, M., Sachidanandam, R., Schultz, R.M., and Hannon, G.J. 2008. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453(7194): 534-538.
Tomari, Y., Matranga, C., Haley, B., Martinez, N., and Zamore, P.D. 2004. A protein sensor for siRNA asymmetry. Science 306(5700): 1377-1380.
van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N.M., and Stuitje, A.R. 1990. Flavonoid Genes in Petunia: Addition of a Limited Number of Gene Copies May Lead to a Suppression of Gene Expression. The Plant Cell Online 2(4): 291-299.
Vasudevan, S., Tong, Y., and Steitz, J.A. 2007. Switching from repression to activation: microRNAs can up-regulate translation. Science 318(5858): 1931-1934.
Wang, M., Song, X., Han, P., Li, W., and Jiang, B. 2010.
New syntax to describe local continuous structure-sequence information for recognizing new pre-miRNAs. J Theor Biol 264(2): 578-584.
Watanabe, T., Totoki, Y., Toyoda, A., Kaneda, M., Kuramochi-Miyagawa, S., Obata, Y., Chiba, H., Kohara, Y., Kono, T., Nakano, T., Surani, M.A., Sakaki, Y., and Sasaki, H. 2008. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453(7194): 539-543.
Wightman, B., Burglin, T.R., Gatto, J., Arasu, P., and Ruvkun, G. 1991. Negative regulatory sequences in the lin-14 3'-untranslated region are necessary to generate a temporal switch during Caenorhabditis elegans development. Genes & Development 5(10): 1813-1824.
Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75(5): 855-862.
Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434(7031): 338-345.
Xue, C., Li, F., He, T., Liu, G.-P., Li, Y., and Zhang, X. 2005. Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics 6: 310.
Yekta, S., Shih, I.-H., and Bartel, D.P. 2004. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304(5670): 594-596.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. 2003. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes & Development 17(24): 3011-3016.
Yoda, M., Kawamata, T., Paroo, Z., Ye, X., Iwasaki, S., Liu, Q., and Tomari, Y. 2010. ATP-dependent human RISC assembly pathways. Nat Struct Mol Biol 17(1): 17-23.
Zamore, P., Tuschl, T., Sharp, P., and Bartel, D. 2000. RNAi: Double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101(1): 25-33.
Zeng, Y. and Cullen, B.R. 2005.
Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem 280(30): 27595-27603.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24(1): 138-148.
Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. 2004. Single processing center models for human Dicer and bacterial RNase III. Cell 118(1): 57-68.
Zhao, Y., Samal, E., and Srivastava, D. 2005. Serum response factor regulates a muscle-specific microRNA that targets Hand2 during cardiogenesis. Nature 436(7048): 214-220.

Figure 1
[Schematic of miRNA biogenesis pathways. Panel titles: canonical pri-miRNA, mirtron, tailed-mirtron, endogenous shRNA. Step labels: DGCR8/Drosha cleavage, Exportin-5 transport, Dicer cleavage, pre-miRNA, miRNA:miRNA* duplex, RISC loading, splicing & debranching, Ago2 cleavage. Domain diagrams of Drosha, DGCR8, and Dicer, with labels Pro-rich, Arg/Ser-rich, RIIID, dsRBD, WW, DExD helicase, and PAZ.]

Figure 2
[Plot by year, 2003 to 2010, with series for mouse, human, and all organisms; axis titles are not recoverable.]

Chapter 2
Method for construction of small-RNA libraries for Illumina high-throughput sequencing platform

H. Rosaria Chiang1,2, Wendy K. Johnston1,2, Lori Schoenfeld1,2, Shujun Luo3, and David P. Bartel1,2

1 Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2 Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 Illumina Inc., Hayward, CA 94545, USA

S.L. provided an early draft of the protocol and information on the Illumina sequencing platform, and H.R.C. performed the experiments and revised the method. W.K.J. and L.S. further updated the protocol. D.P.B.
provided guidance throughout the project.

Abstract
Small non-coding RNAs play an important role in gene regulation. Previous efforts to clone and sequence small RNAs have led to the discovery of novel classes of small RNAs or to the identification of additional genes and/or properties. Here the protocol for cloning small RNAs to construct cDNA libraries ("small-RNA libraries") for the Illumina sequencing platform is described. This method can be used for gene discovery and profiling of small regulatory RNAs such as microRNAs (miRNAs).

Introduction
Many classes of small RNAs play a regulatory role in a wide range of cellular processes, such as differentiation and transposon silencing. Cloning and sequencing small RNAs have contributed to a better understanding of small RNAs, such as miRNAs and Piwi-interacting RNAs (piRNAs). The early sequencing studies aimed to identify additional miRNA genes (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). In order to construct small-RNA libraries enriched for miRNAs with a minimal amount of degradation fragments, one of these studies adopted a method that took advantage of the molecular features of miRNAs: 5' phosphate and 3' hydroxyl groups (Lau et al. 2001). The concept behind these cloning protocols was similar: ligation of adaptors to the 3' and 5' ends of the size-fractionated RNAs, followed by reverse transcription (RT) and polymerase chain reaction (PCR) amplification. The amplified constructs were concatemerized, cloned into plasmids, and sequenced by the Sanger method. While many miRNA genes were identified by Sanger sequencing (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001; Lagos-Quintana et al. 2002; Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov et al. 2006b; Landgraf et al. 2007), advances in high-throughput sequencing technology have facilitated small RNA discovery efforts.
For example, sequencing studies of small RNAs that associate with Piwi proteins led to the discovery of piRNAs (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006). Furthermore, the ping-pong biogenesis model of class II piRNAs and their role in transposon silencing were identified through analyses of sequencing data (Brennecke et al. 2007). Similarly, sequencing studies led to the identification of novel miRNA genes (Berezikov et al. 2006a; Ruby et al. 2006; Berezikov et al. 2007; Calabrese et al. 2007; Ruby et al. 2007b; Babiarz et al. 2008) as well as a new class of miRNAs known as mirtrons (Okamura et al. 2007; Ruby et al. 2007a). Much information can therefore be gained through high-throughput sequencing of small RNAs. Illumina's sequencing-by-synthesis utilizes a reversible terminator-based method (Seo et al. 2004) to provide 200 million reads of 100 nt per run. Here the method for construction of small-RNA libraries based on Lau et al. (2001) is updated for the Illumina sequencing platform.

Method Overview
The protocol is outlined in Figure 1. The total RNA isolated from a desired sample is size-fractionated on a gel using radioactively labeled RNA markers. The pre-adenylated 3' adaptor with a blocked 3' terminus is ligated to the small RNAs in the absence of adenosine triphosphate (ATP) in two reactions, one using the RNA ligase mutant Rnl2(1-249)K227Q (Hafner et al. 2008) and one using T4 RNA ligase 1. The 3' ligated RNAs are gel-purified, and the 5' adaptor is ligated using T4 RNA ligase 1. The RNAs with both 5' and 3' adaptors are gel-purified and reverse-transcribed. The RNAs are base-hydrolyzed, and the resulting cDNAs are PCR-amplified. The PCR products are purified on a formamide gel and then submitted for Illumina sequencing.

Protocol
1. Purification of small RNA from total RNA
To isolate small RNAs from larger RNA species, such as mRNA and rRNA, as well as their larger degradation fragments, the total RNA is size-fractionated on a urea gel.
In order to visualize the area to cut from the gel, radioactively labeled RNA markers are spiked into the total RNA. This method is preferable to running a control miRNA or RNA ladder in another lane because it eliminates the possibility of contamination and serves as an internal control. Although some of the sequenced reads may correspond to the RNA markers, they will represent only a minute fraction of the reads, and they can even be used to normalize the reads across multiple samples if the samples are prepared with the same amount of RNA markers. Other RNAs with desired sequences and lengths can be used as markers provided that they do not match the genome of the organism from which the total RNA is isolated and that they do not affect downstream cloning or sequencing steps.

1.1. Kinase 5' end of RNA markers with 32P γ-ATP
Kinase the 18-mer and 30-mer RNA markers in separate reactions and keep the markers separate. It is important to use 32P γ-ATP with a very high specific activity so that minimal amounts of RNA markers are spiked into the total RNA. Doing so will result in a smaller fraction of reads reflecting the sequence of the markers.

18-mer marker RNA: AGCGUGUAGGGAUCCAAA
30-mer marker RNA: GGCAUUAACGCGGCCGCUCUACAAUAGUGA

Reagent                                   Amount
10 µM RNA marker                          2 µL
10X PNK buffer                            2 µL
32P γ-ATP (6000 Ci/mmol, 150 mCi/mL)      2 µL
dH2O                                      13 µL
PNK (10 units/µL)                         1 µL

• Incubate reaction for 1 hour at 37°C.
• (Optional: Before gel-purifying, add 5 µL H2O for a total reaction volume of 25 µL and spin through a MicroSpin G-25 column (GE Healthcare) to remove excess, unincorporated ATP.)
• Gel purification:
  o Add 2X urea loading buffer (8 M urea, 25 mM EDTA, 0.025% (w/v) each xylene cyanol, bromophenol blue) to each marker. Heat to 80°C for 5 min and run on a 15% denaturing polyacrylamide gel until the bromophenol blue dye is ~1 inch from the bottom of the gel.
  o Dismantle gel apparatus and separate plates. Leave the gel on one of the glass plates.
Cover the gel with clear plastic film and visualize it by exposing to a phosphorimager plate for ~10 sec. Develop image. Align a printed image of the gel under the actual gel on the glass plate. Cut out the gel pieces containing marker bands and put them into 1.5 mL Eppendorf tubes. (Optional: To assist in aligning gel to picture, use a pipette tip with a small amount of hot (radioactive) dye to prick the gel at several spots. Expose gel to plate and develop picture, then align the dots of dye in the gel to the dots of signal on the picture.)
  o Elute RNA: Add 450 µL of 0.3 M NaCl to the gel slices and rotate the tubes at 4°C overnight.
  o Precipitate RNA: Transfer the supernatant, which contains the eluted RNA, to a new tube and add 2.5 volumes of cold 100% ethanol; vortex. (Optional: Add 1 µL of GlycoBlue (Ambion) to help visualize the RNA pellet.) Incubate at -20°C for 30 min. (Alternatively, gel-elute for 4 hours at room temperature, then precipitate for 1 hour at -20°C.)
  o Spin samples at high speed for 15 min at 4°C in a microcentrifuge. Carefully remove all supernatant and resuspend each pellet in 10-30 µL dH2O.
• To combine markers: Measure the activity of each marker separately and combine the two markers so that the counts per minute (CPM) of each marker are approximately equal.

1.2. Purify small RNA from total RNA
• Add trace amounts of very high specific activity labeled markers to 5-30 µL total RNA. For example, use a Ludlum Model 3 Survey Meter to measure ~20-60K CPM of combined marker. To approximate this amount, pipette a small volume of combined marker into a pipette tip. Hold the tip very close to, but not touching, the face of the radiation monitor and note the number of counts. Adjust volume as necessary.
• Gel-purify as above. When cutting bands from the gel, cut the areas containing the labeled markers and everything in between. Resuspend precipitated RNA in at least 10 µL dH2O.

2.
3' Adaptor ligation
If the 3' ligation step is performed under standard T4 RNA ligase reaction conditions, RNA species with 5' phosphate and 3' hydroxyl groups, such as miRNAs, will circularize rather than ligate to the 3' adaptors. In the presence of ATP, a nucleophilic lysine on the ligase attacks the ATP molecule to form an adenylated ligase intermediate (Ap-Ligase).

    Ligase + ATP ⇌ Ap-Ligase + PPi    (1)

The adenylated ligase then transfers the adenylate (Ap) to an RNA molecule with a 5' phosphate, ideally to the 3' adaptor (p-Adaptor).

    Ap-Ligase + p-Adaptor ⇌ Ligase + App-Adaptor    (2)

The ligase then joins the 5' terminus of the adenylated adaptor to the 3' terminus of a substrate with a 3' hydroxyl, releasing adenosine monophosphate (AMP) in the process. Since the 3' terminus of the 3' adaptor is blocked, the adenylated adaptor can only ligate to the 3' terminus of small RNAs, such as a miRNA (p-miRNA).

    Ligase + App-Adaptor + p-miRNA → Ligase + p-miRNA-Adaptor + AMP    (3)

However, if the adenylated ligase transfers the adenylate to a miRNA, for example, the 5' terminus of the adenylated miRNA can ligate to its own 3' terminus and circularize. The circularized product will be eliminated during gel-purification or PCR amplification. To circumvent this problem, two approaches were previously used. One method dephosphorylated the RNAs using calf intestinal alkaline phosphatase (CIP) prior to 3' ligation and rephosphorylated the ligated products before 5' ligation (Lagos-Quintana et al. 2001; Lee and Ambros 2001). This method, however, removes the 5' phosphate that distinguishes miRNAs from degradation fragments. An alternative method utilizes pre-adenylated 3' adaptors to perform the 3' ligation in the absence of ATP (Lau et al. 2001). Of the two methods, the protocol using pre-adenylated 3' adaptors gained popularity due to its selective enrichment for miRNAs. However, even the purified ligases are partially adenylated, and these enzymatic reactions are reversible, i.e.,
the ligase can transfer the adenylate from a pre-adenylated adaptor to itself (the reverse of reaction 2). The adenylated ligase can then transfer the adenylate to a miRNA, which will lead to a circularized miRNA. Using Rnl2(1-249) instead of T4 RNA ligase 1 mitigates this problem (Pfeffer et al. 2005), as this truncated mutant of T4 RNA ligase 2 has an impaired adenylate transfer function (Ho et al. 2004). Rnl2(1-249)K227Q was reported to perform even better than Rnl2(1-249) (Hafner et al. 2008), as K227 was implicated as a residue crucial for adenylate transfer activity (Ho et al. 2004). Because RNA ligases have different sequence preferences, one 3' adaptor ligation is performed with Rnl2(1-249)K227Q and another with T4 RNA ligase 1. The two reactions can be combined immediately before or after the gel-purification.

2.1. Synthesize adenosine 5'-phosphorimidazolide (ImpA) (Lau et al. 2001)
• Rinse 2 beakers in acetonitrile and air dry.
• Make two mixtures:
  Mixture A:
    174 mg AMP (FW 347.2) (0.5 mmol)
    15 mL dimethylformamide
  Mixture B:
    262 mg triphenylphosphine (FW 262.3) (1 mmol)
    220 mg 2,2'-dipyridyldisulfide (FW 220.3) (1 mmol)
    170 mg imidazole (FW 68.08) (2.5 mmol)
    0.90 mL triethylamine (FW 101.2, d=0.726)
    15 mL dimethylformamide
• Add Mixture A slowly into Mixture B while stirring until precipitates dissolve.
• Cover beaker and stir for 1-1.5 hr at room temperature.
• Make Precipitation Mixture:
    1.1 g NaClO4 (FW 122.4) (9 mmol)
    225 mL acetone
    115 mL anhydrous ethyl ether
• Add Mixture A+B dropwise to Precipitation Mixture.
• Remove solvent phase down to ~60 mL.
• Transfer precipitates to 50 mL conical-bottom Corex or Teflon centrifuge tubes, rinse with acetone, centrifuge at 5000 rpm (3000g in SS-34 rotor) for 10 min and pour off acetone. Repeat rinse 3 times.
• Perform a final rinse with just ether, and spin down for 20 min.
• Dry overnight in a vacuum vessel between 22.5-45°C. Store at -20°C.

2.2. Adenylate 3' adaptor (Lau et al.
2001)
3' Adaptor: pTCGTATGCCGTCTTCTGCTTGidT

Reagent       Stock conc.   Amount                   Final conc.
ImpA          -             9 mg in 420 µL dH2O      50 mM
MgCl2         2 M           7 µL                     25 mM
3' Adaptor    1.3 mM        80 µL                    0.2 mM

• Incubate at 50°C for 3 hrs.
• Gel purify on 20% gel.

2.3. Ligate 3' adaptor to small RNAs

Reaction 1:
Reagent                                  Amount    Final
Purified 18-30 nt RNA                    2.5 µL
100 µM pre-adenylated 3' adaptor         0.5 µL    50 pmol
10X Ligation Buffer                      1 µL      1X
dH2O                                     5.5 µL
Rnl2(1-249)K227Q (6.25 µg/µL)            0.5 µL    ~3 µg
Total reaction volume                    10 µL

Reaction 2:
Reagent                                  Amount    Final
Purified 18-30 nt RNA                    2.5 µL
100 µM pre-adenylated 3' adaptor         0.5 µL    50 pmol
10X Ligation Buffer                      1 µL      1X
dH2O                                     5 µL
T4 RNA Ligase 1 (NEB) (20 U/µL)          1 µL      20 units
Total reaction volume                    10 µL

• Incubate Reaction 1 at 22°C for 30 min; incubate Reaction 2 at 22°C for 2 hours. Stop reactions by adding 2X urea loading buffer.
• (Optional: Combine Reactions 1 and 2.)
• Gel-purify on 15% gel as above. Run gel until the bromophenol blue dye is close to the bottom; expose phosphorimager plate for ~15-30 min. (Optional: Run a small amount of unligated material to track the gel shift.)
• Resuspend precipitated RNA from combined reactions in 10 µL dH2O.

3. 5' Adaptor ligation
The 5' ligation step enriches for RNA species with a 5' phosphate, a hallmark of RNase III cleavage. The sequence of the 5' adaptor cannot be changed without changing the sequencing primer, as this region of the final construct anneals to the Illumina sequencing primer.

5' Adaptor: GUUCAGAGUUCUACAGUCCGACGAUC

Reagent                                  Amount    Final
Purified 3' ligation product             5 µL
100 µM 5' adaptor                        4 µL      400 pmol
10X Ligation Buffer                      1.5 µL    1X
T4 RNA Ligase 1                          1 µL      20 units
4 mM ATP                                 1 µL      4 nmol
dH2O                                     2.5 µL
Total reaction volume                    15 µL

• Incubate at 22°C for ~18 hours. Stop reaction by adding 2X urea loading buffer.
• Gel-purify on 10% gel. (Optional: Also run a small amount of 3' ligated products.) Run gel until the bromophenol blue dye just runs out. Expose phosphorimager plate for 2.5 hours to overnight.
Keep gel at -20°C when exposing for long periods of time to minimize diffusion of RNA.
• Resuspend precipitated RNA in 10 µL dH2O.

4. Reverse-transcription (RT) and base-hydrolysis
4.1. Reverse-transcribe ligated RNAs

RT-primer/5' PCR primer: CAAGCAGAAGACGGCATA

Reagent                                  Amount
Purified ligated RNA                     5 µL
100 µM RT-primer/5' PCR primer           1 µL
dH2O                                     9.6 µL

• Heat to 65°C for 10 min, spin down briefly to cool.
• Add the following in order:
  o 6.4 µL 5X first strand buffer (Invitrogen)
  o 7 µL 10X dNTPs (2 mM)
  o 3 µL 100 mM DTT
• Heat to 48°C for 3 min.
• Remove 3 µL for an RT-minus control.
• Add 1 µL of Superscript II RT (Invitrogen) (200 U/µL) and incubate at 44°C for 1 hour.

4.2. Base-hydrolyze RNAs
• Add 5 µL of 1 M NaOH and incubate for 10 min at 90°C.
• Neutralize the base hydrolysis reaction with 25 µL of 1 M HEPES pH 7.0 and spin through a MicroSpin G-25 column to desalt. Recover about 30 µL.

5. Splicing by overlap extension by PCR (SOE-PCR)
While the reverse-transcribed cDNAs will have a length of ~70 nt, they need to be extended to a length of ~92 nt for Illumina sequencing. Illumina determined the optimal length for cluster size and for bridge-amplification of the final construct on the flow-cell (pers. communication). The 3' PCR primer has the extender sequence, and its 3' end can anneal to the 3' end of the cDNA (Figure 1). To extend the cDNA to the final construct length, three PCR cycles are performed with the 3' PCR primer. After the extension, the 5' PCR primer is added for amplification. Due to its length, the final construct is purified on a formamide gel rather than on a urea gel to ensure that all double-stranded DNAs have denatured.

RT-primer/5' PCR primer: CAAGCAGAAGACGGCATA
3' PCR primer: AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA

• Set up SOE-PCR.
(Note: Less RT reaction can be used if the number of PCR cycles is increased.):

Reagent                    RT sample                  RT-minus sample
RT reaction                10 µL                      0
RT-minus control           0                          3 µL
5X PCR Buffer              20 µL                      20 µL
2 mM dNTP (10X)            12.6 µL                    12.6 µL
150 nM 3' PCR primer       2 µL (final 0.3 pmol)      2 µL (final 0.3 pmol)
Phusion polymerase         1 µL                       1 µL
dH2O                       53.4 µL                    60.4 µL

• Perform 3 cycles of PCR to let the small-RNA cDNAs extend before amplification:
    98°C 30 sec
    94°C 30 sec, 60°C 30 sec, 72°C 15 sec (3 cycles)
    72°C 10 min
• To each sample add:
    1 µL 25 µM 5' PCR primer
    1 µL 25 µM 3' PCR primer
• Split reaction(s) into 2 x 50.5 µL.
• Perform 15-18 cycles of PCR:
    98°C 30 sec
    94°C 30 sec, 60°C 30 sec, 72°C 15 sec (15-18 cycles)
    72°C 10 min
• Ethanol-precipitate and resuspend in 15 µL 1X formamide loading buffer (95% formamide, 18 mM EDTA, 0.025% (w/v) xylene cyanol, 0.025% (w/v) bromophenol blue, 0.025% (w/v) SDS).
• Mix 2 µL 10 bp DNA marker (1.0 µg/µL) with 13 µL 1X formamide loading buffer.
• Heat samples and DNA marker for 10 min at 85°C and gel-purify on a 90% formamide, 8% acrylamide gel.
• Stain with SYBR Gold (Invitrogen) (1 µL/50 mL 1X TBE). Cut and elute the 85-105 nt gel piece. The RT-minus sample will run at ~40-50 bp.
• Ethanol-precipitate as above, but do not add glycogen during final purification. Speed vacuum for 30 min to remove leftover formamide.
• Resuspend in 15 µL of 10 mM Tris and submit sample for sequencing.

Concluding remarks
Sequencing data from small-RNA libraries constructed using this protocol can be used to profile small RNAs from a broad range of samples. Variations of this protocol have been used to make the following small-RNA libraries: C. elegans libraries across developmental stages (Appendix A); Nematostella vectensis and Amphimedon queenslandica libraries (Appendix B); murine heart and muscle libraries (Appendix C); murine brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns libraries (Chapter 3); and a human brain library (Appendix D).
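The oligonucleotide sequences listed in the protocol can be cross-checked against the construct lengths quoted in the SOE-PCR section. The following is a minimal sketch in Python, offered as an illustration and not as part of the published method. All function names are hypothetical. It assumes that the extender is the part of the 3' PCR primer that does not overlap the 5' adaptor, that a typical insert is 22 nt, and that a sequencing read begins at the first nucleotide of the insert; the adaptor trimming uses exact matching only and would need to tolerate sequencing errors in practice.

```python
# Illustrative cross-check of the oligonucleotides listed in the protocol.
# All function names are hypothetical helpers, not part of the published method.

# Sequences copied from the protocol text, written 5' -> 3'.
FIVE_PRIME_ADAPTOR_RNA = "GUUCAGAGUUCUACAGUCCGACGAUC"
THREE_PRIME_ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"  # 5' phosphate and 3' idT block omitted
RT_PRIMER = "CAAGCAGAAGACGGCATA"
THREE_PRIME_PCR_PRIMER = "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA"
MARKER_18MER_RNA = "AGCGUGUAGGGAUCCAAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Write an RNA or DNA sequence in the DNA alphabet."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def primer_template_overlap(primer, template):
    """Length of the longest 3' end of `primer` identical to the 5' start of `template`."""
    for n in range(min(len(primer), len(template)), 0, -1):
        if primer[-n:] == template[:n]:
            return n
    return 0


def final_construct(insert_rna):
    """Sense strand expected after SOE-PCR: extender + 5' adaptor + insert + 3' adaptor."""
    adaptor5 = to_dna(FIVE_PRIME_ADAPTOR_RNA)
    overlap = primer_template_overlap(THREE_PRIME_PCR_PRIMER, adaptor5)
    extender = THREE_PRIME_PCR_PRIMER[:len(THREE_PRIME_PCR_PRIMER) - overlap]
    return extender + adaptor5 + to_dna(insert_rna) + THREE_PRIME_ADAPTOR


def trim_three_prime_adaptor(read, min_match=6):
    """Cut a read at the first exact match to the start of the 3' adaptor.

    Assumption: the read starts at the first nucleotide of the insert.
    Reads with no match are returned unchanged.
    """
    probe = THREE_PRIME_ADAPTOR[:min_match]
    cut = read.find(probe)
    return read if cut == -1 else read[:cut]


if __name__ == "__main__":
    # The RT primer should be complementary to the 3' end of the 3' adaptor.
    print(reverse_complement(RT_PRIMER) == THREE_PRIME_ADAPTOR[-len(RT_PRIMER):])
    # Construct length for a placeholder 22-nt insert.
    print(len(final_construct("N" * 22)))  # 92
    # A simulated 36-nt read from the 18-mer marker, trimmed back to the marker.
    read = (to_dna(MARKER_18MER_RNA) + THREE_PRIME_ADAPTOR)[:36]
    print(trim_three_prime_adaptor(read))
```

Under these assumptions, a 22-nt insert gives 23 + 26 + 22 + 21 = 92 nt, which agrees with the ~92 nt final construct stated in the protocol, and reads that trim back to a marker sequence can be set aside or counted separately before profiling.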
These datasets have contributed to the understanding of the small RNA-ome of these samples. In particular, the analysis of the data from mouse libraries is presented in the next chapter.

Figure Legend
Figure 1. Flowchart for construction of small-RNA library. The details on each step are explained in the main text under the corresponding heading.

References
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., Chien, M., Russo, J.J., Ju, J., Sheridan, R., Sander, C., Zavolan, M., and Tuschl, T. 2006. A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442(7099): 203-207.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. 2008. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Berezikov, E., Chung, W.-J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian mirtron genes. Mol Cell 28(2): 328-336.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee brain. Nat Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J., Mano, H., Plasterk, R., and Cuppen, E. 2006b. Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16(10): 1289-1298.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and Hannon, G.J. 2007. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128(6): 1089-1103.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. P Natl Acad Sci USA 104(46): 18097-18102.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442(7099): 199-202.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. 2006. A novel class of small RNAs in mouse spermatogenic cells. Genes & Development 20(13): 1709-1714.
Hafner, M., Landgraf, P., Ludwig, J., Rice, A., Ojo, T., Lin, C., Holoch, D., Lim, C., and Tuschl, T. 2008. Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods 44(1): 3-12.
Ho, C.K., Wang, L.K., Lima, C.D., and Shuman, S. 2004. Structure and mechanism of RNA ligase. Structure 12(2): 327-339.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New microRNAs from mouse and human. RNA 9(2): 175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. 2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12(9): 735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V., Chiaretti, S., Foà, R., Schliwka, J., Fuchs, U., Novosel, A., Müller, R.-U., Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M., Weir, D.B., Choksi, R., De Vita, G., Frezzetti, D., Trompeter, H.-I., Hornung, V., Teng, G., Hartmann, G., Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle, J.W., Ju, J., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein, M.J., Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl, T. 2007. A mammalian microRNA expression atlas based on small RNA library sequencing.
Cell 129(7): 1401-1414.
Lau, N., Lim, L., Weinstein, E., and Bartel, D. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294(5543): 858-862.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes. Science 313(5785): 363-367.
Lee, R. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294(5543): 862-864.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1): 89-100.
Pfeffer, S., Lagos-Quintana, M., and Tuschl, T. 2005. Cloning of small RNA molecules. Curr Protoc Mol Biol Chapter 26: Unit 26.24.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007a. Intronic microRNA precursors that bypass Drosha processing. Nature 448(7149): 83-86.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007b. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17(12): 1850-1864.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. 2004. Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. P Natl Acad Sci USA 101(15): 5488-5493.

Figure 1
[Flowchart of small-RNA library construction:
1. Purification of small RNAs from total RNA: use radioactive 18-mer and 30-mer as markers
2. 3' Adaptor Ligation: ligate pre-adenylated 3' adaptor to small RNA without ATP (Reaction 1: Rnl2(1-249)K227Q; Reaction 2: T4 RNA ligase 1)
3. 5' Adaptor Ligation: ligate 5' adaptor using T4 RNA ligase 1
4. RT & Base-Hydrolysis
5. SOE-PCR
Diagram labels: small RNA, 5' adaptor, 3' PCR primer, extender, Final Construct.]

Chapter 3
Mammalian microRNAs: Experimental evaluation of novel and previously annotated genes

H. Rosaria Chiang1,2, Lori W. Schoenfeld1,2, J. Graham Ruby1,2,3, Vincent C. Auyeung1,2,4, Noah Spies1,2, Daehyun Baek1,2, Wendy K. Johnston1,2, Carsten Russ5, Shujun Luo6, Joshua E. Babiarz7, Robert Blelloch7, Gary P. Schroth6, Chad Nusbaum5, David P. Bartel1,2

1 Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2 Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 Current address: Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
4 Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
5 Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
6 Illumina, Inc., Hayward, CA 94545, USA
7 Institute for Regeneration Medicine, Center for Reproductive Sciences, and Department of Urology, University of California San Francisco, San Francisco, CA 94143, USA

H.R.C. performed the computational analysis, excluding the analysis of RNA editing, which was performed by V.C.A.; of untemplated nucleotide addition, which was performed by N.S.; and of the effects of miR-223 and miR-155, which was performed by D.B. L.W.S. performed the transfections and W.K.J. made the libraries for the overexpression experiments. C.R., S.L., G.P.S., and C.N. sequenced some of the mouse libraries. J.E.B. and R.B. supplied the sequencing data from the small-RNA library of mouse embryonic stem cells. H.R.C., L.W.S., V.C.A., N.S., D.B., and D.P.B. wrote the manuscript.

Supplemental Tables 3, 5, 6, and 7 as well as Supplemental Figure 2 are provided as electronic files on the accompanying CD-ROM. Supplemental Table 3 is best opened with a web browser.

Published as: Chiang, H. R., Schoenfeld, L. W., Ruby, J. G., Auyeung, V. C., Spies, N., Baek, D., Johnston, W.
K., Russ, C., Luo, S., Babiarz, J. E., Blelloch, R., Schroth, G. P., Nusbaum, C., and Bartel, D. P. (2010) Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24:992-1009.

Abstract
MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. Analysis of these sequences confirmed 398 annotated miRNA genes and identified 108 novel miRNA genes. More than 150 previously annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins also did not yield small RNAs, whereas ectopically expressing the confirmed and newly identified hairpins usually did yield small RNAs with the classical miRNA features, including dependence on the Drosha endonuclease for processing. These experiments, which suggest that previous estimates of conserved mammalian miRNAs were inflated, provide a substantially revised list of confidently identified murine miRNAs from which to infer the general features of mammalian miRNAs. Our analyses also revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan pre-miRNA, newly identified instances of miRNA editing, and evidence for widespread pre-miRNA uridylation reminiscent of miRNA regulation by Lin28.

Introduction
MicroRNAs (miRNAs) are endogenous ~22-nucleotide (nt) RNAs that posttranscriptionally regulate gene expression (Bartel 2004). MicroRNAs mature through three intermediates: a primary miRNA transcript (pri-miRNA), a precursor miRNA (pre-miRNA), and a miRNA:miRNA* duplex. RNA Polymerase II transcribes the pri-miRNA, which contains one or more segments that fold into an imperfect hairpin.
For canonical metazoan miRNAs, the RNase III enzyme Drosha, together with its partner, the RNA-binding protein DGCR8, recognizes the hairpin, and Drosha cleaves both strands ~11 base pairs from the base of the stem (Han et al. 2006). The cut leaves a 5' phosphate and 2-nt 3' overhang (Lee et al. 2003). The liberated pre-miRNA hairpin is then exported to the cytoplasm by Exportin-5 (Yi et al. 2003; Lund et al. 2004). There, the RNase III enzyme Dicer cleaves off the loop of the pre-miRNA, ~22 nt from the Drosha cut (Lee et al. 2003), again leaving a 5' monophosphate and 2-nt 3' overhang. The resulting miRNA:miRNA* duplex, composed of ~22-nt strands from each arm of the original hairpin, then associates with an Argonaute protein such that the miRNA strand is usually the one that becomes stably incorporated while the miRNA* strand dissociates and is degraded. In addition to canonical miRNAs, some miRNAs mature through pathways that bypass Drosha/DGCR8 recognition and cleavage. Members of the mirtron subclass of pre-miRNAs are excised as intron lariats from the pri-miRNA by the spliceosome and, following debranching, fold into Dicer substrates (Okamura et al. 2007; Ruby et al. 2007a). For some mirtrons, known as tailed mirtrons, a longer intron is excised such that only one end of the pre-miRNA is generated by the spliceosome, whereas the other end of the pre-miRNA matures through the Drosha-independent trimming of a 5' or 3' tail (Ruby et al. 2007a; Babiarz et al. 2008). Members of another subclass of pre-miRNAs, called endogenous short-hairpin RNAs (shRNAs), are suitable Dicer substrates without preprocessing by either Drosha or the spliceosome (Babiarz et al. 2008). Other small silencing RNAs are generated from the sequential processing of long hairpins or long bimolecular duplexes.
These small RNAs are classified as endogenous small interfering RNAs (siRNAs) rather than miRNAs because they derive from extended duplexes that produce many different small RNA species, whereas miRNAs derive from distinctive hairpins that produce one or two dominant species (Bartel 2004). The first indication of the abundance of miRNA genes came from sequencing small RNAs from mammals, flies and worms (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). Hundreds of mammalian miRNAs have been identified by Sanger sequencing of cloned small-RNA-derived cDNAs (Lagos-Quintana et al. 2001; Lagos-Quintana et al. 2002; Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov et al. 2006b; Landgraf et al. 2007). Some miRNAs, however, are expressed only in a limited number of cells or through a limited portion of development, and their rarity makes them difficult to detect. Computational methods have been used to identify mammalian miRNAs initially missed by sequencing, and some of these predicted miRNAs have been evaluated experimentally, e.g., by rapid amplification of cDNA ends (RACE) (Lim et al. 2003; Xie et al. 2005), hybridization to RNA blots (Berezikov et al. 2005), microarrays (Bentwich et al. 2005), and RNA-primed array-based Klenow extension (RAKE) (Berezikov et al. 2006b). Each of these experimental methods, however, can yield false positives. Indeed, recent work in invertebrates and plants (Rajagopalan et al. 2006; Ruby et al. 2006; Ruby et al. 2007b) has shown that the fraction of erroneously annotated miRNAs can be quite high, depending on the quality of the initial computational predictions. Even when miRNA genes are predicted correctly, the resolution of the prediction is often insufficient to confidently determine the precise 5' end of the mature miRNA.
Because miRNAs repress target mRNAs by pairing to the seed sequence, which is defined relative to the position of the miRNA 5' end, single-nucleotide resolution of 5'-end annotations is required for useful downstream analysis of their physiological consequences (Bartel 2009). Another approach for finding miRNAs and other small RNAs missed in the early sequencing efforts is high-throughput sequencing (Lu et al. 2005). In mammals, high-throughput sequencing methods that have contributed to miRNA discovery efforts have included massively parallel signature sequencing (MPSS) (Mineno et al. 2006), miRNA serial analysis of gene expression (miRAGE) (Cummins et al. 2006), 454 pyrosequencing (Berezikov et al. 2006a; Berezikov et al. 2007; Calabrese et al. 2007) and Illumina sequencing (Babiarz et al. 2008; Kuchenbauer et al. 2008). Here we use the Illumina sequencing-by-synthesis platform (Seo et al. 2004) for miRNA discovery in mouse. Analyses of these reads, combined with experimental evaluation of newly identified miRNAs as well as previous annotations, have led us to substantially revise the set of confidently identified murine miRNAs, thereby providing a more accurate picture of the general features of mammalian miRNAs and their abundance in the genome. In addition, our results revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan pre-miRNA, rare instances of 5' heterogeneity, newly identified instances of miRNA editing, and widespread pre-miRNA uridylation reminiscent of Lin28-like miRNA regulation.

Results

We sequenced small-RNA libraries from three mouse tissues (brain, ovary, and testes), as well as embryonic days 7.5 (e7.5), 9.5 (e9.5), 12.5 (e12.5) and newborn. Combining these data with data collected similarly from mouse embryonic stem (ES) cells (Babiarz et al.
2008) yielded 28.7 million reads between 16 and 27 nt in length that perfectly matched the mouse genome assembly (Supplemental Table 1). Of these reads, 79.3% mapped to miRNA hairpins, and 7.1% mapped to other annotated noncoding-RNA genes (Supplemental Table 2). Because the sequencing protocol was selective for RNAs with 5' monophosphate and 3' hydroxyl groups, this dominance of miRNA species was expected (Lau et al. 2001).

MicroRNA gene discovery

As when analyzing high-throughput data from invertebrates (Ruby et al. 2006; Ruby et al. 2007b; Grimson et al. 2008), we identified miRNA genes in mouse by applying the following criteria: 1) expression of the candidate miRNA, with a relatively uniform 5' terminus, 2) pairing characteristics of the predicted hairpin, 3) absence of annotation suggesting non-miRNA biogenesis, 4) absence of proximal reads suggesting that the candidate is a degradation intermediate, and 5) presence of reads corresponding to a miRNA* species with potential to pair to the miRNA candidate with ~2-nt 3' overhangs. Using a low-stringency genomic search strategy that considered the first four criteria, 736 miRNA candidates were identified from the total dataset of mouse reads. Manual inspection of these candidates, focusing on all five criteria, narrowed the list to 465 canonical miRNA genes, 377 of which were already annotated in miRBase v.14.0 (Griffiths-Jones 2004) and 88 of which were novel (Fig. 1A; Supplemental Fig. S1; Supplemental Table 3). We also found 14 mirtrons (including ten tailed mirtrons), four of which were already annotated, and 16 endogenous shRNAs, six of which were previously annotated (Fig. 1B). When added to the 88 novel canonical miRNA genes, the newly identified mirtrons and shRNAs raised the total number of novel genes to 108.
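The gene counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch; the variable names are illustrative and do not come from the original analysis.

```python
# Counts as quoted in the text; variable names are illustrative only.
canonical_total = 465       # canonical miRNA genes after manual inspection
canonical_annotated = 377   # already in miRBase v.14.0
canonical_novel = 88
mirtrons_total, mirtrons_annotated = 14, 4
shrnas_total, shrnas_annotated = 16, 6

# Novel mirtrons and shRNAs are the ones not previously annotated.
novel_mirtrons = mirtrons_total - mirtrons_annotated
novel_shrnas = shrnas_total - shrnas_annotated

novel_total = canonical_novel + novel_mirtrons + novel_shrnas
annotated_confirmed = canonical_annotated + mirtrons_annotated + shrnas_annotated

# The canonical set splits cleanly into annotated and novel genes.
assert canonical_annotated + canonical_novel == canonical_total

print(novel_total)          # 108
print(annotated_confirmed)  # 387
```

The second figure, 387, is the number of previously annotated genes that passed the filters at this stage of the analysis.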
Of these 108 genes, 36 appeared to be close paralogs of previously annotated miRNA genes (most of which were paralogs of mir-466, mir-467, or mir-669), producing miRNA reads that were identical to the previously annotated miRNAs, creating ambiguity as to which loci contributed to the sequenced reads. Most of these close paralogs (35/36) as well as 14 other novel loci were clustered with annotated miRNAs. The 72 novel genes with reads distinguishable from those of previously identified genes were expressed at lower levels than the previously annotated genes (median read counts, 27 and 8206, respectively), and compared to previously annotated miRNAs, a higher fraction of these novel miRNAs were located within introns of annotated [RefSeq (Pruitt et al. 2005)] mRNAs (47% and 26%, respectively).

Experimental evaluation of unconfirmed miRNAs

Of 564 miRBase-annotated miRNA genes that map to the mm8 genome assembly, 157 annotated miRNAs did not pass the filters for miRNA candidates (Fig. 1A, B; Supplemental Fig. S1; Supplemental Table 4). Of these 157, 26 mapped to annotated rRNA and tRNA loci, 52 had no reads mapping to them, and another 72 had some reads but in numbers deemed insufficient for confident annotation. The remaining seven either had reads with very heterogeneous 5' ends, which suggested non-specific degradation of a non-pri-miRNA transcript (mir-464, mir-1937a, and mir-1937b), had many reads that mapped well into the loop of the putative hairpin, which were inconsistent with Dicer processing (mir-451, mir-469, mir-805), or did not give a predicted fold with the requisite pairing involving the candidate and predicted miRNA* (mir-484) (Supplemental Fig. S2). For five of these seven, we have no reason to suspect that they might be authentic miRNA genes.
Among the remaining two, mir-484 might be regarded as a miRNA candidate because manual refolding was able to generate a hairpin with the requisite pairing, but even so, this candidate lacked reads for the predicted miRNA*. miR-451 is a noncanonical miRNA generated from an unusual hairpin without production of a miRNA:miRNA* duplex (S. Cheloufi and G. Hannon, personal communication). We do not suspect that any other annotated miRNA genes failed to pass our filters for the same reason as mir-451. An additional 20 annotated miRNA hairpins were in our set of candidates but failed the manual inspection because they lacked predicted miRNA* reads even after allowing for alternate hairpin structures. Hundreds of candidates from other miRNA discovery efforts (Xie et al. 2005; Berezikov et al. 2006b) also failed to pass the filters, usually because no reads mapped to them. One of the annotated miRNA genes missing from our datasets was mir-220, which had been predicted computationally using MiRscan as a miRNA gene candidate conserved in human, mouse and fish, and was supported experimentally using RACE analysis of zebrafish small RNAs (Lim et al. 2003). In contrast, the other 37 miRNAs newly annotated by Lim et al. (2003) were among our 387 confirmed miRNAs. The absence of mir-220 in our datasets might have reflected either very low expression in the sequenced samples or inaccuracy of its annotation. Similarly, mir-207, annotated in a contemporaneous study that cloned novel miRNAs from mouse tissues, was missing from our dataset, but another 27 miRNAs annotated from that study were confirmed (Lagos-Quintana et al. 2003). To evaluate whether the missing annotated miRNAs and candidates represented authentic miRNAs, we developed a moderate-throughput assay to examine whether their respective hairpins can be processed as miRNAs in cultured cells (Fig. 2A).
If these putative miRNAs were missing from our datasets because they were not expressed in the sequenced tissues or stages, we reasoned that they would probably be detected in cells ectopically expressing their respective hairpins, because most authentic miRNAs are correctly processed from heterologous transcripts that include the full hairpin flanked by ~100 nucleotides of genomic sequence on each side of the hairpin (Chen et al. 2004; Voorhoeve et al. 2006). Alternatively, if these putative miRNAs were missing because they were not authentic miRNAs and therefore lacked the features needed for Drosha and Dicer processing, they would not be sequenced from cells ectopically expressing their hairpins. To evaluate many hairpins simultaneously, we transfected pools of hairpin-expressing constructs into HEK293T cells and isolated small RNAs for high-throughput sequencing. The performance of 26 positive controls, chosen from canonical human/mouse miRNAs confirmed by our sequencing from mouse, illustrated the value of the assay. For all but one of these controls, miRNA and miRNA* reads were more abundant in the cells ectopically expressing the hairpin than in the cells without the hairpin constructs (Fig. 2B-D; Supplemental Figs. S3, S4). For example, both hsa-miR-193b and mmu-miR-137 (from human and mouse, respectively) were >10-fold over-expressed (Fig. 2B). The positive controls included genes of tissue-specific miRNAs, including mir-122 (liver), mir-133 (muscle), mir-223 (neutrophil) and several neuron-specific miRNAs, with the idea that hairpins of tissue-specific miRNAs might require tissue-specific factors for their processing and therefore might be sensitive to the potential absence of such factors in HEK293T cells. Differences were observed, ranging from ~100 to 10,000 reads above the control transfection (Fig.
2C; hsa-mir-214 and hsa-mir-9-1, respectively), consistent with the idea that factors absent in HEK293T cells might play a role in processing of some miRNAs. Alternatively, some miRNA hairpins might be processed less efficiently in all cell types, perhaps because our vectors might not present the hairpins in an optimal context for processing. Perhaps hsa-mir-192, the control gene that did not over-express in our assay, lacked crucial processing determinants needed in all cells. In either scenario, the very high sensitivity of high-throughput sequencing enabled miRNAs to be observed from most of the less efficiently processed hairpins. From the 52 annotated mouse miRNAs that our study did not sequence, 17 miRNAs, including mir-220 and mir-207, were tested in the ectopic-expression assay. One, mir-698, generated a single read corresponding to the annotated miRNA, and the rest failed to generate any reads representing the annotated miRNA (Fig. 2D). From the 72 annotated miRNAs that we could not identify due to an insufficient number of reads, 28 were tested, and only four of these were found to be over-expressed (Fig. 2D). The difficulty in over-expressing a canonical control miRNA (hsa-miR-192) illustrates that our ectopic-expression assay cannot be used to prove conclusively that a particular hairpin does not represent an authentic miRNA gene. However, the inability to over-express every one of the 17 unsequenced miRNAs as well as most of the 28 insufficiently sequenced miRNAs strongly indicated that, overall, these annotations have been faulty and that our failure to detect previously annotated miRNAs in mouse samples was not merely due to inadequate sequencing coverage. We also tested ten of the 20 annotated miRNA genes that we had identified as candidates but did not confidently classify as miRNA genes because the predicted miRNA* species was not sequenced.
Four of seven genes without a miRNA* read and one of three genes with substantially offset miRNA* reads produced the predicted miRNA* species in our ectopic-expression assay (Fig. 2D). mir-184 and mir-489, which both tested positive in this assay, are conserved. mir-184 is conserved throughout mammals, and mir-489 is conserved to chicken, although the miRNA seed, which is highly conserved in mammals and chicken, differs in mouse and rat. Thus, these two genes, as well as mir-875, which is a broadly conserved gene without a miRNA* read, were added to our set of confidently identified miRNA genes. Also added were mir-290, mir-291a, mir-291b, mir-292, mir-293, mir-294, and mir-295, which were missing in the genome assembly (mm8) used in our analysis because they fall in the region of the genome that is difficult to assemble. Including these 10 genes, plus mir-451, brings the total number of confidently identified miRNA genes to 506. Our sets of confirmed and novel murine miRNAs also provided the opportunity to evaluate results of other computational efforts to find miRNAs conserved among mammals. One set of studies predicted miRNAs based on phylogenetic conservation and then tested these and additional murine-specific hairpins using RAKE and cloning (Berezikov et al. 2005; Berezikov et al. 2006b). Among the 322 candidates supported by these experiments, 11 were in our sets of miRNAs (two in our confirmed set and nine in our novel set), and another nine did not satisfy our annotation criteria but had at least one read consistent with the predictions. Another study started with MiRscan predictions conserved in four mammals and filtered these predictions for potential seed pairing to conserved motifs in 3' UTRs (Xie et al. 2005). Of their 144 final candidates, 45 were paralogs of miRNAs already published at the time of prediction. 
Of the remaining 99 candidates, 27 were in our sets of miRNAs (26 in our confirmed set and one in our novel set), and one did not satisfy our annotation criteria but had three reads consistent with the miRNA* of the predicted miRNA. However, only four of the 27 confirmed miRNA genes (4% of the 99 novel predictions) gave rise to the mature miRNA with the predicted seed, suggesting that filtering MiRscan predictions for potential seed pairing provided little, if any, added benefit. This conclusion concurs with a recent analysis of miRNA targeting: miRNAs that are not conserved beyond mammals do not have enough preferentially conserved sites to place these sites as among the most conserved UTR motifs (Friedman et al. 2009). Therefore, it stands to reason that preferentially conserved UTR motifs would provide little value for predicting such miRNAs. To investigate whether the computational candidates might have been missed because of low expression in tissues and stages from which we sequenced, we included representatives from each study in our ectopic-expression assay. We randomly selected 12 Xie et al. candidates and eight Berezikov et al. 2006 candidates that our study did not sequence, as well as four human candidates from the Berezikov et al. 2005 set whose mouse orthologs were not sequenced. None generated reads representing the candidate miRNAs (Fig. 2C, D). Taken together, our results raise new questions regarding the authenticity of these candidates and suggest that previous extrapolation from these candidates, which had suggested that mammals have a surprisingly high number of conserved miRNA genes (as many as 1,000) (Berezikov et al. 2005) should be revised accordingly. Experimental evaluation of novel miRNAs and new candidates We also used the ectopic-expression assay to evaluate novel miRNAs identified from our sequencing. 
Of the 25 evaluated hairpins, 18 (72%) generated a significant number of miRNA-like reads in HEK293T cells, indicating that most, although perhaps not all, of our 108 novel annotations represented authentic miRNAs (Fig. 3; Supplemental Figs. S5, S6). These 25 hairpins were arbitrarily selected for evaluation, except for a preference for rare miRNAs, i.e., those that had fewer than ten mature miRNA reads. The rare miRNAs and the higher-abundance miRNAs performed similarly (5/7 and 11/14 positives, respectively). To evaluate Drosha- and Dicer-dependence of the over-expressed hairpins, the experiment was repeated with and without a plasmid encoding a dominant-negative allele of either Drosha or Dicer (Han et al. 2009) (Fig. 3A). All but two canonical miRNA controls and most of the novel canonical miRNAs (16/17) responded to TNdrosha coexpression (Fig. 3B; Supplemental Fig. S7). Fewer responded to TNdicer, suggesting that this construct was less disruptive of normal miRNA processing (Supplemental Fig. S7). The tested hairpins included several noncanonical miRNA precursors. The level of mmu-miR-1224, an annotated mirtronic miRNA (Berezikov et al. 2007), increased in the presence of TNdrosha, as expected if this pre-miRNA had more access to Exportin-5 and Dicer when the canonical pre-miRNAs were reduced (Grimm et al. 2006). Although mmu-miR-1839, an annotated shRNA (Babiarz et al. 2008), did not over-express, mmu-miR-344e and mmu-miR-344f, novel shRNAs, did over-express from our vector, and as expected for shRNAs, their biogenesis was Drosha-independent (Fig. 3B; Supplemental Figs. S5-7). Repeating the ectopic-expression assay in Dicer-knockout and control cells confirmed that mmu-miR-344e biogenesis was Dicer-dependent (data not shown). We also evaluated our candidates that had not satisfied our criteria for confident annotation as miRNAs, usually because they lacked reads representing the predicted miRNA*. We tested three sets of these candidates.
One set represented our candidates that lacked predicted miRNA* reads yet, based on small-RNA sequencing results from wild-type and mutant ES cells (Babiarz et al. 2008), appeared DGCR8- and Dicer-dependent. Another set represented candidates that appeared conserved in syntenic regions of other mammalian genomes, and the third set was selected at random from among the remaining candidates. All but one of the 28 tested candidates failed to generate miRNA-like reads, and the processing of the candidate that did generate miRNA-like reads in HEK293T cells was not dependent on Dicer, based on its presence in Dicer-knockout ES cells (Babiarz et al. 2008). The results evaluating the novel miRNAs and candidates illustrated the importance of requiring a convincing miRNA* read as a criterion for confident miRNA annotation. Five previously annotated miRNAs that were initially rejected due to lack of a convincing miRNA* read had tested positive in our over-expression assay (Fig. 2D), which indicated that this criterion was too stringent for some of the previously annotated genes. However, the results for the newly identified miRNAs and candidates showed that the presence of a convincing miRNA* read was the primary criterion that distinguished the novel canonical miRNAs (most of which tested positive) from the remaining candidates (nearly all of which tested negative). By requiring a convincing miRNA* read in addition to the other four annotation criteria, our approach accurately distinguished miRNA reads from the millions of other small-RNA reads generated by high-throughput sequencing, with relatively few false positives among the novel annotations and few false negatives among the rejected candidates.

MicroRNA expression profiles

To compare expression levels of each miRNA in different sequenced samples, we constructed relative miRNA expression profiles (Fig.
4; Supplemental Table 5), and to compare the relative expression of various miRNAs with each other, we generated a table of overall miRNA abundance (Supplemental Table 5). Most miRNAs had substantially stronger expression in some tissues or stages than in others, in agreement with previous observations (Wienholds et al. 2005). We expect that strong tissue- or stage-specific expression preferences inferred from our limited sample set will be revised as more tissues and stages are surveyed.

General features of mammalian miRNAs

Our analyses of high-throughput sequencing data and subsequent experimental evaluation reshaped the set of known murine miRNAs, setting aside 173 questionable annotations and adding 108 novel miRNA genes to bring the total number of confidently identified murine genes to 506. A majority (60%) of the 506 genes appeared conserved in other mammals (Supplemental Fig. S1; Supplemental Table 6). However, only 15 of the 108 novel miRNA genes were conserved in other mammals, suggesting that the number of nonconserved miRNA genes will soon surpass that of conserved ones as high-throughput sequencing is applied more deeply and more broadly. Five novel miRNAs (mir-3065, mir-3071, mir-3074-1, mir-3074-2, and mir-3111) mapped to the antisense strand of previously annotated miRNAs (mir-338, mir-136, mir-24-1, mir-24-2, and mir-374, respectively), which when added to the previously identified mir-1-2/mir-1-2-as pair brings the total number of sense/antisense miRNA pairs to six. In addition, the mir-486 hairpin has a palindromic sequence, which resulted in the same reads mapping to both the sense (mir-486) and antisense (mir-3107) hairpins. Analysis of the antisense loci of all 498 miRNA genes identified six additional loci that gave rise to some antisense reads resembling miRNAs (antisense loci of mir-21, mir-126, mir-150, mir-337, mir-434, mir-3073).
As more high-throughput data are acquired, these as well as other antisense loci are likely to be annotated as miRNA genes. However, < 0.00002 of our miRNA reads corresponded to miRNAs from antisense loci (excluding the reads mapping ambiguously to mir-486/mir-3107), raising the possibility that none of the murine antisense miRNAs have a function comparable to that of miR-iab-as in flies (Bender 2008; Stark et al. 2008; Tyler et al. 2008). Our substantially revised set of miRNA genes provided the opportunity to speak to the general features of 475 canonical miRNAs in mouse, with the properties of the 295 conserved genes applying also to the conserved genes of humans and other mammals (Table 1). Most canonical miRNA genes (61%) were clustered in the genome, falling within 50 kb of another miRNA gene on the same genomic strand. Even when excluding the four known megaclusters (Calabrese et al. 2007), which are on chromosomes 2, 12 (two clusters), and X (with 69, 35, 16, and 18 genes, respectively), a sizable fraction of the remaining genes (153/337) were in clusters of 2-7 genes. As observed in humans (Baskerville and Bartel 2005), miRNAs from these loci within 50 kb of each other tended to have correlated expression, consistent with their processing from polycistronic pri-miRNA transcripts (Supplemental Fig. S8). In a scenario of one transcript per cluster, the 475 canonical miRNA genes would derive from 245 transcription units. In addition, many miRNA hairpins mapped to introns. Just over a third (38%) of the hairpins fell within introns of annotated mRNAs. Several lines of evidence, including coexpression correlations, chromatin marks, and directed experiments, indicate that miRNAs can be processed from introns (Baskerville and Bartel 2005; Kim and Kim 2007; Marson et al. 2008). In this scenario, as many as 107 (44%) of the 245 transcription units could double as pre-mRNAs.
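The clustering rule used above (a gene is clustered if it lies within 50 kb of another miRNA gene on the same strand) can be illustrated with a short sketch. This is not the analysis code behind the reported numbers. It assumes each gene is reduced to a single coordinate, that "within 50 kb" means a gap of at most 50,000 nt between neighboring genes, and that clusters are built by chaining neighbors. The function name and the toy gene list are hypothetical.

```python
from collections import defaultdict

def cluster_genes(genes, max_gap=50_000):
    """Chain genes on the same chromosome and strand into clusters.

    genes: iterable of (name, chromosome, strand, position) tuples.
    Returns a list of clusters, each a list of gene names. A gene with
    no neighbor within max_gap forms a cluster of size one.
    """
    by_strand = defaultdict(list)
    for name, chrom, strand, pos in genes:
        by_strand[(chrom, strand)].append((pos, name))

    clusters = []
    for members in by_strand.values():
        members.sort()
        last_pos, first_name = members[0]
        current = [first_name]
        for pos, name in members[1:]:
            if pos - last_pos <= max_gap:
                current.append(name)      # close enough: same cluster
            else:
                clusters.append(current)  # gap too large: start a new one
                current = [name]
            last_pos = pos
        clusters.append(current)
    return clusters

# Hypothetical toy coordinates, not real miRNA loci.
toy_genes = [
    ("gene_a", "chr1", "+", 1_000),
    ("gene_b", "chr1", "+", 30_000),   # within 50 kb of gene_a, same strand
    ("gene_c", "chr1", "-", 31_000),   # nearby but on the opposite strand
    ("gene_d", "chr1", "+", 200_000),  # too far from gene_b
    ("gene_e", "chr2", "+", 5_000),
]

clusters = cluster_genes(toy_genes)
transcription_units = len(clusters)
clustered = sum(len(c) for c in clusters if len(c) > 1)
print(transcription_units, clustered / len(toy_genes))
```

Under the one-transcript-per-cluster scenario, the number of clusters (singletons included) plays the role of the number of transcription units.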
Other hairpins were found within transcripts that lacked other annotated functions, falling either within introns or exons, or in transcripts without evidence of splicing. MicroRNA hairpins are generally thought to each give rise to a single dominant mature guide RNA. This was usually the case for the murine miRNAs, although as in other species this result relied on grouping together as a single functional species all the isoforms that share the same 5' terminus. This grouping is justified based on the current understanding of miRNA target recognition, which stipulates that heterogeneity often observed at miRNA 3' termini should have no effect on miRNA target recognition (Bartel 2009). Most mature miRNA reads (97%) were 20-24 nt in length, with 20mer, 21mer, 22mer, 23mer, and 24mer comprising 5%, 19%, 47%, 21% and 4% of the reads, respectively (Supplemental Fig. S9). Although a single dominant mature species appears to be the most frequent outcome of miRNA biogenesis, some miRNA hairpins give rise to two or more species that each could function to target different sets of mRNAs. This expanded targeting potential arises from multiple mechanisms, including utilization of both strands of the miRNA:miRNA* duplex with similar frequency, 5' heterogeneity, sequential Dicer cleavage, and RNA editing. Addition of untemplated nucleotides to the 3' termini of the miRNAs can also occur, and although not thought to change targeting specificity, these changes could indicate posttranscriptional regulation of miRNA stability. Occurrence of each of these phenomena is described below. MicroRNAs from both arms, with occasional tissue-specific differences in the preferred arm Most canonical miRNA genes produced one dominant mature miRNA species, either from the 5' or from the 3' arm of the pre-miRNA hairpin, with an overall tendency to derive from the 5' arm (Table 1), as reported for previously annotated human miRNAs (Hu et al. 2009). 
Some, however, yielded a similar number of reads from both arms, suggesting that the two species enter the silencing complex with similar frequencies. For these genes, mature species from the 5' and 3' arms were annotated using the -5p and -3p suffixes, as is conventional in such cases (Griffiths-Jones 2004). Discrimination favoring one arm over the other was less pronounced for both the nonconserved miRNAs and the less highly expressed miRNAs (Fig. 5A), although for the miRNAs with very few reads this trend was likely enhanced by our requirement for a miRNA* read. Overall, the discrimination was high, with the species from the less dominant arm comprising 4.1% of the reads that map to a miRNA or miRNA*. For the ten most abundant miRNAs (sampling just the most abundant member in cases of repetitive miRNAs), discrimination was even higher, with the less dominant arm comprising only 1.3% of the reads. Nevertheless, the miRNA* species of these more highly expressed miRNAs were sequenced at a median frequency 13-fold greater than that of the median non-conserved miRNA, suggesting that a search for biological function for these miRNA* species might be at least as fruitful as that for the poorly expressed non-conserved miRNAs. If the mature miRNA accumulated preferentially from one arm of the pre-miRNA hairpin, the preferred arm generally remained consistent across the various libraries. For a few miRNAs, however, the preferred arms switched between samples (Fig. 5B), as reported previously using PCR-based miRNA quantification (Ro et al. 2007). For example, miR-142-5p was sequenced more frequently in ovary, testes and brain, and miR-142-3p was sequenced more frequently in embryonic and newborn samples. These results imply a developmental switch in targeting preferences. A similar arm-switching phenomenon has been reported for a sponge miRNA (Grimson et al. 2008) and was observed for 20 other non-repetitive mouse miRNA genes (Fig. 5B).
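The two quantities discussed in this section, the share of reads from the less dominant arm and whether the preferred arm changes between samples, can be sketched as follows. The function names and the read counts are hypothetical and serve only to show the calculation. They are not values from our libraries.

```python
def minor_arm_fraction(reads_5p, reads_3p):
    """Fraction of reads that come from the less dominant arm."""
    total = reads_5p + reads_3p
    if total == 0:
        raise ValueError("no reads for this hairpin")
    return min(reads_5p, reads_3p) / total

def preferred_arm(reads_5p, reads_3p):
    """Return '5p' or '3p', or None when the two arms are tied."""
    if reads_5p == reads_3p:
        return None
    return "5p" if reads_5p > reads_3p else "3p"

def switches_arm(counts_by_sample):
    """True if different samples prefer different arms.

    counts_by_sample maps a sample name to (reads_5p, reads_3p).
    """
    arms = {preferred_arm(r5, r3) for r5, r3 in counts_by_sample.values()}
    arms.discard(None)  # ties carry no preference
    return len(arms) > 1

# Hypothetical counts shaped like the miR-142 pattern described above.
toy_counts = {
    "brain": (900, 100),
    "ovary": (450, 50),
    "newborn": (200, 800),
}

print(minor_arm_fraction(*toy_counts["brain"]))  # 0.1
print(switches_arm(toy_counts))                  # True
```

A hairpin whose preferred arm is the same in every sample would return `False` from `switches_arm`, regardless of how strongly one arm is favored.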
Sequential Dicer cleavage of a mirtron hairpin

In plants, a few pri-miRNA hairpins with long, continuous RNA duplexes are cleaved sequentially by Dicer to generate two adjacent miRNA:miRNA* duplexes (Kurihara and Watanabe 2004; Rajagopalan et al. 2006). Those precursors bear little resemblance to the shorter, imperfectly base-paired hairpins of metazoan miRNA genes. In mice, similar precursors are found in the form of hairpin siRNA (hp-siRNA) precursors, but their expression appears to be limited to germ-line tissues and totipotent ES cells, which lack a robust interferon response to intracellular dsRNA (Babiarz et al. 2008; Tam et al. 2008; Watanabe et al. 2008). However, we detected two miRNA:miRNA* duplexes deriving from the mmu-mir-3102 pre-miRNA hairpin, an apparent mirtron as evidenced by reads mapping to both boundaries of an intron (Fig. 5C; Supplemental Table 3). After splicing and debranching, the excised intron was predicted to fold into a 131-nt pre-miRNA hairpin, substantially longer than the average pre-miRNA length of 61 nt (calculated from the set of confirmed miRNAs). Reads from this locus suggested that Dicer cleaved this pre-miRNA twice, with the first cut generating the outer miRNA:miRNA* duplex and the second cut generating the inner miRNA:miRNA* duplex (Fig. 5C). The inner miRNA (miR-3102.2-3p) was among a set of proposed miRNA candidates (Berezikov et al. 2006b), but the most frequently sequenced species from this hairpin was the outer miRNA (miR-3102.1, Fig. 5C). Of the five genomes examined, the extended mir-3102 hairpin with both the inner and outer miRNAs appeared conserved only in rat, although the orthologous loci in cow, dog, and human also could fold into shorter hairpins, with miR-3102.1 potentially conserved in cow. We suspect that it is more than a coincidence that the single metazoan example of a sequentially diced miRNA is initially processed by the spliceosome rather than by Drosha.
One way to explain this observation is that DGCR8/Drosha interacts directly with the loop of pri-miRNA stem-loops when recognizing its substrates (Zeng et al. 2005) and that the lack of sequentially diced Drosha-dependent miRNA hairpins in animals reflects the limited reach of this complex.

5' Heterogeneity

Most conserved miRNAs had very precise 5' processing, with alternative 5' isoforms comprising only 8% of all miRNA reads (Fig. 6A, B). These results, analogous to those observed in worms and flies (Ruby et al. 2006; Ruby et al. 2007b), are consistent with the idea that selective pressure to avoid off-targeting acts to optimize precision of the cleavage event that produces the 5' terminus of the dominant species so as to prevent a consequential number of molecules with seed sequences in the wrong register. Moreover, 5' termini of conserved miRNAs were more precise than those of miRNA* reads (4% and 12% offset reads, respectively, excluding those that produce comparable numbers of small RNAs from each arm). For cases in which Dicer produced the 5' terminus of the miRNA, the Dicer cut appeared somewhat more precise than the Drosha cut (5% offset reads for miRNAs on the 3' arm, compared to 7% offset reads for miRNA* on the 5' arm), hinting that features of the pre-miRNA structure may supplement the distance from the Drosha cut as determinants of Dicer cleavage specificity (Ruby et al. 2006; Ruby et al. 2007b). A few miRNAs had less uniform 5' termini (Fig. 6A, B). For some miRNAs, 5' heterogeneity has been previously documented (Ruby et al. 2007b; Stark et al. 2007; Azuma-Mukai et al. 2008; Wu et al. 2009), the most prominent example being hsa-miR-124, a conserved neuronal miRNA for which the 5'-shifted isoform was initially annotated as the miRNA and eventually replaced by the more prominent isoform following more extensive sequencing (Lagos-Quintana et al. 2002; Landgraf et al. 2007). Another prominent miRNA with unusually diverse 5' termini was miR-133a.
This conserved miRNA, which is highly expressed in heart and muscle, had a second dominant isoform (miR-133a.2), which was shifted one nucleotide downstream from the annotated miRNA (miR-133a.1) (Fig. 6C; Supplemental Table 3). To test whether this heterogeneity might be explained by differential processing of the two mir-133a paralogous hairpins, as observed for the two Drosophila mir-2 hairpins (Ruby et al. 2007b), we tested the two mir-133a hairpins in our ectopic-expression assay. Although mir-133a-1 was somewhat more prone to produce the miR-133a.2 isoform, both hairpins produced a substantial amount of both isoforms (Fig. 6C). To investigate the functional consequences of miRNA 5' heterogeneity, we examined published array data showing the responses of mRNAs after deleting either mir-223, a miRNA with substantial heterogeneity, or mir-155, a miRNA with little heterogeneity. miR-223 is highly expressed in neutrophils, and analysis of small-RNA sequences from isolated neutrophils (Baek et al. 2008) was consistent with our sequencing results (Supplemental Table 3) in showing 5' heterogeneity, with 81% of the reads mapping to the 5' end of the major isoform miRNA and 12% mapping to the 5' end of a second isoform that was shifted by one nucleotide in the 3' direction (Fig. 6D). As expected, mRNAs with canonical 7-8-mer sites (Bartel 2009) matching the seed of the major isoform were significantly derepressed in the mir-223 deletion mutant [p < 10-1, Kolmogorov-Smirnov (K-S) test, comparing to no-site distribution]. mRNAs with canonical sites matching the minor isoform also showed a significant tendency to be derepressed, albeit to a lesser degree (Fig. 6D; p = 0.0022, 0.013, and 1.7 × 10^-4, for 8mer, 7mer-m8, and 7-8mers combined, respectively).
This result could not be attributed to the overlap between sites matching the major and minor isoforms because all mRNAs with a 6mer seed match to the major isoform (ACUGAC) were excluded, and additional analyses ruled out participation of the "shifted 6mer" match (Friedman et al. 2009) to the major isoform (AACUGA; Supplemental Figure S10A). Analogous analysis of miR-155 yielded strong evidence for function of the major isoform (Rodriguez et al. 2007) but no sign of function for the minor isoform, which comprised very few (1%) of our miR-155 reads (Fig. 6E; Supplemental Table 3). Taken together, our results show that some miRNAs have alternative 5' miRNA isoforms that are expressed at levels sufficient to direct the repression of a distinct set of endogenous targets and thereby broaden the regulatory impact of the miRNA genes. Therefore, we suggest that rather than choosing one isoform over the other for annotation as the authentic miRNA, more of these alternative isoforms should be annotated, with the expectation that for some highly expressed miRNAs, more than one 5' isoform contributes to miRNA function.

RNA editing
RNA editing in which adenosine is deaminated and thereby converted to inosine (I) has been reported for some miRNA precursors (Blow et al. 2006; Landgraf et al. 2007; Kawahara et al. 2008). Because I pairs with C, such edits could change miRNA target recognition. Reasoning that the mammalian adenosine deaminases (ADARs) responsible for A-to-I editing are primarily expressed in the brain, we searched for sequencing reads from brain that did not match the genome and had as their closest match a mature miRNA or miRNA*. After filtering for mismatches occurring more than 2 nt from the 3' end, a step taken to avoid considering instances of untemplated 3'-terminal addition, only 4% of the reads had a single mismatch to the genome (Supplemental Fig. S11A).
Moreover, the fraction of sequences with A-to-G changes (indicative of A-to-I editing) was only 0.61%, a fraction resembling that of other mismatches (Supplemental Fig. S11A). This fraction was also similar to that of the A-to-G changes in our synthetic internal standards used for preparing the sequencing libraries. These results indicate that mature edited miRNAs are very rare and difficult to distinguish above the background level of sequencing errors. The low frequency of editing in mature miRNAs was consistent with the findings that edited processed miRNAs are more than fourfold less common in mouse relative to humans (Landgraf et al. 2007) and are less common than edited miRNA precursors (Kawahara et al. 2008). The latter observation might be due to rapid degradation or impaired processing, which has been shown for miR-142 (Yang et al. 2006) and miR-151 (Kawahara et al. 2007a). Although editing did not appear to be a widespread phenomenon among all mature miRNAs, editing at specific sites might still be important for a few individual miRNAs. To investigate this possibility, mismatch fractions were calculated as the fraction of reads bearing a particular mismatch over all reads covering that genomic position. For each library, a change was considered significant if the fraction exceeded 5% and at least ten reads contained the mismatch. Additional filters designed to remove sequencing errors, alignment artifacts, and instances of untemplated nucleotide addition preferentially retained A-to-G changes while removing nearly all other events (Supplemental Fig. S11B). Sixteen A-to-G events passed the filters and subsequent manual examination, all of which occurred only in the brain library (Table 2). Five of these inferred editing sites were also observed in a low-throughput sequencing effort in human brain samples (Kawahara et al. 2008), indicating that editing of some miRNAs is conserved between mammals.
Consistent with that study, eight of 16 editing sites occurred in a UAG motif. A separate examination of read alignments with up to three mismatches showed that the vast majority of edited reads were edited at one position, suggesting either that editing of multiple sites in the same RNA molecule is rare, or that multiply-edited RNAs are more rapidly degraded. A-to-I editing of a seed nucleotide would dramatically affect targeting. In addition to editing in the miR-376 cluster described previously (Kawahara et al. 2007b; Kawahara et al. 2008), we found another eight miRNAs that are edited within the seed of either the miRNA or the star strand. A-to-I editing could also affect miRNA loading and thereby indirectly affect targeting. Indeed, the editing of miR-540 might help explain why the 5' arm is more abundant in the brain than in other tissues, although editing is too infrequent to fully explain the switch in strand bias. Altering Drosha and Dicer processing could also indirectly affect targeting. Analysis of 5' ends showed that seven of 16 instances of editing were associated with a statistically significant (p < 0.05) shift in the 5' nucleotide, presumably due to changes in the Drosha and Dicer cleavage site (Supplemental Fig. S11D).

Untemplated nucleotide addition
Much more prevalent than editing of internal nucleotides was addition of untemplated nucleotides to miRNA 3' termini. As previously reported for miRNAs in mammals (Landgraf et al. 2007) and also observed for those of worms and flies (Ruby et al. 2006; Ruby et al. 2007b), nucleotides most frequently added to murine miRNAs were U and A (Fig. 7A). Addition of C or G was no higher than background, as estimated by monitoring apparent addition to tRNA fragments (Fig. 7A). Possible sources of the background rate could be sequencing error, transcription error, or a low level of biological nucleotide addition. Some miRNAs were much more frequently extended than others (Supplemental Table 7).
One very frequently extended miRNA was miR-143, for which the extended reads outnumbered the non-extended ones (196,565 compared to 114,980 reads, respectively). For extension by U, RNAs from the pre-miRNA 3' arm were three times more frequently extended than were those from the 5' arm (Fig. 7A; Fig. 7B, p = 2.3 × 10^-4, K-S test). This preference, not observed for A extension (Fig. 7A, C), suggests that much of the U extension occurs to the pre-miRNA, prior to Dicer cleavage, a state in which the 3' arm but not the 5' arm would be available for extension (Fig. 7D). TUT4-catalyzed poly(U) addition to the let-7 pre-miRNA, which is specified by Lin28, plays an important role in posttranscriptional repression of let-7 expression (Heo et al. 2008; Hagan et al. 2009; Heo et al. 2009). Our analyses indicating untemplated U extension of many other pre-miRNAs hint that this type of regulation might not be limited to let-7 and that analogous pathways, presumably using mediators other than Lin28, act to regulate the expression of other murine miRNAs.

Discussion

The status of miRNA gene discovery in mammals
Our current study sets aside nearly a third (173/564) of the miRBase v. 14.0 gene annotations for lack of convincing evidence that these produce authentic miRNAs. It also adds another 108 novel miRNA loci, raising the question of how many more authentic loci remain undiscovered. This question is difficult to answer. Ever since the recognition that the poorly conserved miRNAs are also the ones expressed at lower levels in mammals and thus are the most difficult to detect by both computational and experimental methods, we have known that it is impossible to provide a meaningful estimate of the number of mammalian miRNA genes remaining to be discovered (Bartel 2004). The broadly conserved miRNAs are another matter.
Only three of the 88 novel canonical miRNAs had recognizable orthologs sequenced in chicken, lizard, frog, or fish, and these three were antisense to previously annotated broadly conserved miRNA genes. Therefore, apart from miRNAs expressed at very low levels from the antisense strand of known genes, we suspect that the list of broadly conserved miRNA gene families is nearing completion. The current set of murine miRNA genes includes 192 genes that fall into 89 broadly conserved miRNA gene families (Supplemental Table 6). Another 107 miRNA gene families appeared conserved in other mammals (Supplemental Table 6). These were represented by 120 murine genes, including 14 novel genes that were conserved in other mammals. Of these novel genes, 11 were founding members of novel conserved gene families. Some of these were identified with only 11 reads, indicating that additional pan-mammalian gene families remain to be found, although we have no evidence supporting the idea that the number of conserved gene families will rise to the very high levels suggested by some earlier computational studies (Xie et al. 2005; Berezikov et al. 2006b). For now, we can say that mammals have at least 196 conserved miRNA gene families represented in mouse by at least 312 pre-miRNA hairpins (303 canonical and nine noncanonical hairpins) produced from at least 194 unique transcription units. Because a single miRNA hairpin can produce multiple functional isoforms, generated by either 5' processing heterogeneity or utilization of both arms of the miRNA duplex, a single conserved hairpin can produce more than one conserved miRNA isoform. Because the different isoforms have different seed sequences, they fall into different families of mature miRNAs. Thus, the number of conserved families of miRNAs (i.e., mature guide RNAs) will exceed the number of conserved families of genes (i.e., hairpins). 
Perhaps the best known example of a hairpin with two broadly conserved isoforms is mir-9, for which conserved miRNAs from both arms of the hairpin are readily detected by using in situ hybridization in both zebrafish and marine annelids (Wienholds et al. 2005; Christodoulou et al. 2010). Numerous conserved genes produce more than one miRNA isoform (Fig. 5A, 6A), but for most of these we do not yet know whether production of the alternative isoform is conserved in other species. High-throughput sequencing from other species will help identify many additional conserved isoforms. We anticipate that the discovery of multiple conserved isoforms will contribute much more to the future growth in the list of broadly conserved miRNA families than will the discovery of new conserved genes. As expected, the conserved miRNAs tended to be expressed at much higher levels than were the nonconserved ones, with the median read frequency of conserved miRNAs 44-fold greater than that of the nonconserved miRNAs (Fig. 4A, 5B). Therefore, even if many nonconserved miRNA genes remained to be found, these would add little to the number of annotated miRNA molecules in a given cell or tissue, and presumably even less to the impact of miRNAs on gene expression (Bartel 2009). Indeed, even more pressing than the question of how many poorly conserved miRNAs remain undetected is the question of whether any of the known poorly conserved miRNAs have any consequential function in the animal. Most of these poorly conserved miRNAs could have derived from transcripts that fortuitously acquired hairpin regions with features needed for some Drosha/Dicer processing. In this scenario, most of these newly emergent miRNAs will be lost during the course of evolution before ever acquiring the expression levels needed to have a targeting function sufficient for their selective retention in the genome.
Consistent with the hypothesis that most of these miRNAs play inconsequential regulatory roles, these miRNAs generally accumulated to much lower levels in our ectopic-expression assay (Fig. 3B, median read frequencies of 58 and 844 for nonconserved and conserved miRNAs, respectively), and they displayed weaker specificity for one arm of the hairpin (Fig. 5A), as would be expected if there was no advantage for the cell to efficiently utilize their respective hairpins. Nonetheless, some were efficiently processed, and at least a few poorly conserved miRNAs probably have acquired consequential species-specific functions. Although none have known functions, such hairpins are worthy of annotation as miRNA loci (just as protein-coding genes can be annotated before the protein is known to be functional), and as a class these newly emergent miRNAs could provide an important evolutionary substrate for the emergence of new regulatory activities. The major challenge for miRNA gene discovery stems from the difficulty in proving that a nonconserved, poorly expressed candidate is an authentic miRNA, combined with the even greater difficulty in proving that a questionable candidate is not an authentic miRNA. This challenge has become all the more acute as miRNA discovery has reached the point at which nearly all of the novel candidates are both nonconserved and poorly expressed. Our approach of testing pools of candidates in an ectopic-expression assay provides useful data for evaluating miRNA authenticity. However, our approach cannot provide conclusive proof for or against the authenticity of a proposed candidate, leaving open the possibility that some of the nonconserved, poorly expressed candidates that we classify as "confidently identified miRNAs" are false positives. When considering the limitations of the current tools for miRNA gene identification, this possibility cannot be avoided.
Therefore, if any nonconserved, poorly expressed miRNAs are annotated as miRNAs, the resulting list of miRNAs will have to be somewhat fuzzy, with an expectation that some of the annotated genes will not be authentic miRNAs. This expectation should not be viewed as advocating the indiscriminate annotation of all candidates as miRNAs. Our proposal is that miRNA gene-discovery efforts should annotate as miRNAs only those novel candidates that are both found in high-throughput sequencing libraries and pass a set of criteria that is sufficiently stringent such that a majority of the novel canonical miRNAs are cleanly processed in a Drosha-dependent manner when using the ectopic-expression assay. Although implementing this proposal would not prevent all false positives from entering the databases, it would preserve a higher quality set of miRNAs while eliminating few authentic annotations. Those wanting to take additional measures to avoid false positives could focus only on the subset of miRNAs that both meet these criteria and are conserved in other species.

Unknown features required for Drosha/Dicer processing
Before learning the results of our experiments, we wondered whether any ectopically over-expressed hairpin of suitable length would be processed as if it were a miRNA, a result that would have rendered our assay too permissive to be of value. In this scenario, most of the specificity that distinguished authentic miRNA genes from other regions of the genome with potential to produce transcripts that fold into seemingly miRNA-like hairpins would have been a function of whether or not the regions were transcribed. This scenario was not realized, however, and our assay turned out to be informative, which illustrates how much of Drosha/Dicer substrate recognition still remains unknown.
Many of the previously proposed miRNA hairpins that had no reads in our mouse samples were indistinguishable from authentic miRNA hairpins with regard to the known determinants for Drosha/Dicer recognition, yet none of these unconfirmed hairpins produced miRNA and miRNA* molecules in our very sensitive assay (Fig. 2C, D). These results, which show that major processing specificity determinants remain undiscovered, point to the importance of finding these determinants, an effort that, if successful, will mark the next substantive advance in accurately predicting and annotating metazoan miRNAs.

Methods

Library preparation
Total RNA samples from mouse ovary, testes, and brain were purchased from Ambion, and total RNA from mouse e7.5, e9.5, e12.5 and newborn were obtained from the Chess lab. The small RNA cDNA libraries were made as described (Grimson et al. 2008), except that the 3' adaptor used for ligation was 5' adenylated pTCGTATGCCGTCTTCTGCTTGidT. For a detailed protocol, see http://web.wi.mit.edu/bartel/pub/protocols.html.

MicroRNA discovery
The reads with inserts of 16-27 nt were processed as described (Babiarz et al. 2008). The miRNA candidates were identified using reads matching genomic regions that were not very highly repetitive (reads with <500 genomic matches). Reads from all datasets were combined and grouped by their 5' terminal loci, requiring that each candidate 5' locus pass five criteria listed in the text. 1) To pass the expression criterion, a candidate required >10 normalized reads. 2) To address the hairpin requirement, the secondary structure of the candidate was evaluated by selecting for each 5' terminal locus the most abundant sequence and extending its 5' end by 2 nt to define the range of one strand of the potential miRNA/miRNA* duplex. Three genomic windows were extracted with the 5' end extended an additional 10 nt and the 3' end extended either 50, 100, or 150 nt.
Three more windows were extracted extending the 3' end by 10 nt and the 5' end another 50, 100, or 150 nt. The secondary structure of each of the six windows was predicted using RNAfold (Hofacker et al. 1994), and the number of hairpin base pairs (denoted using bracket notation) involving the 5'-extended miRNA candidate was calculated as the absolute value of [(# 5'-facing brackets) minus (# 3'-facing brackets)]. A candidate with a minimum of 16 base pairs using at least one of the six genomic windows satisfied the hairpin criterion. 3) The candidates with non-miRNA biogenesis were found by mapping to annotated non-coding RNA loci (rRNA, tRNA, snRNA, srpRNA). 4) The candidates likely produced by degradation were defined as those failing the 5' homogeneity requirement. A candidate satisfied the 5' homogeneity requirement if at least half the reads within 30 nt of the candidate locus were present within 2 nt of the candidate locus and if the candidate locus comprised at least half the reads within 2 nt of the candidate locus or if there was only one other locus within 30 nt of the candidate locus that had more than half of the reads mapping to the candidate locus. 5) Manual inspection of reads mapped to predicted secondary structures identified candidates accompanied by potential miRNA* reads. For ten previously annotated miRNAs and seven novel miRNAs, a suitable miRNA* read was found only after considering alternative hairpin folds predicted to be suboptimal using mfold (Mathews et al. 1999; Zuker 2003). For the analysis of mir-290, mir-291a, mir-291b, mir-292, mir-293, mir-294, and mir-295, which are not present in the mm8 genome assembly, we mapped all reads to the corresponding region of the mm9 genome assembly (chr7(+): 3218627-3220842).
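The bracket-counting rule of the hairpin criterion (criterion 2) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the code used in the study: the function names, the 0-based half-open coordinates, and the toy structure are hypothetical, and the dot-bracket string is taken as given (for example, as produced by RNAfold) rather than computed here.

```python
from typing import Iterable, Tuple


def hairpin_base_pairs(structure: str, start: int, end: int) -> int:
    """Net number of hairpin base pairs involving structure[start:end].

    Brackets that pair within the candidate itself cancel out, so only
    pairs formed with the opposite strand of the hairpin are counted.
    """
    segment = structure[start:end]
    return abs(segment.count("(") - segment.count(")"))


def passes_hairpin_criterion(
    windows: Iterable[Tuple[str, int, int]], min_pairs: int = 16
) -> bool:
    """True if any folded window gives the candidate at least min_pairs pairs.

    Each window is (dot-bracket structure, candidate start, candidate end),
    with coordinates relative to that window (an assumed convention).
    """
    return any(
        hairpin_base_pairs(structure, start, end) >= min_pairs
        for structure, start, end in windows
    )


# Toy example: a perfect 20-bp stem with a 6-nt loop.
toy_structure = "(" * 20 + "." * 6 + ")" * 20
print(hairpin_base_pairs(toy_structure, 0, 20))          # candidate on the 5' arm
print(passes_hairpin_criterion([(toy_structure, 0, 20)]))
```

Taking the absolute value makes the count independent of which arm the candidate lies on, and a candidate that spans the loop symmetrically scores zero because its opening and closing brackets cancel.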
For conservation analysis, a candidate was considered broadly conserved if the hairpin structure and the seed sequence were conserved to chicken, fish, frog, or lizard (galGal3, danRer5, xenTro2, and anoCar1, respectively) in the UCSC whole-genome alignments (Kuhn et al. 2009). To identify a candidate conserved in mammals, we looked at 12 additional genomes (bosTau3, canFam2, cavPor2, equCab1, hg18, loxAfr1, monDom4, ornAna1, panTro2, ponAbe2, rheMac2, and rn4) and calculated the branch-length score from a phylogenetic tree trained on mouse 3' UTR data (Friedman et al. 2009), using the cutoff score of 0.7. A gene was considered to be in a conserved miRNA gene family if the hairpin produced a miRNA with a seed matching that of a conserved miRNA (Supplemental Table 6).

Ectopic over-expression assays
To generate expression constructs, pre-miRNA hairpins and the surrounding regions were amplified from human genomic DNA (NCI-BL2126) or from mouse BL6 genomic DNA using Pfu Ultra II polymerase (Stratagene) and primers with Gateway (Invitrogen)-compatible ends designed to anneal ~100 nt upstream and downstream from the miRNA hairpins. PCR products were inserted into Gateway vector pDONR221 and subsequently into pcDNA3.2/V5-DEST, and the resulting plasmids were transformed into DH5α cells. Positive clones were selected by colony PCR and sequenced. Clones that did not have a mutation within pre-miRNA hairpins were selected. Plasmid DNA from the confirmed expression clones was purified for transfection using the Plasmid Mini Kit (Qiagen). For each standard assay, plasmids for up to ten hairpin expression constructs were mixed in equal amounts to create seven or eight pools of ~1.4 µg DNA each, with each pool including 1-3 positive-control hairpins. HEK293T cells were cultured in DMEM supplemented with 10% FBS and plated in 12-well plates ~24 hours prior to transfection to reach ~80-90% confluency.
Each well of cells was transfected with one pool of DNA using Lipofectamine 2000 (Invitrogen). For the standard assays, 145-200 ng of pMaxGFP (Amaxa) was cotransfected with each pool to enable transfection efficiency to be confirmed by GFP expression. Control wells (no hairpin plasmid) were transfected only with 145 ng pMaxGFP. For the Drosha/Dicer-dependency assays, 7-8 hairpin constructs were combined to create six pools of ~400 ng each. Each pool was mixed with 1.2 µg of the pCK-Drosha-FLAG(TN) (TNdrosha), pCK-FLAG-Dicer(TN) (TNdicer), or pCKdsRed.T4 (control vector, constructed by replacing the Drosha coding sequence of TNdrosha with dsRed coding sequence) and used to transfect one well of HEK293T cells as above. Control wells were transfected with 1.2 µg of either TNdrosha, TNdicer, or control vector. For the dependency assays, each transfection was performed in duplicate wells. Cells from all assays were harvested 39-48 hours after transfection. Cells from each treatment were combined, total RNA was extracted using TriReagent (Ambion), and small-RNA libraries were prepared for Illumina sequencing. The reads were processed as above, and RNA species were matched to the transfected hairpins. In the standard assay, reads were normalized by the median of the 30 most frequently sequenced endogenous miRNAs. For assays testing Drosha/Dicer dependency, reads were normalized based on the number of reads corresponding to an 18-nt internal standard that had been spiked into equivalent amounts of total RNA prior to beginning library preparation. Reads matching the transfected hairpins were grouped by their 5' termini (5' terminal locus). The locus with the largest number of reads was considered the 5' terminal locus of the mature miRNA produced by the hairpin, and similarly, the most dominant 5' locus on the opposite arm was considered the miRNA*. The normalized miRNA and miRNA* read numbers were summed to calculate the expression level.
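The grouping of reads by 5' terminus and the resulting expression level can be sketched as follows. This is a minimal Python sketch, not the pipeline used in the study; it assumes that reads have already been normalized and assigned to a hairpin arm, and the tuple layout `(arm, five_prime_position, normalized_count)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Locus = Tuple[str, int]  # (arm, 5' terminal position); assumed representation


def summarize_hairpin(reads: Iterable[Tuple[str, int, float]]) -> Optional[Dict]:
    """Pick the dominant 5' locus (miRNA) and the dominant locus on the
    opposite arm (miRNA*), then sum their normalized read numbers."""
    counts: Dict[Locus, float] = defaultdict(float)
    for arm, position, normalized_count in reads:
        counts[(arm, position)] += normalized_count
    if not counts:
        return None  # no reads matched this hairpin

    # The 5' terminal locus with the most reads defines the mature miRNA.
    mirna = max(counts, key=counts.get)

    # The most abundant 5' locus on the other arm defines the miRNA*.
    opposite = [locus for locus in counts if locus[0] != mirna[0]]
    star = max(opposite, key=counts.get) if opposite else None

    expression = counts[mirna] + (counts[star] if star is not None else 0.0)
    return {"mirna": mirna, "star": star, "expression": expression}


# Toy reads for one transfected hairpin (illustrative values only).
toy_reads = [
    ("5p", 10, 100.0),
    ("5p", 11, 8.0),
    ("3p", 48, 20.0),
    ("3p", 49, 2.0),
    ("5p", 10, 5.0),
]
print(summarize_hairpin(toy_reads))
```

Keying the tally on the 5' terminus rather than on the full read sequence means that reads differing only at their 3' ends are pooled into one locus.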
If an overexpressed hairpin generated mature miRNA with the dominant 5' terminal locus corresponding to the expected locus and at least one read corresponding to the miRNA* with a ~2-nt 3' overhang, it was considered expressed. A hairpin was classified as over-expressed if there were at least threefold more reads in the hairpin transfection than in the control transfection, after adding pseudocounts of five to both. A hairpin was classified as Drosha- or Dicer-dependent if the knockdown was at least threefold.

Identification of arm-switching miRNAs
To determine the read numbers from the 5' and the 3' arm, reads from each sample were grouped based on their 5' termini, and the read numbers were tallied for those corresponding to the miRNA or miRNA* 5' terminus. Only samples with >5 reads on either arm were considered. The fold enrichment was calculated as the ratio of 5' and 3' arm reads after adding pseudocounts of one.

RNA editing analysis
Sequencing libraries from individual tissues were combined and mapped to the genome using the Bowtie alignment tool (Langmead et al. 2009). The alignments were filtered for sequences that uniquely align to the genome, contain at most one mismatch to the genome, and have 5' ends that map to within one nucleotide of an annotated miRNA or miRNA* 5' end. The 12 possible mismatch types were then quantified at each position covered by the filtered reads. For example, to screen for A-to-G mismatches indicative of A-to-I editing sites, the editing fraction was calculated as the number of reads containing an A-to-G mismatch at a particular position, divided by the number of filtered reads covering that position. Sites were considered editing candidates if the editing fraction was greater than 5%, had at least ten A-to-G mismatch reads, and did not occur in the last two nucleotides of the corresponding miRNA or miRNA*. Candidate editing sites were then manually examined and discarded if an alternative explanation was more parsimonious.
For example, the only non-brain editing candidate mapped to let-7c-1 but was most likely due to a handful of let-7b reads containing untemplated nucleotide additions that fortuitously matched the let-7c-1 locus. Consistent with this explanation, the putatively edited reads were unusually long and at unusually low abundance. Candidate editing sites were also checked in the Perlegen SNP database (Frazer et al. 2007) and dbSNP; no editing candidates corresponded to known SNPs.

Untemplated nucleotide analysis
To examine untemplated nucleotide addition, non-genome-mapping reads were filtered for those that match miRNA or miRNA* sequences but also include a non-genomic poly(N) at the 3' end. The untemplated nt addition rate was calculated as the ratio of reads with the untemplated nt to the sum of the reads with and without the untemplated nt. After excluding miRNAs that map to multiple loci and any miRNAs or miRNA*s with a genomic T at the position immediately 3' of the annotated sequence, there were 343 miRNA/miRNA* species with untemplated U on the 5' arm and 318 on the 3' arm. Similarly, there were 287 species with untemplated A on the 5' arm and 324 on the 3' arm. The background tRNA untemplated U addition rate was calculated similarly. A two-sided Kolmogorov-Smirnov test was used to assess significant differences in distributions.

Figure legends
Figure 1. Mouse miRNAs and candidates identified by high-throughput sequencing. (A) Overlap between previously annotated miRNA hairpins (miRBase v. 14.0; green), miRNA candidates identified in the current study, and the subset of these candidates that met our criteria for classification as confidently identified canonical miRNAs (red). (B) Overlap between previously annotated mirtrons and shRNAs and the mirtrons and shRNAs supported by our study, colored as in A.
Figure 2. Experimental evaluation of annotated miRNAs and previously proposed candidates.
(A) Schematic of the expression vector transfected into HEK293T cells. (B) Examples of the standard ectopic-expression assay, transfecting plasmids indicated in the key. Reads from the control transfection (no hairpin plasmid) were from endogenous expression in HEK293T cells. (C) Assay results for annotated human miRNAs and published candidates. Bars are colored as in B; asterisks indicate detectable overexpression (>1 read from both the anticipated miRNA and miRNA*, with miRNA and miRNA* combined expressed more than threefold over endogenous levels). (D) Assay results for unconfirmed annotated mouse miRNAs and published candidates. Mouse controls were selected from miRNAs that were sequenced from our mouse samples. Bars are colored as in B; detectable overexpression is indicated (asterisks). Shown are the results compiled from two experiments (Supplemental Figs. S3, S4).
Figure 3. Experimental evaluation of novel miRNAs and candidates. (A) Examples of assays evaluating Drosha- and Dicer-dependence, transfecting plasmids indicated in the key. (B) Assay results for control miRNAs, novel miRNAs, and miRNA candidates. Bars are colored as in A; detectable overexpression (black asterisks), overexpression attempted but not detected (black minus), detectable Drosha-dependence (orange asterisks), and Drosha-dependence assayed but not detected (orange minus) are all indicated. Shown are the results compiled from three experiments (Supplemental Figs. S5-7).
Figure 4. MicroRNA relative expression profiles. Profiles of mature miRNAs were constructed as described (Ruby et al. 2007b). The relative contribution of each miRNA from each sample and the sum of the normalized reads of all samples are provided (Supplemental Table 5).
Figure 5. Reads from both arms of a hairpin, and sequential reads from the same arm. (A) Fraction and abundance of miRNA reads from each miRNA hairpin.
To calculate the fraction, the miRNA reads were divided by the total number of miRNA and miRNA* reads, considering on each arm only the major 5' terminus. The dashed lines indicate the median fraction of miRNA reads and the median number of miRNA reads for conserved (red) and nonconserved (blue) miRNAs. (B) Switching of the dominant arm in different samples. For each sample, the fold enrichment of miRNA reads produced from the 5' arm over those produced from the 3' arm and vice versa was calculated. Shown are results for non-repetitive miRNAs that switch dominant arms, with at least a fivefold differential between two samples. The samples are color-coded (key), and an asterisk indicates samples with statistically significant enrichment of miRNAs produced from one arm over the other (p < 0.05, Chi-squared test). (C) Sequential Dicer cleavage. Predicted secondary structure of mmu-mir-3102 pre-miRNA (Hofacker et al. 1994).
Figure 6. MicroRNAs with 5' heterogeneity. (A) The distribution of conserved (red) and nonconserved (blue) miRNAs with reads ±5 nt offset at their 5' terminus. (B) The fraction of offset reads and abundance of reads for each miRNA hairpin, colored as in A. The dashed lines indicate the median level of reads for conserved (red) and nonconserved (blue) miRNAs. (C) 5' Heterogeneity of miR-133a. Data from mouse heart (Rao et al. 2009) and newborn are mapped to the mmu-mir-133a-1 hairpin (top), and data from the ectopic-expression assay are mapped to the indicated transfected hairpin. The lines indicate miR-133a.1 (dark blue) and miR-133a.2 (light blue), and red nucleotides indicate those that differ between mmu-mir-133a-1 and mmu-mir-133a-2. (D) Effect of losing miR-223 on messages with 3'UTR sites for miR-223 major and minor isoforms. Small-RNA sequencing data from mouse neutrophils (Baek et al. 2008) were mapped to the mir-223 hairpin (top) as in C.
For each set of messages with the indicated 3'UTR site for miR-223 (major isoform sites, bottom left; minor isoform sites, bottom right), the fraction that changed at least to the degree indicated following loss of miR-223 is plotted, using data published for neutrophils differentiated in vivo (Baek et al. 2008). (E) Effect of losing miR-155 on messages with 3'UTR sites for miR-155 major and minor isoforms, plotted as in D using published data from T cells (Rodriguez et al. 2007). Sequencing data from our study are mapped to the mir-155 hairpin (top) as in C. The mRNAs with 8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because these sites overlapped with 7mer-m8 sites for the major isoform.
Figure 7. Untemplated nucleotide addition. (A) Untemplated nucleotide addition rate for miRNA and miRNA* reads from the indicated arm. Rates for each miRNA are provided (Supplemental Table 6). As a control, tRNA degradation fragments were analyzed similarly. (B) Distribution of rates for untemplated U addition to RNAs from the 5' arm (blue) and from the 3' arm (red). (C) Distribution of rates for untemplated A addition to RNAs from the 5' arm (blue) and from the 3' arm (red). (D) Schematic of the biogenesis stage in which U could be added to the RNA of only one arm (pre-miRNA, left), and the stage in which U could be added to the RNA of either arm (mature miRNA and miRNA*, right).
Acknowledgements
We thank N. Lau and A. Chess for embryonic and newborn total RNA, R. Friedman for calculating branch-length scores for the analysis of conservation, A. Marson and N. Hannett for technical advice, and V. N. Kim for TNdrosha and TNdicer plasmids. Supported by a grant from the NIH (GM067031) to D. B.
Accession numbers
All small RNA reads are available at the GEO database with accession number GSE20384.
References
Azuma-Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi, M.C. 2008.
Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci USA 105(23): 7964-7969.
Babiarz, J.E., Ruby, J.G., Wang, Y.M., Bartel, D.P., and Blelloch, R. 2008. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. 2008. The impact of microRNAs on protein output. Nature 455(7209): 64-71.
Bartel, D.P. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell 136(2): 215-233.
Baskerville, S. and Bartel, D.P. 2005. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11(3): 241-247.
Bender, W. 2008. MicroRNAs in the Drosophila bithorax complex. Genes & Development 22(1): 14-19.
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E., Sharon, E., Spector, Y., and Bentwich, Z. 2005. Identification of hundreds of conserved and nonconserved human microRNAs. Nature Genetics 37(7): 766-770.
Berezikov, E., Chung, W.J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian mirtron genes. Molecular Cell 28(2): 328-336.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H.A., and Cuppen, E. 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120(1): 21-24.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E., and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee brain. Nature Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J., Mano, H., Plasterk, R., and Cuppen, E. 2006b.
Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Research 16(10): 1289-1298.
Blow, M.J., Grocock, R.J., van Dongen, S., Enright, A.J., Dicks, E., Futreal, P.A., Wooster, R., and Stratton, M.R. 2006. RNA editing of human microRNAs. Genome Biol 7(4): R27.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proceedings of the National Academy of Sciences of the United States of America 104(46): 18097-18102.
Chen, C.-Z., Li, L., Lodish, H.F., and Bartel, D.P. 2004. MicroRNAs Modulate Hematopoietic Lineage Differentiation. Science 303(5654): 83-86.
Christodoulou, F., Raible, F., Tomer, R., Simakov, O., Trachana, K., Klaus, S., Snyman, H., Hannon, G.J., Bork, P., and Arendt, D. 2010. Ancient animal microRNAs and the evolution of tissue identity. Nature 463(7284): 1084-1088.
Cummins, J.M., He, Y.P., Leary, R.J., Pagliarini, R., Diaz, L.A., Sjoblom, T., Barad, O., Bentwich, Z., Szafranska, A.E., Labourier, E., Raymond, C.K., Roberts, B.S., Juhl, H., Kinzler, K.W., Vogelstein, B., and Velculescu, V.E. 2006. The colorectal microRNAome. Proceedings of the National Academy of Sciences of the United States of America 103(10): 3687-3692.
Frazer, K.A., Eskin, E., Kang, H.M., Bogue, M.A., Hinds, D.A., Beilharz, E.J., Gupta, R.V., Montgomery, J., Morenzoni, M.M., Nilsen, G.B., Pethiyagoda, C.L., Stuve, L.L., Johnson, F.M., Daly, M.J., Wade, C.M., and Cox, D.R. 2007. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature 448(7157): 1050-1053.
Friedman, R.C., Farh, K.K.H., Burge, C.B., and Bartel, D.P. 2009. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1): 92-105.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Research 32: D109-D111.
Grimm, D., Streetz, K.L., Jopling, C.L., Storm, T.A., Pandey, K., Davis, C.R., Marion, P., Salazar, F., and Kay, M.A. 2006.
Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441(7092): 537-541.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-U1115.
Hagan, J.P., Piskounova, E., and Gregory, R.I. 2009. Lin28 recruits the TUTase Zcchc11 to inhibit let-7 maturation in mouse embryonic stem cells. Nat Struct Mol Biol 16(10): 1021-1025.
Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.-K., Yeom, K.-H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. 2009. Posttranscriptional Crossregulation between Drosha and DGCR8. Cell 136(1): 75-84.
Han, J.J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y.J., Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Heo, I., Joo, C., Cho, J., Ha, M., Han, J.J., and Kim, V.N. 2008. Lin28 Mediates the Terminal Uridylation of let-7 Precursor MicroRNA. Molecular Cell 32(2): 276-284.
Heo, I., Joo, C., Kim, Y.-K., Ha, M., Yoon, M.-J., Cho, J., Yeom, K.-H., Han, J., and Kim, V.N. 2009. TUT4 in Concert with Lin28 Suppresses MicroRNA Biogenesis through Pre-MicroRNA Uridylation. Cell 138(4): 696-708.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, L.S., Tacker, M., and Schuster, P. 1994. Fast folding and comparison of RNA secondary structures. Monatshefte Fur Chemie 125(2): 167-188.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific MicroRNAs. Developmental Cell 5(2): 351-358.
Hu, H., Yan, Z., Xu, Y., Hu, H., Menzel, C., Zhou, Y., Chen, W., and Khaitovich, P. 2009. Sequence features associated with microRNA strand selection in humans and flies. BMC Genomics 10(1): 413.
Kawahara, Y., Megraw, M., Kreider, E., Iizasa, H., Valente, L., Hatzigeorgiou, A.G., and Nishikura, K. 2008.
Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36(16): 5270-5280.
Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R., and Nishikura, K. 2007a. RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8(8): 763-769.
Kawahara, Y., Zinshteyn, B., Sethupathy, P., Iizasa, H., Hatzigeorgiou, A.G., and Nishikura, K. 2007b. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315(5815): 1137-1140.
Kim, Y.-K. and Kim, V.N. 2007. Processing of intronic microRNAs. EMBO J 26(3): 775-783.
Kuchenbauer, F., Morin, R.D., Argiropoulos, B., Petriv, O.I., Griffith, M., Heuser, M., Yung, E., Piper, J., Delaney, A., Prabhu, A.L., Zhao, Y.J., McDonald, H., Zeng, T., Hirst, M., Hansen, C.L., Marra, M.A., and Humphries, R.K. 2008. In-depth characterization of the microRNA transcriptome in a leukemia progression model. Genome Res 18(11): 1787-1797.
Kuhn, R.M., Karolchik, D., Zweig, A.S., Wang, T., Smith, K.E., Rosenbloom, K.R., Rhead, B., Raney, B.J., Pohl, A., Pheasant, M., Meyer, L., Hsu, F., Hinrichs, A.S., Harte, R.A., Giardine, B., Fujita, P., Diekhans, M., Dreszer, T., Clawson, H., Barber, G.P., Haussler, D., and Kent, W.J. 2009. The UCSC Genome Browser Database: update 2009. Nucl Acids Res 37(suppl 1): D755-761.
Kurihara, Y. and Watanabe, Y. 2004. Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proceedings of the National Academy of Sciences of the United States of America 101(34): 12753-12758.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New microRNAs from mouse and human. RNA 9(2): 175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. 2002.
Identification of tissue-specific microRNAs from mouse. Current Biology 12(9): 735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V., Chiaretti, S., Foa, R., Schliwka, J., Fuchs, U., Novosel, A., Muller, R.U., Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M.C., Weir, D.B., Choksi, R., De Vita, G., Frezzetti, D., Trompeter, H.I., Hornung, V., Teng, G., Hartmann, G., Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle, J.W., Ju, J.Y., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein, M.J., Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl, T. 2007. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129(7): 1401-1414.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3): R25.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294(5543): 858-862.
Lee, R.C. and Ambros, V. 2001. An Extensive Class of Small RNAs in Caenorhabditis elegans. Science 294(5543): 862-864.
Lee, Y., Ahn, C., Han, J.J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., and Kim, V.N. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425(6956): 415-419.
Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. 2003. Vertebrate MicroRNA genes. Science 299(5612): 1540-1540.
Lu, C., Tej, S.S., Luo, S., Haudenschild, C.D., Meyers, B.C., and Green, P.J. 2005. Elucidation of the Small RNA Component of the Transcriptome. Science 309(5740): 1567-1569.
Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. 2004. Nuclear export of microRNA precursors. Science 303(5654): 95-98.
Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., Johnstone, S., Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., Calabrese, J.M., Dennis, L.M., Volkert, T.L., Gupta, S., Love, J., Hannett, N., Sharp, P.A., Bartel, D.P., Jaenisch, R., and Young, R.A. 2008. Connecting microRNA Genes to the Core Transcriptional Regulatory Circuitry of Embryonic Stem Cells. Cell 134(3): 521-533.
Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology 288(5): 911-940.
Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada, K., Mirochnitchenko, O., Inouye, M., and Kato, I. 2006. The expression profile of microRNAs in mouse embryos. Nucleic Acids Research 34(6): 1765-1771.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1): 89-100.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucl Acids Res 33(suppl_1): D501-504.
Rajagopalan, R., Vaucheret, H., Trejo, J., and Bartel, D.P. 2006. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes & Development 20: 3407-3425.
Rao, P.K., Toyama, Y., Chiang, H.R., Gupta, S., Bauer, M., Medvid, R., Reinhardt, F., Liao, R., Krieger, M., Jaenisch, R., Lodish, H.F., and Blelloch, R. 2009. Loss of Cardiac microRNA-Mediated Regulation Leads to Dilated Cardiomyopathy and Heart Failure. Circ Res 105(6): 585-594.
Ro, S., Park, C., Young, D., Sanders, K.M., and Yan, W. 2007. Tissue-dependent paired expression of miRNAs. Nucleic Acids Res 35(17): 5944-5953.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., Vetrie, D., Okkenhaug, K., Enright, A.J., Dougan, G., Turner, M., and Bradley, A. 2007. Requirement of bic/microRNA-155 for Normal Immune Function. Science 316(5824): 608-611.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007a. Intronic microRNA precursors that bypass Drosha processing. Nature 448(7149): 83-U87.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007b. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Research 17(12): 1850-1864.
Seo, T.S., Bai, X.P., Ruparel, H., Li, Z.M., Turro, N.J., and Ju, J.Y. 2004. Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. Proceedings of the National Academy of Sciences of the United States of America 101(15): 5488-5493.
Stark, A., Bushati, N., Jan, C.H., Kheradpour, P., Hodges, E., Brennecke, J., Bartel, D.P., Cohen, S.M., and Kellis, M. 2008. A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. Genes & Development 22(1): 8-13.
Stark, A., Kheradpour, P., Parts, L., Brennecke, J., Hodges, E., Hannon, G.J., and Kellis, M. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res 17(12): 1865-1879.
Tam, O.H., Aravin, A.A., Stein, P., Girard, A., Murchison, E.P., Cheloufi, S., Hodges, E., Anger, M., Sachidanandam, R., Schultz, R.M., and Hannon, G.J. 2008. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453(7194): 534-538.
Tyler, D.M., Okamura, K., Chung, W.-J., Hagen, J.W., Berezikov, E., Hannon, G.J., and Lai, E.C. 2008. Functionally distinct regulatory RNAs generated by bidirectional transcription and processing of microRNA loci. Genes & Development 22(1): 26-36.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J.M., Stoop, H., Nagel, R., Liu, Y.P., van Duijse, J., Drost, J., Griekspoor, A., Zlotorynski, E., Yabuta, N., De Vita, G., Nojima, H., Looijenga, L.H.J., and Agami, R. 2006. A Genetic Screen Implicates miRNA-372 and miRNA-373 As Oncogenes in Testicular Germ Cell Tumors. Cell 124(6): 1169-1181.
Watanabe, T., Totoki, Y., Toyoda, A., Kaneda, M., Kuramochi-Miyagawa, S., Obata, Y., Chiba, H., Kohara, Y., Kono, T., Nakano, T., Surani, M.A., Sakaki, Y., and Sasaki, H. 2008. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453(7194): 539-543.
Wienholds, E., Kloosterman, W.P., Miska, E., Alvarez-Saavedra, E., Berezikov, E., de Bruijn, E., Horvitz, H.R., Kauppinen, S., and Plasterk, R.H.A. 2005. MicroRNA Expression in Zebrafish Embryonic Development. Science 309(5732): 310-311.
Wu, H., Ye, C., Ramirez, D., and Manjunath, N. 2009. Alternative Processing of Primary microRNA Transcripts by Drosha Generates 5' End Variation of Mature microRNA. PLoS ONE 4(10): e7566.
Xie, X.H., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434(7031): 338-345.
Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. 2006. Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13(1): 13-21.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. 2003. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes & Development 17(24): 3011-3016.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24(1): 138-148.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucl Acids Res 31(13): 3406-3415.
Figure 1 [diagram not reproduced; recoverable labels: miRNA candidates (706); confidently identified canonical miRNAs (465); miRBase 14.0; mirtrons (14); shRNAs (16)]
Figures 2, 3, and 4 [images not reproduced; recoverable axis label: Reads (log scale)]
Figure 5 [image not reproduced; recoverable labels: (A) x-axis, miRNA reads; (B) x-axis, fold enrichment toward the 3' arm or the 5' arm, with sample key ES, e7.5, e9.5, e12.5, newborn, brain, testes, ovary; (C) mmu-mir-3102 hairpin with miR-3102.2-5p (35 reads), miR-3102.1* (1 read), miR-3102.1 (820 reads), and miR-3102.2-3p (30 reads)]
Figure 6 [image not reproduced; recoverable labels: key, conserved and nonconserved; axes, miRNA reads, fraction of offset reads (%), and fold change (log2); site types, 8mer, 7mer-m8, 7mer-A1, 6mer, and no site; mir-133a-1 transfection and mir-133a-2 transfection; 81% miR-223 major isoform, 12% minor isoform; 94% miR-155 major isoform, 1% minor isoform]
Figure 7 [image not reproduced; panel A values are tabulated here; other recoverable labels: untemplated U addition rate, untemplated A addition rate, 5' arm, 3' arm, p = 0.30 (KS test)]
Untemplated nucleotide    A                  C                  G                  T
5' arm                    7.3%±1.2 (343)     0.2%±0.1 (348)     0.2%±0.0 (288)     6.5%±1.4 (287)
3' arm                    4.5%±1.1 (318)     0.2%±0.1 (318)     0.2%±0.1 (336)     19.9%±5.5 (324)
tRNA                      0.9%±0.4 (186)     0.3%±0.1 (186)     0.5%±0.1 (186)     2.8%±0.6 (186)
Table 1. Properties of Canonical miRNAs
Property                          Total    Conserved    Nonconserved
Hairpins                          475      295          180
in clusters                       291      163          128
in small clusters                 153      129          24
in large clusters                 138      34           104
not in clusters                   184      132          52
in introns (same strand)          180      77           103
opposite introns                  22       18           4
not in introns                    273      200          73
with miRNA from 5' arm            202      137          65
with miRNA from 3' arm            141      102          39
with miRNAs from both arms        132      56           76
Table 2.
Inferred A-to-I Editing Sites in miRNAs
miRNA           Position    Fraction edited
miR-219-2-3p    15          0.064
miR-337-3p      10          0.062
miR-376a*       4           0.297
miR-376b-3p     6           0.501
miR-376c        6           0.311
miR-378         16          0.087
miR-379*        5           0.095
miR-381         4           0.125
miR-411-5p      5           0.239
miR-421         14          0.054
miR-467d        3           0.094
miR-497         2           0.104
miR-497*        20          0.699
miR-540*        3           0.080
miR-1251        6           0.431
miR-3099        7           0.209
Supplemental Figure 1 [diagram not reproduced; recoverable labels: total candidates (736), comprising annotated miRNAs (407) and new candidates (329); annotated miRNAs split into confirmed miRNAs (387) and miRNA candidates not confidently confirmed (20); new candidates include miRNA candidates (221); undetected annotated miRNAs (157), comprising not sequenced (52), not enough reads (72), and failed other filters (33); dependence categories: DGCR8- and Dicer-dependent, DGCR8-dependent, Dicer-dependent, not strongly dependent, cannot determine]
Supplemental Figure S1. Mouse miRNA candidates identified by Illumina sequencing. MicroRNAs that are annotated in miRBase v.14.0 are boxed in green. The miRNA hairpin loci were further categorized by DGCR8- and Dicer-dependency using sequencing data from wild-type and mutant ES cells (Babiarz et al. 2008). The number in parentheses is the total number of loci in the category. If followed by another number, the second number is the number of conserved loci. A candidate was considered DGCR8- and Dicer-dependent using criteria of a previous study (Babiarz et al. 2008), except that predicted hairpin loci replaced the 100-nt windows, with the read cutoffs scaled to the hairpin lengths.
Supplemental Figure 3 [bar chart not reproduced; x-axis: Reads (log scale); hairpin groups: human miRNA controls, Lim 2003, Berezikov 2005, mouse miRNA controls, Berezikov 2006b, Xie 2005; annotations: not sequenced, not enough reads; key: no hairpin plasmid, hairpin plasmid]
Supplemental Figure S3. Ectopic-expression assay evaluating unconfirmed annotated miRNAs and predicted miRNAs. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figure S4 to produce Figure 2C and D.
Supplemental Figure 4 [bar chart not reproduced; x-axis: Reads (log scale); hairpin groups: human miRNA control, mouse miRNA controls; annotations: not enough reads, no miRNA*, incorrect miRNA*; key: no hairpin plasmid, hairpin plasmid]
Supplemental Figure S4. Ectopic-expression assay evaluating unconfirmed annotated miRNAs. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figure S3 to produce Figure 2C and D.
Supplemental Figure 5 [bar chart not reproduced; x-axis: Reads (log scale); hairpin groups: human miRNA controls, Lim 2003, Berezikov 2005, mouse miRNA controls, novel miRNAs, novel shRNAs, DGCR8- and DCR-dependent candidates, other candidate; key: no hairpin plasmid, hairpin plasmid]
Supplemental Figure S5. Ectopic-expression assay evaluating predicted miRNAs, novel miRNAs, and miRNA candidates. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figures S6 and S7 to produce Figure 3B.
Supplemental Figure 6 [bar chart not reproduced; x-axis: Reads (log scale); hairpin groups: human miRNA controls, Lim 2003, Berezikov 2005, mouse miRNA controls, noncanonical controls, early miRBase, novel miRNA, rare novel miRNAs, novel shRNAs, conserved candidates, other candidates; key: no hairpin plasmid, hairpin plasmid]
Supplemental Figure S6.
Ectopic-expression assay evaluating novel miRNAs, miRNA candidates, predicted miRNAs, and an unconfirmed annotated miRNA (mmu-mir-509). Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figures S5 and S7 to produce Figure 3B.
Supplemental Figure 7 [bar chart not reproduced; x-axis: Reads (log scale); hairpin groups: human miRNA control, Lim 2003, mouse miRNA controls, noncanonical controls, early miRBase, novel miRNAs, novel rare miRNAs, novel shRNAs, candidates; key: no hairpin plasmid or hairpin plasmid, each with no TNdrosha/TNdicer plasmid, with TNdrosha plasmid, or with TNdicer plasmid]
Supplemental Figure S7. Drosha/Dicer-dependent biogenesis of novel miRNAs. The selected hairpins were transfected into HEK293T with a control vector (blue), TNdrosha (red), or TNdicer (green). Similar transfections using the control vector instead of the hairpins are shown in light blue, orange, and light green, respectively. Results of this experiment were compiled with those of Supplemental Figures S5 and S6 to produce Figure 3B.
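The asterisk criterion stated in the Figure 2C legend (more than one read from both the anticipated miRNA and miRNA*, with the two combined expressed more than threefold over endogenous levels) can be written out as a short check. The following is a minimal Python sketch, not the analysis code used for the figures: the function name, parameter names, and the assumption that read counts from the hairpin and control transfections are already on a comparable (normalized) scale are all hypothetical.

```python
def detectable_overexpression(mirna_hairpin, star_hairpin,
                              mirna_control, star_control,
                              fold_cutoff=3.0):
    """Return True if a hairpin transfection meets the asterisk criterion.

    All four arguments are read counts (assumed comparable between libraries):
    *_hairpin from the hairpin-plasmid transfection, *_control from the
    no-hairpin-plasmid transfection (endogenous expression).
    """
    # Both the anticipated miRNA and its miRNA* must have more than 1 read.
    both_detected = mirna_hairpin > 1 and star_hairpin > 1
    # Combined miRNA + miRNA* reads must exceed threefold the endogenous level.
    combined_hairpin = mirna_hairpin + star_hairpin
    combined_control = mirna_control + star_control
    return both_detected and combined_hairpin > fold_cutoff * combined_control
```

With zero endogenous reads, the fold condition reduces to requiring any combined reads, so the per-species read requirement is what decides the call in that case.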
[Supplemental Figure S8 (image not recoverable from the extracted text). Scatter plot of expression correlation (axis ticks from 0.2 down to -1) against genomic distance (axis: Genomic distance, log scale); legend: Not clustered, Clustered.]

Supplemental Figure S8. Correlation of expression and genomic distance. The correlation of expression with clustering was calculated as previously (Baskerville and Bartel 2005), except that miRNAs that mapped to the same pre-mRNA transcript were considered clustered regardless of genomic distance. The clustered miRNAs (red) were more correlated than non-clustered miRNAs (blue). Some miRNA pairs more than 50,000 nt apart were categorized as clustered with each other due to joint proximity to intervening miRNAs, and their correlated expressions supported this clustering method. Other miRNAs that are within 50,000 nt of each other were not considered clustered because one mapped within a pre-mRNA, whereas the other one did not; each of these three pairs of miRNAs were not correlated in expression. Correlated expression observed for many miRNAs located ~130,000 nt apart was due to likely co-expression of two megaclusters on chr12.

[Supplemental Figure S9 (image not recoverable from the extracted text). Two histograms, panels A and B, over miRNA length (axis: miRNA length, nt); legend: Conserved, Nonconserved.]

Supplemental Figure S9. The distribution of lengths of conserved (red) and nonconserved (blue) mature miRNAs. (A) Size distribution plotted in terms of number of normalized reads. (B) Size distribution plotted in terms of the dominant read length for each miRNA.

[Supplemental Figure S10 (image not recoverable from the extracted text). Six panels, one per control motif (UCAGUUC, UCAGUUG, UCAGUUA, AAUGCUG, AAUGCUU, AAUGCUC), each plotted against Fold Change (log2) with curves for 8mer, 7mer-m8, 7mer-A1, 6mer, and No Site.]

Supplemental Figure S10. Controls to ensure that observed mRNA derepression attributed to the minor isoform was not due to overlap of its sites with offset 6mer sites of the major isoform. (A) Lack of statistically significant derepression by the three control motifs that differed from the miR-223 minor site by a single nt at position 8. (B) Same as in A except for the miR-155 minor site. The mRNAs with 8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because these sites overlapped with 7mer-m8 sites for the major isoform.

[Supplemental Figure S11 (image not recoverable from the extracted text). Panel A: mismatch types by position for brain miRNA-matching sequences and spiked-in sequence controls. Panel B: number of significant mismatch events after successive filters (thresholds shown: fraction edited >5%, edited reads >10). Panel C: alignment of reads to the mir-381 locus on chr12(+) in brain (sequences with at least 5 reads shown). Panel D: 5'-end distributions of edited and unedited reads for the miR-337 3p arm (star strand), the miR-411 5p arm (star strand), and the miR-376a 5p arm (star strand), with p-value annotations (in order of appearance: p = 3.27e-13, p < 2.2e-16, p < 2.2e-16).]

Supplemental Figure S11. RNA editing. (A) An overview of mismatches from the sequences indicated. In the two spiked-in synthetic RNAs of known sequence, mismatches were distributed throughout the length of the sequence, with no preference for A-to-G mismatches. In miRNA-mapping small RNA sequences from brain, mismatches were concentrated in the last 2 nt of the read, probably due to cellular terminal-transferase activity. (B) Loss of most mismatch events after applying filters expected to distinguish editing events from background. Mismatch events were considered significant if a position had at least 10 reads corresponding to a particular mismatch, and these reads accounted for at least 5% of reads covering that position.
As successive filters were applied to the genome-mapping reads, the number of significant A-to-G mismatch events remained relatively unaffected, whereas nearly all other mismatch events were eliminated. In particular, C-to-T mismatches were mostly eliminated, indicating that C-to-U RNA editing does not occur to any significant degree in miRNAs. A-to-G mismatch events that passed all filters were considered editing candidates and manually examined to see if other plausible models could explain the mismatches. (C) A display of most abundant perfectly-matching and single-mismatch reads from the mmu-mir-381 locus illustrates that inferred A-to-I editing accounts for essentially all mismatches at the edited position, and the great majority of all mismatched reads mapping to the miRNA or miRNA*. An analogous pattern was found for all 16 miRNAs that passed filters and manual validation. (D) Editing of a miRNA or miRNA* was associated with significantly altered 5' end specificity. In the cases of mmu-mir-337 and mmu-mir-411, edited reads had more homogeneous 5' ends than unedited reads.

Supplemental Table 1. Summary of high-throughput sequencing.

Sample     Raw reads     With linker seq   Genome match (16-27 nt)
Ovary         641,583          416,374           259,684
Testes      5,427,076        2,308,332         1,614,777
Brain      13,024,478       10,513,006         6,984,353
Newborn    21,967,488       16,763,972        11,045,939
e12.5       3,936,146        3,467,324         2,457,730
e9.5        5,586,229        4,104,135         2,544,507
e7.5        5,762,821        4,816,695         2,705,251
ES          3,737,635        3,061,072         1,057,274
Total      60,083,456       45,450,910        28,669,515

Supplemental Table 2. Small RNA compositions. Non-coding RNA (ncRNA) refers to any reads that map to annotated rRNA, tRl loci. Small-interfering RNA (siRNA) and mRNA exon reads refer to reads that n the sense strands of annotated refSeq mRNAs, respectively.
Sample     miRNA            ncRNA           siRNA         mRNA exon
Ovary         180,069.19      47,827.49        944.82      1,316.74
Testes        180,547.41      57,455.81      2,442.20     12,939.83
Brain       6,261,981.23     240,935.49    154,737.47      7,559.71
Newborn     9,440,674.90     625,004.67    679,948.77     40,821.58
e12.5       2,070,477.89     199,596.51     40,199.63      4,483.84
e9.5        2,072,408.35     273,737.99      9,670.94     10,720.03
e7.5        2,052,457.81     367,164.12      4,752.57     11,128.34
ES            468,326.86     235,034.54     17,592.83      6,790.23
Total      22,726,943.64   2,046,756.63    910,289.23     95,760.30
%                  79.27           7.14          3.18          0.33

Supplemental Table 4. Previously annotated miRNA hairpins that did not pass our criteria for consideration as miRNAs. Hairpins are listed under the status assigned to them.

Status: heterogeneous 5'
mmu-mir-1937a, mmu-mir-1937b, mmu-mir-464

Status: incorrect miRNA
mmu-mir-1944, mmu-mir-702

Status: incorrect miRNA*
mmu-mir-1949, mmu-mir-449c, mmu-mir-677

Status: no miRNA*
mmu-mir-1190, mmu-mir-1191, mmu-mir-184, mmu-mir-1953, mmu-mir-1969, mmu-mir-297a-1, mmu-mir-297a-2, mmu-mir-297a-6, mmu-mir-466f-4, mmu-mir-468, mmu-mir-489, mmu-mir-574, mmu-mir-720, mmu-mir-875

Status: no miRNA
mmu-mir-1965

Status: not enough reads
mmu-mir-1186, mmu-mir-1187, mmu-mir-1192, mmu-mir-1195, mmu-mir-1196, mmu-mir-1274a, mmu-mir-1892, mmu-mir-1893, mmu-mir-1894, mmu-mir-1900, mmu-mir-1902, mmu-mir-1903, mmu-mir-1904, mmu-mir-1906, mmu-mir-1907, mmu-mir-1927, mmu-mir-1929, mmu-mir-1932, mmu-mir-1935, mmu-mir-1936, mmu-mir-1937c, mmu-mir-1938, mmu-mir-1940, mmu-mir-1945, mmu-mir-1946b, mmu-mir-1948, mmu-mir-1950, mmu-mir-1951, mmu-mir-1954, mmu-mir-1956, mmu-mir-1957, mmu-mir-1958, mmu-mir-1959, mmu-mir-1960, mmu-mir-1962, mmu-mir-1963, mmu-mir-1966, mmu-mir-1970, mmu-mir-2137, mmu-mir-2139, mmu-mir-449b, mmu-mir-466g, mmu-mir-466i, mmu-mir-466j, mmu-mir-467f, mmu-mir-467h, mmu-mir-546, mmu-mir-599, mmu-mir-669g, mmu-mir-669i, mmu-mir-669j, mmu-mir-669n, mmu-mir-680-1, mmu-mir-680-2, mmu-mir-682, mmu-mir-684-1, mmu-mir-684-2, mmu-mir-685, mmu-mir-686, mmu-mir-688, mmu-mir-690, mmu-mir-692-1, mmu-mir-692-2, mmu-mir-693, mmu-mir-694, mmu-mir-703, mmu-mir-704, mmu-mir-705, mmu-mir-707, mmu-mir-713, mmu-mir-715, mmu-mir-763

Status: not sequenced
mmu-mir-105, mmu-mir-1895, mmu-mir-1896, mmu-mir-1897, mmu-mir-1898, mmu-mir-1899, mmu-mir-1901, mmu-mir-1905, mmu-mir-1928, mmu-mir-1931, mmu-mir-1939, mmu-mir-1942, mmu-mir-1946a, mmu-mir-1952, mmu-mir-1961, mmu-mir-1967, mmu-mir-1971, mmu-mir-207, mmu-mir-2136, mmu-mir-220, mmu-mir-327, mmu-mir-343, mmu-mir-432, mmu-mir-453, mmu-mir-467g, mmu-mir-509, mmu-mir-568, mmu-mir-654, mmu-mir-678, mmu-mir-680-3, mmu-mir-681, mmu-mir-683-1, mmu-mir-683-2, mmu-mir-687, mmu-mir-691, mmu-mir-695, mmu-mir-697, mmu-mir-698, mmu-mir-706, mmu-mir-709, mmu-mir-710, mmu-mir-711, mmu-mir-717, mmu-mir-718, mmu-mir-719, mmu-mir-721, mmu-mir-759, mmu-mir-761, mmu-mir-762, mmu-mir-767, mmu-mir-804, mmu-mir-882

Status: overlaps rRNA
mmu-mir-2132, mmu-mir-2133-1, mmu-mir-2133-2, mmu-mir-2134-1, mmu-mir-2134-2, mmu-mir-2134-3, mmu-mir-2134-4, mmu-mir-2135-1, mmu-mir-2135-2, mmu-mir-2135-3, mmu-mir-2135-4, mmu-mir-2135-5, mmu-mir-2138, mmu-mir-2140, mmu-mir-2141, mmu-mir-2142, mmu-mir-2143-1, mmu-mir-2143-2, mmu-mir-2143-3, mmu-mir-2144, mmu-mir-2145-1, mmu-mir-2145-2, mmu-mir-2146, mmu-mir-689-1, mmu-mir-689-2

Status: overlaps tRNA
mmu-mir-1983

Other statuses
mmu-mir-451: noncanonical miRNA; many reads that mapped well into the loop of the putative hairpin
mmu-mir-469: many reads that mapped well into the loop of the putative hairpin
mmu-mir-484: did not give a predicted fold with the requisite pairing involving the candidate and predicted miR[...]
mmu-mir-805: many reads that mapped well into the loop of the putative hairpin

Chapter 4

Future Directions

MicroRNAs (miRNAs) play an important role in gene regulation by post-transcriptionally repressing expression of their target genes (Bartel 2004). The key determinant of miRNA targeting is the seed sequence, corresponding to nucleotides 2-7 of the mature miRNA (Lewis et al. 2003; Lewis et al. 2005; Bartel 2009). Hence, accurate annotation of mature miRNA species as well as authentic miRNA genes is fundamental to understanding miRNA gene regulation. However, the genomic study of murine miRNA genes, described in the previous chapter, suggests that many previous annotations are questionable. In addition, the study identified novel miRNA genes and variations in the miRNA biogenesis pathway that resulted in multiple miRNA isoforms.

This chapter addresses three areas for future studies. First, the quality of miRNA annotations should be examined. With advances in sequencing technology, a large number of novel miRNA genes have been identified. The next step is to review the quality of the database so that it can provide the most accurate and comprehensive list of miRNA genes. In addition, deeper sequencing of small RNAs revealed interesting processing variations at each step of the miRNA biogenesis pathway. Many of these phenomena resulted in the production and/or RISC-loading of RNA species with different seed sequences, which led to targeting and inhibition of different sets of mRNAs.
Further work on discovery of additional miRNAs that undergo similar processing variations and identification of their biogenesis mechanisms will be informative in understanding miRNA gene regulation. Lastly, high-throughput sequencing technology has opened the door for integrative approaches to studying small RNAs on the genomic scale. Closer inspection of small-RNA sequencing data coupled with those from interactome or transcriptome studies may advance the understanding of biogenesis and/or function of small RNAs.

MicroRNA gene annotations

As the official archive of miRNA genes, miRBase should be a source of accurate information. Although stringent discovery methods can better distinguish authentic miRNAs from degradation fragments, incomplete understanding of miRNA biogenesis hinders the establishment of comprehensive guidelines for miRNA gene discovery. While the major proteins and hairpin features required for miRNA processing have been identified, other, as yet unidentified features also appear to affect miRNA biogenesis. Since the guidelines for miRNA discovery are rooted in the knowledge of how miRNAs are processed, better understanding of miRNA biogenesis will lead to improved miRNA gene identification methods.

Despite the best efforts, however, some false annotations will likely continue to exist in miRBase. False entries are occasionally expunged from the database, but short of additional reads that suggest nonspecific degradation, it is difficult to prove that an entry is an incorrect annotation rather than a miRNA that is only produced under very specific conditions.

Previously, users could gauge the confidence with which miRNA genes are annotated using sequencing data obtained from the Gene Expression Omnibus (GEO). The processing of raw data, however, is laborious and may have discouraged users from utilizing this resource. To facilitate the process, miRBase has begun to incorporate sequencing data into its entries (Kozomara and Griffiths-Jones 2011).
While users who look at individual entries will benefit most from this change, other users are more interested in the list of all miRNA genes in the genome. To address these distinct needs, miRBase could be separated into two databases, one of confidently identified miRNA genes and another of candidates. In this scenario, a novel hairpin could first be registered as a candidate and then be moved into the confidently identified list with additional confirmation. Under such a system, users can decide whether the more accurate or the more comprehensive list of miRNAs is appropriate for their studies. A similar suggestion of dividing miRBase into multiple parts has recently been proposed (Kozomara and Griffiths-Jones 2011).

MicroRNA gene discovery by sequencing

The miRNA gene discovery efforts strive to be not only accurate but also comprehensive. While most of the conserved miRNAs appear to have been identified, additional miRNA genes will be discovered through deeper sequencing of a broader range of samples. These novel miRNAs may correspond to biologically relevant genes that are only expressed in specific cell types or under specific conditions. Some may correspond to low-abundance RNA transcripts that happen to fold into hairpins and "accidentally" fall into the miRNA biogenesis pathway but have not acquired any conserved biological function. Nonetheless, even such miRNA genes can affect cell function if produced in sufficient quantity, much like transfected short hairpin RNAs (shRNAs) or small interfering RNAs (siRNAs).

In order to identify additional novel miRNA genes in erythrocytes, small RNA sequencing data from murine erythrocytes at three different stages of maturation were analyzed. The three stages corresponded to burst-forming unit erythrocyte (BFU-E), colony-forming unit erythrocyte (CFU-E), and terminally differentiated cells identified as Ter119+ with an antibody. The analysis identified 12 novel miRNAs in mouse.
Further work on changes in the miRNA-ome during erythrocyte maturation may provide insight into the role of miRNAs in the process. If stage-specific processing variation is observed in these samples, erythrocytes may become an attractive platform to experimentally investigate the mechanisms and biological functions of such processing variations.

In addition to mouse sequencing data, small-RNA sequencing data from the human brain were examined (Appendix D), and 35 novel human miRNA gene candidates were identified. Although the human brain data have been informative in miRNA discovery, analysis of only one tissue sample is insufficient to construct a list of questionable human miRNA annotations. Sequencing data from additional samples will help to better portray the state of human miRNA gene annotations.

Computational prediction of miRNAs

While small RNA sequencing studies have identified many miRNA genes, these approaches can only identify those that are expressed above a certain level in the sequenced sample. Thus, comprehensive coverage of all miRNA genes in an organism remains difficult to achieve through sequencing. Alternatively, machine learning-based approaches can be used to predict all potential miRNA genes encoded in the genome. These entries can then be submitted as candidate genes awaiting experimental confirmation. In addition to identifying all possible hairpins that can be processed as miRNAs, these methods can also help discover additional features that affect miRNA processing. Understanding such features would not only help establish a more definitive guideline for miRNA discovery but also provide information on how to design artificial hairpins that can be more efficiently processed into mature miRNAs.

A machine-learning algorithm learns the properties characteristic of miRNAs (features) from a training set of known miRNAs (positives) and other pseudo-miRNA hairpins that do not produce mature miRNAs (negatives).
It then builds a classifier that can predict whether a given sequence is an authentic miRNA hairpin. To examine whether there is room for improvement in previous studies utilizing machine learning-based approaches, their performances were tested on a set of confirmed miRNAs (positives) and pseudo-miRNA hairpins that failed the ectopic overexpression assay (negatives) (Sewer et al. 2005; Helvik et al. 2007; Jiang et al. 2007). Since most of the recently discovered miRNA genes are nonconserved, it is likely that most of the conserved miRNA genes have already been found. Accordingly, none of the tested methods used conservation as a feature to describe a hairpin property. The results demonstrated that the sensitivity and especially the specificity of these methods were lower than the reported values. A better training set and/or additional features may improve the accuracy of these prediction programs.

To determine if a better training set can improve the predictions, the programs were re-trained using a new training set. Previous methods have used contemporary miRBase entries as the positives and other non-coding RNA or mRNA hairpins as the negatives. Since the previous training set contained many false entries, it stands to reason that a better training set will improve prediction accuracy. The new training set will consist of confirmed miRNAs (positives) and unconfirmed miRNAs that did not have any reads mapping to them (negatives). The hairpins with high sequence similarities will be represented by a single hairpin so that the characteristics of any particular hairpin family are not overrepresented in the training set. Also, the hairpins used to test previous methods will be removed from the training set so that they can be used to determine if the re-training has improved the predictions.
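The evaluation described above compares each program's calls against hairpins of known status. As a minimal sketch (not the procedure actually used in this work), sensitivity and specificity could be computed from such calls as shown below, and the test hairpins could be held out of a re-training set in the same spirit. The function names, hairpin identifiers, and toy labels are hypothetical and serve only as illustration.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for parallel lists of booleans.

    labels[i] is True for a confirmed miRNA hairpin (positive) and False for
    a pseudo-miRNA hairpin (negative); predictions[i] is the classifier call.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # true positives
    fn = sum(1 for y, p in pairs if y and not p)      # false negatives
    tn = sum(1 for y, p in pairs if not y and not p)  # true negatives
    fp = sum(1 for y, p in pairs if not y and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def remove_held_out(training_ids, test_ids):
    """Drop hairpins reserved for testing from the training set."""
    held_out = set(test_ids)
    return [hairpin for hairpin in training_ids if hairpin not in held_out]


# Toy data, invented for illustration only.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, True, True, False, False]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(remove_held_out(["hp1", "hp2", "hp3"], ["hp2"]))
```

Reporting the two measures separately matters here because a program can recover most genuine hairpins while still accepting many pseudo-miRNA hairpins.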
Although the re-trained algorithms are expected to predict miRNA genes with higher accuracy, it is unlikely that all of the features that distinguish genuine miRNAs from pseudo-miRNAs have been identified. Most previous work selected elements of sequence and secondary structure as features. A new classifier can be built using the most informative features from the re-trained programs as well as additional features that describe the flanking regions and the tertiary structure. If any of these features contribute to a more accurate identification of miRNA hairpins, their biological significance can be explored by a series of hairpin mutagenesis experiments.

MicroRNAs mapping to multiple loci

Many miRNAs map to multiple loci in the genome, most likely due to gene duplication. When a sequence maps to multiple loci, the read numbers are distributed equally to the loci as though all loci have contributed equally to the production of the sequenced reads. Therefore, it appears as though multiple loci have produced equal amounts of identical miRNA species. In reality, at least one of the loci must generate the miRNAs, but not all the loci may be expressed and/or processed. Even if transcripts from all the loci are processed as miRNA hairpins, they may not produce identical mature miRNAs, as observed for mouse mir-133 and fly mir-2 (Ruby et al. 2007).

Ectopic overexpression assays of miRNA genes that appear to produce identical mature miRNAs can identify the gene products from each locus. While the sequence similarities may make it difficult to clone the hairpins, the information gained from differential processing of highly related loci would be valuable. First, the information may help identify additional features that contribute to miRNA processing. Since most of the sequence and thus the secondary structure of the hairpins would be identical, it may be easier to narrow down the elements responsible for differential miRNA processing.
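The equal-split convention for sequences that map to multiple loci, described at the start of this section, can be illustrated with a short sketch. The function name, sequence identifiers, locus names, and counts below are hypothetical; the code only shows how dividing each sequence's reads evenly among its loci makes every locus appear equally productive, regardless of which locus actually generated the reads.

```python
from collections import defaultdict


def distribute_reads(read_counts, seq_to_loci):
    """Split each sequence's read count equally among the loci it maps to.

    read_counts: dict mapping a small-RNA sequence to its number of reads.
    seq_to_loci: dict mapping the same sequence to the list of genomic loci
                 where it matches.
    Returns a dict mapping each locus to its (possibly fractional) count.
    """
    locus_counts = defaultdict(float)
    for seq, count in read_counts.items():
        loci = seq_to_loci.get(seq, [])
        if not loci:
            continue  # sequence does not map anywhere; nothing to assign
        share = count / len(loci)
        for locus in loci:
            locus_counts[locus] += share
    return dict(locus_counts)


# Hypothetical example: SEQ_A matches three paralogous loci, SEQ_B only one.
read_counts = {"SEQ_A": 90, "SEQ_B": 10}
seq_to_loci = {
    "SEQ_A": ["locus-1", "locus-2", "locus-3"],
    "SEQ_B": ["locus-1"],
}
print(distribute_reads(read_counts, seq_to_loci))
```

In this toy case each locus receives 30 of the 90 SEQ_A reads even if only one of them is transcribed, which is why the text argues that experimental assays are needed to attribute products to individual loci.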
Furthermore, identification of loci responsible for miRNA production may affect experimental design of miRNA functional studies. For example, if transcripts from only one of the multiple loci can be processed into mature miRNAs, it may be sufficient to knock out the gene at just the one locus rather than at all loci. Lastly, if the loci that were previously thought to generate identical mature miRNAs actually produced miRNAs with different seeds, each locus would target different mRNAs and thus have distinct biological functions.

MicroRNA isoforms

While most miRNA genes give rise to one mature miRNA species, some genes produce multiple miRNA isoforms with different 5' ends. Although miRNA 5' heterogeneity had previously been observed (Ruby et al. 2007; Stark et al. 2007; Azuma-Mukai et al. 2008; Wu et al. 2009), it was attributed to erroneous Drosha cleavage. However, the functional study of miRNA isoforms concluded that both isoforms could repress transcripts with corresponding seeds when they are produced in sufficient quantity.

A point of interest is the identification of the feature that distinguishes the pri-miRNAs that generate isoforms from those that only generate a single dominant species. While the presence of a sequence or structural motif in the two groups of pri-miRNAs can be examined, there is also the possibility that additional factors are needed for heterogeneous processing. If pri-miRNAs that produce miRNA isoforms in vivo can also produce isoforms in an in vitro reaction with purified Microprocessor, then it can be concluded that all the features that encode isoform production are present on the pri-miRNA. Also, a number of conserved miRNA genes produce multiple miRNA isoforms, but it has not yet been examined whether the heterogeneous 5' processing is conserved in other species. Conservation of isoform production can be confirmed with small RNA sequencing data from other species.
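To illustrate why 5' heterogeneity matters for targeting, the sketch below derives the seed (nucleotides 2-7, as defined at the start of this chapter) for each observed 5' end of a hairpin arm and reports the fraction of reads supporting it. The arm sequence and read counts are invented for illustration, and the function names are hypothetical rather than taken from any analysis pipeline used in this work.

```python
def seed(mature, first=2, last=7):
    """Return nucleotides first..last (1-based, inclusive) of a mature miRNA."""
    return mature[first - 1:last]


def seeds_by_five_prime_end(arm_sequence, start_counts):
    """Summarize isoforms that differ in their 5' end.

    arm_sequence: sequence of the hairpin arm, 5' to 3'.
    start_counts: dict mapping a 0-based 5' start offset within the arm to
                  the number of reads beginning at that offset.
    Returns a dict mapping each offset to (seed, fraction of reads).
    """
    total = sum(start_counts.values())
    summary = {}
    for start, count in sorted(start_counts.items()):
        summary[start] = (seed(arm_sequence[start:]), count / total)
    return summary


# Invented arm sequence and counts: a major isoform and a minor isoform
# whose 5' end is shifted by one nucleotide.
arm = "AUGCUAGCUAGGCUAUCGAUCGGAUC"
starts = {0: 80, 1: 20}
for offset, (isoform_seed, fraction) in seeds_by_five_prime_end(arm, starts).items():
    print(offset, isoform_seed, fraction)
```

A one-nucleotide shift in the 5' end changes every position of the seed, so the two isoforms in this toy example would be expected to recognize different sets of target sites.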
Dicer-independent and AGO2-dependent miRNAs

MiR-451 is the only known miRNA to be generated by the noncanonical pathway through AGO2 cleavage rather than Dicer cleavage (Cheloufi et al. 2010; Cifuentes et al. 2010). To identify other miRNAs generated through AGO2 cleavage, mouse sequencing data were re-scanned for shorter hairpins with reads mapping through the terminal loop. Although several candidates were identified, none showed a significant difference in expression in the AGO2 knockout mouse livers compared to the wild type (Cheloufi et al. 2010). Thus, the biogenesis of these candidates appeared to be AGO2-independent. In a different approach, each chromosome of the mouse genome was scanned using a 100-nt window, and the reads from the wild type and the AGO2 knockout mouse livers were mapped to each window to determine the loci with AGO2-dependent reads. Although no other Dicer-independent and AGO2-dependent miRNAs were identified using these methods, sequencing additional samples from AGO2 knockout mice may help identify other miR-451-like hairpins.

Arm-switching miRNAs

The arm-switching miRNAs produce mature miRNAs from the 5' or the 3' arm depending on the cell type or developmental stage (Ro et al. 2007; Grimson et al. 2008). However, the mechanism of this selection has not yet been explored. First, cell lines where arm-switching can be observed need to be identified. To this end, the arm preference of miRNAs in each cell line should be observed by sequencing or by quantitative Northern blot. Once two cell lines with different arm preferences are identified, the RISC-loading complex (RLC) can be immunoprecipitated using an antibody against one of its components. The other proteins that are pulled down with the complex can then be analyzed by mass spectrometry. The results from the two cell lines can be compared to determine which proteins were uniquely present in one cell line.
To determine if these candidates affect strand selection, they can be ectopically expressed in the cell line where they are normally absent. An alternative method is to reconstitute RLC in vitro with the candidate proteins and examine whether the strand preference changes.

De novo prediction of piRNA clusters

Piwi-interacting RNAs are a class of ~26-30 nt small RNAs in germ cells that have been implicated in transposon silencing (Malone and Hannon 2009; Lau 2010). In 2006, efforts to identify RNA binding partners of Piwi proteins led to the discovery of piRNAs (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006). By definition, the most accurate method to identify piRNAs is through sequencing small RNAs that co-purify with Piwi proteins. Although this approach has led to identification of ~140 piRNA clusters (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006), it is more burdensome than directly cloning small RNAs from total RNA.

A computational method that can identify piRNA clusters from small-RNA sequencing data will be a beneficial tool for detecting piRNA production with minimal effort. Using the existing sequencing data gathered from RNAs that interact with individual members of the Piwi protein family, an algorithm can be trained to identify features of distinct classes of piRNAs. Such features would include known properties of piRNAs such as length, density of reads, nucleotide composition, and genomic location. If the trained features can adequately describe piRNAs, the algorithm should be able to identify piRNA clusters de novo from small RNA sequencing data. This method would not only examine the known piRNA clusters but also detect piRNA-like reads from loci that have not been previously implicated in piRNA production.

Acknowledgements

I would like to thank B. Wong for the small RNA sequencing data of erythrocytes and V. Agarwal for the collaborative work on computational prediction of miRNAs.
I would also like to thank D. Baek for discussions on arm-switching miRNAs and miRNA isoforms, and J. G. Ruby for help in looking at piRNAs in the mouse testes small-RNA library.

References

Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., Chien, M., Russo, J.J., Ju, J., Sheridan, R., Sander, C., Zavolan, M., and Tuschl, T. 2006. A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442(7099): 203-207.
Azuma-Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi, M.C. 2008. Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. P Natl Acad Sci USA 105(23): 7964-7969.
Bartel, D. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell 136(2): 215-233.
Cheloufi, S., Dos Santos, C.O., Chong, M.M.W., and Hannon, G.J. 2010. A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465(7298): 584-589.
Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., Wolfe, S.A., and Giraldez, A.J. 2010. A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328(5986): 1694-1698.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442(7099): 199-202.
Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27(1): 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals.
Nature 455(7217): 1193-1197.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. 2006. A novel class of small RNAs in mouse spermatogenic cells. Genes & Development 20(13): 1709-1714.
Helvik, S.A., Snove, O., and Saetrom, P. 2007. Reliable prediction of Drosha processing sites improves microRNA gene prediction. Bioinformatics 23(2): 142-149.
Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. 2007. MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res 35(Web Server issue): W339-344.
Kozomara, A. and Griffiths-Jones, S. 2011. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39(Database issue): D152-157.
Lau, N.C. 2010. Small RNAs in the animal gonad: guarding genomes and guiding development. Int J Biochem Cell Biol 42(8): 1334-1347.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes. Science 313(5785): 363-367.
Lewis, B., Burge, C., and Bartel, D. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1): 15-20.
Lewis, B., Shih, I., Jones-Rhoades, M., Bartel, D., and Burge, C. 2003. Prediction of mammalian microRNA targets. Cell 115(7): 787-798.
Malone, C.D. and Hannon, G.J. 2009. Small RNAs as guardians of the genome. Cell 136(4): 656-668.
Ro, S., Park, C., Young, D., Sanders, K.M., and Yan, W. 2007. Tissue-dependent paired expression of miRNAs. Nucleic Acids Res 35(17): 5944-5953.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17(12): 1850-1864.
Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M.J., Tuschl, T., van Nimwegen, E., and Zavolan, M. 2005.
Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics 6: 267.
Stark, A., Lin, M.F., Kheradpour, P., Pedersen, J.S., Parts, L., Carlson, J.W., Crosby, M.A., Rasmussen, M.D., Roy, S., Deoras, A.N., Ruby, J.G., Brennecke, J., Hodges, E., Hinrichs, A.S., Caspi, A., Park, S.-W., Han, M.V., Maeder, M.L., Polansky, B.J., Robson, B.E., Aerts, S., van Helden, J., Hassan, B., Gilbert, D.G., Eastman, D.A., Rice, M., Weir, M., Hahn, M.W., Park, Y., Dewey, C.N., Pachter, L., Kent, W.J., Haussler, D., Lai, E.C., Bartel, D.P., Hannon, G.J., Kaufman, T.C., Eisen, M.B., Clark, A.G., Smith, D., Celniker, S.E., Gelbart, W.M., and Kellis, M. 2007. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450(7167): 219-232.
Wu, H., Ye, C., Ramirez, D., and Manjunath, N. 2009. Alternative processing of primary microRNA transcripts by Drosha generates 5' end variation of mature microRNA. PLoS ONE 4(10): e7566.

Appendices

Appendix A

Appendix A has been previously published as: Batista, P.J., Ruby, J.G., Claycomb, J.M., Chiang, R., Fahlgren, N., Kasschau, K.D., Chaves, D.A., Gu, W., Vasale, J.J., Duan, S., Conte, D., Luo, S., Schroth, G.P., Carrington, J.C., Bartel, D.P., and Mello, C.C. 2008. PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C. elegans. Mol Cell 31(1): 67-78. © 2008 Elsevier Inc.

Appendix B

Appendix B has been previously published as: Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197. © 2008 Macmillan Publishers Limited.

Appendix C

Appendix C has been previously published as: Rao, P.K., Toyama, Y., Chiang, H.R., Gupta, S., Bauer, M., Medvid, R., Reinhardt, F., Liao, R., Krieger, M., Jaenisch, R., Lodish, H.F., and Blelloch, R. 2009.
Loss of cardiac microRNA-mediated regulation leads to dilated cardiomyopathy and heart failure. Circ Res 105(6): 585-594. © 2009 American Heart Association, Inc.

Appendix D

Appendix D has been previously published as: Shin, C., Nam, J.-W., Farh, K.K.-H., Chiang, H.R., Shkumatava, A., and Bartel, D.P. 2010. Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38(6): 789-802. © 2010 Elsevier Inc.

Molecular Cell

PRG-1 and 21U-RNAs Interact to Form the piRNA Complex Required for Fertility in C. elegans

Pedro J. Batista,1,5,10 J. Graham Ruby,2,3,6,10 Julie M. Claycomb,1 Rosaria Chiang,2,3,6 Noah Fahlgren,7,8 Kristin D. Kasschau,7,8 Daniel A. Chaves,1 Weifeng Gu,1 Jessica J. Vasale,1 Shenghua Duan,1 Darryl Conte, Jr.,1 Shujun Luo,9 Gary P. Schroth,9 James C. Carrington,7,8 David P. Bartel,2,3,6,* and Craig C. Mello1,4,*

1 Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
2 Howard Hughes Medical Institute
3 Department of Biology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4 Howard Hughes Medical Institute, Worcester, MA 01605, USA
5 Gulbenkian PhD Programme in Biomedicine, Rua da Quinta Grande, 6, 2780-156, Oeiras, Portugal
6 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
7 Center for Gene Research and Biotechnology
8 Department of Botany and Plant Pathology
Oregon State University, Corvallis, OR 97331, USA
9 Illumina, Inc., Hayward, CA 94545, USA
10 These authors contributed equally to this work
*Correspondence: dbartel@wi.mit.edu (D.P.B.), craig.mello@umassmed.edu (C.C.M.)
DOI 10.1016/j.molcel.2008.06.002

SUMMARY

In metazoans, Piwi-related Argonaute proteins have been linked to germline maintenance, and to a class of germline-enriched small RNAs termed piRNAs. Here we show that an abundant class of 21 nucleotide small RNAs (21U-RNAs) are expressed in the C.
elegans germline, interact with the C. elegans Piwi family member PRG-1, and depend on PRG-1 activity for their accumulation. The PRG-1 protein is expressed throughout development and localizes to nuage-like structures called P granules. Although 21U-RNA loci share a conserved upstream sequence motif, the mature 21U-RNAs are not conserved and, with few exceptions, fail to exhibit complementarity or evidence for direct regulation of other expressed sequences. Our findings demonstrate that 21U-RNAs are the piRNAs of C. elegans and link this class of small RNAs and their associated Piwi Argonaute to the maintenance of temperature-dependent fertility.

INTRODUCTION

Diverse organisms utilize sequence-specific gene regulatory pathways that share features with RNA interference (RNAi). The effector complex in all RNAi-related pathways consists of a single-stranded small RNA, and a member of the AGO protein family, which binds small-RNA termini, leaving internal nucleotides accessible for base-pairing interactions with target sequences. In canonical RNAi pathways, double-stranded RNA (dsRNA) is processed by members of the Dicer family of multifunctional ribonucleases into 21-24 nucleotide (nt) short interfering RNAs (siRNAs) that interact with and guide AGO proteins to complementary target sequences in the cell (reviewed in Hutvagner and Simard, 2007).

Most animals have an additional AGO subfamily called Piwi. C. elegans has two Piwi-related genes (named prg-1 and prg-2) that, like Piwi family members from a number of animal species, have been implicated in germline maintenance and fertility (reviewed in Klattenhoff and Theurkauf, 2008). Two classes of Piwi-interacting RNAs (piRNAs) have been identified, including (1) repeat-associated piRNAs (originally annotated as rasiRNAs) that appear to target transposons, and (2) a second, more mysterious class of piRNAs with no known targets (Lin, 2007).
The latter class of piRNAs is extremely abundant in small-RNA fractions isolated from pachytene-stage mouse spermatocytes: over 80,000 distinct species are derived from large genomic clusters of up to 200 kb (Aravin et al., 2006; Grivna et al., 2006; Girard et al., 2006; Lau et al., 2006). These clusters exhibit a marked strand asymmetry, as though the piRNAs within a region are all processed from one large transcript or two divergent transcripts.

Studies in C. elegans have identified several classes of endogenously expressed small RNAs (Ambros et al., 2003; Ruby et al., 2006). However, which, if any, of these represent piRNAs has yet to be determined. One class of small RNAs, termed 21U-RNAs, shares several characteristics with the piRNAs of flies and mammals, including an overwhelming bias for a 5' uracil, a 5' monophosphate, and a 3' end that is modified and resistant to periodate degradation (Ruby et al., 2006; Ohara et al., 2007; Saito et al., 2007; Horwich et al., 2007; Kirino and Mourelatos, 2007). However, 21U-RNAs are shorter than piRNAs in flies and mammals, and their genomic organization is very different, with 21U-RNAs deriving from what appear to be thousands of individual, autonomously expressed loci broadly scattered in two large regions of one chromosome.

Molecular Cell 31, 67-78, July 11, 2008 ©2008 Elsevier Inc.

Here we show that 21U-RNAs are expressed in the germline and that their accumulation depends on the wild-type activity of PRG-1. We show that PRG-1 localizes to germline P granules and that 21U-RNAs coimmunoprecipitate with PRG-1 from worm lysates. Our analysis identifies many additional 21U-RNAs, bringing the total number of 21U-RNA loci to 15,722, and confirms the expression of many 21U-RNA loci previously predicted based only on the presence of an upstream sequence motif. Like the abundant pachytene piRNAs found in mammals, 21U-RNAs encode remarkable sequence diversity and yet lack obvious targets.
Although we identify one example of a transposon-directed 21U-RNA, our findings suggest that piRNA complexes of worms, charged with the remarkable sequence diversity encoded by 21U-RNAs, are likely to provide other essential germline functions.

RESULTS

Identification of Over 15,000 Unique 21U-RNA Species in C. elegans

We used Solexa sequencing technology (Seo et al., 2004) to generate 29,112,356 small-RNA cDNA reads that perfectly matched the C. elegans genome. Among these we identified 971,981 reads from 15,458 unique loci with properties similar to previously defined 21U-RNA loci (Ruby et al., 2006). These reads matched 95.1% of the 5454 previously sequenced 21U-RNAs and 78.3% of the 10,644 previously predicted 21U-RNAs (Ruby et al., 2006) and brought the total number of unique experimentally confirmed 21U-RNA loci to 15,722.

A common characteristic of 21U-RNA loci is the presence of an upstream sequence motif (Figure 1A; Ruby et al., 2006). As previously observed, RNA species 21 nt in length could be separated into two distinct sets based on the motif scores of their genomic loci (Figure 1B). Species with a high motif score also tended to exhibit the other essential features, including 21 nt length and 5'-U nucleotide, that together define the 21U-RNA class (see Figures S1A-S1C available online). 21U-RNAs with strong upstream motif matches were concentrated in two broad regions along chromosome IV (Figure 1C; Ruby et al., 2006).
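The three defining features named above (21 nt length, a 5' U, and a high upstream motif score) can be combined into a simple read filter. The following is only a minimal sketch and not the authors' pipeline: it assumes that a motif score has already been computed for each locus by some separate scoring step, that the cutoff is 7 (the value given in the Figure 1 legend) applied as "greater than or equal to", and that reads arrive as hypothetical (locus, sequence, score, count) tuples. The toy sequences and scores are invented for illustration.

```python
# Minimal sketch: flag candidate 21U-RNA reads from precomputed motif scores.
# Assumptions (not from the source): the tuple layout, the ">=" comparison,
# and the toy data below. Motif scoring itself is not implemented here.

MOTIF_SCORE_CUTOFF = 7  # cutoff value taken from the Figure 1 legend


def is_21u_candidate(seq, motif_score, cutoff=MOTIF_SCORE_CUTOFF):
    """Return True if a read is 21 nt long, starts with U, and its locus
    has an upstream motif score at or above the cutoff."""
    rna = seq.upper().replace("T", "U")  # accept DNA-alphabet reads
    return len(rna) == 21 and rna.startswith("U") and motif_score >= cutoff


def summarize(reads):
    """Sum read counts and count distinct loci among candidate reads.

    reads: iterable of (locus_id, sequence, motif_score, read_count).
    Returns (total_candidate_reads, number_of_candidate_loci).
    """
    total_reads = 0
    loci = set()
    for locus_id, seq, score, count in reads:
        if is_21u_candidate(seq, score):
            total_reads += count
            loci.add(locus_id)
    return total_reads, len(loci)


# Invented toy input: two reads pass, three fail on score, 5' nt, or length.
TOY_READS = [
    ("locusA", "TGAAATTCGGAACTAGGCTAA", 12.3, 40),   # passes all three tests
    ("locusB", "UGGAAUGUAAAGAAGUAUGUA", -4.0, 500),  # low motif score
    ("locusC", "GCGAATTCGGAACTAGGCTAA", 9.0, 7),     # 5' nucleotide is G
    ("locusD", "TACCGTTAGGCATTCGATGC", 15.0, 3),     # only 20 nt long
    ("locusE", "UACGGAUUCGAUCCAAGGUUA", 7.0, 2),     # score exactly at cutoff
]

if __name__ == "__main__":
    print(summarize(TOY_READS))
```

Keeping the per-read test separate from the summary makes it easy to swap in a different cutoff or a stricter comparison if the boundary case is handled differently in practice.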
Supporting the potential importance of this motif in 21U-RNA biogenesis, the motif score strongly correlated with the magnitude of 21U-RNA expression, as indicated by the number of sequenced reads in our data sets (Figure 1D).

Despite the presence of many high-scoring 21U-RNA motifs in orthologous regions of the C. briggsae genome, the 21U-RNA sequences themselves were not conserved. Even in rare cases in which the core of the upstream motif was perfectly aligned to a high-scoring motif within a syntenic region of the C. briggsae genome (Blanchette et al., 2004), the sequence of the consequent 21U-RNA was essentially nonconserved (Figure 1E). Only approximately 6% of the 21U-RNA loci and/or motifs were unambiguously aligned within syntenic regions in C. briggsae. In these few cases, this was often due to overlap with annotated coding exons, which rarely contain 21U-RNAs (Figure S1D). The only portion of the 21U-RNA flanking regions with elevated conservation frequencies above background was the 8 nt core of the upstream motif (Figure S1E).

Figure 1. 21U-RNAs Can Be Distinguished from Other RNA Species by Their Lengths and Upstream Motif Matches
(A) A schematic representation of the 21U-RNA upstream motif as described previously (Ruby et al., 2006).
(B) The number of 21 nt RNA reads (blue) or unique loci (pink) corresponding to each upstream motif score (rounded to the nearest unit). A score cutoff of 7 (orange) defined the 21U-RNA population.
(C) The distribution of 21U-RNA reads across chromosome IV. Normalized read counts were summed for each nonoverlapping 100 kb bin (blue).
(D) Correlation between the upstream motif score and the magnitude of 21U-RNA expression. For each three-bit bin of motif scores, the number of reads was determined for every experimentally identified 21U-RNA locus. The median read number is plotted, and the 25th and 75th percentiles are indicated (error bars), as is the number of loci in each bin.
(E) Two 21U-RNA loci whose core upstream motifs are aligned (Blanchette et al., 2004). The core motif (green) and 21U-RNA loci (pink) are highlighted. The C. briggsae 21U-RNA was annotated based on the highest-scoring 5' end corresponding to the conserved core motif. The number of reads from C. elegans is indicated, as is the motif score for each 21U-RNA ortholog.

21U-RNAs Are Expressed in the C. elegans Germline

The developmental dynamics of 21U-RNA expression were examined by northern blot analysis using probes specific for 21U-RNA-1 and 21U-RNA-3442. Both small RNAs were expressed at low levels from the L1 to L3 stage, began to accumulate to high levels during the L4 stage, and reached maximal expression in the young adult and gravid adult stages (Figure 2A). This pattern of expression correlated with the proliferation of the germline and was consistent with a germline origin. Both RNAs were expressed at approximately equal levels in male- or female-enriched populations (Figure 2B) but were absent in RNA samples prepared from germline-deficient glp-4(bn2) and eft-3(q145) mutant populations (Figure 2B). Finally, both small RNAs were present in embryos (Figure 2A), which may reflect maternal and/or paternal loading.

High-throughput sequencing indicated that the developmental expression profile for the entire class of 21U-RNAs was indistinguishable from that of 21U-RNA-1 and 21U-RNA-3442 (Figure 2C). The number of sequenced reads for each 21U-RNA species increased dramatically in late larval and adult stages. Furthermore, the number of reads was reduced (130-fold), from 5.8% to just 0.04% of total reads, in animals lacking a germline (Figure 2C). Adult hermaphrodites switch to an exclusively female mode of gametogenesis and store only 200 to 300 mature sperm. The relative abundance of various individual 21U-RNA species was comparable between male and adult hermaphrodite populations, suggesting that very similar 21U-RNA populations are present in germlines undergoing oogenesis and spermatogenesis.

Figure 2. 21U-RNAs Are Expressed in the C. elegans Germline
(A) RNA isolated from synchronized wild-type populations at the indicated developmental stages analyzed on a northern blot, successively probing for two 21U-RNAs, a miRNA, or a loading control (the SL1 precursor).
(B) RNA isolated from wild-type worms, compared to that obtained from mutant strains glp-4(bn2) and eft-3(q145), which lack a germline; fog-2(q71), a male-only population; and fem-1(hc17), which lack sperm, analyzed as in (A).
(C) The expression profile for the bulk population of 21U-RNAs as determined by large-scale sequencing. Plotted for each library is the percent of reads that represented 21U-RNAs. Some libraries were prepared for sequencing starting with Rnl2(1-249) ligase (light blue), and others were prepared starting with T4 RNA ligase 1 (dark blue; see Experimental Procedures).

PRG-1 Is Expressed in the Germline and Required for 21U-RNA Accumulation

To examine whether the accumulation of 21U-RNA-1 and 21U-RNA-3442 was dependent on known components of the RNAi machinery, we systematically examined RNA prepared from mutant strains lacking specific components of the RNAi pathway. The accumulation of 21U-RNAs did not require the wild-type activities of any of the previously described RNAi pathway components, including DCR-1 (Figure 3A, left, and Figure S2). To determine if accumulation of 21U-RNAs is dependent on any AGO proteins, we also analyzed mutant strains representing all of the C. elegans AGO family members, including several multiple mutant strains. Only prg-1 mutants lacked 21U-RNA-1 and 21U-RNA-3442 (Figure 3A, right, and data not shown). Strains mutant for prg-2, a nearly identical homolog of prg-1, did not exhibit defects in 21U-RNA expression (Figure 3A, right). We observed no defects in miRNA expression. However, we did note two 21U-RNAs that appear to have been misannotated as miRNAs (see Supplemental Results). Moreover, prg-1 mutants exhibited a wild-type RNAi response to foreign dsRNA (data not shown). These findings suggested that prg-1 was defective specifically in the 21U-RNA pathway.

Consistent with the genetic requirement of prg-1 for 21U-RNA accumulation, the stage-specific expression of PRG-1 protein was coincident with that of 21U-RNA-1 and 21U-RNA-3442. PRG-1 levels were reduced in L1/L2 and L2/L3 worms when compared with L4 worms, as well as young and gravid adults (Figure 3B). As observed for 21U-RNAs, we could also detect the PRG-1 protein in embryo extracts, and we were unable to detect PRG-1 in the glp-4(bn2) mutant strain, suggesting that this protein is expressed in the germline. PRG-1 was also present in protein extracts from both female- and male-enriched populations. Curiously, the expression of prg-1 was reduced in wild-type worms cultured at 25°C (Figure 3B). Analysis of the expression of the prg-1/prg-2 mRNA by real-time PCR revealed an expression pattern similar to that observed for the PRG-1 protein. The only exception observed was in the embryonic stage (Figure 3B). Although we could detect a high level of the PRG-1 protein in embryos, the mRNA was almost undetectable, supporting the idea that PRG-1 complexes in embryos are parentally derived.

In wild-type worms, we observed a striking localization of PRG-1 in the cytoplasm and in prominent cytoplasmic structures in germ cells at nearly all stages of germline development. In both hermaphrodites and males, PRG-1 formed perinuclear foci in both the mitotic and meiotic zones of the germline (Figures 3C and 3D). In mature oocytes the staining persisted, but PRG-1 foci lost their perinuclear association and became dispersed in the cytoplasm (Figure 3C and data not shown). In males, all PRG-1 staining disappeared abruptly as spermatids matured (Figure 3D). The pattern of PRG-1 localization, including its localization during embryogenesis (Figures 3E and 3F), resembled that of P granules, which are components of the C. elegans germline cytoplasm, or nuage (Strome and Wood, 1982; Strome, 2005). Indeed, the localization of PRG-1 perfectly overlapped, throughout development, the localization of the previously described P granule component, PGL-1 (Kawasaki et al., 1998; Figure 3G; and data not shown).

21U-RNAs Depend on and Interact Physically with PRG-1

To determine whether PRG-1 is required more broadly for 21U-RNA accumulation, we performed high-throughput sequencing analysis on small-RNA populations prepared from prg-1 mutant animals and from wild-type animals reared at 20°C. For wild-type animals approximately 11% of the 1,789,450 genome-matching reads corresponded to the 21U-RNAs, whereas for prg-1 mutant animals less than 0.05% of the 1,774,442 genome-matching reads corresponded to 21U-RNAs (Figure 4A). This dramatic reduction in 21U-RNAs resembled that observed in animals lacking a germline altogether (Figure 4B). However, prg-1 animals maintained at 20°C were fertile and exhibited nearly wild-type levels of another class of germline-enriched small RNAs, the endogenous siRNAs (Figure 4C).
These findings indicate that prg-1 is required for the accumulation of the entire 21U-RNA class of small RNAs.

To examine whether the 21U-RNAs physically interact with PRG-1, we immunoprecipitated the PRG-1 protein complex along with associated RNA. Both 21U-RNA-1 and 21U-RNA-3442 coprecipitated with the PRG-1 immune complex but not with precipitates recovered using preimmune serum (Figure 4D). Small-RNA species that did not require PRG-1 activity for accumulation, such as miR-66, were not detected in PRG-1 immunoprecipitates (Figure 4D). In contrast, we found that the ALG-1/ALG-2 AGO-associated immune complex contained miR-66 but not 21U-RNA-1 or 21U-RNA-3442 (Figure 4D).

Biochemical analysis of small RNAs recovered in the PRG-1 IP complex demonstrated a strong bias for small RNAs with 5' U (>91%) compared to the total input population, which was enriched for 5' G (>70%; Figure 4E). Similarly, deep sequencing of small-RNA libraries prepared from the IP sample demonstrated a dramatic enrichment for 21 nt RNAs with 5' U in the PRG-1 complex (Figure 4F). In addition, 21mers with high-scoring motif matches were dramatically enriched in the IP sample (Figure 4G) and mapped comprehensively across the previously described 21U-RNA clusters on chromosome IV (Figure 4H). No other RNA species was significantly enriched in the PRG-1 IP. The above observations suggest that PRG-1 specifically binds 21U-RNAs to form a complex important for germline function and fertility.

Figure 3. PRG-1 Protein Is Expressed in the Germline and Required for 21U-RNA Accumulation
(A) Northern blot analysis of 21U-RNA-1, 21U-RNA-3442, and miR-66 expression in wild-type and the indicated homozygous strains. The double mutant was prg-1(tm872); prg-2(tm1094). The SL1 precursor served as a loading control.
(B) The PRG-1 developmental expression profile. Protein lysates generated from wild-type populations at distinct developmental stages were analyzed using a western blot (top left), as were protein lysates from wild-type worms and from the mutant strains examined in Figure 2B (top right). Tubulin served as a loading control. Expression of prg-1/prg-2 mRNA was analyzed by quantitative real-time PCR, using actin (act-3) mRNA as the normalization standard (bottom panel).
(C-F) PRG-1 immunofluorescence (red) and DNA DAPI staining (blue) in dissected gonad arms from an adult hermaphrodite (C) and male (D), a two-cell embryo (E), and a four-cell embryo (F). In (C) and (D) the mitotic (MPZ) and meiotic zones (transition zone plus pachytene) are indicated, as are the proximal zones containing oocytes and sperm (respectively).
(G) Dual immunofluorescence analysis of three oocytes in the proximal arm of a wild-type hermaphrodite gonad stained for PRG-1 and PGL-1 as indicated. Yellow represents overlap in the merged image (bottom panel).

prg-1 Mutants Exhibit a Broad Spectrum of Germline Defects

A previous study demonstrated that RNAi targeting both prg-1 and prg-2 leads to reduced fertility (Cox et al., 1998). Our examination of the phenotypic contributions of recently identified probable null alleles revealed that most, if not all, of the germline defects result from the absence of prg-1. For example, prg-2 mutants exhibited wild-type brood sizes at both 20°C and 25°C (Figure 5A) as well as normal numbers of morphologically wild-type germ cells (compare Figures 5B and 5C). In contrast, prg-1 mutants exhibited dramatically reduced fertility at both temperatures (Figure 5A). Consistent with this phenotype, two different prg-1 mutant strains and a prg-1; prg-2 double mutant strain all exhibited a significant reduction in the total number of germ nuclei populating the adult gonad (Figures 5D-5F).
The numbers of germ nuclei were reduced in each zone but were most dramatically reduced in the mitotic zone in these mutants. The reduction in germ cell numbers was observed at all temperatures, and thus does not by itself explain the sterility of prg-1 mutants at 25°C.

Figure 4. PRG-1 Interacts with and Is Required for the Accumulation of All 21U-RNAs
(A) The percentage of 21 nt RNA reads from wild-type young adults (blue) and prg-1(tm872) young adult (pink) corresponding to each upstream motif score (rounded to the nearest unit). A score cutoff of 7 (orange) defined the 21U-RNA population.
(B) Severe depletion of 21U-RNAs in glp-4(bn2) and prg-1(tm872) mutant worms. Plotted for each library is the fraction of reads corresponding to 21U-RNAs, with bars colored as in Figure 2C.
(C) Severe depletion of endogenous siRNAs in glp-4(bn2) but not prg-1(tm872) mutant worms. Plotted for each library is the fraction of reads with 5' G nucleotides and complete antisense overlap with coding exons (Ambros et al., 2003; Ruby et al., 2006), with bars colored as in Figure 2C.
(D) Immunoprecipitation (IP) analysis of small RNAs in PRG-1 and GFP::ALG-1/2 complexes. Immunoprecipitations were performed on lysates prepared from an otherwise wild-type transgenic strain carrying GFP-tagged ALG-1 and ALG-2. The top panels show a northern blot successively probed for the indicated small RNAs. The lower panels show western blots probed as indicated.
(E) Biochemical analysis of the first nucleotide of the small-RNA population that coimmunoprecipitated with the PRG-1 protein (IP).
Bars show where the single nucleotides migrate in this thin-layer-chromatography system. (F)The length and 5' nucleotide distribution of reads from the input (top) and PRG-1 co-IP (bottom) libraries. To prevent underrepresentation of endogenous siRNAs, which usually begin with a 5' triphosphate, these libraries were constructed using a protocol that does not require a 5' monophosphate. (G)The percentage of 21 nt RNA reads from the input (blue) and PRG-1 co-IP (red) libraries at each upstream motif score, plotted as in (A). (H)The mapping of 21 U-RNA reads from the PRG-1 co-IP library (red) versus the young adult wild-type library prepared starting with T4 RNA ligase 1 (see Experimental Procedures; blue). Reads were classified as 21 U-RNAs by their motif scores, and normalized read counts were summed for each nonoverlapping 100 kb bin. Although prg-1 mutants exhibit temperature-dependent sterility, they do not appear to encode thermo-labile products. Rather, both alleles examined in this study are likely to represent null mutations (Yigit et al., 2006; Cuppen et al., 2007; Figure S3A). As expected for null mutants, the PRG-1 protein was either absent or truncated in these mutant strains at all temperatures (Figure S3B). Furthermore, the 21 U-RNA depletion associated with prg-1 mutants was observed at all temperatures examined, including the semipermissive temperatures of 150C and 200C. These findings suggest that, in addition to their role in maintaining proper germ-cell numbers at all temperatures, PRG-1/21 URNA complexes may function at higher temperatures to facilitate an otherwise temperature-dependent germline process required for normal fertility. Temperature-shift experiments demonstrated that the temperature-sensitive period of prg-1 mutants occurs during the adult stage. The fertility of animals shifted down from 250C as young adults was substantially rescued, to an average of 40 Molecular Cell 31, 67-78, July 11, 2008 @2008 Elsevier Inc. 71 ............ 
progeny (n = 10). Conversely, maintaining animals at 15°C during the L1 to adult stage, when the germline is proliferating most rapidly, did not significantly rescue the fertility defect. These results suggest that the germ cells produced in prg-1 null mutant animals (that entirely lack PRG-1 protein expression) are deficient in a process important for their functionality at elevated temperature.

To examine the relative contribution of defects in sperm versus oocytes to the reduced fertility of prg-1 mutants, mutant hermaphrodites raised at 25°C were mated to wild-type males. The temperature-dependent sterility of prg-1 was partially rescued, as the average number of prg-1 progeny produced by animals reared at 25°C was 3 (n = 10), but this number increased to 19 (n = 10) when prg-1 mutants were mated with wild-type males. These findings suggest that the fertility defects of prg-1 hermaphrodites stem, in part, from defects in the production and/or functionality of both the male and female gametes.

In summary, prg-1 mutants exhibit dramatically reduced germ-cell numbers at all temperatures, and the gametes produced are markedly more sensitive to temperature than are those of wild-type animals. For example, at 25°C wild-type animals produce ~200 progeny, about two-thirds of the brood size observed at 20°C, while prg-1 mutants produce an average brood size of only 3 progeny at 25°C, less than one-tenth the brood size of 40 observed at 20°C. This reduction in brood size at higher temperature correlates with a reduction in the number of embryos observed, consistent with the idea that ovulation or fertilization are impaired at higher temperature.

Figure 5. PRG-1 Exhibits a Broad Spectrum of Germline Defects
(A) Brood size analysis of prg-1 and prg-2 mutant strains. The brood size of "n" individual animals for each strain was determined at 20°C and 25°C. Left and right lines represent the highest and lowest values, respectively. Left and right ends of each box represent the 75th and 25th percentile, respectively; the diamond represents the average brood size; and the vertical line inside the box represents the median value.
(B-F) DAPI staining of excised gonads from wild-type, prg-1, and prg-2 strains (as indicated). Gonadal zones are indicated as in Figure 3.

prg-1 Mutants Exhibit Surprisingly Subtle Changes in Gene Expression
On chromosome IV hundreds of protein-encoding genes are interspersed with intergenic and intronic 21U-RNA loci over genomic regions that are millions of base pairs in length. Therefore, tiling arrays were used to profile changes in gene expression to determine whether the absence of 21U-RNAs in prg-1 mutants might cause significant perturbations of gene expression either on this autosome or elsewhere. We found that prg-1 and wild-type animals have broadly similar patterns of gene expression. Notably, genes located near 21U-RNA loci, including genes located within and around the major clusters of 21U-RNA loci on chromosome IV, were not significantly altered in their expression (Figure 6A). Among 88 groups of developmentally coregulated genes, also referred to as gene "mountains" (Kim et al., 2001), 66 were essentially unchanged between the wild-type and prg-1 strains (Figure 6B). Among the 16 mountains with decreased expression in prg-1 mutants were several mountains with germline functions such as cell division and oogenesis. Among the six mountains with increased expression was one containing spermatogenesis-related genes.

In C. elegans a large class of RdRP-derived endogenous siRNAs (endo-siRNAs) target transposons and repetitive sequences as well as numerous protein-encoding genes (Ambros et al., 2003; Ruby et al., 2006; W.G. and D.C., unpublished data). Although PRG-1 does not appear to interact directly with small RNAs of this type (Figure 6C and Tables S2 and S3), we wondered whether 21U-RNAs might be linked, perhaps indirectly, to changes in the patterns of endo-siRNA expression. In many instances, changes in endo-siRNA levels correlated inversely with changes in gene expression from the corresponding interval (Figure 6D and Table S4). However, the regions with significant changes in endo-siRNA levels were not correlated with regions containing 21U-RNAs or sequences with extended sequence similarity to 21U-RNAs.

One curious exception to this finding was the transposon Tc3, within which resides a single 21U-RNA. Found in all 22 Tc3 genomic loci, 21U-RNA-15703 overlaps the 3' inverted repeat (IR) downstream of, and in the same orientation as, the transposase gene (Figure 6E). This sequence was identified three times among two million reads in our small-RNA library prepared from the PRG-1 immune complex, an apparent enrichment when compared to only 12 reads in over thirty million from the remaining non-IP-associated data set. Examination of the endo-siRNA profile across a representative Tc3 element revealed two types of endo-siRNA reads. The first were antisense to the transposase gene and were unaffected in prg-1(tm872) mutants (Figure 6F). The second were directed, with a marked strand asymmetry, toward the Tc3 IR regions and were severely depleted in prg-1(tm872) mutants (Figure 6F). Neither the IR-directed nor the transposase-directed siRNAs exhibited coimmunoprecipitation with PRG-1 (Figure 6G). Although the numbers of endo-siRNAs targeting the transposase gene were not significantly reduced in prg-1, we nevertheless observed a 3- to 4-fold upregulation of the Tc3 transposase mRNA (Figure 6H). Upregulation of the transposon mRNA, as well as a greater than 100-fold increase in Tc3 transposition frequency, were also observed for two different prg-1 mutant alleles in a parallel study (Das et al., 2008 [this issue of Molecular Cell]; see Discussion).

DISCUSSION

AGO-protein/small-RNA complexes mediate biological activities that fall into the two broad categories of genomic surveillance and gene regulation. Several studies suggest that a metazoan-specific branch of the AGO family, called the Piwi AGOs, have become specialized to provide surveillance functions required for germline maintenance in animals (reviewed in Aravin et al., 2007). C. elegans contains one of the largest and best studied families of AGO proteins. Yet, beyond a general requirement for fertility (Yigit et al., 2006), the function of C. elegans Piwi-related AGOs and the nature of their small-RNA cofactors had not been explored. We have shown that PRG-1, a Piwi subfamily AGO, interacts with 21U-RNAs, which are encoded by over 15 thousand genomic loci broadly clustered in two regions of chromosome IV. These findings link this unusual class of small RNAs to an RNAi-related pathway and suggest that PRG-1 and 21U-RNAs form an RNP complex required for proper germline development. The sequence repertoire of 21U-RNAs appears to be more diverse than expected by chance, and, with the exception of Tc3 discussed below, obvious sequence-specific targets for 21U-RNAs are not found in the C. elegans genome.

piRNAs in Worms, Flies, and Mammals
Piwi AGOs bind small RNAs (piRNAs) with the following characteristics: a Dicer-independent biogenesis, a 5' end with a monophosphate and a strong bias for Uracil, and a 3' end that is modified and resistant to periodate degradation (reviewed in Klattenhoff and Theurkauf, 2008). The C. elegans 21U-RNAs share these characteristics but also exhibit several other unique properties (Ruby et al., 2006). Perhaps the most remarkable distinction is that 21U-RNAs originate from thousands of loci that frequently share a common upstream motif and are clustered in two large regions of one autosome. Within these two large regions of two million and four million base pairs, respectively, the 21U-RNA loci are interspersed on both strands and rarely overlap with each other, repeat elements, or coding regions. Instead they localize to introns and intergenic regions within these chromosomal regions at an average density of one 21U-RNA locus every 200-300 bp.

In other organisms, piRNAs lack discernible upstream motifs and are often found in much smaller clusters dispersed on all chromosomes. In flies a subgroup of piRNAs, originally termed repeat-associated siRNAs (rasiRNAs), are derived primarily from within repeats and transposons and appear to target transposons for silencing (Brennecke et al., 2007; Gunawardane et al., 2007; Saito et al., 2006). Furthermore, unlike 21U-RNAs, repeat-associated piRNAs derived from opposite strands frequently overlap.

In mammals, two types of piRNA clusters have been identified based on their temporal expression during spermatogenesis. Similar to Drosophila rasiRNAs, piRNAs expressed prior to meiotic pachytene in mice are derived from repeat- and transposon-rich clusters. These rasi-like piRNAs interact with the MILI AGO, which is expressed in the same developmental stages (Aravin et al., 2007). During pachytene a second type of piRNA becomes abundant, which is derived from clusters that differ from both 21U-RNA clusters and rasiRNA clusters. These pachytene piRNA clusters span tens of thousands of bases, the length of a typical pre-mRNA transcript. Within these clusters the piRNAs exhibit remarkable strand bias, as though all the piRNAs within a region are processed from a single RNA-Polymerase II transcript or from two divergent transcripts (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006; Lau et al., 2006). In contrast, neighboring 21U-RNA loci, even those within the same intron of an annotated gene, appear to have autonomous biogenesis, each with their own 5' motif and deriving from the opposite strand about as often as from the same strand.

Despite these striking differences, mammalian pachytene piRNAs are similar to 21U-RNAs in one very intriguing way. Both types of small RNA encode tremendous sequence diversity and yet seem to lack obvious targets. In general, 21U-RNAs do not match repeat sequences or protein coding genes with a frequency any higher than that expected by chance.

Piwi-AGO Complexes Exhibit a Conserved Localization in Germline Nuage
We have shown that the PRG-1 protein localizes to the germline nuage, called P granules, in C. elegans. In other animals, Piwi AGOs show similar localization. In both Drosophila (AGO3 and Aubergine) and zebrafish (Ziwi), Piwi proteins localize to perinuclear nuage structures (Brennecke et al., 2007; Houwing et al., 2007). A third Piwi protein from Drosophila, Piwi itself, exhibits a more complex distribution, localizing to the nuclei of both germ cells and somatic cells (Brennecke et al., 2007; Cox
et al., 2000). In mice, the localization of Miwi and Mili has been analyzed, and, although their expression peaks at different times, both are cytoplasmic proteins present in developing spermatids but absent in mature sperm (Deng and Lin, 2002; Kuramochi-Miyagawa et al., 2004).

A striking feature of PRG-1 localization was its presence in P granules throughout development. In germline stem cells and developing gametes of C. elegans, P granules are localized in a perinuclear pattern and are often found in apposition to nuclear pores (Pitt et al., 2000). They are thought to function in the sorting and storage of messages involved in gametogenesis and in subsequent parentally programmed zygotic development (Strome, 2005). In the fertilized egg and early embryo, the P granules dissociate from the nuclear periphery and are distributed in the cytoplasm. In the male germline, P granules are present in dividing stem cells as well as meiotic spermatocytes but rapidly disappear as the spermatids mature.

[Figure 6 image not recoverable from the extracted text. Legible axis labels: "Log2 probe enrichment in prg-1(tm872) vs. WT", "Ratio: Input vs IP", and "Fold change in TC3A transposase mRNA".]

Figure 6. prg-1 Mutants Exhibit Surprisingly Subtle Changes in Gene Expression
(A) Gene expression was not preferentially affected in the 21U-rich portions of the C. elegans genome. For each of the indicated probe sets, median values are shown with error bars indicating 25th and 75th percentiles and "n" indicating the number of probes.
(B) The overall expression of some gene mountains was significantly altered in the prg-1(tm872) mutant. All probes overlapping the exons of all genes from each mountain (Kim et al., 2001) were considered, and median log-fold changes were plotted as in (A) for those mountains changing by 0.4 log2 units.
(C) 21U-RNA depletion in the prg-1(tm872) mutant and enrichment in the PRG-1 co-IP. The x axis indicates the ratio of read frequencies between the input versus PRG-1 co-IP libraries described in Figures 4F-4H. The y axis indicates the ratio of antisense read frequencies between the wild-type and prg-1(tm872) mutant siRNA-enriched libraries (made using a protocol that does not require a 5' monophosphate and therefore captures endogenous siRNAs beginning with a 5' triphosphate). Each blue dot indicates the antisense read count for one gene whose wild-type siRNA-enriched read count is ≥500. Each red dot indicates the read count for a 21U-RNA species with 200 reads from the young adult wild-type library prepared starting with T4 RNA ligase 1 (see Experimental Procedures) and at least one read between the two libraries of each plot axis.

[Figure 7 image not recoverable from the extracted text.]

Figure 7. Models for 21U-RNA Function
(A) Regulation of TC3 inverted repeats by PRG-1/21U-RNA-15703.
(B) Regulation of germline transcripts by imperfect base pairing.
Finally, similar to other organisms where piRNA expression correlates tightly with the expression of their Piwi-class AGO binding partners (Aravin et al., 2006; Girard et al., 2006; Houwing et al., 2007), the expression of 21U-RNAs closely correlated with the expression of PRG-1.

A Potential Role for 21U-RNAs in Tc3 Silencing
In C. elegans, members of an expanded worm-specific AGO clade (the WAGOs) are required for the majority of transposon silencing and appear to function with RdRP-derived siRNAs (Tijsterman et al., 2002). Surprisingly, the silencing of a single transposon family, Tc3, appears to depend on both WAGO family members (Vastenhouw et al., 2003) and on prg-1 (Das et al., 2008). We found a single 21U-RNA, 21U-RNA-15703, that mapped to Tc3. This 21U-RNA appeared enriched among small RNAs recovered from the PRG-1 immune complex but was located downstream of the transposase 3'UTR in the sense orientation and thus could not directly silence the transposase mRNA. Interestingly, 21U-RNA-15703 was located just upstream of a series of siRNAs associated with the Tc3 inverted repeats (IR). The production of IR-associated siRNAs depended on prg-1 but also required the activities of two RdRPs and of an AGO in the WAGO clade (data not shown).

The production of the PRG-1-dependent IR-associated siRNAs could be explained by a two-step model similar to one previously described for RDE-1-directed silencing in C. elegans (Yigit et al., 2006; Sijen et al., 2007; Pak and Fire, 2007). If a PRG-1 complex containing 21U-RNA-15703 were to cleave a target RNA that extended into Tc3 from the downstream genomic region (Figure 7A), it could create a template for the RdRP-dependent synthesis of the secondary IR-associated siRNAs. How the loss of these IR-associated siRNAs might lead to activation of Tc3 in prg-1 mutants remains unclear. Perhaps their loss leads to alterations in chromatin structure in the IRs or to changes in the expression of IR-associated regulatory transcripts.
Such changes could explain the 3- to 4-fold increase in transposase mRNA levels observed by qRT-PCR and might also render the IR genomic regions more accessible for transposase-directed endonucleolytic cleavage. The notion that PRG-1 may serve as an upstream AGO capable of triggering secondary siRNA production has implications for how other 21U-RNAs may function and could explain how loss of an exceptionally low-abundance 21U-RNA could cause the 100-fold increase in transposition of Tc3 (Das et al., 2008).

Figure 6 (legend, continued)
(D) Changes to mRNAs compared to their corresponding siRNA in the prg-1(tm872) mutants. Each point indicates a gene with 10 array probes and >500 antisense reads from the WT siRNA-enriched library overlapping annotated exons. The x axis is as in (A). The y axis is as in (C).
(E) A schematic view of a full-length Tc3 transposon showing the inverted repeats (gray) and Tc3A transposase gene (red). The position of 21U-RNA-15703 is indicated with a red asterisk.
(F) Density of reads mapping to the sense (blue) and antisense (orange) strands of the Tc3 element from (E). Reads per 50 nt window are plotted for the wild-type (top) and prg-1(tm872) mutant (bottom) siRNA-enriched libraries. Read counts are not normalized to the number of genomic matches. Dashed gray lines indicate 0.002% of each library.
(G) Density of reads mapping to the sense (blue) and antisense (orange) strands of the Tc3 element from (E). Reads per 50 nt window are shown from the input (top) and PRG-1 co-IP (bottom) libraries. Read counts are not normalized to the number of genomic matches. Dashed gray lines indicate 0.002% of each library.
(H) Expression of the TC3A mRNA. Primers recognizing TC3A mRNA were used in quantitative RT-PCR on mRNA generated from worms with the indicated genotypes, using actin (act-3) mRNA as the normalization standard.
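The read-density profiles of Figures 6F and 6G tally reads in 50 nt windows along the Tc3 element, separately for each strand, and mark 0.002% of each library with a dashed line. A minimal Python sketch of such a tally is given here. The function names, the (position, strand) input format, and the choice to assign each read to a window by its 5' position are illustrative assumptions and not the pipeline used in the study; as in the legend, counts are not normalized to the number of genomic matches.

```python
def window_density(reads, element_length, window=50):
    """Count reads per fixed-width window on each strand.

    reads: iterable of (position, strand) pairs, where position is the
           0-based coordinate of the read's 5' end within the element
           (an assumption made for this sketch) and strand is '+' or '-'.
    Returns (sense_counts, antisense_counts), one integer per window.
    """
    n_windows = (element_length + window - 1) // window  # ceiling division
    counts = {"+": [0] * n_windows, "-": [0] * n_windows}
    for position, strand in reads:
        if 0 <= position < element_length:
            counts[strand][position // window] += 1
    return counts["+"], counts["-"]


def library_threshold(library_size):
    """Read count corresponding to 0.002% of a library (the dashed line)."""
    return library_size * 2 / 100_000


# Hypothetical reads on a hypothetical 200 nt element.
example_reads = [(0, "+"), (49, "+"), (50, "+"), (120, "-")]
sense, antisense = window_density(example_reads, element_length=200)
print(sense, antisense)
print(library_threshold(2_000_000))
```

Expressing the threshold as a fraction of library size lets profiles from libraries of different depth, such as the input and co-IP libraries, be read against a comparable reference line.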
A Conserved Function for piRNA Complexes in Maintaining Pluripotency
Despite differences in their expression and the types of clusters from which they derive, our findings suggest that the overwhelming majority of 21U-RNAs and the abundant pachytene piRNAs of mammals share some intriguing similarities. Perhaps most notably, they share the confounding feature that, with few exceptions, they lack recognizable targets upon which they might specifically act. Although a number of genes exhibit changes in expression in prg-1 mutants, these changes could easily reflect alterations that arise indirectly. A parallel study has suggested that spermatogenesis-related gene expression is downregulated in prg-1 mutant males (Wang and Reinke, 2008). Conversely, our studies revealed an apparent upregulation of several spermatogenesis-related genes in prg-1 mutant hermaphrodites. However, in these instances, unlike the Tc3 example, there is no direct evidence linking specific 21U-RNAs to the regulated genes; it therefore seems probable that these apparent discrepancies reflect indirect consequences of developmental defects and changes in germ-cell number that occur in the prg-1 mutant gonads. Overall, our analyses suggest that there is no correlation between genes whose expression is altered in prg-1 mutants and the proximity of those genes to 21U-RNA loci.

One possible model to explain this paradox is to imagine that PRG-1/21U-RNA complexes may base-pair imperfectly with targets. A precedent for this already exists with animal miRNAs and most of their targets, for which pairing to miRNA seed nucleotides 2-8 is often sufficient for target recognition (Grimson et al., 2007). However, if similar partial matches were sufficient for piRNA-mediated regulation, then the entire transcriptome could potentially be placed under 21U-RNA-directed regulation. Perhaps 21U-RNAs act collectively, through partial sequence matches, to negatively regulate gene expression broadly.
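The scale of this argument can be illustrated with a short Python sketch that locates sites complementary to nucleotides 2-8 of a small RNA and estimates how many such sites would arise by chance. This is only an illustration under stated assumptions: the function names and example sequences are hypothetical, the chance estimate assumes uniform base composition and treats every small RNA as contributing an independent seed, and nothing here implies that 21U-RNAs actually recognize targets through seed pairing.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(small_rna, first=2, last=8):
    """Target-site sequence (5'->3') complementary to nucleotides
    first..last (1-based, inclusive) of the small RNA."""
    seed = small_rna[first - 1:last]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def count_seed_sites(small_rna, transcript):
    """Number of (possibly overlapping) seed-match sites in a transcript."""
    site = seed_site(small_rna)
    return sum(
        1
        for i in range(len(transcript) - len(site) + 1)
        if transcript[i:i + len(site)] == site
    )


def expected_chance_sites(transcript_length, n_small_rnas, site_length=7):
    """Expected chance matches, assuming uniform random base composition
    and independent seeds (both simplifying assumptions)."""
    positions = max(transcript_length - site_length + 1, 0)
    return positions * n_small_rnas / 4 ** site_length


# Illustrative 21 nt RNA and a made-up transcript containing two sites.
rna = "UGAGGUAGUAGGUUGUAUAGU"
print(seed_site(rna))
print(count_seed_sites(rna, "AAACUACCUCAAACUACCUCG"))
print(expected_chance_sites(1000, 15000))
```

Under these assumptions, a hypothetical 1 kb transcript would be expected to carry on the order of nine hundred chance 7 nt matches to a set of 15,000 small RNAs, which is why seed-type pairing alone could place essentially any transcript within reach of the 21U-RNA repertoire.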
For example, germline-expressed mRNAs recognized by 21U-RNA/PRG-1 complexes could be stored in the cytoplasm (perhaps within P granules) until a secondary factor releases repression (Figure 7B). Such a mechanism would require the maintenance of sequence diversity within the 21U-RNA family as a whole, rather than conservation of specific 21U-RNA sequences. Out of more than 15,000 different 21U-RNAs encoded in C. elegans, only one transposon-directed 21U-RNA was identified, strongly suggesting that transposon silencing is not the only function mediated by this ancient metazoan-specific group of AGOs. It is interesting to note that many mammals, including humans, have, at great apparent cost to their fitness (Werdelin and Nilsonne, 1999), derived morphological adaptations that place the male germline external to the body cavity. Perhaps this adaptation is necessary to facilitate the same temperature-sensitive process in gametogenesis that is also facilitated in part by PRG-1.

EXPERIMENTAL PROCEDURES

Worm Strains
The Bristol strain N2 was used as the standard wild-type strain. Alleles used in this study are listed below, grouped by chromosome: LGI: glp-4(bn2), prg-1(tm872), prg-1(pk2298), rde-3(ne3364), ego-1(om71), rrf-1(ok589), rrf-2(pk2040); LGII: rrf-3(pk1426); LGIII: dcr-1(ok247), rde-4(ne299), mut-7(ne311), eft-3(q145), qC1[nes(myo2::avr-15, rol-6, unc-22(RNAi))]; LGIV: fem-1(hc17), prg-2(ok1328), prg-2(tm1094); LGV: fog-2(q71). AGO deletions described in Yigit et al. (2006) were also assayed for levels of 21U-RNA-1 and 21U-RNA-3442. C. elegans culture and genetics were as described in Brenner (1974).

Antibody Generation
Anaspec generated and purified the PRG-1 antibody in rabbits using the following peptides: RGSGSNNSGGKDQKYL and RQQGQSKTGSSGQPQKC.

Biochemistry and Molecular Biology
Protein and RNA purifications were performed as described in Hutvagner et al. (2004) and Duchaine et al. (2006), respectively. Antibodies used in this study are as follows: (1) monoclonal antibody anti-AFP 3E6 (Qbiogene), (2) an affinity-purified polyclonal anti-PRG-1 antibody, (3) HRP-conjugated secondary antibody (Jackson Immunoresearch), (4) anti-tubulin (Accurate Chemical). Northern blot analysis was performed as in Duchaine et al. (2006). A more detailed description can be found in the Supplemental Experimental Procedures.

Quantitative Real-Time PCR
Real-time PCR was performed using Superscript II Reverse Transcriptase (Invitrogen) and Applied Biosystems SYBR Green PCR Master mix according to the supplier's instructions. Primer sequences are available upon request.

Immunostaining and Microscopy
Gonads were prepared for indirect immunofluorescence as in Pasierbek et al. (2001) and incubated with primary antibody (K76 [Kawasaki et al., 1998] and the anti-PRG-1 antibodies described above) overnight at 4°C. Cy-3 anti-mouse IgM, and FITC or TRITC anti-rabbit secondary antibodies (Jackson Immunoresearch), were used to detect K76 anti-PGL-1 and anti-PRG-1, respectively. Slides were mounted in Vectashield (Vector Labs) with DAPI. All images were collected using a Hamamatsu Orca-ER digital camera mounted on a Zeiss Axioplan 2 microscope and with Openlab software.

Small-RNA Cloning
Small endogenous C. elegans RNAs from embryos, five distinct larval stages (L1, L2, L3, L4, and dauer), mixed-stage animals, young adults from glp-4(bn2), prg-1(tm872), fog-2(q71) mutant backgrounds, and wild-type control worms were prepared for sequencing using a protocol derived from Lau et al. (2001). Libraries generated from wild-type and prg-1(tm872) were constructed as described by W.G. and D.C. (unpublished data).
To generate small-RNA libraries from PRG-1 immunocomplexes, PRG-1 IPs were performed on 70 mg of total wild-type protein as described in Duchaine et al. (2006). For comparison, total RNA was extracted from a fraction of worms equivalent to that used for the PRG-1 IPs. These small-RNA libraries were constructed using a method that does not require a 5' monophosphate (Ambros et al., 2003). PCR products generated for all the samples described above were sequenced on a Solexa sequencing platform (Illumina, Inc.) (Seo et al., 2004). Detailed description of the cloning protocols, as well as data analysis, can be found in the Supplemental Experimental Procedures.

Biochemical Analysis of 5' Nucleotide
Small RNAs in the 18-26 nt range, obtained from total RNA and the RNA fraction that coimmunoprecipitated with PRG-1, were gel purified, treated with Calf Intestinal Alkaline Phosphatase (NEB) in the presence of 1 U of Super RNase Inhibitor (Ambion), and labeled at the 5' end with T4 Polynucleotide Kinase in the presence of γATP. The 5' end-labeled RNAs were gel purified and incubated with nuclease P1 (US Biological). Samples were spotted on a TLC plate developed with 0.5 M lithium chloride.

Tiling Microarray Procedures
Total RNA was extracted as described above and prepared using the RiboPure total RNA isolation kit (Ambion). Labeling reactions were performed following the manufacturer's protocols with the GeneChip WT Double-Stranded cDNA Synthesis Kit (Affymetrix), GeneChip Sample Cleanup Module (Affymetrix), and the GeneChip WT Double Stranded DNA Terminal Labeling Kit (Affymetrix). Array hybridization to GeneChip C. elegans Tiling 1.0R chips was done using standard Affymetrix protocols and reagents. Signal values for each array probe were calculated using Affymetrix Tiling Analysis Software 1.1.2 (bandwidth: 30; intensities: PM/MM) with three replicates of prg-1(tm872) experimental data sets and three wild-type control data sets. Probe overlap with annotations was assessed using the Affymetrix-provided ce4 coordinate, which indicates the genomic position matching the center of the array probe.

ACCESSION NUMBERS

All RNA sequences extracted from Illumina reads as described were deposited in the Gene Expression Omnibus with the following accession number: GSE11738. Included under this accession number are the following data sets: developmental time-course/mixed stage, 5' monophosphate-dependent; prg-1(tm872) and fog-2(q71) mutant analysis, 5' monophosphate-dependent; prg-1(tm872) mutant analysis, 5' monophosphate-independent; and the PRG-1 co-IP analysis. 21U-RNA sequences are provided as a supplemental Fasta-formatted text file (Table S1). Tools for scoring 21U-RNA loci trained using data from Ruby et al. (2006) and applied here are available for anonymous download at http://web.wi.mit.edu/bartel/pub/.

SUPPLEMENTAL DATA

The Supplemental Data include Supplemental Results, Supplemental Experimental Procedures, three figures, and four tables and can be found with this article online at http://www.molecule.org/cgi/content/full/31/1/67/DC1/.

ACKNOWLEDGMENTS

We thank our labmates for many helpful discussions and comments on the manuscript; Fan Zhang for her early efforts on this project; Eric Miska for sharing unpublished data; and R. Ketting, the CGC, and the C. elegans Gene Knockout Consortium for providing strains. P.J.B. is supported by a predoctoral fellowship from Fundação para Ciência e Tecnologia (SFRH/BD/11803/2003), Portugal. D.A.C. is supported by a predoctoral fellowship from Fundação para Ciência e Tecnologia (SFRH/BD/17629/2004/H6BM). J.M.C. is an HHMI fellow of the LSRF. C.C.M. and D.P.B. are Howard Hughes Medical Institute Investigators.
This work was funded in part by the National Institutes of Health (GM58800 and GM67031).

Received: December 21, 2007
Revised: June 3, 2008
Accepted: June 9, 2008
Published online: June 19, 2008

REFERENCES

Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. (2003). MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr. Biol. 13, 807-818.
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., et al. (2006). A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442, 203-207.
Aravin, A.A., Sachidanandam, R., Girard, A., Fejes-Toth, K., and Hannon, G.J. (2007). Developmentally regulated piRNA clusters implicate MILI in transposon control. Science 316, 744-747.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708-715.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and Hannon, G.J. (2007). Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089-1103.
Brenner, S. (1974). The genetics of Caenorhabditis elegans. Genetics 77, 71-94.
Cox, D.N., Chao, A., Baker, J., Chang, L., Qiao, D., and Lin, H. (1998). A novel class of evolutionarily conserved genes defined by piwi are essential for stem cell self-renewal. Genes Dev. 12, 3715-3727.
Cox, D.N., Chao, A., and Lin, H. (2000). piwi encodes a nucleoplasmic factor whose activity modulates the number and division rate of germline stem cells. Development 127, 503-514.
Cuppen, E., Gort, E., Hazendonk, E., Mudde, J., van de Belt, J., Nijman, I.J., Guryev, V., and Plasterk, R.H. (2007). Efficient target-selected mutagenesis in Caenorhabditis elegans: toward a knockout for every gene. Genome Res. 17, 649-658.
Das, P.P., Bagijn, M.P., Goldstein, L.D., Woolford, J.R., Lehrbach, N.J., Sapetschnig, A., Buhecha, H.R., Gilchrist, M.J., Howe, K.L., Stark, R., et al. (2008). Piwi and piRNAs act upstream of an endogenous siRNA pathway to suppress Tc3 transposon mobility in the Caenorhabditis elegans germline. Mol. Cell 31, this issue, 79-90.
Deng, W., and Lin, H. (2002). miwi, a murine homolog of piwi, encodes a cytoplasmic protein essential for spermatogenesis. Dev. Cell 2, 819-830.
Duchaine, T.F., Wohlschlegel, J.A., Kennedy, S., Bei, Y., Conte, D.J., Pang, K., Brownell, D.R., Harding, S., Mitani, S., Ruvkun, G., Yates, J.R., III, and Mello, C.C. (2006). Functional proteomics reveals the biochemical niche of C. elegans DCR-1 in multiple small-RNA-mediated pathways. Cell 124, 343-354.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. (2006). A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442, 199-202.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. (2006). A novel class of small RNAs in mouse spermatogenic cells. Genes Dev. 20, 1709-1714.
Gunawardane, L.S., Saito, K., Nishida, K.M., Miyoshi, K., Kawamura, Y., Nagami, T., Siomi, H., and Siomi, M.C. (2007). A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315, 1587-1590.
Horwich, M.D., Li, C., Matranga, C., Vagin, V., Farley, G., Wang, P., and Zamore, P.D. (2007). The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC. Curr. Biol. 17, 1265-1272.
Houwing, S., Kamminga, L.M., Berezikov, E., Cronembold, D., Girard, A., van den Elst, H., Filippov, D.V., Blaser, H., Raz, E., Moens, C.B., et al. (2007). A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in Zebrafish. Cell 129, 69-82.
Hutvagner, G., and Simard, M.J. (2007). Argonaute proteins: key players in RNA silencing. Nat. Rev. Mol. Cell Biol. 9, 22-32.
Hutvagner, G., Simard, M.J., Mello, C.C., and Zamore, P.D. (2004). Sequence-specific inhibition of small RNA function. PLoS Biol. 2, E98. 10.1371/journal.pbio.0020098.
Kawasaki, I., Shim, Y.H., Kirchner, J., Kaminker, J., Wood, W.B., and Strome, S. (1998). PGL-1, a predicted RNA-binding component of germ granules, is essential for fertility in C. elegans. Cell 94, 635-645.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N., and Davidson, G.S. (2001). A gene expression map for Caenorhabditis elegans. Science 293, 2087-2092.
Kirino, Y., and Mourelatos, Z. (2007). The mouse homolog of HEN1 is a potential methylase for Piwi-interacting RNAs. RNA 13, 1397-1401.
Klattenhoff, C., and Theurkauf, W. (2008). Biogenesis and germline functions of piRNAs. Development 135, 3-9.
Kuramochi-Miyagawa, S., Kimura, T., Ijiri, T.W., Isobe, T., Asada, N., Fujita, Y., Ikawa, M., Iwai, N., Okabe, M., Deng, W., et al. (2004). Mili, a mammalian member of piwi family gene, is essential for spermatogenesis. Development 131, 839-849.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and Kingston, R.E. (2006). Characterization of the piRNA complex from rat testes. Science 313, 363-367.
Lin, H. (2007). piRNAs in the germ line. Science 316, 397.
Ohara, T., Sakaguchi, Y., Suzuki, T., Ueda, H., Miyauchi, K., and Suzuki, T. (2007). The 3' termini of mouse Piwi-interacting RNAs are 2'-O-methylated. Nat. Struct. Mol. Biol. 14, 349-350.
Pak, J., and Fire, A. (2007). Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science 315, 241-244.
Pasierbek, P., Jantsch, M., Melcher, M., Schleiffer, A., Schweizer, D., and Loidl, J. (2001). A Caenorhabditis elegans cohesion protein with functions in meiotic chromosome pairing and disjunction. Genes Dev. 15, 1349-1360.
Pitt, J.N., Schisa, J.A., and Priess, J.R. (2000). P granules in the germ cells of Caenorhabditis elegans adults are associated with clusters of nuclear pores and contain RNA. Dev. Biol. 219, 315-333.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. (2006). Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207.
Saito, K., Nishida, K.M., Mori, T., Kawamura, Y., Miyoshi, K., Nagami, T., Siomi, H., and Siomi, M.C. (2006). Specific association of Piwi with rasiRNAs derived from retrotransposon and heterochromatic regions in the Drosophila genome. Genes Dev. 20, 2214-2222.
Saito, K., Sakaguchi, Y., Suzuki, T., Suzuki, T., Siomi, H., and Siomi, M.C. (2007). Pimet, the Drosophila homolog of HEN1, mediates 2'-O-methylation of Piwi-interacting RNAs at their 3' ends. Genes Dev. 21, 1603-1608.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. (2004). Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. Proc. Natl. Acad. Sci. USA 101, 5488-5493.
Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. (2007). Secondary siRNAs result from unprimed RNA synthesis and form a distinct class. Science 315, 244-247.
Strome, S. (2005). Specification of the germ line. In WormBook, The C. elegans Research Community, ed. 10.1895/wormbook.1.9.1, http://www.wormbook.org.
Strome, S., and Wood, W.B. (1982). Immunofluorescence visualization of germ-line-specific cytoplasmic granules in embryos, larvae, and adults of Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 79, 1558-1562.
Tijsterman, M., Okihara, K.L., Thijssen, K., and Plasterk, R.H. (2002). PPW-1, a PAZ/PIWI protein required for efficient germline RNAi, is defective in a natural isolate of C. elegans. Curr. Biol. 12, 1535-1540.
Vastenhouw, N.L., Fischer, S.E., Robert, V.J., Thijssen, K.L., Fraser, A.G., Kamath, R.S., Ahringer, J., and Plasterk, R.H. (2003). A genome-wide screen identifies 27 genes involved in transposon silencing in C. elegans. Curr. Biol. 13, 1311-1316.
Wang, G., and Reinke, V. (2008). A C. elegans Piwi, PRG-1, regulates 21U-RNAs during spermatogenesis. Curr. Biol. 18, in press. Published online May 22, 2008. 10.1016/j.cub.2008.05.009.
Werdelin, L., and Nilsonne, A. (1999). The evolution of the scrotum and testicular descent in mammals: a phylogenetic view. J. Theor. Biol. 196, 61-72.
Yigit, E., Batista, P.J., Bei, Y., Pang, K.M., Chen, C.C., Tolia, N.H., Joshua-Tor, L., Mitani, S., Simard, M.J., and Mello, C.C. (2006). Analysis of the C. elegans Argonaute family reveals that distinct Argonautes act sequentially during RNAi. Cell 127, 747-757.

Nature Vol 455 | 30 October 2008 | doi:10.1038/nature07415

ARTICLES

Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals

Andrew Grimson1,2, Mansi Srivastava4, Bryony Fahey3, Ben J. Woodcroft3, H. Rosaria Chiang1,2, Nicole King4, Bernard M. Degnan3, Daniel S. Rokhsar4,5 & David P. Bartel1,2

In bilaterian animals, such as humans, flies and worms, hundreds of microRNAs (miRNAs), some conserved throughout bilaterian evolution, collectively regulate a substantial fraction of the transcriptome. In addition to miRNAs, other bilaterian small RNAs, known as Piwi-interacting RNAs (piRNAs), protect the genome from transposons. Here we identify small RNAs from animal phyla that diverged before the emergence of the Bilateria.
The cnidarian Nematostella vectensis (starlet sea anemone), a close relative to the Bilateria, possesses an extensive repertoire of miRNA genes, two classes of piRNAs and a complement of proteins specific to small-RNA biology comparable to that of humans. The poriferan Amphimedon queenslandica (sponge), one of the simplest animals and a distant relative of the Bilateria, also possesses miRNAs, both classes of piRNAs and a full complement of the small-RNA machinery. Animal miRNA evolution seems to have been relatively dynamic, with precursor sizes and mature miRNA sequences differing greatly between poriferans, cnidarians and bilaterians. Nonetheless, miRNAs and piRNAs have been available as classes of riboregulators to shape gene expression throughout the evolution and radiation of animal phyla.

The RNA interference (RNAi) pathway, which processes long double-stranded RNA into small interfering RNAs and uses them to mediate gene silencing, is present in diverse eukaryotes, presumably with a role in transposon silencing or viral defence since early in eukaryotic evolution1. Building on this basal pathway, which includes the Dicer endonuclease and the argonaute (Ago) effector protein, some eukaryotic lineages have acquired additional pathways, each using unique classes of small RNAs to guide silencing. MicroRNAs, ~21-24-nucleotide RNAs that derive from distinctive hairpin precursors, pair to messenger RNAs to direct their post-transcriptional repression2. More than one-third of human genes are under selective pressure to maintain pairing to miRNAs, implying that these riboregulators influence the expression of much of the transcriptome. Piwi-interacting RNAs are longer, ~25-30 nucleotides, with incompletely characterized biogenic pathways.
In mammals and flies, piRNA expression is restricted to the germ line, where they have crucial roles in transposon defence, although one class of mammalian piRNAs, highly expressed at the pachytene stage of sperm development, has unknown function.

The plant and algal miRNAs have gene structure, biogenesis and targeting properties distinct from those of animals. These differences, considered together with the absence of miRNAs in fungi and all other intervening lineages examined, have led to the conclusion that miRNAs of animals and plants had independent origins. Of the many miRNAs reported in Bilateria (Fig. 1), ~30 appear to have been present in ancestral bilaterians; however, none have been reported in the earliest branching animal lineages, leading to the hypothesis that bilaterian complexity might, in part, be due to miRNA-mediated regulation. Likewise, piRNAs have not been reported outside Bilateria, raising the question of whether a rich small-RNA biology is characteristic of more complex animals, or whether these small RNAs might have emerged earlier in metazoan evolution.

[Figure 1 cladogram; leaf labels, top to bottom: Homo sapiens (human, 677 miRNAs); Mus musculus (mouse, 491 miRNAs); Drosophila melanogaster (fly, 147 miRNAs); Caenorhabditis elegans (nematode, 154 miRNAs); Schmidtea mediterranea (planarian, 61 miRNAs); Nematostella vectensis (sea anemone); Trichoplax adhaerens (placozoan); Amphimedon queenslandica (sponge); Monosiga brevicollis (choanoflagellate); Schizosaccharomyces pombe (yeast, 0 miRNAs); Saccharomyces cerevisiae (yeast, 0 miRNAs); Neurospora crassa (fungus, 0 miRNAs); Arabidopsis thaliana (flowering plant, 199 miRNAs); Physcomitrella patens (moss, 263 miRNAs); Chlamydomonas reinhardtii (green alga, 72 miRNAs).]

Figure 1 | Phylogenetic distribution of annotated miRNAs. Cladogram of selected eukaryotes, with organisms investigated in this study indicated in red. Branching order of Bilateria is according to ref. 28 and the references therein, and that of basal Metazoa is according to ref. 17 (Supplementary Discussion). Annotated miRNA tallies are from miRBase (v10.1).

1Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA. 2Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 3School of Integrative Biology, University of Queensland, Brisbane 4072, Australia. 4Department of Molecular and Cell Biology and Center for Integrative Genomics, University of California at Berkeley, Berkeley, California 94720, USA. 5Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA.

©2008 Macmillan Publishers Limited. All rights reserved

Diverse microRNAs of the starlet sea anemone

Eumetazoa includes the Bilateria as well as the Cnidaria. Among sequenced genomes Cnidaria is represented by the starlet sea anemone, Nematostella vectensis. To explore whether cnidarians have miRNAs, we sequenced complementary DNA libraries generated from 18-30-nucleotide RNAs isolated from Nematostella. High-throughput sequencing yielded 2.9 million reads perfectly matching the Nematostella genome (Fig. 2a). To identify miRNAs, we considered properties that have proved useful for distinguishing bilaterian miRNAs from other types of small RNAs represented in sequencing data. The first criterion was the presence of reads mapping to an inferred RNA hairpin with pairing characteristics of known miRNA hairpins. The second was the presence of reads from both arms of the hairpin that, when paired to each other, formed a duplex with 2-nucleotide 3' overhangs. This duplex corresponds to an intermediate of miRNA biogenesis in which the miRNA and opposing segment of the hairpin, called the miRNA*, are excised from the hairpin through successive action of Drosha and Dicer RNase III endonucleases2. The third criterion was homogeneity of the miRNA 5' terminus. Because pairing to miRNA nucleotides 2-8 is crucial for target recognition, reads matching bilaterian miRNAs display less length variability at their 5' termini than at their 3' termini. As exemplified by mir-2024d (Fig. 2b, c), 40 distinct Nematostella loci met these criteria (Fig. 2d and Supplementary Data 1; identical hairpins were not counted because they might have arisen from genome-assembly artefacts).

Additional features, not used as selection criteria, resembled those of bilaterian miRNAs2, thereby increasing confidence in our annotations. For example, the loci usually mapped between annotated protein-coding genes (31 loci) or within introns in an orientation suitable for processing from the pre-mRNA (8 loci). The Nematostella miRNAs also had a tight length distribution (centring on 22 nucleotides, Fig. 2d), and five groups of miRNAs (corresponding to 13 miRNAs) mapped near to each other in an orientation suitable for production from the same primary transcript (Supplementary Data 1), as occurs in bilaterians2. With the exception of two miRNA pairs (miR-2024a,b and miR-2024f,g), the Nematostella miRNAs had unique sequences at nucleotides 2-8, suggesting notable diversity of miRNA targeting in this simple animal.

Previous studies that explored the possibility that cnidarians might have miRNAs searched for Nematostella homologues of the ~30 miRNA families broadly conserved within the Bilateria by probing RNA blots and examining candidate hairpin sequences. These studies reported the possible presence of miR-10, miR-33 and miR-100 family members in Nematostella.
None of our reads matched the proposed miR-10, miR-33 or miR-100 homologues, and none matched the proposed hairpin precursors of miR-10 or miR-33. Such discrepancies were not unexpected, because detection of distantly related miRNAs by hybridization is prone to false-positives, and many genomic sequences can fold into hairpins. However, one of the newly identified miRNAs arose from the hairpin of the reported miR-100 homologue. The actual miRNA was offset by one nucleotide compared to bilaterian miR-100 family members (Fig. 2e). Because miRNA targeting is defined primarily by nucleotides 2-8, this offset is expected to alter target recognition substantially, with the Nematostella version primarily recognizing mRNAs containing CUACGGG and UACGGGA heptanucleotide sites and the bilaterian versions recognizing mRNAs with two different sites, UACGGGU and ACGGGUA. Despite this wholesale shift in their predicted targeting, the Nematostella and bilaterian versions of miR-100 had similarity throughout the RNA, suggesting common origins (Fig. 2e). This result confidently extended the inferred origin of metazoan miRNAs back to at least the last common ancestor of these eumetazoans. Systematic comparison to annotated miRNAs did not reveal any additional Nematostella miRNAs with similarity exceeding that of shuffled control sequences (Supplementary Fig. 1). Although the short length of miRNAs may cause sequence divergence to obscure common ancestry, it is noteworthy that only one of the 40 Nematostella miRNAs appeared homologous to extant bilaterian miRNAs, and even this one seemed to have profoundly different targeting properties.

[Figure 2, panels a-e. Panel a: read counts versus length (nt), by 5'-nt identity. Panel b: reads matching the mir-2024d hairpin. Panel c: predicted secondary structure of the mir-2024d hairpin, marking miR-2024d and miR-2024d*. Panel d: table of the 40 Nematostella miRNAs (miR-100, miR-2022, miR-2023, miR-2024a-g, miR-2025 to miR-2031, miR-2032a,b, miR-2033 to miR-2039, miR-2040a,b, miR-2041 to miR-2043, miR-2044a,b and miR-2045 to miR-2051) with columns miRNA, Sequence, Hairpin reads, miRNA reads and miRNA* reads; the sequences and counts are not recoverable from the extracted text. Panel e: alignment of miR-100 homologues, as follows.]

N. vectensis miR-100       -ACCCGUAGAUCCGAACUUGUGG
H. sapiens miR-100         AACCCGUAGAUCCGAACUUGUG
X. tropicalis miR-100      AACCCGUAGAUCCGAACUUGUG
D. rerio miR-100           AACCCGUAGAUCCGAACUUGUG
D. melanogaster miR-100    AACCCGUAAAUCCGAACUUGUG
H. sapiens miR-99a         AACCCGUAGAUCCGAUCUUGUG
H. sapiens miR-99b         CACCCGUAGAACCGACCUUGCG
X. tropicalis miR-99       AACCCGUAGAUCCGAUCUUGUG
D. rerio miR-99            AACCCGUAGAUCCGAUCUUGUG

Figure 2 | The miRNAs of N. vectensis. a, Length distribution of genome-matching sequencing reads representing small RNAs, plotted by 5'-nucleotide (nt) identity. Matches to ribosomal DNA were omitted. b, Sequencing reads matching the mir-2024d hairpin. The sequence of the mir-2024d hairpin is depicted above the bracket-notation of its predicted secondary structure. The sequenced small RNAs mapping to the hairpin are aligned below, with the number of reads shown on the left, and the designated miRNA and miRNA* species coloured red and blue, respectively. Analogous information is provided for the other newly identified miRNAs (Supplementary Data 1). c, Predicted secondary structure of the mir-2024d hairpin, indicating the miRNA and miRNA* species. d, The 40 Nematostella miRNAs. MicroRNA read counts include those sharing the dominant 5' terminus but possessing variable 3' termini. Occasionally the only sequenced miRNA* species corresponded to a variant miRNA species rather than the major species (counts in brackets). e, Alignment of miR-100 homologues (Danio rerio, D. rerio; Xenopus tropicalis, X. tropicalis).

MicroRNAs near the base of the metazoan tree

To determine whether miRNAs might be present in more deeply branching lineages, we generated 2.5 million genome-matching reads from the small RNAs of the demosponge A. queenslandica, a poriferan thought to represent the earliest diverging extant animal lineage (Figs 1 and 3a). Eight miRNA genes were identified in Amphimedon adult and embryo samples (Fig. 3b and Supplementary Data 2), exemplified by mir-2018 (Fig. 3c). Six mapped between annotated protein-coding genes; two fell within introns. As is typical for bilaterian miRNAs2 and is also found in Nematostella (Fig. 2d), reads from one arm of the hairpin usually greatly exceeded those from the other arm, enabling unambiguous annotation of the miRNA and miRNA* (Fig. 3b). However, the number of reads from the two arms of the mir-2015 hairpin did not differ substantially, suggesting that each might have similar propensities to enter the silencing complex and target mRNAs. Moreover, the species from the 3' arm (miR-2015-3p) dominated in adult tissue, whereas the one from the 5' arm (miR-2015-5p) dominated in embryonic tissue (Fig. 3d), supporting the notion that this single hairpin produces two distinct miRNAs, and implying an intriguing, developmentally controlled differential loading into the silencing complex. In Amphimedon, pre-miRNA hairpins were larger than most of those of other metazoans (Fig. 3e).
The Nematostella pre-miRNAs (including mir-100) fell at the other end of the spectrum, with a median length less than that of bilaterian pre-miRNAs (Fig. 3e).

None of the Amphimedon miRNAs shared significant similarity with any previously described miRNAs (Supplementary Fig. 1), or with the miRNAs found in Nematostella. This observation, combined with their unusually large pre-miRNA hairpins, raised the possibility of an origin independent from that of eumetazoan miRNAs. Arguing against this possibility, we found Amphimedon homologues of Drosha and Pasha proteins (Table 1), which recognize the miRNA primary transcript and cleave it to liberate the pre-miRNA hairpin. Homologues of these proteins appeared to be absent in all lineages outside the Metazoa, indicating a single origin for these processing factors early in metazoan evolution and implying a single origin for their miRNA substrates.

A third animal lineage branching basal to the Bilateria is Placozoa, represented by the sequenced species Trichoplax adhaerens7. Although earlier analyses of mitochondrial genes suggested that Trichoplax diverged before Amphimedon, genomic data indicate that Trichoplax had a common ancestor with cnidarians and bilaterians more recently than with Amphimedon (Fig. 1 and Supplementary Discussion). Our study of Trichoplax small RNAs failed to find miRNAs, despite acquiring many more reads than required to identify miRNAs in all other animals and plants examined (Supplementary Figs 2 and 3). Thus, despite the formal possibility that Trichoplax miRNAs are expressed at levels so low that we failed to detect them, we favour the hypothesis that all miRNA genes have been lost in this lineage. Trichoplax is thought to have derived from a more complex ancestor, having lost, for example, the hedgehog and Notch signalling pathways7.
Supporting our hypothesis, no Pasha homologue was found in the Trichoplax genome, although we did find the core RNAi proteins (argonaute and Dicer), suggesting the production and use of small interfering RNAs (Table 1). Drosha, which partners with Pasha during miRNA biogenesis, was found also but might be required in the absence of miRNAs for ribosomal RNA maturation. Of the proteins involved in canonical miRNA biogenesis, Pasha is the one without known functions outside the miRNA pathway, and it was the one that appeared to have been discarded, together with all miRNAs, from the Trichoplax genome (Table 1).

We also sequenced small RNAs from the single-celled organism Monosiga brevicollis (Supplementary Fig. 2), which represents the closest known outgroup to the Metazoa20. We failed to detect any plausible miRNAs, a result consistent with our subsequent finding that Monosiga seems to lack all genes specific to small-RNA biology (Table 1). The absence of Dicer and argonaute seemed to be derived rather than ancestral, as the common ancestor of Monosiga and metazoans possessed these core RNAi proteins (Table 1). The possibility that the absence of miRNAs in Monosiga might likewise be derived prevented us from setting an early bound on the origin of metazoan miRNAs.

In summary, miRNAs appear to have been available to shape gene expression since at least very early in animal evolution. Nonetheless, the numbers identified in simpler animals (8 unique miRNAs in Amphimedon and 40 in Nematostella) were lower than those reported in more complex animals (Fig. 1). Although miRNAs expressed only under specific conditions or at restricted developmental stages were possibly missed in these and other animals, our results are consistent with the idea that increased organismal complexity in Metazoa correlates with the number of miRNAs and presumably with the number of miRNA-mediated regulatory interactions.
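
The one-nucleotide offset of Nematostella miR-100 discussed earlier (Fig. 2e) can be checked with a few lines of code. The sketch below is illustrative only and is not part of the original analysis. It takes the two miR-100 sequences from the Fig. 2e alignment and derives two heptanucleotide sites for each, under the assumption that one site pairs perfectly to miRNA nucleotides 2-8 and the other pairs to nucleotides 2-7 and is followed by an A. The function names and the labels `m8` and `a1` are hypothetical.

```python
# Illustrative sketch (not the authors' code): derive the heptanucleotide
# mRNA sites matched by a miRNA seed.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str) -> tuple[str, str]:
    """Return two 7-nt sites for a miRNA written 5' to 3'.

    Assumed definitions:
      m8: pairs to miRNA nucleotides 2-8.
      a1: pairs to miRNA nucleotides 2-7, followed by an A.
    """
    m8 = reverse_complement(mirna[1:8])        # nucleotides 2-8
    a1 = reverse_complement(mirna[1:7]) + "A"  # nucleotides 2-7, then A
    return m8, a1


# Sequences as given in the Fig. 2e alignment.
MIR100 = {
    "N. vectensis": "ACCCGUAGAUCCGAACUUGUGG",
    "H. sapiens": "AACCCGUAGAUCCGAACUUGUG",
}

for species, sequence in MIR100.items():
    print(species, *seed_sites(sequence))
```

Under these assumptions the Nematostella sequence gives CUACGGG and UACGGGA and the human sequence gives UACGGGU and ACGGGUA, the sites quoted in the text, so a shift of a single nucleotide at the 5' end is enough to change both predicted sites.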
[Figure 3, panels a-e. Panel a: read counts versus length (nt), by 5'-nucleotide identity. Panel c: predicted secondary structure of the mir-2018 hairpin, marking miR-2018 and miR-2018*. Panel d: fold enrichment in embryo or adult for each miRNA. Panel e: cumulative distribution of pre-miRNA size (nt) for N. vectensis, H. sapiens, D. melanogaster, C. elegans, A. queenslandica and A. thaliana. Panel b is tabulated below; the read counts for the two mir-2015 rows are garbled in the extracted text and are left blank.]

miRNA         Sequence                  Hairpin reads   miRNA reads   miRNA* reads
miR-2014      UGCCAAACAAGUCCGAUCUACA    17,843          17,043        178
miR-2015-3p   ACCUCUCCAUCAUGCAUGACA
miR-2015-5p   UCAUGUAUUGUGGAGGGGAGA
miR-2016      UAGAUUGGGCUUGGUCGGCAGA    37,606          36,675        7
miR-2017      UACCUGUGCACCUGUGUGCCCA    1,725           1,531         93
miR-2018      UGUCGGAGCCGGAGGUUCCGGA    1,529           1,309         107
miR-2019      AAAGUGAUCGGGUUGCCGUCUG    11,574          10,483        416
miR-2020      UGGGUAGUGUGUCUUUUCGGA     13,936          13,700        5
miR-2021      UGGUGGUCGGUGUUUCGUGGA     7,642           7,537         25

Figure 3 | The miRNAs of Amphimedon queenslandica. a, Length distribution of genome-matching sequencing reads representing small RNAs, plotted by 5'-nucleotide identity. Matches to ribosomal DNA were omitted. b, The Amphimedon miRNAs, shown as in Fig. 2d. Information analogous to that of Fig. 2b is provided for these miRNAs (Supplementary Data 2). c, Predicted secondary structure of the mir-2018 hairpin. d, Relative expression of Amphimedon miRNAs, as indicated by sequencing frequency from adult and embryo samples. e, Cumulative distributions of pre-miRNA lengths from miRNA transcripts of the species indicated. Amphimedon pre-miRNAs were significantly larger than those from any other animal species examined (P < 10-, Wilcoxon rank-sum test), whereas those from Nematostella were significantly smaller (P < 10-).

Table 1 | The small-RNA machinery of representative eukaryotes

Species                     Ago   Piwi   Dicer   Drosha   Pasha   Hen1
Homo sapiens                4     4      1       1        1       1
Drosophila melanogaster     2     3      2       1        1       1
Caenorhabditis elegans*     5     3      1       1        1       1

[The table also has rows for Nematostella vectensis†, Trichoplax adhaerens†, Amphimedon queenslandica†, Monosiga brevicollis, Saccharomyces cerevisiae, Schizosaccharomyces pombe||, Arabidopsis thaliana, Physcomitrella patens and Chlamydomonas reinhardtii; their cell values cannot be assigned to rows and columns from the extracted text.]

* Omitted is a nematode-specific clade of proteins related to the Ago and Piwi protein families but distinct from both.
† Protein sequences are listed in Supplementary Data 3.
‡ Inferred loss based on presence in earlier-diverging lineages.
§ Inferred loss based on presence in earlier-diverging lineages when assuming that Amphimedon diverged before Trichoplax (Supplementary Discussion).
|| Ago and Dicer, but not Piwi, Drosha, Pasha or Hen1, were also identified in each of the additional fungal species examined (Aspergillus nidulans, Neurospora crassa and Sclerotinia sclerotiorum).

Piwi-interacting RNAs in deeply branching animals

We next turned to the possibility that piRNAs also might have early origins. Piwi proteins, the effectors of bilaterian piRNA pathways, are found in diverse eukaryotic lineages (although not in plants or fungi, Table 1), implying their presence in early eukaryotes. In cases characterized, however, the small RNAs associated with non-metazoan Piwi proteins resemble siRNAs more than bilaterian piRNAs (deriving, for example, from Dicer-catalysed cleavage of long double-stranded RNA), raising the question of when piRNAs of the types found in Bilateria might have emerged.

The genomes of both Amphimedon and Nematostella, but not that of Trichoplax, encode Piwi proteins (Table 1) and express many ~27-nucleotide RNAs with a 5'-terminal uridine (5'-U) (Figs 2a and 3a), features reminiscent of piRNAs in vertebrates and flies. Moreover, 45% of Nematostella 5'-U 27-30-nucleotide RNAs originated from only 89 genomic loci (together comprising 0.4% of the genome), the largest of which was 62 kilobases, and essentially all of these small RNAs derived from one strand of each locus (Fig. 4a and Supplementary Table 3). In these respects the genomic loci producing a large fraction of the Nematostella reads closely resembled the loci producing bilaterian piRNAs, particularly the pachytene piRNAs. We observed a similar clustering of genomic matches of Amphimedon 5'-U 24-30-nucleotide RNAs, although the loci were smaller and accounted for fewer reads (10% of the reads originating from 73 loci comprising 0.2% of the genome, Supplementary Table 4).

Another characteristic of piRNAs is that they undergo Hen1-mediated methylation of their terminal 2' oxygen22. To test for this modification, we treated RNA from Nematostella and Amphimedon with periodate and then re-sequenced from both treated and untreated samples (Supplementary Fig. 4). Piwi-interacting RNAs and other RNAs modified at their 2' oxygen remain unchanged with this treatment and are sequenced, whereas those with an unmodified 2',3' cis-diol are oxidized, which renders them refractory to sequencing. In contrast to the Amphimedon miRNAs and many of the Nematostella miRNAs (Supplementary Tables 1 and 2), reads corresponding to the candidate piRNA clusters in both Nematostella and Amphimedon were not reduced after treatment (Supplementary Tables 3 and 4), indicating that their terminal 2',3' cis-diol was modified.
This modification, considered together with their other features characteristic of vertebrate and fly piRNAs, including the length of 25-30 nucleotides, the 5'-U bias, and the single-stranded, clustered organization of their genomic matches, provided evidence that these small RNAs represented piRNAs of Nematostella and Amphimedon.

The piRNAs were the type of small RNAs most abundantly sequenced in Nematostella and Amphimedon (Figs 2a and 3a, and Supplementary Discussion). A similar phenomenon is observed in mammalian testes, in which the pachytene piRNAs greatly outnumber the miRNAs and initially obscured detection of a second class of mammalian piRNAs, which resemble the most abundant Drosophila piRNAs with respect to both their biogenesis and their apparent role in suppressing transposon activity2. Most of the Nematostella and Amphimedon genomic loci with clustered piRNA matches resembled the first class of piRNAs, in that they tended to fall outside of annotated genes (P < 10⁻³, Wilcoxon rank-sum test) and spawned piRNAs predominately from only one DNA strand (>99% and 96% from one strand, Nematostella and Amphimedon, respectively).

To determine whether the second class of piRNAs might also exist in deeply branching lineages, we analysed the sequences from periodate-treated samples, focusing on the minority that matched annotated protein-coding genes (Fig. 4b). As expected for class II piRNAs, these piRNAs did not have such a strong tendency to match only one strand of the DNA (62% and 64% antisense for Nematostella and Amphimedon, respectively). Moreover, among the predicted coding regions with the most matches to the piRNAs, a significant fraction (18 of 50 in Nematostella, P < 10⁻³; 12 of 40 in Amphimedon, P = 0.03, Supplementary Tables 5 and 6) were homologous to transposases.

Having found small RNAs resembling bilaterian class II piRNAs, we looked for evidence that they were generated through the same feed-forward biogenic pathway.
In this pathway, primary piRNAs from transcripts antisense to transposable elements pair to transposon messages and direct their cleavage. This cleavage defines the 5' termini of secondary piRNAs generated from the transposon message, and these secondary piRNAs pair to piRNA transcripts, directing cleavage and thereby defining the 5' termini of additional piRNAs resembling the primary piRNAs. Because the primary piRNAs typically begin with a 5'-U and direct cleavage at the nucleotide that pairs to position 10, the secondary piRNAs typically have an A at position 10. Examination of all 27-30-nucleotide periodate-resistant reads antisense to Nematostella coding regions revealed a propensity for a 5'-U, characteristic of primary piRNAs (Fig. 4c). The sense-strand piRNAs lacked this 5'-U bias and instead displayed a propensity for an A at position 10 (Fig. 4c and Supplementary Fig. 5). Moreover, sense and antisense reads that paired to each other tended to have 10 base pairs formed between their 5' ends (Supplementary Fig. 6). For the 24-30-nucleotide periodate-resistant reads from Amphimedon, the same hallmark features of the back-and-forth, or ping-pong, amplification cycle for piRNA biogenesis4,5 were observed (Fig. 4c and Supplementary Fig. 6).

[Figure 4 panels: a, read counts across Scaffold 328 (50-140 kb), with a higher-resolution inset; b, small RNAs mapped to a predicted gene (gene ID: 200314), coloured by normalized reads; c, nucleotide identity (%) by position for Nematostella and Amphimedon sense and antisense reads.]

Figure 4 | The piRNAs of basal metazoans. a, Distribution of reads matching a Nematostella piRNA locus. Plotted is the number of matching reads with 5' nucleotide falling within each 100-nucleotide window (main graph) or at each nucleotide (higher-resolution inset) spanning the genomic region. Bars above and below the x axis indicate matches to the indicated strand, with black bars indicating reads with a 5'-U and red bars indicating the sum of all other reads. For reads also matching other genomic loci, counts were normalized by total genome matches. Other annotated piRNA loci are presented in Supplementary Tables 3 and 4. kb, kilobases. b, An annotated pre-mRNA corresponding to numerous small RNAs resistant to periodate treatment. Annotated coding segments (open boxes) and intron segments (black line) are indicated. The gene was homologous to endonuclease/reverse transcriptases of other genomes and presumed to be a transposase. Small RNAs with unique 5' ends are represented by coloured bars above or below the transcript (sense and antisense, respectively), with colours indicating the read numbers (normalized to account for the number of transcriptome matches). Small RNAs matching splice junctions (observed only for sense reads) are represented by discontinuous bars, linked by dashed lines. Other Nematostella and Amphimedon coding regions matching candidate piRNAs are listed in Supplementary Tables 5 and 6. c, Nucleotide composition of periodate-resistant small RNAs matching the indicated strand of Nematostella or Amphimedon annotated coding regions.

© 2008 Macmillan Publishers Limited. All rights reserved. NATURE | Vol 455 | 30 October 2008

We conclude that the two classes of piRNAs found previously in mammals and flies have existed since the origin of metazoans: the class I piRNAs, represented by the mammalian pachytene piRNAs, which have unknown function during germline development, and the class II piRNAs, which use the ping-pong cleavage and amplification cascade to quiet expression of certain genes, particularly those of transposons.
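The ping-pong signatures described above (a 5'-U on primary piRNAs, an A at position 10 of secondary piRNAs, and 10 base pairs between paired 5' ends) can be illustrated with a minimal Python sketch. This is not the analysis code used in the study; the function names, the toy sequences and the coordinates are hypothetical and serve only to show why pairing to a 5'-U places an A at position 10 of the partner read.

```python
# Illustrative sketch only: toy sequences and coordinates are invented.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U alphabet)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def position_frequency(reads, position, nucleotide):
    """Fraction of reads carrying `nucleotide` at 1-based `position`."""
    eligible = [r for r in reads if len(r) >= position]
    if not eligible:
        return 0.0
    return sum(r[position - 1] == nucleotide for r in eligible) / len(eligible)


def five_prime_overlap(sense_5p, antisense_5p):
    """Base pairs between the 5' ends of a sense and an antisense read.

    Both arguments are plus-strand coordinates of the reads' 5' nucleotides;
    for an antisense read the 5' end is its highest plus-strand coordinate.
    """
    return antisense_5p - sense_5p + 1


# A hypothetical primary (antisense) piRNA beginning with U.
primary = "UGCAUGGACUUAGCCGAUUACGGAUCC"
# Cleavage across from position 10 of the primary piRNA defines the 5' end
# of the secondary piRNA, so its first 10 nt pair with the primary's first 10.
secondary = reverse_complement(primary[:10]) + "GGCUAACGUUAGCAUCC"

print(position_frequency([primary], 1, "U"))     # 5'-U on the primary read
print(position_frequency([secondary], 10, "A"))  # A at position 10
print(five_prime_overlap(101, 110))              # overlap of the 5' ends
```

Applied to real data, `position_frequency` would be evaluated separately on antisense and sense reads, and `five_prime_overlap` would be tallied over all overlapping read pairs to look for enrichment at 10.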
Indeed, the sequence-based transposon silencing by piRNAs, which by virtue of the feed-forward amplification process focuses on the most active transposon species, might be one of the principal drivers of transposon diversity in animals. Taken together, our results indicate that miRNAs and piRNAs, as classes of small riboregulators, have been present since the dawn of animal life, and indeed might have helped to usher in the era of multicellular animal life. However, metazoan miRNA evolution seems to have been very dynamic: all miRNAs have been lost in Trichoplax, and the pre-miRNAs of Porifera, Cnidaria and Bilateria have assumed distinct sizes. In addition, no miRNAs have recognizable conservation between poriferans, cnidarians and bilaterians, with only one of the Nematostella miRNAs displaying recognizable homology to bilaterian miRNAs, either because it is the only homologue of extant bilaterian miRNAs or because divergence has obscured common ancestry of other miRNAs. The wholesale shifts in miRNA function implied by this plasticity are congruent with the report that, although thousands of miRNA-target interactions have been maintained within each of the nematode, fly and vertebrate lineages, very few appear to be conserved throughout all three lineages26. The plasticity of miRNA sequences over long timescales helps to explain why the rich small-RNA biology in basal organisms had escaped detection for so long.

METHODS SUMMARY
The M. brevicollis library was constructed as described and sequenced by 454 Life Sciences. All other libraries (Supplementary Table 7) were constructed using an analogous method and sequenced on the Illumina platform.

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 5 June; accepted 12 September 2008. Published online 1 October 2008.

1. Cerutti, H. & Casas-Mollano, J. A. On the origin and functions of RNA-mediated silencing: from protists to man. Curr. Genet. 50, 81-99 (2006).
2. Bartel, D. P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297 (2004).
3. Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).
4. Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089-1103 (2007).
5. Aravin, A. A., Hannon, G. J. & Brennecke, J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318, 761-764 (2007).
6. Jones-Rhoades, M. W., Bartel, D. P. & Bartel, B. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. 57, 19-53 (2006).
7. Molnar, A., Schwach, F., Studholme, D. J., Thuenemann, E. C. & Baulcombe, D. C. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature 447, 1126-1129 (2007).
8. Zhao, T. et al. A complex system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii. Genes Dev. 21, 1190-1203 (2007).
9. Pasquinelli, A. E. et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89 (2000).
10. Hertel, J. et al. The expansion of the metazoan microRNA repertoire. BMC Genomics 7, 25 (2006).
11. Sempere, L. F., Cole, C. N., McPeek, M. A. & Peterson, K. J. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J. Exp. Zool. 306, 575-588 (2006).
12. Prochnik, S. E., Rokhsar, D. S. & Aboobaker, A. A. Evidence for a microRNA expansion in the bilaterian ancestor. Dev. Genes Evol. 217, 73-77 (2007).
13. Putnam, N. H. et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86-94 (2007).
14. Ruby, J. G. et al. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207 (2006).
15. Ruby, J. G. et al. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850-1864 (2007).
16. Larroux, C. et al. Genesis and expansion of metazoan transcription factor gene classes. Mol. Biol. Evol. 25, 980-996 (2008).
17. Srivastava, M. et al. The Trichoplax genome and the nature of placozoans. Nature 454, 955-960 (2008).
18. Lee, Y., Han, J., Yeom, K. H., Jin, H. & Kim, V. N. Drosha in primary microRNA processing. Cold Spring Harb. Symp. Quant. Biol. 71, 51-57 (2006).
19. Fukuda, T. et al. DEAD-box RNA helicase subunits of the Drosha complex are required for processing of rRNA and a subset of microRNAs. Nature Cell Biol. 9, 604-611 (2007).
20. King, N. et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788 (2008).
21. Yao, M.-C. & Chao, J.-L. RNA-guided DNA deletion in Tetrahymena: an RNAi-based mechanism for programmed genome rearrangements. Annu. Rev. Genet. 39, 537-559 (2005).
22. Horwich, M. D. et al. The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC. Curr. Biol. 17, 1265-1272 (2007).
23. Seitz, H., Ghildiyal, M. & Zamore, P. D. Argonaute loading improves the 5' precision of both microRNAs and their miRNA* strands in flies. Curr. Biol. 18, 147-151 (2008).
24. Aravin, A. A., Sachidanandam, R., Girard, A., Fejes-Toth, K. & Hannon, G. J. Developmentally regulated piRNA clusters implicate MILI in transposon control. Science 316, 744-747 (2007).
25. Gunawardane, L. S. et al. A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315, 1587-1590 (2007).
26. Chen, K. & Rajewsky, N. Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb. Symp. Quant. Biol. 71, 149-156 (2006).
27. Yigit, E. et al. Analysis of the C. elegans Argonaute family reveals that distinct Argonautes act sequentially during RNAi. Cell 127, 747-757 (2006).
28. Bourlat, S. J., Nielsen, C., Economou, A. D. & Telford, M. J. Testing the new animal phylogeny: a phylum level molecular analysis of the animal kingdom. Mol. Phylogenet. Evol. 49, 23-31 (2008).
29. Griffiths-Jones, S., Saini, H. K., van Dongen, S. & Enright, A. J. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36, D154-D158 (2008).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We thank M. Abedin and E. Begovic for preparing the Monosiga and Trichoplax samples, respectively, W. Johnston for technical assistance, and J. Grenier, C. Mayr, C. Jan and N. Lau for discussions. This work was supported by an NIH postdoctoral fellowship (A.G.), and by grants from the NIH (D.P.B.), Richard Melmon (M.S., N.K. and D.S.R.), the Center for Integrative Genomics (M.S. and D.S.R.), the Gordon and Betty Moore Foundation (N.K.) and the Australian Research Council (B.F., B.J.W. and B.M.D.). D.P.B. is an investigator of the Howard Hughes Medical Institute.

Author Contributions A.G. constructed the libraries using procedures developed by H.R.C., and analysed the sequencing reads and protein homology. M.S., B.F., B.J.W., N.K., B.M.D. and D.S.R. provided samples for RNA extraction. A.G. and D.P.B. designed the study and prepared the manuscript, with input from other authors.

Author Information RNA sequencing data were deposited in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE12578. Reprints and permissions information is available at www.nature.com/reprints. Correspondence and requests for materials should be addressed to D.P.B. (dbartel@wi.mit.edu).

doi:10.1038/nature07415

METHODS
Small RNA sequencing. Samples of N. vectensis (mixed developmental stages, including adult), A.
queenslandica (adult tissue, stored in RNAlater, Ambion) and M. brevicollis were ground under liquid nitrogen, and then RNA was extracted with Trizol (Invitrogen). RNA from T. adhaerens (mixed developmental stages, including adult) and A. queenslandica (mixed embryos, from cleavage stage to the larval stage30, stored in RNAlater) was extracted directly with Trizol. The M. brevicollis library was constructed as described and sequenced by 454 Life Sciences. All other libraries (Supplementary Table 7) were sequenced on the Illumina platform, and prepared as follows. The 18-30-nucleotide RNAs were purified from total RNA (typically 5 µg) using denaturing polyacrylamide-urea gels. Before purification, trace amounts of 5'-32P-labelled RNA size markers (AGCGUGUAGGGAUCCAAA and GGCAUUAACGCGGCCGCUCUACAAUAGUGA) were mixed with the total RNA and used to monitor this purification and subsequent ligations and purifications. The gel-purified RNA was ligated to pre-adenylated adaptor DNA (AppTCGTATGCCGTCTTCTGCTTG-[3'-3' linkage]-T) using T4 RNA ligase (10 units ligase, GE Healthcare, 10 µl reaction, 50 pmol adaptor, ATP-free ligase buffer31, for 2 h at 21-23 °C). Gel-purified ligation products were ligated to a 5'-adaptor RNA (GUUCAGAGUUCUACAGUCCGACGAUC), again using T4 RNA ligase (as above, except with 20 units ligase, 15 µl reaction supplemented with 4 nmol ATP, 400 pmol adaptor, for 18 h at room temperature). Gel-purified ligation products were reverse-transcribed (SuperScript II, Invitrogen, 30 µl reaction with the reverse transcription primer CAAGCAGAAGACGGCATA), and then RNA was base-hydrolysed with addition of 5 µl of 1 M NaOH and incubation at 90 °C for 10 min, followed by neutralization with addition of 25 µl of 1 M HEPES, pH 7.0, and desalting (Microspin G-25 column, Amersham).
The resulting cDNA library was amplified with the RT primer and PCR primer (AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) for a sufficient number of cycles (typically ~20) to detect (SYBR Gold, Invitrogen) a clear band in a 90% formamide, 8% acrylamide gel, used for purification. Gel-purified amplicon (85-105 nucleotides) from each library was subjected to Illumina sequencing. The adaptor and primer sequences enabled cluster generation on the Illumina machine and placed a binding site for the sequencing primer (CGACAGGTTCAGAGTTCTACAGTCCGACGATC) adjacent to the sequence of the small RNA. Periodate-treated libraries were generated identically, except total RNA was first subjected to β-elimination32. Mock-treated libraries omitting periodate were constructed in parallel.

MicroRNA identification and analysis. The N. vectensis, T. adhaerens and M. brevicollis genomes and predicted gene sets13,17,20 were downloaded from JGI (http://jgi.doe.gov); the A. queenslandica genome was a preliminary assembly. After removing the adaptor sequences, reads were collapsed to a non-redundant set and matched to the appropriate genome. Genome matches were clustered if neighbouring matches fell within either 50 nucleotides (Amphimedon, Nematostella) or 500 nucleotides (Amphimedon) of each other. The increased size of the clustering window used for the Amphimedon analysis (500 nucleotides) was necessary because the 50-nucleotide window was insufficient to identify all Amphimedon miRNAs, owing to the increased size of their pre-miRNAs (Fig. 3e). No additional miRNAs were identified in Nematostella when using a 500-nucleotide window. Sequences of clusters containing 17-25-nucleotide reads cloned at least twice were folded with RNAfold33.
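The first computational steps described above (adaptor removal, collapsing to a non-redundant set, and clustering of genome matches within a fixed window) can be sketched in Python as follows. This is a hedged illustration, not the pipeline used in the study: the function names, the `min_match` threshold for partial adaptor matches, and the toy read and coordinates are assumptions. Only the 3' adaptor sequence and the 50- and 500-nucleotide windows come from the text.

```python
from collections import Counter

# 3' adaptor sequence as given in the Methods (DNA alphabet, as sequenced).
ADAPTOR_3P = "TCGTATGCCGTCTTCTGCTTG"


def trim_adaptor(read, adaptor=ADAPTOR_3P, min_match=6):
    """Return the insert preceding the 3' adaptor, or None if none is found.

    A full adaptor match is tried first; otherwise the longest adaptor
    prefix (at least `min_match` nt, an assumed threshold) at the read's
    3' end is accepted, since short reads may contain a truncated adaptor.
    """
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]
    for k in range(len(adaptor) - 1, min_match - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:len(read) - k]
    return None


def collapse(inserts):
    """Collapse inserts to a non-redundant set with read counts."""
    return Counter(i for i in inserts if i)


def cluster_positions(positions, window=50):
    """Group genome match positions whose neighbours lie within `window` nt."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Toy example: a 36-nt read holding a 22-nt insert and 14 nt of adaptor.
read = "TGGAATGTAAAGAAGTATGTAT" + ADAPTOR_3P[:14]
inserts = [trim_adaptor(read), trim_adaptor(read), trim_adaptor("ACGT" * 9)]
print(collapse(inserts))
print(cluster_positions([10, 40, 200, 230, 900], window=50))
```

Passing `window=500` reproduces the wider clustering described for Amphimedon. Each resulting cluster would then be extracted from the genome and folded before manual inspection.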
If the most frequently sequenced species was located on one arm of a predicted hairpin and the region of the hairpin corresponding to that sequence contained 16 base pairs, the candidate locus was examined manually for characteristics of known miRNAs, using criteria described in the main text. Before comparing between adult and embryonic libraries (Fig. 3d), counts corresponding to each mature miRNA from each library were first normalized by the total number of genome-matching reads in that library. To detect possible homology between previously known miRNAs and either Nematostella or Amphimedon miRNAs, we searched miRBase (version 10.1) for miRNAs similar to our new miRNAs. Because miRNA conservation is most pronounced within the miRNA 5' region34, we first identified any known and new miRNAs that shared a hexanucleotide within their first eight nucleotides, allowing two-nucleotide offsets. Because of the limited length of the search sequence, and the large number of miRNAs in miRBase, most Nematostella or Amphimedon miRNAs shared a hexanucleotide with miRBase miRNAs. For all such cases, we then searched for extended similarity between the pairs of miRNAs. With the exception of the miR-100 relationship, no more than chance similarity was observed (Supplementary Fig. 1). However, we cannot rule out the possibility that additional homologous relationships are present but undetectable. Because miRNAs are shorter than most other genetically encoded molecules, sequence divergence can more easily obscure homologous relationships, and although they resist changes in the seed region, which is crucial for target recognition, divergence in this 5' region can be accelerated with the processes of sub- and neo-functionalization.

Piwi-interacting RNA identification and analysis.
Nematostella 27-30-nucleotide RNAs and Amphimedon 24-30-nucleotide RNAs were mapped to their respective genomes, and at each matching locus counts were normalized by dividing by the number of genome matches for the sequenced RNA. Regions with both a high number of match-normalized reads (Nematostella: >1,000 per 10 kilobases; Amphimedon: >100 per 5 kilobases) and a high diversity of read sequences (Nematostella: >500 different sequences per 10 kilobases; Amphimedon: >50 different sequences per 5 kilobases) were identified; following the periodate experiment we further evaluated these regions, which led to the removal of four Amphimedon regions that had far fewer reads in the periodate-treated libraries. The remaining regions are listed in Supplementary Tables 3 (Nematostella) and 4 (Amphimedon), which report the proportion of 5'-U match-normalized reads to each strand and the ratio of match-normalized read counts in periodate-treated compared to mock-treated libraries, after normalization for the number of genome-matching reads in each library. The number of predicted transcripts overlapping genomic piRNA clusters (Supplementary Tables 3 and 4) was calculated and compared to the number overlapping 1,000 random sets equal in size and number to the piRNA clusters. Inferred protein sequences from predicted transcripts matching the greatest number of periodate-resistant, match-normalized reads were compared to annotated protein sequences using BLAST. Transcripts that were significantly similar to annotated transposons, or to protein domains implicated as transposases (for example, reverse transcriptases), were considered to encode transposases. A random selection of 100 predicted transcripts was searched similarly to ascertain significance (Nematostella: 3 out of 100; Amphimedon: 6 out of 100). When mapping to annotated protein-coding regions (Fig.
4b), reads with both sense and antisense matches were distributed to both the sense and antisense tallies after weighting by the proportion of their sense and antisense matches.

Cataloguing of the small RNA machinery. To identify homologues of components of the small RNA machinery, all established family members from H. sapiens, D. melanogaster, C. elegans, S. pombe and A. thaliana were used as BLAST query sequences against all annotated protein sequences of each species in Table 1. The top-ranking hits resulting from these initial searches were used reciprocally as query sequences against all annotated protein sequences of H. sapiens, D. melanogaster, C. elegans, S. pombe and A. thaliana. If the top-ranking hits of such reciprocal queries corresponded to an established family member, the query sequence was considered to be a candidate homologue. The domain structure of each candidate sequence was then evaluated35, and candidates lacking the diagnostic domains were discarded. The diagnostic domains used were a Paz and a Piwi domain (for Ago and Piwi family members), two RNase III domains (Dicer and Drosha), a double-stranded RNA-binding domain (Pasha) and a methylase domain (Hen1).

30. Adamska, M. et al. Wnt and TGF-β expression in the sponge Amphimedon queenslandica and the origin of metazoan embryonic patterning. PLoS ONE 2, e1031 (2007).
31. England, T. E., Gumport, R. I. & Uhlenbeck, O. C. Dinucleoside pyrophosphates are substrates for T4-induced RNA ligase. Proc. Natl Acad. Sci. USA 74, 4839-4842 (1977).
32. Kemper, B. Inactivation of parathyroid hormone mRNA by treatment with periodate and aniline. Nature 262, 321-323 (1976).
33. Hofacker, I. L. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 125, 167-188 (1994).
34. Lim, L. P. et al. The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008 (2003).
35. Marchler-Bauer, A. et al. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35, D237-D240 (2007).
Loss of Cardiac microRNA-Mediated Regulation Leads to Dilated Cardiomyopathy and Heart Failure

Prakash K. Rao, Yumiko Toyama, H. Rosaria Chiang, Sumeet Gupta, Michael Bauer, Rostislav Medvid, Ferenc Reinhardt, Ronglih Liao, Monty Krieger, Rudolf Jaenisch, Harvey F. Lodish, Robert Blelloch

Rationale: Heart failure is a deadly and devastating disease that places immense costs on an aging society. To develop therapies aimed at rescuing the failing heart, it is important to understand the molecular mechanisms underlying cardiomyocyte structure and function.
Objective: microRNAs are important regulators of gene expression, and we sought to define the global contributions made by microRNAs toward maintaining cardiomyocyte integrity.
Methods and Results: First, we performed deep sequencing analysis to catalog the miRNA population in the adult heart. Second, we genetically deleted, in cardiac myocytes, an essential component of the machinery that is required to generate miRNAs. Deep sequencing of miRNAs from the heart revealed the enrichment of a small number of microRNAs with one, miR-1, accounting for 40% of all microRNAs. Cardiomyocyte-specific deletion of dgcr8, a gene required for microRNA biogenesis, revealed a fully penetrant phenotype that begins with left ventricular malfunction progressing to a dilated cardiomyopathy and premature lethality.
Conclusions: These observations reveal a critical role for microRNAs in maintaining cardiac function in mature cardiomyocytes and raise the possibility that only a handful of microRNAs may ultimately be responsible for the dramatic cardiac phenotype seen in the absence of dgcr8. (Circ Res. 2009;105:585-594.)

Key Words: cardiac disease, cardiac failure, cardiomyocytes, myocardium, microRNA

Original received May 5, 2009; revision received July 30, 2009; accepted August 3, 2009. From the Whitehead Institute for Biomedical Research (P.K.R., H.R.C., S.G., F.R., R.J., H.F.L.) and the Department of Biology (Y.T., M.K., R.J., H.F.L.), Massachusetts Institute of Technology, Cambridge; the Division of Cardiology (M.B., R.L.), Brigham and Women's Hospital, Harvard Medical School, Boston, Mass; and the Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Biology, Department of Urology (R.M., R.B.), University of California, San Francisco. Correspondence to Harvey F. Lodish, PhD, Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142. E-mail lodish@wi.mit.edu and Robert Blelloch, MD, PhD, The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, and Department of Urology, University of California, San Francisco, San Francisco, CA. E-mail blellochr@stemcell.ucsf.edu
© 2009 American Heart Association, Inc. Circulation Research, September 11, 2009. Circulation Research is available at http://circres.ahajournals.org DOI: 10.1161/CIRCRESAHA.109.200451

Non-standard Abbreviations and Acronyms
Dgcr8: DiGeorge syndrome critical region 8
PGC1a: PPARγ coactivator-1α
PGC1b: PPARγ coactivator-1β
Myh6: myosin heavy chain 6
Myh7: myosin heavy chain 7
KO: knockout

A subset of endogenous small noncoding RNAs, known as microRNAs (miRNAs or miRs), are ~22 nucleotides long and modulate gene expression by targeting mRNAs for posttranscriptional repression.1 There are nearly 500 and 800 microRNAs in mice and humans, respectively2 (http://microrna.sanger.ac.uk). In animals, repression is achieved through imperfect base-pairing between the microRNA and its target mRNA. Although there are certain rare instances in which microRNAs have been reported to upregulate target gene expression,3,4 repression is the most well-documented direct effect. The target mRNA is rendered labile through mechanisms involving deadenylation/decapping, translational repression, or both. Target specificity is largely governed by the highly conserved seed region (nucleotides 2 to 8) of the miRNA.5 Various target prediction programs have relied on this fact, and an estimated 30% of the mRNAs are susceptible to miRNA-mediated regulation.6 Although this number is likely an overestimate, as it does not take into account the requirement for coexpression of miRNAs and mRNAs, it puts into perspective the enormous regulatory potential possessed by microRNAs. Not surprisingly, a number of studies have revealed the importance of the microRNA pathway as a whole, whereas others have pinpointed specific roles for individual microRNAs in various tissues.7-18

Although mature microRNAs are only ~22 nucleotides in length, they are generated from longer precursors whose length distribution is similar to that of mRNAs. Indeed, the primary transcripts (pri-miRNAs) are transcribed by RNA Polymerase II, capped, polyadenylated, and regulated by transcription factors like protein-coding mRNAs.19-21 Unlike mRNAs, miRNAs (because of their stem-loop structure) are cleaved within the nucleus by a Drosha/Dgcr8-containing complex into ~60- to 80-bp precursor miRNAs (pre-miRNAs).22-24 The precursor miRNAs are transported out of the nucleus by Exportin-5 25 and subsequently processed by a cytoplasmic RNase III, Dicer,26 which also resides in a multiprotein complex. Because the Piwi, Argonaute, and Zwille (PAZ) domain of Dicer recognizes the 2-nucleotide 3'OH overhang generated by Drosha/Dgcr8, it is believed that the nuclear Drosha/Dgcr8 cleavage is required for Dicer-mediated cytoplasmic cleavage of pre-miRNAs. Exceptions to this general dependence on Drosha/Dgcr8 occur in miRtrons and endogenous short hairpin RNAs,27-29 and in these rare cases other nucleases generate the necessary ends for subsequent Dicer recognition and cleavage.
Importantly, spatial segregation of Drosha/Dgcr8 and Dicer substrates allows for the two cleavage events to occur in a sequential manner.

We sought to uncover the regulatory potential of miRNAs in the heart by using 2 complementary approaches. First, we catalog the known miRNA population of murine adult heart using deep (Solexa/Illumina) sequencing of a small RNA library. Second, we disrupt microRNA regulation by deleting dgcr8 and hence canonical microRNA biogenesis. We chose to focus on mature muscle tissue to establish the importance of microRNA function in the maintenance (as opposed to the development) of cardiac tissue. Mice lacking dgcr8 in muscle tissue die prematurely with signs of heart failure and dilated cardiomyopathy. Identification of the depleted microRNAs in dgcr8-deficient hearts led to a refined list of microRNA targets that may collectively play an important role in the development of the pathological state. Thus, the importance of microRNA regulation in maintaining cardiomyocyte function is revealed by the fatal outcome associated with lack of dgcr8 in cardiomyocytes.

Methods
Details are included in the supplemental materials (available online at http://circres.ahajournals.org). Briefly, a library of small RNAs was generated and sequenced using the Illumina platform.30 For the generation of conditional dgcr8 knockout, floxed dgcr8 mice were crossed with Muscle Creatine Kinase (MCK)-Cre mice31; mutant mice were genotyped using tail DNA by a PCR-based approach. Age- and sex-matched mutant (2lox/2lox; Cre positive) and control (2lox/+; Cre positive) mice were analyzed pathologically; physiological studies were performed using telemetry and echocardiography. Molecular analyses were carried out using total RNA isolated from the heart; Northern blots were used to detect depletion of miR-1, miR-133, and miR-208. Array-based methods were employed to assess global loss of microRNAs.
Results
Deep Sequencing of microRNAs From Heart Tissue
High-throughput deep sequencing produces quantitative data with an extensive dynamic range, thereby enabling detailed insight into the relative levels of different microRNAs in a particular tissue. Therefore, to gain such insight into the microRNA profile of the adult heart, we isolated small RNAs (16 to 24 nucleotides) from 6- to 8-week-old male and female hearts, built tagged cDNA libraries and sequenced the libraries on an Illumina Genome Analyzer, producing over 7 million reads from each sample. As has been reported previously, miR-1 and miR-133a were highly abundant (Figure 1A); however, the relative abundance of miR-1 reads was quite striking. miR-1 accounted for nearly 40% of all known microRNA reads. Also noteworthy is the fact that other microRNAs, including miR-29a, miR-26a, and let-7 family members, were more abundant than miR-133a. MicroRNAs from noncardiomyocytes (miR-29a and miR-29c from fibroblasts,32 miR-126 from endothelial cells9,33) also contributed to the library, as expected because the different cell types in the heart were not separated. Within the cardiac-specific miR-208 subsets, 50 to 100 times more reads were obtained for miR-208a (encoded within an intron of myosin heavy chain 6 [Myh6]) when compared to miR-208b (encoded within an intron of Myh7), consistent with the relative overexpression of Myh6 compared to Myh7 in adult mice. miR-22 was highly expressed and showed gender-based differences in expression levels. Although sexually dimorphic gene expression patterns in somatic tissues have been established,34 follow-up experiments will need to be carried out to confirm sex-based differences in miRNA expression in the heart. Reads from the miR-378 hairpin were also high; miR-378/378* (miR-378* is the same as miR-422b) is encoded within an intron of the PPARγ coactivator-1β (PGC1b) gene.
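The relative abundances discussed here amount to converting read counts into percentages and, for the family-level view, summing reads over miRNAs that share a seed (nucleotides 2 to 8). The Python sketch below illustrates that bookkeeping under stated assumptions: the read counts are invented toy numbers (chosen so that miR-1 is 40% of the total), the sequences are written from memory of miRBase and should be checked before reuse, and grouping by exact seed identity is a simplification that stands in for the TargetScan family definitions used in the paper.

```python
from collections import defaultdict

# Mature sequences quoted from memory of miRBase; verify before reuse.
SEQUENCES = {
    "miR-1": "UGGAAUGUAAAGAAGUAUGUAU",
    "miR-206": "UGGAAUGUAAGGAAGUGUGUGG",
    "miR-133a": "UUUGGUCCCCUUCAACCAGCUG",
    "let-7a": "UGAGGUAGUAGGUUGUAUAGUU",
}

# Invented toy read counts, not data from the study.
TOY_COUNTS = {"miR-1": 400, "miR-206": 2, "miR-133a": 98, "let-7a": 500}


def seed(mirna_seq):
    """Seed region: nucleotides 2 to 8 (1-based) of the mature miRNA."""
    return mirna_seq[1:8]


def percent_of_reads(counts):
    """Express each miRNA's read count as a percentage of all miRNA reads."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def family_totals(counts, sequences):
    """Sum read counts over miRNAs that share an identical seed."""
    totals = defaultdict(int)
    for name, n in counts.items():
        totals[seed(sequences[name])] += n
    return dict(totals)


percentages = percent_of_reads(TOY_COUNTS)
families = family_totals(TOY_COUNTS, SEQUENCES)
for name in sorted(percentages, key=percentages.get, reverse=True):
    print(f"{name}\t{percentages[name]:.1f}%")
print(families)
```

In this toy set miR-1 and miR-206 fall into one seed family, so the few miR-206 reads barely change the family total.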
Because PPARγ coactivator-1α (PGC1a) and PGC1b regulate mitochondrial biogenesis and the heart is a mitochondria-rich organ, the high expression levels of miR-378/miR-378* probably reflect the high endogenous levels of PGC1b transcription.

Because the ability of a miRNA to repress target gene expression is largely dependent on the 5' end of the miRNA, multiple miRNAs with identical 5' ends are expected to function in a similar manner. This seed identity is the basis by which microRNAs are grouped into families. Therefore, we tabulated all the microRNA reads within individual families (as defined by TargetScan 4.1; www.targetscan.org; Figure 1B). By this analysis, the miR-1/206 family still emerged as the most dominant microRNA family (the reads from miR-206 were insignificant). Because a considerable number of reads were obtained individually from members of the let-7/miR-98 and miR-30a-5p families, these families were, respectively, the second and third most abundant microRNA families in the heart.

[Figure 1 panels A and B: bar charts of percentage of reads for male and female heart libraries.]

Figure 1. microRNA abundance in the murine adult heart. A, The top 20 known microRNAs (in terms of normalized read number) from the male (dark bars) and the female (clear bars) heart small RNA libraries were converted to percentage terms and plotted. Because the rank order differs slightly between the male and female libraries, the total number of microRNAs plotted is greater than 20. Note the abundance of miR-1 reads relative to other known microRNAs. B, microRNAs belonging to the same family (as defined by TargetScan; www.targetscan.org) were summed and plotted together. This analysis reveals that, aside from miR-1, the let-7 and miR-30 families are among the ones that are highly abundant in the heart.

Muscle-Specific Dgcr8 Knockout
The importance of the microRNA pathway during development has been largely inferred from studies in which Dicer has been deleted.18,35-39 As Dicer has roles outside of the canonical miRNA pathway, we sought to block microRNA maturation (and therefore microRNA-mediated regulation) using another component of the microRNA biogenesis pathway, namely Dgcr8. Dgcr8 deletion in embryonic stem cells has revealed that it is essential for microRNA biogenesis and implicates microRNAs in regulating efficient ES-cell differentiation.40 Using MCK-Cre mice31 and a conditional floxed allele of dgcr8, we generated mice with a muscle-specific deletion of the dgcr8 gene. Endogenous MCK expression reportedly peaks around birth and declines to 40% of peak levels by day 10.31 This Cre line was deliberately chosen to match our interest in specifically disrupting microRNA biogenesis in mature differentiated muscle, as this allowed us to determine the importance of the microRNA pathway in muscle homeostasis. Genotyping analysis showed that although mutant (2lox/2lox; Cre positive) mice were slightly underrepresented at the time of genotyping, most mutant mice survived to at least 12 days after birth. We did not observe any pathology on 4-chamber sections (H & E stained) at 2 weeks of age. At 3 weeks of age, we detected fibrosis in the ventricular wall in all mice examined, and loss of ventricular function (as revealed by transthoracic echocardiography; see below). Subsequently, all mutant mice died before 2 months of age and the median survival was 31 days (Figure 2A and 2B). At end stage, the hearts of mutant mice showed marked decreases in the thickness of the left and right ventricular walls. Therefore, the development of the pathology is quite rapid and highly penetrant. This demonstrates the stringent requirement for a threshold level of microRNAs below which heart function rapidly deteriorates.
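The comparison of observed and expected genotype numbers mentioned above can be worked through as a goodness-of-fit calculation. The Python sketch below is an illustration, not the authors' analysis. It takes the per-genotype counts of Cre-positive mice printed with Figure 2 (42, 144 and 76) and assumes that, among Cre-positive offspring of 2lox/+ intercrosses, the dgcr8 genotypes are expected in a 1:2:1 ratio; the expected numbers in the paper may have been derived differently. The function names are hypothetical.

```python
import math


def expected_counts(observed, ratio):
    """Distribute the observed total across classes according to `ratio`."""
    total = sum(observed)
    denom = sum(ratio)
    return [total * r / denom for r in ratio]


def chi_square(observed, expected):
    """Pearson chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom, for which the closed form is exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Counts printed with Figure 2, in the order 2lox/2lox, 2lox/+, +/+
# (all Cre positive). The 1:2:1 expectation is an assumption of this sketch.
observed = [42, 144, 76]
expected = expected_counts(observed, (1, 2, 1))
stat = chi_square(observed, expected)
p_value = chi_square_sf_2df(stat)

print(expected)
print(round(stat, 2), round(p_value, 4))
```

With three genotype classes and a fixed total there are 2 degrees of freedom, which is why the closed-form tail probability can be used without a statistics library.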
To determine the extent of microRNA depletion in the heart, we performed Northern blot and quantitative RT-PCR analyses to quantify cardiomyocyte-specific microRNAs (Figure 2C) with RNA derived from the heart tissue of mutant and control (2lox/+; Cre positive) mice. At the time of sacrifice (when mutant mice were moribund), Northern blot analysis showed that 3 cardiac-enriched mature microRNAs (miR-1, miR-133a, and miR-208) were dramatically depleted, but not completely absent, in mutant heart tissue. Quantification of the Northern blots revealed that, depending on the microRNA, the mature forms were depleted 10- to 60-fold (Figure 2C, bottom). Their precursor miRNAs (the ~60-bp product of Drosha/Dgcr8 cleavage) were detectable in the control lanes and absent in the mutant lanes (Figure 2D; shown for miR-208). The complete loss of the short-lived ~60-bp precursor, but not mature miRNA, favors the argument that the residual amount of mature miRNA detected is attributable to its long half-life, rather than an incomplete excision of dgcr8 in these tissues.

Circulation Research, September 11, 2009

Figure 2. Lethality and microRNA expression in muscle-specific dgcr8 KO mice. A, Actual numbers of Cre-positive mice (and the expected numbers, based on Punnett square analysis for 2 independent loci) obtained from matings between 2lox/+; Cre-positive mice are shown. Expected numbers are based on the assumption that the Cre transgene is heterozygous, although this is not known. B, Postnatal lethality of muscle-specific dgcr8 KO mice. Survival curves for Cre-positive mice are shown and reveal the lethality when dgcr8 is excised in muscle tissue. Moribund "hunched over" mice that had to be euthanized because of animal care committee specifications were considered dead for survival analysis. Survival curves were plotted using a built-in module in Prism software. C, miR-1, miR-133, and miR-208 expression was determined using Northern blots from total RNA derived from heart tissue. Tissues from 3 mutant (2lox/2lox; Cre-positive) and 3 sex- and age-matched control (2lox/+; Cre-positive) siblings were lysed for total RNA isolation; ages were 29 days (males), 29 days and 38 days (females). The same blots were reprobed for U6 (bottom part of each set) to normalize for differences in loading. Quantified miR/U6 ratios are plotted below for miR-1, miR-133, and miR-208, and the indicated probability values were obtained using 2-sample (unequal variance) 1-tailed t-tests. D, A larger region of the miR-208 Northern blot shown in Figure 2C(iii) reveals the absence of pre-miR-208 in total RNA derived from the mutant heart. [Recoverable panel labels: genotype groups 2lox/2lox; Cre positive (n=42), 2lox/+; Cre positive (n=144), and +/+; Cre positive (n=76); x axis, days; blots show miR-1, miR-133, and miR-208 (top) with U6 (bottom) in control and mutant lanes.]

The hearts of the mutant animals exhibited a variety of abnormalities that suggest cardiac dysfunction was responsible for their premature death. Preliminary ECG analysis revealed dramatic drops in the heart rate of mutant mice along with an increased PQ interval and QRS width (all at end stage), indicative of a cardiac conduction defect (supplemental Figure IV). Histopathologic analysis revealed that the hearts obtained from end-stage mutant mice were considerably enlarged with notable thinning of the ventricular walls (Figure 3A and 3B; note end-stage mutant hearts). Fibrosis was also evident (Figure 3C), an early and consistent pathological finding, as it is observed in all mice at about 3 weeks of age (at which time there was no histopathologically obvious defect in the thickness of the ventricular wall; Figure 3B and 3C). Quantitative RT-PCR analyses of cardiomyocyte-specific microRNAs were also carried out at end stage and at 2 weeks after birth. Precipitous decreases in miR-1, miR-133a, and miR-208 levels were detected in 2-week-old mice (Figure 3D), and this preceded any pathophysiological changes that we observed.

Figure 3. A, Intact excised hearts from 30-day-old mutant (2lox/2lox; Cre-positive; left) and control (2lox/+; Cre-positive; right) female sibling mice. B, Representative long axis sections at different stages (as indicated) from mutant and control sex-matched sibling mice were stained with H & E (for end stage: LV indicates left ventricle; RV, right ventricle) or Masson Trichrome (for d14/15 & d21/22). Bar=500 μm. The stage at which the mice were euthanized to reveal end-stage pathology was variable and defined by the health status of the mice and is d34 in this panel. C, High magnification (20x) view of Masson Trichrome stained sections from 3-week and end-stage (d43 in this panel) mutant and control female sibling mice; interstitial blue staining (bright green arrows) is indicative of fibrotic collagen deposits. Bars=50 μm. D, RT-PCR assay to detect mature miR-1, miR-133a, and miR-208 levels showing that the decline is already evident at 2 weeks after birth and continues by the time the mice are moribund ("end stage"). The ratio of mutant to control is shown on the y axis, and the pairs chosen for evaluation were age- and sex-matched.

To assess left ventricular function, we performed echocardiography. We conducted these studies at 2 time points, 3 weeks and 4 weeks after birth, as histopathologic analysis showed a dramatic progression between these 2 time points from mild fibrosis with otherwise no overt ventricular/wall defects (at 3 weeks) to extensive dilation (at 4 weeks). Accordingly, measurement of fractional shortening (FS) revealed that the mutant mice had dramatically reduced ventricular function at 4 weeks (supplemental Figure VI), although wall thickness was not significantly different. This finding was not surprising considering the clear histopathologic defects at this time point. The expectation at 3 weeks (Figure 4B) was more ambiguous because we noted fibrosis at this point but did not see an obvious defect in wall thickness or ventricular volume in tissue sections. However, echocardiography at 3 weeks after birth revealed that ventricular function (as assessed by FS readings) was decreased in mutant mice, and the trend toward increased ventricular volume was already evident (see numbers for EDD at 3 weeks in the table in Figure 4B).

Given the defects in ventricular function, one plausible explanation is that the myofibrillar apparatus was disorganized to the extent that contraction was ineffective. Such disarray has been noted in mice bearing a cardiac-specific loss of function allele of dicer.13 Ultrastructural analysis (supplemental Figure V) revealed mild myofibrillar disarray mostly related to misalignment of the contractile apparatus.

Figure 4. Trans-thoracic echocardiography. A, Representative short axis B and M mode images for both mutant (left) and control (right) mice at 3 weeks of age showing dilation in the mutant mice. B, Summary of echocardiographic data 3 weeks and 4 weeks after birth showing progressive dilation and reduction in ventricular function (see numbers for EDD and FS, respectively) between 3 and 4 weeks in mutant mice. WT indicates wall thickness; EDD, end-diastolic diameter; ESD, end-systolic diameter; FS, fractional shortening; HR, heart rate. *P<0.05 vs 2lox/+; Cre pos; †P<0.05 vs 3 weeks.

To determine whether pathology-associated cardiac markers are expressed and fetal genes are activated, we performed real-time RT-PCR analysis. Nppa and Nppb were expressed at higher levels in mutant heart (Figure 5). Myh7, a fetal myosin whose reexpression in adulthood is associated with heart failure, was also expressed at higher levels in the mutant hearts. Myh6, the normal adult cardiac myosin, within which miR-208 is encoded, was expressed at similar levels in control and mutant hearts; thus the decrease seen in mature miR-208 is not attributable to differences in the regulation of the host gene. These molecular assays complement the pathological and echocardiographic observations and are consistent with a diagnosis of dilated cardiomyopathy.

Next, we isolated RNA from mutant and control hearts to examine the expression of marker genes expressed in striated muscle (Figure 5). Cardiac markers were uniformly low in the mutant heart. In contrast, fast skeletal muscle markers were uniformly upregulated. One of the 3 slow skeletal markers (Tnni1) was also upregulated, whereas 2 (Tnnt1 and Tnnc1) were not; intriguingly, Tnni1 also has a miR-133 binding site in its 3' untranslated region, and part of its upregulation may be attributable to the loss of miR-133. The upregulation of skeletal muscle genes has been previously noted in other miR knockout mice.8,10 As misexpression of skeletal muscle isoforms in the heart can lead to impaired cardiac function,41 at least part of the observed pathology may be attributed to increased expression of fast skeletal muscle transcripts at the expense of cardiac genes.

Figure 5. Gene expression patterns in dgcr8 KO heart. Total RNA was obtained from end-stage (mutant) and sex- and age-matched control mice. A minimum of 7 pairs was used for the analysis of relative expression levels of the indicated genes. Box and whisker plots are represented for each gene. Expression is depicted as a ratio of mutant over control, with each being first normalized to GAPDH to account for differences in total amount of RNA used. Lack of a difference should manifest itself as having a ratio of 1.0 (dotted line). Although it is appreciated that some genes may fall into 2 or more categories during embryonic and postnatal development, for convenience they are grouped into a single category (as indicated) that is representative of adult gene expression.

An array-based profiling approach was carried out to compare relative levels of mature microRNAs in RNA derived from the hearts of mutant and control mice. MicroRNAs that are less abundant in the mutant heart when compared to the control heart are likely to be those enriched in cardiomyocytes (as dgcr8 is knocked out only in cardiomyocytes). Hence this analysis allows us to indirectly detect cardiomyocyte-enriched microRNAs. As expected, we detected precipitous declines in the levels of cardiomyocyte-specific miR-1, miR-133, miR-208, and miR-499 in the mutant hearts (supplemental Figure VII, compare to Figure 2C). Others that were decreased by greater than 2-fold, and therefore likely to be enriched within the cardiomyocytes, include miR-378/miR-378* (aka miR-422b), miR-22, miR-486, miR-30e*, miR-149, miR-709, miR-345, and members of the miR-30a-5p family (supplemental Figure VII).

To uncover the scope of regulation that is disrupted by the loss of the microRNA pathway, we carried out an in silico analysis. We chose the 10 microRNAs that were downregulated the most and used TargetScan to obtain a target list of mRNAs with conserved miRNA binding sites. Next, we extracted a published dataset42 that had compiled the list of genes that are expressed in the human heart.
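The comparison of the two lists amounts to a set intersection, which can be sketched as follows. The gene symbols are toy inputs for illustration (Mylk and Runx2 are named in the text; the remaining entries are placeholders), and `fraction_targeted` is a hypothetical helper name. Only the totals of 1140 and 7896 genes come from the reported analysis.

```python
def fraction_targeted(predicted_targets, heart_expressed):
    """Return the overlap of two gene lists and its share of the expressed set."""
    expressed = set(heart_expressed)
    overlap = set(predicted_targets) & expressed
    return overlap, len(overlap) / len(expressed)


# Toy inputs: predicted conserved targets and heart-expressed genes
predicted = {"Mylk", "Runx2", "PlaceholderA"}
expressed = {"Mylk", "Runx2", "PlaceholderB", "PlaceholderC"}

overlap, fraction = fraction_targeted(predicted, expressed)
print(sorted(overlap), fraction)  # ['Mylk', 'Runx2'] 0.5

# The reported totals correspond to roughly 14% of heart-expressed genes
print(round(1140 / 7896 * 100, 1))  # 14.4
```

Using sets makes the result independent of list order and of duplicate entries, which is a convenient property when gene lists come from different sources.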
The intersection of these 2 lists (supplemental Table I) provided us with a list of genes whose expression could be upregulated in the hearts of mutant mice. This analysis suggests that approximately 14% (1140/7896) of the genes expressed in the heart could potentially be upregulated because of the loss of these 10 microRNAs that we determined to be cardiomyocyte-enriched. Included among this list of targets are genes that are involved in GPCR signaling (endothelin receptors), calcium signaling (Calcineurin subunits), smooth muscle contraction (Mylk), and calcification (Runx2). Thus it is likely that the complex phenotype is at least in part attributable to the misexpression of a subset of these genes.

Discussion

miR-1 and miR-133a

Although many studies utilizing intertissue comparisons can attest to the abundance of miR-1 within muscle tissue, our study, by focusing on the intratissue abundance, has revealed a wide disparity between the levels of miR-1 and all the other microRNAs. This is especially significant when we compare miR-1 to miR-133, which are coregulated43 (albeit differentially spliced44) microRNAs. These results suggest that mechanisms other than transcription (eg, processing or stability) can dramatically alter steady state levels of mature miRNAs. Recent evidence from other labs has shown such posttranscriptional regulation for let-7 and miR-21.45-48 A second testament to the relative abundance of miR-1 is evident when comparing the levels of mature miR-1 and miR-208a. Because miR-208a is resident within a cardiac myosin, its levels should be representative of a highly transcribed miRNA. The fact that miR-1 levels far exceed those of miR-208a provides further indirect evidence for posttranscriptional mechanisms governing microRNA stability.
Lastly, as we are sampling a multitude of cell types in the heart, it is possible that within cardiomyocytes, the percentage of miR-1 in relation to other microRNAs is even higher, and further studies using purified cardiomyocytes will be needed to verify this possibility. The very high levels of miR-1 suggest it plays a central role in sustaining heart muscle function; indeed, previous analysis of a miR-1-2 knockout mouse8 confirmed the importance of miR-1 dosage in maintaining proper cardiac function. The homozygous loss of miR-1-2, 1 of 2 mir-1 loci, causes multiple defects in heart function.8 We await the generation of appropriate conditional knockout mice lacking both miR-1-1 and miR-1-2 to ascertain the singular importance of miR-1 in adult mouse myocardium. However, recent findings evaluating the mir-133a knockout mice clearly define its importance in cardiac development. Taken together, miR-1 and miR-133 appear to be attractive candidates for rescuing the phenotype associated with Dgcr8 loss. Our sequencing revealed that the miRBase annotation for miR-133a is offset by one nucleotide at the 5' end. The mature miRNA (which is the one that has the most common 5' end and is read most often) is UUGGUCCCCUUCAACCAGCUGU; the miRBase annotated miR-133a is UUUGGUCCCCUUCAACCAGCUG. Given the importance of the 5' end in determining target repression, this also changes the putative targets that may be repressed by miR-133a. We did obtain a significant number of counts for the miR-133 species annotated in miRBase; however, based on our criteria for miRNA classification, we denote the species with one less U as the mature miRNA. Other independent sequencing data (HR Chiang, unpublished data, 2009) confirm this 5' heterogeneity of mature miR-133a. We also performed microarrays to determine the fold change in microRNA levels in mutant versus wild-type hearts.
This strategy is particularly powerful as it enables the specific identification of microRNAs that are present in the cells expressing the cre transgene. Of note, the ranking of microRNAs that were most dramatically reduced in the mutant hearts as determined by microRNA microarrays did not directly match the relative amounts of individual microRNAs uncovered in the sequencing data. This difference is likely the result of a number of reasons. First, the sequencing data represent the miRNAs in all the cell types of the heart, rather than only the cells expressing the transgene. Second, the absolute level of any microRNA will not necessarily correlate with the fold decrease after Dgcr8 loss, as different miRNAs in the cardiomyocytes will almost certainly have different half-lives. Third, we isolated RNA from the hearts of slightly different aged mice for sequencing (6 to 8 weeks) versus the microarrays (4 to 5 weeks). Hence age-related differences may partly explain the differences between the array and the sequencing data sets.

Phenotype of Muscle-Specific Dgcr8 Knockout Mice

We have used a loss of function allele of dgcr8 to uncover the importance of the microRNA pathway in cardiac integrity. The phenotypic outcome is similar to that of the cardiac-specific dicer deficient mice,18 and this similarity in phenotypes has also been shown in mice bearing conditional alleles of dgcr8 and dicer in the skin.49 However, dgcr8 deficient mice have an advantage over dicer deficient mice, as the former can potentially be rescued by a Dicer-substrate short hairpin RNA designed to produce mature miRNAs. Hence it will be possible to define, in a fairly straightforward manner, the "minimal microRNA" requirements for different cell types derived from these mice. This approach has been successfully used to reveal microRNAs important in the cell cycle regulation of murine ES cells,50 and such an approach should be feasible in other cell types too.
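The point that fold depletion after Dgcr8 loss depends on half-life as well as on starting abundance can be made concrete with a first-order decay sketch. It assumes that production of new mature miRNA stops completely at time zero and that decay is exponential, with no dilution by cell growth. The half-lives and elapsed time below are hypothetical values chosen only to show the effect, not measurements from this study.

```python
def remaining_fraction(days_elapsed, half_life_days):
    """Fraction of a mature miRNA left after first-order decay."""
    return 0.5 ** (days_elapsed / half_life_days)


def fold_depletion(days_elapsed, half_life_days):
    """Fold decrease relative to the starting level."""
    return 1.0 / remaining_fraction(days_elapsed, half_life_days)


# Hypothetical half-lives (days) for two miRNAs, 20 days after biogenesis stops
for name, half_life in [("longer-lived miRNA", 10.0), ("shorter-lived miRNA", 5.0)]:
    print(name, fold_depletion(20.0, half_life))
# longer-lived miRNA 4.0
# shorter-lived miRNA 16.0
```

Under these assumptions a twofold difference in half-life produces a fourfold difference in fold depletion at the same time point, regardless of how abundant each miRNA was initially.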
The results from the muscle-specific dgcr8 knockout mice demonstrate the essential role of microRNA regulation in cardiac function. Although we have not identified the root cause of dilated cardiomyopathy and heart failure in our mice, our data clearly demonstrate a role for the microRNA pathway in proper functioning of the heart. Changes in ventricular diameters were further visible in transthoracic echocardiography. This resulted in a significant decrease in mutant ventricular function as assessed by fractional shortening at 4 weeks of age compared to control littermates. Furthermore, transthoracic echocardiography revealed that functional deterioration was already present at 3 weeks of age, with mutant mice showing markedly reduced fractional shortening. This functional deterioration preceded the dilation seen at 4 weeks of age (Figure 4). Heart rates and wall thickness were not significantly different in mutant mice (both at 3 weeks and at 4 weeks), supporting our histological observations that ruled out a hypertrophic phase prior to dilation. Ventricular walls from mutant mice did, however, exhibit a thinning trend that was detectable by echocardiography at 4 weeks (Figure 4B) and was very obvious histologically in end-stage mice (Figure 3B). Changes in microRNA levels have been noted to be secondary consequences of a stressed heart.51 We demonstrate that cardiomyocyte-specific microRNA levels are depleted before the occurrence of pathophysiological changes (Figure 3D). These data are consistent with the microRNA loss being causative and representing a primary event in the emergence of the phenotype we have described. In comparison to the other single-microRNA knockouts,8,10,52 the dgcr8 knockout exhibits a much more severe and penetrant phenotype. This is to be expected as a number of microRNAs are affected.
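Fractional shortening, the measure of ventricular function used above, is conventionally computed from the end-diastolic and end-systolic diameters as FS = (EDD - ESD) / EDD x 100. The sketch below applies that standard formula; the diameters are hypothetical values for illustration and are not taken from Figure 4B.

```python
def fractional_shortening(edd_mm, esd_mm):
    """Percent fractional shortening from end-diastolic and end-systolic diameters."""
    if edd_mm <= 0:
        raise ValueError("end-diastolic diameter must be positive")
    return (edd_mm - esd_mm) / edd_mm * 100.0


# Hypothetical diameters in millimetres
print(fractional_shortening(4.0, 2.0))  # 50.0
print(fractional_shortening(4.0, 3.0))  # 25.0
```

With the same end-diastolic diameter, a larger end-systolic diameter lowers FS, which is how reduced contraction shows up in this measure.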
Using a candidate gene approach and incorporating results from previously published work with the miR-208-/- mice,10 we interrogated and detected the upregulation of several fast skeletal muscle genes in the heart. As cardiac muscle is more akin to a "slow" muscle, the aberrant activation of fast skeletal genes could be pathological. Even though myofibrillar proteins are homologous, each striated muscle tissue has evolved to meet its particular needs, and previous studies have demonstrated that cardiac-specific overexpression of a skeletal muscle-specific protein can cause loss of cardiac function.41 Clearly, a widespread increase in skeletal gene expression, suggested by our candidate gene analysis (Figure 5), could contribute to the loss of cardiac function. Another aspect of pathological remodeling is the reestablishment of a fetal gene program in failing cardiomyocytes. Clearly this is also a consequence of Dgcr8 loss, as exemplified by an increase in Myh7, a fetal myosin.

Comparison With Other microRNA Deficient Mice

In comparing our phenotype to the heart-specific Dicer-deficient mice (using the α-MHC promoter-driven Cre),13 we note the following important differences. We always detect fibrosis (Figure 3C) and see marked increases in Myh7 expression (Figure 5) in mutant mice. These could be attributable to differences in the timing of Cre-mediated excision, implying that loss of microRNA regulation at different times leads to different phenotypic outcomes. We also note that the recent report describing the knockout of dicer in the adult myocardium18 shows a broadly similar, but not identical, phenotype to the dgcr8 knockout (KO) mice described herein. Importantly, we did not detect any hypertrophy during routine pathological staining, and the pathology of our mutant mice is more consistent with dilated cardiomyopathy that eventually leads to a phenotype that resembles human heart failure.
Because the timing of the cre-mediated excision can cause different phenotypic outcomes,18 direct comparison of the two dicer KO studies with this dgcr8 KO (using different promoters driving Cre) is complicated; nonetheless, all these studies point to the clear importance of the microRNA pathway in cardiac function. A recent report from the Olson group described the phenotype of a complete knockout of miR-133a.52 Pathologically, the dgcr8 muscle KO mice are very similar to the fraction of 133a-1ko; 133a-2ko (133a dKO) mice that survive to 2 or 4 months of age, with fibrosis and ventricular wall thinning as common features. Similar to the 133a dKO mice, we did not detect any gross hypertrophy before the advent of dilation; however, in contrast to the 133a dKO mice, we detect only mild myofibrillar disarray in our ultrastructural analysis. In both the 133a dKO mice and the cardiac-specific dicer KO mice, deletion of the cognate genomic region occurs early and hence may have a more profound effect on the arrangement of myofibrils. When dgcr8 expression (and therefore microRNA-mediated regulation) is perturbed after the establishment of the myofibrillar array, the requirement for an intact microRNA pathway may be less stringent (as is the case here). Nonetheless, overall pathological similarity suggests that one reason for the myopathy seen in the dgcr8 mice could be its lack of mature miR-133a. Fast skeletal gene expression has also been noted in the hearts of miR-208-/- mice, and this particular phenotypic characteristic could be a consequence of low levels of miR-208. However, miR-208-/- mice do not lose cardiac function, and additional microRNAs have to be implicated in describing the complete phenotype associated with loss of Dgcr8 in the heart.
Cardiac Heterochrony as a Possible Mechanism Underlying the Dramatic Phenotype

An alternative, equally feasible (albeit speculative) explanation for the sustained expression of fetal gene markers in adulthood is that the lack of the microRNA pathway leads to an arrest in the development of the heart, such that late embryonic/prenatal gene expression patterns are maintained in the adult. Indeed, such heterochronic events in C. elegans were instrumental in the identification of the founding members of microRNAs,53-55 and recent reports have confirmed that this heterochronic pathway is conserved.56-58 Two candidate genes that we have examined are normally repressed in adult tissues but continue to be expressed when Dgcr8 is absent: Myh7 (a fetal myosin) and Tnni1 (a slow skeletal muscle-specific troponin-complex subunit). Under normal physiological conditions, Tnni1 levels are downregulated in the heart after birth,59 but it continues to be expressed at relatively higher levels in the dgcr8 KO heart. Importantly, Tnni1, as opposed to Myh7, is not expressed at higher levels in a pathological state,59 and therefore its overexpression cannot be attributed to the cardiac reprogramming that occurs secondary to a failing heart. This observation suggests that the lack of the dgcr8 gene can cause a heterochronic phenotype (which in turn is incompatible with adult heart function). In addition, our in silico analysis (supplemental Table I) suggests that a large number of genes that are expressed in the heart are susceptible to microRNA-mediated regulation. Comparison of temporal mRNA expression profiles in mutant mice and wild-type mice will aid in the identification of the primary targets of the microRNA pathway as well as provide evidence for the existence of cardiac heterochrony in the mutant mice. In summary, we have, through complementary approaches, ascertained the importance of the microRNA pathway in maintaining cardiomyocyte function.
We have identified high abundance microRNAs in the heart by performing intratissue comparisons, and the preeminent position of miR-1 within muscle tissue has been quantitatively established. In addition, the requirement of the microRNA pathway in cardiac muscle maintenance has been unequivocally established (as its absence is lethal). Based on the results described, we suggest 2 distinct but related mechanisms to explain the drastic loss of cardiac function. The first implicates fast skeletal muscle gene expression as a plausible causative factor in loss of cardiac function. Second, the loss of microRNA function may be causing cardiac heterochrony, which ultimately leads to heart failure. In addition, our data from the deep sequencing suggest that loss of a few microRNAs, including miR-1 and miR-133a, may ultimately be responsible for the dramatic loss of function seen in Dgcr8 deficient cardiomyocytes.

Acknowledgments

We thank Ron Kahn (Joslin Diabetes Center) for the MCK-Cre line and members of the Lodish and Bartel laboratories for their insightful comments. We are indebted to Carsten Russ and the Broad Sequencing Platform for carrying out Illumina/Solexa sequencing runs at the Broad Institute.

Sources of Funding

This work was supported by the following grants: PKR (Muscular Dystrophy Association-3882); RB (NIH K08 NS48118-01, NIH R01 NS057221, Stem Cell Research Foundation, and the Pew Scholars Program in the Biomedical Sciences); HFL (NIH-R01 DK068348-04 and a SPARC grant from the Broad Institute); MK (NIH-HL52212); HFL & MK (NIH/NHLBI P01-HL066105); RJ (NIH R01-CA087869, NIH R37-CA084198, and NIH R01-HD0445022); RL (NIH R01s HL071775, HL088533, HL090884, and HL093148); MB (Max Kade Foundation, Austria); HRC (NIH R01 GM067031 to David Bartel).

Disclosures

None.

References

1. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281-297. 2. Griffiths-Jones S, Saini HK, van Dongen S, Enright AL.
miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:D154-8. 3. Vasudevan S, Tong Y, Steitz JA. Switching from repression to activation: microRNAs can up-regulate translation. Science. 2007;318:1931-1934. 4. Orom UA, Nielsen FC, Lund AH. MicroRNA-10a binds the 5'UTR of ribosomal protein mRNAs and enhances their translation. Mol Cell. 2008;30:460-471. 5. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction of mammalian microRNA targets. Cell. 2003;115:787-798. 6. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15-20. 7. Xiao C, Calado DP, Galler G, Thai TH, Patterson HC, Wang J, Rajewsky N, Bender TP, Rajewsky K. MiR-150 controls B cell differentiation by targeting the transcription factor c-Myb. Cell. 2007;131:146-159. 8. Zhao Y, Ransom JF, Li A, Vedantham V, von Drehle M, Muth AN, Tsuchihashi T, McManus MT, Schwartz RJ, Srivastava D. Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2. Cell. 2007;129:303-317. 9. Wang S, Aurora AB, Johnson BA, Qi X, McAnally J, Hill JA, Richardson JA, Bassel-Duby R, Olson EN. The endothelial-specific microRNA miR-126 governs vascular integrity and angiogenesis. Dev Cell. 2008;15: 261-271. 10. van Rooij E, Sutherland LB, Qi X, Richardson JA, Hill J, Olson EN. Control of stress-dependent cardiac growth and gene expression by a microRNA. Science. 2007;316:575-579. 11. Thai TH, Calado DP, Casola S, Ansel KM, Xiao C, Xue Y, Murphy A, Frendewey D, Valenzuela D, Kutok JL, Schmidt-Supprian M, Rajewsky N, Yancopoulos G, Rao A, Rajewsky K. Regulation of the germinal center response by microRNA-155. Science. 2007;316:604-608. 12. Yi R, O'Carroll D, Pasolli HA, Zhang Z, Dietrich FS, Tarakhovsky A, Fuchs E. Morphogenesis in skin is governed by discrete sets of differentially expressed microRNAs. Nat Genet. 2006;38:356-362. 13. 
Chen JF, Murchison EP, Tang R, Callis TE, Tatsuguchi M, Deng Z, Rojas M, Hammond SM, Schneider MD, Selzman CH, Meissner G, Patterson C, Hannon GJ, Wang DZ. Targeted deletion of Dicer in the heart leads to dilated cardiomyopathy and heart failure. Proc Natl Acad Sci U S A. 2008;105:2111-2116. 14. Finnegan EJ, Margis R, Waterhouse PM. Posttranscriptional gene silencing is not compromised in the Arabidopsis CARPEL FACTORY (DICER-LIKE1) mutant, a homolog of Dicer-1 from Drosophila. Curr Biol. 2003;13:236-240. 15. Cuellar TL, Davis TH, Nelson PT, Loeb GB, Harfe BD, Ullian E, McManus MT. Dicer loss in striatal neurons produces behavioral and neuroanatomical phenotypes in the absence of neurodegeneration. Proc Natl Acad Sci U S A. 2008;105:5614-5619. 16. Kobayashi T, Lu J, Cobb BS, Rodda SJ, McMahon AP, Schipani E, Merkenschlager M, Kronenberg HM. Dicer-dependent pathways regulate chondrocyte proliferation and differentiation. Proc Natl Acad Sci U S A. 2008;105:1949-1954. 17. Koralov SB, Muljo SA, Galler GR, Krek A, Chakraborty T, Kanellopoulou C, Jensen K, Cobb BS, Merkenschlager M, Rajewsky N, Rajewsky K. Dicer ablation affects antibody diversity and cell survival in the B lymphocyte lineage. Cell. 2008;132:860-874. 18. da Costa Martins PA, Bourajjaj M, Gladka M, Kortland M, van Oort RJ, Pinto YM, Molkentin JD, De Windt LJ. Conditional dicer gene deletion in the postnatal myocardium provokes spontaneous cardiac remodeling. Circulation. 2008;118:1567-1576. 19. Rao M. Conserved and divergent paths that regulate self-renewal in mouse and human embryonic stem cells. Dev Biol. 2004;275:269-286. 20. Cai X, Hagedorn CH, Cullen BR. Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA. 2004;10:1957-1966. 21. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN. MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 2004;23:4051-4060. 22.
Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, Kim VN. The nuclear RNase III Drosha initiates microRNA processing. Nature. 2003;425:415-419. 23. Denli AM, Tops BB, Plasterk RH, Ketting RF, Hannon GJ. Processing of primary microRNAs by the Microprocessor complex. Nature. 2004;432:231-235. 24. Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N, Shiekhattar R. The Microprocessor complex mediates the genesis of microRNAs. Nature. 2004;432:235-240. 25. Yi R, Qin Y, Macara IG, Cullen BR. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev. 2003;17:3011-3016. 26. Bernstein E, Caudy AA, Hammond SM, Hannon GJ. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature. 2001;409:363-366. 27. Babiarz JE, Ruby JG, Wang Y, Bartel DP, Blelloch R. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 2008;22:2773-2785. 28. Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell. 2007;130:89-100. 29. Ruby JG, Jan CH, Bartel DP. Intronic microRNA precursors that bypass Drosha processing. Nature. 2007;448:83-86. 30. Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N, Degnan BM, Rokhsar DS, Bartel DP. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature. 2008;455:1193-1197. 31. Bruning JC, Michael MD, Winnay JN, Hayashi T, Horsch D, Accili D, Goodyear LJ, Kahn CR. A muscle-specific insulin receptor knockout exhibits features of the metabolic syndrome of NIDDM without altering glucose tolerance. Mol Cell. 1998;2:559-569. 32. van Rooij E, Sutherland LB, Thatcher JE, DiMaio JM, Naseem RH, Marshall WS, Hill JA, Olson EN. Dysregulation of microRNAs after myocardial infarction reveals a role of miR-29 in cardiac fibrosis. Proc Natl Acad Sci U S A. 2008;105:13027-13032. 33.
Fish JE, Santoro MM, Morton SU, Yu S, Yeh RF, Wythe JD, Ivey KN, Bruneau BG, Stainier DY, Srivastava D. miR-126 regulates angiogenic signaling and vascular integrity. Dev Cell. 2008;15:272-284. 34. Rinn JL, Snyder M. Sexual dimorphism in mammalian gene expression. Trends Genet. 2005;21:298-305. 35. Bernstein E, Kim SY, Carmell MA, Murchison EP, Alcorn H, Li MZ, Mills AA, Elledge SJ, Anderson KV, Hannon GJ. Dicer is essential for mouse development. Nat Genet. 2003;35:215-217. 36. Harfe BD, McManus MT, Mansfield JH, Hornstein E, Tabin CJ. The RNaselI enzyme Dicer is required for morphogenesis but not patterning of the vertebrate limb. Proc Natl Acad Sci U S A. 2005;102: 10898-10903. 37. Schaefer A, O'Carroll D, Tan CL, Hillman D, Sugimori M, Llinas R, Greengard P. Cerebellar neurodegeneration in the absence of microRNAs. J Exp Med. 2007;204:1553-1558. 38. Chen JF, Mandel EM, Thomson JM, Wu Q, Callis TE, Hammond SM, Conlon FL, Wang DZ. The role of microRNA-1 and microRNA-133 in skeletal muscle proliferation and differentiation. Nat Genet. 2006;38: 228-233. 39. O'Rourke JR, Georges SA, Seay HR, Tapscott SJ, McManus MT, Goldhamer DJ, Swanson MS, Harfe BD. Essential role for Dicer during skeletal muscle development. Dev Biol. 2007;311:359-368. 40. Wang Y, Medvid R, Melton C, Jaenisch R, Blelloch R. DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal. Nat Genet. 2007;39:380-385. 41. Huang QQ, Feng HZ, Liu J, Du J, Stull LB, Moravec CS, Huang X, Jin JP. Co-expression of skeletal and cardiac troponin T decreases mouse cardiac function. Am J Physiol Cell Physiol. 2008;294:C213-C22. 42. Hannenhalli S, Putt ME, Gilmore JM, Wang J, Parmacek MS, Epstein JA, Morrisey EE, Margulies KB, Cappola TP. Transcriptional genomics asso- 594 43. 44. 45. 46. 47. 48. 49. 50. 51. Circulation Research September 11, 2009 ciates FOX transcription factors with human heart failure. Circulation. 2006;114:1269-1276. 
Rao PK, Kumar RM, Farkhondeh M, Baskerville S, Lodish HF. Myogenic factors that regulate expression of muscle-specific microRNAs. Proc Natl Acad Sci U S A. 2006;103:8721-8726. Liu N, Williams AH, Kim Y, McAnally J, Bezprozvannaya S, Sutherland LB, Richardson JA, Bassel-Duby R, Olson EN. An intragenic MEF2dependent enhancer directs muscle-specific expression of microRNAs 1 and 133. Proc Natl Acad Sci U S A. 2007;104:20844-20849. Davis BN, Hilyard AC, Lagna G, Hata A. SMAD proteins control DROSHA-mediated microRNA maturation. Nature. 2008;454:56-61. Heo I, Joo C, Cho J, Ha M, Han J, Kim VN. Lin28 mediates the terminal uridylation of let-7 precursor MicroRNA. Mol Cell. 2008;32:276-284. Newman MA, Thomson JM, Hammond SM. Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA. 2008;14:1539-1549. Viswanathan SR, Daley GQ, Gregory RI. Selective blockade of microRNA processing by Lin28. Science. 2008;320:97-100. Yi R, Pasolli HA, Landthaler M, Hafner M, Ojo T, Sheridan R, Sander C, O'Carroll D, Stoffel M, Tuschl T, Fuchs E. DGCR8-dependent microRNA biogenesis is essential for skin development. ProcNatl Acad Sci USA. 2008;106:498-502. Wang Y, Baskerville S, Shenoy A, Babiarz JE, Baehner L, Blelloch R. Embryonic stem cell-specific microRNAs regulate the G1-S transition and promote rapid proliferation. Nat Genet. 2008;40:1478-1483. Ikeda S, Kong SW, Lu J, Bisping E, Zhang H, Allen PD, Golub TR, Pieske B, Pu WT. Altered microRNA expression in human heart disease. Physiol Genomics. 2007;31:367-373. 52. Liu N, Bezprozvannaya S, Williams AH, Qi X, Richardson JA, Bassel-Duby R, Olson EN. microRNA-133a regulates cardiomyocyte proliferation and suppresses smooth muscle gene expression in the heart. Genes Dev. 2008;22:3242-3254. 53. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843-854. 54. 
Molecular Cell

Expanding the MicroRNA Targeting Code: Functional Sites with Centered Pairing

Chanseok Shin,1,2,3,5,6 Jin-Wu Nam,1,2,3,6 Kyle Kai-How Farh,1,2,3,4 H. Rosaria Chiang,1,2,3 Alena Shkumatava,1,2,3 and David P. Bartel1,2,3,*
1 Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2 Howard Hughes Medical Institute
3 Department of Biology
4 Division of Health Sciences and Technology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
5 Department of Agricultural Biotechnology, Seoul National University, Seoul, 151-921, Republic of Korea
6 These authors contributed equally to this work
*Correspondence: dbartel@wi.mit.edu
DOI 10.1016/j.molcel.2010.06.005

SUMMARY

Most metazoan microRNA (miRNA) target sites have perfect pairing to the seed region, located near the miRNA 5' end. Although pairing to the 3' region sometimes supplements seed matches or compensates for mismatches, pairing to the central region has been known to function only at rare sites that impart Argonaute-catalyzed mRNA cleavage. Here, we present "centered sites," a class of miRNA target sites that lack both perfect seed pairing and 3'-compensatory pairing and instead have 11-12 contiguous Watson-Crick pairs to the center of the miRNA. Although centered sites can impart mRNA cleavage in vitro (in elevated Mg2+), in cells they repress protein output without consequential Argonaute-catalyzed cleavage. Our study also identified extensively paired sites that are cleavage substrates in cultured cells and human brain. This expanded repertoire of cleavage targets and the identification of the centered site type help explain why central regions of many miRNAs are evolutionarily conserved.

INTRODUCTION

MicroRNAs (miRNAs) are a class of ~22 nucleotide (nt) RNAs that direct the posttranscriptional repression of protein-coding genes (Bartel, 2004). After processing from hairpin precursors, miRNAs are loaded into Argonaute-containing silencing complexes, which downregulate mRNA targets through two distinct modes, either Argonaute-catalyzed cleavage or a second mode that involves mRNA destabilization and translational repression, at least in part through poly(A) shortening (Filipowicz et al., 2008).
Argonaute-catalyzed cleavage of the target strand occurs in the context of extensive base pairing, at the linkage joining the mRNA nucleotides that pair to miRNA positions 10 and 11 (Elbashir et al., 2001b; Hutvágner and Zamore, 2002; Yekta et al., 2004). In mammals, this slicing activity is catalyzed by Argonaute2 (AGO2), which leaves a 3' hydroxyl on the 5' cleavage fragment and a 5' monophosphate on the other fragment (Liu et al., 2004; Meister et al., 2004; Schwarz et al., 2004). Unlike miRNAs in plants, very few examples of miRNA-dependent cleavage targets have been reported in mammals (Yekta et al., 2004; Davis et al., 2005; Jones-Rhoades et al., 2006). Nonetheless, artificially designed small interfering RNAs (siRNAs) that silence target genes through this mechanism are widely used reagents, illustrating that in principle, the cleavage mode of repression can function in many contexts and for many targets (Elbashir et al., 2001a).

Sites that confer slicing-independent destabilization and translational repression typically pair to the 5' region of the miRNA, centering on miRNA nt 2-7, known as the miRNA seed (Lewis et al., 2005; Bartel, 2009). Introducing an siRNA/miRNA or deleting an endogenous miRNA leads to modest yet detectable changes in the output of hundreds of genes containing seed sites in their 3' UTRs (Krützfeldt et al., 2005; Lim et al., 2005; Giraldez et al., 2006; Grimson et al., 2007; Rodriguez et al., 2007; Baek et al., 2008; Selbach et al., 2008). Moreover, most mammalian protein-coding genes are under selection to maintain pairing to the seed of one or more miRNAs, and thousands of genes have also evolved to specifically avoid pairing to the seeds of preferentially coexpressed miRNAs (Farh et al., 2005; Lewis et al., 2005; Stark et al., 2005; Friedman et al., 2009).
These observations illustrate both the broad scope of seed-type regulation and the widespread influence of this targeting mode on mRNA evolution.

Pairing to the 3' region of the miRNA can supplement seed pairing to enhance target recognition, or it can even compensate for a mismatch to the seed; such sites are known as "3'-supplementary sites" and "3'-compensatory sites," respectively (Bartel, 2009). However, pairing to the 3' region appears to be consequential for relatively few sites (<10%) (Bartel, 2009). In principle, pairing to the central region of the miRNA could also supplement pairing to the other regions of the miRNA, but a role for such pairing has been demonstrated only for sites that mediate Ago-catalyzed cleavage and not for sites that mediate destabilization and translation repression.

Here, we describe "centered sites," a class of miRNA target sites that lack both perfect seed pairing and 3'-compensatory pairing and instead have 11-12 contiguous Watson-Crick pairs to miRNA nt 4-15. In the process of characterizing these sites, we found that Mg2+ concentration profoundly influences both the specificity and efficacy of miRNA-directed cleavage, and we performed whole-transcriptome analyses that substantially add to the number of known instances in which metazoan miRNAs direct mRNA cleavage.

RESULTS

A Class of miRNA Target Site

The most highly conserved region of metazoan miRNAs is the 5' region containing the seed (Lewis et al., 2003; Lim et al., 2003), which is the region most important for recognizing most targets. The next most highly conserved region spans nt 13-16, which is the region most important for 3'-supplementary and 3'-compensatory pairing (Grimson et al., 2007). Although the central region of vertebrate miRNAs is less conserved than other miRNA regions, we noted that it is significantly more conserved than is the opposite arm of the pre-miRNA hairpin (Figures 1A and S1A). Because both arms participate equally in the pairing required to form the pre-miRNA hairpin, preferential conservation of the miRNA observed in this region suggested that these central nucleotides play a role beyond that of miRNA biogenesis. One such role would be to aid in target recognition, but among the previously characterized targeting modes, the central region is known to function only for cleavage sites, which seemed too rare to provide the additional selective pressure for conserving nucleotide identity at the miRNA central regions. Therefore, we searched for another type of site that might explain this preferential conservation.

Examination of array data investigating the response of mRNAs after transfecting 11 miRNAs into HeLa cells (Lim et al., 2005; Grimson et al., 2007) revealed a type of site that was associated with mRNA downregulation (Figure S1B). This site type, which we call the "centered site," was characterized by at least 11 nt of contiguous Watson-Crick base pairing to the center region of the miRNA at either nt 4-14 or 5-15, without substantial pairing to either the 5' or the 3' ends of the miRNA. Because of the location and extent of their base pairing, these sites occupy a position intermediate between seed sites and the extensively complementary cleavage sites (Figure 1B). Because these sites are relatively rare, pooling of data from multiple miRNA transfections was initially required to achieve statistical significance for determining their efficacy. In order to systematically analyze these sites, we therefore compiled additional array data from HeLa experiments with similarly transfected miRNAs and siRNAs (Birmingham et al., 2006; Jackson et al., 2006a, 2006b; Schwarz et al., 2006; Anderson et al., 2008). To ensure that the transfected mi/siRNAs were loaded and active within the silencing complex, the pooled data sets were restricted to those of the 78 HeLa experiments for which the canonical 8-mer 3' UTR site to the transfected mi/siRNA was associated with downregulated mRNAs with high statistical significance (p < 0.0001, K-S test) (Table S1). Testing matches that did not include the canonical seed match to miRNA positions 2-7 showed that perfect 11-mer matches starting at miRNA positions 3, 4, and 5 were each significantly associated with repression (Figure 1C), whereas perfect 10-mer matches, perfect 9-mer matches, and near-perfect 11-mer matches (those with single mismatches or wobbles) were not significantly associated with repression (Figures S1C and S1D).

[Figure 1 graphic, panels A-G. Recoverable axis labels: "5' position of 4-mer", "5' position of 11-mer", "Expression fold change (log2)". Panel B aligns three site types to miR-196a (3'-GGGUUGUUGUACUUUGAUGGAU-5'): an 8-mer seed-matched site (mRNA 3' UTR, 5'-nnnnnnnnnnnnnnACUACCUA-3'), a centered site (5'-nnnnnnnACAUGAAACUACnnn-3'), and a cleavage site (HOXB mRNA 3' UTR, 5'-CCCAACAACAUGAAACUGCCUA-3'). Panel G shows wild-type and mutant sites for the miR-124 targets PLXNA1 and RAPTOR and the miR-1 targets VAMP1 and ZNF586.]

Figure 1. Centered Sites Regulate Both mRNA Accumulation and Protein Output
(A) Conservation of 4 nt segments of mammalian miRNAs. As schematized in Figure S1A, segments from the mature miRNA (orange) were compared with opposing segments from the other arm of the hairpin (gray). At each position of the mouse miRNAs, the number of segments perfectly conserved in the whole-genome alignments of the other 29 species (Blanchette et al., 2004) is plotted.
(B) Spectrum of functional miRNA target sites.
(C) Response of HeLa mRNAs with contiguous perfect pairing to 11-mer sites starting at the indicated positions of 78 transfected mi/siRNAs. Plotted are mean fold changes for mRNAs with 3' UTR sites to the cognate mi/siRNAs (gray) and the cohorts of chimeric miRNA-like control sequences (white, error bars indicate standard deviation for ten cohorts). Also plotted is the mean expression change for mRNAs with a shifted 6-mer 3' UTR site (purple), which includes all mRNAs with 11-mer sites starting at position 3. To assess statistical significance, the distribution of log2-fold changes for genes with sites was compared with the distribution of log2-fold changes for genes without sites (*p < 0.05; **p < 0.01; K-S test).
(D) In vivo response of mRNAs with 11-mer sites to endogenous miRNAs after loss of all miRNAs. Plotted is the fold change of zebrafish mRNAs with 11-mer sites to 21 endogenous miRNAs depleted in the dicer mutant. Otherwise, as in (C).
(E) Reduced levels of HeLa messages with either centered sites or canonical sites to 78 transfected mi/siRNAs. Shown is analysis of microarray data, plotting cumulative changes of mRNAs with single 3' UTR sites of the indicated types. Canonical sites (8-mer, 7m8, 7A1, and 6-mer) had either 8, 7, or 6 nt matches centered on the miRNA seed (Bartel, 2009); centered sites had 11 contiguous Watson-Crick pairs to miRNA positions 4-14 or 5-15; control sites were centered sites to the chimeric miRNA-like control sequences (combining results for all ten cohorts).
(F) Efficacy of endogenous centered sites in vivo. Shown is analysis of microarray data, plotting cumulative changes of zebrafish mRNAs with single 3' UTR sites to 21 miRNAs depleted in the dicer mutant. Otherwise, as in (E).
(G) miRNA-mediated repression at centered sites. Shown is the fold repression of luciferase reporter genes fused to 3' UTR fragments of the indicated genes with the indicated sites or mutant sites. Plotted are the geometric means, normalized to the geometric means observed for reporters with mutant sites. Error bars represent the second largest and the second smallest values among 12 replicates (from four independent experiments). Statistical significance is indicated (**, p value < 0.001, Wilcoxon rank-sum test).
See also Figure S1 and Tables S1-S4.

The efficacy of centered sites matching ectopically introduced miRNAs and siRNAs raised the possibility that such sites might also mediate endogenous miRNA targeting. Array results examining the effects of miRNA loss in zebrafish embryos lacking Dicer provided data on a sufficient number of messages with centered sites to enable a systematic analysis of targeting interactions in vivo. miRNAs present at 24 hr postfertilization, the developmental stage used for mRNA analysis, were identified by high-throughput sequencing (Table S2). Sites were considered for 21 of these miRNAs for which the canonical 8-mer 3' UTR sites were significantly associated with mRNAs derepressed in the dicer mutant (p < 0.01) (Table S2). As observed for the ectopic interactions, perfect 11-mer matches starting at miRNA positions 3, 4, and 5 were each associated with significant repression, although efficacy of sites starting at position 3 was mostly attributable to overlap with the "shifted 6-mer" seed match (Figure 1D), which comprises pairing to nt 3-8 (Friedman et al., 2009). Perfect 10-mer matches, perfect 9-mer matches, and near-perfect 11-mer matches generally were not associated with significant repression, although for a few of the numerous possibilities examined, marginal significance was observed (Figures S1E and S1F).

When considering both ectopic and endogenous interactions, contiguous Watson-Crick 3' UTR pairing to the central region of the miRNA, at either nt 4-14 or nt 5-15, was unique among the tested possibilities in that it was both consistently associated with mRNA repression and not attributable to overlap with previously described site types. A previous array study had reported a handful of siRNA off-targets with similarly long stretches of contiguous Watson-Crick base pairing, but these sites were offset further toward the 3' end of the miRNA, at nt 6-16 or 7-17 (Jackson et al., 2003), a region not significantly associated with targeting when examined using our larger data sets. Effective miRNA target-prediction algorithms rely heavily on perfect pairing to the seed region and thus miss this additional class of targets (Bartel, 2009).

The transfected mi/siRNAs had an average of 11 and a median of eight centered sites in 3' UTRs of human mRNAs. About one-quarter of the mRNAs with a centered site lacked conventional seed sites to the transfected RNA and were sufficiently expressed in HeLa such that changes could be accurately measured on the arrays. Analysis of cumulative distributions of log-fold changes indicated that >20% of these mRNAs responded to the transfected mi/siRNAs in a manner attributable to the site, with a lower bound for site efficacy resembling that of canonical 7-mer sites (Figure 1E). Likewise, >30% of the endogenous centered sites analyzed appeared to mediate repression in zebrafish embryos (Figure 1F).

[Figure 2 graphic, panels A-C. Site sequences recoverable from panel A: K89, 5'-GAUAAAAAUUCAGUCUGAUAACCUCAAA-3'; FL9, 5'-GUUGGCCCACUAGUCUGAUAAGAAGUCU-3'; GSTM3, 5'-AGAAGUUUUUCAGUCUGAUAACUAUUGA-3'; NFI, 5'-CGAAAAUGGCAAACUACUACUACUACU-3'; miR-21, 3'-AGUUGUAGUCAGACUAUUCGAU-5'; let-7g, 3'-UGACAUGUUUGAUGAUGGAGU-5'. Panel B gel lanes cover time courses of 0 to 20 min for each substrate and its antisense (as) control.]

Figure 2. miRNA-Directed Cleavage at Centered Sites
(A) Centered sites for miR-21 or let-7g within 3' UTR fragments of the indicated mRNAs tested in cleavage assays. K89, KIAA1189; FL9, FLJ40919; NFI, NFIA (sequences provided in Supplemental Information).
(B) Cleavage of target fragments directed by endogenous miRNAs in vitro. 5'-Cap-radiolabeled mRNA fragments were incubated in HeLa cytoplasmic S100 extract for the indicated time and analyzed on denaturing gels. As a control, fragments modified to be fully complementary to the cognate miRNA, designated as antisense (as) substrates, were tested and analyzed in parallel. Whereas most fragments were cleaved predominantly at the expected site, NFI was cleaved at two positions (*, Figure S2), as is sometimes observed in vitro (Martinez and Tuschl, 2004).
(C) miR-21-directed cleavage of GSTM3 mRNA in HeLa cells. 5'-RACE using primers specific for GSTM3 mRNA was performed on mRNA isolated from HeLa cells treated with two siRNAs targeting XRN1. Seven of nine sequenced clones mapped to the position expected for miR-21-directed cleavage. The other two clones mapped 52 nt downstream.

To examine whether centered sites also function in other animals, we analyzed mRNA array data sets monitoring the impact of knocking down proteins required for Drosophila miRNA biogenesis (Kadener et al., 2009).
Following either Drosha or Dicer1 knockdown in Drosophila S2 cells, messages with 3' UTR centered sites matching the endogenous miRNAs had a significant propensity to be derepressed (Figures S1G and S1H, p = 0.00045 and 0.027, respectively, for Drosha and Dicer1 knockdown data sets; Tables S3 and S4). To confirm that centered sites can be directly targeted by miRNAs, luciferase reporter constructs and their mutant counterparts with disrupted pairing were prepared and tested in both HeLa cells and S2 cells (Figures 1G and S1I). For three of the four UTR fragments tested, the sites reduced protein output in a manner that depended on the presence of both the wild-type site and the cognate miRNA. Taken together, the reporter and microarray results suggested that the centered site is a miRNA target site capable of downregulation comparable with that observed for single 7 nt seed sites. Although they are much less abundant than both seed-matched sites and sites with 3'-supplementary pairing, centered sites are present in numbers similar to 3'-compensatory sites and could help explain the preferential conservation observed in the central region of most miRNAs.

miRNA-Directed Cleavage Detected at Centered Sites

Because of pairing to the central region of the miRNA, centered sites might be subject to AGO2-dependent cleavage similar to that occurring for known cleavage sites of plants and animals, which are more extensively paired (Yekta et al., 2004; Davis et al., 2005; Jones-Rhoades et al., 2006). To test this possibility, we employed an in vitro cleavage assay using S100 extract prepared from HeLa cells (Martinez and Tuschl, 2004; Shin, 2008), focusing on mRNA fragments containing centered sites for miR-21 or let-7g miRNA, which are abundant in HeLa cells (Figure 2A and Table S10). Cleavage was observed at the position expected for AGO2-catalyzed cleavage of the centered sites (Figure 2B).
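The centered-site definition and the expected cleavage position can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the search procedure used in this study: the function name find_centered_sites and its return format are hypothetical, the code tests only for an 11-nt Watson-Crick match to miRNA nt 4-14 or 5-15 together with the absence of a perfect nt 2-7 seed match, and it makes no attempt to screen for 3'-compensatory pairing. The miR-21, K89, and K89-21as sequences are those shown in the panels of Figures 2A and 3A.

```python
# Hypothetical helper names; a sketch of the centered-site definition only.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_centered_sites(mirna, utr):
    """Return (mirna_start, utr_index, cut_index) for each centered site.

    mirna_start is 4 or 5 (1-based first paired miRNA position),
    utr_index is the 0-based start of the 11-nt match in the mRNA, and
    cut_index is the 0-based mRNA nucleotide paired to miRNA position 10,
    i.e. the first nucleotide of the 3' fragment if AGO2 cleaves between
    the nucleotides paired to positions 10 and 11.
    """
    seed_match = revcomp(mirna[1:7])              # pairs to miRNA nt 2-7
    sites = []
    for start in (4, 5):
        probe = revcomp(mirna[start - 1:start + 10])   # 11 nt: start..start+10
        pos = utr.find(probe)
        while pos != -1:
            # mRNA nucleotides lying opposite miRNA nt 7..2
            opposite_seed = utr[pos + start + 3:pos + start + 9]
            if opposite_seed != seed_match:       # keep only sites lacking a seed match
                sites.append((start, pos, pos + start))
            pos = utr.find(probe, pos + 1)
    return sites


MIR21 = "UAGCUUAUCAGACUGAUGUUGA"                  # 5'->3'
K89 = "GAUAAAAAUUCAGUCUGAUAACCUCAAA"              # centered site (pairs to nt 5-16)
K89_21AS = "GAUUCAACAUCAGUCUGAUAAGCUAAAA"         # fully complementary control

print(find_centered_sites(MIR21, K89))            # [(5, 10, 15)]
print(find_centered_sites(MIR21, K89_21AS))       # [] (seed match present)
```

In this sketch the fully complementary K89-21as substrate is deliberately not reported, because it also contains a perfect seed match and so falls outside the centered-site class as defined above.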
To examine whether cleavage was also occurring in the cells, we tested for miR-21-directed cleavage of GSTM3 mRNA (moderately expressed in HeLa cells) using RNA ligase-mediated rapid amplification of cDNA 5' ends (5'-RACE). By directly cloning and sequencing the 5' end of the 3' cleavage product, this assay can be used to validate miRNA-directed cleavage (Llave et al., 2002; Yekta et al., 2004). To increase the sensitivity of the assay, XRN1, the 5'→3' exonuclease responsible for degrading the 3' cleavage product (Souret et al., 2004; Orban and Izaurralde, 2005), was knocked down (Alemán et al., 2007). 5'-RACE fragments within ~50 bp of the expected cleavage site were cloned for sequencing. The 5' ends for seven of nine sequenced clones precisely matched that expected for cleavage at the centered site in the cell (Figure 2C). These results indicated that for an endogenous mRNA targeted at a centered site by an endogenous miRNA, at least some transcripts underwent AGO2-catalyzed cleavage in the cell.

Pairing Requirements for Cleavage Are Sensitive to Mg2+ Concentration

To understand the specificity of cleavage at centered sites, miR-21 recognition of the K89 mRNA fragment (Figure 2) was examined further. The K89 RNA sequence, which was perfectly complementary to positions 5-16 of miR-21, was systematically mutated at each nucleotide corresponding to miR-21 positions 1-16, substituting an A:C mismatch or a G:U wobble for each Watson-Crick match and substituting a Watson-Crick match for each of the two mismatches (Figure 3A).
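The logic of this mutational scan can be made concrete by classifying the pair formed at each miRNA position. The following Python is a hedged sketch rather than a tool used in the study: pairing_profile, longest_wc_run, and mutate_opposite are hypothetical names, the one-character symbols are an arbitrary convention, and the 22-nt target segment is simply the part of the K89 fragment (as printed in Figure 3A) that lies opposite miR-21.

```python
# Hypothetical helpers illustrating per-position pairing of a miRNA:site duplex.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_profile(mirna, target_site):
    """One symbol per miRNA position (5'->3'): '|' Watson-Crick, 'o' G:U wobble, '.' mismatch.

    target_site is the mRNA segment (5'->3') lying opposite the miRNA,
    so its last nucleotide faces miRNA position 1.
    """
    symbols = []
    for m, t in zip(mirna, reversed(target_site)):
        if (m, t) in WATSON_CRICK:
            symbols.append("|")
        elif (m, t) in WOBBLE:
            symbols.append("o")
        else:
            symbols.append(".")
    return "".join(symbols)


def longest_wc_run(profile):
    """Return (first miRNA position, length) of the longest run of '|'."""
    best_start, best_len, run_start = 0, 0, None
    for i, symbol in enumerate(profile + "."):    # sentinel closes a trailing run
        if symbol == "|":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start + 1, i - run_start
            run_start = None
    return best_start, best_len


def mutate_opposite(target_site, mirna_pos, new_nt):
    """Replace the mRNA nucleotide that faces the given 1-based miRNA position."""
    i = len(target_site) - mirna_pos
    return target_site[:i] + new_nt + target_site[i + 1:]


MIR21 = "UAGCUUAUCAGACUGAUGUUGA"
K89 = "GAUAAAAAUUCAGUCUGAUAACCUCAAA"
K89_SITE = K89[3:25]                               # 22 nt opposite miR-21

wild_type = pairing_profile(MIR21, K89_SITE)
print(wild_type)                                   # .||.||||||||||||..||..
print(longest_wc_run(wild_type))                   # (5, 12)

# A G:U wobble at miR-21 position 6 (A-to-G change in the mRNA)
wobble6 = pairing_profile(MIR21, mutate_opposite(K89_SITE, 6, "G"))
print(wobble6)                                     # .||.|o||||||||||..||..
```

The wild-type profile reproduces the description in the text: mismatches at positions 1 and 4, and 12 contiguous Watson-Crick pairs spanning positions 5-16.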
When using 5.8 mM Mg2+, as in Figure 2B, or 2.2 mM Mg2+, both of which were within the ~2-6 mM range used previously to study in vitro cleavage (Martinez and Tuschl, 2004; Gregory et al., 2005; Maniataki and Mourelatos, 2005; Miyoshi et al., 2005; Rand et al., 2005; Ameres et al., 2007; Wang et al., 2009a, 2009b), cleavage was retained after changing positions outside of the centered site and was reduced after changing most positions within the centered site, although wobble pairs were tolerated at positions 6 and 8 (Figure 3B, top two panels). Mg2+ is essential for the in vitro cleavage reaction (Schwarz et al., 2004) but also has a striking effect on the relative stabilities of matched and mismatched RNA duplexes (Serra et al., 2002). Indeed, lowering the Mg2+ concentration increases the fidelity of RNA 2'-O-methylation, another reaction specified by Watson-Crick pairing between small guide RNAs and their targets (Appel and Maxwell, 2007). We found that lowering Mg2+ gave maximal target RNA cleavage specificity and efficacy for substrates that were extensively paired to miR-21, whereas higher Mg2+ was optimal for more weakly pairing substrates (Figure 3B). For example, the cleavage of K89-21as RNA, which is fully paired to the miRNA, was the most efficient at 0.3 mM Mg2+, whereas cleavage of the wild-type K89 substrate containing the centered site with only 12 contiguous pairs was undetectable at 0.3 mM Mg2+ and most efficient at 5.8 mM Mg2+; K89-m4GC, which had an intermediate number of contiguous pairs, had an intermediate Mg2+ optimum (Figures 3B and 3C). The free Mg2+ level in the cytoplasm of various cells and tissues is less than 1 mM (Gunther, 2006), a concentration at which we found that efficient cleavage required pairing more extensive than that of typical centered sites (Figure 3B).
Nonetheless, some cleavage at the centered site was detected at physiological Mg2+ concentrations (Figure 3B; 0.75 mM Mg2+), which explained why the 5'-RACE assay yielded fragments diagnostic of miR-21-directed cleavage in the cell (Figure 2C).

Whole-Transcriptome Analysis of miRNA-Directed Cleavage

The poor efficacy of cleavage at the centered site at physiological Mg2+ concentration called into question whether miRNA-directed cleavage plays a consequential role during repression mediated by centered sites and suggested that most repression at centered sites might resemble the destabilization and translational repression observed for most seed-matched targets. To better characterize the scope of miRNA-directed cleavage in mammals and to examine the extent to which cleavage at centered sites is relevant to target gene regulation in vivo, we applied degradome sequencing to mammalian cells. Degradome sequencing generates short sequence tags representing the 5' ends of uncapped mRNA fragments found in the cell (Addo-Quaye et al., 2008; German et al., 2008). Although these fragments are predominantly 5'→3' exonuclease degradation intermediates, they also include 3' fragments of Argonaute-catalyzed mRNA cleavage in sufficient numbers to enable empirical detection of endogenous cleavage targets of plant miRNAs and siRNAs (Addo-Quaye et al., 2008; German et al., 2008). Inspired by this success in plants and the ability to detect miR-21-directed cleavage by 5'-RACE, we applied the method to HeLa cells following XRN1 knockdown by RNAi (Figure S3A). Sequencing yielded 14,323,668 tags mapping to the human genome, with a diversity of 2,069,190 unique tag sequences. Of the total tags, 61.2% came from protein-coding genes and represented 36,806 out of 46,319 ENSEMBL mRNAs (Figure 4A). The tags showed a relatively uniform distribution across the mRNAs, with a very strong peak at the 5' terminus (Figure 4B).
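Because each degradome tag marks the 5' end of an uncapped fragment, a tag diagnostic of miRNA-directed cleavage should begin at the mRNA nucleotide paired to miRNA position 10. A minimal sketch of how tags might be tabulated against predicted cleavage positions is given below. It is an assumption-laden illustration, not the analysis code used here: the function name tags_at_predicted_cuts, the (transcript, position) tuple format, and the toy transcript names and coordinates are all hypothetical.

```python
from collections import Counter


def tags_at_predicted_cuts(tag_starts, predicted_cuts):
    """Count degradome tags whose 5' end coincides with a predicted cut.

    tag_starts: iterable of (transcript_id, position), one entry per tag,
        where position is the 0-based transcript coordinate of the tag 5' end.
    predicted_cuts: iterable of (transcript_id, position), where position is
        the nucleotide paired to miRNA position 10 of a candidate site.
    Returns a dict mapping each predicted cut to its tag count (0 if none).
    """
    counts = Counter(tag_starts)
    return {cut: counts[cut] for cut in predicted_cuts}


# Toy input (hypothetical transcripts and coordinates, for illustration only)
toy_tags = [("mRNA_A", 15), ("mRNA_A", 15), ("mRNA_A", 67), ("mRNA_B", 3)]
toy_cuts = [("mRNA_A", 15), ("mRNA_B", 40)]

print(tags_at_predicted_cuts(toy_tags, toy_cuts))
# {('mRNA_A', 15): 2, ('mRNA_B', 40): 0}
```

Running the same tabulation on shuffled or mismatched miRNA:site pairs would give a background frequency against which counts for authentic pairs can be compared.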
About 30% of tags were not classified because they did not map to mature annotated RNAs (Figure 4A). Many of these were from introns and processing fragments from pri-miRNAs, mitochondrial tRNAs, ribosomal RNAs, and snRNAs, illustrating how unstable 3' products of endonucleases can be detected in mammalian cells by using degradome sequencing (Tables S5 and S6).

To determine if miRNA centered sites were associated with cleavage at the expected position within the mRNA 3' UTR, we searched for centered matches to 50 distinct, conserved miRNAs most highly expressed in HeLa cells and tabulated the frequency of degradome tags corresponding to mRNA cleavage at the tenth position of these sites. Tags corresponding to cleavage at the expected position were found much more frequently for authentic miRNA:site pairs than for negative-control pairs (Figure 4C). However, when we excluded miR-196a, miR-151, and miR-28, which target several extensively paired sites, the signal above background was greatly reduced, suggesting that most centered sites lacked the complementarity required for robust miRNA-directed cleavage (Figure 4C). The abundance of degradome tags mapping to the expected cleavage sites of the siRNAs targeting XRN1 illustrated that the method can identify tags diagnostic of AGO2-catalyzed cleavage in human cells (Figure S3). These results supported those from the in vitro cleavage assays (Figure 3B) in suggesting that under physiological Mg2+ conditions, the mRNA downregulation mediated by centered sites is usually accompanied by very little AGO2-catalyzed cleavage.

Molecular Cell 38, 789-802, June 25, 2010 ©2010 Elsevier Inc.

Figure 3. Pairing Requirements for Cleavage at a Centered Site and the Influence of Mg2+ Concentration
(A) Sequences used to examine pairing requirements for cleavage. Sequences were derivatives of the K89 3' UTR fragment, a miR-21 target. K89-21as, fully complementary version of K89; K89-m1, matched to position 1 of miR-21; K89-mm2, A:C mismatch at position 2; K89-mm3GU, G:U wobble pairing at position 3.
(B) The influence of Mg2+ on cleavage specificity and efficacy in vitro. Reactions were performed as in Figure 2B, using the substrates depicted in (A), with the indicated Mg2+ concentrations. Quantification of the fraction cleaved is plotted on the right.
(C) Plot of the effect of Mg2+ on cleavage efficacy for the model centered site (K89), a more extensively paired site (K89-m4GC), and a fully paired site (K89-21as).
[Figure 3 panels: (A) alignments of miR-21 (3'-AGUUGUAGUCAGACUAUUCGAU-5') with K89, K89-21as, K89-m1, K89-m4GC, and K89-mm2 through K89-mm16; (B) cleavage time courses (0-15 min) at 5.8, 2.2, 0.75, 0.5, and 0.3 mM MgCl2; (C) x axis, [Mg2+] (mM), for K89 (5-16 match), K89-m4GC (2-16 match), and K89-21as (1-21 match).]

Figure 4. Rare miRNA-Directed Cleavage Detected by Degradome Sequencing
(A) Mapping of HeLa degradome sequencing tags to the transcriptome. Antisense corresponds to tags mapping antisense to annotated mRNAs, noncoding RNAs (ncRNAs), pseudogenes, and mitochondrial (MT) RNAs. Coverage indicates the fraction of the 46,319 annotated mRNAs that were represented by at least one tag. Unclassified tags mapped primarily to introns and 3' flanking regions of ncRNAs (Table S5).
(B) The distribution of degradome tags along the length of the mature mRNAs. mRNAs were split along their length into 100 bins, and tag 5' ends were tallied for each bin. Shown are the aggregate tallies for all mRNAs.
(C) Search for evidence of cleavage at centered sites. Plotted are the numbers of interactions with evidence for cleavage at the expected position (0) or at positions either upstream (negative values) or downstream (positive values). Interactions were counted if a centered site matching a conserved miRNA expressed in HeLa had a tag supporting cleavage at the indicated position (blue). Analysis excluding miR-196a, miR-28, and miR-151-5p is also shown (red). As a control, the analysis was repeated using ten cohorts of artificial tags, generated by randomly positioning tags on mRNAs (gray; error bars, standard deviation).
See also Tables S5 and S6.
[Figure 4 panels: (A) tag categories (mRNA, antisense, MT RNAs, pseudogene, rRNA, transposon, other ncRNA, unclassified); (B) x axis, relative position on mRNAs; (C) x axis, relative cleavage sites (-30 to +30).]

Genome-wide Search for miRNA:site Duplexes with High Complementarity

Our observation of significant cleavage at the small subset of centered sites with unusually extensive complementarity to the miRNA indicated that miRNA-directed cleavage at extensively paired sites was more frequent in animals than had been appreciated. This insight prompted a systematic examination of mammalian sites with extensive miRNA complementarity of the type that would mediate cleavage in plants but might not have fulfilled our criteria for classification as centered sites because they either had perfect seed pairing or lacked 11 contiguous pairs within positions 4-15.

To search for potential cleavage sites in mammals, we used a scoring rubric similar to those that successfully identify miRNA target sites in plants (Figure 5A) (Jones-Rhoades and Bartel, 2004; Allen et al., 2005). The search yielded 106 predicted miRNA:site duplexes scoring ≤2.0 (Figure 5B), including 47 in annotated ORFs, 16 in 5' UTRs, and 43 in 3' UTRs (Table S7).
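The scoring rubric, whose penalties are spelled out in the Figure 5A legend, can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions: the event-list input format, the function name, and the constant names are hypothetical and are not taken from the authors' pipeline, and the sketch assumes the duplex has already been aligned so that each departure from Watson-Crick pairing is assigned to a miRNA position.

```python
# Sketch of the alignment penalty score from the Figure 5A legend.
# The event-list input format and all names are illustrative assumptions.

CORE_FIRST, CORE_LAST = 2, 13   # miRNA core, nt 2-13
PAIR_FIRST, PAIR_LAST = 2, 21   # pairing is scored over miRNA nt 2-21


def alignment_penalty(events, a_across_nt1):
    """Return the penalty score for one miRNA:site duplex.

    events: iterable of (mirna_position, kind), one entry per departure
        from Watson-Crick pairing; kind is "mismatch", "indel", or "GU".
    a_across_nt1: True if the site has an A across from miRNA nt 1.
    """
    score = 0.0
    for position, kind in events:
        if not PAIR_FIRST <= position <= PAIR_LAST:
            continue  # positions outside nt 2-21 are not scored
        in_core = CORE_FIRST <= position <= CORE_LAST
        if kind in ("mismatch", "indel"):
            score += 2.0 if in_core else 1.0
        elif kind == "GU":
            score += 1.0 if in_core else 0.5
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if not a_across_nt1:
        score += 1.0  # extra penalty for lacking an A across from nt 1
    return score


# Made-up duplexes, for illustration only.
perfect = alignment_penalty([], a_across_nt1=True)
wobble_in_core = alignment_penalty([(5, "GU")], a_across_nt1=True)
mixed = alignment_penalty([(15, "mismatch"), (18, "GU")], a_across_nt1=False)
```

Under this rubric a lower score means more extensive pairing, so thresholds such as ≤2.0 or ≤2.5 select the most plant-like duplexes.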
At the mid-to-higher penalty scores, sites were no more abundant than expected by chance, but at scores <3.0, sites were at least 1.5-fold enriched compared to the control sets of chimeric miRNAs constructed so as to preserve the seeds as well as the overall dinucleotide and trinucleotide compositions of authentic miRNAs (Figure 5C). Repeating the analyses with annotated murine miRNAs yielded analogous results (Figures S4C-S4E and Table S8).

The higher abundance of extensive matches to miRNAs compared to that of controls might indicate biological function. However, eukaryotic genomes, complex tapestries containing remnants of innumerable duplications and repetitive elements, are far from random, and thus this abundance might simply be a consequence of the miRNAs and sites sharing common ancestry. To distinguish between these possibilities, we examined the conservation of orthologous sites in five mammalian species, as assessed using a conservation-alignment (CA) score (Figure 5D). When applied to sites for distinct miRNAs conserved throughout mammals, 17 miRNA:site duplexes had CA scores ≤3.0 (Figure 5E), most of which were unlikely to be conserved by chance (Figure 5F). Four of the 17 top-scoring sites were miR-151-5p targets (Table S9).

Cleavage at Highly Complementary Predicted Duplexes

Having found evidence that the most extensively paired sites were more abundant and more conserved than expected by chance, we returned to the degradome sequencing data to search for evidence that these sites were cleaved in the cell. Because the degradome sequencing data included intermediates of normal mRNA decay, steps were taken to distinguish AGO2 cleavage products from other decay intermediates. To do this, we considered the tag possession ratio (TPR), which represented the proportion of predicted miRNA:site duplexes that were represented by tags at the expected cleavage site (Figure 6A).
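As a minimal sketch of how a TPR could be computed and compared with a control set, the following uses only the Python standard library. The function names and the toy counts are hypothetical, and the one-sided form of Fisher's exact test is an assumption made here for illustration; the text does not state which form of the test was used.

```python
from math import comb


def tag_possession_ratio(n_with_tag, n_without_tag):
    """Fraction of predicted duplexes with a tag at the expected cleavage site."""
    total = n_with_tag + n_without_tag
    if total == 0:
        raise ValueError("no duplexes at this score")
    return n_with_tag / total


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: authentic duplexes, control duplexes.
    Columns: with a cleavage tag, without a cleavage tag.
    Returns the probability of seeing at least `a` tagged authentic
    duplexes when all row and column totals are held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy counts, invented for illustration only.
tpr_signal = tag_possession_ratio(3, 9)
p_value = fisher_exact_greater(3, 0, 0, 3)
```

In practice a statistics library would normally be used for the test; the explicit hypergeometric sum is shown only to make the comparison between signal and control duplexes concrete.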
When focusing on the miRNAs and mRNAs expressed in HeLa, miRNA:site duplexes with alignment penalty scores ≤2.5 possessed significantly more cleavage tags at the expected cleavage site than did control duplexes (Fisher's exact test, p = 1.1 × 10^-4) (Figure 6B and Table S11). Even after excluding tags mapping to multiple loci, this TPR difference remained both substantial and significant (p = 2.6 × 10^-4) (Figure 6C and Table S11). miRNA-directed cleavage in Arabidopsis sometimes occurs at ±1 nt from the expected cleavage site (Addo-Quaye et al., 2008). When applying a window of ±1 nt, there was no improvement in the TPR of expressed miRNA:site pairs (Figure S5A and Table S11). As an added control, we repeated the analysis for miRNAs that were not expressed in HeLa cells and found that these miRNAs performed similarly to the chimeric miRNA controls (p = 1.0) and significantly worse than the miRNAs expressed in HeLa cells (p = 5.3 × 10^-5). These results strongly indicated that for miRNA:site pairs with favorable alignment scores (≤2.5), most tags at the expected cleavage site did not arise from background 5'→3' degradation but instead were the consequence of miRNA-directed mRNA cleavage.

miRNA-Directed Cleavage in HeLa Cells and Human Brain

Using an alignment penalty score of 2.5, a threshold at which the cumulative TPR difference between signal and background was most significant in HeLa data (Table S11), we found eight miRNA-directed cleavage targets with tags precisely at the expected cleavage site (Table 1 and Figure S5B). All eight cleavage sites were in 3' UTRs, and half were conserved in other mammals (Table 1 and Figure S5B). Four of the pairs involved miR-151-5p (Figures 6E-6G and Table 1). miR-196a and its cleavage target HOXB8 are both known to be moderately expressed in HeLa cells (Lim et al., 2005), and as expected, HOXB8 was among the eight (Figure 6H).

To extend our results beyond cells in culture, we performed degradome sequencing using poly(A)-selected RNA from whole human brain. Sequencing yielded 9,240,114 reads mapping to the human genome, with a diversity of 2,360,502 unique tag sequences. miRNAs expressed in human brain tissues were found by small-RNA sequencing (Table S12). As in HeLa cells, we found a statistical association between the miRNA:site pairs and cleavage tags for miRNAs and mRNAs expressed in brain (Figures 6D and S5D).

Figure 5. Enrichment and Conservation of miRNA:site Duplexes with Extensive Complementarity
(A) Illustration of the alignment penalty score, used to judge the quality of pairing to miRNAs. Pairing to miRNA nt 2-21 was considered, assigning a 2-point penalty for each mismatch or alignment insertion/deletion (indel) within the miRNA core (nt 2-13) (Mallory et al., 2004), a 1-point penalty for each mismatch or indel outside the core or each G:U wobble within the core, and a 0.5-point penalty for each G:U wobble outside the core. An additional 1-point penalty was assigned to sites lacking an A across from miRNA nt 1 (Lewis et al., 2005).
(B) Distribution of scores for potential miRNA:site duplexes with at least seven consecutive base pairs and 13 base pairs in total. Sites were considered for the 620 distinct human miRNA/miRNA*s annotated in miRBase, version 11.0, excluding four miRNAs that paired to multiple repeat loci, skewing the distribution to the left (Figures S4A and S4B).
(C) Analysis of site enrichment. To estimate the signal-to-background ratio for each score bin, the number of miRNA:site duplexes was compared with the number of miRNA:site duplexes found when using chimeric control miRNAs (error bars, standard deviation for ten chimeric miRNA cohort sets).
(D) Illustration of the conservation alignment (CA) score, used to identify conserved miRNA:site pairs. Alignment penalty scores were considered for human sites aligned in orthologous genomic regions of mouse, rat, dog, horse, and pig, with the second highest (second worst) among the six assigned as the CA score.
(E) Distribution of CA scores for miRNA:site duplexes. Sites were considered for 165 distinct miRNAs conserved in mammals.
(F) Analysis of preferential conservation of extensively paired sites. To estimate the signal-to-background ratio for each score bin, the fraction of miRNA:site duplexes that were conserved was compared with the fraction of analogous duplexes that were conserved when using chimeric control miRNAs, accounting for the lower abundance of matches to control sequences (error bars, standard deviation for ten chimeric miRNA cohort sets).
See also Figure S4 and Tables S7-S9.
[Figure 5 panels: (A) example mRNA:miRNA alignment with the core region marked; (B, C) x axis, alignment penalty score; (D) example site aligned in human, mouse, and pig; (E, F) x axis, CA score.]
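The CA score described in the Figure 5D legend (the second-worst alignment penalty among the six species) can be sketched as follows. The species keys, the function names, and the use of infinity for a site that is missing from a species are assumptions made for this illustration; the example scores are invented.

```python
SPECIES = ("human", "mouse", "rat", "dog", "horse", "pig")
MISSING = float("inf")  # assumption: an absent ortholog counts as the worst score


def ca_score(penalties):
    """Return the second-worst alignment penalty among the six species.

    penalties: dict mapping a species name to the alignment penalty score
        of the orthologous site; species without an aligned site may be
        omitted and are then treated as MISSING.
    """
    scores = [penalties.get(species, MISSING) for species in SPECIES]
    ranked = sorted(scores, reverse=True)  # worst (highest) penalty first
    return ranked[1]


def is_conserved(penalties, threshold=3.0):
    """Apply the CA <= 3.0 criterion used in the text."""
    return ca_score(penalties) <= threshold


# Invented example: one species lacks the site, so the CA score is the
# worst of the remaining five penalties.
one_missing = {"human": 1.0, "mouse": 1.5, "rat": 2.0, "dog": 1.0, "pig": 2.5}
# Invented example: two species lack the site, so the CA score is infinite.
two_missing = {"human": 1.0, "mouse": 1.0, "rat": 1.0, "dog": 1.0}
```

Taking the second-worst score tolerates a single problematic species, which fits the stated aim of accommodating alignment errors, incomplete genomes, and species-specific losses, while a site missing from two species is not called conserved.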
For pairs with alignment score ≤3.0, the TPR was significantly higher than that of the controls (p = 0.008 and p = 0.030, nonexpressed and chimeric controls, respectively) (Table S11). Statistical significance was retained when also including tags mapping 1 nt downstream of the expected cleavage site as diagnostic of cleavage (p = 0.011 and p = 0.013, nonexpressed and chimeric controls, respectively) (Table S11), perhaps because some 5'→3' trimming occurred in the animal, where we could not knock down XRN1 activity. Eleven sites with scores ≤3.0 had tags suggestive of miRNA-directed cleavage (Table S13) at the expected position (Table 1), and two had tags suggestive of cleavage at position -1 (Figure S5E). Three of the 13 matched miR-151-5p and included N4BP1, which was also identified in HeLa cells. FRS2, a proposed target of miR-182, was also identified in HeLa cells. Four of the miRNA:site pairs identified in brain appeared conserved in other mammals (CA ≤3.0) (Table 1 and Figure S5E).

Figure 6. Enrichment of Degradome Tags at Sites of Expected miRNA-Directed mRNA Cleavage
(A) The tag possession ratio (TPR), used to search for evidence of miRNA-directed cleavage. At each alignment penalty score, the number of miRNA:site duplexes that had at least one tag with its 5'-terminal nucleotide mapping to the expected site of cleavage (across from miRNA position 10) was tallied, as was the number of duplexes that lacked a tag indicative of cleavage. The TPR was calculated as the number of duplexes with a tag divided by the sum of all duplexes.
(B) Enrichment of tags at the expected sites. TPR values for miRNAs expressed in HeLa (blue) (Table S10) were compared to values for miRNAs not expressed in HeLa (red) and values for ten cohorts of chimeric control miRNAs (gray; error bars, standard deviation).
(C) Enrichment of tags at the expected sites after excluding tags mapping to multiple loci.
(D) Enrichment of tags at the expected sites, after omitting tags mapping to multiple loci, in human brain. TPR values for miRNAs expressed in human brain (blue line) (Table S12) were compared to values for miRNAs not expressed in human brain (red) and values for ten cohorts of chimeric control miRNAs (gray; error bars, standard deviation).
(E) The mir-151 hairpin and its cleavage targets. Schematic depicts the mir-151 hairpin, the positions of the two ancestral L2 LINE repeats that gave rise to the hairpin, and the region of high conservation in sequenced mammals (Rhead et al., 2010). Once processed from the hairpin, the mature miR-151-5p pairs to and directs cleavage of mRNAs in HeLa cells.
(F-H) The distribution of degradome tags along the length of the indicated mRNAs (omitting tags mapping to >10 genomic loci). The red peaks are cleavage tags corresponding to cleavage at the site expected for the indicated miRNA. Shown are results from HeLa cells. Similar graphs are provided for the other mRNAs with evidence of miRNA-directed cleavage in HeLa cells (Figure S5B) and in human brain (Figure S5E).
See also Figure S5 and Table S4.
[Figure 6 panels: (A) TPR schematic showing tagged and untagged duplexes at the expected cleavage site; (B-D) x axis, alignment penalty score, for tags at the expected cleavage site, with and without multiple-loci tags; (E) mir-151 hairpin with conservation track and L2 repeat (+) and L2 repeat (-) annotations; (F-H) x axis, position on mRNA, including LYPD3 versus miR-151-5p and N4BP1 versus miR-151-5p.]

Table 1. mRNAs with Degradome Tags at the Expected Cleavage Site

miRNA        miRNA Reads  mRNA     Location of Site    Score  Cleavage Tags  Degradome Fraction (%)  Conserved
HeLa Cell
miR-196a     3682         HOXB8    3' UTR              1      1              50.0                    No(a)
miR-545*     34           IGFBP4   3' UTR, LINE L2     2      1              0.7                     No
miR-28-5p    1829         LYPD3    3' UTR, LINE L2     2.5    0.75           3.4                     No
miR-151-5p   2526         N4BP1    3' UTR, LINE L2     2      55             14.1                    Yes
miR-151-5p   2526         MPL      3' UTR, LINE L2     1.5    0.5            6.9                     No
miR-151-5p   2526         LYPD3    3' UTR, LINE L2     0      4              17.9                    No
miR-151-5p   2526         ATPAF1   3' UTR, LINE L2     1      1              1.4                     Yes
miR-182      624          FRS2     3' UTR              2.5    1              0.7                     Yes
Human Brain
miR-28-5p    2544         MDGA1    3' UTR, LINE L2     3      1              0.1                     No
miR-151-5p   33,007       MDGA1    3' UTR, LINE L2     3      9              0.9                     No
miR-151-5p   33,007       N4BP1    3' UTR, LINE L2     2      6              2.1                     Yes
miR-873      2033         MAN2C1   ORF                 2.5    1              0.2                     Yes
miR-330-5p   744          FAM62C   ORF                 3      2              1.3                     No
miR-95       523          EGLN3    3' UTR              2.5    1              1.2                     No
miR-182      115          FRS2     3' UTR              2.5    2              1.4                     Yes
miR-877      41           SPTBN1   ORF                 3      2              0.1                     No
miR-185      1014         PMVK     5' UTR              3      2              0.8                     No
miR-383      593          DCTN4    ORF                 2      1              0.4                     Yes
miR-598      8131         EFTUD2   ORF                 2.5    1              0.3                     No

Listed are miRNA:site pairs with alignment penalty scores (Score) for which the TPR (counting only tags mapping precisely to the expected cleavage site) significantly exceeded background (≤2.5 in HeLa cells and ≤3.0 in brain) (Table S11). The expression of each miRNA is indicated by its miRNA reads in a high-throughput sequencing experiment (Tables S10 and S12). Cleavage tags were those tags mapping precisely to the expected site of cleavage and were normalized by the number of times they mapped to the genome. For each mRNA, the fraction of degradome tags that were cleavage tags is indicated (Degradome fraction). Sites with CA ≤3.0 are categorized as conserved (Figures S5B and S5E).
(a) Although miR-196a:HOXB8 was not classified as conserved using our CA score because the site is missing in pig and horse, this pairing is known to be conserved in more distant lineages, including frog and fish (Yekta et al., 2004).

DISCUSSION

Centered Sites

We present centered sites as a type of miRNA target site. Centered sites contain at least 11 contiguous nucleotides that pair to a miRNA at positions 4-14 or 5-15, a pairing pattern distinct from that of most 3'-compensatory sites and seed sites. However, because a centered site might include additional nucleotide pairing on either side and a 3'-compensatory site might have additional pairing extending into the miRNA central region, there is potential overlap between a few extended centered sites and a few 3'-compensatory sites. Similarly, a seed site might include 3'-supplementary pairing extending into the miRNA central region, which creates potential overlap between a few extended centered sites and a few 3'-supplementary sites. However, such overlap with previously known site types is very rare. For example, a search of annotated human 3' UTRs revealed that for most human miRNAs, no seed-matched sites extend into centered sites, i.e., most human miRNAs have no 3' UTR match with contiguous Watson-Crick pairing to nt 2-14. Furthermore, conservation analysis and array data show that seed-type targets prefer to acquire supplemental pairing at positions 13-16 rather than extending pairing through nt 9-12 (Grimson et al., 2007).

The reason that centered sites had not been described previously can be explained by their relatively low abundance, which resembles that of 3'-compensatory sites and is far lower than that of seed-matched sites. Although no more effective than 7 nt seed-matched sites, centered sites are 4 nt longer, leading to an informational complexity ~250-fold (~4^4-fold) greater than that of 7 nt sites and a correspondingly increased difficulty for their emergence and retention during evolution.
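The ~250-fold figure follows from the four extra paired positions, each of which can be any of four nucleotides. The short sketch below works through that arithmetic; the function names are hypothetical, and the chance-match estimate assumes a uniformly random sequence, which real 3' UTRs are not.

```python
def fold_complexity(site_length, reference_length, alphabet_size=4):
    """How many times more specific a longer site is than a shorter one."""
    return alphabet_size ** (site_length - reference_length)


def expected_chance_matches(site_length, sequence_length, alphabet_size=4):
    """Expected matches to one fixed site in a uniformly random sequence."""
    return sequence_length / alphabet_size ** site_length


# An 11 nt centered site compared with a 7 nt seed-matched site.
fold = fold_complexity(11, 7)

# Expected chance matches per million nucleotides of random sequence.
per_mb_seed = expected_chance_matches(7, 1_000_000)
per_mb_centered = expected_chance_matches(11, 1_000_000)
```

Here `fold` evaluates to 4^4 = 256, which the text rounds to ~250, and the two per-megabase estimates differ by the same factor.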
The rarity of centered sites hampers statistical assessment of whether they are subject to evolutionary conservation. Nonetheless, the conserved miRNAs of mammals each match an average of 13 centered sites in human 3' UTRs (Figure S1J), and based on our zebrafish analyses, we estimate that on average about two sites per miRNA both reside in messages coexpressed with the miRNA and mediate repression. The presence of even a few beneficial interactions (species-specific or more broadly conserved) for a subset of the miRNAs could impart at least intermittent pressure to preserve the miRNA sequence, thereby explaining the preferential conservation observed in the central region of vertebrate miRNAs (Figure 1A). Moreover, centered sites resemble 3'-compensatory sites in providing a mechanism by which different members of the same miRNA seed family can repress distinct targets (Bartel, 2009).

Why would centered sites require so much more contiguous pairing than that required by seed sites? When bound by the Argonaute protein within the silencing complex, the seed is thought to be preorganized to favor Watson-Crick pairing to the mRNA (Bartel, 2004). In the current version of this seed-nucleation model, pairing cannot propagate to the center of a miRNA without a substantial conformational change in which the original contacts between Argonaute and the miRNA central and 3' regions are disrupted (Bartel, 2009). Disrupting these contacts offsets some binding energy gained in forming the central pairs, causing contiguous pairing adjacent to the seed to contribute less affinity than might have otherwise been expected. This lower contribution of pairing to the central region, combined with the higher contribution achieved by the preorganized seed, would explain why so much more pairing is needed for centered sites to achieve the same outcome as 7 nt seed sites.

Mg2+ Effect on Cleavage Specificity and Efficiency

Our results shed light on the biochemistry of RNAi. We suggest that at 37°C, in the low Mg2+ concentrations present in the cell, only the extensively paired sites can be bound with the stability and conformation that favors mRNA cleavage, and that after cleavage, the products are not bound so tightly as to slow multiple turnover. In higher Mg2+, however, less extensively paired sites achieve the stability and conformation needed for cleavage, and product release is more apt to slow turnover. This model explains the reduction of both specificity and efficiency at extensively paired sites observed in high Mg2+ concentrations. Under these conditions, less extensively paired sites are more readily cleaved; hence, the reduced specificity. The more extensively paired sites, on the other hand, undergo slower product release and gain little benefit from this more permissive binding-and-cleavage regime. Indeed, any benefit gained is more than offset by the tighter binding of the miRNA to less extensively paired sites, which causes the total cellular RNA present in extracts used for cleavage reactions to more effectively inhibit utilization of the labeled substrates; hence, the reduced efficiency. The free cytoplasmic Mg2+ concentration in most cells and tissues is <1 mM (Ginther, 2006), suggesting that cleavage specificity is very high in vivo.

Our results explain previous observations regarding the effects of adding phosphate-containing compounds to in vitro cleavage reactions. Many diverse phosphate compounds, including inorganic monophosphate, stimulate the multiple-turnover cleavage by the mammalian silencing complex (Gregory et al., 2005).
We suggest that these phosphate compounds titrate the free Mg2+, which in turn increases product turnover through decreased RNA duplex stability.

Endogenous miRNA-Directed mRNA Cleavage in Human

We find that miRNA-directed cleavage of mammalian mRNAs, although even more rare than repression at centered sites, occurs more frequently than previously appreciated. Two endogenous cleavage targets had been reported in mammals, HOXB8 and RTL1 (Yekta et al., 2004; Davis et al., 2005). We substantially add to this list, with evidence for cleavage of seven additional targets in HeLa cells and cleavage of 13 in human brain, two of which overlapped with HeLa targets. This small overlap, largely attributed to differential expression of the miRNAs or mRNAs in the two samples (Tables 1, S10, S12, and S13), suggests that as more tissues are examined, more cleavage targets will be found.

The fraction of degradome sequencing tags that provided evidence of miRNA-directed cleavage was generally higher in the HeLa analysis than in the brain analysis (Table 1 and Figures S5B and S5E). In the brain, this fraction of cleavage tags was sufficiently low as to suggest that some might represent degradation intermediates not indicative of miRNA-directed cleavage. Whether a smaller fraction of brain messages are cleaved, however, is unclear. The brain analysis lacked the benefit of the XRN1-exonuclease knockdown, designed to stabilize the transient 3' cleavage product so that it could be more readily detected over the background of metastable mRNA-decay intermediates. Moreover, whole brain has many cell types, with the possibility that differential expression of a miRNA and its cleavage targets might decrease the signal relative to background.
Nonetheless, for most cleavage targets in HeLa and for some in brain, degradome profiles resembled those of plant targets with validated biological relevance (Figures 6F-6H, S5B, and S5E) (Addo-Quaye et al., 2008; German et al., 2008), strongly supporting the hypothesis that the miRNA-directed cleavage pathway is an important degradation pathway for those mRNAs.

In both brain and HeLa cells, several cleavage targets identified were targets of miR-151-5p. This miRNA derives from a hairpin that has homology to the L2 subclass of repeat elements known as long interspersed nuclear elements (LINEs). L2 LINE elements are remnants of a non-LTR retrotransposon activity present in the common ancestor of mammals. They make up over 3% of the human genome (Kamal et al., 2006). Indeed, the miR-151 hairpin is derived from a tail-to-tail arrangement of two L2 fragments (Figure 6E). Hence, miR-151-5p derived from L2(+) is strongly complementary to several target sites derived from L2(-) repeats. Analogous tail-to-tail arrangements of short (S)INE fragments produce transcripts with longer hairpins that are processed in mouse ESCs into endogenous siRNAs (Babiarz et al., 2008). However, miR-151-5p and miR-151-3p are typical miRNAs, in that (1) their accumulation depends on both Drosha/DGCR8 and Dicer endonucleases (Babiarz et al., 2008), (2) they pair to each other with 2 nt 3' overhangs, (3) they are the two dominant products accumulating from the hairpin (Figure S5C), and (4) their hairpin has a conservation pattern typical of other conserved miRNAs (Figure 6E). Two other miRNAs that direct cleavage in HeLa, miR-28-5p and miR-545*, are also L2 repeat-derived miRNAs.

The notion that these miRNAs and their targets ultimately derived from the same ancestral elements is reminiscent of the origin of some plant miRNAs, which derive from duplicated fragments of their cleavage targets (Allen et al., 2004; Rajagopalan et al., 2006).
In mammals, however, the miRNAs and target sites evolved in parallel from the common ancestor, rather than one from the other. Moreover, in mammals, common ancestry between the miRNAs and their targets can be detected for older, conserved miRNAs, such as miR-151 and miR-28, whereas in plants, common ancestry has been detected only for younger, nonconserved miRNAs.

The observation that many of the cleaved mRNAs were the targets of repeat-derived miRNAs can be explained by the fact that repeat-derived miRNAs are more likely to encounter extensively complementary matches, since repeat-element remnants are found within many mRNAs. Over the course of evolution, repeat-derived miRNAs presumably had access to a wide variety of cleavage targets, providing the opportunity for some favorable regulatory interactions to emerge and be retained as conserved cleavage interactions. Thus, the repeat-derived miRNAs and their cleavage targets provide yet another avenue for repetitive elements to shape the regulation of cellular genes.

Concluding Remarks

The discovery of centered sites raises the question of how many additional site types remain to be found. On the one hand, transcriptome/proteome changes observed after introducing or deleting a miRNA can all be explained by direct interactions between the miRNA and messages with the five known site types (seed sites, 3'-supplementary seed sites, 3'-compensatory sites, centered sites, and cleavage sites) combined with indirect effects, as changes in the primary targets influence expression of secondary targets. On the other hand, detailed experimental follow-up on mRNAs that respond to the miRNA despite lacking any of these established site types seems to indicate that some of them should not be dismissed as secondary targets but might instead be direct targets (Lal et al., 2009).
However, the pairing schemes proposed thus far for these unusual interactions have not been defined sufficiently to provide predictive utility. That is, incontrast to centered sites and the previously known site types, these pairing schemes lack the specificity required to distinguish other responsive messages with similar pairing from background. Hence, experiments like that shown in Figures 1C-1 F cannot distinguish responsive messages that satisfy these unusual pairing schemes from nonresponsive messages that do not. Perhaps unknown factors binding to neighboring UTR elements help achieve interaction specificity differently for each individual mRNA in a manner too idiosyncratic to be generalized into site types. Alternatively, future insights into miRNA targeting might identify commonalities in these unusual interactions, which could form the basis of novel site types with predictive value. Molecular Cell MicroRNA Centered Sites sequences (miRBase 11.0) were aligned and classified into groups whose members differed from each other at 55 positions. The miRNA with the lowest miRBase annotation number was selected as the representative from each group. For distinct mRNAs, the mRNA isoform with the longest 3' UTR (or, if all 3' UTRs were ofthe same length, a randomly chosen isoform) was selected from a previously filtered set of RefFlat and H-INV annotations (Baek et al., 2008). CA Score To search for orthologous sites, we used 165 distinct miRNAs conserved among mammals and a six-way genome alignment (human, mouse, rat, dog, horse, and pig) from the UCSC Genome Browser (hgl 8, http://genome. ucsc.edu/) (Rhead et al., 2010). Alignment penalty scores were determined, and the second-worst rather than the worst score was selected as the CA score to accommodate some genome-alignment errors, incomplete genome sequences, and species-specific losses. 
Generation of miRNA-like Control Sequences

To generate controls with the same seed composition and same trinucleotide composition as authentic miRNAs, chimeric miRNA sequences were created by reciprocally recombining, using the link between nt 10 and 11 as the crossover breakpoint, two miRNAs randomly chosen (without replacement) from miRNA pairs with the same dinucleotide at positions 10 and 11, considering only our set of distinct miRNAs. Ten chimeric miRNA cohorts were generated to estimate the signal-to-background ratios.

ACCESSION NUMBERS

High-throughput raw sequence reads and processed reads are available at the NCBI GEO (accession number GSE22068).

EXPERIMENTAL PROCEDURES

A detailed description of all materials and methods used can be found in the Supplemental Information.

Microarray and Molecular Analyses

Array analyses were as in Grimson et al. (2007). Luciferase reporter constructs were prepared as in Grimson et al. (2007), and assays were performed as in Farh et al. (2005). In vitro cleavage reactions were essentially as in Haley and Zamore (2004) and Shin (2008). Uncapped 5' ends of GSTM3 mRNA degradation products were identified using the 5'-RACE kit (Invitrogen; Carlsbad, CA), as in Jones-Rhoades and Bartel (2004), starting with cells in which XRN1 mRNA was knocked down more than 90%, as confirmed by RT-PCR (Alemán et al., 2007). Degradome libraries were constructed essentially as in Addo-Quaye et al. (2008). Small-RNA libraries were prepared for Illumina sequencing as described (Grimson et al., 2008).

Analysis of miRNA Conservation

Out of 223 miRNA genomic loci producing 197 mouse miRNAs conserved in other mammals (Friedman et al., 2009), 203 miRNA loci producing miRNAs with 5' ends validated from a large-scale profiling of mouse miRNAs (Chiang et al., 2010) were used in the analysis of Figure 1A.
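The chimera construction described under Generation of miRNA-like Control Sequences can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `make_chimeras` and `trinucleotides` are hypothetical, the input sequences are toy strings rather than real miRNAs, and an unpaired leftover in a dinucleotide group is simply skipped. The sketch also counts trinucleotides to show why requiring a shared dinucleotide at positions 10 and 11 preserves the trinucleotide composition of the cohort.

```python
import random
from collections import Counter, defaultdict


def make_chimeras(mirnas, rng):
    """Reciprocally recombine miRNA pairs at the nt 10/11 junction.

    Only miRNAs sharing the same dinucleotide at positions 10-11 (1-based)
    are paired, and each miRNA is used at most once (no replacement).
    Returns (parents_used, chimeras).
    """
    by_dinuc = defaultdict(list)
    for seq in mirnas:
        by_dinuc[seq[9:11]].append(seq)  # 0-based slice = positions 10-11

    parents, chimeras = [], []
    for group in by_dinuc.values():
        rng.shuffle(group)
        # Take the shuffled group two at a time; an odd leftover is skipped.
        for a, b in zip(group[0::2], group[1::2]):
            parents += [a, b]
            chimeras += [a[:10] + b[10:], b[:10] + a[10:]]
    return parents, chimeras


def trinucleotides(seqs):
    """Count all overlapping trinucleotides across a set of sequences."""
    return Counter(s[i:i + 3] for s in seqs for i in range(len(s) - 2))


# Toy 22-nt sequences (not real miRNAs), built so that some share the
# dinucleotide at positions 10-11.
toy_mirnas = [
    "A" * 9 + "CG" + "U" * 11,
    "G" * 9 + "CG" + "A" * 11,
    "C" * 9 + "UA" + "G" * 11,
    "U" * 9 + "UA" + "C" * 11,
    "ACGUACGUACGUACGUACGUAC",
]

parents, chimeras = make_chimeras(toy_mirnas, random.Random(0))
print(trinucleotides(parents) == trinucleotides(chimeras))
```

Because each chimera keeps nucleotides 1-10 of one parent, the seed composition of the cohort is unchanged as well. Repeating the call with different random seeds would give independent cohorts, in the spirit of the ten cohorts mentioned in the text.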
Processing of Degradome Tags

After removing linker sequences and tags shorter than 20 nt, degradome tags were mapped to RNAs annotated in ENSEMBL (http://www.ensembl.org/), requiring a perfect match. To find "multiple loci tags" and tags that did not map to annotated RNAs, filtered tags were mapped to the human genome (UCSC Genome Browser, hg18, http://genome.ucsc.edu/). When determining TPRs, filtered tags were mapped to a curated set of distinct mRNAs (Baek et al., 2008). Expressed mRNAs were those represented by at least one degradome tag.

miRNA:site Duplexes

When searching for miRNA:site duplexes, distinct mRNAs and miRNAs were selected to avoid over-counting predicted duplexes involving miRNA families or mRNA isoforms. To select distinct miRNAs, all human miRNAs and miRNA

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures, Supplemental References, five figures, and 13 tables and can be found with this article online at doi:10.1016/j.molcel.2010.06.005.

ACKNOWLEDGMENTS

We thank Andrew Grimson, Michael Axtell, Daehyun Baek, and Alexander Subtelny for helpful discussions; Shujun Luo and Gary Schroth for Illumina sequencing of the small-RNA library from brain; and the Whitehead Genome Technology Core for the remaining Illumina sequencing. This work was supported by a Damon Runyon postdoctoral fellowship (C.S.) and a grant from the NIH. D.P.B. is a Howard Hughes Medical Institute Investigator.

Received: December 22, 2009
Revised: April 27, 2010
Accepted: June 3, 2010
Published: June 24, 2010

REFERENCES

Addo-Quaye, C., Eshoo, T.W., Bartel, D.P., and Axtell, M.J. (2008). Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 18, 758-762.
Alemán, L.M., Doench, J., and Sharp, P.A. (2007). Comparison of siRNA-induced off-target RNA and protein effects. RNA 13, 385-395.
Allen, E., Xie, Z., Gustafson, A.M., Sung, G.H., Spatafora, J.W., and Carrington, J.C. (2004). Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat. Genet. 36, 1282-1290.
Allen, E., Xie, Z., Gustafson, A.M., and Carrington, J.C. (2005). microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121, 207-221.
Ameres, S.L., Martinez, J., and Schroeder, R. (2007). Molecular basis for target RNA recognition and cleavage by human RISC. Cell 130, 101-112.
Anderson, E.M., Birmingham, A., Baskerville, S., Reynolds, A., Maksimova, E., Leake, D., Fedorov, Y., Karpilow, J., and Khvorova, A. (2008). Experimental validation of the importance of seed complement frequency to siRNA specificity. RNA 14, 853-861.
Appel, C.D., and Maxwell, E.S. (2007). Structural features of the guide:target RNA duplex required for archaeal box C/D sRNA-guided nucleotide 2'-O-methylation. RNA 13, 899-911.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008). Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 22, 2773-2785.
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.
Birmingham, A., Anderson, E.M., Reynolds, A., Ilsley-Tyree, D., Leake, D., Fedorov, Y., Baskerville, S., Maksimova, E., Robinson, K., Karpilow, J., et al. (2006). 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat. Methods 3, 199-204.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708-715.
Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992-1009.
Davis, E., Caiment, F., Tordoir, X., Cavaillé, J., Ferguson-Smith, A., Cockett, N., Georges, M., and Charlier, C. (2005). RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr. Biol. 15, 743-749.
Elbashir, S.M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., and Tuschl, T. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498.
Elbashir, S.M., Lendeckel, W., and Tuschl, T. (2001b). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev. 15, 188-200.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821.
Filipowicz, W., Bhattacharyya, S.N., and Sonenberg, N. (2008). Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat. Rev. Genet. 9, 102-114.
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105.
German, M.A., Pillay, M., Jeong, D.H., Hetawal, A., Luo, S., Janardhanan, P., Kannan, V., Rymarquis, L.A., Nobuta, K., German, R., et al. (2008). Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol. 26, 941-946.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.
Gregory, R.I., Chendrimada, T.P., Cooch, N., and Shiekhattar, R. (2005). Human RISC couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123, 631-640.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 1193-1197.
Günther, T. (2006). Concentration, compartmentation and metabolic function of intracellular free Mg2+. Magnes. Res. 19, 225-236.
Haley, B., and Zamore, P.D. (2004). Kinetic analysis of the RNAi enzyme complex. Nat. Struct. Mol. Biol. 11, 599-606.
Hutvágner, G., and Zamore, P.D. (2002). A microRNA in a multiple-turnover RNAi enzyme complex. Science 297, 2056-2060.
Jackson, A.L., Bartz, S.R., Schelter, J., Kobayashi, S.V., Burchard, J., Mao, M., Li, B., Cavet, G., and Linsley, P.S. (2003). Expression profiling reveals off-target gene regulation by RNAi. Nat. Biotechnol. 21, 635-637.
Jackson, A.L., Burchard, J., Leake, D., Reynolds, A., Schelter, J., Guo, J., Johnson, J.M., Lim, L., Karpilow, J., Nichols, K., et al. (2006a). Position-specific chemical modification of siRNAs reduces "off-target" transcript silencing. RNA 12, 1197-1205.
Jackson, A.L., Burchard, J., Schelter, J., Chau, B.N., Cleary, M., Lim, L., and Linsley, P.S. (2006b). Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA 12, 1179-1187.
Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell 14, 787-799.
Jones-Rhoades, M.W., Bartel, D.P., and Bartel, B. (2006). MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. 57, 19-53.
Kadener, S., Rodriguez, J., Abruzzi, K.C., Khodor, Y.L., Sugino, K., Marr, M.T., 2nd, Nelson, S., and Rosbash, M. (2009). Genome-wide identification of targets of the drosha-pasha/DGCR8 complex. RNA 15, 537-545.
Kamal, M., Xie, X., and Lander, E.S. (2006). A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl. Acad. Sci. USA 103, 2740-2745.
Krützfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan, M., and Stoffel, M. (2005). Silencing of microRNAs in vivo with 'antagomirs'. Nature 438, 685-689.
Lal, A., Navarro, F., Maher, C.A., Maliszewski, L.E., Yan, N., O'Day, E., Chowdhury, D., Dykxhoorn, D.M., Tsai, P., Hofmann, O., et al. (2009). miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to "seedless" 3'UTR microRNA recognition elements. Mol. Cell 35, 610-625.
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20.
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003). The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008.
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, D.P., Linsley, P.S., and Johnson, J.M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773.
Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441.
Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056.
Mallory, A.C., Reinhart, B.J., Jones-Rhoades, M.W., Tang, G., Zamore, P.D., Barton, M.K., and Bartel, D.P. (2004). MicroRNA control of PHABULOSA in leaf development: importance of pairing to the microRNA 5' region. EMBO J. 23, 3356-3364.
Maniataki, E., and Mourelatos, Z. (2005). A human, ATP-independent, RISC assembly machine fueled by pre-miRNA. Genes Dev. 19, 2979-2990.
Martinez, J., and Tuschl, T. (2004). RISC is a 5' phosphomonoester-producing RNA endonuclease. Genes Dev. 18, 975-980.
Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol. Cell 15, 185-197.
Miyoshi, K., Tsukumo, H., Nagami, T., Siomi, H., and Siomi, M.C. (2005). Slicer function of Drosophila Argonautes and its involvement in RISC formation. Genes Dev. 19, 2837-2848.
Orban, T.I., and Izaurralde, E. (2005). Decay of mRNAs targeted by RISC requires XRN1, the Ski complex, and the exosome. RNA 11, 459-469.
Rajagopalan, R., Vaucheret, H., Trejo, J., and Bartel, D.P. (2006). A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 20, 3407-3425.
Rand, T.A., Petersen, S., Du, F., and Wang, X. (2005). Argonaute2 cleaves the anti-guide strand of siRNA during RISC activation. Cell 123, 621-629.
Rhead, B., Karolchik, D., Kuhn, R.M., Hinrichs, A.S., Zweig, A.S., Fujita, P.A., Diekhans, M., Smith, K.E., Rosenbloom, K.R., Raney, B.J., et al. (2010). The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 38, D613-D619. Published online November 11, 2009. 10.1093/nar/gkp939.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007). Requirement of bic/microRNA-155 for normal immune function. Science 316, 608-611.
Schwarz, D.S., Tomari, Y., and Zamore, P.D. (2004). The RNA-induced silencing complex is a Mg2+-dependent endonuclease. Curr. Biol. 14, 787-791.
Schwarz, D.S., Ding, H., Kennington, L., Moore, J.T., Schelter, J., Burchard, J., Linsley, P.S., Aronin, N., Xu, Z., and Zamore, P.D. (2006). Designing siRNA that distinguish between genes that differ by a single nucleotide. PLoS Genet. 2, e140.
Selbach, M., Schwanhäusser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63.
Serra, M.J., Baird, J.D., Dale, T., Fey, B.L., Retatagos, K., and Westhof, E. (2002). Effects of magnesium ions on the stabilization of RNA oligomers of defined structures. RNA 8, 307-323.
Shin, C. (2008). Cleavage of the star strand facilitates assembly of some microRNAs into Ago2-containing silencing complexes in mammals. Mol. Cells 26, 308-313.
Souret, F.F., Kastenmayer, J.P., and Green, P.J. (2004). AtXRN4 degrades mRNA in Arabidopsis and its substrates include selected miRNA targets. Mol. Cell 15, 173-183.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146.
Wang, B., Li, S., Qi, H.H., Chowdhury, D., Shi, Y., and Novina, C.D. (2009a). Distinct passenger strand and mRNA cleavage activities of human Argonaute proteins. Nat. Struct. Mol. Biol. 16, 1259-1266.
Wang, Y., Juranek, S., Li, H., Sheng, G., Wardle, G.S., Tuschl, T., and Patel, D.J. (2009b). Nucleation, propagation and cleavage of target RNAs in Ago silencing complexes. Nature 461, 754-761.
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.